# ambiguityaware_abductive_learning__6817106e.pdf

Ambiguity-Aware Abductive Learning

Hao-Yuan He 1 2 Hui Sun 1 2 Zheng Xie 1 Ming Li 1 2

Abductive Learning (ABL) is a promising framework for integrating sub-symbolic perception and logical reasoning through abduction. In this case, the abduction process provides supervision for the perception model from the background knowledge. Nevertheless, this process naturally contains uncertainty, since the knowledge base may be satisfied by numerous potential candidates. This implies that the result of the abduction process, i.e., a set of candidates, is ambiguous; both correct and incorrect candidates are mixed in this set. The prior art of Abductive Learning selects the candidate that has the minimal inconsistency of the knowledge base. However, this method overlooks the ambiguity in the abduction process and is prone to error when it fails to identify the correct candidates. To address this, we propose Ambiguity-Aware Abductive Learning (A3BL), which evaluates all potential candidates and their probabilities, thus preventing the model from falling into sub-optimal solutions. Both experimental results and theoretical analyses prove that A3BL markedly enhances ABL by efficiently exploiting the ambiguous abduced supervision.

1. Introduction

Currently, machine learning methods are achieving significant success in perception (Krizhevsky et al., 2012; Vaswani et al., 2017). However, real-world learning tasks usually require not only the perception ability but also the logical reasoning ability (Kahneman, 2011). To address the limitations of current machine learning methods, the next generation of Artificial Intelligence calls for the integration of data-driven machine learning and knowledge-driven symbolic reasoning (Zhou, 2019).

1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 2School of Artificial Intelligence, Nanjing University, Nanjing, China. Correspondence to: Ming Li <lim@lamda.nju.edu.cn>.

Proceedings of the 41 st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abductive Learning (ABL) (Zhou, 2019; Dai et al., 2019) represents a novel framework seamlessly integrating machine learning systems with logical reasoning systems. Within this framework, the machine learning model is trained to transform raw input data, e.g., images and text, into sub-symbolic concepts; concurrently, the logical model is designed to conduct reasoning based on these concepts. Logical reasoning, facilitated through abductive reasoning (Magnani, 2009), is employed to identify the accurate concepts of unlabeled instances; these identified concepts are then utilized to update the machine learning model through supervised learning.

Consider the following illustrative example: the digit equation SUM(

) = 3 is presented, accompanied by the background knowledge that it represents a digit addition task. Initially, the machine learning model identifies

= 1. However, upon logical inference, it is deduced that 1 + 1 = 3. Nonetheless, a plausible explanation is proposed:

= 2, aligning with the background knowledge and thereby validating the equation.

Although the above procedure seems feasible, ambiguity still persists. The term ambiguity in this context means that the abduction process yields not only the correct result but also other validate candidates (Magnani, 2009), such as [

= 0], which are reasonable hypotheses given the existing background knowledge. The use of such ambiguous supervision derived from abductive reasoning brings significant challenges to model training processes. To address this issue, prior studies (Dai et al., 2019) have selected the nearest candidate (e.g., in terms of Hamming distance) to the model s prediction as pseudo-labels for learning. In this case, the accurate identification of candidates depends heavily on the performance of the perception model. When training the perception model from scratch, the initial recognitions are often unreliable, which means that the closest candidate of the abduction results may not be the correct one. Further training on these selected candidates as supervision data may lead to the perception model becoming entrapped in a sub-optimal state. Therefore, it still remains a challenge to properly utilize the ambiguity of abduction candidates in the ABL framework.

To address the ambiguity in the abduction results, this paper introduces the concept of Ambiguity-Aware Abductive

Ambiguity-Aware Abductive Learning

Learning, abbreviated as A3BL. Rather than considering a single candidate at once, this work focuses on assessing all potential abduction candidates. Specifically, A3BL employs an Expectation-Maximization (EM) algorithm for optimization purposes. Initially, A3BL assigns a weight to each candidate, derived from the machine learning model. Utilizing these weights, A3BL converts the ambiguous outcomes of abductive reasoning into instance-level class probability distributions. Then A3BL utilizes a uniquely formulated ambiguity-aware abductive loss function. Following the optimization of this loss function, the machine learning model updates the weight of each candidate, thereby revising the instance-level probability distribution for subsequent optimization steps. Through the iterative optimization of the EM algorithm, which guarantees convergence, A3BL effectively maximize the consistency between the perception model and the knowledge base. Both experimental outcomes and theoretical analyses substantiate that our approach significantly enhances ABL by efficiently leveraging ambiguous abduced supervision.

The contributions of this study are outlined as follows:

Identification and articulation of the ambiguity issue within the ABL framework, a challenge that stems from the inherent ambiguity in the abduction process.

Proposal of A3BL as a novel solution to mitigate this issue, accompanied by a comprehensive theoretical analysis with the establishment of an error bound to substantiate the effectiveness of A3BL.

Empirical studies present the superior performance of the proposed method A3BL.

The remainder of this paper is organized as follows: Section 2 introduces the majority of the related works pertinent to this paper. Then, Section 3 presents the preliminaries and the proposed method A3BL, including the theoretical analysis. After that, Section 4 details empirical studies conducted to support our claims and verify the performance of A3BL. Finally, Section 5 provides the conclusion and additional discussion of this paper.

2. Related Works

Neuro-Symbolic Learning Researchers have made attempts to combine neural networks and symbolic reasoning in order to achieve a more comprehensive form of Artificial Intelligence several decades ago (Towell & Shavlik, 1994; Sun, 1994; Garcez et al., 2002). Pioneering efforts have been made to convert logic rules into loss functions, such as the development of Semantic Loss (Xu et al., 2018). This approach utilizes probabilistic logic to transform logic rules into loss functions. Another notable method is Semantic Based Regularization (Roychowdhury et al., 2021), which

employs fuzzy logic to achieve the transformation of logic rules into loss functions. However, approximating logical reasoning cannot replace a true logical engine, and problems may arise when attempting to approximate discrete logical calculations (van Krieken et al., 2022; He et al., 2024). Recently, there have been advancements in this area by employing hybrid-system, including Deep Prob Log (Manhaeve et al., 2018; 2021), which integrates deep neural networks with problog (De Raedt et al., 2007) using a probabilistic logic approach. Another approach is Neur ASP (Yang et al., 2020), which is similar to Deep Prob Log but employs Answer Set Programming (ASP) (Dimopoulos et al., 1997) instead of problog. Additionally, Deep Stoch Log (Winters et al., 2022) is a related approach to Deep Prob Log, but it enhances computation speed by utilizing a stochastic logic approach.

Abductive Learning Abduction (Magnani, 2009) refers to the process of selectively inferring certain facts and hypotheses that explain phenomena and observations based on background knowledge. It has been a recurring topic of interest in the field of AI, as attempts have been made to integrate it with symbolic induction (Muggleton & Bryant, 2000; Mooney, 2000). Abductive Learning (ABL) (Dai et al., 2019; Zhou, 2019) aims to leverage learning and abduction in a mutually beneficial loop, presenting a novel paradigm for integrating machine learning and logical reasoning within a unified framework. The ABL framework is renowned for its expressive and flexible nature, as it can be applied to both labeled and unlabeled data with an appropriate knowledge base. Dai & Muggleton (2021) enhance the ABL framework s ability to induce knowledge from the raw data, the optimization builds upon the EM algorithm. Furthermore, ABL has been applied in various practical tasks, including theft judicial sentencing (Huang et al., 2020), stroke evaluation (Wang et al., 2021), optical character recognition (Cai et al., 2021), and historical document segmentation and recognition (Gao et al., 2024). Recently, Huang et al. released an open library for ABL, which is easy to use. Yang et al. (2024) worked on scenarios where the given knowledge base does not fit well with the ground-truth background knowledge. Despite its achievements in multiple applications, ABL still faces the challenge of the cold-start problem (Tao et al., 2024). This problem arises when it becomes difficult to abduce the correct candidate, particularly when the machine learning model is trained from scratch. This ambiguity stems from the the abduction process (Magnani, 2009), especially during the initial learning phase of the learning model.

Weakly Supervised Learning Our method draws inspiration from commonly employed weighting techniques in the field of weakly supervised learning (WSL) (Zhou, 2017). WSL aims to learn from imperfect supervision, which en-

Ambiguity-Aware Abductive Learning

compasses various approaches such as learning from noisy labels (Natarajan et al., 2013), multi-instance learning (Dietterich et al., 1997), multi-label learning (Zhang & Zhou, 2014), and partial label learning (Cour et al., 2011). WSL has also achieved considerable success in other applications, such as natural language process (Artzi & Zettlemoyer, 2013); object detection (Zhang et al., 2022); AUC optimization (Xie et al., 2024); and so on. The ABL framework can be viewed as expanding the domain of WSL (Zhou & Huang, 2022), where the supervision information can come from knowledge reasoning. Within the ABL framework, it is possible to get an effective model even when there is an insufficient amount of labeled or unlabeled data, as long as high-quality knowledge is available. Recently, Wang et al. (2023) investigated a typical case of neuro-symbolic system, from a multi-instance weak supervision perspective. Tao et al. (2024) analyzed the cold-start problem of ABL by adopting a perspective rooted in noisy-label learning. Both of these works focused on the theoretical aspects of the field without proposing any specific algorithms. Distinguishing itself from previous studies, our work specifically focuses on the ambiguity of abduction within the ABL framework. A3BL extend the ABL by enhancing stable training and promoting fast convergence, while also providing a promising theoretical analysis.

3. Ambiguity-Aware Abductive Learning

In this section, we begin by introducing the preliminaries of the problem setting. Subsequently, we discuss the optimization objectives of the previous ABL methods and examine their limitations. Following this, we presented the primary contributions of this paper, namely the Ambiguity-Aware Abductive Learning method, short for A3BL. Finally, we provide a theoretical analysis of A3BL, establishing an error bound and drawing connections between our optimization process and the Expectation-Maximization (EM) algorithm. To the best of our knowledge, this is the first attempt to provide an error bound analysis within the ABL framework.

3.1. Problem Setting

This study follows the common setting of the ABL. The ABL framework consists of a perception model and a reasoning model, e.g., a knowledge base. The perception model, denoted as f : X 7 Z, maps an instance x from the input space X to a label z in the symbol space Z = {1, , L}. The knowledge base, denoted as KB, is comprised of rules defined over a sequence of instances x = (x1, , xm) X m, where m denotes the sequence length. The corresponding labels of the instance sequence are denoted by z = (z1, , zm) Zm. To clarify, in some cases, f may represent a mapping from X to a distribution over Z. Additionally, when f processes a se-

quence input x, it correspondingly outputs a sequence. Though not knowing the sequence of labels z of x, we have some indirect information y Y, the target label, such that z KB |= y. If the target label is determined by z given the knowledge base KB, this process can be referred as a logical forward function σ( ), indicating that σ(z) = y. The overall training set with N sequences can be denoted as ( ˆ X m, ˆY), which is drawn from the distribution (X m, Y). Here, ˆ X m = {x(1), , x(N)}, and ˆY = {y(1), , y(N)}.

In a typical Abductive Learning process, the machine learning model can not access the sequence labels z during training, but it can access the knowledge base for abductive reasoning. When provided with the input x, the model outputs z = ( z1, , zm). If the prediction z with the knowledge base KB entails a wrong target label y = y, the resaoner immediately knows that z is incorrect. Following the abduction process, the resaoner could restrict the possible concept sequence z within a candidate set s Zm. We provide an example below to facilitate a better understanding of the notations mentioned above.

Example 3.1. The input data x = (x1, x2) X 2, where X is a digit image space (e.g., MNIST). Symbol space Z = {0, 1, , 9}, target label space Y = {0, 1, , 18}. The logical forward function σ( , ) : X 2 7 Y = Sum( , ). If target label y = 2 and the logical forward result σ( z) = 2, then the knowledge base will abduce candidate set s = {(0, 2), (1, 1), (2, 0)}.

The learning system of ABL is designed for the perception tasks. In this study, we specifically focus on the classification task, which involves minimizing the risk associated with classification errors. For convenience, we can equivalently rewrite the risk at the sequence level as follows:

Definition 3.1 (Classification risk). The objective of multiclass classification is to train a multi-class classifier that minimizes the classification risk defined as follows:

R(f; L) = Ep(x,z)[L(f(x), z)]. (1)

Here, the loss function L( , ) represents the aggregation of classification errors for each instance in the sequence x. Mathematically, it can be expressed as:

i=1 Lcls(f(xi), zi)

where Lcls typically denotes a classification loss function.

3.2. Previous Abductive Learning

To minimize the aforementioned classification risk, prior Abductive Learning methods basically solve an empirical

Ambiguity-Aware Abductive Learning

risk minimization problem which can be formalized as:

min f F 1 N

i=1 L(f(x(i)), z(i))

s.t. z(i) = arg min c s(i) Score(c, f(x(i))), i [N],

where s(i) is the abduced candidate set of f(x(i)), that is to say s(i) = {c|c KB |= y(i)}. The term Score(c, f(x(i))) is used to quantify how likely the candidate c is incorrect based on the model s prediction f(x(i)).

To implement this function, different measures can be used. For instance, Dai et al. (2019) use the Hamming distance (Hamming, 1950) as the score function:

Score(c, f(x(i))) = Hamming(c, f(x(i))). (4)

This scoring function tries to use the candidate c that has most of the same labels as the prediction f(x(i)). Additionally, it is feasible to extend this approach to include the confidence provided by the model s prediction. This extension is implemented in the official package1 as follows:

Score(c, f(x(i))) = 1

j=1 f(x(i))cj. (5)

In this context, f( ) represents the model s predicted distribution over Z. Additionally, f(x(i))cj is the model s estimated probability of the class cj given the input x(i) j .

3.3. Ambiguity-Aware Abductive Loss

As discussed above, the abductive reasoning process cannot eliminate all incorrect predictions, hence the candidate set cannot be used as an accurate supervision. Previous research attempts to select one assignment of the sequence labels with minimal difference to the predictions (Equation 4) or the maximal probability according to the model s confidence (Equation 5). However, the selected assignment is prone to error and could lead the model astray. To prevent this from happening, such methods have to require a number of instance-level supervision, i.e., Semi-Supervised Abductive Learning (Huang et al., 2020) or assume an initial model performance, i.e., pretrain the model (Dai et al., 2019). Such requirements further limit the application of the methods. The root cause of the learner being trapped in an incorrect situation stems from the ambiguity of the abduced result: once the model selects an incorrect label to learn, its incorrect perception can be strengthened in the subsequent training process, leading to a vicious cycle. Thus, it is critical to be aware that any possible candidate could be the correct one.

1https://github.com/Abductive Learning/ABLKit

To achieve this, we propose Ambiguity-Aware Abductive Learning, or A3BL for short. A3BL initially transforms the ambiguous abduction results into instance-level class probability distributions. Subsequently, it utilizes a novel ambiguity-aware abductive loss to enable the model to learn classification from these class probability distributions. By doing this, A3BL can fully leverage the ambiguous abductive results to facilitate learning.

Suppose the model predicts the labels of an instance sequence x = (x1, , xm) to be z = ( z1, , zm). Through abductive reasoning, it is realized that the candidate set of potentially correct label sequences should be s = {ci|ci KB |= y}. The probability of a candidate label sequence to be true can be formalized as:

p(z = ci|x) = Y

1 j m p(zj = cij|xj) . (6)

Due to the fact that all impossible label sequences are ruled out by the abduction, the posterior probability of a candidate label sequence ci to be all correct, given the candidate set s, can be obtained by:

p(z = ci|x, s) = p(z = ci|x) P

c s p(z = c|x) . (7)

Then, for an instance xj appeared in the sequence x, its probability of being k-th class should be:

p(z = k|xj, s) = X

ci s I[k = cij]p(z = ci|x, s) . (8)

By this equation, we obtain the class probability distribution of any instance xj that occurred in the sequence x, which can be used as a supervision for the model training.

The next key step is to train a classifier based on the class distribution p(z|x, s) = [p(z = 1|x, s), , p(z = L|x, s)] . To make the model learn from this ambiguous supervision, we minimize the difference between the model s output and the probability distribution:

j=1 Lcls f(x(i) j ), p(z|x(i) j , s(i)) , (9)

where the Lcls is the cross entropy loss. Here we denote s(i)

as the candidate set abduced from x(i). In fact, the above optimization object is equivalent to the empirical risk below:

c s(i) p(c|x(i), s(i)) L(f(x(i)), c). (10)

The term p(c|x(i), s(i)) is provided by the perception model f. This term represents the weight assigned to candidate

Ambiguity-Aware Abductive Learning

c, considering the candidate set s(i) and the sequence x(i). The equivalence of the above two equations can be easily validated by expanding both of them, and the proof can be seen in Appendix A. We name Equation (10) as ˆRA3, representing empirical ambiguity-aware abductive risk. By optimizing this risk, each potential candidate is evaluated with distinct weights, enabling A3BL to more effectively utilize ambiguous abduction results.

3.4. Theoretical Analysis

In this section, we provide a theoretical analysis of A3BL. Initially, we demonstrate that optimizing Equation (10) can be interpreted as a process of maximizing the log-likelihood PN i=1 log pθ(x(i), y(i)), via the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), where θ denotes the parameters to be optimized. Additionally, we present an estimation error bound for Equation (10). Detailed proofs are available in Appendix A.

Theorem 3.2. The optimization of Equation (10) is actually optimizing the log-likelihood PN i=1 log pθ(x(i), y(i)), via the EM algorithm.

Proof Sketch. For simplicity, consider a single sample x(i), y(i) . By Jensen s inequality, the log-likelihood log p(x(i), y(i)) is lower bounded by:

log pθ(x(i), y(i)) X

c s(i) wc log(pθ(x(i), c)

where the wc [0, 1] and P

c s(i) wc = 1 is a coefficient. The equality holds when

ck, cj s(i), pθ(x(i), ck)

wck = pθ(x(i), cj)

It turns out that this equality holds when wc = p(z = c|x(i), s(i)). Thus, the E-step corresponds to setting wc = p(z = c|x(i), s(i)), and then the M-step involves optimizing the ambiguity-aware abductive risk.

The log-likelihood PN i=1 log pθ(x(i), y(i)) quantifies the consistency between the model s predictions and the underlying knowledge base. In this context, optimizing Equation (10) is to maximize the consistency between the machine learning model and the knowledge base. Next, we will establish an estimation error bound to analyze the difference between the estimation error of ˆf A3 and the optimal classifier f . Here, f is defined as the classifier that minimizes the risk R(f) over the function space F, while ˆf A3 is the empirical risk minimizer of ˆRA3(f) over the same function space F. To accomplish this, we need to define a class of real functions Fi (Maurer, 2016), and then F = i [K]Fi represents the K-valued function space, where K = Lm.

Thus, it can be observed that as m increases, the complexity of the learning task increases too.

Theorem 3.3 (Error bound). Suppose that L(f(x), z) is ρ-Lipschitz with respect to f(x) for all z Zm and upperbounded by M = supf F,x X m,z Zm L(f(x), z). Let Rn(Fi) be the Rademacher complexity of Fi with sample size n. Then for any δ > 0, with probability at least 1 δ,

R( ˆf A3) R(f ) 4

i=1 Rn(Fi) + 2M

As n , it follows that Rn(Fi) 0 for all parametric models with a bounded norm, such as deep networks trained with weight decay (Lu et al., 2019). Therefore, Theorem 3.3 demonstrates that ˆf A3 converges to f as the number of training data tends to infinity, indicating that the learning process is consistent in an asymptotic sense.

4. Empirical Study

In this section, we conduct experiments to verify our claims and validate the superior performance of A3BL. Specifically, the experiments focus on two challenging tasks in the neurosymbolic field: Digit Addition and Handwritten Formula Recognition. To ensure reproducibility, all experiments are repeated five times, each with a different random seed. The same backbone is used for all baselines; for further details, please refer to the appendix.

4.1. Settings of Digit Addition

Manhaeve et al. (2018) proposed the Digit Addition task, which is based on standard addition rules. The training data for this task is presented in the form SUM(

) = 3. Building upon this concept, we have expanded the task to incorporate four distinct datasets: MNIST (Deng, 2012), KMNIST (Clanuwat et al., 2018), CIFAR10 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011). Each dataset consists of 10 classes, with the class indices representing digits from 0 to 9. In line with the work of Winters et al. (2022), to increase the task s complexity, we have extended the range of digit-size n from 1 to 4, e.g., SUM(

) = 33. When the size of the digits increases, the size of the abduced candidates set correspondingly expands, and so does the complexity of the background knowledge.

Compared Methods We conducted a comparative analysis of our method by contrasting it with several prominent hybrid system approaches. These methods include Deep Prob Log (Manhaeve et al., 2021), which integrates deep neural networks with Prob Log (De Raedt et al., 2007) through a probabilistic logic framework. Another noteworthy method is Neur ASP (Yang et al., 2020), akin

Ambiguity-Aware Abductive Learning

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog Neur ASP

(a) Addition-SVHN

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog Neur ASP

(b) Addition-CIFAR10

Figure 1. Performance curves (accuracy vs. epochs) of compared methods on digit addition tasks. The shaded area represents the standard error of the methods over 5 repetitions. Cross marks denote experiments that were not finished within 48 hours.

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog NGS-BS NGS-RL NGS-MAPO A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog NGS-BS NGS-RL NGS-MAPO

Figure 2. Performance curves (accuracy vs. epochs) of compared methods on handwritten formula recognition tasks. The shaded area represents the standard error of the methods over 5 repetitions. Cross marks denote experiments that were not finished within 48 hours.

to Deep Prob Log but leveraging answer set programming (ASP) (Dimopoulos et al., 1997) in place of Prob Log. Deep Stoch Log (Winters et al., 2022) represents a related strategy to Deep Prob Log, distinguishing itself by enhancing computational speed via a stochastic logic methodology. Additionally, our analysis encompassed various implementations of ABL. The ABL version utilizing the score function Equation (4) was referred to as ABL-hamming, and

the one employing Equation (5) was termed ABL-conf. In our experimental setup, all methods utilize a consistent perception model for fairness in comparison. For the MNIST and KMNIST datasets, the selected perception model is Le Net (Le Cun & Bengio, 1998). For the CIFAR10 and SVHN datasets, we employ Res Net50 (He et al., 2016). All these perception models are trained from scratch, and all compared methods were evaluated based on their official

Ambiguity-Aware Abductive Learning

implementations. The results for the Addition-CIFAR10 and Addition-SVHN tasks are depicted in Figure 1. Owing to space limitations, the results for the Addition-MNIST and Addition-KMNIST tasks are provided in Appendix C.

4.2. Settings of Handwritten Formula Recognition

Li et al. (2020) introduced the Handwritten Formula Recognition (HWF) task, which is based on the CROHME 2019 Offline Handwritten Formula Recognition Task2. The HWF dataset, derived from this task, encompasses training data composed of equations with various lengths and their corresponding evaluation results. The equations in the dataset have lengths in the set {1, 3, 5, 7}. In contrast to prior approaches, we enhance the task s difficulty by exclusively considering equations whose lengths are greater than or equal to 5. Furthermore, to augment the task s perceptual difficulty, we incorporate CIFAR10 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011), thereby introducing the HWFCIFAR and HWF-SVHN variants, respectively.

Compared Methods This study s selected comparative methods include the Neural-Grammar-Symbolic model (NGS) (Li et al., 2020), which adopts an approach akin to the ABL model by integrating symbolic reasoning with neural networks. NGS utilizes context-free grammar as its knowledge base and employs Markov Chain Monte Carlo (MCMC) sampling for candidate exploration in the solution space. We implemented various NGS variants as proposed by Li et al. (2020), including NGS-BS (employing backsearch), NGS-RL (utilizing REINFORCE algorithm), and NGS-MAPO (with Memory Augmented Policy Optimization). Methods including Deep Prob Log, Deep Stoch Log, ABL-hamming, and ABL-conf were also compared. In our experiments, all methods utilize a consistent perception model for fairness in comparison. For the HWF task, the selected perception model is Le Net (Le Cun & Bengio, 1998). For the HWF-CIFAR and HWF-SVHN, we employ Res Net50 (He et al., 2016). All these perception models are trained from scratch, and all compared methods were evaluated based on their official implementations. It is important to recall that in this study, only equations with lengths greater than or equal to 5 are considered, a criterion set to augment the task s difficulty. Consequently, the results obtained in this research are expected to differ from those reported by Li et al. (2020).

4.3. Empirical Analysis

(a) ABL Suffers from Ambiguity The ambiguity of abduction can significantly impair the performance of ABL. To support this assertion, the following analysis is presented.

2https://www.cs.rit.edu/ crohme2019/task.html

0 100 200 300

Addition-CIFAR n=3

0 100 200 300

Addition-CIFAR n=4

A3BL ABL-hamming ABL-conf

Figure 3. Abduction process (abduction accuracies vs. steps) of A3BL and ABL on digit addition tasks. The shaded area represents the standard error of the methods over 5 repetitions.

Analysis of Performance Curve. We investigate the performance of variants of ABL on two tasks: digit addition and handwritten formula recognition, as illustrated in Figure 1 and Figure 2, respectively. For the digit addition task, both ABL-conf and ABL-hamming encounter difficulties related to the inherent ambiguity in the abduction process. ABL-hamming, in particular, struggles to converge with increasing digit size and in complex perception scenarios, such as Addition-SVHN and Addition-CIFAR10. In contrast, ABL-conf is characterized by a high standard error, indicating an unstable training process. Similarly, in the handwritten formula recognition task, these patterns persist. ABL-hamming fails to converge effectively on the HWF-CIFAR and HWF-SVHN datasets, while ABL-conf again demonstrates a high standard error, underscoring the instability in its training process. However, in both tasks, A3BL emerges as significantly superior in terms of performance. This consistency across different challenges highlights its robustness and adaptability in handling complex tasks.

Analysis of Worst Case. The worst case, i.e., the worst performance run in repeated times. As demonstrated in Figure 4, the performance curves of A3BL and ABL-conf exhibit divergent trends. A3BL exhibits consistency between reasoning and concept accuracy, with both measures improving over time. Reasoning accuracy refers to the consistency between the model s prediction and the background knowledge, and concept accuracy refers to the instance-level classification. However, this is not the case for ABL-conf. In fact, the confusion matrix of ABLconf reveals a shortcut pattern. This pattern may slightly improve the reasoning accuracy, but it negatively impacts the concept accuracy. This finding clearly highlights that ABL-conf encounters difficulties in handling ambiguity, resulting in the adoption of shortcuts.

Ambiguity-Aware Abductive Learning

0 10 20 30 40 50

Reasoning Accuracy Concept Accuracy

0.88 0.01 0.02 0.00 0.02 0.00 0.00 0.01 0.01 0.04

0.01 0.74 0.01 0.09 0.01 0.04 0.03 0.03 0.04 0.00

0.03 0.01 0.90 0.00 0.03 0.01 0.00 0.00 0.00 0.01

0.01 0.11 0.00 0.77 0.01 0.03 0.03 0.04 0.02 0.00

0.02 0.01 0.03 0.00 0.86 0.03 0.01 0.01 0.01 0.01

0.01 0.05 0.01 0.03 0.04 0.78 0.04 0.02 0.03 0.00

0.00 0.04 0.01 0.02 0.01 0.04 0.80 0.04 0.03 0.00

0.01 0.02 0.00 0.03 0.01 0.01 0.03 0.88 0.01 0.00

0.00 0.04 0.00 0.02 0.00 0.02 0.02 0.01 0.88 0.00

0.05 0.01 0.01 0.00 0.03 0.00 0.00 0.00 0.01 0.88

0 10 20 30 40 50 0.0

Reasoning Accuracy Concept Accuracy

0.05 0.00 0.74 0.01 0.06 0.01 0.03 0.00 0.00 0.09

0.01 0.04 0.05 0.07 0.01 0.09 0.60 0.01 0.02 0.12

0.56 0.00 0.13 0.01 0.02 0.01 0.02 0.12 0.00 0.13

0.00 0.03 0.05 0.05 0.00 0.01 0.61 0.01 0.15 0.10

0.05 0.01 0.08 0.01 0.03 0.01 0.02 0.01 0.00 0.78

0.01 0.27 0.04 0.07 0.01 0.01 0.12 0.01 0.01 0.45

0.01 0.29 0.05 0.36 0.00 0.01 0.12 0.01 0.02 0.12

0.00 0.03 0.40 0.04 0.00 0.01 0.12 0.00 0.31 0.09

0.00 0.02 0.04 0.52 0.00 0.01 0.12 0.19 0.01 0.09

0.02 0.00 0.11 0.01 0.54 0.14 0.02 0.00 0.00 0.14

(b) ABL-conf

Figure 4. Worst case analysis between A3BL and ABL-conf. The concept accuracy of A3BL increases along with reasoning accuracy, while the concept accuracy of ABL-conf could worsen as reasoning accuracy increases. Confusion matrices indicate that ABL-conf falls into a shortcut, i.e., the model mistakenly categorizes instances to enhance reasoning accuracy, while leading to bad concept accuracy.

Table 1. Test accuracies (mean standard deviation) on digit addition tasks with varying digit sizes. Each experiment was conducted five times. The best performance is denoted in boldface. N/A indicates that the method failed to complete a single epoch within 48 hours.

Addition-SVHN Addition-CIFAR

Method n = 1 n = 2 n = 3 n = 4 n = 1 n = 2 n = 3 n = 4

Neur ASP (Yang et al., 2020) 6.63 0.23 7.84 0.15 7.84 0.15 N/A 10.00 0.00 9.73 0.46 9.73 0.46 N/A Deepproblog (Manhaeve et al., 2021) 6.53 1.12 N/A N/A N/A 7.55 1.26 N/A N/A N/A Deep Stoch Log (Winters et al., 2022) 91.61 0.27 90.81 0.52 90.28 0.35 90.92 0.75 66.32 0.72 67.63 1.45 65.55 0.87 66.38 1.28 ABL-hamming (Dai et al., 2019) 26.56 11.67 19.76 0.14 12.27 9.97 15.88 5.59 31.82 15.71 20.67 3.93 17.29 2.23 12.97 2.60 ABL-conf 91.02 0.95 64.66 43.45 65.12 43.76 65.00 43.97 68.42 1.79 62.94 25.74 58.29 30.19 62.52 26.15

A3BL (Ours) 91.86 0.86 91.81 0.87 92.49 0.37 91.99 0.46 72.82 0.95 76.09 0.18 74.45 0.37 72.57 1.22

(b) Accuracies of Abduction Figure 3 illustrates the abduction processes employed by A3BL, ABL-hamming, and ABL-conf on digit addition tasks. The process is conducted over 300 steps, corresponding to 10 epochs. In this context, abduction accuracy is defined as follows: for ABL-conf and ABL-hamming, accuracy depends on whether the result of their abduction process identifies the correct candidate. For A3BL, accuracy is contingent upon whether the candidate with the highest weight is indeed the correct one. ABLhamming failed to find the correct candidate, and ABL-conf also struggled with this. However, A3BL effectively distinguishes the correct candidate among abduction candidates through dynamic weight assignment. Within 10 epochs, the abduction accuracy of A3BL achieves nearly 1, which brings a clearly supervision for the machine learning model.

(c) Greater Stability and Faster Convergence Experimental results demonstrate that A3BL exceeds the performance of ABL variants, Deep Prob Log, Deep Stoch Log, NGS variants, and Neur ASP, exhibiting faster convergence and superior performance. In the task of digit addition, both Neur ASP and Deep Prob Log underperformed, failing to surpass a random classifier. Particularly, in certain settings, both methods failed to complete even a single training epoch within two days; this high computational complexity limits their applicability in realistic scenarios. In the task of handwritten formula recognition, all NGS variants failed,

possibly due to the increased difficulty from limiting the length of equations to five or more. Similarly, Deep Prob Log also failed to complete a training epoch within two days in this task. As previously discussed, A3BL outperforms ABL variants in all tasks. To summarize, A3BL beats other methods in terms of stability and convergence speed.

4.4. Efficiency Discussion

As shown in Theorem 3.2, the E-step in our algorithm can be computed efficiently, with only one step calculation. In other words, we use a fixed-form approach to derive this step. For comparison, the time complexity of the E-step in A3BL is O(n), where n is the size of the candidate set. However, the time complexity of ABL to select a candidate is still O(n), since they have to select the minimal inconsistent candidate from the set. After that, both ABL and A3BL will optimize their risk, which is with the same complexity see Equation (3) and Equation (9). In conclusion, the total time complexity of A3BL is the same as that of ABL.

Furthermore, we report the execution time for a sample training session (encompassing 5 epochs) for the Addition CIFAR task with n ranging from 1 to 4, see Table 2. The results were obtained using a computer setup consisting of an Intel Xeon Platinum 8538 CPU and an NVIDIA A100PCIE-40GB GPU on an Ubuntu 20.04 Focal platform.

Ambiguity-Aware Abductive Learning

Table 2. Execution time comparison on task Addition-CIFAR.

Method n = 1 n = 2 n = 3 n = 4

ABL 14m 4s 17m 15s 23m 12s 1h 3m 58s A3BL 14m 5s 19m 42s 27m 20s 1h 8m 49s

5. Conclusions and Future Directions

This study is the first to identify and articulate the issue of ambiguity within the Abductive Learning (ABL) framework, a challenge that arises due to the inherent ambiguity in the abduction process. To mitigate this challenge, we propose Ambiguity-Aware Abductive Learning (A3BL) as a novel solution. A3BL diverges from the conventional approach of selecting a single candidate at once; it considers all potential candidates, aggregating them into an instance-level class distribution, which is then optimized using the EM algorithm. This modeling approach enables A3BL to efficiently utilize ambiguous abduction results, thereby enhancing the learning system. Furthermore, we establish an error bound, which guarantees the promising performance of A3BL. Experimental results demonstrate that A3BL utilizes abduction results more effectively, achieving a high abduction accuracy in the training set within a few iterations. Compared to other baseline methods, A3BL demonstrates faster convergence, superior stability and better performance.

Although both experimental and theoretical analysis support the superior performance of A3BL, it may still encounter failures in some scenarios, e.g., the size of the abduction candidate set is tremendous. For instance, if the knowledge base is helpless, the abduction candidate set becomes the universal set, leading to an overwhelmingly large number of candidates, which can overburden the learning system. The challenge may be tackled from two perspectives: The first is Inductive Logic Programming (ILP), which derives knowledge from data, thus improving the knowledge base and subsequently aiding the abduction process. The other involves developing algorithms to streamline the abduction process, e.g., parallel abductive reasoning. Also, applying this framework across a broader spectrum of realistic scenarios presents a promising avenue.

Impact Statement

This paper introduces Ambiguity-Aware Abductive Learning (A3BL), a framework enhancing the integration of logical reasoning with machine learning. The advancement in A3BL primarily contributes to the efficiency and effectiveness of neuro-symbolic learning, with broad applicability in sectors such as healthcare and autonomous systems. We anticipate that this work will not introduce any negative ethical or social impacts.

Acknowledgments

This paper is supported by NSFC (62076121, 61921006) and Major Program (JD) of Hubei Province (2023BAA024). The authors would like to thank Prof. Wang-Zhou Dai for his helpful feedback on drafts of the paper.

Artzi, Y. and Zettlemoyer, L. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. Transactions of the Association for Computational Linguistics, 1:49 62, 2013.

Cai, L.-W., Dai, W.-Z., Huang, Y.-X., Li, Y.-F., Muggleton, S., and Jiang, Y. Abductive Learning with Ground Knowledge Base. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, pp. 1815 1821, Virtual, 2021.

Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., and Ha, D. Deep Learning for Classical Japanese Literature. Co RR, abs/1812.01718, 2018.

Cour, T., Sapp, B., and Taskar, B. Learning from Partial Labels. Journal of Machine Learning Research, 12(42): 1501 1536, 2011.

Dai, W.-Z. and Muggleton, S. Abductive Knowledge Induction from Raw Data. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, pp. 1845 1851, Virtual, 2021.

Dai, W.-Z., Xu, Q., Yu, Y., and Zhou, Z.-H. Bridging Machine Learning and Logical Reasoning By Abductive Learning. In Advances in Neural Information Processing Systems 32, pp. 2815 2826, Vancouver, BC, Canada, 2019.

De Raedt, L., Kimmig, A., and Toivonen, H. Prob Log: a Probabilistic Prolog and Its Application in Link Discovery. In Proceedings of the 20th International Joint Conference on Artifical Intelligence, pp. 2468 2473, Hyderabad, India, 2007.

Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B, 39: 1 38, 1977.

Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine, 29(6):141 142, 2012.

Dietterich, T. G., Lathrop, R. H., and Lozano-P erez, T. Solving the Multiple Instance Problem with Axis-parallel Rectangles. Artificial Intelligence, 89(1 2):31 71, 1997.

Ambiguity-Aware Abductive Learning

Dimopoulos, Y., Nebel, B., and Koehler, J. Encoding Planning Problems in Nonmonotonic Logic Programs. In Proceedings of the 4th European Conference on Planning, pp. 169 181, Toulouse, France, 1997.

Feng, L., Lv, J., Han, B., Xu, M., Niu, G., Geng, X., An, B., and Sugiyama, M. Provably Consistent Partial-label Learning. In Advances in Neural Information Processing Systems 33, pp. 10948 10960, Virtual, 2020.

Gao, E.-H., Huang, Y.-X., Hu, W.-C., Zhu, X.-H., and Dai, W.-Z. Knowledge-enhanced Historical Document Segmentation and Recognition. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, pp. 8409 8416, Vancouver, BC, Canada, 2024.

Garcez, A. S. d., Gabbay, D. M., and Broda, K. B. Neural Symbolic Learning System: Foundations and Applications. Springer-Verlag, Berlin, Heidelberg, 2002.

Hamming, R. W. Error Detecting and Error Correcting Codes. The Bell System Technical Journal, 29(2):147 160, 1950.

He, H.-Y., Dai, W.-Z., and Li, M. Reduced Implicationbias Logic Loss for Neuro-symbolic Learning. Machine Learning, 113:3357 3377, 2024.

He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 770 778, Las Vegas, NV, USA, 2016.

Huang, Y.-X., Dai, W.-Z., Yang, J., Cai, L.-W., Cheng, S., Huang, R., Li, Y.-F., and Zhou, Z.-H. Semi-supervised Abductive Learning and Its Application to Theft Judicial Sentencing. In Proceedings of the 20th IEEE International Conference on Data Mining, pp. 1070 1075, Sorrento, Italy, 2020.

Huang, Y.-X., Hu, W.-C., Gao, E.-H., and Jiang, Y. ABLkit: A Python Toolkit for Abductive Learning. Frontiers of Computer Science, in press.

Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, USA, 2011.

Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations, San Diego, CA, USA, 2015.

Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Technical Report, Department of Computer Science, University of Toronto, 2009.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, pp. 1097 1105, Lake Tahoe, Nevada, USA, 2012.

Le Cun, Y. and Bengio, Y. Convolutional Networks for Images, Speech, and Time Series. In The Handbook of Brain Theory and Neural Networks, pp. 255 258. MIT Press, Cambridge, MA, USA, 1998.

Li, Q., Huang, S., Hong, Y., Chen, Y., Wu, Y. N., and Zhu, S.-C. Closed Loop Neural-symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning. In Proceedings of the 37th International Conference on Machine Learning, pp. 5884 5894, Virtual, 2020.

Lu, N., Niu, G., Menon, A. K., and Sugiyama, M. On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data. In International Conference on Learning Representations, New Orleans, LA, USA, 2019.

Magnani, L. Abductive Cognition - The Epistemological and Eco-Cognitive Dimensions of Hypothetical Reasoning, volume 3 of Cognitive Systems Monographs. Springer, Berlin, Heidelberg, 2009.

Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., and De Raedt, L. Deep Prob Log: Neural Probabilistic Logic Programming. In Advances in Neural Information Processing Systems 31, pp. 3753 3763, Montr eal, Canada, 2018.

Manhaeve, R., Dumanˇci c, S., Kimmig, A., Demeester, T., and De Raedt, L. Neural Probabilistic Logic Programming in Deep Prob Log. Artificial Intelligence, 298(C): 103504, 2021.

Maurer, A. A Vector-contraction Inequality for Rademacher Complexities. In International Conference on Algorithmic Learning Theory, pp. 3 17, Bari, Italy, 2016.

Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, 2012.

Mooney, R. J. Integrating Abduction and Induction in Machine Learning. In Abduction and Induction: Essays on their Relation and Integration, pp. 181 191. Springer Netherlands, Dordrecht, 2000.

Muggleton, S. H. and Bryant, C. H. Theory Completion Using Inverse Entailment. In Proceedings of the 10th Inductive Logic Programming, pp. 130 146, London, UK, 2000.

Natarajan, N., Dhillon, I. S., Ravikumar, P. K., and Tewari, A. Learning with Noisy Labels. In Advances in Neural Information Processing Systems 26, pp. 1196 1204, Lake Tahoe, Nevada, USA, 2013.

Ambiguity-Aware Abductive Learning

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 2011.

Roychowdhury, S., Diligenti, M., and Gori, M. Regularizing Deep Networks with Prior Knowledge: A Constraintbased Approach. Knowledge-Based Systems, 222:106989, 2021.

Sun, R. Integrating Rules and Connectionism for Robust Commonsense Reasoning, pp. 273. John Wiley & Sons, Inc., 1994.

Tao, L., Huang, Y.-X., Dai, W.-Z., and Jiang, Y. Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, pp. 15310 15318, Vancouver, BC, Canada, 2024.

Towell, G. G. and Shavlik, J. W. Knowledge-based Artificial Neural Networks. Artificial Intelligence, 70(1):119 165, 1994.

van Krieken, E., Acar, E., and van Harmelen, F. Analyzing Differentiable Fuzzy Logic Operators. Artificial Intelligence, 302:103602, 2022.

Vapnik, V. N. An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks, 10(5):988 999, 1999.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. u., and Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30, pp. 6000 6010, Long Beach, CA, USA, 2017.

Wang, J., Deng, D., Xie, X., Shu, X., Huang, Y.-X., Cai, L.-W., Zhang, H., Zhang, M.-L., Zhou, Z.-H., and Wu, Y. Tac-Valuer: Knowledge-based Stroke Evaluation in Table Tennis. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3688 3696, Virtual Event, Singapore, 2021.

Wang, K., Tsamoura, E., and Roth, D. On Learning Latent Models with Multi-instance Weak Supervision. In Advances in Neural Information Processing Systems 36, New Orleans, LA, USA, 2023.

Wei, Z., Feng, L., Han, B., Liu, T., Niu, G., Zhu, X., and Shen, H. T. A Universal Unbiased Method for Classification from Aggregate Observations. In Proceedings of the 40th International Conference on Machine Learning, pp. 36804 36820, Honolulu, Hawaii, USA, 2023.

Winters, T., Marra, G., Manhaeve, R., and Raedt, L. D. Deep Stoch Log: Neural Stochastic Logic Programming. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, pp. 10090 10100, Virtual, 2022.

Wu, Z., Lv, J., and Sugiyama, M. Learning with Proper Partial Labels. Neural Computation, 35(1):58 81, 2023.

Xie, Z., Liu, Y., He, H.-Y., Li, M., and Zhou, Z.-H. Weakly Supervised AUC Optimization: A Unified Partial AUC Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1 16, 2024.

Xu, J., Zhang, Z., Friedman, T., Liang, Y., and Van den Broeck, G. A Semantic Loss Function for Deep Learning with Symbolic Knowledge. In Proceedings of the 35th International Conference on Machine Learning, pp. 5502 5511, Stockholm, Sweden, 2018.

Yang, X.-W., Shao, J.-J., Tu, W.-W., Li, Y.-F., Dai, W.- Z., and Zhou, Z.-H. Safe Abductive Learning in the Presence of Inaccurate Rules. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, volume 38, pp. 16361 16369, Vancouver, BC, Canada, 2024.

Yang, Z., Ishay, A., and Lee, J. Neur ASP: Embracing Neural Networks into Answer Set Programming. In Proceedings of the 29th International Joint Conference on Artificial Intelligence, pp. 1755 1762, Yokohama, Japan, 2020.

Zhang, D., Han, J., Cheng, G., and Yang, M.-H. Weakly Supervised Object Localization and Detection: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5866 5885, 2022.

Zhang, M.-L. and Zhou, Z.-H. A Review on Multi-label Learning Algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819 1837, 2014.

Zhou, Z.-H. A Brief Introduction to Weakly Supervised Learning. National Science Review, 5(1):44 53, 2017.

Zhou, Z.-H. Abductive Learning: Towards Bridging Machine Learning and Logical Reasoning. Science China Information Sciences, 62(7):76101, 2019.

Zhou, Z.-H. and Huang, Y.-X. Abductive Learning. In Neuro-Symbolic Artificial Intelligence: The State of the Art, pp. 353 369. IOS Press, Amsterdam, 2022.

Ambiguity-Aware Abductive Learning

The structure of this appendix is as follows:

Appendix A includes the proofs3 omitted in the main text because of the space limit.

Appendix B introduces the details of the implementation of our method and the comparison baselines.

Appendix C provides more experiments about our method.

A.1. Proving the Equality between Equation (10) and Equation (9)

Proof. The Equation (10) is

c s(i) p(c|x(i), s(i)) L(f(x(i)), c) .

The Equation (9) is

j=1 Lcls f(x(i) j ), p(z|x(i) j , s(i)) .

To demonstrate the equality, we consider a single sample (x, y).

On the one hand, we have:

j=1 Lcls (f(xj), p(z|xj, s)) = 1

k=1 p(z = k|xj, s) log f(xj)

ci s I[k = cij]p(z = ci|x, s) log f(xj) .

The second equality build upon:

p(z = k|xj, s) = X

ci s I[k = cij]p(z = ci|x, s) .

On the other hand, we also have:

ci s p(c|x, s) L(f(x), ci) = X

ci s p(ci|x, s) 1

j=1 Lcls(f(xj), cij)

k=1 I[k = cij]p(z = ci|x, s) log f(xj) .

Thus we prove the equality.

Further, when the p(c|x, s) is correctly estimated, we can prove that the Ambiguity-Aware Abductive Risk is actually an unbiased risk estimator.

3The similar techniques we used here are widely adopted in many literatures in the (weakly) supervised learning field. Specifically, the risk rewrite technology can be found in (Feng et al., 2020; Wu et al., 2023; Wei et al., 2023) and so on. The error bound technology can be found in (Vapnik, 1999; Mohri et al., 2012; Maurer, 2016; Lu et al., 2019; Feng et al., 2020; Wu et al., 2023) and so on.

Ambiguity-Aware Abductive Learning

Theorem A.1 (Unbiased Risk Estimator). The classification risk Definition 3.1 can be equivalently expressed as:

R(f; L) = Ep(x,s)

c s p(c|x, s)L(f(x), c)

We named this risk as Ambiguity-Aware Abductive Risk.

R(f; L) = Ep(x,z) [L(f(x), z)] (17)

c Zm L(f(x), c)p(x, z = c)dx (18)

s Zm L(f(x), c) p(x, z = c, s)dx (19)

s Zm L(f(x), c) p(z = c|x, s) p(x, s)dx (20)

s Zm L(f(x), c) p(c|x, s) p(x, s)dx (21)

c s p(c|x, s) L(f(x), c)

A.2. Proof of Theorem 3.2

For reading convenience, we rewrite the theorem below:

Theorem A.2. The optimization of Equation (10) is actually optimizing the log-likelihood PN i=1 log pθ(x(i), y(i)), via the EM algorithm.

log pθ(x(i), y(i)) = log

c s(i) pθ(x(i), y(i), c)

c s(i) pθ(x(i), c) pθ(y(i)|x(i), c)

c s(i) pθ(x(i), c)

c s(i) wc pθ(x(i), c)

c s(i) wc log(pθ(x(i), c)

The second equality holds because pθ(y(i)|x(i), c) should always be equal to 1, as we can derive y(i) through candidate c using background knowledge, e.g., σ(c) = y(i).

The last inequality is derived from Jensen s inequality.

Ambiguity-Aware Abductive Learning

Suppose the size of candidate set s(i) = li, when the equality holds, which means:

pθ(x(i), c1)

wc1 = = pθ(x(i), cli)

wcli = C. (28)

Thus we can get:

pθ(x(i), cj)

j=1 wcj = 1. (29)

So we have: li X

j=1 pθ(x(i), cj) = C. (30)

It is easy to see that, pθ(x(i), cj, s(i)) = pθ(x(i), cj) because we can derive the whole candidate set from one candidate by using the knowledge base, and further we get:

j=1 pθ(x(i), cj, s(i)) = C = pθ(x(i), s(i)). (31)

Finally, we get: pθ(x(i), c) pθ(x(i), s(i)) = wc = pθ(x(i), c, s(i))

pθ(x(i), s(i)) = p(c|x(i), s(i)). (32)

The formulation presented above indicates that by setting wc as the candidate confidence introduced in Equation (7), we are performing the E-step in the Expectation-Maximization (EM) algorithm. After the E-step, the values of wc are fixed, and we proceed to optimize Equation (10) in the M-step in order to maximize the likelihood. This supports our earlier statement. Consider that EM algorithm is well established that has convergence promise, this implies that the optimization of A3BL can be guaranteed.

A.3. Proof of Theorem 3.3

Most of the prove tricks can be found in Mohri et al. (2012), especially the proof here were inspired by Wu et al. (2023) and Feng et al. (2020). For reading convenience, we rewrite the theorem below:

Theorem A.3 (Error bound). Suppose that L(f(x), z) is ρ-Lipschitz with respect to f(x) for all z Zm and upperbounded by M, i.e., M = supf F,x X m,z Zm L(f(x), z). Let Rn(Fi) be the Rademacher complexity of Fi with sample size n. Then for any δ > 0, with probability at least 1 δ,

R( ˆf A3) R(f ) 4

i=1 Rn(Fi) + 2M

To prove the Theorem 3.3, we first introduce the following lemma:

Lemma A.4. The following inequality holds:

0 R( ˆf A3) R(f ) 2 sup f |R(f) RA3( ˆf)|. (34)

Proof. By definition, R( ˆf A3) R(f ) 0, thus the first inequality is proved.

Ambiguity-Aware Abductive Learning

R( ˆf A3) R(f ) = (R( ˆf A3) RA3( ˆf A3)) + (RA3( ˆf A3) RA3(f )) + (RA3(f ) R(f )), (35)

(R( ˆf A3) RA3( ˆf A3)) + (RA3(f ) R(f )), (36)

2 sup f |R(f) RA3( ˆf)|, (37)

thus proving the second inequality.

Definition A.5 (Empirical Rademacher Complexity). Let G be a class of functions mapping Z 7 R and S = (z1, , zn) Zn a fixed sample of size n. Then the empirical Rademacher complexity of G with respect to the sample S is defined as:

ˆRS(G) = Eσ

i=1 σig (zi)

where σ = (σ1, , σn), with σi is independent uniform random variables taking from { 1, +1}.

Definition A.6 (Rademacher Complexity). Suppose the sample S with size n is drawn from distribution p i.i.d. The Rademacher complexity of G with respect to p is defined as:

Rn(G) = Ezi p h ˆRS(G) i . (39)

We introduce a class of functions defined on X m Zm according to Equation (7):

G = {(x, z) 7 X

c s p(c|x, s)L(f(x), c) : f F}. (40)

Then the Rademacher complexity of G with respect to p(x, c) is given as:

Rn(G) = E(xi,ci) p

i=1 σig(xi, ci)

Lemma A.7. Suppose M = supx X m,z Zm,f F L(f(x), z) < , then for any δ > 0, the following inequality holds with the probability at least 1 δ.

sup f F |R(f) ˆRA3(f)| 2Rn(G) + M

Proof. For a sample S, we define ϕ(S) = supf F(R(f) ˆRA3(f)). Suppose we replace an example (xi, ci) in the sample S with another example (x i, c i), the change of ϕ(S) is not greater than:

g(xi, ci) g(x i, c i) n M

since L is bounded by M. Then by Mc Diarmid s inequality, for any δ > 0, with a probability at least 1 δ

2, the following inequality holds:

ϕ(S) E(xi,ci) p[ϕ(S)] + M

By Chapter 3 in Mohri et al. (2012), it is easy to show that E(xi,ci) p [ϕ(S)] 2Rn(G). Hence the following holds with probability at least 1 δ/2:

sup f F (R(f) ˆRA3(f)) 2Rn(G) + M

thus complete the proof.

Ambiguity-Aware Abductive Learning

Lemma A.8. Suppose that the loss L(f(x), z) is ρ-Lipschitz with respect to f(x) for all z Zm. Then the following inequality holds:

i=1 Rn(Fi). (46)

Proof. Let Π = {(x, c) 7 L(f(x), c) : f F}. Notice that the candidate confidence p(c|x, s) is between 0 and 1, and that P

c s p(c|x, s) = 1. In this way, we can obtain:

Rn(G) = E(xi,ci) p

i=1 σig(xi, ci)

i=1 σi L(f(xi), ci)

= Rn(Π). (49)

Since L is ρ-Lipschitz, following the Rademacher vector contraction inequality (Maurer, 2016), we have:

i=1 Rn(Fi), (50)

which concludes this proof.

Finally, the proof of Theorem 3.3 can be completed by combining the above lemmas.

Ambiguity-Aware Abductive Learning

B. Implementation Details

All experiments were conducted on a system equipped with an NVIDIA Ge Force RTX 3090 GPU, Intel Xeon Silver 4210 CPU, 64GB of RAM, and Ubuntu 20.04 Focal. All experiments use Adam (Kingma & Ba, 2015) as the optimizer. For convenience, a summary table detailing the configurations of various methods is presented in Table 3.

Table 3. For the experimental setup, the configurations for the methods compared were chosen according to their official implementations, and A3BL aligns with the standard configurations of ABL.

Digit Addition Handwritten Formula Recognition

Method Optimizer Learning rate Batch size Epoch Optimizer Learning rate Batch size Epoch

A3BL Adam 0.001 256 50 Adam 0.001 1024 50 ABL-hamming Adam 0.001 256 50 Adam 0.001 1024 50 ABL-conf Adam 0.001 256 50 Adam 0.001 1024 50 Deep Prob Log Adam 0.001 128 50 Adam 0.001 32 50 Deep Stoch Log Adam 0.001 256 50 Adam 0.003 32 50 Neur ASP Adam 0.001 1000 50 - - - - NGS - - - - Adam 0.0005 64 50

B.1. Details of Datasets

For convenience, the information of datasets can be seen in Table 4.

Construction of Digit Addition The construction of the digit addition task is based on a ten-class dataset, such as MNIST. To construct a digit addition input x, we randomly select 2 n images from the dataset, where n represents the digit-size. The summation y of this input can be computed using digit addition rules. The size of the constructed equations remains constant across all digit-sizes. To ensure this consistency, each image is sampled n times. Consequently, the size of the constructed equations is invariably fixed to N/2, where N denotes the total number of images in the original dataset.

Construction of Handwritten Formula Recognition The development of the handwritten formula recognition task leverages the foundational implementation outlined by Li et al. (2020). In our experiments, we replace the corresponding digit with a randomly sampled image from a specific class within either the CIFAR or SVHN dataset. To ensure consistent training, for both CIFAR and SVHN datasets, we transform the original operator images from a dimension of 45 45 to 3 32 32.

Table 4. Dataset details for Handwritten Formula Recognition and Digit Addition tasks.

Task #instances of a sequence #sequence of training set #sequence of test set

Handwritten Formula Recognition 5 or 7 8000 1600 Digit Addition, n=1 2 30000 10000 Digit Addition, n=2 4 30000 10000 Digit Addition, n=3 6 30000 10000 Digit Addition, n=4 8 30000 10000

B.2. Implementation of A3BL

The pseudo-code of A3BL can be referred in Algorithm 1.

Abduction Process The official implementation of the ABL package implements the abduction process through zerothorder optimization, abduction search, or by pre-building the knowledge base, that is, storing all possible candidates concerning the potential target label y. A3BL adopts the same abduction process, which is built upon their implementations.

Ambiguity-Aware Abductive Learning

Algorithm 1 A3BL Algorithm

Require: Knowledge base KB, Perception model f, dataset ( ˆ X m, ˆY). 1: for each pair (x(i), yi) in ( ˆ X m, ˆY) do 2: ˆz(i) f(x(i)) 3: ˆy(i) KB.logical forward(ˆz(i)) 4: if y(i) = ˆy(i) then 5: s(i) KB.abduce(y(i), ˆzi) # Abduce all candidates that valid. 6: end if 7: loss 1

c s(i) pθ(c|x(i), s(i)) L(f(x(i)), c). # Calculate the ambiguity-aware abductive risk. 8: Update the perception model f 9: end for

Candidate Confidence For the computation of Equation (7), the equation can be reformulated as follows:

p(c|x, s) = p(c, x, s)

p(x, s) = p(c, s)

p(s, x) = p(c|x)

p(s|x). (51)

The validity of the second equation stems from the capability to generate the entire candidate set from a singular candidate by leveraging the knowledge base. Consider the case SUM(

), where a candidate is given as

= 1. Although this candidate may not be correct, it demonstrates that the SUM(

) = 3, thus can derive the entire candidate set.

Consequently, a parameterized model, exemplified by f, is utilized to estimate the final term p(c|x)/p(s|x) = pθ(c|x)/pθ(s|x). The denominator term pθ(s|x) is calculated using the equation pθ(s|x) = P c s pθ(c|x). The term pθ(c|x) is computed utilizing a neural network. Subsequently, all candidate confidences are normalized using a softmax function with temperature adjustment. In practice, the temperature is set to 0.3 without careful tuning .

Challenge: The Expanding Size of the Candidate Set A significant challenge inherent in our method is the expansion of the candidate set size with increasing background knowledge complexity, as illustrated in Figure 5. For instance, in the digit addition task, increasing the number of digits from one to two causes the average size of the candidate set to grow from approximately five to nearly fifty, potentially overburdening the learning system.

HWF Digit Addition 1 Digit Addition 2 Digit Addition 3 Digit Addition 4

Avg: 500.25

Avg: 5000.25

Figure 5. Variation in the size of abduction candidates across different tasks. This reveals two key observations: (a) As the size of the digits increases, the size of the abduction candidate set grows exponentially; (b) The size of the abduction candidate set for the HWF task varies from 1 to approximately 104, yet the average size remains relatively small.

Ambiguity-Aware Abductive Learning

Speed Up In Equation (10), different weights are assigned to various candidates based on their confidence levels. However, a significant number of candidates exhibit low confidence, resulting in minimal contribution to the optimization objective. To speed up, our objective is to balance between the consistency of risk and the efficiency of the training process. Consequently, a top-k selection, based on confidence, is introduced within the candidate set to facilitate algorithm implementation. While further tuning of this hyper-parameter could be beneficial to balance efficiency and performance, in this study, we chose k = 32 for all experiments.

Sensitivity Analysis about k To balance risk consistency and computational efficiency, a top-k selection, based on confidence, is introduced within the candidate set to facilitate algorithm implementation. Intuitively, as k increases, the derived risk tends to be unbiased, potentially leading to faster or more stable algorithm convergence.

0 100 200 300 400

Addition-CIFAR n=3

0 100 200 300 400

Addition-CIFAR n=4

A3BL topk=2

A3BL topk=4

A3BL topk=8

A3BL topk=16

A3BL topk=32

A3BL topk=64

Figure 6. Abduction process: x-axis refers the steps, y-axis refers the abduction accuraies. The abduction accuracy means whether the candidate with the highest weight is indeed the correct one.

0 2 4 6 8 10 12

Addition-CIFAR n=3

0 2 4 6 8 10 12

Addition-CIFAR n=4

A3BL topk=2

A3BL topk=4

A3BL topk=8

A3BL topk=16

A3BL topk=32

A3BL topk=64

Figure 7. Test accuracies: x-axis refers the epoches, y-axis refers the (concept) test accuraies.

As shown in the results in Figure 6 and Figure 7. For a small k, the model behaves similarly to a naive ABL approach, exhibiting high variance. Increasing k to a moderate value, such as 8, stabilizes performance and speeds up convergence. Further increasing k beyond a certain threshold does not significantly improve performance, indicating a saturation point.

Ambiguity-Aware Abductive Learning

B.3. Implementation of Compared Methods

The implementations of the compared methods are all based on their official implementations. To ensure a fair comparison, we may modify certain backbones of the machine learning models to maintain consistency across all methods.

Specifically:

The implementation of ABL-hamming and ABL-conf is based on their official package. 4

The implementation of NGS is based on their official package.5

The implementation of Deep Stoch Log is based on their official package.6

The implementation of Deep Prob Log is based on their official package.7

The implementation of Neur ASP is based on their official package.8

Some methods, such as Deep Stoch Log, Deep Prob Log, and Neur ASP, require the writing of Prolog-like knowledge bases, which are listed below.

digit(Y) :- member(Y,[0,1,2,3,4,5,6,7,8,9]). nn(number, [X], Y, digit) :: is_number(Y) --> [X]. addition(N) --> is_number(N1), is_number(N2), {N is N1 + N2}. multi_addition(N, 1) --> addition(N). multi_addition(N, L) --> {L > 1, L2 is L - 1}, addition(N1), multi_addition(N2, L2), {N is N1*(10**L2) + N2}.

Listing 1. Digit Addition knowledge base for Deep Stoch Log

dom_number(X) :- member(X, [0,1,2,3,4,5,6,7,8,9]). nn(number, [X], Y, dom_number) :: is_number(Y) --> [X].

dom_operator(X) :- member(X, [plus, minus, times, div]). nn(operator, [X], Y, dom_operator) :: operator(Y) --> [X]. factor(N) --> is_number(N).

0.34 :: term(N) --> factor(N). 0.33 :: term(N) --> term(N1), operator(times), factor(N2), {N is N1 * N2}. 0.33 :: term(N) --> term(N1), operator(div), factor(N2), {N2>0, N is N1 / N2}.

0.34 :: expression(N) --> term(N). 0.33 :: expression(N) --> expression(N1), operator(plus), term(N2), {N is N1 + N2}. 0.33 :: expression(N) --> expression(N1), operator(minus), term(N2), {N is N1 - N2}.

Listing 2. Handwritten Formula Recognition knowledge base for Deep Stoch Log

4https://github.com/Abductive Learning/ABLkit/tree/Dev 5https://github.com/liqing-ustc/NGS 6https://github.com/ML-KULeuven/deepstochlog/tree/main/examples 7https://github.com/ML-KULeuven/deepproblog/tree/master/src/deepproblog/examples 8https://github.com/azreasoners/Neur ASP/tree/master/examples

Ambiguity-Aware Abductive Learning

nn(mnist_net,[X],Y,[0,1,2,3,4,5,6,7,8,9]) :: digit(X,Y).

number([],Result,Result). number([H|T],Acc,Result) :- digit(H,Nr), Acc2 is Nr+10*Acc,number(T,Acc2,Result). number(X,Y) :- number(X,0,Y).

multi_addition(X,Y,Z) :- number(X,X2),number(Y,Y2), Z is X2+Y2. addition(X,Y,Z) :- digit(X,X2), digit(Y,Y2), Z is X2+Y2.

Listing 3. Digit Addition knowledge base for Deep Prob Log

nn(net1,[X],Y,[0,1,2,3,4,5,6,7,8,9]) :: detect_number(X,Y). nn(net2,[X],Y,[+,-,*,/]) :: detect_operator(X,Y).

detect_all([N],[N2]) :- detect_number(N,N2). detect_all([N,O|T],[N2,O2|T2]) :- detect_number(N,N2), detect_operator(O,O2), detect_all(T,T2).

almost_equal(X,Y) :- ground(Y),abs(X-Y) < 0.0001. almost_equal(X,Y) :- var(Y), Y is float(X).

expression(Images,Result) :- detect_all(Images,Symbols),parse(Symbols,Result).

parse([N],R) :-almost_equal(N,R).

parse([N1,+|T], R) :- parse(T,R2), almost_equal(N1+R2,R).

parse([N1,-|T], R) :- parse([-1,*|T],R2), almost_equal(N1+R2,R).

parse([N1,*,N2|T], R) :- N3 is N1*N2, parse([N3|T],R).

parse([N1,/,N2|T], R) :- N2 \== 0, N3 is N1/N2, parse([N3|T],R).

Listing 4. Handwritten Formula Recognition knowledge base for Deep Prob Log

img(i1). img(i2). addition(A,B,N) :- digit(0,A,N1), digit(0,B,N2), N=N1+N2. nn(digit(1,X), [0,1,2,3,4,5,6,7,8,9]) :- img(X).

Listing 5. Digit Addition knowledge base for Neur ASP

Ambiguity-Aware Abductive Learning

C. More Experiments

This section includes additional experiments not covered in the main text due to space limitations, as outlined below:

(a) Digit Addition The additional experimental results concerning Addition-MNIST and Addition-KMNIST are presented in Figure 8 and Table 5, respectively. The conclusion remains the same as for Addition-CIFAR and Addition-SVHN: ABL suffers from ambiguity in the abduction candidates, while A3BL can effectively utilize the abduction candidates. It is also worth noting that Deep Stoch Log performs well in these settings. However, A3BL converges faster and exhibits a smoother learning curve.

Table 5. Test accuracies (mean standard deviation) on digit addition task with different perception dataset. Each experiment was conducted five times. The best performance is denoted in boldface. N/A indicates that the method failed to complete a single epoch within 48 hours.

Addition-MNIST Addition-KMNIST

Method 1 2 3 4 1 2 3 4

Deep Stoch Log (Winters et al., 2022) 99.00 0.13 98.88 0.02 98.94 0.15 98.98 0.13 94.00 0.73 94.06 0.30 94.00 0.29 93.93 0.49 Neur ASP (Yang et al., 2020) 50.01 0.44 10.21 0.39 10.21 0.39 N/A 10.01 0.58 10.05 0.51 N/A N/A Deepproblog (Manhaeve et al., 2021) 97.42 0.29 N/A N/A N/A 80.54 0.21 N/A N/A N/A ABL-hamming (Dai et al., 2019) 98.49 0.22 98.65 0.08 98.63 0.12 60.41 40.27 91.48 0.54 92.43 0.65 63.51 36.91 47.74 37.13 ABL-conf 79.31 41.35 79.32 41.61 77.10 43.53 79.22 41.65 86.32 25.40 79.80 34.91 72.76 38.75 54.84 42.97

A3BL (Ours) 98.92 0.15 99.03 0.09 98.68 0.17 98.42 0.08 94.84 0.25 94.31 0.21 93.16 1.00 92.97 0.44

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog Neur ASP

(a) Addition-KMNIST

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

0 10 20 30 40 50

A3BL ABL-hamming ABL-conf Deepstochlog Deepproblog Neur ASP

(b) Addition-MNIST

Figure 8. Performance curves (accuracy vs. epochs) of compared methods on digit addition tasks. The shaded area represents the standard error of the methods over 5 repetitions. Cross marks denote experiments that were not finished within 48 hours.

Ambiguity-Aware Abductive Learning

(b) Handwritten Formula Recognition The test accuracy results are presented in Table 6. Both Deep Prob Log and variants of NGS underperform in this setting, while Deep Stoch Log performs well in HWF, it falls behind ABL-conf and A3BL in the HWF-CIFAR and HWF-SVHN settings. It is important to note that the average size of the candidate set is relatively small, approximately 39, as shown in Figure 5. Therefore, ABL-conf demonstrates strong performance in this task, but it still converge slower than A3BL.

Table 6. Test accuracies (mean standard deviation) on handwritten formula recognition tasks with different perception dataset. Each experiment was conducted five times. The best performance is denoted in boldface. N/A indicates that the method failed to complete a single epoch within 48 hours.

Method HWF-CIFAR HWF-SVHN HWF

NGS-BS (Li et al., 2020) 28.25 0.90 26.91 0.81 31.65 4.40 NGS-RL (Li et al., 2020) 16.45 1.62 23.38 3.31 16.92 0.30 NGS-MAPO (Li et al., 2020) 20.99 6.83 24.23 2.03 16.89 0.26 Deep Stoch Log (Winters et al., 2022) 75.41 2.21 75.29 23.81 99.65 0.06 Deepproblog (Manhaeve et al., 2021) N/A N/A N/A ABL-hamming (Dai et al., 2019) 41.73 2.47 37.18 6.57 99.40 0.10 ABL-conf 78.47 0.69 91.21 0.34 99.47 0.17

A3BL (Ours) 78.92 0.50 91.83 0.55 99.41 0.12

(c) Worst Case Analysis A worst-case analysis of ABL-hamming is depicted in Figure 9, focusing on the Addition-CIFAR task with a digit size of two. The learning curve of A3BL demonstrates consistent improvement in both reasoning and conceptual accuracy over time. However, ABL-hamming does not exhibit this trend. ABL-hamming struggles, displaying significant instability in its training process, particularly in the reasoning accuracy curve. Furthermore, the confusion matrix of ABL-hamming suggests that the model s predictions are nearly equivalent to random guessing. This suggests that the ambiguity inherent in the abduction results poses challenges for the machine learning model.

0 10 20 30 40 50

Reasoning Accuracy Concept Accuracy

0.88 0.01 0.02 0.00 0.02 0.00 0.00 0.01 0.01 0.04

0.01 0.74 0.01 0.09 0.01 0.04 0.03 0.03 0.04 0.00

0.03 0.01 0.90 0.00 0.03 0.01 0.00 0.00 0.00 0.01

0.01 0.11 0.00 0.77 0.01 0.03 0.03 0.04 0.02 0.00

0.02 0.01 0.03 0.00 0.86 0.03 0.01 0.01 0.01 0.01

0.01 0.05 0.01 0.03 0.04 0.78 0.04 0.02 0.03 0.00

0.00 0.04 0.01 0.02 0.01 0.04 0.80 0.04 0.03 0.00

0.01 0.02 0.00 0.03 0.01 0.01 0.03 0.88 0.01 0.00

0.00 0.04 0.00 0.02 0.00 0.02 0.02 0.01 0.88 0.00

0.05 0.01 0.01 0.00 0.03 0.00 0.00 0.00 0.01 0.88

0 10 20 30 40 50 0.0

Reasoning Accuracy Concept Accuracy

0.00 0.56 0.00 0.05 0.00 0.39 0.00 0.00 0.00 0.00

0.00 0.79 0.00 0.04 0.00 0.17 0.00 0.00 0.00 0.00

0.00 0.71 0.00 0.04 0.00 0.25 0.00 0.00 0.00 0.00

0.00 0.30 0.00 0.48 0.00 0.21 0.00 0.00 0.00 0.00

0.00 0.25 0.00 0.15 0.00 0.60 0.00 0.00 0.00 0.00

0.00 0.21 0.00 0.13 0.00 0.65 0.00 0.00 0.00 0.00

0.00 0.22 0.00 0.49 0.00 0.28 0.00 0.00 0.00 0.00

0.00 0.76 0.00 0.08 0.00 0.16 0.00 0.00 0.00 0.00

0.00 0.71 0.00 0.05 0.00 0.24 0.00 0.00 0.00 0.00

0.00 0.76 0.00 0.04 0.00 0.20 0.00 0.00 0.00 0.00

(b) ABL-hamming

Figure 9. Worst case performance tendency comparison between A3BL and ABL-hamming.

Ambiguity-Aware Abductive Learning

0 10 20 30 40 0.0

0 10 20 30 40

0 10 20 30 40

A3BL ABL-hamming ABL-conf

Figure 10. Abduction process (abduction accuracies vs. steps) of A3BL and ABL on handwritten formula recognition tasks. The shaded area represents the standard error of the methods over 5 repetitions.

(d) Abduction Accuracies Figure 10 illustrates the abduction processes employed by A3BL, ABL-hamming, and ABLconf on handwritten formula recognition tasks. The process is conducted over 40 steps, corresponding to 10 epochs. ABL-hamming was unable to identify the correct candidate, similarly, ABL-conf faced challenges in this regard. However, A3BL effectively distinguishes the correct candidate among abduction candidates through dynamic weight assignment to all potential candidates. Within 10 epochs, A3BL attains high abduction accuracy, providing clear supervision to the machine learning model.