
Abductive Learning with Ground Knowledge Base

Le-Wen Cai¹, Wang-Zhou Dai², Yu-Xuan Huang¹, Yu-Feng Li¹, Stephen Muggleton², Yuan Jiang¹

¹ National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
² Department of Computing, Imperial College London, London SW7 2AZ, UK
¹ {cailw, huangyx, liyf, jiangy}@lamda.nju.edu.cn
² {w.dai, s.muggleton}@imperial.ac.uk

Abductive Learning is a framework that combines machine learning with first-order logical reasoning. It allows machine learning models to exploit complex symbolic domain knowledge represented by first-order logic rules. However, in many applications it is challenging to obtain or explicitly express the ground-truth domain knowledge as first-order logic rules. The only accessible knowledge base is implicitly represented by groundings, i.e., propositions or atomic formulas without variables. This paper proposes Grounded Abductive Learning (GABL) to enhance machine learning models with abductive reasoning in a ground domain knowledge base, which offers inexact supervision through a set of logic propositions. We apply GABL to two weakly supervised learning problems and find that the model's initial accuracy plays a crucial role in learning. Results on a real-world OCR task show that GABL requires significantly less data labeling effort than the compared methods.

1 Introduction

To address current limitations of data-driven machine learning, the next generation of Artificial Intelligence calls for a strong integration of machine learning with knowledge-driven reasoning such as logic inference [Bengio, 2017]. Recent years have witnessed rapid growth in this area; representative progress includes Neuro-Symbolic Learning (NeSy) [Garcez et al., 2019] and Statistical Relational AI (StarAI) [Getoor and Taskar, 2007; Raedt et al., 2016]. However, most of these approaches try to build an end-to-end learning pipeline by subsuming logical calculus into differentiable modules in deep learning or statistical inference, in which first-order logical formulas serve as the basic relational topology for belief propagation and message passing. Abductive Learning (ABL) [Zhou, 2019; Dai et al., 2019] is a novel framework for combining machine learning with

Our work is supported by the National Key Research and Development Program of China No.2020AAA0109400 and the National Natural Science Foundation of China (61772262). Yuan Jiang is the corresponding author.

Figure 1: Example of the OCR Dictionary

pure first-order logical reasoning in a mutually beneficial way. In ABL, the machine learning model learns to convert raw data into primitive logic facts that serve as input to symbolic reasoning, while logical reasoning infers the truth values of these facts, which are called pseudo-labels, for training the machine learning model. The integration of the two systems is realized by abduction, i.e., abductive reasoning, which can selectively infer particular predicted facts based on existing background knowledge [Magnani, 2009]. Therefore, ABL allows machine learning to utilize complex domain knowledge such as first-order logic theories [Dai et al., 2019; Huang et al., 2020]. Nevertheless, in many real-world applications, accessible knowledge bases consist only of a finite number of groundings (i.e., propositions or atomic logical formulas without variables). For example, in Optical Character Recognition (OCR) tasks, it is difficult to explicitly represent the underlying structure of words and characters with first-order theories, while the set of correct spellings can easily be obtained from a dictionary. As shown in Figure 1, the OCR dictionary can be represented as a ground knowledge base consisting of ground atoms of a predicate valid_word(Y), such as valid_word([h, a, v, e]). For StarAI and NeSy systems, the lack of first-order logic formulas means there is no relational structure to establish the paths for belief propagation and message passing in probabilistic reasoning; for abduction-based approaches, the lack of logic clauses makes logical abduction impossible. A possible workaround is to formulate this type of problem as multi-class learning [Li et al., 2018], in which each ground atom (proposition) corresponds to a category. However, the lack of instances for each category brings class-imbalance issues [Japkowicz and Stephen, 2002], which make machine learning even harder.
Moreover, treating the groundings as independent categories neglects the implicit relational structure

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

among them (e.g., in English, the probability of 's' followed by 'e' is much higher than that of 's' followed by 'z'). This paper presents Grounded Abductive Learning (GABL) to address the above deficiency. GABL allows machine learning models to exploit a ground domain knowledge base within a first-order logic context. Abduction in GABL is accomplished by augmenting the ground knowledge base with a default abductive logic program, which contains some general assumptions for abducing the pseudo-labels. For example, in OCR tasks, given an incorrect recognition result from an under-trained machine learning model, the augmented abductive logic program could be "find the word in the dictionary with the highest recognition confidence". The strong expressive power of first-order logic allows GABL to exploit various complex assumptions in different applications. GABL also provides a way to study abduction in ABL, because the abductive reasoning in GABL is explicitly grounded. From the empirical study on a synthetic dataset, we find that the initial accuracy of the machine learning model is crucial for GABL. When the model's prediction accuracy is higher than a certain threshold, GABL can improve model performance with unlabeled data. Furthermore, we verify the performance of GABL on a real-world weakly supervised OCR task. Results show that GABL can use unlabeled data and a ground knowledge base to improve model performance and significantly decrease data labeling effort.

2 Related Work

In recent years, many approaches have been proposed to deal with the lack of labeled data in machine learning. Semi-supervised learning is a powerful technique that attempts to exploit unlabeled data to improve model performance without human intervention. One category of semi-supervised learning methods related to this work is the proxy-label methods, which leverage a pre-trained model to produce pseudo-labels for unlabeled data based on some heuristics. Representative examples are Self-training [Yarowsky, 1995] and Tri-training [Zhou and Li, 2010]. Self-training predicts the labels of input data and then uses the predicted examples whose probability exceeds a pre-defined threshold, or the top N most confident predictions, to retrain the model. Tri-training is a disagreement-based ensemble method; it uses diverse models to vote for the pseudo-labels used to retrain the model. This work is also related to multi-class learning. In multi-class classification, the model is required to classify instances into one of many categories. Error-Correcting Output Codes (ECOC) [Dietterich and Bakiri, 1994] is an ensemble method that transforms a multi-class classification task into multi-label learning by encoding each class with an error-correcting code, which introduces sub-labels and can model a certain degree of label correlation. The performance of ECOC depends heavily on the label encoding, which is difficult to construct without domain knowledge. Meanwhile, there are few studies of ECOC under the semi-supervised setting. Some methods consider using symbolic domain knowledge to help model training. Neuro-Symbolic Learning (NeSy) [Garcez et al., 2019] aims at combining machine

learning with symbolic reasoning. It tries to integrate the ability to learn from the environment (for perception and pattern recognition) with the ability to reason from what has been learned (for reasoning and explanation). In most NeSy systems, learning and reasoning are both realized by a neural network, in which the external domain knowledge is used for building an explainable neural structure. Statistical Relational Learning (SRL) [Getoor and Taskar, 2007; Raedt et al., 2016] shares the same motivation as NeSy, but it attempts to use domain knowledge to construct or initialize a probabilistic graphical model structure for statistical inference. Abductive Learning (ABL) [Dai et al., 2019; Zhou, 2019] is a framework that combines machine learning with pure first-order logical reasoning in a mutually beneficial way. ABL focuses on using first-order logic rules to revise model-predicted labels and using these revised labels to train machine learning models. However, in order to perform abduction, a first-order abductive logic theory is required. Unlike previous approaches, GABL exploits both unlabeled data and a ground knowledge base to improve model performance. It differs from NeSy, SRL, and ABL, which require first-order logic rules as domain knowledge.

3 Grounded Abductive Learning

This section presents the problem setting and the Grounded Abductive Learning (GABL) approach.

3.1 Problem Setting

The main target of this paper is to improve a pre-trained model, whose labeled training data are unavailable, using a set of unlabeled instances together with a ground knowledge base (GKB) that constrains the model's output space. Formally, the input of the task contains a set of unlabeled training data $D = \{x_1, \dots, x_m\}$ and a ground knowledge base $GKB$. Each $x_i \in \mathcal{X}$ corresponds to an unknown label $y_i \in \mathcal{Y}$, where $\mathcal{X}$ is the feature space and $\mathcal{Y}$ is the label space. $GKB \subseteq \mathcal{Y}$ is a subset of the label space. In this paper, we consider classification problems, so $\mathcal{Y}$ is discrete and symbolic, i.e., each point $y_i \in \mathcal{Y}$ can be considered as a ground atom or a proposition in the Herbrand universe. For example, for the OCR task in Figure 1, $x$ is an image of a hand-written word and $y$ is a string composed of the 26 English letters. As we can see, $\mathcal{Y}$ could be infinitely large (e.g., there could be an infinite number of possible strings). $GKB$ is the set of ground atoms that lists all valid candidates for $y$, e.g., the dictionary of correct spellings in English. Hence, for each $x_i \in D$ with a corresponding $y_i$, we have $y_i \in GKB \subseteq \mathcal{Y}$. From the perspective of first-order logic, $GKB$ is the answer set [Lifschitz, 2008] of an unknown first-order logic theory. We denote the pre-trained machine learning model to be improved as $M : \mathcal{X} \mapsto \mathcal{Y}$. Given an input $x_i \in D$, this model outputs $\hat{y}_i = M(x_i)$ as its prediction.
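As a minimal sketch of this setting, assuming labels are plain strings and the GKB is stored as a Python set: the toy dictionary and the helper name `inconsistent_predictions` are illustrative assumptions, not part of the paper.

```python
from typing import Callable, FrozenSet, List

# Hypothetical toy dictionary standing in for the ground knowledge base.
GKB: FrozenSet[str] = frozenset({"have", "hive", "gave", "love"})


def valid_word(y: str, gkb: FrozenSet[str] = GKB) -> bool:
    """Ground predicate valid_word(Y): true iff Y is listed in the GKB."""
    return y in gkb


def inconsistent_predictions(model: Callable[[str], str], data: List[str]) -> List[str]:
    """Return the inputs whose prediction M(x) falls outside the GKB."""
    return [x for x in data if not valid_word(model(x))]
```

The predictions flagged by `inconsistent_predictions` are exactly the ones that abduction would have to revise.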

3.2 Abductive Learning

In Abductive Learning (ABL), the machine learning model learns to convert raw data into primitive logic facts, which are regarded as pseudo-labels $\hat{y}$ for logical reasoning. Meanwhile, abduction can selectively infer particular predicted


facts based on first-order logic rules. The inferred facts, which are regarded as abduced labels $\bar{y}$, are then used like ground-truth labels for training the machine learning model. Abduction [Josephson and Josephson, 1996] is a basic form of logical inference that seeks the best explanation for observations based on implication. For example, given the formula wet_ground ← rain (rain causes wet ground), when we observe that the ground is wet, we can guess that it has rained.
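The wet-ground example can be sketched as a lookup over rule bodies. This is only a toy illustration under stated assumptions: the rule table, the extra `sprinkler_on` explanation, and the function name are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set

# Hypothetical rule base: each head maps to alternative bodies (conjunctions of atoms).
RULES: Dict[str, List[Set[str]]] = {
    "wet_ground": [{"rain"}, {"sprinkler_on"}],
}


def abduce_explanations(observation: str, rules: Dict[str, List[Set[str]]]) -> List[Set[str]]:
    """Return every rule body that would imply the observation."""
    return [set(body) for body in rules.get(observation, [])]
```

Each returned set is a candidate explanation; choosing the best one requires an additional preference criterion, which is the role confidence and distance play later in GABL.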

Challenge It is challenging for GABL to realize abduction based on a ground knowledge base without any first-order logic rules. The ground knowledge base only offers predicates such as valid_word(Y) to judge whether Y belongs to the knowledge base, whereas abduction requires first-order logic rules to reason with. Thus, abduction cannot be performed directly when only a ground knowledge base is available.

3.3 Implementation of GABL

In order to perform abduction in a ground knowledge base, we propose augmenting the GKB with an abductive logic program that contains some very general assumptions, which constrain the search for $\bar{y}$, the revised pseudo-labels for the $\hat{y}$ predicted by $M$. Considering the motivating OCR task, which maps images to strings, we treat each label $y_i$ as a sequence $[y_{i,1}, \dots, y_{i,L_i}]$, where $L_i$ is the length of the sequence and $y_{i,l}$ is the $l$-th sub-label of $y_i$. For this type of problem, we can include a general abductive program that uses the GKB to constrain the search for pseudo-labels by string distance (e.g., edit distance). The program can be represented in first-order logic by the following definite clause:

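$$
\mathrm{program}(\hat{y}, \bar{y}, GKB) \leftarrow \mathrm{between}(1, m, D) \wedge \mathrm{distance}(\hat{y}, \bar{y}, D) \wedge \bar{y} \in GKB \wedge \mathrm{confidence}(\bar{y}, C) \wedge C \geq \mathrm{threshold}. \tag{1}
$$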

Here $m$ constrains the maximum allowed distance between model $M$'s predicted label $\hat{y}$ and the abduced label $\bar{y}$; threshold is used to exclude results with low confidence; $C$ is the confidence of $\bar{y}$ as calculated by $M$. According to this first-order logic rule, GABL can automatically find a $\bar{y}$ that is close enough to $\hat{y}$ and has high confidence. When abduction returns more than one solution, GABL can include another rule to pick the most confident one. In fact, distance($\hat{y}$, $\bar{y}$, $D$) and confidence($\bar{y}$, $C$) can be combined into a single measure of how similar the model's prediction is to a grounding. We use the model's training loss function to represent this similarity, because the loss is carefully designed for the learning task. In other words, when abduction considers confidence, we select abduced labels directly based on the loss function; otherwise, we select abduced labels based only on the distance. The Grounded Abductive Learning algorithm is described in Algorithm 1. GABL repeats for E epochs, and in every

Algorithm 1 Grounded Abductive Learning
Input: Unlabeled dataset D_u, pre-trained model M, ground knowledge base GKB, augmented abductive logic program P
Parameter: Number of epochs E
Output: Fine-tuned model M

    for e = 1 to E do
        D = []
        for x in D_u do
            r̂ = M(x)
            ȳ = abduce(r̂, GKB, P)
            if ȳ is not None then
                D.append((x, ȳ))
            end if
        end for
        Update model M via D
    end for
    return M

epoch, GABL uses model $M$ to generate the result $\hat{r}$ (labels with confidence) for input data $x$. GABL then selects abduced labels based on Eq. (1). When an abduced label $\bar{y}$ exists, GABL accepts it as training data and puts it into $D$. At the end of the epoch, we use the training set $D$ to update model $M$.
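The abduction rule and the training loop described above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the model interface (`predict`, `confidence`, `update`) is hypothetical, edit distance is used as the string distance, and, unlike Eq. (1), distance 0 is allowed so that predictions already in the GKB are kept.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def abduce(y_hat: str, gkb: Iterable[str], confidence: Callable[[str], float],
           m: int, threshold: float) -> Optional[str]:
    """Most confident GKB entry within distance m of y_hat, or None.

    Assumption: distance 0 is accepted, whereas Eq. (1) starts at distance 1.
    """
    scored = [(confidence(y), y) for y in gkb if edit_distance(y_hat, y) <= m]
    scored = [(c, y) for c, y in scored if c >= threshold]
    return max(scored)[1] if scored else None


def gabl(model, unlabeled: List[str], gkb: Iterable[str],
         m: int, threshold: float, epochs: int):
    """Sketch of Algorithm 1 for a model exposing predict/confidence/update."""
    gkb = list(gkb)
    for _ in range(epochs):
        train: List[Tuple[str, str]] = []
        for x in unlabeled:
            y_hat = model.predict(x)
            y_bar = abduce(y_hat, gkb, lambda y, x=x: model.confidence(x, y),
                           m, threshold)
            if y_bar is not None:       # reject samples with no valid abduction
                train.append((x, y_bar))
        model.update(train)             # fine-tune on the abduced labels
    return model
```

Scanning the whole GKB in `abduce` is linear in its size; for a large dictionary, an index over the distance metric would be the natural replacement.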

4 Empirical Study

This section discusses why GABL can improve machine learning model performance by leveraging unlabeled data and a ground knowledge base. First, we illustrate the mechanism of GABL through an intuitive example. Second, we conduct experiments to address two questions: 1) how model accuracy affects abductive learning given domain knowledge; 2) how domain knowledge affects abductive learning.

4.1 Mechanism of GABL

Intuitively, GABL is similar to the classical self-training method [Yarowsky, 1995] for semi-supervised learning, which is a pseudo-label based method that uses model-predicted labels for further model training. In addition, GABL can exclude invalid pseudo-labels and even correct inaccurate pseudo-sub-labels, which can make it more efficient at exploiting unlabeled data than self-training methods. For input data $x$, when the model-predicted label $\hat{y}$ is inconsistent with the ground knowledge base, abduction needs to revise $\hat{y}$ into a $\bar{y}$ that belongs to the ground knowledge base. It assumes that only a few parts of the predicted label are incorrect, so the pseudo-label $\hat{y}$ may be a distorted version of a neighboring label $y$ that belongs to the GKB. Conversely, each $\hat{y}$ in the neighborhood of $y$ should be revised to $y$. Figure 2 illustrates the mechanism of abduction in GABL. As shown in Figure 2, every $y$ in the GKB covers a part of the space, like a Voronoi diagram [Edelsbrunner and Seidel, 1986], under some distance measure (Hamming distance or another distance). Because of some disturbance, the predicted results drift from these points into their neighborhood, e.g., a ball surrounding each point. The radius of the balls depends on


Figure 2: The label space Y divided by neighborhoods of groundings in abduction. The red points in the centers of the circles are groundings in the GKB. The space is divided into four parts according to the distance measure in the label space Y. The predicted results (the yellow points) appear in the blue circles with high probability. For example, an input whose ground-truth label is point A is predicted as A', and we can use abduction to re-annotate point A' as A and fine-tune the model. However, when two points in the GKB are too close, or the model prediction error is too large, the abduced result can be wrong. For example, an input whose ground-truth label is point C is predicted as C' and wrongly abduced as point D, because C' is closer to D than to C.

the accuracy of the pre-trained model; higher accuracy leads to a smaller radius. When the model performance is ideal, almost all pseudo-labels fall into the region covered by their ground truth and can be classified correctly. When the disturbance becomes more prominent, the radius of the circles grows, and more predicted results are wrongly classified, which may damage model performance.

We can use uncertainty to explain the above phenomenon. According to information theory [Cover, 1999], the uncertainty about the ground-truth label given the pseudo-label can be decomposed into the entropy of the ground-truth label and the mutual information between the pseudo-label and the ground-truth label, i.e., $H(Y \mid \hat{Y}) = H(Y) - I(Y; \hat{Y})$. When the task is given, the uncertainty of the model prediction depends on the model's prediction accuracy. Therefore, the model's prediction accuracy plays a crucial role in abductive learning. Moreover, GABL is a multi-epoch method: model performance is boosted by repeatedly executing prediction and abduction, and the model's accuracy after each epoch of training depends on the accuracy of the previous epoch. Hence the initial accuracy of the model is crucial.

Under some assumptions, we find that an accuracy threshold exists when a domain knowledge base is given. First, we assume that after training, the generalization accuracy of the model is infinitely close to the accuracy of the training set. Second, we assume that higher model prediction accuracy leads to higher accuracy of the abduced result. Third, we assume that the prediction accuracy of all categories is almost the same. We denote the accuracy of the sub-labels predicted by the model in epoch $i$ as $\hat{p}^c_i$, and the accuracy of the sub-labels after abduction as $p^c_i$. In particular, $\hat{p}^c_0$ is the sub-label accuracy of the initial model's predictions, and $p^c_0$ is the sub-label accuracy after abduction on the initial model's predictions. By the first assumption, $\hat{p}^c_{i+1} \approx p^c_i$. This means that if $\hat{p}^c_0 < p^c_0$, then

$$
\hat{p}^c_0 < p^c_0 \approx \hat{p}^c_1 < p^c_1 \approx \dots \approx \hat{p}^c_E < p^c_E,
$$

where $E$ is the number of training epochs. But if $\hat{p}^c_0 > p^c_0$, then

$$
\hat{p}^c_0 > p^c_0 \approx \hat{p}^c_1 > p^c_1 \approx \dots \approx \hat{p}^c_E > p^c_E.
$$

Therefore, there is a threshold $p_{t1}$: when $\hat{p}^c_0 > p_{t1}$, GABL can help the model improve performance. At the same time, there is another threshold $p_{t2}$: when $\hat{p}^c_0 < p_{t2}$, GABL hurts model performance, and the accuracy gradually decreases. Therefore, there are accuracy thresholds $p_{t1}$ and $p_{t2}$, and we denote them as $p_t$ for convenience.

Figure 3: Sequence prediction setting. There are two settings when the length of the label is L. (a) Every sub-label has a unique machine learning model for classification; (b) All sub-labels share the same perception model.

Figure 4: Examples of GKBs.
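The threshold argument can be illustrated numerically for a single abduction step. The sketch below is not the paper's experiment (there is no retraining); it assumes a systematic Hamming(7,4) code as the GKB, sub-labels that are wrong independently with probability $1 - \hat{p}$, and abduction to the nearest codeword under Hamming distance without confidence. All function names are illustrative.

```python
from itertools import product
from typing import List, Tuple

Word = Tuple[int, ...]

# Generator matrix [I | P] of a systematic Hamming(7,4) code.
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def codewords() -> List[Word]:
    """All 16 codewords, used here as the ground knowledge base."""
    words = []
    for msg in product((0, 1), repeat=4):
        # Each output bit is the mod-2 dot product of msg with a column of G.
        words.append(tuple(sum(m * g for m, g in zip(msg, col)) % 2
                           for col in zip(*G)))
    return words


def hamming(a: Word, b: Word) -> int:
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


def abduced_accuracy(p_hat: float, gkb: List[Word]) -> float:
    """Exact sub-label accuracy after one nearest-codeword abduction step.

    Assumes the true label is the all-zero codeword, which loses no
    generality for a linear code, and enumerates all noisy predictions.
    """
    n = len(gkb[0])
    expected_correct = 0.0
    for noisy in product((0, 1), repeat=n):
        flips = sum(noisy)
        prob = p_hat ** (n - flips) * (1 - p_hat) ** flips
        nearest = min(gkb, key=lambda c: hamming(c, noisy))
        # Sub-labels of the abduced word that agree with the all-zero truth.
        expected_correct += prob * (n - sum(nearest)) / n
    return expected_correct
```

Under these assumptions, a high initial accuracy such as 0.95 yields a higher accuracy after abduction, while a low one such as 0.6 yields a lower accuracy, which matches the two chains of inequalities above. Scanning `p_hat` over a grid locates the crossover for a given GKB.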

4.2 Experiment on Synthetic Data

In this experiment, we verify whether there exists an initial accuracy threshold $p_t$ above which abductive learning with the augmented program in Eq. (1) improves the model, and we explore what affects the accuracy threshold $p_t$, based on synthesized datasets. The code is available for download¹.

Does the Accuracy Threshold Exist?

Dataset The dataset includes two parts: a ground knowledge base GKB represented by a set of groundings (as shown in Figure 4) and unlabeled training data $D_u = \{x_1, x_2, \dots, x_n\}$. The groundings are generated from different domain knowledge bases, which include Hamming codes of length 7 (experimental results denoted hamming-*) and decimal addition equations of length between 5 and 7 (experimental results denoted addition-*). Every unlabeled instance $x_i$ contains $L_i d$ features, where $L_i$ is the length of the label $y_i = [y_{i,1}, y_{i,2}, \dots, y_{i,L_i}] \in GKB$ and $d$ is the number of features that correspond to each sub-label $y_{i,k}$, namely $[x_{i,(k-1)d+1}, \dots, x_{i,kd}]$. These features are sampled from basic data (MNIST images [LeCun et al., 1995], CIFAR-10 images [Krizhevsky, 2009], or synthetic data as shown in Figure 5) according to the sub-label $y_{i,k}$. Images of plus and equal signs are additionally added to the MNIST and CIFAR-10 images.

Experiment Setting Mimicking the perception-and-reasoning pipeline of NeSy and ABL models, we use one

¹ https://github.com/AbductiveLearning/GABL


Figure 5: The distribution of synthetic data.

Figure 6: Model's trained accuracy w.r.t. the model's initial accuracy, with curves for DT, KNN, cifar, and mnist. Panels: (a) Hamming w/o confidence; (b) Hamming w/ confidence; (c) Equation w/o confidence; (d) Equation w/ confidence. (a) and (b) use Hamming codes as GKB. (c) and (d) use addition equations as GKB. (a) and (c) use only the classification results of sub-labels in abduction. (b) and (d) also consider the confidence generated by the model in abduction.

model for predicting the sub-labels and then feed them to the GKB for reasoning. Specifically, as shown in Figure 3(a), model $M$ converts $[x_{i,(k-1)d+1}, \dots, x_{i,kd}]$ into sub-label $\hat{y}_{i,k}$. Ideally, the final result $\hat{y}_i = [\hat{y}_{i,1}, \hat{y}_{i,2}, \dots, \hat{y}_{i,L}]$ belongs to the GKB. When the basic data are MNIST images (or CIFAR-10 images), we use a CNN as the perception model $M$ and denote the experimental results as mnist (or cifar). When the basic data are synthetic, we use KNN (or a decision tree) as the perception model $M$ and denote the experimental results as KNN (or DT). We control the model's initial accuracy by adding noise to the data or by controlling the amount of pre-training data. Abduction is tested both with and without model prediction confidence. When abduction does not consider confidence, GABL rejects samples with multiple solutions. Additionally, we set k = 3 in KNN and require each leaf node of the decision tree to contain at least three samples during training.
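As a rough sketch of how a ground knowledge base of addition equations might be enumerated: the string format `a+b=c`, the absence of leading zeros, and the function names are assumptions for illustration and may differ from the authors' released code.

```python
from typing import List


def addition_gkb(min_len: int = 5, max_len: int = 7) -> List[str]:
    """Enumerate true equations 'a+b=c' whose string length is in [min_len, max_len].

    A k-digit operand forces a result with at least k digits, so the string
    has length >= 2k + 3; this bounds the operands that need to be tried.
    """
    limit = 10 ** ((max_len - 3) // 2)
    equations = []
    for a in range(limit):
        for b in range(limit):
            eq = f"{a}+{b}={a + b}"
            if min_len <= len(eq) <= max_len:
                equations.append(eq)
    return equations


def sub_labels(label: str) -> List[str]:
    """Split a label into its sequence of sub-labels (one per symbol)."""
    return list(label)
```

Each sub-label would then be paired with a feature block sampled from the basic data of the matching class, for instance a digit image or a plus or equal sign image.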

Figure 7: Accuracy threshold w.r.t. the size of the ground knowledge base (GKB), with curves for L = 11, L = 13, and L = 15. Every accuracy threshold $p_t$ in the figure is the maximum value over 10 experiments in which the GKBs have the same label length $L$ and the same size $N$.

Experimental Results Figure 6 shows that an accuracy threshold exists in these experiments. The experiments are conducted with different GKBs and with models of different initial accuracy. As shown in Figure 6, there are boundaries (accuracy thresholds) in all experiments: when the model's initial prediction accuracy is high enough, GABL improves model performance using unlabeled data. It is worth noting that when abduction uses the model's prediction confidence, the accuracy threshold is lower than when abduction relies only on the model's classification results.

How Does Domain Knowledge Affect the Threshold?

Dataset & Setting We use random binary fixed-length codes as the GKB, whose size $N$ and code length $L$ can be controlled. The basic data are sampled from synthetic data. We use a decision tree as the base model and allow a leaf node to contain a single sample during training. Abduction does not consider model prediction confidence and only uses the model's classification results. As shown in Figure 3(b), when the code length of the GKB is $L$, we use $L$ classifiers as the perception model, and the $k$-th classifier is responsible for predicting the $k$-th sub-label. This avoids some degenerate situations. For example, when there is only one model for predicting all sub-labels and GKB = {'101', '100'}, the training process is almost supervised learning.

Experimental Results Figure 7 illustrates the relationship between the accuracy threshold $p_t$ and the experimental parameters (the length $L$ of the groundings and the size $N$ of the GKB). The results in Figure 7 show that when GKBs have the same size $N$, different label lengths lead to different thresholds $p_t$. Moreover, a longer code length (number of sub-labels) requires a lower threshold $p_t$. This is similar to information transmission, where longer codes are used to overcome noise in the channel.
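A random binary GKB of controllable size and code length could be generated as in the sketch below, together with the minimum pairwise Hamming distance as one simple proxy for how sparse the label space is. The function names, the seeding scheme, and the choice of sparsity proxy are illustrative assumptions, not the paper's code.

```python
import random
from itertools import combinations
from typing import List


def random_binary_gkb(n: int, length: int, seed: int = 0) -> List[str]:
    """Sample n distinct binary codes of the given length."""
    if n > 2 ** length:
        raise ValueError("cannot sample more distinct codes than exist")
    rng = random.Random(seed)
    values = rng.sample(range(2 ** length), n)
    return [format(v, f"0{length}b") for v in values]


def min_hamming_distance(gkb: List[str]) -> int:
    """Smallest pairwise Hamming distance; larger means a sparser label space."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(gkb, 2))
```

For the degenerate example GKB = {'101', '100'}, the two codes differ only in the last position, so `min_hamming_distance` returns 1.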

Summary In these experiments, we empirically verify that in Grounded Abductive Learning (GABL) there is a threshold $p_t$ such that, when the model accuracy is higher than $p_t$, GABL with the abductive program in Eq. (1) improves model performance. Furthermore, $p_t$ is related to the sparsity of the label space $\mathcal{Y}$, which is mainly influenced by the size $N$ of the GKB and the length $L$ of the label. In short, a sparser space offers more tolerance of model errors and requires lower pre-trained model accuracy for the task.

Figure 8: A handwriting example in OCR task.

5 Optical Character Recognition Experiments

This section describes an experiment that applies GABL to a handwritten Optical Character Recognition (OCR) task. The experiment's main objective is to verify whether GABL can be applied in real-world applications with noisy input data. OCR is an important real-world application. For example, many handwritten archival materials have not been transcribed into text, which makes them hard for non-experts to read and difficult to use for information retrieval. Therefore, it is meaningful to transcribe these handwritten documents into text. In practice, there are two kinds of handwriting recognition tasks: lexicon-free and lexicon-based. Lexicon-based handwriting recognition provides a lexicon for model inference, which means that the model should pick words from the lexicon.

Dataset We use the IAM-database [Marti and Bunke, 2002] as the test benchmark. The IAM-database contains 115,320 isolated word-level English handwriting images that are not pre-segmented. An example is shown in Figure 8. Moreover, this is the first time that abductive learning has been applied to tasks with unsegmented raw inputs. We only retain words whose length is greater than 3, because there are too many short words, which causes a long-tail problem that is beyond this article's scope. We split the dataset into labeled data, unlabeled data, and test data. We hold out 10% of the data for testing and randomly pick different amounts of data as labeled data. We collect all labels of the IAM-database images as the ground knowledge base (GKB).

Experimental Setting The setting in this experiment is like Figure 3(a), where the same model predicts every sub-label. We use CRNN [Shi et al., 2017] as the basic machine learning model. During prediction, the CRNN greedily selects the highest-probability letter at each position and then merges the repeated letters.
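The greedy decoding step can be sketched as follows. Dropping the blank symbol after merging repeats is standard CTC best-path decoding and is assumed here rather than stated in the text; the blank symbol `-`, the function name, and the toy alphabet are hypothetical.

```python
from typing import Sequence

BLANK = "-"  # hypothetical symbol for the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Pick the most likely symbol per frame, merge repeats, then drop blanks."""
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded = []
    prev = None
    for sym in best:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank; a blank between equal letters keeps both.
        if sym != prev and sym != BLANK:
            decoded.append(sym)
        prev = sym
    return "".join(decoded)
```

With frames whose best symbols are `a, a, -, a, b, b`, the decoder returns the string `aab`, since the blank separates the two runs of `a`.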
We use a Burkhard-Keller tree [Burkhard and Keller, 1973] (BK-tree) to select similar candidates in the GKB, with edit distance as the similarity measure between candidates in the BK-tree. Finally, we pick the abduced pseudo-labels ranked by the CTC loss [Graves et al., 2006]. We test our method in a semi-supervised setting: we first use the labeled data to train the model and then combine the labeled data with the abduced data for further training. We compare GABL with three types of semi-supervised baselines, all of which use CRNN as the basic model: 1) ST: Self-training [Yarowsky, 1995]; 2) Tri: Tri-training [Zhou and Li, 2010]; 3) VAT: Virtual Adversarial Training [Miyato et al., 2019]. We also test CRNN's performance in fully supervised learning.

| Method | 5% | 10% | 15% | 20% | 25% | 100% |
|--------|-------|-------|-------|-------|-------|-------|
| CRNN | 0.262 | 0.434 | 0.515 | 0.561 | 0.592 | 0.742 |
| ST | 0.461 | 0.572 | 0.636 | 0.660 | 0.677 | - |
| Tri | 0.222 | 0.484 | 0.588 | 0.638 | 0.647 | - |
| VAT | 0.301 | 0.476 | 0.567 | 0.594 | 0.627 | - |
| GABL | 0.615 | 0.674 | 0.713 | 0.717 | 0.720 | - |

Table 1: Accuracy in handwriting experiments.

Experimental Results We report each model's best performance on the test set, because the performance of the compared methods deteriorates rapidly as the number of training epochs increases, which makes it difficult to determine the optimal number of training epochs in advance. As shown in Table 1, Grounded Abductive Learning achieves the best performance on the handwritten OCR task. Although the comparison is not entirely fair, since GABL additionally uses a ground knowledge base, GABL improves model performance and reduces the amount of labeled data required. We also find that insufficient unlabeled data can limit GABL's performance, because the model overfits the insufficient training data during the abduction process. Using all unlabeled data in one epoch may trap the model in a local optimum. When we subsample a batch of data for training in every epoch, models achieve better performance on the OCR task. The experiments are run on a single V100S GPU. GABL takes about 48 hours to train a CRNN model. In each epoch, GABL takes twice as much time as self-training. GABL is faster than Tri-training and slower than VAT.
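The candidate search described in the experimental setting relies on a BK-tree indexed by edit distance. The following is a minimal sketch of such a tree, not the authors' implementation: the class layout and method names are illustrative, and the final ranking of candidates by CTC loss is not shown.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


Node = Tuple[str, Dict[int, "Node"]]  # (word, children keyed by distance)


class BKTree:
    """Metric tree in which each child edge stores its distance to the parent."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root: Optional[Node] = None
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:          # word already stored
                return
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist, sorted by distance."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only subtrees with |k - d| <= max_dist can match.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

For a toy GKB such as `BKTree(["have", "hive", "gave", "love", "house"])`, calling `search("hsve", 1)` returns `[(1, "have"), (1, "hive")]`, and these candidates would then be ranked by the model's loss.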

6 Conclusion

This paper presents Grounded Abductive Learning (GABL) to exploit a logical domain knowledge base represented by groundings. By augmenting the ground knowledge base with a program that exploits edit distance to abduce pseudo-labels, GABL can significantly outperform the compared supervised and semi-supervised learning approaches given the same amount of labeled data. The empirical study shows that the augmented logic program can improve model performance when the accuracy of the pre-trained model exceeds a threshold. From the results of our experiments with synthetic data, we show that the threshold depends on the size of the ground knowledge base and the sparsity of the space covered by groundings. In general, a ground knowledge base can be regarded as an answer set of a first-order logic theory [Lifschitz, 2008]. Thus GABL is suitable for combining machine learning with any type of logic background knowledge.

Acknowledgements

The authors thank the Nanjing University-Imperial College London Machine Learning Joint Research Hub and the Office of International Cooperation & Exchanges of Nanjing University for their financial support.


References

[Bengio, 2017] Yoshua Bengio. The consciousness prior. arXiv preprint arXiv:1709.08568, 2017.

[Burkhard and Keller, 1973] Walter A. Burkhard and Robert M. Keller. Some approaches to best-match file searching. Communications of the ACM, 16(4):230–236, 1973.

[Cover, 1999] Thomas M. Cover. Elements of information theory. John Wiley & Sons, 1999.

[Dai et al., 2019] Wang-Zhou Dai, Qiu-Ling Xu, Yang Yu, and Zhi-Hua Zhou. Bridging machine learning and logical reasoning by abductive learning. In Advances in Neural Information Processing Systems 32 (NeurIPS), pages 2811–2822, 2019.

[Dietterich and Bakiri, 1994] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1994.

[Edelsbrunner and Seidel, 1986] Herbert Edelsbrunner and Raimund Seidel. Voronoi diagrams and arrangements. Discrete & Computational Geometry, 1(1):25–44, 1986.

[Garcez et al., 2019] Artur S. d'Avila Garcez, Marco Gori, Luís C. Lamb, Luciano Serafini, Michael Spranger, and Son N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. Journal of Applied Logic, 6(4):611–632, 2019.

[Getoor and Taskar, 2007] Lise Getoor and Ben Taskar, editors. Introduction to statistical relational learning. MIT Press, Cambridge, Massachusetts, 2007.

[Graves et al., 2006] Alex Graves, Santiago Fernández, Faustino J. Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 369–376, 2006.

[Huang et al., 2020] Yu-Xuan Huang, Wang-Zhou Dai, Jian Yang, Le-Wen Cai, Shaofen Cheng, Ruizhang Huang, Yu-Feng Li, and Zhi-Hua Zhou. Semi-supervised abductive learning and its application to theft judicial sentencing. In Proceedings of the 20th IEEE International Conference on Data Mining (ICDM), pages 1070–1075, 2020.

[Japkowicz and Stephen, 2002] Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429–449, 2002.

[Josephson and Josephson, 1996] John R. Josephson and Susan G. Josephson. Abductive inference: Computation, philosophy, technology. Cambridge University Press, 1996.

[Krizhevsky, 2009] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.

[LeCun et al., 1995] Yann LeCun, Lawrence D. Jackel, Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Urs A. Müller, Eduard Säckinger, Patrice Simard, et al. Learning algorithms for classification: A comparison on handwritten digit recognition. Neural networks: the statistical mechanics perspective, 261(276):2, 1995.

[Li et al., 2018] Jian Li, Yong Liu, Rong Yin, Hua Zhang, Lizhong Ding, and Weiping Wang. Multi-class learning: From theory to algorithm. In Advances in Neural Information Processing Systems (NeurIPS), pages 1586–1595, 2018.

[Lifschitz, 2008] Vladimir Lifschitz. What is answer set programming? In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1594–1597, 2008.

[Magnani, 2009] Lorenzo Magnani. Abductive Cognition: The Epistemological and Eco-Cognitive Dimensions of Hypothetical Reasoning. Springer, Berlin, 2009.

[Marti and Bunke, 2002] Urs-Viktor Marti and Horst Bunke. The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1):39–46, 2002.

[Miyato et al., 2019] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2019.

[Raedt et al., 2016] Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(2):1–189, 2016.

[Shi et al., 2017] Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304, 2017.

[Yarowsky, 1995] David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 189–196, 1995.

[Zhou and Li, 2010] Zhi-Hua Zhou and Ming Li. Semi-supervised learning by disagreement. Knowledge and Information Systems, 24(3):415–439, 2010.

[Zhou, 2019] Zhi-Hua Zhou. Abductive learning: towards bridging machine learning and logical reasoning. Science China Information Sciences, 62(7):76101, 2019.
