Abducing Relations in Continuous Spaces

Taisuke Sato¹, Katsumi Inoue², Chiaki Sakama³
¹ AI Research Center, AIST, Japan
² National Institute of Informatics, Japan
³ Wakayama University, Japan
satou.taisuke@aist.go.jp, inoue@nii.ac.jp, sakama@sys.wakayama-u.ac.jp

We propose a new approach to abduction, i.e., non-deductive inference that finds a hypothesis H for an observation O such that H, KB ⊢ O, where KB is background knowledge. We reformulate abduction linear-algebraically in vector spaces so that it abduces relations, not logical formulas, realizing approximate but scalable abduction that can deal with web-scale knowledge bases. More specifically, we consider the problem of abducing relations for Datalog programs with binary predicates. We treat two cases, the non-recursive case and the recursive case. In the non-recursive case, given r1(X,Y) and r3(X,Z), we abduce r2(Y,Z) so that r3(X,Z) ⇔ ∃Y r1(X,Y) ∧ r2(Y,Z) approximately holds, by computing a matrix R2 that approximately satisfies a matrix equation R3 = min1(R1R2) containing a nonlinear function min1(x). Here R1, R2 and R3 encode r1(X,Y), r2(Y,Z) and r3(X,Z) respectively as adjacency matrices. We apply this matrix-based abduction to rule discovery and relation discovery in a knowledge graph. The recursive case is mathematically more involved and computationally more difficult, but solvable by deriving a recursive matrix equation and solving it. We illustrate concrete recursive cases including a transitive closure relation.

1 Introduction

Traditionally, logical inference in AI has been conducted symbolically, while low-level data processing such as image recognition is carried out non-symbolically using real numbers and vectors. However, the recent emergence of big knowledge graphs [Nickel et al., 2016] such as Freebase [Bollacker et al., 2008] and Knowledge Vault [Dong et al., 2014], where a proposition r(i, j) saying that a subject i and an object j stand in the relation r is represented as a triple (i, r, j), has spurred the development of scalable linear algebraic technology for relational inference that is applicable to logical inference as well. For example, given a domain of size n, we enumerate the entities in the domain and represent the i-th entity (1 ≤ i ≤ n) by the n × 1 one-hot vector vi = (0,…,1,…,0)ᵀ whose i-th entry alone is one. Also, the relation r is encoded as an adjacency matrix R such that Rij = 1 if r(i, j) is true and Rij = 0 otherwise. The truth value (1 if true, else 0) of r(i, j) is then computed by multiplying vi, vj and R as viᵀRvj ∈ {1,0}. We note that, starting from this simple setting, a full-fledged formulation of first-order logic has been developed in tensor spaces where predicates and quantifications are represented by tensors [Sato, 2017b]. Also, by applying such an algebraic formulation to deductive inference, i.e., Datalog evaluation, it is possible to achieve, for example, 10¹ to 10⁴ times faster computation than symbolic approaches for some classes of Datalog programs [Sato, 2017a]. Recently it has been reported that well-known logic programming semantics such as the least model semantics and the stable model semantics can be rigorously formalized in tensor spaces [Sakama et al., 2017]. These results clearly demonstrate the potential of doing logic in continuous spaces and suggest further exploration of the linear algebraic approach to logical inference beyond deduction.
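To make the encoding concrete, here is a minimal NumPy sketch (ours, not the authors' code; the domain size and the single fact are invented for illustration) of one-hot entity vectors, an adjacency matrix, and the bilinear truth-value computation viᵀRvj:

```python
import numpy as np

n = 5                                  # domain size (entities 1..n), assumed
R = np.zeros((n, n))                   # adjacency matrix for a relation r
R[0, 2] = 1                            # suppose r(1, 3) holds (0-based indices)

def one_hot(i, n):
    """n x 1 one-hot vector for the i-th entity (1-based)."""
    v = np.zeros(n)
    v[i - 1] = 1.0
    return v

v1, v3 = one_hot(1, n), one_hot(3, n)
print(v1 @ R @ v3)                     # 1.0: r(1, 3) is true
print(v3 @ R @ v1)                     # 0.0: r(3, 1) is false
```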
In this paper, we propose a new approach to abduction, i.e., non-deductive inference to find a hypothesis H for an observation O such that H, KB ⊢ O, where KB is background knowledge. We reformulate it linear-algebraically in vector spaces as linear algebraic abduction, which abduces relations, not logical formulas, to realize approximate but scalable abduction that can deal with web-scale knowledge bases. More specifically, we consider the problem of abducing relations for Datalog programs with binary predicates. We treat two cases, the non-recursive case and the recursive case. In the non-recursive case, given r1(X,Y) and r3(X,Z) (we follow the Prolog convention: strings beginning with a capital letter are variables and others are constants or predicate symbols), we abduce r2(Y,Z) so that r3(X,Z) ⇔ ∃Y r1(X,Y) ∧ r2(Y,Z) approximately holds, by computing R2 that approximately satisfies a matrix equation R3 = min1(R1R2) containing a nonlinear function min1(x). Here R1, R2 and R3 encode r1(X,Y), r2(Y,Z) and r3(X,Z) respectively as adjacency matrices. We apply this matrix-based abduction to rule discovery and relation discovery in a knowledge graph. The recursive case is mathematically more involved and computationally more difficult, but solvable by deriving a recursive matrix equation and solving it. We illustrate concrete recursive cases including a transitive closure relation. Our contribution thus includes the proposal of an innovative linear algebraic approach to abduction and the establishment of its technical basis.

2 Preliminaries

2.1 Abduction in Datalog

We first briefly explain abduction in Datalog [Kakas et al., 1992; Eiter et al., 1997; Gottlob et al., 2010]. A Datalog program KB = F ∪ R is a finite set of clauses without function symbols, where F is a set of ground atoms called facts and R is a set of rules, i.e., definite clauses. The least Herbrand model MKB is the set of ground atoms provable from KB. We assume KB has only binary predicates and that each clause in R is of the form r0(X0,Xn) ← r1(X0,X1) ∧ ⋯ ∧ rn(Xn−1,Xn). We also assume for simplicity that recursive programs have only one recursive goal in their clause body. Logically, deduction in Datalog can be done by SLD deduction. Think of three predicates, nationality(X,Z) (X's nationality is Z), live_in(X,Y) (X lives in Y) and located_in(Y,Z) (Y is located in Z), together with a program KB0 = F0 ∪ R0 where F0 = {live_in(spielberg,california), located_in(california,usa)} and R0 = {nationality(X,Z) ← live_in(X,Y) ∧ located_in(Y,Z)}. Datalog proves nationality(spielberg,usa) as a top-level query from KB0 using SLD deduction. Now consider abduction. Unlike deduction, abduction is invoked when a set F′ of facts is not enough to deduce observations. Suppose we have a set of special atoms called abducibles as possible hypotheses. When we have found that F′, R ⊬ o for an observed ground atom o, we hypothesize, or abduce, a set {a1,…,am} of abducibles and add them to F′ so that {a1,…,am} ∪ F′, R ⊢ o holds. Suppose for example we have F′0 = {live_in(spielberg,california)} and observe o = nationality(spielberg,usa). We note that F′0, R0 ⊬ o. So we attempt to retain the conclusion by abducing a hypothesis a = located_in(california,usa) so that we have {a} ∪ F′0, R0 ⊢ o. In Datalog, abduction can be realized by extending SLD deduction to return selected abducibles with which a top-level goal is proved.
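As an illustration of this symbolic procedure, the following toy Python sketch (our construction, not the paper's SLD-based system; the one-rule forward chainer and the candidate list are assumptions for illustration) searches the abducibles for a fact that restores the proof of o:

```python
# Toy abduction for the nationality example: find an abducible a with
# {a} ∪ F', R ⊢ o when F' alone cannot prove o.
def derive(facts):
    """One bottom-up pass of the single rule
       nationality(X,Z) <- live_in(X,Y), located_in(Y,Z).
       One pass suffices here: the head predicate never occurs in a body."""
    derived = set(facts)
    for (p1, x, y) in facts:
        for (p2, y2, z) in facts:
            if p1 == "live_in" and p2 == "located_in" and y == y2:
                derived.add(("nationality", x, z))
    return derived

F = {("live_in", "spielberg", "california")}
o = ("nationality", "spielberg", "usa")
abducibles = [("located_in", "california", "usa"),
              ("located_in", "california", "france")]

assert o not in derive(F)                         # F' alone cannot prove o
hypotheses = [a for a in abducibles if o in derive(F | {a})]
print(hypotheses)   # [('located_in', 'california', 'usa')]
```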
In our approach, however, we abduce not some selected abducibles but the extension of an abducible predicate that represents a hypothesized/missing relation.

2.2 Matrix Compilation for Deductive Inference

Here we recap how a program KB is compiled into a set of matrix equations that compute the least model MKB [Sato, 2017a]. Without loss of generality, it is assumed that the domain of size n consists of the integers i (1 ≤ i ≤ n). For concreteness, suppose KB contains ground atoms about live_in/2 and located_in/2 and one clause nationality(X,Z) ← live_in(X,Y) ∧ located_in(Y,Z). In compilation, ground atoms are compiled into matrices first; ground atoms about live_in/2 and located_in/2 are respectively compiled into n × n adjacency matrices Li and Lo such that (Li)ij = 1 if live_in(i, j) ∈ KB and (Li)ij = 0 otherwise, etc. Then the definite clause nationality(X,Z) ← live_in(X,Y) ∧ located_in(Y,Z) is compiled, in which a conjunction in the clause body is compiled into a matrix product. Thus the clause body live_in(X,Y) ∧ located_in(Y,Z) is compiled into vXᵀ min1(Li Lo) vZ, where vX designates an indefinite one-hot vector corresponding to the variable X. Here min1(x) denotes a nonlinear thresholding function defined by min1(x) = 1 if x ≥ 1 and min1(x) = x otherwise; min1(A) for a matrix A means the componentwise application of min1. The clause head nationality(X,Z) is compiled into vXᵀ Na vZ, where Na is the adjacency matrix representing nationality/2. Finally, recalling that nationality(X,Z) ⇔ ∃Y live_in(X,Y) ∧ located_in(Y,Z) holds in MKB for any X and Z, and hence vXᵀ Na vZ = vXᵀ min1(Li Lo) vZ holds for any one-hot vectors vX and vZ, we reach the following matrix equation as compilation output:

Na = min1(Li Lo).   (1)

This equation should hold for any Na, Li and Lo encoding nationality/2, live_in/2 and located_in/2 in MKB. We can use it for bottom-up computation of MKB: we evaluate the clause body first, obtaining Li and Lo, and then compute Na for the head predicate using (1).

2.3 Recursive Programs

Now consider recursive programs. For illustration, we take up the following program (2), which computes the transitive closure r2(X,Z) of a base relation r1(X,Y):

r2(X,Y) ← r1(X,Y)
r2(X,Z) ← r1(X,Y) ∧ r2(Y,Z)   (2)

We can compile this program to a matrix equation with recursion using the method of [Sato, 2017a]. The compilation procedure compiles each clause body and then, combining the outputs by sum and min1, produces the following nonlinear recursive equation:

R2 = min1(R1 + R1R2)   (3)

where R1 and R2 are n × n adjacency matrices respectively encoding r1(X,Y) and r2(X,Z) in the least model. Although the matrix equation (3) holds true in the least model, and solving (3) for R2 given R1 yields the transitive closure r2(X,Y) as an adjacency matrix, the nonlinearity of min1 blocks the application of linear algebraic operations to solve it. We circumvent this difficulty by linearizing (3) to

R̃2 = ε(R1 + R1 R̃2)   (4)

where ε is a small positive number such that (I − εR1)⁻¹ exists, for example ε = 1/(1 + ‖R1‖), where ‖A‖ = maxᵢ Σⱼ |aᵢⱼ| is the maximum row-sum matrix norm. Equation (4) is easy to solve and gives R̃2 = (I − εR1)⁻¹ εR1. It is proved that the thresholded matrix (R̃2)>0 coincides with the (least) solution of equation (3), also denoted by R2 [Sato, 2017a]:

R2 = (R̃2)>0   (5)

Here (R̃2)>0 is the matrix obtained by thresholding each entry of R̃2 at 0, i.e., replacing the entry (R̃2)ij by 1 if (R̃2)ij > 0 and by 0 otherwise.
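A minimal NumPy sketch (ours, not the authors' Octave code) of the linearization (4) and thresholding (5), checked against the fixed-point equation (3) on a three-node chain:

```python
import numpy as np

R1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]], dtype=float)          # base edges 1->2, 2->3
n = R1.shape[0]
eps = 1.0 / (1.0 + np.abs(R1).sum(axis=1).max())  # 1/(1 + ||R1||), max row sum
# R2~ = (I - eps*R1)^(-1) * eps*R1, then threshold at 0 as in (5)
R2_tilde = np.linalg.solve(np.eye(n) - eps * R1, eps * R1)
R2 = (R2_tilde > 0).astype(float)

print(R2)  # adds the derived edge 1->3: [[0,1,1],[0,0,1],[0,0,0]]
# sanity check against the fixed-point equation (3)
assert np.array_equal(R2, np.minimum(1.0, R1 + R1 @ R2))
```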
This way of computing the transitive closure has time complexity O(n³) (or less, O(n^2.376), if the Coppersmith-Winograd algorithm [Coppersmith and Winograd, 1990] is used) and empirically outperforms conventional symbolic approaches to Datalog evaluation when R1 is non-sparse [Sato, 2017a]. Matrix compilation, which produces matrix equations like (1) and (3), can be extended to general Datalog programs, but we omit the details. In what follows, we explain our new approach to abduction in detail, dealing first with the non-recursive case.

3 Non-recursive Abduction

3.1 Abducing Adjacency Matrices

Our abduction approximately abduces relations as adjacency matrices by solving matrix equations. Suppose we have two relations nationality(X,Z) and live_in(X,Y) as n × n adjacency matrices Na and Li respectively, together with a non-recursive clause nationality(X,Z) ← live_in(X,Y) ∧ located_in(Y,Z), and seek to abduce located_in(Y,Z) as a missing relation such that nationality(X,Z) ⇔ ∃Y live_in(X,Y) ∧ located_in(Y,Z). This abduction problem is equivalent to abducing an adjacency matrix Lo for located_in(Y,Z) that satisfies the matrix equation Na = min1(Li Lo). Na = min1(Li Lo) would be exactly solvable for Lo by reformulating it as a SAT problem, at the cost of seeking the truth values of the n² Boolean variables that encode Lo. However, as we are thinking of applications to knowledge graphs where n easily goes up to 10⁴ or higher, solving a SAT problem with 10⁸ Boolean variables seems infeasible as of now. Hence we approximately solve the problem in a vector space and obtain Lo such that Na ≈ min1(Li Lo). But how? We first simply solve Na = Li X for X and put Lo = X>θ for some appropriate θ, where X>θ denotes thresholding at θ of each entry x of X such that x is replaced by 1 if x > θ and by 0 otherwise. The difference between Na and min1(Li X>θ) is then minimized by adjusting θ. Formally we state:

Problem definition (relation abduction in vector spaces): Given adjacency matrices R1 and R3, our abduction problem is to find R2 such that R3 ≈ min1(R1R2).

We solve this problem by computing a matrix X that minimizes the objective J:

J = ‖R3 − R1X‖²_F + λ‖X‖²_F   (6)

where λ > 0, ‖·‖_F denotes the Frobenius norm and λ‖X‖²_F is a penalty term controlling the Frobenius norm of X.

Theorem: The minimizer of J is given by

X = (λI + R1ᵀR1)⁻¹(R1ᵀR3)   (7)

where I is an identity matrix.

Sketch of proof: Setting the partial derivatives of the objective function J w.r.t. the Xij to zero gives a matrix equation (λI + R1ᵀR1)X = R1ᵀR3. Since λI + R1ᵀR1 is positive definite for λ > 0, it has an inverse, thereby yielding (7).

The error is computed as δ = R3 − R1X = PR3, where P approaches the orthogonal projection onto the orthogonal complement of range(R1) as λ → 0. Accordingly, if R3 is factored into R3 = R1Y for some matrix Y, then δ = PR3 = PR1Y = O (the all-zero matrix) as λ → 0, and hence we have R3 = R1X (but, unfortunately, X is not yet an adjacency matrix, so we need thresholding to obtain R2). After computing X by (7), we abduce R2 as R2 = X>θ while adjusting the threshold θ so that R3 is best approximated by min1(R1R2). We evaluate the proximity of min1(R1R2) to R3 in terms of their F-measure, where we consider an adjacency matrix R as the set {(i, j) | Rij = 1}. (The F-measure of two sets A and B is defined as F(A,B) = 2|A ∩ B|/(|A| + |B|), where |A| denotes the cardinality of A; 0 ≤ F(A,B) ≤ 1 and F(A,B) = 1 if-and-only-if A = B, so the higher, the better.) The time complexity of this abduction, computing R2 and min1(R1R2) from R3 and R1, is O(n³).
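The following NumPy sketch (ours; the matrix size, density and threshold value are illustrative assumptions) implements the closed form (7) and the subsequent thresholding:

```python
import numpy as np

def min1(A):
    """Componentwise min1: entries >= 1 become 1, others are unchanged."""
    return np.minimum(A, 1.0)

def abduce_X(R1, R3, lam=1.0):
    """Minimizer (7) of J = ||R3 - R1 X||_F^2 + lam ||X||_F^2."""
    n = R1.shape[1]
    return np.linalg.solve(lam * np.eye(n) + R1.T @ R1, R1.T @ R3)

# toy data: build R3 = min1(R1 R2) from random 0-1 matrices, then abduce R2
rng = np.random.default_rng(0)
R1 = (rng.random((50, 50)) < 0.05).astype(float)
R2 = (rng.random((50, 50)) < 0.05).astype(float)
R3 = min1(R1 @ R2)

X = abduce_X(R1, R3)
R2_abd = (X > 0.1).astype(float)   # theta = 0.1 here; Sec. 3.2 tunes it by grid search
print(np.abs(R3 - min1(R1 @ R2_abd)).sum())   # absolute error
```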
We next test our linear algebraic approach using artificial data and real data. (All experiments in this paper are carried out with GNU Octave 4.0.0 on a PC with an Intel(R) Core(TM) i7-3770@3.40GHz CPU and 28GB of memory.)

3.2 Experiment with Random Graphs

Here we conduct an experiment on non-recursive abduction with artificial data. We use adjacency matrices expressing directed random graphs D(n, pe) with n vertices such that each edge occurs independently with probability pe. In the experiment, we set n = 10⁴, create two n × n random adjacency matrices R1 and R2 encoding two D(n, pe)s, and construct R3 = min1(R1R2). Note that R3 is not a pure random matrix representing some D(n, pe) but one with internal structure in the form of a matrix product. The purpose of this experiment is to identify this internal structure by abduction: we compute X by (7), put R2_abd = X>θ, and examine to what extent R3 = min1(R1 R2_abd) holds. Or, logically speaking, given r3(X,Z) and r1(X,Y), we examine whether r3(X,Z) can be written as an existential conjunction ∃Y r1(X,Y) ∧ r2(Y,Z) or not. θ is adjusted as follows: we equally divide the interval between the largest and smallest values of X into 50 discrete steps, test each step as a threshold, and choose the one achieving the best F-measure for R3 and min1(R1 R2_abd) (a sketch of this sweep appears at the end of this subsection). We evaluate the quality of abduction in two ways: by the absolute error "error", i.e., the number of differing entries between R3 and min1(R1 R2_abd), and by their F-measure. We repeat this process for λ = 1 and pe ∈ {10⁻², 10⁻³, 10⁻⁴, 10⁻⁵}. The results are summarized in Table 1, where |R3| denotes the number of 1s in R3, etc., and "time" means execution time.

| pe | \|R3\| | \|R1\| | \|R2\| | \|R2_abd\| | error | F-measure | time (sec) |
|------|------------|---------|-----------|---------|-----------|-------|--------|
| 10⁻² | 63,209,013 | 999,390 | 1,000,216 | 998,936 | 3,434,829 | 0.947 | 1597.2 |
| 10⁻³ | 996,254 | 99,924 | 100,190 | 99,923 | 3 | 1.000 | 1629.8 |
| 10⁻⁴ | 9,772 | 10,003 | 9,818 | 7,501 | 342 | 0.966 | 1690.8 |
| 10⁻⁵ | 104 | 1,035 | 1,041 | 107 | 0 | 1.000 | 1673.6 |

Table 1: Abduction with random adjacency matrices

We can make several observations. First of all, recall that the random matrices R1 and R2 encoding D(n, pe) with n = 10⁴ have an entry 1 with probability pe. It follows by calculation that an entry of min1(R1R2) is 1 with probability 1 − (1 − pe²)ⁿ, so R3 is expected to have n²(1 − (1 − pe²)ⁿ) 1s on average. The column |R3| of Table 1 supports this estimate well. Second, the F-measure is high for all values of pe, which means that if the observed relation r3(X,Z) is describable as ∃Y r1(X,Y) ∧ r2(Y,Z), we have a good chance of reconstructing it when r1(X,Y) is given. Also note that the absolute error is small or even zero for sparse matrices where pe ≤ 10⁻³. Besides, |R2_abd| < |R2| holds for all pe, and the difference widens as R3 becomes sparser. In other words, R3, if sparse, is approximated well using an R2_abd which is smaller than the original R2. This is particularly important when applying the method to knowledge graphs, as real relations tend to be very sparse. Third, computation is done in reasonable time despite the fact that we are dealing with 10⁴ × 10⁴ matrices. This may be due to multi-threaded matrix operations on a multi-core processor. Also notably, computation time remains almost constant irrespective of pe. We therefore may say that our linear algebraic abduction is applicable to considerably large relations.
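For reference, the threshold sweep described above can be sketched as follows (our code; `steps=50` mirrors the 50 discrete steps used in the experiment):

```python
import numpy as np

def f_measure(A, B):
    """F-measure of 0-1 matrices A, B viewed as sets {(i,j) | entry = 1}."""
    inter = (A * B).sum()
    denom = A.sum() + B.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

def tune_theta(R1, R3, X, steps=50):
    """Sweep theta over `steps` equal steps between min(X) and max(X),
       keeping the theta that best reconstructs R3 as min1(R1 X_{>theta})."""
    best_theta, best_f = None, -1.0
    for theta in np.linspace(X.min(), X.max(), steps):
        R2_abd = (X > theta).astype(float)
        f = f_measure(R3, np.minimum(1.0, R1 @ R2_abd))
        if f > best_f:
            best_theta, best_f = theta, f
    return best_theta, best_f
```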
3.3 Rule Discovery and Relation Discovery for Knowledge Graphs

Now we deal with real data. We use FB15k, a standard knowledge graph comprised of triples (subject, relation, object) in RDF format for 1,345 binary relations and 14,951 entities [Bordes et al., 2013]. Let R1, R2 and R3 be matrices encoding relations r1(X,Y), r2(Y,Z) and r3(X,Z) in FB15k respectively. For later use, we define the F-measure of r3(X,Z) ← r1(X,Y) ∧ r2(Y,Z) as the F-measure for min1(R1R2) and R3. We tackle two problems at once. One is rule discovery, which finds a rule of the form r3(X,Z) ← r1(X,Y) ∧ r2(Y,Z) with a high F-measure. The other is relation discovery, which seeks an unknown relation runknown(Y,Z), not existent in the original database but connecting known relations r3(X,Z) and r1(X,Y) by a rule

r3(X,Z) ← r1(X,Y) ∧ runknown(Y,Z).   (8)

A naive approach to the first problem, rule discovery, would be to enumerate all possible triples (r1(X,Y), r2(Y,Z), r3(X,Z)) of relations in FB15k and check whether r3(X,Z) ← r1(X,Y) ∧ r2(Y,Z) holds well. However, since FB15k contains 1,345 relations, it is unrealistic to test all 1,345³ ≈ 2.4 × 10⁹ combinations. We therefore focus on relations over a specific domain such as person or film. Furthermore, we specifically choose the top-ten largest relations (in the number of triples) in the domain and search for (r1(X,Y), r2(Y,Z), r3(X,Z)) where r1(X,Y) and r3(X,Z) are two of the top-ten relations but r2(Y,Z) comes from the entire set of FB15k relations, hoping to find interesting rules that reveal unknown connections among existing relations in the domain. For the second problem, relation discovery, we simply abduce r2_abd(Y,Z) as an unknown relation so that a rule r3(X,Z) ← r1(X,Y) ∧ r2_abd(Y,Z) achieves a high F-measure. Fortunately, by first abducing r2_abd(Y,Z) and reusing it for rule discovery, we can find rules and new relations simultaneously over the specified domain. More concretely, we do the following for all pairs of matrices (R3, R1), each a 14,951 × 14,951 matrix, for the top-ten relations in the domain. (The triples comprising a relation in FB15k are divided into training, validation and test sets [Bordes et al., 2013]; we use the training sets for rule and relation discovery.) First abduce R2_abd such that R3 ≈ min1(R1 R2_abd). If the F-measure of R3 and min1(R1 R2_abd) is less than 0.5, do nothing. Otherwise, search the entire 1,345 relations of FB15k for the R2_sim most similar to R2_abd in terms of cosine similarity, defined for matrices A and B as (Σij Aij Bij)(‖A‖_F ‖B‖_F)⁻¹. Return r2_abd(Y,Z), the binary relation represented by R2_abd, as an abduced relation, and r3(X,Z) ← r1(X,Y) ∧ r2_sim(Y,Z) as a discovered rule for the pair (R3, R1). Since the additional work of taking cosine similarities is O(n²) for n × n matrices, the time complexity of computing r3(X,Z) ← r1(X,Y) ∧ r2_sim(Y,Z) remains O(n³).

We list two of the rules and new relations we discovered. (We assign familiar names to relations in FB15k for intuitiveness.) Given nationality(X,Z) and live_in(X,Y) in a domain concerning people in FB15k, we discovered nationality(X,Z) ← live_in(X,Y) ∧ located_in(Y,Z). A typical instance of this rule is nationality(spielberg,usa) ← live_in(spielberg,cincinnati) ∧ located_in(cincinnati,usa). Here all three relations are existent in FB15k. The quality of the rule, i.e., the F-measure for Na and min1(Li Lo), is 0.415, where Na, Li and Lo respectively encode nationality(X,Z), live_in(X,Y) and located_in(Y,Z) in the knowledge graph. Compared to the F-measure of 0.007 for Na and Li, this value is much higher, which means nationality(X,Z) is predicted much better from live_in(X,Y) ∧ located_in(Y,Z) than from live_in(X,Y) alone.
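The cosine-similarity search described earlier in this subsection can be sketched as follows (our code; the dictionary of candidate relation matrices is an assumed data layout, not FB15k's native format):

```python
import numpy as np

def cosine_sim(A, B):
    """(sum_ij A_ij B_ij) / (||A||_F ||B||_F); np.linalg.norm of a matrix
       defaults to the Frobenius norm."""
    denom = np.linalg.norm(A) * np.linalg.norm(B)
    return (A * B).sum() / denom if denom > 0 else 0.0

def most_similar(R2_abd, relations):
    """relations: dict mapping relation names to adjacency matrices;
       returns the name of the most similar matrix, i.e., R2_sim."""
    return max(relations, key=lambda name: cosine_sim(R2_abd, relations[name]))
```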
Although we have discovered other interesting rules with high F-measures, they are omitted for space limitations. Concerning relation discovery, genre_lang(Y,Z) in the clause below is an example of an abduced relation in a domain related to film:

language(X,Z) ← genre(X,Y) ∧ genre_lang(Y,Z)

Here language(X,Z) and genre(X,Y) are existing relations but genre_lang(Y,Z) is an abduced one, not existent in FB15k. A plausible instance of this rule with known entities is language(the_lord_of_the_rings,english) ← genre(the_lord_of_the_rings,adventure) ∧ genre_lang(adventure,english). The abduced relation genre_lang(Y,Z) teaches us how genre(X,Y) is possibly connected to language(X,Z). Actually, while the F-measure of a rule language(X,Y) ← genre(X,Y) that predicts language(X,Y) from genre(X,Y) alone is zero, the F-measure of the above rule is 0.655, a reasonably high value, which exemplifies the effectiveness of combining abduced relations with existing relations to predict target relations.

4 Recursive Abduction for Transitive Closure

So far we have only been dealing with the non-recursive case, where we abduce r2(Y,Z) for a non-recursive clause r3(X,Z) ← r1(X,Y) ∧ r2(Y,Z). However, our approach is applicable to the recursive case as well. Look at the equation R2 = min1(R1 + R1R2) (3) for transitive closure, compiled from the recursive Datalog program (2) that computes the transitive closure r2(X,Y) of a base relation r1(X,Y), where the matrices R1 and R2 respectively encode r1(X,Y) and r2(X,Y). Our abduction problem is then stated as follows.

Problem definition (recursive abduction in vector spaces): Given R2, find a base relation R1 such that R2 ≈ min1(R1 + R1R2).

We abduce the base relation R1 from R2 in two steps. First we solve

R2 = X + XR2   (9)

for X and obtain X = R2(I + R2)⁻¹. Then we threshold X at some θ and return R1_abd = X>θ as an abduced base relation such that R2 ≈ min1(R1_abd + R1_abd R2).
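A minimal NumPy sketch (ours) of this two-step recursive abduction, verified on a three-node chain whose transitive closure is known:

```python
import numpy as np

def abduce_base(R2, theta=1e-4):
    """Solve R2 = X + X R2, i.e., X (I + R2) = R2, then threshold at theta."""
    n = R2.shape[0]
    # X = R2 (I + R2)^(-1), computed by solving the transposed system
    X = np.linalg.solve((np.eye(n) + R2).T, R2.T).T
    return (X > theta).astype(float)

R1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # base edges
R2 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)  # closure of R1
R1_abd = abduce_base(R2)
err = np.abs(R2 - np.minimum(1.0, R1_abd + R1_abd @ R2)).sum()
print(R1_abd, err)   # recovers the base edges 1->2 and 2->3 with error 0
```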
We conduct an experiment with artificial data to test this approach. We use directed random graphs D(n, pe) as in the previous experiment, setting n = 10⁴ and θ = 10⁻⁴. Given pe, we generate an n × n random matrix R1 encoding D(n, pe) and compute its transitive closure R2. We then abduce R1_abd such that R2 ≈ min1(R1_abd + R1_abd R2) and evaluate the quality of this abduction in terms of error = the number of differing entries between R2 and min1(R1_abd + R1_abd R2). For each pe ∈ {10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶}, we repeat the above process five times and take the averages of |R2|, |R1|, |R1_abd|, error and computation time. The results are listed in Table 2.

| pe | \|R2\| | \|R1\| | \|R1_abd\| | error | time (sec) |
|------|--------------|-----------|--------------|---|------|
| 10⁻³ | 99,992,000.0 | 100,110.0 | 99,992,000.0 | 0 | 40.3 |
| 10⁻⁴ | 313,433.4 | 9,972.8 | 11,814.0 | 0 | 41.2 |
| 10⁻⁵ | 1,092.2 | 982.2 | 982.2 | 0 | 43.4 |
| 10⁻⁶ | 100.4 | 99.8 | 99.8 | 0 | 43.9 |

Table 2: Abduction through transitive closure of directed random graphs (all figures are averages over five runs)

Seeing Table 2, we find error = 0, i.e., R2 = min1(R1_abd + R1_abd R2) holds exactly in all cases, so we may say our abduction is exact under the setting of this experiment. In particular, for pe = 10⁻⁵ and 10⁻⁶, where matrices are sparse, we observed in all runs that R1_abd = R1 (hence |R1_abd| = |R1| in the table) and that R1_abd is a transitive reduction of R1. (For adjacency matrices A and B, A is said to be a transitive reduction of B if both have the same transitive closure and A is minimal as a graph.) That is, in addition to R2, the base relation R1 is also correctly recovered by the linear algebraic abduction for pe = 10⁻⁵ and 10⁻⁶. This is a bit surprising considering the simplicity of our approach. However, when pe increases to 10⁻⁴, a gap between R1_abd and R1 appears, and at pe = 10⁻³ abduction apparently fails, as |R1_abd| = |R2| strongly suggests that we abduced R2 itself in all runs (note that R2 = min1(R2R2) always holds since R2 is a transitive closure). The lesson we can draw from this experiment is that linear algebraic abduction is likely to work well when matrices are sparse, even in the presence of recursion.

We then conduct a similar experiment with real data. We choose five network graphs from the Koblenz Network Collection [Kunegis, 2013] and apply linear algebraic abduction. For each network R1 (as an n × n adjacency matrix), we compute its transitive closure R2 and abduce R1_abd so that R2 ≈ min1(R1_abd + R1_abd R2) holds. Table 3 contains the results of the experiment.

| Network | n | \|R2\| | \|R1\| | \|R1_abd\| | error | F-measure | time (sec) |
|--------------|--------|------------|---------|------------|-----------|-------|-------|
| ego-twitter | 23,370 | 353,882 | 33,101 | 172,870 | 0 | 1.000 | 559.7 |
| subelj-cora | 23,166 | 93,584,386 | 91,500 | 43,521,487 | 6,618,944 | 0.929 | 847.4 |
| dblp-cite | 12,591 | 7,583,575 | 49,743 | 1,439,337 | 0 | 1.000 | 83.7 |
| reactome | 6,327 | 4,744,333 | 147,547 | 99,912 | 65,380 | 0.986 | 11.2 |
| moreno-blogs | 1,224 | 982,061 | 19,025 | 962,118 | 0 | 1.000 | 0.1 |

Table 3: Abduction with network graphs from the Koblenz Network Collection

We observe that the networks vary in size and some are large (n > 20,000), yet abduction is done in reasonable time. Also, the transitive closure relations R2 are well-approximated by the transitive closures of the R1_abd (the absolute error for subelj-cora looks large, but the relative error is 6,618,944/23,166² ≈ 1.2%). Furthermore, |R1_abd| < |R1| holds for the reactome network, which means R1_abd gives the same transitive closure more compactly. We thus remark that linear algebraic abduction can be a useful tool for approximately finding base relations for transitive closure relations in real data.

5 Related Work

There are not many papers concerning logical inference in vector spaces. Ceri et al. formulated relational equations in relational algebra to compute the least model semantics of Datalog and analyzed bottom-up and top-down execution methods; however, relations are neither expressed nor manipulated as matrices [Ceri et al., 1989]. Ioannidis and Wong gave an algebraic formalization of recursive Horn programs using the theory of closed semirings; they proved that processing recursive clauses reduces to solving recursive operator equations [Ioannidis and Wong, 1991]. Lin formulated resolution linear-algebraically and studied a linear algebraic treatment of SAT problems [Lin, 2013]. Grefenstette reconstructed quantifier-free first-order logic in tensor spaces; quantification is also considered, but nested quantification is excluded [Grefenstette, 2013]. Sato proposed a tensorization of full first-order logic with arbitrary quantification [Sato, 2017b]. Sakama et al. formalized logic programming semantics of various types and their computation using tensors [Sakama et al., 2017]. Sato described a linear algebraic formulation of Datalog and demonstrated that faster computation can be achieved by compiling programs to matrix equations [Sato, 2017a]; however, that work considers only deduction, and realizing abduction in matrix computation has been left open. In the field of knowledge graphs, linear algebraic techniques in vector spaces, such as low-dimensional embedding, have been developed to process the huge number of triples in a knowledge graph [Bordes et al., 2013; Yang et al., 2015; Nickel et al., 2016; Rocktäschel et al., 2015; García-Durán et al., 2016; Trouillon et al., 2016]. Entities are compactly represented as low-dimensional vectors, and binary relations are represented either by vectors or by matrices.
Their learning from real data has been a central issue, while logical inference is attempted to assist in extracting new knowledge [Rocktäschel et al., 2015] or is simulated in the form of link traversal in the graph. Yang et al. attempted rule discovery in the low-dimensional embedding space; they basically enumerate all combinations of relations existent in the knowledge graph as the clause body, unlike our approach, which combines relations using relation discovery [Yang et al., 2015]. Widdows and Cohen studied the vector representation of propositions and applied it to analogical reasoning; they represent entities and binary relations in terms of sparse vectors of various types (binary, real, complex) and show how to compose and decompose propositions by specifically designed linear algebraic operations called binding and release [Widdows and Cohen, 2015]. These works, however, pay attention primarily to deduction, and neither abduction nor relation discovery is treated.

Abduction has been extensively studied in logic programming [Kakas et al., 1992; Eiter et al., 1997; Gottlob et al., 2010]. Compared to the conventional approach, our abduction differs vastly in that, first, it is computed in vector spaces; second, it abduces not formulas but relations in the form of adjacency matrices; and third, relations are obtained approximately by solving matrix equations. These features bring several advantages. Since vector spaces are continuous and so is inference, our abduction, though approximate, is expected to be robust to imprecision and noise, complementing exact but brittle symbolic abduction. Also, a broad range of linear algebraic operations is available in vector spaces that is difficult or impossible to perform in symbolic approaches, including inner products, projections, various types of decomposition and low-dimensional embedding, to name a few. Even recursion is implemented using matrix inversion. Furthermore, their computation is highly scalable with modern computer technology such as multi-core CPUs.
6 Conclusion

We proposed an innovative approach, linear algebraic abduction, to abductive inference in Datalog, which approximately abduces relations in the form of adjacency matrices. Our formulation is based on the derivation of a matrix equation from a program. The equation reflects the least model of the program, and abduction is performed by solving it w.r.t. an unknown matrix that represents the adjacency matrix of the missing relation to be abduced. The equation is non-recursive or recursive depending on whether the original program is non-recursive or recursive. We empirically showed that linear algebraic abduction applied to the non-recursive case works well for sparse matrices representing directed random graphs. We also applied it to a knowledge graph and simultaneously discovered rules and new relations, which seems unprecedented. For the recursive case, we conducted several experiments with the transitive closure relation, where base relations are abduced for these relations as adjacency matrices. We observed that our approach can derive base relations which give relations identical or close to the original ones.

In this paper, we outlined a new form of abduction in vector spaces, but there remains a lot to be done. In particular, we need to expand the class of abducible programs to those with non-binary predicates, for example by generalizing matrices to tensors based on the theoretical framework laid out by [Sato, 2017b], where a ternary predicate is represented by a 3rd-order 0-1 tensor {Rijk}. Also, we need to find a way to abduce not just one relation but many alternatives, which might be achieved by seeking multiple solutions when solving a matrix equation.

Acknowledgments

This research is supported in part by a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO) and also by NII Collaborative Research.

References

[Bollacker et al., 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250, 2008.

[Bordes et al., 2013] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2787–2795, 2013.

[Ceri et al., 1989] Stefano Ceri, Georg Gottlob, and Letizia Tanca. What you always wanted to know about Datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering, 1(1):146–166, 1989.

[Coppersmith and Winograd, 1990] Don Coppersmith and Shmuel Winograd. Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation, 9(3):251–280, 1990.

[Dong et al., 2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014), pages 601–610, 2014.

[Eiter et al., 1997] Thomas Eiter, Georg Gottlob, and Nicola Leone. Abduction from Logic Programs: Semantics and Complexity. Theoretical Computer Science, 189(1–2):129–177, 1997.

[García-Durán et al., 2016] Alberto García-Durán, Antoine Bordes, Nicolas Usunier, and Yves Grandvalet. Combining two and three-way embedding models for link prediction in knowledge bases. Journal of Artificial Intelligence Research (JAIR), 55:715–742, 2016.

[Gottlob et al., 2010] Georg Gottlob, Reinhard Pichler, and Fang Wei. Tractable Database Design and Datalog Abduction Through Bounded Treewidth. Information Systems, 35(3):278–298, 2010.

[Grefenstette, 2013] Edward Grefenstette.
Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pages 1–10, 2013.

[Ioannidis and Wong, 1991] Yannis E. Ioannidis and Eugene Wong. Towards an algebraic theory of recursion. Journal of the ACM, 38(2):329–381, April 1991.

[Kakas et al., 1992] Antonis C. Kakas, Robert Kowalski, and Francesca Toni. Abductive logic programming. Journal of Logic and Computation, 2(6):719–770, 1992.

[Kunegis, 2013] Jérôme Kunegis. KONECT - The Koblenz Network Collection. In Proceedings of the International Web Observatory Workshop, pages 1343–1350, 2013.

[Lin, 2013] Fangzhen Lin. From satisfiability to linear algebra. Technical report, Hong Kong University of Science and Technology, 2013.

[Nickel et al., 2016] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A Review of Relational Machine Learning for Knowledge Graphs: From Multi-Relational Link Prediction to Automated Knowledge Graph Construction. Proceedings of the IEEE, 104(1):11–33, 2016.

[Rocktäschel et al., 2015] Tim Rocktäschel, Sameer Singh, and Sebastian Riedel. Injecting Logical Background Knowledge into Embeddings for Relation Extraction. In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

[Sakama et al., 2017] Chiaki Sakama, Katsumi Inoue, and Taisuke Sato. Linear Algebraic Characterization of Logic Programs. In Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM 2017), LNAI 10412, pages 520–533. Springer-Verlag, 2017.

[Sato, 2017a] Taisuke Sato. A linear algebraic approach to Datalog evaluation. Theory and Practice of Logic Programming (TPLP), 17(3):244–265, 2017.

[Sato, 2017b] Taisuke Sato. Embedding Tarskian semantics in vector spaces. In AAAI-17 Workshop on Symbolic Inference and Optimization (SymInfOpt-17), 2017.

[Trouillon et al., 2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), pages 2071–2080, 2016.

[Widdows and Cohen, 2015] D. Widdows and T. Cohen. Reasoning with Vectors: A Continuous Model for Fast Robust Inference. Logic Journal of the IGPL, 23(2):141–173, 2015.

[Yang et al., 2015] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations (ICLR 2015), 2015.