# Practical TBox Abduction Based on Justification Patterns

Jianfeng Du (1,2), Hai Wan (3,4), Huaguan Ma (3)

1. Collaborative Innovation Center for 21st-century Maritime Silk Road Studies, Guangdong University of Foreign Studies, Guangzhou 510420, P.R. China
2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, P.R. China (jfdu@gdufs.edu.cn)
3. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, P.R. China (wanhai@mail.sysu.edu.cn)
4. Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, P.R. China

**Abstract.** TBox abduction explains why an observation is not entailed by a TBox by computing multiple sets of axioms, called explanations, such that each explanation does not entail the observation alone, while appending an explanation to the TBox renders the observation entailed without introducing incoherence. Considering that practical explanations in TBox abduction are likely to mimic minimal explanations for TBox entailments, we introduce admissible explanations, which are subsets of those justifications for the observation that are instantiated from a finite set of justification patterns. A justification pattern is obtained from a minimal set of axioms responsible for a certain atomic concept inclusion by replacing all concept (resp. role) names with concept (resp. role) variables. The number of admissible explanations is finite but can still be so large that computing all admissible explanations is impractical. Thus, we introduce a variant of subset-minimality, written ds-minimality, which prefers fresh (concept or role) names to existing names. We propose efficient methods for computing all admissible ds-minimal explanations and for computing all justification patterns, respectively. Experimental results demonstrate that combining the proposed methods yields a practical approach to TBox abduction.
(Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).)

## Introduction

Ontologies have been used in many real-life applications, including e-commerce, medical informatics, bioinformatics, and the Semantic Web. As a popular formalism for expressing ontologies, description logics (DLs) (Baader et al. 2003) underpin the standard Web Ontology Language (OWL) (Horrocks, Patel-Schneider, and van Harmelen 2003). A DL ontology is often expressed as a knowledge base consisting of schema information in the TBox and data information in the ABox.

TBox abduction, advocated in (Elsenbroich, Kutz, and Sattler 2006), is a pragmatic formalism for abductive reasoning in DLs. Given a coherent TBox and an observation, i.e., a concept inclusion not entailed by the TBox, TBox abduction explains the observation by computing multiple sets of axioms, called explanations, such that each explanation does not entail the observation alone, while appending an explanation to the TBox renders the observation entailed without introducing incoherence.

TBox abduction is particularly useful in handling missing is-a relations (Lambrix, Dragisic, and Ivanova 2012; Lambrix and Liu 2013). An is-a relation is a concept inclusion between two concept names, called an atomic concept inclusion (ACI) in this paper. Simply adding the missing is-a relations to the TBox may not be a proper way to revise the TBox, since some crucial axioms may still be missing. A better way is to find essential causes of the missing is-a relations and add the axioms in these causes to the TBox. Take a TBox T1 consisting of the following axioms α1, ..., α6 for example.

- α1: Grand_Father ⊑ Man
- α2: Grand_Father ⊑ ∃has_Child.∃has_Child.Human
- α3: Grand_Mother ⊑ Woman
- α4: Grand_Mother ⊑ ∃has_Child.∃has_Child.Human
- α5: Man ⊓ ∃has_Child.Human ⊑ Father
- α6: Woman ⊓ ∃has_Child.Human ⊑ Mother

Suppose there is a missing is-a relation Grand_Father ⊑ Father and we want to make it entailed by T1. Although simply adding this relation to T1 will work, this addition cannot make another missing is-a relation, Grand_Mother ⊑ Mother, entailed by T1. This means that such an addition does not sufficiently improve the quality of T1. In contrast, adding a crucial axiom ∃has_Child.Human ⊑ Human to T1 will make both missing is-a relations entailed by T1. TBox abduction is required to find such crucial axioms.

Most existing approaches to TBox abduction adopt simple representation languages for explanations, such as ACIs (Lambrix, Dragisic, and Ivanova 2012; Lambrix and Liu 2013; Wei-Kleiner, Dragisic, and Lambrix 2014) or slightly extended ACIs that allow atomic negation and unqualified existential or universal restrictions (Halland, Britz, and Klarman 2014). These representation languages may discard essential causes of non-entailments. For the previous example, if we disallow qualified existential restrictions, we cannot discover the explanation {∃has_Child.Human ⊑ Human} for the missing is-a relation Grand_Father ⊑ Father in T1. A more practical way is to allow more DL constructors, including qualified existential restriction, to appear in explanations. However, a representation language supporting qualified existential restriction yields infinitely many explanations, since it allows infinitely many nested role names to appear in a concept. To guarantee a finite space for explanations without losing DL expressivity, we consider patterns.

All entailments have minimal explanations of limited forms, which are often referred to as justifications. A justification is a minimal set of axioms responsible for the entailment. It can be lifted to a justification pattern by replacing all different concept (resp. role) names with different concept (resp. role) variables. Consider a TBox T2 obtained from the aforementioned TBox T1 by adding α7: ∃has_Child.Human ⊑ Human. A justification for Grand_Mother ⊑ Mother in T2 is {α3, α4, α6, α7}, which can be lifted to the justification pattern Jp = {X ⊑ Z, X ⊑ ∃U.∃U.W, Z ⊓ ∃U.W ⊑ Y, ∃U.W ⊑ W} for the ACI template X ⊑ Y.

To mimic minimal explanations for entailments, it can be assumed that a practical explanation E for a non-entailment α in a TBox T is a subset of some justification for α in T ∪ E. Considering that any justification can be instantiated from a justification pattern by a ground substitution that replaces all different concept (resp. role) variables with different concept (resp. role) names, we assume that only a finite set P of justification patterns occurring frequently in the domain of interest is used to yield such an explanation E. We call E an admissible explanation w.r.t. P. For example, given the aforementioned T1 and Jp, {∃has_Child.Human ⊑ Human} is an admissible explanation w.r.t. {Jp} for both observations Grand_Father ⊑ Father and Grand_Mother ⊑ Mother. This shows that finding admissible explanations is able to discover essential causes of non-entailments.

An admissible explanation is allowed to involve fresh concept or role names that appear neither in the TBox nor in the observation. These fresh concept (resp. role) names can be interpreted as arbitrary concept (resp. role) names in a domain whose vocabulary is larger than that of the given TBox. Although the set of different admissible explanations (up to renaming fresh concept or role names) is finite, the number of different admissible explanations can still be so large that computing all of them is impractical. To compute fewer admissible explanations, we introduce a novel notion of minimality, called ds-minimality.
By treating fresh (concept or role) names as variables, we call an explanation E ds-minimal if there is no explanation E′ such that E′ ⪯ds E and E ⋠ds E′, where we write E′ ⪯ds E if there is a distinguishable substitution θ for E′, one that maps different fresh names to different fresh names or to different existing names not occurring in E′, such that E′θ ⊆ E. We simply call an admissible ds-minimal explanation a justification pattern based (JP-based) explanation.

The primary challenge in computing JP-based explanations lies in discovering justification patterns. Computing all justifications in a TBox before discovering justification patterns is infeasible. We propose an efficient method that alternately computes justifications and justification patterns, so that justification patterns can be generated as early as possible. The method calls a DL reasoner as a black box and applies to any DL. Moreover, we propose an efficient method for computing all JP-based explanations from given justification patterns in a coherent TBox. To show that the proposed methods can constitute a practical approach to TBox abduction, we conduct experiments on ten coherent TBoxes with various complexity. Full proofs are available at http://www.dataminingcenter.net/tboxabd/AAAI17.pdf.

## Preliminaries

We briefly introduce DLs by taking the DL SROIQ (Horrocks, Kutz, and Sattler 2006) as an example. SROIQ is an expressive DL corresponding to OWL 2 DL, the most expressive decidable species in the OWL family. A SROIQ ontology is composed of a TBox and an ABox, both of which are sets of axioms. TBox axioms include concept inclusions C ⊑ D, complex role inclusions r1 ∘ ... ∘ rn ⊑ s, role disjointness assertions Dis(r, s), and role property assertions that are expressible by other axioms: a role r being symmetric amounts to r⁻ ⊑ r; r being transitive amounts to r ∘ r ⊑ r; r being reflexive amounts to ⊤ ⊑ ∃r.Self; r being irreflexive amounts to ∃r.Self ⊑ ⊥.
The semantics of SROIQ is given by interpretations I = (ΔI, ·I), where ΔI is the domain of I and ·I is a mapping function. A concept inclusion C ⊑ D (resp. a complex role inclusion r1 ∘ ... ∘ rn ⊑ s, or a role disjointness assertion Dis(r, s)) is said to be satisfied by an interpretation I if C^I ⊆ D^I (resp. r1^I ∘ ... ∘ rn^I ⊆ s^I, or r^I ∩ s^I = ∅). An interpretation I is called a model of a TBox T if it satisfies all axioms in T. A concept C is said to be unsatisfiable in T if C^I = ∅ for all models I of T. A TBox T is said to be coherent if all concept names in it are satisfiable. An axiom α is said to be entailed by T, or an entailment of T, written T ⊨ α, if α is satisfied by all models of T. Given an entailment α of T, a subset J of T is called a justification for α in T if J ⊨ α and J′ ⊭ α for all proper subsets J′ of J. By Jst(α, T) we denote the set of justifications for α in T.

## The Semantics for JP-Based Explanations

Given a coherent TBox and an observation, i.e., a concept inclusion not entailed by the TBox, TBox abduction explains the observation by computing explanations for it, formally defined below. The requirement that an explanation for a concept inclusion C ⊑ D be coherent prevents meaningless explanations such as {C ⊑ ⊥} from being considered.

**Definition 1** Given a coherent TBox T and a concept inclusion α such that T ⊭ α and T ∪ {α} is coherent, an explanation E for α in T is a set of axioms such that T ∪ E ⊨ α, E ⊭ α (we simply say E is nontrivial) and T ∪ E is coherent (we simply say E is coherent).

W.l.o.g. we assume that observations in TBox abduction are ACIs. This assumption does not impair the applicability of TBox abduction, because the set of explanations for C ⊑ D in T, for C and D general concepts, coincides with the set of explanations for P_C ⊑ P_D in T ∪ {C ⊑ P_C, P_D ⊑ D}, where P_C and P_D are globally unique fresh concept names. To guarantee a finite space for explanations without losing DL expressivity, we consider patterns for explanations.
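The three conditions of Definition 1 are directly checkable given a reasoner. Below is a small Python sketch of our own (not the paper's implementation), with toy stand-in oracles: `entails` chains atomic implications over (premise, conclusion) pairs, and `coherent` is trivially true in this propositional toy setting; a real implementation would call a DL reasoner instead.

```python
def is_explanation(entails, coherent, T, E, alpha):
    """Definition 1: T ∪ E entails alpha, E alone does not (nontrivial),
    and T ∪ E is coherent. Oracles are passed in as black boxes."""
    return (entails(T | E, alpha)
            and not entails(E, alpha)
            and coherent(T | E))

def entails(axioms, goal):
    """Toy oracle: goal (A, B) holds if B is reachable from A by chaining."""
    src, dst = goal
    reached, changed = {src}, True
    while changed:
        changed = False
        for p, c in axioms:
            if p in reached and c not in reached:
                reached.add(c)
                changed = True
    return dst in reached

coherent = lambda axioms: True  # coherence never fails in this toy setting

T = {("A", "C")}
assert is_explanation(entails, coherent, T, {("C", "B")}, ("A", "B"))
# Adding the observation itself is trivial, hence rejected:
assert not is_explanation(entails, coherent, T, {("A", "B")}, ("A", "B"))
```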
We notice that justifications for an entailment are irreducible sets of axioms that are responsible for the entailment. Hence we assume that an explanation for an ACI in TBox abduction has the same pattern as a justification for the ACI. To define patterns for justifications, we introduce variational axioms, in which variables may appear. A variational axiom is obtained from an axiom by replacing some concept (resp. role) names with concept (resp. role) variables. It is called a fully variational axiom if no concept or role names appear in it. A fully variational axiom does not alter the top concept ⊤, the bottom concept ⊥, or the individuals in its original axiom.

To make a pattern generalize as many justifications for ACIs as possible, we use a set of fully variational axioms to denote a pattern for justifications. This pattern should have some correspondence to a justification. Thus, we introduce substitutions for variational axioms, which map concept variables to concept names or other concept variables, and map role variables to role names or other role variables. A substitution is said to be ground if it maps concept (resp. role) variables to concept (resp. role) names only. One might only require that a pattern for justifications generalize a justification by applying some ground substitution. However, this restriction does not reject arbitrarily large patterns. For example, consider the justification J = {A ⊑ B} for the ACI A ⊑ B. It can be seen that Jp = {X1 ⊑ Y1, X2 ⊑ Y2, ..., Xn ⊑ Yn}, for n an arbitrarily large integer, is a pattern generalizing J, because θ = {Xi ↦ A, Yi ↦ B | 1 ≤ i ≤ n} is a ground substitution for Jp such that Jpθ = J. To reject arbitrarily large patterns, we introduce differentiated substitutions, which map different concept (resp. role) variables to different concept (resp. role) names. Intuitively, a differentiated substitution is a ground substitution that keeps a one-to-one correspondence between variables and names.
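To make the two substitution classes concrete, here is a small Python sketch (our own illustration, not the authors' implementation) in which a substitution is a dict from variable names to names, and the sorts of variables and names are supplied explicitly:

```python
def is_ground(theta, concept_vars, role_vars, concept_names, role_names):
    """Ground: every concept (resp. role) variable maps to a concept
    (resp. role) name."""
    return (all(theta.get(v) in concept_names for v in concept_vars)
            and all(theta.get(v) in role_names for v in role_vars))

def is_differentiated(theta, concept_vars, role_vars, concept_names, role_names):
    """Differentiated: ground, and different variables map to different
    names (a one-to-one correspondence per sort)."""
    if not is_ground(theta, concept_vars, role_vars, concept_names, role_names):
        return False
    c_imgs = [theta[v] for v in concept_vars]
    r_imgs = [theta[v] for v in role_vars]
    return len(set(c_imgs)) == len(c_imgs) and len(set(r_imgs)) == len(r_imgs)

# The substitution that collapses the arbitrarily large pattern above
# (n = 2 here) is ground but not differentiated:
theta = {"X1": "A", "X2": "A", "Y1": "B", "Y2": "B"}
cvars, cnames = {"X1", "X2", "Y1", "Y2"}, {"A", "B"}
assert is_ground(theta, cvars, set(), cnames, set())
assert not is_differentiated(theta, cvars, set(), cnames, set())
```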
We require that a pattern generalize at least one justification by applying some differentiated substitution. Moreover, an ideal pattern should generalize justifications only; hence we also require that a pattern for the ACI template X ⊑ Y preserve the entailment of Xσ ⊑ Yσ for all ground substitutions σ. We call a pattern satisfying the above restrictions a justification pattern.

**Definition 2** A set Jp of fully variational axioms is called a justification pattern for X ⊑ Y in T, for X and Y different concept variables, if (1) there is a differentiated substitution θ for Jp such that Jpθ ∈ Jst(Xθ ⊑ Yθ, T), and (2) Jpσ ⊨ Xσ ⊑ Yσ for all ground substitutions σ for Jp.

Consider the pattern Jp = {X1 ⊑ Y1, X2 ⊑ Y2, ..., Xn ⊑ Yn} for n ≥ 2 again. As required, Definition 2 prevents Jp from being a justification pattern for any Xi ⊑ Yj (where 1 ≤ i, j ≤ n) in any TBox T.

By lift(S, A, B) we denote the set of fully variational axioms obtained from a set S of axioms by replacing the concept name A with X, the concept name B with Y, every other concept name C with a concept variable Z_C, and every role name r with a role variable U_r. The following proposition provides a method for generating justification patterns for the ACI template X ⊑ Y from justifications for ACIs.

**Proposition 1** Let T be a TBox, A ⊑ B an ACI entailed by T, and J a justification for A ⊑ B in T. Then lift(J, A, B) is a justification pattern for X ⊑ Y in T.

To make explanations E for an ACI α in a coherent TBox T mimic the way justification patterns are generated, we restrict E to be a subset of a certain justification for α in T ∪ E, where this justification is obtained from a given justification pattern by applying some differentiated substitution.

**Definition 3** Given a coherent TBox T, a set P of justification patterns for X ⊑ Y, and an ACI observation α such that T ⊭ α and T ∪ {α} is coherent, an explanation E for α in T is said to be admissible w.r.t. P if there is a justification pattern Jp ∈ P and a differentiated substitution θ for Jp such that α = Xθ ⊑ Yθ, E ⊆ Jpθ and Jpθ ∈ Jst(α, T ∪ E).

**Example 1** Consider the justification pattern Jp = {X ⊑ Z, X ⊑ ∃U.∃U.W, Z ⊓ ∃U.W ⊑ Y, ∃U.W ⊑ W} for X ⊑ Y. Suppose n ≥ 2, T = {A ⊑ ∃r.∃r.C, ∃r.C ⊑ C, D1 ⊑ B} ∪ {Di ⊑ C | 2 ≤ i ≤ n}, P = {Jp} and the observation is A ⊑ B. Then E = {A ⊑ D1, D1 ⊓ ∃r.C ⊑ B} is an admissible explanation for A ⊑ B in T w.r.t. P, because T ∪ E ⊨ A ⊑ B, E ⊭ A ⊑ B, T ∪ E is coherent, and θ = {X ↦ A, Y ↦ B, Z ↦ D1, W ↦ C, U ↦ r} is a differentiated substitution for Jp such that A ⊑ B = Xθ ⊑ Yθ, E ⊆ Jpθ and Jpθ ∈ Jst(A ⊑ B, T ∪ E).

Users of abduction often prefer explanations E that are subset-minimal, i.e., such that no proper subset of E is an explanation. Admissible explanations are not necessarily subset-minimal. For example, the admissible explanation E in Example 1 is not subset-minimal, because E′ = {A ⊑ D1} is an explanation for A ⊑ B in T such that E′ ⊂ E. We call a subset-minimal explanation an admissible subset-minimal explanation if it is also admissible.

Since admissible subset-minimal explanations do not contain axioms in the given TBox, it is natural to allow fresh names to appear in these explanations. We call a concept or role name a fresh name if it appears neither in the given TBox nor in the given observation, and an existing name otherwise. Consider Example 1: E1 = {A ⊑ C_Z, C_Z ⊓ ∃r.C ⊑ B} is an admissible subset-minimal explanation for A ⊑ B in T w.r.t. P, where C_Z is a fresh concept name.

The introduction of fresh names enables a compact representation of all admissible subset-minimal explanations. We treat fresh names as variables and define a substitution for explanations as a mapping from fresh names to existing or fresh names. The set of admissible subset-minimal explanations can be compacted using substitutions. Consider Example 1 again: Ei = {A ⊑ Di, Di ⊓ ∃r.C ⊑ B} for i ∈ {2, ..., n} are also admissible subset-minimal explanations for A ⊑ B in T w.r.t. P.
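Instantiating a pattern with a substitution, as Definition 3 and Example 1 do, is a plain variable replacement. Below is a sketch under a toy nested-tuple encoding of our own (not the paper's data structures), where `sub`, `and` and `some` tag concept inclusion, conjunction and existential restriction:

```python
def apply_sub(term, theta):
    """Replace variable leaves of a nested-tuple axiom according to theta."""
    if isinstance(term, tuple):
        return tuple(apply_sub(t, theta) for t in term)
    return theta.get(term, term)

def instantiate(pattern, theta):
    """Jp·theta: apply theta to every axiom of the pattern."""
    return {apply_sub(ax, theta) for ax in pattern}

# Jp = {X ⊑ Z, X ⊑ ∃U.∃U.W, Z ⊓ ∃U.W ⊑ Y, ∃U.W ⊑ W} from Example 1
Jp = {
    ("sub", "X", "Z"),
    ("sub", "X", ("some", "U", ("some", "U", "W"))),
    ("sub", ("and", "Z", ("some", "U", "W")), "Y"),
    ("sub", ("some", "U", "W"), "W"),
}
theta = {"X": "A", "Y": "B", "Z": "D1", "W": "C", "U": "r"}
J = instantiate(Jp, theta)

# E = {A ⊑ D1, D1 ⊓ ∃r.C ⊑ B} is a subset of Jp·theta, as Definition 3 requires:
E = {("sub", "A", "D1"),
     ("sub", ("and", "D1", ("some", "r", "C")), "B")}
assert E <= J
```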
The set of admissible subset-minimal explanations {Ei | 1 ≤ i ≤ n} for A ⊑ B in T w.r.t. P can be compacted as {E1} and recovered from {E1} by applying certain substitutions. To reduce the number of explanations that need to be computed, one may consider computing only those explanations that cannot be obtained from others by applying substitutions. However, the verification of such explanations needs to consider longer explanations (i.e., explanations of larger cardinality) and is thus inefficient. For example, the explanation {A ⊑ ∃r.A} can be obtained from a longer explanation {A ⊑ ∃r.C_X1, C_X1 ⊑ ∃r.C_X2, ..., C_X(n−1) ⊑ ∃r.C_Xn} by applying θ = {C_Xi ↦ A | 1 ≤ i ≤ n}, where n ≥ 2 and the C_Xi are fresh.

To improve efficiency, we introduce the notion of distinguishable substitutions. A distinguishable substitution θ for E is a substitution for E that maps different fresh names to different fresh names or to different existing names not occurring in E. Intuitively, the set of concept (resp. role) names in Eθ has a one-to-one correspondence to the set of concept (resp. role) names in E. For the above example, θ is not a distinguishable substitution for the longer explanation, since it maps C_X1 to A, which appears in that explanation. For two explanations E and E′, we write E′ ⪯ds E if there is a distinguishable substitution θ for E′ such that E′θ ⊆ E, and write E′ ⋠ds E otherwise. We call an explanation E for an observation α in a TBox T ds-minimal if there is no explanation E′ for α in T such that E′ ⪯ds E and E ⋠ds E′. The following proposition shows that ds-minimality implies subset-minimality.

**Proposition 2** A ds-minimal explanation for an axiom α in a TBox T is a subset-minimal explanation for α in T.

Checking ds-minimality does not need to consider longer explanations. In fact, it only needs to consider n explanations that are not longer than E, where n is the sum of the number of axioms in E and the number of existing names in E. To show this checking method, we introduce two variant sets for a set S of axioms.
First, we define a closest lift set of S as a set of axioms obtained from S by replacing one existing name with a fresh name not occurring in S. By lifts1(S) we denote the set of closest lift sets of S. For example, there are two closest lift sets of {A ⊑ ∃r.A}, namely {A ⊑ ∃r_U.A} and {C_X ⊑ ∃r.C_X}, where r_U is a fresh role name and C_X is a fresh concept name. Second, we define a closest proper subset of S as a proper subset of S that consists of all but one of the axioms in S. By subs1(S) we denote the set of closest proper subsets of S. The following proposition shows that checking ds-minimality can be reduced to linearly many entailment tests w.r.t. the size of E.

**Proposition 3** An explanation E for an axiom α in a TBox T is a ds-minimal explanation for α in T if and only if T ∪ E′ ⊭ α for all E′ ∈ subs1(E) ∪ lifts1(E).

By combining ds-minimality with admissibility, we obtain a new class of explanations named justification pattern based (simply JP-based) explanations, defined below.

**Definition 4** A ds-minimal explanation for α in T is said to be a justification pattern based (simply JP-based) explanation for α in T w.r.t. P if it is admissible w.r.t. P.

We call two explanations different if they cannot be converted to each other by renaming fresh names. We propose to compute all different JP-based explanations rather than all different admissible subset-minimal explanations, because (1) by Proposition 2, a JP-based explanation must be admissible and subset-minimal, and (2) the number of different JP-based explanations can be much smaller than the number of different admissible subset-minimal explanations.

## How to Compute JP-Based Explanations

Proposition 1 indicates that we are able to compute justification patterns from a given TBox by first computing all justifications for ACIs entailed by the TBox and then lifting the computed justifications to justification patterns.
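The lifting step of Proposition 1 is a uniform renaming of names to variables. A sketch over a toy nested-tuple encoding of our own (the `role_names` argument supplies the sorts; `sub`, `and`, `some` are constructor tags, not names):

```python
STRUCT_TAGS = {"sub", "and", "or", "some", "all"}  # constructor tags, not names

def lift(J, A, B, role_names):
    """lift(J, A, B): A -> X, B -> Y, any other concept name C -> Z_C,
    any role name r -> U_r; different names get different variables."""
    def go(t):
        if isinstance(t, tuple):
            return tuple(go(x) for x in t)
        if t in STRUCT_TAGS:
            return t
        if t == A:
            return "X"
        if t == B:
            return "Y"
        return "U_" + t if t in role_names else "Z_" + t
    return {go(ax) for ax in J}

# Lifting the introduction's justification {α3, α4, α6, α7} for
# Grand_Mother ⊑ Mother yields the pattern Jp (modulo variable naming):
J = {
    ("sub", "Grand_Mother", "Woman"),
    ("sub", "Grand_Mother", ("some", "has_Child", ("some", "has_Child", "Human"))),
    ("sub", ("and", "Woman", ("some", "has_Child", "Human")), "Mother"),
    ("sub", ("some", "has_Child", "Human"), "Human"),
}
Jp = lift(J, "Grand_Mother", "Mother", {"has_Child"})
assert ("sub", "X", "Z_Woman") in Jp
assert ("sub", ("some", "U_has_Child", "Z_Human"), "Z_Human") in Jp
```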
However, this method is infeasible, because computing all justifications for even a single ACI is already time-consuming (Kalyanpur et al. 2007; Du, Qi, and Fu 2014), even when optimized by modularization (Suntisrivaraporn et al. 2008). A practical method should compute justification patterns as early as possible. Thus, we propose a method that alternately computes justifications and justification patterns, formally shown in Algorithm 1. This method extends the method proposed in (Du, Qi, and Fu 2014) by adapting it into an incremental one and by embedding the computation of justification patterns.

Algorithm 1: Compute_Justification_Patterns(T)

     1: P ← ∅    // P stores justification patterns for X ⊑ Y
     2: for all ACIs A ⊑ B such that A ≠ B and T ⊨ A ⊑ B do
     3:   S ← {Jpθ | Jp ∈ P, θ is a differentiated substitution for Jp such that Xθ = A, Yθ = B, Jpθ ⊆ T and J ⊭ A ⊑ B for all J ∈ subs1(Jpθ)}
     4:   while S = ∅ (in which case H is set to ∅) or there is a minimal hitting set H for S such that T \ H ⊨ A ⊑ B do
     5:     J ← Find_Justification(∅, T \ H, A ⊑ B)
     6:     Jp ← lift(J, A, B)
     7:     P ← P ∪ {Jp}
     8:     S ← S ∪ {Jpθ | θ is a differentiated substitution for Jp such that Xθ = A, Yθ = B, Jpθ ⊆ T and J ⊭ A ⊑ B for all J ∈ subs1(Jpθ)}
     9:   end while
    10: end for
    11: return P

    Function Find_Justification(Su, Sc, A ⊑ B)
    // Returns a minimal subset S of Sc such that Su ∪ S ⊨ A ⊑ B.
     1: if Sc = ∅ or Su ⊨ A ⊑ B then
     2:   return ∅
     3: else if |Sc| = 1 then
     4:   return Sc
     5: else
     6:   divide Sc into two disjoint subsets S1 and S2 such that S1 ∪ S2 = Sc, S1 ∩ S2 = ∅ and 0 ≤ |S1| − |S2| ≤ 1
     7:   Δ2 ← Find_Justification(Su ∪ S1, S2, A ⊑ B)
     8:   Δ1 ← Find_Justification(Su ∪ Δ2, S1, A ⊑ B)
     9:   return Δ1 ∪ Δ2
    10: end if

The main function Compute_Justification_Patterns(T) in Algorithm 1 handles the entailed ACIs one by one. For an ACI A ⊑ B entailed by T, it first computes all justifications for A ⊑ B that can be retrieved from already computed justification patterns for X ⊑ Y, before computing new justifications (line 3).
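The Find_Justification routine is a divide-and-conquer extraction in the style of (Junker 2004). Here is a runnable Python sketch of our own with a pluggable black-box entailment oracle; the toy oracle below, which chains atomic implications, merely stands in for a DL reasoner:

```python
def find_justification(entails, su, sc, goal):
    """Return a minimal subset S of sc with entails(su | S, goal),
    assuming entails(su | set(sc), goal) holds."""
    su, sc = set(su), list(sc)
    if not sc or entails(su, goal):
        return set()
    if len(sc) == 1:
        return set(sc)
    mid = (len(sc) + 1) // 2          # so that |S1| - |S2| is 0 or 1
    s1, s2 = sc[:mid], sc[mid:]
    d2 = find_justification(entails, su | set(s1), s2, goal)
    d1 = find_justification(entails, su | d2, s1, goal)
    return d1 | d2

def entails(axioms, goal):
    """Toy oracle: axioms are (premise, conclusion) pairs over atoms;
    goal (A, B) is entailed if B is reachable from A by chaining."""
    src, dst = goal
    reached, changed = {src}, True
    while changed:
        changed = False
        for p, c in axioms:
            if p in reached and c not in reached:
                reached.add(c)
                changed = True
    return dst in reached

T = [("A", "C"), ("C", "D"), ("A", "E"), ("D", "B"), ("E", "F")]
assert find_justification(entails, set(), T, ("A", "B")) == \
       {("A", "C"), ("C", "D"), ("D", "B")}
```

Minimality follows from the recursion: each axiom kept is one without which, relative to the already fixed rest, the entailment fails.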
It then incrementally computes a new justification for A ⊑ B using the same search strategy as in (Du, Qi, and Fu 2014). A new justification for A ⊑ B exists if and only if there is a minimal hitting set (MHS) H for the set of computed justifications such that T \ H ⊨ A ⊑ B, where an MHS H for a set S of sets of axioms is a minimal set of axioms such that H ∩ S′ ≠ ∅ for all S′ ∈ S. When there is an MHS H such that T \ H ⊨ A ⊑ B, a new justification for A ⊑ B in T is computed from T \ H by applying the divide-and-conquer method developed in (Junker 2004; Du, Qi, and Fu 2014) (line 5). After a new justification J is computed, it is lifted to a justification pattern Jp for X ⊑ Y in T (line 6). Afterwards, the main function retrieves more justifications for A ⊑ B from Jp (line 8) and tries to find a new justification (line 4). Algorithm 1 eventually computes a set of justification patterns for X ⊑ Y that generalizes all justifications for all ACI entailments, as shown below.

**Theorem 1** Compute_Justification_Patterns(T) returns a set P of justification patterns for X ⊑ Y in T such that for every ACI A ⊑ B entailed by T and every justification J for A ⊑ B in T, there exist a justification pattern Jp ∈ P and a differentiated substitution θ for Jp such that Jpθ = J.

After a set P of justification patterns for X ⊑ Y in T is obtained, we are able to compute all JP-based explanations for A ⊑ B in T w.r.t. P by bipartitioning every justification pattern in P, treating one part of the pattern as a subset of T and the other part as a JP-based explanation. Given a justification pattern Jp, we define a bipartition of Jp as a pair (J1, J2) such that J1 ∩ J2 = ∅ and J1 ∪ J2 = Jp. By bipart(Jp) we denote the set of bipartitions of Jp. For a set S of variational axioms, by inst(S) we denote the set of axioms obtained from S by replacing all different concept (resp. role) variables with different fresh concept (resp. role) names.
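The bipart and inst operations just defined can be sketched directly (again over a toy nested-tuple encoding of our own; the variable set and the fresh-name generator are passed in explicitly):

```python
from itertools import combinations

def bipart(Jp):
    """All bipartitions (J1, J2) of Jp with J1 ∪ J2 = Jp and J1 ∩ J2 = ∅."""
    axioms = list(Jp)
    out = []
    for k in range(len(axioms) + 1):
        for j1 in combinations(axioms, k):
            j1 = set(j1)
            out.append((j1, set(axioms) - j1))
    return out

def inst(S, is_var, fresh):
    """Replace each distinct variable with a distinct fresh name."""
    def go(t):
        if isinstance(t, tuple):
            return tuple(go(x) for x in t)
        return fresh(t) if is_var(t) else t
    return {go(ax) for ax in S}

Jp = {("sub", "X", "Z"), ("sub", ("some", "U", "W"), "W")}
# A pattern with m axioms has 2^m bipartitions:
assert len(bipart(Jp)) == 2 ** len(Jp)
E = inst({("sub", "X", "Z")},
         lambda t: t in {"X", "Y", "Z", "U", "W"},
         lambda v: "N_" + v)          # distinct variables -> distinct fresh names
assert E == {("sub", "N_X", "N_Z")}
```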
The following theorem provides a method for computing all different JP-based explanations.

**Theorem 2** Given a coherent TBox T, a set P of justification patterns for X ⊑ Y in T, and an observation A ⊑ B such that T ⊭ A ⊑ B and T ∪ {A ⊑ B} is coherent, the set of different JP-based explanations for A ⊑ B in T w.r.t. P amounts, up to renaming fresh names, to S = {inst(J1θ) | Jp ∈ P, (J1, J2) ∈ bipart(Jp) and θ is a differentiated substitution for J2 ∪ {X ⊑ Y} such that Xθ = A, Yθ = B, J2θ ⊆ T, inst(J1θ) ⊭ A ⊑ B, T ∪ inst(J1θ) is coherent, J2θ ∪ inst(J1θ) ∈ Jst(A ⊑ B, T ∪ inst(J1θ)), and no explanation E for A ⊑ B in T fulfills E ⪯ds inst(J1θ) and inst(J1θ) ⋠ds E}.

For efficiency, the above method can be optimized by simplifying the tests for non-triviality, ds-minimality and admissibility, as shown in the following corollary.

**Corollary 1** The set of different JP-based explanations for A ⊑ B in T w.r.t. P amounts, up to renaming fresh names, to {inst(J1θ) | Jp ∈ P, (J1, J2) ∈ bipart(Jp) and θ is a differentiated substitution for J2 ∪ {X ⊑ Y} such that J2 ⊂ Jp, Xθ = A, Yθ = B, J2θ ⊆ T, T ∪ inst(J1θ) is coherent, J ∪ inst(J1θ) ⊭ A ⊑ B for all J ∈ subs1(J2θ), and T ∪ E′ ⊭ A ⊑ B for all E′ ∈ lifts1(inst(J1θ)) ∪ subs1(inst(J1θ))}.

## Experimental Evaluation

We implemented both the method for computing all justification patterns and the method for computing all JP-based explanations in Java, using the Pellet (Sirin et al. 2007) API (version 2.3.1) to discover and check entailments, and using MySQL to manage justification patterns. The implementation of the former method computes justifications in the syntactic locality-based module (Grau et al. 2007) for the left-hand side of the given entailment rather than in the whole TBox; this optimization has been proved sound and complete (Suntisrivaraporn et al. 2008). The implementation of the latter method computes JP-based explanations for an observation from lower to higher levels, where at level k it computes all JP-based explanations of cardinality k.
This enables us to stop computing JP-based explanations at a certain level when time is limited.

We collected ten coherent TBoxes; some of their statistics are reported in Table 1. The first TBox is the well-known LUBM (Guo, Pan, and Heflin 2005) TBox. The next two are the UOBM (Ma et al. 2006) TBoxes, which extend LUBM by adding OWL Lite features and OWL DL features, respectively. The next four were collected from http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library; they describe family relationships, the philosophical domain, wines and an early prototype GALEN model, respectively. The last three TBoxes are large TBoxes about medical extensions and were collected from http://www.opengalen.org/download/opengalen8-owl-sources.zip.

Table 1: The characteristics of every test TBox

| TBox | DL Expressivity | #C | #R | #A | #E |
|---|---|---|---|---|---|
| LUBM | ALEHI+(D) | 43 | 32 | 93 | 76 |
| UOBM-Lite | ALEHIN+(D) | 52 | 43 | 145 | 86 |
| UOBM-DL | SHOIN(D) | 69 | 44 | 206 | 113 |
| generations | ALCOIF | 18 | 4 | 38 | 45 |
| wine | SHOIN(D) | 77 | 14 | 657 | 322 |
| philosurfical | AL | 377 | 313 | 1465 | 2981 |
| not-galen | ALEHF+ | 3097 | 413 | 5771 | 32475 |
| physiology | ALEI | 2129 | 134 | 2256 | 2146 |
| drugs | ALEHIF(D) | 4258 | 89 | 5042 | 17103 |
| pathology | ALEI(D) | 7378 | 258 | 7159 | 14235 |

Note: #C is the number of concept names; #R is the number of role names; #A is the number of axioms in the TBox; #E is the number of ACIs entailed by the TBox. The DL expressivity was detected by Pellet (version 2.3.1).

To show whether the proposed methods can constitute a practical approach to TBox abduction, we carried out two experiments. The first computes up to the initial 1000 justification patterns in every test TBox. These justification patterns are ranked in descending order of their supports, where the support of a justification pattern Jp in a TBox T is defined as the number of justifications that can be instantiated from Jp in T.
The second experiment computes JP-based explanations of cardinality 1 and 2, respectively, for 50 randomly generated observations, using the top 100 justification patterns computed in the first experiment. Each generated observation is an ACI that is not entailed by the given TBox, while adding it to the TBox does not introduce incoherence. The top justification patterns have the largest supports in a given TBox, so the JP-based explanations derived from them are more likely than other explanations to mimic the way the TBox is modeled. We set a time limit of 10000 seconds for computing the JP-based explanations of cardinality k (where k ∈ {1, 2}) for all 50 generated observations in a test TBox. When a timeout occurs, we can still see how many top justification patterns were completely handled for all 50 generated observations. All experiments were conducted on a laptop with an Intel Dual-Core 2.60GHz CPU and 8GB RAM, running Windows 7 (64 bit), with the maximum Java heap size set to 8GB. The implemented system and the test TBoxes are available at http://www.dataminingcenter.net/tboxabd/.

The JP-Comp part of Table 2 shows the statistics for every test TBox when computing up to the initial 1000 justification patterns. If the number of justification patterns is less than 1000, the actual number is shown.
The reported execution time includes the time for loading the TBox and computing syntactic locality-based modules.

Table 2: The running statistics for every test TBox

| TBox | #P | T(s) | #TP | T1(s) | #E | #TP | T2(s) | #E |
|---|---|---|---|---|---|---|---|---|
| LUBM | 17 | *1.0 | 17 | 3.9 | 253 | 17 | 178.8 | 18898 |
| UOBM-Lite | 23 | *1.3 | 23 | 5.5 | 342 | 23 | 250.5 | 39870 |
| UOBM-DL | 44 | *2.3 | 44 | 6.1 | 255 | 44 | 290.7 | 40659 |
| generations | 31 | *0.9 | 31 | 2.7 | 182 | n/a | n/a | n/a |
| wine | 894 | *86.6 | 100 | 66.7 | 0 | 100 | 188.9 | 10948 |
| philosurfical | 12 | *7.6 | 12 | 89.7 | 1834 | 7 | 7111.4 | 724961 |
| not-galen | 1000 | 1340.9 | 100 | 72.0 | 1384 | 3 | 5637.8 | 144626 |
| physiology | 274 | *14.8 | 100 | 108.7 | 477 | 98 | 9996.8 | 398521 |
| drugs | 289 | *69.6 | 100 | 193.3 | 617 | 4 | 2749.2 | 49650 |
| pathology | 1000 | 122.8 | 100 | 112.6 | 935 | 7 | 4161.8 | 29759 |

Note: the #P and T(s) columns form the JP-Comp part; the first #TP/T1(s)/#E triple forms the JPBE1-Comp part; the second #TP/T2(s)/#E triple forms the JPBE2-Comp part. #P is the number of computed justification patterns; T is the execution time in seconds for computing up to the initial 1000 justification patterns (marked with * if all justification patterns were computed); #TP is the number of top justification patterns that are completely handled (for all 50 generated observations) within 10000 seconds; T1 (resp. T2) is the execution time in seconds for completely handling the top #TP justification patterns; #E is the total number of JP-based explanations computed for all 50 generated observations after completely handling the top #TP justification patterns.

The results show that justification patterns can be computed efficiently in expressive DLs. Except for not-galen (resp. pathology), which takes 1340.9 (resp. 122.8) seconds to compute the initial 1000 justification patterns, every other test TBox takes less than two minutes to compute all of its justification patterns. The JPBEk-Comp parts of Table 2 show the statistics for every test TBox when computing all JP-based explanations of cardinality k for the top 100 justification patterns and all 50 generated observations. For k = 1, up to the top 100 justification patterns are completely handled in at most a few minutes.
However, the setting k = 1 cannot guarantee that at least one JP-based explanation is generated (see the 0 for wine), hence it may be necessary to compute JP-based explanations of cardinality 2. The case k = 2 is inapplicable to generations, because all JP-based explanations in generations have cardinality 1. For each of the other TBoxes, when k = 2, at least several top justification patterns can be completely handled within 10000 seconds, with a number of JP-based explanations generated. These results show that a practical use of the proposed methods is to compute JP-based explanations of cardinality 1, or of cardinality 2 from several top justification patterns. The results also indicate that we may need to post-process the computed JP-based explanations by choosing several best ones from them. This can be done either by ranking the computed JP-based explanations or by exploiting an interactive debugging approach such as (Shchekotykhin and Friedrich 2010).

## Related Work

For abductive reasoning in DLs, three different general problems are advocated in (Elsenbroich, Kutz, and Sattler 2006): concept abduction (Colucci et al. 2004; Bienvenu 2008; Noia, Sciascio, and Donini 2009), ABox abduction (Klarman, Endriss, and Schlobach 2011; Du et al. 2011; Calvanese et al. 2013; Du, Wang, and Shen 2014), and TBox abduction, targeted in this work.

There are several slightly different problem settings for TBox abduction. All of them typically target computing explanations E for an observation α in a TBox T such that T ∪ E ⊨ α. In (Elsenbroich, Kutz, and Sattler 2006), T and E are allowed to be expressed in different DLs, but no method is provided for this general setting. In (Hubauer, Lamparter, and Pirker 2010), E is defined as a set of concept inclusions having specified patterns; an automata-based method for this setting is proposed, but no evaluation is provided.
In (Lambrix, Dragisic, and Ivanova 2012; Lambrix and Liu 2013; Wei-Kleiner, Dragisic, and Lambrix 2014) E is restricted to be a set of ACIs. Methods are provided for this setting for concept taxonomies (Lambrix and Liu 2013), acyclic ALC TBoxes (Lambrix, Dragisic, and Ivanova 2012) and EL++ TBoxes (Wei-Kleiner, Dragisic, and Lambrix 2014), respectively. In (Halland, Britz, and Klarman 2014) E is defined as a set of concept inclusions allowing only concept names, atomic negation, and unqualified existential or universal restrictions. A tableau-based method for this setting is proposed but no evaluation is provided. In (Koopmann and Schmidt 2015) E is restricted to be of the form {A ⊑ ∃r.(B1 ⊓ . . . ⊓ Bn)}, computed from the uniform interpolant of the negation of the observation α and a given acyclic ALCH TBox. Compared to (Hubauer, Lamparter, and Pirker 2010), our problem setting restricts the pattern of an explanation as a whole rather than the patterns of individual axioms, thus making the computation of explanations more practical. Compared to the other existing problem settings, our setting allows arbitrary DL constructors to be used in explanations. More importantly, we have provided evaluation results showing that our setting enables a practical approach to TBox abduction that works well with complex TBoxes. Conclusion and Future Work For practical TBox abduction in complex TBoxes, we made the following contributions in this paper. First, we proposed a novel class of preferred explanations in TBox abduction, namely JP-based explanations. The restriction of being derivable from justification patterns guarantees a finite space of explanations. The ds-minimality further reduces the space of preferred explanations. Second, we proposed an efficient method for computing all justification patterns from a TBox expressible in any DL. Third, we proposed an efficient method for computing all JP-based explanations from a given set of justification patterns.
Finally, we empirically showed that the proposed methods are efficient for complex TBoxes. For future work, we plan to explore real-life applications of TBox abduction based on justification patterns besides repairing missing is-a relations. One potential application is recognizing textual entailment, for which promising results from applying abduction have already been reported (Raina, Ng, and Manning 2005). We also plan to study practical criteria for ranking JP-based explanations in TBox abduction and to investigate methods for computing high-ranked JP-based explanations as early as possible. Acknowledgements This work was partly supported by National Natural Science Foundation of China under grants 61375056 and 61573386, Natural Science Foundation of Guangdong Province under grant 2016A030313292, Guangdong Province Science and Technology Plan projects under grant 2016B030305007, and Sun Yat-sen University Cultivation Project (16lgpy40). References Baader, F.; Calvanese, D.; McGuinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press. Bienvenu, M. 2008. Complexity of abduction in the EL family of lightweight description logics. In Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR), 220–230. Calvanese, D.; Ortiz, M.; Simkus, M.; and Stefanoni, G. 2013. Reasoning about explanations for negative query answers in DL-Lite. Journal of Artificial Intelligence Research 48:635–669. Colucci, S.; Noia, T. D.; Sciascio, E. D.; Donini, F. M.; and Mongiello, M. 2004. A uniform tableaux-based approach to concept abduction and contraction in ALN. In Proceedings of the 17th International Workshop on Description Logics. Du, J.; Qi, G.; Shen, Y.; and Pan, J. Z. 2011. Towards practical ABox abduction in large OWL DL ontologies.
In Proceedings of the 25th National Conference on Artificial Intelligence (AAAI), 1160–1165. Du, J.; Qi, G.; and Fu, X. 2014. A practical fine-grained approach to resolving incoherent OWL 2 DL terminologies. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM), 919–928. Du, J.; Wang, K.; and Shen, Y. 2014. A tractable approach to ABox abduction over description logic ontologies. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), 1034–1040. Elsenbroich, C.; Kutz, O.; and Sattler, U. 2006. A case for abductive reasoning over ontologies. In Proceedings of the 3rd OWLED Workshop on OWL: Experiences and Directions. Grau, B. C.; Horrocks, I.; Kazakov, Y.; and Sattler, U. 2007. Just the right amount: extracting modules from ontologies. In Proceedings of the 16th International Conference on World Wide Web (WWW), 717–726. Guo, Y.; Pan, Z.; and Heflin, J. 2005. LUBM: A benchmark for OWL knowledge base systems. Journal of Web Semantics 3(2-3):158–182. Halland, K.; Britz, A.; and Klarman, S. 2014. TBox abduction in ALC using a DL tableau. In Proceedings of the 27th International Workshop on Description Logics, 556–566. Horrocks, I.; Kutz, O.; and Sattler, U. 2006. The even more irresistible SROIQ. In Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning (KR), 57–67. Horrocks, I.; Patel-Schneider, P. F.; and van Harmelen, F. 2003. From SHIQ and RDF to OWL: the making of a web ontology language. Journal of Web Semantics 1(1):7–26. Hubauer, T.; Lamparter, S.; and Pirker, M. 2010. Automata-based abduction for tractable diagnosis. In Proceedings of the 23rd International Workshop on Description Logics. Junker, U. 2004. QUICKXPLAIN: Preferred explanations and relaxations for over-constrained problems. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI), 167–172. Kalyanpur, A.; Parsia, B.; Horridge, M.; and Sirin, E.
2007. Finding all justifications of OWL DL entailments. In Proceedings of the 6th International Semantic Web Conference (ISWC), 267–280. Klarman, S.; Endriss, U.; and Schlobach, S. 2011. ABox abduction in the description logic ALC. Journal of Automated Reasoning 46(1):43–80. Koopmann, P., and Schmidt, R. A. 2015. LETHE: saturation-based reasoning for non-standard reasoning tasks. In Informal Proceedings of the 4th International Workshop on OWL Reasoner Evaluation (ORE), 23–30. Lambrix, P., and Liu, Q. 2013. Debugging the missing is-a structure within taxonomies networked by partial reference alignments. Data & Knowledge Engineering 86:179–205. Lambrix, P.; Dragisic, Z.; and Ivanova, V. 2012. Get my pizza right: Repairing missing is-a relations in ALC ontologies. In Proceedings of the 2nd Joint International Semantic Technology Conference (JIST), 17–32. Ma, L.; Yang, Y.; Qiu, Z.; Xie, G.; Pan, Y.; and Liu, S. 2006. Towards a complete OWL ontology benchmark. In Sure, Y., and Domingue, J., eds., Proceedings of the 3rd European Semantic Web Conference (ESWC), 125–139. Noia, T. D.; Sciascio, E. D.; and Donini, F. M. 2009. A tableaux-based calculus for abduction in expressive description logics: Preliminary results. In Proceedings of the 22nd International Workshop on Description Logics. Raina, R.; Ng, A. Y.; and Manning, C. D. 2005. Robust textual inference via learning and abductive reasoning. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), 1099–1105. Shchekotykhin, K. M., and Friedrich, G. 2010. Query strategy for sequential ontology debugging. In Proceedings of the 9th International Semantic Web Conference (ISWC), 696–712. Sirin, E.; Parsia, B.; Grau, B. C.; Kalyanpur, A.; and Katz, Y. 2007. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics 5(2):51–53. Suntisrivaraporn, B.; Qi, G.; Ji, Q.; and Haase, P. 2008. A modularization-based approach to finding all justifications for OWL DL entailments.
In Proceedings of the 3rd Asian Semantic Web Conference (ASWC), 1–15. Wei-Kleiner, F.; Dragisic, Z.; and Lambrix, P. 2014. Abduction framework for repairing incomplete EL ontologies: Complexity results and algorithms. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), 1120–1127.