The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Learning Query Inseparable ELH Ontologies

Ana Ozaki, Cosimo Persia, Andrea Mazzullo
KRDB Research Centre, Free University of Bozen-Bolzano

Abstract

We investigate the complexity of learning query inseparable ELH ontologies in a variant of Angluin's exact learning model. Given a fixed data instance A* and a query language Q, we are interested in computing an ontology H that entails the same queries as a target ontology T on A*, that is, H and T are inseparable w.r.t. A* and Q. The learner is allowed to pose two kinds of questions. The first is "Does (T, A) |= q?", with A an arbitrary data instance and q a query in Q. An oracle replies to this question with "yes" or "no". In the second, the learner asks "Are H and T inseparable w.r.t. A* and Q?". If so, the learning process finishes; otherwise, the learner receives (A*, q) with q ∈ Q, (T, A*) |= q and (H, A*) ⊭ q (or vice-versa). Then, we analyse conditions under which query inseparability is preserved if A* changes. Finally, we consider the PAC learning model and a setting where the algorithms learn from a batch of classified data, limiting interactions with the oracles.

Introduction

Ontologies are a formal and popular way of representing knowledge. Taxonomies, categorisations of websites, products and their features, as well as more complex and specialised domain knowledge, can be represented with ontologies. Domain experts use ontologies while sharing and annotating information in their fields because in this way knowledge can be unambiguously understood and easily distributed. Medicine, for example, has produced large, standardised, and scalable ontologies (e.g. Galen and SNOMED CT). Broad and general so-called knowledge graphs are emerging, such as DBpedia (Bizer et al. 2009), Wikidata (Vrandečić and Krötzsch 2014), and YAGO (Suchanek, Kasneci, and Weikum 2008). An ontology enables machines to process relations and definitions and reason about that knowledge. Sharing information, formalising a domain, and making assumptions explicit are some of the main reasons for using an ontology.

Designing ontologies is a hard and error-prone task. The research community has approached the problem by developing editors that help ontology engineers to build ontologies manually (Knublauch et al. 2004) and by defining design principles (Stuckenschmidt, Parent, and Spaccapietra 2009). Even with tools, building ontologies is a laborious task that also requires expertise. An expert in designing ontologies is called an ontology engineer. Such an expert is normally familiar with the tools, languages, and techniques necessary to design an ontology through communication with domain experts. The ontology engineer must communicate with and understand the knowledge given by domain experts and then design an ontology that captures what is relevant in the domain. One of the main challenges in the process of building an ontology is that it often relies on the communication between the ontology engineer (or multiple ontology engineers) and domain experts who, in order to share knowledge, use ambiguous natural language. Indeed, errors can be made while designing an ontology due to the difficulty of sharing knowledge: the ontology engineer can misunderstand the domain experts, or they can inadvertently omit precious details.
Moreover, knowledge can be implicit, and while designing an ontology it is not easy to understand what the real relationships between concepts are and to imagine all the possible consequences of designing concepts and their relationships in a particular way. There are several aspects which can influence the difficulty of creating an ontology. Following the approach by (Konev et al. 2018; Konev, Ozaki, and Wolter 2016), we focus on the problem of finding how concepts should be logically related, assuming that the relevant vocabulary is known by the domain experts, who can share this information with the ontology engineer. In this approach, the problem of building an ontology is treated as a learning problem in which the ontology engineer plays the role of a learner and communicates with the domain experts, who play the role of a teacher (also called an oracle). We assume that (1) the domain experts know the relevant knowledge about the domain and act in a consistent way as a single teacher; (2) the vocabulary that should be used in the ontology is known by the teacher and the learner; (3) the learner can pose queries to the teacher in order to acquire missing knowledge or to check whether it has learned enough to stop learning.

The described model can be seen as an instance of Angluin's exact learning model (Angluin 1988) with membership and equivalence queries. The queries asked by the learner in order to acquire knowledge can be considered as membership queries, and the queries that ask whether the hypothesis of the learner correctly represents the relevant knowledge of the domain experts can be treated as equivalence queries. In the exact learning literature, one of the main goals is to determine the complexity of learning an abstract target. The complexity depends on the number and size of queries posed by the learner.

Table 1: Polynomial query learnability of learning frameworks. Columns: EL(H)lhs, EL(H)rhs, EL(H); rows: AQ, IQ, CQr, CQ. The entry for CQr under EL(H)lhs is marked "?" (open).

In the work by (Konev et al. 2018; Konev, Ozaki, and Wolter 2016) it has been shown that exact learnability of ontologies formulated in the very popular ELH (Baader et al. 2007a) ontology language is not possible with polynomially many polynomial size queries. Here, we investigate a more flexible setting, where the ontology does not need to be logically equivalent to the target but only inseparable from it w.r.t. a query language (Lutz and Wolter 2010) and a fixed data instance (here the term query refers to queries in the context of databases and query answering; our data instances are ABoxes). Query inseparability is the basic requirement for ontology-mediated query answering (OMQA) (Bienvenu 2016). In OMQA, relevant tasks such as ontology versioning, modularisation, update, and forgetting depend on comparisons between ontologies based on answers given to queries (Botoeva et al. 2019). We study polynomial query and time learnability of the ELH ontology language in the OMQA setting. The query languages considered are atomic queries (AQ), instance queries (IQ), conjunctive queries (CQ), and a fragment of CQs called rooted CQs (denoted CQr). In particular, we show that in our setting the picture is brighter and ELH can be polynomially learned from IQs; however, it is still not possible to learn this language from CQs. Table 1 shows the different results obtained by previous works (Konev et al. 2018; Konev, Ozaki, and Wolter 2016) for logical equivalence and our results (shaded in gray) for query inseparability, taking into account the ontology language (upper side) and the query languages (left side). In the table, ✓ indicates a positive result, i.e.,
polynomial query learnability; – indicates that the query language is not expressive enough for exchanging information; and ✗ indicates that polynomial query learnability cannot be achieved. Polynomial time learnability implies polynomial query learnability, but the converse does not hold. All positive results for AQs, IQs, and CQrs also hold for polynomial time learnability.

Since the learned ontology is not equivalent to the target, it may be the case that, after the data is updated, the learned ontology and the target are no longer query inseparable. We thus investigate conditions under which query inseparability is preserved when the data changes, which means that no further learning steps are needed after the change. In many application scenarios, interactions with teachers may not be a viable option. We also consider learnability when the learner only has access to a batch of classified examples. Finally, we adapt the Probably Approximately Correct (PAC) model (Valiant 1984) to our OMQA setting, which we separate from the exact and query inseparable problem settings (Theorem 15). Our polynomial time results for query inseparability are transferable to the PAC model extended with membership queries (Theorem 14). Omitted proofs are available at https://arxiv.org/abs/1911.07229.

Related Work. To cover the vast literature on ontology learning, we point to the collection edited by (Lehmann and Völker 2014) and the surveys authored by (Cimiano, Völker, and Buitelaar 2010) and (Wong, Liu, and Bennamoun 2012). The closest works are the already mentioned papers on exact learning of lightweight description logics (DLs) (Konev et al. 2018; Duarte, Konev, and Ozaki 2018; Konev, Ozaki, and Wolter 2016). Exact learning of concepts formulated in the DL CLASSIC has been investigated by (Cohen and Hirsh 1994) and (Frazier and Pitt 1996). Other closely related works include works on learning EL concepts (Funk et al. 2019; Lehmann and Haase 2009). Formal Concept Analysis has been applied for learning DL ontologies (Rudolph 2004; Baader et al. 2007b; Borchmann and Distel 2011; Borchmann 2014; Ganter et al. 2016). Learnability of EL ontologies from finite interpretations has also been investigated (Klarman and Britz 2015). Association rule mining has been used to learn DL ontologies (with concept expressions of limited depth) (Sazonau and Sattler 2017; Völker and Niepert 2011; Fleischhacker, Völker, and Stuckenschmidt 2012; Völker, Fleischhacker, and Stuckenschmidt 2015).

Basic Definitions

Ontologies and Queries. The ELH syntax is defined upon mutually disjoint countably infinite sets of concept names NC, denoted with A, B, role names NR, denoted with r, s, and individual names NI, denoted with a, b. EL-concept expressions C are defined inductively according to the rule C ::= A | ⊤ | C ⊓ C | ∃r.C, where A ∈ NC and r ∈ NR. For simplicity, we omit the prefix EL from EL-concept expressions. An ELH ontology, also called TBox, is a finite set of concept inclusions (CIs) C ⊑ D, where C, D are concept expressions, and role inclusions (RIs) r ⊑ s, where r, s ∈ NR. We call an ELH TBox T a terminology if for all C ⊑ D ∈ T either C or D is a concept name and T has at most one inclusion of the form A ⊑ C for every A ∈ NC. (In the literature, the term terminology commonly refers to sets of concept inclusions A ⊑ C and concept definitions A ≡ C, with no concept name occurring more than once on the left. As A ≡ C can be equivalently rewritten as A ⊑ C and C ⊑ A, our definition is a natural extension of this one. Moreover, if a terminology contains A ⊑ C and A ⊑ D, one can always rewrite it into A ⊑ C ⊓ D.) From now on we assume that all ELH TBoxes we deal with are terminologies. An ABox A is a finite set of expressions of the form A(a) or r(a, b), called assertions, with A ∈ NC, r ∈ NR, and a, b ∈ NI.
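For instance, to illustrate these definitions with made-up names: T = {Professor ⊑ ∃teaches.Course, ∃teaches.Course ⊑ Teacher, teaches ⊑ involvedIn} is an ELH terminology, since every CI has a concept name on at least one of its sides and no concept name occurs on the left-hand side of two CIs, while {A ⊑ ∃r.B, A ⊑ B′} is not a terminology as defined here but can be rewritten into one as {A ⊑ ∃r.B ⊓ B′}. A corresponding ABox is, e.g., A = {Professor(ana), teaches(ana, kr)}.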
We denote by ind(A) the set of individual names occurring in an ABox A. The signature Σ_T of a TBox T is the set of concept and role names occurring in it, and similarly for the signature Σ_A of an ABox A. A knowledge base (KB) is a pair (T, A) where T is a TBox and A is an ABox.

We investigate classical query languages considered in the OMQA literature. An AQ takes the form of an assertion. An IQ is of the form C(a) or r(a, b), where C is a concept expression, r ∈ NR, and a, b ∈ NI. A CQ is a first-order sentence ∃x̄ φ(ā, x̄), where φ is a conjunction of atoms of the form r(t1, t2) or A(t), where t1, t2, t (called terms) can be individual names from ā or individual variables from x̄. With an abuse of notation, we denote by AQ, IQ and CQ the sets of atomic, instance and conjunctive queries q, respectively, and we call a query language a set Q ∈ {AQ, IQ, CQ}. The size of a concept expression C (TBox T, ABox A, query q), denoted by |C| (and, respectively, |T|, |A|, |q|), is the length of the string that represents it, where concept, role, and individual names are considered to be of length one.

The semantics of ELH is given as follows. An interpretation is a pair I = (Δ^I, ·^I), where Δ^I is a non-empty set, called the domain, and ·^I is a function that maps every a ∈ NI to a^I ∈ Δ^I, every A ∈ NC to A^I ⊆ Δ^I, and every r ∈ NR to r^I ⊆ Δ^I × Δ^I. The function ·^I extends to other concept expressions as follows: ⊤^I = Δ^I, (C ⊓ D)^I := C^I ∩ D^I, and (∃r.C)^I := {d ∈ Δ^I | there is e ∈ C^I : (d, e) ∈ r^I}. An interpretation I satisfies a CI C ⊑ D if C^I ⊆ D^I, and an RI r ⊑ s if r^I ⊆ s^I. It satisfies an assertion A(a) if a^I ∈ A^I, and an assertion r(a, b) if (a^I, b^I) ∈ r^I. I satisfies an AQ if it satisfies the corresponding assertion, and it satisfies an IQ C(a), or r(a, b), if a^I ∈ C^I, or (a^I, b^I) ∈ r^I, respectively. I satisfies a CQ q (or there is a homomorphism from q to I) if there is a function π, mapping terms of q to elements of Δ^I, such that: π(t) = t^I, if t ∈ NI; π(t) ∈ A^I, for every A(t) of q; and (π(t1), π(t2)) ∈ r^I, for every r(t1, t2) of q. We write I |= α to state that I satisfies a CI, RI, assertion, or query α. I satisfies a TBox T if it is a model of every CI and RI in T, and it satisfies an ABox A if it satisfies every assertion in A. I satisfies a KB K = (T, A), written I |= K, if it satisfies both T and A. A KB K entails a CI, an RI, an assertion, or a query α, written K |= α, if, for all interpretations I, I |= K implies I |= α. A KB K entails a KB K′, written K |= K′, if I |= K implies I |= K′; K and K′ are equivalent, in symbols K ≡ K′, if K |= K′ and K′ |= K. We may also speak of entailments and equivalences of TBoxes and ABoxes, defined as usual (Baader et al. 2007a). For Q ∈ {AQ, IQ, CQ}, the KBs K and K′ are Q-inseparable, in symbols K ≡_Q K′, if for every query q ∈ Q, we have that K |= q iff K′ |= q.

Tree Representation and Homomorphisms. We will also represent a concept expression C as a finite directed tree T_C = (V_C, E_C, ℓ_C), where V_C is the set of all vertices, with the root denoted by ρ_C, E_C is the set of all edges, and ℓ_C is a labelling function that maps every node to a set of concept names and every edge to a role name.
This tree representation of C uniquely represents the corresponding concept expression, and it is inductively defined as follows: for C = ⊤, V_C = {ρ_C} and ℓ_C(ρ_C) = ∅; for C = A, where A ∈ NC, V_C = {ρ_C} and ℓ_C(ρ_C) = {A}; for C = ∃r.D, T_C is obtained from T_D by adding a new root ρ_C and an edge from ρ_C to the root ρ_D of T_D with label ℓ_C(ρ_C, ρ_D) = r (we call ρ_D an r-successor of ρ_C); for C = D1 ⊓ D2, T_C is obtained by identifying the roots of T_D1 and T_D2, with ℓ_C(ρ_C) = ℓ_D1(ρ_D1) ∪ ℓ_D2(ρ_D2).

The ABox representation of C, A_C, is the ABox encoding the tree representation T_C of C, defined as follows. With each v ∈ V_C we associate an individual name a_v ∈ NI. Then, for every u ∈ V_C and every (u, v) ∈ E_C, we put: A(a_u) ∈ A_C iff A ∈ ℓ_C(u); and r(a_u, a_v) ∈ A_C iff ℓ_C(u, v) = r. Given ABoxes A and A′, a function h : ind(A) → ind(A′) is called an ABox homomorphism from A to A′ if: for every C(a) ∈ A, C(h(a)) ∈ A′; and, for every r(a, b) ∈ A, r(h(a), h(b)) ∈ A′.

Learning Model. We provide basic notions related to the exact learning model, extending the notation in (Konev et al. 2018). A learning framework F is a quadruple (E, S, L, μ), where E is a set of examples, S is a subset of E, L is a set of concept representations (also called hypothesis space), and μ is a mapping from L to 2^E. We omit S if S = E. Each element l of L is assumed to be represented using a set of symbols Σ_l (if l is a TBox T, Σ_T is the signature of T). We say that e ∈ E is a positive example for l ∈ L if e ∈ μ(l), and a negative example for l if e ∉ μ(l).

Given a learning framework F = (E, S, L, μ), we are interested in the exact identification of a target concept representation t ∈ L w.r.t. the subset S of examples, by posing queries to oracles. Let MQ_F,t be the oracle that takes as input some e ∈ E and returns 'yes' if e ∈ μ(t) and 'no' otherwise. A membership query is a call to the oracle MQ_F,t. For every t ∈ L, we denote by EQ_F,t the oracle that takes as input a hypothesis concept representation h ∈ L and returns 'yes' if μ(h) ∩ S = μ(t) ∩ S, and a counterexample e ∈ (μ(h) ⊕ μ(t)) ∩ S otherwise, where ⊕ denotes the symmetric set difference. There is no assumption regarding which counterexample in (μ(h) ⊕ μ(t)) ∩ S is chosen by the oracle. An equivalence query with respect to S is a call to the oracle EQ_F,t (if S = E, we omit 'with respect to S').

An (exact) learning algorithm for F = (E, S, L, μ) is a deterministic algorithm that, for a fixed but arbitrary t ∈ L, takes Σ_t as input, is allowed to make queries to MQ_F,t and EQ_F,t (without knowing what the target t to be learned is), and eventually halts and outputs some h ∈ L with μ(h) ∩ S = μ(t) ∩ S. We say that the learning algorithm is positive bounded if, in addition, μ(h) ⊆ μ(t). We say that F is exact learnable if there is a learning algorithm for F, and that F is polynomial query learnable if it is exact learnable by an algorithm A such that at every step the sum of the sizes of the inputs to membership and equivalence queries made by A up to that step is bounded by a polynomial p(|t|, |e|), where t is the target and e ∈ S is the largest counterexample seen so far (Arias 2004). Similarly, F is polynomial time learnable if it is exact learnable by an algorithm A such that at every step of computation (we count each call to an oracle as one step of computation) the time used by A up to that step is bounded by a polynomial p(|t|, |e|), where t ∈ L is the target and e ∈ S is the largest counterexample seen so far. We denote by PQUERYL and PTIMEL the classes of learning frameworks which are, respectively, polynomial query and polynomial time learnable. Clearly, PTIMEL ⊆ PQUERYL.
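To make the interaction between learner and oracles concrete, the following is a minimal sketch of the protocol just described; it is our own illustration (Python is used only for exposition), and the names Oracle, membership, inseparability, refine, and exact_learn are illustrative choices, not part of the formal model.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple

# An example is a pair (ABox, query); both are kept abstract in this sketch.
Example = Tuple[object, object]

class Oracle(ABC):
    """Interface of the teacher for a fixed (hidden) target TBox."""

    @abstractmethod
    def membership(self, example: Example) -> bool:
        """MQ: does (T, A) |= q hold for the hidden target T?"""

    @abstractmethod
    def inseparability(self, hypothesis) -> Optional[Example]:
        """EQ w.r.t. S: returns None if the hypothesis and the target are
        query inseparable w.r.t. the fixed ABox, otherwise a counterexample
        (A*, q) on which exactly one of the two entails q."""

def exact_learn(signature, oracle: Oracle, refine):
    """Generic exact-learning loop: start from the empty hypothesis over the
    given signature and repair it with every counterexample until the oracle
    accepts it.  `refine` encapsulates the algorithm-specific update of the
    hypothesis and may itself pose further membership queries."""
    hypothesis = set()                      # empty TBox over `signature`
    while True:
        counterexample = oracle.inseparability(hypothesis)
        if counterexample is None:          # inseparable from the target: done
            return hypothesis
        hypothesis = refine(hypothesis, counterexample, oracle.membership)
```

A concrete algorithm instantiates refine with the counterexample processing described in the following sections.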
We now introduce the special case of learning frameworks on which we focus in this work, called OMQA learning frameworks. Let L, A*, and Q be, respectively, an ontology language, a fixed but arbitrary ABox, and a query language. An OMQA learning framework F(L, A*, Q) is a learning framework (E, S, L, μ) where E is the set of all pairs (A, q) with A an ABox (possibly different from A*) and q ∈ Q; L is the set of all TBoxes formulated in L sharing a common finite signature (we write Σ_T to refer to the signature of the target, which is assumed to be the same as the one used for the hypothesis); S is the set of elements (A, q) of E where A = A*; and, for all T ∈ L, μ(T) = {(A, q) ∈ E | (T, A) |= q}. We assume that the signature of q, i.e., the set of concept and role names occurring in q, is contained in Σ_T ∪ Σ_A. Moreover, we define the size of an example (A, q), denoted by |(A, q)|, as the sum of the sizes of A and q.

Given an OMQA learning framework F(L, A*, Q) = (E, S, L, μ), for all H, T ∈ L, if μ(H) ∩ S = μ(T) ∩ S, then H and T are Q-inseparable w.r.t. A*. Since an equivalence query with respect to S in an OMQA learning framework is in fact asking whether a given TBox is query inseparable from the target TBox w.r.t. a fixed ABox and a query language, we may call it an inseparability query.

Polynomial Learnability

We investigate whether, for a given fixed ABox and query language, the problem of learning query inseparable ELH TBoxes is polynomial. We first discuss the relationship between our OMQA setting and the data retrieval one (Konev, Ozaki, and Wolter 2016). The difference between the two settings is that here the oracle can only choose counterexamples of the form (A*, q), with the fixed ABox A* given as input, whereas in the mentioned work the ABox in a counterexample can be arbitrary. In both settings the learner can pose membership queries with an arbitrary ABox in the examples. Formally, given an ontology language L and a query language Q, we denote by F(L, Q) the learning framework (E, S, L, μ), where E is the set of all pairs (A, q) with A an ABox and q ∈ Q; L is the set of all TBoxes formulated in L; E = S; and, for all T ∈ L, μ(T) = {(A, q) ∈ E | (T, A) |= q}.

We denote by ELHlhs and ELHrhs the fragments of ELH which only allow complex concept expressions on the left-hand side and on the right-hand side of CIs, respectively. It is known that the learning frameworks F(ELHlhs, AQ) and F(ELHrhs, IQ) are in PTIMEL, whereas F(ELH, IQ) and F(ELHrhs, CQ) are not in PQUERYL (Konev, Ozaki, and Wolter 2016). It follows from our definitions that, for any ontology language L, query language Q and ABox A*, polynomial learnability of F(L, Q) implies polynomial learnability of F(L, A*, Q). Thus, the positive results for F(ELHlhs, AQ) and F(ELHrhs, IQ) hold in the OMQA setting. That is, for any ABox A*, the learning frameworks F(ELHlhs, A*, AQ) and F(ELHrhs, A*, IQ) are in PTIMEL.

Proposition 1. For any ontology language L, query language Q and ABox A*, if F(L, Q) is in PTIMEL/PQUERYL, then F(L, A*, Q) is in PTIMEL/PQUERYL.

In the following, we extend the positive results for ELHlhs and ELHrhs to the class of ELH terminologies (the union of ELHlhs and ELHrhs) in the OMQA setting based on IQs. This is in contrast with the negative result for the class of ELH terminologies in the data retrieval setting with IQs, thus showing that the converse direction of Proposition 1 does not hold. Throughout this section A* is a fixed but arbitrary ABox.
We also analyse the learning framework F(ELHrhs, A*, CQ), which is also not in PQUERYL in the OMQA setting.

Learning ELH ontologies with AQ. We argue that the learning framework F(ELH, A*, AQ) is in PTIMEL. There are only polynomially many counterexamples that the oracle can give, since they can only be of the form (A*, q) with q an atomic query (using symbols from Σ_T). The set of RIs entailed by T can be learned in polynomial time by adding to the hypothesis H all RIs r ⊑ s such that the membership oracle replies 'yes' given an example ({r(a, b)}, s(a, b)) as input. Therefore, we only need to show that one can compute concept inclusions that together with A* entail precisely the same atomic queries as the target ontology T on A*. The following lemma establishes that one can indeed compute such concept inclusions in polynomial time.

Lemma 2. Let T and H be, respectively, the target and the hypothesis (of size polynomial in |T|) terminologies. Given a positive counterexample (A*, A(a)), one can compute, in polynomial time in |A*| and |T|, a CI C ⊑ B such that (T, A*) |= B(b), (H, A*) ⊭ B(b), and (H ∪ {C ⊑ B}, A*) |= B(b), for some b ∈ ind(A*). Moreover, T |= C ⊑ B.

The main idea for proving Lemma 2 is to transform the structure of A* into a tree-shaped structure, as in (Konev, Ozaki, and Wolter 2016). However, in the mentioned work only CIs of the form C ⊑ A are considered, while in our case CIs of the form A ⊑ C may also be present. With such CIs the computed tree-shaped ABox may be smaller than the original concept expression in T (as in Example 3).

Example 3. Assume that T = {B ⊑ ∃s.B, ∃r.∃s.B ⊑ A} and A = {r(a, b), B(b)}. We have that (T, A) |= A(a), and the concept expression ∃r.B encoded in A is smaller than the original concept expression ∃r.∃s.B in T implying the concept name A. Intuitively, even though there is no homomorphism from the ABox representation of ∃r.∃s.B to A, we have that B abbreviates ∃s.B, because it is implied by B.

Since there are polynomially many possible counterexamples, our upper bound for AQs follows from Lemma 2.

Theorem 4. F(ELH, A*, AQ) is in PTIMEL. Moreover, there is a positive bounded learning algorithm showing this upper bound which only poses membership queries (no inseparability queries are needed).

Learning ELH ontologies with IQ. We build on the result for AQs (Theorem 4) and argue that the learning framework F(ELH, A*, IQ) is in PTIMEL. By Theorem 4, CIs of the form C ⊑ A which ensure AQ-inseparability w.r.t. A* can be learned with membership queries. It remains to show how one can learn CIs of the form A ⊑ C, so that H is IQ-inseparable from T (w.r.t. A*). By Theorem 4, we can assume that one can construct a hypothesis H such that T |= H, since there is a positive bounded learning algorithm. The next lemma states that if T |= H and (H, A*) ≡_AQ (T, A*), then one can transform a counterexample of the form (A*, D(a)) into a CI such that T |= A ⊑ C and H ⊭ A ⊑ C, with C a subconcept of D and A ∈ Σ_T such that (H, A*) |= A(b) for some b ∈ NI. Since |Σ_T| is polynomial in |T| and the number of subconcepts of D is polynomial in |D|, one can find such A ⊑ C in polynomial time w.r.t. |T| and |(A*, D(a))|.

Lemma 5. Let T and H be ELH terminologies and let A* be an ABox. Assume (H, A*) ≡_AQ (T, A*). If (A*, D(a)) ∈ μ(T) \ μ(H), then there are A(b) and a subconcept C of D such that (H, A*) |= A(b), T |= A ⊑ C, and H ⊭ A ⊑ C.

Given A ⊑ C such that T |= A ⊑ C and H ⊭ A ⊑ C, one can compute with polynomially many polynomial size queries another CI A ⊑ C′ entailed by T but not by H belonging to a class of CIs called T-essential (Konev et al. 2018, Lemma 29).
In fact, this can be done in polynomial time, given the complexity of entailment checking in ELH (Baader, Lutz, and Brandt 2008), which has no inverse roles. Such T-essential CIs have the property that their size is bounded polynomially in the size of T (Konev et al. 2018, Lemma 32), and if α1 = A ⊑ C1 and α2 = A ⊑ C2 are T-essential and not equivalent, then one can compute in polynomial time a T-essential CI A ⊑ C that entails both α1 and α2 (Konev et al. 2018, Lemma 30). All in all, if the learner computes such T-essential counterexamples and adds/refines them in the hypothesis (see (Konev et al. 2018, Algorithm 2)), then, after learning from polynomially many counterexamples, it will terminate and output a hypothesis IQ-inseparable from the target (w.r.t. A*). The presence of CIs of the form C ⊑ A does not affect this result (Duarte, Konev, and Ozaki 2018).

Theorem 6. F(ELH, A*, IQ) is in PTIMEL.

In contrast, the learning framework F(ELH, IQ) is not in PQUERYL (Konev, Ozaki, and Wolter 2016). We observe that the counterexamples used in that hardness proof are based on an exponential number of ABoxes encoding concept expressions of the form C_b = C_1 ⊓ ... ⊓ C_n, where b = b_1...b_n is a sequence with b_i ∈ {0, 1}, C_i = A_i if b_i = 1, and C_i = B_i if b_i = 0. In the OMQA setting, the ABox is fixed and, as stated in Theorem 6, this lowers the complexity.

Learning ELH ontologies with CQ. ELH ontologies are not polynomial query learnable in the data retrieval setting with CQs as the query language (in fact, not even the fragment ELHrhs is) (Konev, Ozaki, and Wolter 2016). The counterexamples used by the oracle in the hardness proof are of the form ({A(a)}, q) (Konev, Ozaki, and Wolter 2016, proof of Lemma 8), so {A(a)} can be considered as the fixed ABox given as part of the input in an OMQA learning framework. Thus, the mentioned hardness result can be transferred to our setting. We formalise this result with the next theorem.

Theorem 7. F(ELH, A*, CQ) is not in PQUERYL.

The hardness proof in the mentioned paper uses a very simple CQ of the form ∃x M(x), which has a match in the anonymous part of the model but hides the concept on the right-hand side of a CI that causes the entailment of this query. This phenomenon makes one wonder whether restricting to the class of queries in which every variable needs to be reachable from an individual name (as happens with IQs) can tame the complexity of the problem. Our next theorem shows that this is the case.

Given a CQ q, we define G_q as the directed graph (V, E) where the nodes V are the terms of q and the edges E are the pairs (t1, t2) such that there is an atom of the form r(t1, t2) in q. We say that a CQ q = ∃x̄ φ(ā, x̄) is rooted if, for every x in x̄, x is reachable in G_q from a node that is in ā. We denote by CQr the class of all rooted CQs. The next lemma establishes that one can transform queries in CQr into queries in IQ (by posing membership queries).

Lemma 8. Let T and H be ELH TBoxes and assume T and H entail the same RIs. Given a positive counterexample (A*, q) (for T and H), with q ∈ CQr, one can construct a positive counterexample (A*, q′) with q′ ∈ IQ in polynomial time in |(A*, q)| · |Σ_T|.

Figure 1 shows an example of this conversion.

Figure 1: Assume T = {A ⊑ ∃r.∃s.⊤}, A* = {A(a)} and H = ∅. A call to EQ_F,T can output (A*, ∃x̄ (r(a, x1) ∧ r(a, x2) ∧ s(x1, x3) ∧ s(x1, x4) ∧ s(x2, x4) ∧ s(x2, x5))), which can be converted into (A*, ∃r.∃s.⊤(a)) by merging variables and asking MQ_F,T whether the new query holds.
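To illustrate the target shape of this conversion, the following is a minimal sketch of our own, under simplifying assumptions (it is not the construction used in the proof of Lemma 8): once the variables of a rooted CQ have been merged so that its atoms form a tree hanging from an individual name, the query can be rolled up into an EL concept, yielding the instance query that is then confirmed with a membership query.

```python
from collections import defaultdict

def roll_up(atoms, root):
    """Roll a tree-shaped rooted CQ into an EL concept string C so that the
    CQ amounts to the instance query C(root).  `atoms` contains unary atoms
    ('A', t) and binary atoms ('r', t1, t2); the binary atoms are assumed to
    form a tree directed away from the individual name `root`."""
    labels, children = defaultdict(list), defaultdict(list)
    for atom in atoms:
        if len(atom) == 2:                    # concept atom A(t)
            concept, term = atom
            labels[term].append(concept)
        else:                                 # role atom r(t1, t2)
            role, source, target = atom
            children[source].append((role, target))

    def concept_of(term):
        conjuncts = list(labels[term])
        conjuncts += [f"exists {role}.({concept_of(t)})" for role, t in children[term]]
        return " and ".join(conjuncts) if conjuncts else "top"

    return concept_of(root)

# The query of Figure 1 after merging x2 into x1 and x4, x5 into x3:
print(roll_up([("r", "a", "x1"), ("s", "x1", "x3")], "a"))
# prints: exists r.(exists s.(top)), i.e. the IQ (exists r.exists s.top)(a)
```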
Even though the conversion involves deciding query answering, which is NP-hard, these checks are on the side of the oracle, and so they do not affect the complexity of learning. By Lemma 8, an IQ-inseparable hypothesis can be found by following the same steps of a learning algorithm for IQs after q in (A*, q) has been converted into an instance query. For ELH, if two TBoxes entail the same RIs, then IQ-inseparability implies CQr-inseparability. Since RIs can be easily learned with membership queries, we obtain our next theorem.

Theorem 9. F(ELH, A*, CQr) is in PTIMEL.

Data Updates

The algorithm presented in the previous section for IQs computes an ontology H that is IQ-inseparable from the target T w.r.t. a fixed ABox A*. In this section, we first study when IQ-inseparability is preserved, without changes to the hypothesis H, if A* is updated to an ABox A. Then, given an OMQA learning framework F(ELH, A*, IQ) = (E, S, L, μ), we determine conditions on an updated ABox A sufficient to guarantee that a learning framework (E, S′, L, μ) with S ⊆ S′ is still in PTIMEL.

To characterise when IQ-inseparability is preserved if A* is updated to an ABox A, we use the classical notion of bisimulation. Let I = (Δ^I, ·^I), J = (Δ^J, ·^J) be two interpretations. A bisimulation is a non-empty relation Z ⊆ Δ^I × Δ^J satisfying the following conditions, for all (d, e) ∈ Z: (1) for all concept names A ∈ NC, d ∈ A^I iff e ∈ A^J; (2) for all role names r ∈ NR, if (d, d′) ∈ r^I, d′ ∈ Δ^I, then there exists e′ ∈ Δ^J such that (e, e′) ∈ r^J and (d′, e′) ∈ Z; (3) for all role names r ∈ NR, if (e, e′) ∈ r^J, e′ ∈ Δ^J, then there exists d′ ∈ Δ^I such that (d, d′) ∈ r^I and (d′, e′) ∈ Z. If (d, e) ∈ Z, we write (I, d) ∼ (J, e). (Given an ABox A, I_A denotes the interpretation with domain ind(A) that interprets concept and role names according to the assertions in A.)

Theorem 10. Let T and H be ELH terminologies entailing the same RIs, and let A* and A be ABoxes. If, for all b ∈ ind(A), there is a ∈ ind(A*) such that (I_A*, a) ∼ (I_A, b), then (H, A*) ≡_IQ (T, A*) implies (H, A) ≡_IQ (T, A).

Theorem 10 does not hold if we only require the ABoxes A* and A to be homomorphically equivalent, i.e., that there are ABox homomorphisms from A to A* and from A* to A.

Example 11. Consider T = {∃r.A1 ⊑ B} and A* = {r(a, b), A1(b), A2(b)}. The hypothesis H = {∃r.(A1 ⊓ A2) ⊑ B} is IQ-inseparable from T w.r.t. A*. However, if A = A* ∪ {r(a′, b′), A1(b′)}, then IQ-inseparability is not preserved, even though A* and A are homomorphically equivalent.

The problem here is that the left-hand side of CIs in H can be biased and too specific for the individuals in A*. Indeed, in Example 11, if the more general concept expression ∃r.A1 on the left-hand side had been learned, then IQ-inseparability would have been preserved after the update. If we allow for modifications to the learned hypothesis H, we can extend the class of updated ABoxes A not only to those in which every individual in A is bisimilar to an individual in A*, but to those satisfying a more relaxed condition. It is easy for the learner to make certain kinds of generalisation, for instance, to check whether T |= ∃r.A1 ⊑ B and add this more general CI to H. Therefore, the idea is to suitably generalise the left-hand side of CIs in the hypothesis H computed by the learning algorithm. Generalisation of C ⊑ A ∈ H for T consists of replacing C by the result C′ of (1) replacing a concept name B in C with ⊤ or with a concept name B′ such that T |= B ⊑ B′ and T ⊭ B′ ⊑ B, provided that T |= C′ ⊑ A; or (2) replacing a role name r in C with a role name s such that T |= r ⊑ s and T ⊭ s ⊑ r, provided that T |= C′ ⊑ A. We say that C ⊑ A ∈ H is generalised for T if generalisation of C ⊑ A ∈ H for T has been exhaustively applied. H is generalised for T if all the CIs in it are generalised. We may omit 'for T' if this is clear from the context.
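For instance, applying the definition to Example 11: the only generalisation step available for ∃r.(A1 ⊓ A2) ⊑ B ∈ H is to replace A2 by ⊤ via rule (1), since T = {∃r.A1 ⊑ B} entails the resulting CI ∃r.(A1 ⊓ ⊤) ⊑ B, i.e., ∃r.A1 ⊑ B, whereas replacing A1 by ⊤ is not allowed because T ⊭ ∃r.A2 ⊑ B. The generalised hypothesis is thus {∃r.A1 ⊑ B}, exactly the more general CI mentioned above, and with it IQ-inseparability survives the update of Example 11.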
With the following definitions, we define a class of ABoxes that are guaranteed to preserve IQ-inseparability if the hypothesis is generalised. Given a TBox T and concept names A, B ∈ Σ_T ∩ NC, we say that there is a linear derivation from A to B if T |= A ⊑ B and, for all B′ ∈ NC such that T |= A ⊑ B′, we have that T |= B′ ⊑ B. Similarly, for r, s ∈ Σ_T ∩ NR, there is a linear derivation from r to s if T |= r ⊑ s and, for all s′ ∈ NR such that T |= r ⊑ s′, we have that T |= s′ ⊑ s. We write A < A′ if A′ is the result of replacing some A(a) ∈ A by B(a), where there is a linear derivation from A to B, or if A′ is the result of replacing some r(a, b) ∈ A by s(a, b), where there is a linear derivation from r to s. We define g_T(A*) as the set of all ABoxes A such that there is a sequence A_1 < ... < A_n with A_1 = A* and A_n = A. The following theorem establishes an upper bound for learning frameworks extending S with all the examples of the form (A, q), where A ∈ g_T(A*) and q ∈ IQ.

Theorem 12. Let F′ be the learning framework that results from adding all pairs of the form (A, q), with A ∈ g_T(A*) and q ∈ IQ, to the set S in F(ELH, A*, IQ) = (E, S, L, μ), where T ∈ L. Assume Σ_T ⊆ Σ_A*. Then, F′ is in PTIMEL.

Learning from Data

The existence of oracles that correctly answer all the queries posed by the learner does not naturally fit those settings in which only direct access to data is available. In this section, we study how the oracle-based approach presented so far can be modified so as to allow access to examples retrieved from data, thus reducing the dependency of our learning model on membership and inseparability queries. Firstly, we consider a finite batch of examples (Arias, Khardon, and Maloberti 2007), to be used as a representative of the entire data pool, and study conditions under which the existence of a batch that allows us to learn inseparable ontologies is guaranteed. Then, we analyse how a data-driven approach can be used as a basis for a learning model for DL ontologies based on the well-known PAC learning model, possibly extended with membership queries (Valiant 1984).

Learning from batch. Given an OMQA learning framework F(L, A*, Q) = (E, S, L, μ), a batch B is a finite subset of E. One could ask under which conditions a batch that allows us to construct an ontology H ∈ L which is Q-inseparable from a target T ∈ L is guaranteed to exist. If no restrictions are imposed on the form of the examples occurring in B, the answer is trivial for EL ontologies and IQs. Indeed, for every C ⊑ D ∈ T, consider the set B of examples of the form (A_C, D(ρ_C)), obtained by representing the concept C as a labelled tree with root ρ_C, encoded in the ABox A_C. These examples have the property that, for every T ∈ L, T |= C ⊑ D iff (T, A_C) |= D(ρ_C). By setting H = {C ⊑ D | (A_C, D(ρ_C)) ∈ B}, we obtain that H is equivalent to (and thus IQ-inseparable, w.r.t. any ABox, from) T. This construction can be easily extended to ELH by using examples of the form ({r(a, b)}, s(a, b)). However, instead of allowing the examples retrieved from our data to be unrestricted in their size and in the shape of their ABoxes, it would be more natural to require that these ABoxes contain less information than A*, given as a parameter. This intuition can be made precise by imposing that, for each ABox A in the batch, there is an ABox homomorphism from A to A* and that |A| is polynomial in |T|. Our next theorem states that, under these assumptions, one can construct a Q-inseparable ELH terminology.
Theorem 13. Let F(ELH, A*, Q) = (E, S, L, μ) be an OMQA learning framework, with Q ∈ {AQ, IQ, CQr}, and let T ∈ L be such that Σ_T ⊆ Σ_A*. Let X ⊆ E be the set of examples (A, q) such that there is an ABox homomorphism from A to A*. Then, there is a batch B ⊆ X, polynomial in |T|, and an algorithm that takes B as input, eventually halts, and returns H ∈ L such that μ(H) ∩ S = μ(T) ∩ S.

PAC learning. Let F = (E, S, L, μ) be a learning framework. A probability distribution D on S is a function D : 2^S → [0, 1] ⊆ R such that D(⋃_{i∈I} X_i) = Σ_{i∈I} D(X_i) for mutually disjoint X_i ⊆ S, where I is a countable set of indices, and D(S) = 1. Given a target t ∈ L, let EX^D_F,t be the oracle that takes no input and outputs a classified example (e, ℓ_t(e)), where e ∈ S is sampled according to the probability distribution D, ℓ_t(e) = 1 if e ∈ μ(t) ∩ S (positive example), and ℓ_t(e) = 0 otherwise (negative example). An example query is a call to the oracle EX^D_F,t. A sample generated by EX^D_F,t is a (multi-)set of indexed classified examples, independently and identically distributed according to D, sampled by calling EX^D_F,t. A learning framework F is PAC learnable if there is a function f : (0, 1)^2 → N and a deterministic algorithm such that, for every ϵ, δ ∈ (0, 1) ⊆ R, every probability distribution D on S, and every target t ∈ L, given a sample of size m ≥ f(ϵ, δ) generated by EX^D_F,t, the algorithm always halts and outputs h ∈ L such that, with probability at least (1 − δ) over the choice of the m examples in S, we have that D((μ(h) ⊕ μ(t)) ∩ S) ≤ ϵ. If the time used by the algorithm is bounded by a polynomial function p(|t|, |e|, 1/ϵ, 1/δ), where e is the largest example in the sample, then we say that F is polynomial time PAC learnable. If, in addition, the algorithm is allowed to make membership queries (where each call to MQ_F,t counts as one step of computation), we say that F is polynomial time PAC learnable with membership queries.

Theorem 14 ((Angluin 1988), (Mohri, Rostamizadeh, and Talwalkar 2012)). If F is in PTIMEL, then F is polynomial time PAC learnable with membership queries.

However, the converse direction of Theorem 14 does not hold (Blum 1994). The argument in the mentioned paper is based on the assumption that one-way functions exist and cannot be easily adapted to serve as a counterexample for OMQA learning frameworks. Our next result, showing that the converse direction of Theorem 14 does not hold in our setting, does not rely on cryptographic assumptions; however, it is representation-dependent. Given a sequence σ = σ_1σ_2...σ_n, with σ_i ∈ {r, s}, let the expression ∃σ.C stand for ∃σ_1.∃σ_2. ... .∃σ_n.C (clearly, there are 2^n expressions of this form). Consider the OMQA learning framework F(L, A*, Q) where A* = {A(a)}; Q is the query language that extends IQs with a CQ of the form ∃x M(x); and L is an ontology language allowing only ELHrhs TBoxes of the form T_σ = {A ⊑ ∃σ.M} ∪ T_0 to be expressed, where T_0 = {A ⊑ X_0, M ⊑ ∃r.M ⊓ ∃s.M} ∪ {X_i ⊑ ∃r.X_{i+1} ⊓ ∃s.X_{i+1} | 0 ≤ i < n}. It can be shown, with an argument similar to the one used in (Konev, Ozaki, and Wolter 2016, proof of Lemma 8) (and Theorem 7 above), that such a framework is not polynomial query learnable. However, due to the restrictions on the hypothesis space, it is polynomial time PAC learnable, even without membership queries.

Theorem 15. There is a polynomial time PAC learnable OMQA learning framework that is not in PQUERYL.

A learning framework F = (E, S, L, μ) shatters a set of examples X ⊆ S if |{μ(h) ∩ X | h ∈ L}| = 2^|X|.
The VC-dimension (Vapnik 1995) of F, denoted VC(F), is the maximal size of a set X ⊆ S such that F shatters X. If F can shatter arbitrarily large sets, then F has infinite VC-dimension.

Example 16. For n ∈ N, let A_n = {r(a_i, a_{i+1}), s(a_i, a_i) | 1 ≤ i < n} ∪ {r(a_n, a_1)}. Each a_i can be identified by C_i = ∃r^{n−i}.∃s.⊤, since a_i is the only individual with a_i ∉ C_i^{I_{A_n}} (∃r^k is a shorthand for k nestings of the form ∃r). E.g., a_1 is the only individual not in (∃r.∃s.⊤)^{I_{A_2}} (see Figure 2).

Figure 2: The ABox A_2, consisting of r(a_1, a_2), r(a_2, a_1) and s(a_1, a_1). For A_2, X = {(A_2, A(a_1)), (A_2, A(a_2))} is shattered because we can find in L: h_1 = {∃s.∃r.∃s.⊤ ⊑ A}, h_2 = {∃r.∃s.⊤ ⊑ A}, h_3 = {∃s.⊤ ⊑ A}, h_4 = h_2 ∪ h_3.

For all n ∈ N, F(ELH, A_n, AQ) shatters {(A_n, A(a_i)) | 1 ≤ i ≤ n}. This does not hold if we add s(a_n, a_n) to A_n.

For discrete cases, in particular for fragments of first-order logic, the lower bounds obtained with the VC-dimension cannot be larger than the size of the learned expressions, assuming a reasonable encoding scheme (Arias and Khardon 2006). The authors argue that many VC-dimension bounds in the literature showing exponential or infinite growth are in terms of some parameters (number of clauses, etc.) determining the size of the target, while other parameters (number of literals, etc.) are ignored. Let F_m = (E, S, L, μ) be a learning framework where the string size of the elements of L is bounded by m. Since the VC-dimension is bounded by the logarithm of |L| (for L finite), VC(F_m) = O(m) (Vapnik 1995).

Proposition 17. For all m ∈ N, F_m(ELH, A*, CQ) is PAC learnable with a polynomial number of example queries.

Since the sample complexity (number of classified examples) is polynomial in the size of the target, polynomial time PAC learnability amounts to showing that one can compute a hypothesis in L that is consistent with the classification of the examples in polynomial time. However, even if A* is fixed, checking whether (A*, q) is a positive example for a hypothesis H is NP-hard if the underlying structure of A* is non-bipartite (Hell and Nešetřil 1990). So (unless P = NP) there is not much hope for polynomial time learnability, even with membership queries, since in this case one may not be able to convert the CQ into an IQ (as we did in Theorem 9).

Conclusion

We introduced the OMQA learning setting and investigated the complexity of learning ELH ontologies in this setting with various query languages. We then considered what happens when the data changes, as well as adaptations to settings where the algorithm learns from classified data, limiting interactions with oracles. Our positive result for IQ-inseparable ELH TBoxes paves the way for further studies on the complexity of learning ontologies formulated in more expressive languages. We leave the problem of exactly learning ELH TBoxes with CQrs as an open problem. Learning with a more expressive query language is not easier, because the oracle can formulate counterexamples which are not informative; nor is it more difficult, because, on the other hand, with a more expressive language the learner can pose more informative membership queries. It would also be interesting to investigate a similar data model in which the ABox is fixed for all the examples, so that the data pool contains examples in the form of queries alone.

Acknowledgments

This research has been supported by the Free University of Bozen-Bolzano through the projects PACO and MLEARN.

References

Angluin, D. 1988. Queries and concept learning. Machine Learning 2(4):319–342.

Arias, M., and Khardon, R. 2006. Complexity parameters for first order classes. Machine Learning 64(1-3):121–144.
Arias, M.; Khardon, R.; and Maloberti, J. 2007. Learning Horn expressions with LOGAN-H. Journal of Machine Learning Research 8:549–587.

Arias, M. 2004. Exact learning of first-order expressions from queries. Ph.D. Dissertation, Citeseer.

Baader, F.; Calvanese, D.; McGuinness, D.; Nardi, D.; and Patel-Schneider, P., eds. 2007a. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, second edition.

Baader, F.; Ganter, B.; Sertkaya, B.; and Sattler, U. 2007b. Completing description logic knowledge bases using formal concept analysis. In IJCAI, 230–235.

Baader, F.; Lutz, C.; and Brandt, S. 2008. Pushing the EL envelope further. In OWLED.

Bienvenu, M. 2016. Ontology-mediated query answering: Harnessing knowledge to get more from data. In IJCAI, 4058–4061.

Bizer, C.; Lehmann, J.; Kobilarov, G.; Auer, S.; Becker, C.; Cyganiak, R.; and Hellmann, S. 2009. DBpedia – A crystallization point for the Web of Data. Web Semantics 7(3):154–165.

Blum, A. L. 1994. Separating distribution-free and mistake-bound learning models over the boolean domain. SIAM J. Comput. 23(5):990–1000.

Borchmann, D., and Distel, F. 2011. Mining of EL-GCIs. In ICDM Workshops, 1083–1090.

Borchmann, D. 2014. Learning terminological knowledge with high confidence from erroneous data. Ph.D. Dissertation, Higher School of Economics.

Botoeva, E.; Lutz, C.; Ryzhikov, V.; Wolter, F.; and Zakharyaschev, M. 2019. Query inseparability for ALC ontologies. Artif. Intell. 272:1–51.

Cimiano, P.; Völker, J.; and Buitelaar, P. 2010. Ontology construction. In Handbook of Natural Language Processing, Second Edition. Chapman and Hall/CRC. 577–604.

Cohen, W. W., and Hirsh, H. 1994. Learning the CLASSIC description logic: Theoretical and experimental results. In KR, 121–133.

Duarte, M. R. C.; Konev, B.; and Ozaki, A. 2018. ExactLearner: A tool for exact learning of EL ontologies. In KR, 409–414.

Fleischhacker, D.; Völker, J.; and Stuckenschmidt, H. 2012. Mining RDF data for property axioms. In On the Move to Meaningful Internet Systems: OTM 2012. Springer. 718–735.

Frazier, M., and Pitt, L. 1996. CLASSIC learning. Machine Learning 25(2-3):151–193.

Funk, M.; Jung, J. C.; Lutz, C.; Pulcini, H.; and Wolter, F. 2019. Learning description logic concepts: When can positive and negative examples be separated? In IJCAI, 1682–1688.

Ganter, B.; Obiedkov, S. A.; Rudolph, S.; and Stumme, G. 2016. Conceptual Exploration. Springer.

Hell, P., and Nešetřil, J. 1990. On the complexity of H-coloring. J. Comb. Theory Ser. B 48(1):92–110.

Klarman, S., and Britz, K. 2015. Ontology learning from interpretations in lightweight description logics. In ILP, 76–90.

Knublauch, H.; Fergerson, R. W.; Noy, N. F.; and Musen, M. A. 2004. The Protégé OWL plugin: An open development environment for semantic web applications. 229–243. Springer.

Konev, B.; Lutz, C.; Ozaki, A.; and Wolter, F. 2018. Exact learning of lightweight description logic ontologies. Journal of Machine Learning Research 18(201):1–63.

Konev, B.; Ozaki, A.; and Wolter, F. 2016. A model for learning description logic ontologies based on exact learning. In AAAI, 1008–1015.

Lehmann, J., and Haase, C. 2009. Ideal downward refinement in the EL description logic. In ILP, 73–87.

Lehmann, J., and Völker, J. 2014. Perspectives on Ontology Learning, volume 18. IOS Press.

Lutz, C., and Wolter, F. 2010. Deciding inseparability and conservative extensions in the description logic EL. J. Symb. Comput. 45(2):194–228.

Mohri, M.; Rostamizadeh, A.; and Talwalkar, A. 2012. Foundations of Machine Learning. Adaptive Computation and Machine Learning. MIT Press.
Rudolph, S. 2004. Exploring relational structures via FLE. In International Conference on Conceptual Structures, ICCS, 196–212.

Sazonau, V., and Sattler, U. 2017. Mining hypotheses from data in OWL: Advanced evaluation and complete construction. In ISWC, 577–593.

Stuckenschmidt, H.; Parent, C.; and Spaccapietra, S., eds. 2009. Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, volume 5445 of Lecture Notes in Computer Science. Springer.

Suchanek, F. M.; Kasneci, G.; and Weikum, G. 2008. YAGO: A large ontology from Wikipedia and WordNet. Web Semantics 6(3):203–217.

Valiant, L. G. 1984. A theory of the learnable. Commun. ACM 27(11):1134–1142.

Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.

Völker, J., and Niepert, M. 2011. Statistical schema induction. In The Semantic Web: Research and Applications. Springer. 124–138.

Völker, J.; Fleischhacker, D.; and Stuckenschmidt, H. 2015. Automatic acquisition of class disjointness. Web Semantics 35:124–139.

Vrandečić, D., and Krötzsch, M. 2014. Wikidata: A free collaborative knowledgebase. Commun. ACM 57(10).

Wong, W.; Liu, W.; and Bennamoun, M. 2012. Ontology learning from text: A look back and into the future. ACM Computing Surveys 44(4):20:1–20:36.