# querybased_entailment_and_inseparability_for__c1ed3c1d.pdf

Query-Based Entailment and Inseparability for ALC Ontologies

Elena Botoeva,1 Carsten Lutz,2 Vladislav Ryzhikov,1 Frank Wolter,3 Michael Zakharyaschev4

1Faculty of Computer Science, Free University of Bozen-Bolzano

2Fachbereich Informatik, University of Bremen 3Department of Computer Science, University of Liverpool 4Department of Computer Science, Birkbeck, University of London

We investigate the problem whether two ALC knowledge bases are indistinguishable by queries over a given vocabulary. We give model-theoretic criteria and prove that this problem is undecidable for conjunctive queries (CQs) but decidable in 2EXPTIME for unions of rooted CQs. We also consider the problem whether two ALC TBoxes give the same answers to any query in a given vocabulary over all ABoxes, and show that for CQs this problem is undecidable, too, but becomes decidable and 2EXPTIME-complete in Horn-ALC, and even EXPTIME-complete in Horn-ALC when restricted to (unions of) rooted CQs.

1 Introduction

In recent years, data access using description logic (DL) TBoxes has become one of the most important applications of DLs [Poggi et al., 2008; Bienvenu and Ortiz, 2015], where the underlying idea is to use a TBox to specify semantics and background knowledge for the data (stored in an ABox), and thereby derive more complete query answers. A major research effort has led to the development of efﬁcient algorithms and tools for a number of DLs ranging from DL-Lite [Calvanese et al., 2007; Rodriguez-Muro et al., 2013] via more expressive Horn DLs such as Horn-ALC [Eiter et al., 2012; Trivela et al., 2015] to DLs with all Boolean constructors such as ALC [Kollia and Glimm, 2013; Zhou et al., 2015].

While query answering with DLs is now well-developed, this is much less the case for reasoning services that support ontology engineering and target query answering as an application. In ontology versioning, for example, one would like to know whether two versions of an ontology give the same answers to all queries formulated over a given vocabulary of interest, which means that the newer version can safely replace the older one [Konev et al., 2012]. Similarly, if one wants to know whether an ontology can be safely replaced by a smaller subset (module), it is the answers to all queries that should be preserved [Kontchakov et al., 2010]. In this context, the fundamental relationship between ontologies is thus not whether they are logically equivalent (have the same models), but whether they give the same answers to any relevant query.

The resulting entailment problem can be formalized in two ways, with different applications. First, given a class Q of queries, knowledge bases (KBs) K1 and K2, and a signature of relevant concept and role names, we say that K1 -Qentails K2 if the answers to any -query in Q over K2 are contained in the answers to the same query over K1. Further, K1 and K2 are -Q-inseparable if they -Q-entail each other. Note that a KB includes an ABox, and thus this notion of entailment is appropriate if the data is known and does not change frequently. Applications include data-oriented KB versioning and KB module extraction, KB forgetting [Wang et al., 2014], and knowledge exchange [Arenas et al., 2013].

If the data is not known or changes frequently, it is not KBs that should be compared, but TBoxes. Given a pair = ( 1, 2) that speciﬁes a relevant signature 1 for ABoxes and 2 for queries, we say that a TBox T1 -Qentails a TBox T2 if, for every 1-ABox A, the KB (T1, A) 2-Q-entails (T2, A). T1 and T2 are -Q-inseparable if they -Q-entail each other. Applications include data-oriented TBox versioning, TBox modularization and TBox forgetting [Kontchakov et al., 2010].

In this paper, we concentrate on the most important choices for Q, conjunctive queries (CQs) and unions thereof (UCQs); we also consider the practically relevant classes of rooted CQs (r CQs) and UCQs (r UCQs), in which every variable is connected to an answer variable. So far, CQ-entailment has been studied for Horn DL KBs [Botoeva et al., 2014], EL TBoxes [Lutz and Wolter, 2010; Konev et al., 2012], DL-Lite TBoxes [Kontchakov et al., 2009], and also for OBDA speciﬁcations, that is, DL-Lite TBoxes with mappings [Bienvenu and Rosati, 2015]. No results are available for non-Horn DLs (neither in the KB nor in the TBox case) and for expressive Horn DLs in the TBox case. In particular, query entailment in non-Horn DLs has had the reputation of being a technically challenging problem.

This paper makes a ﬁrst breakthrough into understanding query entailment and inseparability in these cases, with the main results summarized in Figures 1 and 2 (those marked with (?) are from [Botoeva et al., 2014]). Three of them came as a real surprise to us. First, it turned out that CQand r CQentailment between ALC KBs is undecidable, even when the ﬁrst KB is formulated in Horn-ALC (in fact, EL) and without any signature restriction. This should be contrasted with the decidability of subsumption-based entailment between ALC

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

Queries ALC Horn-ALC

ALC to Horn-ALC Horn-ALC

CQ undecidable ? =EXPTIME(?) UCQ ? r CQ undecidable 2EXPTIME =EXPTIME(?) r UCQ 2EXPTIME

Figure 1: KB query entailment.

Queries ALC Horn-ALC

ALC to Horn-ALC Horn-ALC

CQ undecidable ? =2EXPTIME UCQ ? r CQ undecidable =EXPTIME =EXPTIME r UCQ ?

Figure 2: TBox query entailment.

TBoxes [Ghilardi et al., 2006] and of CQ-entailment between Horn-ALC KBs [Botoeva et al., 2014]. The second surprising result is that entailment between ALC KBs becomes decidable when CQs are replaced with r UCQs. For ALC TBoxes, CQand r CQ-entailment are undecidable as well. We obtain decidability for Horn-ALC TBoxes (where CQund UCQentailments coincide) using the fact that non-entailment is always witnessed by tree-shaped ABoxes. As another surprise, CQ-entailment of Horn-ALC TBoxes is 2EXPTIMEcomplete while r CQ-entailment is only EXPTIME-complete. This should be contrasted with the EL case, where both problems are EXPTIME-complete [Lutz and Wolter, 2010]. All upper bounds and most lower bounds hold also for inseparability in place of entailment. A model-theoretic foundation for these results is a characterization of query entailment between KBs and TBoxes in terms of (partial) homomorphisms, which, in particular, enables the use of tree automata techniques to establish the upper bounds in Figs. 1 and 2. Omitted proofs are available in the full version [Botoeva et al., 2016].

2 Preliminaries

Fix lists of individual names ai, concept names Ai, and role names Ri, for i < !. ALC-concepts, C, are deﬁned by the grammar

C ::= Ai | > | C | C1 u C2 | 9Ri.C.

We use ?, C1 t C2 and 8R.C as abbreviations for >, ( C1 u C2) and 9R. C, respectively. A concept inclusion (CI) takes the form C v D, where C and D are concepts. An ALC TBox is a ﬁnite set of CIs. In a Horn-ALC TBox, no concept of the form C occurs negatively and no 9R. C occurs positively [Hustadt et al., 2005; Kazakov, 2009]. An EL TBox does not contain at all. An ABox, A, is a ﬁnite set of assertions of the form Ak(ai) or

Rk(ai, aj); ind(A) is the set of individual names in A. Taken together, T and A form a knowledge base (KB) K = (T , A); we set ind(K) = ind(A).

The semantics is deﬁned as usual based on interpretations I = ( I, I) that comply with the standard name assumption in the sense that a I

i = ai [Baader et al., 2003]. We write I |= if an inclusion or assertion is true in I. If I |= , for all 2 T [ A, then we call I a model of K and write I |= K. K is consistent if it has a model; we then also say that A is consistent with T . K |= means that I |= for all I |= K.

A conjunctive query (CQ) q(x) is a formula 9y '(x, y), where ' is a conjunction of atoms of the form Ak(z1) or Rk(z1, z2) with zi in x, y; the variables in x are the answer variables of q(x). We call q rooted (r CQ) if every y 2 y is connected to some x 2 x by a path in the graph whose

nodes are the variables in q and edges are the pairs {u, v} with R(u, v) 2 q, for some R. A union of CQs (UCQ) is a disjunction q(x) = W

i qi(x) of CQs qi(x) with the same answer variables x; it is rooted (r UCQ) if all qi are rooted.

A tuple a in ind(K) is a certain answer to a UCQ q(x) over K = (T , A) if I |= q(a) for all I |= K; in this case we write K |= q(a). If x = ;, the answer to q is yes if K |= q and no otherwise. The problem of checking whether a tuple is a certain answer to a given (U)CQ over a given ALC KB is known to be EXPTIME-complete for combined complexity [Lutz, 2008]. The EXPTIME lower bound actually holds for Horn-ALC [Kr otzsch et al., 2013].

A set M of models of a KB K is called complete for K if, for every UCQ q(x), we have K |= q(a) iff I |= q(a) for all I 2 M. We call an interpretation I a ditree interpretation if the directed graph GI with nodes d 2 I and edges (d, e) 2 RI, for some R, is a tree and RI \ SI = ;, for any distinct roles R and S. I has outdegree n if GI has outdegree n. A model I of a KB K = (T , A) is forest-shaped if I is the disjoint union of ditree interpretations Ia with root a, for a 2 ind(A), extended with all R(a, b) 2 A. The outdegree of I is the maximum outdegree of the Ia. It is well known that the class M fo

K of all forest-shaped models of an ALC KB K of outdegree bounded by |T | is complete for K [Lutz, 2008]. If K is a Horn-ALC KB, then a single member IK of M fo

K is complete for K. IK is constructed using the standard chase procedure and called the canonical model of K.

A signature, , is a set of concept and role names. By a -concept, -CQ, etc. we understand any concept, CQ, etc. constructed using the names from . We say that is full if it contains all concept and role names. A model I of a KB K is -connected if, for any u 2 I \ ind(K), there is a path RI

1 (a, u1), . . . , RI

n(un, u) with a 2 ind(K) and the Ri in . Deﬁnition 1. Let K1 and K2 be consistent KBs, a signature, and Q one of CQ, r CQ, UCQ or r UCQ. We say that K1 -Q-entails K2 if K2 |= q(a) implies a ind(K1) and K1 |= q(a), for all -Q q(x) and all tuples a in ind(K2). K1 and K2 are -Q inseparable if they -Q entail each other.

As larger classes of queries separate more KBs, -UCQ inseparability implies all other inseparabilities. The following example shows that, in general, no other implications between the different notions of inseparability hold for ALC. Example 2. Suppose T0 = ;, T 0

0 = {E v A t B} and 0 = {A, B, E}. Let A0 = {E(a)}, K0 = (T0, A0), and K0

0, A0). Then K0 and K0

0 are 0-CQ inseparable but not 0-r UCQ inseparable. In fact, K0

0 |= q(a) and K0 6|= q(a) for q(x) = A(x) _ B(x).

Now, let 1 = {E, B}, T1 = ;, and T 0

1 = {E v 9R.B}. Let A1 = {E(a)}, K1 = (T1, A1), and K0

Then K1 and K0

1 are 1-r UCQ inseparable but not 1-CQ inseparable. In fact, K0

1 |= 9x B(x) but K1 6|= 9x B(x). Deﬁnition 3. Let T1 and T2 be TBoxes, Q one of CQ, r CQ, UCQ or r UCQ, and let = ( 1, 2) be a pair of signatures. We say that T1 -Q entails T2 if, for every 1-ABox A that is consistent with both T1 and T2, the KB (T1, A) 2-Q entails the KB (T2, A). T1 and T2 are -Q inseparable if they - Q entail each other. If 1 is the set of all concept and role names, we say full ABox signature 2-Q entails or full ABox signature 2-Q inseparable .

We only consider ABoxes that are consistent with both TBoxes because the problem whether a 1-ABox consistent with T2 is also consistent with T1 is well understood: it is mutually polynomially reducible with the containment problem for ontology-mediated queries with CQs of the form 9x A(x), which is NEXPTIME-complete for ALC and EXPTIME-complete for Horn-ALC [Bienvenu et al., 2012; 2014]. Example 4. Consider the TBoxes T0 and T 0

0 from Example 2 and let = ( , ) for = {R, A, B, E}. Then T0 does not -r CQ entail T 0

0, A) |= q(a) and (T0, A) 6|= q(a) for

B R R q(x):

We observe that -CQ-entailment in the restricted case with = ( , ) has been investigated for EL TBoxes by Lutz and Wolter [2010] and Konev et al. [2012].

As in the KB case, -UCQ inseparability of ALC TBoxes implies all other types of inseparability, and Example 2 can be used to show that no other implications hold in general. The situation changes for Horn-ALC KBs and TBoxes. The following can be proved by observing that a Horn-ALC KB entails a UCQ iff it entails one of its disjuncts: Theorem 5. Let K1 be an ALC KB and K2 a Horn-ALC KB. Then K1 -UCQ entails K2 iff K1 -CQ entails K2. The

same holds for r UCQ and r CQ, and for TBox entailment.

3 Model-Theoretic Criteria for ALC KBs We now give model-theoretic criteria for -entailment between KBs. The product Q I of a set I of interpretations is deﬁned as usual in model theory [Chang and Keisler, 1990, page 405]. Note that, for any CQ q(x) and any tuple a of individual names, Q I |= q(a) iff I |= q(a) for each I 2 I.

Suppose Ii is an interpretation for a KB Ki, i = 1, 2. A function h: I2 ! I1 is called a -homomorphism if u 2 AI2 implies h(u) 2 AI1 and (u, v) 2 RI2 implies (h(u), h(v)) 2 RI1 for all u, v 2 I2, -concept names A,

and -role names R, and h(a) = a for all a 2 ind(K2). It is known from database theory that homomorphisms characterize CQ-containment [Chandra and Merlin, 1977]. For KB -query entailment, ﬁnite partial homomorphisms are required. We say that I2 is n -homomorphically embeddable into I1 if, for any subinterpretation I0

2 of I2 with | I0

2| n, there is a -homomorphism from I0

2 to I1. If, additionally, we require I0

2 to be -connected then I2 is said to be con-n - homomorphically embeddable into I1.

Theorem 6. Let K1 and K2 be ALC KBs, a signature, and let Mi be complete for Ki, i = 1, 2.

(1) K1 -UCQ entails K2 iff, for any n > 0 and I1 2 M1,

there exists I2 2 M2 that is n -homomorphically embeddable into I1. (2) K1 -r UCQ entails K2 iff, for any n > 0 and I1 2 M1,

there exists I2 2 M2 that is con-n -homomorphically embeddable into I1. (3) K1 -CQ entails K2 iff Q M2 is n -homomorphically

embeddable into Q M1 for any n > 0. (4) K1 -r CQ entails K2 iff Q M2 is con-n -homomor-

phically embeddable into Q M1 for any n > 0.

Proof. We only show (1). Suppose K2 |= q but K1 6|= q. Let n be the number of variables in q. Take I1 2 M1 such that I1 6|= q. Then no I2 2 M2 is n -homomorphically embeddable into I1. Conversely, suppose I1 2 M1 is such that, for some n, no I2 2 M2 is n -homomorphically embeddable into I1. We can regard any subinterpretation of any I2 2 M2 with domain of size n as a CQ (with answer variable corresponding to ABox individuals). The disjunction of all such CQs is entailed by K2 but not by K1.

Note that n -homomorphic embeddability cannot be replaced by -homomorphic embeddability. For example, in (1), let K1 = K2 = ({> v 9R.>}, {A(a)}), M1 = {I1}, where I1 is the inﬁnite R-chain starting with a, and let M2 contain arbitrary ﬁnite R-chains starting with a followed by an arbitrary long R-cycle. M1 and M2 are both complete for K, but there is no -homomorphism from any I2 2 M2 to I1. In Section 5, we show that in some cases we can ﬁnd characterizations with full -homomorphisms and use them to present decision procedures for entailment.

If both Mi are ﬁnite and contain only ﬁnite interpretations, then Theorem 6 provides a decision procedure for KB entailment. This applies, for example, to KBs with acyclic classical TBoxes [Baader et al., 2003], and to KBs for which the chase terminates [Grau et al., 2013].

4 Undecidability for ALC KBs and TBoxes

We show that CQ and r CQ-entailment and inseparability for ALC KBs are undecidable even if the signature is full and K1 is a Horn-ALC (in fact, EL) KB. We establish the same results for TBoxes except that in the r CQ case, we leave it open whether the full ABox signature is sufﬁcient for undecidability.

Theorem 7. (i) The problem whether a Horn-ALC KB -Q entails an ALC KB is undecidable for Q 2 {CQ, r CQ}.

(ii) -Q inseparability between Horn-ALC and ALC KBs is undecidable for Q 2 {CQ, r CQ}.

(iii) Both (i) and (ii) hold for the full signature .

Proof. The proof is by reduction of the undecidable N Mtiling problem: given a ﬁnite set T of tile types T with four colours up(T), down(T), left(T) and right(T), a tile type I 2 T, and two colours W (for wall) and C (for ceiling), decide whether there exist N, M 2 N such that the N M grid can be tiled using T in such a way that (1, 1) is covered by a tile of type I; every (N, i), for i M, is covered by a tile

T0 T0 T0 T1 T1 T1 T1

End T2 _ Row Row Start

hl hl hr hr

A Start I0 T N1

0 Start T 12

I0. . . T N1

0 . . . T 1M-2

0 . . . T NM-2

1 . . . T NM-1

Start B1 BN BN+1 Bn N Bn 1 Bn End qn

Figure 3: The structure of models Il and Ir of K2, and homomorphisms hl : qn ! Il and hr : qn ! Ir.

of type T with right(T) = W; and every (i, M), for i N, is covered by a tile of type T with up(T) = C.

Given an instance of this problem, we ﬁrst describe a KB K2 = (T2, {A(a)}) that uses (among others) 3 concept names Tk, k = 0, 1, 2, for each tile type T 2 T. If a point x in a model I of K2 is in Tk and right(T) = left(T 0), then x has an R-successor in T 0

k. Thus, branches of I deﬁne (possibly inﬁnite) horizontal rows of tilings with T. If a branch contains a point y 2 Tk with right(T) = W, then this y can be the last point in the row, which is indicated by an R-successor z 2 Row of y. In turn, z has R-successors in all T(k+1) mod 3 that can be possible beginnings of the next row of tiles. To coordinate the up and down colours between the rows which will be done by the CQs separating K1 and K2 we make every x 2 Tk, starting from the second row, an instance of all T 0

(k 1) mod 3 with down(T) = up(T 0). The row started by z 2 Row can be the last one in the tiling, in which case we require that each of its tiles T has up(T) = C. After the point in Row indicating the end of the ﬁnal row, we add an R-successor in End for the end of tiling. The beginning of the ﬁrst row is indicated by a P-successor in Start of the ABox element a, after which we add an R-successor in I0 for the given initial tile type I; see the lowest branch in Fig. 3. To generate a tree with all possible branches described above, we only require EL axioms of the form E v D and E v 9S.D.

The existence of a tiling of some N M grid for the given instance can be checked by Boolean CQs qn that require an R-path from Start to End going through Tkor Row-points:

R(xi, xi+1)

Bi(xi) End(xn+1)

with Bi 2 {Row} [ {Tk | T 2 T, k = 0, 1, 2}; see Fig. 3. The key trick is using an axiom of the form D v E t E0 to ensure that the Row-point before the ﬁnal row of the tiling has two alternative continuations: one as described above, and the other one having just a single R-successor in End; see Fig. 3 where _ indicates an or-node. This or-node gives two models of K2 denoted Il and Ir in the picture. If K2 |= qn, then qn holds in both of them, and so there are homomorphisms hl : qn ! Il and hr : qn ! Ir. As hl(xn 1) and hr(xn 1) are instances of Bn 1, we have Bn 1 = T NM 1

1 in the picture, and so up(T NM 1) = down(T NM). By repeating this argument until x0, we see that the colours between horizontal rows match and the rows are of the same length. (For this trick to work, we have to make the ﬁrst Row-point in every branch

an instance of Start.) In fact, we have:

Lemma 8. An instance of the N M-tiling problem has a positive answer iff there exists qn such that K2 |= qn.

It is to be noted that to construct T2 with the properties described above one needs quite a few auxiliary concept names.

Next, we deﬁne K1 = (T1, {A(a)}) to be the EL KB with the following canonical model:

End, 0 End, 0

End, 0 End, 0

End, 0 End, 0

where 0 = {Row} [ {Tk | T 2 T, k = 0, 1, 2}. Note that the vertical R-successors of the Start-points are not instances of any concept name, and so K1 does not satisfy any query qn. On the other hand, K2 |= q implies K1 |= q, for every -CQ q without a subquery of the form qn and = sig(K1).

This proves (i) for -CQ entailment. For -r CQ entailment, we slightly modify the construction, in particular, by adding R(a, a) and Row(a) to the ABox {A(a)}, and a conjunct R(y, x0) with a free y to qn. (The loop R(a, a) plays roughly the same role as the path between two Start-points in Fig. 3.) To prove (ii), we take K0

2 = K2 [ K1 and show that K1 -CQ entails K2 iff K1 and K0

2 are -CQ inseparable. Finally, we prove (iii) by replacing nonsymbols in K2 with complex ALC-concepts that cannot be used in CQs and extending the TBoxes appropriately; cf. [Lutz and Wolter, 2012, Lemma 21].

The TBoxes from the proof above can also be used to obtain

Theorem 9. (i) The problem whether a Horn-ALC TBox -Q entails an ALC TBox is undecidable for Q 2 {CQ, r CQ}.

(ii) -Q inseparability between Horn-ALC and ALC TBoxes is undecidable for Q 2 {CQ, r CQ}.

(iii) For CQs, (i) and (ii) hold for full ABox signatures and for = ( 1, 2) with 1 = 2.

Observe that our undecidability proof does not work for UCQs as the UCQ composed of the two disjunctive branches shown in Fig. 3 (for non-trivial instances) distinguishes between the KBs independently of the existence of a tiling. We now show that, at least for r UCQs, entailment is decidable.

5 r UCQ-Entailment for ALC-KBs

Theorem 7 might seem to suggest that any reasonable notion of query inseparability is undecidable for ALC KBs. Interestingly, this is not the case: we show now that r UCQ-entailment is decidable. We ﬁrst strengthen the characterization of Theorem 6 (2), and then develop a decision procedure based on tree automata. The ﬁrst step replaces con-n -homomorphic embeddability with con- -homomorphic embeddability, where I2 is con- -homomorphically embeddable into I1 if the maximal -connected subinterpretation of I2 is - homomorphically embeddable into I1.

Theorem 10. Let K1 and K2 be ALC KBs, a signature, and let M1 be complete for K1. Then K1 -r UCQ entails K2 iff for any I1 2 M1, there exists I2 |= K2 such that I2 is con- -homomorphically embeddable into I1.

Proof. In view of Theorem 6 (2), it sufﬁces to prove ()). Suppose I1 2 M1. By Theorem 6 (2), for every n 0, we have J 2 M fo

K2 and a -homomorphism hn : J| n ! I1, where J| n is the subinterpretation of J whose elements are connected to ABox individuals by -paths of length n. Clearly, for any n 0, there are only ﬁnitely many non-isomorphic pairs (J| n, hn). It can be shown that, thus, one can construct the required I2 2 M fo

K2 and con- -homomorphism h as the limits of suitable chains J| 0 J| 1 and h0 h1 , respectively.

For the second step, let K1, K2 be ALC-KBs and a signature. We use two-way alternating automata on inﬁnite trees (2ATAs) with a trivial acceptance condition (every run is accepting) and employ Theorem 10 for the class M fo

K1, encoding forest-shaped interpretations as labeled trees to make them accessible to 2ATAs. A tree is a non-empty (possibly inﬁnite) set T

closed under preﬁxes with root ". We say that T is m-ary if, for every x 2 T, the set {i | x i 2 T} is of cardinality m. Let Γ be an alphabet with symbols from the set

{root, empty} [ (ind(K1) 2CN(T1)) [ (RN(T1) 2CN(T1)),

where CN(Ti) (resp. RN(Ti)) denotes the set of concept (resp. role) names in Ti. A Γ-labeled tree is a pair (T, L) with T a tree and L: T ! Γ a node labeling function. We represent forest-shaped models of T1 as m-ary Γ-labeled trees, with m = max(|T1|, |ind(K1)|). The root node labeled with root is not used in the representation. Each ABox individ-

ual is represented by a successor of the root labeled with a symbol from ind(K1) 2CN(T1); non-ABox elements are represented by nodes deeper in the tree labeled with a symbol from RN(T1) 2CN(T1). The label empty is used for padding to make sure that every tree node has exactly m successors.

Now we construct three 2ATAs Ai, for i = 0, 1, 2. A0 ensures that the tree is labeled in a meaningful way, e.g. that the root label only occurs at the root node; A1 accepts Γ-labeled trees that represent a model of K1, and A2 accepts Γ-labeled trees (T, L) which represent an interpretation I(T,L) such that some model of K2 is con- -homomorphically embeddable into I(T,L). The most interesting automaton is A2, which guesses a model of K2 along with a homomorphism to I(T,L);

in fact, both can be read off from a successful run of the automaton. The number of states of the Ai is exponential in |K1 [ K2|. It then remains to combine these automata into a single 2ATA A such that L(A) = L(A0) \ L(A1) \ L(A2), which is possible with only polynomial blowup, and to test (in time exponential in the number of states) whether L(A) = ;.

Theorem 11. It is in 2EXPTIME to decide whether an ALC KB K1 -r UCQ entails an ALC KB K2.

The best known lower bound is EXPTIME, which is easy to establish by reduction from satisﬁability.

6 (r)CQ-Entailment for (Horn-)ALC-TBoxes

We show that CQand r CQ-entailment between ALC TBoxes becomes decidable when the second TBox is given in Horn ALC. In this case, entailments for CQs and UCQs and, respectively, r CQs and r UCQs coincide. We start with r CQs.

Our ﬁrst observation is that if a 1-ABox is a witness for non- -r CQ entailment, then one can ﬁnd a witness 1-ABox that is tree-shaped and of bounded outdegree. Here, an ABox A is tree-shaped if the graph with nodes ind(A) and edges {a, b} for each R(a, b) 2 A is a tree, and R(a, b) 2 A implies S(a, b) /2 A for all S 6= R and S(b, a) /2 A for all S.

Theorem 12. Let T1 be an ALC TBox, T2 a Horn-ALC TBox, and = ( 1, 2). Then T1 -r CQ-entails T2 iff, for all tree-shaped 1-ABoxes A of outdegree bounded by |T2| and consistent with T1 and T2, IT2,A is con2-homomorphically embeddable into any model I1 of (T1, A).

Proof. It is known that Horn-ALC is unravelling tolerant, that is, (T , A) |= C(a) for a Horn-ALC TBox T and ELconcept C iff (T , A0) |= C(a) for a ﬁnite sub-ABox A0 of the tree-unravelling of A at a [Lutz and Wolter, 2012]. Thus, any witness ABox for non-entailment w.r.t. EL-instance queries can be transformed into a tree-shaped witness ABox. The result follows by observing that if T1 does not -r CQ-entail T2, then this is witnessed by an EL-instance query and by applying Theorem 10 to the KBs. The bound on the outdegree is obtained by a careful analysis of derivations.

For the automaton construction, let T1 be an ALC TBox, T2 a Horn-ALC TBox, and = ( 1, 2) a pair of signatures. Though Theorem 12 provides a natural characterization that is similar in spirit to Theorem 10, we ﬁrst need a further analysis of con2-homomorphic embeddability in terms of simulations whose advantage is that they are more compositional (they can be partial and are closed under union).

Let I1, I2 be interpretations and a signature. A relation S I1 I2 is a -simulation from I1 to I2 if (i) d 2 AI1 and (d, d0) 2 S imply d0 2 AI2 for all -concept names A, and (ii) if (d, e) 2 RI1 and (d, d0) 2 S then there is a (d0, e0) 2 RI2 with (e, e0) 2 S for all -role names R. Let

di 2 Ii, i 2 {1, 2}. (I1, d1) is -simulated by (I2, d2), in symbols (I1, d1) (I2, d2), if there exists a -simulation S with (d1, d2) 2 S.

Lemma 13. Let A be a 1-ABox and I1 a model of (T1, A). Then IT2,A is not con2-homomorphically embeddable into

I1 iff there is a 2 ind(A) such that one of the following holds:

(1) there is a 2-concept name A with a 2 AIT2,A \ AI1; (2) there is an R-successor d of a in IT2,A, for some 2-role

name R, such that d /2 ind(A) and, for all R-successors e of a in I1, we have (IT2,A, d) 6 2 (I1, e). We use a mix of two-way alternating B uchi automata on ﬁnite trees (2ABTAs) and non-deterministic top-down automata on ﬁnite trees (NTAs). A ﬁnite tree T is m-ary if, for every x 2 T, the set {i | x i 2 T} is of cardinality zero or exactly m. We use labeled trees to represent a tree-shaped ABox A and a model I1 such that, for some a 2 ind(A), conditions (1) and (2) from Lemma 13 are satisﬁed, and thus IT2,A is not con2-homomorphically embeddable into I1. To ensure that later, additional bookkeeping information is needed. Node labels are taken from the alphabet

Γ = Γ0 2cl(T1) 2CN(T2) {0, 1} 2sub(T2),

where Γ0 is the set of all subsets of 1 [ {R | R 2 1} that contain at most one role (a role name R or its inverse R ), cl(Ti) is the set of subconcepts of (concepts in) Ti closed under single negation, and sub(T2) is the set of subconcepts of (concepts in) T2. For a Γ-labeled tree (T, L) and a node x from T, we use Li(x) to denote the (i+1)st component of L(x), where i 2 {0, . . . , 4}. Intuitively, the L0-component represents the ABox A, the L1-component the model I1, the L2-component represents IT2,A, and the L3and L4-components help to guarantee conditions (1) and (2) from Lemma 13.

To ensure that each component i 2 {0, . . . , 4} indeed represents what it is supposed to, we impose on it an i-properness condition. For example, a Γ-labeled (T, L) tree is 0-proper if (i) L0(") contains no role and (ii) for every non-root node x of T, L0(x) contains a role. A 0-proper Γ-labeled tree (T, L) represents the following tree-shaped 1-ABox:

A(T,L) = {A(x) | A 2 L0(x)} [

{R(x, y) | R 2 L0(y), y is a child of x} [ {R(y, x) | R 2 L0(y), y is a child of x}.

Due to space limitations, we skip the remaining deﬁnitions of properness and concentrate on explaining the most interesting components L3 and L4 of Γ-labels. The L3-component marks a single node x in the tree, which is the individual a from Lemma 13 that satisﬁes conditions (1) and (2). If (1) is satisﬁed, we do not need the L4-component. Otherwise, we store in that component at x a set of concepts S = {9R.A, 8R.B1, . . . , 8R.Bn} such that R 2 2 and all concepts from S are true at x in IT2,A. This successor set represents the R-successor d in condition (2) of Lemma 13. We then have to make sure that, for any neighboring node y of x that represents an R-successor of x in A(T,L), we have (IT2,A, d) 6 2 (I1, y). This can again happen via a concept

name or via a successor; we are done in the fomer case and use the L4-component of y in the latter. It is important to note that we can never return to the same node in this tracing process since we only follow roles in the forward direction and the represented ABox is tree-shaped. This is crucial for achieving the EXPTIME overall complexity.

We show that T2 is not -r CQ-entailed by T1 iff there is an m-ary Γ-labeled tree that is i-proper for any i 2 {0, . . . , 4}. It then remains to design a 2ABTA A that accepts exactly those

trees. We construct A as the intersection of ﬁve automata Ai, i < 5, where each Ai ensures i-properness. Some of the automata are 2ABTAs with polynomially many states while others are NTAs with exponentially many states. We mix automata models since some properness conditions (2properness) are much easier to describe with a 2ABTA while for others (4-properness), it does not seem to be possible to construct a 2ABTA with polynomially many states. In summary, we obtain the following result. Theorem 14. It is EXPTIME-complete to decide whether an ALC TBox T1 ( 1, 2)-r CQ entails a Horn-ALC TBox T2.

Note that the EXPTIME lower bound holds already for entailment of EL TBoxes and 1 = 2 [Lutz and Wolter, 2010]. We now study the non-rooted case, starting with an analogue of Theorem 12. As expected, moving to unrestricted queries corresponds to moving to unrestricted homomorphisms. Theorem 15. Let T1 and T2 be Horn-ALC TBoxes and = ( 1, 2). Then T1 -CQ entails T2 iff, for all tree-shaped 1-

ABoxes A of outdegree |T2| and consistent with T1 and T2,

IT2,A is 2-homomorphically embeddable into IT1,A.

The automata construction described above can largely be reused for this case. The main difference is that the two conditions in Lemma 13 need to be extended with a third one: there is an element d in the subtree of IT2,A rooted at a that has an R-successor d0, R /2 2, such that, for all elements e of I1, we have (I2, d0) 6 2 (I1, e). To deal with this condition, it becomes necessary to store multiple successor sets in the L4components instead of only a single one, which increases the overall complexity to 2EXPTIME. A matching lower bound can be proved by a (non-trivial) reduction of the word problem for exponentially bounded alternating Turing machines. Theorem 16. -CQ entailment for Horn-ALC TBoxes is 2EXPTIME-complete. The lower bound holds for = ( , ).

7 Future Work We have made ﬁrst steps towards understanding query entailment and inseparability for KBs and TBoxes in expressive DLs. Many problems remain to be addressed. From a theoretical viewpoint, it would be of interest to solve the open problems in Figures 1 and 2, and also consider other expressive DLs such as DL-Lite H

bool [Artale et al., 2009] or ALCI. For example, if Theorem 10 could be generalized to UCQs (and -homomorphisms), we would obtain a 2EXPTIME upper bound for UCQ-entailment between ALC KBs using the same technique as for r UCQs. Also, our undecidability proof goes through for DL-Lite H

bool, but the other cases remain open. From a practical viewpoint, our model-theoretic criteria for query entailment are a good starting point for developing algorithms for approximations of query entailment based on simulations. Our undecidability and complexity results also indicate that r UCQ-entailment is more amenable to practical algorithms than, say, CQ-entailment and can be used as an approximation of the latter. Acknowledgments. This work has been supported by the EU IP project Optique, grant n. FP7-318338, DFG grant LU 1417/2-1, and EPSRC UK grants EP/M012646/1 and EP/M012670/1 (i Tract).

References [Arenas et al., 2013] Marcelo Arenas, Elena Botoeva, Diego

Calvanese, and Vladislav Ryzhikov. Exchanging OWL 2 QL knowledge bases. In Proc. of IJCAI. AAAI Press, 2013. [Artale et al., 2009] Alessandro Artale, Diego Calvanese, Ro-

man Kontchakov, and Michael Zakharyaschev. The DLLite family and relations. J. of Art. Intel. Research, 2009. [Baader et al., 2003] Franz Baader, Diego Calvanese, Deb-

orah Mc Guinness, Daniele Nardi, and Peter F. Patel Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. 2003. [Bienvenu and Ortiz, 2015] Meghyn Bienvenu and Mag-

dalena Ortiz. Ontology-mediated query answering with data-tractable description logics. In RW, 2015. [Bienvenu and Rosati, 2015] Meghyn Bienvenu and Ric-

cardo Rosati. Query-based comparison of OBDA speciﬁcations. In Proc. of DL, volume 1350. CEUR-WS, 2015. [Bienvenu et al., 2012] Meghyn Bienvenu, Carsten Lutz, and

Frank Wolter. Query containment in description logics reconsidered. In Proc. of KR, pages 221 231, 2012. [Bienvenu et al., 2014] Meghyn Bienvenu, Balder ten Cate,

Carsten Lutz, and Frank Wolter. Ontology-based data access: A study through Disjunctive Datalog, CSP, and MMSNP. ACM Trans. on Database Systems, 39(4), 2014. [Botoeva et al., 2014] Elena Botoeva, Roman Kontchakov,

Vladislav Ryzhikov, Frank Wolter, and Michael Zakharyaschev. Query inseparability for description logic knowledge bases. In Proc. of KR, pages 238 247, 2014. [Botoeva et al., 2016] Elena Botoeva, Carsten Lutz, Vladislav Ryzhikov, Frank Wolter, and Michael Zakharyaschev. Query-based entailment and inseparability for ALC ontologies (full version). Corr technical report, 2016. Available at arxiv.org/abs/1604.04164. [Calvanese et al., 2007] Diego Calvanese, Giuseppe De Gi-

acomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385 429, 2007. [Chandra and Merlin, 1977] Ashok K. Chandra and Philip M.

Merlin. Optimal implementation of conjunctive queries in relational data bases. In Proc. of STOC, pages 77 90, 1977. [Chang and Keisler, 1990] C.C. Chang and H.J. Keisler.

Model Theory. North-Holland, Amsterdam, 1990. [Eiter et al., 2012] Thomas Eiter, Magdalena Ortiz, Mantas

Simkus, Trung-Kien Tran, and Guohui Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of AAAI, pages 726 733. AAAI Press, 2012. [Ghilardi et al., 2006] S. Ghilardi, C. Lutz, and F. Wolter. Did

I damage my ontology? A case for conservative extensions in description logics. In Proc. of KR, 2006. [Grau et al., 2013] Bernardo Cuenca Grau, Ian Horrocks,

Markus Kr otzsch, Clemens Kupke, Despoina Magka, Boris Motik, and Zhe Wang. Acyclicity notions for existential

rules and their application to query answering in ontologies. J. of Art. Intel. Research, 47:741 808, 2013. [Hustadt et al., 2005] Ulrich Hustadt, Boris Motik, and Ul-

rike Sattler. Data complexity of reasoning in very expressive description logics. In Proc. IJCAI, 2005. [Kazakov, 2009] Yevgeny Kazakov. Consequence-driven rea-

soning for Horn SHIQ ontologies. In Proc. of IJCAI, 2009. [Kollia and Glimm, 2013] Ilianna Kollia and Birte Glimm.

Optimizing SPARQL query answering over OWL ontologies. J. of Art. Intel. Research, 48:253 303, 2013. [Konev et al., 2012] Boris Konev, Michel Ludwig, Dirk

Walther, and Frank Wolter. The logical difference for the lightweight description logic EL. J. of Art. Intel. Research, 44:633 708, 2012. [Kontchakov et al., 2009] R. Kontchakov, L. Pulina, U. Sat-

tler, T. Schneider, P. Seimer, F. Wolter, and M. Zakharyaschev. Minimal module extraction from DL-Lite ontologies using QBF solvers. In Proc. of IJCAI, 2009. [Kontchakov et al., 2010] Roman Kontchakov, Frank Wolter,

and Michael Zakharyaschev. Logic-based ontology comparison and module extraction, with an application to DL-Lite. Artiﬁcial Intelligence, 174:1093 1141, 2010. [Kr otzsch et al., 2013] Markus Kr otzsch, Sebastian Rudolph,

and Pascal Hitzler. Complexities of Horn description logics. ACM Trans. on Comp. Logic, 14(1):2, 2013. [Lutz and Wolter, 2010] Carsten Lutz and Frank Wolter. De-

ciding inseparability and conservative extensions in the description logic EL. J. of Symb. Comp., 45(2), 2010. [Lutz and Wolter, 2012] Carsten Lutz and Frank Wolter. Non-

uniform data complexity of query answering in description logics. In Proc. of KR, pages 297 307, 2012. [Lutz, 2008] Carsten Lutz. The complexity of conjunctive

query answering in expressive description logics. In Proc. of IJCAR, pages 179 193. Springer, 2008. [Poggi et al., 2008] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. on Data Sem., 10:133 173, 2008. [Rodriguez-Muro et al., 2013] Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-based data access: Ontop of databases. In Proc. of ISWC, pages 558 573. Springer, 2013. [Trivela et al., 2015] Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos B. Stamou. Optimising resolution-based rewriting algorithms for OWL ontologies. J. of Web Sem., 33:30 49, 2015. [Wang et al., 2014] Kewen Wang, Zhe Wang, Rodney W.

Topor, Jeff Z. Pan, and Grigoris Antoniou. Eliminating concepts and roles from ontologies in expressive descriptive logics. Computational Intelligence, 30(2), 2014. [Zhou et al., 2015] Yujiao Zhou, Bernardo Cuenca Grau, Ya-

vor Nenov, Mark Kaminski, and Ian Horrocks. Pagoda: Pay-as-you-go ontology query answering using a datalog reasoner. J. of Art. Intel. Research, 54:309 367, 2015.