# answering_conjunctive_queries_with_inequalities_in_dlliteℛ__313bd4d1.pdf

The Thirty-Fourth AAAI Conference on Artiﬁcial Intelligence (AAAI-20)

Answering Conjunctive Queries with Inequalities in DL-Lite R

Gianluca Cima,1 Maurizio Lenzerini,1 Antonella Poggi2,1

1Dipartimento di Ingegneria Informatica, Automatica e Gestionale, Sapienza Universit a di Roma 2Dipartimento di Lettere e Culture Moderne, Sapienza Universit a di Roma {cima, lenzerini, poggi}@diag.uniroma1.it

In the context of the Description Logic DL-Lite = R, i.e., DL-Lite R without UNA and with inequality axioms, we address the problem of adding to unions of conjunctive queries (UCQs) one of the simplest forms of negation, namely, inequality. It is well known that answering conjunctive queries with unrestricted inequalities over DL-Lite R ontologies is in general undecidable. Therefore, we explore two strategies for recovering decidability, and, hopefully, tractability. Firstly, we weaken the ontology language, and consider the variant of DL-Lite = R corresponding to RDFS enriched with both inequality and disjointness axioms. Secondly, we weaken the query language, by preventing inequalities to be applied to existentially quantiﬁed variables, thus obtaining the class of queries named UCQ =,bs. We prove that in the two cases, query answering is decidable, and we provide tight complexity bounds for the problem, both for data and combined complexity. Notably, the results show that answering UCQ =,bs over DL-Lite = R ontologies is still in AC0 in data complexity.

1 Introduction Description Logics (DLs) (Baader et al. 2003; 2017) allow for deﬁning ontologies in terms of two components, named TBox (general axioms on the concepts and relations in the domain of interest), and ABox (axioms about instances of concepts and relations). In this paper we consider DL-Lite R, which is the DL of the DL-Lite family (Calvanese et al. 2004; 2007) underpinning the OWL 2 proﬁle OWL 2 QL (Motik et al. 2012), and is arguably one of the most important formalisms in Ontology-Based Data Access (OBDA) (Poggi et al. 2008; Bienvenu 2016; Xiao et al. 2018; Ortiz 2018), where the aim is to access a typically huge amount of data represented as an ABox, either materialized or virtual. In particular, DL-Lite R has been designed so that answering unions of conjunctive queries (UCQs) posed to an ontology expressed in this language can be reduced to evaluating ﬁrst-order logic queries over the database corresponding to the ABox, and therefore the problem is in AC0 in the size of the ABox, i.e., in the so-called data complexity (Vardi 1982). Although UCQs constitute the most popular class of queries studied for both databases and ontologies, they have

Copyright c 2020, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

several limitations in expressive power. Notably, they do not allow any form of negation, not even the one expressed by the inequality (i.e., not equal ) predicate. For instance, the query computing all triangles in an undirected graph cannot be expressed as a conjunctive query (CQ), whereas it can be expressed as the following CQ with inequalities {(x, y, z) | edge(x, y), edge(x, z), edge(y, z), x = y, x = z, y = z}, where the predicate edge represents the connections between nodes in the graph. The above example shows that inequalities are indeed necessary for expressing even very simple properties, like triangle graphs. However, while answering UCQs in DL-Lite R has been extensively studied in recent years (Xiao et al. 2018), the problem of answering CQs with inequalities (CQ =s) and unions thereof (UCQ =s) has been rarely investigated. To the best of our knowledge, the basic facts that are known about such problem can be summarized as follows.

In stark contrast to the UCQ case, answering UCQ =s is undecidable, even in the case of ontologies expressed in DL-Litecore, which is the fragment of DL-Lite R without role disjointness and role inclusion axioms (Guti errez Basulto, Ib a nez-Garc ıa, and Kontchakov 2012). For DL-Lite R ontologies, undecidability holds already for CQ =s (Guti errez-Basulto et al. 2015). Looking at these results, one can easily realize that the main of source of undecidability stems from both the ability of the ontology language to express incomplete information through existential quantiﬁers, and the possibility of imposing inequalities between existential variables in the query.

In (Guti errez-Basulto et al. 2015) it is also proved that for the subclasses of CQ =s and UCQ =s named local CQ =s and local UCQ =s, respectively, query answering over DL-Lite R ontologies is decidable, but with a CONEXPTIME upper bound in data complexity. Furthermore, it is provably intractable (in general co NP-hard in data complexity) already for local CQ =s. Local (U)CQs are special (U)CQs with inequalities, designed in such a way that each inequality atom in the query that contributes to a certain answer with respect to a DL ontology has at least one of its terms bound by an individual in the ABox.

The goal of this paper is to investigate under which conditions, stronger than local UCQs, tractability of answering queries with inequalities is recovered, or at least the com-

plexity is lowered with respect to the one of local UCQs. The basic idea to achieve this goal is to explore ontology languages and query languages ensuring the following property: each inequality atom α = β that contributes to the certain answer to a query with respect to a DL ontology, it does so with both terms α and β bounded by individuals in the ABox. In order to follow this path, we consider as basic language DL-Lite R without the Unique Name Assumption (UNA)1 and with inequality axioms, called DL-Lite = R, and explore two alternative strategies.

The ﬁrst strategy is to weaken the ontology language, so as to eliminate all the constructs introducing incomplete information resulting from existentially quantiﬁed assertions. The outcome is a sublanguage of DL-Lite = R, that we call DL-Lite , = RDFS because it extends DL-Lite RDFS (Cuenca Grau 2004; Rosati 2007; Cima, Lenzerini, and Poggi 2019) with both inequality axioms (in the ABox) and disjointness axioms (in the TBox).

The second strategy is to keep DL-Lite = R as ontology language, but to weaken the query language by restricting the application of the inequality predicate to either individuals or distinguished variables (variables representing output values) only, as done in (Poggi 2016; Cima et al. 2017). The resulting query language is called (U)CQs with bounded inequalities , and the corresponding class is denoted by (U)CQ =,b. Observe that, although limited, the expressive power of (U)CQ =,b allow interesting queries to be expressed such as the one computing the triangles in a graph.

For the case of DL-Lite , = RDFS, we show that answering UCQ =s is decidable, and in particular co NP-complete in data complexity, and Πp 2-complete in combined complexity (i.e., with respect to the size of the whole input, including the query). We also investigate if the number of inequalities in each disjunct plays a role in falling into intractability. We answer positively to this question, by showing that if the query has at most one inequality per disjunct, answering UCQ =s is PTIME-complete in data complexity, and NP-complete in combined complexity (so, combined complexity class is the same as in the case without inequalities), while it is co NPhard in data complexity if the query is conjunctive and has at most two inequalities. We also show that going from one to two inequalities causes the jump from NP-hardness to Πp 2hardness in combined complexity for (U)CQ =s, and we conjecture that this holds already for CQ =s. For the case of (U)CQ =,b, we show that answering CQ =,bs over DL-Lite = R ontologies has the same complexity of the UCQ case, i.e., it is in AC0 in data complexity and NP-complete in combined complexity. However, perhaps surprisingly, answering UCQ =,bs over DL-Lite = R ontologies is Πp 2-complete in combined complexity. Therefore, unless NP = co NP, the presence of union makes the prob-

1Since the UNA can be enforced through suitable axioms in DL-Lite = R, such a language is a generalization of the classical DL-Lite R, and it turns out to be much more interesting when dealing with queries with inequalities.

lem of answering queries with inequalities over DL-Lite = R ontologies signiﬁcantly different from the case of UCQs. We argue that the above results considerably improve our understanding of the implication that the presence of inequalities in queries has in the context of lightweight ontologies. In particular, to the best of our knowledge, our investigation on DL-Lite , = RDFS provides the ﬁrst results on reasoning with inequalities when querying DL-Lite RDFS ontologies, and they also contribute a new result on containment of UCQs with inequalities in databases (Kolaitis, Martin, and Thakur 1998; Koutris et al. 2017): the problem is in NP (and therefore NP-complete) in the case of at most one inequality for each disjunct, and Πp 2-complete, in the case of at most two inequalities for each disjunct. On the other hand, our results on (U)CQ =,bs posed to DL-Lite = R ontologies show that this class is currently the only class of queries with inequalities that can be answered with AC0 data complexity. Indeed, the only previously result known for this class was the PTIME algorithm described in (Poggi 2016). We also observe that all the results on CQ =,bs presented in this paper easily extend to UCQ =s posed to OWL 2 QL ontologies interpreted under the Direct Semantics Entailment Regime (Glimm 2011), that is the regime usually adopted for SPARQL queries. So, we are improving on a result reported in (Cima et al. 2017), where it is shown that answering UCQ =s over OWL 2 QL ontologies can be polynomially reduced to the evaluation of a Datalog program, and therefore is in PTIME in data complexity, and in EXPTIME in combined complexity. The paper is organized as follows. In Section 2 we provide some details on the notions used in the paper. In Section 3 we illustrate the notion of chase that we use for DL-Lite = R, which is the basis for some of the technical results presented in this paper. In Section 4 and Section 5 we present our results on DL-Lite , = RDFS, and (U)CQ =,bs, respectively. Finally, in Section 6 we conclude the paper with a discussion on future work.

2 Preliminaries

We deﬁne the syntax and the semantics of DL-Lite = R, and present the query languages considered in the paper.

DL-Lite R and its variants. Essentially, DL-Lite = R generalizes DL-Lite R by removing the UNA, and adding axioms asserting inequalities of individuals2. Formally, starting with an alphabet including symbols for individuals, atomic concepts, and atomic roles, and the binary relation symbol =, a DL-Lite = R ontology, or simply an ontology, is a pair O = T , A , such that T , called a TBox, and A, called an ABox, are sets of axioms, that have, respec-

2In principle, the axiom a = b can be expressed by the axioms A1(a), A2(b), A1 A2, for two new atomic concepts A1, A2. However, we prefer to keep the inequality predicate, both for avoiding changing the TBox, and for being coherent with OWL 2 QL.

tively, the following forms:

T : B1 B2 R1 R2 (inclusion) B1 B2 R1 R2 (disjointness) A : A(a) P(a, b) (membership) a = b (inequality)

where a, b denote individuals, A and P denote an atomic concept and an atomic role, respectively, B1, B2 are basic concepts, i.e., expressions of the form A, P, or P , and R1 and R2 are basic roles, i.e., expressions of the form P, or P . In this paper we also consider other variants of DL-Lite R, obtained by progressively restricting DL-Lite = R. The ﬁrst one is obtained by disallowing existential quantiﬁers to occur in the right-hand side of inclusion axioms, and it is named DL-Lite , = RDFS, since it corresponds to DL-Lite RDFS extended with both inequality axioms and disjointness axioms. Other variants are DL-Lite RDFS and DL-Lite = RDFS, which are obtained by further disallowing inequality and disjointness axioms, respectively. As for the semantics of DL-Lite = R, an interpretation for O is a pair I = ΔI, I , where the interpretation domain ΔI is a non-empty set of objects, and the interpretation function I assigns to each individual a an object a I ΔI, to each atomic concept A a set of objects AI ΔI, to each atomic role a set of pairs of objects P I ΔI ΔI, and to the special predicate = the set of all pairs of distinct objects, i.e., =I= {(o1, o2) | o1, o2 ΔI o1 = o2} (so, we often write (o1, o2) =I as o I 1 =I o I 2, or even o I 1 = o I 2). The interpretation function extends to the other basic concepts and the other other basic roles as follows: (i) ( P)I = {o | o .(o, o ) P I}, (ii) ( P )I = {o | o .(o , o) P I}, and (iii) (P )I = {(o, o ) | (o , o) P I}. An interpretation I satisﬁes an axiom α β (resp., α β) if αI βI (resp., αI βI = ), an axiom A(a) (resp., P(a, b)) if a I AI (resp., (a I, b I) P I), and an axiom a = b if a I = b I). It satisﬁes a set γ of axioms if it satisﬁes all axioms in γ. Finally, O = T , A is satisﬁable if there exists a model of O, i.e., an interpretation for O that satisﬁes both the TBox T and the ABox A. Queries over DL-Lite = R. A conjunctive query with inequalities (CQ =) over an ontology O is an expression of the form q = { x | φ( x, y)}, where x and y are tuples of variables, called distinguished and existential variables of q, respectively, and φ( x, y), called the body of q, is a ﬁnite conjunction of DL-Lite = R ABox assertions with variables that can appear in predicate arguments, i.e., atoms of the form A(t1), P(t1, t2), or t1 = t2, where each tj is either an individual of O, or a variable in x or y. We impose that every variable in x or y appears in some atom of φ( x, y), as usual (Abiteboul, Hull, and Vianu 1995). If x is empty, then the query is called boolean. A CQ = q without atoms of the form x1 = x2 in its body is called a conjunctive query (CQ). An intermediate class of queries that lies between CQs and CQ =s is the class of conjunctive queries with bound inequalities (CQ =,b). Speciﬁcally, a CQ =,b q = { x | φ( x, y)} is a CQ = whose inequalities involve only individuals or distinguished variables, i.e., for every atom z1 = z2 appearing in φ( x, y), both z1 and z2 are not in y. A UCQ (resp., UCQ =,b, UCQ =)

is a union of a ﬁnite set of CQs (resp., CQ =,bs, CQ =s) with same arity. The set cert(q, O) of certain answers of a UCQ = q over O is the set of n-tuples t = t1, . . . , tn of individuals in O such that O |= q( t), i.e., t I 1, . . . , t I n q I, also written I |= q( t I 1, . . . , t I n ), for every model I of O, where q I denotes the extension of q in I. When q is a boolean query, we write O |= q if q I = { } (i.e., q is true in I, also denoted by I |= q) for every model I of O. Observe that, when I is ﬁnite it can be seen as a relational database (Abiteboul, Hull, and Vianu 1995), and q I simply denotes the evaluation of the UCQ q over I. When we talk about the problem of answering a query belonging to a class of queries Q over an L-ontology, i.e., an ontology expressed in the DL L, we implicitly refer to the following decision problem: Given a query q Q, an L-ontology O, and an n-tuple t of individuals of O, check whether t cert(q, O). From results of (Calvanese et al. 2007), it is well known that checking whether a DL-Lite R ontology O = T , A is satisﬁable can be done by evaluating a suitable query over the ABox A seen as a relational database, in particular it can be done in AC0 in data complexity and in PTIME in the size of the TBox T . Furthermore, when a UCQ q is posed over a satisﬁable DL-Lite R ontology O = T , A , it is possible to compute the set cert(q, O) of certain answers by ﬁrst reformulating q w.r.t. T , and then by evaluating the reformulated query (which is again a UCQ) over the ABox A seen as a relational database. This yields the well-known result that answering UCQs over DL-Lite R ontologies is in AC0 in data complexity and NP-complete in combined complexity. Observe that, since DL-Lite R is insensitive to the adoption of the UNA for UCQ answering (Artale et al. 2009), the same complexity results hold for the problem of answering UCQs over satisﬁable DL-Lite = R ontologies. We end the section with the notion of homomorphism (Chandra and Merlin 1977), that will be used in the following. A homomorphism h from a CQ = q to a structure B is a function from variables and individuals of q to elements of B such that (i) h(a) = a for each individual a occurring in q; (ii) for each atom of the form A(t1) (resp., P(t1, t2)), there is an atom A(h(t1)) (resp., P(h(t1), h(t2))) occurring in B; and (iii) for each atom of the form t1 = t2, we have that h(t1) = h(t2).

3 The chase

The conceptual tool that we use for addressing the problem of answering UCQ =,bs over DL-Lite = R ontologies is a modiﬁcation of the chase used for DL-Lite R (Calvanese et al. 2007). Speciﬁcally, given a DL-Lite = R ontology O = T , A , we build a (possibly inﬁnite) structure, starting from Ch0(O) = A, and repeatedly computing Chj+1(O) from Chj(O) by applying suitable rules, where each rule can be applied only if certain conditions hold. In doing so, we make use of a new inﬁnite alphabet V of variables for introducing fresh unknown individuals, and we follow a deterministic strategy that is fair, i.e., it is such that if at some point a

rule is applicable then it will be eventually applied. Finally, we set Ch(O) =

i N Chi(O). Note that we make use of the additional binary predicate symbol ineq, whose intended role is used to record all inequalities logically implied by O. The rules we use include all the ones illustrated in (Calvanese et al. 2007). For example, if A1 P T , A1(e1) is in Chj(O), and no e2 exists such that P(e1, e2) Chj(O), then we set Chj+1(O) = Chj(O) {P(e1, s)}, where s V does not appear in Chj(O). There are, however, crucial additions related to the ineq predicate. In what follows, when we say R(e1, e2) holds in Chj(O), where R is a basic role, we mean (i) P(e1, e2) Chj(O), if R = P, or (ii) P(e2, e1) Chj(O), if R = P . Also, when we say that B(e1) holds in Chj(O), where B is a basic concept, we mean (i) A(e1) Chj(O) if B = A, and (ii) R(e1, e2) holds in Chj(O) for some e2, if B = R. The additional rules are as follows:

If e1 = e2 is in Chj(O), and ineq(e1, e2) is not in Chj(O), then Chj+1(O) = Chj(O) {ineq(e1, e2)};

If ineq(e1, e2) is in Chj(O), and ineq(e2, e1) is not in Chj(O), then Chj+1(O) = Chj(O) {ineq(e2, e1)};

if B1 B2 T , B1(e1) and B2(e2) hold in Chj(O), and ineq(e1, e2) is not in Chj(O), then Chj+1(O) = Chj(O) {ineq(e1, e2)};

if R1 R2 T , either R1(e3, e1) and R2(e3, e2), or R1(e1, e3) and R2(e2, e3) hold in Chj(O), and ineq(e1, e2) is not in Chj(O), then Chj+1(O) = Chj(O) {ineq(e1, e2)}.

From Ch(O) it is immediate to deﬁne an interpretation IO for O, extended in order to deal with predicate ineq:

ΔIO = VO V , where VO is the set of individuals occurring in O;

e IO = e for every individual e VO;

AIO = {e | A(e) occurs in Ch(O)} for every atomic concept A;

P IO = {(e1, e2) | P(e1, e2) occurs in Ch(O)} for every atomic role P;

ineq IO = {(e1, e2) | ineq(e1, e2) occurs in Ch(O)}.

Note that, by deﬁnition, =IO is the set of all pairs of distinct individuals in VO V , i.e. =IO= {(e1, e2) | e1, e2 VO V e1 = e2}. Obviously, for a DL-Lite = R ontology, Ch(O) can be inﬁnite, due to the presence of existential quantiﬁers in the right-hand side of inclusion axioms, which, by introducing fresh unknown variables, can trigger an inﬁnite number of rule applications. It is easy to see that, on the contrary, for a DL-Lite , = RDFS ontology O, Ch(O) is ﬁnite, and can be computed in polynomial time in the size of O. We next show that IO enjoys some crucial properties for DL-Lite = R ontologies O.

Proposition 1. If M = ΔM, M is a model of a DL-Lite = R ontology O, then there exists a function Ψ from ΔIO to ΔM such that: 1. for every e ΔIO, if e AIO, then Ψ(e) AM; 2. for every pair e1, e2 ΔIO, if (e1, e2) P IO, then (Ψ(e1), Ψ(e2)) P M; 3. for every pair e1, e2 ΔIO, if (e1, e2) ineq IO, then Ψ(e1) = Ψ(e2). The above proposition shows the importance of distinguishing between = and ineq. Indeed, while by deﬁnition of IO two different individuals e1, e2 satisfy e1 =IO e2, it may happen that for some model M of O, e M 1 = e M 2 , implying that no function Ψ exists from ΔIO to ΔM such that Ψ(e1) = Ψ(e2). In other words, condition 3 in Proposition 1 does not hold if we replace ineq with =. Note that if IO satisﬁes all the axioms of O, then it is a model of O, and therefore O is satisﬁable. Otherwise, it can be easily seen that IO violates at least one disjointness or one inequality axiom of O. In particular, it can be proved that IO violates some inequality axiom if and only if e = e occurs in O for some e in VO. As a result, we can devise a satisﬁability checking algorithm for DL-Lite = R by slightly modifying the so-called violation query for DL-Lite R, and this shows that, similarly to the canonical interpretation of a DL-Lite R ontology, IO is instrumental for checking the satisﬁability of a DL-Lite = R ontology O. In turn, this implies that checking the satisﬁability of a DL-Lite = R ontology O = T , A can be done in AC0 in the size of A and in PTIME in the size of T , exactly like in DL-Lite R. A reasonable question to ask is whether IO is also the right tool for query answering. The next theorem provides a positive answer to this question for the class CQ =,b. In what follows, δ(q) denotes the query obtained by replacing each inequality atom t1 = t2 in q with the atom ineq(t1, t2).

Theorem 1. Let t be a tuple of individuals of a satisﬁable DL-Lite = R ontology O, and let q be a CQ =,b over O. We have that t cert(q, O) if and only if t δ(q)IO. The above theorem states that IO is instrumental also for answering CQ =,bs over DL-Lite = R ontologies. However, we will see in the next two sections that this theorem is no longer valid when we move from CQ =,bs to either UCQ =,bs, or CQ =s. From now on, we implicitly assume to deal with satisﬁable ontologies. Moreover, unless otherwise stated and without loss of generality, we consider only boolean UCQ =s. Indeed, given an n-ary UCQ = q, a DL-Lite = R ontology O = T , A , and an n-tuple t of individuals of O, checking whether t cert(q, O) is equivalent to checking whether O |= q( t), where q( t) denotes the boolean UCQ = obtained by replacing appropriately the distinguished variables of each disjunct of q with the individuals of t.

4 UCQ =s over DL-Lite , = RDFS ontologies

We study the problem of answering UCQ =s over satisﬁable DL-Lite , = RDFS ontologies.

Theorem 1 tells us that the certain answers to a CQ =,b q over a DL-Lite , = RDFS ontology O coincide with δ(q)IO. However, the following example shows that the problem drastically changes as soon as we consider general CQ =s. Example 1. Consider the DL-Lite RDFS ontology O = T , A , where T = {A1 A2} and A = {A1(a1), A2(a2), P(b, c1), P(b, c2), P(c1, a1), P(c2, a2)}. For the boolean CQ = q = {() | P(x, y1) P(x, y2) y1 = y2}, we have that δ(q)IO is false because ineq(c1, c2) is not in Ch(O). However O |= q, because in each model M where c M 1 = c M 2 the query is true with the bindings x, y1, y2 c1, a1, a2, whereas in each model M where c M 1 = c M 2 , q is true with the bindings x, y1, y2 b, c1, c2.

The above example provides a hint on how to design an algorithm for our problem. Intuitively, given a boolean UCQ = q, and a DL-Lite , = RDFS ontology O = T , A , we check whether O |= q by searching for a database that can be obtained from Ch(O) by equating some of the individuals, and that falsiﬁes q. We thus derive the upper bounds for the problem of answering UCQ =s in DL-Lite , = RDFS.

Theorem 2. Answering UCQ =s over DL-Lite , = RDFS ontologies is in co NP in data complexity and in Πp 2 in combined complexity. We now provide matching lower bounds for both data and combined complexity, showing that they hold already for the case of CQ =s. We start with data complexity.

Theorem 3. Answering CQ =s over DL-Lite , = RDFS ontologies is co NP-hard in data complexity. The proof of the above theorem has two interesting implications. (i) co NP-hardness in data complexity holds even for CQ2, =s over both DL-Lite RDFS and DL-Lite = RDFS, where CQk, = denotes the class of CQ = including at most k inequalities (and UCQk, = the class of unions of ﬁnite sets of CQk, =s with same arity). (ii) Answering UCQ2, =s over DL-Lite RDFS ontologies is co NP-hard too, and this corrects an erroneous statement in (Rosati 2007, Theorem 11), where it is claimed that answering UCQ =s over DL-Lite RDFS ontologies is in LOGSPACE in data complexity regardless of whether the UNA is adopted or not. It turns out that this latter statement is true only under the UNA. The following theorem provides the matching lower bound for combined complexity.

Theorem 4. Answering CQ =s over DL-Lite , = RDFS ontologies is Πp 2-hard in combined complexity. By looking at the proof of the theorem, one can see that Πp 2-hardness holds already in the case of both DL-Lite RDFS, and DL-Lite = RDFS. However, the reduction builds a CQ = whose number of inequalities depends on the input of the reduction, and therefore is not ﬁxed a priori. It is thus natural to ask which is the minimum number of inequalities in CQ =s that makes the problem Πp 2-hard in combined complexity. Similarly to the case of the co NP-hardness result in data complexity, we conjecture that such number is 2. Even though we have not been able to prove this conjecture, we show next that Πp 2-hardness holds for UCQ2, =s.

Algorithm Check Good(O, q, F) Input: DL-Lite , = RDFS ontology O = T , A , UCQ1, = q, sequence of functions F = {f1, . . . , fm} Output: true or false begin Compute B := Ch(O) for each i = 1 to m 1: if fi is a homomorphism from a disjunct of q1 to B let t1 = t2 be any inequality in any of such disjuncts if ineq(fi(t1), fi(t2)) B return true else replace each occurrence of fi(t1) appearing in B, in q, and in {fi+1, . . . , fm} with fi(t2) else return false return fm is a homomorphism from a disjunct of q2 to B end

Figure 1: The algorithm Check Good(O, q, F)

Theorem 5. Answering UCQ2, =s over DL-Lite , = RDFS ontologies is Πp 2-hard in combined complexity.

Interestingly, the proof of the previous theorem shows that Πp 2-hardness holds even if the query is the union of a CQ2, = and a CQ without inequalities, and the ontology is expressed in DL-Lite RDFS. Observe that, in this language, CQ =s containing even a single inequality have an empty set of certain answers. Thus, we are observing a surprising jump from constant time to Πp 2-hardness if we add union to such CQ =s. To complete the picture of answering UCQ =s in DL-Lite , = RDFS, it remains to study the case of UCQ1, =s. In what follows, without loss of generality, we assume that each UCQ1, = is written as q = q1 q2, where q2 is a UCQ with no inequalities and q1 is a UCQ1, = having exactly one inequality per disjunct. In principle, for answering UCQ1, =s over DL-Lite , = RDFS ontologies it is possible to use the algorithm provided in (Fagin et al. 2005, Theorem 5.12) in the context of data exchange. However, this would result in an exponential time algorithm with respect to the size of the query. On the contrary, by elaborating on the idea of (Fagin et al. 2005, Theorem 5.12), we have devised an algorithm that runs in PTIME in data complexity and in NP in combined complexity. We start with the following deﬁnition.

Deﬁnition 1. Let O = T , A be a DL-Lite , = RDFS ontology, and let q be a boolean UCQ1, = over O. A sequence F = {f1, . . . , fm} of functions from variables and individuals of q to individuals of A is good w.r.t. O and q if the algorithm Check Good(O, q, F) provided in Figure 1 returns true.

Roughly speaking, starting from B := Ch(O), in each step i from 1 to m 1 such that fi is a homomorphism from a disjunct of q1 to B, the algorithm Check Good(O, q, F) replaces everywhere the individual fi(t1) with the individual fi(t2), to consider the models in which fi(t1) = fi(t2) (since there is a homomorphism, q is true in the models where fi(t1) = fi(t2)). Afterwards, the algorithm sanctions that F is a good sequence if and only if either it is not possible to equate two individuals without contradicting an ineq

atom of B, or the resulting B and q2 are such that B |= q2. Using the above notion of good sequence, it is possible to derive the following characterization (n A denotes the number of individuals occurring in the ABox A).

Proposition 2. Let O = T , A be a DL-Lite , = RDFS ontology, and let q be a boolean UCQ1, = over O. We have that O |= q if and only if there exists a sequence F = {f1, . . . , fm} of m n A functions that is good w.r.t. O and q.

We are now ready to establish our result on answering UCQ1, =s in DL-Lite , = RDFS.

Theorem 6. Answering UCQ1, =s over DL-Lite , = RDFS ontologies is PTIME-complete in data complexity and NPcomplete in combined complexity.

Proof. (Sketch) NP-hardness in combined complexity follows from NP-hardness of CQ evaluation over relational databases (Chandra and Merlin 1977). In the rest of this proof sketch, we discuss only the upper bounds. By Proposition 2 it is possible to decide whether O |= q as follows. We guess a sequence F = {f1, . . . , fm} (with m n A) of functions from disjuncts of q1 to Ch(O) (note that this can be done in PTIME in the size of A). Then, by exploiting the algorithm Check Good(O, q, F), we check whether F is a good sequence w.r.t. O and q using: (i) a PTIME step in the size of O for computing B = Ch(O); (ii) for each i [1, m 1], a PTIME step for checking whether fi is a homomorphism from a disjunct of q1 to B, ineq(fi(t1), fi(t2)) B, and for replacing each occurrence of fi(t1) with fi(t2); ﬁnally, (iii) a PTIME step for checking whether fm is a homomorphism from some disjunct of q2 to B.

Again, from the proof of the theorem we can derive interesting observations. (i) PTIME-hardness in data complexity holds even for the problem of answering CQ1, =s over both DL-Lite = RDFS and DL-Lite RDFS ontologies. (ii) The result holds even for ontologies with disjointness on concepts only, and therefore it strengthens the PTIME-hardness result of (Guti errez-Basulto et al. 2015, Theorem 15) for the case of DL-Litecore ontologies. (iii) Answering UCQ1, =s over DL-Lite RDFS ontologies is PTIME-hard in data complexity, too.

5 The case of CQ =,bs and UCQ =,bs

In this section, we consider answering queries with bounded inequalities over satisﬁable DL-Lite = R ontologies. In particular, we ﬁrst deal with the case of CQ =,bs, and then we address the case of UCQ =,bs. As for CQ =,bs, we start by introducing some notations. Given an inequality atom x1 = x2 and a disjointness axiom γ, ρ(x1 = x2, γ) denotes the following formula:

ρ(x1 = x2, A1 A2) = A1(x1) A2(x2),

ρ(x1 = x2, A R) = ρ(x1 = x2, R A) = A(x1) R(x2, z), where z is a fresh variable,

ρ(x1 = x2, R1 R2) = R1(x1, z) R2(x2, w), where z and w are fresh variables, and

ρ(x1 = x2, R1 R2) = R1(x1, z) R2(x2, z) R1(z, x1) R2(z, x2), where z is a fresh variable,

where an atom of the form R(x, y) stands for either P(x, y) if R denotes an atomic role P, or P(y, x) if R denotes the inverse of an atomic role, i.e., R = P . Given an inequality atom x1 = x2 and a DL-Lite = R TBox T , we denote by σ(x1 = x2, T ) the disjunction

ineq(x1, x2) ineq(x2, x1)

i=1 (ρ(x1 = x2, γi) ρ(x2 = x1, γi)),

where γ1, . . . , γm are all the disjointness axioms of T . Finally, we denote by τ(q, T ) the query obtained from q by substituting every inequality x1 = x2 by σ(x1 = x2, T ), and then turning the resulting query into an equivalent UCQ. We next illustrate the query τ(q, T ) by an example.

Example 2. Consider the DL-Lite = R ontology O = T , A with T = {P1 P2, A1 A2}, and the CQ =,b

q = {(x1, x2) | P2(x1, x2) x1 = c}

over O. It is easy to see that σ(x1 = c, T ) is the formula ineq(x1, c) ineq(c, x1) A1(x1) A2(c) A2(x1) A1(c). Then, τ(q, T ) is the UCQ =,b whose disjuncts are the following: {(x1, x2) | P2(x1, x2) ineq(x, c)}, {(x1, x2) | P2(x1, x2) ineq(c, x1)}, {(x1, x2) | P2(x1, x2) A1(x1) A2(c)}, and {(x1, x2) | P2(x1, x2) A1(c) A2(x)}.

For a DL-Lite = R ontology O = T , A , we denote by Oineq = T , Aineq the DL-Lite R ontology where ineq is a new atomic role, and Aineq is the DL-Lite R ABox obtained from A by replacing each assertion c1 = c2 appearing in A with the assertion ineq(c1, c2). The next proposition, whose proof relies on an extension of (Calvanese et al. 2007, Lemma 39) and on Theorem 1, states that computing cert(q, O) for a given DL-Lite = R ontology O = T , A , and a CQ =,b q over O, can be reduced to computing the certain answers of the UCQ τ(q, T ) over the DL-Lite R ontology Oineq.

Proposition 3. Let O = T , A be a DL-Lite = R ontology, and let q be a CQ =,b over O. Then, we have that cert(q, O) = cert(τ(q, T ), Oineq). From the above proposition, we immediately derive that answering CQ =,bs over DL-Lite = R ontologies has the same data and combined complexity as answering UCQs over DL-Lite R ontologies.

Theorem 7. Answering CQ =,bs over DL-Lite = R ontologies is in AC0 in data complexity, and NP-complete in combined complexity. Looking at the proof of the two above statements, one realizes the importance of Theorem 1, stating that, similarly to DL-Lite R, DL-Lite = R admits a model IO that is representative of all the models of O w.r.t. answering CQ =,bs. One might therefore think that, analogously to DL-Lite R, this property extend to UCQ =,bs. The following example shows that, surprisingly, this is not the case.

Example 3. Consider the DL-Lite = R ontology O = T , A , where T = and A = {P(a, b)}. For the UCQ =,b Q = q1 q2, where q1 = {() | P(a, a)} and q2 = {() | a = b}, it is easy to see that δ(Q)IO is false. However, one can verify that O |= Q. Indeed, for any model M of O, either a M = b M and M |= q1, or a M = b M and M |= q2.

What the example tells us is that answering UCQ =,bs over a DL-Lite = R ontology O cannot be done simply using the interpretion IO. Nevertheless, we are able to prove that the problem of answering UCQ =,bs over DL-Lite = R ontologies is still in AC0 in data complexity, although, unless the polynomial hierarchy collapses to the ﬁrst level, it does not have the same combined complexity as the UCQ and the CQ =,b cases. We start by introducing the notions of e-satisﬁability and e-entailment for an equivalence relation e3. In what follows, we write c1 e c2 to denote (c1, c2) e.

Deﬁnition 2. Let O = T , A be a DL-Lite = R ontology, e be an equivalence relation on a set C of individuals of O, and I be a model of O. Then, we say that I is an e-model of O if, for any pair of individuals c1, c2 of O, we have that c I 1 = c I 2 if and only if c1 e c2. Furthermore, we say that O is e-satisﬁable if it has an e-model.

Deﬁnition 3. Let O = T , A be a DL-Lite = R ontology, e be an equivalence relation on a set C of individuals of O, and q be a boolean UCQ = over O. Then, we say that O e-entails q, denoted by O |=e q, if I |= q for each e-model I of O.

Note that for a DL-Lite = R ontology O = T , A , if e is the equivalence relation on the set of all individuals appearing in A such that e = {(c, c) | c occurs in A}, then the notions of e-satisﬁability and e-entailment coincide with the usual notion of satisﬁability and entailment, respectively, when the UNA is enforced. In such cases: (i) the e-satisﬁability can be checked in AC0 in the size of A and in PTIME in the size of T (Calvanese et al. 2007), and (ii) it can be readily seen that checking whether O |=e q for a UCQ =,b q over O can be done in AC0 in the size of A and it is NP-complete in combined complexity. The next proposition shows that the complexity of both the above computational problems remains the same even when e is any arbitrary equivalence relation on a set C of individuals of O.

Proposition 4. Let O = T , A be a DL-Lite = R ontology, and let e be an equivalence relation on a set of individuals C of O. We have that:

checking whether O is e-satisﬁable can be done in AC0 in the size of A and in PTIME in the size of T ; if q is a boolean UCQ =,b over O, then checking whether O |=e q can be done in AC0 in the size of A and it is NP-complete in combined complexity.

Based on this result, we now characterize when O |= q for a boolean UCQ =,b and a DL-Lite = R ontology O.

3An equivalence relation e on a set of individuals C is a binary relation over C that is reﬂexive, symmetric, and transitive.

Proposition 5. Let O = T , A be a DL-Lite = R ontology, and let q be a boolean UCQ =,b over O. We have that O |= q if and only if there exists an equivalence relation e on the set Cq of all individuals appearing in q such that O |=e q. Intuitively, to decide O |= q, it is sufﬁcient to guess an equivalence relation e between the individuals of q for which there exists an e-model I of O such that I |= q. Observe that, by deﬁnition, such model exists if and only if O |=e q. The following theorem characterizes the complexity of answering UCQ =,bs over DL-Lite = R ontologies.

Theorem 8. Answering UCQ =,bs over DL-Lite = R ontologies is in AC0 in data complexity and Πp 2-complete in combined complexity.

Proof. (Sketch) As for the upper bounds, we now show how to decide whether O |= q in AC0 in data complexity and in Σp 2 in combined complexity. In particular, observe that by Proposition 5 it is sufﬁcient to: (i) guess an equivalence relation e; (ii) and check whether O |=e q, where this last step, due to Proposition 4, can be done in AC0 in the size of A, and with an NP-oracle in the size of the input.

The proof of the above theorem allows us to conclude that the same complexity results hold even for the problem of answering UCQ =,bs over DL-Lite RDFS ontologies.

6 Conclusion We have carried a thorough analysis of the problem of answering UCQs with inequalities posed to a DL-Lite = R ontology. The results presented in this paper greatly contribute to clarify how inequalities impact on the problem of answering queries over DL-Lite = R ontologies. In particular, we have presented the ﬁrst results on dealing with inequalities in queries posed to DL-Lite , = RDFS ontologies, and we have deeply investigated a speciﬁc class of queries, namely UCQ =,bs, for which query answering over DL-Lite = R ontologies is still in AC0 in data complexity. We have also mentioned the connection between the problems studied here and two other problems, namely containment of conjunctive queries with inequalities in databases, and answering UCQ =s over OWL 2 QL ontologies under the direct semantics, although we could not elaborate on these aspects for the lack of space. There are several issues to consider for continuing the work presented in this paper, the most obvious being trying to decide which is the minimum number of inequalities that makes query answering over DL-Lite , = RDFS Πp 2-hard in combined complexity. Another interesting future work is to look for extensions of both DL-Lite = R, and UCQ =,bs for which query answering is still decidable/tractable. Finally, we observe that it is still open whether answering CQ =s over DL-Litecore ontologies is decidable.

Acknowledgments This work has been supported by Sapienza under the project PRE-O-PRE and by MIUR, under the SIR project MODEUS - grant n. RBSI14TQHQ, and under the PRIN 2017 project HOPE .

References Abiteboul, S.; Hull, R.; and Vianu, V. 1995. Foundations of Databases. Addison Wesley Publ. Co. Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The DL-Lite family and relations. Journal of Artiﬁcial Intelligence Research 36:1 69. Baader, F.; Calvanese, D.; Mc Guinness, D.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press. Baader, F.; Horrocks, I.; Lutz, C.; and Sattler, U. 2017. An Introduction to Description Logic. Cambridge University Press. Bienvenu, M. 2016. Ontology-mediated query answering: Harnessing knowledge to get more from data. In Proceedings of the Twenty-Fifth International Joint Conference on Artiﬁcial Intelligence (IJCAI 2016), 4058 4061. Calvanese, D.; De Giacomo, G.; Lenzerini, M.; Rosati, R.; and Vetere, G. 2004. DL-Lite: Practical reasoning for rich DLs. In Proceedings of the Seventeenth International Workshop on Description Logic (DL 2004), volume 104 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning 39(3):385 429. Chandra, A. K., and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the Ninth ACM Symposium on Theory of Computing (STOC 1977), 77 90. Cima, G.; De Giacomo, G.; Lenzerini, M.; and Poggi, A. 2017. On the SPARQL metamodeling semantics entailment regime for OWL 2 QL ontologies. In Proceedings of the Seventh International Conference on Web Intelligence, Mining and Semantics (WIMS 2017), 10:1 10:6. Cima, G.; Lenzerini, M.; and Poggi, A. 2019. Semantic characterization of data services through ontologies. In Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI 2019), 1647 1653. Cuenca Grau, B. 2004. A possible simpliﬁcation of the semantic web architecture. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), 704 713. Fagin, R.; Kolaitis, P. G.; Miller, R. J.; and Popa, L. 2005. Data exchange: Semantics and query answering. Theoretical Computer Science 336(1):89 124. Glimm, B. 2011. Using SPARQL with RDFS and OWL entailment. In Reasoning Web. Semantic Technologies for the Web of Data Seventh International Summer School Tutorial Lectures (RW 2011). 137 201. Guti errez-Basulto, V.; Ib a nez-Garc ıa, Y. A.; Kontchakov, R.; and Kostylev, E. V. 2015. Queries with negation and inequalities over lightweight ontologies. Journal of Web Semantics 35:184 202. Guti errez-Basulto, V.; Ib a nez-Garc ıa, Y. A.; and Kontchakov, R. 2012. An update on query answering

with restricted forms of negation. In Web Reasoning and Rule Systems - Sixth International Conference (RR 2012), 75 89. Kolaitis, P. G.; Martin, D. L.; and Thakur, M. N. 1998. On the complexity of the containment problem for conjunctive queries with built-in predicates. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), 197 204. Koutris, P.; Milo, T.; Roy, S.; and Suciu, D. 2017. Answering conjunctive queries with inequalities. Theory of Computing Systems 61(1):2 30. Motik, B.; Cuenca Grau, B.; Horrocks, I.; Wu, Z.; Fokoue, A.; and Lutz, C. 2012. OWL 2 Web Ontology Language proﬁles (second edition). W3C Recommendation, World Wide Web Consortium. Available at http://www.w3.org/TR/ owl2-proﬁles/. Ortiz, M. 2018. Improving data management using domain knowledge. In Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI 2018), 5709 5713. Poggi, A.; Lembo, D.; Calvanese, D.; De Giacomo, G.; Lenzerini, M.; and Rosati, R. 2008. Linking data to ontologies. Journal on Data Semantics X:133 173. Poggi, A. 2016. On the SPARQL direct semantics entailment regime for OWL 2 QL. In Proceedings of the Twenty-Ninth International Workshop on Description Logics (DL 2014), CEUR Electronic Workshop Proceedings, http://ceur-ws.org/. Rosati, R. 2007. The limits of querying ontologies. In Proceedings of the Eleventh International Conference on Database Theory (ICDT 2007), volume 4353 of Lecture Notes in Computer Science, 164 178. Vardi, M. Y. 1982. The complexity of relational query languages. In Proceedings of the Fourteenth ACM SIGACT Symposium on Theory of Computing (STOC 1982), 137 146. Xiao, G.; Calvanese, D.; an Domenico Lembo, R. K.; Poggi, A.; Rosati, R.; and Zakharyaschev, M. 2018. Ontologybased data access: A survey. In Proceedings of the Twenty Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI 2018), 5511 5519.