# combining_existential_rules_and_description_logics__2d7dce0b.pdf

Combining Existential Rules and Description Logics

Antoine Amarilli Michael Benedikt Télécom Paris Tech; Institut Mines Télécom; CNRS LTCI University of Oxford antoine.amarilli@telecom-paristech.fr michael.benedikt@cs.ox.ac.uk

Query answering under existential rules implications with existential quantiﬁers in the head is known to be decidable when imposing restrictions on the rule bodies such as frontier-guardedness [Baget et al., 2010; 2011a]. Query answering is also decidable for description logics [Baader, 2003], which further allow disjunction and functionality constraints (assert that certain relations are functions); however, they are focused on ER-type schemas, where relations have arity two. This work investigates how to get the best of both worlds: having decidable existential rules on arbitrary arity relations, while allowing rich description logics, including functionality constraints, on arity-two relations. We ﬁrst show negative results on combining such decidable languages. Second, we introduce an expressive set of existential rules (frontier-one rules with a certain restriction) which can be combined with powerful constraints on aritytwo relations (e.g. GC2,ALCQIb) while retaining decidable query answering. Further, we provide conditions to add functionality constraints on the higher-arity relations.

1 Introduction

Recent years have seen an explosion of techniques for solving the query answering problem: given a query q, a conjunction F of atoms, and a set of logical constraints Σ, determine whether q follows from F and Σ. In databases this is called querying under constraints or the certain answer problem, seeing F as an incomplete database, and Σ as restrictions on the possible completions. For researchers working on description logics, F is referred to as the A-box and Σ the T-box. In both communities q is usually a conjunctive query, an existential quantiﬁcation of conjunctions of atoms, equivalent to a basic SQL SELECT. We will make this assumption throughout this work, referring for simplicity to the problem as just query answering (QA). QA is undecidable when Σ ranges over arbitrary ﬁrst-order logic constraints. This motivates the search for restricted constraint languages with decidable QA. Within the description

logic community, powerful such languages were developed to express constraints on vocabularies of arity two. The unary relations are referred to as concepts while the binary ones are the roles. The languages can build new concepts and roles from basic ones via Boolean operations and (limited) quantiﬁcation, and many of them, such as DL-Lite [Calvanese et al., 2005] or ALCQIb [Tobies, 2001], may restrict the input roles R(x,y) to be functional for all x there is at most one y such that R(x,y). Functionality constraints are crucial to faithfully model many real-world relationships: the relationship of a person to their birthdate, the relationship of an event to its starting time, etc. Hence, description logics are very powerful languages for arity-two vocabularies. In parallel, the AI and database communities have developed rich constraint languages on arbitrary arity via existential rules or tuple-generating dependencies (TGDs). Existential rules are constraints of the form x (φ(x) y ψ(x ,y)) where x x and φ and ψ are conjunctions of atoms. They generalize the well-known inclusion dependencies or referential constraints in databases [Abiteboul et al., 1995], and can also express mapping relationships used in data exchange [Fagin et al., 2005] and data integration [Lenzerini, 2002]. Although QA over general rules is undecidable, important subclasses are decidable. First, decidability holds whenever the chase procedure [Abiteboul et al., 1995] is guaranteed to terminate, which is ensured by a number of conditions on the rules, e.g., weak acyclicity [Fagin et al., 2005], joint acyclicity [Krötzsch and Rudolph, 2011], or the very restricted class of source-to-target TGDs. See [Grau et al., 2013] for a survey and [Baget et al., 2014] for a recent study. A second class of tame constraints are those that admit bounded-treewidth models. There are several such classes, such as guarded TGDs [Calì et al., 2012a], frontier-guarded TGDs [Baget et al., 2010], or the more general greedy bounded-treewidth sets [Baget et al., 2011b]. However, many features of description logics, such as disjunction or functionality restrictions, cannot be expressed by existential rules. Could we then enjoy the best of both worlds, by allowing both description logic constraints and existential rules, while maintaining the decidability of QA? This paper studies to what extent both paradigms can be combined, by looking for classes of constraints with decidable QA over relational schemas of arbitrary arity that can 1. express non-trivial existential rules over any relation in the schema and 2. assert expressive con-

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

straints (e.g., in ALCQIb) on the arity-two subschema the subset of the relations of arity one and two within the schema Our ﬁrst results (Section 3) are negative: we show that aritytwo languages featuring functionality constraints on the aritytwo subschema may lead to undecidable QA when combined with even very simple acyclic rules (source-to-target TGDs, S2T), or with the simplest existential rules that export two variables (frontier-two inclusion dependencies, ID[2]). More surprisingly, undecidability can occur with rules exporting only a single variable, the class of frontier-one dependencies FR[1] of [Baget et al., 2009]. We say the existential rule languages S2T, ID[2], FR[1] are destructive of arity-two QA. We then show (Section 4) that by restricting FR[1] slightly, imposing that the head of the rules have a certain tree shape (denoted non-looping ), we can obtain a class of existential rules that can be combined with expressive constraints on the arity-two schema while maintaining decidable QA (we call this not destructive). The reduction proceeds in two steps. We ﬁrst handle rules with tree-shaped bodies, via a direct rewriting technique to constraints on an arity-two encoding of the schema. Second, we handle rules with non-tree-shaped bodies, showing that the bodies can be soundly replaced by a tree-shaped approximation. Soundness is proven by extending the technique of treeiﬁcation used previously in many modal and guarded logics (e.g., [Bárány et al., 2014]), showing that models of the constraints can be unraveled to be tree-shaped. We go on to study (Section 5) the addition of functional dependencies (FDs), a well-known generalization of description logic functionality constraints to arbitrary arity. QA with existential rules and FDs is generally undecidable unless their interaction with the existential rules is controlled, e.g., by imposing the non-conﬂicting condition [Calì et al., 2012b]. We show that FDs can be added to our existential rules while maintaining decidable QA with the arity-two constraints, as long as the non-conﬂicting condition is satisﬁed. As in the standard non-conﬂicting setting, we show that the FDs can always be satisﬁed unless the initial facts violate them. We prove this by modifying the unraveling argument. Our results have the advantage that QA for our combined constraints reduces to QA on an arity-two schema; hence, existing QA algorithms for rich description logics could be extended to arbitrary arity signatures with expressive constraints.

Related work. A great deal of research has centered around the integration of DLs with Datalog-style rules, including work as early as the 1990 s, when the languages AL-Log [Donini et al., 1991] and CARIN [Levy and Rousset, 1998] were introduced. AL-Log links Horn rules with concepts from a description logic terminology, while the later language CARIN provides a broader framework allowing both concepts and roles from a terminology to appear in rules. [Levy and Rousset, 1998] provides both entailment algorithms for CARIN and undecidability results exploring the borderline for combining rules and DLs. Datalog rules, however, unlike the existential rules that we consider in this work, do not allow existential quantiﬁcation in the head, so they cannot assert the existence of higher-arity facts on fresh elements. Another approach to combination are description logics

that support higher-arity relations directly. Languages such as DLRreg [Calvanese et al., 2008] give some support for higher arity while retaining a DL-style syntax. Unlike them, we support existential rules with cyclic bodies that cannot be encoded in DLRreg, as well as arbitrary higher-arity functional dependencies that go beyond DL-expressible functionality assertions. On the other hand, we do not support some features of DLRreg, such as regular expression on role paths. Indeed, we do not consider the interaction of rules with DLs supporting transitivity and other recursion mechanisms [Glimm et al., 2008], focusing instead only on ﬁrst-order-expressible constraints given by decidable DLs and existential rules.

2 Preliminaries

Signatures, facts, queries. A signature σ consists of relation names (e.g. R) and an associated arity (e.g. |R|). We write σ as σ 2 σ>2, containing respectively the relations of arity 2 and the higher-arity relations with arity > 2. An atom R(x) consists of a relation name R and an |R|-tuple x of variables. A σ-fact (or just fact when σ is clear from context) is a conjunction of atoms using relations in σ. A Boolean conjunctive query (or CQ) is an existentially quantiﬁed conjunction of atoms. In this paper we assume for simplicity that CQs are Boolean, i.e., have no free variables, and we disallow constants. This is without loss of generality: for non-Boolean queries we can enumerate all possible assignments, and constants can be encoded with fresh unary relations.

Constraints, QA. We consider constraints that are formulae in function-free and constant-free ﬁrst-order logic (FO), on the signature σ. A σ-interpretation I (or just interpretation) consists of a domain dom(I) and an interpretation function I mapping each relation R of σ to a set RI of |R|-tuples of dom(I). The deﬁnition of I satisfying a FO formula φ, written I |= φ, is standard. A witness W of F in I is an interpretation that maps each relation R to the tuples in RI obtained by substituting the atoms of F using some variable binding w such that I |= F(w). We study the query answering problem (QA): given a fact F, a set of constraints Σ, and a CQ q, decide the validity of x (F(x) Σ q); that is, whether F and Σ entail q. In this case, we write F Σ |= q. The combined complexity of QA, for a ﬁxed class of constraints, is the complexity of deciding it when all of F, Σ (in the constraint class) and q are given as input. If we assume that Σ and q are ﬁxed, and only F is given as input, then we deﬁne instead the data complexity. The QA problem above allows arbitrary FO constraint classes. Below we present two kinds of integrity constraints that are known to enjoy decidable QA.

Existential rules. An existential rule (or tuple-generating dependency, or TGD) is a logical constraint of the form x (φ(x) y ψ(x ,y)), with x x, where the body φ and head ψ are conjunctions of atoms. Equality atoms and constants are disallowed. For brevity, in rules we often omit the quantiﬁcation on x and write as a comma. A rule is single-head if its head consists of only one atom. QA is undecidable for general rules (following from [Beeri and Vardi, 1981]). One class of rules with decidable QA are

those satisfying acyclicity conditions. We will show negative results for one of the most restrictive classes, the class S2T of source-to-target TGDs, where σ is partitioned as σ = σS σT, the bodies of all rules only use relations in σS, and the heads only use relations in σT. Our results on S2T extend to more permissive acyclicity conditions, such as those mentioned in the introduction. A second class of decidable rules guarantees that it sufﬁces to consider bounded-treewidth interpretations, usually because of constraints on the rule bodies. We focus on the class FR[1] of frontier-one rules, following [Baget et al., 2009]: the frontier of a rule is the set x of variables that occur both in the body and the head, and a rule is frontier-one if |x | = 1. The class of inclusion dependencies ID imposes that the head and body are single atoms where each variable is used only once and that the frontier is not empty, and we will focus on the class ID[2] of the inclusion dependencies with frontier size 2. QA is decidable for FR[1] [Baget et al., 2009]. For ID it is decidable and has PTIME data complexity [Calì et al., 2003b]. Existential rules can be augmented with functional dependencies (FDs), which are variants of existential rules that impose equalities. Writing x = x1 xn and similarly for y, an FD on the relation R is of the form: xy (R(x1,...,xn) R(y1,...,yn) V l L xl = yl) xr = yr for some 1 r |R| and some subset L {1,...,|R|} which we call the determiner of the FD. QA is undecidable when combining existential rules and arbitrary FDs, for instance it is undecidable for ID[2] and FDs [Calì et al., 2003a].

Arity-two constraints. The second kind of tame constraints are arity-two constraints, which are only deﬁned on σ 2. The most general such language that we study is the two-variable guarded fragment with counting quantiﬁers, GC2 [Kazakov, 2004]. GC2 is the smallest class of constantfree FO formulae with at most two variables, containing all atoms for σ 2 relations, closed under Boolean connectives, under guarded universal and existential quantiﬁcation, and under number quantiﬁcations: if φ(x,y) is a GC2 formula and A(x,y) is an arity-two atom with two free variables (the guard), then ny A(x,y) φ(x,y) and <ny A(x,y) φ(x,y) are formulae, where n is an integer. QA for GC2 is decidable and its data complexity is in co-NP [Pratt-Hartmann, 2009]. Description logics (DLs) are arity-two constraint languages. Examples of DLs are DL-Lite [Calvanese et al., 2005], a lightweight DL often used in the context of ontology-based data access, and ALCQIb [Tobies, 2001], a more expressive DL that can make full use of number restrictions, a useful feature in practice. Both DL-Lite and ALCQIb can assert concept inclusions like C C , where C and C are concepts (arity 1 relations), meaning that C holds whenever C does; and functionality assertions funct(R), where R is a role (an arity 2 relation), corresponding to x 1y R(x,y) in GC2, or to the FD: x1x2y1y2 R(x1,x2) R(y1,y2) x1 = y1 x2 = y2. Despite its expressiveness, ALCQIb can still, as DL-Lite, be captured by GC2, which implies decidable QA. Roles and concepts can be atomic (i.e., from σ 2) or deﬁned using constructors; we give some examples from ALCQIb. The inverse R of an atomic role R is such that R (b,a) holds whenever R(a,b) does. An intersection of roles, which is

written R1 Rn, holds for (a,b) whenever Ri(a,b) holds for all 1 i n. and are the true and false concepts. The intersection of concepts C1,...,Cn, written C1 Cn, holds whenever each of the Ci does. The negation C of a concept C holds for elements where C does not hold. An existential concept R.C for a role R and concept C holds for every element a such that b R(a,b) C(b) does. Note that many of these features (e.g., functionality assertions and negation) cannot be expressed as existential rules.

Combining constraint classes. For any class CL of existential rules, we call CL non-destructive (of arity-two QA) if QA is decidable for the class CL GC2 of conjunctions of constraints of CL (on σ) and of constraints of GC2 (on σ 2). Otherwise, we call CL destructive.

3 Negative Results for Combination

We now present classes of existential rules which have decidable QA but are destructive. First, we observe that even the simplest class of rules that ensures decidability based on chase termination, the class S2T of source-to-target TGDs, is destructive. This is not so surprising, since the arbitrary constraints on the arity-two signature may add dependencies that are not source-to-target.

Theorem 3.1. S2T is destructive of arity-two QA, even when the whole σ has arity two and there is no query (i.e., this is just the satisﬁability problem asking whether the fact and constraints are satisﬁable).

Thus we move on to classes of existential rules that are decidable because of guardedness assumptions. We ﬁrst observe that the class ID[2] of frontier-two inclusion dependencies is destructive of arity-two QA. In fact, functionality assertions on the binary relations are sufﬁcient to get undecidability, because they can be lifted to functionality assertions on higher-arity relations using ID[2]. Thus, following a standard reduction from QA to entailment of dependencies as in [Calì et al., 2003a], we can use the undecidability of entailment for ID[2] and FDs (Theorem 2 of [Mitchell, 1983], which we adapt slightly) and prove the following:

Theorem 3.2. ID[2] is destructive of arity-two QA. In particular, QA is undecidable for ID[2] D, for any DL D (such as DL-Lite) featuring functionality assertions.

More surprisingly, frontier-one rules FR[1] are destructive of arity-two QA, even though they can only export a single variable, and this holds even when the whole σ has arity two. The reason is that FR[1] may be more expressive than GC2 as it can disobey the two-variable restriction.

Theorem 3.3. FR[1] is destructive of arity-two QA, even when the whole σ has arity two and there is no query.

This motivates the search for more restricted existential rule classes which could be non-destructive of arity-two QA.

4 From Existential Rules to Arity-Two

We will focus on the subclass of frontier-one rules whose heads do not contain non-trivial Berge cycles [Fagin, 1983].

Deﬁnition 4.1. A Berge cycle in a conjunction of atoms Ψ is a sequence A1,x1,A2,x2,...,An,xn of length n > 1 where the xi are pairwise distinct variables, the Ai are pairwise distinct atoms of Ψ, and every xi occurs in atoms Ai and Ai+1 (with addition modulo n, so xn occurs in A1). We say Ψ is non-looping if there is no Berge cycle of length above 2, and no Berge cycle that contains an atom of σ>2. We deﬁne the head-non-looping FR[1]Hnl subclass of FR[1] rules whose heads are non-looping. In particular, single-head FR[1] rules are always head-non-looping. Example 4.2. Rules A(x) yz R(x,y),S(y,z),T(z,x) and B(y) yz R(x,y),U(x,y,z) are not in FR[1]Hnl. However, A(x) y V(x,x,y,y) and B(x) y R(x,y),S(x,y),R(y,x) are in FR[1]Hnl. We claim that head-non-looping rules are non-destructive, in contrast with general frontier-one rules (Theorem 3.3): Theorem 4.3. FR[1]Hnl is not destructive of arity-two QA. Of course, this means that QA is decidable for FR[1]Hnl D, for any DL D expressible in GC2, such as ALCQIb. The rest of this section proves the theorem and addresses complexity.

Shredding. Our proof of Theorem 4.3 translates the FR[1]Hnl rules to arity-two constraints, using a common way to represent general relational databases in a binary relational store, which we call shredding: we represent an n-ary relation by a set of binary relations giving the link from each tuple (materialized as an element) to its attributes. We present ﬁrst the translation of the signature σ to its shredded arity-two signature σS, and the constraints imposed on σS-interpretations to ensure that they can be decoded back to σ-interpretations. Second, we explain how to shred facts and CQs. Deﬁnition 4.4. The shredded signature σS of a signature σ consists of σ 2, a unary relation Elt, and, for each R σ>2, a unary relation AR and binary relations Ri for 1 i |R|. The well-formedness constraints of σS, written wf(σS), are the following DL constraints (they are ALCQIb-expressible): C Elt for every unary relation C of σ 2 R. Elt and R . Elt for all binary R of σ 2 and the following, where R = S are in σ>2 and 1 i |R|: Ri. AR and R i . Elt Elt AR and AR AS AR Ri. and funct(Ri) The shredding SHR(F) of a σ-fact F is the σS-fact obtained by adding the atom Elt(x) for each variable x of F and replacing each atom R(x) of F when R σ>2 by the atoms AR(t) and Ri(t,xi) for 1 i |R|, for t a fresh variable. The shredding SHR(q) of a CQ q is similarly deﬁned. Example 4.5. Considering CQ q : xyz U(x),R(x,y),S(z,z,x), we deﬁne SHR(q) as: xyzt Elt(x),Elt(y),Elt(z),U(x), R(x,y),AR(t),S1(t,z),S2(t,z),S3(t,x).

Fully-non-looping. The interesting part is to deﬁne the shredding of FR[1]Hnl rules. We ﬁrst restrict to the class of fully-non-looping rules, FR[1]Fnl, whose head and body are non-looping. We show that FR[1]Fnl can be directly shredded to GC2. We will later move from FR[1]Fnl to FR[1]Hnl. For any existential rule τ : x φ(x) y ψ(x ,y) with x x , we deﬁne its shredding SHR(τ) as the existential rule

xt (SHR(φ(x))) yt (SHR(ψ(x ,y))), where t and t are the fresh elements introduced in the shredding of φ and ψ respectively. We claim the following: Lemma 4.6. For any FR[1]Fnl rule τ, SHR(τ) can be translated in PTIME to a GC2 sentence on σS. Example 4.7. For brevity, this example ignores the Elt and AR atoms when shredding. Consider the FR[1]Fnl rule: U(u),T(u,x),S(x) yz T(x,y),U(y),R(x,x,z,z) Its shredding is expressible in GC2 (and even in ALCQIb): ( T .U) S ( T.U) ( (R 1 R 2 ).( (R3 R4). )) By contrast, consider the following rule in FR[1]\FR[1]Hnl: U(x) yz R(x,y),S(x,y,z) Its shredding is as follows; it is not GC2-expressible: U(x) yzt R(x,y),S1(t,x),S2(t,y),S3(t,z) In the general case, the GC2 rewriting of Lemma 4.6 is obtained in PTIME by seeing the body and head of SHR(τ) as a tree, which is possible because τ is fully-non-looping. It is now easy to show the following general result: Proposition 4.8 (Shredding). For any fact F, GC2 constraints Σ, existential rules and CQ q, the following are equivalent: F Σ |= q; SHR(F) Σ SHR( ) wf(σS) |= SHR(q). Thus, from Lemma 4.6, as SHR(F), SHR( ), σS, wf(σS), and SHR(q) can be computed in PTIME following their deﬁnition, we deduce the following, in the case of FR[1]Fnl: Corollary 4.9. QA for GC2 and FR[1]Fnl constraints can be reduced to QA for GC2 in PTIME; further, when the constraints and query are ﬁxed in the input, they also are in the output, so data complexity bounds for GC2 QA are preserved. This concludes the proof of Theorem 4.3 for FR[1]Fnl constraints. It further implies that QA for GC2 and FR[1]Fnl has co-NP-complete data complexity, like GC2, [Pratt-Hartmann, 2009], and the combined complexity is the same as for GC2. Note that, although QA for GC2 is decidable, we know of no realistic implementations. Our translation could however reduce instead to arity-two QA with constraints in DLs such as ALCQIb, if we impose impose additional minor restrictions on the FR[1]Fnl rules (e.g., no atom of the form S(x,x)). For simplicity, however, we focus in the sequel on reductions to decidable QA on arity-two (i.e., translating to GC2) rather than investigating which restrictions would ensure that the output of our translations can be expressed in particular DLs.

Head-non-looping. We now extend the claim to FR[1]Hnl rather than FR[1]Fnl. The idea is that we rewrite FR[1]Hnl rules to FR[1]Fnl by treeifying them, considering all possible fullynon-looping rules that they imply, and all possible ways that they can match on the parts of the interpretations that satisfy the fact. To formalize this, we assume that we have added to the fact F one atom Px(x) for each variable x of F, where each Px is a fresh unary relation. We then deﬁne: Deﬁnition 4.10. The treeiﬁcation on fact F of a FR[1]Hnl rule τ : x (φ(x) y ψ(xf,y)), where xf x is the frontier variable, is the conjunction TRF(τ) of FR[1]Fnl rules deﬁned as follows:

consider every mapping f from x to itself, and let f(τ) be obtained from τ by renaming all variables in x with f; for every such f(τ), consider every x x and every mapping g from x to the variables of F, and construct g(f(τ)) by replacing every occurrence of each x x in φ(x) by fresh variables x1,...,xn, and adding the facts Pg(x)(xi) for all x x and all i (if xf x , also replace xf in ψ(xf,y) by one of its copies); if g(f(τ)) is fully-non-looping, add it to TRF(τ). Example 4.11. Consider a fact F and the following rule τ: R(x,y),S(y,z),T(z,w),U(w,x) A(x) The treeiﬁcation TRF(τ) contains the rule: R(x,y),S(y,z),T(z,y),U(y,x) A(x). Consider the rule τ : R(x,y),S(y,x,x) A(x), and a fact F containing variable z. Then TRF(τ ) contains: R(x1,y),S(y,x2,x3),Pz(x1),Pz(x2),Pz(x3) A(x1) We now claim: Proposition 4.12. For any fact F, GC2 constraints Σ, FR[1]Hnl rules and CQ q, the following are equivalent: F Σ |= q; F Σ TRF( ) |= q. This proposition implies that QA for FR[1]Hnl and GC2 can be reduced to QA for FR[1]Fnl and GC2, which is decidable by the Shredding Proposition, proving Theorem 4.3. To prove Proposition 4.12, for the ﬁrst direction, if F Σ |= q, one can show that all of the fresh unary relations Px in an interpretation of F Σ q can be assumed to be interpreted by one tuple. One then shows that implies TRF( ) on such interpretations. For the other direction, assuming that F Σ TRF( ) |= q, the Shredding Proposition implies that there is a σS-interpretation J of Θ = Σ SHR(TRF( )) wf(σS), q = SHR(q), and the existential closure of F = SHR(F). We apply an unraveling argument to show that J can be made cycle-free: Deﬁnition 4.13. The Gaifman graph G(I) of an interpretation I is the undirected graph on dom(I) connecting any two elements co-occurring in a tuple of I. Given a fact F, an interpretation I is cycle-free except for F if F has a witness W in I such that any cycle of G(I) is only on elements of dom(W). Lemma 4.14 (Unraveling). For any σS-fact F , GC2 constraints Θ, and CQ q , if ( xt F (x,t)) Θ q is satisﬁable then it has an interpretation which is cycle-free except for F . Letting J be the unraveling of our interpretation J (obtained by the Unraveling Lemma), we can then unshred J back to a σ-interpretation I: Deﬁnition 4.15. The unshredding I of a σS-interpretation J |= wf(σS) is obtained by setting RI = RJ for R σ 2, and, for all R σ>2 and t AJ R , creating the tuple a RI

such that (t,ai) RJ i for all 1 i |R|. As in the proof of the Shredding Proposition, we can show that the unshredding I is well-deﬁned and satisﬁes the unshredded constraints ( x F(x)) Σ TRF( ) q. Further, we show that it satisﬁes and not just TRF( ), because a match of a FR[1]Hnl rule τ in I must be a match of TRF(τ); otherwise the match would witness that J was not cycle-free:

Lemma 4.16 (Soundness). For a σ-fact F, FR[1]Hnl rule τ and σS-interpretation J , if J satisﬁes SHR(TRF(τ)) and is cycle-free except for SHR(F), then the unshredding I of J satisﬁes τ. We conclude by sketching the proof of the Unraveling Lemma, which follows [Kazakov, 2004; Pratt-Hartmann, 2009]. From an interpretation J of ( xt F (x,t)) Θ q , for all u = v in dom(J ) co-occurring in some tuple of J , we call a bag the interpretation with domain {u,v} consisting of the tuples of J mentioning only u,v. We build a graph G over the bags by connecting bags whose domain shares one element. We pick a witness W of F in J and merge in the fact bag all bags whose domain is included in dom(W). An unraveling is a tree T of bags obtained by unfolding G starting at the fact bag, which is preserved as-is. Each bag b of T except the fact bag has a domain containing two elements: one of them occurs exactly in b, its siblings and its parent; the other occurs exactly in b and its children (it is introduced in b). We see T as an interpretation formed of the union of its bags. We construct T from G inductively. For any bag b in T corresponding to a bag b in G, construct the children of b as follows. For each bag b adjacent to b in G, if b and b share the element corresponding to the element u introduced in b, create an isomorphic copy of b as a child of b in T, whose domain is u plus a fresh element, and perform the unraveling process recursively on the children. It can be shown that the unraveling operation preserves GC2 constraints, the fact F , and the negated CQ q . As T is a tree, the interpretation it describes is cycle-free (except for the witness W, because we copied the fact bag as-is).

Complexity. Proposition 4.12 gives a reduction from FR[1]Hnl and GC2 QA to FR[1]Fnl to GC2 QA, but its output is of exponential size in the input, because of treeiﬁcation. Hence, letting f(n) bound the size of the output of our reduction given an input of size n, and letting g(n) bound the combined complexity of GC2 QA, we have shown an upper bound of g(f(n)) for QA for FR[1]Hnl and GC2. Further, treeiﬁcation rewrites the rules in a fact-dependent way, so, unlike the previous case of FR[1]Fnl and GC2 QA, data complexity bounds for GC2 QA do not imply data complexity bounds for FR[1]Hnl and GC2 QA.

5 Adding Functional Dependencies The previous section showed that the language of head-nonlooping frontier-one rules is not destructive of GC2 QA. However, another kind of rules that we would want to support on higher-arity relations are functional dependencies (FDs). It is well-known that QA is undecidable for, e.g., ID[2] and arbitrary FDs [Calì et al., 2003a], so such constraints are trivially destructive. As it turns out, undecidability also holds for FR[1]Hnl rules and FDs; in fact, even for single-head FR[1] rules and FDs:

Theorem 5.1. QA is undecidable for FDs and single-head frontier-one rules, even if all FDs have a determiner of size 1. However, for certain kinds of existential rules and FDs, QA is known to be decidable: this is in particular the case of non-conﬂicting rules and FDs [Calì et al., 2012b]:

Deﬁnition 5.2. We say that a single-head existential rule τ is non-conﬂicting with respect to a set of FDs Φ if, letting A = R(z) be the head atom of τ, letting S be the subset of {1,...,|R|} such that zi is a frontier variable iff i S: No strict subset of S is the determiner of an FD in Φ; If S is exactly the determiner of an FD of Φ, then all existentially quantiﬁed variables in A occur only once. Note that this requires rules to be single-head, and thus head-non-looping. Our result with respect to adding FDs is: Theorem 5.3. Non-conﬂicting frontier-one rules and FDs are non-destructive of arity-two QA. In particular, single-head frontier-one rules and FDs are non-destructive of arity-two QA if all variables in the head atom of rules are assumed to have only one occurrence, as this simple sufﬁcient condition implies the non-conﬂicting condition. To prove the theorem, we assume without loss of generality that we only have FDs on higher-arity relations, as we can write them in GC2 otherwise. We cannot shred the FDs, as they would translate to a functionality assertion for the path, e.g., R i Rj, which is not expressible in GC2 (and not even in expressive DLs such as SROIQ [Horrocks et al., 2006]). However, we can show that, thanks to the non-conﬂicting requirement, FDs can always be made to hold on interpretations, as long as they hold on a witness of the fact. Proposition 5.4. For any GC2 constraints Σ, non-conﬂicting frontier-one rules , FDs Φ on σ>2, σ-fact F, and CQ q, if there is an interpretation I satisfying Θ = ( x F(x)) Σ q and there is a witness W of F in I satisfying Φ, then Θ Φ is satisﬁable. We ﬁrst prove Proposition 5.4. As in Section 4, consider the treeiﬁcation TRF( ): it is still non-conﬂicting as treeiﬁcation only affects rule bodies. Use the Shredding Proposition to obtain an interpretation J of q = SHR(q), Θ = Σ SHR(TRF( )) wf(σS), and the existential closure of F = SHR(F). By our hypothesis about the existence of a witness, we can assume that J has a witness W of F whose unshredding satisﬁes Φ. In the previous section, we used the Unraveling Lemma to show that J could be assumed to be cycle-free. We now modify the lemma to additionally ensure the following property on J , which will forbid FD violations in its unshredding: Deﬁnition 5.5. Given a set of FDs Φ on σ>2, a σSinterpretation J , and a witness W of a fact in J , we call J FD-safe except for W if for every a dom(J ), for any R σ>2 and FD determiner P of R in Φ, considering each t dom(J ) such that (t,a) RJ i for every i P, either there is at most one such t or all are in dom(W). FD-safety is useful for the following reason: Lemma 5.6. For any set of FDs Φ on σ>2, for any σSinterpretation J which is cycle-free and FD-safe except for a witness W, if the unshredding of W satisﬁes Φ, then the unshredding of J satisﬁes Φ. We now claim a variant of the Unraveling Lemma: Lemma 5.7 (FD-aware unraveling). Let Σ be a GC2 constraint, F a σ-fact, q a CQ, non-conﬂicting frontier-one

rules and Φ a set of FDs on σ>2. Let J be an interpretation satisfying Θ = ( xt SHR(F)(x,t)) Σ SHR(TRF( )) wf(σS) SHR(q), and W a witness of SHR(F) in J . Then there is an interpretation J satisfying Θ such that W is a witness of SHR(F) in J , and J is cycle-free and FD-safe except for W.

We prove the lemma by tweaking the unraveling process to ensure FD-safety: when creating children of each bag b in the unraveling T for neighbors of its corresponding bag b in the bag graph G, omit some neighbors that contain shreddings of higher-arity tuples if the shared element u occurs in a strict superset of an FD determiner of Φ, and unravel differently the neighbors where u occurs exactly at a determiner. This unraveling still satisﬁes Σ, q , and the existential closure of F , and satisﬁes SHR(TRF( )): the non-conﬂicting condition ensures that the omitted facts were not required by a rule. We then apply the FD-aware Unraveling Lemma to J and consider the unshredding I of the result; it satisﬁes all necessary constraints as in Section 4, including Φ by Lemma 5.6. This proves Proposition 5.4. We conclude by proving Theorem 5.3. We ﬁrst observe that the results of Section 4 extend to a more general notion of fact that allows inequality axioms (x = y); indeed, inequalities in the fact are preserved by shredding and unshredding, and by unraveling. So Theorem 4.3 holds for such facts with inequalities, with the same complexity. Second, we enumerate all possible equalities between variables of the fact F, and for each possibility, consider the fact F= where variables are merged following the equalities, and inequalities are asserted between the remaining variables. Proposition 5.4 implies that our original entailment holds iff all the derived entailments hold where F is replaced by some F= whose canonical interpretation satisﬁes Φ (this can be tested in PTIME for each F=). Thus we have reduced to QA for FR[1]Fnl and GC2. In terms of complexity, as GC2 QA is EXPTIME-hard in combined complexity (because satisﬁability for the usual twovariable guarded fragment is EXPTIME-hard [Grädel, 1999]), the additional exponential factor (from all possible F=) has no impact, so the bounds of Section 4 also apply to QA for GC2

and non-conﬂicting frontier-one rules and FDs.

6 Conclusion

In this paper, we have studied the impact of existential rules on the decidability of query answering for classes of arity-two constraints. We also explained (in proving Theorem 5.3) how the decidability extends when inequalities are allowed in facts. We have limited our arbitrary arity constraints to rules, i.e., dependencies. In future work we will study how to extend our results to arbitrary arity constraint languages with more features, e.g., disjunction. We will also study what happens in the presence of constants (or nominals), which are disallowed in GC2 (and in the rule languages we consider), but are known not to break decidability in arity-two contexts [Rudolph and Glimm, 2010; Calvanese et al., 2009]. This, however, would probably require different techniques, as unraveling may create multiple copies of constants. Another question that would probably require speciﬁc tools is the study of ﬁnite QA, i.e., QA restricted to ﬁnite interpretations.

We are very grateful to Boris Motik and Pierre Senellart for their helpful feedback. This work was partly supported by the Télécom Paris Tech Research Chair on Big Data and Market Insights and by the Engineering and Physical Sciences Research Council, UK, grants EP/G004021/1 and EP/M005852/1.

References [Abiteboul et al., 1995] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. [Baader, 2003] Franz Baader. The description logic handbook: theory, implementation, and applications. Cambridge University Press, 2003. [Baget et al., 2009] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. Extending decidable cases for rules with existential variables. In IJCAI, 2009. [Baget et al., 2010] Jean-François Baget, Michel Leclère, and Marie-Laure Mugnier. Walking the decidability line for rules with existential variables. In KR, 2010. [Baget et al., 2011a] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10):1620 1654, 2011. [Baget et al., 2011b] Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the complexity lines for generalized guarded existential rules. In IJCAI, 2011. [Baget et al., 2014] Jean-François Baget, Fabien Garreau, Marie-Laure Mugnier, and Swan Rocher. Extending acyclicity notions for existential rules. In ECAI, 2014. [Bárány et al., 2014] Vince Bárány, Georg Gottlob, and Martin Otto. Querying the guarded fragment. LMCS, 10(2), 2014. [Beeri and Vardi, 1981] Catriel Beeri and Moshe Y Vardi. The implication problem for data dependencies. In ICALP. 1981. [Calì et al., 2003a] Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. In PODS, 2003. [Calì et al., 2003b] Andrea Calì, Domenico Lembo, and Riccardo Rosati. Query rewriting and answering under constraints in data integration systems. In IJCAI, 2003. [Calì et al., 2012a] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Semantics, 14, 2012. [Calì et al., 2012b] Andrea Calì, Georg Gottlob, and Andreas Pieris. Towards more expressive ontology languages: The query answering problem. Artif. Intel., 193, 2012. [Calvanese et al., 2005] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. DL-Lite: Tractable description logics for ontologies. In AAAI, 2005.

[Calvanese et al., 2008] Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Conjunctive query containment and answering under description logic constraints. TOCL, 9(3), 2008. [Calvanese et al., 2009] Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Regular path queries in expressive description logics with nominals. In IJCAI, 2009. [Donini et al., 1991] Francesco M. Donini, Maurizio Lenzerini, Daniele Nardi, and Andrea Schaerf. A hybrid system with Datalog and concept languages. In AI*IA, 1991. [Fagin et al., 2005] Ronald Fagin, Phokion G. Kolaitis, Renee J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. TCS, 336(1), 2005. [Fagin, 1983] Ronald Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. JACM, 30(3), 1983. [Glimm et al., 2008] Birte Glimm, Carsten Lutz, Ian Horrocks, and Ulrike Sattler. Conjunctive query answering for the description logic SHIQ. JAIR, 31, 2008. [Grädel, 1999] Erich Grädel. On the restraining power of guards. J. Symbolic Logic, 1999. [Grau et al., 2013] Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, and Zhe Wang. Acyclicity notions for existential rules and their application to query answering in ontologies. JAIR, 47, 2013. [Horrocks et al., 2006] Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ. In KR, 2006. [Kazakov, 2004] Yevgeny Kazakov. A polynomial translation from the two-variable guarded fragment with number restrictions to the guarded fragment. In JELIA. 2004. [Krötzsch and Rudolph, 2011] Markus Krötzsch and Sebastian Rudolph. Extending decidable existential rules by joining acyclicity and guardedness. In IJCAI, 2011. [Lenzerini, 2002] Maurizio Lenzerini. Data integration: A theoretical perspective. In PODS, 2002. [Levy and Rousset, 1998] Alon Y. Levy and Marie-Christine Rousset. Combining Horn rules and description logics in CARIN. Artif. Intell., 104(1-2):165 209, 1998. [Mitchell, 1983] John C. Mitchell. The implication problem for functional and inclusion dependencies. Information and Control, 56(3), 1983. [Pratt-Hartmann, 2009] Ian Pratt-Hartmann.Data-complexity of the two-variable fragment with counting quantiﬁers. Inf. Comput., 207(8), 2009. [Rudolph and Glimm, 2010] Sebastian Rudolph and Birte Glimm. Nominals, inverses, counting, and conjunctive queries or: Why inﬁnity is your friend! JAIR, 39, 2010. [Tobies, 2001] Stephan Tobies. Complexity results and practical algorithms for logics in knowledge representation. Ph D thesis, 2001.