# enhancing_existential_rules_by_closedworld_variables__b7ffb2d5.pdf

Enhancing Existential Rules by Closed-World Variables

Giovanni Amendola, Nicola Leone, Marco Manna and Pierfrancesco Veltri University of Calabria, Italy amendola@mat.unical.it, leone@mat.unical.it, manna@mat.unical.it, veltri@mat.unical.it

Existential rules generalize Datalog with existential quantiﬁcation in the head. Natively, Datalog is interpreted under a closed-world semantics, while existential rules typically employ the open-world assumption. The interpretation domain in the latter case is enlarged by inﬁnitely many anonymous individuals. Then, in any rule, each variable ranges over all individuals, even if not needed or required. In this paper, we enhance existential rules by closed-world variables to consciously reason on the properties of known (non-anonymous) and arbitrary individuals in different ways. Accordingly, we uniformly generalize the basic classes of existential rules that ensure decidability of ontologybased query answering. For them, after observing that decidability is preserved, we prove that a strict increase in expressiveness is gained, and in most cases the computational complexity is not altered.

1 Introduction Existential rules, also known as TGDs or datalog rules, are a fascinating research topic deeply studied not only in artiﬁcial intelligence [Baget et al., 2011; Amendola et al., 2017] but also in database theory [Bourhis et al., 2016; Alviano and Pieris, 2015] and logic [B ar any et al., 2014]. They are at the core of Datalog [Cal ı et al., 2009], an emerging family of ontology languages complementing the expressive power of Description Logics (DLs) [Baader et al., 2003]. Indeed, datalog generalizes the well-known language Datalog [Ceri et al., 1989] with existential quantiﬁcation in the head. Natively, Datalog is interpreted under a closed-world semantics, while existential rules typically employ the openworld assumption. For example, in classical query answering [Ortiz, 2013] where a query q is evaluated over a logical theory consisting of a database D paired with an ontology Σ the presence of existential quantiﬁers in Σ requires an interpretation domain of D Σ that extends the closed domain of D with inﬁnitely many extra anonymous individuals. Then, each variable of Σ does range over all individuals. To consciously reason on the properties of known (nonanonymous) and arbitrary individuals in different ways, we complement standard variables with closed(-world) variables

that range over the individuals of D Σ only. The resulting language, called datalog ,H, offers novel modeling capabilities, as it allows to specify properties at both data and conceptual level in a uniform way. Consider, for example, a scenario in which one has to model that every good has a price and a good is auctionable if some reference price can be associated to it . Such desiderata are expressible via the rules ρ1 =good(X) Y has Price(X, Y ) and ρ2 = good(X), has Price(X, b Y) auctionable(X), where b Y is a closed variable. Given D0 = {good(ferrari250)}, Σ0={ρ1, ρ2}, and the queries q1 = X Y has Price(X, Y ) and q2 = X Y has Price(X, Y ), auctionable(X). Clearly, q1 is entailed by D0 Σ0. But q2 is not since M = D0 {has Price(ferrari250, 10)} is a possible model of D0 Σ0. Indeed, 10 is not a reference price for ferrari250 but simply one of the inﬁnitely many anonymous individuals not in D0. Therefore rule ρ2 is satisﬁed in M. Of course, a ﬁrst natural question now is to wonder whether Σ0 can be expressed via some equivalent datalog ontology. Existential rules, besides offering good modeling capability, are extremely challenging from a computational viewpoint, as they make query answering undecidable in the general case [Beeri and Vardi, 1984]. To remedy this fact, several syntactic conditions have been proposed in the literature, with some giving rise to the ﬁve basic decidable datalog classes: linear [Cal ı et al., 2012a], weakly-acyclic [Fagin et al., 2005], guarded [Cal ı et al., 2013], sticky [Cal ı et al., 2012b], and shy [Leone et al., 2012]. The second natural question now is to wonder whether these conditions can be generalized to preserve decidability of query answering also for datalog ,H. Along the paper we give answers to the above questions, starting right here by summarizing the main contributions: For each basic datalog class C, we consider a naive and a reﬁned extension, denoted by CH and CH+, respectively. In naive extension, the syntactic conditions underlying C treat closed variables as standard ones. In the reﬁned one, the syntactic conditions are enforced over standard variables only. Decidability can be easily established. (Section 3.) We show that CH preserves the same data and combined complexity of each basic datalog class C. Likewise, this holds with shy H+ and w-acyclic H+ w.r.t. their standard counterparts. Differently, guarded H+ and sticky H+ exhibit an increase in data complexity, while only linear H+ has an increase in both data and combined complexity. (Section 4.)

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

We prove that datalog ,H is (resp., CH and CH+ are) strictly more expressive than datalog (resp., each basic class C). In particular, going back to our running example, there is no datalog ontology that, independently from the database at hand, behaves as Σ0 w.r.t. both q1 and q2. Also, each CH+ is even strictly more expressive than datalog. (Section 5.1.) We show that the well-known Description Logic ELH [Brandt, 2004b] is captured by linear H+, even if we only focus on linear H+ ontologies of arity at most two and with at most two atoms in the body (where the combined complexity of query answering drops to NP as for ELH). Interestingly, linear H+ keeps a lower computational complexity, compared to other datalog classes that can express this DL, namely guarded and its extensions. (Section 5.2.)

2 Existential Rules with Closed Variables

Basics. Let C (constants or individuals) and V (variables) be pairwise disjoint discrete sets of terms. A variable (x, y, ...) is either standard (X, Y, ...) or closed(-world) (b X, b Y, ...). We denote by Vs and Vc, the set of standard and closed variables, respectively. An atom α is a labeled tuple p(t), where p = pred(α) is a predicate symbol, t = t1, ..., tm is a tuple of terms, m = |p| is the arity of p or α, and α[i] = ti. Given a ﬁnite domain C of known individuals, a -substitution is any map µ : V C such that b X Vc implies µ(b X) . For a set A of atoms, µ(A) is obtained from A by replacing each variable x by µ(x). A database (resp., instance) is any variable-free ﬁnite (resp., possibly inﬁnite) set of atoms. Syntax. A datalog ,H rule ρ is a logical implication of the form X Y (φ(X, Y) Z ψ(X, Z)) with X Y V and Z Vs whose body (resp., head) b(ρ) = φ(X, Y) (resp., h(ρ) = ψ(X, Z)) is a conjunction (or set) of atoms, possibly with constants. As usual, the head is nonempty. Universal and existential variables are respectively denoted by UV(ρ) and EV(ρ). The set X is known as the frontier of ρ. If no closed variable is in ρ, then it is also a datalog rule; and if even EV(ρ) = , then it is also a datalog rule. A datalog ,H ontology Σ is any ﬁnite set of datalog ,H rules. We denote by R(Σ) the set of predicates occurring in Σ. A position p[i] is deﬁned as a predicate p of R(Σ) and its i-th attribute. Let pos(p)={p[1], ..., p[|p|]}. A CH (hybrid conjunctive) query is an expression of the form q(X) = Y φ(X, Y), where φ is as above. In case q contains no closed variable, it is also a C (conjunctive) query. For a structure ς over atoms (set, rule, query, ...), if b X occurs in ς, then X does not occur in ς. Also, atoms(ς), terms(ς), vars(ς) and std(ς) respectively denote the set of atoms in ς, the set of terms in atoms(ς), the set of variables in atoms(ς), and the structure built from ς by replacing each b X with X. Semantics. Consider a triple D, Σ, q as above, and let = terms(D, Σ) C. A model of D Σ is any instance M D such that, for each ρ Σ and each -substitution µ, µ(b(ρ)) M implies µ (h(ρ)) M for some -substitution µ µ|X. The answer to q over M is the set ans(q, M) of |X|-tuples t for which there is a -substitution µ such that µ(φ(t, Y)) M. The set of all models is denoted by mods(D, Σ). The (certain) answer to q is the set ans(q, D, Σ) = T

M mods(D,Σ) ans(q, M).

3 Decidability

Hereafter, QEVAL refers to the following decision problem: Given a database D, a datalog ,H ontology Σ, a CH query q(X) with |X| = n, and a tuple t Cn, decide whether t ans(q, D, Σ) holds. In this section, we ﬁrst introduce the ﬁve basic datalog classes ensuring decidability of QEVAL, as well as some of their generalizations that we need in our technical analysis: j-acyclic [Kr otzsch and Rudolph, 2011], w-sticky [Cal ı et al., 2012b], and w-guarded [Cal ı et al., 2013]. Then, we deﬁne hybrid(-world) extensions of the basic classes, and show that decidability is preserved.

3.1 Overview of Some Decidable datalog Classes

Fix a datalog ontology Σ. We assume that different rules of Σ share no variable. A term t occurs in a set A of atoms at position p[i] if there is α A s.t. pred(α) = p and α[i] = t. Position p[i] is invaded by an existential variable X if there is ρ Σ s.t.: (1) X occurs in h(ρ) at position p[i]; or (2) some y UV(ρ) attacked by X (i.e., y occurs in b(ρ) only at positions invaded by X) occurs in h(ρ) at position p[i]. A universal variable is protected if it is attacked by no variable. Linearity. Ontology Σ belongs to linear if, for each ρ Σ, b(ρ) contains at most one body atom. Acyclicity. The labeled graph of Σ is G(Σ) = N, A , where: (1) N = p R(Σ)pos(p); (2) (p[i], r[j], ) A if there are ρ Σ and X UV(ρ) s.t. X occurs both in b(ρ) at position p[i] and in h(ρ) at position r[j]; and (3) (p[i], r[j], ) A if there are ρ Σ, X UV(ρ) also occurring in h(ρ), and Y EV(ρ) s.t. both X occurs in b(ρ) at position p[i] and Y occurs in h(ρ) at position r[j]. The existential graph of Σ is G (Σ) = N, A , where N = ρ ΣEV(ρ) and (X, Y ) A if the rule ρ where Y occurs contains a universal variable attacked by X and occurring in h(ρ). Σ belongs to weakly-acyclic (resp., j-acyclic) if G(Σ) (resp., G (Σ)) has no cycle through an -arc (resp., is acyclic). Guardedness. Σ belongs to guarded if ρ Σ implies that there is α b(ρ) s.t. UV(ρ) = vars(α). Also, Σ belongs to w-guarded if, for each ρ Σ, there is an atom of b(ρ) containing all the attacked variables of ρ. Stickiness. A variable X of Σ is marked if (1) there is ρ Σ s.t. X occurs in b(ρ) but not in h(ρ); or (2) there are ρ, ρ Σ s.t. a marked variable occurs in b(ρ) at some position p[i] and X occurs in h(ρ ) at position p[i] too. Then, Σ is sticky if, for each ρ Σ, X occurs multiple times in b(ρ) implies X is not marked. Also, Σ belongs to w-sticky if, for each ρ Σ, X occurs multiple times in b(ρ) implies X is not marked or X occurs in some position never involved in cycles going through an -arc of G(Σ). Shyness. Σ belongs to shy if, for each ρ Σ: (1) X occurs in two different atoms of b(ρ) implies X is protected; and (2) if X and Y occur both in h(ρ) and in two different atoms of b(ρ), then X and Y are not attacked by the same variable.

Proposition 1. The considered classes are pairwise uncomparable, except for: linear guarded w-guarded, linear shy, datalog shy, sticky w-sticky, and datalog w-acyclic j-acyclic.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

Class C Data complexity (LB) (UB) Combined complexity (LB) (UB)

linear H in AC0 C 2 linear PSPACE linear C A3 linear H+ PTIME datalog 4 C C 1 shy EXPTIME datalog 4 C C 1 shy A3

w-acyclic H PTIME w-acyclic C C 1 2 j-acyclic 2EXPTIME w-acyclic C C 1 j-acyclic w-acyclic H+ PTIME w-acyclic C C 1 j-acyclic 2EXPTIME w-acyclic C C 1 j-acyclic

guarded H PTIME guarded C C 1 2 guarded 2EXPTIME guarded C C 1 guarded guarded H+ EXPTIME w-guarded 4 C C 1 w-guarded 2EXPTIME guarded C C 1 w-guarded

sticky H in AC0 C 2 sticky EXPTIME sticky C A3 sticky H+ PTIME datalog 4 C C 1 w-sticky EXPTIME sticky C A3

shy H PTIME shy C C 1 2 shy EXPTIME shy C C 1 shy A3 shy H+ PTIME shy C C 1 shy EXPTIME shy C C 1 shy A3

Table 1: Computational Complexity of QEVAL, where LB and UB stand for lower and upper bound, respectively.

3.2 Decidable Hybrid Extensions Let B = {linear, w-acyclic, guarded, sticky, shy}. For each C B, we deﬁne the naive and reﬁned hybrid(-world) extension of C, respectively denoted by CH and CH+. Formally, for each Σ datalog ,H, Σ CH if std(Σ) C, while Σ CH+ if thin(Σ) C, where thin(Σ) is obtained from Σ by replacing each closed variable by some constant and then eliminating every atom containing only constants. For example, g(X, b Y), s(b Y) r(X) belongs to linear H+ but not to linear H since g(X, c) r(X) belongs to linear but the rule g(X, Y ), s(Y ) r(X) does not.

Proposition 2. For each C, C B, C CH CH+ holds, as well as C C implies both CH C H and CH+ C H+.

For the decidability analysis, we reduce QEVAL over datalog ,H to QEVAL over datalog . To this end, we devise the following algorithm, whose key principle is reminiscent of analogous methods from the literature [Motik et al., 2005]:

Algorithm 1. Reduction A1 from a hybrid triple D, Σ, q

Σ {p(Xp) Γ(Xp), p(Xp) : p R(Σ)}; Σ {std(b( ρ)), Γ(Vρ) std(h( ρ)), Γ(Cρ) : ρ Σ}; q std( q), Γ(Vq); return D, Σ Σ , q ;

Legend. q (resp., ρ) is obtained from q (resp., ρ) by replacing each predicate p with p; Γ(t1, ..., tn) = c(t1), ..., c(tn), for any n > 0; V = {V : b V vars( )}; Cρ= C terms(h(ρ)); Xp = X1, ..., X|p|; and {c, p} R(Σ) = .

E.g., from q= X b Y r(X, b Y) and Σ={r(b X, Y ) Z r(Y, Z)}, we get q = X Y r(X, Y ), c(Y ), and Σ ={r(X1, X2) c(X1), c(X2), r(X1, X2)} and Σ = { r(X, Y ), c(X) Z r(Y, Z)}. Let us now highlight the key properties of A1.

Lemma 1. A1 ensures ans(q, D, Σ) = ans(q , D, Σ Σ ), and it behaves as follows: (1) guarded H guarded; (2) guarded H+ w-guarded; (3) sticky H+ w-sticky; (4) w-acyclic H+ jointly-acyclic; and (5) shy H+ shy.

Proof Sketch. Via -rules, each p(t) D gives rise to a twin atom p(t), and its constants are collected under the predicate c. Via -rules, each predicate p is renamed in p, each variable b V is replaced by V , and the atom containing V is

paired with the atom c(V ). This way, known individuals can be separated from anonymous ones, and c-atoms can mimic the semantics of closed variables. Consider now the range of the reduction; due to space limits, we only consider cases (3) and (4). In case (3), let Σ sticky H+. Each rules ρ cannot violate stickiness as no repeated variable appears in b(ρ ). Now, let X be a variable occurring multiple times in std(b( ρ)), Γ(Vρ). We distinguish two cases: (i) X was a standard variable in b(ρ). Then, it also occurred multiple times in b(ρ), Hence, by deﬁnition of sticky H+, X was not marked in Σ; and by A1 it appears multiple times in std(b( ρ)) only. Hence, X is also not marked in Σ Σ . (ii) X was a closed variable in b(ρ). Then, it appears both in std(b( ρ)) and in Γ(Vρ). But position c[1] is never involved in cycles going through an -arc of G(Σ Σ ). Hence, Σ Σ wsticky. In case (4), let Σ w-acyclic H+. Note that, given an existential variable X appearing in Σ Σ (and so in Σ), for each i, p[i] and c[1] are not invaded by X. Assume that there is a loop in G (Σ Σ ). Hence, there is a -rule ρ s.t. X EV(ρ ), and Y UV(ρ ) is attacked by X. Now, Y cannot appear in c. Hence, it is a standard variable in b(ρ). Then, Y is attacked by X in Σ. Thus, Σ w-acyclic H+. An induction on the length of the cycle concludes the proof.

The next result follows immediately. Theorem 3. Let C B. Then, QEVAL for CH queries over CH and CH+ ontologies is decidable.

4 Computational Complexity We now study the combined and data complexity of QEVAL over our hybrid extensions. The former is calculated by considering everything as input, while the latter by considering ﬁxed both the query and the ontology. From our analysis, Theorem 4. All results in Table 1 do hold. Each entry C1 x C2 reads as follows: Algorithm x deﬁnes a reduction Ax from QEVAL over C1 to QEVAL over C2, according to Lemma x possibly combined with Propositions 1 and 2. In particular, if x {1, 4}, then Ax works in polynomial-time. Symbol means that the result admits alternative proofs. Each entry C1 C2 comes from Proposition 2. Each entry Ax means that the upper bound is explicitly given by Algorithm x. The rest of the section is the proof

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

of Theorem 4. To complete data complexity upper bounds of all naive extensions, consider the following algorithm:

Algorithm 2. Reduction A2 from a hybrid triple D, Σ, q

Σ {p(Xp) Γ(Xp), p(Xp), p[c|p|](Xp) : p R(Σ)}; Σ {std(b(ρω) h(ρω), h( ρ), Γ(Cρ)) : ρ Σ, ω Ωρ}; q std( q), Γ(Vq); return D, Σ Σ , q ;

Legend. c|p| is the tuple c, ..., c of symbols having length |p|; Ωρ collects all maps of the form ω : vars(ρ) {c, o} such that ω(x) = o if x EV(ρ), and ω(x) = c if x UV(ρ) Vc (symbols c and o stand for closed and open, respectively); ρω denotes the rule obtained from ρ by replacing each atom of the form p(X) with p[ω(X)](X); and the rest is as in A1.

Basically, A2 avoids Γ(Vρ) in rule bodies by encoding in predicates those positions where only known individuals may occur. E.g., from q = X b Y r(X, b Y) and Σ={ρ}, where ρ = r(b X, Y ) Z r(Y, Z), we get Ωρ = {ω1, ω2} s.t.: ω1(b Y) = ω2(b Y)=c, ω1(Z)=ω2(Z)=o, ω1(Y )=c, ω2(Y )=o. Hence, Σ {r(X1, X2) c(X1), c(X2), r(X1, X2), r[c,c](X1, X2)} Σ {r[c,c](X, Y ) Z r[c,o](Y, Z), r(Y, Z); r[c,o](X, Y ) Z r[o,o](Y, Z), r(Y, Z)} q X Y r(X, Y ), c(Y ) By considering any universal model U of D Σ Σ i.e., a representative model of any other [Cal ı et al., 2013] subscripts guarantee that whenever there is a substitution µ that maps both the body and the head of a -rule ρω to U, then µ(X) terms(D) iff ω(X) = c. Then, Lemma 2. A2 ensures ans(q, D, Σ) = ans(q , D, Σ Σ ). In particular, it behaves as follows: CH C for each C B. Although exponential (each rule ρ admits 2|UV(ρ) Vs| different maps), when combined with Lemma 2, reduction A2 gives us the desired bounds. To complete with upper bounds, we design the following algorithm:

Algorithm 3. Alternating decision procedure A3

Input: Hybrid-world triple D, Σ, q where Σ is in normal form terms(D, Σ) C; / known / k (1 + |vars(q)|) maxp R(Σ) |p|; I {a1, ..., ak} C such that I = ; / anonymous / guess a -substitution µ : vars(q) I Q µ(atoms(q)) and Iq terms(Q) I for each a Iq do / guess atom αa introducing each a / guess αa {p(t, a) : p R(Σ), t ( I)|p| 1} for each α Q universally do / prove each atom α / if α D then accept else guess ρ Σ and a -substitution µ : vars(ρ) { I} if µ is not compatible with α then reject else Q µ(b(ρ)) and goto step

Legend. Σ is in normal form if, for each ρ Σ, |h(ρ)| = 1, |EV(ρ)| 1, and |EV(ρ)| = 1 implies the existential variable is in the last position; µ is not compatible with α if one of the following occurs: µ(h(ρ)) = α; or X EV(ρ), µ(X) Iq, and α = αµ(X); or µ maps some non-frontier variable into Iq.

It is a resolution-based algorithm, generally working in alternating polynomial space, hence in exponential time.

Lemma 3. If Σ is sticky H+ or shy H+, then A3 is correct and it runs in EXPTIME. If Σ linear H, then A3 runs in PSPACE.

Proof Sketch. A3 proves the query q by exploring a small (at most exponential) portion of some universal model of D Σ. In case of linear rules, the algorithm works in nondeterministic polynomial space as step is universal only once, namely at the very beginning when Q contains the image µ(atoms(q)) of q.

We close the section by providing missing lower bounds:

Algorithm 4. Reduction A4 from a standard triple D, Σ, q

Vp protected Vars(Σ); Σ {cls(ρ, Vp) : ρ Σ}; return D, Σ , q ;

Legend. Vp collects all protected standard universal variables of Σ and cls(ρ, Vp) replaces each variable X Vp by the closed one b X.

Lemma 4. A3 ensures ans(q, D, Σ) = ans(q, D, Σ ). In particular, it behaves as follows: 1) datalog CH+ for each C B; and 2) w-guarded guarded H+.

Proof Sketch. Equality of certain answers follows by the fact that protected variables implicitly behave as closed ones. (1) Let Σ datalog. Then, each variable appearing in Σ is protected. Hence, each rule in Σ has closed variables only. Thus, Σ belongs to each reﬁned extension, as the syntactic conditions are enforced over standard variables only. (2) Let Σ w-guarded. Let ρ Σ, and ρ be the corresponding rule in Σ . Then, by deﬁnition of w-guarded, there is an atom in ρ that covers all the non-protected universal variables appearing in b(ρ). Hence, the corresponding atom in ρ covers all the standard universal variables appearing in b(ρ ), as each protected variable is replaced by a closed one. Thus, Σ guarded H+.

5 Expressive Power

We now investigate the expressiveness of datalog ,H. After showing that there are simple hybrid ontologies that cannot be expressed by any datalog one under model equivalence, we consider the classical notion of program expressive power [Arenas et al., 2014], also known as query inseparability, which relies on answer equivalence and turns out to be more appropriate for OBQA purposes. However, also in this case we can show that datalog ,H is strictly more expressive than datalog . In particular, for both extensions of each basic datalog class, we prove a strict increase in expressiveness. We close the section by showing that linear H+ is strictly more expressive than the Description Logic ELH [Brandt, 2004b].

5.1 datalog ,H versus datalog

Two ontologies Σ1 and Σ2 are model-equivalent (ME), shortly Σ1 Σ2, if mods(D, Σ1) = mods(D, Σ2), for each database D. Accordingly, a class C2 of ontologies is strictly more expressive (under ME) than C1, denoted by C1 < C2, if (M1) for each Σ1 C1 there is Σ2 C2 s.t. Σ1 Σ2, and (M2) for some Σ2 C2 there is no Σ1 C1 s.t. Σ1 Σ2.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

Theorem 5. It holds that: (i) datalog < datalog ,H, and (ii) both C < CH and C < CH+, for each C B.

Proof. Consider the ontology Σ = {p(b X) r(b X)}. We proceed by contradiction. Assume Σ admits a model-equivalent datalog ontology Σ . Let D = . According to Section 2, M1 = {p(1)} is a model of D Σ as the interpretation domain of the closed variables is empty. Hence, M1 is a model of D Σ . Let D1 = {p(1)}. In this case, M1 = {p(1)} is not a model of D1 Σ as r(1) is required. Thus M1 is not a model of D1 Σ . But this is not possible for classical ﬁrst-order theories. In fact, M1 D1. Hence, if M1 is not a model of D1 Σ the only reason is that there exists some rule ρ Σ that is not satisﬁed. But since M1 D also holds, this means that M1 cannot be a model of D Σ as the same rule ρ would be unsatisﬁed. Hence, (i) follows since datalog datalog ,H, while (ii) holds since, for each C B, C CH CH+ by Proposition 2, and Σ CH.

We now consider a smoother notion of expressiveness. Two ontologies Σ1 and Σ2 are answer-equivalent (AE), shortly Σ1 = Σ2, if ans(q, D, Σ1) = ans(q, D, Σ2), for each database D and for each query q. Hence, if two ontologies are model-equivalent, then they are also answer-equivalent (i.e., Σ1 Σ2 implies Σ1 = Σ2, for each Σ1 and Σ2). Similarly, a class C2 of ontologies is strictly more expressive (under AE) than C1, denoted by C1 C2, if (A1) for each Σ1 C1 there is Σ2 C2 s.t. Σ1 = Σ2, and (A2) for some Σ2 C2 there is no Σ1 C1 s.t. Σ1 = Σ2. Note that, if C1 < C2 (resp., C1 C2), then condition (A1) (resp., (M2)) is guaranteed. By Lemma 4, ontology Σ = {p(b X) r(b X)} in the proof of Theorem 5 admits an answer-equivalent datalog ontology. Indeed, it is the output of reduction A3 when it takes the ontology std(Σ) = {p(X) r(X)} as input. Hence, to prove the next result, we need a stronger argument.

Theorem 6. It holds that: (i) datalog datalog ,H, and (ii) C CH, C CH+, and also datalog CH+, for each basic class C B.

Proof. Consider Σh = {p(b X) r(b X)} { Y p(Y )}. We proceed by contradiction. Assume Σh admits an answerequivalent datalog ontology Σ h. Let q1 = X p(X), r(X) and Dc = {p(c)}, for each constant c C. According to Section 2, ans(q1, Dc, Σh) = { }, and therefore also ans(q1, Dc, Σ h) = { }. Since Σ is a standard ﬁrst-order theory, this means that Dc Σ h |= q1, and therefore that Σ h |= {p(c) q1}, or equivalently mods(Σ h) mods({p(c) q1}). Hence, we have that mods(Σ h) T c C mods({p(c) q1}). But the common models are exactly those of φ1 = Y p(Y ) q1. Therefore, mods(Σ h) mods(φ1). Let q2 = Xp(X) and D = . Clearly, ans(q2, D , Σh) = { }, and therefore also ans(q2, D , Σ h) = . But this means that Σ h |= q2, or equivalently mods(Σ h) mods(q2). By combining the above results, we have mods(Σ h) mods(φ1) mods(q2). But the common models are exactly those of q1. This means that mods(Σ h) mods(q1), from which we get Σ h |= q1, implying that ans(q1, D , Σ h) = { }. But this is not possible since ans(q1, D , Σh) = . Hence, (i) follows

since datalog datalog ,H, while (ii) holds since, for each C B, C CH CH+ by Proposition 2, and Σh CH, and since datalog CH+ holds by Lemma 4.

5.2 linear H+ versus ELH

We now show that linear H+ is strictly more expressive than ELH [Brandt, 2004a; 2004b], even if we focus on linear H+ ontologies with bounded-rules (namely, both arities and number af atoms of each rule are bounded by some integer constant), in which case the combined complexity of QEVAL drops to NP as the complexity of QEVAL for C queries over ELH. (Note that ELH is not expressible in linear H.) In particular, we provide a polynomial time transformation that maps ELH ontologies into answer-equivalent linear H+ ones. This also shows that ELH is no more succinct than linear H+. In DLs, rules are called inclusions, which in ELH are of the form: C D; C D E; R S; C R.D; R.D C; where C, D, E are concepts, and R, S are roles. According to the semantics of DLs, they are modelequivalent (hence answer-equivalent) to the following existential rules [Baader et al., 2003], respectively: (i) C(X) D(X); (ii) C(X), D(X) E(X); (iii) R(X, Y ) S(X, Y ); (iv) C(X) Y R(X, Y ), D(Y ) (v) R(X, Y ), D(Y ) C(X). Only rules of the form (i), (iii), and (iv) are linear. To obtain a linear H+ ontology answer-equivalent to an ELH one, a possible way is to close join variables in the body of non-linear rules, i.e., of the form (ii) and (v). This would preserve soundness, but not necessarily completeness. Hence, to guarantee answer equivalence, one should complement such (hybrid) rules with new linear ones that bypass propagations inhibited by closed variables. Formally,

Theorem 7. Under answer-equivalence, linear H+ with bounded-rules is strictly more expressive than ELH. In particular, for each ELH ontology, an equivalent linear H+ one of quadratic size can be constructed in polynomial time.

Proof Sketch. Given an ELH ontology Σ in datalog form, we construct a datalog ,H ontology Σ as follows: (0) Let Σ = ; (1) Add to Σ each rule of Σ of the form (i), (iii) or (iv); (2) For each rule of Σ of the form (ii) (resp., (v)), add to Σ the hybrid rule C(b X), D(b X) E(b X) (resp., R(X, b Y), D(b Y) C(X)); (3) For each pair (B, A) of unary predicates (i.e., concepts) occurring in Σ, add to Σ the standard bypass rule B(X) A(X), provided that Σ logically entails the rule B(X) A(X), namely whether Σ |= B A (B is subsumed by A in Σ) in DLs terminology. By construction, Σ is linear H+. Also, the addition of bypass rules makes Σ answer-equivalent to Σ (they share the same universal models). This completes our reduction, which works in polynomial time, since it is known that also concept subsumption in ELH can be performed in polynomial time [Brandt, 2004a]. Regarding the size of Σ , it sufﬁces to observe that |Σ | = |Σ| at the end of step (2), and also that the number of rules added at step (3) are at most quadratic in the number of concepts occurring in Σ. To conclude our proof, we consider the linear H+ ontology Σ = {p(b X), s(b Y) g(b X, b Y)}. It is well-known that Σ cannot be expressed in ELH, as it deﬁnes the so-called cross-product, namely p s g.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

6 Related Work

Interest in reconciling openand closed-world semantics has a long history [Cadoli et al., 1990]. Since then, different paradigms have been proposed: epistemic and modal operators [Donini et al., 1992; Calvanese et al., 2007], hybrid knowledge bases [Motik et al., 2005; Rosati, 2005; 2006; Eiter et al., 2008; Kr otzsch et al., 2008; Motik and Rosati, 2010; Knorr et al., 2011; Libkin and Sirangelo, 2011; Bajraktari et al., 2017], closed predicates [Seylan et al., 2009; Lutz et al., 2013; 2015; Ngo et al., 2016], and nominal schemas [Kr otzsch et al., 2011; Kr otzsch and Rudolph, 2014]. In case of monotonic Horn DLs, modal operator K behaves as closed variables. Indeed, axiom KC D is answer-equivalent to the rule C(b X) D(b X). Hybrid KBs typically combine DLs and rule-based formalisms by enforcing syntactic safety condition, while closed predicates are those whose extensions have to be interpreted as complete. So, they are less related to our framework, although we borrowed from them some useful tool, as said in Section 3. Nominal schemas, instead, represent the proposal which is closest to ours. Intuitively, a nominal variable {z} is represents some arbitrary nominal (i.e., known individual). When occurring in the left-hand side of a concept inclusion, {z} behaves as the closed variable b Z. Indeed, axiom has Parent.{z} has Parent. married.{z} C is model-equivalent to has Parent(X, b Z), has Parent(X, Y ), married(Y, b Z) C(X). But in DLs, {z} may also be existentially quantiﬁed to mimic disjunction among nominals. Concerning expressiveness, different notions have been also considered in the literature. In [Gottlob et al., 2014], Σ1 and Σ2 are gr-equivalent if ans(q, D, Σ1) = ans(q, D, Σ2), for each database D and query q G, where G collects ground queries, i.e., all variable-free atoms. Under this notion guarded gr datalog, and datalog ,H is no more expressive than datalog (the latter obtained via a minor modiﬁcation of Algorithm 1). Indeed, closed variables do not increase the so-called query expressivity [Rudolph and Thomazo, 2015], deﬁned by ﬁxing a special predicate goal as the only possible ground query. In [Gottlob et al., 2018], (Σ1, q1) and (Σ2, q2) are re-equivalent if ans(q1, D, Σ1) = ans(q2, D, Σ2), for each database D. Then, guarded, C re datalog, G and also sticky, C re , UC , where UC is the class of union of conjunctive queries. Differently from program expressive power, however, these notions are more suitable to compare ontology formalisms from a computational viewpoint rather than from a knowledge representation one. Indeed, reequivalence coincides with the so-called query rewritability.

7 Future Work and Conclusion

In conclusion, closed variables represent a very natural, ﬂexible and effective extension of standard existential rules. In the future, we would like to investigate whether our naive or reﬁned extensions can express other ontology languages, as well as to close a question that has been left (partially) open in Theorems 5 and 6 concerning the expressivity of CH vs. CH+ by varying C. Indeed, so far, what is known is a strict increase in expressivity in the two cases exhibiting a jump in data complexity from AC0 to PTIME, namely when

C {linear, sticky}. Also, it would be reasonable to extend the computational analysis to other known classes. Interestingly, concerning the latter point, while moving to decidable abstract (i.e., not recognizable) classes of rules [Baget et al., 2011], such as fes generalizing w-acyclic, we realized that there are ontologies in fes H that are not mapped to fes via Algorithm 1; hence, a separate approach is needed here. Also, one could study the impact of stratiﬁed negation in rules and queries, for reasoning even on the anonymity of individuals. As for nominal schemas in DLs, existentially quantiﬁed closed variables can be certainly considered to mimic some form of disjunction. Finally, implementing closed variables in some existing datalog system as well as testing performances on real-world ontologies are also tasks in our agenda.

Acknowledgments The paper has been partially supported by the MISE under project S2BDW (n. F/050389/01-03/X32) - PON I&C 2014-20, and by Regione Calabria under project DLV Large Scale (CUP J28C17000220006) - POR Calabria 2014-20.

References [Alviano and Pieris, 2015] Mario Alviano and Andreas Pieris. Default negation for non-guarded existential rules. In Proc of PODS, pages 79 90, 2015. [Amendola et al., 2017] Giovanni Amendola, Nicola Leone, and Marco Manna. Finite model reasoning over existential rules. TPLP, 17(5-6):726 743, 2017. [Arenas et al., 2014] Marcelo Arenas, Georg Gottlob, and Andreas Pieris. Expressive languages for querying the semantic web. In Proc. of PODS, pages 14 26, 2014. [Baader et al., 2003] Franz Baader, Diego Calvanese, Deborah L. Mc Guinness, Daniele Nardi, and Peter F. Patel Schneider, editors. The description logic handbook. 2003. [Baget et al., 2011] Jean-Franc ois Baget, Michel Lecl ere, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. AIJ, 175(910):1620 1654, 2011. [Bajraktari et al., 2017] Labinot Bajraktari, Magdalena Ortiz, and Mantas Simkus. Clopen knowledge bases: Combining description logics and answer set programming. In Proc. of DL, 2017. [B ar any et al., 2014] Vince B ar any, Georg Gottlob, and Martin Otto. Querying the guarded fragment. LMCS, 10(2), 2014. [Beeri and Vardi, 1984] Catriel Beeri and Moshe Y. Vardi. A proof procedure for data dependencies. J. ACM, 31(4):718 741, 1984. [Bourhis et al., 2016] Pierre Bourhis, Marco Manna, Michael Morak, and Andreas Pieris. Guarded-based disjunctive tuple-generating dependencies. ACM TODS, 41(4):27:1 27:45, 2016. [Brandt, 2004a] Sebastian Brandt. On subsumption and instance problem in ELH w.r.t. general tboxes. In Proc. of DL, 2004.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

[Brandt, 2004b] Sebastian Brandt. Polynomial time reasoning in a description logic with existential restrictions, GCI axioms, and - what else? In Proc. of ECAI, pages 298 302, 2004.

[Cadoli et al., 1990] Marco Cadoli, Francesco M. Donini, and Marco Schaerf. Closed world reasoning in hybrid systems. In Proc. of ISMIS, pages 474 481, 1990.

[Cal ı et al., 2009] Andrea Cal ı, Georg Gottlob, and Thomas Lukasiewicz. Datalog : a uniﬁed approach to ontologies and integrity constraints. In Proc. of ICDT, pages 14 30, 2009.

[Cal ı et al., 2012a] Andrea Cal ı, Georg Gottlob, and Thomas Lukasiewicz. A general datalog-based framework for tractable query answering over ontologies. JWS, 14:57 83, 2012.

[Cal ı et al., 2012b] Andrea Cal ı, Georg Gottlob, and Andreas Pieris. Towards more expressive ontology languages: The query answering problem. AIJ, 193:87 128, 2012.

[Cal ı et al., 2013] Andrea Cal ı, Georg Gottlob, and Michael Kifer. Taming the inﬁnite chase: Query answering under expressive relational constraints. JAIR, 48:115 174, 2013.

[Calvanese et al., 2007] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Eql-lite: Effective ﬁrst-order query processing in description logics. In Proc. of IJCAI, pages 274 279, 2007.

[Ceri et al., 1989] Stefano Ceri, Georg Gottlob, and Letizia Tanca. What you always wanted to know about datalog (and never dared to ask). TKDE, 1(1):146 166, 1989.

[Donini et al., 1992] Francesco M. Donini, Maurizio Lenzerini, Daniele Nardi, Andrea Schaerf, and Werner Nutt. Adding epistemic operators to concept languages. In Proc. of KR, pages 342 353, 1992.

[Eiter et al., 2008] Thomas Eiter, Giovambattista Ianni, Thomas Lukasiewicz, Roman Schindlauer, and Hans Tompits. Combining answer set programming with description logics for the semantic web. AIJ, 172(12-13):1495 1539, 2008.

[Fagin et al., 2005] Ronald Fagin, Phokion G. Kolaitis, Ren ee J. Miller, and Lucian Popa. Data exchange: semantics and query answering. TCS, 336(1):89 124, 2005.

[Gottlob et al., 2014] Georg Gottlob, Sebastian Rudolph, and Mantas Simkus. Expressiveness of guarded existential rule languages. In Proc. of PODS, pages 27 38, 2014.

[Gottlob et al., 2018] Georg Gottlob, Andreas Pieris, and Mantas Simkus. The impact of active domain predicates on guarded existential rules. Fundam. Inform., 159(12):123 146, 2018.

[Knorr et al., 2011] Matthias Knorr, Jos e J ulio Alferes, and Pascal Hitzler. Local closed world reasoning with description logics under the well-founded semantics. AIJ, 175(910):1528 1554, 2011.

[Kr otzsch and Rudolph, 2011] Markus Kr otzsch and Sebastian Rudolph. Extending decidable existential rules by joining acyclicity and guardedness. In Proc. of IJCAI, pages 963 968, 2011. [Kr otzsch and Rudolph, 2014] Markus Kr otzsch and Sebastian Rudolph. Nominal schemas in description logics: complexities clariﬁed. In Proc. of KR, pages 308 317, 2014. [Kr otzsch et al., 2008] Markus Kr otzsch, Sebastian Rudolph, and Pascal Hitzler. ELP: tractable rules for OWL 2. In Proc. of ISWC, pages 649 664, 2008. [Kr otzsch et al., 2011] Markus Kr otzsch, Frederick Maier, Adila Krisnadhi, and Pascal Hitzler. A better uncle for OWL: nominal schemas for integrating rules and ontologies. In Proc. of WWW, pages 645 654, 2011. [Leone et al., 2012] Nicola Leone, Marco Manna, Giorgio Terracina, and Pierfrancesco Veltri. Efﬁciently computable Datalog programs. In Proc. of KR, pages 13 23, 2012. [Libkin and Sirangelo, 2011] Leonid Libkin and Cristina Sirangelo. Data exchange and schema mappings in open and closed worlds. JCSS, 77(3):542 571, 2011. [Lutz et al., 2013] Carsten Lutz, Inanc Seylan, and Frank Wolter. Ontology-based data access with closed predicates is inherently intractable(sometimes). In Proc. of IJCAI, pages 1024 1030, 2013. [Lutz et al., 2015] Carsten Lutz, Inanc Seylan, and Frank Wolter. Ontology-mediated queries with closed predicates. In Proc. of IJCAI, pages 3120 3126, 2015. [Motik and Rosati, 2010] Boris Motik and Riccardo Rosati. Reconciling description logics and rules. J. ACM, 57(5):30:1 30:62, 2010. [Motik et al., 2005] Boris Motik, Ulrike Sattler, and Rudi Studer. Query answering for OWL-DL with rules. JWS, 3(1):41 60, 2005. [Ngo et al., 2016] Nhung Ngo, Magdalena Ortiz, and Mantas Simkus. Closed predicates in description logics: Results on combined complexity. In Proc. of KR, pages 237 246, 2016. [Ortiz, 2013] Magdalena Ortiz. Ontology based query answering: The story so far. In Proc. of AMW, 2013. [Rosati, 2005] Riccardo Rosati. On the decidability and complexity of integrating ontologies and rules. JWS, 3(1):61 73, 2005. [Rosati, 2006] Riccardo Rosati. Dl+log: Tight integration of description logics and disjunctive datalog. In Proc. of KR, pages 68 78, 2006. [Rudolph and Thomazo, 2015] Sebastian Rudolph and Micha el Thomazo. Characterization of the expressivity of existential rule queries. In Proc. of IJCAI, pages 3193 3199, 2015. [Seylan et al., 2009] Inanc Seylan, Enrico Franconi, and Jos de Bruijn. Effective query rewriting with ontologies over dboxes. In Proc. of IJCAI, pages 923 925, 2009.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)