
Model-theoretic Characterizations of Existential Rule Languages

Heng Zhang¹, Yan Zhang²,⁴ and Guifei Jiang³

¹College of Intelligence and Computing, Tianjin University, China
²School of Computer, Data and Mathematical Sciences, Western Sydney University, Australia
³College of Software, Nankai University, China
⁴School of Computer Science & Technology, Huazhong University of Science & Technology, China
heng.zhang@tju.edu.cn, yan.zhang@westernsydney.edu.au, g.jiang@nankai.edu.cn

Existential rules, also known as dependencies in databases and, more recently, as Datalog+/- in knowledge representation and reasoning, are a family of important logical languages widely used in computer science and artificial intelligence. Towards a deep understanding of these languages in model theory, we establish model-theoretic characterizations for a number of existential rule languages such as (disjunctive) embedded dependencies, tuple-generating dependencies (TGDs), (frontier-)guarded TGDs and linear TGDs. All these characterizations hold for the class of arbitrary structures, and most of them also work on the class of finite structures. As a natural application of these results, complexity bounds for the rewritability of the above languages are also identified.

1 Introduction

Existential rule languages, a family of languages that extend Datalog by allowing existential quantifiers in the rule head, were initially introduced in databases in the 1970s to specify the semantics of data stored in a database [Abiteboul et al., 1995]. Since then, existential rule languages such as tuple-generating dependencies (TGDs), embedded dependencies and equality-generating dependencies have been extensively studied. These languages have recently been rediscovered as languages for data exchange [Fagin et al., 2005], data integration [Lenzerini, 2002] and ontology-mediated query answering [Calì et al., 2010]. Towards tractable reasoning, many restricted classes of these languages have been proposed, including linear and guarded TGDs [Calì et al., 2012], as well as frontier-guarded TGDs [Baget et al., 2011]. Although these languages form an important family of logical languages, their model theory has not been fully investigated yet. In this work we aim at characterizing existential rule languages by a model-theoretic approach. Model-theoretic characterizations, which assert that a sentence in a language is definable in another language if, and only if, it enjoys some semantic properties, play a key role in the study of logic [Chang and Keisler, 1992]. We are interested in semantic properties that are simple and manageable. Model-theoretic characterizations based on such properties

Corresponding author.

thus provide a natural tool for identifying the expressibility of a language, i.e., determining which knowledge or ontology can be expressed in the language. Besides their major position in model theory and their key role in understanding expressiveness, model-theoretic characterizations also have many potential implications. For example, they provide a natural way of developing algorithms to identify language rewritability, i.e., to decide whether a given theory or ontology can be rewritten in a simpler language. Such algorithms may play important roles in implementing systems for ontology-mediated query answering. Moreover, we are also interested in understanding why the guarded-based restrictions make existential rule languages tractable. We hope our characterizations give an alternative explanation of this question, which may provide new insight for discovering new tractable languages.

Model-theoretic characterizations over the class of finite structures for full TGDs (i.e., TGDs without existential quantifiers) and equality-generating dependencies were studied in [Makowsky and Vardi, 1986]; they are established by involving infinite sets of dependencies. To remedy the finite expressibility, some conditions have been proposed, including Hull's finite-rank notion [1984] and Makowsky and Vardi's locality [1986]. Yet neither of them is very natural. Over finite structures, even for full TGDs, a natural model-theoretic characterization remains open [ten Cate and Kolaitis, 2014]. For arbitrary structures, except for some simple classes of dependencies such as full TGDs and negative constraints, to the best of our knowledge, no model-theoretic characterization is known for expressive existential rule languages such as TGDs and their guarded-based restrictions.

In this work, we characterize existential rule languages by some natural semantic properties. The addressed languages consist of (disjunctive) embedded dependencies, TGDs, and several important restricted classes of TGDs such as frontier-guarded TGDs, guarded TGDs and linear TGDs, three of the main languages for ontology-mediated query answering [Calì et al., 2010]. All the semantic properties involved in our characterizations are algebraic relationships among structures, including variants of homomorphisms and unions, as well as direct products. Interestingly, except for the characterizations w.r.t. first-order logic, all other characterizations hold for both finite structures and arbitrary structures. As a natural application, we also use the obtained characterizations to identify the

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

complexity of rewritability among the above languages. For proof details please refer to a long version of this paper, which is available at https://arxiv.org/abs/2001.08688.

2 Preliminaries

2.1 Notations and Conventions

All signatures involved in this paper are relational, consisting of a set of constant symbols and a set of relation symbols, each of which is associated with a natural number, its arity. Each term is either a variable or a constant symbol. Given a signature τ, atomic formulas, (first-order) formulas and sentences over τ are defined as usual. An atomic formula is relational if it is of the form R(t̄) where R is a relation symbol other than the equality symbol =. Given a formula ϕ, we write ϕ(x̄) if every free variable of ϕ appears in x̄.

Fix τ as a signature. Every structure A over τ (or simply τ-structure) consists of a nonempty set A called its domain, a relation R^A ⊆ A^n for each n-ary relation symbol R ∈ τ, and a constant c^A ∈ A for each constant symbol c ∈ τ. A structure is finite if its domain is finite, and infinite otherwise. Let A be a τ-structure, and X a subset of A such that c^A ∈ X for all constant symbols c ∈ τ. The substructure of A induced by X, denoted A|X, is the τ-structure with domain X which interprets each relation symbol R ∈ τ as R^A|X (the restriction of R^A to X), and interprets each constant symbol c ∈ τ as c^A. A structure B is called a substructure of A, or equivalently, A is called an extension of B, if B = A|X for some set X ⊆ A. Let ν be a signature such that τ ⊆ ν. A ν-structure B is called a ν-expansion of A if they have the same domain and share the same interpretation on every symbol in τ. Given a1, ..., ak ∈ A, by (A, a1, ..., ak) we denote the expansion of A that interprets k fresh constant symbols as a1, ..., ak, respectively.

Let A and B be τ-structures. If A and B have the same interpretations on constant symbols, then let A ∪ B denote the union of A and B, which is a τ-structure with domain A ∪ B, interpreting R as R^A ∪ R^B for each relation symbol R ∈ τ, and interpreting c as c^A for each constant symbol c ∈ τ. We say A is homomorphic to B, written A → B, if there is a function h : A → B such that (i) h(c^A) = c^B for all constant symbols c ∈ τ, and (ii) h(R^A) ⊆ R^B for all relation symbols R ∈ τ. We write A ↔ B if both A → B and B → A hold.

Let A be a structure. An assignment in A is a function from a set of variables to A. Given a tuple ā of constants in A and a tuple x̄ of variables of the same length, we let ā/x̄ denote the assignment that maps the i-th component of x̄ to the i-th component of ā for 1 ≤ i ≤ |x̄|, where |x̄| denotes the length of x̄. Let s be an assignment in A and ϕ(x̄) be a first-order formula. By A |= ϕ[s] we mean that ϕ is satisfied by s in A. In particular, if ϕ is a sentence, we simply write A |= ϕ, and say ϕ is satisfied in A, or equivalently, A is a model of ϕ. If the assignment ā/x̄ is clear from the context, we simply use ϕ[ā] to denote ϕ[ā/x̄]. Let Σ be a set of sentences. Then A is a model of Σ if A |= ϕ for all ϕ ∈ Σ. Given a sentence ψ, we write Σ ⊨ ψ (resp., Σ ⊨fin ψ) if every model (resp., finite model) of Σ is also a model of ψ.
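For finite structures, the homomorphism condition above can be checked by brute force. The following Python sketch is only an illustration under stated assumptions: a structure is encoded as a dictionary with keys `dom`, `const` and `rel` (an encoding chosen here, not taken from the paper), and the names `is_homomorphism` and `find_homomorphism` are hypothetical.

```python
from itertools import product

def is_homomorphism(h, A, B):
    """Check conditions (i) and (ii): h maps constants to constants and
    sends every tuple of each relation of A into the same relation of B."""
    consts_ok = all(h[A["const"][c]] == B["const"][c] for c in A["const"])
    rels_ok = all(
        tuple(h[x] for x in t) in B["rel"][R]
        for R, tuples in A["rel"].items()
        for t in tuples
    )
    return consts_ok and rels_ok

def find_homomorphism(A, B):
    """Try every function from dom(A) to dom(B); exponential, finite structures only."""
    dom_a = sorted(A["dom"])
    for image in product(sorted(B["dom"]), repeat=len(dom_a)):
        h = dict(zip(dom_a, image))
        if is_homomorphism(h, A, B):
            return h
    return None

# Illustrative structures (assumed encoding): one directed edge, and one loop.
edge = {"dom": {"a", "b"}, "const": {}, "rel": {"E": {("a", "b")}}}
loop = {"dom": {"u"}, "const": {}, "rel": {"E": {("u", "u")}}}
```

Here `find_homomorphism(edge, loop)` finds the map collapsing both elements onto the loop, while no homomorphism exists in the other direction, so the two structures are not homomorphically equivalent.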

2.2 Existential Rule Languages

A generalized dependency (GD) is a sentence σ of the form

$$\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow \exists \bar{y}\,(\psi_1(\bar{x},\bar{y}) \vee \cdots \vee \psi_n(\bar{x},\bar{y}))\bigr) \qquad (1)$$

where n ≥ 0, and φ, ψ1, ..., ψn are conjunctions of atomic formulas. The left-hand (resp., right-hand) side of the implication is called the body (resp., head). Variables among x̄ and ȳ are called universal and existential, respectively. A frontier variable is a universal variable that occurs in the head. In particular, σ is called nondisjunctive if n ≤ 1, and called a negative constraint if n = 0. In the latter case, we write σ as

$$\forall \bar{x}\,(\varphi(\bar{x}) \rightarrow \bot). \qquad (2)$$

For simplicity, we will omit the universal quantifiers and the brackets appearing outside the atoms if no confusion occurs. Furthermore, a GD σ is called safe if every frontier variable of σ has at least one occurrence in some relational atomic formula in the body of σ. A disjunctive embedded dependency (DED) is a safe generalized dependency that is not a negative constraint. An embedded dependency (ED) is a nondisjunctive DED. In addition, an ED is called a tuple-generating dependency (TGD) if it is equality-free. We will also address several important classes of restricted TGDs. A TGD σ is called frontier-guarded (resp., guarded) if there is a relational atomic formula α in its body that contains all the frontier (resp., universal) variables of σ. In either case, α is called the guard of σ. Moreover, σ is linear if the body of σ consists of exactly one conjunct. Note that all linear TGDs are guarded and all guarded TGDs are frontier-guarded.
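The syntactic classes just defined can be recognized directly from the shape of a rule. The sketch below is a minimal illustration, assuming a constant-free TGD given as two lists of atoms `(relation, arguments)` whose arguments are all variable names; the function name `classify_tgd` and this encoding are hypothetical, not part of the paper.

```python
def classify_tgd(body, head):
    """Return the most restrictive class among linear, guarded and
    frontier-guarded that the TGD body -> head belongs to.
    Assumption: no constants, so every argument is a variable."""
    body_vars = {v for _, args in body for v in args}   # universal variables
    head_vars = {v for _, args in head for v in args}
    frontier = body_vars & head_vars                    # universal variables in the head

    def has_guard(required):
        # Some body atom contains all the required variables.
        return any(required <= set(args) for _, args in body)

    if len(body) == 1:
        return "linear"
    if has_guard(body_vars):
        return "guarded"
    if has_guard(frontier):
        return "frontier-guarded"
    return "unrestricted"

# R(x,y) and R(y,z) imply R(x,z): no body atom contains both x and z.
transitivity = ([("R", ("x", "y")), ("R", ("y", "z"))], [("R", ("x", "z"))])
# E(x,y) and E(y,z) imply C(y): the frontier {y} is covered by E(x,y).
middle = ([("E", ("x", "y")), ("E", ("y", "z"))], [("C", ("y",))])
# P(x) and Q(x) imply R(x): P(x) contains every universal variable.
both = ([("P", ("x",)), ("Q", ("x",))], [("R", ("x",))])
# P(x) implies R(x,y) with y existential: a single body atom.
successor = ([("P", ("x",))], [("R", ("x", "y"))])
```

Because the checks run from the most restrictive class to the least, the returned label reflects the inclusions noted above: every linear TGD is guarded, and every guarded TGD is frontier-guarded.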

3 Model-theoretic Characterizations

In this section, we address the model-theoretic characterizations of existential rule languages mentioned above.

3.1 Generalized Dependencies

We first give some notions. Let A and B be structures over a signature τ. By a tuple on A we mean a finite sequence of constants in A. We say that A is globally-homomorphic to B, written A ⇒ B, if there is a function π that maps each tuple ā on A to a tuple π(ā) on B such that (A, ā) → (B, π(ā)); in this case, we call π a global homomorphism from A to B, and call A a globally-homomorphic preimage of B. Given a first-order sentence ϕ over τ, we say that ϕ is preserved under globally-homomorphic preimages [in the finite] if for all [finite] τ-structures A and B, if A is globally-homomorphic to B and B is a model of ϕ, then A is also a model of ϕ. Notice that every sentence preserved under globally-homomorphic preimages is also preserved under globally-homomorphic preimages in the finite, but not vice versa. By a routine check, it is easy to prove the following:

Proposition 1. Every set of GDs is preserved under globally-homomorphic preimages [in the finite].

To establish the desired characterization, we hope that the preservation under globally-homomorphic preimages is not too powerful. The following is a very simple example which is slightly beyond the class of GDs but already not preserved under globally-homomorphic preimages in the ﬁnite.


Example 2. Let ψ denote ∃x ¬Q(x) and τ = {Q}. Let A be a τ-structure with A = {a, b} and Q^A = {a}. Let B be the substructure of A induced by {a}. Clearly, B is globally-homomorphic to A. It is also easy to see that A is a model of ψ (witnessed by b), but B is not, which implies that ψ is not preserved under globally-homomorphic preimages even in the finite.

The following theorem establishes the desired characterizations for the class of ﬁnite sets of GDs.

Theorem 3. A ﬁrst-order sentence is equivalent to a ﬁnite set of GDs iff it is preserved under globally-homomorphic preimages.

To prove this theorem, we need some notions and lemmas. Let A and B be structures over a signature τ. Given a class C of sentences over τ, we write A ⇛_C B if for all sentences ϕ ∈ C, A |= ϕ implies B |= ϕ. For simplicity, we drop the subscript C if C is the class of all first-order sentences over τ. We write A ≡ B if both A ⇛ B and B ⇛ A hold. We write Γ(x) to denote a set of formulas with exactly one free variable x. We say that Γ(x) is realized in a structure A if there is some a ∈ A such that A |= ϑ[a/x] for all formulas ϑ(x) ∈ Γ(x). By Th(A) we denote the class of all first-order sentences satisfied in A. We say that A is ω-saturated if for every finite set X ⊆ A, every set Γ(x) of formulas consistent with Th((A, a)_{a∈X}) is realized in (A, a)_{a∈X}. It is well known [Chang and Keisler, 1992] that for every structure A there is an ω-saturated structure B such that A ≡ B. Every existential-positive formula is a first-order formula built on atomic formulas and negated atomic formulas by using the connectives ∧, ∨ and the quantifier ∃. Let ∃⁺ denote the class of existential-positive sentences. It is easy to prove:

Lemma 4. Let A and B be structures over the same signature. Then both of the following are true:

1. If A → B then A ⇛_∃⁺ B.

2. If A ⇛_∃⁺ B and B is ω-saturated then A → B.

Let GD denote the class of finite sets of generalized dependencies. With Lemma 4, we are able to prove the following:

Lemma 5. Let A and B be ω-saturated structures over the same signature. If B ⇛_GD A then A ⇒ B.

Proof. Assume B ⇛_GD A. We need to prove A ⇒ B. By Lemma 4, it suffices to show that for each tuple ā on A there is a tuple π(ā) on B such that (B, π(ā)) ⇛_GD (A, ā). Note that, by Proposition 5.1.1 in [Chang and Keisler, 1992], (B, π(ā)) and (A, ā) are ω-saturated; so Lemma 4 is applicable. The desired statement can be proved by induction on the length of ā. It is trivial for the case where |ā| = 0. Assume as induction hypothesis that the desired statement holds for |ā| = k ≥ 0; we need to prove that it also holds for the case where |ā| = k + 1. Suppose ā = (ā0, a). By the induction hypothesis, there is a tuple b̄0 such that

$$(B, \bar{b}_0) \Rrightarrow_{\mathsf{GD}} (A, \bar{a}_0). \qquad (3)$$

Let Γ(x) be the class of existential-positive formulas and their negations such that (A, ā0) |= ϕ[a/x] for all ϕ(x) ∈ Γ(x). To prove the existence of a constant b ∈ B such that

$$(B, \bar{b}_0, b) \Rrightarrow_{\mathsf{GD}} (A, \bar{a}_0, a), \qquad (4)$$

by the ω-saturatedness of B, it suffices to show that every finite subset of Γ(x) is realized in (B, b̄0). Let Γ0(x) be any finite subset of Γ(x). Let ϕ(x) denote the conjunction of all formulas in Γ0(x), and let ψ = ∃xϕ(x). Clearly, ¬ψ is equivalent to a finite set of GDs and (A, ā0) |= ψ. If (B, b̄0) |= ¬ψ held, then by the inductive assumption (3) we would have (A, ā0) |= ¬ψ, a contradiction. Hence (B, b̄0) |= ψ, or equivalently, there exists a constant b′ ∈ B such that (B, b̄0) |= ϕ[b′/x]. Consequently, Γ0(x) is realized in (B, b̄0), which is as desired.

Now we are able to prove the desired theorem.

Proof of Theorem 3. (Only-if) By Proposition 1. (If) We assume that ϕ is a first-order sentence preserved under globally-homomorphic preimages. Let con(ϕ) denote the class of all GDs that are logical consequences of ϕ. We want to show that con(ϕ) is equivalent to ϕ, which implies the desired result by compactness. Let A be any model of con(ϕ). It suffices to show that A is also a model of ϕ. Let

$$\Sigma = \{\neg\gamma : \gamma \in \mathsf{GD} \text{ and } A \models \neg\gamma\}.$$

Now we prove the following property:

Claim. Σ ∪ {ϕ} is satisfiable.

Let Σ0 be an arbitrary finite subset of Σ. To show the claim, by compactness, it suffices to show that Σ0 ∪ {ϕ} is satisfiable. Towards a contradiction, assume that this is not the case. Suppose Σ0 = {¬γ1, ..., ¬γn}, and let ψ denote the formula γ1 ∨ ··· ∨ γn. Then we must have ϕ ⊨ ψ. It is not difficult to see that ψ is equivalent to a GD (by renaming the individual variables and lifting the universal quantifiers, and then by a routine transformation). Thus, A should be a model of ψ. This implies that there is some integer i with 1 ≤ i ≤ n such that A |= γi, which contradicts ¬γi ∈ Σ and the definition of Σ. So, we obtain the claim.

Applying the above claim, there is thus a model, say B, of Σ ∪ {ϕ}. Consequently, we have B ⇛_GD A. Let A+ and B+ be ω-saturated structures such that A ≡ A+ and B ≡ B+. Then B+ ⇛_GD A+ is clearly true, and B+ is a model of ϕ. By Lemma 5, A+ is then globally-homomorphic to B+. Since by assumption ϕ is preserved under globally-homomorphic preimages, A+ should be a model of ϕ. So, A is also a model of ϕ. This completes the proof of Theorem 3.

Note that the above argument only works on the class of arbitrary structures. Over finite structures, the characterization is in general not true, as shown by the following theorem.

Theorem 6. There is a ﬁrst-order sentence that is preserved under globally-homomorphic preimages in the ﬁnite but is not equivalent to any ﬁnite set of GDs over ﬁnite structures.

This can be proved by constructing an example, which can be obtained by a slight modification of Gurevich and Shelah's counterexample (see, e.g., Theorem 2.1.1 in [Rosen, 2002]).

3.2 Disjunctive Embedded Dependencies

According to the definition, DEDs are safe GDs that are not negative constraints. So, to characterize DEDs, we need some properties to assure the safeness and to avoid occurrences of negative constraints. To do the latter, we use a technique called trivial structure, which was used in [Makowsky and Vardi, 1986] to characterize full TGDs. We first recall the notion of trivial structure. A structure A is called trivial if the domain of A consists of exactly one element and every relation symbol in the signature is interpreted by A as the full relation of the proper arity on the domain. To capture the safeness of a DED, we propose a similar notion. A structure A is called sharp if all the following hold:

- the domain of A consists of exactly two distinct constants, say a and b;

- for each constant symbol c in the signature, c^A = a;

- for each relation symbol R in the signature, R^A consists of exactly a single tuple (a, ..., a) of a proper length.

The following example shows that the sharp models are able to separate the class of DEDs from the class of GDs:

Example 7. Let σ be a DED of the following form:

$$P(x) \wedge R(x, y) \rightarrow Q(y). \qquad (5)$$

Let τ = {P, Q, R}, and let A be a τ-structure with the domain {a, b}, interpreting both P and Q as {a}, and interpreting R as {(a, a)}. Clearly, A is a sharp model of σ. Let σ0 denote the GD obtained from σ by replacing R(x, y) with R(x, x). Clearly, σ0 is a GD that is not satisﬁed in A.

The following result can be shown by a routine check:

Proposition 8. Let Σ be a ﬁnite set of GDs. Then all the following properties are equivalent:

1. Σ is equivalent to a ﬁnite set of DEDs;

2. Σ is equivalent to a ﬁnite set of DEDs over ﬁnite structures;

3. Σ has both a trivial model and a sharp model.

Note that both "ϕ has a trivial model" and "ϕ has a sharp model" can be regarded as trivial preservation properties.
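On finite structures, the third condition of Proposition 8 can be tested by brute force. The following Python sketch is a hedged illustration, not the paper's procedure: it assumes a constant-free signature given as a map from relation names to arities, encodes a GD as a triple (universal variables, body atoms, list of head disjuncts), and writes equality as the relation name `"=`". All function names are hypothetical. The two sample rules replay Example 7, reading σ as the rule with body P(x), R(x, y) and head Q(y).

```python
from itertools import product

def trivial(sig):
    """One-element structure in which every relation is full."""
    return {"dom": {0}, "rel": {R: {(0,) * k} for R, k in sig.items()}}

def sharp(sig):
    """Two-element structure in which every relation holds only of (0, ..., 0)."""
    return {"dom": {0, 1}, "rel": {R: {(0,) * k} for R, k in sig.items()}}

def holds(atom, s, S):
    rel, args = atom
    vals = tuple(s[v] for v in args)
    if rel == "=":
        return vals[0] == vals[1]
    return vals in S["rel"][rel]

def has_witness(S, s, disjunct):
    """Is there a choice of values for the existential variables of this disjunct?"""
    ys = sorted({v for _, args in disjunct for v in args} - set(s))
    return any(
        all(holds(atom, {**s, **dict(zip(ys, w))}, S) for atom in disjunct)
        for w in product(sorted(S["dom"]), repeat=len(ys))
    )

def satisfies(S, gd):
    """Check a GD on a finite structure by enumerating all assignments."""
    xs, body, disjuncts = gd
    for vals in product(sorted(S["dom"]), repeat=len(xs)):
        s = dict(zip(xs, vals))
        if all(holds(atom, s, S) for atom in body) and not any(
            has_witness(S, s, d) for d in disjuncts
        ):
            return False
    return True

sig = {"P": 1, "Q": 1, "R": 2}
sigma = (["x", "y"], [("P", ("x",)), ("R", ("x", "y"))], [[("Q", ("y",))]])
sigma0 = (["x", "y"], [("P", ("x",)), ("R", ("x", "x"))], [[("Q", ("y",))]])
negative = (["x"], [("P", ("x",))], [])  # a negative constraint: no head disjunct
```

Under these assumptions the sharp structure satisfies `sigma` but not the unsafe `sigma0`, whose frontier variable y is free to take the second element, while the trivial structure rejects the negative constraint.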

3.3 Embedded Dependencies

To characterize EDs, we use the notion of direct products. Let A and B be structures over a signature τ. The direct product of A and B, denoted A × B, is a τ-structure defined as follows:

- the domain of A × B is A × B;

- for all constant symbols c ∈ τ, c^(A×B) = ⟨c^A, c^B⟩;

- for all k-ary relation symbols R ∈ τ, all tuples ā on A, and all tuples b̄ on B, (⟨a1, b1⟩, ..., ⟨ak, bk⟩) ∈ R^(A×B) iff ā ∈ R^A and b̄ ∈ R^B, where ai and bi denote the i-th components of ā and b̄, respectively.

We say a sentence ϕ is preserved under direct products [in the finite] if, for any two [finite] models A and B of ϕ, A × B is also a model of ϕ. The following can be shown by a routine check.

Proposition 9. Every set of EDs is preserved under direct products [in the ﬁnite].

In general, the direct product preservation fails for DEDs. A simple counterexample is given as follows:

Example 10. Let σ denote the DED R → S ∨ T where R, S and T are nullary relation symbols. Let τ be the signature {R, S, T}. Let A and B be τ-structures such that A and B have the same domain {a}; R^A = R^B = S^A = T^B = true, S^B = T^A = false. Clearly, both A and B are models of σ, but A × B is not. Thus, σ is not preserved under direct products even in the finite.

The following result shows that the property of direct product preservation exactly captures the class of DEDs in which the disjunctions can be eliminated. This works over the class of finite structures as well as the class of arbitrary structures.

Theorem 11. A finite set of DEDs is equivalent to a finite set of EDs [over finite structures] iff it is preserved under direct products [in the finite].
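The failure in Example 10 can be replayed mechanically. The sketch below assumes a constant-free encoding in which each relation is a set of tuples, so that a nullary relation is true when it contains the empty tuple and false when it is empty; the names `direct_product` and `models_sigma` are hypothetical.

```python
from itertools import product

def direct_product(A, B):
    """Componentwise product: a tuple of pairs is in R iff its left
    projection is in R^A and its right projection is in R^B."""
    return {
        "dom": set(product(A["dom"], B["dom"])),
        "rel": {
            R: {tuple(zip(ta, tb)) for ta in A["rel"][R] for tb in B["rel"][R]}
            for R in A["rel"]
        },
    }

def models_sigma(S):
    """The DED of Example 10: R implies S or T, with nullary symbols."""
    r, s, t = (() in S["rel"][name] for name in ("R", "S", "T"))
    return (not r) or s or t

# A makes R and S true; B makes R and T true.
A = {"dom": {"a"}, "rel": {"R": {()}, "S": {()}, "T": set()}}
B = {"dom": {"a"}, "rel": {"R": {()}, "S": set(), "T": {()}}}
AxB = direct_product(A, B)
```

In the product, R stays true because it holds in both factors, whereas S and T each fail in one factor and therefore fail in the product, which is exactly why the disjunction cannot be preserved.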

3.4 Tuple-generating Dependencies

Let A and B be structures over a signature τ. A strict homomorphism from A into (resp., onto) B is a function h from A into (resp., onto) B such that for every relation symbol R ∈ τ and every tuple ā on A of a proper length, we have ā ∈ R^A iff h(ā) ∈ R^B, and for every constant symbol c ∈ τ, we have h(c^A) = c^B. If such a strict homomorphism exists, we say B is a strictly-homomorphic image of A, and say A is, conversely, a strictly-homomorphic preimage of B. A sentence ϕ is said to be preserved under strictly-homomorphic (pre)images [in the finite] if, for every [finite] model A of ϕ and every [finite] strictly-homomorphic (pre)image B of A, B is also a model of ϕ. The following gives us the desired characterizations:

Theorem 12. A finite set of EDs is equivalent to a finite set of TGDs [over finite structures] iff it is preserved under both strictly-homomorphic images and preimages [in the finite].

Interestingly, it is not difficult to show that, if no constant symbol is involved, the strictly-homomorphic image preservation can be removed from the characterization.
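The difference between a homomorphism and a strict homomorphism is the "iff" in the definition above. The following sketch illustrates it on small finite structures; the encoding (with an explicit `arity` map and no constant symbols) and the function names are assumptions made here for illustration.

```python
from itertools import product

def is_strict_homomorphism(h, A, B):
    """Check that a tuple is in R^A exactly when its image is in R^B,
    for every relation symbol and every tuple on A (constants omitted)."""
    dom = sorted(A["dom"])
    for R, k in A["arity"].items():
        for t in product(dom, repeat=k):
            image = tuple(h[x] for x in t)
            if (t in A["rel"][R]) != (image in B["rel"][R]):
                return False
    return True

def is_onto(h, B):
    return set(h.values()) == B["dom"]

pairs = {(x, y) for x in "ab" for y in "ab"}
full = {"dom": {"a", "b"}, "arity": {"E": 2}, "rel": {"E": pairs}}
thin = {"dom": {"a", "b"}, "arity": {"E": 2}, "rel": {"E": {("a", "a")}}}
loop = {"dom": {"u"}, "arity": {"E": 2}, "rel": {"E": {("u", "u")}}}
h = {"a": "u", "b": "u"}
```

The map `h` is a strict homomorphism from `full` onto `loop`, so `loop` is a strictly-homomorphic image of `full`. From `thin` it is only an ordinary homomorphism, since (a, b) is not an edge of `thin` although its image (u, u) is an edge of `loop`.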

3.5 Frontier-guarded TGDs

To characterize frontier-guarded TGDs, we first define some notations. Let A be a structure. We define {A_X : X ⊆ A} as a family of structures over the same signature such that

- for all X ⊆ A, there is an isomorphism p_X from A to A_X such that p_X(a) = a for all a ∈ X;

- for all distinct X, Y ⊆ A, A_X ∩ A_Y = X ∩ Y, where A_X and A_Y denote the domains of A_X and A_Y, respectively.

Every guarded set of A is defined as a finite subset X of A that contains all interpretations of constant symbols in A. A sentence ϕ is said to be preserved under isomorphic unions [in the finite] if, for all [finite] models A of ϕ and all finite sets G of guarded sets of A, ⋃_{X∈G} A_X is also a model of ϕ.

Example 13. Let τ denote {R} where R is a binary relation symbol. Let A be a τ-structure defined as follows: the domain A consists of two distinct constants a and b; the relation symbol R is interpreted as A × A.


Figure 1: Isomorphic Union in Example 13 (panels: A, A_X, A_Y, and A_X ∪ A_Y)

Let X = {a}, Y = {a, b}, and G = {X, Y}. Then A_X, A_Y and ⋃_{Z∈G} A_Z are the τ-structures illustrated by Figure 1. By a routine check, one can prove the following property:

Proposition 14. Every set of frontier-guarded TGDs is preserved under isomorphic unions [in the finite].

Now, a natural question arises as to whether the isomorphic union preservation is able to separate frontier-guarded TGDs from TGDs. The following example shows that it is.

Example 15 (Example 13 cont.). Let σ denote the TGD

$$R(x, y) \wedge R(y, z) \rightarrow R(x, z) \qquad (6)$$

and let A be the structure defined in Example 13. Then it is easy to see that A is a model of σ but ⋃_{Z∈G} A_Z is not. So, σ is not preserved under isomorphic unions even in the finite.

The following result provides the desired characterization. Note that the characterization also holds over finite structures.

Theorem 16. A finite set of TGDs is equivalent to a finite set of frontier-guarded TGDs [over finite structures] iff it is preserved under isomorphic unions [in the finite].

Every conjunctive query (CQ) is a first-order formula of the form ∃ȳ ϑ(x̄, ȳ) where ϑ is a conjunction of relational atomic formulas. We first present a lemma:

Lemma 17. Let φ(x̄) be a CQ, τ the signature of φ, A a τ-structure, ā a tuple on A with |ā| = |x̄|, and G a finite set of guarded sets of A such that every constant in ā belongs to some X ∈ G. If ⋃_{X∈G} A_X |= φ[ā] then A |= φ[ā].

Now we are in a position to prove the theorem.

Proof sketch of Theorem 16. (Only-if) By Proposition 14. (If) We only address arbitrary structures; a slight modification of the following argument applies to finite structures. Let Σ be a finite set of TGDs preserved under isomorphic unions. We first show that Σ is equivalent to a set of diverse dependencies, each of which is a sentence of the form

$$\forall \bar{x}\,\bigl(\lambda_{\mathrm{una}}(\bar{x}) \wedge \varphi(\bar{x}) \rightarrow \exists \bar{y}\,\psi(\bar{x},\bar{y})\bigr) \qquad (7)$$

where φ and ψ are conjunctions of relational atomic formulas, and λuna(x̄) denotes ⋀_{1≤i<j≤k} ¬ ti = tj with t1, ..., tk being an enumeration (without repetition) of all constant symbols and universal variables in φ and ψ. It is easy to show

Claim 1. Σ is equivalent to a finite set of diverse dependencies.

To present the proof, more notions are needed. Let σ be a diverse dependency of the form (7). The graph of σ is defined as an undirected graph with each conjunct of ψ as a vertex and with each pair of conjuncts of ψ that share some existential variable as an edge. We say that σ is quasi-frontier-guarded if, for every connected component δ of the graph of σ, the variables that occur in both δ and x̄ (the tuple of universal variables of σ) co-occur in some atomic formula in φ. Let Γ be a finite set of diverse dependencies that is equivalent to Σ. Take γ ∈ Γ as a diverse dependency of the form (7). Let Sγ denote the set of substitutions, which only map existential variables to some terms in γ, such that s(γ) is a quasi-frontier-guarded diverse dependency. Let γ* denote

$$\lambda_{\mathrm{una}}(\bar{x}) \wedge \varphi(\bar{x}) \rightarrow \exists \bar{y} \bigvee_{s \in S_\gamma} s(\psi)(\bar{x},\bar{y}) \qquad (8)$$

and let Γ* be the set of γ* for all γ ∈ Γ. We want to prove that Γ* is equivalent to Σ. The direction Γ* ⊨ Σ follows from the definition of Γ*. To show the converse, it suffices to prove

Claim 2. Σ ⊨ γ* for all γ ∈ Γ.

Proof. Let A be a model of Σ and ā a tuple on A such that A |= λuna[ā] and A |= φ[ā]. Let C be the set of all interpretations of constant symbols in A. Let G be the set of guarded sets of A such that if X ∈ G then all constants in X \ C co-occur in an atomic formula in φ(ā). Let B = ⋃_{X∈G} A_X. By definition we know B |= λuna[ā] and B |= φ[ā]. As Σ is preserved under isomorphic unions, B must be a model of Σ. Consequently, B is a model of γ. We thus have that B |= ∃ȳψ[ā/x̄], i.e., there is a tuple b̄ on B with B |= ψ[ā, b̄]. Define a substitution s as follows: given i = 1, ..., |ȳ|, let s(yi) = c if bi = c^A for some constant symbol c; if no such c exists then let s(yi) = xj for some j with bi = aj; if no such j exists either then let s(yi) = yi, where ai, bi, xi, yi denote the i-th components of ā, b̄, x̄, ȳ, respectively. Clearly B |= s(∃ȳψ)[ā/x̄]. By Lemma 17, we have A |= s(∃ȳψ)[ā/x̄]. By a careful check, one can show s ∈ Sγ, i.e., s(γ) is quasi-frontier-guarded as desired. We omit the proof here.

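With Claim 2, we then have that Γ* is equivalent to Σ. Take γ ∈ Γ and suppose γ* is of the form (8). It is easy to see that γ* can be equivalently rewritten as a sentence γ′ of the form

$$\varphi(\bar{x}) \rightarrow \bigvee_{s \in S_\gamma} \exists \bar{y}\, s(\psi) \;\vee \bigvee_{1 \le i < j \le k} t_i = t_j \qquad (9)$$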

where t1, . . . , tk is an enumeration (without repetition) of all terms in γ. Let Γ consist of γ for all γ Γ, and let (γ) be a set that consists of the TGD

$$\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow \exists \bar{y}\, s(\psi)\bigr) \qquad (10)$$

for all s Sγ, and the union of (γ) for all γ Γ. Let con(Γ ) denote the set of TGDs σ such that Γ σ. It is easy to see that each TGD in con(Γ ) is equivalent to a ﬁnite number of frontier-guarded TGDs. To complete the proof, it is thus sufﬁcient to show the following property: Claim 3. con(Γ ) is equivalent to Γ . This can be proved by combining the direct-product argument that proves Theorem 11 with the strictly-homomorphic preimage preservation argument that proves Theorem 12.
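Returning to Examples 13 and 15, the isomorphic union and the failure of transitivity on it can be computed explicitly. The sketch below is an illustration under assumptions: structures are constant-free dictionaries of relations, fresh elements of a copy are encoded as pairs (element, index), and the function names are hypothetical.

```python
def copy_fixing(A, X, tag):
    """An isomorphic copy of A that fixes X pointwise and renames every
    other element a to the fresh pair (a, tag)."""
    p = {a: (a if a in X else (a, tag)) for a in A["dom"]}
    return {
        "dom": set(p.values()),
        "rel": {R: {tuple(p[a] for a in t) for t in ts} for R, ts in A["rel"].items()},
    }

def isomorphic_union(A, guarded_sets):
    """Union of the copies A_X, one per guarded set X in the given list."""
    copies = [copy_fixing(A, X, i) for i, X in enumerate(guarded_sets)]
    return {
        "dom": set().union(*(C["dom"] for C in copies)),
        "rel": {R: set().union(*(C["rel"][R] for C in copies)) for R in A["rel"]},
    }

def is_transitive(S):
    """The TGD (6): R(x, y) and R(y, z) imply R(x, z)."""
    R = S["rel"]["R"]
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

# Example 13: domain {a, b}, R interpreted as the full binary relation.
A = {"dom": {"a", "b"}, "rel": {"R": {(x, y) for x in "ab" for y in "ab"}}}
U = isomorphic_union(A, [{"a"}, {"a", "b"}])
```

The union has three elements: a, b, and a renamed copy of b. Both b and its copy are related to a, but not to each other, so transitivity fails on `U` although it holds on `A`.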


3.6 Guarded TGDs

We say that a sentence ϕ is preserved under disjoint unions [in the finite] if, for each pair of [finite] models A and B of ϕ, A ∪ B is also a model of ϕ whenever both of the following hold: (i) A and B have the same interpretations on constant symbols, and (ii) if X = A ∩ B and X ≠ ∅ then A|X = B|X.

Proposition 18. Every set of guarded TGDs is preserved under disjoint unions [in the finite].

The following example shows that the above property separates guarded TGDs from frontier-guarded TGDs.

Example 19. Let σ be the following frontier-guarded TGD:

$$E(x, y) \wedge E(y, z) \rightarrow C(y) \qquad (11)$$

and let τ = {C, E}. Let A and B be τ-structures defined by: the domain of A is {a, b} and the domain of B is {b, c}; C^A = C^B = ∅, E^A = {(a, b)}, and E^B = {(b, c)}. Let X = A ∩ B = {b}. Clearly, A|X = B|X. By definition, A ∪ B is a τ-structure with {a, b, c} as domain, interpreting C as ∅, and interpreting E as {(a, b), (b, c)}. It is easy to see that both A and B are models of σ, but A ∪ B is not. So, σ is not preserved under disjoint unions even in the finite.

Now, let us present the desired characterization.

Theorem 20. A finite set of TGDs is equivalent to a finite set of guarded TGDs [over finite structures] iff it is preserved under disjoint unions [in the finite].

The general idea of proving the hard direction is as follows: first show that every finite set of frontier-guarded TGDs preserved under disjoint unions [in the finite] is equivalent to a finite set of guarded TGDs [over finite structures]. As the disjoint union preservation always implies the isomorphic union preservation, by Theorem 16, we then have the desired result.
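Example 19 can likewise be checked on a machine. The following sketch assumes constant-free structures (so condition (i) is vacuous) encoded as dictionaries of relations; `restrict`, `union_if_compatible` and `models_sigma` are hypothetical names, and `models_sigma` hard-codes the TGD (11) rather than parsing rules.

```python
def restrict(S, X):
    """Relations of the substructure induced by X."""
    return {R: {t for t in ts if set(t) <= X} for R, ts in S["rel"].items()}

def union_if_compatible(A, B):
    """Return the union of A and B when they agree on their common part
    (condition (ii) of the definition), and None otherwise."""
    X = A["dom"] & B["dom"]
    if X and restrict(A, X) != restrict(B, X):
        return None
    return {
        "dom": A["dom"] | B["dom"],
        "rel": {R: A["rel"][R] | B["rel"][R] for R in A["rel"]},
    }

def models_sigma(S):
    """The TGD (11): E(x, y) and E(y, z) imply C(y)."""
    E, C = S["rel"]["E"], S["rel"]["C"]
    return all((y,) in C for (x, y) in E for (y2, z) in E if y == y2)

A = {"dom": {"a", "b"}, "rel": {"C": set(), "E": {("a", "b")}}}
B = {"dom": {"b", "c"}, "rel": {"C": set(), "E": {("b", "c")}}}
AB = union_if_compatible(A, B)
```

Each of `A` and `B` satisfies the rule vacuously, since neither contains a path of length two. Their union does contain such a path through b, and b is not in C, so the rule fails there.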

3.7 Linear TGDs

A sentence ϕ is said to be preserved under unions [in the finite] if, for all [finite] models A and B of ϕ with the same interpretations on constant symbols, A ∪ B is a model of ϕ. The following theorem was obtained by ten Cate et al.:

Theorem 21 ([ten Cate et al., 2015]). A finite set of TGDs is equivalent to a finite set of linear TGDs over finite structures iff it is preserved under unions in the finite.

To separate the class of linear TGDs from guarded TGDs, a simple example is presented as follows:

Example 22. Let σ denote the following guarded TGD:

$$P(x) \wedge Q(x) \rightarrow R(x). \qquad (12)$$

Let τ denote {P, Q, R}. Let A and B be τ-structures with the same domain {a} such that P^A = Q^B = R^A = R^B = ∅ and P^B = Q^A = {a}. Then it is obvious that both A and B are models of σ. However, A ∪ B does not satisfy σ. Therefore, σ is not preserved under unions even in the finite.

It is worth noting that ten Cate et al.'s proof of Theorem 21 does not work over arbitrary structures. Fortunately, thanks to Theorem 16 and the finite model property of frontier-guarded TGDs, we are able to show the following characterization:

Theorem 23. A finite set of TGDs is equivalent to a finite set of linear TGDs iff it is preserved under unions.

Figure 2: Complexity of Rewritability (recoverable edge labels: PSPACE-hard, 2EXPTIME-c, in PTIME, RE-c, in RE, in coRE)

4 Application: Complexity of Rewritability

As a direct application, we use the obtained model-theoretic characterizations to identify complexity bounds of language rewritability. Let PTIME (resp., PSPACE, 2EXPTIME) denote the class of languages accepted by some deterministic Turing machine in polynomial time (resp., polynomial space, double-exponential time). By [co]RE we mean [the complement of] the class of recursively enumerable languages. Let FO denote the class of all first-order sentences. Let GD (resp., DED, ED, TGD, FGTGD, GTGD and LTGD) denote the class of all finite sets of GDs (resp., DEDs, EDs, TGDs, frontier-guarded TGDs, guarded TGDs and linear TGDs).

Suppose C and C′ are classes of first-order sentences, and K is a complexity class. A sentence ϕ ∈ C is called rewritable to C′ [in the finite] if there is a sentence ψ ∈ C′ such that ϕ is equivalent to ψ [over finite structures]. We say that the rewritability of C to C′ [in the finite] is in K if there is a Turing machine M in K such that, given a sentence ϕ ∈ C as input, M accepts ϕ if and only if ϕ is rewritable to C′ [in the finite].

Theorem 24. The complexity of rewritability for the above existential rule languages is illustrated in Figure 2, where, along each arrow, the bound without underline indicates the complexity over arbitrary structures, and the bound with underline indicates the complexity over ﬁnite structures.

Rather than proving the above theorem in full, we only explain the idea behind the 2EXPTIME-completeness of the rewritability of FGTGD to GTGD. By Theorem 20, it suffices to prove that deciding whether a set in FGTGD is preserved under disjoint unions is 2EXPTIME-complete, which is Statement 6 of Theorem 25. So it remains to prove the following theorem:

Theorem 25. 1. Determining whether a given ﬁrst-order sentence is preserved under globally-homomorphic preimages [in the ﬁnite] is [co]RE-complete.

2. Determining whether a given set of GDs has both a trivial model and a sharp model is in PTIME.

3. Determining whether a given set of DEDs is preserved under direct products [in the ﬁnite] is [co]RE-complete.

4. Determining whether a given set of EDs is preserved under both strictly-homomorphic images and preimages [in the ﬁnite] is in [co]RE.

5. Determining whether a given set of EDs is preserved under isomorphic unions [in the ﬁnite] is in [co]RE.

6. Determining whether a given set of frontier-guarded TGDs is preserved under disjoint unions [in the finite] is 2EXPTIME-complete.

7. Determining whether a given set of guarded TGDs is preserved under unions [in the ﬁnite] is PSPACE-hard.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

Sketched Proof. We only explain the idea of proving Statement 6. For the 2EXPTIME-membership, we must show that determining whether a given set Σ of frontier-guarded TGDs is preserved under disjoint unions [in the finite] is in 2EXPTIME. We do so by constructing a sentence ϕ_Σ such that Σ is preserved under disjoint unions [in the finite] iff ϕ_Σ is unsatisfiable [over finite structures]. Thanks to the simplicity of the disjoint-union-preservation property, ϕ_Σ can be expressed in guarded negation logic, a fragment of first-order logic whose [finite] satisfiability problem is 2EXPTIME-complete [Bárány et al., 2015].

For the 2EXPTIME-hardness, we reduce the boolean query answering problem for guarded TGDs to the problem of deciding the disjoint-union-preservation property for frontier-guarded TGDs. The former was proved by Calì et al. [2013] to be 2EXPTIME-hard. To implement the reduction, given a set Σ of guarded TGDs, a boolean atomic query (i.e., a boolean atomic formula) q and a database (i.e., a finite set of boolean atomic formulas) D, we construct a set Γ of frontier-guarded TGDs such that D ∪ Σ ⊨_[fin] q iff Γ is preserved under disjoint unions [in the finite], which completes the proof of Statement 6.

5 Conclusion and Related Work

We have established model-theoretic characterizations for several important classes of existential rules. Interestingly, our characterizations show that the guarded-based notions are exactly captured by union-like preservation properties. Since union-like preservation can be regarded as a modularity property in a certain sense, this work also provides an alternative perspective on why guarded-based existential rule languages enjoy good computational properties. We believe this may offer new insight into identifying new tractable languages.

There have been a number of earlier works related to ours. Over finite structures, Makowsky and Vardi [1986] established several characterizations for full TGDs (i.e., TGDs without existential quantifiers) and equality-generating dependencies; ten Cate et al. [2015] observed that union preservation captures the definability of TGDs by linear TGDs. Over arbitrary structures, Lutz et al. [2011] established characterizations for the description logics EL and DL-Lite_horn. Note that both EL and DL-Lite_horn are sublanguages of existential rule languages. Bárány et al. [2013] proved that every first-order sentence in the guarded negation fragment that is definable by TGDs is also definable by frontier-guarded TGDs. Moreover, in the setting of schema mappings, ten Cate and Kolaitis [2010] established a number of characterizations for source-to-target TGDs (a class of acyclic TGDs) and its subclasses. In the setting of ontology-mediated query answering, [Zhang et al., 2015] characterizes the class of weakly-acyclic TGDs by using semi-oblivious chase termination; [Zhang et al., 2016] characterizes the class of DEDs by using both complexity- and model-theoretic properties.

Acknowledgments

We are deeply indebted to Professor Carsten Lutz for insightful discussions. This work was partially supported by the National Natural Science Foundation of China (No. 61806102).

References

[Abiteboul et al., 1995] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.

[Baget et al., 2011] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10):1620-1654, 2011.

[Bárány et al., 2013] Vince Bárány, Michael Benedikt, and Balder ten Cate. Rewriting guarded negation queries. In Proc. MFCS, pages 98-110, 2013.

[Bárány et al., 2015] Vince Bárány, Balder ten Cate, and Luc Segoufin. Guarded negation. J. ACM, 62(3):22, 2015.

[Calì et al., 2010] Andrea Calì, Georg Gottlob, Thomas Lukasiewicz, Bruno Marnette, and Andreas Pieris. Datalog+/-: A family of logical knowledge representation and query languages for new applications. In Proc. LICS, pages 228-242, 2010.

[Calì et al., 2012] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general datalog-based framework for tractable query answering over ontologies. J. Web Sem., 14:57-83, 2012.

[Calì et al., 2013] Andrea Calì, Georg Gottlob, and Michael Kifer. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res., 48:115-174, 2013.

[Chang and Keisler, 1992] Chen C. Chang and H. Jerome Keisler. Model theory, Third Edition, volume 73 of Studies in logic and the foundations of mathematics. North-Holland, 1992.

[Fagin et al., 2005] Ronald Fagin, Phokion Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. Theor. Comput. Sci., 336(1):89-124, 2005.

[Hull, 1984] Richard Hull. Finitely specifiable implicational dependency families. J. ACM, 31(2):210-226, 1984.

[Lenzerini, 2002] Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. PODS, pages 233-246, 2002.

[Lutz et al., 2011] Carsten Lutz, Robert Piro, and Frank Wolter. Description logic tboxes: Model-theoretic characterizations and rewritability. In Proc. IJCAI, pages 983-988, 2011.

[Makowsky and Vardi, 1986] Johann A. Makowsky and Moshe Y. Vardi. On the expressive power of data dependencies. Acta Inf., 23(3):231-244, 1986.

[Rosen, 2002] Eric Rosen. Some aspects of model theory and finite structures. Bulletin of Symbolic Logic, 8(3):380-403, 2002.

[ten Cate and Kolaitis, 2010] Balder ten Cate and Phokion Kolaitis. Structural characterizations of schema-mapping languages. Commun. ACM, 53(1):101-110, 2010.

[ten Cate and Kolaitis, 2014] Balder ten Cate and Phokion Kolaitis. Schema mappings: A case of logical dynamics in database theory. In Johan van Benthem on Logic and Information Dynamics, pages 67-100, 2014.

[ten Cate et al., 2015] Balder ten Cate, Gaëlle Fontaine, and Phokion Kolaitis. On the data complexity of consistent query answering. Theory Comput. Syst., 57(4):843-891, 2015.

[Zhang et al., 2015] Heng Zhang, Yan Zhang, and Jia-Huai You. Existential rule languages with finite chase: Complexity and expressiveness. In Proc. AAAI, pages 1678-1685, 2015.

[Zhang et al., 2016] Heng Zhang, Yan Zhang, and Jia-Huai You. Expressive completeness of existential rule languages for ontology-based query answering. In Proc. IJCAI, pages 1330-1337, 2016.
