# firstorder_rewritability_of_frontierguarded_ontologymediated_queries__c813e5f6.pdf

First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries

Pablo Barcel o1, Gerald Berger2, Carsten Lutz3 and Andreas Pieris4

1 Millennium Institute for Foundational Research on Data & DCC, University of Chile 2 Institute of Logic and Computation, TU Wien 3 Department of Mathematics and Computer Science, University of Bremen 4 School of Informatics, University of Edinburgh

We focus on ontology-mediated queries (OMQs) based on (frontier-)guarded existential rules and (unions of) conjunctive queries, and we investigate the problem of FO-rewritability, i.e., whether an OMQ can be rewritten as a ﬁrst-order query. We adopt two different approaches. The ﬁrst approach employs standard two-way alternating parity tree automata. Although it does not lead to a tight complexity bound, it provides a transparent solution based on widely known tools. The second approach relies on a sophisticated automata model, known as cost automata. This allows us to show that our problem is 2EXPTIME-complete. In both approaches, we provide semantic characterizations of FO-rewritability that are of independent interest.

1 Introduction

Ontology-based data access (OBDA) is a successful application of KRR technologies in information management systems [Poggi et al., 2008]. One premier goal is to facilitate access to data that is heterogeneous and incomplete. This is achieved via an ontology that enriches the user query, typically a union of conjunctive queries, with domain knowledge. It turned out that the ontology and the user query can be seen as two components of one composite query, called ontologymediated query (OMQ) [Bienvenu et al., 2014]. The problem of answering OMQs is thus central to OBDA. Building ontology-aware database systems from scratch, with sophisticated optimization techniques, is a non-trivial task that requires a great effort. An important route towards practical implementation of OMQ answering is thus to use conventional database management systems. The problem that such systems are unaware of ontologies can be addressed by query rewriting: the ontology O and the database query q are combined into a new query q O, the so-called rewriting, which gives the same answer as the OMQ consisting of O and q over all input databases. It is of course essential that the rewriting q O is expressed in a language that can be handled by standard database systems. The typical language that is considered in this setting is ﬁrst-order (FO) queries.

Although in the OMQ setting description logics (DLs) are often used for modeling ontologies, it is widely accepted that for handling arbitrary arity relations in relational databases it is convenient to use tuple-generating dependencies (TGDs), a.k.a. existential rules or Datalog rules. It is known, however, that evaluation of rule-based OMQs is undecidable [Cal ı et al., 2013]. This has led to a ﬂurry of activity for identifying restrictions on TGDs that lead to decidability. The main decidable classes are (i) (frontier-)guarded TGDs [Baget et al., 2011; Cal ı et al., 2013], which includes linear TGDs [Cal ı et al., 2012a], (ii) acyclic sets of TGDs [Fagin et al., 2005], and (iii) sticky sets of TGDs [Cal ı et al., 2012b]. There are also extensions that capture Datalog; see the same references.

For OMQs based on linearity, acyclicity, and stickiness, FO-rewritings are always guaranteed to exist [Gottlob et al., 2014]. In contrast, there are (frontier-)guarded OMQs that are inherently recursive, and thus not expressible as a ﬁrstorder query. This brings us to our main question: Can we check whether a (frontier-)guarded OMQ is FO-rewritable? Notice that for OMQs based on more expressive classes of TGDs that capture Datalog, the answer to the above question is negative, since checking whether a Datalog query is FOrewritable is an undecidable problem. Actually, we know that a Datalog query is FO-rewritable iff it is bounded [Ajtai and Gurevich, 1994], while the boundedness problem for Datalog is undecidable [Gaifman et al., 1993].

The above question has been studied for OMQ languages based on Horn DLs, including EL and ELI, which (up to a certain normal form) are a special case of guarded TGDs [Bienvenu et al., 2013; 2016; Lutz and Sabellek, 2017]. More precisely, FO-rewritability is semantically characterized in terms of the existence of certain tree-shaped ABoxes, which in turn allows the authors to pinpoint the complexity of the problem by employing automata-based procedures. As usual in the DL context, schemas consist only of unary and binary relations. However, in our setting we have to deal with relations of higher arity. This indicates that the techniques devised for checking the FO-rewritbility of DL-based OMQs cannot be directly applied to rule-based OMQs; this is further explained in Section 3. Therefore, we develop new semantic characterizations and procedures that are signiﬁcantly different from those for OMQs based on description logics.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

Our analysis aims to develop specially tailored techniques that allow us to understand the problem of checking whether a (frontier-)guarded OMQ is FO-rewritable, and also to pinpoint its computational complexity. Our plan of attack and results can be summarized as follows: We ﬁrst focus on the simpler OMQ language based on guarded TGDs and atomic queries, and, in Section 3, we provide a characterization of FO-rewritability that forms the basis for applying tree automata techniques. We then exploit, in Section 4, standard two-way alternating parity tree automata. In particular, we reduce our problem to the problem of checking the ﬁniteness of the language of an automaton. The reduction relies on a reﬁned version of the characterization of FO-rewritability established in Section 3. This provides a transparent solution to our problem based on standard tools, but it does not lead to an optimal result. Towards an optimal result, we use, in Section 5, a more sophisticated automata model, known as cost automata. This allows us to show that FO-rewritability for OMQs based on guarded TGDs and atomic queries is in 2EXPTIME, and in EXPTIME for predicates of bounded arity. Our application of cost automata is quite transparent, which, as above, relies on a reﬁned version of the characterization of FO-rewritability established in Section 3. However, the complexity analysis relies on an intricate result on the boundedness problem for a certain class of cost automata from [Benedikt et al., 2015]. Finally, in Section 6, by using the results of Section 5, we obtain our main results. We show that FO-rewritability is 2EXPTIME-complete for OMQs based on guarded TGDs and on frontier-guarded TGDs, no matter whether the actual queries are conjunctive queries, unions thereof, or the simple atomic queries. This remains true when the arity of the predicates is bounded by a constant, with the exception of guarded TGDs and atomic queries, for which the complexity then drops to EXPTIME-complete. In principle, the procedure based on tree automata also provides concrete FO-rewritings when they exist, but it is not tailored towards doing this in an efﬁcient way. Efﬁciently constructing rewritings is beyond the scope of this work.

2 Preliminaries

Basics. Let C, N, and V be disjoint, countably inﬁnite sets of constants, (labeled) nulls, and (regular) variables, respectively. A schema S is a ﬁnite set of relation symbols. The width of S, denoted wd(S), is the maximum arity among all relation symbols of S. We write R/n to denote that the relation symbol R has arity n 0. A term is either a constant, null, or variable. An atom over S is an expression of the form R( v), where R S is of arity n 0 and v is an n-tuple of terms. A fact is an atom whose arguments are constants.

Databases. An S-instance is a (possibly inﬁnite) set of atoms over the schema S that contain only constants and nulls, while an S-database is a ﬁnite set of facts over S. The active domain of an instance J, denoted adom(J), consists of all terms occurring in J. For X adom(J), we denote by J[X] the subinstance of J induced by X, i.e., the set of all facts R( a) with a X. A tree decomposition of an instance J is a tuple δ = (T , (Xt)t T ), where T = (T, E) is a (directed) tree

with nodes T and edges E, and (Xt)t T is a collection of subsets of adom(J), called bags, such that (i) if R( a) J, then there is v T such that a Xv, and (ii) for all a adom(J), the set {v T | a Xv} induces a connected subtree of T . The width of δ is the maximum size among all bags Xv (v T) minus one. The tree-width of J, denoted tw(J), is min{n | there is a tree decomposition of width n of J}.

Conjunctive queries. A conjunctive query (CQ) over S is a ﬁrst-order formula of the form q( x) := y φ( x, y), where x and y are tuples of variables, and φ is a conjunction of atoms R1( v1) Rm( vm) over S that mention variables from x y only. The variables x are the answer variables of q( x). If x is empty then q is a Boolean CQ. Let var(q) be the set of variables occurring in q. As usual, the evaluation of CQs over instances is deﬁned in terms of homomorphisms. A homomorphism from q to J is a mapping h: var(q) adom(J) such that Ri(h( vi)) J for each 1 i m. We write J |= q( a) to indicate that there is such a homomorphism h such that h( x) = a. The evaluation of q( x) over J, denoted q(J), is the set of all tuples a such that J |= q( a). A union of conjunctive queries (UCQ) q( x) over S is a disjunction n i=1 qi( x) of CQs over S. The evaluation of q( x) over J, denoted q(J), is the set of tuples 1 i n qi(J). We write J |= q( a) to indicate that J |= qi( a) for some 1 i n. Let CQ be the class of conjunctive queries, and UCQ the class of UCQs. We also write AQ0 for the class of atomic queries of the form P(), where P is a 0-ary predicate.

Tuple-generating dependencies. A tuple-generating dependency (TGD) (a.k.a. existential rule) is a ﬁrst-order sentence of the form τ : x, y (φ( x, y) z ψ( x, z)), where φ and ψ are conjunctions of atoms that mention only variables. For brevity, we write φ( x, y) z ψ( x, z), and use comma instead of for conjoining atoms. We assume that each variable of x is mentioned in ψ. We call φ and ψ the body and head of the TGD, respectively. The TGD τ is logically equivalent to the sentence x (qφ( x) qψ( x)), where qφ( x) and qψ( x) are the CQs y φ( x, y) and z ψ( x, z), respectively. Thus, an instance J satisﬁes τ if qφ(J) qψ(J). Also, J satisﬁes a set of TGDs O, denoted J |= O, if J satisﬁes every τ O. Let TGD be the class of ﬁnite sets of TGDs.

Ontology-mediated queries. An ontology-mediated query (OMQ) is a triple Q = (S, O, q( x)), where S is a (nonempty) schema (the data schema), O is a set of TGDs (the ontology), and q( x) is a UCQ over S sig(O), where sig(O) is the set of relation symbols in O. Notice that the ontology O can introduce relations that are not in S; this allows us to enrich the schema of q( x). We include S in the speciﬁcation of Q to emphasize that Q will be evaluated over S-databases, even though O and q( x) may use additional relation symbols. The semantics of Q is given in terms of certain answers. The certain answers to a UCQ q( x) w.r.t. an S-database D, and a set O of TGDs, is the set of all tuples a of constants, where | a| = | x|, such that (D, O) |= q( a), i.e., J |= q( a) for every instance J D that satisﬁes O. We write D |= Q( a) if a is a certain answer to q w.r.t. D and O. Moreover, we set Q(D) := { a adom(D)| x| | D |= Q( a)}.

Ontology-mediated query languages. We write (C, Q) for

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

the class of OMQs (S, O, q), where O falls in the class of TGDs C, and q in the query language Q. The evaluation problem for (TGD, UCQ), i.e., given a query Q (TGD, UCQ) with data schema S, an S-database D, and a adom(D)| x|, to decide whether D |= Q( a), is undecidable; this holds even for (TGD, AQ0) [Cal ı et al., 2013]. Here we deal with one of the most paradigmatic decidable restrictions, i.e., guardedness. A TGD is guarded if it has a body atom, called guard, that contains all the body variables. Let G be the class of all ﬁnite sets of guarded TGDs. A TGD τ is called frontierguarded if its body contains an atom, called frontier-guard, that contains the frontier of τ, i.e., the body variables that appear also in the head. We write FG for the class of all ﬁnite sets of frontier-guarded TGDs. Roughly, the evaluation problem for (G, UCQ) and (FG, UCQ) is decidable since G and FG admit tree-like universal models [Cal ı et al., 2013].

First-order rewritability. A ﬁrst-order (FO) query over a schema S is a (function-free) FO formula φ( x), with x being its free variables, that uses only relations from S. The evaluation of φ over an S-database D, denoted φ(D), is the set of tuples { a adom(D)| x| | D |= φ( a)}; |= denotes the standard notion of satisfaction for FO. An OMQ Q = (S, O, q( x)) is FO-rewritable if there exists a (ﬁnite) FO query φQ( x) over S that is equivalent to Q, i.e., for every S-database D it is the case that Q(D) = φQ(D). We call φQ( x) an FO-rewriting of Q. A fundamental task for an OMQ language (C, Q), where C is a class of TGDs and Q is a class of queries, is deciding ﬁrst-order rewritability:

PROBLEM : FORew(C, Q) INPUT : An OMQ Q (C, Q). QUESTION : Is it the case that Q is FO-rewritable?

First-order rewritability of (FG, UCQ)-queries. As shown by the following example, there exist (G, CQ) queries (and thus, (FG, UCQ) queries) that are not FO-rewritable.

Example 1. Consider the OMQ Q = (S, O, q) (G, CQ), where S = {S/3, A/1, B/1}, O consists of

S(x, y, z), A(z) R(x, z), S(x, y, z), R(x, z) R(x, y),

and q = x, y, z (S(x, y, z) R(x, z) B(y)). Intuitively, an FO-rewriting of Q should check for the existence of a set of atoms {S(c, ai, ai 1)}1 i k, among others, for k 0. However, since there is no upper bound for k, this cannot be done via a ﬁnite FO-query, and thus, Q is not FO-rewritable. A proof that Q is not FO-rewritable is given below.

On the other hand, there are (frontier-)guarded OMQs that are FO-rewritable; e.g., the OMQ obtained from the query Q in Example 1 by adding A(z) to q is FO-rewritable with x, y, z (S(x, y, z) B(y) A(z)) being an FO-rewriting.

3 Semantic Characterization

We proceed to give a characterization of FO-rewritability of OMQs from (G, AQ0) in terms of the existence of certain

tree-like databases. Our characterization is related to, but different from characterizations used for OMQs based on DLs such as EL and ELI [Bienvenu et al., 2013; 2016]. The characterizations in [Bienvenu et al., 2013; 2016] essentially state that a unary OMQ Q is FO-rewritable iff there is a bound k such that, whenever the root of a tree-shaped database D is returned as an answer to Q, then this is already true for the restriction of D up to depth k. The proof of the (contrapositive of the) only if direction uses a locality argument: if there is no such bound k, then this is witnessed by an inﬁnite sequence of deeper and deeper tree databases that establish non-locality of Q. For guarded TGDs, we would have to replace tree-shaped databases with databases of bounded tree-width. However, increasing depth of tree decompositions does not correspond to increasing distance in the Gaifman graph, and thus, does not establish non-locality. We therefore depart from imposing a bound on the depth, and instead we impose a bound on the number of facts, as detailed below. It is also interesting to note that, while it is implicit in [Bienvenu et al., 2016] that an OMQ based on ELI and CQs is FO-rewritable iff it is Gaifman local, there exists an OMQ from (G, CQ) that is Gaifman local, but not FO-rewritable. Such an OMQ is the one obtained from the query Q given in Example 1, by removing the existential quantiﬁcation on the variable x in the CQ q, i.e., converting q into a unary CQ.

Theorem 1. Consider an OMQ Q (G, AQ0) with data schema S. The following are equivalent:

1. Q is FO-rewritable.

2. There is a k 0 such that, for every S-database D of tree-width at most wd(S) 1, if D |= Q, then there is a D D with at most k facts such that D |= Q.

For (1) (2) we exploit the fact that, if Q (G, AQ0) is FO-rewritable, then it can be expressed as a UCQ q Q. This follows from the fact that OMQs from (G, AQ0) are preserved under homomorphisms [Bienvenu et al., 2014], and Rossman s Theorem stating that an FO query is preserved under homomorphisms over ﬁnite instances iff it is equivalent to a UCQ [Rossman, 2008]. It is then easy to show that (2) holds with k being the size of the largest disjunct of the UCQ q Q. For (2) (1), we use the fact that, if there is an S-database D that entails Q, then there exists one of tree-width at most wd(S) 1 that entails Q, and can be mapped to D. The next example illustrates Theorem 1.

Example 2. Consider the OMQ Q = (S, O, P) (G, AQ0), where S = {S/3, A/1, B/1}, and O consists of the TGDs given in Example 1 plus the guarded TGD

S(x, y, z), R(x, z), B(y) P,

which is essentially the CQ q from Example 1. It is easy to verify that, for an arbitrary k 0, the S-database

Dk = {A(a0), S(c, a1, a0), . . . , S(c, ak 1, ak 2), B(ak 1)}

of tree-width wd(S) 1 = 2 is such that Dk |= Q, but for every D Dk with at most k facts, D |= Q. Thus, by Theorem 1, Q is not FO-rewritable.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

4 Alternating Tree Automata Approach

In this section, we exploit the well-known algorithmic tool of two-way alternating parity tree automata (2ATA) over ﬁnite trees of bounded degree (see, e.g., [Cosmadakis et al., 1988]), and prove that FORew(G, AQ0) can be solved in elementary time. Although this result is not optimal, our construction provides a transparent solution to FORew(G, AQ0) based on standard tools. This is in contrast with previous studies on closely related problems for guarded logics, in which all elementary bounds heavily rely on the use of intricate results on cost automata [Blumensath et al., 2014; Benedikt et al., 2015]. We also apply such results later, but only in order to pinpoint the exact complexity of FORew(G, AQ0). The idea behind our solution to FORew(G, AQ0) is, given a query Q (G, AQ0), to devise a 2ATA BQ such that Q is FO-rewritable iff the language accepted by BQ is ﬁnite. This is a standard idea with roots in the study of the boundedness problem for monadic Datalog (see e.g., [Vardi, 1992]). In particular, our main result establishes the following:

Theorem 2. Let Q (G, AQ0) with data schema S. There is a 2ATA BQ on trees of degree at most 2wd(S) such that Q is FO-rewritable iff the language of BQ is ﬁnite. The state set of BQ is of double exponential size in wd(S), and of exponential size in |S sig(O)|. Furthermore, BQ can be constructed in double exponential time in the size of Q.

As a corollary to Theorem 2 we obtain the following result:

Corollary 3. FORew(G, AQ0) is in 3EXPTIME, and in 2EXPTIME for predicates of bounded arity.

From Theorem 2, to check whether a query Q (G, AQ0) is FO-rewritable, it sufﬁces to check that the language of BQ is ﬁnite. The latter is done by ﬁrst converting BQ into a nondeterministic bottom-up tree automaton B Q; see, e.g., [Vardi, 1998]. This incurs an exponential blowup, and thus, B Q has triple exponentially many states. We then check the ﬁniteness of the language of B Q in polynomial time in the size of B Q by applying a standard reachability analysis; see [Vardi, 1992]. For predicates of bounded arity, a similar argument as above provides a double exponential time upper bound. In the rest of Section 4 we explain the proof of Theorem 2. The intuitive idea is to construct a 2ATA BQ whose language corresponds to suitable encodings of databases D of bounded tree-width that minimally satisfy Q, i.e., D |= Q, but if we remove any atom from D, then Q is no longer satisﬁed.

A reﬁned semantic characterization. In order to apply an approach based on 2ATA, it is essential to revisit the semantic characterization provided by Theorem 1. To this end, we need to introduce some auxiliary terminology. Let D be a database, and δ = (T , (Xv)v T ), where T = (T, E), a tree decomposition of D. An adornment of the pair (D, δ) is a function η: T 2D such that η(v) D[Xv] for all v T, and v T η(v) = D. Therefore, the pair (δ, η) can be viewed as a representation of the database D along with a tree decomposition of it. For the intended characterization, it is important that this representation is free of redundancies, formalized as follows. We say that δ is η-simple if |η(v)| 1

for all v T, and non-empty η-labels are unique, that is, η(v) = η(w) for all distinct v, w T with η(v) and η(w) non-empty. Nodes v T with η(v) empty, called white from now on, are required since we might not have a (unique!) fact available to label them. Note, though, that white nodes v are still associated with a non-empty set of constants from D via Xv. All other nodes are called black. While δ being η-simple avoids redundancies that are due to a fact occurring in the label of multiple black nodes, additional redundancies may arise from the inﬂationary use of white nodes. We say that a node v T is η-well-colored if it is black, or it has at least two successors and all its successors are η-well-colored. We say that δ is η-well-colored if every node in T is η-wellcolored. For example, δ is not η-well-colored if it has a white leaf, or if it has a white node and its single successor is also white. Informally, requiring δ to be η-well-colored makes it impossible to blow up the tree by introducing white nodes without introducing black nodes. For i {1, 2}, let Di be a database, δi a tree decomposition of Di, and ηi an adornment of (Di, δi). We say that (D1, δ1, η1) and (D2, δ2, η2) are isomorphic if the latter can be obtained from the former by consistenly renaming constants in D1 and tree nodes in δ1. We are now ready to revisit the characterization of FOrewritability for OMQs from (G, AQ0) given in Theorem 1.

Theorem 4. Consider an OMQ Q (G, AQ0) with data schema S. The following are equivalent: 1. Condition 2 from Theorem 1 is satisﬁed. 2. There are ﬁnitely many non-isomorphic triples (D, δ, η), where D is an S-database, δ a tree decomposition of D of width at most wd(S) 1, and η an adornment of (D, δ), such that (a) δ is η-simple and η-well-colored, (b) D |= Q, and (c) for every α D, it is the case that D \ {α} |= Q.

Devising automata. We proceed to discuss how the 2ATA announced in Theorem 2 is constructed. Consider an OMQ Q = (S, O, P) from (G, AQ0). Our goal is to devise an automaton BQ whose language is ﬁnite iff Condition 2 from Theorem 4 is satisﬁed. By Theorems 1 and 4, Q is then FOrewritable iff the language of BQ is ﬁnite. The 2ATA BQ will be the intersection of several automata that check the properties stated in item 2 of Theorem 4. But ﬁrst we need to say a few words about tree encodings. Let Γ be a ﬁnite alphabet, and let (N \ {0}) denote the set of all ﬁnite words of positive integers, including the empty word. A ﬁnite Γ-labeled tree is a partial function t: (N \ {0}) Γ such that the domain of t is ﬁnite and preﬁx-closed. Moreover, if v i belongs to the domain of t, then v (i 1) also belongs to the domain of t. In fact, the elements in the domain of t identify the nodes of the tree. It can be shown that an Sdatabase D, a tree decomposition δ of D of width w 1, and an adornment η of (D, δ), can be encoded as a ΓS,w-labeled tree t of degree at most 2w, where ΓS,w is an alphabet of size double exponential in w and exponential in S, such that each node of δ corresponds to exactly one node of t and vice versa. Although every D can be encoded into a ΓS,w-labeled tree t, the converse is not true in general. However, it is possible to

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

deﬁne certain syntactic consistency conditions such that every consistent ΓS,w-labeled t can be decoded into an S-database, denoted Jt K, whose tree-width is at most w. We are going to abbreviate the alphabet ΓS,wd(S) by ΓS.

Lemma 5. There is a 2ATA CS that accepts a ΓS-labeled tree t iff t is consistent. The number of states of CS is constant. CS can be constructed in polynomial time in the size of ΓS.

Since a ΓS-labeled tree incorporates the information about an adornment, the notions of being well-colored and simple can be naturally deﬁned for ΓS-labeled trees. Then:

Lemma 6. There is a 2ATA RS that accepts a consistent ΓSlabeled tree iff it is well-colored and simple. The number of states of RS is exponential in wd(S) and linear in |S|. RS can be constructed in polynomial time in the size of ΓS.

Concerning property 2(b) of Theorem 4, we can devise a 2ATA that accepts those trees whose decoding satisﬁes Q:

Lemma 7. There is a 2ATA AQ that accepts a consistent ΓSlabeled tree iff Jt K |= Q. The number of states of AQ is exponential in wd(S) and linear in |S sig(O)|. AQ can be constructed in double exponential time in the size of Q.

The crucial task is to check condition 2(c) of Theorem 4, which states the key minimality criterion. Unfortunately, this involves an extra exponential blowup:

Lemma 8. There is a 2ATA MQ that accepts a consistent ΓS-labeled tree t iff Jt K \ {α} |= Q for all α Jt K. The state set of MQ is of double exponential size in wd(S), and of exponential size in |S sig(O)|. Furthermore, MQ can be constructed in double exponential time in the size of Q.

Let us brieﬂy explain how MQ is constructed. This will expose the source of the extra exponential blowup, which prevents us from obtaining an optimal complexity upper bound for FORew(G, AQ0). We ﬁrst construct a 2ATA DQ that runs on ΛS-labeled trees, where ΛS is an alphabet that extends ΓS with auxiliary symbols that allow us to tag some facts in the input tree. In particular, DQ accepts a tree t iff t is consistent, there is at least one tagged fact, and Jt K |= Q where Jt K is obtained from Jt K by removing the tagged facts. Having DQ in place, we can then construct a 2ATA DQ that accepts a ΓS-labeled tree t if there is a way to tag some of its facts so as to obtain a ΛS-labeled tree t with Jt K |= Q. This is achieved by applying the projection operator on DQ. Since for 2ATAs projection involves an exponential blowup and DQ already has exponentially many states, DQ has double exponentially many. It should be clear now that MQ is the complement of DQ, and we recall that complementation of 2ATAs can be done in polynomial time. The desired automaton BQ is obtained by intersecting the 2ATAs in Lemmas 5 to 8. Since the intersection of 2ATA is feasible in polynomial time, BQ can be constructed in double exponential time in the size of Q.

5 Cost Automata Approach

We proceed to study FORew(G, AQ0) using the more sophisticated model of cost automata. This allows us to improve the

complexity of the problem obtained in Corollary 3 as follows:

Theorem 9. FORew(G, AQ0) is in 2EXPTIME, and in EXPTIME for predicates of bounded arity.

As in the previous approach, we develop a semantic characterization that relies on a minimality criterion for trees accepted by cost automata. The extra features provided by cost automata allow us to deal with such a minimality criterion in a more efﬁcient way than standard 2ATA. While our application of cost automata is transparent, the complexity analysis relies on an intricate result on the boundedness problem for a certain class of cost automata from [Benedikt et al., 2015]. Before we proceed further, let us provide a brief overview of the cost automata model that we are going to use.

Cost automata models. Cost automata extend traditional automata by providing counters that can be manipulated at each transition. Instead of assigning a Boolean value to each input structure (indicating whether it is accepted or not), these automata assign a value from N := N { } to each input. Here, we focus on cost automata that work on ﬁnite trees of unbounded degree, and allow for two-way movements; in fact, the automata that we need are those that extend 2ATA over ﬁnite trees with a single counter. The operation of such an automaton A on each input t will be viewed as a twoplayer cost game G(A, t) between players Eve and Adam. Recall that the acceptance of an input tree for a conventional 2ATA can be formalized via a two-player game as well. However, instead of the parity acceptance condition for 2ATA, plays in the cost game between Eve and Adam will be assigned costs, and the cost automaton speciﬁes via an objective whether Eve s goal is to minimize or maximize that cost. In case of a minimizing (resp., maximizing) objective, a strategy ξ of Eve in the cost game G(A, t) is n-winning if any play of Adam consistent with ξ has cost at most n (resp., at least n). Given an input tree t, one then deﬁnes the value of t in A as

JAK(t) := op{n | Eve has an n-winning strategy in G(A, t)},

where op = inf (resp., op = sup) in case Eve s objective is to minimize (resp., maximize). Therefore, JAK deﬁnes a function from the domain of input trees to N . We call functions of that type cost functions. A key property of such functions is boundedness. We say that JAK is bounded if there exists an n N such that JAK(t) n for every input tree t. We employ automata with a single counter, where Eve s objective is to minimize the cost, while satisfying the parity condition. Such an automaton is known in the literature as dist parity-automaton [Benedikt et al., 2015]. To navigate in the tree, it may use the directions {0, }, where 0 indicates that the automaton should stay in the current node, and means that the automaton may move to an arbitrary neighboring node, including the parent. For this type of automaton, we can decide whether its cost function is bounded [Benedikt et al., 2015; Colcombet and Fijalkow, 2016]. As usual, A denotes the size of A. Then:

Theorem 10. There is a polynomial f such that, for every dist parity-automaton A using priorities {0, 1} for the parity acceptance condition, the boundedness for JAK is decidable in time A f(m), where m is the number of states of A.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

Our goal is to reduce FORew(G, AQ0) to the boundedness problem for dist parity-automata.

A reﬁned semantic characterization. We ﬁrst need to revisit the semantic characterization provided by Theorem 1. Consider an S-database D, and a query Q = (S, O, P) (G, AQ0). Let k Q := |S sig(O)| ww, where w := wd(S sig(O)). A derivation tree for D and Q is a labeled k Q-ary tree T , with η being a node labeling function that assigns facts R( a), where R S sig(O) and a adom(D), to its nodes, that satisﬁes the following conditions:

1. For the root node v of T , η(v) = P.

2. For each leaf node v of T , η(v) D.

3. For each non-leaf node v of T , with u1, . . . , uk being its children, ({η(u1), . . . , η(uk)}, O) |= η(v).

Roughly, T describes how the 0-ary predicate P can be entailed from D and O. In fact, it is easy to show that D |= Q iff there is a derivation tree for D and Q. The height of T , denoted hgt(T ), is the maximum length of a branch in T , i.e., of a path from the root to a leaf node. Assuming that D |= Q, the cost of D w.r.t. Q, denoted cost(D, Q), is deﬁned as

min{hgt(T ) | T is a derivation tree for D and Q},

while the cost of Q, denoted cost(Q), is deﬁned as

sup{cost(D, Q) | D |= Q, D is an S-database with tw(D) wd(S) 1}.

In other words, the cost of Q is the least upper bound of the height over all derivation trees for all S-databases D of treewidth at most wd(S) 1 such that D |= Q. If there is no such a database, then the cost of Q is zero since sup := 0. Actually, cost(Q) = 0 indicates that Q is unsatisﬁable, which in turn means that Q is trivially FO-rewritable. Having the notion of the cost of an OMQ from (G, AQ0) in place, it should not be difﬁcult to see how we can reﬁne the semantic characterization provided by Theorem 1.

Theorem 11. Consider an OMQ Q (G, AQ0) with data schema S. The following are equivalent:

1. Condition 2 from Theorem 1 is satisﬁed.

2. cost(Q) is ﬁnite.

Devising automata. We brieﬂy describe how we can use cost automata in order to devise an algorithm for FORew(G, AQ0) that runs in double exponential time. Consider an OMQ Q = (S, O, P) (G, AQ0). Our goal is to devise a dist parity-automaton BQ such that the cost function JBQK is bounded iff cost(Q) is ﬁnite. Therefore, by Theorems 1 and 11, to check whether Q is FO-rewritable we simply need to check if JBQK is bounded, which, by Theorem 10, can be done in exponential time in the size of BQ. The input trees to our automata will be over the same alphabet ΓS that is used to encode tree-like S-databases in Section 4. Recall that for a dist parity-automaton A, the cost function JAK is bounded over a certain class C of trees if there is an n N such that JAK(t) n for every input tree t C. Then:

Lemma 12. There is a dist parity-automaton HQ such that JHQK is bounded over consistent ΓS-labeled trees iff cost(Q) is ﬁnite. The number of states of HQ is exponential in wd(S), and polynomial in |S sig(O)|. Moreover, HQ can be constructed in double exponential time in the size of Q.

The automaton HQ is built in such a way that, on an input tree t, Eve has an n-winning strategy in G(HQ, t) iff there is a derivation tree for Jt K and Q of height at most n. Thus, Eve tries to construct derivation trees of minimal height. The counter is used to count the height of the derivation tree. Having this automaton in place, we can now complete the proof of Theorem 9. The desired dist parity-automaton BQ is deﬁned as C S HQ, where C S is similar to the 2ATA CS (in Lemma 5) that checks for consistency of ΓS-labeled trees of bounded degree. Notice that C S is essentially a dist parityautomaton that assigns zero (resp., ) to input trees that are consistent (resp., inconsistent), and thus, C S HQ is welldeﬁned. Since the intersection of dist parity-automata is feasible in polynomial time, Lemma 5 and Lemma 12 imply that BQ has exponentially many states, and it can be constructed in double exponential time. Lemma 12 implies also that JBQK is bounded iff cost(Q) is ﬁnite. It remains to show that the boundedness of JBQK can be checked in double exponential time. By Theorem 10, there is a polynomial f such that the latter task can be carried out in time BQ f(m), where m is the number of states of BQ, and the claim follows. For predicates of bounded arity, we provide a similar analysis.

6 Frontier-Guarded OMQs

The goal of this section is to show the following result:

Theorem 13. It holds that:

FORew(FG, Q) is 2EXPTIME-complete, for each Q {UCQ, CQ, AQ0}, even for predicates of bounded arity.

FORew(G, Q) is 2EXPTIME-complete, for each Q {UCQ, CQ}, even for predicates of bounded arity.

FORew(G, AQ0) is 2EXPTIME-complete. Moreover, for predicates of bounded arity it is EXPTIME-complete.

Lower bounds. The 2EXPTIME-hardness in the ﬁrst and the second items is inherited from [Bienvenu et al., 2016], where it is shown that deciding FO-rewritability for OMQs based on ELI and CQs is 2EXPTIME-hard. For the 2EXPTIME-hardness in the third item we exploit the fact that containment for OMQs from (G, AQ0) is 2EXPTIME-hard, even if the right-hand side query is FO-rewritable; this is implicit in [Barcel o et al., 2014]. Finally, the EXPTIME-hardness in the third item is inherited from [Bienvenu et al., 2013], where it is shown that deciding FO-rewritability for OMQs based on EL and atomic queries is EXPTIME-hard.

Upper bounds. The fact that for predicates of bounded arity FORew(G, AQ0) is in EXPTIME is obtained from Theorem 9. It remains to show that FORew(FG, UCQ) is in 2EXPTIME. We reduce FORew(FG, UCQ) in polynomial time to FORew(FG, AQ0), and then show that the latter is in 2EXPTIME. This reduction relies on a construction from [Bienvenu et al., 2016], which allows us to reduce FORew(FG, UCQ) to

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

FORew(FG, UBCQ) with UBCQ being the class of union of Boolean CQs, and the fact that a Boolean CQ can be seen as a frontier-guarded TGD. To show that FORew(FG, AQ0) is in 2EXPTIME, we reduce it to FORew(G, AQ0), and then apply Theorem 9. This relies on treeiﬁcation, and is inspired by a translation of guarded negation ﬁxed-point sentences into guarded ﬁxed-point sentences [B ar any et al., 2015]. Our reduction may give rise to exponentially many guarded TGDs, but the arity is increased only polynomially. Since the procedure for FORew(G, AQ0) provided by Theorem 9 is double exponential only in the arity of the schema the claim follows.

7 Future Work The procedure based on 2ATA provides an FO-rewriting in case the input OMQ admits one, but it is not tailored towards doing this in an efﬁcient way. Our next step is to exploit the techniques developed in this work for devising practically efﬁcient algorithms for constructing FO-rewritings.

Acknowledgements Barcel o is funded by the Millennium Institute for Foundational Research on Data and Fondecyt grant 1170109. Berger is funded by the FWF project W1255-N23 and a DOC fellowship of the Austrian Academy of Sciences. Lutz is funded by the ERC grant 647289 CODA . Pieris is funded by the EPSRC programme grant EP/M025268/ VADA .

References [Ajtai and Gurevich, 1994] Mikl os Ajtai and Yuri Gurevich. Datalog vs ﬁrst-order logic. J. Comput. Syst. Sci., 49(3):562 588, 1994. [Baget et al., 2011] Jean-Franc ois Baget, Michel Lecl ere, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10):1620 1654, 2011. [B ar any et al., 2015] Vince B ar any, Balder ten Cate, and Luc Segouﬁn. Guarded negation. J. ACM, 62(3):22:1 22:26, 2015. [Barcel o et al., 2014] Pablo Barcel o, Miguel Romero, and Moshe Y. Vardi. Does query evaluation tractability help query containment? In PODS, pages 188 199, 2014. [Benedikt et al., 2015] Michael Benedikt, Balder ten Cate, Thomas Colcombet, and Michael Vanden Boom. The complexity of boundedness for guarded logics. In LICS, pages 293 304, 2015. [Bienvenu et al., 2013] Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. First-order rewritability of atomic queries in horn description logics. In IJCAI, 2013. [Bienvenu et al., 2014] Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst., 39(4):33:1 33:44, 2014. [Bienvenu et al., 2016] Meghyn Bienvenu, Peter Hansen, Carsten Lutz, and Frank Wolter. First order-rewritability and containment of conjunctive queries in horn description logics. In IJCAI, pages 965 971, 2016.

[Blumensath et al., 2014] Achim Blumensath, Martin Otto, and Mark Weyer. Decidability results for the boundedness problem. Logical Methods in Computer Science, 10(3), 2014. [Cal ı et al., 2012a] Andrea Cal ı, Georg Gottlob, and Thomas Lukasiewicz. A general datalog-based framework for tractable query answering over ontologies. J. Web Sem., 14:57 83, 2012. [Cal ı et al., 2012b] Andrea Cal ı, Georg Gottlob, and Andreas Pieris. Towards more expressive ontology languages: The query answering problem. Artif. Intell., 193:87 128, 2012. [Cal ı et al., 2013] Andrea Cal ı, Georg Gottlob, and Michael Kifer. Taming the inﬁnite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res., 48:115 174, 2013. [Colcombet and Fijalkow, 2016] Thomas Colcombet and Nathana el Fijalkow. The bridge between regular cost functions and omega-regular languages. In ICALP, pages 126:1 126:13, 2016. [Cosmadakis et al., 1988] Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable optimization problems for database logic programs (preliminary report). In STOC, pages 477 490, 1988. [Fagin et al., 2005] Ronald Fagin, Phokion G. Kolaitis, Ren ee J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89 124, 2005. [Gaifman et al., 1993] Haim Gaifman, Harry G. Mairson, Yehoshua Sagiv, and Moshe Y. Vardi. Undecidable optimization problems for database logic programs. J. ACM, 40(3):683 713, 1993. [Gottlob et al., 2014] Georg Gottlob, Giorgio Orsi, and Andreas Pieris. Query rewriting and optimization for ontological databases. ACM Trans. Database Syst., 39(3):25:1 25:46, 2014. [Lutz and Sabellek, 2017] Carsten Lutz and Leif Sabellek. Ontology-mediated querying with the description logic EL: trichotomy and linear datalog rewritability. In IJCAI, pages 1181 1187, 2017. [Poggi et al., 2008] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. Data Semantics, 10:133 173, 2008. [Rossman, 2008] Benjamin Rossman. Homomorphism preservation theorems. J. ACM, 55(3):15:1 15:53, 2008. [Vardi, 1992] Moshe Y. Vardi. Automata theory for database theoreticans. In Theoretical Studies in Computer Science, pages 153 180, 1992. [Vardi, 1998] Moshe Y. Vardi. Reasoning about the past with two-way automata. In ICALP, pages 628 641, 1998.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)