# query_rewriting_for_ontologymediated_conditional_answers__84ece55e.pdf

The Thirty-Fourth AAAI Conference on Artiﬁcial Intelligence (AAAI-20)

Query Rewriting for Ontology-Mediated Conditional Answers

Medina Andres el, Magdalena Ortiz, Mantas ˇSimkus {andresel, ortiz}@kr.tuwien.ac.at, simkus@dbai.tuwien.ac.at Institute of Logic and Computation, TU Wien, Austria

Among many solutions for extracting useful answers from incomplete data, ontology-mediated queries (OMQs) use domain knowledge to infer missing facts. We propose an extension of OMQs that allows us to make certain assumptions for example, about parts of the data that may be unavailable at query time, or costly to query and retrieve conditional answers, that is, tuples that become certain query answers when the assumptions hold. We show that querying in this powerful formalism often has no higher worst-case complexity than in plain OMQs, and that these queries are ﬁrst-order rewritable for DL-Lite R. Rewritability is preserved even if we allow some use of closed predicates to combine the (partial) closedand open-world assumptions. This is remarkable, as closed predicates are a very useful extension of OMQs, but they usually make query answering intractable in data complexity, even in very restricted settings.

Introduction Querying incomplete data and extracting useful answers from it is a long-standing challenge, and many approaches have been proposed over the years. Ontology-mediated queries (OMQs) are a promising way to exploit domain knowledge for this purpose (Xiao et al. 2018; Bienvenu and Ortiz 2015). In an OMQ a regular database query is paired with ontological knowledge expressed as a theory in a suitable logical formalism; frequently, Description Logics (DLs) are used to write ontologies (Baader et al. 2003). When the OMQ is evaluated over an input dataset, it can access not only the explicit facts, but also implicit facts that can be inferred from the data and the ontology. OMQs are useful, but in many scenarios they are insufﬁcient. In particular, they provide limited means to return relaxed query answers, i.e. answers that are not certain (in terms of logical entailment), but that are still related to the initial query. To obtain more and better answers from incomplete data, we extend OMQs in a novel way, building on a classic idea: rather than obtaining only certain answers, we retrieve conditional answers which, in a nutshell, pair tuples (which need not be certain answers) with sets of facts that would

Copyright c 2020, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

make them true certain answers. Traditional certain answers are just assumptive answers where the assumption component is empty. Conditional answers are a well known tool for acccessing data that need not be current and complete (Demolombe 1992; 1998; Christiansen and Andreasen 1998), and have been explored in databases for several purposes, including temporal data (Arenas and Bertossi 2002), data streams (Cruz-Filipe, Gaspar, and Nunes 2019), and other forms of incomplete data (Zhang et al. 2007). In the setting of OMQs, they are very natural. For example, when querying diverse data sources a key application of OMQs parts of the data may be missing completely or be very costly to obtain.With the above in mind, we enrich OMQs with assumption patterns, which deﬁne the form of complex assertions that may become true about some individuals. This leads to a powerful formalism, but does not impact signiﬁcantly the worst-case complexity compared to plain OMQs. Importantly, our OMQs with assumptions are ﬁrstorder (FO) rewritable for DL-Lite R (Artale et al. 2009). An OMQ is said to be FO-rewritable if it can be converted into FO queries that return all the answers to the OMQ when evaluated over the data alone (with no ontology). FO-rewritability is highly desired: it implies very low data complexity, and that OMQ answering can be realized using existing database technologies (Bienvenu et al. 2018; Gottlob et al. 2014).

Our extension of OMQs turns out to be particularly powerful when complete and incomplete information coexist. Based on classical logic, standard OMQs make the openworld assumption (OWA), where a fact that cannot be inferred may be either true or false. Traditional databases, in contrast, make the closed-world assumption (CWA): facts that cannot be inferred are assumed false. The OWA makes OMQs useful for incomplete data, but at the same time, too weak to yield useful answers when the data is known to be partially complete. For example, consider an OMQ where a query asking for stations accessible from airport Heathrow: q(x) = conn(Heathrow, x) Station(x) is paired with a TBox stating that each airport is connected to some station: (Airport conn.Station in DL notation). Under the classical semantics, this OMQ does not have answers over the dataset {Airport(Heathrow), Station(Hyde Park)}: we

know that Heathrow is connected to station but we cannot be certain that it is Hyde Park (could be another station). However, if we know that the list of stations is complete in the data, and thus that s is the only station, then we expect Heathrow to be an answer. A simple yet powerful way to achieve this is to explicitly declare the predicates that should be assumed complete, e.g., marking Station as closed. Unfortunately, doing so is costly: even for the most restricted OMQs languages, like CQs paired with DL-Lite ontologies, closed predicates make OMQ answering intractable in data complexity, and destroy FO-rewritability (Lutz, Seylan, and Wolter 2013). Remarkably, in our OMQs, we can use closed predicates in our assumptions, for instance assuming that conn.Station(Heathrow), will allow us, to obtain Hyde Park as part of the answer, while preserving FOrewritability. Proofs that are ommitted due to space restrictions can be found in the extended version.

Preliminaries Let NC, NR, NI, and NV be countably inﬁnite, disjoint sets of concept names, role names, individuals, and variables, respectively. Elements of N R := NR {r | r NR} are called roles, and r is the inverse of r. (ELI) concepts are deﬁned inductively: (a) and each A NC are concepts, and (b) if C, D are concepts and r is a role, then C D and r.C are concepts. A basic concept is any A NC, or a concept r. (written also r). A (DL-Lite R) concept inclusion has the form B1 B2, a role inclusion the form r1 r2, and a disjointness assertion the form disj(B1, B2) or disj(r1, r2), where B1, B2 are basic concepts and r1, r2 are roles. A (DL-Lite R) TBox T is a ﬁnite set of concept inclusions, role inclusions, and disjointness assertions. Elements of NI NV are terms. Atoms are of the form r(t, t ) and C(t), where t, t are terms, r N R , and C is a concept. Atoms without variables are called ground atoms or assertions. An ABox A is a ﬁnite set of assertions. We use the term database, and the symbol D, for ABoxes with assertions only of the form A(t) or r(t, t ), where A NC and r NR.1 We employ the standard notation for interpretations. They have the form I = (ΔI, I), where ΔI is the domain and I is the interpretation function. We make the standard name assumption (SNA), since some of our results consider closed predicates.2 In particular, we assume that NI ΔI and c I = c for all c NI. The notions of modelhood, consistency and entailment are standard. Note that each database D can be seen as an interpretation ID with domain NI and with AI D = {c | A(c) D} for all A NC and r I D = {(c, c ) | r(c, c ) D} for all r NR. A ﬁrst-order (FO) query ϕ is any formula of function-free FO logic, built using atoms over concept and role names only, as well as equality atoms t1 = t2 for terms t1, t2. We use terms(ϕ), vars(ϕ) and free(ϕ) to denote the terms,

1This use of databases is not related to the openor closedworld assumption, and is not to be confused with DBoxes (Franconi, Ib a nez-Garc ıa, and Seylan 2011). 2With no closed predicates, the SNA doesn t impact our results.

the variables, and the free (or answer) variables of ϕ, respectively. The arity of ϕ is |free(ϕ)|. We may write ϕ( x) to indicate that free(ϕ) = x (here we assume an arbitrary but ﬁxed ordering of free(ϕ)). If ϕ( x) has the form y.(ℓ1 ℓk), where ℓ1, . . . , ℓk are atoms, then it is a conjunctive query (CQ). We may sometimes treat such a ϕ( x) as the set {ℓ1, . . . , ℓk}. If ϕ( x) has the form q1( x) qk( x), where each qi( x) is a CQ, we call it a union of conjunctive queries (UCQ). An answer to an FO query ϕ of arity k over an interpretation I is a k-tuple c of individuals such that I ϕ( c). We say that a 0-ary (aka Boolean) query evaluates to true in I if I |= ϕ (i.e., the empty tuple is an answer), and to false otherwise. A (k-ary) ontology-mediated query (OMQ) is a pair (q( x), T ) where q( x) is a k-ary CQ and T is a TBox. Its (certain) answers over a database D are deﬁned as ans(Q, D) = { c (NI)k | I |= q( c) for all I |= (T , D)}. The query answering problem is to decide, given an OMQ Q, a tuple a and a database D, whether a ans(Q, D). We use standard notations for substitutions, which map terms to terms, and call a substitution grounding if all terms are mapped to individuals. A substitution θ uniﬁes two sets of atoms Γ1 and Γ2 if Γ1θ = Γ2θ.

Assumptive Ontology-mediated Queries We extend standard OMQs with assumption patterns, which prescribe the shapes of additional assertions that we may want to assume true for query answering. In this paper, we choose to deﬁne them as ELI concepts and roles: this appears to be a relatively simple language for the assumption patters, and it naturally captures interesting examples.

Deﬁnition 1. An assumptive ontology-mediated query is a triple Q = (q( x), T , H), where (q( x), T ) is an OMQ, and H is a set of atoms called assumption patterns.

Intuitively, Q gives us conditional answers over D: pairs ( c, E) where c is an answer to (q( x), T ) over D E.

Deﬁnition 2. A pair ( a, E) of a tuple of constants a and an ABox E is a conditional answer to Q = (q( x), T , H) over a database D if

1. D E is consistent with T , and 2. there is substitution π such that: (a) π( x) = a, (b) E Hπ, and (c) T , D E qπ.

We denote by cans(Q, D) the set of conditional answers to Q over D. The conditional answering problem is deﬁned as follows: Given database D, AOMQ Q and a pair ( a, E), test if ( a, E) is a conditional answer of Q w.r.t. D. We call ( a, E) cans(Q, D) a minimal conditional answer if there is no E E such that ( a, E ) cans(Q, D), and denote the set thereof by cansmin(Q, D).

The set of (minimal) conditional answers can be in general inﬁnite, e.g., if Q = (A(x), , {A(x)}) and D = then cansmin(Q, D) = {(c, A(c)) | c NI}. We often consider the conditional answers where π ranges over the active domain of D, adom(D) = {a | A(a) D} {a, b | r(a, b) D}. We denote this subset of conditional answers by cansadom(Q, D).

Note that H and q may share variables, and this restricts the query positions for which some assumption patterns may be applicable (as the same substitution π is applied to both).

Example 1. Ann wants to ﬁnd vegan restaurants in a central district that are easily accessible by bus. She can use a query

q(x) = yz Vegan Rest(x) loc Next(x, y) Bus Stop(y) part Of (y, z) Cnt Dst(z).

But she knows that, in the database, some spatial relations like loc Next are incomplete, and to get their complete extensions queries are sent to a remote and sometimes slow geospatial database. Therefore, she chooses to enable assumptions H = {loc Next(x, y), Bus Stop(y)} to postpone verifying whether points are next to a bus stop. The query has no certain answers over the database A = {Vegan Rest(r1), Bus Stop(s1), Bus Stop(s2) part Of (s1, a), Cnt Dst(a)}, but the AOMQ (q(x), , H) produces (r1, {loc Next(r1, s1)}) as (minimal) conditional answer, and she can later verify whether the candidate r1 indeed has the bus stop s1 nearby. More complex assumptions like H = {loc Next(x, y), Bus Stop part Of .Cnt Dst(y)} give (r1, {loc Next(r1, s2), Bus Stop part Of .Cnt Dst (s2)}) as a conditional answer, indicating that that r1 is also an answer to q(x) if it is known to be located next to s2 and additionally s2 is situated in a central district.

The example illustrates that conditional answers for AOMQs can retrieve additional information that we would not obtain with standard OMQs. Indeed, AOMQs are a generalization of standard OMQs: all certain answers are conditional answers, provided that the database is consistent.

Proposition 1. Let Q = (q( x), T ) be an OMQ. Then, for every database D consistent with T we have

a ans(Q, D) iff ( a, ) cans((q( x), T , ), D)

For OMQ languages where we can rely on exisiting algorithms for entailment of regular OMQs and consistency testing, it is not hard to obtain an algorithm for answering AOMQs. Given a candidate conditional answer ( a, E), a database D, and an AOMQ Q = (q( x), T , H), we can decide whether ( a, E) cans(Q, D) in three steps: 1. Guess H H and a substitution π from vars(H ) free(q) to individuals in E. 2. Check if H π = E, if π( x) = A, and if D E is consistent with T . 3. Test whether T , D E qπ. For typical OMQ languages that combine well known DLs with CQs, the combined complexity of OMQ answering falls into the classes NP, EXPTIME, or 2-EXPTIME. In such cases, this simple algorithm yields tight complexity bounds, as the cost of the additional steps is subsumed by the cost of answering OMQs.

Theorem 1. Given ( a, E), a database D, and an AOMQ Q = (q( x), T , H), the following results hold for the combined complexity of deciding ( a, E) cans(Q, D):

It is NP-complete for DL-Litecore, DL-Lite R and EL . It is EXPTIME-complete for ELI, Horn-SHIQ and Horn-SHOIQ. It is EXPTIME-complete for SHIQ, SHOQ and SHOI.

Rewriting DL-Lite R AOMQs In this section we provide a rewriting algorithm for AOMQs Q = (q( x), T , H) where T is a DL-Lite R TBox. The algorithm takes such a Q as input, and it outputs a set of FOqueries Rew(Q), whose answers over any database D are in one-to-one correspondence with the conditional answers of Q over D. Given a TBox T , we denote by neg(T ) the set of all disjointness axioms in T , and by pos(T ) the set T \ neg(T ). Our procedure to obtain the full FO-rewriting for AOMQs has three steps: (1) We rewrite Q w.r.t. pos(T ) using the standard DL-Lite R rewriting rules. (2) Each resulting query is rewritten w.r.t. H. (3) Finally, we take into account neg(T ).

Rewriting w.r.t. Positive TBox It is a well-known result due to (Calvanese et al. 2007) that an OMQ with a DL-Lite R TBox can be reformulated as a UCQ that can simply be evaluated over an input dataset (without taking the input TBox into account).

Proposition 2 ((Calvanese et al. 2007)). For an OMQ Q = (q( x), T ), where T is a DL-Lite R TBox, one can compute a UCQ q T ( x) such that ans(Q, D) = { c | ID q T ( c)} for every database D consistent with T .

The above rewriting from an ontology-mediated CQ into a plain UCQ preserves conditional answers.

Lemma 1. Let (q( x), T ) be an OMQ where T is a DL-Lite R TBox. Then, for every database D consistent with T and every set of assumption patterns H,

cans((q( x), T , H), D) = cans((q T ( x), , H), D).

Rewriting w.r.t. Assumption Patterns In the next step, we rewrite the query with respect to the assumption patterns. The core idea is to identify subqueries which are made true by the assumptions, and drop them. Since H can contain complex atoms, it is convenient to expand it as follows.

Deﬁnition 3. An expansion of a set H of assumption patterns is obtained by adding atoms using the following rules:

If (C1 C2)(t) H, add C1(t) and C2(t) to H. If ( p.C)(t) H and there are no {p(t, y), C(y)} H, add p(t, y) and C(y) to H for some fresh variable y. If r (t, t ) H, then add r(t , t) to H. If r(t, t ) H, then add r (t , t) to H.

We use exp(H) to denote a ﬁxed arbitrary expansion of H.

In AOMQs, we can assume any grounding of the atoms in the hypothesis H, and this may make some atoms of the query true. We reﬂect this in the rewriting by simply dropping such atoms. In general, we may drop atoms that contain

answer variables, or that share variables with other atoms that are not removed. These variables need some care: we must keep track of the terms in H that will give us their query match in the conditional answers.

Deﬁnition 4 (Rewriting w.r.t. H). Let q( t) be a CQ and Γ a subset of atoms in q. We denote by keep(q, Γ) the set of variables in vars(Γ) that are in free(q) vars(q \ Γ). Let H be a set of assumption patterns and q( t) a CQ. We write q( x) H q ( x ) and call q a rewriting of q w.r.t. H if it is obtained by choosing (i) a subset Γ of the atoms of q and a subset H of H, (ii) a substitution h that uniﬁes Γ and exp(H ), and which has h(x) terms(H ) for each x keep(q, Γ), and then doing the following: 1. Replace by every atom in Γ. 2. Add {x = h(x) | x keep(q, Γ) vars(H )}. 3. Drop existential quantiﬁcation for each x vars(H ). The set hrew(q, H) contains all q such that q H q . Please note that the rewritten queries may have more free variables than the original q. After each rule application the resulting q has vars(H ) as free variables, additionally to the original free(q). These additional free variables and the added equality atoms will allow us to read from each (ordinary) answer to some rewritten OMQ, the assignment for H that gives the corresponding conditional answer. Example 2. Recall q(x) from Example 1, and consider H = {loc Next(x , y ), Bus Stop(y )}. Take Γ = {loc Next(x, y), Bus Stop(y)} and H = H. For h = {x x , y y }, since keep(q, Γ) = {x, y} and h(x), h(y) vars(H), we obtain the following query q (x, x , y ) as a rewriting of q w.r.t. H:

yz Vegan Rest(x) part Of (y, z) Cnt Dst(z)

x = x y = y .

We obtain ans(q (x, x , y ), D) = {(r1, r1, s1)} over our example database D, and the substitution π = {x r1, x r1, y s1} yields the conditional answer (r1, {loc Next(r1, s1), Bus Stop(s1)}). Note that if H = { loc Next.Bus Stop(x)} the rewriting is not applicable to q(x), since there is no uniﬁer h such that h(y) = x, for y keep(q, Γ). Intuitively, this is because H just ensures that some point is next to a bus stop, but one cannot ensure that the bus stop is in a central district. In contrast, H = { loc Next.(Bus Stop part Of.Cnt Dst)(x)} yields q (x) = Vegan Rest(x) x = x. The key to the correctness is that the uniﬁer h for Γ and H exists iff the atoms of Γ are made true in a grounding of H, so each rewriting step drops precisely atoms that would be made true by assuming some grounding of the hypothesis. We show that the rewriting is complete. Lemma 2. Let Q = (q( x), , H) be an AOMQ where q( x) is a CQ. For every substitution π, every database D, and H H, if (π( x), H π) cansadom(Q, D), then there is some q H q such that π( x, y) ans(q ( x, y), D).

Proof(Sketch). Take D, H H and (π( t), H π) cansadom(Q, D). By deﬁnition, D H π qπ. Consider

maximal subquery q such that D q π and choose Γ the remainder subquery. Making some case distinctions according to the size and shape of q , the rewriting is shown to be applicable to Γ, thus obtaining q which consists of q plus some term equalities, which is matched by π into D.

The following lemma shows that it is also sound. Lemma 3. Let Q = (q( x), , H) be an AOMQ and where q is a CQ, and let D be a database. If q( x) H q ( x y) and π is a substitution with π( x, y) ans(q , D), then there is some H H such that (π( x), H π) cansadom(Q, D).

Proof(Sketch). Arbitrarily ﬁx D, let q ( x y) be a rewriting of q w.r.t. H and π an arbitrary substitution such that π( x, y) ans(q ( x y), D). By deﬁnition, for some Γ q and H H such that y = vars(H ) there is a uniﬁer h such that h(x) y, for each x keep(q, Γ). Since Γh = Γ h and x = h(x) q , for each x keep(q, Γ) y, and since y free(q ), we obtain that Γ π (Γ {x = h(x) | x keep(q, Γ) y})π. It follows that, D H π qπ hence (π( x), H π) cansadom(Q, D).

To rewrite an AOMQ, we apply ﬁrst the standard OMQ rewriting and then use the assumption patterns. It is then convenient to identify UCQ q T ( x) as the set of its CQs, denoted by trew(q( x), T ). Deﬁnition 5 (Perfect Rewriting of AOMQ over pos(T )). Let Q = (q( x), pos(T ), H) be an AOMQ where T is a DL-Lite R TBox. Then its perfect rewriting is the following set of CQs Rew(Q)

{q ( y) | q ( y) hrew(q , H) and q ( x) trew(q( x), T )}.

Note that the queries in Rew(Q) may not all share the same answer variables, hence we write it as a set of CQs rather than a UCQ. By Lemmas 1, 2, and 3, Rew(Q) is sound and complete for AOMQs over TBoxes with no disjointness assertions. The proof can be found in the online version, and it is a special case of Theorem 2 below.

Incorporating the Disjointness Assertions Now we consider disjointness assertions. In standard OMQs, to evaluate an OMQ (ϕ, T ) one can usually check ﬁrst if the given database D is consistent with T , and then answer ϕ assuming consistency of (T , D). Unfortunately, this approach is not sufﬁcient for AOMQs, since we also need to verify whether a candidate E causes the inconsistency of an otherwise consistent (T , D). For that reason, we incorporate the inconsistency check as part of the rewritten query. Deﬁnition 6 (Inconsistency CQs). For any given DL-Lite R TBox T , let α neg(T ). We deﬁne the (Boolean) CQ qα() as follows: if α = disj(A, A ), then qα is x A(x) A (x), if α = disj(A, r), then qα is xy A(x) r(x, y), if α = disj(r1, r2), then qα is xy r1(x, y) r2(x, y). For a given set of assumption patterns H, the set of inconsistent CQs for (T , H), denoted by Rew (H, T ) is

α neg(T ) {q | q hrew(q , H), and q trew(qα, T )}.

Note that, again, we obtain a set of queries with possibly different subsets of vars(H) as free variables. We show the following using Lemmas 2 and 3.

Lemma 4. Let T be a DL-Lite R TBox and H a set of assumption patterns. For any database D consistent with T , and π a substitution that ranges over adom(D), we have that π( y) ans(q , D) for some q Rew (H, T ) iff there is some H H such that D π(H ) is inconsistent with T .

To prevent inconsistency we can negate Rew (H, T ) and add it to the previous rewriting. The only minor issue to take care of is that the queries have different free variables. For a tuple y from vars(H), we denote by Rew(Q y pos) the UCQ whose disjuncts are the queries q Rew(Qpos) with free(q) = y, and similarly, Rew y(H, T ) is a UCQ with all queries q Rew y(H, T ) with free(q) = y. Then the rewriting w.r.t. a general TBox is obtained as follows.

Deﬁnition 7 (Perfect Rewriting of general AOMQs). Let Q = (q( x), T , H) be an AOMQ such that T is a DL-Lite R TBox, Then the perfect rewriting of Q, Rew(Q) is deﬁned as

y vars(H) {Rew(Q( x, y) pos ) Rew y(H, T )}.

The following theorem is proved using Lemma 4 and the fact that perfect rewriting for Qpos is correct.

Theorem 2. Let Q = (q( x), T , H) be a DL-Lite R AOMQ. For every database D, the following are equivalent:

( a, E) cansadom(Q, D). There exists H H, ϕ( x, y) Rew(Q), a substitution π such that π( x) = a, H π = E and π( x, y) is an answer to the FO query ϕ over ID.

AOMQs with Closed Predicates It has been argued often in the literature that strengthening the usual certain answer semantics of OMQs, where all models under the usual open-world semantics of ontologies are taken into account, is highly desirable. This can be done by declaring some predicates as closed.

Deﬁnition 8. Let Σ NC NR be a set of closed predicates. An interpretation I is a Σ-model of (T , D) , written I |=Σ (T , D), if I |= (T , D) and additionally a p I implies p( a) D for all p Σ. An OMQ with closed predicates (OMQC) is a tuple Q = (q( x), T , Σ) where (q, T ) is an OMQ.

The notion of certain answers for OMQCs is lifted in the usual way from OMQs. Unfortunately, checking if a given tuple is an answer over its Σ models is intractable even if T is in the most basic DLs. For DL-Litecore, for instance, it is CONP-hard, and thus not FO-rewritable (Lutz, Seylan, and Wolter 2013). In this section, we show that AOMQs allow us to add closed predicates in a restricted, yet non-trivial way, while preserving FO-rewritability.

Deﬁnition 9 (AOMQ with closed predicates (AOMQC)). An AOMQ with closed predicates (AOMQC) is a tuple

Q = (q, H, T , Σ), where (q, H, T ) is an AOMQ and Σ is a set of concept and role names that do not occur in T . We generalize the deﬁnition of conditional answers to AOMQCs. Note the ﬁrst condition: now we cannot take arbitrary groundings E of H in conditional answers, but only groundings that respect the extensions of Σ in the input D. Deﬁnition 10. A pair ( a, E) of a tuple of constants a and an ABox E, is called a conditional answer to Q over D if 1. there is a Σ-model of T , D E, 2. p( t) E implies p( t) D for all p Σ, 3. there is substitution π such that: (a) π(free(ϕ)) = a, (b) E Hπ, and (c) T , D E Σ qπ. Conditional answers of AOMQCs capture typical scenarios where closed predicates are desirable for OMQs. Example 3. Let Q = (q(x), T , H, Σ) be an AOMQC with q(x) = y.(Skoda Model(x) has Engine(x, y) ICEng(y)), assumption patterns H = { has Engine.Skoda Eng(x)}, closed predicates Σ = {Skoda Model, Skoda Eng}, and TBox T = {Diesel Eng ICEng, Petrol Eng ICEng}. Consider also a database: D = {Skoda Model(sm17), Skoda Eng(se1), Skoda Eng (se2), D = { Diesel Eng(se1), Petrol Eng(se2)}. Evaluating Q over D produces as conditional answer (sm17, { has Engine.Skoda Eng(sm17)}). Intuitively, if we know that the only two Skoda engines are se1 and se2, and both are internal combustion (IC) engines, then it sufﬁces to assume that a Skoda model x = sm17 has a Skoda engine to infer that it has a IC engine. This example is a a minor adaptation of Example 1 in (Lutz, Seylan, and Wolter 2015). They consider a plain OMQ, and T contains also the axiom Skoda Model has Engine.Skoda Eng. They argue that sm17 is an intended answer, but their OMQ falls in a fragment that is not FO-rewritable, unlike the FO-rewritable AOMQC above. The remarkable feature of this use of closed predicates is that it does not cause the usual complexity increase: any AOMQC is FO-rewritable when the TBox is in DL-Lite R.

Rewriting with Closed Predicates We present now an FO-rewriting for AOMQCs Q = (q( x), T , H, Σ) with T a DL-Lite R TBox in normal form. Similarly as for AOMQs, we ﬁrst apply the TBox rewriting in Proposition 2, and then rewrite w.r.t. assumption patterns. This step, however, is different if H has closed predicates. Intuitively, now we cannot assume an arbitrary grounding of H, but only groundings that respect the extensions of Σ in the input database. There is no canonical way to ground H, and instead we need to iterate over all groundings and exclude those that are not valid w.r.t. Σ and the input data. However, they are determined by the ﬁnite number of possible groundings of the closed predicates, so we use an FOquery with universal quantiﬁcation to iterate over them. We denote by termsΣ(q) all terms t terms(q) such that p( t) q and p Σ. Deﬁnition 11 (Rewriting w.r.t. (H, Σ)). Let H be a set of assumption patterns, Σ a set of closed predicates and q a

CQ. A rewriting of q w.r.t. (H, Σ), written q H,Σ ϕ, has the following form:

y1, . . . , yk.(p1( y1) pk( yk) q )

where k 0, yi are fresh variables, and each pi( yi) is such that pi Σ. Such a rewriting is obtained by choosing: (i) a subset Γ of the atoms of q, and a subset H of H (ii) a substitution h that uniﬁes Γ and exp(H ), and which has h(x) terms(H ) termsΣ(H ), for each x keep(q, Γ), and then doing the following. First, we add an atom pi( yi) for each pi Σ such that pi(h( z)) H for some z vars(Γ), and then to obtain q , we proceed as follows: 1. Replace by every atom in Γ. 2. Add { z = yi | z vars(Γ), pi(h( z)) exp(H )}. 3. Add {x = h(x) | x keep(q, Γ) vars(H )}. 4. Drop from the existential quantiﬁcation all x vars(H ). The set of ϕ such that q H,Σ ϕ is denoted hrew(q, H, Σ). The rewriting generalizes Deﬁnition 4, but the main difference is that we add to the universally quantiﬁed precondition each closed p(x) in exp(H) that participates in the assumptions that make Γ true. Example 4. Let Σ = {Bus Stop} be the set of closed predicates, and let Q = (q(x), , { loc Next.Bus Stop(x)}, Σ), with q(x) as in Example 1. In this case, since we can test for all bus stops if they are part of a central district, H is applicable. Our rewriting captures this as follows: choose Γ = {loc Next(x, y), Bus Stop(y)}, let exp(H) = {loc Next(x, y ), Bus Stop(y )} and let h = {x x, y y }; h satisﬁes condition (ii), hence we obtain ϕ:

y .(Bus Stop(y ) yz.Vegan Rest(x) part Of (y, z)

Cnt Dst(z) y = y ).

We show next that the rewriting w.r.t. (H, Σ) is correct, analogously to the previous case. Lemma 5. Let Q = (q( x), , H, Σ) be a DL-Lite R AOMQC. For any database D, any H H and any substitution π, if (π( x), H π) cansadom(Q, D), then there is some ϕ such that q( x) H,Σ ϕ( x, y) and π( x, y) ans((ϕ, , Σ), D).

Proof(Sketch). For an arbitrary D and H H, take any π such that (π( x), H π) cansadom(Q, D). By deﬁnition we have that D H π Σ qπ. Intuitively, we can rely on a set of representative ground expansions of exp(H π) w.r.t. (Σ, D) to homomorphically map q into each Σ-model of D H π. Such representatives disagree only on varsΣ(exp(H π)). Then, the part of q that is mapped by π over D, is the same over all Σ-models. Therefore we obtain that there exist Γ, H and h as in Deﬁnition 11. Let q be obtained following steps 1-4 and let ϕ be of the form z1, . . . , zn.(p1( z1) pk( zk) q ), where 0 k n and pi Σ such that pi(h( z)) exp(H ) for z vars(Γ). It follows then that D Σ π(ϕ), hence π( x, y) ans((ϕ, , Σ), D).

We show next that the rewriting w.r.t (H, Σ) is sound.

Lemma 6. Let Q = (q( x), , H, Σ) be a DL-Lite R AOMQC and ϕ( x, y) a rewriting w.r.t. (H, Σ). Let D be a database consistent with T . If there is a substitution π with π( x, y) ans((ϕ, , Σ), D), then for some H H (π( x), H π) cansadom(Q, D).

Proof (Sketch). Fix D, a rewriting ϕ( x, y) w.r.t. (H, Σ), and substitution π such that π( x, y) ans((ϕ, , Σ), D). By deﬁnition there is some H H and Γ q that unify. Let expσ(H π) be the result of applying a grounding σ to exp(H π), so that p( a) D, for each p( a) expσ(H π) with p Σ. Similarly to previous case, since the subsets of atoms chosen in step (i) are uniﬁable and due to equality atoms, we obtain that D expσ(H π) Σ qπ, for each such σ. Lastly, for each Σ-model I of D H π there is some σ such that D expσ(H π) is homomorphically mapped into I. Hence, (π( x), H π) cansadom(Q, D).

The perfect rewriting is similar to the previous case, but using the rewriting w.r.t. (H, Σ) instead of w.r.t. H alone. Deﬁnition 12 (Perfect Rewriting of AOMQC over pos(T )). Let Q = (q( x), pos(T ), Σ, H) be a DL-Lite R AOMQ with closed predicates. The perfect rewriting of Q, Rew(Q) is

{ϕ ( x ) | ϕ ( x ) hrew(q , H, Σ) q ( x) trew(q, T )}.

Example 5. For Q of Example 3, trew(Q) equivalent to y.(Skoda Model(x) has Engine(x, y) (ICEngine(y)) Petrol Engine(y)) Diesel Engine(y)). Then, Rew(Q) is equivalent to: Skoda Model(x) y .(Skoda Engine(y ) y.y = y (ICEngine(y) Petrol Engine(y) Diesel Engine(y))). Evaluating such query over the data, will allow us to easily construct the expected conditional answer. Soundness of this rewriting follows from Lemmas 1, 5 and 6.

Disjointness assertions As we did above, we embed the consistency check into the perfect rewriting of an AOMQC. Deﬁnition 13 (Perfect rewriting for general AOMQCs). Let T be a DL-Lite R TBox. For a given set of assumption patterns H and a set of closed predicates Σ, the set of inconsistent queries for (H, T , Σ), denoted by Rew (H, T , Σ) is

α neg(T ) {q | ϕ hrew(q , H, Σ), and q trew(qα, T )}.

Let Q = (q, T , H, Σ) be a DL-Lite R AOMQC and let Qpos = (q, pos(T ), T , H, Σ). The perfect rewriting of Q, Rew(Q) is as follows:

y vars(H) { Rew y(H, T , Σ) (Rew(Q( x, y) pos ) Rew y(H, T , Σ)}.

The perfect rewriting is similar to Deﬁnition 7, except that in this case the ﬁrst conjunct checks if there are any groundings of H H that are consistent w.r.t. T and Σ, and then the second one checks if Rew(Q( x,vars(H )) pos ) holds for each such grounding. Analogously as before, we have the following lemma.

Q [#H] #trew(Q) #Rew(Q) #cansmin(Q) Time (sec) ans(Rew(Q)) cansmin(Q) Test gr(H) Fed SPARQL q1[3] 12 54 14415 0,6 1,1 40,4 42,7 q2[4] 44 47 1603 0,1 0,2 4,8 6,6 q3[4] 65 69 980 0,3 0,5 3,5 5,4 q4[3] 14 64 60501 2,1 4,6 145,6 163,9 q5[1] 3 6 8742 0,08 0,2 22 32,2

Table 1: Evaluation results for AOMQs over My ITS dataset.

Lemma 7. Take a DL-Lite R TBox T , a set of assumption patterns H, and a set Σ of predicates. For any database D consistent with T and any substitution π ranging over adom(D), we have π( y) ans(q , D) for some q Rew (H, T , Σ) iff there is some H H such that D π(H ) is Σ-inconsistent with T .

We can now prove the correctness of our rewriting.

Theorem 3. Let Q = (q( x), T , H, Σ) be a DL-Lite R AOMQC. For every database D, the following are equivalent:

( a, E) cansadom(Q, D). There exists H H, ϕ( x, y) Rew(Q), and a substitution π such that π( x) = a, H π = E and π( x, y) is an answer to the FO query ϕ over ID.

Empirical Evaluation To demonstrate the potential usefulness of our approach, we developed a prototype implementation of the AOMQ rewriting. It was done in Java using Apache Jena 2.11 and Jena ARQ as SPARQL query engine, and tested on a Mac Book Pro i5 2.7, Sierra OS. We used the ontology, data, and a data generation tool from the My ITS project (Eiter, Krennwallner, and Schneider 2013; Eiter et al. 2015). The tool creates ABoxes with assertions for spatial relations like loc Next out of Open Street Map data, using parameters such as distance to create large sets of facts, in addition to other local data (e.g., crowd-sourced restaurant data). Then one can pose queries that need both parts of data, as well as ontological reasoning, to get answers (e.g., hotels in residential areas close to a subway station). Given the geospatial querying example in the introduction, My ITS is an relevant test case. Indeed, instead of creating large ABoxes, we may want to keep the access to spatial data remote. To simulate this scenario, we extracted some spatial relations and out-sourced their access via a SPARQL endpoint (using Jena Fuseki). This resulted in two sources: the local datasets with 227634 RDF triples, and a remote one with more than 2 million triples. We created 5 AOMQs based on test queries of Eiter et al. (2015), and treated spatial atoms as assumption patterns. In this way, we can query the local datasets and verify in the remote access point whether the spatial relations hold, only for the relevant candidates. Table 1 shows for these queries the sizes of rewritings w.r.t. T , and (T , H), and the size of cansmin, which gives a bound on the number of remote tests. We evaluated the time needed to answer the full rewriting over the local dataset,

the time to construct the set cansmin, and the time to test remotely the spatial atoms (using SPARQL ask queries). The results show that evaluating Rew(Q) and constructing cansmin(Q) is very efﬁcient, while testing the assumptions remotely was more expensive, as expected. In practice, this delay may be amortized in many cases, e.g., if many queries share remote tests. As a sanity check, we compared the total time needed by our approach to posing a federated SPARQL query (W3C 2013) using both data sources. The latter approach was slower, even despite the fact that we disregarded ontological reasoning; naively posing the result of TBox rewriting as a federated query seems infeasible.

Related Work Conditional answers were studied since the early nineties, e.g., to cope with incompleteness in disjunctive deductive databases (Demolombe 1992). They are related to the problem of evaluating queries in hypothetically updated states of databases (Gabbay et al. 1995; Christiansen and Andreasen 1998). Grifﬁn and Hull (1997) rewrite hypothetical queries into equivalent usual queries which are evaluated using standard techniques. Recently, ten Cate et al. (2015) deﬁne socalled why-not queries, where an ontology is leveraged to obtain explanations of why tuples are not an answer, and study the complexity of obtaining most general explanations. We consider the shape of the explanations to be part of the input, while focusing on preserving worst-case (data) complexity. This work is most in line with the work of Calvanese et al. (2013), where negative answers are employed for describing a tuple of individuals and an associated explanation to why it is not an answer to the given query. Explanations there are ABoxes with assertions over concept and role names, while here this is generalized to sets of ELI atoms. Additionally, we allow for closed predicates. Closed predicates are very useful to cope with partial incompleteness, but they can increase dramatically the complexity of reasoning. So far most results were negative, even for lightweight DLs (Lutz, Seylan, and Wolter 2013; 2015; Ngo, Ortiz, and ˇSimkus 2016).

Conclusions We have introduced AOMQs, which are extensions of OMQs with assumption patterns, designed for leveraging information when querying incomplete databases. Answering AOMQs consists of computing conditional answers, which generally extend the answers of OMQs with tuples that are made true by the assumptions. In the case of DL-Lite R

AOMQs, they remain FO-rewritable even in the presence of closed predicates. A simple prototype for constructing and testing minimal conditional answers shows promising results, and suggests that this approach may be useful in scenarios when answering OMQs over all relevant data at once is costly or infeasible. For future work, it would be interesting to consider different shapes of assumption patterns, and to extend the evaluation with closed predicates.

Acknowledgments. This work was supported by the Austrian Science Fund (FWF) projects P30360, P30873 and W1255.

References Arenas, M., and Bertossi, L. 2002. Hypothetical temporal reasoning in databases. Journal of Intelligent Information Systems 19(2):231 259. Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The dl-lite family and relations. J. Artif. Int. Res. 36(1):1 69. Baader, F.; Calvanese, D.; Mc Guinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation, and Applications. New York, NY, USA: Cambridge University Press. Bienvenu, M., and Ortiz, M. 2015. Ontology-Mediated Query Answering with Data-Tractable Description Logics. Cham: Springer International Publishing. 218 307. Bienvenu, M.; Kikot, S.; Kontchakov, R.; Podolskii, V. V.; and Zakharyaschev, M. 2018. Ontology-mediated queries: Combined complexity and succinctness of rewritings via circuit complexity. J. ACM 65(5):28:1 28:51. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3):385 429. Calvanese, D.; Ortiz, M.; ˇSimkus, M.; and Stefanoni, G. 2013. Reasoning about explanations for negative query answers in dl-lite. J. Artif. Intell. Res. 48:635 669. Christiansen, H., and Andreasen, T. 1998. A practical approach to hypothetical database queries. In Freitag, B.; Decker, H.; Kifer, M.; and Voronkov, A., eds., Transactions and Change in Logic Databases, 340 355. Berlin, Heidelberg: Springer Berlin Heidelberg. Cruz-Filipe, L.; Gaspar, G.; and Nunes, I. 2019. Hypothetical answers to continuous queries over data streams. Co RR abs/1905.09610. Demolombe, R. 1992. A strategy for the computation of conditional answers. In ECAI, 134 138. Demolombe, R. 1998. Answers about validity and completeness of data: Formal deﬁnitions, usefulness and computation technique. In Andreasen, T.; Christiansen, H.; and Larsen, H. L., eds., Flexible Query Answering Systems, 138 147. Springer Berlin Heidelberg. Eiter, T.; Pan, J. Z.; Schneider, P.; ˇSimkus, M.; and Xiao, G. 2015. A rule-based framework for creating instance data from openstreetmap. In RR, volume 9209 of Lecture Notes in Computer Science, 93 104. Springer.

Eiter, T.; Krennwallner, T.; and Schneider, P. 2013. Lightweight spatial conjunctive query answering using keywords. In The Semantic Web: Semantics and Big Data, 243 258. Springer Berlin Heidelberg. Franconi, E.; Ib a nez-Garc ıa, Y. A.; and Seylan, I. 2011. Query answering with dboxes is hard. Electr. Notes Theor. Comput. Sci. 278:71 84. Gabbay, D.; Giordano, L.; Martelli, A.; and Olivetti, N. 1995. Hypothetical updates, priority and inconsistency in a logic programming language. In Logic Programming and Nonmonotonic Reasoning, 203 216. Springer Berlin Heidelberg. Gottlob, G.; Kikot, S.; Kontchakov, R.; Podolskii, V. V.; Schwentick, T.; and Zakharyaschev, M. 2014. The price of query rewriting in ontology-based data access. Artif. Intell. 213:42 59. Grifﬁn, T., and Hull, R. 1997. A framework for implementing hypothetical queries. SIGMOD Rec. 26(2):231 242. Lutz, C.; Seylan, I.; and Wolter, F. 2013. Ontologybased data access with closed predicates is inherently intractable(sometimes). In IJCAI, 1024 1030. IJCAI/AAAI. Lutz, C.; Seylan, I.; and Wolter, F. 2015. Ontology-mediated queries with closed predicates. In IJCAI, 3120 3126. AAAI Press. Ngo, N.; Ortiz, M.; and ˇSimkus, M. 2016. Closed predicates in description logics: Results on combined complexity. In KR, 237 246. AAAI Press. ten Cate, B.; Civili, C.; Sherkhonov, E.; and Tan, W. 2015. High-level why-not explanations using ontologies. In PODS, 31 43. ACM. W3C. 2013. Sparql 1.1 federated query. w3c recommendation 21 march 2013. https://www.w3.org/TR/sparql11federated-query/. Xiao, G.; Calvanese, D.; Kontchakov, R.; Lembo, D.; Poggi, A.; Rosati, R.; and Zakharyaschev, M. 2018. Ontologybased data access: A survey. In IJCAI, 5511 5519. ijcai.org. Zhang, Y.; Chen, H.; Sheng, H.; and Wu, Z. 2007. Applying hypothetical queries to e-commerce systems to support reservation and personal preferences. In Desai, B. C., and Barker, K., eds., Eleventh International Database Engineering and Applications Symposium (IDEAS 2007), September 6-8, 2007, Banff, Alberta, Canada, 46 53. IEEE Computer Society.