# relaxing_and_restraining_queries_for_obda__f90c01db.pdf

The Thirty-Third AAAI Conference on Artiﬁcial Intelligence (AAAI-19)

Relaxing and Restraining Queries for OBDA

Medina Andres el, Yazm ın Ib a nez-Garc ıa, Magdalena Ortiz, Mantas ˇSimkus {andresel,ibanez,ortiz}@kr.tuwien.ac.at | simkus@dbai.tuwien.ac.at Institute of Logic and Computation, TU Wien, Austria

We advocate the use of ontologies for relaxing and restraining queries, so that they retrieve either more or less answers, enabling the exploration of a given dataset. We propose a set of rewriting rules to relax and restrain conjunctive queries (CQs) over datasets mediated by an ontology written in a dialect of DL-Lite with complex role inclusions (CRIs). The addition of CRI enables the representation of knowledge about data involving ordered hierarchies of categories, in the style of multi-dimensional data models. Although CRIs in general destroy the ﬁrst-order rewritability of CQs, we identify settings in which CQs remain rewritable.

Introduction In Ontology-based data access (OBDA) an ontology provides a conceptual view of a collection of data sources, and can describe knowledge about the domain of interest at a high level of abstraction, using a familiar vocabulary (Poggi et al. 2008). An advantage of having an ontology is that the knowledge can be leveraged to retrieve more complete answers from incomplete data. For example, we consider the following dataset about cultural events and their locations

Ae = {Concert(c1), Cultur Evt(ev1), Exhibition(ex1), Venue(St Opera), Venue(Vk Theater), City(Vienna), Country(Austria), occ In(c1, St Opera), occ In(ex1, Vienna), occ In(ev1, Austria), loc In(St Opera, Vienna), loc In(Vk Theater, Vienna), loc In(Vienna, Austria)}

and the following ontology that, among other knowledge, says that concerts and exhibitions are cultural events

Te = { Concert Cultur Evt, Country Location, Exhibition Cultur Evt, City Location, Cultur Evt Event, Venue Location,

occ In Event, occ In Location}. Using this knowledge, all cultural events (ex1, ev1 and c1) can be retrieved with a simple conjunctive query (CQ):

q(x) Cultur Evt(x).

In OBDA, ontologies are often written in the Description Logics (DLs) of the DL-Lite family (Calvanese et al. 2007).

Copyright c 2019, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

These DLs are tailored towards FO-rewritability of CQs. This means that evaluating a query q over a dataset A mediated by an ontology T can be reduced to evaluating a query q T , incorporating knowledge from T , over A alone, which amounts to standard query evaluation in relational databases. In our example, a rewriting of q is the following query

q Te(x) Cultur Evt(x) Exhibition(x) Concert(x).

In this paper, we advocate a novel use of ontologies complementary to OBDA query answering. Namely, we use ontologies to modify queries, relaxing or restraining them, so that they can retrieve either more or less answers over a given dataset. This can help users reﬂect their information needs and ﬂexibly explore datasets. We build on the observation that query restrictions can be obtained using the standard DL-Lite rewriting rules (Calvanese et al. 2007). For example, qc(x) Concert(x) is a restriction of q, and it occurs as a disjunct in its rewriting q Te. Moreover, these rewriting rules have natural counterparts that produce relaxations. In our running example, if answers to a query for concerts are too scarce, one might get more answers by relaxing it to one asking for all cultural events. Conversely, if a query for cultural events produces too many answers, it is possible to restrict it to events of a speciﬁc type, for instance concerts. Notably, there are intuitive answers and reformulations that cannot be produced with the standard DL-Lite rewriting rules and their counterparts. For example, consider a query retrieving concerts occurring in Vienna:

q2(x) Concert(x), occ In(x, y), y = Vienna.

In the presence of standard DL-Lite ontologies, there are no answers to q2 when evaluated over (Te, Ae), although c1 may be considered an answer to q2 according to the intuition that if an event occurs in a venue located in a city, then it occurs in that city. In order to capture this kind of knowledge, we propose to extend the expressive power of DL-Lite with complex role inclusions (CRIs). For instance, adding

occ In loc In occ In (1)

to our example captures the intuition above, and makes c1 an answer of q2. The addition of CRIs enables DL-Lite to leverage hierarchical knowledge not captured by subclass relations. Indeed, venues, cities, and countries can be seen as different levels of a dimension we can call Location. Similarly, a Time dimension could include days, months, and

years, while the physical parts of complex objects may be hierarchically ordered along a Component dimension. Dimensions lie at the core of the so-called multi-dimensional data model (Hurtado and Mendelzon 2002) used for storing and accessing data at different granularity levels. We show that the addition of CRIs enables DL-Lite to leverage dimensional knowledge. Unfortunately, CRIs in DLs are computationally costly: unrestricted they easily lead to undecidability (Horrocks and Sattler 2004), and critically for DL-Lite, even one ﬁxed CRI destroys FO-rewritability of CQs. For this reason we devote a section to deﬁning an expressive setting that supports CRIs and enjoys FO-rewritability. Along with the addition of CRIs, we propose a set of reformulation rules operating not only along the subclass and subrole relations, but also along CRIs. For example we can use (1) to reformulate the query

q3(x) Concert(x), occ In(x, y), City(y),

from all concerts occurring in a city, to those occurring in some more speciﬁc location:

q 3(x) Concert(x), occ In(x, z), loc In(z, y), City(y).

We propose another set of rules that not only use the knowledge from the ontology, but also use instances of concepts and relations, as well as inclusions between concepts guaranteed to hold in the current dataset. For example, we can restrict q2 to ask for concerts in the State Opera, or relax it to concerts in Austria. Such reformulations are similar to the drilling down and rolling up operations used for navigating along a dimension. Note that these reformulations are not data independent, but instead rely on the current dataset Ae.

Preliminaries As usual, NC, NR, and NI are countable inﬁnite alphabets of concept, role, and individual names, respectively. In what follows, we will use A, A to denote elements in NC, s, p, p , elements in NR and a, b, elements in NI. In DL-Lite H(Artale et al. 2009), concepts B are built according to the grammar

B := A | r; r := p | p ,

where p is called an inverse role. The set of roles is deﬁned as NR = NR {p | p NR}. We assume w.l.o.g. that a DL-Lite HTBox (or ontology) T is a ﬁnite set of concept inclusion axioms taking the following normal form A A , A p, p A, p s, p s , together with a set of disjointness axioms of the form disj(A, A ), and disj(p, p ). An ABox (or dataset) A is a ﬁnite set of assertions A(a), and p(a, b). We denote the set of individuals occurring in A as ind(A). A knowledge base (KB) is a pair K = (T , A). The semantics is deﬁned in terms of interpretations I = ( I, I) consisting of a non-empty domain I and an interpretation function I, that complies with the standard name assumption i.e., a I = a for every individual. Satisfaction is deﬁned as usual. For K = (T , A) we write I |= K if I satisﬁes every axiom in T and every assertion in A, and in that case we say that I is a model of K. We say K is satisﬁable if it has a model. We consider the class of conjunctive queries and unions thereof. A term is either an individual name or a variable.

A conjunctive query (CQ) is a ﬁrst order formula with free variables x and existential variables y that takes the form q( x) ϕ( x, y), with ϕ a conjunction of atoms of the form A(x), r(x, y), and t = t , where t, t range over terms. Instance queries are CQs with exactly one atom and no existential variables. The terms occurring in q are denoted terms(q), and the variables vars(q). The free variables x of a query are called answer variables. Let I be an interpretation, q( x) a CQ. An answer to q in I is a tuple a of elements from I of length | x| such that there is a map π : terms(q) 7 I satisfying (i) π( x) = a, (ii) π(b) = b for each individual b, (iii) I |= P(π( z)) for each atom P( z) in q, and (iv) π(t) = π(t ) for each atom t = t in q, and in that case we write I |= q( a). The map π is called a match for q in I. The certain answers of q( x) over A w.r.t. T are denoted cert(q, T , A) and deﬁned as set of the tuples a such that I |= q( a) for every model I of (T , A). When we talk about the computational complexity of query answering, we mean the following decision problem: given a KB K, a query q, and a tuple of individuals a, check whether a cert(q, T , A).

DL-Lite with Complex Role Inclusions

In this section, we study an extension of DL-Lite H with complex role inclusions, which are critically important in our approach to ontology-based relaxing and restraining of queries. A complex role inclusion (CRI) is an expression of the form r s t, with r, s, t N R . An interpretation I = ( I, I) satisﬁes r s t if (d1, d2) r I, (d2, d3) s I imply (d1, d3) t I, for all d1, d2, d3 I. The addition of CRIs to DLs may lead to undecidability of reasoning (Horrocks and Sattler 2004), hence syntactic conditions such as regularity (Kazakov 2010) are often needed. In what follows, we assume a set NRs NR of simple roles closed w.r.t. inverses (i.e. s NRs implies s NRs); each r NR \ NRs is a non-simple role. We extend DL-Lite H with CRIs as follows: Deﬁnition 1 (DL-Lite HR). A DL-Lite HR TBox T is a DL-Lite H TBox that may also contain CRIs, and satisﬁes s NRs and t NR \ NRs, for every r s t T ; if s t T and t NRs, then s NRs.

Properties such as FO-rewritability are affected by CRIs. Indeed, using a single CRI r s r, it is possible to express that r corresponds to the transitive closure of role s on a given graph. Since KB satisﬁability in DL-Lite with transitive roles is NLOGSPACE-hard, the following follows easily. Lemma 1. (Artale et al. 2009) Answering instance queries in DL-Lite HR is NLOGSPACE-hard in data complexity, already for TBoxes consisting of the CRI r s r only. Therefore CQs are not FO-rewritable w.r.t. DL-Lite HR

TBoxes. Our goal next is to present a restricted DL-Lite HRbased setting that is expressive enough to capture the desired scenarios and that still supports FO-rewritability. To this end, we ﬁrst deﬁne a fragment of DL-Lite HR that disallows cyclic dependencies among roles in CRIs. This non-recursive fragment of DL-Lite HR is quite expressive

(e.g., KB satisﬁability is not tractable) and supports FOrewritability of CQs, yet it is not sufﬁcient to cover our motivating example that involves recursion. To overcome this, we carefully relax the non-recursiveness requirement so that desired cyclic dependencies in CRIs are allowed, obtaining recursion-safe DL-Lite HR. We then identify a sufﬁcient condition (over ABoxes) for eliminating recursive CRIs, return to the non-recursive setting, and regain FOrewritability. Moreover, we show that satisﬁability of such knowledge bases is tractable.

Non-recursive DL-Lite HR

We start by deﬁning a suitable notion of recursive CRIs. For a DL-Lite HR TBox T , the recursion graph GT of T is the directed graph that contains (i) a node v A for each concept name A in T , (ii) a node vr for each role name r in T , and (iii) there exists an edge from a node v P to a node v P whenever P occurs on the left-hand-side and P on the righthand-side of an axiom in T . A CRI t s r is recursive w.r.t. a TBox T if GT has a path from vt or vs to vr. In this case we also say that r is a recursive role in T .

Deﬁnition 2. A DL-Lite HR non-rec TBox is a DL-Lite HR TBox T without recursive CRIs. We now deﬁne query rewriting rules for non-recursive DL-Lite HR. For a CQ q, we denote by zq an arbitrary variable not occurring in q; we will use zq in the query rewriting rules through the rest of the paper. An atom substitution θ = [Γ1/Γ2] can be applied to q if Γ1 q and the effect is to replace atoms Γ1 with atoms Γ2 in q.

Deﬁnition 3. Let T be a DL-Lite HR non-rec TBox. For CQs q, q , we write q T q whenever q is obtained by B1 replacing x by y in q, for x, y vars(q) or by applying an atom substitution θ to q, as follows: S1 θ = [A2(x)/A1(x)], if A1 A2 T and A2(x) q; S2 θ = [r(x, y)/A(x)], if A r T , r(x, y) q and y is a non-answer variable occurring only once in q; S3 θ = [A(x)/r(x, zq)], if r A T and A(x) q; S4 θ = [s(x, y)/r(x, y)], if r s T and s(x, y) q; S5 θ = [s(x, y)/r(y, x)], if r s T and s(x, y) q; S6 θ = [r(x, y)/{t(x, zq), s(zq, y)}], if t s r T and r(x, y) q;

By q T q we denote the reﬂexive, transitive closure of q T q . The rewriting of q w.r.t. T is the set rew(q, T ) of all queries (modulo isomorphisms) q such that q T q . Moreover, the absence of recursive CRIs in T ensures that rew(q, T ) is ﬁnite and can be effectively computed.

Lemma 2. For a DL-Lite HR non-rec TBox T and CQ q, the size of each q rew(q, T ) is bounded by a polynomial, and can be computed in polynomial time, in the size of T and q. We can now show FO-rewritability of CQs in DL-Lite HR non-rec.

Theorem 1. Let T be a DL-Lite HR non-rec TBox, q a CQ. For every ABox A such that (T , A) is satisﬁable, we have:

cert(q, T , A) = [

q rew(q,T ) cert(q , , A).

Surprisingly, unlike the extension with transitive roles, KB satisﬁability for non-recursive CRIs is intractable.

Theorem 2. (i) Satisﬁability of DL-Lite HR non-rec KBs is CONP-complete for combined complexity, and (ii) CQ answering over consistent DL-Lite HR non-rec KBs is in AC0 for data, and NP-complete for combined complexity.

For (i), the upper bound is obtained similarly as for standard DL-Lite: KB satisﬁability can be reduced to UCQ answering, using a CQ qα for testing whether each disjointness axiom α is violated. By Lemma 2 and Theorem 1, an NP procedure can guess one such qα, guess a q α in its rewriting, and evaluate q α over A. The lower bound can be shown by a reduction of the complement of 3SAT to KB satisﬁability. For (ii), data complexity follows from Theorem 1, while for combined complexity, NP-hardness is inherited from CQ evaluation in relational databases. An NP procedure for answering q over A w.r.t. T can guess some q rew(q, T ) (which is polynomial in q and T ) and a map π : vars(q ) ind(A), and then verify whether π is a match of q in A, which can be done in polynomial time.

Recursion-safe DL-Lite HR

In DL-Lite HR non-rec we cannot express CRIs like the one in our motivating example. To overcome this, we introduce an FOrewritable fragment of DL-Lite HR able to express certain kind of recursive CRIs.

Deﬁnition 4 (DL-Lite HR rec-safe). A recursion safe DL-Lite HR

TBox is a TBox T where every CRI r1 s r2 T satisﬁes the following conditions.

If r2 is a recursive role, then every cycle in GT containing r2 has length at most one, and r1 = r2. There is no axiom of the form B t T with t s T s or t s T s , where s T denotes the reﬂexive and transitive closure of s1 s2 T with s2 NRs.

The ﬁrst condition restricts recursion to a simple form; we show later that this form of recursion can be eliminated when the ABox satisﬁes certain conditions. The second, on the other hand ensures that every CRI is guarded by a simple role that is not existentially implied. Thus, for query answering, we can assume that only ABox individuals are connected by these guarding simple roles, and thus edges in the extension of recursive roles produced by CRIs will always contain at least one individual. Note, for example, that Te extended with (1) is recursion safe, since the simple role loc In in the CRI is not existentially implied by other axioms in Te. In contrast to non-recursive DL-Lite HR, in combined complexity KB satisﬁability and instance query answering are tractable for DL-Lite HR rec-safe KBs.

Theorem 3. KB satisﬁability and answering instance queries in DL-Lite HR rec-safe are in PTIME for combined complexity.

The proof of the theorem above relies on the construction of a particular canonical model. Let K = (T , A) be a DL-Lite HR rec-safe KB. We deﬁne an interpretation ET ,A with

domain ET ,A = D0 D1 D2, where D0 = ind(A), D1 = {car | a D0, B r T }, D2 = {cr | B r T }, and such that each concept, respectively each role name in K is interpreted as the minimal subset of ET ,A, respectively of ET ,A ET ,A, such that for all concepts A, B and all roles r, r1, r2, s, t in K, the following conditions hold: If A(d) A then d AET ,A, and if r(d, d ) A then (d, d ) r ET ,A. If B r T , then (d, car) r ET ,A if d BET ,A D0, and (d, cr) r ET ,A if d BET ,A (D1 D2). If B A T and d BET ,A then d AET ,A. If r1 r2 T and (d, d ) r ET ,A 1 then (d, d ) r ET ,A 2 . If r1 r 2 T and (d, d ) r ET ,A 1 then (d , d) r ET ,A 2 . If r s t T and (d, d ) r ET ,A and (d , d ) s ET ,A then (d, d ) t ET ,A. It can be readily veriﬁed that ET ,A is of polynomial size in K, and that whenever K is satisﬁable then ET ,A |= K. Moreover, canonicity of ET ,A is given by the following. Claim 1. For any (satisﬁable) DL-Lite HR rec-safe KB (T , A) and any instance query q, cert(q, T , A) = ans(q, ET ,A).

Recall that by Lemma 1 DL-Lite HR rec-safe TBoxes are not FO-rewritable in general. However, we will show that recursive CRIs can be eliminated provided that they are only relevant on paths of bounded length in models of K. We formalize this intuition next. Deﬁnition 5 (k-bounded ABox). Let T be a DL-Lite HR

TBox. For S a set of simple roles, an S-path of length n between a and b in ind(A) w.r.t. T is a sequence of different pairs of individuals (d0, d1), (d1, d2), . . . , (dn 1, dn) such that d0 = a, dn = b and for each (di, di+1), 0 i < n, there exists s (di, di+1) A such that s s T s for some s S. Given an ABox A and some k 0, we say that A is kbounded for T if for each Sr = {s | r s r T } there is no Sr-path of length larger than k in A. If the given ABox is k-bounded for the given k, recursive CRIs can be unfolded into non-recursive ones. Therefore, queries can be rewritten using a TBox in which all recursive CRIs have been unfolded. Deﬁnition 6 (k-unfolding, k-rewriting). For an arbitrary DL-Lite HR rec-safe TBox T , and ﬁxed k 0, a k-unfolding of T is a DL-Lite HR non-rec TBox Tk obtained by replacing each r s r T with the axioms rj 1 s rj r r0 rj ˆr (1 j k), where ˆr and rj are fresh role names. For a CQ q, let ˆq be the query obtained from q by replacing, for every r s r T , each r(x, y) q by ˆr(x, y). For k-bounded ABoxes, rew(ˆq, Tk) is an FO-rewriting of q. Theorem 4. Let T be a DL-Lite HR rec-safe TBox, Tk a kunfolding of T , for some k 0, and q a CQ over the signature of T . Then, for every k-bounded ABox A:

cert(q, T , A) = [

q rew(ˆq,Tk) cert(q , , A).

Query Reformulations In this section we propose two sets of rules for relaxing and restraining queries. The ﬁrst one uses axioms in the ontology to guide the reformulation. Essentially, these are based on the usual rules for query rewriting and on suitable counterparts, resulting in restrictions and relaxations, respectively. The second set of rules are analogous but use dependencies that hold for a given dataset instead of axioms in the ontology. The goal of these rules is to provide a simple approach for reformulating queries, which is intuitive, computationally inexpensive, and that can leverage multidimensional knowledge.

Ontology-based Reformulations

The query rewriting rules B1 and S1-S6 for DL-Lite HR non-rec produce restrainings of a given query in the sense that the answers of the resulting query are necessarily contained in the answers of the original one.

Deﬁnition 7. Let T be a DL-Lite HR TBox. Given a pair of CQs q, q , we write q s T q if q T q , and call q a restraining of q w.r.t. T .

These reformulations are ontology-based because they depend on the axioms of T only, and they are restrainings for every dataset mediated by T .

Proposition 1. Let T be a DL-Lite HR TBox. For any two CQs such that q1 s T q2 and every ABox A, we have that cert(q2, T , A) cert(q1, T , A).

Example 1. Let Te be as above. For the following queries

q(x) Cultur Evt(x), occ In(x, y), City(y) q1(x) Concert(x), occ In(x, y), City(y) q2(x) Concert(x), occ In(x, z), loc In(z, y), City(y).

it holds that q s Teq1 s Teq2 since by applying S1 using Concert Cultur Evt Te we obtain q1, and further by applying S6 using (1) we obtain q2.

We have seen that the query rewriting rules that apply the axioms in a right-to-left fashion provide natural means to restrain queries. The natural next step is to deﬁne analogous rules that use the axioms in a left-to-right fashion to relax queries. Note that in the next deﬁnition, rules G1 G6 are, essentially, the dual of rules S1 S6, while rule R1 simply allows us to relax a query by dropping an atom.

Deﬁnition 8. Let T be a DL-Lite HR TBox. For CQs q, q , we write q g T q whenever q is obtained from q by

R1 removing an atom x = a or an atom A(x) with x a non-answer variable,

or by applying an atom substitution θ as follows:

G1 θ = [A1(x)/A2(x)], if A1 A2 T and A1(x) q; G2 θ = [A(x)/r(x, zq)], if A r T and A(x) q; G3 θ = [r(x, y)/A(x)], if r A T , r(x, y) q and y is a non-answer variable occurring only once in q; G4 θ = [r(x, y)/s(x, y)], if r s T and r(x, y) q; G5 θ = [r(x, y)/s(y, x)], if r s T and r(x, y) q;

G6 θ = [{t(x, y), s(y, z)}/r(x, z)], if t s r T , t(x, y), s(y, z) q and y is a non-answer variable that does not occur elsewhere in q;

We call q a query relaxation of q w.r.t T whenever q g T q . Example 2. Let Te be as above, and take the queries

q(x) Concert(x), occ In(x, y), loc In(y, z), z = Vienna, q1(x) Cultur Evt(x), occ In(x, y), loc In(y, z), z = Vienna; q2(x) Cultur Evt(x), occ In(x, z), z = Vienna.

it holds that q g Teq1 g Teq2, since by applying G1 using Concert Cultur Evt Te we obtain q1 and further by applying G6 using (1) we get q2.

The following result is the analogous of Proposition 1.

Proposition 2. Let T be a DL-Lite HR TBox. For any two CQs, such that q1 g T q2, and every ABox A, we have that cert(q1, T , A) cert(q2, T , A).

Data-dependent Query Reformulations The query reformulation rules above might miss interesting reformulations. For instance, in our running example, a query relaxation from concerts in Vienna, to concerts at some location in Austria, or a restraining to concerts at the State Opera would be meaningful. For this reason, we deﬁne data-dependent query refomulations that assume a ﬁxed dataset, and use assertions and dependencies that hold for it as if they were axioms in the ontology. In our running example, since a quick inspection at Ae tells us that every existing venue is located in a city, we could use this to relax the query

q(x) Event(x), occ In(x, y), Venue(y) into

q (x) Event(x), occ In(x, y), loc In(y, z), City(z).

We note that such a reformulation could be done using rule S2, if we had an inclusion Venue loc In.City in the TBox. But if we do not have such an axiom and it is not possible or desirable to add it, the rules below rely on

cert(Venue(x), Te, Ae) cert( loc In.City(x), Te, Ae)

to enable such a reformulation. We deﬁne ﬁrst data-dependent restraining rules of two kinds. First we have rules that use entailed assertions. For example, if K |= A(a) and q contains an atom A(x), then we can restrain q by equating x and a, as done by rule (SA1). Similarly with role assertions: if K |= r(a, b) and r(x, y) is in q, with (SA2) we can equate x to a, or y to b. We can also use K |= r(a, b) to replace r(x, y), y = b by x = a if y is a non-answer variable that does not occur elsewhere in q; see (SA3). The second kind of rules (SD1 SD5) use dependencies of the form cert(q1, T , A) cert(q2, T , A), written q1 (T ,A) q2 for short. They are similar to the rules in the previous section, but we may now replace B(x) by A(x) not only when A B is in T , but also when the weaker condition A(x) K B(x) holds. Such replacements are also possible for some more complex pairs of atoms. For example, if r.B(x) K A(x), then A(x) can be replaced by

r(x, y), B(y) to restrain q. This would be similar to having in Deﬁnition 3 a rule for (non-DL-Lite) axioms r.B A. The relaxation rules are dual to the restraining ones. For example, (GA1) is dual to (SA1): if the query is equating some variable x to a and a is an instance of A, then we can replace x = a by A(x), thus allowing x to be any instance of A, rather than just a. Note that there is no dual to (SA2) since it would simply need to drop x = a or y = b from q, but this does not depend on the data and is captured by (R1).

Deﬁnition 9. Let K = (T , A) be a DL-Lite HR KB. For CQs q, q , we write q s Kq if q s T q or q is obtained from q using atom substitution θ as follows:

SA1 θ = [ /x = a], if A(x) q and K |= A(a); SA2 θ = [ /x = a] or θ = [ /y = b], if r(x, y) q and K |= r(a, b); SA3 θ = [{r(x, y), y = b}/x = a], if r(x, y), y = b q, y does not occur elsewhere in q and K r(a, b);

SD1 θ = [A2(x)/A1(x)], if A1(x) K A2(x) and A2(x) q; SD2 θ = [{r(x, y), A (y)}/A(x)], if A(x) K r.A (x), and r(x, y), A (y) q such that y does not occur elsewhere in q; SD3 θ = [A(x)/{r(x, zq), A (zq)}], if r.A (x) K A(x) and A(x) q; SD4 θ = [s(x, y)/r(x, y)], if r(x, y) K s(x, y) and s(x, y) q; SD5 θ = [{r(x, y), A(y)}/{p(x, y), A (y)}], if r(x, y), A(y) q, y does not occur elsewhere in q and p.A (x) K r.A(x).

Further, we write q g Kq if q g T q or q is obtained from q using atom substitution θ as follows: GA1 θ = [x = a/A(x)], if x = a q and K |= A(a); GA3 θ = [x = a/{r(x, y), y = b}], if x = a q and K |= r(a, b);

GD1 θ = [A1(x)/A2(x)], if A1(x) K A2(x) and A1(x) q; GD2 θ = [A(x)/{r(x, zq), A (zq)}], if A(x) K r.A (x) and A(x) q; GD3 θ = [{r(x, y), A (y)}/A(x)], if r.A (x) K A(x) and r(x, y), A (y) q such that y does not occur elsewhere in q; GD4 θ = [r(x, y)/s(x, y)], if r(x, y) K s(x, y) and r(x, y) q; GD5 θ = [{p(x, y), A (y)}/{r(x, y), A(y)}], if p(x, y), A (y) q, y does not occur elsewhere in q, and p.A (x) K r.A(x). Example 3. Let Ke = (Te, Ae) be as above. For CQs

q(x) Concert(x), occ In(x, y), y = Vienna, q1(x) Concert(x), occ In(x, y), loc In(y, z), z = Austria, q2(x) Concert(x), occ In(x, y), loc In(y, z), Country(z), q3(x) Concert(x), occ In(x, y), City(y),

we have that q g Keq1 g Keq2 since q1 is obtained by applying GA3 to q using Ke |= loc In(Vienna, Austria) and

Country Austria

City Vienna

Venue St Opera Volks Theater

Figure 1: Dimension Location.

q2 is obtained from q1 by applying GA1 using Ke Country(Austria). We can now choose to restrain q2 and obtain q3 by applying SD2 on q2 using City(x) Ke loc In.Country(x), hence q2 s Keq3. Note that q3 can also be obtained as a relaxation of q with GA1 and Ke City(Vienna), thus q g Keq3 also holds. Our data-driven rules indeed relax and restrain queries when evaluated over (T , A), but not for any arbitrary ABox.

Proposition 3. For CQs q1, q2 and DL-Lite HR KB (T , A): (g) q1 g Kq2 implies cert(q1, T , A) cert(q2, T , A), and (s) q1 s Kq2 implies cert(q2, T , A) cert(q1, T , A).

Modeling Multi-dimensional Data

In this section we show that recursion safe DL-Lite HR together with k-bounded ABoxes, are well-suited to describe this kind of multi-dimensional knowledge in the setting of OBDA. In the multi-dimensional data model (Hurtado and Mendelzon 2002), the data-schema is usually formalized as a set of dimensions, comprising a ﬁnite set of categories and a partial order between them, sometimes called childparent relation. The model also considers a representation at the instance level that deﬁnes members for each category, and a child-parent relation between members of connected categories. In Figure 1 a dimension and instance of some Location hierarchy are illustrated, which makes use of concepts from Te as categories. The dashed arrows show the order between categories, rectangles contain members for each category and solid arrows represent the role loc In. We deﬁne order constraints to encode dimensions. Deﬁnition 10. For a simple role s, an order constraint (along s) takes the form ord(s, A, ), with A NC ﬁnite, and a strict partial order over A. An interpretation I satisﬁes ord(s, A, ) if

A1,A2 A (AI 1 AI 2), s I [

A1 A2 (AI 1 AI 2) = .

Intuitively, whenever ord(s, A, ) is satisﬁed in I, all objects connected by s are instances of A-concepts, in a way that is compliant with the order . Further, s-paths connecting instances of concepts in A which are incomparable w.r.t. will be disallowed in I. The latter is important as otherwise one cannot guarantee k-boundedness. Example 4. The Location dimension from Figure 1 is captured by Ke = (Te, Ae) and the order constraint

c = ord(loc In, {Venue, City, Country}, ) (2)

with Venue City Country. In each model of Ke satisfying c, the role loc In only connects instances of Venue with those of City or Country, and instances of City only with those of Country, thus capturing the intended semantics of the dimension.

Deﬁnition 11. A multi-dimensional KB is a triple (T , A, C) where (T , A) is a recursion safe DL-Lite HR KB and C a set of order constraints.

We will show that for multi-dimensional KBs providing certain guarantees w.r.t. C, there is a k that ensures kboundedness and allows construction of a k-unfolding of T .

Deﬁnition 12. C covers a role r in T if there exists a partial order (A, ) such that for every role s in the set {s | r s r T }, ord(s, A , ) C for some A A . We say that C covers T , if it covers every role r in T . Further, (T , A) is C-admissible if ET ,A satisﬁes each c C.

For example, the singleton set containing the order constraint c of Example 4 covers Te, and Ke is {c}-admissible since ETe,Ae satisﬁes c.

Lemma 3. Let (T , A) be a recursion-safe DL-Lite HR KB, and let C be a set of order constraints covering T . If (T , A) is C-admissible, then A is ℓ(C)-bounded for T , where ℓ(C) := max{|A| | ord(s, A, ) C}.

Lemma 3 together with Theorem 4 yield FO-rewritability of queries over multi-dimensional KBs.

Theorem 5. Let T be a recursion safe DL-Lite HR TBox, C a set of order constraints that covers T , and q a CQ. Let Q be the ℓ(C)-rewriting of q w.r.t. T . Then, for each ABox A such that (T , A) is C-admissible,

cert(q, T , A) = [

q Q cert(q , , A).

Finally, we note that C-admissibility amounts to evaluating simple queries on ET ,A. This can be done in time that is polynomial in C, T , and A. Moreover, although testing Cadmissibility is data dependent, once it is established, FOrewritability is guaranteed for any CQ.

Proposition 4. Checking C-admissibility for recursion safe DL-Lite HR KBs is feasible in polynomial time in combined complexity.

Reformulations for Dimensional Navigation The multi-dimensional data model enables data navigation along different axes given by the dimensions, similarly to points in a multi-dimensional space. This view lies at the core of OLAP and similar data analytic applications. For instance, using dimension Location, we can navigate from events occurring in some city to those occurring in some country, or in some venue. This navigation mechanism allows users to either zoom in or zoom out on the particular data, and it is usually realized by the so-called drill-down and and roll-up operators. In our setting we can deﬁne similar navigation mechanisms for multi-dimensional KBs. We do this by means of queries containing an entry point to some dimension,

represented by some order constraint. A CQ q refers to (T , A, C) if there are ord(s, A, ) C, r s r T and A A such that one of the following conditions is satisﬁed 1. {r(x, y), A(y)} q or 2. {r(x, y), y = a} q and (T , A) A(a) where y is a non-answer variable which does not occur elsewhere in q. We now deﬁne our version of the roll-up and drill-down operators. We restrict their application to coherent KBs where paths along the dimension exist. A multi-dimensional KB (T , A, C) is coherent if C covers T , (T , A) is Cadmissible and for each ord(s, A, ) C and each A A such that there is some A A with A A we have A(x) K s(x).

Deﬁnition 13 (Roll-up, drill-down). Let (T , A, C) be a coherent multi-dimensional KB, q a CQ that refers to (T , A, C), and Γ q the set of atoms witnessing this. A roll-up of q w.r.t. Γ is a CQ obtained by applying a substitution θr on q, and a drill-down of q w.r.t. Γ is a CQ obtained by applying substitution θd on q, where θr, θd are as follows:

θr = [A(x)/B(x)], if A(x) Γ, A B in some c C; θr = [x = a/x = b], if x = a Γ and (T , A) s(a, b); θd = [A(x)/B(x)], if A(x) Γ, B A in some c C; θd = [x = a/x = b], if x = a Γ and (T , A) s(b, a).

Example 5. Consider the KB Ke and order constraint c be as in Example 4. We have that (Ke, {c}) is coherent. Now, consider the following set of queries:

q1(x) Event(x), occ In(x, y), City(y); qr 1(x) Event(x), occ In(x, y), Country(y);

qd 1(x) Event(x), occ In(x, y), Venue(y).

The fact that q1 refers to (Ke, {c}) is witnessed by the atoms Γ1 = {occ In(x, y), City(y)}. Since City Country in c, qr 1 is a roll-up of q1 w.r.t. Γ1, and since Venue City in c, qd 1 is a drill-down of q1. Now, consider the following queries:

q2(x) Event(x), occ In(x, y), y = Vienna; qr 2(x) Event(x), occ In(x, y), y = Austria;

qd 2(x) Event(x), occ In(x, y), y = St Opera.

The fact that q2 refers to (Ke, {c}) is witnessed by the atoms Γ2 = {occ In(x, y), y = Vienna} and using Ke |= loc In(Vienna, Austria) and Ke |= loc In(St Opera, Vienna) we get that qr 2 is a roll-up q2 w.r.t. Γ2 and qd 2 is a drill-down of q2 w.r.t. Γ2.

For coherent multi-dimensional KBs, the roll-up operation can be seen as a sequence of relaxing rules, and similarly, drill-down can be seen as a sequence of restrainings. Let q be a CQ that refers to a coherent (T , A, C). If qr is a roll-up of q w.r.t. {r(x, y), A(y)}, then qr can be obtained as follows: 1. Coherence guaranties that GD2 can be applied, obtaining query q by replacing A(y) q with s(y, z), B(z), where z is a fresh variable; due to C-admissibility it must be that A B.

2. Since q refers to (T , A, C) we have that r s r T and r(x, y) q with y not occuring elsewhere in q . Thus we can apply G6 on q to obtain qr. Likewise, if qr is a roll-up of q w.r.t. {r(x, y), y = a}, then qr can be obtained as follows: 1. Since q refers to (T , A, C) it must be that (T , A) A(a), for some A in some order constraint in C; coherence ensures that there exists b ind(A) such that (T , A) s(a, b), therefore we can apply GA3 to obtain query q by replacing y = a with s(y, z), z = b, where z is a fresh variable. 2. Next, again we can apply G6 as above and obtain qr. For drill-down queries, we apply the dual rules in the reverse order. From this observation, we obtain:

Proposition 5. Let (T , A, C) be a coherent multidimensional KB, q a CQ that refers to (T , A, C) and Γ q a subset of atoms witnessing this. We have that the following hold: (r) q (T ,A) qr for each roll-up qr of q w.r.t. Γ, and (d) qd (T ,A) q for each drill-down qd of q w.r.t. Γ.

Related Work Query reformulations based on similarity measures have been considered for RDF data and SPARQL queries by Reddy and Kumar (2010), Huang and Liu (2010) and Virgilio et. al (2013), whereas approaches using simple ontological knowledge for such task have been proposed by Hurtado, Poulovassilis and Wood (2008), Elbassuoni et. al (2011), Dolog et al. (2009) and Frosini et. al (2017). Relaxations of SQL queries in relational databases using concept taxonomies have been studied by Martinenghi and Torlone (2014). For cooperative KBs, Inoue and Wiese (2011) deﬁne relaxations of CQs following a principled logic-based approach. Interactive faceted search techniques implement effective drill-down for reﬁning queries (Roy et al. 2008; Kashyap, Hristidis, and Petropoulos 2010); in the context of RDF and knowledge graphs Arenas et. al (2016) and Sherkhonov et. al (2017) addressed theoretical underpinnings of faceted search for data/ontolgy exploration. Query evaluation minimizing data access in an OBDA under generalization/specialization relations is studied by Andresel, Ortiz and ˇSimkus (2016). Logic-based formalizations of the multi-dimensional data model of Hurtado and Mendelzon (2002) have been proposed in the literature. Franconi and Sattler (1999), and Franconi and Kamble (2004) use DLs for modeling and reasoning about multi-dimensional data without considering querying, while Bertossi and Milani (2018) rely on an expressive fragment of Datalog to capture dimensional knowledge, although at the expense of higher complexity (i.e., not FO-rewritable). Query answering using CRIs is supported in ontology mediated settings where the DL has CRIs, or the query language contains conjunctive regular path queries (see (Ortiz 2013) and (Ortiz and ˇSimkus 2012) for references). However, query answering in all those settings is necessarily NLOGSPACE-hard in data complexity, and usually PSPACEhard in combined complexity even for lightweight DLs (Bi-

envenu, Ortiz, and ˇSimkus 2015), while our goal in this paper is to design FO-rewritable DL-Lite extensions.

Discussion and Conclusions We presented query reformulation rules that are data independent, as well as more ﬁne-grained rules leveraging current dataset. To capture multi-dimensional knowledge, we extended DL-Lite H with a restricted use of CRIs that preserve FO-rewritability, and yet cover the desired use case. Our data-driven rules take into account a particular kind of dependencies in the data. We chose these somewhat subjectively, aiming at enhanced dimensional navigation, while keeping the complexity in check. Indeed, testing those dependencies in DL-Lite HR KBs consisting of a recursion safe TBox and a k-bounded ABox is not computationally expensive (AC0 in data complexity). Clearly, other approaches for data-driven reformulation might be feasible. With the approach we have described there may be many possible reformulations of a given query, and not all of them are always equally interesting. In our future research we will investigate relaxations/restrictions that minimally modify the query answers, properties of our reformulation rules, and algorithms to effectively compute preferred reformulations. We are also investigating mechanisms for compiling the data and the ontology to support efﬁcient answering of reformulated queries, and the deﬁnition of a declarative query language considering relaxing/restraining operators as ﬁrst-class citizens. Finally, we also plan to consider aggregation and the deﬁnition of operators suitable for data analysis tasks in the spirit of OLAP systems.

Acknowledgments This work was supported by the Austrian Science Fund (FWF) projects P30360, P30873 and W1255.

Andresel, M.; Ortiz, M.; and ˇSimkus, M. 2016. A compilation technique for interactive ontology-mediated data exploration. In In Proc. of Description Logics 2016, volume 1577 of CEUR. Arenas, M.; Cuenca Grau, B.; Kharlamov, E.; Marciuska, S.; and Zheleznyakov, D. 2016. Faceted search over RDF-based knowledge graphs. J. Web Sem. 37-38:55 74. Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The DL-Lite family and relations. J. Artif. Intell. Res. 36:1 69. Bertossi, L. E., and Milani, M. 2018. Ontological multidimensional data models and contextual data quality. J. Data and Information Quality 9(3):14:1 14:36. Bienvenu, M.; Ortiz, M.; and ˇSimkus, M. 2015. Regular path queries in lightweight description logics: Complexity and algorithms. J. Artif. Int. Res. 53(1):315 374. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3):385 429. Dolog, P.; Stuckenschmidt, H.; Wache, H.; and Diederich, J. 2009. Relaxing RDF queries based on user and domain preferences. J. Intell. Inf. Syst. 33(3):239 260.

Elbassuoni, S.; Ramanath, M.; and Weikum, G. 2011. Query relaxation for entity-relationship search. In Proc. ESWC 2011, volume 6644 of LNCS, 62 76. Springer. Franconi, E., and Kamble, A. 2004. The GMD data model and algebra for multidimensional information. In CAi SE, volume 3084 of LNCS, 446 462. Springer. Franconi, E., and Sattler, U. 1999. A data warehouse conceptual data model for multidimensional aggregation. In Proc. DMDW 1999, volume 19 of CEUR. Frosini, R.; Cal ı, A.; Poulovassilis, A.; and Wood, P. T. 2017. Flexible query processing for SPARQL. Semantic Web 8(4):533 563. Horrocks, I., and Sattler, U. 2004. Decidability of SHIQ with complex role inclusion axioms. Artif. Intell. 160(1-2):79 104. Huang, H., and Liu, C. 2010. Query relaxation for star queries on RDF. In Proc. WISE 2010, volume 6488 of LNCS, 376 389. Springer. Hurtado, C. A., and Mendelzon, A. O. 2002. OLAP dimension constraints. In Proc. PODS 2002, 169 179. ACM. Hurtado, C. A.; Poulovassilis, A.; and Wood, P. T. 2008. Query relaxation in RDF. J. Data Semantics 10:31 61. Inoue, K., and Wiese, L. 2011. Generalizing conjunctive queries for informative answers. In Proc. FQAS 2011, volume 7022 of LNCS, 1 12. Springer. Kashyap, A.; Hristidis, V.; and Petropoulos, M. 2010. Facetor: cost-driven exploration of faceted query results. In Proc. CIKM 2010, 719 728. ACM. Kazakov, Y. 2010. An extension of complex role inclusion axioms in the description logic SROIQ. In Proc. IJCAR 2010. Martinenghi, D., and Torlone, R. 2014. Taxonomy-based relaxation of query answering in relational databases. VLDB J. 23(5):747 769. Ortiz, M., and ˇSimkus, M. 2012. Reasoning and query answering in description logics. In Reasoning Web, volume 7487 of LNCS, 1 53. Springer. Ortiz, M. 2013. Ontology based query answering: The story so far. In Proc. AMW 2013, volume 1087 of CEUR. Poggi, A.; Lembo, D.; Calvanese, D.; De Giacomo, G.; Lenzerini, M.; and Rosati, R. 2008. Linking data to ontologies. J. Data Semantics 10:133 173. Reddy, B. R. K., and Kumar, P. S. 2010. Efﬁcient approximate SPARQL querying of web of linked data. In Proc. URSW 2010, volume 654 of CEUR, 37 48. Roy, S. B.; Wang, H.; Das, G.; Nambiar, U.; and Mohania, M. K. 2008. Minimum-effort driven dynamic faceted search in structured databases. In Proc. CIKM 2008, 13 22. ACM. Sherkhonov, E.; Cuenca Grau, B.; Kharlamov, E.; and Kostylev, E. V. 2017. Semantic faceted search with aggregation and recursion. In Proc. ISWC 2017, volume 10587 of LNCS, 594 610. Springer. Virgilio, R. D.; Maccioni, A.; and Torlone, R. 2013. A similarity measure for approximate querying over RDF data. In Proc. EDBT/ICDT Workshops 2013, 205 213. ACM.