# answering_regular_path_queries_over_sq_ontologies__f9fe868c.pdf

Answering Regular Path Queries over SQ Ontologies

V ıctor Guti errez-Basulto Cardiff University, UK gutierrezbasultov@cardiff.ac.uk

Yazm ın Ib a nez-Garc ıa TU Wien, Austria yazmin.garcia@tuwien.ac.at

Jean Christoph Jung Universit at Bremen, Germany jeanjung@uni-bremen.de

We study query answering in the description logic SQ supporting qualiﬁed number restrictions on both transitive and non-transitive roles. Our main contributions are a tree-like model property for SQ knowledge bases and, building upon this, an optimal automata-based algorithm for answering positive existential regular path queries in 2EXPTIME.

1 Introduction The use of ontologies to access data has gained a lot of popularity in various research ﬁelds such as knowledge representation and reasoning, and databases. In the ontology-based data access (OBDA) scenario, ontologies are often encoded using description logic languages (DLs); as a consequence, a large amount of research on the query answering problem (QA) over DL ontologies has been conducted. In particular, several efforts have been put into the study of the query answering problem in DLs featuring transitive roles and number restrictions (Glimm, Horrocks, and Sattler 2008; Glimm et al. 2008; Eiter et al. 2009; Calvanese, Eiter, and Ortiz 2009; 2014). However, in all these works the application of number restrictions to transitive roles is forbidden. This is also reﬂected in the fact that the W3C ontology language OWL 2 does not allow for this interaction.1 Unfortunately, this comes as a shortcoming in crucial DL application areas like medicine and biology in which many terms are deﬁned and classiﬁed according to the number of components they contain or have as a part, in a transitive sense (Wolstencroft et al. 2005; Rector and Rogers 2006; Stevens et al. 2007). For instance, the ontology T below describes that the human heart has as a part (h Pt) exactly one mitral valve (MV), a left atrium (LA) and a left ventricle (LV); and the latter two (enforced to be distinct) also have as a part a mitral valve. Thus, the left atrium and left ventricle have to share the mitral valve.

T = { Heart (= 1 h Pt.MV) h Pt.LA h Pt.LV, LV LA , LV h Pt.MV, LA h Pt.MV }.

The lack of investigations of query answering in DLs of this kind is partly because (i) the interaction of these

Copyright c 2018, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved. 1https://www.w3.org/TR/webont-req/

features with other traditional constructors often leads to undecidability of the standard reasoning tasks (e.g., satisﬁability) (Horrocks, Sattler, and Tobies 2000); and (ii) for those DLs known to be decidable, such as SQ and SOQ (Kazakov, Sattler, and Zolin 2007; Kaminski and Smolka 2010), only recently tight complexity bounds were obtained (Guti errez-Basulto, Ib a nez-Garc ıa, and Jung 2017a). Moreover, these features, even with restricted interaction, pose additional challenges for devising decision procedures since they lead to the loss of properties, such as the tree model property, which make the design of algorithms for QA simpler. Clearly, these issues are exacerbated if number restrictions are imposed on transitive roles. Traditionally, most of the research in OBDA has focused on answering conjunctive queries. However, navigational queries have recently gained a lot of attention (Stefanoni et al. 2014; Bienvenu, Ortiz, and Simkus 2015; Baget et al. 2017) since they are key in various applications. For instance, in biomedicine they are used to retrieve speciﬁc paths from protein, cellular and disease networks (Dogrusoz et al. 2009; Lysenko et al. 2016). A prominent class of navigational queries is that of regular path queries (Florescu, Levy, and Suciu 1998), where paths are speciﬁed by a regular expression. Indeed, motivated by applications in the semantic web, the latest W3C standard SPARQL 1.1 includes property paths, related to regular expressions. The objective of this paper is to start the research on query answering in DLs supporting qualiﬁed number restrictions over transitive roles. We study the entailment problem of positive existential two-way regular path queries (Calvanese et al. 2000) over SQ ontologies, thus generalizing both conjunctive and regular path queries. To this end, we pursue an automata-based approach for query answering using twoway alternating tree automata (2ATA) (Vardi 1998). This roughly consists of three steps (Calvanese, Eiter, and Ortiz 2014): (i) show that, if a query ϕ is not entailed by the knowledge base K, there is a tree-like interpretation witnessing this, (ii) devise an automaton AK which accepts precisely the tree-like interpretations of K, (iii) devise an automaton Aϕ which accepts a tree-like interpretation iff it satisﬁes ϕ. Query entailment is then reduced to the question whether AK accepts a tree that is not accepted by Aϕ. In this paper, we signiﬁcantly adapt and extend each step to SQ, resulting in an algorithm running in 2EXPTIME, even

The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

for binary coding of numbers. A matching lower bound follows from positive existential QA in ALC (Calvanese, Eiter, and Ortiz 2014). More precisely, for step (i) we develop the notion of canonical tree decompositions which intuitively are tree decompositions tailored to handle the interaction of transitivity and number restrictions. We then show via a novel unraveling operation for SQ that, if the query is not entailed, there is a witness interpretation which has a canonical tree decomposition of width bounded exponentially in the size of K, cf. Section 3. These canonical tree decompositions are crucial in order to construct a small 2ATA AK in step (ii), which is done in Section 4.1. For step (iii), we propose in Section 4.2 a novel technique for answering regular path queries directly using a 2ATA Aϕ since a naive application of the techniques from (Calvanese, Eiter, and Ortiz 2014) does not lead to optimal complexity, because of the large width of the decompositions. An extended version with appendix can be found under www.informatik.uni-bremen.de/tdki/research/papers.html.

2 Preliminaries Syntax. We consider a vocabulary consisting of countably inﬁnite disjoint sets of concept names NC, role names NR, and individual names NI, and assume that NR is partitioned into two countably inﬁnite sets of non-transitive role names Nnt R and transitive role names Nt R. The syntax of SQconcepts C, D is given by the rule

C, D ::= A | C | C D | ( n r.C)

where A NC, r NR, and n is a number given in binary. We use ( n r.C) as an abbreviation for ( n 1 r.C), and other standard abbreviations like , , C D, r.C, r.C. Concepts of the form ( n r.C) and ( n r.C) are called at-most restrictions and at-least restrictions, respectively. An SQ-TBox (ontology) T is a ﬁnite set of concept inclusions C D where C, D are SQ-concepts. An ABox is a ﬁnite set of concept and role assertions of the form A(a), r(a, b) where A NC, r NR and {a, b} NI; ind(A) denotes the set of individual names occurring in A. A knowledge base (KB) is a pair K = (T , A). Semantics. An interpretation I = (ΔI, I) consists of a non-empty domain ΔI and an interpretation function I mapping concept names to subsets of the domain and role names to binary relations over the domain such that transitive role names are mapped to transitive relations. The interpretation function is extended to complex concepts by deﬁning ( C)I = ΔI \ CI, (C D)I = CI DI, and

( n r.C)I = {d ΔI | |{e CI | (d, e) r I}| n}.

For ABoxes A we adopt the standard name assumption (SNA), that is, a I = a, for all a ind(A), but we strongly conjecture that our results hold without it. The satisfaction relation |= is deﬁned as usual by taking I |= C D iff CI DI, I |= A(a) iff a AI, and I |= r(a, b) iff (a, b) r I. An interpretation I is a model of a TBox T , denoted I |= T , if I |= α for all α T ; it is a model of an ABox A, written I |= A, if I |= α for all α A; it is a model of a KB K if I |= T and I |= A.

Query Language. A positive existential regular path query (PRPQ) is a formula ϕ = x ψ(x) where ψ is constructed using and over atoms of the form E(t, t ) where t, t are variable or constant names, E is a regular expression over {r, r | r NR} {A? | A NC}, and the tuple x denotes precisely the free variables in ψ. Note that atoms A(t) are captured using A?(t, t). We denote with Iϕ the set of constant names in ϕ. A match for ϕ in I is a function π : x Iϕ ΔI such that π(a) = a, for all a Iϕ and I, π |= ψ(x) under the standard semantics of ﬁrst-order logic extended with the following rule for atoms of the form E(t, t ): I, π |= E(t, t ) if there is a word ν1 νn L(E) and a sequence d0, . . . , dn ΔI such that d0 = π(t), dn = π(t ), and for all i [1, n] we have that (i) if νi = A?, then di 1 = di AI, and (ii) if νi = r (resp., νi = r ), then (di 1, di) r I (resp., (di, di 1) r I). A query ϕ is entailed by a KB K, denoted as K |= ϕ, if there is a match for ϕ in every model I of K. The query entailment problem asks whether a KB K entails a PRPQ ϕ. It is well-known that the query answering problem can be reduced to query entailment, and that PRPQs are preserved under homomorphisms, that is, if I |= ϕ and there is a homomorphism from I to J , then also J |= ϕ. Additional Notation for Transitive Roles. Given some interpretation I, I|Δ denotes the restriction of I to domain Δ ΔI. For d ΔI and r Nt R, the r-cluster of d in I, denoted by QI,r(d), is the set containing d and all elements e ΔI such that both (d, e) r I and (e, d) r I. We call a set a ΔI an r-cluster in I if a = QI,r(d) for some d ΔI, and an r-root cluster if additionally (d, e) r I for all d a and e ΔI \ a. Note that both a single element without an r-loop and a single element with an r-loop are r-clusters of size 1; otherwise r-clusters can be viewed as r-cliques.

3 Tree Decompositions Existing algorithms for QA in expressive DLs, e.g., SHIQ (without number restrictions on transitive roles), exploit the fact that for answering queries it sufﬁces to consider canonical models that are forest-like, roughly consisting of an interpretation of the ABox and a collection of tree-shaped interpretations whose roots are elements of the ABox. We start with showing that for SQ this tree-model property is lost.

Example 1. The number restrictions in T , cf. Section 1, force that every model of T satisfying Heart contains the structure in Fig. 1(a). Moreover, in SQ clusters can be enforced. Let T be the following TBox, where r Nt R:

{A (= 3 r.B), B (= 3 r.B), A B}.

Then, in every model of T , an element satisfying A roots the structure depicted in Fig. 1(b), where the elements satisfying B form an r-cluster.

Nevertheless, we will establish a tree-like model property for SQ, showing that it sufﬁces to consider such models for query entailment. We ﬁrst introduce a basic form of tree decompositions suited for transitive roles. A tree is a preﬁxclosed subset T (N \ {0}) . A node w T is a successor

Figure 1: Example 1

of v T and v is a predecessor of w if w = v i for some i N. We denote with w 1 the predecessor of w, if it exists.

Deﬁnition 1. A tree decomposition of an interpretation I is pair (T, I) where T is a tree and I is a function that assigns an interpretation I(w) = (Δw, I(w)) to every w T, and the following conditions are satisﬁed:

w T Δw; 2. for every w T, we have I(w) = I|Δw; 3. r I = χr for r Nnt R and r I = χ+ r for r Nt R, where χr =

w T r I(w);

4. for every d ΔI, the set {w T | d Δw} is connected in T.

The width of (T, I) is the maximum domain size of interpretations that occur in the range of I minus 1, that is, supw T |Δw| 1. Its outdegree is the outdegree of T.

Unfortunately, this basic tree decomposition does not yet enable tree automata to count over transitive roles (with a small number of states) since the r-successors of an element, say d ΔI, are scattered in the decomposition; see Section 4.1 for further details. To address this, we extend tree decompositions with a third component r which assigns to every node w T \{ε} a role name r(w) and to the root ε. Intuitively, a node labeled with r = r(w) is responsible for capturing rsuccessors of some element(s) in the predecessor of w. We need some additional notation. Let (T, I, r) be such an extended tree decomposition, and let w T and r NR. We say that d Δw is fresh in w if w = ε or d / Δw 1, and rfresh in w if r = r(w) and it is either fresh or r = r(w 1). We denote with F(w) and Fr(w) the set of all fresh and rfresh elements in w, respectively. Intuitively, Fr(w) contains all elements which are allowed to have fresh r-successors in the successor nodes of w. Indeed, the following stronger form of tree decompositions implies (among other things) that, for all d and r, there is a unique w with d Fr(w).

Deﬁnition 2. An extended tree decomposition T = (T, I, r) of an interpretation I is canonical if the following conditions are satisﬁed for every w T with r = r(w) and every successor v of w with s = r(v):

(C1) if (d, e) s I(v) 1 , then s1 = s, or d = e and s1 Nt R; (C2) if s Nnt R , then Δv = {d, e}, for some d F(w), e F(v), and s I(v) = {(d, e)};

(C3) if s Nt R and r / { , s}, there are d F(w) and an r-root cluster a in I(v) such that Δw Δv = {d} and d a; moreover, there is no successor v = v of w satisfying this for d and r(v ) = s; (C4) if s Nt R and r { , s}, then there is an s-root cluster a in I(v) with: (a) a Fs(w); (b) a is an s-cluster in I(w); (c) for all d a and (d, e) s I(w), we have e Δv; (d) for all (d, e) s I(v), d a F(v) or e / F(v).

Deﬁnition 2 imposes restrictions on the structural relation between interpretations at neighboring nodes. Condition (C1) expresses that the interpretation at a node labeled with r(w) = r interprets essentially only r non-empty (among role names). Condition (C2) is in analogy with standard unravelling over non-transitive roles (Baader et al. 2003). Condition (C3) reﬂects that interpretations at neighboring nodes with different r-components do only interact via single elements. Most interestingly, Condition (C4) plays the role of (C2), but for transitive roles. Note that (C4) is based on r-clusters since they can be enforced, see Example 1 above.

3.1 Tree-like Model Property for SQ As our ﬁrst main result, we show a tree-like model property, in particular, that every model can be unraveled into a canonical decomposition of small width. The proof is via a novel unraveling operation tailored for the logic SQ and canonical decompositions.

Theorem 1. Let K = (T , A) be an SQ KB and ϕ a PRPQ with K |= ϕ. There is a model J of K and a canonical tree decomposition (T, I, r) of J with (i) J |= ϕ, (ii) I(ε) |= A, and (iii) width and outdegree of (T, I) are bounded by O(|A| 2p(|T |)), for some polynomial p.

Before outlining the proof of Theorem 1, we introduce some additional notation. The width of an interpretation I is the minimum k such that |QI,r(d)| k for all d ΔI, r Nt R. Moreover, for a transitive role r, we say that e is a direct r-successor of d if (d, e) r I but e / QI,r(d), and for each f with (d, f), (f, e) r I, we have f QI,r(d) or f QI,r(e); if r is non-transitive, then e is a direct r-successor of d if (d, e) r I. The breadth of I is the maximum k such that there are d, d1, . . . , dk and a role name r, all di are direct r-successors of d, and

if r is non-transitive, then di = dj for all i = j;

if r is transitive, then QI,r(di) = QI,r(dj), for i = j.

Let now be I |= K and I |= ϕ. As PRPQs are preserved under homomorphisms, the following lemma implies that we can assume without loss of generality that I is of bounded width and breadth. The proof of this lemma adapts a result in (Kazakov and Pratt-Hartmann 2009).

Lemma 1. For each I |= K, there is a sub-interpretation I of I with I |= K and width and breadth of I are bounded by O(|A| + 2p(|T |)).

Let cl(T ) be the set of all subconcepts occurring in T , closed under single negation. For each transitive role r, deﬁne a binary relation I,r on ΔI, by taking d I,r e if there is some ( n r.C) T such that d ( n r.C)I, e CI, and (d, e) r I. Based on the transitive, reﬂexive closure I,r of I,r, we deﬁne, for every d ΔI, the set Wit I,r(d) of r-witnesses for d by: Wit I,r(d) = e|d I,re QI,r(e).

Intuitively, Wit I,r(d) contains all r-witnesses of at-most restrictions of some element d, and due to using I,r, also all witnesses of at-most restrictions of those witnesses and so on. For the stated bounds, it is important that the size of Wit I,r(d) is bounded as follows: Lemma 2. For every d ΔI and transitive r, we have |Wit I,r(d)| |A| 2q(|T |), for some polynomial q. We describe now the construction of the interpretation J and its tree decomposition via a possibly inﬁnite unraveling process. Elements of ΔJ will be either of the form a with a ind(A) or of the form dx with d ΔI and some index x. We usually use δ to refer to domain elements in J (in either form), and deﬁne a function τ : ΔJ ΔI by setting τ(δ) = δ, for all δ ind(A), and τ(δ) = d, for all δ of the form dx. To start the construction of J and (T, I, r), initialize the domain ΔJ with ind(A)

r Nt R Δr, where the sets Δr are deﬁned as Δr = {dr | d

a ind(A) Wit I,r(a) \ ind(A)}.

Concept and role names are interpreted in a way such that J |ind(A) = I|ind(A), and for all r Nt R and all δ, δ ind(A) Δr, we have

δ AJ τ(δ) AI, for all A NC, and

(δ, δ ) r J (τ(δ), τ(δ )) r I. ( )

Now, initialize (T, I, r) with T = {ε}, Δε = ΔJ , and r(ε) = . This ﬁrst step ensures that all witnesses of ABox individuals appear in the root. In the inductive step, we extend J and (T, I, r) by applying the following rules exhaustively in a fair way. R1 Let r be non-transitive, w T, δ F(w), and d a direct r-successor of τ(δ) in I with {δ, d} ind(A). Then, add a fresh successor v of w to T, add the fresh element dv to ΔJ , extend J by adding (δ, dv) r J and dv AJ iff d AI, for all A NC, and set Δv = {δ, dv} and r(v) = r.

R2 Let r be transitive, w T, and δ0 F(w) such that: (a) w = ε and δ0 Δs, for some transitive s = r, or (b) w = ε and r(w) = r. Then add a fresh successor v of w to T, and deﬁne Δ = {ev | e Wit I,r(τ(δ0)) \ {τ(δ0)}}. Extend the domain of J with Δ and the interpretation of concept and role names such that ( ) is satisﬁed for all δ, δ Δ {δ0}. Finally, set Δv = Δ {δ0} and r(v) = r.

A1, A2 A1, A2, B

er A1 A1, A2, B

er A1 A1, A2, B

er A1 A1, A2, B

er A1 A1, A2, B

Figure 2: Example 2

R3 Let r be transitive, w T, a Fr(w) an r-cluster in I(w) such that:

(a) w = ε and a Δr ind(A), or (b) w = ε and r(w) = r.

If there is a direct r-successor e of τ(δ) in I for some δ a such that (δ, δ ) / r J for any δ with τ(δ ) = e, then add a fresh successor v of w to T, and deﬁne

Δ = {fv | f Wit I,r(e) \ Wit I,r(τ(δ))} and

Δv = Δ a {δ | r(δ , δ ) I(w) for some δ a}.

Then extend the domain of J with Δ and the interpretation of concept names such that ( ) is satisﬁed for all δ a Δ and δ Δv. Finally, set r(v) = r.

To ﬁnish the construction, let J be the interpretation obtained in the limit, and set I(w) = J |Δw, for all w T. It is veriﬁed in the appendix that (T, I, r) and J satisfy the conditions from Theorem 1. Notably, τ is a homomorphism from J to I, thus J |= ϕ, due to preservation under homomorphisms. Rules R1 R3 are, respectively, in one-to-one correspondence with Conditions (C2) (C4) in Deﬁnition 2. In particular, R1 implements the well-known unraveling procedure for non-transitive roles. R2 is used to change the role component for transitive roles by creating a fresh node whose interpretation contains all witnesses of the chosen element δ. Finally, R3 describes how to unravel direct r-successors in case of transitive roles r. In the deﬁnition of Δ it is taken care that witnesses which are inherited from predecessors are not introduced again, in order to preserve at-most restrictions. We ﬁnish the section with an illustrating example.

Example 2. Let K be the following KB, where r Nt R:

({A1 ( 1 r.B), A2 ( 1 r.C)}, {A1(a)}).

Figure 2 shows a model I of K and a canonical decomposition T of its unraveling (transitivity connections are omitted). In the initialization phase, the interpretation I(ε) is constructed starting from individual a. Since a I,r e and e I,r f, we have Wit I,r(a) = {e, f}, thus er and fr are added in this phase. The interpretations I(vi) are introduced using R3: In all cases Δε is the cluster a and δ = a; and, e.g., Δ = {cv1} for I(v1).

4 Automata-Based Query Entailment In this section, we devise an automata-based decision procedure for query entailment in SQ. We start with the necessary background about the used automata model. Alternating Tree Automata. A tree is k-ary if each node has exactly k successors. For brevity, we set [k] = { 1, 0, . . . , k}. Let Σ be a ﬁnite alphabet. A Σ-labeled tree is a pair (T, τ) with T a tree and τ : T Σ assigns a letter from Σ to each node. A two-way alternating tree automaton (2ATA) over Σ-labeled k-ary trees is a tuple A = (Q, Σ, q0, δ, F) where Q is a ﬁnite set of states, q0 Q is an initial state, δ is the transition function, and F is the (parity) acceptance condition (Vardi 1998). The transition function maps a state q and an input letter a Σ to a positive Boolean formula over the constants true and false, and variables from [k] Q. The semantics is given in terms of runs, see appendix. As usual, L(A) denotes the set of trees accepted by A. Emptiness of L(A) can be checked in exponential time in the number of states of A (Vardi 1998). General Picture. The leading thought is as follows. If K |= ϕ, then, by Theorem 1, there is a model J of K and a canonical tree decomposition thereof with small width and outdegree such that J |= ϕ. The idea is to design 2ATAs Acan, AK, and Aϕ which accept canonical tree decompositions, (tree-like) models of the KB K, and (tree-like) models of the query ϕ, respectively. Query answering is then reduced to the question whether some tree is accepted by Acan and AK, but not by Aϕ. As we shall see, these automata have size exponential in K and can be constructed in double exponential time. Since 2ATAs can be complemented and intersected in polynomial time, the automaton Acan AK Aϕ is of exponential size, and can be constructed in double exponential time. Checking it for nonemptiness can thus be done in double exponential time. A matching lower bound is inherited from positive existential query answering in ALC (Calvanese, Eiter, and Ortiz 2014). We thus obtain our main result.

Theorem 2. PRPQ entailment over SQ-knowledge bases is 2EXPTIME-complete.

Encoding Tree Decompositions. As the underlying interpretation might be inﬁnite, 2ATAs cannot directly work over tree decompositions. Thus, for the desired approach to work, it is crucial to encode tree decompositions using a ﬁnite alphabet. To this aim, we use an approach similar to (Gr adel and Walukiewicz 1999). Throughout this section, ﬁx a knowledge base K and let K and k be the bounds on width and outdegree, respectively, obtained in Theorem 1. Then, ﬁx a ﬁnite set Δ having 2K elements with ind(A) Δ, and deﬁne Σ = { } Σ , where Σ is the set of all pairs (I, x) such that I is an interpretation where only symbols from K are interpreted non-empty, ΔI Δ, |ΔI| K, and x is either a role name from K or . The symbol Σ is used to encode non-existing branches (tree decompositions are not necessarily uniformly branching). Let (T, τ) be a Σ-labeled tree with Σ as above. For convenience, we use Iw and rw to refer to the single components

of τ in a node w with τ(w) = , that is, τ(w) = (Iw, rw). Given an element d Δ, we say that v, w T are dconnected iff d ΔIu for all u on the unique shortest path from v to w. In case d ΔIw, we use [w]d to denote the set of all v which are d-connected to w. We call (T, τ) consistent if ε is the only node with rε = and (Iw)|D = (Iv)|D for all neighbors v, w T and D = ΔIw ΔIv. A consistent Σ-labeled tree (T, τ) represents a triple (T, I, r) of width at most K as follows. The domain underlying (T, I, r) is the set of all elements [w]d with w T and d ΔIw, and for every w T, the interpretation I(w) is deﬁned as:

Δw = {[w]d | d ΔIw}, AI(w) = {[w]d | d AIw},

r I(w) = {([w]d, [w]e) | (d, e) r Iw},

for all concept names A and role names r occurring in K; and r(w) is just rw. We denote with I(T,τ) the interpretation w T Iw; clearly, (T, I, r) is a tree decomposition of I(T,τ). As a convention, we use [ε]a to represent each ABox individual a ind(A) in the encoding. Based on the size 2K of Δ, it is not hard to verify that, conversely, for every width K tree decomposition of some I, there is a consistent (T, τ) such that I(T,τ) is isomorphic to I. It is easy to devise a 2ATA Acan which accepts an input (T, τ) iff it is consistent and the represented tree decomposition (T, I, r) is canonical. We thus concentrate on the most challenging automata AK and Aϕ.

4.1 Knowledge Base Automaton AK

The automaton AK is the intersection of two automata AA and AT verifying that the input satisﬁes the ABox and the TBox, respectively. Note that, by Point (ii) of Theorem 1, we can assume that the ABox is satisﬁed in the root; thus, an automaton AA checking whether I(T,τ) |= A just has to check the label τ(ε), see the appendix.

For the design of the automaton AT , assume w.l.o.g. that T is of the form { CT } and CT is in negation normal form. We present the main ideas of the construction of AT , see the appendix for further details. In its outer loop , the automaton visits every domain element d in state CT (d). This is realized using the initial state q0, and states of the form D(d), D a sub-concept of CT and d Δ via the following transitions for every (I, x) Σ:

δ(q0, (I, x)) =

1 i K(i, q0)

d ΔI(0, CT (d))

δ(q0, ) = true

If AT visits w in a state D(d) this presents the obligation to verify that, in the represented model, [w]d satisﬁes D. The Boolean operations are dealt with using the following transitions, for every (I, x) Σ:

δ(A(d), (I, x)) = if d AI, then true else false

δ( A(d), (I, x)) = if d / AI, then true else false δ((C1 C2)(d), (I, x)) = (0, C1(d)) (0, C2(d)) δ((C1 C2)(d), (I, x)) = (0, C1(d)) (0, C2(d))

For states of the form ( n r.D)(d) we have to be more careful. The naive approach for counting the number of rsuccessors of d satisfying D would be to count the number of r-successors satisfying D in the interpretation associated to the current node, and then move to all other nodes where d appears. Since interpretations associated to neighboring nodes might overlap, to avoid double counting, we have to store (in the states) all elements that have already been counted in the current node before changing the node. However, since the domain in each node has size exponential in |T |, we need doubly exponentially many states for this task. Since this naive approach does not result in optimal complexity, we pursue an alternative approach, based on canonicity, leading to only exponentially many states. Our approach is based on characterizing how r-successors of an element can be uniquely identiﬁed in canonical tree decompositions. Assume some (T, τ) L(Acan) and let r be a role name. In what follows, we assume that the notions of fresh and r-fresh are lifted to the encoding in the straightforward way. An r-path from [w]d to [v]e in (T, τ) is a sequence d0, w0, d1, . . . , wn 1, dn such that d = d0, e = dn, w0 [w]d, wn 1 [v]e, and (di, di+1) r Iwi , for all 0 i < n. It is downward if, for all 0 < i < n, wi is a successor of wi 1 and di is contained in an r-root cluster of wi. We then have: Lemma 3. For (T, τ) L(Acan), we have ([w]d, [v]e) r I(T,τ) iff one of the following is true: r is non-transitive and (d, e) r Iε or (d, e) r Iv, d is fresh in w, and v is a successor of w, or r is transitive, and there is an r-path d0, w0, . . . , dn from [w]d to [v]e such that one of the following holds: A d0 Fr(w0) Fr(w0 1), d1 Fr(w0), and d0, . . . , dn is downward, or B d0 Fr(w0), d1 / Fr(w0), and if n > 1, then d1, . . . , dn is downward and w1 1 [w]d1 is an ancestor of w0 such that d1 Fr(w1 1). This lemma suggests the following approach for verifying the obligation ( n r.D)(d) at some node w. If r is nontransitive, navigate with the automaton to the (unique!) w such that d F(w ) and count the r-successors of d in the successors v of w , or in ε. If r is transitive, navigate with the automaton to the unique w such that d Fr(w ) and change to a state q ( n r.D),d, starting from which AT systematically scans the r-successors according to A and B. We concentrate on verifying at-least restrictions, at-most restrictions are completely complementary. Assume τ(w ) = (I, x), and let a1, . . . , aℓbe all rclusters in I reachable from d (including QI,r(d)), and let a1, . . . , aℓbe representatives of these clusters. Moreover, let N be the set of all tuples n = (n1, . . . , nℓ) such that i ni = n. Then, the transition δ(q ( n r.D),d, (I, x)) is deﬁned as

i X (0, q A ( ni r.D),ai)

i [1,ℓ]\X (0, q B ( ni r.D),ai).

Thus, AT guesses a distribution of n to the reachable clusters. Moreover, it guesses from which clusters it starts paths

of the shape A and B. For both guesses, it veriﬁes that the chosen ai is r-fresh (for A) or not (for B), and continues in states q ( n r.D) and q ( n r.D), respectively. This is done using the following transitions:

δ(q A ( n r.D),d, (I, x)) = (0, Fr,d) (0, q ( n r.D),d)

δ(q B ( n r.D),d, (I, x)) = (0, F r,d) ( 1, q ( n r.D),d)

δ(Fr,d, (I, )) = true δ(Fr,d, (I, x)) = false if x / {r, }

δ(Fr,d, (I, r)) = ( 1, F r,d)

δ(F r,d, (I, x)) = true if x / {r, } or d ΔI, false otherwise,

and complementary transitions for F r,d. Now, in states q ( n r.D),d, the automaton goes up until it ﬁnds the world where d is r-fresh (corresponding to w1 1 in B) and looks for downward paths starting from there. This is done by taking setting δ(q ( n r.D),d, (I, x)) = false whenever d / ΔI, and otherwise: δ(q ( n r.D),d, (I, x)) = (0, q A ( n r.D),d) (0, q B ( n r.D),d).

It thus remains to describe transitions for states of the form q ( n r.D),d at some node w. Such situations represent the obligation to ﬁnd n r-successors along downward paths from d. Note that the transitions before ensure that d Fr(w). In this case, the automaton guesses how many of the n successors it will ﬁnd locally in the current cluster (using states ploc m,r,D,d), and how many are to be found in successor nodes (using psucc ( m r.D)). Formally, let M be the set of all tuples m = (m0, . . . , mk) with

i mi = n, and deﬁne the transition for δ(q ( n r.D),d, (I, x)) as:

(0, ploc m0,r,D,d)

i [1,k] (i, psucc ( mi r.D),d)

States of the form ploc n,r,D,d are used to verify that in QI,r(d) there are n elements satisfying D:

δ(ploc n,r,D,d, (I, x)) =

Y QI,r(d),|Y |=n

It remains to give the transitions for states psucc ( m r.D). To start, we set δ(psucc ( n r.D),d, σ) = true, whenever n = 0; δ(psucc ( n r.D),d, ) = false; and δ(psucc ( n r.D),d, (I, x)) = false whenever x = r or d is not in a root cluster of I. For all other cases, let a1, . . . , aℓbe all r-clusters reachable from d, except QI,r(d), let N be again the set of all n = (n1, . . . , nℓ) such that i ni = n, and include the transition

δ(psucc ( n r.D),d, (I, x)) =

i [1,ℓ] (0, q A ( ni r.D),ai).

Using the parity condition, we make sure that states q ( n r.D),d with n 1 are not suspended forever, that is, eventualities are ﬁnally satisﬁed. Lemma 4. For every (T, τ) L(Acan), we have (T, τ) L(AT ) iff I(T,τ) |= T . It can be constructed in time double exponential in |K|, and has exponentially many states in |K|.

4.2 Query Automaton Aϕ In previous work, we have observed that the approach for the query automaton taken in (Calvanese, Eiter, and Ortiz 2014) leads to a 2ATA with double exponentially many states in K, and thus not to optimal complexity (Guti errez-Basulto, Ib a nez-Garc ıa, and Jung 2017b). We thus take an alternative approach by ﬁrst giving an intermediate characterization for when a query has a match, and then show how to exploit this to build a 2ATA with exponentially many states. Fix a P2RPQ ϕ = x ψ(x). Note ﬁrst that since for every regular expression E over some alphabet Γ, one can construct in polynomial time an equivalent non-deterministic ﬁnite automaton (NFA) B = (QB, Γ, s0B, ΔB, FB) (F urer 1980), we generally assume an NFA-based representation, that is, atoms in ϕ take the shape B(t, t ), B an NFA. For states s, s QB, write Bs,s for the NFA that is obtained from B by taking s as initial state and {s } as the set of ﬁnal states. To give semantics to the automata based representation, we deﬁne I |= B(a, b) iff I |= EB(a, b), where EB is a regular expression equivalent to B. A conjunctive regular path query (CRPQ) is a PRPQ which does not use . It is well-known that the PRPQ ϕ is equivalent to a disjunction q1 . . . qn of CRPQs, where n is exponential in |ϕ|. Given a CRPQ p, we denote with ˆp the equivalent CRPQ obtained from p by replacing every occurrence of r or r , r transitive, with r r or r (r ) , respectively. Let (T, τ) be a consistent Σ-labeled tree. In the appendix, we show the following characterization. Lemma 5. A function π : x Iϕ ΔI(T,τ) with π(a) = [ε]a, for every a Iϕ, is a match for ϕ in I(T,τ) iff there is a qi such that for every B(t, t ) in ˆqi, there is a sequence

(d0, s0), w1, (d1, s1), w2, . . . , wn, (dn, sn),

where (di, si) Δ QB and wi T and such that: (a) s0 = s0B, sn FB, (b) π(t) = [w1]d0, π(t ) = [wn]dn, and (c) for every i [1, n], we have di 1, di ΔIwi , wi [wi 1]di 1 if i > 1, and Iwi |= Bsi 1,si(di 1, di). We will refer to such sequences as witness sequences. The lemma suggests the following approach. In order to check whether ϕ has a match in I(T,τ), the automaton guesses a qi and tries to ﬁnd the witness sequences characterizing a match. For this purpose, Aϕ uses as states triples p, Vl, Vr such that p ˆqi, Ip = , and: Vl and Vr are sets of expressions of the form (d, s) B x and x B (d, s), respectively, where B is the automaton of some atom B(t, t ) in ˆqi, s QB, d Δ, x var(p). Intuitively, when the automaton visits a node w in state p, Vl, Vr , this represents the obligation that each atom B(x, y) in p still has to be processed in the sense that all variables occuring in p will be instantiated in the subtree rooted at w, and

for each (d, s) B x Vl, Aϕ tries to ﬁnd a sufﬁx of the witness sequence for B(t, t ) starting with (d, s), for each x B (d, s) Vr, Aϕ tries to ﬁnd a preﬁx of the witness sequence for B(t, t ) ending with (d, s).

We describe verbally how the automaton Aϕ acts when visiting a node w in state p, Vl, Vr ; the complete transition function is given in the appendix. First, Aϕ nondeterministically chooses a partition S0, . . . , Sk (with Si possibly empty, for all i) of var(p) and values dx ΔIw for all x S0. Intuitively, S0 contains the variables that are to be instantiated in w, and Si contains the variables that are to be instantiated in the subtree rooted at w i. Based on the taken choice, Aϕ determines states pi, V i l , V i r which are then sent to the respective successors i [1, k] of w. Using the parity condition, we enforce that every variable is instantiated after ﬁnitely many of such steps. We demonstrate on several examples how to compute the states pi, V i l , V i r from S0, . . . , Sk and dx for all x S0.

Assume some B(x, y) p with x, y S0. In this case, Aϕ guesses some f FB and veriﬁes (using another set of states) that there is a witness sequence for B(x, y) starting with (dx, s0B) and ending with (dy, sf).

Assume B(x, y) p and x, y Si for some i > 0. In this case, just put B(x, y) into pi.

For an atom B(x, y) p with x S0 and y Si for i > 0, Aϕ guesses an intermediate tuple (d, s), veriﬁes that there is a witness sequence from (dx, s0B) to (d, s) and adds x B (d, s) to V i r .

For the treatment of Vl (Vr is similar), assume (d, s) B x Vl. If x S0, Aϕ veriﬁes that the sequence has a sufﬁx from (d, s) to (dx, sf), for some sf FB. If x Si, i > 0, Aϕ guesses an intermediate pair (d , s ), veriﬁes that there is an inﬁx between (d, s) and (d , s ) and includes (d , s ) B x V i l .

We show in the appendix how to verify the existence of an inﬁx of a witness sequence between two pairs (d, s) and (d , s ) as required in the ﬁrst, third and last item using only exponentially many states. Regarding number of states, observe that there are only exponentially many disjuncts (and thus states) qi and exponentially many states of the form p, Vl, Vr as described. We refer the reader to the appendix for the complete construction and a proof of the following lemma.

Lemma 6. There is a 2ATA Aϕ such that for every (T, τ) L(Acan), we have (T, τ) L(Aϕ) iff I(T,τ) |= q. It can be constructed in exponential time in |ϕ| + |K| and has exponentially in |ϕ| + |K| many states.

5 Discussion and Future Work The obtained results are both of practical and theoretical interest. From the practical point of view, our complexity results and application demands open up the possibility to include a proﬁle based on SQ to OWL 2. Note that there is no increase in the computational complexity in comparison with that of SQ without counting over transitive roles. From the theoretical perspective, our techniques are useful for several future lines of research. First, the unraveling lays the groundwork for studying extensions of SQ with other DL constructors. Second, the technique underlying the query automaton works for standard tree decompositions (it does not

rely on canonicity) of bounded outdegree, even if the width is high (exponential in our case). We thus believe that this technique is useful for query answering in other DLs. Finally, the gained understanding of the model-theoretic characteristics of SQ is an important step towards the development of more practical decision procedures. As future work, we will tackle the following four interesting problems: (i) The data complexity of deciding entailment of PRPQs in SQ. The present techniques give only exponential bounds, but we expect CONP-completeness. (ii) The complexity of deciding entailment of conjunctive queries (CQs) in SQ. The proposed automata-based approach yields the same upper bound for PRPQs or CQs, but we expect it to be easier for CQs. (iii) The complexity of deciding query entailment in generalizations of SQ with role composition or regular expressions on roles; or with nominals and (controlled) inverses. (iv) The complexity of query entailment in SQ over ﬁnite models. Indeed, SQ lacks ﬁnite controlability, that is, query entailment in the ﬁnite does not coincide with unrestricted query entailment:

Example 3. Consider A = , T = { r. }, and ϕ = x r(x, x) for some r Nt R. Clearly, (T , A) |= ϕ, but for every ﬁnite model I of (T , A), we have I |= ϕ.

Acknowledgments The ﬁrst author was funded by EU s Horizon 2020 programme under the Marie Skłodowska-Curie grant 663830, the second one by the FWF project P30360, and the third one by the ERC grant 647289 CODA.

References Baader, F.; Calvanese, D.; Mc Guinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press. Baget, J.; Bienvenu, M.; Mugnier, M.; and Thomazo, M. 2017. Answering conjunctive regular path queries over guarded existential rules. In In Proc. of IJCAI-17, 793 799. Bienvenu, M.; Ortiz, M.; and Simkus, M. 2015. Regular path queries in lightweight description logics: Complexity and algorithms. J. Artif. Intell. Res. (JAIR) 53:315 374. Calvanese, D.; De Giacomo, G.; Lenzerini, M.; and Vardi, M. Y. 2000. Containment of conjunctive regular path queries with inverse. In Proc. of KR-00, 176 185. Calvanese, D.; Eiter, T.; and Ortiz, M. 2009. Regular path queries in expressive description logics with nominals. In Proc. of IJCAI-09, 714 720. Calvanese, D.; Eiter, T.; and Ortiz, M. 2014. Answering regular path queries in expressive description logics via alternating tree-automata. Inf. Comput. 237:12 55. Dogrusoz, U.; Cetintas, A.; Demir, E.; and Babur, O. 2009. Algorithms for effective querying of compound graph-based pathway databases. BMC Bioinformatics 10(1):376. Eiter, T.; Lutz, C.; Ortiz, M.; and Simkus, M. 2009. Query answering in description logics with transitive roles. In Proc. of IJCAI-09, 759 764.

Florescu, D.; Levy, A. Y.; and Suciu, D. 1998. Query containment for conjunctive queries with regular expressions. In Proc. of PODS-98, 139 148. F urer, M. 1980. The complexity of the inequivalence problem for regular expressions with intersection. In Proc. of ICALP-80, 234 245. Glimm, B.; Lutz, C.; Horrocks, I.; and Sattler, U. 2008. Conjunctive query answering for the description logic SHIQ. J. Artif. Intell. Res. (JAIR) 31:157 204. Glimm, B.; Horrocks, I.; and Sattler, U. 2008. Unions of conjunctive queries in SHOQ. In Proc. of KR-08, 252 262. Gr adel, E., and Walukiewicz, I. 1999. Guarded ﬁxed point logic. In Proc. of LICS-99, 45 54. Guti errez-Basulto, V.; Ib a nez-Garc ıa, Y.; and Jung, J. C. 2017a. Number restrictions on transitive roles in description logics with nominals. In Proc. of AAAI-17. Guti errez-Basulto, V.; Ib a nez-Garc ıa, Y.; and Jung, J. C. 2017b. On query answering in description logics with number restrictions on transitive roles. In Proc. of DL-17. Horrocks, I.; Sattler, U.; and Tobies, S. 2000. Practical reasoning for very expressive description logics. Logic Journal of the IGPL 8(3):239 263. Kaminski, M., and Smolka, G. 2010. Terminating tableaux for SOQ with number restrictions on transitive roles. In Proc. of the 6th IFIP TC, 213 228. Kazakov, Y., and Pratt-Hartmann, I. 2009. A note on the complexity of the satisﬁability problem for graded modal logics. In Proc. of LICS-09, 407 416. Kazakov, Y.; Sattler, U.; and Zolin, E. 2007. How many legs do I have? Non-simple roles in number restrictions revisited. In Proc. of LPAR-07, 303 317. Lysenko, A.; Roznov at , I. A.; Saqi, M.; Mazein, A.; Rawlings, C. J.; and Auffray, C. 2016. Representing and querying disease networks using graph databases. Bio Data Mining 9(1):23. Rector, A. L., and Rogers, J. 2006. Ontological and practical issues in using a description logic to represent medical concept systems: Experience from GALEN. In Proc. of RW-06, 197 231. Stefanoni, G.; Motik, B.; Kr otzsch, M.; and Rudolph, S. 2014. The complexity of answering conjunctive and navigational queries over OWL 2 EL knowledge bases. J. Artif. Intell. Res. 51:645 705. Stevens, R.; Aranguren, M. E.; Wolstencroft, K.; Sattler, U.; Drummond, N.; Horridge, M.; and Rector, A. L. 2007. Using OWL to model biological knowledge. International Journal of Man-Machine Studies 65(7):583 594. Vardi, M. Y. 1998. Reasoning about the past with two-way automata. In Proc. of ICALP-98, 628 641. Wolstencroft, K.; Brass, A.; Horrocks, I.; Lord, P.; Sattler, U.; Turi, D.; and Stevens, R. 2005. A little semantic web goes a long way in biology. In Proc. of ISWC-05.