# cardinality_queries_over_dllite_ontologies__98822557.pdf

Cardinality Queries over DL-Lite Ontologies

Meghyn Bienvenu1 , Quentin Mani ere1 and Micha el Thomazo2

1CNRS, University of Bordeaux, Bordeaux INP, La BRI, Talence, France 2Inria, DI ENS, ENS, CNRS, University PSL, Paris, France {meghyn.bienvenu, quentin.maniere}@u-bordeaux.fr, michael.thomazo@inria.fr

Ontology-mediated query answering (OMQA) employs structured knowledge and automated reasoning in order to facilitate access to incomplete and possibly heterogeneous data. While most research on OMQA adopts (unions of) conjunctive queries as the query language, there has been recent interest in handling queries that involve counting. In this paper, we advance this line of research by investigating cardinality queries (which correspond to Boolean atomic counting queries) coupled with DL-Lite ontologies. Despite its apparent simplicity, we show that such an OMQA setting gives rise to rich and complex behaviour. While we prove that cardinality query answering is tractable (TC0) in data complexity when the ontology is formulated in DL-Litecore, the problem becomes co NP-hard as soon as role inclusions are allowed. For DL-Lite H pos (which allows only positive axioms), we establish a P-co NP dichotomy and pinpoint the TC0 cases; for DL-Lite H core (allowing also negative axioms), we identify new sources of co NP complexity and also exhibit L-complete cases. Interestingly, and in contrast to related tractability results, we observe that the canonical model may not give the optimal count value in the tractable cases, which led us to develop an entirely new approach based upon exploring a space of strategies to determine the minimum possible number of query matches.

1 Introduction In ontology-mediated query answering (OMQA) [Poggi et al., 2008; Bienvenu and Ortiz, 2015; Xiao et al., 2018], data is enriched with an ontology, which serves both to provide a user-friendly vocabulary for query formulation and to capture domain knowledge that is exploited at query time to obtain a more complete set of answers. While the OMQA approach offers many advantages, it also makes the query answering task more challenging than plain query evaluation. Indeed, instead of having to evaluate the query over the single explicitly given data instance, one must identify the certain answers, i.e. those holding in all possible situations (models) compatible with the data and the ontology.

A major topic in OMQA research has thus been to understand the complexity of OMQA and identify tractable settings. Nowadays, for the most commonly considered query language, namely, conjunctive queries (CQs), we have an almost complete picture of the complexity landscape for ontologies formulated in a wide range of different description logics (DLs) [Baader et al., 2017] and rule-based languages [Baget et al., 2011; Cal ı et al., 2012]. In particular, it has been shown that CQ answering is tractable in data complexity for ontologies expressed in the most commonly considered dialects of the DL-Lite family [Calvanese et al., 2007; Artale et al., 2009], which are often employed in OMQA. A well-known and frequently used property of such DL-Lite dialects and other Horn DLs is that they admit a canonical model, which is a single (possibly inﬁnite) model that, by virtue of being homomorphically embeddable into every model, is guaranteed to give the correct answers to all CQs. While CQs are a natural and well-studied class of queries, there are many other relevant forms of database queries that could be potentially be employed in OMQA. In the present paper, our focus will be on counting queries, which together with other forms of aggregate queries, are widely used for data analysis, yet still not well understood in the context of OMQA. A natural way to equip CQs with counting is to count the number of distinct query matches for each answer. As the count value may differ between models, Kostylev and Reutter (2015) advocated a form of certain answer semantics that considers lower and upper bounds on the count value across different models. Their work provided the ﬁrst investigation of the complexity of answering counting CQs in the presence of ontologies, revealing such queries to be much more challenging to handle than plain CQs: co NP-complete in data complexity for the well-known DL-Litecore and DL-Lite H core dialects. A recent work by Bienvenu et al. (2020) reﬁned and generalized the complexity results from Kostylev and Reutter to a wider class of counting queries and identiﬁed a restricted scenario with very low (TC0-complete) data complexity: rooted CQs coupled with DL-Litecore ontologies. A similar tractability result for connected rooted CQs was proven independently by Calvanese et al. (2020a), who also initiated a study of the impact of other restrictions on query shape and developed the ﬁrst query rewriting procedure for counting CQs. Notably, both the aforementioned TC0 result and the rewriting procedure crucially relied upon showing

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

that the canonical model gives the right answers under the considered restrictions. We brieﬂy mention two alternative approaches to counting queries: an epistemic semantics for aggregate queries (which only counts query matches over the data constants, ignoring unnamed elements) was explored by Calvanese et al. (2008), while another very recent study by Feier et al. (2021) classiﬁes the complexity of counting the number of certain answers (rather than the number of ways a certain answer is obtained) for guarded existential rules. While recent studies have improved our understanding of the complexity of counting CQs, there nevertheless remain many unanswered questions. In this paper, we focus on Boolean atomic counting queries of the form z.A(z) and z1, z2.R(z1, z2), which we term cardinality queries as they correspond to the natural task of determining (bounds on) the cardinality of a given concept or role name. The data complexity of answering such basic counting queries remains completely open for DL-Litecore ontologies, whilst for DL-Lite H core, the problem is known to be P-hard and in co NP [Calvanese et al., 2020a]. The main results of our investigation are displayed in Table 1. We show that when ontologies are expressed in DL-Litecore, cardinality query answering is tractable in data complexity and enjoys the lower possible complexity (TC0-complete). For cardinality queries based upon a concept atom, TC0 membership holds even for the fragment of DL-Lite H core obtained by disallowing negative role inclusions. By contrast, for role cardinality queries, we show that co NP-hard situations arise in DL-Lite H pos, which allows only positive concept and role inclusions. In fact, we obtain a complete data complexity classiﬁcation for DL-Lite H pos, showing that every ontology-mediated query is either TC0complete, co NP-complete, or is in P and logspace-equivalent to the complement of PERFECT MATCHING (whose precise complexity is a longstanding open problem). The preceding classiﬁcation does not extend to DL-Lite H core: we identify new sources of co NP-hardness and further exhibit Lcomplete cases. We ﬁnd it intriguing that such complex behaviour arises in what appears at ﬁrst glance to be a simple OMQA setting. Moreover, in all of the tractable cases we identify, the canonical model may not yield the minimum cardinality, and query answering involves solving non-trivial optimization problems. This led us to devise an entirely new approach based upon exploring a space of strategies to ﬁnd the optimal way of merging witnesses for existential axioms. The paper is organized as follows. Section 2 recalls relevant background material and presents the considered OMQA setting. Section 3 introduces strategies and uses them to establish TC0 membership. Our complexity classiﬁcation for DL-Lite H pos is the topic of Section 4, while Section 5 presents our results for DL-Lite H core. Section 6 concludes with a brief discussion of related and future work. An appendix with full proofs can be found in the long version of this paper, available on ar Xiv.

2 Preliminaries

We recall standard deﬁnitions and notation for OMQA in DLLite and introduce the particular setting studied in this paper.

Concept Role

DL-Litecore TC0-c TC0-c

DL-Lite H pos TC0-c TC0-c | co-PM-c | co NP-c

DL-Lite H core TC0-c | L-c TC0-c | L-c | co NP-c | ? | co-PM-c | co NP-c | ?

Table 1: Data complexity of cardinality queries based upon concept and role atoms for various DL-Lite dialects. : upper bound holds for all DL-Lite H core ontologies without negative role inclusions.

Knowledge Bases. We assume mutually disjoint sets NC of concept names (unary predicates), NR of role names (binary predicates), and NI of individual names (constants). We denote by N R the set NR {R | R NR} of role names and their inverses. A knowledge base (KB) K = (T , A) consists of an ABox (dataset) A and a TBox (ontology) T . An ABox is a ﬁnite set of concept assertions A(b) (with A NC, b NI) and role assertions P(a, b) (with P NR, a, b NI), while the TBox consists of a ﬁnite set of axioms, whose forms are dictated by the considered description logic. In this paper, our focus will be on DL-Lite H core (alternatively referred to as DL-Lite R), which is the logic underlying the OWL 2 QL proﬁle. DL-Lite H core TBoxes contain four types of axioms: positive concept inclusions B1 B2, negative concept inclusions B1 B2, positive role inclusions R1 R2, and negative role inclusions R1 R2, where the Bi and Ri are positive concepts and roles given by:

Bi := A | Ri Ri : = P | P (A NC, P NR)

The sublogic DL-Litecore allows only concept inclusions (which may be either positive or negative), while DL-Lite H pos is restricted to positive (concept and role) inclusions. We denote by Ind(A) the set of individuals occurring in an ABox A. A signature is a ﬁnite set of concept and role names. Given a signature Σ, we denote by Σ C (resp. Σ R ) the set of positive concepts (resp. roles) built from Σ. The signature of a TBox T (resp. ABox A) is the set of concept and role names it contains, denoted sig(T ) (resp. sig(A)). To simplify the presentation, we will assume w.l.o.g. that sig(A) sig(T ). Semantics of KBs. An interpretation takes the form I = ( I, I), where I is a non-empty set (called the domain) and I is the interpretation function that maps each A NC to AI I, each P NR to PI I I, and each a NI to a I. In this paper, we will make the Standard Names Assumption by setting a I = a. Note however that our results only rely upon the weaker Unique Names Assumption (UNA), which stipulates that a I = b I whenever a = b. The UNA is commonly adopted for DL-Lite KBs and enables more interesting reasoning in the context of counting queries. The function I is extended to general concepts and roles as follows: (P )I = {(e, d) | (d, e) PI}, ( R)I = {d | (d, e) RI}, and ( G)I = I \GI. An inclusion G H is satisﬁed in I if GI HI; an assertion A(b) (resp. P(a, b)) is satisﬁed in I if b AI (resp. (a, b) PI). An interpretation is a model of a TBox T (resp. ABox A)) if it satisﬁes all axioms in T (resp. assertions in A), and it is a model of a KB K = (T , A) if it is a model of both T and A. A KB

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

(a) Initial portion of the canonical model of Ke.

(b) Another model of Ke.

(c) Interpretation of strategy σe.

Figure 1: Models of the example KB Ke. For readability, we have omitted concepts and highlighted the role S from the cardinality query.

is satisﬁable if it has at least one model. An inclusion (resp. assertion) Φ is entailed from T (resp. K), written T |= Φ (resp. K |= Φ), if Φ is satisﬁed in every model of T (resp. K). We use K |= R(a) (resp. K |= R(a, b) with R N R ) to indicate a RI (resp. (a, b) RI) for every model I of K. Example 1. As a running example, we will consider the KB Ke = (Te, Ae) whose TBox contains the following inclusions A1 T1 A2 T2 T 1 S R 1 R 2 B1 R1 B2 R2 R 1 S R 1 T 1 T 2 S S S R 2 S

and whose ABox contains the assertions {A1(a1), A2(a2), B1(b1), B2(b2), R1(a1, a2), S(b2, b1)} Two ﬁnite models of Ke are displayed in Figures 1b and 1c. Canonical Model. Every satisﬁable DL-Lite H core KB K = (T , A) has a canonical model CK, deﬁned as follows. The domain of CK contains Ind(A) and all words a R1 . . . Rn, with a Ind(A), Ri N R , and n 1, such that: K |= R1(a) and there is no R1(a, b) A; for 1 i < n, T |= R i Ri+1 and R i = Ri+1. Concept and role names are interpreted as follows: ACK = {a Ind(A) | K |= A(a)}

{a R1 . . . Rn CK \ Ind(A) | T |= R n A}

PCK = {(a, b) | P(a, b) A} {(e1, e2) | e2 = e1R and T |= R P}

{(e2, e1) | e2 = e1R and T |= R P } We use gen K to refer to the set of generated roles, i.e. those R N R such that CK contains an element w R. Example 2. An initial portion of (the inﬁnite) canonical model of Ke is displayed in Figure 1a. Observe that gen K = {S, S , R1, R2, T1, T2}. It is well known (see e.g. [Calvanese et al., 2007]) that, for every model I of K, there is a homomorphism from CK to I, i.e. a function f : CK I such that (i) f(a) = a for all a Ind, (ii) e ACK implies f(e) AI, and (iii) (d, e) PCK implies (f(d), f(e)) PI.

Cardinality Queries. A cardinality query is either a concept cardinality query z.C(z) or a role cardinality query z1, z2.S(z1, z2). Throughout the paper, we use q C (resp. q S) as a shorthand for the cardinality query based upon C (resp. S). A match for a cardinality query q C (resp. q S) in an interpretation I is an element of CI (resp. SI). We deﬁne the answer to a cardinality query q in an interpretation I, denoted q I, as the number of matches of q in I, or equivalently, as the cardinality of FI, with F the concept or role name in q. A certain answer to q w.r.t. K is an interval [m, M] N N such that q I [m, M] for every model I of K. Example 3. Consider the role cardinality query q S. The answer to q S is + in CKe, 6 in the model from Figure 1b, and 5 in the model from Figure 1c. The latter implies that [6, + ] is not a certain answer. We leave it is an exercise to ﬁnd a model with 3 matches and show there is no model with fewer matches, which means that [m, + ] is a certain answer to q S over Ke if and only if m 3. Cardinality queries as deﬁned above correspond to a special case of the counting queries considered in [Kostylev and Reutter, 2015; Bienvenu et al., 2020; Calvanese et al., 2020a]. Observe that since DL-Lite H core cannot restrict the size of models, the value M in a certain answer [m, M] must be + whenever the query predicate F is satisﬁable w.r.t. T (i.e. there is a model I of T such that FI = ). For this reason, we assume the latter condition holds and focus on identifying certain answers of the form [m, + ]. Complexity. We will be interested in classifying the complexity of the following problem:

OMQA(q, T ): Given A and an integer m 1 (in binary), decide whether [m, + ] is a certain answer to q w.r.t. K.

where (q, T ) is an ontology-mediated query (OMQ) based upon a cardinality query q and a TBox T formulated in DL-Lite H core or one of its sublogics. Note that we are adopting the data complexity measure as (q, T ) is ﬁxed. Beyond well-known complexity classes such as P and co NP, we will refer to the following classes: TC0 is the class of problems solvable by families of constant-depth

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

polynomial-size circuits based upon AND, OR, NOT, and threshold gates, and L (resp. NL) is the class of problems solvable in deterministic (resp. nondeterministic) logarithmic space. It is known that: TC0 L NL ... P co NP.

3 Tractable Cases In this section, we identify two settings in which cardinality queries can be answered with the lowest possible complexity: Theorem 1. OMQA(q, T ) is TC0-complete if either (i) q is a role cardinality query and T a DL-Litecore TBox, or (ii) q is a concept cardinality query and T is a DL-Lite H core TBox without negative role inclusions. The remainder of this section is devoted to establishing TC0 membership for case (i) where our query is q S = z1, z2.S(z1, z2). A similar but simpler argument can be used for the membership half of case (ii), while TC0hardness is easily shown by reduction from the TC0-complete NUMONES problem [Aehlig et al., 2007] asking, given a binary string X and k 1, whether X contains at least k 1-bits. Existing proofs of sub-polynomial data complexity for restricted classes of counting queries rely on the canonical model minimizing the number of matches [Bienvenu et al., 2020; Calvanese et al., 2020a]. However, for the class of cardinality queries, the canonical model may not yield the minimum value. Therefore, we develop a different approach based upon a systematic exploration of a set of models that is guaranteed to contain an optimal model and whose size depends only on the TBox. This special set of models will be induced from strategies that dictate how to merge elements of the canonical model. To show such models contain the optimal value, we show that if we extract a strategy σ from an arbitrary model I and consider any model J induced by σ, then J has at most as many matches as the initial model I. We now formalize this approach. In order to abstract from speciﬁc ABox individuals, we introduce types. Deﬁnition 1. A type for a TBox T is a subset of sig(T ) C . The set of all types is ΘT = 2sig(T ) C . We denote by θK(d) the type of a domain element d w.r.t. K and deﬁne it by: θK(d) = B sig(T ) C | K |= B(d) if d Ind(A), else θK(d) = . Example 4. In our running example, θKe(a1) = {A1, R1, T1} and θKe(α) = (since α Ind(Ae)). Strategies indicate for each generated role R the type onto which all elements w R should merge. Several copies of a type might be required to comply with negative inclusions (e.g. R1 and R2 associated to the same type but T |= R 1 R 2 ). Deﬁnition 2. A strategy σ for the KB K is a function from gen K to ΘT {1, . . . , sig(T ) R }, that satisﬁes:

1. R gen K : σ(R) = (t, i) B t T |= R B.

2. R1, R2 gen K : σ(R1)=σ(R2) T |= R 1 R 2 . 3. t ΘT , if t = , then |{i | R gen K, σ(R) = (t, i)}| |{a | a Ind(A) θK(a) = t}|. Conditions 1 and 2 ensure that merging will not violate any negative inclusions. Condition 3 ensures the ABox provides at least as many individuals of a non-empty type as the strategy requires copies of this type.

Example 5. The following mapping σe is a strategy for Ke:

T1 7 ( , 1) R2 7 ({B1, R1, S, S }, 1) T2 7 ( , 2) S 7 ( , 2) R1 7 ( , 2) S 7 ({A1, R1, T1}, 1)

To construct a model from a strategy σ, the basic idea is to merge elements w R with an element of type σ(R), with the latter selected according to a choice of well-typed elements: Deﬁnition 3. A mapping ch : gen K Ind(A) { i | i = 1, . . . , sig(T ) R }, is a choice of well-typed elements for σ over K if it satisﬁes the following conditions: 1. R gen K, i such that σ(R) = (θK(ch(R)), i) 2. R1, R2 gen K, ch(R1) = ch(R2) σ(R1) = σ(R2). Example 6. The function che, deﬁned as below, is a choice of well-typed elements for σe over Ke:

T1 7 1 T2 7 2 R1 7 2 R2 7 b1 S 7 2 S 7 a1

It turns out however that when R = S or R = S , it is useful to depart from this guideline in order to reduce the number of query matches, as this stand-alone example illustrates: Example 7. Consider T = {A S, B S } and A = {A(a1), A(a2), B(b1), B(b2)}. If we merge a1S with a2S, and b1S with b2S , then there will be at least three matches of q S, no matter which further merges are performed. However, by pairing a1 with b1 and a2 with b2, we can obtain a model with only two matches: (a1, b1), (a2, b2). The next three deﬁnitions serve to identify the critical elements for which such a pairing operation is useful. Deﬁnition 4. We set D+ K = a | a Ind(A) a S CK

and D K = a | a Ind(A) a S CK .

Deﬁnition 5. Given a strategy σ, we set D+ σ = {R | R dom(σ) \ {S, S } T |= R S S / t if σ(R) = (t, k)} and D σ = {R | R dom(σ)\{S, S } T |= R S S / t if σ(R) = (t, k)}. Deﬁnition 6. Let ch be a choice of well-typed elements for σ. We set crit+ = D+ K ch(D+ σ ) and crit = D K ch(D σ ) and use critical elements to refer to the elements of these sets. Example 8. For σe and che as deﬁned in Examples 5 and 6, we have crit+ = {a2, b1, 1, 2} and crit = {a2, 2}. Intuitively, a pairing matches critical elements from crit+ (which require an outgoing S) with those from crit (which require an incoming S). Deﬁnition 7. A pairing for ch and σ consists of two partial functions p+ : crit+ crit and p : crit crit+ such that one of the functions is total and injective, and the other is its partial inverse. Example 9. A pairing for che and σe is given by p+ e = {a2 7 a2, b1 7 2} and p e = {a2 7 a2, 2 7 b1}. We are now ready to deﬁne the interpretation of a strategy. Deﬁnition 8. Consider a strategy σ, choice of well-typed elements ch, and pairing (p+, p ) for ch. For every R sig(T ) R , pick a function s R that maps every individual in

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

{a | K |= R(a, b) for some b NI} to an individual s R(a) such that K |= R(a, s R(a)). Deﬁne function χ as follows:

CK Ind(A) { i | i = 1, . . . , sig(T ) R } a 7 a

( s S(χ(w)) if s S(χ(w)) is deﬁned p+(χ(w)) else if p+(χ(w)) is deﬁned ch(S) otherwise

s S (χ(w)) if s S (χ(w)) is deﬁned p (χ(w)) else if p (χ(w)) is deﬁned ch(S ) otherwise w R 7 ch(R)

The interpretation of σ (according to ch, (p+, p ) and the s R) has domain χ( CK) and interpretation function χ CK. Example 10. With choice che and pairing (p+ e , p e ), we get χ(b2R2) = ch(R2) = b1, χ(b2R2S) = p+ e (b1) = 2, and χ(b2R2S ) = s S (b1) = b2 (observe that on our example, the function s S is uniquely deﬁned, and the same is true for the other roles). Figure 1c displays the interpretation of σe. Observe that the interpretation of a strategy σ depends not only on σ but also on the functions ch, p+, p , s R. Importantly, however, the key property of such interpretations (stated in Lemma 1 later in this section) holds for any particular choice of these functions. It remains to prove that a model minimizing the number of matches can be found among the interpretations of strategies. The ﬁrst step is to to extract a strategy from a model. Deﬁnition 9. Let I be a model of K, f : CK I be a homomorphism, and repr be a function mapping each role R gen K to an element with shape w R from CK. Then P = {P1, . . . , Pk}, deﬁned by

{P1, . . . , Pk} = {(f repr) 1(w) | w I} \ { }

is a partition of gen K. The strategy extracted from I (for f and repr) is deﬁned as:

gen K ΘT {1, . . . , sig(T ) R } R 7 ((θK f repr)(R), i) with R Pi

Example 11. In our running example, there is a unique homomorphism fe from CKe to the model displayed in Figure 1b. Let repre be:

T1 7 a1T1 R2 7 b2R2 T2 7 a2T2 S 7 b1SSS R1 7 b1R1 S 7 a2S

The strategy extracted from this model (for fe and repre) is the strategy provided in Example 5. By applying the next lemma to a model I having the fewest possible number of matches, we obtain the desired conclusion: there is a model minimizing the number of matches among the models obtained by interpreting a strategy. Lemma 1. Let I be a model of K, and J an interpretation of a strategy extracted from I. J is a model of K and q J S q I S.

We now sketch how to construct a family of TC0 circuits (one for each size of ABox) to decide OMQA(q S, T ). Each such circuit ﬁrst computes the set gen K and the type of each

ABox individual. Next, for each function ϱ : gen K ΘT {1, . . . , sig(T ) R } satisfying Conditions 1 and 2 of Deﬁnition 2, the circuit decides whether ϱ is a strategy for K (i.e. Condition 3 holds), and if so, computes the number of matches of q S in interpretations induced by ϱ. Importantly, this can be done without actually building interpretations: in the appendix we give an explicit formula for this number and show it can be computed with a TC0 circuit. Moreover, the number of strategies depends only on |T |, so is constant w.r.t. data complexity. Finally, the circuit computes the minimum value across strategies and compares it with the input number.

4 Complexity Classiﬁcation for DL-Lite H pos

In this section, we consider DL-Lite H pos TBoxes. We show that co NP-hard OMQs exist and prove a complexity trichotomy which precisely delineates the tractability boundary. We begin by exhibiting a co NP-complete1 situation. Example 12. OMQA(q S, {B R1, R1 S, R 1 R2, R2 S}) is co NP-complete. We consider the NPcomplete SET COVER problem: given a set U, set of subsets S 2U whose union is U, and number k, decide whether there exists a k-cover, i.e. a subset C of S with |C| k whose union is U. We prove that there exists a k-cover iff [Σs S|s| + k + 1, + ] is not a certain answer on the following ABox: {B(u) | u U} {S(u, s) | u s, s S}. Intuitively, from a k-cover C, we obtain a countermodel in which role R1 contains pairs (u, s) such that u s and s C, and there is one outgoing R2 role from each s C. The following deﬁnition abstracts the preceding example. Deﬁnition 10. A TBox T admits a propagation of role W by a concept B sig(T ) C and roles R1, R2 if T entails {B R1, R1 W, R 1 R2, R2 W}. A propagation of S (or S ) is not sufﬁcient to ensure co NPhardness: the reduction sketched in Example 12 will fail in the presence of interferences , which can be of three types. Deﬁnition 11. A role U interferes with the propagation of W by B, R1, R2 if it satisﬁes one of the following conditions: 1. T |= {B U, U W, U W }; 2. T |= { W U, U W} and either T |= U W or T |= R2 W ; 3. if B = T and T W, then T |= { T U, U W} and either T |= U W or T |= R2 W . Remarkably, the existence of a propagation without any interfering role (which we call a non-trivial propagation) ensures co NP-hardness, while its absence ensures that OMQA(q S, T ) is in P. We further distinguish two tractable cases, depending on the existence of a non-trivial pairing. Deﬁnition 12. A TBox T admits a non-trivial pairing of S if there exist B sig(T ) C and R sig(T ) R such that

T |= B R T |= R S T |= R S T |= S S

and if B = T, then either T |= T S or T |= T S .

1A P upper bound for atomic counting queries in DL-Lite H pos erroneously appears in Table 1 of [Calvanese et al., 2020a], but was corrected in a later ar Xiv version [Calvanese et al., 2020b].

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

To formulate our trichotomy result, we recall that a matching in a graph (V, E) is a set of edges that are pairwise vertexdisjoint. The PERFECT MATCHING problem (abbreviated to PM) asks whether there exists a matching such that every vertex is incident to one of its edges. Despite being the focus of intensive research, its exact complexity remains open: in P [Edmonds, 1965] and NL-hard [Chandra et al., 1984].

Theorem 2. Let T be a DL-Lite H pos TBox. OMQA(q S, T ) is co NP-complete if T admits a non-trivial propagation of either S or S , is L-equivalent to the complement of PM if it does not admit such a non-trivial propagation but admits a non-trivial pairing of S, and is in TC0 otherwise.

Proof sketch. The co NP-hardness proof generalizes the reduction sketched in Example 12. If there is a non-trivial pairing (but no non-trivial propagation), we show that, up to trivial cases solvable in TC0, the existence of a model with few matches is equivalent to the existence of a large matching between critical individuals. This yields L-equivalence with the MAXIMUM MATCHING decision problem, which is L-equivalent to the better-known PM problem [Rabin and Vazirani, 1989]. TC0 membership is proven by case analysis, where we exhibit for each case a model with an optimal (and easily computable) number of matches.

5 First Look at DL-Lite H core We now turn to DL-Lite H core and exhibit new situations that are not captured by the preceding complexity classiﬁcation. First, we observe that negative concept and role inclusions introduce two new sources of co NP-hardness.

Theorem 3. For T = {B U, U S, C V, V S, U V }, OMQA(q S, T ) is co NP-complete.

Proof sketch. Let (U, S, k) be an instance of SET COVER, and consider the ABox A = {B(u) | u U} {S(u, s ) | u s, s S} {C(s) | s S} {S(s, s ) | s S}. It can be shown that no k-cover exists iff every model of (T , A) has at least |S| + P s S |s| + k + 1 matches.

Theorem 4. For T = { B U, U S, U V, V S , V W }, OMQA(q S, T ) is co NP-complete.

Perhaps more surprising, we show that there exist co NPhard OMQs based upon concept cardinality queries.

Theorem 5. For T = {A U, U C, U U , B V, V C, V V , U V }, OMQA(q C, T ) is co NP-complete.

Proof sketch. Hardness is shown by reducing the tautology problem. Three individuals are introduced per propositional variable (one for the variable itself with concept A, two for its possible truth values), as well as one individual per clause (with concept B). Each variable should have a truth value given by U (whose possible values in the ABox are restricted through the use of U ), and each clause should have a falsiﬁed literal given by V (whose possibles values in the ABox are restricted, according to the input formula, with V ). The input formula is a tautology iff every model introduces a new element marked C (as a witness for either U or V).

Moreover, we further show that L-complete OMQs exist. The next result employs a role cardinality query, but a similar result can be obtained using a concept cardinality query. Theorem 6. For T = { B R, R S, R R }, OMQA(q S, T ) is L-complete.

Proof. Hardness is by reduction from the L-complete problem UNDIRECTED FOREST ACCESSIBILITY (UFA) [Cook and Mc Kenzie, 1987], which takes as input an undirected acyclic graph (V, E) with two connected components, vertices s, t V, and asks if t is reachable from s. We set A = {B(u) | u V} {S(u, v) | {u, v} E} {S(s, v ), S(t, v )} {R(s, v ), R(t, v )} and observe that ((V, E), s, t) UFA iff [2|E| + 3, + ] is a certain answer. Indeed, there are 2|E| + 2 matches in the ABox, and a further match arises if we add R-atoms to satisfy B R in a connected component that contains neither s nor t (such a match can be avoided if it contains s or t). For the upper bound, we characterize the minimum number of matches based upon the graph structure of the ABox and show it can be computed in L, by using an oracle for undirected reachability.

Our results imply that, under standard complexity-theoretic assumptions, at least four different complexities are possible for cardinality queries coupled with DL-Lite H core ontologies.

6 Conclusion In this paper, we investigated the complexity of answering cardinality queries in the presence of DL-Lite ontologies. Our study provides several novel insights into the challenge of adopting counting queries in OMQA. On the one hand, we identiﬁed new sources of co NP-hardness, showing that even single-atom counting queries can be difﬁcult to handle (which closes some questions about restricted forms of counting queries left open in [Calvanese et al., 2020a]). On the other hand, we exhibited several settings in which cardinality queries can be answered with (sub-)polynomial data complexity; in particular, the problem is in TC0 when the ontology is formulated in DL-Litecore. Interestingly, our tractability results do not rely on the canonical model yielding the minimum number of matches, but instead involve a sophisticated analysis of how to best merge witnesses for existential axioms. Differently from [Kostylev and Reutter, 2015; Calvanese et al., 2020a; Bienvenu et al., 2020], we conducted our complexity analysis on the level of ontology-mediated queries, and notably obtained a full classiﬁcation of the complexity of OMQs based upon DL-Lite H pos ontologies. We ﬁnd it promising that very low data complexity can be obtained even for settings in which non-trivial optimization is required, and we plan to explore how to extend and adapt our techniques to identify further tractability results for counting queries. Another important topic for future work is to transform our TC0 procedures into more practical algorithms that are suitable for implementation on top of database systems.

Acknowledgements This work was partially supported by ANR project CQFD (ANR-18-CE23-0003).

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

[Aehlig et al., 2007] Klaus Aehlig, Stephen A. Cook, and Phuong Nguyen. Relativizing small complexity classes and their theories. In Proc. of the 21st International Workshop on Computer Science Logic (CSL), pages 374 388, 2007.

[Artale et al., 2009] Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. The DL-Lite family and relations. Journal of Artiﬁcial Intelligence Research (JAIR), 36:1 69, 2009.

[Baader et al., 2017] Franz Baader, Ian Horrocks, Carsten Lutz, and Ulrike Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.

[Baget et al., 2011] Jean-Franc ois Baget, Michel Lecl ere, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Journal of Artiﬁcial Intelligence (JAR), 175(9-10):1620 1654, 2011.

[Bienvenu and Ortiz, 2015] Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with data-tractable description logics. In Tutorial Lectures of the 11th Reasoning Web International Summer School, pages 218 307, 2015.

[Bienvenu et al., 2020] Meghyn Bienvenu, Quentin Mani ere, and Micha el Thomazo. Answering counting queries over DL-Lite ontologies. In Proc. of the 29th International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 1608 1614, 2020.

[Cal ı et al., 2012] Andrea Cal ı, Georg Gottlob, and Thomas Lukasiewicz. A general datalog-based framework for tractable query answering over ontologies. Journal of Web Semantics (JWS), 14:57 83, 2012.

[Calvanese et al., 2007] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR), 39(3):385 429, 2007.

[Calvanese et al., 2008] Diego Calvanese, Evgeny Kharlamov, Werner Nutt, and Camilo Thorne. Aggregate queries over ontologies. In Proc. of the 2nd International Workshop on Ontologies and Information Systems for the Semantic Web (ONISW), pages 97 104, 2008.

[Calvanese et al., 2020a] Diego Calvanese, Julien Corman, Davide Lanti, and Simon Razniewski. Counting query answers over a DL-Lite knowledge base. In Proc. of the 29th International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 1658 1666, 2020.

[Calvanese et al., 2020b] Diego Calvanese, Julien Corman, Davide Lanti, and Simon Razniewski. Counting query answers over a DL-Lite knowledge base (extended version). ar Xiv:2005.05886v3, 2020.

[Chandra et al., 1984] Ashok K. Chandra, Larry J. Stockmeyer, and Uzi Vishkin. Constant depth reducibility. SIAM Journal on Computing, 13(2):423 439, 1984.

[Cook and Mc Kenzie, 1987] Stephen A. Cook and Pierre Mc Kenzie. Problems complete for deterministic logarithmic space. Journal of Algorithms, 8(3):385 394, 1987. [Edmonds, 1965] Jack Edmonds. Paths, trees and ﬂowers. Canadian Journal of Mathematics, 17:449 467, 1965. [Feier et al., 2021] Cristina Feier, Carsten Lutz, and Marcin Przybylko. Answer counting under guarded TGDs. In Proc. of the 24th International Conference on Database Theory (ICDT), 2021. [Kostylev and Reutter, 2015] Egor V. Kostylev and Juan L. Reutter. Complexity of answering counting aggregate queries over DL-Lite. Journal of Web Semantics (JWS), 33:94 111, 2015. [Poggi et al., 2008] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. Journal of Data Semantics, 10:133 173, 2008. [Rabin and Vazirani, 1989] Michael O. Rabin and Vijay V. Vazirani. Maximum matchings in general graphs through randomization. Journal of Algorithms, 10(4):557 567, 1989. [Xiao et al., 2018] Guohui Xiao, Diego Calvanese, Roman Kontchakov, Domenico Lembo, Antonella Poggi, Riccardo Rosati, and Michael Zakharyaschev. Ontologybased data access: A survey. In Proc. of the 27th International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 5511 5519, 2018.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)