# querying_attributed_dllite_ontologies_using_provenance_semirings__4b3c228f.pdf

The Thirty-Third AAAI Conference on Artiﬁcial Intelligence (AAAI-19)

Querying Attributed DL-Lite Ontologies Using Provenance Semirings

Camille Bourgaux T el ecom Paris Tech & DI ENS, CNRS, ENS, PSL University & Inria, France

Ana Ozaki KRDB Research Centre, Free University of Bozen-Bolzano, Italy

Attributed description logic is a recently proposed formalism, targeted for graph-based representation formats, which enriches description logic concepts and roles with ﬁnite sets of attribute-value pairs, called annotations. One of the most important uses of annotations is to record provenance information. In this work, we ﬁrst investigate the complexity of satisﬁability and query answering for attributed DL-Lite R ontologies. We then propose a new semantics, based on provenance semirings, for integrating provenance information with query answering. Finally, we establish complexity results for satisﬁability and query answering under this semantics.

Introduction Description logic (DL) (Baader et al. 2007) ontologies allow to express complex relationships between concepts and roles, but they are ill-equipped to represent and reason about multiple and heterogeneous types of meta-knowledge, such as the temporal validity of a fact, or its source. For instance, the YAGO ontology (Hoffart et al. 2013) attaches provenance metadata to its facts (e.g., source and conﬁdence of the extraction) as well as temporal and geospatial information. Many practical applications therefore use knowledge graphs, which consist, like DL assertions, of directed labelled graphs but that also allow, unlike DLs, to add annotations to vertices and edges. Property Graph, the data model used in many graph databases (Rodriguez and Neubauer 2010), and Wikidata, the knowledge graph used by Wikipedia (Vrandeˇci c and Kr otzsch 2014), are prominent examples of such labelled graphs. To bridge the gap between DL and knowledge graphs, attributed description logics (Kr otzsch et al. 2017; Kr otzsch et al. 2018) have been recently introduced. They enrich DL concepts and roles with ﬁnite sets of attribute-value pairs, called annotations, and allow to express constraints on these annotations in the ontology inclusions. For example, the attributed DL assertion spouse(taylor, burton)@[start : 1975, end : 1976] states that Liz Taylor was married to Richard Burton from 1975 to 1976, and the following role inclusion expresses that spouse is a symmetric relation, where the inverse statement has the same start and end dates:

spouse@X spouse @[start : X.start, end : X.end].

Copyright c 2019, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

While the work by Kr otzsch et al. studied the complexity of the satisﬁability problem for several attributed DL languages, our focus in this paper is on query answering in attributed DL. The problem of querying DL ontologies using database-style queries (in particular, conjunctive queries) is an important research topic for which tractable DL languages have been tailored (Bienvenu and Ortiz 2015). We consider here the DLLite R dialect of the DL-Lite family (Calvanese et al. 2007), which underlies the OWL 2 QL proﬁle (Motik et al. 2009), and investigate attributed DL-Lite R. One of the main motivations of attributed DLs is to integrate annotations carrying provenance information, which are very frequent in knowledge graphs1. Recording and tracking provenance information is an important topic in database theory, where provenance semirings (Green, Karvounarakis, and Tannen 2007) were introduced as an abstract tool to relate the result of a query with information about the original sources of the data and the ways in which the query was obtained. Such information comes in the form of a provenance polynomial. It has been useful for many applications, such as query answer explanation or querying of probabilistic databases (Senellart 2017; Cheney, Chiticariu, and Tan 2009; Suciu et al. 2011). Bienvenu, Deutch, and Suchanek (2012) argued that provenance would be useful for Web data, e.g., to establish the authorship or determine the trust in a given piece of data, or to help to guarantee the privacy of information. Provenance has also been investigated for non-relational databases and Semantic Web (see Conclusion for discussion of related work). In this work, we propose a new semantics for the attributed DL annotations, based on provenance semirings, so that queries can be annotated with provenance polynomials. To the best of our knowledge, this is the ﬁrst work where provenance polynomials are embedded into both the syntax and the semantics of the query. The ﬁrst section introduces attributed DL-Lite R, following the formalism given by Kr otzsch et al. (2017; 2018). We then deﬁne attributed conjunctive queries and study the complexity of satisﬁability and query answering in attributed DL-Lite R. We next present our new semantics for the annotations to model provenance and analyse the complexity of satisﬁability and query answering with this new model, considering

1E.g., in Wikidata reference (provenance) is one the most frequent types of annotations https://www.wikidata.org.

queries that can be annotated with provenance polynomials. In particular, we show that satisﬁability and query answering in attributed DL-Lite R are PSPACE-complete problems. For the semirings-based semantics and queries annotated with provenance polynomials, we establish that although satisﬁability is EXPTIME-hard in the general case, the new semantics does not increase the complexity of query answering if the ontology contains only assertions and a restricted form of inclusions, which is close to the database setting considered by Green, Karvounarakis, and Tannen (2007). We also investigate various restrictions of the general setting. Our results are for combined complexity, when both the query and the ontology are considered as the input. Proofs are available in our technical report (Bourgaux and Ozaki 2018).

Attributed DL-Lite Attributed DLs are deﬁned over the usual DL signature with countable sets of concept names NC, role names NR, and individual names NI. We consider an additional set NU of set variables and a set NV of object variables. Annotation sets are deﬁned as ﬁnite binary relations, understood as sets of attribute-value pairs. Attributes and values refer to domain elements and are syntactically denoted by individual names. To describe annotation sets, we use speciﬁers. The set S of speciﬁers contains the following expressions:

set variables X NU;

closed speciﬁers [a1: v1, . . . , an: vn]; and

open speciﬁers a1: v1, . . . , an: vn ,

where ai NI and vi is either an individual name in NI, an object variable in NV, or an expression of the form X.a, with X a set variable in NU and a an individual name in NI. We use X.a to refer to the (ﬁnite, possibly empty) set of all values of attribute a in an annotation set X. A ground speciﬁer is a closed or open speciﬁer that only contains individual names. Intuitively, closed speciﬁers deﬁne speciﬁc annotation sets whereas open speciﬁers merely provide lower bounds (Kr otzsch et al. 2017).

Syntax. A DL-Lite R @ role (resp. concept) assertion is an expression R(a, b)@S (resp. A(a)@S), with R NR (resp. A NC), a, b NI, and S S a ground closed speciﬁer. DL-Lite R @ role and concept inclusions are of the form X : S (P Q) and X : S (B C) respectively, where X NU, S S is a closed or open speciﬁer, and P, Q and B, C are respectively role and concept expressions deﬁned by the following syntax, where A NC, R NR and S S:

P ::= R@S | R @S, Q ::= P | P, B ::= A@S | P, C ::= B | B.

We further require that all variables are safe. For set variables, this means that if Y NU occurs on the right side of an inclusion (or in a speciﬁer S such that the preﬁx of the inclusion is X : S and X occurs on the right side), then the speciﬁer of the left side expression is Y . For object variables, if they occur on the right side of an inclusion then they must also occur on the left side or in S such that X : S and X occurs on the left. Note that if object variables occur in S with X : S in

the preﬁx and X on the right side, then X is the speciﬁer on the left by the safety deﬁnition. If the preﬁx of an inclusion is X : S and X does not occur in the role/concept expressions of the inclusion, we may ommit X : S. A DL-Lite R @ ontology is a set of DL-Lite R @ assertions, role and concept inclusions. Also, we say that a DL-Lite R @ ontology is ground if it does not contain variables. To simplify notation, we omit the speciﬁer (meaning any annotation set ) in role or concept expressions. In this sense, any DLLite R axiom is also a DL-Lite R @ axiom. Moreover, we omit preﬁxes of the form X : , which state that there is no restriction on X. The size of an ontology O (or a query, deﬁned later), which we may denote with |O|, is the length of the string that represents it. Example 1. Our running example s ontology Oex expresses that those who are married (role spouse) to someone are married (concept Married), annotated with the same sources from which the information has been extracted (attribute src): spouse@X Married@ src : X.src . The assertion states that Zsa Zsa Gabor was married to Jack Ryan and it is annotated with the sources of this information: spouse(gabor, ryan)@[src:s1, src:s2].

Semantics. An interpretation I = ( I, I) of an attributed DL consists of a non-empty domain I and a function I. Individual names a NI are interpreted as elements a I I. To interpret annotation sets, we use the set ΦI := {Σ I I | Σ is ﬁnite } of all ﬁnite binary relations over I. Each concept name A NC is interpreted as a set AI I ΦI of elements with annotations, and each role name R NR is interpreted as a set RI I I ΦI of pairs of elements with annotations. Each element (pair of elements) may appear with multiple different annotations. I satisﬁes a concept assertion A(a)@[a1: v1, . . . , an: vn] if (a I, {(a I 1, v I 1 ), . . . , (a I n, v I n)}) AI. Role assertions are interpreted analogously. Expressions with free set or object variables are interpreted using variable assignments Z mapping object variables x NV to elements Z(x) I and set variables X NU to ﬁnite binary relations Z(X) ΦI. For convenience, we also extend variable assignments to individual names, setting Z(a) = a I for every a NI. A speciﬁer S S is interpreted as a set SI,Z ΦI of matching annotation sets. We set XI,Z := {Z(X)} for variables X NU. The semantics of closed speciﬁers is deﬁned as:

[a: v]I,Z := {{(a I, Z(v))}} where v NI NV;

[a: X.b]I,Z := {{(a I, δ) | (b I, δ) Z(X)}};

[a1: v1, . . . , an: vn]I,Z := {Sn i=1 Fi |Fi [ai: vi]I,Z}.

SI,Z therefore is a singleton set for set variables and closed speciﬁers. For open speciﬁers, however, we deﬁne a1: v1, . . . , an: vn I,Z to be the set:

{F ΦI | F G for {G} = [a1: v1, . . . , an: vn]I,Z}. Now given A NC, R NR, and S S, we deﬁne:

(A@S)I,Z := {δ | (δ, F) AI for some F SI,Z},

(R@S)I,Z:={(δ, ϵ) | (δ, ϵ, F) RI for some F SI,Z}.

Further DL expressions are deﬁned as usual: (R @S)I,Z = {(γ, δ) | (δ, γ) (R@S)I,Z}, P I,Z = ( I I)\P I,Z, P I,Z = {δ | there is (δ, ϵ) P I,Z}, CI,Z = I\CI,Z. I satisﬁes a concept inclusion X : S (B C) if, for all variable assignments Z that satisfy Z(X) SI,Z, we have BI,Z CI,Z. Satisfaction of role inclusions is deﬁned analogously. An interpretation I satisﬁes an ontology O, or is a model of O, if it satisﬁes all of its axioms. As usual, |= denotes the induced logical entailment relation. Example 2 (Example 1 cont d). Consider an interpretation I with domain I = {gabor, ryan, src, s1, s2} and such that I maps each individual name to itself and

spouse I = {(gabor, ryan, {(src, s1), (src, s2)})}

Married I = {(gabor, {(src, s1), (src, s2)})}.

The interpretation I is a model of Oex.

Reasoning in DL-Lite R @ In this section, we study the complexity of satisﬁability and query answering over DL-Lite R @ ontologies. Our ﬁrst result is that the satisﬁability problem, which is in NL for DL-Lite R (Artale et al. 2009), is harder for DL-Lite R @. The proof is by reduction from the word problem for polynomially space bounded deterministic Turing Machines (DTM). Annotations raise the complexity because they can encode conﬁgurations of a DTM, using expressions of the form X.b to encode the synchronization of successive conﬁgurations. Theorem 1. In DL-Lite R @, satisﬁability is PSPACE-hard. To prove the PSPACE upper bound for satisﬁability, we use grounding (Kr otzsch et al. 2017), which is a classical technique that consists in eliminating variables from an ontology to transform it into an equisatisﬁable ground ontology. The ground ontology can then be translated into an equisatisﬁable DL-Lite R ontology. The grounding leads to an exponential blowup of the ontology while the translation to DL-Lite R is polynomial. Since satisﬁability of DL-Lite R ontologies is in NL (Artale et al. 2009), it follows (by (Savitch 1970)) that satisﬁability of DL-Lite R @ ontologies is in PSPACE.

Theorem 2. In DL-Lite R @, satisﬁability is in PSPACE. We now turn our attention to the problem of querying DLLite R @ ontologies. In the following we only deﬁne and deal with conjunctive queries without free variables, i.e., boolean conjunctive queries (BCQ), as the problem of ﬁnding certain answers to a query is reducible to BCQ entailment. Deﬁnition 1 (Attributed Queries). An attributed boolean conjunctive query (BCQ@) q is an expression of the form:

x.X1 : S1, . . . , Xn : Sn ϕ(x) (1)

where, for 1 i n, Xi are the set variables occurring in ϕ(x), Si S, and ϕ(x) is a conjunction of atoms of the form A(t)@S or R(t, u)@S, with A NC, R NR, S S, and t, u individual names in NI or variables in x NV. We may write E(t)@S to refer to an atom of any of the two forms (E NC NR and t is a tuple of elements from NI x of the arity of E). An interpretation I = ( I, I) satisﬁes a BCQ@ q, written I |= q, if there exists a variable assignment Z such that:

Z(Xi) SI,Z i for all 1 i n; and (Z(t), F) EI for some F SI,Z, for every atom E(t)@S occurring in q. A BCQ@ q is entailed by O, written O |= q, iff q is satisﬁed by every model of O. A BCQ@ that consists of a single atom is an attributed boolean atomic query (BAQ@). We say that a BCQ@ is ground if it contains only ground speciﬁers.

BCQ@ can express conditions on annotations, for instance require that there exists an annotation set where a given attribute is present or has a speciﬁc value. Example 3. We modify Oex to express that those who have a spouse are married, associated with the same annotations:

spouse@X Married@X.

We also add assertions stating that Zsa Zsa Gabor was married to Jack Ryan from 1975 to 1976, while Liz Taylor was married to Richard Burton from 1975 to 1976, as well as the sources of this information:

spouse(gabor, ryan)@[start:1975, end:1976, src:s1], spouse(gabor, ryan)@[start:1975, end:1976, src:s2], spouse(taylor, burton)@[start : 1975, end : 1976, src : s3].

The following query expresses that Gabor and Taylor were married (to someone) with the same start and end dates:

qex = xy Married(gabor)@ start : x, end : y Married(taylor)@ start : x, end : y .

By the semantics of DL-Lite R @, it follows that Oex |= qex. This other query expresses that a set of sources that includes s1 and is associated with Gabor s married status is also associated with Taylor s married status:

q ex = X : src : s1 Married(gabor)@X Married(taylor)@ src : X.src .

By the semantics of DL-Lite R @, it follows that Oex |= q ex. While BCQ entailment is NP-complete in DL-Lite R, it follows from Theorem 1 that BAQ@ entailment is already PSPACE-hard. Indeed, satisﬁability can be reduced to BAQ@ entailment: O is unsatisﬁable iff O |= A(a) where A and a are respectively a concept and an individual name that do not occur in O. We show PSPACE-completeness of BCQ@ entailment by describing how to decide O |= q for a BCQ@ q, using only polynomial space w.r.t. the size of O and q. The main ingredients to prove our result are grounding, translation to DL-Lite R, and also query rewriting, a prominent query answering technique for DL-Lite R in which the query is rewritten w.r.t. the concept and role inclusions, to be evaluated over the assertions as in the classical database setting. However, as the ground version of O is of exponential size and the number of rewritten queries is exponential, we do not compute them but instead guess the DL-Lite R translation dl(q Z) of a grounded version q Z of q together with one of its rewritings q . We can verify in NP that q is entailed by the DL-Lite R translation of the assertions of O, in PTIME that dl(q Z) is the DL-Lite R translation of a grounded version of q,

and in PSPACE that q is indeed a rewriting of dl(q Z). For this last step, we propose a non-deterministic adaptation of the rewriting algorithm Perfect Ref for DL-Lite R by Calvanese et al. (2007) that takes as input dl(q Z), q and O. The main idea is to rewrite dl(q Z) by guessing at each step an atom of the query together with a positive inclusion that would appear in the DL-Lite R translation of the grounding of O, thus avoiding the computation of the grounding of O. Theorem 3. In DL-Lite R @, BCQ@ entailment is in PSPACE. The result of Theorem 3, which is for combined complexity, contrasts with the EXPTIME-hardness w.r.t. data complexity (only w.r.t. the data size) for MARPL, an attributed logic based on Datalog (Marx, Kr otzsch, and Thost 2017). Finally, we show lower complexity bounds in the case of ground ontologies. Indeed, when O is ground, one can build a DL-Lite R ontology of polynomial size w.r.t. the size of O that entails the DL-Lite R translation of a grounded version of q if and only if O |= q. Theorem 4. For ground DL-Lite R @ ontologies, satisﬁability is in PTIME and BCQ@ entailment is NP-complete.

Querying Using Provenance Semirings In this section, we investigate attributed DL in light of provenance semirings (Green, Karvounarakis, and Tannen 2007) and enhance the semantics of DL-Lite R @ to deal with provenance information. Semirings generalize formalisms such as why-provenance, lineages used in view maintenance, or the lineage used by the Trio uncertain management system (Senellart 2017). The main motivation is to use annotations to answer questions such as Where does the result come from? . Assuming that facts are annotated with their sources, we want to know which combinations of sources lead to the entailment of a query. Such annotations may represent various types of information, such as trust, probability, multiplicity or data classiﬁcation (see Example 8). Example 4 (Example 3 cont d). The result of the query qex over the ontology Oex can be obtained from source s3 together with any of s1, s2. Provenance semirings can formalize this information in the form of a provenance polynomial:

(s1 + s2) s3.

The intuitive meaning is that + corresponds to alternative use of data and to joint use of data. The goal of this section is to embed the formalism of provenance semirings into the semantics of DL-Lite R @, so that we can associate annotations using provenance polynomials to queries (e.g., associate the annotation src : (s1 + s2) s3 to the query qex of Example 3). We deﬁne DL-Lite R @,K as an order-sorted version of DLLite R @. Elements of different sorts correspond to sets of individual names NI, provenance sums NS and provenance polynomials NP. We represent provenance polynomials with the positive algebra provenance semiring for NI, deﬁned as the commutative semiring of polynomials with variables in NI and coefﬁcients from N, with operations deﬁned as usual: K = (N[NI], +, , 0, 1). We denote by NP the set of polynomials of K and by NS the subset of NP containing the sums of the commutative monoid (N[NI], +, 0). We thus have

NI NS NP. We may use the symbols P and Q to denote sum and product of elements in NP (which will then also be in NP). Elements of NS are used as values in the ontology speciﬁers while elements of NP appear as values in the query speciﬁers. Non-linear polynomials indicate the use of several assertions to derive a query, while provenance sums indicate that a query can be derived from different sources. Role and concept inclusions in DL-Lite R @,K are deﬁned similarly as in DL-Lite R @, with the only difference that we allow elements from NS to be values of attributes in the speciﬁers. Concept and role assertions are deﬁned as in DL-Lite R @. The fact that we do not allow values from NS in the assertions does not change the expressivity of DL-Lite R @,K, since inclusions can enforce the entailments of such assertions. Example 5. The following concept inclusion restricts that of Example 3 by further requiring that the fact that someone has a spouse has to be associated both with s1 and with s2 to conclude that this person is married.

X : src : s1 + s2 ( spouse@X Married@X)

Provenance Semantics. We now introduce the semantics of DL-Lite R @,K, based on provenance sums. A provenanceinterpretation I = ( I, I) is such that I maps polynomials a and b in NP to the same element a I = b I in I if and only if they are mathematically equal2. We denote by I I the domain of individuals and by I S the domain of provenance sums, which are the subsets of I corresponding to the image of elements in NI and NS, respectively. Thus I I I S I. To capture the semantics of provenance sums we develop a notion of closure. Intuitively, if a fact is annotated with n sources then it should also be annotated with the sum of any subset of these sources, since the fact can be retrieved alternatively by any source from this subset. For example, assume (a, F1), . . . , (a, Fn) are in the interpretation of a concept or a role name E. If there is (src I, s I i ) in each Fi and these annotation sets only differ by such pairs, then for each subset of {s1, . . . , sn}, the interpretation of E should have (a, Fs) with Fs differing from Fi only by the pair (src I, s I), where s is the sum of the elements of the subset. We say that G, H ΦI are differentiated by p in F if

F = G \ {(p, a) | (p, a) G} = H \ {(p, b) | (p, b) H}.

In this case, we denote by G +p H the set

F {(p,(a + b)I)|{a, b} NP,(p, a I) G, (p, b I) H}.

A sum of possibly more than two annotation sets differentiated by p may be denoted by Pp 1 i n Gi and is unique by the commutative law. For E NC NR, and a a tuple of the arity of E, we say that G +p H is non-primitive for a and EI if {(a, G), (a, H)} EI. We denote by Ep I,a,F the set of annotation sets G pairwise differentiated by p in F ΦI such that (a, G) EI with G primitive for a and EI. Deﬁnition 2 (Closure of EI). EI is closed under sum if for all tuples a (in I or I I according to the arity of E),

2According to associative, commutative and distributive laws. E.g., (a + b)I = (b + a)I by the commutative law.

{(a, Pp G σ G) | σ Ep I,a,F , σ = } EI for every p I and every F ΦI. A provenance-interpretation I = ( I, I) is well-founded if EI is closed under sum for all E NC NR. For all E NC NR and a with elements in I, we also require that the support of EI and a deﬁned as {F | (a, F) EI} is ﬁnite. This ensures that the sum in Deﬁnition 2 is ﬁnite. An interpretation of DL-Lite R @,K is a well-founded provenanceinterpretation. We denote by SS the set of speciﬁers deﬁned in the same way as S except that we use NS instead of NI when deﬁning values of attributes. The semantics of speciﬁers in SS is deﬁned as expected following the deﬁnition given in the Section Attributed DL-Lite and we use the same notions of satisﬁability and entailment. In Deﬁnition 2 we consider all subsets of Ep I,a,F rather than the sum of its elements. This is to ensure monotonicity of DL-Lite R @,K. Otherwise, given for example A(a)@[p : a] and A(a)@[p : b] we would lose the entailment A(a)@[p : a + b] by adding A(a)@[p : c]. Example 6. Consider the ontology O with the assertions spouse(gabor, ryan)@[src : s1], spouse(gabor, ryan)@[src : s2] and the concept inclusion of Example 5. Let I have domain I = {gabor, ryan, src, s1, s2, s1 + s2}, interpret each individual name by itself, (s1 + s2)I = s1 + s2, and

spouse I = {(gabor, ryan, G), (gabor, ryan, H), (gabor, ryan, G +src H)}

Married I = {(gabor, G +src H)} where G = {(src, s1)}, H = {(src, s2)} and G +src H = {(src, s1 + s2)}.

spouse I and Married I are closed under sum, I is a model of O and O |= spouse(gabor, ryan)@ src : s1 + s2 . We denote by SP the set of speciﬁers deﬁned in the same way as SS except that we use NP instead of NS for the values of attributes. The semantics of speciﬁers in SP is as expected from the Section Attributed DL-Lite . We assume that all polynomials occurring in a speciﬁer in SP are of the form Σ1 i n1Π1 j n2ai,j, where all ai,j NI. Given an interpretation I = ( I, I) and {F, G} ΦI, let F G be:

{(p,(a b)I)|{a, b} NP,(p, a I) F, (p, b I) G}.

Unlike +p, is not parameterized by an attribute because products combine different information, whereas sums represent alternative ways of obtaining the same information (i.e., tuple plus the same other attribute-value pairs). A product of annotation sets may be denoted by Q

1 i n Gi. We next deﬁne semiring attributed queries, which allow a ground speciﬁer to be associated to the whole conjunction of atoms. Deﬁnition 3 (Semiring Attributed Queries). A semiring attributed boolean conjunctive query (BCQ@,K) is an expression of the form:

x.X1 : S1, . . . , Xn : Sn (ϕ(x))@S,

where S is a ground speciﬁer in SP, for 1 i n, Xi NU are the set variables occurring in ϕ(x) and Si SS, and

1 j m Ej(tj)@Tj

where for 1 j m, Tj SS, Ej NC NR and tj is a tuple of elements from NI x. If S = , we say that the BCQ@,K is plain. Given a BCQ@,K q, let q be the BCQ@ that results from removing the outer speciﬁer from q. Let I = ( I, I) be an interpretation and let νI(q ) be the set of all variable assignments Z that fuﬁll the conditions of Deﬁnition 1 for I |= q . I satisﬁes q, written I |= q, if there is a non-empty χ νI(q ) such that: 1. for any Z, Z χ, there exists X NU occurring in q such that Z(X) = Z (X) or there exists x x such that Z(x) = Z (x); 2. for each Z χ and 1 j m, we have that (Z(tj), F Z j ) EI,Z j for some F Z j T I,Z j ;

3. there is p I and G ΦI such that all HZ = Q

1 j m F Z j with Z χ are differentiated by p in G, and Pp Z χ HZ SI,Z. Essentially, Deﬁnition 3 says that: (1) there are different variable assignments which (2) satisfy the homomorphic conditions and (3) correspond to the interpretation of the outer speciﬁer. Our semiring attributed queries can be easily extended so that the outer speciﬁer has fresh and free object variables. In this case the answer to the query would be the set of provenance polynomials related with the respective attribute and the query. Semiring attributed queries can be used to query a DL-Lite R @,K ontology using provenance polynomials, as we illustrate with the following example. Example 7 (Example 3 cont d). We now modify qex, so that we impose provenance constraints on the result: xy (Married(gabor)@ start: x, end: y Married(taylor)@ start: x, end: y )@ src: γ where γ is the polynomial (s1 s3) + (s2 s3)

By the semantics of DL-Lite R @,K, it follows that Oex |= qex. All shared attributes are taken into account when combining the annotations, while the non-shared attributes are irrelevant and lost in the product. Example 8. The query (Married(a) Married(b))@S with S = src: s1 s2, classif: public conﬁd, mult: 2 3 is entailed by {Married(a)@[src: s1, classif: public, mult: 2], spouse(b, c)@[src: s2, classif: conﬁd, mult: 3, time: t]} and the inclusion of Example 3. The fact that a and b are both married is obtained by combining sources s1 and s2, and by having access to both public and conﬁdential information. Note that using inclusions to propagate annotations allows the query derived from assertions with multiplicities 2 and 3 to have multiplicity 2 3, as it would be under the bag semantics (Nikolaou et al. 2017). When interpreted over provenance-interpretations, ontologies in the DL-Lite R @ fragment of DL-Lite R @,K (i.e., without sums) can entail queries with sums, as in Example 9. Example 9. Let O be the DL-Lite R @ ontology {A(a)@[p : a], A(a)@[p : b], A@X R@X}. Then the query xy(R(x, y)@[p : a+b])@ follows from O only under the semirings-based semantics.

Reasoning in DL-Lite R @,K Unfortunately, Theorem 5 shows that provenance sums increase the complexity of the satisﬁability problem. The proof is by reduction from the word problem for a polynomially space bounded Alternating Turing Machine (ATM) which is EXPTIME-hard (Chandra, Kozen, and Stockmeyer 1981).

Theorem 5. In DL-Lite R @,K, satisﬁability is EXPTIME-hard.

The hardness result of Theorem 5 holds even for DLLite R @,K ontologies without expressions of the form P, where P is a role expression. Motivated by this negative result, we investigate restricted cases for query answering. We ﬁrst show that for the class of DL-Lite R @,K ontologies which do not contain inclusions with expressions of the form P on the right side, we can check the entailment of BCQ@,K via a transformation to ground and plain BCQ@,Ks. Given such a DL-Lite R @,K ontology O, one can translate a BCQ@,K q into a set of ground and plain BCQ@,Ks gr plain(O, q) such that O |= q iff there is some qgp gr plain(O, q) that is entailed by an equisatisﬁable ground ontology. We can assume w.l.o.g. that if Ej(tj)@Tj occurs in q then Tj NU: if Tj is a speciﬁer one can always replace it by a fresh X NU and add X : Tj to the preﬁx of q, that is:

q = x. X1 : S1, . . . , Xm : Sm (

1 j m Ej(tj)@Xj)@S.

Assume NI does not occur in O nor in q and let NPmin be a ﬁxed but arbitrary minimal subset of NP such that for each a NP, NPmin contains an element b such that a is mathematically equal to b. Let I be a DL-Lite R @,K interpretation with domain I = NPmin and such that a I = a for every a NPmin. We say that a variable assignment Z is compatible with q if Z(Xj) SI,Z j , 1 j m. Let q be the result of removing the outer speciﬁer from q. Given a compatible Z, a Z-image V

1 j m Ej(tj)@Tj of q is obtained by:

replacing each Xj with Tj = [a: b | (a, b) Z(Xj)];

replacing each object variable x by Z(x);

if occurs in some Tj, replacing by Tj, where Tj is the set of attribute-value pairs in Tj that do not contain .

Given a ground speciﬁer T, let FT := {(a I, b I) | a : b occurs in T} ΦI. We deﬁne gr plain(O, q) as the set of ground plain BCQ@,Ks:

1 j m Ej(ti j)@Si j ))@

where the annotation sets Fi = Q

1 j m FSi j (1 i n) are such that there exists p such that (i) the Fi are differentiated by p in some annotation set, (ii) each Fi contains some (p, a) with a NP, and (iii) Pp 1 i n Fi SI. Also, V

1 j m Ej(ti j)@Si j is a Z-image of q with attribute-value pairs built from elements of NS. By construction, qgp does not contain variables.

Example 10 (Example 3 cont d). The query below is a ground and plain version of the query in Example 7 which is

entailed by Oex.

Married(gabor)@[start : 1975, end : 1976, src : s1 + s2] Married(taylor)@[start : 1975, end : 1976, src : s3].

One can show that, for DL-Lite R @,K ontologies O without expressions of the form P on the right side of inclusions, O |= q iff there is qgp gr plain(O, q) such that Ogr |= qgp, where Ogr is an equisatisﬁable ground ontology, obtained in a way similar to our construction of gr plain(O, q) but imposing that the image of the variable assignments is over a ﬁnite set of individual names deﬁned in terms of O. In the case where O is ground, we further have a polynomial bound on the size of such qgp. Lemma 1. Let q be a BCQ@,K and let O be a ground DL-Lite R @,K ontology without expressions of the form P on the right side of inclusions. O |= q iff there is qgp gr plain(O, q) such that (i) Ogr |= qgp, (ii) the size of qgp is polynomial in |q| and |O| and (iii) deciding qgp gr plain(O, q) is in PTIME. Lemma 1 does not hold for arbitrary DL-Lite R @,K ontologies, as illustrated by Example 11. Example 11. Let O be the DL-Lite R @,K ontology {A R, R A@[p : b], R B, B(a), A(a)@[p : b]}. Then, O entails q = x(A(x)@ )@ p : b + b , since there would be an R-successor in the anonymous part of the model, but there is no qgp gr plain(O, q) such that O |= qgp. We now use the polynomial bound in Lemma 1 to show an upper bound for a fragment, called simple, where we only allow inclusions of the form E1@S E2@T, with E1 and E2 concept/role names and S and T ground speciﬁers. We establish the complexity of BCQ@,K entailment from simple ontologies. This case is close to the classical problem of query answering over databases, considered by Green, Karvounarakis, and Tannen (2007). Theorem 6 states that this complexity remains the same as in the database case. Theorem 6. BCQ@,K entailment from a simple DL-Lite R @,K ontology is NP-complete.

Proof. Let O be a simple DL-Lite R @,K ontology. We ﬁrst show that one can decide in NP whether E(a)@S is entailed from O, where S is a ground speciﬁer. Claim 1. Deciding whether O |= E(a)@S is in NP. Proof of Claim 1 We ﬁrst guess the set Q of all atomic queries of the form E0(a)@T0 entailed by O such that E0@T0 occurs in O and an ordering for the entailment of such queries. If T0 is an open speciﬁer then replace it in Q by T0, , deﬁned as the ground closed speciﬁer containing all attribute-value pairs in T0 plus S : S with S the set of attribute-value pairs in T0. We make the usual assumption that individual names of the form S do not occur in O and E(a)@S. Denote by Qq the subset of Q containing all atomic queries which preceed q in the ordering. For each guessed query q = E0(a)@T0:

Denote by FT the set {(a, b) | a : b occurs in T} for any ground speciﬁer T and let E0(a)@S1, . . . , E0(a)@Sn be the assertions and atomic queries in O Qq where E0 and a occur.

Guess a tree of annotation sets rooted either in FT0 if T0 is a closed speciﬁer, or in a superset F of FT0 if T0 is an open speciﬁer, where each non-leaf node F is the parent of children G1, . . . , Gm such that F = Pp 1 i m Gi, for some attribute p, and such that each leaf is either: one of FS1, . . . , FSn, or some FT (or FT if T is open) such that there exist E1@T1 E0@T and E1(a)@T1 (or E1(a)@T1, if T1 is open) in O Qq.

Check in polynomial time whether the trees satisfy the described conditions. The size of Q (and so the number of trees to guess and the size of the ordering) is bounded by the number of atomic queries E0(a)@T0 that can be built from concept/role expressions and individual names in O, so it is polynomial in the size of O. To check whether O |= E(a)@S, we check whether E(a)@S Q (assuming w.l.o.g. that E@S occurs in O). The size of each guessed tree is polynomial in the size of O since each leaf corresponds to an assertion/atomic query in O or Q (or an assertion/atomic query in O or Q together with an inclusion in O) and they do not repeat in the tree. Thus, one can decide whether O |= E(a)@S in NP. By Lemma 1, O |= q iff there exists qgp gr plain(O, q) such that Ogr |= qgp. Moreover the size of qgp is polynomial in the size of q and O and qgp does not contain variables. We thus get the NP upper bound by guessing qgp as well as certiﬁcates that Ogr |= E(a)@S for each E(a)@S in qgp, using Claim 1 (indeed, Ogr is also a simple ontology and is of polynomial size w.r.t. O). The lower bound comes from the complexity of BCQ entailment in relational databases.

One of the difﬁculties in showing Theorem 6 for arbitrary DL-Lite R @,K ontologies is that one can express that elements in the anonymous part of the model are distinct, as illustrated in Example 11, and then our translation does not hold. In this case, gr plain(O, q) needs to include queries with inequalities to distinguish anonymous elements, and entailment of BCQs with inequalities over DL-Lite R ontologies easily leads to undecidability (e.g., see Theorem 13 in (Guti errez Basulto et al. 2015)). We now show an upper bound for satisﬁability in DLLite R @,K by translating the ontology into an equisatisﬁable ontology in a DL that we call DL-Lite R, Horn, which extends DL-Lite R with conjunctions on the left side of concept and role inclusions. Our translation is double-exponential since in DL-Lite R @,K we need to ensure, e.g., that elements in the extension of E@[src: s1] and E@[src: s2] are also in the extension of E@[src: s1 + s2]. Theorem 7. In DL-Lite R @,K, satisﬁability is in 2EXPTIME.

Sketch. We ﬁrst ground the ontology and then translate it into DL-Lite R, Horn. We encode the semantics of provenance sums using a double-exponential number of concept and role inclusions with conjunctions on the left side. Since satisﬁability in DL-Lite R, Horn is in PTIME (Artale et al. 2015) (Theorem 14), the 2EXPTIME upper bound follows.

We next analyse entailment of plain BCQ@,K w.r.t. DLLite R @,K ontologies: the outer speciﬁer is of the form but

inner speciﬁers can contain provenance sums (as in Ex. 9). We use the fact that BCQ entailment in DL-Lite R, Horn is in NP (Cal ı, Gottlob, and Pieris 2012, proof of Theorem 3.3).

Theorem 8. In DL-Lite R, Horn, BCQ entailment is in NP.

Theorem 9 establishes an upper bound for plain queries.

Theorem 9. In DL-Lite R @,K, entailment of plain BCQ@,K is in N2EXPTIME.

Sketch. The proof uses the translation to DL-Lite R, Horn which leads to a double-exponential blowup of the ontology. Here, since queries are plain the translation is as for BCQ@s. The result then follows from Theorem 8.

We investigated the complexity of satisﬁability and query answering in attributed DL-Lite R, for both the semantics introduced by Kr otzsch et al. (2017) and a new semantics based on provenance semirings, which allows to embed provenance polynomials into the query. In particular, we show that these problems are PSPACE-complete for the classical semantics and that in the case of simple ontologies, even query answering under the semirings-based semantics has the same complexity as query answering in DL-Lite R. However, satisﬁability of general DL-Lite R @,K ontologies is EXPTIME-hard.

Related Work. Our attributed ontology language differs from DL-Lite A (Calvanese et al. 2006), which allows to associate values to individuals or pairs of individuals, rather than to assertions, through binary or ternary relations called attribute concepts or attribute roles. In particular, while we can use the same attribute name to annotate different assertions about the same individual or pair of individuals, it would be ambiguous in DL-Lite A. For instance, we can express that Liz Taylor was married to Richard Burton from 1964 to 1974 and from 1975 to 1976 with spouse(taylor, burton)@[start : 1964, end : 1974], spouse(taylor, burton)@[start : 1975, end : 1976], while in DL-Lite A we would need reiﬁcation. The query spouse(taylor, burton)@[start : x, end : y] that returns the start and end dates of the marriages would be more complex (namely, e.g., z spouse1(z, taylor) spouse2(z, burton) start(z, x) end(z, y)). Another difference is the use in DLLite A of two distinct alphabets and interpretation domains for the individuals and the values, following the distinction made in OWL between objects and values. Regarding provenance, the topic has been extensively studied for relational databases (Cheney, Chiticariu, and Tan 2009), but has also drawn attention in other settings, e.g., for Datalog (Deutch et al. 2014), Datalog+/ (Lukasiewicz et al. 2014), and Semantic Web data, with numerous works proposing provenance models based on semirings for the evaluation of SPARQL queries over annotated RDF, see e.g., (Theoharis et al. 2011; Zimmermann et al. 2012; Geerts et al. 2016). In particular, Zimmermann et al. consider the possibility of having several annotations with different domains (fuzzy, temporal and provenance) and introduce an annotated version of SPARQL that manipulates explicitly

annotations, while most work on provenance only implicitly propagates provenance annotations.

Future Work. Our next step will be the study of the data complexity and the design of practical algorithms for querying attributed DL-Lite ontologies. In particular, we would like to extend the classical DL-Lite rewriting approach to the attributed setting to avoid grounding the ontology. For instance, if an ontology only contains inclusions of the form E@X F@X, the rewriting algorithm for DL-Lite R could be adapted to rewrite an attributed query where annotations sets are propagated in the rewriting process (e.g., Married(gabor)@ start : 1975 in Example 3 could be rewritten into y spouse(gabor, y)@ start : 1975 ).

Acknowledgements. We thank Stefan Borgwardt and an anonymous reviewer for pointing us to useful references. Partially supported by ANR-16-CE23-0007-01 ( DICOS ).

References Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The DL-Lite family and relations. J. Artif. Intell. Res. 36:1 69. Artale, A.; Kontchakov, R.; Ryzhikov, V.; and Zakharyaschev, M. 2015. Tractable interval temporal propositional and description logics. In Proceedings of AAAI. Baader, F.; Calvanese, D.; Mc Guinness, D.; Nardi, D.; and Patel-Schneider, P., eds. 2007. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, second edition. Bienvenu, M., and Ortiz, M. 2015. Ontology-mediated query answering with data-tractable description logics. In Reasoning Web, Tutorial Lectures, 218 307. Bienvenu, M.; Deutch, D.; and Suchanek, F. M. 2012. Provenance for Web 2.0 data. In Proceedings of Secure Data Management: 9th VLDB Workshop, SDM 2012. Bourgaux, C., and Ozaki, A. 2018. Querying attributed DL-Lite ontologies using provenance semirings. Technical Report 02, Free University of Bozen-Bolzano. Available at https://www.inf.unibz.it/krdb/KRDB%20ﬁles/tech-reports/ KRDB18-02.pdf. Cal ı, A.; Gottlob, G.; and Pieris, A. 2012. Towards more expressive ontology languages: The query answering problem. Artif. Intell. 193:87 128. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; Poggi, A.; and Rosati, R. 2006. Linking data to ontologies: The description logic DL-Lite A. In Proceedings of the OWLED*06 Workshop on OWL: Experiences and Directions. Calvanese, D.; Giacomo, G. D.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3):385 429. Chandra, A. K.; Kozen, D. C.; and Stockmeyer, L. J. 1981. Alternation. J. of the ACM 28(1):114 133. Cheney, J.; Chiticariu, L.; and Tan, W. C. 2009. Provenance in databases: Why, how, and where. Foundations and Trends in Databases 1(4):379 474.

Deutch, D.; Milo, T.; Roy, S.; and Tannen, V. 2014. Circuits for datalog provenance. In Proceedings of ICDT. Geerts, F.; Unger, T.; Karvounarakis, G.; Fundulaki, I.; and Christophides, V. 2016. Algebraic structures for capturing the provenance of SPARQL queries. J. ACM 63(1):7:1 7:63. Green, T. J.; Karvounarakis, G.; and Tannen, V. 2007. Provenance semirings. In Proceedings of PODS. Guti errez-Basulto, V.; Ib a nez Garc ıa, Y.; Kontchakov, R.; and Kostylev, E. V. 2015. Queries with negation and inequalities over lightweight ontologies. Web Semant. 35(P4):184 202. Hoffart, J.; Suchanek, F. M.; Berberich, K.; and Weikum, G. 2013. YAGO2: A spatially and temporally enhanced knowledge base from wikipedia. Artif. Intell. 194:28 61. Kr otzsch, M.; Marx, M.; Ozaki, A.; and Thost, V. 2017. Attributed description logics: Ontologies for knowledge graphs. In Proceedings of ISWC. Kr otzsch, M.; Marx, M.; Ozaki, A.; and Thost, V. 2018. Attributed description logics: Reasoning on knowledge graphs. In Proceedings of IJCAI. Lukasiewicz, T.; Martinez, M. V.; Predoiu, L.; and Simari, G. I. 2014. Information integration with provenance on the semantic web via probabilistic datalog+/-. In URSW 20112013, Revised Selected Papers. Marx, M.; Kr otzsch, M.; and Thost, V. 2017. Logic on MARS: Ontologies for generalised property graphs. In Proceedings of IJCAI. Motik, B.; Cuenca Grau, B.; Horrocks, I.; Wu, Z.; Fokoue, A.; and Lutz, C., eds. 2009. OWL 2 Web Ontology Language: Proﬁles. W3C Recommendation. Available at http://www. w3.org/TR/owl2-proﬁles/. Nikolaou, C.; Kostylev, E. V.; Konstantinidis, G.; Kaminski, M.; Grau, B. C.; and Horrocks, I. 2017. The bag semantics of ontology-based data access. In Proceedings of IJCAI. Rodriguez, M. A., and Neubauer, P. 2010. Constructions from dots and lines. Bulletin of the American Society for Information Science and Technology 36(6):35 41. Savitch, W. J. 1970. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences 4(2):177 192. Senellart, P. 2017. Provenance and probabilities in relational databases. SIGMOD Record 46(4):5 15. Suciu, D.; Olteanu, D.; R e, C.; and Koch, C. 2011. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers. Theoharis, Y.; Fundulaki, I.; Karvounarakis, G.; and Christophides, V. 2011. On provenance of queries on semantic web data. IEEE Internet Computing 15(1):31 39. Vrandeˇci c, D., and Kr otzsch, M. 2014. Wikidata: A free collaborative knowledgebase. Commun. ACM 57(10). Zimmermann, A.; Lopes, N.; Polleres, A.; and Straccia, U. 2012. A general framework for representing, reasoning and querying with annotated semantic web data. J. Web Sem. 11:72 95.