# provenance_for_the_description_logic_elhr__f1e53b25.pdf Provenance for the Description Logic ELHr Camille Bourgaux1 , Ana Ozaki2,4 , Rafael Pe naloza3 and Livia Predoiu4 1DI ENS, ENS, CNRS, PSL University & Inria, Paris, France 2University of Bergen, Norway 3University of Milano-Bicocca, Italy 4Free University of Bozen-Bolzano, Italy camille.bourgaux@ens.fr, ana.ozaki@uib.no, rafael.penaloza@unimib.it, livia.predoiu@unibz.it We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering. 1 Introduction Description logics (DLs) are a well-known family of firstorder logic fragments in which conceptual knowledge about a particular domain and facts about specific individuals are expressed in an ontology, using unary and binary predicates called concepts and roles [Baader et al., 2007a]. Important reasoning tasks performed over DL ontologies are axiom entailment, i.e. deciding whether a given DL axiom follows from the ontology; and query answering. Since scalability is crucial when using large ontologies, DLs with favorable computational properties have been investigated. In particular, the EL language and some of its extensions allow for axiom entailment in polynomial time, and conjunctive query entailment in NP [Baader et al., 2005; Baader et al., 2008a]. Many real-world ontologies, including SNOMED CT, use languages from the EL family, which underlies the OWL 2 EL profile of the Semantic Web standard ontology language. In many settings it is crucial to know how a consequence e.g. an axiom or a query has been derived from the ontology. In the database community, provenance has been studied for nearly 30 years [Buneman, 2013] and gained traction when the connection to semirings, so called provenance semirings [Green et al., 2007; Green and Tannen, 2017] was discovered. Provenance semirings serve as an abstract algebraic tool to record and track provenance information; that is, to keep track of the specific database tuples used for deriving the query, and of the way they have been processed in the derivation. Besides explaining a query answer, provenance has many applications like: computing the probability or the degree of confidence of an answer, counting the different ways of producing an answer, handling authorship, data clearance, or user preferences [Senellart, 2017; Suciu et al., 2011; Lukasiewicz et al., 2014; Ives et al., 2008]. Semiring provenance has drawn interest beyond relational databases (e.g. [Buneman and Kostylev, 2010; Zimmermann et al., 2012; Deutch et al., 2014; Ramusat et al., 2018; Dannert and Gr adel, 2019]), and in particular has recently been considered for ontology-based data access, a setting where a database is enriched with a DL-Lite R ontology and mappings between them [Calvanese et al., 2019]. In the latter, the ontology axioms are annotated with provenance variables. Queries are then annotated with provenance polynomials that express their provenance information. Example 1. Consider the facts mayor(Venice, Brugnaro) and mayor(Venice, Orsoni), stating that Venice has mayors Brugnaro and Orsoni, annotated respectively with provenance information v1 and v2, and the DL axiom ran(mayor) Mayor, expressing that the range of the role mayor is the concept Mayor, annotated with v3. The query x.Mayor(x) asks if there is someone who is a mayor. The answer is yes and it can be derived using ran(mayor) Mayor together with any of the two facts, interpreting x by Brugnaro or Orsoni. This is expressed by the provenance polynomial v1 v3 + v2 v3. Intuitively, expresses the joint use of axioms in a derivation path of the query, and + the alternative derivations. We adapt the provenance semantics of Calvanese et al. for the ELHr variant of EL, extending it to those ELHr axioms that do not occur in DL-Lite R. It turns out that handling the conjunction allowed in ELHr axioms is not trivial. To obtain models from which we can derive meaningful provenanceannotated consequences, we adopt -idempotent semirings and a syntactic restriction on ELHr (preserving the expressivity of full ELHr when annotations are not considered). After introducing the basic definitions and the semantics for DL ontologies and queries annotated with provenance information, we present a completion algorithm and show that it solves annotated axiom entailment and instance queries in ELHr in polynomial time in the size of the ontology and polynomial space in the size of the provenance polynomial. We then show that we can compute the set of relevant provenance variables Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) for an entailment in polynomial time. Finally, we investigate conjunctive query answering. Note that the query answering methods developed by Calvanese et al. cannot be extended to ELHr since they rely on the FO-rewritability of conjunctive queries in DL-Lite R, a property that does not hold for ELHr [Bienvenu et al., 2013]. Therefore, we adapt the combined approach for query answering in EL [Lutz et al., 2009] to provenance-annotated ELHr ontologies. Detailed proofs are available in [Bourgaux et al., 2020]. 2 Provenance for ELHr We first introduce our framework for provenance for ELHr ontology and discuss our design choices. 2.1 Basic Notions In the database setting, commutative semirings have proven to be convenient for representing various kinds of provenance information [Green et al., 2007; Green and Tannen, 2017]. In a commutative semiring (K, +, , 0, 1), the product and the addition + are commutative and associative binary operators over K, and distributes over +. Given a countably infinite set NV of variables that are used to annotate the database tuples, a provenance semiring is a semiring over a space of annotations, or provenance expressions, with variables from NV. Green and Tannen present a hierarchy of expressiveness for provenance annotations [2017]. The most expressive form of annotations is provided by the provenance polynomials semiring N[NV] = (N[NV], +, , 0, 1) of polynomials with coefficients from N and variables from NV, and the usual operations. The semiring N[NV] is universal, i.e., for any other commutative semiring K = (K, +, , 0, 1), any function ν : NV K can be extended to a semiring homomorphism h : N[NV] K, allowing the computations for K to factor through the computations for N[NV] [Green, 2011]. Hence, provenance polynomials provide the most informative provenance annotations and correspond to so-called how-provenance [Cheney et al., 2009]. Less general provenance semirings are obtained by restricting the operations + and to idempotence and/or absorption [Green and Tannen, 2017]. In this work, we focus on -idempotent semirings, i.e., for every v K, v v = v. This corresponds to the Trio semiring Trio(NV), defined in [Green, 2011] as the quotient semiring of N[NV] by the equivalence kernel trio of the function trio : N[NV] N[NV] that drops exponents. An annotation is a polynomial p that is understood to represent its equivalence class p/ trio. Trio(NV) encompasses in the hierarchy the well-known why-provenance semiring Why(NV) obtained by restricting + to be idempotent as well, where an annotation corresponds to the set of sets of tuples used to derive the result [Cheney et al., 2009]. We use the following notation. A monomial is a finite product of variables in NV. Let NM be the set of monomials, and NP the set of all finite sums of monomials, i.e., NP contains polynomials of the form P 1 ji mi vi,ji, with vi,ji NV; n, mi > 0. By distributivity, every polynomial can be written into this form. The representative [m] of a monomial m is the product of the variables occuring in m, in lexicographic order. Two monomials which are equivalent w.r.t. trio (i.e. are syntactically equal modulo commutativity, associativity and -idempotency) have the same representative, e.g., v u and u v u have representative u v. N[M] denotes the set {[m] | m NM}. As ontology language we use a syntactic restriction of ELHr. Consider three mutually disjoint countable sets of concept NC, role NR, and individual names NI, disjoint from NV. ELHr general concept inclusions (GCIs) are expressions of the form C D, built according to the grammar rules C ::= A | R.C | C C | D ::= A | R, where R NR, A NC. Role inclusions (RIs) and range restrictions (RRs) are expressions of the form R S and ran(R) A, respectively, with R, S NR and A NC. An assertion is an expression of the form A(a) or R(a, b), with A NC, R NR, and a, b NI. An axiom is a GCI, RI, RR, or assertion. An ELHr ontology is a finite set of ELHr axioms. ELHr usually allows GCIs of the form C C, but these can be translated into our format by exhaustively applying the rules: (i) replace C C1 C2 by C C1 and C C2, (ii) replace C1 R.C2 by C1 S, S R and ran(S) C2 where S is a fresh role name. The reason for syntactically restricting ELHr is that conjunctions or qualified restrictions of a role on the right-hand side of GCIs lead to counter-intuitive behavior when adding provenance annotations. We discuss this later in this section. 2.2 Annotated Ontologies Provenance information is stored as annotations. An annotated axiom has the form (α, m) with α an axiom and m NM. An annotated ELHr ontology O is a finite set of annotated ELHr axioms of the form (α, v) with v NV {1}. We denote by ind(O) the set of individual names occurring in O. The semantics of annotated ontologies extends the classical notion of interpretations to track provenance. An annotated interpretation is a triple I = ( I, I m, I) where I, I m are non-empty disjoint sets (the domain and domain of monomials of I, respectively), and I maps every a NI to a I I; every A NC to AI I I m; every R NR to RI I I I m; and every m, n NM to m I, n I I m s.t. m I = n I iff m trio n.1 We extend I to complex ELHr expressions as usual: ( )I = I {1I}; ( R)I = {(d, m I) | e I s.t. (d, e, m I) RI}; (C D)I = {(d, (m n)I) | (d, m I) CI, (d, n I) DI}; (ran(R))I = {(e, m I) | d I s.t. (d, e, m I) RI}; ( R.C)I = {(d, (m n)I) | e I s.t. (d, e, m I) RI, (e, n I) CI}. The annotated interpretation I satisfies: 1or iff m = n if we consider N[NV] instead of Trio(NV) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) (R S, m), if, for all n NM, (d, e, n I) RI implies (d, e, (m n)I) SI; (C D, m), if, for all n NM, (d, n I) CI implies (d, (m n)I) DI; (A(a), m), if (a I, m I) AI; and (R(a, b), m), if (a I, b I, m I) RI. I is a model of the annotated ontology O, denoted I |= O, if it satisfies all annotated axioms in O. O entails (α, m), denoted O |= (α, m), if I |= (α, m) for every model I of O. Remark. While it may appear counter-intuitive at first sight that CI differs from (C C)I, this is in line with the intuition behind the provenance of a conjunction. In the database setting, the Trio-provenance of tuple (a) being an answer to query yz.R(x, y) R(x, z) over {R(a, b), R(a, c)} is also different from that of (a) being an answer to y.R(x, y). Example 2 illustrates the semantics and some differences with the DL-Lite R case from [Calvanese et al., 2019]. Example 2. Consider the following annotated ontology. O = {(mayor(Venice, Orsoni), v1), (predecessor(Brugnaro, Orsoni), v2), ( predecessor.Mayor Mayor, v3), (ran(mayor) Mayor, v4)}. Let I be s.t. I = {Brugnaro, Orsoni, Venice}, I m = N[M], individual names are interpreted by themselves, monomials by their representatives and mayor I = {(Venice, Orsoni, v1)}, predecessor I = {(Brugnaro, Orsoni, v2)}, Mayor I = {(Orsoni, v1 v4), (Brugnaro, v1 v2 v3 v4)}. I |= O by the semantics of annotated ELHr. Moreover, it can be verified that if I |= (α, m), then O |= (α, m). Note that O entails (Mayor(Brugnaro), v1 v2 v3 v4) whose provenance monomial contains v1 and v2, witnessing that the two assertions of O have been used to derive Mayor(Brugnaro). Combining two assertions to derive another one is not possible in DL-Lite R. The rewriting-based approach by Calvanese et al. cannot be applied here as predecessor.Mayor Mayor leads to infinitely many rewritings. 2.3 Discussion on Framework Restrictions Example 2 shows that conjunction and qualified role restriction lead to a behavior different from DL-Lite R. They are also the reason for some features of our setting. First, the next example illustrates the -idempotency impact for the EL family. Example 3. Let O = {(A B1, v1), (A B2, v2), (B1 B2 C, v3)}. If I is a model of O and (e, n I) AI, then (e, (n v1)I) BI 1 and (e, (n v2)I) BI 2 so (e, (n v1 n v2)I) (B1 B2)I, i.e. (e, (n v1 v2)I) (B1 B2)I by -idempotency, which implies (e, (n v1 v2 v3)I) CI. Thus O |= (A C, v1 v2 v3). This intuitive entailment is lost if is not idempotent. Indeed, assume that is not idempotent and let I be the interpretation defined as follows (where I = {e} and I m contains all monomials with variables in lexicographic order). AI ={(e, u)} BI 1 = {(e, u v1)} BI 2 = {(e, u v2)} CI ={(e, u u v1 v2 v3)}. I is a model of O such that I |= (A C, v1 v2 v3). A downside of -idempotency is a loss of the expressive power of provenance, neglecting the number of times an axiom is used in a derivation. Let O = {(A B, v1), (B A, v2)}. With -idempotency, O |= (A B, vk 1 vl 2) for k 1 and l 0 because for k, l 1, vk 1 vl 2 is interpreted by (v1 v2)I in any interpretation I. In contrast, if is not idempotent, we only obtain O |= (A B, vk+1 1 vk 2) for k 0 (in particular O |= (A B, v1 v2)), which is a more informative result. Some useful semirings are not -idempotent; e.g. the Viterbi semiring ([0, 1] , max, , 0, 1), where is the usual product over real numbers, which is applied for representing confidence scores. We limit ourselves to -idempotent semirings because we are interested in computing provenance not only for assertions or queries, but also for GCIs. In particular, when a non-annotated ontology entails the GCI C D, we want the annotated version of the ontology to entail (C D, m) for some monomial m. The non-idempotent case could be relevant when one is not concerned with provenance for GCI entailment, and is left as future work. Many useful semirings are -idempotent. Examples of these are: the Boolean semiring, used for probabilistic query answering in databases; the security semiring, used to determine the minimal level of clearance required to get the consequence; and the fuzzy semiring which allows to determine the truth degree of the consequence (see e.g. [Senellart, 2017] for details on these semirings and more examples). Second, let us explain the restrictions on the form of the right-hand side of the GCIs. Example 4 illustrates the case of conjunctions. Qualified role restrictions lead to the same kind of behavior (they can be seen as implicit conjunctions). Example 4. Let O = {(A B C, v), (A(a), u)}. All the following interpretations which interpret a by itself and monomials by their representatives are models of O: AI1 = {(a, u)}, BI1 = {(a, u v)}, CI1 = {(a, u v)} AI2 = {(a, u)}, BI2 = {(a, u)}, CI2 = {(a, v)} AI3 = {(a, u)}, BI3 = {(a, 1)}, CI3 = {(a, u v)} Since the semantics does not provide a unique way to split the monomial u v between the two elements of the conjunction, O |= (B(a), m) for any m NM, and in particular, O |= (B(a), u v). It is arguably counter-intuitive since we intuitively know that a is in A with provenance u and that A is a subclass of the intersection of B and C with provenance v. Partially normalizing the ontology before annotating it, or more specifically, replacing e.g. annotated GCIs of the form (C C1 C2, v) by (C C1, v) and (C C2, v), may be acceptable in most cases, even if the rewritten ontology leads to additional arguably natural consequences compared to the original one. For instance, even if O |= (A B, v) in Example 4, in many cases a user would accept to change the GCI of O to (A B, v) and (A C, v) as it may reflect Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) the original intention of the GCI since {A B, A C} and A B C are semantically equivalent. One could argue that it would be better to define the semantics so that only I1 was a model of O in Example 4, instead of restricting the language as we do. We explain next why this is not so simple. One possibility is to change the definition of satisfaction of a GCI by an interpretation such that I |= (A B C, m) iff for every (d, n I) AI, then (d, (m n)I) BI and (d, (m n)I) CI, and similarly for qualified role restrictions. This approach leads to a counter-intuitive behavior. For instance if O = {(A(a), u), (B(a), v)}, then O |= (A B A B, 1), since (a, (u v)I) (A B)I for every model I of O, but there is a model I of O such that (a, (u v)I) / AI (and (a, (u v)I) / BI). In contrast, our definition of satisfaction ensures that for every interpretation I and concept C, I |= (C C, 1). Another possibility is to modify the interpretation of conjunctions and qualified role restrictions such that (C D)I = {(d, m I) | (d, m I) CI, (d, m I) DI} and ( R.C)I = {(d, m I) | e I s.t. (d, e, m I) RI, (e, m I) CI}. In this case, we lose even basic entailments from annotated ABoxes; e.g., {(A(a), u), (B(a), v)} |= ((A B)(a), u v). We also lose the entailment of the GCI from Example 3. Hence, restricting the syntax to prevent conjunctions on the right and defining the semantics as usual in DLs seems to be the most natural way of handling provenance in DL languages with conjunction. Since EL ontologies are often already expressed in normal form, the main restriction in our language is the avoidance of qualified existential restrictions on the right-hand side. 2.4 Annotated Queries Following Calvanese et al. [2019], we extend DL conjunctive queries with binary and ternary predicates, where the last term of the tuple is used for provenance information. Recall that by the semantics of annotated ontologies, tuples can only contain monomials. A Boolean conjunctive query (BCQ) q is a sentence x.ϕ( x, a), where ϕ is a conjunction of (unique) atoms of the form A(t1, t), R(t1, t2, t); ti is an individual name from a, or a variable from x; and t (the last term of the tuple) is a variable from x that does not occur anywhere else in q (Calvanese et al. call such a query standard). We use P( t, t) to refer to an atom which is either A(t1, t) or R(t1, t2, t), and P( t, t) q if P( t, t) occurs in q. A match of the BCQ q = x.ϕ( x, a) in the annotated interpretation I is a function π : x a I I m, such that π(b) = b I for all b a, and π( t, t) P I for every P( t, t) q, where π( t, t) is a shorthand for (π(t1), π(t)) or (π(t1), π(t2), π(t)) depending on the arity of P. I satisfies the BCQ q, written I |= q, if there is a match of q in I. A BCQ q is entailed by an annotated ontology O, denoted O |= q, if every model of O satisfies q. For a BCQ q and an interpretation I, νI(q) denotes the set of all matches of q in I. The provenance of q on I is the expression prov I(q) := P π νI(q)[Q P ( t,t) q π (t)] where π(t) is the last element of the tuple π( t, t) P I; and π (t) is the only m N[M] s.t. m I = π(t). For p NP, we write p prov I(q) if p is a sum of monomials and for each occurrence of a monomial in p we find an occurrence of its representative in prov I(q). I satisfies q with provenance p NP, denoted I |= (q, p), if I |= q and p prov I(q). O |= (q, p), if O |= q and p prov I(q), for all I |= O. We call (q, p) an annotated query. Remark. When O contains only assertions (no GCIs, RIs, and RRs), we can compare the provenance annotations we obtain to the database case. Similarly to Trio-provenance, the sums of monomials distinguish different ways the query atoms can be mapped to annotated interpretations. For example, given O = {(R(a, b), v1), (R(b, a), v2)} and query q = xytt .R(x, y, t) R(y, x, t ), it holds that O |= (q, v1 v2 + v1 v2). The provenance annotation v1 v2 + v1 v2 distinguishes among two derivations using the same axioms, contrary to the why-provenance v1 v2. Note that given an axiom α and O that may contain GCIs, RIs, and RRs, the sum over all monomials m such that O |= (α, m) represents all possible derivations of α, in the why-provenance spirit. The size |X| of an annotated ontology, a polynomial or a BCQ X is the length of the string representing X, where elements of NC, NR, NI and NV in X are of length one. We often omit annotated and refer to ontologies, queries, assertions, etc. when it is clear from the context. 3 Reasoning with Annotated ELHr Ontologies We present a completion algorithm for deriving basic entailments from an ELHr ontology. As usual with completion algorithms, we restrict to ontologies in normal form. The annotated ELHr ontology O is in normal form if for every GCI (α, v) O, α is of the form A B, A A B, A R, or R.A B, with A, A NC { }, B NC. Every annotated ELHr ontology can be transformed, in polynomial time, into an ontology in normal form which entails the same axioms over the ontology signature, using the following rules where b C, b D / NC { } and A is a fresh concept name: NF1 : (C b D E, v) ( b D A, 1), (C A E, v) NF2 : ( R. b C D, v) ( b C A, 1), ( R.A D, v) NF3 : ( b C R, v) ( b C A, 1), (A R, v). Theorem 5. Let O be an annotated ELHr ontology, α an axiom, and m a monomial. Let O be obtained by applying exhaustively Rules NF1-NF3 to O. If O |= (α, m), then O |= (α, m). If O |= (α, m) and every concept name occurring in α occurs in O, then O |= (α, m). Before describing the reasoning algorithm in detail, we present an important property of entailment; namely, that all entailment problems can be polynomially reduced to each other. This allows us to focus on only one problem. In particular, we focus on entailment of annotated assertions. Theorem 6. Let O be an annotated ontology, and (α, m) an annotated GCI, RR, or RI. One can construct in polynomial time an ontology O and an annotated assertion (β, n) such that O |= (α, m) iff O |= (β, n). Conversely, if (α, m) is an Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) if then (if Φ / O) CR0 X NC NR { } occurs in O add Φ = (X X, 1) to O CR1 (R1 R2, m1), (R2 R3, m2) O add Φ = (R1 R3, [m1 m2]) to O CR2 (R S, m1), (ran(S) A, m2) O add Φ = (ran(R) A, [m1 m2]) to O CR3 (A R, m1), (R S, m2) O add Φ = (A S, [m1 m2]) to O CR4 (A B, m1), (B C, m2) O add Φ = (A C, [m1 m2]) to O CR5 (A B, m1), (B R, m2) O add Φ = (A R, [m1 m2]) to O CR6 (A B1, m1), (A B2, m2), (B1 B2 C, m3) O add Φ = (A C, [m1 m2 m3]) to O CR7 (ran(R) B1, m1), (ran(R) B2, m2), (B1 C1, m3), (B2 C2, m4), (C1 C2 C, m5) O add Φ = (ran(R) C, [m1 m2 m3 m4 m5]) to O CR8 (A B C, m1), ( B, m2) O add Φ = (A C, [m1 m2]) to O CR9 (A S, m1), (ran(S) B, m2), (B C, m3), (S R, m4), ( R.C D, m5) O add Φ = (A D, [m1 m2 m3 m4 m5]) to O CR10 (A R, m1), ( B, m2), ( R.B C, m3) O add Φ = (A C, [m1 m2 m3]) to O CR11 a ind(O) add Φ = ( (a), 1) to O CR12 (R(a, b), m1), (R S, m2) O add Φ = (S(a, b), [m1 m2]) to O CR13 (A(a), m1), (A B, m2) O add Φ = (B(a), [m1 m2]) to O CR14 (A1(a), m1), (A2(a), m2), (A1 A2 B, m3) O add Φ = (B(a), [m1 m2 m3]) to O CR15 (R(a, b), m1), (A(b), m2), ( R.A B, m3) O add Φ = (B(a), [m1 m2 m3]) to O CR16 (R(a, b), m1), (ran(R) A, m2) O add Φ = (A(b), [m1 m2]) to O Table 1: Completion rules. A, . . . , D NC { }, R, S, Ri NR, m, mi NM. annotated concept (resp. role) assertion, one can construct in polynomial time an ontology O and two annotated concept (resp. role) inclusions (β, n), (γ, n) such that O |= (α, m) iff O |= (β, n) or O |= (γ, n). We adapt the classical EL completion rules to handle annotated ELHr ontologies in normal form. The algorithm starts with the original ontology O, and extends it through an iterative application of the rules from Table 1 until O becomes saturated; i.e., no more rules are applicable. We cannot use the rules of [Baader et al., 2008a] which eliminate range restrictions by adding GCIs with qualified role restrictions on the right so we designed rules for ELHr. A rule application may add axioms annotated with monomials, and other assertions ( (a), 1), which are not foreseen in the definition of annotated ontologies. Still, -idempotency ensures that all monomials have at most |O| factors. To show that the completion algorithm is sound and complete for deciding assertion entailment, we prove a stronger result. The k-saturation of O is the saturated ontology Ok obtained from O through the completion algorithm restricted to monomials of length at most k. We show that Ok suffices for deciding entailment of annotated assertions (α, m) where m is a monomial of length at most k. Theorem 7. If Ok is the k-saturation of O, then 1. Ok is computable in polynomial time w.r.t. the size of O, and in exponential time w.r.t. k, 2. for every assertion α and monomial mk with at most k variables, O |= (α, mk) iff (α, [mk]) Ok. This theorem states that to decide whether an assertion (α, m) is entailed by O, one just needs to find the k-saturation of O, where k is the number of variables in m, and then check whether (α, [m]) Ok. Due to the first point of Theorem 7 and Theorem 6, we obtain the following corollary. Corollary 8. For every axiom α, O |= (α, m) is decidable in polynomial time in |O| and in exponential time in |m|. In general there is no need to interrupt the completion algorithm; the ontology saturated without restricting the monomial length can be used to decide all relevant entailments regardless of the length of the monomial. Using Ok is merely an optimisation when one is only interested in a short monomial. While the polynomial time upper bound w.r.t. the ontology size is positive, and in line with the complexity of the EL family, the exponential time bound on the monomial size does not scale well for entailments with larger monomials. Recall that these bounds are based on the number of annotated axioms generated by the completion rules. The following example illustrates the potential exponential blow-up. Example 9. Consider O = {(A Ai, vi), (Ai B, ui) | 0 i n} {(B A, u)}. If O is the result of applying the completion algorithm to O, then for every S {1, . . . , n}, (B A, [u Πi Sui vi]) O . Following Hutschenreiter and Pe naloza [2017], we can see the completion algorithm as an automaton. More precisely, given O and (α, m), we can construct a tree automaton A, whose states correspond exactly to all the elements in Ok, such that (α, [m]) Ok iff A accepts at least one tree. Briefly, A is constructed by reading the rule applications backwards, allowing transitions from the consequence to the premises of the rule; see [Hutschenreiter and Pe naloza, 2017] for details. The number of states in A is exactly the cardinality of Ok and hence potentially exponential on k. However, the size of each state is bounded polynomially on k; the arity of the automaton is bounded by the maximum number of premises in a rule, in this case 5; and one can bound polynomially on k the number of different states that may appear in any successful run of A. Thus, A satisfies the conditions for a PSpace emptiness test [Baader et al., 2008b], which yields the following result. Proposition 10. For every axiom α, O |= (α, m) is decidable in polynomial space in |m|. Interestingly, these results allow us to bound the full complexity of answering instance queries (IQ) of the form C(a) where C is an ELHr concept and a NI. Theorem 11. Let O be an ontology, C(a) an IQ and m NM. O |= (C(a), m) is decidable in polynomial time in |O| and |C(a)|, and polynomial space in |m|. 4 Computing Relevant Provenance Variables An interesting question is whether a given variable appears in the provenance of a query q; i.e., whether a given axiom occurs Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) in some derivation of q. Formally, v NV is relevant for q (w.r.t. ontology O) iff m NM s.t. O |= (q, v m). For IQs and ELHr this problem can be solved in polynomial time, via an algorithm computing all the relevant variables for all queries of the form A(a), with a NI, A NC. We modify the completion algorithm (Section 3) to combine all monomials from a derivation, instead of storing them separately. As in Section 3, the algorithm assumes normal form and keeps as data structure a set S of annotated axioms (α, m), where α uses the vocabulary of O, and m NM. S is initialised as the original ontology where annotations of the same axiom are merged into a single monomial: S := {(α, [Πv Vαv]) | (α, u) O, Vα = {v | (α, v) O}}, and extended by exhaustively applying the rules in Table 1, where rule applications change S into S (α, m) := S {(α, m)} if there is no (α, n) S S \ {(α, n)} {(α, [m n])} if (α, n) S; i.e., add the axiom α with an associated monomial if it does not yet appear in S, and modify the monomial associated to α to include new variables otherwise. To ensure termination, a rule is only applied if it modifies S. The rules are applied until no new rule is applicable; i.e., S is saturated. Example 12. The relevance algorithm on the ontology of Example 9, yields the saturated set S = {(A A, m), (B B, m), (A B, m), (B A, m)} {(Ai B, m), (B Ai, m), (Ai A, m), (A Ai, m) | (1 i n} {(Ai Aj, m) | 1 i, j n} with m = u Πn i=1ui Πn i=1vi. Each rule application either adds a new axiom, or adds to the label of an existing axiom more variables. As the number of concept and role names, and variables appearing in S is linear on O, at most polynomially many rules are applied, each requiring polynomial time; i.e, the algorithm is polynomial. Lemma 13. If S is the saturated set obtained from O, a NI, A NC, and v NV, then v is relevant for A(a) iff v occurs in m for some (A(a), m) S. The algorithm decides relevance for assertion entailment in ELHr, yielding a polynomial-time upper bound for this problem. As in Section 3, axioms and IQs can be handled in polynomial time as well. Theorem 14. Relevance for axiom and IQ entailment in ELHr can be decided in polynomial time. This result shows that if we only need to know which axioms are used to derive an axiom or an IQ, the complexity is the same as reasoning in ELHr without provenance. This contrasts with axiom pinpointing : the task of finding the axioms responsible for a consequence to follow, in the sense of belonging to some minimal subontology entailing it (a Min A). Deciding whether an axiom belongs to a Min A is NP-hard for Horn-EL [Pe naloza and Sertkaya, 2010]. Relevance is easier in our context since provenance does not require minimality: if O = {(A B, v1), (B C, v2), (C B, v3)}, v2 and v3 are relevant for A B, but the only Min A is {A B} so other axioms are not relevant for axiom pinpointing. Provenance relevance is related to lean kernels (LKs) [Pe naloza et al., 2017], which approximate the union of Min As. The LK of a consequence c is the set of axioms appearing in at least one proof of c in a given inference method, generalizing the notion from propositional logic, where an LK is the set of clauses appearing in a resolution proof for unsatisfiability. The sets of variables computed by our algorithm are the sets of axioms used in the derivations by the completion algorithm, which is a consequence-based method for ELHr. Thus they correspond to LKs for the associated axioms and our algorithm is an alternative way of computing LKs in ELHr. 5 Query Answering with Provenance Even if ELHr is expressive enough to reduce entailment of rooted tree-shaped BCQs to assertion entailment, the methods presented in Section 3 do not apply to other kinds of BCQs. Example 15. For O={(R(a, a), u1), (A(a), u2), (A R, v1), (ran(R) A, v2)} and q = xyztt t .R(x, x, t) R(x, y, t ) R(z, y, t ), O |= (q, u1) but O |= (q, u2 v1 v2): O has a model I with RI = {(a, a, u1), (a, b1, u2 v1), (a, c1, u1 v1 v2)} {(bi, bi+1, u2 v1 v2) | i 1} {(ci, ci+1, u1 v1 v2) | i 1}. We adapt the combined approach by Lutz et al. [2009] to trace provenance. Assume that queries contain only individual names occurring in the ontology O. The combined approach builds a canonical model for O and shows that every query q can be rewritten into a query q that holds in this canonical model iff O |= q. We first define the canonical model IO of an ontology O annotated with provenance information. Assume that O is in normal form; mon(O) denotes the set of monomial representatives built using variables of NV occurring in O, and rol(O) is the set of role names occurring in O. Also assume that ( ) if there is B NC, R NR, and n NM such that O |= (ran(R) B, n), then (ran(R) B, [n]) O. This simplifies the presentation of the construction of the canonical model. Let aux(O) := {dm R | R rol(O), m mon(O)}. Assume that ind(O) aux(O) = . We define the domain of IO and the domain of monomials of IO as follows: IO := ind(O) aux(O) IO m := N[M] We define the interpretation function of IO as the union of Ii O, i 0. The function I0 O sets a I0 O = a for all a ind(O) (for a NI \ ind(O) the mapping a IO is irrelevant), m I0 O = [m] for all m NM, and for all A NC and all R NR, AI0 O := {(a, [m]) | O |= (A(a), m)} RI0 O := {(a, b, [m]) | O |= (R(a, b), m)}. If Ii O is defined, we define Ii+1 O by choosing an annotated axiom α O and applying one of the following rules in a fair way (i.e., every applicable rule is eventually applied). R1 α = (C A, m): if there is d IO and n mon(O) s.t. (d, [n]) CIi O, then add (d, [m n]) to AIi O. R2 α = (C R, m): if there is n mon(O), d IO s.t. (d, [n]) CIi O, then add (d, d[m n] R , [m n]) to RIi O. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) R3 α = (R S, m): if there are d, d IO, n mon(O) s.t. (d, d , [n]) RIi O, then add (d, d , [m n]) to SIi O. Example 16. For our running example, IO is as follows: AIO = {(a, u2), (a, u1 v2), (du2 v1 R , u2 v1 v2), (du1 v1 v2 R , u1 v1 v2), (du2 v1 v2 R , u2 v1 v2)} RIO = {(a, a, u1), (a, du2 v1 R , u2 v1), (a, du1 v1 v2 R , u1 v1 v2), (du2 v1 R , du2 v1 v2 R , u2 v1 v2), (du1 v1 v2 R , du1 v1 v2 R , u1 v1 v2), (du2 v1 v2 R , du2 v1 v2 R , u2 v1 v2)}. Proposition 17 formalises the fact that IO is a model of O. Proposition 17. IO is a model of O. We define the rewriting q of a query q, closely following Lutz et al. [2009]. It contains an additional predicate Aux, always interpreted as ( IO \ ind(O)) {1IO} in IO. Let q be the smallest transitive relation over terms of q, term(q), that includes identity relation, and satisfies the closure condition ( ) R1(t1, t2, t), R2(t 1, t 2, t ) q, t2 q t 2 = t1 q t 1. Clearly, the relation q is computable in polynomial time in the size of q. Define for any equivalence class χ of q, the set pre(χ) = {t1 | R NR s.t. R(t1, t2, t) q and t2 χ}. We define the sets Cyc and Fork= whose main purpose in the translation is to prevent spurious matches (e.g., with cycles) of a query in the anonymous part of the canonical model. Fork= is the set of pairs (pre(χ), χ) with pre(χ) of cardinality at least two. Cyc is the set of variables x in term(q) such that there are R0(t0 1, t0 2, t0), . . ., Rm(tm 1 , tm 2 , tm), . . ., Rn(tn 1, tn 2, tn) in q with n, m 0, x q tj 1 for some j n, ti 2 q ti+1 1 for all i < n, and tn 2 q tm 1 . Fork=, and Cyc can also be computed in polynomial time in the size of q. For each equivalence class χ of q, we choose a representative tχ χ. For q = x.ψ, the rewritten query q is defined as x.(ψ ϕ1 ϕ2), where x Cyc Aux(x, 1) ({t1,...,tk},χ) Fork= (Aux(tχ, 1) 1 i