# revisiting_controlled_query_evaluation_in_description_logics__a2e2fee6.pdf

Revisiting Controlled Query Evaluation in Description Logics

Domenico Lembo1 , Riccardo Rosati1 and Domenico Fabio Savo2

1Sapienza Universit a di Roma 2Universit a degli Studi di Bergamo {lembo, rosati}@diag.uniroma1.it, domenicofabio.savo@unibg.it

Controlled Query Evaluation (CQE) is a conﬁdentiality-preserving framework in which private information is protected through a policy, and a (optimal) censor guarantees that answers to queries are maximized without violating the policy. CQE has been recently studied in the context of ontologies, where the focus has been mainly on the problem of the existence of an optimal censor. In this paper we instead consider query answering over all possible optimal censors. We study data complexity of this problem for ontologies speciﬁed in the Description Logics DL-Lite R and EL and for variants of the censor language, which is the language used by the censor to enforce the policy. In our investigation we also analyze the relationship between CQE and the problem of Consistent Query Answering (CQA). Some of the complexity results we provide are indeed obtained through mutual reduction between CQE and CQA.

1 Introduction In Controlled Query Evaluation (CQE), a policy, i.e., a set of logical assertions, regulates the access to a database or knowledge base by specifying the information that must be kept secret, and a censor alters answers to queries so that conﬁdential data cannot be inferred by the users on the basis of the queries they ask. The notion of censor traces back to [Sicherman et al., 1983], and since then it has been investigated for propositional closed databases [Biskup and Bonatti, 2004a; Biskup and Bonatti, 2004b], incomplete databases [Biskup and Weibert, 2008], and, more recently, Description Logic (DL) ontologies [Bonatti and Sauro, 2013; Cuenca Grau et al., 2013; Cuenca Grau et al., 2015]. In this latter context, optimal censors are deﬁned as those censors that modify query answers in a minimal way. Intuitively, such censors hide data to preserve conﬁdentiality according to the policy, without restricting unnecessarily the ability of the system to return answers to users queries. In general, several optimal censors may exist for an instance of the CQE problem, since several incomparable ways of altering the answers may exist. For example, if the policy does not allow both facts has Name(01,John) and has-

Salary(01,2000) to be divulged, an optimal censor discloses only the former, and another one only the latter (of course, censors hiding both facts are not optimal). Previous work on CQE in DLs has mainly focused on the tasks of establishing the existence of an optimal censor and characterizing the complexity of computing it. In practice, however, considering only one such censor means making an arbitrary selection among several optimal censors. To avoid such a discretionary choice, in this paper we adopt a different approach and study query answering over all optimal censors. Intuitively, given a query q, the answers to q are the answers in the intersection of the answers computed by the optimal censors. A similar idea has been also previously discussed in [Cuenca Grau et al., 2013]. Our approach has similarities with the work on Consistent Query Answering (CQA), a declarative framework for inconsistency management based on the notion of repair [Bertossi, 2011; Bienvenu and Bourgaux, 2016]. Roughly speaking, in DL, a repair of a possibly inconsistent ontology O is any ABox for O (i.e., the extensional component of the ontology) that is consistent with the TBox (i.e., the intensional component of O), and that differs minimally from the original ABox. Then, computing query answers in CQA amounts to reasoning over all repairs and the TBox. The connection between CQE and CQA in DL is based on the intuition that the assertions in the policy in CQE seem to act as the class of assertions in T that may be violated by the data of the ABox. Some connections between CQA and a declarative approach for privacy preservation had already been discussed in [Bertossi and Li, 2013]. The framework in that paper is similar to ours, with so-called secrecy views playing essentially the role of the policy. However, the setting considered there is relational and without intensional knowledge (TBox), and secrecy views are enforced through suitable virtual modiﬁcations of database values with SQL NULLs, so that this approach is incomparable with ours. Nonetheless, in this paper we elaborate on the intuition of [Bertossi and Li, 2013] and investigate in depth the relationship between our CQE framework and CQA in DLs. We provide some general conditions ensuring that the two problems are mutually reducible, and we show cases of practical interest for which such conditions are satisﬁed and cases for which they are not, which allows us to highlight similarities and differences between the two frameworks. We notice also that the connection between

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

CQA and CQE we explain in this paper was already discussed in a preliminary form in our extended abstract [Lembo et al., 2018], in the context of a general formal framework aiming at capturing CQA, CQE, and update of DL ontologies. The ultimate goal of this paper is to investigate data complexity of answering conjunctive queries (CQs) in CQE. In our analysis we consider ontologies speciﬁed in DL-Lite R [Calvanese et al., 2007] and EL [Baader et al., 2005], two popular lightweight ontology languages, which are at the basis of two OWL tractable proﬁles1. We also consider some variants of the censor language LC, which is the language used by the censor to enforce the policy. Roughly speaking, LC is the language in which the censor expresses the sentences implied by the ontology that can be disclosed to the users without violating the policy. We provide data complexity results for the cases when: (i) LC is the ABox of the ontology (i.e., the censor can enforce the policy only by selecting facts in the ABox); (ii) LC coincides with the set of facts expressed over the signature of the ontology; and (iii) LC is the language of CQs expressed over the signature of the ontology (for EL we in fact limit to the language of CQs whose maximum length is k). Some of the complexity results follow from the correspondence between CQA and CQE; we devise novel techniques for the cases in which the CQE problem does not have a CQA counterpart. Conﬁdentiality issues in DLs have been previously studied in [Calvanese et al., 2012], under authorization views, an approach to some extent complementary to ours. Provable data privacy on views has been considered in [Stouppa and Studer, 2009], for concept retrieval and subsumption queries over ALC ontologies. Properties of censors for Boolean ALC ontologies have been investigated in [Studer and Werner, 2014], for concept subsumption only. Secrecy preserving reasoning in the presence of several agents has been instead studied in [Tao et al., 2014], for propositional Horn logics and the DL AL. Privacy-preserving query answering as a reasoning problem has been considered in [Cuenca Grau and Horrocks, 2008], whereas instance checking for EL has been studied in [Tao et al., 2010], in both cases under frameworks different from CQE. Then, the problem of establishing whether an ontology-based data integration system discloses a source query has been recently studied in [Benedikt et al., 2018]. In the rest of the paper, after some preliminaries (Sec. 2), we introduce our CQE framework (Sec. 3), and study the relationship between CQE and CQA (Sec. 4). We then establish complexity of query answering (both instance checking and entailment of CQs) for restricted censor languages (Sec. 5), and for the full censor language considered in this paper, namely CQs (Sec. 6). We conclude the paper in Sec. 7.

2 Preliminaries

We consider a signature Σ of predicates and constants, and a countably inﬁnite alphabet of variables V. To simplify the presentation, we consider only languages containing FO sentences, i.e., formulas without free variables (our results apply to open formulas as well, modulo standard encoding of open

1https://www.w3.org/TR/owl2-proﬁles/

formulas into closed ones). We use FO to indicate the language of all function-free FO sentences over Σ and V. Every language considered in this paper is a subset of FO. Given a set K FO, Mod(K) indicates the set of models of K, i.e., the FO interpretations I such that φI (i.e., the interpretation of φ in I) evaluates to true, for each φ K. A set K is consistent if it has at least one model, i.e., if Mod(K) = , inconsistent otherwise, and it entails an FO sentence φ, denoted K |= φ, if φI is true in every I Mod(K). A Boolean conjunctive query (BCQ) q is an FO sentence of the form x.conj( x), where conj( x) = α1( x) . . . αn( x), x is a sequence of variables, and each αi( x) is an atom (possibly with constants) with predicate αi and variables in x. The length of q is the number of its atoms, denoted by length(q). In the following, CQ denotes the language of BCQs over Σ and V, CQk the language of BCQs from CQ whose maximum length is k, and GA the language of single atom queries with no variables, i.e., ground atoms or facts. Verifying whether K |= α for K FO and α GA is also called instance checking. Description Logics (DLs) are decidable FO languages using only unary and binary predicates, called concepts and roles (for more details on DLs and their relationship with FO we refer the reader to [Baader et al., 2007]). A DL ontology O is a set T A, where T is the TBox and A is the ABox, providing intensional and extensional knowledge, respectively. We assume that an ABox is always a set of ground atoms. In this paper, we consider DL ontologies expressed in DL-Lite R [Calvanese et al., 2007] and EL , which extends EL [Baader et al., 2005] with the empty concept . A DL-Lite R TBox is a ﬁnite set of assertions of the form B1 B2, B1 B2, R1 R2, R1 R2, where: each Ri, with i {1, 2}, is an atomic role Q Σ, or its inverse Q ; each Bi, with i {1, 2}, is an atomic concept A Σ, or a concept of the form Q or Q , i.e., unqualiﬁed existential restrictions, which denote the set of objects occurring as ﬁrst or second argument of Q, respectively. An EL TBox is a ﬁnite set of assertions of the form C1 C2, where each Ci, with i {1, 2}, is: an atomic concept A; a concept of the form Q.C, i.e., qualiﬁed existential restriction, which denotes the set of objects that the atomic role Q relates to some instance of C; a concept C C , i.e., a conjunction of two concepts; or , i.e., the empty concept. Besides DL-Lite R and EL assertions, we also consider denial assertions (or simply denials) over concepts and roles, i.e., sentences of the form x.conj( x) , where conj( x) is such that x.conj( x) is a BCQ whose atoms use only unary and binary predicates. The length of the denial is the length of such query. A denial is satisﬁed by an ontology O if O |= x.conj( x). We will use denials to specify the policy in CQE. We will also refer to the DL DL-Lite R,den, which is an extension of DL-Lite R with denials [Lembo et al., 2015]. Given an ontology O and a language L FO, we denote by L(O) the subset of formulas of L over the predicates and constants occurring in O and the variables in V. All the complexity results given in this paper are concerned with data complexity, that in our framework is the complexity computed only with respect to the size of the ABox.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

3 CQE Framework

Our framework for CQE is adapted from the one presented in [Cuenca Grau et al., 2015]. An L CQE instance E is a triple T , A, P , where T is a TBox in the DL L, A is an ABox such that T A is consistent, and P is the policy, i.e., a set of denial assertions over the signature of T A, such that T P is consistent. Intuitively, T is the schema a user interacts with to pose her queries; A is the dataset underlying the schema; P speciﬁes the knowledge that cannot be disclosed for conﬁdentiality reasons, in the sense that the user will never get, through query answers, sufﬁcient knowledge to violate the denials in P2. We then deﬁne a censor for a CQE instance.

Deﬁnition 1 Given a CQE instance E = T , A, P and a language Lc FO(T A), a censor for E in Lc is a function cens Lc that returns a set cens Lc(E) Lc (called the theory of the censor) such that: (i) T A |= φ, for each φ cens Lc(E), and (ii) T P cens Lc(E) is consistent.

Intuitively, the censor establishes which are the sentences in Lc (called the censor language) implied by T A that can be divulged to the user while preserving the policy. A censor cens Lc for E in Lc is optimal if there does not exist a censor cens Lc for E in Lc such that cens Lc(E) cens Lc(E). The set of theories of all the optimal censors in Lc for a CQE instance E is denoted with Thoc Lc(E). Hereinafter, to simplify the notation, we will sometimes omit to specify that a censor language is limited to the signature of T A (e.g., we will use CQ instead of CQ(T A)).

Example 1 To regulate access to information about customers and the medicines they buy, a CQE instance E = (T , A, P) is used in a pharmacy, where T is an empty TBox, i.e., without assertions, A = {Buy(c1, m A), Buy(c1, m B), Buy(c2, m A)}, and P = { x.Buy(x, m A) Buy(x, m B) }. The policy speciﬁes as conﬁdential the fact that a customer buys both medicine A and medicine B (this may reveal an embarrassing disease). The optimal censors for E in CQ are only cens1 CQ and cens2 CQ, where cens1 CQ(E) contains the queries x.Buy(x, m B), Buy(c1, m A), Buy(c2, m A), and all the queries in CQ inferred by them, and cens2 CQ(E) contains the queries Buy(c1, m B) and Buy(c2, m A), and all the queries in CQ inferred by them. If we instead restrict the censor language to A (i.e., censor theories can only contain facts of the ABox), we still have only two optimal censors, i.e., cens1 A(E) = {Buy(c1, m A), Buy(c2, m A)} and cens2 A(E) = {Buy(c1, m B), Buy(c2, m A)}.

Below we provide the deﬁnition of entailment in CQE. More precisely, we give a deﬁnition for each type of censor language considered in this paper.

Deﬁnition 2 Given a CQE instance E = T , A, P and an FO sentence φ, we deﬁne the following three decision problems:

2Our notion of policy generalizes the one given in [Cuenca Grau et al., 2015], where P is a single CQ.

(CQ-Cens-Entailment): decide whether Th |= φ for every Th Thoc CQ(E). If this is the case, we write E |=cqe CQ φ. (GA-Cens-Entailment): decide whether Th |= φ for every Th Thoc GA(E). If this is the case, we write E |=cqe GA φ. (ABox-Cens-Entailment): decide whether Th |= φ for every Th Thoc A (E). If this is the case, we write E |=cqe ABox φ.

When φ is a ground atom, the above entailment problems are called CQ-Cens-Instance-Checking, GA-Cens-Instance Checking and ABox-Cens-Instance-Checking, respectively.

Example 2 Let E be the CQE instance of Example 1. We have, for instance, that E |=cqe ABox x.Buy(c1, x), but E |=cqe ABox x.Buy(x, m B). Furthermore, we have that E |=cqe CQ x.Buy(c1, x) and E |=cqe CQ x.Buy(x, m B), but E |=cqe CQ Buy(c1, m B).

4 Relationship between CQE and CQA

In this section we discuss the relationship between the CQE framework we have just deﬁned and CQA. To this aim, we ﬁrst provide a general deﬁnition for CQA. An L CQA instance J is a pair T , A where T is a consistent TBox in the DL L, and A is a DL ABox, where T A is a possibly inconsistent ontology. We then give the notion of the consistent entailment set in a language L for an FO theory Ψ and an ABox A, denoted by CES L(Ψ, A), which is the set {φ | φ L and there exists a A A such that Ψ A is consistent and Ψ A |= φ}. A repair for a CQA instance is deﬁned as follows.

Deﬁnition 3 A repair for a CQA instance J = T , A in a language Lr FO(T A) (called repair language) is a subset R of Lr such that: (i) R CES Lr(T , A); (ii) T R is consistent; (iii) there does not exist any R such that R R CES Lr(T , A) and T R is consistent. We denote by Rep Set Lr(J ) the set of repairs of J .

Deﬁnition 3 captures some deﬁnitions of repair proposed in the literature, such as the repair at the basis of the prototypical AR-semantics or the repair adopted by the CARsemantics [Lembo et al., 2015; Rosati, 2011]. Indeed, given an ontology O = T A, repairs in the ARsemantics aim to preserve as many facts as possible of those belonging to A. This means that, in a CQA instance adopting the AR-semantics, the language Lr has to be set to A. Differently, the CAR-semantics aims to preserve as many facts as possible of those implied by T and each subset of A consistent with T . Therefore, to encode such semantics Lr has to coincide with the set of ground atoms GA(O). We now provide some conditions on CQE and CQA instances that allow to establish correspondences between (theories of) censors and repairs. We ﬁrst analyze CQE instances.

Deﬁnition 4 A CQE instance E = T , A, P is CQAreducible w.r.t. a language Lc FO(T A) if: (i) for every φ Lc such that T A |= φ and {φ} T P is consistent, there exists A A such that T A P is consistent and T A |= φ; (ii) for every φ Lc and every A A such that T A P is consistent, if T A P |= φ then T A |= φ.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

If Lc is CQ (or GA, or A) we say that the instance E is CQ-Cens-CQA-Reducible (resp. GA-Cens-CQA-Reducible, or ABox-Cens-CQA-Reducible).

In words, condition (i) imposes that every logical consequence of T A that is consistent with the policy and the TBox belongs to CES Lc(T P, A), i.e., the consistent entailment set in Lc for an ontology obtained by putting together the TBox, the ABox, and the policy (which indeed might result inconsistent). Condition (ii) instead says that in a CQE insance that is CQA-reducible w.r.t Lc the sentences in the policy act as constraints on top of T A, since they never contribute to infer new formulas from Lc if added to T A (notice however that T A can contradict denials in P, and thus in Deﬁnition 4 we consider subsets of A that are consistent with T P).

Example 3 The instance E = T , A, P with T = {A B}, A = {A(d)}, P = { x.A(x) } is not GA-Cens CQA-Reducible, since it does not respect condition (i), even though it satisﬁes condition (ii) (in a trivial way). Instead, E = T , A , P with A = {A(d), B(d)}, P = { x.A(x) B(x) } and T as before, is GA-Cens-CQA-Reducible.

For CQA-reducible instances the following result holds.

Theorem 1 Let E = T , A, P be a CQE instance and let Lc FO(T A), such that E is CQA-reducible w.r.t. Lc. Then Thoc Lc(E) = Rep Set Lc( T P, A ).

Below we consider reducibility of CQA instances into CQE ones, and provide a notion analogous to Deﬁnition 4.

Deﬁnition 5 A CQA instance J = T , A is CQE-reducible w.r.t. a language Lr if there exists a partition TP TN of T such that TP A is consistent, TN is equivalent to a set of denials, and: (i) for every φ Lr, such that TP A |= φ and {φ} T is consistent, there exists A A such that T A is consistent and T A |= φ; (ii) for every φ Lr and every A A such that T A is consistent, if T A |= φ then TP A |= φ. If Lr is CQ (or GA, or A) we say that the instance J is CQ-Rep-CQE-Reducible (resp. GA-Rep-CQE-Reducible, or ABox-Rep-CQE-Reducible).

Intuitively, the above deﬁnition says that in a CQEreducible instance we can identify a portion TN of T such that its assertions act as constraints on the ontology TP A (cond. (ii)), thus TN behaves as a policy in a CQE instance. At the same time, each logical consequence in Lr of TP A consistent with T must belong to CES Lr(T , A) (cond. (i)). CQE-reducible instances have the following property.

Theorem 2 Let J = T , A be a CQA instance, such that J is CQE-reducible w.r.t Lr and T = TP TN. Then Rep Set Lr(J ) = Thoc Lr( TP , A, TN ).

We now rephrase entailment in CQA [Lembo et al., 2015]. As done for CQE, we deﬁne three entailment problems. That is, given a CQA instance J = T , A and an FO sentence φ, we deﬁne: (CQ-Rep-Entailment), i.e., decide whether T R |= φ for every R Rep Set CQ(J ), denoted by J |=cqa CQ φ; (GA-Rep-Entailment), i.e., decide whether T R |= φ for

every R Rep Set GA(J ), denoted by J |=cqa GA φ; (ABox Rep-Entailment), i.e., decide whether T R |= φ for every R Rep Set A(J ), denoted by J |=cqa ABox φ. Notice that the last two types of entailment coincide with entailment under CARand AR-semantics, respectively. The following result follows immediately from Deﬁnition 2, the deﬁnition of entailment in CQA, and Theorem 1.

Corollary 1 Let X {CQ, GA, ABox}, and let E = T , A, P be a CQE instance, such that E is X-Cens-CQAreducible, and φ an FO sentence. Then, E |=cqe X φ iff J |=cqa X φ, where J = T P, A .

Analogously, the following result follows from Deﬁnition 2, the deﬁnition of entailment in CQA, and Theorem 2.

Corollary 2 Let X {CQ, GA, ABox}, and let J = T , A be a CQA instance with T = TP TN, such that J is X-Rep-CQE-reducible, and φ an FO sentence. Then, J |=cqa X φ iff E |=cqe X φ, where E = TP , A, TN .

5 CQE under Restricted Censor Languages In this section we establish data complexity of CQE instance checking and entailment of BCQs for both DL-Lite R and EL CQE instances when the censor language is either the ABox of the instance or GA. For the former case, we establish our complexity results by exploiting a mutual reduction between entailment in CQE and CQA. For the latter case, the two frameworks behave in a slightly different way, and thus we also need to use techniques tailored to the CQE setting. The results showed in this section allow us to clarify the computational properties of query answering in CQE when we adopt a restricted censor language, i.e., which can be less expressive than the query language, as in the case of GACens-Entailment and ABox-Cens-Entailment of BCQs. We start by setting the censor language to the assertions in the ABox.

Theorem 3 Each DL-Lite R or EL CQE instance is ABox Cens-CQA-Reducible, and each DL-Lite R or EL CQA instance is ABox-Rep-CQE-Reducible.

Then, we establish an upper bound for entailment of BCQs.

Theorem 4 ABox-Cens-Entailment of BCQs is in co NP in data complexity for both DL-Lite R and EL CQE instances.

Proof (sketch). From Theorem 3 (direction from CQE to CQA) and Corollary 1, it follows that ABox-Cens-Entailment in DL-Lite R is equivalent to ABox-Rep-Entailment in DL-Lite R,den, i.e., entailment under the AR-semantics, and ABox-Cens-Entailment in EL is equivalent to ABox-Rep Entailment in EL plus denials. The thesis then follows from the fact that entailment of BCQs is in co NP in data complexity in CQA under the AR-semantics, for both DL-Lite R,den [Lembo et al., 2015], and EL plus denials. This last result is a consequence of an analogous complexity result shown in [Rosati, 2011] for EL .

The following theorem provides matching lower bounds for the results of Theorem 4.

Theorem 5 ABox-Cens-Instance-Checking is co NP-hard in data complexity for both DL-Lite R and EL CQE instances.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Proof (sketch). The results follow from Theorem 3 (direction from CQA to CQE), Corollary 2, and from co NP-hardness of instance checking of CQA under the AR-semantics for both DL-Lite R [Lembo et al., 2015] and EL [Rosati, 2011].

Theorem 4 and Theorem 5 actually imply that both ABox-Cens-Instance-Checking and ABox-Cens-Entailment of BCQs are co NP-complete in data complexity for both DL-Lite R and EL CQE instances. We now consider the case in which the censor language coincides with GA. In this case, DL-Lite R and EL CQE instances are not always CQA-reducible, as shown in Example 3, where the non-reducible instance is both DL-Lite R and EL . Reducibility in the other way round is also not always possible. However, for DL-Lite R we can show some weaker, but useful, properties. Proposition 1 Each DL-Lite R CQE instance T , A, P , such that T P {α} is consistent for each α A, is GACens-CQA-Reducible. Also, each DL-Lite R CQA instance T , A , such that T {α} is consistent for each α A, is GA-Rep-CQE-Reducible. Proposition 1 is used to prove the following theorem, which in fact is stated for general DL-Lite R CQE instances. Theorem 6 GA-Cens-Instance-Checking and GA-Cens Entailment of BCQs are respectively in AC0 and co NPcomplete in data complexity for DL-Lite R CQE instances. Proof (sketch). For CQE instances satisfying the condition in Proposition 1 (direction from CQE to CQA), the membership results follow from that proposition, Corollary 1, and from the fact that GA-Rep-Entailment, i.e., entailment under CAR-semantics, of ground atoms and of BCQs over DL-Lite R,den CQA instances are respectively in AC0 (which follows from the results in [Lembo et al., 2015; Lembo et al., 2011]) and in co NP (which follows from the results in [Lembo et al., 2010]). The case of general DL-Lite R instances can be proved by adapting the techniques used to prove the mentioned AC0 and co NP membership for CQA. co NP-hardness for GA-Cens-Entailment of BCQs follows from Proposition 1 (direction from CQA to CQE), Corollary 2, and from co NP-hardness of GA-Rep-Entailment of BCQs in DL-Lite R [Lembo et al., 2010].

Let us now turn to EL . While to establish GA-Rep Entailment of BCQs one needs to check that the consequences follow from subsets of the ABox that are consistent with the denials [Rosati, 2011], this is not needed to establish GA-Cens-Entailment, leading to a lower upper bound, i.e., co NP. co NP-hardness then follows from the co NP-hardness of ABox-Rep-Entailment of BCQs shown in [Bienvenu and Bourgaux, 2016, Theorem 17], which carries over also to CAR-semantics and CQE (the semantic difference between CQE and CQA in this case does not show up). Theorem 7 GA-Cens-Entailment of BCQs is co NP-complete in data complexity for EL CQE instances.

6 CQE under Full Censor Language In this section we study entailment of BCQs under our CQE framework for both DL-Lite R and EL CQE instances and more expressive censor languages.

Algorithm 1: Algorithm CQ-Ent-DL-Lite R(E, q)

Input: DL-Lite R CQE instance E = T , A, P , BCQ q Output: true if E |=cqe CQ q, false otherwise let h be the maximum length of a denial in P; let k = max(h, length(q)); Φ = CQEntailed Subset(T , A, k); for i = 1 to k do remove from Φ every subset Φ such that |Φ | = i and T P Φ is inconsistent; if q Φ then return true else return false

We ﬁrst concentrate on DL-Lite R and study CQ-Cens Entailment of BCQs. We start by providing the following crucial property, which says that to solve this problem it is possible to resort to CQk-Cens-Entailment, i.e., CQE entailment deﬁned over theories of censors using CQk as censor language, denoted |=cqe CQk.

Theorem 8 Let E = T , A, P be a DL-Lite R CQE instance, let q be a BCQ, and let k = max(h, length(q)), where h is the maximum length of a denial assertion in P. Then, E |=cqe CQ q iff E |=cqe CQk q.

Proof (sketch). We prove the if direction (the other one is trivial). The hypothesis implies that q belongs to every theory of optimal censor for E in CQk, i.e., q Ψ for every Ψ Thoc CQk(E). Now, suppose E |=cqe CQ q. Then, there exists Ψ Thoc CQ(E) such that q Ψ. So, Ψ {q} T P is inconsistent. Now, the following property can be shown: Ψ {q} T P is inconsistent iff there exists φ P such that Ψ {q} {φ} is inconsistent. This property follows from the fact that, when T is a DL-Lite R TBox, the inconsistency of T P with respect to a set of BCQs Ψ implies the existence of a denial assertion that is entailed by T P and is violated by Ψ , and from the fact that Ψ is a deductively closed set of BCQs with respect to T . Moreover, since the length of a denial assertion φ is not greater than k, it is immediate to verify that if Ψ {q} {φ} is inconsistent then (Ψ CQk) {q} {φ} is inconsistent. On the other hand, since (Ψ CQk) T P is consistent, there exists Ψ Thoc CQk(E) such that Ψ CQk Ψ , but since, by hypothesis, E |=cqe CQk q, it follows that q Ψ , which implies that (Ψ CQk) {q} {φ} is consistent. This leads to a contradiction, and thus the thesis follows.

Hereinafter, without loss of generality, we assume that all formulas of the language CQk and the query q of the entailment problem use the set of 2k variables {x1, . . . , x2k}. For deciding CQ-Cens-Entailment of BCQs for the DL-Lite R case we deﬁne Algorithm 1, in which CQEntailed Subset(T , A, k) is the function returning the set of BCQs from CQk that are entailed by T A. It is immediate to verify that this function can be computed in polynomial time w.r.t. the size of A. Informally, the algorithm ﬁrst computes an integer k, based on the length of the query q and of the denials in P; then, it computes the set Φ that represents the intersection of the theories of the optimal censors for the CQE instance E in CQk:

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Algorithm 2: Algorithm CQk-Ent-EL (E, q)

Input: EL CQE instance E = T , A, P , BCQ q Output: true if E |=cqe CQk q, false otherwise let h be the maximum length of a denial assertion in P; let k = max(h, length(q)); let Φ = CQEntailed Subset(T , A, k); if there exists Φ Φ such that Φ is a maximal subset of Φ consistent with T P and q Φ then return false else return true

this is done by removing from Φ all formulas that belong to minimal subsets of Φ that are inconsistent with T P; ﬁnally, it checks whether q belongs to Φ. Theorem 9 Let E be a DL-Lite R CQE instance, and q a BCQ. Then, E |=cqe CQ q iff CQ-Ent-DL-Lite R(E, q) returns true. Proof (sketch). It can easily be shown that the set Φ computed by CQ-Ent-DL-Lite R(E, q) is the set T Ψ Thoc CQk (E) Ψ. This property is based on the fact that, in the case of DL-Lite R TBoxes, every minimal subset of the set returned by CQEntailed Subset(T , A, k) that is inconsistent with T P contains at most k formulas. Then, it follows that E |=cqe CQk q iff q T Ψ Thoc CQk (E) Ψ. Consequently, E |=cqe CQk q iff q Φ.

Therefore, from Theorem 8 it follows that E |=cqe CQ q.

We are now ready to state a complexity result for the CQCens-Entailment problem in the case of DL-Lite R TBoxes. Theorem 10 CQ-Cens-Entailment of BCQs is in PTIME in data complexity for DL-Lite R CQE instances. Proof. We prove that the CQ-Ent-DL-Lite R algorithm provides a polynomial-time upper bound in data complexity. First, as already mentioned, CQEntailed Subset(T , A, k) can be computed in time polynomial w.r.t. the size of A. Then, it is also easy to verify that checking the consistency of an ontology consisting of a DL-Lite R TBox, a policy and a set of BCQs can be done in polynomial time as well: indeed, to check consistency, the BCQs can be encoded into a standard ABox with fresh constant symbols to represent the existential variables of the BCQs, while the policy corresponds to a set of denial assertions in the DL DL-Lite R,den, that is, T P is a DL-Lite R,den TBox. We can then check the consistency of the resulting DL-Lite R,den ontology in polynomial time [Lembo et al., 2015]. This implies that the for-loop of the algorithm can be executed in polynomial time with respect to the size of A, since it corresponds to the execution of a polynomial number of inconsistency checks of polynomiallysized DL-Lite R,den ontologies of the above form.

We now consider the case of EL CQE instances. We restrict our analysis to CQk-Cens-Entailment of BCQs, for which we provide the nondeterministic Algorithm 2. While Algorithm 1 constructs a ﬁnite fragment of the intersection of the theories of all the optimal censors for E in CQ, Algorithm 2 looks for the existence of a theory of an optimal censor for E in CQk that does not contain the query q.

The following theorem easily follows from Deﬁnition 2. Theorem 11 Let E be a EL CQE instance, and q a BCQ. Then, E |=cqe CQk q iff CQk-Ent-EL (E, q) returns true.

CQk-Ent-EL allows us to establish a co NP upper bound of the data complexity of CQk-Cens-Entailment of BCQs for EL CQE instances (note that CQEntailed Subset(T , A, k) can be computed in polynomial time in the size of A also when T is an EL TBox). It is in fact not difﬁcult to show that the above bound is tight. Theorem 12 CQk-Cens-Entailment of BCQs is co NPcomplete in data complexity for EL CQE instances.

7 Discussion and Conclusions The complexity results for DL-Lite R TBoxes given in this paper show a surprising aspect. In fact, the complexity of entailment of BCQs when the censor language is restricted either to the ABox or to the set of ground atoms is harder than when the censor language is CQ. The explanation of this lies in the fact that, in the latter case, it is possible to establish the entailment by computing the intersection of (a ﬁnite and polynomial representation of) all the theories of optimal censors (see Theorem 9), which, as shown in the previous section, can be done in polynomial time in data complexity. This property does not hold for the more restricted censor languages, which require to consider separately all the theories (actually, an exponential number of ﬁnite approximations of such theories). The above property also holds for CQk-Cens-Entailment of BCQs for EL TBoxes. However, in this case it is not possible to compute (a ﬁnite representation of) the intersection of all the theories of the optimal censors in PTIME: indeed, differently from DL-Lite R, for EL the size of the minimal subsets of the set returned by CQEntailed Subset(T , A, k) that are inconsistent with T P is not independent of the size of A. This explains the co NP-hardness in this case. The present work can be extended in several directions. For EL CQE instances we left open the lower bounds of GA-Cens-Instance-Checking and ABox-Cens-Instance Checking, and the complexity of CQ-Cens-Entailment of BCQs. Also, the PTIME upper bound for CQ-Cens Entailment of BCQs over DL-Lite R CQE instances should be reﬁned. We believe that an AC0 bound can be shown in this case: in particular, the ﬁrst-order rewritability of CQE could be proved by adapting and extending query rewriting techniques for CQA in the DL DL-Lite R,den [Lembo et al., 2015]. Then, the complexity analysis of CQE could be extended to other DLs, policy and censor languages. Also, based on the complexity analysis of CQE presented in this paper, it would be very important to look for practical techniques allowing for the implementation of CQE extensions of current DL reasoners and Ontology-based Data Access systems [Calvanese et al., 2017; De Giacomo et al., 2012].

Acknowledgements This work is partially supported by the EC within the H2020 under grant agreement 825333 (MOSAICr OWN). We thank the reviewers for pointing out a potential problem in the proof of Theorem 8 in the submitted version of the paper.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

References [Baader et al., 2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In Proc. of IJCAI, pages 364 369, 2005. [Baader et al., 2007] Franz Baader, Diego Calvanese, Deborah Mc Guinness, Daniele Nardi, and Peter F. Patel Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007. [Benedikt et al., 2018] Michael Benedikt, Bernardo Cuenca Grau, and Egor V. Kostylev. Logical foundations of information disclosure in ontology-based data integration. AIJ, 262:52 95, 2018. [Bertossi and Li, 2013] Leopoldo E. Bertossi and Lechen Li. Achieving data privacy through secrecy views and nullbased virtual updates. IEEE Trans. Knowl. Data Eng., 25(5):987 1000, 2013. [Bertossi, 2011] Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. [Bienvenu and Bourgaux, 2016] Meghyn Bienvenu and Camille Bourgaux. Inconsistency-tolerant querying of description logic knowledge bases. In RW Tutorial Lectures, pages 156 202, 2016. [Biskup and Bonatti, 2004a] Joachim Biskup and Piero A. Bonatti. Controlled query evaluation for enforcing conﬁdentiality in complete information systems. Int. J. Inf. Sec., 3(1):14 27, 2004. [Biskup and Bonatti, 2004b] Joachim Biskup and Piero A. Bonatti. Controlled query evaluation for known policies by combining lying and refusal. Ann. Math. Artif. Intell., 40(1-2):37 62, 2004. [Biskup and Weibert, 2008] Joachim Biskup and Torben Weibert. Keeping secrets in incomplete databases. Int. J. Inf. Sec., 7(3):199 217, 2008. [Bonatti and Sauro, 2013] Piero A. Bonatti and Luigi Sauro. A conﬁdentiality model for ontologies. In Proc. of ISWC, pages 17 32, 2013. [Calvanese et al., 2007] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385 429, 2007. [Calvanese et al., 2012] Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Viewbased query answering in description logics: Semantics and complexity. JCSS, 78(1):26 46, 2012. [Calvanese et al., 2017] Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. Ontop: Answering SPARQL queries over relational databases. Semantic Web, 8(3):471 487, 2017. [Cuenca Grau and Horrocks, 2008] Bernardo Cuenca Grau and Ian Horrocks. Privacy-preserving query answering in

logic-based information systems. In Proc. of ECAI, pages 40 44, 2008. [Cuenca Grau et al., 2013] Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev, and Dmitriy Zheleznyakov. Controlled query evaluation over OWL 2 RL ontologies. In Proc. of ISWC, pages 49 65, 2013. [Cuenca Grau et al., 2015] Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev, and Dmitriy Zheleznyakov. Controlled query evaluation for datalog and OWL 2 proﬁle ontologies. In Proc. of IJCAI, pages 2883 2889, 2015. [De Giacomo et al., 2012] Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. MASTRO: A reasoner for effective Ontology-Based Data Access. In Proc. of ORE, 2012. [Lembo et al., 2010] Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Inconsistency-tolerant semantics for description logics. In Proc. of RR, pages 103 117, 2010. [Lembo et al., 2011] Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Query rewriting for inconsistent DL-Lite ontologies. In Proc. of RR, 2011. [Lembo et al., 2015] Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Inconsistency-tolerant query answering in ontology-based data access. J. of Web Semantics, 33:3 29, 2015. [Lembo et al., 2018] Domenico Lembo, Riccardo Rosati, and Domenico Fabio Savo. A comprehensive framework for controlled query evaluation, consistent query answering and KB updates in Description Logics. In Proc. of KR, pages 653 654, 2018. [Rosati, 2011] Riccardo Rosati. On the complexity of dealing with inconsistency in Description Logic ontologies. In Proc. of IJCAI, pages 1057 1062, 2011. [Sicherman et al., 1983] George L. Sicherman, Wiebren de Jonge, and Reind P. van de Riet. Answering queries without revealing secrets. ACM Trans. Database Syst., 8(1):41 59, 1983. [Stouppa and Studer, 2009] Phiniki Stouppa and Thomas Studer. Data privacy for ALC knowledge bases. In Proc. of LFCS 2009, pages 409 421, 2009. [Studer and Werner, 2014] Thomas Studer and Johannes Werner. Censors for boolean description logic. Trans. Data Privacy, 7(3):223 252, 2014. [Tao et al., 2010] Jia Tao, Giora Slutzki, and Vasant G. Honavar. Secrecy-preserving query answering for instance checking in EL. In Proc. of RR, pages 195 203, 2010. [Tao et al., 2014] Jia Tao, Giora Slutzki, and Vasant G. Honavar. A conceptual framework for secrecy-preserving reasoning in knowledge bases. ACM TOCL, 16(1):3:1 3:32, 2014.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)