# enhancing_controlled_query_evaluation_through_epistemic_policies__3a5a348e.pdf

Enhancing Controlled Query Evaluation through Epistemic Policies

Gianluca Cima1 , Domenico Lembo1 , Lorenzo Marconi1 , Riccardo Rosati1 and Domenico Fabio Savo2

1Sapienza University of Rome 2University of Bergamo {lastname}@diag.uniroma1.it, domenicofabio.savo@unibg.it

In this paper, we propose the use of epistemic dependencies to express data protection policies in Controlled Query Evaluation (CQE), which is a form of confidentiality-preserving query answering over ontologies and databases. The resulting policy language goes significantly beyond those proposed in the literature on CQE so far, allowing for very rich and practically interesting forms of data protection rules. We show the expressive abilities of our framework and study the data complexity of CQE for (unions of) conjunctive queries when ontologies are specified in the Description Logic DL-Lite R. Interestingly, while we show that the problem is in general intractable, we prove tractability for the case of acyclic epistemic dependencies by providing a suitable query rewriting algorithm. The latter result paves the way towards the implementation and practical application of this new approach to CQE.

1 Introduction

Controlled Query Evaluation (CQE) is a confidentialitypreserving query answering approach that protects sensitive information by filtering query answers in such a way that a user cannot deduce information declared confidential by a data protection policy [Biskup, 2000; Biskup and Bonatti, 2004; Bonatti and Sauro, 2013; Cuenca Grau et al., 2013]. In CQE, one crucial aspect concerns the expressiveness of the policy language, which determines the form of the logic formulas definable in the policy, consequently influencing a designer s capacity to declare which pieces of information must not be disclosed. Previous literature has mainly considered only policies consisting of sentences, i.e. closed formulas, as in [Sicherman et al., 1983; Biskup and Bonatti, 2004; Bonatti and Sauro, 2013; Lembo et al., 2019]. Through this approach, it is only possible to impose that the truth value of a sentence in the policy cannot be inferred from the system by asking queries. For example, [Lembo et al., 2019] study CQE when policy statements take the form x(q( x) ), referred to as denial. Enforcing one such denial over the data means refraining from disclosing the inference of the Boolean

conjunctive query (BCQ) x q( x) by the system, even if the inference has occurred. For instance, the rule

δ1 = x, y(Patient(x) admitted(x, y) )

says that it is confidential whether there exists a patient admitted in a hospital department. However, if the aim is to hide admitted patients, the above dependency is imposing an excessively stringent level of protection, since it is denying the presence of patients in the hospital to protect their identity, which is implausible in practice. A more effective approach would be instead to require that the system must not answer the open query q(x) : y(Patient(x) admitted(x, y)). Intuitively, specifying a rule of this kind means imposing that the set of patients that the system knows to be admitted to the hospital has not to be disclosed. To properly capture this behavior, we propose to use an epistemic operator K in policy formulas, in the spirit of [Calvanese et al., 2007b; Console and Lenzerini, 2020]. This allows us to formalize the epistemic state of the user, that is, what the system can tell to the user without disclosing sensitive information. In our example, the policy rule is as follows:

δ2 = x K y(Patient(x) admitted(x, y)) K

Rule δ2 imposes that, in the epistemic state of the user, the set of admitted patients must be empty (but this does not exclude that the user knows that some patients have been admitted). In fact, our proposal enables us to accomplish more than just that. In a more advanced scenario, concealing the relationship between a patient and a hospital should apply only if the patient has not signed a consensus form. This can be expressed by the following formula:

δ3 = x K y(Patient(x) admitted(x, y)) KConsent(x)

Intuitively, the formula is saying that if a user knows that a patient has been admitted, then she must know that the patient has signed a consensus form. Thus, if a patient did not sign a consensus form, the system cannot disclose that this patient has been admitted. In general, this kind of policy is well-suited for encoding the principle of privacy by default, which is a desirable property in data protection (as expressly outlined in Article 25 of GDPR [European Union, 2016]). We remark once again that policy rules of the form just described have not been previously considered in the literature. It is however worth noticing that in [Cuenca Grau et al., 2015]

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

CQE is studied for a policy consisting of one single open CQ. In this latter framework, a rule analogous to δ2 of our example can be in fact specified, but richer policies, as those requiring more denials and/or rules as δ3 above, are non-expressible. Formulas as δ2 and δ3 are called epistemic dependencies (EDs). EDs have been originally introduced in [Console and Lenzerini, 2020] to express integrity constraints in ontologybased data management and are indeed a special case of domain-independent EQL-Lite(CQ) sentences [Calvanese et al., 2007a]. In the present paper, we use EDs as policy rules that must be satisfied to preserve data confidentiality over Description Logic (DL) ontologies. Similarly, integrity constraints must be satisfied to guarantee data consistency. However, our aim is totally different from that of [Console and Lenzerini, 2020], whose focus is on consistency checking. After defining the policy language, to completely characterize our CQE framework, we need to specify its semantics. This issue is addressed in CQE through censors. In this paper, we use CQ-censors introduced in [Lembo et al., 2019]. In a nutshell, given an ontology O, an optimal CQ-censor is a maximal subset C of the set of all BCQs inferred by O, such that C coupled with the intensional component of O (i.e. its TBox T ) satisfies all rules in the policy (i.e. no secrets are disclosed through standard query answering over T C). As in [Lembo et al., 2019], we then define CQE as the problem of computing the query answers that are in the intersection of the answer sets returned by all optimal censors. We call this problem SC-entailment (Skeptical entailment under CQcensors). This form of CQE does not suffer from the problem of having to arbitrarily select an optimal censor among several incomparable ones (see e.g., in [Bonatti and Sauro, 2013; Cuenca Grau et al., 2015]). CQ-censors are particularly interesting from a practical perspective, since, for ontologies specified DL-Lite R [Calvanese et al., 2007b] (a DL suited for modelling data-intensive ontologies) and policies expressed as denials, SC-entailment of BCQs is tractable in data complexity [Lembo et al., 2019]. One of this paper s aims is then to study data complexity of SC-entailment of BCQs under epistemic policies for DL-Lite R ontologies, with the ultimate goal of identifying conditions ensuring its tractability. Besides the computational complexity study, we also carry out an analysis of the robustness of SC-entailment with respect to confidentiality-preservation. In [Biskup and Weibert, 2008; Bonatti and Sauro, 2013; Bonatti, 2022], it is shown that censoring mechanisms based on an indistinguishability criterion are indeed more secure than others. In abstract terms, according to such a criterion, confidentiality is guaranteed only if query answers returned over a data instance with sensitive information coincide with those returned over a data instance without secrets (which is thus indistinguishable from the other instance). In this paper, we investigate whether the entailment we consider enjoys indistinguishability. Specifically, our main results are for ontologies in DL-Lite R and policies that are sets of EDs. In more detail:

As for indistinguishability, we show that SC-entailment of BCQs preserves confidentiality as defined in [Biskup and Weibert, 2008], but that this result does not carry over to unions of BCQs (BUCQs). We however prove

Query Confidentiality Data language preservation complexity SC-ent. IC-ent. SC-ent. IC-ent. Acyclic BCQ yes in AC0

Arbitrary BCQ yes co NP-c Acyclic BUCQ no yes in AC0 in AC0

Arbitrary BUCQ no yes co NP-c co NP-c

Table 1: Results on data complexity and confidentiality preservation. co NP-c stands for co NP-complete.

that the property holds even for BUCQs in the case of IC-entailment, a sound approximation of SC-Entailment that considers a single censor given by the intersection of all optimal CQ-censors (Section 4); We show that SC-entailment and IC-entailment of BCQs and BUCQs are co NP-complete in data complexity (Section 5.1); For acyclic EDs, we exhibit a rewriting algorithm that allows us to prove that SC-entailment and IC-entailment of BCQs and BUCQs are first-order rewritable, and thus in AC0 in data complexity (Section 5.2).

Our results are summarized in Table 1. An extended version with complete proofs can be found in [Cima et al., 2024].

2 Preliminaries We employ standard notions of function-free first-order logic (FO), and consider FO formulas using only unary and binary predicates called concepts and roles, respectively, as in Description Logics, which are fragments of FO suited for conceptual modeling [Baader et al., 2020]. We assume the existence of pairwise-disjoint countably-infinite sets ΓC, ΓR, ΓI, ΓN, and ΓV containing atomic concepts, atomic roles, constants (also known as individuals), labeled nulls, and variables, respectively. An FO formula ϕ is sometimes denoted as ϕ( x), where x is the sequence of the free variables occurring in ϕ. We also use the term query as a synonym of FO formula and Boolean query as a synonym of closed FO formula (also called sentence). An FO theory Φ is a set of FO sentences. The semantics of Φ is given in terms of FO interpretations over ΓC ΓR ΓI.W.l.o.g., we consider interpretations sharing the same infinite countable domain = ΓI, so that every element in ΓI is interpreted by itself. In other terms, we use standard names, as often customary when one deals with epistemic operators [Calvanese et al., 2007a]. We write eval(ϕ, I) to indicate the evaluation of an FO sentence ϕ over an FO interpretation I. A model of an FO theory Φ is an FO interpretation satisfying all sentences in Φ. We say that Φ entails an FO sentence ϕ, denoted by Φ |= ϕ, if eval(ϕ, I) is true in every model I of Φ. A Description Logic (DL) ontology O = T A consists of a TBox T and an ABox A, representing intensional and extensional knowledge, respectively. In this paper, an ABox is a finite set of atoms using predicate symbols from ΓC ΓR and terms from ΓI ΓN (ABoxes of this form are also called quantified ABoxes [Baader et al., 2020]). A model of an ontology T A is any model of the FO theory T { x ϕA( x)},

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

where x is a sequence of variables from ΓV and ϕA( x) is the conjunction of all the atoms of A in which each labeled null is replaced with a distinct variable x x. With a little abuse of notation, we sometimes treat an ontology T A (resp. A) as the FO theory T { x ϕA( x)} (resp. { x ϕA( x)}). For instance, this allows us to write T A |= ϕ to intend T { x ϕA( x)} |= ϕ, where ϕ is an FO sentence. Our complexity results are given for ontologies expressed in DL-Lite R, a member of the well-known DL-Lite family of DLs [Calvanese et al., 2007b]. A DL-Lite R TBox T is a finite set of axioms of the form B B and R R (called concept and role inclusions), or B B and R R (called concept and role disjointnesses), where B and B (resp., R and R ) are predicates of the form A, S or S (resp., S or S ) such that A ΓC, S ΓR, S is the inverse of S, and the unqualified existential restrictions S and S represent the set of objects appearing as the first and second argument of S, respectively. As usual when studying query answering over DL ontologies, we focus on the language of conjunctive queries (and variants thereof). A conjunctive query (CQ) takes the form of an FO formula yϕ( x, y), where x y ΓV and ϕ( x, y) is a finite, non-empty conjunction of atoms of the form α( t), where α ΓC ΓR, and each term in t is either a constant in ΓI or a variable in x y. We also consider the special CQ and assume that eval( , I) is false for every FO interpretation I. A union of conjunctive queries (UCQ) is a disjunction q1( x) . . . qn( x) of CQs. For convenience, sometimes we treat UCQs as sets of CQs. Boolean CQs and UCQs, for short, are respectively indicated as BCQs and BUCQs. Given a CQ q, we denote by Len(q) the number of atoms in q. Given a UCQ q, Max Len CQ(q) = maxq q Len(q ). We denote by BCQ (resp. BUCQ) the language of all the BCQs (resp. BUCQs), and, for a positive integer k, by BCQk (resp. BUCQk) the language of BCQs (resp. BUCQs) q such that Len(q) k (resp., Max Len CQ(q) k). Given a TBox T , an ABox A, and a Boolean query language Lq BCQ, we denote by Lq-Cons(T A) the set of Boolean queries q Lq that are logical consequences of T A. Formally: Lq-Cons(T A) = {q Lq | T A |= q} A ground substitution for a sequence x = x1, . . . , xk of variables is a sequence of constants c = c1, . . . , ck. Furthermore, if x are the free variables of a FO formula ϕ( x), we indicate as ϕ( c) the FO sentence obtained from ϕ( x) by replacing each xi with ci, for 1 i k. We recall that query answering of UCQs in DL-Lite R is first-order rewritable, i.e. for every DL-Lite R TBox T and UCQ q( x), it is possible to compute an FO query qr( x) such that, for every ground substitution c of x, T A |= q( c) iff A |= qr( c). To compute qr( x), we use the algorithm Perfect Ref presented in [Calvanese et al., 2007b], for which the following property holds.

Proposition 1. Let T be a DL-Lite R TBox and let q( x) be a UCQ. For every ABox A and every ground substitution c for x, we have that T A |= q( c) if and only if A |= qr( c), where qr( x) = Perfect Ref(q( x), T ).

We point out that, by construction, qr = Perfect Ref(q, T ) is a UCQ and that Max Len CQ(qr) = Max Len CQ(q).

3 Framework In this section, we describe our CQE framework. We first give the notion of epistemic dependencies that we use in the policies, then present the notion of censor, and finally provide two notions of query entailment in our novel framework. Epistemic dependencies. The policy P of our framework is a finite set of epistemic dependencies, each of which can be seen as a domain-independent EQL-Lite(CQ) [Calvanese et al., 2007a] sentence defined as follows. Definition 1. An epistemic dependency (ED) is a sentence δ of the following form: x1, x2(Kqb( x1, x2) Kqh( x2)) (1)

where qb( x1, x2) is a CQ with free variables x1 x2, qh( x2) is a CQ with free variables x2, and K is a modal operator. The variables x2 are called the frontier variables of δ. Intuitively, an ED of form (1) should be read as follows: if the sentence qb( c1, c2) is known to hold, then the sentence qh( c2) is known to hold, for any ground substitutions c1 and c2 for x1 and x2, respectively. More formally, we define when an FO theory Φ satisfies an ED δ, denoted Φ |=EQL δ. To this aim, we consider the set E of all FO models of Φ, and say that Φ |=EQL δ if, for every ground substitutions c1 for x1 and c2 for x2, the fact that eval(qb( c1, c2), I) is true for every I E implies that eval(qh( c2), I) is true for every I E. We say that Φ satisfies a policy P (denoted Φ |=EQL P) if Φ satisfies δ, for each δ P. We remark that, as already said, EDs of the form (1) have been originally introduced in [Console and Lenzerini, 2020], although in a slightly more general form, to express integrity constraints in ontology-based data management.Then, the notion of ED satisfaction defined above is essentially as in [Console and Lenzerini, 2020]. Example 1. Suppose that company CA wants to share certain user-profiling data with a company CB for targeted advertising. This is not allowed in general, but only in some countries with a special regulation that enables sharing based on the users consent. CA may use the following ED in the policy to enable CB to access only data compliant with the above requirements: δ4 = x, y Kprofiled Activity(x, y) K z(cit Of(x, z) SR(z) Consent(x))

In the rule, profiled Activity associates a user with her profiling-data, cit Of relates a user to the country of which she is a citizen, SR denotes countries with special regulation, and Consent denotes users who have given their consent. Suppose that CA also wants CB not to be able to associate a user with her real identity, and that this is possible by collecting the person s name and her date of birth at the same time. To this aim, CA also specifies the following ED: δ5 = x, y, z K(name(x, y) date B(x, z)) K

CQ-censors. As already said, censors are used to enforce confidentiality on an ontology coupled with a data protection policy. Among various definitions of censors proposed in the literature, we adapt here the one investigated in [Lembo et al., 2019] to properly deal with policies constituted by sets of EDs. To this aim, it is convenient to first define CQE instances in our novel epistemic CQE framework.

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

Definition 2 (CQE instance). An LT CQE instance is a triple E = T , A, P , where T is a TBox in the DL LT , A is an ABox such that T A has at least one model, and P is a policy (i.e., a finite set of EDs) such that T |=EQL P. Hereinafter, if the DL LT is not specified, we implicitly intend any possible DL. The notion of (optimal) CQ-censors is then as follows. Definition 3 ((optimal) CQ-censor). A CQ-censor of a CQE instance E = T , A, P is a subset C of BCQ-Cons(T A) such that T C |=EQL P. An optimal CQ-censor of a CQE instance E = T , A, P is a CQ-censor of E such that there exists no CQ-censor C of E such that C C. We denote by Opt CQCens(E) the set of optimal CQ-censors of E. Entailment. As in [Lembo et al., 2019], in this paper CQE amounts to reason over all the possible optimal censors, according to the following definition of SC-entailment. Definition 4 (SC-entailment). A CQE instance E = T , A, P Skeptically-entails a BUCQ q under CQ-Censors (in short, SC-entails q), denoted by E |=SC q, if T C |= q for every C Opt CQCens(E). We also consider the following sound approximation of SC-entailment. Definition 5 (IC-entailment). A CQE instance E = T , A, P entails a BUCQ q under the Intersection of CQCensors (in short, IC-entails q), denoted by E |=IC q, if T Cint(E) |= q, where Cint(E) = T C Opt CQCens(E) C.

Example 2. Consider the CQE instance E = T , A, P , where T = , P = {δ4, δ5}, with δ4 and δ5 being the EDs illustrated in Example 1, and the ABox A is as follows: A = {profiled Activity(p1, act1), Consent(p1), cit Of(p1, n1), SR(n1), name(p1, ann), date B(p1, date1), profiled Activity(p2, act2), cit Of(p2, n1)}, where n1 ΓN while all the other terms used in A are constants in ΓI. Now, consider the following four BCQs: q1 = y (profiled Activity(p1, act1) cit Of(p1, y) SR(y)) q2 = profiled Activity(p2, act2) q3 = y profiled Activity(y, act2) q4 = profiled Activity(p1, act1) name(p1, ann) For X {SC, IC}, one can verify that E |=X q1 because p1 gave the consent and she is a citizen of some country (n1) with special regulation. Conversely, one can see that E |=X q2 because p2 did not give the consent. Nevertheless, it is easy to verify that q3 C for each optimal CQ-censor C of E, and therefore E |=X q3. Finally, we have that E |=X q4 because there exists an optimal CQ-censor C of E such that date B(p1, date1) C, thus implying that name(p1, ann) C (otherwise δ5 would be violated).

In the above example, note that SCand IC-entailment coincide for all queries. As shown in the subsequent result, this always holds in the case of entailment of BCQs. Theorem 1. For every CQE instance E = T , A, P and for every BCQ q, we have that E |=SC q iff E |=IC q.

Proof. First, it is easy to verify that, for every C that is an optimal CQ-censor of E, and for every BCQ q, T C |= q

iff q C; moreover, T Cint(E) |= q iff q Cint(E). Now, let q be a BCQ. If E |=SC q, then q belongs to all the optimal CQ-censors of E, and thus q Cint(E), which implies that E |=IC q. Conversely, if E |=SC q, then there exists an optimal CQ-censor of E that does not contain q, hence q Cint(E), which implies that E |=IC q.

On the other hand, the next example shows that the same result does not hold for entailment of BUCQs. Example 3. Recall Example 2, and consider the BUCQ q = name(p1, ann) date B(p1, date1). While we have that E |=SC q, because either name(p1, ann) C or date B(p1, date1) C holds for every C Opt CQCens(E), it is easy to see that E |=IC q as neither name(p1, ann) nor date B(p1, date1) belong to Cint(E) = T C Opt CQCens(E) C.

4 Confidentiality Preservation In this section, we investigate the notion of confidentiality used in this paper. We adopt a similar approach to the one described in [Biskup and Weibert, 2008] for relational databases. Intuitively, under such an approach an entailment semantics preserves confidentiality if, for every CQE instance T , A, P and every finite set Q of queries, the answers to such queries are the same as if they were obtained from another CQE instance T , A , P such that T A |=EQL P. We now describe this property formally. First, for a TBox T , a policy P, two ABoxes A and A , a set Q of BUCQs, and X {SC, IC}, we say that A and A are Q-indistinguishable for X-entailment with respect to (T , P) if, for every q Q, we have that T , A, P |=X q iff T , A , P |=X q. Definition 6. Given a query language Lq BUCQ and a DL LT , for X {SC, IC}, we say that X-entailment preserves confidentiality for Lq in LT if, for every LT CQE instance T , A, P , and for every finite set Q Lq, there exists an ABox A such that (i) T A |=EQL P and (ii) A and A are Q-indistinguishable for X-entailment w.r.t. (T , P). For both SC-entailment and IC-entailment, we now investigate confidentiality-preservation for BUCQ in DL-Lite R. Hereinafter, with a slight abuse of notation, given a policy P, we denote by Max Len CQ(P) the maximum length (number of atoms) of a CQ occurring within the scope of the K operator in P. Proposition 2. Let E = T , A, P be a DL-Lite R CQE instance. For every integer k > 0, there exists a DL-Lite R CQE instance E = T , A , P such that (i) T A |=EQL P and (ii) for every q BCQk, E |=SC q iff E |=SC q. Proof (sketch). Let A be the ABox isomorphic to the finite set of BCQs {q BCQh | E |=SC q}, where h = max(k, Max Len CQ(P)). It is easy to verify that T A |=EQL P, from which it follows that E |=SC q iff E |=SC q for every q BCQk.

With the above property at hand, we can prove that SCentailment preserves confidentiality for BCQ in DL-Lite R. We show, however, that the same does not hold for BUCQ. Theorem 2. SC-entailment preserves confidentiality for BCQ in DL-Lite R, whereas it does not preserve confidentiality for BUCQ in DL-Lite R.

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

Proof. The first statement is an easy consequence of Proposition 2, when we assume that k is the maximum length of a BCQ in Q. For the second statement, we give a counterexample. Let E = T , A, P , where T = , A = {C1(o), C2(o)} and P = { x(K(C1(x) C2(x)) K )}. Consider also the BUCQ q = C1(o) C2(o). It is easy to see that no ABox A is such that T A |=EQL P, and A and A are {q}- indistinguishable for SC-entailment w.r.t. (T , P).

The above proof also shows that SC-entailment does not preserve confidentiality for BUCQ in DL-Lite R even when EDs are restricted to be acyclic (see Definition 8). However, it turns out that confidentiality in the case of BUCQs in DL-Lite R is preserved by IC-entailment. Proposition 3. Let E = T , A, P be a DL-Lite R CQE instance. For every integer k > 0, there exists a DL-Lite R CQE instance E = T , A , P such that (i) T A |=EQL P and (ii) for every q BUCQk, E |=IC q iff E |=IC q.

From the above property, we get the following result. Theorem 3. IC-entailment preserves confidentiality for BUCQ in DL-Lite R.

5 Algorithms and Complexity Results In this section, we analyze the data complexity of SCand IC-entailment of B(U)CQs for DL-Lite R CQE instances. We study the decision problems associated with the query answering problem under SCand IC-entailment. Specifically, we consider the following recognition problem XREC[Lq], which is parametric w.r.t. a Boolean query language Lq and X {SC, IC}: Input: A DL-Lite R CQE instance E = T , A, P , a Boolean query q Lq Question: Does E |=X q? We are interested in the data complexity [Vardi, 1982] version of the above problem, which is the complexity where only the ABox A is regarded as the input while all the other components are assumed to be fixed.

5.1 Arbitrary Policies We start by analyzing the complexity of SC-entailment of BCQs and BUCQs. Lemma 1. SC-REC[BCQ] is co NP-hard in data complexity. Proof (sketch). We show a reduction of 3-CNF, a well-known NP-hard problem, to the complement of SC-REC[BCQ]. Let T be the empty TBox, and let P contain the following EDs:

x, y, v, z K(C1(x, y) V1(x, v) V (y, v) N(x, z) S(x)) KS(z)

x, y, v, z K(C2(x, y) V2(x, v) V (y, v) N(x, z) S(x)) KS(z)

x, y, v, z K(C3(x, y) V3(x, v) V (y, v) N(x, z) S(x)) KS(z)

x K(V (x, f) V (x, t)) K

Given a 3-CNF formula ϕ with m clauses, we represent every clause of ϕ in the ABox A through the

roles C1, C2, C3, V1, V2, V3: e.g., if the i-th clause of ϕ is a b c, we add to A the assertions C1(i, a), C2(i, b), C3(i, c), V1(i, f), V2(i, t), V3(i, f). Moreover, A contains the assertions

{V (a, f), V (a, t) | a PV(ϕ)} {S(i), N(i, i + 1) | 1 i m}

where PV(ϕ) are the propositional variables of ϕ. It can be verified that ϕ is satisfiable iff , A, P |=SC S(1).

In order to prove a matching co NP upper bound for SCREC[BUCQ], we introduce a notion of consequences of a set of BCQs C with respect to a TBox T and a policy P.

Definition 7. Let T be a DL-Lite R TBox, P a policy and C BCQ. We define Policy Cons(T , P, C) as the smallest set S BCQ such that:

(ii) for every ED x1, x2(Kqb( x1, x2) Kqh( x2)) in P and ground substitutions t1 and t2 for x1 and x2, respectively, if T S |= qb( t1, t2) then qh( t2) S.

It can be immediately verified that Policy Cons(T , P, C) can be computed in polynomial time w.r.t. the size of C. The following property can be easily derived from the previous definition and the definition of optimal CQ-censor.

Lemma 2. Let E = T , A, P be a DL-Lite R CQE instance. For every C BCQ, there exists an optimal CQ-censor of E containing C iff Policy Cons(T , P, C) BCQ-Cons(T A).

We are now ready to provide an algorithm for checking SC-entailment of BUCQs.

Algorithm 1: SC-Entails(E, q)

input: DL-Lite R CQE instance E = T , A, P , BUCQ q

1 k max(Max Len CQ(q), Max Len CQ(P));

2 if there exists C BCQk-Cons(T A) such that: (i) T C |=EQL P and (ii) T C |= q and (iii) for every q BCQk-Cons(T A) \ C, Policy Cons(T , P, C {q }) BCQk-Cons(T A)

3 then return false;

4 return true;

Example 4. Let E = T , A, P , where T = {A D}, P = { x(K(D(x) C(x)) K ), x(KB(x) KA(x))}, and A = {A(o), B(o), C(o)}. Let now C be a maximal subset of BCQ2-Cons(T A) such that T C |= A(o) B(o) D(o). One can see that, for the BCQ q = B(o), SC-Entails(E, q) returns false as C satisfies conditions (i), (ii), and (iii) of the algorithm. In particular, Policy Cons(T , P, C {q}) contains B(o), A(o), D(o), and , hence it is not a subset of BCQ2-Cons(T A).

Lemma 3. Let E = T , A, P be a DL-Lite R CQE instance, and let q be a BUCQ. The algorithm SC-Entails(E, q) returns true iff E |=SC q.

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

Proof (sketch). First, using Lemma 2, it is easy to derive that a set of BCQs C satisfying conditions (i), (ii), and (iii) exists iff there exists an optimal CQ-censor of E containing C and not containing q (and hence iff E |=SC q). Then, it can be shown that, for a DL-Lite R TBox T and a set of BCQs C that is closed under subqueries (i.e. for every BCQ q C, C contains all the subqueries of q), T C |= q iff T Ch |= q, where h = Max Len CQ(q) and Ch = C BCQk. This is the key property that allows us to prove that, if a set of BCQs C satisfying conditions (i), (ii), and (iii) exists, then there exists a set of BCQs C BCQk-Cons(T A) satisfying such conditions, which implies the correctness of the algorithm.

The next theorem follows from Lemma 1, Lemma 3, and from the fact that the previous algorithm can be executed in nondeterministic polynomial time in data complexity. Theorem 4. SC-REC[BUCQ] is co NP-complete in data complexity. We now give a second algorithm, which makes use of the above algorithm SC-Entails to check IC-entailment.

Algorithm 2: IC-Entails(E, q)

input: DL-Lite R CQE instance E = T , A, P , BUCQ q

1 foreach BCQ q q do

2 if SC-Entails(E, q ) then return true;

3 return false;

Lemma 4. Let E = T , A, P be a DL-Lite R CQE instance and let q be a BUCQ. The algorithm IC-Entails(E, q) returns true iff E |=IC q.

Proof. First, from Definition 5, if T is a DL-Lite R TBox, then E |=IC q iff there exists a BCQ q q such that E |=IC q . Then, by Theorem 1, E |=IC q iff E |=SC q . Therefore, by Lemma 3 the thesis follows.

From Lemma 1, Lemma 4, Theorem 1, and from the fact that the algorithm SC-Entails(E, q) can be executed in nondeterministic polynomial time in data complexity, we obtain: Theorem 5. IC-REC[BUCQ] is co NP-complete in data complexity.

5.2 Acyclic Policies Given the intractability results for the set of all DL-Lite R CQE instances presented above, in this section we focus on a subclass of DL-Lite R CQE instances, whose policies enjoy an acyclicity property. First, we extend the notion of first-order rewritability defined over ground ABoxes to the case of ABoxes with labeled nulls and to the problems of SC-entailment and IC-entailment of BUCQs. Given an ABox A, we define the FO interpretation IA over the predicates ΓC ΓR plus the additional new concept Ind, and the constants ΓI ΓN (i.e. in IA we consider the symbols from ΓN as ordinary constants): IA = ΓI ΓN; a IA = a for every a ΓI ΓN;

for every concept name C, CIA = {a | C(a) A}; for every role name R, RIA = {(a, b) | R(a, b) A};

Ind IA = ΓI. Given a TBox T , a policy P and a BUCQ q, and X {SC, IC}, we say that a FO sentence q is a first-order rewriting of X-entailment of q for T and P if, for every ABox A, T , A, P |=X q iff eval(q , IA) is true. Our goal now is to define an algorithm that, for a DL-Lite R TBox T and a policy P, is able to construct a first-order rewriting of SC-entailment of q for T and P. This is not possible in general, given the co NP-completeness result provided by Theorem 5. Therefore, we now define the subclass of policies that are acyclic for a DL-Lite R TBox. Given a DL-Lite R TBox T and a policy P, the dependency graph of T and P, denoted by G(T , P), is the directed graph defined as follows: (i) the set of nodes of G(T , P) is the set of predicates occurring in T P; (ii) there is a P-edge from node p1 to node p2 in G(T , P) if and only if there exists an epistemic dependency of the form (1) in P such that p1 occurs in qb and p2 occurs in qh; (iii) there is a T-edge from node p1 to node p2 in G(T , P) if and only if there is a concept or role inclusion in T such that p1 occurs in the left-hand side and p2 occurs in the right-hand side of the inclusion. Definition 8. Given a DL-Lite R TBox T and a policy P, we say that P is acyclic for T if there exists no cycle in G(T , P) involving a P-edge. Informally, the graph G(T , P) represents the logical dependencies between the predicates in T and P: a P-edge (resp., a T-edge) from p1 to p2 means that predicate p1 may have a direct implication on p2 through P (resp., through T ). The notion of acyclicity defined above guarantees that, if (p1, p2) is a P-edge in G(T , P), then there is no path from p2 to p1, i.e. p2 does not have any (direct or indirect) implication on p1. Example 5. The following ED aims to hide the hierarchical structure of an organization unless it is public. δ6 = x, y(Khas Position(x, y) K z(works In(x, z) Public Office(z)))) Moreover, we want to hide the fact that a person collaborates with a secret service unless she holds an important position (Key Position), for example, she is the director. The following ED achieves this goal: δ7 = x(K y(collaborate(x, y) Sec Service(y)) K z(has Position(x, z) Key Position(z))) Let the policy P be P = {δ6, δ7}. For the empty TBox T = , one can see that P is acyclic for T . Conversely, for the DL-Lite R TBox T = {works In collaborate}, P is not acyclic for T , since there is a cycle in G(T , P) constituted by the P-edges (collaborate, has Position) and (has Position, works In) and the T-edge (works In, collaborate).

With this notion in place, we can now describe the decision problem we are going to study. Specifically, for X {SC, IC}, we consider the recognition problem XAREC[Lq], which is parametric w.r.t. a Boolean query language Lq. X-AREC[Lq] is defined exactly as X-REC[Lq]

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

except that the input DL-Lite R CQE instances T , A, P are such that the policy P is acyclic for T . From now on, given a set of CQs Q,1 we denote by And(Q) the CQ y V zϕ Q ϕ , where y is a sequence of all the existentially quantified variables occurring in Q. Given a TBox T , a policy P and a CQ q( x), we denote by ϕpc(T , P, q( x)) the CQ (with free variables x) And(Policy Cons(T , P, {q( x)}), where Policy Cons(T , P, {q( x)}) is as in Definition 7, considering the free variables in x as new constant symbols. Lemma 5. Let E = T , A, P be a DL-Lite R CQE instance, let P be an acyclic policy for T , and let q( x) be a CQ. For every ground substitution c for x, there exists an optimal CQ-censor of E that contains the BCQ q( c) if and only if eval(qr( c), IA) is true, where qr( x) = Perfect Ref(ϕpc(T , P, q( x)), T ). Given a BCQ q, we say that a BCQ q is a clash for q in E if there exists an optimal CQ-censor of E containing q and there exists no optimal CQ-censor of E containing both q and q . Given a BUCQ q, we say that a BCQ q is a clash for q in E if for every q q, q is a clash for q in E. Now, let q be a BUCQ and let q ( x) be a CQ. We denote by Clash(q, q ( x), T , P) the following FO formula (with free variables x): Perfect Ref(ϕpc(T , P, q ( x)), T )

qi q Perfect Ref(ϕpc(T , P, And({q ( x), qi)}), T )

Using Lemma 5, we are able to prove the following property. Lemma 6. Let E = T , A, P be a DL-Lite R CQE instance, let q BUCQ, let q ( x) be a CQ, and let qi BCQ-Cons(T A) for every qi q. Then, for every ground substitution c for x, q ( c) is a clash for q in E iff eval(qcl( c), IA) is true, where qcl( x) = Clash(q, q ( x), T , P). It is now possible to show that, in the case of DL-Lite R CQE instances in which P is an acyclic policy for T , if a clash for a BUCQ q exists, then there exists a clash for q whose length depends only on the size of P T {q}. Lemma 7. Let E = T , A, P be a DL-Lite R CQE instance such that P is acyclic for T , and let q be a BUCQ such that qi BCQ-Cons(T A) for every qi q. Then, E |=SC q iff there exists no BCQ q such that q is a clash for q in E and Len(q ) ℓ, where ℓ= m kh, m is the number of BCQs in q, k = Max Len CQ(P), and h is the number of EDs in P. We can now define an FO sentence and then prove that it provides a first-order rewriting of SC-entailment of BUCQs. Definition 9. Let T be a DL-Lite R TBox, let P be an acyclic policy for T , and let q BUCQ. We define the FO sentence SC-Entailed(q, T , P) as follows:

qi qp Perfect Ref(qi, T )

qi q\qp Perfect Ref(qi, T )

q ( x) Q x (Clash(qp, q ( x), T , P)

x x Ind(x))

1W.l.o.g. we assume that the sets of existentially quantified variable symbols used by the CQs in Q are pairwise disjoint.

with (q) = (q) \ { }, where (q) is the powerset of q, Q is the set of CQs defined over the predicates and constants occurring in {q} P T and whose maximum length is m kh, m is the number of BCQs in q, h is the number of EDs in P, and k = Max Len CQ(P). Based on Lemma 6 and Lemma 7, we are able to prove the following crucial property for SC-Entailed(q, T , P). Lemma 8. Let E = T , A, P be a DL-Lite R CQE instance such that P is an acyclic policy for T , and let q BUCQ. Then, SC-Entailed(q, T , P) is a first-order rewriting of SCentailment of q for T and P. The above first-order rewritability property of SCentailment of BUCQs immediately implies the next result.

Theorem 6. SC-AREC[BUCQ] is in AC0 in data complexity. Finally, given a BUCQ q, we define the sentence IC-Entailed(q, T , P) as follows: W qi q SC-Entailed(qi, T , P)

It is then easy to prove the analogous of Lemma 8 (and Theorem 6) for IC-Entailed(q, T , P) and IC-entailment. Lemma 9. Let E = T , A, P be a DL-Lite R CQE instance such that P is an acyclic policy for T , and let q BUCQ. Then, IC-Entailed(q, T , P) is a first-order rewriting of ICentailment of q for T and P.

Proof. The result follows immediately from Lemma 8 and from the fact that E |=IC q iff there exists a BCQ qi q such that E |=SC qi (the latter property easily follows from Definition 5 and Theorem 1).

Theorem 7. IC-AREC[BUCQ] is in AC0 in data complexity.

6 Conclusions The results given in this paper are summarized in Table 1. Beyond their theoretical connotation, our results for acyclic dependencies are particularly interesting for practical applications, since data complexity in these cases is the same as that for standard query answering over databases, and this paves the way for implementation through consolidated SQL technology. Moreover, the table shows that confidentiality is preserved in most of the cases that we have considered. We posit that the epistemic nature of our framework makes it suited to being extended to incorporate user background knowledge, which can be modeled through appropriate epistemic formulas. The implications of this extension remain a subject for future investigation. Further possible development may explore ontology languages alternative to DL-Lite R, such as EL [Baader et al., 2005] or the OWL 2 profiles [Motik et al., 2012]. Additionally, extending the framework to accommodate preferences on data to be censored while ensuring compliance to the policy, as in [Cima et al., 2021], and examining a dynamic context where censors filter responses based on previous answers, as explored in [Bonatti et al., 2022], are paths for further research.

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)

Acknowledgements

This work was partially supported by: projects FAIR (PE0000013) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the EU - Next Generation EU; GLACIATION project funded by the EU (N. 101070141); ANTHEM (Adva Nced Technologies for Human-centr Ed Medicine) project (CUP B53C22006700001) funded by the National Plan for NRRP Complementary Investments; the MUR PRIN 2022LA8XBH project Polar (POLicy specific Ation and enfo Rcement for privacy-enhanced data management); and by the EU under the H2020-EU.2.1.1 project TAILOR (grant id. 952215).

References [Baader et al., 2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 364 369, 2005. [Baader et al., 2020] Franz Baader, Francesco Kriegel, Adrian Nuradiansyah, and Rafael Pe naloza. Computing compliant anonymisations of quantified ABoxes w.r.t. EL policies. In Proc. of the 19th Int. Semantic Web Conf. (ISWC), volume 12506 of Lecture Notes in Computer Science, pages 3 20. Springer, 2020. [Biskup and Bonatti, 2004] Joachim Biskup and Piero A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec., 3(1):14 27, 2004. [Biskup and Weibert, 2008] Joachim Biskup and Torben Weibert. Keeping secrets in incomplete databases. Int. J. Inf. Sec., 7(3):199 217, 2008. [Biskup, 2000] Joachim Biskup. For unknown secrecies refusal is better than lying. Data and Knowledge Engineering, 33(1):1 23, 2000. [Bonatti and Sauro, 2013] Piero A. Bonatti and Luigi Sauro. A confidentiality model for ontologies. In Proc. of the 12th Int. Semantic Web Conf. (ISWC), pages 17 32, 2013. [Bonatti et al., 2022] Piero Bonatti, Gianluca Cima, Domenico Lembo, Lorenzo Marconi, Riccardo Rosati, Luigi Sauro, and Domenico Fabio Savo. Controlled query evaluation in OWL 2 QL: A longest honeymoon approach. In Proc. of the 21st Int. Semantic Web Conf. (ISWC), volume 12922 of Lecture Notes in Computer Science, pages 428 444. Springer, 2022. [Bonatti, 2022] Piero A. Bonatti. A false sense of security. Artificial Intelligence, 310, 2022. [Calvanese et al., 2007a] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. EQL-Lite: Effective first-order query processing in description logics. In Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 274 279, 2007. [Calvanese et al., 2007b] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query an-

swering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385 429, 2007. [Cima et al., 2021] Gianluca Cima, Domenico Lembo, Lorenzo Marconi, Riccardo Rosati, and Domenico Fabio Savo. Controlled query evaluation over prioritized ontologies with expressive data protection policies. In Proc. of the 20th Int. Semantic Web Conf. (ISWC), volume 12922 of Lecture Notes in Computer Science, pages 374 391. Springer, 2021. [Cima et al., 2024] Gianluca Cima, Domenico Lembo, Lorenzo Marconi, Riccardo Rosati, and Domenico Fabio Savo. Controlled query evaluation through epistemic dependencies. Co RR, abs/2405.02458, 2024. [Console and Lenzerini, 2020] Marco Console and Maurizio Lenzerini. Epistemic integrity constraints for ontologybased data management. In Proc. of the 37th AAAI Conf. on Artificial Intelligence (AAAI), pages 2790 2797. AAAI Press, 2020. [Cuenca Grau et al., 2013] Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev, and Dmitriy Zheleznyakov. Controlled query evaluation over OWL 2 RL ontologies. In Proc. of the 12th Int. Semantic Web Conf. (ISWC), pages 49 65, 2013. [Cuenca Grau et al., 2015] Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev, and Dmitriy Zheleznyakov. Controlled query evaluation for datalog and OWL 2 profile ontologies. In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 2883 2889, 2015. [European Union, 2016] European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016. Official J. of the European Union, L 119:48, 2016. [Lembo et al., 2019] Domenico Lembo, Riccardo Rosati, and Domenico Fabio Savo. Revisiting controlled query evaluation in description logics. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 1786 1792, 2019. [Motik et al., 2012] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, and Carsten Lutz. OWL 2 Web Ontology Language profiles (second edition). W3C Recommendation, World Wide Web Consortium, December 2012. Available at http://www.w3.org/ TR/owl2-profiles/. [Sicherman et al., 1983] George L. Sicherman, Wiebren de Jonge, and Reind P. van de Riet. Answering queries without revealing secrets. ACM Trans. on Database Systems, 8(1):41 59, 1983. [Vardi, 1982] Moshe Y. Vardi. The complexity of relational query languages. In Proc. of the 14th ACM SIGACT Symp. on Theory of Computing (STOC), pages 137 146, 1982.

Proceedings of the Thirty-Third International Joint Conference on Artiﬁcial Intelligence (IJCAI-24)