# fast_compliance_checking_in_an_owl2_fragment__96f5d0f5.pdf

Fast Compliance Checking in an OWL2 Fragment

Piero A. Bonatti Universit a degli Studi di Napoli Federico II piero.bonatti@unina.it

We illustrate a formalization of data usage policies in a fragment of OWL2. It can be used to encode (i) a company s data protection policy, (ii) data subjects consent to data processing, and (iii) part of the GDPR (the forthcoming European Data Protection Regulation). Then a company s policy can be checked for compliance with data subjects consent and with part of the GDPR by means of subsumption queries. We provide a complete and tractable structural subsumption algorithm for compliance checking and prove the intractability of a natural generalization of the policy language.

1 Introduction

This work stems from the EU H2020 project SPECIAL1, where semantic technologies are used to help companies in complying with the new European General Data Protection Regulation (GDPR).2 In this project, data usage policies are encoded using a fragment of OWL2-DL and the main policyrelated reasoning tasks are reduced to subsumption and concept consistency checking. Such tasks include - among others:

permission checking: given an operation request, decide whether it is permitted;

compliance checking: does a policy P1 fulﬁll all the restrictions requested by policy P2? (Policy comparison);

policy validation: e.g. is the policy contradictory? Does a policy update strengthen or relax the previous policy?

Compliance checking is the predominant task in this project: the data usage policies of the industrial partners must be compared both with a (partial) formalization of the GDPR itself, and with the consent to the usage of personal data granted by each of the data subjects whose data are collected and processed by the company (that is called data controller in the GDPR). The number of data subjects (and their policies) can be as large as the number of customers of a major communication service provider. Moreover, in the absence of explicit

1https://www.specialprivacy.eu/ 2http://data.consilium.europa.eu/doc/document/ST-5419-2016INIT/en/pdf

consent, some data cannot be stored, even temporarily; so some of the project s use cases consist in checking storage permissions against a stream of incoming data points, at the rate of hundreds of thousands per minute. Then one of the crucial project tasks is the development of scalable reasoning procedures for reasoning in the policy fragment of OWL 2. In this paper we illustrate this fragment and introduce a structural subsumption algorithm that is a promising starting point for scalable compliance checking. Sec. 2 recalls the basics of Description Logics (DL) needed in this work. In Sec. 3 we illustrate the encoding of data usage policies in OWL2. In Sec. 4 we describe the structural subsumption algorithm for the policy language, and a policy consistency checking method. We prove correctness and completeness of these algorithms and their tractability. Moreover, we prove the intractability of a slight generalization of the policy fragment. The paper is concluded by a discussion of the results, a comparison with related work, and a description of ongoing and future work. Some proofs have been moved to Appendix A to enhance readability.

2 Preliminaries on Description Logics

Here we report the basics needed for our work and refer the reader to [Baader et al., 2003] for further details. The DL languages of our interest are built from countably inﬁnite sets of concept names (NC), role names (NR), concrete feature names (NF), and concrete predicates (NP). An interpretation I is a structure I = ( I, I) where I is a nonempty set, and the interpretation function I is such that (i) AI I for all A NC; (ii) RI I I for all R NR; (iii) f I I D for all f NF.3 The semantics of an n-ary predicate p NP is a set of tuples p D ( D)n. In this paper we use D = N and unary concrete predicates inℓ,u, where ℓ, u N, such that in D ℓ,u = [ℓ, u]. To enhance readability we will abbreviate inℓ,u(f) to [ℓ, u](f). Informally, [ℓ, u](f) means that the value of feature f belongs to the interval [ℓ, u]. Figure 1 shows some DL operators and their semantics, that extends I to compound DL expressions. The last four lines illustrate some DL axioms; GCI stands for general

3 D denotes the domain of the predicates in NP. We are assuming for brevity that there is one concrete domain. However, our framework can be immediately extended to multiple domains.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

intersection

(C D)I = CI DI

(C D)I = CI DI

restriction

{d I | (d, e) RI : e CI}

( C)I = I \ CI

concrete constraints

p(f1, .., fn)

{x I | v ( D)n. (x, vi) f I i (1 i n) and v p D}

disjointness

RI is a partial function

range(R, C)

Figure 1: Syntax and semantics of some DL constructs and axioms.

concept inclusion . An interpretation I satisﬁes an axiom α (equivalently, I is a model of α) if I satisﬁes the corresponding semantic condition in Fig. 1. A knowledge base K is a ﬁnite set of DL axioms. An interpretation I is a model of K (I |= K) if I satisﬁes all the axioms in K. We say that K entails an axiom α in symbols, K |= α if all models of K satisfy α. A pointed interpretation is a pair (I, d) where d I. We say (I, d) satisﬁes a concept C ((I, d) |= C) iff d CI.

3 Encoding Usage Policies in OWL2-DL The GDPR states that the personal data of a data subject S can be collected, stored, processed and shared by an organization (call data controller in the GDPR) only after S consents to such usage. This norm admits only a few exceptions, e.g. when personal data handling is required by law or is for the public good. Anonymized data are not subject to the GDPR and can be freely stored and processed. In project SPECIAL, compliance with the GDPR is supported by (i) formalizing consent and the data controllers data usage policies (called business policies), and (ii) formalizing selected parts of the GDPR, mainly related to consent and data transfer. The aspects of data usage that must be speciﬁed in consent requests and preserved after approval are clearly indicated in the GDPR and in the available guidelines4: reasons for data processing (purpose); which data categories are involved; what kind of processing; which third parties data are distributed to (recipients); countries in which the data will be stored (location); time limits for removal of data (duration). The above properties characterize a usage policy. In SPECIAL we adopt a direct encoding of usage policies in description logics. The simplest possible policies have the form:

purp.P data.D proc.O recip.R storage( loc.L dur.T ) . (1)

4E.g. see the Records should contain section in http://ec.europa.eu/justice/smedataprotect/index

The classes P, D, O, etc. are deﬁned in suitable vocabularies (ontologies) that specify also their mutual relationships (inclusion, disjointness etc.). SPECIAL temporarily adopts simple vocabularies derived from ODRL5 (for describing processing) and P3P6 (for the other properties), while longerterm standardization activities are in progress. Currently, such taxonomies are encoded with simple inclusions A B and disjointness constraints disj(A, B), where A, B are concept names.7 Duration is represented internally as an interval of integers [t1, t2]. All of the roles in (1) are functional. The above formalization represents the class of all data usage modalities whose properties are instances of the respective vocabulary terms, that is, when the data subject consents to (1), then she authorizes all of its instances. For example if D = Demographic Data then the data subject authorizes the use of her name, address, age, income, etc. as speciﬁed by the other policy properties. It frequently happens that the data controller intends to use different data categories in different ways, according to their usefulness and sensitivity, so consent requests comprise multiple simple usage policies like (1) (one for each usage type). The intended meaning is that consent is requested for all the instances of all those policies; accordingly, such a compound policy is formalized as the union of its components and a full (usage) policy is a concept P1 . . . Pn (2) where each Pi has the form (1). Example 1 A company call it Be Fit sells a wearable ﬁtness appliance and wants (i) to process biometric data (stored in the EU) for sending health-related advice to a customer, and (ii) share her location data with her friends. Location data are kept for a minimum of one year but no longer than 5; biometric data are kept for an unspeciﬁed amount of time. In order to do all this legally, Be Fit needs consent from its customers. The internal (formalized) consent description would look as follows:

( purp.Fitness Recommendation data.Biometric Data proc.Analytics recip.Be Fit storage.loc.EU) ( purp.Social Networking data.Location Data proc.Transfer recip.Data Subj Friends storage.(loc.EU [y1, y5](dur)) .

Here y1 and y5 are the internal, integer representation of one year and ﬁve years. If Heart Rate is a subclass of Biometric Data and Compute Avg a subclass of Analytics , then the above consent allows Be Fit to compute the average heart rate of the data subject for sending her ﬁtness

5https://www.w3.org/TR/odrl/ 6http://www.w3.org/TR/P3P11 7In the journal version we will show how to support general EL knowledge bases.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

recommendations. The purpose could be further detailed by saying how the data subject is contacted, e.g. by replacing the ﬁrst line with purp.(Fitness Recommendation contact.SMS).

The usage policies that are actually applied by the data controller in its business processes are called business policies and include a description of data usage of the form (1). Additionally, each business policy describes the obligations that must be fulﬁlled, related to the speciﬁed usage. For example, if the data category includes personal data, then, according to the GDPR, the business policy should have the additional conjuncts

duty.Get Consent duty.Give Access duty.Rectify On Request duty.Delete On Request (4)

that model the obligations related to the data subjects s rights prescribed by the GDPR. Such business policies are an abstract description of a business process, highlighting the aspects related to compliance with the GDPR and with data subjects consent. Similarly to consent, a business policy may be a union BP1 . . . BPn of simple business policies BPi of the form (1) (4). In some contexts, business policies can be assembled in a semi-automated way by tagging: data sources and their schemas with vocabulary terms specifying the data source s location and the data categories it stores, as well as the third parties that may access those data; programs or processes with vocabulary terms describing the nature of processing and its purpose, as well as the third parties to which the results are transferred. In order to check whether a business process complies with the consent given by a data subject S it sufﬁces to check whether the corresponding business policy BP is subsumed by the consent policy of S, denoted by CPS (in symbols, BP CPS). This subsumption is checked against a knowledge base that encodes type restrictions related to policy properties and the corresponding vocabularies, i.e. subclass relationships, disjointness constraints, functionality restrictions, domain and range restrictions, and the like. In order to verify that all the required obligations are fulﬁlled by a business process (as abstracted by the business policy), selected parts of the GDPR are formalized with concepts like:

( duty.Get Consent duty.Give Access . . .) data.Anonymous purp.Law Requirement . . . (5)

(that states that a business policy should either satisfy (4) or concern anonymous data, or some of the exceptional cases such as particular law requirements), and

storage.loc.EU storage.loc.EULike . . . (6)

(stating that data should remain within the EU, or countries that adopt similar data protection regulations).

Then a business policy BP can be checked for compliance with the formalized parts of the GDPR by checking whether the aforementioned knowledge base entails that BP is subsumed by all the above concepts. Example 2 The following business policy complies with the consent-related obligations formalized in (5) since it is subsumed by it:

( purp.Fitness Recommendation data.Biometric Data proc.Analytics recip.Be Fit storage.loc.EU duty. . . . (4) . . .) ( purp.Sell data.Anonymous proc.Transfer recip.Third Party) .

In particular, the two disjuncts of (7) are subsumed by the ﬁrst two lines of (5), respectively.

Based on the above discussion, we are now ready to specify a fragment of OWL 2 called PL (policy logic) that covers and slightly generalizes the language outlined above. Deﬁnition 1 (Policy logic PL) A PL knowledge base K is a set of axioms of the following kinds: func(R) where R is a role name or a concrete feature; range(S, A) where S is a role and A a concept name; A B where A, B are concept names; disj(A, B) where A, B are concept names. A simple PL concept has the form:

A1 . . . An E1 . . . Em C1 . . . Ct (8)

where each Ai is either a concept name or , each Ej is an existential restriction R.C such that R is a role and C a simple PL concept, and each Ck is a concrete constraint [l, u](f). A (full) PL concept is a union D1 . . . Dn of simple PL concepts. PL s subsumption queries are expressions C D where C, D are (full) PL concepts.

4 Compliance Checking and its Complexity In this section we introduce a structural subsumption algorithm for deciding whether K |= C D, for all given PL KB K and all PL subsumptions C D. We start by introducing an auxiliary algorithm for elementary subsumptions, that are PL subsumptions C D where both C and D are simple, and C D is interval safe, that is, for all constraints [ℓ, u](f) occurring in C and [ℓ , u ](f ) occurring in D, either [ℓ, u] [ℓ , u ], or [ℓ, u] [ℓ , u ] = .8 The structural subsumption algorithm (Algorithm 1, hereafter STS) takes as input a PL KB K and an elementary PL

8We will show later that all subsumptions can be turned into equivalent interval safe subsumptions.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

subsumption C D, where C is normalized w.r.t. K using the rewrite rules in Table 1. There, denotes the reﬂexive and transitive closure of {(A, B) | (A B) K}. By a straightforward case analysis we obtain Proposition 1:

Algorithm 1: STS(K, C D)

Input: K and an elementary C D where C is normalized Output: true if K |= C D, false otherwise Note: Below, by C = C C we mean that either C = C or C is a conjunct of C (possibly not the ﬁrst one) begin 1 if C = then return true 2 if D = A, C = A C and A A then return true 3 if D = [l, u](f) and C = [l , u ](f) C and l l and 4 u u then return true if D = R.D , C = ( R.C ) C and 5 STS(K, C D ) then return true if D = D D , STS(K, C D ), and 6 STS(K, C D ) then return true else return false 7 end 8

Proposition 1 The rewrite rules in Table 1 preserve concept equivalence w.r.t. K, i.e. if C C then K |= C C .

In order to prove that STS is complete w.r.t. elementary subsumptions we need a canonical counterexample to invalid subsumptions.

Deﬁnition 2 Let C = be a simple PL concept normalized w.r.t. K. A canonical model of C is a pointed interpretation (I, d) deﬁned as follows, by recursion on the nesting level of existential restrictions. Hereafter we call a subconcept of C top level if it does not occur within the scope of .

a. If C = dn i=1 Ai dt j=1 Cj (i.e. C has no existential restrictions), then let I = {d}, I where

AI = {d} if for some i = 1, . . . , n, Ai A; if Cj = [l, u](f), add (d, u) to f I (1 j t); all the other predicates are empty.

b. If the top-level existential restrictions of C are Ri.Di (i = 1, . . . , m), then let (Ii, di) be a canonical model of Di, for each i = 1, . . . , m. Assume w.l.o.g. that all such models are mutually disjoint and do not contain d (if necessary, their elements can be replaced). Deﬁne an auxiliary interpretation J as follows:

J = {d, d1, . . . , dm}; for all concept names A such that for some toplevel Ai in C, Ai A, let AJ = {d}; all other concept names are empty; for each top-level constraint [l, u](f) in C, add (d, u) to f J ; for each top-level restriction Ri.Di in C, add a pair (d, di) to RI i ; there are no other pairs in roles and features.

Finally let I be the union of J and all Ii, that is

I = J S i Ii

i AIi (A NC)

i RIi (R NR NF) .

The canonical model is (I, d).

Note that each C has a unique canonical model up to isomorphism. The canonical model actually satisﬁes K and C:

Lemma 1 If C is a simple PL concept normalized w.r.t. K, and C = , then the canonical model (I, d) of C enjoys the following properties:

b. (I, d) |= C.

Moreover, the canonical model of C characterizes all the valid elementary subsumptions whose left-hand side is C:

Lemma 2 If C D is elementary, and C is normalized w.r.t. K, then K |= C D iff (I, d) |= D, where (I, d) is the canonical model of C.

Basically, STS decides whether (I, d) |= D.

Lemma 3 If C D is elementary, C = , and C is normalized w.r.t. K, then STS(K, C D) = true iff (I, d) |= D, where (I, d) is the canonical model of C.

The correctness and completeness of STS easily follow:

Theorem 2 If C D is elementary and C is normalized w.r.t. K, then STS(K, C D) = true iff K |= C D.

Proof. There are two possibilities. If C = , then clearly K |= C D and STS(K, C D) = true (line 2 of STS), so the theorem holds. If C = , then theorem follows immediately from lemmas 2 and 3.

Now, through Lemma 2, subsumption checks over full PL concepts can be reduced to elementary subsumptions.

Theorem 3 For all interval-safe PL subsumption queries σ = C1 . . . Cm D1 . . . Dn such that each Ci is normalized w.r.t. K, the entailment K |= σ holds iff for all i [1, m] there exists j [1, n] such that K |= Ci Dj.

Proof. By simple logical inferences, these two facts hold: (i) K |= σ iff K |= Ci Fn j=1 Dj holds for all i [1, m], (ii) if K |= Ci Dj holds for some j [1, n], then K |= Ci Fn j=1 Dj. So we are only left to show the converse of (ii): assuming that K |= Ci Dj for all j [1, n], we shall prove that K |= Ci Fn j=1 Dj. By assumption and Lemma 2, the canonical model (I, d) of Ci is such that (I, d) |= Dj for all j [1, n]. Therefore (I, d) |= Fn j=1 Dj. Then K |= Ci Fn j=1 Dj follows by noting that (I, d) satisﬁes both K and Ci by Lemma 1.

Theorem 3 directly yields an obvious polynomial-time algorithm for checking PL subsumptions: it sufﬁces to check that for each Ci there exists Dj such that K |= Ci Dj. So:

Theorem 4 Interval-safe PL subsumption queries can be answered in polynomial time.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

1) D 2) R. 3) [l, u](f) if l > u 4) ( R.D) ( R.D ) D R.(D D ) D if func(R) K 5) [l1, u1](f) [l2, u2](f) D [max(l1, l2), min(u1, u2)](f) D if func(f) K 6) R.D D R.(D A) D if range(R, A) K and A not a conjunct of D 7) A1 A2 D if A1 A 1, A2 A 2, and disj(A 1, A 2) K

Table 1: Normalization rules w.r.t. K. Intersections are treated as sets (the ordering of conjuncts and their repetitions are irrelevant).

Proof. Let σ be a PL query of the form illustrated in Theorem 3. By that theorem, after normalizing the Ci, it sufﬁces to call STS(K, Ci Dj) at most m n times. Each such calls scans Ci for each subconcept of Dj, searching for a matching concept. Matching may require to visit the hierarchy , so the cost of each call is O(|Dj| |Ci| |K|) = O(|σ|2 |K|). Then the overall cost of a naive implementation excluding normalization is O(|σ|4 |K|). Normalization is O(|σ|2 |K|), due to the cost of checking pairwise disjointness of concept names, so it is dominated by the cost of STS s runs.

We are left to discuss the interval-safety prerequisite needed by Theorem 3.9 It can be satisﬁed as follows, with a preliminary interval normalization phase of the query C D. For each [ℓ, u](f) in C, let x1 < x2 < < xr be the integers that occur as interval endpoints in D and belong to [ℓ, u]. Let x0 = ℓand xr+1 = u and replace [ℓ, u](f) with the equivalent concept

[xi, xi](f) [xi + 1, xi+1 1](f) [xr+1, xr+1](f). (9)

Then use distributivity of over and the equivalence R.(C1 C2) R.C1 R.C2 to move all occurrences of to the top level. Denote the result of this interval normalization phase with C .

Example 3 Let C = [1, 9](f) A and D = [5, 12](f). Then r = 1 and x0 = 1, x1 = 5, x3 = 9 (12 falls outside [1, 9] and is ignored). Concept [1, 9](f) is split as follows: [1, 1](f) [2, 4](f) [5, 5](f) [6, 8](f) [9, 9](f). Then C = ([1, 1](f) A) ([2, 4](f) A) ([5, 5](f) A) ([6, 8](f) A) ([9, 9](f) A).

Readers may easily verify that

Proposition 5 For all PL subsumption queries C D, C is equivalent to C and C D is an interval-safe PL subsumption query.

Clearly, interval normalization may inﬂate C exponentially, due to the application of distributivity (e.g. this happens with the concepts C and D in the proof of Theorem 7). We have to rely on the structure of policies to get a polynomial time bound: simple ( -free) policies have at most one, functional concrete feature (dur) so no combinatorial explosion occurs during pre-normalization. Accordingly and more generally the following proposition holds:

Proposition 6 PL subsumption queries C1 . . . Cm D can be answered in polynomial time if there exists a constant

9Theorem 2 can also be proved without assuming interval safety. The proof will be given in a forthcoming journal version.

c such that for each Ci, the number of concrete features occurring in Ci is bounded by c.

Note that C may contain any number of interval constraints, as m grows, because the bound applies to each Ci individually. This may seem unsatisfactory at a ﬁrst glance. Unfortunately, no general polynomial-time algorithms exist (unless P=NP) because the interplay of interval constraints and makes unrestricted PL subsumption queries intractable:

Theorem 7 Subsumption checking in PL is co NP-complete. The result holds even if the knowledge base is empty.

We conclude this section by pointing out that the normalization rules in Table 1 can be used as a policy validation method, to check that a full PL concept is satisﬁable.

Proposition 8 Let K be a PL knowledge base.

1. A PL concept C = C1 . . . Cn is unsatisﬁable w.r.t. K iff Ci for all i [1, n].

2. PL concept satisﬁability w.r.t. K can be checked in polynomial time.

Proof. Point 1 follows immediately from Prop. 1 and Lemma 1. It is easy to see that normalization takes polynomial time, so Point 2 holds.

5 Discussion and Future Work

The encoding of usage policies in OWL 2 using the tractable fragment of the description logic PL introduced in this paper is simple enough to reduce compliance checking to tractable structural subsumption tests. Such tests can be computed in polynomial time w.r.t. the size of the ontology and the size of the policies involved (i.e. company policies and formalized consent). For a ﬁxed company policy and ﬁxed vocabularies, the complexity of verifying whether the company policy complies with the explicit consent released by data subjects grows linearly with the size of formalized consent, so we expect compliance checking to scale well to large numbers of data subjects and articulated consent expressions. Tractability relies on the bound on the number of concrete features in policies, as unrestricted PL subsumption is co NP-complete. We are planning as part of the activities of the SPECIAL project to implement and optimize the structural subsumption algorithm and perform scalability tests. The compliance checking task is well-suited for parallelization (e.g. the compliance tests with respect to different consent policies are all mutually independent and can be carried out in parallel). For this purpose we will leverage the Big Data Europe infrastructure adopted by SPECIAL. We are also going to integrate the structural subsumption algorithm with ELK, a specialized

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

reasoner for the tractable proﬁle OWL2-EL [Kazakov et al., 2013]. In this way, the vocabularies of data categories, purposes, locations, etc. can be expressed with OWL2-EL, instead of the limited axioms over concept names allowed in PL knowledge bases. The real-time checking of storage consent for data streams, mentioned in the introduction, most likely requires further optimization. We foresee knowledge compilation techniques as a promising approach. Their feasibility can only be veriﬁed experimentally, and will be the subject of further work. The previous encodings of policies in KR languages, such as [Uszok et al., 2003; Kagal et al., 2003; Bonatti et al., 2010], focus on access control and trust management, rather than data usage control. Consequently, those languages lack the terms for expressing privacy-related and usage-related concepts. A more critical drawback is that the main reasoning task in those papers is permission checking; policy comparison (which is central to our work) is not considered. Both Rei and Protune [Kagal et al., 2003; Bonatti et al., 2010] support logic program rules. If rules are recursive then policy comparison is generally undecidable, otherwise it is NP-hard. The comparison of PL policies, instead, is tractable (Proposition 6). Similarly, KAo S [Uszok et al., 2003] is based on a DL that, in general, is not tractable, and supports rolevalue maps that in general make reasoning undecidable (see [Baader et al., 2003], Chap. 5). The authors do not discuss how to avoid this issue. A DL-based approach that does not suffer from potential undecidabiblity and stresses the importance of policy comparison and other nonstandard reasoning tasks is [Kolovski et al., 2007]. However, tractable fragments and scalability lie beyond the scope of that paper. Our tractability and intractability results do not follow directly from any previous work. The most expressive language enjoying a complete structural subsumption algorithm to the best of our knowledge is the description logic underlying CLASSIC [Borgida and Patel-Schneider, 1994], that supports neither concept unions ( ) nor qualiﬁed existential restrictions ( R.C). If unions were added, then subsumption checking would immediately become co NP-hard (unless concrete domains were restricted) for the same reasons why unrestricted subsumption checking is co NP-hard in PL (cf. Theorem 7). On the other hand, CLASSIC additionally supports qualiﬁed universal restrictions (that strictly generalize PL s range restrictions), number restrictions, and role-value maps, therefore it is not comparable to PL. PL partially intersects the EL family of tractable DL, too. In particular, EL++ supports (hence it can express disj), , qualiﬁed existential restrictions and concrete domains. It is known that role range restrictions can be added without affecting tractability [Baader et al., 2008], and that union can be supported in queries. However, it has been proved that functional roles make EL EXP-complete [Baader et al., 2005]. A tractability result for empty TBoxes can be found in [Haase and Lutz, 2008]; however, in the same paper, it is proved that even with acyclic TBoxes, subsumption is co NP-complete. Therefore, PL is incomparable with the known tractable logics in the EL family. Moreover, our co NP-completeness result is not entailed by the intractability result for EL with functional roles.

A Proofs Proof of Lemma 1 By induction on the maximum nesting level ℓof C s existential restrictions. If ℓ= 0 (i.e. there are no existential restrictions) then (I, d) |= C by construction (cf. Def. 2.a). Disjointness axioms are satisﬁed, otherwise normalization would make C = (contradicting the hypotheses). Inclusion axioms are satisﬁed due to the ﬁrst and third bullets of Def. 2.a. Moreover, I trivially satisﬁes all the functionality and range assertions in K since all roles are empty. It follows that the lemma holds for the base case. Now suppose that ℓ> 0. By induction hypothesis (I.H), we have that all the submodels (Ii, di) used in Def. 2.b satisfy both Di and K. Then it is immediate to see that (I, d) |= C by construction. We are only left to prove that I satisﬁes all axioms α in K. If α = func(R), then rewrite rules 4) and 5) make sure that C contains at most one existential restriction for R, so J satisﬁes α. Since all Ii satisfy α by I.H. (induction hypothesis), I satisﬁes α, too. If α = range(R, A), then rule 6) makes sure that for each top-level Ri.Di in C, Di D i A. Then, by I.H., (Ii, di) |= A. If α = disj(A, B), and α were not true in J , then rule 7) would make C = (a contradiction); the other parts of I, i.e. (Ii, di), satisfy α by I.H. Finally, inclusions are satisﬁed by construction, cf. the second bullet of Def. 2.b. It follows that I satisﬁes α.

Proof of Lemma 2 (Only If part) Assume that K |= C D. By Lemma 1.a, I |= K, so CI DI. Moreover, by Lemma 1.b, d CI DI. Therefore (I, d) |= D. (If part) Assume that (I, d) |= D. We are going to prove that K |= C D by structural induction on D. If D = A (a concept name), then d AI. Then, by construction of I, C = Ai C (up to top-level subconcept reordering), and Ai A. These two facts, respectively, imply |= C Ai and K |= Ai A, hence K |= C D. If D = D1 D2, then by I.H. K |= Di (i = 1, 2), hence K |= C D. If D = R.D1, then for some di I, (d, di) RI and (Ii, di) |= D1, where (Ii, di) (by construction of I) is the canonical model of a top-level restriction R.C1 in C. It follows (using the I.H.) that |= C R.C1 and K |= C1 D1, hence K |= C D. If D = [ℓ, u](f), then for some u [ℓ, u], (d, u ) f I. By construction of I, C must contain a top-level constraint [ℓ , u ](f), and by interval safety (as C D is elementary), [ℓ , u ] [ℓ, u]. Then |= C D.

Proof of Lemma 3 By structural induction on D. If D = A (a concept name), then STS(K, C D) = true iff C has a top-level subconcept Ai such that Ai A. By def. of I, this holds iff d AI, that is, (I, d) |= D. This proves the base case. If D = D1 D2, then the lemma follows easily from the induction hypothesis. If D = R.D1, then STS(K, C D) = true iff (i) C has a top-level subconcept R.C1, and (ii) STS(K, C1 D1) =

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)

true. Moreover, by def. of I,(I, d) |= D iff (i) holds and (ii ) (Ii, di) |= D1, where (Ii, di) is the canonical model of C1. By I.H., (ii) is equivalent to (ii ), so the lemma holds. If D = [ℓ, u](f), then STS(K, C D) = true iff C has a top-level subconcept [ℓ , u ](f) such that [ℓ , u ] [ℓ, u]. This implies (by def. of I) that (d, u ) f I and u [ℓ, u], that is, (I, d) |= D. Conversely, if (d, u ) f I and u [ℓ, u], then C must have a top-level subconcept [ℓ , u ](f) such that also ℓ [ℓ, u] (by interval safety, that holds by the hypothesis that C D is elementary). Then STS(K, C D) = true.

Proof of Theorem 7 Hardness is proved by reducing 3SAT to the complement of subsumption. Let S be a given set of clauses ci = Li1 Li2 Li3 (1 i n) where each Lij is a literal. We are going to use the propositional symbols p1, . . . , pm occurring in S as feature names in PL concepts, and deﬁne a subsumption C D that is valid iff S is unsatisﬁable. Let C = [0, 1](p1) . . . [0, 1](pm) and D = Fn i=1 Li1 Li2 Li3 , where each Lij encodes the complement of Lij as follows:

Lij = [0, 0](pk) if Lij = pk , [1, 1](pk) if Lij = pk .

The correspondencebetween the propositional interpretations I of S and the interpretations J of C D is the following. Given I and an arbitrary element d, deﬁne J = {d}, J such that (d, 0) p J i iff I(pi) = false, and (d, 1) p J i otherwise. By construction, (J , d) |= C, and I |= S iff (J , d) |= D. Consequently, if S is satisﬁable, then C D is not valid. Conversely, if C D is not valid, then there exist J and d J such that (J , d) |= C D. Deﬁne a propositional interpretation I of S by setting I(p) = true iff (d, 1) p J i . By construction (and since d does not satisfy D), I |= S, which proves that if C D is not valid, then S is satisﬁable. We conclude that the above reduction is correct. Moreover, it can be clearly computed in LOGSPACE. This proves that subsumption is co NP-hard even if the KB is empty. Membership in co NP can be proved by showing that the complement of subsumption is in NP. Given a query C D, it sufﬁces to choose nondeterministically one of the disjuncts Ci in the left hand side of the query, and replace each constraint [ℓ, u](f) occurring in Ci with a nondeterministically chosen disjunct from (9). Call C i the resulting concept and note that it is one of the disjuncts in C . Therefore, K |= C D iff for some nondeterministic choice, K |= C i D. The latter subsumption test can be evaluated in deterministic polynomial time with STS, so the complement of PL subsumption is in NP.

Acknowledgements The use case about GDPR compliance illustrated with (5) and Example 2 is due to Benedict Whittam Smith (Thomson Reuters). The author is grateful to the anonymous reviewers for their constructive comments. This research is funded by the European Union s Horizon 2020 research and innovation programme under grant agreement N. 731601.

References [Baader et al., 2003] Franz Baader, Diego Calvanese, Deborah L. Mc Guinness, Daniele Nardi, and Peter F. Patel Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003. [Baader et al., 2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In IJCAI-05, pages 364 369. Professional Book Center, 2005. [Baader et al., 2008] Franz Baader, Carsten Lutz, and Sebastian Brandt. Pushing the EL envelope further. In Kendall Clark and Peter F. Patel-Schneider, editors, Proceedings of the Fourth OWLED Workshop on OWL: Experiences and Directions, volume 496 of CEUR Workshop Proceedings. CEUR-WS.org, 2008. [Bonatti et al., 2010] Piero A. Bonatti, Juri Luca De Coi, Daniel Olmedilla, and Luigi Sauro. A rule-based trust negotiation system. IEEE Trans. Knowl. Data Eng., 22(11):1507 1520, 2010. [Borgida and Patel-Schneider, 1994] Alexander Borgida and Peter F. Patel-Schneider. A semantics and complete algorithm for subsumption in the CLASSIC description logic. J. Artif. Intell. Res., 1:277 308, 1994. [Haase and Lutz, 2008] Christoph Haase and Carsten Lutz. Complexity of subsumption in the EL family of description logics: Acyclic and cyclic tboxes. In Malik Ghallab, Constantine D. Spyropoulos, Nikos Fakotakis, and Nikolaos M. Avouris, editors, ECAI 2008 - 18th European Conference on Artiﬁcial Intelligence, volume 178 of Frontiers in Artiﬁcial Intelligence and Applications, pages 25 29. IOS Press, 2008. [Kagal et al., 2003] Lalana Kagal, Timothy W. Finin, and Anupam Joshi. A policy language for a pervasive computing environment. In 4th IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 63 . IEEE Computer Society, 2003. [Kazakov et al., 2013] Yevgeny Kazakov, Markus Kr otzsch, and Frantiˇsek Simanˇc ık. The incredible ELK: From polynomial procedures to efﬁcient reasoning with EL ontologies. Journal of Automated Reasoning, 53:1 61, 2013. [Kolovski et al., 2007] V. Kolovski, J. Hendler, and B. Parsia. Analyzing web access control policies. In Proceedings of the 16th International Conference on World Wide Web, WWW 07, pages 677 686,, 2007. ACM. [Uszok et al., 2003] Andrzej Uszok, Jeffrey M. Bradshaw, Renia Jeffers, Niranjan Suri, Patrick J. Hayes, Maggie R. Breedy, Larry Bunch, Matt Johnson, Shriniwas Kulkarni, and James Lott. KAo S policy and domain services: Towards a description-logic approach to policy representation, deconﬂiction, and enforcement. In 4th IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 93 96, Lake Como, Italy, June 2003. IEEE Computer Society.

Proceedings of the Twenty-Seventh International Joint Conference on Artiﬁcial Intelligence (IJCAI-18)