# bounded_predicates_in_description_logics_with_counting__f889d2c9.pdf

Bounded Predicates in Description Logics with Counting

Sanja Lukumbuzya and Mantas ˇSimkus Institute of Logic and Computation, TU Wien, Austria lukumbuzya@kr.tuwien.ac.at, simkus@dbai.ac.at

Description Logics (DLs) support so-called anonymous objects, which signiﬁcantly contribute to the expressiveness of these KR languages, but also cause substantial computational challenges. This paper investigates reasoning about upper bounds on predicate sizes for ontologies written in the expressive DL ALCHOIQ extended with closed predicates. We describe a procedure based on integer programming that allows us to decide the existence of upper bounds on the cardinality of some predicate in the models of a given ontology in a data-independent way. Our results yield a promising supporting tool for constructing higher quality ontologies, and provide a new way to push the decidability frontiers. To wit, we deﬁne a new safety condition for Datalogbased queries over DL ontologies, while retaining decidability of query entailment.

1 Introduction Knowledge Representation and Reasoning (KR&R) is playing an increasingly important role in intelligent data management, especially for complex and knowledge-intensive problem domains. Speciﬁcally, KR&R offers Description Logics (DLs) as languages suitable for describing complex domains in terms of ontologies, and automated reasoning tools allow us to infer useful insights from data repositories leveraging such ontologies. DLs are expressive enough to capture and reason about UML and ER diagrams, which are popular modeling languages in the development of data-centric applications [Berardi et al., 2005]. The adequate expressiveness and good computational properties of DLs have also led to the emergence of the so-called Ontology-based Data Access (OBDA) paradigm for information integration; see [Xiao et al., 2018; Schneider and ˇSimkus, 2020] for an overview of this area. A distinguishing feature of DLs, aimed at dealing with information incompleteness, is the ability to describe and reason about anonymous objects, that is, elements in the domain of interest that are not represented by known individuals but whose existence is logically implied by the background knowledge. Consider a simple TBox with two axioms Employee {Robin} {Skyler} and Employee Task 2has Task.Task 5has Task.Task, which tells

that Robin and Skyler are precisely the employees of a company, that employees are not tasks, and that each employee must be assigned from 2 to 5 tasks to work on. If we inspect the models of this TBox, then the named objects corresponding to Robin and Skyler will be associated with 2 to 5 anonymous objects corresponding to tasks. When the number of anonymous objects cannot be bounded (e.g., when it is forced to be inﬁnite by a recursive TBox), one often faces high computational complexity of standard reasoning tasks, and undecidability of more sophisticated problems, like answering Datalog-based recursive queries (see, e.g., [Levy and Rousset, 1998; Rosati, 2007]) or reasoning about data-manipulating actions (see, e.g., [Bagheri Hariri et al., 2013]). However, in many scenarios, the number of anonymous objects in relevant predicates can be inferred to be bounded due to numeric constraints present in the TBox. E.g., if we add to the above TBox the statement that every task must be associated to an employee (Task has Task .Employee), then the extension of Task in any model is bounded by 10. We suggest [Bednarczyk et al., 2020] for a longer discussion of numeric constraints in DLs. The question we investigate in this paper is the existence of upper bounds on predicate sizes for ontologies written in the very expressive DL ALCHOIQ with closed predicates [Franconi et al., 2011; Lutz et al., 2013]. In the context of relational databases, closed predicates are related to the notion of master data, corresponding to the static part of a database [Fan and Geerts, 2010]. E.g., in a database with a (frequently updated) table of sales transactions, the supporting tables of employees, products and clients will likely be relatively static and thus can be considered master data. Using a TBox, closed and nonclosed (open) predicates can be related by means of inclusion axioms, analogously to the way that sales transactions can be related to employees, products and customers via database dependency constraints. Our goal is to understand when and how we can infer bounds on the sizes of open predicates from the extensions of closed predicates, by taking into account the (numeric and other) constraints speciﬁed in a TBox. Our results not only yield a promising tool for supporting the construction of high-quality ontologies, but also open a new way to push the decidability frontiers for challenging data management tasks. Our contributions can be summarized as follows. We introduce the notion of bounded predicates for DLs with closed predicates. Intuitively, a predicate is bounded in

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

a knowledge base, if there exists an integer constant that provides an upper bound on the size of the predicate s extension in all models of the knowledge base. Since we are interested in aiding the design of ontologies and of the so-called ontologymediated queries (OMQs), in the paper we consider a stronger variant of predicate boundedness that requires the predicate to be bounded independently of the actual extensional data stored in a knowledge base. More precisely, we require the existence of a function f such that f(n) is an upper bound for all ABoxes of size at most n. We formalize two decision problems checking predicate boundedness in the weak and in the strong sense, and we characterize their computational complexity. For ALCHOIQ, our method is based on integer programming and it involves an intermediate step that reformulates the problem in terms of another problem called ﬁnite-inﬁnite satisﬁability. The task in this problem is to check whether a TBox has a model in which some speciﬁc predicates have ﬁnite extensions, while some other given predicates have inﬁnite extensions. This is closely related to mixed satisﬁability studied in [Gogacz et al., 2020a] and is interesting in its own right. We show that our results yield worst-case optimal complexity bounds in all cases. Speciﬁcally, checking boundedness in the weak and the strong sense is CONEXPTIMEcomplete for ALCHOIQ TBoxes. Moreover, in case predicate boundedness is inferred, the concrete bound can be readily computed; it is double exponential in the size of the input. This is worst-case optimal: one can craft a TBox that forces a double exponential number of elements in a bounded predicate. However, in case of strong boundedness, the function f as mentioned above is a polynomial function, i.e. if we ﬁx the TBox, the number of elements in bounded predicates will grow polynomially in the size of the input ABox. Finally, we show how our results can be used to deﬁne a new decidability-ensuring safety condition for OMQs based on ALCHOIQ and Datalog, providing worst-case optimal results on the combined and data complexity. This is interesting because there are very few positive results on query answering in this expressive DL. We also infer that the data complexity of satisﬁability of ALCHOIQ KBs with closed predicate is CONP-complete. An extended version of this paper can be found at https://dbai.tuwien.ac.at/staff/simkus/papers/ijcai21-bnd.pdf

2 Preliminaries

We now introduce ALCHOIQ, the main DL investigated in the paper. We assume countably inﬁnite and mutually disjoint sets NC, NR, and NI of concept names, role names, and constants, respectively. Roles are expressions of the form p and p , where p NR. We let N+ R be the set of all roles. With a slight abuse of notation, we write r to denote p if r = p, and p if r = p , for p NR. The role r is the inverse of r. Given a set R of roles, R denotes the set {r : r R}. ALCHOIQ concepts are deﬁned according to the following syntax: C := | | A | {a} | C | C C | C C | r.C | nr.C | nr.C |= nr.C, where A NC, r is a role, a NI and n 0. Concepts of the form {a}, where a NI

are nominals. An ABox is a ﬁnite set of assertions of the form A(a) or p(a, b), where a, b NI, A NC, and p NR. Given Σ NC NR, A|Σ denotes the restriction of A to the concept and role names in Σ. If Σ consists of a single concept or role name p, we simply write A|p to denote A|Σ. An expression C D, where C and D are concepts, is a concept inclusion, and an expression r s, where r and s are roles, is a role inclusion. A (TBox) axiom is a concept inclusion or a role inclusion, and a TBox is a ﬁnite set of axioms. A knowledge base (with closed predicates) (KB) is a triple (T , Σ, A), where T is a TBox, Σ NC NR is a set of closed predicates and A is an ABox. For a TBox, an ABox, or a KB X, we denote by NC(X) and NR(X) the set of concept and role names occurring in X, and by N+ R (X) the set of roles occurring in X and their inverses. As ALCHOIQ is simply a fragment of ﬁrst-order logic with a special syntax, its semantics is given in term of ﬁrst-order interpretations. An interpretation is a pair I = ( I, I), where I is a non-empty set called the domain and I is the interpretation function that assigns to each a NI a domain element a I I, to each A NC a set AI I, and to each r NR a set r I I I. The extension of the interpretation function to the remaining concepts and roles is deﬁned in the standard way [Baader et al., 2003]. A concept or a role inclusion Q1 Q2 is satisﬁed by an interpretation I if QI 1 QI 2 . We note that we make the Standard Name Assumption (SNA), which is common when dealing with closed predicates and which forces us to interpret every constant as itself, i.e., a I = a, for all I and all constants a. We say that I satisﬁes a TBox T (resp. ABox A) if it satisﬁes all axioms in T (resp. assertions in A). I satisﬁes a KB (T , Σ, A) if I satisﬁes T and A, and {p( a) : p Σ, a p I} = A|Σ.

3 Boundedness

We begin with an example that illustrates boundedness.

Example 1. Consider the following ALCHOIQ TBox T = {Empl 5 assgnd To.Proj, Proj 1assgnd To .Empl} stating that each employee in some company can be assigned to at most 5 projects, and that all projects must be assigned to least one employee. Let A be an arbitrary ABox over Empl that lists all n employees of the company, i.e., A = {Empl(a1), . . . , Empl(an)}. Further, let I be an arbitrary model of (T , {Empl}, A). As Empl is viewed as a closed predicate, |Empl I| = |A| = n. Taking into account the axioms from T , it is easy to see that the number of projects in I can be at most 5n, i.e., |Proj I| 5n. Therefore, the size of the extensions of Proj is in a way bounded w.r.t. T and Empl.

Deﬁnition 1. Let T be a DL TBox, Σ be a set of predicates occurring in T , and A be an ABox over the signature of T . A predicate p is bounded w.r.t (T , Σ, A) if there exists a bound b N s.t. in every model I of (T , Σ, A), |p I| b holds. We say that p is bounded w.r.t T and Σ if p is bounded w.r.t. (T , Σ, A), for every ABox A over the signature of T .

The problem of deciding whether all predicates in a given set are bounded w.r.t. a given TBox formulated in the DL L and a set of closed predicates is called strong boundedness, or BOUNDEDNESS(L), and it is deﬁned as follows.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

BOUNDEDNESS(L) Input: A triple (T , Σ, ΣB), where T is a TBox in DL L and Σ, ΣB are sets of predicates occurring in T . Question: Is each C ΣB bounded w.r.t. T and Σ?

We also deﬁne the second decision problem called weak boundedness, or W-BOUNDEDNESS(L), that given a TBox T formulated in the DL L, a set of predicates Σ, and an ABox A over the signature of T , decides whether all predicates in some given set are bounded w.r.t. (T , Σ, A).

W-BOUNDEDNESS(L) Input: A pair (K, ΣB), where K = (T , Σ, A), T is a TBox in DL L, Σ, ΣB are sets of predicates occurring in T , and A is an ABox over the signature of T . Question: Is each C ΣB bounded w.r.t. K?

We now focus speciﬁcally on strong boundedness for description logics that are fragments of the ﬁrst-order logic (FO). Note that, if not speciﬁed otherwise, under boundedness we understand strong boundedness. As a technical tool we introduce ﬁnite-inﬁnite satisﬁability (ﬁ-satisﬁability), or FI-SAT(L), which is the problem of deciding for a given TBox T in the DL L, and two sets of predicates ΣF and ΣI, whether T has a model in which all the predicates in ΣF have ﬁnite extensions and all the predicates in ΣI have inﬁnite extensions.

FI-SAT(L) Input: A triple (T , ΣF , ΣI), where T is a TBox in DL L, ΣF ΣI is a set of predicates occurring in T .

Question: Is there a model I of T s.t. p I is ﬁnite for all p ΣF and is inﬁnite for all p ΣI?

We next show that we can reduce the boundedness problem to deciding ﬁ-satisﬁability. From the Deﬁnition 1, it is easy to observe that a concept or a role name p is not bounded w.r.t. T and Σ if and only if there exists an ABox A over the signature of T such that for every natural number n there exists a model I of (T , Σ, A) with |p I| > n. The following proposition, formulated for FO, shows that in this case (T , Σ, A) also has a model in which the extension of p is inﬁnite. Proposition 1. Let T be an arbitrary FO theory and let p be a unary or a binary predicate. If for every natural number n there exists a model I of T in which |p I| > n, then there also exists a model J of T in which p J is inﬁnite.

Proof. Let p be a binary predicate. For k 1, let ϕk := x1, x 1, . . . , xk, x k

1 i<j k (xi = xj x i = x j) p(xi, x i),

which is satisﬁed by interpretations in which there are k or more tuples in the extension of p. Consider the FO theory T = T {ϕn : n 1}. As for every n, there is a model I of T in which |p I| > n, every ﬁnite subset of T is satisﬁable. By compactness of FO, T is satisﬁable as well. However, by construction, T only admits models in which the extension of p is inﬁnite. Hence, there exists a model J of T such that p J

is inﬁnite. As T T , J is also a model of T and so there is a model of T in which the extension of p is inﬁnite. The case where p is unary is shown analogously.

Proposition 2. Given a TBox T and a set of predicates Σ occurring in T , a predicate p is not bounded w.r.t. T and Σ if and only if there exists a model I of T in which all predicates from Σ have ﬁnite extensions and p I is inﬁnite.

Proof. Assume that p is not bounded w.r.t. T and Σ. Then there exists an ABox A s.t. for every b N, there exists a model J of (T , Σ, A) in which |p J | > b. Note that, since we are considering DLs that are fragments of FO, we can obtain from (T , Σ, A) an FO theory T whose models coincide with those of (T , Σ, A). We let T = fo(T ) A {ϕp : p Σ}, where fo(T ) is a theory obtained by translating T into FO, and ϕp, for p Σ, encodes the semantics of the closed predicates as follows. For each concept name A Σ, let A = {c : A(c) A}. Let ϕA := x A(x) W c A x = c. For a role name r Σ, ϕr is deﬁned in a similar manner. It is easy to see that T has exactly the same models as (T , Σ, A). Due to Proposition 1, there is a model I of T (and thus also of (T , Σ, A)) in which p I is inﬁnite. As ABoxes are ﬁnite and the extensions of the predicates in Σ are fully speciﬁed by A, we have that the predicates in Σ have ﬁnite extensions in I. Let J be a model of T s.t. the predicates in Σ have ﬁnite extensions and p has an inﬁnite extension in J , and let A = {q( c) : c q I, q Σ}. As J is a model of (T , Σ, A) and |p J | > k, for all k N, p is not bounded w.r.t. T and Σ.

Corollary 1. Let T be a TBox in DL L, and Σ, ΣB be sets of predicates occurring in T . (T , Σ, ΣB) is a yes-instance of BOUNDEDNESS(L) if and only if (T , Σ, {p}) is a no-instance of FI-SAT(L), for all p ΣB.

4 Boundedness via Integer Programming

This section investigates boundedness in ALCHOIQ. We ﬁrst show that we can reduce ﬁ-satisﬁability to solving a system of linear inequalities with side conditions. We are thus able to show decidability of boundedness in ALCHOIQ and provide tight complexity bounds. The second part of the section shows that there is a function depending on the TBox and Σ NC NR, that for every n 0, computes an upper bound on the size of extensions of bounded predicates in the models of (T , Σ, A), for all ABoxes A of size n.

4.1 Decidability and Complexity Let (T , ΣF , ΣI) be an instance of FI-SAT(ALCHOIQ). For ease of presentation, we assume that ΣF and ΣI contain only concept names. This is not a limitation, as role names can be eliminated from both sets in polynomial time while preserving ﬁ-satisﬁability. This is done by introducing a fresh concept name Ar, for each role name r ΣF ΣI, and adding axioms that ensure Ar collects the domain elements in the domain and range of r. Finally, r is replaced by Ar in ΣF and ΣI. The concepts in N+ C = NC { , } {{c} : c NI} are called basic concepts and the set N+ C (T ) consists of the basic concepts occurring in a TBox T . We next ﬁx some notation that remains the same for the rest of the paper. For a given

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

TBox T , we let n T = |NC(T ) N+ R (T )|, m T be the number of constants occurring in T plus 1, and let c T be the maximum integer occurring in T . We assume that TBoxes are in normal form in which all axioms are of the following types:

B1 Bk 1 Bk Bm, B1 = n p.B2, B1 p.B2, r s,

where {B1, . . . , Bm} N+ C , n 0, k > 1, m k, p NR, and {r, s} N+ R . We call the axioms of type B1 = n r.B2 counting axioms. We also assume that every TBox T is closed under role inclusions, i.e., (i) p p T , for each p NR(T ), (ii) if r s T then r s T , and (iii) if r1 r2 T and r2 r3 T then r1 r3 T . Every ALCHOIQ TBox can be transformed in polynomial time into a TBox in normal form that is closed under role inclusions. We are now ready to describe the procedure for deciding ﬁ-satisﬁability in ALCHOIQ. Note that ﬁ-satisﬁability is a generalization of mixed-satisﬁability introduced in [Gogacz et al., 2020a], which is the problem of deciding, for a given TBox T and a set of predicates Σ, whether there exists a model of T in which all predicates from Σ have ﬁnite extensions. Inspired by the techniques in [Calvanese, 1996; Lutz et al., 2005; Pratt-Hartmann, 2005],[Gogacz et al., 2020a] presents a decision procedure for mixed-satisﬁability in ALCHOIF, a DL closely related to ALCHOIQ that only supports functionality. This procedure relies on encoding mixed-satisﬁability as a system of linear integer inequalities with side conditions, and a similar encoding was used in [Gogacz et al., 2020b] for deciding satisﬁability of ALCHOIF KBs with closed predicates. We next adapt the procedure in [Gogacz et al., 2020a] to support our generalized setting. We use tiles to describe different types of domain elements in models of T . Intuitively, a tile speciﬁes basic concepts that an element participates in and the relevant part of its neighborhood. A domain element d in a model of T is said to be an instance of a tile τ if it ﬁts the description given by τ. We then deﬁne a system of integer linear inequalities with side conditions whose solutions assign a multiplicity to each tile in a way that ensures that we can build a desired model of T by instantiating tiles according to their multiplicities. In order to properly support counting, we employ a technique from [Pratt-Hartmann, 2005] that colors the models of T using log(m T c2 T +1) fresh concept names. For each of these fresh concept names A, we add an axiom A A T . We next formally deﬁne the notion of tiles. Deﬁnition 2. Given an ALCHOIQ TBox T , a type T for T is any set with the following properties: (i) T N+ C (T ), (ii) T and T, and (iii) |T {{a} : a NI(T )}| 1. Deﬁnition 3. Given an ALCHOIQ TBox T , a tile for T is a tuple (T, ρ), where T is a type for T and ρ is a set of triples (R, T , k) s.t. R N+ R (T ), T is a type for T , k > 0 and the following conditions hold: 1. If (R, T , k) ρ then (R, T, k) ρ, for all 0 < k < k 2. For every (R, T , k) ρ, there exists some A = nr.B T such that A T, B T and r R 3. If B1 Bk 1 Bk Bm T and {B1, . . . , Bk 1} T, then {Bk, . . . , Bm} T =

4. If A = n r.B T and A T, then |{(R, T , k) ρ : r R and B T }| = n.

5. For all (R, T , k) ρ, the following hold:

(a) If A r.B T , A T and r R, then B T

(b) If A r.B T , A T and r R, then B T (c) If r s T and r R, then s R (d) If (T, R, T ) is invertible, then T = T , and there is no other (R , T , l ) ρ s.t. (T, R , T ) is invertible,

where, for types T, T and R N+ R (T ), (T, R, T ) is invertible if there exists {A = n s.B, C = m p.D} T , s.t. {A, D} T, {B, C} T , and {s, p } R.

Let Tiles(T ) denote the set of all tiles for T . A tile (T, ρ) describes a domain element d in some model of T that participates in the basic concepts in T. Moreover, for each (R, T , l) ρ there is a domain element d whose basic concepts are given by T s.t. (d, d ) is in the extension of every role r R. We say that d is an R-successor of d.

Example 2. Consider the following TBox T = {B C A, A = 1r.B, A = 1r.{b}, A r.C, s r}. Let

τ = ({ , A}, {({r, s}, { , A, B, C, {b}}, 1)}).

Then, τ is a tile for T and it describes a certain kind of domain elements that may appear in models of T . In a model I of T , a domain element d I that is an instance of τ has the following properties:

1. d is of the type { , A}, i.e., d AI and d DI, for all D N+ C (T ) \ {A},

2. (d, d ) r I s I, where d I is of the type { , A, B, C, {b}}, i.e., d = b, d AI BI CI, and d DI, for all D N+ C (T ) \ {A, B, C}.

We brieﬂy explain the intuitions behind the conditions in Def. 3. Recall that, for a tile τ = (T, ρ), each (R, T , k) ρ represents a distinct R-successors of the domain element d that is an instance of τ. By deﬁnition, ρ is a set, which means no duplicates are allowed. However, a situation could arise in which we need two encode two distinct R-successors of d that have the same type and are connected to the d via the same roles. To overcome this issue, we consider triples (R, T , k), where k is an integer that tells us that (R, T , k) encodes the kth R-successor of d of type T . This is reﬂected in the ﬁrst condition. It is also worth mentioning that tiles only encode the relevant part of the neighborhood of a domain element, i.e., those neighbors that serve as witnesses for counting axioms in TBox T of the knowledge base. This is reﬂected in condition 2. The intuition behind the remianing conditions is rather straight forward. Condition 3 ensures the satisfaction of axioms of the type B1 Bk 1 Bk Bm T , condition 4 guarantees that d has n witnesses for every axiom A = nr.B T , conditions 5a-5c ensure that d and its neighbors respect axioms of the type A r.B and r s, and condition 5d is related to the coloring of models and intuitively, it ensures that a domain element is not used as a witness for counting axiom more times than it is allowed.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

We now introduce enriched systems of linear integer inequalities that are tuples (V, E, VF , VI, I), where V is a set of variables, E is a set of inequalities of the form a1 x1+ +an xn + c b1 y1 + + bm ym, s.t. a1, . . . , an, b1, . . . , bm are positive integers, c is a possibly negative integer, and x1, . . . , xn, y1, . . . , ym V , VF V , VI 2V , and I is the set of implications of the form y1 + + ym > a x1 + + xn > b, where a, b are non-negative integers. Recall that we aim to characterize ﬁ-satisﬁability via enriched systems s.t. the solutions to the systems specify how many times we need to instantiate each tile in order to build a desired model. As some tiles might need to be instantiated inﬁnitely many times, we extend the set of natural numbers with a new value ℵ0 greater than any natural number and we let: ℵ0 ℵ0 = ℵ0 + ℵ0 = ℵ0 + 0 = 0 + ℵ0 = ℵ0 + n = n + ℵ0 = ℵ0 n = n ℵ0 = ℵ0, for all n N \ {0}, and 0 ℵ0 = ℵ0 0 = 0. We let N = N {ℵ0}. Furthermore, we say that inﬁnite sets have cardinality ℵ0. A solution of (V, E, VF , VI, I) is a function S : V N s.t. all inequalities and implications are satisﬁed, for every x VF , S(x) = ℵ0, and for every X VI, there exists x X s.t. S(x) = ℵ0.

Proposition 3. Let T be an ALCHOIQ TBox, and ΣF , ΣI NC(T ). We can obtain an enriched system S(T ,ΣF ,ΣI) that is exponential in the size of T s.t. V = Tiles(T ), and S(T ,ΣF ,ΣI) has a solution S if and only if there exists a model I of T in which AI is ﬁnite for all A ΣF and inﬁnite for all A ΣI. Moreover, for all concept names A N+ C (T ), |AI| = P (T,ρ) Tiles(T ), A T S((T, ρ)) holds.

Theorem 1. FI-SAT(ALCHOIQ) is NEXPTIME-complete. BOUNDEDNESS(ALCHOIQ) is CONEXPTIME-complete.

4.2 Size of Bounded Extensions Let T be an ALCHOIQ TBox and Σ NC(T ) NR(T ). We next show that there exists a function f T ,Σ : N N s.t. f T ,Σ(n) is an upper bound on the size of extensions of predicates bounded w.r.t. T and Σ in the models of (T , Σ, A), for any ABox A over the signature of T of size n. Note that we only consider the case where bounded predicates and predicates in Σ are concept names, as roles can be eliminated using a trick similar to the one in Section 4.1. To compute f T ,Σ, we once again rely on integer programming. To this end, we introduce enriched linear programs. An enriched integer linear program (enriched ILP), is a pair I = (a1 x1 + + an xn, S), where S = (V, E, VF , VI, I) is an enriched system of integer linear inequalities, x1, . . . , xn V and a1, . . . , an are integers. An optimal solution to I is a solution S of S that maximizes a1 x1 + +an xn, and the value a1 S(x1) + + an S(xn) is the optimal value of I.

We next show that for a concrete ABox A over the signature of T and a bounded concept name B, we can compute an enriched ILP whose optimal value acts as an upper bound on the size of the extension of B in the models of (T , Σ, A).

Proposition 4. Let T be an ALCHOIQ TBox, Σ NC(T ), B be a concept name bounded w.r.t. T and Σ, and A be an ABox over the signature of T . Then, there exists an ILP IB (T ,Σ,A) = (a1 x1 + + ak xk, S) that has a ﬁnite

optimal value b B T ,Σ,A, and |BI| b B T ,Σ,A, for every model I of (T , Σ, A). Moreover, let S = (V, E, VF , VI, I) and let c be the maximum among the absolute values of integers occurring in S. We have |V |, |E|, |I| 3(2n T m T )|T | c T +2, |VF | = |VI| = 0, and c max{1, c T , |A|}. Optimal solutions to enriched integer linear programs can be computed using ordinary integer programming techniques. Moreover, known integer programming results that show there is an upper bound on the optimal value of an ordinary integer program can also be transfered to enriched linear programs. Proposition 5. Let I = (a1 x1 + + an xn, S) be an enriched ILP, where S = (V, E, VF , VI, I), and assume that I has a ﬁnite optimal value b. Then b 4(|a1| + 1) (|an| + 1) (c + 1)|V | (|E|+|VF |+|VI|+|I|), where c is the maximum among the absolute values of integers occurring in S. Theorem 2. Let T be an ALCHOIQ TBox, Σ NC(T ), and n 0. For every concept name B bounded w.r.t T , and Σ, every ABox A over the signature of T of size n and every model I of (T , Σ, A):

|BI| 4 (2e|T | c T + 1) (c + 1)18 e(2|T | c T +4),

where e = (2n T m T ) and c = max{1, c T , n}.

Theorem 2 shows that f T ,Σ(n) = O(nk 2|T |k), where k is a constant, i.e., f is doubly-exponential in the size of T , but only polynomial in the size of the ABox. Note that the bound computed by f T ,Σ is optimal in the sense that there exists a simple TBox that creates a binary tree of exponential depth, whose every node belongs to a bounded predicate. Such a TBox is later used in the proof of Theorem 4. Weak boundedness. Given an ALCHOIQ KB K, it is not too hard to adapt the construction for ﬁ-satisﬁability to obtain an enriched system SK that is exponential in the size of K and whose solutions correspond to the models of K. To decide whether a concept name B is bounded w.r.t. K, we add a function that maximizes the extension of B on top of SK. This results in an enriched ILP that has a ﬁnite optimal value if and only if B is bounded w.r.t. K Theorem 3. W-BOUNDEDNESS(ALCHOIQ) is CONEXPTIME-complete. Furthermore, for a given KB K, we can compute an integer bound b that is doublyexponential in the size of K s.t. for all models I of K, |p I| b holds, for all predicates p bounded w.r.t. K.

5 Relaxed Safety for Rule-base Queries We discuss here how our results presented above can be applied in the context of ontology-based access to data. Of key importance in OBDA is the problem of answering ontology-mediated queries (OMQs) over ABoxes, which is deﬁned in terms of logical entailment in a DL KB. Most often an OMQ is a pair Q = (T , Q) of a TBox T and a conjunctive query (CQ) Q. Such an OMQ can be evaluated over an ABox A by computing the so-called certain answer to Q over the KB K = (T , A). OMQs can be seen as database queries, but they are posed over relational structures with at most binary relations, and they are written in an expressive ontology-based query language.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

Many authors have studied OMQs for various DLs, yet for ALCHOIQ this problem is not well-understood. In particular, for OMQs with CQs we know decidability (without an upper bound) due to [Rudolph and Glimm, 2010] and a CO2NEXPTIME lower bound due to [Glimm et al., 2011]. When CQs are replaced by the more expressive Datalog queries, we immediately run into undecidability, even for much less expressive DLs (see, e.g., [Levy and Rousset, 1998; Rosati, 2007]). One can regain decidability by applying the DL-safety restriction on Datalog queries, which requires query variables to be guarded by predicates that do not occur in the TBox [Motik et al., 2005; Rosati, 2005]. Unfortunately, this restricts reasoning about anonymous objects signiﬁcantly. Our next goal is to show that our results can be used to relax DLsafety to support reasoning about anonymous objects, while retaining decidability. In a nutshell, the new safety condition requires all query variables to be guarded by a predicate that is bounded by the input TBox and some closed predicates. We will make this more formal next. We assume countably inﬁnite set ND and NV of Datalog predicates and variables, respectively. We assume ND to be disjoint from NC and NR. Each p ND has a non-negative integer arity. Each t NV NI is a term. An atom is an expression of the form q(t1, . . . , tn), where t1, . . . , tn are terms and one of the following holds: (i) q ND and q is n-ary, (ii) q NC and n = 1, or (iii) q NR and n = 2. An atom q( t) is ground if it contains no variables. A rule is an expression of the form q0( t0) q1( t1), . . . , qm( tm) such that (i) each qj( tj), 0 j m, is an atom, (ii) q0 ND, and (iii) for each variable X that occurs in t0, there is 1 k m s.t. X occurs in tk. A program is a ﬁnite set of rules. To deﬁne the semantics, we extend interpretations to also assign meaning to the symbols in ND. I.e. in an interpretation I, the function I assigns some n-ary relation p I ( I)n to each n-ary predicate symbol p ND. An assignment π for an interpretation I is any function π : NV NI I such that π(c) = c for all c NI. For a rule ρ of the form q0( t0) q1( t1), . . . , qm( tm), and an assignment π for I, we write I, π |= ρ in case π( t1) q I 1 , . . . , π( tm) q I m implies π( t0) q I 0 . An interpretation I is a model of a program P if I, π |= ρ holds for every rule ρ P and every assignment π. An OMQ is a tuple Q = (T , Σ, P, q), where T is a TBox, Σ NC NR, P is a ﬁnite set of rules, and q ND is the output predicate. The (output tuple) entailment problem is to decide, given an ABox A, an OMQ Q as above and tuple of t of individuals, whether t q I holds for every I that is a model of K = (T , Σ, A) and P. If it is true, t is said to be in the answer to Q over A. To ensure decidability of OMQs just introduced, we deﬁne a new safety condition that exploits predicate boundedness.

Deﬁnition 4 (Safe OMQs). A rule ρ is called safe for an OMQ Q = (T , Σ, P, q) if every variable of ρ occurs in the body of ρ in an atom p( t) such that p ND or p is bounded by T and Σ. We say Q is safe if every ρ P is safe for Q.

Example 3. Consider an OMQ Q = (T , {Empl}, {ρ}, pair), where T is some company ontology that includes the TBox T

from Example 1, and ρ is the following rule: pair(X, Y ) Empl(X), Empl(Y ), Proj(Z) assgnd To(X, Z), assgnd To(Y, Z) Intuitively, this query computes pairs of employees working on a common project. Observe that Q is safe, but is not DL-safe in the sense of [Motik et al., 2005; Rosati, 2005]. We can now characterize the complexity of query answering in the proposed language. Theorem 4. The entailment problem for safe OMQs is CO2NEXPTIME-complete. The problem is CONP-complete in data complexity, i.e., when the size of the input TBox is assumed to be bounded by a constant.

Proof (sketch). Assume an OMQ Q = (T , Σ, P, q), an ABox A, and a tuple t of individuals. The upper bound can be shown by a non-deterministic procedure to check that t does not belong to the answer to Q over A. Exploiting Theorem 2, the procedure guesses a structure that extends the input ABox with up to a double exponential number of fresh objects belonging to bounded predicates. It then checks that the structure (i) is a model of the program P, (ii) it is consistent with the TBox T , and (iii) it falsiﬁes the atom q( t). If the size of TBoxes is ﬁxed by a constant, this structure has polynomial size. The lower bound is shown by encoding a suitable version of the tiling problem. The lower bound for data complexity comes from instance checking in expressive DLs [Schaerf, 1993].

We remark that Theorem 4 also implies (as a special case) that the data complexity of satisﬁability of ALCHOIQ KBs with closed predicates is NP-complete.

6 Conclusion In this paper we have presented a method to reas OAon about the amount of anonymous objects in the models of KBs written in an expressive DL with closed predicates. This provides a new tool to aid the design of ontologies, also opening the way for sophisticated yet decidable reasoning tasks for data management. One such application is answering rule-based queries over ontologies expressed in the DL ALCHOIQ with closed predicates, which was discussed in Section 5. Another application, which will be discussed in detail in a future paper, is in veriﬁcation of temporal properties of evolving graph databases: we believe that using the key idea of Section 5 we can identify new settings with decidable veriﬁcation tasks, e.g., by developing methods to identify state bounded systems in the sense of [Bagheri Hariri et al., 2013]. Implementing our method to check boundedness is another challenging task for future work; we cannot explicitly build inequality systems and reuse existing integer programming solvers, because this would require (best-case) exponential time. A promising way to efﬁciently recognize predicate boundedness is to consider small systems of inequalities that provide a sound (but incomplete) approximation of the full systems deﬁned here.

Acknowledgements This work was partially supported by the Vienna Business Agency and the Austrian Science Fund (FWF) projects P30360, P30873 and W1255.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)

References [Baader et al., 2003] Franz Baader, Diego Calvanese, Deborah L. Mc Guinness, Daniele Nardi, and Peter F. Patel Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003. [Bagheri Hariri et al., 2013] Babak Bagheri Hariri, Diego Calvanese, Giuseppe De Giacomo, Alin Deutsch, and Marco Montali. Veriﬁcation of relational data-centric dynamic systems with external services. In Proc. of PODS 2013. ACM, 2013. [Bednarczyk et al., 2020] Bartosz Bednarczyk, Franz Baader, and Sebastian Rudolph. Satisﬁability and query answering in description logics with global and local cardinality constraints. In Proc. of ECAI 2020, June 2020. [Berardi et al., 2005] Daniela Berardi, Diego Calvanese, and Giuseppe De Giacomo. Reasoning on UML class diagrams. Artiﬁcial Intelligence, 168:70 118, 2005. [Calvanese, 1996] Diego Calvanese. Finite model reasoning in description logics. In Proc. of KR 1996, pages 292 303. Morgan Kaufmann, 1996. [Fan and Geerts, 2010] Wenfei Fan and Floris Geerts. Relative information completeness. ACM Trans. Database Syst., 35(4):27:1 27:44, 2010. [Franconi et al., 2011] Enrico Franconi, Yazmin Ang elica Ib a nez-Garc ıa, and Inanc Seylan. Query answering with DBoxes is hard. Electr. Notes Theor. Comput. Sci., 278:71 84, 2011. [Glimm et al., 2011] Birte Glimm, Yevgeny Kazakov, and Carsten Lutz. Status QIO: an update. In Proc. DL 2011, volume 745 of CEUR Workshop Proceedings. CEUR-WS.org, 2011. [Gogacz et al., 2020a] Tomasz Gogacz, V ıctor Guti errez Basulto, Yazmin Ang elica Ib a nez-Garc ıa, Filip Murlak, Magdalena Ortiz, and Mantas ˇSimkus. Ontology focusing: Knowledge-enriched databases on demand. In Proc. of ECAI 2020. IOS Press, 2020. [Gogacz et al., 2020b] Tomasz Gogacz, Sanja Lukumbuzya, Magdalena Ortiz, and Mantas Simkus. Datalog rewritability and data complexity of ALCHOIF with closed predicates. In Diego Calvanese, Esra Erdem, and Michael Thielscher, editors, Proc. of KR 2020, 2020. [Levy and Rousset, 1998] Alon Y. Levy and Marie-Christine Rousset. Combining horn rules and description logics in CARIN. Artif. Intell., 104(1-2):165 209, 1998. [Lutz et al., 2005] Carsten Lutz, Ulrike Sattler, and Lidia Tendera. The complexity of ﬁnite model reasoning in description logics. Inf. Comput., 199(1-2):132 171, 2005. [Lutz et al., 2013] Carsten Lutz, Inanc Seylan, and Frank Wolter. Ontology-based data access with closed predicates is inherently intractable(sometimes). In Proc. of IJCAI 2013. IJCAI/AAAI, 2013. [Motik et al., 2005] Boris Motik, Ulrike Sattler, and Rudi Studer. Query answering for OWL-DL with rules. J. Web Sem., 3(1):41 60, 2005.

[Pratt-Hartmann, 2005] Ian Pratt-Hartmann. Complexity of the two-variable fragment with counting quantiﬁers. Journal of Logic, Language and Information, 14(3):369 395, 2005. [Rosati, 2005] Riccardo Rosati. On the decidability and complexity of integrating ontologies and rules. J. Web Sem., 3(1):61 73, 2005. [Rosati, 2007] Riccardo Rosati. The limits of querying ontologies. In Proc. of ICDT 2007. Springer, 2007. [Rudolph and Glimm, 2010] Sebastian Rudolph and Birte Glimm. Nominals, inverses, counting, and conjunctive queries or: Why inﬁnity is your friend! J. Artif. Intell. Res., 39:429 481, 2010. [Schaerf, 1993] Andrea Schaerf. On the complexity of the instance checking problem in concept languages with existential quantiﬁcation. J. Intell. Inf. Syst., 2(3):265 278, 1993. [Schneider and ˇSimkus, 2020] Thomas Schneider and Mantas ˇSimkus. Ontologies and data management: A brief survey. K unstliche Intell., 34(3):329 353, 2020. [Xiao et al., 2018] Guohui Xiao, Diego Calvanese, Roman Kontchakov, Domenico Lembo, Antonella Poggi, Riccardo Rosati, and Michael Zakharyaschev. Ontology-based data access: A survey. In J erˆome Lang, editor, Proc. of IJCAI 2018, pages 5511 5519. ijcai.org, 2018.

Proceedings of the Thirtieth International Joint Conference on Artiﬁcial Intelligence (IJCAI-21)