# objectrelational_queries_over_cfdi__90acb819.pdf

Object-Relational Queries over CFDI8 nc Knowledge Bases: OBDA for the SQL-Literate

Jason St. Jacques, David Toman, and Grant Weddell Cheriton School of Computer Science, University of Waterloo, Canada

jw.stjacques@gmail.com, {david,gweddell}@cs.uwaterloo.ca

We consider how SQL-like query languages over object-relational schemata can be preserved in the setting of ontology based data access (OBDA), thus leveraging wide familiarity with relational technology. This is enabled by the adoption of the logic CFDI8 nc , a member of the CFD family of description logics (DLs). Of particular note is that this logic can fully simulate DL-Lite F

core, a member of the DL-Lite family commonly used in the OBDA setting. Our main results present efﬁcient algorithms that allow computation of certain answers with respect to CFDI8 nc knowledge bases, facilitating direct access to a pre-existing row-based relational encoding of the data without any need for mappings to triple-based representations.

1 Introduction Ontology based data access (OBDA) is concerned with computing query answers over (possibly incomplete) data sources for which background knowledge about the data, commonly captured in an ontology, is available. The background knowledge provides additional query answers that may not be explicit in the data itself. To address scalability issues relating to the volume of data, many current approaches to OBDA focus on conjunctive queries (CQ) and ontologies based on DL dialects for which CQ answering is in AC0/PTIME with respect to data complexity. Moreover, to leverage advances in query processing in relational systems, approaches in which query answering can be reduced to SQL query evaluation over a relational encoding of the data are commonly sought. The two front-runners in this area are (i) the perfect rewriting-based approaches in which the given CQ is rewritten with the help of the ontological knowledge (typically formulated in one of the DL-Lite family of logics) in such a way that the resulting query can be executed over the original data sources to obtain desired answers [Calvanese et al., 2007], and (ii) the combined approaches in which the data is completed using ontological knowledge (formulated in DL-Lite or EL logics) in such a way that the original query (modulo ﬁltering that only depends on role hierarchies in the ontology) can then be executed over the data completion [Kontchakov et al., 2010; 2011; Lutz et al., 2013; 2009]. The closest to our approach is

the approach for Horn-SHIQ with rules [Eiter et al., 2012]; that approach deals with a logic incomparable with CFDI8 nc . Moreover, our approach handles keys and functional dependencies that are essential in database applications.

Recently, Toman and Weddell proposed CFDI8 nc [Toman and Weddell, 2014], a dialect of the CFD family [Khizder et al., 2000; Toman and Weddell, 2009; 2013] that has PTIME complexity for many of the fundamental reasoning tasks, and can fully simulate DL-Lite F

core [Toman and Weddell, 2015]. In this paper, we show that CQ answering over CFDI8 nc knowledge bases can be reduced to evaluating SQL queries over (a completion of) the data stored in a relational system. Our technique is based on a combination of query rewriting and data completion. Indeed, it is worth noting that, for CQs over CFDI8 nc KBs, OBDA cannot be accomplished by either using (perfect) query rewriting alone, due to PTIMEcompleteness of CQ answering, or by exclusive use of the combined approach, due to the need to realize exponentially many prototypical anonymous witnesses to represent types induced by value restrictions. Our main technical contributions, in the order presented, are as follows:

We exhibit an ABox completion procedure for a given

logic knowledge base K = (T , A) with PTIME data complexity; the completion also serves as a basis for KB consistency checking; We deﬁne a query rewriting that produces a union of

conjunctive queries Q0 from a given conjunctive query Q and T , and show that evaluating Q0 as a SQL query over the above ABox completion, viewed as a relational database, computes the certain answers of Q over K. Finally we show how a standard relational database

schema can be naturally captured as a (fragment of a) CFDI8 nc TBox, eliminating the need for additional mappings between data sources and virtual ABoxes that are typically utilized at this point, e.g., by [Calvanese et al., 2015]. We then show how such a rewritten query can be executed over an underlying relational representation without the need for object (id) invention.

Moreover, we show that the potential exponential blowup of the query rewriting cannot be avoided in general, since the combined complexity for CQ answering in CFDInc is PSPACE-complete (unless NP=PSPACE). However, the exponential blowup originating from concept hierarchies is

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

/. -, () *+ STRING

/. -, () *+ PERSON

/. -, () *+ DEPT

/. -, () *+ STUDENT

7?w w w w w w w w

w w w w w w w w

/. -, () *+ PROF

_g GGGGGGGG

/. -, () *+ CHAIR

/. -, () *+ TAKES 89]]]

/. -, () *+ CLASS

room______ ______

num_______ o_______ /. -, () *+ INT

Figure 1: UNIV Schema.

eliminated/minimized thanks to the data completion. Example 1 In the rest of the paper we use an objectrelational UNIV schema depicted in Figure 1 in which single arrows denote named features, double arrows inheritance between classes, and where primary keys are underlined. We can then query instances of this schema using familiar SQL/OQL-style syntax for queries, for example:

select distinct s.name as n from STUDENT s, TAKES t, CLASS c where s = t.student and

c.time = t.class.time and c.inst.dept.name = :p1 and c.num = :p2

The remainder of the paper is organized as follows: preliminaries are given next in which we introduce CFDI8 nc and conjunctive queries. In Section 3, we present the theoretical basis of our approach, and in Section 4 discuss practical matters connected with the use of a relational system. We then outline some preliminary experimental results and follow with a summary, including some brief comments on directions for future work.

2 Preliminaries

All members of the CFD family of DLs are fragments of FOL with underlying signatures based on disjoint sets of unary predicate symbols called primitive concepts, constant symbols called individuals and unary function symbols called features. Formally, the logics are deﬁned with respect to the sets F, PC and IN of (names of) features, primitive concepts, and individuals, respectively. A path function Pf is a word in F with the usual convention that the empty word is denoted by id and concatenation by . . Concept descriptions C and D are deﬁned by the grammars on the left-hand-side of Figure 2. A concept C : Pf1, . . . , Pfk ! Pf is called a path func-

SYNTAX SEMANTICS: ( )I C ::= A AI 4 | 8 Pf .C {x | Pf I(x) 2 CI} | 9f 1 {x | 9y 2 4 : f I(y) = x}

D ::= C CI 4 | C 4 \ CI

| 8 Pf .D {x | Pf I(x) 2 DI} | C : Pf1, . . . , Pfk ! Pf {x | 8y 2 CI :

i (x) = Pf I

i (y)) ) Pf I(x) = Pf I(y)}

Figure 2: CFDInc Concepts.

tional dependency (PFD) and must conform to the following forms:

1. C : Pf1, . . . , Pf . Pfi, . . . , Pfk ! Pf or 2. C : Pf1, . . . , Pf .f, . . . , Pfk ! Pf .g

Semantics is deﬁned in the standard way with respect to an interpretation I = (4, ( )I), where 4 is a domain of objects and ( )I an interpretation function that ﬁxes the interpretation of primitive concepts A to be subsets of 4, features f 2 F to be total functions on 4 ! 4, and individuals a to be elements of 4. The interpretation function is extended to path expressions by interpreting id as the identity function λx.x, concatenation as function composition, and to derived concept descriptions C or D as deﬁned in Figure 2.

An interpretation I satisﬁes an inclusion dependency C v D if CI DI, a concept assertion A(a) if a I 2 AI, and a path function assertion Pf1(a) = Pf2(b) if Pf I

1(a I) = Pf I

2(b I). I satisﬁes a knowledge base K if it satisﬁes each inclusion dependency and assertion in K.

It is easy to see that for every CFDInc KB (T , A), there is a conservative extension (T 0, A0) in which subsumptions in T 0 adhere to the following forms:

A v B, A v 8f.B, 8f.A v B, A v 9f 1, or A v A0 : Pf1, . . . , Pfk ! Pf,

where A and A0 are primitive concepts, B is a primitive concept or a negation of a primitive concept, and f 2 F a feature. Similarly, the ABox A0 contains only assertions of the form A(a) , a.f = b , and a = b .

For detailed description of CFDInc and its variants see [Toman and Weddell, 2014]. The following proposition summarizes computational properties of CFDI8 nc pertinent to the development in this paper:

Proposition 2 (CFDI8 nc Properties) Let K = (T , A) be a CFDI8 nc knowledge base, A1, . . . , Ak primitive concepts, C a concept description not containing the PFD constructor, and D a concept description. Then logical implication, T |= C v D, and K consistency can be decided in PTIME. Moreover, the question of whether or not a conjunction of concepts Ci (i n) interprets as non-empty in every model of K = (T , {A1(a), . . . , Ak(a)}) (i.e., when A1, . . . , Ak forces C1, . . . , Cn) is PTIME-complete in |K| and PSPACEcomplete in n.

Example 3 Sample constraints for the UNIV schema de-

picted in Figure 1 can be captured in CFDI8 nc as follows:

1. PERSON v DEPT, 2. PERSON v 8name.STRING, 3. PERSON v PERSON : name ! id, 4. PERSON v DEPT : name ! id, 5. PROF v PERSON, 6. 8reports.CHAIR v PROF, 7. 9head 1 v CHAIR, 8. CLASS v CLASS : dept, num ! id, etc.

2.1 Conjunctive Queries and Certain Answers Conjunctive queries are, as usual, formed from atomic queries (or atoms) of the form C(x) and x. Pf1 = y. Pf2 , where x and y are variables, using conjunction and existential quantiﬁcation. To simplify notation, we conﬂate conjunctive queries with the set of its constituent atoms and a set of answer variables: Deﬁnition 4 (Conjunctive Query) Let ' be a set of atoms C(xi) and xi1. Pf1 = xi2. Pf2, where C is a concept description (deﬁned in Figure 2), Pfi are path functions, and x a tuple of variables. We call the expression { x | '} a conjunctive query (CQ).

A conjunctive query { x | '} is therefore a notational variant of the formula 9 y. V

2' in which y contains all variables appearing in ' but not in x. The usual deﬁnition of certain answers is given by the following: Deﬁnition 5 (Certain Answer) Let K be a CFDI8 nc knowledge base and Q = { x | '} a CQ. A certain answer to Q over K is a substitution of constant symbols a, [ x 7! a], such that K |= Q[ x 7! a].

As is the case with TBoxes and ABoxes, a CQ can be represented in a normal form, a form in which all atoms in the CQ are of the form A(x) or x.f = y , where A is a primitive concept and f a feature. This can be easily achieved by introducing additional non-answer (existentially quantiﬁed) variables. Example 6 The OQL query from Example 1 above is now formally captured as the following CQ in normal form:

{hn, p1, p2i | STUDENT(s), TAKES(t), CLASS(c),

n = s.name, p2 = c.num, p1 = d0.name, c0 = t.class, p0 = c.inst, d0 = p0.dept, s = t.student, t0 = c.time, t0 = c0.time}.

For the remainder of the paper, we assume CQs are always connected. (Evaluating disconnected CQs is easily achieved by considering each component separately.)

3 Query Answering (OBDA) We begin by introducing OBDA over abstract ABoxes. From the introduction: we need a ﬁrst step to obtain an ABox completion AT depending only on A and T that is polynomial in both |A| and |T |, and a second step to obtain a query rewriting QT depending only on Q and T that is polynomial in |T |, such that a is a certain answer to Q in K = (T , A) if and only if a is an answer to QT when evaluated over AT .

3.1 Inverse-Induced PFDs Allowing inverse features affects how PFDs interact with an ABox. In particular, PFDs in which all path functions have a common preﬁx may apply to (pairs of) anonymous individuals mandated by the existence of anonymous inverse features for existing ABox individuals. Consider the following example: Example 7 For an ABox A = {A(a), A(b), a.g = b.g} and a TBox T = {A v 9f 1, 8f.A v B, B v B : f.g ! id}, it is easy to see that both a and b must have an f-predecessor to which the PFD in T applies and, in combination with functionality of features, yields a = b.

Observe with this example, however, that the TBox logically implies another PFD, in particular A v A : g ! id , and that this PFD applies to objects explicitly present in A. In general, to enforce PFDs on an ABox while avoiding any need to explicitly create anonymous predecessor objects, we add additional logically implied PFDs to a given TBox as follows: Deﬁnition 8 (PFD Enrichment for Inverses) Let T be a CFDI8 nc TBox such that A v B : f. Pf1, . . . , f. Pfk ! f. Pf 2 T (A v B : f. Pf1, . . . , f. Pfk ! id 2 T ) where Pfi 6= id for all 1 i k. Then we require that A v 8f.A0, B v 8f.B0, and A0 v B0 : Pf1, . . . , f. Pfk ! Pf (A0 v B0 : Pf1, . . . , f. Pfk ! id), where A0 and B0 are fresh primitive concepts, are also in T .

It is easy to see that the value restrictions and PFDs added to T logically follow from the original PFD and the necessary existence of f-inverses for A0 and B0 that originate in A and B, respectively. Thus, we assume that TBoxes satisfy this condition for the remainder of the paper.

3.2 Abstract ABox Completion The ﬁrst step of query answering, ABox completion, is deﬁned by the rules in Figure 3. In particular, the rules extend a given ABox with all implied concept memberships and feature agreements. (Note that this step cannot be accomplished by a FO query since it requires path exploration and is therefore hard for NLOGSPACE.) Deﬁnition 9 Let (T , A) be a CFDI8 nc knowledge base. We deﬁne an ABox completion T (A) to be the least ABox A0 such that A A0 and A0 is closed under the rules in Figure 3.

Observe that individuals can only be declared to be members of primitive concepts since A is in normal form. Also, if UNA were to be assumed (or assumed for elements of speciﬁc primitive concepts, such as INT and STRING), the equalities in rules (a) and (c) lead to KB inconsistency (without any need for additional ﬁring of rules ). It is also easy to see that completion terminates since it can add at most |T ||A|2 new objects, one for every pair of existing objects and a feature name.

3.3 Query Rewriting The second step in query answering relies on query reformulation with respect to T . This step is also necessary as CFDI8 nc can force exponentially many anonymous objects with distinct class membership to exist:

if a = b 2 A then add b = a to A if a = b, ' 2 A then add '[b/a] to A

(a) ABox Equality Interactions

if A(a) 2 A and T |= A v B then add B(a) to A if {A(a), a.f = b} A and T |= A v 8f.B then add B(b) to A if {A(a), b.f = a} A and T |= 8f.A v B then add B(b) to A

(b) ABox TBox Interactions

if A(a), B(b) 2 A, a. Pf0

i = ci, b. Pf0

i = ci 2 A for 0 < i k, and A v B : Pf1, . . . , Pfk ! Pf 2 T then 1. if a. Pf = c, b. Pf = d 2 A and c = d 62 A then add c = d to A; or 2. if Pf is of the form Pf00 .f and a. Pf00 = c, b. Pf00 = d and c = d 62 A then add c.f = e, d.f = e to A; where Pf0

i is a preﬁx of Pfi, c and d are A individuals, and e is a new individual.

(c) ABox PFD Interactions

Figure 3: ABox Completion Rules.

Example 10 Consider a CFDInc TBox

T = {Bi v 8 f. .f | {z }

.Bi | i k}.

Asserting (B0 u . . . u Bk)(a) would require exponentially many anonymous objects belonging to distinct concept combinations to be created as prototypical witnesses when completing the ABox along the lines of the combined approach [Lutz et al., 2009].

To avoid the need for expensive ABox completion, our approach treats matches to anonymous individuals by query reformulation along the lines of [Calvanese et al., 2007]: Deﬁnition 11 Let Q = { x | '} be a CQ. We write Fold T (Q) to denote the set of CQs (implicitly denoting the union of their results) that is obtained by an exhaustive application of the following on the initial set {{ x | '}}. For a CQ { y | } 2 Fold(Q), apply rewrite rules:

1. If {A(x), B(x)} and T |= A v B then

Fold(Q) := Fold(Q) {{ y | }}. 2. If {x.f = y, x.f = z} then

Fold(Q) := Fold(Q) {{ y | }} [ {{ y | }[z/y]}. 3. If {x.f = z, y.f = z} then

Fold(Q) := Fold(Q) [ {{ y | }[x/y]}. 4. If {A(x), B(x)} and T |= A v B then

Fold(Q) := Fold(Q) {{ y | }}[{{ y | {A(x)}}}. 5. If {x.f = y, A1(y), . . . , Ak(y)} and y does not appear elsewhere in nor in y then Fold(Q) := Fold(Q) [ {{ y | 0}} for all 0 = {x.f = y, A1(y), . . . , Ak(y)} [ {B1,i1(x), . . . , Bk,ik(x)} for which T |= Bi,ij v 8f.Ai and Bi,ij is maximal w.r.t. v for each i. 6. If {y.f = x, A1(y), . . . , Ak(y)} and y does not appear elsewhere in nor in y then Fold(Q) := Fold(Q) [ {{ y | 0}}, for all 0 = {y.f = x, A1(y), . . . , Ak(y)} [ {B1,i1(x), . . . , Bk,ik(x)} such that T |= 8f.Bi,ij v Ai, where, for each i, Bi,ij is maximal w.r.t. v, and for which T |= Bi,ij v 9f 1 some 1 i k.

The key idea underlying this deﬁnition is that, to ﬁnd query answers, it is now sufﬁcient to match queries in Fold T (Q) explicitly against the (extended) ABox; matches outside the ABox are captured by query reformulation: removed parts of the query are implied by T . Lemma 12 Let Q be a CQ with at least one answer variable. Then a is a certain answer to Q over K = (T , A) if and only if a is an answer for some { x | } 2 Fold T (Q) over AT .

Proof (sketch): Observing that the extended ABox AT is essentially a part of the minimal model of K (since K is Horn) and that every element of Fold T (Q) implies Q, it is easy to see that whenever (1-6) are satisﬁed, there is a match of Q in the minimal model and thus a is an answer. Conversely, if a match of Q in a minimal model exists yielding a as an answer, then part of the match will be realized in the ABox (since at least the answer variables must be bound to ABox individuals) and the reminder of the match must be forestlike. Hence, one of the queries in Fold T (Q) matches in the ABox since the remaining conjuncts must be implied by T . 2

Example 13 Fold T (Q), where Q is the normal form query in Example 6, will contain the following CQ

{hn, p1, p2i | STUDENT(s), TAKES(t), CLASS(c),

n = s.name, p2 = c.num, p1 = d0.name, c = t.class, p0 = c.inst, d0 = p0.dept, s = t.student},

which is obtained from Q by applying rules (3) and (5).

For CQ without answer variables, we need an additional step that checks if the query (when equivalent to a concept) matches in the tree part of every interpretation of K. We therefore extend the query rewriting in Deﬁnition 11 as follows:

7. If x = hi in Q = { x | '} and ' is equivalent to a

conjunction of concepts C1, . . . , Cn. Then Fold(Q) := Fold(Q) [ {hi | A1(x) . . . Ak(x)} for every combination of primitive concepts A1, . . . , Ak that force C1, . . . , Cn in T .

This construction accounts for matches in the anonymous part of the minimal model of K, and yields the following Lemma:

Lemma 14 Let Q be a CQ without answer variables. Then K |= Q if and only if at least one {hi | } 2 Fold T (Q) evaluates to true over AT .

3.4 Anonymous Object Invention To handle equalities generated by PFDs, ABox completion, in particular completion rule (c.2) in Figure 3, requires inventing new ABox individuals (denoted e in the ﬁgure). These individuals then allow agreement in queries to be resolved within the extended ABox. However, in many situations, individual invention (which, in turn leads to primary key invention as we will see in Section 4) may not be possible in many practical applications (indeed, Section 4 will rely on this).

Here, we explore an alternative based on query reformulation. Unfortunately, it is easy to see for the PFDs of the second form (those that lead to individual invention), that this is not possible. In particular, consider the following case: Example 15 Assume T = {A v 8f.A, A v A : f.h, g ! h} and consider the CQ {hx, yi | A(x), A(y), x.h = z0, y.h = z0}. Then a rewriting of the query must contain {hx, yi | A(x), A(y), x.g = z0, y.g = z0, x.f = x1, y.f = y1, A(x1), A(y1), x1.h = z1, y1.h = z1} , and {hx, yi | A(x), A(y), x.g = z0, y.g = z0, x.f = x1, y.f = y1, A(x1), A(y1), x1.g = z1, y1.g = z1, x1.f = x2, y1.f = y2, A(x2), A(y2), x2.h = z2, y2.h = z2} , and so on, which yields a rewriting that is necessarily inﬁnite.1

However, if the second form of PFDs is restricted to standard relational FDs of the form A v A : f1, . . . , fn ! f , then rewriting becomes possible:

8. If {x.f = z, y.f = z, A1(z), . . . , Ak(z)} and

A v B : f1, . . . , fn ! f 2 T then Fold(Q) := Fold(Q) [ {{ y | 0}} for all 0 of the form {x.f = z, y.f = z, A1(z), . . . , Ak(z)} [ {Ax(x), Ay(y)} [ {Bx

1,i1(x), . . . , Bx

k,ik(x)} [ {By

1,i1(y), . . . , By

k,ik(y)} [ {x.f1 = z1, y.f1 = z1, . . . , x, fn = zn, y.fn = zn}}, where (a) (Ax = A and Ay = B) or (Ax = B and Ay = A), and (b) (Bx

i,ji = Bi,ji and By

i,ji = >) or (Bx

i,ji = > and By

i,ji = Bi,ji), for all i and for all combinations of Bi,ij for which T |= 8f.Bi,ij v Ai and Bi,ij is maximal w.r.t. v. A similar adjustment must be performed on the key PFDs to account for equalities implied by FDs and in turn correctly identify ABox individuals (or signal KB inconsistency). Alltogether, we have: Lemma 16 Let Q = { x | '} be a CQ over K = (T , A) with standard relational FDs and keys. Then a is a certain answer to Q over K if and only if a is in the result of at least one { x | } 2 Fold T (Q) using reformulations (1-8) over an A completion using only rules (a,b) and (c.1) in Figure 3 (modiﬁed as described above).

3.5 Concrete Classes and Data Types Although one can treat all primitive concepts uniformly in our approach, in practice one may want to distinguish those

1This, without resorting to regular path functions as in [Toman and Weddell, 2005] which, however, would preclude the consequent use of an SQL engine.

that correspond to primitive data types, such as INT and STRING in our running example, since computing a completion for these concepts results in collecting all integers or strings occurring in the KB, which is undesirable. To avoid the need for completing these concepts, there are two options, with the ﬁrst restricting user queries to disallow asking directly about these concepts.2 Otherwise, to avoid the need for data completion for these concepts, consider that concepts corresponding to data types should not have outgoing features, and can therefore be handled using query rewriting:

9. If A(x) 2 , A a data type, and T |= B v 8f.A then

Fold(Q) := Fold(Q)[{{ y | {A(x)}[{B(y), y.f = x}}}, where y is a fresh variable.

This additional rule makes it safe to omit completion rules from Figure 3(b) when concepts corresponding to data types are involved. Example 17 Consider the query Q = {hxi | STRING(x)}. We get reformulations {hxi | PERSON(y), y.name = x} and {hxi | DEPT(y), y.name = x} in Fold(Q), which in turn retrieve all STRINGS recorded in the KB.

The constructions used in Lemmas 12 and 14 (and optionally Lemma 16 and Section 3.5(9)) also yield the upper bounds. In all cases, computing certain answers reduces to evaluating an union of conjunctive queries over a polynomiallysized completion of an ABox. Lower bounds follow from reductions from graph reachability, Horn-SAT, and the DFA intersection problem [Kozen, 1977], respectively. Note that PSPACE-hardness holds even for very simple CQs of the form 9x.(A1(x) . . . Ak(x)). Corollary 18 Data complexity for CQ query answering over CFDInc KB is in PTIME. CQ query answering over CFDInc KB is NLOGSPACE-hard for data complexity (in |A|), PTIME-hard in |T + A|, and PSPACE-hard for combined complexity.

4 Relational Back-end

Hereon, we assume any non-key PFD occurring in a knowledge base K is a key or conforms to a standard relational FD. (Recall that this enables our framework to avoid anonymous object invention.) Our objective in this section is to show that this leads to a very practical front-end to existing relational engines in which K consists of a TBox T that embeds a relational schema S, and a virtual ABox A that is given by the tuples occurring in relational tables. W.l.o.g., we assume the correspondence is as follows:

1. Primitive concepts (save concepts capturing concrete

values such as INT and STRING in Figure 1), hereon called abstract, correspond to tables, and identities of objects belonging to such concepts are captured by primary key values in the tables; 2. Features between abstract concepts, hereon also called

abstract, are captured by foreign keys between the corresponding tables. Note that a single such feature may correspond to a multi-arity foreign key;

2This is the approach SQL and OQL have taken.

3. Features between abstract concepts and concepts cap-

turing concrete values, hereon called concrete, correspond to attributes in the table corresponding to the abstract concept with an appropriate concrete data type, for which UNA is assumed (as is common in SQL).

Note also that non primary key attributes may contain SQL s null values to account for data incompleteness.

In our running example, S consists of create table commands that include declarations for primary and foreign keys, such as the following in the case of the CLASS primitive concept for our UNIV TBox:

create table CLASS-TAB (dname, num, iname, room, time,

primary key (dname, num), foreign key (dname) to DEPT-TAB, foreign key (iname) to PROF-TAB ).

We write TAB(CLASS), ATTR(CLASS) and PK(CLASS) to respectively denote table CLASS-TAB, its set of attributes (dname, etc.) and the sequence of its primary key attributes (dname, num). For abstract features, such as dept, we use DOMATTR(dept), DOMTAB(dept), and RANTAB(dept) to denote the list of source attributes (dname), the originating table (CLASS-TAB), and the target table (DEPT-TAB), respectively, in order to describe the associated foreign key.

We assume that A v B 2 T whenever PK(A) 6= PK(B) and that ATTR(B) ATTR(A) whenever T |= A v B.3

With these assumptions, ABox completion for a consistent knowledge base will only be required for ABox-TBox interactions (see Figure 3), and is accomplished by repeatedly executing the following updates until no changes occur:4

(when T |= A v B)

TAB(A) := TAB(A) [ (TAB(A) ./PK(A)=PK(B) TAB(B)) TAB(B) := TAB(B) [ ATTR(B)(TAB(A))

(when T |= A v 8f.B and f is abstract)

TAB(B) := TAB(B)

[ ATTR(B)(TAB(A) ./DOMATTR(f)=PK(B) RANTAB(f))

(when T |= 8f.B v A and f is abstract)

TAB(A) := TAB(A)

[ ATTR(A)(DOMTAB(f) ./DOMATTR(f)=PK(B) TAB(B))

As a consequence, this approach effectively equates object identity with primary key values already occurring in a relational data source, and therefore entirely avoids any need to create Skolem constants that would otherwise be needed for an explicit ABox realization. The ﬁnal step consists of converting Fold(Q) to SQL. This is rather clerical task and the main steps consist of replacing

1. concept descriptions by the associated tables; 2. object references by primary keys; and 3. abstract features by corresponding foreign keys.

3This reﬂects a structural inheritance approach to sub-typing in relational schema design that is not strictly necessary since values of inherited attributes for tuples can be computed with additional joins.

4We appeal to the relational algebra to express right-hand-sides. Note that attributes in projection operations are assumed to be initialized to null values when missing from argument subexpressions.

Finally, to obtain well-formed SQL query, we need to separate the conjuncts to the SELECT, FROM, and WHERE clauses as dictated by the (rather rigid) SQL syntax.

5 Experiments

To conﬁrm practicality, we applied our framework to a modiﬁed version of the LUBM benchmark in which we introduce a relational schema. We then deﬁned a knowledge base TBox that embeds this schema and supplied various virtual ABoxes according to a number of conﬁgurations of generated data for the LUBM benchmark [Guo et al., 2005]. We performed the tests on Mac Book Pro with 3ghz Core i7 CPU, 16gb RAM (OSX 10.11.1), and Postgres 9.4. Figure 4 reports the time needed to load the raw data, to index it, and to compute the ABox completion. The results show that the completion is comparable to raw data loading (a factor of about 2).

LUBM scale 10 50 100 200 500 1000 raw data load 14.9 117.4 234.4 518.1 1341 3034 indexing 10.2 69.7 167.9 377.6 1063 2308 completion 5.9 134.2 286.6 769.5 2452 6853

Figure 4: Experimental Results (in seconds).

Due to space constraints we do not report the individual running times for the LUBM queries. However, all the queries complete within few seconds (typically under a second), even on the largest instances (except for queries 2 and 9 that took 73s and 65s, respectively, on the LUBM 1000 instance). These results are not surprising for reasonably indexed relational instances and can be considered representative of the approach.

The paper presents a novel approach to OBDA over knowledge bases formulated in CFDI8 nc , a DL that has been designed to capture relational and object-relational schemata in a natural way. Indeed, there are additional beneﬁts to using CFDI8 nc as the underlying DL: Example 19 Consider the two queries generated in Example 17. Since the CFDI8 nc TBox allows us to reason about keys and concept disjointness, we can deduce that neither of the two queries needs the distinct duplicate elimination in their SQL select clauses (since name is a key for both PERSON and DEPT) and, moreover, the union of these two queries can use the union all operator (since the two subqueries are disjoint), thus completely avoiding duplicate elimination in the ﬁnal query.

For more complete development and beneﬁts of these optimizations see [Khizder et al., 2000; Liu et al., 2002; Toman and Weddell, 2011]. There are two additional avenues for extending this proposal: ﬁrst, the data completion could be performed incrementally and second, the restrictions on concept compatibility based on the primary keys of the underlying tables could be relaxed using techniques based on referring expressions recently proposed in [Borgida et al., 2016].

References [Borgida et al., 2016] Alexander Borgida, David Toman, and

Grant Weddell. On referring expressions in query answering over ﬁrst order knowledge bases. In Principles of Knowledge Representation and Reasoning, 2016. (in press). [Calvanese et al., 2007] Diego Calvanese, Giuseppe De Gi-

acomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efﬁcient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385 429, 2007. [Calvanese et al., 2015] Diego Calvanese, Benjamin Cogrel,

Sarah Komla-Ebri, Davide Lanti, Mart ın Rezk, and Guohui Xiao. How to stay ontop of your data: Databases, ontologies and more. In The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events Portoroˇz, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, pages 20 25, 2015. [Eiter et al., 2012] Thomas Eiter, Magdalena Ortiz, Mantas

Simkus, Trung-Kien Tran, and Guohui Xiao. Query rewriting for horn-shiq plus rules. In Proceedings of the Twenty-Sixth AAAI Conference on Artiﬁcial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada., 2012. [Guo et al., 2005] Yuanbo Guo, Zhengxiang Pan, and Jeff

Heﬂin. LUBM: A benchmark for OWL knowledge base systems. J. Web Sem., 3(2-3):158 182, 2005. [Khizder et al., 2000] Vitaliy L. Khizder, David Toman, and

Grant Weddell. Reasoning about Duplicate Elimination with Description Logic. In Rules and Objects in Databases (DOOD, part of CL 00), pages 1017 1032, 2000. [Kontchakov et al., 2010] Roman Kontchakov, Carsten Lutz,

David Toman, Frank Wolter, and Michael Zakharyaschev. The combined approach to query answering in DL-Lite. In Principles of Knowledge Representation and Reasoning, pages 247 257, 2010. [Kontchakov et al., 2011] Roman Kontchakov, Carsten Lutz,

David Toman, Frank Wolter, and Michael Zakharyaschev. The combined approach to ontology-based data access. In Proc. Int. Joint Conf. on Artiﬁcial Intelligence (IJCAI), pages 2656 2661, 2011. [Kozen, 1977] Dexter Kozen. Lower bounds for natural proof systems. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, pages 254 266. IEEE Computer Society, 1977. [Liu et al., 2002] Huizhu Liu, David Toman, and Grant Wed-

dell. Fine Grained Information Integration with Description Logic. In Description Logics 2002, pages 1 12. CEUR-WS vol.53, 2002. [Lutz et al., 2009] Carsten Lutz, David Toman, and Frank

Wolter. Conjunctive query answering in the description logic EL using a relational database system. In Proc. Int. Joint Conf. on Artiﬁcial Intelligence (IJCAI), pages 2070 2075, 2009. [Lutz et al., 2013] Carsten Lutz, Inanc Seylan, David Toman, and Frank Wolter. The combined approach to

OBDA: Taming role hierarchies using ﬁlters. In International Semantic Web Conference (1), pages 314 330, 2013. [Toman and Weddell, 2005] David Toman and Grant Wed-

dell. On Reasoning about Structural Equality in XML: A Description Logic Approach. Theoretical Computer Science, 336(1):181 203, 2005. [Toman and Weddell, 2009] David Toman and Grant E.

Weddell. Applications and extensions of PTIME description logics with functional constraints. In Proc. Int. Joint Conf. on Artiﬁcial Intelligence (IJCAI), pages 948 954, 2009. [Toman and Weddell, 2011] David Toman and Grant E.

Weddell. Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. [Toman and Weddell, 2013] David Toman and Grant E.

Weddell. Conjunctive Query Answering in CFDnc: A PTIME Description Logic with Functional Constraints and Disjointness. In Australasian Conference on Artiﬁcial Intelligence, pages 350 361, 2013. [Toman and Weddell, 2014] David Toman and Grant E.

Weddell. On adding inverse features to the description logic CFD8nc. In PRICAI 2014: Trends in Artiﬁcial Intelligence - 13th Paciﬁc Rim International Conference on Artiﬁcial Intelligence, Gold Coast, QLD, Australia, pages 587 599, 2014. [Toman and Weddell, 2015] David Toman and Grant E.

Weddell. On the utility of CFDI. In Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015., 2015.