# datalog_ontology_consolidation__204b9fa8.pdf

Journal of Artiﬁcial Intelligence Research 56 (2016) 613-656 Submitted 03/16; published 08/16

Datalog Ontology Consolidation

Cristhian Ariel D. Deagustini cadd@cs.uns.edu.ar Mar ıa Vanina Mart ınez mvm@cs.uns.edu.ar Marcelo A. Falappa mfalappa@cs.uns.edu.ar Guillermo R. Simari grs@cs.uns.edu.ar AI R&D Lab., Institute for Computer Science and Engineering (ICIC) Consejo Nacional de Investigaciones Cient ıﬁcas y T ecnicas (CONICET) Universidad Nacional del Sur (UNS), Alem 1253, (B8000CPB) Bah ıa Blanca, Argentina.

Knowledge bases in the form of ontologies are receiving increasing attention as they allow to clearly represent both the available knowledge, which includes the knowledge in itself and the constraints imposed to it by the domain or the users. In particular, Datalog

ontologies are attractive because of their property of decidability and the possibility of dealing with the massive amounts of data in real world environments; however, as it is the case with many other ontological languages, their application in collaborative environments often lead to inconsistency related issues. In this paper we introduce the notion of incoherence regarding Datalog ontologies, in terms of satisﬁability of sets of constraints, and show how under speciﬁc conditions incoherence leads to inconsistent Datalog ontologies. The main contribution of this work is a novel approach to restore both consistency and coherence in Datalog ontologies. The proposed approach is based on kernel contraction and restoration is performed by the application of incision functions that select formulas to delete. Nevertheless, instead of working over minimal incoherent/inconsistent sets encountered in the ontologies, our operators produce incisions over non-minimal structures called clusters. We present a construction for consolidation operators, along with the properties expected to be satisﬁed by them. Finally, we establish the relation between the construction and the properties by means of a representation theorem. Although this proposal is presented for Datalog ontologies consolidation, these operators can be applied to other types of ontological languages, such as Description Logics, making them apt to be used in collaborative environments like the Semantic Web.

1. Introduction

The integration of diﬀerent systems, and the interaction resulting from this integration, led to a host of pervasive practical problems and challenging research opportunities; some of the most interesting ones occurs in the Web s collaborative environments, e.g., e-commerce, and with the arrival of the Semantic Web, such as ontology engineering. However, the collaboration among systems brings along the problem of conﬂicting pieces of information that are likely to appear as knowledge repositories evolve. Admittedly, the management of conﬂicting information is an important and challenging issue that has to be faced (G omez, Ches nevar, & Simari, 2010; Haase, van Harmelen, Huang, Stuckenschmidt, & Sure, 2005; Huang, van Harmelen, & ten Teije, 2005; Bell, Qi, & Liu, 2007), specially when integrating

c 2016 AI Access Foundation. All rights reserved.

Deagustini, Martinez, Falappa & Simari

knowledge coming from diﬀerent sources (Black, Hunter, & Pan, 2009; Baral, Kraus, & Minker, 1991; Amgoud & Kaci, 2005), or when such knowledge is expected to be exploited by a reasoning process. In this context, knowledge bases in the form of ontologies are becoming a useful device that provide a convenient way to represent both the intensional and extensional knowledge of the application domain. Moreover, the expressive power of ontologies allows to perform important tasks on data integration (Lenzerini, 2002), and also plays a role of great importance in the aforementioned Semantic Web (Berners-Lee, Hendler, & Lassila, 2001). In this work we adopt Datalog ontologies, a family of rule-based ontology languages (Cal ı, Gottlob, & Lukasiewicz, 2012). Datalog enables a modular rule-based style of knowledge representation, and it can represent syntactical fragments of ﬁrst-order logic (FOL) so that answering a Boolean Conjunctive Query (BCQs) Q under a set Σ of Datalog rules for an input database D is equivalent to the classical entailment check D Σ |= Q. Tractable fragments of Datalog guarantee termination of query answering procedures in polynomial time in the data complexity and ﬁrst-order rewritability. Moreover, ontologies described using existential rules generalize several well-known Description Logics (DLs); in particular, linear and guarded Datalog (two basic tractable fragments of this family) are strictly more expressive than the whole DL-Lite family (Calvanese, De Giacomo, Lembo, Lenzerini, & Rosati, 2005), and guarded Datalog is strictly more expressive than EL (Brandt, 2004; Baader, Brandt, & Lutz, 2005). Therefore, the results presented in this paper extend directly to these DLs as well. These properties of Datalog together with its expressive power, and the fact that it keeps a syntax closer to that used in relational databases for greater readability, make it very useful in modeling real applications, such as ontology querying, Web data extraction, data exchange, ontology-based data access, and data integration. We focus on two particular problems that arise from the integration and/or evolution of information systems: inconsistency and incoherence. Inconsistency refers to the lack of models of a theory. On the other hand, in ontological settings, incoherence refers to a set of ontological rules that cannot be applied without leading to violations of the constraints imposed on the knowledge, making them unsatisﬁable. Incoherence and inconsistency, which can arise from automated procedures such as data integration and ontology matching, may be serious issues in real world applications. Since standard ontology languages adhere to the classical FOL semantics, classical inference semantics fails in the presence of this kind of problems. Thus, it is important to focus on the formalization of methods to address both inconsistency and incoherence in ontologies that are able to cope with the users expectations in terms of eﬀectiveness of the procedures for query answering and the meaning of these answers when potential conﬂict exists. This paper addresses the problem of handling inconsistencies and incoherences that may appear in Datalog ontologies. In this regard, we propose a general framework that aims at the consolidation of Datalog ontologies (i.e., solving every conﬂict of coherence and consistency in them). That is, a consolidation operator takes as input a (possibly) incoherent and inconsistent Datalog ontology and returns another Datalog ontology where all conﬂicts are amended, thus ensuring that it is both coherent and consistent. As it is usual in this setting, an assumption of minimal change is made, that is to say, it is expected that the consolidation process changes the original ontology as little as possible. The approach presented is based on the use of incision functions (Hansson, 1993, 1994, 1997,

Datalog Ontology Consolidation

2001) from the Belief Revision literature. Instead of operators that only account for the information included in the conﬂicts in a knowledge base, in this work we aim to capture consolidation operators that can consider all the information included in a KB when solving conﬂicts. The main contributions of this work are the following:

We introduce a notion of incoherence tailored for Datalog . To achieve this we adapt to this setting similar notions from Description Logics. Also, we look into the relationship of incoherence and inconsistency and how it impacts the consolidation process.

We provide a set of properties expected to be satisﬁed by consolidation operators for Datalog ontologies by means of postulates. These postulates provide a formal characterization of a consolidation operator without focusing on how the consolidation process is actually performed, thus providing a formal comparison framework for consolidation operators. The postulates consider some intuitions in classic Belief Revision; nevertheless, they are adapted to the Datalog ontological setting (and could be also adapted to suit other ontological languages), meaning that they have two versions (one addressing incoherence and another one for inconsistency).

We present a complete construction of consolidation operators that take a (possibly) incoherent and inconsistent Datalog ontology and gives as a result a consistent and coherent one. A noteworthy characteristic of such operators is that it involves a two steps approach, ﬁrst considering incoherence conﬂicts, and then solving inconsistency conﬂicts as a latter step, helping to improve the ﬁnal result in terms of the information that needs to be deleted to solve conﬂicts.

We study the relationship between the formal properties of the operator and the construction we propose, demonstrating that they are equivalent; thus, this shows that any consolidation operator satisfying the properties corresponds to the construction introduced in this work.

The paper is organized as follows: in Section 2 we introduce the necessary notions from Datalog and Belief Revision. Next, though inconsistency and incoherence are related, they are also two very distinct problems in the setting of ontological knowledge bases in particular, where there is a clear separation of the intensional and the extensional knowledge. Therefore, in Section 3, we discuss the two notions in Datalog ontologies, how they relate to each other, and the reasons why they need to be treated in combination but separately. Then, in Section 4 we present the properties that an ontology consolidation operator must satisfy, and in Section 5 we introduce the process used to restore consistency and coherence of Datalog ontologies, and relate the presented process to the given properties by means of a representation theorem. Next, we present a complete example depicting the entire consolidation process. Finally, in Sections 7 and 8 we discuss related work from diﬀerent areas in Artiﬁcial Intelligence and Database Theory, and provide conclusions and future lines of research, respectively.

2. Preliminaries and Background

To facilitate the reading, we begin by introducing the notions from Datalog and Belief Revision that will be needed in the rest of the paper.

Deagustini, Martinez, Falappa & Simari

2.1 Preliminaries on Datalog

First, we recall the basic notions of Datalog ontologies that will be used in the paper (see Cal ı et al., 2012 for more details). Datalog extends Datalog by allowing existential quantiﬁcation in the rule heads, together with other extensions that we enumerate below, but limiting the interaction of these elements in order to achieve tractability. We will assume that the domain of discourse of a Datalog ontology consists of a countable set of data constants , a countable set of nulls N (as place holders for unknown values), and a countable set of variables V. We also assume that diﬀerent constants represent diﬀerent values (unique names assumption). To distinguish constants from variables, we adopt the standard notation from logic programming, where variable names begin with uppercase letters, while constants and predicate symbols begin with lowercase letters. We assume a relational schema R that is a ﬁnite set of predicate symbols (or simply predicates). A term t is a constant, a null, or a variable. An atom a has the form p(t1, . . . , tn), where p is an n-ary predicate and t1, . . . , tn are terms; an atom is ground iﬀall terms in it are constants. Let L be a ﬁrst-order language such that R L; then LR denotes the sublanguage generated by R. A database (instance) of R is a ﬁnite set of atoms with predicates in R and terms in N. A homomorphism on constants, nulls and variables is a mapping h : N V N V such that (i) c implies h(c) = c, (ii) c N implies h(c) N, and (iii) h is naturally extended to atoms, sets of atoms, and conjunctions of atoms. Given a relational schema R, a tuple-generating dependency (TGD) σ is a ﬁrst-order formula of the form X YΦ(X, Y) ZΨ(X, Z) where Φ(X, Y) and Ψ(X, Z) are conjunctions of atoms over R called the body (denoted body(σ)) and the head (denoted head(σ)), respectively. Consider a database D for a relational schema R, and a TGD σ on R of the form Φ(X, Y) Z Ψ(X, Z). Then, σ is applicable to D if there exists a homomorphism h that maps the atoms of Φ(X, Y) to atoms in D. Let σ be applicable to D, and h be a homomorphism that extends h as follows: for each Xi X, h (Xi) = h(Xi); for each Zj Z, h (Zj) = zj, where zj is a fresh null, i.e., zj N, zj does not occur in D, and zj lexicographically follows all other nulls already introduced. The application of σ on D adds to D the atom h (Ψ(X, Z)) if it is not already in D. After the application we say that σ is satisﬁed by D. The Chase for a database D and a set of TGDs ΣT , denoted chase(D, ΣT ), is the exhaustive application of the TGDs (Cal ı et al., 2012) in a breadth-ﬁrst (level-saturating) fashion, which leads to a (possibly inﬁnite) chase for D and Σ. It is important to remark that BCQs Q over D and ΣT can be evaluated on the chase for D and ΣT , i.e., D ΣT |= Q is equivalent to chase(D, ΣT ) |= Q (Cal ı et al., 2012). Negative constraints (NCs) are ﬁrst-order formulas of the form XΦ(X) , where Φ(X) is a conjunction of atoms (without nulls) and the head is the truth constant false, denoted . An NC τ is satisﬁed by a database D under a set of TGDs ΣT iﬀthere does not exist a homomorphism h that maps the atoms of Φ(X) to D, where D is such that every TGD in ΣT is satisﬁed, i.e., the atoms in the body cannot all be true together. Equality-generating dependencies (EGDs) are ﬁrst-order formulas of the form XΦ(X) Xi = Xj, where Φ(X) is a conjunction of atoms, and Xi and Xj are variables from X. An EGD σ is satisﬁed in a database D for R iﬀ, whenever there exists a homomorphism h such that h(Φ(X)) D, it holds that h(Xi) = h(Xj). In this work we

Datalog Ontology Consolidation

will focus on a particular class of EGDs, called separable (Cal ı et al., 2012); intuitively, separability of EGDs w.r.t. a set of TGDs states that, if an EGD is violated, then atoms contained in D are the reason of the violation (and not the application of TGDs); i.e., if an EGD in ΣE is violated when we apply the TGDs in ΣT for a database D, then the EGD is also violated in D. Separability is a standard assumption in Datalog ontology, as one of the most important features of this family of languages is the focus on decidable (Cal ı, Lembo, & Rosati, 2003) (actually tractable) fragments of Datalog . NCs and EGDs play an important role in the matter of conﬂicts in Datalog ontologies. In fact, the approach that we present in this work ensure that neither NCs nor EGDs are violated in the resulting ontology. Also, as an important remark, note that the restriction of using only separable EGDs makes that certain cases of conﬂicts are not considered in our proposal. The treatment of such cases, though interesting from a technical point of view, are outside the scope of this work since we focus on tractable fragments of Datalog . As is the usual case in the literature, in general the universal quantiﬁers in TGDs, negative constraints and EGDs are omitted, and the sets of dependencies and constraints are assumed to be ﬁnite. Now that we have presented the diﬀerent ways of expressing knowledge in Datalog , we are ready to formally deﬁne Datalog ontologies.

Deﬁnition 1 (Datalog Ontology) A Datalog ontology KB = (D, Σ), where Σ = ΣT ΣE ΣNC , consists of a database instance D that is a ﬁnite set of ground atoms (without nulls), a set of TGDs ΣT , a set of separable EGDs ΣE and a set of NCs ΣNC .

Otherwise explicitly said, through the paper when it is clear from context we will refer to the component Σ in KB as the set of constraints in the ontology, without distinguishing between dependencies and constraints. Given a database D for R and a set of constraints Σ = ΣT ΣE ΣNC , the set of models of D and Σ, denoted mods(D, Σ), is the set of all databases B such that D B and every formula in Σ is satisﬁed. The following example shows a simple Datalog ontology; the ontology describes knowledge about the therapy/psychology domain.

Example 1 (Datalog Ontology)

D: {a1 : in therapy(charlie), a2 : dating(kate, charlie), a3 : therapist(kate), a4 : belongs to(g1, charlie), a5 : in therapy(patrick), a6 : belongs to(g2, ed), a7 : belongs to(g1, kate)}

ΣNC : {τ1 : treating(T, P) dating(T, P) }

ΣE : {ν1 : treating(T, P) treating(T , P) T = T }

ΣT : {σ1 : in therapy(P) patient(P), σ2 : therapist(T) belongs to(G, T) leads(T, G), σ3 : leads(T, G) belongs to(G, P) treating(T, P), σ4 : treating(T, P) therapist(T)}

The set ΣT of TGDs expresses dependencies such as: TGD σ1 states that if a person P is in therapy then P is a patient, σ2 establishes that a therapist T that belongs to a group

Deagustini, Martinez, Falappa & Simari

G is the leader of that group. The only NC τ1 states that a patient cannot be dating his therapist, and EGD ν1 states that every patient is in treatment with at most one therapist.

Following the classical notion of consistency, we say that a consistent Datalog ontology has a non-empty set of models.

Deﬁnition 2 (Consistency) A Datalog ontology KB = (D, Σ) is consistent iﬀ mods(D, Σ) = . We say that KB is inconsistent otherwise.

Example 2 Consider the Datalog ontology from the example above; this ontology is clearly inconsistent. Database instance D is clearly not a model in itself since at least TGD σ2 is applicable to D, but there is no superset of D such that it satisﬁes all TGDs and constraints in Σ at the same time. For instance TGDs σ2 is applicable in D creating the atom leads(kate, g1) making now σ3 applicable and resulting in the new atom treating(kate, charlie), which together with dating(kate, charlie) (that was already in D) violate the NC τ1, as a therapist is dating one of her patients.

For the rest of the paper, otherwise explicitly stated KB = (D, Σ) will denote a Datalog

ontology with Σ = ΣT ΣE ΣNC , where D is a database instance, ΣT is the set of all TGDs, ΣE the set of all separable EGDs and ΣNC being the set of all NCs in Σ.

2.2 Background in Belief Revision

Establishing the origins of scientiﬁc ideas is a diﬃcult task that sometimes can be controversial; nevertheless, it could be argued that the origins of belief change theory go back to the work of Isaac Levi (1977), who discussed the problems concerning this ﬁeld of research, and to William Harper s proposal of a rational way to interrelate belief change operators (Harper, 1975). However, the main advances on belief change theory came during the 1980 s when Carlos Alchourr on and David Makinson studied changes in legal codes (Alchourr on & Makinson, 1981), and Peter G ardenfors s introduced rational postulates for change operators (G ardenfors, 1982). After that, the three authors produced a foundational paper containing what became known as the AGM model (Alchourr on, G ardenfors, & Makinson, 1985). The core contribution of the AGM model is the presentation of a new and more general formal framework for the study of belief change; today, this work is considered as the cornerstone from which belief change theory evolved. Since the introduction of the AGM model, diﬀerent frameworks for belief dynamics and their respective epistemic models have been proposed. The epistemic model corresponds to the formalism in which beliefs are represented, providing the framework in which diﬀerent kinds of operators can be deﬁned. The AGM model is conceived as an idealistic theory of rational change in which epistemic states are represented by belief sets (sets of sentences closed under logical consequence, commonly denoted in boldface), and the epistemic input is represented by a sentence. In the AGM model, three basic change operators are deﬁned: expansion, contraction, and revision. In the rest of this section, whenever we use the term consistent or inconsistent, we refer to the traditional notion of inconsistency of a knowledge base that has no models. Let K be a belief set, the change operations are as follows:

Datalog Ontology Consolidation

Expansions: the result of expanding K by a sentence α is a possibly larger set which infers α; intuitively, belief α, hopefully consistent with the given epistemic state, is directly added to K.

Contractions: the result of contracting K by α is a possibly smaller set which does not infer α, unless α is a tautology;

Revisions: the result of revising K by α is a set that neither extends nor is part of the set K. In general, if α is not a fallacy then α is consistently inferred from the revision of K by α.

The great importance of AGM comes from providing axiomatic characterizations of contraction and revision in terms of rationality postulates. Such rationality postulates regard the operators as black boxes, characterizing what they do, but not explaining how they do it. In other words, their behavior is constrained with regard to inputs in basic cases, without describing the internal mechanisms used for achieving that behavior, so it is crucial to say that contraction and revision operators can also be obtained via more constructive approaches. AGM contractions can be realized by partial meet contractions, which are based on a selection among (maximal) subsets of K that do not imply α. Via the Levi s identity (G ardenfors, 1988), associated revision operations called partial meet revisions are obtained. Another possible approach for contraction is based on a selection among the (minimal) subsets of K that contribute to make K imply α, as in safe contraction (Alchourr on & Makinson, 1985). A more general variant of the same approach, known as kernel contraction, was introduced later (Hansson, 1994). It has been shown that both safe contractions and kernel contractions are equivalent to partial meet contractions, and hence to the AGM approach to contraction (Hansson, 1994, 2001). A particularly interesting characteristic of kernel contraction is that it may be concerned with changes at the symbolic level since it is suitable of being applied to belief bases (set of sentences not closed under a consequence relation) as well as belief sets. Thus, it matters how the beliefs are actually represented. This does not happen in the AGM approach, as it studies the changes at the knowledge level since it uses belief sets. The distinction between knowledge and symbolic level was proposed by Allen Newell (1982). According to Newell, the knowledge level lies above the symbolic level, and the latter is used to somehow represent the former. Because of this, belief bases with diﬀerent symbolic content may represent the same knowledge. The importance of this is that, although they are statically equivalent (they represent the same beliefs), equivalent belief bases could be dynamically diﬀerent if we choose to use an approach working directly with them, as with kernel contraction. Besides the three basic operations mentioned, through the years additional operations where developed in Belief Revision to achieve diﬀerent behaviors. For instance, when a belief base is inconsistent, the removal of enough sentences from it can lead to a consistent state. This additional operation is called consolidation, and the consolidation of a belief base K is denoted K! (see Hansson, 1991, 2001). Here we will focus on this last operation, which is inherently diﬀerent from contraction and revision, since its ultimate goal is to obtain a consistent belief base from a possibly inconsistent one (without being given any epistemic input), rather than revising the knowledge base by a speciﬁc formula or by removing a particular formula from it. The consolidation of K can be obtained in a natural way in belief

Deagustini, Martinez, Falappa & Simari

bases by contracting them by falsum, i.e., K! = K , where represents a contraction operator; this process restores consistency attending every conﬂict in K (Hansson, 1991).

3. Incoherence and Inconsistency Problems Related to Datalog

Ontology Consolidation

The problem of obtaining consistent knowledge from an inconsistent knowledge base is natural in many computer science ﬁelds. As knowledge evolves, contradictions are likely to appear, and these inconsistencies have to be handled in a way that they do not aﬀect the quality of the information obtained from the database.

In the setting of Consistent Query Answering (CQA), repairing of relational databases, and inconsistency-tolerant query answering in ontological languages (Arenas, Bertossi, & Chomicki, 1999; Lembo, Lenzerini, Rosati, Ruzzi, & Savo, 2010; Lukasiewicz, Martinez, & Simari, 2012), often the assumption is made that the set Σ expresses the semantics of the data in the component D, and as such there is no internal conﬂict on the set of constraints and these constraints are not subject to changes over time. This means ﬁrst, that the set Σ is always satisﬁable, in the sense that their application do not inevitably yield a consistency problem. Second, as a result of this assumption, it must be the case that the conﬂicts come from the data contained in the database instance, and that is the part of the ontology that must be modiﬁed in order to restore consistency. Although this is a reasonable assumption to make, specially in the case of a single ontology, in this work we will focus on a more general setting, and consider that both data and constraints can change through time and become conﬂicting. In this more general scenario, as knowledge evolves (and so the ontology that represents it) not only data related issues can appear, but also constraint related ones.

We argue that it is also important to identify and separate the sources of conﬂicts in Datalog ontologies. In the previous section we deﬁned inconsistency of a Datalog

ontology based on the lack of models. From an operational point of view, conﬂicts appear in a Datalog ontology whenever a NC or an EGD is violated, that is, whenever the body of one such constraint can be mapped to either atoms in D or atoms that can be obtained from D by the application of the TGDs in ΣT Σ. Beside these conﬂicts, we will also focus on the relationship between the set of TGDs and the set of NCs and EGDs, as it could happen that (a subset of) the TGDs in ΣT cannot be applied without leading always to the violation of the NCs or EGDs. Note that in this case clearly the data in the database instance is not the problem, as any database in which these TGDs are applicable will inevitable produce an inconsistent ontology. This issue is related to the unsatisﬁability problem of a concept in an ontology, and it is known in the Description Logics community as incoherence (Flouris, Huang, Pan, Plexousakis, & Wache, 2006; Schlobach & Cornet, 2003; Borgida, 1995; Beneventano & Bergamaschi, 1997; Kalyanpur, Parsia, Sirin, & Hendler, 2005; Schlobach, Huang, Cornet, & van Harmelen, 2007; Qi & Hunter, 2007). Incoherence can be particularly important when combining multiple ontologies since the constraints imposed by each one of them over the data could (possibly) represent conﬂicting models of the application at hand. Clearly, the notions of incoherence and inconsistency are highly related; in fact, Flouris et al. s (2006) work establish a relation between incoherence and inconsistency, considering incoherence as a particular form of inconsistency.

Datalog Ontology Consolidation

Later in this section we present a complete deﬁnition of incoherence in Datalog , based on the concept of unsatisﬁability of sets of TGDs. Nevertheless, for now it is suﬃcient to know that our proposed notion of incoherence states that given a set of unsatisﬁable constraints Σ it is not possible to ﬁnd a set of atoms D such that KB = (D, Σ) is a consistent ontology and at the same time all TGDs in ΣT Σ are applicable in D. This means that a Datalog ontology can be consistent even if the set of constraints is incoherent, as long as the database instance does not make those dependencies applicable. On the other hand, a Datalog ontology can be inconsistent even when the set of constraints is satisﬁable, e.g., KB = ({tall(peter), small(peter)}, {tall(X) small(X) }), where the (empty) set of dependencies is trivially satisﬁable and thus the ontology coherent; the ontology is, nevertheless, inconsistent. Before formalizing the notion of incoherence that we use in our Datalog setting we need to identify the set of atoms relevant to a given set of TGDs. Intuitively, we say that a set of atoms A is relevant to a set T of TGDs if the atoms in the set A are such that the application of T over A generates the atoms that are needed to apply all TGDs in T, i.e., A triggers the application of every TGD in T.

Deﬁnition 3 (Relevant Set of Atoms for a Set of TGDs) Let R be a relational schema, T be a set of TGDs, and A a (possibly existentially closed) non-empty set of atoms, both over R. We say that A is relevant to T iﬀfor all σ T of the form X YΦ(X, Y) ZΨ(X, Z) it holds that chase(A, T) |= X YΦ(X, Y).

When it is clear from the context, if a singleton set A = {a} is relevant to T ΣT we just say that atom a is relevant to T.

Example 3 (Relevant Set of Atoms) Consider the following constraints:

ΣT = {σ1 : supervises(X , Y ) supervisor(X ), σ2 : supervisor(X ) makes decisions(X ) leads department(X , D), σ3 : employee(X ) works in(X , D)}

Consider set A1 = {supervises(walter, jesse), makes decisions(walter), employee(jesse)}. This set is a relevant set of atoms to the set of constraints ΣT = {σ1, σ2, σ3}, since σ1 and σ3 are directly applicable to A1 and σ2 becomes applicable when we apply σ1 (i.e., the chase entails the atom supervisor(walter), which together with makes decisions(walter) triggers σ2). However, the set A2 = {supervises(walter, jesse), makes decisions(gus)} is not relevant to ΣT . Note that even though σ1 is applicable to A2, the TGDs σ2 and σ3 are never applied in chase(A2, ΣT ), since the atoms in their bodies are never generated in chase(A2, ΣT ). For instance, consider the TGD σ2 ΣT . In the chase of ΣT over D we create the atom supervisor(walter), but nevertheless we still cannot trigger σ2 since we do not have and cannot generate the atom makes decisions(walter), and the atom makes decisions(gus) that is already in A2 does not match the constant value.

We now present the notion of coherence for Datalog , which adapts eﬀorts made for DLs such as Schlobach and Cornet s (2003) and Flouris et al. s (2006). Our conception

Deagustini, Martinez, Falappa & Simari

of (in)coherence is based on the notion of satisﬁability of a set of TGDs w.r.t. a set of constraints. Intuitively, a set of dependencies is satisﬁable when there is a relevant set of atoms that triggers the application of all dependencies in the set and does not produce the violation of any constraint in ΣNC ΣE, i.e., the TGDs can be satisﬁed along with the NCs and EGDs in the KB.

Deﬁnition 4 (Satisﬁability of a Set of TGDs w.r.t. a Set of Constraints) Let R be a relational schema, T ΣT be a set of TGDs, and N ΣNC ΣE, both over R. The set T is satisﬁable w.r.t. N iﬀthere is a set A of (possibly existentially closed) atoms over R such that A is relevant to T and mods(A, T N) = . We say that T is unsatisﬁable w.r.t. N iﬀT is not satisﬁable w.r.t. N. Furthermore, ΣT is satisﬁable w.r.t. ΣNC ΣE iﬀthere is no T ΣT such that T is unsatisﬁable w.r.t. some N with N ΣNC ΣE.

In the rest of the paper sometimes we write that a set of TGDs is (un)satisﬁable omitting the set of constraints, we do this in the context of a particular ontology where we have a ﬁxed set of constraints ΣNC ΣE since any set of TGDs that is satisﬁable w.r.t. ΣNC ΣE is satisﬁable w.r.t. any subset of it and, on the other hand, any set of TGDs that is unsatisﬁable w.r.t. a subset of ΣNC ΣE is also unsatisﬁable w.r.t. the whole set of constraints.

Example 4 (Unsatisﬁable Sets of Dependencies) Consider the following constraints.

Σ1 NC = {τ : risky job(P) unstable(P) }

Σ1 T = {σ1 : dangerous work(W) works in(W, P) risky job(P),

σ2 : in therapy(P) unstable(P)}

The set Σ1 T is a satisﬁable set of TGDs, and even though the simultaneous application of σ1 and σ2 may violate some formula in Σ1 NC Σ1 E, that does not hold for every relevant set of atoms. Consider as an example the relevant set D1 = {dangerous work(police), works in(police, marty), in therapy(rust)}; D1 is a relevant set for Σ1 T , however, as we have that mods(D1, Σ1 T Σ1 NC Σ1 E) = then Σ1 T is satisﬁable.

On the other hand, as an example of unsatisﬁability consider the following constraints:

Σ2 NC = {τ1 : sore throat(X) can sing(X) }

Σ2 T = {σ1 : rock singer(X) sing loud(X), σ2 : sing loud(X) sore throat(X), σ3 : rock singer(X) can sing(X)}

The set Σ2 T is an unsatisﬁable set of dependencies, as the application of TGDs {σ1, σ2, σ3} on any relevant set of atoms will cause the violation of τ1. For instance, consider the relevant atom rock singer(axl): we have that the application of Σ2 T over {rock singer(axl)} causes the violation of τ1 when considered together with Σ2 T , and therefore we have that mods({rock singer(axl)}, Σ2 T Σ2 NC Σ2 E) = . Note that any set of relevant atoms will cause the violation of τ1.

We are now ready to formally deﬁne coherence for a Datalog ontology. Intuitively, an ontology is coherent if there is no subset of their TGDs that is unsatisﬁable w.r.t. the constraints in the ontology.

Datalog Ontology Consolidation

Deﬁnition 5 (Coherence) An ontology KB is coherent if and only if ΣT is satisﬁable w.r.t. ΣNC ΣE. Also, KB is said to be incoherent iﬀit is not coherent.

Example 5 (Coherence) Consider the sets of dependencies and constraints deﬁned in Example 4 and an arbitrary database instance D. We can see that the Datalog ontology KB1 = (D, Σ1 T Σ1 NC Σ1 E) is coherent, while KB2 = (D, Σ2 T Σ2 NC Σ2 E) is incoherent.

Considering incoherence of a set of TGDs is important in the consolidation process of Datalog ontologies, since if not treated appropriately within the consolidation process, an incoherent set of TGDs may lead to the trivial solution of removing every single relevant atom in D (in the worst case, the entire database instance). This may be adequate for some particular domains, but does not seem to be a desirable outcome in the general case. Looking into Deﬁnitions 4 and 5 we can see that there is a close relationship between the concepts of incoherence and inconsistency. In fact, it can be inferred from those deﬁnitions that an incoherent KB will induce an inconsistent KB when the database instance contains any set of atoms that is relevant to the unsatisﬁable sets of TGDs. This result is captured in the following proposition (proofs of results are presented in Appendix A).

Proposition 1 If KB is incoherent and there exists A D such that A is relevant to some unsatisﬁable set U ΣT then KB = (D, Σ) is inconsistent.

As an instance of this relationship, consider the following representative example.

Example 6 (Relating Incoherence and Inconsistency) Consider the following ontology.

D : {a1 : can sing(simone), a2 : rock singer(axl), a3 : sing loud(ronnie), a4 : has fans(ronnie), a5 : rock singer(ronnie), a6 : rock singer(roy), a7 : manage(band1, richard)}

ΣNC : {τ1 : sore throat(X) can sing(X) , τ2 : has private life(X) famous(X) }

ΣE : {ν1 : manage(X, Y ) manage(X, Z) Y = Z}

ΣT : {σ1 : rock singer(X) sing loud(X), σ2 : sing loud(X) sore throat(X), σ3 : has fans(X) famous(X), σ4 : rock singer(X) can sing(X), σ5 : has fans(X) has private life(X)}

As hinted previously in Example 4, there we have the set A D = {rock singer(axl)} and the unsatisﬁable set of TGDs U ΣT = {σ1 : rock singer(X) sing loud(X), σ2 : sing loud(X) sore throat(X), σ4 : rock singer(X) can sing(X)}. Since A is relevant to U the conditions in Proposition 1 are fulﬁlled, and indeed the ontology KB = (D, Σ) is inconsistent since τ1 ΣT is violated.

A set of constraints such as the one presented in Example 6 may appear when we consider scenarios where both components of an ontology evolve (perhaps being collaboratively

Deagustini, Martinez, Falappa & Simari

maintained by a pool of users). As long as new constraints are added, incoherence problems may arise. In this particular scenario it would seem more sensible to identify and modify, somehow, the set of incoherent constraints to make them satisﬁable, instead of deleting all the information from the ontology; and only then proceed to solve remaining inconsistencies, if any. That is, it could be beneﬁcial to deﬁne consolidation processes in which the changes performed to achieve coherence are given higher priority than the changes needed for consistency when possible. To address this we present a twofold proposal for consolidation of Datalog ontologies: that is, to obtain the new KB we begin by addressing issues in the component ΣT w.r.t. the components ΣE and ΣNC in the original ontology to obtain a new coherent set of constraints, giving up some TGDs in ΣT if necessary. Then, we address the problems arising from the component D, obtaining a new one D that is consistent with Σ T ΣE ΣNC . In the next section we characterize, by means of a set of postulates, a consolidation operator that takes into account these considerations.

4. Characterizing the Consolidation: Postulates for Datalog Ontology Consolidation Operators

Belief Revision is one of the main areas that deals with deﬁned principled methods to solve incoherences and inconsistencies; as explained in Section 2, it is common to characterize change operators by means of postulates, which are properties that the operators must satisfy. In this section we introduce a set of postulates with the objective of characterizing consolidation operators for Datalog ontologies. We start by brieﬂy deﬁning the scenario underlying the consolidation process and introducing the characteristics of the sets of formulas that we focus on (Friedman & Halpern, 2001).

4.1 Deﬁning the Consolidation Environment

Depending on the type of knowledge base, we ﬁnd two main streams of work in Belief Revision. On one hand, some works are based on sets of formulas that are closed under some consequence relation, called belief sets (Alchourr on et al., 1985). This is known in the Belief Revision literature as the coherence model. On the other hand, the option is to choose belief bases (Katsuno & Mendelzon, 1991, 1992; Fuhrmann, 1991; Hansson, 1994, 1997, 2001; Falappa, Kern-Isberner, & Simari, 2002), i.e., non-closed sets of formulas; this is referred to as the foundational model.

Opposite to the traditional closed world assumption found in other established areas like relational databases, one important characteristic of Datalog is that of an open world assumption, unknown data is represented by means of null values. As a consequence, the generation of new information in the language by the application of rules is susceptible of being inﬁnite (Cal ı, Gottlob, & Kifer, 2008, 2013), which seems to make the foundational model a more appealing choice when working in this setting. Therefore, for the consolidation of Datalog ontologies we have chosen to follow the foundational model. In this model, the epistemic state is a (possibly incoherent and inconsistent) Datalog ontology.

Datalog Ontology Consolidation

4.2 Expected Properties for the Consolidation Operator: Postulates

We present now the set of properties that a consolidation operator for Datalog ontologies must satisfy. We use the following notation through the rest of the paper. Let KB = (D, Σ) be the original Datalog ontology being consolidated, where Σ = ΣT ΣE ΣNC . Also, KB! denotes the Datalog ontology KB! = (D!, Σ!) resulting from the consolidation of KB, with D! and Σ! being the consolidated components D and Σ in KB!, respectively. When necessary we will diﬀerentiate KBs by using subscripts. In such cases, given KBi we denote its consolidation by KBi! = (Di!, Σi!). We are ready now to introduce the Ontology Consolidation Postulates (OCP) expected to be satisﬁed by the consolidation operators. Let Θ be the set of all Datalog ontologies. Then, a Datalog ontology consolidation operator ! : Θ Θ is a function that must satisfy the following properties:

OCP 1. (Inclusion) Σ! Σ and D! D.

The consolidation process only includes in the resulting ontology formulas belonging to the original ontology.

OCP 2. (Consistency) KB! is consistent.

The ontology obtained by the consolidation process must be consistent, i.e., there are no negative constraints or equality-generating dependencies that are violated when we apply all TGDs in Σ! to the atoms in D!, and therefore mods({D!, Σ!}) = .

OCP 3. (Coherence) KB! is coherent.

The ontology obtained by the consolidation process must be coherent, i.e., ΣT in Σ! must be satisﬁable with respect to ΣNC ΣE in Σ!.

OCP 4. (Minimality): If KB KB is coherent and consistent, then it holds that KB! KB .

There is no coherent and consistent ontology obtained from the original ontology that strictly contains the consolidated ontology.

Some of the postulates presented are inspired by the properties proposed by Hansson (1994) and by Konieczny and Pino-P erez (2002). Nevertheless, they are adapted to suit the particularities of the ontological setting of Datalog ; in particular, they take into account the distinction between incoherence and inconsistency. For instance, Inclusion is a direct adaptation of Hansson s homonymous postulate (Hansson, 1994), which states that the contraction of a knowledge base should be a (not necessarily proper) subset of the original one. Consistency and Coherence, on the other hand, result from adapting to our setting Konieczny and Pino-P erez s postulate IC1 (2002), which intuitively ask that the resulting merging must be consistent; in here we ask that the resulting consolidation is not only consistent but also coherent. Minimality is a postulate added to ensure the quality of the consolidation (w.r.t. a loss of information aspect), and is not adapted from any particular work, but rather as a general notion in Belief Revision, where as noted by

Deagustini, Martinez, Falappa & Simari

Hansson (2001) it has been given many names such as conservatism (Harman, 2008), conservativity (G ardenfors, 1988), minimum mutilation (Quine, 1986) and minimal change (Rott, 1992). The proposed postulates capture the notion that changes made with respect to the original ontology are those that are necessary, and that the resulting ontology is, as expected, both coherent and consistent. That is, given the original ontology the consolidation process only removes constraints (TGDs) and atoms if they are somehow involved in making the original ontology incoherent/inconsistent, and makes it in such a way that no unnecessary removal is made.

5. A Datalog Ontology Consolidation Operator

In previous sections we have presented examples of incoherences and inconsistencies that can arise in Datalog ontologies. Additionally, we stated the properties that the consolidation operator should satisfy in order to make adequate changes in the original ontology regaining coherence and consistency. Now, we propose a construction for the consolidation operator that addresses such incoherence and inconsistency problems in Datalog ontologies.

5.1 A Possible Construction for the Consolidation Operator

In the literature of Belief Revision several constructions for revision and contraction operators have been studied. Hansson (1994) presents how a contraction operation on belief bases can be modeled by means of the application of incision functions. These functions contract a belief base by a formula α by taking minimal sets that entail α (called α-kernels) and producing incisions on these sets so they no longer entail α. The resulting belief base is then conformed by the union of all formulas that are not removed by any function. This approach is known as kernel contraction; the task of restoring consistency is also known in the belief revision literature as contraction by falsum (Hansson, 1991). In this work, we deﬁne the consolidation process as the application of incision functions. Nevertheless, instead of directly considering minimal inconsistent subsets of formulas from the diﬀerent components of the ontology (which are equivalent to -kernels), in this work we perform incisions over structures called clusters (Martinez, Pugliese, Simari, Subrahmanian, & Prade, 2007; Lukasiewicz et al., 2012) that groups together related kernels. More speciﬁcally, to solve incoherence we begin by establishing the dependency kernels; in an analogous way, we deﬁne the data kernels to solve inconsistencies in D w.r.t. Σ, then, based on them, we obtain the dependency clusters and data clusters by exploiting an overlapping relation.

5.1.1 Identifying the Relation among Conflicts

The ﬁrst step towards conﬂict resolution in our framework is to calculate the minimal coherence and consistency conﬂicts, and identify possible relations among such conﬂicts, if any. Dependency kernels are sets of TGDs which are unsatisﬁable w.r.t. the set of NCs and EGDs in a Datalog ontology and are minimal under set inclusion. These sets are known as Minimal unsatisﬁability-preserving sub-TBoxes (MUPS) and Minimal incoherence-preserving sub-TBoxes (MIPS) (Schlobach & Cornet, 2003) in the DL community.

Datalog Ontology Consolidation

Deﬁnition 6 (Dependency Kernels) The set of dependency kernels of KB, denoted with Q

KB, is the set of all X ΣT such that X is an unsatisﬁable set of dependencies w.r.t. ΣNC ΣE and every proper subset X of X (X X) is satisﬁable w.r.t. ΣNC ΣE.

Example 7 (Dependency Kernels) Consider the following sets of constraints in a Datalog ontology KB:

ΣNC : {τ1 : counselor(X ) regent(X ) , τ2 : cannot rule(X ) heir(X ) }

ΣE : {ν1 : advise(X , K) advise(X , K ) K = K }

ΣT : {σ1 : advise(X , K) counselor(X ), σ2 : propose law(X , K) regent(X ), σ3 : prince(P) heir(P), σ4 : son(P, K) king(K) prince(P), σ5 : counselor(C) regent(C), σ6 : bastard son(X , Y ) son(X , Y ), σ7 : bastard son(X , K) king(K) cannot rule(X )}

For this KB there exist two dependency kernels, i.e.,

KB = {{σ3, σ4, σ6, σ7}, {σ5}}.

It is easy to show that the dependency kernels for a Datalog ontology are independent from the particular component D in the ontology, and thus they can be obtained by looking only into the component Σ. That is, even if we replace the component D in an ontology with an empty set of atoms, the dependency kernels for the ontology with the empty database are the same than those in the original one.

Lemma 1 Let KB1 = (D1, Σ1) and KB2 = ( , Σ2) be two Datalog ontologies such that Σ1 = Σ2. Then, Q

In addition to the removal of the TGDs that make a set Σ unsatisﬁable (thus making an ontology incoherent), to solve inconsistencies we may need to remove atoms from components D in order to address data inconsistency as well. Analogously to the deﬁnition of the dependency kernels, we deﬁne now data kernels as the minimal subset of atoms in D that makes a KB = (D, Σ) inconsistent.

Deﬁnition 7 (Data Kernels) The set of data kernels of KB, denoted with

KB, is the set of all X D such that mods(X, Σ) = and for every X X it holds that mods(X , Σ) = .

Deagustini, Martinez, Falappa & Simari

Example 8 (Data Kernels) Consider the following coherent but inconsistent KB, proposed by Lukasiewicz et al. (2012).

D : {directs(john, d1), directs(tom, d1), directs(tom, d2), supervises(tom, john), works in(john, d1), works in(tom, d1)}

ΣNC : {supervises(X , Y ) manager(Y ) , supervises(X , Y ) works in(X , D) directs(Y , D) }

ΣE : {directs(X , D) directs(X , D ) D = D }

ΣT : {σ1 : works in(X , D) employee(X ), σ2 : directs(X , D) employee(X ), σ3 : directs(X , D) works in(X , D) manager(X )}

For this KB, the set of data kernels is

{supervises(tom, john), directs(john, d1), works in(john, d1)}, {supervises(tom, john), directs(john, d1), works in(tom, d1)}, {directs(tom, d1), directs(tom, d2)}

Once we know the minimal conﬂicts in the ontology we identify relations among them, if such relation exists. To do this, we group related kernels together in a new structure called cluster, which makes possible to achieve an optimal solution in related kernels. Clusters are obtained through an overlapping relation deﬁned as follows.

Deﬁnition 8 (Overlapping, Equivalence) Let L be a ﬁrst order language, R L be a relational schema, and LR the sublanguage generated by R. Given A LR and B LR, we say they overlap, denoted A θB, iﬀA T B = . Furthermore, given a multi-set of ﬁrst order formulas M 2LR we denote as θ M the equivalence relation obtained over M through the reﬂexive and transitive closure of θ.

By exploiting the overlapping among dependency kernels and data kernels we can deﬁne dependency clusters and data clusters, respectively.

Deﬁnition 9 (Dependency Clusters) Let Q

KB be the set of Dependency Kernels for KB. Let θ be the overlapping relation, and K = Q

KB the quotient set for the equiv-

alence relation obtained over Q

KB. A Constraint Cluster is a set C = S κ [κ] κ, where [κ] K. We denote by Q Q

KB the set of all Constraint Clusters for KB.

Deﬁnition 10 (Data Clusters) Let

KB be the set of Data Kernels for KB. Let θ be the overlapping relation, and K =

KB the quotient set for the equivalence relation

obtained over

KB. A Data Cluster is a set C = S κ [κ] κ, where [κ] K. We denote by

KB the set of all Data Clusters for KB.

Intuitively, a dependency cluster groups dependency kernels that have some TGD in common, in a transitive fashion; data clusters groups data kernels in an analogous way.

Datalog Ontology Consolidation

Example 9 (Dependency Clusters and Data Clusters) Assume we have KB such that Q

KB = {{σ1, σ2}, {σ1, σ3}, {σ4, σ5}} and

KB = {{a1, a2}, {a1, a3}, {a2, a4, a5}}. Then, we have two dependency clusters based on those kernels, grouping the ﬁrst two kernels (due to σ1) and the remaining kernel in another cluster; i.e.,

KB = {{σ1, σ2, σ3}, {σ4, σ5}}.

On the other hand, for the case of data clusters we have that

KB = {{a1, a2, a3, a4, a5}}.

The following proposition states that, since clusters are based on equivalence classes, every kernel is included in one and only one cluster.

Proposition 2 Y Q

KB is such that Y X for some X Q Q

KB if and only if Y X

for all X Q Q

KB such that X = X . Analogously, Y

KB is such that Y X for some X

KB if and only if Y X for all X

KB such that X = X .

As a corollary of Proposition 2 we have that a formula in a kernel is included in only one cluster.

Corollary 1 (Corollary from Proposition 2) Let α Y for some Y Q

KB and β Y for some Y

KB. Then, α X for some X Q Q

KB if and only if α / X for all X Q Q

KB such that X = X . Analogously, β Y is such that β X for some X

KB if and only if β / X for all X

KB such that X = X .

The following lemma that we shall use further in the paper shows an example of how, in the ontological setting of Datalog , Leibniz s indiscernibility of identicals (von Leibniz, 1976) holds w.r.t. the clusters in Datalog ontologies, as when two KBs are equivalent they have the same set of clusters.

Lemma 2 Let KB1 and KB2 be two Datalog ontologies such that KB1 = KB2. Then,

KB2 and Q Q

5.1.2 Solving Conflicts: Incision Functions

Once we have identiﬁed the clusters, we have to establish how the incoherences and inconsistencies are solved. An incision function selects which formulas should be deleted from the data and dependency clusters.

Deﬁnition 11 (General Incision Function) A General Incision Function for KB is a function δ : (2LR, 2LR) 2LR such that all following conditions holds:

1. δ(KB) S(Q Q

2. For all X Q Q

KB such that Y X it holds that (Y δ(KB)) = .

3. For all X

KB such that Y X it holds that (Y δ(KB)) = .

Deagustini, Martinez, Falappa & Simari

4. For all X Q Q

KB it holds that T = (X δ(KB)) is such that there not exists R X where R satisﬁes conditions 1 and 2, and R T.

5. For all X

KB it holds that T = (X δ(KB)) is such that there not exists R X where R satisﬁes conditions 1 and 3, and R T.

Deﬁnition 11 states that a general incision function selects from each dependency (data, respectively) cluster TGDs (atoms, respectively) for deletion in order to restore coherence (consistency). Any incision function that complies with Deﬁnition 11 can be used as a base for a consolidation operator. However, note that such an operator may not diﬀerentiate between restoring coherence and consistency. This is not a problem in the classic literature of Belief Revision since there is no notion of incoherence, and there is no distinction between rules and facts in languages like propositional logic; thus, only consistency conﬂicts can appear, avoiding the need to treat incoherences. Nevertheless, in the ontological setting of Datalog we have the opportunity of exploiting the fact that we have two diﬀerent although related kinds of conﬂicts to address them separately with the goal of ﬁnding a solution that better suits the needs of applications that rely on this kind of knowledge bases. The point this paper is trying to make is that, for knowledge bases in the form of Datalog ontologies it is important to diﬀerentiate, and adequately handle, incoherence from inconsistency as the quality of the consolidated ontology heavily depends on that assuming we strive for minimal loss in the process. This, more complex setting, needs a careful deﬁnition of what constitutes a kernel. To see what could happen if this is not done properly, consider the following example.

Example 10 (Inﬂuence of Incoherence on Consolidation) Consider KB from Example 6. There we have Σ = Σ2 T Σ2 E Σ2 NC such that Σ2 T is unsatisﬁable. As explained in Example 6, for the singleton set {rock singer(axl)} we have that the NC τ1 : sore throat(X) can sing(X) is violated, making {rock singer(axl)} inconsistent with Σ. Then, {rock singer(axl)} is a data kernel (and a data cluster, since it cannot overlap with any other kernel) and the same is veriﬁable for every singleton set in D relevant to some dependency cluster. Thus, we have that

{rock singer(axl)}, {rock singer(ronnie)}, {rock singer(roy)}, {has fans(ronnie)}

Consider the cluster {rock singer(axl)}; for this cluster we have that

δ({rock singer(axl)}) = rock singer(axl).

This same situation holds for every cluster in

KB, and thus δ(

The problem in this example is that the data kernels (and hence so the data clusters) are computed w.r.t. the original Σ component, which, in this case, contain unsatisﬁable sets of constraints. As can be seen in Example 10, this becomes of utter importance when we have atoms relevant to unsatisﬁable sets: in that case, any general incision function (and any inconsistency management technique based on deletion that does not treat incoherence conﬂicts) will necessarily delete such atoms.

Datalog Ontology Consolidation

Proposition 3 Let δ be a general incision function. If α D is relevant to some X Q

KB then α δ(KB).

Clearly, as a corollary of Proposition 3 we have that if every atom in D is relevant to some unsatisﬁable set then we have to remove every atom in D to restore consistency.

Corollary 2 (Corollary from Proposition 3) Let δ be a general incision function. If for all α D it holds that α is relevant to some X Q

KB then D δ(KB).

As seen, incoherence can have great inﬂuence in consolidation if not treated properly (that is, previously to the consistency restoration). It would seem better to compute the data clusters based only on the retained satisﬁable part of the Σ components. In Lemma 1 we show that the dependency kernels can be obtained independently of the D component from the original ontology, because unsatisﬁable sets are such that they violate a negative constraint or equality-generating dependency for any relevant set of atoms. Therefore, we can ﬁrst obtain QQ

KB, and use the incision function on the dependency clusters to select which TGDs will be deleted. Then, we calculate

KB based on the result of the application of the incision function on QQ

KB, in this way only paying attention to the constraints that will prevail in the consolidation process. Next, we deﬁne both constraint incision functions and data incision functions which are used to select candidates for deletion (from the original ontology) to restore coherence and consistency, respectively. First, we deﬁne an incision function on dependency clusters that helps to solve incoherence on the constraints.

Deﬁnition 12 (Constraints Incision Function) A Constraint Incision Function for KB is a function ρ : (2LR, 2LR) 2LR such that all following conditions hold:

1. ρ(KB) S(Q Q

2. For all X Q Q

KB such that Y X it holds that (Y ρ(KB)) = .

3. For all X Q Q

KB it holds that T = (X ρ(KB)) is such that there not exists R X where R satisﬁes conditions 1 and 2, and R T.

Intuitively, a constraint incision function takes dependency clusters and removes TGDs from each of them in such a way that the resulting KB is coherent. Analogously to the constraints incision functions, we deﬁne data incision functions that solve inconsistencies in

Deﬁnition 13 (Data Incision Function) A Data Incision Function for D is a function ϱ : (2LR, 2LR) 2LR such that all following conditions hold:

KB such that Y X it holds that (Y ϱ(KB)) = .

KB it holds that T = (X ϱ(KB)) is such that there not exists R X where R satisﬁes conditions 13 and 13, and R T.

Deagustini, Martinez, Falappa & Simari

Finally, it is necessary to make a signiﬁcant remark regarding our usage of incision functions. For that, let us ﬁrst consider the following excerpt quoted from Hansson s (2001, cf. p. 122) regarding the possible parameters passed to selection functions (which in our case are incision functions) and how this choice aﬀects the possible outcomes.

[. . . ] the proof of uniformity makes essential use of the fact that selection functions have been deﬁned on remainder sets of the form A α, not on pairs of the form A α, α . If we had instead deﬁned selection functions as follows:

γ(A, α) is a non-empty subset of γ(A α, α) if A α is non-empty.

γ(A, α) = {A} if A α is empty.

then T γ(A, α) would have been an operation very similar to partial meet contraction in other respects, but it would have been possible for γ(A, α) = γ(A, β) to hold if A α = A β, which the standard deﬁnition does not allow [. . . ]

Thus, extending Hansson s observation to incision functions and their use on consolidation, we have that if we take only the sets of conﬂicts as arguments for incisions then the formulas to be removed from two diﬀerent ontologies having the same set of conﬂicts by an operator using the incision function are identical. The reason for this is that the operator could not tell the diﬀerence between the ontologies since its parameter is only the conﬂicts, which are exactly the same. However, here we have chosen not to restrict our family of operators to such behaviors; instead, we model operators whose behavior could select for removal the same formula from equal conﬂicts, but that are not restricted to that choice. To achieve this, we have chosen to take ontologies as parameters; so, if it ﬁts the application domain in which the operators are exploited, formulas that are not in any conﬂict could aﬀect the outcome of the consolidation. In the approach presented here, an incision function not only should consider the TGD s eﬀect over a cluster, but its global eﬀect over the whole knowledge base. The reason for this requirement is that unlike the classic models of belief revision, the language used has greater expressivity and the fact that a TGD generates multiple inferences. For instance, in our framework from a TGD of the form X YΦ(X, Y) ZΨ(X, Z) it is possible to infer multiple instances of Ψ(X, Z). To see the reason behind our choice more clearly consider the following example.

Example 11 Consider the following ontologies.

D : {p(a), q(a)}

ΣNC : {p(X ) r(X ) }

ΣT : {σ1 : q(X ) r(X )}

D : {p(a), q(a)}

ΣNC : {p(X ) r(X ) }

ΣT : {σ1 : q(X ) r(X ), σ2 : p(X ) s(X ), σ3 : p(X ) t(X )}

For these KB, the set of data clusters are equal, as we have

Datalog Ontology Consolidation

KB2 = {p(a), q(a)}

If we use the standard approach and take clusters as arguments for incisions, then we must remove the same formula in both ontologies, because as it was explained the incision is a function and therefore it cannot choose diﬀerently for the same argument. Nevertheless, suppose that in our particular scenario we want to remove the atoms based on the information they help to infer. If that is the case, then from KB1 we should remove p(a), but from KB2 we should take out q(a), since in KB2 the formula p(a) triggers more TGDs, thus inferring more atoms. To achieve this type of behavior, it is necessary to pass ontologies as parameters, since it is that what provides the adequate context.

5.1.3 Cluster Contraction-Based Consolidation Operator

Lastly, we deﬁne the consolidation operator for Datalog ontologies that represents the two diﬀerent parts in the consolidation. First, the coherence restoration of the component Σ is obtained based on the dependency clusters in the D component in the original ontology. Second, the restoration of consistency in the D component is obtained based on the data clusters w.r.t. the Σ! component obtained by applying a constraint incision function on the original Σ. In this way we achieve the behavior stated earlier in the paper; in a sense, we give the incoherence resolution higher priority, since if we can retain atoms by addressing unsatisﬁable sets of TGDs instead, we choose to follow that path. The cluster contractionbased consolidation operator is formally deﬁned as follows:

Deﬁnition 14 (Cluster Contraction-based Consolidation Operator) Let KB be a Datalog ontology, ρ be a Constraint Incision Function and ϱ a Data Incision Function. Also, let KB = (D, Σ \ ρ(KB)) be the Datalog ontology resulting from deleting from KB the TGDs selected by ρ. The Cluster contraction-based consolidation operator KB!, is deﬁned as follows:

KB! = (D \ ϱ(KB ), Σ \ ρ(KB))

The result of KB! is the Datalog ontology obtained by removing, ﬁrst, the TGDs (selected by ρ from QQ

KB) and then atoms (selected by ϱ from

KB ) from the original ontology KB. It is important to note that, on one hand only TGDs are removed from Σ, as dependency clusters do not contain EGDs or NCs. On the other hand, as the Data Incision Function uses KB instead of KB then only atoms from D that are in conﬂicts with Σ \ ρ(KB) are removed; this is because data clusters are calculated based on the constrains obtained after the consolidation of Σ.

5.2 Relation between Postulates and Construction: Representation Theorem

In Section 4 we have introduced the properties that a Datalog consolidation operator must satisfy. By means of the following representation theorem we can now establish the relationship between the set of postulates for a Datalog ontology consolidation operator and the cluster contraction-based consolidation operator that we proposed in the previous section. In what follows we denote with ! a consolidation operator deﬁned as in Deﬁnition 14 where ρ and ϱ correspond to arbitrary constraint and data incision functions, respectively.

Deagustini, Martinez, Falappa & Simari

Theorem 1 (Representation Theorem) The operator consolidation ! is a Cluster Contraction-based Datalog Ontology Consolidation Operator for a Datalog ontology KB iﬀit satisﬁes Inclusion, Coherence, Consistency, and Minimality.

6. A Complete Example of Datalog Ontologies Consolidation

We have introduced an operator that allows us to consolidate Datalog ontologies that satisﬁes the set of expected properties expressed by the postulates in Section 4. In this section, the complete process for the consolidation of Datalog ontologies is depicted in the following example.

Example 12 (Consolidation of Datalog Ontologies) Suppose that we have the (incoherent and inconsistent) ontology KB shown in Figure 1, which expresses the information we have collected about certain company.

D : {a1 : boss(walter), a2 : supervises(walter, jesse), a3 : makes decisions(walter),a4 : makes decisions(jesse), a5 : supervises(skyler, walter), a6 : employee(walter), a7 : in charge of (jesse, distribution), a8 : in charge of (walter, cooking), a9 : on strike(mike)}

ΣNC : {τ1 : follows orders(X ) makes decisions(X ) , τ2 : supervises(Y , X ) supervisor(X ) , τ3 : absent(X ) on strike(X ) }

ΣE : {ν1 : in charge of (X , Y ) in charge of (X , Y ) Y = Y }

ΣT : {σ1 : employee(X ) is supervised(X ), σ2 : is supervised(X ) follows orders(X ), σ3 : boss(X ) makes proﬁt(X ), σ4 : supervises(Y , X ) supervisor(Y ), σ5 : supervises(Y , X ) employee(X ), σ6 : is supervised(X ) makes decisions(X ), σ7 : is supervised(X ) has work(X ), σ8 : has work(X ) get paid(X ), σ9 : has work(X ) Y in charge of (X , Y ), σ10 : on strike(X ) absent(X )}

Figure 1: The original ontology to be consolidated.

Now, to begin with the ﬁrst part of the consolidation process (i.e., solving incoherences by making the set ΣT satisﬁable) we obtain, as the ﬁrst step towards obtaining the dependency clusters, the dependency kernels for KB:

KB = {{σ2, σ6}, {σ10}},

and based on the kernels, we calculate the set of dependency clusters for KB QQ

KB = {{σ2, σ6}, {σ10}}.

Datalog Ontology Consolidation

Note that, as there is no overlap among dependency kernels, we have Q

KB. Next, we use a cluster incision function to solve incoherency problems. For the sake of the example assume that we will guide the contraction process by means of a quantitative criterion, i.e., choosing among the possible incisions the ones that removes fewer formulas, and using the plausibility among formulas when the cardinality of the incisions is the same. In the following we show the possible incisions, i.e., those satisfying the conditions in Deﬁnition 12. These sets are

For cluster {σ2, σ6} we could either remove σ2 or σ6. Since the two incisions remove the same number of atoms assume for this example that σ2 is more plausible than σ6, and thus we prefer to retain the former. For cluster {σ10} we can only remove σ10. Then, the particular incision in this example will be as follows:

ρ({σ2, σ6}) = {σ6} ρ({σ10}) = {σ10}

Now, we move on to the next part in the consolidation process: the consistency recovery. As explained before, for this part the operator only considers TGDs that will eﬀectively be included in the consolidation. In this particular example this is ΣT ! = ΣT \ {σ6, σ10}. From now on then let KB = (D, Σ!); based on KB we calculate the data kernels.

KB = {{a2, a4}, {a3, a5}, {a3, a6}, {a2, a5}}

Then, we obtain the data clusters, which are:

KB = {{a2, a3, a4, a5, a6}}

Now, to solve inconsistencies we need to consider those sets such that the intersection with all kernels included in the clusters is not empty, using ΣT ! instead of ΣT when doing this. Once again, we analyze the possible incisions (the sets respecting all conditions in Deﬁnition 13) in the light of the number of atoms deleted and the plausibility of the formulas in them. The diﬀerent possible incisions for the cluster are:

- To remove {a2, a3}.

- To remove {a2, a3, a6}.

- To remove {a2, a5, a6}.

- To remove {a4, a5, a6}.

Once again, each of the sets presented is such that its removal induce an operator that satisﬁes the postulates, and thus is captured by our framework. Nonetheless, as explained before for this example we will choose to remove as few atoms as possible. That is, we choose to remove {a2, a3}), and so we have

ϱ({{a2, a3, a4, a5, a6}}) = {a2, a3})

Then, using a Datalog ontology consolidation operator based on the contraction of clusters like the one introduced in Deﬁnition 14 we can obtain the coherent and consistent ontology shown in Figure 2.

Deagustini, Martinez, Falappa & Simari

D! : {boss(walter), makes decisions(jesse), supervises(skyler, walter), employee(walter), in charge of (jesse, distribution), in charge of (walter, cooking), on strike(mike)}

ΣNC ! : {follows orders(X ) makes decisions(X ) , supervises(Y , X ) supervisor(X ) , absent(X ) on strike(X ) }

ΣE ! : {in charge of (X , Y ) in charge of (X , Y ) Y = Y }

ΣT ! : {employee(X ) is supervised(X ), is supervised(X ) follows orders(X ), boss(X ) makes proﬁt(X ), supervises(Y , X ) supervisor(Y ), supervises(Y , X ) employee(X ), is supervised(X ) has work(X ), has work(X ) get paid(X ), has work(X ) Y in charge of (X , Y )}

Figure 2: The ontology resulting from the consolidation.

7. Related Work

The most closely related work to ours is the work by Croitoru and Rodriguez (2015). In that work the authors present consolidation operators that are used as the basis for the deﬁnition of semantics for inconsistency tolerant ontology query answering for Datalog+ (a more expressive language than Datalog , Cal ı et al., 2012). As for the case with our work, the work by Croitoru and Rodriguez (2015) is based on the use of Hansson s incision functions (Hansson, 1994) to solve conﬂicts. Nevertheless, there are some remarkable diﬀerences between the works as well. Among the most important ones is that the operators presented by Croitoru and Rodriguez only deal with inconsistent ontologies, but no acknowledgment of the incoherence problem is made. As we have shown through this work, this can have a signiﬁcant impact in the quality of the consolidation when analysed with respect to minimal loss of information. Moreover, this fact makes that, even though the set of postulates in both works are similar in spirit, the family of operators characterized by Croitoru and Rodriguez is a subset of the ones characterized here. This is due to the fact that the setting that we consider here (i.e., both inconsistent and incoherent ontologies) is more general, since for instance our operators remove not only facts but also TGDs, which Croitoru and Rodriguez s operators do not since they only focus on inconsistency. Another closely related work is the one by Lukasiewicz et al. (2012). There, the authors deﬁne a general framework for inconsistency-tolerant query answering in Datalog

ontologies based on the notion of incision functions. Nevertheless, their work is focused on enforcing consistency at query time obtaining (lazy) consistent answers from an inconsistent ontology instead of consolidating one. Clearly, such process must be carried on for every query posed to the system, while with our approach we obtain a new knowledge base

Datalog Ontology Consolidation

in an oﬄine manner, and such knowledge base can be queried without considering inconsistency issues; both approaches can prove useful, depending on the application domain. Additionally, as only one KB is used the rational assumption that there are no conﬂicts in the constraints in Σ is also made, and therefore there are no notions of unsatisﬁability and incoherence. As stated before, in order to gain generality we have chosen to drop that assumption, and treat incoherence problems as well as inconsistency ones. In addition to the works by Croitoru and Rodriguez and Lukasiewicz et al., there are several other works that solve inconsistency or incoherence by means of adapting approaches based on Belief Revision techniques in other knowledge representation formalism.

7.1 Propositional Knowledge Bases

There are numerous works in revision and merging of propositional knowledge bases (see, for instance, Konieczny & P erez, 2002; Katsuno & Mendelzon, 1992; Lin & Mendelzon, 1999; Liberatore & Schaerf, 1998; Everaere, Konieczny, & Marquis, 2008; Konieczny & P erez, 2011; Delgrande, Dubois, & Lang, 2006; Booth, Meyer, Varzinczak, & Wassermann, 2010; Delgrande, 2011; Delgrande & Jin, 2012; Falappa, Kern-Isberner, Reis, & Simari, 2012), which had provided the foundations to further work on (fragments of) ﬁrst order logics. As expected, those works have deep connections with ours, but also has some remarkable diﬀerences, as we shall see. As we have mentioned throughout the paper, the work by Sven Ove Hansson (1994) provides the inspiration and foundations for our work: we follow an approach akin to Kernel Contraction and several intuitions from it, adapted to an ontological language, Datalog . As a consequence, besides treating incoherence we also provide a complete inconsistency resolution process which takes advantage of the ontological setting, exploiting the relation between the components of the ontology to deﬁne how coherence and consistency should be restored. Also, the classic incision functions introduced by Hansson produce their incision over minimal conﬂicts. In our approach, however, we work over clusters, which are groupings of kernels, and thus they are not always minimal. Then, we propose a particularization over Hansson s incision functions, focusing on those incision functions that successfully work over clusters. Konieczny and Pino-P erez (2002) made one of the main contributions on the merging of conﬂicting information. In our work we follow some of the intuitions proposed by them. Nevertheless, the main diﬀerence between their approach and ours (besides the obvious one on the aims of the works, merging vs. consolidation) is that they state that the ﬁnal merging will be consistent only in presence of a consistent (or, in our terminology, coherent) set of integrity constraints, and they do not analyze the alternative case. With respect to the work by Lin and Mendelzon (1999), besides the diﬀerence in the focus (once again merging vs. consolidation), the main diﬀerence in the inconsistency management strategy chosen with our work is that their conﬂict solving strategy relies on the votes of the majority to establish which formulas is retained in the merging. Instead, we have chosen not to introduce a particular strategy. Nevertheless, it is possible to adapt our framework to the use of preference relations to choose between possible incisions (in a similar way to what we have shown in Example 12). Such relations can indeed be designed to comply with the majority intuition (providing that we have the votes, which does not

Deagustini, Martinez, Falappa & Simari

apply to an ontology consolidation environment since there is only one ontology), thus obtaining a similar strategy. In the work by Katsuno and Mendelzon (1992) the problem of knowledge base revision for the propositional case is addressed. As in our approach, the same language is used to express both the facts about the world and the constraints imposed to them in some KB. Nevertheless, once again the diﬀerence between the (in this case) update of a KB and the consolidation of a KB arises in the treatment of the integrity constraints: in their work integrity constraints are considered invariant and the updates to restore consistency are restricted to facts. In they works by Delgrande (2011), Delgrande and Jin (2012) the authors present an approach for revising a propositional knowledge base by a set of sentences, where every sentence in the set can be independently accepted but there can be inconsistencies when considering the whole set. The main idea follows from the AGM theory, but diﬀers in that, it is necessary to alter the Success postulate so it suits the intuition that not every sentence in the set have to be in the ﬁnal revision (since this set can be inconsistent). Guided by the principle of informational economy, they characterize the revision as the most plausible worlds among the various maximally consistent subsets of the set of sentences. In a parallel with our Datalog ontology environment, this is as revising the D component in the ontology to solve inconsistencies. Being the set of sentences inconsistent, the union of it with the original KB will be inconsistent. Nonetheless, there is an important diﬀerence between these works and ours. In those works the authors ﬁrst solve inconsistencies in the set of sentences, so they can decide which subset of it will characterize the revision. Our approach is diﬀerent, as we directly consider an inconsistent KB. Then, in order to solve the same problem in our setting, it is necessary to consider the union of the KB with the entire set of sentences, and then apply a consolidation operator.

7.2 Knowledge Expressed in Description Logics Ontologies, Logic Programs and Relational Databases

We now focus on other knowledge representation formalisms that are more closely related to Datalog , mainly in the family of Description Logics (Baader, Calvanese, Mc Guinness, Nardi, & Patel-Schneider, 2003) and Logic Programming (Lloyd, 1987; Nilsson & Maluszynski, 1995; Gelfond, 2008). A remarkable work using belief revision to solve conﬂicts in DLs is the one by Qi, Liu, and Bell (2006), which is based on the AGM theory (Alchourr on et al., 1985; G ardenfors, 1988). What makes this work stand out is that they do not only introduce the generalizations of the AGM postulates to the case of DLs, but also deﬁne two operators on knowledge bases, based on formulas weakening, that satisfy such postulates. The main diﬀerence with our approach is that they only take into account consistency problems in the ontologies, and no incoherence treatment is provided. As we pointed out earlier, incoherence can lead to extreme a weakening of the information, where they may have to take out every individual name from some general concept inclusion. As we have previously mentioned, our notion of incoherence was inspired by Schlobach and Cornet s work (2003), among others. In that paper the authors focus on the deﬁnition of processes capable of detecting unsatisﬁabilities and incoherences in DLs ontologies, introducing complete algorithms along with an empirical analysis of the approach. Nevertheless,

Datalog Ontology Consolidation

as it is not in the main focus of their work, the authors set aside the issue of how to recover coherence once a conﬂict has been detected, and also do not consider inconsistencies. In our work we presented a consolidation process that treats both incoherence and inconsistency, based on the use of Belief Revision techniques. Thus, the approach presented by Schlobach and Cornet could potentially be useful regarding the implementation of the operators presented in this work, providing an eﬀective way of obtaining the set of kernels in which the set of clusters will be based. Black et al. (2009) propose an approach that is capable of using information coming from several DL ontologies in order to answer queries, taking care in the process of both incoherence and inconsistency. Their approach is based on agents with argumentative capabilities, each one with a personal knowledge base in the form of a DL ontology. These agents use dialogue games to interchange arguments until they reach an agreement about the answer to a certain query. Thus, the agents can use the (possible incoherent/inconsistent) union of the ontologies without merging them, and still obtain an answer inﬂuenced by every ontology in play. Moreover, this approach has the advantage that no information is lost, as no formula is deleted from the ontologies, and as a result the inferences obtained by this approach are a superset of those that can be obtained in the ontology resulting from the consolidation of the union of DL ontologies. Even though the authors argue that one advantage of the proposed approach is that they do not need to waste time and eﬀort in performing the consolidation of the KB, one disadvantage is the computational complexity associated with argumentative reasoning (Parsons, Wooldridge, & Amgoud, 2003; Dunne & Wooldridge, 2009; Cecchi, Fillottrani, & Simari, 2006) as this process has to be conducted for each query issued and in an online manner. Even though a consolidation process can also be computationally expensive, it is only necessary to perform it once and it can be done oﬄine before the query answering system becomes available. The choice of one approach over the other depends highly on the environment in which they will be used, i.e., on the size of the ontologies that will be used, how often updates are issued over the KB or how critical time consumption is for the system, among other considerations; of course the set of inferences that can be obtained from every approach may diﬀer and this should also be taken into account. A consolidation-based approach could be more suitable for time-dependant systems like real-time systems or query intensive systems where the data tractability associated with (a consolidated) Datalog ontology may be proven handy. Another work worth mentioning is that by Kalyanpur, Parsia, Horridge, and Sirin s (2007). This work verses on how to ﬁnd all justiﬁcations of entailments over a Description Logics ontology. A justiﬁcation is simply the precise set of axioms in an ontology responsible for a particular entailment (Kalyanpur et al., 2007). In other words, it is a minimal set of axioms suﬃcient to produce an entailment, which is related to our use of kernels as a mean to obtain clusters as part of the consolidation strategy used. Moreover, Horridge, Parsia, and Sattler (2009) state that justiﬁcations are important for repairing inconsistent ontologies. Thus, they could be important for the deﬁnition of consolidation processes similar to our cluster-based consolidation, as If at least one of the axioms in each of the justiﬁcations for an entailment is removed from the ontology, then the corresponding entailment no longer holds. (Kalyanpur et al., 2007, p. 269). One of the main contributions of that work is the deﬁnition of practical black-box (i.e.,, reasoner independent) techniques that allows us to ﬁnd justiﬁcations for entailments in the ontology in an eﬃcient way. As such, is evident

Deagustini, Martinez, Falappa & Simari

that while our work verses in a diﬀerent direction we still can beneﬁt from those ﬁndings. In particular, it may be possible to use the developed algorithms as part of our implementation strategy for our consolidation operators, adapting them to be used in Datalog and our dual incoherence/inconsistency setting. Regarding Logic Programming, there are also several works that address the problem of merging knowledge bases expressed as logic programs, solving inconsistency issues in the process. For instance, Hu e, Papini, and W urbel (2009) introduce a merging process based on stable model semantics, using the logic of Here-and-There (Turner, 2003). Hu e et al. consider the merging strategy based on pre-orders among deletion candidates called potential removed sets and they do not establish any particular way to obtain these preorders. Instead, they assume that for any strategy P there is a given pre-order that deﬁnes P. As for the case with Lin and Mendelzon s work (1999), although it falls out of the scope of the present work we certainly can adapt our framework to use similar techniques when choosing which incision prevails. Another notorious work in the Logic Programming ﬁeld is the one by Delgrande, Schaub, Tompits, and Woltran (2009). In that work two diﬀerent approaches are proposed. The ﬁrst one follows an arbitration approach, selecting the models of a program that diﬀers the least w.r.t. the models of the other programs. In this work the case of unsatisﬁable programs is studied, similar to the way we consider incoherence leaded by unsatisﬁable sets of TGDs. Nevertheless, they consider unsatisﬁability of a certain program, and not of some concept in the union of the programs. Furthermore, the strategy to solve unsatisﬁability is simply leaving the unsatisﬁable program out of consideration for the merging, instead of trying to solve the conﬂict somehow. The second approach is based on the selection of the models of a special program P0, which can be thought as the constraints guiding the merging process, that has the least variations w.r.t. the programs for the merging. This approach can be seen as a particular instance of the approach proposed by Konieczny and P erez (2002). In the area of databases, one of the most inﬂuential works is the one by Arenas et al. (1999) on Consistent Query Answering, there the authors propose a model theoretic deﬁnition of consistent answers to a query in a relational database potentially inconsistent with a set of integrity constraints. Intuitively, the consistent answers to a query is the set of atoms that are (classical) answers to the query in every repair of the inconsistent database; a repair is a set of atoms that satisfy the set of constraints and is as close as possible to the original database. Diﬀerent notions of repairs have been studied in the literature, as well as diﬀerent notions of what it means for a set of atoms to be as close as possible to the original database. Most of the proposals are based on repairing by inserting and/or deleting tuples to/from the database (actually, the possible actions depend on the form of the integrity constraints and their expressiveness) and the notion of closeness is deﬁned via set inclusion or cardinality. The work by Arieli, Denecker, and Bruynooghe (2007), however, proposes a uniform framework for representing and implementing diﬀerent approaches to database repairing based on minimizing domain dependent distances. The main idea of that work is to show how thinking in terms of (diﬀerent) distances to express preferences among repairs leads to diﬀerent preferences that can be applied in diﬀerent scenarios. The authors show that the set of repairs obtained using the proposed distance functions deviate from those that can be obtained using set-inclusion. Furthermore, besides insertion and deletion of entire tuples there are other several domain independent approaches, e.g., based on cardinality

Datalog Ontology Consolidation

or more complex objective functions. In the approach proposed by Wijsen (2005) updates are considered a primitive in the theoretical framework; Bohannon et al. (2005) present a cost-based framework that allows ﬁnding good repairs for databases that exhibit inconsistencies in the form of violations to either functional or inclusion dependencies, allowing also updates to attribute values. In that work, two heuristics are deﬁning for constructing repairs both based on equivalence classes of attribute values; the algorithms presented are based on greedy selection of least repair cost, and a number of performance optimizations are also explored. A quite diﬀerent semantics for repairing is proposed by Caroprese, Greco, and Zumpano (2009), Caroprese and Truszczynski (2011) through Active Integrity Constraints (AICs for short); an AIC is a production rule where the body is a conjunction of literals, which should be false for the database to be consistent, whereas the head is a disjunction of update atoms that have to be performed if the body is true (that is the constraint is violated). Repairs are then deﬁned as minimal sets (under set inclusion) of update actions (tuple deletions/insertions) and AICs specify the set of update actions that are used to restore data consistency. Hence, among the set of all possible repairs, only the subset of founded repairs consisting of update actions supported by AICs is considered. Other works in this area propose diﬀerent semantics for repairing by either explicitly or implicitly considering a preference relation among the set of repairs (cf. Andritsos, Fuxman, & Miller, 2006; Staworko, Chomicki, & Marcinkowski, 2012; Greco & Molinaro, 2012). More recently, in the area of ontology-based data access (OBDA), Lembo et al. (2010) study the adaptation of CQA for DL-Lite ontologies, called AR (ABox semantics). In that work, also the intersection (IAR) semantics is presented as a sound approximation of consistent answers; this semantics consists of computing the intersection of all repairs and answers are obtained from there, though (possibly many) AR answers cannot be obtained from under the IAR semantics, the latter are computationally easy to obtain for the DL-Lite family, i.e., it is not necessary to compute the whole set of repairs in order to compute their intersection. The data and combined complexity of these and other semantics were studied (Rosati, 2011) for a wider spectrum of DLs. Also, Rosati (2011) presents intractability results for query answering for EL under the intersection semantics, and a non-recursive segment of that language was proved to be computable in polynomial time. More recently, Bienvenu and Rosati (2013) propose another family of approximations to CQA, also for the DL-Lite family. The k-support semantics allows to (soundly) approximate the set of queries entailed under the CQA semantics, based on k subsets of the database that consistently entail q; on the other hand, the k-defeater semantics approximates complete approximations seeking sets that contradict the supporters for q. Both semantics are FO-rewritable for any ontological language for which standard CQ answering is FO-rewritable as well, and can be used in conjunction to overand under-approximate consistent answers. Much like Black et al. (2009), the treatment of inconsistencies proposed by all these semantics is related to particular queries instead of the inconsistency of the whole database. Thus, they do not attempt to obtain a ﬁnal consistent database that can be queried without considering restrictions. Furthermore, they do not address the issues of incoherence and inconsistency together; instead most of the approaches either assume that the set of integrity constraint correctly deﬁnes the semantics of the database instance, so there is no room for incoherence, or they treat constraints and data alike at the moment of removing or ignoring information, which leads to the type of problems that we discuss in Example 10. While

Deagustini, Martinez, Falappa & Simari

these techniques may be suitable for the case of one single database, in the presence of incoherence in the set of ICs, as can be the case when we consider several databases together, this approach would lead to meaningless empty answers, since no subset of the database could satisfy the constraints as would also be the case for the approach by Lukasiewicz et al. (2012). Also related to the databases ﬁeld is the work by Lin and Mendelzon (1998). There, a database is viewed as a ﬁrst-order theory without rules, and ICs are used to ensure consistency in the ﬁnal result as in the work by Konieczny and P erez (2002), presenting ways to solve known database merging problems like synonyms and homonyms. Nonetheless, like Konieczny and Pino-P erez s work, they do not consider problems related to the set of ICs. Instead, the set of ICs used in the merging process is unique, and the choice of such set is expected to be performed by a merge designer. Unlike Lin and Mendelzon, we do not made an assumption for our consolidation environment on the set of ICs being conﬂict-free. Cholvy (1998) introduces another approach that can be used to reason with contradictory information. The framework is represented through a set of axioms and inference rules. Additionally, in the paper several applications for the framework are introduced, e.g., the solving of conﬂicts among beliefs represented by ﬁrst order databases, where facts are ground literals and there are rules that can be integrity constraints or deduction rules. In that scenario, contradiction is obtained by the application of the constraints when considering several databases together. This establishes a certain parallel with the case of inconsistency in a Datalog ontology. However, the main diﬀerence with our work lies in how the strategy for the inconsistency management process is deﬁned. In that work, a preference order between databases is assumed. Instead, we have chosen not to restrict how to achieve the consolidation, thus presenting a general approach. Nevertheless, as stated before we can adapt incision functions to suit the intuition that no every formula is equally desirable, choosing for instance preferences between ontologies as a guideline (if we are using the approach for other tasks rather than consolidation of a single ontology), obtaining an inconsistency management strategy akin to the one introduced by Cholvy. Finally, Meyer, Lee, and Booth (2005) use two well-known techniques for knowledge integration for the propositional case, adapted and reﬁned to the expressiveness of DLs. The proposed approach takes the knowledge bases and produces a disjunctive knowledge base (DKB) as the result of the integration. One disadvantage of DKBs is that they state the possible options we can take when the conﬂicting knowledge is expected to be exploited by a reasoning process rather than choosing one of them. Thus, contrary to our approach where a ﬁnal consolidated ontology is given, in theirs there is no deﬁnitive ﬁnal merging; moreover, they set aside for further research problems related to incoherence in the integration process.

8. Conclusions and Future Work

Collaborative work and information exchange are becoming key aspects of almost any system; thus, it is of the uttermost importance to have automatic and adequate ways to solve conﬂicts: as knowledge evolves in a collaborative environment incoherence and inconsistency are prone to arise. This knowledge is often represented by ontologies that can be collaboratively built, and often shared among entities that use and modify them. One particular way to deal with the conﬂicts that can appear in such application environments is

Datalog Ontology Consolidation

to try to modify the information contained in the ontology in order to regain coherence and consistency. In this paper we have shown how to achieve the consolidation of Datalog

ontologies. We introduced the concept of incoherence in Datalog ontologies in terms of unsatisﬁability of sets of TGDs, and showed the relationship with the classical notion of inconsistency of a logical theory that lacks models. We also proposed a construction for consolidation operators. The construction is inspired by kernel contraction, and uses incision functions on groupings of minimal unsatisﬁable/inconsistent sets called clusters to solve conﬂicts. Finally, we stated the properties that the Datalog ontology consolidation operator is expected to satisfy. We showed that our operators satisfy the respective properties, obtaining as the result of the consolidation a new Datalog ontology that is always coherent and consistent while minimizing the changes that are made in the conﬂict resolution. As a ﬁnal remark, notice that these operators take care of all incoherences in the ontology. However, there are rare cases when the ontology designer introduce some unsatisﬁable concepts in an ontology on purpose, to model some particular feature of the application domain. If that is the case then we should not remove the incoherence, and rather we have to delete the atoms triggering it, if any. Clearly, since they were not deﬁned with that setting in mind this behavior cannot be achieved by the operators presented here. Nevertheless, to modify our present approach to suit such setting is almost straightforward, provide that we can identify whether or not some unsatisﬁable set of TGDs is made on purpose or not. As for future work, we intend to study new constructions for Datalog consolidation operators. To do this, ﬁrst we plan to change the general approach, i.e., operators based on formalisms other than kernel contraction, mainly from the AGM theory (Alchourr on & Makinson, 1985; Alchourr on et al., 1985); and then, while the proposed framework for cluster contraction based consolidation operators is fully constructive, depending on the application domain it may certainly be diﬃcult to asses how to eﬀect the incisions, i.e., it may be hard to decide among the family of possible incisions which one to select. From a design point of view, it may be easier to select how to perform the consolidation if we have some additional information about the formulas in the knowledge base, such as a preference relation that can, for example, be elicited from domain experts. In general, it could be easier for an expert to provide guidelines and information about the application domain at hand that could be then modeled into a preference relation on the formulas in an ontology rather than trying to single out the desired incisions. In this direction we want to explore constructions based on exploiting preference relations among the formulas in the ontologies to deﬁne diﬀerent strategies to choose which formulas to delete, possibly tailored for particular scenarios. Mainly, we plan to analyze two diﬀerent aspects: the relation between these operators based on preference relations with respect to the ones presented in this work, and how diﬀerent strategies aﬀect their behavior. Also, in this work we make a point in diﬀerentiating the concept of inconsistency from that of incoherence; therefore, we need to focus on languages that separate extensional from intensional knowledge, otherwise the two notions are indistinguishable (as it is the case in propositional logic). In that sense, the choice of Datalog is due to its desirable property of generalizing several popular languages such as classical Datalog, DL-Lite, ELH, F-Logic-Lite, etc. Even though in this paper we do not perform a particular analysis of the eﬀects of nulls in the proposed solutions to consolidation, the Datalog family of languages

Deagustini, Martinez, Falappa & Simari

was chosen because it oﬀers a wide variety of languages with high computational tractability (some are FO rewritable and others have PTIME inference algorithms). The results in this work pave the way to continue the research line into the next natural step, which is to show how (or whether) the diﬀerent syntactic and semantic properties that yield tractability for query answering allow us to obtain tractability results also for the consolidation problem, much in the same way as it has happened already in the area of consistent query answering (where only repairs over the extensional part of the KB are considered). It is, for example, in the rewriting algorithms where the capability of value invention plays an important role: the value invention process should be controlled (in general with syntactic restrictions) in order to keep a low complexity for the reasoning tasks. With this in mind, in the future we will further look into the role of processes like value invention in the consolidation of Datalog

ontologies, and their impact both on how conﬂicts should be solved and computational eﬃciency.

We are currently working on the implementation of our operators; we plan to study different techniques that can be used in order to produce an eﬃcient implementation, possibly tailored for speciﬁc fragments of Datalog . As explained before, the algorithms introduced by Schlobach and Cornet (2003) can be proven useful regarding this aspect since they may provide a way to calculate the kernels in a Datalog ontology, thus providing the ﬁrst step towards incoherence resolution. Another important work regarding the implementation of our consolidation operators is the one by Wassermann (2000), where the author shows that the minimal incision functions of a knowledge base can be obtained from the kernels of a KB by using any algorithm for ﬁnding minimal hitting sets (Reiter, 1987). Several works in the area of ontology debugging and repairs, (e.g., Halaschek-Wiener & Katz, 2006, or Horridge et al., 2009 as a way to ﬁnd the justiﬁcations for an inconsistency) have exploited Reiter s algorithms in order to implement their frameworks. Among others, we plan to study how adequate this techniques are for our operators, as there is an almost direct relation between minimal incision functions and Reiter s minimal hitting sets; in this way, it may be possible to adapt Reiter techniques to attend incoherences and inconsistencies, moreover, as we already discussed, we plan to analyze the relation between cluster incision functions and preference relations. Regarding implementation, we hold the conjecture that such relations can be exploited to further reﬁne the implementation of the operators: Reiter s algorithm is based on the expansion of a directed acyclic graph, and such expansion is made in a breadth ﬁrst fashion, which in the end generates all possible values for minimal incision functions. As acknowledged by Wassermann, if some kind of ordering among the formulas is present, this ordering can be used to choose which branch to expand; in other words, not only it may be possible to implement the construction for operators proposed in this work by means of exploiting Reiter s hitting sets algorithm, but also we can use the preference relation equivalent to the incision (if any) to guide the consolidation process. That is, it may be possible to adapt the algorithm so it chooses to expand the branch that has the less preferred set of formulas, thus guiding the graph expansion process.

Appendix A. Proofs

Proof for Proposition 1

Datalog Ontology Consolidation

Proof Consider some U ΣT such that U is an unsatisﬁable set of dependencies w.r.t. ΣNC ΣE, and A D a set of atoms relevant to U. It follows from the deﬁnition of satisﬁability of a set of dependencies w.r.t. a set of constraints that if U is unsatisﬁable then there does not exist a relevant set of atoms A

that makes mods(A , U ΣE ΣNC ) = , because otherwise U is satisﬁable. Then, mods(A, U ΣE ΣNC ) = . Moreover, since A D and U ΣT we have that chase(A, U) chase(D, ΣT ), and thus any NC or EGD that is violated in chase(A, U) is also violated in chase(D, ΣT ). Thus, mods(D, ΣT ΣE ΣNC ) = , i.e., KB is inconsistent.

Proof for Lemma 1

Proof Let KB1 = (D1, Σ1) with Σ1 = Σ1T Σ1E Σ1NC , and KB2 = ( , Σ2) with Σ2 = Σ2T Σ2E Σ2NC be two Datalog ontologies such that Σ1 = Σ2, Q

KB1 be the dependency kernels of KB1, and Q

KB2 be the dependency kernels of KB2, respectively. Consider any X Q

KB1 . Then, by Deﬁnition 6 we have that X Σ1T is an unsatisﬁable set of dependencies w.r.t. Σ1E Σ1NC and every X X is satisﬁable w.r.t. Σ1E Σ1NC . Since Σ1 = Σ2, then Σ1T = Σ2T , Σ1E = Σ2E and Σ1NC = Σ2NC , and thus it holds that X Σ2T is an unsatisﬁable set of dependencies w.r.t. Σ2E Σ2NC and every X X is satisﬁable w.r.t. Σ2E Σ2NC . Then, by Deﬁnition 6 we have that X Q

KB2 , and since this holds for any arbitrary kernel in Q

KB1 we have that Q

Proof for Proposition 2

Proof We will focus on the case of dependency clusters, omitting the proof for data clusters, as they are analogous to each other. Consider any arbitrary Y Q

) We begin by showing that if a kernel is part of a cluster then it is not part of any other cluster, i.e., if Y X for some X QQ

KB then Y X for all X QQ

KB such that X = X . This is obtained directly from the deﬁnition of clusters: we have that X = S Y [κ] Y where [κ] is an equivalence class in the equivalence relation θ obtained from Q

KB. Then, clearly for Y we have that if Y X then Y [κ]. Therefore, since by deﬁnition two equivalence classes are either equal or disjoint then it holds that Y / [κ ] for all [κ ]. Let X = S Y [κ ] Y . Then it holds that X = X and that Y X . Since this holds for any arbitrary equivalence class [κ ] then it holds that if Y X for some X QQ

KB then Y X

for all X QQ

KB such that X = X .

) Now we show that there not exist any kernel that does not belong to a cluster, i.e., if Y X for all X QQ

KB such that X = X then Y X for X QQ

KB. Again, this arise from the use of equivalence classes in Deﬁnitions 9 and 10. If Y X for all X QQ

KB such that X = X , then it holds that Y / [κ ] for all [κ ] = [κ]. So, since equivalence classes form a partition it must holds that Y [κ]. Therefore, as X = S Y [κ] Y we have that Y X

for all X QQ

KB such that X = X then Y X.

Proof for Corollary 1

Deagustini, Martinez, Falappa & Simari

Proof Consider α Y for some Y Q

KB. By Proposition 2 we have that Y X for some X QQ

KB if and only if Y X for all X QQ

KB such that X = X . Thus, we have that α X for some X QQ

KB if and only if α / X for all X QQ

KB such that X = X . Analogously, we can show that β Y for some Y

KB is such that β X for some X

KB if and only if β / X for all X

KB such that X = X .

Proof for Lemma 2

Proof Consider any X Q

KB1 . Then, X is a minimal unsatisﬁable set of TGDs w.r.t. Σ1NC Σ1E. Since KB1 = KB2, then it holds that X KB2, Σ1E = Σ2E, Σ1NC = Σ2NC and X is an unsatisﬁable set of TGDs w.r.t. Σ2NC Σ2E. Also, there does not exist X X such that X is an unsatisﬁable set of TGDs w.r.t. Σ2NC Σ2E, since otherwise we would contradict our hypothesis that X Q

KB1 , as Σ1NC = Σ2NC and Σ1E = Σ2E. Then, X Q

KB2 ; and since this holds for any arbitrary X Q

KB1 , then we have that Q

KB2 . Consider any arbitrary X, Y Q

KB1 such that XθY . Since Q

KB2 , then X, Y Q

KB2 . Thus, θ Q

KB1 is equivalent to θ Q

KB2 , and then QQ

Likewise, consider any arbitrary X , Y

KB1 such that X θY . Since

KB2 , then X , Y

KB2 . Therefore, θ

KB1 is equivalent to θ

KB2 , and thus

Proof for Proposition 3

Proof Consider α D and X Q

KB such that α is relevant to X. From Deﬁnition 6 we have that X is unsatisﬁable w.r.t. N ΣE ΣNC , and then from Deﬁnition 4 and the fact that α is relevant to X we have that mods({α}, X N) = (1). Also, since {α} is singleton then the only A {α} is A = , and clearly mods( , X N) = (2). Then, from (1), (2) and Deﬁnition 7 it follows that {α}

KB. Also, from Deﬁnition 9 we have that {α}

KB, since {α} cannot overlap with any other kernel, being singleton. Consider the incision over {α}. From Deﬁnition 11 it follows that δ(KB) {α} = . Then, we have that δ(KB) {α} = α, and thus α δ(KB).

Proof for Corollary 2

Proof Consider any arbitrary α D. Since α is relevant to some X Q

KB, then by Proposition 3 it holds that α δ(KB). Thus, since this holds for any arbitrary α D we have that D δ(KB).

Proof for Theorem 1

Proof Let KB1 = (D1, Σ1) and KB2 = (D2, Σ2) be two Datalog ontologies such that KB1 = KB2.

) Construction to postulates

Consider an operator ! deﬁned as in Deﬁnition 14; we have to prove that ! satisﬁes every postulate in Theorem 1. Let KB1! = (D1!, Σ1!) and KB2! = (D2!, Σ2!) be the two Datalog

ontologies resulting from the consolidation of KB1 and KB2 by means of !, respectively. Furthermore, let KB 1 = (D1, Σ1 \ ρ(KB1)) and KB 2 = (D2, Σ2 \ ρ(KB2)) be the ontology resulting from removing the TGDs selected by ρ from KB1 and KB2. Let Q

Datalog Ontology Consolidation

be the set of dependency and data kernels for KB1 and KB 1 respectively, Q

KB 2 be the sets of dependency and data kernels for KB2 and KB 2. Finally, let QQ

KB 1 be the set of dependency and data clusters for KB1 and KB 1 respectively, QQ

KB 2 be the sets of dependency and data clusters for KB2 and KB 2.

Inclusion: Σ1! Σ1 and D1! D1.

By deﬁnition of KB1! we have that D1! = D1 \ ϱ(KB 1), and thus D1! D1.

In a similar way, by deﬁnition of KB1! we have that Σ1! = Σ1 \ ρ(KB1), and thus Σ1! Σ1.

Coherence: KB1! is coherent.

To prove that KB1! is coherent we have to show that ΣT Σ1! is satisﬁable for ΣE ΣNC Σ1!. To do this it is suﬃcient to show that all minimal conﬂicts are attended to by the operator, i.e., that no dependency kernel is included in Σ1!,

Consider any arbitrary X Q

KB1 . From Proposition 2 we have that there exists Y QQ

KB1 such that X Y . By deﬁnition of ρ it holds that for all Y QQ

KB1 and X Q

KB1 where X Y it holds that (ρ(KB1) X) = . Then, there exists some α X such that α (ρ(KB1) X), and thus α / Σ1!. Therefore, X Σ1!, i.e., the conﬂict was solved. Since this holds for any arbitrary X Q

KB1 then every unsatisﬁable set in Σ1 is not included in Σ1!, and thus ΣT Σ1! is satisﬁable for ΣE ΣNC Σ1!, i.e., KB1! is coherent.

Consistency: Proof is analogous to that for Coherence.

Minimality: If KB KB1 is coherent and consistent, then it holds that KB1! KB .

Let KB KB1 be such that KB is coherent and consistent, and let CFΣ1 = Σ1 \ S(QQ

KB) and CFD1 = D1 \ S(

KB) be the set of formulas that do not belong to any kernel in Σ1 and D1, respectively.

Suppose by reductio that KB1! KB . By deﬁnition of KB1! we have that ρ(KB1) S(QQ

KB) , and that ϱ(KB1) S(

KB). Then, CFΣ1 KB1! and CFD1 KB1!. Therefore, we have that CFΣ1 KB and that CFD1 KB .

Then, since KB1! KB there must exist α QQ

KB such that α KB but α / KB1!, while KB is coherent and consistent all the same. That is, there exists a dependency cluster or a data cluster where the removal is not optimal, since α could be included in the consolidation. For the rest of the proof and for simplicity reasons, we consider the case where α belongs to a dependency cluster. This is made without loss of generality, since the proof for the case where α is included in a data cluster is analogous to the one presented here.

Let us consider then α QQ

KB such that α KB . By Corollary 1 we have that α X where X QQ

KB. Let T = (X ρ(KB)) be the incision performed over the cluster, and let R = (X {KB \ KB }) be those formulas removed from X when obtaining KB . Clearly, since KB is coherent then for all Y X where Y Q

Deagustini, Martinez, Falappa & Simari

holds that R Y = , because otherwise Y KB , which will make KB incoherent. Besides, since R Y then R QQ

KB, and thus R satisﬁes the ﬁrst two conditions in Deﬁnition 12.

By Deﬁnition 12 we have that T is such that there not exists a set of TGDs that satisﬁes the ﬁrst two conditions in the deﬁnition and at the same time it holds that (1) T R.

Since α / KB1! and α X then α ρ(KB), and thus α T. However, we know that α X and α KB , and thus α / R. Therefore we have that (2) T R.

From (1) and (2) we have that T R and that T R, an absurd coming from our original assumption that KB1! KB , and it holds that if KB KB1 is coherent and consistent then KB1! KB .

) Postulates to Construction

For the second part of the proof, consider an operator ! that satisﬁes all postulates in Theorem 1. Let ρ(!)Σ be a function based on ! deﬁned as follows:

ρ(!)Σ(KB1) = {x | x X for some X QQ

KB1 and x / {Σ1 KB1!}}

Let KB 1 = (D1, Σ1 \ ρ(!)Σ(KB1)) be the ontology resulting of removing from KB1 the TGDs selected by ρ(!)Σ. Then, let ϱ(!)D be another function based on ! deﬁned as follows:

ϱ(!)D(KB 1) = {x | x X for some X

KB 1 and x / {D1 KB1!}}

Based on ϱ(!)D and ρ(!)Σ we deﬁne a new operator as follows:

KB1! = (D1 \ ϱ(!)D(KB 1), Σ1 \ ρ(!)Σ(KB1))

We have to show that ! is a Datalog ontology consolidation operator based on Cluster Contraction. To do this, we ﬁrst prove that ϱ(!)D is a well-deﬁned data incision function and that ρ(!)Σ is a well-deﬁned constraint incision function. That is, given ρ(!)Σ we have to prove that:

- ρ(!)Σ is well-deﬁned, i.e., if KB1 = KB2, then ρ(!)Σ(KB1) = ρ(!)Σ(KB2).

By deﬁnition of ρ(!)Σ we have that ρ(!)Σ(KB1) = {x | x X for some X QQ

KB1 and x / Σ1 KB1!}.

Consider any arbitrary x ρ(!)Σ(KB1). Since KB1 = KB2, then by Lemma 2 we have that QQ

KB2 . Since x ρ(!)Σ(KB1), then x X QQ

KB1 , and thus it holds that x X QQ

Besides, since x X QQ

KB1 then x Σ1. Thus, since x / Σ1 KB1!, then x / KB1!. Since KB1 = KB2, from the fact that ! is a function we have that KB1! = KB2!, and then it also holds that x / KB2!. Thus, x / Σ2 KB2!(2).

From (1) and (2) it follows that for any x ρ(!)Σ(KB1) it holds that x {y | y Y for some Y QQ

KB2 and y / Σ2 KB2!}. By deﬁnition of ρ(!)Σ this is ρ(!)Σ(KB2), and thus we have that if KB1 = KB2, then ρ(!)Σ(KB1) = ρ(!)Σ(KB2).

Datalog Ontology Consolidation

- ρ(!)Σ(KB1) S(QQ

This follows directly from the deﬁnition of ρ(!)Σ, since for every x ρ(!)Σ(KB1) it holds that x X for some X QQ

KB1 because of the ﬁrst condition in the deﬁnition.

KB1 are such that Y = and Y X, then (Y ρ(!)Σ(KB1)) = .

Suppose by reductio that there exists some X QQ

KB1 and Y Q

KB1 such that Y = , Y X and (Y T ρ(!)Σ(KB1)) = .

Then, for all α Y it holds that α / ρ(!)Σ(KB1), i.e., α / X QQ

KB1 or α {Σ1 KB1!}. By our hypothesis we have that α Y Q

KB1 and Y X. Thus α X, and therefore it must hold that α {Σ1 KB1!}, and by extension α KB1!.

Since this holds for any arbitrary α Y then we have that Y Σ1!. From Deﬁnition 6 it holds that Y is a minimal unsatisﬁable set of TGDs w.r.t. ΣE ΣNC Σ1. Then, for any relevant set of atoms A it holds that mods(A, Y ΣE ΣNC ) = . Then, since Y Σ1! then for any relevant set A it holds that mods(A , Σ1! ΣE ΣNC ) = , because the TGDs in Y are triggered by A . Then, Σ1! is an unsatisﬁable set of TGDs w.r.t. ΣE ΣNC Σ1.

However, from Coherence we have that KB1! is coherent, and thus Σ1! is satisﬁable w.r.t. ΣE ΣNC Σ1.

Then we have that Σ1! is satisﬁable w.r.t. ΣE ΣNC Σ1 and Σ1! is unsatisﬁable w.r.t. ΣE ΣNC Σ1, an absurd coming from our initial supposition that there exists some X QQ

KB1 and Y Q

KB1 such that Y = , Y X and (Y T ρ(!)Σ(KB1)) = , and it holds that for all X QQ

KB1 and Y Q

KB1 such that Y X, if Y = then (Y T ρ(!)Σ(KB1)) = .

- For all X QQ

KB1 it holds that T = (X ρ(!)Σ(KB1)) is such that there not exists R X where R satisﬁes the two previous conditions and R T.

To prove this is suﬃcient to show that, being clusters disjoint sets, the election in each cluster is optimal, because otherwise if should exists any cluster where the incision function does not choose in an optimal way then Minimality would not be satisﬁed. So, suppose by reductio that there exists X QQ

KB1 where T = (X ρ(!)Σ(KB1)) is such that there does exist R X where R satisﬁes the two previous conditions and R T.

Let us consider KB = (Σ , D ) such that for all Y QQ

KB1 such that Y = X it holds that T = (Y ρ(!)Σ(KB1)) and R = (Y {KB \ KB }) (those formulas removed from Y when obtaining KB ) are such that T = R . Since T = R then R is such that the two conditions in Deﬁnition 12 are satisﬁed. Besides, let CFΣ1 = Σ1 \ QQ

KB and CFD1 = D1 \

KB be the set of formulas that do not belong to any kernel in Σ1 and D1, respectively; and let KB be such that CFΣ1 Σ and CFD1 D .

The fact that every formula that is not in any conﬂict belongs to KB and that KB is built in such a way that the election in each cluster diﬀerent than X is the same both in KB and ρ(!)Σ(KB1) makes that KB \ (X {KB \ KB }) = KB1! \ (X ρ(!)Σ(KB1)). This is, if there is any diﬀerence between KB and KB1! that diﬀerence arise from the election on which formulas to remove from X.

Deagustini, Martinez, Falappa & Simari

Finally, from our supposition we have that there exists R X where R satisﬁes the two previous conditions and R T. Let KB and R T be such that R = (X {KB \KB }) is the set of formulas removed from X when obtaining KB . Then, we have that KB is coherent and consistent, since every conﬂict in clusters in KB1 where solved, whether by removing R (for cluster X) or the sets R (for every cluster diﬀerent than X). Besides, since we have that KB \ (X {KB \ KB }) = KB1! \ (X ρ(!)Σ(KB1)) and that R T, then for KB1! = KB1 \ ρ(!)Σ(KB1) and KB = KB1 \ {S

KB1 \X} R R} where

(R = Y {KB \ KB }) and R = (X {KB \ KB }) it holds that KB1! KB (1). That is, if all formulas that are not involved in conﬂicts belong to both KB1! and KB , in each cluster diﬀerent than X the same formulas are removed, and the set of formulas removed from X to obtain KB are an strict subset of those removed by ρ(!)Σ(KB1) to obtain KB1!, then KB1! is an strict subset of KB , i.e., we have removed more formulas by deleting T than by deleting R.

On the other hand, since KB is coherent and consistent, then by Minimality we have that KB1! KB (2).

Therefore, from (1) and (2) we have that KB1! KB and KB1! KB , and absurd coming from our initial supposition that there exists X QQ

KB1 where T = (X ρ(!)Σ(KB1)) is such that there exists R X where R satisﬁes the two previous conditions and R T, and we have that for all X QQ

KB1 it holds that T = (X ρ(!)Σ(KB1)) is such that there not exists R X where R satisﬁes the two previous conditions and R T. We omit the proof that ϱ(!)D is a well-deﬁned data incision function using Consistency and Minimality since it is analogous to the proof that ρ(!)Σ is a well-deﬁned constraint incision function using Coherence and Minimality. Now that we have shown that ϱ(!)D and ρ(!)Σ are well-deﬁned data incision functions and constraint incision functions, respectively, to conclude this second part of the proof we have to show that ! coincides with !. From Inclusion it follows that D1! D1 and Σ1! Σ1 (1). Also, from our deﬁnition of ϱ(!)D it follows that ϱ(!)D(KB 1) = D1 \ D1!, and from our deﬁnition of ρ(!)Σ it follows that ρ(!)Σ(KB1) = Σ1 \ Σ1! (2). Then, from (1) and (2) we have that D1! = D1 \ ϱ(!)D(KB 1) and Σ1! = Σ1 \ ρ(!)Σ(KB1). Thus, ! = (D1 \ ϱ(!)D(KB 1), Σ1 \ ρ(!)Σ(KB1)), and therefore ! coincides with !.

Alchourr on, C., G ardenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50(2), 510 530.

Alchourr on, C., & Makinson, D. (1981). Hierarchies of Regulation and Their Logic. New Studies in Deontic Logic, 125 148.

Alchourr on, C., & Makinson, D. (1985). On the Logic of Theory Change: Safe Contraction. Studia Logica, 44, 405 422.

Amgoud, L., & Kaci, S. (2005). An argumentation framework for merging conﬂicting knowledge bases: The prioritized case. In Proc. of 8th European Conferences on Symbolic

Datalog Ontology Consolidation

and Quantitative Approaches to Reasoning with Uncertainty (ECSQUARU 05), pp. 527 538.

Andritsos, P., Fuxman, A., & Miller, R. J. (2006). Clean answers over dirty databases: A probabilistic approach. In Proc. of 22nd International Conference on Data Engineering (ICDE 06), p. 30.

Arenas, M., Bertossi, L. E., & Chomicki, J. (1999). Consistent query answers in inconsistent databases. In Proc. of 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 99), pp. 68 79.

Arieli, O., Denecker, M., & Bruynooghe, M. (2007). Distance semantics for database repair. Annals of Mathematics and Artiﬁcial Intelligence, 50(3-4), 389 415.

Baader, F., Brandt, S., & Lutz, C. (2005). Pushing the EL envelope. In Proc. of the 19th International Joint Conference on Artiﬁcial Intelligence (IJCAI 05), pp. 364 369.

Baader, F., Calvanese, D., Mc Guinness, D. L., Nardi, D., & Patel-Schneider, P. F. (Eds.). (2003). The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press.

Baral, C., Kraus, S., & Minker, J. (1991). Combining multiple knowledge bases. Transactions on Knowledge and Data Engineering, 3(2), 208 220.

Bell, D. A., Qi, G., & Liu, W. (2007). Approaches to inconsistency handling in descriptionlogic based ontologies. In Proc. of On the Move to Meaningful Internet Systems (OTM) Workshops (2), pp. 1303 1311.

Beneventano, D., & Bergamaschi, S. (1997). Incoherence and subsumption for recursive views and queries in object-oriented data models. Data and Knowledge Engineering, 21(3), 217 252.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientiﬁc American, 284(5):3443.

Bienvenu, M., & Rosati, R. (2013). Tractable approximations of consistent query answering for robust ontology-based data access. In Proc. of 23rd International Joint Conference on Artiﬁcial Intelligence (IJCAI 13), pp. 775 781.

Black, E., Hunter, A., & Pan, J. Z. (2009). An argument-based approach to using multiple ontologies. In Proc. of 3rd International Conference on Scalable Uncertainty Management (SUM 09), pp. 68 79.

Bohannon, P., Flaster, M., Fan, W., & Rastogi, R. (2005). A cost-based model and eﬀective heuristic for repairing constraints by value modiﬁcation. In Proc. of 24th ACM SIGMOD International Conference on Management of Data / Principles of Database Systems (PODS 05), pp. 143 154.

Booth, R., Meyer, T. A., Varzinczak, I. J., & Wassermann, R. (2010). Horn belief change: A contraction core. In Proc. of 19th European Conference on Artiﬁcial Intelligence (ECAI 10), pp. 1065 1066.

Borgida, A. (1995). Description logics in data management. Transactions on Knowledge and Data Engineering, 7(5), 671 682.

Deagustini, Martinez, Falappa & Simari

Brandt, S. (2004). Polynomial time reasoning in a description logic with existential restrictions, GCI axioms, and - what else?. In Proc. of 16th European Conference on Artiﬁcial Intelligence (ECAI 04), pp. 298 302.

Cal ı, A., Gottlob, G., & Kifer, M. (2008). Taming the inﬁnite chase: Query answering under expressive relational constraints. In Brewka, G., & Lang, J. (Eds.), Proc. of 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 08), pp. 70 80. AAAI Press.

Cal ı, A., Gottlob, G., & Kifer, M. (2013). Taming the inﬁnite chase: Query answering under expressive relational constraints. Journal of Artiﬁcial Intelligence Research, 48, 115 174.

Cal ı, A., Gottlob, G., & Lukasiewicz, T. (2012). A general Datalog-based framework for tractable query answering over ontologies. Journal of Web Semantic, 14, 57 83.

Cal ı, A., Lembo, D., & Rosati, R. (2003). On the decidability and complexity of query answering over inconsistent and incomplete databases. In Proc. of 22nd ACM SIGMOD Symposium on Principles of database systems (PODS 03), pp. 260 271. ACM.

Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., & Rosati, R. (2005). DL-Lite: Tractable description logics for ontologies. In AAAI, pp. 602 607.

Caroprese, L., Greco, S., & Zumpano, E. (2009). Active integrity constraints for database consistency maintenance. Transactions on Knowledge and Data Engineering, 21(7), 1042 1058.

Caroprese, L., & Truszczynski, M. (2011). Active integrity constraints and revision programming. Theory and Practice of Logic Programming, 11(6), 905 952.

Cecchi, L., Fillottrani, P., & Simari, G. R. (2006). On the Complexity of De LP through Game Semantics. In Dix, J., & Hunter, A. (Eds.), Proc. of 11th International Workshop on Non-Monotonic Reasoning (NMR 06), pp. 386 394.

Cholvy, L. (1998). Reasoning about merged information. In Belief Change, Vol. 3, pp. 233 263. Springer Netherlands.

Croitoru, M., & Rodriguez, R. O. (2015). Using kernel consolidation for query answering in inconsistent OBDA. In Proc. of the Joint Ontology Workshops 2015 Episode 1: The Argentine Winter of Ontology.

Delgrande, J. P. (2011). Revising by an inconsistent set of formulas. In Proc. of 22nd International Joint Conference on Artiﬁcial Intelligence (IJCAI 11), pp. 833 838.

Delgrande, J. P., Dubois, D., & Lang, J. (2006). Iterated revision as prioritized merging. In Proc. of 10th International Conference on Principles of Knowledge Representation and Reasoning (KR 06), pp. 210 220.

Delgrande, J. P., & Jin, Y. (2012). Parallel belief revision: Revising by sets of formulas. Artiﬁcial Intelligence, 176(1), 2223 2245.

Delgrande, J. P., Schaub, T., Tompits, H., & Woltran, S. (2009). Merging logic programs under answer set semantics. In Proc. of 25th International Conference on Logic Programming (ICLP 09), pp. 160 174.

Datalog Ontology Consolidation

Dunne, P., & Wooldridge, M. (2009). Argumentation in Artiﬁcial Intelligence, chap. Complexity of Abstract Argumentation, pp. 85 104. Springer.

Everaere, P., Konieczny, S., & Marquis, P. (2008). Conﬂict-based merging operators. In Proc. of 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 08), pp. 348 357.

Falappa, M. A., Kern-Isberner, G., Reis, M. D. L., & Simari, G. R. (2012). Prioritized and non-prioritized multiple change on belief bases. Journal of Philosophical Logic, 41(1), 77 113.

Falappa, M. A., Kern-Isberner, G., & Simari, G. R. (2002). Belief Revision, Explanations and Defeasible Reasoning. Artiﬁcial Intelligence, 141, 1 28.

Flouris, G., Huang, Z., Pan, J. Z., Plexousakis, D., & Wache, H. (2006). Inconsistencies, negations and changes in ontologies. In Proc. of 21st National Conference on Artiﬁcial Intelligence (AAAI 06), pp. 1295 1300.

Friedman, N., & Halpern, J. Y. (2001). Belief revision: A critique. Computer Research Repository (Co RR), cs.AI/0103020.

Fuhrmann, A. (1991). Theory contraction through base contraction. Journal of Philosophical Logic, 20, 175 203.

G ardenfors, P. (1982). Rule for rational changes of belief. Philosophical Essay Dediccated To Lennart Aqvist on his Fiftieth Birthday, 88 101.

G ardenfors, P. (1988). Knowledge in Flux: Modeling the dynamics of epistemic states. MIT Press.

Gelfond, M. (2008). Answer sets. In Handbook of Knowledge Representation, chap. 7, pp. 285 316. Elsevier.

G omez, S. A., Ches nevar, C. I., & Simari, G. R. (2010). Reasoning with inconsistent ontologies through argumentation. Applied Artiﬁcial Intelligence, 24(1&2), 102 148.

Greco, S., & Molinaro, C. (2012). Probabilistic query answering over inconsistent databases. Annals of Mathematics and Artiﬁcial Intelligence (AMAI), 64(2-3), 185 207.

Haase, P., van Harmelen, F., Huang, Z., Stuckenschmidt, H., & Sure, Y. (2005). A framework for handling inconsistency in changing ontologies. In Proc. of 4th International Semantic Web Conference (ISWC 05), pp. 353 367.

Halaschek-Wiener, C., & Katz, Y. (2006). Belief base revision for expressive description logics. In Proc. of International Workshop on OWL: Experiences and Directions (OWLED 06).

Hansson, S. O. (1991). Belief Base Dynamics. Ph.D. thesis, Uppsala University, Department of Philosophy, Uppsala, Sweden.

Hansson, S. O. (1993). Theory contraction and base contraction uniﬁed. Journal of Symbolic Logic, 58(2), 602 625.

Hansson, S. O. (1994). Kernel contraction. Journal of Symbolic Logic, 59(3), 845 859.

Hansson, S. O. (1997). Semi-revision. Journal of Applied Non-Classical Logics, 7(1-2), 151 175.

Deagustini, Martinez, Falappa & Simari

Hansson, S. O. (2001). A Textbook of Belief Dynamics: Solutions to Exercises. Kluwer Academic Publishers, Norwell, MA, USA.

Harman, G. (2008). Change in view: Principles of reasoning. Cambridge University Press.

Harper, W. (1975). Rational Belief Change, Popper Functions and Counterfactuals. Synthese, 30, 221 262.

Horridge, M., Parsia, B., & Sattler, U. (2009). Explaining inconsistencies in OWL ontologies. In Scalable Uncertainty Management, pp. 124 137. Springer.

Huang, Z., van Harmelen, F., & ten Teije, A. (2005). Reasoning with inconsistent ontologies. In Proc. of 19th International Joint Conference on Artiﬁcial Intelligence (IJCAI 05), pp. 454 459.

Hu e, J., Papini, O., & W urbel, E. (2009). Merging belief bases represented by logic programs. In Proc. of 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 09), pp. 371 382.

Kalyanpur, A., Parsia, B., Horridge, M., & Sirin, E. (2007). Finding all justiﬁcations of OWL DL entailments. Springer.

Kalyanpur, A., Parsia, B., Sirin, E., & Hendler, J. A. (2005). Debugging unsatisﬁable classes in owl ontologies. Web Semantics: Science, Services and Agents on the World Wide Web, 3(4), 268 293.

Katsuno, H., & Mendelzon, A. O. (1991). On the diﬀerence between updating a knowledge base and revising it. In Proc. of 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR 91), pp. 387 394.

Katsuno, H., & Mendelzon, A. O. (1992). Propositional knowledge base revision and minimal change. Artiﬁcial Intelligence, 52(3), 263 294.

Konieczny, S., & P erez, R. P. (2002). Merging information under constraints: A logical framework. Journal of Logic and Computation, 12(5), 773 808.

Konieczny, S., & P erez, R. P. (2011). Logic based merging. Journal of Philosophical Logic, 40(2), 239 270.

Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., & Savo, D. F. (2010). Inconsistencytolerant semantics for description logics. In Proc. of 4th International Conference on Web Reasoning and Rule Systems (RR 10), pp. 103 117.

Lenzerini, M. (2002). Data integration: A theoretical perspective. In Proc, of 21st ACM SIGMOD Symposium on Principles of Database Systems (PODS 02), pp. 233 246.

Levi, I. (1977). Subjunctives, Dispositions and Chances. Synthese, 34(4), 423 455.

Liberatore, P., & Schaerf, M. (1998). Arbitration (or how to merge knowledge bases). Knowledge and Data Engineering, 10(1), 76 90.

Lin, J., & Mendelzon, A. O. (1998). Merging databases under constraints. International Journal of Cooperative Information Systems, 7(1), 55 76.

Lin, J., & Mendelzon, A. O. (1999). Knowledge base merging by majority. Applied Logic Series, 18, 195 218.

Datalog Ontology Consolidation

Lloyd, J. W. (1987). Foundations of Logic Programmming. Springer-Verlag.

Lukasiewicz, T., Martinez, M. V., & Simari, G. I. (2012). Inconsistency handling in datalog+/- ontologies. In Proc. of 20th European Conference on Artiﬁcial Intelligence (ECAI 12), pp. 558 563.

Martinez, M., Pugliese, A., Simari, G., Subrahmanian, V., & Prade, H. (2007). How dirty is your relational database? an axiomatic approach. In Mellouli, K. (Ed.), Proc. of 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 07), Vol. 4724 of Lecture Notes in Computer Science, pp. 103 114. Springer.

Meyer, T., Lee, K., & Booth, R. (2005). Knowledge integration for description logics. In Veloso, M., & Kambhampati, S. (Eds.), Proceedings of AAAI05, Twentieth National Conference on Artiﬁcial Intelligence, pp. 645 650. AAAI Press.

Newell, A. (1982). The Knowledge Level. Artiﬁcial Intelligence, 18, 87 127.

Nilsson, U., & Maluszynski, J. (1995). Logic, Programming and Prolog (2ed). John Wiley & Sons Ltd.

Parsons, S., Wooldridge, M., & Amgoud, L. (2003). Properties and complexity of some formal inter-agent dialogues. Journal of Logic and Computation, 13(3), 347 376.

Qi, G., & Hunter, A. (2007). Measuring incoherence in description logic-based ontologies. In Proc. of 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference (ISWC/ASWC 07), pp. 381 394.

Qi, G., Liu, W., & Bell, D. A. (2006). Knowledge base revision in description logics. In Proc. of 10th European Conference in Logics in Artiﬁcial Intelligence (JELIA 06), pp. 386 398.

Quine, W. V. O. (1986). Philosophy of logic. Harvard University Press.

Reiter, R. (1987). A theory of diagnosis from ﬁrst principles. Artiﬁcial Intelligence, 32(1), 57 95.

Rosati, R. (2011). On the complexity of dealing with inconsistency in description logic ontologies. In Proc. of International Joint Conference on Artiﬁcial Intelligence (IJCAI 11), pp. 1057 1062.

Rott, H. (1992). Modellings for belief change: Prioritization and entrenchment. Theoria, 58(1), 21 57.

Schlobach, S., & Cornet, R. (2003). Non-standard reasoning services for the debugging of description logic terminologies.. In Proceedings of the Eighteenth International Joint Conference on Artiﬁcial Intelligence (IJCAI 03), pp. 355 362.

Schlobach, S., Huang, Z., Cornet, R., & van Harmelen, F. (2007). Debugging incoherent terminologies. Journal of Automated Reasoning, 39(3), 317 349.

Staworko, S., Chomicki, J., & Marcinkowski, J. (2012). Prioritized repairing and consistent query answering in relational databases. Annals of Mathematics and Artiﬁcial Intelligence, 64(2-3), 209 246.

Deagustini, Martinez, Falappa & Simari

Turner, H. (2003). Strong equivalence made easy: nested expressions and weight constraints. Theory and Practice of Logic Programming, 3(4-5), 609 622.

von Leibniz, G. W. F. (1976). Philosophical Papers and Letters: a selection, Vol. 1. Springer.

Wassermann, R. (2000). An algorithm for belief revision. In Proc. of International Conference on Principles of Knowledge Representation and Reasoning (KR 00), pp. 345 352.

Wijsen, J. (2005). Database repairing using updates. ACM Transaction on Database Systems, 30(3), 722 768.