# consistent_knowledge_discovery_from_evolving_ontologies__93ac184c.pdf

Consistent Knowledge Discovery from Evolving Ontologies

Freddy L ecu e IBM Dublin Research Center, Ireland freddy.lecue@ie.ibm.com

Jeff Z.Pan University of Aberdeen, UK jeff.z.pan@abdn.ac.uk

Deductive reasoning and inductive learning are the most common approaches for deriving knowledge. In real world applications when data is dynamic and incomplete, especially those exposed by sensors, reasoning is limited by dynamics of data while learning is biased by data incompleteness. Therefore discovering consistent knowledge from incomplete and dynamic data is a challenging open problem. In our approach the semantics of data is captured through ontologies to empower learning (mining) with (Description Logics) reasoning. Consistent knowledge discovery is achieved by applying generic, signiﬁcative, representative association semantic rules. The experiments have shown scalable, accurate and consistent knowledge discovery with data from Dublin.

Introduction and Related Work Knowledge discovery, as an area focusing upon methodologies for extracting knowledge through deduction (a priori) or from data (a posteriori), has been largely studied in Database and Artiﬁcial Intelligence. Deductive reasoning e.g., logic reasoning (Reiter 1980) gains logically knowledge from pre-established (certain) knowledge statements, while inductive inference such as data mining (Agrawal, Imielinski, and Swami 1993) or learning (Fanizzi, d Amato, and Esposito 2010; V olker and Niepert 2011) discovers (uncertain) knowledge by generalizing from initial information. Data is dynamic, incomplete, especially when exposed through sensors (Labrinidis and Jagadish 2012). Tracking phenomena with multiple sensor readings is a challenging problem. From trafﬁc diagnosis (L ecu e 2012), systems monitoring (Song et al. 2014) to disease transmission prediction (Sadilek, Kautz, and Silenzio 2012), all are examples of scenarios where consistent knowledge needs to be derived from dynamic, incomplete data. Reasoning is strongly restricted by dynamics, variance, noisiness of data, enforcing knowledge to be regularly revisited (Anicic et al. 2011). Recent works in stream reasoning (Valle et al. 2009) handle the dynamics of semantic querying but are very limited in reasoning, and fail in recovering knowledge from data. Although (L ecu e and Pan 2013) exposed beneﬁts in combining reasoning and learning, the shortcomings are lack of scalability, over-speciﬁcation of rules, non-integration to reasoning

Copyright 2015, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

systems. On the other hand inductive learning is limited by the large ﬂuctuation of rules abstraction and its (high) theoretical complexity. Learning is also heavily biased by data incompleteness and inconsistency, making knowledge subject to incorrectness (Dietterich and Michalski 1981). They follow classic (raw) data mining techniques i.e., identiﬁcation of patterns using distance metrics (Ramaswamy, Rastogi, and Shim 2000) between syntactic values. (Srikant and Agrawal 1995) capture such patterns through frequent itemset mining (Agrawal et al. 1996). Their approach discovers recurring sequences of syntactic items and implication rules among those items e.g., rules buying milk implies buying bread with a conﬁdence of 70% are learnt in market basket analysis. (Dehaspe and Raedt 1997) constrain all rules to be represented following inductive logic programming (ILP), which signiﬁcantly improve scalability. Although metrics have been used to measure the quality of the derived rules, previous approaches fail in deriving scalable, accurate, and consistent knowledge. In this respect (Gal arraga et al. 2013) tackles the scalability issue when performing rule mining on semantic data but does not in the context of evolving data. We address discovery of consistent knowledge from dynamic semantic data . Given continuous knowledge, how do we capture a minimal albeit representative set of timeevolving trends to discover accurate, consistent knowledge? The semantics of data is captured through OWL (Web Ontology Language) ontologies, which are underpinned by Description Logics (DL) (Baader and Nutt 2003). Key contributions include: (1) We design the ﬁrst algorithm to learn DL rules, which are strictly more expressive than DL axioms and Datalog rules. (2) By exploiting the expressiveness of DL rules, we introduce the notions of signiﬁcative, representative association DL rules, enabling precise identiﬁcation of fundamental rules. The beneﬁts of combining learning and reasoning, validated in experiments, are: (i) logical representation of learnt rules, (ii) classiﬁcation, and abstraction of rules, (iii) tight integration of rules in reasoning; (iv) scalability, (v) accuracy of consistent knowledge discovery. Next section reviews the adopted logic and rule representation together with dynamic ontology (knowledge). Then we present inductive learning in dynamic ontologies. The next section presents how representative rules drive consistent knowledge discovery. Finally, we report experiments with real data from Dublin City and draw some conclusions.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Figure 1: O .= T , A . Sample of TBox T and ABox A.

Background Evolving and static background knowledge are represented using an ontology. We focus on DL to deﬁne ontologies since it offers good reasoning support for most of its expressive families and compatibility to W3C standard OWL 2. We illustrate our work with DL EL++ (Baader, Brandt, and Lutz 2005). The selection of this DL fragment, which is the logic behind the basis of many more expressive DL, has been guided by (i) the semantics expressed by data in our application cf. Experimental Results, lessons learned, (ii) its polynomial time reasoning (satisﬁability, subsumption) when combined with EL++ rules. We review (i) DL basics of EL++, (ii) EL++ atomsets and rules, (iii) evolving ontologies and the underlying reasoning.

Description Logics EL++

A signature Σ, noted (NC, NR, NI) consists of 3 disjoint sets of (i) atomic concepts NC, (ii) atomic roles NR, and (iii) individuals NI. Given a signature, the top concept , the bottom concept , an atomic concept A, an individual a, an atomic role expression r, EL++ concept expressions C and D in C can be composed with the following constructs: | | A | C D | r.C | {a} The DL ontology O .= T , A is composed of a TBox T , and an ABox A. A TBox is a set of concept and role axioms. EL++ supports General Concept Inclusion axioms (GCIs e.g. C D), Role Inclusion axioms (RIs e.g., r s ). An ABox is a set of concept assertion axioms e.g., C(a), role assertion axioms e.g., R(a, b), individual in/equality axioms e.g., a = b, a = b. In this paper, we assume acyclic TBoxes which entail ﬁnitely instance statements. Example 1. (TBox and ABox Concept Assertion Axioms) Figure 1 presents (i) a TBox T where Disrupted Road (4) denotes the concept of roads which are adjacent to an event causing high disruption , (ii) concept assertions (16 18) denoting the individual r0 having ri,1 i 3 as adjunct roads. Table 1 sketches some completion rules (Baader, Brandt, and Lutz 2005) that are used to classify EL++ TBox T and entail subsumption. Reasoning with such rules is PTime Complete (Baader, Brandt, and Lutz 2008).

EL++ Atom, Atomsets, Binding and Rule We consider EL++ with (i) concept expressions C, role names NR, individual names NI, and (ii) a countable set of ﬁrst-order variables V.

R1 If X A, A B then X B R2 If X A1, An, A1 An B then X B R3 If X A, A r.B then X r.B R4 If X r.A, A A , r.A B then X B R5 If X r.A, A then X R6 If X r.A, r s then X s.A R7 If X r1.A, A r2.B, r1 r2 r3 then X r3.B

Table 1: EL++ TBox Completion Rules (no datatypes).

Atomset: Given terms x1, x2 V NI, a concept (role) atom is a formula C(x1) (R(x1, x2)) with C C (R NR). We use ﬁnite sets (atomsets) B of (concepts, roles) atoms for representing conjunction x. B where x .= x1, , xn V are variables of atoms B B which could be shared. Atomset Binding: Atomsets can be seen as conjunctive queries (Glimm et al. 2007) without non-distinguished variables. The arity of an atomset is the number of variables in an atomset. We write T , A |= B[ a] to denote that a NI is an answer to query B. In other words the variables x of atomset B are bound (mapped) by a. In the rest of the paper we used the terms answer and binding interchangeably. bind(B, T A) is the set of all bindings to B w.r.t. T , A.

Example 2. (Atomset & Binding w.r.t. O in Figure 1) Given atomset B .= {adj(x, r1)}, bind(B, O) is {(r0)}.

Atomset Containment: Let T be a TBox, B, C atomsets with the same arity. Then B is contained in C w.r.t. T , written B T C, if for all consistent ABoxes A w.r.t. T , we have bind(B, T A) bind(C, T A).

Example 3. (Atomset Containment w.r.t. O in Figure 1) Let B, C be atomsets {(1)} and {(2)}; A be {Event(e1), Event(e2), Social Event(e1)}. B T C for A since all bindings (answers) of B are also bindings of C.

Social Event(x) (1) Event(x) (2)

EL++ Rules: EL++ rules (Kr otzsch, Rudolph, and Hitzler 2008) extends EL++ expressivity while maintaining polynomial time complexity of many typical inference problems. Given atomsets B, H, and all variables x V of atomset B H, an EL++ rule is a formula B H, such that B is cycle free and does not contain atom of the form R(x, x).

Example 4. (EL++ Rule) Below rule denotes if x3 is adjacent to a x2 where a highly disruptive event x1 occurs then buses are congested in x3 . {(5)} is atomset {(Road with.Congested Bus)(x3)}.

(Event disruption.High)(x1) (3) occur(x2, x1) adj(x3, x2) (4) (Road with.Congested Bus)(x3) (5)

Dynamics of Knowledge as Evolving Ontologies We represent dynamics of knowledge by an evolution of ontologies in Deﬁnition 1 (Huang and Stuckenschmidt 2005).

Figure 2: Evolving Ontologies P9 0(i), Q9 0(i), R9 0(i)i {5,6,7}.

Deﬁnition 1. (DL L Evolving Ontology) A DL L evolving ontology Pn m from point of time m to point of time n is a sequence of (sets of) Abox axioms (Pn m(m), Pn m(m + 1), , Pn m(n)) w.r.t a static TBox T in a DL L where m, n N and m < n.

Pn m(i) is a snapshot of an evolving ontology Pn m at time i, referring to ABox axioms with respect to a TBox in L. We will consider evolving ontologies Pn 0 for the sake of clarity.

Example 5. (DL EL++ Evolving Ontology) Figure 2 illustrates EL++ evolving ontologies P9 0, Q9 0, R9 0, related to events, travel time, buses, through snapshots at time i {5, 6, 7}. Their dynamic knowledge is captured by evolving ABox axioms e.g., (27) captures e2 as a social music event occurring in r2 at time 6 of P9 0.

By applying rules in Table 1 on static knowledge T , evolving ontology Pn 0 , snapshot-speciﬁc axioms are inferred. Example 6. (Reasoning in Evolving Ontology) (6), (7), as dynamic knowledge are entailed from axioms of O in Figure 1 and evolving ontologies P9 0, Q9 0, R9 0 in Figure 2 (cf. references to axioms A in |=(A)), by applying completion rules in Table 1 (cf. references to rules R in |=R). E.g., r0 and r3 are respectively entailed to be roads with: (i) disruptions, (ii) some congested buses, both at time 7.

O, P9 0(7) |=(3)-4),(7),(18),(30) R1,R2,R3,R7 Disrupted Road(r0) (6)

O, Q9 0 R9 0(7) |=(5-6),(9),(12),(31-32) R1,R2,R3,R4 with.Congested Bus(r3) (7)

ABox axioms (31) in Q9 0, (32) in R9 0 are both required to ﬁre GCIs (5 6) and entail (34). We say that (34) emphasizes an association ( in Figure 3) of Q9 0, R9 0 through (5 6) in T . Thus dynamic knowledge can be entailed by axioms from single (6) but also associated (7) evolving ontologies.

Inductive Reasoning in Evolving Ontologies Axioms which enable knowledge association e.g., (5 6) are rarely modeled a priori in a background knowledge T because of the uncertainty of evolving ontologies. An association of P9 0 (events) with Q9 0 (travel time) or R9 0 (buses information) cannot be derived a priori but only a posteriori by analyzing data. Indeed (5 6) are speciﬁc to cities subject to changes. Capturing such rules (e.g., dashed in Figure 3) in T would certainly extend the reasoning impact since they capture evolving knowledge. They could be used for inducing missing knowledge in Q9 0 or R9 0. How to discover

Figure 3: How to Discover an Association ( ) of Evolving Ontologies Q, R at time 7 with respect to T ?

knowledge association across evolving ontologies P and R or Q at time 7 with respect to T ? We tackle this problem by mining knowledge associations as EL++ rules.

Association EL++ Rules

We revisit the concept of association rules (Agrawal, Imielinski, and Swami 1993) in the context of evolving ontologies through an EL++ rule-based representation.

Deﬁnition 2. (Association EL++ Rule) Let T , A be EL++ axioms, Pn 0 , Qn 0 be EL++ evolving ontologies, B, H be atomsets where the set of variables in H, or var(H), is a subset of the variables in B. An association EL++ rule in Pn 0 Qn 0 is a DL EL++ rule B H (identiﬁed by atomset B H) such that i [0, n]: bind(B, T A Pn 0 (i))|var(H) = bind(H, T A Qn 0(i)).

Deﬁnition 2 identiﬁes an association as an EL++ rule between atomsets B, H if they have the same bindings for their variables across two evolving ontologies at a time i. Contrary to (L ecu e and Pan 2013), representing association as an arbitrary combination of ABox axioms, we consider EL++ rule as a broader framework. Indeed it supports (i) associations which can be modeled by EL++ rules, (ii) more generic rules since variables are accepted in atoms of EL++

rules, instead of only allowing constants in the other case, (iii) native combination of EL++ rule and axioms. In addition rules could be evaluated with respect to their bindings.

Example 7. (Association EL++ Rule) The rule (3-5) in Example 4 illustrates an association EL++

rule from P9 0 to Q9 0 R9 0. The atomset of this rule is bound at time 5 by {(e1, r0, r1)} and 7 by {(e3, r0, r3)}.

We measure the interestingness of association rules by adapting the concepts of support (Deﬁnition 3) and conﬁdence (Deﬁnition 5) introduced in the database community.

Deﬁnition 3. (Atomset Support) Given axioms O .= T , A , an evolving ontology Pn 0 , atomset B, the support of B, noted σ(B), in [0, 1] is deﬁned by:

σ(B) .= |{i [0, n] | a NI : O, Pn 0 (i) |= B[ a]}| n + 1 (8)

where the expression |S| refers to the cardinality of S.

The support of atomset B is the proportion of snapshots where B has at least one binding a in A Pn 0 with respect to T . Since B may have multiple bindings in A Pn 0 at any time i [0, n], we capture their number in Deﬁnition 4.

Deﬁnition 4. (Atomset Weight) Given EL++ axioms O .= T , A , evolving ontology Pn 0

Ontology P7 5 Q7 5 R7 5 Q7 5 R7 5 Atomset B {(21)} {(22)} {(28)}*{(31)}* {(32)}* {(23)} Variable V (x1) (x1, x2, x3) (x3) (x3) (x4, x3) (x3)

Binding at Time

5 {(e1)} {(e1, r1, r0)} {(r1)} {(b1, r1)} {(r1)} 6 {(e2)} {(e2, r2, r0)} {(r2)} {(b2, r2)}

7 {(e3), (e4)} {(e3, r3, r0), {(r3)}{(r3)} {(b3, r3)} {(r3)} (e4, r3, r0)}

σ(B) 1 1 1 1/3 1 2/3

ω(B) 4 4 3 1 3 2

[*] From now on all terms NI of (28), (31-32) are free variables in V.

Table 2: Binding, Support and Weight of Atomsets B.

and atomset B, the weight of B, noted ω(B), is deﬁned by:

i=0 |{ a NI | O, Pn 0 (i) |= B[ a]}| (9)

Example 8.(Atomsets Support and Weight) Let P7 5, Q7 5, R7 5 be P9 0, Q9 0, R9 0 restricted to [5, 7] where ABox statements related to e4 extends P7 5 at time 7 cf. Table 2. This table illustrates the support σ, weight ω of atomsets e.g., {(5)} is bound in Q7 5 R7 5 at time 5 by {(r1)} and 7 by {(r3)}, but not at time 6, thus σ({(5)}) is 2/3. The number of bindings of {(5)}, noted ω({(5)}), is 2 in Q7 5 R7 5.

Deﬁnition 5. (Conﬁdence of an Association EL++ Rule) Let B H be an association EL++ rule in Pn 0 Qn 0. The conﬁdence γ of B H in (0, 1]2 is:

γ(B H) .= σ(B H)

σ(B) , ω(B H)

σ(B H), ω(B H) are support and weight of B H i.e., respectively: the proportion of snapshots in Pn 0 Qn 0 where B H has at least one binding, and its number of bindings. The conﬁdence is deﬁned as the percentage of (i) snapshots in Pn 0 Qn 0 where B H has at least a binding with regard to those where B has a binding, and (ii) bindings of B H in Pn 0 Qn 0 with regard to those which bind B. That is, they represent complementary conditional probabilities:

p(O, Qn 0(i) |= H[ a] | O, Pn 0 (i) |= B[ a]|var(H)) (11)

evaluated with respect to the number of (i) snapshots and (ii) bindings i.e., respectively σand ω-related entry of (10).

Example 9. (Conﬁdence of an Association EL++ Rule) Conﬁdence γ(B H) with B :{(3), (4)}, H :{(5)} is: σ({(3), (4), (5)})

σ({(3), (4)}) , ω({(3), (4), (5)})

ω({(3), (4)})

B H is correct in 2/3 of time [5, 7] and its atomset B H is bound by 3/4 of bindings of atomset B : {(3), (4)}. Remark 1. (Rule Conﬁdence and Ordering) The level of conﬁdence of rules can be compared by analyzing their supports and weights since conﬁdence is deﬁned as a tuple (Deﬁnition 5) e.g., γ(B1 H1) > γ(B2 H2) if

σ(B1) > σ(B2 H2)

σ(B2) ω(B1 H1)

ω(B1) > ω(B2 H2)

If conﬂict e.g., value of 1st element of γ(B1 H1) is better than 1st element of γ(B2 H2) but worse for 2nd element, we compare a weighted average of normalised components.

Mining Association EL++ Rules

Mining EL++ rules consists in generating all rules with a minimum support σmin, conﬁdence γmin, weight ωmin. This can be decomposed (Agrawal, Imielinski, and Swami 1993) by: (i) ﬁnding all signiﬁcant atomsets i.e., atomsets with support, weight above σmin, ωmin, (ii) using them to generate rules that meet γmin. Algorithm 1 revisits WARMR (Dehaspe and Raedt 1997) to support EL++ atomsets bindings, containment and weight. It ﬁnds all signiﬁcant atomsets by exploiting the lattice structure the containment relation T is imposing on the space of atomsets to perform a breadth-ﬁrst search. Sk refers to the set of signiﬁcant atomsets mined at depth k in the lattice while Ck is the set of potential candidates for Sk. Their elements are called k-atomsets i.e., atomsets of k atoms. Initially (line 4) the set of signiﬁcant 1-atomsets S1 is determined. A subsequent pass (line 5), say pass k, consists of two phases. First (line 6), the signiﬁcant atomsets Sk 1 found in the (k 1)th pass are used to generate the candidate atomsets Ck, using Algorithm 2. Pn 0 is then analyzed (line 7) to determine support (line 10) and weight (line 11) of candidates C Ck, when bound in Pn 0 (line 9).

Algorithm 1: atomsets-mining On 0 , σmin, ωmin .

1 Input: EL++ axioms O .= T , A ; EL++ evolving ontology Pn 0 ; Min. threshold of support σmin and weight ωmin.

2 Result: Set of signiﬁcant atomsets.

4 S1 Set of signiﬁcant 1-atomsets; % Initialization

5 foreach k 2 | Sk 1 = do % k-atomsets on top of k-1 ones

6 Ck atomsets-gen Sk 1 ; % k-atomsets Candidates

7 foreach point of time i [0, n] do % Snapshot Pn 0 (i)

8 % Atomsets with bindings a in Pn 0 (i)

9 foreach C Ck | a : T , A Pn 0 (i) |= C[ a] do

10 count(C) count(C) + 1; % Needed for σ(C)

11 ω(C) ω(C) + { a | T , A Pn 0 (i) |= C[ a]} ;

12 % Only signiﬁcant atomsets are considered

13 Sk {C Ck | count(C)

(n+1) σmin, ω(C) ωmin};

14 return k Sk;

Algorithm 2 generates candidate (k + 1)-atomsets from signiﬁcant k-atomsets. In join-step (lines 5-7) where atomsets are sorted in their lexicographic order ( lex), Sk is combined with itself on the basis of its k common atoms. In the prune-step, we discard all atomsets that have a subset which does meet σmin, ωmin i.e., not in Sk (line 10). It is trivial to show that any atomset in Ck+1 is signiﬁcant only if all subsets of size k are also signiﬁcant by revisiting the results of (Agrawal et al. 1996) for EL++ atomsets. Thus line 10 maintains completeness. We also prune (k + 1)-atomsets which are redundant i.e., already contain(ed by) Sk (line11). The complexity of Algorithm 1 is Θ(|S|2 (n+1)) where |S| is the number of atoms in Pn 0 . The prune-step in Algorithm 2 controls the exponential growth of atomsets.

Example 10. (Atomset Mining - Case k = 2) Let {(3)},{(4)},{(2)} be signiﬁcant 1-atomsets. After the join-step of Algorithm 2 we obtain 2-atomsets

Algorithm 2: Atomsets Generation: atomsets-gen Sk

1 Input: A terminology T ; Set of Signiﬁcant k-atomsets Sk.

2 Result: Set of Candidate (k + 1)-atomsets Ck+1.

4 % Join-step of k-atomsets from the same level k to obtain Ck+1 5 C k+1 {{S1, ..., Sk, Tk} | {S1, ..., Sk 1, Sk} Sk,

6 {S1, ..., Sk 1, Tk} Sk,

7 Sk lex Tk}

8 % Prune-step of (k + 1)-atomsets

9 Ck+1 C k+1 \ { {S1, ..., Sk, Tk} |

10 (i) S {S1, ..., Sk, Tk} with |S| = k such that S / Sk,

11 (ii) {Tk} T ( T ){Sk} } % Redundant k-atomsets

12 return Ck+1;

{(3), (4)}, {(4), (2)}, {(3), (2)}. The last one is discarded during pruning (line 11) since {(3)} T {(2)}.

For generating EL++ rules with minimum conﬁdence (Deﬁnition 5), we refer to ap-genrules (Agrawal et al. 1996). This procedure for large itemsets, with minor modiﬁcations e.g., rules representation (Deﬁnition 2), applies to signiﬁcant atomsets. Its complexity is Θ(maxk |Sk| k), where |Sk| is the number of signiﬁcant k-atomsets.

Knowledge Discovery in Evolving Ontologies Knowledge is discovered by (i) determining its representative rules (Deﬁnition 7), and (ii) exploiting their combination with background knowledge (Algorithm 3).

Representative Association EL++ Rules Although measures of support, weight, conﬁdence largely pruned uninteresting rules, the remaining ones are not necessarily all fundamental to discover new knowledge. Indeed some can be logically derived from a minimal set of rules, which represents the representative inductive rules. Definition 7 formalizes this concept of representative rule by reﬁning the notion of cover (Zaki 2000) in Deﬁnition 6.

Deﬁnition 6. (Association EL++ Rules Cover) Let R be association EL++ rules. The cover of a rule ρ : B H in R is deﬁned by Γ(ρ) .= {A H in R | A T B}.

The rule ρ, covering rules in Γ(ρ), synthesizes all rules deriving same consequents as ρ from any contained antecedents. It captures all rules which convey to similar knowledge but requiring more speciﬁc knowledge than ρ.

Deﬁnition 7. (Representative Association EL++ Rules) A set of representative association DL rules of R, denoted by R , is deﬁned by {ρ R | ρ R, ρ = ρ, ρ Γ(ρ )}.

A set of representative association DL rules is a least set of rules covering (Deﬁnition 6) all rules in R. It captures a synthesis set of R, which is required to derive any rule in R. Thus all rules in a cover can be derived from their representative rule by atomset containment.

Proposition 1. (Minimum Support, Weight of Rules in R ) σ(ρ ) σmin and ω(ρ ) ωmin ρ R iff ρ R | (i) σ(ρ) σmin, (ii) ω(ρ) ωmin and (iii) ρ Γ(ρ ).

Figure 4: B H as Representative Rule of C H, D H.

Proposition 1 states that any representative rule has minimum support, weight if it covers a rule with minimum support, weight. The proof is trivial per Deﬁnitions 6 and 7. Example 11. (Rules Cover and Representative Rules) Let R be {(2), (4)} {(5)}, {(1), (4)} {(5)}, {(3), (4)} {(5)} in Figure 4. The ﬁrst rule covers the last two since antecedents (1), (3) are contained by (2) cf. Examples 3, 10 while their consequent are similar. R is {(2), (4)} {(5)}, covering the other two rules. The semantics of atomsets is crucial to prune large sets of rules through its representative rules. The more semantic relations (atomsets containment) among evolving ontologies the more (resp. less) covered (resp. representative) rules.

Consistent Inductive Knowledge Discovery (CIKD) Algorithm 3 presents our general approach CIKD for inducing consistent knowledge from Pn 0 to Qn 0 at time i. All signiﬁcant rules R in Pn 0 Qn 0, which are generated from signiﬁcant atomsets (line 5), are identiﬁed by adapting ap-genrules (Agrawal et al. 1996) in line 7. Besides atom(sets) to be substituted by item(sets), our revisited definitions of support and conﬁdence needs to be applied. Its representative rules R are then elaborated (line 8). Proposition 1 ensures minimum support, weight. All representative rules ρ : B H, ﬁltered by conﬁdence (line 10), are evaluated at time i of Pn 0 , Qn 0 (line 13). We checked consistency of Qn 0(i) with knowledge H derived from ρ, especially for all bindings a of B with respect to Pn 0 (i). This ensures that only consistent knowledge H( a) is derived from B( a) and added to Qn 0(i). This condition validates the consistent combination TBox, ABox, rules axioms at time i of Pn 0 , Qn 0. Its complexity is polynomial since (i) the representative rules generation (lines 4-8) is polynomial in the number of snapshots, atoms (line 5 i.e., Algorithms 1,2), k-atomsets (line 7), rules (Deﬁnition 7); (ii) their consistency checking (line 13), together with (iii) atomset binding, containment are polynomial in EL++ (Bienvenu, Lutz, and Wolter 2012). Example 12. (Consistent Knowledge Discovery) Suppose Q7 5, R7 5 not exposed at time 5 due to defective sensors. R7 5(5) cannot be deduced from T neither P7 5(5). So we apply Algorithm 3 with e.g., T , P7 5, R7 5, 5, 2/3, 2, (2/3, 3/4) . This leads to the representative rule (among others) in Example 11, which derives consistent knowledge in R7 5(5): (Road with.Congested Bus)(r1). Applying the same approach to R7 5(6) ends up with consistent but not accurate knowledge, hence the use of support, weight, conﬁdence.

Experimental Results We report (i) scalability, (ii) accuracy of Algorithm 3 (noted [A3]). The experiments have been conducted on a server of

Algorithm 3: CIKD T , Pn 0 , Qn 0 , i, σmin, ωmin, γmin

1 Input: Terminology T ; Evolving Ontologies Pn 0 , Qn 0 ; Time i; Min. support σmin, weight ωmin, conﬁdence γmin.

2 Result: Consistent knowledge Qn 0 (i) induced from Pn 0 Qn 0 .

4 % S : Set of signiﬁcant atomsets in evolving ontology Pn 0 Qn 0 5 S atomsets-mining Pn 0 Qn 0 , σmin, ωmin ;

6 % R : Set of rules in Pn 0 Qn 0 with minimum conﬁdence γmin 7 R ap-genrules S, Pn 0 Qn 0 , γmin ; % Version adapted

8 R {ρ R | ρ R, ρ = ρ, ρ Γ(ρ )}; % Deﬁnition 7

9 % All representative rules ρ : B H with highest conﬁdence

10 foreach rule ρ : B H in R | ρ R , γ(ρ ) > γ(ρ) do

11 % Consistency of rule ρ at time i of Pn 0 Qn 0 12 if T Qn 0 (i) {H( a)} |=

13 a | T Pn 0 (i) |= B[ a]|var(H)

14 % Qn 0 (i) and knowledge H[ a] from inductive reasoning

15 then Qn 0 (i) Qn 0 (i) {H[ a]}; ;

16 return Qn 0 (i);

Data Set Size (Mb / Day) #Axioms / Update #RDF Triples / Update [a] Weather 3 53 318 [b] Travel Time 43 270 810 [c] Incident 0.1 81 324 [d] Event 9.5 480 1, 150 [e] Bus 120 3, 000 12, 000

Table 3: Dynamic Data / Evolving Ontologies Details.

4 Intel(R) Xeon(R) X5650, 2.67GHz cores, 6GB RAM. Context: Dynamic data: [a] weather, [b] travel time, [c] incident, [d] event, [e] bus location in Dublin (Table 3) is transformed in EL++ using mapping techniques. We used a ﬁxed (off-line) window of n = 4, 320 snapshots (48 hours) for mining. We considered an ontology T of 55 concepts and 19 roles. The objective is to derive the status of buses (by mining EL++ rules across semantic data i.e., association of types of weather, travel condition, incident, events, bus delays) when not retrievable (due to 34% of missing data). Scalability Result: Figure 5 reports scalability of [A3] ((σmin, ωmin, γmin) being (1/2, n, (2/3, 3/4))) and compares its computation time with (Dehaspe and Raedt 1997) using inductive logic noted [D97], (Srikant and Agrawal 1995) using basic taxonomies noted [S95] and (L ecu e and Pan 2013) using ABox axioms-related rules noted [L13]. The evaluation is achieved on different (i) variations of T i.e., T [0], T [50], T [100] capturing a proportion of 0%, 50%, 100% of GCIs, RIs of T , (ii) number of evolving ontologies |s| i.e., {1, 3, 5} for respectively [e], [c,d,e], [a,b,c,d,e] in Table 3. The scalability of all approaches decreases with the number of evolving ontologies, axioms. [D97] is the most scalable (when |s| > 1) since it supports pruning strategies. Its performance remains unchanged for any variation of T since no semantics is supported. [S95], [A3] improve their scalability by clustering rules using semantics, which highly reduces the number of representative rules for [A3]. The off-line version of [L13] is the least scalable since knowledge discovery is based on the high number signiﬁcant rules. Our approach outperforms (i) [S95] when numerous evolving ontologies

Figure 5: Scalability. 1st x axis: Approaches on 5 Test Cases. 1st y axis: Computation Time in Seconds. 2nd x axis: 5 Types of Semantic and Evolving Ontologies Conﬁgurations. 2nd y axis: Search Space of Association Rules.

c1 c2 c3 c4 c5 c6 c7 c8 σmin .4 .4 .4 .4 .8 .8 .8 .8 γmin (.4, .4) (.4, .8) (.8, .4) (.8, .8) (.4, .4) (.4, .8) (.8, .4) (.8, .8)

Table 4: Support σmin, Conﬁdence γmin Conﬁguration.

Figure 6: Accuracy. 1st x axis: Approaches on 8 Test Cases. 1st y axis: Accuracy of Discovered Knowledge. 2nd x axis: 8 Types of Support / Conﬁdence Conﬁgurations (Table 4).

occur, (ii) [D97] when more semantics is captured by T . Accuracy Result: Figure 6 reports accuracy with Table 4 as conﬁguration. The weight is interpreted in [A3]. T [50] and all ontologies are considered. Accuracy is measured by validating induced knowledge over 10, 000 past situations where buses status is known. All approaches are compared as they expose similar results with different representation. [A3] outperforms all approaches, even signiﬁcantly when (i) weight associated to conﬁdence γmin is higher than .4, (ii) support σmin is .8. The experiments v.s. [L13] show that genericity, representativeness, weight of rules largely contribute in reducing their quantity while improving quality. Lessons Learnt: The semantics of rules and its atomsets together with representativeness beneﬁts classical rules mining approaches. Our approach, as a variant of [L13], [D97], [S95], beneﬁts from (i) [L13] to discover association, (ii) [D97] to prune atomsets (for scalability), (iii) [S95] to capture their semantics (for consistency, accuracy). The scalability (accuracy) of knowledge discovery is negatively

(positively) impacted by the number of data sources, snapshots, axioms c.f. Algorithms 1, 2. Their number are critical as they drive heterogeneity in rules, which could improve accuracy, but not scalability. It would be worst with more expressive DLs due to binding and containment checks.

Conclusion and Future Work

Our approach, combining inductive and deductive reasoning, discovers consistent knowledge by mining and applying association EL++ rules across DL-augmented dynamic data. Existing approaches learn knowledge from raw data while we exploit its semantics and consistency during the learning phase. Semantics was essential for (i) capturing consistent knowledge across evolving ontologies, (ii) raising its accuracy, (iii) improving scalability through identiﬁcation of representative rules. Experiments have shown scalable, accurate, consistent knowledge discovery in Dublin. In future work we will investigate scalable and incremental re-adjustment of rules in a context of streaming data.

Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; and Verkamo, A. I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press. 307 328. Agrawal, R.; Imielinski, T.; and Swami, A. N. 1993. Mining association rules between sets of items in large databases. In SIGMOD Conference, 207 216. Anicic, D.; Fodor, P.; Rudolph, S.; and Stojanovic, N. 2011. Ep-sparql: a uniﬁed language for event processing and stream reasoning. In Proceedings of the 20th international conference on World wide web, 635 644. ACM. Baader, F., and Nutt, W. 2003. In The Description Logic Handbook: Theory, Implementation, and Applications. Baader, F.; Brandt, S.; and Lutz, C. 2005. Pushing the el envelope. In IJCAI, 364 369. Baader, F.; Brandt, S.; and Lutz, C. 2008. Pushing the el envelope further. In OWLED. Bienvenu, M.; Lutz, C.; and Wolter, F. 2012. Query containment in description logics reconsidered. In KR. Dehaspe, L., and Raedt, L. D. 1997. Mining association rules in multiple relations. In ILP, 125 132. Dietterich, T. G., and Michalski, R. S. 1981. Inductive learning of structural descriptions: Evaluation criteria and comparative review of selected methods. Artiﬁcial intelligence 16(3):257 294. Fanizzi, N.; d Amato, C.; and Esposito, F. 2010. Induction of concepts in web ontologies through terminological decision trees. In ECML/PKDD (1), 442 457. Gal arraga, L. A.; Teﬂioudi, C.; Hose, K.; and Suchanek, F. M. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In 22nd International World Wide Web Conference, WWW 13, Rio de Janeiro, Brazil, May 13-17, 2013, 413 422.

Glimm, B.; Horrocks, I.; Lutz, C.; and Sattler, U. 2007. Conjunctive query answering for the description logic shiq. In IJCAI, 399 404. Huang, Z., and Stuckenschmidt, H. 2005. Reasoning with multi-version ontologies: A temporal logic approach. In ISWC, 398 412. Kr otzsch, M.; Rudolph, S.; and Hitzler, P. 2008. Description logic rules. In ECAI, 80 84. Labrinidis, A., and Jagadish, H. 2012. Challenges and opportunities with big data. Proceedings of the VLDB Endowment 5(12):2032 2033. L ecu e, F., and Pan, J. Z. 2013. Predicting knowledge in an ontology stream. In IJCAI. L ecu e, F. 2012. Diagnosing changes in an ontology stream: A dl reasoning approach. In AAAI. Ramaswamy, S.; Rastogi, R.; and Shim, K. 2000. Efﬁcient algorithms for mining outliers from large data sets. In ACM SIGMOD Record, volume 29, 427 438. ACM. Reiter, R. 1980. A logic for default reasoning. Artiﬁcial intelligence 13(1):81 132. Sadilek, A.; Kautz, H. A.; and Silenzio, V. 2012. Predicting disease transmission from geo-tagged micro-blog data. In AAAI. Song, X.; Zhang, Q.; Sekimoto, Y.; and Shibasaki, R. 2014. Intelligent system for urban emergency management during large-scale disaster. In AAAI, 458 464. Srikant, R., and Agrawal, R. 1995. Mining generalized association rules. In VLDB, 407 419. Valle, E. D.; Ceri, S.; van Harmelen, F.; and Fensel, D. 2009. It s a streaming world! reasoning upon rapidly changing information. IEEE Intelligent Systems 24(6):83 89. V olker, J., and Niepert, M. 2011. Statistical schema induction. In The Semantic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29-June 2, 2011, Proceedings, Part I, 124 138. Zaki, M. J. 2000. Generating non-redundant association rules. In KDD, 34 43.