# efficient_extraction_of_elontology_deductive_modules__bf16dd3f.pdf

Efficient Extraction of EL-Ontology Deductive Modules*

Hui Yang, Yue Ma, Nicole Bidoit

LISN, CNRS, Universit e Paris-Saclay yang@lisn.fr, ma@lisn.fr, nicole.bidoit@lisn.fr

Because widely used real-world ontologies are often complex and large, one important challenge has emerged: designing tools for users to focus on sub-ontologies corresponding to their specific interests. To this end, various modules have been introduced to provide concise ontology views. This work concentrates on extracting deductive modules that preserve logical entailments over a given vocabulary. Existing deductive module proposals are either inefficient from a computing point of view or unsatisfactory from a quality point of view because the modules extracted are not concise enough. For example, minimal modules guarantee the most concise results but their computation is highly time-consuming, while -modules are easy to compute but usually they contain many redundant items. To overcome computation cost and lack of quality, we propose to compute two different kinds of deductive modules called pseudo-minimal modules and complete modules for EL-ontology. Our deductive module definitions rely on associating a tree representation with an ontology, and their computation is based on SAT encoding. Our experiments on real-world ontologies show that our pseudominimal modules are indeed minimal modules in almost all cases (98.9%), and computing pseudo-minimal modules is more efficient (99.79 times faster on average) than the stateof-the-art method Zoom for computing minimal modules. Also, our complete modules are more compact than - modules, but their computation time remains comparable. Finally, note that our proposal applies to EL-ontologies while Zoom only works for EL-terminologies.

Introduction Description logic-based ontologies have been widely studied and used in many areas. However, real-world ontologies are often too big to be handled by humans. The most evident approach for overcoming this problem is to extract subontologies related to user interests. For example, the wellknown biomedical ontology Snomed CT contains more than 300,000 axioms. By extracting deductive modules, we could provide doctors with small sub-ontologies of Snomed CT based on symptoms to establish a diagnosis. Extracting deductive modules has been used for various different areas,

*This work is funded by the BPI-France (PSPC AIDA: 2019PSPC-09). Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

like ontology debugging (Arif et al. 2016), re-use (Jim enez Ruiz et al. 2008), and forgetting (Koopmann and Schmidt 2013). We can distinguish two classes of deductive modules. The first class is syntactical locality-based modules such as - modules (Sattler, Schneider, and Zakharyaschev 2009) and AMEX-modules (Gatens, Konev, and Wolter 2014). Those modules can be computed efficiently, but usually, they contain many unnecessary terms. The second class is subsetminimal modules such as minimal modules (Chen et al. 2017). They do not contain any redundant terms but suffer from high complexity and are time-consuming in practice. In this work, we propose a new method for computing deductive modules for EL ontologies to balance the computation cost and the result quality. Our method is inspired by the SAT-based approach (Arif, Menc ıa, and Marques-Silva 2015; Manthey, Pe naloza, and Rudolph 2016) developed to compute justifications, which are minimal sub-ontologies that derive a given entailment. The main idea of these SATbased methods is to encode the derivations of a given entailment as a set of Horn-clauses, then it enumerates all the justifications of this entailment by SAT tools or resolution (Kazakov and Skoˇcovsk y 2018). However, the computation of deductive modules is much more complex: First, the input is a vocabulary instead of an entailment and it could be complicated to generate all the entailments over the given vocabulary; Second, there may exist (even infinitely) many entailments over a given vocabulary. Therefore, instead of using justifications of the entailments directly, one has to find other proper ways to tackle the computation of deductive modules. Our contribution is twofold: (i) We associate a forest with each given ontology and vocabulary to efficiently capture the entailments over the vocabulary. The definition of forest is inspired by the regular tree grammar developed in (Nikitina and Rudolph 2014). We can regard the forest and the regular tree grammar as a set of derivation trees and derivation rules that generate entailments over a given vocabulary, respectively. Moreover, we are able to deal with the case of infinitely many entailments by considering a finite subset of trees from our forest. (ii) We introduce two novel notions of deductive modules called pseudo-minimal modules and complete modules, and we develop an efficient SAT-based algorithm to compute them based on the notion

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

of the forest. Our pseudo-minimal modules are quite interesting approximations of minimal modules: (1) They are indeed minimal modules when there are finitely many entailments over a given vocabulary; (2) Moreover, our algorithm is 99.79 times faster on average than the state-of-the-art algorithm Zoom (Chen et al. 2017) which computes all the minimal modules but only for EL-terminologies. Compared to pseudo-minimal modules, our complete modules are less concise but easier to compute. They are far more concise than the -module, as demonstrated by our experiment, but their calculation time remains comparable. This paper is organized as follows. First, we introduce our notion of the forest and define pseudo-minimal modules based on this notion, while complete modules are defined directly from the definition of deductive modules (Definition 2). Next, we compute pseudo-minimal modules and complete modules using Horn-clause encoding. Then, we validate our method by running experiments on real-world ontologies with the prototype For Mod. Finally, after the discussion of the related work, we conclude the paper with some future work directions.

Preliminaries Given finite sets of atomic concepts NC={A, B, } and atomic roles NR={r, s, }, EL-concepts C and ELaxioms α are built by the grammar rules (i) C ::= | A | C C | r.C or (ii) α ::= C C. An EL-ontology O is a finite set of EL-axioms. We denote by sig(O) (resp. sig(C)) the set of the atomic concepts and roles that appear in O (resp. C). For example, sig(B1 r.B2)={r, B1, B2}. An interpretation I=( I, I) of ontology O consists of a non-empty set I and a mapping from each atomic concept A NC to a subset AI I, and from each role r NR to a subset r I I I. For a concept C, we define CI inductively by: ( )I= I, (C D)I=CI DI, ( r.C)I={a I | b CI, (a, b) r I}. An interpretation is a model of O if it is compatible with all axioms in O, i.e., for all C D O, we have CI DI. We say O|=α where α is an axiom iff any model of O is compatible with α. An EL-ontology O is normalized if all its axioms are of the form: A B1 Bn, B1 Bn A, r.B A, A r.B, where A, B, Bi NC, r NR, n 1. Every EL-ontology can be normalized in polynomial time by introducing new atomic concepts. Let LA = {C | C A O}, RA = {D | A D O}. We say an atomic concept A is primitive iff (i) LA = or (ii) LA = RA and |LA| = 1. O is a terminology iff all the atomic concepts in O are primitive. The notion terminology defined here equals to that introduced in (Konev, Walther, and Wolter 2009). We state the definition in a different way because we use a different form of normalized ontologies. Example 1. O defined below is a normalized ontology. O is a terminology because all its atomic concepts are primitive. O={α1:A r.B1, α2:B1 A1 A2, α3:A r.A1 α4: r.B2 A2, α5:A3 A4 B2}. Definition 1 (Justification). Given an ontology O such that O|=A B. A justification of A B is a minimal sub-ontology J O such that J|=A B.

Given two ontologies O1, O2 and a signature Σ NC NR of atomic concepts and roles, the logical difference (Konev et al. 2012) between O1 and O2 w.r.t. Σ is defined as the set of axioms inferred by O1 but not inferred by O2:

c DiffΣ(O1, O2)={α | sig(α) Σ, O1|=α, O2 |=α}.

Definition 2 (Deductive module). A deductive module for an ontology O and a signature Σ is a sub-ontology M O such that c DiffΣ(O, M)= . Moreover: M is a minimal module for O and Σ if M is a minimal (under inclusion) deductive module for O and Σ. M is a complete module for O and Σ if M contains all minimal modules for O and Σ. It is clear that the union of all minimal modules is a complete module, and so is the ontology itself. There can be multiple deductive, minimal, and complete modules. Example 2 (Example 1 cont d). Assume that Σ is the signature given by {A, A1, A2, A3, A4, r}. Then O \ {α3} is a deductive module for O and Σ. O \ {α3} is also a minimal module because no proper subset of O \ {α3} is a deductive module. Moreover, O\{α3} is also a complete module since it is the unique minimal module for O and Σ. We now introduce the notion of uniform interpolation, which provides a way to capture entailments over a given signature w.r.t. an ontology O as defined below: Definition 3 (Uniform interpolation). Given an ontology O and a signature Σ, an ontology UO Σ is a uniform interpolation (hereafter UI) for O and Σ iff (i) sig(UO Σ ) Σ; (ii) O |= α UO Σ |= α for any axiom α with sig(α) Σ.

Example 3 (Example 2 cont d). The ontology UO Σ = {β1 : A r.(A1 A2), β2 : A r.A1, β3 : r.(A3 A4) A2} is a uniform interpolation for O and Σ. Given an ontology and a signature, a UI does not always exist. To overcome this problem, several approaches (Lutz, Piro, and Wolter 2010; Calvanese, Giacomo, and Lenzerini 1999) have been proposed to provide approximations of UI. In the following, our definitions and theorems always assume that O is a normalized EL-ontology and Σ a signature unless otherwise stated.

Introducing Forest F O Σ Now, we analyze possible ways to compute deductive models and then introduce our notion of forest F O Σ .

Motivation Assuming UO Σ is a UI for an ontology O and a signature Σ, we can compute deductive modules for O and Σ using the justifications of each axiom β UO Σ (see Figure 1). More precisely, consider the following collection of subontologies defined as the union of justifications:

Jα | Jα is a justification of α}.

Each element M in S is a sub-ontology of O and contains exactly one justification for each element in UO Σ . Thus it is a

Σ Deductive modules

UI Ontology Signature

justification

Figure 1: The main schema

deductive module for O and Σ. Moreover, we can conclude that the minimal modules for O and Σ are identical to the subset-minimal sub-ontologies in the collection S.

Example 4 (Example 3 cont d). Recall the ontology O, the signature Σ and the UI UO Σ of our running example. We know that there are one justification for β1: Jβ1 = {α1, α2}, two justifications for β2: J1 β2 = {α1, α2}, J2 β2 = {α3} and one justification for β3: Jβ3 = {α4, α5}. Therefore, the collection S = {O, O\{α3}} contains two different deductive modules for O and Σ. Moreover, O\{α3} is the unique minimal module for O and Σ.

In Figure 1, computing deductive modules through UI and justifications is shown by the black bold arrows. There are two main difficulties in implementing this simple idea: (i) computing UIs is hard, and (ii) a UI does not always exist. Our contribution to overcome these difficulties is to proceed to the computation of deductive modules through a new object F O Σ , called a forest, associated with the ontology O and the signature Σ. This alternative computation process is depicted by the red arrows in Figure 1. The notion of forest F O Σ , formally introduced hereafter, is inspired by the regular tree grammar developed in (Nikitina and Rudolph 2014). The forest F O Σ and the regular tree grammar can be regarded respectively as a set of derivation trees and derivation rules that generate candidate axioms of a UI. Our method for computing deductive modules relies on the structure of the trees in F O Σ instead of using justifications, and has mainly three benefits: (i) the computation of F O Σ is efficient; (ii) our method works no matter a UI exists or not; (iii) our method can be easily encoded as a SAT-problem and thus solved by efficient SAT-tools. Next, we develop the presentation of forests F O Σ and show the relation between F O Σ and UI (i.e., the dotted arrow in Figure 1). In the following sections, we introduce pseudominimal modules using F O Σ and describe For Mod, our SATbased algorithm, to compute them.

Definition of Forest F O Σ Next, we consider labeled trees. Each labeled tree t is associated with a label map lab that maps (i) node n in t to an atomic concept; (ii) edge e in t to a condition of the form α O or O|=A B. For simplicity, we use r(t) to denote the root of t, and At the label of the root (i.e., At = lab(r(t))). The forest F O Σ consists of two kinds of trees, forward trees and backward trees, defined as follows:

Definition 4. A labeled tree t+ is a forward tree (hereafter f-tree) from A to Σ over O iff:

A α1 풪 횝+ ퟷ

Figure 2: f-trees (blue words are the label of edges).

1. the label of the root of t+ is A (i.e., At+ = A); 2. for node n t+ distinct from the root, lab(n) Σ iff n is a leaf of t+, 3. if the child set of a node n0 t+ is {n1, , nm} and Bi = lab(ni), 0 i m, then for each 0 i m, one of the following conditions holds: (a) B0 r.Bi O and r Σ, (b) O |= B0 Bi, where Bi is not primitive or Bi Σ. The edge e from n0 to ni is labeled by the condition1 that generates e. For the sake of the presentation, edges labeled by B0 r.Bi O (i.e., case 3(a)) are called r-edges. Example 5 (Example 4 cont d). For simplicity and without ambiguity, in the figure, we represent a node n by its label lab(n) and mark r-edges by r . In Figure 2, t+ 1 , t+ 2 are two f-trees from A to Σ over O, and t+ 2 is a sub-tree of t+ 1 . There are still 5 other f-trees from A to Σ over O that are proper sub-trees of t+ 1 . For Σ =Σ {B1}, the trees t+ 1 , t+ 2 are no longer f-trees from A to Σ over O, because t+ 1 , t+ 2 both contain an internal node labeled by B1 Σ , which violates the requirement 2 in Definition 4. Definition 5. A labeled tree t is a backward tree (hereafter b-tree) from A to Σ over O iff: 1. the label of the root of t is A (i.e., At = A); 2. for node n t distinct from the root, lab(n) Σ iff n is a leaf of t ; 3. if the child set of a node n0 t is {n1, , nm} and Bi = lab(ni), 0 i m, then one of the following conditions holds: (a) m = 1, r.B1 B0 O, r Σ; (b) m = 1, O |= B1 B0, B1 is not primitive or B1 Σ; (c) m > 1, B1 Bm B0 O. Again, here, the edge e from n0 to ni is labeled by the condition generating e. Similarly, edges labeled with r.B1 B0 O are called r-edges. Note that for b-trees, all edges from a node to its children are generated by the same condition. Example 6 (Example 4 cont d). In Figure 3, t 1 is the only b-tree from A2 to Σ over O. Note that the tree on the right of Figure 3 is not a b-tree from A2 to Σ over O because we do not have A3 B2 O nor O |= A3 B2, which violates the requirement 3 of Definition 5.

1The label lab(e) is B r.Bi O for the case 3(a); lab(e) is O|=B Bi for the case 3(b).

Not a b-tree:

Figure 3: b-tree (an example and a counter-example)

The forest F O Σ associated with an ontology O and a signature Σ is defined as follows: Definition 6. The forest F O Σ is the collection of all f-trees and b-trees from A to Σ over O, where A sig(O). Notice that F O Σ can be an infinite set. For example: Example 7. Let O1={A r.B, B r.B, B r.A} and Σ1={A, r}. Then, F O1 Σ1 contains infinitely many f-trees t+ 1 , t+ 2 , from A to Σ1 over O1 as shown in Figure 4. The f-trees in Figure 4 are generated by the loop starting at the node labeled by B. More precisely, we say that a tree t F O Σ contains a loop if there is a path p from root r(t) to a leaf of t with two different nodes having the same label. For the same reason, there could be infinitely many b-trees in a forest F O Σ . We have that F O Σ is infinite iff some t F O Σ contains a loop.

Generating UI from Forest F O Σ Although our computation of deductive modules does not require computing UI, we investigate here how to generate a UI from a forest. These discussions (definitions and results) are necessary for presenting our method. First, we associate each f-tree or b-tree t with an ELconcept Ct defined as follows: Definition 7. For a f-tree or b-tree t, the corresponding ELconcept Ct is defined inductively by (i) if the child set of the root r(t) is empty, then Ct = At; (ii) if the child set of the root r(t) is {n1, , nm}, and there exists 0 k m such that the edge from r(t) to ni is a ri-edge for i k. Then

Ct = r1Ct1 rk Ctk Ctk+1 Ctm,

where ti is the maximal sub-tree of t rooted at ni. For instance, we have Ct+ 1 = r.(A2 A1) r.A1 for t+ 1 in Example 5 and we have Ct 1 = r.(A3 A4) for t 1 in Example 6. Now, the UI candidate generated from F O Σ is defined as follows. Definition 8. Given a forest F of b-trees and f-trees, the set of axioms UΣ(F) associated with F is the union of the three axiom sets below: U1 Σ(F)={Ct At | t F, At Σ} U2 Σ(F)={At+ Ct+ | t+ F, At+ Σ} U3 Σ(F)={Ct Ct+ | t , t+ F, At =At+ Σ}. UΣ(F) is a UI candidate because of the following result: Theorem 1. For any axiom α with sig(α) Σ, we have O |= α iff UΣ(F O Σ ) |= α. Therefore, UΣ(F O Σ ) is a UI for O and Σ if F O Σ is finite.

Figure 4: Some trees of F O1 Σ1

Next, we omit Σ in UΣ(F) and Ui Σ(F), i {1, 2, 3} if there is no ambiguity.

Example 8 (Example 5 and 6 cont d). Note that F O Σ is finite because no b-tree or f-tree over the ontology O contains a loop. We have U1(F O Σ ) = {β 1 : r.(A3 A4) A2},

U2(F O Σ ) = {β 2 : A r.(A2 A1) r.A1,

β 3 : A r.(A2 A1), β 4 : A r.A2 r.A1,

β 5 : A r.A1, β 6 : A r.A2},

and U3(F O Σ ) = . Here, β 1 is generated from t 1 F O Σ and β 2, β 3 are generated from t+ 1 , t+ 2 F O Σ respectively. β 4, β 5, β 6 are generated from sub-trees of t+ 1 other than t+ 2 . Then, U(F O Σ ) = U1(F O Σ ) U2(F O Σ ) U3(F O Σ ) is a UI for O and Σ since F O Σ is finite.

It may happen that F O Σ is infinite and nevertheless, a UI for O and Σ exists. In that case, we can still obtain a UI from F O Σ by extracting a finite subset of U(F O Σ ) as in (Nikitina and Rudolph 2014). We do not provide further details here as computing UI is not our main goal.

Pseudo-Minimal Modules

This section introduces a novel notion of deductive modules, called pseudo-minimal modules, based on the forest F O Σ . The definition of pseudo-minimal modules relies on tree-support and a finite representative subset of F O Σ defined below.

Tree-Support

Intuitively, tree-supports can be seen as analogs to justifications, although tree-supports are related to tree derivation, whereas justifications are related to axiom inference.

Definition 9. A tree-support of a tree t F O Σ is a subontology of O defined as the union of the following axiom sets: (i) {α}, for each edge e t labeled by α O; (ii) a justification JA B O of A B, for each edge e t labeled by O |= A B. Let Supp(t) be the collection of all tree-supports of t.

For example, in Example 6, the only tree-support of t 1 is {α4, α5} since all edges in t 1 are labeled by α4 O or α5 O. Different tree-supports can be obtained depending on the different choices of justification JA B. Compared to justifications of α U(F O Σ ), tree-supports are easier to compute since (i) the first component {α} of

tree-supports is obtained directly; (ii) computing justifications of A B, where A, B NC, is easier than computing the justifications of an arbitrary axiom α U(F O Σ ); (iii) the computation of tree-supports can be encoded as Hornclauses and solved by efficient SAT tools as shown in the next section.

Finding a Finite Representative Subset of F O Σ Now, we extract a finite subset of F O Σ . First, we partition F O Σ into three disjoint sets F1, F2 and F3 as follows:

F1 = {t F O Σ | t contains a loop}

F2 = {t+ F O Σ \F1 | t F1 such that At+=At Σ}

{t F O Σ \F1 | t+ F1 such that At =At+ Σ}

F3 = F O Σ \(F1 F2).

Then F2, F3 are finite sets since they do not contain trees with loop, F1 is infinite iff F O Σ is infinite. Second, we select a finite subset F 1 of F1 as follows. Let us say that two f-trees (or b-trees) are equivalent iff they share the same set of edge labels. Then F 1 F1 is obtained by selecting one representative tree for each equivalent class in F1. Then, because the number of labels of the form α O or O|=A B is finite given an ontology O, the number of equivalent classes is finite and thus F 1 is finite. Finally, we obtain a finite representative subset F 1 F2 F3 F O Σ . Note that if F O Σ is finite, then F1=F2=F 1 = , F3=F O Σ and thus F 1 F2 F3 is F O Σ itself.

Pseudo-Minimal Modules Now, we formally introduce pseudo-minimal modules as follows:

Definition 10. A pseudo-minimal module for O and Σ is a minimal element in the collection SO Σ := { S t F 1 F2 F St | St Supp(t), F F3, U(F2 F ) |= U(F2 F3) }.

We can regard pseudo-minimal modules as an approximation of minimal modules because of the following result:

Theorem 2. A pseudo-minimal module for O and Σ is a deductive module for O and Σ. Moreover, if F O Σ is finite, then M is a pseudo-minimal module for O and Σ iff M is a minimal module for O and Σ.

Notice that, in Definition 10 of SO Σ , if we do not consider F F3 satisfying U(F F2)|=U(F2 F3) (i.e., let F = F3), all the elements in SO Σ are still deductive modules for O and Σ. However, in this case, we can not guarantee that pseudo-minimal modules are minimal modules when F O Σ is finite. For example:

Example 9 (Example 8 cont d). Recall that F O Σ is finite, then F O Σ = F3. If we require F = F3, then SO Σ =

{S t F O Σ St St Supp(t)}. Since t+ 1 F O Σ has a unique

tree-support St+ 1 = {{α1, α2, α3}}, all the sets in SO Σ are super sets of St+ 1 and thus contain α3. However, as shown in Example 4, α3 does not belong to any minimal module for O

and Σ. On the other hand, if we allow F F3, we can get rid of t+ 1 because U(F ) |= U(F O Σ ) for F = F O Σ \ {t+ 1 }. Therefore, we can get rid of α3.

When F O Σ is infinite, we show by our evaluation (see Table 4) that pseudo-minimal modules are still very concise. This provides an experimental validation of the minimal module approximation defined by pseudo-minimal modules.

For Mod: A SAT-Based Algorithm Now, we present our algorithm For Mod that constructs a set of Horn-clauses CΣ and computes pseudo-minimal modules and complete modules using CΣ. Assume that we have computed the finite representative subset F 1 F2 F3 of F O Σ by enumeration of b-trees and f-trees.

Encoding by Horn-Clauses Recall that all pseudo-minimal modules are minimal elements in the collection SO Σ . In the following, we encode the extraction of these minimal elements by a set of Hornclauses: CΣ. We associate to each α O, β U(F 1 F2 F3), t F O Σ and each edge e t a literal lα, lβ, lt, le, respectively. Also, we introduce a new literal lΣ to capture pseudo-minimal modules for O and Σ. Now, the encoding is decomposed into two parts: (i) The first part (items 1, 2 and 3 below) encodes the computation of the subsets F F3 such that U(F2 F )|=U(F2 F3); (ii) The second part (items 4 and 5 below) encodes the computation of the tree-supports St Supp(t). In details, CΣ consists of:

1. ( β U(F2 F3) lβ) ( t F 1 F2 lt) lΣ; 2. for each axiom β U(F2 F3), a Horn-clause is built depending on the provenance of β in U(F2 F3) (recall Definition 8):

(a) lt lβ if β = Ct At U1(F2 F3), (b) lt+ lβ if β = At+ Ct+ U2(F2 F3), (c) lt+ lt lβ if β = Ct Ct+ U3(F2 F3),

for some t , t+ F2 F3. 3. lβ1 lβn lβ, for each β U(F2 F3) and each justification {β1, , βn} of β w.r.t. U(F2 F3); 4. ( e tle) lt, for each t F 1 F2 F3; 5. For each edge e t, where t F 1 F2 F3:

(a) ( α JA Blα) le, if lab(e) is O |= A B, where JA B O is a justification of A B; (b) lα le, if lab(e) is α O.

Above, for clarity, building the clauses of item 5(a) is presented using justifications. In the implementation, the computation of these justifications is itself encoded by a set of Horn-clauses as in (Yang, Ma, and Bidoit 2022).

Example 10 (Example 6 and 8 cont d). For simplicity, we assume F O Σ = {t+ 1 , t+ 2 , t 1 } and U(F O Σ ) = {β 1, β 2, β 3}. Indeed, this simplification is based on the optimization steps that are explained later. Let e1, , e7 be the edges of the trees in F O Σ as illustrated in Figure 5. Then the set of Hornclauses CΣ is shown in Table 1.

1: lβ 1 lβ 2 lβ 3 lΣ 2(a): lt 1 lβ 1, (b): lt+ 1 lβ 2, lt+ 2 lβ 3 3: lβ 2 lβ 3, lβ 3 lβ 2; 4: le1 le2 le3 le4 lt+ 1 , le1 le2 le3 lt+ 2 , le5 le6 le7 lt 1 5(a): lα2 le2, lα2 le3, (b): lα1 le1, lα3 le4, lα4 le5, lα5 le6, lα5 le7

Table 1: Clause set CΣ

Complete and Pseudo-Minimal Modules Let the answer literals be the literals lα associated with axioms α O. We can extract a complete module directly from CΣ as follows. Theorem 3. Mc={α O | lα is an answer literal in CΣ} is a complete module for O and Σ. For instance, in Example 10, the complete module Mc for O and Σ is O itself. Pseudo-minimal modules are computed from CΣ as in (Kazakov and Skoˇcovsk y 2018). We apply resolution over CΣ { lΣ} which leads to a set of clauses denoted by Cf Σ. Assuming MΣ Cf Σ is the subset of minimal2 clauses composed of answer literals only, we have: Corollary 1. M={α1, , αk} is a pesudo-minimal module for O and Σ iff ( k i=1lαi) MΣ. For instance, in Example 10, we obtain MΣ = {lα1 lα2 lα4 lα5 } by resolution over CΣ { lΣ}. Therefore, the only pseudo-minimal module for O and Σ is {α1, α2, α4, α5}, which is also the unique minimal module for O and Σ since the corresponding forest F O Σ is finite.

Optimization As mentioned in Example 10, we can reduce the size of F O Σ by ignoring some redundant trees. For example, we can ignore those trees that do not contribute to the generation of axioms in U(F O Σ ). Furthermore, we can ignore some sub-ftrees based on the following result: Corollary 2. Theorem 1, 2 and 3 still hold if we ignore the ftrees t+ F O Σ that satisfies (i) t+ has an edge, starting from the root r(t+), which is not a r-edge; or (ii) t+ is a proper sub-tree of a f-tree t+ 1 F O Σ such that r(t) = r(t+ 1 ) and |= Ct+ Ct+ 1 .

In Example 10, t+ 1 has 6 different sub-trees, but only t+ 2 satisfies |= Ct+ 1 Ct+ 2 . Therefore, we can ignore the other 5 sub-trees and thus now F O Σ ={t+ 1 , t+ 2 , t 1 }.

Evaluation We implemented a prototype For Mod3 of our algorithm in Python and evaluated it over three real-world ontologies:

2c1 is smaller than c2 if all literals of c1 are in c2. 3https://gitlab.lisn.upsaclay.fr/yang/formod

r . A2 횝 ퟷe5

Figure 5: Trees in F O Σ

Σsn16 50 Σsn16 100 Σnci 50 Σnci 100 Σsn21 50 Σsn21 100 Zoom 57.1 32.5 79.0 57.3 - - For Mod 83.8 79.2 92.3 74.2 98.9 88.1

Table 2: Success rate (%)

Snomed CT (versions Jan 2016 and Jan 2021) and NCI (version 16.03d)4. We denote them as sn16, sn21, nci, respectively. Here, sn16 and nci are two EL-terminologies containing 317891 and 165341 axioms respectively. And sn21 is an EL-ontology with 362638 axioms. All the experiments run on a machine with an Intel Xeon Core 4 Duo CPU 2.50 GHz with 64 Gi B of RAM. For each ontology O, we run the experiments over 2 sets ΣO n of 1000 randomly generated signatures, where each signature has n concepts (n {50, 100}) and 10 roles.

Pseudo-Minimal Modules

Recall that Zoom (Chen et al. 2017) is the state-of-the-art algorithm that computes all the minimal modules but only for EL-terminologies. First, we compare all pseudo-minimal modules computed by For Mod with all minimal modules computed by Zoom. For each signature, we set the total run-time limit of 600s. The success rates (i.e., the percentage of completed experiments within the time limit) of For Mod and Zoom are summarized in Table 2. We can see that the success rate of For Mod is from 13.3% to 46.7% higher than Zoom. Note that Zoom does not work for sn21, which is not an ELterminology. Table 3 summarizes the time-cost comparison over signatures that are solved successfully by both For Mod and Zoom. We highlight that, for these signatures, the corresponding forests are indeed finite. Therefore, the pseudominimal modules are indeed minimal modules by Theorem 2. According to Table 3, For Mod is 99.79 times faster than Zoom on average. Note that, as discussed in (Chen et al. 2017), Zoom spends most of its running time (94.6% on average) on computing justifications using Beacon (Arif et al. 2016), which is less efficient than the resolution we use. However, even if we ignore the time cost of Beacon (i.e. only consider 5.4% computation time of Zoom), For Mod is still 5.67, 7.48, 3.72, 5.33 times faster than Zoom on average for Σsn16 50 , Σsn16 100 , Σnci 50 , Σnci 100, respectively. Second, in our experiments, there are 66 signatures in Σnci 50 Σnci 100 for which F O Σ is infinite, and 43 of them are

4https://www.snomed.org,http://evs.nci.nih.gov

Time(s) For Mod Zoom Σsn16 50 17.20 / 1.09 / 1.77 / 1.73 558.76 / 34.85 / 186.04 / 143.53 Σsn16 100 9.75 / 1.29 / 2.18 / 2.10 563.24 / 71.38 / 302.24 / 294.19 Σnci 50 8.89 / 0.74 / 1.32 / 1.28 560.89 / 7.81 / 91.08 / 60.42 Σnci 100 12.47 / 0.89 / 1.47 / 1.42 576.35 / 15.49 / 145.32 / 105.92

Table 3: Time cost (max / min / mean / median)

Module size -module pseudo-minimal module Σnci 50 7499 / 6340 / 7126.24 80 / 12 / 31.06 Σnci 100 8091 / 1936 / 7317.31 111 / 9 / 52.81

Table 4: Signatures with infinite F O Σ (max / min / mean )

solved within the time limit by For Mod, but all of them are time-out for Zoom. So we compare pseudo-minimal modules of these signatures with -modules implemented by OWL API (Grau et al. 2008) instead. The result is summarized in Table 4. We observe that the pseudo-minimal modules are still very concise in terms of size and are significantly smaller than the -modules.

Complete Modules

Here, we compare our complete modules generated by For Mod with -modules. Note that, for all signatures tested, the corresponding complete modules and - modules are all computed within 32s, i.e., the success rate is 100%. The size comparison of those modules is shown in Table 5. We can see that the size of complete modules is significantly smaller than that of -modules (103.5 times smaller on average) for all ontologies. Table 6 summarizes the time cost comparison between complete modules and -modules. It shows that the computation of complete modules is faster than that of - modules except for the maximal time cost.

Related Work

For computing deductive modules, the paper (Chen et al. 2017) developed a recursive algorithm Zoom, which computes minimal modules by computing subsumption justifications based on a logical difference algorithm proposed in (Ludwig and Walther 2014). Zoom can always compute minimal modules no matter whether a UI exists or not. However, Zoom only works for EL-terminologies since the logical difference algorithm only works for EL-terminologies. In (Koopmann and Chen 2020), deductive modules are computed w.r.t. ALCH, which is more expressive than EL. The authors compute a single deductive module by collecting all axioms contributing to the generation of a UI obtained by adapting lethe (Koopmann 2020). When a UI exists, they further compute a single minimal module by removing redundant axioms over the above obtained deductive module using a reasoner. We do not compare their method directly with For Mod because the UI and minimal modules are different if different languages (e.g., EL or ALCH) are used as the underlying semantics, even for the same ontology and

Module Size -module complete module pseudo-minimal

Σsn16 50 6628.21 74.53 8.80 Σsnt16 100 13159.89 140.68 20.97 Σnci 50 5560.64 52.02 24.36 Σnci 100 7327.67 73.91 37.53 Σsn21 50 4384.08 88.48 7.60 Σsnt21 100 15845.43 81.28 16.11

Table 5: Mean size of different deductive modules

Time(s) complete module (For Mod) -module (OWL API)

Σsn16 50 27.10 / 1.09 / 2.92 /1.95 9.36 / 5.60 / 6.55 / 6.45 Σsn16 100 31.60 / 1.29 / 4.76 / 2.60 9.68 / 5.76 / 6.74 / 6.65 Σnci 50 3.45 / 0.74 / 1.32 / 1.30 2.88 / 1.16 / 1.90 / 1.97 Σnci 100 23.49 / 0.89 / 1.46 / 1.42 2.90 / 1.28 / 1.98 / 1.97 Σsn21 50 28.06 / 1.34 / 2.62 / 2.27 12.90 / 3.55 / 8.57 / 8.88 Σsn21 100 17.71 / 2.10 / 3.48 / 3.17 14.19 / 3.99 / 9.28 / 9.71

Table 6: Time cost (max / min / mean / median)

signature. Moreover, notice that their method always produces only one deductive module while ours computes all (pseudo-)minimal modules. Recall that Theorem 1 provides a way to compute UI based on forests F O Σ although we do not implement it. There are mainly two different approaches for computing UI: (i) Forgetting-based approach like lethe and Fame (Zhao and Schmidt 2018). They compute a UI by forgetting all the concepts and roles outside a signature Σ. (ii) Generation approach like NUI (Konev, Walther, and Wolter 2009). NUI works only for EL-terminology and when F O Σ is finite (has no Σ-loop in their case). It is shown in (Chen et al. 2019) that NUI is much more efficient than lethe and Fame on EL-terminologies. Moreover, generating UI from F O Σ underlined by Theorem 1 can be regarded as a generalization of NUI from EL-terminologies to EL-ontologies:

Proposition 3. Assume that O is an EL-terminology, Σ is a signature, and F O Σ is finite. Let F F O Σ be the subset consisting of all b-trees and maximal f-trees. Then, the UI for O and Σ computed by NUI is equivalent to U1 Σ(F) U2 Σ(F) (recall Definition 8).

Conclusion and Future Work

In this paper, we present pseudo-minimal modules and complete modules for EL-ontologies and a SAT-based algorithm For Mod to compute them. Our method is based on a novel notion of the forest, which enables to capture all the entailments over a given signature. The experiments on real-world ontologies validated the efficiency of our proposal as well as the quality of our deductive modules. As the next step, we plan to investigate how to generalize our ideas to more expressive ontologies such as EL+ and ALC. Also, we are interested in investigating such ideas on module notions that are not necessarily deductive modules (e.g., semantic modules (Konev et al. 2008)).

Arif, M. F.; Menc ıa, C.; Ignatiev, A.; Manthey, N.; Pe naloza, R.; and Marques-Silva, J. 2016. BEACON: An Efficient SAT-Based Tool for Debugging EL+ Ontologies. In International Conference on Theory and Applications of Satisfiability Testing, 521 530.

Arif, M. F.; Menc ıa, C.; and Marques-Silva, J. 2015. Efficient axiom pinpointing with EL2MCS. In Joint German/Austrian Conference on Artificial Intelligence (K unstliche Intelligenz), 225 233.

Calvanese, D.; Giacomo, G. D.; and Lenzerini, M. 1999. Reasoning in Expressive Description Logics with Fixpoints based on Automata on Infinite Trees. In International Joint Conference on Artificial Intelligence, 84 89.

Chen, J.; Alghamdi, G.; Schmidt, R. A.; Walther, D.; and Gao, Y. 2019. Ontology extraction for large ontologies via modularity and forgetting. In International Conference on Knowledge Capture, 45 52.

Chen, J.; Ludwig, M.; Ma, Y.; and Walther, D. 2017. Zooming in on Ontologies: Minimal Modules and Best Excerpts. In 16th International Semantic Web Conference, 173 189.

Gatens, W.; Konev, B.; and Wolter, F. 2014. Lower and Upper Approximations for Depleting Modules of Description Logic Ontologies. In European Conference on Artificial Intelligence, 345 350.

Grau, B. C.; Horrocks, I.; Kazakov, Y.; and Sattler, U. 2008. Modular reuse of ontologies: Theory and practice. Journal of Artificial Intelligence Research, 31: 273 318.

Jim enez-Ruiz, E.; Grau, B. C.; Sattler, U.; Schneider, T.; and Berlanga, R. 2008. Safe and economic re-use of ontologies: A logic-based methodology and tool support. In European Semantic Web Conference, 185 199.

Kazakov, Y.; and Skoˇcovsk y, P. 2018. Enumerating justifications using resolution. In International Joint Conference on Automated Reasoning, 609 626.

Konev, B.; Ludwig, M.; Walther, D.; and Wolter, F. 2012. The Logical Difference for the Lightweight Description Logic EL. J. Artif. Intell. Res., 44: 633 708.

Konev, B.; Lutz, C.; Walther, D.; and Wolter, F. 2008. Semantic Modularity and Module Extraction in Description Logics. In European Conference on Artificial Intelligence, volume 178 of Frontiers in Artificial Intelligence and Applications, 55 59.

Konev, B.; Walther, D.; and Wolter, F. 2009. Forgetting and Uniform Interpolation in Large-Scale Description Logic Terminologies. In 21st International Joint Conference on Artificial Intelligence, 830 835.

Koopmann, P. 2020. LETHE: Forgetting and uniform interpolation for expressive description logics. KI-K unstliche Intelligenz, 34(3): 381 387.

Koopmann, P.; and Chen, J. 2020. Deductive Module Extraction for Expressive Description Logics. In Bessiere, C., ed., International Joint Conference on Artificial Intelligence, 1636 1643.

Koopmann, P.; and Schmidt, R. A. 2013. Forgetting Concept and Role Symbols in ALCH-Ontologies. In Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, 552 567. Ludwig, M.; and Walther, D. 2014. The Logical Difference for ELHr-Terminologies using Hypergraphs. In Schaub, T.; Friedrich, G.; and O Sullivan, B., eds., European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, 555 560. IOS Press. Lutz, C.; Piro, R.; and Wolter, F. 2010. Enriching ELconcepts with greatest fixpoints. In European Conference on Artificial Intelligence, 41 46. Manthey, N.; Pe naloza, R.; and Rudolph, S. 2016. Efficient Axiom Pinpointing in EL using SAT Technology. In Description Logics. Nikitina, N.; and Rudolph, S. 2014. (Non-) Succinctness of uniform interpolants of general terminologies in the description logic EL. Artificial Intelligence, 215: 120 140. Sattler, U.; Schneider, T.; and Zakharyaschev, M. 2009. Which kind of module should I extract? Description Logics, 477: 78. Yang, H.; Ma, Y.; and Bidoit, N. 2022. Hypergraph-Based Inference Rules for Computing EL+-Ontology Justifications. In International Joint Conference on Automated Reasoning, 310 328. Zhao, Y.; and Schmidt, R. A. 2018. FAME: an automated tool for semantic forgetting in expressive description logics. In International Joint Conference on Automated Reasoning, 19 27.