# Augmenting Transfer Learning with Semantic Reasoning

Freddy Lécué¹,², Jiaoyan Chen³, Jeff Z. Pan⁴,⁵ and Huajun Chen⁶,⁷
¹CortAIx Thales, Montreal, Canada
²Inria, Sophia Antipolis, France
³Department of Computer Science, University of Oxford, UK
⁴Department of Computer Science, The University of Aberdeen, UK
⁵Edinburgh Research Centre, Huawei, UK
⁶College of Computer Science, Zhejiang University, China
⁷ZJU-Alibaba Joint Lab on Knowledge Engine, China

**Abstract.** Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the Semantic Web, learning tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with *when to transfer* with semantic measurements and *what to transfer* with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has been demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.

## 1 Introduction

Transfer learning [Pan and Yang, 2010] aims at solving the problem of lacking training data by utilizing data from other related learning domains, each of which is referred to as a pair of dataset and prediction task. Transfer learning plays a critical role in real-world applications of ML, as (labelled) data is usually not large enough to train accurate and robust models. Most approaches focus on similarity in raw data distribution, with techniques such as dynamic weighting of instances [Dai et al., 2007] and sharing of model parameters [Benavides-Prado et al., 2017] (cf. Related Work). Despite a large spectrum of techniques [Weiss et al., 2016] in transfer learning, it remains challenging to assess a priori which domain and dataset to elaborate from [Dai et al., 2009].
To deal with such challenges, [Choi et al., 2016] integrated expert feedback as a semantic representation of domain similarity for knowledge transfer, while [Lee et al., 2017] evaluated graph-based representations of source and target domains. Both studies encode semantics but are limited in expressivity, which restricts the interpretability of domains and inhibits a good understanding of transferability. There are also efforts on transfer learning based on Markov Logic Networks (MLN), using first-order [Mihalkova et al., 2007; Mihalkova and Mooney, 2009] or second-order [Davis and Domingos, 2009; Van Haaren et al., 2015] rules as declarative prediction models. However, these efforts still cannot answer questions like: What ensures a positive domain transfer? Would learning a model from road traffic congestion in London be the best for predicting congestion in Paris? Or would an air quality model transfer better?

In this paper, we propose to encode the semantics of learning tasks and domains with OWL ontologies, providing a robust foundation to study transferability between source and target learning domains. From knowledge materialization [Nickel et al., 2016], feature selection [Vicient et al., 2013], predictive reasoning [Lécué and Pan, 2015], stream learning [Chen et al., 2017] to transfer learning explanation [Chen et al., 2018], all are examples of inference tasks where the semantics of data representation is exploited to derive a priori knowledge from pre-established statements in ML tasks. We introduce a framework that augments transfer learning with semantics and its reasoning capability, as shown in Figure 1. It deals with (i) *when to transfer*, by suitable transferability measurements (i.e., variability of semantic learning tasks and consistent transferability knowledge), and (ii) *what to transfer*, by embedding the semantics of learning domains and tasks with transferability, consistency and variability vectors.
In addition to exposing the semantics that drives transfer, a transfer boosting algorithm is developed to integrate the embeddings with existing transfer learning approaches. Our approach achieves high performance for multiple transfer learning tasks in air quality and bus delay forecasting.

*Figure 1: Ontology-based Transfer Learning Augmentation. (The figure relates the source and target domains of a transfer learning task to the "when to transfer" measurements — variability of semantic learning tasks and consistent transferability knowledge — and the "what to transfer" embeddings — transferability, consistency and variability vectors — feeding the semantic transfer boosting algorithm.)*

## 2 Background

Our work uses OWL ontologies underpinned by the Description Logic (DL) EL++ [Baader et al., 2005; Bechhofer et al., 2004] to model the semantics of learning domains and tasks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)

### 2.1 Description Logics EL++ and Ontology

A signature Σ, noted (N_C, N_R, N_I), consists of 3 disjoint sets of (i) atomic concepts N_C, (ii) atomic roles N_R, and (iii) individuals N_I. Given a signature, the top concept ⊤, the bottom concept ⊥, an atomic concept A, an individual a and an atomic role r, EL++ concept expressions C and D can be composed with the following constructs:

⊤ | ⊥ | A | C ⊓ D | ∃r.C | {a}

A DL ontology ⟨T, A⟩ is composed of a TBox T and an ABox A. T is a set of concept and role axioms. EL++ supports General Concept Inclusion axioms (GCIs, e.g., C ⊑ D) and Role Inclusion axioms (RIs, e.g., r ⊑ s). A is a set of concept assertion axioms, e.g., C(a), role assertion axioms, e.g., r(a, b), and individual in/equality axioms, e.g., a = b, a ≠ b. Given an input ontology T ∪ A, we consider the closure of atomic ABox entailments (or simply entailment closure, denoted G(T ∪ A)) as {g | T ∪ A ⊨ g}, where g represents an atomic concept assertion A(b) or an atomic role assertion r(a, b), involving only named concepts, named roles and named individuals. Entailment reasoning in EL++ is PTime-complete.
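As a toy illustration of the entailment closure G(T ∪ A), the following sketch forward-chains atomic concept inclusions over concept assertions. It is a minimal, invented example (the concept and individual names are not from the paper's ontology), and it deliberately omits the rest of EL++ (conjunction, existential restrictions, nominals), which a real reasoner would handle:

```python
# Minimal sketch of an ABox entailment closure G(T ∪ A): only atomic
# GCIs (C ⊑ D) and concept assertions C(a) are handled; full EL++
# reasoning is out of scope for this illustration.
def entailment_closure(tbox, abox):
    """tbox: set of (sub, sup) concept-inclusion pairs;
    abox: set of (concept, individual) assertions.
    Returns the closure of atomic concept assertions."""
    closure = set(abox)
    changed = True
    while changed:  # forward chaining until fixpoint
        changed = False
        for sub, sup in tbox:
            for concept, ind in list(closure):
                if concept == sub and (sup, ind) not in closure:
                    closure.add((sup, ind))
                    changed = True
    return closure

# Toy TBox: Motorway ⊑ Road, Road ⊑ Way; toy ABox: Motorway(r0).
tbox = {("Motorway", "Road"), ("Road", "Way")}
abox = {("Motorway", "r0")}
print(entailment_closure(tbox, abox))
# the closure contains the derived assertions Road(r0) and Way(r0)
```

The fixpoint loop mirrors why entailment reasoning here stays polynomial: each pass can only add assertions from a bounded set of concept–individual pairs.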
**Example 1. (TBox and ABox Concept Assertion Axioms)** Figure 2 presents (i) a TBox T where Road (axiom 1) denotes the concept of ways which are in a continent, and (ii) concept assertions (axioms 8–9) with individuals r0 and r1 being roads.

*Figure 2: Sample of an Ontology's TBox T and ABox A.*

### 2.2 Learning Domain and Task

To model the learning domain with an ontology, we use Learning Sample Ontologies and Target Entailments, as in [Chen et al., 2018]. A learning domain consists of an LSO set (i.e., dataset) and a target entailment set (i.e., prediction task).

**Definition 1. (Learning Sample Ontology (LSO))** A learning sample ontology O = ⟨T, A, S⟩ is an ontology ⟨T, A⟩ annotated by property-value pairs S.

The annotation S acts as key dimensions to uniquely identify an input sample of ML methods. When the context is clear, we also use LSO to refer to its ontology ⟨T, A⟩.

**Example 2. (An LSO in the Context of Ireland Traffic)** Assume an LSO is annotated by property-value pairs S := {topic: Road, road: CWay, country: UK}. Its TBox T includes static axioms like (1); its ABox A includes facts, e.g., hasAvgSpeed(r0, Low), that are observed on CWay in the UK.

**Definition 2. (Learning Domain and Target Entailment)** A learning domain D = ⟨O, G_Y⟩ consists of a set of LSOs O that share the same TBox T, and target entailments G_Y, each of whose truth in an LSO is to be predicted. Its entailment closure, denoted G(O), is defined as ⋃_{O ∈ O} G(O).

Definition 3 revisits supervised learning within a domain. In a training LSO, a target entailment is true if it is entailed by the LSO, and false otherwise. In a testing LSO, the truth of a target entailment is to be predicted instead of being inferred.

**Definition 3. (Semantic Learning Task)** Given a learning domain D = ⟨O, G_Y⟩, whose LSOs O are divided into two disjoint sets O′ and O″, a semantic learning task T within D is defined as ⟨D, O′, O″, f(·)⟩, i.e., the task of identifying a function f(·) from O′ and G_Y to predict the truth of G_Y in each LSO of O″.
Here, O′ is called the training LSO set, while O″ is called the testing LSO set.

**Example 3. (Semantic Learning Task)** Given a domain composed of LSOs annotated by {topic: Road, country: UK} and target entailments Cleared(r0) and Disrupted(r0), where the LSOs are divided into a training set O′ and a testing set O″ according to the type of roads involved, the objective is to identify a function from O′ to predict the condition of road r0, namely the truth of Cleared(r0) and Disrupted(r0), in each LSO of O″.

### 2.3 Transfer Learning Across Domains

Definition 4 revisits transfer learning, where D_S and D_T are called the source domain and the target domain, and their entailment closures are denoted G_S and G_T.

**Definition 4. (Transfer Learning)** Given two learning domains D_S = ⟨O_S, G_{Y_S}⟩ and D_T = ⟨O_T, G_{Y_T}⟩, where the LSOs of D_T are divided into two disjoint sets O′_T and O″_T, transfer learning from D_S to D_T is the task of learning a prediction function f_{T|S}(·) from O_S, G_{Y_S}, O′_T and G_{Y_T} to predict the truth of G_{Y_T} in each LSO of O″_T.

**Example 4. (Transfer Learning)** Assume D_T is the domain in Example 3 and D_S is a domain with LSOs annotated by {topic: Road, country: IE}; an example of transfer learning is to identify a function using all the LSOs of Dublin traffic and the training LSOs of London traffic (O′_T) for predicting the traffic condition of road r0 in each testing LSO of London traffic (O″_T).

We demonstrate how ontology-based descriptions can drive transfer learning from one domain to another. To this end, similarities between domains are first characterized. We adopt the variability of ABox entailments [Lécué, 2015] in Definition 5, where (10) reflects variant knowledge between two domains while (11) denotes invariant knowledge.

**Definition 5.**
**(Entailment-based Domain Variability)** Given a source learning domain D_S and a target learning domain D_T, let G = G_S ∪ G_T. The variability from D_S to D_T, denoted ∆(O_T, O_S), is given by the ABox entailments:

G_var^[S],[T] = {g ∈ G | g ∉ G_T ∨ g ∉ G_S}   (10)
G_inv^[S],[T] = {g ∈ G | g ∈ G_T ∧ g ∈ G_S}   (11)

**Example 5. (Entailment-based Domain Variability)** Let the ontologies of Figure 3, which capture the contexts of IE and UK, be the ontologies of D_S and D_T respectively. Table 1 illustrates some variabilities between D_S and D_T through ABox entailments. For instance, r1 being a disrupted road in D_S is new (variant) w.r.t. the knowledge in D_T and axioms (1), (9) and (12–15).

*Figure 3: [Top] Source Domain Ontologies O_S in the Context of IE Traffic; [Bottom] Target Domain Ontologies O_T in the Context of UK Traffic.*

| Ontology Variability ∆(D_S, D_T) | ABox entailments |
|---|---|
| variant | Road(r3), Cleared(r1) |
| invariant | Disrupted(r0) |

*Table 1: Examples for Entailment-based Domain Variability.*

## 3 Transferability

We present (i) the variability of semantic learning tasks and (ii) semantic transferability, as a basis for qualifying and quantifying transfer learning (i.e., *when to transfer*), together with (iii) the indicators (i.e., *what to transfer*) driving transferability. These are pivotal properties, as any change in the domains, their transfer function and consistency drastically impacts the quality of the derived models [Long et al., 2015; Chen et al., 2018].

### 3.1 Variability of Semantic Learning Tasks

Definition 6 extends entailment-based ontology variability (Definition 5) to capture learning task variability, where (·)^[Y_S],[Y_T] denotes using the target entailments in (10) and (11).

**Definition 6. (Variability of Semantic Learning Tasks)** Let T_S and T_T be the semantic learning tasks of a source learning domain D_S and a target learning domain D_T. The variability of semantic learning tasks ∆(T_S, T_T) is defined by (22), where |·| refers to the cardinality of a set.
∆(T_S, T_T) = ( |G_var^[S],[T]| / (|G_var^[S],[T]| + |G_inv^[S],[T]|) , |G_var^[Y_S],[Y_T]| / (|G_var^[Y_S],[Y_T]| + |G_inv^[Y_S],[Y_T]|) )   (22)

The variability of semantic learning tasks (22), also written (∆(T_S, T_T)|_O, ∆(T_S, T_T)|_Y), takes values in [0, 1] and captures the variability of the source and target domain LSOs as well as the variability of the target entailments. The higher the values, the stronger the variability. The computation of (22) is in the worst case polynomial time in EL++ w.r.t. the sizes of O_S, O_T, Y_S, Y_T: its evaluation requires (i) ABox entailment and (ii) the basic set-theoretic operations of Definition 5, both in polynomial time [Baader et al., 2005].

**Example 6. (Variability of Semantic Learning Tasks)** The variability of learning tasks between T_S and T_T in Example 4 is (2/3, 0), as the numbers of variant and invariant ABox entailments are respectively 6 and 3, and Y_S = Y_T, i.e., moderate variability of the domains, none for the target variables.

### 3.2 Semantic Transferability — When to Transfer?

We define semantic transferability from a source to a target semantic learning task as the existence of knowledge, captured as ABox entailments [Pan and Thomas, 2007], that stems from the source and has positive effects on the predictive quality of the prediction function of the target semantic learning task.

**Definition 7. (Semantic ε-Transferability)** Let T_S, T_T be source and target semantic learning tasks with entailment closures G_S, G_T. Semantic ε-transferability T_S ⇝_ε T_T occurs from T_S to T_T iff there exists S̃ ⊆ O_S such that:

m(f_{T|S̃}(·)) − m(f_T(·)) > ε   (23)
G_S̃ ⊈ G_T   (24)

where f_{T|S̃}(·) is the predictive function f_T(·) w.r.t. O_T ∪ S̃, G_S̃ is the ABox entailment closure of S̃, and m(·) measures predictive quality. S̃ is knowledge from O_S to be used for outperforming the predictive quality of f_T(·) by a factor ε ∈ (0, 1] (23) while being new with respect to the ABox entailments in G_T (24).

**Example 7. (Semantic ε-Transferability)** Let T_S, T_T be the semantic learning tasks of D_S, D_T in Example 4, let S̃ be the ABox entailment closure of (12–15) in O_S, and let m(f_{T|S̃}(·)) > m(f_T(·)).
Semantic ε-transferability occurs from T_S to T_T as (i) an ε > 0 satisfying condition (23) exists, and (ii) (24) holds w.r.t. S̃, cf. Table 1. Thus, the knowledge S̃ from the IE traffic context (D_S) ensures transferability from D_S to D_T for traffic prediction in the UK.

ABox entailments S̃ satisfying Definition 7 are denoted transferable knowledge, while those contradicting (23), i.e., m(f_{T|S̃}(·)) − m(f_T(·)) ≤ ε, are non-transferable knowledge, as they deteriorate the predictive quality of the target function f_T(·).

**Example 8. (Transferable Knowledge)** Consider the entailments in S̃: (i) Disrupted(r4), derived from (13) and (19–21); (ii) Cleared(r0), derived from (8), (12) and (17–18). As part of the knowledge S̃ positively impacting the quality of the prediction task, they are also separately ε-transferable knowledge with maximal ε of .1 and .07 respectively (computation details omitted).

### 3.3 Consistent Transferable Knowledge

Transferring knowledge across domains can lead to inconsistency. Definition 8 captures knowledge ensuring transferability while maintaining consistency in the target domain.

**Definition 8. (Consistent Transferable Knowledge)** Let S̃ be ABox entailments ensuring T_S ⇝_ε T_T. S̃ is consistent transferable knowledge from T_S to T_T iff S̃ ∪ O_T ⊭ ⊥.

ABox entailments S̃ satisfying S̃ ∪ O_T ⊨ ⊥ are called inconsistent transferable knowledge. They are interesting ABox entailments, as they expose knowledge contradicting the target domain while maintaining transferability. Evaluating whether S̃ is consistent transferable knowledge is in the worst case polynomial time in EL++ w.r.t. the sizes of S̃ and O_T.

**Example 9. ((In)Consistent Transferable Knowledge)** Disrupted(r4) in S̃ of Example 8 is consistent transferable knowledge in T_T, as {Disrupted(r4)} ∪ O_T ⊭ ⊥. On the contrary, Cleared(r0) and Disrupted(r0) in S̃, derived from (16–18), are inconsistent with (7). Thus, Cleared(r0) in O_S is inconsistent transferable knowledge in T_T.
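The Section 3 measurements can be sketched on toy entailment sets. In this minimal illustration the entailments are plain strings and the consistency check of Definition 8 is stubbed by a hand-written set of disjointness pairs; a real system would obtain the closures G_S, G_T and the clash check S̃ ∪ O_T ⊨ ⊥ from an EL++ reasoner:

```python
# Sketch of the Section 3 measurements on invented toy closures.
def variability(g_s, g_t):
    """Definition 5 / domain component of Eq. (22): variant and invariant
    ABox entailments of G = G_S ∪ G_T, and the ratio |var| / (|var| + |inv|)."""
    invariant = g_s & g_t
    variant = (g_s | g_t) - invariant  # entailments missing on one side
    total = len(variant) + len(invariant)
    return variant, invariant, (len(variant) / total if total else 0.0)

def consistent_transferable(candidates, g_t, disjoint_pairs):
    """Definition 8 sketch: keep candidate entailments whose addition to the
    target closure does not clash with a (stubbed) disjointness pair."""
    return {g for g in candidates
            if not any((g == a and b in g_t) or (g == b and a in g_t)
                       for a, b in disjoint_pairs)}

# Toy closures mirroring Example 6's counts: 6 variant, 3 invariant -> 2/3.
g_s = {"Disrupted(r1)", "Road(r3)", "Cleared(r2)", "inv1", "inv2", "inv3"}
g_t = {"Cleared(r1)", "Road(r4)", "Disrupted(r5)", "inv1", "inv2", "inv3"}
var, inv, ratio = variability(g_s, g_t)
print(round(ratio, 3))  # 0.667

# Cleared(r1) holds in the target, so transferring Disrupted(r1) clashes.
disjoint = {("Cleared(r1)", "Disrupted(r1)")}
print(consistent_transferable({"Disrupted(r1)", "Road(r3)"}, g_t, disjoint))
# {'Road(r3)'} — Disrupted(r1) plays the role of inconsistent transferable knowledge
```

The ε-transferability test (23) is not sketched, as it requires training f_T(·) with and without S̃ and comparing the quality metric m(·).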
## 4 Semantic Transfer Learning

We tackle the problem of transfer learning by computing semantic embeddings (i.e., *how to transfer*) for knowledge transfer, and by determining a strategy that exploits the semantics of the learning tasks (Section 3) in Algorithm 1.

### 4.1 Semantic Embeddings — How to Transfer?

The semantics of learning tasks exposes three levels of knowledge which are crucial for transfer learning: variability, transferability and consistency. They are encoded as embeddings through Definitions 9, 10 and 11.

**Definition 9. (Transferability Vector)** Let G = {g_1, ..., g_m} be all distinct ABox entailments in O_S ∪ O_T. A transferability vector from T_S to T_T, denoted t(G), is a vector of dimension m such that for all j ∈ [1, m]: t_j := ε_j if g_j is ε_j-transferable knowledge with maximal ε_j (i.e., there is no ε′_j with ε_j < ε′_j such that g_j is ε′_j-transferable knowledge), and t_j := 0 otherwise.

A transferability vector (Definition 9) adapts the concept of a feature vector [Bishop, 2006] in machine learning to represent the qualitative transferability from source to target of all ABox entailments. Each dimension captures the best transferability of a particular ABox entailment.

**Example 10. (Transferability Vector)** Suppose G := {Disrupted(r4), Cleared(r0)}. The transferability vector t(G) is (.1, .07), cf. the ε-transferability in Example 8.

A consistency vector (Definition 10) is computed from all entailments by evaluating their (in)consistency, either 1 or 0, when transferred into the target semantic learning task. Feature vectors are bound to raw data only, while transferability and consistency vectors, with larger dimensions, embed the transferability and consistency of the data and its inferred assertions. They ensure a larger, more contextual coverage.

**Definition 10. (Consistency Vector)** Let G = {g_1, ..., g_m} be all distinct ABox entailments in O_S ∪ O_T.
A consistency vector from T_S to T_T, denoted c(G), is a vector of dimension m such that for all j ∈ [1, m]: c_j = 1 if {g_j} ∪ O_T ⊭ ⊥, and c_j = 0 otherwise.

The variability vector (Definition 11) is used as an indicator of the semantic variability between the two learning tasks. Its entries are values in [0, 1] with an emphasis on the domain ontologies and/or the label space, depending on its parameterization (α, β). We characterize any variability weight above 1/2 as an inter-domain transfer learning task, and below 1/2 as intra-domain.

**Definition 11. (Variability Vector)** Let G = {g_1, ..., g_m} be the ABox entailments in O_S ∪ O_T. A variability vector v(G, α, β) from T_S to T_T is a vector of dimension m with α, β ∈ [0, 1] such that for all j ∈ [1, m]:

v_j = α · ∆(T_S, T_T)|_O + β · ∆(T_S, T_T)|_Y   (25)

**Example 11. (Variability Vector)** Applying (25) to the variability of semantic learning tasks between T_S and T_T, (2/3, 0) in Example 6, with α = β = 1/2 results in v(G, α, β) = 1/3 in each dimension, which represents moderate variability.

### 4.2 Boosting for Semantic Transfer Learning

Algorithm 1 presents an extension of the transfer learning method TrAdaBoost [Dai et al., 2007] that integrates the semantic embeddings. It aims at learning a predictive function f_{T|S}(·) (line 20) using T_S, O_S, O′_T for T_T. The semantic embeddings of all entailments in G_S ∪ G_T are computed first (lines 7–8); they encode the transferability, consistency and variability effects from the source to the target domain. Then, their importance/weights w are iteratively adjusted (line 9) depending on the evaluation of f^t (lines 13–14), comparing the estimated prediction f^t(e_i) against the real values Y_T(g_i). The base model (lines 11–12), which can be derived from any weak learner, e.g., Logistic Regression, is built on top of all entailments of the source and target tasks. However, entailments from the source might be wrongly predicted due to the task variability ∆(T_S, T_T) (Definition 6, line 8).
Thus, we follow the parameterization of γ and γ_t of [Dai et al., 2007] and decrease the weights of such entailments to reduce their effects (lines 17–19). In the next iteration, the misclassified source entailments, which are dissimilar to the target ones w.r.t. the semantic embeddings, will affect the learning process less than in the current iteration. Finally, StAdaB returns a binary hypothesis (line 20); multi-class classification can be easily applied.

**Algorithm 1:** StAdaB(⟨D_S, T_S⟩, ⟨D_T, T_T⟩, O′_T, G, L, N, α, β)

     1  Input: (i) source/target learning domains and tasks ⟨D_S, T_S⟩, ⟨D_T, T_T⟩,
        (ii) a training LSO set O′_T of the target learning domain,
        (iii) the distinct ABox entailments G = {g_1, ..., g_m} of O_S ∪ O′_T,
        (iv) a base learning algorithm L, (v) max. iterations N, (vi) α, β ∈ [0, 1].
     2  Result: f_{T|S}: a predictive function built from D_S, T_S, O′_T, G_{Y_T} for T_T.
     4  % Initialization of weights for the transferability, consistency
     5  % and variability vectors of all m ABox entailments in G.
     6  Initialize w¹ := (w¹_1, ..., w¹_{3m});
     7  % Computation of the semantic embeddings of all g_i ∈ G.
     8  e_i ← (t(g_i), c(g_i), v(G, α, β)), i ∈ {1, ..., m};
     9  foreach t = 1, 2, ..., N do          % weight computation iteration
    10      p^t ← w^t / Σ_{i=1}^{3m} w^t_i;  % probability distribution of w^t
    11      % Predictive function f^t over O_S ∪ O′_T.
    12      (f^t : e_i ↦ Y_T(e_i)) ← L(e, p^t, Y_T);
    13      % Error computation of f^t on T_T, O′_T.
    14      ψ^t ← Σ_{i | g_i ∈ G_T} w^t_i |f^t(e_i) − Y_T(g_i)| / Σ_{i | g_i ∈ G_T} w^t_i;
    15      % Weights for reducing errors on T_T over iterations.
    16      γ_t ← ψ_t / (1 − ψ_t);   γ ← 1 / (1 + √(2 ln |G_S| / N));
    17      % Weight update of source and target entailments in G,
    18      % using γ_t, γ and the weights w^t_i from the previous iteration:
    19      w^{t+1}_i ← w^t_i · γ_t^{−|f^t(e_i) − Y_T(e_i)|}  if g_i ∈ G_T,
                        w^t_i · γ^{|f^t(e_i) − Y_T(e_i)|}     otherwise;
    20  return the hypothesis ensemble:
            f_{T|S}(e) = 1 if Π_{t=⌈N/2⌉}^{N} γ_t^{−f^t(e)} ≥ Π_{t=⌈N/2⌉}^{N} γ_t^{−1/2}, and 0 otherwise.

A brute-force approach would consist in generating an exponential number of models from any combination of entailments of the source and target. StAdaB reduces this complexity by only evaluating the atomic impact and (approximately) computing the optimal combination. As a side effect, StAdaB exposes the entailments of the source which drive transfer learning (cf. the final weight assignment of the embeddings).

## 5 Experimental Results

StAdaB is evaluated on two intra-domain transfer learning cases: (i) air quality forecasting from Beijing to Hangzhou (IBH) and (ii) traffic condition prediction from London to Dublin (ILD), and one inter-domain case: (iii) from traffic condition prediction in London to air quality forecasting in Beijing (ILB). Accuracy with cross validation is reported. The three tasks have respective variability values v(G, α, β) of .3, .4 and .7; α and β are set to .5.

In IBH¹, air quality knowledge in Beijing (source) is transferred to Hangzhou (target) for forecasting the air quality index, ranging from Good (value 5), Moderate (4), Unhealthy (3), Very Unhealthy (2), Hazardous (1) to Emergent (0). The observations include air pollutants (e.g., PM2.5), meteorology elements (e.g., wind speed) and weather conditions from 12 stations. The semantics of the observations is based on a DL ALEH(D) ontology, including 48 concepts, 15 roles and 598 axioms. 1,065,600 RDF triples are generated on a daily basis. 18 (resp. 5) months of observations are used for training (resp. testing). Even though the ontologies are from the same domain, the proportions of similar concepts and roles are respectively .81 (i.e., 81% of the concepts are similar) and .74; for instance, there is no hazardous air quality concept in Hangzhou.
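Before turning to the remaining cases, the loop of Algorithm 1 can be illustrated with a minimal, self-contained sketch. Everything here is invented for illustration: a one-dimensional weighted decision stump stands in for the weak learner L (the experiments use Logistic Regression), and the 3-tuples stand in for the embeddings e_i = (t(g_i), c(g_i), v(G, α, β)); only the TrAdaBoost-style reweighting and ensemble rule follow the listing:

```python
import math

def stump(embeddings, weights, labels):
    """Weighted decision stump on the first (transferability) dimension:
    pick the threshold/orientation minimising weighted 0/1 error."""
    best = None
    for thr in sorted({e[0] for e in embeddings}):
        for sign in (False, True):
            pred = [int((e[0] >= thr) == sign) for e in embeddings]
            err = sum(w for w, p, y in zip(weights, pred, labels) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best
    return lambda e: int((e[0] >= thr) == sign)

def stadab_sketch(source, target, n_iter=10):
    """source/target: lists of (embedding, label) pairs, labels in {0, 1}.
    Source weights shrink on error (factor gamma < 1), target weights grow
    (factor gamma_t^-1), as in lines 16-19 of Algorithm 1."""
    data, n_s = source + target, len(source)
    w = [1.0] * len(data)
    gamma = 1.0 / (1.0 + math.sqrt(2.0 * math.log(max(n_s, 2)) / n_iter))
    hyps = []
    for _ in range(n_iter):
        total = sum(w)
        p = [wi / total for wi in w]                       # line 10
        f = stump([e for e, _ in data], p, [y for _, y in data])
        tgt_w = sum(w[n_s:])                               # error on target only
        err = sum(w[n_s + i] for i, (e, y) in enumerate(target) if f(e) != y)
        psi = min(max(err / tgt_w, 1e-9), 0.499)           # line 14, clamped
        gamma_t = psi / (1.0 - psi)                        # line 16
        hyps.append((f, gamma_t))
        for i, (e, y) in enumerate(data):                  # line 19
            miss = abs(f(e) - y)
            w[i] *= gamma ** miss if i < n_s else gamma_t ** (-miss)
    half = hyps[len(hyps) // 2:]                           # line 20: last N/2 rounds
    def hypothesis(e):
        lhs = sum(-math.log(g) * f(e) for f, g in half)
        rhs = sum(-0.5 * math.log(g) for _, g in half)
        return int(lhs >= rhs)
    return hypothesis

# Toy run: high transferability (first dimension) correlates with label 1.
source = [((0.85, 1, 0.33), 1), ((0.15, 0, 0.33), 0)]
target = [((0.9, 1, 0.33), 1), ((0.8, 1, 0.33), 1),
          ((0.2, 0, 0.33), 0), ((0.1, 0, 0.33), 0)]
h = stadab_sketch(source, target, n_iter=6)
print(h((0.95, 1, 0.33)), h((0.05, 0, 0.33)))  # 1 0
```

The log-domain comparison in `hypothesis` is equivalent to the product rule of line 20, since Π γ_t^{−f^t(e)} ≥ Π γ_t^{−1/2} iff Σ −ln(γ_t)·f^t(e) ≥ Σ −ln(γ_t)/2.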
In ILD, bus delay knowledge in London (source) is transferred to Dublin (target) for predicting traffic conditions classified as Free (value 4), Low (3), Moderate (2), Heavy (1) and Stopped (0). Source and target domain data include bus location, delay, congestion status and weather conditions. We enrich the data using a DL EL++ domain ontology (55 concepts, 19 roles, 25,456 axioms). 178,700,000 RDF triples are generated on a daily basis. 24 (resp. 8) months of observations are used for training (resp. testing). The concept and role similarities between the two ontologies are respectively .73 and .77.

In ILB, bus delay knowledge in London (source) is transferred to a very different domain: Beijing (target), for forecasting the air quality index. The data and ontologies of IBH and ILD are considered. Both domains share some common and some conflicting knowledge, so inconsistency might occur. For instance, both domains have the concepts of City and of weather such as Wind, but conflict on their importance and impact on the targeted variable, i.e., bus delay in London and air quality in Beijing. The concept and role similarities between the two ontologies are respectively .23 and .17.

### 5.1 Semantic Impact

Table 2 reports the impact of considering semantics (cf. Semantic vs. Basic) and (in)consistency (cf. Consistency/Inconsistency) in the semantic embeddings on Random Forest (RF), Stochastic Gradient Descent (SGD) and AdaBoost (AB). Basic models are models with no semantics attached. Plain models perform modelling and prediction in the target domain only, i.e., no transfer learning, while TL refers to transferring entailments from the source. As expected, semantics positively boosts the accuracy of transfer learning for the intra-domain cases (IBH and ILD), with an average improvement of 13.07% across models. More surprisingly, it even outperforms in the inter-domain case (ILB), with an improvement of 20.03%.

¹Air quality data: https://bit.ly/2BUxKsi. See more about the application and data in [Chen et al., 2015].
Inconsistency has been shown to drive below-baseline accuracy. On the opposite, results are much better when considering consistency, for the intra-domain cases (+63.55%) and the inter-domain case (+187.89%).

| Case | Embedding | RF Plain | RF TL | SGD Plain | SGD TL | AB Plain | AB TL |
|---|---|---|---|---|---|---|---|
| IBH | Basic | .61 | .61 | .59 | .62 | .59 | .63 |
| IBH | Consistency | .65 | .74 | .62 | .69 | .64 | .73 |
| IBH | Inconsistency | .56 | .64 | .52 | .60 | .49 | .63 |
| IBH | Cons. / Incons. | +16.07% | | +19.23% | | +30.61% | |
| IBH | Semantic / Basic | +13.93% | | +8.18% | | +12.17% | |
| ILD | Basic | .68 | .71 | .57 | .62 | .63 | .69 |
| ILD | Consistency | .75 | .78 | .65 | .71 | .75 | .82 |
| ILD | Inconsistency | .44 | .52 | .26 | .49 | .24 | .46 |
| ILD | Cons. / Incons. | +60.22% | | +102.86% | | +152.35% | |
| ILD | Semantic / Basic | +10.07% | | +14.70% | | +19.42% | |
| ILB | Basic | .62 | .65 | .60 | .66 | .61 | .68 |
| ILB | Consistency | .74 | .79 | .69 | .78 | .73 | .85 |
| ILB | Inconsistency | .23 | .45 | .29 | .42 | .18 | .34 |
| ILB | Cons. / Incons. | +153.96% | | +166.25% | | +243.46% | |
| ILB | Semantic / Basic | +20.44% | | +17.33% | | +22.33% | |

*Table 2: Forecasting Accuracy / Improvement over State-of-the-art Models (noted Basic) with Consistency / Inconsistency (consistency ratio .8) based Knowledge Transfer. The Cons./Incons. and Semantic/Basic rows report per-model improvements.*

Figure 4 reports the impact of consistency and inconsistency on transfer learning by analysing how the ratio of consistent transferable knowledge in [0, 1] drives accuracy. Accuracy is reported for the methods of Table 2 on intra-domain (average of IBH and ILD) and inter-domain (ILB) cases. Maximum (resp. minimum) accuracy is reached with a ratio in [.9, .7) (resp. [.3, .1)). The more consistent transferable knowledge, the more transfer, for ratios in [.9, .1). Interestingly, having only consistent (resp. inconsistent) transferable knowledge does not ensure the best (resp. worst) accuracy. This is partially due to under- (resp. over-) populating the target task with conflicting knowledge, ending up with limited transferability.

### 5.2 Comparison with Baselines and Discussion

We compare StAdaB (L = Logistic Regression, N = 800) with Transfer AdaBoost (TrAB) [Dai et al., 2007], Transfer Component Analysis (TCA) [Pan et al., 2011], TrSVM [Benavides-Prado et al., 2017] and SemTr [Lv et al., 2012] (cf. details in Section 6).
We considered the intra-domain cases IBH and ILD, and the inter-domain cases ILB and a variant of ILB restricted to the level of semantic expressivity covered by SemTr.

*Figure 4: Forecasting Accuracy vs. Semantic Consistency. (Accuracy of the RF, SGD and AB models, Plain and TL, for the intra- and inter-domain cases, against the ratio of consistent transferable knowledge bucketed as [1, .9), [.9, .7), [.7, .5), [.5, .3), [.3, .1), [.1, 0].)*

The results show that transfer learning has limitations in the Beijing–Hangzhou context, cf. Figure 5(a). Although our approach outperforms the other techniques (from 10.29% to 50%), accuracy does not exceed 74%. The latter is due to the context, which is limited by (i) the semantic expressivity and (ii) the data availability in Hangzhou. The results also show that TrSVM and TCA reach similar results (average difference of 9.1%) in all cases. However, our approach and TrAB tend to maximise accuracy, especially in the inter-domain cases of Figures 5(c) and 5(d), as both favour heterogeneous domains by design. Interestingly, the semantic context of the reduced ILB variant in Figure 5(d) (i) does not favour SemTr much (+7.46% vs. ILB), (ii) has no impact on StAdaB compared to ILB, and, more surprisingly, (iii) does benefit TrAB (+9.15% vs. ILB). This shows that the expressivity of semantics is crucial for our approach to benefit from (in)consistency in transfer.

*Figure 5: Baseline Comparison of Forecasting Accuracy. (a) Intra-domain IBH; (b) intra-domain ILD; (c) inter-domain ILB; (d) inter-domain ILB variant.*

Adding semantics to domains for transfer learning has clearly shown a positive impact on accuracy, especially in the context of inter-domain transfer. This demonstrates the robustness of models supporting semantics when common/conflicting knowledge is shared.
The expressivity of semantics has also shown positive impacts, especially when (in)consistency can be derived from the domain logics, although some state-of-the-art approaches benefit from taxonomy-like knowledge structures. Our approach also demonstrates that the more semantic axioms, the more robust the model and hence the higher the accuracy, cf. Figure 5(a) vs. 5(b). Data size and axiom numbers are critical, as they drive and control the semantics of the domains and of the transfer, which improves accuracy but not scalability (not reported in the paper). Scalability is worse with more expressive DLs, due to consistency checks, and with limited impact on accuracy. Enough training data in the source domain is required: indeed, logic reasoning cannot help if important data or features are not mapped to the ontology. This is crucial for the training and validation of semantics in transfer learning. Our approach is as robust as other transfer learning approaches; it only differs in valuing transferability at the semantic level.

## 6 Related Work

We briefly divide the related work into instance transfer, model transfer and semantics transfer. Instance transfer selectively reuses source domain samples with weights [Dai et al., 2007]; [Tan et al., 2017] select data points from intermediate domains to obtain a smooth transfer between largely distant domains. Model transfer reuses model parameters, like features, in the target domain. For example, [Pan et al., 2011] introduced transfer component analysis for domain adaptation; [Benavides-Prado et al., 2017] selectively shares the hypothesis components learnt by Support Vector Machines. These methods, however, usually ignore data semantics. Semantics transfer incorporates external knowledge to boost the above two groups, by using semantic nets [Lv et al., 2012] or knowledge graph-structured data [Lee et al., 2017] to derive similarity in data and features, with no reasoning applied.
There are also efforts on Markov Logic Network (MLN) based transfer learning, using first-order [Mihalkova et al., 2007; Mihalkova and Mooney, 2009] or second-order [Davis and Domingos, 2009; Van Haaren et al., 2015] rules as declarative prediction models. However, these approaches do not address the problem of when it is feasible to transfer. Our approach uses OWL reasoning to select transferable samples (addressing *when to transfer*), and then enriches the samples with embedded transferability semantics. It can support different machine learning models (and not just rules).

## 7 Conclusion

We addressed the problem of transfer learning in expressive semantics settings, by exploiting semantic variability, transferability and consistency to deal with *when to transfer* and *what to transfer* for existing instance-based transfer learning methods. Our approach has been shown to be robust for both intra- and inter-domain transfer learning tasks from real-world applications in Dublin, London, Beijing and Hangzhou. As future work, we will investigate the limits and explanations of transferability with more expressive semantics (e.g., based on approximate reasoning) [Pan et al., 2016; Du et al., 2019].

### Acknowledgments

This work is partially funded by NSFC 91846204.

## References

[Baader et al., 2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In IJCAI, pages 364–369, 2005.

[Bechhofer et al., 2004] Sean Bechhofer, Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, Lynn Andrea Stein, et al. OWL Web Ontology Language reference. W3C Recommendation, 10(02), 2004.

[Benavides-Prado et al., 2017] Diana Benavides-Prado, Yun Sing Koh, and Patricia Riddle. AccGenSVM: Selectively transferring from previous hypotheses. In IJCAI, pages 1440–1446, 2017.

[Bishop, 2006] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[Chen et al., 2015] Jiaoyan Chen, Huajun Chen, Daning Hu, Jeff Z. Pan, and Yalin Zhou. Smog disaster forecasting using social web data and physical sensor data. In 2015 IEEE International Conference on Big Data (Big Data), pages 991–998. IEEE, 2015.
[Chen et al., 2017] Jiaoyan Chen, Freddy Lécué, Jeff Z. Pan, and Huajun Chen. Learning from ontology streams with semantic concept drift. In IJCAI, pages 957–963, 2017.
[Chen et al., 2018] Jiaoyan Chen, Freddy Lécué, Jeff Z. Pan, Ian Horrocks, and Huajun Chen. Knowledge-based transfer learning explanation. In KR, pages 349–358, 2018.
[Choi et al., 2016] Jonghyun Choi, Sung Ju Hwang, Leonid Sigal, and Larry S. Davis. Knowledge transfer with interactive learning of semantic relationships. In AAAI, pages 1505–1511, 2016.
[Dai et al., 2007] Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, pages 193–200. ACM, 2007.
[Dai et al., 2009] Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated learning: Transfer learning across different feature spaces. In Advances in Neural Information Processing Systems, pages 353–360, 2009.
[Davis and Domingos, 2009] Jesse Davis and Pedro Domingos. Deep transfer via second-order Markov logic. In ICML, pages 217–224, 2009.
[Du et al., 2019] Jianfeng Du, Jeff Z. Pan, Sylvia Wang, Yuming Shen, Kunxun Qi, and Yu Deng. Validation of growing knowledge graphs by abductive text evidences. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[Lécué and Pan, 2015] Freddy Lécué and Jeff Z. Pan. Consistent knowledge discovery from evolving ontologies. In AAAI, pages 189–195, 2015.
[Lécué, 2015] Freddy Lécué. Scalable maintenance of knowledge discovery in an ontology stream. In IJCAI, pages 1457–1463, 2015.
[Lee et al., 2017] Jaekoo Lee, Hyunjae Kim, Jongsun Lee, and Sungroh Yoon.
Transfer learning for deep learning on graph-structured data. In AAAI, pages 2154–2160, 2017.
[Long et al., 2015] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pages 97–105, 2015.
[Lv et al., 2012] Wenlong Lv, Weiran Xu, and Jun Guo. Transfer learning in classification based on semantic analysis. In 2nd International Conference on Computer Science and Network Technology (ICCSNT), pages 1336–1339. IEEE, 2012.
[Mihalkova and Mooney, 2009] Lilyana Mihalkova and Raymond J. Mooney. Transfer learning from minimal target data by mapping across relational domains, 2009.
[Mihalkova et al., 2007] Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. Mapping and revising Markov logic networks for transfer learning. In AAAI, pages 608–614, 2007.
[Nickel et al., 2016] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.
[Pan and Thomas, 2007] Jeff Z. Pan and Edward Thomas. Approximating OWL-DL ontologies. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pages 1434–1439, 2007.
[Pan and Yang, 2010] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[Pan et al., 2011] Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
[Pan et al., 2016] Jeff Z. Pan, Yuan Ren, and Yuting Zhao. Tractable approximate deduction for OWL. Artificial Intelligence, pages 95–155, 2016.
[Tan et al., 2017] Ben Tan, Yu Zhang, Sinno Jialin Pan, and Qiang Yang. Distant domain transfer learning. In AAAI, pages 2604–2610, 2017.
[Van Haaren et al., 2015] Jan Van Haaren, Andrey Kolobov, and Jesse Davis. TODTLER: Two-order-deep transfer learning. In AAAI, pages 3007–3015, 2015.
[Vicient et al., 2013] Carlos Vicient, David Sánchez, and Antonio Moreno. An automatic approach for ontology-based feature extraction from heterogeneous textual resources. Engineering Applications of Artificial Intelligence, 26(3):1092–1106, 2013.
[Weiss et al., 2016] Karl Weiss, Taghi M. Khoshgoftaar, and DingDing Wang. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.