
Confident-Anchor-Induced Multi-Source-Free Domain Adaptation

Jiahua Dong1,2, Zhen Fang3, Anjin Liu3, Gan Sun1, Tongliang Liu4

1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences. 2University of Chinese Academy of Sciences. 3DeSI Lab, AAII, University of Technology Sydney. 4TML Lab, University of Sydney. {dongjiahua1995, fzjlyt, sungan1412}@gmail.com, anjin.liu@uts.edu.au, tongliang.liu@sydney.edu.au

Unsupervised domain adaptation has attracted considerable academic attention by transferring knowledge from a labeled source domain to an unlabeled target domain. However, most existing methods assume the source data are drawn from a single domain, so they cannot exploit complementary transferable knowledge from multiple source domains with large distribution discrepancies. Moreover, they require access to source data during training, which is inefficient and impractical due to privacy preservation and memory storage constraints. To address these challenges, we develop a novel Confident-Anchor-induced multi-source-free Domain Adaptation (CAiDA) model, which is a pioneering exploration of knowledge adaptation from multiple source domains to the unlabeled target domain without any source data, but with only pre-trained source models. Specifically, a source-specific transferable perception module is proposed to automatically quantify the contributions of the complementary knowledge transferred from multi-source domains to the target domain. To generate pseudo labels for the target domain without access to the source data, we develop a confident-anchor-induced pseudo label generator by constructing a confident anchor group and assigning each unconfident target sample a semantic-nearest confident anchor. Furthermore, a class-relationship-aware consistency loss is proposed to preserve consistent inter-class relationships by aligning soft confusion matrices across domains. Theoretical analysis answers why multi-source domains are better than a single source domain, and establishes a novel learning bound to show the effectiveness of exploiting multi-source domains. Experiments on several representative datasets illustrate the superiority of our proposed CAiDA model. The code is available at https://github.com/Learning-group123/CAiDA.

1 Introduction

Unsupervised Domain Adaptation (UDA) [22,59,63] captures transferable knowledge from labeled data in a source domain to classify unlabeled data in a target domain. UDA has achieved remarkable success in many applications, e.g., object detection [23], medical diagnosis [9,12], and sentiment analysis [34]. Most existing methods either employ adversarial learning [17] to make the learned source and target features indistinguishable from each other [10,15], or minimize the distribution discrepancy across domains by matching the statistical moments of the distributions [41].

Equal contributions Corresponding author

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

However, the above-mentioned methods make the strong assumption that the source data are drawn from a single domain. In real-world applications, the source data are often collected under different deployment environments (i.e., multiple source domains with large distribution discrepancies), which makes it difficult for these methods to exploit complementary transferable knowledge from the multi-source domains for target prediction. To address this, Multi-Source Domain Adaptation (MSDA) [32, 34, 61] is proposed to match the features across domains and then quantify the contributions of source domains [2,41,60]. Additionally, [29,58] weight the source contributions by normalizing the distance similarities between source and target domains.

Unfortunately, recent MSDA methods [29, 34, 41, 44] require massive labeled source data when adapting source domains to the target domain. This makes them inefficient and impractical in real-world applications with sensitive information (e.g., medical diagnosis [12] and recommendation systems [24]), due to privacy, storage and security concerns [50,51]. To this end, we study a new challenging and practical problem named Multi-Source-Free Domain Adaptation (MSFDA), which explores transferable knowledge from multiple source domains to the target domain with only pre-trained source models and without access to any source data. The trivial solution for tackling MSFDA with existing single-source-free domain adaptation methods [25, 30, 33, 57] is to adapt each source model individually and simply average the predictions of the source models. However, such solutions cannot quantify the contributions of the complementary information transferred from different source domains, due to the lack of source data. Therefore, tackling the MSFDA problem is a challenging but rarely researched task.

To address the MSFDA problem, we develop a novel Confident-Anchor-induced multi-source-free Domain Adaptation (CAiDA) model, which is a pioneering exploration of capturing transferable information from multiple source models to promote target prediction without access to source data. Specifically, a source-specific transferable perception module is designed to calibrate the transferability contributions of multiple source domains. We develop a confident-anchor-induced pseudo label generator to mine pseudo labels for the unlabeled target data by incorporating the quantified source transferability contributions. We construct a confident anchor group to assign each target sample a semantic-nearest confident anchor, and perform feature augmentation between them to generate confident target pseudo labels. A class-relationship-aware consistency loss is proposed to ensure the semantic consistency of underlying inter-class relationships across domains via the alignment of soft confusion matrices. Furthermore, based on some mild assumptions, theoretical analysis guarantees that multiple source models help generate more reliable pseudo labels. Our theoretical analysis also provides a novel learning bound for MSFDA, which reveals that multiple source models help achieve a tighter generalization error bound for the target domain. We verify the effectiveness of our proposed model via comparison experiments on benchmark datasets. The main contributions of our work are summarized as follows:

We develop a novel Confident-Anchor-induced multi-source-free Domain Adaptation (CAiDA) model to explore transferable knowledge from multiple source domains to assist target prediction with pre-trained source models and without access to source data. To the best of our knowledge, this paper is a pioneering exploration of multi-source-free domain adaptation in the field of transfer learning.

We propose a novel MSFDA theory, which shows that multiple source models improve the probability of obtaining more reliable pseudo labels under some mild assumptions. Our theoretical analysis also provides a novel generalization bound for MSFDA to show the effect of multiple source models. This bound implies a positive answer to the solvability of the MSFDA problem.

A source-specific transferable perception module and a class-relationship-aware consistency loss are designed to quantify the transferability contributions of multiple source domains and to ensure the semantic consistency of underlying inter-class relationships across domains, respectively.

Based on the confident pseudo labeling strategy in the theoretical analysis, a confident-anchor-induced pseudo label generator is proposed to generate pseudo labels for the target domain by establishing a confident anchor group and assigning each target sample a semantic-nearest confident anchor.

2 Related Work

Unsupervised Domain Adaptation aims to borrow transferable knowledge from a source domain to promote prediction on the unlabeled target domain. After Hoffman et al. [22] introduced adversarial learning [17] into domain adaptation, numerous adversarial-based methods [8, 11, 15, 59] have been proposed to perform feature-level or pixel-level distribution alignment. Besides, some moment-matching-based methods [14, 35, 41] focus on matching the statistical moments of the distributions at different orders to minimize the distribution discrepancy across domains. Furthermore, some studies design adversarial dropout [28], batch normalization [52] and auxiliary reconstruction tasks [5,16] to narrow the domain discrepancy. Unfortunately, the methods mentioned above assume that massive labeled source data are available. This is impractical due to privacy and security concerns.

Source-Free Domain Adaptation (SFDA) [33] is studied to tackle the above challenge. A common strategy in SFDA methods is to mine confident pseudo labels for the target domain. To this end, [33] uses a self-supervised pseudo labeling strategy, and [25] designs a confidence-based sample filtering method. [42] alleviates the negative transfer brought by noisy pseudo labels through confidence reweighting and regularization. In addition to the pseudo labeling strategy, the model adaptation strategy has also been studied. For example, [30, 57] employ an adversarial learning strategy to perform model adaptation with only pre-trained source models. However, these methods cannot be applied to the MSFDA problem, due to the distribution discrepancies across different source domains.

Multi-Source Domain Adaptation extends vanilla domain adaptation [12, 15, 41] by exploring transferable knowledge from multiple sources. To capture the relationship between different source domains and a given target domain, Guo et al. [19] design a mixture-of-experts model for unsupervised domain adaptation from multiple sources. [56] focuses on determining which source domain is the best for target prediction via dynamic curriculum learning. Some discrepancy-based methods aim to narrow the distribution discrepancy across domains by minimizing different measures such as the Rényi divergence [21] and the maximum mean discrepancy [19]. Moreover, some adversarial-based methods focus on optimizing the H-divergence [41, 60], generative adversarial loss [55,61] and Wasserstein distance [32] to make features from multiple sources indistinguishable for a shared discriminator. [34, 44] perform knowledge adaptation at the pixel level by replaying multiple source domains. Due to the lack of source data, the above MSDA strategies may be invalid and unsuitable for the challenging MSFDA problem. To this end, Ahmed et al. [1] utilize a nearest-distance measure to mine target pseudo labels, and weight the predictions from multiple source models for the MSFDA task. This generation process has a high probability of producing noisy labels [4,31,54] when the strategy does not match the target data, whereas our model generates confident pseudo labels from two different perspectives, i.e., geometry and probability.

3 Problem Setting

Let $\mathcal{X}$ and $\mathcal{Y} = [K] := \{1, \dots, K\}$ denote the feature space and label space. A domain is a joint distribution $P_{XY}$ on $\mathcal{X} \times \mathcal{Y}$. There are $n$ source domains $\{P^i_{XY}\}_{i=1}^n$. For any source domain $P^i_{XY}$, a corresponding neural-network-based predictor (model) $h^i : \mathcal{X} \to \mathbb{R}^K$ is given. Given a target domain $P^t_{XY}$ with unlabeled target data $T = \{x_j\}_{j=1}^m \sim P^t_X$, i.i.d., the aim of multi-source-free domain adaptation (MSFDA) is to classify the unlabeled target data by utilizing $T$ and $\{h^i\}_{i=1}^n$.

Let $\ell$ denote a non-negative loss function defined over $\mathbb{R}^K \times \mathbb{R}^K$. Given a hypothesis space $\mathcal{H} \subset \{h : \mathcal{X} \to \mathbb{R}^K\}$, we denote $L^i_s(h) = \mathbb{E}_{(x,y) \sim P^i_{XY}}\, \ell(h(x), \Phi(y))$ and $L_t(h) = \mathbb{E}_{(x,y) \sim P^t_{XY}}\, \ell(h(x), \Phi(y))$ as the risks with respect to the $i$-th source domain and a given target domain, where $\Phi : \mathcal{Y} \to \mathbb{R}^K$ maps any label $y$ to the corresponding one-hot vector.

The source predictor $h^i$ is a vector-valued function, i.e., $h^i(x) = [h^i_1(x), \dots, h^i_K(x)]^\top$, and consists of two basic components: a feature extractor $f^i : \mathcal{X} \to \mathbb{R}^d$ and a classifier $c^i : \mathbb{R}^d \to \mathbb{R}^K$, where $d$ is the dimension of the extracted features. Therefore, $h^i$ can be rewritten as $c^i \circ f^i$. After using softmax as the activation function in the output layer, we have $h^i_k \ge 0$ and $\sum_{k=1}^K h^i_k = 1$. To ensure that each predictor $h^i$ is a relatively accurate predictor for the domain $P^i_{XY}$, we assume that the predictor $h^i$ is $\epsilon$-accurate under the $\ell_1$ loss, i.e., $\mathbb{E}_{(x,y) \sim P^i_{XY}} \|h^i(x) - \Phi(y)\|_{\ell_1} < \epsilon$, for any $i \in [n]$.
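The $\epsilon$-accuracy condition can be checked empirically on held-out labeled data from a source domain. The following is a minimal sketch, assuming NumPy arrays of logits and labels; the helper names `softmax` and `l1_risk` are hypothetical, and labels are 0-indexed here whereas the text uses $\{1, \dots, K\}$.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, so every h^i(x) is non-negative and sums to 1."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def l1_risk(probs, labels):
    """Empirical estimate of E || h(x) - Phi(y) ||_1 for labels in {0, ..., K-1}."""
    onehot = np.eye(probs.shape[1])[labels]          # Phi(y): one-hot encoding
    return np.abs(probs - onehot).sum(axis=1).mean()
```

A predictor would count as $\epsilon$-accurate in this empirical sense when `l1_risk` on its own source domain falls below the chosen $\epsilon$.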

4 Theoretical Analysis

Without any relation between the source domains and the target domain, MSFDA cannot be effectively addressed from the theoretical view. To bridge source and target domains, Mansour et al. [40] and Ahmed et al. [1] assume that $P^t_{XY} = \sum_{i=1}^n \lambda_i P^i_{XY}$ and $P^t_{Y|X} = P^i_{Y|X} = P^j_{Y|X}$, for any $i, j \in [n]$, where $\lambda_i \ge 0$ and $\sum_{i=1}^n \lambda_i = 1$. This assumption is very strong and may be unrealistic in many real-world applications. Motivated by meta learning [43] and domain generalization [3], in this paper we propose some novel and mild assumptions to address the MSFDA problem.

Assumption 1 (Meta Assumption.) $P^t_{XY}, P^1_{XY}, \dots, P^n_{XY}$ are drawn (i.i.d.) from a meta distribution $\mathbb{P}$, which is defined over a joint distribution space $\mathcal{P}_{XY}$.

Assumption 2 (Regular Domain.) Let the joint distribution space $\mathcal{P}_{XY}$ be endowed with the total variation distance $d_{TV}(\cdot, \cdot)$. The target domain $P^t_{XY}$ is a regular domain, i.e., for any $\sigma > 0$, $\mathbb{P}(N^\sigma_{P^t_{XY}}) > 0$, where $N^\sigma_{P^t_{XY}} = \{P : d_{TV}(P, P^t_{XY}) < \sigma\}$.

It is easy to check that, if the meta distribution $\mathbb{P}$ is discrete or continuous with a continuous density function, then the target domain $P^t_{XY}$ drawn from $\mathbb{P}$ is a regular domain with probability 1. In addition, to weaken the assumption $P^t_{Y|X} = P^i_{Y|X} = P^j_{Y|X}$, for any $i, j \in [n]$, our key strategy is to consider the anchor point assumption that is studied in label-noise learning [36].

We say a given point $x$ is a $\tau$-anchor point if there exists a predictor $h^i$ with the largest score $h^i_k(x)$ such that $h^i_k(x) - h^i_c(x) \ge \tau$, for any $c \in [K]$ and $c \ne k$. In this case, $h^i$ is a $\tau$-anchor predictor for $x$.

Assumption 3 ($\tau$-Anchor Point Assumption.) Given a $\tau$-anchor point $x$, suppose that all $\tau$-anchor predictors for $x$ are $h^{i_1}, \dots, h^{i_l}$. Then the true label of the $\tau$-anchor point $x$ is $k$ if $h^j_k(x)$ is the largest score among the scores $h^{i_1}_c(x), \dots, h^{i_l}_c(x)$, for any $c \in [K]$.

When the source predictors are accurate enough, Assumption 3 implies that the source and target conditional distributions are similar in the high-confidence region. Hence, it is much weaker than the traditional assumption $P^t_{Y|X} = P^i_{Y|X} = P^j_{Y|X}$, for $i, j \in [n]$. $\tau$ is regarded as a threshold for distinguishing which data have highly confident predictions. In general, $\tau$ is close to 1.
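The $\tau$-anchor definition and the labeling rule of Assumption 3 can be sketched for a single point as follows. This is an illustrative sketch only, assuming the softmax outputs of all source predictors are stacked in one NumPy array; the function name `tau_anchor_label` is hypothetical and class indices are 0-based.

```python
import numpy as np

def tau_anchor_label(probs, tau):
    """probs: (n_sources, K) softmax outputs of all source predictors for one x.

    Returns the label assigned by the tau-anchor rule, or None when no
    predictor separates its top class from every other class by at least tau.
    """
    probs = np.asarray(probs, dtype=float)
    sorted_p = np.sort(probs, axis=1)
    # h_k - h_c >= tau for all c != k is equivalent to (top-1) - (top-2) >= tau.
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    anchor_predictors = np.flatnonzero(margins >= tau)
    if anchor_predictors.size == 0:
        return None
    # Among the tau-anchor predictors, pick the class holding the largest score.
    sub = probs[anchor_predictors]
    _, k = np.unravel_index(np.argmax(sub), sub.shape)
    return int(k)
```

For example, with `tau = 0.9`, a point scored `[0.95, 0.03, 0.02]` by one predictor is a $\tau$-anchor point with label 0, even if the other predictors are undecided.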

Given any $h^i$, it is easy to check that there exists a matrix function $A^i(x) = [a^i_{kl}(x)]$ such that $h^i_k(x) = \sum_{l=1}^K a^i_{kl}(x) P^t_{Y|X}(l|x)$ with $\sum_{k=1}^K a^i_{kl} = 1$. We call the diagonal elements $\{a^i_{ll}\}_{l=1}^K$ the transfer factors for $h^i$ and $P^t_{Y|X}$. Generally, $A^i(x) = [a^i_{kl}(x)]$ is not unique, so the transfer factors may not be unique. The following theorem indicates that Assumption 3 holds if we make proper assumptions on the transfer factors.

Theorem 1 Suppose that the Bayes label is the true label [7]. If there exist transfer factors and a constant $B < K$ such that $\max_{i \in [n],\, l \in [K]} a^i_{ll} \ge B$, then Assumption 3 holds with $\tau > 1 - B/K$.

Theorem 1 provides theoretical support for Assumption 3 and indicates that when the transfer factors are positive, Assumption 3 always holds with a proper $\tau$. To further study the highly confident pseudo labeling strategy, the following theorem provides a lower bound on the number of highly confident pseudo labels, i.e., the number of $\tau$-anchor points.

Theorem 2 Assume that Assumptions 1 and 2 hold and that the conditional distribution $P^t_{Y|X}$ can be represented as a labeling function, i.e., $P^t_{Y|X}(y|x) = 0$ or $1$. Given $\eta > 0$, if $m \ge n$ and $(1 - \eta)(1 - \tau) > \epsilon + 2\sigma + 2\sqrt{\log(2m/\delta)/(2m)}$, then with probability at least $1 - \delta - (1 - \mathbb{P}(N^\sigma_{P^t_{XY}}))^n > 0$, at least $\eta m$ target data are $\tau$-anchor points, where $\epsilon$ is the upper bound on the $\ell_1$ error of the source predictors (Section 3), and $\sigma$ is introduced in Assumption 2.

Theorem 2 indicates that multi-source predictors improve the probability of obtaining more $\tau$-anchor points. To further understand the effect of multi-source domains, we build a novel learning bound for the MSFDA task. Let $A_\tau$ be the set consisting of all $\tau$-anchor points. Denote the empirical risk $\widehat{L}^\tau_s(h)$ by $\frac{1}{|A_\tau \cap T|} \sum_{x \in A_\tau \cap T} \ell(h(x), \Phi(y))$, where $y$ is the label of the anchor point $x$.
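To see why more source domains help in Theorem 2, note that the term $(1 - \mathbb{P}(N^\sigma_{P^t_{XY}}))^n$ shrinks geometrically in $n$. The sketch below evaluates the probability lower bound for a few values of $n$; the numbers `delta = 0.05` and `p_neighborhood = 0.3` are purely hypothetical choices for illustration, not values from the paper.

```python
def anchor_guarantee_probability(delta, p_neighborhood, n_sources):
    """Lower bound 1 - delta - (1 - p)^n from Theorem 2, clipped at 0."""
    return max(0.0, 1.0 - delta - (1.0 - p_neighborhood) ** n_sources)

# Hypothetical values: delta = 0.05, P(N^sigma) = 0.3.
bounds = {n: anchor_guarantee_probability(0.05, 0.3, n) for n in (1, 2, 4, 8)}
```

Under these assumed values the bound increases monotonically with the number of sources, which mirrors the qualitative claim that multi-source predictors raise the probability of obtaining many $\tau$-anchor points.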

Theorem 3 Given Assumption 3 and the assumptions used in Theorem 2, suppose that the loss $\ell$ has upper bound $M > 0$ and the hypothesis space $\mathcal{H}$ has finite Natarajan dimension. For $\eta > 0$, if $m \ge n$ and $(1 - \eta)(1 - \tau) > \epsilon + 2\sigma + 2\sqrt{\log(2m/\delta)/(2m)}$, then for any $h \in \mathcal{H}$ and $b \in (0, 1)$, there exists a constant $C(b, K)$ such that with probability at least $1 - 2\delta - 2(1 - \mathbb{P}(N^\sigma_{P^t_{XY}}))^n$:

$$L_t(h) - \widehat{L}^\tau_s(h) \le M C(b, K) \sqrt{\frac{\log(2/\delta)}{\eta^{1-b} m^{1-b}}} + M \frac{2\sigma + \epsilon}{1 - \tau - 2\sigma - \epsilon}, \tag{1}$$

where $\epsilon$ is the upper bound on the $\ell_1$ error of the source predictors, and $\sigma$ is introduced in Assumption 2.

Figure 1: Overview of our model, mainly including a source-specific transferable perception strategy to quantify the contributions of the transferability of source domains, a confident-anchor-induced pseudo label generator to generate pseudo labels for the target domain, and a class-relationship-aware consistency loss to ensure the semantic consistency of underlying inter-class relationships.

Theorem 3 shows that multiple source domains improve the probability that a tighter generalization bound, i.e., Eq. (1), holds. Note that the Natarajan dimension used in Theorem 3 is a somewhat dated tool. However, it can be replaced, and Theorem 3 updated, without any technical barriers if a better generalization theory for supervised learning becomes available.

Summary of Theoretical Analysis: The theoretical analysis aims to study the solvability of MSFDA and to understand how multi-source predictors benefit the target domain's classification. By Theorems 2 and 3, multi-source predictors improve the probability of obtaining more highly confident pseudo labels, resulting in a tighter generalization bound. The generalization bound in Theorem 3 gives a positive answer to the solvability of MSFDA. Additionally, Theorems 1 and 2 imply two interesting and important results: Theorem 1 provides the first theoretical support for the confident pseudo labeling strategy, and Theorem 2 provides the first lower bound on the number of highly confident pseudo labels. To the best of our knowledge, the theoretical results in Theorems 1 and 2 are novel.

5 The Proposed CAiDA Model

The graphical illustration of our proposed model is depicted in Figure 1. It mainly consists of three significant components: source-specific transferable perception, confident-anchor-induced pseudo label generator and class-relationship-aware consistency loss, which are elaborated as follows.

5.1 Source-Specific Transferable Perception

Generally, in multi-source domain adaptation (MSDA), different source domains have different contributions to improve the performance on target domain [21,61]. To this end, many previous MSDA methods match the features across different domains, and then quantify the contributions of source domains by taking the average of the trained source predictors [41,60], or weight the trained source predictors by normalizing the distance similarities [29,58] between source and target domains. However, due to the lack of source data, these methods cannot employ source data to match features and cannot be successfully applied to multi-source-free domain adaptation (MSFDA) tasks.

Therefore, a source-specific transferable perception module is developed to automatically quantify the contributions of the transferability of source domains, as shown in Figure 1. Specifically, the one-hot encoding vector $u^i \in \mathbb{R}^n$ of the $i$-th source domain serves as a unique domain characterization and is employed as the network input. We concatenate the one-hot characterizations of all source domains to obtain $U = [u^1, \dots, u^n]^\top \in \mathbb{R}^{n \times n}$. $U$ is then forwarded into a Multi-Layer Perceptron (MLP) network $\Omega$ to automatically quantify the transferability contributions $\mu \in \mathbb{R}^n$ of the $n$ source domains: $\mu = \Omega(U) = \Omega([u^1, u^2, \dots, u^n]^\top)$, such that $\sum_{i=1}^n \mu_i = 1$, where $\mu_i$ denotes the quantified contribution of the $i$-th source domain to the target prediction.
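The perception module can be pictured with a small forward-pass sketch. The text does not specify the architecture of $\Omega$, so everything below is an assumption made for illustration: a two-layer MLP with ReLU that maps each one-hot row to a scalar score, followed by a softmax over sources to enforce $\sum_i \mu_i = 1$. The names `init_perception`, `source_weights` and `combine_predictions` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_perception(n_sources, hidden=16):
    """Randomly initialised parameters of a hypothetical two-layer MLP Omega."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_sources, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def source_weights(params, n_sources):
    """mu = Omega(U): one non-negative weight per source, summing to 1."""
    U = np.eye(n_sources)                                   # row i is the one-hot code u^i
    hidden = np.maximum(U @ params["W1"] + params["b1"], 0.0)  # ReLU layer
    scores = (hidden @ params["W2"] + params["b2"]).ravel()    # one score per source
    scores = scores - scores.max()                          # numerical stability
    return np.exp(scores) / np.exp(scores).sum()            # softmax (assumed normaliser)

def combine_predictions(mu, source_probs):
    """source_probs: (n_sources, m, K). Returns h(x_j) = sum_i mu_i h^i(x_j), shape (m, K)."""
    return np.tensordot(mu, source_probs, axes=1)
```

Because the input is a fixed identity matrix, the learnable part is effectively one score per source; the MLP form simply lets these scores be trained jointly with the feature extractors.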

When the source data are unavailable, it is difficult to narrow the distribution discrepancy across domains, due to the lack of any supervised information. Inspired by [33], we freeze the network parameters of the classifiers $\{c^i\}_{i=1}^n$ and solely perform distribution adaptation across domains on the feature extractors $\{f^i\}_{i=1}^n$ via information maximization [26], since $\{c^i\}_{i=1}^n$ contain class distribution information of the source domains. However, Liang et al. [33] cannot be effectively applied to the multi-source-free domain adaptation scenario, where different source domains have different transferable contributions to the target prediction. Therefore, $\mathcal{L}_{ent}$ is proposed to minimize the conditional entropy of target outputs by incorporating the source-specific transferable perception $\mu$:

$$\mathcal{L}_{ent} = -\frac{1}{m} \sum_{j=1}^{m} \Big[ \sum_{k=1}^{K} h_k(x_j) \log(h_k(x_j)) \Big], \tag{2}$$

where $h_k(x_j)$ is the $k$-th coordinate value of $h(x_j)$, and $h(x_j) = \sum_{i=1}^n \mu_i h^i(x_j)$ denotes the combination of source predictions. A larger value of $\mu_i$ indicates a larger transferability contribution of the $i$-th source domain to target adaptation. Unfortunately, minimizing Eq. (2) alone admits a trivial solution in which all target data are predicted as a single class. To tackle this issue, $\mathcal{L}_{div}$ is designed to encourage class prediction diversity by maximizing the entropy of the empirical label distribution [6] predicted by different source domains:

$$\mathcal{L}_{div} = \sum_{k=1}^{K} \hat{p}_k \log \hat{p}_k, \tag{3}$$

where $\hat{p}_k = \frac{1}{m} \sum_{j=1}^m h_k(x_j)$ denotes the mean prediction probability over the target data.
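The interplay between Eq. (2) and Eq. (3) can be checked numerically. The sketch below is a plain NumPy illustration, assuming the combined predictions are given as an array whose rows sum to 1; the function name and the small `eps` added inside the logarithm are assumptions for numerical safety.

```python
import numpy as np

def entropy_and_diversity_losses(h, eps=1e-8):
    """h: (m, K) combined target predictions h(x_j), rows summing to 1."""
    h = np.asarray(h, dtype=float)
    # Eq. (2): mean conditional entropy of the per-sample predictions.
    l_ent = -np.mean(np.sum(h * np.log(h + eps), axis=1))
    # Eq. (3): negative entropy of the mean prediction over the target data.
    p_hat = h.mean(axis=0)
    l_div = np.sum(p_hat * np.log(p_hat + eps))
    return l_ent, l_div

# Collapsed predictions (all samples assigned to class 0) versus
# confident predictions spread over three classes.
collapsed = np.tile([1.0, 0.0, 0.0], (3, 1))
diverse = np.eye(3)
ent_c, div_c = entropy_and_diversity_losses(collapsed)
ent_d, div_d = entropy_and_diversity_losses(diverse)
```

Both inputs drive the entropy term close to zero, but only the diverse one lowers the diversity term (to roughly $-\log 3$), which is how the second loss rules out the single-class solution.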

5.2 Confident-Anchor-Induced Pseudo Label Generator

Although the minimization of both $\mathcal{L}_{ent}$ and $\mathcal{L}_{div}$ promotes class diversity and knowledge adaptation between multiple unseen source data and target data, it cannot circumvent erroneous label assignment caused by the noisy target predictions that domain discrepancy brings. To alleviate this issue, based on Assumption 3, a confident-anchor-induced pseudo label generator is developed to mine confident pseudo labels for target data, as shown in Figure 1. Specifically, based on the $\tau$-anchor assumption, we select target data $x_j$ as a confident anchor when the maximum category prediction probability $h_{k^*}(x_j)$ is larger than each of the remaining category prediction probabilities $h_k(x_j)$ ($k \ne k^*$) by a threshold $\tau_p$. For each confident anchor, we integrate the features extracted by multiple source extractors to obtain a probability-based confident anchor group $C_p$:

$$C_p = \{ f(x_j) \mid h_{k^*}(x_j) - h_k(x_j) \ge \tau_p,\ \forall k \ne k^*,\ j = 1, 2, \dots, m \}, \tag{4}$$

where $f(x_j) = \sum_{i=1}^n \mu_i f^i(x_j)$ denotes the feature of the $j$-th target data. $\tau_p$ is defined as the median value of the probability difference between the largest and the second largest probabilities over all target data. Furthermore, inspired by the proposed $\tau$-anchor assumption, to circumvent noisy anchors in $C_p$, we construct a distance-based confident anchor group $C_d$. The feature centroid of the $k$-th class induced by the $i$-th source domain for the whole target data is defined as $\xi^i_k = \sum_{j=1}^m h^i_k(x_j) f^i(x_j) / \sum_{j=1}^m h^i_k(x_j)$. $\{\xi^i_k\}_{i=1}^n$ are then weighted with the source-specific transferability to obtain the feature centroid $\xi_k$ of the $k$-th class over all source domains via $\xi_k = \sum_{i=1}^n \mu_i \xi^i_k$. The distance between the feature of $x_j$ and the $k$-th feature centroid $\xi_k$ is denoted as $d(f(x_j), \xi_k)$, where $d(\cdot, \cdot)$ is a distance measure function. When the minimum distance $d(f(x_j), \xi_{k^*})$ is shorter than each of the remaining distances $d(f(x_j), \xi_k)$ ($k \ne k^*$) by a threshold $\tau_d$, we select the target data $x_j$ into $C_d$:

$$C_d = \{ f(x_j) \mid d(f(x_j), \xi_k) - d(f(x_j), \xi_{k^*}) \ge \tau_d,\ \forall k \ne k^*,\ j = 1, 2, \dots, m \}, \tag{5}$$

where we set $\tau_d$ as the median value of the distance difference between the shortest and the second shortest distances over all target data. Therefore, the final confident anchor group $C$ is obtained by performing an intersection operation between $C_p$ and $C_d$, i.e., $C = C_p \cap C_d$.
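The construction of the two anchor groups and their intersection can be sketched as follows. This is a minimal illustration under stated assumptions: $d(\cdot,\cdot)$ is taken to be the Euclidean distance (the text leaves the distance measure unspecified), thresholds are applied with $\ge$, and the function names are hypothetical.

```python
import numpy as np

def weighted_centroids(mu, source_probs, source_feats):
    """xi_k = sum_i mu_i xi^i_k, with xi^i_k the h^i_k-weighted mean of f^i(x_j).

    source_probs: (n, m, K); source_feats: (n, m, d); returns (K, d).
    """
    num = np.einsum("nmk,nmd->nkd", source_probs, source_feats)
    den = source_probs.sum(axis=1)[:, :, None]               # shape (n, K, 1)
    return np.einsum("n,nkd->kd", mu, num / den)

def confident_anchor_mask(h, feats, centroids):
    """Boolean mask over target samples that belong to C = Cp ∩ Cd.

    h: (m, K) combined probabilities; feats: (m, d) combined features f(x_j);
    centroids: (K, d) class centroids xi_k.
    """
    # Probability view: gap between the largest and second-largest probability.
    sorted_h = np.sort(h, axis=1)
    prob_gap = sorted_h[:, -1] - sorted_h[:, -2]
    in_cp = prob_gap >= np.median(prob_gap)                  # tau_p = median gap

    # Geometric view: gap between the second-shortest and shortest distance.
    dist = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    sorted_d = np.sort(dist, axis=1)
    dist_gap = sorted_d[:, 1] - sorted_d[:, 0]
    in_cd = dist_gap >= np.median(dist_gap)                  # tau_d = median gap
    return in_cp & in_cd
```

Since each threshold is a median, each group on its own keeps roughly half of the target data, and the intersection keeps only the samples that are confident from both the probability and the geometry perspective.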

To generate confident pseudo labels for unconfident target data, we select a semantic-nearest confident anchor from $C$ for each target data via continual similarity searching. To be specific, given an unconfident target data $x_j$, we consecutively search a series of unconfident guiding data using the distance measure function $d(\cdot, \cdot)$ in the feature space, until a confident anchor from $C$ is detected. During each searching process, the previously searched guiding data are not considered in the following iterations. Denote $I$ as the set containing previously searched guiding data, and $x_{j_c}$ as the searched confident anchor closest to $x_j$. The guiding sample $x_{j(t)}$ in the $t$-th search is obtained by:

$$x_{j(t)} = \arg\min_{x_{j'} \in T} d(f(x_{j(t-1)}), f(x_{j'})), \quad \text{subject to } x_{j(t)} \ne x_{j(t-1)},\ x_{j(t)} \notin I, \tag{6}$$

Algorithm 1 The Searching Process of Semantic-Nearest Confident Anchor $x_{j_c}$.
1: Input: $x_j$, $C$, $t = 1$;
2: Initialize: $x_{j(t-1)} = x_j$, $I = \emptyset$;
3: While $x_{j(t-1)} \notin C$ do
4:     Obtain $x_{j(t)}$ via Eq. (6);
5:     Update $I$ via adding $x_{j(t)}$ into $I$;
6:     Update $x_{j(t-1)}$ via $x_{j(t-1)} \leftarrow x_{j(t)}$;
7: End
8: Obtain $x_{j_c}$ via $x_{j_c} \leftarrow x_{j(t-1)}$;
9: Return $x_{j_c}$.
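Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative version under assumptions: target samples are referred to by index, $d(\cdot,\cdot)$ is taken to be the Euclidean distance, and the function name `nearest_confident_anchor` is hypothetical.

```python
import numpy as np

def nearest_confident_anchor(j, feats, is_anchor):
    """Index of the confident anchor reached from target sample j.

    feats: (m, d) target features f(x_j); is_anchor: (m,) boolean mask of C.
    """
    if not is_anchor.any():
        raise ValueError("the confident anchor group C is empty")
    current = j                      # plays the role of x_{j(t-1)}
    visited = set()                  # the set I of searched guiding data
    while not is_anchor[current]:
        dist = np.linalg.norm(feats - feats[current], axis=1)
        dist[current] = np.inf       # constraint: x_{j(t)} != x_{j(t-1)}
        if visited:
            dist[list(visited)] = np.inf   # constraint: x_{j(t)} not in I
        nxt = int(np.argmin(dist))   # Eq. (6)
        visited.add(nxt)
        current = nxt
    return current
```

Every iteration adds a new non-anchor sample to the visited set, so the loop stops after at most $m$ steps whenever $C$ is non-empty.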

Algorithm 2 The Optimization of Our Model.
1: Input: $\{h^i\}_{i=1}^n$, $T$, $E$ epochs, $B$ batches;
2: Initialize: $\kappa_1$, $\kappa_2$, the parameters of $\Omega$;
3: For $e = 1, \dots, E$ do
4:     Obtain pseudo labels via Eq. (7);
5:     For $b = 1, \dots, B$ do
6:         Select a mini-batch of target data;
7:         Update $\{f^i\}_{i=1}^n$ and $\Omega$ via Eq. (9);
8:     End
9: End
10: Return: $\{h^i\}_{i=1}^n$ and $\Omega$.

where $x_{j(t-1)}$ is the guiding data in the previous search. The searching process of $x_{j_c}$ for the $j$-th target data is summarized in Algorithm 1. Moreover, we fuse the target data and their corresponding confident anchor in the feature space for feature augmentation, and the synthetic feature is denoted as $f_{syn}(x_j) = (1 - \omega) f(x_j) + \omega f(x_{j_c})$, where $\omega \in [0, 1]$ is a random weight that determines the influence of the confident anchor on $f_{syn}(x_j)$. Therefore, the confident-anchor-induced pseudo label $\hat{y}_j$ of the $j$-th target data $x_j$ and the classification loss $\mathcal{L}_{cls}$ of the whole target data are formulated as:

$$\hat{y}_j = \arg\min_{k \in [K]} d(f_{syn}(x_j), \xi_k); \qquad \mathcal{L}_{cls} = \frac{1}{m} \sum_{j=1}^{m} \sum_{k=1}^{K} \big[ -\mathbb{1}_{\hat{y}_j = k} \log(h_k(x_j)) \big]. \tag{7}$$
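The feature fusion, pseudo labeling and classification loss of Eq. (7) can be sketched together. The sketch assumes a Euclidean $d(\cdot,\cdot)$ and draws one uniform $\omega$ per sample; the text only states that $\omega \in [0, 1]$ is random, so the sampling scheme and the function name are assumptions.

```python
import numpy as np

def pseudo_labels_and_cls_loss(h, feats, anchor_feats, centroids, rng, eps=1e-8):
    """h: (m, K) combined predictions; feats: (m, d) features f(x_j);
    anchor_feats: (m, d) features f(x_{j_c}) of each sample's confident anchor;
    centroids: (K, d) class centroids xi_k."""
    m = feats.shape[0]
    omega = rng.uniform(0.0, 1.0, size=(m, 1))                 # random mixing weight
    f_syn = (1.0 - omega) * feats + omega * anchor_feats       # synthetic feature
    dist = np.linalg.norm(f_syn[:, None, :] - centroids[None, :, :], axis=2)
    y_hat = dist.argmin(axis=1)                                # nearest-centroid label
    picked = h[np.arange(m), y_hat]                            # h_{y_hat_j}(x_j)
    l_cls = -np.mean(np.log(picked + eps))                     # cross-entropy on pseudo labels
    return y_hat, l_cls
```

Mixing toward the anchor pulls an unconfident feature closer to a region where the centroids are well separated before the nearest-centroid decision is made.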

5.3 Class-Relationship-Aware Consistency Loss

The inherent relationships between different classes are semantically consistent across domains, regardless of distribution discrepancy. In light of this, aligning class relationships promotes more shared transferable knowledge from source domains towards target adaptation. To achieve this, as depicted in Figure 1, a class-relationship-aware consistency loss $\mathcal{L}_{crc}$ is designed to encourage target data from the same class to be compactly clustered together while preserving the intrinsic inter-class relationships via soft confusion matrix alignment. To be specific, the soft label distribution $s^i_k$ of the $k$-th class predicted via the $i$-th source predictor is formulated as $s^i_k = \frac{1}{m} \sum_{j=1}^m [\mathbb{1}_{\hat{y}_j = k}\, \mu_i h^i(x_j)]$. The collection of soft label distributions $\{s^i_k\}_{k=1}^K$ represents a kind of soft confusion matrix associated with a particular domain, encoding inter-class relationships learned by the $i$-th source predictor (e.g., computers are semantically more similar to desks than to horses). Without access to the source data, $\mathcal{L}_{crc}$ aims to align the soft confusion matrices from different source predictors:

$$\mathcal{L}_{crc} = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} \sum_{k=1}^{K} \big( \mathrm{KL}(s^i_k \,\|\, s^{i'}_k) + \mathrm{KL}(s^{i'}_k \,\|\, s^i_k) \big), \tag{8}$$

where $\mathrm{KL}(p \,\|\, q) = \sum_r p_r \log \frac{p_r}{q_r}$ denotes the Kullback-Leibler (KL) divergence. The complexity of $\mathcal{L}_{crc}$ is not problematic in practice, due to the limited number of source domains.
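A symmetric KL alignment of soft confusion matrices across source predictors, in the spirit of Eq. (8), can be sketched as below. Two choices here are assumptions rather than details taken from the text: each soft label vector $s^i_k$ is renormalised to sum to 1 (with a small `eps`) so that the KL divergence is well defined even for classes with no pseudo-labeled samples, and the function name `crc_loss` is hypothetical.

```python
import numpy as np

def crc_loss(mu, source_probs, y_hat, eps=1e-8):
    """mu: (n,) source weights; source_probs: (n, m, K) predictions h^i(x_j);
    y_hat: (m,) integer pseudo labels in {0, ..., K-1}."""
    n, m, K = source_probs.shape
    onehot = np.eye(K)[y_hat]                                  # (m, K): 1[y_hat_j = k]
    # s[i, k, :] = (1/m) * sum_j 1[y_hat_j = k] * mu_i * h^i(x_j)
    s = np.einsum("mk,nmc->nkc", onehot, source_probs) * mu[:, None, None] / m
    s = (s + eps) / (s + eps).sum(axis=2, keepdims=True)       # assumed renormalisation
    log_s = np.log(s)
    total = 0.0
    for i in range(n):
        for i2 in range(n):
            kl_fwd = np.sum(s[i] * (log_s[i] - log_s[i2]))     # sum_k KL(s^i_k || s^i2_k)
            kl_bwd = np.sum(s[i2] * (log_s[i2] - log_s[i]))    # sum_k KL(s^i2_k || s^i_k)
            total += kl_fwd + kl_bwd
    return total / (2 * n ** 2)
```

The double loop costs $O(n^2 K^2)$, which stays small because the number of source domains $n$ is limited.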

In summary, the overall objective for optimizing $\{f^i\}_{i=1}^n$ and $\Omega$ is formulated as:

$$\mathcal{L} = \mathcal{L}_{ent} + \mathcal{L}_{div} + \kappa_1 \mathcal{L}_{cls} + \kappa_2 \mathcal{L}_{crc}, \tag{9}$$

where $\kappa_1, \kappa_2$ are the balancing weights. The optimization of our model is presented in Algorithm 2.

6 Experiments

6.1 Datasets and Baseline Methods

Datasets: Office-31 [22] consists of three representative domains with 31 shared object categories in the office environment, i.e., Amazon (A), Webcam (W) and DSLR (D). Office-Caltech [18] is an extension of Office-31 [22] obtained by adding an additional subset called Caltech-256 (C) and extracting the 10 object classes common to all domains. Office-Home [46] is composed of four different domains including Product (Pr), Clipart (Cl), Art (Ar), and Realworld (Re). Each of these subsets consists of 65 shared object categories. Digits-Five [41] contains five digit recognition subsets including MNIST-M (MM), USPS (UP), MNIST (MT), SVHN (SV) and Synthetic Digits (SY).

Table 1: Comparisons between our model and other competing methods on the Office-31 [22] dataset (left block: three tasks and their average) and the Office-Caltech [18] dataset (right block: four tasks and their average). Accuracy in %.

| Methods | Source Data | A, D → W | A, W → D | D, W → A | Avg. | A, D, C → W | A, C, W → D | C, D, W → A | A, D, W → C | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Source only [20] | Yes | 97.1 | 92.0 | 51.6 | 80.2 | 93.5 | 94.2 | 90.6 | 87.5 | 91.5 |
| MDAN [60] | Yes | 99.2 | 95.4 | 55.2 | 83.3 | 99.4 | 98.7 | 93.5 | 91.6 | 95.8 |
| DCTN [55] | Yes | 99.6 | 96.9 | 54.9 | 83.8 | 99.3 | 99.4 | 94.1 | 91.3 | 96.0 |
| M3SDA [41] | Yes | 99.4 | 96.2 | 55.4 | 83.7 | 99.5 | 99.2 | 94.5 | 92.2 | 96.4 |
| MDDA [62] | Yes | 99.2 | 97.1 | 56.2 | 84.2 | 99.3 | 99.6 | 95.3 | 92.3 | 96.6 |
| LtC-MSDA [47] | Yes | 99.6 | 97.2 | 56.9 | 84.6 | 99.4 | 99.7 | 93.7 | 95.1 | 97.0 |
| Source model only | No | 95.4 | 97.5 | 60.2 | 84.4 | 98.0 | 99.5 | 96.3 | 92.1 | 96.5 |
| BAIT [57] | No | 98.5 | 98.8 | 71.1 | 89.5 | 98.0 | 97.5 | 97.5 | 95.7 | 97.2 |
| PrDA [25] | No | 93.8 | 96.7 | 73.2 | 87.9 | 97.6 | 97.1 | 97.3 | 94.6 | 96.7 |
| SHOT [33] | No | 94.9 | 97.8 | 75.0 | 89.3 | 99.6 | 96.8 | 95.7 | 95.8 | 97.0 |
| MA [30] | No | 96.1 | 97.3 | 75.2 | 89.5 | 99.8 | 97.2 | 95.7 | 95.6 | 97.1 |
| DECISION [1] | No | 98.4 | 99.6 | 75.4 | 91.1 | 99.6 | 100.0 | 95.9 | 95.9 | 98.0 |
| Ours-w/o Ent | No | 97.5 | 99.1 | 74.2 | 90.3 | 99.1 | 99.0 | 94.5 | 96.3 | 97.2 |
| Ours-w/o Div | No | 97.2 | 98.6 | 73.7 | 89.8 | 98.6 | 99.3 | 94.1 | 95.7 | 96.9 |
| Ours-w/o Cls | No | 96.7 | 98.4 | 73.0 | 89.4 | 97.3 | 98.4 | 93.6 | 95.2 | 96.1 |
| Ours-w/o Crc | No | 98.6 | 99.5 | 75.4 | 91.2 | 99.6 | 100.0 | 95.3 | 96.5 | 97.9 |
| Ours | No | 98.9 | 99.8 | 75.8 | 91.6 | 99.8 | 100.0 | 96.8 | 97.1 | 98.4 |

Figure 2: t-SNE [45] visualizations on the Office-31 [22] dataset for the D, W → A task ((a) Source model only, (b) Ours) and the A, W → D task ((c) Source model only, (d) Ours).

Baseline Methods: To validate the effectiveness of our model, we conduct comparison experiments with a wide array of baseline methods. Specifically, MDAN [60], DCTN [55], M3SDA [41], MDDA [62] and LtC-MSDA [47] are traditional representative multi-source domain adaptation methods with access to source data. BAIT [57], PrDA [25], SHOT [33] and MA [30] focus on unsupervised single-source domain adaptation without access to source data. We compare against the multi-source extensions of [25,30,33,57] by taking an average of the target soft predictions from all adapted source models. DECISION [1] automatically combines the adapted source models with suitable weights for multi-source-free domain adaptation. Furthermore, Source only denotes the performance on target data of a model trained on the combination of all source data, and Source model only represents the average performance over the predictions of all source models.

6.2 Experiments on Office-31 and Office-Caltech Datasets

Performance Comparisons: The comparisons between our CAiDA model and other state-of-the-art methods on the Office-31 [22] and Office-Caltech [18] datasets are presented in Table 1. We draw the following conclusions from the results in Table 1: 1) Compared with the multi-source domain adaptation methods [41, 47, 55, 60, 62] that employ source data for training, our proposed model without access to source data outperforms them by a margin of 1.4% to 8.3% in terms of mean accuracy. This verifies the superiority of our model in tackling multi-source-free domain adaptation. 2) Our model achieves higher average accuracy than the multi-source extensions of single-source-free domain adaptation methods [25, 30, 33, 57], which validates the effectiveness of our pseudo label generation process. 3) Our model matches or exceeds [1] on all evaluation tasks, since the confident-anchor-induced pseudo label generator and class-relationship-aware consistency loss promote the adaptation performance. Figure 2 shows that our model significantly narrows the distribution discrepancy across domains on Office-31 [22] when compared with Source model only.

Figure 3: Qualitative analysis of the parameters {κ1, κ2} on (a) Office-31 and (b) Office-Caltech, (c) the weights of domain A on Office-31 and Office-Caltech for the tasks R → W and R → D, and (d) convergence on Office-Caltech (accuracy (%) versus number of epochs for A, C, D → W; A, C, W → D; C, D, W → A; and A, D, W → C), where R in (c) denotes the rest of the domains except for the target.

Table 2: Comparisons between our model and other competing methods on Office-Home [46].

| Methods | Source Data | Ar, Cl, Pr → Re | Ar, Cl, Re → Pr | Ar, Pr, Re → Cl | Cl, Pr, Re → Ar | Avg. |
|---|---|---|---|---|---|---|
| Source only [20] | Yes | 67.8 | 71.3 | 51.8 | 53.4 | 61.1 |
| MDAN [60] | Yes | 77.3 | 77.6 | 62.2 | 65.4 | 70.6 |
| DCTN [55] | Yes | 78.7 | 78.3 | 63.8 | 66.4 | 71.8 |
| M3SDA [41] | Yes | 79.4 | 79.1 | 63.5 | 67.2 | 72.3 |
| MDDA [62] | Yes | 79.6 | 79.5 | 62.3 | 66.7 | 71.0 |
| LtC-MSDA [47] | Yes | 80.1 | 79.2 | 64.1 | 67.4 | 72.7 |
| Source model only | No | 76.3 | 78.8 | 50.1 | 50.9 | 64.0 |
| BAIT [57] | No | 77.2 | 79.4 | 59.6 | 71.1 | 71.8 |
| PrDA [25] | No | 76.8 | 79.1 | 57.5 | 69.3 | 70.7 |
| SHOT [33] | No | 82.9 | 82.8 | 59.3 | 72.2 | 74.3 |
| MA [30] | No | 81.7 | 82.3 | 57.4 | 72.5 | 73.5 |
| DECISION [1] | No | 83.6 | 84.4 | 59.4 | 74.5 | 75.5 |
| Ours-w/o Ent | No | 82.6 | 83.0 | 58.7 | 74.2 | 74.6 |
| Ours-w/o Div | No | 82.1 | 82.9 | 58.5 | 73.8 | 74.3 |
| Ours-w/o Cls | No | 81.4 | 82.7 | 57.9 | 73.1 | 73.8 |
| Ours-w/o Crc | No | 83.5 | 84.4 | 59.7 | 74.9 | 75.6 |
| Ours | No | 84.2 | 84.7 | 60.5 | 75.2 | 76.2 |

Ablation Studies: This subsection verifies the effectiveness of each component of our model via ablation studies on the Office-31 [22] and Office-Caltech [18] datasets, as shown in Table 1. Ours-w/o Ent, Ours-w/o Div, Ours-w/o Cls and Ours-w/o Crc denote training our proposed model without L_ent, L_div, L_cls and L_crc, respectively. When any one component of our model is removed, the performance degrades by 0.4%–2.3% in terms of average accuracy, which shows that all components are well motivated and cooperate effectively. Every module plays an indispensable role in improving performance, even though our model has no access to source data.

Parameter Investigation: This subsection investigates the effects of the hyper-parameters κ1 in the range {0.1, 0.3, 0.5, 0.7, 0.9} and κ2 in the range {10^-4, 10^-3, 10^-2, 10^-1, 1} on the Office-31 [22] and Office-Caltech [18] datasets, as shown in Figure 3 (a)(b). The results show that our proposed model achieves stable performance over a wide range of hyper-parameter settings. Moreover, the best performance of our proposed model on the target domain is obtained when κ1 = 0.7 and κ2 = 10^-2.
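The investigated settings form a 5 × 5 grid of 25 (κ1, κ2) pairs. The sketch below enumerates that grid; `evaluate` is a hypothetical callable standing in for one full train-and-evaluate run, and `dummy_evaluate` is a made-up stand-in used only so the snippet executes. It does not reproduce the model's actual response surface.

```python
from itertools import product

# Candidate values as listed in the text.
kappa1_grid = [0.1, 0.3, 0.5, 0.7, 0.9]
kappa2_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]

def grid_search(evaluate):
    """Return (best_score, best_kappa1, best_kappa2) over the 5 x 5 grid.

    `evaluate` is a hypothetical callable (kappa1, kappa2) -> target accuracy;
    in practice it would adapt and evaluate the model once per setting.
    """
    return max((evaluate(k1, k2), k1, k2)
               for k1, k2 in product(kappa1_grid, kappa2_grid))

# Stand-in objective for illustration only (NOT measured accuracy).
def dummy_evaluate(k1, k2):
    return -abs(k1 - 0.7) - abs(k2 - 1e-2)

best_score, best_k1, best_k2 = grid_search(dummy_evaluate)
```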

Contribution Weights and Convergence Analysis: Figure 3 (c)(d) presents the contribution weights of domain A and the convergence curves of our proposed model on the Office-31 [22] and Office-Caltech [18] datasets. The source-specific transferable perception module quantifies the transferability contributions of complementary information from multiple source domains to the target prediction. Furthermore, the accuracy on the Office-Caltech [18] dataset converges to a stable value after a few epochs, which demonstrates the convergence of our proposed CAiDA model.

6.3 Experiments on Office-Home Dataset

Performance Comparisons: As presented in Table 2, we conduct comparison experiments on the Office-Home [46] dataset to illustrate the effectiveness of our model. We make the following observations from Table 2: 1) Without access to source data, our model outperforms the representative multi-source domain adaptation methods [41, 47, 55, 60, 62] by 3.5%–5.6% in terms of average accuracy. 2) The confident pseudo label generator enables our proposed model to perform better than [1], which verifies the superiority of our model for multi-source-free domain adaptation. 3) Compared with [25, 30, 33, 57], our model improves the mean accuracy by a margin of 1.9%–5.5%. This validates the effectiveness of the source-specific transferable perception strategy and the class-relationship-aware consistency in narrowing the distribution discrepancy.

Table 3: Comparisons between our model and other competing methods on Digits-Five [41] dataset, where R denotes the rest of four domains except for the single target domain.

| Methods | Source Data | R → MM | R → MT | R → UP | R → SV | R → SY | Avg. |
|---|---|---|---|---|---|---|---|
| Source only [27] | Yes | 63.4 | 90.5 | 88.7 | 63.5 | 82.4 | 77.7 |
| MDAN [60] | Yes | 69.5 | 98.0 | 92.4 | 69.2 | 87.4 | 83.3 |
| DCTN [55] | Yes | 70.5 | 96.2 | 92.8 | 77.6 | 86.8 | 84.8 |
| M3SDA [41] | Yes | 72.8 | 98.4 | 96.1 | 81.3 | 89.6 | 87.7 |
| MDDA [62] | Yes | 78.6 | 98.8 | 93.9 | 79.3 | 89.7 | 88.1 |
| LtC-MSDA [47] | Yes | 85.6 | 99.0 | 98.3 | 83.2 | 93.0 | 91.8 |
| Source model only | No | 25.2 | 90.0 | 93.3 | 42.8 | 77.8 | 65.8 |
| BAIT [57] | No | 87.6 | 96.2 | 96.7 | 60.6 | 90.5 | 86.3 |
| PrDA [25] | No | 86.2 | 95.4 | 95.8 | 57.4 | 84.8 | 83.9 |
| SHOT [33] | No | 90.4 | 98.9 | 97.7 | 58.3 | 83.9 | 85.8 |
| MA [30] | No | 90.8 | 98.4 | 98.0 | 59.1 | 84.5 | 86.2 |
| DECISION [1] | No | 93.0 | 99.2 | 97.8 | 82.6 | 97.5 | 94.0 |
| Ours-w/o Ent | No | 92.1 | 97.3 | 96.0 | 80.7 | 96.3 | 92.5 |
| Ours-w/o Div | No | 91.7 | 97.0 | 96.8 | 82.2 | 96.5 | 92.8 |
| Ours-w/o Cls | No | 91.3 | 96.6 | 96.4 | 80.5 | 95.8 | 92.1 |
| Ours-w/o Crc | No | 92.8 | 98.2 | 98.1 | 82.8 | 97.7 | 93.9 |
| Ours | No | 93.7 | 99.1 | 98.6 | 83.3 | 98.1 | 94.6 |

Ablation Studies: As shown in Table 2, we conduct ablation studies of our model on the Office-Home [46] dataset to illustrate the rationality of all designed modules. Compared with Ours, the performance of Ours-w/o Ent, Ours-w/o Div, Ours-w/o Cls and Ours-w/o Crc degrades by 1.6%, 1.9%, 2.4% and 0.6% in terms of average accuracy, respectively. This validates that all designed modules cooperate well to address the MSFDA task. Moreover, the confident-anchor-induced pseudo label generator reduces the distribution discrepancy by mining confident pseudo labels.
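The quoted degradations follow directly from the Avg. column of Table 2. As a quick check, the short sketch below recomputes them; the dictionary simply copies the reported averages, and the variable names are illustrative.

```python
# Average accuracy (%) on Office-Home, copied from the Avg. column of Table 2.
full_model_avg = 76.2
ablation_avg = {
    "Ours-w/o Ent": 74.6,
    "Ours-w/o Div": 74.3,
    "Ours-w/o Cls": 73.8,
    "Ours-w/o Crc": 75.6,
}

# Drop in average accuracy when each loss term is removed.
drops = {name: round(full_model_avg - acc, 1) for name, acc in ablation_avg.items()}

for name, drop in drops.items():
    print(f"{name}: -{drop:.1f}%")
```

On these numbers, removing L_cls costs the most and removing L_crc the least.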

6.4 Experiments on Digits-Five Dataset

This subsection presents the ablation studies and the comparison experiments between our model and other competing methods on the Digits-Five [41] dataset, as shown in Table 3. We draw the following conclusions from the results in Table 3: 1) Our proposed model performs the best in terms of average accuracy when compared with the multi-source domain adaptation methods [1, 41, 47, 55, 60, 62] and the single-source-free adaptation methods [25, 30, 33, 57]. This performance improvement demonstrates the effectiveness of our model, even without access to source data. 2) Our model outperforms [25, 30, 33, 57] by 8.3%–10.7% in mean accuracy, since it automatically quantifies the contributions of different source domains to target adaptation. Compared with [1, 41, 47, 55, 60, 62], the confident-anchor-induced pseudo label generator and the class-relationship-aware consistency loss improve performance by mining confident pseudo labels and aligning soft confusion matrices across domains. 3) The performance degradation in the ablation studies (i.e., Ours-w/o Ent, Ours-w/o Div, Ours-w/o Cls and Ours-w/o Crc) validates that all proposed components are designed effectively and reasonably to explore transferable knowledge.
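To make the idea of mining confident pseudo labels concrete, the sketch below builds a group of confident target samples (anchors) and gives every unconfident sample the label of its nearest anchor in feature space. It is a rough illustration under stated assumptions, not the paper's exact generator: confidence is taken to be the maximum softmax probability, nearness is measured by cosine similarity, and the function name `anchor_pseudo_labels` and the `threshold` value are hypothetical.

```python
import numpy as np

def anchor_pseudo_labels(features, probs, threshold=0.9):
    """Assign pseudo labels via nearest confident anchors (illustrative sketch).

    features: (n, d) target features; probs: (n, c) soft predictions.
    threshold: hypothetical confidence cut-off, not a value from the paper.
    Returns (pseudo_labels, confident_mask).
    """
    features = np.asarray(features, dtype=float)
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    if not confident.any():
        return labels, confident  # no anchors: keep the model's own predictions

    # Cosine similarity between every sample and every confident anchor.
    norms = np.linalg.norm(features, axis=1, keepdims=True).clip(min=1e-12)
    unit = features / norms
    sim = unit @ unit[confident].T          # (n, num_anchors)
    nearest = sim.argmax(axis=1)            # index into the anchor group
    anchor_labels = labels[confident]

    pseudo = labels.copy()
    pseudo[~confident] = anchor_labels[nearest[~confident]]
    return pseudo, confident

# Toy example with made-up features and predictions.
feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
preds = [[0.95, 0.05], [0.03, 0.97], [0.45, 0.55], [0.60, 0.40]]
pseudo, confident = anchor_pseudo_labels(feats, preds)
```

In the toy example the last two samples are unconfident, and both have their raw predictions overridden by the label of the closest anchor.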

7 Conclusion and Future Work

This paper proposes a novel confident-anchor-induced multi-source-free domain adaptation (CAiDA) model to capture transferable knowledge from multiple source domains without access to source data. Specifically, a source-specific transferable perception module is developed to automatically weight the transferability contributions of the source domains. Meanwhile, we design a confident-anchor-induced pseudo label generator to mine confident pseudo labels for the target domain by establishing a confident anchor group, and develop a class-relationship-aware consistency loss to capture consistent inter-class relationships across domains. Theoretical analysis provides new perspectives on the highly confident pseudo labeling strategy, and gives theoretical support for the MSFDA task under mild assumptions. Extensive experiments illustrate the superiority of our proposed model. In the future, we will extend MSFDA to the multi-label [48, 49] or open-set [13, 37, 53] scenarios and use MSFDA techniques to study pandemics [38, 39].

Acknowledgments

This work was partially supported by National Natural Science Foundation of China under Grant 62003336; National Postdoctoral Innovative Talents Support Program under Grant BX20200353; Nature Foundation of Liaoning Province of China under Grant 2020-KF-11-01; and Australian Research Council Projects under Grant DP-180103424, DE-190101473, and IC-190100031.

References

[1] Sk Miraj Ahmed, Dripta S Raychaudhuri, Sujoy Paul, Samet Oymak, and Amit K. Roy-Chowdhury. Unsupervised multi-source domain adaptation without access to source data. In CVPR, pages 10103–10112, 2021.

[2] Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, and Errui Ding. Unsupervised multi-source domain adaptation for person re-identification. In CVPR, pages 12914–12923, 2021.

[3] Yogesh Balaji, Swami Sankaranarayanan, and Rama Chellappa. Metareg: Towards domain generalization using meta-regularization. In NeurIPS, volume 31, pages 1006–1016, 2018.

[4] Antonin Berthon, Bo Han, Gang Niu, Tongliang Liu, and Masashi Sugiyama. Confidence scores make instance-dependent label-noise learning possible. In ICML, volume 139, pages 825–836, 2021.

[5] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. In NeurIPS, pages 343–351, 2016.

[6] John Bridle, Anthony Heading, and David MacKay. Unsupervised classifiers, mutual information and phantom targets. In NeurIPS, volume 4, pages 1096–1101, 1992.

[7] Jiacheng Cheng, Tongliang Liu, Kotagiri Ramamohanarao, and Dacheng Tao. Learning with bounded instance and label-dependent label noise. In ICML, volume 119, pages 1789–1799, 2020.

[8] Haoang Chi, Feng Liu, Wenjing Yang, Long Lan, Tongliang Liu, Bo Han, William K. Cheung, and James T. Kwok. TOHAN: A one-step approach towards few-shot hypothesis adaptation. 2021.

[9] Jiahua Dong, Yang Cong, Gan Sun, and Dongdong Hou. Semantic-transferable weakly-supervised endoscopic lesions segmentation. In ICCV, pages 10711–10720, 2019.

[10] Jiahua Dong, Yang Cong, Gan Sun, Yuyang Liu, and Xiaowei Xu. Cscl: Critical semantic-consistent learning for unsupervised domain adaptation. In ECCV, volume 12353, pages 745–762, 2020.

[11] Jiahua Dong, Yang Cong, Gan Sun, Yunsheng Yang, Xiaowei Xu, and Zhengming Ding. Weakly-supervised cross-domain adaptation for endoscopic lesions segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 31:2020–2033, 2020.

[12] Jiahua Dong, Yang Cong, Gan Sun, Bineng Zhong, and Xiaowei Xu. What can be transferred: Unsupervised domain adaptation for endoscopic lesions segmentation. In CVPR, pages 4022–4031, 2020.

[13] Zhen Fang, Jie Lu, Anjin Liu, Feng Liu, and Guangquan Zhang. Learning bounds for open-set learning. In ICML, volume 139, pages 3122–3132, 2021.

[14] Zhen Fang, Jie Lu, Feng Liu, Junyu Xuan, and Guangquan Zhang. Open set domain adaptation: Theoretical bound and algorithm. IEEE Transactions on Neural Networks and Learning Systems, abs/1907.08375, 2019.

[15] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17:2096–2030, 2016.

[16] Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In ECCV, volume 9908, pages 597–613, 2016.

[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, volume 27, 2014.

[18] Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, pages 2066–2073, 2012.

[19] Jiang Guo, Darsh Shah, and Regina Barzilay. Multi-source domain adaptation with mixture of experts. In EMNLP, pages 4694–4703, 2018.

[20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.

[21] Judy Hoffman, Mehryar Mohri, and Ningshan Zhang. Algorithms and theory for multiple-source adaptation. In NeurIPS, volume 31, pages 8256–8266, 2018.

[22] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei Efros, and Trevor Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. In ICML, volume 80, pages 1989–1998, 2018.

[23] Han-Kai Hsu, Wei-Chih Hung, Hung-Yu Tseng, Chun-Han Yao, Yi-Hsuan Tsai, Maneesh Singh, and Ming-Hsuan Yang. Progressive domain adaptation for object detection. In CVPR Workshops, pages 738–746, 2019.

[24] Shatha Jaradat. Deep cross-domain fashion recommendation. In RECSYS, pages 407–410, 2017.

[25] Youngeun Kim, Donghyeon Cho, Priyadarshini Panda, and Sungeun Hong. Progressive domain adaptation from a source pre-trained model. arXiv preprint arXiv:2007.01524.

[26] Andreas Krause, Pietro Perona, and Ryan Gomes. Discriminative clustering by regularized information maximization. In NeurIPS, volume 23, pages 775–783, 2010.

[27] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998.

[28] Seungmin Lee, Dongwan Kim, Namil Kim, and Seong-Gyun Jeong. Drop to adapt: Learning discriminative features for unsupervised domain adaptation. In ICCV, pages 91–100, 2019.

[29] Keqiuyin Li, Jie Lu, Hua Zuo, and Guangquan Zhang. Multi-source contribution learning for domain adaptation. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2021.

[30] Rui Li, Qianfen Jiao, Wenming Cao, Hau-San Wong, and Si Wu. Model adaptation: Unsupervised domain adaptation without source data. In CVPR, pages 9638–9647, 2020.

[31] Xuefeng Li, Tongliang Liu, Bo Han, Gang Niu, and Masashi Sugiyama. Provably end-to-end label-noise learning without anchor points. In ICML, volume 139, pages 6403–6413, 2021.

[32] Yitong Li, Michael Murias, Geraldine Dawson, and David E Carlson. Extracting relationships by multidomain matching. In NeurIPS, volume 31, pages 6799–6810, 2018.

[33] Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In ICML, pages 6028–6039, 2020.

[34] Chuang Lin, Sicheng Zhao, Lei Meng, and Tat-Seng Chua. Multi-source domain adaptation for visual sentiment classification. AAAI, 34:2661–2668, 2020.

[35] Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, and Danica J. Sutherland. Learning deep kernels for non-parametric two-sample tests. In ICML, volume 119, pages 6316–6326, 2020.

[36] Tongliang Liu and Dacheng Tao. Classification with noisy labels by importance reweighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:447–461, 2016.

[37] Yadan Luo, Zijian Wang, Zi Huang, and Mahsa Baktashmotlagh. Progressive graph learning for open-set domain adaptation. In ICML, volume 119, pages 6468–6478, 2020.

[38] Qianqian Ma, Yang-Yu Liu, and Alex Olshevsky. Optimal lockdown for pandemic control. arXiv preprint arXiv:2010.12923, 2020.

[39] Qianqian Ma, Yang-Yu Liu, and Alex Olshevsky. Optimal vaccine allocation for pandemic stabilization. arXiv preprint arXiv:2109.04612, 2021.

[40] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Multiple source adaptation and the Rényi divergence. In UAI, pages 367–374, 2009.

[41] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In ICCV, pages 1406–1415, 2019.

[42] Zhen Qiu, Yifan Zhang, Hongbin Lin, Shuaicheng Niu, Yanxia Liu, Qing Du, and Mingkui Tan. Source-free domain adaptation via avatar prototype generation and adaptation. In IJCAI, pages 2921–2927, 2021.

[43] Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. In NeurIPS, volume 32, pages 113–124, 2019.

[44] Paolo Russo, Tatiana Tommasi, and Barbara Caputo. Towards multi-source adaptive semantic segmentation. In Image Analysis and Processing, pages 292–301, 2019.

[45] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9:2579–2605, 2008.

[46] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In CVPR, pages 5385–5394, 2017.

[47] Hang Wang, Minghao Xu, Bingbing Ni, and Wenjun Zhang. Learning to combine: Knowledge aggregation for multi-source domain adaptation. In ECCV, pages 727–744, 2020.

[48] Lichen Wang, Zhengming Ding, and Yun Fu. Generic multi-label annotation via adaptive graph and marginalized augmentation. ACM Transactions on Knowledge Discovery from Data, 16:1–20, 2021.

[49] Lichen Wang, Yunyu Liu, Can Qin, Gan Sun, and Yun Fu. Dual relation semi-supervised multi-label learning. In AAAI, volume 34, pages 6227–6234, 2020.

[50] Lixu Wang, Shichao Xu, Xiao Wang, and Qi Zhu. Eavesdrop the composition proportion of training labels in federated learning. arXiv preprint arXiv:1910.06044, 2019.

[51] Lixu Wang, Shichao Xu, Xiao Wang, and Qi Zhu. Towards class imbalance in federated learning. arXiv preprint arXiv:2008.06217, 2020.

[52] Ximei Wang, Ying Jin, Mingsheng Long, Jianmin Wang, and Michael I Jordan. Transferable normalization: Towards improving transferability of deep neural networks. In NeurIPS, volume 32, pages 1951–1961, 2019.

[53] Zijian Wang, Yadan Luo, Ruihong Qiu, Zi Huang, and Mahsa Baktashmotlagh. Learning to diversify for single domain generalization. In ICCV, 2021.

[54] Songhua Wu, Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Nannan Wang, Haifeng Liu, and Gang Niu. Class2simi: A noise reduction perspective on learning with noisy labels. In ICML, volume 139, pages 11285–11295, 2021.

[55] Ruijia Xu, Ziliang Chen, Wangmeng Zuo, Junjie Yan, and Liang Lin. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In CVPR, pages 3964–3973, 2018.

[56] Luyu Yang, Yogesh Balaji, Ser-Nam Lim, and Abhinav Shrivastava. Curriculum manager for source selection in multi-source domain adaptation. In ECCV, pages 608–624, 2020.

[57] Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, and Shangling Jui. Unsupervised domain adaptation without source data by casting a bait. arXiv preprint arXiv:2010.12427, 2020.

[58] Jun Zhang, Weien Zhou, Xianqi Chen, Wen Yao, and Lu Cao. Multisource selective transfer framework in multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, 24:424–438, 2020.

[59] Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, and Jie Lu. Clarinet: A one-step approach towards budget-friendly unsupervised domain adaptation. In IJCAI, pages 2526–2532, 2020.

[60] Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, Joao P Costeira, and Geoffrey J Gordon. Adversarial multiple source domain adaptation. In NeurIPS, volume 31, pages 8568–8579, 2018.

[61] Sicheng Zhao, Bo Li, Xiangyu Yue, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. Multi-source domain adaptation for semantic segmentation. In NeurIPS, volume 32, pages 7285–7298, 2019.

[62] Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. Multi-source distilling domain adaptation. AAAI, 34:12975–12983, 2020.

[63] Li Zhong, Zhen Fang, Feng Liu, Jie Lu, Bo Yuan, and Guangquan Zhang. How does the combined risk affect the performance of unsupervised domain adaptation approaches? In AAAI, pages 11079–11087, 2021.