Published as a conference paper at ICLR 2023

FAIRNESS AND ACCURACY UNDER DOMAIN GENERALIZATION

Thai-Hoang Pham, Xueru Zhang, Ping Zhang
The Ohio State University, Columbus, OH 43210, USA
{pham.375,zhang.12807,zhang.10631}@osu.edu

ABSTRACT

As machine learning (ML) algorithms are increasingly used in high-stakes applications, concerns have arisen that they may be biased against certain social groups. Although many approaches have been proposed to make ML models fair, they typically rely on the assumption that the data distributions in training and deployment are identical. Unfortunately, this assumption is commonly violated in practice, and a model that is fair during training may lead to unexpected outcomes during its deployment. Although the problem of designing robust ML models under dataset shifts has been widely studied, most existing works focus only on the transfer of accuracy. In this paper, we study the transfer of both fairness and accuracy under domain generalization, where the data at test time may be sampled from never-before-seen domains. We first develop theoretical bounds on the unfairness and expected loss at deployment, and then derive sufficient conditions under which fairness and accuracy can be perfectly transferred via invariant representation learning. Guided by this, we design a learning algorithm such that fair ML models learned with training data still have high fairness and accuracy when deployment environments change. Experiments on real-world data validate the proposed algorithm. Model implementation is available at https://github.com/pth1993/FATDM.

1 INTRODUCTION

Machine learning (ML) algorithms trained with real-world data may have inherent bias and exhibit discrimination against certain social groups. To address unfairness in ML, existing studies have proposed many fairness notions and developed approaches to learning models that satisfy these notions. However, these works are based on an implicit assumption that the data distributions in training and deployment are the same, so that fair models learned from training data can be deployed to make fair decisions on testing data. Unfortunately, this assumption is commonly violated in real-world applications such as healthcare; e.g., it was shown that most US patient data for training ML models are from CA, MA, and NY, with almost no representation from the other 47 states (Kaushal et al., 2020). Because of the distribution shifts between training and deployment, a model that is accurate and fair during training may behave in an unexpected way and exhibit poor performance during deployment. Therefore, it is critical to account for distribution shifts and learn fair models that are robust to potential changes in deployment environments.

The problem of learning models under distribution shifts has been extensively studied in the literature and is typically referred to as domain adaptation/generalization, where the goal is to learn models on source domain(s) that can be generalized to a different (but related) target domain. Specifically, domain adaptation requires access to (unlabeled) data from the target domain at training time, and the learned model can only be used at a specific target domain.
In contrast, domain generalization considers a more general setting in which the target domain data are inaccessible during training; instead, it assumes there exists a set of source domains based on which the learned model can be generalized to an unseen, novel target domain. For both problems, most studies focus only on the generalization of accuracy across domains without considering fairness, e.g., by theoretically examining the relations between accuracy at target and source domains (Mansour et al., 2008; 2009; Hoffman et al., 2018; Zhao et al., 2018; Phung et al., 2021; Deshmukh et al., 2019; Muandet et al., 2013; Blanchard et al., 2021; Albuquerque et al., 2019; Ye et al., 2021; Sicilia et al., 2021; Shui et al., 2022) and/or developing practical methods (Albuquerque et al., 2019; Zhao et al., 2020; Li et al., 2018a; Sun & Saenko, 2016; Ganin et al., 2016; Ilse et al., 2020; Nguyen et al., 2021). To the best of our knowledge, only Chen et al. (2022); Singh et al. (2021); Coston et al. (2019); Rezaei et al. (2021); Oneto et al. (2019); Madras et al. (2018); Schumann et al. (2019); Yoon et al. (2020) considered the transfer of fairness across domains. However, all of them focused on domain adaptation, and many also imposed rather strong assumptions on distributional shifts (e.g., covariate shift (Singh et al., 2021; Coston et al., 2019; Rezaei et al., 2021), demographic shift (Giguere et al., 2022), prior probability shift (Biswas & Mukherjee, 2021)) that may be violated in practice. Among them, most focused on empirically examining how fairness properties are affected under distributional shifts, whereas theoretical understandings are less studied (Schumann et al., 2019; Yoon et al., 2020). Details and more related works are in Appendix A.

Figure 1: An example of domain generalization in healthcare: a (fair) ML model trained with patient data from source domains (e.g., CA, NY) can be deployed in other states (an unseen target domain) while maintaining high accuracy/fairness.

In this paper, we study the transfer of both fairness and accuracy in domain generalization via invariant representation learning, where the data in the target domain are unknown and inaccessible during training. A motivating example is shown in Figure 1. Specifically, we first establish a new theoretical framework that develops interpretable bounds on accuracy/fairness at a target domain under domain generalization, and then identify sufficient conditions under which fairness/accuracy can be perfectly transferred to an unseen target domain. Importantly, our theoretical bounds are fundamentally different from existing bounds: they are better connected with practical algorithmic design, i.e., our bounds are aligned with the objective of adversarial learning-based algorithms, a method widely used in domain generalization. Inspired by the theoretical findings, we propose Fairness and Accuracy Transfer by Density Matching (FATDM), a two-stage learning framework such that the representations and fair model learned with source domain data generalize well to an unseen target domain. Lastly, we conduct experiments on real-world data; the empirical results show that fair ML models trained with our method still attain high accuracy and fairness when deployment environments differ from training.
Our main contributions and findings are summarized as follows:
- We consider the transfer of both accuracy and fairness in domain generalization. To the best of our knowledge, this is the first work studying domain generalization with fairness considerations.
- We develop upper bounds for the expected loss (Thm. 1) and unfairness (Thm. 3) at target domains. Notably, our bounds are significantly different from existing bounds, as discussed in Appendix A. We also develop a lower bound for the expected loss (Thm. 2); it indicates an inherent trade-off of existing methods that learn marginally invariant representations for domain generalization.
- We identify sufficient conditions under which fairness and accuracy can be perfectly transferred from source domains to target domains using invariant representation learning (Thm. 4).
- We propose a two-stage training framework (based on Thm. 5) that learns models in source domains (Sec. 4) and can generalize both accuracy and fairness to target domains.
- We conduct experiments on real-world data to validate the effectiveness of the proposed method.

2 PROBLEM FORMULATION

Notations. Let $\mathcal{X}$, $\mathcal{A}$, and $\mathcal{Y}$ denote the space of features, sensitive attribute (distinguishing different groups, e.g., race/gender), and label, respectively. Let $\mathcal{Z}$ be the representation space induced from $\mathcal{X}$ by a representation mapping $g: \mathcal{X} \to \mathcal{Z}$. We use $X, A, Y, Z$ to denote random variables that take values in $\mathcal{X}, \mathcal{A}, \mathcal{Y}, \mathcal{Z}$, and $x, a, y, z$ their realizations. A domain $D$ is specified by a distribution $P_D: \mathcal{X} \times \mathcal{A} \times \mathcal{Y} \to [0, 1]$ and a labeling function $f_D: \mathcal{X} \to \Delta_{\mathcal{Y}}$, where $\Delta_{\mathcal{Y}}$ is the probability simplex over $\mathcal{Y}$. Similarly, let $h_D: \mathcal{Z} \to \Delta_{\mathcal{Y}}$ be a labeling function from the representation space for domain $D$. Note that $f_D$, $h_D$, $g$ are stochastic functions and $f_D = h_D \circ g$.¹ For simplicity, we use $P^V_D$ (or $P^{V|U}_D$) to denote the induced marginal (or conditional) distribution of variable $V$ (given $U$) in domain $D$.

¹A deterministic labeling function is a special case in which it follows a Dirac delta distribution in $\Delta_{\mathcal{Y}}$.

Error metric. Consider a hypothesis $\hat{f} = \hat{h} \circ g: \mathcal{X} \to \Delta_{\mathcal{Y}}$, where $\hat{h}: \mathcal{Z} \to \Delta_{\mathcal{Y}}$ is the hypothesis used directly in the representation space. Denote $\hat{f}(x)_y$ as the element in the $y$-th dimension, which predicts the probability that label $Y = y$ given $X = x$. Then the expected error of $\hat{f}$ in domain $D$ is defined as $\epsilon^{Acc}_D(\hat{f}) = \mathbb{E}_D[L(\hat{f}(X), Y)]$ for some loss function $L: \Delta_{\mathcal{Y}} \times \mathcal{Y} \to \mathbb{R}_+$ (e.g., 0-1 loss, cross-entropy loss). Similarly, define the expected error of $\hat{h}$ in the representation space as $\epsilon^{Acc}_D(\hat{h}) = \mathbb{E}_D[L(\hat{h}(Z), Y)]$. Note that most existing works (Albuquerque et al., 2019; Zhao et al., 2018) focus on optimizing $\epsilon^{Acc}_D(\hat{h})$, while our goal is to attain high accuracy in the input space, i.e., low $\epsilon^{Acc}_D(\hat{f})$.

Unfairness metric. We focus on group fairness notions (Makhlouf et al., 2021) that require certain statistical measures to be equalized across different groups; many of them can be formulated as (conditional) independence statements between the random variables $\hat{f}(X)$, $A$, $Y$: e.g., demographic parity ($\hat{f}(X) \perp A$: the likelihood of a positive outcome is the same across different groups) (Dwork et al., 2012), equalized odds ($\hat{f}(X) \perp A \mid Y$: true positive rate (TPR) and false positive rate (FPR) are the same across different groups), and equal opportunity ($\hat{f}(X) \perp A \mid Y = 1$ when $\mathcal{Y} = \{0, 1\}$: TPR is the same across different groups) (Hardt et al., 2016).
In this paper, we present results under equalized odds (EO) fairness with binary $\mathcal{Y} = \{0, 1\}$ and $\mathcal{A} = \{0, 1\}$, while all the results (e.g., methods, analysis) can be generalized to multi-class labels, multiple protected attributes, and other fairness notions. Given a hypothesis $\hat{f} = \hat{h} \circ g: \mathcal{X} \to \Delta_{\mathcal{Y}}$, the violation of EO in domain $D$ can be measured as
$$\epsilon^{EO}_D(\hat{f}) = \sum_{y \in \mathcal{Y}} \mathcal{D}\left(P^{\hat{f}(X)_1 | Y=y, A=0}_D \,\middle\|\, P^{\hat{f}(X)_1 | Y=y, A=1}_D\right)$$
for some distance metric $\mathcal{D}(\cdot \| \cdot)$.

Problem setup. Consider a problem of domain generalization where a learning algorithm has access to data $\{(x_k, a_k, y_k, d_k)\}_{k=1}^m$ sampled from a set of $N$ source domains $\{D^i_S\}_{i \in [N]}$, where $d_k$ is the domain label and $[N] = \{1, \dots, N\}$. Our goal is to learn a representation mapping $g: \mathcal{X} \to \mathcal{Z}$ and a fair model $\hat{h}: \mathcal{Z} \to \Delta_{\mathcal{Y}}$ trained on source domains such that the model $\hat{f} = \hat{h} \circ g$ can be generalized to an unseen target domain $D_T$ in terms of both accuracy and fairness. Specifically, we investigate under what conditions and by what algorithms we can guarantee that attaining high accuracy and fairness at source domains $\{D^i_S\}_{i=1}^N$ implies small $\epsilon^{Acc}_{D_T}(\hat{f})$ and $\epsilon^{EO}_{D_T}(\hat{f})$ at the unknown target domain.

3 THEORETICAL RESULTS

In this section, we present results on the transfer of accuracy/fairness under domain generalization via domain-invariant learning (proofs are shown in Appendix E). We first examine, for any model $\hat{h}: \mathcal{Z} \to \Delta_{\mathcal{Y}}$ and any representation mapping $g: \mathcal{X} \to \mathcal{Z}$, how the accuracy/fairness attained at source domains $\{D^i_S\}_{i=1}^N$ can be affected when $\hat{f} = \hat{h} \circ g$ is deployed at any target domain $D_T$. Specifically, we can bound the error and unfairness at any target domain based on the source domains. Before presenting the results, we introduce the discrepancy measure used for measuring the dissimilarity between domains.

Discrepancy measure. We adopt the Jensen-Shannon (JS) distance (Endres & Schindelin, 2003) to measure the dissimilarity between two distributions. Formally, the JS distance between distributions $P$ and $P'$ is defined as $d_{JS}(P, P') := \sqrt{D_{JS}(P \| P')}$, where $D_{JS}(P \| P') := \frac{1}{2} D_{KL}\left(P \,\middle\|\, \frac{P + P'}{2}\right) + \frac{1}{2} D_{KL}\left(P' \,\middle\|\, \frac{P + P'}{2}\right)$ is the JS divergence defined based on the Kullback-Leibler (KL) divergence $D_{KL}(\cdot \| \cdot)$. Note that unlike the KL divergence, the JS divergence is symmetric and bounded: $0 \le D_{JS}(P \| P') \le 1$. While different discrepancy measures such as the $\mathcal{H}$ and $\mathcal{H}\Delta\mathcal{H}$ divergences (Ben-David et al., 2010) (definitions are given in Appendix A) were used in prior works, we particularly consider the JS distance because (1) it is aligned with the training objective of the discriminator in generative adversarial networks (GAN) (Goodfellow et al., 2014), and many existing methods (Ganin et al., 2016; Albuquerque et al., 2019) for invariant representation learning are built on the GAN framework; (2) the $\mathcal{H}$ and $\mathcal{H}\Delta\mathcal{H}$ divergences are limited to settings where the labeling functions $f_D$ are deterministic (Ben-David et al., 2010; Albuquerque et al., 2019; Zhao et al., 2018). In contrast, our bounds admit stochastic labeling functions. The limitations of other discrepancy measures and existing bounds are discussed in detail in Appendix A.
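As a quick illustration of this discrepancy measure, below is a minimal NumPy sketch (ours, not from the paper) that computes the JS divergence and JS distance between two discrete distributions, using base-2 logarithms so that $0 \le D_{JS} \le 1$; all function names are hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL(p || q) between discrete distributions (base-2 logs)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence(p, q):
    """JS divergence: symmetric and bounded in [0, 1] with base-2 logs."""
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_distance(p, q):
    """JS distance d_JS = sqrt(D_JS); a metric, unlike the KL divergence."""
    return np.sqrt(js_divergence(p, q))

p = np.array([1/3, 2/3])
q = np.array([2/3, 1/3])
print(js_divergence(p, q))  # ~0.082
print(js_distance(p, q))    # ~0.286
```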
Theorem 1 (Upper bound: accuracy) For any hypothesis $\hat{h}: \mathcal{Z} \to \Delta_{\mathcal{Y}}$, any representation mapping $g: \mathcal{X} \to \mathcal{Z}$, and any loss function $L: \Delta_{\mathcal{Y}} \times \mathcal{Y} \to \mathbb{R}_+$ that is upper bounded by $C$, the expected error of $\hat{f} = \hat{h} \circ g: \mathcal{X} \to \Delta_{\mathcal{Y}}$ at any unseen target domain $D_T$ is upper bounded:²
$$\epsilon^{Acc}_{D_T}(\hat{f}) \le \underbrace{\frac{1}{N} \sum_{i=1}^N \epsilon^{Acc}_{D^i_S}(\hat{f})}_{\text{term (i)}} + \underbrace{\sqrt{2}\,C \min_{i \in [N]} d_{JS}\left(P^{X,Y}_{D_T}, P^{X,Y}_{D^i_S}\right)}_{\text{term (ii)}} + \underbrace{\sqrt{2}\,C \max_{i,j \in [N]} d_{JS}\left(P^{Z,Y}_{D^i_S}, P^{Z,Y}_{D^j_S}\right)}_{\text{term (iii)}} \quad (1)$$

The upper bound in Eq. (1) is interpretable and has three terms: term (i) is the averaged error over source domains in the input space; term (ii) is the discrepancy between the target domain and the source domains in the input space; term (iii) is the discrepancy between the source domains in the representation space.³ It provides guidance on learning a proper representation mapping $g: \mathcal{X} \to \mathcal{Z}$: to ensure a small error $\epsilon^{Acc}_{D_T}(\hat{f})$ at the target domain, we shall learn representations such that the upper bound of $\epsilon^{Acc}_{D_T}(\hat{f})$ is minimized. Because term (ii) depends on the unknown target domain $D_T$ and is evaluated in the input space $\mathcal{X} \times \mathcal{Y}$, it is fixed and out of our control during training; we can only focus on term (i) and term (iii), i.e., learn representations $Z$ such that the errors at source domains $\epsilon^{Acc}_{D^i_S}(\hat{f})$ and the discrepancy between source domains in the representation space $d_{JS}(P^{Z,Y}_{D^i_S}, P^{Z,Y}_{D^j_S})$ are minimized.

Corollary 1.1 $\forall i, j$, the JS distance between $P^{Z,Y}_{D^i_S}$ and $P^{Z,Y}_{D^j_S}$ in Eq. (1) can be decomposed as
$$d_{JS}\left(P^{Z,Y}_{D^i_S}, P^{Z,Y}_{D^j_S}\right) \le d_{JS}\left(P^Y_{D^i_S}, P^Y_{D^j_S}\right) + \sqrt{2}\, \mathbb{E}_{y \sim P^Y_{D^{i,j}_S}}\left[d_{JS}\left(P^{Z|Y=y}_{D^i_S}, P^{Z|Y=y}_{D^j_S}\right)\right], \quad \text{where } P^Y_{D^{i,j}_S} = \frac{1}{2}\left(P^Y_{D^i_S} + P^Y_{D^j_S}\right).$$

Our algorithm in Sec. 4 is designed based on the above decomposition: because $P^Y_{D^i_S}$ solely depends on source domain $D^i_S$, we learn representations by minimizing $d_{JS}(P^{Z|Y}_{D^i_S}, P^{Z|Y}_{D^j_S})$, $\forall i, j$. Combining Thm. 1 and Corollary 1.1, to ensure high accuracy at an unseen target domain $D_T$, we learn the representation mapping $g$ and model $\hat{h}$ such that $P^{Z|Y}_{D^i_S}$ is invariant across source domains, while $\hat{f} = \hat{h} \circ g$ attains high accuracy at source domains. Note that unlike our method, many existing works (Phung et al., 2021; Albuquerque et al., 2019; Ganin et al., 2016) suggest that to ensure high accuracy in domain generalization, the representation mapping $g$ should be learned such that $P^Z_{D^i_S}$ is the same across domains, i.e., small $d_{JS}(P^Z_{D^i_S}, P^Z_{D_T})$. However, we show that a domain-invariant $P^Z_{D^i_S}$ may adversely increase the error at the target domain, as indicated in Thm. 2 below.

Theorem 2 (Lower bound: accuracy) Suppose $L(\hat{f}(x), y) = \sum_{\hat{y} \in \mathcal{Y}} \hat{f}(x)_{\hat{y}} L(\hat{y}, y)$, where the function $L: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ is lower bounded by $c$ when $\hat{y} \ne y$ and is $0$ when $\hat{y} = y$. If $d_{JS}(P^Y_{D^i_S}, P^Y_{D_T}) \ge d_{JS}(P^Z_{D^i_S}, P^Z_{D_T})$, $\forall i$, then the expected error of $\hat{f}$ at source and target domains is lower bounded:
$$\frac{1}{N} \sum_{i=1}^N \epsilon^{Acc}_{D^i_S}(\hat{f}) + \epsilon^{Acc}_{D_T}(\hat{f}) \ge \frac{c}{4 |\mathcal{Y}| N} \sum_{i=1}^N \left( d_{JS}\left(P^Y_{D^i_S}, P^Y_{D_T}\right) - d_{JS}\left(P^Z_{D^i_S}, P^Z_{D_T}\right) \right)^4. \quad (2)$$

The above lower bound shows an inherent trade-off of approaches that minimize $d_{JS}(P^Z_{D^i_S}, P^Z_{D_T})$ when learning representations. Specifically, with a domain-invariant $P^Z_{D^i_S}$, the right-hand side of Eq. (2) may increase, resulting in an increased error $\epsilon^{Acc}_{D_T}(\hat{f})$ at the target domain.

²The condition on the bounded loss is mild and can be satisfied by many loss functions. For example, the cross-entropy loss can be bounded by modifying the softmax output from $(p_1, p_2, \dots, p_{|\mathcal{Y}|})$ to $(\hat{p}_1, \hat{p}_2, \dots, \hat{p}_{|\mathcal{Y}|})$, where $\hat{p}_i = p_i (1 - \exp(-C) |\mathcal{Y}|) + \exp(-C)$, $\forall i \in [|\mathcal{Y}|]$.

³In fact, a tighter upper bound for the loss at the target domain can be established using the strong data processing inequality (Polyanskiy & Wu, 2017), as detailed in Appendix D.
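Footnote 2's construction can be verified directly. A small sketch (ours, not the paper's code, with hypothetical names) showing that the modified softmax output still sums to one while keeping every per-class cross-entropy loss below $C$:

```python
import numpy as np

def bound_softmax(p, C):
    """Shift softmax outputs so that every class probability is >= exp(-C);
    the cross-entropy loss -log(p_hat_y) is then at most C."""
    k = len(p)
    assert C >= np.log(k), "requires exp(-C) * |Y| <= 1"
    return p * (1 - np.exp(-C) * k) + np.exp(-C)

p = np.array([0.999, 0.001, 0.0])  # raw softmax output, |Y| = 3
p_hat = bound_softmax(p, C=5.0)
print(p_hat.sum())     # 1.0 (still a valid distribution)
print(-np.log(p_hat))  # every entry is <= 5.0
```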
Similar to the loss, the unfairness at the target domain can also be upper bounded, as presented in Thm. 3.

Theorem 3 (Upper bound: fairness) Consider a special case where the unfairness measure is defined as the distance between the means of two distributions:
$$\epsilon^{EO}_D(\hat{f}) = \sum_{y \in \{0,1\}} \left| \mathbb{E}_D\left[\hat{f}(X)_1 \mid Y = y, A = 0\right] - \mathbb{E}_D\left[\hat{f}(X)_1 \mid Y = y, A = 1\right] \right|.$$
Then the unfairness at any unseen target domain $D_T$ is upper bounded:
$$\epsilon^{EO}_{D_T}(\hat{f}) \le \frac{1}{N} \sum_{i=1}^N \epsilon^{EO}_{D^i_S}(\hat{f}) + \sqrt{2} \min_{i \in [N]} \sum_{y \in \{0,1\}} \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=y,A=a}_{D_T}, P^{X|Y=y,A=a}_{D^i_S}\right) + \sqrt{2} \max_{i,j \in [N]} \sum_{y \in \{0,1\}} \sum_{a \in \{0,1\}} d_{JS}\left(P^{Z|Y=y,A=a}_{D^i_S}, P^{Z|Y=y,A=a}_{D^j_S}\right).$$

Similar to Thm. 1, the upper bound in Thm. 3 also has three terms, and the second term is out of our control during training because it depends on the unseen target domain and is defined in the input space. Therefore, to maintain fairness at the target domain $D_T$, we learn the representation mapping $g$ and model $\hat{h}$ such that $P^{Z|Y,A}_{D^i_S}$ is invariant across source domains, while $\hat{f} = \hat{h} \circ g$ attains high fairness at source domains.

The results above characterize the relations between accuracy/fairness at any target and source domains under any representation mapping $g$ and model $\hat{h}$. Next, we identify conditions under which the accuracy/fairness attained at source domains can be perfectly transferred to a target domain.

Theorem 4 (Sufficient condition for perfect transfer) Consider $N$ source domains $\{D^i_S\}_{i=1}^N$ and an unseen target domain $D_T$. Define the set of source-domain mixtures $\Lambda = \{D_t : P_{D_t} = \sum_{i=1}^N \pi_i P_{D^i_S}, \{\pi_i\} \in \Delta_{N-1}\}$.
1. (Transfer of fairness) $\forall D_T \in \Lambda$, if $g$ is a mapping under which $P^{Z|Y,A}_{D^i_S}$ is the same across all source domains, then $\epsilon^{EO}_{D^i_S}(\hat{h}) = \epsilon^{EO}_{D_T}(\hat{h}) = \epsilon^{EO}_{D^i_S}(\hat{f}) = \epsilon^{EO}_{D_T}(\hat{f})$, $\forall i$.
2. (Transfer of accuracy) $\forall D_T \in \Lambda$, if $P^Y_{D^i_S}$ is the same and $g$ is a mapping under which $P^{Z|Y}_{D^i_S}$ is the same across all source domains, then $\epsilon^{Acc}_{D^i_S}(\hat{h}) = \epsilon^{Acc}_{D_T}(\hat{h}) = \epsilon^{Acc}_{D^i_S}(\hat{f}) = \epsilon^{Acc}_{D_T}(\hat{f})$, $\forall i$.

Thm. 4 indicates the possibility of attaining perfect transfer of accuracy/fairness, and examples of such representation mappings are provided. Note that these results are consistent with Thm. 1 and Thm. 3, which also suggest learning the domain-invariant representations $P^{Z|Y}_{D^i_S}$ and $P^{Z|Y,A}_{D^i_S}$.

4 PROPOSED ALGORITHM

The accuracy and fairness upper bounds in Sec. 3 shed light on designing a robust ML model that can preserve high accuracy and fairness on unseen target domains. Specifically, the model consists of a representation mapping $g: \mathcal{X} \to \mathcal{Z}$ and a classifier $\hat{h}: \mathcal{Z} \to \Delta_{\mathcal{Y}}$ such that (1) the prediction errors and unfairness of $\hat{f} = \hat{h} \circ g$ on source domains are minimized; and (2) the discrepancy of the learned conditional representations (i.e., $P^{Z|Y}_{D^i_S}$ and $P^{Z|Y,A}_{D^i_S}$) among source domains is minimized. That is,
$$\min_{g, \hat{h}} \; L_{cls}(g, \hat{h}) + \omega L_{fair}(g, \hat{h}) + \gamma L_{inv}(g) \quad (3)$$
where $L_{cls}$, $L_{fair}$, and $L_{inv}$ are expected losses that penalize incorrect classification, unfairness, and discrepancy among source domains, respectively. Hyper-parameters $\omega > 0$ and $\gamma > 0$ control the accuracy-fairness trade-off and the accuracy-invariant representation trade-off, respectively. The usage of these three losses is summarized in Table 1.

Table 1: Usage of the terms in Eq. (3) to guarantee fairness and accuracy in the target domain.

  $L_{cls}$: minimize $\epsilon^{Acc}_{D^i_S}$
  $L_{fair}$: minimize $\epsilon^{EO}_{D^i_S}$
  $L_{inv}$: minimize $d_{JS}(P^{Z|Y}_{D^i_S}, P^{Z|Y}_{D^j_S})$ and $d_{JS}(P^{Z|Y,A}_{D^i_S}, P^{Z|Y,A}_{D^j_S})$
  $L_{cls} + L_{fair} + L_{inv}$: minimize $\epsilon^{Acc}_{D_T}$ and $\epsilon^{EO}_{D_T}$
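For concreteness, the mean-distance EO measure in Thm. 3 is differentiable in the model outputs, so it can serve directly as the regularizer $L_{fair}$ in Eq. (3). Below is a hedged PyTorch sketch of this idea; the exact form of $L_{fair}$ used by the authors may differ (see Appendix B), and all names here are ours.

```python
import torch

def eo_regularizer(probs, y, a):
    """Mean-distance equalized-odds penalty:
    sum over y of | E[p | Y=y, A=0] - E[p | Y=y, A=1] |,
    where p is the predicted probability of the positive class."""
    loss = probs.new_zeros(())
    for label in (0, 1):
        g0 = probs[(y == label) & (a == 0)]
        g1 = probs[(y == label) & (a == 1)]
        if g0.numel() > 0 and g1.numel() > 0:  # guard empty groups in a minibatch
            loss = loss + (g0.mean() - g1.mean()).abs()
    return loss

# inside a training step (hypothetical names):
# total_loss = ce_loss + omega * eo_regularizer(probs, y, a) + gamma * inv_loss
```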
Adversarial learning framework (Goodfellow et al., 2014). $L_{inv}$ in Eq. (3) can be optimized directly with adversarial learning. This is because the training objective of the discriminator in a GAN is aligned with our goal of minimizing the JS distance between $P^{Z|Y}_{D^i_S}$ (or $P^{Z|Y,A}_{D^i_S}$) among source domains, as mentioned in Sec. 3. Specifically, define a set of discriminators $K = \{k_y : y \in \mathcal{Y}\} \cup \{k_{y,a} : y \in \mathcal{Y}, a \in \mathcal{A}\}$; each discriminator $k_y$ (resp. $k_{y,a}$) aims to distinguish whether a sample with label $y$ (resp. label $y$ and sensitive attribute $a$) comes from a particular domain (i.e., maximize $L_{inv}$). The representation mapping $g$ should be learned to increase the error of the discriminators (i.e., minimize $L_{inv}$). Therefore, the model and discriminators can be trained simultaneously by playing a two-player minimax game (i.e., $\min_g \max_K L_{inv}(g)$). Combined with the objective of minimizing prediction error and unfairness (i.e., $\min_{g,\hat{h}} L_{cls}(g, \hat{h}) + \omega L_{fair}(g, \hat{h})$), the overall learning objective is:
$$\min_{g, \hat{h}} \max_K \; L_{cls}(g, \hat{h}) + \omega L_{fair}(g, \hat{h}) + \gamma L_{inv}(g) \quad (4)$$
However, the above adversarial learning framework for learning domain-invariant representations may not work well when $|\mathcal{Y} \times \mathcal{A}|$ is large: as the label space and sensitive attribute space get larger, the number of discriminators to be learned increases and training can become highly unstable. A naive solution to this issue is to use a single discriminator for all $y \in \mathcal{Y}, a \in \mathcal{A}$. However, this would reduce the mutual information between the representations and the label/sensitive attribute, which may hurt accuracy. We thus propose another approach to learning domain-invariant representations.

Figure 2: 1D illustration of a domain-invariant representation. To transfer accuracy and fairness to target domains, we need to find a representation $z$ such that $P^{z|y}_{D^i_S}$ and $P^{z|y,a}_{D^i_S}$ are domain-invariant.

Proposed solution to learning invariant representations. For any domain $D$, we have:
$$P^{Z|y}_D = \int_{\mathcal{X}} P^{Z,x|y}_D \, dx = \int_{\mathcal{X}} P^{Z|x} P^{x|y}_D \, dx, \qquad P^{Z|y,a}_D = \int_{\mathcal{X}} P^{Z,x|y,a}_D \, dx = \int_{\mathcal{X}} P^{Z|x} P^{x|y,a}_D \, dx,$$
where $P^{Z|x}$ is domain-independent, so we drop $D$ from the subscript. Given any two source domains $D^i_S$ and $D^j_S$, in general $P^{X|y}_{D^i_S} \ne P^{X|y}_{D^j_S}$ and $P^{X|y,a}_{D^i_S} \ne P^{X|y,a}_{D^j_S}$, so it is nontrivial to achieve domain-invariant representations $P^{Z|y}_{D^i_S} = P^{Z|y}_{D^j_S}$ and $P^{Z|y,a}_{D^i_S} = P^{Z|y,a}_{D^j_S}$. However, if there exist invertible functions $m^y_{i,j}: \mathcal{X} \to \mathcal{X}$ and $m^{y,a}_{i,j}: \mathcal{X} \to \mathcal{X}$ that match the density functions of $X$ from $D^i_S$ to $D^j_S$ such that $P^{X|y}_{D^i_S} = P^{m^y_{i,j}(X)|y}_{D^j_S}$ and $P^{X|y,a}_{D^i_S} = P^{m^{y,a}_{i,j}(X)|y,a}_{D^j_S}$, and if we can find a representation $Z$ such that $P^{Z|x} = P^{Z|m^y_{i,j}(x)}$ and $P^{Z|x} = P^{Z|m^{y,a}_{i,j}(x)}$, then $\forall y \in \mathcal{Y}, a \in \mathcal{A}$, we have:
$$P^{Z|y}_{D^i_S} = \int_{\mathcal{X}} P^{Z|x} P^{x|y}_{D^i_S} \, dx = \int_{\mathcal{X}} P^{Z|x'} P^{x'|y}_{D^j_S} \, dx' = P^{Z|y}_{D^j_S},$$
$$P^{Z|y,a}_{D^i_S} = \int_{\mathcal{X}} P^{Z|x} P^{x|y,a}_{D^i_S} \, dx = \int_{\mathcal{X}} P^{Z|x''} P^{x''|y,a}_{D^j_S} \, dx'' = P^{Z|y,a}_{D^j_S},$$
where $x' = m^y_{i,j}(x)$ and $x'' = m^{y,a}_{i,j}(x)$. This observation suggests that to minimize the discrepancy of representation distributions among source domains, we can first find the density mapping functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$, $\forall y, a, i, j$, and then minimize the discrepancies between $P^{Z|x}$, $P^{Z|x'}$, and $P^{Z|x''}$, $\forall x$. This is formally shown in Thm. 5 below.
Theorem 5 If there exist invertible mappings $m^y_{i,j}$ and $m^{y,a}_{i,j}$ such that $P^{X|y}_{D^i_S} = P^{m^y_{i,j}(X)|y}_{D^j_S}$ and $P^{X|y,a}_{D^i_S} = P^{m^{y,a}_{i,j}(X)|y,a}_{D^j_S}$, $\forall y, a, i, j$, and if the representation mapping is of the form $g := P^{Z|x} = \mathcal{N}(\mu(x), \sigma^2 I_d)$, where $\mu(x)$ is a function of $x$ and $d$ is the dimension of the representation space $\mathcal{Z}$, then minimizing $d_{JS}(P^{Z|y}_{D^i_S}, P^{Z|y}_{D^j_S})$ and $d_{JS}(P^{Z|y,a}_{D^i_S}, P^{Z|y,a}_{D^j_S})$ can be reduced to minimizing $\|\mu(x) - \mu(m^y_{i,j}(x))\|_2$ and $\|\mu(x) - \mu(m^{y,a}_{i,j}(x))\|_2$, respectively.

Based on Thm. 5, we propose a two-stage learning approach, FATDM, as stated below.

Remark 1 (Fairness and Accuracy Transfer by Density Matching (FATDM)) Given the existence of density matching functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$ and a representation mapping $g := \mathcal{N}(\mu(x), \sigma^2 I_d)$, domain-invariant representations can be learned via a two-stage process: (i) find the mapping functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$; (ii) minimize the mean squared errors between $\mu(x)$ and $\mu(m^y_{i,j}(x))$, and between $\mu(x)$ and $\mu(m^{y,a}_{i,j}(x))$, $\forall i, j \in [N], x \in \mathcal{X}, y \in \mathcal{Y}, a \in \mathcal{A}$.

Figure 3: FATDM two-stage training. Stage 1 finds the density mapping functions across source domains; Stage 2 enforces domain-invariant representations.

Stage 1: learning mapping functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$ across source domains. Many approaches can be leveraged to estimate $m^y_{i,j}$ and $m^{y,a}_{i,j}$ from data. In our study, we adopt StarGAN (Choi et al., 2018) and CycleGAN (Zhu et al., 2017) as examples; both frameworks are widely used for multi-domain image-to-image translation. In our algorithm, we independently train two translation models, DensityMatch_Y and DensityMatch_{Y,A}, using StarGAN or CycleGAN, which learn $\{m^y_{i,j}\}_{y \in \mathcal{Y}, i,j \in [N]}$ and $\{m^{y,a}_{i,j}\}_{y \in \mathcal{Y}, a \in \mathcal{A}, i,j \in [N]}$, respectively. Specifically, DensityMatch_Y (or DensityMatch_{Y,A}) consists of a generator $G: \mathcal{X} \times [N] \times [N] \to \mathcal{X}$ and a discriminator $D: \mathcal{X} \to [N] \times \{0, 1\}$. The generator takes a real image $x$ and a pair of domain labels $i, j$ as input and generates a fake image; the discriminator aims to predict the domain label of the image generated by the generator and to distinguish whether it is fake or real. $G$ and $D$ are learned simultaneously by solving a minimax game, and their loss functions are specified in Appendix B. When training is completed, we obtain two optimal generators from DensityMatch_Y and DensityMatch_{Y,A}, denoted as $G_Y$ and $G_{Y,A}$. We use $G_Y(\cdot, i, j)$ (resp. $G_{Y,A}(\cdot, i, j)$) directly as the density mapping functions $\{m^y_{i,j}(\cdot)\}_{y \in \mathcal{Y}}$ (resp. $\{m^{y,a}_{i,j}(\cdot)\}_{y \in \mathcal{Y}, a \in \mathcal{A}}$).

Stage 2: learning domain-invariant representations. Given $G_Y$ and $G_{Y,A}$ learned in Stage 1, we learn the invariant representation $Z$ by finding $g: \mathcal{X} \to \mathcal{Z}$ such that $g := \mathcal{N}(\mu(x), \sigma^2 I_d)$ and the following is minimized:
$$L_{inv} = \mathbb{E}_{d, d', d'' \sim \{D^i_S\}_{i \in [N]}} \left[ L_{mse}(\mu(X), \mu(X')) + L_{mse}(\mu(X), \mu(X'')) \right] \quad (5)$$
where $d, d', d''$ are domain labels sampled from the source domains, $X$ is a feature sampled from domain $d$, $X' = G_Y(X, d, d')$, $X'' = G_{Y,A}(X, d, d'')$, and $L_{mse}$ is the mean squared error.
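To make Eq. (5) concrete, here is a minimal PyTorch sketch of the Stage-2 invariance loss; it assumes an encoder mu (the mean network of $g$) and pretrained generators with the signature G(x, src, dst), and all module and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def invariance_loss(mu, G_y, G_ya, x, d, num_domains):
    """Eq. (5): match mu(x) with mu of its density-mapped translations so that
    P(Z|Y) and P(Z|Y,A) become domain-invariant (the MSE reduction of Thm. 5)."""
    d1 = torch.randint(num_domains, d.shape, device=d.device)  # domains d'
    d2 = torch.randint(num_domains, d.shape, device=d.device)  # domains d''
    with torch.no_grad():  # generators are frozen after Stage 1
        x1 = G_y(x, d, d1)    # translation matching P(X|Y) across domains
        x2 = G_ya(x, d, d2)   # translation matching P(X|Y,A) across domains
    return F.mse_loss(mu(x), mu(x1)) + F.mse_loss(mu(x), mu(x2))
```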
The pseudo-code of our proposed model (FATDM) is given in Algorithm 1; the detailed architecture of FATDM is in Appendix B.

Algorithm 1: Fairness and Accuracy Transfer by Density Matching (FATDM)
Input: Training dataset $D_{train}$ from $N$ source domains $\{D^i_S\}_{i=1}^N$
Output: representation mapping $g$, classifier $\hat{h}$, density matching functions $G_Y$, $G_{Y,A}$
Procedure Density_Matching($D_{train}$):   /* the procedure for training $G_Y$ is similar and omitted */
    while training of DensityMatch_{Y,A} has not ended do
        Sample $y \in \mathcal{Y}$, $a \in \mathcal{A}$ and a data batch $B = \{x_k, d_k \mid a_k = a, y_k = y\}_{k=1}^{|B|}$ from $D_{train}$;
        Update $G_{Y,A}$ based on the objectives of the minimax game (Appendix B).
Procedure Invariant_Representation_Learning($D_{train}$, $G_Y$, $G_{Y,A}$):
    while training of FATDM has not ended do
        Sample a data batch $B = \{x_k, a_k, y_k, d_k\}_{k=1}^{|B|}$ from $D_{train}$;
        Sample lists of domain labels $\{d'_k\}_{k=1}^{|B|}$ and $\{d''_k\}_{k=1}^{|B|}$;
        Generate sets of artificial images $\{x'_k\}_{k=1}^{|B|}$ and $\{x''_k\}_{k=1}^{|B|}$ with $G_Y$ and $G_{Y,A}$;
        Update $g$, $\hat{h}$ by optimizing Eq. (3) with $L_{inv}$ defined in Eq. (5).

Remark 2 (Summary of theoretical results and proposed algorithm) Thm. 1 and Thm. 3 suggest a way to ensure high accuracy and fairness in the target domain: minimize the source error $\epsilon^{Acc}_{D^i_S}$ (i.e., $L_{cls}$ in Eq. (3)), the source unfairness $\epsilon^{EO}_{D^i_S}$ (i.e., $L_{fair}$ in Eq. (3)), and the discrepancies between source domains $d_{JS}(P^{Z|Y=y}_{D^i_S}, P^{Z|Y=y}_{D^j_S})$ and $d_{JS}(P^{Z|Y=y,A=a}_{D^i_S}, P^{Z|Y=y,A=a}_{D^j_S})$ (i.e., $L_{inv}$ in Eq. (3)). The common way to optimize Eq. (3) using adversarial learning (Eq. (4)) is not stable when $|\mathcal{Y} \times \mathcal{A}|$ is large. Thm. 5 states that instead of using adversarial learning, Eq. (3) can be optimized via two-stage learning: (i) find the mappings $m^y_{i,j}$ and $m^{y,a}_{i,j}$ (Density_Matching in Alg. 1) and (ii) minimize Eq. (3) with $L_{inv}$ defined in Eq. (5) (Invariant_Representation_Learning in Alg. 1).

5 EXPERIMENTS

We conduct experiments on the MIMIC-CXR database (Johnson et al., 2019), which includes 377,110 chest X-ray images associated with 227,827 imaging studies of 14 diseases performed at the Beth Israel Deaconess Medical Center. Importantly, these images are linked with the MIMIC-IV database (Johnson et al., 2021), which includes patient information such as age and race; these can serve as sensitive attributes for measuring unfairness. Based on the MIMIC-CXR and MIMIC-IV data, we construct two datasets on two diseases:

Cardiomegaly disease: we first extract all images related to Cardiomegaly disease and the corresponding labels (i.e., positive/negative) and sensitive attributes (i.e., male/female); then we partition the data into four domain-specific datasets based on age (i.e., [18, 40), [40, 60), [60, 80), [80, 100)). We consider age as the domain label because it captures the real scenario in which there are distribution shifts across patients of different ages.

Edema disease: we extract all images related to Edema disease and the corresponding labels (i.e., positive/negative) and sensitive attributes (i.e., age with ranges [18, 40), [40, 60), [60, 80), [80, 100)). Unlike the Cardiomegaly data, we construct the dataset for each domain by first sampling images from the Edema data and then applying a θ-degree counter-clockwise rotation, where θ ∈ {0°, 15°, 30°, 45°, 60°}. We consider the rotation degree as the domain label to model the scenario where there is rotational misalignment among images collected from different devices.

In the following, we focus on Cardiomegaly disease; the results for Edema disease are shown in Appendix C.
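As an illustration of the Edema setup above, a short sketch (ours) of the rotation-based domain construction and the leave-one-domain-out split, using torchvision; dataset loading and file paths are omitted:

```python
import torchvision.transforms.functional as TF

ANGLES = [0, 15, 30, 45, 60]  # one counter-clockwise rotation angle per domain

def make_domain(images, domain_idx):
    """Create one synthetic domain by rotating every chest X-ray."""
    return [TF.rotate(img, ANGLES[domain_idx]) for img in images]

def leave_one_domain_out(domains, target_idx):
    """Remaining domains are sources; the held-out one is the unseen target."""
    sources = [dom for i, dom in enumerate(domains) if i != target_idx]
    return sources, domains[target_idx]
```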
Baselines. We compare our method (i.e., FATDM-StarGAN and FATDM-CycleGAN) with existing methods for domain generalization, including empirical risk minimization, domain-invariant representation learning, and distributionally robust optimization, as detailed below.
- Empirical risk minimization (ERM): a baseline that treats all source domains as one domain.
- Domain-invariant representation learning: methods that aim to achieve invariance across source domains. We experiment with G2DM (Albuquerque et al., 2019), DANN (Ganin et al., 2016), CDANN (Li et al., 2018c), CORAL (Sun & Saenko, 2016), and IRM (Arjovsky et al., 2019). These models focus on accuracy transfer by enforcing the invariance of the distributions $P^Z_{D^i_S}$ or $P^{Z|Y}_{D^i_S}$.
- Distributionally robust optimization: methods that learn a model under the worst-case distribution in the hope that it generalizes well on test data. We experiment with GroupDRO (Sagawa et al., 2019), which minimizes the worst-case training loss over a set of pre-defined groups through regularization.
- ATDM: a variant of FATDM-StarGAN that solely focuses on accuracy transfer; i.e., we only enforce the invariance of $P^{Z|Y}_{D^i_S}$ during learning, similar to Nguyen et al. (2021).

The implementations of these models, except G2DM, are adapted from the DomainBed framework (Gulrajani & Lopez-Paz, 2020); for G2DM, we use the author-provided implementation. For all models, we use ResNet18 (He et al., 2016) as the backbone module of the representation mapping $g: \mathcal{X} \to \mathcal{Z}$, and the fairness constraint $L_{fair}$ is enforced as a regularization term added to the original objective functions.

Experiment setup. We follow the leave-one-domain-out setting, in which 3 domains are used for training and the remaining domain serves as the unseen target domain used for evaluation. Several metrics are considered to measure the unfairness and error of each model in the target domain:
- Error: cross-entropy loss (CE), misclassification rate (MR), 1 − AUROC, 1 − AUPR, and 1 − F1, where AUROC, AUPR, and F1 are the area under the receiver operating characteristic curve, the area under the precision-recall curve, and the F1 score, respectively.
- Unfairness: we consider both the equalized odds and equal opportunity fairness notions and adopt the mean distance (MD) and earth mover's distance (EMD) as the distance metric $\mathcal{D}(\cdot \| \cdot)$.

Fairness and accuracy on target domains. We first compare our method with the baselines in terms of the optimal trade-off (Pareto frontier) between accuracy and fairness on target domains under different metric pairs. Figure 4 shows the error-unfairness curves (as $\omega$ varies from 0 (no fairness constraint) to 10 (strong fairness constraint)), with 1 − AUROC and MR as error metrics and equalized odds (measured under the distance metrics MD and EMD) as the fairness notion; the results for other error metrics are similar and shown in Appendix C.

Figure 4: Fairness-accuracy trade-off (Pareto frontier) of FATDM-StarGAN, FATDM-CycleGAN, and baseline methods: error-unfairness curves are constructed by varying $\omega \in [0, 10]$, and the values of error and unfairness are normalized to [0, 1]. Lower-left points indicate that the model has a better fairness-accuracy trade-off (Pareto optimality).

Figure 5: Prediction performance (AUROC, AUPR, Accuracy, F1) of FATDM-StarGAN on the Cardiomegaly disease data when varying the hyper-parameter $\gamma$ at different levels of the fairness constraint $\omega$.
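The MD and EMD unfairness values reported in these figures can be computed from the group-conditional score distributions; below is a sketch using SciPy (one plausible implementation, not necessarily the paper's; names are ours):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def eo_violation(scores, y, a, metric="emd"):
    """Equalized-odds violation: sum over labels of the distance between the
    two group-conditional distributions of predicted positive-class scores."""
    total = 0.0
    for label in (0, 1):
        s0 = scores[(y == label) & (a == 0)]
        s1 = scores[(y == label) & (a == 1)]
        if metric == "md":   # mean distance
            total += abs(s0.mean() - s1.mean())
        else:                # earth mover's distance
            total += wasserstein_distance(s0, s1)
    return total
```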
Our observations are as follows: (1) As expected, there is a trade-off between fairness and accuracy: for all methods, increasing $\omega$ improves fairness but reduces accuracy. (2) Among all methods, the Pareto frontiers of FATDM-StarGAN and FATDM-CycleGAN are the bottom-leftmost, implying that our method attains a better fairness-accuracy trade-off than the baselines. (3) Although a fairness constraint is imposed during training for all methods, the fairness attained at source domains does not generalize well to the target domain under the other methods. These results validate our theorems and show that enforcing domain-invariant $P^{Z|Y}_{D^i_S}$ and $P^{Z|Y,A}_{D^i_S}$ when learning representations ensures the transfer of both accuracy and fairness. It is worth noting that on this dataset, a domain-invariant $P^{Z|Y}_{D^i_S}$ (accuracy transfer) does not imply a domain-invariant $P^{Z|Y,A}_{D^i_S}$ (fairness transfer). This is because the domain $D^i_S$ (i.e., age) is correlated with the label $Y$ (i.e., having the disease) and the sensitive attribute $A$ (i.e., gender), making the distribution $P^{Y,A}_{D^i_S}$ differ across domains.

Impact of the density mapping model. To investigate whether the performance gain of our method is due to a specific density mapping model, we adopt the StarGAN and CycleGAN architectures to learn the density mapping functions in our method and compare their performance. Figure 4 shows that FATDM-StarGAN and FATDM-CycleGAN achieve similar fairness-accuracy trade-offs at the target domains, and both outperform the baselines. This result shows that our method is not tied to a specific density mapping model and is broadly applicable to other architectures.

Impact of the invariant representation constraint. We also examine the impact of $L_{inv}$ on the performance of FATDM-StarGAN at target domains, where we vary the hyper-parameter $\gamma \in [0, 5 \times 10^2]$ at different levels of fairness (i.e., fixing $\omega = 1, 5, 10$) and examine how the prediction performance (i.e., AUROC, AUPR, accuracy, and F1) changes. Figure 5 shows that enforcing the domain-invariance constraint $L_{inv}$ helps transfer performance from source to target domains, and the $\gamma$ that attains the highest accuracy at the target domain can differ across levels of fairness. The results also indicate the fairness-accuracy trade-off: for any $\gamma$, enforcing a stronger fairness constraint (larger $\omega$) can hurt prediction performance.

6 CONCLUSION

In this paper, we theoretically and empirically demonstrate how to achieve fair and accurate predictions in unknown testing environments. To the best of our knowledge, our work provides the first theoretical analysis of the effectiveness of invariant representation learning in transferring both fairness and accuracy under domain generalization. In particular, we first propose upper bounds on the prediction error and unfairness in terms of the JS distance, and then design a two-stage learning method that minimizes these upper bounds by learning domain-invariant representations. Experiments on real-world clinical data demonstrate the effectiveness of our study.

REPRODUCIBILITY STATEMENT

The original chest X-ray images and the corresponding metadata can be downloaded from PhysioNet (https://physionet.org/content/mimic-cxr-jpg/2.0.0/; https://physionet.org/content/mimiciv/2.0/). Code for data processing and the proposed algorithms is in the supplementary materials. Technical details of the proposed algorithms and experimental settings are in Appendix B.
Additional experimental results are in Appendix C. Lemmas used in the proofs of the theorems in the main paper are in Appendix D. Complete proofs of the theorems in the main paper and the corresponding lemmas are in Appendix E.

ACKNOWLEDGEMENTS

This work was funded in part by the National Science Foundation under award number IIS-2145625, by the National Institutes of Health under award number UL1TR002733, and by The Ohio State University President's Research Excellence Accelerator Grant.

REFERENCES

Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. A reductions approach to fair classification. In International Conference on Machine Learning, pp. 60–69. PMLR, 2018.

Isabela Albuquerque, João Monteiro, Mohammad Darvishi, Tiago H Falk, and Ioannis Mitliagkas. Generalizing to unseen domains via distribution matching. arXiv preprint arXiv:1911.00804, 2019.

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.

Yahav Bechavod, Christopher Jung, and Steven Z Wu. Metric-free individual fairness in online learning. Advances in Neural Information Processing Systems, 33:11214–11225, 2020.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1):151–175, 2010.

Asia J Biega, Krishna P Gummadi, and Gerhard Weikum. Equity of attention: Amortizing individual fairness in rankings. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 405–414, 2018.

Arpita Biswas and Suvam Mukherjee. Ensuring fairness under prior probability shifts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 414–424, 2021.

Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee, and Clayton Scott. Domain generalization by marginal transfer learning. Journal of Machine Learning Research, 22:1–55, 2021.

Fabio M Carlucci, Antonio D'Innocente, Silvia Bucci, Barbara Caputo, and Tatiana Tommasi. Domain generalization by solving jigsaw puzzles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2229–2238, 2019.

Yatong Chen, Reilly Raab, Jialu Wang, and Yang Liu. Fairness transferability subject to bounded distribution shift. arXiv preprint arXiv:2206.00129, 2022.

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797, 2018.

Amanda Coston, Karthikeyan Natesan Ramamurthy, Dennis Wei, Kush R Varshney, Skyler Speakman, Zairah Mustahsan, and Supriyo Chakraborty. Fair transfer learning with missing protected attributes. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 91–98, 2019.

Aniket Anand Deshmukh, Yunwen Lei, Srinagesh Sharma, Urun Dogan, James W Cutler, and Clayton Scott. A generalization error bound for multi-class domain generalization. arXiv preprint arXiv:1905.10392, 2019.

Luc Devroye, Abbas Mehrabian, and Tommy Reddad. The total variation distance between high-dimensional Gaussians. arXiv preprint arXiv:1810.08693, 2018.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness.
In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226, 2012.

D.M. Endres and J.E. Schindelin. A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7):1858–1860, 2003. doi: 10.1109/TIT.2003.813506.

Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pp. 1180–1189. PMLR, 2015.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

Stephen Giguere, Blossom Metevier, Bruno Castro da Silva, Yuriy Brun, Philip S Thomas, and Scott Niekum. Fairness guarantees under demographic shift. In International Conference on Learning Representations, 2022.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.

Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. arXiv preprint arXiv:2007.01434, 2020.

Swati Gupta and Vijay Kamble. Individual fairness in hindsight. Journal of Machine Learning Research, 22(144):1–35, 2021.

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 2016.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Judy Hoffman, Mehryar Mohri, and Ningshan Zhang. Algorithms and theory for multiple-source adaptation. Advances in Neural Information Processing Systems, 31, 2018.

Weihua Hu, Gang Niu, Issei Sato, and Masashi Sugiyama. Does distributionally robust supervised learning give robust classifiers? In International Conference on Machine Learning, pp. 2029–2037. PMLR, 2018.

Zeyi Huang, Haohan Wang, Eric P Xing, and Dong Huang. Self-challenging improves cross-domain generalization. In European Conference on Computer Vision, pp. 124–140. Springer, 2020.

Maximilian Ilse, Jakub M Tomczak, Christos Louizos, and Max Welling. DIVA: Domain invariant variational autoencoders. In Medical Imaging with Deep Learning, pp. 322–348. PMLR, 2020.

Seogkyu Jeon, Kibeom Hong, Pilhyeon Lee, Jewook Lee, and Hyeran Byun. Feature stylization and domain-aware contrastive learning for domain generalization. In Proceedings of the 29th ACM International Conference on Multimedia, pp. 22–31, 2021.

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV. https://doi.org/10.13026/s6n6-xd98, 2021.

Alistair EW Johnson, Tom J Pollard, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G Mark, Seth J Berkowitz, and Steven Horng. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042, 2019.

Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

Amit Kaushal, Russ Altman, and Curt Langlotz. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA, 324(12):1212–1213, 2020.
Daehee Kim, Youngjun Yoo, Seunghyun Park, Jinkyu Kim, and Jaekoo Lee. SelfReg: Self-supervised contrastive regularization for domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9619–9628, 2021.

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, et al. WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, pp. 5637–5664. PMLR, 2021.

David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (REx). In International Conference on Machine Learning, pp. 5815–5826. PMLR, 2021.

Haoliang Li, Sinno Jialin Pan, Shiqi Wang, and Alex C Kot. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5400–5409, 2018a.

Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, and Dacheng Tao. Domain generalization via conditional invariant representations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018b.

Ya Li, Xinmei Tian, Mingming Gong, Yajing Liu, Tongliang Liu, Kun Zhang, and Dacheng Tao. Deep domain generalization via conditional invariant adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 624–639, 2018c.

Zheren Li, Zhiming Cui, Sheng Wang, Yuji Qi, Xi Ouyang, Qitian Chen, Yuezhi Yang, Zhong Xue, Dinggang Shen, and Jie-Zhi Cheng. Domain generalization for mammography detection via multi-style and multi-view contrastive learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 98–108. Springer, 2021.

Evan Z Liu, Behzad Haghgoo, Annie S Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, and Chelsea Finn. Just train twice: Improving group robustness without training group information. In International Conference on Machine Learning, pp. 6781–6792. PMLR, 2021.

David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pp. 3384–3393. PMLR, 2018.

Karima Makhlouf, Sami Zhioua, and Catuscia Palamidessi. Machine learning fairness notions: Bridging the gap with real-world applications. Information Processing & Management, 58(5):102642, 2021.

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. Advances in Neural Information Processing Systems, 21, 2008.

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Multiple source adaptation and the Rényi divergence. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 367–374, 2009.

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.

Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. In International Conference on Machine Learning, pp. 10–18. PMLR, 2013.

A Tuan Nguyen, Toan Tran, Yarin Gal, and Atilim Gunes Baydin. Domain invariant representation learning with domain density transformations. Advances in Neural Information Processing Systems, 34, 2021.
Luca Oneto, Michele Donini, Andreas Maurer, and Massimiliano Pontil. Learning fair and transferable representations. arXiv preprint arXiv:1906.10673, 2019.

Trung Phung, Trung Le, Tung-Long Vuong, Toan Tran, Anh Tran, Hung Bui, and Dinh Phung. On learning domain-invariant representations for transfer learning with multiple sources. Advances in Neural Information Processing Systems, 34, 2021.

Yury Polyanskiy and Yihong Wu. Lecture notes on information theory. Lecture Notes for ECE563 (UIUC), 2012–2016, 2014.

Yury Polyanskiy and Yihong Wu. Strong data-processing inequalities for channels and Bayesian networks. In Convexity and Concentration, pp. 211–249. Springer, 2017.

Fengchun Qiao, Long Zhao, and Xi Peng. Learning to learn single domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12556–12565, 2020.

Alexandre Rame, Corentin Dancette, and Matthieu Cord. Fishr: Invariant gradient variances for out-of-distribution generalization. arXiv preprint arXiv:2109.02934, 2021.

Ashkan Rezaei, Anqi Liu, Omid Memarrast, and Brian D Ziebart. Robust fairness under covariate shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 9419–9427, 2021.

Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks. In International Conference on Learning Representations, 2019.

Candice Schumann, Xuezhi Wang, Alex Beutel, Jilin Chen, Hai Qian, and Ed H Chi. Transfer of machine learning fairness across domains. arXiv preprint arXiv:1906.09688, 2019.

Shiv Shankar, Vihari Piratla, Soumen Chakrabarti, Siddhartha Chaudhuri, Preethi Jyothi, and Sunita Sarawagi. Generalizing across domains via cross-gradient training. In International Conference on Learning Representations, 2018.

Yuge Shi, Jeffrey Seely, Philip HS Torr, N Siddharth, Awni Hannun, Nicolas Usunier, and Gabriel Synnaeve. Gradient matching for domain generalization. arXiv preprint arXiv:2104.09937, 2021.

Changjian Shui, Boyu Wang, and Christian Gagné. On the benefits of representation regularization in invariance based domain generalization. Machine Learning, pp. 1–21, 2022.

Anthony Sicilia, Xingchen Zhao, and Seong Jae Hwang. Domain adversarial neural networks for domain generalization: When it works and how to improve. arXiv preprint arXiv:2102.03924, 2021.

Harvineet Singh, Rina Singh, Vishwali Mhasawade, and Rumi Chunara. Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 3–13, 2021.

Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pp. 443–450. Springer, 2016.

Chris Xing Tian, Haoliang Li, Xiaofei Xie, Yang Liu, and Shiqi Wang. Neuron coverage-guided domain generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John C Duchi, Vittorio Murino, and Silvio Savarese. Generalizing to unseen domains via adversarial data augmentation. Advances in Neural Information Processing Systems, 31, 2018.

Jingge Wang, Yang Li, Liyan Xie, and Yao Xie. Class-conditioned domain generalization via Wasserstein distributional robust optimization. arXiv preprint arXiv:2109.03676, 2021.

Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, and Liwei Wang.
Towards a theoretical framework of out-of-distribution generalization. Advances in Neural Information Processing Systems, 34, 2021.

Taeho Yoon, Jaewook Lee, and Woojin Lee. Joint transfer of model knowledge and fairness over domains using Wasserstein distance. IEEE Access, 8:123783–123798, 2020.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P Gummadi. Fairness constraints: A flexible approach for fair classification. The Journal of Machine Learning Research, 20(1):2737–2778, 2019.

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pp. 325–333. PMLR, 2013.

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations, 2018.

Xueru Zhang, Mohammadmahdi Khaliligarekani, Cem Tekin, et al. Group retention when using machine learning in sequential decision making: the interplay between user dynamics and fairness. Advances in Neural Information Processing Systems, 32, 2019.

Xueru Zhang, Ruibo Tu, Yang Liu, Mingyan Liu, Hedvig Kjellstrom, Kun Zhang, and Cheng Zhang. How do fair decisions fare in long-term qualification? Advances in Neural Information Processing Systems, 33:18457–18469, 2020.

Han Zhao, Shanghang Zhang, Guanhang Wu, José MF Moura, Joao P Costeira, and Geoffrey J Gordon. Adversarial multiple source domain adaptation. Advances in Neural Information Processing Systems, 31, 2018.

Shanshan Zhao, Mingming Gong, Tongliang Liu, Huan Fu, and Dacheng Tao. Domain generalization via entropy regularization. Advances in Neural Information Processing Systems, 33:16096–16107, 2020.

Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, and Tao Xiang. Learning to generate novel domains for domain generalization. In European Conference on Computer Vision, pp. 561–578. Springer, 2020.

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232, 2017.

A RELATED WORKS

Domain generalization/Domain adaptation: In many real-world ML scenarios, data in the training phase are sampled from one or many source domains, while in the testing phase, data are sampled from an unseen target domain. Many methods have been proposed to design robust ML models that achieve good performance in the deployment environment; they are categorized by whether they can access the target data (domain adaptation) or not (domain generalization). However, most of these models focus only on transferring accuracy from source to target domains and fall into five main approaches: (1) data manipulation (Volpi et al., 2018; Qiao et al., 2020; Zhou et al., 2020; Zhang et al., 2018; Shankar et al., 2018); (2) domain-invariant representation learning (Li et al., 2018b;a; Ganin & Lempitsky, 2015; Ganin et al., 2016; Phung et al., 2021; Nguyen et al., 2021); (3) distributional robustness (Krueger et al., 2021; Liu et al., 2021; Koh et al., 2021; Wang et al., 2021; Sagawa et al., 2019; Hu et al., 2018); (4) gradient operation (Huang et al., 2020; Shi et al., 2021; Rame et al., 2021; Tian et al., 2022); and (5) self-supervised learning (Carlucci et al., 2019; Kim et al., 2021; Jeon et al., 2021; Li et al., 2021).
Fairness in Machine Learning: Many fairness notions have been proposed to measure unfairness in ML models, and they can be roughly classified into two classes. Individual fairness considers equity at the individual level and requires that similar individuals be treated similarly (Biega et al., 2018; Bechavod et al., 2020; Gupta & Kamble, 2021; Dwork et al., 2012). Group fairness attains a certain balance at the group level, where the entire population is first partitioned into multiple groups and certain statistical measures are equalized across different groups (Hardt et al., 2016; Zhang et al., 2019; 2020). Various approaches have also been developed to satisfy these fairness notions; they roughly fall into three categories: (1) pre-processing: modifying the training dataset to remove bias before learning an ML model (Kamiran & Calders, 2012; Zemel et al., 2013); (2) in-processing: attaining fairness during the training process by imposing certain fairness constraints or modifying the loss function (Zafar et al., 2019; Agarwal et al., 2018); (3) post-processing: altering the output of an existing algorithm to satisfy a fairness constraint after training (Hardt et al., 2016). However, most of these methods assume the data distributions at training and testing are the same. In contrast, we study fairness under domain generalization in this paper.

Fairness under Domain Adaptation: Some studies have been proposed to achieve good fairness when the testing environment changes, but all of them focus on the domain adaptation setting. The most common adaptation setup is learning under the assumption of covariate shift. For example, Singh et al. (2021) leveraged a feature selection method in a causal graph describing the data to mitigate fairness violations under covariate shift in the testing data. Coston et al. (2019) proposed weighting methods that can give fair predictions under covariate shift between the source and target distributions when access to the sensitive attributes is prohibited. Rezaei et al. (2021) sought fair decisions by optimizing worst-case testing performance. Besides covariate shift, some works handle other types of distribution shift, including demographic shift and prior probability shift. Instead of learning a fair model directly, Oneto et al. (2019) and Madras et al. (2018) find fair representations that can generalize to new tasks. Aside from empirical studies, Schumann et al. (2019) and Yoon et al. (2020) developed theoretical frameworks to examine fairness transfer in the domain adaptation setting and then offered modeling approaches to achieve good fairness in the target domain.

Comparison with existing bounds in the literature: We compare our bounds with the most common bounds in the fields of domain adaptation and domain generalization as follows.

Accuracy bounds in domain adaptation. Bound in Ben-David et al. (2010):
$$\epsilon^{Acc}_{D_T}(\hat{f}) \le \epsilon^{Acc}_{D_S}(\hat{f}) + D_{TV}\left(P^X_{D_S} \,\middle\|\, P^X_{D_T}\right) + \min_{D \in \{D_S, D_T\}} \mathbb{E}_D\left[ \left|f_{D_S}(X) - f_{D_T}(X)\right| \right]$$
This bound is for the binary classification problem under domain adaptation. The classification error in the target domain is bounded by the error in the source domain, the total variation distance between the feature distributions of the source and target domains, and the misalignment of the labeling functions between the source and target domains.
Comparison with existing bounds in the literature: We compare our bounds with the most common bounds in the fields of domain adaptation and domain generalization as follows.

Accuracy bounds in domain adaptation. Bound in Ben-David et al. (2010):
$$\epsilon^{Acc}_{D_T}(\hat f) \le \epsilon^{Acc}_{D_S}(\hat f) + D_{TV}\left(P^X_{D_S}, P^X_{D_T}\right) + \min_{D \in \{D_S, D_T\}} \mathbb{E}_D\left[\left|f_{D_S}(X) - f_{D_T}(X)\right|\right]$$
This bound is for the binary classification problem under domain adaptation. The classification error in the target domain is bounded by the error in the source domain, the total variation distance between the feature distributions of the source and target domains, and the misalignment of the labeling functions between the source and target domains. The limitations of this bound are that (1) it is only applicable to settings with the zero-one loss and a deterministic labeling function; and (2) estimating the total variation distance is hard in practice, and the bound does not relate the feature and representation spaces.

This paper also provides another accuracy bound based on the H∆H divergence:
$$\epsilon^{Acc}_{D_T}(\hat f) \le \epsilon^{Acc}_{D_S}(\hat f) + D_{H\Delta H}\left(P^X_{D_S}, P^X_{D_T}\right) + \inf_{\hat f}\left[\epsilon^{Acc}_{D_T}(\hat f) + \epsilon^{Acc}_{D_S}(\hat f)\right]$$
where $D_{H\Delta H}\left(P^X_{D_S}, P^X_{D_T}\right) = \sup_{\hat f_1, \hat f_2}\left|P_{D_S}\left(\hat f_1(X) \neq \hat f_2(X)\right) - P_{D_T}\left(\hat f_1(X) \neq \hat f_2(X)\right)\right|$ is the H∆H divergence. It has the same limitations as the total variation distance mentioned above.

Accuracy bounds in domain generalization. Bound in Albuquerque et al. (2019):
$$\epsilon^{Acc}_{D_T}(\hat f) \le \sum_{i=1}^{N} \pi_i\, \epsilon^{Acc}_{D_S^i}(\hat f) + \max_{j,k \in [N]} D_H\left(P^X_{D_S^j}, P^X_{D_S^k}\right) + D_H\left(P^X_{D_S^*}, P^X_{D_T}\right) + \min_{D \in \{D_S^*, D_T\}} \mathbb{E}_D\left[\left|f_{D_S^*}(X) - f_{D_T}(X)\right|\right]$$
where $D_H(P, Q) = \sup_{\hat f}\left|P\left(\hat f(X) = 1\right) - Q\left(\hat f(X) = 1\right)\right|$ is the H divergence and $P^X_{D_S^*} = \arg\min_{\pi} D_H\left(\sum_{i=1}^{N} \pi_i P^X_{D_S^i}, P^X_{D_T}\right)$ is the mixture of source domains that is closest to the target domain with respect to the H divergence. In this bound, the classification error in the target domain is bounded by a convex combination of errors in the source domains, the H divergence between source domains, the H divergence between the target domain and its nearest mixture of source domains, and the misalignment of the labeling functions between the mixture of source domains and the target domain. Because this bound is constructed based on the H divergence, it inherits the limitations of the domain adaptation bounds (Ben-David et al., 2010) mentioned above. This bound can be transformed to the representation space Z by replacing X with Z in its formula; it then suggests enforcing invariance of the marginal distribution of the representation Z across source domains, which has an inherent trade-off as shown in Thm. 2. Moreover, because the target domain is unknown during training, the mixing weights $\{\pi_i\}_{i=1}^N$ are not useful for algorithmic design.

Bound in Phung et al. (2021):
$$\epsilon^{Acc}_{D_T}(\hat f) \le \sum_{i=1}^{N} \pi_i\, \epsilon^{Acc}_{D_S^i}(\hat f) + C \max_{i \in [N]} \mathbb{E}_{D_S^i}\left[\left\|f_{D_T}(X)_y - f_{D_S^i}(X)_y\right\|\right] + \frac{C|\mathcal{Y}|}{N}\left[\sum_{i=1}^{N} d_{1/2}\left(P^Z_{D_T}, P^Z_{D_S^i}\right) + \sum_{i,j \in [N]} d_{1/2}\left(P^Z_{D_S^i}, P^Z_{D_S^j}\right)\right]$$
where $d_{1/2}\left(P^X_{D_S^i}, P^X_{D_S^j}\right) = \sqrt{D_{1/2}\left(P^X_{D_S^i} \| P^X_{D_S^j}\right)}$ is the Hellinger distance, defined from the Hellinger divergence $D_{1/2}\left(P \| Q\right) = \int \left(\sqrt{P(X)} - \sqrt{Q(X)}\right)^2 dX$. This bound relates the feature and representation spaces: the classification error of the target domain, defined in the feature space, is bounded by the classification errors of the source domains defined in the feature space, the misalignment of the labeling functions between the target and source domains, and the Hellinger distances between the marginal representation distributions of source and target domains and between those of the source domains. While this bound is not limited to the zero-one loss and the labeling function can be stochastic, it suggests aligning the marginal distribution of the representation Z across source domains for generalization. Moreover, estimating the Hellinger distance can be hard in practice.

The mismatch between existing bounds and the adversarial learning approach for domain generalization. All existing bounds mentioned above suggest minimizing the distances between representation distributions across source domains with respect to some discrepancy measure such as the H divergence, total variation distance, or Hellinger distance. Based on these bounds, adversarial learning-based models are often proposed to minimize these distances.
However, there is a misalignment between the objectives of adversarial learning and these bounds, which results in a gap between theoretical findings and practical algorithms. In particular, it has been shown that the objective of the minimax game between the representation mapping and the discriminator is equivalent to minimizing the JS divergence between representation distributions across source domains (Goodfellow et al., 2014). However, minimizing the JS divergence does not guarantee the minimization of the distances used in the existing bounds. The details are as follows.

H divergence: We show that the JS divergence is not an upper bound of the H divergence. Consider an example with two distributions P(X) and Q(X) where P(X) = 0 w.p. 1/3, P(X) = 1 w.p. 2/3, and Q(X) = 0 w.p. 2/3, Q(X) = 1 w.p. 1/3. By definition, $D_H(P, Q) \approx 0.33 > D_{JS}(P \| Q) \approx 0.08$.

Total variation distance: We have $D_{JS}(P \| Q) \le D_{TV}(P, Q)$ for all P, Q, where $D_{JS}$ and $D_{TV}$ are the JS divergence and total variation distance, respectively. Hence, minimizing the JS divergence does not guarantee the minimization of the total variation distance.

Hellinger distance: We have $D_{JS}(P \| Q) \le 2\, d_{1/2}(P, Q)$ for all P, Q, where $d_{1/2}$ is the Hellinger distance. Hence, minimizing the JS divergence does not guarantee the minimization of the Hellinger distance.

Different from the existing bounds, our bounds are based on the JS divergence/distance. They therefore align with the adversarial learning approach for domain generalization in general, and with our proposed method FATDM in particular.

Advantages of our proposed bounds in domain generalization. In summary, our proposed bounds have several advantages:
(1) Most existing bounds (Ben-David et al., 2010; Albuquerque et al., 2019) do not relate the feature and representation spaces, so it is not clear how performance in the input space is affected by the representations. In contrast, our bounds connect the representation and input spaces; this further guides us to find representations that lead to good performance in the input space.
(2) Most prior studies adopt the H divergence to measure the dissimilarity between domains, which is limited to deterministic labeling functions and the zero-one loss (Ben-David et al., 2010; Albuquerque et al., 2019). In contrast, our bounds are more general and applicable to settings where domains are specified by stochastic labeling functions and general loss functions.
(3) Distance metrics used in existing bounds (total variation distance, H divergence, Hellinger distance, etc.) (Ben-David et al., 2010; Albuquerque et al., 2019; Phung et al., 2021) are hard to compute in practice. In contrast, our bounds use the JS divergence, which is aligned with the training objective of the discriminator in adversarial learning (Goodfellow et al., 2014).
(4) Existing bounds for domain generalization only imply the alignment of the marginal feature distribution across source domains (Albuquerque et al., 2019; Phung et al., 2021). As shown in Thm. 2, methods that enforce invariance of the marginal distribution have an inherent trade-off and may increase the lower bound of the expected loss. In contrast, our bounds suggest aligning the label-conditional feature distribution across source domains, which has been verified to be more effective in empirical studies (Li et al., 2018b;c; Zhao et al., 2020; Nguyen et al., 2021).

Regarding fairness, our work is the first to bound unfairness under domain generalization. In particular, our bounds suggest enforcing invariance of the feature distribution conditioned on label and sensitive attribute across source domains in order to transfer fairness to the unseen target domain.
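To make the H divergence vs. JS divergence comparison above concrete, the two-point example can be checked numerically. The following is a small sketch we add for illustration (the helper functions are ours, not part of FATDM):

```python
import numpy as np

def js_divergence(p, q):
    """JS divergence (in bits) between two discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def h_divergence(p, q):
    """H divergence sup_f |P(f(X)=1) - Q(f(X)=1)| over binary classifiers
    on a finite support: attained by thresholding where p > q."""
    return float(np.abs(np.sum(p[p > q]) - np.sum(q[p > q])))

p = np.array([1/3, 2/3])  # P(X=0), P(X=1)
q = np.array([2/3, 1/3])  # Q(X=0), Q(X=1)
print(h_divergence(p, q))   # ~0.33
print(js_divergence(p, q))  # ~0.08
```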
B DETAILS OF ALGORITHM FATDM

FATDM consists of density mapping functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$, $\forall y \in \mathcal{Y}, a \in \mathcal{A}, i, j \in [N]$ (learned by two DensityMatch models), a feature mapping function $g$ (a ResNet18 model), and a classifier $\hat h$. In our study, we experiment with two different DensityMatch architectures: StarGAN (i.e., in FATDM-StarGAN) and CycleGAN (in FATDM-CycleGAN). We show the details of FATDM-StarGAN below. For FATDM-CycleGAN, the only difference is that we use CycleGAN as DensityMatch instead of StarGAN; the details of CycleGAN are presented in the original paper (Zhu et al., 2017).

For FATDM-StarGAN, each DensityMatch$_Y$ (or DensityMatch$_{Y,A}$) consists of a generator $G: \mathcal{X} \times [N] \times [N] \to \mathcal{X}$ and a discriminator $D: \mathcal{X} \to [N] \times \{0, 1\}$. The generator takes a real image x and a pair of domain labels (i, j) as input and generates a fake image; the discriminator aims to predict the domain label of the image generated by the generator and to distinguish whether it is fake or real. G and D are learned simultaneously by solving the following optimizations (a minimal code sketch of these objectives is given below, after Figure 6's caption):
$$\text{Discriminator's objective: } \min_D \mathcal{L}^{StarGAN}_D := -\mathcal{L}^{StarGAN}_{adv} + \lambda_{cls}\, \mathcal{L}^{StarGAN}_{cls(real)}$$
$$\text{Generator's objective: } \min_G \mathcal{L}^{StarGAN}_G := \mathcal{L}^{StarGAN}_{adv} + \lambda_{cls}\, \mathcal{L}^{StarGAN}_{cls(fake)} + \lambda_{rec}\, \mathcal{L}^{StarGAN}_{rec} \quad (6)$$
where $\mathcal{L}^{StarGAN}_{adv}$ is the adversarial loss, $\mathcal{L}^{StarGAN}_{cls(fake)}$ and $\mathcal{L}^{StarGAN}_{cls(real)}$ are the domain classification losses with respect to fake and real images, respectively, and $\mathcal{L}^{StarGAN}_{rec}$ is the reconstruction loss. The specific formulations of these loss functions are in Choi et al. (2018). $\lambda_{cls}$ and $\lambda_{rec}$ are hyper-parameters that control the relative importance of the domain classification and reconstruction losses, respectively, compared to the adversarial loss.

In our experiments, input images are resized to (256, 256) and normalized to the range [-1, 1]. The dimension of the representation space Z is set to 512. ω (the hyper-parameter that controls the accuracy-fairness trade-off) varies from 0 to 10 with step sizes 0.0002 for ω ∈ [0, 0.002], 0.002 for ω ∈ [0.002, 0.1], and 0.2 for ω ∈ [0.2, 10]; γ (the hyper-parameter that controls the accuracy-invariance trade-off) is set to 0.1 (after hyper-parameter tuning). Models (FATDM and the baselines) are implemented with the PyTorch library version 1.11 and trained on multiple compute nodes (each model instance is trained on a single node with 4 CPUs, 8 GB of memory, and a single GPU (P100 or V100)). One domain's data is used for testing and the other domains' data are used for training (10% of the training data is used for validation). Each model is trained for 10 epochs, and we report the results from the epoch with the best performance on the validation set. Figure 6 visualizes the two-stage training process of FATDM-StarGAN. The detailed architectures of FATDM-StarGAN are shown in Tables 2-5. We have also provided all code for these models in the supplemental material.

Figure 6: Two-stage training of FATDM-StarGAN. For stage 1, we only show the training process for DensityMatch$_{Y,A}$ (the training process for DensityMatch$_Y$ is similar).
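As referenced above, the stage-1 objectives in Eq. (6) can be sketched in PyTorch. This is a minimal illustration of the adversarial, domain-classification, and reconstruction terms under stated assumptions (a generator `G` and discriminator `D` with the interfaces described above, `D` returning a real/fake score and domain logits); it is not the paper's exact implementation, which follows Choi et al. (2018):

```python
import torch
import torch.nn.functional as F

def densitymatch_losses(G, D, x_real, dom_src, dom_tgt, lambda_cls=1.0, lambda_rec=10.0):
    """One step of StarGAN-style losses for density matching between domains.
    dom_src, dom_tgt: integer domain labels of shape (batch,)."""
    x_fake = G(x_real, dom_src, dom_tgt)          # translate domain i -> j
    src_real, cls_real = D(x_real)                # real/fake score, domain logits
    src_fake, _ = D(x_fake.detach())

    # discriminator: distinguish real from fake, classify domain of real images
    d_adv = -src_real.mean() + src_fake.mean()    # adversarial term (WGAN-style)
    d_cls = F.cross_entropy(cls_real, dom_src)
    d_loss = d_adv + lambda_cls * d_cls

    # generator: fool the discriminator, map into target domain, cycle back
    src_fake, cls_fake = D(x_fake)
    g_adv = -src_fake.mean()
    g_cls = F.cross_entropy(cls_fake, dom_tgt)
    x_rec = G(x_fake, dom_tgt, dom_src)           # reconstruct the source image
    g_rec = (x_real - x_rec).abs().mean()         # L1 reconstruction loss
    g_loss = g_adv + lambda_cls * g_cls + lambda_rec * g_rec
    return d_loss, g_loss
```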
Table 2: Architecture of the StarGAN generators $G_Y$ and $G_{Y,A}$ (density mapping functions $m^y_{i,j}$ and $m^{y,a}_{i,j}$, $\forall y \in \mathcal{Y}, a \in \mathcal{A}, i, j \in [N]$). This architecture is similar to the one in the original paper (Choi et al., 2018) except for the first convolution layer, where the number of input channels is 1 (for grayscale images) and the input shape is (h, w, 1 + 2n_c). (h, w) is the size of input images, IN is instance normalization, and ReLU is the rectified linear unit. N: number of output channels, K: kernel size, S: stride size, P: padding size are the convolution and deconvolution layer hyper-parameters.

Part | Input -> Output Shape | Layer Information
Down-sampling | (h, w, 1 + 2n_c) -> (h, w, 64) | CONV-(N64, K7x7, S1, P3), IN, ReLU
Down-sampling | (h, w, 64) -> (h/2, w/2, 128) | CONV-(N128, K4x4, S2, P1), IN, ReLU
Down-sampling | (h/2, w/2, 128) -> (h/4, w/4, 256) | CONV-(N256, K4x4, S2, P1), IN, ReLU
Bottleneck (x6) | (h/4, w/4, 256) -> (h/4, w/4, 256) | Residual Block: CONV-(N256, K3x3, S1, P1), IN, ReLU
Up-sampling | (h/4, w/4, 256) -> (h/2, w/2, 128) | DECONV-(N128, K4x4, S2, P1), IN, ReLU
Up-sampling | (h/2, w/2, 128) -> (h, w, 64) | DECONV-(N64, K4x4, S2, P1), IN, ReLU
Up-sampling | (h, w, 64) -> (h, w, 3) | CONV-(N3, K7x7, S1, P3), IN, ReLU

Table 3: Architecture of the StarGAN discriminators. This architecture is similar to the one in the original paper (Choi et al., 2018) except for the first convolution layer, where the number of input channels is 1 (for grayscale images). (h, w) is the size of input images, n_d is the number of domains, and LeakyReLU is the leaky rectified linear unit. N: number of output channels, K: kernel size, S: stride size, P: padding size are the convolution layer hyper-parameters.

Layer | Input -> Output Shape | Layer Information
Input Layer | (h, w, 1) -> (h/2, w/2, 64) | CONV-(N64, K4x4, S2, P1), LeakyReLU
Hidden Layer | (h/2, w/2, 64) -> (h/4, w/4, 128) | CONV-(N128, K4x4, S2, P1), LeakyReLU
Hidden Layer | (h/4, w/4, 128) -> (h/8, w/8, 256) | CONV-(N256, K4x4, S2, P1), LeakyReLU
Hidden Layer | (h/8, w/8, 256) -> (h/16, w/16, 512) | CONV-(N512, K4x4, S2, P1), LeakyReLU
Hidden Layer | (h/16, w/16, 512) -> (h/32, w/32, 1024) | CONV-(N1024, K4x4, S2, P1), LeakyReLU
Hidden Layer | (h/32, w/32, 1024) -> (h/64, w/64, 2048) | CONV-(N2048, K4x4, S2, P1), LeakyReLU
Output Layer (D_src) | (h/64, w/64, 2048) -> (h/64, w/64, 1) | CONV-(N1, K3x3, S1, P1)
Output Layer (D_cls) | (h/64, w/64, 2048) -> (1, 1, n_d) | CONV-(N(n_d), K(h/64 x w/64), S1, P0)

Table 4: Architecture of the feature mapping g. This architecture is similar to the ResNet18 model (He et al., 2016) except for the first convolution layer, where the number of input channels is 1 (for grayscale images), and the last layer, whose output dimension is n_z (the dimension of representation space Z). (h, w) is the size of input images, BN is batch normalization, MaxPool is max pooling, AvgPool is average pooling, and ReLU is the rectified linear unit. N: number of output channels, K: kernel size, S: stride size, P: padding size are the convolution layer hyper-parameters.

Part | Input -> Output Shape | Layer Information
Input | (h, w, 1) -> (h/2, w/2, 64) | CONV-(N64, K7x7, S2, P3), BN, ReLU, MaxPool
Residual Blocks | (h/4, w/4, 64) -> (h/4, w/4, 64) | Residual Block: CONV-(N64, K3x3, S1, P1), BN, ReLU, CONV-(N64, K3x3, S1, P1), BN
Residual Blocks | (h/4, w/4, 64) -> (h/8, w/8, 128) | Residual Block: CONV-(N128, K3x3, S1, P1), BN, ReLU, CONV-(N128, K3x3, S1, P1), BN
Residual Blocks | (h/8, w/8, 128) -> (h/16, w/16, 256) | Residual Block: CONV-(N256, K3x3, S1, P1), BN, ReLU, CONV-(N256, K3x3, S1, P1), BN
Residual Blocks | (h/16, w/16, 256) -> (1, 1, 512) | Residual Block: CONV-(N512, K3x3, S1, P1), BN, ReLU, CONV-(N512, K3x3, S1, P1), BN, AvgPool
Output | (1, 1, 512) -> n_z | LINEAR-(512, n_z)

Table 5: Architecture of the classifier $\hat h$. n_z is the dimension of representation space Z.

Layer | Input -> Output Shape | Layer Information
Hidden Layer | n_z -> n_z/2 | LINEAR-(n_z, n_z/2)
Hidden Layer | n_z/2 -> n_z/4 | LINEAR-(n_z/2, n_z/4)
Output Layer | n_z/4 -> 1 | LINEAR-(n_z/4, 1), Sigmoid
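A compact PyTorch sketch of the feature mapping g and classifier $\hat h$ from Tables 4-5 is given below. This is our own rendering of the tables under the stated assumptions (grayscale input, representation dimension n_z), not the released implementation:

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_feature_mapping(nz: int) -> nn.Module:
    """ResNet18 with a 1-channel input conv and an nz-dimensional output head (Table 4)."""
    g = resnet18(weights=None)
    g.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    g.fc = nn.Linear(512, nz)
    return g

def build_classifier(nz: int) -> nn.Module:
    """Classifier h-hat: nz -> nz/2 -> nz/4 -> 1, sigmoid output (Table 5)."""
    return nn.Sequential(
        nn.Linear(nz, nz // 2),
        nn.Linear(nz // 2, nz // 4),
        nn.Linear(nz // 4, 1),
        nn.Sigmoid(),
    )
```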
C ADDITIONAL EXPERIMENTS

Experimental results with all unfairness and error metrics. In this section, we provide more experimental results on fairness and accuracy under domain generalization. In particular, we investigate the fairness-accuracy trade-off on the two clinical image datasets (Cardiomegaly and Edema diseases) with respect to different fairness criteria (i.e., equalized odds, equal opportunity) and different unfairness (i.e., MD and EMD) and error (i.e., CE, MR, AUROC, AUPR, F1) measures. Figure 7 (Cardiomegaly, equalized odds), Figure 8 (Cardiomegaly, equal opportunity), Figure 9 (Edema, equalized odds), and Figure 10 (Edema, equal opportunity) show the unfairness-error curves of our models and the baselines on these two datasets. As we can see, our model outperforms the baselines in terms of the fairness-accuracy trade-off. The curve of our model is the bottom-leftmost compared to the baselines in all measures, showing the clear benefit of (1) enforcing conditional invariance constraints for accuracy and fairness transfer and (2) using the two-stage training process to stabilize training compared to the adversarial learning approach. We also quantify these observations by calculating the areas under the unfairness-error curves, where a smaller area indicates a better accuracy-fairness trade-off (a computation sketch is given after Tables 6 and 7 below). As shown in Tables 6 and 7, our model has the smallest areas under the curves and achieves a significantly better fairness-accuracy trade-off for both equalized odds and equal opportunity compared to the other methods.

Impact of the number of source domains. Our work focuses on transferring fairness and accuracy under domain generalization, where the target domain data are inaccessible during training; instead, it relies on a set of source domains to generalize to an unseen, novel target domain. We investigate the relationship between the fairness-accuracy trade-off on the target domain and the number of source domains during training. In particular, we evaluate the performance of FATDM and ERM on the Edema dataset with different numbers of source domains. Similar to the previous experiment, we construct the dataset for each domain by rotating images by θ degrees, where θ ∈ {0°, 15°, 30°} when the number of domains is 3, θ ∈ {0°, 15°, 30°, 45°} when the number of domains is 4, and θ ∈ {0°, 15°, 30°, 45°, 60°} when the number of domains is 5. The number of images per domain is adapted so that the training set size is fixed across the three cases. We follow the leave-one-domain-out setting, in which one domain serves as the unseen target domain for evaluation while the remaining domains are used for training; the average results across target domains are reported. Figure 11 shows the error-unfairness curves of FATDM and ERM when training with 2, 3, and 4 source domains. We observe that training with more source domains does not always help the model achieve a better fairness-accuracy trade-off on unseen target domains. In particular, the performance of both FATDM and ERM is best when training with 2 source domains and worst when training with 3 source domains.
We conjecture that adding more source domains may help reduce the discrepancy between source and target domains (term (ii) in Thm. 1 and Thm. 3), but it may make it more difficult to minimize the source error and unfairness (term (i) in Thm. 1 and Thm. 3) and to learn invariant representations across the source domains (term (iii) in Thm. 1 and Thm. 3). Thus, our suggestion in practice is to conduct an ablation study to find the optimal number of source domains.

Simultaneous and sequential training comparison. In all experiments conducted so far, the fairness constraint $\mathcal{L}_{fair}$ is optimized simultaneously with the prediction error $\mathcal{L}_{acc}$ and the domain-invariance constraint $\mathcal{L}_{inv}$ for all methods. To investigate whether FATDM still attains a better accuracy-fairness trade-off when the processes of invariant representation learning and fair model training are decoupled, we conduct another set of experiments where the models (FATDM (i.e., FATDM-StarGAN) and the baselines G2DM, DANN, CDANN) are learned in a sequential manner: for each model, we first learn the representation mapping g by optimizing $\mathcal{L}_{inv}$ and $\mathcal{L}_{acc}$; using the representations generated by the fixed g, we then learn the fair classifier by optimizing $\mathcal{L}_{acc}$ and $\mathcal{L}_{fair}$. The models trained with this procedure are named FATDM-seq, G2DM-seq, DANN-seq, and CDANN-seq; their corresponding error-unfairness curves are shown in Figure 12. The results show that FATDM-seq still attains the best accuracy-fairness trade-off at the target domain compared to G2DM-seq, DANN-seq, and CDANN-seq. Our method is effective regardless of whether $\mathcal{L}_{fair}$ and $\mathcal{L}_{inv}$ are optimized simultaneously or sequentially. The reason our method consistently outperforms the baselines in both settings is that the invariant representation learning in the baseline methods only guarantees the transfer of accuracy, not fairness. Even though a fairness regularizer is imposed to make the model fair at the source domains (whether the invariant representations and fair classifier are trained simultaneously or sequentially), this fairness cannot be preserved at the target domain due to potential distributional shifts. The key to ensuring the transfer of fairness is to learn representations such that P(Z|Y, A) is domain-invariant; this must be done during the representation learning process. From Thm. 3, we can see that the unfairness at the target domain $\epsilon^{EO}_{D_T}$ can still blow up if $P^{Z|Y,A}$ differs across domains, regardless of how fair the model is at the source domains (i.e., small $\epsilon^{EO}_{D_S^i}$). A sketch of the two training regimes is given after the figure captions below.

Figure 7: Error-unfairness curves with respect to equalized odds of FATDM and baselines on the Cardiomegaly disease dataset.

Figure 8: Error-unfairness curves with respect to equal opportunity of FATDM and baselines on the Cardiomegaly disease dataset.

Figure 9: Error-unfairness curves with respect to equalized odds of FATDM and baselines on the Edema disease dataset.

Figure 10: Error-unfairness curves with respect to equal opportunity of FATDM and baselines on the Edema disease dataset.
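As referenced above, the two training regimes differ only in which losses are optimized together. The following is a minimal sketch (our own simplification, assuming modules `g`, `h`, loss functions `L_acc`, `L_fair`, `L_inv`, and a data loader yielding dicts with keys "x", "y", "a" are defined elsewhere):

```python
import torch

def train_simultaneous(g, h, loader, L_acc, L_fair, L_inv, omega, gamma, epochs=10):
    """Simultaneous variant: one objective L_acc + omega*L_fair + gamma*L_inv."""
    opt = torch.optim.Adam(list(g.parameters()) + list(h.parameters()))
    for _ in range(epochs):
        for batch in loader:
            z = g(batch["x"])
            loss = (L_acc(h(z), batch["y"])
                    + omega * L_fair(h(z), batch["y"], batch["a"])
                    + gamma * L_inv(z, batch))
            opt.zero_grad(); loss.backward(); opt.step()

def train_sequential(g, h, loader, L_acc, L_fair, L_inv, omega, gamma, epochs=10):
    """Sequential variant: learn g with L_inv + L_acc first, then freeze g
    and learn a fair classifier h with L_acc + omega*L_fair."""
    opt_g = torch.optim.Adam(list(g.parameters()) + list(h.parameters()))
    for _ in range(epochs):
        for batch in loader:
            z = g(batch["x"])
            loss = L_acc(h(z), batch["y"]) + gamma * L_inv(z, batch)
            opt_g.zero_grad(); loss.backward(); opt_g.step()
    for p in g.parameters():
        p.requires_grad_(False)          # freeze the representation mapping
    opt_h = torch.optim.Adam(h.parameters())
    for _ in range(epochs):
        for batch in loader:
            z = g(batch["x"]).detach()
            loss = L_acc(h(z), batch["y"]) + omega * L_fair(h(z), batch["y"], batch["a"])
            opt_h.zero_grad(); loss.backward(); opt_h.step()
```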
Table 6: Area under the error-unfairness curves (Cardiomegaly disease dataset).

Equalized Odds:
Error - Unfairness | ERM | G2DM | DANN | CDANN | CORAL | GroupDRO | IRM | FATDM
AUROC - MD | 0.5575 | 0.6093 | 0.7571 | 0.7224 | 0.7239 | 0.7039 | 0.6784 | 0.0935
AUPRC - MD | 0.5463 | 0.6301 | 0.7730 | 0.6883 | 0.7300 | 0.7152 | 0.6967 | 0.0291
CE - MD | 0.2861 | 0.2601 | 0.4622 | 0.4232 | 0.4424 | 0.3148 | 0.3370 | 0.2152
MR - MD | 0.6312 | 0.4906 | 0.6795 | 0.6667 | 0.6683 | 0.6382 | 0.5721 | 0.2439
F1 - MD | 0.5901 | 0.4150 | 0.6507 | 0.6547 | 0.5745 | 0.5360 | 0.5025 | 0.3365
AUROC - EMD | 0.7326 | 0.7106 | 0.8342 | 0.7931 | 0.8075 | 0.7845 | 0.7991 | 0.1099
AUPRC - EMD | 0.6901 | 0.7146 | 0.8308 | 0.7577 | 0.7918 | 0.7806 | 0.7945 | 0.0437
CE - EMD | 0.5158 | 0.4443 | 0.6143 | 0.5788 | 0.5873 | 0.4911 | 0.5274 | 0.3384
MR - EMD | 0.7056 | 0.5795 | 0.7137 | 0.6979 | 0.6902 | 0.6571 | 0.6483 | 0.2045
F1 - EMD | 0.6866 | 0.5328 | 0.7279 | 0.7019 | 0.6515 | 0.6027 | 0.6120 | 0.2888

Equal Opportunity:
Error - Unfairness | ERM | G2DM | DANN | CDANN | CORAL | GroupDRO | IRM | FATDM
AUROC - MD | 0.5128 | 0.6001 | 0.6999 | 0.6686 | 0.5935 | 0.6288 | 0.5910 | 0.0750
AUPRC - MD | 0.5419 | 0.6718 | 0.7086 | 0.7189 | 0.6423 | 0.6761 | 0.6435 | 0.0262
CE - MD | 0.3690 | 0.4272 | 0.5094 | 0.4492 | 0.3780 | 0.2737 | 0.3582 | 0.2754
MR - MD | 0.3203 | 0.5068 | 0.5252 | 0.5512 | 0.4897 | 0.4368 | 0.4173 | 0.1778
F1 - MD | 0.2134 | 0.4570 | 0.4608 | 0.5207 | 0.4017 | 0.3561 | 0.3510 | 0.2737
AUROC - EMD | 0.6119 | 0.7184 | 0.7649 | 0.7517 | 0.6720 | 0.7068 | 0.6780 | 0.0947
AUPRC - EMD | 0.6321 | 0.7684 | 0.7718 | 0.7877 | 0.6912 | 0.7335 | 0.7200 | 0.0448
CE - EMD | 0.5092 | 0.6093 | 0.6264 | 0.6141 | 0.4737 | 0.4340 | 0.4917 | 0.3070
MR - EMD | 0.4619 | 0.6420 | 0.6325 | 0.6532 | 0.5790 | 0.5515 | 0.5298 | 0.1918
F1 - EMD | 0.3876 | 0.6122 | 0.5942 | 0.6496 | 0.5101 | 0.4898 | 0.4889 | 0.3108

Table 7: Area under the error-unfairness curves (Edema disease dataset).

Equalized Odds:
Error - Unfairness | ERM | G2DM | DANN | CDANN | CORAL | GroupDRO | FATDM
AUROC - MD | 0.3395 | 0.2765 | 0.2972 | 0.2548 | 0.3642 | 0.3627 | 0.0633
AUPRC - MD | 0.2865 | 0.2446 | 0.2561 | 0.2304 | 0.3052 | 0.2998 | 0.0771
CE - MD | 0.1096 | 0.1266 | 0.1243 | 0.1192 | 0.1269 | 0.1179 | 0.0341
MR - MD | 0.3929 | 0.3525 | 0.3509 | 0.3302 | 0.4303 | 0.4240 | 0.0656
F1 - MD | 0.4213 | 0.3219 | 0.3527 | 0.4178 | 0.4283 | 0.4189 | 0.1369
AUROC - EMD | 0.4277 | 0.3813 | 0.3637 | 0.3419 | 0.4419 | 0.4394 | 0.2729
AUPRC - EMD | 0.3868 | 0.3588 | 0.3285 | 0.3245 | 0.3958 | 0.3921 | 0.3041
CE - EMD | 0.2366 | 0.2401 | 0.2348 | 0.2334 | 0.2447 | 0.2339 | 0.1827
MR - EMD | 0.4592 | 0.4435 | 0.4017 | 0.3904 | 0.4942 | 0.4792 | 0.2802
F1 - EMD | 0.5186 | 0.4642 | 0.4132 | 0.4827 | 0.5180 | 0.5029 | 0.3855

Equal Opportunity:
Error - Unfairness | ERM | G2DM | DANN | CDANN | CORAL | GroupDRO | FATDM
AUROC - MD | 0.2488 | 0.2139 | 0.2085 | 0.1806 | 0.2696 | 0.2625 | 0.0218
AUPRC - MD | 0.2606 | 0.2381 | 0.2297 | 0.2035 | 0.2937 | 0.2874 | 0.0168
CE - MD | 0.1540 | 0.1839 | 0.1689 | 0.1572 | 0.1487 | 0.1446 | 0.0234
MR - MD | 0.2967 | 0.2652 | 0.2620 | 0.2516 | 0.3101 | 0.2999 | 0.0468
F1 - MD | 0.2848 | 0.2195 | 0.2534 | 0.2613 | 0.2973 | 0.2975 | 0.0502
AUROC - EMD | 0.2736 | 0.2472 | 0.2449 | 0.2155 | 0.2897 | 0.2841 | 0.1121
AUPRC - EMD | 0.2653 | 0.2451 | 0.2429 | 0.2176 | 0.2852 | 0.2812 | 0.0912
CE - EMD | 0.2083 | 0.2318 | 0.2355 | 0.2147 | 0.2055 | 0.2003 | 0.1159
MR - EMD | 0.3409 | 0.3162 | 0.3258 | 0.3026 | 0.3442 | 0.3388 | 0.1872
F1 - EMD | 0.3237 | 0.2756 | 0.3031 | 0.3008 | 0.3215 | 0.3271 | 0.1779

Figure 11: Error-unfairness curves with respect to equalized odds of FATDM and ERM on the Edema disease dataset when training with different numbers of source domains. Names in the figure legend are in the form X-Y, where X is the model and Y is the number of source domains (e.g., ERM-2 means training ERM on two source domains).
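The areas under the error-unfairness curves reported in Tables 6 and 7 can be computed with a standard trapezoidal rule once the curves are normalized to [0, 1]. The following is a small sketch of such a computation (our own helper for illustration, not the paper's evaluation script):

```python
import numpy as np

def area_under_tradeoff_curve(unfairness, error):
    """Area under an error-unfairness curve; both axes normalized to [0, 1].
    A smaller area indicates a better fairness-accuracy trade-off."""
    u = np.asarray(unfairness, dtype=float)
    e = np.asarray(error, dtype=float)
    order = np.argsort(u)                      # sort points along the unfairness axis
    u, e = u[order], e[order]
    u = (u - u.min()) / (u.max() - u.min())    # normalize to [0, 1]
    e = (e - e.min()) / (e.max() - e.min())
    return float(np.trapz(e, u))               # trapezoidal integration
```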
(a) Equalized Odds (b) Equal Opportunity

Figure 12: Fairness-accuracy trade-off (Pareto frontier) of models trained with the simultaneous and sequential (i.e., models with the -seq suffix) approaches, and of FATDM-CycleGAN (i.e., using CycleGAN instead of StarGAN for the density mapping functions) on the Cardiomegaly disease dataset. Error-unfairness curves are constructed by varying ω ∈ [0, 10], and the values of error and unfairness are normalized to [0, 1]. Lower-left points and a smaller area under the curve indicate that the model has a better fairness-accuracy trade-off (Pareto optimality).

D ADDITIONAL RESULTS & LEMMAS

D.1 TIGHTER UPPER BOUND FOR ACCURACY

Corollary 5.1 We can replace term (ii) in Thm. 1 with the following term to attain a tighter upper bound for accuracy:
$$2C \min_{i \in [N]} \left( d_{JS}\left(P^Y_{D_T}, P^Y_{D_S^i}\right) + \sqrt{2\eta_{TV}\, \mathbb{E}_{P_{D_{T,i}}}\left[ D_{JS}\left(P^{X|Y}_{D_T} \,\Big\|\, P^{X|Y}_{D_S^i}\right) \right]} \right)$$
where $\eta_{TV} = \sup_{P^X_{D_i} \neq P^X_{D_j}} \frac{D_{TV}\left(P^Z_{D_i}, P^Z_{D_j}\right)}{D_{TV}\left(P^X_{D_i}, P^X_{D_j}\right)} \le 1$ is called the Dobrushin's coefficient (Polyanskiy & Wu, 2017).

This result suggests that we can further optimize term (ii) in Thm. 1 by minimizing $\eta_{TV}$. It has been shown in Shui et al. (2022) that $\eta_{TV}$ can be controlled by the Lipschitz constant of the feature mapping $g: \mathcal{X} \to \mathcal{Z}$ when g follows a Gaussian distribution. The Lipschitz constant of g, in turn, can be upper bounded by the Frobenius norm of the Jacobian matrix of g (Miyato et al., 2018). However, in practice, we found that computing the Jacobian matrix of g is computationally expensive when the dimension of the representation Z is large, and optimizing it together with the invariance constraints did not improve the performance of the models in our experiments.

D.2 LEMMAS FOR PROVING THEOREM 1

Lemma 6 Let X be a random variable in domains $D_i$ and $D_j$, and let E be the event that $P^X_{D_j} \ge P^X_{D_i}$. Then we have:
$$\int_E \left(P^X_{D_j} - P^X_{D_i}\right) dX = \int_{\bar E} \left(P^X_{D_i} - P^X_{D_j}\right) dX = \frac{1}{2} \int \left|P^X_{D_j} - P^X_{D_i}\right| dX$$
where $\bar E$ is the complement of event E.
Lemma 7 Let X be a random variable in domains $D_i$ and $D_j$, and let $f: \mathcal{X} \to \mathbb{R}^+$ be a non-negative function bounded by C. Then we have:
$$\mathbb{E}_{D_j}[f(X)] - \mathbb{E}_{D_i}[f(X)] \le C \sqrt{\frac{1}{2} \min\left( D_{KL}\left(P^X_{D_i} \,\Big\|\, P^X_{D_j}\right),\; D_{KL}\left(P^X_{D_j} \,\Big\|\, P^X_{D_i}\right) \right)}$$
where $D_{KL}(\cdot \| \cdot)$ is the KL divergence between two distributions.

Lemma 8 Suppose the loss function L is upper bounded by C and consider a classifier $\hat f: \mathcal{X} \to \mathcal{Y}$. The expected classification error of $\hat f$ in domain $D_j$ can be upper bounded by its error in domain $D_i$:
$$\epsilon^{Acc}_{D_j}(\hat f) \le \epsilon^{Acc}_{D_i}(\hat f) + 2C\, d_{JS}\left(P^{X,Y}_{D_j}, P^{X,Y}_{D_i}\right)$$
where X, Y are random variables denoting the feature and label in domains $D_i$ and $D_j$.

Lemma 9 Consider two distributions $P^X_{D_i}$ and $P^X_{D_j}$ over $\mathcal{X}$. Let $P^Z_{D_i}$ and $P^Z_{D_j}$ be the distributions over $\mathcal{Z}$ induced by a mapping function $g: \mathcal{X} \to \mathcal{Z}$. Then we have:
$$d_{JS}\left(P^Z_{D_i}, P^Z_{D_j}\right) \le d_{JS}\left(P^X_{D_i}, P^X_{D_j}\right)$$

Lemma 10 (Phung et al., 2021) Consider a domain D with joint distribution $P^{X,Y}_D$ and labeling function $f_D: \mathcal{X} \to \mathcal{Y}$ from feature space to label space. Given a mapping function $g: \mathcal{X} \to \mathcal{Z}$ from feature to representation space, define the labeling function $h_D: \mathcal{Z} \to \mathcal{Y}$ from representation space to label space as
$$h_D(Z)_Y = \mathbb{E}\left[f_D(X)_Y \mid g^{-1}(Z)\right] = \frac{\int_{g^{-1}(Z)} f_D(X)_Y\, P^X_D\, dX}{\int_{g^{-1}(Z)} P^X_D\, dX}.$$
Similarly, let $\hat f$ be a hypothesis on the feature space; the corresponding hypothesis $\hat h$ on the representation space under the mapping g is
$$\hat h(Z)_Y = \frac{\int_{g^{-1}(Z)} \hat f(X)_Y\, P^X_D\, dX}{\int_{g^{-1}(Z)} P^X_D\, dX}.$$
Let $\epsilon^{Acc}_D(\hat f) = \mathbb{E}_D\left[L(\hat f(X), Y)\right]$ and $\epsilon^{Acc}_D(\hat h) = \mathbb{E}_D\left[L(\hat h(Z), Y)\right]$ be the expected errors defined with respect to the feature space and the representation space, respectively. We have:
$$\epsilon^{Acc}_D(\hat f) = \epsilon^{Acc}_D(\hat h)$$

D.3 LEMMAS FOR PROVING COROLLARY 1.1

Lemma 11 Consider two random variables X, Y. Let $P^{X,Y}_{D_i}$, $P^{X,Y}_{D_j}$ be two joint distributions defined in domains $D_i$ and $D_j$, respectively. Then the KL divergence $D_{KL}\left(P^{X,Y}_{D_i} \| P^{X,Y}_{D_j}\right)$ and the JS divergence $D_{JS}\left(P^{X,Y}_{D_i} \| P^{X,Y}_{D_j}\right)$ can be decomposed as follows:
$$D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_j}\right) = D_{KL}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \mathbb{E}_{D_i}\left[ D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right]$$
$$D_{JS}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_j}\right) \le D_{JS}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \mathbb{E}_{D_i}\left[ D_{JS}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right] + \mathbb{E}_{D_j}\left[ D_{JS}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right]$$

D.4 LEMMAS FOR PROVING THEOREM 2

Lemma 12 Under the assumption in Theorem 2, the following holds for any domain D:
$$\sqrt{\epsilon^{Acc}_D(\hat f)} = \sqrt{\mathbb{E}_D\left[L(\hat f(X), Y)\right]} \ge \sqrt{\frac{2c}{|\mathcal{Y}|}}\; d_{JS}\left(P^Y_D, P^{\hat Y}_D\right)^2, \quad \forall \hat f$$
where $\hat Y$ is the prediction made by the randomized predictor $\hat f$.

D.5 LEMMAS FOR PROVING THEOREM 3

Definition 13 Given a domain $D_i$ with binary random variable A denoting the sensitive attribute, the unfairness measures that evaluate the violation of the equalized odds (EO) and equal opportunity (EP) criteria between sensitive groups of this domain are defined as follows:
$$\epsilon^{EO}_{D_i}(\hat f) = \sum_{y \in \{0,1\}} \left| R^{y,0}_{D_i}(\hat f) - R^{y,1}_{D_i}(\hat f) \right|, \qquad \epsilon^{EP}_{D_i}(\hat f) = \left| R^{1,0}_{D_i}(\hat f) - R^{1,1}_{D_i}(\hat f) \right|$$
where $R^{y,a}_{D_i}(\hat f) = \mathbb{E}_{D_i}\left[\hat f(X)_1 \mid Y = y, A = a\right]$.

Lemma 14 Given two domains $D_i$ and $D_j$, under Definition 13, $R^{y,a}_{D_j}(\hat f)$ can be bounded by $R^{y,a}_{D_i}(\hat f)$ as follows:
$$R^{y,a}_{D_j}(\hat f) - R^{y,a}_{D_i}(\hat f) \le 2\, d_{JS}\left(P^{X|Y=y,A=a}_{D_j}, P^{X|Y=y,A=a}_{D_i}\right), \quad \forall y, a \in \{0,1\}$$

Lemma 15 Given two domains $D_i$ and $D_j$, under Definition 13, the unfairness in domain $D_j$ can be upper bounded by the unfairness measure in domain $D_i$ as follows:
$$\epsilon^{EO}_{D_j}(\hat f) \le \epsilon^{EO}_{D_i}(\hat f) + 2 \sum_{y \in \{0,1\}} \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=y,A=a}_{D_j}, P^{X|Y=y,A=a}_{D_i}\right)$$
$$\epsilon^{EP}_{D_j}(\hat f) \le \epsilon^{EP}_{D_i}(\hat f) + 2 \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=1,A=a}_{D_j}, P^{X|Y=1,A=a}_{D_i}\right)$$

Lemma 16 Consider a domain D with distribution $P^{X,Y}_D$ and labeling function $f_D: \mathcal{X} \to \mathcal{Y}$. Given a mapping function $g: \mathcal{X} \to \mathcal{Z}$ from feature to representation space, define the labeling function $h_D: \mathcal{Z} \to \mathcal{Y}$ as $h_D(Z)_Y = \frac{\int_{g^{-1}(Z)} f_D(X)_Y\, P^X_D\, dX}{\int_{g^{-1}(Z)} P^X_D\, dX}$. Similarly, let $\hat f$ be a hypothesis on the feature space; the corresponding hypothesis on the representation space under the mapping g is $\hat h(Z)_Y = \frac{\int_{g^{-1}(Z)} \hat f(X)_Y\, P^X_D\, dX}{\int_{g^{-1}(Z)} P^X_D\, dX}$. Under Definition 13, we have:
$$\epsilon^{EO}_D(\hat f) = \epsilon^{EO}_D(\hat h), \qquad \epsilon^{EP}_D(\hat f) = \epsilon^{EP}_D(\hat h)$$

D.6 LEMMAS FOR PROVING THEOREM 5

Lemma 17 Consider two domains $D_i$ and $D_j$. If there exist invertible mappings $m^y_{i,j}$ and $m^{y,a}_{i,j}$ such that $P^{X|y}_{D_i} = P^{m^y_{i,j}(X)|y}_{D_j}$ and $P^{X|y,a}_{D_i} = P^{m^{y,a}_{i,j}(X)|y,a}_{D_j}$, $\forall y \in \mathcal{Y}, a \in \mathcal{A}$, then $D_{JS}\left(P^{Z|y}_{D_i} \| P^{Z|y}_{D_j}\right)$ and $D_{JS}\left(P^{Z|y,a}_{D_i} \| P^{Z|y,a}_{D_j}\right)$ can be upper bounded by $\int_x P^{x|y}_{D_i}\, D_{JS}\left(P^{Z|x} \| P^{Z|m^y_{i,j}(x)}\right) dx$ and $\int_x P^{x|y,a}_{D_i}\, D_{JS}\left(P^{Z|x} \| P^{Z|m^{y,a}_{i,j}(x)}\right) dx$, respectively.
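Before turning to the proofs, Lemma 11's KL chain rule can be checked numerically on small discrete distributions. The following sketch (our own verification code, not part of the paper's artifacts) confirms the decomposition $D_{KL}(P^{X,Y} \| Q^{X,Y}) = D_{KL}(P^Y \| Q^Y) + \mathbb{E}_{P^Y}\left[D_{KL}(P^{X|Y} \| Q^{X|Y})\right]$:

```python
import numpy as np

rng = np.random.default_rng(0)
kl = lambda p, q: np.sum(p * np.log(p / q))

# random joint distributions P(X, Y), Q(X, Y) on a 4x3 support
P = rng.random((4, 3)); P /= P.sum()
Q = rng.random((4, 3)); Q /= Q.sum()

Py, Qy = P.sum(axis=0), Q.sum(axis=0)               # marginals over Y
lhs = kl(P, Q)                                      # D_KL(P^{X,Y} || Q^{X,Y})
rhs = kl(Py, Qy) + sum(
    Py[y] * kl(P[:, y] / Py[y], Q[:, y] / Qy[y])    # E_{P^Y}[D_KL(P^{X|Y} || Q^{X|Y})]
    for y in range(3)
)
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```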
E.1 PROOFS OF THEOREMS

Proof of Theorem 1. First, we derive the upper bound in the representation space Z; then we relate it to the feature space X. Let $D_S^* \in \{D_S^i\}_{i=1}^N$ be the source domain nearest to the target domain $D_T$. By Lemma 8, the expected classification error at the target domain is upper bounded based on each source domain:
$$\epsilon^{Acc}_{D_T}(\hat h) \le \epsilon^{Acc}_{D_S^i}(\hat h) + 2C\, d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right)$$
Averaging the upper bounds over all source domains, we have:
$$\epsilon^{Acc}_{D_T}(\hat h) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat h) + \frac{2C}{N}\sum_{i=1}^{N} d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right)$$
$$\overset{(1)}{\le} \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat h) + \frac{2C}{N}\sum_{i=1}^{N} \left[ d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^*}\right) + d_{JS}\left(P^{Z,Y}_{D_S^*}, P^{Z,Y}_{D_S^i}\right) \right]$$
$$\overset{(2)}{\le} \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat h) + 2C \min_{i \in [N]} d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right) + 2C \max_{i,j \in [N]} d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right) \quad (7)$$
Here we have (1) by the triangle inequality for the JS distance, $d_{JS}(P, R) \le d_{JS}(P, Q) + d_{JS}(Q, R)$, with $P = P_{D_T}$, $Q = P_{D_S^*}$, and $R = P_{D_S^i}$. We have (2) because $D_S^* \in \{D_S^i\}_{i=1}^N$, so $d_{JS}\left(P^{Z,Y}_{D_S^*}, P^{Z,Y}_{D_S^i}\right) \le \max_{i,j \in [N]} d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right)$, and $D_S^*$ is the nearest source domain, so $d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^*}\right) = \min_{i \in [N]} d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right)$.

Similarly, we can obtain the upper bound in the feature space X:
$$\epsilon^{Acc}_{D_T}(\hat f) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat f) + 2C \min_{i \in [N]} d_{JS}\left(P^{X,Y}_{D_T}, P^{X,Y}_{D_S^i}\right) + 2C \max_{i,j \in [N]} d_{JS}\left(P^{X,Y}_{D_S^i}, P^{X,Y}_{D_S^j}\right) \quad (8)$$
However, the bounds in Eq. (7) and Eq. (8) are based on either the representation space or the feature space alone, which is not readily usable for practical algorithmic design because the actual objective is to minimize $\epsilon^{Acc}_{D_T}(\hat f)$ in the feature space by controlling Z in the representation space. By Lemmas 9 and 10, we can derive a bound that relates the two spaces:
$$\epsilon^{Acc}_{D_T}(\hat f) = \epsilon^{Acc}_{D_T}(\hat h) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat h) + 2C \min_{i \in [N]} d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right) + 2C \max_{i,j \in [N]} d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right)$$
$$\le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat f) + 2C \min_{i \in [N]} d_{JS}\left(P^{X,Y}_{D_T}, P^{X,Y}_{D_S^i}\right) + 2C \max_{i,j \in [N]} d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right) \quad (9)$$

Proof of Corollary 1.1.
$$d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right) = \sqrt{D_{JS}\left(P^{Z,Y}_{D_S^i} \,\Big\|\, P^{Z,Y}_{D_S^j}\right)} \overset{(1)}{\le} \sqrt{D_{JS}\left(P^{Y}_{D_S^i} \,\Big\|\, P^{Y}_{D_S^j}\right) + 2\, \mathbb{E}_{P_{D_S^{i,j}}}\left[ D_{JS}\left(P^{Z|Y}_{D_S^i} \,\Big\|\, P^{Z|Y}_{D_S^j}\right) \right]}$$
$$\overset{(2)}{\le} d_{JS}\left(P^{Y}_{D_S^i}, P^{Y}_{D_S^j}\right) + \sqrt{2\, \mathbb{E}_{P_{D_S^{i,j}}}\left[ D_{JS}\left(P^{Z|Y}_{D_S^i} \,\Big\|\, P^{Z|Y}_{D_S^j}\right) \right]}$$
Here we have (1) by using Lemma 11 to decompose the JS divergence of the joint distributions (with both conditional-expectation terms bounded by twice the expectation under the mixture domain $D_S^{i,j}$), and (2) by using the inequality $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$.

This new upper bound, combined with Thm. 1, suggests learning a representation Z such that $P^{Z|Y}_{D_S^i}$ is invariant across source domains; in other words, $Z \perp D \mid Y$. This result is consistent with Thm. 4: when the target domain $D_T$ is a mixture of the source domains $\{D_S^i\}_{i=1}^N$, and when $P^{Y}_{D_S^i}$ and $P^{Z|Y}_{D_S^i}$ are invariant across source domains, we have $d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right) = d_{JS}\left(P^{Z,Y}_{D_S^i}, P^{Z,Y}_{D_S^j}\right) = 0$, implying $\epsilon^{Acc}_{D_T}(\hat f) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat f) = \epsilon^{Acc}_{D_S^i}(\hat f)$.

Proof of Corollary 5.1 (tighter upper bound for accuracy). The bound in Eq. (9) is constructed using Lemma 9. Indeed, we can make this bound tighter using the strong data processing inequality for the JS divergence (Polyanskiy & Wu, 2017), stated below:
$$D_{JS}\left(P^Z_{D_i} \,\Big\|\, P^Z_{D_j}\right) \le \eta_{JS}\, D_{JS}\left(P^X_{D_i} \,\Big\|\, P^X_{D_j}\right) \le \eta_{TV}\, D_{JS}\left(P^X_{D_i} \,\Big\|\, P^X_{D_j}\right)$$
where Z is the random variable induced from X, $P^X_{D_i}$ and $P^X_{D_j}$ are two distributions over $\mathcal{X}$, and
$$\eta_{JS} = \sup_{P^X_{D_i} \neq P^X_{D_j}} \frac{D_{JS}\left(P^Z_{D_i} \,\|\, P^Z_{D_j}\right)}{D_{JS}\left(P^X_{D_i} \,\|\, P^X_{D_j}\right)}, \qquad \eta_{TV} = \sup_{P^X_{D_i} \neq P^X_{D_j}} \frac{D_{TV}\left(P^Z_{D_i}, P^Z_{D_j}\right)}{D_{TV}\left(P^X_{D_i}, P^X_{D_j}\right)} \le 1$$
with $D_{TV}$ the total variation distance; $\eta_{TV}$ is called the Dobrushin's coefficient (Polyanskiy & Wu, 2017). Applying Lemma 11 and this inequality to the second term on the right-hand side of Eq. (7) (similarly to the proof of Corollary 1.1), we have:
$$2C \min_{i \in [N]} d_{JS}\left(P^{Z,Y}_{D_T}, P^{Z,Y}_{D_S^i}\right) \le 2C \min_{i \in [N]} \left( d_{JS}\left(P^{Y}_{D_T}, P^{Y}_{D_S^i}\right) + \sqrt{2\, \mathbb{E}_{P_{D_{T,i}}}\left[ D_{JS}\left(P^{Z|Y}_{D_T} \,\Big\|\, P^{Z|Y}_{D_S^i}\right) \right]} \right)$$
$$\le 2C \min_{i \in [N]} \left( d_{JS}\left(P^{Y}_{D_T}, P^{Y}_{D_S^i}\right) + \sqrt{2\eta_{TV}\, \mathbb{E}_{P_{D_{T,i}}}\left[ D_{JS}\left(P^{X|Y}_{D_T} \,\Big\|\, P^{X|Y}_{D_S^i}\right) \right]} \right)$$
Proof of Theorem 2. Consider a source domain $D_S^i$ and the target domain $D_T$. Because the JS distance $d_{JS}(\cdot, \cdot)$ is a distance metric, we have the triangle inequality:
$$d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) \le d_{JS}\left(P^Y_{D_S^i}, P^{\hat Y}_{D_S^i}\right) + d_{JS}\left(P^{\hat Y}_{D_S^i}, P^{\hat Y}_{D_T}\right) + d_{JS}\left(P^{\hat Y}_{D_T}, P^Y_{D_T}\right)$$
Since $X \xrightarrow{g} Z \xrightarrow{\hat h} \hat Y$, we have $d_{JS}\left(P^{\hat Y}_{D_S^i}, P^{\hat Y}_{D_T}\right) \le d_{JS}\left(P^{Z}_{D_S^i}, P^{Z}_{D_T}\right)$. Using Lemma 12, the following holds when $d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) \ge d_{JS}\left(P^Z_{D_S^i}, P^Z_{D_T}\right)$:
$$\left( d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) - d_{JS}\left(P^Z_{D_S^i}, P^Z_{D_T}\right) \right)^2 \le \left( d_{JS}\left(P^Y_{D_S^i}, P^{\hat Y}_{D_S^i}\right) + d_{JS}\left(P^{\hat Y}_{D_T}, P^Y_{D_T}\right) \right)^2$$
$$\le 2 \left( d_{JS}\left(P^Y_{D_S^i}, P^{\hat Y}_{D_S^i}\right)^2 + d_{JS}\left(P^{\hat Y}_{D_T}, P^Y_{D_T}\right)^2 \right) \le \sqrt{\frac{2|\mathcal{Y}|}{c}} \left( \sqrt{\epsilon^{Acc}_{D_S^i}(\hat f)} + \sqrt{\epsilon^{Acc}_{D_T}(\hat f)} \right) \le \sqrt{\frac{2|\mathcal{Y}|}{c}} \sqrt{2\left(\epsilon^{Acc}_{D_S^i}(\hat f) + \epsilon^{Acc}_{D_T}(\hat f)\right)}$$
where the last inequality is by the AM-QM inequality. Therefore, when $d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) \ge d_{JS}\left(P^Z_{D_S^i}, P^Z_{D_T}\right)$, we have:
$$\epsilon^{Acc}_{D_S^i}(\hat f) + \epsilon^{Acc}_{D_T}(\hat f) \ge \frac{c}{4|\mathcal{Y}|} \left( d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) - d_{JS}\left(P^Z_{D_S^i}, P^Z_{D_T}\right) \right)^4$$
The above holds for any source domain $D_S^i$. Averaging over all N source domains, we have:
$$\frac{1}{N}\sum_{i=1}^{N} \epsilon^{Acc}_{D_S^i}(\hat f) + \epsilon^{Acc}_{D_T}(\hat f) \ge \frac{c}{4|\mathcal{Y}|N} \sum_{i=1}^{N} \left( d_{JS}\left(P^Y_{D_S^i}, P^Y_{D_T}\right) - d_{JS}\left(P^Z_{D_S^i}, P^Z_{D_T}\right) \right)^4$$

Proof of Theorem 3. The proof is based on Lemmas 15 and 16 and is similar to the proof of Thm. 1. Let $D_S^* \in \{D_S^i\}_{i=1}^N$ be the source domain nearest to the target domain $D_T$. By Lemma 15, the unfairness at the target domain, measured with respect to the representation space, is upper bounded based on each source domain. For equal opportunity (EP), we have:
$$\epsilon^{EP}_{D_T}(\hat h) \le \epsilon^{EP}_{D_S^i}(\hat h) + 2 \sum_{a \in \{0,1\}} d_{JS}\left(P^{Z|Y=1,A=a}_{D_T}, P^{Z|Y=1,A=a}_{D_S^i}\right)$$
Averaging the upper bounds over all source domains and applying the triangle inequality through $D_S^*$ as in the proof of Thm. 1:
$$\epsilon^{EP}_{D_T}(\hat h) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{EP}_{D_S^i}(\hat h) + 2 \min_{i \in [N]} \sum_{a \in \{0,1\}} d_{JS}\left(P^{Z|Y=1,A=a}_{D_T}, P^{Z|Y=1,A=a}_{D_S^i}\right) + 2 \max_{i,j \in [N]} \sum_{a \in \{0,1\}} d_{JS}\left(P^{Z|Y=1,A=a}_{D_S^i}, P^{Z|Y=1,A=a}_{D_S^j}\right)$$
By Lemmas 9 and 16, we can relate this bound to the feature space as follows:
$$\epsilon^{EP}_{D_T}(\hat f) = \epsilon^{EP}_{D_T}(\hat h) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{EP}_{D_S^i}(\hat f) + 2 \min_{i \in [N]} \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=1,A=a}_{D_T}, P^{X|Y=1,A=a}_{D_S^i}\right) + 2 \max_{i,j \in [N]} \sum_{a \in \{0,1\}} d_{JS}\left(P^{Z|Y=1,A=a}_{D_S^i}, P^{Z|Y=1,A=a}_{D_S^j}\right)$$
Similarly, we obtain the upper bound for the unfairness measure with respect to equalized odds:
$$\epsilon^{EO}_{D_T}(\hat f) \le \frac{1}{N}\sum_{i=1}^{N} \epsilon^{EO}_{D_S^i}(\hat f) + 2 \min_{i \in [N]} \sum_{y,a \in \{0,1\}} d_{JS}\left(P^{X|Y=y,A=a}_{D_T}, P^{X|Y=y,A=a}_{D_S^i}\right) + 2 \max_{i,j \in [N]} \sum_{y,a \in \{0,1\}} d_{JS}\left(P^{Z|Y=y,A=a}_{D_S^i}, P^{Z|Y=y,A=a}_{D_S^j}\right)$$

Proof of Theorem 4. Consider two source domains $D_S^i$ and $D_S^j$. If $P^Y_{D_S^i} = P^Y_{D_S^j}$, we can learn the mapping function $g = P_\theta(Z|X)$ such that $P^{Z|Y}_{D_S^i} = P^{Z|Y}_{D_S^j}$. Note that this mapping function always exists; in particular, a trivial solution for Z that satisfies $P^{Z|Y}_{D_S^i} = P^{Z|Y}_{D_S^j}$ is any Z with $Z \perp Y, D$ (e.g., $P_\theta(Z|X) = \mathcal{N}(0, I)$).
Then we have:
$$\epsilon^{Acc}_{D_S^i}(\hat h) = \mathbb{E}_{z \sim P^Z_{D_S^i},\, y \sim h_{D_S^i}(z)}\left[ L(\hat h(Z), Y) \right] = \mathbb{E}_{y \sim P^Y_{D_S^i},\, z \sim P^{Z|Y}}\left[ L(\hat h(Z), Y) \right] = \mathbb{E}_{y \sim P^Y_{D_S^j},\, z \sim P^{Z|Y}}\left[ L(\hat h(Z), Y) \right] = \mathbb{E}_{z \sim P^Z_{D_S^j},\, y \sim h_{D_S^j}(z)}\left[ L(\hat h(Z), Y) \right] = \epsilon^{Acc}_{D_S^j}(\hat h)$$
For an unseen target domain $D_T \in \Lambda$, we have:
$$\epsilon^{Acc}_{D_T}(\hat h) = \mathbb{E}_{D_T}\left[ L(\hat h(Z), Y) \right] = \int_{\mathcal{Z}} \int_{\mathcal{Y}} L(\hat h(Z), Y)\, P^{Y,Z}_{D_T}\, dY\, dZ = \int_{\mathcal{Z}} \int_{\mathcal{Y}} L(\hat h(Z), Y) \sum_{i=1}^{N} \pi_i\, P^{Y,Z}_{D_S^i}\, dY\, dZ$$
$$= \sum_{i=1}^{N} \pi_i \int_{\mathcal{Z}} \int_{\mathcal{Y}} L(\hat h(Z), Y)\, P^{Y,Z}_{D_S^i}\, dY\, dZ = \sum_{i=1}^{N} \pi_i\, \mathbb{E}_{D_S^i}\left[ L(\hat h(Z), Y) \right] = \epsilon^{Acc}_{D_S^i}(\hat h), \quad \forall i \in [N]$$
By Lemma 10, we have $\epsilon^{Acc}_{D_T}(\hat h) = \epsilon^{Acc}_{D_S^i}(\hat h) \Rightarrow \epsilon^{Acc}_{D_T}(\hat f) = \epsilon^{Acc}_{D_S^i}(\hat f)$.

For fairness, we only give the proof for equalized odds (EO); the derivation for equal opportunity is similar. For any Z that satisfies $P^{Z|Y=y,A=a}_{D_S^i} = P^{Z|Y=y,A=a}_{D_S^j}$, $\forall y, a \in \{0,1\}$, we have:
$$\sum_{y \in \{0,1\}} D\left( P^{\hat h(Z)_1|Y=y,A=0}_{D_S^i},\; P^{\hat h(Z)_1|Y=y,A=1}_{D_S^i} \right) = \sum_{y \in \{0,1\}} D\left( P^{\hat h(Z)_1|Y=y,A=0}_{D_S^j},\; P^{\hat h(Z)_1|Y=y,A=1}_{D_S^j} \right)$$
For an unseen target domain $D_T \in \Lambda$, we have:
$$\epsilon^{EO}_{D_T}(\hat h) = \sum_{y \in \{0,1\}} D\left( P^{\hat h(Z)_1|Y=y,A=0}_{D_T},\; P^{\hat h(Z)_1|Y=y,A=1}_{D_T} \right) = \sum_{y \in \{0,1\}} D\left( \sum_{i=1}^{N} \pi_i\, P^{\hat h(Z)_1|Y=y,A=0}_{D_S^i},\; \sum_{i=1}^{N} \pi_i\, P^{\hat h(Z)_1|Y=y,A=1}_{D_S^i} \right) = \sum_{y \in \{0,1\}} D\left( P^{\hat h(Z)_1|Y=y,A=0}_{D_S^i},\; P^{\hat h(Z)_1|Y=y,A=1}_{D_S^i} \right) = \epsilon^{EO}_{D_S^i}(\hat h)$$
Similar to the proof for accuracy, a Z satisfying $P^{Z|Y=y,A=a}_{D_S^i} = P^{Z|Y=y,A=a}_{D_S^j}$, $\forall y, a \in \{0,1\}, i, j \in [N]$ always exists; a trivial solution is any Z with $Z \perp Y, A, D$. By Lemma 16, we have $\epsilon^{EO}_{D_T}(\hat h) = \epsilon^{EO}_{D_S^i}(\hat h) \Rightarrow \epsilon^{EO}_{D_T}(\hat f) = \epsilon^{EO}_{D_S^i}(\hat f)$. For equal opportunity (EP), Z only needs to satisfy the condition for the positive label, i.e., $P^{Z|Y=1,A=a}_{D_S^i} = P^{Z|Y=1,A=a}_{D_S^j}$, $\forall a \in \{0,1\}, i, j \in [N]$.

Proof of Theorem 5. According to Lemma 17, we have:
$$D_{JS}\left(P^{Z|y}_{D_i} \,\Big\|\, P^{Z|y}_{D_j}\right) \le \int_{\mathcal{X}} P^{x|y}_{D_i}\, D_{JS}\left(P^{Z|x} \,\Big\|\, P^{Z|m^y_{i,j}(x)}\right) dx \quad (12)$$
Then, minimizing $D_{JS}\left(P^{Z|y}_{D_i} \| P^{Z|y}_{D_j}\right)$ can be achieved by minimizing $D_{JS}\left(P^{Z|x} \| P^{Z|m^y_{i,j}(x)}\right)$, $\forall x \in \mathcal{X}$. We can upper bound this term as follows:
$$D_{JS}\left(P^{Z|x} \,\Big\|\, P^{Z|m^y_{i,j}(x)}\right) \le D_{TV}\left(P^{Z|x}, P^{Z|m^y_{i,j}(x)}\right) \le \sqrt{2}\, d_{1/2}\left(P^{Z|x}, P^{Z|m^y_{i,j}(x)}\right) \overset{(1)}{=} \sqrt{2}\, d_{1/2}\left( \mathcal{N}\left(\mu(x); \sigma^2 I_d\right),\; \mathcal{N}\left(\mu\left(m^y_{i,j}(x)\right); \sigma^2 I_d\right) \right) \quad (13)$$
where $D_{TV}$ and $d_{1/2}$ are the total variation distance and the Hellinger distance, respectively. We have (1)$=$ because of our choice of representation mapping $g(x) := P^{Z|x} = \mathcal{N}\left(\mu(x); \sigma^2 I_d\right)$. According to Devroye et al. (2018), the Hellinger distance between two multivariate normal distributions over $\mathbb{R}^d$ has the closed form:
$$d_{1/2}\left( \mathcal{N}(\mu_1; \Sigma_1),\, \mathcal{N}(\mu_2; \Sigma_2) \right) = \sqrt{ 1 - \frac{\det(\Sigma_1)^{1/4} \det(\Sigma_2)^{1/4}}{\det\left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2}} \exp\left( -\frac{1}{8} (\mu_1 - \mu_2)^T \left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1} (\mu_1 - \mu_2) \right) } \quad (14)$$
where $\mu_1, \mu_2, \Sigma_1, \Sigma_2$ are the mean vectors and covariance matrices of the two normal distributions. In Eq. (14), let $\mu_1 = \mu(x)$, $\mu_2 = \mu\left(m^y_{i,j}(x)\right)$, and $\Sigma_1 = \Sigma_2 = \sigma^2 I_d$. Then we have:
$$d_{1/2}\left( \mathcal{N}\left(\mu(x); \sigma^2 I_d\right),\, \mathcal{N}\left(\mu\left(m^y_{i,j}(x)\right); \sigma^2 I_d\right) \right) = \sqrt{ 1 - \exp\left( -\frac{1}{8\sigma^2} \left\| \mu(x) - \mu\left(m^y_{i,j}(x)\right) \right\|^2 \right) } \quad (15)$$
From Eq. (15), we can see that the Hellinger distance between the two representation distributions $P^{Z|x}$ and $P^{Z|m^y_{i,j}(x)}$ is a monotone function of the distance between their means $\mu(x)$ and $\mu\left(m^y_{i,j}(x)\right)$. Combining this with Eq. (12) and Eq. (13), we conclude that minimizing $d_{JS}\left(P^{Z|y}_{D_S^i}, P^{Z|y}_{D_S^j}\right)$ can be reduced to minimizing $\left\| \mu(x) - \mu\left(m^y_{i,j}(x)\right) \right\|^2$, which can be implemented as the mean squared error between $\mu(x)$ and $\mu\left(m^y_{i,j}(x)\right)$ in practice. The proof for $d_{JS}\left(P^{Z|y,a}_{D_S^i}, P^{Z|y,a}_{D_S^j}\right)$ is derived in the same way.
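The conclusion of the proof of Theorem 5 translates directly into the stage-2 invariance loss: matching the means of the Gaussian representations of an image and its density-mapped counterpart. The following is a minimal sketch (our own illustration, assuming a feature extractor `g` returning the mean µ(x) and a pre-trained, frozen density mapper `m` with the signature shown), not the released implementation:

```python
import torch
import torch.nn.functional as F

def invariance_loss(g, m, x, dom_src, dom_tgt):
    """L_inv: mean squared error between mu(x) and mu(m(x)), where m translates
    x from its source domain to another source domain (Theorem 5 reduction)."""
    with torch.no_grad():
        x_mapped = m(x, dom_src, dom_tgt)     # density mapping m_{i,j}(x), frozen
    mu_x = g(x)                               # mean of P(Z|x) = N(mu(x), sigma^2 I)
    mu_mapped = g(x_mapped)                   # mean of P(Z|m(x))
    return F.mse_loss(mu_x, mu_mapped)
```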
E.2 PROOFS OF LEMMAS

Proof of Lemma 6. We have:
$$\int_E \left(P^X_{D_j} - P^X_{D_i}\right) dX - \int_{\bar E} \left(P^X_{D_i} - P^X_{D_j}\right) dX = \int \left(P^X_{D_j} - P^X_{D_i}\right) dX = 0$$
so the two integrals are equal; moreover, their sum is $\int \left|P^X_{D_j} - P^X_{D_i}\right| dX$, hence each equals $\frac{1}{2} \int \left|P^X_{D_j} - P^X_{D_i}\right| dX$.

Proof of Lemma 7. We have:
$$\mathbb{E}_{D_j}[f(X)] = \int_{\mathcal{X}} f(X)\, P^X_{D_j}\, dX = \int_{\mathcal{X}} f(X)\, P^X_{D_i}\, dX + \int_{\mathcal{X}} f(X) \left(P^X_{D_j} - P^X_{D_i}\right) dX$$
$$= \mathbb{E}_{D_i}[f(X)] + \int_{E} f(X) \left(P^X_{D_j} - P^X_{D_i}\right) dX + \int_{\bar E} f(X) \left(P^X_{D_j} - P^X_{D_i}\right) dX$$
$$\overset{(1)}{\le} \mathbb{E}_{D_i}[f(X)] + \int_{E} f(X) \left(P^X_{D_j} - P^X_{D_i}\right) dX \overset{(2)}{\le} \mathbb{E}_{D_i}[f(X)] + C \int_{E} \left(P^X_{D_j} - P^X_{D_i}\right) dX$$
$$\overset{(3)}{=} \mathbb{E}_{D_i}[f(X)] + \frac{C}{2} \int \left|P^X_{D_j} - P^X_{D_i}\right| dX \overset{(4)}{\le} \mathbb{E}_{D_i}[f(X)] + C \sqrt{\frac{1}{2} \min\left( D_{KL}\left(P^X_{D_i} \,\Big\|\, P^X_{D_j}\right),\; D_{KL}\left(P^X_{D_j} \,\Big\|\, P^X_{D_i}\right) \right)}$$
where E is the event that $P^X_{D_j} \ge P^X_{D_i}$ and $\bar E$ is its complement. We have (1) because $\int_{\bar E} f(X)\left(P^X_{D_j} - P^X_{D_i}\right) dX \le 0$; (2) because f is a non-negative function bounded by C; (3) by Lemma 6; and (4) by Pinsker's inequality between the total variation norm and the KL divergence.

Proof of Lemma 8. Applying Lemma 7 with X replaced by (X, Y), f by the loss function L, and $D_i$ by the mixture domain $D_{i,j}$, we have:
$$\epsilon^{Acc}_{D_j}(\hat f) - \mathbb{E}_{D_{i,j}}\left[L(\hat f(X), Y)\right] \le C \sqrt{\frac{1}{2} D_{KL}\left(P^{X,Y}_{D_j} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right)} \quad (16)$$
Applying Lemma 7 again with $D_j$ replaced by $D_{i,j}$:
$$\mathbb{E}_{D_{i,j}}\left[L(\hat f(X), Y)\right] - \epsilon^{Acc}_{D_i}(\hat f) \le C \sqrt{\frac{1}{2} D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right)} \quad (17)$$
Adding Eq. (16) and Eq. (17), we have:
$$\epsilon^{Acc}_{D_j}(\hat f) - \epsilon^{Acc}_{D_i}(\hat f) \le C \left( \sqrt{\frac{1}{2} D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right)} + \sqrt{\frac{1}{2} D_{KL}\left(P^{X,Y}_{D_j} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right)} \right)$$
$$\overset{(1)}{\le} C \sqrt{ D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right) + D_{KL}\left(P^{X,Y}_{D_j} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right) } = C \sqrt{2\, D_{JS}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_j}\right)} \le 2C\, d_{JS}\left(P^{X,Y}_{D_i}, P^{X,Y}_{D_j}\right)$$
Here we have (1) by the Cauchy-Schwarz inequality.

Proof of Lemma 9. Note that the JS divergence $D_{JS}\left(P^X_{D_i} \| P^X_{D_j}\right)$ can be interpreted as the mutual information between a random variable X drawn from the mixture distribution $P^X_{D_{i,j}} = \frac{1}{2}\left(P^X_{D_i} + P^X_{D_j}\right)$ and the equiprobable binary random variable T used to switch between $P^X_{D_i}$ and $P^X_{D_j}$ to create the mixture. In particular, we have:
$$D_{JS}\left(P^X_{D_i} \,\Big\|\, P^X_{D_j}\right) = \frac{1}{2}\left( D_{KL}\left(P^X_{D_i} \,\Big\|\, P^X_{D_{i,j}}\right) + D_{KL}\left(P^X_{D_j} \,\Big\|\, P^X_{D_{i,j}}\right) \right)$$
$$= \frac{1}{2}\int \log\left(P^X_{D_i}\right) P^X_{D_i}\, dX + \frac{1}{2}\int \log\left(P^X_{D_j}\right) P^X_{D_j}\, dX - \int \log\left(P^X_{D_{i,j}}\right) P^X_{D_{i,j}}\, dX = -H(X|T) + H(X) = I(X; T)$$
where H(X) is the entropy of X, H(X|T) is the entropy of X conditioned on T, and I(X; T) is the mutual information between X and T. Similarly, we have $D_{JS}\left(P^Z_{D_i} \| P^Z_{D_j}\right) = I(Z; T)$. Because Z is induced from X by the mapping function g, we have $Z \perp T \mid X$ and the Markov chain $T \to X \to Z$. According to the data processing inequality for mutual information (Polyanskiy & Wu, 2014), $I(X; T) \ge I(Z; T)$, which implies $D_{JS}\left(P^X_{D_i} \| P^X_{D_j}\right) \ge D_{JS}\left(P^Z_{D_i} \| P^Z_{D_j}\right)$. Taking square roots on both sides, we have $d_{JS}\left(P^Z_{D_i}, P^Z_{D_j}\right) \le d_{JS}\left(P^X_{D_i}, P^X_{D_j}\right)$.

Proof of Lemma 10. We have:
$$\epsilon^{Acc}_D(\hat h) = \mathbb{E}_{z \sim P^Z_D,\, y \sim h_D(z)}\left[ L(\hat h(Z), Y) \right] = \sum_{y} \mathbb{E}_{z \sim P^Z_D}\left[ L(\hat h(Z), y)\, h_D(Z)_y \right] = \sum_{y} \int_{\mathcal{Z}} L(\hat h(Z), y)\, h_D(Z)_y\, P^Z_D\, dZ$$
$$= \sum_{y} \int_{\mathcal{Z}} L(\hat h(Z), y)\, \frac{\int_{g^{-1}(Z)} f_D(X)_y\, P^X_D\, dX}{\int_{g^{-1}(Z)} P^X_D\, dX} \left(\int_{g^{-1}(Z)} P^X_D\, dX\right) dZ = \sum_{y} \int_{\mathcal{Z}} L(\hat h(Z), y) \int_{g^{-1}(Z)} f_D(X)_y\, P^X_D\, dX\, dZ$$
$$= \sum_{y} \int_{\mathcal{X}} L(\hat h(g(X)), y)\, f_D(X)_y\, P^X_D\, dX = \sum_{y} \int_{\mathcal{X}} L(\hat f(X), y)\, f_D(X)_y\, P^X_D\, dX = \epsilon^{Acc}_D(\hat f)$$

Proof of Lemma 11. We show the decomposition for the KL divergence first and then use the result to derive the decomposition for the JS divergence.
We have:
$$D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_j}\right) = \mathbb{E}_{D_i}\left[ \log P^{X,Y}_{D_i} - \log P^{X,Y}_{D_j} \right] = \mathbb{E}_{D_i}\left[ \log P^{Y}_{D_i} + \log P^{X|Y}_{D_i} \right] - \mathbb{E}_{D_i}\left[ \log P^{Y}_{D_j} + \log P^{X|Y}_{D_j} \right]$$
$$= \mathbb{E}_{D_i}\left[ \log P^{Y}_{D_i} - \log P^{Y}_{D_j} \right] + \mathbb{E}_{y \sim P^Y_{D_i}} \mathbb{E}_{x \sim P^{X|y}_{D_i}}\left[ \log P^{X|Y}_{D_i} - \log P^{X|Y}_{D_j} \right] = D_{KL}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \mathbb{E}_{D_i}\left[ D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right]$$
For the JS divergence:
$$D_{JS}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_j}\right) = \frac{1}{2} D_{KL}\left(P^{X,Y}_{D_i} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right) + \frac{1}{2} D_{KL}\left(P^{X,Y}_{D_j} \,\Big\|\, P^{X,Y}_{D_{i,j}}\right)$$
$$= \frac{1}{2} D_{KL}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_{i,j}}\right) + \frac{1}{2} \mathbb{E}_{D_i}\left[ D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right] + \frac{1}{2} D_{KL}\left(P^{Y}_{D_j} \,\Big\|\, P^{Y}_{D_{i,j}}\right) + \frac{1}{2} \mathbb{E}_{D_j}\left[ D_{KL}\left(P^{X|Y}_{D_j} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right]$$
$$= D_{JS}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \frac{1}{2} \mathbb{E}_{D_i}\left[ D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right] + \frac{1}{2} \mathbb{E}_{D_j}\left[ D_{KL}\left(P^{X|Y}_{D_j} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right]$$
$$\le D_{JS}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \frac{1}{2} \mathbb{E}_{D_i}\left[ D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) + D_{KL}\left(P^{X|Y}_{D_j} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right] + \frac{1}{2} \mathbb{E}_{D_j}\left[ D_{KL}\left(P^{X|Y}_{D_j} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) + D_{KL}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_{i,j}}\right) \right]$$
$$= D_{JS}\left(P^{Y}_{D_i} \,\Big\|\, P^{Y}_{D_j}\right) + \mathbb{E}_{D_i}\left[ D_{JS}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right] + \mathbb{E}_{D_j}\left[ D_{JS}\left(P^{X|Y}_{D_i} \,\Big\|\, P^{X|Y}_{D_j}\right) \right]$$

Proof of Lemma 12.
$$\mathbb{E}_D\left[ L(\hat f(X), Y) \right] = \mathbb{E}_D\left[ \sum_{\hat y \in \mathcal{Y}} \hat f(X)_{\hat y}\, L(\hat y, Y) \right] \overset{(1)}{\ge} c\, \mathbb{E}_X\left[ \sum_{\hat y \in \mathcal{Y}} \hat f(X)_{\hat y} \left(1 - \Pr(Y = \hat y \mid X)\right) \right] \overset{(2)}{=} c\, \mathbb{E}_X\left[ 1 - \hat f(X)^T f(X) \right]$$
$$\overset{(3)}{\ge} \frac{c}{2}\, \mathbb{E}_X\left[ \left\| \hat f(X) - f(X) \right\|_2^2 \right] \overset{(4)}{\ge} \frac{c}{2|\mathcal{Y}|}\, \mathbb{E}_X\left[ \left\| \hat f(X) - f(X) \right\|_1 \right]^2 \overset{(5)}{\ge} \frac{c}{2|\mathcal{Y}|} \left\| \mathbb{E}_X\left[ \hat f(X) - f(X) \right] \right\|_1^2 = \frac{c}{2|\mathcal{Y}|} \left\| P^{\hat Y}_D - P^{Y}_D \right\|_1^2$$
$$\overset{(6)}{\ge} \frac{2c}{|\mathcal{Y}|}\, D_{JS}\left(P^{Y}_D \,\Big\|\, P^{\hat Y}_D\right)^2 = \frac{2c}{|\mathcal{Y}|}\, d_{JS}\left(P^{Y}_D, P^{\hat Y}_D\right)^4$$
Here we have (1) because of the assumption that $L(\hat y, y)$ is lower bounded by c when $\hat y \neq y$; (2) because $\hat f(X)^T \mathbf{1} = \|\hat f(X)\|_1 = 1$; (3) because $\|\hat f(X)\|_2 \le \|\hat f(X)\|_1 = 1$; (4) because $\|v\|_2^2 \ge \frac{1}{|\mathcal{Y}|}\|v\|_1^2$; (5) by Jensen's inequality; and (6) because the JS divergence lower-bounds the total variation distance.

Proof of Lemma 14. Similar to the proof of Lemma 8, we apply Lemma 7 to $R^{y,a}_{D_i}$ and $R^{y,a}_{D_j}$, noting that $\hat f(X)_y$ is bounded by 1. Then, for all $y, a \in \{0,1\}$:
$$R^{y,a}_{D_j}(\hat f) - \mathbb{E}_{D_{i,j}}\left[\hat f(X)_y \mid Y=y, A=a\right] \le \sqrt{\frac{1}{2} D_{KL}\left(P^{X|Y=y,A=a}_{D_j} \,\Big\|\, P^{X|Y=y,A=a}_{D_{i,j}}\right)} \quad (18)$$
$$\mathbb{E}_{D_{i,j}}\left[\hat f(X)_y \mid Y=y, A=a\right] - R^{y,a}_{D_i}(\hat f) \le \sqrt{\frac{1}{2} D_{KL}\left(P^{X|Y=y,A=a}_{D_i} \,\Big\|\, P^{X|Y=y,A=a}_{D_{i,j}}\right)} \quad (19)$$
Adding Eq. (18) and Eq. (19) and applying the same Cauchy-Schwarz step as in the proof of Lemma 8, we have:
$$R^{y,a}_{D_j}(\hat f) - R^{y,a}_{D_i}(\hat f) \le 2\, d_{JS}\left(P^{X|Y=y,A=a}_{D_j}, P^{X|Y=y,A=a}_{D_i}\right)$$

Proof of Lemma 15. We give the proof for the unfairness measure with respect to equal opportunity first and then use this result to derive the proof for equalized odds. Without loss of generality, assign the group indices 1, 0 such that $R^{1,0}_{D_j}(\hat f) \ge R^{1,1}_{D_j}(\hat f)$, so that $\epsilon^{EP}_{D_j}(\hat f) = R^{1,0}_{D_j}(\hat f) - R^{1,1}_{D_j}(\hat f)$. By Lemma 14:
$$R^{1,0}_{D_j}(\hat f) \le R^{1,0}_{D_i}(\hat f) + 2\, d_{JS}\left(P^{X|Y=1,A=0}_{D_j}, P^{X|Y=1,A=0}_{D_i}\right)$$
$$-R^{1,1}_{D_j}(\hat f) \le -R^{1,1}_{D_i}(\hat f) + 2\, d_{JS}\left(P^{X|Y=1,A=1}_{D_j}, P^{X|Y=1,A=1}_{D_i}\right)$$
Summing the two inequalities, we have:
$$\epsilon^{EP}_{D_j}(\hat f) \le \left| R^{1,0}_{D_i}(\hat f) - R^{1,1}_{D_i}(\hat f) \right| + 2 \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=1,A=a}_{D_j}, P^{X|Y=1,A=a}_{D_i}\right) = \epsilon^{EP}_{D_i}(\hat f) + 2 \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=1,A=a}_{D_j}, P^{X|Y=1,A=a}_{D_i}\right) \quad (20)$$
Similarly, we have:
$$\left| R^{0,0}_{D_j}(\hat f) - R^{0,1}_{D_j}(\hat f) \right| \le \left| R^{0,0}_{D_i}(\hat f) - R^{0,1}_{D_i}(\hat f) \right| + 2 \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=0,A=a}_{D_j}, P^{X|Y=0,A=a}_{D_i}\right) \quad (21)$$
Summing Eq. (20) and Eq. (21), we have:
$$\epsilon^{EO}_{D_j}(\hat f) \le \epsilon^{EO}_{D_i}(\hat f) + 2 \sum_{y \in \{0,1\}} \sum_{a \in \{0,1\}} d_{JS}\left(P^{X|Y=y,A=a}_{D_j}, P^{X|Y=y,A=a}_{D_i}\right)$$

Proof of Lemma 16. Similar to the proof of Lemma 10, $R^{y,a}_{D_i}(\hat f) = R^{y,a}_{D_i}(\hat h)$ for all $y, a \in \{0,1\}$. Then we have:
$$\epsilon^{EO}_{D_i}(\hat f) = \sum_{y \in \{0,1\}} \left| R^{y,0}_{D_i}(\hat f) - R^{y,1}_{D_i}(\hat f) \right| = \sum_{y \in \{0,1\}} \left| R^{y,0}_{D_i}(\hat h) - R^{y,1}_{D_i}(\hat h) \right| = \epsilon^{EO}_{D_i}(\hat h)$$
$$\epsilon^{EP}_{D_i}(\hat f) = \left| R^{1,0}_{D_i}(\hat f) - R^{1,1}_{D_i}(\hat f) \right| = \left| R^{1,0}_{D_i}(\hat h) - R^{1,1}_{D_i}(\hat h) \right| = \epsilon^{EP}_{D_i}(\hat h)$$

Proof of Lemma 17.
For all $y \in \mathcal{Y}$, we have:
$$D_{JS}\left(P^{Z|y}_{D_i} \,\Big\|\, P^{Z|y}_{D_j}\right) \overset{(1)}{=} D_{JS}\left( \int_{\mathcal{X}} P^{Z|x}\, P^{x|y}_{D_i}\, dx \;\Big\|\; \int_{\mathcal{X}} P^{Z|m^y_{i,j}(x)}\, P^{m^y_{i,j}(x)|y}_{D_j}\, dm^y_{i,j}(x) \right)$$
$$\overset{(2)}{=} D_{JS}\left( \int_{\mathcal{X}} P^{Z|x}\, P^{x|y}_{D_i}\, dx \;\Big\|\; \int_{\mathcal{X}} P^{Z|m^y_{i,j}(x)}\, P^{m^y_{i,j}(x)|y}_{D_j}\, dx \right) \overset{(3)}{=} D_{JS}\left( \int_{\mathcal{X}} P^{Z|x}\, P^{x|y}_{D_i}\, dx \;\Big\|\; \int_{\mathcal{X}} P^{Z|m^y_{i,j}(x)}\, P^{x|y}_{D_i}\, dx \right)$$
$$\overset{(4)}{\le} \int_{\mathcal{X}} P^{x|y}_{D_i}\, D_{JS}\left( P^{Z|x} \,\Big\|\, P^{Z|m^y_{i,j}(x)} \right) dx$$
Here we have (1)$=$ by the law of total probability and $Z \perp Y \mid X$; (2)$=$ because $m^y_{i,j}$ is an invertible function; (3)$=$ because $P^{x|y}_{D_i} = P^{m^y_{i,j}(x)|y}_{D_j}$, $\forall x \in \mathcal{X}$; and (4)$\le$ by the joint convexity of the JS divergence. By a similar derivation, for all $y \in \mathcal{Y}, a \in \mathcal{A}$, we have:
$$D_{JS}\left(P^{Z|y,a}_{D_i} \,\Big\|\, P^{Z|y,a}_{D_j}\right) \le \int_{\mathcal{X}} P^{x|y,a}_{D_i}\, D_{JS}\left( P^{Z|x} \,\Big\|\, P^{Z|m^{y,a}_{i,j}(x)} \right) dx$$