# Sustaining Fairness via Incremental Learning

Somnath Basu Roy Chowdhury, Snigdha Chaturvedi
University of North Carolina at Chapel Hill
{somnath, snigdha}@cs.unc.edu

Machine learning systems are often deployed for making critical decisions like credit lending, hiring, etc. While making decisions, such systems can encode the user's demographic information (like gender or age) in their intermediate representations. This can lead to decisions that are biased towards specific demographics. Prior work has focused on debiasing intermediate representations to ensure fair decisions. However, these approaches fail to remain fair with changes in the task or demographic distribution. To ensure fairness in the wild, it is important for a system to adapt to such changes as it accesses new tasks in an incremental fashion. In this work, we propose to address this issue by introducing the problem of learning fair representations in an incremental learning setting. To this end, we present Fairness-aware Incremental Representation Learning (FaIRL), a representation learning system that can sustain fairness while incrementally learning new tasks. FaIRL achieves fairness and learns new tasks by controlling the rate-distortion function of the learned representations. Our empirical evaluations show that FaIRL makes fair decisions while achieving high performance on the target task, outperforming several baselines.

## Introduction

An increasing number of organizations are leveraging machine learning solutions for making decisions in critical applications like hiring (Dastin 2018), criminal recidivism (Larson et al. 2016), etc. Machine learning systems can rely on a user's demographic information, like gender, race, and age (protected attributes), encoded in their representations (Elazar and Goldberg 2018) to make decisions, resulting in biased outcomes against certain demographic groups (Mehrabi et al. 2021; Shah, Schwartz, and Hovy 2020). Numerous works try to achieve fairness through unawareness (Apfelbaum et al. 2010) by debiasing model representations from protected attributes (Blodgett, Green, and O'Connor 2016; Elazar and Goldberg 2018; Elazar et al. 2021; Chowdhury and Chaturvedi 2022). However, these techniques only remove in-domain spurious correlations and fail to generalize to new data distributions (Barrett et al. 2019). For example, consider a fair resume screening system that was trained only on resumes of software engineering roles. The system may not remain fair while screening for roles like sales or marketing, where the gender demographic distribution may be different. Similarly, a fair system also needs to be robust to shifts in data distribution (e.g., new applicants may report scores on specific tests that didn't appear in the training data) and task changes (e.g., resumes being screened for new roles like social media manager). In such cases, it is not always practical to retrain the system from scratch every time new data comes in, because of the resources and environmental impact associated with training modern machine learning systems.

Figure 1: Illustration of a fair representation learning system in an incremental setting. The system is expected to make fair decisions while incrementally learning new tasks.
Previous works have focused on improving the robustness of fair learning models by considering shifts in data distribution. These involve learning fair models under covariate shift (Rezaei et al. 2021; Singh et al. 2021) or for streaming data (Zhang et al. 2021; Zhang and Ntoutsi 2019), but these systems do not incrementally learn new tasks. In this work, we introduce the problem of learning fair representations in an incremental learning setting. In this setting, data from new tasks, with different underlying demographic distributions, arrives at consecutive training stages, and the system has to perform well on all tasks seen so far while making fair decisions (see Figure 1). This setup is quite similar to incremental learning (Rebuffi et al. 2017). However, most works in the incremental learning literature focus on target task performance without considering the fairness of their predictions. To address this problem, we propose a representation learning system, Fairness-aware Incremental Representation Learning (FaIRL). At its core, FaIRL uses an adversarial debiasing setup that removes demographic information by controlling the number of bits (rate-distortion) required to encode the learned representations (Yu et al. 2020; Ma et al. 2007). We extend this debiasing setup to incremental learning using an exemplar-based approach, retaining a small set of representative samples from previous tasks to prevent forgetting. Empirical evaluations show that FaIRL outperforms baseline incremental learning systems on fairness metrics while successfully learning target task information. Our key contributions are:

- We propose FaIRL, a representation learning system that learns fair representations, while incrementally learning new tasks, by controlling their rate-distortion function.
- We show using empirical evaluations that FaIRL outperforms baseline incremental learning systems in making fair decisions while performing well on the target task.
- We also perform extensive analysis experiments to investigate the functioning of FaIRL.

## Related Work

In this section, we discuss prior work on fairness in varying setups and on incremental learning.

Fair Representation Learning. Zemel et al. (2013) introduced the problem of learning fair representations as an optimization task. Follow-up works (Zhang, Lemoine, and Mitchell 2018; Li, Baldwin, and Cohn 2018; Elazar and Goldberg 2018; Chowdhury et al. 2021) leveraged an adversarial framework (Goodfellow et al. 2014) to achieve fairness, where a discriminator tries to extract demographic information from intermediate representations while the model performs prediction. Different from these, Bahng et al. (2020) proposed to learn fair representations, without using protected attribute annotations, by making representations uncorrelated with ones retrieved from a biased classifier. However, these techniques require a target task at hand and are often difficult to train (Elazar and Goldberg 2018). Another line of work, introduced by Bolukbasi et al. (2016), focuses on debiasing representations independent of a target task. These approaches (Ravfogel et al. 2020; Bolukbasi et al. 2016) iteratively identify subspaces that encode protected attribute information and project vectors onto the corresponding nullspaces. Yet another line of work (Cheng et al. 2020; Dixon et al. 2018) uses counterfactual data augmentation approaches to debias sentence embeddings.
Recently, Chowdhury and Chaturvedi (2022) proposed a debiasing framework that makes representations from the same protected attribute class uncorrelated by maximizing their rate-distortion function. Despite showing promise in a single domain, these frameworks fail to remain fair on out-of-distribution data (Barrett et al. 2019).

Fairness under distribution shift. Several works (Rezaei et al. 2021; Singh et al. 2021) have investigated the robustness of fair classifiers under covariate shift. These works identify conditions under which fairness can be sustained given shifts in the data and label distributions. The efficacy of fair classifiers has also been studied in online settings (Zhang et al. 2021; Zhang and Ntoutsi 2019), where the data distribution continually evolves depending on the input data stream. However, both lines of work consider a fixed task description at initiation and do not learn new tasks while training.

Incremental Learning. Li and Hoiem (2017) introduced the task of incremental learning and proposed a dynamic architecture leveraging a knowledge distillation loss to prevent catastrophic forgetting (McCloskey and Cohen 1989). Since then, works on incremental learning can be classified into three broad categories: (a) Regularization-based approaches (Li and Hoiem 2017; Kirkpatrick et al. 2017; Zenke, Poole, and Ganguli 2017; Castro et al. 2018; Chan et al. 2021) use a penalty measure to ensure that model parameters crucial for previous tasks do not change abruptly; (b) Dynamic architecture-based approaches (Long et al. 2015; Rusu et al. 2016; Li et al. 2019) introduce new task-specific parameters to prevent interference with parameters from previous tasks. These architectures grow linearly with the number of tasks and have a heavy memory footprint; (c) Exemplar-based approaches (Rebuffi et al. 2017; Chaudhry et al. 2019a,b; Tong et al. 2022) maintain a small memory of representative samples from previous tasks and replay them to prevent catastrophic forgetting. Our framework FaIRL is similar to that of Tong et al. (2022), as we also control the rate-distortion of learned representations. However, we additionally consider the fairness of the predictions by ensuring that protected information does not get encoded in the representations.

## Background

In this section, we discuss the fundamental concepts of rate-distortion theory that form the building blocks of our framework, FaIRL.

Rate Distortion. In information theory (Cover 1999), the compactness of a distribution is measured by its coding length: the number of bits required to encode it. In lossy data compression, a set of vectors $Z = \{z_1, \ldots, z_n\}$, $z_i \in \mathbb{R}^d$, sampled from a distribution $P(Z)$, is encoded using a coding scheme such that the transmitted vectors $\{\hat{z}_i\}_{i=1}^n$ can be recovered up to a distortion $\epsilon$. The minimal number of bits required per vector to encode the sequence $Z$ is defined by the rate-distortion function $R(Z, \epsilon)$. The optimal $R(Z, \epsilon)$ for vectors $Z$ sampled from a multivariate Gaussian $\mathcal{N}(0, \Sigma)$ is:

$$R(Z, \epsilon) = \frac{1}{2} \log_2 \det\left(I + \frac{d}{n\epsilon^2} ZZ^\top\right) \quad (1)$$

where $n$ is the number of vectors and $d$ is the dimension of individual vectors. Equation 1 provides a tight bound even in cases where the underlying distribution $P(Z)$ is degenerate (Ma et al. 2007). In general scenarios, e.g., image representations for multi-label classification, the vector set $Z$ can arise from a mixture of class distributions. In such cases, the overall rate-distortion function can be computed by splitting the vectors into multiple subsets $Z = Z^1 \cup Z^2 \cup \ldots \cup Z^k$,
where $Z^j$ is the subset from the $j$-th distribution. We can then compute $R(Z^j, \epsilon)$ (Equation 1) for each subset. To facilitate this computation, we leverage a global membership matrix $\Pi = \{\Pi_j\}_{j=1}^k$, which is a set of $k$ matrices encoding membership information in each subset. The membership matrix for a subset $Z^j$ is a diagonal matrix defined as $\Pi_j = \mathrm{diag}(\pi_{1j}, \pi_{2j}, \ldots, \pi_{nj}) \in \mathbb{R}^{n \times n}$, where $\pi_{ij} \in [0, 1]$ is the probability that $z_i$ belongs to $Z^j$. The matrices satisfy the following constraints: $\sum_j \Pi_j = I_{n \times n}$, $\sum_j \pi_{ij} = 1$, and $\Pi_j \succeq 0$. The optimal number of bits to encode $Z$ is given as:

$$R_c(Z, \epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\mathrm{tr}(\Pi_j)}{2n} \log_2 \det\left(I + \frac{d}{\mathrm{tr}(\Pi_j)\epsilon^2} Z \Pi_j Z^\top\right)$$

The expected number of vectors in a subset $Z^j$ is $\mathrm{tr}(\Pi_j)$ and the corresponding covariance is $\mathrm{cov}(Z^j) = \frac{1}{\mathrm{tr}(\Pi_j)} Z \Pi_j Z^\top$. For multi-class data, where a vector $z_i$ can only be a member of a single class, we restrict $\pi_{ij} \in \{0, 1\}$, and the covariance matrix for the $j$-th subset is $Z^j (Z^j)^\top$.

Maximal Coding Rate (MCR²). Yu et al. (2020) introduced a classification framework that learns discriminative representations using the rate-distortion function. Given $n$ input samples $X = \{x_i\}_{i=1}^n$ belonging to $k$ distinct classes, their representations $Z = \{z_i\}_{i=1}^n$ are obtained using a deep network $f_\theta(x)$. The network parameters $\theta$ are learned by maximizing a representation-level objective based on rate-distortion, called the maximal coding rate reduction (MCR²) objective:

$$\max_\theta \; \Delta R(Z, \Pi) = R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi) \quad (2)$$

where $\Pi$ captures the class label information. To obtain discriminative representations, same-class representations should resemble each other while being different from representations of other classes. This can be achieved by maximizing the overall volume $R(Z, \epsilon)$ and compressing representations within each class by minimizing $R_c(Z, \epsilon \mid \Pi)$. We provide further details in Appendix B.
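To make the rate-distortion quantities above concrete, the following is a minimal PyTorch sketch (not the authors' released implementation) of the coding rate $R(Z, \epsilon)$, the class-compressed rate $R_c(Z, \epsilon \mid \Pi)$ with hard memberships, and the MCR²-style difference $\Delta R$; the tensor shapes and the toy usage at the end are our own assumptions.

```python
import torch

def coding_rate(Z, eps=0.5):
    # R(Z, eps): coding rate of the rows of Z up to distortion eps (cf. Equation 1).
    # Natural log is used here; divide by ln(2) to obtain bits.
    n, d = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps ** 2)) * Z.T @ Z)

def compressed_rate(Z, labels, eps=0.5):
    # R_c(Z, eps | Pi): per-class coding rates weighted by class size,
    # using hard (0/1) memberships as in the multi-class case.
    n, d = Z.shape
    rate = Z.new_zeros(())
    for c in labels.unique():
        Zc = Z[labels == c]
        nc = Zc.shape[0]
        rate = rate + (nc / (2.0 * n)) * torch.logdet(
            torch.eye(d) + (d / (nc * eps ** 2)) * Zc.T @ Zc
        )
    return rate

def delta_rate(Z, labels, eps=0.5):
    # MCR^2-style objective (cf. Equation 2): expand the whole set, compress each class.
    return coding_rate(Z, eps) - compressed_rate(Z, labels, eps)

if __name__ == "__main__":
    Z = torch.randn(128, 32)           # 128 representations of dimension 32
    y = torch.randint(0, 4, (128,))    # 4 target classes
    print(delta_rate(Z, y))            # an encoder would maximize this quantity
```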
## Fairness-aware Incremental Representation Learning (FaIRL)

### Debiasing Framework

We present a novel adversarial debiasing framework that controls the rate-distortion function of the learned representations. We use rate-distortion in this debiasing framework as it is amenable to incremental learning. Figure 2 illustrates our proposed adversarial framework. It consists of a feature encoder $\phi$ and a discriminator $D$. The feature encoder takes as input a data point $x$ and generates a representation $z = \phi(x)$. Its goal is to learn representations that are discriminative for the target attribute $y$ and not informative about the protected attribute $g$. The discriminator network takes as input the representation $z$ produced by the feature encoder and generates $z' = D(z)$. Its goal is to extract protected attribute $g$ information from $z'$. The discriminator is trained by maximizing the MCR² objective:

$$\max_D \; \Delta R(Z', \Pi^g) = R(Z', \epsilon) - R_c(Z', \epsilon \mid \Pi^g) \quad (3)$$

where $\Pi^g$ is the membership matrix encoding the protected attribute information. The encoder is trained by optimizing the following objective function:

$$\max_\phi \; \Delta R(Z, \Pi^y) - \beta \, \Delta R(Z', \Pi^g) \quad (4)$$

where $\Pi^y$ is the membership matrix encoding the target attribute information and $\beta$ is a hyperparameter. Empirically, we observed that the proposed debiasing framework is competitive with other debiasing setups in non-incremental learning settings.

Figure 2: Workflow of our debiasing framework. The discriminator tries to extract protected attribute information by optimizing $\Delta R(Z', \Pi^g)$. The feature encoder tries to learn discriminative representations for the target task ($y$) using the MCR² objective while minimizing the discriminator loss.

FaIRL's debiasing framework leverages the MCR² objective (Equation 2) for classification. The MCR² objective, by itself, is not amenable to incremental learning for reasons discussed below. Yu et al. (2020) showed that using MCR² it is possible to learn representations with low-dimensional orthogonal subspaces corresponding to each class. However, naively maximizing the MCR² objective results in representations spanning the complete feature space (a $d$-dimensional feature space $\mathbb{R}^d$ can accommodate a maximum of $d$ orthogonal subspaces). This is not ideal for incremental learning, as representations from new classes cannot be accommodated in the same feature space. For incremental learning, representations learned at a given training stage should be compact and should not span the entire feature space. In FaIRL, we empirically observe that the feature spaces learned at each training stage are compact. This happens because, while learning discriminative representations using the MCR² objective ($\Delta R(Z, \Pi^y)$), the encoder also tries to remove protected information by minimizing $\Delta R(Z', \Pi^g)$ (Equation 4). Minimizing $\Delta R(Z', \Pi^g)$ makes representations from different protected classes similar, resulting in a compact feature space. The $\Delta R(Z', \Pi^g)$ term acts as a natural regularizer to the MCR² objective and prevents the learned representations from expanding in an unconstrained manner, making them suitable for incremental learning. Next, we discuss how we extend this debiasing framework to incremental learning.

### Incremental Learning

For incremental learning, we use an exemplar-based approach (Rebuffi et al. 2017; Chaudhry et al. 2019a,b). We store a small set of exemplars from old tasks, $X_{old} = \{X^1_{old}, \ldots, X^m_{old}\}$, where $m$ is the number of target classes ($m = c(t-1) < k$) the system has encountered so far (each training stage introduces $c$ target classes, and $k$ is the total number of classes). At training stage $t$, we have a set of new data samples $X_{new}$ and the exemplar set $X_{old}$ ($X_{old} = \emptyset$ at $t = 0$). The goal of our system is to learn discriminative representations w.r.t. $y$ for $X_{new}$ while retaining the old representation subspaces of $X_{old}$. To ensure fairness, the system also needs to learn representations that are oblivious to the protected attribute $g$ for both $X_{new}$ and $X_{old}$. We will refer to the representations of the old and new data as $Z_{old} = \phi(X_{old})$ and $Z_{new} = \phi(X_{new})$ respectively.

Discriminator. In the incremental learning setup, the discriminator tries to extract protected attribute information for $X_{new}$. This is achieved by maximizing $\Delta R(Z'_{new}, \Pi^g_{new})$, where $Z'_{new} = D(\phi(X_{new}))$ and $\Pi^g_{new}$ encodes protected attribute $g$ information for $X_{new}$.

Feature encoder. The objective of the feature encoder is to learn fair representations that are discriminative for both old and new tasks. To achieve this, the system should have the following properties: (a) The system should learn representations for $X_{new}$ that are informative about $y$. This can be achieved by learning discriminative representations for $X_{new}$ by maximizing the MCR² objective $\Delta R(Z_{new}, \Pi^y_{new})$. (b) The system should not reveal protected information and should learn fair representations for $X_{new}$. This is achieved by minimizing the discriminator loss $\Delta R(Z'_{new}, \Pi^g_{new})$ (Equation 4). (c) The system should retain knowledge about old tasks encountered in previous training stages. FaIRL maintains an exemplar set $X_{old}$ and tries to retain the subspace structure learned for these samples.
To ensure that the encoder $\phi_t$ at training stage $t$ retains the subspace structure of the old representations, we minimize the function:

$$\Delta R(Z_{old}, \bar{Z}_{old}) = \sum_{i=1}^{m} \Delta R(Z^i_{old}, \bar{Z}^i_{old}) = \sum_{i=1}^{m} \left[ R(Z^i_{old} \cup \bar{Z}^i_{old}) - \frac{1}{2}\left( R(Z^i_{old}) + R(\bar{Z}^i_{old}) \right) \right] \quad (5)$$

where $\bar{Z}_{old} = \phi_{t-1}(X_{old})$ are the exemplar representations obtained using the encoder at the previous training stage $(t-1)$, and $Z^j_{old}$ are the exemplar representations from the $j$-th target class. $\Delta R(Z^j_{old}, \bar{Z}^j_{old})$ measures the similarity between the representation sets $Z^j_{old}$ and $\bar{Z}^j_{old}$ by computing the difference in the number of bits required to encode them jointly and separately. (d) The system should learn fair representations for $X_{old}$. This is achieved by minimizing the discriminator loss for the exemplars, $\Delta R(Z'_{old}, \Pi^g_{old})$ (Equation 4).

The overall objective function that the encoder optimizes in the incremental learning setup is:

$$\max_\phi \; \underbrace{\Delta R(Z_{new}, \Pi^y_{new})}_{(a)} \; - \; \beta \, \underbrace{\Delta R(Z'_{new}, \Pi^g_{new})}_{(b)} \; - \; \gamma \, \underbrace{\Delta R(Z_{old}, \bar{Z}_{old})}_{(c)} \; - \; \eta \, \underbrace{\Delta R(Z'_{old}, \Pi^g_{old})}_{(d)} \quad (6)$$

where $Z'_{old} = D(Z_{old})$, $\Pi^y_{new}$ is the membership matrix encoding target class labels for $X_{new}$, and $\Pi^g_{new}$ and $\Pi^g_{old}$ encode protected class labels for $X_{new}$ and $X_{old}$ respectively. In the following section, we discuss the selection of representative samples $X_{old}$ from old classes.

### Exemplar Sample Selection

As discussed in the previous section, we maintain exemplars $X_{old} = \{X^1_{old}, \ldots, X^m_{old}\}$ belonging to $m$ classes, which is useful for retaining information from previous tasks. For each class, we select $r$ (where $r \ll |X^i|$) samples $X^i_{old} \subset X^i$ using one of the following sampling techniques:

Random Sampling. We randomly select $r$ samples from each class set, $X^i_{old} \subset X^i$.

Prototype Sampling. We use prototype sampling (Tong et al. 2022) for selecting representative samples from each class. The detailed pseudo-code is presented in Algorithm 1 below. In this technique, we compute the top-$k$ eigenvectors of the set of representations for each class, $Z^i_t = \phi(X^i_t)$, at training stage $t$. For each eigenvector, we select the $r/k$ data samples ($X^t_{old}$) with the highest similarity scores (line 7). The selected samples are added to $X_{old}$.

Algorithm 1: Prototype Sampling
1: Input: $Z_t = \{\phi(X^1_t), \ldots, \phi(X^c_t)\}$, representations of the $c$ classes at training stage $t$; reservoir of old samples $X_{old}$
2: $X^t_{old} = \emptyset$ (exemplars for training stage $t$)
3: for $i = 1, \ldots, c$ do
4:   $V^i$ = PCA($Z^i_t$), where $Z^i_t = \phi(X^i_t)$
5:   $V^i_k = [v_1, \ldots, v_k]$ = top-$k$($V^i$) (top-$k$ eigenvectors selected based on singular values)
6:   for $j = 1, \ldots, k$ do
7:     $s = v_j^\top Z^i_t$ (similarity scores)
8:     $X^i_{old}$ = top-$q$($X^i_t$) (select the top $q = r/k$ samples based on similarity scores $s$)
9:     $X^t_{old} = X^t_{old} \cup X^i_{old}$
10:   end for
11: end for
12: $X_{old} = X_{old} \cup X^t_{old}$ (add to exemplar set)
13: return $X_{old}$
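As a rough illustration of Algorithm 1, below is a small NumPy sketch of prototype sampling for a single class. It is our own reconstruction from the pseudo-code (the function name and the deduplication across directions are our choices), not the released implementation.

```python
import numpy as np

def prototype_sampling(Z_class, r, k=4):
    # Select up to r exemplar indices for one class (cf. Algorithm 1):
    # take the top-k principal directions of the class representations and
    # keep the q = r / k samples scoring highest along each direction.
    Zc = Z_class - Z_class.mean(axis=0, keepdims=True)   # center before PCA
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)    # rows of Vt: principal directions
    q = max(1, r // k)
    chosen = []
    for v in Vt[:k]:                                      # top-k eigenvectors
        scores = Z_class @ v                              # similarity scores s = v^T z_i
        for idx in np.argsort(-scores)[:q]:               # q most similar samples
            if idx not in chosen:
                chosen.append(int(idx))
    return chosen[:r]

# toy usage: keep 20 exemplars for a class with 500 representations of dimension 64
Z_class = np.random.randn(500, 64)
exemplar_idx = prototype_sampling(Z_class, r=20, k=4)
```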
Submodular Optimization. We use submodular optimization (Krause and Golovin 2014) to select representative samples that summarize the features of a set. Submodular optimization focuses on set functions with the diminishing-returns property. Formally, a submodular function $f$ satisfies $f(Z \cup \{s\}) - f(Z) \geq f(Y \cup \{s\}) - f(Y)$, where $Z \subseteq Y \subseteq S$, $s \in S$, and $s \notin Y$. We construct a submodular function, computed over the representations $Z$, that captures their diversity, and select the $r$ samples that maximize $f$. Specifically, we use the facility location algorithm (Frieze 1974), which selects $r$ representative samples from a set $Z$ with $n$ elements ($n > r$). For any subset $S \subseteq Z$, the submodular function $f$ is $f(S) = \sum_{z \in Z} \max_{s \in S} \mathrm{sim}(s, z)$, where $\mathrm{sim}(\cdot, \cdot)$ is the similarity measure between $s$ and $z$. In our experiments, $Z$ is the set of data representations and we use the Euclidean distance as our similarity measure, $\mathrm{sim}(s, z) = \|s - z\|_2^2$.

Figure 3: Representative samples from the Biased MNIST dataset. We show an example from each class.

## Evaluation

In this section, we discuss the datasets, experimental setup, and metrics used for evaluating FaIRL. Additional details of our experimental setup can be found in Appendix B. Our implementation of FaIRL is publicly available at https://github.com/brcsomnath/FaIRL.

### Datasets

We tackle the problem of fairness in an incremental learning setup, where there are no existing benchmarks.¹ We perform evaluations by re-purposing existing datasets.

¹Most fairness datasets have target attributes with only 2 classes (along with a binary protected attribute), making them unsuitable for evaluating incremental learning.

Biased MNIST. We follow the setup of Bahng et al. (2020) to generate a synthetic dataset using MNIST (LeCun et al. 1998) by making the background colors highly correlated with the digits. In the training set, the digit category (target attribute) is associated with a distinct background color (protected attribute) with probability $p$, or with a randomly chosen color with probability $1-p$. In the test set, each digit is assigned one of the 10 colors at random. We evaluate the generalization ability of FaIRL for $p = \{0.8, 0.85, 0.9, 0.95\}$. We simulate incremental learning by giving the system access to 2 classes at each training stage (a total of 5 stages).

Biography classification. We re-purpose the BIOS dataset (De-Arteaga et al. 2019) for incremental learning. BIOS contains biographies of people, each associated with a profession (target attribute) and a gender label (protected attribute). There are 28 different profession categories and 2 gender classes. The demographic distribution can vary vastly depending on the profession (e.g., the software engineer role is skewed towards men while the yoga teacher role is skewed towards women). The detailed demographic distribution is reported in Appendix B. In our setup, the system is presented with samples from 5 classes at each training stage (a total of 6 training stages).

### Baselines

We compare FaIRL with the following systems:

Incremental learning systems. We report the performance of the following incremental systems: (a) LwF (Li and Hoiem 2017) is a dynamic architecture with shared and task-specific parameters, with additional parameters incorporated incrementally for new tasks. LwF uses a knowledge distillation loss along with the current task loss to prevent catastrophic forgetting; (b) Adversarial LwF: we introduce an adversarial head in LwF for fair incremental learning that tries to remove protected attribute information via gradient reversal; (c) iCaRL (Rebuffi et al. 2017) is an exemplar-based approach that uses a knowledge distillation loss to learn representations. iCaRL uses a nearest-class-mean classifier for prediction.

Joint learning systems. We report the performance of the following joint learning systems, where the system has access to the entire dataset in a single training stage: (a) AdS (Chowdhury et al. 2021) is an adversarial debiasing framework that maximizes the entropy of the discriminator output.
(b) FaRM (Chowdhury and Chaturvedi 2022) is a state-of-the-art system for both constrained and unconstrained debiasing, which performs debiasing by controlling the rate-distortion function of representations; (c) FaIRL (joint): we report the performance of our framework when trained on the full data.

### Metrics

In this section, we discuss the metrics reported. For each metric, we report the average across training stages and the value achieved at the final training stage.

Target Accuracy. We follow Elazar and Goldberg (2018), Ravfogel et al. (2020), and Chowdhury et al. (2021) in evaluating the quality of the learned representations for the target task ($y$) using a separate probing network. For Biased MNIST, a fair system would be able to generalize to the test set, so target accuracy also helps measure the fairness of the system. A high accuracy is desired in all settings. For both datasets, we also report the group fairness metrics discussed below.

Group Fairness Metrics. We evaluate the fairness of representations using the following metrics. A low score on these metrics indicates a fairer system. (a) TPR-GAP. TPR-GAP (De-Arteaga et al. 2019) computes the difference in true positive rates between two protected groups: $\mathrm{Gap}_{g,y} = \mathrm{TPR}_{g,y} - \mathrm{TPR}_{g',y}$, where $g, g'$ are possible values of the protected attribute. Romanov et al. (2019) proposed a single fairness score computed as the root mean square of $\mathrm{Gap}_{g,y}$: $\mathrm{Gap}^{RMS}_g = \sqrt{\frac{1}{|Y|}\sum_{y \in Y} (\mathrm{Gap}_{g,y})^2}$, where $Y$ is the target label set. (b) Demographic Parity (DP). DP measures the difference in target prediction rate w.r.t. the protected attribute $g$. Mathematically, it is expressed as:

$$\mathrm{DP} = \sum_{y \in Y} \left| p(\hat{y} = y \mid g) - p(\hat{y} = y \mid g') \right| \quad (7)$$

Zhao and Gordon (2019) illustrated that there is an inherent tradeoff between utility and fairness in fair representation learning when $y$ and $g$ are correlated. Accordingly, in our experiments, we observe that good fairness scores often come with poor target task performance and vice-versa.
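For concreteness, here is a small NumPy sketch of how the two group fairness metrics can be computed from predictions. It assumes a binary protected attribute encoded as 0/1 and illustrates the definitions above; it is not the evaluation code used in the paper.

```python
import numpy as np

def tpr_gap_rms(y_true, y_pred, g):
    # Gap^RMS_g: root mean square, over target classes, of the TPR difference
    # between the two protected groups.
    gaps = []
    for y in np.unique(y_true):
        tpr = []
        for group in (0, 1):
            mask = (y_true == y) & (g == group)
            tpr.append((y_pred[mask] == y).mean() if mask.any() else 0.0)
        gaps.append(tpr[0] - tpr[1])
    return float(np.sqrt(np.mean(np.square(gaps))))

def demographic_parity(y_pred, g, label_set):
    # DP (cf. Equation 7): sum over classes of the absolute difference in
    # prediction rates between the two protected groups.
    return float(sum(abs((y_pred[g == 0] == y).mean() - (y_pred[g == 1] == y).mean())
                     for y in label_set))

# toy usage
y_true = np.random.randint(0, 5, 1000)
y_pred = np.random.randint(0, 5, 1000)
g = np.random.randint(0, 2, 1000)
print(tpr_gap_rms(y_true, y_pred, g), demographic_parity(y_pred, g, range(5)))
```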
| Method | p=0.8 Last | p=0.8 Avg. | p=0.85 Last | p=0.85 Avg. | p=0.9 Last | p=0.9 Avg. | p=0.95 Last | p=0.95 Avg. |
|---|---|---|---|---|---|---|---|---|
| *Incremental systems* | | | | | | | | |
| LwF (Li and Hoiem 2017) | 10.3 | 32.4 | 10.3 | 31.5 | 10.6 | 31.3 | 10.3 | 28.6 |
| Adversarial LwF | 10.3 | 32.4 | 10.3 | 31.9 | 10.3 | 27.1 | 10.3 | 25.8 |
| iCaRL (Rebuffi et al. 2017) | 62.8 | 79.2 | 58.4 | 72.4 | 51.1 | 70.8 | 47.5 | 69.9 |
| FaIRL (w/ random) | 81.7 | 90.4 | 77.8 | 88.2 | 71.1 | 83.9 | 59.3 | 75.7 |
| FaIRL (w/ proto.) | 80.7 | 89.8 | 77.2 | 87.7 | 71.0 | 83.5 | 57.8 | 75.3 |
| FaIRL (w/ submod.) | 80.5 | 89.8 | 77.6 | 88.0 | 72.2 | 84.4 | 57.9 | 73.5 |
| *Joint systems* | | | | | | | | |
| FaIRL (joint) | 88.08 | - | 85.64 | - | 81.94 | - | 68.85 | - |
| AdS (Chowdhury et al. 2021) | 79.98 | - | 75.39 | - | 66.46 | - | 52.49 | - |
| FaRM (Chowdhury and Chaturvedi 2022) | 92.44 | - | 90.54 | - | 82.55 | - | 57.09 | - |

Table 1: Evaluation accuracy of incremental and joint learning systems on the Biased MNIST dataset. FaIRL achieves the best performance among the incremental learning baselines. In strongly correlated settings (p = 0.95), FaIRL is competitive with joint learning setups.

Figure 4: Test accuracy at different training stages of FaIRL and baseline incremental learning systems on the Biased MNIST dataset. We observe that FaIRL significantly outperforms baseline approaches in all setups.

## Results: Biased MNIST

In Table 1, we report the performance of FaIRL and baseline approaches on the Biased MNIST dataset. For this dataset, high target accuracy also implies fair decisions, as the training sets are biased. We observe that FaIRL outperforms the incremental learning baselines in all settings (different values of $p$). FaIRL with the prototype and submodular exemplar selection approaches falls slightly behind random sampling. We believe that, as the class samples are skewed towards a color, these sampling approaches may have ended up selecting instances based on their color instead of the digit information. It is also interesting to note that FaIRL (joint) is competitive with the other state-of-the-art approaches, AdS and FaRM, outperforming them when the color and digit information are strongly correlated ($p = 0.95$). In this setting ($p = 0.95$), FaIRL outperforms the joint learning baselines even when trained incrementally. This shows that FaIRL is able to learn robust representations in challenging scenarios where the bias is highly correlated with the target task. We report the fairness metrics in Appendix D for completeness.

In Figure 4, we report the performance of the incremental learning systems at various training stages. We observe that LwF suffers from catastrophic forgetting, achieving near-random performance in the final stages. Adversarial LwF achieves a similar performance to LwF. We believe that the adversarial head doesn't provide an added advantage over LwF because it may encounter unseen classes of the protected attribute (colors) at later training stages. iCaRL and FaIRL do not suffer from catastrophic forgetting, and in all settings FaIRL consistently outperforms the other baselines.

## Results: Biography Classification

We present the results of FaIRL on biography classification in Table 2. We observe that LwF-based systems achieve poor target performance due to catastrophic forgetting. However, as most of their predictions are incorrect, these systems end up with good scores on the fairness metrics. Adversarial LwF performs slightly better than LwF in terms of target accuracy. iCaRL achieves the best target accuracy but performs the worst on the fairness metrics. FaIRL provides a good balance between the two traits, achieving target accuracy close to iCaRL while significantly improving the fairness metrics. We observe that FaIRL (joint) is competitive with the state-of-the-art debiasing frameworks AdS and FaRM. It is interesting to note that incrementally trained FaIRL achieves better DP scores than the jointly trained debiasing frameworks.

We also report the target accuracy and the Gap$^{RMS}_g$ metric across training stages. In Figure 5(a), FaIRL outperforms most baselines, marginally falling short of iCaRL in the final training stages. However, the Gap$^{RMS}_g$ metric (Figure 5(b)) for FaIRL is much better than that of iCaRL.

| Method | Accuracy (↑) Last | Accuracy (↑) Avg. | DP (↓) Last | DP (↓) Avg. | Gap$^{RMS}_g$ (↓) Last | Gap$^{RMS}_g$ (↓) Avg. |
|---|---|---|---|---|---|---|
| *Incremental systems* | | | | | | |
| LwF | 17.9 | 52.1 | 0.25 | 0.30 | 0.05 | 0.02 |
| Adv. LwF | 21.1 | 54.2 | 0.31 | 0.36 | 0.19 | 0.05 |
| iCaRL | 97.7 | 99.1 | 0.45 | 0.37 | 0.10 | 0.05 |
| FaIRL (rand.) | 95.1 | 97.5 | 0.42 | 0.35 | 0.06 | 0.03 |
| FaIRL (w/ proto.) | 93.9 | 96.8 | 0.40 | 0.34 | 0.05 | 0.03 |
| FaIRL (w/ submod.) | 94.4 | 97.4 | 0.41 | 0.35 | 0.04 | 0.02 |
| *Joint systems* | | | | | | |
| FaIRL (joint) | 98.5 | - | 0.43 | - | 0.06 | - |
| AdS | 99.9 | - | 0.45 | - | 0.0 | - |
| FaRM | 99.9 | - | 0.42 | - | 0.0 | - |

Table 2: Target accuracy and fairness metrics achieved by FaIRL and other baseline approaches on the Biographies dataset. FaIRL achieves a good balance between target accuracy and fairness metrics.

Figure 5: Evolution of target accuracy and Gap$^{RMS}_g$ of the models at different training stages. We observe that FaIRL achieves a fine balance between accuracy and TPR-GAP.
LwF-based systems also achieve low scores, but this is because of underfitting, as evident from their low target accuracies.

## Analysis

In this section, we perform several analysis experiments to investigate the functioning of FaIRL.

Task ablations. We vary the number of classes that FaIRL is presented with at a given training stage and report the average accuracy and fairness scores on the Biographies dataset. In Table 3, we observe a significant drop in target performance when the number of classes (in a training stage) is reduced, accompanied by an improvement in DP, reflecting the tradeoff between fairness and utility noted by Zhao and Gordon (2019). The complete results for all sampling strategies are reported in Appendix C.

| # classes | Acc. (↑) | DP (↓) | Gap$^{RMS}_g$ (↓) |
|---|---|---|---|
| 2 | 89.47 | 0.26 | 0.046 |
| 5 | 97.49 | 0.35 | 0.032 |
| 10 | 98.57 | 0.39 | 0.031 |

Table 3: Performance of FaIRL with a varying number of classes per training stage. We observe improved performance when the class count per training stage is increased.

Visualization. We visualize the UMAP (McInnes et al. 2018) feature projections before and after the debiasing process on the Biographies dataset. The feature vectors are color-coded according to the protected attribute (gender). In Figure 6, we observe that before debiasing (left) it is easy to distinguish features by gender, while after debiasing (right) features from both genders occupy similar subspaces.

Figure 6: UMAP projections of representations from FaIRL before and after all the training stages.

We report additional analysis experiments investigating the memory usage, sample efficiency, robustness, and effect of exemplar size on FaIRL's performance in Appendix C.

## Conclusion

In this work, we tackle the problem of learning fair representations in an incremental learning setting. To achieve this, we proposed Fairness-aware Incremental Representation Learning (FaIRL), a representation learning system that can make fair decisions while learning new tasks by controlling the rate-distortion function of representations. Empirical evaluations show that FaIRL is able to make fair decisions, outperforming prior baselines even in scenarios where the target and protected attributes are strongly correlated. Through extensive analysis, we observe that the debiasing framework at the core of FaIRL keeps the feature space compact, which helps FaIRL learn new tasks in an incremental fashion. Our framework, FaIRL, can make fair decisions with incremental access to unseen tasks. Such systems will be crucial for achieving fairness in the wild, as learning systems are increasingly being deployed in critical applications. Future work can focus on developing incrementally trained fair decision-making systems with minimal reliance on protected attribute annotations.

## Acknowledgements

This research project was supported in part by Amazon Research Awards.

## References

Apfelbaum, E. P.; Pauker, K.; Sommers, S. R.; and Ambady, N. 2010. In blind pursuit of racial equality? Psychological Science, 21(11): 1587-1592.

Bahng, H.; Chun, S.; Yun, S.; Choo, J.; and Oh, S. J. 2020. Learning De-biased Representations with Biased Representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, 528-539. PMLR.
Barrett, M.; Kementchedjhieva, Y.; Elazar, Y.; Elliott, D.; and Søgaard, A. 2019. Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 6330-6335. Hong Kong, China: Association for Computational Linguistics.

Blodgett, S. L.; Green, L.; and O'Connor, B. 2016. Demographic Dialectal Variation in Social Media: A Case Study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1119-1130. Austin, Texas: Association for Computational Linguistics.

Bolukbasi, T.; Chang, K.; Zou, J. Y.; Saligrama, V.; and Kalai, A. T. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Lee, D. D.; Sugiyama, M.; von Luxburg, U.; Guyon, I.; and Garnett, R., eds., Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, 4349-4357.

Castro, F. M.; Marín-Jiménez, M. J.; Guil, N.; Schmid, C.; and Alahari, K. 2018. End-to-end incremental learning. In Proceedings of the European Conference on Computer Vision (ECCV), 233-248.

Chan, K. H. R.; Yu, Y.; You, C.; Qi, H.; Wright, J.; and Ma, Y. 2021. ReduNet: A white-box deep network from the principle of maximizing rate reduction. arXiv preprint, abs/2105.10446.

Chaudhry, A.; Ranzato, M.; Rohrbach, M.; and Elhoseiny, M. 2019a. Efficient Lifelong Learning with A-GEM. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Chaudhry, A.; Rohrbach, M.; Elhoseiny, M.; Ajanthan, T.; Dokania, P. K.; Torr, P. H.; and Ranzato, M. 2019b. On tiny episodic memories in continual learning. arXiv preprint, abs/1902.10486.

Cheng, P.; Hao, W.; Yuan, S.; Si, S.; and Carin, L. 2020. FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders. In International Conference on Learning Representations.

Chowdhury, S. B. R.; and Chaturvedi, S. 2022. Learning Fair Representations via Rate-Distortion Maximization. Transactions of the Association for Computational Linguistics, 10: 1159-1174.

Chowdhury, S. B. R.; Ghosh, S.; Li, Y.; Oliva, J.; Srivastava, S.; and Chaturvedi, S. 2021. Adversarial Scrubbing of Demographic Information for Text Classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 550-562. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics.

Cover, T. M. 1999. Elements of Information Theory. John Wiley & Sons.

Dastin, J. 2018. Amazon scraps secret AI recruiting tool that showed bias against women. In Ethics of Data and Analytics, 296-299. Auerbach Publications.

De-Arteaga, M.; Romanov, A.; Wallach, H.; Chayes, J.; Borgs, C.; Chouldechova, A.; Geyik, S.; Kenthapadi, K.; and Kalai, A. T. 2019. Bias in Bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 120-128.

Dixon, L.; Li, J.; Sorensen, J.; Thain, N.; and Vasserman, L. 2018. Measuring and mitigating unintended bias in text classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 67-73.

Elazar, Y.; and Goldberg, Y. 2018. Adversarial Removal of Demographic Attributes from Text Data. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 11-21. Brussels, Belgium: Association for Computational Linguistics.
Elazar, Y.; Ravfogel, S.; Jacovi, A.; and Goldberg, Y. 2021. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics, 9: 160-175.

Frieze, A. M. 1974. A cost function property for plant location problems. Mathematical Programming, 7(1): 245-248.

Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial networks. arXiv preprint, abs/1406.2661.

Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13): 3521-3526.

Krause, A.; and Golovin, D. 2014. Submodular function maximization. Tractability, 3: 71-104.

Larson, J.; Mattu, S.; Kirchner, L.; and Angwin, J. 2016. How we analyzed the COMPAS recidivism algorithm. ProPublica (5 2016), 9(1): 3.

LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324.

Li, X.; Zhou, Y.; Wu, T.; Socher, R.; and Xiong, C. 2019. Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, 3925-3934. PMLR.

Li, Y.; Baldwin, T.; and Cohn, T. 2018. Towards Robust and Privacy-preserving Text Representations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 25-30. Melbourne, Australia: Association for Computational Linguistics.

Li, Z.; and Hoiem, D. 2017. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12): 2935-2947.

Long, M.; Cao, Y.; Wang, J.; and Jordan, M. I. 2015. Learning Transferable Features with Deep Adaptation Networks. In Bach, F. R.; and Blei, D. M., eds., Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, 97-105. JMLR.org.

Ma, Y.; Derksen, H.; Hong, W.; and Wright, J. 2007. Segmentation of multivariate mixed data via lossy data coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9): 1546-1562.

McCloskey, M.; and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, 109-165. Elsevier.

McInnes, L.; Healy, J.; Saul, N.; and Großberger, L. 2018. UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29): 861.

Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; and Galstyan, A. 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6): 1-35.

Ravfogel, S.; Elazar, Y.; Gonen, H.; Twiton, M.; and Goldberg, Y. 2020. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7237-7256. Online: Association for Computational Linguistics.
Rebuffi, S.; Kolesnikov, A.; Sperl, G.; and Lampert, C. H. 2017. iCaRL: Incremental Classifier and Representation Learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 5533-5542. IEEE Computer Society.

Rezaei, A.; Liu, A.; Memarrast, O.; and Ziebart, B. D. 2021. Robust fairness under covariate shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 9419-9427.

Romanov, A.; De-Arteaga, M.; Wallach, H.; Chayes, J.; Borgs, C.; Chouldechova, A.; Geyik, S.; Kenthapadi, K.; Rumshisky, A.; and Kalai, A. 2019. What's in a Name? Reducing Bias in Bios without Access to Protected Attributes. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4187-4195. Minneapolis, Minnesota: Association for Computational Linguistics.

Rusu, A. A.; Rabinowitz, N. C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; and Hadsell, R. 2016. Progressive neural networks. arXiv preprint, abs/1606.04671.

Shah, D. S.; Schwartz, H. A.; and Hovy, D. 2020. Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5248-5264. Online: Association for Computational Linguistics.

Singh, H.; Singh, R.; Mhasawade, V.; and Chunara, R. 2021. Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 3-13.

Tong, S.; Dai, X.; Wu, Z.; Li, M.; Yi, B.; and Ma, Y. 2022. Incremental Learning of Structured Memory via Closed-Loop Transcription. arXiv preprint, abs/2202.05411.

Yu, Y.; Chan, K. H. R.; You, C.; Song, C.; and Ma, Y. 2020. Learning diverse and discriminative representations via the principle of maximal coding rate reduction. arXiv preprint, abs/2006.08558.

Zemel, R. S.; Wu, Y.; Swersky, K.; Pitassi, T.; and Dwork, C. 2013. Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, volume 28 of JMLR Workshop and Conference Proceedings, 325-333. JMLR.org.

Zenke, F.; Poole, B.; and Ganguli, S. 2017. Continual Learning Through Synaptic Intelligence. In Precup, D.; and Teh, Y. W., eds., Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, 3987-3995. PMLR.

Zhang, B. H.; Lemoine, B.; and Mitchell, M. 2018. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 335-340.

Zhang, W.; Bifet, A.; Zhang, X.; Weiss, J. C.; and Nejdl, W. 2021. FARF: A fair and adaptive random forests classifier. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 245-256. Springer.

Zhang, W.; and Ntoutsi, E. 2019. FAHT: An Adaptive Fairness-aware Decision Tree Classifier. In IJCAI.

Zhao, H.; and Gordon, G. J. 2019. Inherent Tradeoffs in Learning Fair Representations. In Wallach, H. M.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E. B.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 15649-15659.