# Ensuring Fairness Beyond the Training Data

Debmalya Mandal (dm3557@columbia.edu), Samuel Deng (sd3013@columbia.edu), Suman Jana (suman@cs.columbia.edu), Jeannette M. Wing (wing@columbia.edu), and Daniel Hsu (djhsu@cs.columbia.edu), Columbia University

We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss and, at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.

1 Introduction

Machine learning (ML) systems are often used for high-stakes decision-making, including bail decisions and credit approval. These applications often use algorithms trained on past biased data, and such bias is reflected in the eventual decisions made by the algorithms. For example, Bolukbasi et al. [9] show that popular word embeddings implicitly encode societal biases, such as gender norms. Similarly, Buolamwini and Gebru [10] find that several facial recognition systems perform better on lighter-skinned subjects than on darker-skinned subjects. To mitigate such biases, there have been several approaches in the ML fairness community to design fair classifiers [4, 19, 35].

However, the literature has largely ignored the robustness of such fair classifiers. The fairness of these classifiers is typically evaluated on the sampled datasets, and such evaluations are often unreliable for various reasons, including biased samples and missing and/or noisy attributes. Moreover, compared to the traditional machine learning setting, these problems are more prevalent in the fairness domain, as the data itself is biased to begin with. As an example, we consider how the optimized pre-processing algorithm [11] performs on ProPublica's COMPAS dataset [1] in terms of demographic parity (DP), which measures the difference in acceptance rates between the protected groups. Figure 1 shows two situations: (1) the unweighted training distribution (in blue), and (2) a weighted training distribution (in red). The optimized pre-processing algorithm [11] yields a classifier that is almost fair on the unweighted training set (DP of about 0.02). However, it has a DP of at least 0.2 on the weighted set, despite the fact that the marginal distributions of the features look almost the same for the two scenarios.

Figure 1: Unweighted vs. reweighted COMPAS dataset.
The marginals of the two distributions are almost the same, but standard fair classifiers show a demographic parity of at least 0.2 on the reweighted dataset.

This example motivates us to design a fair classifier that is robust to such perturbations. We also show how to construct such weighted examples using a few linear programs.

Contributions: In this work, we initiate the study of fair classifiers that are robust to perturbations in the training distribution. The set of perturbed distributions is given by arbitrary weighted combinations of the training dataset, denoted W. Our main contributions are the following:

We develop classifiers that are fair not only with respect to the training distribution, but also for the class of distributions characterized by W. We formulate a min-max objective whose goal is to minimize a distributionally robust training loss and, simultaneously, find a classifier that is fair with respect to the entire class. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution.

Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.

Related Work: Numerous proposals have been laid out to capture bias and discrimination in settings where decisions are delegated to algorithms. Such formalizations of fairness can be statistical [14, 19-21, 28], individual [13, 31], causal [23, 25, 36], and even procedural [18]. We restrict attention to statistical fairness, which fixes a small number of groups in the population and then compares some statistic (e.g., accuracy, false positive rate) across these groups. We mainly consider the notions of demographic parity [14, 20, 21] and equalized odds [19] in this paper, but our method of designing robust and fair classifiers can be adapted to any type of statistical fairness.

On the other hand, there are three main approaches for designing a fair classifier. The pre-processing approach tries to transform the training data and then leverage standard classifiers [11, 14, 20, 35]. The in-processing approach directly modifies the learning algorithm to meet the fairness criteria [4, 15, 22, 34]. The post-processing approach modifies the decisions of a classifier [19, 28] to make it fair. Ours is an in-processing approach and is mostly related to [4, 5, 22]. Agarwal et al. [4] and Alabi et al. [5] show how the binary classification problem with group fairness constraints can be reduced to a sequence of cost-sensitive classification problems. Kearns et al. [22] follow a similar approach, but instead consider a combinatorial class of subgroup fairness constraints. Recently, [7] integrated and implemented a range of such fair classifiers in a GitHub project, which we leverage in our work.

In terms of technique, our paper falls in the category of distributionally robust optimization (DRO), where the goal is to minimize the worst-case training loss over all distributions that are close to the training distribution under some metric. Various types of metrics have been considered, including bounded f-divergence [8, 26] and Wasserstein distance [2, 17].
To the best of our knowledge, prior literature has largely ignored enforcing constraints such as fairness in a distributionally robust sense. Further afield, our work has similarity with recent work in fairness testing inspired by the literature on program verification [6, 16, 32]. These papers attempt to automatically discover discrimination in decision-making programs, whereas we develop tools based on linear programs to discover distributions that expose potential unfairness.

2 Problem and Definitions

We write $((x, a), y)$ to denote a training instance, where $a \in A$ denotes the protected attributes, $x \in X$ denotes all the remaining attributes, and $y \in \{0, 1\}$ denotes the outcome label. For a hypothesis $h$, $h(x, a) \in \{0, 1\}$ denotes the outcome it predicts on an input $(x, a)$. We assume that the set of hypotheses is given by a class $H$. Given a loss function $\ell : \{0,1\} \times \{0,1\} \to \mathbb{R}$, the goal of a standard fair classifier is to find a hypothesis $h \in H$ that minimizes the training loss $\sum_{i=1}^n \ell(h(x_i, a_i), y_i)$ and is also fair according to some notion of fairness.

We aim to design classifiers that are fair with respect to a class of distributions that are weighted perturbations of the training distribution. Let $W = \{w \in \mathbb{R}^n_+ : \sum_i w_i = 1\}$ be the set of all possible weights. For a hypothesis $h$ and weight $w$, we define the weighted empirical risk $\ell(h, w) = \sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i)$. We write $\delta^w_F(h)$ for the unfairness gap with respect to the weighted empirical distribution defined by the weight $w$ and fairness constraint $F$ (e.g., demographic parity (DP) or equalized odds (EO)). For example, $\delta^w_{DP}(h)$ is defined as

$$\delta^w_{DP}(h) = \max_{a, a' \in A} \left| \frac{\sum_{i : a_i = a} w_i\, h(x_i, a)}{\sum_{i : a_i = a} w_i} - \frac{\sum_{i : a_i = a'} w_i\, h(x_i, a')}{\sum_{i : a_i = a'} w_i} \right| \qquad (1)$$

Therefore, $\delta^w_{DP}(h)$ measures the maximum weighted difference in acceptance rates between two groups with respect to the distribution that assigns weight $w$ to the training examples. On the other hand, $\delta^w_{EO}(h) = \tfrac{1}{2}\big(\delta^w_{EO}(h \mid 0) + \delta^w_{EO}(h \mid 1)\big)$, where $\delta^w_{EO}(h \mid y)$ is defined as

$$\delta^w_{EO}(h \mid y) = \max_{a, a' \in A} \left| \frac{\sum_{i : a_i = a,\, y_i = y} w_i\, h(x_i, a)}{\sum_{i : a_i = a,\, y_i = y} w_i} - \frac{\sum_{i : a_i = a',\, y_i = y} w_i\, h(x_i, a')}{\sum_{i : a_i = a',\, y_i = y} w_i} \right|$$

Therefore, $\delta^w_{EO}(h \mid 0)$ (resp., $\delta^w_{EO}(h \mid 1)$) measures the weighted difference in false (resp., true) positive rates between two groups with respect to the weight $w$. (We consider the average of the false positive rate and true positive rate for simplicity; our method can also handle more general definitions of EO [28].) We will develop our theory using DP as an example of a notion of fairness, but our experimental results concern both DP and EO.

We are now ready to formally define our main objective. For a class of hypotheses $H$, let $H_W = \{h \in H : \delta^w_F(h) \le \epsilon \ \ \forall w \in W\}$ be the set of hypotheses that are $\epsilon$-fair with respect to all the weights in the set $W$. Our goal is to solve the following min-max problem:

$$\min_{h \in H_W} \max_{w \in W} \ell(h, w) \qquad (2)$$

Therefore, we aim to minimize a robust loss with respect to a class of distributions indexed by $W$. Additionally, we also aim to find a classifier that is fair with respect to such perturbations. We allow our algorithm to output a randomized classifier, i.e., a distribution over the hypotheses in $H$. This is necessary if the space $H$ is non-convex or if the fairness constraints are such that the set of feasible hypotheses $H_W$ is non-convex. For a randomized classifier $\mu$, its weighted empirical risk is $\ell(\mu, w) = \sum_h \mu(h)\, \ell(h, w)$, and its expected unfairness gap is $\delta^w_F(\mu) = \sum_h \mu(h)\, \delta^w_F(h)$.
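To make the weighted fairness-gap definitions above concrete, here is a minimal NumPy sketch of $\delta^w_{DP}(h)$ from Eq. (1); the function name and interface are ours (not from the paper's code release), and it assumes hard or randomized predictions encoded as values in [0, 1].

```python
import numpy as np

def weighted_dp_gap(preds, groups, w):
    """Weighted demographic-parity gap delta^w_DP(h) from Eq. (1).

    preds  : array of predictions h(x_i, a_i) in {0, 1} (or [0, 1] for randomized h)
    groups : array of protected attributes a_i
    w      : nonnegative weights over the n training examples, summing to 1
    """
    rates = []
    for a in np.unique(groups):
        mask = groups == a
        denom = w[mask].sum()
        if denom > 0:  # skip groups that receive zero total weight
            rates.append((w[mask] * preds[mask]).sum() / denom)
    # maximum pairwise difference of weighted acceptance rates
    return max(rates) - min(rates)

# With uniform weights this recovers the usual (unweighted) DP gap.
preds = np.array([1, 0, 1, 1, 0, 0])
groups = np.array([0, 0, 0, 1, 1, 1])
w = np.full(6, 1 / 6)
print(weighted_dp_gap(preds, groups, w))  # |2/3 - 1/3| = 1/3
```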
Our algorithm proceeds in a top-down fashion. First, we design a meta-algorithm that reduces the min-max problem of Equation (2) to a loss-minimization problem with respect to a sequence of weight vectors. Then we show how to design a fair classifier that performs well with respect to a fixed weight vector $w \in W$ in terms of accuracy, but is fair with respect to the entire set of weights $W$.

3.1 Meta Algorithm

Algorithm 1 provides a meta-algorithm for solving the min-max optimization problem defined in Equation (2). The algorithm is based on ideas presented in [12], which, given an α-approximate Bayesian oracle for distributions over loss functions, provides an α-approximate robust solution. The algorithm can be viewed as a two-player zero-sum game between a learner, who picks the hypothesis $h_t$, and an adversary, who picks the weight vector $w_t$. At every step, the adversary performs a projected gradient ascent step to compute its best response. The learner, on the other hand, solves a fair classification problem to pick a hypothesis that is fair with respect to the set of weights $W$ and minimizes the weighted empirical risk with respect to the weight $w_t$. However, it is infeasible to compute an exact optimum of the problem $\min_{h \in H_W} \sum_{i=1}^n w_t(i)\, \ell(h(x_i, a_i), y_i)$, so the learner uses an approximate fair classifier ApxFair(·), which we define next.

ALGORITHM 1: Meta-Algorithm
Input: training set {(x_i, a_i, y_i)}_{i=1}^n, set of weights W, hypothesis class H, parameters T_m and η.
Set η = √(2/T_m) and w_0(i) = 1/n for all i ∈ [n].
h_0 = ApxFair(w_0)  /* approximate solution of min_{h ∈ H_W} Σ_{i=1}^n ℓ(h(x_i, a_i), y_i) */
for each t ∈ [T_m] do
    w_t = w_{t-1} + η ∇_w ℓ(h_{t-1}, w_{t-1})
    w_t = Π_W(w_t)  /* project w_t onto the set of weights W */
    h_t = ApxFair(w_t)  /* approximate solution of min_{h ∈ H_W} Σ_{i=1}^n w_t(i) ℓ(h(x_i, a_i), y_i) */
end
Output: h^f, the uniform distribution over {h_0, h_1, ..., h_{T_m}}.

Definition 1. ApxFair(·) is an α-approximate fair classifier if, for any weight $w \in \mathbb{R}^n_+$, ApxFair(w) returns a hypothesis $\hat{h}$ such that

$$\sum_{i=1}^n w_i\, \ell(\hat{h}(x_i, a_i), y_i) \le \min_{h \in H_W} \sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i) + \alpha.$$

Using the α-approximate fair classifier, we have the following guarantee on the output of Algorithm 1.

Theorem 1. Suppose the loss function ℓ(·, ·) is convex in its first argument and ApxFair(·) is an α-approximate fair classifier. Then the hypothesis $h^f$ output by Algorithm 1 satisfies

$$\max_{w \in W} \mathbb{E}_{h \sim h^f}\left[\sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i)\right] \le \min_{h \in H_W} \max_{w \in W} \ell(h, w) + \alpha + O\!\left(\sqrt{1/T_m}\right).$$

The proof uses ideas from [12], except that we use an additive approximate best response. (All omitted proofs are provided in the supplementary material.)
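The following is a minimal sketch of Algorithm 1's outer loop. Since W in Section 2 is the probability simplex, Π_W is implemented here as the standard Euclidean projection onto the simplex; the callables `per_example_loss` and `apx_fair`, and all names, are our own stand-ins rather than the authors' implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def meta_algorithm(per_example_loss, apx_fair, n, T_m, eta):
    """Sketch of Algorithm 1's outer loop.

    per_example_loss(h) must return the length-n vector (l(h(x_i, a_i), y_i))_i,
    which is exactly the gradient of the weighted risk l(h, w) with respect to w.
    apx_fair(w) plays the role of the ApxFair oracle.
    """
    w = np.full(n, 1.0 / n)                        # w_0(i) = 1/n
    hypotheses = [apx_fair(w)]                     # h_0
    for _ in range(T_m):
        grad = per_example_loss(hypotheses[-1])    # adversary's gradient ascent step ...
        w = project_simplex(w + eta * grad)        # ... followed by projection onto W
        hypotheses.append(apx_fair(w))             # learner's approximately fair response
    return hypotheses                              # h^f = uniform mixture over these
```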
3.2 Approximate Fair Classifier

We now develop an α-approximate fair and robust classifier. For the remainder of this subsection, assume that the meta-algorithm (Algorithm 1) has called ApxFair(·) with a weight vector $w^0$; our goal is to design a classifier that minimizes the weighted empirical risk with respect to the weight $w^0$ but is fair with respect to the set of all weights $W$, i.e., to find $f \in \arg\min_{h \in H_W} \ell(h, w^0)$. Our method applies the following three steps.

1. Discretize the set of weights W, so that it suffices to design an approximate fair classifier with respect to the set of discretized weights. In particular, if we discretize each weight up to a multiplicative error ε, then an α-approximate fair classifier with respect to the discretized weights yields an O(α + ε)-fair classifier with respect to the set W.

2. Introduce a Lagrange multiplier for each fairness constraint, i.e., for each of the discretized weights and each pair of protected attributes. This lets us set up a two-player zero-sum game for the problem of designing an approximate fair classifier with respect to the set of discretized weights. Here, the learner chooses a hypothesis, whereas an adversary picks the most unfair weight in the set of discretized weights.

3. Design a learning algorithm for the learner, and an approximate best-response oracle for the adversary against the learner's chosen hypothesis. This yields an iterative algorithm where, at every step, the learner chooses a hypothesis and the adversary adjusts the Lagrange multipliers corresponding to the most violated fairness constraints.

We point out that Agarwal et al. [4] were the first to show that the design of a fair classifier can be formulated as a two-player zero-sum game (step 2). However, they only considered group-fairness constraints with respect to the training distribution; the algorithm of Alabi et al. [5] has a similar limitation. In contrast, we consider the design of a robust and fair classifier, which requires the additional discretization step (step 1). Finally, the design of our learning algorithm and best-response oracle is significantly different from [4, 5, 22].

3.2.1 Discretization of the Weights

We first discretize the set of weights W as follows. Divide the interval [0, 1] into buckets $B_0 = [0, \delta)$ and $B_{j+1} = [(1+\gamma_1)^j \delta, (1+\gamma_1)^{j+1}\delta)$ for $j = 0, 1, \ldots, M-1$, where $M = \lceil \log_{1+\gamma_1}(1/\delta) \rceil$. For any weight $w \in W$, construct a new weight $w' = (w'_1, \ldots, w'_n)$ by setting $w'_i$ to be the upper endpoint of the bucket containing $w_i$, for each $i \in [n]$. We substitute $\delta = \frac{\gamma_1}{2n}$ and write $N(\gamma_1, W)$ for the set containing all the discretized weights of the set W. The next lemma shows that a fair classifier for the set of weights $N(\gamma_1, W)$ is also a fair classifier for the set of weights W, up to an error of $4\gamma_1$.

Lemma 1. If $\delta^{w'}_{DP}(f) \le \epsilon$ for all $w' \in N(\gamma_1, W)$, then $\delta^w_{DP}(f) \le \epsilon + 4\gamma_1$ for any $w \in W$.

Therefore, in order to ensure that $\delta^w_{DP}(f) \le \varepsilon$, we discretize the set of weights W and enforce $(\varepsilon - 4\gamma_1)$-fairness for all the weights in the set $N(\gamma_1, W)$. This makes our task easier, as we only need to guarantee fairness with respect to the finite set of weights $N(\gamma_1, W)$, instead of the large, continuous set of weights W. However, note that the number of weights in $N(\gamma_1, W)$ can be $O\big((\log_{1+\gamma_1}(2n/\gamma_1))^n\big)$, which is exponential in n. We next see how to avoid this problem.
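A small sketch, under our own naming, of the bucketing rule from Section 3.2.1: each weight is rounded up to the upper endpoint of its geometric bucket, with δ = γ₁/(2n).

```python
import numpy as np

def discretize_weights(w, gamma1):
    """Round each weight up to the upper endpoint of its geometric bucket.

    Buckets: B_0 = [0, delta) and B_{j+1} = [(1+gamma1)^j * delta, (1+gamma1)^{j+1} * delta),
    where delta = gamma1 / (2n), as in Section 3.2.1.
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    delta = gamma1 / (2 * n)
    out = np.full(n, delta)                        # everything in B_0 maps to delta
    big = w >= delta
    j = np.floor(np.log(w[big] / delta) / np.log(1 + gamma1))
    out[big] = delta * (1 + gamma1) ** (j + 1)     # upper endpoint of bucket B_{j+1}
    return out
```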
3.3 Setting up a Two-Player Zero-Sum Game

We formulate the problem of designing a fair and robust classifier with respect to the set of weights in $N(\gamma_1, W)$ as a two-player zero-sum game. Let

$$R(w, a, f) = \frac{\sum_{i : a_i = a} w_i\, f(x_i, a)}{\sum_{i : a_i = a} w_i}.$$

Then $\delta^w_{DP}(f) = \sup_{a, a'} |R(w, a, f) - R(w, a', f)|$. Our aim is to solve the following problem:

$$\min_{h \in H} \ \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) \qquad (3)$$
$$\text{s.t.} \quad R(w, a, h) - R(w, a', h) \le \varepsilon - 4\gamma_1 \quad \forall w \in N(\gamma_1, W),\ \forall a, a' \in A.$$

We form the following Lagrangian:

$$\min_{h \in H} \ \max_{\|\lambda\|_1 \le B} \ \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w \big(R(w, a, h) - R(w, a', h) - \varepsilon + 4\gamma_1\big). \qquad (4)$$

Notice that we restrict the ℓ1-norm of the Lagrange multipliers by the parameter B; we will see later how to choose this parameter. We first view the optimization problem defined in Equation (4) as a two-player zero-sum game. Here the learner's pure strategy is to play a hypothesis $h \in H$. Given the learner's hypothesis $h$, the adversary picks the constraint (weight $w$ and groups $a, a'$) that violates fairness the most and sets the corresponding coordinate of λ to B. Therefore, for a fixed hypothesis h, it is sufficient for the adversary to play a vector λ in which either all coordinates are zero or exactly one is set to B. For such a pair (h, λ) of hypothesis and Lagrange multipliers, we define the payoff as

$$U(h, \lambda) = \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w \big(R(w, a, h) - R(w, a', h) - \varepsilon + 4\gamma_1\big).$$

Our goal is now to compute a ν-approximate minimax equilibrium of this game. In the next subsection, we design an algorithm based on online learning. The algorithm uses best responses of the h- and λ-players, which we discuss next.

Best response of the h-player: For each $i \in [n]$, we introduce the notation

$$\phi_i = \sum_{w,\, a'} \big(\lambda^{a_i, a'}_w - \lambda^{a', a_i}_w\big)\, \frac{w_i}{\sum_{j : a_j = a_i} w_j}.$$

With this notation, the payoff becomes

$$\sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{i=1}^n \phi_i\, h(x_i, a_i) - (\varepsilon - 4\gamma_1) \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w.$$

Let us introduce the following costs:

$$c^0_i = \begin{cases} \ell(0, 1)\, w^0_i & \text{if } y_i = 1 \\ \ell(0, 0)\, w^0_i & \text{if } y_i = 0 \end{cases} \qquad\quad c^1_i = \begin{cases} \ell(1, 1)\, w^0_i + \phi_i & \text{if } y_i = 1 \\ \ell(1, 0)\, w^0_i + \phi_i & \text{if } y_i = 0 \end{cases} \qquad (5)$$

Then the h-player's best response is the following cost-sensitive classification problem:

$$\hat{h} \in \arg\min_{h \in H} \ \sum_{i=1}^n c^1_i\, h(x_i, a_i) + c^0_i\, \big(1 - h(x_i, a_i)\big). \qquad (6)$$

Therefore, as long as we have access to an oracle for the cost-sensitive classification problem, the h-player can compute its best response. Note that the notion of a cost-sensitive classification oracle was also used by [4, 22]. In general, solving this problem is NP-hard, but there are several efficient heuristics that perform well in practice. We provide further details about how we implement this oracle in the section devoted to the experiments.

Best response of the λ-player: Since the fairness constraints depend on the weights non-linearly (e.g., see Eq. (1)), finding the most violated constraint is a non-linear optimization problem. However, if we could correctly guess the marginal probabilities of the protected groups, then the most violating weight vector could be found by a linear program. Since we cannot guess this value exactly, we instead discretize the set of marginal probabilities, iterate over them, and choose the option with the largest violation in fairness. This intuition is formalized as follows. We discretize the set of all marginals over |A| groups by the following rule. First discretize [0, 1] as $\{0, \delta, (1+\gamma_2)^j \delta\}$ for $j = 1, 2, \ldots, M$ with $M = O(\log_{1+\gamma_2}(1/\delta))$. This discretizes $[0, 1]^{|A|}$ into $M^{|A|}$ points; we then retain the discretized marginals whose total sum is at most $1 + \gamma_2$ and discard all other points. Let us denote the set of such marginals by $\Pi(\gamma_2, A)$.

Algorithm 2 goes through all the marginals π in Π(γ₂, A) and, for each such tuple and each pair of groups a, a', finds the weight w that maximizes R(w, a, h) − R(w, a', h). Note that this can be solved using a linear program, as the total weight assigned to each group is fixed by the marginal tuple π. Out of all the solutions, the algorithm picks the one with the maximum value. It then checks whether this maximum violates the constraint (i.e., is greater than ε). If so, it sets the corresponding coordinate of λ to B and all others to 0; otherwise, it returns the zero vector. As the weight returned by the LP need not correspond to a weight in N(γ₁, W), it rounds the weight to the nearest weight in N(γ₁, W). For discretizing the marginals we set $\delta = (1+\gamma_2)\frac{\gamma_1}{n}$, which implies that the number of LPs run by Algorithm 2 is at most $O\big(\log^{|A|}_{1+\gamma_2}\big(\tfrac{n}{(1+\gamma_2)\gamma_1}\big)\big) = O(\mathrm{poly}(\log n))$, as the number of groups is fixed.

ALGORITHM 2: Best Response of the λ-player
Input: training set {(x_i, a_i, y_i)}_{i=1}^n and hypothesis h ∈ H.
for each π ∈ Π(γ₂, A) and each pair a, a' ∈ A do
    Solve the following LP:
        w(a, a', π) = arg max_w  (1/π_a) Σ_{i : a_i = a} w_i h(x_i, a) − (1/π_{a'}) Σ_{i : a_i = a'} w_i h(x_i, a')
        s.t.  Σ_{i : a_i = a} w_i = π_a,   Σ_{i : a_i = a'} w_i = π_{a'},   w_i ≥ 0 for all i ∈ [n]
    Set val(a, a', π) = (1/π_a) Σ_{i : a_i = a} w(a, a', π)_i h(x_i, a) − (1/π_{a'}) Σ_{i : a_i = a'} w(a, a', π)_i h(x_i, a')
end
Set (a*_1, a*_2, π*) = arg max_{a, a', π} val(a, a', π)
if val(a*_1, a*_2, π*) > ε then
    Let w = w(a*_1, a*_2, π*). For each i ∈ [n], let w'_i be the upper endpoint of the bucket containing w_i.
    return λ with λ^{a*_1, a*_2}_{w'} = B and all other coordinates 0.
else
    return λ = 0.
end
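Here is a sketch of one inner LP of Algorithm 2 for a fixed guessed marginal pair, written with scipy.optimize.linprog. The outer loops over Π(γ₂, A) and group pairs, and the final rounding into N(γ₁, W), are omitted; function and variable names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def most_violating_weights(h_vals, groups, a, a_prime, pi_a, pi_a_prime):
    """One inner LP of Algorithm 2, for a fixed guessed marginal pair (pi_a, pi_a').

    Maximizes R(w, a, h) - R(w, a', h), i.e.
        (1/pi_a) * sum_{i: a_i = a} w_i h_i  -  (1/pi_a') * sum_{i: a_i = a'} w_i h_i,
    subject to the two group-marginal constraints and w >= 0.  linprog minimizes,
    so the objective is negated.
    """
    n = len(h_vals)
    in_a = (groups == a)
    in_ap = (groups == a_prime)
    c = np.zeros(n)
    c[in_a] = -h_vals[in_a] / pi_a
    c[in_ap] = h_vals[in_ap] / pi_a_prime
    A_eq = np.vstack([in_a, in_ap]).astype(float)      # group-marginal constraints
    b_eq = np.array([pi_a, pi_a_prime])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
    return res.x, -res.fun                             # candidate weights, violation value
```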
Lemma 2. Algorithm 2 is a $B(4\gamma_1 + \gamma_2)$-approximate best response for the λ-player, i.e., for any $h \in H$, it returns $\lambda'$ such that $U(h, \lambda') \ge \max_{\lambda} U(h, \lambda) - B(4\gamma_1 + \gamma_2)$.

Learning Algorithm: We now introduce our algorithm for the problem defined in Equation (4). In this algorithm, the λ-player uses Algorithm 2 to compute an approximate best response, whereas the h-player uses the Regularized Follow the Leader (RFTL) algorithm [3, 30] as its learning algorithm. RFTL is a classical algorithm for online convex optimization (OCO). In OCO, the decision maker takes a decision $x_t \in K$ at round t, an adversary reveals a convex loss function $f_t : K \to \mathbb{R}$, and the decision maker suffers a loss of $f_t(x_t)$. The goal is to minimize regret, defined as $\max_{u \in K} \big\{\sum_{t=1}^T f_t(x_t) - f_t(u)\big\}$, i.e., the difference between the loss suffered by the learner and that of the best fixed decision. RFTL requires a strongly convex regularization function $R : K \to \mathbb{R}_{\ge 0}$ and chooses $x_t$ according to the rule

$$x_t = \arg\min_{x \in K} \ \eta \sum_{s=1}^{t-1} \nabla f_s(x_s)^\top x + R(x).$$

We use RFTL in our learning algorithm as follows. We set the regularization function $R(x) = \tfrac{1}{2}\|x\|_2^2$ and the loss function $f_t(h_t) = U(h_t, \lambda_t)$, where $\lambda_t$ is the approximate best response to $h_t$. Therefore, at iteration t the learner needs to solve the following optimization problem:

$$h_t = \arg\min_{h \in H} \ \eta \sum_{s=1}^{t-1} \nabla_{h_s} U(h_s, \lambda_s)^\top h + \frac{1}{2}\|h\|_2^2. \qquad (7)$$

Here, with a slight abuse of notation, we take H to include the set of randomized classifiers, so that $h(x_i, a_i)$ is interpreted as the probability that hypothesis h outputs 1 on input $(x_i, a_i)$. We now show that the optimization problem in Eq. (7) can be solved as a cost-sensitive classification problem. For a given $\lambda_s$, the best response of the learner is

$$\hat{h} \in \arg\min_{h \in H} \ \sum_{i=1}^n c^1_i(\lambda_s)\, h(x_i, a_i) + c^0_i(\lambda_s)\, \big(1 - h(x_i, a_i)\big).$$

Writing $L_i(\lambda_s) = c^1_i(\lambda_s) - c^0_i(\lambda_s)$, the objective becomes $\sum_{i=1}^n L_i(\lambda_s)\, h(x_i, a_i)$ up to an additive constant. Hence $U(h_s, \lambda_s)$ is linear in $h_s$, and $\nabla_{h_s} U(h_s, \lambda_s)$ equals the vector $(L_i(\lambda_s))_{i=1}^n$. With this observation, the objective in Equation (7) becomes

$$\eta \sum_{s=1}^{t-1} \sum_{i=1}^n L_i(\lambda_s)\, h(x_i, a_i) + \frac{1}{2} \sum_{i=1}^n \big(h(x_i, a_i)\big)^2 \ \le\ \eta \sum_{i=1}^n L_i(\tilde{\lambda}_{t-1})\, h(x_i, a_i) + \frac{1}{2} \sum_{i=1}^n h(x_i, a_i) \ =\ \sum_{i=1}^n \Big(\eta\, L_i(\tilde{\lambda}_{t-1}) + \tfrac{1}{2}\Big)\, h(x_i, a_i),$$

where $\tilde{\lambda}_{t-1} = \sum_{s=1}^{t-1} \lambda_s$. The inequality follows from two observations: $L(\lambda)$ is linear in λ, and, since the predictions $h(x_i, a_i) \in [0, 1]$, we may replace the quadratic term by a linear term, which is an upper bound. (Without this relaxation we would have to solve a regularized version of the cost-sensitive classification problem.)

Finally, we observe that even though the number of weights in N(γ₁, W) is exponential in n, Algorithm 3 can be implemented efficiently. This is because the best response of the λ-player always returns a vector in which all entries are zero or exactly one is set to B. Therefore, instead of recording the entire λ vector, the algorithm can record only the non-zero coordinates, and there will be at most T of them.
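The learner update $h_{t+1} = \arg\min_{h} \sum_i (\eta L_i(\tilde{\lambda}_t) + \tfrac{1}{2})\, h(x_i, a_i)$ derived above is a cost-sensitive classification problem. One common heuristic, consistent with the paper's use of scikit-learn logistic regression though not necessarily identical to the authors' implementation, reduces it to weighted binary classification, as in this sketch (names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learner_step(X, L):
    """One learner update of Algorithm 3: approximately solve min_h sum_i L[i] * h(x_i, a_i),
    where L[i] stands for eta * L_i(lambda_tilde_t) + 1/2.

    The pointwise minimizer predicts 1 exactly when L[i] < 0, so a standard heuristic
    fits a weighted classifier with targets 1{L[i] < 0} and sample weights |L[i]|.
    Returns a prediction function.
    """
    y_target = (L < 0).astype(int)
    if y_target.min() == y_target.max():
        # all costs point the same way: the constant classifier is optimal
        const = int(y_target[0])
        return lambda X_new: np.full(len(X_new), const)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y_target, sample_weight=np.abs(L) + 1e-12)
    return clf.predict
```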
ALGORITHM 3: Approximate Fair Classifier (ApxFair)
Input: η > 0, weight w^0 ∈ R^n_+, number of rounds T.
Set h_1 = 0.
for t ∈ [T] do
    λ_t = Best_λ(h_t)  /* Algorithm 2 */
    Set λ̃_t = Σ_{t'=1}^t λ_{t'}
    h_{t+1} = arg min_{h ∈ H} Σ_{i=1}^n (η L_i(λ̃_t) + 1/2) h(x_i, a_i)
end
return the uniform distribution over {h_1, ..., h_T}.

The next theorem provides performance guarantees for Algorithm 3.

Theorem 2. Given a desired fairness level ε, if Algorithm 3 is run for $T = O\!\big(\tfrac{n}{\varepsilon^2}\big)$ rounds, then the ensemble hypothesis $\hat{h}$ satisfies

$$\sum_{i=1}^n w^0_i\, \ell(\hat{h}(x_i, a_i), y_i) \le \min_{h \in H_W} \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + O(\varepsilon) \quad \text{and} \quad \delta^w_{DP}(\hat{h}) \le 2\varepsilon \ \ \forall w \in W.$$

4 Experiments

We used the following four datasets for our experiments.

Adult. In this dataset [24], each example represents an adult individual, the outcome variable is whether that individual makes over $50k a year, and the protected attribute is gender. We work with a balanced and preprocessed version with 2,020 examples and 98 features, selected from the original 48,882 examples.

Communities and Crime. In this dataset from the UCI repository [29], each example represents a community. The outcome variable is whether the community has a violent crime rate in the 70th percentile of all communities, and the protected attribute is whether the community has a majority white population. We used the full dataset of 1,994 examples and 123 features.

Law School. Here each example represents a law student, the outcome variable is whether the law student passed the bar exam, and the protected attribute is race (white or not white). We used a preprocessed and balanced subset with 1,823 examples and 17 features [33].

COMPAS. In this dataset, each example represents a defendant. The outcome variable is whether a certain individual will recidivate, and the protected attribute is race (white or black). We used a 2,000-example sample from the full dataset.

For Adult, Communities and Crime, and Law School we used the preprocessed versions found in the accompanying GitHub repo of [22] (https://github.com/algowatchpenn/GerryFair). For COMPAS, we used a sample from the original dataset [1]. In order to evaluate different fair classifiers, we first split each dataset into five different random 80%-20% train-test splits. Then, we split each training set further into 80%-20% train and validation sets. Therefore, there were five random 64%-16%-20% train-validation-test splits. For each split, we used the validation set to select the hyperparameters, the train set to build a model, and the test set to evaluate its performance (fairness and accuracy). Finally, we aggregated these metrics across the five different test sets to obtain average performance.

We compared our algorithm to a pre-processing method [20], an in-processing method [4], and a post-processing method [19]. For our algorithm (code available at https://github.com/essdeee/Ensuring-Fairness-Beyond-the-Training-Data), we use scikit-learn's logistic regression [27] as the learning algorithm in Algorithm 3. We also show the performance of unconstrained logistic regression. To find the correct hyperparameters (B, η, T, and T_m) for our algorithm, we fixed T = 10 for EO and T = 5 for DP, and used grid search over the hyperparameters B, η, and T_m. The tested values were {0.1, 0.2, ..., 1} for B, {0, 0.05, ..., 1} for η, and {100, 200, ..., 2000} for T_m.

Results. We computed the maximum-violating weight by solving an LP that is the same as the one used by the best-response oracle (Algorithm 2), except that we restrict the individual weights to the range [(1−ε)/n, (1+ε)/n] and keep the protected-group marginals the same. This keeps the ℓ1 distance between the weighted and unweighted distributions within ε.
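A sketch of our reading of this evaluation LP (names and the scipy formulation are ours): box-constrain the weights around 1/n and fix the group marginals, then maximize the DP gap between a chosen pair of groups.

```python
import numpy as np
from scipy.optimize import linprog

def eval_perturbation(h_vals, groups, a, a_prime, eps):
    """Maximize the DP gap between groups a and a' over weights confined to
    [(1-eps)/n, (1+eps)/n], keeping each group's total weight at its empirical
    value; the resulting weights are within l1 distance eps of uniform."""
    n = len(h_vals)
    in_a, in_ap = (groups == a), (groups == a_prime)
    pi_a, pi_ap = in_a.mean(), in_ap.mean()           # empirical group marginals
    c = np.zeros(n)
    c[in_a] = -h_vals[in_a] / pi_a                    # linprog minimizes, so negate
    c[in_ap] = h_vals[in_ap] / pi_ap
    A_eq = np.vstack([in_a, in_ap, np.ones(n)]).astype(float)
    b_eq = np.array([pi_a, pi_ap, 1.0])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[((1 - eps) / n, (1 + eps) / n)] * n, method="highs")
    return res.x, -res.fun
```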
Figure 2: DP and EO comparison. We vary the ℓ1 distance ε on the x-axis and plot the fairness violation on the y-axis. We use five random 80%-20% train-test splits to evaluate test accuracy and fairness. The bands around each line show standard error. For both DP and EO fairness, our algorithm is significantly more robust to reweightings that are within ℓ1 distance ε on most datasets.

Figure 2 compares the robustness of our classifier against the other fair classifiers. We see that for both DP and EO, the fairness violation of our classifier grows more slowly as ε increases, compared to the others, suggesting robustness to ε-perturbations of the distribution. Our algorithm also performs comparably to the existing fair classifiers in both accuracy and fairness violation, though there is a trade-off between robustness and test accuracy. The unweighted test accuracy of our algorithm drops by at most 5%-10% on all datasets, suggesting that robustness comes at the expense of test accuracy on the original distribution. However, on the test set (which is typically obtained from the same source as the original training data), the difference in fairness violation between our method and the other methods is almost negligible on all datasets, except for the COMPAS dataset, where the difference is at most 12%. See the supplementary material for full details of this trade-off.

5 Conclusion and Future Work

In this work, we study the design of fair classifiers that are robust to weighted perturbations of the dataset. An immediate direction for future work is to consider robustness against a broader class of distributions, such as the set of distributions with a bounded f-divergence or Wasserstein distance from the training distribution. We also considered statistical notions of fairness, and it would be interesting to perform a similar fairness-versus-robustness analysis for other notions of fairness.

6 Broader Impact

This paper addresses a topic of societal interest that has received considerable attention in the NeurIPS community over the past several years. Two anticipated positive outcomes of our work are: (i) increased awareness of dataset robustness concerns when evaluating fairness criteria, and (ii) algorithmic techniques for coping with these robustness concerns. Thus, we believe researchers and practitioners who seek to develop classifiers that satisfy certain fairness criteria would be the primary beneficiaries of this work. A possible negative outcome of the work is the use of these techniques to declare some classifiers fair without proper consideration of the semantics of the fairness criteria or the underlying dataset used to evaluate them. Individuals subject to the decisions made by such classifiers could then be adversely affected. Finally, our experimental evaluations use real-world datasets, so the conclusions that we draw in our paper may have implications for individuals associated with these data.

Acknowledgments: We thank Shipra Agrawal and Roxana Geambasu for helpful preliminary discussions. DM was supported through a Columbia Data Science Institute Post-Doctoral Fellowship. DH was partially supported by NSF awards CCF-1740833 and IIS-15-63785, as well as a Sloan Fellowship. SJ was partially supported by NSF award CNS-18-01426.
References

[1] COMPAS dataset. https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis, 2019. Accessed: 2019-10-26.
[2] Soroosh Shafieezadeh Abadeh, Peyman Mohajerin Esfahani, and Daniel Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576-1584, 2015.
[3] Jacob Abernethy, Elad E. Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In 21st Annual Conference on Learning Theory, pages 263-273, 2008.
[4] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudik, John Langford, and Hanna Wallach. A reductions approach to fair classification. In International Conference on Machine Learning, pages 60-69, 2018.
[5] Daniel Alabi, Nicole Immorlica, and Adam Kalai. Unleashing linear optimizers for group-fair learning and optimization. In Conference on Learning Theory, pages 2043-2066, 2018.
[6] Aws Albarghouthi, Loris D'Antoni, Samuel Drews, and Aditya V. Nori. FairSquare: Probabilistic verification of program fairness. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1-30, 2017.
[7] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943, 2018.
[8] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341-357, 2013.
[9] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349-4357, 2016.
[10] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77-91, 2018.
[11] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems 30, pages 3992-4001, 2017.
[12] Robert S. Chen, Brendan Lucier, Yaron Singer, and Vasilis Syrgkanis. Robust optimization for non-convex objectives. In Advances in Neural Information Processing Systems, pages 4705-4714, 2017.
[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214-226, 2012.
[14] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259-268, 2015.
[15] Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 144-152. SIAM, 2016.
[16] Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. Fairness testing: Testing software for discrimination. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 498-510, 2017.
[17] Rui Gao and Anton J. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
[18] Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. The case for process fairness in learning: Feature selection for fair decision making. In NIPS Symposium on Machine Learning and the Law, volume 1, page 2, 2016.
[19] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315-3323, 2016.
[20] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1-33, 2012.
[21] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 643-650. IEEE, 2011.
[22] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564-2572, 2018.
[23] Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems, pages 656-666, 2017.
[24] Ron Kohavi. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In KDD, volume 96, pages 202-207, 1996.
[25] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066-4076, 2017.
[26] Hongseok Namkoong and John C. Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems, pages 2208-2216, 2016.
[27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[28] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems 30, pages 5680-5689. Curran Associates, Inc., 2017.
[29] Michael Redmond and Alok Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660-678, 2002.
[30] Shai Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University of Jerusalem, 2007.
[31] Saeed Sharifi-Malvajerdi, Michael Kearns, and Aaron Roth. Average individual fairness: Algorithms, generalization and experiments. In Advances in Neural Information Processing Systems, pages 8240-8249, 2019.
[32] Florian Tramer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin. FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P), pages 401-416. IEEE, 2017.
[33] Linda F. Wightman. LSAC national longitudinal bar passage study. Technical report, LSAC Research Report Series, 1998.
[34] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171-1180, 2017.
[35] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325-333, 2013.
[36] Junzhe Zhang and Elias Bareinboim. Fairness in decision-making: The causal explanation formula. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.