# Ensuring Fairness Beyond the Training Data

Debmalya Mandal (dm3557@columbia.edu), Samuel Deng (sd3013@columbia.edu), Suman Jana (suman@cs.columbia.edu), Jeannette M. Wing (wing@columbia.edu), and Daniel Hsu (djhsu@cs.columbia.edu), Columbia University

We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss and, at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.

1 Introduction

Machine learning (ML) systems are often used for high-stakes decision-making, including bail decisions and credit approval. These applications often use algorithms trained on past biased data, and such bias is reflected in the eventual decisions made by the algorithms. For example, Bolukbasi et al. [9] show that popular word embeddings implicitly encode societal biases, such as gender norms. Similarly, Buolamwini and Gebru [10] find that several facial recognition systems perform better on lighter-skinned subjects than on darker-skinned subjects. To mitigate such biases, there have been several approaches in the ML fairness community to design fair classifiers [4, 19, 35].

However, the literature has largely ignored the robustness of such fair classifiers. The fairness of these classifiers is typically evaluated on the sampled datasets, and such evaluations are often unreliable for various reasons, including biased samples and missing and/or noisy attributes. Moreover, compared to the traditional machine learning setting, these problems are more prevalent in the fairness domain, as the data itself is biased to begin with. As an example, we consider how the optimized pre-processing algorithm [11] performs on ProPublica's COMPAS dataset [1] in terms of demographic parity (DP), which measures the difference in acceptance rates between the protected groups. Figure 1 shows two situations: (1) the unweighted training distribution (in blue), and (2) a weighted training distribution (in red). The optimized pre-processing algorithm [11] yields a classifier that is almost fair on the unweighted training set (DP of about 0.02). However, it has a DP of at least 0.2 on the weighted set, despite the fact that the marginal distributions of the features look almost the same for the two scenarios.

Figure 1: Unweighted vs. reweighted COMPAS dataset.
The marginals of the two distributions are almost the same, but standard fair classifiers show a demographic parity of at least 0.2 on the reweighted dataset.

This example motivates us to design a fair classifier that is robust to such perturbations. We also show how to construct such weighted examples using a few linear programs.

Contributions: In this work, we initiate the study of fair classifiers that are robust to perturbations in the training distribution. The set of perturbed distributions is given by arbitrary weighted combinations of the training dataset, denoted W. Our main contributions are the following:

We develop classifiers that are fair not only with respect to the training distribution, but also for the class of distributions characterized by W. We formulate a min-max objective whose goal is to minimize a distributionally robust training loss and, simultaneously, find a classifier that is fair with respect to the entire class. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution.

Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.

Related Work: Numerous proposals have been laid out to capture bias and discrimination in settings where decisions are delegated to algorithms. Such formalizations of fairness can be statistical [14, 19-21, 28], individual [13, 31], causal [23, 25, 36], and even procedural [18]. We restrict attention to statistical fairness, which fixes a small number of groups in the population and then compares some statistic (e.g., accuracy, false positive rate) across these groups. We mainly consider the notions of demographic parity [14, 20, 21] and equalized odds [19] in this paper, but our method of designing robust and fair classifiers can be adapted to any type of statistical fairness.

On the other hand, there are three main approaches for designing a fair classifier. The pre-processing approach tries to transform the training data and then leverage standard classifiers [11, 14, 20, 35]. The in-processing approach directly modifies the learning algorithm to meet the fairness criteria [4, 15, 22, 34]. The post-processing approach modifies the decisions of a classifier [19, 28] to make it fair. Ours is an in-processing approach and is mostly related to [4, 5, 22]. Agarwal et al. [4] and Alabi et al. [5] show how the binary classification problem with group fairness constraints can be reduced to a sequence of cost-sensitive classification problems. Kearns et al. [22] follow a similar approach, but instead consider a combinatorial class of subgroup fairness constraints. Recently, [7] integrated and implemented a range of such fair classifiers in a GitHub project, which we leverage in our work.

In terms of technique, our paper falls in the category of distributionally robust optimization (DRO), where the goal is to minimize the worst-case training loss over all distributions that are close to the training distribution under some metric. Various types of metrics have been considered, including bounded f-divergence [8, 26] and Wasserstein distance [2, 17].
To the best of our knowledge, prior literature has largely ignored enforcing constraints such as fairness in a distributionally robust sense. Further afield, our work has similarity with recent work in fairness testing inspired by the literature on program verification [6, 16, 32]. These papers attempt to automatically discover discrimination in decision-making programs, whereas we develop tools based on linear programs to discover distributions that expose potential unfairness.

2 Problem and Definitions

We write $((x, a), y)$ to denote a training instance, where $a \in A$ denotes the protected attributes, $x \in X$ denotes all the remaining attributes, and $y \in \{0, 1\}$ denotes the outcome label. For a hypothesis $h$, $h(x, a) \in \{0, 1\}$ denotes the outcome it predicts on an input $(x, a)$. We assume that the set of hypotheses is given by a class $H$. Given a loss function $\ell : \{0,1\} \times \{0,1\} \to \mathbb{R}$, the goal of a standard fair classifier is to find a hypothesis $h \in H$ that minimizes the training loss $\sum_{i=1}^n \ell(h(x_i, a_i), y_i)$ and is also fair according to some notion of fairness.

We aim to design classifiers that are fair with respect to a class of distributions that are weighted perturbations of the training distribution. Let $W = \{w \in \mathbb{R}^n_+ : \sum_i w_i = 1\}$ be the set of all possible weights. For a hypothesis $h$ and weight $w$, we define the weighted empirical risk $\ell(h, w) = \sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i)$. We write $\delta^w_F(h)$ for the unfairness gap with respect to the weighted empirical distribution defined by the weight $w$ and fairness constraint $F$ (e.g., demographic parity (DP) or equalized odds (EO)). For example, $\delta^w_{DP}(h)$ is defined as

$$\delta^w_{DP}(h) = \max_{a, a' \in A} \left| \frac{\sum_{i : a_i = a} w_i\, h(x_i, a)}{\sum_{i : a_i = a} w_i} - \frac{\sum_{i : a_i = a'} w_i\, h(x_i, a')}{\sum_{i : a_i = a'} w_i} \right| \qquad (1)$$

Therefore, $\delta^w_{DP}(h)$ measures the maximum weighted difference in acceptance rates between two groups with respect to the distribution that assigns weight $w$ to the training examples. On the other hand, $\delta^w_{EO}(h) = \tfrac{1}{2}\big(\delta^w_{EO}(h \mid 0) + \delta^w_{EO}(h \mid 1)\big)$, where $\delta^w_{EO}(h \mid y)$ is defined as

$$\delta^w_{EO}(h \mid y) = \max_{a, a' \in A} \left| \frac{\sum_{i : a_i = a,\, y_i = y} w_i\, h(x_i, a)}{\sum_{i : a_i = a,\, y_i = y} w_i} - \frac{\sum_{i : a_i = a',\, y_i = y} w_i\, h(x_i, a')}{\sum_{i : a_i = a',\, y_i = y} w_i} \right|$$

Therefore, $\delta^w_{EO}(h \mid 0)$ (resp., $\delta^w_{EO}(h \mid 1)$) measures the weighted difference in false (resp., true) positive rates between two groups with respect to the weight $w$. (We consider the average of the false positive rate and true positive rate for simplicity; our method can also handle more general definitions of EO [28].) We will develop our theory using DP as an example of a notion of fairness, but our experimental results concern both DP and EO.

We are now ready to formally define our main objective. For a class of hypotheses $H$, let $H_W = \{h \in H : \delta^w_F(h) \le \epsilon \ \ \forall w \in W\}$ be the set of hypotheses that are $\epsilon$-fair with respect to all the weights in the set $W$. Our goal is to solve the following min-max problem:

$$\min_{h \in H_W} \max_{w \in W} \ell(h, w) \qquad (2)$$

Therefore, we aim to minimize a robust loss with respect to a class of distributions indexed by $W$. Additionally, we also aim to find a classifier that is fair with respect to such perturbations. We allow our algorithm to output a randomized classifier, i.e., a distribution over the hypotheses in $H$. This is necessary if the space $H$ is non-convex or if the fairness constraints are such that the set of feasible hypotheses $H_W$ is non-convex. For a randomized classifier $\mu$, its weighted empirical risk is $\ell(\mu, w) = \sum_h \mu(h)\, \ell(h, w)$, and its expected unfairness gap is $\delta^w_F(\mu) = \sum_h \mu(h)\, \delta^w_F(h)$.
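To make the weighted fairness-gap definitions above concrete, here is a minimal NumPy sketch of $\delta^w_{DP}(h)$ from Eq. (1); the function name and interface are ours (not from the paper's code release), and it assumes hard or randomized predictions encoded as values in [0, 1].

```python
import numpy as np

def weighted_dp_gap(preds, groups, w):
    """Weighted demographic-parity gap delta^w_DP(h) from Eq. (1).

    preds  : array of predictions h(x_i, a_i) in {0, 1} (or [0, 1] for randomized h)
    groups : array of protected attributes a_i
    w      : nonnegative weights over the n training examples, summing to 1
    """
    rates = []
    for a in np.unique(groups):
        mask = groups == a
        denom = w[mask].sum()
        if denom > 0:  # skip groups that receive zero total weight
            rates.append((w[mask] * preds[mask]).sum() / denom)
    # maximum pairwise difference of weighted acceptance rates
    return max(rates) - min(rates)

# With uniform weights this recovers the usual (unweighted) DP gap.
preds = np.array([1, 0, 1, 1, 0, 0])
groups = np.array([0, 0, 0, 1, 1, 1])
w = np.full(6, 1 / 6)
print(weighted_dp_gap(preds, groups, w))  # |2/3 - 1/3| = 1/3
```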
Our algorithm proceeds in a top-down fashion. First, we design a meta-algorithm that reduces the min-max problem of Equation (2) to a loss-minimization problem with respect to a sequence of weight vectors. Then we show how to design a fair classifier that performs well with respect to a fixed weight vector $w \in W$ in terms of accuracy, but is fair with respect to the entire set of weights $W$.

3.1 Meta Algorithm

Algorithm 1 provides a meta-algorithm for solving the min-max optimization problem defined in Equation (2). The algorithm is based on ideas presented in [12], which, given an α-approximate Bayesian oracle for distributions over loss functions, provides an α-approximate robust solution. The algorithm can be viewed as a two-player zero-sum game between a learner, who picks the hypothesis $h_t$, and an adversary, who picks the weight vector $w_t$. At every step, the adversary performs a projected gradient ascent step to compute its best response. The learner, on the other hand, solves a fair classification problem to pick a hypothesis that is fair with respect to the set of weights $W$ and minimizes the weighted empirical risk with respect to the weight $w_t$. However, it is infeasible to compute an exact optimum of the problem $\min_{h \in H_W} \sum_{i=1}^n w_t(i)\, \ell(h(x_i, a_i), y_i)$, so the learner uses an approximate fair classifier ApxFair(·), which we define next.

ALGORITHM 1: Meta-Algorithm
Input: training set {(x_i, a_i, y_i)}_{i=1}^n, set of weights W, hypothesis class H, parameters T_m and η.
Set η = √(2/T_m) and w_0(i) = 1/n for all i ∈ [n].
h_0 = ApxFair(w_0)  /* approximate solution of min_{h ∈ H_W} Σ_{i=1}^n ℓ(h(x_i, a_i), y_i) */
for each t ∈ [T_m] do
    w_t = w_{t-1} + η ∇_w ℓ(h_{t-1}, w_{t-1})
    w_t = Π_W(w_t)  /* project w_t onto the set of weights W */
    h_t = ApxFair(w_t)  /* approximate solution of min_{h ∈ H_W} Σ_{i=1}^n w_t(i) ℓ(h(x_i, a_i), y_i) */
end
Output: h^f, the uniform distribution over {h_0, h_1, ..., h_{T_m}}.

Definition 1. ApxFair(·) is an α-approximate fair classifier if, for any weight $w \in \mathbb{R}^n_+$, ApxFair(w) returns a hypothesis $\hat{h}$ such that

$$\sum_{i=1}^n w_i\, \ell(\hat{h}(x_i, a_i), y_i) \le \min_{h \in H_W} \sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i) + \alpha.$$

Using the α-approximate fair classifier, we have the following guarantee on the output of Algorithm 1.

Theorem 1. Suppose the loss function ℓ(·, ·) is convex in its first argument and ApxFair(·) is an α-approximate fair classifier. Then the hypothesis $h^f$ output by Algorithm 1 satisfies

$$\max_{w \in W} \mathbb{E}_{h \sim h^f}\left[\sum_{i=1}^n w_i\, \ell(h(x_i, a_i), y_i)\right] \le \min_{h \in H_W} \max_{w \in W} \ell(h, w) + \alpha + O\!\left(\sqrt{1/T_m}\right).$$

The proof uses ideas from [12], except that we use an additive approximate best response. (All omitted proofs are provided in the supplementary material.)
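The following is a minimal sketch of Algorithm 1's outer loop. Since W in Section 2 is the probability simplex, Π_W is implemented here as the standard Euclidean projection onto the simplex; the callables `per_example_loss` and `apx_fair`, and all names, are our own stand-ins rather than the authors' implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def meta_algorithm(per_example_loss, apx_fair, n, T_m, eta):
    """Sketch of Algorithm 1's outer loop.

    per_example_loss(h) must return the length-n vector (l(h(x_i, a_i), y_i))_i,
    which is exactly the gradient of the weighted risk l(h, w) with respect to w.
    apx_fair(w) plays the role of the ApxFair oracle.
    """
    w = np.full(n, 1.0 / n)                        # w_0(i) = 1/n
    hypotheses = [apx_fair(w)]                     # h_0
    for _ in range(T_m):
        grad = per_example_loss(hypotheses[-1])    # adversary's gradient ascent step ...
        w = project_simplex(w + eta * grad)        # ... followed by projection onto W
        hypotheses.append(apx_fair(w))             # learner's approximately fair response
    return hypotheses                              # h^f = uniform mixture over these
```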
3.2 Approximate Fair Classifier

We now develop an α-approximate fair and robust classifier. For the remainder of this subsection, assume that the meta-algorithm (Algorithm 1) has called ApxFair(·) with a weight vector $w^0$; our goal is to design a classifier that minimizes the weighted empirical risk with respect to the weight $w^0$ but is fair with respect to the set of all weights $W$, i.e., to find $f \in \arg\min_{h \in H_W} \ell(h, w^0)$. Our method applies the following three steps.

1. Discretize the set of weights W, so that it suffices to design an approximate fair classifier with respect to the set of discretized weights. In particular, if we discretize each weight up to a multiplicative error ε, then an α-approximate fair classifier with respect to the discretized weights yields an O(α + ε)-fair classifier with respect to the set W.

2. Introduce a Lagrange multiplier for each fairness constraint, i.e., for each of the discretized weights and each pair of protected attributes. This lets us set up a two-player zero-sum game for the problem of designing an approximate fair classifier with respect to the set of discretized weights. Here, the learner chooses a hypothesis, whereas an adversary picks the most unfair weight in the set of discretized weights.

3. Design a learning algorithm for the learner, and an approximate best-response oracle for the adversary against the learner's chosen hypothesis. This yields an iterative algorithm where, at every step, the learner chooses a hypothesis and the adversary adjusts the Lagrange multipliers corresponding to the most violated fairness constraints.

We point out that Agarwal et al. [4] were the first to show that the design of a fair classifier can be formulated as a two-player zero-sum game (step 2). However, they only considered group-fairness constraints with respect to the training distribution; the algorithm of Alabi et al. [5] has a similar limitation. In contrast, we consider the design of a robust and fair classifier, which requires the additional discretization step (step 1). Finally, the design of our learning algorithm and best-response oracle is significantly different from [4, 5, 22].

3.2.1 Discretization of the Weights

We first discretize the set of weights W as follows. Divide the interval [0, 1] into buckets $B_0 = [0, \delta)$ and $B_{j+1} = [(1+\gamma_1)^j \delta, (1+\gamma_1)^{j+1}\delta)$ for $j = 0, 1, \ldots, M-1$, where $M = \lceil \log_{1+\gamma_1}(1/\delta) \rceil$. For any weight $w \in W$, construct a new weight $w' = (w'_1, \ldots, w'_n)$ by setting $w'_i$ to be the upper endpoint of the bucket containing $w_i$, for each $i \in [n]$. We substitute $\delta = \frac{\gamma_1}{2n}$ and write $N(\gamma_1, W)$ for the set containing all the discretized weights of the set W. The next lemma shows that a fair classifier for the set of weights $N(\gamma_1, W)$ is also a fair classifier for the set of weights W, up to an error of $4\gamma_1$.

Lemma 1. If $\delta^{w'}_{DP}(f) \le \epsilon$ for all $w' \in N(\gamma_1, W)$, then $\delta^w_{DP}(f) \le \epsilon + 4\gamma_1$ for any $w \in W$.

Therefore, in order to ensure that $\delta^w_{DP}(f) \le \varepsilon$, we discretize the set of weights W and enforce $(\varepsilon - 4\gamma_1)$-fairness for all the weights in the set $N(\gamma_1, W)$. This makes our task easier, as we only need to guarantee fairness with respect to the finite set of weights $N(\gamma_1, W)$, instead of the large, continuous set of weights W. However, note that the number of weights in $N(\gamma_1, W)$ can be $O\big((\log_{1+\gamma_1}(2n/\gamma_1))^n\big)$, which is exponential in n. We next see how to avoid this problem.
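A small sketch, under our own naming, of the bucketing rule from Section 3.2.1: each weight is rounded up to the upper endpoint of its geometric bucket, with δ = γ₁/(2n).

```python
import numpy as np

def discretize_weights(w, gamma1):
    """Round each weight up to the upper endpoint of its geometric bucket.

    Buckets: B_0 = [0, delta) and B_{j+1} = [(1+gamma1)^j * delta, (1+gamma1)^{j+1} * delta),
    where delta = gamma1 / (2n), as in Section 3.2.1.
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    delta = gamma1 / (2 * n)
    out = np.full(n, delta)                        # everything in B_0 maps to delta
    big = w >= delta
    j = np.floor(np.log(w[big] / delta) / np.log(1 + gamma1))
    out[big] = delta * (1 + gamma1) ** (j + 1)     # upper endpoint of bucket B_{j+1}
    return out
```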
3.3 Setting up a Two-Player Zero-Sum Game

We formulate the problem of designing a fair and robust classifier with respect to the set of weights in $N(\gamma_1, W)$ as a two-player zero-sum game. Let

$$R(w, a, f) = \frac{\sum_{i : a_i = a} w_i\, f(x_i, a)}{\sum_{i : a_i = a} w_i}.$$

Then $\delta^w_{DP}(f) = \sup_{a, a'} |R(w, a, f) - R(w, a', f)|$. Our aim is to solve the following problem:

$$\min_{h \in H} \ \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) \qquad (3)$$
$$\text{s.t.} \quad R(w, a, h) - R(w, a', h) \le \varepsilon - 4\gamma_1 \quad \forall w \in N(\gamma_1, W),\ \forall a, a' \in A.$$

We form the following Lagrangian:

$$\min_{h \in H} \ \max_{\|\lambda\|_1 \le B} \ \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w \big(R(w, a, h) - R(w, a', h) - \varepsilon + 4\gamma_1\big). \qquad (4)$$

Notice that we restrict the ℓ1-norm of the Lagrange multipliers by the parameter B; we will see later how to choose this parameter. We first view the optimization problem defined in Equation (4) as a two-player zero-sum game. Here the learner's pure strategy is to play a hypothesis $h \in H$. Given the learner's hypothesis $h$, the adversary picks the constraint (weight $w$ and groups $a, a'$) that violates fairness the most and sets the corresponding coordinate of λ to B. Therefore, for a fixed hypothesis h, it is sufficient for the adversary to play a vector λ in which either all coordinates are zero or exactly one is set to B. For such a pair (h, λ) of hypothesis and Lagrange multipliers, we define the payoff as

$$U(h, \lambda) = \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w \big(R(w, a, h) - R(w, a', h) - \varepsilon + 4\gamma_1\big).$$

Our goal is now to compute a ν-approximate minimax equilibrium of this game. In the next subsection, we design an algorithm based on online learning. The algorithm uses best responses of the h- and λ-players, which we discuss next.

Best response of the h-player: For each $i \in [n]$, we introduce the notation

$$\phi_i = \sum_{w,\, a'} \big(\lambda^{a_i, a'}_w - \lambda^{a', a_i}_w\big)\, \frac{w_i}{\sum_{j : a_j = a_i} w_j}.$$

With this notation, the payoff becomes

$$\sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + \sum_{i=1}^n \phi_i\, h(x_i, a_i) - (\varepsilon - 4\gamma_1) \sum_{w,\, a, a' \in A} \lambda^{a,a'}_w.$$

Let us introduce the following costs:

$$c^0_i = \begin{cases} \ell(0, 1)\, w^0_i & \text{if } y_i = 1 \\ \ell(0, 0)\, w^0_i & \text{if } y_i = 0 \end{cases} \qquad\quad c^1_i = \begin{cases} \ell(1, 1)\, w^0_i + \phi_i & \text{if } y_i = 1 \\ \ell(1, 0)\, w^0_i + \phi_i & \text{if } y_i = 0 \end{cases} \qquad (5)$$

Then the h-player's best response is the following cost-sensitive classification problem:

$$\hat{h} \in \arg\min_{h \in H} \ \sum_{i=1}^n c^1_i\, h(x_i, a_i) + c^0_i\, \big(1 - h(x_i, a_i)\big). \qquad (6)$$

Therefore, as long as we have access to an oracle for the cost-sensitive classification problem, the h-player can compute its best response. Note that the notion of a cost-sensitive classification oracle was also used by [4, 22]. In general, solving this problem is NP-hard, but there are several efficient heuristics that perform well in practice. We provide further details about how we implement this oracle in the section devoted to the experiments.

Best response of the λ-player: Since the fairness constraints depend on the weights non-linearly (e.g., see Eq. (1)), finding the most violated constraint is a non-linear optimization problem. However, if we could correctly guess the marginal probabilities of the protected groups, then the most violating weight vector could be found by a linear program. Since we cannot guess this value exactly, we instead discretize the set of marginal probabilities, iterate over them, and choose the option with the largest violation in fairness. This intuition is formalized as follows. We discretize the set of all marginals over |A| groups by the following rule. First discretize [0, 1] as $\{0, \delta, (1+\gamma_2)^j \delta\}$ for $j = 1, 2, \ldots, M$ with $M = O(\log_{1+\gamma_2}(1/\delta))$. This discretizes $[0, 1]^{|A|}$ into $M^{|A|}$ points; we then retain the discretized marginals whose total sum is at most $1 + \gamma_2$ and discard all other points. Let us denote the set of such marginals by $\Pi(\gamma_2, A)$.

Algorithm 2 goes through all the marginals π in Π(γ₂, A) and, for each such tuple and each pair of groups a, a', finds the weight w that maximizes R(w, a, h) − R(w, a', h). Note that this can be solved using a linear program, as the total weight assigned to each group is fixed by the marginal tuple π. Out of all the solutions, the algorithm picks the one with the maximum value. It then checks whether this maximum violates the constraint (i.e., is greater than ε). If so, it sets the corresponding coordinate of λ to B and all others to 0; otherwise, it returns the zero vector. As the weight returned by the LP need not correspond to a weight in N(γ₁, W), it rounds the weight to the nearest weight in N(γ₁, W). For discretizing the marginals we set $\delta = (1+\gamma_2)\frac{\gamma_1}{n}$, which implies that the number of LPs run by Algorithm 2 is at most $O\big(\log^{|A|}_{1+\gamma_2}\big(\tfrac{n}{(1+\gamma_2)\gamma_1}\big)\big) = O(\mathrm{poly}(\log n))$, as the number of groups is fixed.

ALGORITHM 2: Best Response of the λ-player
Input: training set {(x_i, a_i, y_i)}_{i=1}^n and hypothesis h ∈ H.
for each π ∈ Π(γ₂, A) and each pair a, a' ∈ A do
    Solve the following LP:
        w(a, a', π) = arg max_w  (1/π_a) Σ_{i : a_i = a} w_i h(x_i, a) − (1/π_{a'}) Σ_{i : a_i = a'} w_i h(x_i, a')
        s.t.  Σ_{i : a_i = a} w_i = π_a,   Σ_{i : a_i = a'} w_i = π_{a'},   w_i ≥ 0 for all i ∈ [n]
    Set val(a, a', π) = (1/π_a) Σ_{i : a_i = a} w(a, a', π)_i h(x_i, a) − (1/π_{a'}) Σ_{i : a_i = a'} w(a, a', π)_i h(x_i, a')
end
Set (a*_1, a*_2, π*) = arg max_{a, a', π} val(a, a', π)
if val(a*_1, a*_2, π*) > ε then
    Let w = w(a*_1, a*_2, π*). For each i ∈ [n], let w'_i be the upper endpoint of the bucket containing w_i.
    return λ with λ^{a*_1, a*_2}_{w'} = B and all other coordinates 0.
else
    return λ = 0.
end
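Here is a sketch of one inner LP of Algorithm 2 for a fixed guessed marginal pair, written with scipy.optimize.linprog. The outer loops over Π(γ₂, A) and group pairs, and the final rounding into N(γ₁, W), are omitted; function and variable names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def most_violating_weights(h_vals, groups, a, a_prime, pi_a, pi_a_prime):
    """One inner LP of Algorithm 2, for a fixed guessed marginal pair (pi_a, pi_a').

    Maximizes R(w, a, h) - R(w, a', h), i.e.
        (1/pi_a) * sum_{i: a_i = a} w_i h_i  -  (1/pi_a') * sum_{i: a_i = a'} w_i h_i,
    subject to the two group-marginal constraints and w >= 0.  linprog minimizes,
    so the objective is negated.
    """
    n = len(h_vals)
    in_a = (groups == a)
    in_ap = (groups == a_prime)
    c = np.zeros(n)
    c[in_a] = -h_vals[in_a] / pi_a
    c[in_ap] = h_vals[in_ap] / pi_a_prime
    A_eq = np.vstack([in_a, in_ap]).astype(float)      # group-marginal constraints
    b_eq = np.array([pi_a, pi_a_prime])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
    return res.x, -res.fun                             # candidate weights, violation value
```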
Lemma 2. Algorithm 2 is a $B(4\gamma_1 + \gamma_2)$-approximate best response for the λ-player, i.e., for any $h \in H$, it returns $\lambda'$ such that $U(h, \lambda') \ge \max_{\lambda} U(h, \lambda) - B(4\gamma_1 + \gamma_2)$.

Learning Algorithm: We now introduce our algorithm for the problem defined in Equation (4). In this algorithm, the λ-player uses Algorithm 2 to compute an approximate best response, whereas the h-player uses the Regularized Follow the Leader (RFTL) algorithm [3, 30] as its learning algorithm. RFTL is a classical algorithm for online convex optimization (OCO). In OCO, the decision maker takes a decision $x_t \in K$ at round t, an adversary reveals a convex loss function $f_t : K \to \mathbb{R}$, and the decision maker suffers a loss of $f_t(x_t)$. The goal is to minimize regret, defined as $\max_{u \in K} \big\{\sum_{t=1}^T f_t(x_t) - f_t(u)\big\}$, i.e., the difference between the loss suffered by the learner and that of the best fixed decision. RFTL requires a strongly convex regularization function $R : K \to \mathbb{R}_{\ge 0}$ and chooses $x_t$ according to the rule

$$x_t = \arg\min_{x \in K} \ \eta \sum_{s=1}^{t-1} \nabla f_s(x_s)^\top x + R(x).$$

We use RFTL in our learning algorithm as follows. We set the regularization function $R(x) = \tfrac{1}{2}\|x\|_2^2$ and the loss function $f_t(h_t) = U(h_t, \lambda_t)$, where $\lambda_t$ is the approximate best response to $h_t$. Therefore, at iteration t the learner needs to solve the following optimization problem:

$$h_t = \arg\min_{h \in H} \ \eta \sum_{s=1}^{t-1} \nabla_{h_s} U(h_s, \lambda_s)^\top h + \frac{1}{2}\|h\|_2^2. \qquad (7)$$

Here, with a slight abuse of notation, we take H to include the set of randomized classifiers, so that $h(x_i, a_i)$ is interpreted as the probability that hypothesis h outputs 1 on input $(x_i, a_i)$. We now show that the optimization problem in Eq. (7) can be solved as a cost-sensitive classification problem. For a given $\lambda_s$, the best response of the learner is

$$\hat{h} \in \arg\min_{h \in H} \ \sum_{i=1}^n c^1_i(\lambda_s)\, h(x_i, a_i) + c^0_i(\lambda_s)\, \big(1 - h(x_i, a_i)\big).$$

Writing $L_i(\lambda_s) = c^1_i(\lambda_s) - c^0_i(\lambda_s)$, the objective becomes $\sum_{i=1}^n L_i(\lambda_s)\, h(x_i, a_i)$ up to an additive constant. Hence $U(h_s, \lambda_s)$ is linear in $h_s$, and $\nabla_{h_s} U(h_s, \lambda_s)$ equals the vector $(L_i(\lambda_s))_{i=1}^n$. With this observation, the objective in Equation (7) becomes

$$\eta \sum_{s=1}^{t-1} \sum_{i=1}^n L_i(\lambda_s)\, h(x_i, a_i) + \frac{1}{2} \sum_{i=1}^n \big(h(x_i, a_i)\big)^2 \ \le\ \eta \sum_{i=1}^n L_i(\tilde{\lambda}_{t-1})\, h(x_i, a_i) + \frac{1}{2} \sum_{i=1}^n h(x_i, a_i) \ =\ \sum_{i=1}^n \Big(\eta\, L_i(\tilde{\lambda}_{t-1}) + \tfrac{1}{2}\Big)\, h(x_i, a_i),$$

where $\tilde{\lambda}_{t-1} = \sum_{s=1}^{t-1} \lambda_s$. The inequality follows from two observations: $L(\lambda)$ is linear in λ, and, since the predictions $h(x_i, a_i) \in [0, 1]$, we may replace the quadratic term by a linear term, which is an upper bound. (Without this relaxation we would have to solve a regularized version of the cost-sensitive classification problem.)

Finally, we observe that even though the number of weights in N(γ₁, W) is exponential in n, Algorithm 3 can be implemented efficiently. This is because the best response of the λ-player always returns a vector in which all entries are zero or exactly one is set to B. Therefore, instead of recording the entire λ vector, the algorithm can record only the non-zero coordinates, and there will be at most T of them.
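The learner update $h_{t+1} = \arg\min_{h} \sum_i (\eta L_i(\tilde{\lambda}_t) + \tfrac{1}{2})\, h(x_i, a_i)$ derived above is a cost-sensitive classification problem. One common heuristic, consistent with the paper's use of scikit-learn logistic regression though not necessarily identical to the authors' implementation, reduces it to weighted binary classification, as in this sketch (names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learner_step(X, L):
    """One learner update of Algorithm 3: approximately solve min_h sum_i L[i] * h(x_i, a_i),
    where L[i] stands for eta * L_i(lambda_tilde_t) + 1/2.

    The pointwise minimizer predicts 1 exactly when L[i] < 0, so a standard heuristic
    fits a weighted classifier with targets 1{L[i] < 0} and sample weights |L[i]|.
    Returns a prediction function.
    """
    y_target = (L < 0).astype(int)
    if y_target.min() == y_target.max():
        # all costs point the same way: the constant classifier is optimal
        const = int(y_target[0])
        return lambda X_new: np.full(len(X_new), const)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y_target, sample_weight=np.abs(L) + 1e-12)
    return clf.predict
```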
ALGORITHM 3: Approximate Fair Classifier (ApxFair)
Input: η > 0, weight w^0 ∈ R^n_+, number of rounds T.
Set h_1 = 0.
for t ∈ [T] do
    λ_t = Best_λ(h_t)  /* Algorithm 2 */
    Set λ̃_t = Σ_{t'=1}^t λ_{t'}
    h_{t+1} = arg min_{h ∈ H} Σ_{i=1}^n (η L_i(λ̃_t) + 1/2) h(x_i, a_i)
end
return the uniform distribution over {h_1, ..., h_T}.

The next theorem provides performance guarantees for Algorithm 3.

Theorem 2. Given a desired fairness level ε, if Algorithm 3 is run for $T = O\!\big(\tfrac{n}{\varepsilon^2}\big)$ rounds, then the ensemble hypothesis $\hat{h}$ satisfies

$$\sum_{i=1}^n w^0_i\, \ell(\hat{h}(x_i, a_i), y_i) \le \min_{h \in H_W} \sum_{i=1}^n w^0_i\, \ell(h(x_i, a_i), y_i) + O(\varepsilon) \quad \text{and} \quad \delta^w_{DP}(\hat{h}) \le 2\varepsilon \ \ \forall w \in W.$$

4 Experiments

We used the following four datasets for our experiments.

Adult. In this dataset [24], each example represents an adult individual, the outcome variable is whether that individual makes over $50k a year, and the protected attribute is gender. We work with a balanced and preprocessed version with 2,020 examples and 98 features, selected from the original 48,882 examples.

Communities and Crime. In this dataset from the UCI repository [29], each example represents a community. The outcome variable is whether the community has a violent crime rate in the 70th percentile of all communities, and the protected attribute is whether the community has a majority white population. We used the full dataset of 1,994 examples and 123 features.

Law School. Here each example represents a law student, the outcome variable is whether the law student passed the bar exam, and the protected attribute is race (white or not white). We used a preprocessed and balanced subset with 1,823 examples and 17 features [33].

COMPAS. In this dataset, each example represents a defendant. The outcome variable is whether a certain individual will recidivate, and the protected attribute is race (white or black). We used a 2,000-example sample from the full dataset.

For Adult, Communities and Crime, and Law School we used the preprocessed versions found in the accompanying GitHub repo of [22] (https://github.com/algowatchpenn/GerryFair). For COMPAS, we used a sample from the original dataset [1]. In order to evaluate different fair classifiers, we first split each dataset into five different random 80%-20% train-test splits. Then, we split each training set further into 80%-20% train and validation sets. Therefore, there were five random 64%-16%-20% train-validation-test splits. For each split, we used the validation set to select the hyperparameters, the train set to build a model, and the test set to evaluate its performance (fairness and accuracy). Finally, we aggregated these metrics across the five different test sets to obtain average performance.

We compared our algorithm to a pre-processing method [20], an in-processing method [4], and a post-processing method [19]. For our algorithm (code available at https://github.com/essdeee/Ensuring-Fairness-Beyond-the-Training-Data), we use scikit-learn's logistic regression [27] as the learning algorithm in Algorithm 3. We also show the performance of unconstrained logistic regression. To find the correct hyperparameters (B, η, T, and T_m) for our algorithm, we fixed T = 10 for EO and T = 5 for DP, and used grid search over the hyperparameters B, η, and T_m. The tested values were {0.1, 0.2, ..., 1} for B, {0, 0.05, ..., 1} for η, and {100, 200, ..., 2000} for T_m.

Results. We computed the maximum-violating weight by solving an LP that is the same as the one used by the best-response oracle (Algorithm 2), except that we restrict the individual weights to the range [(1−ε)/n, (1+ε)/n] and keep the protected-group marginals the same. This keeps the ℓ1 distance between the weighted and unweighted distributions within ε.
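A sketch of our reading of this evaluation LP (names and the scipy formulation are ours): box-constrain the weights around 1/n and fix the group marginals, then maximize the DP gap between a chosen pair of groups.

```python
import numpy as np
from scipy.optimize import linprog

def eval_perturbation(h_vals, groups, a, a_prime, eps):
    """Maximize the DP gap between groups a and a' over weights confined to
    [(1-eps)/n, (1+eps)/n], keeping each group's total weight at its empirical
    value; the resulting weights are within l1 distance eps of uniform."""
    n = len(h_vals)
    in_a, in_ap = (groups == a), (groups == a_prime)
    pi_a, pi_ap = in_a.mean(), in_ap.mean()           # empirical group marginals
    c = np.zeros(n)
    c[in_a] = -h_vals[in_a] / pi_a                    # linprog minimizes, so negate
    c[in_ap] = h_vals[in_ap] / pi_ap
    A_eq = np.vstack([in_a, in_ap, np.ones(n)]).astype(float)
    b_eq = np.array([pi_a, pi_ap, 1.0])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[((1 - eps) / n, (1 + eps) / n)] * n, method="highs")
    return res.x, -res.fun
```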
Figure 2: DP and EO comparison. We vary the ℓ1 distance ε on the x-axis and plot the fairness violation on the y-axis. We use five random 80%-20% train-test splits to evaluate test accuracy and fairness. The bands around each line show standard error. For both DP and EO fairness, our algorithm is significantly more robust to reweightings that are within ℓ1 distance ε on most datasets.

Figure 2 compares the robustness of our classifier against the other fair classifiers. We see that for both DP and EO, the fairness violation of our classifier grows more slowly as ε increases, compared to the others, suggesting robustness to ε-perturbations of the distribution. Our algorithm also performs comparably to the existing fair classifiers in both accuracy and fairness violation, though there is a trade-off between robustness and test accuracy. The unweighted test accuracy of our algorithm drops by at most 5%-10% on all datasets, suggesting that robustness comes at the expense of test accuracy on the original distribution. However, on the test set (which is typically obtained from the same source as the original training data), the difference in fairness violation between our method and the other methods is almost negligible on all datasets, except for the COMPAS dataset, where the difference is at most 12%. See the supplementary material for full details of this trade-off.

5 Conclusion and Future Work

In this work, we study the design of fair classifiers that are robust to weighted perturbations of the dataset. An immediate direction for future work is to consider robustness against a broader class of distributions, such as the set of distributions with a bounded f-divergence or Wasserstein distance from the training distribution. We also considered statistical notions of fairness, and it would be interesting to perform a similar fairness-versus-robustness analysis for other notions of fairness.

6 Broader Impact

This paper addresses a topic of societal interest that has received considerable attention in the NeurIPS community over the past several years. Two anticipated positive outcomes of our work are: (i) increased awareness of dataset robustness concerns when evaluating fairness criteria, and (ii) algorithmic techniques for coping with these robustness concerns. Thus, we believe researchers and practitioners who seek to develop classifiers that satisfy certain fairness criteria would be the primary beneficiaries of this work. A possible negative outcome of the work is the use of these techniques to declare some classifiers fair without proper consideration of the semantics of the fairness criteria or the underlying dataset used to evaluate them. Individuals subject to the decisions made by such classifiers could then be adversely affected. Finally, our experimental evaluations use real-world datasets, so the conclusions that we draw in our paper may have implications for individuals associated with these data.

Acknowledgments: We thank Shipra Agrawal and Roxana Geambasu for helpful preliminary discussions. DM was supported through a Columbia Data Science Institute Post-Doctoral Fellowship. DH was partially supported by NSF awards CCF-1740833 and IIS-15-63785, as well as a Sloan Fellowship. SJ was partially supported by NSF award CNS-18-01426.
References

[1] COMPAS dataset. https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis, 2019. Accessed: 2019-10-26.
[2] Soroosh Shafieezadeh Abadeh, Peyman Mohajerin Esfahani, and Daniel Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576-1584, 2015.
[3] Jacob Abernethy, Elad E. Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In 21st Annual Conference on Learning Theory, pages 263-273, 2008.
[4] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudik, John Langford, and Hanna Wallach. A reductions approach to fair classification. In International Conference on Machine Learning, pages 60-69, 2018.
[5] Daniel Alabi, Nicole Immorlica, and Adam Kalai. Unleashing linear optimizers for group-fair learning and optimization. In Conference on Learning Theory, pages 2043-2066, 2018.
[6] Aws Albarghouthi, Loris D'Antoni, Samuel Drews, and Aditya V. Nori. FairSquare: Probabilistic verification of program fairness. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1-30, 2017.
[7] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943, 2018.
[8] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341-357, 2013.
[9] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349-4357, 2016.
[10] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77-91, 2018.
[11] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems 30, pages 3992-4001, 2017.
[12] Robert S. Chen, Brendan Lucier, Yaron Singer, and Vasilis Syrgkanis. Robust optimization for non-convex objectives. In Advances in Neural Information Processing Systems, pages 4705-4714, 2017.
[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214-226, 2012.
[14] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259-268, 2015.
[15] Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 144-152. SIAM, 2016.
[16] Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. Fairness testing: Testing software for discrimination. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 498-510, 2017.
[17] Rui Gao and Anton J. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
[18] Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. The case for process fairness in learning: Feature selection for fair decision making. In NIPS Symposium on Machine Learning and the Law, volume 1, page 2, 2016.
[19] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315-3323, 2016.
[20] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1-33, 2012.
[21] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 643-650. IEEE, 2011.
[22] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564-2572, 2018.
[23] Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems, pages 656-666, 2017.
[24] Ron Kohavi. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In KDD, volume 96, pages 202-207, 1996.
[25] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066-4076, 2017.
[26] Hongseok Namkoong and John C. Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems, pages 2208-2216, 2016.
[27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[28] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems 30, pages 5680-5689. Curran Associates, Inc., 2017.
[29] Michael Redmond and Alok Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660-678, 2002.
[30] Shai Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University of Jerusalem, 2007.
[31] Saeed Sharifi-Malvajerdi, Michael Kearns, and Aaron Roth. Average individual fairness: Algorithms, generalization and experiments. In Advances in Neural Information Processing Systems, pages 8240-8249, 2019.
[32] Florian Tramer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin. FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P), pages 401-416. IEEE, 2017.
[33] Linda F. Wightman. LSAC national longitudinal bar passage study. Technical report, LSAC Research Report Series, 1998.
[34] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171-1180, 2017.
[35] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325-333, 2013.
[36] Junzhe Zhang and Elias Bareinboim. Fairness in decision-making: The causal explanation formula. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.