# Fair Classification with Adversarial Perturbations

L. Elisa Celis (Yale University), Anay Mehrotra (Yale University), Nisheeth K. Vishnoi (Yale University)

We study fair classification in the presence of an omniscient adversary that, given an η, is allowed to choose an arbitrary η-fraction of the training samples and arbitrarily perturb their protected attributes. The motivation comes from settings in which protected attributes can be incorrect due to strategic misreporting, malicious actors, or errors in imputation; and prior approaches that make stochastic or independence assumptions on errors may not satisfy their guarantees in this adversarial setting. Our main contribution is an optimization framework to learn fair classifiers in this adversarial setting that comes with provable guarantees on accuracy and fairness. Our framework works with multiple and non-binary protected attributes, is designed for the large class of linear-fractional fairness metrics, and can also handle perturbations besides protected attributes. We prove near-tightness of our framework's guarantees for natural hypothesis classes: no algorithm can have significantly better accuracy and any algorithm with better fairness must have lower accuracy. Empirically, we evaluate the classifiers produced by our framework for statistical rate on real-world and synthetic datasets for a family of adversaries.

1 Introduction

It is increasingly common to deploy classifiers to assist in decision-making in applications such as criminal recidivism [50], credit lending [21], and predictive policing [34]. Hence, it is imperative to ensure that these classifiers are fair with respect to protected attributes such as gender and race. Consequently, there has been extensive work on approaches for fair classification [32, 26, 28, 17, 63, 62, 48, 24, 27, 1, 13]. At a high level, a classifier f is said to be fair with respect to a protected attribute Z if it has a similar performance with respect to a given metric on different protected groups defined by Z. Given a fairness metric and a hypothesis class F, fair classification frameworks consider the problem of finding a classifier f ∈ F that maximizes accuracy constrained to being fair with respect to the given fairness metric (and Z) [8]. To specify fairness constraints, these approaches need protected attributes of training data to be known. However, protected attributes can be erroneous for various reasons; there could be uncertainties during the data collection or data cleaning process [20, 52], or the attributes could be strategically misreported [46]. Further, protected attributes may be missing entirely, as is often the case for racial and ethnic information in healthcare [20] or when data is scraped from the internet as with many image datasets [22, 66, 35]. In these cases, protected attributes can be imputed [18, 36, 16], but this can also introduce errors [12]; further, imputation by machine-learning-based methods is known to be fragile to imperceptible changes in the inputs [29] and to have correlated errors across samples [49]. Perturbations in protected attributes, regardless of origin, have been shown to have adverse effects on fair classifiers, affecting their performance on both accuracy and fairness metrics; see, e.g., [16, 7]. Towards addressing this problem, several recent works have developed fair classification algorithms for various models of errors in the protected attributes.
[44] consider an extension of the mutually contaminated learning model [53] where, instead of observing samples from the true joint distribution, the observed group-conditional distributions are stochastic mixtures of their true counterparts. [6] consider a binary protected attribute and Bernoulli perturbations that are independent of the labels (and of each other). [14] consider the setting where each sample's protected attribute is independently flipped to a different value with a known probability. [59] consider two approaches to deal with perturbations. In their soft-weights approach, they assume perturbations follow a fixed distribution and one has access to auxiliary data containing independent draws of both the true and perturbed protected attributes. In their distributionally robust approach, for each protected group, its feature and label distributions in the true data and the perturbed data are a known total variation distance away from each other. Finally, in an independent work, [40] study fair classification under the Malicious noise model [56, 39] in which a fraction of the training samples are chosen uniformly at random, and can then be perturbed arbitrarily.

Our perturbation model. We extend this line of work by studying fair classification under the following worst-case adversarial perturbation model: Given an η > 0, after the training samples are independently drawn from a true distribution D, the adversary with unbounded computational power sees all the samples and can use this information to choose any η-fraction of the samples and perturb their protected attributes arbitrarily. This model is a straightforward adaptation of the perturbation model of [31] to the fair classification setting and we refer to it as the η-Hamming model. Unlike perturbation models studied before, this model can capture settings where the perturbations are strategic or arbitrarily correlated, as can arise in the data collection stage or during imputation of the protected attributes, and in which the errors cannot be estimated using auxiliary data. In fact, under this perturbation model, the classifiers output by prior works can violate the fairness constraints by a large amount or have an accuracy that is significantly lower than the accuracy of f*; see Section 5 and Supplementary Material D.2. Taking these perturbed samples, a fairness metric Ω, and a desired fairness threshold τ as input, the goal is to learn a classifier f with the maximum accuracy with respect to the true distribution D subject to having a fairness value, Ω_D(f), of at least τ with respect to the true distribution D.

Our contributions. We present an optimization framework (Definition 4.1) that outputs fair classifiers for the η-Hamming model and comes with provable guarantees on accuracy and fairness (Theorem 4.3). Our framework works for multiple and non-binary protected attributes, and the large class of linear-fractional fairness metrics (that capture most fairness metrics studied in the literature); see Definition 3.1 and [13]. The framework provably outputs a classifier whose accuracy is within 2η of the accuracy of f* and which violates the fairness constraint by at most O(η/λ) additively (Theorem 4.3), under the mild assumption that the performance of f* on each protected group is larger than a known constant λ > 0 (Assumption 1). Assumption 1 is drawn from the work of [14] for fair classification with stochastic perturbations.
While it is not clear if the assumption is necessary in their model, we show that Assumption 1 is necessary for fair classification in the η-Hamming model: If λ is not bounded away from 0, then no algorithm can give a non-trivial guarantee on both accuracy and fairness value of the output classifier (Theorem 4.4). Moreover, we prove the near-tightness of our framework s guarantee under Assumption 1: No algorithm can guarantee to output a classifier with accuracy closer than η to that of f and any algorithm that violates the fairness constraint by less than η/(20λ) additively has an accuracy at most 19/20 (Theorems 4.5 and A.21). Finally, we also extend our framework s guarantees to the Nasty Sample Noise model (Supplementary Material A.1.5). The Nasty Sample Noise model is a generalization of the η-Hamming model, which was studied by [11] in the context of PAC learning (without any fairness considerations), where the adversary can choose any η-fraction of the samples, and can arbitrarily perturb both their labels and features. We implement our framework for logistic loss function with linear classifiers and evaluate its performance on COMPAS [3], Adult [23], and a synthetic dataset (Section 5). We generate perturbations of these datasets admissible in the η-Hamming model and compare the performance of our approach to several baselines [44, 6, 59, 14, 40] with statistical rate and false-positive rate as fairness metrics.1 On the synthetic dataset, we compare against a method developed for fair classification under stochastic perturbations [14] and demonstrate the comparative strength of the η-Hamming model; our results show that [14] s framework achieves a significantly lower accuracy than our framework for the same statistical rate. Empirical results on COMPAS and Adult show that the classifier output by our framework can attain better statistical rate and false-positive rate than the accuracy maximizing classifier on the true distribution, with a small loss in accuracy. Further, our framework has a similar (or better) fairness-accuracy trade-off compared to all baselines we consider in a variety of settings, and is not dominated by any other approach (Figure 1 and Figures 7 and 8 in Supplementary Material E.2). 1Let qℓ(f, SR) (respectively qℓ(f, FPR)) be the fraction of positive predictions (respectively false-positive predictions) by f in the ℓ-th protected group. f s statistical rate (respectively false-positive rate) is the ratio of the minimum value to the maximum value of qℓ(f, SR) (respectively qℓ(f, FPR)) over all protected groups. Techniques. The starting point of our optimization framework (Definition 4.1) is the standard optimization program for fair classification in the absence of any perturbations: Given a fairness metric Ωand a desired fairness threshold τ as input, find f F that maximizes the accuracy on the given data b S constrained to a fairness value at least τ on the given data. However, when b S is given to us by an η-Hamming adversary, this standard program, which imposes the fairness constraints with respect to the perturbed data b S, may output a classifier with an accuracy/fairness-value worse than that of f when measured with respect to D. But, observe that the difference in accuracies of a classifier when measured with respect to the given data b S and data sampled from D is at most η. 
Thus, if f F is feasible for the standard optimization program, this observation (used twice) implies that the accuracy of the output classifier measured with respect to D is within 2η of the accuracy of f measured with respect D (Equation (8)). However, without any modifications, the classifier output by the standard optimization program could still have a fairness value much lower than τ with respect to D (see Example A.27). To bypass this, we introduce the notion of s-stability that allows us to lower bound the fairness value of a classifier with respect to D given its fairness value on b S. Roughly, f F is said to be s-stable with respect to a fairness metric if for any b S that is generated by an η-Hamming adversary, the ratio of fairness value of f with respect to D and with respect to b S is between s and 1/s (see Definition 4.7). It follows that any s-stable classifier that has fairness value τ > 0 with respect to b S, has fairness value at least s τ with respect to D. Hence, an optimization program that ensures that all feasible classifiers are s-stable (for a suitable choice of s) and have fairness value at least τ > 0 with respect to b S, comes with a guarantee that any feasible classifier has a fairness value at least s τ (with respect to D). If such an optimization program could further ensure that f is feasible for it, then by arguments presented above, the classifier output by this optimization program would satisfy required guarantees on both fairness and accuracy (Lemma 4.9). The issue is that, to directly enforce s-stability, one needs to compute the fairness values of classifiers with respect to D, but this is not possible in the absence of samples from D. We overcome this by present a proxy constraint on the classifier (Equation (5)) that involves only b S and ensures that any classifier that satisfies it is s-stable. Moreover, f satisfies this constraint under Assumption 1. Overall, modifying Program (2) to include this constraint (Equation (5)) with a suitable value of s, and setting an appropriate fairness threshold τ so that f remains feasible, leads us to our framework. 2 Related work In this section, we situate this paper in relation to lines of work which also consider fair classification with perturbed protected attributes; additional related work (e.g., on fair classification in the absence of protected attributes) are presented in Supplementary Material D.1. [44] give a framework which comes with provable guarantees on the accuracy and fairness value of output classifiers for a binary protected attribute and either statistical rate or equalized-odds fairness metrics. [6] identify conditions on the distribution of perturbations under which the post-processing algorithm of [32] improves the fairness value of the accuracy-maximizing classifier with respect to equalized-odds on the true distribution with a binary protected attribute. [59] consider a nonbinary protected attribute. In their soft-weights approach, they give provable guarantees on the accuracy (with respect to f ) and fairness value of the output classifier in expectation and in their distributionally robust approach, they give provable guarantees on the fairness value of the output classifiers.2 [14] give provable guarantees on the accuracy and fairness value of output classifiers for multiple non-binary protected attributes and the class of linear-fractional metrics. 
All of the aforementioned works [44, 6, 59, 14] consider stochastic perturbation models, which are weaker than the model considered in this paper. Further, compared to [44, 6], our approach (and that of [14]) can handle multiple categorical protected attributes and multiple linear-fractional metrics (which include statistical rate and can ensure equalized-odds constraints). Compared to [6, 59], our work (and those of [44, 14]) gives provable guarantees on the accuracy (with respect to f*) and fairness value of output classifiers with high probability. In another related work, [40] give an algorithm for a binary protected attribute which, under the realizable assumption (i.e., assuming there exists a classifier with perfect accuracy), outputs a classifier with guarantees on accuracy and fairness value with respect to the true-positive rate fairness metric. They study the Malicious noise model, which can modify a uniformly randomly selected subset of samples arbitrarily; this is weaker than the Nasty Sample Noise model [11, 4], and hence, than the model considered in this paper. Further, our framework works without the realizable assumption (i.e., in the agnostic setting), can handle multiple and non-binary protected attributes, and can ensure fairness with respect to multiple linear-fractional metrics (which include true-positive rate).

2Supplementary Material D.2.3 gives an example where [59]'s distributionally robust approach outputs a classifier whose accuracy is arbitrarily close to 1/2.

Another line of work has studied PAC learning in the presence of adversarial (and stochastic) perturbations in the data, without considerations of fairness [39, 2, 11, 15, 5]; see also [4]. In particular, [11] study PAC learning (without fairness constraints) under the Nasty Sample Noise model. They use the empirical risk minimization framework (see, e.g., [54]) run on the perturbed samples to output a classifier. Our framework, Program (Err Tolerant), finds empirical risk minimizing classifiers that satisfy fairness constraints on the perturbed data, and that are also stable for the given fairness metric. While both frameworks show that the accuracy of the respective output classifiers is within 2η of the respective optimal classifiers when the data is unperturbed, the optimal classifiers can be quite different. For instance, while [11]'s framework is guaranteed to output a classifier with high accuracy, it can perform poorly on fairness metrics; see Section 5 and Supplementary Material D.2.1.

Let the data domain be D := X × {0, 1} × [p], where X is the set of non-protected features, {0, 1} is the set of binary labels, and [p] is the set of p protected attributes. Let D be a distribution over D. Let F ⊆ {0, 1}^{X×[p]} be a hypothesis class of binary classifiers. For f ∈ F, let Err_D(f) := Pr_{(X,Y,Z)∼D}[f(X, Z) ≠ Y] denote f's predictive error on draws from D. In the vanilla classification problem, the learner L's goal is to find a classifier with minimum error: argmin_{f∈F} Err_D(f). In the fair classification problem, the learner is restricted to pick classifiers that have a similar performance conditioned on Z = ℓ for all ℓ ∈ [p]. We consider the following class of metrics.

Definition 3.1 (Linear/linear-fractional metrics [13]). Given f ∈ F and two events E(f) and E′(f) that can depend on f, define the performance of f on Z = ℓ (for ℓ ∈ [p]) as q_ℓ(f) := Pr_D[E(f) | E′(f), Z = ℓ]. If E′ depends on f, then q_ℓ(f) is said to be linear-fractional, otherwise linear.
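For concreteness, the following is a minimal sketch (ours, not the paper's code) of how the group performances q_ℓ and the induced fairness metric Ω in Equation (1) below can be estimated from a finite sample. The three instantiations match the statistical-rate, false-positive-rate, and false-discovery-rate examples discussed next; the array-based interface is an assumption made for illustration.

```python
import numpy as np

def group_performance(pred, y, z, group, metric="SR"):
    """Estimate q_l(f) = Pr[E(f) | E'(f), Z = l] on a sample, for three metrics
    from Definition 3.1 (SR: statistical rate, FPR: false-positive rate,
    FDR: false-discovery rate)."""
    in_group = (z == group)
    if metric == "SR":      # E: f = 1,  E': trivial event      (linear)
        E, Ep = (pred == 1), np.ones_like(pred, dtype=bool)
    elif metric == "FPR":   # E: f = 1,  E': Y = 0              (linear)
        E, Ep = (pred == 1), (y == 0)
    elif metric == "FDR":   # E: Y = 0,  E': f = 1              (linear-fractional: E' depends on f)
        E, Ep = (y == 0), (pred == 1)
    else:
        raise ValueError(metric)
    cond = Ep & in_group
    return (E & cond).sum() / max(cond.sum(), 1)

def fairness_value(pred, y, z, metric="SR"):
    """Omega(f) = min_l q_l(f) / max_l q_l(f), as in Equation (1)."""
    qs = [group_performance(pred, y, z, g, metric) for g in np.unique(z)]
    return min(qs) / max(qs) if max(qs) > 0 else 0.0
```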
Definition 3.1 captures most of the performance metrics considered in the literature. For instance, for E := (f = 1) and E′ := the trivial event that always holds, we get statistical rate (a linear metric).3 For E := (f = 1) and E′ := (Y = 0), we get false-positive rate (also a linear metric). For E := (Y = 0) and E′ := (f = 1), we get false-discovery rate (a linear-fractional metric). Given a performance metric q, the corresponding fairness metric is defined as

Ω_D(f) := min_{ℓ∈[p]} q_ℓ(f) / max_{ℓ∈[p]} q_ℓ(f). (1)

When D is the empirical distribution over samples S, we use Ω(f, S) to denote Ω_D(f). The goal of fair classification, given a fairness metric Ω and a threshold τ ∈ (0, 1], is to (approximately) solve:

min_{f∈F} Err_D(f)  s.t.  Ω_D(f) ≥ τ. (2)

If samples from D are available, then one could try to solve this program. However, as discussed in Section 1, we do not have access to the true protected attribute Z, but instead only see a perturbed version, Ẑ ∈ [p], generated by the following adversary.

3We overload the notation f to denote both the classifier as well as its prediction, and the terms statistical rate and false-positive rate to refer to both the linear/linear-fractional metric q and the resulting fairness metric Ω.

η-Hamming model. Given an η ∈ [0, 1], let A(η) denote the set of all adversaries in the η-Hamming model. Any adversary A ∈ A(η) is a randomized algorithm with unbounded computational resources that knows the true distribution D and the algorithm of the learner L. In this model, the learner L queries A for N ∈ ℕ samples from D exactly once. On receiving the request, A draws N independent samples S := {(x_i, y_i, z_i)}_{i∈[N]} from D; then A uses its knowledge of D and L to choose an arbitrary set of ηN samples and perturb their protected attributes arbitrarily to generate Ŝ := {(x_i, y_i, ẑ_i)}_{i∈[N]}. Finally, A gives these perturbed samples Ŝ to L.

Learning model. Given Ŝ and η, the learner L would like to (approximately) solve Program (2).

Definition 3.2 ((ε, ν)-learning). Given bounds on error ε ∈ (0, 1) and constraint violation ν ∈ (0, 1), a learner L is said to (ε, ν)-learn a hypothesis class F ⊆ {0, 1}^{X×[p]} with perturbation rate η ∈ [0, 1] and confidence δ ∈ (0, 1) if, for all distributions D over X × {0, 1} × [p] and adversaries A ∈ A(η), there exists a threshold N₀(ε, ν, δ, η) ∈ ℕ such that, with probability at least 1 − δ over the draw of N ≥ N₀(ε, ν, δ, η) iid samples S ∼ D, given η and the perturbed samples Ŝ := A(S), L outputs f ∈ F that satisfies Err_D(f) − Err_D(f*) ≤ ε and Ω_D(f) ≥ τ − ν, where f* is the optimal solution of Program (2) (i.e., f* := argmin_{f∈F} Err_D(f) s.t. Ω_D(f) ≥ τ).

Given a finite number of perturbed samples, Definition 3.2 requires the learner to output a classifier that violates the fairness constraint additively by at most ν and whose predictive error is at most ε larger than that of f*, with probability at least 1 − δ. Like PAC learning [56], for a given hypothesis class F, Definition 3.2 requires the learner to succeed on all distributions D.

Problem 1 (Fair classification with adversarial perturbations). Given a hypothesis class F ⊆ {0, 1}^{X×[p]}, a fairness metric Ω, a threshold τ ∈ [0, 1], a perturbation rate η ∈ [0, 1], and perturbed samples Ŝ, the goal is to (ε, ν)-learn F for small ε, ν ∈ (0, 1).

4 Theoretical results

In this section, we present our results on learning fair classifiers under the η-Hamming model. Our optimization framework (Program (Err Tolerant)) is a careful modification of Program (2).
The main difficulty is that, unlike Program (2), it only has access to the perturbed samples Ŝ, and the ratio of a classifier's fairness with respect to the true distribution D and with respect to Ŝ can be arbitrarily small (see Example A.27 in Supplementary Material A.5). To overcome this, our framework ensures that all feasible classifiers are stable (Definition 4.7). Then, as mentioned in Section 1, imposing the fairness constraint on Ŝ guarantees (approximate) fairness on the true distribution D. The accuracy guarantee follows by ensuring that the optimal solution of Program (2), f* ∈ F, is feasible for our framework. To ensure this, we require Assumption 1, which also appeared in [14].

Assumption 1. There is a known constant λ > 0 such that min_{ℓ∈[p]} Pr_D[E(f*), E′(f*), Z = ℓ] ≥ λ.

It can be shown that this assumption implies that λ is also a lower bound on the performances q_1(f*), ..., q_p(f*) that depend on E and E′. We expect λ to be a non-vanishing positive constant in applications. For example, if q is statistical rate, the minority protected group makes up at least 20% of the population (i.e., min_{ℓ∈[p]} Pr_D[Z = ℓ] ≥ 0.2), and for all ℓ ∈ [p], Pr[f* = 1 | Z = ℓ] ≥ 1/2, then λ ≥ 0.1. In practice, λ is not known exactly, but it can be set based on the context (e.g., see Section 5 and [14]). We show that Assumption 1 is necessary for the η-Hamming model (see Theorem 4.4).

Definition 4.1 (Error-tolerant program). Given a fairness metric Ω and corresponding events E and E′ (as in Definition 3.1), a perturbation rate η ∈ [0, 1], and constants λ, Δ ∈ (0, 1], we define the error-tolerant program for perturbed samples Ŝ, whose empirical distribution is D̂, as

min_{f∈F} Err_{D̂}(f), (Err Tolerant) (3)
s.t.  Ω(f, Ŝ) ≥ τ · ((1 − (η + Δ)/λ) / (1 + (η + Δ)/λ))², (4)
∀ℓ ∈ [p]:  Pr_{D̂}[E(f), E′(f), Ẑ = ℓ] ≥ λ − η − Δ. (5)

Δ acts as a relaxation parameter in Program (Err Tolerant), which can be fixed in terms of the other parameters; see Theorem 4.3. Equation (4) ensures all feasible classifiers satisfy fairness constraints with respect to the perturbed samples Ŝ. Equation (5) ensures that all feasible classifiers are (1 − O(η/λ))-stable (see Definition 4.7). As mentioned in Section 1, this suffices to ensure that all feasible classifiers are fair with respect to S. Finally, to ensure the accuracy guarantee, the thresholds in the RHS of Equations (4) and (5) are carefully tuned to ensure that f* is feasible for Program (Err Tolerant); see Lemma 4.9. We refer the reader to the proof overview of Theorem 4.3 at the end of this section for further discussion of Program (Err Tolerant). Before presenting our result we require the definition of the Vapnik–Chervonenkis (VC) dimension.

Definition 4.2. Given a finite set A, define the collection of subsets F_A := {{a ∈ A | f(a) = 1} | f ∈ F}. We say that F shatters a set B if |F_B| = 2^{|B|}. The VC dimension of F, VC(F) ∈ ℕ, is the largest integer such that there exists a set C of size VC(F) that is shattered by F.

Our first result bounds the accuracy and fairness of an optimal solution f_ET of Program (Err Tolerant) for any hypothesis class F with a finite VC dimension using O(VC(F)) samples.

Theorem 4.3 (Main result). Suppose Assumption 1 holds with constant λ > 0 and F has VC dimension d ∈ ℕ.
Then, for all perturbation rates η ∈ (0, λ/2), fairness thresholds τ ∈ (0, 1], bounds on error ε > 2η and constraint violation ν > 8ητ/(λ − 2η), and confidence parameters δ ∈ (0, 1), with probability at least 1 − δ, the optimal solution f_ET ∈ F of Program (Err Tolerant) with parameters η, λ, and Δ := O(min{ε − 2η, ν − 8ητ/(λ − 2η), λ − 2η}), and N = poly(d, 1/Δ, log(p/δ)) perturbed samples from the η-Hamming model, satisfies Err_D(f_ET) − Err_D(f*) ≤ ε and Ω_D(f_ET) ≥ τ − ν.

Thus, Theorem 4.3 shows that any procedure that outputs f_ET, given a sufficiently large number of perturbed samples, (ε, ν)-learns F for any ε > 2η and ν = O(ητ/λ). Theorem 4.3 can be extended to provably satisfy multiple linear-fractional metrics (at the same time) and work for multiple non-binary protected attributes; see Theorem B.2 in Supplementary Material B.1. Moreover, Theorem 4.3 also holds for the Nasty Sample Noise model. The proof of this result is implicit in the proof of Theorem 4.3; we present the details in Supplementary Material A.1.5. Finally, Program (Err Tolerant) only requires an estimate of one parameter, λ. (Since η is known, τ is fixed by the user, and Δ can be set in terms of the other parameters.) If for each ℓ ∈ [p] we also have estimates of λ_ℓ := Pr_D[E(f*), E′(f*), Z = ℓ] and γ_ℓ := Pr_D[E′(f*), Z = ℓ], then we can use this information to tighten Program (Err Tolerant) to the following program:

min_{f∈F} Err_{D̂}(f), (Err Tolerant+) (6)
s.t.  Ω(f, Ŝ) ≥ τ · s,
∀ℓ ∈ [p]:  Pr_{D̂}[E(f), E′(f), Ẑ = ℓ] ≥ λ_ℓ − η − Δ,

where the scaling parameter s ∈ [0, 1] is the solution of the following optimization program:

min_{η₁, η₂, ..., η_p ≥ 0}  min_{ℓ,k ∈ [p]}  [(1 − η_ℓ/λ_ℓ) / (1 + (η_k − η_ℓ)/γ_ℓ)] · [(1 + (η_ℓ − η_k)/γ_k) / (1 + η_ℓ/λ_k)],  s.t.  Σ_{ℓ∈[p]} η_ℓ ≤ η + Δ. (7)

If the classifiers in F do not use the protected attributes for prediction, then we can show that Program (Err Tolerant+) has a fairness guarantee of (1 − s) + 4ητ/(λ − 2η) (which is always smaller than 8ητ/(λ − 2η)) and an accuracy guarantee of 2η. We prove this result in Supplementary Material B.2. Thus, in applications where one can estimate λ_1, ..., λ_p and γ_1, ..., γ_p, Program (Err Tolerant+) offers a better fairness guarantee than Program (Err Tolerant) (up to constants). The proof of Theorem 4.3 appears in Supplementary Material A.1.

As for computing f_ET, note that in general, Program (Err Tolerant) is a nonconvex optimization problem. In our simulations, we use the standard solver SLSQP in SciPy [57] to heuristically find f_ET; see Supplementary Material E.1. Theoretically, for any arbitrarily small α > 0, the techniques from [13] can be used to find an f ∈ F that has the optimal objective value for Program (Err Tolerant) and that additively violates its fairness constraint (4) by at most α, by solving a set of O(1/(λα)) convex programs; details appear in Supplementary Material C.

Impossibility results. We now present results complementing the guarantees of Theorem 4.3.

Theorem 4.4 (No algorithm can guarantee high accuracy and fairness without Assumption 1). For all perturbation rates η ∈ (0, 1], thresholds τ ∈ (1/2, 1), confidence parameters δ ∈ [0, 1/2), and bounds on the error ε ∈ [0, 1/2) and constraint violation ν ∈ [0, τ − 1/2), if the fairness metric is statistical rate, then it is impossible to (ε, ν)-learn any hypothesis class F ⊆ {0, 1}^{X×[p]} that shatters a set of 6 points of the form {x_A, x_B, x_C} × [2] ⊆ X × [p] for some distinct x_A, x_B, x_C ∈ X.
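Returning briefly to the computational remark above (the experiments solve the program heuristically with SciPy's SLSQP), the following is a rough sketch of one way Program (Err Tolerant) can be set up for linear classifiers with the logistic loss and statistical rate. This is our illustration, not the authors' implementation: in particular, the hard predictions 1[w·x > 0] are replaced by a sigmoid surrogate so that the constraints are differentiable, and `delta` stands in for the relaxation parameter Δ.

```python
import numpy as np
from scipy.optimize import minimize

def err_tolerant_sketch(X, y, z_hat, tau, eta, lam, delta=0.01, temp=20.0):
    """Rough sketch of Program (Err Tolerant) for a linear classifier, logistic
    loss, and statistical rate. Hard predictions are smoothed by a sigmoid so
    that SLSQP can handle the constraints (an assumption of this sketch)."""
    groups = np.unique(z_hat)
    d = X.shape[1]

    def soft_pred(w):                      # smooth stand-in for 1[w.x > 0]
        return 1.0 / (1.0 + np.exp(-temp * (X @ w)))

    def objective(w):                      # surrogate for the empirical error in (3)
        return np.mean(np.logaddexp(0.0, -(2 * y - 1) * (X @ w)))

    def group_rate(w, g):                  # q_g(f): positive-prediction rate in group g
        return soft_pred(w)[z_hat == g].mean()

    def group_mass(w, g):                  # Pr[f = 1, Z_hat = g], used in (5)
        return np.mean(soft_pred(w) * (z_hat == g))

    x_ = (eta + delta) / lam
    tau_scaled = tau * ((1 - x_) / (1 + x_)) ** 2   # right-hand side of constraint (4)

    constraints = [{
        "type": "ineq",                     # (4): min_g q_g >= tau_scaled * max_g q_g
        "fun": lambda w: (min(group_rate(w, g) for g in groups)
                          - tau_scaled * max(group_rate(w, g) for g in groups)),
    }]
    for g in groups:                        # (5): enough positive mass in each group
        constraints.append({
            "type": "ineq",
            "fun": (lambda w, g=g: group_mass(w, g) - (lam - eta - delta)),
        })

    res = minimize(objective, x0=np.zeros(d), method="SLSQP",
                   constraints=constraints, options={"maxiter": 200})
    return res.x
```

The sigmoid temperature and the surrogate loss are implementation choices of this sketch; Theorem 4.3 is stated for the exact, unsmoothed program, so a solution found this way should be viewed as a heuristic, as the paper itself notes.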
Suppose that τ = 0.8, say to encode the 80% rule. Then, Theorem 4.4 shows that for any η > 0, any F satisfying the condition in Theorem 4.4 is not (ε, ν)-learnable for any ε < 1/2 and ν < τ − 1/2 = 3/10. Intuitively, the condition on F avoids simple hypothesis classes. It is similar to the conditions considered by works on PAC learning with adversarial perturbations [11, 39], and holds for common hypothesis classes such as decision trees and SVMs (Remark A.28 in Supplementary Material A.5). Thus, even if η is vanishingly small, without additional assumptions, any F satisfying mild assumptions is not (ε, ν)-learnable for any ε < 1/2 and ν < 3/10, justifying Assumption 1. The proof of Theorem 4.4 appears in Supplementary Material A.2.

Theorem 4.5 (Fairness guarantee of Theorem 4.3 is optimal up to a constant factor). For all perturbation rates η ∈ (0, 1], confidence parameters δ ∈ [0, 1/2), and a (known) constant λ ∈ (0, 1/4], if the fairness metric is statistical rate and τ = 1, then, given the promise that Assumption 1 holds with constant λ, for any bounds ε < 1/4 − 2η/5 and ν < (η/(10λ))·(1 − 4λ) − O(η²/λ²), it is impossible to (ε, ν)-learn any hypothesis class F ⊆ {0, 1}^{X×[p]} that shatters a set of 10 points of the form {x_A, x_B, x_C, x_D, x_E} × [2] ⊆ X × [p] for some distinct x_A, x_B, x_C, x_D, x_E ∈ X.

Suppose that λ < 1/8 and η < 1/2. Then Theorem 4.5 shows that for any η > 0, any learner L that has a fairness guarantee ν < η/(20λ) − O(η²/λ²) must have a poor error bound, of at least ε ≥ 1/4 − 2η/5 ≥ 1/20, to (ε, ν)-learn any F that satisfies a mild assumption. When η/λ is small, this shows that any learner with a fairness guarantee ν = o(η/λ) must have an error guarantee of at least 1/4 − 2η/5, which is much larger than 2η. Thus, Theorem 4.5 shows that one cannot improve the fairness guarantee in Theorem 4.3 by more than a constant amount without deteriorating the error guarantee from 2η to 1/4 − 2η/5. Like Theorem 4.4, the condition on F in Theorem 4.5 avoids simple hypothesis classes and holds for common hypothesis classes (Remark A.28 in Supplementary Material A.5). Finally, complementing our accuracy guarantee, we prove that for any ε < η, no algorithm can (ε, ν)-learn any hypothesis class F satisfying mild assumptions (Theorem A.21 in Supplementary Material A.4); its proof appears in Supplementary Material A.4. Thus, the accuracy guarantee in Theorem 4.3 is optimal up to a factor of 2. The proof of Theorem 4.5 appears in Supplementary Material A.3.

Proof overview of Theorem 4.3. We explain the key ideas behind Program (Err Tolerant) and how they connect with the proof of Theorem 4.3. Our goal is to construct error-tolerant constraints using the perturbed samples Ŝ such that the classifier f_ET, which has the smallest error on Ŝ subject to satisfying these constraints, has accuracy 2η-close to that of f* and additively violates the fairness constraints by at most O(η/λ).

Step 1: Lower bound on the accuracy of f_ET. This step relies on Lemma 4.6.

Lemma 4.6. For any bounded function g: {0, 1}² × [p] → [0, 1], δ, Δ ∈ (0, 1), and adversaries A ∈ A(η), given N = poly(1/Δ, VC(F), log(1/δ)) true samples S ∼ D and corresponding perturbed samples A(S) := {(x_i, y_i, ẑ_i)}_{i∈[N]}, with probability at least 1 − δ, it holds that, for all f ∈ F, (1/N)·Σ_{i∈[N]} g(f(x_i, ẑ_i), y_i, ẑ_i) ≤ E_{(X,Y,Z)∼D}[g(f(X, Z), Y, Z)] + Δ + η.

The proof of Lemma 4.6 follows from generalization bounds for bounded functions (e.g., see [54]) and because the η-Hamming model perturbs at most ηN samples.
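The observation that at most an η-fraction of [0,1]-bounded terms can change is easy to check numerically; below is a small illustration with made-up data (our toy setup, not an experiment from the paper).

```python
import numpy as np

rng = np.random.default_rng(1)
N, eta = 20_000, 0.05

# g-values on the true samples: any [0,1]-bounded quantity, e.g. the 0-1 losses
# of some fixed classifier.
g_true = rng.random(N).round()

# An omniscient adversary may alter the g-values of at most eta*N samples
# (by rewriting their protected attributes); each altered term stays in [0, 1].
g_pert = g_true.copy()
idx = np.argsort(g_true)[: int(eta * N)]   # e.g. turn currently-correct samples into errors
g_pert[idx] = 1.0

# The empirical mean moves by at most eta; combined with a uniform-convergence
# bound over F, this is exactly the (Delta + eta) slack in Lemma 4.6.
print(g_true.mean(), g_pert.mean(), abs(g_pert.mean() - g_true.mean()) <= eta)
```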
Let g be the 0-1 loss (i.e., g(ỹ, y, z) := I[ỹ ≠ y]); then, for all f ∈ F, Lemma 4.6 shows that the error of f on samples drawn from D and on the samples in Ŝ are close: |Err_D(f) − Err(f, Ŝ)| ≤ Δ + η. Thus, intuitively, minimizing Err(f, Ŝ) could be a good strategy to minimize Err_D(f). Then, if f* is feasible for Program (Err Tolerant), we can bound the error of f_ET: Since f_ET is optimal for Program (Err Tolerant), its error on Ŝ is at most the error of f* on Ŝ. Using this and applying Lemma 4.6 we get that

Err_D(f_ET) ≤ Err(f_ET, Ŝ) + η + Δ ≤ Err(f*, Ŝ) + η + Δ ≤ Err_D(f*) + 2(η + Δ). (8)

Step 2: Lower bound on the fairness of f_ET. One could try to bound the fairness of f_ET using the same approach as Step 1, i.e., show that for all f ∈ F: |Ω_D(f) − Ω(f, Ŝ)| ≤ O(η/λ). Then ensuring that f has a high fairness on Ŝ implies that it also has high fairness on S (up to an O(η/λ) factor). However, such a bound does not hold for any F satisfying mild assumptions (see Example A.27). The first idea is to prove a similar (in fact, stronger multiplicative) bound on a specifically chosen subset of F (consisting of stable classifiers). Toward this, we define:

Definition 4.7. A classifier f ∈ F is said to be s-stable for fairness metric Ω if, for all adversaries A ∈ A(η) and confidence parameters δ ∈ (0, 1), given polylog(1/δ) samples S ∼ D, with probability at least 1 − δ, it holds that Ω_D(f)/Ω(f, Ŝ) ∈ [s, 1/s], where Ŝ := A(S).

If an s-stable classifier f has fairness τ on Ŝ, then it has a fairness at least τ·s on D with high probability. Thus, if we have a condition such that any feasible f ∈ F satisfying this condition is s-stable, then any classifier satisfying this condition and the fairness constraint Ω(·, Ŝ) ≥ τ/s must have a fairness at least τ on D with high probability. The key idea is coming up with such constraints.

Lemma 4.8. Any classifier f ∈ F that satisfies min_{ℓ∈[p]} Pr_{D̂}[E(f), E′(f), Ẑ = ℓ] ≥ λ + η + Δ is ((1 − (η + Δ)/λ)/(1 + (η + Δ)/λ))²-stable for the fairness metric Ω (defined by events E and E′).

Step 3: Requirements for the error-tolerant program. Building on Steps 1 and 2, we prove:

Lemma 4.9. If the following conditions hold, then Err_D(f_ET) − Err_D(f*) ≤ 2η and Ω_D(f_ET) ≥ τ − O(η/λ): (C1) f* is feasible for Program (Err Tolerant), and all f ∈ F feasible for Program (Err Tolerant) are (C2) s-stable for s = 1 − O(η/λ), and (C3) satisfy Ω(f, Ŝ) ≥ τ·(1 − O(η/λ)).

Thus, it suffices to find error-tolerant constraints that satisfy conditions (C1) to (C3). Condition (C3) can be satisfied by adding the constraint Ω(·, Ŝ) ≥ τ′, for τ′ = τ·(1 − O(η/λ)). From Lemma 4.8, condition (C2) follows by using the constraint min_{ℓ∈[p]} Pr_{D̂}[E(f), E′(f), Ẑ = ℓ] ≥ λ′, for λ′ ∈ Θ(λ). It remains to pick τ′ and λ′ such that condition (C1) also holds. The tension in setting τ′ and λ′ is that if they are too large then condition (C1) does not hold, and if they are too small, then conditions (C2) and (C3) do not hold. In the proof we show that τ′ := τ·((1 − (η + Δ)/λ)/(1 + (η + Δ)/λ))² and λ′ := λ − η − Δ suffice to satisfy conditions (C1) to (C3) (this is where we use Assumption 1). Overall, the main technical idea is to identify the notion of s-stable classifiers and sufficient conditions for a classifier to be s-stable; combining these conditions with the fairness constraints on Ŝ ensures that f_ET has high fairness on S, and carefully tuning the thresholds so that f* is likely to be feasible for Program (Err Tolerant) ensures that f_ET has an accuracy close to f*.

Proof overviews of Theorems 4.4 and 4.5.
Our proofs are inspired by [39, Theorem 1] and [11, Theorem 1] which consider PAC learning with adversarial corruptions. In both Theorems 4.4 and 4.5, for some ε, ν [0, 1], the goal is to show that given samples perturbed by an η-Hamming adversary, under some additional assumptions, no learner L can output a classifier that has accuracy ε-close to the accuracy of f and that additively violates the fairness constraints by at most ν. Say a classifier f F is good if it satisfies these required guarantees. The approach is to construct two or more distributions D1, D2, . . . , Dm that satisfy the following conditions: (C1) For any ℓ, k, given a iid draw S from Dℓ, an η-Hamming adversary can add perturbations such that with high probability b S is distributed according to iid samples from Dk. Thus L, who only sees b S, with high probability, cannot identify the original distribution of S and is forced to output a classifier that is good for all D1, . . . , Dm. The next condition ensures that this is not possible. (C2) No classifier f F is good for all D1, . . . , Dm, and for each Di (i [m]), there is at least one good classifier fi F. (The latter-half ensures that the fairness and accuracy requirements are not vacuously satisfied.) Thus, for every L there is a distribution in D1, . . . , Dm for which L outputs a bad classifier. (Note that even if the learner is randomized, it must fail with probability at least 1/m.) Finally, the assumptions on F ensure that condition (C2) is satisfiable. For instance, if F has less than m hypothesis, then condition (C2) cannot be satisfied. The key idea in the proofs is to come up with distributions satisfying the above conditions. [39, 11] follow the same outline in the context of PAC learning, however, as we also consider fairness constraints, our constructions end up being very different from their constructions. Our constructions are specific to the statistical rate fairness metric. However, one can still apply the general approach outlined above to other fairness metrics by constructing a suitable set of distributions. Full details appear in Supplementary Materials A.2 and A.3. 5 Empirical results We implement our approach using the logistic loss function with linear classifiers and evaluate its performance on real world and synthetic data. Metrics and baselines. The selection of an appropriate fairness metric is context-dependent and beyond the scope of this work [55]; for illustrative purposes we (arbitrarily) consider the statistical rate (SR) and compare an implementation of our framework (Program (Err Tolerant+)), Err-Tol, with stateof-the-art fair classification frameworks for statistical rate under stochastic perturbations: LMZV [44] and CHKV [14]. LMZV and CHKV take parameters δL, τ [0, 1] as input; these parameters control the desired fairness, where decreasing δL or increasing τ increases the desired fairness. We also compare against KL [40], which controls for true-positive rate (TPR) in the presence of a Malicious adversary, and AKM [6] that is the post-processing method of [32] and controls for equalized-odds fairness constraints. We also compare against the optimal unconstrained classifier, Uncons; this is the same as [11] s algorithm for PAC-learning in the Nasty Sample Noise Model without fairness constraints. We provide additional comparisons using our framework with false-positive rate as the fairness metric with additional baselines and using the Adult data [23] in Supplementary Material E. Implementation details. 
We use a randomly generated 70-30 train (S) / test (T) split of the data, and generate the perturbed data Ŝ from S for a (known) perturbation rate η. We train each algorithm on Ŝ, and report the accuracy (acc) and statistical rate (SR) of the output classifiers on the (unperturbed) test data T. Err-Tol is given the perturbation rate η and uses the SLSQP solver in SciPy [57] to solve Program (Err Tolerant+). To advantage the baselines in our comparison, we provide them with even more information as needed by their approaches: LMZV and CHKV are given group-specific perturbation rates, for each ℓ ∈ [p], η_ℓ := Pr_D[Ẑ ≠ Z | Z = ℓ], and KL is given η and, for each ℓ ∈ [p], the probability Pr_D[Z = ℓ, Y = 1], where D is the empirical distribution of S. Err-Tol implements Program (Err Tolerant+), which requires estimates of λ_ℓ and γ_ℓ for all ℓ ∈ [p]. As a heuristic, we set γ_ℓ = λ_ℓ := Pr_D̂[Ẑ = ℓ], where D̂ is the empirical distribution of Ŝ. We find that these estimates suffice, and expect that a more refined approach would only improve the performance of Err-Tol.

Table 1: Simulation on synthetic data: We run CHKV and Err-Tol with τ = 0.8 on synthetic data and report their average accuracy (acc) and statistical rate (SR) with standard deviation in parentheses. The result shows that prior approaches can fail to satisfy their guarantees under the η-Hamming model.

| Method | acc (η=0%) | SR (η=0%) | acc (η=3%) | SR (η=3%) | acc (η=5%) | SR (η=5%) |
|---|---|---|---|---|---|---|
| Unconstrained | 1.00 (.001) | .799 (.001) | 1.00 (.000) | .799 (.002) | 1.00 (.001) | .800 (.001) |
| CHKV (τ=.8) | 1.00 (.001) | .800 (.002) | .859 (.143) | .787 (.015) | .799 (.139) | .795 (.049) |
| Err-Tol (τ=.8) | .985 (.065) | .800 (.001) | 1.00 (.001) | .799 (.002) | .999 (.002) | .799 (.004) |

Adversaries. We consider two η-Hamming adversaries (which we call ATN and AFN); each one computes the optimal fair classifier f*, which has the highest accuracy (on S) subject to having statistical rate at least τ on S. ATN considers the set of all true negatives of f* that have protected attribute Z = 1, selects the η·|S| samples that are furthest from the decision boundary of f*, and perturbs their protected attribute to Ẑ = 2. AFN is similar, except that it considers the set of false negatives of f*. Both adversaries try to increase the performance of f* on Z = 1 in Ŝ by removing the samples that f* predicts as negative, thus increasing f*'s statistical rate. The adversary's hope is that choosing samples far from the decision boundary would (falsely) give the appearance of a high statistical rate on Ŝ. This would make a fair classification framework output unfair classifiers with higher accuracy. Note that these are not intended to be worst-case adversaries; as Err-Tol comes with provable guarantees, we expect it to perform well against other adversaries, while other approaches may have even poorer performance.

Simulation on synthetic data. We first show empirically that perturbations by the η-Hamming adversary can be prohibitively disruptive for methods that attempt to correct for stochastic noise. We consider synthetic data with 1,000 samples from two equally-sized protected groups; each sample has a binary protected attribute, two continuous features x1, x2 ∈ ℝ, and a binary label. Conditioned on the protected attribute, (x1, x2) are independent draws from a mixture of 2D Gaussians (see Figure 4). This distribution and the labels are such that a) one group has a higher likelihood of a positive label than the other, and b) Uncons has a near-perfect accuracy (> 99%) and a statistical rate of 0.8 on S.
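Before turning to the results, the ATN adversary described in the Adversaries paragraph above can be summarized in code. This is a minimal sketch under our own assumptions (the optimal fair classifier f* is taken to be a linear classifier given by a weight vector, and the data are numpy arrays), not the exact script used in the experiments.

```python
import numpy as np

def attack_true_negatives(X, y, z, w_fair, eta):
    """Sketch of the ATN adversary: among true negatives of the optimal fair
    classifier f* (prediction 0, label 0) that belong to group Z=1, flip the
    protected attribute of the eta*|S| samples furthest from the decision
    boundary to Z_hat=2. AFN is identical except that it uses false negatives
    (prediction 0, label 1). w_fair is assumed to be a linear classifier's
    weight vector."""
    scores = X @ w_fair
    pred = (scores > 0).astype(int)
    candidates = np.where((pred == 0) & (y == 0) & (z == 1))[0]  # true negatives in group 1
    budget = int(eta * len(z))
    # "furthest from the decision boundary" = most negative score
    chosen = candidates[np.argsort(scores[candidates])][:budget]
    z_hat = z.copy()
    z_hat[chosen] = 2
    return z_hat
```

Since at most η·|S| protected attributes are rewritten, this adversary is admissible in the η-Hamming model.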
Similar to Uncons, we consider a fairness constraint of τ = 0.8. Thus, in the absence of noise, this is an easy case where Uncons satisfies the fairness constraints. We generate Ŝ using ATN, and compare against CHKV, which was developed for correcting stochastic perturbations.4

4We also attempted to compare against AKM, KL, and LMZV. But they did not converge to f* even on the unperturbed synthetic data, and hence, we did not include these results as it would be an unfair comparison.

Results. The accuracy and statistical rate averaged over 50 iterations are reported in Table 1 as a function of the perturbation η. At η = 0, both CHKV and Err-Tol nearly satisfy the fairness constraint (SR ≥ 0.79) and have a near-perfect accuracy (acc ≥ 0.98). However, as η increases, while CHKV retains the same statistical rate (≈ 0.8), it loses a significant amount of accuracy (≈ 20%). In contrast, Err-Tol has high accuracy and fairness (acc ≥ 0.99 and SR ≥ 0.79) for all η considered. Hence, this shows that stochastic approaches may fail to satisfy their guarantees under the η-Hamming model.

Simulations on real-world data. In this simulation, we show that our framework can outperform each baseline with respect to the accuracy-fairness trade-off under perturbations from the adversaries we consider, and does not under-perform compared to baselines under perturbations from either adversary. The COMPAS data in [9] contains 6,172 samples with 10 binary features and a label that is 1 if the individual did not recidivate and 0 otherwise; the statistical rate of Uncons on COMPAS is 0.78. We take gender (coded as binary) as the protected attribute, and set the fairness constraint on the statistical rate to be τ = 0.9 for Err-Tol and all baselines. We consider both adversaries ATN and AFN, and a perturbation rate of η = 3.5%, as 3.5% is roughly the smallest value for η necessary to ensure that the optimal fair classifier f* for τ = 0.9 (on S) has a statistical rate less than 0.78 on Ŝ.

Figure 1: Simulations on COMPAS data: Perturbed data is generated using adversary ATN (a) and AFN (b) as described in Section 5 with η = 3.5%. All algorithms (Uncons, KL, AKM, Err-Tol (this work), CHKV, LMZV) are run on the perturbed data varying the fairness parameters (τ ∈ [0.7, 1] and δ_L ∈ [0, 0.1]). The y-axis depicts accuracy and the x-axis depicts statistical rate (SR); both values are computed over the unperturbed test set. We observe that for both adversaries our approach, Err-Tol, attains a better fairness than the unconstrained classifier with a natural trade-off in accuracy. Further, Err-Tol achieves a better fairness-accuracy trade-off than each baseline on at least one of (a) or (b). Error bars represent the standard error of the mean.

Results. The accuracy and statistical rate (SR) of Err-Tol and baselines for τ ∈ [0.7, 1] and δ_L ∈ [0, 0.1], averaged over 100 iterations, are reported in Figure 1. For both adversaries, Err-Tol attains a better statistical rate than the unconstrained classifier (Uncons) for a small trade-off in accuracy. For adversary ATN (Figure 1(a)), Uncons has statistical rate (0.80) and accuracy (0.67). In contrast, Err-Tol achieves high statistical rate (0.92) with a trade-off in accuracy (0.60).
In comparison, AKM has a higher accuracy (0.65) but a lower statistical rate (0.87), and other baselines have an even lower statistical rate ( 0.84) with accuracy comparable to AKM. For adversary AFN (Figure 1(b)), Uncons has statistical rate (0.80) and accuracy (0.67), while Err-Tol has a significantly higher SR (0.91) and accuracy (0.61). This significantly outperforms AKM which has statistical rate (0.83) and accuracy (0.58). LMZV achieves the highest statistical rate (0.97) with a natural reduction in accuracy to (0.57). In this case, Err-Tol has similar accuracy to statistical rate trade-off as LMZV, but achieves a lower maximum statistical rate (0.91). Meanwhile, Err-Tol has a significantly higher statistical rate trade-off than CHKV at the same accuracy. We further evaluate our framework under stochastic perturbations in Supplementary Material E (specifically, against the perturbation model of [14]) and observe similar statistical rate and accuracy trade-offs as approaches [14, 44] tailored for stochastic perturbations. Remark 5.1 (Range of fairness parameters in the simulation). Among baselines, AKM, KL, and Uncons do not take the desired-fairness value as input, so they appear as points in Figure 1. For all other methods (CHKV, Err-Tol, and LMZV), we vary the fairness parameters starting from the tightest constraints (i.e., τ = 1 and δL = 0) and relax the constraints until all algorithms achieved statistical rate matches the achieved statistical rate of the unconstrained classifier (this happens around τ = 0.7 and δL = 0.1). We do not relax the fairness parameters further because the resulting problem is equivalent to the unconstrained classification problem. (This is because the unconstrained classifier, which has the highest accuracy, satisfies the fairness constraints for τ 0.7 and δL 0.1). 6 Limitations and conclusion This work extends fair classification to real-world settings where perturbations in the protected attributes may be correlated or affect arbitrary subsets of samples. We consider the η-Hamming model and give a framework that outputs classifiers with provable guarantees on both fairness and accuracy; this framework works for categorical protected attributes and the class of linear-fractional fairness metrics. We show near-tightness of our framework s guarantee and extend it to the Nasty Sample Noise model, which can perturb both labels and features. Empirically, classifiers produced by our framework achieve high fairness at a small cost to accuracy and outperform existing approaches. Compared to existing frameworks for fair classification with stochastic perturbations, our framework requires less information about the perturbations. That said, in a few applications, e.g., the randomized response procedure [60], where the perturbations are independent across samples and identically distributed according to a known distribution, frameworks for fair classification with stochastic perturbations can perform better. Further, like existing frameworks, our framework s efficacy will depend on an appropriate choice of parameters; e.g., an overly conservative λ can decrease accuracy and an optimistic λ can decrease fairness. A careful assessment both preand post-deployment would be important to avoid negative social implications in a misguided attempt to do good [45]. Finally, we note that discrimination is a systematic problem and our work only addresses one part of it; this work would be effective as one piece of a broader approach to mitigate and rectify biases. 
Acknowledgements. This research was supported in part by an NSF CAREER Award (IIS2045951), a J.P. Morgan Faculty Award, and an AWS MLRA Award. [1] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna M. Wallach. A Reductions Approach to Fair Classification. In ICML, volume 80 of Proceedings of Machine Learning Research, pages 60 69. PMLR, 2018. [2] Dana Angluin and Philip D. Laird. Learning from noisy examples. Mach. Learn., 2(4):343 370, 1987. [3] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. COMPAS recidivism risk score data and analysis, 2016. [4] Peter Auer. Learning with Malicious Noise, pages 1086 1089. Springer New York, New York, NY, 2016. [5] Peter Auer and Nicolo Cesa-Bianchi. On-line learning with malicious noise and the closure algorithm. Annals of mathematics and artificial intelligence, 23(1):83 99, 1998. [6] Pranjal Awasthi, Matthäus Kleindessner, and Jamie Morgenstern. Equalized odds postprocessing under imperfect group information. In International Conference on Artificial Intelligence and Statistics, pages 1770 1780. PMLR, 2020. [7] Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems, 32:15479 15488, 2019. [8] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2019. http://www.fairmlbook.org. [9] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy, John Richards, Diptikalyan Saha, Prasanna Sattigeri, Moninder Singh, Kush R. Varshney, and Yunfeng Zhang. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias, October 2018. [10] Avrim Blum and Kevin Stangl. Recovering from biased data: Can fairness constraints improve accuracy? In FORC, volume 156 of LIPIcs, pages 3:1 3:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. [11] Nader H Bshouty, Nadav Eiron, and Eyal Kushilevitz. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255 275, 2002. [12] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In FAT, volume 81 of Proceedings of Machine Learning Research, pages 77 91. PMLR, 2018. [13] L. Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees. In FAT, pages 319 328. ACM, 2019. [14] L. Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. Fair classification with noisy protected attributes. In ICML, volume 120 of Proceedings of Machine Learning Research. PMLR, 2021. [15] Nicolo Cesa-Bianchi, Eli Dichterman, Paul Fischer, Eli Shamir, and Hans Ulrich Simon. Sample-efficient strategies for learning in the presence of noise. Journal of the ACM (JACM), 46(5):684 719, 1999. [16] Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. Fairness under unawareness: Assessing disparity when protected class is unobserved. In FAT, pages 339 348. ACM, 2019. [17] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153 163, 2017. [18] Andrew J Coldman, Terry Braun, and Richard P Gallagher. The classification of ethnic status using name information. 
Journal of Epidemiology & Community Health, 42(4):390 395, 1988. [19] Amanda Coston, Karthikeyan Natesan Ramamurthy, Dennis Wei, Kush R. Varshney, Skyler Speakman, Zairah Mustahsan, and Supriyo Chakraborty. Fair transfer learning with missing protected attributes. In AIES, pages 91 98. ACM, 2019. [20] N.R. Council, D.B.S.S. Education, C.N. Statistics, P.D.C.R.E. Data, E. Perrin, and M.V. Ploeg. Eliminating Health Disparities: Measurement and Data Needs. National Academies Press, 2004. [21] Bill Dedman. The color of money. Atlanta Journal-Constitution, pages 1 4, 1988. [22] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A largescale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248 255. Ieee, 2009. [23] Dua Dheeru and E Karra Taniskidou. UCI machine learning repository, 2017. http://archiv e.ics.uci.edu/ml. [24] Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Mark D. M. Leiserson. Decoupled classifiers for group-fair and efficient machine learning. In FAT, volume 81 of Proceedings of Machine Learning Research, pages 119 133. PMLR, 2018. [25] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In KDD, pages 259 268. ACM, ACM, 2015. [26] Benjamin Fish, Jeremy Kun, and Ádám D Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 144 152. SIAM, 2016. [27] Naman Goel, Mohammad Yaghini, and Boi Faltings. Non-discriminatory machine learning through convex fairness criteria. In AAAI 2018, 2018. [28] Gabriel Goh, Andrew Cotter, Maya R. Gupta, and Michael P. Friedlander. Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 2415 2423, 2016. [29] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR (Poster), 2015. [30] Maya R. Gupta, Andrew Cotter, Mahdi Milani Fard, and Serena Wang. Proxy fairness. Co RR, abs/1806.11212, 2018. [31] Richard W Hamming. Error detecting and error correcting codes. The Bell system technical journal, 29(2):147 160, 1950. [32] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, editors, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3315 3323, 2016. [33] Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In ICML, volume 80 of Proceedings of Machine Learning Research, pages 1934 1943. PMLR, 2018. [34] Mara Hvistendahl. Can predictive policing prevent crime before it happens. Science Magazine, 28, 2016. [35] Vidit Jain and Erik Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010. [36] Nathan Kallus, Xiaojie Mao, and Angela Zhou. Assessing algorithmic fairness with unobserved protected class using data combination. In FAT*, page 110. ACM, 2020. [37] Faisal Kamiran and Toon Calders. Classifying without discriminating. 
In 2nd International Conference on Computer, Control and Communication, 2009. IC4 2009., pages 1 6. IEEE, 2009. [38] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1 33, Oct 2012. [39] Michael J. Kearns and Ming Li. Learning in the presence of malicious errors. SIAM J. Comput., 22(4):807 837, 1993. [40] Nikola Konstantinov and Christoph H. Lampert. Fairness-aware learning from corrupted data. Co RR, abs/2102.06004, 2021. [41] D. Kraft. A software package for sequential quadratic programming. Deutsche Forschungsund Versuchsanstalt für Luftund Raumfahrt Köln: Forschungsbericht. Wiss. Berichtswesen d. DFVLR, 1988. [42] Emmanouil Krasanakis, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, and Yiannis Kompatsiaris. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018. International World Wide Web Conferences Steering Committee, 2018. [43] Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. Fairness without demographics through adversarially reweighted learning. In Neur IPS, 2020. [44] Alexandre Louis Lamy and Ziyuan Zhong. Noise-tolerant fair classification. In Neur IPS, pages 294 305, 2019. [45] Lydia T Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In International Conference on Machine Learning, pages 3150 3158. PMLR, 2018. [46] Elizabeth Luh. Not so black and white: Uncovering racial bias from systematically misreported trooper reports. Available at SSRN 3357063, 2019. [47] Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. k-nn as an implementation of situation testing for discrimination discovery and prevention. In KDD 2011, pages 502 510. ACM, 2011. [48] Aditya Krishna Menon and Robert C. Williamson. The cost of fairness in binary classification. In FAT 2018, pages 107 118, 2018. [49] Vidya Muthukumar, Tejaswini Pedapati, Nalini Ratha, Prasanna Sattigeri, Chai-Wah Wu, Brian Kingsbury, Abhishek Kumar, Samuel Thomas, Aleksandra Mojsilovic, and Kush R Varshney. Understanding unequal gender classification accuracy from face images. ar Xiv preprint ar Xiv:1812.00099, 2018. [50] Northpointe. Compas risk and need assessment systems. http://www.northpointeinc.c om/files/downloads/FAQ_Document.pdf, 2012. [51] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon M. Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5684 5693, 2017. [52] Catherine Saunders, Gary Abel, Anas El Turabi, Faraz Ahmed, and Georgios Lyratzopoulos. Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: Evidence from the english cancer patient experience survey. BMJ open, 3, 06 2013. [53] Clayton Scott, Gilles Blanchard, and Gregory Handy. Classification with asymmetric label noise: Consistency and maximal denoising. In Conference on learning theory, pages 489 511. PMLR, 2013. [54] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014. [55] Megha Srivastava, Hoda Heidari, and Andreas Krause. Mathematical notions vs. 
human perception of fairness: A descriptive approach to fairness for machine learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2459 2468, 2019. [56] Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134 1142, 1984. [57] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, Ilhan Polat, Yu Feng, Eric W. Moore, Jake Vander Plas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and Sci Py 1.0 Contributors. Sci Py 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261 272, 2020. [58] Jialu Wang, Yang Liu, and Caleb Levy. Fair classification with group-dependent label noise. In FAcc T, pages 526 536. ACM, 2021. [59] Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya R. Gupta, and Michael I. Jordan. Robust optimization for fairness with noisy protected groups. In Neur IPS, 2020. [60] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63 69, 1965. PMID: 12261830. [61] Blake E. Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In COLT 2017, pages 1920 1953, 2017. [62] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171 1180. International World Wide Web Conferences Steering Committee, 2017. [63] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In AISTATS, volume 54 of Proceedings of Machine Learning Research, pages 962 970. PMLR, 2017. [64] Richard Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML 13, pages III 325 III 333. JMLR.org, 2013. [65] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335 340. ACM, 2018. [66] Xiangxin Zhu and Deva Ramanan. Face detection, pose estimation, and landmark localization in the wild. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2879 2886, 2012.