# Superhuman Fairness

Omid Memarrast¹, Linh Vu¹, Brian Ziebart¹

¹Department of Computer Science, University of Illinois Chicago, Chicago, USA. Correspondence to: Omid Memarrast.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract

The fairness of machine learning-based decisions has become an increasingly important focus in the design of supervised machine learning methods. Most fairness approaches optimize a specified trade-off between performance measure(s) (e.g., accuracy, log loss, or AUC) and fairness measure(s) (e.g., demographic parity, equalized odds). This begs the question: are the right performance-fairness trade-offs being specified? We instead recast fair machine learning as an imitation learning task by introducing superhuman fairness, which seeks to simultaneously outperform human decisions on multiple predictive performance and fairness measures. We demonstrate the benefits of this approach given suboptimal decisions.

1. Introduction

The social impacts of algorithmic decisions based on machine learning have motivated various group and individual fairness properties that decisions should ideally satisfy (Calders et al., 2009; Hardt et al., 2016). Unfortunately, impossibility results prevent multiple common group fairness properties from being simultaneously satisfied (Kleinberg et al., 2016). Thus, no set of decisions can be universally fair to all groups and individuals for all notions of fairness. Instead, specified weightings, or trade-offs, of different criteria are often optimized (Liu & Vicente, 2022). Identifying an appropriate trade-off to prescribe to these fairness methods is a daunting task, open to application-specific philosophical and ideological debate that could delay or completely derail the adoption of algorithmic methods.

Figure 1. Three sets of decisions (black dots) with different predictive performance and group disparity values defining the sets of 100%-, 67%-, and 33%-superhuman fairness-performance values (red shades) based on Pareto dominance.

We consider the motivating scenario of multiple (error-prone) stakeholders with different notions of fairness and desired performance-fairness trade-offs collaboratively producing decisions. Preference elicitation (Hiranandani et al., 2020) is of limited use here, since knowing the stakeholders' trade-offs still leaves the question of how different stakeholders' preferences should be prioritized. Rather than seeking optimal decisions for specific performance-fairness (meta-)trade-offs, we propose a more modest, yet more practical objective: produce decisions preferred by all stakeholders over human-produced decisions with maximal frequency. This provides an opportunity for superhuman decisions that Pareto dominate human decisions across predictive performance and fairness measures (Figure 1) without identifying an explicit desired trade-off. We argue that for many algorithmic fairness tasks, frequently outperforming human decisions across all relevant predictive performance and fairness measures may be sufficient for replacing human decision-makers with algorithmic decision-makers. To the best of our knowledge, this paper is the first to define fairness objectives for supervised machine learning with respect to noisy human decisions rather than using prescriptive trade-offs or hard constraints.
We leverage and extend a recently-developed imitation learning method for subdominance minimization (Ziebart et al., 2022). Instead of using the subdominance to identify a target trade-off, as previous work does in the inverse optimal control setting of sequential decision-making to estimate a cost function, we use it to directly optimize our fairness-aware classifier. We develop a method based on policy gradient optimization (Sutton & Barto, 2018) that allows flexible classes of probabilistic decision policies (e.g., aware or unaware of protected group membership status) to be optimized for given sets of performance/fairness measures and demonstrations.

We conduct extensive experiments on standard fairness datasets (Adult and COMPAS) using accuracy as a performance measure and three conflicting fairness definitions: Demographic Parity (Calders et al., 2009), Equalized Odds (Hardt et al., 2016), and Predictive Rate Parity (Chouldechova, 2017). Though our motivation is to outperform human decisions, we employ a synthetic decision-maker with differing amounts of label and group membership noise to identify sufficient conditions for superhuman fairness of varying degrees. We find that our approach achieves high levels of superhuman performance that increase rapidly with reference decision noise and significantly outperform the superhumanness of other methods that are based on narrower fairness-performance objectives.

2. Fairness, Elicitation, and Imitation

2.1. Group Fairness Measures

Group fairness measures are primarily defined by confusion matrix statistics (based on labels $y_i \in \{0, 1\}$ and decisions/predictions $\hat{y}_i \in \{0, 1\}$ produced from inputs $\mathbf{x}_i \in \mathbb{R}^M$) for examples belonging to different protected groups (e.g., $a_i \in \{0, 1\}$). We focus on three prevalent fairness properties in this paper:

- Demographic Parity (DP) (Calders et al., 2009) requires equal positive rates across protected groups: $P(\hat{Y} = 1 | A = 1) = P(\hat{Y} = 1 | A = 0)$;
- Equalized Odds (EqOdds) (Hardt et al., 2016) requires equal true positive rates and false positive rates across groups, i.e., $P(\hat{Y} = 1 | Y = y, A = 1) = P(\hat{Y} = 1 | Y = y, A = 0)$, $\forall y \in \{0, 1\}$;
- Predictive Rate Parity (PRP) (Chouldechova, 2017) requires equal positive predictive value ($\hat{y} = 1$) and negative predictive value ($\hat{y} = 0$) across groups: $P(Y = 1 | A = 1, \hat{Y} = \hat{y}) = P(Y = 1 | A = 0, \hat{Y} = \hat{y})$, $\forall \hat{y} \in \{0, 1\}$.

Violations of these fairness properties can be measured as differences:

$$D.\text{DP}(\hat{\mathbf{y}}, \mathbf{a}) = \left| \frac{\sum_{i=1}^N I[\hat{y}_i = 1, a_i = 1]}{\sum_{i=1}^N I[a_i = 1]} - \frac{\sum_{i=1}^N I[\hat{y}_i = 1, a_i = 0]}{\sum_{i=1}^N I[a_i = 0]} \right| \quad (1)$$

$$D.\text{EqOdds}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) = \max_{y \in \{0,1\}} \left| \frac{\sum_{i=1}^N I[\hat{y}_i = 1, y_i = y, a_i = 1]}{\sum_{i=1}^N I[a_i = 1, y_i = y]} - \frac{\sum_{i=1}^N I[\hat{y}_i = 1, y_i = y, a_i = 0]}{\sum_{i=1}^N I[a_i = 0, y_i = y]} \right| \quad (2)$$

$$D.\text{PRP}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) = \max_{\hat{y} \in \{0,1\}} \left| \frac{\sum_{i=1}^N I[y_i = 1, \hat{y}_i = \hat{y}, a_i = 1]}{\sum_{i=1}^N I[a_i = 1, \hat{y}_i = \hat{y}]} - \frac{\sum_{i=1}^N I[y_i = 1, \hat{y}_i = \hat{y}, a_i = 0]}{\sum_{i=1}^N I[a_i = 0, \hat{y}_i = \hat{y}]} \right| \quad (3)$$
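To make these definitions concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' released code; the helper `group_rate` and the array names are ours) of how the three disparity measures in Eqs. (1)–(3) can be computed from binary predictions, labels, and group attributes:

```python
import numpy as np

def group_rate(numer_mask, denom_mask):
    """Ratio of counts, guarding against empty groups."""
    denom = denom_mask.sum()
    return numer_mask.sum() / denom if denom > 0 else 0.0

def d_dp(y_hat, a):
    """Demographic parity difference, Eq. (1)."""
    return abs(group_rate((y_hat == 1) & (a == 1), a == 1)
               - group_rate((y_hat == 1) & (a == 0), a == 0))

def d_eqodds(y_hat, y, a):
    """Equalized odds difference, Eq. (2): worst case over the true label."""
    return max(
        abs(group_rate((y_hat == 1) & (y == v) & (a == 1), (a == 1) & (y == v))
            - group_rate((y_hat == 1) & (y == v) & (a == 0), (a == 0) & (y == v)))
        for v in (0, 1))

def d_prp(y_hat, y, a):
    """Predictive rate parity difference, Eq. (3): worst case over the predicted label."""
    return max(
        abs(group_rate((y == 1) & (y_hat == v) & (a == 1), (a == 1) & (y_hat == v))
            - group_rate((y == 1) & (y_hat == v) & (a == 0), (a == 0) & (y_hat == v)))
        for v in (0, 1))
```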
2.2. Performance-Fairness Trade-offs

Numerous fair classification algorithms have been developed over the past few years, with most targeting one or two fairness measures (Zafar et al., 2015; Hardt et al., 2016; Goel et al., 2018; Aghaei et al., 2019). With some exceptions (Blum & Stangl, 2019), predictive performance and fairness are typically competing objectives in supervised machine learning approaches (Menon & Williamson, 2018). Thus, though satisfying many fairness properties simultaneously may be naïvely appealing, doing so often significantly degrades predictive performance or even creates infeasibility (Kleinberg et al., 2016).

Given this, many approaches seek to choose parameters θ for a (probabilistic) classifier $P_\theta$ that balance the competing predictive performance and fairness objectives (Kamishima et al., 2012; Hardt et al., 2016; Menon & Williamson, 2018; Celis et al., 2019; Martinez et al., 2020; Rezaei et al., 2020). Recently, Hsu et al. (2022) proposed a novel optimization framework to satisfy three conflicting fairness measures (demographic parity, equalized odds, and predictive rate parity) to the best extent possible:

$$\min_\theta \mathbb{E}_{\hat{\mathbf{y}} \sim P_\theta}\Big[ \text{loss}(\hat{\mathbf{y}}, \mathbf{y}) + \alpha_{\text{DP}}\, D.\text{DP}(\hat{\mathbf{y}}, \mathbf{a}) + \alpha_{\text{Odds}}\, D.\text{EqOdds}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) + \alpha_{\text{PRP}}\, D.\text{PRP}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) \Big]. \quad (4)$$

2.3. Preference Elicitation & Imitation Learning

Preference elicitation (Chen & Pu, 2004) is one natural approach to identifying desirable performance-fairness trade-offs. Preference elicitation methods typically query users for their pairwise preferences on a sequence of pairs of options to identify their utilities for different characteristics of the options. This approach has been extended and applied to fairness measure elicitation (Hiranandani et al., 2020), allowing efficient learning of linear (e.g., Eq. (4)) and nonlinear measures from finite and noisy preference feedback. When decisions are made jointly by multiple stakeholders (Donaldson & Preston, 1995) rather than a single individual, preference elicitation may not be very informative. Each stakeholder's preferences could be elicited, for example, but how those sets of preferences should be prioritized to determine joint outcomes can remain unclear without strong additional assumptions about the decision-making process (e.g., outcomes determined by a majority vote) (Dowling et al., 2016).

Imitation learning (Osa et al., 2018) is a type of supervised machine learning that seeks to produce a general-use policy $\hat{\pi}$ based on demonstrated trajectories of states and actions, $\tilde{\xi} = (s_1, a_1, s_2, a_2, \ldots, s_T)$. Inverse reinforcement learning methods (Abbeel & Ng, 2004; Ziebart et al., 2008) seek to rationalize the demonstrated trajectories as the result of (near-)optimal policies on an estimated cost or reward function. Feature matching (Abbeel & Ng, 2004) plays a key role in these methods: if the expected feature counts match,

$$\mathbb{E}_{\xi \sim \hat{\pi}}[f_k(\xi)] = \frac{1}{N}\sum_{i=1}^N f_k(\tilde{\xi}_i), \;\forall k \;\implies\; \mathbb{E}_{\xi \sim \hat{\pi}}[\text{cost}_{\mathbf{w}}(\xi)] = \frac{1}{N}\sum_{i=1}^N \text{cost}_{\mathbf{w}}(\tilde{\xi}_i), \quad (5)$$

where $f_k(\xi) = \sum_{s_t \in \xi} f_k(s_t)$, then the estimated policy $\hat{\pi}$ is guaranteed to have an expected cost under the demonstrator's unknown fixed cost function weights $\mathbf{w} \in \mathbb{R}^K$ equal to the average of the demonstrated trajectories. Syed & Schapire (2007) seek to outperform the set of demonstrations when the signs of the unknown cost function are known, $w_k \geq 0$, by making the inequality

$$\mathbb{E}_{\xi \sim \pi}[f_k(\xi)] \leq \frac{1}{N}\sum_{i=1}^N f_k(\tilde{\xi}_i), \;\forall k, \quad (6)$$

strict for at least one feature. Subdominance minimization (Ziebart et al., 2022) seeks to produce trajectories that outperform each demonstration by a margin:

$$f_k(\xi) + m_k \leq f_k(\tilde{\xi}_i), \;\forall i, k, \quad (7)$$

under the same assumption of known cost weight signs. However, since this is often infeasible, the approach instead minimizes the subdominance, which measures the α-weighted violation of this inequality:

$$\text{subdom}_{\boldsymbol{\alpha}}(\xi, \tilde{\xi}) \triangleq \sum_k \big[\alpha_k \big(f_k(\xi) - f_k(\tilde{\xi})\big) + 1\big]_+, \quad (8)$$

where $[f(x)]_+ \triangleq \max(f(x), 0)$ is the hinge function and the per-feature margin has been reparameterized as $\alpha_k^{-1}$. Previous work (Ziebart et al., 2022) has employed subdominance minimization in conjunction with inverse optimal control,

$$\min_{\mathbf{w}} \min_{\boldsymbol{\alpha} \geq 0} \sum_{i=1}^N \text{subdom}_{\boldsymbol{\alpha}}\big(\xi^*(\mathbf{w}), \tilde{\xi}_i\big), \;\text{ where } \xi^*(\mathbf{w}) = \operatorname*{argmin}_{\xi}\, \text{cost}_{\mathbf{w}}(\xi),$$

learning the cost function parameters $\mathbf{w}$ for the optimal trajectory $\xi^*(\mathbf{w})$ that minimizes subdominance.
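As a small illustration of Eq. (8) (our own sketch, not code from Ziebart et al. (2022)), the subdominance of a single trajectory's feature vector against one demonstration is just a sum of α-scaled hinges:

```python
import numpy as np

def hinge(x):
    """[x]_+ = max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)

def subdom(f_traj, f_demo, alpha):
    """Subdominance of Eq. (8): alpha-weighted hinge on each feature gap.
    f_traj, f_demo, alpha are length-K arrays; lower feature values are better."""
    return hinge(alpha * (np.asarray(f_traj) - np.asarray(f_demo)) + 1.0).sum()

# Example: the trajectory beats the demo on feature 1 but not by margin 1/2,
# and loses on feature 2, so both hinges are active.
print(subdom([0.8, 0.3], [1.0, 0.2], alpha=np.array([2.0, 5.0])))  # 0.6 + 1.5 = 2.1
```

Because the margin is reparameterized as $1/\alpha_k$, increasing $\alpha_k$ shrinks the margin by which a demonstration must be outperformed before a penalty is incurred.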
One contribution of this paper is extending subdominance minimization to the more flexible prediction models needed for fairness-aware classification, which are not directly conditioned on cost features or performance/fairness measures.

3. Subdominance Minimization for Improved Fairness-Aware Classification

We approach fair classification from an imitation learning perspective (Ziebart et al., 2022). We assume vectors of (human-provided) reference decisions are available that may have been produced collaboratively by multiple stakeholders with competing predictive performance-fairness trade-offs. Our goal is to construct a fairness-aware classifier that outperforms reference decisions on all performance and fairness measures on withheld data as frequently as possible, which also provides guarantees to all stakeholders.

3.1. Superhumanness and Subdominance

We consider reference decisions $\tilde{\mathbf{y}} = \{\tilde{y}_j\}_{j=1}^M$ that are drawn from an (unknown) human decision-making process or baseline method $\tilde{P}$ on a set of $M$ items, $\mathbf{X}^{M \times L} = \{\mathbf{x}_j\}_{j=1}^M$, where $L$ is the number of attributes of each of the $M$ items $\mathbf{x}_j$. Group membership attributes $a_m$ from vector $\mathbf{a}$ indicate to which group item $m$ belongs. The predictive performance and fairness of decisions $\hat{\mathbf{y}}$ are assessed based on ground truth $\mathbf{y}$ and group membership $\mathbf{a}$ using a set of predictive loss and unfairness measures¹ $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$ (e.g., Equations 1, 2, 3). Without loss of generality, we assume that larger values for these measures are less desirable. Ideally, the set of these measures should cover the bases of all stakeholder preference functions (i.e., stakeholder cost functions for evaluating different vectors of decisions can be expressed as summed monotonic transformations of the $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$ measures).

¹These measures take the place of the features used to define the cost/reward function in imitation learning methods. We instead use features to describe the inputs to our fairness-aware decision model, $\hat{P}_\theta$.

Definition 3.1. A fairness-aware classifier is considered γ-superhuman for a given set of predictive loss and unfairness measures $\{f_k\}$ if its decisions $\hat{\mathbf{y}}$ satisfy: $P\big(\mathbf{f}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) \preceq \mathbf{f}(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a})\big) \geq \gamma$.

If strict Pareto dominance is required to be γ-superhuman, which is often effectively true for continuous domains, then by definition, at most a $(1 - \gamma)$ fraction of human decision-makers could be γ-superhuman. However, far fewer than $(1 - \gamma)$ may be γ-superhuman if pairs of human decisions do not Pareto dominate one another in either direction (i.e., neither $\mathbf{f}(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}) \preceq \mathbf{f}(\tilde{\mathbf{y}}', \mathbf{y}, \mathbf{a})$ nor $\mathbf{f}(\tilde{\mathbf{y}}', \mathbf{y}, \mathbf{a}) \preceq \mathbf{f}(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a})$ for pairs of human decisions $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{y}}'$). From this perspective, any decisions that are γ-superhuman more than a $(1 - \gamma)$ fraction of the time exceed the performance limit for the distribution of human demonstrators.

Unfortunately, directly maximizing γ is difficult, in part due to the discontinuity of Pareto dominance ($\preceq$). The subdominance (Ziebart et al., 2022) serves as a convex upper bound for non-dominance in each measure $\{f_k\}$ and on $1 - \gamma$ in aggregate:

$$\text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}) \triangleq \big[\alpha_k \big(f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}) - f_k(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a})\big) + 1\big]_+; \qquad \text{subdom}_{\boldsymbol{\alpha}}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}) \triangleq \sum_k \text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}). \quad (9)$$

Given $N$ vectors of reference decisions as demonstrations, $\tilde{Y} = \{\tilde{\mathbf{y}}_i\}_{i=1}^N$, the subdominance for decision vector $\hat{\mathbf{y}}$ with respect to the set of demonstrations is:²

$$\text{subdom}_{\boldsymbol{\alpha}}(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) = \frac{1}{N}\sum_{\tilde{\mathbf{y}} \in \tilde{Y}} \text{subdom}_{\boldsymbol{\alpha}}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}),$$

where $\hat{\mathbf{y}}_i$ denotes the predictions produced by our model for item set $\mathbf{X}_i$ and $\hat{Y} = \{\hat{\mathbf{y}}_i\}_{i=1}^N$ is the set of these prediction vectors.

²For notational simplicity, we assume all demonstrated decisions $\tilde{\mathbf{y}} \in \tilde{Y}$ correspond to the same $M$ items represented in $\mathbf{X}$. Generalization to unique $\mathbf{X}$ for each demonstration is straightforward.
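Definition 3.1 and the aggregate subdominance above can be checked directly once the measure vectors are computed. A rough sketch (ours; `f_hat` is the model's K-vector of measures and `f_refs` stacks one K-vector per reference-decision vector):

```python
import numpy as np

def gamma_superhuman(f_hat, f_refs):
    """Definition 3.1: fraction of reference decisions whose measure vectors are
    (weakly) Pareto dominated by the model's measure vector (lower is better)."""
    f_hat, f_refs = np.asarray(f_hat, float), np.asarray(f_refs, float)  # (K,), (N, K)
    return np.mean(np.all(f_hat[None, :] <= f_refs, axis=1))

def avg_subdominance(f_hat, f_refs, alpha):
    """Aggregate subdominance over the demonstration set (Eq. 9, averaged over N)."""
    f_hat, f_refs = np.asarray(f_hat, float), np.asarray(f_refs, float)
    return np.maximum(alpha * (f_hat[None, :] - f_refs) + 1.0, 0.0).sum(axis=1).mean()
```

Any demonstration that is not dominated has some measure on which the model is strictly worse, so it contributes at least 1 to its own subdominance; hence the averaged subdominance upper-bounds the empirical $1 - \gamma$, which is what makes it a useful surrogate objective.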
Figure 2. A Pareto frontier for possible $\hat{P}_\theta$ (blue) optimally trading off predictive performance (e.g., inaccuracy) and group unfairness. The model-produced decision (red point) defines dominance boundaries (solid red) and margin boundaries (dashed red), which incur subdominance (maroon lines) on three examples.

The subdominance is illustrated in Figure 2. Following concepts from support vector machines (Cortes & Vapnik, 1995), reference decisions $\tilde{\mathbf{y}}$ that actively constrain the predictions $\hat{\mathbf{y}}$ in a particular measure dimension $k$ are referred to as support vectors and denoted as:

$$\tilde{Y}_{SV_k}(\hat{\mathbf{y}}, \alpha_k) \triangleq \Big\{ \tilde{\mathbf{y}} \,\Big|\, \alpha_k\big(f_k(\hat{\mathbf{y}}) - f_k(\tilde{\mathbf{y}})\big) + 1 \geq 0 \Big\}.$$

3.2. Performance-Fairness Subdominance Minimization

We consider probabilistic predictors $P_\theta : \mathcal{X}^M \to \mathcal{Y}^M$ that make structured predictions over the set of items in the most general case, but can also be simplified to make conditionally independent decisions for each item.

Definition 3.2. The minimally subdominant fairness-aware classifier $\hat{P}_\theta$ has model parameters θ chosen by:

$$\operatorname*{argmin}_\theta \min_{\boldsymbol{\alpha} \geq 0} \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim P_\theta}\Big[ \text{subdom}_{\boldsymbol{\alpha}}\big(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}\big) \Big] + \lambda \|\boldsymbol{\alpha}\|_1.$$

Hinge loss slopes $\boldsymbol{\alpha} \triangleq \{\alpha_k\}_{k=1}^K$ are also learned from the data during training. For the subdominance of the kth measure, $\alpha_k$ indicates the degree of sensitivity to how much the algorithm fails to sufficiently outperform demonstrations in that measure. When the $\alpha_k$ value is higher, reducing underperformance on that measure reduces the overall subdominance more than reducing underperformance on other measures.

The bi-level optimization of θ and α differs from single-level support vector machine optimization (of θ alone), which is a convex optimization problem (Cortes & Vapnik, 1995). Instead, subdominance is a quasiconvex function, which similarly implies that there are no local optima as a function of the realized predictive performance/fairness measures.

Theorem 3.3. The $\alpha_k$-minimized subdominance,

$$\Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) \triangleq \min_{\alpha_k \geq 0} \Big(\text{subdom}^k_{\alpha_k}\big(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}\big) + \lambda_k \alpha_k\Big), \quad (10)$$

is a quasiconvex function in terms of the set of measures $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$.

We use the gradient of the expected subdominance with respect to θ and α to update these variables iteratively; after convergence, the best learned weights θ* are used in the final model $\hat{P}_{\theta^*}$. Though subdominance minimization is not necessarily quasiconvex in terms of the model parameters θ, particularly for complex, nonlinear models, stochastic gradient methods are often effective in avoiding local optima. A commonly used linear model like logistic regression can be used for $\hat{P}_\theta$ to simplify the overall optimization.

Theorem 3.4. The gradient of the expected subdominance under $\hat{P}_\theta$ with respect to the set of reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$ is:

$$\nabla_\theta \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim \hat{P}_\theta}\Big[\sum_k \Gamma_k\big(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}\big)\Big] = \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim \hat{P}_\theta}\Big[\sum_k \Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) \,\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X})\Big],$$

where the optimal $\alpha_k$ for each $\Gamma_k$ (10) is obtained from

$$\alpha_k = \operatorname*{argmin}_{\alpha_k^{(m)}} m \;\;\text{such that}\;\; f_k(\hat{\mathbf{y}}) + \lambda \leq \frac{1}{m}\sum_{j=1}^{m} f_k\big(\tilde{\mathbf{y}}^{(j)}\big),$$

using $\alpha_k^{(j)} = \big[f_k(\tilde{\mathbf{y}}^{(j)}) - f_k(\hat{\mathbf{y}}^{(j)})\big]^{-1}$ to represent the $\alpha_k$ value that would make the demonstration with the jth smallest $f_k$ measure, $\tilde{\mathbf{y}}^{(j)}$, a support vector with zero subdominance.
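Theorem 3.4 implies that each $\alpha_k$ can be found without a numerical line search, because the objective is piecewise linear in $\alpha_k$ and so a minimizer lies at $\alpha_k = 0$ or at one of the corner values $\alpha_k^{(j)}$. The sketch below is our own (it enumerates the corners and evaluates the objective directly rather than applying the closed-form support-vector condition), and it assumes the sampled predictions have been paired with demonstrations so that `f_hat_k[i]` and `f_ref_k[i]` are the kth measure of the ith pair:

```python
import numpy as np

def optimal_alpha_k(f_hat_k, f_ref_k, lam):
    """Minimize (1/N) * sum_i [alpha*(f_hat_k[i] - f_ref_k[i]) + 1]_+ + lam*alpha
    over alpha >= 0. The objective is piecewise linear in alpha, so a minimizer
    lies at alpha = 0 or at a corner 1/(f_ref - f_hat) where a hinge vanishes."""
    f_hat_k, f_ref_k = np.asarray(f_hat_k, float), np.asarray(f_ref_k, float)

    def objective(alpha):
        hinge = np.maximum(alpha * (f_hat_k - f_ref_k) + 1.0, 0.0)
        return hinge.mean() + lam * alpha

    gaps = f_ref_k - f_hat_k
    corners = 1.0 / gaps[gaps > 0]            # alphas at which a hinge becomes inactive
    candidates = np.concatenate(([0.0], corners))
    return float(min(candidates, key=objective))
```

In Algorithm 1 below, a step of this form is applied once per measure k in every iteration, before the policy-gradient update of θ.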
Using gradient descent, we update the model weights θ using an approximation of the gradient based on a set of sampled predictions $\hat{\mathbf{y}} \in \hat{Y}$ from the model $\hat{P}_\theta$:

$$\nabla_\theta \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim \hat{P}_\theta}\Big[\sum_k \Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a})\Big] \approx \sum_k \Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) \,\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X}), \quad \hat{\mathbf{y}} \sim \hat{P}_\theta(\cdot|\mathbf{X}).$$

Algorithm 1: Subdominance policy gradient optimization
Input: N sets of reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$ drawn from a human decision-maker or baseline method $\tilde{P}$.
Initialize: θ ← θ₀.
While θ has not converged:
  1. Sample model predictions $\{\hat{\mathbf{y}}_i\}_{i=1}^N$ from $\hat{P}_\theta(\cdot|\mathbf{X}_i)$ for the matching items used in the reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$.
  2. For each k ∈ {1, ..., K}:
     (a) Sort the reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$ in ascending order of the kth measure value $f_k(\tilde{\mathbf{y}}_i)$, yielding $\{\tilde{\mathbf{y}}^{(j)}\}_{j=1}^N$.
     (b) Compute $\alpha_k^{(j)} = \big[f_k(\tilde{\mathbf{y}}^{(j)}) - f_k(\hat{\mathbf{y}}^{(j)})\big]^{-1}$.
     (c) Set $\alpha_k = \operatorname{argmin}_{\alpha_k^{(m)}} m$ such that $f_k(\hat{\mathbf{y}}) + \lambda \leq \frac{1}{m}\sum_{j=1}^m f_k(\tilde{\mathbf{y}}^{(j)})$.
     (d) Compute $\Gamma_k(\hat{\mathbf{y}}_i, \tilde{Y}, \mathbf{y}, \mathbf{a})$.
  3. Update θ ← θ − η $\sum_k \Gamma_k(\hat{\mathbf{y}}_i, \tilde{Y}, \mathbf{y}, \mathbf{a}) \,\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}_i|\mathbf{X}_i)$.

We show the steps for training our model in Algorithm 1. Reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$ from a human decision-making process or baseline method $\tilde{P}$ are provided as input to the algorithm, and θ is set to an initial value. In each iteration of the algorithm, we first sample a set of model predictions $\{\hat{\mathbf{y}}_i\}_{i=1}^N$ from $\hat{P}_\theta(\cdot|\mathbf{X}_i)$ for the matching items used for the reference decisions $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$. We then find the new θ (and α) based on the updates discussed in Theorem 3.4.
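A compact end-to-end sketch of Algorithm 1 for the per-item logistic decision model of Section 4.3 is given below. It is our own simplified rendering (not the released implementation): each sampled prediction vector is compared only against its matching demonstration, and it reuses the hypothetical `optimal_alpha_k` helper from the previous sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_subdominance_policy_gradient(X_sets, y_sets, a_sets, ref_decisions,
                                        measures, lam=0.5, lr=0.01, iters=200, seed=0):
    """Sketch of Algorithm 1 for a per-item logistic decision model.

    X_sets, y_sets, a_sets, ref_decisions: length-N lists of item features,
    ground-truth labels, group attributes, and reference decisions for each
    demonstration. `measures` is a list of K functions f_k(y_hat, y, a)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X_sets[0].shape[1])
    N, K = len(ref_decisions), len(measures)

    for _ in range(iters):
        # Sample predictions for the items matched to each demonstration.
        probs = [sigmoid(X @ theta) for X in X_sets]
        y_hats = [(rng.random(p.shape) < p).astype(int) for p in probs]

        # Measure values of sampled predictions and of reference decisions.
        F_hat = np.array([[f(y_hats[i], y_sets[i], a_sets[i]) for f in measures]
                          for i in range(N)])
        F_ref = np.array([[f(ref_decisions[i], y_sets[i], a_sets[i]) for f in measures]
                          for i in range(N)])

        # Per-measure hinge slopes, then the alpha-minimized subdominance terms.
        alphas = np.array([optimal_alpha_k(F_hat[:, k], F_ref[:, k], lam) for k in range(K)])

        grad = np.zeros_like(theta)
        for i in range(N):
            gamma_i = np.maximum(alphas * (F_hat[i] - F_ref[i]) + 1.0, 0.0) + lam * alphas
            score = X_sets[i].T @ (y_hats[i] - probs[i])   # grad of log P_theta(y_hat | X)
            grad += gamma_i.sum() * score / N
        theta -= lr * grad                                  # descend the subdominance

    return theta, alphas
```

Here `measures` could be, e.g., `[lambda yh, y, a: (yh != y).mean(), lambda yh, y, a: d_dp(yh, a), d_eqodds, d_prp]`, adapting the earlier measure helpers to a common `f_k(y_hat, y, a)` signature.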
3.3. Generalization Bounds

A fairness-aware classifier with a relatively small number of support vectors has important generalization guarantees under iid assumptions.

Theorem 3.5. A classifier $\hat{P}_\theta$ from a family with a convex realizable space of measures $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$ minimizing $\sum_i \text{subdom}_{\boldsymbol{\alpha}}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}_i, \mathbf{y}_i, \mathbf{a})$ on a set of N iid reference decisions with support vector sets $\big\{\tilde{Y}_{SV_k}(\hat{\mathbf{y}}, \alpha_k)\big\}$ is on average γ-superhuman on the population distribution with:

$$\gamma = 1 - \frac{1}{N}\Big|\bigcup_{k=1}^{K} \tilde{Y}_{SV_k}(\hat{\mathbf{y}}, \alpha_k)\Big|.$$

The proof for this generalization bound (see Appendix A) is an extension to our setting of the generalization bound based on support vectors developed for inverse optimal control subdominance minimization (Ziebart et al., 2022). It requires that the realizable set of measures $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$ is convex and that the (deterministic) $P_\theta$ with measures globally minimizing subdominance can be found. This may be unrealistic for complex $P_\theta$ models (e.g., multilayer neural networks). Importantly, superhuman performance provides comparative satisfaction guarantees for stakeholders. Specifically, stakeholders will prefer the algorithmic decisions with at least γ frequency for a fairly wide range of cost functions defined in terms of the measures $\{f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})\}$.

Corollary 3.6. For any stakeholder with a cost function $\text{cost}(\mathbf{f}, \mathbf{X})$ such that $\mathbf{f}_1 \preceq \mathbf{f}_2 \implies \text{cost}(\mathbf{f}_1, \mathbf{X}) \leq \text{cost}(\mathbf{f}_2, \mathbf{X})$, a γ-superhuman classifier will be preferable in expectation with probability at least:

$$P\Big(\text{cost}\big(\mathbf{f}(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a}), \mathbf{X}\big) \leq \text{cost}\big(\mathbf{f}(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}), \mathbf{X}\big)\Big) \geq \gamma.$$

4. Experiments

The goal of our approach is to produce a fairness-aware prediction method that outperforms reference (human) decisions on multiple fairness/performance measures. In this section, we discuss our experimental design to synthesize reference decisions with varying levels of noise, evaluate our method, and provide comparison baselines.³

³Our code is publicly available at https://github.com/omidMemari/superhumn-fairness.

4.1. Training and Testing Dataset Construction

To emulate human decision-making with various levels of noise, we add noise to benchmark fairness datasets and apply fair learning methods over repeated randomized dataset splits. We describe this process in detail in the remainder of this section.

Datasets. We perform experiments on two benchmark fairness datasets:

- The UCI Adult dataset (Dheeru & Karra Taniskidou, 2017) considers predicting whether a household's income exceeds $50K/yr based on census data. Group membership is based on gender. The dataset consists of 45,222 items.
- The COMPAS dataset (Larson et al., 2016) considers predicting recidivism, with group membership based on race. It consists of 6,172 examples.

Figure 3. Prediction error versus difference of Demographic Parity (D.DP), Equalized Odds (D.EqOdds), and Predictive Rate Parity (D.PRP) on test data using noiseless training data (ϵ = 0) for the Adult (top row) and COMPAS (bottom row) datasets.

Partitioning the data. We first split the entire dataset randomly into disjoint train (train-all) and test (test-all) sets of equal size. The test set (test-all) is entirely withheld from the training procedure and is ultimately used solely for evaluation. To produce each demonstration (a vector of reference decisions), we split the train-all set randomly into disjoint train (train-demo) and test (test-demo) sets of equal size.

Noise insertion. We randomly flip an ϵ fraction of the ground-truth labels y and group membership attributes a to add noise to our demonstration-producing process.

Fair classifier $\tilde{P}$. Using the noisy data, we provide existing fairness-aware methods with labeled train-demo data and unlabeled test-demo data to produce decisions on the test-demo data as demonstrations $\tilde{\mathbf{y}}$. Specifically, we employ:

- The post-processing method of Hardt et al. (2016), which aims to reduce both prediction error and demographic parity (or equalized odds) violation at the same time. We use demographic parity as the fairness constraint and produce demonstrations with this method for the Adult dataset.
- Robust fairness for log-loss-based classification (Rezaei et al., 2020), which employs distributional robustness to match target fairness constraint(s) while robustly minimizing the log loss. We use equalized odds as the fairness constraint and employ this method to produce demonstrations for the COMPAS dataset.

We repeat the process of partitioning train-all N = 50 times to create randomized partitions of train-demo and test-demo and then produce a set of demonstrations $\{\tilde{\mathbf{y}}_i\}_{i=1}^{50}$.
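The demonstration-generation protocol above can be summarized in a short sketch (ours; `fit_fair_baseline` is a hypothetical stand-in for the post-processing or fair-logloss learners and is assumed to return an object with a `predict(X, a)` method):

```python
import numpy as np

def flip_fraction(values, eps, rng):
    """Flip a random eps-fraction of binary entries."""
    values = values.copy()
    idx = rng.choice(len(values), size=int(eps * len(values)), replace=False)
    values[idx] = 1 - values[idx]
    return values

def make_demonstrations(X, y, a, fit_fair_baseline, eps=0.2, n_demos=50, seed=0):
    """Produce reference-decision vectors following the Section 4.1 protocol."""
    rng = np.random.default_rng(seed)
    demos = []
    for _ in range(n_demos):
        # Random half/half split of train-all into train-demo and test-demo.
        perm = rng.permutation(len(y))
        tr, te = perm[: len(y) // 2], perm[len(y) // 2 :]
        # Insert label and group-membership noise into the demo-producing process.
        y_tr = flip_fraction(y[tr], eps, rng)
        a_tr = flip_fraction(a[tr], eps, rng)
        model = fit_fair_baseline(X[tr], y_tr, a_tr)
        demos.append((te, model.predict(X[te], a[te])))  # (item indices, reference decisions)
    return demos
```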
4.2. Evaluation Metrics and Baselines

Predictive performance and fairness measures. Our focus for evaluation is on outperforming demonstrations in multiple fairness and performance measures. We use K = 4 measures: inaccuracy (prediction error), difference from demographic parity (D.DP), difference from equalized odds (D.EqOdds), and difference from predictive rate parity (D.PRP).

Baseline methods. As baseline comparisons, we train five different models on the entire training set (train-all) and then evaluate them on the withheld test data (test-all):

- The post-processing model of Hardt et al. (2016) with demographic parity or equalized odds as the fairness constraint (post_proc_dp and post_proc_eqodds).
- The robust fair-logloss model of Rezaei et al. (2020) with demographic parity or equalized odds as the fairness constraint (fair_logloss_dp and fair_logloss_eqodds).
- The Multiple Fairness Optimization framework of Hsu et al. (2022), which is designed to satisfy three conflicting fairness measures (demographic parity, equalized odds, and predictive rate parity) to the best extent possible (MFOpt).

Figure 4. Experimental results on the Adult and COMPAS datasets with noisy demonstrations (ϵ = 0.2). Margin boundaries are shown with dotted red lines. Each plot shows the relationship between two measures.

Hinge loss slopes. As discussed previously, each αk value corresponds to a hinge loss slope, which defines the sensitivity to the produced decisions not sufficiently outperforming the demonstrations on the kth measure. When αk is large, the model heavily weights support vector reference decisions for that particular k when minimizing subdominance. We report these values in our experiments.

4.3. Superhuman Model Specification and Updates

We use a logistic regression model $\hat{P}_\theta$ with first-order moment feature functions, $\phi(y, \mathbf{x}) = [x_1 y, x_2 y, \ldots, x_m y]^\top$, and weights θ applied independently to each item as our decision model. During the training process, we update the model parameters θ to reduce subdominance.

Sample from the model $\hat{P}_\theta$. In each iteration of the algorithm, we first sample prediction vectors $\{\hat{\mathbf{y}}_i\}_{i=1}^N$ from $\hat{P}_\theta(\cdot|\mathbf{X}_i)$ for the matching items used in the demonstrations $\{\tilde{\mathbf{y}}_i\}_{i=1}^N$. In the implementation, to produce the ith sample, we look up the indices of the items used in $\tilde{\mathbf{y}}_i$, which constructs the item set $\mathbf{X}_i$. We then make predictions with our model on this item set, $\hat{P}_\theta(\cdot|\mathbf{X}_i)$; the model produces a probability distribution for each item, which can be sampled and used as the prediction vector $\hat{\mathbf{y}}_i$.

Update model parameters θ. We update θ until convergence using Algorithm 1. For our logistic regression model, the gradient is

$$\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X}) = \phi(\hat{\mathbf{y}}, \mathbf{X}) - \mathbb{E}_{\hat{\mathbf{y}}'|\mathbf{X} \sim \hat{P}_\theta}\big[\phi(\hat{\mathbf{y}}', \mathbf{X})\big],$$

where $\phi(\hat{\mathbf{y}}, \mathbf{X}) = \sum_{m=1}^M \phi(\hat{y}_m, \mathbf{x}_m)$ is the feature function applied to the full decision vector. We employ a learning rate of η = 0.01.
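For this conditionally independent logistic model, the "observed minus expected features" form of the gradient follows from a short derivation (ours, assuming $P_\theta(\hat{y}_m = 1 | \mathbf{x}_m) = \sigma(\theta^\top \mathbf{x}_m)$ with $\sigma$ the logistic function):

$$\log \hat{P}_\theta(\hat{\mathbf{y}}\mid\mathbf{X}) = \sum_{m=1}^{M}\Big[\hat{y}_m\,\theta^\top\mathbf{x}_m - \log\big(1 + e^{\theta^\top\mathbf{x}_m}\big)\Big] \;\Longrightarrow\; \nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}\mid\mathbf{X}) = \sum_{m=1}^{M}\big(\hat{y}_m - \sigma(\theta^\top\mathbf{x}_m)\big)\,\mathbf{x}_m = \phi(\hat{\mathbf{y}},\mathbf{X}) - \mathbb{E}_{\hat{\mathbf{y}}'\sim\hat{P}_\theta}\big[\phi(\hat{\mathbf{y}}',\mathbf{X})\big].$$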
4.4. Experimental Results

After training each model (e.g., obtaining the best model weights θ* from the training data (train-all) for our superhuman approach), we evaluate each on the unseen test data (test-all). We employ hard predictions (i.e., the most probable label) using our approach at test time rather than randomly sampling.

Table 1. Experimental results on noise-free datasets, along with the αk values learned for each measure in subdominance minimization.

| Method | Adult: Prediction error | Adult: DP diff | Adult: EqOdds diff | Adult: PRP diff | COMPAS: Prediction error | COMPAS: DP diff | COMPAS: EqOdds diff | COMPAS: PRP diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| αk | 62.62 | 35.93 | 6.46 | 4.98 | 82.5 | 4.27 | 3.15 | 12.72 |
| γ-superhuman | 98% | 94% | 100% | 100% | 100% | 100% | 100% | 100% |
| Min Sub-Fair (ours) | 0.2109 | 0.0259 | 0.0067 | 0.1831 | 0.3668 | 0.0406 | 0.1247 | 0.1712 |
| MFOpt | 0.1957 | 0.0632 | 0.0775 | 0.2092 | 0.4347 | 0.0058 | 0.0695 | 0.1616 |
| post_proc_dp | 0.2125 | 0.0309 | 0.2204 | 0.3983 | 0.3460 | 0.0104 | 0.0770 | 0.1737 |
| post_proc_eqodds | 0.2139 | 0.1188 | 0.0072 | 0.3135 | 0.3634 | 0.0412 | 0.0602 | 0.1510 |
| fair_logloss_dp | 0.2812 | 0.0043 | 0.0480 | 0.1248 | 0.4676 | 0.0002 | 0.0714 | 0.1724 |
| fair_logloss_eqodds | 0.2541 | 0.1535 | 0.0301 | 0.1166 | 0.4515 | 0.1031 | 0.0291 | 0.1244 |

Noise-free reference decisions. Our first set of experiments considers learning from reference decisions with no added noise.⁴ The results are shown in Figure 3. We observe that our approach outperforms the demonstrations in all fairness measures and shows comparable performance in accuracy. The post_proc_dp baseline performs comparably to the average of the demonstrations in all dimensions, hence our approach outperforms it in all fairness measures. In comparison to MFOpt, our approach outperforms in all fairness measures but is slightly worse in prediction error. We show the experimental results along with the αk values in Table 1. Note that the margin boundaries (dotted red lines) in Figure 3 are equal to 1/αk for measure k, so there is an inverse relation between αk and the margin boundary for measure k. We observe larger values of αk for prediction error and the demographic parity difference. The reason is that these measures are already optimized in the demonstrations, so our model has to increase the αk values for those measures to sufficiently outperform them.

⁴Added noise does not imply the original dataset is noise-free.
Noisy reference decisions. In our second set of experiments, we introduce significant amounts of noise (ϵ = 0.2) into our reference decisions. We similarly add this noise to the training datasets (train-all) of the baseline methods. The results for these experiments are shown in Figure 4. We observe that, when learning from noisy demonstrations, our approach still outperforms the reference decisions. The main difference here is that, due to the noisy setting, the demonstrations have worse prediction error; regardless, our approach still achieves a competitive prediction error. We show the experimental results along with the αk values in Table 2.

Table 2. Experimental results on datasets with noisy demonstrations, along with the αk values learned for each measure.

| Method | Adult: Prediction error | Adult: DP diff | Adult: EqOdds diff | Adult: PRP diff | COMPAS: Prediction error | COMPAS: DP diff | COMPAS: EqOdds diff | COMPAS: PRP diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| αk | 29.63 | 10.77 | 5.83 | 13.42 | 29.33 | 4.51 | 3.34 | 57.74 |
| γ-superhuman | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% |
| Min Sub-Fair (ours) | 0.1937 | 0.0310 | 0.0093 | 0.1760 | 0.3600 | 0.0320 | 0.0367 | 0.1723 |
| MFOpt | 0.3157 | 0.0132 | 0.0225 | 0.2092 | 0.4597 | 0.0919 | 0.0397 | 0.1533 |
| post_proc_dp | 0.2265 | 0.1442 | 0.0879 | 0.2304 | 0.3532 | 0.0879 | 0.0884 | 0.1605 |
| post_proc_eqodds | 0.2176 | 0.1572 | 0.1396 | 0.1451 | 0.3513 | 0.1442 | 0.1584 | 0.1485 |
| fair_logloss_dp | 0.3835 | 0.0246 | 0.0577 | 0.1158 | 0.4846 | 0.0053 | 0.1455 | 0.1832 |
| fair_logloss_eqodds | 0.3776 | 0.1179 | 0.0238 | 0.1380 | 0.4870 | 0.1272 | 0.0119 | 0.1539 |

Relationship of noise to superhuman performance. We also evaluate the relationship between the amount of augmented noise in the labels and protected attributes of the demonstrations and the γ-superhuman performance achieved by our approach. As shown in Figure 5, with a slight increase in the amount of noise in the demonstrations, our approach can outperform 100% of the demonstrations and reach 1-superhuman performance. In Table 3, we show the percentage of demonstrations that each method outperforms across all prediction/fairness measures (i.e., the γ-superhuman value).

Figure 5. The relationship between the ratio of augmented noise in the labels and protected attributes of reference decisions produced by post-processing (upper) and fair-logloss (lower) and the γ-superhuman performance achieved by our approach.

Table 3. Percentage of reference demonstrations that each method outperforms in all prediction/fairness measures.

| Method | Adult ϵ = 0.0 | Adult ϵ = 0.2 | COMPAS ϵ = 0.0 | COMPAS ϵ = 0.2 |
| --- | --- | --- | --- | --- |
| Min Sub-Fair (ours) | 96% | 100% | 100% | 98% |
| MFOpt | 42% | 0% | 18% | 18% |
| post_proc_dp | 16% | 86% | 100% | 80% |
| post_proc_eqodds | 0% | 66% | 100% | 88% |
| fair_logloss_dp | 0% | 0% | 0% | 0% |
| fair_logloss_eqodds | 0% | 0% | 0% | 0% |
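Putting the earlier sketches together, the noise sweep behind Figure 5 (and the per-method tallies of Table 3) can be scripted roughly as follows. This is our own scaffolding around the hypothetical helpers `make_demonstrations`, `train_subdominance_policy_gradient`, and `sigmoid` introduced above, not the authors' experiment code, and it compares against each demonstration on that demonstration's own items:

```python
import numpy as np

def noise_sweep(X, y, a, fit_fair_baseline, measures, eps_grid=np.linspace(0.0, 0.1, 6)):
    """How gamma-superhuman performance varies with the demonstration noise ratio."""
    results = {}
    for eps in eps_grid:
        demos = make_demonstrations(X, y, a, fit_fair_baseline, eps=eps)
        X_sets = [X[idx] for idx, _ in demos]
        y_sets = [y[idx] for idx, _ in demos]
        a_sets = [a[idx] for idx, _ in demos]
        refs = [ref for _, ref in demos]
        theta, _ = train_subdominance_policy_gradient(X_sets, y_sets, a_sets, refs, measures)

        # Hard predictions at evaluation time (most probable label), as in Section 4.4.
        wins = 0
        for Xi, yi, ai, ref in zip(X_sets, y_sets, a_sets, refs):
            y_hat = (sigmoid(Xi @ theta) > 0.5).astype(int)
            f_hat = np.array([f(y_hat, yi, ai) for f in measures])
            f_ref = np.array([f(ref, yi, ai) for f in measures])
            wins += np.all(f_hat <= f_ref)
        results[float(eps)] = wins / len(demos)   # empirical gamma at this noise level
    return results
```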
5. Conclusions

In this paper, we introduce superhuman fairness, an approach to fairness-aware classifier construction based on imitation learning. Our approach avoids explicit performance-fairness trade-off specification or elicitation. Instead, it seeks to unambiguously outperform human decisions across multiple performance and fairness measures with maximal frequency. When successful, this provides important guarantees for stakeholders with a broad set of possible preferences for performance and fairness measures. We develop a general framework for pursuing this based on subdominance minimization (Ziebart et al., 2022) and policy gradient optimization methods (Sutton & Barto, 2018) that enable a broad class of probabilistic fairness-aware classifiers to be learned. Our experimental results show the effectiveness of our approach in outperforming synthetic decisions corrupted by small amounts of label and group-membership noise when evaluated using multiple fairness criteria combined with predictive accuracy.

Societal impacts. By design, our approach has the potential to identify fairness-aware decision-making tasks in which human decisions can frequently be outperformed by a learned classifier on a set of provided performance and fairness measures. This has the potential to facilitate a transition from manual to automated decisions that are preferred by all interested stakeholders, so long as their interests are reflected in some of those measures. Since the formulation only provides preference guarantees for stakeholders with nonnegatively-weighted combinations of performance and fairness measures, it may reduce the negative impact of stakeholders in human-produced decision-making who seek harmful outcomes for specific groups.

Despite these benefits, our approach also has limitations. First, when performance-fairness trade-offs can either be fully specified (e.g., based on first principles) or effectively elicited, fairness-aware classifiers optimized for those trade-offs should produce better results than our approach, which operates under the greater uncertainty cast by the noisiness of human decisions. Second, if target fairness concepts lie outside the set of measures we consider, our resulting fairness-aware classifier will be oblivious to them. Third, our approach assumes that human-demonstrated decisions are well-intentioned, noisy reflections of desired performance-fairness trade-offs. If this is not the case, then our methods could succeed in outperforming them across all fairness measures, but still not provide an adequate degree of fairness.

Future directions. We have conducted experiments with a relatively small number of performance/fairness measures using a simplistic logistic regression model. Scaling our approach to much larger numbers of measures and to classifiers with more expressive representations is of great interest. Additionally, we plan to pursue experimental validation using human-provided fairness-aware decisions in addition to the synthetically-produced decisions we consider in this paper. More broadly, other techniques that can minimize subdominance or provide generalization guarantees for stakeholders' preferences in adopting algorithmic decision-making are of significant interest.

Acknowledgements

This work was supported by the National Science Foundation Program on Fairness in AI in collaboration with Amazon under award No. 1939743. We thank our reviewers for providing constructive feedback that helped to substantially improve the framing and presentation of the paper.

References

Abbeel, P. and Ng, A. Y. Apprenticeship learning via inverse reinforcement learning. In International Conference on Machine Learning, pp. 1–8, 2004.

Aghaei, S., Azizi, M. J., and Vayanos, P. Learning optimal and fair decision trees for non-discriminative decision-making. In AAAI Conference on Artificial Intelligence, volume 33, pp. 1418–1426, 2019.

Blum, A. and Stangl, K. Recovering from biased data: Can fairness constraints improve accuracy? arXiv preprint arXiv:1912.01094, 2019.

Boyd, S. and Vandenberghe, L. Convex optimization. Cambridge University Press, 2004.

Calders, T., Kamiran, F., and Pechenizkiy, M. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pp. 13–18. IEEE, 2009.

Celis, L. E., Huang, L., Keswani, V., and Vishnoi, N. K. Classification with fairness constraints: A meta-algorithm with provable guarantees. In ACM FAT*, 2019.

Chen, L. and Pu, P. Survey of preference elicitation methods. Technical report, EPFL, 2004.

Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

Cortes, C. and Vapnik, V. Support-vector networks. Machine Learning, 20:273–297, 1995.
Dheeru, D. and Karra Taniskidou, E. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.

Donaldson, T. and Preston, L. E. The stakeholder theory of the corporation: Concepts, evidence, and implications. Academy of Management Review, 20(1):65–91, 1995.

Dowling, A. W., Ruiz-Mercado, G., and Zavala, V. M. A framework for multi-stakeholder decision-making and conflict resolution. Computers & Chemical Engineering, 90:136–150, 2016.

Goel, N., Yaghini, M., and Faltings, B. Non-discriminatory machine learning through convex fairness criteria. In AAAI Conference on Artificial Intelligence, 2018.

Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, volume 29, pp. 3315–3323, 2016.

Hiranandani, G., Narasimhan, H., and Koyejo, S. Fair performance metric elicitation. In Advances in Neural Information Processing Systems, volume 33, pp. 11083–11095, 2020.

Hsu, B., Mazumder, R., Nandy, P., and Basu, K. Pushing the limits of fairness impossibility: Who's the fairest of them all? In Advances in Neural Information Processing Systems, 2022.

Kamishima, T., Akaho, S., Asoh, H., and Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 35–50. Springer, 2012.

Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.

Larson, J., Mattu, S., Kirchner, L., and Angwin, J. How we analyzed the COMPAS recidivism algorithm. ProPublica, 9, 2016.

Liu, S. and Vicente, L. N. Accuracy and fairness trade-offs in machine learning: A stochastic multi-objective approach. Computational Management Science, pp. 1–25, 2022.

Martinez, N., Bertran, M., and Sapiro, G. Minimax Pareto fairness: A multi objective perspective. In International Conference on Machine Learning, pp. 6755–6764. PMLR, 13–18 Jul 2020.

Menon, A. K. and Williamson, R. C. The cost of fairness in binary classification. In ACM FAT*, 2018.

Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel, P., and Peters, J. An algorithmic perspective on imitation learning. Foundations and Trends in Robotics, 7(1-2):1–179, 2018.

Rezaei, A., Fathony, R., Memarrast, O., and Ziebart, B. Fairness for robust log loss classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 5511–5518, 2020.

Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT Press, 2018.

Syed, U. and Schapire, R. E. A game-theoretic approach to apprenticeship learning. In Advances in Neural Information Processing Systems, volume 20, 2007.

Vapnik, V. and Chapelle, O. Bounds on error expectation for support vector machines. Neural Computation, 12(9):2013–2036, 2000.

Zafar, M. B., Valera, I., Rodriguez, M. G., and Gummadi, K. P. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259, 2015.

Ziebart, B., Choudhury, S., Yan, X., and Vernaza, P. Towards uniformly superhuman autonomy via subdominance minimization. In International Conference on Machine Learning, pp. 27654–27670. PMLR, 2022.

Ziebart, B. D., Maas, A. L., Bagnell, J. A., Dey, A. K., et al. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, volume 8, pp. 1433–1438, 2008.

A. Proofs of Theorems

Proof of Theorem 3.3.
We first establish that the average $\alpha_k$-minimized subdominance for a single measure k,

$$\min_{\alpha_k \geq 0} \frac{1}{N}\sum_{\tilde{\mathbf{y}} \in \tilde{Y}} \text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a}) = \frac{1}{N}\sum_{\tilde{\mathbf{y}} \in \tilde{Y}} \Big[\alpha_k^*\big(\hat{f}_k - f_k(\tilde{\mathbf{y}}, \mathbf{y}, \mathbf{a})\big) + 1\Big]_+, \quad (11)$$

is a monotonic (increasing) function of $\hat{f}_k \triangleq f_k(\hat{\mathbf{y}}, \mathbf{y}, \mathbf{a})$. When $\alpha_k^* \geq 0$ is nonzero, it is minimized by defining a margin boundary at the largest support vector, $\tilde{\mathbf{y}}^{(j)}$, with $f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) \geq \hat{f}_k$. When summed over all examples, Equation (11) can be expressed as:

$$\frac{j}{N}\left[\frac{\hat{f}_k - f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})}{f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) - \hat{f}_k} + 1\right] = \frac{j}{N}\left[\frac{f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) - f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})}{f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) - \hat{f}_k}\right], \quad (12)$$

where $f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})$ denotes the average measure value of the j (smallest) support vectors. From the left-hand side of Eq. (12), we can see that when $\hat{f}_k$ is equal to the average measure of the j (smallest) support vectors, $f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})$, the subdominance is equal to the support vector frequency (j/N). This is also precisely the value of $\hat{f}_k$ at which a new support vector, with measure value $f_k(\tilde{\mathbf{y}}^{(j+1)}, \mathbf{y}, \mathbf{a})$, is added. Starting from the left-hand side of Eq. (12) with j + 1 support vectors, we show that the subdominance takes the same value of j/N when $\hat{f}_k = f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})$ (suppressing the $(\mathbf{y}, \mathbf{a})$ arguments for brevity):

$$\frac{j+1}{N}\left[\frac{f_k(\tilde{\mathbf{y}}^{(j+1)}) - f_k(\tilde{\mathbf{y}}^{(1:j+1)})}{f_k(\tilde{\mathbf{y}}^{(j+1)}) - f_k(\tilde{\mathbf{y}}^{(1:j)})}\right] \stackrel{(a)}{=} \frac{j}{N}\left[\frac{f_k(\tilde{\mathbf{y}}^{(j+1)}) - f_k(\tilde{\mathbf{y}}^{(1:j)})}{f_k(\tilde{\mathbf{y}}^{(j+1)}) - f_k(\tilde{\mathbf{y}}^{(1:j)})}\right] = \frac{j}{N},$$

where step (a) follows from $(j+1)\, f_k(\tilde{\mathbf{y}}^{(1:j+1)}) - f_k(\tilde{\mathbf{y}}^{(j+1)}) = j\, f_k(\tilde{\mathbf{y}}^{(1:j)})$. This shows that at its non-smooth points, the subdominance is not decreasing. Differentiating the right-hand side of Eq. (12) with respect to $\hat{f}_k$ yields:

$$\frac{j}{N}\left[\frac{f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) - f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})}{\big(f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) - \hat{f}_k\big)^2}\right],$$

which is nonnegative as long as $f_k(\tilde{\mathbf{y}}^{(j)}, \mathbf{y}, \mathbf{a}) \geq f_k(\tilde{\mathbf{y}}^{(1:j)}, \mathbf{y}, \mathbf{a})$, a condition that is always true by the definition of the ordered support vectors. Thus, since the subdominance is non-decreasing at both its smooth and non-smooth portions, it is a monotonic (increasing) function of $\hat{f}_k$ in each dimension k. Since the per-measure subdominances are independent and combined via summation over all dimensions k to form the entire subdominance, the sublevel sets must be convex, and the subdominance overall is therefore a quasiconvex function of $\hat{\mathbf{f}}$.
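As an informal numerical check of this monotonicity argument (our own aside, not part of the paper's proof), the $\alpha_k$-minimized subdominance can be evaluated on a grid of $\hat{f}_k$ values for randomly drawn demonstration measures and confirmed to be non-decreasing:

```python
import numpy as np

def gamma_k(f_hat_k, f_ref_k, lam=0.1, alpha_grid=np.linspace(0.0, 50.0, 5001)):
    """Brute-force alpha_k-minimized subdominance (cf. Eq. 10) for one measure."""
    f_ref_k = np.asarray(f_ref_k, float)
    values = [np.maximum(a * (f_hat_k - f_ref_k) + 1.0, 0.0).mean() + lam * a
              for a in alpha_grid]
    return min(values)

rng = np.random.default_rng(0)
f_refs = np.sort(rng.uniform(size=20))                        # demonstration measure values
grid = np.linspace(0.0, 1.2, 200)
curve = [gamma_k(f_hat, f_refs) for f_hat in grid]
assert all(b >= a - 1e-9 for a, b in zip(curve, curve[1:]))   # non-decreasing in f_hat_k
```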
Proof of Theorem 3.4. The gradient of the training objective with respect to model parameters θ is:

$$\nabla_\theta \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim \hat{P}_\theta}\Big[\sum_k \overbrace{\min_{\alpha_k \geq 0}\big(\text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) + \lambda_k \alpha_k\big)}^{\Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a})}\Big] = \mathbb{E}_{\hat{\mathbf{y}}|\mathbf{X} \sim \hat{P}_\theta}\Big[\sum_k \Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a})\,\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X})\Big],$$

which follows directly from the identity relating a function's gradient to the gradient of its logarithm:

$$\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X}) = \frac{\nabla_\theta \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X})}{\hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X})} \;\Longrightarrow\; \nabla_\theta \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X}) = \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X})\,\nabla_\theta \log \hat{P}_\theta(\hat{\mathbf{y}}|\mathbf{X}). \quad (15)$$

We note that this is a well-known approach employed by policy-gradient methods in reinforcement learning (Sutton & Barto, 2018).

Next, we consider how to obtain the α-minimized subdominance for a particular tuple $(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a})$, $\Gamma_k(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) = \min_{\alpha_k} \text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) + \lambda_k \alpha_k$, analytically. First, we note that $\text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) + \lambda_k \alpha_k$ is comprised of hinged linear functions of $\alpha_k$, making it a convex, piecewise-linear function of $\alpha_k$. This has two important implications: (1) any point of the function for which the subgradient includes 0 is a global minimum of the function (Boyd & Vandenberghe, 2004); and (2) an optimum must exist at a corner of the function, i.e., at $\alpha_k = 0$ or where one of the hinge functions vanishes:

$$\alpha_k \big(f_k(\hat{\mathbf{y}}_i) - f_k(\tilde{\mathbf{y}}_i)\big) + 1 = 0 \;\Longrightarrow\; \alpha_k = \frac{1}{f_k(\tilde{\mathbf{y}}_i) - f_k(\hat{\mathbf{y}}_i)}. \quad (16)$$

The subgradient at the jth of these points (ordered by $f_k$ value from smallest to largest and denoted $f_k(\tilde{\mathbf{y}}^{(j)})$ for the demonstration) is:

$$\partial_{\alpha_k}\Big(\text{subdom}^k_{\alpha_k}(\hat{\mathbf{y}}, \tilde{Y}, \mathbf{y}, \mathbf{a}) + \lambda_k\alpha_k\Big)\Big|_{\alpha_k = \big(f_k(\tilde{\mathbf{y}}^{(j)}) - f_k(\hat{\mathbf{y}})\big)^{-1}} = \lambda_k + \frac{1}{N}\sum_{i:\, \alpha_k (f_k(\hat{\mathbf{y}}) - f_k(\tilde{\mathbf{y}}^{(i)})) + 1 > 0} \big(f_k(\hat{\mathbf{y}}) - f_k(\tilde{\mathbf{y}}^{(i)})\big) + \frac{1}{N}\Big[0,\; f_k(\hat{\mathbf{y}}) - f_k(\tilde{\mathbf{y}}^{(j)})\Big],$$

where the final bracketed expression indicates the range of values added to the constant value preceding it. The smallest j for which the largest value in this range is positive must contain 0 in its corresponding range, and thus provides the j value for the optimal $\alpha_k$.

Proof of Theorem 3.5. We first recall generalization guarantees for support vector machines (SVMs) (Cortes & Vapnik, 1995) based on leave-one-out cross validation (LOOCV) that our approach leverages. For support vector machines, examples that are not support vectors incur zero loss and do not actively constrain the SVM parameters. Thus, when these examples are removed, the decision boundary does not change and therefore no cross-validation loss is incurred on any left-out example during LOOCV. Due to this, the support vector frequency is an upper bound on the leave-one-out cross-validation error, which is an (almost) unbiased estimate of the generalization inaccuracy (Vapnik & Chapelle, 2000).

Since subdominance is quasiconvex instead of convex, this analysis is slightly more complicated. Specifically, it requires the set of realizable f measures to be convex. The intersection of the sublevel sets of the quasiconvex subdominance (Theorem 3.3) with a convex set of feasible measures is also convex, so the constrained subdominance minimization problem (minimizing subdominance over the set of realizable measures for the family of possible $P_\theta$) is also quasiconvex. As a result, no local optima exist that are not global optima. Since the non-support vectors do not actively constrain the global optima, removing them does not change the global optima, and therefore they do not contribute any loss to the leave-one-out cross-validation error. The remaining argument then follows directly from the SVM LOOCV analysis.

B. Additional Results

In the main paper, we only included plots that show the relationship of each fairness metric with prediction error. To show the relation between each pair of fairness metrics, Figures 6 and 7 present the remaining plots that were omitted from Figures 3 and 4, respectively.

Figure 6. The trade-off between each pair of: difference of Demographic Parity (D.DP), Equalized Odds (D.EqOdds), and Predictive Rate Parity (D.PRP) on test data using noiseless training data (ϵ = 0) for the Adult (top row) and COMPAS (bottom row) datasets.
Figure 7. The trade-off between each pair of: difference of Demographic Parity (D.DP), Equalized Odds (D.EqOdds), and Predictive Rate Parity (D.PRP) on test data using noisy training data (ϵ = 0.2) for the Adult (top row) and COMPAS (bottom row) datasets.

B.1. Experiment with more measures

Since our approach is flexible enough to accept a wide range of fairness/performance measures, we extend the experiment on Adult to K = 5 measures. In this experiment, we use Demographic Parity (D.DP), Equalized Odds (D.EqOdds), False Negative Rate (D.FNR), False Positive Rate (D.FPR), and Prediction Error as the measures on which to outperform the reference decisions. The results are shown in Figure 8.
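Extending the earlier sketches to this K = 5 setting only requires adding the two extra group-difference measures. A possible version (ours, mirroring the pattern of Eqs. (1)–(3) and reusing the hypothetical `group_rate`, `d_dp`, and `d_eqodds` helpers from above):

```python
def d_fnr(y_hat, y, a):
    """False negative rate difference across groups."""
    return abs(group_rate((y_hat == 0) & (y == 1) & (a == 1), (y == 1) & (a == 1))
               - group_rate((y_hat == 0) & (y == 1) & (a == 0), (y == 1) & (a == 0)))

def d_fpr(y_hat, y, a):
    """False positive rate difference across groups."""
    return abs(group_rate((y_hat == 1) & (y == 0) & (a == 1), (y == 0) & (a == 1))
               - group_rate((y_hat == 1) & (y == 0) & (a == 0), (y == 0) & (a == 0)))

# K = 5 measure list for the extended Adult experiment.
measures_k5 = [lambda yh, y, a: (yh != y).mean(),
               lambda yh, y, a: d_dp(yh, a),
               d_eqodds, d_fnr, d_fpr]
```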
Figure 8. The trade-off between each pair of: difference of Demographic Parity (D.DP), Equalized Odds (D.EqOdds), False Negative Rate (D.FNR), False Positive Rate (D.FPR), and Prediction Error on test data using noiseless training data (ϵ = 0) for the Adult dataset.