# meritocratic_fairness_for_crosspopulation_selection__2165be67.pdf

Meritocratic Fairness for Cross-Population Selection

Michael Kearns 1 Aaron Roth 1 Zhiwei Steven Wu 1

We consider the problem of selecting a pool of individuals from several populations with incomparable skills (e.g. soccer players, mathematicians, and singers) in a fair manner. The quality of an individual is deﬁned to be their relative rank (by cumulative distribution value) within their own population, which permits cross-population comparisons. We study algorithms which attempt to select the highest quality subset despite the fact that true CDF values are not known, and can only be estimated from the ﬁnite pool of candidates. Speciﬁcally, we quantify the regret in quality imposed by meritocratic notions of fairness, which require that individuals are selected with probability that is monotonically increasing in their true quality. We give algorithms with provable fairness and regret guarantees, as well as lower bounds, and provide empirical results which suggest that our algorithms perform better than the theory suggests.

1. Introduction

Consider the following common academic (or similar) hiring scenario: The dean has promised your department 3 faculty slots, in any areas. Your goal is to hire the best candidates possible but how should you identify them? An immediate problem is that candidates are incomparable across subﬁelds, because, among other things, standards of publication, citation counts, and letter-writing styles can vary considerably across subﬁelds. An attractive way to rank candidates is according to how strong they are relative to others working in the same ﬁeld, to whom they are directly comparable. If we model each subﬁeld as corresponding to a different distribution over metrics that are monotonically increasing in candidate quality, this is the value we get when we evaluate the CDF function of the distribution on a candidate s realized value. But because

1University of Pennsylvania, Philadelphia, USA. Correspondence to: Zhiwei Steven Wu <steven7woo@gmail.com>.

Proceedings of the 34 th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

the number of candidates each year is small, simply comparing each candidate to their direct competitors this year i.e. taking their empirical CDF values as truth would lead to a noisy ranking: it could be that due to chance, the best candidate this year in subﬁeld A would be a mediocre candidate in a typical year, and the top two candidates in subﬁeld B would each be the top candidate in a typical year. We would prefer to evaluate our success by considering the unknown true CDF value of each candidate.1 Similar situations, in which we must select a high quality set of candidates from multiple, mutually incomparable groups, arise frequently. Some afﬁrmative action policies are premised on the assertion that SAT scores and other measures may not be directly comparable across different groups (e.g. due to only advantaged groups having the ﬁnancial resources for test preparation courses and multiple retakes).

For various reasons, in these settings we may also be concerned with the fairness of our choices.2 But what should fairness mean? In this paper, we take inspiration from (Dwork et al., 2012) who propose that fairness should mean that similar individuals are treated similarly , where similarity is measured with respect to some task speciﬁc metric. In our setting, the natural task-speciﬁc metric is the true within-group CDF value for each individual. On its own, this is compatible with the goal of selecting the best candidates, but in our work, the main obstacle is that we do not know the true CDF value of each individual, and can only approximate this from data. We study the degree to which fairness and optimality are compatible with one another in this setting.

1.1. Our Results

We study a setting in which we wish to select k individuals out of a pool of n for some task. The individuals are drawn from d populations, each represented by a different

1Letters of recommendation often seek to communicate this information, with statements like This candidate is among the top 5 students I have seen in my 16 years as a professor. 2With respect to men s and women s sports, equal opportunity is legislated in Title IX. With respect to faculty hiring, fairness concerns can arise because the proportion of women can vary substantially across subﬁelds. For example, as reported in (Cohoon et al., 2011), the percentage of female authors varies from 10% to 44% across ACM conferences, when averaged over the 10 year period from 1998-2008.

Meritocratic Fairness for Cross-Population Selection

distribution over real numbers.3 The number of draws from each distribution may differ. The quality of an individual is deﬁned to be their (true) CDF value, as evaluated on the distribution from which they were drawn. An algorithm is evaluated based on the (expected) quality of the k individuals it selects.

The meritocratic fairness deﬁnition we propose informally asks that lower quality individuals are never (probabilistically) favored over higher quality individuals. When formulating this deﬁnition, we have a choice as to how to incorporate randomness. The strongest formulation possible (ex-post fairness) does not involve randomness, and simply requires that every individual actually selected has quality at least that of every individual not selected. The weakest formulation (ex-ante fairness) incorporates the randomness of the selection of the population from the underlying distribution, and informally requires that for any pair of individuals, the higher quality individual is selected with weakly higher probability than the lower quality individual, where the randomness is over the realization of the population from the underlying distributions, as well as any internal randomness of the mechanism. An intermediate formulation (ex-interim fairness) requires informally that higher quality individuals be selected with weakly higher probability than lower quality individuals, where the probability is computed over the randomness of the mechanism, but not over the selection of the population. Roughly speaking, these choices correspond to what an individual may know and still be satisﬁed by a promise of fairness . Individuals should be satisﬁed with ex-post fairness even after the choices of the mechanism are made, with full knowledge of the applicant pool that is, they should be satisﬁed with the actual outcome, regardless of the algorithm used to reach it. In contrast, individuals with full knowledge of the applicant pool should still be satisﬁed with ex-interim fairness before the mechanism makes its decisions that is, they should feel satisﬁed that the algorithm used is fair. An individual should only be satisﬁed by ex-ante fairness if she has no knowledge of the applicant pool (and so can consider it a random variable) before the choices are made.

Given such a spectrum of fairness constraints, we observe that the strongest ex-post fairness is impossible to achieve, whereas the weakest ex-ante fairness is sometimes easy to achieve: when the population sizes are the same, it is satisﬁed by the mechanism that simply selects the k individuals with highest empirical CDF values.4 Our main results

3We study the simple setting in which each individual is represented by a 1-dimensional score e.g. a credit score, a time in the 100m dash, etc. which itself may encapsulate or summarize many features into a single value. Generalizing this work to richer representations is an interesting direction for future work. 4However, for the cases in which the populations are not the same size, we do not know of better utility guarantees for ex-ante

therefore concern the cost (in terms of the expected quality of the selected applicants) of asking for the stronger notion of ex-interim fairness. We show that satisfying an exact variant of this constraint requires the selection algorithm to select uniformly at random amongst all individuals, and hence obtain only trivial utility guarantees, but that subject to an approximate relaxation of this constraint, it is possible to recover asymptotically optimal utility bounds. We show that when we further relax the problem, to allow the algorithm to select approximately k individuals (rather than exactly k), it is possible to recover asymptotically optimal utility bounds while satisfying ex-post fairness guarantees within each sub-population, and approximate ex-interim fairness guarantees across populations. We summarize our results in Table 1. We complement our theoretical results with empirical simulations which emphasize that both the utility and fairness guarantees of our algorithms are better in practice than our theorems promise.

Finally, we remark on an interesting property of our upper bounds: they are oblivious, in the sense that they do not make use of the raw scores associated with each individual only their empirical CDF ranking. As such, our upper bounds can be viewed as universal distributions over permutations (of empirical CDF rankings) that satisfy a fairness guarantee, rather than algorithms. Our lower bounds apply not just to oblivious algorithms, but to any algorithm, even those that can make use of raw scores (or indeed, even knowledge of the family of distributions from which populations are drawn).

1.2. Related Work

This paper ﬁts into a rapidly growing line of work studying fairness in learning settings that is now too large to summarize fully, and so we discuss only the most closely related work. Our deﬁnition of fairness is in the spirit of (Dwork et al., 2012), who propose that individual fairness should mean that similar individuals are treated similarly with respect to some underlying task-speciﬁc metric. As with the work of (Joseph et al., 2016; Jabbari et al., 2016), we deﬁne the metric to be a measure of quality already present in the model (in our case, the CDF values of individuals) but unknown to the algorithm, except through samples. It is this necessity to learn the underlying metric that poses the tension between the fairness constraint and the accuracy goal. Although in this line of work, we adopt a deﬁnition that merely requires better individuals be treated better according to the true unknown metric, this necessarily requires that similar individuals be treated similarly with respect to empirical estimates of the metric.

Technically, our work includes adaptations of techniques in

fairness than those we derive for the stronger notion of ex-interim fairness.

Meritocratic Fairness for Cross-Population Selection

Exact Fairness Approximate Fairness Ex-Ante Regret O(1/n) (Lemma 2.8) Regret O(1/n) (Lemma 2.8) Ex-Interim Regret Ω(1) (Theorem 5.1). Regret O(

k/n) (Theorem 3.7) Ex-Post Impossible Regret O(

k/n)* (Theorem 4.2)

Table 1. An informal summary of results. The bounds are stated in the case when the populations have sizes within a constant factor of one another see the theorem statements for the precise bounds. When the population sizes are the same. *Exact ex-post fairness within each population, approximate ex-interim fairness between populations, and selects approximately k individuals.

differential privacy (Dwork et al., 2006). Speciﬁcally, we adopt variants of the report noisy max algorithm (Dwork & Roth, 2014), and Raskhodnikova and Smith s exponential mechanism for scores with varying sensitivities (Raskhodnikova & Smith, 2016), which is itself a variant of the exponential mechanism (Mc Sherry & Talwar, 2007).

2. Model and Preliminaries

There are d different populations, indexed by j. For each population j, there is a pool of candidates with their raw scores (and henceforth observations) drawn i.i.d. from some unknown continuous distribution Fj over R. Let F = F1 Fd denote the product distribution. We will slightly abuse notation and write xij to denote both the individual i in the population j and her associated observation, and write X to denote the set of all candidates. Let mj be the size of the candidate pool from population j, n = P

j mj be the size of the total population, and m = minj mj be the smallest population size. Each individual xij is associated with the following values.

A cumulative distribution function (CDF) value Fj(xij) = Pr Fj[x < xij],5 and an empirical CDF value b Fj(xij) = 1 mj Pmj i =1 1[x < xi j].

A complementary cumulative distribution function (CCDF) value: pij = 1 Fj(xij) and an empirical CCDF value ˆpij = 1 mj Pmj i =1 1[x xi j].

A selection algorithm A takes all the n observations X drawn from different distributions as input, and (randomly) selects k individuals as outputs. We will write A(X, xij) (or Aij for simplicity) to denote the selection probability over the individual xij. The utility for selecting an individual xij is her true CDF value Fj(xij). Equivalently, the loss for selecting an individual xij is the true CCDF value pij. The loss for an algorithm A on input X is then deﬁned as

L(A, X) = 1

xij X A(X, xij)(1 Fj(xij))

5We adopt a slightly different deﬁnition from the standard one: Fj(xij) = Pr Fj[x xij].

and the expected loss of the algorithm is EX F [L(A, X)].

2.1. Fairness Formulation

Our goal is design selection algorithms subject to a meritocratic fairness notion that requires that less qualiﬁed candidates (in terms of CDF values) are never preferred over more qualiﬁed ones. We will present three different formulations of such notion based on the different forms of randomness we are considering.

First, the weakest formulation is the following ex-ante fairness, which guarantees fairness over the randomness of both the random draws of the candidates and the coin ﬂips of the algorithm.

Deﬁnition 2.1 (Ex-Ante Fairness). An algorithm A satisﬁes ex-ante fairness if for any pair of candidates xij, xi j with CDF values Fj(xij) > Fj (xi j ), their selection probabilities (when they are in the pool) satisfy

E [A(X, xij)] E [A(X, xi j )]

where the expectations are taken over the (n 2) random draws of all the other candidates.

An intermediate formulation of fairness is the following exinterim fairness, which guarantees fairness over the randomness of the algorithms (but not the realizations of X) on almost all of inputs drawn from the distribution.

Deﬁnition 2.2 (Exact Ex-Interim Fairness). Let δ (0, 1). An algorithm A satisﬁes δ-exact ex-interim fairness if with probability at least 1 δ over the realized observations X, for any pair of individuals xij, xi j X,

A(X, xij) > A(X, xi j ) only if Fj(xij) > Fj (xi j )

We also consider the following relaxation:

Deﬁnition 2.3 (Approximate Ex-Interim Fairness). An algorithm A satisﬁes (ε, δ)-approximate ex-interim fairness if with probability at least 1 δ over the realized observations X, for any pair of individuals xij, xi j X,

A(X, xij) > eε A(X, xi j ) only if Fj(xij) > Fj (xi j )

Remark 2.4. We note that this relaxation of ex-interim fairness bears a similarity to the deﬁnition of differential

Meritocratic Fairness for Cross-Population Selection

privacy (Dwork et al., 2006), and indeed, techniques from the differential privacy literature will prove useful in designing algorithms to satisfy it.

Perhaps the strongest formulation is the following ex-post fairness condition, which requires that an individual is selected only if a more qualiﬁed individual is also selected. Deﬁnition 2.5 (Ex-post Fairness). An algorithm A satisﬁes ex-post fairness if any pair of individuals xij and xi j such that Fj(xij) > Fj (xi j ), the individual xi j is admitted only if xij is also selected.

Note that any algorithm that satisﬁes ex-post fairness must admit a preﬁx of individuals from each population, which is also sufﬁcient to guarantee within population ex-post fairness, but that this is not sufﬁcient to satisfy the constraint between populations.

It is not hard to see that satisfying ex-post fairness in the generality that we have deﬁned it is impossible, since it requires perfectly selecting the k true best CDF values from only sample data. Thus, the primary focus of our paper is on ex-interim fairness. Unless we specify differently, the term fair and fairness refer to ex-interim fairness.

2.2. Oblivious Algorithms

A special class of selection algorithms is the class of oblivious algorithms, which select candidates with probabilities that only depend on their empirical CDF values, not on their observations. Deﬁnition 2.6 (Oblivious Algorithms). An algorithm A is oblivious if for any pair of input observations X and X

that induce the same empirical CDF values over the candidates, A(X) = A(X ).

All of our algorithms presented in this paper are oblivious. As a result, we need to make no assumption on the underlying distributions to achieve both fairness and utility guarantees. Moreover, the utility guarantee of an oblivious algorithm can be characterized as follows. Lemma 2.7. The expected loss achieved by any oblivious algorithm A is the expected average empirical CCDF values among the selected candidates.

A very simple example of an oblivious algorithm is GREEDY which selects the k individuals with the highest empirical CDF values (breaking ties uniformly at random). Lemma 2.8. Suppose that the populations sizes are the same, that is, mj = m for each j. The algorithm GREEDY satisﬁes ex-ante fairness and has an expected loss at most k 2n + 1

To simplify our bounds on the expected loss, we will use k/2n as our benchmark and deﬁne the regret of an algorithm A to be R(A) = EX F [L(A, X)] k 2n.

3. An Approximately Fair Algorithm

In this section, we provide an algorithm that satisﬁes approximate fairness in the sense of Deﬁnition 2.3. We will present our solution in three steps.

1. First, we provide conﬁdence intervals for the candidates CCDF values pij based on their empirical CCDF values ˆpij. As we show, our bound has a tighter dependence on pij, which gives better utility guarantee than using the standard DKW inequality of Dvoretzky et al. (1956).

2. Next, we give a simple subroutine NOISYTOP that randomly selects k individuals out of n based on their scores . We show that individuals with similar scores will have close selection probabilities under this subroutine. This subroutine is similar to the Report Noisy Max algorithm (Dwork & Roth, 2014).

3. Then, we will use the deviation bound in the ﬁrst step to assign scores to the candidates. We show that running NOISYTOP based on these scores give approximate fairness and low regret guarantees. These scores are computed in a way similar to the generalized exponential mechanism of Raskhodnikova & Smith (2016).

3.1. Conﬁdence Intervals for CCDF Values

We will ﬁrst give the following concentration inequality specialized for the uniform distribution over (0, 1).

Lemma 3.1. Fix any n N. Let x1, x2, . . . , xn be i.i.d. draws from the uniform distribution over (0, 1). Then with probability at least 1 δ, for any p (0, 1),

where ˆp = 1

n Pn i=1 1[xi < p].

To translate this result into a deviation bound on the CCDF values, ﬁrst note that CCDF values for any distribution Fj are drawn from the uniform distribution over (0, 1), so the bound applies immediately to the CCDF values. By a standard calculation, we can also get a bound in terms of the empirical CCDF value ˆpij as shown below.

Lemma 3.2. For each j [d], draw mj points Xj = {xij}mj i=1 i.i.d. from Fj. For each point xij, let pij be its true CCDF value and ˆpij be its empirical CCDF value in Fj. Then with probability at least 1 δ over the n random draws,

|pij ˆpij| 9

where m = minj mj and n = Pd j=1 mj.

Meritocratic Fairness for Cross-Population Selection

Remark 3.3. The standard DKW inequality gives a bound of O( p

1/m). Our bound gives a tighter dependence for small empirical CCDF value ˆpij. For example, when ˆpij = 1/m, we obtain a bound of O(1/m). 6

3.2. The Noisy Top Subroutine

Given a set of individuals with scores Y = {y1, . . . , yn}, the subroutine NOISYTOP will ﬁrst perturb each score by adding independent noise drawn from the Laplace distribution,7 and output the k individuals with the minimum noisy scores (ties broken arbitrarily). We will now show that NOISYTOP has the following desirable Lipschitz property individuals with similar scores are chosen with similar probabilities. This is crucial for obtaining approximate fairness.

Algorithm 1 NOISYTOP({y1, y2, . . . , yn}, α, k)

Input: n numbers {y1, y2, . . . , yn} and parameter α

For each i [n]: let yi = yi + Lap(α) Output: the k indices with the smallest yi

Lemma 3.4. Let i, j [n] be such that = yi yj 0. Let Pi and Pj denote the probabilities that the two indices i and j are output by NOISYTOP({y1, y2, . . . , yn}, α) respectively. Then Pi Pj Pi exp(2 /α).

Proof. Let yi and yj be the noisy scores for i or j. We will introduce a new random variable Q to denote the value of the (k 1)-st lowest noisy value, not counting yi and yj. We will slightly abuse notation and write Pr[R = r] as a shorthand for the pdf of any random variable R evaluated at r. The ratio Pi

Pj can then be written as R

q R Pr[Q = q] R

t R Pr[ yj = t] Pr[ yi < min{t, q}]dt dq R

q R Pr[Q = q] R

t R Pr[ yi = t] Pr[ yj < min{t, q}]dt dq (1)

For any ﬁxed value r R, we also have the following based on the Laplace distribution,

Pr[ yi = r] Pr[ yj = r] =

1 2α exp |r yi|

1 2α exp |r yj|

= exp |r yj|

By the triangle inequality we know that |r yj| |r yi| . It follows that for any t and q,

exp( /α) Pr[ yi = t]

Pr[ yj = t] exp( /α) and,

6As shown in Corollary 3.8, this also gives an improvement over regret when k is small ( O(

k/n) versus O( p

1/n)). 7The Laplace distribution Lap(b) has density function f(x) = exp( |x|/b).

Pr[ yi < min{q, t}] Pr[ yj < min{q, t}] =

r<min{q,t} Pr[ yi = r] dr R

r<min{q,t} Pr[ yj = r] dr

Plugging these bounds into Equation (1), we get Pi Pj exp(2 /α). The inequality that Pi/Pj 1 follows directly from yi yj.

3.3. Wrapping Up

We will present our algorithm FAIRTOP by combining the methods in the previous two sections. In the light of Lemma 3.2, we will deﬁne the following conﬁdence interval width function on the empirical CCDF values

c(ˆp) = 9 ln(2n/δ) p

and a normalized score function s(ˆp) = ˆp/c(ˆp). We have that any candidate is guaranteed a score not much lower than a less qualiﬁed one.

Lemma 3.5. Let x, y [0, 1] be the (true) CCDF values for two individuals such that x y. Let ˆx, ˆy be the empirical CCDF values respectively. Suppose that |x ˆx| c(ˆx) and |y ˆy| c(ˆy), then s(ˆx) s(ˆy) 1.

Algorithm 2 FAIRTOP(X = {xij}, ε, δ, k, m)

Input: candidates observations X, fairness parameters ε, δ, number of selected individuals k, and smallest population size m

For each individual xij X

Compute the empirical CCDF value ˆpij and the associated score s(ˆpij) Run NOISYTOP({s(ˆpij}, 2/ε, k)

Our algorithm FAIRTOP (presented in Algorithm 2) proceeds by ﬁrst computing the normalized score of every candidates based on their empirical CCDF values, and then calling NOISYTOP to output k individuals. We will ﬁrst establish the approximate fairness guarantee.

Theorem 3.6. The algorithm FAIRTOP instantiated with parameters ε and δ satisﬁes (ε, δ)-approximate fairness.

Proof sketch. By Lemma 3.2, we know that with probability 1 δ, for every candidate xij, the true and empirical CCDF values satisfy |pij ˆpij| c(ˆpij). This means that for any pair of individuals a and a with CCDF values pa < pa (that is, a is more qualiﬁed than a ), we also have s(ˆpa) s(ˆpa ) + 1 by Lemma 3.5. Finally, by the result of Lemma 3.4 and the instantiation of NOISYTOP, we guarantee that a will not be selected with substantially higher probability: Aa exp(ε) Aa , which recovers the approximate fairness guarantee.

Meritocratic Fairness for Cross-Population Selection

Our algorithm also has a diminishing regret guarantee:

Theorem 3.7. Fix any β (0, 1). Then with probability at least 1 β, the algorithm FAIRTOP instantiated with fairness parameters ε and δ has regret bounded by 1 ε

polylog(n, 1/β, 1/δ)

Thus for example, as the smallest sampled population size m grows (ﬁxing k and ε), our regret rapidly approaches 0. To understand the utility guarantee better, we will state the regret bound for the following natural scaling, which is also examined in the simulations of Section 7:

Corollary 3.8. Consider an instance with two population of sizes m1 and m2 such that m1 = αm2 for some constant α 1. Suppose we instantiate FAIRTOP with parameter ε = Θ(1), then the regret is at most O

4. Within Population Ex-Post Fairness

In this section, we provide a variant of the FAIRTOP algorithm that satisﬁes approximate ex-interim fairness across different populations, but also ex-post fairness within each population. The key idea here is that since we know the ranking of the candidates true qualities within each population, we can guarantee ex-post fairness within populations as long as we select a preﬁx of candidates in each population. This will however come at a cost our algorithm will no longer select exactly k individuals, but only approximately k individuals.

Similar to FAIRTOP, the algorithm ABOVETHRE (presented in Algorithm 3) also computes the normalized scores for each candidate. Instead of perturbing the scores, ABOVETHRE computes a noisy threshold Tj for each population by adding Laplace noise to s(k/n). The algorithm then selects all candidates with scores above the noisy threshold. Because the algorithm selects a preﬁx of the raw scores within each population, within population ex-post fairness is immediate. We also show that ABOVETHRE also achieves approximate ex-interim fairness.

Theorem 4.1. The algorithm ABOVETHRE instantiated with fairness parameters ε and δ satisﬁes both (ε, δ)- approximate ex-interim fairness and ex-post fairness within each population.

Note that were the algorithm to take all the individuals with scores above s(k/n), it would select a (k/n) fraction from each population and therefore select k people in total. Due to the noisy thresholds, the algorithm will only select approximately k individuals. We will now establish the utility guarantee of ABOVETHRE and show that the number of selected individuals is roughly k O(

k) when m = Θ(n).

Algorithm 3 ABOVETHRE(X = {xij}, ε, δ, k, m)

Input: observations X, fairness parameters ε, δ, target number of selected individuals k, smallest population size m

For each individual xij

Compute her empirical CCDF value ˆpij and the associated score s(ˆpij) For each population j

Compute a noisy threshold Tj = s(k/n)+νj where νj is drawn from Lap(1/ε) Select candidates xij with scores s(ˆpij) above Tj

Theorem 4.2. Fix any β (0, 1). With probability at least 1 β, the algorithm ABOVETHRE instantiated with fairness parameters ε and δ has regret bounded by 1 mε2 +

polylog(n, d, 1/δ, 1/β),

and selects a total number of ˆk individuals with

polylog(n, d, 1/δ, 1/β)

5. Lower Bound for Exact Fairness

We will show that it is impossible to achieve exact exinterim fairness with non-trivial regret guarantees.

Theorem 5.1. Fix any δ < 0.0002 and any δ-fair algorithm A. There exist two distributions F1 and F2 over the two populations such that if algorithm A takes m observations drawn from each distribution as input, and must select at least k = Ω(m1/2+α) individuals for any α > 0, A incurs a regret of Ω(1).

The main idea is to show that there exist distributions F1 and F2 such that any fair algorithm will essentially have to select uniformly at random across Ω(m) individuals, which incurs regret Ω(1). We will proceed via Bayesian reasoning. Suppose that the observations from the two populations are drawn from two different unit-variance Gaussian distributions N(µ1, 1) and N(µ2, 1), and both means µ1 and µ2 are themselves drawn from the prior N(0, 1). The following lemma characterizes the posterior distribution on the mean given a collection of observations.

Lemma 5.2. (Murphy, 2007) Suppose that a mean parameter µ is drawn from a prior distribution N(0, 1). Let D = (x1, x2, . . . , xm) be m i.i.d. draws from the distribution N(µ, 1). Then the posterior distribution of µ conditioned on D is the Gaussian distribution N(ˆµ, σ2), where ˆµ =

i xi m+1 and σ2 = 1 m+1.

Meritocratic Fairness for Cross-Population Selection

The result above shows that conditioned on any m draws from the Gaussian distribution, there is constant probability that the true mean will be bounded away from the posterior maximum likelihood estimate by at least Ω(1/ m). With this observation, we will partition the real line into the following intervals: Given any posterior mean ˆµ any integer r 1, let the two intervals I+ r (ˆµ) and I r (ˆµ) be

I+ r (ˆµ) = [ˆµ + (r 1)/ m, ˆµ + r/ m] and

I r (ˆµ) = [ˆµ r/ m, ˆµ (r 1)/ m]

The intervals capture the uncertainty we have regarding the CDF values of the observations xij. Let Xj = (x1j, x2j, . . . , xmj} denote the m draws from each distribution Fj, ˆµj =

i xij m+1 be the posterior mean for µj conditioned on the draws. Consider two individuals xi1 and xi 2 such that xi1 I+ r (ˆµ1) and xi 2 I+ r+1(ˆµ2). Even though (xi 2 ˆµ2) > (xi1 ˆµ1), there is a constant probability that their CDF values satisfy F1(xi1) > F2(xi 2). Any fair algorithm therefore must play these two individuals in these neighboring intervals with equal probabilities.

Next, we show that with high probability over the realizations of the true mean µ and the m draws X, all of the O(m log m) intervals around the posterior mean will be hit by points in X.

Lemma 5.3. Fix any c < 1 and β (0, 1). Let mean µ be drawn from N(0, 1), ˆr = cm log m + 1 and X = (x1, x2, . . . , xm) be m i.i.d. draws from N(µ, 1). Let ˆµ =

i xi m+1 . Then except with probability

2ˆr exp m(1/2 c/2)

+ 3β over the joint realizations of µ and X, the following holds

for all r ˆr 2 p

2 ln(2/β), there exist two draws x+ r , x r X such that x+ r I+ r (ˆµ) and x r I r (ˆµ);

the number of points that are bigger than ˆµ + (ˆr 2 p

2 ln(2/β))/ m is no more than

m1 c/2 + m1/2 c/4p

We show that the event that all of the consecutive intervals are occupied for both populations will force a fair algorithm to play all the individuals in these intervals with equal probability. More formally, ﬁx any c and sufﬁciently small constant β, let ˆr = cn log n + 1 and let Y = {xij | xij I+ r (ˆµj) xij I r (ˆµj) for some r ˆr 2 p

2 ln(2/β)}. Consider the following events:

FULLCHAIN(X1, X2): for all r ˆr 2 p

2 ln(2/β) and j {1, 2}, both the intervals I+ r (ˆµj), I r (ˆµj) contain at least one point in Xj,

UARCHAIN(A, X1, X2): the points in Y are selected by the algorithm A with equal probabilities.

Lemma 5.4. Fix any δ-fair algorithm A for some δ < 0.0002. With probability at least 1/2 over the realizations of µ1, µ2, X1 and X2, the event FULLCHAIN(X1, X2) implies UARCHAIN(A, X1, X2).

Proof sketch for Theorem 5.1. The combination of Lemmas 5.3 and 5.4 shows that with constant probability over µ1, µ2 and X, A will need to select Ω(m) individuals with equal probabilities, which leads to an expected regret of Ω(1) over the draws of µ1, µ2 This means there exist distributions F1 = N(µ 1, 1) and F2 = N(µ 2, 1) under which A incurs Ω(1) regret.

6. Sequential Batch Setting

We brieﬂy mention an extension to the sequential batch setting, in which the algorithm selects individuals in T rounds. In each round t, for each population j, there are mj new candidates with their observations drawn i.i.d. from the distribution Fj, At each round t, the algorithm needs to select k individuals from this pool. Let St j be the set of observations from population j accumulated after the ﬁrst t rounds. In particular, for any observation x and population j, let the historical CCDF value be ˆqt j(x) = 1 |mjt| P

x St j 1[x < x]. As t grows large, the empirical CCDF values become better estimates for the true CCDF values. We give a variant of the FAIRTOP algorithm that achieves (ε, δ)-approximate fairness in every round, and in-

curs average regret over time diminishing as O 1 ε

7. Simulations

We conclude by discussing some illustrative simulation results for FAIRTOP, along with comparisons to simpler algorithms without fairness guarantees. The simulations were conducted on data in which the raw scores for each population i = 1, 2 were drawn from N(µi, 1) respectively, and the µi themselves were chosen randomly from N(0, 1). Thus beﬁtting the motivation for our model, the raw scores are not directly comparable between populations. While we varied the population sizes, they were held in the ﬁxed ratio m1/m2 = 2 and k = 0.1(m1 + m2) .

For such a simulation with population sizes m1 = 100 and m2 = 50, Figure 1(a) shows the underlying scores computed by FAIRTOP (which depend only on the empirical CDF values) for each member of both populations, but sorted according to their true CDF values so that the transpositions that occur between emprical and true CDFs are apparent; the red points are for the larger population and green for the smaller. Overlaid on this arc of underlying scores is a black plot illustrating sample post-noise scores

Meritocratic Fairness for Cross-Population Selection

Figure 1. (a) Sample underlying and noisy scores as a function of true CDF rank for ε = 10. (b) Empirical distribution of selection counts as a function of true CDF rank for ε = 10. (c) Regret as a function of population sizes, for ε = 1 (green), 5 (red), and 10 (blue).

when ε = 10. As we can see, re-sorting the points by their noisy scores will result in a signiﬁcant amount of additional reshufﬂing.

Figure 1(b) illustrates the induced distribution over chosen individuals; here we show the results of resampling the Laplace noise (again at ε = 10) for 100,000 trials, and choosing the top k post-noise scores across populations. The ordering is again by true CDF values and the same color coding is used. At this value of ε the distribution is biased towards better true CDF values but still enjoys strong fairness properties. For example, the unfairness ratio (maximum ratio of the number of times a worse CDF value is chosen to a better CDF value is chosen) is only 1.56 (note that this is substantially stronger than the bound of e10 guaranteed by our theorem). It is also visually clear that FAIRTOP is treating similar CDF values similarly, both within and between populations.

Nevertheless, the regret of FAIRTOP for these population sizes and ε is nontrivial (roughly 0.20 regret compared to the best k true CDF values). Of course, as per Theorem 3.7 by increasing ε we can reduce regret to any desired level at the expense of weakened fairness guarantees. However, as per Corollary 3.8 even for ﬁxed ε (and therefore ﬁxed fairness properties), regret diminishes rapidly in the natural scaling where the population sizes grow, but in a ﬁxed ratio. This is illustrated empirically in Figure 1(c), where for varying choices of ε we plot regret as m1, m2 with m1/m2 = 2.

We now brieﬂy compare the properties of FAIRTOP to simpler approaches that generally enjoy lower regret but have no fairness properties. Perhaps the simplest is to pick the k highest ranked individuals by empirical CDF rank. This method will in general have very low regret, but since it is deterministic, any trial in which it doesn t select the top k true CDF values has no fairness guarantee (i.e. the unfairness ratio will be inﬁnite), and this happens in approx-

imately approximately 87% of trials under the simulation parameters above (and approaches 100% as populations grow in ﬁxed ratio).

Perhaps the most natural learning approach is to use the raw scores to obtain estimated population means ˆµi (or more generally to estimate the unknown parameters of some known or assumed parametric form) and then use the CDFs of N( ˆµ1, 1) and N( ˆµ2, 1) to select the k best individuals across the two populations. This again has generally lower regret than FAIRTOP, but is deterministic and without fairness guarantees, with approximately 53% of trials resulting in unbounded unfairness ratio (approaching 100% as populations grow in ﬁxed ratio).

But the main drawback of such a learning approach in comparison to the data-oblivious FAIRTOP is its need for realizability. For instance, if we change the population 2 scores to be drawn from the uniform distribution over a wide range, but the learning approach continues to assume normality in each population, it will virtually always choose only members of population 2, a clear and dramatic violation of any intuitive notion of fairness. This is of course due the fact that the highest scores in population 2 appear to have extraordinarily high CDF values when (incorrectly) assumed to have been drawn from a normal distribution. In contrast FAIRTOP, since it doesn t even consider the actual scores but only generic properties of the relationship between empirical and true CDF values, will behave exactly the same, in both fairness and regret, regardless of how the underlying scores are generated.

Meritocratic Fairness for Cross-Population Selection

Cohoon, J Mc Grath, Nigai, Sergey, and Kaye, Joseph Joﬁsh. Gender and computing conference papers. Communications of the ACM, 54(8):72 80, 2011.

Dvoretzky, A., Kiefer, J., and Wolfowitz, J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist., 27(3):642 669, 09 1956. doi: 10.1214/ aoms/1177728174. URL http://dx.doi.org/ 10.1214/aoms/1177728174.

Dwork, Cynthia and Roth, Aaron. The algorithmic foundations of differential privacy. Foundations and Trends R in Theoretical Computer Science, 9(3 4):211 407, 2014.

Dwork, Cynthia, Mc Sherry, Frank, Nissim, Kobbi, and Smith, Adam. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265 284. Springer, 2006.

Dwork, Cynthia, Hardt, Moritz, Pitassi, Toniann, Reingold, Omer, and Zemel, Richard. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214 226. ACM, 2012.

Jabbari, Shahin, Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie, and Roth, Aaron. Fair learning in Markovian environments. ar Xiv preprint ar Xiv:1611.03071, 2016.

Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie H, and Roth, Aaron. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pp. 325 333, 2016.

Mc Sherry, Frank and Talwar, Kunal. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS 07. 48th Annual IEEE Symposium on, pp. 94 103. IEEE, 2007.

Murphy, Kevin P. Conjugate Bayesian analysis of the Gaussian distribution. 2007. URL https://www.cs.ubc.ca/ murphyk/ Papers/bayes Gauss.pdf.

Raskhodnikova, Sofya and Smith, Adam D. Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. In Dinur, Irit (ed.), IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 495 504. IEEE Computer Society, 2016. doi: 10.1109/FOCS. 2016.60. URL http://dx.doi.org/10.1109/ FOCS.2016.60.