# Meritocratic Fairness for Cross-Population Selection

Michael Kearns, Aaron Roth, Zhiwei Steven Wu (University of Pennsylvania, Philadelphia, USA. Correspondence to: Zhiwei Steven Wu.)

*Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).*

## Abstract

We consider the problem of selecting a pool of individuals from several populations with incomparable skills (e.g. soccer players, mathematicians, and singers) in a fair manner. The quality of an individual is defined to be their relative rank (by cumulative distribution value) within their own population, which permits cross-population comparisons. We study algorithms that attempt to select the highest quality subset despite the fact that true CDF values are not known, and can only be estimated from the finite pool of candidates. Specifically, we quantify the regret in quality imposed by "meritocratic" notions of fairness, which require that individuals are selected with probability that is monotonically increasing in their true quality. We give algorithms with provable fairness and regret guarantees, as well as lower bounds, and provide empirical results which suggest that our algorithms perform better than the theory suggests.

## 1. Introduction

Consider the following common academic (or similar) hiring scenario: the dean has promised your department 3 faculty slots, in any areas. Your goal is to hire the best candidates possible, but how should you identify them? An immediate problem is that candidates are incomparable across subfields because, among other things, standards of publication, citation counts, and letter-writing styles can vary considerably across subfields. An attractive way to rank candidates is according to how strong they are relative to others working in the same field, to whom they are directly comparable. If we model each subfield as corresponding to a different distribution over metrics that are monotonically increasing in candidate quality, this is the value we get when we evaluate the CDF of the distribution at a candidate's realized value. But because the number of candidates each year is small, simply comparing each candidate to their direct competitors this year (i.e. taking their empirical CDF values as truth) would lead to a noisy ranking: it could be that, due to chance, the best candidate this year in subfield A would be a mediocre candidate in a typical year, while the top two candidates in subfield B would each be the top candidate in a typical year. We would prefer to evaluate our success by considering the unknown true CDF value of each candidate.[^1]

Similar situations, in which we must select a high quality set of candidates from multiple, mutually incomparable groups, arise frequently. Some affirmative action policies are premised on the assertion that SAT scores and other measures may not be directly comparable across different groups (e.g. due to only advantaged groups having the financial resources for test preparation courses and multiple retakes). For various reasons, in these settings we may also be concerned with the fairness of our choices.[^2] But what should fairness mean? In this paper, we take inspiration from Dwork et al. (2012), who propose that fairness should mean that "similar individuals are treated similarly", where similarity is measured with respect to some task-specific metric.
In our setting, the natural task-specific metric is the true within-group CDF value of each individual. On its own, this is compatible with the goal of selecting the best candidates; the main obstacle in our work is that we do not know the true CDF value of each individual, and can only approximate it from data. We study the degree to which fairness and optimality are compatible with one another in this setting.

[^1]: Letters of recommendation often seek to communicate this information, with statements like "This candidate is among the top 5 students I have seen in my 16 years as a professor."

[^2]: With respect to men's and women's sports, equal opportunity is legislated in Title IX. With respect to faculty hiring, fairness concerns can arise because the proportion of women can vary substantially across subfields. For example, as reported in Cohoon et al. (2011), the percentage of female authors varies from 10% to 44% across ACM conferences, when averaged over the ten-year period 1998-2008.

### 1.1. Our Results

We study a setting in which we wish to select $k$ individuals out of a pool of $n$ for some task. The individuals are drawn from $d$ populations, each represented by a different distribution over real numbers.[^3] The number of draws from each distribution may differ. The quality of an individual is defined to be their (true) CDF value, as evaluated on the distribution from which they were drawn. An algorithm is evaluated based on the (expected) quality of the $k$ individuals it selects.

The meritocratic fairness definition we propose informally asks that lower quality individuals are never (probabilistically) favored over higher quality individuals. When formulating this definition, we have a choice as to how to incorporate randomness. The strongest formulation possible (*ex-post* fairness) does not involve randomness at all, and simply requires that every individual actually selected has quality at least that of every individual not selected. The weakest formulation (*ex-ante* fairness) incorporates the randomness of the draw of the population from the underlying distributions, and informally requires that for any pair of individuals, the higher quality individual is selected with weakly higher probability than the lower quality individual, where the randomness is over the realization of the population from the underlying distributions, as well as any internal randomness of the mechanism. An intermediate formulation (*ex-interim* fairness) informally requires that higher quality individuals be selected with weakly higher probability than lower quality individuals, where the probability is computed over the randomness of the mechanism, but not over the draw of the population.

Roughly speaking, these choices correspond to what an individual may know and still be satisfied by a promise of "fairness". Individuals should be satisfied with ex-post fairness even after the choices of the mechanism are made, with full knowledge of the applicant pool: they should be satisfied with the actual outcome, regardless of the algorithm used to reach it. In contrast, individuals with full knowledge of the applicant pool should still be satisfied with ex-interim fairness before the mechanism makes its decisions: they should feel satisfied that the algorithm used is fair. An individual should only be satisfied by ex-ante fairness if she has no knowledge of the applicant pool (and so can consider it a random variable) before the choices are made.
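Ex-post fairness is demanding precisely because empirical ranks are noisy. The following small simulation is a sketch of this phenomenon, not an experiment from the paper (the population sizes, seed, and `scipy` dependency are our own arbitrary choices): it draws two Gaussian populations and compares the $k$ individuals with the highest true CDF values against the $k$ with the highest empirical CDF values.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Two populations with unknown means, as in the paper's motivating model.
mu = rng.normal(0.0, 1.0, size=2)
sizes = [20, 10]                       # small pools make empirical ranks noisy
pools = [rng.normal(mu[j], 1.0, size=sizes[j]) for j in range(2)]

# True quality: each individual's CDF value under her own population's distribution.
true_cdf = np.concatenate([norm.cdf(pools[j], loc=mu[j], scale=1.0) for j in range(2)])

# Empirical CDF value: fraction of same-population draws strictly below the individual.
emp_cdf = np.concatenate(
    [np.array([np.mean(pools[j] < x) for x in pools[j]]) for j in range(2)]
)

k = 3
top_true = set(np.argsort(-true_cdf)[:k])  # the k truly best individuals
top_emp = set(np.argsort(-emp_cdf)[:k])    # the k empirically best individuals
print(f"overlap between empirical and true top-{k}: {len(top_true & top_emp)}/{k}")
```

With pools this small, the two sets routinely disagree, which is why no algorithm can achieve ex-post fairness from samples alone.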
Given such a spectrum of fairness constraints, we observe that the strongest, ex-post fairness, is impossible to achieve, whereas the weakest, ex-ante fairness, is sometimes easy to achieve: when the population sizes are the same, it is satisfied by the mechanism that simply selects the $k$ individuals with the highest empirical CDF values.[^4] Our main results therefore concern the cost (in terms of the expected quality of the selected applicants) of asking for the stronger notion of ex-interim fairness. We show that satisfying an exact variant of this constraint requires the selection algorithm to select uniformly at random amongst all individuals, and hence obtain only trivial utility guarantees, but that subject to an approximate relaxation of this constraint, it is possible to recover asymptotically optimal utility bounds. We then show that when we further relax the problem, to allow the algorithm to select approximately $k$ individuals (rather than exactly $k$), it is possible to recover asymptotically optimal utility bounds while satisfying ex-post fairness within each sub-population and approximate ex-interim fairness across populations. We summarize our results in Table 1. We complement our theoretical results with empirical simulations, which emphasize that both the utility and fairness guarantees of our algorithms are better in practice than our theorems promise.

[^3]: We study the simple setting in which each individual is represented by a one-dimensional score (e.g. a credit score, a time in the 100m dash, etc.), which may itself encapsulate or summarize many features into a single value. Generalizing this work to richer representations is an interesting direction for future work.

[^4]: However, for the cases in which the populations are not the same size, we do not know of better utility guarantees for ex-ante fairness than those we derive for the stronger notion of ex-interim fairness.

Finally, we remark on an interesting property of our upper bounds: they are *oblivious*, in the sense that they do not make use of the raw scores associated with each individual, only their empirical CDF rankings. As such, our upper bounds can be viewed as universal distributions over permutations (of empirical CDF rankings) that satisfy a fairness guarantee, rather than as algorithms. Our lower bounds apply not just to oblivious algorithms, but to any algorithm, even those that can make use of raw scores (or indeed, even knowledge of the family of distributions from which populations are drawn).

### 1.2. Related Work

This paper fits into a rapidly growing line of work studying fairness in learning settings that is now too large to summarize fully, so we discuss only the most closely related work. Our definition of fairness is in the spirit of Dwork et al. (2012), who propose that individual fairness should mean that similar individuals are treated similarly with respect to some underlying task-specific metric. As in the work of Joseph et al. (2016) and Jabbari et al. (2016), we define the metric to be a measure of quality already present in the model (in our case, the CDF values of individuals) but unknown to the algorithm, except through samples. It is this necessity to learn the underlying metric that creates the tension between the fairness constraint and the accuracy goal. Although in this line of work we adopt a definition that merely requires better individuals to be treated better according to the true, unknown metric, this necessarily requires that similar individuals be treated similarly with respect to empirical estimates of the metric.
Technically, our work includes adaptations of techniques from differential privacy (Dwork et al., 2006). Specifically, we adopt variants of the "report noisy max" algorithm (Dwork & Roth, 2014), and of Raskhodnikova and Smith's exponential mechanism for scores with varying sensitivities (Raskhodnikova & Smith, 2016), which is itself a variant of the exponential mechanism (McSherry & Talwar, 2007).

| | Exact Fairness | Approximate Fairness |
| --- | --- | --- |
| Ex-Ante | Regret $O(1/n)$† (Lemma 2.8) | Regret $O(1/n)$† (Lemma 2.8) |
| Ex-Interim | Regret $\Omega(1)$ (Theorem 5.1) | Regret $O(\sqrt{k}/n)$ (Theorem 3.7) |
| Ex-Post | Impossible | Regret $O(\sqrt{k}/n)$* (Theorem 4.2) |

Table 1. An informal summary of results. The bounds are stated for the case when the population sizes are within a constant factor of one another; see the theorem statements for the precise bounds. †When the population sizes are the same. *Exact ex-post fairness within each population, approximate ex-interim fairness between populations, and selects approximately $k$ individuals.

## 2. Model and Preliminaries

There are $d$ different populations, indexed by $j$. For each population $j$, there is a pool of candidates whose raw scores (henceforth, *observations*) are drawn i.i.d. from some unknown continuous distribution $F_j$ over $\mathbb{R}$. Let $F = F_1 \times \cdots \times F_d$ denote the product distribution. We slightly abuse notation and write $x_{ij}$ to denote both the individual $i$ in population $j$ and her associated observation, and write $X$ to denote the set of all candidates. Let $m_j$ be the size of the candidate pool from population $j$, let $n = \sum_j m_j$ be the size of the total population, and let $m = \min_j m_j$ be the smallest population size.

Each individual $x_{ij}$ is associated with the following values:

- a cumulative distribution function (CDF) value $F_j(x_{ij}) = \Pr_{x \sim F_j}[x < x_{ij}]$,[^5] and an empirical CDF value $\hat F_j(x_{ij}) = \frac{1}{m_j} \sum_{i'=1}^{m_j} \mathbf{1}[x_{i'j} < x_{ij}]$;
- a complementary cumulative distribution function (CCDF) value $p_{ij} = 1 - F_j(x_{ij})$, and an empirical CCDF value $\hat p_{ij} = \frac{1}{m_j} \sum_{i'=1}^{m_j} \mathbf{1}[x_{i'j} \ge x_{ij}]$.

[^5]: We adopt a slightly different convention from the standard definition $F_j(x_{ij}) = \Pr_{x \sim F_j}[x \le x_{ij}]$.

A selection algorithm $A$ takes all $n$ observations $X$ drawn from the different distributions as input, and (randomly) selects $k$ individuals as output. We write $A(X, x_{ij})$ (or $A_{ij}$ for brevity) for the selection probability of the individual $x_{ij}$. The utility for selecting an individual $x_{ij}$ is her true CDF value $F_j(x_{ij})$; equivalently, the loss for selecting her is the true CCDF value $p_{ij}$. The loss of an algorithm $A$ on input $X$ is then defined as

$$L(A, X) = \frac{1}{k} \sum_{x_{ij} \in X} A(X, x_{ij}) \left(1 - F_j(x_{ij})\right),$$

and the expected loss of the algorithm is $\mathbb{E}_{X \sim F}[L(A, X)]$.
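As a concrete reading of these definitions, the following sketch computes empirical CDF/CCDF values within one population and evaluates the loss of a given vector of selection probabilities; the function names are ours, not the paper's, and the loss is evaluated here with empirical CCDF values in place of the unknown true ones (as Lemma 2.7 below justifies for oblivious algorithms).

```python
import numpy as np

def empirical_cdf_ccdf(pool):
    """Within one population: F_hat(x) is the fraction of draws strictly below x,
    and p_hat(x) is the fraction of draws at or above x, per the definitions above."""
    pool = np.asarray(pool, dtype=float)
    f_hat = np.array([np.mean(pool < x) for x in pool])
    p_hat = np.array([np.mean(pool >= x) for x in pool])
    return f_hat, p_hat

def loss(select_prob, ccdf, k):
    """L(A, X) = (1/k) * sum_{ij} A(X, x_ij) * p_ij."""
    return float(np.dot(select_prob, ccdf)) / k

# Example: selecting k = 2 of 4 candidates uniformly at random.
pool = np.array([0.1, 0.7, 0.3, 0.9])
f_hat, p_hat = empirical_cdf_ccdf(pool)
print(loss(select_prob=np.full(4, 0.5), ccdf=p_hat, k=2))
```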
### 2.1. Fairness Formulation

Our goal is to design selection algorithms subject to a meritocratic fairness notion requiring that less qualified candidates (in terms of CDF values) are never preferred over more qualified ones. We present three formulations of this notion, based on the different sources of randomness under consideration. First, the weakest formulation is the following ex-ante fairness, which guarantees fairness over the randomness of both the random draws of the candidates and the coin flips of the algorithm.

**Definition 2.1** (Ex-Ante Fairness). An algorithm $A$ satisfies *ex-ante fairness* if for any pair of candidates $x_{ij}, x_{i'j'}$ with CDF values $F_j(x_{ij}) > F_{j'}(x_{i'j'})$, their selection probabilities (when they are in the pool) satisfy

$$\mathbb{E}[A(X, x_{ij})] \ge \mathbb{E}[A(X, x_{i'j'})],$$

where the expectations are taken over the $(n-2)$ random draws of all the other candidates.

An intermediate formulation is the following ex-interim fairness, which guarantees fairness over the randomness of the algorithm (but not the realization of $X$) on almost all inputs drawn from the distribution.

**Definition 2.2** (Exact Ex-Interim Fairness). Let $\delta \in (0, 1)$. An algorithm $A$ satisfies *$\delta$-exact ex-interim fairness* if with probability at least $1 - \delta$ over the realized observations $X$, for any pair of individuals $x_{ij}, x_{i'j'} \in X$,

$$A(X, x_{ij}) > A(X, x_{i'j'}) \quad \text{only if} \quad F_j(x_{ij}) > F_{j'}(x_{i'j'}).$$

We also consider the following relaxation:

**Definition 2.3** (Approximate Ex-Interim Fairness). An algorithm $A$ satisfies *$(\varepsilon, \delta)$-approximate ex-interim fairness* if with probability at least $1 - \delta$ over the realized observations $X$, for any pair of individuals $x_{ij}, x_{i'j'} \in X$,

$$A(X, x_{ij}) > e^{\varepsilon} \cdot A(X, x_{i'j'}) \quad \text{only if} \quad F_j(x_{ij}) > F_{j'}(x_{i'j'}).$$

**Remark 2.4.** This relaxation of ex-interim fairness bears a similarity to the definition of differential privacy (Dwork et al., 2006), and indeed, techniques from the differential privacy literature will prove useful in designing algorithms to satisfy it.

Perhaps the strongest formulation is the following ex-post fairness condition, which requires that an individual is selected only if every more qualified individual is also selected.

**Definition 2.5** (Ex-Post Fairness). An algorithm $A$ satisfies *ex-post fairness* if for any pair of individuals $x_{ij}$ and $x_{i'j'}$ such that $F_j(x_{ij}) > F_{j'}(x_{i'j'})$, the individual $x_{i'j'}$ is selected only if $x_{ij}$ is also selected.

Note that any algorithm satisfying ex-post fairness must admit a prefix of individuals from each population; admitting such a prefix is also sufficient to guarantee ex-post fairness within each population, but it is not sufficient to satisfy the constraint between populations. It is not hard to see that satisfying ex-post fairness in the generality in which we have defined it is impossible, since it requires perfectly selecting the $k$ highest true CDF values from only sample data. Thus, the primary focus of our paper is ex-interim fairness; unless we specify otherwise, the terms "fair" and "fairness" refer to ex-interim fairness.

### 2.2. Oblivious Algorithms

A special class of selection algorithms is the class of *oblivious* algorithms, which select candidates with probabilities that depend only on their empirical CDF values, not on the raw observations.

**Definition 2.6** (Oblivious Algorithms). An algorithm $A$ is *oblivious* if for any pair of inputs $X$ and $X'$ that induce the same empirical CDF values over the candidates, $A(X) = A(X')$.

All of the algorithms presented in this paper are oblivious. As a result, we need no assumptions on the underlying distributions to achieve our fairness and utility guarantees. Moreover, the utility guarantee of an oblivious algorithm can be characterized as follows.

**Lemma 2.7.** The expected loss of any oblivious algorithm $A$ equals the expected average empirical CCDF value of the selected candidates.

A very simple example of an oblivious algorithm is GREEDY, which selects the $k$ individuals with the highest empirical CDF values (breaking ties uniformly at random).
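A minimal sketch of GREEDY, assuming empirical CDF values have already been computed (e.g. with the helper above); the random secondary key implements the uniform tie-breaking.

```python
import numpy as np

def greedy(emp_cdf, k, rng=None):
    """GREEDY: select the k individuals with the highest empirical CDF values,
    breaking ties uniformly at random."""
    rng = rng or np.random.default_rng()
    emp_cdf = np.asarray(emp_cdf, dtype=float)
    tiebreak = rng.permutation(len(emp_cdf))  # random secondary sort key
    order = sorted(range(len(emp_cdf)), key=lambda i: (-emp_cdf[i], tiebreak[i]))
    return order[:k]
```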
**Lemma 2.8.** Suppose that the population sizes are the same, that is, $m_j = m$ for each $j$. Then the algorithm GREEDY satisfies ex-ante fairness and has expected loss at most $\frac{k}{2n} + O\left(\frac{1}{n}\right)$.

To simplify our bounds on the expected loss, we will use $\frac{k}{2n}$ as our benchmark and define the *regret* of an algorithm $A$ to be

$$R(A) = \mathbb{E}_{X \sim F}[L(A, X)] - \frac{k}{2n}.$$

## 3. An Approximately Fair Algorithm

In this section, we provide an algorithm that satisfies approximate fairness in the sense of Definition 2.3. We present our solution in three steps.

1. First, we provide confidence intervals for the candidates' CCDF values $p_{ij}$ based on their empirical CCDF values $\hat p_{ij}$. As we show, our bound has a tighter dependence on $p_{ij}$ than the standard DKW inequality of Dvoretzky et al. (1956), which yields a better utility guarantee.
2. Next, we give a simple subroutine, NOISYTOP, that randomly selects $k$ individuals out of $n$ based on their "scores". We show that individuals with similar scores have close selection probabilities under this subroutine, which is similar to the Report Noisy Max algorithm (Dwork & Roth, 2014).
3. Finally, we use the deviation bound from the first step to assign scores to the candidates. We show that running NOISYTOP on these scores gives approximate fairness and low regret guarantees. The scores are computed in a way similar to the generalized exponential mechanism of Raskhodnikova & Smith (2016).

### 3.1. Confidence Intervals for CCDF Values

We first give the following concentration inequality, specialized to the uniform distribution over $(0, 1)$.

**Lemma 3.1.** Fix any $n \in \mathbb{N}$, and let $x_1, x_2, \ldots, x_n$ be i.i.d. draws from the uniform distribution over $(0, 1)$. Then with probability at least $1 - \delta$, for any $p \in (0, 1)$,

$$|\hat p - p| \le O\left(\sqrt{\frac{p \ln(n/\delta)}{n}} + \frac{\ln(n/\delta)}{n}\right),$$

where $\hat p = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[x_i < p]$.

To translate this result into a deviation bound on the CCDF values, first note that the CCDF values of draws from any continuous distribution $F_j$ are themselves uniformly distributed over $(0, 1)$, so the bound applies immediately to the true CCDF values. By a standard calculation, we can also obtain a bound in terms of the empirical CCDF values $\hat p_{ij}$, as shown below.

**Lemma 3.2.** For each $j \in [d]$, draw $m_j$ points $X_j = \{x_{ij}\}_{i=1}^{m_j}$ i.i.d. from $F_j$. For each point $x_{ij}$, let $p_{ij}$ be its true CCDF value and $\hat p_{ij}$ its empirical CCDF value under $F_j$. Then with probability at least $1 - \delta$ over the $n$ random draws, for all $i$ and $j$,

$$|p_{ij} - \hat p_{ij}| \le O\left(\sqrt{\frac{\hat p_{ij} \ln(n/\delta)}{m}} + \frac{\ln(n/\delta)}{m}\right),$$

where $m = \min_j m_j$ and $n = \sum_{j=1}^{d} m_j$.

**Remark 3.3.** The standard DKW inequality gives a bound of $O(\sqrt{1/m})$. Our bound has a tighter dependence for small empirical CCDF values $\hat p_{ij}$: for example, when $\hat p_{ij} = 1/m$, we obtain a bound of $O(1/m)$.[^6]

[^6]: As shown in Corollary 3.8, this also gives an improvement in regret when $k$ is small ($O(\sqrt{k}/n)$ versus $O(\sqrt{1/n})$).

### 3.2. The Noisy Top Subroutine

Given a set of individuals with scores $Y = \{y_1, \ldots, y_n\}$, the subroutine NOISYTOP first perturbs each score by adding independent noise drawn from the Laplace distribution,[^7] and then outputs the $k$ individuals with the smallest noisy scores (ties broken arbitrarily). We will show that NOISYTOP has the following desirable Lipschitz property: individuals with similar scores are chosen with similar probabilities. This is crucial for obtaining approximate fairness.

[^7]: The Laplace distribution $\mathrm{Lap}(b)$ has density function $f(x) = \frac{1}{2b} \exp(-|x|/b)$.

Algorithm 1: NOISYTOP($\{y_1, y_2, \ldots, y_n\}, \alpha, k$)
  Input: $n$ scores $\{y_1, y_2, \ldots, y_n\}$ and noise parameter $\alpha$
  For each $i \in [n]$: let $\tilde y_i = y_i + \mathrm{Lap}(\alpha)$
  Output: the $k$ indices with the smallest $\tilde y_i$
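Algorithm 1 translates directly into code; the sketch below uses numpy's Laplace sampler and argsort in place of the paper's pseudocode.

```python
import numpy as np

def noisy_top(y, alpha, k, rng=None):
    """NOISYTOP (Algorithm 1): add independent Lap(alpha) noise to each score
    and return the indices of the k smallest noisy scores."""
    rng = rng or np.random.default_rng()
    y_noisy = np.asarray(y, dtype=float) + rng.laplace(loc=0.0, scale=alpha, size=len(y))
    return np.argsort(y_noisy)[:k]

# Example: lower scores are better, so index 0 is selected most often,
# but nearby scores are selected with comparable frequencies.
counts = np.zeros(4)
for _ in range(10_000):
    counts[noisy_top([0.1, 0.2, 0.9, 1.0], alpha=0.5, k=2)] += 1
print(counts / 10_000)  # selection frequencies, decreasing in the scores
```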
**Lemma 3.4.** Let $i, j \in [n]$ be such that $\Delta = y_i - y_j \ge 0$, and let $P_i$ and $P_j$ denote the probabilities that the indices $i$ and $j$, respectively, are output by NOISYTOP($\{y_1, \ldots, y_n\}, \alpha, k$). Then $P_i \le P_j \le P_i \exp(2\Delta/\alpha)$.

*Proof.* Let $\tilde y_i$ and $\tilde y_j$ be the noisy scores for $i$ and $j$, and introduce a new random variable $Q$ denoting the value of the $(k-1)$-st lowest noisy score, not counting $\tilde y_i$ and $\tilde y_j$. We slightly abuse notation and write $\Pr[R = r]$ as a shorthand for the pdf of a random variable $R$ evaluated at $r$. The ratio $P_i / P_j$ can then be written as

$$\frac{P_i}{P_j} = \frac{\int_{q \in \mathbb{R}} \Pr[Q = q] \int_{t \in \mathbb{R}} \Pr[\tilde y_j = t] \, \Pr[\tilde y_i < \min\{t, q\}] \, dt \, dq}{\int_{q \in \mathbb{R}} \Pr[Q = q] \int_{t \in \mathbb{R}} \Pr[\tilde y_i = t] \, \Pr[\tilde y_j < \min\{t, q\}] \, dt \, dq}. \tag{1}$$

For any fixed value $r \in \mathbb{R}$, the Laplace density gives

$$\frac{\Pr[\tilde y_i = r]}{\Pr[\tilde y_j = r]} = \frac{\frac{1}{2\alpha} \exp(-|r - y_i|/\alpha)}{\frac{1}{2\alpha} \exp(-|r - y_j|/\alpha)} = \exp\left(\frac{|r - y_j| - |r - y_i|}{\alpha}\right).$$

By the triangle inequality we know that $\big| |r - y_j| - |r - y_i| \big| \le \Delta$. It follows that for any $t$ and $q$,

$$\exp(-\Delta/\alpha) \le \frac{\Pr[\tilde y_i = t]}{\Pr[\tilde y_j = t]} \le \exp(\Delta/\alpha)$$

and, integrating the same density bound over $(-\infty, \min\{q, t\})$,

$$\exp(-\Delta/\alpha) \le \frac{\Pr[\tilde y_i < \min\{q, t\}]}{\Pr[\tilde y_j < \min\{q, t\}]} \le \exp(\Delta/\alpha).$$

Substituting these bounds into (1) gives $P_j \le P_i \exp(2\Delta/\alpha)$. Moreover, since $y_i \ge y_j$, a coupling argument (using the same noise draw for both scores) shows $P_i \le P_j$, which completes the proof. ∎

## 5. A Lower Bound for Exact Fairness

**Theorem 5.1.** There is a constant $\delta_0 > 0$ such that any algorithm $A$ satisfying $\delta$-exact ex-interim fairness for $\delta \le \delta_0$ incurs a regret of $\Omega(1)$.

The main idea is to show that there exist distributions $F_1$ and $F_2$ such that any fair algorithm will essentially have to select uniformly at random across $\Omega(m)$ individuals, which incurs regret $\Omega(1)$. We proceed via Bayesian reasoning. Suppose that the observations from the two populations are drawn from two different unit-variance Gaussian distributions $N(\mu_1, 1)$ and $N(\mu_2, 1)$, where both means $\mu_1$ and $\mu_2$ are themselves drawn from the prior $N(0, 1)$. The following lemma characterizes the posterior distribution of the mean given a collection of observations.

**Lemma 5.2** (Murphy, 2007). Suppose that a mean parameter $\mu$ is drawn from a prior distribution $N(0, 1)$, and let $D = (x_1, x_2, \ldots, x_m)$ be $m$ i.i.d. draws from the distribution $N(\mu, 1)$. Then the posterior distribution of $\mu$ conditioned on $D$ is the Gaussian distribution $N(\hat\mu, \sigma^2)$, where $\hat\mu = \frac{\sum_i x_i}{m+1}$ and $\sigma^2 = \frac{1}{m+1}$.

The result above shows that, conditioned on any $m$ draws from the Gaussian distribution, there is constant probability that the true mean is bounded away from the posterior maximum likelihood estimate by at least $\Omega(1/\sqrt{m})$. With this observation, we partition the real line into the following intervals: given any posterior mean $\hat\mu$ and any integer $r \ge 1$, let

$$I_r^+(\hat\mu) = \left[\hat\mu + \frac{r-1}{\sqrt{m}},\ \hat\mu + \frac{r}{\sqrt{m}}\right] \quad \text{and} \quad I_r^-(\hat\mu) = \left[\hat\mu - \frac{r}{\sqrt{m}},\ \hat\mu - \frac{r-1}{\sqrt{m}}\right].$$

These intervals capture the uncertainty we have regarding the CDF values of the observations $x_{ij}$. Let $X_j = (x_{1j}, x_{2j}, \ldots, x_{mj})$ denote the $m$ draws from each distribution $F_j$, and let $\hat\mu_j = \frac{\sum_i x_{ij}}{m+1}$ be the posterior mean for $\mu_j$ conditioned on the draws. Consider two individuals $x_{i1}$ and $x_{i'2}$ such that $x_{i1} \in I_r^+(\hat\mu_1)$ and $x_{i'2} \in I_{r+1}^+(\hat\mu_2)$. Even though $(x_{i'2} - \hat\mu_2) > (x_{i1} - \hat\mu_1)$, there is a constant probability that their CDF values satisfy $F_1(x_{i1}) > F_2(x_{i'2})$. Any fair algorithm must therefore select these two individuals in neighboring intervals with equal probabilities.
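The posterior update of Lemma 5.2 and the interval construction are easy to state in code; the sketch below is illustrative only, with our own function names and an arbitrary example at the end.

```python
import numpy as np

def posterior(draws):
    """Posterior of mu given m i.i.d. draws from N(mu, 1) under the N(0, 1)
    prior (Lemma 5.2): N(sum(x) / (m + 1), 1 / (m + 1))."""
    x = np.asarray(draws, dtype=float)
    m = len(x)
    return x.sum() / (m + 1), 1.0 / (m + 1)

def interval(mu_hat, r, m, sign=+1):
    """The interval I_r^+(mu_hat) when sign=+1, or I_r^-(mu_hat) when sign=-1."""
    lo, hi = (r - 1) / np.sqrt(m), r / np.sqrt(m)
    return (mu_hat + lo, mu_hat + hi) if sign > 0 else (mu_hat - hi, mu_hat - lo)

# Example: check which of the first few upper intervals are hit by the sample.
rng = np.random.default_rng(0)
draws = rng.normal(0.3, 1.0, size=100)
mu_hat, _ = posterior(draws)
for r in range(1, 4):
    a, b = interval(mu_hat, r, len(draws))
    print(r, bool(np.any((draws >= a) & (draws <= b))))
```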
Next, we show that with high probability over the realizations of the true mean $\mu$ and the $m$ draws $X$, all of the $O(\sqrt{m \log m})$ intervals around the posterior mean will be hit by points in $X$.

**Lemma 5.3.** Fix any $c < 1$ and $\beta \in (0, 1)$. Let the mean $\mu$ be drawn from $N(0, 1)$, let $\hat r = \sqrt{c m \log m} + 1$, and let $X = (x_1, x_2, \ldots, x_m)$ be $m$ i.i.d. draws from $N(\mu, 1)$. Let $\hat\mu = \frac{\sum_i x_i}{m+1}$. Then, except with probability $2\hat r \exp(-m^{1/2 - c/2}) + 3\beta$ over the joint realization of $\mu$ and $X$, the following hold:

- for all $r \le \hat r - 2\sqrt{2\ln(2/\beta)}$, there exist two draws $x_r^+, x_r^- \in X$ such that $x_r^+ \in I_r^+(\hat\mu)$ and $x_r^- \in I_r^-(\hat\mu)$;
- the number of points that are bigger than $\hat\mu + \left(\hat r - 2\sqrt{2\ln(2/\beta)}\right)/\sqrt{m}$ is no more than $m^{1 - c/2} + m^{1/2 - c/4}\sqrt{\ln(2/\beta)}$.

We show that the event that all of these consecutive intervals are occupied for both populations forces a fair algorithm to select all the individuals in the intervals with equal probability. More formally, fix any $c$ and a sufficiently small constant $\beta$, let $\hat r = \sqrt{c m \log m} + 1$, and let

$$Y = \left\{x_{ij} \,\middle|\, x_{ij} \in I_r^+(\hat\mu_j) \cup I_r^-(\hat\mu_j) \text{ for some } r \le \hat r - 2\sqrt{2\ln(2/\beta)}\right\}.$$

Consider the following events:

- FULLCHAIN($X_1, X_2$): for all $r \le \hat r - 2\sqrt{2\ln(2/\beta)}$ and $j \in \{1, 2\}$, both of the intervals $I_r^+(\hat\mu_j)$ and $I_r^-(\hat\mu_j)$ contain at least one point of $X_j$;
- UARCHAIN($A, X_1, X_2$): the points in $Y$ are selected by the algorithm $A$ with equal probabilities.

**Lemma 5.4.** Fix any $\delta$-fair algorithm $A$ for some $\delta < 0.0002$. With probability at least $1/2$ over the realizations of $\mu_1, \mu_2, X_1$, and $X_2$, the event FULLCHAIN($X_1, X_2$) implies UARCHAIN($A, X_1, X_2$).

*Proof sketch for Theorem 5.1.* The combination of Lemmas 5.3 and 5.4 shows that with constant probability over $\mu_1, \mu_2$, and $X$, the algorithm $A$ must select among $\Omega(m)$ individuals with equal probabilities, which leads to an expected regret of $\Omega(1)$ over the draws of $\mu_1, \mu_2, X_1$, and $X_2$. This means there exist fixed distributions $F_1 = N(\mu_1^*, 1)$ and $F_2 = N(\mu_2^*, 1)$ under which $A$ incurs $\Omega(1)$ regret. ∎

## 6. Sequential Batch Setting

We briefly mention an extension to the sequential batch setting, in which the algorithm selects individuals over $T$ rounds. In each round $t$, each population $j$ contributes $m_j$ new candidates whose observations are drawn i.i.d. from the distribution $F_j$, and the algorithm must select $k$ individuals from this pool. Let $S_j^t$ be the set of observations from population $j$ accumulated over the first $t$ rounds. For any observation $x$ and population $j$, define the historical CCDF value

$$\hat q_j^t(x) = \frac{1}{m_j t} \sum_{x' \in S_j^t} \mathbf{1}[x' \ge x].$$

As $t$ grows large, these historical CCDF values become better estimates of the true CCDF values. We give a variant of the FAIRTOP algorithm that achieves $(\varepsilon, \delta)$-approximate fairness in every round and incurs average regret over time diminishing as $O\left(\frac{1}{\varepsilon \sqrt{T}}\right)$.

## 7. Simulations

We conclude by discussing some illustrative simulation results for FAIRTOP, along with comparisons to simpler algorithms without fairness guarantees. The simulations were conducted on data in which the raw scores for each population $i = 1, 2$ were drawn from $N(\mu_i, 1)$, with the $\mu_i$ themselves chosen randomly from $N(0, 1)$. Thus, befitting the motivation for our model, the raw scores are not directly comparable between populations. While we varied the population sizes, they were held in the fixed ratio $m_1/m_2 = 2$, with $k = \lceil 0.1(m_1 + m_2) \rceil$.
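A sketch of this data-generating process is below (the seed is an arbitrary choice; FAIRTOP itself is not reproduced here, since its score computation is defined in Section 3):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Section 7 setup: mu_i ~ N(0, 1), scores ~ N(mu_i, 1),
# sizes in fixed ratio m1/m2 = 2, and k = ceil(0.1 * (m1 + m2)).
m1, m2 = 100, 50
mu1, mu2 = rng.normal(0.0, 1.0, size=2)
pop1 = rng.normal(mu1, 1.0, size=m1)
pop2 = rng.normal(mu2, 1.0, size=m2)
k = int(np.ceil(0.1 * (m1 + m2)))  # k = 15 here

# True CDF values (the utility of each individual), used only for evaluation.
true_cdf = np.concatenate([norm.cdf(pop1, mu1, 1.0), norm.cdf(pop2, mu2, 1.0)])
```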
Figure 1. (a) Sample underlying and noisy scores as a function of true CDF rank for $\varepsilon = 10$. (b) Empirical distribution of selection counts as a function of true CDF rank for $\varepsilon = 10$. (c) Regret as a function of population sizes, for $\varepsilon = 1$ (green), $5$ (red), and $10$ (blue).

For such a simulation with population sizes $m_1 = 100$ and $m_2 = 50$, Figure 1(a) shows the underlying scores computed by FAIRTOP (which depend only on the empirical CDF values) for each member of both populations, sorted according to their true CDF values so that the transpositions between empirical and true CDFs are apparent; the red points are the larger population and the green points the smaller. Overlaid on this arc of underlying scores is a black plot illustrating sample post-noise scores when $\varepsilon = 10$. As we can see, re-sorting the points by their noisy scores results in a significant amount of additional reshuffling.

Figure 1(b) illustrates the induced distribution over chosen individuals; here we show the results of resampling the Laplace noise (again at $\varepsilon = 10$) for 100,000 trials, and choosing the top $k$ post-noise scores across populations. The ordering is again by true CDF values and the same color coding is used. At this value of $\varepsilon$ the distribution is biased towards better true CDF values but still enjoys strong fairness properties. For example, the unfairness ratio (the maximum ratio between the number of times a worse CDF value is chosen and the number of times a better one is chosen) is only 1.56; note that this is substantially stronger than the bound of $e^{10}$ guaranteed by our theorem. It is also visually clear that FAIRTOP treats similar CDF values similarly, both within and between populations. Nevertheless, the regret of FAIRTOP for these population sizes and this $\varepsilon$ is nontrivial (roughly 0.20 compared to the best $k$ true CDF values). Of course, as per Theorem 3.7, by increasing $\varepsilon$ we can reduce regret to any desired level at the expense of weakened fairness guarantees. However, as per Corollary 3.8, even for fixed $\varepsilon$ (and therefore fixed fairness properties), regret diminishes rapidly in the natural scaling where the population sizes grow in a fixed ratio. This is illustrated empirically in Figure 1(c), where for varying choices of $\varepsilon$ we plot regret as $m_1, m_2 \to \infty$ with $m_1/m_2 = 2$.

We now briefly compare the properties of FAIRTOP to simpler approaches that generally enjoy lower regret but have no fairness properties. Perhaps the simplest is to pick the $k$ highest-ranked individuals by empirical CDF rank. This method will in general have very low regret, but since it is deterministic, any trial in which it fails to select the top $k$ true CDF values has no fairness guarantee (i.e. the unfairness ratio will be infinite), and this happens in approximately 87% of trials under the simulation parameters above (and approaches 100% as the populations grow in fixed ratio).

Perhaps the most natural learning approach is to use the raw scores to obtain estimated population means $\hat\mu_i$ (or, more generally, to estimate the unknown parameters of some known or assumed parametric form) and then use the CDFs of $N(\hat\mu_1, 1)$ and $N(\hat\mu_2, 1)$ to select the $k$ best individuals across the two populations. This again has generally lower regret than FAIRTOP, but it is deterministic and without fairness guarantees, with approximately 53% of trials resulting in an unbounded unfairness ratio (again approaching 100% as the populations grow in fixed ratio). But the main drawback of such a learning approach, in comparison to the data-oblivious FAIRTOP, is its need for realizability. For instance, if we change the population 2 scores to be drawn from the uniform distribution over a wide range, while the learning approach continues to assume normality in each population, it will virtually always choose only members of population 2, a clear and dramatic violation of any intuitive notion of fairness. This is of course due to the fact that the highest scores in population 2 appear to have extraordinarily high CDF values when (incorrectly) assumed to have been drawn from a normal distribution.
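This failure mode is easy to reproduce; the sketch below is our own illustration (the uniform range, population sizes, and seed are arbitrary choices), not the paper's experiment code.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# The "natural learning approach": estimate each population's mean and rank
# everyone by the CDF of the assumed model N(mu_hat_i, 1).
m1, m2, k = 100, 50, 15
pop1 = rng.normal(rng.normal(), 1.0, size=m1)   # population 1: actually normal
pop2 = rng.uniform(-10.0, 10.0, size=m2)        # population 2: actually uniform (misspecified)

est_cdf = np.concatenate([
    norm.cdf(pop1, loc=pop1.mean(), scale=1.0),
    norm.cdf(pop2, loc=pop2.mean(), scale=1.0),
])
chosen = np.argsort(-est_cdf)[:k]
print("selections from population 2:", int(np.sum(chosen >= m1)), "of", k)
# The top of the wide uniform sits many assumed standard deviations above its
# sample mean, so its model-based CDF estimates are essentially 1, and the
# method selects (almost) only members of population 2.
```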
In contrast, FAIRTOP, since it considers not the actual scores but only generic properties of the relationship between empirical and true CDF values, will behave exactly the same, in both fairness and regret, regardless of how the underlying scores are generated.

## References

Cohoon, J. McGrath, Nigai, Sergey, and Kaye, Joseph "Jofish". Gender and computing conference papers. Communications of the ACM, 54(8):72-80, 2011.

Dvoretzky, A., Kiefer, J., and Wolfowitz, J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics, 27(3):642-669, 1956. doi: 10.1214/aoms/1177728174.

Dwork, Cynthia and Roth, Aaron. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Dwork, Cynthia, McSherry, Frank, Nissim, Kobbi, and Smith, Adam. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265-284. Springer, 2006.

Dwork, Cynthia, Hardt, Moritz, Pitassi, Toniann, Reingold, Omer, and Zemel, Richard. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214-226. ACM, 2012.

Jabbari, Shahin, Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie, and Roth, Aaron. Fair learning in Markovian environments. arXiv preprint arXiv:1611.03071, 2016.

Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie H., and Roth, Aaron. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pp. 325-333, 2016.

McSherry, Frank and Talwar, Kunal. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), pp. 94-103. IEEE, 2007.

Murphy, Kevin P. Conjugate Bayesian analysis of the Gaussian distribution. 2007. URL https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf.

Raskhodnikova, Sofya and Smith, Adam D. Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. In IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS 2016), pp. 495-504. IEEE Computer Society, 2016. doi: 10.1109/FOCS.2016.60.