# fair_and_useful_cohort_selection__9fe79009.pdf Published in Transactions on Machine Learning Research (09/2023) Fair and Useful Cohort Selection Konstantina Bairaktari bairaktari.k@northeastern.edu Khoury College of Computer Sciences Northeastern University Paul Langton langton.p@northeastern.edu Khoury College of Computer Sciences Northeastern University Huy Le Nguyen hu.nguyen@northeastern.edu Khoury College of Computer Sciences Northeastern University Niklas Smedemark-Margulies smedemark-margulie.n@northeastern.edu Khoury College of Computer Sciences Northeastern University Jonathan Ullman jullman@ccs.neu.edu Khoury College of Computer Sciences Northeastern University Reviewed on Open Review: https://openreview.net/forum?id=w Rep Wp1KC7 A challenge in fair algorithm design is that, while there are compelling notions of individual fairness, these notions typically do not satisfy desirable composition properties, and downstream applications based on fair classifiers might not preserve fairness. To study fairness under composition, Dwork & Ilvento (2019) introduced an archetypal problem called faircohort-selection problem, where a single fair classifier is composed with itself to select a group of candidates of a given size, and proposed a solution to this problem. In this work we design algorithms for selecting cohorts that not only preserve fairness, but also maximize the utility of the selected cohort under two notions of utility that we introduce and motivate. We give optimal (or approximately optimal) polynomial-time algorithms for this problem in both an offline setting, and an online setting where candidates arrive one at a time and are classified as they arrive. 1 Introduction The rise of algorithmic decision making has created a need for research on the design of fair algorithms, especially for machine-learning tasks like classification and prediction. Beginning with the seminal work of Dwork et al. (2012), there is now a large body of algorithms that preserve various notions of fairness for individuals. These notions of individual fairness capture the principle that similar people should be treated similarly, according to some task-specific measure of similarity. While these algorithms satisfy compelling notions of individual fairness in isolation, they will often be used as parts of larger systems. To address this broader context, Dwork & Ilvento (2019) initiated the study of fairness under composition, where one or more fair classifiers will be combined to make multiple decisions, and we want these decisions to preserve the fairness properties of those underlying classifiers. Their work Published in Transactions on Machine Learning Research (09/2023) introduced a number of models where fair mechanisms must be composed, demonstrated that naïve methods of composing fair algorithms do not preserve fairness, and identified strategies for preserving fairness in these settings. Importantly, these strategies treat the underlying classifiers as a black-box so that composition can be modular, eliminating the need to redesign the underlying classifier for every possible use. Our work studies the archetypal fair cohort selection problem introduced by Dwork & Ilvento (2019). In this problem, we would like to select k candidates for a job from a universe of n possible candidates. We are given a classifier that assigns a score si [0, 1] to each candidate,1 with the guarantee that the scores are individually fair with respect to some similarity metric D. We would like to select a set of k candidates such that the probability pi of selecting any candidate should be fair with respect to the same metric. Preserving Fairness. Since our goal is to treat the scores s1, . . . , sn as a black-box, without knowing the underlying fairness metric, we posit that the scores are fair with respect to some D, i.e. they satisfy i, j, |si sj| D(i, j), (1) but that this metric is unknown. Thus, we will design our cohort selection algorithm so that its output probabilities p1, . . . , pn will satisfy i, j, |pi pj| |si sj| D(i, j) (2) as this guarantees that we preserve the underlying fairness constraint. Without further assumptions about the fairness metric, the only way to preserve fairness is to satisfy (2). We can trivially solve the fair cohort selection problem by ignoring the scores and selecting a uniformly random set of k candidates, so that every candidate is chosen with the same probability k/n. Thus, in this paper we propose to study algorithms for fair cohort selection that produce a cohort with optimal (or approximately optimal) utility subject to the fairness constraint. Specifically, we consider two variants of the problem: Linear Utilities. For the linear utility model, we assume that 1. the utility for selecting a cohort is the sum of independently assigned utilities for each member, and 2. the scores given by the original classifier accurately reflect the utility of each member of the cohort. Assumption 1 essentially says that members of the cohort are not complements or substitutes and assumption 2 makes sense if the task that we designed the classifier to solve and the task that the cohort is being used for are closely related. Under these assumptions, it is natural to define the utility for a cohort selection algorithm to be P i pisi, which is the expectation over the choice of the cohort of the sum of the scores across members in the cohort. Ratio Utilities. For the ratio utility model, we continue to rely on assumption 1, but we now imagine that the expected utility for the cohort is some other linear function P i piui where the variables ui are some unknown measure of utility in [0, 1]. Assuming that for every candidate i the score si is generally correlated with the utility ui even if they are not exactly the same, we normalize by P i siui, as the original scores can still be a reasonable choice of selection probabilities that satisfy the fairness constraint. Since the utility function is unknown, our goal is now to produce a cohort that maximizes the utility in the worst case over possible utility functions. As we show (Lemma 2.1), maximizing the worst-case utility amounts to assigning probabilities that maximize the quantity mini pi/si. Optimizing the ratio utility implies a lower bound on the cohort utility under any utility function that satisfies assumption 1, including our linear utility above. However, it does not necessarily optimize the linear utility, thus the two utility models are distinct. Our technical contributions are polynomial-time algorithms for fair and useful cohort selection with both linear and ratio utilities, in two different algorithmic models. 1In the model of Dwork et al. (2012), fairness requires that the classifier be randomized, so we can think of the output of the classifier as continuous scores rather than binary decisions. Published in Transactions on Machine Learning Research (09/2023) Offline Setting. In this setting, we are given all the scores s1, . . . , sn at once, and must choose the optimal cohort. We present a polynomial-time algorithm for each model that computes a fair cohort with optimal expected utility. Online Setting. In this setting, we are given the scores s1, s2, . . . as a stream. Ideally, after receiving si, we would like to immediately accept candidate i to the cohort or reject them from the cohort. However, this goal is too strong (Dwork & Ilvento, 2019), so we consider a relaxation where candidate i can be either rejected at any point during the online process, or kept on hold until later, and we would like to output a fair cohort at the end of the stream with optimal expected utility while minimizing the number of candidates who are kept on hold. We must keep at least k candidates pending in order to fill our cohort. For ratio utilities we give a fair and optimal algorithm in this setting that keeps at most O(k) candidates on hold. For linear utilities, we give an approximately fair and optimal algorithm. Specifically, we present a polynomial-time algorithm for this online setting where the fairness constraints are satisfied with an ε additive error and utility is optimized up to an additive error of O(k ε), for any desired ε > 0, while ensuring that the number of users on hold never exceeds O(k + 1/ε). To interpret our additive error guarantee, since the fairness constraint applies to the probabilities p1, . . . , pn, allowing the fairness constraint to be violated by an additive error of, say, ε = 0.01 means that every candidate is selected with probability that is within 1% of the probability that candidate would have been selected by some fair mechanism. 1.1 Techniques We start with a brief overview of the key steps in our algorithms for each setting. The formal description of the algorithms can be found in Sections 3 and 4. All omitted proofs are in the appendix. Offline Setting. Our offline algorithm is based on two main properties of the fair cohort selection problem. First, we use a dependent-rounding procedure that takes a set of probabilities p1, . . . , pn such that Pn i=1 pi = k and outputs a random cohort C, represented by indicator random variables p1, . . . , pn {0, 1} with Pn i=1 pi = k such that E ( pu) = pu for every candidate u. Thus, it is enough for our algorithm to come up with a set of marginal probabilities for each candidate to appear in the cohort, and then run this rounding process, rather than directly finding the cohort. To find the marginal probabilities that maximize the linear utility with respect to the scores s1, . . . , sn [0, 1], we would like to solve the linear program (LP) maximize Pn i=1 pisi s.t. Pn i=1 pi k i, j, |pi pj| |si sj| i, 0 pi 1 In this LP, the first and third constraints ensure that the variables pu represent the marginal probability of selecting candidate u in a cohort of size k. The second constraint ensures that these probabilities p are fair with respect to the same measure D as the original scores s (i.e. |pu pv| |su sv| D(u, v)). Although we could have used the stronger constraint |pu pv| D(u, v), writing the LP as we do means that our algorithm does not need to know the underlying metric, and that our solution will preserve any stronger fairness that the scores s happen to satisfy. While we could use a standard linear program solver, we can get a faster solution that is also useful for extending to the online setting by noting that this LP has a specific closed form solution based on waterfilling . Specifically, if Pn i=1 si k, then the optimal solution simply adds some number c 0 to all scores and sets pu = min{su + c, 1}, and an analogous solution works when Pn i=1 si > k. For the ratio utility, although computing the optimal marginal probabilities does not correspond to solving a linear program, the solutions for the two utility functions are the same when Pn i=1 si k. When Pn i=1 si > k, we maximize the ratio of marginal probability over score by scaling all the scores down by the same factor k/ Pn i=1 si. Published in Transactions on Machine Learning Research (09/2023) Online Setting. In the online setting we do not have all the scores in advance, thus for the linear utility we cannot solve the linear program, and do not even know the value of the constant c that determines the solution. We give two algorithms for addressing this problem. The basic algorithm begins by introducing some approximation, in which we group users into 1/ε groups based on their scores, where group g contains users with scores in ((g 1)ε, gε]. This grouping can only reduce utility by O(k ε), and can only lead to violating the fairness constraint by ε. Since users in each bucket are treated identically, we know that when we reach the end of the algorithm, we can express the final cohort as containing a random set of ng members from each group g. Thus, to run the algorithm we use reservoir sampling to maintain a random set of at most k candidates from each group, reject all other members, and then run the offline algorithm at the end to determine how many candidates to select from each group. The drawback of this method is that it keeps as many as k/ε candidates on hold until the end. Thus, we develop an improved algorithm that solves the linear program in an online fashion, and uses the information it obtains along the way to more carefully choose how many candidates to keep from each group. This final algorithm reduces the number of candidates on hold by as much as a quadratic factor, down to 2k + 1 For ratio utility, we are able to fully maximize the utility due to the way we decrease scores when the sum is greater than k. When the sum of scores is less than k, we must be careful in order to avoid being unfair to candidates eliminated early on. Thus, we maintain three groups; the top candidates, candidates undergoing comparisons, and candidates given uniform probability. The groups are dynamic and candidates can move from the first to the second and then the third group before getting rejected. Once the sum of scores exceeds k the groups are no longer needed. The algorithm considers at most O(k) candidates at any time. 1.2 Related Work Individual Fairness and Composition. Our work fits into the line of work initiated by Dwork et al. (2012) on individual fairness, which imposes constraints on how algorithms may distinguish specific pairs of individuals. Within this framework, difficulties associated with composing fair algorithms were first explored by Dwork & Ilvento (2019), which introduced the fair-cohort-selection problem that we study. Issues of composition in pipelines were further studied by Dwork et al. (2020). Because the scores figure into both our fairness constraint and utility measure, our work has some superficial similarity to work on meritocratic fairness. At a high-level meritocratic fairness (Joseph et al., 2016; Kearns et al., 2017; Joseph et al., 2017; Kleine Buening et al., 2022) is a kind of individual fairness where we assume there exists an intrinsic notion of merit and requires that candidates with higher merit be given no less reward than candidates with lower merit. In this work, we study a notion of individual metric fairness that, intuitively, tries to preserve the distances between the scores and not necessarily their order. Our algorithms turn out to also preserve the ordering, but meritocratically fair algorithms will not, in general, preserve the distances. We note that in our cases the scores s1, . . . , sn might or may not be reflective of true merit, since we assume that these scores satisfy individual fairness with respect to some underlying metric D, which may or may not reflect meritocratic values. The work by Kleine Buening et al. (2022) on set selection under meritocratic fairness is the most related to our work. Yet, their setting is different as the authors make the assumption that the utility function is non-linear and it depends on the combination of the selected candidates. One of our main assumptions is that every individual contributes to the cohort utility separately. The differences between the two problems are also highlighted in their work. Individually Fair Ranking and Cohort Selection. Our problem has some similarity to fair ranking, where we have to produce a permutation over all n candidates instead of just selecting a small cohort of k n candidates. Any algorithm for ranking implies an algorithm for cohort selection, since we can select the top k elements in the ranking as our cohort, and a cohort selected this way may inherit some notion of fairness from the ranking. However, we do not know of any ranking algorithms that can be used in this way to satisfy the notions of fairness and optimality we study in this work. In particular, the most closely related work on individually fair ranking by Bower et al. (2021) uses an incomparable formulation of the ranking problem, and it is not clear how to express our constraints in their framework. In addition to the difference in how fairness and utility are defined, we are not aware of any fair ranking papers that work in an online model similar to the one we study. Published in Transactions on Machine Learning Research (09/2023) Group Fairness. A complementary line of work, initiated by Hardt et al. (2016) considers notions of group fairness, which imposes constraints on how algorithms may distinguish in aggregate between individuals from different groups. While our work is technically distinct from work on group fairness, cohort/set selection has also been studied from this point of view (Zehlike et al., 2017; Stoyanovich et al., 2018) and different approaches for fair ranking have been proposed (Yang & Stoyanovich, 2017; Celis et al., 2018; Singh & Joachims, 2018; Yang et al., 2019; Zehlike & Castillo, 2020). Issues of composition also arise in this setting, as noted by several works (Bower et al., 2017; Kannan et al., 2019; Arunachaleswaran et al., 2021). 2 Preliminaries We consider the problem of selecting a cohort using the output of an individually fair classifier as defined in Dwork & Ilvento (2019). Definition 2.1 (Individual Fairness (Dwork et al., 2012)). Given a universe of individuals U and a metric D, a randomized binary classifier C : U {0, 1} is individually fair if and only if for all u, v U, |P(C(u) = 1) P(C(v) = 1)| D(u, v). We restate the formal definition of the problem for convenience. Definition 2.2 (Fair Cohort Selection Problem). Given a universe of candidates U, an integer k and a metric D, select a set S of k individuals such that the selection is individually fair with respect to D, i.e. for all pairs of candidates u, v U, |P(u S) P(v S)| D(u, v). In this work, we are interested in the setting of fairness under composition, and designing algorithms that exploit the fairness properties of their components instead of having direct access to D. An algorithm which solves this problem will either preserve or shrink the pairwise distances between the selection probabilities of candidates. We can think of the probability of C selecting a candidate u as a continuous score su. Definition 2.3 (Fairness-Preserving Cohort Selection Problem). Given a classifier C who assigns score si to candidate i, select a cohort of k individuals where each individual i has marginal selection probability pi such that for any pair of individuals i, j, |pi pj| |si sj|. In some contexts that we define below, we consider a relaxation of this definition by adding a small error ε to the pairwise distances. Definition 2.4 (ε-Approximate Fairness-Preserving Cohort Selection Problem). Given a classifier C who assigns score si to any individual i and a parameter ε (0, 1], select a cohort of k individuals where each individual i has marginal selection probability pi such that for all pairs of individuals i, j |pi pj| |si sj|+ε. When the initial classifier C is individually fair and the cohort selection algorithm preserves fairness εapproximately, then the cohort selection probabilities satisfy ε-individual fairness. Definition 2.5 (ε-Individual Fairness). Given a universe of individuals U, a metric D and an ε in [0, 1], a randomized binary classifier C : U {0, 1} is ε-individually fair if and only if for all u, v U, |P(C(u) = 1) P(C(v) = 1)| D(u, v) + ε. The problem of fairness-preserving cohort selection may be studied in an offline setting, where the universe of candidates is known in advance, as well as in an online setting, in which candidates arrive one-at-atime. Dwork & Ilvento (2019) prove that making immediate selection decisions while preserving fairness is impossible, so we relax this to a setting in which we do not need to give immediate decisions as candidates arrive. To control the degree of relaxation, we consider the number of candidates who wait for a response during the selection stage; we seek to minimize the number of these pending candidates. Individual fairness can be achieved by making uniform random selections among candidates, which is trivially possible in the offline case, and can be implemented in the online setting using reservoir sampling (Vitter, 1985). Therefore, it is important to consider an extension to the cohort selection problem in which we also want to maximize the utility of the selected cohort. We consider two measures of utility and construct Published in Transactions on Machine Learning Research (09/2023) algorithms for cohort selection which optimize these measures. Let the selection scores for individuals under classifier C be s1, . . . , sn, and let their marginal probabilities of selection under an algorithm A be p1, . . . , pn. Utilities Correlated with Classifier Output. We may consider that the classifier scores s directly indicate the qualifications of an individual and that the utility of a cohort is the sum of the scores of each member. Then, our evaluation measure becomes the expected utility of the selected cohort. Definition 2.6 (Linear Utility). Consider a set of n candidates whose scores from classifier C are s1, . . . , sn and whose selection probabilities by algorithm A are p1, . . . , pn. We define the linear utility of A to be Utilitylinear (A, {si}) = Pn i=1 sipi. As seen in Example 2.1, the maximum linear utility under the fairness-preserving constraint is not a monotone function of the input scores; as the score si increases, the maximum linear utility may either increase, decrease, or stay constant. Yet, we show in Lemma 3.1 that it is robust to small input perturbations. Therefore, it can handle minor noise or errors in the input value. We exploit this property to develop an ε-individually fair algorithm for online cohort selection. Example 2.1. Suppose we want to select 2 people from a set of 4 candidates and classifier C assigns scores s1 = 0.1, s2 = 0.3, s3 = 0.6 and s4 = 0.9. The optimal fairness-preserving solution has marginal selection probabilities p1 = 0.125, p2 = 0.325, p3 = 0.625 and p4 = 0.925 and utility P4 i=1 pisi = 1.3175. By increasing the score of the first individual to s 1 = 0.3, the cohort selection probabilities become p 1 = 0.275, p 2 = 0.275, p 3 = 0.575 and p 4 = 0.875. The new utility is P4 i=1 p is i = 1.2975, which is lower than the previous one. Arbitrary Utilities. Alternatively, we may consider the case where there exists an arbitrary, unknown utility value for each candidate u1, . . . , un in [0, 1], which may differ from the scores s. Given si and ui, a simple quantity to study is the ratio utility achieved by an algorithm A: P i siui . Since we do not know the distribution of utility, we must consider the utility achieved by an algorithm A for a worst-case utility vector u. Lemma 2.1. The worst case ratio utility is equal to the minimum ratio of selection probability and classifier score of one candidate min u i siui = mini pi si . We use this equivalence to redefine the ratio utility as follows. Definition 2.7 (Ratio Utility). Given n candidates with scores si and a cohort selection algorithm A with marginal selection probabilities pi, let the ratio utility of an algorithm A be defined according to the coordinate with the minimum ratio: Utilityratio (A, {si}) = mini pi si . Of course other utility measures exist; we study the measures defined above because they capture a natural intuition about real world scenarios. We present an example to demonstrate the properties of these two utility functions and the performance of pre-existing cohort selection algorithms. Example 2.2. Consider the case where we would like to select 2 out of 3 candidates with scores s1 = 0.5, s2 = 0.5, s3 = 1. The optimal solution in terms of ratio and linear utility is to pick {1, 3} and {2, 3}, each with probability 0.5. This solution achieves ratio utility 1 and linear utility 1.5. The weighted sampling algorithm by Dwork & Ilvento (2019) selects each subset with probability proportional to its weight, and would select {1, 2} with probability 1/4, {1, 3} with probability 3/8, and {2, 3} with probability 3/8. The ratio utility of this solution is 3/4 and its linear utility is 11/8. Permute Then Classify by Dwork & Ilvento (2019) selects {1, 2} with probability 1/12 and {1, 3} or {2, 3} with probability 11/24 each. This solution achieves ratio utility 11/12 and linear utility 35/24. Hence, we see that weighted sampling and Permute Then Classify are sub-optimal for both utility definitions. 2.2 Dependent Rounding Our cohort selection algorithms for ratio and linear utility in the offline and online settings are based on a simple dependent rounding algorithm described in the appendix. This algorithm makes one pass over a list Published in Transactions on Machine Learning Research (09/2023) of numbers in [0, m], where m R, and rounds them so that their sum is preserved and the expected value of each element after rounding is equal to its original value. The algorithm operates on two entries at a time - call these a and b. If a + b m, the algorithm randomly rounds one of them to a + b and the other one to 0 with the appropriate probabilities. If a + b > m, it rounds one to m and the other to the remainder. This rounding procedure is a special case of rounding a fractional solution in a matroid polytope (in this case, we have a uniform matroid). This problem has been studied extensively with rounding procedures satisfying additional desirable properties (see e.g. Chekuri et al. (2010)). Here we use a simple and very efficient rounding algorithm for the special case of the problem arising in our work. Lemma 2.2. Let a1, . . . , an be a list of numbers in [0, m] with x = Pn i=1 ai. There is a randomized algorithm that outputs a1, . . . , an [0, m] such that x m of the ai will be equal to m, one ai will be x x m m, the rest will be equal to 0, and for all i {1, . . . , n}, E[ ai] = ai. When m = 1, we call this algorithm round-pr and when a1, . . . , an are natural numbers we call it round-nat. We use the rounding procedure in two ways. We call round-pr when we want to round a list of real numbers in [0, 1] which represent selection probabilities and round-nat to round natural numbers which correspond to counts. Corollary 2.1. If m = 1 and Pn i=1 ai = k N, then after the dependent rounding of round-pr k elements will be equal to 1 and the rest will be equal to 0. 3 Linear Utility The first definition of utility we study is that of linear utility. In this setting, the aim of the cohort selection is to maximize the expected sum of scores of the group of k people selected while preserving fairness. We can formulate this problem as a linear program, as shown in Section 1.1. The key idea of the solution is to move all the selection scores up or down by the same amount, depending on the value of their sum, and clip the probabilities that are less than zero or over one. The algorithm we propose for the online setting solves approximately a relaxation of the cohort selection problem, where the probabilities of two candidates can differ by at most the given metric plus a small constant ε. The new problem is defined as the linear program presented in Section 1.1, where the second constraint, i, j [n], |pi pj| |si sj|, is replaced by i, j [n], |pi pj| |si sj| + ε. 3.1 Offline Cohort Selection In the offline setting, the algorithm has full access to the set of candidates and their scores from classifier C. By Corollary 2.1, if the classifier s scores sum up to k, then the rounding algorithm can form a cohort by simply using the input scores as the marginal probabilities of selection. Yet, the sum of the scores assigned by C might not be k. If it is less than k, Algorithm 1 moves all the probabilities up by the same amount and clips those that exceed 1, so that their sum becomes equal to k. In more detail, the marginal selection probability of candidate i becomes pi = min{si + c, 1}, where constant c is such that Pn i=1 pi = k. This adjustment preserves the probability differences between the pairs of candidates, unless some candidate gets assigned probability 1, in which case some differences decrease. As a result, the differences of the new probabilities will be bounded by the same metric as the scores. The case where the sum of scores is greater than k is treated similarly. More specifically, the new marginal probabilities are of the form pi = max{0, si c}. After the adjustment, Algorithm 1 selects the cohort by running the dependent rounding procedure. Theorem 3.1. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 1 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn which achieve optimal linear utility Pn i=1 pisi. Corollary 3.1. If C is individually fair, then Algorithm 1 is individually fair. 3.2 Online Cohort Selection For the online setting of the problem with linear utility, we present Algorithm 2 which reads C s output from a stream and selects a cohort with high utility while deferring the decision for a small number of candidates. Published in Transactions on Machine Learning Research (09/2023) Alg. 1: Offline Cohort Selection for Linear Utility Input: scores s1, . . . , sn [0, 1] from C, cohort size k Output: set of accepted candidates 1 sum Pn i=1 si 2 if sum < k then // move scores up n ; pi si + c, i [n] 4 while pi > 1 do 5 S<1 {j : pj < 1} 6 pj pj + pi 1 |S<1|, j S<1 // move scores down n ; pi si c, i [n] 10 while pi < 0 do 11 S>0 {j : pj > 0} 12 pj pj + pi |S>0|, j S>0 // Now list sums to k and all pi [0, 1] 14 { pi} round-pr(p1, . . . , pn, 1) // select cohort of size k 15 return candidates with non-zero pi While processing the stream, this algorithm only rejects candidates and releases all the acceptances at the end of the stream. The main difference in comparison to the offline cohort selection algorithm is that Algorithm 2 splits the candidates into groups with similar scores, within an ε interval, and treats any member of a group as equivalent. Hence, the final probability of being selected in the cohort is equal for any candidate of the same group. This leads to a relaxation of the problem, which we call ε-approximate fairness-preserving cohort selection problem. As we mentioned, the algorithm splits the candidates from the stream into groups according to their score from C. Particularly, the interval (0, 1] is divided into m intervals of size ε each, where m = 1 ε . The i-th candidate gets assigned to group 0 if si = 0 or group g {1, . . . , m} if si ((g 1)ε, gε]. Once candidate i is in the group, their score gets rounded up and their new score ˆsi is the same as that of all the other members of the group, ˆsg = gε. The use of the groups affects the performance of the algorithm in terms of not only individual fairness, but also utility. Lemma 3.1 shows that the decrease in utility is caused only by the use of rounding, as Algorithm 2 achieves the same linear utility as the corresponding offline algorithm when the scores are rounded to multiples of ε. Lemma 3.1. Given selection scores s1, . . . , sn from C, for a set of n candidates and a parameter ε (0, 1], we split (0, 1] into m = 1 ε intervals of length ε and for all i [n] we set ˆsi = gε, where g is such that si ((g 1)ε, gε]. When Algorithm 1 runs for input ˆs1, . . . , ˆsn and cohort size k it solves the ε-approximate fairness-preserving cohort selection problem with marginal selection probabilities p1, . . . , pn, and it achieves linear utility Pn i=1 pisi Pn i=1 p i si k(ε + 2 ε), where p 1, . . . , p n is the optimal solution for the offline fairness-preserving problem for input s1, . . . , sn. Since for each group we keep only some people on hold, we preserve the probability information of the rejected candidates by considering that each pending candidate represents themselves and a fraction of the rejected people from their group. The fraction of people represented by the i-th candidate is denoted by ni. One approach is to maintain at most k people pending per group uniformly at random and have every candidate within a certain group represent the same fraction of people. This basic algorithm keeps k/ε candidates on hold. Our algorithm reduces the number of people pending by allowing candidates of the Published in Transactions on Machine Learning Research (09/2023) Alg. 2: Online Cohort Selection for Linear Utility Input: stream of scores s1, . . . , sn [0, 1] from C, cohort size k, constant ε (0, 1] Output: set of accepted candidates 1 n 0; sum 0// number of candidates and sum of scores 2 ng 0, for all groups g [ 1 ε ]// size of group g 3 for candidate i in stream do 4 add i to group g where si ((g 1)ε, gε] 5 ng ng + 1 6 ˆsi gε; sum sum + ˆsi 7 ni 1; n n + 1 // Compute group scores for this input 8 if sum < k then n 10 sg gε + c, for all groups g 11 while group g with sg > 1 do 12 F<1 {group f : sf < 1} 14 sf sf + ng (sg 1) n<1 , f F<1 16 else if sum > k then n 18 sg gε c, for all groups g 19 while group g with sg < 0 do 20 F>0 {group f : sf > 0} 22 sf sf + ng sg n>0 , f F>0 24 else sg gε, for all groups g 25 for group g do 26 if sg > 0 then 27 {ni : i in g} round-nat({ni : i in g}, v = 1 sg ) // we ensure niv 1 28 reject candidate i if ni = 0 30 reject all candidates in g 31 si nisg, for all groups g and candidates i in g // Now candidate scores sum up to k and all si in [0, 1] 32 { si} round-pr({ si : i in any group}, 1)// select cohort of size k 33 return candidates with non-zero si same group to represent varying fractions of people. Procedure round-nat is used to eliminate candidates and determine what fraction of people s selection probabilities each candidate represents. Theorem 3.2. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 2 solves the online ε-approximate fairness-preserving cohort selection problem for any ε (0, 1] by selecting a cohort of size k with marginal probabilities p1, . . . , pn, achieves linear utility Pn i=1 pisi Pn i=1 p i si k(ε + 2 ε) (where p 1, . . . , p n is the optimal solution for the offline fairness-preserving problem for input s1, . . . , sn), and keeps at most O(k + 1 ε) candidates pending. Corollary 3.2. If C is individually fair, then Algorithm 2 is ε-individually fair. Published in Transactions on Machine Learning Research (09/2023) 4 Ratio Utility In Definition 2.7 we consider the ratio utility obtained by a selection algorithm, and show that this is controlled by the minimum ratio of pi and si. 4.1 Offline Cohort Selection We begin with the scenario of selecting k of n individuals when all candidates are known in advance. Dwork & Ilvento (2019) give a solution to this scenario (with non-optimal utility, as shown in Example 2.2); we present Algorithm 3, an alternate one that achieves the optimal ratio utility. Notice that the sum of classifier scores P i si need not add up to k. If the sum exceeds k, the algorithm simply scales down all probabilities so that the new sum is k. It is not hard to show that this operation preserves fairness and gives optimal utility. This multiplicative scaling will not work when the sum is smaller than k, however, since it increases the probability difference among candidates and can be unfair. Intuitively, the solution in this case is to additively increase all probabilities by the same amount, thus preserving fairness. However, some care is needed, as no probability can exceed 1. Alg. 3: Offline Cohort Selection for Ratio Utility Input: scores s1, . . . , sn [0, 1] from C, cohort size k Output: set of accepted candidates 1 sum Pn i=1 si 2 if sum < k then // move scores up n ; pi si + c, i [n] 4 while pi > 1 do 5 S<1 {j : pj < 1} 6 pj pj + pi 1 |S<1|, j : pj < 1 // move scores down 9 pi si k sum, i [n] // Now list sums to k and all pi [0, 1] 10 { pi} round-pr(p1, . . . , pn, 1) // select cohort of size k 11 return candidates with non-zero pi Lemma 4.1. If sum < k, Algorithm 3 increases each input score si to a final output pi = si + αi si such that all of the candidates j with pj < 1 receive the same cumulative adjustment value αj = v. Theorem 4.1. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 3 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn that achieve the optimal value of ratio utility mini pi si . Corollary 4.1. If C is individually fair, then Algorithm 3 is individually fair. 4.2 Online Cohort Selection We now consider the online scenario, in which our goal is to keep as few candidates pending as possible. It is clear that the number of candidates on hold must be at least k since we need to accept k candidates at the end. We provide an algorithm that achieves optimal ratio utility for the online fairness-preserving cohort selection problem, while leaving only O(k) candidates pending. We use a parameter α [0, 1/2], which controls the number of candidates pending, with α = 1/2 by default. The update procedure depends on the sum of scores of the candidates seen so far. We maintain a set of k/α candidates with highest scores called top, and two sets rest and rand of the remaining candidates. A new candidate i is added to either top (and thus bumps some other candidate from top to rest), or they are added to rest. We make sure Published in Transactions on Machine Learning Research (09/2023) Alg. 4: Online Cohort Selection for Ratio Utility Input: stream of scores s1, . . . , sn [0, 1] from C, cohort size k, constant α [0, 1/2] Output: set of accepted candidates 2 top ; rest ; rand 3 for candidate i in stream do 4 sum sum + si; pi si 5 if sum < k then 6 if |top| < k/α then add i to top 7 else if min(sj) in top < si then 8 move min element from top to rest 9 add i to top 10 else add i to rest 11 {pi : i rest} round-pr({pi : i rest }, 1 α) 12 for j in rest with pj = 0 do 13 remove j from rest 14 add j to unif. reservoir sample of size k 15 reject candidate that was eliminated from the unif. reservoir sample 16 set rand to unif. reservoir sample 18 if first occurrence of sum k then 19 reject all candidates in rand and remove rand 20 store top and rest in pending 21 incr k sum ; scale k sum 22 else 23 incr sum si sum ; scale scale incr 24 pi pi scale 25 pj pj incr, j pending 26 add i to pending 27 {pi : i pending} round-pr({pi : i pending}, 1) 28 reject j and remove it from pending if pj = 0 29 if sum < k then n ; pi pi + c, i top rest i top rest pi |rand| , i rand 32 while pi > 1 do 33 S( 1) {i : pi 1}; c pi 1 n |S( 1)| 34 pj pj + c, j top rest with pj < 1 35 set nmodified to number of modified people 36 pj pj + c (n |S( 1)| nmodified) |rand| , j rand 38 {pi : i top rest rand} round-pr({pi : i top rest rand}, 1) 39 set pending to indices of nonzero pi 40 return pending the list rest is not too big by rounding and eliminating candidates. Eliminated candidates are sampled uniformly to enter rand, which has size k. During the online phase, it is tempting to use the idea from the offline case to eliminate candidates; take two candidates from rest with scores a and b and round them to a + b and 0. However, we must be careful Published in Transactions on Machine Learning Research (09/2023) when rounding rest to ensure that probabilities will not later exceed 1 if the stream ends with sum < k. The key idea (and motivation for having top) is that candidates not in top have increments at most α, which we can see as follows. For rest to be non-empty, we must have at least n > k/α candidates. Let x be the number of candidates who are in top when the algorithm finishes and achieve probability 1. For each of the other candidates, the increment pi si is less than (k x)/(n x) < k/n, since k < n. Thus that cumulative increment is at most α, and it is safe to round probabilities in rest to 0 and 1 α. If the stream ends with P i si < k, we will increase the scores of everyone pending, and use rand to compensate for already eliminated candidates. Let A be the set of candidates in top and rest and A be the set of all others seen. As in the offline case, we will evenly increase the probabilities of all candidates by adding the appropriate c to each candidate, and using water-filling where some candidates reach 1. For candidates in A, this proceeds exactly as in the offline case. For candidates in A, we increase the probabilities without explicitly storing all of them by maintaining a set called rand consisting of k uniformly random candidates. Each member of rand receives probability equal to 1/k times the sum of the probabilities of the candidates in A. Since everyone in A has the same probability, this rounding clearly preserves the marginal probabilities. Finally, we use the offline algorithm to select the cohort from the union of rand and A. The case where the sum of probabilities exceeds k is simpler. The algorithm always scales down all probabilities so that they add up to exactly k. When a new candidate appears, they receive a probability equal to what C gives them, times the current scaling factor, and get added to top rest. The algorithm then scales down all probabilities so that the sum is exactly k and reduces the size of top rest by repeatedly rounding pairs of candidate as in the offline setting. In this case, the set rand is not used. Theorem 4.2. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 4 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn that achieve the optimal value of ratio utility. The algorithm leaves no more than O(k) candidates pending at any time. Corollary 4.2. If C is individually fair, then Algorithm 4 is individually fair. 5 Conclusion In this work, we defined two notions of utility for the problem of cohort selection, linear and ratio utility, and designed polynomial time algorithms that optimize them (or approximately optimize them) while preserving the fairness of the given classifier. This means that if the given classifier is individually fair, then our cohort selection algorithms are also individually fair with respect to the same metric. We studied two settings, the standard offline setting and an online setting where we can reject candidates at any stage of the process and accept candidates only at the end, keeping the number of candidates that are waiting for the decision low. While our algorithms for ratio utility and offline linear utility are optimal in terms of fairness and utility, our online algorithm for linear utility achieves ε-individual fairness with an additive error in the utility and the number of candidates kept on hold. Overall, our approach is based on the assumptions that every candidate contributes to the utility of the cohort separately and that the candidate utilities are correlated with their selection scores given by the individually fair classifier. As the linear utility requires the assumption that the utilities of the candidates are their classification scores, we tried to relax it by defining the ratio utility in which the utilities are unknown but still correlated with the classification scores. The natural extension of this work would be to design algorithms that optimize a utility function where the candidate utilities are known and decoupled from the classification scores. Another interesting direction would be to relax our first assumption and study utility functions where the candidates do not contribute separately, but the cohort utility depends on the combination of candidates selected. Acknowledgments KB and JU were supported by NSF awards CCF-1750640 and CNS-2120603. KB and HN were supported by NSF grant CCF-1750716. Published in Transactions on Machine Learning Research (09/2023) Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, and Juba Ziani. Pipeline Interventions. In Innovations in Theoretical Computer Science Conference, ITCS 21, 2021. https://arxiv.org/abs/ 2002.06592. Amanda Bower, Sarah N Kitchen, Laura Niss, Martin J Strauss, Alexander Vargas, and Suresh Venkatasubramanian. Fair pipelines, 2017. https://arxiv.org/abs/1707.00391. Amanda Bower, Hamid Eftekhari, Mikhail Yurochkin, and Yuekai Sun. Individually fair rankings. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. Open Review.net, 2021. https://arxiv.org/abs/2103.11023. L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pp. 28:1 28:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. doi: 10.4230/LIPIcs.ICALP.2018.28. URL https://doi.org/10.4230/LIPIcs.ICALP. 2018.28. https://arxiv.org/abs/1704.06840. Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In IEEE Symposium on Foundations of Computer Science, FOCS 10. IEEE, 2010. https://arxiv.org/abs/0909.4348. Cynthia Dwork and Christina Ilvento. Fairness under composition. In Innovations in Theoretical Computer Science, ITCS 19, 2019. http://arxiv.org/abs/1806.06122. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science, ITCS 12. ACM, 2012. https://arxiv.org/ abs/1104.3913. Cynthia Dwork, Christina Ilvento, and Meena Jagadeesan. Individual Fairness in Pipelines. In Symposium on Foundations of Responsible Computing, FORC 20, 2020. https://arxiv.org/abs/2004.05167. Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Conference on Neural Information Processing Systems, NIPS 16, 2016. https://arxiv.org/abs/1610.02413. Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 325 333, 2016. https://arxiv.org/abs/1605.07139. Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. Fair algorithms for infinite and contextual bandits, 2017. https://arxiv.org/abs/1610.09559. Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Conference on Fairness, Accountability, and Transparency, FAT* 19. ACM, 2019. https://arxiv.org/abs/1808. 09004. Michael Kearns, Aaron Roth, and Zhiwei Steven Wu. Meritocratic fairness for cross-population selection. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML 17, pp. 1828 1836. JMLR.org, 2017. https://arxiv.org/abs/1805.08716. Thomas Kleine Buening, Meirav Segal, Debabrota Basu, Anne-Marie George, and Christos Dimitrakakis. On meritocracy in optimal set selection. In Equity and Access in Algorithms, Mechanisms, and Optimization, EAAMO 22, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450394772. doi: 10.1145/3551624.3555305. URL https://doi.org/10.1145/3551624.3555305. Ashudeep Singh and Thorsten Joachims. Fairness of exposure in rankings. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 18, pp. 2219 2228, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355520. doi: 10.1145/3219819. 3220088. URL https://doi.org/10.1145/3219819.3220088. https://arxiv.org/abs/1802.07281. Published in Transactions on Machine Learning Research (09/2023) Julia Stoyanovich, Ke Yang, and H. V. Jagadish. Online set selection with fairness and diversity constraints. In Advances in Database Technology - EDBT 2018, Advances in Database Technology - EDBT, pp. 241 252. Open Proceedings.org, 2018. doi: 10.5441/002/edbt.2018.22. Jeffrey S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37 57, March 1985. ISSN 0098-3500. doi: 10.1145/3147.3165. URL http://doi.acm.org/10.1145/3147.3165. Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, SSDBM 17, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450352826. doi: 10.1145/3085504.3085526. URL https://doi.org/10.1145/3085504.3085526. https://arxiv.org/abs/1610.08559. Ke Yang, Vasilis Gkatzelis, and Julia Stoyanovich. Balanced ranking with diversity constraints. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 6035 6042. International Joint Conferences on Artificial Intelligence Organization, 7 2019. https: //arxiv.org/abs/1906.01747. Meike Zehlike and Carlos Castillo. Reducing disparate exposure in ranking: A learning to rank approach. In Proceedings of The Web Conference 2020, pp. 2849 2855, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450370233. URL https://doi.org/10.1145/3366424.3380048. https://arxiv.org/abs/1805.08716. Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo Baeza-Yates. Fa*ir: A fair top-k ranking algorithm. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 17, pp. 1569 1578, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450349185. doi: 10.1145/3132847.3132938. URL https://doi.org/ 10.1145/3132847.3132938. https://arxiv.org/abs/1706.06368. A Proofs and Algorithm from Section 2 A.1 Proof of Lemma 2.1 Lemma 2.1. The worst case ratio utility is equal to the minimum ratio of selection probability and classifier score of one candidate min u i siui = mini pi si . Proof. Let j be the index of the candidate with the smallest ratio of model output and classifier score, i.e. pj sj pi si , i = j. Compare the minimum over all utility vectors to a particular choice of utility vector u satisfying u j = 1, and u i =j = 0. We know that the ratio for any particular vector will be at least as large as the minimum: min u i siu i = pju j sju j = pj sj = mini pi si . We can also obtain a bound from the other direction. Let m be the smallest ratio between model output and classifier score m = mini pi/si and u [0, 1]n be any utility vector with P i ui > 0. Then, for all i [n] we have piui msiui. Therefore, min u i siui mini pi si A.2 Dependent Rounding Algorithm Here we include a detailed description of the dependent rounding algorithm (Algorithm 5) mentioned in Section 2.2. A.3 Proof of Lemma 2.2 Lemma 2.2. Let a1, . . . , an be a list of numbers in [0, m] with x = Pn i=1 ai. There is a randomized algorithm that outputs a1, . . . , an [0, m] such that x m of the ai will be equal to m, one ai will be x x Published in Transactions on Machine Learning Research (09/2023) Algorithm 5: Rounding Input: a list of n numbers a1, a2, . . . , an R 0 and the maximum value of a rounded number m R 0 Output: list of rounded numbers a1, a2, . . . , an R 0 1 pending Index 1 2 for i from 2 to n do 3 if ai = 0 and apending Index = 0 then continue to next i 4 a ai ; b apending Index 5 choose u randomly from Unif (0, 1) 6 if a + b m then 7 if u < a a+b then 8 ai a + b ; apending Index 0 ; pending Index i 10 ai 0 ; apending Index a + b 13 if u < m b 2m a b then 14 ai m ; apending Index a + b m 16 ai a + b m ; apending Index m ; pending Index i 20 return a1, a2, . . . , an rest will be equal to 0, and for all i {1, . . . , n}, E[ ai] = ai. When m = 1, we call this algorithm round-pr and when a1, . . . , an are natural numbers we call it round-nat. Proof. We begin by showing that for all i [n], E[ ai] = ai. At step i, we have two numbers a and b which get rounded and obtain new values denoted by a and b, respectively. The rounding depends on the value of a + b. If a + b is less than or equal to β, then E[ a] = (a + b) a a+b = a. If a + b is greater than β and at most 2β, then E[ a] = v β b 2β a b + (a + b β) β a 2β a b = a. Similarly, we obtain E[ b] = b. Since the expected values remain constant throughout the process, we conclude that for all i [n], E[ ai] = ai. We notice that for any step i of the algorithm and for all j that are smaller than i but are not the pending Index, aj = 0 or aj = β. As a result, at the end of the algorithm all elements with index j such that j < n and j = pending Index and one of the n-th or the pending Index elements are rounded to either 0 or β. The remaining element is the only one that can have a value in all [0, β]. Since Pn i=1 ai = sum, sum β elements are β and if the remainder sum sum β β is non-zero, it is assigned to the remaining element. A.4 Proof of Corollary 2.1 Corollary 2.1. If m = 1 and Pn i=1 ai = k N, then after the dependent rounding of round-pr k elements will be equal to 1 and the rest will be equal to 0. Proof. The proof follows directly from Lemma 2.2. Published in Transactions on Machine Learning Research (09/2023) B Proofs from Section 3 B.1 Proof of Theorem 3.1 Theorem 3.1. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 1 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn which achieve optimal linear utility Pn i=1 pisi. Proof. Algorithm 1 consists of two parts. In the first part, it modifies the input scores so that they sum up to k, which is the size of the cohort, and for the second part it randomly selects k individuals by calling the round-pr subroutine. We can simplify the first part by rewriting it in a more concise way. If the sum of C s scores s1, . . . , sn is less than k, then for any candidate i [n] we set pi = min{si + b, 1}, with b in [0, 1] such that Pn i=1 pi = k. More specifically, pi are initially set to si + c so that their sum is equal to k. Their sum will remain equal to k, because the following loop only redistributes probability mass among candidates. If at any iteration some candidate i has pi > 1, then their marginal selection probability is set to 1 and the excess probability mass is redistributed evenly to all candidates with marginal probabilities less than 1. At the end, any probability pi that is less than 1 is equal to si plus a constant b that consists of the initial c and the fractions of the probabilities that exceeded 1. Similarly, we have that if the sum of C s scores is greater than k, then for any candidate i, pi = max{si b, 0} for a real number b [0, 1] such that Pn i=1 pi = k. If the sum is equal to k, the cohort selection marginal probabilities are equal to C s scores. Finally, by Lemma 2.2 we obtain that for any i [n] the expected value of indicator pi is pi, and since pi is either 0 or 1, the i-th candidate is selected with probability pi. First, we want to show that the cohort formed by Algorithm 1 preserves fairness. Depending on the value of Pn i=1 si, we have three cases. 1. Pn i=1 si = k. The input scores are used unaltered as the marginal probabilities for the cohort selection and, thus, we have that for any pair of indices i, j, |pi pj| = |si sj|. 2. Pn i=1 si < k. The cohort selection marginal probabilities are of the form pi = min{si + b, 1}. For any pair of individuals i, j, if both have probability of being selected 1, then |pi pj| = 0. Else, if pi = 1 and pj = sj + b, we have |pi pj| < |si sj| because si + b 1. Finally, if pi = si + b and pj = sj + b, then it holds that |pi pj| = |si sj|. As a result, we conclude that for any pair of individuals i, j, we have |pi pj| |si sj|. 3. Pn i=1 si > k. In this case, the cohort selection probabilities are of the form pi = max{si b, 0}. For any pair of individuals i, j, if both have zero probability of being selected then |pi pj| = 0. Else, if pi = 0 and pj = sj b, we have |pi pj| < |si sj|. Finally, if pi = si b and pj = sj b, then it holds that |pi pj| = |si sj|. Thus, for any pair of individuals i, j, we have |pi pj| |si sj|. Second, we show that this cohort optimizes linear utility. The fairness-preserving solution that optimizes linear utility solves the linear program defined in Section 1.1 and satisfies the first constraint with equality, i.e. Pn i=1 pi = k. This holds because otherwise we would be able to increase pi by setting p i = min{pi+d, 1}, for some d (0, 1], so that Pn i=1 p i = k and obtain greater utility while not violating any constraint. We assume that there exists a different fairness-preserving solution p 1, . . . , p n which achieves higher utility, i.e. Pn i=1 p isi > Pn i=1 pisi. Without loss of generality we can assume that the original probabilities si are sorted in increasing order. Specifically, we have s1 s2 . . . sn. The solution of Algorithm 1 maintains the same order, i.e. p1 p2 . . . pn. Let i be the individual with the smallest index for whom the two solutions differ. There are two possible cases. If p i < pi, then since Pn i=1 pi = Pn i=1 p i = k there exists an individual with j > i such that p j > pj. In addition, we see that pi > 0 and pj < 1. The values of the probabilities depend on the sum of the input scores in comparison to k. If the sum is greater than k, all pi that are not zero are of the form pi = si b. Hence, considering that p i < pi pj < p j we obtain |p i p j| = p j p i > sj b (si b) = sj si = |si sj|. Similarly, if the sum is less than k, the values of pi that are not one are of the form pi = si+b, therefore giving |p i p j| = p j p i > sj +b (si+b) = sj si = |si sj|. Published in Transactions on Machine Learning Research (09/2023) If the sum is equal to k, we have |p i p j| = p j p i > pj pi = sj si = |si sj|. In all three cases the constraints of the optimization are violated. If p i > pi, then there exists an individual with j > i such that p j < pj. Let j be the smallest such index. If there exists another index l > j such that p l pl, then by the previous argument the constraints are not satisfied. Therefore, for any candidate l after j p l < pl. This solution achieves non-optimal utility because there exist candidates with h < l and p h > ph, such that we can move probability mass from candidates h to individuals l without violating the constraints. By constructing a solution p 1, . . . , p n which gives a greater utility than p 1, . . . , p n, we arrive at a contradiction. Therefore, Algorithm 1 finds the optimal fairness-preserving solution. B.2 Proof of Corollary 3.1 Corollary 3.1. If C is individually fair, then Algorithm 1 is individually fair. Proof. The individual fairness condition for Algorithm 1 is satisfied by the assumption that the input scores are individually fair. More specifically, if the individual scores from C are individually fair with respect to a metric D, we obtain that i, j [n], |pi pj| |si sj| D(i, j). B.3 Proof of Lemma 3.1 Lemma 3.1. Given selection scores s1, . . . , sn from C, for a set of n candidates and a parameter ε (0, 1], we split (0, 1] into m = 1 ε intervals of length ε and for all i [n] we set ˆsi = gε, where g is such that si ((g 1)ε, gε]. When Algorithm 1 runs for input ˆs1, . . . , ˆsn and cohort size k it solves the ε-approximate fairness-preserving cohort selection problem with marginal selection probabilities p1, . . . , pn, and it achieves linear utility Pn i=1 pisi Pn i=1 p i si k(ε + 2 ε), where p 1, . . . , p n is the optimal solution for the offline fairness-preserving problem for input s1, . . . , sn. Proof. We have that for all individuals i in [n], 0 ˆsi si ε. Therefore, we obtain that for any pair of individuals i, j, |ˆsi ˆsj| |si sj| + ε. By Theorem 3.1 we have that |pi pj| |ˆsi ˆsj|. Combining the two inequalities, we obtain that fairness is preserved ε-approximately. Note that Algorithm 1 always adjusts the scores of candidates such that its final output probabilities sum to k. For any candidate i we observe that since the rounded score ˆsi is greater than si by at most ε, the two output probabilities, pi and p i are also close, at most ε apart. Depending on the sum of the input scores there are three cases. 1. If Pn i=1 si k and Pn i=1 ˆsi k, then Algorithm 1 shifts all the probabilities up. More specifically, we have that p i = min{1, si + x} and pi = min{1, ˆsi + y}, for some non-negative x and y. Suppose that y > x, then for any candidate i whose marginal selection probability pi has not been clipped to 1, pi = ˆsi + y > si + x = p i . By summing up for all candidates we obtain that k = Pn i=1 p i < Pn i=1 pi = k, which is not feasible. Similarly, suppose that x > y +ε, then for any candidate i whose marginal selection probability p i is has not been clipped, p i = si + x > ˆsi + y = pi. As a result, we get that Pn i=1 p i > Pn i=1 pi, which is again false. Hence, we see that y x y + ε. 2. If Pn i=1 si k and Pn i=1 ˆsi k, then Algorithm 1 shifts all the probabilities down, so that both sums become equal to k. Thus, p i = max{0, si x} and pi = max{0, ˆsi y}, for some non-negative x and y. Suppose that y < x, then for any candidate i who has non-zero marginal selection probability p i , we have that p i = si x < ˆsi y = pi. By summing up for all candidates we obtain that k = Pn i=1 p i < Pn i=1 pi = k, which is not feasible. Similarly, suppose that y > x + ε, then for any candidate i whose marginal selection probability pi is not 0, pi = ˆsi y < si x = p i . As a result, we get that Pn i=1 p i > Pn i=1 pi, which is false. Thus, we see that x y x + ε. 3. If Pn i=1 si k and Pn i=1 ˆsi k, then Algorithm 1 sets p i = min{1, si + x} and pi = max{0, ˆsi y}, for some non-negative x and y. Suppose that x + y > ε, then for any candidate i who has non-zero marginal selection probability pi, we have that pi = ˆsi y < si + x and pi < 1; therefore pi < p i . Published in Transactions on Machine Learning Research (09/2023) By summing up the selection probabilities of all the candidates we obtain that k = Pn i=1 pi < Pn i=1 p i = k, which is not feasible. Similarly, suppose that x + y < 0, then for any candidate i whose marginal selection probability p i is not 1, p i = si + x < ˆsi y = p i . As a result, we get that Pn i=1 p i < Pn i=1 pi, which is false. Therefore, we know that 0 x + y ε. In all three cases we see that for any candidate i the selection probabilities of the two solutions satisfy |pi p i | ε. Now, consider any coordinate i for which p i ε. Since pi p i ε, we have that pi (1 ε)p i and, thus, pisi (1 ε)p i si. Let E be the set of all individuals who have p i < ε. 1. If Pn i=1 si k, then p i = min{1, si + x}. Thus, 0 si + x < ε because both si and x are non-negative. Since both pi and p i sum up to k, we obtain that kx P i E pisi k ε kx and kx P i E p i si k ε kx. Therefore, P i E p i si k ε. 2. If Pn i=1 si k and Pn i=1 ˆsi k, then p i = max{0, si x} and pi = max{0, ˆsi y}. Thus, si x < ε. Furthermore, both solutions assign zero probability to candidates with si x < ε. As a result, all individuals in E whose si x > ε, have si within an interval of length ε + ε. Similarly to case 1, we have that P i E p i si k( ε + ε). If we split the linear utility to the utility of individuals in E and not in E, we see that i=1 pisi = X i/ E pisi + X i E pisi (1 ε) X i/ E p i si + X i E p i si k(ε + ε) i=1 p i si k(ε + 2 ε) B.4 Proof of Theorem 3.2 Theorem 3.2. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 2 solves the online ε-approximate fairness-preserving cohort selection problem for any ε (0, 1] by selecting a cohort of size k with marginal probabilities p1, . . . , pn, achieves linear utility Pn i=1 pisi Pn i=1 p i si k(ε + 2 ε) (where p 1, . . . , p n is the optimal solution for the offline fairness-preserving problem for input s1, . . . , sn), and keeps at most O(k + 1 ε) candidates pending. Proof. We want to prove that Algorithm 2 and the algorithm described in Lemma 3.1 compute the same solution. In other words, we show that the probability pi that individual i is selected by Algorithm 2 is equal to the probability qi that i is selected by Algorithm 1 applied to the modified individual scores ˆs1, . . . , ˆsn, where ˆsi = gε if si ((g 1)ε, gε]. The first step is to prove that the scores s1, s2, . . . , sn which are the input of the final rounding in line 28 satisfy the following properties: 1. Pn i=1 si = k 2. E[ si] = qi, i [n]. We saw in the proof of Theorem 3.1 that for the offline algorithm all candidates of the same group have the same selection probability qi. We will show that for any person i we have qi = sg, where sg is the final calculated probability of any member of group g to which i belongs. The sg we are referring to is calculated by the final execution of lines 6-22. These lines of Algorithm 2 describe the same procedure as lines 2-15 of Algorithm 1, but from the point of view of groups instead of individual candidates. We consider three cases that depend on the value of the sum of scores from C. Case 1: Pn i=1 ˆsi = k. The offline algorithm considers the scores as selection probabilities and, thus, assigns probability qi = ˆsi = gε to candidate i of group g. Similarly, Algorithm 2 assigns probability sg = gε to group g and in line 27 sets si = nisg = nigε = niqi for the people who have not been rejected. Published in Transactions on Machine Learning Research (09/2023) Case 2: Pn i=1 ˆsi < k. The process that calculates sg starts by setting sg = gε + c. The offline algorithm initializes the probability of i who is a member of g as qi = ˆsi +c = gε+c. If for all groups g, gε+c 1, then the adjustment stops for both algorithms and we have that for any group g, any member i of g has qi = sg. If there exists g such that qi > 1, then the corresponding sg exceeds 1 by the same amount. Additionally, at this point the probabilities of all people in the same group as i will exceed 1 by this amount. Therefore, the n<1 of the two algorithms is the same. The offline algorithm runs the loop for all members of all groups that have individuals with qi > 1. The online version aggregates the excess mass from all ng members of group g and redistributes it all at once instead of running separate iterations for every member of the group as the offline does. No extra mass is added after the initialization but it is only moved from group to group. Hence, we obtain that at the end of this process qi = sg. Case 3:Pn i=1 ˆsi > k. Similar to case 2, if candidate i is a member of group g, then qi = sg. Those who were rejected have si = 0 and ni = 0. As a result, we see that for any person i, si = niqi. From this we can infer that i g qi = k. As new people are added to the groups, the group probabilities sg become smaller in order for the sum of probabilities ngsg to be equal to k. Therefore, the maximum number v of people that can be represented by a candidate in a given group either stays the same or increases after every iteration. By Lemma 2.2, we obtain that the rounding process maintains the expected value of the number of people each candidate represents equal to their initial value. Since every person begins by representing only themselves, we have that for the i-th candidate E[ni] = 1. Finally, we obtain E[ˆsi] = E[niqi] = E[ni]qi = qi, because the calculation of sgs is deterministic. The final rounding procedure makes the final decisions and outputs 0 if the candidate is rejected and 1 if the candidate is selected. Due to properties 1 and 2, the probability of candidate i being selected by the online algorithm is qi. Thus, Algorithm 2 and the offline algorithm with input scores rounded to multiples of ε have the same selection probabilities. By Lemma 3.1, Algorithm 2 preserves fairness ε-approximately and achieves linear utility Pn i=1 pisi Pn i=1 p i si k(ε + 2 ε), where p 1, . . . , p n is the optimal solution for the offline fairness-preserving problem for input s1, . . . , sn. Because of the online probability adjustments, we have that for n k the sum of all the probabilities is equal to k at the end of each loop. Therefore, we have Pm g=0 ngsg = k. If sf > 0, each person can represent at most 1 sg candidates. By Lemma 2, the number of representatives per group is at most sg + 1 2ngsg + 1, sg 2ngsg. If we sum up the number of representatives for all groups we obtain g=0 (2ngsg + 1) = 2k + 1 This completes the proof. B.5 Proof of Corollary 3.2 Corollary 3.2. If C is individually fair, then Algorithm 2 is ε-individually fair. Proof. If the individual probabilities of selection by C are individually fair with respect to a metric D, we obtain that i, j [n] |pi pj| |si sj| + ε D(i, j) + ε. Published in Transactions on Machine Learning Research (09/2023) C Proofs from Section 4 C.1 Proof of Lemma 4.1 Lemma 4.1. If sum < k, Algorithm 3 increases each input score si to a final output pi = si + αi si such that all of the candidates j with pj < 1 receive the same cumulative adjustment value αj = v. Proof. First we observe that pi si, i. At each step of the algorithm the value of pi either increases, or is clipped at 1. Since si [0, 1], i, we therefore know that pi si. Algorithm 3 initially adds a constant amount c to every candidate, then repeatedly redistributes overflow evenly across all candidates with si+αi < 1. We prove by induction that for all candidates i with pi < 1, their increment αi are the same. Consider two candidates i, j such that their final pi, pj < 1. As argued above, both pi, pj are smaller than 1 throughout the execution of the algorithm. At the beginning, αi = αj = c so the claim holds. Given that these candidates are equal at a certain step d, we also know they will remain equal at the next step d + 1. By assumption, neither candidate reaches 1 on either of these steps; therefore they will both receive an adjustment. When an adjustment is made, all candidates are affected equally. C.2 Proof of Theorem 4.1 Theorem 4.1. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 3 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn that achieve the optimal value of ratio utility mini pi si . Proof. We first show that Algorithm 3 either preserves or decreases the difference between any pair of individuals. Let pi denote the probability that candidate i is selected by Algorithm 3. Thus we must show, for all i, j, |pi pj| |si sj|. We apply Algorithm 3 to a set of candidates and obtain two disjoint sets of output candidates: let T(=1) be the set of candidates whose output pi = 1, and let T(<1) be the set of candidates whose output pi < 1. Let Tall be the union of these two disjoint sets. The algorithm behaves differently depending on the value of P 1. First, consider when P i si < k. Let αi be the cumulative adjustment received by candidate i during the algorithm. The adjustments are exactly chosen such that P i(si + αi) = k. Then we apply round-pr, where we know by Lemma 2.2 that the probability of being selected is preserved during rounding Pr[round-pr selects i] = pi = si + αi We thus need to cover three cases: (a) First, we consider two elements si, sj T(=1), |pi pj| = |1 1| |si sj|. (b) Next, we have si, sj T(<1) By Lemma 4.1, these candidates all receive the same adjustment. Let this adjustment be called b, |pi pj| = |si + b (sj + b)| = |si sj|. (c) Finally we have si T(=1), sj T(<1). Note that αi αj, and si > sj. We see that |pi pj| = |si + αi (sj + αj)| = |si sj + αi αj| |si sj|. 2. Next, consider P i si k. Here we know k P i si 1. By Lemma 2.2, pi = k P l sl si. Then, for candidates i and j, we have: |pi pj| = k P l sl si k P l sl (si sj) |si sj| Thus we see that |si sj| |pi pj|. We next show that among fairness-preserving solutions to the cohort selection problem, Algorithm 3 achieves the maximum value of mini pi si . Let T(<1) and T(=1) be defined as above, and note that T(=1) may be empty. Consider for contradiction an arbitrary algorithm A with probabilities of selection p i which also preserves fairness while achieving better utility: mini p i si > mini pi si . Published in Transactions on Machine Learning Research (09/2023) 1. First, consider the case where P i si < k and T(=1) is nonempty. Let individual j have the most extreme ratio: pj sj = mini pi si . Consider how to make this ratio minimal. For all individuals in T(<1), increasing si will reduce this ratio: pi si = si+b si . If any individuals j are in T(=1), they will have received some adjustment αj b, and thus they will have an even smaller ratio. Therefore, we know that individual j must have sj si, i and be a member of T(=1), and we have that for Algorithm A, mini pi si = pj sj = 1 sj . By our definition of A , there is some potentially different most extreme element l providing better utility: p l sl = mini p i si > pj (a) If sl = sj, the largest minimum ratio p l sl that A can achieve is 1 sl = 1 sj , which is the same ratio as Algorithm A and contradicts our assumption. (b) If sl = sj, we know sl < sj, since we already know that sj is at least as big as all other elements. Since pj is 1, we know that pj sj p j sj > p l sl . This contradicts our assumption that A has greater utility than A. 2. Next, consider the case where P i si < k and T(=1) group is empty. Again, let element j be the most extreme element for algorithm A, i.e. mini pi si = pj sj . Consider A on its most extreme element l constructed as in Item 1. (a) This time, if sl = sj, we may have p l > pl, achieving greater utility. To preserve the distances, if A adjusts element l by a certain amount, it must adjust all smaller elements by at least the same amount. However, we know that P i pi = k, so this required extra adjustment will mean that P i p i > k, and A will not be selecting exactly k candidates. (b) If sl < sj, then person j has the largest score, and similar to (2a), if p j > pj we see that P i p i > k. If p j pj we have the contradiction in (1b). 3. Finally, consider the case where P i si k. In Algorithm 3, all elements will be multiplied by a factor of k P i si , and then by Lemma 2.2, any element will satisfy pi = k P l sl si, so we have mini pi/si = mini i si Notice that in this case, all elements are adjusted by the same constant ratio. Using extreme element l for algorithm A as constructed previously, and again supposing that this element is more extreme than any element in algorithm A, we derive that p l > sl k P i si . Since element l was the minimum ratio for algorithm A , it follows that for all i, . Taking the sum on both sides we see P = k. As before, the number of elements chosen by algorithm A will be more than k, which is a contradiction. Thus we conclude that A cannot exist. C.3 Proof of Corollary 4.1 Corollary 4.1. If C is individually fair, then Algorithm 3 is individually fair. Proof. The proof is analogous to that of Corollary 3.1. C.4 Proof of Theorem 4.2 Theorem 4.2. For any classifier C that assigns scores s1, . . . , sn to n candidates, Algorithm 4 solves the fairness-preserving cohort selection problem by selecting k candidates with marginal probabilities p1, . . . , pn that achieve the optimal value of ratio utility. The algorithm leaves no more than O(k) candidates pending at any time. Published in Transactions on Machine Learning Research (09/2023) Proof. We first show that, for a list of candidate scores s1, . . . , sn, the selection probability given by the offline Algorithm 3 for any candidate i is the same as that given by Algorithm 4, and therefore provides an optimal utility solution to the problem. Let Algorithm 4 be denoted A and let A have probability pi of selecting candidate i. Let Algorithm 3 be denoted B with probability qi selecting candidate i. A never accepts candidates until the end of the stream, thus there are two major cases. 1. The stream ends with sum k, and A selects all candidates in pending. A candidate i is added to pending in one of three ways: (a) Candidate i was in top when the sum reached k. At each round after the sum exceeded k, they must survive round-pr, which by Lemma 2.2 always preserves their marginal probability. Therefore we only need to consider the effect of the incremental scaling adjustments. First, let scalet, sumt, and incrt, represent the values of these variables at some step t when the sum first exceeds k. We initialized scalet k sumt . Consider the value of scalet+1: scalet+1 = incrt scalet = sumt+1 st+1 sumt+1 scalet = sumt sumt+1 k sumt = k sumt+1 Thus, we can see by induction that for all time steps t, scalet = k sumt . Furthermore, we can express the final value of a single element si, which by definition is at position i in the stream, as: pi = si scalei t=i+1 incrt = si k sumi = si k sumi sumt = si k sumn Thus we see that the final pi is adjusted exactly the same way as it would be in the offline case. (b) Candidate i was already in rest or arrived at the moment when the sum reached k. For each iteration, they must survive round-pr, which respects their marginal by Lemma 2.2. From that point forward, they are part of pending and treated the same as in 1a. (c) Candidate i was encountered when sum k. In this case, they are treated the same as in part 1a above. In either case, when A ends, a subset of candidates are selected from pending using round-pr, which again preserves their marginal probability. Thus pi = qi in these cases. 2. The stream ends with sum < k. A has three sets at this point: top (the greatest k/α elements of the stream), rest (at most k/α elements rounded to 1 α) and rand (a randomly selected group of k candidates, disjoint from top and rest). No acceptances are made until the stream ends. The top candidates are placed into top, and elements are added to rest via round-pr, which preserves original marginal probabilities exactly by Lemma 2.2. The only change in probabilities then comes from the additive adjustments made when the stream ends. The main difference between the additive adjustments here and the waterfilling behavior in B is how we treat the people in rand. For the purpose of proof, consider a conceptual algorithm A that behaves exactly the same as A, except instead of rand it has zeros, a list of all candidates outside of top and rest. When this conceptual algorithm ends, all the mass from water-filling in zeros will be collected into a set of k random candidates, which becomes the equivalent of the list rand in A. In more detail, at the end of A , each member of top, rest, and zeros is given k sum n , and any overflow is redistributed using water-filling. Any candidate i now has the same probability of selection by both A and B: si + αi. A then performs a final round-pr step to select the outputs. Thus, in A , top, rest, and zeros are given the same treatment they would receive in B. Before the final step of selecting candidates by round-pr, A collects all of the mass accumulated in zeros Published in Transactions on Machine Learning Research (09/2023) from water-filling into a random subset of k members from zeros. Thus the only difference between A and A is that, A first kept everyone and later uniformly sampled a subset of k people, whereas A immediately keeps a random subset of k. This subsampling step does not change the marginal probabilities of any element in zeros; their probability before and after collecting the mass of zeros At the end, A selects candidates using round-pr on the entire set top rest rand, which does not affect the adjusted marginals si + αi. Thus pi = si + αi = qi i. Next, we show that the algorithm keeps at most O(k) candidates pending. Any candidate not in one of: top, rest, rand, pending is considered rejected. The sizes of top and rand are explicitly bounded by k/α . rest is not bounded in size explicitly, but note that the set is maintained by constantly applying round-pr({pi}i rest, 1 α). After rounding, at most 1 person in rest has a value < (1 α). If we ever have more than k/(1 α) elements, we have sum k/(1 α) (1 α) > k. This goes to the sum k case, which no longer adds elements to rest, thus it is bounded in length by k/(1 α) . pending is created initially by adding top and rest ( k/(1 α) + k/α pending), but for subsequent steps is never longer than k by Lemma 2.2. Thus the worst we can do is remain under a sum of k for the entire stream. top, rest, and rand will all be kept pending until the end of the stream, but still we have at most k/α + k/(1 α) + k/α = O(k) candidates pending. C.5 Proof of Corollary 4.2 Corollary 4.2. If C is individually fair, then Algorithm 4 is individually fair. Proof. The proof is analogous to that of Corollary 4.1.