# Doubly Constrained Fair Clustering

John Dickerson1,2, Seyed A. Esmaeili3, Jamie Morgenstern4, and Claire Jie Zhang4
1University of Maryland, College Park  2Arthur  3Simons Laufer Mathematical Sciences Institute  4University of Washington

The remarkable attention which fair clustering has received in the last few years has resulted in a significant number of different notions of fairness. Despite the fact that these notions are well-justified, they are often motivated and studied in a disjoint manner, where one fairness desideratum is considered in isolation from the others. This leaves the understanding of the relations between different fairness notions as an important open problem in fair clustering. In this paper, we take the first step in this direction. Specifically, we consider the two most prominent demographic representation fairness notions in clustering: (1) Group Fairness (GF), where the different demographic groups are supposed to have close to population-level representation in each cluster, and (2) Diversity in Center Selection (DS), where the selected centers are supposed to have close to population-level representation of each group. We show that given a constant-approximation algorithm for one constraint (GF or DS alone) we can obtain a constant-approximation solution that satisfies both constraints simultaneously. Interestingly, we prove that any given solution that satisfies the GF constraint can always be post-processed, at a bounded degradation of the clustering cost, to additionally satisfy the DS constraint, while the reverse is not true for a solution that satisfies DS instead. Furthermore, we show that both GF and DS are incompatible (having an empty feasibility set in the worst case) with a collection of other distance-based fairness notions. Finally, we carry out experiments to validate our theoretical findings.
37th Conference on Neural Information Processing Systems (NeurIPS 2023).

1 Introduction

The deployment of algorithms in consequential settings, from selecting interview candidates during hiring, to risk assessment in criminal justice systems, to deciding where public resources should be allocated [39, 28, 11, 18, 12], has led to an explosion of interest in developing classification and regression algorithms designed to make equitable predictions. More recently, these questions have been extended to unsupervised settings, with the recognition that algorithms which behave as subroutines may have downstream impacts on the ability of a system to make equitable decisions. For example, while personalized advertising seems to be a much lower-risk application than those mentioned above, the advertisements in question might pertain to lending, employment, or housing. For this reason, understanding the impact of unsupervised data pre-processing, including dimensionality reduction and clustering, has been of recent interest, despite the fact that many mathematical operationalizations of fairness do not behave nicely under composition [19].

Clustering is a canonical problem in unsupervised learning and arguably the most fundamental. It is also among the classical problems in operations research and is used heavily in facility location as well as customer segmentation. As a result, fairness in clustering has also been well-studied in recent years [8]. Much of this work has identified that existing clustering algorithms fail to satisfy various notions of fairness, and has introduced special-purpose algorithms whose outcomes do conform to these definitions. As in supervised learning, a list of mathematical constraints has been introduced as notions of fairness for clustering (more than seven different constraints so far). For example, Chierichetti et al.
[17] show algorithms that are guaranteed to produce clustering outputs that prevent the under-representation of a demographic group in a given cluster, hence complying with the disparate impact doctrine [22]. Brubach et al. [13] show algorithms that bound the probability of assigning two nearby points to different clusters, thereby guaranteeing a measure of community cohesion. Other algorithms for well-motivated fairness notions have also been introduced, such as minimizing the maximum clustering cost per population among the groups [24, 1], assigning distance-to-center values that are equitable [14], and having proportional demographic representation in the chosen cluster centers [32]. These constraints and others may be well-justified in a variety of specific application domains, and which are most appropriate will almost certainly depend on the particular application at hand.

The dominant approach in the literature has imposed only one constraint in a given setting, though some applications of clustering (which are upstream of many possible tasks) might naturally force us to reconcile these different constraints with one another. Ideally, one would desire a single clustering of the data which satisfies a collection of fairness notions instead of having different clusterings for different fairness notions. A similar question was investigated in fair classification [31, 18], where it was shown that unless the given classification instance satisfies restrictive conditions, the two desired fairness objectives of calibration and balance cannot be simultaneously satisfied. One would expect such a guarantee to also hold in fair clustering, and for various constraints it can indeed be shown that they are at odds with one another. However, it is also worthwhile to ask whether some fair clustering constraints are more compatible with one another, and how one can satisfy several of them simultaneously.
Our Contributions: In this paper, we take a first step towards understanding this question. In particular, we consider two specific group fairness constraints: (1) GF: the group fair clustering constraint of Chierichetti et al. [17], which roughly states that clusters should have close to population-level proportions of each group, and (2) DS: the diversity in center selection constraint [32], which roughly states that the selected centers in the clustering should similarly include close to population-level proportions of each group. We note that although these two definitions are both concerned with group membership, the fact that they apply at different levels (clustered points vs. selected centers) makes the algorithms and guarantees applicable to one problem not directly applicable to the other, at least not in an obvious way. Further, both of these notions are motivated by disparate impact [22], which essentially states that different groups should receive the same treatment. Therefore, it is natural to consider the intersection of both definitions (GF+DS). We show that by post-processing any solution satisfying one constraint we can always satisfy the intersection of both constraints. More precisely, we show that an α-approximation algorithm for one constraint yields an approximation algorithm for the intersection of the constraints with only a constant-factor degradation of the approximation ratio α. Additionally, we study the degradation in the clustering cost and show that imposing DS on a GF solution leads to a bounded degradation of the clustering cost, while the reverse is not true. Moreover, we show that both GF and DS are incompatible (having an empty feasible set) with a set of distance-based fairness constraints that were introduced in the literature. Finally, we validate our findings experimentally. Due to space limits we refer the reader to Appendix C for the proofs as well as further details.
2 Related Work

Fairness in clustering: GF and DS. In the line of work on group-level fairness, Chierichetti et al. [17] defined balance in the case of two groups, requiring the proportion of a group in any cluster to resemble its proportion in the input data. They also proposed the method of creating fairlets and then running a generic clustering algorithm on the fairlets. Bera et al. [9] generalized the notion of balance to multiple groups, and considered, when centers are already chosen, how to assign points to centers so that each group has bounded representation in each cluster. Assuming only probabilistic group assignments, Esmaeili et al. [21] presented algorithms that guarantee the cluster outputs satisfy expected group ratio bounds. Also working with bounded representation, Esmaeili et al. [20] showed how to minimize additive violations of the fairness constraints while ensuring the clustering cost is within a given upper bound. Beyond center-based clustering, the group fairness constraint has also been applied to spectral clustering [33]; recent work due to Wang et al. [41] speeds up computation in this setting. Ahmadi et al. [2] applied group fairness to correlation clustering, and as a recent follow-up, Ahmadian and Negahbani [4] generalized the setting and improved the approximation guarantee. Ahmadian et al. [6] and Chhabra and Mohapatra [16] introduced group fairness in the context of hierarchical clustering, with work due to Knittel et al. [35] proposing an algorithm with an improved approximation factor for optimizing cost in the fairness setting. Diversity in center selection was first studied by Kleindessner et al. [32] for data summarization tasks. The authors presented an algorithm that solves the fair k-center problem with an approximation factor that is exponential in the number of groups and a running time that is linear in the number of input points. A follow-up work by Jones et al.
[29] improved the approximation factor to a constant while keeping the running time linear in the number of input points. Concerned with both the diversity of the selected centers and the distortion of the summary, Angelidakis et al. [7] proposed an approximation algorithm for the k-center problem that ensures the number of points assigned to each center is lower bounded. Recent work due to Nguyen et al. [37] generalized the problem to requiring the group representation in the centers to fall in a desired range, which is the setting we use in this work.

Other fairness considerations in clustering. Besides the fairness notions already mentioned, other individual fairness notions include requiring individual points to stay close to points that are similar to themselves in the output clusters, such as in [34, 13]. Proportionally fair clustering is based on points forming coalitions [15]. Another notion based on individual fairness states that a point should have a center within a distance R if there are at least n/k points within distance R of it [36, 30]. Newer fairness notions for clustering problems were introduced recently in Ahmadi et al. [3] and Gupta and Dukkipati [26]. For the interested reader, we recommend a recent overview due to Awasthi et al. [8] for exhaustive references and an accessible overview of fair clustering research.

3 Preliminaries and Symbols

We are given a set of points C along with a metric distance d(·, ·). The set of chosen centers is denoted by S ⊆ C and the assignment function (assigning points to centers) is ϕ : C → S. We are concerned with k-center clustering, which minimizes the maximum distance between a point and its assigned center. Formally, we have:

min_{S : |S| ≤ k, ϕ} max_{j ∈ C} d(j, ϕ(j))    (1)

In the absence of constraints, the assignment function ϕ(·) is trivial since the optimal assignment is to have each point assigned to its nearest center. However, when the clustering problem has constraints this is generally not the case.
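As a concrete reference point for objective (1), the classical greedy farthest-point heuristic of Gonzalez [25] gives a 2-approximation for the unconstrained problem (it reappears as the COLOR-BLIND baseline in Section 7). A minimal sketch, with illustrative names and plain Euclidean points (not the paper's implementation):

```python
import math

def kcenter_gonzalez(points, k):
    """Greedy farthest-point heuristic: a 2-approximation for
    the unconstrained k-center objective (1)."""
    centers = [points[0]]  # arbitrary first center
    while len(centers) < k:
        # open the point farthest from its nearest already-open center
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    # k-center cost: max distance from any point to its nearest center
    cost = max(min(math.dist(p, c) for c in centers) for p in points)
    return centers, cost
```

For instance, on the collinear points (0,0), (1,0), (2,0), (10,0) with k = 2, the heuristic opens (0,0) and (10,0) and reports cost 2.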
In fair clustering, each point j ∈ C is assigned a color by a function χ(j) = h ∈ H to indicate its demographic group, where H is the set of all colors. For simplicity, we assume that each point has one group associated with it and that the total number of colors m = |H| is a constant. Moreover, the set of points with color h is denoted by C_h. The total number of points is n = |C| and the total number of points of color h is n_h = |C_h|. It follows that the proportion of color h is r_h = n_h/n. Finally, given a clustering (S, ϕ), we denote the set of points in the i-th cluster by C_i and its subset of color h by C_i^h = C_i ∩ C_h. We now formally introduce the group fair clustering (GF) and the diverse center selection (DS) problems:

Group Fair Clustering [17, 10, 9, 21, 5]: Minimize objective (1) subject to proportional demographic representation in each cluster. Specifically, ∀i ∈ S, ∀h ∈ H : β_h|C_i| ≤ |C_i^h| ≤ α_h|C_i|, where β_h and α_h are pre-set lower and upper bounds on the representation of color h in a given cluster.

Diverse Center Selection [32, 29, 37]: Minimize objective (1) subject to the set of centers S satisfying demographic representation. Specifically, denoting the number of centers from demographic (color) h by k_h = |S ∩ C_h|, then as done in [37] it must satisfy k_h^l ≤ k_h ≤ k_h^u, where k_h^l and k_h^u are lower and upper bounds set for the number of centers of color h, respectively.

Importantly, throughout we have ∀h ∈ H : β_h > 0. Further, for GF we consider solutions that could violate the constraints, as done in the literature [9, 10]. Specifically, a given solution (S, ϕ) has an additive GF violation of ρ if ρ is the smallest number such that the following holds: ∀i ∈ S, ∀h ∈ H : β_h|C_i| − ρ ≤ |C_i^h| ≤ α_h|C_i| + ρ. We denote the problem of minimizing the k-center objective while satisfying both the GF and DS constraints as GF+DS.

Why Consider GF and DS in Particular? There are two reasons to consider the GF and DS constraints in particular.
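Both definitions are straightforward to audit for a given clustering. The following sketch (function and variable names are ours, not from the paper's code) computes the additive GF violation ρ of a clustering and checks the DS bounds on the selected centers:

```python
from collections import Counter

def gf_additive_violation(clusters, color, beta, alpha):
    """Smallest rho with beta_h*|C_i| - rho <= |C_i^h| <= alpha_h*|C_i| + rho
    for every cluster i and every color h."""
    rho = 0.0
    for pts in clusters:  # clusters: list of point lists
        counts = Counter(color[p] for p in pts)
        for h in beta:    # beta/alpha: per-color lower/upper fractions
            c_h = counts.get(h, 0)
            rho = max(rho, beta[h] * len(pts) - c_h, c_h - alpha[h] * len(pts))
    return rho

def ds_satisfied(centers, color, k_lo, k_hi):
    """Check k_h^l <= |S intersected with C_h| <= k_h^u for every color h."""
    s = Counter(color[c] for c in centers)
    return all(k_lo[h] <= s.get(h, 0) <= k_hi[h] for h in k_lo)
```

For two balanced clusters of one red and one blue point each with β_h = α_h = 1/2, the additive violation is 0, and a center set with one red and one blue center satisfies DS bounds of k_h^l = k_h^u = 1.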
First, from the point of view of the application, both GF and DS are concerned with demographic (group) fairness. Further, they are both specifically focused on the representation of the groups, i.e., the proportions of the groups (colors) in the clusters for GF and in the selected centers for DS. Second, they are both distance-agnostic, i.e., given a clustering solution one can decide whether it satisfies the GF or DS constraints without having access to the distances between the points.

4 Algorithms for GF+DS

4.1 Active Centers

We start by observing that if we want to satisfy both GF and DS simultaneously, then we should make sure that all centers are active (have non-empty clusters). More precisely, given a solution (S, ϕ), the DS constraints should be satisfied and, further, ∀i ∈ S : |C_i| > 0, i.e., every center in S should have some point assigned to it and therefore not form an empty cluster. The following example clarifies this:

Figure 1: In this graph the distance between the points is defined as the path distance.

Example: Consider Figure 1. Suppose we have k = 2 and we wish to satisfy the GF and DS constraints with equal red and blue representation. DS requires one blue and one red center. Further, each cluster should have |C_i^blue| = |C_i^red| = ½|C_i| to satisfy GF. Consider the following solution: S1 = {2, 4} with ϕ1 assigning all points, including point 4, to point 2. This satisfies GF and DS: we have one blue center and one red center, and the cluster of center 4 has no points, so 0 = |C_i^blue| = |C_i^red| = ½|C_i| holds vacuously. Another solution would have S2 = S1 = {2, 4} but with ϕ2 assigning points 2 and 3 to center 2 and points 1 and 4 to center 4. This would also clearly satisfy the GF and DS constraints. There is a clear issue with the first solution: although center 4 is included in the selection, it has no points assigned to it (it is an empty cluster), which makes it functionally non-existent.
This is why the definition should only count active centers. This issue of active centers did not appear before in DS [32, 37]; the reason is that it is trivially satisfied when considering only the DS constraint, since each center is assigned all the points closest to it. This implies that each center is at least assigned to itself, so all centers in a DS solution are active. However, we cannot simply assign each point to its closest center when the GF constraints are additionally imposed, as the colors of the points have to satisfy the upper and lower proportion bounds of GF.

4.2 The DIVIDE Subroutine

Here we introduce the DIVIDE subroutine (Algorithm 1), which is used in Subsections 4.3 and 4.4 in algorithms for converting solutions that only satisfy DS or GF into solutions that satisfy GF+DS. DIVIDE takes a set of points C (which is supposed to be a single cluster) with center i, along with a subset of chosen points Q ⊆ C. The entire set of points is then divided among the points of Q, forming |Q| many new non-empty (active) clusters. Importantly, the points of each color are divided among the new centers in Q so that the additive violation increases by at most 2. See Figure 2 for an intuitive illustration. Here we use the symbol q to index a point in the set Q; importantly, the numbering starts with 0 and ends with |Q| − 1. We prove the following about DIVIDE:

Figure 2: Illustration of the DIVIDE subroutine.

Algorithm 1 DIVIDE
1: Input: Set of points C with center i ∈ C, subset of points Q ⊆ C of cardinality |Q|.
2: Output: An assignment function ϕ : C → Q.
3: if |Q| = 1 then
4:   Assign all points of C to the single center in Q.
5: else
6:   Set firstIndex = 0.
7:   for h ∈ H do
8:     Set T_h = |C_h|/|Q|, b_h = |C_h| − |Q| · ⌊T_h⌋, count = 0.
9:     Set q = firstIndex.
10:    while count ≤ |Q| − 1 do
11:      if b_h > 0 then
12:        Assign ⌈T_h⌉ many points of color h in C to center q.
13:        Update b_h = b_h − 1.
14:        Update firstIndex = (firstIndex + 1) mod |Q|.
15:      else
16:        Assign ⌊T_h⌋ many points of color h in C to center q.
17:      end if
18:      Update q = (q + 1) mod |Q|, count = count + 1.
19:    end while
20:  end for
21: end if

Lemma 1. Given a non-empty cluster C with center i and radius R that satisfies the GF constraints at an additive violation of ρ, and a subset of points Q ⊆ C, the clustering (Q, ϕ) where ϕ = DIVIDE(C, Q) has the following properties: (1) the GF constraints are satisfied at an additive violation of at most ⌈ρ/|Q|⌉ + 2; (2) every center in Q is active; (3) the clustering cost is at most 2R. If |Q| = 1, then guarantee (1) improves: the additive violation is at most ρ.

4.3 Solving GF+DS using a DS Algorithm

Here we show an algorithm that gives a bounded approximation for GF+DS using an approximation algorithm for DS. Algorithm 2 works by first calling an α_DS-approximation algorithm, resulting in a solution (S̃, ϕ̃) that satisfies the DS constraints; it then solves an assignment problem using the ASSIGNMENTGF algorithm (shown in Algorithm 3) where points are routed to the centers S̃ to satisfy the GF constraint. The issue is that some of the centers in S̃ may become closed, and as a result the solution may no longer satisfy the DS constraints. Therefore, we have a final step where more centers are opened using the DIVIDE subroutine to satisfy the DS constraints while still satisfying the GF constraints at an additive violation and with a bounded increase to the clustering cost. ASSIGNMENTGF works by solving a linear program (2) to find a clustering which ensures that (1) each cluster has at least a β_h fraction and at most an α_h fraction of its points belonging to color h, and (2) each point is assigned to a center within the minimum possible distance R.
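As an illustration of this LP step, the following sketch checks feasibility of the fractional assignment LP for one fixed radius R using `scipy.optimize.linprog`; the paper's experiments use CPLEX, and all names here are ours. A binary search over candidate radii would then find the smallest feasible R:

```python
import numpy as np
from scipy.optimize import linprog

def lp_gf_feasible(points, centers, color, beta, alpha, d, R):
    """Feasibility of the fractional GF assignment LP at radius R:
    each cluster's color-h mass lies between beta_h and alpha_h
    fractions of its total mass, and every point is fully assigned."""
    C, S = len(points), len(centers)
    idx = lambda i, j: i * C + j  # flatten x_ij into one variable vector
    # bounds: x_ij in [0, 1], forced to 0 when d(i, j) > R
    bounds = [(0.0, 1.0) if d(centers[i], points[j]) <= R else (0.0, 0.0)
              for i in range(S) for j in range(C)]
    A_ub, b_ub = [], []
    for i in range(S):
        for h in beta:  # two inequality rows per (color, center) pair
            lo, hi = np.zeros(S * C), np.zeros(S * C)
            for j in range(C):
                in_h = 1.0 if color[j] == h else 0.0
                lo[idx(i, j)] = beta[h] - in_h   # beta_h * mass - h-mass <= 0
                hi[idx(i, j)] = in_h - alpha[h]  # h-mass - alpha_h * mass <= 0
            A_ub += [lo, hi]
            b_ub += [0.0, 0.0]
    A_eq = np.zeros((C, S * C))  # each point assigned with total weight 1
    for j in range(C):
        for i in range(S):
            A_eq[j, idx(i, j)] = 1.0
    res = linprog(np.zeros(S * C), A_ub=np.vstack(A_ub), b_ub=b_ub,
                  A_eq=A_eq, b_eq=np.ones(C), bounds=bounds)
    return res.status == 0  # status 0: feasible optimum found
```

On two well-separated red/blue pairs on a line with β_h = α_h = 1/2, the LP is feasible at R = 1 but infeasible at R = 0.5, which is exactly what the binary search in ASSIGNMENTGF exploits.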
While the resulting LP solution could be fractional, the last step of ASSIGNMENTGF uses MAXFLOWGF, an algorithm for rounding an LP solution to a valid integral assignment at a bounded degradation of the GF guarantees and no increase to the clustering cost. See Appendix B for details on MAXFLOWGF and its guarantees.

Algorithm 2 DSTOGF+DS
1: Input: Points C, a solution (S̃, ϕ̃) with clusters {C̃_1, . . . , C̃_k̃} satisfying the DS constraints with |S̃| = k̃ ≤ k, obtained by an algorithm of approximation ratio α_DS for the DS clustering problem.
2: Output: Solution (S, ϕ) satisfying the GF and DS constraints simultaneously.
3: (S′, ϕ′) = ASSIGNMENTGF(S̃, C).
4: Update the set of centers S′ by deleting all non-active centers (which have no points assigned to them). Let {C′_1, . . . , C′_k′} be the (non-empty) clusters of the solution (S′, ϕ′) with |S′| = k′ ≤ k̃.
5: Set ∀h ∈ H : s_h = |S′ ∩ C_h|. Set ∀i ∈ S′ : Q_i = {i}.
6: while ∃h ∈ H such that s_h < k_h^l do
7:   Pick a color h0 such that s_h0 < k_h0^l.
8:   Pick a center i ∈ S′ whose cluster contains a point of color h0.
9:   Pick a point j_h0 of color h0 in cluster C′_i.
10:  Set Q_i = Q_i ∪ {j_h0}.
11:  Update s_h0 = s_h0 + 1.
12: end while
13: for i ∈ S′ do
14:   ϕ_i = DIVIDE(C′_i, Q_i).
15:   ∀j ∈ C′_i : Set ϕ(j) = ϕ_i(j).
16: end for
17: Set S = ∪_{i ∈ S′} Q_i.

Algorithm 3 ASSIGNMENTGF
1: Input: Set of centers S, set of points C.
2: Output: An assignment function ϕ : C → S.
3: Using binary search over the distance matrix, find the smallest radius R such that LP(C, S, R) in (2) is feasible, and call the solution x*.
4: Solve MAXFLOWGF(x*, C, S) and call the solution x̄.

LP(C, S, R):
∀j ∈ C, i ∈ S : x_ij = 0 if d(i, j) > R    (2a)
∀h ∈ H, i ∈ S : β_h Σ_{j ∈ C} x_ij ≤ Σ_{j ∈ C_h} x_ij ≤ α_h Σ_{j ∈ C} x_ij    (2b)
∀j ∈ C : Σ_{i ∈ S} x_ij = 1    (2c)
∀j ∈ C, i ∈ S : x_ij ∈ [0, 1]    (2d)

To establish the guarantees we start with the following lemma:

Lemma 2.
Solution (S′, ϕ′) of line 3 in Algorithm 2 has the following properties: (1) it satisfies the GF constraint at an additive violation of 2; (2) it has a clustering cost of at most (1 + α_DS) R*_GF+DS, where R*_GF+DS is the clustering cost (radius) of the optimal solution for GF+DS; (3) the set of centers S′ is a subset (possibly a proper subset) of the set of centers S̃, i.e., S′ ⊆ S̃.

Theorem 4.1. Given an α_DS-approximation algorithm for the DS problem, we can obtain a 2(1 + α_DS)-approximation algorithm that satisfies GF at an additive violation of 3 and satisfies DS simultaneously.

Remark: If in Algorithm 2 no center is deleted in line 4 because it forms an empty cluster, then by Lemma 2 the approximation ratio is 1 + α_DS, an improvement by a factor of 2. Further, the additive violation for GF is reduced from 3 to 2.

4.4 Solving GF+DS using a GF Solution

Algorithm 4 GFTOGF+DS
1: Input: Points C, a solution (S̃, ϕ̃) with clusters {C̃_1, . . . , C̃_k̃} satisfying the GF constraints with |S̃| = k̃ ≤ k.
2: Output: Solution (S, ϕ) satisfying the GF and DS constraints simultaneously.
3: Initialize ∀h ∈ H : s_h = 0 and ∀i ∈ S̃ : Q_i = {}.
4: for i ∈ S̃ do
5:   if ∃h ∈ H : s_h < k_h^l then
6:     Let h0 be a color such that s_h0 < k_h0^l.
7:   else
8:     Pick h0 such that s_h0 + 1 ≤ k_h0^u.
9:   end if
10:  Pick a point j_h0 of color h0 in cluster C̃_i.
11:  Set Q_i = {j_h0}.
12:  Update s_h0 = s_h0 + 1.
13: end for
14: while ∃h ∈ H : s_h < k_h^l do
15:   Pick a color h0 such that s_h0 < k_h0^l.
16:   Pick a center i ∈ S̃ with cluster C̃_i that contains a point of color h0 not in Q_i.
17:   Pick such a point j_h0 of color h0 in cluster C̃_i.
18:   Set Q_i = Q_i ∪ {j_h0}.
19:   Update s_h0 = s_h0 + 1.
20: end while
21: Set S = ∪_{i ∈ S̃} Q_i.
22: for i ∈ S̃ do
23:   ϕ_i = DIVIDE(C̃_i, Q_i).
24:   ∀j ∈ C̃_i : Set ϕ(j) = ϕ_i(j). {Assignment to center is updated using DIVIDE.}
25: end for

Here we start with a solution (S̃, ϕ̃) of cost R̃ that satisfies the GF constraints and we want to make it satisfy GF and DS simultaneously.
More specifically, given any GF solution we show how it can be post-processed to satisfy GF+DS at a bounded increase to its clustering cost by a factor of 2 (see Theorem 4.2). This implies as a corollary that if we have an α_GF-approximation algorithm for GF, then we can obtain a 2α_GF-approximation algorithm for GF+DS (see Corollary 1). The algorithm essentially first covers each cluster C̃_i of the given solution (S̃, ϕ̃) by picking a point of some color h to be a future center, provided that picking a point of that color would not violate the DS constraints (lines 4–13). If there are still colors which do not have enough picked centers (below the lower bound k_h^l), then more points are picked from clusters where points of such colors exist (lines 14–20). Once the algorithm has picked the correct points for each color, the DIVIDE subroutine is called to divide each cluster among the picked points. Now we state the main theorem:

Theorem 4.2. If we have a solution (S̃, ϕ̃) of cost R̃ that satisfies the GF constraints, where the number of non-empty clusters is |S̃| = k̃ ≤ k, then we can obtain a solution (S, ϕ) that satisfies GF at an additive violation of 2 and DS simultaneously, with cost R ≤ 2R̃.

Corollary 1. Given an α_GF-approximation algorithm for GF, we can obtain a 2α_GF-approximation algorithm that satisfies GF at an additive violation of 2 and DS simultaneously.

Remark: If the given GF solution has k̃ = k clusters, then the output will have an additive violation of zero, i.e., it satisfies the GF constraints exactly. This happens because DIVIDE would always receive Q_i with |Q_i| = 1, and therefore we can use the guarantee of DIVIDE for the special case |Q| = 1.

Figure 3: The PoF relations between unconstrained, GF, DS, and GF+DS clustering.

5 Price of (Doubly) Fair Clustering

Here we study the degradation in the clustering cost (the price of fairness) that comes from imposing the fairness constraints on the clustering objective.
The price of fairness PoF_c is defined as PoF_c = (clustering cost subject to constraint c) / (clustering cost of the agnostic solution) [20, 9]. Since we have two constraints here, GF and DS, we also consider prices of fairness of the form PoF_{c1→c2} = (clustering cost subject to constraints c1 and c2) / (clustering cost subject to constraint c1), which equals the degradation in the clustering cost when we impose constraint c2 in addition to the already imposed constraint c1. Note that we are concerned with the price of fairness in the worst case. Interestingly, we find that imposing the DS constraint on top of the GF constraint leads to a bounded PoF if we allow an additive violation of 2 for GF, while the reverse is not true even if we allow an additive violation of Ω(n/k) for GF. We find the following:

Proposition 5.1. For any value of k ≥ 2, imposing GF can lead to an unbounded PoF even if we allow an additive violation of Ω(n/k).

Proposition 5.2. For any value of k ≥ 3, imposing DS can lead to an unbounded PoF.

Proposition 5.3. For any value of k ≥ 2, imposing GF on a solution that only satisfies DS can lead to an unbounded increase in the clustering cost even if we allow an additive violation of Ω(n/k).

Proposition 5.4. Imposing DS on a solution that only satisfies GF leads to a bounded increase in the clustering cost of at most 2 (PoF ≤ 2) if we allow an additive violation of 2 in the GF constraints.

6 Incompatibility with Other Distance-Based Fairness Constraints

In this section, we study the incompatibility between the DS and GF constraints and a family of distance-based fairness constraints. We note that the results of this section do not take the clustering cost into account and are based only on the feasibility set. That is, we consider more than one constraint simultaneously and ask whether the feasibility set is empty. Two constraints are considered incompatible if the intersection of their feasible sets is empty.
In some cases we also consider solutions that could violate the constraints. We present two main findings here and defer the proofs and further details to the appendix.¹

Theorem 6.1. For any value k ≥ 2, the fairness-in-your-neighborhood [30] and socially fair [1, 24] constraints are each incompatible with GF, even if we allow an additive violation of Ω(n/k) in the GF constraint. For any value k ≥ 5, the proportionally fair constraint [15] is incompatible with GF, even if we allow an additive violation of Ω(n/k) in the GF constraint.

Theorem 6.2. For any value k ≥ 3, the fairness-in-your-neighborhood [30], socially fair [1, 24], and proportionally fair [15] constraints are each incompatible with DS.

¹Note that socially fair clustering [1, 24] is defined as an optimization problem, not a constraint. However, it can be straightforwardly turned into a constraint; see the Appendix for full details.

7 Experiments

We use Python 3.9, the CPLEX package [38] for solving linear programs, and NetworkX [27] for max-flow rounding. Further, scikit-learn is used for some standard ML-related operations. We use commodity hardware, specifically a MacBook Pro with an Apple M2 chip. We conduct experiments over datasets from the UCI repository [23] to validate our theoretical findings. Specifically, we use the Adult dataset sub-sampled to 20,000 records. Gender is used for group membership, while the numeric entries are used to form a point (vector) for each record. We use the Euclidean distance. Further, for the GF constraints we set the lower and upper proportion bounds to β_h = (1 − δ)r_h and α_h = (1 + δ)r_h for each color h, where r_h is color h's proportion in the dataset, and we set δ = 0.2. For the DS constraints, since we do not deal with a large number of centers, we set k_h^l = 0.8 r_h k and k_h^u = r_h k. We compare the performance of 5 algorithms.
Specifically, we have: (1) COLOR-BLIND: an implementation of the Gonzalez k-center algorithm [25], which achieves a 2-approximation for the unconstrained k-center problem. (2) ALG-GF: a GF algorithm which follows the sketch of [9], with the final rounding step replaced by an implementation of the MAXFLOWGF rounding subroutine; this algorithm has a 3-approximation for the GF-constrained instance. (3) ALG-DS: an algorithm for the DS problem recently introduced by [37], which also has an approximation ratio of 3. (4) GFTOGFDS: an implementation of Algorithm 4, where we simply use the GF algorithm just mentioned to obtain a GF solution. (5) DSTOGFDS: similarly, an implementation of Algorithm 2, where the DS algorithm is used as a starting point instead.

Throughout, we measure the performance of the algorithms in terms of: (1) PoF: the price of fairness of the algorithm; note that we always calculate the price of fairness by dividing by the COLOR-BLIND clustering cost, since it solves the unconstrained problem. (2) GF-Violation: the maximum additive violation of the solution for the GF constraint, as defined before. (3) DS-Violation: the maximum under-representation or over-representation across all groups in the selected centers.

Figure 4: Adult dataset results: (a) PoF comparison of 5 algorithms, with COLOR-BLIND as baseline; (b) GF-Violation comparison; (c) DS-Violation comparison.

Figure 4 shows the behaviour of all 5 algorithms. In terms of PoF, all algorithms show a significant degradation in the clustering cost compared to the COLOR-BLIND baseline, except for ALG-DS. However, ALG-DS has a very large GF-Violation; in fact, the GF-Violation of ALG-DS can be more than 5 times that of COLOR-BLIND. This indicates that while ALG-DS has a small clustering cost, it can give very bad guarantees for the GF constraints.
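The reported metrics are mechanical to compute from a solution; the following sketch uses our own naming (PoF divides by the COLOR-BLIND cost, DS-Violation is the largest under- or over-representation of any group among the selected centers):

```python
from collections import Counter

def price_of_fairness(constrained_cost, colorblind_cost):
    """PoF: cost degradation relative to the unconstrained COLOR-BLIND cost."""
    return constrained_cost / colorblind_cost

def ds_violation(centers, color, k_lo, k_hi):
    """Max under- or over-representation of any group in the chosen centers,
    relative to the bounds k_h^l and k_h^u (0 means DS is satisfied)."""
    s = Counter(color[c] for c in centers)
    return max(max(k_lo[h] - s.get(h, 0), s.get(h, 0) - k_hi[h], 0)
               for h in k_lo)
```

For instance, a center set of three 'r' centers against bounds k_r ∈ [1, 2] and k_b ∈ [1, 2] has DS-Violation 1 (one excess 'r' center and one missing 'b' center).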
Finally, in terms of DS-Violation, we see that the ALG-GF and COLOR-BLIND solutions can violate the DS constraint; note that the two coincide exactly. Further, although the violation is 1, it is very significant since, unlike for the GF constraints, the number of centers can be very small. On the other hand, we see that both GFTOGFDS and DSTOGFDS give the best of both worlds, having small GF-Violation values and zero DS-Violation, and while their price of fairness can be significant, it is comparable to ALG-GF. Interestingly, GFTOGFDS and DSTOGFDS agree across all measures. This could be because our implementation of the GF part of DSTOGFDS (its handling of the GF constraints) has similarities to the GFTOGFDS algorithm. We show further experiments in the appendix.

References

[1] Mohsen Abbasi, Aditya Bhaskara, and Suresh Venkatasubramanian. Fair clustering via equitable group representations, 2020.
[2] Saba Ahmadi, Sainyam Galhotra, Barna Saha, and Roy Schwartz. Fair correlation clustering, 2020.
[3] Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, and Ali Vakilian. Individual preference stability for clustering. arXiv preprint arXiv:2207.03600, 2022.
[4] Sara Ahmadian and Maryam Negahbani. Improved approximation for fair correlation clustering. In International Conference on Artificial Intelligence and Statistics, pages 9499–9516. PMLR, 2023.
[5] Sara Ahmadian, Alessandro Epasto, Ravi Kumar, and Mohammad Mahdian. Clustering without over-representation. In International Conference on Knowledge Discovery and Data Mining, 2019.
[6] Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, and Yuyan Wang. Fair hierarchical clustering. In Neural Information Processing Systems, 2020.
[7] Haris Angelidakis, Adam Kurpisz, Leon Sering, and Rico Zenklusen.
Fair and fast k-center clustering for data summarization. In International Conference on Machine Learning, pages 669–702. PMLR, 2022. [8] Pranjal Awasthi, Brian Brubach, Deeparnab Chakrabarty, John P. Dickerson, Seyed Esmaeili, Matthäus Kleindessner, Marina Knittel, Jamie Morgenstern, Samira Samadi, Aravind Srinivasan, and Leonidas Tsepenekas. A tutorial and resources for fair clustering. https://www.fairclustering.com/. Accessed: 2023-05-16. [9] Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. In Neural Information Processing Systems, 2019. [10] Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the cost of essentially fair clusterings, 2019. [11] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1):3–44, 2021. [12] Anna Brown, Alexandra Chouldechova, Emily Putnam-Hornstein, Andrew Tobin, and Rhema Vaithianathan. Toward algorithmic accountability in public services: A qualitative study of affected community perspectives on algorithmic decision-making in child welfare services. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–12, 2019. [13] Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Samir Khuller, Aravind Srinivasan, and Leonidas Tsepenekas. A pairwise fair and community-preserving approach to k-center clustering, 2020. [14] Darshan Chakrabarti, John P. Dickerson, Seyed A. Esmaeili, Aravind Srinivasan, and Leonidas Tsepenekas. A new notion of individually fair clustering: α-equitable k-center. In International Conference on Artificial Intelligence and Statistics, pages 6387–6408. PMLR, 2022. [15] Xingyu Chen, Brandon Fain, Charles Lyu, and Kamesh Munagala. Proportionally fair clustering, 2019. [16] Anshuman Chhabra and Prasant Mohapatra.
Fair algorithms for hierarchical agglomerative clustering. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pages 206–211. IEEE, 2022. [17] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Neural Information Processing Systems, 2017. [18] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017. [19] Cynthia Dwork and Christina Ilvento. Fairness under composition. arXiv preprint arXiv:1806.06122, 2018. [20] Seyed Esmaeili, Brian Brubach, Aravind Srinivasan, and John Dickerson. Fair clustering under a bounded cost. Advances in Neural Information Processing Systems, 34:14345–14357, 2021. [21] Seyed A. Esmaeili, Brian Brubach, Leonidas Tsepenekas, and John P. Dickerson. Probabilistic fair clustering. In Neural Information Processing Systems, 2020. [22] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268, 2015. [23] A. Frank. UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml, 2010. [24] Mehrdad Ghadiri, Samira Samadi, and Santosh Vempala. Socially fair k-means clustering. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 438–448, 2021. [25] Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985. [26] Shubham Gupta and Ambedkar Dukkipati. Consistency of constrained spectral clustering under graph induced fair planted partitions. Advances in Neural Information Processing Systems, 35:13527–13540, 2022.
[27] Aric Hagberg, Dan Schult, Pieter Swart, D. Conway, L. Séguin-Charbonneau, C. Ellison, B. Edwards, and J. Torrents. NetworkX: High productivity software for complex networks. https://networkx.lanl.gov/wiki, 2013. [28] White House. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, 2016. [29] Matthew Jones, Huy Lê Nguyên, and Thy Nguyen. Fair k-centers via maximum matching. In International Conference on Machine Learning, 2020. [30] Christopher Jung, Sampath Kannan, and Neil Lutz. A center in your neighborhood: Fairness in facility location, 2019. [31] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016. [32] Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair k-center clustering for data summarization, 2019. [33] Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi, and Jamie Morgenstern. Guarantees for spectral clustering with fairness constraints. In International Conference on Machine Learning, pages 3458–3467. PMLR, 2019. [34] Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. A notion of individual fairness for clustering, 2020. [35] Marina Knittel, Max Springer, John P. Dickerson, and Mohammad Taghi Hajiaghayi. Generalized reductions: Making any hierarchical clustering fair and balanced with low cost. In International Conference on Machine Learning, 2023. [36] Sepideh Mahabadi and Ali Vakilian. Individual fairness for k-clustering. In International Conference on Machine Learning, 2020. [37] Huy Lê Nguyen, Thy Nguyen, and Matthew Jones. Fair range k-center. arXiv preprint arXiv:2207.11337, 2022. [38] Stefan Nickel, Claudius Steinhardt, Hans Schlenker, and Wolfgang Burkart. IBM ILOG CPLEX Optimization Studio: A primer.
In Decision Optimization with IBM ILOG CPLEX Optimization Studio: A Hands-On Introduction to Modeling with the Optimization Programming Language (OPL), pages 9–21. Springer, 2022. [39] Manish Raghavan, Solon Barocas, Jon Kleinberg, and Karen Levy. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 469–481, 2020. [40] Clemens Rösner and Melanie Schmidt. Privacy preserving clustering with constraints. arXiv preprint arXiv:1802.02497, 2018. [41] Ji Wang, Ding Lu, Ian Davidson, and Zhaojun Bai. Scalable spectral clustering with group fairness constraints. In International Conference on Artificial Intelligence and Statistics, pages 6613–6629. PMLR, 2023.

A Useful Facts and Lemmas

Fact A.1. Given non-negative numbers $a_1, a_2, \ldots, a_n$ and positive numbers $b_1, b_2, \ldots, b_n$, then:
$$\min_{i \in [n]} \frac{a_i}{b_i} \le \frac{\sum_{i \in [n]} a_i}{\sum_{i \in [n]} b_i} \le \max_{i \in [n]} \frac{a_i}{b_i}$$

B MAXFLOWGF

We start with the following lemma. First, note that $x_{qj}$ is a decision variable: if $x_{qj} = 1$ then point $j$ is assigned to center $q$, and if $x_{qj} = 0$ then it is not. In an integral solution $x_{qj} \in \{0, 1\}$, but a fractional LP solution could instead have values in $[0, 1]$. We use the bold symbol $\mathbf{x}$ for the collection of values $\{x_{qj}\}_{q \in Q, j \in C}$:

Lemma 3. Given a fractional solution $x^{frac}$ that satisfies the GF constraints at an additive violation of at most $\rho$, if there exists an integral solution $x^{integ}$ that satisfies:
$$\Big\lfloor \sum_{j \in C} x^{frac}_{qj} \Big\rfloor \le \sum_{j \in C} x^{integ}_{qj} \le \Big\lceil \sum_{j \in C} x^{frac}_{qj} \Big\rceil \qquad (3)$$
$$\Big\lfloor \sum_{j \in C^h} x^{frac}_{qj} \Big\rfloor \le \sum_{j \in C^h} x^{integ}_{qj} \le \Big\lceil \sum_{j \in C^h} x^{frac}_{qj} \Big\rceil \qquad (4)$$
then this integral solution $x^{integ}$ satisfies the GF constraints at an additive violation of at most $\rho + 2$.

Proof.
Since the fractional solution satisfies the GF constraints at an additive violation of $\rho$, we have:
$$\beta_h \sum_{j \in C} x^{frac}_{qj} - \rho \le \sum_{j \in C^h} x^{frac}_{qj} \le \alpha_h \sum_{j \in C} x^{frac}_{qj} + \rho$$

We start with the upper bound:
$$\sum_{j \in C^h} x^{integ}_{qj} \le \Big\lceil \sum_{j \in C^h} x^{frac}_{qj} \Big\rceil \le \sum_{j \in C^h} x^{frac}_{qj} + 1 \le \alpha_h \sum_{j \in C} x^{frac}_{qj} + \rho + 1 \le \alpha_h \Big( \sum_{j \in C} x^{integ}_{qj} + 1 \Big) + \rho + 1 = \alpha_h \sum_{j \in C} x^{integ}_{qj} + (\alpha_h + \rho + 1) \le \alpha_h \sum_{j \in C} x^{integ}_{qj} + (\rho + 2)$$

Now we do the lower bound:
$$\sum_{j \in C^h} x^{integ}_{qj} \ge \Big\lfloor \sum_{j \in C^h} x^{frac}_{qj} \Big\rfloor \ge \sum_{j \in C^h} x^{frac}_{qj} - 1 \ge \beta_h \sum_{j \in C} x^{frac}_{qj} - \rho - 1 \ge \beta_h \Big( \sum_{j \in C} x^{integ}_{qj} - 1 \Big) - (\rho + 1) = \beta_h \sum_{j \in C} x^{integ}_{qj} - (\beta_h + \rho + 1) \ge \beta_h \sum_{j \in C} x^{integ}_{qj} - (\rho + 2)$$

The LP solution given to MAXFLOWGF satisfies the GF constraints at an additive violation of $\rho$; we want to show that the output integral solution satisfies the conditions of Eqs. (3) and (4) above. The MAXFLOWGF($x^{LP}, C, Q$) subroutine is similar to those in [20, 5, 10]. Specifically, given an LP solution $x^{LP} = \{x^{LP}_{qj}\}_{q \in Q, j \in C}$, a set of points $C$, a set of centers $Q$, and a color assignment function $\chi : C \to \mathcal{H}$ which assigns to each point in $C$ exactly one color in the set of colors $\mathcal{H}$, we construct the flow network $(V, A)$ as follows: 1. $V = \{s, t\} \cup C \cup \{(q, q^h) \mid q \in Q, h \in \mathcal{H}\} \cup Q$. 2. $A = A_1 \cup A_2 \cup A_3 \cup A_4$, where $A_1 = \{(s, j) \mid j \in C\}$ with upper bound 1; $A_2 = \{(j, (q, q^h)) \mid j \in C, x_{qj} > 0\}$ with upper bound 1; the arc set $A_3 = \{((q, q^h), q) \mid q \in Q, h \in \mathcal{H}\}$ with lower bound $\lfloor \sum_{j \in C^h} x^{LP}_{qj} \rfloor$ and upper bound $\lceil \sum_{j \in C^h} x^{LP}_{qj} \rceil$; and $A_4 = \{(q, t) \mid q \in Q\}$ with lower and upper bounds $\lfloor \sum_{j \in C} x^{LP}_{qj} \rfloor$ and $\lceil \sum_{j \in C} x^{LP}_{qj} \rceil$, respectively. All lower and upper bounds of the network are integral; therefore, if we can exhibit a feasible solution to the above, then there must exist an integral flow assignment which also satisfies the constraints. By the construction of the network we have the following fact about any max-flow integral solution $x^{integ}$:

Fact B.1. $\big\lfloor \sum_{j \in C} x^{LP}_{qj} \big\rfloor \le \sum_{j \in C} x^{integ}_{qj} \le \big\lceil \sum_{j \in C} x^{LP}_{qj} \big\rceil$ and $\big\lfloor \sum_{j \in C^h} x^{LP}_{qj} \big\rfloor \le \sum_{j \in C^h} x^{integ}_{qj} \le \big\lceil \sum_{j \in C^h} x^{LP}_{qj} \big\rceil$.

Accordingly, the following theorem immediately holds: Theorem B.1.
Given an LP solution to MAXFLOWGF that satisfies the GF constraints at an additive violation of $\rho$ and a clustering cost of $R$, the output integral solution satisfies the GF constraints at an additive violation of $\rho + 2$ and a clustering cost of at most $R$.

Proof. The guarantee for the additive violation of GF follows immediately from Lemma 3 and Fact B.1. The guarantee for the clustering cost holds since a point (vertex) $j$ is not connected to a center vertex $(q, q^h)$ unless $x_{qj} > 0$, which can only be the case if $d(j, q) \le R$.

C Omitted Proofs

We restate the following lemma and give its proof:

Lemma 1. Given a non-empty cluster $C$ with center $i$ and radius $R$ that satisfies the GF constraints at an additive violation of $\rho$, and a subset of points $Q$ ($Q \subseteq C$), the clustering $(Q, \phi)$ where $\phi = \text{DIVIDE}(C, Q)$ has the following properties: (1) The GF constraints are satisfied at an additive violation of at most $\frac{\rho}{|Q|} + 2$. (2) Every center in $Q$ is active. (3) The clustering cost is at most $2R$. If $|Q| = 1$, then guarantee (1) improves to an additive violation of at most $\rho$.

Proof. We first consider the case where $|Q| > 1$. We prove the following claim:

Claim 1. For the fractional assignment $\{x^{frac}_{qj}\}_{q \in Q, j \in C}$ such that:
$$\forall q \in Q, h \in \mathcal{H} : \sum_{j \in C^h} x^{frac}_{qj} = \frac{|C^h|}{|Q|}$$
it holds that: (1) $\forall q \in Q : \sum_{j \in C} x^{frac}_{qj} \ge 1$; (2) the GF constraints are satisfied at an additive violation of $\frac{\rho}{|Q|}$.

Proof. We first prove property (1):
$$\sum_{j \in C} x^{frac}_{qj} = \sum_{h \in \mathcal{H}} \sum_{j \in C^h} x^{frac}_{qj} = \frac{1}{|Q|} \sum_{h \in \mathcal{H}} |C^h| = \frac{|C|}{|Q|} \ge 1 \quad \text{(since } Q \subseteq C\text{)} \qquad (7)$$

Since the GF constraints given center $i$ are satisfied at an additive violation of $\rho$, we have:
$$\forall h \in \mathcal{H} : \beta_h |C| - \rho \le |C^h| \le \alpha_h |C| + \rho \qquad (8)$$

Since the amount of each color at each center in $Q$ under the fractional assignment is obtained by dividing by $|Q|$, we have:
$$\forall h \in \mathcal{H}, q \in Q : \beta_h \sum_{j \in C} x^{frac}_{qj} - \frac{\rho}{|Q|} \le \sum_{j \in C^h} x^{frac}_{qj} \le \alpha_h \sum_{j \in C} x^{frac}_{qj} + \frac{\rho}{|Q|}$$
Therefore the GF constraints are satisfied at an additive violation of $\frac{\rho}{|Q|}$.
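The integral counterpart of the fractional split above can be sketched concretely. The following is a minimal Python sketch of the DIVIDE idea, not the paper's implementation (the function and variable names are illustrative): each color class is cut into $\lfloor |C^h|/|Q| \rfloor$-sized shares per center, and the leftover points are handed out successively across the centers, so every center receives either the floor or the ceiling of $|C^h|/|Q|$ points of each color $h$.

```python
def divide(points_by_color, q):
    """Split one cluster among q centers so each center gets floor or ceil
    of |C^h|/q points of every color h (sketch of the DIVIDE subroutine)."""
    shares = [dict() for _ in range(q)]
    cursor = 0  # remainders are handed out successively across centers
    for h, pts in points_by_color.items():
        a, b = divmod(len(pts), q)  # |C^h| = a*q + b
        idx = 0
        for c in range(q):
            shares[c][h] = pts[idx:idx + a]  # base share of color h
            idx += a
        for _ in range(b):  # leftover points, one per center in turn
            shares[cursor % q][h].append(pts[idx])
            idx += 1
            cursor += 1
    return shares
```

Because the remainder cursor continues across colors, the per-center totals also stay within floor/ceil of $|C|/|Q|$, which is the content of Claim 2 below.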
Denoting the assignment $\phi$ resulting from DIVIDE by $\{x^{integ}_{qj}\}_{q \in Q, j \in C}$ (in our notation, $x_{qj} \in [0, 1]$ denotes the assignment of point $j$ to center $q$), the following claim holds:

Claim 2.
$$\Big\lfloor \sum_{j \in C} x^{frac}_{qj} \Big\rfloor \le \sum_{j \in C} x^{integ}_{qj} \le \Big\lceil \sum_{j \in C} x^{frac}_{qj} \Big\rceil, \qquad \Big\lfloor \sum_{j \in C^h} x^{frac}_{qj} \Big\rfloor \le \sum_{j \in C^h} x^{integ}_{qj} \le \Big\lceil \sum_{j \in C^h} x^{frac}_{qj} \Big\rceil$$

Proof. For any color $h$ we have $|C^h| = a_h |Q| + b_h$, where $a_h$ and $b_h$ are non-negative integers and $b_h$ is the remainder of dividing $|C^h|$ by $|Q|$ ($b_h \in \{0, 1, \ldots, |Q| - 1\}$). It follows that $\sum_{j \in C^h} x^{frac}_{qj} = T_h = a_h + \frac{b_h}{|Q|}$. DIVIDE gives each center either $\sum_{j \in C^h} x^{integ}_{qj} = a_h = \lfloor T_h \rfloor = \lfloor \sum_{j \in C^h} x^{frac}_{qj} \rfloor$ or $\sum_{j \in C^h} x^{integ}_{qj} = a_h + 1 = \lceil T_h \rceil = \lceil \sum_{j \in C^h} x^{frac}_{qj} \rceil$. This proves the second condition. For the first condition, note that $|C| = \sum_{h \in \mathcal{H}} (a_h |Q| + b_h) = (\sum_{h \in \mathcal{H}} a_h)|Q| + a|Q| + b$, where we set $\sum_{h \in \mathcal{H}} b_h = a|Q| + b$ with $a$ and $b$ non-negative integers; $b$ is the remainder and has values in $\{0, 1, \ldots, |Q| - 1\}$. Accordingly, the sum of the remainders across the colors is $a|Q| + b$. Since the remainders are added successively across the centers (see Figure 2) and $a|Q|$ is divisible by $|Q|$, for any center $q \in Q$ either $\sum_{j \in C} x^{integ}_{qj} = (\sum_{h \in \mathcal{H}} a_h) + a$ or $\sum_{j \in C} x^{integ}_{qj} = (\sum_{h \in \mathcal{H}} a_h) + a + 1$. Note that $\sum_{j \in C} x^{frac}_{qj} = \sum_{h \in \mathcal{H}} T_h = (\sum_{h \in \mathcal{H}} a_h) + a + \frac{b}{|Q|}$. Therefore, $\lfloor \sum_{j \in C} x^{frac}_{qj} \rfloor = (\sum_{h \in \mathcal{H}} a_h) + a$ and $\lceil \sum_{j \in C} x^{frac}_{qj} \rceil = (\sum_{h \in \mathcal{H}} a_h) + a + 1$. This proves the first condition.

By Claim 2 and Lemma 3 it follows that for each center $q \in Q$ the assignment $\{x^{integ}_{qj}\}_{q \in Q, j \in C}$ satisfies the GF constraints at an additive violation of $\frac{\rho}{|Q|} + 2$; this proves the first guarantee. By Claim 2 and guarantee (1) of Claim 1, $\forall q \in Q : \sum_{j \in C} x^{integ}_{qj} \ge \lfloor \sum_{j \in C} x^{frac}_{qj} \rfloor \ge 1$. Therefore, every center $q \in Q$ is active, proving the second guarantee. Guarantee (3) follows since $\forall j \in C : d(j, \phi(j)) \le d(j, i) + d(i, \phi(j)) \le 2R$. Now if $|Q| = 1$, then guarantee (2) follows since the cluster $C$ is non-empty. Guarantee (3) follows similarly to the above.
The additive violation in the GF constraint, on the other hand, is $\rho$, since the single center in $Q$ receives exactly the set of points that were assigned to the original center $i$.

We restate the next lemma and give its proof:

Lemma 2. The solution $(S', \phi')$ of line (3) in Algorithm 2 has the following properties: (1) It satisfies the GF constraint at an additive violation of 2. (2) It has a clustering cost of at most $(1 + \alpha_{DS}) R^*_{GF+DS}$, where $R^*_{GF+DS}$ is the clustering cost (radius) of the optimal solution for GF+DS. (3) The set of centers $S'$ is a subset (possibly a proper subset) of the set of centers $S$, i.e., $S' \subseteq S$.

Proof. We begin with the following claim, which shows that there exists a solution that only uses centers from $S$, satisfies the GF constraints exactly, and has a radius of at most $(1 + \alpha_{DS}) R^*_{GF+DS}$. Note that this claim has a non-constructive proof, i.e., it only proves the existence of such a solution:

Claim 3. Given the set of centers $S$ resulting from the $\alpha_{DS}$-approximation algorithm, there exists an assignment $\phi_0$ from points in $C$ to centers in $S$ such that the following holds: (1) The GF constraint is exactly satisfied (additive violation of 0). (2) The clustering cost is at most $(1 + \alpha_{DS}) R^*_{GF+DS}$.

Proof. Let $(S^*_{GF+DS}, \phi^*_{GF+DS})$ be an optimal solution to the GF+DS problem. For each $i \in S^*_{GF+DS}$, let $N(i) = \arg\min_{\bar{i} \in S} d(i, \bar{i})$, i.e., $N(i)$ is the nearest center in $S$ to center $i$ (ties are broken using the smallest index). $\phi_0$ is formed by assigning all points which belong to center $i \in S^*_{GF+DS}$ to $N(i)$. More formally, for all $j \in C$ with $\phi^*_{GF+DS}(j) = i$ we set $\phi_0(j) = N(i)$. Note that it is possible for more than one center $i$ in $S^*_{GF+DS}$ to have the same nearest center in $S$. We will now show that $\phi_0$ satisfies the GF constraint exactly. Note first that if a center $\bar{i} \in S$ has not been assigned any points by $\phi_0$, then it is empty and trivially satisfies the GF constraint exactly. Therefore, we assume that $\bar{i}$ has a non-empty cluster.
Denote by $N^{-1}(\bar{i})$ the set of centers $i \in S^*_{GF+DS}$ for which $\bar{i}$ is the nearest center. Then, using Fact A.1 and the fact that every cluster in $(S^*_{GF+DS}, \phi^*_{GF+DS})$ satisfies the GF constraint exactly, we have:
$$\beta_h \le \min_{i \in N^{-1}(\bar{i})} \frac{|C^h_i|}{|C_i|} \le \frac{\sum_{i \in N^{-1}(\bar{i})} |C^h_i|}{\sum_{i \in N^{-1}(\bar{i})} |C_i|} = \frac{|C^h_{\bar{i}}|}{|C_{\bar{i}}|} \le \max_{i \in N^{-1}(\bar{i})} \frac{|C^h_i|}{|C_i|} \le \alpha_h \qquad (10)$$
This proves guarantee (1) of the claim. Now we prove guarantee (2). Denote by $R^*_{DS}$ the optimal clustering cost for the DS-constrained problem. For any $j \in C$:
$$d(j, \phi_0(j)) \le d(j, \phi^*_{GF+DS}(j)) + d(\phi^*_{GF+DS}(j), \phi_0(j)) = d(j, \phi^*_{GF+DS}(j)) + d(\phi^*_{GF+DS}(j), N(\phi^*_{GF+DS}(j))) \le R^*_{GF+DS} + \alpha_{DS} R^*_{DS} \le (1 + \alpha_{DS}) R^*_{GF+DS}$$
where the equality holds since $\phi_0(j) = N(\phi^*_{GF+DS}(j))$, the next inequality since $S$ is an $\alpha_{DS}$-approximation for DS, and the last since $R^*_{DS} \le R^*_{GF+DS}$ because every solution feasible for GF+DS is also feasible for DS.

Now we can prove the lemma. By the above claim, when ASSIGNMENTGF is called, the LP solution from line (3) of algorithm block 3 satisfies: (1) the GF constraints exactly, and (2) a clustering cost of at most $(1 + \alpha_{DS}) R^*_{GF+DS}$. This is because LP (2) includes all integral assignments from $C$ to $S$, including $\phi_0$. Since this LP assignment is fed to MAXFLOWGF, it follows by Theorem B.1 that the final solution satisfies: (1) the GF constraint at an additive violation of 2, and (2) a clustering cost of at most $(1 + \alpha_{DS}) R^*_{GF+DS}$. Guarantee (3) holds since some centers may become closed (assigned no points), and therefore $S' \subseteq S$ (possibly a proper subset).

We restate the following theorem and give its proof:

Theorem 4.1. Given an $\alpha_{DS}$-approximation algorithm for the DS problem, we can obtain a $2(1 + \alpha_{DS})$-approximation algorithm that satisfies GF at an additive violation of 3 and satisfies DS simultaneously.

Proof. By Lemma 2 above, the set of centers $S'$ is a subset (possibly a proper subset) of $S$, and therefore the DS constraints may no longer be satisfied.
Algorithm 2 selects points from each color $h$ so that, when they are added to $S'$, the number of centers of each color $h$ is at least $\beta_h k$. Since these new centers are opened using the DIVIDE subroutine, they are all active (guarantee (2) of Lemma 1). Further, by guarantee (3) of Lemma 1 for DIVIDE, for any point $j$ assigned to a new center $q$ we have $d(j, q) \le 2 d(j, \phi'(j)) \le 2(1 + \alpha_{DS}) R^*_{GF+DS}$. Finally, by guarantee (1) of Lemma 1, DIVIDE is called over a cluster that satisfies GF at an additive violation of 2, and therefore the resulting additive violation is at most $\max\{2, \frac{2}{|Q_i|} + 2\}$. Since $\frac{2}{|Q_i|} + 2 \le \frac{2}{2} + 2 = 3$ whenever $|Q_i| \ge 2$, the additive violation is at most 3.

We restate the next theorem and give its proof:

Theorem 4.2. If we have a solution $(\tilde{S}, \tilde{\phi})$ of cost $\tilde{R}$ that satisfies the GF constraints, where the number of non-empty clusters is $|\tilde{S}| = \tilde{k} \le k$, then we can obtain a solution $(S, \phi)$ that satisfies GF at an additive violation of 2 and DS simultaneously, with cost $R \le 2\tilde{R}$.

Proof. We point out the following fact:

Fact C.1. Every cluster in $(\tilde{S}, \tilde{\phi})$ has at least one point from each color.

Proof. This holds since, given a center $i \in \tilde{S}$, we have $|\tilde{C}_i| > 0$ and therefore $\forall h \in \mathcal{H} : |\tilde{C}^h_i| \ge \beta_h |\tilde{C}_i| > 0$, and therefore $|\tilde{C}^h_i| \ge 1$ since it must be an integer.

We note that the values $\{\beta_h, \alpha_h\}_{h \in \mathcal{H}}$ and $k$ must lead to a feasible DS problem, i.e., there exist positive integers $g_h$ such that $\sum_{h \in \mathcal{H}} g_h = k$ and $\forall h \in \mathcal{H} : \beta_h k \le g_h \le \alpha_h k$. Accordingly, lines (4-13) in Algorithm 4 can always pick a point of some color $h$ such that the upper bound $\alpha_h k$ is not exceeded, for every cluster $i$. Therefore the following fact must hold:

Fact C.2. By the end of line (13) we have $\forall i \in \tilde{S} : |Q_i| \ge 1$.

Further, the final $s_h$ values are valid for DS:

Claim 4. By the end of line (13) the values of $s_h$ satisfy: (1) $\sum_{h \in \mathcal{H}} s_h \le k$; (2) $\forall h \in \mathcal{H} : \beta_h k \le s_h \le \alpha_h k$.

Proof. Lines (4-13) increment $s_h$ if the lower bound $\beta_h k$ for color $h$ is not yet satisfied.
If the lower bound is satisfied for all colors, then points of some color $h$ are added provided that adding them would not exceed the upper bound $\alpha_h k$ (see line 5). Therefore, by the end of line (13), for any color $h \in \mathcal{H}$ we have $s_h \le \alpha_h k$ and either $s_h \ge \beta_h k$ or $s_h < \beta_h k$. (To see why we could have $s_h < \beta_h k$, consider the case where $\tilde{k} < k$: there would not be enough clusters for us to add points for each color.) If by the end of line (13) we have $\forall h \in \mathcal{H} : s_h \ge \beta_h k$, then the algorithm moves to line (22). Otherwise, it keeps picking points and incrementing $s_h$ until $\forall h \in \mathcal{H} : s_h \ge \beta_h k$. Further, since such valid DS values exist, it must be that the above satisfies $\sum_{h \in \mathcal{H}} s_h \le k$ and $\forall h \in \mathcal{H} : s_h \le \alpha_h k$. This concludes the proof of the claim.

By Lemma 1 for DIVIDE, the new centers $S = \cup_{i \in \tilde{S}} Q_i$ are all active (guarantee (2) of DIVIDE), and since the values of $s_h$ are valid (Claim 4 above), $S$ satisfies the DS constraints. Since the assignment in each cluster of the new solution $(S, \phi)$ is formed using DIVIDE over the clusters of $(\tilde{S}, \tilde{\phi})$, by guarantee (1) of DIVIDE each cluster of $(S, \phi)$ satisfies GF at an additive violation of 2. Finally, the clustering cost is at most $R \le 2\tilde{R}$ (guarantee (3) of DIVIDE).

We restate the following corollary and give its proof:

Corollary 1. Given an $\alpha_{GF}$-approximation algorithm for GF, we can obtain a $2\alpha_{GF}$-approximation algorithm that satisfies GF at an additive violation of 2 and DS simultaneously.

Proof. Using the previous theorem (Theorem 4.2), the solution $(\tilde{S}, \tilde{\phi})$ has a cost of $\tilde{R} \le \alpha_{GF} \, OPT_{GF}$. The post-processed solution that satisfies GF at an additive violation of 2 and DS simultaneously has a cost of $R \le 2\tilde{R} \le 2\alpha_{GF} \, OPT_{GF} \le 2\alpha_{GF} \, OPT_{GF+DS}$. The last inequality follows because $OPT_{GF} \le OPT_{GF+DS}$, which is the case since both problems minimize the same objective, but by definition the feasible set of GF+DS is a subset of the feasible set of GF.

Before we proceed, we define the following clustering instance, which will be used in the proofs:

Definition 1.
ℓ-Community Instance: The ℓ-community instance is a clustering instance where the set of points $C$ can be partitioned into ℓ communities (subsets) $\{C^{CI}_1, \ldots, C^{CI}_\ell\}$ of coinciding points (points within the same community are separated by a distance of 0). Further, the communities are of equal size, i.e., $\forall i \le \ell : |C^{CI}_i| = \frac{n}{\ell}$. Moreover, the distance between any two points belonging to different communities is at least $R > 0$.

Figure 5: An ℓ-community instance to show the price of fairness (GF) and incompatibility between GF and other fairness constraints when $k$ is even.

Figure 6: An ℓ-community instance to show the price of fairness (GF) and incompatibility between GF and other fairness constraints when $k$ is odd.

Figures 5 and 6 show two examples of ℓ-community instances. When clustering an ℓ-community instance with $k = \ell$, the arguably most natural clustering output is the communities $\{C^{CI}_1, \ldots, C^{CI}_k\}$ themselves. The following fact clearly holds for any ℓ-community instance:

Fact C.3. If we cluster an ℓ-community instance with $k = \ell$, then: (1) The set of optimal solutions is $(S^{CI}, \phi^{CI})$, where $S^{CI}$ has exactly one center from each community $\{C^{CI}_1, \ldots, C^{CI}_k\}$ and points are assigned to the center in their own community. (2) The clustering cost of $(S^{CI}, \phi^{CI})$ is 0. (3) Any solution other than $(S^{CI}, \phi^{CI})$ has a clustering cost of at least $R > 0$.

We restate the following proposition and give its proof:

Proposition 5.1. For any value of $k \ge 2$, imposing GF can lead to an unbounded PoF, even if we allow an additive violation of $\Omega(\frac{n}{k})$.

Proof. Consider the case where $k \ge 2$ is even and refer to Figure 5, where we have $\ell = k$ communities that alternate between red and blue. By Fact C.3, the optimal solution has a clustering cost of 0: it has one center in each of the $k = \ell$ communities and assigns each point to its closest center.
If we set the lower and upper proportion bounds to $\frac{1}{2}$ for both colors, then to satisfy GF each cluster should have both red and blue points. There must exist a cluster $C_i$ of size $|C_i| \ge \frac{n}{k}$; it follows that to satisfy the GF constraints at an additive violation of $\rho$ we need $|C^{blue}_i| \ge \frac{1}{2}|C_i| - \rho \ge \frac{n}{2k} - \rho$, and similarly $|C^{red}_i| \ge \frac{n}{2k} - \rho$. Setting $\rho = \frac{n}{2k} - \epsilon$ for some constant $\epsilon > 0$ gives $|C^{blue}_i|, |C^{red}_i| > 0$. This implies that a point will be assigned to a center at a distance $R > 0$, and therefore the PoF is unbounded.

For an odd value of $k$, see the example of Figure 6. Here the last community instead has an equal number of red and blue points. We call the cluster whose center is in the last community $C_{last}$. If $|C_{last}| \ne \frac{n}{k}$, then either points from other communities are assigned to the center of $C_{last}$, or points of the last community are assigned to other centers at distance $R > 0$; either way a cost of $R > 0$ is incurred. If $|C_{last}| = \frac{n}{k}$, then in the remaining $k - 1$ communities, with a total of $n - \frac{n}{k}$ points, $k - 1$ centers are chosen. There must exist a cluster $C_i$ of size $|C_i| \ge \frac{n}{k}$. We then follow the same argument as in the even-$k$ case: to satisfy GF with additive violation $\rho = \frac{n}{2k} - \epsilon$ for both colors, we must have $|C^{blue}_i|, |C^{red}_i| > 0$. This means at least one point will be assigned to a center at a distance $R > 0$, and therefore the PoF is unbounded for the odd-$k$ case as well.

We restate the following proposition and give its proof:

Proposition 5.2. For any value of $k \ge 3$, imposing DS can lead to an unbounded PoF.

Proof. Consider the ℓ-community instance shown in Figure 7, where $k \ge 3$ and $k = \ell$. Here all communities are blue, except for the last, which has $\frac{n}{2k}$ red points and $\frac{n}{2k}$ green points. As in the previous proposition, since this is an ℓ-community instance with $\ell = k$, by Fact C.3 the optimal solution has a clustering cost of 0, with one center in each community and each point assigned to its closest center.
Suppose for DS we set the lower bounds on the number of centers of each color to $k^{blue}_l, k^{red}_l, k^{green}_l > 0$; this implies that we must pick a center of each color. Hence we can have at most $k - 2$ blue centers, and therefore there is a community (composed of all blue points) from which no point is picked as a center. Therefore the clustering cost is $R > 0$ and the PoF is unbounded.

Figure 7: An ℓ-community instance to show the price of fairness (DS) and incompatibility between DS and other fairness constraints.

We restate the following proposition and give its proof:

Proposition 5.3. For any value of $k \ge 2$, imposing GF on a solution that only satisfies DS can lead to an unbounded increase in the clustering cost, even if we allow an additive violation of $\Omega(\frac{n}{k})$.

Proof. The proof follows similarly to Proposition 5.1. For even $k \ge 2$, consider the same instance as in Figure 5 with $\ell = k$. In this case, we set the upper and lower proportion bounds for both GF and DS to $\frac{1}{2}$. To satisfy DS, the numbers of red and blue centers must each be $\frac{k}{2}$; thus the solutions that satisfy the DS constraint are exactly the optimal unconstrained solutions specified in Fact C.3. The rest of the proof proceeds exactly as the proof for the even-$k$ case of Proposition 5.1. For odd $k \ge 3$, consider the same instance as in Figure 6. In this case, we set the upper and lower proportion bounds for GF to $\frac{1}{2}$, and we set the upper and lower bounds on the number of centers in the DS constraint to $\frac{k-1}{2} + 1$ and $\frac{k-1}{2}$, respectively, for both colors. Note that an optimal solution specified in Fact C.3 which chooses either a red or a blue point in the rightmost community as a center satisfies this DS constraint. The rest of the proof proceeds exactly as the proof for the odd-$k$ case of Proposition 5.1.

We restate the following proposition and give its proof:

Proposition 5.4. Imposing DS on a solution that only satisfies GF leads to a bounded increase in the clustering cost of at most 2 (PoF $\le$ 2) if we allow an additive violation of 2 in the GF constraints.
Proof. This follows from Theorem 4.2, since we can always post-process a solution that only satisfies GF into one that satisfies both GF at an additive violation of 2 and DS simultaneously, and clearly from the theorem we would have:
$$\text{PoF} = \frac{\text{clustering cost of post-processed solution}}{\text{clustering cost of GF solution}} \le \frac{2 \cdot \text{clustering cost of GF solution}}{\text{clustering cost of GF solution}} = 2$$

D Omitted Proofs, Additional Results, and Details for Section 6

In this section, we provide more details and proofs for the theorems and facts that appeared in Section 6. We present the proof of Theorem 6.1. We begin by giving the full definitions of the relevant fairness constraints.

Definition 2. Neighborhood Radius [30]: For a given set of points $C$ to cluster and a given number of centers $k$, the neighborhood radius of a point $j$ is the minimum radius $r$ such that at least $|C|/k$ of the points in $C$ are within distance $r$ of $j$: $NR_{C,k}(j) = \min\{r : |B_r(j) \cap C| \ge |C|/k\}$, where $B_r(j)$ is the closed ball of radius $r$ around $j$.

Definition 3. Fairness in Your Neighborhood Constraint [30]: For a given set of points $C$ with metric $d(\cdot, \cdot)$, a clustering $(S, \phi)$ is $\alpha_{NR}$-fair if for all $j \in C$, $d(j, \phi(j)) \le \alpha_{NR} \cdot NR_{C,k}(j)$.

Definition 4. Socially Fair [1, 24]: For a clustering problem with $k$ centers on points $C$ which come from $|\mathcal{H}|$ groups with $\cup_{h \in \mathcal{H}} C^h = C$, the socially fair clustering optimization problem is to minimize the maximum average clustering cost across all groups: $\min_{S : |S| \le k, \phi} \max_{h \in \mathcal{H}} \frac{1}{|C^h|} \sum_{j \in C^h} d^p(j, \phi(j))$, where $p = 1$ for $k$-median and $p = 2$ for $k$-means. Note that socially fair clustering does not optimize over assignment functions, because it assumes the assignment follows the optimal rule: a point is assigned to the cluster center closest to it.

Definition 5. An $\alpha_{SF}$-socially fair solution to a clustering problem is a solution of cost at most $\alpha_{SF}$ times that of the optimal socially fair solution. This definition allows us to bound the clustering cost of an $\alpha_{SF}$-socially fair solution $(S_\alpha, \phi_\alpha)$ as: $\max_{h \in \mathcal{H}} \frac{1}{|C^h|} \sum_{j \in C^h} d^p(j, \phi_\alpha(j)) \le \alpha_{SF} \min_{S : |S| \le k, \phi} \max_{h \in \mathcal{H}} \frac{1}{|C^h|} \sum_{j \in C^h} d^p(j, \phi(j))$.

Definition 6.
Approximately Proportional [15]: Given a set of centers $S \subseteq C$ with $|S| = k$ and an assignment $\phi$, it is $\alpha_{AP}$-approximately proportional ($\alpha_{AP}$-proportional) if for all $U \subseteq C$ with $|U| \ge \lceil \frac{n}{k} \rceil$ and for all $y \in C$, there exists $i \in U$ with $\alpha_{AP} \cdot d(i, y) \ge d(i, \phi(i))$.

We restate the following theorem and give its proof:

Theorem 6.1. For any value $k \ge 2$, the fairness-in-your-neighborhood constraint [30] and the socially fair constraint [1, 24] are each incompatible with GF, even if we allow an additive violation of $\Omega(\frac{n}{k})$ in the GF constraint. For any value $k \ge 5$, the proportionally fair constraint [15] is incompatible with GF, even if we allow an additive violation of $\Omega(\frac{n}{k})$ in the GF constraint.

Proof. We show the incompatibility of GF with the three fairness notions. Recall that we consider GF and another fairness constraint at the same time, and incompatible means there are cases where no feasible solution exists that satisfies both constraints at the same time.

Lemma 4. For any $k \ge 2$, there exists a clustering problem where no feasible solution satisfies both Fairness in Your Neighborhood and GF, even if we allow an additive violation of $\Omega(\frac{n}{k})$ in the GF constraint.

Proof. Consider the case where $k \ge 2$, and consider the clustering problem on an ℓ-community instance with $k = \ell$. We consider the case where the lower and upper proportion bounds of GF are set to $\frac{1}{2}$ for both colors.

Claim 5. On the above-mentioned clustering problem, an $\alpha_{NR}$-fair solution in the Fairness in Your Neighborhood notion for finite $\alpha_{NR}$ is a solution in the set of optimal solutions $(S^{CI}, \phi^{CI})$.

Proof. By Definitions 1 and 2, for any point $j \in C$, its neighborhood radius is $NR_{C,k}(j) = \min\{r : |B_r(j) \cap C| \ge |C|/k\} = 0$. This is because each point is in one of the ℓ communities, each community is of size $\frac{n}{k}$ by definition, and points in the same community are separated by a distance of 0. For a solution $(S, \phi) \in (S^{CI}, \phi^{CI})$ on an ℓ-community instance, for any point $j \in C$, because $S$ contains a center in the community of $j$, we have $d(j, S) = 0$.
By the definition of $\alpha_{NR}$-fairness, this means that on an ℓ-community instance, any solution in $(S^{CI}, \phi^{CI})$ is $\alpha_{NR}$-fair with finite $\alpha_{NR}$: for any $(S, \phi) \in (S^{CI}, \phi^{CI})$, $d(j, S) = 0 \le \alpha_{NR} \cdot NR_{C,k}(j)$ holds for any finite value of $\alpha_{NR}$. In any solution that is not in $(S^{CI}, \phi^{CI})$, there is at least one point which is assigned to a center not in the point's own community. Thus for such a solution $(S, \phi)$, there exists $j \in C$ with $d(j, \phi(j)) = R$, and there is no finite $\alpha_{NR}$ such that $d(j, \phi(j)) = R \le \alpha_{NR} \cdot NR_{C,k}(j)$ holds. This shows that a solution that achieves $\alpha_{NR}$-fairness for a finite $\alpha_{NR}$ must be a solution from the set of solutions $(S^{CI}, \phi^{CI})$. Thus we have shown that any solution that approximately satisfies the fairness-in-your-neighborhood constraint does not assign points to centers outside their own communities.

To characterize the set of solutions that satisfy GF with an additive violation of $\Omega(\frac{n}{k})$, we consider two cases separately: $k$ even and $k$ odd. Consider the case where $k \ge 2$ is even and refer to Figure 5, where we have $\ell = k$ communities that alternate between red and blue. Since the lower and upper proportion bounds are set to $\frac{1}{2}$ for both colors, to satisfy GF each cluster should have both red and blue points. There must exist a cluster $C_i$ of size $|C_i| \ge \frac{n}{k}$; it follows that to satisfy the GF constraints at an additive violation of $\rho$, we need $|C^{blue}_i| \ge \frac{1}{2}|C_i| - \rho \ge \frac{n}{2k} - \rho$, and similarly $|C^{red}_i| \ge \frac{n}{2k} - \rho$. Setting $\rho = \frac{n}{2k} - \epsilon$ for some constant $\epsilon > 0$ gives $|C^{blue}_i|, |C^{red}_i| > 0$. This implies that a point needs to be assigned to a center at a distance $R > 0$ for the solution to satisfy GF with $\Omega(\frac{n}{k})$ additive violation. Therefore such a solution is not in the solution set that satisfies fairness in your neighborhood. For an odd value of $k$, see the example of Figure 6. Here the last community instead has an equal number of red and blue points. We call the cluster whose center is in the last community $C_{last}$.
If |C_last| ≠ n/k, then either points from other communities are assigned to the center of C_last, incurring cost R > 0, or points in the last community are assigned to other centers at distance R > 0. In both cases there is at least one point assigned to a center outside its community. If |C_last| = n/k, then in the remaining k − 1 communities, with a total of n − n/k points, k − 1 centers are chosen. A solution satisfying GF has one cluster C_i with at least (n − n/k)/(k − 1) = n/k points. We then follow the same argument as in the even-k case: to satisfy GF with an additive violation of ρ on C_i, we need |C_i^blue| ≥ (1/2)|C_i| − ρ ≥ n/(2k) − ρ and |C_i^red| ≥ n/(2k) − ρ; with ρ = n/(2k) − ϵ for some constant ϵ > 0, at least one point is assigned to a center at distance R > 0. Thus such a solution is not in the set of solutions (S_CI, ϕ_CI). Therefore the set of solutions that satisfies fairness in your neighborhood has no overlap with the set of solutions that satisfies GF with an Ω(n/k) additive violation. Lemma 5. For any k ≥ 2, there exists a clustering problem where no feasible solution satisfies both Socially Fair and GF, even if we allow an additive violation of Ω(n/k) in the GF constraint. Proof. We follow a similar line of argument as in Lemma 4. Consider the clustering problem on an ℓ-community instance with k = ℓ for some k ≥ 2. We consider the case where the lower and upper proportion bounds of GF are set to 1/2 for both colors. Claim 6. On the above-mentioned clustering problem, an α_SF-fair solution in the Socially Fair notion for finite α_SF is a solution in the set of optimal solutions (S_CI, ϕ_CI). Proof. Denote the clustering cost of an optimal solution to the Socially Fair clustering problem by OPT_SF. By definition, OPT_SF = min_{S:|S|≤k} max_{h∈H} (1/|C_h|) Σ_{j∈C_h} d^p(j, ϕ(j)). We can formulate the problem of finding an α_SF-socially-fair solution as a constrained optimization problem. We use a dummy objective function f.
The constraint requires the maximum clustering cost across all colors to be upper-bounded by α_SF times that of the optimal socially fair solution OPT_SF. The constrained program can be set up as below: min_{S:|S|≤k} f s.t. max_{h∈H} (1/|C_h|) Σ_{j∈C_h} d^p(j, ϕ(j)) ≤ α_SF · OPT_SF. For a clustering problem with k centers on the ℓ-community instance with k = ℓ, a solution (S, ϕ) that has one center in each subset satisfies d(j, ϕ(j)) = 0 for each point j ∈ C. Thus this solution has clustering cost Σ_{j∈C_h} d^p(j, ϕ(j)) = 0 for each color h, which implies that OPT_SF = 0. Thus on the ℓ-community instance, feasible solutions to the α_SF-socially-fair problem, for finite α_SF, have max_{h∈H} (1/|C_h|) Σ_{j∈C_h} d^p(j, ϕ(j)) = 0. We now show that (S_CI, ϕ_CI) is the only set of solutions with max_{h∈H} (1/|C_h|) Σ_{j∈C_h} d^p(j, ϕ(j)) = 0; thus they are the only feasible solutions. For any solution (S, ϕ) that is not in (S_CI, ϕ_CI), there must be a point j assigned to a center that is not in its own community. For such a point, d(j, ϕ(j)) = R. Thus max_{h∈H} (1/|C_h|) Σ_{j∈C_h} d^p(j, ϕ(j)) ≥ R^p / max_{h∈H} |C_h| > 0. Therefore, an α_SF-fair solution in the socially fair notion for finite α_SF must be a solution in the set of optimal solutions (S_CI, ϕ_CI). At this point, similar to the proof of Lemma 4, we have shown that any solution that approximately satisfies socially fair does not assign points to centers outside their original communities on the ℓ-community instance. The remainder of the proof is the same as the corresponding part of the proof of Lemma 4: we can use the same examples for even k and odd k to show that, for any k ≥ 2, any solution that satisfies GF with an Ω(n/k) additive violation is not in the set of solutions (S_CI, ϕ_CI). Thus the set of solutions that satisfies socially fair has no overlap with the set of solutions that satisfies GF with an Ω(n/k) additive violation. Lemma 6.
For any k ≥ 5, there exists a clustering problem where no feasible solution satisfies both Proportional Fairness and GF, even if we allow an additive violation of Ω(n/k) in the GF constraint. Proof. For a given value of α_AP for the proportionally fair constraint, consider Figure 8. For the GF constraints, the upper and lower bounds for each color are set to 1/2, and the total number of points n is always even. Consider some k ≥ 5. The sum of the sizes of the clusters assigned to centers on either the right side or the left side is at least n/2; WLOG assume that it is the left side, denote the total number of points assigned to clusters on the left side by |C_LS|, and let S_LS be the centers on the left side. The points on the left side need not be assigned to a single center but may be distributed among the centers S_LS. To satisfy the GF constraints at an additive violation of ρ, the number of red points that must be assigned to the left side is at least n/4 − kρ. Setting ρ = n/(4k) − n/k² − 1, it follows that at least n/k red points are assigned to a center on the left at a distance of at least R. Since the maximum distance between any two red points is, by the triangle inequality, 2r < R/α_AP, it follows that this set of red points forms a blocking coalition; i.e., these points would have a lower distance from their assigned center if they were instead assigned to a red center. Figure 8: Instances to show incompatibility between Proportional Fairness and GF. We always have n/2 blue points on the left and n/2 red points on the right. For even k we have k/2 locations for the blue points and k/2 for the red points. For odd k we have ⌈k/2⌉ blue locations and ⌊k/2⌋ red locations. For each color, there is always a central location at a distance r from the other locations of that color. Points of different colors are at a distance of at least R from each other. For any value of α_AP for the proportionally fair constraint, we set r < R/(2α_AP).
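The blocking-coalition reasoning used throughout these proofs can be checked mechanically on small instances. Below is an illustrative brute-force sketch of the α_AP-proportionality definition (exponential in n, so suitable for toy examples only; the function name `is_alpha_proportional` and the toy instance are ours):

```python
import math
from itertools import combinations

def is_alpha_proportional(C, phi, d, k, alpha):
    """Brute-force α_AP-proportionality check. A violation is a coalition U with
    |U| >= ceil(n/k) and a point y such that alpha * d(i, y) < d(i, phi[i]) for
    every i in U. It suffices to test |U| = ceil(n/k): any larger blocking
    coalition contains a blocking coalition of exactly this size."""
    need = math.ceil(len(C) / k)
    for U in combinations(C, need):
        for y in C:
            if all(alpha * d(i, y) < d(i, phi[i]) for i in U):
                return False  # every member of U strictly prefers y
    return True

# Toy instance: two locations at distance R; assigning everyone to one side
# leaves the far pair as a blocking coalition, while one center per side is fine.
R = 10.0
C = [(0, 0), (0, 1), (1, 0), (1, 1)]       # (location, index)
d = lambda a, b: 0.0 if a[0] == b[0] else R
all_left = {p: (0, 0) for p in C}          # everyone assigned to location 0
one_each = {p: (p[0], 0) for p in C}       # a center in each location
print(is_alpha_proportional(C, all_left, d, 2, 1.0))  # False
print(is_alpha_proportional(C, one_each, d, 2, 1.0))  # True
```

This mirrors the proof of Lemma 6 in miniature: forcing points of one side into clusters on the other side creates a coalition that would be strictly closer to a center of its own.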
We restate the following theorem and give its proof: Theorem 6.2. For any value k ≥ 3, the fairness in your neighborhood [30], socially fair [1, 24], and proportionally fair [15] constraints are each incompatible with DS. Proof. Consider a case of the ℓ-community instance where the first ℓ − 1 communities consist of only blue points, and the last community contains n/(2ℓ) red points and n/(2ℓ) green points. This ℓ-community instance is illustrated in Figure 7. Consider the clustering problem where k = ℓ and the DS constraints require k_blue, k_red, k_green > 0. We establish the two claims below. Claim 7. On the above-mentioned clustering problem, an α_NR-fair solution in the Fairness in Your Neighborhood notion for finite α_NR is a solution in the set of optimal solutions (S_CI, ϕ_CI). Claim 8. On the above-mentioned clustering problem, an α_SF-fair solution in the Socially Fair notion for finite α_SF is a solution in the set of optimal solutions (S_CI, ϕ_CI). These two claims can be proved with the same arguments as Claims 5 and 6. However, satisfying DS on this instance with k_blue, k_red, k_green > 0 requires a center of each color to be picked. Thus a solution from the set of optimal solutions (S_CI, ϕ_CI) does not satisfy DS, because it picks only one point from the rightmost subset as a center, so either green points or red points will not appear in the set of centers. On the other hand, since a solution satisfying DS has at least one center of each color, it must contain two centers, one green and one red, chosen from the rightmost subset. This leaves k − 2 centers allocated to the k − 1 communities on the left. By the pigeonhole principle, one of the all-blue communities has no center allocated to it. All blue points in that subset are then assigned to a center in a nearby community, so a DS-satisfying solution is not in the set (S_CI, ϕ_CI).
Thus the set of DS-satisfying solutions has no overlap with the set of solutions that satisfy either of the two fairness constraints. Below we use the same example to show incompatibility between DS and Proportional Fairness. Claim 9. On the above-mentioned clustering problem, no feasible solution exists that satisfies both Proportional Fairness and DS. Proof. We show that a DS-satisfying solution on the above example is not proportionally fair. As argued above, a solution satisfying DS can allocate only k − 2 centers to the k − 1 communities on the left. There will therefore be a community of size n/k all of whose points are assigned to a nearby center outside their community. This community forms a coalition of size n/k that would have a smaller distance if it were assigned a center in its own community. Therefore a DS-satisfying solution is not proportionally fair. Remark: For each of the above proofs we constructed an example that is parametric in the number of centers k. Moreover, in these examples, for the optimal unconstrained k-center objective to equal 0, at least k centers have to be used; i.e., the points are spread over at least k locations. Furthermore, it is not difficult to see that in each of the above examples an optimal solution for the unconstrained k-center objective satisfies the fairness in your neighborhood, socially fair, and proportionally fair constraints. In fact, it is easy to show that any k-center solution with a radius of 0 immediately satisfies the fairness in your neighborhood, socially fair, and proportionally fair constraints. However, the same is not true for GF or DS. This indicates that the above distance-based fairness constraints can be aligned with the clustering cost, whereas the same cannot be said about GF or DS. Compatibility between GF and DS: One can easily show compatibility between GF and DS. Specifically, consider some values for the centers over the colors {k_h}_{h∈H} that satisfy the DS constraints, i.e., ∀h ∈ H: k^l_h ≤ k_h ≤ k^u_h and Σ_{h∈H} k_h ≤ k.
Then simply pick a set Q_h of k_h points of color h. Now if we give DIVIDE the entire dataset C and the set of centers ∪_{h∈H} Q_h as inputs, i.e., call DIVIDE(C, ∪_{h∈H} Q_h), then by the guarantees of DIVIDE each center would be active and each cluster would satisfy the GF constraints at an additive violation of 2. Our final conclusions about the incompatibility and compatibility of the constraints are summarized in Figure 9. Figure 9: (In)Compatibility of clustering constraints. Red arrows indicate an empty feasible set when both constraints are applied, while green arrows indicate a non-empty feasible set when both constraints are applied. E Example for Running DIVIDE: Consider the following example running the DIVIDE subroutine. Specifically, we have a set of points C with a total of n = |C| = 38 points. We have 3 colors (blue, red, and green) with the following counts: |C_blue| = 15, |C_red| = 14, and |C_green| = 9. We have Q ⊆ C with a total size of 4 (|Q| = 4). Accordingly, we have T_blue = 15/4, T_red = 14/4, and T_green = 9/4. Therefore, at the beginning of the iteration for each color h (line (8) in algorithm block 1) we have b_blue = 3, b_red = 2, b_green = 1. Following the execution of the algorithm, the first three centers q = 0 to q = 2 receive ⌈T_blue⌉ many blue points, the last (q = |Q| − 1) and first (q = 0) centers receive ⌈T_red⌉ many red points, and center q = 1 receives ⌈T_green⌉ many green points. All other assignments are the floor ⌊T_h⌋. Figure 10 illustrates this. Figure 10: Diagram illustrating how DIVIDE would run over the example. The tape has different centers (cells) starting from q = 0 and ending with q = |Q| − 1. We go over the tape for each color h ∈ H. In a given row h, centers marked with an X are assigned ⌈T_h⌉ points; otherwise they are assigned ⌊T_h⌋ points. F Additional Experimental Results: Here we show additional experimental results. As a reminder, the upper and lower proportion bounds for any color h are set to α_h = (1 + δ)r_h and β_h = (1 − δ)r_h for some δ ∈ [0, 1].
Further, the DS constraints are set to k^l_h = θr_h k, where θ ∈ [0, 1], and k^u_h = k for every color h ∈ H. We call our run over the Adult dataset in Section 7 (A-Adult); in that run δ = 0.2 and θ = 0.8. We also run another experiment (B-Adult) over Adult where we set δ = 0.05 and θ = 0.9. Figure 11 shows the new results. We do not see a qualitative change. It is perhaps noteworthy that the DS-Violation values for COLOR-BLIND and ALG-GF are even higher, as is the GF-Violation for ALG-DS. On the other hand, we find that our algorithms that satisfy GF+DS have very low (almost zero) values for GF-Violation and DS-Violation at a moderate PoF that is comparable to ALG-GF, which satisfies only one constraint. Figure 11: B-Adult results: (a) PoF comparison of 5 algorithms, with COLOR-BLIND as baseline; (b) GF-Violation comparison; (c) DS-Violation comparison. Additionally, we show results over the Census1990 dataset, where we use age as the color (group) membership attribute. As done in [20], we merge the 9 age groups into 3. Specifically, groups {0, 1, 2}, {4, 5, 6}, and {7, 8} are each merged into one group, leading to a total of 3 groups. Further, we subsample 6,000 records from the dataset. We run two experiments: in the first (A-Census1990) we have δ = 0.05 and θ = 0.7, whereas in the second (B-Census1990) we have δ = 0.1 and θ = 0.8. We also use different cluster values. In terms of the 3 objective measures of PoF, GF-Violation, and DS-Violation, we do not see a qualitative change, as can be seen from Figures 12 and 13. Specifically, the DS algorithm (ALG-DS) has a low PoF but a high GF-Violation. Further, COLOR-BLIND and ALG-GF have significant DS-Violation values. On the other hand, our algorithms for GF+DS have low values for both GF-Violation and DS-Violation and a moderate PoF. Figure 12: A-Census1990 results: (a) PoF comparison of 5 algorithms, with COLOR-BLIND as baseline; (b) GF-Violation comparison; (c) DS-Violation comparison.
Figure 13: B-Census1990 results: (a) PoF comparison of 5 algorithms, with COLOR-BLIND as baseline; (b) GF-Violation comparison; (c) DS-Violation comparison. Run-Time: Here we show some run-time analysis results. We first calculate the incremental run-time over GF. Specifically, given a solution from ALG-GF (the algorithm for the GF constraints), we find that the additional run-time to post-process it to satisfy GF+DS is very small in proportion. We measure t_{GF→GF+DS} = (time to post-process the GF solution to satisfy GF+DS) / (time to obtain the GF solution) and show the results in Figure 14 over all 4 runs. We see that the additional run-time is consistently at least two orders of magnitude smaller than the time required to obtain a GF solution. The fact that the added run-time is small further encourages a decision maker to satisfy the DS constraint given a solution that satisfies GF only. Figure 14: t_{GF→GF+DS} over the 4 runs of (A-Adult), (B-Adult), (A-Census1990), and (B-Census1990). If we do the same using ALG-DS we find the opposite. Specifically, we measure t_{DS→GF+DS} = (time to post-process the DS solution to satisfy GF+DS) / (time to obtain the DS solution) and find that the additional run-time required to satisfy GF+DS starting from a DS solution is orders of magnitude higher than the time required to satisfy DS, as shown in Figure 15. We conjecture that the reason is that ALG-DS is highly optimized in terms of run-time, since it runs in O(nk) time [37]. On the other hand, the post-processing step (post-processing a DS solution to a GF+DS solution) requires solving an LP, which, although done in polynomial time, can be more costly in terms of run-time. Figure 15: t_{DS→GF+DS} over the 4 runs of (A-Adult), (B-Adult), (A-Census1990), and (B-Census1990). Finally, we show a full run-time comparison between GFTOGFDS, which starts from a GF solution, and DSTOGFDS, which starts from a DS solution.
We find that the run-times are generally comparable, with each algorithm at times being faster than the other. Figure 16: Full run-time comparison between GFTOGFDS and DSTOGFDS over the 4 runs of (A-Adult), (B-Adult), (A-Census1990), and (B-Census1990). Using a Bi-Criteria Algorithm as ALG-GF: Our implementation of GF follows Bercea et al. [10] and Bera et al. [9], which may violate the GF constraints by an additive factor of at most 2. Empirically, this could cause issues for the GFTOGFDS algorithm, since it requires a GF algorithm with zero additive violation and therefore assumes that every cluster has at least one point from each color. However, it causes no issues as long as the resulting solution satisfies the condition of having at least one point from each color in every cluster, which is the case if min_{h∈H, i∈S} β_h|C_i| > 2, where S is the GF solution and C_i is its ith cluster. If the condition is not met, then it is reasonable to think that the value of k was set too high or that the dataset includes outlier points, since the cluster sizes are very small. Furthermore, using a GF algorithm with an additive violation of 2 leads to a final GF+DS solution having a GF violation of at most 4 if the condition is satisfied. However, empirically we find the GF violation to be generally smaller than 1. Finally, note that we treat the GF algorithm as a black box, and therefore it can be replaced by other algorithms such as those of [17, 40], which have no violation for GF. In our experiments, we run our algorithms over datasets and values of k where the condition is satisfied.
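The safeguard condition min_{h∈H, i∈S} β_h|C_i| > 2 above is straightforward to verify before running the post-processing step. A minimal sketch, with the function name and toy numbers ours:

```python
def gf_to_gfds_is_safe(clusters, beta):
    """Check min_{h in H, i in S} beta[h] * |C_i| > 2: with a 2-additive GF
    violation, each cluster C_i still has at least beta[h]*|C_i| - 2 > 0
    points of every color h, so every color appears in every cluster."""
    return all(beta[h] * len(Ci) > 2 for Ci in clusters for h in beta)

# Toy example: clusters of sizes 40 and 25 pass with lower bounds beta,
# but shrinking the second cluster to 8 points breaks the condition
# (0.2 * 8 = 1.6, which is not greater than 2).
beta = {"blue": 0.3, "red": 0.2}
print(gf_to_gfds_is_safe([list(range(40)), list(range(25))], beta))  # True
print(gf_to_gfds_is_safe([list(range(40)), list(range(8))], beta))   # False
```

As the text notes, a failed check suggests that k is too large or that outliers are producing very small clusters, rather than a problem with the algorithm itself.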