# Individual Fairness for k-Clustering

Sepideh Mahabadi*¹  Ali Vakilian*²

We give a local search based algorithm for k-median and k-means (and more generally for any k-clustering with ℓp norm cost function) from the perspective of individual fairness. More precisely, for a point x in a point set P of size n, let r(x) be the minimum radius such that the ball of radius r(x) centered at x has at least n/k points from P. Intuitively, if a set of k random points is chosen from P as centers, every point x ∈ P expects to have a center within radius r(x). An individually fair clustering provides such a guarantee for every point x ∈ P. In this work, we show how to get a bicriteria approximation for fair k-clustering: the k-median (k-means) cost of our solution is within a constant factor of the cost of an optimal fair k-clustering, and our solution approximately satisfies the fairness condition (also within a constant factor).

*Equal contribution. ¹TTIC, Chicago, Illinois, USA. ²University of Wisconsin, Madison, Wisconsin, USA. Correspondence to: Sepideh Mahabadi, Ali Vakilian. Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

1. Introduction

Due to the increasing use of machine learning in decision-making tasks such as awarding loans and estimating the likelihood of recidivism (Galindo & Tamayo, 2000; Chouldechova, 2017; Dressel & Farid, 2018; Kleinberg et al., 2018), it is crucial to design fair algorithms from the perspective of each individual input entity. In general, the rich area of algorithmic fairness over the past few years has had two main aspects: (1) understanding different notions of fairness and formalizing them in the context of learning and optimization tasks (e.g., (Feldman et al., 2015; Kleinberg et al., 2017; Chouldechova & Roth, 2018; Mehrabi et al., 2019; Pessach & Shmueli, 2020)), and (2) designing efficient algorithms with respect to the additional properties imposed by the fairness requirement (e.g., (Joseph et al., 2016; Hardt et al., 2016)). Our work in this paper is focused on the latter direction, and in particular the design of fair algorithms for a basic task in unsupervised learning, namely clustering.

Recently, there has been a large body of work on the design of fair algorithms for unsupervised learning tasks and in particular clustering, e.g., (Chierichetti et al., 2017; Bera et al., 2019; Abraham et al., 2019; Har-Peled & Mahabadi, 2019; Elzayn et al., 2019; Baharlouei et al., 2019). Clustering is a fundamental task with a huge number of applications such as feature engineering, recommendation systems and urban planning. Due to its importance, the clustering problem has been studied extensively from the fairness point of view over the past few years (Chierichetti et al., 2017; Rösner & Schmidt, 2018; Kleindessner et al., 2019a;b; Backurs et al., 2019; Chen et al., 2019; Schmidt et al., 2019; Bercea et al., 2019; Bera et al., 2019; Huang et al., 2019; Ahmadian et al., 2019; Davidson & Ravi, 2019; Abraham et al., 2019). However, most previous results on this topic consider the clustering problem with respect to the notion of group fairness.
As introduced by (Chierichetti et al., 2017), in the clustering problem with respect to the group fairness requirement, the high-level goal is to come up with a minimum cost clustering of a given set of points with an extra constraint that requires all clusters to be balanced with respect to a set of specified protected attributes such as gender, race or religion. In this paper, following the work of (Jung et al., 2019), we study the clustering problem from the individual fairness point of view: the goal is to design a clustering of the input point set so that all points are treated (approximately) equally. As an example, this is important when the clustering is used in certain infrastructural decisions such as where to open new facilities to serve residents in different neighborhoods.

Formally, given a point set P of size n, the fair radius r(p) is defined for each point p ∈ P as the minimum radius such that the ball B(p, r(p)) contains at least n/k points from P. Intuitively, this is the radius within which the point p expects to have a center, if the centers were to be chosen uniformly at random. Therefore, it is natural to ask for a clustering solution that approximately respects this expectation and has a center within distance O(r(p)) of every point p ∈ P, thus providing an individually fair clustering solution.

Figure 1.1. This example shows that for any values of α and k, an optimal solution of k-median (or any other ℓp norm cost function such as k-means or k-center) can be arbitrarily unfair. This is described in the proof of Observation 1.1 in Appendix A.1. In this example, M denotes the pairwise distance of points in the left part and D denotes the minimum distance of a point in the right part to a point in the left part. Moreover, the parameters r, R, M and D are picked so that R ≫ r and M = D ≫ 2n(R + r).

As mentioned above, this notion of fair clustering was introduced in (Jung et al., 2019), where the authors showed that it is possible to get a 2-approximate fair clustering, meaning that there exists a feasible solution of a set of k centers where every point p in the input has a center within distance 2r(p). This algorithmic result is based on the previous works of (Chan et al., 2006; Charikar et al., 2010) on metric embedding. Among other results, they showed that this factor of two loss in the fairness is tight: there are metric spaces and configurations of points where for α < 2 it is impossible to find k centers such that for every point p, the distance of p to its center is at most αr(p). Moreover, they showed empirically that the standard k-median (k-means) algorithm does not provide a good fairness guarantee, and further demonstrated the price of fairness on real-world data sets.

1.1. Our Contribution

The above empirical result is confirmed by the following observation, which shows that the solution returned by existing standard approaches for k-clustering that are oblivious to this fairness requirement may be arbitrarily far from being fair.

Observation 1.1. The fairness ratio of an optimal k-clustering can be arbitrarily large. (Proof is in Appendix A.1.)

This in particular shows the need to design efficient algorithms for this notion of fairness. In this work, we show how to find a fair clustering that also minimizes the k-median or k-means cost (and more generally, any ℓp norm cost function with p ≥ 1).
More specifically, we show that a variant of the local search algorithm provides the following (O(1), O(1))-bicriteria approximation for fair k-median and k-means.

Theorem 1.2. Given any desired fairness parameter α ≥ 1, there exists a polynomial time constant factor bicriteria approximation algorithm: the algorithm can find a set of centers C ⊆ P of size k such that for each point p ∈ P, we have d(p, C) ≤ O(α · r(p)), and further, cost(C) ≤ O(cost(OPT_α)). Here cost(OPT_α) denotes the minimum clustering cost of P with k centers such that for every point p, there exists a center within distance α · r(p).

We remark that while in this paper we only mention k-means and k-median clustering, more generally in Appendix B we show that our analysis provides an (O(1), O(p))-approximation for any cost function of the form (∑_{x∈P} d(x, S)^p)^{1/p}; this in particular implies an (O(1), O(1))-approximation for α-fair k-median (when p = 1), an (O(1), O(1))-approximation for α-fair k-means (when p = 2), and an (O(1), O(log n))-approximation for α-fair k-center (when p = log n). Also, we again note that our result is in contrast to the previous result, which only provided one (approximately) feasible fair clustering without minimizing the clustering cost.

Our algorithm is based on the local search algorithm and is easy to implement. More precisely, we start with a feasible solution which combines the output of the described algorithm of (Chan et al., 2006; Charikar et al., 2010; Jung et al., 2019) with the standard greedy algorithm of k-center (described in more detail in Section 3.2). Then, in successive iterations, the algorithm improves the k-median (k-means) cost while respecting the fairness condition. Moreover, in order to show the theoretical guarantee, the local search algorithm should take swaps of size 4, meaning that it considers swapping at most 4 centers in its current solution with points outside of the solution.

Although local search is a widely used approach for k-median (k-means) clustering, our analysis is more involved and introduces new structures that are crucial for handling the fairness constraints. We remark that for the sake of simplicity and readability of the paper, we have not optimized the constants in the approximation factors of the cost and the fairness guarantees.

Experiments. Further, we run experiments on three datasets (Diabetes, Bank, Census) that have been previously used in the context of fair clustering (e.g., see (Chierichetti et al., 2017; Chen et al., 2019; Backurs et al., 2019; Bera et al., 2019; Huang et al., 2019)). Our experiments show that compared to the algorithm of (Jung et al., 2019), the k-median cost improves on average by a factor of 1.86, while losing on fairness by a factor of 1.26 on average.

2. Preliminaries

Throughout the paper, we use P to denote the set of points that we wish to cluster, and the parameter k to denote the number of centers we allow for clustering. For each x ∈ P, we use B(x, r) = {y ∈ P : d(x, y) ≤ r} to denote the set of points that are contained in the ball of radius r around x. Also, in this paper we consider two main variants of clustering costs, k-median and k-means. In k-median, the goal is to select k centers in P such that the total sum of distances of points to their centers is minimized, min_{S⊆P : |S|≤k} ∑_{p∈P} d(p, S), where for a set of points S and a point p, the distance d(p, S) is defined to be min_{s∈S} d(p, s). In k-means, the goal is to select k centers in P such that the sum of the squared distances of points to their centers is minimized, min_{S⊆P : |S|≤k} ∑_{p∈P} d(p, S)². We also use d(x, y) to denote d(x, y)² when working under the k-means cost. More generally, we analyze our algorithm for the fair variant of clustering with any ℓp norm cost function, min_{S⊆P : |S|≤k} (∑_{p∈P} d(p, S)^p)^{1/p}, where p ≥ 1. Besides including k-means and k-median as its special cases, the general ℓp norm cost function implies an approximation guarantee for another common clustering cost function, namely k-center. In standard k-center, the goal is to minimize the maximum distance of points to their centers: min_{S⊆P : |S|≤k} max_{p∈P} d(p, S); note that for p = log₂ n, the ℓp cost is within a factor n^{1/p} = 2 of the k-center cost.
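To make the objective concrete, the following is a minimal Python sketch of the ℓp norm cost just defined (the function name, the use of Euclidean points, and the naive brute-force evaluation are our own illustrative choices; the paper works with an arbitrary metric):

```python
import math

def cost(points, centers, p=1):
    """l_p norm clustering cost: (sum over x of d(x, centers)^p)^(1/p).

    p=1 is the k-median objective; p=2 is the k-means objective up to
    squaring (k-means minimizes this value raised to the power 2);
    p ~ log n approximates the k-center objective within a factor of 2.
    Points and centers are coordinate tuples of equal dimension.
    """
    total = sum(min(math.dist(x, c) for c in centers) ** p for x in points)
    return total ** (1.0 / p)
```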
Next, we formally define a fair radius for each point in the point set.

Definition 2.1 (fair radius). Given a set of n points P in a metric space (X, d) and a parameter ℓ ∈ [n], for each x ∈ P define r_ℓ(x) to be the radius of the minimum ball centered at x that contains at least n/ℓ points of P; r_ℓ(x) = min{r : |B(x, r)| ≥ n/ℓ}.

Definition 2.2 (α-fair clustering (Jung et al., 2019)). Given a set of n points P in a metric space (X, d), a k-clustering using a set of centers S is (α, ℓ)-fair if for any x ∈ P, d(x, S) ≤ α · r_ℓ(x), where d(x, S) denotes the distance of x to its closest neighbor in S. In the case ℓ = k, we succinctly denote it as α-fair k-clustering.

Definition 2.3 (bicriteria approximation). Given a set of points P in a metric space (X, d), an algorithm is a (β, γ)-approximation for α-fair k-clustering of a given cost function cost (k-median, k-means, k-center, or more generally any ℓp norm cost function) if the solution SOL returned by the algorithm satisfies the following properties:

1. Cost guarantee: cost(SOL) ≤ β · cost(OPT_α), where OPT_α denotes an optimal solution of α-fair clustering with respect to the given cost function cost, and

2. Fairness guarantee: SOL is a (γ · α)-fair k-clustering.

Next, we define critical balls, which are crucial in our analysis of the local search algorithm. The notion is used to show that our solution satisfies the fairness guarantee.

Definition 2.4 (critical balls). Given a collection of n points P in a metric space (X, d), a set of balls B1 = B(c*_1, α·r_k(c*_1)), …, Bℓ = B(c*_ℓ, α·r_k(c*_ℓ)) (where ℓ ≤ k) is called critical if it satisfies the following properties:

C-1 For each x ∈ P, d(x, {c*_1, …, c*_ℓ}) ≤ 6α · r_k(x),

C-2 For any pair of centers c*_i and c*_j, d(c*_i, c*_j) > 6α · max{r_k(c*_i), r_k(c*_j)}.

In Lemma 4.1, we show that there exists a polynomial time algorithm for finding a set of critical balls of P. We say that a set of centers S is feasible with respect to a set of given critical balls B if for each B ∈ B, |B ∩ S| ≥ 1; i.e., each critical ball contains a center from S.

Claim 2.5. Let o be a point in a critical ball B ∈ B. The nearest neighbor of o in a given set S of feasible centers with respect to the critical balls cannot belong to a ball other than B. Moreover, the nearest neighbors of two points o1 ∈ B1 and o2 ∈ B2 where B1 ≠ B2 cannot be the same.

Proof: Consider the critical ball B1 = B(c1, α·r_k(c1)) that contains o1. Since S is a feasible center set with respect to B, d(o1, NN_S(o1)) ≤ 2α·r_k(c1). However, the distance of o1 to any other critical ball B2 = B(c2, α·r_k(c2)) is at least d(c1, c2) − α·r_k(c1) − α·r_k(c2) > 4α · max{r_k(c1), r_k(c2)}, using Definition 2.4. Therefore, the nearest neighbor of o1 cannot be in any ball other than B1. Now, if two points o1 ∈ B1 and o2 ∈ B2 are in different balls and have the same nearest neighbor s ∈ S, it means that their distance is at most d(o1, s) + d(o2, s) ≤ 2α·r_k(c1) + 2α·r_k(c2) ≤ 4α · max{r_k(c1), r_k(c2)}. However, by the previous argument their distance is larger than 4α · max{r_k(c1), r_k(c2)}, which is a contradiction.

Lastly, for a point x, let NN_Y(x) denote the nearest neighbor of x in the set of points Y. When Y = P, we drop the subscript and refer to NN_P as NN.
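Definitions 2.1 and 2.2 can be evaluated directly; a naive O(n² log n) Python sketch follows (helper names and the assumption of Euclidean points are our own illustrative choices):

```python
import math

def fair_radii(points, ell):
    """r_ell(x) for every x in P (Definition 2.1): the smallest radius
    whose closed ball around x contains at least n/ell points of P."""
    n = len(points)
    m = math.ceil(n / ell)  # the ball must contain at least n/ell points
    radii = []
    for x in points:
        dists = sorted(math.dist(x, y) for y in points)
        radii.append(dists[m - 1])  # dists[0] = 0 counts x itself
    return radii

def is_alpha_fair(points, centers, alpha, ell):
    """Definition 2.2: every x needs a center within alpha * r_ell(x)."""
    return all(
        min(math.dist(x, c) for c in centers) <= alpha * r
        for x, r in zip(points, fair_radii(points, ell))
    )
```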
3. High Level Description of Our Algorithm

In this section, we provide a high-level description of our local search algorithm for α-fair k-median. We note that the algorithm for α-fair k-clustering with other cost functions (e.g., k-means or more generally any ℓp norm cost) is almost identical, except that in Section 3.2, where we compute the clustering cost in the local search algorithm, instead of working with pairwise distances we consider the corresponding function of distance (e.g., squared distance for k-means).

3.1. Handling Fairness Constraints via Critical Balls

We use a slightly modified variant of the greedy approach of (Chan et al., 2006; Charikar et al., 2010) to find ℓ ≤ k disjoint critical balls (see Lemma 4.1). Then, we ignore the fairness constraint and instead run the local search algorithm with an extra requirement: find a set of k centers that are feasible with respect to the critical balls. By Lemma 4.2 in Section 4, these centers result in an O(α)-fair k-clustering.

Algorithm 1 Finds a set of critical balls of size at most k
1: Input: set of points P of size n, fairness parameter α
2: Z ← P, C* ← ∅
3: repeat
4:   c ← argmin_{x∈Z} r_k(x)
5:   C* ← C* ∪ {c}
6:   Z ← {x ∈ Z : d(x, c) > 6α · r_k(x)}
7: until Z = ∅
8: return {B(c, α·r_k(c)) : c ∈ C*}
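For concreteness, a direct Python rendering of Algorithm 1 might look as follows; it reuses fair_radii from the sketch in Section 2, and representing each ball by the index of its center together with its radius is our own choice:

```python
import math

def critical_balls(points, k, alpha):
    """Direct rendering of Algorithm 1: repeatedly pick the uncovered
    point c with the smallest fair radius r_k (line 4), and discard
    every point that now has a chosen center within 6 * alpha times
    its own fair radius (line 6)."""
    radii = fair_radii(points, k)  # from the sketch in Section 2
    Z = list(range(len(points)))   # indices of still-uncovered points
    centers = []
    while Z:
        c = min(Z, key=lambda i: radii[i])
        centers.append(c)
        Z = [i for i in Z
             if math.dist(points[i], points[c]) > 6 * alpha * radii[i]]
    # the ball for center c is B(points[c], alpha * r_k(points[c]))
    return [(c, alpha * radii[c]) for c in centers]
```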
3.2. Initialization and Local Search Update

Next, we initialize the algorithm with a feasible set of centers S0 with respect to the set of critical balls B constructed by Algorithm 1. Note that the choice of the initial feasible set of centers plays an important role in bounding the number of iterations required by our local search algorithm, and consequently the total runtime of our algorithm. In Theorem 3.1, we show that a modified variant of the standard greedy algorithm of k-center can be used to find a good initial set of centers S0. See part I in Algorithm 2. Then, we go through iterations, and in each iteration j we check whether there exists a swap of size at most t (i.e., replacing t′ ≤ t centers in the current center set S_j with a set of t′ centers outside of S_j) that results in a feasible set of centers S′ with respect to B whose clustering cost improves upon the clustering cost with S_j as centers by a factor of more than 1/(1 − ε), i.e., cost(S′) ≤ (1 − ε) · cost(S_j). If there exists such a set S′, then we set S_{j+1} = S′ and proceed to the next iteration; otherwise, we stop the process and output the current set of centers S_j as our solution. We refer to such S_j as a set of (t, ε)-stable centers; there are no feasible swaps of size at most t with respect to the critical balls that improve the clustering cost significantly. See part II in Algorithm 2.

Algorithm 2 Local search algorithm w.r.t. critical balls
1: Input: set of points P of size n, set of critical balls B along with their centers C*, upper bound t on swap size
2: I. Constructing Initial Center Set S′
3: S′ ← C*
4: for i = 1 to k − ℓ do
5:   z ← argmax_{x∈P\S′} d(x, S′)
6:   S′ ← S′ ∪ {z}
7: end for
8: II. Local Search Update
9: repeat
10:   S ← S′
11:   for i = 1 to t do
12:     for all T1 ⊆ S and T2 ⊆ P \ S of size i do
13:       if (S ∪ T2) \ T1 is feasible w.r.t. B then
14:         S′ ← (S ∪ T2) \ T1
15:         if cost(S′) ≤ (1 − ε) · cost(S) then
16:           break to line 21
17:         end if
18:       end if
19:     end for
20:   end for
21: until cost(S′) > (1 − ε) · cost(S)
22: return S

Theorem 3.1. The local search algorithm stops after O(log n / ε) iterations. Moreover, the runtime of the algorithm is Õ((kn)^{t+1} · ε⁻¹).

Proof: Let C* = {c*_1, …, c*_ℓ} denote the set of centers of the balls returned by Algorithm 1. To construct S0, we run the following modified greedy algorithm of k-center (see part I in Algorithm 2). Initialize S0 to C*. Then, go through k − ℓ rounds, and in each round i add to S0 the point x_i ∈ P \ S0 that is farthest from S0; i.e., for all x ∈ P \ S0, d(x_i, S0) ≥ d(x, S0).

When the greedy algorithm terminates, let µ denote the maximum distance of a point in P \ S0 to S0; µ := max_{x∈P\S0} d(x, S0). Note that S0 is a feasible set of centers with respect to the critical balls B, and the clustering cost with S0 as centers is at most n · µ. Next, we show that the cost of k-clustering with any feasible set of centers with respect to the critical balls (in particular, the set of optimal centers O) is at least Ω(µ). Let x := argmax_{x∈P\S0} d(x, S0) and consider the k + 1 points in T := S0 ∪ {x}. In any k-clustering, at least two points p1, p2 in T belong to the same cluster. There are two cases for p1 and p2:

• At least one of p1, p2 belongs to T \ C*. Assume that p2 denotes the one added to T later than p1 (the last point added to T is x). Note that by the description of the greedy process, for each x_i, d(x_i, C* ∪ {x_1, …, x_{i−1}}) ≥ d(x, S0) = µ. This implies that d(p1, p2) ≥ µ. Then, since d(·) satisfies the triangle inequality (note that the squared distance used in k-means does not satisfy the triangle inequality, but it satisfies the approximate version d(x, y)² ≤ 2d(x, z)² + 2d(z, y)², which is sufficient for our purpose), the distance of at least one of p1 and p2 to the center of their cluster is Ω(µ). Hence, OPT = Ω(µ).

• Both p1, p2 belong to C*. Since all points in C* belong to different critical balls, by Claim 2.5, in any feasible set of centers with respect to B, the points of C* belong to different clusters. Hence, this case cannot happen.

We showed that cost(S0) = O(n · cost(O)) = O(n · OPT). Since in each round of the local search algorithm the cost decreases by a factor of (1 − ε), the total number of rounds before obtaining a (t, ε)-stable set of centers is O(log n / ε). Moreover, the time to decide whether there exists a swap of size at most t that improves the clustering cost by at least a factor of (1 − ε) is O(k^t · n^t · nk), where O(k^t · n^t) bounds the number of swaps of size at most t and O(kn) is the time required to recompute the clustering cost for each potential swap. In total, the algorithm runs in time Õ((kn)^{t+1} · ε⁻¹).

In Sections 4 and 5, we prove that the algorithm provides a bicriteria approximation. All missing proofs of the paper are presented in the appendix (A.2–A.5).
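The sketch below puts parts I and II of Algorithm 2 together for the special case of swap size t = 1, which is the variant we actually run in the experiments of Section 6; it reuses cost and critical_balls from the earlier sketches, keeps centers as indices into P, and is an illustration under those assumptions rather than a tuned implementation:

```python
import math

def local_search(points, k, balls, p=1, eps=None):
    """Algorithm 2 with swap size t = 1; `balls` is the output of
    critical_balls, i.e., (center_index, radius) pairs."""
    n = len(points)
    if eps is None:
        eps = 1.0 / (2 * k)  # eps = 1/(2tk) with t = 1

    def feasible(T):
        # every critical ball must contain at least one center of T
        return all(any(math.dist(points[i], points[c]) <= r for i in T)
                   for c, r in balls)

    def obj(T):
        return cost(points, [points[i] for i in T], p)

    # Part I: seed with the ball centers, then greedy farthest-point additions.
    S = [c for c, _ in balls]
    while len(S) < k:
        S.append(max((i for i in range(n) if i not in S),
                     key=lambda i: min(math.dist(points[i], points[s])
                                       for s in S)))

    # Part II: take any feasible 1-swap improving the cost by a (1 - eps) factor.
    improved = True
    while improved:
        improved = False
        for out in S:
            for inn in range(n):
                if inn in S:
                    continue
                T = [i for i in S if i != out] + [inn]
                if feasible(T) and obj(T) <= (1 - eps) * obj(S):
                    S, improved = T, True
                    break
            if improved:
                break
    return [points[i] for i in S]
```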
4. Handling Fairness Constraints

To satisfy the fairness requirement, as a first step, by slightly modifying the greedy algorithm of (Chan et al., 2006; Charikar et al., 2010), in Lemma 4.1 we show that given a set of points P we can find a set of critical balls in polynomial time. Then, in Lemma 4.2 we show that a feasible set of centers with respect to the critical balls is an approximately α-fair solution.

Lemma 4.1. Given a set of n points P in a metric space (X, d), there is an algorithm that runs in time O(n²) and finds a set of critical balls of size at most k.

Next, we show that in order to provide the fairness guarantee (approximately), it suffices to only satisfy the guarantee for the subset of points generated by Algorithm 1. In particular, this reduces the problem of α-fair k-clustering to an instance of k-clustering with partition constraints: given a collection of points P and a collection of ℓ critical balls B = {B1, …, Bℓ} where ℓ ≤ k, find a minimum cost clustering which is feasible with respect to B. Note that by the definition of α-fairness, it is straightforward to verify that an α-fair k-clustering of P is a feasible k-clustering with respect to the critical balls B. In the following lemma, we prove that any feasible k-clustering with respect to the critical balls is O(α)-fair.

Lemma 4.2. Given a set of ℓ ≤ k critical balls B centered at C* of the input point set P, let S = {s1, …, s_k} be a feasible k-clustering solution with respect to B. Then, the corresponding clustering using S as centers is an O(α)-fair k-clustering of P.

5. Analysis of Local Search Algorithm for α-Fair k-Median

In this section, we analyze our proposed local search algorithm for the α-fair k-median clustering. In Section 5.2, we adapt the analysis of the local search algorithm for the vanilla k-median (Arya et al., 2004) to the fair k-median problem via the mapping and covering introduced in Section 5.1, and show that the local search approach achieves an (O(1), O(1))-approximation for α-fair k-clustering. Similarly, in Appendix B, we follow the local search analysis of (Kanungo et al., 2004; Gupta & Tangwongsan, 2008) via our new mapping and covering constructions to prove an (O(1), O(1))-approximation for α-fair k-means and more generally any ℓp norm cost function.

5.1. Bounded Mapping and Covering

The analysis of the local search for vanilla k-median (and similarly k-means) relies on the existence of a mapping between any set of (t, ε)-stable centers and the set of optimal centers. We use O and S to respectively denote an optimal set of centers for α-fair k-median of P and a set of (t, ε)-stable centers of P with respect to the critical balls B. Note that since not all sets of k points in P are feasible centers with respect to B, we have to deal with extra constraints when we design a mapping π : O → S. Next, we list the desired properties for the mapping π and introduce a covering Q of the edges in π with certain properties that help us bound the approximation guarantee of our algorithm. For simplicity, we assume that the sets S and O are disjoint. We remark that if S ∩ O is non-empty, then we can simply ignore the centers in S ∩ O along with their clusters in the optimal solution, at no cost.

Definition 5.1 (∆-bounded Mapping). Given a pair of sets of points S, O in a metric space (X, d), we say a mapping π : O → S is ∆-bounded if it satisfies the following properties:

D-1 Each center in O is mapped to exactly one center in S.

D-2 For all s ∈ S, at most ∆ centers in O are mapped to s.

Definition 5.2 ((t, γ)-bounded Covering). Let B = {B1, …, Bℓ} be a set of critical balls for P. We define Q = {Q1, …, Q_m} as a covering of all edges in π : O → S, where each Q_i is a subset of the edges (o, π(o)) such that their union covers all edges of the mapping. We refer to each Q_i as a partition. For each partition Q ∈ Q, let O(Q) denote the set of endpoints in Q that belong to O, and let S(Q) denote the set of endpoints in Q that belong to S. We say that the covering Q is (t, γ)-bounded if it satisfies the following properties:

E-1 Each edge (o, π(o)) appears in at least one and at most γ partitions in Q.

E-2 Each partition Q is a set of at most t disjoint edges.
In other words, (i) for each pair o1 ≠ o2 ∈ O(Q), π(o1) ≠ π(o2), and (ii) |O(Q)| = |S(Q)| ≤ t.

E-3 For each partition Q ∈ Q and each critical ball B_j ∈ B, |((B_j ∩ S) \ S(Q)) ∪ (B_j ∩ O(Q))| ≥ 1. In words, after performing the swaps corresponding to the edges in Q, each ball B_j will have at least one center in (S \ S(Q)) ∪ O(Q).

E-4 Given a partition Q, for every pair o ∈ O(Q) and o′ ∈ O \ O(Q), π(o) ≠ NN_S(o′).

5.2. Analysis of Local Search

In Sections 5.3 and 5.4, we prove the main result of this section, which says that given a pair of sets of centers O and S, we can find an O(1)-bounded mapping π together with an (O(1), O(1))-bounded covering Q of the edges in π. In the following, we show that if there exist such a bounded mapping and covering, then the k-median cost of P using a (t, ε)-stable set of centers S is within a constant factor of an optimal α-fair k-median of P (i.e., clustering using O).

Lemma 5.3. Consider a set of points P in a metric space (X, d) and let S be a set of (t, ε = 1/(2tk))-stable centers. Suppose that there exists a pair of a ∆-bounded mapping π : O → S and a (t, γ)-bounded covering Q of the edges in π. Then, cost(S) ≤ 2γ(2∆ + 1) · OPT, where OPT denotes the cost of an optimal α-fair k-median of P (i.e., the k-median cost using O).

5.3. Construction of Mapping π

Let us start with a few notations and claims that we will use to construct our mapping.

Notations. We use NN to denote the subset of points in S that are the nearest neighbors of a point in O; NN = {s ∈ S : ∃o ∈ O s.t. NN_S(o) = s}. Moreover, for any value of i ≥ 0, NN_{=i} denotes the subset of S whose points are the nearest neighbors of exactly i points in O; NN_{=i} = {s ∈ S : |{o ∈ O : NN_S(o) = s}| = i}. We also write NN_{≥i} to denote the set {s ∈ S : |{o ∈ O : NN_S(o) = s}| ≥ i}, and NN⁻¹(s) := {o ∈ O : NN_S(o) = s} to denote the set of optimal centers whose nearest neighbor in S is s.

Figure 5.1. B1 and B2 are safe balls.

Definition 5.4 (Safe Ball/Point/Edge). A ball B ∈ B is safe if

1. either it has at least two points from S,
2. or it contains a point o ∈ O that has a unique nearest neighbor (i.e., NN_S(o) ∈ NN_{=1}).

A point s ∈ S is safe if s ∉ NN_{≥2} and

1. either s is not contained in any ball of B,
2. or the ball containing s is safe.

Moreover, for a pair of vertices o ∈ O and s ∈ S, we say that the edge (o, s) is safe if

1. either s is a safe point,
2. or both s and o belong to the same ball in B.

Lastly, when a ball, point, or edge is not safe, we denote it as unsafe.

Definition 5.5 (s_NN and s_in). For each ball B ∈ B, define s_NN(B) to be the point in S that is the nearest neighbor of an arbitrary point o ∈ B ∩ O. Similarly, for each ball B ∈ B, let s_in(B) be an arbitrary point s ∈ B ∩ S.

Observation 5.6. For any pair of distinct balls B1, B2 ∈ B, s_NN(B1) ≠ s_NN(B2). The observation follows from Claim 2.5.

Observation 5.7. The set of unsafe points that are in NN_{=0} is a subset of ⋃_{B∈B} {s_in(B)}.

Observation 5.8. The number of points in S that are not the nearest neighbor of any point in O is large enough; i.e.,

|NN_{=0}| = ∑_{s∈S} max{0, |NN⁻¹(s)| − 1}.   (5.1)

The following lemma shows that there exist enough safe points in NN_{=0}.

Lemma 5.9. Let F denote the set of safe points in NN_{=0}. Then,

|F| ≥ ∑_{s∈S} max{0, |NN⁻¹(s)| − 2}.   (5.2)

Next, we will define a mapping π : O → S from which we can later derive a covering Q.

Figure 5.2. B is an unsafe ball and s = π(o1) = π(o2). Moreover, s is not the nearest neighbor of any point other than o1 and o2.

Lemma 5.10 (3-bounded Mapping). There exists a mapping π : O → S with the following properties:

1. If NN_S(o) ∈ NN_{=1}, then π(o) = NN_S(o).

2. If NN_S(o) ∈ NN_{≥3}, then the edge (o, π(o)) is safe and π(o) ∈ NN_{=0}.
3. If s is the nearest neighbor of exactly two distinct points o1, o2 in O, then π(o1) = π(o2) ∈ NN_{=0}. Moreover, one of the following holds:

(a) each of the edges (o_i, π(o_i)) is safe;
(b) each of the edges (o_i, π(o_i)) is unsafe and goes to a ball that has a safe outgoing edge;
(c) there exists an (unsafe) ball B ∈ B such that (o1, π(o1)) belongs to B and o2 does not belong to any ball (see Figure 5.2).

4. For each s, |{o ∈ O : π(o) = s}| ≤ 3.

Proof: We design π as follows.

Step 1: NN_S(o) ∈ NN_{=1}. In this case we set π(o) = NN_S(o). This guarantees the first property in the lemma.

Step 2: NN_S(o) ∈ NN_{≥3}. By Lemma 5.9, there exist enough safe points in NN_{=0} so that for each point s ∈ NN_{≥3}, we can allocate a set of safe points F(s) such that (i) |F(s)| = |NN⁻¹(s)| − 2, and (ii) F(s) ∩ F(s′) = ∅ for any s′ ≠ s. Next, we define the mapping π for the set of points in NN⁻¹(s) and map them to the points in F(s) such that the in-degree of each safe point in F(s) is at most ⌈|NN⁻¹(s)| / (|NN⁻¹(s)| − 2)⌉ ≤ 3. Note that for each o′ ∈ NN⁻¹(s), (o′, π(o′)) is a safe edge. This guarantees the second property in the lemma.

Claim 5.11. Let F0 denote the set of points in NN_{=0} to which some points in O are mapped by the end of Step 2. Then, |NN_{=0} \ F0| ≥ |NN_{=2}|.

Proof: Step 2 of the construction of the mapping consumes exactly ∑_{s∈S} max{0, |NN⁻¹(s)| − 2} safe points in NN_{=0}; i.e., |F0| = ∑_{s∈S} max{0, |NN⁻¹(s)| − 2}. On the other hand, by Observation 5.8, |NN_{=0}| = ∑_{s∈S} max{0, |NN⁻¹(s)| − 1}. Hence, |NN_{=0} \ F0| ≥ |NN_{=2}|.

Step 3.a: NN_S(o) ∈ NN_{=2}. First, we assign these points to unused safe points (if any exist) in a way such that if NN(o1) = NN(o2) then π(o1) = π(o2). In this case, both (o1, π(o1)) and (o2, π(o2)) are safe, and the degree of π(o1) = π(o2) is exactly 2, which corresponds to property (3a) in the lemma. Note that by Claim 5.11, at the end of Step 2 the number of free points in NN_{=0} is at least |NN_{=2}|, so we have enough free points in NN_{=0} for mapping the subset of points whose nearest neighbors belong to NN_{=2}.

Claim 5.12. All outgoing edges of the mapping π constructed so far (i.e., in Steps 1, 2 and 3.a) that leave a ball in B are safe.

Proof: All edges of π constructed in Steps 2 and 3.a are safe, as stated earlier. So consider an outgoing edge (o, s) in π where s = NN_S(o) belongs to NN_{=1}. By Claim 2.5, s cannot be in any other ball; since the edge leaves the ball containing o, s is not in any ball and is thus a safe point, and hence (o, s) is safe.

Next, let NN′_{=2} denote the set of points s ∈ NN_{=2} such that the mapping π is not yet defined for the pair o1, o2 with NN_S(o1) = NN_S(o2) = s. Moreover, let NN′_{=0} denote the subset of NN_{=0} that is not yet used in the mapping π so far. Since for each pair o1, o2 with NN(o1) = NN(o2) ∈ NN_{=2} \ NN′_{=2} we have π(o1) = π(o2), it is straightforward to verify that the invariant |NN′_{=0}| ≥ |NN′_{=2}| still holds.

Step 3.b: NN_S(o) ∈ NN′_{=2}.

1. For a pair of points o1 and o2 whose (identical) nearest neighbor belongs to NN′_{=2}, if there exists an unsafe ball B ∈ B that contains a free point s ∈ NN′_{=0} (i.e., one not assigned to any point in O by the mapping π so far) and B contains at least one of o1 and o2, then set π(o1) = π(o2) = s. (Note that using Claim 2.5, o1 and o2 cannot belong to different balls, as they share the same nearest neighbor.)

2. Once there is no such pair of points o1 and o2 as in the previous case anymore, we consider an arbitrary one-to-one assignment φ from the free points in NN′_{=2} to NN′_{=0}. Finally, for a pair of points o1 and o2 where NN_S(o1) = NN_S(o2) = s ∈ NN′_{=2}, we set π(o1) = π(o2) = φ(s).
Next, we show that if (o1, π(o1) = s1) is an unsafe edge (which is true for all edges constructed in Step 3.b), then either property (3b) or (3c) holds. Suppose for contradiction that (o1, s1) goes to a ball B that does not have any safe outgoing edge and o2 ∉ B. Note that by the feasibility of the center set O, the ball B contains a point o ∈ O where o ∉ {o1, o2}. Since (o1, s1) is an unsafe edge, B ∩ S = {s1}, which implies that π(o) does not belong to B. Moreover, by our construction (Step 3.b-1 above), NN_S(o) ∉ NN′_{=2}; otherwise, o would have been mapped to s1. Hence, by Claim 5.12, the outgoing edge (o, π(o)) is safe, which is a contradiction.

5.4. Construction of the Covering Q

Lemma 5.13 ((4, 6)-bounded Covering). Given a mapping π that satisfies the conditions of Lemma 5.10, we can find a (4, 6)-bounded covering Q.

Corollary 5.14 (fair k-median). The local search algorithm with swaps of size at most 4 returns an (84, 7)-bicriteria approximate solution of α-fair k-median of a point set P of size n in time Õ(k⁵n⁵).

Proof: By Lemma 4.2, our local search algorithm returns a (7α)-fair k-clustering. By Lemma 5.3 and the existence of a pair of a 3-bounded mapping and a (4, 6)-bounded covering (see Lemmas 5.10 and 5.13), the cost of the returned k-clustering is not more than 2γ(2∆ + 1) · OPT = 2 · 6 · (2 · 3 + 1) · OPT = 84 · OPT, where OPT is the cost of an optimal α-fair k-clustering. Finally, as we set ε = O(1/k), by Theorem 3.1 the runtime of the algorithm is Õ(k⁵n⁵).

6. Experiments

In this section, we provide an empirical evaluation of our local search based algorithm for α-fair k-clustering with k-median and k-means cost functions. We evaluate the empirical performance of the following approaches:

FAIRKCENTER (Jung et al., 2019). First, we consider the algorithm of (Jung et al., 2019), where they proposed to perform a binary search to find a value of 1 ≤ η ≤ 2 for which the number of critical balls turns out to be exactly k. Then, the centers of these balls are the clustering centers. ((Jung et al., 2019) remarked that while it is not always guaranteed that there exists such a value that results in exactly k balls, this approach finds k centers on most natural datasets.) More precisely, for a given value of η, they run the algorithm of (Chan et al., 2006; Charikar et al., 2010), which is similar to Algorithm 1, except that in line 6 a point x is marked as covered if d(x, c) ≤ η · r_k(x), where c is the newly picked center. We sketch this binary search in code after the descriptions of the three approaches below.

Local Search with 1-Swap. Second, we consider our local search algorithm as described in Algorithm 2. However, to make it faster, we set t, the maximum size of swaps, equal to 1 instead of 4.

Greedy. Finally, as it seems a reasonable heuristic, we also consider the solution after the initialization step of our local search algorithm, described in Algorithm 2, part I. This algorithm first finds the critical balls and then includes their centers in S. Next, it goes through iterations until S becomes of size k; in each iteration, it adds the point which is farthest from the current set S.
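As promised in the description of FAIRKCENTER above, here is a sketch of its η binary search; grow_balls is our hypothetical name for the ball-growing procedure of (Chan et al., 2006; Charikar et al., 2010) run with the coverage rule d(x, c) ≤ η · r_k(x), and, per the remark above, an η yielding exactly k balls need not exist, so the sketch simply returns the centers found at the final η:

```python
import math

def fair_k_center(points, k, iters=30):
    """Binary-search eta in [1, 2] so that growing balls with coverage
    rule d(x, c) <= eta * r_k(x) opens (close to) exactly k balls; the
    ball centers are returned as the clustering centers."""
    radii = fair_radii(points, k)  # from the sketch in Section 2

    def grow_balls(eta):
        Z, centers = list(range(len(points))), []
        while Z:
            c = min(Z, key=lambda i: radii[i])
            centers.append(c)
            Z = [i for i in Z
                 if math.dist(points[i], points[c]) > eta * radii[i]]
        return centers

    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if len(grow_balls(mid)) > k:
            lo = mid  # too many balls: a larger eta covers more points
        else:
            hi = mid  # at most k balls: try shrinking eta
    return [points[c] for c in grow_balls(hi)]
```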
Dataset. We consider three datasets from the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017) (https://archive.ics.uci.edu/ml/datasets), which are standard benchmarks for clustering algorithms; in particular, they were used in the context of fair k-median clustering in (Chierichetti et al., 2017; Chen et al., 2019; Backurs et al., 2019; Bera et al., 2019; Huang et al., 2019). Formally, we consider the following datasets, where in each of them we consider only numerical attributes:

Diabetes. This dataset provides the information and outcome of patients related to diabetes from 1999 to 2008 at 130 hospitals across the US (https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008). Points in this dataset are in R² and correspond to the "age" and "time-in-hospital" attributes.

Bank. This dataset corresponds to information from a Portuguese bank (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing). Here, points live in R³ and correspond to "age", "balance" and "duration-of-account".

Census. This dataset is from the 1994 US Census (https://archive.ics.uci.edu/ml/datasets/adult); the selected attributes are "age", "fnlwgt", "education-num", "capital-gain" and "hours-per-week", so points are in R⁵.

| Dataset | Dimension | # of Points | Aspect Ratio |
|---|---|---|---|
| Diabetes | 2 | 101,765 | 90.2 |
| Bank | 3 | 4,520 | 13,511.9 |
| Census | 5 | 32,560 | 58,685 |

Table 6.1. Some statistics about the datasets used in our experiments. Aspect ratio denotes the ratio between the maximum and minimum pairwise distances.

Finally, in all our experiments we randomly sample a subset of 1000 points from the dataset and run our experiments on this sub-sample.

Experiment Setup. In our experiments, we follow the description of Algorithms 1 and 2. The only discrepancy is that instead of considering a point to be covered if it has a center within distance 6 times its fair radius, in our implementation we consider a point covered if it has a center within distance 3 times its fair radius (see line 6 in Algorithm 1). In all experiments, the input parameter α to our local search algorithms (i.e., the desired fairness guarantee) is the fairness approximation 1 ≤ η ≤ 2 returned by the FAIRKCENTER algorithm of (Jung et al., 2019). Finally, we consider values of k in the range 5 to 30 with steps of size 5 and draw our plots as a function of k.

Figure 6.1. These plots compare the fairness guarantees (y-axis: fairness parameter) of the solutions returned by each of the three algorithms described in this section for fair k-median, as a function of the number of centers k (x-axis, 5 to 30), on the three datasets Diabetes, Bank and Census. [Plots omitted.]

Figure 6.2. These plots compare the k-median costs (y-axis) of the solutions returned by each of the three algorithms described in this section for fair k-median, as a function of the number of centers k (x-axis, 5 to 30), on the three datasets Diabetes, Bank and Census. [Plots omitted.]

Results. Figures 6.1 and 6.2 show empirical comparisons of the aforementioned algorithms, both in terms of fairness and the k-median cost of the solution. Our plots imply that the local search based algorithm performs reasonably well with respect to the notion of α-fairness: while its fairness guarantee is close to that of FAIRKCENTER, it always exhibits a better performance in terms of k-median cost.
More precisely, on average the local search algorithm reports a solution whose cost is better than that of the algorithm of (Jung et al., 2019) by a factor of 1.4, 2.25, and 1.93, while the reported solution has a worse fairness by a factor of 1.13, 1.5, and 1.16, for Diabetes, Bank and Census respectively. We note that we observe a similar behavior of the local search based algorithm for fair k-means. The plots are presented in Appendix C.

7. Conclusion and Future Direction

We presented the first bicriteria approximation algorithm for k-means, k-median and k-center (and more generally, any ℓp norm cost function) with respect to the notion of individual fairness introduced by (Jung et al., 2019). In particular, our analysis shows that the popular local search with constant size swaps achieves an (O(1), O(1))-approximation for fair k-median and fair k-means. While our algorithm runs in polynomial time Õ(k⁵n⁵), an interesting direction is whether we can design a near-linear time algorithm for this problem, presumably with worse approximation guarantees.

Acknowledgements

The authors would like to thank Konstantin Makarychev and Yury Makarychev for their helpful discussions on this work.

References

Abraham, S. S., P, D., and Sundaram, S. S. Fairness in clustering with multiple sensitive attributes. arXiv preprint arXiv:1910.05113, 2019.

Ahmadian, S., Epasto, A., Kumar, R., and Mahdian, M. Clustering without over-representation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 267–275, 2019.

Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., and Pandit, V. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.

Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., and Wagner, T. Scalable fair clustering. In International Conference on Machine Learning, pp. 405–413, 2019.

Baharlouei, S., Nouiehed, M., and Razaviyayn, M. Rényi fair inference. arXiv preprint arXiv:1906.12005, 2019.

Bera, S., Chakrabarty, D., Flores, N., and Negahbani, M. Fair algorithms for clustering. In Advances in Neural Information Processing Systems, pp. 4955–4966, 2019.

Bercea, I. O., Groß, M., Khuller, S., Kumar, A., Rösner, C., Schmidt, D. R., and Schmidt, M. On the cost of essentially fair clusterings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.

Chan, T.-H. H., Dinitz, M., and Gupta, A. Spanners with slack. In European Symposium on Algorithms, pp. 196–207, 2006.

Charikar, M., Makarychev, K., and Makarychev, Y. Local global tradeoffs in metric embeddings. SIAM Journal on Computing, 39(6):2487–2512, 2010.

Chen, X., Fain, B., Lyu, L., and Munagala, K. Proportionally fair clustering. In International Conference on Machine Learning, pp. 1032–1041, 2019.

Chierichetti, F., Kumar, R., Lattanzi, S., and Vassilvitskii, S. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pp. 5036–5044, 2017.

Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

Chouldechova, A. and Roth, A. The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810, 2018.
Cohen, M. B., Elder, S., Musco, C., Musco, C., and Persu, M. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 163–172, 2015.

Davidson, I. and Ravi, S. Making existing clusterings fairer: Algorithms, complexity results and insights. Technical report, University of California, Davis, CA, 2019.

Dheeru, D. and Karra Taniskidou, E. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.

Dressel, J. and Farid, H. The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1):eaao5580, 2018.

Elzayn, H., Jabbari, S., Jung, C., Kearns, M., Neel, S., Roth, A., and Schutzman, Z. Fair algorithms for learning in allocation problems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 170–179, 2019.

Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268, 2015.

Galindo, J. and Tamayo, P. Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Computational Economics, 15(1-2):107–143, 2000.

Gupta, A. and Tangwongsan, K. Simpler analyses of local search algorithms for facility location. arXiv preprint arXiv:0809.2554, 2008.

Har-Peled, S. and Mahabadi, S. Near neighbor: Who is the fairest of them all? In Advances in Neural Information Processing Systems, pp. 13176–13187, 2019.

Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323, 2016.

Huang, L., Jiang, S., and Vishnoi, N. Coresets for clustering with fairness constraints. In Advances in Neural Information Processing Systems, pp. 7587–7598, 2019.

Johnson, W. B. and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189-206):1, 1984.

Joseph, M., Kearns, M., Morgenstern, J. H., and Roth, A. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pp. 325–333, 2016.

Jung, C., Kannan, S., and Lutz, N. A center in your neighborhood: Fairness in facility location. arXiv preprint arXiv:1908.09041, 2019.

Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., and Wu, A. Y. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2-3):89–112, 2004.

Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., and Mullainathan, S. Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1):237–293, 2018.

Kleindessner, M., Awasthi, P., and Morgenstern, J. Fair k-center clustering for data summarization. In International Conference on Machine Learning, pp. 3448–3457, 2019a.

Kleindessner, M., Samadi, S., Awasthi, P., and Morgenstern, J. Guarantees for spectral clustering with fairness constraints. In International Conference on Machine Learning, pp. 3458–3467, 2019b.

Makarychev, K., Makarychev, Y., and Razenshteyn, I. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 1027–1038, 2019.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635, 2019.

Pessach, D. and Shmueli, E. Algorithmic fairness, 2020.

Rösner, C. and Schmidt, M. Privacy preserving clustering with constraints. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), July 9–13, 2018, Prague, Czech Republic, pp. 96:1–96:14, 2018.

Schmidt, M., Schwiegelshohn, C., and Sohler, C. Fair coresets and streaming algorithms for fair k-means. In International Workshop on Approximation and Online Algorithms, pp. 232–251. Springer, 2019.