# Approximate Group Fairness for Clustering

Bo Li 1  Lijun Li 2  Ankang Sun 3  Chenhao Wang 4 *  Yingfan Wang 5

We incorporate group fairness into the algorithmic centroid clustering problem, where k centers are to be located to serve n agents distributed in a metric space. We refine the notion of proportional fairness proposed in [Chen et al., ICML 2019] as core fairness: a k-clustering is in the core if no coalition containing at least n/k agents can strictly decrease their total distance by deviating to a new center together. Our solution concept is motivated by the situation where agents are able to coordinate and utilities are transferable. A string of existence, hardness and approximability results is provided. In particular, we propose two dimensions along which to relax the core requirement: one is the degree of distance improvement, and the other is the size of the deviating coalition. For both relaxations and their combination, we study the extent to which relaxed core fairness can be satisfied in metric spaces including the line, trees and general metric spaces, and design approximation algorithms accordingly.

1 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China  2 School of Mathematical Sciences, Ocean University of China, Qingdao, China  3 Warwick Business School, University of Warwick, United Kingdom  4 University of Nebraska-Lincoln, United States  5 Department of Computer Science, Duke University, United States. Correspondence to: Chenhao Wang.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

1. Introduction

Motivated by various real-world machine learning deployments in which the data points are real human beings who should be treated without bias, fairness has attracted increasing concern. Most traditional algorithms focus mainly on efficiency or profit, and thus fail to ensure fairness for individual points or for collections of points. Accordingly, the past several years have seen considerable effort in developing fair learning algorithms (Chierichetti et al., 2017; Chen et al., 2019; Backurs et al., 2019; Bera et al., 2019). Following (Chen et al., 2019), we revisit group fairness in unsupervised learning, specifically centroid clustering. A canonical clustering problem is described as follows: given a metric space $X$ with distance measure $d : X \times X \to \mathbb{R}_{\ge 0}$, a multiset $N \subseteq X$ of n data points (in this work, each data point is an agent), a set $M \subseteq X$ of possible centers and a positive integer k, the task is to find a subset $Y \subseteq M$ of k cluster centers and assign each data point to its closest center in Y. The commonly studied objective is to make the data points as close to their assigned centers as possible. Standard algorithms, such as k-means and k-medians, solve the clustering problem by optimizing a global criterion, where individual or group-wise happiness is not taken into consideration. It has been noted that globally efficient solutions can be less preferred, especially when the application scenario concerns the allocation of public resources (Conitzer et al., 2017; Fain et al., 2018). We consider the following facility location problem proposed in (Chen et al., 2019; Micha and Shah, 2020).

Example 1. Suppose the government plans to build k = 11 identical parks to serve the residents, where every resident's cost (e.g., gasoline) is proportional to the distance between her home and the closest park. There is a dense urban center with a population of 10,000 and 10 small suburbs, each of which has a population of 100. Suppose the suburbs are close to each other (e.g., 10 km apart) compared with their distance from the urban center (e.g., 500 km). Accordingly, by the k-means or k-medians algorithms, the government will build 1 park at the urban center and 1 park for each of the 10 small suburbs. It is not hard to see that such a plan is not fair: a single park serves 10,000 people in the urban area, but each small suburb of 100 people has its own park.
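To make the numbers concrete, here is a minimal Python sketch of Example 1. The paper fixes only the scale of the instance, so the exact coordinates, and the assumption that the urban residents are spread over a 20 km strip, are our own.

```python
import random

random.seed(0)
# Illustrative coordinates (our assumption): urban residents spread over a
# 20km strip near 0; ten point-suburbs at 500, 510, ..., 590.
urban = sorted(random.uniform(0.0, 20.0) for _ in range(10_000))
suburbs = [500.0 + 10.0 * j for j in range(10)]
agents = urban + [s for s in suburbs for _ in range(100)]
n, k = len(agents), 11                      # n = 11,000, so n/k = 1,000

def total_dist(group, centers):
    return sum(min(abs(a - c) for c in centers) for a in group)

Y = [10.0] + suburbs   # the "efficient" plan: 1 urban park + 1 park per suburb

# The 1,000 leftmost urban residents -- a group of size n/k -- strictly
# reduce their total travel by claiming a park at their own median.
S = urban[:1000]
print(total_dist(S, Y), ">", total_dist(S, [S[500]]))
```

This kind of justified joint complaint, by a group large enough to be entitled to a center of its own, is exactly what the core formalizes below.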
This intuition is formalized by Moulin as the principle of equal entitlement (Moulin, 2003): a solution is fair if it respects the entitlements of groups of agents, namely, every subset of agents of sufficiently large size is entitled to choose a center for themselves. Such group-wise fairness is also referred to as the core, which has been extensively studied in game theory (Deng and Papadimitriou, 1994; Chalkiadakis et al., 2011), social choice (McKelvey, 1986; Feldman and Serrano, 2006) and fair division (Abdulkadiroğlu and Sönmez, 1998; Fain et al., 2018).

Chen et al. (2019) first formally and theoretically studied group fairness in the clustering problem and proposed proportionality, which is essentially the core with non-transferable utilities. Informally, a k-clustering is proportional if there is no coalition of agents of size at least n/k such that, by deviating together to a new center, each member of the coalition gets strictly better off. We note that proportionality overlooks the situation where the agents in a coalition can arrange internal monetary transfers among themselves, so that they can file a justified claim to build a park at a new location that is better for the coalition overall. In this work, we refine proportionality to the core with transferable utilities, or the core for short. Formally, the cost of agent $i \in N$ induced by a cluster center $y \in M$ is $d(i, y)$, the distance between i and y; and the cost induced by a k-clustering Y is $d(i, Y) := \min_{y \in Y} d(i, y)$, the minimum distance from i to any cluster center in Y.

Definition 2 (Core). For any k-clustering Y, a group of agents S with $|S| \ge n/k$ is called a blocking coalition if there is a new center $y' \in M \setminus Y$ such that, by deviating to y' together, the total distance of S strictly decreases, i.e., $\sum_{i \in S} d(i, y') < \sum_{i \in S} d(i, Y)$. A k-clustering is called in the core, or a core clustering, if there is no blocking coalition.

It is not hard to verify that core fairness is stronger than proportionality (Chen et al., 2019), in the sense that a core clustering must be proportional but not vice versa. In particular, in Example 1, although a proportional clustering builds 10 parks for the city center and 1 park for all the suburbs, the park for the suburbs may be built at an arbitrary suburb's location; a core clustering, in contrast, selects a more preferred location for this park, serving the 10 suburbs by minimizing their total distance. Finally, it can be shown that traditional learning algorithms (such as k-means) can be arbitrarily bad with respect to core fairness.
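When the candidate set M is finite, Definition 2 can be checked mechanically. The following Python sketch (the function name and the representation of agents and centers are ours) either finds a blocking coalition or certifies that Y is in the core; it uses the fact, stated as Lemma 5 in Section 2, that only coalitions of the minimum entitled size need to be examined.

```python
import math

def find_blocking_coalition(agents, M, Y, d, k):
    """Return (S, y_new) violating Definition 2, or None if Y is a core
    clustering.  By Lemma 5 (Section 2), only coalitions of the minimum
    size m = ceil(n/k) need checking; for a fixed deviating center y',
    the most profitable such coalition keeps the m agents that save the
    most by moving."""
    m = math.ceil(len(agents) / k)
    dY = lambda i: min(d(i, y) for y in Y)        # d(i, Y)
    for y_new in M:
        if y_new in Y:
            continue
        S = sorted(agents, key=lambda i: dY(i) - d(i, y_new), reverse=True)[:m]
        if sum(d(i, y_new) for i in S) < sum(dY(i) for i in S):
            return S, y_new
    return None
```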
1.1. Main Contribution

As core clusterings are not guaranteed to exist for all instances, we provide in this work two relaxation dimensions that weaken the corresponding requirements. The first dimension is in parallel with (Chen et al., 2019; Micha and Shah, 2020): a valid blocking coalition is required to achieve a large distance improvement. In the second dimension, different from their works, we study the relaxation in which the size of a valid blocking coalition is required to be large enough. We formalize the two relaxations in the following definition.

Definition 3 (Approximate Core). For α ≥ 1 and β ≥ 1, we say a k-clustering Y is in the (α, β)-core, or is an (α, β)-core clustering, if there is no $S \subseteq N$ with $|S| \ge \alpha \cdot n/k$ and $y' \in M \setminus Y$ such that $\beta \cdot \sum_{i \in S} d(i, y') < \sum_{i \in S} d(i, Y)$.

We investigate to what extent these two relaxations can be satisfied by finding the smallest α and β such that the (α, β)-core is nonempty. The relaxation on the size of blocking coalitions is referred to as the α-dimension, and that on the distance improvement as the β-dimension. We consider both general metric spaces and special cases, including the real line and discrete trees. The line and the tree are two widely studied metric spaces, partly because they appear extensively in the real world. For example, a line can model situations where people set temperatures for classrooms or schedule meeting times (Feldman and Wilf, 2013), and trees can model referral or query networks (Kleinberg and Raghavan, 2005; Babaioff et al., 2012). As argued in (Micha and Shah, 2020), though objectives like truthfulness (Alon et al., 2010) and social welfare maximization (Feldman and Wilf, 2013) have received significant attention, fairness is largely overlooked.
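The quantities α and β of Definition 3 are exactly what our experiments in Section 6 measure for concrete clusterings. A brute-force measurement sketch for finite candidate sets, extending the checker above (the binary-search tolerance and the helper names are our own choices):

```python
import math

def is_in_core(agents, M, Y, d, k, alpha=1.0, beta=1.0):
    """Check Definition 3.  By Lemma 5 (Section 2), only coalitions of the
    minimum size m = ceil(alpha*n/k) matter, and for fixed y' and beta the
    best coalition keeps the m agents maximizing d(i, Y) - beta * d(i, y')."""
    m = math.ceil(alpha * len(agents) / k)
    dY = lambda i: min(d(i, y) for y in Y)
    for y_new in M:
        if y_new in Y:
            continue
        gains = sorted((dY(i) - beta * d(i, y_new) for i in agents),
                       reverse=True)
        if sum(gains[:m]) > 0:
            return False
    return True

def min_beta(agents, M, Y, d, k, alpha=1.0, hi=1e6, tol=1e-6):
    """Smallest beta (up to tol) such that Y is an (alpha, beta)-core
    clustering; membership in the (alpha, beta)-core is monotone in beta,
    so binary search applies."""
    lo = 1.0
    if is_in_core(agents, M, Y, d, k, alpha, lo):
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_in_core(agents, M, Y, d, k, alpha, mid):
            hi = mid
        else:
            lo = mid
    return hi
```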
Table 1. Our results for the (1, β)-core.

|             | Line           | Tree           | Metric Space          |
|-------------|----------------|----------------|-----------------------|
| Upper Bound | O(√n) (Thm 8)  | O(√n) (Thm 9)  | 2⌈n/k⌉ + 1 (Thm 10)   |
| Lower Bound | Ω(√n) (Thm 7)  | Ω(√n) (Thm 9)  | Ω(√n) (Thm 7)         |

(1, β)-Core. We first, in Section 3, study the relaxation in the β-dimension, where α is fixed to 1, i.e., the (1, β)-core. Our results are summarized in Table 1. Different from the study of proportionality in (Chen et al., 2019), where a constant approximation can be guaranteed for any metric space, we show that for core fairness the existence of a (1, o(√n))-core clustering is not guaranteed, even on the real line. On the other hand, when the metric space is the real line or a tree, we present efficient algorithms that always output a (1, O(√n))-core clustering, and we thus obtain the optimal approximation when relaxing the distance requirement alone. For general metric spaces, we show that a greedy algorithm ensures an O(n/k)-approximation. Beyond the study of arbitrary k, when k ≥ n/2 we show that for any metric space there is a polynomial-time algorithm that returns a (1, 2)-core clustering, whereas determining the existence of a (1, 2 − ϵ)-core clustering is NP-complete.

Table 2. Our results for the (α, 1)-core.

|             | Line        | Tree        | Metric Space                       |
|-------------|-------------|-------------|------------------------------------|
| Upper Bound | 2 (Thm 14)  | 2 (Thm 15)  | k (Thm 16)                         |
| Lower Bound | 2 (Thm 13)  | 2 (Thm 15)  | min{k, max{k/4, …}} (Thm 16)       |

(α, 1)-Core. Next, in Section 4, we study the relaxation in the α-dimension, where β is fixed to 1, i.e., the (α, 1)-core. Our results are summarized in Table 2. Different from the relaxation in the β-dimension, we prove that a (2, 1)-core clustering is guaranteed to exist in line and tree spaces. We complement this result with a line instance whose (2 − ϵ, 1)-core is empty for any ϵ > 0; thus our algorithms are optimal. For general metric spaces, we observe that a trivial upper bound for α is k, which can be guaranteed by placing the k centers such that the total distance of all agents is minimized. We also complement this observation with a lower-bound instance whose (α, 1)-core is empty for any α up to min{k, max{k/4, …}}, and thus our algorithm is tight up to a constant factor. Finally, we end the section by proving that determining the existence of an (α, 1)-core clustering, for any constant α ≥ 1 in a general metric space, is NP-complete.

(α, β)-Core. In Section 5, we integrate the previous results and study the case where both dimensions can be relaxed. Intuitively, sacrificing the approximation ratio in one dimension should improve the other. We confirm this intuition by quantifying the tradeoff between the two relaxation dimensions. Specifically, for line or tree spaces and any α ∈ (1, 2], the (α, 1/(α − 1))-core is always non-empty (Thm 18); for general metric spaces and any α > 1, the (α, max{4, 2/(α − 1) + 3})-core is always non-empty (Thm 19). We want to highlight the significance of these two results, especially for general metric spaces, as the major theoretical contribution of the current work. The results in Sections 3 and 4 imply that, in a general metric space, if α = 1 is not relaxed, the best possible approximation ratio for β is Θ(√n); on the other hand, if β = 1 is not relaxed, the best possible approximation ratio for α is Θ(k). Our results in this section show, however, that if we sacrifice a small constant on the approximation in one dimension, we can guarantee a constant approximation in the other. For example, by relaxing α to 2, the (2, 5)-core is always non-empty, and by relaxing β to 4, the (3, 4)-core is always non-empty.

Experiments. Finally, in Section 6, we conduct experiments to examine the performance of our algorithms. Our algorithms have good theoretical guarantees in the worst case, but they may not find the fairest clustering on every instance. Accordingly, we first propose a two-stage algorithm to refine the clusters, and then use synthetic and real-world data sets to show how much it outperforms classic algorithms regarding core fairness. The second stage of our algorithm in fact provides an interface to balance fairness and social efficiency; as shown by the experiments, our solution does not sacrifice much efficiency.

1.2. Other Related Works

Fairness Study in Machine Learning. The necessity of incorporating fairness into machine learning algorithms has been well recognized in the recent decade. Various fairness concepts have been proposed, based on different principles and adapted to different machine learning tasks. For example, in another seminal work, Chierichetti et al. (2017) also studied the fairness issue in the clustering context, but defined fairness as preserving equal representation for each protected class in every cluster, based on the notion of disparate impact in US law. To satisfy this fairness requirement, Chierichetti et al. (2017) designed a two-step algorithm that first decomposes the data points into fairlets and then runs classical clustering algorithms on those fairlets. On the one hand, the algorithm has been improved by a number of subsequent works in terms of approximation ratios (Harb and Lam, 2020; Bercea et al., 2019) and running time (Huang et al., 2019; Schmidt et al., 2018; Backurs et al., 2019). On the other hand, Bera et al. (2019), Rösner and Schmidt (2018) and Braverman et al. (2019) extended the setting of (Chierichetti et al., 2017) to allow overlap between protected groups, to consider multiple protected features, or to address fairness towards remote data points.
In the classification context, fairness is captured by envy-freeness in (Zafar et al., 2017; Ustun et al., 2019; Balcan et al., 2019). We refer readers to the survey by Mehrabi et al. (2019) for a detailed discussion of fairness in various machine learning contexts.

Fair Resource Allocation. Group fairness has recently been considered in the resource allocation field, where resources are allocated among groups of agents. Based on how the groups are formed, the models can be classified into two categories. In the first, agents are partitioned into fixed groups and fairness is only concerned with the pre-existing groups (Segal-Halevi and Suksompong, 2019; Benabbou et al., 2019; Kyropoulou et al., 2020). In the second, fairness is considered for arbitrarily formed groups of agents (Aziz and Rey, 2020; Berliant et al., 1992; Conitzer et al., 2019; Hossain et al., 2020). Our problem falls under the umbrella of the second category; however, the items in our work are public, and fairness is defined with respect to collective utilities. The fair clustering problem is also related to public resource allocation, where resources can be shared by agents. For public resources, one popular research agenda is to impose (combinatorial) constraints on the allocations (Conitzer et al., 2017; Cheng et al., 2019; Fain et al., 2018; Li and Li, 2020). In our setting, however, all centers are built without constraints.

2. Preliminaries

Recall that $N \subseteq X$ is a multiset of n agents in a metric space $(X, d)$, where for any $x_1, x_2, x_3 \in X$ we have $d(x_1, x_2) + d(x_2, x_3) \ge d(x_1, x_3)$. Note that we allow repetitions in N, which means multiple agents can be located at the same position. We refer to agents and their locations interchangeably when no confusion arises. $M \subseteq X$ is the set of feasible locations for cluster centers. Our task is to find a set $Y \subseteq M$ of k centers such that the clustering is in the (approximate) core. We first present an example where an exact core clustering does not exist, and illustrate the two relaxation dimensions under which an approximate core clustering exists.

Example 4. Consider a complete graph $K_4 = (V, E)$ with 4 vertices, where the distance between any two vertices is 1. Let X = M = V. Suppose that at each vertex lies an agent (i.e., N = V), and we want to locate k = 2 centers to cluster these n = 4 agents. First, the (exact) core is empty for this instance, because for any 2-clustering Y the remaining n/k = 2 agents (say u and v) in $V \setminus Y$ can form a blocking coalition and deviate to the new center $v \in V \setminus Y$, with $d(u, v) + d(v, v) = 1 < 2 = d(u, Y) + d(v, Y)$. Second, for all β ≥ 2, every solution Y is a (1, β)-core clustering, because for any group $S \subseteq V$ with $|S| \ge 2$ and any possible deviating center $v \in V$, we have $\beta \cdot \sum_{i \in S} d(i, v) \ge 2 \ge \sum_{i \in S} d(i, Y)$. Finally, for all α > 1, every solution Y is an (α, 1)-core clustering, because for any group $S \subseteq V$ with $|S| \ge \alpha \cdot n/k > 2$ and any possible deviating center $v \in V$, we have $\sum_{i \in S} d(i, v) \ge 2 \ge \sum_{i \in S} d(i, Y)$.
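Example 4 is small enough to verify exhaustively. A quick Python check (by Lemma 5, stated shortly, coalitions of the minimum size 2 suffice):

```python
from itertools import combinations

V = range(4)                                  # K4: pairwise distance 1
d = lambda u, v: 0 if u == v else 1
total = lambda S, C: sum(min(d(i, y) for y in C) for i in S)

for Y in combinations(V, 2):                  # every 2-clustering
    pairs = list(combinations(V, 2))          # minimum-size coalitions
    outside = [y for y in V if y not in Y]
    blocked = any(total(S, [y]) < total(S, Y)
                  for S in pairs for y in outside)
    beta2_ok = all(2 * total(S, [y]) >= total(S, Y)
                   for S in pairs for y in outside)
    print(Y, blocked, beta2_ok)   # every Y: blocked (core is empty) yet a
                                  # (1, 2)-core clustering, as argued above
```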
Motivated by the above example, our task is to compute the smallest β or α such that the (1, β)-core or the (α, 1)-core is non-empty. Besides general metric spaces, we are also interested in two special spaces, namely the real line $\mathbb{R}$ and graph spaces such as trees.

Line. $X = M = \mathbb{R}$ and the distance d is Euclidean, i.e., agents and centers can be placed anywhere on the real line.

Graph Space. Let G = (V, E) be an undirected graph, whose edges may have specified lengths. In the graph space induced by G, X = M = V, and the distance d between two vertices is the length of a shortest path between them. A tree space is a graph space induced by a tree.

Note that the line is a continuous space in which every point is feasible for a cluster center, while a graph space is discrete and centers can only be placed on the vertex set. The following Lemma 5 enables us to focus only on blocking coalitions of the minimum possible size.

Lemma 5. Given α, β ≥ 1 and a k-clustering Y, if there is a group S of size $|S| > \lceil \alpha n/k \rceil$ such that $\beta \sum_{i \in S} d(i, y') < \sum_{i \in S} d(i, Y)$ for some $y' \in M$, then there is a group $S' \subset S$ of size $|S'| = \lceil \alpha n/k \rceil$ such that $\beta \sum_{i \in S'} d(i, y') < \sum_{i \in S'} d(i, Y)$.

Proof. If S and $y' \in M$ satisfy $|S| > \lceil \alpha n/k \rceil$ and $\frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} > \beta$, then there must exist an agent $w \in S$ with $\frac{d(w, Y)}{d(w, y')} \le \frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')}$, since the smallest individual ratio cannot exceed the aggregate ratio. Let $S' = S \setminus \{w\}$. Then $\frac{\sum_{i \in S'} d(i, Y)}{\sum_{i \in S'} d(i, y')} \ge \frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} > \beta$. Repeating this process of removing one agent at a time until $|S'| = \lceil \alpha n/k \rceil$ establishes the proof. ∎
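The proof of Lemma 5 is constructive, and the checkers sketched in Section 1 implicitly rely on it. As a procedure (the function name and the zero-distance guard are ours):

```python
def shrink_coalition(S, Y, y_new, d, m):
    """Lemma 5 as a procedure: while |S| > m, drop an agent whose ratio
    d(i, Y)/d(i, y') is minimal; the minimum individual ratio is at most
    the aggregate ratio sum d(i, Y)/sum d(i, y'), so each removal can only
    keep or increase the aggregate ratio.  The tiny constant only guards
    against division by zero for agents located exactly at y'."""
    S = list(S)
    dY = lambda i: min(d(i, y) for y in Y)
    while len(S) > m:
        S.remove(min(S, key=lambda i: dY(i) / (d(i, y_new) + 1e-12)))
    return S
```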
When k = 1 or k = n − 1, the solution that minimizes the total distance to all agents is in the core. When k ≤ n − 2 and the space is a connected graph G = (V, E) with V = X = N = M, by contrast with the result in (Micha and Shah, 2020) that a proportional k-clustering always exists and can be computed efficiently, in our setting a core clustering is not guaranteed. We state a stronger result as follows.

Proposition 6. When k = 1 or k = n − 1, the core is always non-empty. When n/2 ≤ k ≤ n − 2, for any 0 < ϵ ≤ 1, the existence of a (1, 2 − ϵ)-core clustering is not guaranteed.

Proof. For the first claim, let $Y^* = \arg\min_{Y \subseteq M, |Y| = k} \sum_{i \in N} d(i, Y)$, for k = 1 or k = n − 1. If k = 1, then by definition $Y^*$ is in the core, as the whole agent set N is the only possible deviating coalition. For k = n − 1, a deviating coalition must contain at least 2 agents. Suppose for contradiction that agents i and j can decrease their total cost by deviating to $y' \notin Y^*$. We construct a new clustering Y': (1) $y' \in Y'$, and thus $d(i, Y^*) + d(j, Y^*) > d(i, y') + d(j, y') \ge d(i, Y') + d(j, Y')$; (2) each of the other n − 2 centers is placed at the optimal position for one of the remaining n − 2 agents in $N \setminus \{i, j\}$, and thus $d(l, Y') \le d(l, Y^*)$ for every $l \in N \setminus \{i, j\}$. Therefore $\sum_{i \in N} d(i, Y') < \sum_{i \in N} d(i, Y^*)$, contradicting the definition of $Y^*$.

For the second claim, consider the graph space induced by a complete graph G = (V, E) with n vertices, where at each vertex lies an agent, V = X = N = M, and the distance between any two vertices is 1. When n/2 ≤ k ≤ n − 2, the minimum size of a possible blocking coalition is ⌈n/k⌉ = 2. For every k-clustering Y there exist two agents with no center at their locations. They form a blocking coalition (w.r.t. the (1, 2 − ϵ)-core) with a deviating center at one of their locations: their total distance to the deviating center is 1, while their total distance to Y is 2, and (2 − ϵ) · 1 < 2. Hence a (1, 2 − ϵ)-core clustering does not exist. ∎

3. (1, β)-Core

In this section, we study the relaxation on the distance improvement and show to what extent the (1, β)-core is non-empty.

3.1. Line

First, a (1, o(√n))-core clustering is not guaranteed to exist.

Theorem 7. There is a line instance such that the (1, o(√n))-core is empty.

Proof. Consider an instance on the real line with the set of integer points V = {1, 2, ..., k + 1}, each of which accommodates k agents; a k-clustering must serve all n = k(k + 1) agents with k centers. Suppose for contradiction that Y is a (1, o(√n))-core clustering. Since there are k + 1 agent locations and only k centers to be located, there must be a point $j \in V$ such that no center is located in the interval (j − 1/2, j + 1/2), i.e., $(j - 1/2, j + 1/2) \cap Y = \emptyset$. Assume w.l.o.g. that j ≤ k (by symmetry; otherwise relabel the points in reverse order). Consider the group S consisting of the k agents located at j and one agent located at j + 1. Its size is |S| = k + 1 = n/k, which entitles it to choose a center. Because every agent at j has distance at least 1/2 to the solution Y, the total distance of this group is $\sum_{i \in S} d(i, Y) \ge k/2$. However, if they deviate to the new center y' = j, their total distance becomes $\sum_{i \in S} d(i, y') = 1$. Then $\frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} \ge \frac{k}{2} = \Omega(\sqrt{n})$, a contradiction. Hence the (1, o(√n))-core is empty. ∎

Next, we present our algorithm ALGl, shown as Algorithm 1, which matches the lower bound in Theorem 7. Roughly, ALGl has a tunable parameter λ and guarantees that the number of agents strictly between any two contiguous centers is at most λ − 1. By selecting the optimal λ depending on k, we obtain the tight approximation.

Algorithm 1 ALGl(λ) for the Line.
Input: Agent locations x = {x1, ..., xn}, number of centers k ∈ N+
Output: k-clustering Y = {y1, ..., yk}
1: Rename the agents such that x1 ≤ · · · ≤ xn.
2: for i = 1, 2, ..., k − 1 do
3:   Locate a center at $y_i = x_{\lambda i}$.
4: end for
5: Let r = min{λk, n}, and locate a center at $y_k = x_r$.

Theorem 8. For any line instance, a (1, O(√n))-core clustering can be found in linear time. Specifically, ALGl(⌈n/k⌉) gives a (1, ⌈n/k⌉ − 1)-core clustering if k = Ω(√n), and ALGl(⌈n/(k+1)⌉) gives a (1, k)-core clustering if k = o(√n).

Proof. Let x = {x1, ..., xn} be the locations of the agents. When k = Ω(√n) and when k = o(√n), we define λ = ⌈n/k⌉ and λ = ⌈n/(k+1)⌉ respectively, and run ALGl(λ); the output is a k-clustering Y = {y1, ..., yk}. We now prove that this solution is a (1, ⌈n/k⌉ − 1)-core clustering when k = Ω(√n), and a (1, k)-core clustering when k = o(√n); therefore it is always a (1, O(√n))-core clustering.

Suppose for contradiction that there is a blocking coalition $S \subseteq N$ with $|S| \ge n/k$ and a deviating center $y' \in \mathbb{R} \setminus Y$ satisfying $r := \frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} > \lceil n/k \rceil - 1$ (resp. > k) when k = Ω(√n) (resp. k = o(√n)). Set two virtual points $y_0 = -\infty$ and $y_{k+1} = +\infty$. Assume w.l.o.g. that $y' \in (y_j, y_{j+1})$ for some j = 0, ..., k and that $d(y_j, y') \le d(y_{j+1}, y')$. Let $(N_1, N_2, N_3)$ be the partition of S with $N_1 = \{i \in S \mid x_i \le y_j\}$, $N_2 = \{i \in S \mid y_j < x_i < y_{j+1}\}$, and $N_3 = \{i \in S \mid x_i \ge y_{j+1}\}$. Note that the algorithm guarantees $|N_2| \le \lambda - 1$. Then

$$\sum_{i \in S} d(i, Y) \le \sum_{i \in N_1} d(i, y_j) + \sum_{i \in N_2} \big(d(i, y') + d(y', y_j)\big) + \sum_{i \in N_3} d(i, y_{j+1}), \tag{1}$$

$$\sum_{i \in S} d(i, y') = \sum_{i \in N_1} \big(d(i, y_j) + d(y_j, y')\big) + \sum_{i \in N_2} d(i, y') + \sum_{i \in N_3} \big(d(i, y_{j+1}) + d(y_{j+1}, y')\big). \tag{2}$$

Combining Equations (1) and (2), and using r > 1 to drop the terms common to both sides, it follows that

$$r \le \frac{\sum_{i \in N_2} d(y', y_j)}{\sum_{i \in N_1} d(y_j, y') + \sum_{i \in N_3} d(y_{j+1}, y')} \le \frac{|N_2|}{|N_1 \cup N_3|},$$

where the second inequality is due to the assumption $d(y_j, y') \le d(y_{j+1}, y')$.

When k = Ω(√n) and λ = ⌈n/k⌉, we have $|N_2| \le \lceil n/k \rceil - 1$ and $|N_1 \cup N_3| = |S| - |N_2| \ge 1$, which gives $r \le \lceil n/k \rceil - 1$, a contradiction. So Y is a (1, ⌈n/k⌉ − 1)-core clustering. When k = o(√n) and λ = ⌈n/(k+1)⌉, we have $|N_2| \le \lceil n/(k+1) \rceil - 1$ and $|N_1 \cup N_3| = |S| - |N_2| \ge \lceil n/k \rceil - \lceil n/(k+1) \rceil + 1$. A simple calculation gives r ≤ k, again a contradiction. So Y is a (1, k)-core clustering. ∎
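A direct Python sketch of Algorithm 1 (our transcription; clamping every index at n mirrors what Line 5 does for the last center):

```python
import math

def alg_l(x, k, lam):
    """ALGl(lambda): sort the agents and place the i-th center at the
    (lam * i)-th agent location (1-indexed), clamping indices at n.  With
    lam = ceil(n/k), at most lam - 1 agents lie strictly between two
    contiguous centers, which is the property the analysis uses."""
    x = sorted(x)
    n = len(x)
    return [x[min(lam * i, n) - 1] for i in range(1, k + 1)]

# Per Theorem 8: alg_l(x, k, math.ceil(n / k)) when k is Omega(sqrt(n)),
# and alg_l(x, k, math.ceil(n / (k + 1))) when k is o(sqrt(n)).
```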
3.2. Tree

Both the lower-bound and upper-bound results for the line can be extended to trees. For proportional fairness, Micha and Shah (2020) proposed the algorithm Proportionally Fair Clustering for Trees (PFCT), which always returns a proportional solution on trees, as follows. A rooted tree (G, r) is obtained by rooting the tree at an arbitrary node r. Let level(x) denote the depth of node x relative to the root r (with level(r) = 1), and let ST(x) denote the subtree rooted at node x (i.e., the set of nodes v with level(v) ≥ level(x) such that the unique path from v to r passes through x). Let |ST(x)| be the number of agents contained in the subtree ST(x). PFCT traverses the nodes from the highest level (i.e., the leaves) upwards, locates a center on any node whose subtree contains at least n/k agents, and then deletes that subtree. At the end, agents are assigned to the closest center in the output.

We adapt PFCT into the following ALGt(λ) with a tunable parameter λ, which controls the threshold for opening a center. When λ = ⌈n/k⌉, ALGt(λ) is equivalent to PFCT.

Algorithm 2 ALGt(λ) for Trees.
Input: A tree G = (V, E), n agents, and an integer k
Output: k-clustering Y
1: Let r be the root of the tree G and let h be its height.
2: Y ← ∅
3: G_h ← G
4: for ℓ = h down to 1 do
5:   G_{ℓ−1} ← G_ℓ
6:   for every x ∈ V with level(x) = ℓ and |ST(x)| ≥ λ do
7:     if |Y| < k then
8:       Y ← Y ∪ {x} and G_{ℓ−1} ← G_{ℓ−1} \ ST(x)
9:     end if
10:   end for
11: end for

Note that we always have |Y| ≤ k, and if |Y| < k, we can place the remaining k − |Y| centers arbitrarily. When λ = ⌈n/k⌉ or ⌈n/(k+1)⌉, we observe that after removing all centers in Y from the tree, the number of agents in every remaining component is at most λ − 1. ALGt(λ) can be viewed as an extension of ALGl(λ) to trees, and we have the following theorem.

Theorem 9. For any instance on a tree, we can find a (1, O(√n))-core clustering efficiently. In particular, when k = Ω(√n), ALGt(⌈n/k⌉) returns a (1, ⌈n/k⌉ − 1)-core clustering, and when k = o(√n), ALGt(⌈n/(k+1)⌉) returns a (1, k)-core clustering. Moreover, a (1, o(√n))-core clustering is not guaranteed to exist.

3.3. General Metric Space

For general metric spaces, we show that a simple greedy algorithm ALGg, described as Algorithm 3, has the desired theoretical guarantee. For each $x \in X$, we use $B(x, \delta) = \{i \in N \mid d(i, x) \le \delta\}$ to denote the set of agents in the ball with center x and radius δ. ALGg continuously grows a ball around every possible center, all at the same speed; when a ball becomes large enough to contain at least n/k agents, a cluster center is opened at that ball's center. The underlying idea of expanding balls of points has been utilized for k-median problems (Jain and Vazirani, 1999), and Chen et al. (2019) and Micha and Shah (2020) proved that ALGg also has good theoretical performance regarding proportional fairness. We note that ALGg may output fewer than k centers; if so, the remaining centers can be selected arbitrarily. In Section 6, we will show how to refine ALGg beyond the worst case.

Algorithm 3 ALGg for General Metric Spaces.
Input: Metric space (X, d), agents N ⊆ X, possible locations M ⊆ X, and k ∈ N+
Output: k-clustering Y
1: δ ← 0; Y ← ∅; N′ ← N.
2: while N′ ≠ ∅ do
3:   Smoothly increase δ.
4:   while ∃ x ∈ Y s.t. |B(x, δ) ∩ N′| ≥ 1 do
5:     N′ ← N′ \ B(x, δ)
6:   end while
7:   while ∃ x ∈ M \ Y s.t. |B(x, δ) ∩ N′| ≥ n/k do
8:     Y ← Y ∪ {x} and N′ ← N′ \ B(x, δ)
9:   end while
10: end while
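Because there are only finitely many agent-candidate distances, the continuous growth of δ in ALGg can be simulated by jumping between those distances. A Python sketch under that discretization (tie-breaking between simultaneously qualifying balls is arbitrary here, as it is in the continuous process):

```python
import math

def alg_g(N, M, k, d):
    """Sketch of Algorithm 3 (ALGg): grow all balls at the same speed and
    open a center whenever a candidate's ball captures at least ceil(n/k)
    still-unserved agents.  May return fewer than k centers; the remaining
    ones can be chosen arbitrarily, as noted in the text."""
    thresh = math.ceil(len(N) / k)
    active = set(range(len(N)))                  # indices of unserved agents
    Y = []
    for delta in sorted({d(N[i], x) for i in range(len(N)) for x in M}):
        if not active:
            break
        for y in Y:                              # open centers absorb first
            active -= {i for i in active if d(N[i], y) <= delta}
        for x in M:                              # then new centers may open
            if x in Y:
                continue
            ball = {i for i in active if d(N[i], x) <= delta}
            if len(ball) >= thresh:
                Y.append(x)
                active -= ball
    return Y
```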
Theorem 10. For any instance in a metric space (X, d), algorithm ALGg always outputs a (1, 2⌈n/k⌉ + 1)-core clustering.

Proof. Let Y be the k-clustering returned by ALGg. By Lemma 5, it suffices to prove that for any set of agents $S \subseteq N$ with $|S| = \lceil n/k \rceil$ and any point $y' \in M \setminus Y$, it holds that $\sum_{i \in S} d(i, Y) \le (2\lceil n/k \rceil + 1) \sum_{i \in S} d(i, y')$. Let $y \in Y$ be the center closest to y', i.e., $y \in \arg\min_{z \in Y} d(z, y')$. Since $d(i, Y) \le d(i, y)$ for any $i \in S$, we have

$$\sum_{i \in S} d(i, Y) \le \sum_{i \in S} d(i, y) \le \sum_{i \in S} d(i, y') + \sum_{i \in S} d(y', y). \tag{3}$$

We discuss two cases.

Case 1: $\max_{i \in S} d(i, y') \ge \frac{1}{2} d(y', y)$. Then $\sum_{i \in S} d(i, y') \ge \max_{i \in S} d(i, y') \ge \frac{1}{2} d(y', y)$. Combining with (3), it follows that

$$\frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} \le 1 + \frac{\sum_{i \in S} d(y', y)}{\sum_{i \in S} d(i, y')} \le 1 + \frac{|S| \cdot d(y', y)}{d(y', y)/2} = 1 + 2\lceil n/k \rceil.$$

Case 2: $\max_{i \in S} d(i, y') < \frac{1}{2} d(y', y)$. Let $\delta^* = \max_{i \in S} d(i, y')$; accordingly, $S \subseteq B(y', \delta^*)$. If there existed a center $y'' \in Y$ such that $B(y'', \delta^*) \cap B(y', \delta^*) \ne \emptyset$, we would have $d(y'', y') \le 2\delta^* < d(y', y)$, contradicting the definition of y. So when the algorithm's operating radius δ reaches δ*, all balls centered at points of Y have an empty intersection with the ball B(y', δ*), that is, $B(y', \delta^*) \cap B(z, \delta^*) = \emptyset$ for all $z \in Y$. Since $|S| = \lceil n/k \rceil$ and $S \subseteq B(y', \delta^*)$, the point y' must have been selected as a center by the algorithm, contradicting $y' \notin Y$. Therefore this case never occurs. ∎

When the number k of cluster centers is large, we can obtain better approximations. Recall from Proposition 6 that when k = n − 1 the core is always non-empty. We complement this result with the following theorem.

Theorem 11. For any metric space with n/2 ≤ k ≤ n − 2, there is an algorithm that computes a (1, 2)-core clustering in O(n² log n) time.

The (1, 2)-core clustering provided above is best possible, because by Proposition 6 a (1, 2 − ϵ)-core clustering may not exist. Finally, we end this section by proving that deciding the existence of a (1, 2 − ϵ)-core clustering is hard.

Theorem 12. For any 0 < ϵ ≤ 1, the problem of determining the existence of a (1, 2 − ϵ)-core clustering is NP-complete, even if k ≥ n/2 and the instance is a graph space induced by G = (V, E) with |V| = n.

4. (α, 1)-Core

Next, we study to what extent core fairness can be achieved when only the requirement on the blocking coalition's size is relaxed, i.e., we seek the smallest α such that the (α, 1)-core is non-empty.

4.1. Line

Theorem 13. In the line space, for any ϵ > 0, there exists an instance such that the (2 − ϵ, 1)-core is empty.

Proof. Consider an instance on the real line with n = C(2C − 1) agents and k = 2C − 1 centers to be opened, for some integer C ≥ 3/ϵ. Let K be a sufficiently large number, say K = Cn. The agents are partitioned into C parts $(N_1, \ldots, N_C)$ such that for every $j \in [C]$, $N_j = \{i_1, i_2, \ldots, i_{2C-1}\}$ contains 2C − 1 agents, where $i_1, \ldots, i_{C-1}$ are located at jK, agent $i_C$ is located at jK + 1, and $i_{C+1}, \ldots, i_{2C-1}$ are located at jK + 2. As we only open 2C − 1 centers, for any clustering there exists a part $N_j$ whose agents are all served by a single center. Since the agents in $\{i_1, \ldots, i_{C-1}\}$ and $\{i_{C+1}, \ldots, i_{2C-1}\}$ are symmetric with respect to $i_C$, assume w.l.o.g. that this center is placed at some $x \ge jK + 1$. Note that $2C - 3 \ge (2 - \epsilon) \cdot n/k$, since n/k = C and C ≥ 3/ϵ. We show that $S = \{i_1, \ldots, i_{2C-3}\}$ forms a blocking coalition w.r.t. the (2 − ϵ, 1)-core by deviating to a new center at $i_1$'s location:

$$\sum_{i \in S} d(i, i_1) = 1 + 2(C - 3) = 2C - 5 < 2C - 4 = \sum_{i \in S} d(i, i_C) \le \sum_{i \in S} d(i, x).$$

Hence there does not exist a (2 − ϵ, 1)-core clustering. ∎

Fortunately, by selecting a proper parameter λ, algorithm ALGl guarantees the tight approximation ratio.

Theorem 14. For any line instance, ALGl(⌈n/k⌉) returns a (2, 1)-core clustering.

4.2. Tree

We continue with tree spaces.

Theorem 15. For any tree instance, we can find a (2, 1)-core clustering efficiently. Moreover, for any ϵ > 0, a (2 − ϵ, 1)-core clustering is not guaranteed to exist on a tree.
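As a sanity check of Theorems 13 and 14, one can run the alg_l and is_in_core sketches from earlier sections on the lower-bound instance above (the choice C = 4 is arbitrary; on a line, restricting deviating centers to agent locations loses nothing, since a coalition's best center is its median):

```python
import math

C = 4
K = C * C * (2 * C - 1)                       # "sufficiently large" spacing
agents = []
for j in range(1, C + 1):
    agents += [j * K] * (C - 1) + [j * K + 1] + [j * K + 2] * (C - 1)
n, k = len(agents), 2 * C - 1                 # n = C(2C-1), n/k = C

Y = alg_l(agents, k, math.ceil(n / k))
d = lambda a, b: abs(a - b)
M = sorted(set(agents))                       # candidate deviating centers
print(is_in_core(agents, M, Y, d, k, alpha=2.0, beta=1.0))    # True (Thm 14)
print(is_in_core(agents, M, Y, d, k, alpha=1.25, beta=1.0))   # False: with
# eps = 0.75 and C = 4 >= 3/eps, a (2 - eps, 1)-blocking coalition exists
```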
4.3. General Metric Space

By the definition of the (α, 1)-core, a (k, 1)-core clustering always exists: any potential blocking coalition must contain all n agents, so any solution containing the single center that minimizes the total distance of all agents is a (k, 1)-core clustering. Next, we show that this trivial upper bound is asymptotically tight.

Theorem 16. The (k, 1)-core is always non-empty. Further, the existence of an (α, 1)-core clustering, for any α up to min{k, max{k/4, …}}, is not guaranteed.

Finally, we give a hardness result for the existence of an (α, 1)-core clustering.

Theorem 17. For any given constant α ≥ 1, the problem of determining the existence of an (α, 1)-core clustering is NP-complete.

5. (α, β)-Core

Finally, we study approximate fairness when both dimensions are relaxed simultaneously. Recalling the results of Sections 3.1 and 4.1: if α = 1, the best possible approximation of β is Θ(√n); however, if α is relaxed to 2, we can achieve the optimum in the β-dimension, i.e., β = 1. The following theorem gives the exact tradeoff between the approximations in the two dimensions for line and tree spaces.

Theorem 18. For any α > 1, every line or tree instance has a non-empty (α, β)-core with β = max{1, 1/(α − 1)}.

Next, we extend the above theorem to general metric spaces. Again, our results in Sections 3.3 and 4.3 show that if the α-dimension is not relaxed, the best approximation in the β-dimension is Θ(√n), and if the β-dimension is not relaxed, the best approximation in the α-dimension is Θ(k). As the following theorem shows, however, if we sacrifice a small constant in one dimension, we can guarantee a constant approximation in the other as well.

Theorem 19. For any α > 1, every instance in a metric space has a non-empty (α, β)-core with β = max{4, 2/(α − 1) + 3}.

Proof. Let Y be the k-clustering returned by ALGg. By Lemma 5, it suffices to prove that for any set of agents $S \subseteq N$ with $|S| = \lceil \alpha n/k \rceil$ and any point $y' \in M \setminus Y$,

$$\sum_{i \in S} d(i, Y) \le \max\left\{4, \frac{2}{\alpha - 1} + 3\right\} \sum_{i \in S} d(i, y').$$

Suppose for contradiction that there is a blocking coalition $S \subseteq N$ with $|S| = \lceil \alpha n/k \rceil$ and a deviating center $y' \in M \setminus Y$. Let $y \in Y$ be the cluster center closest to y', i.e., $y \in \arg\min_{z \in Y} d(z, y')$, and note that $d(i, Y) \le d(i, y)$ for any $i \in S$. Because S is a blocking coalition, we have

$$r := \frac{\sum_{i \in S} d(i, Y)}{\sum_{i \in S} d(i, y')} > \beta = \max\left\{4, \frac{2}{\alpha - 1} + 3\right\}.$$

Consider the open ball centered at y' with radius $R := d(y', y)/2$, and let $X' = \{i \in S \mid d(i, y') < R\}$. Note that $|X'| \le \lceil n/k \rceil - 1$, for otherwise ALGg would have selected y' as a center. Define

$$r_1 := \frac{\sum_{i \in X'} d(i, Y)}{\sum_{i \in S} d(i, y')} \quad \text{and} \quad r_2 := \frac{\sum_{i \in S \setminus X'} d(i, Y)}{\sum_{i \in S} d(i, y')}.$$

Then r = r1 + r2. For r1, if r1 ≥ 1, we have

$$r_1 \le \frac{\sum_{i \in X'} d(i, y') + \sum_{i \in X'} d(y', y)}{\sum_{i \in X'} d(i, y') + \sum_{i \in S \setminus X'} d(i, y')} \le \frac{\sum_{i \in X'} d(y', y)}{\sum_{i \in S \setminus X'} d(i, y')} \le \frac{|X'| \cdot 2R}{|S \setminus X'| \cdot R} \le \frac{2}{\alpha - 1},$$

where the middle step uses r1 ≥ 1 to drop the common term, the next step uses d(y', y) = 2R and $d(i, y') \ge R$ for $i \in S \setminus X'$, and the last step uses $|X'| \le \lceil n/k \rceil - 1$ and $|S \setminus X'| \ge \lceil \alpha n/k \rceil - \lceil n/k \rceil + 1 \ge (\alpha - 1) \cdot n/k$. Combining with the other case, r1 < 1, we have $r_1 \le \max\{1, \frac{2}{\alpha - 1}\}$.

Next, we derive an upper bound for r2:

$$r_2 \le \frac{\sum_{i \in S \setminus X'} \big(d(i, y') + d(y', y)\big)}{\sum_{i \in S} d(i, y')} \le 1 + \frac{|S \setminus X'| \cdot 2R}{|S \setminus X'| \cdot R} = 3,$$

where the last inequality again uses d(y', y) = 2R and $d(i, y') \ge R$ for every $i \in S \setminus X'$. Combining the upper bounds on r1 and r2, we obtain $\beta < r = r_1 + r_2 \le \max\{4, \frac{2}{\alpha - 1} + 3\} = \beta$, a contradiction. ∎
6. Experiments

6.1. A Refined Algorithm ALG+g(obj)

Before examining the performance of our algorithms, we note that although ALGg has a good guarantee in the worst case, it may not produce the fairest clustering on every instance. For example, in Figure 1a, we randomly partition 1,000 nodes into 3 Gaussian-distributed sets with probabilities 0.2, 0.3 and 0.5 (from left to right). We want to build k = 10 centers to serve the nodes; however, ALGg returns only 4 clusters, whose centers are shown by the red stars. Obviously, the 6 extra centers could significantly improve its performance, regarding both fairness and efficiency.

To improve the performance, we refine ALGg into Algorithm 4, denoted ALG+g(obj). Roughly, we first use ALGg to obtain a preliminary clustering Y and the induced partition $N = \cup_i N_i$, which guarantees fairness in the worst case, and we then proportionally assign all k centers to these clusters according to their populations, i.e., $\sum_i k_i = k$ and $k_i \propto |N_i|$. Within each preliminary cluster i, the actual centers are selected to optimize a social objective function obj(M, N_i, k_i) in Line 8. For example, when obj is the k-means objective (i.e., minimizing the total squared Euclidean distance), we refer to our algorithm as ALG+g(k-means). Thus obj provides an interface to balance fairness and social efficiency.

Algorithm 4 ALG+g(obj) for General Metric Spaces.
Input: Metric space (X, d), agents N ⊆ X, possible locations M ⊆ X, and k ∈ N+
Output: k-clustering C
1: Initialize C ← ∅.
2: Run ALGg on (M, N, k) and obtain Y = {y1, ..., y_{k′}}.
3: Let N_i be the cluster corresponding to center y_i.
4: Rename the clusters so that $(|N_1| \bmod \frac{n}{k}) \ge \cdots \ge (|N_{k'}| \bmod \frac{n}{k})$.
5: Let $r = k - \sum_{i=1}^{k'} \lfloor |N_i| \cdot k/n \rfloor$.
6: for i = 1, ..., k′ do
7:   Let $k_i = \lceil |N_i| \cdot k/n \rceil$ if i ≤ r; otherwise, $k_i = \lfloor |N_i| \cdot k/n \rfloor$.
8:   $\{y_{i1}, \ldots, y_{ik_i}\} \leftarrow \arg\min \mathrm{obj}(M, N_i, k_i)$.
9:   $C \leftarrow C \cup \{y_{i1}, \ldots, y_{ik_i}\}$.
10: end for

In the experiment shown in Figure 1a, we instantiate ALG+g(k-means) with the k-means++ algorithm (Arthur and Vassilvitskii, 2007)¹, which builds centers proportionally to the populations of the three Gaussian sets (i.e., 2, 3 and 5 centers, respectively). However, if we directly run the k-means++ algorithm on N, it builds 4, 3 and 3 centers for the three Gaussian clusters, as shown in Figure 1b, where the rightmost set contains half of all points but gets only 3 centers.

¹ The k-means++ algorithm is Lloyd's algorithm for the k-means minimization objective with a particular initialization.
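A Python sketch of ALG+g(obj), reusing the alg_g sketch from Section 3.3; the largest-remainder apportionment in the middle is our reading of Lines 4-7, and obj_solver stands for any routine optimizing the chosen social objective (e.g., a k-means++ run restricted to one preliminary cluster):

```python
def alg_g_plus(N, M, k, d, obj_solver):
    """ALG+g(obj): run ALGg for a fair preliminary partition, apportion the
    k centers to the preliminary clusters proportionally to their sizes
    (largest remainders round up), then let obj_solver(M, Ni, ki) pick ki
    centers inside each cluster to optimize the social objective."""
    n = len(N)
    Y = alg_g(N, M, k, d)
    clusters = {y: [] for y in Y}
    for a in N:                                   # assign to closest center
        clusters[min(Y, key=lambda y: d(a, y))].append(a)
    parts = sorted(clusters.values(), key=lambda Ni: -(len(Ni) * k % n))
    quotas = [len(Ni) * k // n for Ni in parts]
    r = k - sum(quotas)                           # centers left to distribute
    C = []
    for i, Ni in enumerate(parts):
        ki = quotas[i] + (1 if i < r else 0)
        C += obj_solver(M, Ni, ki)
    return C
```

The choice of obj_solver is exactly the fairness-efficiency interface mentioned above: plugging in a k-means routine yields ALG+g(k-means).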
6.2. Experiments

We run experiments on two qualitatively different datasets used for clustering. (1) Gaussian dataset (synthetic): the experiment of Section 6.1 (Figures 1a and 1b) is repeated for k from 8 to 17. (2) Mopsi locations from the clustering benchmark datasets (Fränti and Sieranoja, 2018) (real-world): a set of 2-D locations of n = 6014 users in Joensuu. Note that the second dataset, concerning human beings, is exactly a situation in which the data points should be treated fairly. Both datasets are in the Euclidean plane. We take the k-means objective as the social cost and compare our algorithm ALG+g(k-means) with k-means++. For each dataset, we consider a range of values of k. Figures 1c and 1e show the values of α (the minimum value such that the output clustering is an (α, 1)-core) and the ratio between the social costs of the two algorithms; Figures 1d and 1f show the values of β.

[Figure 1: (a) ALG+g(k-means) and (b) k-means++ centers on the Gaussian dataset for k = 10. For a range of k = 8, ..., 17 (horizontal axis), (c) and (d) (resp. (e) and (f)) compare the fairness (α with social cost, and β) and efficiency on the Gaussian dataset (resp. Mopsi locations).]

In terms of fairness, ALG+g(k-means) achieves significantly lower α and β than k-means++ in most cases (though in a few cases it is slightly larger). In terms of efficiency, the social cost of ALG+g(k-means) is within a small constant factor of that of k-means++, and ours is even as good as k-means++ on the Gaussian dataset. To conclude, our algorithm ensures better core fairness for the agents than classic clustering algorithms while, empirically, retaining good efficiency as well.

7. Conclusion

In this work, we revisited the algorithmic fair centroid clustering problem and proposed a novel definition of group fairness: the core. We demonstrated the extent to which an approximate core clustering is guaranteed to exist. Many future directions are worth exploring. For example, it would be interesting to improve the approximation bounds for other spaces, such as d-dimensional Euclidean spaces with d ≥ 2 and simple graphs with unit edge weights. In the Supplementary Material, we prove that even when the agents consist of all vertices of a unit-weight tree, an exact core clustering can still fail to exist. A systematic analysis of upper and lower bounds is left for future study.

Acknowledgement

The authors thank the anonymous reviewers for their constructive comments. This work is supported by The Hong Kong Polytechnic University (Grant No. P0034420).

References

Atila Abdulkadiroğlu and Tayfun Sönmez. 1998. Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica 66, 3 (1998), 689–701.

Noga Alon, Michal Feldman, Ariel D. Procaccia, and Moshe Tennenholtz. 2010. Strategyproof approximation of the minimax on networks. Math. Oper. Res. 35, 3 (2010), 513–526.

David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In SODA. SIAM, 1027–1035.

Haris Aziz and Simon Rey. 2020. Almost group envy-free allocation of indivisible goods and chores. In IJCAI. ijcai.org, 39–45.

Moshe Babaioff, Shahar Dobzinski, Sigal Oren, and Aviv Zohar. 2012. On bitcoin and red balloons. In EC. ACM, 56–73.

Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. 2019. Scalable fair clustering. In ICML (Proceedings of Machine Learning Research, Vol. 97). PMLR, 405–413.

Maria-Florina Balcan, Travis Dick, Ritesh Noothigattu, and Ariel D. Procaccia. 2019. Envy-free classification. In NeurIPS. 1238–1248.

Nawal Benabbou, Mithun Chakraborty, Edith Elkind, and Yair Zick. 2019. Fairness towards groups of agents in the allocation of indivisible items. In IJCAI. ijcai.org, 95–101.

Suman Kalyan Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. 2019. Fair algorithms for clustering. In NeurIPS. 4955–4966.

Ioana Oriana Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. 2019. On the cost of essentially fair clusterings. In APPROX-RANDOM (LIPIcs, Vol. 145). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 18:1–18:22.

Marcus Berliant, William Thomson, and Karl Dunz. 1992. On the fair division of a heterogeneous commodity. Journal of Mathematical Economics 21, 3 (1992), 201–216.

Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. 2019. Coresets for ordered weighted clustering. In ICML (Proceedings of Machine Learning Research, Vol. 97). PMLR, 744–753.

Georgios Chalkiadakis, Edith Elkind, and Michael J. Wooldridge. 2011. Computational Aspects of Cooperative Game Theory. Morgan & Claypool Publishers.
Xingyu Chen, Brandon Fain, Liang Lyu, and Kamesh Munagala. 2019. Proportionally fair clustering. In ICML (Proceedings of Machine Learning Research, Vol. 97). PMLR, 1032–1041.

Yu Cheng, Zhihao Jiang, Kamesh Munagala, and Kangning Wang. 2019. Group fairness in committee selection. In EC. ACM, 263–279.

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. 2017. Fair clustering through fairlets. In NIPS. 5029–5037.

Vincent Conitzer, Rupert Freeman, and Nisarg Shah. 2017. Fair public decision making. In Proceedings of the 18th ACM Conference on Economics and Computation (EC). 629–646.

Vincent Conitzer, Rupert Freeman, Nisarg Shah, and Jennifer Wortman Vaughan. 2019. Group fairness for the allocation of indivisible goods. In AAAI. AAAI Press, 1853–1860.

Xiaotie Deng and Christos H. Papadimitriou. 1994. On the complexity of cooperative solution concepts. Math. Oper. Res. 19, 2 (1994), 257–266.

Brandon Fain, Kamesh Munagala, and Nisarg Shah. 2018. Fair allocation of indivisible public goods. In EC. ACM, 575–592.

Allan M. Feldman and Roberto Serrano. 2006. Welfare Economics and Social Choice Theory. Springer Science & Business Media.

Michal Feldman and Yoav Wilf. 2013. Strategyproof facility location and the least squares objective. In EC. ACM, 873–890.

Pasi Fränti and Sami Sieranoja. 2018. K-means properties on six clustering benchmark datasets. Appl. Intell. 48, 12 (2018), 4743–4759. http://cs.uef.fi/sipu/datasets/

Elfarouk Harb and Ho Shan Lam. 2020. KFC: A scalable approximation algorithm for k-center fair clustering. In NeurIPS.

Safwan Hossain, Andjela Mladenovic, and Nisarg Shah. 2020. Designing fairly fair classifiers via economic fairness notions. In WWW. ACM / IW3C2, 1559–1569.

Lingxiao Huang, Shaofeng H.-C. Jiang, and Nisheeth K. Vishnoi. 2019. Coresets for clustering with fairness constraints. In NeurIPS. 7587–7598.

Kamal Jain and Vijay V. Vazirani. 1999. Primal-dual approximation algorithms for metric facility location and k-median problems. In FOCS. IEEE Computer Society, 2–13.

Jon M. Kleinberg and Prabhakar Raghavan. 2005. Query incentive networks. In FOCS. IEEE Computer Society, 132–141.

Maria Kyropoulou, Warut Suksompong, and Alexandros A. Voudouris. 2020. Almost envy-freeness in group resource allocation. Theor. Comput. Sci. 841 (2020), 110–123.

Bo Li and Yingkai Li. 2020. Fair resource sharing and dorm assignment. In AAMAS. International Foundation for Autonomous Agents and Multiagent Systems, 708–716.

Richard D. McKelvey. 1986. Covering, dominance, and institution-free properties of social choice. American Journal of Political Science (1986), 283–314.

Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2019. A survey on bias and fairness in machine learning. CoRR abs/1908.09635 (2019).

Evi Micha and Nisarg Shah. 2020. Proportionally fair clustering revisited. In ICALP (LIPIcs, Vol. 168). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 85:1–85:16.

Hervé Moulin. 2003. Fair Division and Collective Welfare. MIT Press.

Clemens Rösner and Melanie Schmidt. 2018. Privacy preserving clustering with constraints. In ICALP (LIPIcs, Vol. 107). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 96:1–96:14.

Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. 2018. Fair coresets and streaming algorithms for fair k-means clustering. CoRR abs/1812.10854 (2018).

Erel Segal-Halevi and Warut Suksompong. 2019. Democratic fair allocation of indivisible goods. Artif. Intell. 277 (2019).
Berk Ustun, Yang Liu, and David C. Parkes. 2019. Fairness without harm: decoupled classifiers with preference guarantees. In ICML (Proceedings of Machine Learning Research, Vol. 97). PMLR, 6373–6382.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi, and Adrian Weller. 2017. From parity to preference-based notions of fairness in classification. In NIPS. 229–239.