# approximation_algorithms_for_stochastic_clustering__d8af7e4c.pdf Journal of Machine Learning Research 20 (2019) 1-33 Submitted 10/18; Revised 6/19; Published 8/19 Approximation algorithms for stochastic clustering David G. Harris davidgharris29@gmail.com Department of Computer Science, University of Maryland College Park, MD 20742 Shi Li shil@buffalo.edu University at Buffalo, Buffalo, NY Thomas Pensyl tpensyl@bandwidth.com Bandwidth, Inc. Raleigh, NC Aravind Srinivasan srin@cs.umd.edu Department of Computer Science and Institute for Advanced Computer Studies University of Maryland, College Park, MD 20742 Khoa Trinh khoatrinh@google.com Google, Mountain View, CA 94043 Editor: Maya Gupta We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure that every user is guaranteed to get good service (on average). We also complement some of these with impossibility results. Keywords: clustering, k-center, k-median, lottery, approximation algorithms This is an extended version of a paper that appeared in the Conference on Neural Information Processing Systems (Neur IPS) 2018. Research supported in part by NSF Awards CNS-1010789, CCF-1422569, CCF-1749864, CCF-1566356, CCF-1717134, a gift from Google, Inc., and by research awards from Adobe, Inc. and Amazon, Inc. 1. Introduction Clustering is a fundamental problem in machine learning and data science. A general clustering task is to partition the given datapoints such that points inside the same cluster are similar to each other. More formally, consider a set of datapoints C and a set of potential cluster centers F, with a metric d on C F. We define n := |C F|. Given any set S F, each j C is associated with the key statistic d(j, S) = mini S d(i, j). The typical clustering task is to select a set S F which has a small size and which minimizes the values of d(j, S). c 2019 David G. Harris, Shi Li, Thomas Pensyl, Aravind Srinivasan. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v20/18-716.html. Harris, Li, Pensyl, Srinivasan The size of the set S is often fixed to a value k, and we typically boil down the large collection of values d(j, S) into a single overall objective function. A variety of objective functions and assumptions on sets C and F are used. The most popular problems include 1 the k-center problem: minimize the value maxj C d(j, S) given that F = C. the k-supplier problem: minimize the value maxj C d(j, S) (where F and C may be unrelated); the k-median problem: minimize the summed value P j C d(j, S); and the k-means problem: minimize the summed square value P j C d(j, S)2. An important special case is when C = F (e.g. the k-center problem); since this often occurs in the context of data clustering, we refer to this as the self-contained clustering (SCC) setting. These classic NP-hard problems have been studied intensively for the past few decades. There is an alternative interpretation from the viewpoint of operations research: the sets F and C can be thought of as facilities and clients , respectively. We say that i F is open if i is placed into the solution set S. 
For a set S F of open facilities, d(j, S) can then be interpreted as the connection cost of client j. This terminology has historically been used for clustering problems, and we adopt it throughout for consistency. However, our focus is on the case in which C and F are arbitrary abstract sets in the data-clustering setting. Since these problems are NP-hard, much effort has been paid on algorithms with small provable approximation ratios/guarantees: i.e., polynomial-time algorithms that produce solutions of cost at most α times the optimal. The current best-known approximation ratio for k-median is 2.675 by Byrka et al. (2017) and it is NP-hard to approximate this problem to within a factor of 1 + 2/e 1.735 (Jain et al., 2002). The recent breakthrough by Ahmadian et al. (2017) gives a 6.357-approximation algorithm for k-means, improving on the previous approximation guarantee of 9+ϵ based on local search (Kanungo et al., 2004). Finally, the k-supplier problem is easier than both k-median and k-means in the sense that a simple 3-approximation algorithm (Hochbaum and Shmoys, 1986) is known, as is a 2-approximation for k-center problem: we cannot do better than these approximation ratios unless P = NP (Hochbaum and Shmoys, 1986). While optimal approximation algorithms for the center-type problems are well-known, one can easily demonstrate instances where such algorithms return a worst-possible solution: (i) all clusters have the same worst-possible radius (2T for k-center and 3T for k-supplier where T is the optimal radius) and (ii) almost all data points are on the circumference of the resulting clusters. Although it is NP-hard to improve these approximation ratios, our new randomized algorithms provide significantly better per-point guarantees. For example, we achieve a new per-point guarantee E[d(j, S)] (1 + 2/e)T 1.736T, while respecting the usual guarantee d(j, S) 3T with probability one. Thus, while maintaining good global quality with probability one, we also provide superior stochastic guarantees for each user. 1. In the original version of the k-means problem, C is a subset of Rℓand F = Rℓand d is the Euclidean metric. By standard discretization techniques (see, e.g., Matouˇsek (2000); Feldman et al. (2007)), the set F can be reduced to a polynomially-bounded set with only a small loss in the value of the objective function. Approximation Algorithms for Stochastic Clustering The general problem we study in this paper is to develop approximation algorithms for center-type problems where S is drawn from a probability distribution over k-element subsets of F; we refer to these as k-lotteries. We aim to construct a k-lottery Ωachieving certain guarantees on the distributional properties of d(j, S). The classical k-center problem can be viewed as the special case where the distribution Ωis deterministic, that is, it is supported on a single point. Our goal is to find an approximating distribution Ωwhich matches the target distribution Ωas closely as possible for each client j. Stochastic solutions can circumvent the approximation hardness of a number of classical center-type problems. There are a number of additional applications where stochasticity can be beneficial. We summarize three here: smoothing the integrality constraints of clustering, solving repeated problem instances, and achieving fair solutions. Stochasticity as interpolation. In practice, robustness of the solution is often more important than achieving the absolute optimal value for the objective function. 
One potential problem with the (deterministic) center measure is that it can be highly non-robust. As an extreme example, consider k-center with k points, each at distance 1 from each other. This clearly has value 0 (choosing S = C). However, if a single new point at distance 1 to all other points is added, then the solution jumps to 1. Stochasticity alleviates this discontinuity: by choosing k facilities uniformly at random among the full set of k + 1, we can ensure that E[d(j, S)] = 1 k+1 for every point j, a much smoother transition. Repeated clustering problems. Consider clustering problems where the choice of S can be changed periodically: e.g., S could be the set of k locations in the cloud chosen by a serviceprovider. This set S can be shuffled periodically in a manner transparent to end-users. For any user j C, the statistic d(j, S) represents the latency of the service j receives (from its closest service-point in S). If we aim for a fair or minmax service allocation, then our k-center stochastic approximation results ensure that, with high probability, every client j gets long-term average service of at most around 1.736T. The average here is taken over the periodic re-provisioning of S. (Furthermore, we have the risk-avoidance guarantee that in no individual provisioning of S will any client have service greater than 3T.) Fairness in clustering. The classical clustering problems combine the needs of many different points (elements of C) into one metric. However, clustering (and indeed many other ML problems) are increasingly driven by inputs from parties with diverse interests. Fairness in these contexts has taken on greater importance in the current environment where decisions are increasingly made by algorithms and machine learning. Some examples of recent concerns include the accusations of, and fixes for, possible racial bias in Airbnb rentals (Badger, 2016) and the finding that setting the gender to female in Google s Ad Settings resulted in getting fewer ads for high-paying jobs (Datta et al., 2015). Starting with older work such as Schulman et al. (1999), there have been highly-publicized works on bias in allocating scarce resources e.g., racial discrimination in hiring applicants who have very similar resum es (Bertrand and Mullainathan, 2004). Additional work discusses the possibility of bias in electronic marketplaces, whether human-mediated or not (Ayres et al., 2015; Badger, 2016). A fair allocation should provide good service guarantees to each user individually. In data clustering settings where a user corresponds to a datapoint, this means that every point j C should be guaranteed a good value of d(j, S). This is essentially the goal of Harris, Li, Pensyl, Srinivasan k-center type problems, but the stochastic setting broadens the meaning of good per-user service. Consider the following scenarios. Each user, either explicitly or implicitly, submits their data (corresponding to a point in C) to an aggregator such as an e-commerce site. A small number k of users are then chosen as influencer nodes; for instance, the aggregator may give them a free product sample to influence the whole population in aggregate, as in Kempe et al. (2015), or the aggregator may use them as a sparse sketch , so that each user gets relevant recommendations from a influencer which is similar to them. Each point j would like to be in a cluster that is high quality from its perspective, with d(j, S) being a good proxy for such quality. 
Indeed, there is increasing emphasis on the fact that organizations monetize their user data, and that users need to be compensated for this (see, e.g., Lanier (2014); Ibarra et al. (2018)). This is a transition from viewing data as capital to viewing data as labor. A concrete way for users (i.e., the data points j C) to be compensated in our context is for each user to get a guarantee on their solution quality: i.e., bounds on d(j, S). 1.1. Our contributions and overview In Section 2, we encounter the first clustering problem which we refer to as chance kcoverage: namely, every client j has a distance demand rj and probability demand pj, and we wish to find a distribution satisfying Pr[d(j, S) rj] pj. We show how to obtain an approximation algorithm to find an approximating distribution Ωwith2 Pr S Ω [d(j, S) 9rj] pj. In a number of special cases, such as when all the values of pj or rj are the same, the distance factor 9 can be improved to 3, which is optimal; it is an interesting question to determine whether this factor can also be improved in the general case. In Section 3, we consider a special case of chance k-coverage, in which pj = 1 for all clients j. This is equivalent to the classical (deterministic) k-supplier problem. Allowing the approximating distribution Ωto be stochastic yields significantly better distance guarantees than are possible for k-supplier or k-center. For instance, we find an approximating distribution Ωwith j C ES Ω[d(j, S)] 1.592T and Pr[d(j, S) 3T] = 1 where T is the optimal solution to the (deterministic) k-center problem. By contrast, deterministic polynomial-time algorithms cannot guarantee d(j, S) < 2T for all j, unless P = NP (Hochbaum and Shmoys, 1986). In Section 4, we show a variety of lower bounds on the approximation factors achievable by efficient algorithms (assuming P = NP). For instance, we show that our approximation algorithm for chance k-coverage with equal pj or rj has the optimal distance approximation factor 3, that our approximation algorithm for k-supplier has optimal approximation factor 1 + 2/e, and that the approximation factor 1.592 for k-center cannot be improved below 1 + 1/e. 2. Notation such as S Ω indicates that the random set S is drawn from the distribution Ω. Approximation Algorithms for Stochastic Clustering In Section 5, we consider a different type of stochastic approximation problem based on expected distances: namely, every client has a demand tj, and we seek a k-lottery Ωwith E[d(j, S)] tj. We show that we can leverage any given α-approximation algorithm for k-median to produce a k-lottery Ωwith E[d(j, S)] αtj. (Recall that the current-best α here is 2.675 as shown in Byrka et al. (2017).) In Section 6, we consider the converse problem to Section 3: if we are given a k-lottery Ωwith E[d(j, S)] tj, can we produce a single deterministic set S so that d(j, S) tj and |S| k? We refer to this as a determinization of Ω. We show a variety of determinization algorithms. For instance, we are able to find a set S with |S| 3k and d(j, S) 3tj. We also show a number of nearly-matching lower bounds. 1.2. Related Work With algorithms increasingly running our world, there has been substantial recent interest on incorporating fairness systematically into algorithms and machine learning. 
One important notion is disparate impact: in addition to requiring that protected attributes such as gender or race not be used (explicitly) in decisions, this asks that decisions not be disproportionately different for diverse protected classes (Feldman et al., 2015). This is developed further in the context of clustering in the work of Chierichetti et al. (2017). Such notions of group fairness are considered along with individual fairness treating similar individuals similarly in Zemel et al. (2013). See Dwork et al. (2012) for earlier work that developed foundations and connections for several such notions of fairness. In the context of the location and sizing of services, there have been several studies indicating that proactive on-site provision of healthcare improves health outcomes significantly: e.g., mobile mammography for older women (Reuben et al., 2002), mobile healthcare for reproductive health in immigrant women (Guruge et al., 2010), and the use of a community mobile-health van for increased access to prenatal care (Edgerley et al., 2007). Studies also indicate the impact of distance to the closest facility on health outcomes: see, e.g., Mc Carthy et al. (2007); Mooney et al. (2000); Schmitt et al. (2003). Such works naturally suggest tradeoffs between resource allocation (provision of such services, including sizing e.g., the number k of centers) and expected health outcomes. While much analysis for facility-location problems has focused on the static case, other works have examined a similar lottery model for center-type problems. Harris et al. (2019, 2017) analyzed models similar to chance k-coverage and minimization of E[d(j, S)], but applied to knapsack center and matroid center problems; they also considered robust versions (in which a small subset of clients may be denied service). While the overall model was similar to the ones we explore here, the techniques are somewhat different. Furthermore, these works focus only on the case where the target distribution is itself deterministic. Similar stochastic approximation guarantees have appeared in the context of approximation algorithms for static problems, particularly k-median problems. Charikar and Li (2012) discussed a randomized procedure for converting a linear-programming relaxation in which a client has fractional distance tj, into a distribution Ωsatisfying ES Ω[d(j, S)] 3.25tj. This property can be used, among other things, to achieve a 3.25-approximation for kmedian. However, many other randomized rounding algorithms for k-median only seek to preserve the aggregate value P j E[d(j, S)], without our type of per-point guarantee. Harris, Li, Pensyl, Srinivasan We also contrast our approach with a different stochastic k-center problem considered in works such as Huang and Li (2017); Alipour and Jafari (2018). These consider a model with a fixed, deterministic set S of open facilities, while the client set is determined stochastically; this model is almost precisely opposite to ours. 1.3. Publicly Verifying the Distributions Our approximation algorithms will have the following structure: given some target distribution Ω, we construct a randomized procedure A which returns some random set S with good probabilistic guarantees matching Ω. Thus the algorithm A is itself the approximating distribution Ω. In a number of cases, we can convert the randomized algorithm A into a distribution Ωwhich has a sparse support (set of points to which it assigns nonzero probability), and which can be enumerated directly. 
This may cause a small loss in approximation ratio. The distribution Ωcan be publicly verified, and the users can then draw from Ωas desired. Recall that one of our main motivations is fairness in clustering; the ability for the users to verify that they are being treated fairly in a stochastic sense (although not necessarily in any one particular run of the algorithm) is particularly important. 1.4. Notation We define F k to be the collection of k-element subsets of F. We assume throughout that F can be made arbitrarily large by duplicating its elements; thus, whenever we have an expression like F k , we assume without loss of generality that |F| k. We will let [t] denote the set {1, 2, . . . , t}. For any vector a = (a1, . . . , at) and a subset X [t], we write a(X) as shorthand for P i X ai. We use the Iverson notation throughout, so that for any Boolean predicate P we let [[P]] be equal to one if P is true and zero otherwise. For a real number q [0, 1], we define q = 1 q. Given any j C and any real number r 0, we define the ball B(j, r) = {i F | d(i, j) r}. We let θ(j) be the distance from j to the nearest facility, and Vj be the facility closest to j, i.e. d(j, Vj) = d(j, F) = θ(j). Note that in the SCC setting we have Vj = j and θ(j) = 0. For a solution set S, we say that j C is matched to i S if i is the closest facility of S to j; if there are multiple closest facilities, we take i to be one with least index. 1.5. Some Useful Subroutines We will use two basic subroutines repeatedly: dependent rounding and greedy clustering. In dependent rounding, we aim to preserve certain marginal distributions and negative correlation properties while satisfying some constraints with probability one. Our algorithms use a dependent-rounding algorithm from Srinivasan (2001), which we summarize as follows: Proposition 1 There exists a randomized polynomial-time algorithm Dep Round(y) which takes as input a vector y [0, 1]n, and outputs a random set Y [n] with the following properties: Approximation Algorithms for Stochastic Clustering (P1) Pr[i Y ] = yi, for all i [n], (P2) Pn i=1 yi |Y | Pn i=1 yi with probability one, (P3) Pr[Y S = ] Q i S(1 yi) for all S [n]. We adopt the following additional convention: suppose (y1, . . . , yn) [0, 1]n and S [n]; we then define Dep Round(y, S) S to be Dep Round(x), for the vector x defined by xi = yi[[i S]]. The greedy clustering procedure takes an input a set of weights wj and sets Fj F for every client j C, and executes the following procedure: Algorithm 1 Greedy Cluster(F, w) 1: Sort C as C = {j1, j2, . . . , jℓ} where wj1 wj2 wjℓ. 2: Initialize C = 3: for t = 1, . . . , ℓdo 4: if Fjt Fj = for all j C then update C C {jt} 5: Return C Observation 2 If C = Greedy Cluster(F, w) then for any j C there is z C with wz wj and Fz Fj = . 2. The Chance k-coverage Problem In this section, we consider a scenario we refer to as the chance k-coverage problem: every point j C has demand parameters pj, rj, and we wish to find a k-lottery Ωsuch that Pr S Ω[d(j, S) rj] pj. (1) If a k-lottery satisfying (1) exists, we say that parameters pj, rj are feasible. We refer to the special case wherein every client j has a common value pj = p and a common value rj = r, as homogeneous. Homogeneous instances naturally correspond to fair allocations, for example, k-supplier is a special case of the homogeneous chance k-coverage problem, in which pj = 1 and rj is equal to the optimal k-supplier radius. 
Our approximation algorithms for this problem will be based on a linear programming (LP) relaxation which we denote Pchance. It has fractional variables bi, where i ranges over F (bi represents the probability of opening facility i), and is defined by the following constraints: (B1) P i B(j,rj) bi pj for all j C, (B2) b(F) = k, (B3) bi [0, 1] for all i F. Proposition 3 If parameters p, r are feasible, then Pchance is nonempty. Harris, Li, Pensyl, Srinivasan Proof Consider a distribution Ωsatisfying (1). For each i F, set bi = Pr S Ω[i S]. For j C we have pj = Pr W i B(j,rj) i S P i B(j,rj) Pr[i S] = P i B(j,rj) bi and thus (B1) is satisfied. We have k = E[|S|] = P i F Pr[i S] = b(F) and so (B2) is satisfied. (B3) is clear, so we have demonstrated a point in Pchance. For the remainder of this section, we assume we have a vector b Pchance and focus on how to round it to an integral solution. By a standard facility-splitting step, we also generate, for every j C, a set Fj B(j, rj) with b(Fj) = pj. We refer to this set Fj as a cluster. In the SCC setting, it will also be convenient to ensure that j Fj as long as bj = 0. As we show in Section 4, any approximation algorithm must either significantly give up a guarantee on the distance, or probability (or both). Our first result is an approximation algorithm which respects the distance guarantee exactly, with constant-factor loss to the probability guarantee: Theorem 4 If p, r is feasible then one may efficiently construct a k-lottery Ωsatisfying Pr S Ω[d(j, S) rj] (1 1/e)pj. Proof Let b Pchance and set S = Dep Round(b). This satisfies |S| Pn i=1 bi k = k as desired. Each j C has Pr[S Fj = ] Y i Fj (1 bi) Y i Fj e bi = e b(Fi) = e pj. and then simple analysis shows that Pr[d(j, S) rj] Pr[S Fj = ] 1 e pj (1 1/e)pj. As we will later show in Theorem 20, this approximation constant 1 1/e is optimal. We next turn to preserving the probability guarantee exactly with some loss to distance guarantee. As a warm-up exercise, let us consider the special case of half-homogeneous problem instances: all the values of pj are the same, or all the values of rj are the same. A similar algorithm works both these cases: we first select a set of clusters according to some greedy order, and then open a single item from each cluster. We summarize this as follows: Algorithm 2 Rounding algorithm for half-homogeneous chance k-coverage 1: Set C = Greedy Cluster(Fj, wj) 2: Set Y = Dep Round(p, C ) 3: Form solution set S = {Vj | j Y }. Algorithm 2 opens at most k facilities, as the dependent rounding step ensures that |Y | P j C pj = P j C b(Fj) P i F bi k. The only difference between the two cases is the choice of weighting function wj for the greedy cluster selection. Proposition 5 1. Suppose that pj is the same for every j C. Then using the weighting function wj = rj ensures that every j C satisfies Pr[d(j, S) 3rj] pj. Furthermore, in the SCC setting, it satisfies Pr[d(j, S) 2rj] pj. Approximation Algorithms for Stochastic Clustering 2. Suppose rj is the same for every j C. Then using the weighting function wj = 1 pj ensures that every j C satisfies Pr[d(j, S) 3rj] pj. Furthermore, in the SCC setting, it satisfies Pr[d(j, S) 2rj] pj. Proof Let j C. By Observation 2 there is z C with wz wj and Fj Fz = . In either of the two cases, this implies that pz pj and rz rj. Letting i Fj Fz gives d(j, z) d(j, i) + d(z, i) rj + rz 2rj. This z C survives to Y with probability pz pj, and in that case we have d(z, S) = θ(z). 
In the SCC setting, this means that d(z, S) = 0; in the general setting, we have θ(z) rz rj. 2.1. Approximating the General Case We next show how to satisfy the probability constraint exactly for the general case of chance k-coverage, with a constant-factor loss in the distance guarantee. Namely, we will find a probability distribution with Pr S Ω[d(j, S) 9rj] pj. The algorithm is based on iterated rounding, in which the entries of b go through an unbiased random walk until b becomes integral (and, thus corresponds to a solution set S). Because the walk is unbiased, the probability of serving a client at the end is equal to the fractional probability of serving a client, which will be at least pj. In order for this process to make progress, the number of active variables must be greater than the number of active constraints. We ensure this by periodically identifying and discarding clients which will be automatically served by serving other clients. This is similar to a method of Krishnaswamy et al. (2018), which also uses iterative rounding for (deterministic) approximations to kmedian with outliers and k-means with outliers. The sets Fj will remain fixed during this procedure. We will maintain a vector b [0, 1]F and maintain two sets Ctight and Cslack with the following properties: (C1) Ctight Cslack = . (C2) For all j, j Ctight, we have Fj Fj = (C3) Every j Ctight has b(Fj) = 1, (C4) Every j Cslack has b(Fj) 1. (C5) We have b(S j Ctight Cslack Fj) k Given our initial solution b for Pchance, setting Ctight = , Cslack = C will satisfy criteria (C1) (C5); note that (C4) holds as b(Fj) = pj 1 for all j C. Proposition 6 Given any vector b [0, 1]F satisfying constraints (C1) (C5) with Cslack = , it is possible to generate a random vector b [0, 1]F such that E[b ] = b, and with probability one b satisfies constraints (C1) (C5) as well as having some j Cslack with b (Fj) {0, 1}. Harris, Li, Pensyl, Srinivasan Proof We will show that any basic solution b [0, 1]F to the constraints (C1) (C5) with Cslack = must satisfy the condition that b(Fj) {0, 1} for some j Cslack. To obtain the stated result, we simply modify b until it becomes basic by performing an unbiased walk in the nullspace of the tight constraints. So consider a basic solution b. Define A = S j Ctight Fj and B = S j Cslack Fj. We assume that b(Fj) (0, 1) for all j Cslack, as otherwise we are done. First, suppose that b(A B) > 0. So there must be some pair j Cslack, j Ctight with i Fj Fj such that bi > 0. Since b(Fj ) = 1, there must be some other i Fj with bi > 0. Consider modifying b by incrementing bi by ϵ and decrementing bi by ϵ, for some sufficiently small ϵ. Constraint (C5) is preserved. Since Fj Fj = for all j Ctight, constraint (C3) is preserved. Since the (C4) constraints are slack, then for ϵ sufficiently small they are preserved as well. This contradicts that b is basic. Next, suppose that b(A B) = 0 and b(A B) < k strictly. Let j Cslack and i Fj with bi > 0; if we change bi by ϵ for sufficiently small ϵ, this preserves (C4) and (C5); furthermore, since i / A, it preserves (C3) as well. So again b cannot be basic. Finally, suppose that b(A B) = 0 and b(A B) = k. So b(B) = k |A| and b(B) > 0 as Cslack = . Therefore, there must be at least two elements i, i B such that bi > 0, bi > 0. If we increment bi by ϵ while decrementing bi by ϵ, this again preserves all the constraints for ϵ sufficiently small, contradicting that b is basic. We can now describe our iterative rounding algorithm, Algorithm 3. 
Algorithm 3 Iterative rounding algorithm for chance k-coverage 1: Find a fractional solution b to Pchance and form the corresponding sets Fj. 2: Initialize Ctight = , Cslack = C 3: while Cslack = do 4: Draw a fractional solution b with E[b ] = b according to Proposition 6. 5: Select some v Cslack with b (Fv) {0, 1}. 6: Update Cslack Cslack {v} 7: if b (Fv) = 1 then 8: Update Ctight Ctight {v} 9: if there is any z Ctight Cslack such that rz rv/2 and Fz Fv = then 10: Update Ctight Ctight {z}, Cslack Cslack {z} 11: Update b b . 12: For each j Ctight, open an arbitrary center in Fj. To analyze this algorithm, define Ct tight, Ct slack, bt to be the values of the relevant variables at iteration t. Since every step removes at least one point from Cslack, this process must terminate in T n iterations. We will write bt+1 to refer to the random value b chosen at step (4) of iteration t, and vt denote the choice of v Cslack used step in step (5) of iteration t. Proposition 7 The vector bt satisfies constraints (C1) (C5) for all times t = 1, . . . , T. Approximation Algorithms for Stochastic Clustering Proof The vector b0 does so since b satisfies Pchance. Proposition 6 ensures that step (4) does not affect this. Also, removing points from Ctight or Cslack at step (6) or (1) will not violate these constraints. Let us check that adding vt to Ctight will not violate the constraints. This step only occurs if bt+1(vt) = 1, and so (C3) is preserved. Since we only move vt from Cslack to Ctight, constraints (C1) and (C5) are preserved. Finally, to show that (C2) is preserved, suppose that Fvt Fvs = for some other vs which was added to Ctight at time s < t. If rvt rvs, then step (10) would have removed vt from Cslack, making it impossible to enter Ct tight. Thus, rvt rvs; this means that when we add vt to Ct tight, we also remove vs from Ct tight. Corollary 8 Algorithm 3 opens at most k facilities. Proof At the final step (12), the number of open facilities is equal to |Ctight|. By Proposition 7, the vector b T satisfies constraints (C1) (C5). So b(Fj) = 1 for j Ctight and Fj Fj = for j, j Ctight; thus |Ctight| = P j Ctight b(Fj) k. Proposition 9 If j Ct tight for any time t, then d(j, S) 3rj. Proof Let t be maximal such that j Ct tight. We show the desired claim by induction on t. When t = T, then this certainly holds as step (12) will open some facility in Fj and thus d(j, S) rj. Suppose that j was added into Cs tight, but was later removed from Ct+1 tight due to adding z = vt. Thus there is some i Fz Fj. When we added j in time s, we would have removed z from Cs tight if rz rj/2. Since this did not occur, it must hold that rz < rj/2. Since z is present in Ct+1 tight, the induction hypothesis implies that d(z, S) 3rz and so d(j, S) d(j, i) + d(i, z) + d(z, S) rj + rz + 3rz rj + (rj/2)(1 + 3) = 3rj. Theorem 10 Every j C has Pr[d(j, S) 9rj] pj. Proof We will prove by induction on t the following claim: suppose we condition on the full state of Algorithm 3 up to time t, and that j Ct tight Ct slack. Then Pr[d(j, S) 9rj] bt(Fj). (2) At t = T, this is clear; since CT slack = , we must have j CT tight, and so d(j, S) rj with probability one. For the induction step at time t, note that as E[bt+1(Fj)] = b(Fj), in order to prove (2) it suffices to show that if we also condition on the value of bt+1, it holds that Pr[d(j, S) 9rj | bt+1] bt+1(Fj). (3) If j remains in Ct+1 tight Ct+1 slack, then we immediately apply the induction hypothesis at time t + 1. 
So the only non-trivial thing to check is that (3) will hold even if j is removed from Ct+1 tight Ct+1 slack. Harris, Li, Pensyl, Srinivasan If j = vt and bt+1(Fj) = 0, then (3) holds vacuously. Otherwise, suppose that j is removed from Ct tight at stage (10) due to adding z = vt. Thus rj rz/2 and there is some i Fj Fz. By Proposition 9, this ensures that d(z, S) 3rz. Thus with probability one we have d(j, S) d(j, i) + d(i, z) + d(z, S) rj + rz + 3rz rj + (2rj)(1 + 3) = 9rj. This proves the induction. The claimed result follows since b0(Fj) = pj and C0 slack = C. 3. Chance k-coverage: Approximating the Deterministic Case An important special case of k-coverage is where pj = 1 for all j C. Here, the target distribution Ωis just a single set S satisfying jd(j, S) rj. In the homogeneous case, when all the rj are equal to the same value, this is specifically the k-supplier problem. The usual approximation algorithm for this problem chooses a single approximating set S, in which case the best guarantee available is d(j, S) 3rj. We improve the distance guarantee by constructing a k-lottery Ωsuch that d(j, S) 3rj with probability one, and ES Ω[d(j, S)] crj, where the constant c satisfies the following bounds: 1. In the general case, c = 1 + 2/e 1.73576; 2. In the SCC setting, c = 1.60793; 3. In the homogeneous SCC setting, c = 1.592.3 We show matching lower bounds in Section 4; the constant value 1 + 2/e is optimal for the general case (even for homogeneous instances), and for the third case the constant c cannot be made lower than 1 + 1/e 1.367. We remark that this type of stochastic guarantee allows us to efficiently construct publicly-verifiable lotteries. Proposition 11 Let ϵ > 0. In any of the above three cases, there is an expected polynomial time procedure to convert the given distribution Ωinto an explicitly-enumerated k-lottery Ω , with support size O(log n ϵ2 ), such that Pr S Ω [d(j, S) 3rj] = 1 and ES Ω [d(j, S)] c(1 + ϵ)rj. Proof Take X1, . . . , Xt as independent draws from Ωfor t = 6 log n cϵ2 and set Ω to be the uniform distribution on {X1, . . . , Xt}. To see that ES Ω [d(j, S)] c(1 + ϵ)rj holds with high probability, apply a Chernoffbound, noting that d(j, X1), . . . , d(j, Xt) are independent random variables in the range [0, 3rj]. We use a similar algorithm to Algorithm 2 for this problem: we choose a covering set of clusters C , and open exactly one item from each cluster. The main difference is that 3. This value was calculated using some non-rigorous numerical analysis; we describe this further in what we call Pseudo-Theorem 17 Approximation Algorithms for Stochastic Clustering instead of opening the nearest item Vj for each j C , we instead open a cluster according to the solution b of Pchance. Algorithm 4 Rounding algorithm with clusters 1: Set C = Greedy Cluster(Fj, rj). 2: Set F0 = F S j C Fj; this is the set of unclustered facilities 3: for j C do 4: Randomly select a point Wj Fj according to the distribution Pr[Wj = i] = bi // This is a valid probability distribution, as b(Fj) = 1 5: Let S0 Dep Round(b, F0) 6: Return S = S0 {Wj | j C } We will need the following technical result in order to analyze Algorithm 4. Proposition 12 For any set U F, we have Pr[S U = ] Y i U F0 (1 bi) Y j C (1 b(U Fj)) e b(U). Proof The set U contains each Wj independently with probability b(U Fj). The set S0 is independent of them and by (P3) we have Pr[U S0 = ] Q i U F0(1 bi). 
So Pr[S U = ] Y i U F0 (1 bi) Y j C (1 b(U Fj)) Y i U F0 e bi Y j C e b(U Fj) = e b(U) At this point, we can show our claimed approximation ratio for the general (non-SCC) setting: Theorem 13 For any j C, the solution set S of Algorithm 4 satisfies d(j, S) 3rj with probability one and E[d(j, S)] (1 + 2/e)rj. Proof By Observation 2, there is some v C with Fj Fv = and rv rj. Letting i Fj Fv, we have d(j, S) d(j, i) + d(i, v) + d(v, S) rj + rv + rv 3rj. with probability one. If S Fj = , then d(j, S) rj. Thus, a necessary condition for d(j, S) > rj is that S Fj = . Applying Proposition 12 with U = Fj gives Pr[d(j, S) > rj] Pr[S Fj = ] e b(Fj) = e 1 and so E[d(j, S)] rj + 2rj Pr[d(j, S) > rj] (1 + 2/e)rj. Harris, Li, Pensyl, Srinivasan 3.1. The SCC Setting To motivate the algorithm for the SCC setting (where C = F), note that if some client j C has some facility i opened in a nearby cluster Fv, then this guarantees that d(j, S) d(j, v)+d(v, i) 3rj. This is what we used to analyze the non-SCC setting. But, if instead of opening facility i, we opened v itself, then this would ensure that d(j, S) 2rj. Thus, opening the centers of a cluster can lead to better distance guarantees compared to opening any other facility. We emphasize that this is only possible in the SCC setting, as in general we do not know that v F. We use the following Algorithm 5, which takes a parameter q [0, 1] which we will discuss how to set shortly. We recall that we have assumed in this case that j Fj for every j C. (Also, note our notational convention that q = 1 q; this will be used extensively in this section to simplify the formulas.) Algorithm 5 Rounding algorithm for k-center 1: Set C = Greedy Cluster(Fj, rj). 2: Set F0 = F S j C Fj; this is the set of unclustered facilities 3: for j C do 4: Randomly select Wj Fj according to the distribution Pr[Wj = i] = qbi + q[[i = j]] 5: Let S0 = Dep Round(b, F0) 6: Return S = S0 {Wj | j C } This is the same as Algorithm 4, except that some of the values of bi for i Fj have been shifted to the cluster center j. In fact, we can think of Algorithm 5 as a two-part process: we first modify the fractional vector b to obtain a new fractional vector b defined by qbi + q if i C qbi if i Fj {j} for j C and we then execute Algorithm 4 on the resulting vector b . In particular, Proposition 12 remains valid with respect to the modified vector b . Theorem 14 Let j C. After running Algorithm 5 with q = 0.464587 we have d(j, S) 3rj with probability one and E[d(j, S)] 1.60793rj. Proof Let D = {v C | Fj Fv = , rv rj}; note that D = by Observation 2. For each v D {0}, set av = b(Fj Fv) and observe that a0 + P v D av = b(Fj) = 1. As before, a necessary condition for d(j, S) > rj is that Fj S = . So by Proposition 12, Pr[d(j, S) > rj] Pr[Fj S = ] Y i Fj F0 (1 b i) Y v C (1 b (Fv Fj)) i Fj F0 (1 bi) Y v D (1 qb(Fv Fj)) e b(Fj F0) Y v D (1 qb(Fv Fj)) v D eav(1 qav) = (1/e) Y v D eav(1 qav). Approximation Algorithms for Stochastic Clustering where the last equality comes from the fact that a0 + a(D) = 1. Similarly, if there is some i S D, then d(j, S) d(v, i) 2rj. Thus, a necessary condition for d(j, S) > 2rj is that S (D Fj) = . Applying Proposition 12 with U = D Fj gives: Pr[d(j, S) > 2rj] Y v (D Fj) F0 (1 bv) Y v C (1 b ((D Fj) Fv)) e b(Fj F0) Y q(1 av) = (1/e) Y v D eavq(1 av) Putting these together gives: E[d(j, S)] rj 1 + 1/e Y v D eav(1 qav) + 1/e Y v D eavq(1 av) (4) Let us define s = a(D) and t = |D|. 
By the AM-GM inequality we have: E[d(j, S)] rj 1 + es 1 1 qs/t t + es 1qt(1 s/t)t This is a function of a single real parameter s [0, 1] and a single integer parameter t 1. Some simple analysis, which we omit here, shows that E[d(j, S)] 1.60793rj. 3.2. The Homogeneous SCC Setting From the point of view of the target distribution Ω, this setting is equivalent to the classical k-center problem. We may guess the optimal radius, and so we do not need to assume that the common value of rj is given to us by some external process. By rescaling, we assume without loss of generality here that rj = 1 for all j. We will improve on Theorem 14 through a more complicated rounding process based on greedily-chosen partial clusters. Specifically, we select cluster centers π(1), . . . , π(n), wherein π(i) is chosen to maximize b(Fπ(i) Fπ(1) Fπ(i 1)). By renumbering C, we may assume without loss of generality that the resulting permutation π is the identity; therefore, we assume throughout this section that C = F = [n] and for all pairs i < j we have b(Fi F1 Fi 1) b(Fj F1 Fi 1) (5) For each j [n], we let Gj = Fj F1 Fj 1 and we define zj = b(Gj). We say that Gj is a full cluster if zj = 1 and a partial cluster otherwise. We note that the values of z appear in sorted order 1 = z1 z2 z3 zn 0. We use the following randomized Algorithm 7 to select the centers. Here, the quantities Qf, Qp (short for full and partial) are drawn from a joint probability distribution which we discuss later. Harris, Li, Pensyl, Srinivasan Algorithm 7 Partial-cluster based algorithm 1: Draw random variables Qf, Qp. 2: Z Dep Round(z) 3: for j Z do 4: Randomly select Wj Gj according to the distribution Pr[Wj = i] = (qjyi + qj[[i = j]])/zj where qj is defined as ( Qf if zj = 1 Qp if zj < 1 5: Return S = {Wj | j Z} Before the technical analysis of Algorithm 7, let us provide some intuition. It may be beneficial to open the center of some cluster near a given client j C as this will ensure d(j, S) 2. However, there is no benefit to opening more than one such cluster center. So, we would like a significant negative correlation between opening the centers of distinct clusters near j. Unfortunately, the full clusters all look alike, and so it seems impossible to enforce any significant negative correlation among them. Partial clusters break the symmetry. There is at least one full cluster near j, and possibly some partial clusters as well. We will create a probability distribution with significant negative correlation between the event that partial clusters open their centers and the event that full clusters open their centers. This decreases the probability that j will see multiple neighboring clusters open their centers, which in turn gives an improved value of E[d(j, S)]. The dependent rounding in Algorithm 7 ensures that |Z| Pn j=1 zj = Pn j=1 b(Fj F1 Fj 1) = b(F) k, and so |S| k as required. We also need the following technical result; the proof is essentially the same as Proposition 12 and is omitted. Proposition 15 For any U C, we have Pr[S U = | Qf, Qp] 1 qjb(U Gj) qjzj[[j U]] . Our main lemma to analyze Algorithm 7 is the following: Lemma 16 Let i [n]. Define Jf, Jp [n] as Jf = {j [n] | Fi Gj = , zj = 1} Jp = {j [n] | Fi Gj = , zj < 1} Let m = |Jf| and Jp = {j1, . . . , jt} where j1 j2 jt. For each ℓ= 1, . . . , t + 1 define uℓ= b(Fi Gjℓ) + b(Fi Gjℓ+1) + + b(Fi Gjt) Then 1 u1 u2 ut ut+1 = 0 and m 1. Furthermore, if we condition on a fixed value of (Qf, Qp) then we have E[d(i, S)] 1 + 1 Qf ℓ=1 (1 Qp(uℓ uℓ+1)) + ℓ=1 (uℓ+ Qpuℓ+1). 
Approximation Algorithms for Stochastic Clustering Proof For ℓ= 1, . . . , t we let aℓ= b(Fi Gjℓ) = uℓ uℓ+1. For j Jf, we let sj = b(Fi Gj). First, we claim that zjℓ uℓfor ℓ= 1, . . . , t. For, by (5), we have zjℓ b(Fi F1 Fjℓ 1) b(Fi) X j Jf b(Fi Gj) X v 1 is that no point in Fi is open. Applying Proposition 15 with U = Fi yields Pr[S Fi = ] Y j Jf (1 Qfb(Fi Gj) Qf[[j Fi]]) Y j Jp (1 Qpb(Fi Gj)) Qpzj[[j Fi]]) j Jf (1 Qfb(Fi Gj)) Y j Jp (1 Qpb(Fi Gj)) = Y j Jf (1 Qfsj) ℓ=1 (1 Qpaℓ). A necessary condition for d(i, S) > 2 is that we do not open any point in Fi, nor do we open center of any cluster intersecting with Fi. Applying Proposition 15 with U = Fi Jf Jp, and noting that zjℓ uℓ, we get: Pr[S U = ] Y j Jf (1 Qfb(U Gj) Qf) Y j Jp (1 Qpb(U Gj)) Qpzj) j Jf (1 Qfb(Fi Gj) Qf) ℓ=1 (1 Qpb(Fi Gjℓ) Qpzjℓ) ℓ=1 (1 Qpaℓ Qpzjℓ) Y ℓ=1 (1 Qpaℓ Qpuℓ) E[d(i, S)] 1 + Y j Jf (1 Qfsj) ℓ=1 (1 Qpaℓ) + Y ℓ=1 (1 Qpaℓ Qpuℓ) (6) The sets Gj partition [n], so P j f sj = 1 Pt ℓ=1 aℓ= 1 u1. So by the AM-GM inequality, we have E[d(i, S)] 1 + 1 Qf ℓ=1 (1 Qpaℓ) + ℓ=1 (1 Qpaℓ Qpuℓ). (7) Harris, Li, Pensyl, Srinivasan The claim follows as aℓ= uℓ uℓ+1. Lemma 16 gives an upper bound on E[d(i, S)] for a fixed distribution on Qf, Qp and for fixed values of the parameters m, u1, . . . , ut. By a computer search, along with a number of numerical tricks, we can obtain an upper bound over all values for m and all possible sequences u1, . . . , ut satisfying 1 = u1 u2 ut. This gives the following result: Pseudo-Theorem 17 Suppose that Qf, Qp has the following distribution: ( (0.4525, 0) with probability p = 0.773436 (0.0480, 0.3950) with probability 1 p . Then for all j C we have d(j, S) 3rj with probability one, and E[d(j, S)] 1.592rj. We call this a pseudo-theorem because some of the computer calculations used doubleprecision floating point for convenience, without carefully tracking rounding errors. In principle, this could have been computed using rigorous numerical analysis, giving a true theorem. Since there are a number of technical complications in this calculation, we defer the details to Appendix A. 4. Lower Bounds on Approximating Chance k-coverage We next show lower bounds for the chance k-coverage problems of Sections 2 and 3. These constructions are adapted from lower bounds for approximability of k-median (Guha and Khuller, 1999), which in turn are based on the hardness of set cover. Formally, a set cover instance consists of a collection of sets B = {S1, . . . , Sm} over a ground set [n]. For any set X [m] we define SX = S i X Si. The goal is to select a collection X [m] of minimum size such that SX = [n]. The minimum value of |X| thus obtained is denoted by OPT. We quote a result of Moshkovitz (2015) on the inapproximability of set cover. Theorem 18 (Moshkovitz (2015)) Assuming P = NP, there is no polynomial-time algorithm which guarantees a set-cover solution X with SX = [n] and |X| (1 ϵ) ln n OPT, where ϵ > 0 is any constant. We will need a simple corollary of Theorem 18, which is a (slight reformulation) of the hardness of approximating max-coverage. Corollary 19 Assuming P = NP, there is no polynomial-time algorithm which guarantees a set-cover solution X with |X| OPT and SX cn for any constant c > 1 1/e. Proof Suppose for contradiction that A is such an algorithm. We will repeatedly apply A to solve residual instances, obtaining a sequence of solutions X1, X2, . . . ,. Specifically, for iterations i 1, we define Ui = [n] S j 1 1/e, we have 1 ln( 1 1 c ) (1 Ω(1)), which contradicts Theorem 18. 
Theorem 20 Assuming P = NP, there is a family of homogeneous chance k-coverage instances, with a feasible demand of pj = rj = 1 for all clients j, such that no polynomialtime algorithm can guarantee a distribution Ωwith either of the following: 1. j ES Ω[d(j, S)] crj for constant c < 1 + 2/e 2. j Pr S Ω[d(j, S) < 3rj] cpj for constant c > 1 1/e In particular, approximation constants in Theorem 4, Proposition 5, and Theorem 13 cannot be improved. Proof Consider a set cover instance B = {S1, . . . , Sm}. We begin by guessing the value OPT (there are at most m possibilities, so this can be done in polynomial time). We define a k-center instance with k = OPT and with disjoint client and facility sets, where F is identified with [m] and C is identified with [n]. We define d by d(i, j) = 1 if j Si and d(i, j) = 3 otherwise. If X is an optimal solution to B then d(j, X) 1 for all points j C. So there exists a (deterministic) distribution with feasible demand parameters pj = 1, rj = 1. Also, note that d(j, S) {1, 3} with probability one for any client j. Thus, if j satisfies the second property Pr S Ω[d(j, S) < 3rj] cpj for constant c > 1 1/e, then it satisfies E[d(j, S)] 1+2(1 cpj) = 3 2c = (1+2/e Ω(1))rj. So it will satisfy the first property as well. So it suffices to show that no algorithm A can satisfy the first property. Suppose that A does satisfy the first property. The resulting solution set S F can be regarded as a solution X to the set cover instance, where d(j, S) = 1 + 2[[j / SX]]. Thus X j [n] d(j, S) = |SX| + 3(n |SX|), and so |SX| = 3n P j [n] d(j,S) 2 . As E[d(j, S)] crj = c for all j, this implies that E[|SX|] (3 c)n 2 . After an expected constant number of repetitions of this process we can ensure that |SX| c n for some constant c > 3 (1+2/e) 2 = 1 1/e. This contradicts Corollary 19. A slightly more involved construction applies to the homogeneous SCC setting. Theorem 21 Assuming P = NP, there is a family of homogeneous SCC chance k-coverage instances, with a feasible demand of pj = rj = 1 for all j, such that no polynomial-time algorithm can guarantee a distribution Ωwith either of the following: 1. j ES Ω[d(j, S)] crj for constant c < 1 + 1/e 2. j Pr S Ω[d(j, S) < 2rj] cpj for constant c > 1 1/e Harris, Li, Pensyl, Srinivasan In particular, the approximation constants in Theorem 4 and Proposition 5 cannot be improved for SCC instances, and the approximation factor 1.592 in Pseudo-Theorem 17 cannot be improved below 1 + 1/e. Proof Consider a set cover instance B = {S1, . . . , Sm}, where we have guessed the value OPT = k. We define a k-center instance as follows. For each i [m], we create an item vi and for each j [n] we create t = n2 distinct items wj,1, . . . , wj,t. We define the distance by setting d(vi, wj,t) = 1 if j Si and d(vi, vi ) = 1 for all i, i [m], and d(x, y) = 2 for all other distances. This problem size is polynomial (in m, n), and so A runs in time poly(m, n). If X is an optimal solution to the set cover instance, the corresponding set S = {vi | i X} satisfies d(j, S) 1 for all j C. So the demand vector pj = rj = 1 is feasible. Also, note that that d(j, S) {0, 1, 2} with probability one for any j. Thus, if j satisfies the second property Pr S Ω[d(j, S) < 2rj] cpj for constant c > 1 1/e, then it satisfies E[d(j, S)] 1 + 1(1 cpj) = (1 + 1/e Ω(1))rj. So it will satisfy the first property as well. So it suffices to show that no algorithm A can satisfy the first property. Suppose that algorithm A satisfies the first property. 
From the solution set S, we construct a corresponding set-cover solution by X = {i | vi S}. For wj,ℓ/ S, we can observe that d(wj,ℓ, S) = 1 + [[j / SX]]. Therefore, we have ℓ=1 d(wj,ℓ, S) X j,ℓ:wj,ℓ/ S (1 + [[j / SX]]) X j,ℓ (1 + [[j / SX]]) 2|S| n2(2n |SX|) 2k, and so |SX| 2n P j,ℓd(wj,ℓ,S) n2 2k/n2. Taking expectations and using our upper bound on E[d(j, S)], we have E[|SX|] 2n cn 2k/n2 (2 c)n 2/n. Thus, for n sufficiently large, after an expected constant number of repetitions of this process we get |SX| (2 c o(1))n (1 1/e + Ω(1))n. This contradicts Corollary 19. 5. Approximation Algorithm for E[d(j, S)] In the chance k-coverage problem, our goal is to achieve certain fixed values of d(j, S) with a certain probability. In this section, we consider another criterion for Ω; we wish to achieve certain values for the expectation ES Ω[d(j, S)]. We suppose we are given values tj for every j C, such that the target distribution Ωsatisfies ES Ω[d(j, S)] tj. (8) In this case, we say that the vector tj is feasible. As before, if all the values of tj are equal to each other, we say that the instance is homogeneous. We show how to leverage any approximation algorithm for k-median with approximation ratio α, to ensure our target distribution Ωwill satisfy ES Ω[d(j, S)] (α + ϵ)tj. Approximation Algorithms for Stochastic Clustering More specifically, we need an approximation algorithm for weighted k-median. In this setting, we have a problem instance I = F, C, d along with non-negative weights wj for j C, and we wish to find S F k minimizing P j C wjd(j, S). (Nearly all approximation algorithms for ordinary k-median can be easily adapted to the weighted setting, for example, by replicating clients.) If we fix an approximation algorithm A for (various classes of) weighted k-median, then for any problem instance I we define αI = sup weights w P j C wjd(j, A(I, w)) min S (F k) P j C wjd(j, S). We first show how to use the k-median approximation algorithm to achieve a set S which matches the desired distances tj: Proposition 22 Given a weighted instance I and a parameter ϵ > 0, there is a polynomialtime algorithm to produce a set S F k satisfying: 1. P j C wj d(j,S) tj (αI + O(ϵ)) P j C wj, 2. Every j C has d(j, S) ntj/ϵ. Proof We assume αI = O(1), as constant-factor approximation algorithms for k-median exist. By rescaling w, we assume without loss of generality that P j C wj = 1. By rescaling ϵ, it suffices to show that d(j, S) O(ntj/ϵ). Let us define the weight vector zj = ϵ/n+wj tj . Letting Ωbe a distribution satisfying (8), we have j C zjd(j, S) = X tj )tj = ϵ|C|/n + X j C wj = 1 + ϵ. In particular, there exists some S F k with P j C zjd(j, S) 1 + ϵ. When we apply algorithm A with weight vector z, we thus get a set S F k with P j C zjd(j, S) αI(1+ϵ). We claim that this set S satisfies the two conditions of the theorem. First, we have j C zjd(j, S) αI(1 + ϵ) (αI + O(ϵ)) X Next, for any given j C, we have tj d(j, S)zj(n/ϵ) (n/ϵ) X w C zwd(w, S) (n/ϵ)αI(1 + ϵ) O(n/ϵ). Theorem 23 There is an algorithm which takes as input an instance I, a parameter ϵ > 0 and a feasible vector tj, runs in time poly(n, 1/ϵ), and returns an explicitly enumerated distribution Ωwith support size n and ES Ω[d(j, S)] (αI + ϵ)tj for all j C. Harris, Li, Pensyl, Srinivasan Proof We assume without loss of generality that ϵ 1; by rescaling ϵ it suffices to show that E[d(j, S)] (αI + O(ϵ))tj. We begin with the following Algorithm 8, which uses a form of multiplicative weights update with repeated applications of Proposition 22. 
Algorithm 8 Approximation algorithm for E[d(j, S)]: first phase 1: for ℓ= 1, . . . , r = n ln n 2: Let Xℓ F k be the resulting of applying the algorithm of Proposition 22 with parameters ϵ, tj and weight vector w given by wj = exp ϵ2 ℓ 1 X 3: Set Ω to be the uniform distribution on X1, . . . , Xr Let us define φ = ϵ2/n. For each iteration ℓ= 1, . . . , r + 1 let u(ℓ) j = φd(j, Xℓ)/tj, and let w(ℓ) j = e Pℓ 1 s=1 u(s) j denote the weight vector. Proposition 22 ensures that u(ℓ) j ϵ, and thus eu(ℓ) j 1 + eϵ 1 ϵ u(ℓ) j 1 + (1 + ϵ)u(ℓ) j , as well as ensuring that P j w(ℓ) j u(ℓ) j φ(αI + O(ϵ)) P j w(ℓ) j . Now let Φℓ= P j C w(ℓ) j . Note that Φ1 = n, and for each ℓ 1, we have j C w(ℓ) j eu(ℓ) j X j C w(ℓ) j 1 + (1 + ϵ)u(ℓ) j X j C w(ℓ) j + (1 + ϵ) X j C w(ℓ) j u(ℓ) j Φℓ 1 + (1 + ϵ)φ(αI + O(ϵ)) Φℓeφ(αI+O(ϵ)) This recurrence relation implies that Φℓ ne(ℓ 1)φ(αI+O(ϵ)). Since w(r+1) j Φr+1, this implies r X ℓ=1 φd(j, Xℓ)/tj = ln w(r+1) j ln Φr+1 ln n + rφ(αI + O(ϵ)). or equivalently, r X r = tj ln n rφ + (αI + O(ϵ)) As r = n ln n ϵφ , we thus have Pr ℓ=1 d(j,Xℓ) r (αI + O(ϵ))tj. Thus, the distribution Ω satisfies j C ES Ω [d(j, S)] (αI + O(ϵ))tj. (9) Now the distribution Ω satisfies the condition on E[d(j, S)], but its support is too large. We can reduce the support size to |C| by moving in the nullspace of the |C| linear constraints (9). Byrka et al. (2017) have shown a 2.675+ϵ-approximation algorithm for k-median, which automatically gives a 2.675 + ϵ-approximation algorithm for k-lottery as well. Some special Approximation Algorithms for Stochastic Clustering cases of k-median have more efficient approximation algorithms. For instance, Cohen Addad et al. (2016) gives a PTAS for k-median problems derived from a planar graph, and Ahmadian et al. (2017) gives a 2.633 + ϵ-approximation for Euclidan distances. These immediately give approximation algorithms for the corresponding k-lotteries. We also note that, by Theorem 20, one cannot obtain a general approximation ratio better than 1 + 2/e (or 1 + 1/e in the SCC setting). 6. Determinizing a k-lottery Suppose that we have a set of feasible weights tj such some k-lottery distribution Ωsatisfies ES Ω[d(j, S)] tj; let us examine how to find a single, deterministic set S with d(j, S) tj. We refer to this as the problem of determinizing the lottery Ω. Note that this can be viewed as a converse to the problem considered in Section 3. We will see that, in order to obtain reasonable approximation ratios, we may need |S| to be significantly larger than k. We thus define an (α, β)-determinization to be a set S F k with k αk and d(j, S) βtj for all j C. We emphasize that we cannot necessarily obtain (1, 1)-determinizations, even with unbounded computational resources. The following simple example illustrates the tradeoffbetween parameters α and β: Observation 24 Let α, β, k 1. If β < αk+1 (α 1)k+1, there is a homogeneous SCC instance for which no (α, β)-determinization exists. Proof Let k = αk and consider a problem instance with F = C = {1, . . . , k + 1}, and d(i, j) = 1 for every distinct i, j. Clearly, every S F k satisfies minj d(j, S) = 1. When Ω is the uniform distribution on F k , we have E[d(j, S)] = 1 k k +1. Thus tj = k k +1 is feasible and therefore β 1 1 k k +1 = αk+1 (α 1)k+1. In particular, when α = 1 we must have β k + 1 and when k , we must have β α α 1. 
We examine three main regimes for the parameters (α, β): (1) the case where α, β are scale-free constants; (2) the case where β is close to one, in which case α must be of order log n; (3) the case where α = 1, in which case β must be order k. Our determinization algorithms for the first two cases will based on the following LP denoted Pexpectation, defined in terms of fractional vectors bi, ai,j where i ranges over F and j ranges over C: (A1) j C, P i F ai,jd(i, j) tj, (A2) j C, P i F ai,j = 1, (A3) i F, y C, 0 ai,j bi, (A4) i F, 0 bi 1, (A5) P i F bi k. Theorem 25 If tj is feasible, then Pexpectation has a fractional solution. Harris, Li, Pensyl, Srinivasan Proof Let Ωbe a probability distribution with E[d(j, S)] tj. For any draw S Ω, define random variable Zj to be the facility of S matched by j. Now consider the fractional vector defined by bi = Pr S Ω[i S], ai,j = Pr S Ω[Zj = i] We claim that this satisfies (A1) (A5). For (A1), we have E[d(j, S)] = E[d(j, Zj)] = X i F d(i, j) Pr[Zj = i] = X i F d(i, j)ai,j tj. For (A2), note that P i Pr[Zj = i] = 1. For (A3), note that Zj = i can only occur if i S. (A4) is clear, and (A5) holds as |S| = k with probability one. We next describe upper and lower bounds for these three regimes. 6.1. The Case Where α, β are Scale-free Constants. For this regime (with all parameters independent of problem size n and k), we may use the following Algorithm 9, which is based on greedy clustering using a solution to Pexpectation. Algorithm 9 (α, β)-determinization algorithm 1: Let a, b be a solution to Pexpectation. 2: For every j C, select rj 0 to be minimal such that P i B(j,rj) ai,j 1/α 3: By splitting facilities, form a set Fj B(j, rj) with b(Fj) = 1/α. 4: Set C = Greedy Cluster(Fj, θ(j) + rj) 5: Output solution set S = {Vj | j C }. Step (3) is well-defined, as (A3) ensures that b(B(j, rj)) P i B(j,rj) ai,j 1/α. Let us analyze the resulting approximation factor β. Proposition 26 Every client j C has rj αtj θ(j) Proof Let s = αtj θ(j) α 1 . It suffices to show that X i F,d(i,j)>s ai,j 1 1/α. As d(i, j) θ(j) for all i F, we have i F d(i,j)>s i F d(i,j)>s ai,j d(i, j) θ(j) i F ai,j d(i, j) θ(j) s θ(j) = P i F ai,jd(i, j) θ(j) P i F ai,j s θ(j) s θ(j) = 1 1/α, by (A1), (A2). Approximation Algorithms for Stochastic Clustering Theorem 27 Algorithm 9 gives an (α, β)-determinization with the following parameter β: 1. In the general setting, β = max(3, 2α 2. In the SCC setting, β = 2α α 1. Proof We first claim that the resulting set S has |S| αk. The algorithm opens at most |C | facilities. The sets Fj are pairwise disjoint for j C and b(Fj) = 1/α for j C . Thus P j C b(Fj) = |C |/α. On the other hand, b(F) = k, and so k |C |/α. Next, consider some j C; we want to show that d(j, S) βtj. By Observation 2, there is z C with Fj Fz = and θ(z)+rz θ(j)+rj. Thus d(j, S) d(z, S)+d(z, i)+d(j, i) where i Fj Fz. Step (5) ensures d(z, S) = θ(z). We have d(z, i) rz and d(i, j) rj since i Fj B(j, rj) and i Fz B(z, rz). So d(j, S) θ(z) + rz + rj 2rj + θ(j). By Proposition 26, we therefore have d(j, S) 2αtj 2θ(j) α 1 + θ(j) = 2αtj α 1θ(j) (10) This immediately shows the claim for the SCC setting where θ(j) = 0. In the general setting, for α 3, the second coefficient in the RHS of (10) is non-positive and hence the RHS is at most 2αtj α 1 as desired. 
We next describe upper and lower bounds for these three regimes.

6.1. The Case Where α, β are Scale-free Constants

For this regime (with all parameters independent of the problem size n and of k), we may use the following Algorithm 9, which is based on greedy clustering using a solution to P_expectation.

Algorithm 9 (α, β)-determinization algorithm
1: Let a, b be a solution to P_expectation.
2: For every j ∈ C, select r_j ≥ 0 to be minimal such that Σ_{i∈B(j, r_j)} a_{i,j} ≥ 1/α.
3: By splitting facilities, form a set F_j ⊆ B(j, r_j) with b(F_j) = 1/α.
4: Set C′ = GreedyCluster(F_j, θ(j) + r_j).
5: Output the solution set S = {V_j | j ∈ C′}.

Step (3) is well-defined, as (A3) ensures that b(B(j, r_j)) ≥ Σ_{i∈B(j, r_j)} a_{i,j} ≥ 1/α. Let us analyze the resulting approximation factor β.

Proposition 26 Every client j ∈ C has r_j ≤ (α t_j − θ(j))/(α − 1).

Proof Let s = (α t_j − θ(j))/(α − 1). It suffices to show that Σ_{i∈F: d(i,j)>s} a_{i,j} ≤ 1 − 1/α. As d(i, j) ≥ θ(j) for all i ∈ F, we have

Σ_{i∈F: d(i,j)>s} a_{i,j} ≤ Σ_{i∈F: d(i,j)>s} a_{i,j} (d(i, j) − θ(j))/(s − θ(j)) ≤ Σ_{i∈F} a_{i,j} (d(i, j) − θ(j))/(s − θ(j)) = ( Σ_{i∈F} a_{i,j} d(i, j) − θ(j) Σ_{i∈F} a_{i,j} ) / (s − θ(j)) ≤ (t_j − θ(j))/(s − θ(j)) = 1 − 1/α,

by (A1), (A2).

Theorem 27 Algorithm 9 gives an (α, β)-determinization with the following parameter β:
1. In the general setting, β = max(3, 2α/(α − 1)).
2. In the SCC setting, β = 2α/(α − 1).

Proof We first claim that the resulting set S has |S| ≤ αk. The algorithm opens at most |C′| facilities. The sets F_j are pairwise disjoint for j ∈ C′ and b(F_j) = 1/α for each j ∈ C′. Thus Σ_{j∈C′} b(F_j) = |C′|/α. On the other hand, b(F) ≤ k, and so |C′| ≤ αk.

Next, consider some j ∈ C; we want to show that d(j, S) ≤ β t_j. By Observation 2, there is z ∈ C′ with F_j ∩ F_z ≠ ∅ and θ(z) + r_z ≤ θ(j) + r_j. Thus d(j, S) ≤ d(z, S) + d(z, i) + d(j, i) where i ∈ F_j ∩ F_z. Step (5) ensures d(z, S) ≤ θ(z). We have d(z, i) ≤ r_z and d(i, j) ≤ r_j since i ∈ F_j ⊆ B(j, r_j) and i ∈ F_z ⊆ B(z, r_z). So d(j, S) ≤ θ(z) + r_z + r_j ≤ 2 r_j + θ(j). By Proposition 26, we therefore have

d(j, S) ≤ 2(α t_j − θ(j))/(α − 1) + θ(j) = (2α/(α − 1)) t_j + ((α − 3)/(α − 1)) θ(j).  (10)

This immediately shows the claim for the SCC setting, where θ(j) = 0. In the general setting, for α ≤ 3, the second coefficient on the RHS of (10) is non-positive and hence the RHS is at most (2α/(α − 1)) t_j, as desired. When α ≥ 3, in order for t to be feasible we must have t_j ≥ θ(j); substituting this upper bound on θ(j) into (10) gives

d(j, S) ≤ (2α/(α − 1)) t_j + ((α − 3)/(α − 1)) t_j = 3 t_j.

We note that these approximation ratios are, for α close to 1, within a factor of 2 of the lower bound of Observation 24. As α → ∞, the approximation ratio approaches the limiting value 3 (or 2 in the SCC setting).

6.2. The Case of Small β

We now consider what occurs when β becomes smaller than the critical threshold value 3 (or 2 in the SCC setting). We show that in this regime we must take α = Ω(log n). Of particular interest is the case when β approaches 1; here, in order to get β = 1 + ϵ for small ϵ, we show it is necessary and sufficient to take α = Θ((log n)/ϵ).

Proposition 28 For any ϵ < 1/2, there is a randomized polynomial-time algorithm to obtain a ((3 log n)/ϵ, 1 + ϵ)-determinization.

Proof First, let a, b be a solution to P_expectation. Define p_i = min(1, (2 log n / ϵ) b_i) for each i ∈ F and form S = DepRound(p). Observe then that

|S| ≤ ⌈Σ_i p_i⌉ ≤ ⌈(2 log n / ϵ) Σ_i b_i⌉ ≤ 1 + (2k log n)/ϵ ≤ (3k log n)/ϵ.

For j ∈ C, define A = B(j, (1 + ϵ) t_j). Let us note that, by properties (A3), (A1) and (A2), we have

Σ_{i∈A} a_{i,j} = 1 − Σ_{i: d(i,j)>(1+ϵ)t_j} a_{i,j} ≥ 1 − Σ_{i: d(i,j)>(1+ϵ)t_j} a_{i,j} d(i, j)/((1 + ϵ) t_j) ≥ 1 − 1/(1 + ϵ),

and hence, by (A3), also Σ_{i∈A} b_i ≥ 1 − 1/(1 + ϵ). So by property (P3) of DepRound, and using the bound ϵ < 1/2, we have

Pr[d(j, S) > (1 + ϵ) t_j] = Pr[A ∩ S = ∅] ≤ Π_{i∈A} (1 − p_i) ≤ Π_{i∈A} e^{−(2 log n/ϵ) b_i} ≤ e^{−(2 log n/ϵ)(1 − 1/(1+ϵ))} ≤ n^{−4/3}.

A union bound over j ∈ C shows that the solution set S satisfies d(j, S) ≤ (1 + ϵ) t_j for all j with high probability.
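The rounding scheme DepRound (Srinivasan, 2001) can be implemented by repeatedly shifting probability mass between two fractional coordinates until one becomes integral. The following is a minimal pipage-style sketch under that reading; it preserves each marginal E[S_i] = p_i and the sum Σ_i p_i, and we assume (rather than re-derive) that the negative-correlation property (P3) used above holds for this style of rounding.

```python
import random

def dep_round(p, rng=random):
    """Pipage-style dependent rounding sketch (cf. Srinivasan, 2001).

    Rounds a fractional vector p in [0,1]^F to 0/1 values. Each step picks
    two fractional coordinates and shifts mass between them so that the
    expectation of each coordinate is unchanged and at least one of the
    two becomes integral; the sum of p is preserved throughout.
    """
    p = [float(v) for v in p]
    while True:
        frac = [i for i, v in enumerate(p) if 0.0 < v < 1.0]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        up = min(1.0 - p[i], p[j])    # raise p_i, lower p_j by this amount
        dn = min(p[i], 1.0 - p[j])    # lower p_i, raise p_j by this amount
        if rng.random() < dn / (up + dn):   # keeps E[p_i], E[p_j] unchanged
            p[i] += up; p[j] -= up
        else:
            p[i] -= dn; p[j] += dn
    # At most one fractional coordinate remains; round it by its marginal.
    return [1 if (0.0 < v < 1.0 and rng.random() < v) or v >= 1.0 else 0
            for v in p]
```

In Proposition 28, this would be invoked with p_i = min(1, (2 log n/ϵ) b_i), taking S to be the set of coordinates rounded to 1.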
The following shows matching lower bounds:

Proposition 29
1. There is a universal constant K with the following property: for any k ≥ 1 and ϵ ∈ (0, 1/3), there is some integer N_{k,ϵ} such that for n > N_{k,ϵ}, there is a homogeneous SCC instance of size n in which every (α, 1+ϵ)-determinization satisfies α ≥ K (log n)/ϵ.
2. For each β ∈ (1, 2) and each k ≥ 1, there is a constant K′_{β,k} such that, for all n ≥ 1, there is a homogeneous SCC instance of size n in which every (α, β)-determinization satisfies α ≥ K′_{β,k} log n.
3. For each β ∈ (1, 3) and each k ≥ 1, there is a constant K′_{β,k} such that, for all n ≥ 1, there is a homogeneous instance of size n in which every (α, β)-determinization satisfies α ≥ K′_{β,k} log n.

Proof These three results are very similar, so we show the first one in detail and sketch the differences for the other two.

Consider an Erdős–Rényi random graph G ∼ G(n, p), where p = 3ϵ/k; note that p ∈ (0, 1). As shown by Glebov et al. (2015), asymptotically almost surely the domination number J of G satisfies J = Ω((k log n)/ϵ). We construct a related instance with F = C = [n], where d(i, j) = 1 if (i, j) is an edge, and d(i, j) = 2 otherwise. Note that if X is not a dominating set of G, then some vertex of G has distance at least 2 from it; equivalently, max_j d(j, X) ≥ 2 for every set X with |X| < J.

Chernoff's bound shows that every vertex of G has degree at least u = 0.9 n p with high probability. Assuming this event has occurred, we calculate E[d(j, S)] where S is drawn from the uniform distribution on \binom{F}{k}. Note that d(j, S) ≤ 1 if j is in or adjacent to S, and d(j, S) = 2 otherwise, so

E[d(j, S)] ≤ 1 + \binom{n−u}{k} / \binom{n}{k} ≤ 1 + e^{−0.9 p k} = 1 + e^{−2.7ϵ}.

Both the bound on the domination number and the minimum degree of G hold with positive probability for n sufficiently large (as a function of k, ϵ). In this case, t_j = 1 + e^{−2.7ϵ} is a feasible homogeneous demand vector. At the same time, every set S with |S| ≤ J − 1 satisfies max_{j∈C} d(j, S) = 2.

Thus, an (α, β)-determinization cannot simultaneously have α < J/k = Θ((log n)/ϵ) and β < 2/(1 + e^{−2.7ϵ}). Note that 2/(1 + e^{−2.7ϵ}) ≥ 1 + ϵ for ϵ < 1/3. Thus, whenever β ≤ 1 + ϵ, we have α ≥ Ω((log n)/ϵ).

For the second result, we use the same construction as above with p = 1 − (λ/2)^{1/k}, where λ = 2 − β. A similar analysis shows that the vector t_j = 1 + λ/2 is feasible with high probability and that J ≥ Ω(k log n) (where the hidden constant may depend upon β and k). Thus, unless α ≥ Ω(log n), the approximation ratio achieved is at least 2/(1 + λ/2) ≥ β.

The third result is similar to the second one, except that we use a random bipartite graph. The left-nodes are associated with F and the right-nodes with C. For i ∈ F and j ∈ C, we define d(i, j) = 1 if (i, j) is an edge and d(i, j) = 3 otherwise.

6.3. The Case of α = 1

We finally consider the case α = 1, that is, where the constraint on the number of open facilities is respected exactly. By Observation 24, we must have β ≥ k + 1 here. The following greedy algorithm gives a (1, k+2)-determinization, nearly matching this lower bound.

Algorithm 10 (1, k+2)-determinization algorithm
1: Initialize S = ∅
2: for ℓ = 1, . . . , |F| do
3:   Let C_ℓ denote the set of points j ∈ C with d(j, S) > (k + 2) t_j
4:   If C_ℓ = ∅, then return S.
5:   Select the point j_ℓ ∈ C_ℓ with the smallest value of t_{j_ℓ}.
6:   Update S ← S ∪ {V_{j_ℓ}}

Theorem 30 If the values t_j are feasible, then Algorithm 10 outputs a (1, k+2)-determinization in O(|F||C|) time.

Proof For the runtime bound, we first compute V_j for each j ∈ C; this requires O(|F||C|) time upfront. When we update S at each iteration ℓ, we update and maintain the quantities d(j, S) by computing d(j, V_{j_ℓ}) for each j ∈ C. This takes O(|C|) time per iteration.

To show correctness, note that if this procedure terminates at iteration ℓ, we have C_ℓ = ∅ and so every point j ∈ C has d(j, S) ≤ (k + 2) t_j. The resulting set S at this point has cardinality at most ℓ − 1. So we need to show that the algorithm terminates before reaching iteration ℓ = k + 2.

Suppose not; let the resulting points be j_1, . . . , j_{k+1}, and for each ℓ = 1, . . . , k + 1 let w_ℓ = t_{j_ℓ}. Because j_ℓ is selected to minimize t_{j_ℓ}, we have w_1 ≤ w_2 ≤ · · · ≤ w_{k+1}. Now, let Ω be a k-lottery satisfying E_{S∼Ω}[d(j, S)] ≤ t_j for every j ∈ C, and consider the random process of drawing S from Ω. Define the random variable D_ℓ = d(j_ℓ, S) for ℓ = 1, . . . , k + 1.

For any such S, by the pigeonhole principle there must exist some pair j_ℓ, j_{ℓ′} with 1 ≤ ℓ < ℓ′ ≤ k + 1 which are both matched to a common facility i ∈ S, that is,

D_ℓ = d(j_ℓ, S) = d(j_ℓ, i),  D_{ℓ′} = d(j_{ℓ′}, S) = d(j_{ℓ′}, i).

By the triangle inequality,

d(j_{ℓ′}, V_{j_ℓ}) ≤ d(j_{ℓ′}, i) + d(i, j_ℓ) + d(j_ℓ, V_{j_ℓ}) = D_{ℓ′} + D_ℓ + θ(j_ℓ).

On the other hand, j_{ℓ′} ∈ C_{ℓ′} and yet V_{j_ℓ} was in the partial solution set S seen at iteration ℓ′. Therefore, it must be that

d(j_{ℓ′}, V_{j_ℓ}) > (k + 2) t_{j_{ℓ′}} = (k + 2) w_{ℓ′}.

Putting these two inequalities together, we have shown that D_ℓ + D_{ℓ′} + θ(j_ℓ) > (k + 2) w_{ℓ′}. As θ(j_ℓ) ≤ w_ℓ ≤ w_{ℓ′} (feasibility of t requires θ(j) ≤ t_j), this implies that

D_ℓ/w_ℓ + D_{ℓ′}/w_{ℓ′} ≥ (D_ℓ + D_{ℓ′})/w_{ℓ′} > ((k + 2) w_{ℓ′} − θ(j_ℓ))/w_{ℓ′} ≥ ((k + 2) w_{ℓ′} − w_{ℓ′})/w_{ℓ′} = k + 1.

We have shown that, with probability one, there is some pair ℓ < ℓ′ satisfying the inequality D_ℓ/w_ℓ + D_{ℓ′}/w_{ℓ′} > k + 1. Therefore, with probability one it holds that

Σ_{ℓ=1}^{k+1} D_ℓ/w_ℓ > k + 1.  (11)

But now take expectations, observing that E[D_ℓ] = E[d(j_ℓ, S)] ≤ t_{j_ℓ} = w_ℓ. So the LHS of (11) has expectation at most k + 1. This is a contradiction.

We remark that it is possible to obtain an optimal (1, k+1)-determinization algorithm for the SCC or homogeneous settings, but we omit this since it is very similar to Algorithm 10.
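For concreteness, a direct transcription of Algorithm 10 follows, under the paper's conventions that V[j] is a closest facility to client j (so d(j, V[j]) = θ(j)); the function name and data layout are illustrative.

```python
def determinize_1_kplus2(facilities, clients, d, t, V, k):
    """Sketch of Algorithm 10: a (1, k+2)-determinization.

    d[j][i]: metric distance between client j and facility i;
    t[j]: feasible demand value; V[j]: a closest facility to client j.
    """
    S = set()
    dS = {j: float('inf') for j in clients}   # maintained values d(j, S)
    for _ in range(len(facilities)):
        # C_l: clients whose guarantee (k+2) * t_j is still violated
        C_l = [j for j in clients if dS[j] > (k + 2) * t[j]]
        if not C_l:
            break
        jl = min(C_l, key=lambda j: t[j])     # smallest demand value first
        S.add(V[jl])
        for j in clients:                     # O(|C|) incremental update
            dS[j] = min(dS[j], d[j][V[jl]])
    return S
```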
7. Acknowledgments

Our sincere thanks to Brian Brubach and to the anonymous referees for many useful suggestions and for helping to tighten the focus of the paper.

Appendix A. Proof of Pseudo-Theorem 17

We would like to use Lemma 16 to bound E[d(j, S)], over all possible integer values m ≥ 1 and over all possible sequences 1 ≥ u_1 ≥ u_2 ≥ · · · ≥ u_t ≥ 0. One technical obstacle here is that this is not a compact space, due to the unbounded dimension t and the unbounded parameter m. The next result removes these restrictions.

Proposition 31 For any fixed integers L, M ≥ 1, and every j ∈ C, we have

E[d(j, S)] ≤ 1 + max_{m ∈ {1,...,M}, 1 ≥ u_1 ≥ ··· ≥ u_L ≥ 0} E_Q[ R̂(m, u_1, u_2, . . . , u_L) ],

where we define

α = e^{−Q_p u_L} Π_{ℓ=1}^{L−1} (1 − Q_p(u_ℓ − u_{ℓ+1})),

β = Π_{ℓ=1}^{L−1} (1 − u_ℓ + Q_p u_{ℓ+1}) × { 1 − u_L if u_L ≤ 1 − Q_p;  Q_p e^{(1−u_L)/Q_p − 1} if u_L > 1 − Q_p },

R̂(m, u_1, . . . , u_L) = { (1 − Q_f u_1/m)^m α + (Q_f (1 − u_1/m))^m β if m < M;  e^{−Q_f u_1} α + Q_f^M e^{−u_1} β if m = M }.

The expectation E_Q is taken only over the randomness involved in Q_f, Q_p.

Proof By Lemma 16,

E[d(j, S) | Q_f, Q_p] ≤ 1 + (1 − Q_f u_1/m)^m Π_{ℓ=1}^{t} (1 − Q_p(u_ℓ − u_{ℓ+1})) + (Q_f (1 − u_1/m))^m Π_{ℓ=1}^{t} (1 − u_ℓ + Q_p u_{ℓ+1}),

where u_1, . . . , u_t and m are defined as in Lemma 16; in particular, 1 ≥ u_1 ≥ u_2 ≥ · · · ≥ u_t ≥ u_{t+1} = 0 and m ≥ 1. If we define u_ℓ = 0 for all integers ℓ > t, then the same bound holds with both products extended over all ℓ ≥ 1. The terms corresponding to ℓ ≥ L telescope, so we estimate these as:

Π_{ℓ≥L} (1 − Q_p(u_ℓ − u_{ℓ+1})) ≤ Π_{ℓ≥L} e^{−Q_p(u_ℓ − u_{ℓ+1})} = e^{−Q_p u_L},

Π_{ℓ≥L} (1 − u_ℓ + Q_p u_{ℓ+1}) ≤ (1 − u_L + Q_p u_{L+1}) Π_{ℓ≥L+1} e^{−u_ℓ + Q_p u_{ℓ+1}} ≤ (1 − u_L + Q_p u_{L+1}) e^{−u_{L+1}}.

Now consider the expression (1 − u_L + Q_p u_{L+1}) e^{−u_{L+1}} as a function of u_{L+1} in the range u_{L+1} ∈ [0, u_L]. Elementary calculus shows that it satisfies the bound

(1 − u_L + Q_p u_{L+1}) e^{−u_{L+1}} ≤ { 1 − u_L if u_L ≤ 1 − Q_p;  Q_p e^{(1−u_L)/Q_p − 1} if u_L > 1 − Q_p }.

Putting these bounds together yields

E[d(j, S) | Q_f, Q_p] ≤ 1 + (1 − Q_f u_1/m)^m α + (Q_f (1 − u_1/m))^m β.

If m < M we are done. Otherwise, for m ≥ M, we upper-bound the Q_f terms as:

(1 − Q_f u_1/m)^m ≤ e^{−Q_f u_1},  (Q_f (1 − u_1/m))^m ≤ Q_f^M e^{−u_1}.

In light of Proposition 31, we can get an upper bound on E[d(j, S)] by maximizing R̂. We now discuss how to bound R̂ for a fixed choice of L, M, where we select Q_f, Q_p according to the following type of distribution:

(Q_f, Q_p) = { (γ_{0,f}, 0) with probability p;  (γ_{1,f}, γ_{1,p}) with probability 1 − p }.

Note that we are setting γ_{0,p} = 0 here in order to keep the search space more manageable. We need to upper-bound the function E_Q[ R̂(m, u_1, . . . , u_L) ] for fixed values γ, where m and u range over the compact domain m ∈ {1, . . . , M}, 1 ≥ u_1 ≥ · · · ≥ u_L ≥ 0. The most straightforward way to do so would be to divide (u_1, . . . , u_L) into intervals of size ϵ, and then calculate upper bounds on R̂ within each interval and for each m. This process would have a runtime of order M ϵ^{−L}, which is too expensive.

But suppose we have fixed u_j, . . . , u_L, and we wish to continue to enumerate over u_1, . . . , u_{j−1}. To compute R̂(m, u_1, . . . , u_L) as a function of m, u_1, . . . , u_j, observe that we do not need to know all the values u_{j+1}, . . . , u_L, but only the following four summary statistics of them:

1. a_1 = e^{−γ_{1,p} u_L} Π_{ℓ=j}^{L−1} (1 − γ_{1,p}(u_ℓ − u_{ℓ+1})),
2. a_2 = Π_{ℓ=j}^{L−1} (1 − u_ℓ + γ_{1,p} u_{ℓ+1}) × { 1 − u_L if u_L ≤ 1 − γ_{1,p};  γ_{1,p} e^{(1−u_L)/γ_{1,p} − 1} if u_L > 1 − γ_{1,p} },
3. a_3 = e^{−u_L} Π_{ℓ=j}^{L−1} (1 − (u_ℓ − u_{ℓ+1})),
4. a_4 = u_{j+1}.

We thus use a dynamic program wherein we track, for j = L, . . . , 1, all possible values for the tuple (a_1, . . . , a_4). Furthermore, since R̂ is a monotonic function of a_1, a_2, a_3, a_4, we only need to store the maximal tuples (a_1, . . . , a_4). The resulting search space has size O(ϵ^{−3}).

We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^{−12}, L = 7. This runs in about an hour on a single CPU core.
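For illustration, here is a small Python sketch (ours, not the authors' C implementation) of the pruning step that keeps the dynamic-programming state small by discarding dominated statistic tuples.

```python
def prune_to_maximal(tuples):
    """Keep only coordinatewise-maximal tuples (a1, a2, a3, a4).

    Since R-hat is monotone in the summary statistics, any tuple that is
    dominated in every coordinate by another tuple can never achieve the
    maximum, so it may be dropped from the dynamic-programming state.
    """
    maximal = []
    for x in tuples:
        if any(all(xc <= yc for xc, yc in zip(x, y)) and y != x
               for y in tuples):
            continue   # x is dominated by some other tuple y
        maximal.append(x)
    return maximal
```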
With some additional tricks, we can also optimize over the parameter p ∈ [0, 1] while still keeping the state space bounded by O(ϵ^{−3}). Note that, due to the complexity of the dynamic programming algorithm, the computations were carried out in double-precision floating-point arithmetic. The rounding errors were not tracked precisely, and it would be difficult to write completely correct code to do so. We believe that these errors should be orders of magnitude below the third decimal place, and that the computed value 1.592 is a valid upper bound.

References

Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In Proc. 58th annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 61-72, 2017.

Sharareh Alipour and Amir Jafari. Improvements on the k-center problem for uncertain data. In Proc. 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 425-433, 2018.

Ian Ayres, Mahzarin Banaji, and Christine Jolls. Race effects on eBay. The RAND Journal of Economics, 46(4):891-917, 2015. ISSN 1756-2171. doi: 10.1111/1756-2171.12115. URL http://dx.doi.org/10.1111/1756-2171.12115.

Emily Badger. How Airbnb plans to fix its racial-bias problem. Washington Post, September 8, 2016. URL https://www.washingtonpost.com/news/wonk/wp/2016/09/08/how-airbnb-plans-to-fix-its-racial-bias-problem/.

Marianne Bertrand and Sendhil Mullainathan. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4):991-1013, 2004. doi: 10.1257/0002828042002561. URL http://www.aeaweb.org/articles.php?doi=10.1257/0002828042002561.

Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. ACM Transactions on Algorithms, 13(2):23:1-23:31, 2017.

Moses Charikar and Shi Li. A dependent LP-rounding approach for the k-median problem. In Automata, Languages, and Programming, pages 194-205, 2012.

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems 30, pages 5036-5044, 2017.

Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. In Proc. 57th annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 353-364, 2016.

Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination. In Proceedings on Privacy Enhancing Technologies, pages 92-112, 2015.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proc. 3rd Innovations in Theoretical Computer Science Conference (ITCS), pages 214-226, 2012.

Laura P. Edgerley, Yasser Y. El-Sayed, Maurice L. Druzin, Michaela Kiernan, and Kay I. Daniels. Use of a community mobile health van to increase early access to prenatal care. Maternal and Child Health Journal, 11(3):235-239, 2007.

Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proc. 23rd ACM Symposium on Computational Geometry (SOCG), pages 11-18, 2007.
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259-268, 2015.

Roman Glebov, Anita Liebenau, and Tibor Szabó. On the concentration of the domination number of the random graph. SIAM Journal on Discrete Mathematics, 29(3):1186-1206, 2015. doi: 10.1137/12090054X. URL https://doi.org/10.1137/12090054X.

Sudipto Guha and Samir Khuller. Greedy strikes back: improved facility location algorithms. Journal of Algorithms, 31(1):228-248, 1999.

Sepali Guruge, Jo Anne Hunter, Keegan Barker, Mary Jane McNally, and Lilian Magalhães. Immigrant women's experiences of receiving care in a mobile health clinic. Journal of Advanced Nursing, 66(2):350-359, 2010.

David G. Harris, Thomas Pensyl, Aravind Srinivasan, and Khoa Trinh. Dependent rounding for knapsack/partition constraints and facility location. arXiv, abs/1709.06995, 2017.

David G. Harris, Thomas Pensyl, Aravind Srinivasan, and Khoa Trinh. A lottery model for center-type problems with outliers. ACM Transactions on Algorithms, 15(3):Article #36, 2019.

Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33(3):533-550, 1986. doi: 10.1145/5925.5933. URL http://doi.acm.org/10.1145/5925.5933.

Lingxiao Huang and Jian Li. Stochastic k-center and j-flat-center problems. In Proc. 28th annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 110-129, 2017.

Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier, and E. Glen Weyl. Should we treat data as labor? Moving beyond "free". American Economic Association Papers & Proceedings, 1(1), 2018.

Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proc. 34th annual ACM Symposium on Theory of Computing (STOC), pages 731-740, 2002.

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2-3):89-112, 2004. doi: 10.1016/j.comgeo.2004.03.003. URL https://doi.org/10.1016/j.comgeo.2004.03.003.

David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105-147, 2015.

Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Proc. 50th annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 646-659, 2018.

Jaron Lanier. Who Owns the Future? Simon & Schuster, 2014.

Jiří Matoušek. On approximate geometric k-clustering. Discrete & Computational Geometry, 24(1):61-84, 2000. doi: 10.1007/s004540010019. URL https://doi.org/10.1007/s004540010019.

John F. McCarthy, Frederic C. Blow, Marcia Valenstein, Ellen P. Fischer, Richard R. Owen, Kristen L. Barry, Teresa J. Hudson, and Rosalinda V. Ignacio. Veterans Affairs health system and mental health treatment retention among patients with serious mental illness: evaluating accessibility and availability barriers. Health Services Research, 42(3p1):1042-1060, 2007.

Cathleen Mooney, Jack Zwanziger, Ciaran S. Phibbs, and Susan Schmitt. Is travel distance a barrier to veterans' use of VA hospitals for medical surgical care? Social Science & Medicine, 50(12):1743-1755, 2000.
Dana Moshkovitz. The projection games conjecture and the NP-hardness of ln n-approximating set-cover. Theory of Computing, 11(7):221-235, 2015.

David B. Reuben, Lawrence W. Bassett, Susan H. Hirsch, Catherine A. Jackson, and Roshan Bastani. A randomized clinical trial to assess the benefit of offering on-site mobile mammography in addition to health education for older women. American Journal of Roentgenology, 179(6):1509-1514, 2002.

Susan K. Schmitt, Ciaran S. Phibbs, and John D. Piette. The influence of distance on utilization of outpatient mental health aftercare following inpatient substance abuse treatment. Addictive Behaviors, 28(6):1183-1192, 2003.

Kevin A. Schulman, Jesse A. Berlin, William Harless, Jon F. Kerner, Shyrl Sistrunk, Bernard J. Gersh, Ross Dubé, Christopher K. Taleghani, Jennifer E. Burke, Sankey Williams, John M. Eisenberg, William Ayers, and José J. Escarce. The effect of race and sex on physicians' recommendations for cardiac catheterization. New England Journal of Medicine, 340(8):618-626, 1999.

Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms. In Proc. 42nd annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 588-597, 2001.

Richard S. Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In Proc. 30th International Conference on Machine Learning (ICML), pages 325-333, 2013.