Journal of Machine Learning Research 24 (2023) 1-46  Submitted 6/22; Published 1/23

Robust Load Balancing with Machine Learned Advice

Sara Ahmadian (sahmadian@google.com), Google Research, New York, NY 10011, USA
Hossein Esfandiari (esfandiari@google.com), Google Research, New York, NY 10011, USA
Vahab Mirrokni (mirrokni@google.com), Google Research, New York, NY 10011, USA
Binghui Peng (bp2601@columbia.edu), Columbia University, New York, NY 10027, USA

Editor: Sebastien Bubeck

Abstract

Motivated by the exploding growth of web-based services and the importance of efficiently managing the computational resources of such systems, we introduce and study a theoretical model for load balancing of very large databases such as commercial search engines. Our model is a more realistic version of the well-received balls-into-bins model, with an additional constraint that limits the number of servers that carry each piece of the data. This additional constraint is necessary when, on one hand, the data is so large that we cannot copy all of it onto each server. On the other hand, the query response time is so limited that we cannot ignore the fact that the number of queries for each piece of the data changes over time, and hence we cannot simply split the data over different machines. In this paper, we develop an almost optimal load balancing algorithm that works given an estimate of the load of each piece of the data. Our algorithm is almost perfectly robust to wrong estimates, to the extent that even when all of the loads are adversarially chosen the performance of our algorithm is 1 − 1/e, which is provably optimal. Along the way, we develop various techniques for analyzing the balls-into-bins process under certain correlations and build a novel connection with the multiplicative weights update scheme.

Keywords: online algorithms, learning augmented algorithm design, machine learned prediction, load balancing, balls and bins
1. Introduction

Due to the rapid growth of serving demand, web-based services face challenging resource allocation problems in their data centers. Driven by the requirement of sub-second response latency, data centers replicate the data across distributed machines to accommodate serving queries in parallel. With a massive number of real-time queries to serve on a daily basis, load balancing becomes a critical challenge in resource management and lies at the core of distributed system design.

©2023 Sara Ahmadian, Hossein Esfandiari, Vahab Mirrokni, Binghui Peng. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/22-0629.html.

The balls-into-bins paradigm is the most fundamental model for load balancing in real-time distributed systems (Richa et al., 2001). In the classical balls-into-bins problem, we model the real-time requests as balls and the servers as bins. In each round, a memory-less allocation algorithm places the incoming ball into one of the bins. The goal is to balance the number of balls assigned to the bins, commonly referred to as loads. Constrained by the latency requirement, the algorithm is only allowed to look at the loads of a few bins. The most well-established result in the context of balls-into-bins is the power of two choices or, more generally, the power of d choices: the algorithm looks at d bins chosen uniformly at random in each round and assigns the ball to the bin with the minimum load. The maximum load over all bins is then bounded by T/n + O(log log n / log d) (Berenbrink et al., 2006), where n is the number of bins and T is the number of balls.
The classical formulation of balls-into-bins is extensively studied in the literature (Azar et al., 1994; Vöcking, 2003; Berenbrink et al., 2006; Talwar and Wieder, 2007; Godfrey, 2008; Dahlgaard et al., 2016; Mirrokni et al., 2018; Aamand et al., 2021; Richa et al., 2001), as it captures several applications including hashing, shared memory emulation and job allocation. However, in modern distributed service systems such as web search engines, there is a significant gap between the theory and practice. In these modern applications, since the size of the whole data is very large, each piece of the data (e.g., one dataset) is only replicated across a few machines, and the replication is fixed over time. As a consequence, a query can only be addressed on servers that hold a copy of its corresponding dataset. This challenges the common assumption that one can choose an arbitrary machine to process a request. Hence, two fundamental questions arise:

Q1. How do we replicate the data across servers?
Q2. When a real-time query appears, how should it be assigned?

Given the emerging popularity of web-based services, we believe it is crucial to address these questions and bridge the gap left by the classical theory. In this work, we introduce a new balls-into-bins model that captures the practical issues arising in such distributed systems and develop a near optimal algorithm with machine learned advice.

1.1 Models and motivations

We first provide a high-level overview of practical distributed service systems, abstract the model and explain the motivation.

Distributed service system. A modern distributed service system can be abstracted into three layers. The first layer consists of end users, the second layer consists of machines called clients, and the third layer consists of machines called servers. We describe the layers from third to first. The whole data is partitioned into a set of (non-intersecting) datasets.
These datasets lie on the servers, where each server contains a few of the datasets. Note that, even though the datasets are non-intersecting, each dataset may lie on multiple servers. Queries to the datasets are served on the servers. Serving the queries is the time-consuming procedure, and we intend to balance this load. For each dataset D there is one machine, called a client [1], that is in charge of redirecting the queries to dataset D to one of the servers that contains a copy of that dataset. The clients are in charge of balancing the loads on the servers. Clients can observe the current loads of the servers to decide which server should serve a query. However, there are only a handful of servers that carry each dataset, and a query to a dataset must be served by a server that carries that dataset. A user may request to perform a query to a dataset D. The query will be sent to the specific client that is associated with the dataset D, to be redirected to a server.

Load balancing as a two-stage optimization problem. We face two optimization tasks in sequence. The first optimization problem is to decide which dataset to put on which server. This decision should be made before the system goes live. The second optimization problem is to decide how to allocate each query to one of the servers that can serve that query. This optimization problem deals with real-time queries, when the system is live. Next, we provide more details on these two stages. Recall that the data is partitioned into a set of (non-intersecting) datasets, and we intend to keep a copy of each dataset on only a handful of servers. We use the parameter d to refer to the upper bound on the number of servers that carry each dataset [2]. We call this upper bound the budget constraint. We can represent this with a bipartite graph between clients and servers, where an edge between a client A and a server B means that server B contains a copy of the dataset corresponding to client A.
Hence, each client can send its queries to its neighboring servers in the bipartite graph. The construction of this graph should be done before the system goes live, and it is fixed throughout the real-time service. We refer to this as the network design problem. In the second stage, the queries come in a real-time fashion and the clients assign each query to one of the servers that carry the query's corresponding dataset. Our goal in this stage is to balance the load of the servers. The clients can use the information of the previous queries to make the assignment; however, they are not aware of the future queries. We refer to this as the online load balancing problem.

Modeling the queries with ML advice. There are two common approaches to model the pattern of real-time queries. The first approach, a.k.a. the stochastic model (Feldman et al., 2009; Correa et al., 2017; Gupta et al., 2019; Huang and Shu, 2021), assumes that queries are generated by an i.i.d. distribution over the space of all queries. The second approach, all the way on the other end of the spectrum, makes no assumption on the arrival of the queries and expects the worst-case sequence of queries, a.k.a. the adversarial model (Karp et al., 1990; Cohen et al., 2019; Huang et al., 2020). We consider a general model that captures both the adversarial and the stochastic model. Indeed, we intend to design an algorithm that takes a distribution as a signal, and performs similarly to the best stochastic algorithm when the input matches the signal. Moreover, even if the input is totally adversarial, the algorithm is still robust and performs similarly to the best adversarial algorithm. We interpolate the performance of our algorithm based on an unknown parameter λ ∈ [0, 1] that indicates the similarity of the query sequence to the signal. This model is called algorithm design with ML advice and has drawn a lot of attention in recent years (Lykouris and Vassilvtiskii, 2018; Rohatgi, 2020; Boyar et al., 2016; Indyk et al., 2019).

[1] We call these machines clients because servers are in charge of responding to these machines. In other words, these are the clients to the servers.
[2] Note that, unlike our model, in the original balls-into-bins problem each server should be able to process all queries, and hence all of the datasets should be copied on all servers.

1.2 Our Results

We present a near optimal algorithm for the load balancing problem. The performance of our algorithm is measured w.r.t. the offline optimal solution, which is defined as the solution that minimizes the maximum load of the servers on the actual load distribution.

Theorem 1 Suppose ϵ > 0, d = Ω(log n/ϵ² + 1/ϵ⁴), and the estimation error is bounded by λ ∈ [0, 1]. There is a randomized network construction and online load balancing algorithm such that, with probability at least 1 − 1/poly(n),
• The maximum load on servers is always O(1) approximate to the offline optimal.
• When T = Ω(n² log n/ϵ²), the maximum load on servers is (1/(1 − λ exp(−λ)) + ϵ) approximate to the offline optimal.
Moreover, no algorithm is capable of achieving a (1/(1 − λ exp(−λ)) − ϵ) approximation, even when all online requests are revealed at the beginning of the second stage.

Remark 2 Based on the total number of balls (requests), there are two natural regimes of the balls-into-bins problem. In the small load regime, the total number of balls (requests) is T = Θ(n), and the hope is to get a constant (or near constant) approximation, e.g. (Richa et al., 2001; Vöcking, 2003). A simple inductive argument extends this constant (or near constant) approximation to all ranges of T. In the large load, or asymptotic, regime, the hope is to get a (1 + o(1))-approximation (Mitzenmacher and Upfal, 2005), or an additive approximation (Berenbrink et al., 2006), as T → ∞.
In particular, this implies that the load balancing algorithm converges to an optimal allocation asymptotically.

Remark 3 As we will explain soon, the power of d-choices algorithm still guarantees a (1 + o(1)) approximation asymptotically in the online load balancing stage (Theorem 5). What prevents us from getting a (1 + o(1)) approximation is the network construction stage, where inaccurate estimation, or adversarial loads, imposes information theoretic barriers (see Theorem 4).

Our algorithm consists of two subroutines. First, we give a network construction algorithm that receives an estimation of future loads as the input and outputs a bipartite graph of clients and servers with a near optimal approximation guarantee. Since no online requests are revealed in the network construction stage, the performance of the algorithm is defined as the optimal value one can achieve with the bipartite graph, which is clearly the best one can hope for in the later online stage; see Section 2 for details.

Theorem 4 (Informal) For any ϵ > 0, suppose d ≥ Ω(log n/ϵ² + 1/ϵ⁴) and the estimation error is bounded by λ ∈ [0, 1]. There is a randomized network construction algorithm such that, with probability 1 − 1/poly(n), it achieves a (1/(1 − λ exp(−λ)) + ϵ) approximation to the offline optimal. The approximation is tight, as no algorithm can achieve a (1/(1 − λ exp(−λ)) − ϵ) approximation.

The second subroutine is online load balancing, where we use the power of d choices (abbreviated as d-choice) algorithm. Fixing the bipartite graph constructed by the algorithm of Theorem 4, we prove the d-choice algorithm is a constant approximation in the small load regime and a (1 + o(1)) approximation asymptotically.

Theorem 5 (Informal) Let d = Ω(log n).
Suppose the bipartite graph is constructed by the algorithm of Theorem 4, and we assign online requests using the d-choice algorithm. Then, with probability at least 1 − 1/poly(n),
• The maximum load on servers is always constant approximate to the offline optimal.
• When the number of requests is T = Ω(n² log n/ϵ²), the maximum load on servers is (1 + ϵ) approximate to the optimal allocation on the graph.

1.3 Related works

Load balancing has been at the core of distributed system design, attracting extensive research effort over the past decades (Phillips and Westbrook, 1993; Azar et al., 1994; Azar, 1998; Richa et al., 2001; Im et al., 2018; Azar et al., 2018; Ahmadian et al., 2021). Our work is most closely related to the balls-into-bins paradigm (Azar et al., 1994; Richa et al., 2001; Vöcking, 2003), which is the most well-received model for real-time query load balancing.

Balls-into-bins. The balls-into-bins paradigm has been extensively studied in the literature (Azar et al., 1994; Richa et al., 2001; Vöcking, 2003; Berenbrink et al., 2006; Talwar and Wieder, 2007; Godfrey, 2008; Dahlgaard et al., 2016; Aamand et al., 2018; Chen, 2019; Lenzen et al., 2019; Mirrokni et al., 2018; Aamand et al., 2021). Azar et al. (1994) initiated this line of work and proved that the d-choice algorithm gives an approximation of O(log log n / log d). Vöcking (2003) showed that by adopting an asymmetric tie-breaking rule and a non-uniform selection of choices, the maximum load can be further reduced to O(log log n / d). Subsequent works (Berenbrink et al., 2006; Talwar and Wieder, 2007) generalized the result to weighted loads and additive approximations. In contrast with these works, such a (smooth) dependence on d is not achievable in our setting. Since the maximum load of clients is Ω(log n / log log n) w.h.p. even for the uniform load distribution, one cannot do better than Ω̃(log n / (d log log n)), and the only interesting case is d = Ω(log n).
The work of Godfrey (2008) is most relevant to us; it considers the case that bins are not chosen uniformly at random and proves that as long as some balance conditions are satisfied, one can achieve an O(1) approximation when d = Ω(log n). The technique of Godfrey (2008) can only be applied to our setting when the load distribution is uniform. For detailed coverage of the literature, we refer the interested reader to the comprehensive survey of Richa et al. (2001).

Machine learned advice. There is a formidable trend of incorporating machine learned advice into algorithm design (Indyk et al., 2019), which tries to close the gap between traditional algorithm design that deals with worst-case inputs, and machine learning algorithms that require perfect stochastic assumptions. In this context, we design an algorithm given some information (a.k.a. advice) that can be learned from a portion of the input, and we want the algorithm to be robust to errors in the advice. There is a growing number of interesting results in this model for different problems, including k-server (Böckenhauer et al., 2017), online set cover (Dobrev et al., 2017), online scheduling (Lattanzi et al., 2020), bloom filters (Mitzenmacher, 2018), caching (Lykouris and Vassilvtiskii, 2018; Rohatgi, 2020), and many others (Purohit et al., 2018; Boyar et al., 2016; Mitzenmacher and Vassilvitskii, 2020; Gollapudi and Panigrahi, 2019; Balcan et al., 2018; Bamas et al., 2020b,a; Wei and Zhang, 2020).

1.4 Technique overview

We now provide a streamlined overview of our techniques.

Network construction. We start from the network construction subroutine. The problem is challenging, as the estimated load has no accuracy guarantee and the adversary is allowed to make arbitrary perturbations within bounded TV distance. Moreover, even when the actual load distribution is revealed, the computational task is still NP-hard (see Theorem 13).
Existing literature on robust algorithm design with ML advice often offers a robustness-consistency guarantee, but most of the time the guarantees are not optimal, partly due to the seemingly contradictory fact that an algorithm needs to utilize the estimation to achieve high performance, but at the same time it is required to ensure that the performance does not degrade much when the estimation is incorrect. We start with some simple yet crucial observations. On one side, if the estimation is completely wrong (i.e. the adversarial case), a natural idea is to randomly match clients with servers. Through a cute argument, one can show this is in fact e/(e − 1) approximate to the optimal when d = Ω(log n). On the other end of the spectrum, if the estimation is accurate, one can come up with a greedy solution that achieves a (d + 1)/d approximation. We combine these ideas and propose the Randomized greedy algorithm, which achieves (near-)optimal approximation guarantees for all ranges of estimation errors. The Randomized greedy algorithm first greedily assigns clients to servers. For clients with leftover budget, it randomly distributes them by connecting them to randomly chosen servers. The key challenges lie in the analysis of this simple strategy. First, an important byproduct of the greedy assignment is that the estimated load of any subset of clients roughly equals the number of connected servers, up to some scaling factors. We refer to this as the volumetric property. Next, the random assignment induces a bipartite graph with good expansion properties. As one can show that the expansion of the graph provides an upper bound on the approximation ratio, combining these two facts already provides a worst-case performance guarantee. The volumetric property and the expansion property abstract the intuition of why Randomized greedy works well in the two extreme cases. Generalizing this to all ranges of (adversarial) perturbation is the most intriguing part of our proof, and it requires some delicate analysis.
We outline the intuition behind the proof. For any subset of clients, if the estimated load is small, then the actual load cannot be too large, as the TV distance between the estimated and the actual load is bounded. The expansion property already guarantees reasonable performance in this case. If the estimated load is large, by the volumetric property we know that the number of connected servers should be larger than normal, which again guarantees reasonable performance. The full proof follows this intuition, but with some extra technical effort, mainly coming from the fact that the volumetric property holds only for lightly loaded clients.

Online load balancing. The second subroutine is online load balancing, and we analyze the performance of the d-choice algorithm. In contrast to most of the existing literature on the balls-into-bins paradigm, the neighboring servers (a.k.a. the choice of bins) of each online request (a.k.a. the ball) are neither random nor independent in our context. They are not (uniformly at) random because the greedy allocation of the first subroutine is deterministic and driven by the estimated load, and this is inevitable in general for any algorithm that utilizes the ML advice. They are not independent because the bipartite graph is fixed in advance, and if two requests appear on the same client, they share the same set of neighboring servers. We note that there is existing literature dealing with correlations between the choices of servers (Godfrey, 2008; Dahlgaard et al., 2016; Aamand et al., 2018), but their methods do not fit our context. When the load is small (i.e. T = Θ(n)), the key idea to overcome the above challenges is to separately analyze the initialization and the online process. We perform a fine-grained analysis of the initialization, showing that it gives a warm start to the online process most of the time.
Combined with the classic layered-induction technique for analyzing random processes, we are able to prove a constant approximation. Concretely, we consider the load score of each server, defined as the expected number of requests that the server receives if all clients assign online requests uniformly at random. A load score could be large (e.g. Ω(log n / log log n) with non-negligible probability), but the crucial observation is that the load scores sum to n, and hence they are small on average. By a simple Markov inequality, one can show that at least half of the servers have load score less than 2, and the key challenge is to show concentration and prove that every client has at least d/2 assigned servers of low load score. This is non-trivial, as the load scores are not independent, not to mention the (deterministic) greedy allocation. We separately analyze the greedy allocation and the random allocation, and the intuition is that (1) the greedy allocations have limited overlaps, and thus contribute little to the load score; and (2) the load scores are negatively associated, which allows one to use standard concentration inequalities. The fine-grained analysis of the initialization condition is unique to our context, but we believe the idea could be of independent interest and useful for other balls-into-bins problems. For the online allocation process, given that each client is connected to at least d/2 low-load-score servers, we apply the layered-induction technique to inductively prove that every client always has at least d/4 empty connected servers during the first αn requests, where α is a small constant that only depends on the actual load distribution p. By a standard argument, this can be generalized to n requests, which yields the constant approximation guarantee. When the load is large, we prove that after T = Ω(n² log n/ϵ²) requests, the d-choice algorithm leads to a (1 + ϵ) approximately optimal allocation.
Instead of conducting a probabilistic analysis of the stochastic process (as is usually done in the balls-into-bins literature), we discover an elegant connection between the power of d-choices algorithm and the multiplicative weights update (MWU) scheme under the i.i.d. arrival model. Based on this connection, we consider a two-player zero-sum game between the allocation algorithm and a load player, where the allocation algorithm seeks to minimize the load whereas the load player tries to maximize it. One can view the load player as using the MWU strategy while the allocation player always best responds. Thus, over time, the value converges to the minimax value of the game. At the same time, since the requests are drawn i.i.d. from the distribution, one can show via the Azuma-Hoeffding bound that the minimax value approaches the maximum load under the optimal assignment. We believe this connection with the MWU scheme could be of independent interest.

Organization of the paper. Notation and mathematical formulations are provided in Section 2. Section 3 is devoted to presenting our results on network construction and proving Theorem 4. We provide the analysis of the d-choice algorithm in Section 4 and prove Theorem 5. We perform an empirical study in Section 5. Proofs are deferred to the Appendix.

2. Preliminary

Notation. Let [n] = {1, . . . , n} and [n1 : n2] = {n1, . . . , n2}. Let Δn be the n-dimensional simplex that contains all probability distributions over [n]. For any distributions p, q ∈ Δn, the total variation (TV) distance between p and q is defined as TV(p, q) = max_{S⊆[n]} |p(S) − q(S)|, where p(S) = Σ_{i∈S} p_i for any subset S ⊆ [n]. For any sets X, Y, let X\Y := {i : i ∈ X, i ∉ Y}. We use A to denote clients and B to denote servers. Given a bipartite graph G(A, B, E) between clients and servers, for any subset of clients S ⊆ A, we use N(S) to denote the neighboring servers of the clients in S, i.e., N(S) := {j ∈ B : ∃ i ∈ S, (i, j) ∈ E}.
2.1 Problem formulation

The load balancing task divides into two stages: the network design stage is devoted to replicating datasets and constructing the storage graph between clients and servers; the graph is fixed afterwards, and the online load balancing stage is devoted to assigning online requests to servers.

Network design. The input consists of (A, B, q, d), where A denotes the set of clients, B denotes the set of servers, q ∈ Δ_{|A|} is an estimate of the load distribution over clients, and d ∈ N is the budget constraint, meaning that each client can be connected to at most d servers. For simplicity, we assume |A| = |B| = n, although all of our results extend naturally to the general case. Let p ∈ Δn be the actual load distribution over the clients A; it is oblivious to the network constructed by the algorithm. The estimation error is measured in total variation distance; let λ := TV(p, q). Both λ and p are unknown to the algorithm. We consider the oblivious adversary model, where the actual load distribution is independent of the randomness of our algorithm. A graph G = (A, B, E) is feasible if the budget constraint is satisfied. For any feasible graph G between clients and servers, consider the following LP:

    min_{x,L}  L                                    (1)
    s.t.  Σ_{j∈N(i)} x_{i,j} = p_i    (∀ i ∈ A),
          Σ_{i∈N(j)} x_{i,j} ≤ L      (∀ j ∈ B),
          x_{i,j} ≥ 0                  (∀ i ∈ A, j ∈ B).

The optimal allocation of LP (1) is called the optimal flow, and its objective value is called the optimal flow value, denoted OPT_{G,p}. We further define the offline optimal solution as the minimum possible optimal flow value, denoted OPT_p = min_{feasible G} OPT_{G,p}. The offline optimal solution should be thought of as the best possible (normalized) loads on servers that one can hope to achieve under load distribution p, as fractional allocation is allowed and loads are calculated in expectation.
The approximation ratio of the network construction algorithm is measured in this ideal case, and it is defined as c_nd = E[OPT_{G,p}] / OPT_p, where the expectation is taken over the randomness of the algorithm.

Online load balancing. After a bipartite graph G = (A, B, E) is constructed in the first stage, datasets are replicated across the servers and the graph is fixed throughout the online load balancing stage. We consider a stochastic model where requests are drawn i.i.d. from the load distribution p. At each time step t, a request is drawn from p and appears on a client i_t ∈ A, and the algorithm assigns the request to one of the neighboring servers of i_t, immediately and irrevocably. We assume each request has unit weight. We focus on the interesting regime of T = Ω(n). Let ALG_{G,p,t} denote the maximum load of the servers at time t (t ∈ [T]), under graph G and load distribution p. We measure the performance w.r.t. the optimal flow value OPT_{G,p}, and define c_lb = E[ALG_{G,p,T}] / (T · OPT_{G,p}). Here the expectation is taken over the randomness of the requests and the randomness of our algorithm.

Remark 6 Note that instead of the empirical loads, the performance is compared w.r.t. the optimal flow value OPT_{G,p}, which is measured on the expected load distribution, and fractional allocation is allowed. This is a stronger benchmark and is used for convenience of presentation.

Overall objective. The overall performance of our algorithm is measured w.r.t. the offline optimal, and is defined as c = E[ALG_{G,p,T}] / (T · OPT_p). As noted in Remark 6, we compare with the stronger benchmark that allows fractional allocations over the expected load.

3. Network design

We prove the following result for the network design stage.

Theorem 7 (Formal version of Theorem 4) Let ϵ > 0, λ ∈ [0, 1].
(1) Suppose d ≥ Ω(log n/ϵ² + 1/ϵ⁴). Then there is a randomized network construction algorithm such that, with probability 1 − 1/poly(n), it achieves a (1/(1 − λ exp(−λ)) + ϵ) approximation.
(2) Suppose λ > ϵ and Ω(ϵ⁻¹) ≤ d ≤ O(ϵ²n log n). Then no algorithm can achieve a (1/(1 − λ exp(−λ)) − ϵ) approximation.

We describe our algorithm in Section 3.1 and the analysis in Section 3.2. The lower bound is proved in Section 3.3, together with an NP-hardness result for the perfect knowledge case (i.e. λ = 0). The detailed proofs are deferred to Appendix B.

3.1 Algorithm

A formal description of Randomized greedy is presented in Algorithm 1. Based on the estimated load q, we split the clients into two groups, Ahigh and Alow, where Ahigh contains highly loaded clients whose loads are greater than d/n, and Alow contains the remaining clients of small load (Line 2). Randomized greedy gives a rule for connecting clients to servers. For analysis purposes, we also implicitly maintain an assignment of loads. Ideally, we want each server to receive no more than 1/n units of load, and we use b_j to denote the leftover budget of server j (Line 3). As this goal is too ambitious for the servers connected to clients in Ahigh, we simply assign d different servers to each such client (Line 5) and split its load evenly. For each client in Alow, we start with the first server with non-zero leftover budget, greedily fill up the server, and switch to a new server once the budget is exhausted (Line 18). We repeat the process until the client has d connected servers or we have assigned all its load (Line 16). In the former case, we split the client's leftover load evenly across all its neighboring servers. We call any server assigned to a client so far greedily assigned. If a client is assigned to k < d servers, we continue and randomly assign the client to (d − k) more servers (Line 11). We call servers assigned in this way randomly assigned.

Notation. We denote the i-th client in Ahigh (resp. Alow) as A_{high,i} (resp. A_{low,i}), and we denote the j-th server as B_j. For any subset of clients S_A ⊆ A, recall that we use N(S_A) to denote the set of neighboring servers of the clients in S_A. We use N_G(S_A) (resp.
N_R(S_A)) to denote the set of greedily (resp. randomly) assigned neighboring servers of the clients in S_A.

Algorithm 1 Randomized greedy
 1: Input: clients A, servers B, budget d, estimated load distribution q ∈ Δn.
 2: Ahigh ← {i ∈ A : q_i > d/n}, Alow ← A \ Ahigh
 3: b_j ← 1/n for all j ∈ B                                  ▷ Leftover budget of servers
 4: for i = 1, . . . , |Ahigh| do
 5:     Assign client A_{high,i} to servers B_{(i−1)d+1}, . . . , B_{id}   ▷ Assign high load clients
 6: end for
 7: j ← d|Ahigh| + 1
 8: for i = 1, . . . , |Alow| do
 9:     for k = 1, . . . , d do
10:         if q_{A_{low,i}} = 0 then
11:             Assign client A_{low,i} to a server chosen uniformly at random   ▷ Random assignment
12:         else
13:             Assign client A_{low,i} to server B_j        ▷ Greedy assignment
14:         end if
15:         if b_j > q_{A_{low,i}} then
16:             b_j ← b_j − q_{A_{low,i}}, q_{A_{low,i}} ← 0
17:         else
18:             q_{A_{low,i}} ← q_{A_{low,i}} − b_j, j ← j + 1    ▷ Switch to a new server
19:         end if
20:     end for
21: end for

3.2 Analysis

The main purpose of this section is to prove the first part of Theorem 7, i.e., that Randomized greedy achieves a (1/(1 − λ exp(−λ)) + ϵ) approximation w.h.p. Randomized greedy possesses two important properties: the volumetric property and the expansion property. We start with the volumetric property, which states that the estimated load of any subset of clients roughly equals the number of connected servers (up to some scaling factors). This holds even if one ignores the randomly assigned servers. In particular, the volumetric property guarantees that when the estimation is very accurate (i.e. λ = 0), Randomized greedy achieves a (1 + 1/d) approximation.

Lemma 8 (Volumetric property) When the estimation is accurate (λ = 0), Randomized greedy achieves a (1 + 1/d) approximation. Furthermore, consider the subgraph induced by the greedy assignment: (i) for a client i ∈ Ahigh, any server in N_G(i) has load at most (1/d) · max_{i∈A} p_i; (ii) for a client i ∈ Alow, any server in N_G(i) has load at most (1 + 1/d) · (1/n).

The graph constructed by Randomized greedy also exhibits a good expansion property.

Lemma 9 (Expansion property) Suppose ϵ > 0 and d ≥ Ω(log n/ϵ² + 1/ϵ⁴).
With probability $1 - 1/\mathrm{poly}(n)$ over the randomness of Randomized greedy, for any subset of clients $S_A$ with $|S_A| \le n/d$, one has
$|N(S_A)| \ge (1 - O(\epsilon))\Big[|N_G(S_A)| + (n - |N_G(S_A)|)\Big(1 - \exp\big(-\tfrac{d|S_A| - |N_G(S_A)|}{n}\big)\Big)\Big],$
and for any subset of clients $S_A$ with $|S_A| > n/d$, one has
$|N(S_A)| \ge (1 - O(\epsilon))\Big[|N_G(S_A)| + (n - |N_G(S_A)|)\Big(1 - \exp\big(-\tfrac{n - |N_G(S_A)|}{n}\big)\Big)\Big].$
The proof of Lemma 9 relies on the following two crucial observations. First, the greedy assignment itself spreads out and has good expansion properties. Second, for the random assignment, define $E_j$ ($j \in [n]$) to be the event that server $j$ is connected to some client in $S_A$. Though $\{E_j\}_{j \in [n]}$ are not independent in general, we prove they are negatively associated, and therefore one can give a refined bound on the expansion via concentration inequalities. The following lemma characterizes the optimal flow value, and more importantly, it relates the expansion of the graph to the optimal flow value. Lemma 10 Given a graph $G(A, B, E)$ and a load distribution $p$, the optimal flow value satisfies $\mathrm{OPT}_{G,p} = \max_{S_A \subseteq A} \frac{p(S_A)}{|N(S_A)|}$. We now provide a proof sketch for the first part of Theorem 7. While we focus on a special case, it gives a taste of the whole proof. The full proof can be found in Appendix B. Proof [Proof Sketch of Theorem 7, Part 1] Let $S_A$ be the set of clients satisfying $\mathrm{OPT}_{G,p} = \frac{p(S_A)}{|N(S_A)|}$. (4) The existence of such a set is guaranteed by Lemma 10. Let $S_B = N(S_A)$ be the neighbor servers of clients in $S_A$. Let $k_1 = |S_A|$ be the cardinality of $S_A$ and let $k_2 = |N_G(S_A)|$ be the number of greedily assigned neighbors of $S_A$. In this proof sketch, we assume (i) all clients in $S_A$ have estimated load at most $d/n$, i.e., $q_i \le d/n$ for all $i \in S_A$; and (ii) the cardinality of $S_A$ is fairly large, i.e., $|S_A| = k_1 > n/d$.
The volumetric property shown in Lemma 8 guarantees that under the estimated load distribution $q$, the greedy assignment ensures that all servers in $N_G(S_A)$ have load no more than $(1 + \frac{1}{d})\frac{1}{n}$, and hence
$q(S_A) \le \frac{d+1}{d} \cdot \frac{k_2}{n}.$ (5)
Moreover, since the TV distance between $p$ and $q$ is at most $\lambda$, one has
$p(S_A) \le \lambda + q(S_A) \le \frac{d+1}{d} \cdot \frac{k_2}{n} + \lambda.$ (6)
The expansion property (Lemma 9) implies that, with probability $1 - 1/\mathrm{poly}(n)$, one has
$|N(S_A)| \ge (1 - O(\epsilon))\Big[k_2 + (n - k_2)\Big(1 - \exp\big(-\tfrac{n-k_2}{n}\big)\Big)\Big].$ (7)
Combining Eq. (4)(5)(6)(7) and the fact that $\mathrm{OPT}_p \ge \frac{1}{n}$, one can bound the approximation ratio
$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{n \cdot p(S_A)}{|N(S_A)|} \le \frac{\frac{d+1}{d}k_2 + \lambda n}{(1 - O(\epsilon))\big[k_2 + (n - k_2)\big(1 - \exp\big(-\tfrac{n-k_2}{n}\big)\big)\big]}.$
Optimizing the above expression, one can upper bound $c_{nd}$ by $\frac{1}{1 - \lambda\exp(-\lambda)} + O(\epsilon)$. We conclude the proof sketch. Worst case guarantee A direct corollary of Theorem 7 is that even when the estimate is wrong (i.e., the adversarial case), Randomized greedy still guarantees an $\frac{e}{e-1} + \epsilon$ approximation w.h.p. We get this by plugging in $\lambda = 1$. Corollary 11 (Worst case guarantee) Suppose the actual load distribution is unknown and the estimate is wrong. For any $\epsilon > 0$, suppose $d \ge \Omega(\log n/\epsilon^2 + 1/\epsilon^4)$. Then with probability $1 - 1/\mathrm{poly}(n)$, Randomized greedy still achieves an $\frac{e}{e-1} + \epsilon$ approximation. Bounded storage Though it is not the main focus of this paper, one can adapt the graph returned by Randomized greedy to make the storage well balanced across servers. In particular, we can ensure that each server is connected to at most $O(d/\epsilon)$ clients (recall that on average a server connects to $d$ clients). Corollary 12 (Bounded storage) For any $\epsilon > 0$, $\lambda \in [0, 1]$ and $d \ge \Omega(\log n/\epsilon^2 + 1/\epsilon^4)$, there is an algorithm that achieves a $\frac{1}{1-\lambda\exp(-\lambda)} + \epsilon$ approximation with probability $1 - 1/\mathrm{poly}(n)$ and guarantees that each server is connected to at most $O(d/\epsilon)$ clients. 3.3 Lower bound We prove the second part of Theorem 7 and present an information-theoretic lower bound on the approximation ratio, parameterized by the estimation error $\lambda$.
We give a sketch of the construction; the detailed proof can be found in Appendix B. Proof [Proof Sketch of Theorem 7, Part 2] We set the estimated load distribution $q$ to be uniform over the first $n/d$ clients and $0$ on the other clients. Consider the following hard distribution on the actual load $p$: we randomly remove $\lambda$ units of load from the first $n/d$ clients and place them on $\lambda n/d$ random clients in $\{n/d + 1, \ldots, n\}$. The offline optimum (i.e., $\mathrm{OPT}_p$) is always $1/n$, while by considering the expected size of $N(S_A)$, where $S_A$ is the set of clients with non-zero loads, one can argue that the best possible approximation ratio is $\frac{1}{1-\lambda\exp(-\lambda)} - \epsilon$. We also provide a computational lower bound, showing that even if one knows the actual load distribution (i.e., $\lambda = 0$), the problem is still NP-hard. It is proved through a careful reduction from the subset-sum problem. Theorem 13 Suppose the estimate is exact ($\lambda = 0$) and the budget satisfies $d < n^{1-o(1)}$; then it is NP-hard to find the optimal graph and allocation. 4. Online load balancing The second subroutine assigns online requests to servers, with the goal of balancing the loads on the servers. We analyse the performance of the d-choice algorithm under two regimes: the small load regime $T = n$ and the large load regime $T \to \infty$. We note that by a standard argument, the approximation guarantee for $T = n$ carries over to the entire range of $T$. Theorem 14 (Formal version of Theorem 5) Let $d = \Omega(\log n)$. Suppose the bipartite graph is constructed using Randomized greedy and we assign online requests with the d-choice algorithm; then with probability $1 - 1/\mathrm{poly}(n)$: When the number of requests is $T = n$, the maximum load on servers is bounded by $O\big(\max\{1, \frac{n}{d}\max_i p_i\}\big)$. When the number of requests is $T = \Omega\big(\frac{n^2\log n}{\epsilon^2}\big)$, the maximum load on servers is $(1+\epsilon)$-approximate to the optimal allocation. 4.1 The small load regime We prove the first part of Theorem 14, with proof details deferred to Appendix D. Let $T = n$ and $D = \max\{1, \frac{n}{d}\max_i p_i\}$.
It is clear that the maximum load is at least $D$. Bounds at initialization For each server $j$, define its load score as
$s_j = \sum_{i \in [n]} \frac{n p_i}{d} \mathbf{1}\{j \in N(i)\}.$ (8)
Intuitively, the load score $s_j$ equals the expected number of requests that server $j$ receives if every client spreads its requests uniformly at random over its neighbors. Let $B_{\mathrm{light}}$ contain the servers with low load score, i.e., $B_{\mathrm{light}} := \{j : s_j \le 2 + D\}$. The main technical lemma states that with high probability, each client is connected to at least $(\frac{1}{2} - \kappa)d$ servers in $B_{\mathrm{light}}$, given $d \ge \Omega(\log(n)/\kappa^2)$. Lemma 15 Let $\kappa > 0$ and $d \ge \Omega(\log(n)/\kappa^2)$; then with probability $1 - 1/\mathrm{poly}(n)$, $|N(i) \cap B_{\mathrm{light}}| \ge (\frac{1}{2} - \kappa)d$ holds for every client $i \in [n]$. Bounds during the process A server is empty if it has not received any request. We prove that with high probability, the maximum load on servers is $1$ after the first $\frac{n}{16(2+D)}$ requests have arrived. This is obtained by showing that each client has at least $d/4$ empty neighbor servers after the arrival of the first $\frac{n}{16(2+D)}$ requests. By the allocation rule of the d-choice algorithm, this implies the maximum load is $1$ after $\frac{n}{16(2+D)}$ requests. Lemma 16 Let $\kappa > 0$ and $d \ge \Omega(\log(n)/\kappa^2)$. With probability $1 - 1/\mathrm{poly}(n)$, for each client $i \in [n]$, after the arrival of the first $\frac{n}{16(2+D)}$ requests, there are at least $\frac{d}{4}$ empty servers in $N(i)$. Proof For $t \in [0 : \frac{n}{16(2+D)}]$, we use $E_t$ to denote the event that for every client $i \in [n]$, $N(i)$ contains at least $d/4$ empty servers at time $t$. We prove $\Pr[E_t] \ge 1 - \frac{t}{n^c}$ for some fixed constant $c$ by induction. The base case is trivial, as $\Pr[E_0] = 1$, i.e., every server is empty at the beginning. Suppose the claim holds up to time $k - 1$. For any $i \in [n]$, $t \in [k]$, let the random variable $a(i, t) = 1$ if the $t$-th request goes to $N(i) \cap B_{\mathrm{light}}$ and every client has at least $d/4$ empty neighbor servers at time $t - 1$; $a(i, t) = 0$ otherwise. Denote the randomness of round $\tau$ ($\tau \le t - 1$) by $w_\tau$; then we have $\Pr[a(i,t) \mid w_1, \ldots$
$, w_{t-1}] \le \sum_{i' \in [n]} \Pr[\text{client } i' \text{ receives a request}] \cdot \Pr[\text{client } i' \text{ assigns its request to } N(i) \cap B_{\mathrm{light}}]$
$\le \sum_{i' \in [n]} p_{i'} \cdot \frac{|N(i') \cap N(i) \cap B_{\mathrm{light}}|}{d/4} = \frac{4}{d} \sum_{j \in N(i) \cap B_{\mathrm{light}}} \sum_{i' \in [n]} p_{i'} \mathbf{1}\{j \in N(i')\} = \frac{4}{n} \sum_{j \in N(i) \cap B_{\mathrm{light}}} s_j \le \frac{4d(2+D)}{n}.$
The second step uses the fact that there are at least $d/4$ empty servers in $N(i')$ and the d-choice algorithm assigns requests uniformly at random. The next-to-last step follows from the definition of $s_j$ (see Eq. (8)), and the last step follows from $s_j \le 2+D$ for $j \in N(i) \cap B_{\mathrm{light}}$ and $|N(i) \cap B_{\mathrm{light}}| \le d$. Hence we know that
$\mathbb{E}\Big[\sum_{t=1}^{k} a(i,t)\Big] \le \frac{4d(2+D)}{n} \cdot \frac{n}{16(2+D)} = \frac{d}{4}.$
Since the random sequence $\{a(i,t)\}_{t \in [k]}$ forms a martingale, using the Azuma inequality, one has
$\Pr\Big[\sum_{t=1}^{k} a(i,t) \le \big(\tfrac{1}{4} + \kappa\big)d\Big] \ge 1 - \exp(-\Omega(\kappa^2 d)) \ge 1 - \frac{1}{n^{c+2}}.$ (9)
We use $d \ge (c+2)\kappa^{-2}\log n$ in the last step. Consequently, we have
$\Pr[E_k] \ge \Pr[E_{k-1}] - \sum_{i \in [n]} \Pr\Big[\sum_{t=1}^{k} a(i,t) > \big(\tfrac{1}{4} + \kappa\big)d \,\Big|\, E_{k-1}\Big] \ge 1 - \frac{k-1}{n^c} - n \cdot \frac{1}{n^{c+2}} \ge 1 - \frac{k}{n^c}.$
The second step follows from a union bound over all clients $i \in [n]$ and the fact that $|N(i) \cap B_{\mathrm{light}}| \ge (\frac{1}{2} - \kappa)d$, and the last steps follow from the induction hypothesis and Eq. (9). We conclude the proof here. Combining Lemma 15 and Lemma 16, one can prove that the d-choice algorithm yields a constant factor approximation when $T = n$, and thus prove the first part of Theorem 14. 4.2 The large load regime We next move to the second part of Theorem 14 and prove the asymptotic convergence of the d-choice algorithm. We first observe the equivalence between the d-choice algorithm and an MWU updating scheme (Algorithm 2).

Algorithm 2 MWU for load balancing
1: Initialize $r_j(1) = \frac{1}{n}$ for all servers $j \in [n]$
2: for $t = 1, \ldots, T$ do
3:   Set $s(i) = \arg\min_{j \in N(i)} r_j(t)$ for all clients $i \in [n]$.
4:   Draw a request $i_t \sim p$ and assign it to server $s(i_t)$
5:   Update ($L_j(t)$ denotes the load on server $j$): $r_j(t+1) = \frac{\exp(\eta L_j(t))}{\sum_{j' \in [n]} \exp(\eta L_{j'}(t))}$
6: end for

Lemma 17 Algorithm 2 is equivalent to the d-choice algorithm for any $\eta > 0$. We can now prove the second part of Theorem 14. We note that the proof does not rely on properties of the underlying bipartite graph.
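Lemma 17 can be checked mechanically: since $x \mapsto \exp(\eta x)$ is strictly increasing and the normalization in Algorithm 2 is shared by all servers, the argmin of $r_j(t)$ over $N(i)$ is exactly the least-loaded neighbor. The sketch below (in Python; the uniform request distribution and random neighborhoods are illustrative assumptions, not the paper's setup) simulates both rules side by side and asserts they make identical choices under the same tie-breaking:

```python
import math
import random

def d_choice_step(loads, neighbors):
    # d-choice: route the request to the least-loaded neighbor server
    return min(neighbors, key=lambda j: loads[j])

def mwu_step(loads, neighbors, eta):
    # MWU view (Algorithm 2): weights r_j proportional to exp(eta * L_j);
    # the normalization is shared by all servers, so it cannot change the argmin.
    z = sum(math.exp(eta * L) for L in loads)
    r = [math.exp(eta * L) / z for L in loads]
    return min(neighbors, key=lambda j: r[j])

random.seed(0)
n, d, eta, T = 20, 4, 0.7, 500
neighbors = [random.sample(range(n), d) for _ in range(n)]  # fixed bipartite graph
loads_a, loads_b = [0] * n, [0] * n
for _ in range(T):
    i = random.randrange(n)                      # request drawn from p (uniform here)
    ja = d_choice_step(loads_a, neighbors[i])
    jb = mwu_step(loads_b, neighbors[i], eta)
    assert ja == jb                              # identical choice for any eta > 0
    loads_a[ja] += 1
    loads_b[jb] += 1
```

Because both rules scan the same neighbor list, ties are broken identically, and the two load vectors stay equal throughout the run.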
Proof [Proof of Theorem 14, Part 2] We use $\mathcal{A} \subseteq \mathbb{R}^{n \times n}$ to denote the polytope of feasible assignments. That is, any $A \in \mathcal{A}$ obeys
$\sum_{j \in N(i)} A_{j,i} = 1 \;\; \forall i \in [n], \qquad A_{j,i} = 0 \;\; \forall i \in [n],\, j \notin N(i), \qquad A_{j,i} \ge 0.$
Consider the following zero-sum game. We fix a load distribution $p$ on the clients. The first player, which we refer to as the allocation player, commits to $A \in \mathcal{A}$, and the second player, which we refer to as the load player, commits to a distribution $r \in \Delta_n$ over servers. The loss for the allocation player is $r^{\top}Ap$, and the load player receives loss $-r^{\top}Ap$. First, a few observations. The polytope $\mathcal{A}$ and the simplex $\Delta_n$ are convex. Moreover, the losses of both players are linear in their actions. Hence, we can apply the minmax theorem. Lemma 18 (Minmax Theorem, Neumann (1928)) The minmax theorem guarantees that
$\min_{A \in \mathcal{A}} \max_{r \in \Delta_n} r^{\top}Ap = \max_{r \in \Delta_n} \min_{A \in \mathcal{A}} r^{\top}Ap = \mathrm{OPT}_p.$
Proof The first equality directly follows from our previous discussion: the strategy spaces are convex and the loss is linear. To see the second one, let $A^*$ be the optimal allocation for $p$; then it is clear that $r^{\top}A^*p \le \mathrm{OPT}_p$ holds for any $r \in \Delta_n$. Meanwhile, by Lemma 10, we know that $\mathrm{OPT}_p = \max_{S_A \subseteq [n]} \frac{p(S_A)}{|N(S_A)|}$. If we take $r$ to be the uniform distribution over $N(S_A)$, then it is clear that $r^{\top}Ap \ge \mathrm{OPT}_p$ for any $A \in \mathcal{A}$, thus concluding the proof of this lemma. Back to our proof. We use $A_t$ to denote the allocation at time $t$. In the online process, at time $t$, the load player commits $r(t) \in \Delta_n$, receives loss $\langle r(t), A_t e_{i_t}\rangle$,3 and updates according to the MWU rule. Here $e_{i_t} \in \mathbb{R}^n$ is the one-hot vector whose $i_t$-th entry equals $1$ and whose other entries equal $0$. Then by the regret guarantee of MWU (Arora et al., 2012), one has
$\max_{r \in \Delta_n} \sum_{t=1}^{T} \langle r, A_t e_{i_t}\rangle \le \sum_{t=1}^{T} r(t)^{\top}A_t e_{i_t} + \frac{\log n}{\eta} + \eta T \le \sum_{t=1}^{T} r(t)^{\top}A_t e_{i_t} + 2\sqrt{T \log n}.$ (10)
The first inequality follows from the regret bound of MWU; we set $\eta = \sqrt{\log n / T}$ in the second step. Denote $X_t = r(t)^{\top}A_t e_{i_t}$; then we know that $\mathbb{E}[X_t \mid e_{i_1}, \ldots, e_{i_{t-1}}] = r(t)^{\top}A_t p$ 3. We remark that $A_t$ depends on $r(t)$ but not on $i_t$. and $|X_t| \le 1$. $X_1, \ldots$
, $X_t$ forms a martingale; by the Azuma-Hoeffding bound, we have
$\Big|\sum_{t=1}^{T} r(t)^{\top}A_t e_{i_t} - \sum_{t=1}^{T} r(t)^{\top}A_t p\Big| \le O(\sqrt{T \log n}) \text{ with probability } 1 - 1/\mathrm{poly}(n).$ (11)
Finally, since the maximum load on servers equals $\max_{r \in \Delta_n} \sum_{t=1}^{T} \langle r, A_t e_{i_t}\rangle$, with probability $1 - 1/\mathrm{poly}(n)$, one has
$\max_{r \in \Delta_n} \sum_{t=1}^{T} \langle r, A_t e_{i_t}\rangle \le \sum_{t=1}^{T} r(t)^{\top}A_t e_{i_t} + 2\sqrt{T\log n} \le \sum_{t=1}^{T} r(t)^{\top}A_t p + O(\sqrt{T\log n}) \le \sum_{t=1}^{T} r(t)^{\top}A p + O(\sqrt{T\log n}) \le T \max_{r} \min_{A} r^{\top}Ap + O(\sqrt{T\log n}) = T \cdot \mathrm{OPT}_p + O(\sqrt{T\log n}).$
The first step follows from Eq. (10), and the second step follows from Eq. (11). The third step follows from the fact that we choose $A_t \in \mathcal{A}$ minimizing $r(t)^{\top}Ap$. The last step follows from Lemma 18. Since $\mathrm{OPT}_p \ge 1/n$, we know that after $T$ rounds, the solution is within a factor $1 + O(n\sqrt{\log n / T})$ of the optimum. 5. Experiment In this section, we describe several sets of experiments that show the efficiency of the d-choice algorithm as well as the effect of the initial graph construction on the performance of the load balancing algorithm. For our experiments, we use two types of data: (a) real query arrivals generated from the Google production web search backend, and (b) synthetic query arrivals modeled on well-established distributions. In our real-world experiments, we first fix the underlying graph based on the algorithm of choice, which can be the randomized greedy algorithm, the random allocation algorithm, or the greedy allocation algorithm. Then we fix the stream of incoming requests and use the d-choice algorithm to assign these queries. To determine the query stream, we first determine a probability distribution $p(t)$ over clients for each timestamp $t$ and sample a client to receive a query with respect to $p(t)$. In the case of real data, we use the clients' load data on machines and generate $p(t)$ proportional to this load distribution. For synthetic data, we fix a uniform well-known probability distribution for all timestamps. After fixing the stream of requests and the underlying graph, we utilize the d-choice algorithm to assign a client's query to its designated servers.
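As a rough illustration of this pipeline, the sketch below (assumptions: a synthetic load distribution `p`, random neighborhoods, and list-order tie-breaking; this is not the production setup or the paper's code) samples a query stream, routes each query with the d-choice rule, and compares the resulting maximum load against the trivial lower bound $\max\{t/n,\ t \cdot \max_i p_i / d\}$:

```python
import random

random.seed(1)
n, d = 50, 5
# hypothetical client load distribution p: a random point in the simplex
w = [random.random() for _ in range(n)]
p = [x / sum(w) for x in w]
neighbors = [random.sample(range(n), d) for _ in range(n)]  # fixed bipartite graph
loads = [0] * n
T = 5000
for t in range(1, T + 1):
    i = random.choices(range(n), weights=p)[0]      # client hit by this query
    j = min(neighbors[i], key=lambda s: loads[s])   # d-choice rule
    loads[j] += 1
# trivial lower bound on the optimal max load after T requests:
# every request must land somewhere (T/n on average), and a client's
# requests can only spread over its d servers.
lower = max(T / n, max(p) * T / d)
print(max(loads) / lower)  # ratio approaches 1 when balancing works well
```

Dividing both quantities by $t$ reproduces the scaled curves plotted in the experiments.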
We record the maximum server load for the d-choice algorithm over time. Additionally, we periodically calculate a lower bound on the optimal maximum load by computing the optimal flow and the theoretical lower bound. Note that the theoretical bound at time $t$ is simply the maximum of the average number of balls per bin, i.e., $t/n$, and the maximum over all clients' loads divided by $d$. We scale these three values by dividing them by $t$ at time $t$ and then plot them. Our experiments validate the effectiveness of our proposed algorithm. 5.1 Power of the d-choice algorithm in the real world In this section, we measure the performance of the d-choice algorithm. We perform two sets of experiments: (i) we fix an underlying graph by randomized greedy and then compare the performance of the d-choice algorithm to the optimal flow value as well as the theoretical lower bound on the machine loads; (ii) we generate three different underlying graphs by the randomized greedy algorithm, the random allocation algorithm, and the greedy allocation algorithm, and compare the performance of the d-choice algorithm on the same stream of requests as in (i) on these three networks. We look at three independent query logs from data centers located in Asia (AS), Europe (EU), and North America (NA). For each timestamp, we process the load of different clients on machines to estimate the probability $p(t)$ of a client receiving a query at timestamp $t$. We then use $p(t)$ to sample queries and decide which client receives a query at timestamp $t$. We also use the initial distribution, i.e., $p(0)$, to construct the input graph and assign all clients to the same number of servers. The graphs have the same number of machines as clients. We keep the constructed graph fixed over the period of the simulation despite the fact that the distribution of loads over clients can change over time. Comparison to theoretical lower bounds After fixing the underlying graph by randomized greedy, we apply the d-choice algorithm to queries arriving over time.
We process the incoming queries and assign a client's query to one of its designated servers by the d-choice algorithm. During this process, we periodically compute the optimal flow as well as the theoretical lower bound.4 The plots on the left-hand side of Figure 1 depict the performance of d-choice (denoted OLB) for three different data sets over time. We can see that for a small number of queries, the ratio of the maximum server load of the d-choice algorithm to the theoretical lower bound is at most 1.5, and it quickly converges to 1 as the number of queries grows. Comparison on different networks While we measure the performance of the d-choice algorithm, we also study the effect of the underlying graph. This is done by fixing an input graph using the randomized greedy algorithm, the random allocation algorithm, or the greedy allocation algorithm, and then running the d-choice algorithm. We use the same query stream as in the previous experiments, and hence the plot for randomized greedy is the same as the plot in the previous section (note the y-axis shift). The plots on the right-hand side of Figure 1 show that randomized greedy performs better than both random allocation and greedy allocation for all data sets. However, the difference seems to be within the margin of error. 4. We use this theoretical lower bound instead of the theoretical optimum because the latter is NP-hard to compute (Theorem 13). Our lower bound is $(1 + \frac{1}{d})$-approximate to the optimum (Lemma 8). Figure 1: Performance of the d-choice algorithm on real query loads ((a) AS data set, (b) EU data set, (c) NA data set). 5.2 Synthetic data We also perform experiments on synthetic data. In these experiments, we take 200 clients and 200 servers (i.e., $|A| = |B| = n = 200$) and set the budget constraint $d = 5$.
Comparison to theoretical lower bounds We run experiments on different load distributions and compare the performance of our load balancing algorithm with the optimal flow value of the graph as well as the trivial theoretical lower bound. We generate the estimated distribution based on three different families of distributions: the Gaussian distribution, the Exponential distribution, and the Multinomial distribution. For the Gaussian (resp. Exponential) distribution, we draw each $q_i \sim N(0, \sigma)$ (resp. $q_i \sim \mathrm{Exp}(\sigma)$) and normalize $q$, where $\sigma$ denotes the variance and we set $\sigma = 0.1$ in the experiment. For the Multinomial distribution, we choose a random subset of clients $S_A$ of size $n/d$ and set $q_i = d/n$ for each $i \in S_A$. The real load distribution is generated by mixing the estimated distribution with a perturbation distribution, i.e., $p = (1-\beta)q + \beta p_b$, where $p_b$ denotes the perturbation distribution, for which we take a Multinomial distribution. The mixing rate $\beta$ is an upper bound on the total variation distance between the estimated distribution and the real distribution (i.e., $\lambda \le \beta$).5 We vary the mixing rate $\beta = 0, 0.2, 0.5, 1$. The experimental results can be found in Figure 2 and Table 1. In Figure 2, we plot the load balancing dynamics, where we generate 20000 requests and take a snapshot every 200 iterations. The experiments validate our theoretical results: (1) the d-choice algorithm has a constant approximation when the load is small, and it approaches the optimal flow value asymptotically; (2) the performance of randomized greedy changes smoothly with the mixing rate. In Table 1, we compute the limiting approximation ratio; this time, we generate 200000 requests and take the median over 9 independent experiments.
              β = 0    β = 0.2    β = 0.5    β = 1
Multinomial   1.134    1.275      1.414      1.729
Gaussian      1.01     1.04       1.24       1.62
Exponential   1.03     1.05       1.15       1.66
Table 1: The limiting approximation ratio of the d-choice algorithm
Comparison on different networks We compare the randomized greedy algorithm with two baseline algorithms: the random allocation algorithm and the greedy allocation algorithm. The random allocation algorithm assigns a random set of servers (of size $d$) to each client and is commonly adopted in practice. The greedy allocation algorithm assigns clients to servers using the greedy heuristic (see Line 13 in Algorithm 1); for the leftover budget, it uses a deterministic strategy and makes assignments to the next consecutive servers. In contrast, our randomized greedy assigns the leftover budget with random allocations (i.e., Line 11 in Algorithm 1). We choose the estimated distribution to be the Multinomial distribution and vary the mixing rate in the experiment. Our experimental results are shown in Figure 4 and Figure 3. In Figure 4, we vary the mixing rate and compare the performance of the d-choice algorithm on the three sets of graphs. We generate 200000 requests for each experiment and take the median of the limiting approximation ratios over 9 independent experiments. For clearer presentation, we use a logarithmic scale for the y axis in Figure 4. In Figure 3, we plot the load balancing dynamics for the three different graphs and take the mixing rate $\beta = 0, 0.2, 0.5, 1$. Again, we generate 20000 requests and take a snapshot every 200 iterations. 5. We use a mixture of distributions (instead of a hard total variation perturbation) because it is more natural in practice and easier to generate. Furthermore, for distributions generated from the Gaussian, it is impossible to make the TV distance equal 1.
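The synthetic setup above can be reproduced in a few lines. The sketch below (hypothetical helper names; Multinomial estimate and Multinomial perturbation as in the text) generates $q$, mixes $p = (1-\beta)q + \beta p_b$, and checks that the mixing rate indeed upper-bounds the total variation distance between $p$ and $q$:

```python
import random

random.seed(2)
n, d = 200, 5

def multinomial_estimate(n, d):
    # q puts mass d/n on a random subset of n/d clients, 0 elsewhere
    support = random.sample(range(n), n // d)
    q = [0.0] * n
    for i in support:
        q[i] = d / n
    return q

def mix(q, beta):
    # real load p = (1 - beta) * q + beta * p_b, with a Multinomial perturbation p_b
    pb = multinomial_estimate(len(q), d)
    return [(1 - beta) * qi + beta * pi for qi, pi in zip(q, pb)]

q = multinomial_estimate(n, d)
for beta in (0, 0.2, 0.5, 1):
    p = mix(q, beta)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))  # total variation distance
    assert tv <= beta + 1e-9                          # mixing rate upper-bounds lambda
```

Since $p - q = \beta(p_b - q)$, the TV distance is $\beta \cdot \mathrm{TV}(p_b, q) \le \beta$, which is exactly the $\lambda \le \beta$ claim.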
Figure 2: Performance of d-choice on different distributions (rows: Multinomial, Gaussian, Exponential; panels: β = 0, 0.2, 0.5, 1). These experiments indicate that when β → 0, our algorithm matches the performance of greedy allocation; when β → 1, our algorithm matches random allocation; while in the middle range, our performance is better than both of them. Figure 3: Experiments on different graphs (rows: Greedy allocation, Randomized greedy, Random allocation; panels: β = 0, 0.2, 0.5, 1). Figure 4: Comparison of three network algorithms. Comparison of different budgets Our theoretical results indicate that as long as the budget $d$ scales logarithmically with the size of the graph, we can achieve robust and near-optimal performance. We further provide experimental support and examine the effect of the budget $d$ on the performance of our algorithm. We conduct experiments with the Multinomial distribution and vary the degree over $d = 3, 5, 8, 10$. The experimental outcome is shown in Figure 5. We observe that as long as we choose $d \ge 5$, the performance is very close, indicating that in practice, the requirement on the budget might be even smaller than the theoretical bound. Figure 5: Comparison of different budgets (panels: β = 0, 0.2, 0.5). 6. Conclusion In this paper, we introduce a new model for load balancing by extending the classical balls-into-bins problem so that it better suits very large databases with massive numbers of queries, such as search engines. Our new model formulates the real-time query load balancing problem as a two-stage optimization task, and we obtain near-optimal algorithms for both stages. Our paper opens up a number of future directions. First, we assume the online queries have unit weight; a natural open question is to generalize to arbitrary query weights.
Second, we assume the budget $d = \Omega(\log n)$; as we explained in Section 1.3, this is necessary to obtain an $O(1)$ approximation, but it would still be interesting to obtain a $\mathrm{polylog}(n)$ approximation when the budget is $d = O(1)$. Finally, our asymptotic bound for d-choice (i.e., the second part of Theorem 5) makes use of a connection with the MWU scheme. This connection seems to be quite general and holds for greedy-type algorithms. It would be interesting to see whether it is applicable to other online algorithms. Acknowledgments Part of the work was done when Binghui Peng was an intern at Google Research NYC. We wish to thank Cliff Stein for early-stage discussions on the project and for suggesting the NP-hardness instance. We thank Hengjie Zhang for useful discussions on the MWU scheme. Binghui Peng is supported in part by Christos Papadimitriou's NSF grants CCF-1763970 AF and CCF-1910700 AF, and Xi Chen's NSF grant CCF-1703925. References Anders Aamand, Mathias Bæk Tejs Knudsen, and Mikkel Thorup. Power of d choices with simple tabulation. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Anders Aamand, Jakob Bæk Tejs Knudsen, and Mikkel Thorup. Load balancing with dynamic set of balls and bins. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1262-1275, 2021. Sara Ahmadian, Allen Liu, Binghui Peng, and Morteza Zadimoghaddam. Distributed load balancing: a new framework and improved guarantees. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021. Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121-164, 2012. Yossi Azar. On-line load balancing. In Online algorithms, pages 178-195. Springer, 1998. Yossi Azar, Andrei Z Broder, Anna R Karlin, and Eli Upfal.
Balanced allocations. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 593-602, 1994. Yossi Azar, Ilan Reuven Cohen, and Debmalya Panigrahi. Randomized algorithms for online vector load balancing. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 980-991. SIAM, 2018. Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International conference on machine learning, pages 344-353. PMLR, 2018. Etienne Bamas, Andreas Maggiori, Lars Rohwedder, and Ola Svensson. Learning augmented energy minimization via speed scaling. In Advances in Neural Information Processing Systems, 2020a. Etienne Bamas, Andreas Maggiori, and Ola Svensson. The primal-dual method for learning augmented algorithms. In Advances in Neural Information Processing Systems, 2020b. Petra Berenbrink, Artur Czumaj, Angelika Steger, and Berthold Vöcking. Balanced allocations: The heavily loaded case. SIAM Journal on Computing, 35(6):1350-1385, 2006. Hans-Joachim Böckenhauer, Dennis Komm, Rastislav Královič, and Richard Královič. On the advice complexity of the k-server problem. Journal of Computer and System Sciences, 86:159-170, 2017. Joan Boyar, Lene M Favrholdt, Christian Kudahl, Kim S Larsen, and Jesper W Mikkelsen. Online algorithms with advice: a survey. ACM SIGACT News, 47(3):93-129, 2016. Xue Chen. Derandomized balanced allocation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2513-2526. SIAM, 2019. Ilan Reuven Cohen, Binghui Peng, and David Wajc. Tight bounds for online edge coloring. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1-25. IEEE, 2019. José Correa, Patricio Foncea, Ruben Hoeksma, Tim Oosterwijk, and Tjark Vredeveld. Posted price mechanisms for a random stream of customers.
In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 169-186, 2017. Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Eva Rotenberg, and Mikkel Thorup. The power of two choices with simple tabulation. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 1631-1642. SIAM, 2016. Stefan Dobrev, Jeff Edmonds, Dennis Komm, Rastislav Královič, Richard Královič, Sacha Krug, and Tobias Mömke. Improved analysis of the online set cover problem with advice. Theoretical Computer Science, 689:96-107, 2017. Jon Feldman, Aranyak Mehta, Vahab Mirrokni, and Shan Muthukrishnan. Online stochastic matching: Beating 1-1/e. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 117-126. IEEE, 2009. P Brighten Godfrey. Balls and bins with structure: balanced allocations on hypergraphs. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 511-517, 2008. Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning, pages 2319-2327, 2019. Anupam Gupta, Guru Guruganesh, Binghui Peng, and David Wajc. Stochastic online metric matching. In International Colloquium on Automata, Languages, and Programming, 2019. Zhiyi Huang and Xinkai Shu. Online stochastic matching, poisson arrivals, and the natural linear program. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 682-693, 2021. Zhiyi Huang, Ning Kang, Zhihao Gavin Tang, Xiaowei Wu, Yuhao Zhang, and Xue Zhu. Fully online matching. Journal of the ACM (JACM), 67(3):1-25, 2020. Sungjin Im, Nathaniel Kell, Debmalya Panigrahi, and Maryam Shadloo. Online load balancing on related machines. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 30-43, 2018. Piotr Indyk, Yaron Singer, Ali Vakilian, and Sergei Vassilvitskii. Summer workshop on learning-based algorithms, 2019.
http://www.mit.edu/~vakilian/ttic-workshop.html. Richard M Karp, Umesh V Vazirani, and Vijay V Vazirani. An optimal algorithm for online bipartite matching. In Proceedings of the twenty-second annual ACM symposium on Theory of computing, pages 352-358, 1990. Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1859-1877. SIAM, 2020. Christoph Lenzen, Merav Parter, and Eylon Yogev. Parallel balanced allocations: The heavily loaded case. In The 31st ACM Symposium on Parallelism in Algorithms and Architectures, pages 313-322, 2019. Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In International Conference on Machine Learning, pages 3296-3305. PMLR, 2018. Vahab Mirrokni, Mikkel Thorup, and Morteza Zadimoghaddam. Consistent hashing with bounded loads. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 587-604. SIAM, 2018. Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 464-473, 2018. Michael Mitzenmacher and Eli Upfal. Probability and computing: Randomized algorithms and probabilistic analysis, 2005. Michael Mitzenmacher and Sergei Vassilvitskii. Algorithms with predictions. arXiv preprint arXiv:2006.09123, 2020. J. v. Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295-320, 1928. Steven Phillips and Jeffery Westbrook. Online load balancing and network flow. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 402-411, 1993. Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9661-9670, 2018. Andrea W Richa, M Mitzenmacher, and R Sitaraman.
The power of two random choices: A survey of techniques and results. Combinatorial Optimization, 9:255-304, 2001. Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1834-1845. SIAM, 2020. Kunal Talwar and Udi Wieder. Balanced allocations: the weighted case. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 256-265, 2007. Berthold Vöcking. How asymmetry helps load balancing. Journal of the ACM (JACM), 50(4):568-589, 2003. Alexander Wei and Fred Zhang. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. In Advances in Neural Information Processing Systems, 2020. Organization The appendix is organized as follows. We provide probabilistic tools in Appendix A. Missing proofs from Section 3 can be found in Appendix B and Appendix C. Missing proofs from Section 4 can be found in Appendix D. Appendix A. Probabilistic tools Lemma 19 (Chernoff bound) Let $X = \sum_{i=1}^n X_i$, where $X_i = 1$ with probability $p_i$ and $X_i = 0$ with probability $1 - p_i$, and all $X_i$ are independent. Let $\mu = \mathbb{E}[X] = \sum_{i=1}^n p_i$. Then 1. $\Pr[X \ge (1+\delta)\mu] \le \exp(-\delta^2\mu/3)$, $\forall \delta > 0$; 2. $\Pr[X \le (1-\delta)\mu] \le \exp(-\delta^2\mu/2)$, $\forall\, 0 < \delta < 1$. Lemma 20 (Hoeffding bound) Let $X_1, \ldots, X_n$ denote $n$ independent bounded variables with $X_i \in [a_i, b_i]$. Let $X = \sum_{i=1}^n X_i$; then we have $\Pr[|X - \mathbb{E}[X]| \ge t] \le 2\exp\big(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\big)$. Lemma 21 (Azuma-Hoeffding bound) Let $X_0, \ldots, X_n$ be a martingale sequence with respect to the filtration $F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ such that for $Y_i = X_i - X_{i-1}$, $i \in [n]$, we have $|Y_i| = |X_i - X_{i-1}| \le c_i$. Then $\Pr[|X_n - X_0| \ge t] \le 2\exp\big(-\frac{t^2}{2\sum_{i=1}^n c_i^2}\big)$. Lemma 22 (Azuma bound, the multiplicative form) Let $X_1, \ldots, X_n$ be random variables with $X_i \in \{0, 1\}$ for any $i \in [n]$ that satisfy $\mathbb{E}[X_i \mid X_1, \ldots, X_{i-1}] \le \mu$. Then, one has $\Pr[\sum_{i=1}^n X_i \ge (1+\delta)n\mu] \le \exp(-\delta^2 n\mu/3)$. Definition 23 (Negative association) A set of random variables $X_1, \ldots$
$\dots, X_n$ is said to be negatively associated (NA) if for any two disjoint sets $I, J \subseteq [n]$ and any two functions $f, g$ that are both monotone increasing or both monotone decreasing, it holds that
$$\mathbb{E}[f(X_I)\, g(X_J)] \le \mathbb{E}[f(X_I)]\,\mathbb{E}[g(X_J)],$$
where we denote $X_I = \{X_i : i \in I\}$.

Lemma 24 (Closure property for NA) We know that
1. The union of independent sets of NA random variables is NA. That is, if $X_1, \dots, X_n$ are NA, $Y_1, \dots, Y_m$ are NA, and $\{X_i\}_{i\in[n]}$ are independent of $\{Y_j\}_{j\in[m]}$, then $X_1, \dots, X_n, Y_1, \dots, Y_m$ is NA.
2. Concordant monotone functions defined on disjoint subsets of a set of NA random variables are NA. That is, suppose $f_1, \dots, f_k : \mathbb{R}^n \to \mathbb{R}$ are all monotone increasing or all monotone decreasing, where each $f_i$ depends on a disjoint subset $S_i$ of $[n]$, $S_1, S_2, \dots, S_k \subseteq [n]$. In that case, if $X_1, \dots, X_n$ are NA, then the random variables $Y_1 = f_1(X_{S_1}), \dots, Y_k = f_k(X_{S_k})$ are NA.

Lemma 25 (Permutation distributions are NA) Let $x_1, \dots, x_n$ be $n$ values and let $X_1, \dots, X_n$ be random variables such that $\{X_1, X_2, \dots, X_n\} = \{x_1, x_2, \dots, x_n\}$ always, with all possible assignments equally likely. Then $X_1, X_2, \dots, X_n$ are NA.

Lemma 26 (Chernoff–Hoeffding bounds for NA variables) Let $X_1, \dots, X_n$ be NA random variables with $X_i \in [a_i, b_i]$ always. Then $Y = \sum_i X_i$ satisfies Hoeffding's tail bound. Namely,
$$\Pr\left[|Y - \mathbb{E}[Y]| \ge t\right] \le 2\exp\left(-\frac{2t^2}{\sum_i (b_i - a_i)^2}\right).$$
In particular, if $X_i \in \{0,1\}$, we have
$$\Pr[Y \ge (1+\delta)\,\mathbb{E}[Y]] \le \exp\left(-\delta^2\,\mathbb{E}[Y]/3\right), \qquad \Pr[Y \le (1-\delta)\,\mathbb{E}[Y]] \le \exp\left(-\delta^2\,\mathbb{E}[Y]/2\right).$$

Appendix B. Missing proof from Section 3.2

We first prove Lemma 8, which asserts that Randomize greedy achieves a $\frac{d+1}{d}$ approximation when the estimated load is accurate ($\lambda = 0$) and, more importantly, gives the volumetric property.

Proof [Proof of Lemma 8] Consider the assignment induced by the Randomize greedy algorithm. Since we assign each client in $A_{\mathrm{high}}$ to $d$ different servers and these servers do not receive load from any other client, we conclude that the maximum load on them is less than $\frac{1}{d}\max_{i\in A} p_i$.
For the remaining $n - d|A_{\mathrm{high}}|$ servers, the load can exceed $1/n$ only when a client has $d$ greedily assigned servers and does not distribute all of its load. In this case, since the client connects to $d$ servers and the last $(d-1)$ servers it connects with do not connect to any other client, the leftover load is at most $1/n$. Since we split the leftover load uniformly over its $d$ neighbor servers and these $d$ servers only receive leftover load from this client, the total load on each server is at most $(1+\frac{1}{d})\cdot\frac{1}{n}$.

Finally, it is clear that $\mathrm{OPT}_p \ge \frac{1}{n}$, and since the maximum degree is bounded by $d$, we also have $\mathrm{OPT}_p \ge \frac{1}{d}p_i$ for every $i \in A$. Thus, the optimal solution satisfies
$$\mathrm{OPT}_p \ge \max\left\{\frac{1}{n},\ \frac{1}{d}\max_{i\in A} p_i\right\}.$$
Since no server in our assignment has load more than $\max\{(1+\frac{1}{d})\cdot\frac{1}{n},\ \frac{1}{d}\max_{i\in A} p_i\}$, we conclude that it achieves a $(1+\frac{1}{d})$ approximation.

To prove Lemma 9, we need the following technical result, which states that the neighborhood membership indicators are negatively associated.

Lemma 27 For any subset of clients $S_A \subseteq A$, let the random variable $X_j$ ($j \in [n]$) indicate whether the $j$-th server is connected to some client in $S_A$, i.e., $X_j = \mathbf{1}\{j \in N(S_A)\}$. Then $X_1, \dots, X_n$ are negatively associated.

Proof For any $i \in S_A$, $j \in [n]$, define $Y_{i,j} = \mathbf{1}\{j \in N_R(i)\}$; that is, $Y_{i,j}$ indicates whether server $j$ is a randomly assigned neighbor of client $i$. We first show that for any $i \in S_A$, $\{Y_{i,j}\}_j$ is NA. To see this, we further define $Z_{i,j,k} = 1$ if the $k$-th random edge of client $i$ connects to server $j$, where $i \in S_A$, $j \in [n]$, $1 \le k \le d - |N_G(i)|$. The set $\{Z_{i,j,k}\}_j$ is NA since it is a random permutation of $(1, 0, \dots, 0)$ (see Lemma 25). Furthermore, for any distinct $k, k' \in [d - |N_G(i)|]$, $\{Z_{i,j,k}\}_j$ are independent of $\{Z_{i,j,k'}\}_j$; by the closure property of NA (see Lemma 24), we know that $\{Z_{i,j,k}\}_{j,k}$ is NA. Furthermore,
$$Y_{i,j} = \min\left\{\sum_{k=1}^{d-|N_G(i)|} Z_{i,j,k},\ 1\right\}.$$
Since the $Y_{i,j}$ ($j \in [n]$) are defined on disjoint sets of $\{Z_{i,j,k}\}_{j,k}$ and the function is monotone, by the closure property of NA (see Lemma 24), we know that $\{Y_{i,j}\}_j$ is NA.
For any distinct $i, i' \in S_A$, $\{Y_{i,j}\}_j$ are independent of $\{Y_{i',j}\}_j$. By the closure property of NA (see Lemma 24), we know that $\{Y_{i,j}\}_{i,j}$ is NA. Furthermore, we have
$$X_j = \min\left\{\mathbf{1}\{j \in N_G(S_A)\} + \sum_{i\in S_A} Y_{i,j},\ 1\right\}.$$
That is, the $X_j$ ($j \in [n]$) are defined on disjoint sets of $\{Y_{i,j}\}_{i,j}$ and are monotone increasing. Hence, by the closure property of NA (see Lemma 24), $X_1, \dots, X_n$ are NA.

We next prove Lemma 9, which concerns the expansion property of Randomize greedy.

Proof [Proof of Lemma 9] Fix a subset of clients $S_A \subseteq A$ and any server $j \in [n]$; we use the random variable $X_j$ to indicate whether server $j$ is a neighbor of some client in $S_A$, i.e., $X_j = \mathbf{1}\{j \in N(S_A)\}$. For the purpose of the proof, we assign a new random server to each client; i.e., we assume there are $(d+1)$ neighbor servers for each client and at least one randomly assigned neighbor server. This is only used for the proof, and we remove its effect at the end.

For any server $j \in N_G(S_A)$, we know that $\mathbb{E}[X_j] = 1$. For any server $j \in [n]\setminus N_G(S_A)$, we have
$$\mathbb{E}[X_j] = 1 - \prod_{i\in S_A}\left(1-\frac{1}{n}\right)^{d+1-|N_G(i)|} \ge 1 - \exp\left(-\frac{1}{n}\sum_{i\in S_A}\left(d+1-|N_G(i)|\right)\right) \ge 1 - \exp\left(\frac{|N_G(S_A)| - d|S_A|}{n}\right). \quad (12)$$
The second step follows from $\exp(-x) \ge 1-x$. For the last step, the overlap between the sets $N_G(i)$ ($i \in S_A$) accounts for at most $1$ per client $i \in S_A$; therefore, we have $\sum_{i\in S_A}|N_G(i)| \le |N_G(S_A)| + |S_A|$, and thus
$$\sum_{i\in S_A}\left(d+1-|N_G(i)|\right) \ge d|S_A| - |N_G(S_A)|.$$
By Lemma 27, $X_1, \dots, X_n$ are NA, and we are going to apply concentration bounds. We divide into cases based on the cardinality of $S_A$.

Case 1. $|S_A| = k \le n/d$. We have
$$\sum_{j\in[n]}\mathbb{E}[X_j] \ge \underbrace{|N_G(S_A)| + (n-|N_G(S_A)|)\left(1-\exp\left(\frac{|N_G(S_A)| - d|S_A|}{n}\right)\right)}_{:=\mathrm{Exps1}(S_A)} \ge 0 + n\left(1-\exp(-d|S_A|/n)\right) \ge \left(1-\frac{1}{e}\right)d|S_A|. \quad (13)$$
The second step holds since the expression is monotone increasing in $|N_G(S_A)|$ and takes its minimum when $|N_G(S_A)| = 0$ (see Lemma 28); the third step follows from $1-e^{-x} \ge (1-\frac{1}{e})x$, which holds for any $x \in [0,1]$. Using the Chernoff bound for the NA random variables $X_1, \dots, X_n$ (see Lemma 26), we have
$$\Pr\left[\sum_{j\in[n]} X_j \le (1-\epsilon)\sum_{j\in[n]}\mathbb{E}[X_j]\right] \le \exp\left(-\epsilon^2\sum_{j\in[n]}\mathbb{E}[X_j]\Big/2\right) \le \exp(-\epsilon^2 dk/4), \quad (14)$$
where the second step follows from Eq.
(13). Since we add one extra neighbor for each client, this can increase the size of the neighborhood of $S_A$ by at most $|S_A|$. Thus,
$$|N(S_A)| \ge \sum_{j\in[n]} X_j - |S_A| \ge \sum_{j\in[n]} X_j - \frac{e}{e-1}\cdot\frac{1}{d}\sum_{j\in[n]}\mathbb{E}[X_j], \quad (15)$$
where the second inequality follows from Eq. (13), i.e., $|S_A| = k \le \frac{e}{e-1}\cdot\frac{1}{d}\sum_{j\in[n]}\mathbb{E}[X_j]$. Considering all subsets of clients of size $k \le n/d$ and combining Eq. (12), (14), (15), we have
$$\Pr\left[\exists\, S_A \subseteq A,\ |S_A| = k,\ |N(S_A)| \le (1-\epsilon-O(1/d))\,\mathrm{Exps1}(S_A)\right] \le \binom{n}{k}\exp(-\epsilon^2 dk/4) \le \exp(-\epsilon^2 dk/4 + k\log n) \le 1/\mathrm{poly}(n).$$
The second step follows from the union bound, and the last step uses the fact that $d \ge \Omega(\log(n)/\epsilon^2)$.

Case 2. $n/d < |S_A| \le \frac{1}{2}\epsilon^3 n$. By Eq. (12), we have
$$\sum_{j\in[n]}\mathbb{E}[X_j] \ge |N_G(S_A)| + (n-|N_G(S_A)|)\left(1-\exp\left(\frac{|N_G(S_A)| - d|S_A|}{n}\right)\right) \ge 0 + n\left(1-\exp(-d|S_A|/n)\right) \ge \left(1-\frac{1}{e}\right)n. \quad (16)$$
The second step, again, holds because the expression is monotone increasing in $|N_G(S_A)|$. Since $X_1, \dots, X_n$ are NA, by the Chernoff bound we have
$$\Pr\left[\sum_{j\in[n]} X_j \le (1-\epsilon)\sum_{j\in[n]}\mathbb{E}[X_j]\right] \le \exp(-\epsilon^2 n/4). \quad (17)$$
Furthermore, since we add one new random neighbor for each client, we have
$$|N(S_A)| \ge \sum_{j\in[n]} X_j - |S_A| \ge \sum_{j\in[n]} X_j - \epsilon^3\sum_{j\in[n]}\mathbb{E}[X_j], \quad (18)$$
where the last step holds due to Eq. (16) and $|S_A| \le \frac{1}{2}\epsilon^3 n$. Define
$$\mathrm{Exps2}(S_A) := |N_G(S_A)| + (n-|N_G(S_A)|)\left(1-\exp\left(\frac{|N_G(S_A)| - d|S_A|}{n}\right)\right).$$
Considering all subsets of clients of size $k$ ($n/d < k \le \epsilon^3 n/2$), we have
$$\Pr\left[\exists\, S_A,\ |S_A| = k,\ |N(S_A)| \le (1-\epsilon-\epsilon^3)\,\mathrm{Exps2}(S_A)\right] \le \Pr\left[\exists\, S_A,\ |S_A| = k,\ |N(S_A)| \le (1-\epsilon-\epsilon^3)\sum_{j\in[n]}\mathbb{E}[X_j]\right] \le \binom{n}{k}\exp(-\epsilon^2 n/4) \le 2^{nH(\epsilon^3/2)}\exp(-\epsilon^2 n/4) \le \exp(-\epsilon^2 n/8).$$
The first step holds since $\mathrm{Exps2}(S_A) \le \sum_j\mathbb{E}[X_j]$ by Eq. (12); the second step follows from the union bound together with Eq. (17) and (18); the third step follows from $\binom{n}{k} \le 2^{nH(\epsilon^3/2)}$, where $H(p) = p\log_2(1/p) + (1-p)\log_2(1/(1-p))$ is the binary entropy function; and the last step follows from
$$H(\epsilon^3/2) = \frac{\epsilon^3}{2}\log_2(2/\epsilon^3) + \left(1-\frac{\epsilon^3}{2}\right)\log_2\left(\frac{1}{1-\epsilon^3/2}\right) \le \epsilon^2/8.$$

Case 3. $|S_A| \ge \epsilon^3 n/2$. It suffices to prove that with probability at least $1-1/\mathrm{poly}(n)$, for any client set $S_A$ with $|S_A| = \epsilon^3 n/2$, $|N(S_A)|$ is greater than $(1-\epsilon)n$.
For any set of servers $S_B \subseteq B$ with $|S_B| = \epsilon n$, if none of the servers in $S_B$ is a neighbor of $S_A$, then all randomly assigned servers of clients in $S_A$ must fall in $[n]\setminus S_B$, whose probability is bounded by
$$\prod_{i\in S_A}(1-\epsilon)^{d-|N_G(i)|} = (1-\epsilon)^{d|S_A| - \sum_{i\in S_A}|N_G(i)|} \le (1-\epsilon)^{n/\epsilon} \le \exp(-n).$$
The second step follows from $d|S_A| = d\epsilon^3 n/2 \ge n/\epsilon + 2n$ and $\sum_{i\in S_A}|N_G(i)| \le \sum_{i\in[n]}|N_G(i)| \le 2n$. Taking a union bound over all sets $S_A$ and $S_B$, we conclude the proof.

Lemma 10 relates the expansion of the graph to the optimal flow value. We provide the proof here.

Proof [Proof of Lemma 10] First, it is clear that
$$\mathrm{OPT}_{G,p} \ge \frac{p(S_A)}{|N(S_A)|}$$
holds for any set of clients $S_A \subseteq A$, since the maximum load over the servers in $N(S_A)$ is at least $\frac{p(S_A)}{|N(S_A)|}$ in any feasible allocation. We dedicate the rest of the proof to the reverse direction. Let $S_B$ denote the set of servers with the maximum load in the optimal allocation, and assume further that $S_B$ has the minimum cardinality among all possible optimal allocations. Let $S_A$ be the set of clients whose neighbors are all included in $S_B$. We claim that the servers in $S_B$ do not receive any load from clients other than $S_A$: otherwise, we could shift load from $S_B$ to $B\setminus S_B$ and reduce the cardinality of $S_B$. This implies that $\mathrm{OPT}(G,p) = \frac{p(S_A)}{|N(S_A)|}$, thus concluding the proof.

We can now wrap up the proof of the first part (the upper bound) of Theorem 7.

Proof [Proof of Theorem 7, Part 1] Let $S_A$ be a set of clients satisfying
$$\mathrm{OPT}_{G,p} = \frac{p(S_A)}{|N(S_A)|}. \quad (19)$$
The existence of such a set is guaranteed by Lemma 10. Let $S_B = N(S_A)$ be the set of neighbor servers of the clients in $S_A$. Looking forward, one technical subtlety of the proof is that we need to divide into cases based on the cardinality of $S_A$, the intersection of $S_A$ with $A_{\mathrm{high}}$, and the load on $S_A$. We took moderate effort to simplify the proof, but are still left with six cases due to the subtle differences among them.

We start with the simple case in which there is no client of heavy (estimated) load in $S_A$.

No heavy load case. Assume $S_A \cap A_{\mathrm{high}} = \emptyset$.
In other words, we assume that the clients in $S_A$ have estimated load less than $d/n$, i.e., $q_i \le \frac{d}{n}$ for all $i \in S_A$. We use $k_1$ to denote the number of clients in $S_A$ and $k_2$ to denote the number of greedily assigned servers of the clients in $S_A$; that is, we denote $k_1 = |S_A|$ and $k_2 = |N_G(S_A)|$.

The fact that there is no heavy load client allows us to take advantage of Lemma 8. In particular, the volumetric property shown in Lemma 8 guarantees that, under the estimated load distribution $q$, the greedy assignment ensures that all servers in $N_G(S_A)$ have load less than $(1+\frac{1}{d})\cdot\frac{1}{n}$. This implies
$$q(S_A) \le \frac{d+1}{d}\cdot\frac{k_2}{n}.$$
In fact, we can also assume that equality holds in the above equation; this is w.l.o.g. since it only decreases the gap between $p(S_A)$ and $q(S_A)$, reducing the distribution shift that is needed. Furthermore, since we only care about the load on the clients in $S_A$, it is also w.l.o.g. to assume that the difference between $p(S_A)$ and $q(S_A)$ is exactly $\lambda$. Formally, we have
$$q(S_A) = \frac{d+1}{d}\cdot\frac{k_2}{n} \quad (20)$$
and
$$p(S_A) - q(S_A) = \lambda. \quad (21)$$
Our analysis divides into cases based on the cardinality of the client set $S_A$ and the actual load $p(S_A)$.

Case 1. $|S_A| = k_1 > n/d$. We start with the case that $|S_A|$ is fairly large. Since the total load on $S_A$ can never exceed $1$, we know that
$$p(S_A) = \lambda + q(S_A) = \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \le 1. \quad (22)$$
Using the expansion property shown in Lemma 9, with probability $1-1/\mathrm{poly}(n)$ one has
$$|N(S_A)| \ge (1-O(\epsilon))\left[k_2 + (n-k_2)\left(1-\exp\left(\frac{k_2}{n}-1\right)\right)\right]. \quad (23)$$
Since the optimal flow value in any feasible network cannot go below $1/n$ (i.e., $\mathrm{OPT}_p \ge 1/n$), the approximation ratio is bounded as
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{1/n} = \frac{n\,p(S_A)}{|N(S_A)|}.$$
Plugging in Eq. (22), (23), we have
$$c_{nd} \le (1+O(\epsilon))\,\frac{n\,p(S_A)}{k_2 + (n-k_2)\left(1-\exp\left(\frac{k_2}{n}-1\right)\right)} \le (1+O(\epsilon))\,\frac{n\,p(S_A)}{k_2 + (n-k_2)\Big(1-\exp\big(-\underbrace{(\lambda+\tfrac{k_2}{dn})}_{=:\lambda'}\big)\Big)} = (1+O(\epsilon))\,\frac{k_2+n\lambda'}{k_2+(n-k_2)(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{n}{n-n\lambda'+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)},$$
where the second step follows from Eq. (22), i.e.,
$$\frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda \le 1 \implies \frac{k_2}{n}-1 \le -\lambda-\frac{k_2}{dn}.$$
In the third step, we take $\lambda' = \lambda + \frac{k_2}{dn}$.
The fourth step holds since the expression is monotone increasing in $k_2$ when fixing $\lambda'$ (see Lemma 28) and
$$k_2 + n\lambda' = \frac{d+1}{d}k_2 + n\lambda = n\,p(S_A) \le n, \quad\text{i.e.,}\quad k_2 \le n-n\lambda'.$$
The last step follows from $\lambda' = \lambda + \frac{k_2}{dn} \le \lambda + 1/d$, $d \ge \Omega(1/\epsilon)$, and Lemma 28.

Case 2. $|S_A| = k_1 \le n/d$. We next consider the case that $|S_A|$ is small. By the expansion property shown in Lemma 9, with probability at least $1-1/\mathrm{poly}(n)$, one has
$$|N(S_A)| \ge (1-O(\epsilon))\left[k_2 + (n-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)\right]. \quad (24)$$
We further split the discussion based on the magnitude of the load $p(S_A)$.

Case 2-a. $|S_A| = k_1 \le n/d$ and $p(S_A) \le \frac{dk_1}{n}$. By Eq. (20), (21), we immediately have
$$p(S_A) = \lambda + q(S_A) = \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \le \frac{dk_1}{n}. \quad (25)$$
Similar to the first case, we have $\mathrm{OPT}_p \ge \frac{1}{n}$, and therefore
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{1/n} = \frac{n\,p(S_A)}{|N(S_A)|}.$$
Plugging in Eq. (24), (25), we have
$$c_{nd} \le (1+O(\epsilon))\,\frac{n\,p(S_A)}{k_2+(n-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)} \le (1+O(\epsilon))\,\frac{n\,p(S_A)}{k_2+(n-k_2)(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{k_2+n\lambda'}{k_2+(n-k_2)(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{n}{n-n\lambda'+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)}.$$
The first step follows from Eq. (24), and the second step follows from Eq. (25); more precisely, we use
$$\frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda \le \frac{dk_1}{n} \implies \frac{k_2-dk_1}{n} \le -\lambda-\frac{k_2}{dn}.$$
In the third step, we replace $\lambda' = \lambda + \frac{k_2}{dn}$. The fourth step holds since the expression is monotone increasing in $k_2$ when fixing $\lambda'$ (see Lemma 28) and
$$k_2 + n\lambda' = \frac{d+1}{d}k_2 + n\lambda = n\,p(S_A) \le n, \quad\text{i.e.,}\quad k_2 \le n-n\lambda'.$$
The last step follows from $\lambda' = \lambda + \frac{k_2}{dn} \le \lambda + 1/d$, $d \ge \Omega(1/\epsilon)$, and Lemma 28.

Case 2-b. $|S_A| = k_1 \le n/d$ and $p(S_A) > \frac{dk_1}{n}$. We next focus on the case that the actual load $p(S_A)$ is fairly large. Since the total load on $S_A$ cannot exceed $1$, we immediately have
$$p(S_A) = \lambda + q(S_A) = \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \in \left(\frac{dk_1}{n},\ 1\right]. \quad (26)$$
Since there are at most $d|S_A| = dk_1$ neighbor servers of the clients in $S_A$, we have $\mathrm{OPT}_p \ge \frac{p(S_A)}{dk_1}$, and therefore
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{p(S_A)/(dk_1)} = \frac{dk_1}{|N(S_A)|}.$$
Plugging in Eq.
(24), we have
$$c_{nd} \le (1+O(\epsilon))\,\frac{dk_1}{k_2+(n-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)} \le (1+O(\epsilon))\,\frac{dk_1}{(dk_1-n\lambda')+(n-dk_1+n\lambda')(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{(dk_1-n\lambda')+n\lambda'}{(dk_1-n\lambda')+(n-(dk_1-n\lambda'))(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{(n-n\lambda')+n\lambda'}{(n-n\lambda')+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)}.$$
The second step holds since the denominator is monotone increasing in $k_2$ (see Lemma 28) and, by Eq. (26),
$$\frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda > \frac{dk_1}{n} \implies k_2 > dk_1-n\lambda'.$$
In the third step, we replace $\lambda' = \lambda + \frac{k_2}{dn}$. The fourth step holds since the expression is monotone increasing in $(dk_1-n\lambda')$ when fixing $\lambda'$ (see Lemma 28) and $dk_1 \le n$. The last step follows from $\lambda' = \lambda + \frac{k_2}{dn} \le \lambda + 1/d$, $d \ge \Omega(1/\epsilon)$, and Lemma 28.

We next consider the general case in which there are heavy load clients in $S_A$.

Heavy load case. Assume $S_A \cap A_{\mathrm{high}} \ne \emptyset$. Let $S_{A,\mathrm{high}} = S_A \cap A_{\mathrm{high}}$ and $S_{A,\mathrm{low}} = S_A \cap A_{\mathrm{low}}$. The major difference is that the servers in $N_G(S_{A,\mathrm{high}})$ could have load much larger than $(1+\frac{1}{d})\cdot\frac{1}{n}$. Thus we cannot directly apply the volumetric property proved in Lemma 8 and say that the estimated load on $S_A$ roughly equals the number of neighbor servers. However, we will see that this does not degrade the performance of our algorithm. The intuition is that all clients in $S_{A,\mathrm{high}}$ are connected to $d$ different servers, which is already the best we can hope for.

For any client $i \in S_{A,\mathrm{high}}$, we first argue that it is w.l.o.g. to assume $p_i = q_i$. The reason is as follows. (1) If $p_i > q_i$, then we can increase $q_i$ to $p_i$. (2) If $p_i \le q_i$ and $p_i > d/n$, we can decrease $q_i$ to $p_i$. (3) If $p_i \le \frac{d}{n} < q_i$, we can decrease $q_i$ to $d/n$ and put client $i$ into $S_{A,\mathrm{low}}$. In all three cases, we do not change the execution of our algorithm and the output bipartite graph remains the same, while the TV distance between $p$ and $q$ decreases. Finally, we take a step further and assume that $p_i = dL > \frac{d}{n}$ holds for all clients $i \in S_{A,\mathrm{high}}$, for some common value $L$. This is w.l.o.g. since we only take into account the total load on $S_{A,\mathrm{high}}$.
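As an illustrative aside, the three-way adjustment of the estimates described above is mechanical, and it can be sketched in code (a minimal sketch of our own, not code from the paper; the function name and the list representation of $p$ and $q$ are assumptions):

```python
def adjust_estimates(p, q, d, n):
    """Adjust estimated loads q toward actual loads p, following cases
    (1)-(3) above.  In every case the coordinate-wise gap |p_i - q_i|
    never increases, so the TV distance between p and q cannot grow."""
    threshold = d / n
    q_new = []
    for pi, qi in zip(p, q):
        if pi > qi:                   # case (1): raise q_i up to p_i
            q_new.append(pi)
        elif pi > threshold:          # case (2): p_i <= q_i and p_i > d/n
            q_new.append(pi)
        elif qi > threshold:          # case (3): p_i <= d/n < q_i
            q_new.append(threshold)   # client moves to S_{A,low}
        else:                         # q_i already at most d/n: keep it
            q_new.append(qi)
    return q_new
```

For example, with $d/n = 0.2$, the pair $(p_i, q_i) = (0.05, 0.4)$ falls into case (3) and becomes $(0.05, 0.2)$, shrinking the gap from $0.35$ to $0.15$.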
We slightly change notation and denote $|S_{A,\mathrm{low}}| = k_1$, $|N_G(S_{A,\mathrm{low}})| = k_2$, and $|S_{A,\mathrm{high}}| = k_3$. We note that
$$|N_G(S_A)| = |N_G(S_{A,\mathrm{high}})| + |N_G(S_{A,\mathrm{low}})| = dk_3 + k_2, \quad (27)$$
since there is no overlap between $N_G(S_{A,\mathrm{low}})$ and $N_G(S_{A,\mathrm{high}})$. Furthermore, we can still assume that
$$p(S_{A,\mathrm{low}}) = q(S_{A,\mathrm{low}}) + \lambda = \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda. \quad (28)$$
Our analysis divides into cases based on the size of $|S_A|$ and the actual load $p(S_A)$.

Case 3. $|S_A| > n/d$. By Lemma 9, with probability $1-1/\mathrm{poly}(n)$, one has
$$|N(S_A)| \ge (1-O(\epsilon))\left[dk_3+k_2+(n-dk_3-k_2)\left(1-\exp\left(\frac{dk_3+k_2}{n}-1\right)\right)\right]. \quad (29)$$
Furthermore, one has $\mathrm{OPT}_p \ge L$ and
$$p(S_A) = p(S_{A,\mathrm{high}}) + p(S_{A,\mathrm{low}}) = dk_3L + \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \le 1. \quad (30)$$
Hence, we have
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{L}.$$
Plugging in Eq. (29), (30), we have
$$c_{nd} \le (1+O(\epsilon))\,\frac{dk_3+\lambda/L+\frac{d+1}{d}\cdot\frac{k_2}{Ln}}{dk_3+k_2+(n-dk_3-k_2)\left(1-\exp\left(\frac{dk_3+k_2}{n}-1\right)\right)} \le (1+O(\epsilon))\,\frac{dk_3Ln+n\lambda+\frac{d+1}{d}k_2}{dk_3Ln+k_2+(n-dk_3Ln-k_2)\left(1-\exp\left(\frac{dk_3+k_2}{n}-1\right)\right)} \le (1+O(\epsilon))\,\frac{dk_3Ln+k_2+n\lambda'}{dk_3Ln+k_2+(n-dk_3Ln-k_2)(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{n}{n-n\lambda'+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)}.$$
The second step multiplies the numerator and denominator by $Ln$ and uses $Ln > 1$ (together with the monotonicity of the denominator in $k_2$). The third step follows from Eq. (30), that is,
$$dk_3L + \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \le 1 \implies \frac{dk_3+k_2}{n}-1 \le -\lambda-\frac{k_2}{dn},$$
and replaces $\lambda' = \lambda + \frac{k_2}{dn}$, using $\frac{d+1}{d}k_2+n\lambda = k_2+n\lambda'$. The fourth step holds since the expression is monotone increasing in $(dk_3Ln+k_2)$ when fixing $\lambda'$ (see Lemma 28) and, by Eq. (30), $dk_3Ln+k_2+n\lambda' = n\,p(S_A) \le n$, i.e., $dk_3Ln+k_2 \le n-n\lambda'$. The last step follows from $\lambda' = \lambda + \frac{k_2}{dn} \le \lambda + 1/d$, $d \ge \Omega(1/\epsilon)$, and Lemma 28.

Case 4. $|S_A| \le n/d$. By Lemma 9, with probability $1-1/\mathrm{poly}(n)$, we have
$$|N(S_A)| \ge (1-O(\epsilon))\left[dk_3+k_2+(n-dk_3-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)\right]. \quad (31)$$
We further split into two cases.

Case 4-a. Suppose the load $p(S_{A,\mathrm{low}})$ is fairly small, i.e.,
$$p(S_{A,\mathrm{low}}) = \frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda \le \frac{dk_1}{n}. \quad (32)$$
Then the total load on the clients in $S_A$ obeys
$$p(S_A) = p(S_{A,\mathrm{high}}) + p(S_{A,\mathrm{low}}) = dk_3L + \frac{d+1}{d}\cdot\frac{k_2}{n} + \lambda \le 1.$$
(33) Furthermore, since we assume $p_i = dL$ ($L > 1/n$) for all clients $i \in S_{A,\mathrm{high}}$ and $S_{A,\mathrm{high}} \ne \emptyset$, we have $\mathrm{OPT}_p \ge L$. Hence,
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{L}.$$
Plugging in Eq. (31), (33), we have
$$c_{nd} \le (1+O(\epsilon))\,\frac{dk_3+\lambda/L+\frac{d+1}{d}\cdot\frac{k_2}{Ln}}{dk_3+k_2+(n-dk_3-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)} \le (1+O(\epsilon))\,\frac{dk_3+\lambda/L+\frac{d+1}{d}\cdot\frac{k_2}{Ln}}{dk_3+k_2+(n-dk_3-k_2)(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{Lndk_3+k_2+n\lambda'}{Lndk_3+k_2+(n-Lndk_3-k_2)(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{(n-n\lambda')+n\lambda'}{(n-n\lambda')+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)}.$$
The second step follows from Eq. (32), i.e.,
$$p(S_{A,\mathrm{low}}) = \frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda \le \frac{dk_1}{n} \implies \frac{k_2-dk_1}{n} \le -\lambda-\frac{k_2}{dn} = -\lambda',$$
where we replace $\lambda' = \lambda + \frac{k_2}{dn}$. The third step multiplies the numerator and denominator by $Ln$ and uses $Ln > 1$. The fourth step holds since the expression is monotone increasing in $(Lndk_3+k_2)$ when fixing $\lambda'$ (see Lemma 28) and
$$Lndk_3+k_2+n\lambda' = n\left(Ldk_3+\frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda\right) = n\,p(S_A) \le n, \quad\text{i.e.,}\quad Lndk_3+k_2 \le n-n\lambda'.$$
The last step follows from $\lambda' = \lambda + \frac{k_2}{dn} \le \lambda + 1/d$, $d \ge \Omega(1/\epsilon)$, and Lemma 28.

Case 4-b. Suppose the load $p(S_{A,\mathrm{low}})$ is fairly large and satisfies
$$p(S_{A,\mathrm{low}}) = \frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda > \frac{dk_1}{n}. \quad (34)$$
Then we have $\mathrm{OPT}_p \ge \frac{p(S_A)}{dk_1+dk_3}$, and hence
$$c_{nd} = \frac{\mathrm{OPT}_{G,p}}{\mathrm{OPT}_p} \le \frac{p(S_A)/|N(S_A)|}{p(S_A)/(dk_1+dk_3)} = \frac{dk_1+dk_3}{|N(S_A)|}.$$
Plugging in Eq. (31), one has
$$c_{nd} \le (1+O(\epsilon))\,\frac{dk_1+dk_3}{dk_3+k_2+(n-dk_3-k_2)\left(1-\exp\left(\frac{k_2-dk_1}{n}\right)\right)} \le (1+O(\epsilon))\,\frac{dk_1+dk_3}{dk_3+(dk_1-n\lambda')+(n-dk_3-dk_1+n\lambda')(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{(dk_1+dk_3-n\lambda')+n\lambda'}{(dk_1+dk_3-n\lambda')+(n-(dk_1+dk_3-n\lambda'))(1-\exp(-\lambda'))} \le (1+O(\epsilon))\,\frac{(n-n\lambda')+n\lambda'}{(n-n\lambda')+n\lambda'(1-\exp(-\lambda'))} = (1+O(\epsilon))\,\frac{1}{1-\lambda'\exp(-\lambda')} \le (1+O(\epsilon))\,\frac{1}{1-\lambda\exp(-\lambda)}.$$
The second step holds since the denominator is monotone increasing in $k_2$ (see Lemma 28) and, by Eq. (34),
$$\frac{d+1}{d}\cdot\frac{k_2}{n}+\lambda > \frac{dk_1}{n} \implies k_2 > dk_1-n\lambda'.$$
We take $\lambda' = \lambda + \frac{k_2}{dn}$ in the third step. The fourth step holds since the expression is monotone increasing in $(dk_1+dk_3-n\lambda')$ when fixing $\lambda'$ (see Lemma 28) and $dk_1+dk_3 = d|S_A| \le n$, i.e., $dk_1+dk_3-n\lambda' \le n-n\lambda'$. We conclude the proof here.

Lemma 28 (Technical Lemma) We have 1.
Let $f_1(x) = x + (n-x)\left(1-\exp\left(\frac{x}{n}-\alpha\right)\right)$, where $x \in [0,n]$ and $\alpha \ge 1$. Then $f_1(x)$ is monotone increasing.

2. Let $f_2(x) = \frac{x+n\lambda}{x+(n-x)(1-\exp(-\lambda))}$. Then $f_2(x)$ is monotone increasing.

3. Let $f_3(x) = \frac{1}{1-x\exp(-x)}$. For $x \in (0,1)$, $\epsilon \in (0,\frac{1}{2})$, we have $f_3(x+\epsilon) \le (1+O(\epsilon))f_3(x)$.

Proof For the first claim, we have
$$f_1'(x) = \frac{x}{n}\exp\left(\frac{x}{n}-\alpha\right) \ge 0.$$
For the second claim, we have
$$f_2'(x) = \frac{n\left(1-(1+\lambda)\exp(-\lambda)\right)}{\left(x+(n-x)(1-\exp(-\lambda))\right)^2} \ge 0.$$
For the third claim, we have
$$f_3'(x) = \frac{\exp(-x)(1-x)}{\left(1-x\exp(-x)\right)^2}.$$
It is easy to see that $f_3(x)$ is increasing on $(0,1)$ and decreasing on $(1,2)$. Furthermore, $|f_3'(x)| \le e^2/(e-1)^2 \le 4$; thus we know $f_3(x+\epsilon) - f_3(x) \le 4\epsilon \le O(\epsilon)\,f_3(x)$.

Finally, we prove that one can modify Randomize greedy to ensure that each server is connected to at most $O(d/\epsilon)$ clients, i.e., Corollary 12.

Proof [Proof of Corollary 12] Given a set of clients $A$ and a set of servers $B$ with $|A| = |B| = n$, we divide the servers into two parts $B = B_1 \cup B_2$ with $|B_1| = (1-\epsilon')n$ and $|B_2| = \epsilon' n$. We run Randomize greedy on the client set $A$ and the server set $B_1$. By Theorem 7, the Randomize greedy algorithm returns a network that is $\frac{1}{1-\lambda\exp(-\lambda)} + O(\epsilon')$ approximately optimal. We use the set $B_2$ to balance the storage on servers. In particular, for any server in $B_1$ that has more than $d/\epsilon'$ connected clients, we redirect the excess clients to servers in $B_2$. Since the total number of edges between $A$ and $B_1$ is at most $nd$ and there are $\epsilon' n$ servers in $B_2$, we can make sure that no server has more than $d/\epsilon'$ neighbors. This new network can only be better than the original one. We conclude the proof by choosing a suitable parameter $\epsilon' = O(\epsilon)$.

Appendix C. Missing proof from Section 3.3

Proof [Proof of Theorem 7, Part 2] For simplicity, we assume $n$ is divisible by $d$. Let the estimated load distribution $q$ be uniform over the first $n/d$ clients, i.e.,
$$q = \Big(\underbrace{\frac{d}{n}, \dots, \frac{d}{n}}_{n/d}, 0, \dots, 0\Big).$$
Consider the following hard distribution on the actual load.
For each $i \in [n/d]$, let $\tilde p_i = \frac{d}{n}$ with probability $(1-\lambda+\epsilon)$ and $\tilde p_i = 0$ otherwise. For each $i \in [n/d+1 : n]$, let $\tilde p_i = \frac{d}{n}$ with probability $\frac{\lambda-2\epsilon}{d-1}$ and $\tilde p_i = 0$ otherwise. The final distribution is constructed as follows: if $\sum_{i\in[n]}\tilde p_i > 1$ or $\sum_{i\in[n/d]}\tilde p_i < (1-\lambda)$, we simply take $p = q$; else, we take $p = \tilde p$ and normalize $p$.

First, we prove that it is very unlikely that $\sum_{i\in[n]}\tilde p_i > 1$ or $\sum_{i\in[n/d]}\tilde p_i < (1-\lambda)$. In particular, for $i \in [n]$, let the random variable $Y_i = 1$ if $\tilde p_i > 0$, and $Y_i = 0$ otherwise. Then by the Chernoff bound, we have
$$\Pr\left[\sum_{i\in[n]}\tilde p_i > 1\right] = \Pr\left[\sum_{i\in[n]} Y_i > n/d\right] \le \Pr\left[\sum_{i\in[n]} Y_i > (1+\epsilon)\sum_{i\in[n]}\mathbb{E}[Y_i]\right] \le \exp\left(-\epsilon^2(1-\epsilon)n/3d\right) \le 1/\mathrm{poly}(n),$$
where the second step follows from
$$\sum_{i\in[n]}\mathbb{E}[Y_i] = \sum_{i=1}^{n/d}\mathbb{E}[Y_i] + \sum_{i=n/d+1}^{n}\mathbb{E}[Y_i] = (1-\lambda+\epsilon)\frac{n}{d} + \frac{\lambda-2\epsilon}{d-1}\cdot\frac{(d-1)n}{d} = (1-\epsilon)\frac{n}{d}.$$
Moreover, we have
$$\Pr\left[\sum_{i\in[n/d]}\tilde p_i < (1-\lambda)\right] = \Pr\left[\sum_{i\in[n/d]} Y_i < (1-\lambda)\frac{n}{d}\right] \le \Pr\left[\sum_{i\in[n/d]} Y_i < (1-\epsilon)\sum_{i\in[n/d]}\mathbb{E}[Y_i]\right] \le \exp\left(-\epsilon^2(1-\lambda+\epsilon)n/2d\right) \le 1/\mathrm{poly}(n),$$
where the second step follows from
$$\sum_{i\in[n/d]}\mathbb{E}[Y_i] = (1-\lambda+\epsilon)\frac{n}{d}.$$
We focus on the case $\sum_{i\in[n]}\tilde p_i \le 1$ and $\sum_{i\in[n/d]}\tilde p_i \ge (1-\lambda)$. One can verify that $\mathrm{TV}(p,q) \le \lambda$. Furthermore, one can show that the optimal solution satisfies $\mathrm{OPT}(p) \le (1+3\epsilon)/n$ with high probability, since
$$\Pr\left[\sum_{i\in[n]}\tilde p_i < (1-2\epsilon)\right] = \Pr\left[\sum_{i\in[n]} Y_i < (1-2\epsilon)\frac{n}{d}\right] \le \Pr\left[\sum_{i\in[n]} Y_i < (1-\epsilon)\sum_{i\in[n]}\mathbb{E}[Y_i]\right] \le \exp\left(-\epsilon^2(1-\epsilon)n/2d\right) \le 1/\mathrm{poly}(n),$$
and thus $\max_{i\in[n]} p_i \le \frac{1}{1-2\epsilon}\cdot\frac{d}{n} \le (1+3\epsilon)\frac{d}{n}$.

Let $S_A$ denote the support of $\tilde p$, i.e., $S_A = \{i : \tilde p_i > 0\}$. It suffices to prove that
$$\mathbb{E}[|N(S_A)|] \le \left(1-\lambda\exp(-\lambda)+O(\epsilon)\right)n.$$
We use $X_j$ to indicate whether server $j$ is connected to some client in $S_A$, i.e., $X_j = \mathbf{1}\{j \in N(S_A)\}$. Let $x_j$ denote the number of neighbor clients of server $j$ that come from $[n/d+1:n]$ and $z_j$ the number of neighbor clients of server $j$ that come from $[n/d]$. We know that $\sum_{j\in[n]} x_j = d\cdot(1-1/d)n = (d-1)n$ and $\sum_{j\in[n]} z_j = d\cdot n/d = n$. We have
$$\sum_{j\in[n]}\mathbb{E}[X_j] = \sum_{j\in[n]}\left[1-\left(1-\frac{\lambda-2\epsilon}{d-1}\right)^{x_j}(\lambda-\epsilon)^{z_j}\right] \le n\left[1-\left(1-\frac{\lambda-2\epsilon}{d-1}\right)^{\sum_j x_j/n}(\lambda-\epsilon)^{\sum_j z_j/n}\right] = n\left[1-\left(1-\frac{\lambda-2\epsilon}{d-1}\right)^{d-1}(\lambda-\epsilon)\right] \le \left(1-\lambda\exp(-\lambda)+O(\epsilon)+O(1/d)\right)n.$$
The first step holds since $X_j = 0$ iff none of the $x_j$ clients in $[n/d+1:n]$ and none of the $z_j$ clients in $[n/d]$ belongs to $S_A$.
The second step follows from Jensen's inequality and the convexity of the function $f(x,z) = \left(1-\frac{\lambda-2\epsilon}{d-1}\right)^x(\lambda-\epsilon)^z$. We conclude the proof here.

We next prove Theorem 13, which asserts the NP-hardness of the network design problem.

Proof [Proof of Theorem 13] We reduce from the subset-sum problem. In the subset-sum problem, we are given $k$ numbers $x_1, \dots, x_k$ satisfying $\sum_{i=1}^k x_i = 2$, and the goal is to find a subset $S \subseteq [k]$ such that $\sum_{i\in S} x_i = 1$. The problem is known to be NP-complete. Given a subset-sum instance $x_1, \dots, x_k$, we set the number of clients to be $|A| = k$ and the number of servers to be $|B| = (d-1)k+2$. (Note that one can make the number of clients equal the number of servers by adding $(d-2)k+2$ dummy clients whose loads are $0$.) Define the load distribution $p$ by
$$p_i = \frac{d-1+x_i}{(d-1)k+2}, \quad i \in A.$$
Suppose there exists $S \subseteq [k]$ such that $\sum_{i\in S} x_i = 1$; then we have an allocation whose maximum load equals $\frac{1}{(d-1)k+2}$. To see this, let the $i$-th client put $\frac{1}{(d-1)k+2}$ units of load on each of the servers $(i-1)(d-1)+1, \dots, i(d-1)$, and put $\frac{x_i}{(d-1)k+2}$ units of load on server $(d-1)k+1$ if $i \in S$, and on server $(d-1)k+2$ otherwise. It is easy to see that each server has load $\frac{1}{(d-1)k+2}$ and each client has $d$ connected servers.

On the other side, suppose one can find an allocation with maximum load $\frac{1}{(d-1)k+2}$; we proceed to show that one can recover a solution to the subset-sum instance. Note that the total number of servers is $|B| = (d-1)k+2$; thus every server has load exactly $\frac{1}{(d-1)k+2}$ in the allocation. For each client $i \in A$, suppose it connects to $d$ servers $B_{i_1}, \dots, B_{i_d}$ and assigns loads $\ell_{i_1} \ge \cdots \ge \ell_{i_d}$. Suppose there exists $t$ ($1 \le t \le d-1$) such that $\ell_{i_t} < \frac{1}{(d-1)k+2}$; then we do the following operations. We take $\ell_{i_t} \leftarrow \frac{1}{(d-1)k+2}$ and $\ell_{i_d} \leftarrow \ell_{i_t} + \ell_{i_d} - \frac{1}{(d-1)k+2}$. To keep the maximum load at $\frac{1}{(d-1)k+2}$, for any client $i'$ that is connected to server $B_{i_t}$, we delete the connection between $B_{i_t}$ and client $i'$, re-assign it to server $B_{i_d}$, and transfer all the load.
We still maintain a feasible solution; i.e., the maximum load is still $\frac{1}{(d-1)k+2}$ and the number of neighbor servers of each client is still at most $d$. By doing this for every client, we know that each client connects to $(d-1)$ servers on which it puts $\frac{1}{(d-1)k+2}$ units of load each, and it puts its remaining load on one of the two remaining servers. In particular, the $i$-th client puts $\frac{x_i}{(d-1)k+2}$ units of load there. Thus, we recover a solution to the subset-sum instance.

Appendix D. Missing proof from Section 4

Let $\kappa > 0$. The following lemma states that each client is connected to at least $(1-\kappa)d$ servers; we omit the proof as it directly follows from the Chernoff bound.

Lemma 29 Suppose $d \ge \Omega(\log(n)/\kappa^2)$ and we construct the graph with Randomize greedy. Then with probability $1-1/\mathrm{poly}(n)$, $|N(i)| \ge (1-\kappa)d$ holds for every client $i \in [n]$.

Next, we prove Lemma 15, which asserts that each client is connected to at least $(1/2-\kappa)d$ servers of low load score, w.h.p.

Proof [Proof of Lemma 15] Fix a client $i \in [n]$; we prove that $|N(i) \cap B_{\mathrm{light}}| \ge (\frac{1}{2}-\kappa)d$ holds with probability $1-1/\mathrm{poly}(n)$; the lemma then follows by a union bound. For each server $j \in B$, define
$$\tilde s_j := \sum_{i' \ne i} \frac{n\,p_{i'}}{d}\,\mathbf{1}\{j \in N(i')\}.$$
We know that
$$\sum_{j\in[n]}\tilde s_j \le \sum_{j\in[n]} s_j = \sum_{i'\in[n]}\frac{n\,p_{i'}}{d}\,|N(i')| \le n\sum_{i'\in[n]} p_{i'} = n.$$
Thus, there are at least $n/2$ servers $j \in [n]$ such that $\tilde s_j \le 2$; we use $\tilde B_{\mathrm{light}}$ to denote this set. For any $j \in N(i)$, we have
$$s_j = \tilde s_j + \frac{n\,p_i}{d}\,\mathbf{1}\{j \in N(i)\} = \tilde s_j + \frac{n\,p_i}{d}.$$
Hence, it suffices to prove that $|\tilde B_{\mathrm{light}} \cap N(i)| \ge (\frac{1}{2}-\kappa)d$ holds with high probability. We consider the randomly assigned servers and the greedily assigned servers separately.

For the randomly assigned neighbors $N_R(i)$, the claim follows easily from the Hoeffding bound. In particular, we set the random variable $X_k = 1$ ($k \in [|N_R(i)|]$) if the $k$-th randomly assigned neighbor server belongs to $\tilde B_{\mathrm{light}}$, and $X_k = 0$ otherwise. It is easy to see that $\{X_k\}_{k\in[|N_R(i)|]}$ are i.i.d. and $\mathbb{E}[\sum_{k\in[|N_R(i)|]} X_k] \ge \frac{1}{2}|N_R(i)|$.
Then by the Hoeffding bound, one has
$$\Pr\left[|N_R(i) \cap \tilde B_{\mathrm{light}}| \le \left(\frac{1}{2}-\kappa\right)|N_R(i)|\right] = \Pr\left[\sum_{k\in[|N_R(i)|]} X_k \le \left(\frac{1}{2}-\kappa\right)|N_R(i)|\right] \le \exp\left(-\kappa^2 d/2\right) \le 1/\mathrm{poly}(n). \quad (35)$$
For the greedily assigned neighbors $N_G(i)$, one crucial observation is that there are at most two servers in $N_G(i)$ that are also greedily assigned to some other client. Since we assume $d = \Omega(\log n)$, we can simply ignore them, and it is w.l.o.g. to assume that no server in $N_G(i)$ is greedily assigned to another client. For any server $j \in N_G(i)$, define the random variable $Y_j = 1$ if $j \in \tilde B_{\mathrm{light}}$ and $Y_j = 0$ otherwise. We still have $\mathbb{E}[Y_j] \ge \frac{1}{2}$ due to the symmetry of all $j \in [n]$ under the random assignment. Furthermore, we claim that $\{Y_j\}_{j\in N_G(i)}$ are negatively associated. To see this, for any server $j \in N_G(i)$ and any client $i' \ne i$, define
$$Z_{i',j} = \frac{n\,p_{i'}}{d}\,\mathbf{1}\{j \in N_R(i')\}.$$
We first prove that $\{Z_{i',j}\}_{i',j}$ is NA. We further define $W_{i',j,k}$, where $k \in [d-|N_G(i')|]$, by $W_{i',j,k} = \frac{n\,p_{i'}}{d}$ if the $k$-th randomly assigned server of client $i'$ is server $j$, and $W_{i',j,k} = 0$ otherwise. It is clear that $\{W_{i',j,k}\}_{j\in[n]}$ is NA, since it is a random permutation of $(\frac{n\,p_{i'}}{d}, 0, \dots, 0)$ (see Lemma 25). This implies that $\{W_{i',j,k}\}_{i',j,k}$ is also NA by the closure property of NA (see Lemma 24). Moreover, we have
$$Z_{i',j} = \min\left\{\sum_{k=1}^{d-|N_G(i')|} W_{i',j,k},\ \frac{n\,p_{i'}}{d}\right\}. \quad (36)$$
Thus $\{Z_{i',j}\}_{i',j}$ are also NA, since they are defined on disjoint subsets of $\{W_{i',j,k}\}_{i',j,k}$ and the function is monotone (see Lemma 24). Finally, observe that
$$Y_j = \begin{cases} 1 & \text{if } \sum_{i'\in[n],\, i'\ne i} Z_{i',j} \le 2, \\ 0 & \text{otherwise.} \end{cases}$$
Hence $\{Y_j\}_{j\in N_G(i)}$ are also NA, since they are defined on disjoint subsets of $\{Z_{i',j}\}_{i',j}$ and the function is monotone decreasing. Now that we have proved that $\{Y_j\}_{j\in N_G(i)}$ are NA, by the Hoeffding bound for NA variables, we have
$$\Pr\left[|N_G(i) \cap \tilde B_{\mathrm{light}}| \le \left(\frac{1}{2}-\kappa\right)|N_G(i)|\right] = \Pr\left[\sum_{j\in N_G(i)} Y_j \le \left(\frac{1}{2}-\kappa\right)|N_G(i)|\right] \le \exp\left(-\kappa^2 d/2\right) \le 1/\mathrm{poly}(n). \quad (37)$$
Combining Eq. (35), (37) and Lemma 29, we conclude the proof.
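As a closing illustrative aside (our own sketch, not part of the paper's analysis), the expansion behavior that drives Lemma 9 is easy to check numerically: when each of $k$ clients picks $d$ servers uniformly at random, the expected neighborhood size is $n(1-(1-1/n)^{dk}) \approx n(1-e^{-dk/n})$. The helper name and parameter values below are ours; for simplicity the sketch samples with replacement.

```python
import math
import random

def simulate_neighborhood(n, d, k, trials=200, seed=0):
    """Estimate E[|N(S_A)|] when each of k clients draws d servers
    uniformly at random (with replacement, for simplicity)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        covered = set()
        for _client in range(k):
            covered.update(rng.randrange(n) for _ in range(d))
        total += len(covered)
    return total / trials

n, d, k = 2000, 10, 100               # |S_A| = k clients, d random servers each
est = simulate_neighborhood(n, d, k)
pred = n * (1 - math.exp(-d * k / n))  # approximately 786.9 for these parameters
```

The simulated value `est` should land within a few percent of `pred`; the gap between `pred` and the exact $n(1-(1-1/n)^{dk})$ is only a lower-order term at this scale.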