# ONLINE FACILITY LOCATION WITH PREDICTIONS

Shaofeng H.-C. Jiang (Peking University, shaofeng.jiang@pku.edu.cn), Erzhi Liu (Shanghai Jiao Tong University, lezdzh@sjtu.edu.cn), You Lyu (Shanghai Jiao Tong University, vergil@sjtu.edu.cn), Zhihao Gavin Tang (Shanghai University of Finance and Economics, tang.zhihao@mail.shufe.edu.cn), Yubo Zhang (Peking University, zhangyubo18@pku.edu.cn)

We provide nearly optimal algorithms for online facility location (OFL) with predictions. In OFL, $n$ demand points arrive in order and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection cost from demand points to their assigned facilities, plus the facility opening cost. We further assume the algorithm is additionally given, for each demand point $x_i$, a natural prediction $f^{\mathrm{pred}}_{x_i}$, which is supposed to be the facility $f^{\mathrm{opt}}_{x_i}$ that serves $x_i$ in the offline optimal solution. Our main result is an $O(\min\{\log \frac{n\eta_\infty}{\mathrm{OPT}}, \log n\})$-competitive algorithm, where $\eta_\infty$ is the maximum prediction error (i.e., the maximum distance between $f^{\mathrm{pred}}_{x_i}$ and $f^{\mathrm{opt}}_{x_i}$). Our algorithm overcomes the fundamental $\Omega(\frac{\log n}{\log\log n})$ lower bound of OFL (without predictions) when $\eta_\infty$ is small, and it still maintains an $O(\log n)$ ratio even when $\eta_\infty$ is unbounded. Furthermore, our theoretical analysis is supported by empirical evaluations of the tradeoff between $\eta_\infty$ and the competitive ratio on various real datasets of different types.

1 INTRODUCTION

We study the online facility location (OFL) problem with predictions. In OFL, given a metric space $(X, d)$, a ground set $D \subseteq X$ of demand points, a ground set $F \subseteq X$ of facilities, and an opening cost function $w : F \to \mathbb{R}_+$, the input is a sequence of demand points $X := (x_1, \ldots, x_n) \subseteq D$, and the online algorithm must irrevocably assign $x_i$ to some open facility, denoted $f_i$, upon its arrival; an open facility cannot be closed. The goal is to minimize the objective
$$\sum_{f \in F'} w(f) + \sum_{x_i \in X} d(x_i, f_i),$$
where $F' \subseteq F$ is the set of open facilities.

Facility location is a fundamental problem in both computer science and operations research, and it has been applied to various domains such as supply chain management (Melo et al., 2009), distribution system design (Klose & Drexl, 2005), and healthcare (Ahmadi-Javid et al., 2017); more applications can be found in the surveys of Drezner & Hamacher (2002) and Laporte et al. (2019). Its online variant, OFL, has also been extensively studied and is well understood. The state of the art is an $O(\frac{\log n}{\log\log n})$-competitive deterministic algorithm proposed by Fotakis (2008), and the same work shows this ratio is tight. We explore whether or not the presence of certain natural predictions could help to bypass the $\Omega(\frac{\log n}{\log\log n})$ barrier. Specifically, we consider the setting where each demand point $x_i$ receives a predicted facility $f^{\mathrm{pred}}_{x_i}$ that is supposed to be $x_i$'s assigned facility in the (offline) optimal solution. This prediction is very natural and could often be obtained easily in practice. For instance, if the dataset is generated from a latent distribution, then predictions may be obtained by analyzing past data. Moreover, predictions could also be obtained from external sources, such as expert advice, or additional features that define correlations among demand points and/or facilities.
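To make the cost model concrete, the following minimal Python sketch (our own illustration, not part of the paper; all names are ours) computes the OFL objective for a given set of open facilities and an assignment of demand points.

```python
from math import dist  # Euclidean distance; any metric d(., .) could be substituted

def ofl_objective(open_facilities, assignment, opening_cost, demands):
    """Total cost = opening cost of all open facilities + connection costs.

    open_facilities: list of opened facility locations (tuples)
    assignment: assignment[i] is the index (into open_facilities) serving demand i
    opening_cost: dict mapping facility location -> opening cost w(f)
    demands: list of demand point locations (tuples)
    """
    opening = sum(opening_cost[f] for f in open_facilities)
    connection = sum(dist(x, open_facilities[assignment[i]])
                     for i, x in enumerate(demands))
    return opening + connection

# Tiny example: two facilities with opening cost 1, three demand points.
facilities = [(0.0, 0.0), (5.0, 5.0)]
w = {f: 1.0 for f in facilities}
demands = [(0.1, 0.0), (4.9, 5.0), (5.0, 4.8)]
assignment = [0, 1, 1]
print(ofl_objective(facilities, assignment, w, demands))
```

In the online setting, each `assignment[i]` must be fixed irrevocably at the moment $x_i$ arrives, and facilities can only be opened, never closed.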
Previously, predictions of a similar flavor were also considered for other online problems, such as online caching (Rohatgi, 2020; Lykouris & Vassilvitskii, 2021), ski rental (Purohit et al., 2018; Gollapudi & Panigrahi, 2019), and online revenue maximization (Medina & Vassilvitskii, 2017).

1.1 OUR RESULTS

Our main result is a near-optimal online algorithm that offers a smooth tradeoff between the prediction error and the competitive ratio. In particular, our algorithm is $O(1)$-competitive when the predictions are perfectly accurate, i.e., $f^{\mathrm{pred}}_{x_i}$ is the nearest neighbor of $x_i$ in the optimal solution for every $1 \le i \le n$. On the other hand, even when the predictions are completely wrong, our algorithm remains $O(\log n)$-competitive. As is standard in the literature on online algorithms with predictions, our algorithm does not rely on knowledge of the prediction error $\eta_\infty$.

Theorem 1.1. There exists an $O(\min\{\log n, \max\{1, \log \frac{n\eta_\infty}{\mathrm{OPT}}\}\})$-competitive algorithm for OFL with predictions, where $\eta_\infty := \max_{1 \le i \le n} d(f^{\mathrm{pred}}_{x_i}, f^{\mathrm{opt}}_{x_i})$ measures the maximum prediction error, and $f^{\mathrm{opt}}_{x_i}$ is the nearest neighbor of $x_i$ in the offline optimal solution OPT.¹

Indeed, we can also interpret our result under the robustness-consistency framework widely used in the literature on online algorithms with predictions (cf. Lykouris & Vassilvitskii (2021)), where robustness and consistency stand for the competitive ratios of the algorithm under arbitrarily bad predictions and perfect predictions, respectively. In this language, our algorithm is $O(1)$-consistent and $O(\log n)$-robust. We also remark that our robustness bound nearly matches the lower bound for OFL without predictions.

One might wonder whether it makes sense to consider a related error measure, $\eta_1 := \sum_{i=1}^{n} d(f^{\mathrm{pred}}_{x_i}, f^{\mathrm{opt}}_{x_i})$, the total prediction error. Our lower bound (Theorem 1.2) shows that a small $\eta_1$ is not helpful, and that the dependence on $\eta_\infty$ in our upper bound is fundamental (see Section F for a more detailed explanation). The lower bound also asserts that our algorithm is nearly tight in its dependence on $\eta_\infty$.

Theorem 1.2. Consider OFL with predictions with a uniform opening cost of 1. For every $\eta_\infty \in (0, 1]$, there exists a class of inputs such that no (randomized) online algorithm is $o\big(\frac{\log\frac{n\eta_\infty}{\mathrm{OPT}}}{\log\log n}\big)$-competitive, even when $\eta_1 = O(1)$.

As suggested by an anonymous reviewer, it might be possible to make the error measure $\eta_\infty$ more robust by taking outliers into account. A more detailed discussion can be found in Section G.

Empirical evaluation. We simulate our algorithm on both Euclidean and graph (shortest-path) datasets, and we measure the tradeoff between $\eta_\infty$ and the empirical competitive ratio by generating random predictions whose $\eta_\infty$ is controlled to be around some target value. We compare with two baselines: the $O(\log n)$-competitive algorithm by Meyerson (2001), which does not use the predictions at all, and a naive algorithm that always trusts the predictions. Our algorithm significantly outperforms Meyerson's algorithm when $\eta_\infty \approx 0$, and is still comparable to Meyerson's algorithm when $\eta_\infty$ is large, where the follow-prediction baseline suffers a huge error. We observe that our lead is even more significant on datasets with non-uniform opening costs, which suggests that the prediction can be very useful in the non-uniform setting.
This phenomenon may seem counter-intuitive, since the error guarantee $\eta_\infty$ only concerns the connection costs and does not take opening costs into consideration (i.e., a small $\eta_\infty$ could be achieved by predictions with huge opening costs), which might suggest that the prediction is less useful in this setting. This therefore demonstrates the capability of our algorithm to exploit the limited information in the predictions even with non-uniform opening costs.

¹When the context is clear, we also use OPT to denote the optimal solution.

Finally, we test our algorithm along with a very simple predictor that does not use any sophisticated ML techniques or specific features of the dataset, and it turns out that such a simple predictor already achieves reasonable performance. This suggests that more carefully engineered predictors are likely to perform even better in practice, and it also justifies our assumption that a good predictor exists.

Comparison to independent work. A recent independent work by Fotakis et al. (2021a) considers OFL with predictions in a similar setting. We highlight the key differences as follows. (i) We allow an arbitrary opening cost $w(\cdot)$, while Fotakis et al. (2021a) only solves the special case of uniform facility location where all opening costs are equal. This is a significant difference since, as we mentioned (and will discuss in Section 1.2), the non-uniform opening cost setting introduces outstanding technical challenges that require novel algorithmic ideas. (ii) For the uniform facility location setting, our competitive ratio is $O(\log\frac{n\eta_\infty}{\mathrm{OPT}}) = O(\log\frac{n\eta_\infty}{w})$, while the ratio in Fotakis et al. (2021a) is $\frac{\log(\mathrm{err}_0)}{\log(\mathrm{err}_1^{-1}\log(\mathrm{err}_0))}$, where $\mathrm{err}_0 = n\eta_\infty/w$, $\mathrm{err}_1 = \eta_1/w$, and $w$ is the uniform opening cost. Our ratio is comparable to theirs when $\eta_1$ is relatively small. However, theirs seems to be better when $\eta_1$ is large, and in particular, Fotakis et al. (2021a) claim that their ratio becomes constant in a certain regime of $\mathrm{err}_1$ relative to $\mathrm{err}_0$. Unfortunately, we think this claim contradicts our lower bound (Theorem 1.2), which essentially asserts that $\eta_1 = O(\eta_\infty)$ is not helpful for the ratio. Moreover, when $\eta_1$ is large, we find that their ratio is actually not well-defined, since the denominator $\log(\mathrm{err}_1^{-1}\log(\mathrm{err}_0))$ is negative. Therefore, we believe there might be some unstated technical assumptions, or technical issues remaining to be fixed, when $\eta_1$ is large.

Another independent work by Panconesi et al. (2021) considers a different model of predictions which is not directly comparable to ours. Upon the arrival of the $i$-th demand point, a set of multiple predictions $S_i$ is provided, each suggesting a list of facilities to be opened. Their algorithm achieves a cost of $O\big(\log(|\bigcup_i S_i|)\big) \cdot \mathrm{OPT}(\bigcup_i S_i)$, where $\mathrm{OPT}(\bigcup_i S_i)$ is the optimal cost using only facilities in $\bigcup_i S_i$, and the algorithm also has a worst-case ratio of $O(\frac{\log n}{\log\log n})$ in case the predictions are inaccurate. Similar to the abovementioned Fotakis et al. (2021a) result, this result only works for uniform opening costs, while ours works for arbitrary opening costs.

1.2 TECHNICAL OVERVIEW

Since we aim for a worst-case (i.e., when the predictions are inaccurate) $O(\log n)$ ratio, our algorithm is based on an $O(\log n)$-competitive algorithm (that does not use predictions) by Meyerson (2001), and we design a new procedure to make use of predictions.
In particular, upon the arrival of each demand point, our algorithm first runs the steps of Meyerson's algorithm, which we call the Meyerson step, and then runs a new procedure (proposed in this paper) to open additional facilities near the prediction, which we call the Prediction step. We make use of the following crucial property of Meyerson's algorithm: suppose that, before the first demand point arrives, the algorithm is already provided with an initial facility set $F$ such that for every facility $f^*$ opened in OPT, $F$ contains a facility $f$ within distance $\eta_\infty$ of $f^*$; then Meyerson's algorithm is $O(\log\frac{n\eta_\infty}{\mathrm{OPT}})$-competitive. To this end, our Prediction step aims to open additional facilities so that, for each facility in OPT, the algorithm soon opens a close enough facility.

Uniform opening cost. There is a simple strategy that achieves this goal in the case of uniform opening costs. Whenever the Meyerson step decides to open a facility, we further open a facility at the predicted location. By doing so, we pay twice what the Meyerson algorithm does in order to open the extra facility. As a reward, when the prediction error $\eta_\infty$ is small, the extra facility is good, i.e., close enough to the optimal facility.

Non-uniform opening cost. Unfortunately, this strategy does not extend to the non-uniform case. Specifically, opening a facility exactly at the predicted location could be prohibitively expensive, as it may have a huge opening cost. Instead, one needs to open a facility that is close to the prediction, but attention must be paid to the tradeoff between the opening cost and the proximity to the prediction. The design of the Prediction step for the non-uniform case is the most technical and challenging part of our paper. We start by opening an initial facility that may be far from the prediction, and then we open a series of facilities within balls (centered at the prediction) of geometrically decreasing radius, while we also allow the opening cost of the newly opened facility to double each time. We stop opening facilities once the total opening cost exceeds the cost incurred by the preceding Meyerson step, so as to keep the opening cost bounded. We show that our procedure always opens a correct facility $\hat{f}$ that is $\Theta(\eta_\infty)$ away from the corresponding facility in OPT, and that the total cost incurred until $\hat{f}$ is opened is merely $O(\mathrm{OPT})$. This is guaranteed by the gradually decreasing ball radius used when we open additional facilities (so that the correct facility is not skipped), and by the fact that the total opening cost can be bounded by that of the last opened facility (because of the doubling opening costs). Once we open this $\hat{f}$, subsequent runs of the Meyerson step are $O(\log\frac{n\eta_\infty}{\mathrm{OPT}})$-competitive.

1.3 RELATED WORK

OFL was introduced by Meyerson (2001), where randomized algorithms achieving competitive ratios of $O(1)$ and $O(\log n)$ were obtained in the settings of random arrival order and adversarial arrival order, respectively. Later, Fotakis (2008) gave a lower bound of $\Omega(\frac{\log n}{\log\log n})$ and proposed a deterministic algorithm matching the lower bound. In the same paper, Fotakis also claimed that an improved analysis of Meyerson's algorithm actually yields an $O(\frac{\log n}{\log\log n})$ competitive ratio in the adversarial order setting. Apart from this classical OFL, other variants have been studied as well. Divéki & Imreh (2011) and Bamas et al. (2020b) considered the dynamic variant, in which an open facility can be reassigned.
Another variant, OFL with evolving metrics, was studied by Eisenstat et al. (2014) and Fotakis et al. (2021b).

There is a recent trend of studying online algorithms with predictions, and many classical online problems have been revisited in this framework. Lykouris & Vassilvitskii (2021) considered the online caching problem with predictions and gave a formal definition of consistency and robustness. For the ski-rental problem, Purohit et al. (2018) gave an algorithm with a hyper-parameter to balance consistency and robustness; they also considered non-clairvoyant scheduling on a single machine. Gollapudi & Panigrahi (2019) designed an algorithm using multiple predictors to achieve low prediction error. Recently, Wei & Zhang (2020) showed a tight robustness-consistency tradeoff for the ski-rental problem. For the caching problem, Lykouris & Vassilvitskii (2021) adapted the Marker algorithm to use predictions. Following this work, Rohatgi (2020) provided an improved algorithm that performs better when the predictor is misleading. Bamas et al. (2020b) extended the online primal-dual framework to incorporate predictions and applied it to the online covering problem. Medina & Vassilvitskii (2017) considered the online revenue maximization problem, and Bamas et al. (2020a) studied the online speed scaling problem. Both Antoniadis et al. (2020) and Dütting et al. (2021) considered secretary problems with predictions. Jiang et al. (2021) studied online matching problems with predictions in the constrained adversary framework. Azar et al. (2021) and Lattanzi et al. (2020) considered flow time scheduling and online load balancing, respectively, in settings with error-prone predictions.

2 PRELIMINARIES

Recall that we assume an underlying metric space $(X, d)$. For $S \subseteq X$ and $x \in X$, define $d(x, S) := \min_{y \in S} d(x, y)$. For an integer $t$, let $[t] := \{1, 2, \ldots, t\}$. We normalize the opening cost function $w$ so that the minimum opening cost equals 1, and we round down the opening cost of each facility to the closest power of 2. This only increases the ratio by a factor of 2, which we can afford (since we make no attempt to optimize the constants hidden in the big-$O$). Hence, we assume the range of $w$ is $\{2^{i-1} \mid i \in [L]\}$ for some $L$. Further, we use $G_k$ to denote the set of facilities whose opening cost is at most $2^{k-1}$, i.e., $G_k := \{f \in F \mid w(f) \le 2^{k-1}\}$ for $1 \le k \le L$. Observe that $G_0 = \emptyset$.

Algorithm 1: Prediction-augmented Meyerson
1: initialize $F \leftarrow \emptyset$, $F_P \leftarrow \emptyset$   ($F$, $F_P$ are the set of all open facilities and those opened by PRED)
2: for each arriving demand point $x \in X$ with its associated prediction $f^{\mathrm{pred}}_x$ do
3:   $\mathrm{cost}_M(x) \leftarrow \mathrm{MEY}(x)$
4:   $\mathrm{PRED}(f^{\mathrm{pred}}_x, \mathrm{cost}_M(x))$
5: procedure MEY($x$)
6:   for $k \in [L]$ do
7:     $f_k \leftarrow \arg\min_{f \in F \cup G_k} d(x, f)$
8:     $\delta_k \leftarrow d(x, f_k)$
9:     $p_k \leftarrow (\delta_{k-1} - \delta_k)/2^k$, where $\delta_0 = d(x, F)$
10:  let $s_k \leftarrow \sum_{i=k}^{L} p_i$ for $k \in [L]$ (with $s_{L+1} := 0$)
11:  sample $r \sim \mathrm{Uni}[0, 1]$, and let $i \in [L]$ be the index such that $r \in [s_{i+1}, s_i)$
12:  if such an $i$ exists then
13:    $F \leftarrow F \cup \{f_i\}$   (open a facility at $f_i$)
14:  connect $x$ to its nearest neighbor in $F$
15:  return $d(x, F) + w(f_i)$   (if no such $i$ exists, simply return $d(x, F)$)
16: procedure PRED($f^{\mathrm{pred}}_x$, $q$)
17:  repeat
18:    $r \leftarrow \frac{1}{2}\, d(f^{\mathrm{pred}}_x, F_P)$
19:    $f_{\mathrm{open}} \leftarrow \arg\min_{f :\, d(f^{\mathrm{pred}}_x, f) \le r} w(f)$   (break ties by picking the facility closest to $f^{\mathrm{pred}}_x$)
20:    if $q \ge w(f_{\mathrm{open}})$ then
21:      $F \leftarrow F \cup \{f_{\mathrm{open}}\}$, $F_P \leftarrow F_P \cup \{f_{\mathrm{open}}\}$   (open a facility at $f_{\mathrm{open}}$)
22:      $q \leftarrow q - w(f_{\mathrm{open}})$
23:  until $q < w(f_{\mathrm{open}})$
24:  open a facility at $f_{\mathrm{open}}$ with probability $q / w(f_{\mathrm{open}})$, and update $F$, $F_P$ accordingly

3 OUR ALGORITHM: PREDICTION-AUGMENTED MEYERSON

We prove Theorem 1.1 in this section.
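To complement the pseudocode, here is a compact Python sketch of Algorithm 1. It is an illustrative re-implementation under assumptions of our own (a finite candidate facility set of hashable points, a metric function `d`, and opening costs `w` normalized to have minimum 1); it is not the authors' code, and it favors clarity over efficiency.

```python
import math
import random

def prediction_augmented_meyerson(demands, preds, facilities, w, d, L):
    """Illustrative sketch of Algorithm 1 (not the authors' implementation).

    demands: demand points in arrival order; preds: predicted facility per demand
    facilities: finite candidate facility set; w: dict facility -> opening cost (min 1)
    d: metric function d(a, b); L: number of cost classes (costs in {1, ..., 2^(L-1)})
    """
    F, FP = set(), set()  # all open facilities / those opened by PRED
    dist = lambda p, S: min((d(p, f) for f in S), default=math.inf)

    def mey(x):                                   # Meyerson step: open f_k with prob. p_k
        delta_prev, cost, acc, r = dist(x, F), 0.0, 0.0, random.random()
        for k in range(1, L + 1):
            cands = list(F) + [f for f in facilities if w[f] <= 2 ** (k - 1)]
            if not cands:
                continue
            fk = min(cands, key=lambda f: d(x, f))
            pk = max(delta_prev - d(x, fk), 0.0) / 2 ** k
            if acc <= r < acc + pk and fk not in F:  # equivalent slot sampling
                F.add(fk)
                cost += w[fk]
            acc += pk
            delta_prev = d(x, fk)
        return cost + dist(x, F)                  # opening cost (if any) + connection cost

    def pred(fp, q):                              # Prediction step: spend budget q near fp
        spent = 0.0
        while True:
            radius = 0.5 * dist(fp, FP)           # geometrically shrinking ball around fp
            cands = [f for f in facilities if d(fp, f) <= radius]
            if not cands:
                return spent
            fopen = min(cands, key=lambda f: (w[f], d(fp, f)))  # cheapest; ties -> closest
            if q < w[fopen]:
                break
            F.add(fopen); FP.add(fopen)
            spent, q = spent + w[fopen], q - w[fopen]
        if random.random() < q / w[fopen]:        # open the last candidate with prob. q / w
            F.add(fopen); FP.add(fopen)
            spent += w[fopen]
        return spent

    total = 0.0
    for x, fp in zip(demands, preds):
        q = mey(x)                                # the budget passed to PRED equals MEY's cost
        total += q + pred(fp, q)
    return F, total
```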
Our algorithm, stated in Algorithm 1, consists of two steps: the Meyerson step and the Prediction step. Upon the arrival of each demand point $x$, we first run the Meyerson algorithm (Meyerson, 2001), and let $\mathrm{cost}_M(x)$ be the cost incurred by this Meyerson step. Roughly, for a demand point $x$, the Meyerson algorithm first finds, for each $k \ge 1$, the facility $f_k$ nearest to $x$ among the facilities of opening cost at most $2^{k-1}$ and those that have already been opened, and lets $\delta_k = d(x, f_k)$. Then it opens a random facility from $\{f_k\}_k$, such that $f_k$ is picked with probability $p_k = (\delta_{k-1} - \delta_k)/2^k$. For technical reasons, we do not normalize $\sum_k p_k$, and we simply truncate if $\sum_k p_k > 1$. Next, we pass the cost incurred in the Meyerson step (which is random) as the budget $q$ to the Prediction step, which uses this budget to open a series of facilities through the repeat-until loop. Specifically, in each iteration of this loop, a facility is to be opened within distance $r$ of the prediction, where $r$ is geometrically decreasing, and the exact facility is picked to be the one with the minimum opening cost. The competitive ratio of this algorithm is stated as follows.

Theorem 3.1. Prediction-augmented Meyerson is $O(\min\{\log n, \max\{1, \log\frac{n\eta_\infty}{\mathrm{OPT}}\}\})$-competitive.

Calibrating predictions. Recall that $\eta_\infty = \max_{i \in [n]} d(f^{\mathrm{pred}}_{x_i}, f^{\mathrm{opt}}_{x_i})$ can be unbounded. We show how to calibrate bad predictions so that $\eta_\infty = O(\mathrm{OPT})$. Specifically, when a demand point $x$ arrives, we compute $f^*_x := \arg\min_{f \in F}\{d(x, f) + w(f)\}$, and we calibrate the prediction by resetting $f^{\mathrm{pred}}_x \leftarrow f^*_x$ whenever $d(x, f^{\mathrm{pred}}_x) \ge 2 d(x, f^*_x) + w(f^*_x)$. We show the calibrated predictions satisfy $\eta_\infty = O(\mathrm{OPT})$. Indeed, for every demand point $x$,
$$d(f^*_x, f^{\mathrm{opt}}_x) \le d(x, f^*_x) + d(x, f^{\mathrm{opt}}_x) \le 2 d(x, f^*_x) + w(f^*_x) \le O(\mathrm{OPT}),$$
where the second-to-last inequality follows from optimality (the connection cost $d(x, f^{\mathrm{opt}}_x)$ must be at most the cost of first opening $f^*_x$ and then connecting $x$ to it), and the last inequality follows since $d(x, f^*_x) + w(f^*_x) \le d(x, f^{\mathrm{opt}}_x) + w(f^{\mathrm{opt}}_x) \le \mathrm{OPT}$ by the choice of $f^*_x$. Note that if $\eta_\infty = O(\mathrm{OPT})$ then $\frac{n\eta_\infty}{\mathrm{OPT}} = O(n)$, hence it suffices to prove the single bound $O(\max\{1, \log\frac{n\eta_\infty}{\mathrm{OPT}}\})$ for Theorem 3.1 (i.e., we may ignore the outer min).

Algorithm analysis. Let $F_{\mathrm{opt}}$ be the set of open facilities in the optimal solution. We examine each $f^* \in F_{\mathrm{opt}}$ and its corresponding demand points separately, i.e., the demand points connected to $f^*$ in the optimal solution. We denote this set of demand points by $X(f^*)$.

Definition 3.2 (Open facilities). For every demand point $x$, let $F(x)$ be the set of open facilities right after the arrival of $x$, let $F^-(x)$ be the set of open facilities right before the arrival of $x$, and let $\hat{F}(x) = F(x) \setminus F^-(x)$ be the set of facilities newly opened on the arrival of $x$. Moreover, let $F_M(x), F_P(x)$ be the partition of $F(x)$ into the facilities opened by MEY and PRED, respectively. Let $F^-_M(x), \hat{F}_M(x), F^-_P(x), \hat{F}_P(x)$ be defined similarly.

Definition 3.3 (Costs). Let $\mathrm{cost}_M(x) = w(\hat{F}_M(x)) + d(x, F_M(x) \cup F^-_P(x))$ be the total cost from the MEY step. Let $\mathrm{cost}_P(x) = w(\hat{F}_P(x))$ be the cost from the PRED step. Let $\mathrm{cost}(x) = \mathrm{cost}_M(x) + \mathrm{cost}_P(x)$ be the total cost of $x$.

Recall that after the MEY step, we assign a total budget of $\mathrm{cost}_M(x)$ to the PRED step. That is, the expected cost from the PRED step is upper bounded by the cost from the MEY step. We formalize this intuition in the following lemma.

Lemma 3.4. For each demand point $x \in X$, $\mathbb{E}[\mathrm{cost}(x)] = 2\,\mathbb{E}[\mathrm{cost}_M(x)]$.

Proof. Consider the expected cost from the PRED step. Given arbitrary $F_P$ and $q$, the only randomness in the PRED step is its final coin flip, and it is straightforward to see that its expected opening cost equals $q$.
Notice that our algorithm assigns a total budget of $q = \mathrm{cost}_M(x)$. Thus, $\mathbb{E}[\mathrm{cost}(x)] = \mathbb{E}[\mathrm{cost}_M(x) + \mathrm{cost}_P(x)] = 2\,\mathbb{E}[\mathrm{cost}_M(x)]$.

Therefore, we are left to analyze the total expected cost from the MEY step. The following lemma is the crux of our analysis. Before proving it, we explain how it implies our main theorem.

Lemma 3.5. For every facility $f^* \in F_{\mathrm{opt}}$, we have
$$\sum_{x \in X(f^*)} \mathbb{E}[\mathrm{cost}_M(x)] \le O\Big(\max\Big\{1, \log\frac{n\eta_\infty}{\mathrm{OPT}}\Big\}\Big) \cdot \Big(\sum_{x \in X(f^*)} d(x, f^*) + w(f^*)\Big) + O\Big(\frac{|X(f^*)|}{n}\Big) \cdot \mathrm{OPT}.$$

Proof of Theorem 3.1. Summing the bound of Lemma 3.5 over every $f^* \in F_{\mathrm{opt}}$, we have
$$\sum_{x \in X} \mathbb{E}[\mathrm{cost}(x)] = 2\sum_{x \in X}\mathbb{E}[\mathrm{cost}_M(x)] = 2\sum_{f^* \in F_{\mathrm{opt}}}\sum_{x \in X(f^*)}\mathbb{E}[\mathrm{cost}_M(x)] \le O\Big(\max\Big\{1,\log\frac{n\eta_\infty}{\mathrm{OPT}}\Big\}\Big)\cdot\mathrm{OPT},$$
where the first equality is by Lemma 3.4, the inequality is by Lemma 3.5 together with $\sum_{f^*}|X(f^*)| = n$ and $\sum_{f^*}\big(\sum_{x\in X(f^*)}d(x,f^*) + w(f^*)\big) = \mathrm{OPT}$. This also implies that the ratio is $O(1)$ when $\eta_\infty = 0$.

3.1 PROOF OF LEMMA 3.5

Fix an arbitrary $f^* \in F_{\mathrm{opt}}$. Let $\ell^*$ be the integer such that $w(f^*) = 2^{\ell^*-1}$, or equivalently $f^* \in G_{\ell^*}$. Let $X(f^*) = \{x_1, x_2, \ldots, x_m\}$ be listed according to their arrival order. We remark that there can be other demand points arriving between $x_i$ and $x_{i+1}$, but they must be connected to other facilities in the optimal solution. Let $\ell_i$ be the index $\ell$ such that $d(f^*, F(x_i)) \in [2^{\ell-1}, 2^{\ell})$; $\ell_i$ can be negative, and if $d(f^*, F(x_i)) = 0$ we let $\ell_i = -\infty$. Observe that $d(f^*, F(x_i))$ and $\ell_i$ are non-increasing in $i$. Let $\tau$ be the largest integer $i$ with $d(f^*, F(x_i)) > \min\{7\eta_\infty, 4w(f^*)\}$. Note that $\tau$ is a random variable. We refer to the demand subsequence before $x_\tau$ as the long-distance stage and the subsequence after $x_\tau$ as the short-distance stage. In this section we focus on the case $7\eta_\infty \le 4w(f^*)$; the proof for the other case is very similar and can be found in Section E.

Long-distance stage. We start by bounding the total cost in the long-distance stage. Intuitively, our procedure PRED quickly opens a facility within distance $O(\eta_\infty)$ of $f^*$, since for each demand point $x_i$ the distance between $f^*$ and its prediction $f^{\mathrm{pred}}_{x_i}$ is at most $\eta_\infty$. Formally, we prove the following statements.

Lemma 3.6. $\mathbb{E}\big[\sum_{i<\tau} \mathrm{cost}_M(x_i)\big] \le 2w(f^*)$.

Proof. See Section A.

Lemma 3.7. $\mathbb{E}[\mathrm{cost}_M(x_\tau)] \le 2w(f^*) + 2\sum_{i=1}^{m} d(x_i, f^*)$.

Proof. See Section B.

Short-distance stage. Next, we bound the cost of the demand points that arrive after some facility near $f^*$ has been opened; formally, these are the $x_i$'s with $i > \tau$. The analysis of this part is conceptually similar to a part of the analysis of Meyerson's algorithm. However, ours is more refined in that it uses the already-opened facility near $f^*$ to obtain an improved $O(\log\frac{n\eta_\infty}{\mathrm{OPT}})$ ratio instead of $O(\log n)$. Moreover, Meyerson did not provide the full details of this analysis, and our complete, self-contained proof fills this gap.

By the definition of $\tau$, we have $d(f^*, F(x_i)) \le 7\eta_\infty$ for every $i > \tau$. For an integer $\ell$, let $I_\ell := \{i > \tau \mid \ell_i = \ell\}$. Then for every $t$ such that $2^{t-1} > \min\{7\eta_\infty, 4w(f^*)\}$ we have $I_t = \emptyset$. Let $\ell'$ be the integer such that $\frac{\mathrm{OPT}}{n} \in [2^{\ell'-1}, 2^{\ell'})$, and let $\bar\ell$ be the integer such that $\min\{7\eta_\infty, 4w(f^*)\} \in [2^{\bar\ell-1}, 2^{\bar\ell})$. We partition all demand points with $i > \tau$ into groups $I_{\le \ell'-1}, I_{\ell'}, \ldots, I_{\bar\ell}$ according to $\ell_i$, where $I_{\le \ell'-1} := \bigcup_{\ell \le \ell'-1} I_\ell$.

Lemma 3.8. $\mathbb{E}\big[\sum_{i \in I_{\le \ell'-1}} \mathrm{cost}_M(x_i)\big] \le 2|X(f^*)| \cdot \frac{\mathrm{OPT}}{n}$.

Proof. See Section C.

Lemma 3.9. For every $\ell \in [\ell', \bar\ell]$, $\mathbb{E}\big[\sum_{i \in I_\ell} \mathrm{cost}_M(x_i)\big] \le 18\sum_{i \in I_\ell} d(x_i, f^*) + 32\, w(f^*)$.

Proof. See Section D.

Proof of Lemma 3.5. We conclude Lemma 3.5 by combining Lemmas 3.6, 3.7, 3.8, and 3.9:
$$\sum_{i\in[m]}\mathbb{E}[\mathrm{cost}_M(x_i)] = \mathbb{E}\Big[\sum_{i\in[\tau-1]}\mathrm{cost}_M(x_i)\Big] + \mathbb{E}[\mathrm{cost}_M(x_\tau)] + \mathbb{E}\Big[\sum_{i\in I_{\le\ell'-1}}\mathrm{cost}_M(x_i)\Big] + \sum_{\ell=\ell'}^{\bar\ell}\mathbb{E}\Big[\sum_{i\in I_\ell}\mathrm{cost}_M(x_i)\Big]$$
$$\le 2w(f^*) + \Big(2w(f^*) + 2\sum_{i=1}^{m} d(x_i, f^*)\Big) + 2|X(f^*)|\cdot\frac{\mathrm{OPT}}{n} + \sum_{\ell=\ell'}^{\bar\ell}\Big(18\sum_{i\in I_\ell} d(x_i, f^*) + 32\,w(f^*)\Big)$$
$$\le \big(O(\bar\ell - \ell') + O(1)\big)\cdot\Big(\sum_{i\in[m]} d(x_i, f^*) + w(f^*)\Big) + O\Big(\frac{|X(f^*)|}{n}\Big)\cdot\mathrm{OPT}$$
$$\le O\Big(\max\Big\{1, \log\frac{n\eta_\infty}{\mathrm{OPT}}\Big\}\Big)\cdot\Big(\sum_{i\in[m]} d(x_i, f^*) + w(f^*)\Big) + O\Big(\frac{|X(f^*)|}{n}\Big)\cdot\mathrm{OPT},$$
where the last step uses $2^{\bar\ell} \le 2\min\{7\eta_\infty, 4w(f^*)\}$ and $2^{\ell'} > \mathrm{OPT}/n$, so that $\bar\ell - \ell' = O(\max\{1, \log\frac{n\eta_\infty}{\mathrm{OPT}}\})$.

Table 1: Specifications of datasets

| dataset | type | size | # of dimensions/edges | non-uniform |
|---|---|---|---|---|
| Twitter | Euclidean | 30k | dimension = 2 | no |
| Adult | Euclidean | 32k | dimension = 6 | no |
| US-PG | Graph | 4.9k | edges = 6.6k | no |
| Non-Uni | Euclidean | 4.8k | dimension = 2 | yes |

4 EXPERIMENTS

We validate our online algorithms on various datasets of different types. In particular, we consider three Euclidean datasets, (a) Twitter (Chan et al.), (b) Adult (Dua & Graff, 2017), and (c) Non-Uni (Cebecauer & Buzna, 2018), which consist of numeric features in $\mathbb{R}^d$ with distances measured by $\ell_2$, and one graph dataset, US-PG (Rossi & Ahmed, 2015), which represents the US power grid as a graph, where points are vertices and distances are shortest-path distances. The opening cost is non-uniform in the Non-Uni dataset, while it is uniform in the other three. These datasets have also been used in previous papers studying facility location and related clustering problems (Chierichetti et al., 2017; Chan et al., 2018; Cohen-Addad et al., 2019). A summary of the dataset specifications can be found in Table 1.

Tradeoff between $\eta_\infty$ and the empirical competitive ratio. Our first experiment measures the tradeoff between $\eta_\infty$ and the empirical competitive ratio of our algorithm, and we compare against two extreme baselines: (a) the vanilla Meyerson algorithm that does not use predictions, which we call MEYERSON, and (b) a naive algorithm that always follows the prediction, which we call FOLLOW-PREDICT. Intuitively, a simultaneously consistent and robust online algorithm should perform similarly to the FOLLOW-PREDICT baseline when the prediction is nearly perfect, while remaining comparable to MEYERSON when the prediction is of low quality.

Since it is NP-hard to find the optimal solution to the facility location problem, we run the simple 3-approximate MP algorithm (Mettu & Plaxton, 2000) to find a near-optimal solution $F^*$ for every dataset, and we use this solution as the offline optimal solution (the benchmark for the competitive ratio). Then, for a given $\eta_\infty$ and a demand point $x$, we pick a random facility $f \in F$ such that $\eta_\infty/2 \le d(f, F^*_x) \le \eta_\infty$ as the prediction, where $F^*_x$ is the point in $F^*$ closest to $x$. The empirical competitive ratio is evaluated for both baselines as well as our algorithm on every dataset, subject to various values of $\eta_\infty$. All our experiments are conducted on a laptop with an Intel Core i7 CPU and 16GB memory. Since the algorithms are randomized, we repeat every run 10 times and take the average cost.

In Figure 1 we plot, for every dataset, a line plot for the two baselines and our algorithm, whose x-axis is $\eta_\infty$ and y-axis is the empirical competitive ratio. The scale of the x-axis differs across plots since $\eta_\infty$ depends on the scale of the dataset. From Figure 1, it can be seen that our algorithm performs consistently better than MEYERSON when $\eta_\infty$ is relatively small, and it has comparable performance to MEYERSON when $\eta_\infty$ is so large that the FOLLOW-PREDICT baseline loses control of the competitive ratio. For instance, on the Twitter dataset, our algorithm performs better than MEYERSON when $\eta_\infty \le 2$, and when $\eta_\infty$ becomes larger, our algorithm performs almost the same as MEYERSON while the ratio of the FOLLOW-PREDICT baseline increases rapidly.
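As a concrete illustration of the setup described above, the following sketch shows how predictions with error controlled to lie in $[\eta_\infty/2, \eta_\infty]$ can be generated. This is our own reconstruction of the procedure, not the authors' code; the function name and the fallback for an empty candidate set are assumptions.

```python
import random

def make_predictions(demands, facilities, opt_facilities, d, eta):
    """Synthetic predictions whose error is roughly in [eta/2, eta] (illustrative)."""
    preds = []
    for x in demands:
        fx = min(opt_facilities, key=lambda f: d(x, f))       # F*_x: nearest near-optimal facility
        cands = [f for f in facilities if eta / 2 <= d(f, fx) <= eta]
        preds.append(random.choice(cands) if cands else fx)   # fallback: perfect prediction
    return preds
```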
Furthermore, we observe that our performance lead over MEYERSON is especially significant on the Non-Uni dataset, whose opening costs are non-uniform. This suggests that our algorithm manages to use the predictions effectively in the non-uniform setting, even though the prediction is tricky to use there, since predictions at good locations could have large opening costs. In particular, our algorithm does a good job of finding a nearby facility with the right tradeoff between opening cost and location.

A simple predictor and its empirical performance. We also present a simple predictor that can be constructed easily from a training set, and evaluate its performance on our datasets. We assume the predictor has access to a training set $T$. Initially, the predictor runs the 3-approximate MP algorithm on the training set $T$ to obtain a solution $F^*$. Then, as demand points from the dataset (i.e., the test set) $X$ arrive, the predictor periodically reruns the MP algorithm on the union of $T$ and the already-seen test data, and updates $F^*$. For a demand point $x$, the prediction is defined as the facility in $F^*$ nearest to $x$.

[Figure 1 shows four panels, one per dataset (Twitter, Adult, US-PG, Non-Uni); in each panel the x-axis is the error parameter $\eta_\infty$, the y-axis is the empirical competitive ratio, and the curves correspond to MEYERSON, FOLLOW-PREDICT, and our algorithm.]

Figure 1: The tradeoff between prediction error $\eta_\infty$ and empirical competitive ratio over four datasets: Twitter, Adult, US-PG and Non-Uni.

Table 2: Empirical competitive ratio evaluation for the simple predictor

| dataset | Meyerson | Follow-Predict | Ours |
|---|---|---|---|
| Twitter | 1.70 | 1.69 | 1.57 |
| Adult | 1.55 | 1.57 | 1.49 |
| US-PG | 1.47 | 1.47 | 1.43 |
| Non-Uni | 5.66 | 5.7 | 2.93 |

To evaluate the performance of this simple predictor, we take a random sample of 30% of the points from each dataset as the training set $T$, and take the remaining 70% as the test set $X$. We list the empirical competitive ratios achieved in this setup in Table 2. From the table, we observe that the predictor achieves reasonable accuracy, since the ratio of the FOLLOW-PREDICT baseline is comparable to Meyerson's algorithm. Moreover, when combined with this predictor, our algorithm outperforms both baselines, especially on the Non-Uni dataset, where the improvement is almost a factor of two. This not only shows the effectiveness of the simple predictor, but also the strength of our algorithm. Finally, we emphasize that this simple predictor, even without using any advanced machine learning techniques or domain-specific signals/features of the data points (which are commonly used when designing predictors), already achieves reasonable performance. Hence, we expect an even better result if a carefully engineered predictor is employed.

ACKNOWLEDGMENTS

This work is supported by the Science and Technology Innovation 2030 Major Project "New Generation of Artificial Intelligence" (No. 2018AAA0100903), the Program for Innovative Research Team of Shanghai University of Finance and Economics (IRTSHUFE), and the Fundamental Research Funds for the Central Universities. Shaofeng H.-C. Jiang is supported in part by the Ministry of Science and Technology of China (No. 2021YFA1000900). Zhihao Gavin Tang is partially supported by Huawei Theory Lab. We thank all anonymous reviewers for their insightful comments.
REFERENCES

Amir Ahmadi-Javid, Pardis Seyedi, and Siddhartha S. Syam. A survey of healthcare facility location. Comput. Oper. Res., 79:223-263, 2017.
Antonios Antoniadis, Themis Gouleakis, Pieter Kleer, and Pavel Kolev. Secretary and online matching problems with machine learned advice. In NeurIPS, 2020.
Yossi Azar, Stefano Leonardi, and Noam Touitou. Flow time scheduling with uncertain processing time. In STOC, pp. 1070-1080, 2021.
Etienne Bamas, Andreas Maggiori, Lars Rohwedder, and Ola Svensson. Learning augmented energy minimization via speed scaling. In NeurIPS, 2020a.
Etienne Bamas, Andreas Maggiori, and Ola Svensson. The primal-dual method for learning augmented algorithms. In NeurIPS, 2020b.
Matej Cebecauer and Ľuboš Buzna. Large-scale test data set for location problems. Data in Brief, 17:267-274, 2018.
T.-H. Hubert Chan, Arnaud Guerqin, and Mauro Sozio. Fully dynamic k-center clustering. In WWW, pp. 579-587. ACM, 2018.
T.-H. Hubert Chan, A. Guerqin, and M. Sozio. Twitter data set. URL https://github.com/fe6Bc5R4JvLkFkSeExHM/k-center.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In NIPS, pp. 5029-5037, 2017.
Vincent Cohen-Addad, Niklas Hjuler, Nikos Parotsidis, David Saulpic, and Chris Schwiegelshohn. Fully dynamic consistent facility location. In NeurIPS, pp. 3250-3260, 2019.
Gabriella Divéki and Csanád Imreh. Online facility location with facility movements. Central Eur. J. Oper. Res., 19(2):191-200, 2011.
Zvi Drezner and Horst W. Hamacher. Facility Location: Applications and Theory. Springer, 2002.
Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL https://archive.ics.uci.edu/ml/datasets/adult.
Paul Dütting, Silvio Lattanzi, Renato Paes Leme, and Sergei Vassilvitskii. Secretaries with advice. In EC, pp. 409-429, 2021.
David Eisenstat, Claire Mathieu, and Nicolas Schabanel. Facility location in evolving metrics. In ICALP, volume 8573, pp. 459-470, 2014.
Dimitris Fotakis. On the competitive ratio for online facility location. Algorithmica, 50(1):1-57, 2008.
Dimitris Fotakis, Evangelia Gergatsouli, Themis Gouleakis, and Nikolas Patris. Learning augmented online facility location. CoRR, abs/2107.08277, 2021a.
Dimitris Fotakis, Loukas Kavouras, and Lydia Zakynthinou. Online facility location in evolving metrics. Algorithms, 14(3):73, 2021b.
Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In ICML, volume 97 of Proceedings of Machine Learning Research, pp. 2319-2327. PMLR, 2019.
Zhihao Jiang, Pinyan Lu, Zhihao Gavin Tang, and Yuhao Zhang. Online selection problems against constrained adversary. In ICML, volume 139 of Proceedings of Machine Learning Research, pp. 5002-5012. PMLR, 2021.
Andreas Klose and Andreas Drexl. Facility location models for distribution system design. Eur. J. Oper. Res., 162(1):4-29, 2005.
Gilbert Laporte, Stefan Nickel, and Francisco Saldanha da Gama. Location Science. Springer, 2019.
Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In SODA, pp. 1859-1877, 2020.
Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. J. ACM, 68(4):24:1-24:25, 2021.
Andrés Muñoz Medina and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. In NIPS, pp. 1858-1866, 2017.
M. Teresa Melo, Stefan Nickel, and Francisco Saldanha-da-Gama. Facility location and supply chain management: A review. Eur. J. Oper. Res., 196(2):401-412, 2009.
Ramgopal R. Mettu and C. Greg Plaxton. The online median problem. In FOCS, pp. 339-348. IEEE Computer Society, 2000.
Adam Meyerson. Online facility location. In FOCS, pp. 426-431. IEEE Computer Society, 2001.
Alessandro Panconesi, Flavio Chierichetti, Giuseppe Re, Matteo Almanza, and Silvio Lattanzi. Online facility location with multiple advice. In NeurIPS, 2021.
Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In NeurIPS, pp. 9684-9693, 2018.
Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In SODA, pp. 1834-1845. SIAM, 2020.
Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI, 2015. URL http://networkrepository.com.
Alexander Wei and Fred Zhang. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. In NeurIPS, 2020.

A PROOF OF LEMMA 3.6

Lemma A.1 (Restatement of Lemma 3.6). $\mathbb{E}\big[\sum_{i<\tau}\mathrm{cost}_M(x_i)\big] \le 2w(f^*)$.

Proof. Let $G = \{g_1, g_2, \ldots, g_t\}$ be the facilities opened by the PRED steps on the arrivals of all $\{x_i\}_{i<\tau}$, listed in their opening order. Observe that the budget assigned to each such PRED step is $\mathrm{cost}_M(x_i)$, and the expected opening cost of each PRED step equals its budget. By the definition of $G$, we have $\mathbb{E}\big[\sum_{i<\tau}\mathrm{cost}_M(x_i)\big] = \mathbb{E}\big[\sum_{k\in[t]} w(g_k)\big]$.

Next, we prove that the opening costs of the $g_k$'s are strictly increasing. For each $k \in [t-1]$, suppose $g_k$ is opened on the arrival of demand $x$ and $g_{k+1}$ is opened on the arrival of demand $y$. Let $f := \arg\min_{f': d(f^{\mathrm{pred}}_y, f') \le \frac{1}{2} d(f^{\mathrm{pred}}_y, g_k)} w(f')$. We must have $w(g_{k+1}) \ge w(f)$, since $g_{k+1}$ minimizes the opening cost over the ball of radius $\frac{1}{2}d(f^{\mathrm{pred}}_y, F_P)$ around $f^{\mathrm{pred}}_y$, and $d(f^{\mathrm{pred}}_y, F_P) \le d(f^{\mathrm{pred}}_y, g_k)$ at that moment. Moreover,
$$d(f, f^{\mathrm{pred}}_x) \le d(f, f^{\mathrm{pred}}_y) + d(f^{\mathrm{pred}}_x, f^{\mathrm{pred}}_y) \le \tfrac{1}{2} d(f^{\mathrm{pred}}_y, g_k) + 2\eta_\infty \le \tfrac{1}{2}\big(d(f^{\mathrm{pred}}_x, g_k) + d(f^{\mathrm{pred}}_x, f^{\mathrm{pred}}_y)\big) + 2\eta_\infty \le \tfrac{1}{2} d(f^{\mathrm{pred}}_x, g_k) + 3\eta_\infty < d(f^{\mathrm{pred}}_x, g_k).$$
Here, the second and fourth inequalities use the fact that $d(f^{\mathrm{pred}}_x, f^{\mathrm{pred}}_y) \le d(f^{\mathrm{pred}}_x, f^*) + d(f^*, f^{\mathrm{pred}}_y) \le 2\eta_\infty$, since both $x, y \in X(f^*)$. The last inequality follows from $d(f^{\mathrm{pred}}_x, g_k) \ge d(g_k, f^*) - d(f^{\mathrm{pred}}_x, f^*) > 7\eta_\infty - \eta_\infty = 6\eta_\infty$ by the definition of $\tau$. Consider the moment when we open $g_k$: the fact that we did not open facility $f$, which is strictly closer to $f^{\mathrm{pred}}_x$ (recall ties are broken towards the closest facility), implies that $w(g_k) < w(f)$. Therefore, $w(g_k) < w(f) \le w(g_{k+1})$. Recall that the opening cost of each facility is a power of 2; hence $\sum_{k\in[t]} w(g_k) \le 2\,w(g_t)$.

Finally, we prove that the opening cost $w(g_t)$ is at most $w(f^*)$. Consider the moment when $g_t$ is opened, and let $x$ be the corresponding demand point. Notice that $d(f^*, f^{\mathrm{pred}}_x) \le \eta_\infty < \frac{1}{2} d(f^*, F(x)) \le \frac{1}{2} d(f^*, F_P(x))$ by the definition of $\tau$, so $f^*$ is a candidate in the PRED step. However, we open $g_t$ instead of $f^*$ on Line 20 of our algorithm, so it must be the case that $w(f^*) \ge w(g_t)$. To conclude the proof, $\sum_{k\in[t]} w(g_k) \le 2w(g_t) \le 2w(f^*)$.

B PROOF OF LEMMA 3.7

Lemma B.1 (Restatement of Lemma 3.7). $\mathbb{E}[\mathrm{cost}_M(x_\tau)] \le 2w(f^*) + 2\sum_{i=1}^{m} d(x_i, f^*)$.

Proof. We bound the opening cost and the connection cost separately.

Opening cost. By the algorithm, for every $1 \le i \le m$ (with the $\delta_k$'s defined upon $x_i$'s arrival),
$$\mathbb{E}\big[w(\hat F_M(x_i))\cdot\mathbf{1}\big(w(\hat F_M(x_i)) > w(f^*)\big)\big] \le \sum_{k > \ell^*}\frac{\delta_{k-1}-\delta_k}{2^k}\cdot 2^{k-1} \le \frac{\delta_{\ell^*}}{2} \le d(x_i, f^*),$$
where the last step uses $\delta_{\ell^*} \le d(x_i, f^*)$ because $f^* \in G_{\ell^*}$. Hence, using the fact that the Meyerson step opens at most one facility per demand,
$$\mathbb{E}[w(\hat F_M(x_\tau))] \le w(f^*) + \mathbb{E}\Big[\sum_{i=1}^{m} w(\hat F_M(x_i))\cdot\mathbf{1}\big(w(\hat F_M(x_i)) > w(f^*)\big)\Big] \le w(f^*) + \sum_{i=1}^{m} d(x_i, f^*).$$
This finishes the analysis of the opening cost.

Connection cost. We claim that $d(x_\tau, F(x_\tau)) \le w(f^*) + d(x_\tau, f^*)$ (with probability 1).
Let $\delta_0, \delta_1, \ldots, \delta_L$ be defined as in our algorithm upon $x_\tau$'s arrival. We may assume $\delta_0 = d(x_\tau, F^-(x_\tau)) > w(f^*) + d(x_\tau, f^*)$, as otherwise the claim holds immediately. Let $k^* := \max\{k : \delta_k > w(f^*) + d(x_\tau, f^*)\}$. Then
$$\Pr\big[d(x_\tau, F(x_\tau)) \le w(f^*) + d(x_\tau, f^*)\big] \ge \min\Big\{\frac{\delta_{k^*} - \delta_{\ell^*}}{2^{\ell^*}}, 1\Big\} = 1.$$
Therefore, $\mathbb{E}[\mathrm{cost}_M(x_\tau)] = \mathbb{E}[w(\hat F_M(x_\tau))] + \mathbb{E}[d(x_\tau, F(x_\tau))] \le 2w(f^*) + 2\sum_{i=1}^{m} d(x_i, f^*)$, which concludes the proof.

C PROOF OF LEMMA 3.8

Lemma C.1 (Restatement of Lemma 3.8). $\mathbb{E}\big[\sum_{i\in I_{\le\ell'-1}}\mathrm{cost}_M(x_i)\big] \le 2|X(f^*)|\cdot\frac{\mathrm{OPT}}{n}$.

Proof. For $i \in I_{\le\ell'-1}$, the connection cost satisfies $d(x_i, F(x_i)) \le d(x_i, F^-(x_i)) \le 2^{\ell'-1} \le \frac{\mathrm{OPT}}{n}$. Let $\delta_0, \delta_1, \ldots, \delta_L$ and $f_1, \ldots, f_L$ be defined as in our algorithm upon $x_i$'s arrival. Then the expected opening cost of the MEY step is
$$\mathbb{E}[w(\hat F_M(x_i))] = \sum_{k\in[L]}\Pr[\hat F_M(x_i) = \{f_k\}]\,w(f_k) \le \sum_{k\in[L]}\min\Big(\frac{\delta_{k-1}-\delta_k}{2^k}, 1\Big)\cdot 2^{k-1} \le \frac{\delta_0}{2} \le \frac{\mathrm{OPT}}{n}.$$
Therefore,
$$\mathbb{E}\Big[\sum_{i\in I_{\le\ell'-1}}\mathrm{cost}_M(x_i)\Big] \le \sum_{i\in I_{\le\ell'-1}}\Big(\mathbb{E}[d(x_i, F(x_i))] + \mathbb{E}[w(\hat F_M(x_i))]\Big) \le 2\,|I_{\le\ell'-1}|\cdot\frac{\mathrm{OPT}}{n} \le 2|X(f^*)|\cdot\frac{\mathrm{OPT}}{n}.$$

D PROOF OF LEMMA 3.9

Lemma D.1 (Restatement of Lemma 3.9). For every $\ell \in [\ell', \bar\ell]$, $\mathbb{E}\big[\sum_{i\in I_\ell}\mathrm{cost}_M(x_i)\big] \le 18\sum_{i\in I_\ell} d(x_i, f^*) + 32\,w(f^*)$.

Proof. Recall that we relabeled and restricted the indices of points in $X$ to $X(f^*) = \{x_1, \ldots, x_m\}$. However, in this proof we also need to refer to the other points of the original data set (those not in $X(f^*)$). Hence, to avoid confusion, we write $X = \{y_1, \ldots, y_n\}$ (so $y_1, \ldots, y_n$ are all demand points in arrival order), and we define $\sigma : [m] \to [n]$ mapping each $x_j \in X(f^*)$ to its identity $y_i \in X$, i.e., $\sigma(j) = i$.

For $i \in [n]$, let $z^M_i := \hat F_M(y_i)$ if $\hat F_M(y_i) \ne \emptyset$ and $z^M_i := \text{none}$ otherwise, and define $z^P_i$ similarly. Since the MEY step opens at most one facility, $z^M_i$ is either a singleton or none. Finally, define $z_i := (z^M_i, z^P_i)$.

Define $\tau_\ell := \min\{i \in [n] \mid z^M_i \ne \text{none and } d(f^*, z^M_i) < 2^{\ell-1}\}$. We have
$$\mathbb{E}\Big[\sum_{i\in I_\ell}\big(\mathrm{cost}_M(x_i)-18\,d(x_i,f^*)\big)\Big] \le \mathbb{E}\Big[\sum_{i\in[\tau_\ell]}\big(\mathrm{cost}_M(y_i)-18\,d(y_i,f^*)\big)\mathbf{1}(i\in\sigma(I_\ell))\Big], \qquad (1)$$
since every $i\in\sigma(I_\ell)$ satisfies $i\le\tau_\ell$. It therefore suffices to bound the right-hand side of (1) by $32\,w(f^*)$. We prove a stronger statement by backward induction on $j$, with the following induction hypothesis: for every $j\in[n]$,
$$\mathbb{E}\Big[\sum_{i=j}^{\tau_\ell}\big(\mathrm{cost}_M(y_i)-18\,d(y_i,f^*)\big)\mathbf{1}(i\in\sigma(I_\ell))\,\Big|\,E_{j-1}\Big] \le 32\,w(f^*),$$
where $E_j := (z_1,\ldots,z_j)$. Applying this with $j=1$ bounds the right-hand side of (1) and hence proves the lemma.

Base case. The base case is $j = n$. Since we condition on $E_{j-1}$, the variables $\delta_0,\ldots,\delta_L$ in the MEY step of $y_j$, as well as whether or not $j\in\sigma(I_\ell)$, are fixed. Hence, for any $j$,
$$\mathbb{E}[\mathrm{cost}_M(y_j)\mid E_{j-1}] \le d(y_j, F^-(y_j)) + \sum_{i}\frac{\delta_{i-1}-\delta_i}{2^i}\cdot 2^i \le 2\,d(y_j, F^-(y_j)).$$
If $n\notin\sigma(I_\ell)$, the left-hand side of the hypothesis is 0; otherwise,
$$\mathbb{E}\big[(\mathrm{cost}_M(y_n)-18\,d(y_n,f^*))\mathbf{1}(n\in\sigma(I_\ell))\mid E_{n-1}\big] \le 2d(y_n, F^-(y_n)) - 2d(y_n,f^*) \le 2\,d(F^-(y_n), f^*) \le 2^{\ell+1} \le 2^{\bar\ell+1} \le 32\,w(f^*).$$

Inductive step. Assume the hypothesis holds for $j+1, j+2, \ldots, n$; we prove it for $j$ by a case analysis.

Case 1: $j\notin\sigma(I_\ell)$. The $i=j$ term vanishes. Conditioning on the realization of $z_j$ and applying the induction hypothesis at $j+1$ for each realization yields the bound $32\,w(f^*)$.

Case 2: $j\in\sigma(I_\ell)$ and $8d(y_j, f^*) > d(f^*, F^-(y_j))$. We have $d(y_j, F^-(y_j)) \le d(y_j, f^*) + d(f^*, F^-(y_j)) < 9\,d(y_j, f^*)$. Hence $\mathbb{E}[\mathrm{cost}_M(y_j) - 18\,d(y_j,f^*)\mid E_{j-1}] \le 0$, and the hypothesis follows by the same argument as in Case 1.

Case 3: $j\in\sigma(I_\ell)$ and $8d(y_j, f^*) \le d(f^*, F^-(y_j))$. We have $\delta_{\ell^*} \le d(y_j, f^*) \le \frac{1}{8}\,d(f^*, F^-(y_j)) \le 2^{\ell-3}$. Let $D := \{f : d(f^*, f) \ge 2^{\ell-1}\} \cup \{\text{none}\}$, and let $k^* := \min\{k \mid \delta_k \le 2^{\ell-2}\}$. Then for every $i \ge k^*$, $d(f_i, f^*) \le d(y_j, f_i) + d(y_j, f^*) < 2^{\ell-1}$.
Hence
$$\Pr[\tau_\ell \le j \mid E_{j-1}] \ge \Pr[z^M_j \notin D \mid E_{j-1}] \ge \Pr[f_i \text{ is opened for some } i \ge k^* \mid E_{j-1}] \ge \min\Big\{\frac{\delta_{k^*-1}-\delta_{\ell^*}}{2^{\ell^*}}, 1\Big\}$$
$$\ge \min\Big\{\frac{2^{\ell-2} - 2^{\ell-3}}{2^{\ell^*}}, 1\Big\} = \min\Big\{\frac{2^{\ell-3}}{2^{\ell^*}}, 1\Big\} \ge \frac{\delta_0}{16\,w(f^*)} \ge \frac{\mathbb{E}[\mathrm{cost}_M(y_j)\mid E_{j-1}]}{32\,w(f^*)},$$
where the second-to-last inequality follows from $\delta_0 = d(y_j, F^-(y_j)) \le 2^{\ell}$ and $w(f^*) = 2^{\ell^*-1}$. Now, conditioning on the realization of $z_j$ and applying the induction hypothesis at $j+1$, the contribution of the terms with $i>j$ is at most $\Pr[z^M_j\in D\mid E_{j-1}]\cdot 32\,w(f^*)$ (it is zero when $z^M_j\notin D$, since then $\tau_\ell\le j$). Together with the $i=j$ term being at most $\mathbb{E}[\mathrm{cost}_M(y_j)\mid E_{j-1}] \le \Pr[z^M_j\notin D\mid E_{j-1}]\cdot 32\,w(f^*)$ from the display above, we obtain
$$\mathbb{E}\Big[\sum_{i=j}^{\tau_\ell}\big(\mathrm{cost}_M(y_i)-18\,d(y_i,f^*)\big)\mathbf{1}(i\in\sigma(I_\ell))\,\Big|\,E_{j-1}\Big] \le \Pr[z^M_j\notin D]\cdot 32\,w(f^*) + \Pr[z^M_j\in D]\cdot 32\,w(f^*) = 32\,w(f^*),$$
completing the induction and hence the proof.

E PROOF OF LEMMA 3.5: WHEN $7\eta_\infty > 4w(f^*)$

The proof is mostly the same as in the other case, which we proved in Section 3.1, so we only highlight the key differences. We use the same definitions of the parameters $\tau$, $\ell'$, and $\bar\ell$. It can be verified that Lemmas 3.7, 3.8 and 3.9 still hold and their proofs from Section 3.1 still work. However, Lemma 3.6 relies on $7\eta_\infty \le 4w(f^*)$, which fails in the current case. We provide Lemma E.2 as a replacement for Lemma 3.6; it offers a slightly different bound, but it still suffices for Lemma 3.5.

Lemma E.1. For every $i < \tau$, $d(x_i, f^*) \ge w(f^*)$.

Proof. We prove the statement by contradiction. Suppose $d(x_i, f^*) < w(f^*)$. Let $\delta_0, \delta_1, \ldots, \delta_L$ be defined as in our algorithm upon $x_i$'s arrival. Then we have
$$\delta_0 = d(x_i, F^-(x_i)) \ge d(f^*, F^-(x_i)) - d(f^*, x_i) \ge d(f^*, F^-(x_i)) - w(f^*) > 3w(f^*),$$
where the last inequality follows from the definition of $\tau$, namely $d(f^*, F^-(x_i)) > 4w(f^*)$. Let $k^* = \min\{k \mid \delta_k \le 3w(f^*)\}$; we have $k^* \ge 1$. We prove that a facility within distance $3w(f^*)$ of $x_i$ must be opened at this step:
$$\Pr[d(x_i, F(x_i)) \le 3w(f^*)] \ge \Pr[f_j \text{ is opened for some } j \ge k^*] \ge \min\Big\{\frac{\delta_{k^*-1}-\delta_{\ell^*}}{2^{\ell^*}}, 1\Big\} \ge \min\Big\{\frac{3w(f^*) - w(f^*)}{2^{\ell^*}}, 1\Big\} = 1,$$
where the last inequality uses $\delta_{\ell^*} \le d(x_i, f^*) \le w(f^*)$ and $2^{\ell^*} = 2w(f^*)$. Consequently, $d(f^*, F(x_i)) \le d(x_i, F(x_i)) + d(x_i, f^*) < 4w(f^*)$, which contradicts the definition of $\tau$.

Lemma E.2. $\mathbb{E}\big[\sum_{i\in[\tau]}\mathrm{cost}(x_i)\big] \le 6\,\mathbb{E}\big[\sum_{i\in[\tau]} d(x_i, f^*)\big] + 4\,w(f^*)$.

Proof. On the arrival of $x_i$ for each $i \in [\tau]$, let $\delta_0, \ldots, \delta_L$ be defined as in our algorithm. We first bound the connection cost of $x_i$. We have $d(x_i, F(x_i)) \le \max\{\delta_0, d(x_i, \hat F(x_i))\}$, and we prove this value is at most $d(x_i, f^*) + 2w(f^*)$. Suppose $\delta_0 > d(x_i, f^*) + 2w(f^*)$, as otherwise the statement holds. Recall that $\delta_k$ is non-increasing; let $k^* = \min\{k \mid \delta_k \le d(x_i, f^*) + 2w(f^*)\}$. By the definition of $k^*$,
$$\Pr[d(x_i, F(x_i)) \le d(x_i, f^*) + 2w(f^*)] \ge \Pr[f_j \text{ is opened for some } j \ge k^*] \ge \min\Big\{\frac{d(x_i,f^*) + 2w(f^*) - \delta_{\ell^*}}{2^{\ell^*}}, 1\Big\} \ge \min\Big\{\frac{w(f^*)}{2^{\ell^*-1}}, 1\Big\} = 1.$$
To sum up, the connection cost satisfies $d(x_i, F(x_i)) \le d(x_i, f^*) + 2w(f^*)$.

Next, we bound the expected opening cost $\mathbb{E}[w(\hat F(x_i))]$. We have
$$\mathbb{E}[w(\hat F(x_i))] = \sum_{j\in[L]}\Pr[\hat F(x_i) = \{f_j\}]\,w(f_j) \le \sum_{j\in[L]}\min\Big(\frac{\delta_{j-1}-\delta_j}{2^j}, 1\Big)\cdot 2^{j-1} \le \sum_{j\le\ell^*} 2^{j-1} + \sum_{j>\ell^*}\frac{\delta_{j-1}-\delta_j}{2} < 2^{\ell^*} + \frac{1}{2}\delta_{\ell^*} \le 2w(f^*) + d(x_i, f^*).$$
By Lemma E.1, we conclude the proof of the statement:
$$\mathbb{E}\Big[\sum_{i\in[\tau]}\mathrm{cost}(x_i)\Big] \le \mathbb{E}\Big[\sum_{i\in[\tau]}\big(2d(x_i,f^*)+4w(f^*)\big)\Big] \le \mathbb{E}\Big[6\sum_{i<\tau} d(x_i,f^*) + 2d(x_\tau,f^*) + 4w(f^*)\Big] \le 6\,\mathbb{E}\Big[\sum_{i\in[\tau]} d(x_i,f^*)\Big] + 4\,w(f^*).$$

F LOWER BOUND

In the classical online facility location problem, a tight lower bound of $\Omega(\frac{\log n}{\log\log n})$ was established by Fotakis (2008), even for the special case where the facility cost is uniform and the metric space is a binary hierarchically well-separated tree (HST). We extend their construction to the setting with predictions, proving that when the predictions are not precise (i.e., $\eta_\infty > 0$), achieving a competitive ratio of $o\big(\frac{\log\frac{n\eta_\infty}{\mathrm{OPT}}}{\log\log n}\big)$ is impossible.

Theorem F.1. Consider OFL with predictions with a uniform opening cost of 1.
For every $\eta_\infty \in (0, 1]$, there exists a class of inputs such that no (randomized) online algorithm is $o\big(\frac{\log\frac{n\eta_\infty}{\mathrm{OPT}}}{\log\log n}\big)$-competitive, even when $\eta_1 = O(1)$.

Before providing the proof, we give some implications of the theorem. First, we normalize the opening cost to be 1; consequently, the optimal cost is at least 1. Furthermore, we are only interested in the case $\eta_\infty \le 1$. Indeed, the optimal facility for each demand point must be within distance 1, as otherwise we could reduce the cost by opening a new facility at the demand point. Therefore, we can, without loss of generality, study predictions with $\eta_\infty = O(1)$. Without the normalization, our theorem implies an impossibility result of $o\big(\frac{\log\frac{n\eta_\infty}{\mathrm{OPT}}}{\log\log n}\big)$ for online facility location with predictions. Moreover, recall the definition of the total prediction error $\eta_1 = \sum_{i=1}^{n} d(f^{\mathrm{pred}}_{x_i}, f^{\mathrm{opt}}_{x_i})$, which is at least $\eta_\infty = \max_i d(f^{\mathrm{pred}}_{x_i}, f^{\mathrm{opt}}_{x_i})$. Our theorem states that even when the total error $\eta_1$ is only a constant times larger than the opening cost of 1, there is no hope for a good algorithm. Note that our bound holds for any constant value of $\eta_\infty$. As an implication, our construction rules out the possibility of an $o\big(\frac{\log\frac{n\eta_1}{\mathrm{OPT}}}{\log\log n}\big)$-competitive algorithm.

Proof. By Yao's principle, the expected cost of a randomized algorithm on the worst-case input is no better than the expected cost of the best deterministic algorithm against a worst-case probability distribution over inputs. For each $\eta_\infty$, we construct a family of randomized instances so that any deterministic algorithm has expected cost at least $\Omega\big(\frac{\log\frac{n\eta_\infty}{\mathrm{OPT}}}{\log\log n}\big)$ times the optimum.

We first import the construction for the classical online facility location problem by Fotakis (2008). Consider a hierarchically well-separated perfect binary tree. Let the distance between the root and its children be $D$. For every vertex $i$ of the tree, the distance between $i$ and its children is $\frac{1}{m}$ times the distance between $i$ and its parent; that is, the distance between a height-$i$ vertex and its children (of height $i+1$) is $\frac{D}{m^i}$. Let the height of the tree be $h$. The demand sequence consists of $h+1$ phases. For each $0 \le i \le h$, in the $(i+1)$-th phase, $m^i$ demand points arrive consecutively at some vertex of height $i$. For $i \ge 1$, the identity of the height-$i$ vertex is chosen independently and uniformly between the two children of the $i$-th phase vertex.

Consider the solution that opens only one facility, at the leaf node that is the $(h+1)$-th phase demand vertex, and assigns all demand points to this leaf. The cost of this solution equals
$$1 + \sum_{i=0}^{h-1} m^i \cdot \sum_{j=i}^{h-1}\frac{D}{m^j} \le 1 + \sum_{i=0}^{h-1} m^i\cdot\frac{D}{m^i}\cdot\frac{m}{m-1} = 1 + hD\cdot\frac{m}{m-1}.$$
This value serves as an upper bound on the optimal cost, i.e., $\mathrm{OPT} \le 1 + hD\cdot\frac{m}{m-1}$.

Next, we describe the predictions associated with the demand points. Intuitively, we try our best to hide the identity of the leaf node of the last phase. We denote the leaf node by $f^*$; it is also the facility opened in the solution described above. The prediction is produced according to the following rule: when the distance between the current demand point and $f^*$ is less than $\eta_\infty$, the prediction is the demand vertex itself; otherwise, the prediction is the vertex on the path from the current demand point to $f^*$ whose distance to $f^*$ equals $\eta_\infty$.

We prove a lower bound on the expected cost of any deterministic algorithm. We first ignore the first few phases, until the subtree of the current demand vertex has diameter less than $\eta_\infty$; let this be the $h'$-th phase.
We now focus on the cost induced after the $h'$-th phase, and note that the predictions are useless there, as they coincide with the demand points. When the $m^i$ demand points at the vertex of height $i$ arrive, we consider the following two cases.

Case 1: No facility has been opened in the current subtree. Then we either open a new facility, incurring an opening cost of 1, or assign the current demands to facilities outside the subtree, incurring a connection cost of at least $\frac{D}{m^{i-1}}$ per demand, i.e., at least $m^i\cdot\frac{D}{m^{i-1}} = Dm$ in total. So the cost incurred at this vertex is at least $\min\{1, Dm\}$.

Case 2: At least one facility has been opened in the current subtree. Since the next vertex is chosen uniformly at random between the two children, the expected number of facilities that do not enter the next phase's subtree is at least $\frac{1}{2}$; we call such facilities abandoned. The expected number of abandoned facilities is thus at least $\frac{1}{2}$ times the number of occurrences of Case 2, so the expected cost per occurrence of Case 2 is at least $\frac{1}{2}$.

We carefully choose $D, m, h$ such that (1) $mD = 1$; (2) the total number of demand points is $n$, i.e., $\sum_{i=0}^{h} m^i = n$; and (3) $\frac{D}{m^{h'}} = \eta_\infty$. These conditions give $(h+1)\log m \approx \log n$ and $(h'+1)\log m \approx \log\frac{1}{\eta_\infty}$, and consequently $h - h' \approx \frac{\log(n\eta_\infty)}{\log m}$. To sum up, a lower bound on the expected cost of any deterministic algorithm is
$$(h - h')\cdot\min\Big\{\frac{1}{2}, mD\Big\} = \Omega\Big(\frac{\log(n\eta_\infty)}{\log m}\Big),$$
while the optimal cost is at most $1 + hD\cdot\frac{m}{m-1} = O\big(\frac{\log n}{m\log m}\big)$. Setting $m = \frac{\log n}{\log\log n}$, the optimal cost is $O(1)$, and we obtain the claimed lower bound of $\Omega\big(\frac{\log(n\eta_\infty)}{\log\log n}\big)$. Finally, it is straightforward to verify that the total prediction error satisfies $\eta_1 = O(\mathrm{OPT}) = O(1)$.

G t-OUTLIER SETTING

Observe that our main error parameter $\eta_\infty$ can be very sensitive to a single outlier prediction with a large error. To make the error parameter behave more smoothly, we introduce an integer parameter $1 \le t \le n$ and define $\eta^{(t)}$ as the $t$-th largest prediction error, i.e., the maximum prediction error after excluding the $t-1$ highest-error outliers. Define $\eta_1^{(t)}$ similarly. In the following, stated as Theorem G.1, we argue that our algorithm, without any modification (but with an improved analysis), actually achieves a ratio of $O(\log(1+t) + \log\frac{n\eta^{(t)}}{\mathrm{OPT}})$ for every $t$. This also means the ratio is $\min_{1\le t\le n} O(\log(1+t) + \log\frac{n\eta^{(t)}}{\mathrm{OPT}})$, which is clearly a generalization of Theorem 1.1, since $\eta^{(t)} = \eta_\infty$ when $t = 1$. Moreover, we note that our lower bound, Theorem 1.2, still rules out replacing $\eta^{(t)}$ with $\eta_1^{(t)}$ in the abovementioned ratio, since one can apply Theorem 1.2 with $t = 1$. However, it is an interesting open question whether an algorithm with a ratio like $O(\log t + \frac{\eta_1^{(t)}}{\mathrm{OPT}})$ exists for some specific choice of $t$.

Theorem G.1. For every integer $1 \le t \le n$, Algorithm 1 is $O\big(\log(t+1) + \min\{\log n, \max\{1, \log\frac{n\eta^{(t)}}{\mathrm{OPT}}\}\}\big)$-competitive.

Proof sketch. Let $\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_{t-1}\}$ be the indices of the $t-1$ largest prediction errors $d(f^{\mathrm{pred}}_{x_\gamma}, f^{\mathrm{opt}}_{x_\gamma})$. The high-level idea is to break the dataset $X$ into a good part $X_G := \{x_i : i \in [n]\setminus\Gamma\}$ and a bad part $X_B := \{x_i : i \in \Gamma\}$ according to whether or not the prediction is an outlier, and to argue that the expected cost of the algorithm on $X_G$ is at most $O(\max\{1, \log\frac{n\eta^{(t)}}{\mathrm{OPT}}\})$ times the corresponding cost in OPT, while the expected cost on $X_B$ is at most $O(\log(t+1))$ times its cost in OPT. For the good part $X_G$, we let $X'(f^*) := X(f^*)\setminus\Gamma$ and apply an analysis similar to the proof of Lemma 3.5 to show that
$$\sum_{x\in X'(f^*)}\mathbb{E}[\mathrm{cost}(x)] \le O\Big(\max\Big\{1, \log\frac{n\eta^{(t)}}{\mathrm{OPT}}\Big\}\Big)\cdot\Big(\sum_{x\in X'(f^*)} d(x, f^*) + w(f^*)\Big) + O\Big(\frac{|X'(f^*)|}{n}\Big)\cdot\mathrm{OPT}.$$
For the bad part $X_B$, we assume all predictions for points in $X_B$ have infinite error (which can only increase the cost of the algorithm), and let $X^+(f^*) := X(f^*)\cap\Gamma$.
Then this lies in the case of Section E, where $7\eta_\infty > 4w(f^*)$, which essentially means the prediction is almost useless and the whole proof reverts to the analysis of the plain Meyerson algorithm. Combining the arguments from Section E with the analysis of the short-distance stage, we have
$$\sum_{x\in X^+(f^*)}\mathbb{E}[\mathrm{cost}(x)] \le O(\log(t+1))\cdot\Big(\sum_{x\in X^+(f^*)} d(x, f^*) + w(f^*)\Big) + O\Big(\frac{|X^+(f^*)|}{n}\Big)\cdot\mathrm{OPT}.$$
We finish the proof by combining the bounds for the two parts.