Discrete-Smoothness in Online Algorithms with Predictions

Yossi Azar (Tel Aviv University) azar@tau.ac.il
Debmalya Panigrahi (Duke University) debmalya@cs.duke.edu
Noam Touitou (Amazon)* noamtwx@gmail.com

*This paper does not relate to the author's work at Amazon.

Abstract

In recent years, there has been an increasing focus on designing online algorithms with (machine-learned) predictions. The ideal learning-augmented algorithm is comparable to the optimum when given perfect predictions (consistency), to the best online approximation for arbitrary predictions (robustness), and should interpolate between these extremes as a smooth function of the prediction error. In this paper, we quantify these guarantees in terms of a general property that we call discrete-smoothness, and achieve discrete-smooth algorithms for online covering, specifically the facility location and set cover problems. For set cover, our work improves the results of Bamas, Maggiori, and Svensson (2020) by augmenting consistency and robustness with smoothness guarantees. For facility location, our work improves on prior work by Almanza et al. (2021) by generalizing to nonuniform costs and also providing smoothness guarantees by augmenting consistency and robustness.

1 Introduction

The field of learning-augmented online algorithms has gained rapid prominence in recent years. The basic premise is to provide an online algorithm with additional (machine-learned) predictions about the future to help bypass worst-case lower bounds. Since machine-learned predictions can be noisy in general, a key desideratum of the model is that the competitive ratio of the online algorithm should degrade gracefully with prediction error. In particular, the cost of the algorithm should be bounded against that of the predicted solution (called consistency) or that of an online algorithm without predictions (called robustness), and should smoothly interpolate between the two as the prediction error increases (called smoothness). (The terms consistency and robustness were originally coined by Purohit, Svitkina, and Kumar [38].) While robustness and consistency are problem-independent notions, smoothness depends on prediction error, which has been defined in a problem-specific manner. In this paper, we introduce a novel, problem-independent notion of smoothness called discrete-smoothness that applies to any combinatorial problem. As illustrative applications of this new framework, we design discrete-smooth (learning-augmented) algorithms for two classic problems, facility location and set cover, which improve and generalize previous results for these problems due to Almanza et al. (NeurIPS 21 [1]) and Bamas et al. (NeurIPS 20 [11]).

First, we introduce discrete-smoothness. Suppose we are given a problem instance of size 𝑛. Let OPT be a solution for this instance. (The reader may think of OPT as an optimal solution, although our guarantees will hold for any feasible solution.) Let the predicted solution be 𝑆. Ideally, 𝑆 = OPT; in general, OPT comprises two parts: the predicted part OPT|𝑆 := OPT ∩ 𝑆 and the unpredicted part OPT|𝑆̄ := OPT \ 𝑆. On the predicted part OPT|𝑆, the algorithm has a meaningful signal from the prediction, but the noise in the signal is given by the overprediction 𝑠Δ := |𝑆 \ OPT|. Naturally, the competitive ratio of the algorithm on this part will degrade as this noise increases.
On the unpredicted part OPT|𝑆̄, the algorithm does not have any signal from the prediction and cannot hope for a better competitive ratio than that of an online algorithm without prediction. Slightly abusing notation, we use OPT|𝑆, OPT|𝑆̄ to denote both the aforementioned sets of items and their total cost; putting the two together, a learning-augmented algorithm ALG should satisfy

    ALG ≤ 𝑂(𝑓(𝑠Δ)) · OPT|𝑆 + 𝑂(𝑓(𝑛)) · OPT|𝑆̄,    (1)

where 𝑂(𝑓(·)) is the competitive ratio without prediction. We call the property of Equation (1) discrete-smoothness.

Let us first argue that Equation (1) recovers consistency and robustness. Consistency follows from setting 𝑆 = OPT; then, Equation (1) demands a constant approximation to OPT. Similarly, robustness follows from the fact that for any 𝑆, the right-hand side of Equation (1) is at most 𝑂(𝑓(𝑛)) · OPT. Next, we show that the two terms 𝑓(𝑠Δ) and 𝑓(𝑛) in Equation (1) are the best possible. For the first term, consider a prediction 𝑆 comprising the entire instance (of size 𝑛); in this case, we cannot hope for a better than 𝑓(𝑛)-competitive algorithm; thus, 𝑓(𝑠Δ) is necessary in the first term. For the second term, consider an empty prediction 𝑆 = ∅, in which case we again cannot hope for a better than 𝑓(𝑛)-competitive algorithm; thus, 𝑓(𝑛) is necessary in the second term. Note that the asymmetry between these two terms is necessary: specifically, 𝑓(𝑛) cannot be replaced by 𝑓(|OPT \ 𝑆|), since that would imply an 𝑓(|OPT|)-competitive online algorithm when 𝑆 = ∅. This is impossible, e.g., for the set cover problem.

A technical subtlety of the definition of discrete-smoothness (Equation (1)) is that, given a fixed prediction 𝑆, the right-hand side might actually be minimized by a solution OPT that is different from an optimal solution to the problem instance. So, although the solution OPT is intuitively an optimal solution, we require that a discrete-smooth algorithm satisfy Equation (1) for all feasible solutions OPT, and not just optimal solutions.
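To make Equation (1) concrete before stating our results, here is a minimal Python sketch (ours, purely illustrative; the toy instance, the costs, and the stand-in ratio function 𝑓 are hypothetical) that evaluates the right-hand side of the guarantee for a given prediction:

```python
# Illustrative sketch (not from the paper): evaluate the right-hand side of
# Equation (1), with the hidden constants set to 1. Items are named strings;
# `cost` maps each item to its cost; `f` stands in for the no-prediction ratio.
import math

def discrete_smoothness_bound(opt, pred, cost, f):
    predicted_part = opt & pred        # OPT|_S: items of OPT inside S
    unpredicted_part = opt - pred      # OPT|_Sbar: items of OPT outside S
    s_delta = len(pred - opt)          # overprediction |S \ OPT|
    n = len(cost)                      # instance size
    return (f(s_delta) * sum(cost[i] for i in predicted_part)
            + f(n) * sum(cost[i] for i in unpredicted_part))

cost = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 8.0}
opt = {"a", "c"}
f = lambda k: max(1.0, math.log2(k + 2))  # hypothetical stand-in for f

print(discrete_smoothness_bound(opt, opt, cost, f))    # consistency: S = OPT
print(discrete_smoothness_bound(opt, set(), cost, f))  # robustness: S empty
```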
1.1 Our Results

We apply discrete-smoothness to the classic problems of online facility location and set cover. For these problems, we obtain results that improve on prior work. We describe these next.

Online Facility Location with Predictions. In the online facility location problem, we are given offline a metric space (𝑋, 𝛿) of π‘š := |𝑋| points, where each point 𝑣 ∈ 𝑋 has an associated facility opening cost π‘œ_𝑣 ≥ 0. On receiving an online request for a client at some location π‘₯ ∈ 𝑋, the online algorithm must connect the client to an open facility at some location 𝑣 ∈ 𝑋, incurring connection cost 𝛿(π‘₯, 𝑣). At any time, the algorithm is also allowed to open a facility at any location 𝑣 ∈ 𝑋 by incurring the opening cost π‘œ_𝑣. (Note that a client cannot update her connection even if a closer facility is opened later.) The total cost of the algorithm is the sum of opening costs of opened facilities and connection costs of clients. The first result for the online facility location problem is due to Meyerson [33], who obtained a randomized algorithm with a competitive ratio of 𝑂(log 𝑛) for 𝑛 requests. This result was later derandomized [18], and the competitive ratio was slightly improved to 𝑂(log 𝑛 / log log 𝑛) [19], both by Fotakis. This latter bound is asymptotically tight.

More recently, the online facility location problem has been considered in the context of machine-learned predictions (OFLP) by several papers [20, 1, 22]. Of these, the work of Almanza et al. [1] is the closest to our work (the other papers use metric error measures that are incomparable to our results). In [1], the offline input additionally contains a predicted solution of facilities 𝑆 ⊆ 𝑋, where we denote |𝑆| = 𝑠. By restricting the available facilities to the predicted set, they obtained an 𝑂(log 𝑠)-competitive algorithm for uniform facility opening costs, under the condition that OPT ⊆ 𝑆. We improve and generalize the Almanza et al. work by giving a discrete-smooth algorithm for the OFLP problem, i.e., an algorithm ALG that satisfies Equation (1):

Theorem 1.1. There is an algorithm ALG for online (nonuniform) facility location with a predicted solution 𝑆 that satisfies, for every solution OPT,

    ALG ≤ 𝑂(log 𝑠Δ) · OPT|𝑆 + 𝑂(log 𝑛) · OPT|𝑆̄,    (2)

where 𝑠Δ is the number of facilities in 𝑆 \ OPT and 𝑛 is the number of online requests. Here, OPT|𝑆 (resp., OPT|𝑆̄) represents the sum of opening costs of facilities in OPT ∩ 𝑆 (resp., OPT \ 𝑆) and connection costs of all clients connecting to facilities in OPT ∩ 𝑆 (resp., OPT \ 𝑆).

This generalizes and improves the Almanza et al. result in three ways:

- The result is generalized from uniform facility opening costs to arbitrary (nonuniform) costs. In fact, even for the online facility location problem (without prediction), we get an 𝑂(log π‘š)-competitive algorithm for arbitrary (nonuniform) facility opening costs; previously, Almanza et al. only established this for uniform costs.
- The assumption that OPT ⊆ 𝑆, i.e., that the prediction contains the entire solution, is no longer required.
- If OPT ⊆ 𝑆 (i.e., under the assumption of the Almanza et al. result), the competitive ratio improves from 𝑂(log 𝑠) to 𝑂(log 𝑠Δ). That is, the dependence is only on the prediction error and not on the entire prediction.

In some situations, the length of the request sequence 𝑛 can exceed the size of the metric space π‘š. To address this situation, we show that 𝑛 can be replaced by π‘š in the above result:

Theorem 1.2. There is an algorithm ALG for online (nonuniform) facility location with a predicted solution 𝑆 that satisfies, for every solution OPT,

    ALG ≤ 𝑂(log 𝑠Δ) · OPT|𝑆 + 𝑂(log π‘š) · OPT|𝑆̄,    (3)

where π‘š is the number of facilities in the metric space overall.

Online Set Cover with Predictions. In the online set cover problem, we are given offline a universe of elements 𝐸 and a family of π‘š sets π‘ˆ ⊆ 2^𝐸 with nonnegative costs. In each online step, we get a new element 𝑒 ∈ 𝐸. If 𝑒 is not already covered by the current solution, then the algorithm must add a new set from π‘ˆ that contains 𝑒 to its solution. The total cost of the algorithm is the sum of costs of all sets in its solution. Alon et al. [3] gave the first algorithm for the online set cover problem by introducing the online primal-dual method, and obtained a competitive ratio of 𝑂(log π‘š log 𝑛), where 𝑛 denotes the number of requests. They also proved an almost matching lower bound of Ω(log π‘š log 𝑛 / (log log π‘š + log log 𝑛)). Bamas, Maggiori, and Svensson [11] extended their work to online set cover with predictions (OSCP), where the offline input additionally contains a predicted solution of sets 𝑆 ⊆ π‘ˆ. They established consistency and robustness bounds for this problem by adapting the online primal-dual method to use the predicted solution.
The cost of their algorithm is bounded by the minimum of 𝑂(log 𝑛) times the cost of the prediction and 𝑂(log π‘š log 𝑛) times the optimal cost. However, one cannot achieve smoothness through their work without correctly choosing a trust parameter in advance of the input. We obtain a discrete-smooth algorithm for the OSCP problem, thereby giving the first algorithm for OSCP that goes beyond consistency and robustness and achieves a smoothness guarantee:

Theorem 1.3. There is an algorithm ALG for online set cover with a predicted solution 𝑆 that satisfies, for every solution OPT,

    ALG ≤ 𝑂(log 𝑠Δ · log 𝑛) · OPT|𝑆 + 𝑂(log π‘š log 𝑛) · OPT|𝑆̄,    (4)

where 𝑠Δ is the number of sets in 𝑆 \ OPT. Here, OPT|𝑆 (resp., OPT|𝑆̄) represents the sum of costs of sets in OPT ∩ 𝑆 (resp., OPT \ 𝑆).

1.2 Our Techniques: A Framework for Discrete-Smooth Algorithms

At a high level, our framework merges two online algorithms to obtain a discrete-smooth algorithm. The algorithms differ in the guarantees they provide. The first algorithm ALG1 gets a sharper competitive ratio of 𝑂(𝑓(𝑠)), but against the optimal solution restricted to the prediction 𝑆. The second algorithm ALG2 has the standard competitive ratio of 𝑂(𝑓(𝑛)), but against the unconstrained optimum OPT. The main challenge in the combiner algorithm (call it ALG) is to decide how to route online requests to the two algorithms. The natural choice would be to decide this based on whether OPT|𝑆 or OPT|𝑆̄ serves the request in OPT: in the first case, the request should be routed to ALG1, and in the second case, it should be routed to ALG2. But, of course, we do not know OPT, and therefore don't know OPT|𝑆 and OPT|𝑆̄.

Before we describe the combiner strategy, consider the properties that these algorithms need to satisfy. First, consider the subset of requests served by OPT|𝑆. Intuitively, ALG1 should be competitive on these requests, which means that we need a stronger property from ALG1: that its cost on any subset of requests is competitive against the optimal solution for this subset. We call this property subset competitiveness.² Symmetrically, subset competitiveness of ALG2 ensures that it is competitive on the requests served by OPT|𝑆̄. Next, we need a guarantee on the cost of ALG1 on OPT|𝑆̄, and symmetrically, of ALG2 on OPT|𝑆. For this, we first augment ALG1, ALG2 to address the prize-collecting version of the original problem, where each online request can be ignored at a penalty cost. (Note that this is more general than the original problem, where every online request must be served, since the latter can be recovered by setting the penalties to be infinitely large.) Setting the penalties appropriately, we ensure that the total penalty of the requests in OPT|𝑆 is bounded against the cost of ALG1 on those requests (and similarly for OPT|𝑆̄ and ALG2). Finally, we require that the cost of ALG1, ALG2 on any set of requests is bounded against the total penalty of the requests. We call this strengthened competitiveness w.r.t. penalties the Lagrangian property.³ Note that this ensures that the costs of ALG1 on OPT|𝑆̄ and of ALG2 on OPT|𝑆 are respectively bounded.

Now, we give the formal definition of Lagrangian subset-competitiveness that we motivated above. We use ALG(𝑄′|𝑄) to refer to the total cost incurred by ALG when addressing a subset 𝑄′ ⊆ 𝑄 as part of running on an input 𝑄. For any prize-collecting solution SOL for input 𝑄, we separate its total cost into SOL_𝑏(𝑄) (buying cost) and SOL_𝑝(𝑄) (penalty cost).
We formalize the Lagrangian subset-competitiveness property below:

Definition 1.4 (Lagrangian subset-competitive algorithm). Let ALG be a randomized prize-collecting algorithm running on an input 𝑄. For any competitive ratio 𝛽, we say that ALG is Lagrangian 𝛽-subset-competitive if for every subset 𝑄′ ⊆ 𝑄 we have

    E[ALG(𝑄′|𝑄)] ≤ 𝛽 · OPT_𝑏(𝑄′) + 𝑂(1) · OPT_𝑝(𝑄′).    (5)

If, in the equation above, we replace the unconstrained optimum (OPT) by the optimal solution that can only use the prediction 𝑆, we say that ALG is Lagrangian 𝛽-subset-competitive w.r.t. 𝑆.

We now give the combiner algorithm:

Algorithm 1: Smooth merging framework (the combiner algorithm)
1 Let ALG1, ALG2 be two prize-collecting Lagrangian subset-competitive algorithms.
2 Event Function UPONREQUEST(π‘ž)
3   Let 𝛼 be the minimum penalty such that releasing (π‘ž, 𝛼) to ALG1, ALG2 would result in the request being served by either ALG1 or ALG2. (The value of 𝛼 can be determined by a standard "guess-and-double".)
4   Release (π‘ž, 𝛼) to both ALG1 and ALG2. Buy the items bought by ALG1, ALG2 as a result of this step.

The algorithm is simple: for a new online request π‘ž, the framework chooses the minimum penalty 𝛼 which ensures that at least one of the two constituent algorithms ALG1, ALG2 would actually serve π‘ž (instead of paying the penalty). (π‘ž, 𝛼) is then presented as a (prize-collecting) request to both algorithms. (Recall that the combined algorithm is for the non-prize-collecting problem, but the individual algorithms ALG1, ALG2 are for the prize-collecting problem.) At this stage, one of the algorithms serves the request (due to the choice of 𝛼) while the other may choose to pay the penalty. The combiner algorithm now simply buys all items bought by either algorithm.

Finally, we state the guarantees of the combiner algorithm informally. (For a formal description, see Appendix C.)

Theorem 1.5. (Informal) If ALG1, ALG2 are Lagrangian 𝛽-subset-competitive algorithms for 𝛽 = 𝑓(𝑠), 𝑓(𝑛) respectively, then Algorithm 1 satisfies the discrete-smoothness property (Equation (1)).

²Our subset-competitiveness property is similar to [9].
³Our Lagrangian competitiveness is similar to the Lagrangian multiplier preserving property in approximation algorithms for prize-collecting problems, e.g., [37, 26].

Applications of Theorem 1.5: Section 3 and Appendix B give Lagrangian subset-competitive algorithms for facility location, and Section 4 gives a Lagrangian subset-competitive algorithm for set cover. Given these constituent algorithms, we use Theorem 1.5 to prove Theorem 1.1 and Theorem 1.2 for facility location and Theorem 1.3 for set cover. These proofs are given in Appendix D.
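To illustrate the control flow of Algorithm 1, here is a minimal Python sketch (ours; the constituent-algorithm interface, in particular the names `would_serve` and `release`, is a hypothetical abstraction of a prize-collecting algorithm):

```python
# A minimal sketch of the merging framework (Algorithm 1). We assume each
# constituent prize-collecting algorithm exposes: `would_serve(q, alpha)`,
# answering whether it would serve (q, alpha) rather than pay the penalty,
# and `release(q, alpha)`, which feeds it the request and returns the set of
# items it buys as a result.
def upon_request(q, alg1, alg2, eps=1e-6):
    # Guess-and-double: find (within a factor of 2) the minimum penalty alpha
    # at which at least one constituent algorithm serves q.
    alpha = eps
    while not (alg1.would_serve(q, alpha) or alg2.would_serve(q, alpha)):
        alpha *= 2
    # Release (q, alpha) to both; the combiner buys everything either buys,
    # so q is served by at least one of the two.
    return alg1.release(q, alpha) | alg2.release(q, alpha)
```

The doubling search only locates 𝛼 up to a factor of 2; in this sketch we assume such constant-factor slack is absorbed by the 𝑂(·) notation of Theorem 1.5.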
Related Work. There is a growing body of work on online algorithms with predictions from the last few years (see, e.g., the surveys [35, 36]). This model was introduced by Lykouris and Vassilvitskii for the caching problem [32] and has since been studied for a variety of problem classes: rent or buy [27, 25, 21, 41, 5, 39], covering [11], scheduling [27, 41, 10, 28, 34, 30, 8], caching [31, 40, 24, 13], matching [29, 16, 7, 23], graph problems [6, 22, 1, 14, 4, 20, 9], and so on. Prior works on online facility location with predictions either do not consider prediction error [1] or use continuous notions of error [22, 20], such as functions of the distances between predicted and optimal facilities. Our discrete notion of error refers only to whether an optimal item is predicted. Similarly, prior work on online set cover with predictions [11, 4] also does not consider prediction error. Finally, we note that discrete prediction error (similar to this paper), as well as hybrids between discrete and continuous error, have also been considered [42, 9, 14], but the prediction there is on the input rather than the solution.

2 The Framework

We now describe some of the concepts of the framework in more detail.

Reduction from 𝑠 to 𝑠Δ. Recall that we seek discrete-smooth algorithms, i.e., algorithms satisfying Equation (1). Our first step is to give a generic reduction that allows us to slightly weaken the guarantee to the following:

    ALG ≤ 𝑂(𝑓(𝑠)) · OPT|𝑆 + 𝑂(𝑓(𝑛)) · OPT|𝑆̄,    (6)

where 𝑂(𝑓(·)) is the competitive ratio without predictions. We give a reduction from an algorithm that satisfies Equation (6) to one that satisfies Equation (1):

Theorem 2.1. Given an algorithm ALG such that ALG ≤ 𝑂(𝑓(𝑠)) · OPT|𝑆 + 𝑂(𝑔) · OPT|𝑆̄, there exists an algorithm ALG′ such that ALG′ ≤ 𝑂(𝑓(𝑠Δ)) · OPT|𝑆 + 𝑂(𝑔) · OPT|𝑆̄.

The proof of this theorem, in Appendix E, is roughly the following: for every integer 𝑖, once the cost of the algorithm exceeds 2^𝑖, we buy the cheapest predicted items of total cost at most 2^𝑖, and then remove them from the prediction. While 2^𝑖 < OPT, the total cost is 𝑂(1) · OPT; once 2^𝑖 exceeds OPT, the size of the prediction is at most 𝑠Δ, and Equation (6) implies Equation (1).
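The following Python sketch (ours, illustrative) mirrors this doubling argument; `alg` is a hypothetical wrapped algorithm exposing its running cost and per-request handling, and `item_cost` maps predicted items to their costs:

```python
# Sketch of the reduction behind Theorem 2.1: whenever the wrapped algorithm's
# total cost crosses the next power-of-two threshold, buy the cheapest
# remaining predicted items of total cost at most that threshold, and remove
# them from the prediction.
def reduced_algorithm(requests, prediction, item_cost, alg):
    remaining = sorted(prediction, key=lambda i: item_cost[i])  # cheapest first
    threshold = 1.0
    bought = set()
    for q in requests:
        alg.handle(q)
        while alg.total_cost() >= threshold:
            budget = threshold
            # Buy a cheapest prefix of the remaining prediction within budget.
            while remaining and item_cost[remaining[0]] <= budget:
                item = remaining.pop(0)
                budget -= item_cost[item]
                bought.add(item)
            threshold *= 2
    return bought
```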
Monotonicity. An additional, natural property that we demand from a constituent algorithm in our smooth combination framework is that increasing the penalty of input requests does not decrease the cost incurred by the algorithm. This is stated formally in the following definition.

Definition 2.2. We say that a prize-collecting algorithm ALG is monotone if, fixing the input request prefix ((π‘ž_𝑖, πœ‹_𝑖))_{𝑖=1}^{π‘˜−1} and the current request (π‘ž_π‘˜, πœ‹_π‘˜), increasing πœ‹_π‘˜ does not decrease ALG(π‘ž_π‘˜, πœ‹_π‘˜).

Online amortization. Our framework extends to the case where Lagrangian subset-competitiveness and monotonicity are satisfied by amortized costs instead of actual costs. This is important because, for some problems, the actual cost expressly prohibits subset competitiveness. For example, consider facility location: given an input of multiple identical requests with very small penalties, the algorithm should eventually stop paying penalties and open a facility. However, for the specific request upon which the facility is opened, the cost of the algorithm is much larger than the penalty for that request, the latter being optimal for just that request. To overcome this complication, we allow the cost for a request to be amortized over previous requests, and call this online amortization. First, we define online amortization of costs, and then define a monotone online amortization, which can be used in our framework.

Definition 2.3 (online amortization). Let 𝑄 = ((π‘ž_1, πœ‹_1), …, (π‘ž_𝑛, πœ‹_𝑛)) be an online input given to ALG. An online amortization or OA is a number sequence (OA(π‘ž, πœ‹))_{(π‘ž,πœ‹)∈𝑄} such that:
1. ALG(𝑄) ≤ Σ_{(π‘ž,πœ‹)∈𝑄} OA(π‘ž, πœ‹).
2. OA(π‘ž_𝑖, πœ‹_𝑖) is only a function of (π‘ž_1, πœ‹_1), …, (π‘ž_𝑖, πœ‹_𝑖), and can thus be calculated online.

When considering the amortized cost of an algorithm, we use notation similar to that for the actual cost: on an input 𝑄, we use OA(𝑄) to denote the total amortized cost. We also use OA(𝑄′|𝑄) to denote the total amortized cost incurred on a request subset 𝑄′ ⊆ 𝑄. In addition, for a request (π‘ž, πœ‹) in the input 𝑄, we use OA(π‘ž, πœ‹) to refer to the amortized cost of (π‘ž, πœ‹); here the input 𝑄 is clear from context.

Definition 2.4 (monotone online amortization). We call an online amortization OA monotone if (a) fixing previous requests, increasing the penalty of a request (π‘ž, πœ‹) never decreases OA(π‘ž, πœ‹), and (b) when the algorithm pays the penalty for (π‘ž, πœ‹), then OA(π‘ž, πœ‹) ≥ πœ‹.

The Main Theorem. We are now ready to state the main theorem of our algorithmic framework. We use 𝛽_1 and 𝛽_2 to denote the competitive ratios of ALG1 and ALG2; the reader should think of 𝛽_1 as 𝑂(𝑓(𝑠)) and 𝛽_2 as 𝑂(𝑓(𝑛)), i.e., 𝛽_2 ≥ 𝛽_1.

Theorem 2.5. Consider any online covering problem with predictions P. Let ALG1, ALG2 be two algorithms for the prize-collecting version of P with monotone (online amortized) costs OA1, OA2, respectively, such that (a) ALG1 is Lagrangian 𝛽_1-subset-competitive using OA1 w.r.t. the prediction 𝑆, and (b) ALG2 is Lagrangian 𝛽_2-subset-competitive using OA2 (against the general OPT). Then there exists ALG for P such that for every partition of the input 𝑄 into 𝑄_1, 𝑄_2 we have

    ALG(𝑄) ≤ 𝑂(𝛽_1) · OPT_𝑆(𝑄_1) + 𝑂(𝛽_2) · OPT(𝑄_2).

We later show that Theorem 2.5 implies Equation (6) for facility location and set cover.

3 Online Facility Location

In this section, we consider metric, nonuniform facility location with predictions and present a novel prize-collecting algorithm TREEPROXY. This algorithm is Lagrangian 𝑂(log|𝑆|)-subset-competitive w.r.t. the prediction 𝑆 of possible facilities; thus, it is used in our framework to prove Theorems D.2 and D.3, which in turn imply Theorems 1.1 and 1.2, respectively. In addition, TREEPROXY is a result independent of our framework/predictions: the competitiveness guarantee shown for TREEPROXY also yields 𝑂(log π‘š)-competitiveness, where π‘š = |𝑋| is the size of the metric space. We prove the following theorem:

Theorem 3.1. For facility location with predictions, there exists a randomized prize-collecting algorithm ALG with a monotone online amortization OA which is Lagrangian 𝑂(log|𝑆|)-subset-competitive using OA w.r.t. 𝑆.

3.1 The Algorithm

Weighted hierarchically-separated trees (HSTs). The algorithm starts by embedding the metric space into the leaves of a weighted 3-HST, a metric space in which edge weights decrease at least exponentially as one descends from the root.

Definition 3.2. For 𝛾 > 1, a rooted tree with edge weights 𝑐 is a weighted 𝛾-HST if for every two edges 𝑒_1, 𝑒_2 such that 𝑒_2 is the parent edge of 𝑒_1, it holds that 𝑐(𝑒_2) ≥ 𝛾 · 𝑐(𝑒_1).

The following result is often used for embedding general metric spaces into weighted HSTs; it involves composing the embeddings of Fakcharoenphol et al. [17] and Bansal et al. [12].

Theorem 3.3 (Due to [17] and [12]). For every metric space (𝑋, 𝛿) and constant 𝛾, there exists a distribution D over weighted 𝛾-HSTs of depth 𝑂(log|𝑋|) in which the points of 𝑋 are the leaves of the HST, such that for every two points π‘₯_1, π‘₯_2 ∈ 𝑋 we have:
1. 𝛿(π‘₯_1, π‘₯_2) ≤ 𝛿_𝑇(π‘₯_1, π‘₯_2) for every 𝑇 in the support of D.
2. E_{𝑇∼D}[𝛿_𝑇(π‘₯_1, π‘₯_2)] ≤ 𝑂(log|𝑋|) · 𝛿(π‘₯_1, π‘₯_2).

The algorithm starts by embedding the induced metric space of 𝑆 into a weighted HST using Theorem 3.3; 𝑇 denotes the resulting tree, and π‘Ÿ denotes its root. For each edge 𝑒 ∈ 𝑇, we denote by 𝑐(𝑒) the cost of the edge 𝑒. Denote the set of leaves in the subtree rooted at 𝑣 by 𝐿(𝑣); note that 𝐿(π‘Ÿ) = 𝑆. Denote the distance between two nodes 𝑒, 𝑣 in the tree by 𝛿_𝑇(𝑒, 𝑣). For every point 𝑒 ∈ 𝑋, define 𝑝(𝑒) := arg min_{𝑒′∈𝑆} 𝛿(𝑒, 𝑒′); that is, 𝑝(𝑒) is the closest predicted point to 𝑒 (abusing notation, we similarly define 𝑝(π‘ž) for a request π‘ž).
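To make Definition 3.2 concrete, here is a small recursive checker (ours, illustrative); the tree representation, a dict from node to (cost of edge to parent, children), is an assumption of the sketch:

```python
# Check the weighted gamma-HST property of Definition 3.2: every edge's weight
# must be at least gamma times the weight of each child edge below it.
def is_weighted_hst(tree, node, gamma=3.0):
    cost, children = tree[node]
    for child in children:
        child_cost, _ = tree[child]
        if cost < gamma * child_cost:  # parent edge must dominate child edge
            return False
        if not is_weighted_hst(tree, child, gamma):
            return False
    return True

# For the root we use infinity, matching the convention c(r) = infinity below.
example = {"r": (float("inf"), ["a"]), "a": (9.0, ["x", "y"]),
           "x": (3.0, []), "y": (1.0, [])}
print(is_weighted_hst(example, "r"))  # True: 9 >= 3*3 and 9 >= 3*1
```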
Proxy list. After embedding 𝑆 into the leaves of a tree, the algorithm must open facilities on those leaves to serve requests. Intuitively, at any point the algorithm considers some (possibly internal) node 𝑣 ∈ 𝑇, and considers connecting the current request through 𝑣 to a facility in 𝐿(𝑣). Choosing from 𝐿(𝑣) introduces a tradeoff between the cost of opening the facility and its distance from 𝑣. For every 𝑣, we identify the leaves in 𝐿(𝑣) which offer the best points in this tradeoff (i.e., a Pareto frontier), and only allow the algorithm to choose from these leaves. This subset is called the proxy list of 𝑣, and is denoted 𝑃(𝑣) ⊆ 𝐿(𝑣).

We now define the proxy list 𝑃(𝑣). For ease of notation, define the logarithmic class operator β„“(π‘₯) := ⌊log π‘₯⌋. For a node 𝑣 ∈ 𝑇, we construct the proxy list 𝑃(𝑣) ⊆ 𝐿(𝑣) as follows (a small code sketch appears below):
1. Start with 𝑉 ← 𝐿(𝑣).
2. While there exist distinct 𝑣_1, 𝑣_2 ∈ 𝑉 such that β„“(π‘œ_{𝑣_1}) ≥ β„“(π‘œ_{𝑣_2}) and β„“(𝛿_𝑇(𝑣, 𝑣_1)) ≥ β„“(𝛿_𝑇(𝑣, 𝑣_2)), remove 𝑣_1 from 𝑉.
3. Output 𝑉 as 𝑃(𝑣).

We denote by π‘˜(𝑣) the size of the proxy list 𝑃(𝑣). We order the proxy list of 𝑣 by increasing facility cost, thus writing 𝑃(𝑣) = (𝑠^𝑣_1, …, 𝑠^𝑣_{π‘˜(𝑣)}). For every 𝑣, 𝑖, we use the shorthands π‘œ^𝑣_𝑖 := π‘œ_{𝑠^𝑣_𝑖} and 𝛿^𝑣_𝑖 := 𝛿_𝑇(𝑣, 𝑠^𝑣_𝑖). Slightly abusing notation, for every node 𝑣 ∈ 𝑇 we define 𝑐(𝑣) := 𝑐(𝑒_𝑣), where 𝑒_𝑣 is the edge connecting 𝑣 to its parent node (for the root π‘Ÿ, we define 𝑐(π‘Ÿ) = ∞). For a more streamlined notation, for every node 𝑣 ∈ 𝑇 we also define 𝛿^𝑣_0 := 𝑐(𝑣) and π‘œ^𝑣_{π‘˜(𝑣)+1} := ∞.

Observation 3.4. For every node 𝑣 ∈ 𝑇, the proxy list 𝑃(𝑣) satisfies:
1. For every 𝑒 ∈ 𝐿(𝑣), there exists an index 𝑖 such that β„“(π‘œ^𝑣_𝑖) ≤ β„“(π‘œ_𝑒) and β„“(𝛿^𝑣_𝑖) ≤ β„“(𝛿_𝑇(𝑣, 𝑒)).
2. For every distinct 1 ≤ 𝑖 < 𝑗 ≤ π‘˜(𝑣) + 1, it holds that β„“(π‘œ^𝑣_𝑖) < β„“(π‘œ^𝑣_𝑗).
3. For every distinct 0 ≤ 𝑖 < 𝑗 ≤ π‘˜(𝑣), it holds that β„“(𝛿^𝑣_𝑖) > β„“(𝛿^𝑣_𝑗).

When 𝑖 = 0, the third item in Observation 3.4 uses the fact that 𝑇 is a weighted 3-HST; thus, the cost of an edge is at least twice the distance from the child node of that edge to any descendant leaf.

Counters. For every node 𝑣 and every 𝑖 ∈ {1, …, π‘˜(𝑣) + 1}, we define a counter πœ†(𝑣, 𝑖) of size π‘œ^𝑣_𝑖.
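A Python sketch (ours, illustrative) of the proxy-list construction follows; `leaves` is assumed to map each leaf 𝑒 ∈ 𝐿(𝑣) to the pair (π‘œ_𝑒, 𝛿_𝑇(𝑣, 𝑒)), and exact ties in both logarithmic classes are broken by name so that one representative survives:

```python
import math

def log_class(x):
    # l(x) := floor(log2 x), the logarithmic class operator from the text
    return math.floor(math.log2(x))

def proxy_list(leaves):
    """leaves: dict leaf -> (opening cost o_u, tree distance delta_T(v, u))."""
    items = list(leaves.items())
    kept = []
    for u, (cost, dist) in items:
        def dominated_by(other):
            u2, (c2, d2) = other
            if u2 == u:
                return False
            weakly = (log_class(c2) <= log_class(cost)
                      and log_class(d2) <= log_class(dist))
            strictly = (log_class(c2) < log_class(cost)
                        or log_class(d2) < log_class(dist))
            return weakly and (strictly or u2 < u)  # tie-break by name
        if not any(dominated_by(o) for o in items):
            kept.append((u, cost, dist))
    kept.sort(key=lambda t: t[1])  # P(v): order by increasing facility cost
    return kept

# "c" is dominated by "b" (same cost class, farther), so only "a", "b" remain.
print(proxy_list({"a": (1.0, 8.0), "b": (4.0, 2.0), "c": (4.0, 8.0)}))
```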
Algorithm description. The algorithm for facility location with predictions is given in Algorithm 2. Initially, the algorithm embeds the metric space induced by 𝑆 into a weighted 3-HST 𝑇, using Theorem 3.3; on each node of this 𝑇, the proxy lists are computed and the corresponding counters are assigned.

Upon the release of a request (π‘ž, πœ‹), the function UPONREQUEST is triggered. Upon receiving (π‘ž, πœ‹), it maps the request to the closest point 𝑝(π‘ž) in 𝑆 (that is, a leaf of the HST). Then, the algorithm attempts to solve the request on the HST through a process of increasing counters, which we soon describe. (While the described algorithm raises these counters continuously, the process can easily be discretized, replacing the continuous growth with jumping discretely to the next event.) The algorithm keeps track of (some measure of) the cost involved; if during UPONREQUEST that amount exceeds the penalty πœ‹, the algorithm pays the penalty instead (see Line 9).

When solving the request on 𝑒 = 𝑝(π‘ž), the algorithm climbs up the branch of 𝑒 until a facility is found (or opened) to which 𝑒 can connect. At each ancestor 𝑣 of 𝑒, the algorithm invests a growing amount 𝜏^𝑣 in advancing the proxy list of 𝑣 (i.e., buying a facility in 𝑃(𝑣) closer to 𝑣). It raises the counter for the next item on the proxy list until that counter is full, at which point the relevant proxy facility is opened, and the next counter in the proxy list begins to increase. (Note that the same facility can be opened more than once due to being on multiple proxy lists.) Once 𝜏^𝑣 reaches the cost of connecting 𝑣 to an open proxy, the algorithm stops increasing counters and makes the connection. When no proxy in 𝑃(𝑣) is open, it could be that 𝜏^𝑣 exceeds the cost of moving from 𝑣 to its parent 𝑝(𝑣); in this case, we ascend the branch and explore proxies for 𝑝(𝑣). Note that the function UPONREQUEST of Algorithm 2 also returns a value; this return value is the online amortized cost of the request, to be used in the analysis of the algorithm. (See Figure 1 for an example.) The analysis of Algorithm 2, and the proof of Theorem 3.1, appear in Appendix A.

Algorithm 2: TREEPROXY for Prize-Collecting Facility Location with Predictions
1 Initialization
2   Embed the prediction 𝑆 into a weighted 3-HST 𝑇 using Theorem 3.3.
3   For every 𝑣 ∈ 𝑇 and every 𝑖 ∈ {1, …, π‘˜(𝑣) + 1}, set πœ†(𝑣, 𝑖) ← 0.
4   For every 𝑣 ∈ 𝑇, set 𝑑(𝑣) ← 0.
5 Event Function UPONREQUEST(π‘ž, πœ‹) // upon the next request π‘ž with penalty πœ‹ in the sequence
6   Define 𝑒 ← 𝑝(π‘ž) and 𝑣 ← 𝑒.
7   Define 𝜏 ← 0, 𝜏^𝑣 ← 0.
8   Continually increase 𝜏, 𝜏^𝑣 and πœ†(𝑣, 𝑑(𝑣) + 1) at the same rate until:
9     if 𝜏 + 𝛿(𝑒, π‘ž) ≥ πœ‹ then // cost for request exceeds penalty; pay penalty instead
10      Pay the penalty πœ‹ for the request.
11      return 𝜏 + πœ‹. // return amortized cost
12    if πœ†(𝑣, 𝑑(𝑣) + 1) = π‘œ^𝑣_{𝑑(𝑣)+1} then // counter for next proxy is full; open facility at proxy
13      Open a facility at 𝑠^𝑣_{𝑑(𝑣)+1}.
14      Increment 𝑑(𝑣).
15      goto Line 8.
16    if 𝜏^𝑣 ≥ 𝛿^𝑣_{𝑑(𝑣)} then
17      if 𝑑(𝑣) = 0 then // escalate the request to the parent node
18        Set 𝑣 ← 𝑝(𝑣).
19        Define 𝜏^𝑣 ← 0.
20        goto Line 8.
21      Connect π‘ž to 𝑠^𝑣_{𝑑(𝑣)}. // connect request to closest proxy
22      return 𝜏 + (𝜏 + 𝛿(𝑒, π‘ž)). // return amortized cost

Figure 1: A possible state of Algorithm 2, immediately before connecting a request π‘ž. Here, π‘ž has been mapped to 𝑒, the closest point in 𝑆. The variable 𝑣, an ancestor of 𝑒, is shown, as is its proxy list 𝑠^𝑣_1, 𝑠^𝑣_2, 𝑠^𝑣_3. The counters of the proxy list are also shown: πœ†(𝑣, 1) is full (and a facility thus exists at 𝑠^𝑣_1), and πœ†(𝑣, 2) is partial (the last counter to be raised while handling π‘ž). At some point, the growth in the counters of 𝑣 exceeded the distance from 𝑣 to 𝑠^𝑣_1, and thus the connection of π‘ž to 𝑠^𝑣_1 is made.
4 Online Set Cover

In this section, we present and analyze an algorithm for prize-collecting fractional set cover which uses the well-known multiplicative updates method, and show that it is Lagrangian subset-competitive. Using this algorithm together with Algorithm 1 yields Theorem 1.3 (the proof appears in Appendix C).

Preliminaries. In prize-collecting fractional set cover, we are given a universe with elements 𝐸 and sets π‘ˆ; we define π‘š := |π‘ˆ|. A solution may fractionally buy sets, according to a cost function 𝑐. Requests then arrive online, where each request is for covering some element 𝑒 ∈ 𝐸, which is contained in some subfamily of sets from π‘ˆ. To cover an element, an algorithm must hold fractions of sets containing 𝑒 which sum to at least 1. Observe that fractional set cover with predictions conforms to the definition of an online covering problem with predictions; in this problem, the items are the sets. For prize-collecting fractional set cover, we prove the following theorem.

Theorem 4.1. There exists a deterministic algorithm ALG for prize-collecting fractional set cover such that ALG is Lagrangian 𝑂(log π‘š)-subset-competitive.

Theorem 4.1 implies that, in the framework of Algorithm 1, our algorithm can be used as the general component, independent of the prediction. But, given a prediction 𝑆 ⊆ π‘ˆ, we can simply restrict the family of sets used by the algorithm to the given prediction, which yields an algorithm competitive against OPT_𝑆. Thus, Theorem 4.1 immediately yields the following corollary.

Corollary 4.2. There exists a deterministic algorithm ALG for prize-collecting fractional set cover such that ALG is Lagrangian 𝑂(log π‘š′)-subset-competitive w.r.t. a prediction 𝑆 ⊆ π‘ˆ, where |𝑆| = π‘š′.

The Algorithm. The algorithm for prize-collecting set cover is given in Algorithm 3. The algorithm follows the standard multiplicative updates method: while the pending request is uncovered, sets containing that request are bought at an exponential rate (see [2, 15]). However, in this prize-collecting version, the algorithm never lets its cost for a specific request exceed its penalty. For ease of notation, define π‘ˆ(π‘ž) to be the collection of sets containing π‘ž; that is, π‘ˆ(π‘ž) := {𝑠 ∈ π‘ˆ | π‘ž ∈ 𝑠}.

Algorithm 3: Online Prize-Collecting Fractional Set Cover
1 Initialization
2   Set π‘₯_𝑠 ← 0 for every set 𝑠.
3 Event Function UPONREQUEST(π‘ž, πœ‹)
4   Set 𝑦_π‘ž ← 0.
5   while Σ_{𝑠∈π‘ˆ(π‘ž)} π‘₯_𝑠 < 1 do
6     Set 𝑦_π‘ž ← 𝑦_π‘ž + 1.
7     if πœ‹ ≤ 𝑦_π‘ž then
8       Pay penalty πœ‹ for π‘ž.
9       return OA(π‘ž, πœ‹) = 3πœ‹.
10    foreach 𝑠 ∈ π‘ˆ(π‘ž) do
11      π‘₯_𝑠 ← π‘₯_𝑠 · (1 + 1/𝑐_𝑠) + 1/(|π‘ˆ(π‘ž)| · 𝑐_𝑠)
12  return OA(π‘ž, πœ‹) = 2𝑦_π‘ž.
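For concreteness, here is a direct Python rendering (ours) of Algorithm 3; the containers `sets_of` (mapping an element π‘ž to π‘ˆ(π‘ž)) and `c` (set costs) are assumed inputs:

```python
x = {}  # fraction of each set currently bought (x_s), initially 0

def upon_request(q, penalty, sets_of, c):
    U_q = sets_of[q]                 # U(q): the sets containing element q
    y_q = 0
    while sum(x.get(s, 0.0) for s in U_q) < 1.0:   # q is still uncovered
        y_q += 1
        if penalty <= y_q:           # cost would exceed the penalty: pay it
            return 3 * penalty       # amortized cost OA(q, pi) = 3*pi
        for s in U_q:                # multiplicative update with additive seed
            x[s] = x.get(s, 0.0) * (1.0 + 1.0 / c[s]) + 1.0 / (len(U_q) * c[s])
    return 2 * y_q                   # amortized cost OA(q, pi) = 2*y_q
```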
Analysis. When the input 𝑄 is fixed and (π‘ž, πœ‹) ∈ 𝑄, we use ALG(π‘ž, πœ‹) as a shorthand for ALG({(π‘ž, πœ‹)}|𝑄), i.e., the cost of ALG when handling the request (π‘ž, πœ‹) as part of 𝑄. We prove the two following lemmas:

Lemma 4.3. For every (π‘ž, πœ‹) ∈ 𝑄, it holds that ALG(π‘ž, πœ‹) ≤ 3πœ‹.

Lemma 4.4. For every subset 𝑄′ ⊆ 𝑄, we have ALG(𝑄′|𝑄) ≤ 𝑂(log π‘š) · OPT(𝑄̄′), where 𝑄̄′ is the non-prize-collecting input formed from 𝑄′.

These two lemmas imply penalty-robust subset competitiveness, a property shown in Proposition C.2 to be equivalent to Lagrangian subset-competitiveness. Thus, we focus on proving these lemmas; note that the proof of Lemma 4.4 appears in Appendix F.

Proposition 4.5. In every iteration of UPONREQUEST(π‘ž, πœ‹), the total buying cost is at most 2𝑦_π‘ž, where 𝑦_π‘ž is the final value of the variable of the same name.

Proof. Consider each time 𝑦_π‘ž is incremented. The total cost of buying sets is

    Σ_{𝑠∈π‘ˆ(π‘ž)} 𝑐_𝑠 · (π‘₯_𝑠/𝑐_𝑠 + 1/(|π‘ˆ(π‘ž)| · 𝑐_𝑠)) = Σ_{𝑠∈π‘ˆ(π‘ž)} π‘₯_𝑠 + 1 ≤ 2,

where the inequality is due to the fact that Σ_{𝑠∈π‘ˆ(π‘ž)} π‘₯_𝑠 < 1 (the loop condition). Thus, each time 𝑦_π‘ž is incremented by 1, the cost of buying sets is at most 2, completing the proof. ∎

Proof of Lemma 4.3. Consider UPONREQUEST(π‘ž, πœ‹). If it returned through Line 12, it holds that 𝑦_π‘ž ≤ πœ‹; Proposition 4.5 shows that the total buying cost was thus at most 2πœ‹, and this cost is also ALG(π‘ž, πœ‹). Otherwise, the function returned through Line 9; in this case, since 𝑦_π‘ž was incremented immediately before comparing 𝑦_π‘ž to πœ‹, the argument from the proof of Proposition 4.5 implies that the total buying cost is at most 2(𝑦_π‘ž − 1) (using the final value of 𝑦_π‘ž). In turn, this is at most 2πœ‹. In addition, the algorithm paid the penalty πœ‹; overall, ALG(π‘ž, πœ‹) ≤ 3πœ‹. ∎

Proof of Theorem 4.1. Lemma 4.3 and Lemma 4.4 show that the algorithm is 𝑂(log π‘š)-PRSC; Proposition C.2 then yields that the algorithm is Lagrangian 𝑂(log π‘š)-subset-competitive. ∎

5 Experiments

Input Generation. Our set cover instances contain 100 elements. (The number of sets will vary in the experiments.) Every set contains every element with some constant probability 𝛼 (we choose 𝛼 = 0.02); that is, the input is represented by a random bipartite graph in which each edge manifests independently. Since this may not cover every element, we also add singleton sets for all elements. We generate random costs for the sets, independently drawn from a log-normal distribution (πœ‡ = 0, 𝜎 = 1.6). For a given input, we generate a prediction in the following way (a code sketch appears at the end of this section):
1. Using an LP solver, we obtain an optimal fractional solution to the problem instance.
2. We randomly round the solution, such that every set appears in the prediction with probability proportional to its value in the fractional solution.
3. We apply noise to the prediction, of two types: false-positive noise, in which every set is added to the prediction with some probability 𝑝; and false-negative noise, in which every set is removed from the prediction with some probability π‘ž. (The reader should think of 𝑝 and π‘ž as the classification error, as if the predictions were generated using a classifier.)
4. Finally, we add the singleton sets to the prediction, to ensure that the prediction covers all elements.

Baselines and evaluation. We evaluate our algorithm described in Section 4, denoted SMOOTHMERGE, against three baselines: the standard online algorithm without predictions, denoted ON; the online algorithm restricted to predicted sets, denoted PREDON; and the standard merging BASEMERGE of those two algorithms, which alternates between ON and PREDON whenever the overall cost doubles. For every choice of parameters, we measure the costs of the four algorithms; these costs are then averaged over 300 different random inputs. We then measure the expected competitive ratio of each algorithm. Our experiments were run on an AWS EC2 r5.16xlarge machine. We ran the following experiments: (a) we vary the false-positive rate 𝑝 and the false-negative rate π‘ž, keeping the number of sets fixed at 10000 (Table 1), and (b) we vary the number of sets in the input, fixing 𝑝 = 0.005, π‘ž = 0.15 (Figure 2).

Table 1: Competitive ratios for varying 𝑝, π‘ž, in a "mean (standard deviation)" format. The best value in each row is marked with an asterisk.

𝑝, π‘ž        | ON            | PREDON          | BASEMERGE     | SMOOTHMERGE
0, 0        | 6.007 (0.244) | 1.689 (0.070)*  | 3.102 (0.565) | 2.779 (0.122)
0, 0.15     | 6.007 (0.244) | 46.815 (54.436) | 6.246 (1.516) | 3.820 (0.555)*
0, 0.3      | 6.007 (0.244) | 96.156 (76.196) | 7.093 (1.358) | 4.824 (0.687)*
0.005, 0    | 6.007 (0.244) | 1.989 (0.106)*  | 3.648 (0.630) | 3.251 (0.184)
0.005, 0.15 | 6.007 (0.244) | 25.983 (30.294) | 6.597 (1.642) | 4.200 (0.534)*
0.005, 0.3  | 6.007 (0.244) | 51.533 (43.375) | 7.543 (1.541) | 5.120 (0.642)*
0.02, 0     | 6.007 (0.244) | 2.631 (0.154)*  | 4.489 (0.660) | 4.240 (0.266)
0.02, 0.15  | 6.007 (0.244) | 10.555 (7.549)  | 7.007 (1.496) | 5.024 (0.498)*
0.02, 0.3   | 6.007 (0.244) | 17.588 (9.549)  | 8.156 (1.433) | 5.760 (0.569)*

Figure 2: The competitive ratio for varying numbers of sets (2500 to 20000), comparing ON, PREDON, BASEMERGE, and SMOOTHMERGE.
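The following Python sketch (ours) follows the generation procedure above, using scipy's LP solver; the exact generator used for the paper's experiments may differ in details such as the rounding normalization:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_elements, n_sets, alpha = 100, 1000, 0.02

# Random bipartite membership, plus singleton sets so every element is covered.
A = rng.random((n_sets, n_elements)) < alpha
A = np.vstack([A, np.eye(n_elements, dtype=bool)])        # rows = sets
costs = rng.lognormal(mean=0.0, sigma=1.6, size=A.shape[0])

# Optimal fractional cover: minimize c^T x  s.t.  for each element e,
# the sum of x_s over sets s containing e is at least 1, with 0 <= x <= 1.
res = linprog(costs, A_ub=-A.T.astype(float), b_ub=-np.ones(n_elements),
              bounds=[(0, 1)] * A.shape[0])
frac = res.x

# Randomized rounding, then false-positive / false-negative noise (p, q).
p, q = 0.005, 0.15
pred = rng.random(A.shape[0]) < frac          # keep set s with prob ~ x_s
pred |= rng.random(A.shape[0]) < p            # false positives: add with prob p
pred &= rng.random(A.shape[0]) >= q           # false negatives: drop with prob q
pred[-n_elements:] = True                     # re-add the singleton sets
```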
Experimental Results. We ran two sets of experiments. In the first experiment, we varied the false-positive rate 𝑝 and the false-negative rate π‘ž, keeping the number of sets fixed at 10000. The results are reported in Table 1. We note that our algorithm SMOOTHMERGE outperforms the standard merging algorithm BASEMERGE and the online algorithm without predictions ON consistently across all values of 𝑝, π‘ž. SMOOTHMERGE also outperforms PREDON, the online algorithm restricted to the prediction, except when there are no false negatives, i.e., π‘ž = 0. This is to be expected, because π‘ž = 0 implies that there is a good solution contained in the prediction. When π‘ž > 0, PREDON fails miserably, and our algorithm SMOOTHMERGE obtains a competitive ratio that is an order of magnitude better than PREDON. This demonstrates the lack of robustness of PREDON, because it is specifically tuned to correct predictions.

In the second set of experiments, we varied the number of sets in the input, fixing the noise rates 𝑝 = 0.005, π‘ž = 0.15. The results are reported in Figure 2. Our algorithm SMOOTHMERGE consistently outperforms all the baseline algorithms. In particular, it is able to utilize predictions to outperform ON, which the standard merging BASEMERGE is unable to achieve. Moreover, as the number of sets in the input grows, the gap between the two merging solutions increases.

6 Discussion

In this paper, we presented a novel framework for smooth interpolation between robustness and consistency guarantees in learning-augmented online algorithms, and applied it to set cover and facility location. More broadly, predictions for online algorithms are of two forms: prediction of the input and prediction of the solution. The notion of discrete-smoothness applies to any online combinatorial problem in the latter category, i.e., where a solution is provided in the form of a prediction to the algorithm. Many problems have been considered in this model, including rent-or-buy problems, scheduling, matching, graph problems, etc. For all of these problems, the discrete-smoothness framework alleviates the need for problem-specific notions of prediction error and instead gives a common framework for arguing about the gradual degradation of solution quality with increasing prediction error. We hope that the current work will streamline the desiderata for learning-augmented online algorithms by adding this problem-independent notion of smoothness to the established (and also problem-independent) properties of consistency and robustness.

Acknowledgments

YA was supported in part by the Israel Science Foundation (grant No. 2304/20). DP was supported in part by NSF awards CCF-1750140 (CAREER) and CCF-1955703.

References

[1] Matteo Almanza, Flavio Chierichetti, Silvio Lattanzi, Alessandro Panconesi, and Giuseppe Re. Online facility location with multiple advice. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 4661–4673, 2021.

[2] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph Naor. The online set cover problem. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC 2003), pages 100–105. ACM, 2003.

[3] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph Naor. The online set cover problem. SIAM J. Comput., 39(2):361–370, 2009.
[4] Keerti Anand, Rong Ge, Amit Kumar, and Debmalya Panigrahi. Online algorithms with multiple predictions. In ICML 2022, volume 162 of Proceedings of Machine Learning Research, pages 582–598. PMLR, 2022.

[5] Keerti Anand, Rong Ge, and Debmalya Panigrahi. Customizing ML predictions for online algorithms. In ICML 2020, volume 119 of Proceedings of Machine Learning Research, pages 303–313. PMLR, 2020.

[6] Antonios Antoniadis, Christian Coester, Marek Elias, Adam Polak, and Bertrand Simon. Online metric algorithms with untrusted predictions. In ICML 2020, 2020.

[7] Antonios Antoniadis, Themis Gouleakis, Pieter Kleer, and Pavel Kolev. Secretary and online matching problems with machine learned advice. In NeurIPS 2020, 2020.

[8] Yossi Azar, Stefano Leonardi, and Noam Touitou. Flow time scheduling with uncertain processing time. In STOC 2021, pages 1070–1080. ACM, 2021.

[9] Yossi Azar, Debmalya Panigrahi, and Noam Touitou. Online graph algorithms with predictions. In SODA 2022, pages 35–66. SIAM, 2022.

[10] Étienne Bamas, Andreas Maggiori, Lars Rohwedder, and Ola Svensson. Learning augmented energy minimization via speed scaling. In NeurIPS 2020, 2020.

[11] Étienne Bamas, Andreas Maggiori, and Ola Svensson. The primal-dual method for learning augmented algorithms. In NeurIPS 2020, 2020.

[12] Nikhil Bansal, Niv Buchbinder, Aleksander Madry, and Joseph Naor. A polylogarithmic-competitive algorithm for the k-server problem. In FOCS 2011, pages 267–276. IEEE Computer Society, 2011.

[13] Nikhil Bansal, Christian Coester, Ravi Kumar, Manish Purohit, and Erik Vee. Scale-free allocation, amortized convexity, and myopic weighted paging. CoRR, abs/2011.09076, 2020.

[14] Giulia Bernardini, Alexander Lindermayr, Alberto Marchetti-Spaccamela, Nicole Megow, Leen Stougie, and Michelle Sweering. A universal error measure for input predictions applied to online graph problems. In NeurIPS, 2022.
[15] Niv Buchbinder and Joseph Naor. The design of competitive online algorithms via a primal-dual approach. Found. Trends Theor. Comput. Sci., 3(2-3):93–263, 2009.

[16] Paul Dütting, Silvio Lattanzi, Renato Paes Leme, and Sergei Vassilvitskii. Secretaries with advice. In EC 2021, pages 409–429. ACM, 2021.

[17] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. Journal of Computer and System Sciences, 69(3):485–497, 2004.

[18] Dimitris Fotakis. A primal-dual algorithm for online non-uniform facility location. J. Discrete Algorithms, 5(1):141–148, 2007.

[19] Dimitris Fotakis. On the competitive ratio for online facility location. Algorithmica, 50(1):1–57, 2008.

[20] Dimitris Fotakis, Evangelia Gergatsouli, Themis Gouleakis, and Nikolas Patris. Learning augmented online facility location. CoRR, abs/2107.08277, 2021.

[21] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In ICML 2019, volume 97 of Proceedings of Machine Learning Research, pages 2319–2327. PMLR, 2019.

[22] Shaofeng H.-C. Jiang, Erzhi Liu, You Lyu, Zhihao Gavin Tang, and Yubo Zhang. Online facility location with predictions. In ICLR 2022. OpenReview.net, 2022.

[23] Zhihao Jiang, Pinyan Lu, Zhihao Gavin Tang, and Yuhao Zhang. Online selection problems against constrained adversary. In ICML 2021, volume 139 of Proceedings of Machine Learning Research, pages 5002–5012. PMLR, 2021.

[24] Zhihao Jiang, Debmalya Panigrahi, and Kevin Sun. Online algorithms for weighted caching with predictions. In ICALP 2020, 2020.

[25] Ali Khanafer, Murali Kodialam, and Krishna P. N. Puttaswamy. The constrained ski-rental problem and its application to online cloud cost optimization. In Proceedings of INFOCOM, pages 1492–1500, 2013.

[26] Jochen Könemann, Sina Sadeghian Sadeghabad, and Laura Sanità. An LMP O(log n)-approximation algorithm for node weighted prize collecting Steiner tree. In FOCS 2013, pages 568–577. IEEE Computer Society, 2013.

[27] Ravi Kumar, Manish Purohit, and Zoya Svitkina. Improving online algorithms via ML predictions. In NeurIPS 2018, pages 9684–9693, 2018.

[28] Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In SODA 2020, pages 1859–1877. SIAM, 2020.
[29] Thomas Lavastida, Benjamin Moseley, R. Ravi, and Chenyang Xu. Learnable and instance-robust predictions for online matching, flows and load balancing. In ESA 2021, volume 204 of LIPIcs, pages 59:1–59:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.

[30] Russell Lee, Jessica Maghakian, Mohammad H. Hajiesmaili, Jian Li, Ramesh K. Sitaraman, and Zhenhua Liu. Online peak-aware energy scheduling with untrusted advice. In e-Energy 2021, pages 107–123. ACM, 2021.

[31] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In ICML 2018, volume 80 of Proceedings of Machine Learning Research, pages 3302–3311. PMLR, 2018.

[32] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. J. ACM, 68(4):24:1–24:25, 2021.

[33] Adam Meyerson. Online facility location. In FOCS 2001, pages 426–431, 2001.

[34] Michael Mitzenmacher. Scheduling with predictions and the price of misprediction. In ITCS 2020, volume 151 of LIPIcs, pages 14:1–14:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.

[35] Michael Mitzenmacher and Sergei Vassilvitskii. Algorithms with predictions. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 646–662. Cambridge University Press, 2020.

[36] Michael Mitzenmacher and Sergei Vassilvitskii. Algorithms with predictions. Commun. ACM, 65(7):33–35, 2022.

[37] Anna Moss and Yuval Rabani. Approximation algorithms for constrained node weighted Steiner tree problems. SIAM J. Comput., 37(2):460–481, 2007.

[38] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In NeurIPS 2018, pages 9684–9693, 2018.

[39] Shufan Wang, Jian Li, and Shiqiang Wang. Online algorithms for multi-shop ski rental with machine learned advice. In NeurIPS 2020, 2020.

[40] Alexander Wei. Better and simpler learning-augmented online caching. In APPROX/RANDOM 2020, volume 176 of LIPIcs, pages 60:1–60:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.

[41] Alexander Wei and Fred Zhang. Optimal robustness-consistency trade-offs for learning-augmented online algorithms.
In NeurIPS 2020, 2020.

[42] Chenyang Xu and Benjamin Moseley. Learning-augmented algorithms for online Steiner tree. In AAAI 2022, pages 8744–8752. AAAI Press, 2022.

A Analysis of Algorithm 2

For this analysis section, we fix an input 𝑄 = ((π‘ž_1, πœ‹_1), …, (π‘ž_𝑛, πœ‹_𝑛)). For both ALG and OPT_𝑆, we use the superscript f to refer only to facility opening costs and c to refer only to connection costs. We denote by OA(π‘ž, πœ‹) the value returned by UPONREQUEST in Algorithm 2 upon receiving the pair (π‘ž, πœ‹); we choose (OA(π‘ž, πœ‹)) as the online amortization of Algorithm 2.

Online amortization. First, we show that the cost of the algorithm is bounded by the online amortization:

Lemma A.1. It holds that ALG(𝑄) ≤ OA(𝑄).

Proof. We use the subscript π‘ž to refer to the final value of a variable in UPONREQUEST(π‘ž, πœ‹). The cost of the algorithm has the following three components:
1. Penalties paid.
2. Opening costs for facilities in 𝑆.
3. Connection costs to facilities in 𝑆.
Let 𝑄′ ⊆ 𝑄 be the set of requests served by the algorithm (i.e., for which no penalty was paid).

Penalties for requests in 𝑄 \ 𝑄′. Whenever a penalty πœ‹ is paid for a request in Line 10, the additive term πœ‹ appears in the amortized cost of that request. We charge the penalty cost to that term.

Opening cost. Note that a facility 𝑠^𝑣_𝑖 is only opened (at cost π‘œ^𝑣_𝑖) when the counter πœ†(𝑣, 𝑖) reaches π‘œ^𝑣_𝑖, and that counter is never used again; thus, the total opening cost can be charged to the sum, over requests π‘ž, of the amount by which request π‘ž raises counters, which is 𝜏_π‘ž. We charge this to the term 𝜏_π‘ž in OA(π‘ž, πœ‹).

Connection cost for requests in 𝑄′. Suppose a request (π‘ž, πœ‹) ∈ 𝑄′ is connected to some point 𝑀 ∈ 𝑆. There exists an index 𝑖 such that 𝑀 = 𝑠^{𝑣_π‘ž}_𝑖. It holds that

    𝛿(π‘ž, 𝑀) ≤ 𝛿(π‘ž, 𝑒_π‘ž) + 𝛿(𝑒_π‘ž, 𝑀) ≤ 𝛿(π‘ž, 𝑆) + 𝛿_𝑇(𝑒_π‘ž, 𝑀) ≤ 𝛿(π‘ž, 𝑆) + 𝛿_𝑇(𝑒_π‘ž, 𝑣_π‘ž) + 𝛿_𝑇(𝑣_π‘ž, 𝑀),    (7)

where the first and third inequalities are due to the triangle inequality, and the second inequality is due to the definition of 𝑒_π‘ž and Theorem 3.3. Now, note that 𝛿_𝑇(𝑣_π‘ž, 𝑀) = 𝛿^{𝑣_π‘ž}_𝑖 ≤ 𝜏^{𝑣_π‘ž}_π‘ž from the condition of Line 16. Enumerate the path in the tree between 𝑒_π‘ž and 𝑣_π‘ž as 𝑒_π‘ž = 𝑀_0, 𝑀_1, …, 𝑀_π‘˜ = 𝑣_π‘ž, and note that 𝛿_𝑇(𝑒_π‘ž, 𝑣_π‘ž) = Σ_{𝑗=0}^{π‘˜−1} 𝑐(𝑀_𝑗). Now, note that the variable 𝑣 advanced from 𝑀_𝑗 to 𝑀_{𝑗+1} due to 𝜏^{𝑀_𝑗}_π‘ž ≥ 𝑐(𝑀_𝑗); thus, 𝛿_𝑇(𝑒_π‘ž, 𝑣_π‘ž) ≤ Σ_{𝑗=0}^{π‘˜−1} 𝜏^{𝑀_𝑗}_π‘ž. Finally, note that Σ_{𝑗=0}^{π‘˜} 𝜏^{𝑀_𝑗}_π‘ž = 𝜏_π‘ž; combining, we get

    𝛿_𝑇(𝑒_π‘ž, 𝑣_π‘ž) + 𝛿_𝑇(𝑣_π‘ž, 𝑀) ≤ Σ_{𝑗=0}^{π‘˜} 𝜏^{𝑀_𝑗}_π‘ž = 𝜏_π‘ž.

Plugging the above into Equation (7), we get 𝛿(π‘ž, 𝑀) ≤ 𝜏_π‘ž + 𝛿(π‘ž, 𝑆). We thus charge the connection cost of requests from 𝑄′ to the (𝜏_π‘ž + 𝛿(𝑒_π‘ž, π‘ž)) term in OA(π‘ž, πœ‹). This completes the proof of the lemma. ∎

Observation A.2. The online amortization OA of Algorithm 2 is monotone.
Bounding amortized costs. Having shown that the online amortization is valid and monotone, it remains to bound the amortized cost of the algorithm. To show that the algorithm is Lagrangian subset-competitive, it is enough to show that it is PRSC; see Proposition C.2. We thus focus on showing that the algorithm is PRSC using OA w.r.t. 𝑆. From this point on, for every node 𝑣 ∈ 𝑇 and index 𝑖 ∈ [π‘˜(𝑣) + 1], we slightly abuse notation and use πœ†(𝑣, 𝑖) to refer both to the counter itself and to its value at the end of the algorithm.

Proposition A.3 (Penalty Robustness). For every (π‘ž, πœ‹) ∈ 𝑄, it holds that OA(π‘ž, πœ‹) ≤ 2πœ‹.

Proof. If UPONREQUEST(π‘ž, πœ‹) returns in Line 22, then the condition in Line 9 must have failed, and thus 𝜏 + 𝛿(𝑒, π‘ž) ≤ πœ‹; thus, OA(π‘ž, πœ‹) = 𝜏 + (𝜏 + 𝛿(𝑒, π‘ž)) ≤ 2πœ‹. Otherwise, UPONREQUEST(π‘ž, πœ‹) returned on Line 11, in which case note that since 𝜏 is raised continuously from 0, Line 9 ensures that 𝜏 ≤ πœ‹ at all times. Thus, OA(π‘ž, πœ‹) = 𝜏 + πœ‹ ≤ 2πœ‹, completing the proof. ∎

It remains to show subset competitiveness for the algorithm. Henceforth, fix any subset of the input 𝑄′ ⊆ 𝑄.

Proposition A.4. For every request π‘ž and 𝑣 ∈ 𝑇, we have 𝜏^𝑣_π‘ž ≤ 𝑐(𝑣).

Proof. Observe that 𝜏^𝑣_π‘ž cannot exceed 𝛿^𝑣_{𝑑(𝑣)}, for the current value of 𝑑(𝑣), or else the request is connected (or escalated to a parent node). The fact that 𝛿^𝑣_0 = 𝑐(𝑣), together with the fact that 𝛿^𝑣_𝑖 is a decreasing sequence in 𝑖 (Observation 3.4), completes the proof. ∎

We now begin to bound the (amortized) costs of the algorithm. Recall that 𝑄̄′ is the input 𝑄′ with the penalties set to infinity; that is, the prize-collecting input converted to the standard setting. We would like to prove the following lemma.

Lemma A.5. E[OA(𝑄′|𝑄)] ≤ 𝑂(log|𝑆|) · OPT_𝑆(𝑄̄′).

When the input consists of requests that are also from 𝑆, both the clients and facilities are from 𝑆, and thus on the leaves of the tree 𝑇. In this case, we define OPT_𝑇 to be any solution for the input under the metric space induced by the weighted HST 𝑇. To prove Lemma A.5, we first bound the cost of the algorithm against OPT_𝑇 on a set of clients mapped to their closest neighbors in 𝑆.

Lemma A.6. Let 𝑄̄′_𝑆 be the input formed from 𝑄̄′ by mapping each request (π‘ž, πœ‹) ∈ 𝑄̄′ to the request (𝑝(π‘ž), πœ‹). It holds that

    OA(𝑄′|𝑄) ≤ Σ_{(π‘ž,πœ‹)∈𝑄′} 𝛿(π‘ž, 𝑆) + 𝑂(𝐷) · OPT^f_𝑇(𝑄̄′_𝑆) + 𝑂(1) · OPT^c_𝑇(𝑄̄′_𝑆),

where 𝐷 = 𝑂(log|𝑆|) denotes the depth of 𝑇.

Proof. First, observe both return statements in Algorithm 2 and note that for every request (π‘ž, πœ‹) ∈ 𝑄′ it holds that

    OA(π‘ž, πœ‹) ≤ 2𝜏_π‘ž + 𝛿(π‘ž, 𝑆).    (8)

We now focus on bounding Σ_{(π‘ž,πœ‹)∈𝑄′} 𝜏_π‘ž, i.e., the total amount by which counters are raised when handling 𝑄′. Let 𝑀 be a facility opened in OPT_𝑇(𝑄̄′_𝑆). Let 𝑅 ⊆ 𝑄′ be the set of requests whose corresponding requests in 𝑄̄′_𝑆 are connected by OPT_𝑇 to the facility 𝑀. Using Observation 3.4, for every ancestor tree node 𝑣 of 𝑀, we define 𝑖_𝑣 to be the minimal index such that β„“(π‘œ^𝑣_{𝑖_𝑣}) ≤ β„“(π‘œ_𝑀) and β„“(𝛿^𝑣_{𝑖_𝑣}) ≤ β„“(𝛿_𝑇(𝑣, 𝑀)). Let 𝑃(𝑀) = (𝑣_0 = 𝑀, 𝑣_1, …, 𝑣_π‘˜ = π‘Ÿ) be the path from 𝑀 to the root. The sum Σ_{(π‘ž,πœ‹)∈𝑄′} 𝜏_π‘ž can be divided as follows:

1. Raising counters πœ†(𝑣, 𝑖) for 𝑣 ∈ 𝑃(𝑀), 𝑖 ≤ 𝑖_𝑣. The total amount here is at most

    Σ_{𝑣∈𝑃(𝑀)} Σ_{𝑖=1}^{𝑖_𝑣} πœ†(𝑣, 𝑖) ≤ Σ_{𝑣∈𝑃(𝑀)} Σ_{𝑖=1}^{𝑖_𝑣} 2^{β„“(π‘œ^𝑣_𝑖)+1} ≤ Σ_{𝑣∈𝑃(𝑀)} 2^{β„“(π‘œ^𝑣_{𝑖_𝑣})+2} ≤ Σ_{𝑣∈𝑃(𝑀)} 2^{β„“(π‘œ_𝑀)+2} ≤ Σ_{𝑣∈𝑃(𝑀)} 4π‘œ_𝑀 ≤ 4𝐷 · π‘œ_𝑀.

2. Raising counters πœ†(𝑣, 𝑖) for 𝑣 ∉ 𝑃(𝑀). Consider a request π‘ž ∈ 𝑅, define 𝑒 := 𝑝(π‘ž), and let 𝑣 be the lowest common ancestor of 𝑒 and 𝑀. The only nodes not in 𝑃(𝑀) in which counters are raised when handling π‘ž are on the path from 𝑒 (inclusive) to 𝑣 (non-inclusive).
Using Proposition A.4, the total increase in counters at these nodes is at most $\delta_T(u, \bar{v})$.

3. Raising counters $\lambda(v, i)$ for $v \in P(w)$ and $i > i_v$. Suppose that a request $q$ raises such a counter $\lambda(v_j, i)$ for some node $v_j \in P(w)$. When such a counter is raised, the proxy $s^{v_j}_{i_{v_j}}$ is already open, and thus the total raising of counters of index greater than $i_{v_j}$ at $v_j$ by $q$ is at most
$$\delta^{v_j}_{i_{v_j}} \le 2\delta_T(v_j, w) = 2\delta_T(v_j, \bar{v}) + 2\delta_T(\bar{v}, w),$$
where $\bar{v}$ is the lowest common ancestor of $u$ and $w$. (Note that other proxies of $v_j$ of larger index could be open, but they can only be closer to $v_j$, thus limiting the raising of counters even further.) Of those two costs, we would like to charge $q$ only for $2\delta_T(\bar{v}, w)$, and charge $2\delta_T(v_j, \bar{v})$ in aggregate over all $q$. To do so, observe that the counters for nodes in $P(w) \setminus \{v_j\}$ that were raised upon request $q$ must be of the form $\lambda(v_l, 1)$ for $v_l \in \{v_0, \dots, v_{j-1}\}$. As the request $q$ was repeatedly escalated from $\bar{v}$ to $v_j$, the total increase in those counters must be at least $\delta_T(\bar{v}, v_j)$, and thus $2\delta_T(v_j, \bar{v})$ is upper-bounded by twice the increase in those counters. However, as seen in Item 1, these increases sum to at most $4 D o_w$ over all $q \in R$; thus, the terms $2\delta_T(v_j, \bar{v})$ sum in aggregate to at most $8 D o_w$.

Overall, denoting by $w_q$ the lowest common ancestor of $p(q)$ and $w$, we get:
$$\sum_{(q, \pi) \in R} \tau_q \le 4 D o_w + \sum_{(q, \pi) \in R} \delta_T(p(q), w_q) + 8 D o_w + \sum_{(q, \pi) \in R} 2\delta_T(w_q, w) \le 12 D o_w + \sum_{(q, \pi) \in R} 2\delta_T(p(q), w).$$
Summing over all $w$, we get
$$\sum_{(q, \pi) \in \bar{Q}'} \tau_q \le 12 D \cdot \mathrm{OPT}^f_T(\bar{Q}'_S) + 2 \cdot \mathrm{OPT}^c_T(\bar{Q}'_S).$$
Combining with Equation (8), we get
$$\mathrm{OA}(\bar{Q}' \mid Q) \le \sum_{(q, \pi) \in \bar{Q}'} \delta(q, S) + 24 D \cdot \mathrm{OPT}^f_T(\bar{Q}'_S) + 4 \cdot \mathrm{OPT}^c_T(\bar{Q}'_S).$$

Having bounded the costs of the algorithm against $\mathrm{OPT}_T$, we can now prove Lemma A.5.

Proof of Lemma A.5. Using Lemma A.6, we get the following:
$$\mathbb{E}[\mathrm{OA}(\bar{Q}' \mid Q)] \le \sum_{(q, \pi) \in \bar{Q}'} \delta(q, S) + \mathbb{E}\big[O(\log|S|) \cdot \mathrm{OPT}^f_T(\bar{Q}'_S) + O(1) \cdot \mathrm{OPT}^c_T(\bar{Q}'_S)\big].$$
Now, note that every solution $\mathrm{OPT}_S(\bar{Q}'_S)$ induces a solution for $\bar{Q}'_S$ on $T$ which opens the same facilities and makes the same connections (through the tree); the new tree solution has the same facility opening costs, and connection costs which are, in expectation, at most $O(\log|S|)$ times greater (see Theorem 3.3). Thus, we have
$$\mathbb{E}[\mathrm{OA}(\bar{Q}' \mid Q)] \le \sum_{(q, \pi) \in \bar{Q}'} \delta(q, S) + O(\log|S|) \cdot \mathrm{OPT}_S(\bar{Q}'_S).$$
Now, note that any solution $\mathrm{OPT}_S(\bar{Q}')$ induces a solution for $\bar{Q}'_S$ of cost at most $\sum_{(q, \pi) \in \bar{Q}'} \delta(q, S) + \mathrm{OPT}_S(\bar{Q}')$, and also note that $\sum_{(q, \pi) \in \bar{Q}'} \delta(q, S)$ is a lower bound on $\mathrm{OPT}_S(\bar{Q}')$. Plugging both observations into the displayed equation above completes the proof of the lemma.

Proof of Theorem 3.1. Lemma A.1 and Observation A.2 show that the online amortization OA is valid and monotone. Proposition A.3 shows penalty robustness, while Lemma A.5 shows subset-competitiveness; thus, the algorithm is $O(\log|S|)$-PRSC using OA w.r.t. $S$. Using Proposition C.2, the algorithm is Lagrangian $O(\log|S|)$-subset-competitive using OA w.r.t. $S$.

Algorithm 4: Variant of Fotakis' Algorithm for Prize-Collecting OFLP
1 Initialization
2   Set $F \leftarrow \emptyset$.
3   Set $Q \leftarrow \emptyset$.
4   For every $v \in X$, let $p(v) \leftarrow 0$.
5 Event Function UPONREQUEST$(q, \pi)$ // upon the next request $(q, \pi)$ in the sequence, at a point of $X$
6   Set $Q \leftarrow Q \cup \{q\}$.
7   Denote by $v_0$ the closest open facility to $q$.
8   Define $\tau_q \leftarrow \min\{\pi,\ \delta(q, F),\ \min_{v \in X}\{o_v - p(v) + \delta(q, v)\}\}$.
9   if $\tau_q = \delta(q, F)$ then
10    Connect $q$ to the closest facility in $F$.
11  else if $\tau_q = o_v - p(v) + \delta(q, v)$ for some $v \in X$ then
12    Open a facility at $v$.
13    Connect $q$ to $v$.
14  else
15    Pay the penalty $\pi$ for $q$.
16  COMPUTEPOTENTIALS()
17  return $2\tau_q$ // return amortized cost
18 Function COMPUTEPOTENTIALS()
19   For every $q \in Q$, define $\lambda_q = \min\{\delta(q, F), \tau_q\}$.
20   For every location $v \in X$, set $p(v) \leftarrow \sum_{q \in Q} (\lambda_q - \delta(q, v))^+$.

B Online Facility Location: The $O(\log n)$-Competitive Algorithm

In this section, we present and analyze a prize-collecting algorithm for facility location with predictions whose competitive ratio depends logarithmically on the number of requests $n = |Q|$. As is required for using Algorithm 1, this algorithm is Lagrangian subset-competitive. The algorithm is based on the work of Fotakis [18] for the non-prize-collecting setting. Specifically, we prove the following theorem.

Theorem B.1. For facility location with predictions, there exists a deterministic prize-collecting algorithm ALG with a monotone online amortization OA which is Lagrangian $O(\log n)$-subset-competitive using OA.

B.1 The Algorithm

Algorithm description. This algorithm follows the main principles of Fotakis [18]. Each point in the metric space has an associated potential, such that when that potential exceeds the cost of opening a facility at that point, the facility is opened. This potential roughly translates to the amount by which the cost of the offline solution for the known requests would decrease upon opening a facility at that location. Upon observing a request, consider the ball centered at that request such that the closest open facility lies on the sphere of that ball; the request imposes a potential increase on every point inside that ball. However, as the requests now have penalties, these penalties cap the radius of the ball, i.e., limit the potential imposed by the requests. Specifically, the algorithm assigns each request a cost $\tau_q$, which is intuitively the minimum cost of handling the current request. This cost could be the penalty cost, the cost of connecting to an open facility, or the cost of opening a facility (beyond the current potential budget) and then connecting to it. The algorithm spends an amortized cost of $\tau_q$ to serve $q$, but a potential ball of radius $\tau_q$ is also created to serve future requests (at a future cost of at most $\tau_q$). For every $x$, we use $x^+$ as a shorthand for $\max\{0, x\}$. The prize-collecting algorithm based on [18] is given in Algorithm 4 (a schematic Python rendering appears after the proof of Lemma B.2 below).

B.1.1 Analysis

We now analyze Algorithm 4 and show that it proves Theorem B.1. For this analysis, we fix the prize-collecting input $Q$. Next, we define the online amortization OA such that $\mathrm{OA}(q, \pi)$ is the value returned by UPONREQUEST in Algorithm 4 upon the release of $(q, \pi) \in Q$.

Online Amortization. We first prove that OA is valid and monotone.

Lemma B.2. The online amortization OA for Algorithm 4 is valid, i.e., $\mathrm{ALG}(Q) \le \mathrm{OA}(Q)$.

Proof. For each request, observe the variable $\tau_q$, and note that: if the penalty $\pi$ is paid for $q$, then $\tau_q = \pi$; if $q$ is connected to some facility, the connection cost of $q$ does not exceed $\tau_q$. It remains to bound the opening costs of the algorithm. Observe the evolution of the potential function $\sum_{q \in Q} \min\{\delta(q, F), \tau_q\}$ as $Q$ and $F$ grow over time. This function is nonnegative, and grows by exactly $\tau_q$ upon the release of $(q, \pi)$ (after Line 8). Moreover, whenever a facility at $v$ is opened (thus joining $F$), this function decreases by exactly $o_v$. Thus, the total opening cost can be bounded by $\sum_{q \in Q} \tau_q$. Overall, we have bounded the cost of the algorithm by $\sum_{(q, \pi) \in Q} 2\tau_q = \sum_{(q, \pi) \in Q} \mathrm{OA}(q, \pi)$.
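The following is a minimal Python sketch of Algorithm 4, intended only to make the potential mechanism concrete. The class name, the `dist`/`open_cost` interface, and the simplified bookkeeping (connections are not tracked, and only closed facilities are considered for opening) are our own illustrative assumptions rather than the paper's notation.

```python
import math

# A minimal sketch of Algorithm 4 (prize-collecting variant of Fotakis'
# potential-based algorithm); line numbers refer to the pseudocode above.
class PrizeCollectingOFL:
    def __init__(self, points, dist, open_cost):
        self.X = list(points)
        self.dist = dist                    # dist(u, v): the metric delta
        self.o = open_cost                  # open_cost[v]: opening cost o_v
        self.F = set()                      # open facilities
        self.requests = []                  # past (q, tau_q) pairs
        self.p = {v: 0.0 for v in points}   # potentials p(v)

    def _delta_to_F(self, q):
        return min((self.dist(q, f) for f in self.F), default=math.inf)

    def upon_request(self, q, penalty):
        # Line 8: tau_q = min(penalty, connection cost, cheapest open-and-connect).
        connect = self._delta_to_F(q)
        open_cost, v_star = min(
            ((self.o[v] - self.p[v] + self.dist(q, v), v)
             for v in self.X if v not in self.F),   # closed facilities only
            default=(math.inf, None))
        tau = min(penalty, connect, open_cost)
        if tau == connect:
            pass                            # Lines 9-10: connect to the closest open facility
        elif tau == open_cost:
            self.F.add(v_star)              # Lines 11-13: open v_star and connect q to it
        # else: Line 15, pay the penalty for q
        self.requests.append((q, tau))
        self._compute_potentials()          # Line 16
        return 2 * tau                      # Line 17: the amortized cost OA(q, pi)

    def _compute_potentials(self):
        # Lines 18-20: p(v) = sum_q (lambda_q - delta(q, v))^+ over past requests.
        for v in self.X:
            self.p[v] = sum(max(0.0, min(self._delta_to_F(q), t) - self.dist(q, v))
                            for q, t in self.requests)

# Example on a tiny line metric: the first request opens a nearby facility.
alg = PrizeCollectingOFL([0.0, 5.0, 10.0], lambda a, b: abs(a - b),
                         {0.0: 4.0, 5.0: 4.0, 10.0: 4.0})
print(alg.upon_request(0.0, penalty=100.0))  # amortized cost 2 * tau = 8.0
```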
Observation B.3. The online amortization OA given for Algorithm 4 is a monotone online amortization.

B.2 Bounding Amortized Costs

Having shown the necessary properties of the online amortization, we proceed to show that Algorithm 4 is Lagrangian subset-competitive using this amortization. As in Section 3, we first show that the algorithm is PRSC (see Proposition C.2); we begin by observing the penalty robustness of the algorithm.

Observation B.4. For every $(q, \pi) \in Q$, it holds that $\mathrm{OA}(q, \pi) \le 2\pi$.

We now fix a subset $Q' \subseteq Q$ for the sake of proving subset-competitiveness. Recall that $\bar{Q}'$ is the standard input formed from the prize-collecting input $Q'$ (by setting penalties to infinity). Before proving subset-competitiveness, we need the following simple lemma.

Lemma B.5 (Min-trace lemma). Let $(a_1, \dots, a_k)$ and $(b_1, \dots, b_k)$ be two sequences of non-negative numbers, and define $c_{i,j} = \min(a_i, b_j)$. If there exists $z$ such that for every $i$ it holds that $\sum_{j=1}^{i} c_{i,j} \le z$, then $\sum_{i=1}^{k} c_{i,i} = O(\log k) \cdot z$.

Proof. We prove that $\sum_{i=1}^{k} c_{i,i} \le H_k \cdot z$ by induction on $k$, where $H_k = \sum_{i=1}^{k} \frac{1}{i}$ is the $k$-th harmonic number. The base case, in which $k = 1$, holds as $c_{1,1} \le z$. For the general case, note that if we can find $i$ such that $c_{i,i} \le \frac{z}{k}$, then we can complete the proof by induction on the sequences $(a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_k)$ and $(b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_k)$. (Note that the constraints required for this inductive instance are implied by the original constraints.) The induction implies that $\sum_{i' \ne i} c_{i',i'} \le H_{k-1} \cdot z$, to which adding $c_{i,i} \le \frac{z}{k}$ completes the proof.

It remains to find $i$ such that $c_{i,i} \le \frac{z}{k}$. We consider the constraint $\sum_{j=1}^{k} c_{k,j} \le z$, and observe the following cases.

Case 1: the $c_{k,j}$ are equal for all $j$. In this case, every $c_{k,j}$ is at most $\frac{z}{k}$. In particular, this is true for $c_{k,k}$; thus, choosing $i = k$ completes the proof.

Case 2: the $c_{k,j}$ are not all equal. In this case, let $j$ be an index minimizing $c_{k,j}$, and note that $c_{k,j} \le \frac{z}{k}$. There exists $j'$ such that $c_{k,j} < c_{k,j'}$, which implies $c_{k,j} < a_k$, and thus $c_{k,j} = b_j$, yielding $b_j \le \frac{z}{k}$. But this implies $c_{j,j} \le b_j \le \frac{z}{k}$, and thus choosing $i = j$ completes the proof.
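Lemma B.5 is easy to stress-test numerically. In the following short Python check (with our own helper names), $z$ is chosen as the tightest value satisfying the prefix constraints, and the $H_k \cdot z$ bound from the proof is never violated.

```python
import random

# Numerical sanity check of Lemma B.5 (the min-trace lemma): whenever every
# prefix constraint sum_{j <= i} min(a_i, b_j) <= z holds, the diagonal sum
# sum_i min(a_i, b_i) is at most H_k * z.
def check_min_trace(a, b):
    k = len(a)
    z = max(sum(min(a[i], b[j]) for j in range(i + 1)) for i in range(k))
    diag = sum(min(a[i], b[i]) for i in range(k))
    harmonic = sum(1.0 / i for i in range(1, k + 1))  # H_k
    assert diag <= harmonic * z + 1e-9, (diag, harmonic, z)

random.seed(0)
for _ in range(1000):
    k = random.randint(1, 30)
    check_min_trace([random.uniform(0, 10) for _ in range(k)],
                    [random.uniform(0, 10) for _ in range(k)])
```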
We can now prove subset-competitiveness, as stated in Lemma B.6.

Lemma B.6. $\mathrm{OA}(\bar{Q}' \mid Q) \le O(\log|Q'|) \cdot \mathrm{OPT}(\bar{Q}')$.

Proof. Let $w$ be some facility opened by $\mathrm{OPT}(\bar{Q}')$, and denote by $R \subseteq \bar{Q}'$ the set of requests connected to that facility in $\mathrm{OPT}(\bar{Q}')$. Define $C_w := \sum_{(q, \pi) \in R} \delta(w, q)$, the total connection cost incurred by $\mathrm{OPT}(\bar{Q}')$ at the facility $w$. Enumerate these requests as $((q_1, \pi_1), \dots, (q_k, \pi_k))$, where $k = |R|$. For $1 \le i \le k$, denote by $F_i$ the set of facilities which were open immediately before the release of $(q_i, \pi_i)$. As a shorthand, we also define $\tau_i = \tau_{q_i}$.

Consider that the total potential of the facility $w$ can never exceed its cost $o_w$; moreover, upon the release of $(q_i, \pi_i)$, the choice of $\tau_i$ ensures that
$$o_w \ge \tau_i - \delta(q_i, w) + \sum_{j=1}^{i-1} \min\big(\tau_j,\ (\delta(q_j, F_i) - \delta(q_j, w))^+\big)$$
$$\ge \tau_i - \delta(q_i, w) + \sum_{j=1}^{i-1} \min\big(\tau_j,\ \delta(q_i, F_i) - \delta(q_i, w) - 2\delta(q_j, w)\big)$$
$$\ge \tau_i - \delta(q_i, w) + \sum_{j=1}^{i-1} \min\big(\tau_j,\ \tau_i - \delta(q_i, w)\big) - 2\sum_{j=1}^{i-1} \delta(q_j, w)$$
$$\ge \tau_i - \delta(q_i, w) + \sum_{j=1}^{i-1} \min\big(\tau_j,\ \tau_i - \delta(q_i, w)\big) - 2 C_w$$
$$= \sum_{j=1}^{i} \min\big(\tau_j,\ \tau_i - \delta(q_i, w)\big) - 2 C_w, \qquad (9)$$
where the second inequality uses the triangle inequality, and the third inequality uses the definition of $\tau_i$ (namely, $\tau_i \le \delta(q_i, F_i)$). From Equation (9), we have that for every $1 \le i \le k$ it holds that
$$\sum_{j=1}^{i} \min\big(\tau_j,\ \tau_i - \delta(q_i, w)\big) \le o_w + 2 C_w.$$
Using Lemma B.5 (with $a_i = \tau_i - \delta(q_i, w)$, $b_j = \tau_j$ and $z = o_w + 2 C_w$), this yields
$$\sum_{i=1}^{k} \min\big(\tau_i,\ \tau_i - \delta(q_i, w)\big) \le O(\log k) \cdot (o_w + 2 C_w).$$
Since $\tau_i - \delta(q_i, w) = \min(\tau_i, \tau_i - \delta(q_i, w))$, and since $\sum_{i=1}^{k} \delta(q_i, w) = C_w$, we have
$$\sum_{i=1}^{k} \tau_i \le O(\log k) \cdot (o_w + C_w) \le O(\log|Q'|) \cdot (o_w + C_w).$$
Finally, summing over all facilities $w$ in $\mathrm{OPT}(\bar{Q}')$ yields
$$\sum_{(q, \pi) \in \bar{Q}'} \tau_q \le O(\log|Q'|) \cdot \mathrm{OPT}(\bar{Q}').$$

Proof of Theorem B.1. Through Lemma B.2 and Observation B.3, we have that OA is a valid and monotone amortization for Algorithm 4. Lemma B.6 and Observation B.4 then yield that the algorithm is $O(\log|Q|)$-PRSC using OA. Using Proposition C.2 yields that the algorithm is Lagrangian $O(\log|Q|)$-subset-competitive using OA, which completes the proof of the theorem.

C The Smooth Combination Framework

C.1 Proof of Theorem 2.5

Proof of Theorem 2.5. Consider the framework in Algorithm 1 applied to algorithms ALG1 and ALG2. The framework ensures that all requests are satisfied, as at least one of the constituent algorithms serves each request. Denote by $\alpha(q)$ the final value assigned to the variable $\alpha$ upon request $q$; the prize-collecting input given to both constituent algorithms is $Q' = ((q, \alpha(q)))_{q \in Q}$. We define $Q'_1, Q'_2$ to be the partition of $Q'$ induced by the partition of $Q$ into $Q_1, Q_2$. As the algorithm only buys items bought by one of the constituent algorithms, its cost can thus be bounded by $\mathrm{ALG1}(Q') + \mathrm{ALG2}(Q')$.

We now bound $\mathrm{ALG1}(Q')$; bounding $\mathrm{ALG2}(Q')$ is identical. First, consider the prize-collecting solution which serves $Q'_1$ optimally subject to using items from $S$, but pays the penalty for requests from $Q'_2$; using the Lagrangian subset-competitiveness of ALG1 against this solution yields
$$\mathbb{E}[\mathrm{ALG1}(Q')] \le O(\beta_1) \cdot \mathrm{OPT}_S(Q_1) + \mathbb{E}\Big[O(1) \cdot \sum_{q \in Q_2} \alpha(q)\Big]. \qquad (10)$$
Now, observe that using the definition of $\alpha$ and the fact that ALG2 is monotone, we have that $\alpha(q) \le \mathrm{ALG2}(q, \alpha(q))$; summing over requests in $Q_2$, we get that $\sum_{q \in Q_2} \alpha(q) \le \mathrm{ALG2}(Q'_2 \mid Q')$. Plugging into Equation (10), we get
$$\mathbb{E}[\mathrm{ALG1}(Q')] \le O(\beta_1) \cdot \mathrm{OPT}_S(Q_1) + \mathbb{E}\big[O(1) \cdot \mathrm{ALG2}(Q'_2 \mid Q')\big] \le O(\beta_1) \cdot \mathrm{OPT}_S(Q_1) + O(\beta_2) \cdot \mathrm{OPT}(Q_2),$$
where the second inequality uses the fact that ALG2 is subset-competitive to bound its cost on the subset $Q'_2$ against the solution which serves those requests optimally. This completes the bounding of costs for ALG1; we can bound $\mathbb{E}[\mathrm{ALG2}(Q')]$ in the same way. Summing the bounds for ALG1 and ALG2, we get
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\beta_1) \cdot \mathrm{OPT}_S(Q_1) + O(\beta_2) \cdot \mathrm{OPT}(Q_2),$$
which completes the proof.
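The $\alpha$-raising mechanism used in this proof can be pictured with the following schematic Python sketch. This is our own illustrative reading of the framework rather than Algorithm 1 itself; the `would_serve`/`commit` interface and the discretized raising of $\alpha$ are assumptions made only for the sketch.

```python
# Schematic sketch of the smooth combination idea behind Theorem 2.5: each
# request is offered to both prize-collecting constituents with penalty alpha,
# and alpha is raised (discretized here) until some constituent chooses to
# serve the request rather than pay the penalty.
def combine_request(q, alg1, alg2, step=0.01, alpha_max=1e6):
    alpha = 0.0
    while alpha <= alpha_max:
        # Assumed interface: would the constituent serve (q, alpha) rather
        # than pay the penalty alpha?
        serve1, serve2 = alg1.would_serve(q, alpha), alg2.would_serve(q, alpha)
        if serve1 or serve2:
            winner = alg1 if serve1 else alg2
            winner.commit(q, alpha)  # actually buy the items the winner buys
            return winner, alpha     # alpha(q) is this final penalty value
        alpha += step                # raise the penalty and ask again
    raise RuntimeError("sketch only: alpha_max reached without service")
```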
C.2 Penalty-Robust Subset-Competitive Algorithms

In proving that a prize-collecting algorithm is Lagrangian subset-competitive (for use in our framework), we sometimes find it easier to prove that it is penalty-robust subset-competitive. As we now prove, this latter property is sufficient to prove the former. (In fact, it is easy to see that the two properties are equivalent.)

Definition C.1 (PRSC algorithm using online amortization). Let ALG be a randomized prize-collecting algorithm equipped with an online amortization OA running on an input $Q$. We say that ALG is $\beta$-penalty-robust subset-competitive ($\beta$-PRSC) using OA if both of the following conditions hold:
1. For every $(q, \pi) \in Q$, we have $\mathrm{OA}(q, \pi) \le O(1) \cdot \pi$.
2. For every subset $Q' \subseteq Q$, we have $\mathbb{E}[\mathrm{OA}(Q' \mid Q)] \le \beta \cdot \mathrm{OPT}(\bar{Q}')$ (where $\bar{Q}'$ is the input formed from $Q'$ by forcing service, i.e., setting penalties to infinity).

If in the second condition of PRSC we replace $\mathrm{OPT}(\bar{Q}')$ with $\mathrm{OPT}_S(\bar{Q}')$, we say that ALG is $\beta$-PRSC using OA w.r.t. $S$.

Proposition C.2. A $\beta$-PRSC algorithm using OA (w.r.t. $S$) is also Lagrangian $\beta$-subset-competitive using OA (w.r.t. $S$).

Proof. We prove this for a general solution; restricting to $S$ is identical. Consider a prize-collecting input $Q$, and any subset $Q' \subseteq Q$. Let SOL be the optimal solution for $Q'$, which pays penalties for $Q'_p$ and serves $Q'_b = Q' \setminus Q'_p$ optimally. Then it holds that
$$\mathbb{E}[\mathrm{OA}(Q' \mid Q)] = \mathbb{E}[\mathrm{OA}(Q'_b \mid Q)] + \mathbb{E}[\mathrm{OA}(Q'_p \mid Q)] \le \beta \cdot \mathrm{SOL}^b(Q') + O(1) \cdot \sum_{(q, \pi) \in Q'_p} \pi = \beta \cdot \mathrm{SOL}^b(Q') + O(1) \cdot \mathrm{SOL}^p(Q'),$$
where the inequality uses both properties of PRSC.

D Proofs of Theorems 1.1, 1.2 and 1.3

We establish these theorems in three steps. First, we combine various constituent prize-collecting algorithms using Theorem 2.5 and explicitly state the guarantees for the resulting algorithms. Then, we use these guarantees to derive the discrete-smoothness property for the individual problems with respect to the size of the prediction (i.e., Equation (6)). Finally, we use Theorem 2.1 to make the competitive ratio depend on $|S \setminus \mathrm{OPT}|$ rather than on $|S|$.

Before proceeding further, we need to precisely define the intersection/difference of a solution with a prediction to make Theorem 1.1, Theorem 1.2, and Theorem 1.3 completely formal.

Definition D.1 (restriction of a solution to a prediction). Consider an online covering problem with items $E$, and let $S \subseteq E$ be some prediction. For every solution $A$ which buys some items from $E$:
- Define $A|_S$ to be the solution which only buys the items of $A$ belonging to $S$, to the same amount as $A$.
- Define $A|_{\bar{S}}$ to be the solution which only buys the items of $A$ outside $S$, to the same amount as $A$.

Facility Location with Predictions. In order to describe facility location as a covering problem, we must describe the set of items. Here, the set of items comprises an opening item $b_v$ for each facility and a connection item $c_{v,q}$ for each (request, facility) pair. When we informally write that $S$ is a set of possible facilities, this is formalized as the set of opening items $b_v$ for $v \in S$, plus the connection items $c_{v,q}$ for all $q$ in the input and $v \in S$.

Due to Theorem 3.1 and Theorem B.1, both Algorithm 2 and Algorithm 4 can serve as constituent algorithms in our framework. Combining the two algorithms using Theorem 2.5 thus implies the following theorem.

Theorem D.2. For facility location with predictions, there exists a randomized algorithm ALG such that for every input $Q$, and for every partition of $Q$ into $Q_1, Q_2$, we have
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|S|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|Q_2|) \cdot \mathrm{OPT}(Q_2).$$

We obtain an additional result, which is useful for small metric spaces, by combining two instances of Algorithm 2: one for the entire metric space $X$, and one for the prediction $S$.

Theorem D.3. For facility location with predictions, there exists a randomized algorithm ALG such that for every input $Q$, and for every partition of $Q$ into $Q_1, Q_2$, we have
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|S|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|X|) \cdot \mathrm{OPT}(Q_2).$$
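As a concrete reading of Definition D.1, the following small Python sketch (with illustrative item names) splits a solution, given as a map from items to bought amounts, into its restrictions inside and outside the prediction.

```python
# Definition D.1 in code: split a solution (item -> amount bought) into the
# part using only predicted items (A restricted to S) and the complement part.
def restrict(solution, S):
    inside = {e: x for e, x in solution.items() if e in S}       # A|_S
    outside = {e: x for e, x in solution.items() if e not in S}  # A|_(S complement)
    return inside, outside

# Facility-location-style items: opening items b_v and connection items c_{v,q}.
sol = {"b_v1": 1.0, "b_v2": 1.0, "c_v1_q3": 0.5}
S = {"b_v1", "c_v1_q3"}
inside, outside = restrict(sol, S)
assert set(inside) | set(outside) == set(sol) and not set(inside) & set(outside)
```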
Proof of Theorem 1.1. Consider a solution OPT to facility location on a set of requests $Q$. Partition $Q$ into $Q_1, Q_2$ such that $Q_1$ contains all requests from $Q$ that are connected to a facility in $\mathrm{OPT}|_S$ (and $Q_2$ is complementary). Using the algorithm ALG from Theorem D.2, we have
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|S|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|Q_2|) \cdot \mathrm{OPT}(Q_2). \qquad (11)$$
Now note that $\mathrm{OPT}|_S$ is a solution to $Q_1$ that only uses facility and connection items from $S$, and thus $\mathrm{OPT}_S(Q_1) \le \mathrm{OPT}|_S$. Moreover, $\mathrm{OPT}|_{\bar{S}}$ is a solution to $Q_2$, and thus $\mathrm{OPT}(Q_2) \le \mathrm{OPT}|_{\bar{S}}$. Plugging into Equation (11), and noting that $|Q_2| \le |Q|$, we get
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|S|) \cdot \mathrm{OPT}|_S + O(\log|Q|) \cdot \mathrm{OPT}|_{\bar{S}}.$$
We now plug the above equation into Theorem 2.1, thus replacing the dependence on $|S|$ with a dependence on $|S \setminus \mathrm{OPT}|$.

Proof of Theorem 1.2. Identical to the proof of Theorem 1.1, but using Theorem D.3.

Set Cover with Predictions. Theorem 4.1 implies that Algorithm 3 is Lagrangian subset-competitive. In addition, it is easy to see that Algorithm 3 is monotone, as defined in Definition 2.2. Thus, the algorithm can serve as a constituent algorithm in our framework. Combining two instances of Algorithm 3 through Theorem 2.5 thus implies the following theorem.

Theorem D.4. For fractional set cover with predictions, with universe $(E, U)$ and a prediction $S \subseteq U$, there exists a deterministic algorithm ALG such that for every input $Q$, and for every partition of $Q$ into $Q_1, Q_2$, we have
$$\mathrm{ALG}(Q) \le O(\log|S|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|U|) \cdot \mathrm{OPT}(Q_2).$$

Using standard rounding techniques for online set cover (see [3, 15]), we can round the fractional solution online at a loss of $O(\log|Q|)$. In addition, we can then apply Theorem 2.1 to replace $|S|$ with $|S \setminus \mathrm{OPT}|$. Thus, Theorem D.4 yields the following corollary.

Corollary D.5. For (integral) set cover with predictions, with universe $(E, U)$ and a prediction $S \subseteq U$, there exists a randomized algorithm ALG such that for every input $Q$, and for every partition of $Q$ into $Q_1, Q_2$, we have
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|Q| \log|S \setminus \mathrm{OPT}|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|Q| \log|U|) \cdot \mathrm{OPT}(Q_2).$$

Proof of Theorem 1.3. Consider a solution OPT to set cover on a set of requests $Q$. Partition $Q$ into $Q_1, Q_2$ such that $Q_1$ contains all requests from $Q$ that belong to a set in $\mathrm{OPT}|_S$ (and $Q_2$ is complementary). Using the randomized algorithm ALG from Corollary D.5, we have
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|Q| \log|S \setminus \mathrm{OPT}|) \cdot \mathrm{OPT}_S(Q_1) + O(\log|Q| \log|U|) \cdot \mathrm{OPT}(Q_2). \qquad (12)$$
Now note that $\mathrm{OPT}|_S$ is a solution to $Q_1$ that only uses sets from $S$, and thus $\mathrm{OPT}_S(Q_1) \le \mathrm{OPT}|_S$. Moreover, $\mathrm{OPT}|_{\bar{S}}$ is a solution to $Q_2$, and thus $\mathrm{OPT}(Q_2) \le \mathrm{OPT}|_{\bar{S}}$. Plugging into Equation (12), we get
$$\mathbb{E}[\mathrm{ALG}(Q)] \le O(\log|Q| \log|S \setminus \mathrm{OPT}|) \cdot \mathrm{OPT}|_S + O(\log|Q| \log|U|) \cdot \mathrm{OPT}|_{\bar{S}}.$$

E Proof of Theorem 2.1: Reduction from Equation (6) to Equation (1)

In this section, we give the proof of Theorem 2.1, whose goal is to give a reduction from Equation (6) to Equation (1). This replaces the term $s$ in the bound of Equation (6) with the term $s_\Delta$, where $s_\Delta := |S \setminus \mathrm{OPT}|$, in order to obtain Equation (1).

Proof of Theorem 2.1. Assume, without loss of generality, that the cheapest item in $E$ costs $1$. Consider the following construction of the algorithm ALG using the algorithm ALG':
1 Initialize $i \leftarrow 0$, $S' \leftarrow S$, $B \leftarrow 0$, and define the item cost function $c' \leftarrow c$.
2 Let $A$ be an instance of ALG' with prediction set $S'$ and cost function $c'$.
3 for each incoming request $q$ do
4   while True do
5     Simulate sending $q$ to $A$, and let $c$ be the resulting cost.
6     if $B + c < 2^i$ then break
7     Spend $2^i$ budget on buying the cheapest items in $S'$; let the bought subset of items be $T$.
8     Set $S' \leftarrow S' \setminus T$, $B \leftarrow 0$, $i \leftarrow i + 1$.
9     For every $e \in T$, set $c'(e) \leftarrow 0$.
10    Reset $A$ to be a new instance of ALG', given $S'$ as prediction, and using the (modified) cost function $c'$.
11  Send $q$ to $A$, and set $B \leftarrow B + c$.

For an integer $\ell$, define phase $\ell$ to be the subsequence of requests during which the variable $i$ takes the value $\ell$. The cost of the algorithm can be charged to a constant times $2^j$, where $j$ is the penultimate value of $i$ in the algorithm. If $2^{j-1} < \mathrm{OPT}$, then the cost of the algorithm is at most $O(1) \cdot \mathrm{OPT}$ and we are done. Henceforth, suppose $\mathrm{OPT} \le 2^{j-1}$.

Define $S'_j$, $A_j$ and $c'_j$ to be the values of the variables $S'$, $A$ and $c'$ during phase $j$. When considering the cost of a solution relative to a cost function, we place that cost function in the superscript (e.g., $\mathrm{OPT}^{c'_j}$). Before the beginning of phase $j$, the algorithm spent at least $\mathrm{OPT}$ budget on buying the cheapest items in the (remaining) prediction; it thus holds that $|S'_j| \le |S \setminus \mathrm{OPT}|$.

Let $q_1, \dots, q_k$ be the requests of phase $j$; moreover, let $q_{k+1}$ be the request upon which the variable $i$ was incremented to $j + 1$. From the definition of $q_{k+1}$, the cost of the instance of $A$ in phase $j$ on $(q_1, \dots, q_k, q_{k+1})$ is at least $2^j$; thus, the total cost of the algorithm can be charged to this cost, which we denote by $\alpha$. But, through Equation (6), and from the fact that OPT is a solution which serves $(q_1, \dots, q_{k+1})$, we have
$$\alpha \le O(f(|S'_j|)) \cdot \mathrm{OPT}^{c'_j}|_{S'_j} + O(g) \cdot \mathrm{OPT}^{c'_j}|_{\bar{S}'_j} \le O(f(|S \setminus \mathrm{OPT}|)) \cdot \mathrm{OPT}|_{S'_j} + O(g) \cdot \big(\mathrm{OPT}^{c'_j}|_{\bar{S}} + \mathrm{OPT}^{c'_j}|_{S \setminus S'_j}\big) \le O(f(|S \setminus \mathrm{OPT}|)) \cdot \mathrm{OPT}|_S + O(g) \cdot \mathrm{OPT}|_{\bar{S}},$$
where the last inequality uses the fact that items in $S \setminus S'_j$ have zero cost under $c'_j$.

F Proof of Lemma 4.4

Proof of Lemma 4.4. First, note that $\mathrm{ALG}(q, \pi) \le 3 y_q$, where $y_q$ is the final value of the variable of that name: Proposition 4.5 implies that the buying cost is at most $2 y_q$, while a penalty of $\pi$ is paid only if $\pi \le y_q$. We show that $\sum_{(q, \pi) \in Q'} y_q \le O(\log m) \cdot \mathrm{OPT}(\bar{Q}')$; since $\mathrm{ALG}(q, \pi) \le 3 y_q$, this would complete the proof of the lemma.

Consider the (standard) primal and dual LPs for fractional set cover of $Q'$ without penalties (i.e., solving $\bar{Q}'$). The primal LP is given by:
$$\min \sum_{s \in U} x_s \cdot c(s) \quad \text{such that} \quad \forall q \in \bar{Q}': \sum_{s \,:\, q \in s} x_s \ge 1 \quad \text{and} \quad \forall s \in U: x_s \ge 0,$$
and the dual LP is given by:
$$\max \sum_{q \in \bar{Q}'} y_q \quad \text{such that} \quad \forall s \in U: \sum_{q \,:\, q \in s} y_q \le c(s) \quad \text{and} \quad \forall q \in \bar{Q}': y_q \ge 0.$$
We claim that the dual solution $(y_q)_{q \in \bar{Q}'}$ violates the dual constraints by a factor of at most $O(\log m)$; thus, scaling it down by that factor yields a feasible dual solution, and hence a lower bound on $\mathrm{OPT}(\bar{Q}')$.

Consider the dual constraint corresponding to the set $s$; we want to bound the term $\sum_{q \in \bar{Q}' : q \in s} y_q$. Through induction on $k$, we can prove that once $\sum_{q \in \bar{Q}' : q \in s} y_q = k$ for some integer $k$, it holds that
$$x_s \ge \frac{1}{m}\Big(\Big(1 + \frac{1}{c(s)}\Big)^{k} - 1\Big).$$
Thus, once $k = \Theta(c(s) \log m)$ we have $x_s \ge 1$, and $\sum_{q \in s} y_q$ would increase no more. This implies that scaling down $(y_q)_{q \in \bar{Q}'}$ by $\Theta(\log m)$ yields a feasible dual solution, which lower-bounds $\mathrm{OPT}(\bar{Q}')$, and completes the proof.

It remains to prove the inductive claim. For the base case, in which $k = 0$, the claim holds trivially. Now, assume that the claim holds for $k - 1$, and consider the point at which $\sum_{q \in \bar{Q}' : q \in s} y_q$ is incremented from $k - 1$ to $k$; let $x'$ and $x''$ be the old and new amounts by which $s$ is held in the algorithm. We have
$$x'' = x'\Big(1 + \frac{1}{c(s)}\Big) + \frac{1}{|U(q)| \cdot c(s)} \ge \frac{1}{m}\Big(\Big(1 + \frac{1}{c(s)}\Big)^{k-1} - 1\Big)\Big(1 + \frac{1}{c(s)}\Big) + \frac{1}{m \cdot c(s)} = \frac{1}{m}\Big(\Big(1 + \frac{1}{c(s)}\Big)^{k} - 1\Big), \qquad (13)$$
where the inequality uses the inductive hypothesis as well as the fact that $|U(q)| \le m$.
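The growth asserted in Equation (13) is easy to verify numerically. The following Python snippet is an illustration under the worst-case assumption $|U(q)| = m$ at every increment; it checks the closed-form invariant and that $x_s$ reaches $1$ within $O(c(s) \log m)$ increments.

```python
import math

# Numerical check of the invariant behind Equation (13): under the update
# x <- x * (1 + 1/c) + 1/(m * c), after k updates we have
# x >= ((1 + 1/c)^k - 1) / m, so x reaches 1 within O(c * log m) updates.
def updates_until_covered(c, m):
    x, k = 0.0, 0
    while x < 1.0:
        x = x * (1.0 + 1.0 / c) + 1.0 / (m * c)
        k += 1
        assert x >= ((1.0 + 1.0 / c) ** k - 1.0) / m - 1e-9  # the invariant
    return k

for c in (1.0, 10.0, 100.0):
    for m in (10, 1000):
        k = updates_until_covered(c, m)
        # k is within a constant factor of c * ln(m + 1):
        assert k <= 3 * c * math.log(m + 1) + 3, (c, m, k)
```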