# Online Algorithms with Uncertainty-Quantified Predictions

Bo Sun 1, Jerry Huang 2, Nicolas Christianson 2, Mohammad Hajiesmaili 3, Adam Wierman 2, Raouf Boutaba 1

1 David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada. 2 Computing + Mathematical Sciences Department, California Institute of Technology, Pasadena, USA. 3 Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, USA. Correspondence to: Bo Sun.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which could enable additional information about prediction quality at decision time. In this work, we investigate the problem of optimally utilizing uncertainty-quantified predictions in the design of online algorithms. In particular, we study two classic online problems, ski rental and online search, where the decision-maker is provided predictions augmented with UQ describing the likelihood of the ground truth falling within a particular range of values. We demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the UQ predictions. Moreover, we consider how to utilize more general forms of UQ, proposing an online learning framework that learns to exploit UQ to make decisions in multi-instance settings.

1. Introduction

Classic online algorithms are designed to ensure worst-case performance guarantees. However, such algorithms are often overly pessimistic and perform poorly in real-world applications since worst-case instances rarely occur. To address the pessimism of these algorithms, a recent surge of work has investigated the design of algorithms utilizing machine-learned predictions (Mitzenmacher & Vassilvitskii, 2020; Lykouris & Vassilvitskii, 2021; Purohit et al., 2018). In this line of research, an algorithm is given additional information on the problem instance in the form of predictions or advice, possibly from a machine learning model. Notably, it is typically the case that no assumptions are made on prediction quality. Thus, algorithms must treat predictions as untrusted, seeking to exploit them when they are accurate while ensuring worst-case guarantees when they are not.

Driven by safety-critical applications, uncertainty quantification (UQ) has recently become a prominent field of research in machine learning. UQ aims to provide quantitative measurements of machine learning models' uncertainty about their predictions. One of the state-of-the-art methods for UQ is conformal inference (Vovk et al., 1999; 2005; Papadopoulos et al., 2002), which can transform the predictions of any black-box algorithm into a prediction interval (or prediction set) that contains the true value with high probability. Although UQ has been widely used for general decision-making under uncertainty, as in (Vovk & Bendtsen, 2018; Marusich et al., 2023; Sun et al., 2023), there has been limited study of its use for online problems.
Thus, the key question we aim to answer in this paper is: How can we incorporate uncertainty-quantified predictions into the design of competitive online algorithms?

To address this question, we require a new design objective that interpolates between worst-case analysis and average-case analysis for online algorithms. Algorithms augmented with UQ predictions have access to both predictions of future inputs and the associated prediction quality, which can be leveraged to improve upon worst-case performance; however, UQ predictions often cannot exactly reconstruct distributional information to enable typical average-case guarantees. As such, we design algorithms to minimize a new performance metric, which we term the distributionally-robust competitive ratio (DRCR): we seek algorithms that perform well on the worst-case distribution drawn from the ambiguity set determined by a given UQ. In particular, this paper makes three contributions toward designing online algorithms that leverage UQ predictions.

Optimal online algorithms under distributionally-robust analysis. We frame online algorithms with UQ predictions as a distributionally-robust online algorithm design problem. Then we design online algorithms using probabilistic interval predictions for the ski rental and online search problems, in both cases showing that they attain the optimal DRCR (see Theorems 2 and 3). These two problems have played a key role in the development of learning-augmented algorithms and thus are natural problems with which to begin the study of UQ for online algorithms.

Optimization-based algorithmic approach. Technically, we propose an optimization-based approach for incorporating UQ predictions into online algorithms. The approach consists of building an ancillary optimization problem based on the predictions, with the objective of minimizing the DRCR over hard instances of the problem. The solution of the optimization problem then yields the optimal algorithm design for the provided UQ. This approach is general in the sense that the optimization can be tuned to the specific forms of the predictions and design goals; thus, it can potentially be applied to incorporate other forms of UQ predictions and to devise algorithms for online problems beyond ski rental and online search.

Online learning for exploiting UQs across multiple instances. Finally, we propose an online learning approach that learns to exploit general forms of UQ across multiple problem instances. Here, the probabilistic interval predictions may be imperfect (e.g., due to non-exchangeability of the data), or alternative notions of UQ may be employed. We show that, under mild Lipschitzness conditions, one can obtain sublinear regret guarantees with respect to solving the full optimization formulation of the DRCR problem. We demonstrate the regret guarantees obtained by this framework on the ski rental and online search problems. Moreover, when problem instances are not fully adversarial (i.e., the distribution generating problem instances is not the worst case for the given UQ), our online learning approach outperforms the optimization-based approach, as we demonstrate in experiments in Section 5.
1.1. Related Literature

Algorithms with untrusted predictions. A significant body of work has emerged considering the design of algorithms that incorporate untrusted predictions of either problem parameters or optimal decisions (Lykouris & Vassilvitskii, 2021; Purohit et al., 2018; Mahdian et al., 2012; Mitzenmacher & Vassilvitskii, 2022; Wei & Zhang, 2020; Antoniadis et al., 2020; Christianson et al., 2023; Sun et al., 2021). However, in nearly all of these works, predictions are assumed to be point predictions of problem parameters or decisions, i.e., individual untrusted values with no further assumptions on quality, uncertainty, probability of correctness, etc. Several recent studies have considered alternative prediction paradigms, including the setting of learning predictions from samples or distributional advice (Anand et al., 2020; Diakonikolas et al., 2021; Besbes et al., 2022; Khodak et al., 2022), and predictions that are assumed to be correct with a certain, known or unknown probability (Gupta et al., 2022). Our work is distinguished from these prior results in that we consider a more general class of uncertainty-quantified predictions. In particular, our model of probabilistic interval predictions allows for predictions that fall into a certain interval with a given probability, thus generalizing the prediction paradigm of (Gupta et al., 2022) to one more closely matched to uncertainty quantification methods in the machine learning literature.

Online learning. Our online learning-based approach to utilizing UQ builds on techniques from the online learning literature, specifically online learning with side information and data-driven algorithm design. The problem of exploiting additional side information to improve performance in online learning has been widely studied in both bandit (Agrawal & Goyal, 2013; Slivkins, 2011; Bastani & Bayati, 2020) and partial/full-feedback (Hazan & Megiddo, 2007; Dekel et al., 2017; Kuzborskij & Cesa-Bianchi, 2020) settings. Our work is most closely related to the results in (Hazan & Megiddo, 2007); however, our results go beyond the Lipschitz assumptions on the policy class employed there, and we show that one can exploit Lipschitzness of any upper bound on the instance cost to enable competing against general policies when exploiting UQ in online problems. Our online learning formulation is also aligned with the data-driven algorithm design framework (Balcan, 2020; Balcan et al., 2018), which adaptively selects parameterized algorithms across multiple instances without using side information. Our work extends that problem setting by exploring how to utilize additional UQ predictions in the algorithm selection and by showing regret guarantees under mild Lipschitzness assumptions.

2. Online Algorithms with UQ Predictions

For an online cost minimization problem, let I denote the set of all instances. For each instance I ∈ I, let ALG(A, I) and OPT(I) denote, respectively, the (expected) cost attained by an online algorithm A and the cost of the offline solution. Under the classic competitive analysis framework (Borodin & El-Yaniv, 2005), online algorithms have no prior knowledge of the instance I.
Algorithmic design is framed as a single-instance min-max problem, with the objective of finding an online algorithm A that minimizes the worst-case competitive ratio, i.e., max_{I∈I} ALG(A, I)/OPT(I).

To improve the performance of online algorithms and go beyond worst-case analysis, research has recently emerged on algorithms with (untrusted) predictions (Mitzenmacher & Vassilvitskii, 2022; Lykouris & Vassilvitskii, 2021; Purohit et al., 2018). In an abstract setup, we consider an input instance of an online problem that can be characterized by a critical value V (e.g., the number of skiing days for the ski rental problem). Machine learning tools can be leveraged to make a prediction P about the critical value V. In most scenarios, the quality of the prediction is unknown to the online decision-maker; hence, the goal of algorithm design with predictions is to guarantee good performance when the prediction is accurate (i.e., consistency) while still maintaining worst-case guarantees regardless of the prediction accuracy (i.e., robustness). Let I_P ⊆ I denote a consistent set that contains all instances conforming with the prediction P. Then the consistency η and robustness γ of an online algorithm A are defined as

η = max_{I∈I_P} ALG(A, I)/OPT(I) and γ = max_{I∈I} ALG(A, I)/OPT(I), (1)

which are the worst-case ratios over I_P and I, respectively. Prior work has shown that there exist strong trade-offs between consistency and robustness bounds (Purohit et al., 2018; Wei & Zhang, 2020; Balseiro et al., 2023; Sun et al., 2021; Bamas et al., 2020). Therefore, algorithms with predictions usually provide a parameterized class of online algorithms (using a hyper-parameter λ) to achieve different trade-offs. Since the prediction quality is unknown, the selection of the hyper-parameter is left to end users.

In practice, we often have access to some form of uncertainty quantification about the prediction of the input instance. We model an uncertainty-quantified (UQ) prediction by a vector θ := {P; Q}, where P is the prediction and Q specifies the quality of the prediction. For a given θ, we assume that the instance I is drawn from a fixed unknown distribution ξ_θ. If ξ_θ can be completely specified by θ, average-case analysis aims to design the online algorithm that minimizes the expected competitive ratio, i.e., E_{ξ_θ}[ALG(A, I)/OPT(I)]. However, in most cases, UQ can only partially specify the instance distribution, and thus an interpolation between worst-case and average-case analysis is desired.

2.1. Distributionally-Robust Competitive Analysis

When UQ can coarsely characterize the instance distribution, for a given θ we can construct an ambiguity set D_θ that includes all instance distributions conforming with the UQ. An important example of such UQ is a probabilistic quantification of prediction correctness, in which case the ambiguity set contains all distributions that conform with such predictions. An online algorithm A can then be designed to minimize the distributionally-robust competitive ratio (DRCR)

DRCR_θ(A) = max_{ξ_θ∈D_θ} E_{ξ_θ}[ALG(A, I)/OPT(I)], (2)

which is the worst expected competitive ratio over instance distributions from D_θ. This design objective can be considered an interpolation between worst-case and average-case analysis. For average-case analysis of competitive algorithms, the performance can be evaluated by either the expectation of ratios or the ratio of expectations.
We choose to define the DRCR as the expectation of ratios for two reasons. First, this metric is more commonly considered as the average-case performance measure for the ski rental and online search problems (e.g., (Fujiwara & Iwama, 2005) and (Fujiwara et al., 2011)), which are the focus of this paper. Second, the DRCR defined via the expectation of ratios can be shown to be a convex combination of the consistency and robustness of online algorithms with untrusted predictions. This connection makes the DRCR appealing as an extension of the consistency-robustness metric when additional information on prediction quality is available.

Probabilistic interval predictions. One important class of UQ that can be leveraged for distributionally-robust analysis is probabilistic interval predictions (PIP).

Definition 1. For ℓ ≤ u and δ ∈ [0, 1], PIP(θ) with θ = {ℓ, u; δ} is called a probabilistic interval prediction for a critical value V of an input instance; it predicts that, with probability at least 1−δ, the true value V is within [ℓ, u], i.e., P(V ∈ [ℓ, u]) ≥ 1−δ.

In the literature on algorithms with untrusted predictions, the untrusted prediction P is the special case of PIP(ℓ, u; δ) in which the prediction is a point prediction ℓ = u = P and there is no guarantee on this prediction, i.e., δ = 1. A PIP can be obtained through conformal prediction (Vovk et al., 1999; 2005; Papadopoulos et al., 2002). Given exchangeable data, conformal prediction can transform the outputs of any black-box predictor into a prediction set/interval that covers the true value with high probability. In particular, conformal inference certifies that the prediction P of the critical value V is accurate within an error ε with probability at least 1−δ, i.e., P(|V − P| ≤ ε) ≥ 1−δ. In this case, the prediction quality is characterized by the prediction error ε and the prediction confidence δ. Equivalently, we can frame this UQ as a PIP(θ) over the instance, i.e., P(V ∈ [ℓ, u]) ≥ 1−δ, where ℓ := P − ε and u := P + ε.

For a given PIP, let I_{ℓ,u} ⊆ I denote a consistent set that contains all instances conforming with the interval prediction. Then the ambiguity set D_θ can include all instance distributions such that P_{ξ_θ}(I ∈ I_{ℓ,u}) ≥ 1−δ for all ξ_θ ∈ D_θ, i.e., under distribution ξ_θ, the probability that an instance I belongs to the set I_{ℓ,u} is at least 1−δ. Further, we can observe that the worst instance distribution maximizing the DRCR in Equation (2) is a two-point distribution, placing probability 1−δ on instance I_η and probability δ on instance I_γ, where I_η = argmax_{I∈I_{ℓ,u}} ALG(A, I)/OPT(I) and I_γ = argmax_{I∈I} ALG(A, I)/OPT(I). Thus, the DRCR of an online algorithm with PIP θ can be transformed into

(1−δ)·max_{I∈I_{ℓ,u}} ALG(A, I)/OPT(I) + δ·max_{I∈I} ALG(A, I)/OPT(I) := (1−δ)η + δγ, (3)

where η and γ are the consistency and robustness of algorithms with untrusted interval predictions.
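To make the conformal route to a PIP concrete, the following is a minimal sketch of split conformal prediction in Python; the predictor `predict`, the calibration arrays, and all names are hypothetical illustrations, and exchangeability of the calibration data is assumed.

```python
import numpy as np

def conformal_pip(predict, X_cal, y_cal, x_test, delta):
    """Split conformal prediction: wrap a black-box point predictor into a
    probabilistic interval prediction PIP(l, u; delta), i.e., an interval
    [l, u] with P(V in [l, u]) >= 1 - delta under exchangeability."""
    # Nonconformity scores: absolute residuals on held-out calibration data.
    scores = np.abs(np.asarray(y_cal) - np.array([predict(x) for x in X_cal]))
    n = len(scores)
    # Finite-sample-corrected empirical quantile of the scores.
    k = min(int(np.ceil((n + 1) * (1 - delta))), n)
    eps = np.sort(scores)[k - 1]
    P = predict(x_test)
    return P - eps, P + eps, delta  # theta = {l, u; delta}
```

Given the resulting θ = {ℓ, u; δ} and the consistency-robustness pair (η, γ) of a candidate algorithm, Equation (3) evaluates its DRCR as (1−δ)η + δγ.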
2.2. An Optimization-Based Algorithmic Approach

We introduce an optimization-based algorithmic approach for the single-instance distributionally-robust analysis that can be leveraged to systematically design online algorithms with UQ predictions. We focus on a class of parameterized online algorithms. Let A(w) denote the online algorithm with parameter w ∈ Ω, where Ω is the parameter set. Designing an online algorithm augmented by a UQ prediction PIP(θ) then amounts to finding a policy π ∈ Π : Θ → Ω that maps θ to an online algorithm A(w).

We propose to design the policy by solving an ancillary optimization problem. We start by constructing a family of representative hard instances H ⊆ I and parameterized algorithms {A(w)}_{w∈Ω} for the online problem, based on problem-specific knowledge. Let ALG(w, I) and OPT(I) denote the costs of the online algorithm A(w) and the offline algorithm under instance I ∈ H. Given the prediction θ, we can further determine a subset H_{ℓ,u} of H containing the instances that conform with the interval prediction, i.e., H_{ℓ,u} = I_{ℓ,u} ∩ H. Then, we formulate an optimization problem to minimize the DRCR over all parameterized algorithms under such hard instances:

min_{η,γ≥1; w∈Ω} (1−δ)η + δγ (4a)
s.t. ALG(w, I) ≤ η·OPT(I), ∀I ∈ H_{ℓ,u}, (4b)
ALG(w, I) ≤ γ·OPT(I), ∀I ∈ H. (4c)

Each constraint in (4b) or (4c) ensures that the ratio between the expected cost of the online algorithm and the cost of the offline optimum is upper bounded by η or γ, respectively. Restricted to the hard instances H, the variables η and γ represent the consistency and robustness of the algorithm A(w), and the objective directly optimizes the DRCR over all parameterized algorithms. Let {η*, γ*, w*} denote the optimal solution of the above problem. We then propose to use A(w*) as the online algorithm with UQ prediction θ. Since the optimization problem (4) is based on hard instances, its optimal objective provides a lower bound on the DRCR over the parameterized algorithms.

Proposition 1. No parameterized algorithm A(w), w ∈ Ω, can achieve a DRCR smaller than (1−δ)η* + δγ*.

This lower bound extends to all online algorithms if the parameterized algorithms can characterize all online algorithms under the hard instances (e.g., see the examples in Sections 3 and 4). Note that the ancillary problem often involves an infinite number of variables and constraints, corresponding to the high dimension of the parameter w and the cardinality of the hard instance set H. This necessitates efficient methods for obtaining (approximately) optimal solutions to problem (4). Furthermore, although the optimization gives a lower bound on the target performance, it is essential to additionally establish an upper bound on the DRCR of the algorithm A(w*) devised from the solution of the optimization. Developing an online algorithm with matching upper and lower bounds requires simultaneously constructing the hard instances carefully, crafting the parameterized algorithms, and (approximately) solving the ancillary optimization problem. In Sections 3 and 4, we showcase how to use this approach to design online algorithms that make the best use of a given UQ prediction to minimize the DRCR in two classic online problems: the ski rental problem and the online search problem.

3. Ski Rental Problem with UQ Prediction

Problem statement. A player aims to ski for an unknown time horizon of N ∈ Z+ days. Each day, she must decide whether to rent skis, which costs $1 for that day, or to buy skis at a cost of $B (B ∈ Z+) and ski for free from then on. The goal is to minimize the total cost of buying and renting skis. The difficulty of the problem lies in the uncertain time horizon N. If N is known in advance, the optimal decision is to buy at the beginning if N ≥ B and to keep renting otherwise. When N is completely unknown, a deterministic online algorithm can achieve a competitive ratio of 2 (Karlin et al., 1988), and this can be improved to e/(e−1) by randomization (Karlin et al., 1990).
Both results have been proven optimal in the worst case. In previous work on the learning-augmented setting of ski rental, the algorithm is assumed to additionally have access to a deterministic point prediction P of the time horizon N, but has no information on the quality of this prediction. There exist both deterministic and randomized algorithms that attain the Pareto-optimal trade-off between consistency and robustness (Purohit et al., 2018; Wei & Zhang, 2020; Bamas et al., 2020).

We study online algorithms for ski rental with UQ predictions. In particular, UQ about N is given in the form of a probabilistic interval prediction θ = {ℓ, u; δ}, i.e., the time horizon N is predicted to lie in Z+_{ℓ,u} := {ℓ, ℓ+1, . . . , u} with probability at least 1−δ. Instead of making a rent-or-buy decision each day, online decision-making for ski rental can be described as an online (randomized) algorithm with a (random) variable Y ∈ Z+ that keeps renting skis until day Y − 1 (if the time horizon has not ended) and buys on day Y. We aim to leverage the UQ prediction to design the determination of Y so that the DRCR is minimized. In Section 3.1, we first introduce a deterministic algorithm as a warm-up to provide insights on algorithms with probabilistic predictions, and then in Section 3.2 we propose an optimal randomized algorithm augmented with probabilistic interval predictions using the optimization-based approach.

3.1. Warm-up: A Deterministic Algorithm

We first focus on a deterministic algorithm for ski rental with a probabilistic point prediction PPP(P; δ), which forecasts that the skiing horizon is P with probability at least 1−δ. To simplify the presentation, we show the results for a continuous version of ski rental, where the number of skiing days increases continuously and N, B, Y ∈ R+.

A simple meta-algorithm. Based on the definition in Equation (3), the DRCR of an online algorithm is a linear combination of the consistency and robustness of an algorithm with untrusted predictions. Therefore, we can devise a simple meta-algorithm by leveraging existing consistent and robust algorithms. Let LAP(λ) denote the algorithm with untrusted prediction P designed in (Purohit et al., 2018) for a hyper-parameter λ ∈ (0, 1]. In particular, LAP(λ) sets the day of purchase to Y = B/λ if P < B and Y = Bλ otherwise. LAP(λ) has been proven to be (1+λ)-consistent and (1+1/λ)-robust. A simple meta-algorithm can then take LAP(λ) as input and select the parameter λ to optimize the DRCR. Specifically, it determines λ_δ = argmin_{λ∈(0,1]} (1−δ)(1+λ) + δ(1+1/λ) = min{√(δ/(1−δ)), 1}, and the meta-algorithm is given by LAP(λ_δ). Further, the DRCR of LAP(λ_δ) is derived as

χ(δ) = { 1 + 2√(δ(1−δ)), δ ∈ [0, 1/2]; 2, δ ∈ (1/2, 1]. } (5)

The meta-algorithm improves the DRCR beyond the worst-case competitive ratio of 2 when the prediction quality is high (δ ∈ [0, 1/2]), with χ(δ) rapidly converging to 1 as δ approaches 0. Nonetheless, the prediction becomes ineffective as its quality deteriorates beyond δ > 1/2, reducing the meta-algorithm to the worst-case performance. A fundamental question thus remains: can we extract benefit from low-quality predictions? Furthermore, the DRCR of the meta-algorithm is independent of the prediction P, since LAP(λ) treats the prediction P as untrusted and does not leverage its quality δ in its design. Instead, the prediction quality is used only for the hyper-parameter selection.
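The meta-algorithm's tuning admits a short closed-form implementation. The sketch below is our illustration (function name is hypothetical), assuming 0 < δ < 1 so the formulas are well defined; it computes λ_δ, the resulting buying day, and χ(δ).

```python
import math

def meta_ski_rental(P, B, delta):
    """Meta-algorithm for ski rental with a probabilistic point prediction:
    tune LAP's hyper-parameter to minimize (1-delta)(1+lam) + delta(1+1/lam),
    then map it to a buying day as in Purohit et al. (2018).
    Assumes 0 < delta < 1."""
    lam = min(math.sqrt(delta / (1 - delta)), 1.0)
    Y = B / lam if P < B else B * lam          # pro-rent vs. pro-buy prediction
    chi = 1 + 2 * math.sqrt(delta * (1 - delta)) if delta <= 0.5 else 2.0
    return Y, chi                              # buying day and DRCR chi(delta)
```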
These limitations of the meta-algorithm motivate us to design a new algorithm capable of harnessing probabilistic predictions more effectively.

Algorithm 1 DSR: Deterministic algorithm for ski rental
1: input: prediction PPP(P; δ), buying cost B;
2: if P < B then determine Y = B;
3: else if P ∈ (2B, +∞) then determine Y = B·min{√(δ/(1−δ)), 1};
4: else if P ∈ [B, 2B] then
5:   if χ(δ) ≤ δ + P/B then determine Y = B·min{√(δ/(1−δ)), 1};
6:   else determine Y = P;
7: buy skis on day Y.

An optimal deterministic algorithm. In Algorithm 1, we introduce a new deterministic algorithm, referred to as DSR. This algorithm operates within distinct prediction regions: (i) in the pro-rent region, defined as P ∈ (0, B), the algorithm purchases on day B regardless of the specific prediction and prediction quality; (ii) in the pro-buy region, defined as P ∈ (2B, +∞), the algorithm makes an early purchase within the initial B days, with the specific day determined by the design of DSR; (iii) in the rent-or-buy region, P ∈ [B, 2B], the algorithm can opt to buy on the predicted day P or make a purchase within the first B days. The decision in this case is influenced by both the prediction and its quality, creating a nuanced trade-off between buying and renting. We show that DSR achieves the optimal DRCR among all deterministic algorithms.

Theorem 1. Given a PPP(P; δ), DSR is the optimal deterministic algorithm for ski rental and achieves the DRCR

{ 1 + δ, P ∈ (0, B); min{χ(δ), δ + P/B}, P ∈ [B, 2B]; χ(δ), P ∈ (2B, +∞). }

The crux of DSR's design lies in identifying the dominant decision when the prediction falls within each of the distinct prediction regions. DSR achieves an improved DRCR over the meta-algorithm for any given prediction. The performance gain can be attributed to the explicit utilization of both the prediction and its associated quality in the decision-making process. In particular, even in scenarios where the prediction quality is low (δ > 1/2), DSR still manages to improve the DRCR, especially when the prediction satisfies P < 2B; in such cases, any probabilistic information from the prediction can mitigate the worst-case scenarios.

Although we can extend the ideas of DSR to incorporate a probabilistic interval prediction (see Appendix B.2 for more details), it becomes increasingly complicated to identify the dominant decisions. In the following section, we show that we can instead design algorithms using the more systematic optimization-based approach proposed in Section 2.2.

Algorithm 2 RSR(y): Randomized algorithm for ski rental
1: input: purchase distribution y ∈ Y;
2: draw a buying day Y from the distribution y;
3: rent skis up to day Y − 1 and buy on day Y.

3.2. An Optimal Randomized Algorithm

We now introduce a more general randomized algorithm with a probabilistic interval prediction PIP(ℓ, u; δ). We consider a parameterized algorithm RSR(y) (described in Algorithm 2) with the purchase probabilities y := {y(t)}_{t∈Z+} as the parameter. Specifically, y(t) denotes the probability of purchasing on day t. Any online randomized algorithm for ski rental can be captured by y := {y(t)}_{t∈Z+}, where y is a distribution of the buying day with support Z+, and the feasible set of y is Y := {y : Σ_{t∈Z+} y(t) = 1, y(t) ≥ 0 ∀t ∈ Z+}. Let I_N denote an instance of the ski rental problem with time horizon N. We consider the hard instance set H := I = {I_N}_{N∈Z+}, which in fact contains all instances of the ski rental problem.
The instances conforming with the interval prediction can then be denoted by H_{ℓ,u} = {I_N}_{N∈Z+_{ℓ,u}}. Given an instance I_N, the expected cost of the randomized algorithm RSR(y) is ALG(y, I_N) = Σ_{t=1}^{N} (B+t−1)·y(t) + N·Σ_{t=N+1}^{+∞} y(t), and the cost of the offline algorithm is OPT(I_N) = min{N, B}. Given a PIP(ℓ, u; δ), we can formulate an optimization problem (4) to minimize the DRCR. Let {η*, γ*, y*} and CR*_sr denote the optimal solution and the optimal objective value of the problem, respectively. The optimal randomized algorithm is then given by RSR(y*).

Theorem 2. Given a PIP(ℓ, u; δ), the DRCR of RSR(y*) is CR*_sr. Further, CR*_sr is the optimal DRCR for ski rental.

The optimization-based approach used to design RSR(y*) is general in the sense that it can be tuned to design other algorithms for related problems. For example, one can derive a deterministic algorithm with PIP(ℓ, u; δ) by replacing the feasible set with Ŷ := {y : Σ_{t∈Z+} y(t) = 1, y(t) ∈ {0, 1} ∀t ∈ Z+}, which restricts the decisions to be deterministic. This systematic design stands in contrast to the ad-hoc development of the deterministic algorithm discussed in the previous section. Given that Ŷ ⊆ Y, the ancillary problem for the randomized algorithm is a relaxation of that for the deterministic algorithm; thus, RSR(y*) outperforms the optimal deterministic algorithm.

The optimization problem for ski rental is a linear program with an infinite number of variables and constraints. To solve for y*, we show that the problem can be reduced to an equivalent problem with a finite number of variables and constraints. Therefore, y* can be computed optimally and efficiently by standard linear programming.

Lemma 1. Problem (4) for ski rental can be reduced to an optimization with O(B) variables and O(B) constraints.
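To illustrate, here is a minimal sketch (our code, not the paper's exact reduction) that solves a finite truncation of problem (4) for ski rental with scipy; in the spirit of Lemma 1, buy days beyond max(B, u+1) are merged into earlier decisions, so only finitely many variables remain.

```python
import numpy as np
from scipy.optimize import linprog

def rsr_lp(B, l, u, delta):
    """Solve a finite version of problem (4) for ski rental: find a buy-day
    distribution y minimizing (1-delta)*eta + delta*gamma. By Lemma 1, buy
    days and instances beyond T = max(B, u+1) are redundant, so truncating
    there loses nothing (this sketch keeps a few redundant rows for clarity)."""
    T = max(B, u + 1)
    nv = T + 2                          # variables: y(1..T), eta, gamma
    c = np.zeros(nv)
    c[T], c[T + 1] = 1 - delta, delta   # objective: (1-delta)*eta + delta*gamma
    A_ub, b_ub = [], []
    for N in range(1, T + 1):           # one constraint per instance I_N
        row = np.zeros(nv)
        for t in range(1, T + 1):       # E[ALG] = sum_{t<=N}(B+t-1)y(t) + N*P(Y>N)
            row[t - 1] = B + t - 1 if t <= N else N
        row[T if l <= N <= u else T + 1] = -min(N, B)  # eta inside [l,u], else gamma
        A_ub.append(row); b_ub.append(0.0)
    row = np.zeros(nv); row[T], row[T + 1] = 1.0, -1.0  # eta <= gamma
    A_ub.append(row); b_ub.append(0.0)
    A_eq = np.zeros((1, nv)); A_eq[0, :T] = 1.0         # y is a distribution
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * T + [(1, None), (1, None)])
    y, eta, gamma = res.x[:T], res.x[T], res.x[T + 1]
    return y, eta, gamma, (1 - delta) * eta + delta * gamma
```

For example, `rsr_lp(B=10, l=8, u=15, delta=0.1)` returns a buy-day distribution together with (η*, γ*) and the corresponding DRCR (1−δ)η* + δγ*.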
4. Online Search Problem with UQ Prediction

Problem statement. A player seeks to sell one unit of a resource over a sequence of prices {v_n}_{n∈[N]} that arrive online. In response to each price v_n, the player must immediately decide the amount x_n of the remaining resource to sell (earning v_n·x_n), without knowledge of future prices or the sequence length N. If any resource remains unsold at the last step N, it is compulsorily sold at the final price v_N. The player's goal is to maximize the total profit Σ_{n∈[N]} v_n·x_n. Following the standard assumption (El-Yaniv et al., 2001; Lorenz et al., 2009), prices are chosen (possibly adversarially) from a bounded interval, i.e., v_n ∈ [m, M] for all n ∈ [N], where m > 0. In prior work, several optimal deterministic algorithms exist (e.g., the threat-based algorithm (El-Yaniv et al., 2001) and the threshold-based algorithm (Sun et al., 2020)) that achieve the optimal worst-case competitive ratio α* = O(ln(M/m)). Since it is known that randomization does not improve the performance of algorithms for one-way trading problems (El-Yaniv et al., 2001; Im et al., 2021), we focus on deterministic algorithms in this section.

In online search, if the actual maximum price is known in advance, the offline optimal algorithm simply waits until the maximum price to sell the whole resource. Previous work on online search with machine-learned advice has considered point predictions of the maximum price (Sun et al., 2021). Following this prediction paradigm, in our setting we consider a probabilistic interval prediction PIP(ℓ, u; δ) of the maximum price, which represents a prediction that the maximum price V lies within the interval [ℓ, u] with probability at least 1−δ. In the following, we design algorithms to minimize the DRCR given predictions of the form PIP(ℓ, u; δ).

4.1. An Optimal-Protection-Function-Based Algorithm

We first introduce a class of protection-function-based algorithms (PFA) in Algorithm 3. A PFA is parameterized by a protection function G(v) : [m, M] → [0, 1] that defines the maximum cumulative selling amount upon receiving a price v ∈ [m, M]. A PFA then only sells the resource if the current price v_n is the maximum among all previous prices, and the selling amount is G(v_n) − G(v̂), where v̂ is the previous maximum price. PFA can optimally solve the one-way trading problem when the protection function is given by G(v) = 0 for v ∈ [m, α*m) and G(v) = (1/α*)·ln((v − m)/(α*m − m)) for v ∈ [α*m, M], where α* = O(ln(M/m)) is the optimal worst-case competitive ratio. Let PFA(G) denote the algorithm with protection function G. In the following, we aim to redesign the protection function G for PFA based on the solution of an optimization problem for a given PIP, and show that PFA(G*) attains the optimal DRCR.

Algorithm 3 PFA(G): Protection-function-based algorithm
1: input: protection function G;
2: initialize the running maximum price v̂ = m;
3: for n = 1, . . . , N − 1 do
4:   sell x_n = [G(v_n) − G(v̂)]^+;
5:   update v̂ = max{v_n, v̂};
6: end for
7: x_N = 1 − G(v̂).

Optimization problem based on hard instances. We consider a family of hard instances H := {I_V}_{V∈[m,M]}, where I_V consists of a sequence of prices that continuously increase from m to V and then drop to the lowest price m at the end. Under any instance from {I_V}_{V∈(v,M]}, PFA(G) sells G(v + dv) − G(v) amount of resource at price v when the running maximum price increases from v to v + dv for some small dv. For notational convenience, we define a new parameter q(v) := [G(v + dv) − G(v)]/dv for v ∈ [m, M]. The protection function G is uniquely determined by q := {q(v)}_{v∈[m,M]}, and the feasible set of q is Q = {q : q(v) ≥ 0 ∀v ∈ [m, M], ∫_m^M q(v)dv ≤ 1}. Since online decisions are irrevocable and all instances in {I_V}_{V∈(v,M]} share the same prefix (i.e., the price sequence continuously increasing from m to v), q(v) is the same for all {I_V}_{V∈(v,M]}. Moreover, any online algorithm corresponds to a solution q := {q(v)}_{v∈[m,M]} under the hard instances, and thus we can use q to model all online algorithms. Under an instance I_V, the profit of the online algorithm modeled by q is ALG(q, I_V) = ∫_m^V v·q(v)dv + (1 − ∫_m^V q(v)dv)·m, where the first term is the profit of selling over prices from m to V and the second term is the profit of the compulsory sale at the last price. The offline algorithm sells the entire unit at the maximum price, so OPT(I_V) = V. Given a PIP(ℓ, u; δ), we can formulate an optimization problem (4) to minimize the DRCR under the hard instances. Let {η*, γ*, q*} and CR*_os denote the optimal solution and the optimal objective value. Based on q*, we build a protection function G*(v) = ∫_m^v q*(s)ds, v ∈ [m, M], and propose PFA(G*) as the algorithm for online search.

Theorem 3. Given a PIP(ℓ, u; δ), the DRCR of PFA(G*) is CR*_os, and CR*_os is optimal for online search.
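As a concrete reference point, the sketch below implements Algorithm 3 together with the worst-case-optimal protection function quoted above (no prediction used). This is our illustrative code: the fixed-point equation α = ln((M−m)/(αm−m)), which makes G(M) = 1, and the bisection bounds are our modeling assumptions.

```python
import math

def worst_case_G(m, M):
    """Worst-case-optimal protection function: G(v) = 0 on [m, alpha*m) and
    (1/alpha)*ln((v - m)/(alpha*m - m)) on [alpha*m, M], with alpha chosen so
    that G(M) = 1, i.e., alpha = ln((M - m)/(alpha*m - m)); found by bisection."""
    lo, hi = 1.0 + 1e-12, math.log(M / m) + 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid > math.log((M - m) / (mid * m - m)):
            hi = mid
        else:
            lo = mid
    alpha = (lo + hi) / 2
    return lambda v: 0.0 if v < alpha * m else \
        math.log((v - m) / (alpha * m - m)) / alpha

def run_pfa(G, prices, m):
    """Algorithm 3: sell only when a new running-maximum price arrives;
    the remainder is compulsorily sold at the final price."""
    v_hat, profit, sold = m, 0.0, 0.0
    for v in prices[:-1]:
        x = max(G(v) - G(v_hat), 0.0)          # x_n = [G(v_n) - G(v_hat)]^+
        profit += v * x
        sold += x
        v_hat = max(v_hat, v)
    return profit + prices[-1] * (1.0 - sold)  # x_N = 1 - G(v_hat)
```

The PIP-aware G* of Theorem 3 would instead come from solving problem (4); this worst-case G effectively corresponds to the δ = 1 baseline that PFA(G*) improves upon.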
Proof of Theorem 3. Note that PFA(G*) only sells the resource when the current price is the running maximum or when it is the last price. Thus, for any instance I = {v_n}_{n∈[N]}, we can instead focus on a new instance I′ = {v′_n}_{n∈[N′+1]}, where {v′_n}_{n∈[N′]} are the N′ strictly increasing running-maximum prices of I and v′_{N′+1} = v_N. Thus, we can lower bound the profit of PFA(G*) by

ALG(q*, I) = ALG(q*, I′) (6a)
= Σ_{n∈[N′]} v′_n·∫_{v′_{n−1}}^{v′_n} q*(v)dv + [1 − G*(v′_{N′})]·v′_{N′+1} (6b)
≥ Σ_{n∈[N′]} ∫_{v′_{n−1}}^{v′_n} v·q*(v)dv + [1 − G*(v′_{N′})]·m (6c)
= ∫_{v′_0}^{v′_{N′}} v·q*(v)dv + [1 − G*(v′_{N′})]·m, (6d)

where v′_0 := m. In addition, we have OPT(I) = OPT(I′) = v′_{N′}. Since q* is the optimal solution of the optimization problem (4), we have OPT(I)/ALG(q*, I) ≤ η* when v′_{N′} ∈ [ℓ, u] and OPT(I)/ALG(q*, I) ≤ γ* when v′_{N′} ∈ [m, ℓ) ∪ (u, M]. Thus, the DRCR of PFA(G*) is (1−δ)η* + δγ* = CR*_os. Since the parameterized class PFA(G) captures the performance of all online algorithms under the hard instances H, it follows from Proposition 1 that no online algorithm can achieve a DRCR smaller than CR*_os.

The optimization (4) is a problem with an infinite number of variables and constraints. To obtain the solution, we propose a discrete approximation. Letting Ĝ* denote the protection function built from the solution of the approximate problem, the following lemma shows that PFA(Ĝ*) achieves a DRCR close to that of PFA(G*).

Lemma 2. For a given parameter ϵ > 0, there exists a discrete approximation problem with O(ln(M/m)/ln(1+ϵ)) variables and constraints for problem (4) of online search. Further, PFA(Ĝ*) achieves DRCR at most CR*_os + ϵM/m.

5. Learning Algorithms with UQ Prediction

In previous sections, we focused on a single instance of an online problem with UQ prediction θ, and designed algorithms to optimize the DRCR, i.e., the expected cost ratio under the worst-case distribution in the ambiguity set built from the UQ prediction. In practice, however, the conditional distribution ξ_θ is often not the worst-case one. Furthermore, the PIP may be imprecise due to non-exchangeability of the data or distribution shift, and alternative notions of UQ may be employed; see, e.g., (Abdar et al., 2021). For instance, one may approximate predictive distributions through Monte-Carlo methods, but these approximations may be imprecise. In such cases, it may not be tractable to formulate proper ambiguity sets. This motivates us to consider a multi-instance setting, using online learning to learn the intrinsic correlation between UQ predictions and instance costs, as well as to go beyond the DRCR guarantees.

Online learning formulation. The idea is to learn the mapping, or policy, from any given UQ prediction to an online algorithm over T rounds. At the beginning of round t ∈ [T], we receive a UQ prediction θ_t ∈ Θ about the input instance I_t. We then select an algorithm parameter w_t = π_t(θ_t) ∈ Ω using a chosen policy π_t ∈ Π : Θ → Ω, and run the online algorithm A(w_t) to execute the instance I_t on the fly. At the end of the round, we observe the entire instance I_t, drawn from the unknown conditional distribution ξ_{θ_t}, and the cost function f_t := f_t(w_t; θ_t) = ALG(A(w_t), I_t)/OPT(I_t) : Ω → R+, which is the ratio of the cost of the online algorithm A(w_t) to the offline optimal cost under instance I_t. In our formulation, the goal is to compete against a function U_t := U_t(w_t; θ_t) that upper bounds the expected cost function, i.e., U_t(w_t; θ_t) ≥ E_{ξ_{θ_t}}[f_t(w_t; θ_t)] for all w_t ∈ Ω. This upper bound exhibits certain properties (e.g., Lipschitzness) that allow one to conduct online learning against it. We aim to select policies {π_t}_{t∈[T]}, which determine the parameter selections {w_t}_{t∈[T]} based on the UQ predictions {θ_t}_{t∈[T]}, to minimize the policy regret over T instances,

PREG_T = Σ_{t∈[T]} [E_{ξ_{θ_t}} f_t(π_t(θ_t); θ_t) − U_t(π*(θ_t); θ_t)], (7)

where π* = argmin_{π∈Π} Σ_{t∈[T]} U_t(π(θ_t); θ_t).
In general, it is impossible to obtain sublinear policy regret if we do not impose any restrictions on the cost functions with respect to the UQ. This is because the instance in each round can depend only on the newly received UQ and may be unrelated to past observations. Thus, we assume a form of local regularity encoding the notion that similar UQ predictions should yield similar instance costs. In this paper, we consider cost upper bounds that are L-Lipschitz in θ, i.e., for any θ_i, θ_j ∈ Θ, sup_{w∈Ω} |U_i(w; θ_i) − U_j(w; θ_j)| ≤ L·‖θ_i − θ_j‖. The goal of the online learning algorithm is to achieve sublinear regret with respect to any cost upper bound function U_t satisfying this local regularity condition. This allows our approach to adapt to the inherent difficulty of the problem: the closer the expected cost E_{ξ_{θ_t}} f_t is to being L-Lipschitz, the tighter the cost upper bound U_t is to the true expected cost, and the more optimally the algorithm performs. Furthermore, the DRCR studied in the previous sections is by definition an upper bound on the expected cost, and for certain forms of UQ, including PIP predictions, the DRCR is Lipschitz with respect to the UQ. This means that our approach can at least compete against the optimal DRCR, and it outperforms the DRCR-optimal algorithms when distributions are not adversarial.

Algorithms and results. Using the algorithmic framework of (Hazan & Megiddo, 2007), one can obtain sublinear policy regret as defined above. The main idea is to cover the space of UQ predictions Θ with an ϵ-net, where an instance of a sublinear-regret algorithm (e.g., the randomized exponentiated gradients algorithm) is assigned to each point in the net. Whenever a UQ prediction falls into one of the ϵ-balls, only the algorithm instance assigned to that ball is run and updated. Thus, every algorithm instance is run only on similar problem instances. In this way, the algorithm exploits local regularities in the UQ space to achieve improved guarantees; a minimal sketch of this construction is given below.

[Figure 1. Comparison of the cumulative empirical ratios (minus 1) of the following algorithms. WOA: worst-case optimal randomized algorithm that is e/(e−1)-competitive; FTP: follow-the-prediction algorithm that fully trusts the prediction; OL-Dynamic: online learning with respect to policy regret, leveraging UQ predictions; OL-Static: online learning with respect to static regret, without considering UQ predictions; RSR-PIP: randomized algorithm with PIP (Algorithm 2) that achieves the optimal DRCR.]

Specifically, given that the cost upper bound is L-Lipschitz, we show that an ϵ-net-based algorithm guarantees a sublinear policy regret of O(T^{1−1/(d+2)}) for general UQ, where d is the covering dimension of the provided UQs. To attain this result, we extend the algorithm and regret analysis of (Hazan & Megiddo, 2007) by (i) showing how the algorithm can exploit local regularities beyond Lipschitz policies for convex functions and (ii) refining the regret analysis for competing against cost upper bounds exhibiting Lipschitzness. As concrete examples, we apply the ϵ-net algorithm to derive policy regret guarantees on the DRCR for the ski rental and online search problems with PIP θ = {ℓ, u; δ}. Indeed, under mild conditions the DRCR exhibits Lipschitzness with respect to θ, which we prove using the optimization problems from previous sections. In particular, for ski rental we can exploit the fact that {ℓ, u} are discrete parameters to achieve an improved regret of O(T^{2/3}) compared to the general guarantee of O(T^{4/5}).
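The following is a minimal sketch of the ϵ-net construction (our illustrative code, not the paper's exact algorithm): it assumes a finite candidate grid W for the algorithm parameter and full-information feedback, and runs one exponential-weights learner per ϵ-cell of the UQ space.

```python
import numpy as np

class EpsNetLearner:
    """One exponential-weights (Hedge) learner per eps-cell of the UQ space
    Theta; whenever a prediction theta falls into a cell, only that cell's
    learner is consulted and updated, exploiting local regularity of U_t."""
    def __init__(self, W, eps, eta):
        self.W, self.eps, self.eta = W, eps, eta   # candidates, net width, step
        self.weights = {}                          # cell -> weights over W

    def _cell(self, theta):
        return tuple(np.floor(np.asarray(theta, float) / self.eps).astype(int))

    def select(self, theta, rng):
        w = self.weights.setdefault(self._cell(theta), np.ones(len(self.W)))
        return int(rng.choice(len(self.W), p=w / w.sum()))  # index into W

    def update(self, theta, costs):
        """costs[i] = ALG(A(W[i]), I_t) / OPT(I_t), evaluable once I_t is seen."""
        self.weights[self._cell(theta)] *= np.exp(-self.eta * np.asarray(costs))
```

Each round, `select` plays w_t = π_t(θ_t); once the instance completes, the cost ratio of every candidate in W can be evaluated and passed to `update`.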
For online search, we introduce a discretization of the UQ space to obtain Lipschitzness and achieve regret O(T^{4/5}). The detailed algorithms and results are in Appendix D.

Empirical results. Figure 1 compares the empirical competitive ratios (CRs) of our proposed online algorithms in the setting of a multi-instance ski rental problem. The setup details can be found in Appendix D.6. All of our proposed algorithms use UQ to improve performance compared to those that are worst-case optimized (i.e., WOA) or that use the predictions blindly (i.e., FTP). Initially, RSR-PIP outperforms all other algorithms, since it is designed to achieve the optimal DRCR, allowing it to perform well before the online learning approaches have had time to learn. As the number of instances increases, the cumulative CR of our proposed online learning algorithm OL-Dynamic increases sublinearly, gradually approaching and then outperforming RSR-PIP. This is because the distribution used to generate the problem instances is not the worst case for the given UQ. Thus, OL-Dynamic can better learn to use UQ for non-worst-case distributions, while RSR-PIP, designed for this worst case, performs more conservatively over the long run. This emphasizes the importance of the online learning approaches in multi-instance settings in real-world applications, where adversarial distributions rarely occur. In addition, the online learning algorithm OL-Static, which is designed for static regret, can also gradually learn to achieve performance comparable to the optimal DRCR solution, but fails to improve much beyond it. This further validates the importance of our policy regret guarantees compared to classic static regret, which can be obtained without UQ.

6. Concluding Remarks

This paper has developed two paradigms for incorporating uncertainty-quantified predictions into the design and analysis of online algorithms. For UQ predictions that are descriptive and enable a tractable ambiguity set about the future input to be constructed, we proposed an optimization-based approach that utilizes the predictions and minimizes a form of distributionally-robust competitive ratio on a per-instance basis. We applied this approach to design optimal online algorithms for two classic online problems with UQ predictions: ski rental and online search. Additionally, we devised an online learning approach that can learn to utilize the predictions across multiple instances and attain sublinear regret under mild Lipschitz conditions. We posit that both paradigms for incorporating uncertainty-quantified advice in online decision-making hold promise for designing algorithms using UQ for other online problems, and can enable better and more reliable use of machine learning in general online decision-making.

Acknowledgements

Bo Sun and Raouf Boutaba acknowledge the NSERC Discovery Grant RGPIN-2019-06587. Jerry Huang is supported by a Caltech Summer Undergraduate Fellowship and the Kiyo and Eiko Tomiyasu SURF Fund. Nicolas Christianson and Adam Wierman acknowledge the support of an NSF Graduate Research Fellowship (DGE-2139433) and NSF Grants CNS-2146814, CPS-2136197, CNS-2106403, and NGSDI-2105648. The work of Mohammad Hajiesmaili is supported by NSF Grants CPS-2136199, CNS-2106299, CNS-2102963, CCF-2325956, and CAREER-2045641. We also thank Yiheng Lin and Yisong Yue for insightful discussions.
Impact Statement

This paper presents research concerned with utilizing uncertainty quantification to improve machine learning-augmented online decision-making. While there are a number of potential societal consequences of our work, we do not feel these must be specifically highlighted here.

References

Abdar, M., Pourpanah, F., Hussain, S., Rezazadegan, D., Liu, L., Ghavamzadeh, M., Fieguth, P., Cao, X., Khosravi, A., Acharya, U. R., Makarenkov, V., and Nahavandi, S. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76:243-297, December 2021.

Agrawal, S. and Goyal, N. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning, pp. 127-135. PMLR, 2013.

Anand, K., Ge, R., and Panigrahi, D. Customizing ML predictions for online algorithms. In Proceedings of the 37th International Conference on Machine Learning, pp. 303-313. PMLR, November 2020.

Antoniadis, A., Coester, C., Elias, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. In International Conference on Machine Learning, pp. 345-355. PMLR, 2020.

Balcan, M.-F. Data-driven algorithm design. arXiv preprint arXiv:2011.07177, 2020.

Balcan, M.-F., Dick, T., and Vitercik, E. Dispersion for data-driven algorithm design, online learning, and private optimization. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 603-614. IEEE, 2018.

Balseiro, S., Kroer, C., and Kumar, R. Single-leg revenue management with advice. In Proceedings of the 24th ACM Conference on Economics and Computation, pp. 207-207, 2023.

Bamas, E., Maggiori, A., and Svensson, O. The primal-dual method for learning augmented algorithms, 2020.

Bastani, H. and Bayati, M. Online decision making with high-dimensional covariates. Operations Research, 68(1):276-294, 2020.

Besbes, O., Ma, W., and Mouchtaki, O. Beyond IID: Data-driven decision-making in heterogeneous environments. In Advances in Neural Information Processing Systems, May 2022.

Borodin, A. and El-Yaniv, R. Online Computation and Competitive Analysis. Cambridge University Press, 2005.

Christianson, N., Shen, J., and Wierman, A. Optimal robustness-consistency tradeoffs for learning-augmented metrical task systems. In International Conference on Artificial Intelligence and Statistics, pp. 9377-9399. PMLR, 2023.

Dekel, O., Flajolet, A., Haghtalab, N., and Jaillet, P. Online learning with a hint. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

Diakonikolas, I., Kontonis, V., Tzamos, C., Vakilian, A., and Zarifis, N. Learning online algorithms with distributional advice. In International Conference on Machine Learning, pp. 2687-2696. PMLR, 2021.

El-Yaniv, R., Fiat, A., Karp, R. M., and Turpin, G. Optimal search and one-way trading online algorithms. Algorithmica, 30:101-139, 2001.

Fujiwara, H. and Iwama, K. Average-case competitive analyses for ski-rental problems. Algorithmica, 42:95-107, 2005.

Fujiwara, H., Iwama, K., and Sekiguchi, Y. Average-case competitive analyses for one-way trading. Journal of Combinatorial Optimization, 21(1):83-107, 2011.

Gupta, A., Panigrahi, D., Subercaseaux, B., and Sun, K. Augmenting online algorithms with ε-accurate predictions. Advances in Neural Information Processing Systems, 35:2115-2127, 2022.

Hazan, E. and Megiddo, N. Online learning with prior knowledge. In Bshouty, N. H. and Gentile, C. (eds.), Learning Theory, pp. 499-513, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.

Im, S., Kumar, R., Montazer Qaem, M., and Purohit, M. Online knapsack with frequency predictions.
Advances in Neural Information Processing Systems, 34:2733-2743, 2021.

Karlin, A. R., Manasse, M. S., Rudolph, L., and Sleator, D. D. Competitive snoopy caching. Algorithmica, 3:79-119, 1988.

Karlin, A. R., Manasse, M. S., McGeoch, L. A., and Owicki, S. Competitive randomized algorithms for non-uniform problems. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 301-309, 1990.

Khodak, M., Balcan, M.-F., Talwalkar, A., and Vassilvitskii, S. Learning predictions for algorithms with predictions, 2022.

Kuzborskij, I. and Cesa-Bianchi, N. Locally-adaptive nonparametric online learning, 2020.

Lorenz, J., Panagiotou, K., and Steger, A. Optimal algorithms for k-search with application in option pricing. Algorithmica, 55(2):311-328, 2009.

Lykouris, T. and Vassilvitskii, S. Competitive caching with machine learned advice. Journal of the ACM (JACM), 68(4):1-25, 2021.

Mahdian, M., Nazerzadeh, H., and Saberi, A. Online optimization with uncertain information. ACM Transactions on Algorithms, 8(1):1-29, January 2012. ISSN 1549-6325, 1549-6333. doi: 10.1145/2071379.2071381.

Marusich, L. R., Bakdash, J. Z., Zhou, Y., and Kantarcioglu, M. Using AI uncertainty quantification to improve human decision-making. arXiv preprint arXiv:2309.10852, 2023.

McMahan, H. B. A survey of algorithms and analysis for adaptive online learning, 2015.

Mitzenmacher, M. and Vassilvitskii, S. Algorithms with predictions, 2020.

Mitzenmacher, M. and Vassilvitskii, S. Algorithms with predictions. Communications of the ACM, 65(7):33-35, 2022.

Papadopoulos, H., Proedrou, K., Vovk, V., and Gammerman, A. Inductive confidence machines for regression. In European Conference on Machine Learning, 2002.

Purohit, M., Svitkina, Z., and Kumar, R. Improving online algorithms via ML predictions. Advances in Neural Information Processing Systems, 31, 2018.

Shalev-Shwartz, S. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194, February 2012.

Slivkins, A. Contextual bandits with similarity information. In Proceedings of the 24th Annual Conference on Learning Theory, pp. 679-702. JMLR Workshop and Conference Proceedings, 2011.

Sun, B., Zeynali, A., Li, T., Hajiesmaili, M., Wierman, A., and Tsang, D. H. Competitive algorithms for the online multiple knapsack problem with application to electric vehicle charging. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(3):1-32, 2020.

Sun, B., Lee, R., Hajiesmaili, M., Wierman, A., and Tsang, D. Pareto-optimal learning-augmented algorithms for online conversion problems. Advances in Neural Information Processing Systems, 34:10339-10350, 2021.

Sun, C., Liu, S., and Li, X. Maximum optimality margin: A unified approach for contextual linear programming and inverse linear programming. In International Conference on Machine Learning, pp. 32886-32912. PMLR, 2023.

Vovk, V. and Bendtsen, C. Conformal predictive decision making. In Conformal and Probabilistic Prediction and Applications, pp. 52-62. PMLR, 2018.

Vovk, V., Gammerman, A., and Saunders, C. Machine-learning applications of algorithmic randomness. In Proceedings of the Sixteenth International Conference on Machine Learning, pp. 444-453, 1999.

Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World, volume 29. Springer, 2005.

Wei, A. and Zhang, F. Optimal robustness-consistency trade-offs for learning-augmented online algorithms.
Advances in Neural Information Processing Systems, 33:8042-8053, 2020.

A. Proof of Proposition 1

Given a probabilistic interval prediction PIP(ℓ, u; δ), for any parameterized algorithm A(w̃) with w̃ ∈ Ω, let η̃ and γ̃ denote its consistency and robustness. By definition (1), we have

η̃ = max_{I∈I_{ℓ,u}} ALG(w̃, I)/OPT(I) and γ̃ = max_{I∈I} ALG(w̃, I)/OPT(I),

where I and I_{ℓ,u} are the entire instance set and the instance subset containing all instances conforming with the interval prediction, respectively. We have H ⊆ I and H_{ℓ,u} ⊆ I_{ℓ,u} since H and H_{ℓ,u} contain only hard instances. Thus, {η̃, γ̃, w̃} is a feasible solution of the optimization problem (4). Consequently, the DRCR of A(w̃) is (1−δ)η̃ + δγ̃ ≥ (1−δ)η* + δγ*, where η* and γ* are the optimal solution of problem (4). Therefore, the DRCR of any parameterized algorithm is lower bounded by (1−δ)η* + δγ*.

B. Technical Proofs and Supplementary Results for Ski Rental with UQ Predictions

B.1. Proof of Theorem 1

To design deterministic algorithms for the ski rental problem, we first derive the distributionally-robust competitive ratio (DRCR) when the buying strategy Y operates in different prediction regions, and then choose the strategy that minimizes the DRCR. For notational convenience, we let ALG and OPT denote the costs of the online and offline algorithms, and let CR be the ratio of ALG to OPT.

Case I: P < B. We derive the cost ratios when Y falls in different regions.

Case I(a): when 0 < Y ≤ P:
- if 0 < N < Y, we have ALG = OPT = N, and CR = 1;
- if Y ≤ N < P, we have ALG = Y + B and OPT = N, and thus CR = (Y+B)/N ≤ (Y+B)/Y;
- if N = P, we have ALG = Y + B and OPT = P, and thus CR = (Y+B)/P;
- if N > P, we have ALG = Y + B and OPT = min{N, B}, and thus CR = (Y+B)/min{N, B} ≤ (Y+B)/Y.

Thus, we have DRCR = (1−δ)·(Y+B)/P + δ·(Y+B)/Y, which is minimized by

Y = { √(δBP/(1−δ)), δ ∈ [0, P/(P+B)]; P, δ ∈ (P/(P+B), 1] },

and the corresponding DRCR is

{ δ + (1−δ)B/P + 2√(δ(1−δ)B/P), δ ∈ [0, P/(P+B)]; 1 + B/P, δ ∈ (P/(P+B), 1] }.

Case I(b): when P < Y ≤ B:
- if 0 < N < Y, we have ALG = OPT = N, and CR = 1;
- if Y ≤ N, we have ALG = Y + B and OPT = min{N, B}, and thus CR = (Y+B)/min{N, B} ≤ (Y+B)/Y.

Thus, we have DRCR = 1 − δ + δ·(Y+B)/Y, which is minimized when Y = B, giving DRCR = 1 + δ.

Case I(c): when Y > B:
- if 0 < N ≤ B, we have ALG = OPT = N, and CR = 1;
- if B < N < Y, we have ALG = N and OPT = B, and thus CR = N/B ≤ Y/B;
- if N ≥ Y, we have ALG = Y + B and OPT = B, and thus CR = (Y+B)/B.

Thus, we have DRCR = 1 − δ + δ·(Y+B)/B, which is minimized when Y = B, giving DRCR = 1 + δ.

By comparing the DRCRs of the three sub-cases, Y = B minimizes the DRCR when P < B, and the minimum DRCR is 1 + δ.

Case II: P > B. Consider the following sub-cases.

Case II(a): when 0 < Y ≤ B:
- if 0 < N < Y, we have ALG = OPT = N, and CR = 1;
- if Y ≤ N ≤ B, we have ALG = Y + B and OPT = N, and thus CR = (Y+B)/N ≤ (Y+B)/Y;
- if B < N, we have ALG = Y + B and OPT = B, and thus CR = (Y+B)/B.

Thus, we have DRCR = (1−δ)·(Y+B)/B + δ·(Y+B)/Y, which is minimized when Y = B·min{√(δ/(1−δ)), 1}, giving

DRCR = χ(δ) := { 1 + 2√(δ(1−δ)), δ ∈ [0, 1/2]; 2, δ ∈ (1/2, 1] }.

Case II(b): when B < Y ≤ P:
- if 0 < N ≤ B, we have ALG = OPT = N, and CR = 1;
- if B < N < Y, we have ALG = N and OPT = B, and thus CR = N/B ≤ Y/B;
- if Y ≤ N ≤ P, we have ALG = Y + B and OPT = B, and thus CR = (Y+B)/B;
- if P < N, we have ALG = Y + B and OPT = B, and thus CR = (Y+B)/B.

Thus, we have DRCR = (Y+B)/B, which is minimized as Y → B, giving DRCR → 2.
Case II(c): when Y > P:
- if 0 < N ≤ B, we have ALG = OPT = N, and CR = 1;
- if B < N ≤ P, we have ALG = N and OPT = B, and thus CR = N/B ≤ P/B;
- if P < N < Y, we have ALG = N and OPT = B, and thus CR = N/B ≤ Y/B;
- if N ≥ Y, we have ALG = Y + B and OPT = B, and thus CR = (Y+B)/B.

Thus, we have DRCR = (1−δ)·P/B + δ·(Y+B)/B, which is minimized when Y → P, yielding DRCR = δ + P/B.

Comparing the DRCRs of the above three sub-cases: when P > 2B, we have χ(δ) ≤ δ + P/B for all δ ∈ [0, 1], and thus the optimal buying day is Y = B·min{√(δ/(1−δ)), 1}. When B ≤ P ≤ 2B, the optimal buying strategy is Y = B·min{√(δ/(1−δ)), 1} if χ(δ) ≤ δ + P/B, and Y = P otherwise.

B.2. Deterministic Algorithms with Probabilistic Interval Predictions

We can extend the ideas of DSR to incorporate a probabilistic interval prediction PIP(ℓ, u; δ). The extension adheres to a decision structure similar to that of DSR when the predicted interval clearly favors buying decisions (i.e., B < ℓ ≤ u) or renting decisions (i.e., ℓ ≤ u < B). Additional complexity arises when ℓ ≤ B ≤ u. In such cases, the optimal decision depends heavily on the prediction interval and its quality, leading to three possible scenarios: (i) making a pro-rent decision by purchasing at the predicted interval's upper bound u; (ii) opting for a pro-buy decision by purchasing within the first ℓ days; or (iii) buying on day B when the prediction proves unhelpful. The full algorithm is presented in Algorithm 4, its DRCR is given in Theorem 4, and the proof of Theorem 4 follows the same idea as that of Theorem 1.

Algorithm 4 Online deterministic algorithm with PIP for ski rental
1: input: prediction PIP(ℓ, u; δ), buying cost B;
2: if ℓ ≤ u < B then
3:   set Y = B;
4: else if B < ℓ ≤ u then
5:   if χ(δ) ≤ δ + u/B then
6:     set Y = B·min{√(δ/(1−δ)), 1};
7:   else
8:     set Y = u;
9:   end if
10: else if ℓ ≤ B ≤ u then
11:   if ζ(δ, ℓ) ≥ 2 and δ + u/B ≥ 2 then
12:     set Y = B;
13:   else if ζ(δ, ℓ) ≤ δ + u/B then
14:     set Y = ℓ·min{√(Bδ/(ℓ(1−δ))), 1};
15:   else
16:     set Y = u;
17:   end if
18: end if
19: buy skis on day Y.

Theorem 4. Given a PIP(ℓ, u; δ), Algorithm 4 for ski rental achieves the DRCR

{ 1 + δ, if ℓ ≤ u < B; min{χ(δ), δ + u/B}, if B < ℓ ≤ u; min{ζ(δ, ℓ), δ + u/B, 2}, if ℓ ≤ B ≤ u }, (8)

where

ζ(δ, ℓ) := { δ + (1−δ)B/ℓ + 2√(δ(1−δ)B/ℓ), δ ∈ [0, ℓ/(ℓ+B)); 1 + B/ℓ, δ ∈ [ℓ/(ℓ+B), 1] }.

Further, the attained DRCR is optimal among all deterministic algorithms for ski rental with probabilistic interval predictions.

B.3. Proof of Theorem 2

First, the optimization formulation for the ski rental problem with a probabilistic interval prediction PIP(ℓ, u; δ) can be formally stated as follows:

min_{η,γ,y} (1−δ)η + δγ (9a)
s.t. Σ_{t=1}^{N} (B+t−1)y(t) + N·Σ_{t=N+1}^{∞} y(t) ≤ η·min{N, B}, ∀N ∈ Z+_{ℓ,u}, (9b)
Σ_{t=1}^{N} (B+t−1)y(t) + N·Σ_{t=N+1}^{∞} y(t) ≤ γ·min{N, B}, ∀N ∈ Z+ \ Z+_{ℓ,u}, (9c)
1 ≤ η ≤ γ, y ∈ Y. (9d)

Each constraint N from either (9b) or (9c) ensures that the ratio between the expected cost of the online algorithm and the cost of the offline optimum is upper bounded by η or γ, respectively. The constraint 1 ≤ η ≤ γ guarantees that Σ_{t=1}^{N} (B+t−1)y(t) + N·Σ_{t=N+1}^{∞} y(t) ≤ γ·min{N, B} also holds for all N ∈ Z+_{ℓ,u}, which corresponds to constraint (4c) in the optimization problem (4).

Given any instance I_N, the expected cost of RSR(y*) is ALG(y*, I_N) = Σ_{t=1}^{N} (B+t−1)y*(t) + N·Σ_{t=N+1}^{∞} y*(t), and the offline optimal cost is OPT(I_N) = min{N, B}. Thus, based on the definition in Equation (3), η* and γ* are the consistency and robustness of the algorithm with untrusted interval prediction [ℓ, u], respectively.
B.4. Proof of Lemma 1

The proof is based on the optimization formulation (9). Let $C_N$ denote the constraint indexed by $N$ from constraints (9b) and (9c). Then problem (9) can be reduced to an optimization with a finite number of constraints and variables. In particular, the variables $\{y(t)\}_{t \in \mathbb{Z}^+}$ and constraints $\{C_N\}_{N \in \mathbb{Z}^+}$ (i.e., constraints (9b) and (9c)) can be reduced as follows: when $\ell \le u < B$, only the $B$ variables $\{y(t)\}_{t \in \mathbb{Z}^+_{1,B}}$ and $B$ constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,B}}$ are non-redundant; when $B < \ell \le u$ or $\ell \le B \le u$, only the $B+1$ variables $\{y(t)\}_{t \in \mathbb{Z}^+_{1,B} \cup \{u+1\}}$ and $B+1$ constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,B-1} \cup \{u, u+1\}}$ are non-redundant.

We start by showing a structural property of constraints (9b) and (9c). Let $\mathrm{RHS}(N)$ and $\mathrm{LHS}(y, N)$ denote the right-hand side and the left-hand side of constraint $C_N$, respectively. Given a feasible solution $\{\eta, \gamma, y\}$, if there exists $k \in \mathbb{Z}^+$ such that $\mathrm{RHS}(k) \ge \mathrm{RHS}(k+1)$, we can move the probability mass from $y(k+1)$ to $y(k)$ and obtain a new solution $\hat{y}$, i.e.,
$$\hat{y}(t) = \begin{cases} y(t) + y(t+1), & t = k \\ 0, & t = k+1 \\ y(t), & \text{otherwise}, \end{cases}$$
and the solution $\{\eta, \gamma, \hat{y}\}$ is also feasible for problem (9). To show this, we make the following claims.

Claim 1. $\{\eta, \gamma, \hat{y}\}$ satisfies the constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,k-1}}$. Note that $\mathrm{LHS}(y, N) = \sum_{t=1}^{N} (B+t-1)\, y(t) + N[1 - \sum_{t=1}^{N} y(t)]$. Then we have $\mathrm{LHS}(\hat{y}, N) = \mathrm{LHS}(y, N)$ for all $N \le k-1$, and thus the first $k-1$ constraints remain feasible for $\hat{y}$.

Claim 2. $\{\eta, \gamma, \hat{y}\}$ satisfies the constraints $\{C_N\}_{N \in \mathbb{Z}^+_{k+1,\infty}}$. When $N \ge k+1$, we have $\mathrm{LHS}(\hat{y}, N) - \mathrm{LHS}(y, N) = (B+k-1)[\hat{y}(k) - y(k)] - (B+k)\, y(k+1) = -y(k+1) \le 0$, and thus all constraints from $k+1$ onward remain feasible for $\hat{y}$.

Claim 3. $\{\eta, \gamma, \hat{y}\}$ satisfies the constraint $C_k$. Note that $\mathrm{LHS}(\hat{y}, k)$ is dominated by $\mathrm{LHS}(\hat{y}, k+1)$ since $\mathrm{LHS}(\hat{y}, k+1) - \mathrm{LHS}(\hat{y}, k) = 1 - \sum_{t=1}^{k} \hat{y}(t) + (B-1)\hat{y}(k+1) \ge 0$. Therefore, if $\mathrm{RHS}(k) \ge \mathrm{RHS}(k+1)$ and $C_{k+1}$ is satisfied, $C_k$ is also feasible for $\hat{y}$.

Combining the above three claims gives the structural property. Next, we show the reduction of the variables $\{y(t)\}_{t \in \mathbb{Z}^+}$ and constraints $\{C_N\}_{N \in \mathbb{Z}^+}$ in problem (9) as follows.

Case I: $\ell \le u < B$. In this case, $\mathrm{RHS}(N)$ remains the same for all $N \ge B$. Therefore, we can iteratively move all the probability mass from $\{y(t)\}_{t \in \mathbb{Z}^+_{B+1,\infty}}$ to $y(B)$ and obtain the new feasible solution
$$\hat{y}(t) = \begin{cases} y(t), & t \in \mathbb{Z}^+_{1,B-1} \\ \sum_{\tau=B}^{\infty} y(\tau), & t = B \\ 0, & t \in \mathbb{Z}^+_{B+1,\infty}. \end{cases}$$
Since $\hat{y}(t) = 0$ for all $t \in \mathbb{Z}^+_{B+1,\infty}$, all variables $\{y(t)\}_{t \in \mathbb{Z}^+_{B+1,\infty}}$ and constraints $\{C_N\}_{N \in \mathbb{Z}^+_{B+1,\infty}}$ are redundant. Thus, we only need to focus on the $B$ variables $\{y(t)\}_{t \in \mathbb{Z}^+_{1,B}}$ and $B$ constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,B}}$.

Case II: $B < \ell \le u$. Note that (i) $\mathrm{RHS}(N) = \gamma B$ remains the same when $N \in \mathbb{Z}^+_{u+1,\infty}$; and (ii) $\mathrm{RHS}(N)$ is non-increasing in $N$ when $N \in \mathbb{Z}^+_{B,u}$, since $\mathrm{RHS}(N) = \gamma B$ for $N \in \mathbb{Z}^+_{B,\ell-1}$ and $\mathrm{RHS}(N) = \eta B$ for $N \in \mathbb{Z}^+_{\ell,u}$. Based on the structural property, we can iteratively move the probability mass from $\{y(t)\}_{t \in \mathbb{Z}^+_{u+2,\infty}}$ to $y(u+1)$, and from $\{y(t)\}_{t \in \mathbb{Z}^+_{B+1,u}}$ to $y(B)$. This gives a new feasible solution
$$\hat{y}(t) = \begin{cases} y(t), & t \in \mathbb{Z}^+_{1,B-1} \\ \sum_{\tau=B}^{u} y(\tau), & t = B \\ 0, & t = B+1, \ldots, u \\ \sum_{\tau=u+1}^{\infty} y(\tau), & t = u+1 \\ 0, & t \in \mathbb{Z}^+_{u+2,\infty}. \end{cases} \tag{10}$$
Thus, we can focus on the $B+1$ variables $\{y(t)\}_{t \in \mathbb{Z}^+_{1,B} \cup \{u+1\}}$. For $N \in \mathbb{Z}^+_{B,u}$, note that $\mathrm{LHS}(\hat{y}, N)$ is non-decreasing and $\mathrm{RHS}(N)$ is non-increasing in $N$. Thus, the last constraint $C_u$ is the most difficult one and we can focus on $C_u$ alone. For $N \in \mathbb{Z}^+_{u+1,\infty}$, we can focus on the constraint $C_{u+1}$ alone since $\hat{y}(t) = 0$ for all $t \in \mathbb{Z}^+_{u+2,\infty}$. Thus, we only need to consider the constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,B-1} \cup \{u,u+1\}}$.

Case III: $\ell \le B \le u$. In this case, we have that (i) $\mathrm{RHS}(N) = \gamma B$ remains the same when $N \in \mathbb{Z}^+_{u+1,\infty}$; and (ii) $\mathrm{RHS}(N) = \eta B$ remains the same when $N \in \mathbb{Z}^+_{B,u}$. Then, the probability mass $\{y(t)\}_{t \in \mathbb{Z}^+_{u+2,\infty}}$ can be moved to $y(u+1)$, and the probability mass $\{y(t)\}_{t \in \mathbb{Z}^+_{B+1,u}}$ can be moved to $y(B)$. Thus, a new feasible solution $\hat{y}$ of the same form as Equation (10) results, and we can focus on the $B+1$ variables $\{y(t)\}_{t \in \mathbb{Z}^+_{1,B} \cup \{u+1\}}$. For $N \in \mathbb{Z}^+_{B,u}$, $\mathrm{LHS}(\hat{y}, N)$ is non-decreasing and $\mathrm{RHS}(N)$ is constant in $N$, so we can focus on the last constraint $C_u$. For $N \in \mathbb{Z}^+_{u+1,\infty}$, we can focus on $C_{u+1}$ since $\hat{y}(t) = 0$ for all $t \in \mathbb{Z}^+_{u+2,\infty}$. Thus, we only need to consider the constraints $\{C_N\}_{N \in \mathbb{Z}^+_{1,B-1} \cup \{u,u+1\}}$.

Combining Cases I-III, there are no more than $B+1$ non-redundant variables in $\{y(t)\}_{t \in \mathbb{Z}^+}$ and $B+1$ non-redundant constraints in $\{C_N\}_{N \in \mathbb{Z}^+}$. This completes the proof.
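The reduction in this proof is constructive. As a small illustration (ours, assuming NumPy and an input strategy whose support has length at least $u$), the Case II/III mass-shifting map can be written as:

```python
import numpy as np

def collapse_strategy(y, B, u):
    """Collapse a (long) buying distribution y[t-1] = y(t) onto the
    support {1, ..., B, u+1}, as in the proof of Lemma 1 (Cases II/III,
    which assume B < u). Feasibility and the objective are preserved."""
    y = np.asarray(y, dtype=float)
    y_hat = np.zeros(max(len(y), u + 1))
    y_hat[:B - 1] = y[:B - 1]          # days 1..B-1 unchanged
    y_hat[B - 1] = y[B - 1:u].sum()    # mass of days B..u moved to day B
    y_hat[u] = y[u:].sum()             # mass of days >= u+1 moved to day u+1
    return y_hat
```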
C. Technical Proofs and Supplementary Results for Online Search with UQ Predictions

C.1. Discrete Approximation for the Optimization Problem

We start by providing a formal formulation for online search with a probabilistic interval prediction PIP$(\ell, u; \delta)$ under the hard instances $\mathcal{H} := \{I_V\}_{V \in [m,M]}$:
$$\min_{\eta, \gamma, q} \; (1-\delta)\eta + \delta\gamma \tag{11a}$$
$$\text{s.t.} \;\; V \le \eta \left( \int_m^V v\, q(v)\, dv + \Big(1 - \int_m^V q(v)\, dv\Big) m \right), \quad \forall V \in [\ell, u], \tag{11b}$$
$$V \le \gamma \left( \int_m^V v\, q(v)\, dv + \Big(1 - \int_m^V q(v)\, dv\Big) m \right), \quad \forall V \in [m, \ell) \cup (u, M], \tag{11c}$$
$$1 \le \eta \le \gamma, \tag{11d}$$
$$q \in \mathcal{Q}. \tag{11e}$$

Each constraint indexed by $V$ in Equation (11b) (resp. Equation (11c)) ensures that the ratio between the profit of the offline optimum and the profit of the online algorithm is upper bounded by $\eta$ (resp. $\gamma$) under the instance $I_V$. Thus, $\eta$ and $\gamma$ represent the consistency and robustness (under the hard instances), and the optimization objective is to minimize the DRCR under the hard instances. Let $\{q^*, \eta^*, \gamma^*\}$ and $CR^*_{os}$ denote the optimal solution and the optimal objective value of the above problem.

We propose a discrete approximation of problem (11) as follows. Fix a parameter $\epsilon > 0$. Let $\bar{K}$ be the largest integer such that $m(1+\epsilon)^{\bar{K}} \le M$, i.e., $\bar{K} = \lfloor \ln(M/m) / \ln(1+\epsilon) \rfloor$. Consider the $K = \bar{K} + 4$ discrete values consisting of the $\bar{K}+1$ values $\{m(1+\epsilon)^k\}_{k=0,\ldots,\bar{K}}$ and the three additional values $\ell, u, M$. We arrange these $K$ values in non-decreasing order and let $V_k$ denote the $k$-th value. In particular, $V_1 = m$, $V_K = M$, and we define indices $k_\ell$ and $k_u$ such that $V_{k_\ell} = \ell$ and $V_{k_u} = u$. The key property of the defined discrete values is that $V_k / V_{k-1} \le 1 + \epsilon$ for $k = 2, \ldots, K$. We define $K$ variables $\hat{q} := \{\hat{q}_k\}_{k \in [K]}$ with feasible set $\hat{\mathcal{Q}} := \{\hat{q} : \hat{q}_k \ge 0, \forall k \in [K], \sum_{k \in [K]} \hat{q}_k \le 1\}$. Then we consider the following discrete version of problem (11):
$$\min_{\hat\eta, \hat\gamma, \hat{q}} \; (1-\delta)\hat\eta + \delta\hat\gamma \tag{12a}$$
$$\text{s.t.} \;\; V_k \le \hat\eta \left( \sum_{i=1}^{k} V_i \hat{q}_i + \Big(1 - \sum_{i=1}^{k} \hat{q}_i\Big) m \right), \quad k = k_\ell, \ldots, k_u, \tag{12b}$$
$$V_k \le \hat\gamma \left( \sum_{i=1}^{k} V_i \hat{q}_i + \Big(1 - \sum_{i=1}^{k} \hat{q}_i\Big) m \right), \quad k = 1, \ldots, k_\ell - 1, k_u + 1, \ldots, K, \tag{12c}$$
$$1 \le \hat\eta \le \hat\gamma, \tag{12d}$$
$$\hat{q} \in \hat{\mathcal{Q}}, \tag{12e}$$
which only contains $K$ variables in $\hat{q}$ and $K$ constraints in Equations (12b) and (12c).
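Problem (12) contains bilinear terms $\hat\eta\, \hat{q}_i$; substituting $a = 1/\hat\eta$ and $b = 1/\hat\gamma$ makes the constraints linear and the objective convex, so the problem can be solved with an off-the-shelf convex solver. The sketch below is our illustration (not the paper's implementation); it assumes CVXPY and deduplicates the value grid:

```python
import numpy as np
import cvxpy as cp

def solve_discrete_search(m, M, l, u, delta, eps):
    """Solve problem (12) via the substitution a = 1/eta, b = 1/gamma."""
    Kbar = int(np.floor(np.log(M / m) / np.log(1 + eps)))
    V = np.unique(np.concatenate([m * (1 + eps) ** np.arange(Kbar + 1),
                                  [l, u, M]]))     # sorted grid V_1..V_K
    K = len(V)
    q = cp.Variable(K, nonneg=True)
    a = cp.Variable()                               # a = 1/eta
    b = cp.Variable()                               # b = 1/gamma
    cons = [cp.sum(q) <= 1, b <= a, a <= 1]         # 1 <= eta <= gamma
    for k in range(K):
        revenue = V[:k + 1] @ q[:k + 1] + (1 - cp.sum(q[:k + 1])) * m
        scale = a if l <= V[k] <= u else b          # eta inside the interval
        cons.append(V[k] * scale <= revenue)
    prob = cp.Problem(cp.Minimize((1 - delta) * cp.inv_pos(a)
                                  + delta * cp.inv_pos(b)), cons)
    prob.solve()
    return q.value, 1 / a.value, 1 / b.value, prob.value
```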
C.2. Proof of Lemma 2

Based on the discrete approximation (12) proposed in Appendix C.1, we obtain an approximate problem with $O(\ln(M/m)/\epsilon)$ variables and constraints. Below we prove the approximation error of PFA$(\hat{G}^*)$.

First, we show the discrete problem (12) is a relaxation of the original problem (11). The relaxation is obtained by (i) removing all constraints except those where $V$ takes the $K$ discrete values $\{V_k\}_{k \in [K]}$; and (ii) further relaxing the remaining $K$ constraints as follows. The $K$ remaining constraints of the original problem are
$$V_k \le \hat\eta \left( \int_m^{V_k} v\, \hat{q}(v)\, dv + \Big(1 - \int_m^{V_k} \hat{q}(v)\, dv\Big) m \right), \quad k = k_\ell, \ldots, k_u, \tag{13a}$$
$$V_k \le \hat\gamma \left( \int_m^{V_k} v\, \hat{q}(v)\, dv + \Big(1 - \int_m^{V_k} \hat{q}(v)\, dv\Big) m \right), \quad k = 1, \ldots, k_\ell-1, k_u+1, \ldots, K. \tag{13b}$$
Constraints (12b) and (12c) are relaxations of constraints (13a) and (13b), respectively. In particular, we relax $\int_m^{V_k} v\, q(v)\, dv$ to $\sum_{i=1}^{k} V_i \hat{q}_i$ for $k \in [K]$.

Let $\{\hat\eta^*, \hat\gamma^*, \hat{q}^*\}$ denote the optimal solution of the discrete problem (12). Since the discrete problem is a relaxation of the original problem, we have $(1-\delta)\hat\eta^* + \delta\hat\gamma^* \le (1-\delta)\eta^* + \delta\gamma^*$. Based on $\hat{q}^*$, we can build a piecewise-constant protection function $\hat{G}^*(v) = \sum_{i=1}^{k} \hat{q}^*_i$ for $v \in [V_k, V_{k+1})$, $k \in [K-1]$. We then analyze the DRCR of PFA$(\hat{G}^*)$. Following a similar approach to the proof of Theorem 3, PFA$(\hat{G}^*)$ guarantees
$$\frac{OPT(I_{v_N})}{ALG(\hat{q}^*, I_{v_N})} \le \frac{V_k}{V_{k-1}}\, \hat\eta^* \le (1+\epsilon)\hat\eta^*, \quad v_N \in [\ell, u],$$
$$\frac{OPT(I_{v_N})}{ALG(\hat{q}^*, I_{v_N})} \le \frac{V_k}{V_{k-1}}\, \hat\gamma^* \le (1+\epsilon)\hat\gamma^*, \quad v_N \in [m, \ell) \cup (u, M].$$
Thus, the DRCR of PFA$(\hat{G}^*)$ is $(1+\epsilon)[(1-\delta)\hat\eta^* + \delta\hat\gamma^*] \le (1+\epsilon)[(1-\delta)\eta^* + \delta\gamma^*] \le (1-\delta)\eta^* + \delta\gamma^* + \epsilon M/m$. This completes the proof.

D. Technical Proofs and Supplementary Results for Learning Algorithms with UQ Predictions

D.1. Algorithm and Main Results

The key algorithmic framework we use is based on (Hazan & Megiddo, 2007), as illustrated in Algorithm 5. Given that there exists a master algorithm that achieves $O(\sqrt{T})$ static regret on the cost sequence $\{f_t\}_{t \in [T]}$, and that the cost upper bounds $\{U_t\}_{t \in [T]}$ are Lipschitz in the UQ predictions, we show that one can use Algorithm 5 to achieve a sublinear policy regret $PREG_T$. We prove this fact in the following theorem.

Algorithm 5 Online learning algorithm with uncertainty-quantified predictions
1: input: master algorithm $\mathcal{A}$ (e.g., the exponentiated gradient algorithm); UQ space $\Theta$; parameterized algorithms $A(w)$, $w \in \Omega$; parameter $\epsilon$;
2: initialize the $\epsilon$-net $N = \emptyset$;
3: for each round $t = 1, \ldots, T$ do
4:   receive UQ $\theta_t \in \Theta$ and let $\bar\theta_t = \arg\min_{\theta \in N} \|\theta_t - \theta\|$ be the closest vector in $N$;
5:   if $\|\theta_t - \bar\theta_t\| > \epsilon$ then
6:     add $\theta_t$ to $N$;
7:     start a new instance $\mathcal{A}_{\theta_t}$ of the master algorithm corresponding to $\theta_t$;
8:     update $\bar\theta_t \leftarrow \theta_t$;
9:   end if
10:  choose $w_t$ as the output of $\mathcal{A}_{\bar\theta_t}$;
11:  run algorithm $A(w_t)$ to execute the instance $I_t$, and observe the cost function $f_t(w_t; \theta_t)$;
12:  update $\mathcal{A}_{\bar\theta_t}$ based on $f_t(w_t; \theta_t)$.
13: end for

Theorem 5. For each UQ prediction $\theta_t \in \Theta \subseteq [0,1]^D$, suppose the cost function $f_t \sim \xi_{\theta_t}$, and that there exists an algorithm $\mathcal{A}$ that achieves $O(\sqrt{T})$ static regret in expectation for $f_1, \ldots, f_T$. Moreover, assume that the covering dimension of $\{\theta_t\}_{t \in [T]}$ is $d \le D$, i.e., the UQ vectors originate from a $d$-dimensional subspace of $[0,1]^D$. For any cost upper bound $U_t$ of $\mathbb{E}_{\xi_{\theta_t}} f_t$, i.e., $U_t(w_t; \theta_t) \ge \mathbb{E}_{\xi_{\theta_t}} f_t(w_t; \theta_t)$ for all $w_t \in \Omega$, such that $U_t$ is $L$-Lipschitz in $\theta_t \in \Theta$, Algorithm 5 with $\mathcal{A}$ as its master algorithm achieves the expected policy regret with respect to the optimal policy $\pi^*$ for $U_1, \ldots, U_T$:
$$\sum_{t \in [T]} [\mathbb{E} f_t(\pi_t(\theta_t); \theta_t) - U_t(\pi^*(\theta_t); \theta_t)] = O\big(L^{1 - \frac{2}{d+2}}\, T^{1 - \frac{1}{d+2}}\big),$$
where $\pi^* = \arg\min_\pi \sum_{t \in [T]} U_t(\pi(\theta_t); \theta_t)$. Here, the expectation is taken over $f_t \sim \xi_{\theta_t}$ and the randomness of the master algorithm.

By definition, the DRCR is a natural cost upper bound for the cost functions. However, we remark that in many instances there will likely be a tighter upper bound than the DRCR that is also Lipschitz in $\theta$, and Algorithm 5 automatically competes against any such upper bound. Thus, we expect that in practice Algorithm 5 can do better than the bounds stated in this section; in particular, it can likely exploit PIPs more effectively than the optimization-based algorithms. We confirm this in our numerical experiments.
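A minimal sketch (ours) of Algorithm 5 follows; `make_master` is a hypothetical factory for master-algorithm instances, and the `decide`/`update` interface is an assumption of this illustration:

```python
import numpy as np

class UQOnlineLearner:
    """Sketch of Algorithm 5: maintain an eps-net over the UQ space and
    run one master-algorithm instance per net point."""

    def __init__(self, make_master, eps):
        self.make_master = make_master  # returns an object with decide()/update()
        self.eps = eps
        self.net = []        # net points theta
        self.masters = []    # one master instance per net point

    def _closest(self, theta):
        if not self.net:
            return None, np.inf
        dists = [np.linalg.norm(theta - p) for p in self.net]
        i = int(np.argmin(dists))
        return i, dists[i]

    def decide(self, theta):
        theta = np.asarray(theta, dtype=float)
        i, dist = self._closest(theta)
        if dist > self.eps:                  # theta is far from the net:
            self.net.append(theta)           # add it and spawn a new master
            self.masters.append(self.make_master())
            i = len(self.net) - 1
        self._active = i
        return self.masters[i].decide()

    def update(self, cost):
        # feed the realized cost of the finished instance to the master
        self.masters[self._active].update(cost)
```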
D.2. Applications to the Ski Rental and Online Search Problems

We consider how one can use a variant of Algorithm 5, with the master algorithm being the randomized exponentiated (sub)gradient (EG) algorithm (Shalev-Shwartz, 2012; McMahan, 2015), to derive policy regret guarantees on the DRCR for the ski rental and online search problems when the UQs are probabilistic interval predictions PIP$(\theta) = $ PIP$(\ell, u; \delta)$. This is done by observing that the DRCR upper bounds the expected competitive ratio and, under mild conditions, exhibits Lipschitzness with respect to $\theta$. Then, using the ideas in Theorem 5, we show that it is sufficient to learn the low-regret policy with respect to the DRCR by just using the realized cost ratio function $f_t$ for each instance $t$. Also note that the learning algorithm is unaware of $\theta$'s status as UQ, instead treating the vectors $\theta_t$ as generic side information. However, by leveraging the structure of the information space for PIPs, we can improve the regret guarantees for ski rental (see Corollary 1) compared to the general guarantees in Theorem 5, and obtain provable Lipschitzness guarantees and, hence, regret guarantees for online search (see Corollary 2).

D.2.1. Ski Rental Problem

Consider an online sequence of $T$ instances of the ski rental problem, where we assume that $\bar{N} \ge 2$ bounds the number of skiing days and $B > 0$ is the buying cost. We also assume that at the start of instance $t$, we receive a PIP$(\ell_t, u_t; \delta_t) \in [\bar{N}] \times [\bar{N}] \times [0,1]$. Algorithm 6 is a slight modification of Algorithm 5 that accounts for the fact that $\ell_t, u_t$ are discrete. Specifically, we separately consider the spaces $\Theta_{(\ell,u)} = [0,1]$ indexed by the $\bar{N}^2$ combinations of $(\ell, u)$ (the number of combinations can be further reduced in practice by noting $u \ge \ell$), and we separately cover each space with an $\epsilon$-net $N_{(\ell,u)}$. Whenever we receive a UQ vector $\theta_t = (\ell_t, u_t; \delta_t)$, we run the online learning algorithm corresponding to the closest point in $N_{(\ell_t,u_t)}$.

Algorithm 6 Online learning algorithm for multiple-instance ski rental
1: input: master algorithm $\mathcal{A}$; space of probabilistic interval predictions $\Theta := [\bar{N}] \times [\bar{N}] \times [0,1]$; parameterized algorithms RSR$(y)$, $y \in \mathcal{Y}$; parameter $\epsilon$;
2: initialize $N = \{N_{(i,j)} = \emptyset\}_{(i,j) \in [\bar{N}] \times [\bar{N}]}$;
3: for each $t = 1, \ldots, T$ do
4:   receive $\theta_t = (\ell_t, u_t; \delta_t) \in [\bar{N}] \times [\bar{N}] \times [0,1]$ and let $\bar\theta_t = \arg\min_{\theta \in N_{(\ell_t,u_t)}} \|\theta_t - \theta\|$ be the closest vector in $N_{(\ell_t,u_t)}$;
5:   if $\|\theta_t - \bar\theta_t\| > \epsilon$ then
6:     add $\theta_t$ to $N_{(\ell_t,u_t)}$;
7:     start a new instance $\mathcal{A}_{\theta_t}$ of the master algorithm corresponding to $\theta_t$;
8:     update $\bar\theta_t \leftarrow \theta_t$;
9:   end if
10:  choose $y_t$ as the output of $\mathcal{A}_{\bar\theta_t}$;
11:  run algorithm RSR$(y_t)$ to execute the instance $I_t$;
12:  update $\mathcal{A}_{\bar\theta_t}$.
13: end for

Let $F_t := F_t(Y_t; \theta_t) = \frac{ALG(Y_t, I_t)}{OPT(I_t)} : [\bar{N}] \to \mathbb{R}_+$ denote the cost ratio function for ski rental that can be constructed after observing the instance $I_t$, where $F_t(Y_t; \theta_t)$ is the cost ratio of buying on day $Y_t \in [\bar{N}]$. Let $f_t := f_t(y_t; \theta_t)$ denote the expected cost ratio function over the randomized decision $Y_t$, i.e., $f_t(y_t; \theta_t) = \mathbb{E}_{Y_t \sim y_t} F_t(Y_t; \theta_t)$.
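Both corollaries below instantiate the master algorithm with EG over a finite set of decisions. A minimal sketch (ours, with an assumed `decide`/`update` interface matching the wrapper above):

```python
import numpy as np

class ExponentiatedGradient:
    """Minimal exponentiated-(sub)gradient learner over the simplex on
    n arms (here: buying days), with losses bounded in [0, L]. A sketch
    of the master algorithm used in Corollaries 1 and 2."""

    def __init__(self, n, step):
        self.weights = np.ones(n) / n   # uniform initial distribution
        self.step = step                # learning rate

    def decide(self):
        # current randomized decision y_t (a distribution over arms)
        return self.weights

    def update(self, losses):
        # losses[i] = realized cost ratio of arm i on this instance
        self.weights = self.weights * np.exp(-self.step * np.asarray(losses))
        self.weights = self.weights / self.weights.sum()
```

With step size proportional to $\sqrt{\log n / T}/L$, this update attains the $O(L\sqrt{T \log n})$ static regret invoked in (17) below.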
We will also denote the DRCR function derived in Section 3 as $U_t := U_t(y_t; \theta_t)$, which upper bounds $\mathbb{E}_{\xi_{\theta_t}} f_t$, i.e., $U_t(y_t; \theta_t) \ge \mathbb{E}_{\xi_{\theta_t}} f_t(y_t; \theta_t)$ for all $y_t \in \mathcal{Y}$, where $\mathcal{Y}$ is the simplex over the support $[\bar{N}]$. Using similar proof ideas to Theorem 5, we can show that learning using the cost ratio $f_t$, which we can construct in hindsight after each instance $I_t$, allows us to compete against the DRCR $U_t$.

Corollary 1. For the multi-instance ski rental problem, there is a policy $\{\pi_t\}_{t \in [T]}$ that can compete against the optimal policy $\pi^*$ mapping $\Theta$ to the simplex $\mathcal{Y}$ over $[\bar{N}]$ with respect to the DRCR $\{U_t\}_{t \in [T]}$, obtaining expected policy regret
$$\sum_{t \in [T]} [\mathbb{E} f_t(\pi_t(\theta_t); \theta_t) - U_t(\pi^*(\theta_t); \theta_t)] = O\big(\bar{N} \max\{(\bar{N}+B)/B, B\}\, T^{2/3}\big),$$
where $\pi^* = \arg\min_\pi \sum_{t \in [T]} U_t(\pi(\theta_t); \theta_t)$. The expectation is taken over the randomness of the instance distribution and the algorithm. The $O(\cdot)$ hides factors of $\log \bar{N}$.

D.2.2. Online Search Problem

Consider an online sequence of $T$ instances of the online search problem with prices bounded within $[m, M]$. At the start of instance $I_t$, we receive a PIP$(\ell_t, u_t; \delta_t) \in [m, M] \times [m, M] \times [0,1]$. We use a slight modification of Algorithm 6 to solve this problem, but we further discretize the space of possible $(\ell_t, u_t) \in [m, M]^2$ before applying the algorithm. Denote by $f_t := f_t(q_t; \theta_t) = \frac{OPT(I_t)}{ALG(q_t, I_t)} : \mathcal{Q} \to \mathbb{R}_+$ the profit ratio function, where $f_t(q; \theta_t)$ is the profit ratio for choosing $q \in \mathcal{Q} = \{q : q(v) \ge 0, \forall v \in [m, M], \int_m^M q(v)\, dv = 1\}$. The DRCR is again denoted as $U_t := U_t(q_t; \theta_t)$, which upper bounds $\mathbb{E}_{\xi_{\theta_t}} f_t(q; \theta_t)$. We can design an online learning algorithm that competes against the optimal DRCR.

Corollary 2. For the multi-instance online search problem, there is a policy $\{\pi_t\}_{t \in [T]}$ that can compete against the optimal policy $\pi^*$ mapping $\Theta$ to $\mathcal{Q}$ with respect to the DRCR $\{U_t\}_{t \in [T]}$, obtaining expected policy regret
$$\sum_{t \in [T]} [\mathbb{E} f_t(\pi_t(\theta_t); \theta_t) - U_t(\pi^*(\theta_t); \theta_t)] = O\big((M - m + 1)((M/m)^{8/5} + 1)\, T^{4/5}\big),$$
where $\pi^* = \arg\min_\pi \sum_{t \in [T]} U_t(\pi(\theta_t); \theta_t)$. The expectation is taken over the instance distribution. The $O(\cdot)$ hides factors of $\log(M/m)$, $\log(M-m)$, and $\log T$.

D.3. Proof of Theorem 5

For each round $t$, let $w_t = \pi_t(\theta_t)$ be the decision of Algorithm 5 and $u_t = \pi^*(\theta_t)$ be the offline optimal decision for $U_t$. The policy $\pi_t(\theta_t) = \pi_t(\theta_t | \theta_1, f_1, \ldots, \theta_{t-1}, f_{t-1})$ determined by Algorithm 5 depends on all past observed cost functions $f_1, \ldots, f_{t-1}$ and UQ predictions $\theta_1, \ldots, \theta_{t-1}$. We can reformulate the policy regret defined in (7) as follows:
$$\sum_{t \in [T]} [\mathbb{E} f_t(\pi_t(\theta_t); \theta_t) - U_t(\pi^*(\theta_t); \theta_t)] = \mathbb{E} \sum_{\theta \in N} \sum_{t \in \mathcal{T}_\theta} [f_t(w_t; \theta_t) - U_t(u_t; \theta_t)], \tag{14}$$
where $\mathcal{T}_\theta = \{t \in [T] : \bar\theta_t = \theta\}$ denotes the set of rounds whose UQ corresponds to $\theta \in N$. Let $T_\theta = |\mathcal{T}_\theta|$ be the number of such rounds. We can further bound this quantity, for each $\theta$, as follows:
$$\mathbb{E} \sum_{t \in \mathcal{T}_\theta} [f_t(w_t; \theta_t) - U_t(u_t; \theta_t)] = \mathbb{E} \sum_{t \in \mathcal{T}_\theta} [f_t(w_t; \theta_t) - f_t(u_1; \theta_t)] + \sum_{t \in \mathcal{T}_\theta} [\mathbb{E} f_t(u_1; \theta_t) - U_t(u_t; \theta_t)] \tag{15a}$$
$$\le O(\sqrt{T_\theta}) + \sum_{t \in \mathcal{T}_\theta} [U_t(u_1; \theta_t) - U_t(u_t; \theta_t)] \tag{15b}$$
$$\le O(\sqrt{T_\theta}) + \sum_{t \in \mathcal{T}_\theta} [U_t(u_1; \theta_t) - U_1(u_1; \theta_1) + U_1(u_t; \theta_1) - U_t(u_t; \theta_t)] \tag{15c}$$
$$\le O(\sqrt{T_\theta}) + 2 T_\theta \epsilon L, \tag{15d}$$
where the first inequality follows since the master algorithm $\mathcal{A}$ guarantees the static regret and $U_t(u_1; \theta_t)$ upper bounds the expected cost $\mathbb{E} f_t(u_1; \theta_t)$, the second inequality follows since $u_1$ is the minimizer of $U_1(u; \theta_1)$, and the last one follows from the Lipschitz assumption on the cost upper bounds. Then, using the Cauchy-Schwarz inequality,
$$\mathbb{E} \sum_{\theta \in N} \sum_{t \in \mathcal{T}_\theta} [f_t(w_t; \theta_t) - U_t(u_t; \theta_t)] \le \sum_{\theta \in N} \left[ O(\sqrt{T_\theta}) + 2 T_\theta \epsilon L \right] \le O\big(\sqrt{|N| T}\big) + 2 T \epsilon L. \tag{16}$$
Since the covering dimension is $d$ and the distance between any two UQ points in the net $N$ constructed by Algorithm 5 is at least $\epsilon$, one can deduce by volume arguments that the $\epsilon$-net has size $|N| = O(1/\epsilon^d)$. Finally, choosing $\epsilon = (L^2 T)^{-1/(d+2)}$ minimizes this expression, giving the final result.
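As a quick numerical check (ours) of the final balancing step, one can verify on a grid that the bound $\sqrt{T/\epsilon^d} + 2T\epsilon L$ from (16) is indeed minimized at $\epsilon = \Theta((L^2 T)^{-1/(d+2)})$:

```python
import numpy as np

# Fix d, T, L and scan epsilon; the grid minimizer matches the closed
# form (L^2 T)^(-1/(d+2)) up to a constant factor.
T, L, d = 10_000, 5.0, 3
eps_grid = np.logspace(-6, 0, 200_000)
bound = np.sqrt(T / eps_grid**d) + 2 * T * eps_grid * L
eps_star = eps_grid[np.argmin(bound)]
print(eps_star, (L**2 * T) ** (-1 / (d + 2)))  # same order of magnitude
```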
D.4. Proof of Corollary 1

We use Algorithm 6 with the master algorithm being the randomized exponentiated (sub)gradient (EG) algorithm (Shalev-Shwartz, 2012; McMahan, 2015), an algorithm for learning from experts. EG runs on the simplex $\mathcal{Y}$ over $[\bar{N}]$ and the functions $f_t$, in order to learn a randomized decision $y \in \mathcal{Y}$ from which the decision $Y$ is sampled. Note that $f_t$ is bounded above by $\max\{(\bar{N} + B)/B, B\}$. Thus, by setting the step size to $\frac{1}{\max\{(\bar{N}+B)/B,\, B\}}\sqrt{\frac{\log \bar{N}}{2T}}$ and using Corollary 2.14 in (Shalev-Shwartz, 2012), EG achieves the expected static regret guarantee
$$\sum_{t \in [T]} \mathbb{E} f_t(y_t; \theta_t) - \mathbb{E} f_t(y^*; \theta_t) = O\left( \max\{(\bar{N} + B)/B, B\} \sqrt{T \log \bar{N}} \right), \tag{17}$$
where $y^* = \arg\min_{y \in \mathcal{Y}} \sum_{t \in [T]} \mathbb{E} f_t(y; \theta_t)$ is the optimal static decision.

Next, we show that we can compete against the offline optimal sequence of decisions with respect to the DRCR $U_t(y_t; \theta_t)$, which upper bounds $\mathbb{E} f_t(y_t; \theta_t)$. Consider the formulation of the DRCR in problem (9): $U_t(y_t; \theta_t) = (1-\delta)\eta_t + \delta\gamma_t$, where $\eta_t$ and $\gamma_t$ are chosen to be as small as possible while satisfying the constraints given $y_t$. Here, we consider Lipschitzness of $U_t$ with respect to only $\delta$, which is sufficient due to the specific manner in which the $\epsilon$-net is constructed in Algorithm 6 (i.e., the $(\ell, u)$ coordinates of the points in the net are selected from a discrete set). Since $\eta_t$ and $\gamma_t$ are bounded by $\max\{(\bar{N} + B)/B, B\}$, $U_t$ is $\max\{(\bar{N} + B)/B, B\}$-Lipschitz in $\delta$. For brevity, we let $L := \max\{(\bar{N} + B)/B, B\}$. We also note that in Algorithm 6, any $\theta_t$ mapped to the same point of any $N_{(i,j)}$ is within $\epsilon$ distance of that point. Thus, following the same steps as the proof of Theorem 5,
$$\sum_{t \in \mathcal{T}_\theta} [\mathbb{E} f_t(y_t; \theta_t) - U_t(y^*_t; \theta_t)] \le O(L \sqrt{T_\theta}) + 2 T_\theta \epsilon L,$$
where $y_t = \pi_t(\theta_t)$ is the online decision of Algorithm 6 and $y^*_t = \pi^*(\theta_t)$ is the optimal decision for $U_t$. Then, using the Cauchy-Schwarz inequality,
$$\sum_{\theta \in N} \left[ O(L\sqrt{T_\theta}) + 2 T_\theta \epsilon L \right] \le O\big(L\sqrt{|N| T}\big) + 2T\epsilon L \le O\Big(L \sqrt{\textstyle\sum_{(\ell,u)} |N_{(\ell,u)}|\, T}\Big) + 2T\epsilon L \le O\big(L \bar{N} \sqrt{T/\epsilon}\big) + 2T\epsilon L,$$
where the last inequality follows because $|N_{(\ell,u)}| = O(1/\epsilon)$. Setting $\epsilon = T^{-1/3} < 1$ yields the final result.

D.5. Proof of Corollary 2

Define two sets of discretized points $\Lambda_1 = \{m(1+\lambda_1)^k\}_{k=0,\ldots,K}$, where $K = \lfloor \ln(M/m)/\ln(1+\lambda_1) \rfloor$, and $\Lambda_2 = \{m, m+\lambda_2, m+2\lambda_2, \ldots, M\}$. Then we consider the discretization $\Lambda = \Lambda_1 \cup \Lambda_2$. Note that $\Lambda$ takes points from the discrete approximation of problem (11) and the evenly spaced points in $[m, M]$. In the algorithm, for each $t \in [T]$, we round $(\ell_t, u_t)$ to points $(\bar\ell_t, \bar{u}_t)$ in $\Lambda_2 \times \Lambda_2$, where $\ell_t$ is rounded down and $u_t$ is rounded up. Then, Algorithm 6 with EG as the master algorithm is run on $\Lambda$ and $(\bar\ell_t, \bar{u}_t, \delta_t)$. Here, EG runs over the simplex $\hat{\mathcal{Q}}$ supported on $\Lambda$ and optimizes the profit ratio function $f_t(\hat{q}; \theta_t)$. Since $f_t$ is bounded above by $M/m$, EG with step size $\frac{1}{M/m}\sqrt{\frac{\ln |\Lambda|}{2T}}$ yields the following static regret guarantee with respect to any $\hat{q} \in \hat{\mathcal{Q}}$:
$$\sum_{t \in [T]} [f_t(\hat{q}_t; \theta_t) - f_t(\hat{q}; \theta_t)] = O\left( \frac{M}{m} \sqrt{T \ln\Big( \frac{M-m}{\lambda_2} + K + 1 \Big)} \right) \le O\left( \frac{M}{m} \sqrt{T \ln\Big( \frac{M-m}{\lambda_2} + \ln(M/m)\big(1 + \tfrac{1}{\lambda_1}\big) \Big)} \right),$$
where the last inequality follows because $\ln(1+\lambda_1) \ge \lambda_1/(1+\lambda_1)$ for $\lambda_1 > 0$.

Ultimately, we optimize the DRCR over points in $\hat{\mathcal{Q}}$ and a rounded $(\bar\ell, \bar{u}, \delta)$. We claim that this still allows us to compete against the optimal point in $\mathcal{Q}$.
To show this, we need to (i) verify that the optimal DRCR with respect to $\hat{\mathcal{Q}}$ is not far from the optimal DRCR with respect to the original feasible set $\mathcal{Q}$, and (ii) verify that the optimal DRCR with $(\bar\ell, \bar{u})$ is not far from the DRCR with $(\ell, u)$, both with respect to $\mathcal{Q}$.

For the first part, set $U_t(q; \theta_t) = (1-\delta_t)\eta_t + \delta_t\gamma_t$ as in the DRCR formulation in problem (11). Since $\eta_t, \gamma_t$ are bounded by $M/m$, $U_t$ is $(M/m)$-Lipschitz in $\delta$. We also consider another DRCR upper bound $U'_t$ optimizing over $\hat{\mathcal{Q}}$, corresponding to problem (12), which is a relaxation of problem (11). For $q^* = \arg\min_{q \in \mathcal{Q}} U_t(q; \theta_t)$ and $\hat{q}^* = \arg\min_{\hat{q} \in \hat{\mathcal{Q}}} U'_t(\hat{q}; \theta_t)$, Lemma 2 yields $U'_t(\hat{q}^*; \theta_t) \le U_t(q^*; \theta_t) + \frac{M}{m}\lambda_1$. This holds even though we added $\{m+\lambda_2, m+2\lambda_2, \ldots, M\}$ to the original discretization in problem (12), because adding more points to the discretization only decreases the approximation error of the discrete DRCR. Thus, competing against $U'_t$ allows us to compete against $U_t$.

For the second part, note that the error incurred by rounding $(\ell, u)$ to $(\bar\ell, \bar{u})$ only affects $\eta$: in problem (11), constraints corresponding to $V \in [\bar\ell, \ell) \cup (u, \bar{u}]$ are added to constraint set (11b) and removed from constraint set (11c). Thus, $\eta$ can only increase, while $\gamma$ is bounded below by $\eta$ and can only increase by the amount that $\eta$ increases, since (11c) gains no constraints. Focusing on constraints (11b), we consider how much $\eta$ can increase by adding the constraints corresponding to $V = \bar\ell$ and $V = \bar{u}$. For any fixed $q$, denote
$$G(c) := \int_m^c v\, q(v)\, dv + \Big(1 - \int_m^c q(v)\, dv\Big) m.$$
For the former, by adding the constraint $\bar\ell \le \eta\, G(\bar\ell)$, $\eta$ will change by at most
$$\big|\bar\ell/G(\bar\ell) - \ell/G(\ell)\big| \le \big|\ell/G(\ell) - \ell/G(\bar\ell)\big| + \frac{\lambda_2}{G(\bar\ell)} \le \frac{\ell\,\big(G(\ell) - G(\bar\ell)\big)}{G(\ell)\, G(\bar\ell)} + \frac{\lambda_2}{G(\bar\ell)} \le \frac{\ell \left( \int_{\bar\ell}^{\ell} v\, q(v)\, dv - m \int_{\bar\ell}^{\ell} q(v)\, dv \right)}{m^2} + \frac{\lambda_2}{m} \le \frac{\ell(\ell^2 - \bar\ell^2)/2}{m^2} + \frac{\lambda_2}{m} \le \lambda_2 \Big(\frac{M^2}{m^2} + \frac{1}{m}\Big),$$
where the first inequality follows from the triangle inequality and the third follows by taking $G(c) \ge m$ and $q(v) \le 1$. Similar steps for the latter yield the same bound:
$$\big|\bar{u}/G(\bar{u}) - u/G(u)\big| \le \big|u/G(u) - u/G(\bar{u})\big| + \frac{\lambda_2}{G(\bar{u})} \le \frac{u\,\big(G(\bar{u}) - G(u)\big)}{G(u)\, G(\bar{u})} + \frac{\lambda_2}{G(\bar{u})} \le \frac{u \left( \int_{u}^{\bar{u}} v\, q(v)\, dv - m \int_{u}^{\bar{u}} q(v)\, dv \right)}{m^2} + \frac{\lambda_2}{m} \le \frac{u(\bar{u}^2 - u^2)/2}{m^2} + \frac{\lambda_2}{m} \le \lambda_2 \Big(\frac{M^2}{m^2} + \frac{1}{m}\Big).$$
Denote $\bar\theta_t = (\bar\ell_t, \bar{u}_t, \delta_t)$, and let $\bar{q}^*, q^*$ be the optimal decisions for $U_t(\cdot; \bar\theta_t)$ and $U_t(\cdot; \theta_t)$, respectively. Thus, $U_t(\bar{q}^*; \bar\theta_t) \le U_t(q^*; \theta_t) + (M^2/m^2 + 1/m)\lambda_2$.

Finally, note that the number of points in the $\epsilon$-net constructed by Algorithm 6 is $|N| = \sum_{(\ell,u)} |N_{(\ell,u)}| = O\big( \big( \tfrac{M-m}{\lambda_2} + 1 \big)^2 / \epsilon \big)$, and that $U'_t$ is $(M/m)$-Lipschitz in $\delta$. Putting everything together and again following the steps of the proof of Theorem 5,
$$\sum_{t \in [T]} f_t(\hat{q}_t; \theta_t) \le O\left( \Big(\frac{M-m}{\lambda_2} + 1\Big)\frac{M}{m}\sqrt{T/\epsilon} \right) + 2T\epsilon\frac{M}{m} + \sum_{t \in [T]} U'_t(\hat{q}^*_t; \bar\theta_t)$$
$$\le O\left( \Big(\frac{M-m}{\lambda_2} + 1\Big)\frac{M}{m}\sqrt{T/\epsilon} \right) + 2T\epsilon\frac{M}{m} + T\lambda_1\frac{M}{m} + \sum_{t \in [T]} U_t(\bar{q}^*_t; \bar\theta_t)$$
$$\le O\left( \Big(\frac{M-m}{\lambda_2} + 1\Big)\frac{M}{m}\sqrt{T/\epsilon} + \Big(\frac{M^2}{m^2} + \frac{M}{m}\Big)(\epsilon + \lambda_1 + \lambda_2) T \right) + \sum_{t \in [T]} U_t(q^*_t; \theta_t).$$
Here $O(\cdot)$ hides factors of $\ln(M/m)$. Setting $\epsilon = \min\{(M/m)^{-2/5}, 1\}\, T^{-1/5} < 1$, $\lambda_1 = \min\{(M/m)^{-2/5}, 1\} \min\{M/m - 1, 1\}\, T^{-1/5} < M/m - 1$, and $\lambda_2 = \min\{(M/m)^{-2/5}, 1\}(M-m)\, T^{-1/5} < M - m$ yields the result.

D.6. Experiments

Setup. We set the buying cost to $B = 2$. We generate $T = 3000$ instances, each with true number of skiing days $n_t$ sampled uniformly at random from $\{1, \ldots, 8\}$. Synthetic PIP predictions are generated by sampling a point $p_t$ from a normal distribution $\mathcal{N}(n_t, \sigma_t^2)$ and then taking the 90% confidence interval $(\ell_t, u_t) = (p_t - z_{0.95}\sigma_t,\, p_t + z_{0.95}\sigma_t)$. Here, $\sigma_t$ is set to simulate oscillating good and bad predictors: the first 10 instances have $\sigma_t = 0$, followed by the next 10 with $\sigma_t = 6$, and repeating. A sketch of this prediction generator is given below.
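The following sketch (ours) reproduces the generation protocol above; the function name is an assumption, and the failure probability $\delta_t$ implied by the 90% interval is not produced here:

```python
import numpy as np

Z95 = 1.645  # standard normal 95% quantile, for a 90% two-sided interval

def make_instances(T=3000, sigma_good=0.0, sigma_bad=6.0, period=10, seed=0):
    """Generate synthetic ski-rental instances (n, lo, hi) with
    oscillating good/bad predictors, as in the setup above."""
    rng = np.random.default_rng(seed)
    out = []
    for t in range(T):
        n = int(rng.integers(1, 9))                 # true skiing days in {1..8}
        sigma = sigma_good if (t // period) % 2 == 0 else sigma_bad
        p = rng.normal(n, sigma) if sigma > 0 else float(n)
        lo, hi = p - Z95 * sigma, p + Z95 * sigma   # 90% confidence interval
        out.append((n, lo, hi))
    return out
```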
Comparison algorithms. We run the following online algorithms over the same sequence of UQs and instances 10 times and evaluate the average excess CR (i.e., the empirical cost ratio minus 1) of these algorithms over these runs.
- WOA: the worst-case optimal randomized algorithm (Karlin et al., 1990), which is $e/(e-1)$-competitive.
- FTP: the follow-the-prediction algorithm that fully trusts the prediction; it buys immediately on day 1 if the predicted number of skiing days is at least $B$, and otherwise rents throughout.
- OL-Dynamic: online learning with respect to policy regret (Algorithm 5 adapted to ski rental, i.e., Algorithm 6), i.e., competing against $\pi^* = \arg\min_\pi \sum_{t \in [T]} U_t(\pi(\theta_t); \theta_t)$.
- OL-Static: online learning with respect to static regret, i.e., competing against the optimal fixed decision without considering UQ predictions.
- RSR-PIP: the randomized algorithm with PIP (Algorithm 3), which achieves the optimal DRCR.
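To make the evaluation protocol concrete, here is a sketch (ours) of the WOA and FTP baselines on the synthetic instances produced by `make_instances` above; taking the interval midpoint as FTP's point prediction is an assumption of the illustration (it recovers the sampled point $p_t$ exactly):

```python
import numpy as np

def karlin_dist(B):
    """Buy-day distribution of the worst-case optimal randomized algorithm
    (Karlin et al., 1990): P(Y = j) proportional to (1 - 1/B)^(B - j)."""
    j = np.arange(1, B + 1)
    p = (1 - 1 / B) ** (B - j)
    return p / p.sum()

def expected_cost(y, n, B):
    """Expected cost of buying on day Y ~ y (support 1..len(y)) when the
    true number of skiing days is n; buying on day t costs B + t - 1."""
    days = np.arange(1, len(y) + 1)
    cost = np.where(n >= days, B + days - 1, n)
    return float(y @ cost)

def excess_cr(instances, B=2):
    """Average excess CR of WOA and FTP on (n, lo, hi) triples."""
    woa, ftp = [], []
    y_woa = karlin_dist(B)
    for n, lo, hi in instances:
        opt = min(n, B)
        woa.append(expected_cost(y_woa, n, B) / opt - 1)
        pred = (lo + hi) / 2                  # point prediction = midpoint
        alg = B if pred >= B else n           # buy on day 1 vs. rent throughout
        ftp.append(alg / opt - 1)
    return np.mean(woa), np.mean(ftp)
```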