# Adaptive Conformal Inference by Betting

Aleksandr Podkopaev¹, Darren Xu¹, Kuang-chih Lee¹

¹Walmart Global Tech. Correspondence to: Aleksandr Podkopaev.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Conformal prediction is a valuable tool for quantifying predictive uncertainty of machine learning models. However, its applicability relies on the assumption of data exchangeability, a condition which is often not met in real-world scenarios. In this paper, we consider the problem of adaptive conformal inference without any assumptions about the data generating process. Existing approaches for adaptive conformal inference are based on optimizing the pinball loss using variants of online gradient descent. A notable shortcoming of such approaches is their explicit dependence on, and sensitivity to, the choice of the learning rates. In this paper, we propose a different approach for adaptive conformal inference that leverages parameter-free online convex optimization techniques. We prove that our method controls long-term miscoverage frequency at a nominal level and demonstrate its convincing empirical performance without any need for cumbersome parameter tuning.

## 1. Introduction

Accurate uncertainty estimation plays a crucial role in the practical deployment of machine learning models, particularly in contexts where model outputs impact downstream decision-making. A popular approach for quantifying predictive uncertainty is to use prediction sets: intervals in regression tasks or collections of labels in classification problems. The primary objective of such sets is to achieve valid coverage, meaning that they should cover the true labels with high probability (e.g., 90%). In addition to coverage, the sharpness, or size, of such prediction sets is extremely important in real-world applications. Conformal prediction (Vovk et al., 2005) stands out as a versatile framework which is well-suited for this task: it allows for the construction of uncertainty quantification wrappers that can be seamlessly placed on top of arbitrary prediction models.

Suppose that a model $\hat{f} : \mathbb{R}^d \to \mathbb{R}$ has been trained to generate real-valued predictions. One option is to resort to conformal predictors that output sets of the following form:

$$\hat{C}(x; s) := [\hat{f}(x) - s, \hat{f}(x) + s].$$

Here, candidate prediction sets are parameterized by a single univariate parameter, denoted by $s$. The goal is to calibrate this parameter, i.e., determine a suitable $\hat{s}$ that ensures coverage. In this context, we recall the well-known technique of split conformal prediction, which relies on a holdout set not used during training: $\{(X_i, Y_i)\}_{i=1}^{n}$. Estimating errors via the absolute residuals, or the nonconformity scores, $R_i = |\hat{f}(X_i) - Y_i|$, and selecting $\hat{s}$ as the $\lceil (1-\alpha)(n+1) \rceil$-th smallest value amongst $\{R_1, \dots, R_n, +\infty\}$ results in a conformal predictor that satisfies:

$$P\left(Y_{\text{test}} \in \hat{C}(X_{\text{test}}; \hat{s})\right) \ge 1 - \alpha,$$

as long as $(X_1, Y_1), \dots, (X_n, Y_n), (X_{\text{test}}, Y_{\text{test}})$ are exchangeable.

We note that conformal inference is not specific to the above setting and has been extended in various directions. First, it offers much higher flexibility beyond the standard point forecasting model described above, e.g., it can be applied to recalibrate the prediction intervals associated with conditional quantile regression models (Romano et al., 2019). Moreover, procedures that extend the split conformal framework beyond a single sample split, and hence allow better utilization of available data, have been developed, including procedures like Jackknife+ (Barber et al., 2021). For a detailed overview of recent advancements and trends in the field of conformal inference, we refer the reader to Angelopoulos & Bates (2023).

Conformal inference relies on the assumption of data exchangeability, which often fails to hold in practice.
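
As a hedged illustration of the split conformal recipe described above, the following Python sketch computes the calibrated radius from a list of holdout residuals. The function names `split_conformal_radius` and `prediction_interval`, as well as the residual values, are hypothetical and chosen for illustration only; they are not taken from the authors' code.

```python
import math


def split_conformal_radius(residuals, alpha):
    """Return the ceil((1 - alpha) * (n + 1))-th smallest value among
    the holdout residuals augmented with +infinity."""
    n = len(residuals)
    # 1-indexed rank; it lies in {1, ..., n + 1} for alpha in (0, 1).
    k = math.ceil((1 - alpha) * (n + 1))
    augmented = sorted(residuals) + [math.inf]
    return augmented[k - 1]


def prediction_interval(point_forecast, radius):
    """Symmetric set [f(x) - s, f(x) + s] around a point forecast."""
    return (point_forecast - radius, point_forecast + radius)


# Made-up absolute residuals |f(X_i) - Y_i| on a holdout set (assumption).
holdout_residuals = [0.3, 1.2, 0.7, 0.1, 0.9, 0.4, 2.0]
s_hat = split_conformal_radius(holdout_residuals, alpha=0.25)
lower, upper = prediction_interval(5.0, s_hat)
```

With very few calibration points the rank can reach $n + 1$, in which case the radius is infinite. The coverage guarantee of this construction hinges on exchangeability, which many practical data streams violate.
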
Examples include cases where data arrive sequentially over time, possibly exhibiting shifts in distribution, or where one is dealing with time series data. Despite these practical challenges, there remains a huge demand for supplementing point predictions with valid measures of uncertainty. In this work, we consider the problem of online conformal inference without imposing any distributional or dependency assumptions on the data generating process and focus on approaches that are applicable to arbitrary data streams.

We assume that the data are observed as a stream: $((X_t, Y_t))_{t \ge 1}$. At each time point $t$, the goal is to construct a prediction set for $Y_t$ using all of the previously observed data $\{(X_i, Y_i)\}_{i \le t-1}$, as well as the feature vector $X_t$. For brevity, we often capture the dependence of a conformal predictor on all of the available information that can be used at any time point (e.g., exogenous features or lagged response variables) using the index variable. Specifically, we write $\hat{C}_t(s) := \hat{C}_t(\{(X_i, Y_i)\}_{i \le t-1}, X_t; s)$ to represent the prediction set for the response variable $Y_t$. Our objective is to design a conformal predictor whose observed long-term miscoverage rate is equal to the nominal level, denoted by $\alpha$. In other words, we aim to construct a sequence of radii $(s_t)_{t \ge 1}$ so that the corresponding prediction sets satisfy:

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\left\{Y_t \notin \hat{C}_t(s_t)\right\} = \alpha. \tag{1}$$

In addition to assessing long-term coverage, we consider other performance metrics that are helpful in differentiating meaningful conformal predictors from trivial ones. For example, assuming that the response variables are bounded (i.e., $|Y_t| \le B$, $t \ge 1$, for some absolute constant $B > 0$), a conformal predictor that switches at random between generating empty sets (an $\alpha$ fraction of the time) and intervals $[-B, B]$ (the remaining $(1-\alpha)$ fraction of the time) technically satisfies (1), yet represents a practically useless tool for uncertainty quantification.
To address this, we use the concept of regret as an additional metric: the cumulative pinball loss (since we are working with online quantile estimation) of the sequence of radii obtained using our method, compared with that of an unknown benchmark point. In particular, sub-linear regret bounds justify the effectiveness of an adaptive conformal predictor more meaningfully than the coverage guarantee alone.

Related Work. One of the earliest works applying online convex optimization techniques to uncertainty quantification is that of Gibbs & Candès (2021). Their methodology for learning a sequence of radii is based on applying online (sub)gradient descent to optimize the pinball loss. However, this approach has some limitations, including the need to specify the learning rate in advance and the potential for outputting empty or infinite prediction sets. Subsequent works by Zaffran et al. (2022); Gibbs & Candès (2022); Bhatnagar et al. (2023); Angelopoulos et al. (2023) have introduced extensions of the above method to address some of those shortcomings. The primary drawback of the aforementioned methods lies in their explicit dependence on a learning rate (or a specified grid thereof for approaches that utilize meta-learning to improve upon using a single learning rate), with the performance often being highly sensitive to such design choices. For example, higher learning rates promote adaptability to dynamic environments but may often lead to highly volatile prediction sets. The resulting online conformal predictors may oscillate between outputting overly small (anticonservative) and excessively large (conservative) prediction sets for consecutive time steps, while still demonstrating empirical coverage close to the target level.
Conversely, lower learning rates often result in conformal predictors that may sacrifice coverage for stability, potentially taking much longer to adapt to changes in distribution, and hence failing to accurately represent uncertainty. Moreover, the selection of learning rates is heavily influenced by the scale of the errors, or the nonconformity scores, particularly in the case of real-valued responses. Heuristic approaches for approximating the scale, such as using a maximum or a high quantile of the historical response values, come with risks that may compromise the performance in practice (beyond the coverage guarantee being lost). These considerations introduce (potentially unnecessary) complexity if one aims to automate the implementation of the uncertainty quantification block in practice, and they become even more pronounced if uncertainty estimates are constructed for (a) large collections of input data streams instead of a single one, and (b) multi-horizon forecasts rather than one-step-ahead ones.

We note that several of the aforementioned works (Bhatnagar et al., 2023; Gibbs & Candès, 2022) proposed to supplement the coverage guarantee (1) with regret guarantees that are stronger than the one considered in the current work. This is achieved via meta-learning: selected base models, e.g., the adaptive conformal predictors proposed by Gibbs & Candès (2021) with different learning rates, are subsequently aggregated using a meta-procedure. Our focus, however, is different, as we consider practical algorithms designed to address the issues of cumbersome parameter tuning or even the necessity of selecting a grid of learning rates. We achieve this by leveraging parameter-free online convex optimization techniques with sub-linear regret bounds, particularly those that are based on coin betting (Orabona & Pál, 2016; Cutkosky & Orabona, 2018).
Amongst other related works on conformal prediction with non-exchangeable data, we highlight methods that leverage reweighting schemes (Tibshirani et al., 2019; Podkopaev & Ramdas, 2021; Lei & Candès, 2021; Fannjiang et al., 2022; Candès et al., 2023) and approaches designed to handle time series data (Chernozhukov et al., 2018; Xu & Xie, 2021; Stankeviciute et al., 2021; Xu & Xie, 2023). We note that these methods either place some distributional assumptions (e.g., a relationship between the source and the target domains for covariate/label shift, or mixing assumptions for time series data) or characterize the coverage gap rather than guaranteeing coverage of the resulting predictor at a user-specified level.

Contributions. In this work, we apply parameter-free optimization techniques to the problem of online conformal inference. We prove that the resulting conformal predictor controls the miscoverage rate at a pre-specified level. Through extensive simulations with a focus on adaptability to distribution shifts, we demonstrate the compelling empirical performance of the proposed methods. Our approach nicely complements the existing methods in the literature due to its ease of implementation, computational efficiency, and absence of any parameter tuning.

## 2. Betting-based Adaptive Conformal Inference

We focus on conformal predictors that output sets of the following form: $\hat{C}_t(s) := [\hat{Y}_t - s, \hat{Y}_t + s]$, where the prediction $\hat{Y}_t$ is based on all information available prior to the true response $Y_t$ being revealed. We note that our methodology is applicable beyond this setting, e.g., it can be used to recalibrate the prediction intervals based on conditional quantile regression models: $\hat{C}_t(s) := [\hat{q}_t^{(\alpha/2)} - s, \hat{q}_t^{(1-\alpha/2)} + s]$, but we omit the details for brevity. We refer the reader to Gupta et al. (2022) for more versions of the nested prediction sets for which our methodology is applicable.
Let $S_t$ denote the radius of the smallest prediction set that contains the true response $Y_t$:

$$S_t = \inf\left\{s \in \mathbb{R} : Y_t \in \hat{C}_t(s)\right\} = \inf\left\{s \in \mathbb{R} : Y_t \in [\hat{Y}_t - s, \hat{Y}_t + s]\right\}.$$

Since the coverage event $\{Y_t \in \hat{C}_t(s)\}$ is equivalent to $\{S_t \le s\}$, the target property (1) of the miscoverage being equal to the nominal level $\alpha$ can be expressed as:

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\left\{S_t \le s_t\right\} = 1 - \alpha. \tag{2}$$

Hence, we can frame the task of constructing adaptive conformal predictors as a problem of learning the $(1-\alpha)$-quantile of the nonconformity scores $(S_t)_{t \ge 1}$ in an online fashion.

Quantile Estimation and Adaptive Conformal Inference. Learning the quantiles of a distribution is achieved by optimizing the pinball loss, defined for the $\beta$-quantile as

$$\ell_\beta(s, S_t) = \max\left\{\beta (S_t - s), (1-\beta)(s - S_t)\right\} = \left(\mathbb{1}\{s \ge S_t\} - \beta\right)(s - S_t).$$

The pinball loss is a convex and $\max\{\beta, 1-\beta\}$-Lipschitz (in the first argument) loss function, whose subdifferential is given by

$$\partial \ell_\beta(s, S_t) = \begin{cases} \mathbb{1}\{S_t \le s\} - \beta, & s \ne S_t, \\ [-\beta, 1-\beta], & s = S_t. \end{cases} \tag{3}$$

Taking $\beta = 1 - \alpha$, we recall that the updates corresponding to online subgradient descent (OGD) take the form:

$$s_{t+1} = s_t - \eta \left(\mathbb{1}\{S_t \le s_t\} - (1-\alpha)\right) = s_t - \eta \left(\mathbb{1}\{Y_t \in \hat{C}_t(s_t)\} - (1-\alpha)\right) = s_t - \eta \left(\alpha - \mathbb{1}\{Y_t \notin \hat{C}_t(s_t)\}\right). \tag{4}$$

Throughout this work, we let $g_t \in \partial \ell_{1-\alpha}(s, S_t)|_{s = s_t}$ denote the subgradients of the quantile losses at time steps $t = 1, 2, \dots$ The online gradient descent updates stated above admit a natural interpretation as an adjustment of the prediction interval's radius for the subsequent round in response to whether a conformal predictor covers the truth at a given time step: the radius is increased if a conformal predictor fails to cover the truth, and decreased otherwise.

We note that in Gibbs & Candès (2021), the authors did not directly apply online subgradient descent to update the radii as stated in (4). Instead, they applied it to update a sequence of quantile levels $(\alpha_t)_{t \ge 1}$, and the radii were determined by computing the empirical quantiles of the residuals, or the nonconformity scores, at the corresponding levels.
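
The radius update in (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `pinball_subgradient` and `ogd_radii`, the initial radius of zero, and the tie-breaking rule at $s = S_t$ are choices made here for illustration and are not prescribed by the text.

```python
def pinball_subgradient(s, score, alpha):
    """Subgradient of the (1 - alpha)-pinball loss with respect to the radius s.

    Equals alpha when the score is covered (score <= s) and alpha - 1 otherwise.
    At a tie (s == score) this returns alpha, which is one valid element of the
    subdifferential [-(1 - alpha), alpha].
    """
    return alpha if score <= s else alpha - 1.0


def ogd_radii(scores, alpha, eta, s_init=0.0):
    """Run s <- s - eta * g over a non-empty stream of nonconformity scores.

    Returns the radius used at each step and the empirical miscoverage rate.
    """
    s = s_init
    radii, misses = [], 0
    for score in scores:
        radii.append(s)
        misses += score > s  # miscoverage occurs when the score exceeds the radius
        s -= eta * pinball_subgradient(s, score, alpha)
    return radii, misses / len(scores)


# Toy stream (assumption): constant scores, so the radius climbs until it covers.
radii, miscoverage = ogd_radii([1.0, 1.0, 1.0, 1.0], alpha=0.25, eta=0.5)
```

After each miss the radius grows by $\eta(1-\alpha)$, and after each covered step it shrinks by $\eta\alpha$, so the step size $\eta$ has to be matched to the scale of the scores. The sketch acts on the radius directly; Gibbs & Candès (2021), as noted above, instead run the same kind of update on a sequence of quantile levels.
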
However, updating the quantile levels faces the following issue: whenever some $\alpha_t$ falls outside the unit interval, the resulting conformal predictor outputs either infinite ($\alpha_t > 1$) or empty ($\alpha_t < 0$) prediction sets. While this problem does not arise when the radii are updated as per (4), the scale of the radii becomes a crucial factor in determining an appropriate learning rate $\eta$, since the subgradients of the pinball loss are at most $\max\{1-\alpha, \alpha\} \le 1$ in absolute value.

We also consider a closely related method for adaptive conformal inference that has been proposed by Bhatnagar et al. (2023). This approach builds upon scale-free online gradient descent (SF-OGD) introduced in Orabona & Pál (2018). The corresponding update rule takes the form:

$$s_{t+1} = s_t - \eta \, \frac{\alpha - \mathbb{1}\{Y_t \notin \hat{C}_t(s_t)\}}{\sqrt{\sum_{i=1}^{t} \left(\alpha - \mathbb{1}\{Y_i \notin \hat{C}_i(s_i)\}\right)^2}}, \tag{5}$$

where, in contrast to (4), the effective learning rate decays over time, being inversely proportional to the square root of the sum of the squared gradients. We note that SF-OGD still requires pre-specifying the learning rate, just as the standard OGD does, and hence requires considering the scale of the nonconformity scores. In both cases, the tuning process becomes much more complex if one applies adaptive conformal inference to multi-step forecasts or to a potentially large collection of input data streams.

Our Approach. We address the issues related to tuning the learning rates when learning conformal predictors by adopting parameter-free online convex optimization techniques. Specifically, we utilize optimization techniques that are based on coin betting (Orabona & Pál, 2016; Cutkosky & Orabona, 2018). The high-level idea involves framing the learning process as a game where a gambler repeatedly places bets on the outcomes of continuous coin flips. Let $W_t$ denote the gambler's wealth at the end of round $t$. Starting with initial capital $W_0 = 1$, the gambler bets on the outcome of a coin flip $c_t \in [-1, 1]$ at each round $t$.
The gambler is allowed to bet any amount $s_t$ on either heads or tails but is restricted from borrowing any money, i.e., we can write $s_t = \lambda_t W_{t-1}$ for some $\lambda_t \in [-1, 1]$. The sign of $s_t$ specifies the gambler's choice between heads and tails (in general, $s_t$ may be negative) and the absolute value represents the corresponding betting amount. In the $t$-th round, the gambler gains $|s_t c_t|$ if $\operatorname{sign}(s_t) = \operatorname{sign}(c_t)$ and loses $|s_t c_t|$ otherwise. Thus, we have:

$$W_t = W_{t-1} + s_t c_t = 1 + \sum_{i=1}^{t} s_i c_i.$$

For the setting of i.i.d. coin flips ($c_t \in \{-1, +1\}$ are generated i.i.d. with a known probability of heads $p \in [0, 1]$), the optimal strategy was proposed by Kelly (1956), who showed that betting the fraction $\lambda_t = 2p - 1$ of the current wealth yields more wealth in the long run than betting any other fixed fraction. For a sequence of possibly adversarial coin flips, Krichevsky & Trofimov (1981) proposed a practical betting scheme that guarantees almost the same wealth as one could obtain by betting any fixed fraction of wealth at each round. Moreover, the corresponding guarantee is known to be optimal up to constant factors (Cesa-Bianchi & Lugosi, 2006).

In our scenario, the coin outcomes are determined by the negation of the subgradients of the pinball loss defined in (3): $c_t = -g_t$ for $t \ge 1$. We consider two popular betting strategies: one based on the Krichevsky-Trofimov (KT) estimator (Krichevsky & Trofimov, 1981), extended to the case of continuous coins (i.e., $c_t \in [-1, 1]$ instead of $\{-1, +1\}$) in Orabona & Pál (2016), and a simple optimization procedure based on the Online Newton Step (ONS) method (Hazan et al., 2007; Cutkosky & Orabona, 2018). In online learning, a standard performance metric is regret, which measures the cumulative loss of $(s_t)_{t=1}^{T}$ relative to an unknown benchmark point, denoted by $s^\star$:

$$R_T(s^\star) := \sum_{t=1}^{T} \ell_t(s_t) - \ell_t(s^\star).$$

Betting games are useful for designing online convex optimization algorithms since the bounds on the minimum wealth can be used to derive the corresponding regret bounds.
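
To make the betting game concrete, here is a minimal Python sketch of KT betting applied to the pinball loss, with coins $c_t = -g_t$ and the betting fraction taken as the running KT average of past coins. The function name `kt_betting_radii`, the toy scores, and the tie-breaking rule for the subgradient are assumptions made for illustration.

```python
def kt_betting_radii(scores, alpha):
    """Track the (1 - alpha)-quantile of a score stream by KT coin betting.

    Returns the bets (radii) placed at each round and the final wealth.
    """
    wealth = 1.0    # W_0 = 1
    fraction = 0.0  # lambda_1 = 0
    s = 0.0         # s_1 = lambda_1 * W_0 = 0
    radii = []
    for t, score in enumerate(scores, start=1):
        radii.append(s)
        # Subgradient of the pinball loss at s (alpha is used at a tie).
        g = alpha if score <= s else alpha - 1.0
        c = -g                       # coin outcome, lies in [-1, 1]
        wealth += s * c              # W_t = W_{t-1} + s_t * c_t
        # KT fraction: average of the coins c_1..c_t with denominator t + 1.
        fraction = (t * fraction + c) / (t + 1)
        s = fraction * wealth        # next bet s_{t+1} = lambda_{t+1} * W_t
    return radii, wealth


# Toy stream (assumption): three identical scores, alpha = 0.5.
radii, final_wealth = kt_betting_radii([1.0, 1.0, 1.0], alpha=0.5)
```

No learning rate appears anywhere in the sketch: repeated misses push the fraction up and the wealth compounds, so the bets grow geometrically until they reach the scale of the scores.
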
Both of the considered betting strategies yield online convex optimization algorithms with sub-linear regret in our setting. In particular, taking $\ell_t(s) = \ell_{1-\alpha}(s, S_t)$ for $t \ge 1$, we get a sequence of convex and $(1-\alpha)$-Lipschitz (assuming $\alpha < 1/2$) loss functions. Therefore, for a sequence of radii $(s_t)_{t=1}^{T}$ obtained using the KT estimator, it holds that:

$$R_T(s^\star) \le 1 + |s^\star| \sqrt{4T \ln\left(1 + |C T s^\star|\right)}, \quad \forall s^\star \in \mathbb{R}, \tag{6}$$

for some universal constant $C$ (Orabona & Pál, 2016). For online subgradient descent, the corresponding regret bound involves the learning rate parameter. More precisely, it can be shown for online subgradient descent that:

$$R_T(s^\star) \le \frac{(s^\star)^2}{2\eta} + \frac{\eta T}{2},$$

and hence, the learning rate which minimizes the upper bound is $\eta = |s^\star| / \sqrt{T}$. As discussed later in this section, the boundedness of the nonconformity scores, $|S_t| \le D$ for some $D > 0$ and all $t \ge 1$, is a necessary condition to ensure long-term coverage (1). In this case, using $\eta = D / \sqrt{T}$ results in the regret bound $D\sqrt{T}$, which is known to be optimal for bounded domains up to multiplicative constants. However, implementing the resulting algorithm in practice still requires explicit knowledge of $D$. In contrast, utilizing KT betting results in a regret bound that is sub-optimal by logarithmic factors (regret being a secondary metric of interest for us), but avoids tuning any parameters. We summarize the adaptive conformal predictor that utilizes the KT betting strategy in Algorithm 1 and defer the description of the adaptive conformal predictor that uses the ONS betting scheme to Appendix A.

Algorithm 1. KT-based Adaptive Conformal Predictor.

Initialize: $\alpha \in (0, 1)$, $W_0 = 1$, $\lambda_1 = 0$, $s_1 = 0$.

For $t = 1, 2, \dots$ do:

- Produce a forecast $\hat{Y}_t = f_t(X_t, \{(X_i, Y_i)\}_{i \le t-1})$ and output the set $\hat{C}_t(s_t) = [\hat{Y}_t - s_t, \hat{Y}_t + s_t]$;
- Observe $Y_t$ and compute the error $S_t = |Y_t - \hat{Y}_t|$;
- Compute $g_t \in \partial \ell_{1-\alpha}(s, S_t)|_{s = s_t}$ as per (3);
- Set $W_t = W_{t-1} - g_t s_t$;
- Set $\lambda_{t+1} = \frac{t}{t+1} \lambda_t - \frac{1}{t+1} g_t$;
- Set $s_{t+1} = \lambda_{t+1} W_t$.

End for.

While the sub-linear regret guarantees are helpful in eliminating trivial conformal predictors, the coverage guarantee outlined in (1) does not directly follow from the regret bound. These guarantees have to be derived independently. In the following result, we establish that the proposed approach for online conformal inference based on the KT betting strategy attains a long-term miscoverage rate precisely equal to the nominal level $\alpha$. The proof is deferred to Appendix B.

Theorem 2.1. Fix the target miscoverage level $\alpha \in (0, 1/2)$. Suppose that the nonconformity scores are bounded: $S_t \in [0, D]$ for $t = 1, 2, \dots$, for some $D > 0$. Then the adaptive conformal predictor defined in Algorithm 1 satisfies the long-term coverage guarantee (1).

The boundedness of the nonconformity scores is the only assumption made in Theorem 2.1 to ensure coverage of the proposed conformal predictor. The method itself does not depend on explicit knowledge of such a bound. It is easy to see that for the KT-based online conformal predictor outlined in Algorithm 1, the assumption regarding bounded scores is indeed necessary for achieving (1). If this assumption is violated, then one can easily construct an adversarial example where the miscoverage rate is actually equal to one: once the radius $s_t$ is predicted, it is always possible to choose a response value $Y_t$ that lies outside of the predicted interval, resulting in an error at each round. Finally, we note that the same argument can be used to show that the boundedness assumption is also necessary for conformal predictors whose radii are updated according to (4) or (5) to satisfy the long-term coverage guarantee (1).

## 3. Experiments

In our empirical study, we consider a collection of simulated and real datasets where the data distribution changes over time, and hence, the exchangeability assumption is violated. Throughout all experiments, we fix the target coverage level at 90% ($\alpha = 0.1$). We compare adaptive conformal predictors learned using the proposed betting scheme against those that are learned using OGD (4) and SF-OGD (5). We demonstrate that our method, without any parameter tuning, achieves performance that either matches or is close to that of a conformal predictor obtained by deploying versions of online gradient descent with carefully tuned learning rates.

Changepoint Setting. Following Barber et al. (2023), we consider a changepoint setting where the data $\{(X_t, Y_t)\}_{t=1}^{n}$ are generated according to a linear model:

$$Y_t = X_t^\top \beta_t + \varepsilon_t, \quad X_t \sim N(0, I_4), \quad \varepsilon_t \sim N(0, 1), \quad t \ge 1.$$

We consider the following scenario:

$$\beta_t = \begin{cases} \beta^{(0)} = (2, 1, 0, 0)^\top, & t = 1, \dots, 500, \\ \beta^{(1)} = (0, 2, 1, 0)^\top, & t = 501, \dots, 1500, \\ \beta^{(2)} = (0, 0, 2, 1)^\top, & t = 1501, \dots, 2000, \end{cases}$$

so that the coefficients change twice before time 2000. For prediction, we first use a standard linear regression model whose coefficients are learned by optimizing the least squares objective on the data observed prior to a given time step. In Figure 1, we compare our adaptive conformal predictors against those that are trained using variants of gradient descent with varying learning rates. The top plot shows that the empirical coverage of the conformal predictors learned via versions of online gradient descent is nearly equal to the nominal level whenever the learning rates are high enough. Although our conformal predictor demonstrates slightly lower coverage (around 88%), such a difference is typically of minor practical importance. When the learning rates are too small, the OGD-based conformal predictors demonstrate empirical coverage that is significantly lower than the target level.
Conversely, with learning rates that are overly high, the average width of the output sets increases, resulting in overly conservative sets, as observed in the bottom plot. In fact, such conformal predictors oscillate between outputting overly narrow and overly wide sets (since the learning rate is high and fixed).

Figure 1. Comparison of the proposed conformal predictor against those learned via OGD/SF-OGD with different learning rates. The performance of the conformal predictors learned via OGD/SF-OGD is sensitive to the choice of the learning rate, whereas the performance of the betting-based ones is close (in terms of coverage and width) to that of the carefully tuned alternative methods. The results are aggregated over 200 random seeds.

While useful, the above findings provide limited insights into the adaptability of conformal predictors to changes in distribution. To illustrate such adaptability properties, we compare the localized coverage and width of our conformal predictors and those trained via online gradient descent for three particular choices of learning rates in Figure 2. We observe a drastic impact of the learning rate on the performance of the resulting conformal predictor. For the OGD-based method, the output sets are generally overly conservative (top-right plot) for a large learning rate ($\eta = 4$). This issue is addressed by SF-OGD (bottom-right plot), whose learning rate effectively decreases over time. However, if the learning rate becomes too low ($\eta < 1$), the localized coverage of the SF-OGD-based conformal predictor recovers very slowly after changes in distribution take place. As alluded to before, if the learning rate for OGD-based conformal predictors is set too high, the resulting sets may become volatile. We illustrate this problem in Figure 3, where the KT-based conformal predictor is compared against the one based on OGD with $\eta \in \{0.25, 1\}$.
For each approach, we estimate the local deviation of the interval width using a rolling window of size 10. Although the empirical coverage and average width of the uncertainty intervals produced by the conformal predictor based on OGD with $\eta = 1$ are close to those of the KT-based one (Figure 2), we observe that the local deviation of the interval width for the former method is much higher, indicating that the corresponding width changes abruptly between consecutive time steps.

Under an abrupt change in the data distribution, the predictive accuracy may drop if the model is trained on all data (without considering potential shifts in distribution). Predictive models which are trained using online gradient descent or which utilize weighting schemes, with higher weights being assigned to the most recent datapoints, may adapt to shifts in distribution much faster. We consider the second option and refer the reader to Appendix C.1 for a comparison between various methods for adaptive conformal inference when a linear model, whose coefficients are learned by optimizing the weighted least squares objective, is used.

Electricity Demand Data. Next, we consider the dataset for forecasting the electricity demand in New South Wales (Harries, 1999). Following Angelopoulos et al. (2023), we use an AR(3) model as the underlying predictor. In Figure 4, we compare the coverage and width of conformal predictors constructed using betting schemes against those based on online gradient descent with varying learning rates. We present the results for SF-OGD only, deferring those for OGD to Appendix C.2. We observe that the four conformal predictors demonstrate similar performance: for all methods, the empirical coverage is near the nominal level and the resulting prediction sets have roughly similar width. As we illustrate in Appendix C.2 (particularly, Figure 12), the resulting prediction sets become visually indistinguishable after processing a relatively small number of observations.
In contrast, OGD with the same learning rate ($\eta = 0.1$) yields prediction sets that are significantly wider on average than those of the alternative methods. This is due to using a fixed learning rate throughout the whole process; see Figure 10 for details.

In applications, practitioners are often interested in multi-step forecasting for some horizon $H$. Adaptive conformal predictors may handle such cases by simply associating each step with a separate radius, $s^{(1)}, \dots, s^{(H)}$, and deploying online optimization schemes (e.g., the one in Algorithm 1) independently for each of the parameters. On the same electricity dataset, we consider the problem of uncertainty quantification in multi-step forecasting, setting the horizon $H = 5$. We use an AR(3) model and a simple approach where the $k$-step-ahead forecast is used as an input feature for constructing the forecast at the $(k+1)$-st step. The parameters of the model are updated each time the next 5 true responses are revealed. In Table 1, we summarize the average empirical coverage and average size of the prediction set for conformal predictors learned using the KT betting scheme and versions of online gradient descent with $\eta = 0.01$. In terms of global metrics, all methods demonstrate similar performance. In Figure 5, we illustrate localized coverage and width for all methods, restricting attention to the last step in the forecasting horizon ($k = 5$). In this case, we observe that the KT-based conformal predictor exhibits behavior that is closer to that based on OGD (4) than to that based on SF-OGD (5): for the two former methods, localized coverage is more tightly concentrated around the nominal level. In contrast, the conformal predictor based on SF-OGD happens to either undercover or be overly conservative over long periods of time, failing to respond quickly to the changes in the data distribution.
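
The per-step construction described above can be sketched as follows: one independent radius tracker per forecasting step, each updated only with the score observed at its own step. The class `KTRadius` and the helper `update_horizons` are hypothetical names introduced here for illustration; the update inside the class follows the KT betting recursion, with ties in the subgradient resolved as an assumption.

```python
class KTRadius:
    """State of one KT-betting radius tracker: wealth, betting fraction, radius."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.wealth = 1.0
        self.fraction = 0.0
        self.radius = 0.0
        self.t = 0

    def update(self, score):
        """Update the radius after observing the nonconformity score."""
        self.t += 1
        # Subgradient of the pinball loss at the current radius (alpha at a tie).
        g = self.alpha if score <= self.radius else self.alpha - 1.0
        self.wealth -= g * self.radius
        self.fraction = (self.t * self.fraction - g) / (self.t + 1)
        self.radius = self.fraction * self.wealth


def update_horizons(trackers, scores):
    """Feed the score of the k-th forecasting step to the k-th tracker only."""
    for tracker, score in zip(trackers, scores):
        tracker.update(score)
    return [tracker.radius for tracker in trackers]


# One tracker per step of a horizon H = 5, all targeting 90% coverage.
H = 5
trackers = [KTRadius(alpha=0.1) for _ in range(H)]
```

Because the trackers share no state, adding a step to the horizon adds one more tracker and no new tuning parameter.
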
Coverage Width k KT OGD SF-OGD KT OGD SF-OGD 1 89.1 89.9 90.1 7.58 8.36 7.92 2 89 89.9 90.1 14.3 14.9 14.6 3 89 89.9 89.8 21.6 22.6 21.8 4 89 89.9 89.5 28.6 30 28.7 5 88.9 89.8 89.2 35.3 36.3 35 Table 1. The results for uncertainty quantification in k-step ahead electricity demand forecasting. The empirical coverage of conformal predictors learned using versions of online gradient descent is closer to the nominal level, yet the coverage of that learned using betting schemes is only slightly below. The KT-based conformal predictor yields shorter prediction sets on average (four out of five cases). The empirical coverage is shown in percentages. The average width of prediction sets has been multiplied by 100. Stock Prices Data. Finally, we consider uncertainty quantification in the problem of forecasting stock prices. In particular, we use the closing prices of five different stocks: Adaptive Conformal Inference by Betting 0 250 500 750 1000 1250 1500 1750 2000 Time 0 250 500 750 1000 1250 1500 1750 2000 Time KT ONS = 1 = 0.25 = 4 Figure 2. Comparison of the conformal predictor trained using parameter-free optimization techniques (KT, ONS) against those trained using variants of online gradient descent with varying learning rates (OGD, SF-OGD). We avoid plotting results observed for the first 50 observations. The results are aggregated over 250 random seeds and smoothed using rolling window of size 10. 0 250 500 750 1000 1250 1500 1750 2000 Time Local Deviation of Width Figure 3. Local deviation of the width of the uncertainty sets returned by KT-based and OGD-based conformal predictors. If the learning rate for OGD-based conformal predictor is set too high, the width of the output sets may change abruptly between consecutive time steps. Deviations are computed using rolling window of size 10 and are averaged over 250 random seeds. 
Apple (AAPL), Meta (META), Microsoft (MSFT), Netflix (NFLX), and Walmart (WMT), collected over a five-year period (from January 25, 2019 to January 24, 2024); see Figure 6 and footnote 1. Rather than forecasting the closing price for the next day only, we consider multi-horizon forecasting for each calendar week (i.e., forecasting horizon $H = 5$): each day of the week is associated with its own radius, which is updated on a weekly basis, i.e., $s^{(1)}$ is a radius that is used to construct uncertainty estimates for Mondays exclusively. We take into account the days of market closure due to holidays as follows. For example, if the first trading day of a week happens to be Wednesday, we use the radius $s^{(3)}$ to construct the corresponding prediction interval. Subsequently, after prices in that week are observed, there is no update to $s^{(1)}$ and $s^{(2)}$.

We use Prophet (Taylor & Letham, 2018) as our prediction model and operate in the log-space for forecasting. For each stock, the first 25 weeks of data are used to train the initial model, followed by retraining at the end of each subsequent week. For conformal predictors based on variants of online gradient descent, we compute the in-sample residuals for the initial model and use the empirical absolute error as a learning rate. Although the in-sample residuals underestimate the out-of-sample residuals, one may want to try a smaller learning rate, particularly for the OGD-based conformal predictor (4). However, our empirical observations indicated that such a choice worsens the results in terms of coverage. In addition to conformal methods, we also consider the prediction intervals that Prophet provides as a native uncertainty quantification tool.

Figure 4. Comparison between conformal predictors constructed using betting schemes and SF-OGD (5) with $\eta \in \{0.01, 0.1\}$ for one-step ahead forecasting. For all methods, the empirical coverage is near the nominal level and the resulting prediction sets have roughly similar width. The results are smoothed over a rolling window of 100 observations. (Panels (a) and (b): One-step ahead; x-axis: Time step.)

Figure 5. Comparison between conformal predictors constructed using betting schemes and versions of online gradient descent (4) and (5) with $\eta = 0.01$ for five-step ahead forecasting. The results are presented for the fifth step and are smoothed over a rolling window of 100 observations. (Panels (a) and (b): Five-step ahead; x-axis: Time step; legend: KT, SF-OGD, OGD.)

Figure 6. Visualization of the prices for the selected stocks over five years. (x-axis: Date, 2019 to 2024; legend: MSFT, AAPL, WMT, META, NFLX.)

Footnote 1. Example of the data source for one of the stocks.

In Table 2, we present an overview of the empirical coverage results aggregated across five stocks. We observe that all methods for uncertainty quantification demonstrate an average coverage lower than the nominal level. The uncertainty intervals provided by Prophet stand out as significantly suboptimal compared to alternative methods. In Appendix C.3, we further illustrate that Prophet fails to adapt to multi-step forecasting: its set size increases only marginally with the number of steps ahead. Amongst the other methods, the conformal predictor based on OGD (4) demonstrates the coverage that is closest to the nominal level. However, we note that the KT-based conformal predictor is only slightly inferior, while requiring no parameter tuning. Unlike standard OGD, the effective learning rate of SF-OGD decreases over time. This in turn has a direct impact on the poor empirical coverage of the resulting conformal predictor: a decreasing learning rate hurts the ability of the uncertainty estimates to adjust appropriately in response to the drop of model accuracy.
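The decay of the effective learning rate of SF-OGD can be illustrated with a minimal Python sketch. It assumes that the scale-free update takes the commonly used form $s_{t+1} = s_t - \eta\, g_t / \sqrt{\sum_{i \le t} g_i^2}$; update (5) is not reproduced in this section, so this form, the helper name `sf_ogd_effective_rates`, and the synthetic gradient sequence are assumptions made for illustration only.

```python
import math


def sf_ogd_effective_rates(grads, eta):
    """Effective step size eta / sqrt(sum_{i<=t} g_i^2) at each step t.

    Hypothetical helper: the normalization is an assumed form of the
    scale-free update, not code taken from the paper.
    """
    rates, sq_sum = [], 0.0
    for g in grads:
        sq_sum += g * g
        rates.append(eta / math.sqrt(sq_sum))
    return rates


alpha = 0.1
# Synthetic pinball-loss subgradients: a miss (alpha - 1) every tenth step,
# a cover (alpha) otherwise. This sequence is illustrative, not real data.
grads = [alpha - 1.0 if t % 10 == 0 else alpha for t in range(1, 1001)]
rates = sf_ogd_effective_rates(grads, eta=1.0)
```

Because every subgradient is either $\alpha$ or $\alpha - 1$, the running sum of squares grows linearly in $t$, so under this assumed form the effective step size shrinks roughly like $1/\sqrt{t}$ regardless of $\eta$.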
We refer the reader to Appendix C.3 for additional results about the performance of different adaptive conformal predictors in stock price forecasting. In particular, we demonstrate that in some cases, the KT-based conformal predictor yields shorter intervals than that based on OGD, despite showing similar coverage.

| k | KT | OGD | SF-OGD | Native method |
|---|---|---|---|---|
| 1 | 84.8 | 86 | 79.6 | 69.4 |
| 2 | 85.0 | 86.4 | 79.5 | 67.6 |
| 3 | 84.9 | 86.4 | 78.8 | 62.9 |
| 4 | 84.6 | 85.9 | 76.9 | 60.4 |
| 5 | 84.6 | 85.4 | 76.3 | 58.8 |

Table 2. Empirical coverage comparison in k-step ahead stock price forecasting. The OGD-based conformal predictor demonstrates coverage that is closest to the nominal level, with the KT-based one being slightly inferior. The native prediction intervals offered by Prophet show the lowest coverage across all steps.

4. Conclusion

A number of methods have been recently proposed for online conformal inference, primarily utilizing versions of online gradient descent. Such methods are generally sensitive to the choice of (possibly a grid of) learning rates, and hence, usually require careful tuning. Our primary contribution lies in demonstrating that parameter-free online convex optimization techniques can effectively address this issue, resulting in a compelling method for adaptive conformal inference. Despite its simplicity, our approach is advantageous from several standpoints. First, our online conformal predictor provably achieves long-term coverage. Second, additional properties, such as sub-linear regret, justify its practical utility. We note that the absence of tuning comes at a cost: while our method is guaranteed to achieve the correct coverage rate in the limit, it may demonstrate coverage that is lower than the nominal level in the finite-sample regime (although the difference is usually small and is of little practical interest).
Methods that are based on versions of online gradient descent often demonstrate marginal coverage that is closer to the nominal level, but run the risk of failing to adapt to distribution shifts, yielding either overly conservative or overly optimistic prediction sets over spans of time as a result. Empirical evidence demonstrates that our method generally matches or performs only slightly worse than methods based on gradient descent with carefully tuned parameters. Therefore, using betting-based conformal inference can be advantageous in scenarios where precision is prioritized (given an acceptable coverage). The primary strength of our method lies in its simplicity and ease of implementation, making it a practical and accessible choice for applications. Therefore, we view it as a useful method in the toolbox of machine learning practitioners.

Impact Statement

This paper presents general methodological work whose goal is to advance the field of Machine Learning. By providing open access to the code as a supplement for the purposes of transparency and reproducibility, our work aims to reach better understanding within the research community. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Angelopoulos, A., Candès, E. J., and Tibshirani, R. J. Conformal PID control for time series prediction. In Advances in Neural Information Processing Systems, 2023.
Angelopoulos, A. N. and Bates, S. Conformal prediction: A gentle introduction. Foundations and Trends in Machine Learning, 2023.
Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. Predictive inference with the jackknife+. The Annals of Statistics, 2021.
Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. Conformal prediction beyond exchangeability. The Annals of Statistics, 2023.
Bhatnagar, A., Wang, H., Xiong, C., and Bai, Y. Improved online conformal prediction via strongly adaptive online learning. In International Conference on Machine Learning, 2023.
Candès, E. J., Lei, L., and Ren, Z. Conformalized survival analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2023.
Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press, 2006.
Chernozhukov, V., Wüthrich, K., and Zhu, Y. Exact and robust conformal inference methods for predictive machine learning with dependent data. In Conference On Learning Theory, 2018.
Cutkosky, A. and Orabona, F. Black-box reductions for parameter-free online learning in Banach spaces. In Conference On Learning Theory, 2018.
Fannjiang, C., Bates, S., Angelopoulos, A. N., Listgarten, J., and Jordan, M. I. Conformal prediction under feedback covariate shift for biomolecular design. Proceedings of the National Academy of Sciences, 2022.
Gibbs, I. and Candès, E. J. Adaptive conformal inference under distribution shift. In Advances in Neural Information Processing Systems, 2021.
Gibbs, I. and Candès, E. J. Conformal inference for online prediction with arbitrary distribution shifts. arXiv preprint: 2305.12616, 2022.
Gupta, C., Kuchibhotla, A. K., and Ramdas, A. Nested conformal prediction and quantile out-of-bag ensemble methods. Pattern Recognition, 2022.
Harries, M. Splice-2 comparative evaluation: Electricity pricing. Technical report, University of New South Wales, 1999.
Hazan, E., Agarwal, A., and Kale, S. Logarithmic regret algorithms for online convex optimization. Machine Learning, 2007.
Kelly, J. L. A new interpretation of information rate. IRE Transactions on Information Theory, 1956.
Krichevsky, R. and Trofimov, V. The performance of universal encoding. IEEE Transactions on Information Theory, 1981.
Lei, L. and Candès, E. J. Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2021.
Orabona, F. and Pál, D.
Coin betting and parameter-free online learning. In Advances in Neural Information Processing Systems, 2016.
Orabona, F. and Pál, D. Scale-free online learning. Theoretical Computer Science, 2018.
Podkopaev, A. and Ramdas, A. Distribution-free uncertainty quantification for classification under label shift. In Uncertainty in Artificial Intelligence, 2021.
Romano, Y., Patterson, E., and Candès, E. Conformalized quantile regression. In Advances in Neural Information Processing Systems, 2019.
Stankeviciute, K., Alaa, A. M., and van der Schaar, M. Conformal time-series forecasting. In Advances in Neural Information Processing Systems, 2021.
Taylor, S. J. and Letham, B. Forecasting at scale. The American Statistician, 2018.
Tibshirani, R. J., Barber, R. F., Candès, E., and Ramdas, A. Conformal prediction under covariate shift. In Advances in Neural Information Processing Systems, 2019.
Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Springer-Verlag, 2005.
Xu, C. and Xie, Y. Conformal prediction interval for dynamic time-series. In International Conference on Machine Learning, 2021.
Xu, C. and Xie, Y. Sequential predictive conformal inference for time series. In International Conference on Machine Learning, 2023.
Zaffran, M., Feron, O., Goude, Y., Josse, J., and Dieuleveut, A. Adaptive conformal predictions for time series. In International Conference on Machine Learning, 2022.

A. Omitted Details

Online Conformal Predictor with ONS Bets. A complete description of the online conformal predictor that uses bets provided by the online Newton step (ONS) is given in Algorithm 2.

Algorithm 2 ONS-based Online Conformal Predictor.
Initialize: $W_0 = 1$, $\lambda_1 = 0$, $A_0 = 1$, $\alpha \in (0, 1)$.
for $t = 1, 2, \dots$ do
    Produce a forecast $\hat{Y}_t = f_t(X_t, \{(X_i, Y_i)\}_{i \leq t-1})$ and output a set: $\hat{C}_t(s_t) = [\hat{Y}_t - s_t; \hat{Y}_t + s_t]$;
    Observe $Y_t$ and compute error: $S_t = |Y_t - \hat{Y}_t|$;
    Compute $g_t = \partial_s \ell_{1-\alpha}(s, S_t)|_{s = s_t}$;
    Set $W_t = W_{t-1} - g_t s_t$;
    Set $z_t = g_t / (1 - \lambda_t g_t)$;
    Set $A_t = A_{t-1} + z_t^2$;
    Set $\lambda_{t+1} = \max\left\{\min\left\{\lambda_t - \frac{2}{2 - \log(3)} \frac{z_t}{A_t}, \frac{1}{2}\right\}, -\frac{1}{2}\right\}$;
    Set $s_{t+1} = \lambda_{t+1} W_t$;
end for

While the ONS betting scheme tends to yield adaptive conformal predictors with very impressive empirical performance, the corresponding betting fractions $(\lambda_t)_{t \geq 1}$ are defined recursively, which complicates the theoretical analysis of the resulting adaptive conformal predictors.

Theorem 2.1. Fix the target miscoverage level $\alpha \in (0, 1/2)$. Suppose that the nonconformity scores are bounded: $S_t \in [0, D]$ for $t = 1, 2, \dots$, for some $D > 0$. Then the adaptive conformal predictor defined in Algorithm 1 satisfies the long-term coverage guarantee (1).

Proof.
1. First, note that under the assumption that the nonconformity scores are bounded, $S_i \leq D$, $i = 1, 2, \dots$, for some $D > 0$, the following statements hold:

(a) Suppose that for some $i \geq 1$, the predicted radius $s_i$ exceeds the upper bound $D$: $s_i > D$. Since $s_i = \lambda_i W_{i-1}$ and the wealth is nonnegative, $W_{i-1} \geq 0$, it follows that $\lambda_i > 0$. Further, the corresponding (sub)gradient is $g_i = \alpha - 1\{Y_i \notin \hat{C}_i(X_i)\} = \alpha - 1\{S_i > s_i\} = \alpha$, which in turn implies that $W_i = W_{i-1}(1 - \lambda_i g_i) < W_{i-1}$. For the KT estimator, it holds that:
$$\lambda_{i+1} = \frac{i}{i+1}\lambda_i - \frac{1}{i+1} g_i < \lambda_i.$$
In other words, we get that $s_{i+1} = \lambda_{i+1} W_i < s_i$, meaning that the predicted radius for the next step necessarily decreases, and this process repeats until the predicted radius becomes less than or equal to $D$.

(b) Suppose that for some $i \geq 1$, it holds that $s_i \geq 0$, but $s_{i+1} < 0$. Then it has to be the case that $s_{i+2} > 0$. Indeed, $s_i \geq 0$ implies that $\lambda_i \geq 0$ and $s_{i+1} < 0$ implies that $\lambda_{i+1} < 0$. Next, note that for the KT estimator, it holds that:
$$0 > \lambda_{i+1} = \frac{i}{i+1}\lambda_i - \frac{1}{i+1} g_i,$$
which implies that $g_i > 0$, and hence, $g_i = \alpha$. Since $S_{i+1} \geq 0$, it holds that $g_{i+1} = \alpha - 1\{S_{i+1} > s_{i+1}\} = \alpha - 1$.
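Then
$$\lambda_{i+2} = \frac{i+1}{i+2}\lambda_{i+1} - \frac{1}{i+2} g_{i+1} = \frac{i+1}{i+2}\left(\frac{i}{i+1}\lambda_i - \frac{1}{i+1} g_i\right) - \frac{1}{i+2} g_{i+1} = \frac{i}{i+2}\lambda_i - \frac{1}{i+2}(g_i + g_{i+1}).$$
Hence, since $\lambda_i \geq 0$ and $g_i + g_{i+1} = 2\alpha - 1 < 0$ (where we make a mild assumption that $\alpha < 0.5$), we conclude that $\lambda_{i+2} > 0$, and hence, $s_{i+2} > 0$.

2. Since for any $t \geq 1$, $W_t = 1 - \sum_{i=1}^{t} s_i g_i \geq 0$, we get that $\sum_{i=1}^{t} s_i g_i \leq 1$. On the other hand, recall that if $s_i > D$, then $g_i = \alpha > 0$, and if $s_i < 0$, then $g_i = \alpha - 1 < 0$. Hence,
$$\sum_{i=1}^{t} g_i s_i = \sum_{i=1}^{t} \underbrace{g_i s_i}_{>0} 1\{s_i > D\} + \sum_{i=1}^{t} g_i s_i 1\{s_i \in [0, D]\} + \sum_{i=1}^{t} \underbrace{g_i s_i}_{>0} 1\{s_i < 0\} \geq \sum_{i=1}^{t} g_i s_i 1\{s_i \in [0, D]\} \geq -Dt,$$
where the last inequality uses $|g_i| \leq 1$. We have shown that $-Dt \leq \sum_{i=1}^{t} s_i g_i \leq 1$, and hence,
$$\left|\sum_{i=1}^{t} s_i g_i\right| \leq \max\{1, Dt\} \leq Dt + 1. \quad (7)$$
Next, we bound the distance between the consecutive predicted radii. Observe that for the KT bettor, using $t s_t = -\left(\sum_{i=1}^{t-1} g_i\right) W_{t-1}$ and $W_{t-1} = 1 - \sum_{i=1}^{t-1} g_i s_i$:
$$s_{t+1} = -\frac{\sum_{i=1}^{t} g_i}{t+1} W_t = -\frac{\sum_{i=1}^{t-1} g_i + g_t}{t+1}\left(W_{t-1} - g_t s_t\right) = \frac{t}{t+1} s_t + \frac{1}{t+1}\left(-g_t + g_t \sum_{i=1}^{t-1} g_i s_i + g_t s_t \sum_{i=1}^{t} g_i\right),$$
so that
$$s_{t+1} - s_t = \frac{1}{t+1}\left(-s_t - g_t + g_t \sum_{i=1}^{t-1} g_i s_i + g_t s_t \sum_{i=1}^{t} g_i\right). \quad (8)$$
From (8) and (7), it follows that, for $s_t \in [0, D]$:
$$|s_{t+1} - s_t| \leq \frac{1}{t+1}\left(D + 1 + D(t-1) + 1 + Dt\right) \leq 2D + 1.$$
Combining that with the fact that $s_1 = 0 \in [0, D]$ and the result in step 1, we conclude that the iterates of the KT algorithm are bounded: $|s_t| \leq 3D + 1$.

3. Finally, we show that if (1) fails to hold, then the iterates of the KT bettor cannot be bounded. Note that:
$$\left|\frac{1}{T}\sum_{i=1}^{T} 1\{Y_i \notin \hat{C}_i(X_i)\} - \alpha\right| = \left|\frac{1}{T}\sum_{i=1}^{T} g_i\right|,$$
where $g_i$ are defined in Algorithm 1. Next, suppose that (1) is not true, that is, $\exists \varepsilon > 0 : \forall T_0 \ \exists T > T_0 : \left|\frac{1}{T}\sum_{i=1}^{T} g_i\right| \geq \varepsilon$. Since
$$|s_{t+1}| = |\lambda_{t+1} W_t| = \frac{1}{t+1}\left|\sum_{i=1}^{t} g_i\right| W_t,$$
we have that $\exists \varepsilon > 0 : \forall T_0 \ \exists T > T_0$ such that:
$$|s_{T+1}| = \frac{1}{T+1}\left|\sum_{i=1}^{T} g_i\right| W_T \geq \frac{T}{T+1}\, \varepsilon\, W_T.$$
For the KT bettor, the wealth $W_T$ satisfies a lower bound due to Orabona & Pál (2016), in which $K > 0$ is a universal constant. Hence, for every $T_0$ there exists $T > T_0$ at which this bound applies to $|s_{T+1}|$, implying that the iterates are unbounded. We have thus reached a contradiction with the conclusion of step 2, and therefore the coverage guarantee (1) has to hold. This completes the proof.

C. Additional Experiments

In this Section, we present additional simulations to Section 3.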
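The simulations in this appendix run both the KT- and ONS-based predictors. As a companion, the following minimal Python sketch implements the radius updates, following the recursions stated in Algorithm 2 and the KT recursion $\lambda_{t+1} = \frac{t}{t+1}\lambda_t - \frac{1}{t+1} g_t$ used in the proof of Theorem 2.1. The function names and the simplification of feeding precomputed nonconformity scores (rather than a forecasting model) are assumptions made for illustration; this is not the authors' released code.

```python
import math


def pinball_subgradient(s, score, alpha):
    """Subgradient in s of the pinball loss: alpha - 1{score > s}."""
    return alpha - (1.0 if score > s else 0.0)


def run_betting_predictor(scores, alpha, bettor="kt"):
    """Return the radii s_1, ..., s_T played against the given scores.

    Hypothetical helper: `scores` stands in for S_t = |Y_t - Yhat_t|.
    """
    wealth, lam, a = 1.0, 0.0, 1.0  # W_0, lambda_1, A_0
    radii = []
    for t, score in enumerate(scores, start=1):
        s = lam * wealth  # s_t = lambda_t * W_{t-1}
        radii.append(s)
        g = pinball_subgradient(s, score, alpha)
        if bettor == "ons":
            z = g / (1.0 - lam * g)
            a += z * z
            new_lam = lam - 2.0 / (2.0 - math.log(3.0)) * z / a
            new_lam = max(min(new_lam, 0.5), -0.5)  # clip to [-1/2, 1/2]
        else:
            new_lam = t / (t + 1) * lam - g / (t + 1)  # KT recursion
        wealth -= g * s  # W_t = W_{t-1} - g_t * s_t
        lam = new_lam
    return radii
```

For instance, `run_betting_predictor(scores, alpha=0.1)` returns one radius per score; neither variant takes a learning rate as input.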
Section C.1 is devoted to the changepoint setting. Section C.2 is devoted to the experiment with the electricity demand dataset (Harries, 1999).

C.1. Changepoint Setting and Weighted Least Squares Model

Here, we compare adaptive conformal predictors that are learned using parameter-free optimization techniques against those that are trained via versions of online gradient descent, and hence, require specifying the learning rates (see Section 2 for details). As an underlying model, we use a linear model whose coefficients are learned by optimizing the weighted least squares objective:
$$\sum_{i=1}^{t} w_i (Y_i - X_i^\top \beta)^2.$$
Specifically, with $t$ available training points, the weights $(w_i)_{i=1}^{t}$ are assigned to the first $t$ (ordered) points, where $w_i = 0.99^{t+1-i}$, $i = 1, \dots, t$. The results for varying learning rates are presented in Figure 7. Similar to the case of a standard linear model, adaptive conformal predictors that utilize betting schemes tend to slightly undercover after processing 2000 observations. Using learning rates that are too high results in conformal predictors that output overly conservative sets. For example, OGD with $\eta = 4$ yields conformal predictors that output sets which are more than 50% larger than those corresponding to KT betting. The results for localized coverage and width for a subset of learning rates are presented in Figure 8. We observe that the performance of the proposed parameter-free approaches is close to or matches that of the competitors with carefully tuned learning rates. Our conformal predictors quickly restore coverage after a change in distribution has occurred and avoid being overly conservative once an underlying model adapts to the new settings (see bottom-left plot and $\eta = 0.25$ or $\eta = 1$).

Figure 7. Comparison of our conformal predictor against those learned via OGD/SF-OGD with varying learning rates.
We observe that the performance of the proposed parameter-free approaches is close to or matches that of the competitors with carefully tuned learning rates. Importantly, it avoids outputting overly conservative sets.

Figure 8. Performance of several methods when the underlying model is a linear model whose coefficients are learned by optimizing the weighted least squares objective. We observe that the performance of the proposed parameter-free approaches is close to or matches that of the competitors with carefully tuned learning rates. The results are aggregated over 250 random seeds and smoothed using a rolling window of size 10. (x-axis: Time; legend: KT, ONS, η = 1, η = 0.25, η = 4.)

C.2. Electricity Demand Dataset

In Figure 9, we compare coverage and width (smoothed over a rolling window of 100 observations) of conformal predictors constructed using betting schemes against those based on OGD with varying learning rates. While for all methods the empirical coverage is near the nominal level, the width of a conformal predictor based on OGD with learning rate $\eta = 0.1$ is consistently higher than that of other methods. In Figure 10, we show the histograms for the ratios of the widths of the prediction intervals obtained from conformal predictors based on versions of online gradient descent to that of conformal predictors based on the KT betting scheme. Ignoring the first 100 observations (warm-up period), the average width of intervals corresponding to OGD with learning rate $\eta = 0.1$ is almost 70% larger than that of the KT-based conformal predictor. For SF-OGD with the same learning rate, this number reduces to only 3%.

Figure 9 (panel (a): Coverage; x-axis: Time step). Comparison between conformal predictors constructed using betting schemes and versions of OGD (4) with learning rates $\eta \in \{0.01, 0.1\}$.
For all methods the empirical coverage is near the nominal level. The width of a conformal predictor based on OGD with learning rate $\eta = 0.1$ is consistently higher than that of other methods. The results are smoothed over a rolling window of 100 observations.

Figure 10. The histograms for the ratios of the widths of the prediction intervals obtained from conformal predictors based on versions of online gradient descent to that of conformal predictors based on the KT betting scheme. The average width of intervals corresponding to OGD with learning rate $\eta = 0.1$ is almost 70% larger than that of the KT-based conformal predictor, whereas for SF-OGD with the same learning rate the number reduces to 3%. For lower learning rates, the average widths are almost equal. (Panel (a): Higher learning rate, ratios Width(OGD, η = 0.1)/Width(KT) and Width(SF-OGD, η = 0.1)/Width(KT). Panel (b): Lower learning rate, ratios Width(OGD, η = 0.01)/Width(KT) and Width(SF-OGD, η = 0.01)/Width(KT).)

In Figure 11, we compare the KT-based conformal predictor against that based on OGD with either of two learning rates: 0.01 or 0.1. The prediction bands for conformal predictors based on KT-betting and OGD with learning rate $\eta = 0.01$ are visually very close, particularly for later time steps. The conformal predictor based on OGD with learning rate $\eta = 0.1$ yields sets that are generally larger. In Figure 12, we compare the KT-based conformal predictor against that based on SF-OGD with the same learning rates: 0.01 or 0.1. The prediction bands for conformal predictors based on KT-betting and SF-OGD with either of the learning rates become visually indistinguishable, especially for later time steps. The difference between the outputs of conformal predictors based on SF-OGD with different learning rates diminishes because the effective learning rate decreases over time.

Figure 11. Prediction bands for conformal predictors which are learned via KT-betting (green), OGD with learning rate $\eta = 0.01$ (yellow), and OGD with learning rate $\eta = 0.1$ (coral). Learning rate $\eta = 0.1$ yields conformal predictors that output overly large prediction sets across all time steps. The KT-based and learning rate $\eta = 0.01$ bands are visually very close, particularly for later time steps. (x-axis: Time; five panels covering time steps 100 to 400, 1100 to 1400, 2100 to 2400, 3100 to 3400, and 4100 to 4400.)

Figure 12. Prediction bands for conformal predictors which are learned via KT-betting (green), SF-OGD with learning rate $\eta = 0.01$ (yellow), and SF-OGD with learning rate $\eta = 0.1$ (coral). While for the initial time steps the bands are close, they become visually indistinguishable for larger time steps. The difference between the two learning rates essentially disappears because the effective learning rate decreases over time. (x-axis: Time; five panels covering time steps 100 to 400, 1100 to 1400, 2100 to 2400, 3100 to 3400, and 4100 to 4400.)

C.3. Stock Prices Data

In this Section, we demonstrate the results of running different approaches for quantifying predictive uncertainty in stock price forecasting for MSFT (Figure 13 and Table 3), META (Figure 14 and Table 4), AAPL (Figure 15 and Table 5), NFLX (Figure 16 and Table 6), and WMT (Figure 17 and Table 7).

| k | Coverage (KT) | Coverage (OGD) | Coverage (SF-OGD) | Coverage (Native) | Width (KT) | Width (OGD) | Width (SF-OGD) | Width (Native) |
|---|---|---|---|---|---|---|---|---|
| 1 | 85.7 | 87.1 | 81.9 | 71.9 | 32.2 | 33.5 | 27.8 | 21.1 |
| 2 | 85.5 | 87.2 | 80.9 | 71.1 | 32.8 | 34.3 | 31 | 21.4 |
| 3 | 85.5 | 86.8 | 80.3 | 63.7 | 36.2 | 39.3 | 31.9 | 21.7 |
| 4 | 85.2 | 86.5 | 77.8 | 63.5 | 38.2 | 42 | 34.6 | 21.4 |
| 5 | 85 | 85.9 | 78 | 61.2 | 39.4 | 42.3 | 33.3 | 21.5 |

Table 3. The empirical coverage and average width of prediction intervals for MSFT stock.
Figure 13. Top row: localized coverage for MSFT stock. The results are averaged over a rolling window of size 30. Bottom row: stock prices on Fridays plotted along with prediction bands corresponding to different methods. (Panels: (a) 1-day ahead, (b) 5-days ahead, (c) 1-day ahead, (d) 5-days ahead; x-axis: Date; bottom-row y-axis: Price; legend: KT, OGD, SF-OGD, Native method.)

| k | Coverage (KT) | Coverage (OGD) | Coverage (SF-OGD) | Coverage (Native) | Width (KT) | Width (OGD) | Width (SF-OGD) | Width (Native) |
|---|---|---|---|---|---|---|---|---|
| 1 | 83.8 | 86.2 | 80.5 | 69 | 54.9 | 60.7 | 47.9 | 31.6 |
| 2 | 84.3 | 86.4 | 80 | 66 | 52.5 | 64.1 | 50.2 | 31.9 |
| 3 | 84.6 | 86.8 | 79.1 | 61.5 | 56.7 | 68 | 55.5 | 32.3 |
| 4 | 84.3 | 85.7 | 78.3 | 58.7 | 63.6 | 74.1 | 55.3 | 31.9 |
| 5 | 84.6 | 85 | 78.4 | 60.8 | 63.5 | 76.4 | 55.5 | 32.1 |

Table 4. The empirical coverage and average width of prediction intervals for META stock.

Figure 14. Top row: localized coverage for META stock. The results are averaged over a rolling window of size 30. Bottom row: stock prices on Fridays plotted along with prediction bands corresponding to different methods. (Panels: (a) 1-day ahead, (b) 5-days ahead, (c) 1-day ahead, (d) 5-days ahead; x-axis: Date; bottom-row y-axis: Price; legend: KT, OGD, SF-OGD, Native method.)
| k | Coverage (KT) | Coverage (OGD) | Coverage (SF-OGD) | Coverage (Native) | Width (KT) | Width (OGD) | Width (SF-OGD) | Width (Native) |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.8 | 84.8 | 77.1 | 64.3 | 21.8 | 27.1 | 20.9 | 14 |
| 2 | 84.7 | 85.5 | 77 | 63 | 24.3 | 25.8 | 22.1 | 14.1 |
| 3 | 84.6 | 85.5 | 76.1 | 58.5 | 23.6 | 28.7 | 22.5 | 14.3 |
| 4 | 84.3 | 85.2 | 74.3 | 54.8 | 24.5 | 28.3 | 22.4 | 14.1 |
| 5 | 84.1 | 84.1 | 73.6 | 53.3 | 27.5 | 31.7 | 23.2 | 14.2 |

Table 5. The empirical coverage and average width of prediction intervals for AAPL stock.

Figure 15. Top row: localized coverage for AAPL stock. The results are averaged over a rolling window of size 30. Bottom row: stock prices on Fridays plotted along with prediction bands corresponding to different methods. (Panels: (a) 1-day ahead, (b) 5-days ahead, (c) 1-day ahead, (d) 5-days ahead; x-axis: Date; bottom-row y-axis: Price; legend: KT, OGD, SF-OGD, Native method.)

| k | Coverage (KT) | Coverage (OGD) | Coverage (SF-OGD) | Coverage (Native) | Width (KT) | Width (OGD) | Width (SF-OGD) | Width (Native) |
|---|---|---|---|---|---|---|---|---|
| 1 | 83.8 | 86.7 | 80.5 | 70.5 | 78.7 | 84.6 | 68.8 | 55.3 |
| 2 | 84.3 | 86.8 | 80.9 | 69.8 | 79 | 87.5 | 69.7 | 55.3 |
| 3 | 83.3 | 86.8 | 80.3 | 64.1 | 82.9 | 94.8 | 76.8 | 55.8 |
| 4 | 83.5 | 87 | 79.1 | 62.6 | 88.9 | 96.5 | 84.5 | 55.5 |
| 5 | 83.7 | 86.3 | 78 | 59.9 | 105.2 | 105.6 | 85.9 | 55.4 |

Table 6. The empirical coverage and average width of prediction intervals for NFLX stock.

Figure 16. Top row: localized coverage for NFLX stock. The results are averaged over a rolling window of size 30. Bottom row: stock prices on Fridays plotted along with prediction bands corresponding to different methods. (Panels: (a) 1-day ahead, (b) 5-days ahead, (c) 1-day ahead, (d) 5-days ahead; x-axis: Date; bottom-row y-axis: Price; legend: KT, OGD, SF-OGD, Native method.)

| k | Coverage (KT) | Coverage (OGD) | Coverage (SF-OGD) | Coverage (Native) | Width (KT) | Width (OGD) | Width (SF-OGD) | Width (Native) |
|---|---|---|---|---|---|---|---|---|
| 1 | 85.7 | 85.2 | 78.1 | 71.4 | 13.5 | 12.7 | 10.4 | 9.1 |
| 2 | 86 | 86 | 78.7 | 68.1 | 13.4 | 13.1 | 11.8 | 9.2 |
| 3 | 86.3 | 86.3 | 78.2 | 66.7 | 14.6 | 14.3 | 11.7 | 9.2 |
| 4 | 85.7 | 85.2 | 75.2 | 62.6 | 15.2 | 15.4 | 12.4 | 9.2 |
| 5 | 85.5 | 85.5 | 73.6 | 58.6 | 16.9 | 16.5 | 12.9 | 9.2 |

Table 7. The empirical coverage and average width of prediction intervals for WMT stock.

Figure 17. Top row: localized coverage for WMT stock. The results are averaged over a rolling window of size 30. Bottom row: stock prices on Fridays plotted along with prediction bands corresponding to different methods. (Panels: (a) 1-day ahead, (b) 5-days ahead, (c) 1-day ahead, (d) 5-days ahead; x-axis: Date; bottom-row y-axis: Price; legend: KT, OGD, SF-OGD, Native method.)
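The localized coverage curves in Figures 13 to 17 are described as averages over a rolling window of size 30. A minimal Python sketch of one way to compute such a curve is given below; the function name `rolling_coverage`, the trailing-window convention, and the treatment of interval endpoints as covered are assumptions made for illustration, since the exact windowing used for the figures is not specified here.

```python
def rolling_coverage(lower, upper, y, window):
    """Fraction of covered targets over each trailing window.

    Hypothetical helper: entry j of the output is the coverage over
    observations j, ..., j + window - 1 (a trailing window ending at
    index j + window - 1).
    """
    covered = [1.0 if lo <= v <= hi else 0.0
               for lo, hi, v in zip(lower, upper, y)]
    out, running = [], 0.0
    for i, c in enumerate(covered):
        running += c
        if i >= window:
            running -= covered[i - window]  # drop the oldest observation
        if i >= window - 1:
            out.append(running / window)
    return out
```

With `window=30`, each output value is the share of the last 30 targets that fell inside their prediction intervals, which can then be compared visually against the nominal level.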