# continuous_prediction_with_experts_advice__990343b5.pdf Journal of Machine Learning Research 25 (2024) 1-32 Submitted 7/22; Revised 11/23; Published 8/24 Continuous Prediction with Experts Advice Nicholas J. A. Harvey nickhar@cs.ubc.ca University of British Columbia Vancouver, BC, Canada Christopher Liaw cvliaw@google.com Google Mountain View, CA, USA Victor S. Portella victorsp@cs.ubc.ca University of British Columbia Vancouver, BC, Canada Editor: Ambuj Tewari Prediction with experts advice is one of the most fundamental problems in online learning and captures many of its technical challenges. A recent line of work has looked at online learning through the lens of differential equations and continuous-time analysis. This viewpoint has yielded optimal results for several problems in online learning. In this paper, we employ continuous-time stochastic calculus in order to study the discrete-time experts problem. We use these tools to design a continuous-time, parameterfree algorithm with improved guarantees on the quantile regret. We then develop an analogous discrete-time algorithm with a very similar analysis and identical quantile regret bounds. Finally, we design an anytime continuous-time algorithm with regret matching the optimal fixed-time rate when the gains are independent Brownian motions; in many settings, this is the most difficult case. This gives some evidence that, even with adversarial gains, the optimal anytime and fixed-time regrets may coincide. Keywords: experts, online learning, stochastic calculus, anytime, quantile regret 1. Introduction One of the cornerstone online learning (OL) tasks is prediction with experts advice or experts problem. In this problem, at each round t = 1, 2, . . . a player picks a probability distribution pt over n experts. Next, an adversary picks gains1 gt [ 1, 1]n for each of the experts. At the end of round t the player receives the expected gains p T t gt of the experts according to pt. The performance of the player is usually measured by the regret: the difference between the best expert s gains in hindsight and the players gains. Albeit classical, the experts problem already captures many of the key theoretical challenges in OL. Determining the minimax optimal regret has been a foundational research vein in OL. We focus on the analysis of optimal rates in two settings: anytime regret and quantile regret. 1. In this paper we use gains in [ 1, 1] instead of costs in [0, 1] due to parallels to random walks and Brownian motion. c 2024 Nicholas J. A. Harvey, Christopher Liaw, Victor S. Portella. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v25/22-0803.html. Harvey, Liaw, Portella An intriguing question is to determine the optimal regret achievable by an anytime algorithm, that is, an algorithm that does not have access to the total number T of rounds. When the algorithm knows T beforehand, which we refer to as the fixed-time setting, the classical Multiplicative Weights Update (MWU) method (Vovk, 1990; Littlestone and Warmuth, 1994) suffers no more than 2T ln n regret, which is optimal in the worst-case (Cesa Bianchi et al., 1997). For the time being, the best anytime regret guarantee known is 2 T ln n using MWU with a time-varying step-size (Bubeck, 2011, Theorem 2.4). It is unknown if for general n there is an algorithm that guarantees regret smaller than 2 T ln n or whether one can prove a lower bound strictly better than 2T ln n. The classical notion of regret may not always be ideal. For example, one might not mind if the player performs badly when compared to the single best expert if they perform well when compared to some ε-quantile of the top experts, denoted by ε-quantile regret. The first algorithms aimed provably good quantile regret were proposed by Chaudhuri et al. (2009). Currently, the best-known ε-quantile regret guarantees are2 2 p 3T ln(1/ε) in the fixed-time setting (Orabona and Pal, 2016) and 4 p T ln(1/ε) in the anytime setting (Chernov and Vovk, 2010). Recently, Negrea et al. (2021) showed a lower-bound of p 2T ln(1/ε) on the ε-regret. Thus, the gap between the upper and lower bounds is 6 in the fixed-time setting and 2 2 in the anytime setting. 1.1 Our Contributions We present a continuous-time variant of the experts problem and use it to study minimax optimal (quantile) regret rates. Our setting is a simpler variant of the framework proposed by Freund (2009), but we use it as a guide in algorithm design. Namely, working in continuous time allows us to utilize powerful analytical tools from stochastic calculus, which often allow for simpler analyzes. In this paper, we use continuous-time techniques to obtain improved bounds on the minimax optimal ε-quantile regret and obtain intriguing results for the anytime continuous regret. Furthermore, we hope this to be a showcase of the potential of the impact of this continuous-time framework in research on the experts problem. Our specific results are as follows. Continuous MWU. To demonstrate the parallels between the discrete and continuous time problems, we describe a continuous-time version of MWU. In Theorem 3.4, we show that we can easily obtain bounds on the continuous regret that match the bestknown regret bounds for MWU: 2T ln n in the fixed time setting and 2 T ln n in the anytime setting. Continuous quantile regret. Taking inspiration from Itô s formula from stochastic calculus, we propose a new algorithm for quantile regret in continuous time. This algorithm has anytime continuous quantile regret bounds whose leading constants are better than any known results for the discrete-case (Theorem 4.5). The bounds hold for all ε (0, 1) and T > 0 simultaneously. The algorithm can be interpreted as parameter-free , since it does not involve a learning rate. 2. Ignoring low order terms relative to p T ln(1/ε) and multiplying by 2 due to gains being in [ 1, 1] instead of [0, 1]. Continuous Prediction with Experts Advice Discretized quantile regret bound. Next, we discretize the algorithm from the previous section while preserving the anytime quantile regret guarantees (Theorem 4.6). This algorithm is also parameter-free, and improves upon the best-known3 quantile regret bounds in the literature. Furthermore, our analysis closely matches the continuous-time analysis. Improved anytime continuous regret with independent experts. We design an anytime continuous-time algorithm that suffers at most 2T ln n regret (almost surely for all T > 0), asymptotically in n, when the gains are independent Brownian motions (Theorem 5.2). A simple argument shows that this is optimal (Proposition 5.3): for any algorithm and fixed time T > 0, the expected regret at time T exceeds 2T ln n(1 o(1)). Thus, against independent experts, the anytime setting is no harder than the fixed-time setting. This shows that independent experts, source of many of the known lowerbounds on regret, are not enough to prove tighter lower-bounds in the anytime setting in continuous time. This can also be seen as evidence that 2T ln n anytime regret against all adversaries might be possible, matching the optimal fixed-time regret. 1.2 Related Work Optimal regret in fixed and anytime settings. The most well-known algorithm for the experts setting is the Multiplicative Weights Update (MWU) algorithm (Littlestone and Warmuth, 1994; Vovk, 1990). In the fixed-time setting (with gains in [ 1, 1]), MWU achieves a regret bound of 2T ln n and this bound is tight (Cesa-Bianchi et al., 1997, Corollary 3.2.2). In the anytime setting, MWU with a dynamic step size is known to achieve a regret bound of 2 T ln n for all times T 0 (e.g. Cesa-Bianchi and Lugosi, 2006, 14, Nesterov, 2009, Theorem 4, and Bubeck, 2011, 2.5). It is not known whether the constant 2 is tight; the best known lower bound for the anytime setting is the 2T ln n which is inherited from the fixed-time setting. In the fixed-time setting, the minimax regret is known for n = 2, 3, 4 experts (Cover, 1967; Abbasi-Yadkori et al., 2017; Bayraktar et al., 2020a). In the anytime setting, we know an optimal algorithm only for n = 2 experts, where Harvey et al. (2020b) showed that the optimal regret is γ T for all T 0 where γ 1.3069. Another model for regret introduced by Gravin et al. (2016) is the geometric stopping time model in which the number of rounds is a geometric random variable. There is a growing body of work in exploring connections between PDEs and the expert problems in the pursuit of optimal algorithms (Andoni and Panigrahy, 2013; Bayraktar et al., 2020a,b; Drenska, 2017; Drenska and Kohn, 2020; Kobzar et al., 2020). Recently, Zhang et al. (2022) also used PDE techniques to obtain an optimal algorithm for unconstrained online linear optimization. Quantile regret. Chaudhuri et al. (2009) introduced the notion of ε-regret where instead of comparing with the best expert, one compares with the εn -th best expert (amongst n total experts). They devised the Normal Hedge algorithm which they prove has an ε-quantile regret of O( p T ln(1/ε) + ln2 n) in T rounds. Moreover, the bound holds for all ε and T 3. In independent work, Zhang et al. (2022) developed an algorithm using coin-betting and a similar potential function to the one we use and which yields a similar quantile regret bound. We further discuss how to obtain the bound from their results in Section 4.1. In contrast, our analysis is quite self-contained, and avoids using the coin-betting framework. Harvey, Liaw, Portella simultaneously. A somewhat different bound of O( p T(ln ln T + ln 1/ε)) was proved by Luo and Schapire (2015) and Koolen and van Erven (2015). Yet, optimal worst-case regret bounds are not the goal in these works. Instead, they seek to design algorithms with adaptive guarantees on the regret such as second-order bounds, i.e., bounds that depend on some sort of variance on the choices of the adversary. Curiously, all of these works make use of a potential function to control the regret. Our work also makes use of a potential function inspired by the work of Harvey et al. (2020b) which may be somewhat reminiscent of the potentials used by Chaudhuri et al. (2009) and Luo and Schapire (2015). It is possible to improve upon the above bounds. Indeed, Chernov and Vovk (2010, Theorem 3), Foster et al. (2015, Example 5.1), Orabona and Pal (2016, Corollary 6), and Negrea et al. (2021, Corollary 2) show that it is possible to obtain an ε-quantile regret of O( p T ln(1/ε)). This turns out to be tight up to constant factors (Negrea et al., 2021, Theorem 1). We note that Foster et al. (2015), Orabona and Pal (2016), and Negrea et al. (2021) derive regret bounds which depend on the KL divergence between a known prior and the player s probability distribution at a specific point of time; ε-quantile regret bounds can be recovered as a special case of such bounds. In this paper, we also recover the O( p T ln(1/ε)) bound on the ε-quantile regret although, as we shall see, we obtain an improved constant in front of the p T ln(1/ε) term. Connection to Stochastic Control. The continuous-time setting we consider in this paper has a similar formulation to problems in optimal control (Bertsekas, 2005). In this field, a classical approach is deriving the optimal control via the Hamilton-Jacobi-Bellman (HJB) equations (for example see Bertsekas, 2005, Section 3.2). In fact, for the experts problem in discrete time similar techniques were already used, such as in Cesa-Bianchi et al. (1997), but even then analyzing the regret of such an optimal policy is challenging or impossible. For the continuous-time setting, the HJB equations would involve a non-smooth value function and then one would need to resort to viscosity solutions. There is an emerging line of work using such techniques to obtain regret bounds, such as the recent solution for the minimax regret with 4 experts (Bayraktar et al., 2020a). Yet, these techniques often rely on knowing the time-horizon, and our focus in this paper is to analyze anytime regret bounds. It is still an interesting direction of future research to investigate the use of classical control techniques in the continuous experts problem. 1.3 Basic Notation We use [n] to denote the set {1, . . . , n}. For a predicate P, we write [P] to be 1 if P is true and 0 otherwise. Moreover, if [P] is multiplying an invalid expression (such as one with a division by 0) and P is false, we consider the whole expression to be 0. Set [α]+ := max{α, 0} for all α R. We use 1 Rn to denote the all-ones vector and ei Rn for i [n] the indicator vector given by ei(j) := [i = j] for all j [n]. We denote the (n 1)-dimensional probability simplex by n := p [0, 1]n : 1Tp = 1 . For partial derivatives, we write i := xi and ij := xi,xj. Lastly, for x Rn and ε (0, 1), we write quantile(ε, x) = xπ( εn ) where π: [n] [n] is a permutation with xπ(1) . . . xπ(n). (1) Continuous Prediction with Experts Advice 2. The Continuous Prediction Problem As in the discrete experts problem, we have a number n N of experts to choose from. In the continuous time setting, we model the cumulative gain of each expert as a mixture of n independent Brownian motions, as done by Freund (2009). Formally, let B1, . . . , Bn be n independent standard Brownian motion processes (i.e., B is an n-dimensional standard Brownian motion). The cumulative gain process Gi(t) of expert i [n] is given by the following stochastic differential equation (SDE) j=1 w(i) j (t) d Bj(t) =: w(i)(t), d B(t) , t 0, i [n], where (w(i)(t))t 0 is any continuous stochastic process4 in Rn, not necessarily non-negative, such that w(i)(t) 2 = 1 at all times t 0. For example, if w(i)(t) = ei for all i [n] and t 0, then G(i)(t) is an independent Brownian motion for each i [n]. An analogous situation in discrete time would be each expert receiving { 1} gain uniformly at random at each step, so each cumulative gain would be a standard random walk.5 In our analysis the instantaneous covariance matrix Σ(t) between the gain processes will be prominent. Formally, we define Σ(t) Rn n by Σij(t) := w(i)(t), w(j)(t) , t 0, i, j [n]. From its definition, we have that Σ(t) is a positive semi-definite matrix with ones along its diagonal. Next, we define what a player strategy is in continuous-time and its corresponding regret. A player (strategy) is a left-continuous6 process (p(t))t 0 on Rn such that p(t) n for all t 0, where n is the (n 1)-dimensional simplex. The player gain process (A(t))t 0 is given by i=1 pi(t) d Gi(t) = p(t), d G(t) . Moreover, the (continuous) regret vector process is given by Ri(t) := Gi(t) A(t), i [n], t 0. That is, Ri(t) is the regret (in the online learning sense) of the player with respect to expert i. Finally, the continuous regret (of the player strategy (p(t))t 0) at time T is Cont Regret(T) := max i [n] Ri(T) = max i [n] Gi(T) A(T) Also, define the continuous ε-quantile regret to be Quant Regret(ε, T) := quantile(ε, R(T)). In this paper we will investigate player strategies that guarantee bounds on the (quantile) regret that hold almost surely, that is, with probability 1. 4. Any stochastic process we mention in this paper is adapted to the filtration generated by (B(t))t 0 5. Intuitively, that is a reason results in this setting mirrors the discrete-time case with costs in [ 1, 1]. 6. One might loosen this to only assuming (p(t))t 0 is predictable. For a discussion, see Appendix C. Harvey, Liaw, Portella 3. A Continuous Multiplicative Weights Update Method In this section, we describe a continuous-time version of the classical Multiplicative Weights Update (MWU) method. This serves as a way to introduce some of our technical tools while avoiding the complexities we later introduce in the choice of potential function. Furthermore, we show bounds on its continuous regret that exactly match the bounds that the discrete algorithm enjoys, giving evidence of the parallels between the discrete and continuous time settings. Analogous to the discrete version of MWU, we want the probability mass of an expert i at time t to be proportional to exp(ηt Gi(t)), where ηt is some positive learning rate that is non-increasing in t. A familiar approach (see, e.g., Cesa-Bianchi and Lugosi, 2006, page 14 and Bubeck, 2011, 2.5) is to use the Log Sum Exp function given by Φ(t, x) := [ηt > 0] i=1 eηtxi with ηt 0, t 0, x Rn. (2) In our case, the main property that we shall use from Φ is that xΦ(t, ) is the softmax function. That is, xΦ(t, x) n and ( xΦ(t, x))i exp(ηtxi), which is exactly the probability mass MWU places on expert i with cumulative gain xi. Thus, we define the player strategy (p(t))t 0 by p(t) := xΦ(t, G(t)), t 0. (3) To analyze the regret of p, we need a way to handle A(t) = R t 0 p(s), d G(s) . This is a stochastic integral, so we may use Itô s formula (Theorem 3.1), which one can think of as the analogue of the fundamental theorem of calculus for stochastic integrals. Theorem 3.1 (Itô s Formula, Revuz and Yor, 1999, Theorem IV.3.3) Let F : R Rn R be continuously differentiable in its first argument and twice continuously differentiable in its second argument and let (Xt)t 0 be a continuous semimartingale in Rn. Then, for any T 0 F(T, X(T)) F(0, X(0)) = Z T 0 x F(t, X(t)), d X(t) + Z T 0 t F(t, X(t)) dt i,j [n] xi,xj F(t, X(t)) d[Xi, Xj]t In the third derivative above we use the bracket notation: for two continuous (local) martingales M and N, the process [M, N], denoted as the bracket of M and N, is the unique adapted continuous process such that MN [M, N] is a local martingale (Revuz and Yor, 1999, Theorem IV.1.9). 3.1 Bracket Processes In general, computing the bracket of two stochastic processes may be non-trivial, which makes the application of Itô s formula more challenging. Luckily, all the process we deal with are defined as stochastic integrals with respect to other continuous martingales which, Continuous Prediction with Experts Advice ultimately, depend only on integrals with respect to Brownian motions. For example, A(t) is defined as (a sum of) stochastic integrals of a left-continuous and bounded function p(t) with respect to the process G(t). The latter is also a continuous martingale since it is a stochastic integral of a continuous and bounded function w(i)(t) with respect to the Brownian motion B(t), which is also a martingale. This is specially useful to compute the bracket of two of these processes. More specifically, using that [Bi, Bj]t = ( 0 if i = j, t if i = j, (4) we can compute the bracket of martingales by use of formal rules to manipulate the differential symbols (see Cohen and Elliott, 2015, Remark 14.2.7 or Øksendal, 2003, Theorem 4.1.2). More specifically, for two continuous martingales M and N, we have d[M, N]t = d M(t) d N(t), and for the right-hand side above we usually can expand according to our definitions. In our case, we can always expand these expressions until they are written only in terms of the differentials of Brownian motions, and such expressions can be simplified using (4). This is one of the major reasons why the gain processes in our setting defined in (2) are each a mixture of independent Brownian motions. Indeed, the continuous-time setting would be well-defined in the case when (Gi(t))t 0 is a continuous semi-martingale for each i [n], but the results would heavily hinge on the bracket processes of different gains. We do not explore this general case since gains as in (2) already yields an expressive continuous-time setting with results that translate to the discrete-time experts problem. 3.2 Analysis of Continuous Multiplicative Weights Update We can now combine these rules on manipulation of bracket processes together with Itô s formula to analyze to regret of continuous MWU. Lemma 3.2 Let Φ be defined as in (2), p be as in (3), Then, almost surely for all T > 0 Cont Regret(T) Φ(0, 0) + Z T tΦ(t, G(t)) + 1 i,j [n] ijΦ(t, G(t))Σij(t) dt. (5) Proof For all t 0 and i, j [n], d[Gi, Gj]t = d Gi(t) d Gj(t) = w(i)(t), d B(t) w(j)(t), d B(t) k=1 w(i) k (t) w(j) k (t) dt = Σij(t) dt, where the second to last equation follows from (4) and in the last equation we use the fact that Σij(t) = wi(t), w(j)(t) . Harvey, Liaw, Portella Let T > 0. Itô s formula (Theorem 3.1) allows us to express A(T) as 0 xΦ(t, G(t)), d G(t) = Φ(T, G(T)) Φ(0, 0) Z T tΦ(t, G(t)) + 1 i,j [n] ijΦ(t, G(t))Σij(t) dt, Thus, almost surely, Cont Regret(T) = max i [n] Gi(T) Φ(T, G(T)) + Φ(0, 0) tΦ(t, G(t)) + 1 i,j [n] ijΦ(t, G(t))Σij(t) dt. Finally, recall that the Log Sum Exp function smoothly approximates the maximum function, that is, max i [n] xi Φ(T, x) max i [n] xi + log n This implies that maxi [n] Gi(T) Φ(T, G(T)) 0. Thus, for any T > 0, we have that (5) holds almost surely. To switch the order of quantifiers and ensure (5) for all T > 0 almost surely, note first that by an union bound we can guarantee that (5) holds for all rational t > 0 almost surely. Since Cont Regret(t) is continuous in t (see Appendix C for details), we conclude that (5) holds for all T > 0 almost surely. At this point, to bound the continuous regret of (p(t))t 0 it suffices to bound the partial derivatives of Φ. Lemma 3.3 bounds these partial derivatives in terms of a tunable learning rate ηt; minimizing the regret bound boils down to optimizing ηt. We defer the proof of the following lemma to Appendix B since it essentially follows from simple properties of the Log Sum Exp function. Lemma 3.3 Let Φ be as in (2). Let ηt be constant in t or of the form c t, with c > 0. Then, i,j [n] ijΦ(t, x)Σij ηt 2 and tΦ(t, x) log n 2tηt , t 0, x Rn. Theorem 3.4 summarizes the continuous regret bounds for MWU with properly chosen learning rates, both for the fixed-time and anytime settings. Crucially, these regret bounds match the best known regret bounds for the discrete-time MWU method (see Bubeck, 2011, Theorems 2.1 and 2.4). Theorem 3.4 Let Φ be as in (2) and T be a positive number. If ηt := p 2 ln n/T for all t 0, then Cont Regret(T) 2T ln n almost surely. If ηt := [t > 0] p ln n/t for all t 0, then, almost surely, Cont Regret(t) 2 t ln n for all t 0. Continuous Prediction with Experts Advice Proof Let us first consider the fixed-time case, that is, ηt := p 2 ln n/T for all t 0. In this case we have tΦ(t, x) = 0 since Φ( , x) is constant for any x Rn. Moreover, note that Φ(0, 0) = ln n/η0 = ln n/ηT . Combining this with Lemmas 3.2 and 3.3, we have Cont Regret(T) Φ(0, 0) + Z T i,j [n] ijΦ(t, G(t))Σij dt ln n Let us now consider the anytime case, that is, when ηt := [t > 0] p ln n/t for all t 0. In this case we have Φ(0, 0) = 0, but tΦ(t, x) is not necessarily 0 anymore. By Lemma 3.3, we have tΦ(t, G(t)) + 1 i,j [n] ijΦ(t, G(t))Σij log n Thus, we have Cont Regret(t) R t 0 s ds 2 t log n for any t 0 almost surely. It is intriguing that these bounds on the continuous regret differ by a factor of 2, exactly as in the discrete experts problem. A natural question is whether there is an anytime algorithm that enjoys continuous regret bound smaller than 2 t log n. This is the topic we investigate in Section 5. 4. Quantile Regret Bounds with the Confluent Hypergeometric Potential In this section, we design a different algorithm for the continuous prediction problem. We choose a potential function inspired by Itô s formula and obtain quantile regret bounds that are better than the ones known with a relatively simple proof. Furthermore, we show that this strategy has a simple discretization and obtain an algorithm with the same bounds for the discrete experts problem. In Section 5 we shall see how a similar algorithm suggests an intriguing result for the anytime setting. This time around we analyze a player strategy parameterized by a function of R(t) instead of G(t). That is, let Φ: R 0 Rn R be a continuously differentiable function, which we refer to as a potential function. We consider the player strategy (p(t))t 0 given by7 p(t) := 1 1T xΦ(t, R(t)) xΦ(t, R(t)), t 0, (6) setting p(t) := 1 n1 when 1T xΦ(t, R(t)) = 0. This class of player strategies mimics the potential-based strategies from the discrete experts problems (Cesa-Bianchi and Lugosi, 2006, Chapter 2). As in the discrete case, if Φ is the Log Sum Exp potential from (2), we obtain the same player strategy of the last section. In the next lemma we use Itô s formula to get a useful expression for Φ(T, R(T)) that, in turn, will guide us in the choice of Φ. 7. Throughout this paper all entries of xΦ(t, R(t)) have the same sign, implying that p(t) n. This p(t) can be discontinuous when xΦ(t, R(t)) = 0, so we need to ensure it is predictable. This issue is discussed in Appendix C. Harvey, Liaw, Portella Lemma 4.1 Let Φ: R 0 Rn R be one time continuously differentiable on its first argument and two-times continuously differentiable on its second argument. Let the player strategy (p(t))t 0 be as in (6). Then, almost surely for all T 0 we have Φ(T, R(T)) Φ(0, 0) = Z T tΦ(t, R(t))+ 1 i,j [n] ijΦ(t, R(t))(ei p(t))TΣ(t)(ej p(t)) dt. (7) In particular, if for all t 0 and x Rn we have ijΦ(t, x) = 0 for all distinct i, j [n] and iiΦ(t, x) 0 for each i [m], then almost surely for all T 0 we have Φ(T, R(T)) Φ(0, 0) Z T tΦ(t, R(t)) + 2 i=1 iiΦ(t, R(t)) dt. (8) Proof Let T 0. Itô s formula gives us a useful formula to compute the evolution of the potential: Φ(T, R(T)) Φ(0, 0) = Z T 0 xΦ(t, Rt), d R(t) + Z T 0 tΦ(t, R(t)) dt 0 ijΦ(t, R(t)) d[Ri, Rj]t. For the first term above, note that xΦ(t, R(t)), d R(t) = 1T xΦ(t, R(y)) p(t), d R(t) by the definition of p(t) in (6). Furthermore, this term is zero since p(t), d R(t) = p(t), d G(t) p(t), 1 d A(t) = d A(t) d A(t) = 0. Finally, it only remains to show that d[Ri, Rj]t = (ei p(t))TΣ(t)(ej p(t)) dt for all i, j [n] to conclude the proof of (7). By the definition of Ri(t), we have d Ri(t) = d Gi(t) d A(t) = d Gi(t) k=1 pk(t) d Gk(t) = ei p(t), d G(t) . Moreover, using that d Bi(t) d Bj(t) = [i = j] dt, we have d Gi(t) d Gj(t) = w(i)(t), d B(t) w(j)(t), d B(t) = w(i)(t), w(j)(t) dt = Σij(t) dt. Combining these two facts we have d[Ri, Rj]t = d Ri(t) d Rj(t) = ei p(t), d G(t) ej p(t), d G(t) = (ei p(t))TΣ(t)(ej p(t)) dt, which concludes the proof of (7) Let us now prove (8). Suppose that for all t 0 and x Rn we have ijΦ(t, x) = 0 for all distinct i, j [n]. Then, Φ(T, RT ) Φ(0, 0) = Z T tΦ(t, R(t)) + 1 i=1 (ei p(t))TΣ(t)(ei p(t)) iiΦ(t, R(t)) Continuous Prediction with Experts Advice Since Σ(t) is positive definite with ones in its diagonal entries, we have |Σij(t)| |Σii(t)| = 1. Therefore, for any v Rn, we have v TΣ(t)v v 2 1. Thus, if iiΦ(t, x) 0 for all i [n], then the second inequality in the lemma follows since ei p(t) 1 2 for all i [n]. Figure 1: Plot of φ(1, x) in red and of the bound from Lemma A.3 in blue. The second expression in Lemma 4.1 (Eq. (8)) hints at properties of potential functions Φ that may be particularly useful. More precisely, separable functions Φ that satisfy a diffusion constraint of the form ( t + 2 Pn i=1 ii) Φ(t, α) 0 would guarantee that Φ(t, R(t)) is non-decreasing in t, which in turn may allow us to bound the continuous regret. The player strategies in the rest of this paper involve the function M0 defined as M0(α) := eα πα erfi( α), α R, (9) where erfiis the imaginary error function given by erfi(α) := (2/ π) R α 0 exp(β2) dβ for all α R. This is an example of a confluent hypergeometric function (of the first kind). We use M0 in the form t M0 [α]2 + 2t , α R, t > 0. (10) Similar functions have been used in the stochastic process literature (Breiman, 1967; Davis, 1976; Perkins, 1983) and in the online learning literature (see Harvey et al. 2020b, eq. (2.6) and Zhang et al. 2022, eq. (11)). Two particularly useful properties of φ are: 2 xx)φ(t, α) is zero8 for all t > 0 and α 0, and non-negative for all α < 0. This is a PDE known as the backwards heat equation (BHE). Diffusion terms like these appear in Itô s formula, so functions satisfying the BHE are well-behaved under stochastic integration. α2 eα2/2 (see Lemma A.3 and Figure 1), and so the potential resembles the density of the normal distribution. Potentials of this form have been useful in the literature such as for Normal Hedge (Chaudhuri et al., 2009) and Ada Normal Hedge (Luo and Schapire, 2015). Moreover, Freund (2009) has used these normal-like potentials in continuous time. The algorithm in this section uses the separable potential function Φ given by9 i=1 φ t, xi t > 0, x Rn. (11) 8. Actually, φ is not doubly differentiable in its second argument because of the truncation in the definition of φ. Although this might seem like a problem to apply Itô s formula, we luckily have a single point of non-differentiability at each time t 0, and the truncation makes ( t + 1 2 xx)φ(t, x) no smaller everywhere else. Thus, standard smooth-truncation arguments can be made to apply Itô s formula. For an example, see Harvey et al. (2020a, Section 5.2.2). For the sake of simplicity, we set xxφ(t, 0) := limε 0 xxφ(t, ε) 9. Ideally we would modify the potential by eliminating the denominator of 2. If this could be analyzed, it would yield an optimal quantile regret bound. At present, we have been unable to accomplish this. Harvey, Liaw, Portella Lemma 4.2 Let Φ and (p(t))t 0 be as in (11) and (6), respectively. Then Φ(T, R(T)) 0 for all T 0 almost surely. Note that if Φ(0, 0) = 0 then Lemma 4.2 would immediately follow from Lemma 4.1 (in particular, Eq. (8)) and our choice of Φ since Φ is concave and ( t + 2 Pn i=1 ii) Φ(t, x) 0. The only minor snag is that Φ(0, 0) is not well-defined since Eq. (10) would involve a division by zero. Nonetheless, it is possible to resolve this issue since limδ 0 Φ(δ, 0) = 0; the details are relegated to Appendix D. It remains to translate bounds on the value of the potential Φ(t, R(t)) to bounds on the continuous regret. The following function is used in the regret bounds throughout the remainder of the paper. Definition 4.3 For α R 0, let λ(α) > 0 be the unique positive solution to the equation α = M0(λ(α)2/2) φ(1, λ(α)). (12) We note that the function M0(x2/2) is strictly decreasing on R 0 and the image of M0 on R 0 is ( , 1] (see Appendix A). In particular, a solution to Eq. (12) exists and is unique so λ(α) is well-defined for all α 0. We also note that λ(α) is strictly increasing in α. The next lemma show us how bounds in the value of Φ(t, x) can be translated into bounds on the quantiles of x, which were defined in (1). Lemma 4.4 Let T > 0 and x Rn. Suppose that Φ(T, x) 0. Then, for any ε (0, 1], quantile(ε, x) 2λ 1 ε Proof For simplicity, suppose x1 x2 . . . xn. Since φ(T, ) is decreasing, we have φ(T, xεn/2) φ(T, xi/2) for every i εn. Summing this inequality for i {1, , εn}, using the assumption Pn i=1 φ(T, xi/2) = Φ(T, x) 0 and since φ(T, α) T for any α R, we have εnφ(T, xεn/2) i=1 φ(T, xi/2) i=εn+1 φ(T, xi/2) (1 ε)n By the definition of φ, the above series of inequalities implies TM0 ([xεn]+/2)2 T = M0 ([xεn]+/2)2 Using the definition of λ and the fact that M0 is a decreasing function, we have M0 ([xεn]+/2)2 = xεn 2λ(γ) The second inequality in (13) follows from Lemma A.5. Finally, we can combine Lemma 4.2 and Lemma 4.4 to prove a bound on the quantile regret in continuous-time. These quantile regret bounds improve upon the best known in the discrete case. In Section 4.1 we discretize this algorithm while preserving the same quantile regret bound. Theorem 4.5 Let Φ and (p(t))t 0 be as in (11) and (6), respectively. Then almost surely Quant Regret(ε, T) 2λ((1 ε)/ε) Continuous Prediction with Experts Advice 4.1 Discretization In this section, we propose an algorithm for the original experts problem based on the continuous-time solution of the previous section. As in the continuous setting, we have n N experts. At each round t N, the player picks a probability vector pt n and the adversary picks a gain vector gt [ 1, 1]n. The instantaneous regret vector at round t 1 is given by rt := gt 1 p T t gt. Moreover, define the regret vector at round t by Rt := Pt s=1 rs. To discretize the algorithm from the previous section, we shall make use of discrete derivatives in a way similar to Harvey et al. (2020a). For a bivariate function f, define its discrete derivatives as ft(t, x) = f(t, x) f(t 1, x), fx(t, x) = f(t, x + 1) f(t, x 1) fxx(t, x) = (f(t, x + 1) + f(t, x 1)) 2f(t, x). Let Φ be defined as in Eq. (11). For i [n], we define the discrete derivative of Φ as i=1 φt t, xi ; Φi(t, x) = 1 ; Φii(t, x) = 1 For notation convenience, we also define the discrete gradient e Φ(t, x) := (Φ1(t, x), . . . , Φn(t, x))T. The algorithm we use for the discrete setting is the natural analogue of the algorithm for the continuous setting as defined in Eq. (6). Specifically, for t N 1, we set n1 if e Φ(t, Rt 1) = 0 1 1T e Φ(t,Rt 1) e Φ(t, Rt 1) otherwise. (15) Note that φ(t, x) is non-increasing in x so e Φ(t, x) 0 (component-wise), which implies pt n. Let us now analyze the performance of this algorithm. We summarize our results in the next theorem. Theorem 4.6 We have quantile(ε, RT ) 2λ((1 ε)/ε) 8 ln(1/ε) + 20 Let c be the optimal constant multiplying p T ln(1/ε) in the minimax optimal ε-quantile regret for anytime algorithms. Theorem 4.6 shows c 2 2. Previously, Chernov and Vovk (2010, Theorem 3) and Negrea et al. (2021, Example 3) both proved10 c 4. On the lower bound side, Negrea et al. (2021, Theorem 1) proved that c 2. So there remains a gap of 2 for the constant c. Finally, we note that if T is known beforehand (the fixed-time setting), Orabona and Pal (2016, Corollary 6) showed c 2 3. Theorem 4.6 improves this to c 2 2. Interestingly, Zhang et al. (2022) proposed independently of us an algorithm using coinbetting with a potential similar to (11) also inspired by the work of Harvey et al. (2020b). Although their work is mostly on unconstrained online learning, one can obtain an algorithm 10. Note that these papers consider costs in [0, 1], so their results must be multiplied by 2 to coincide with our setting. Harvey, Liaw, Portella for the experts problem from their coin-betting algorithm whose ε-quantile regret is similar to our bound from Theorem 4.6, that is, roughly no more than 2 p 2T ln(1/ε). More precisely, Orabona and Pal (2016, Theorem 4) show how to obtain an algorithm for the experts problem with costs in [0, 1] from a coin-betting algorithm together with regret guarantees against any comparison point u n. We may obtain bounds on the ε-quantile regret by noting that it is the same as the regret against all points u n with εn non-zero equal entries. Combining this with Theorem 1 and Lemma B.2 of Zhang et al. (2022) yields the desired bound. In contrast, our analysis in this section is tailored specifically for the experts problem, and does not make use of the coin-betting framework. To prove Theorem 4.6, we make use of the following lemma. Lemma 4.7 Let x1, x2, be a sequence of real numbers such that |xt xt 1| 1. Then for any function f that is concave in its second argument and any integer T 2, we have f(T, x T ) f(1, x1) t=2 fx(t, xt 1) (xt xt 1) + 2fxx(t, xt 1) + ft(t, xt 1) . The above lemma is essentially a corollary of the following discrete version of Itô s formula (see Klenke, 2008, Example 10.9 or Harvey et al., 2020a, Lemma 3.13). Lemma 4.8 (Discrete Itô s Formula) Let x1, be a sequence of real numbers. Then for any function f and any fixed time T 2, we have f(T, x T ) f(1, x1) = t=2 f(t, xt) f(t, xt 1 + 1) + f(t, xt 1 1) 2fxx(t, xt 1) + ft(t, xt 1) . Proof of Lemma 4.7 We prove the following statement. Let f be a bivariate function that is concave in its second argument. Then for all t, x R and y [ 1, 1] we have f(t, x + y) f(t, x + 1) f(t, x 1) 2 fx(t, x) y. Equality holds for y { 1, 1}. Since the LHS is concave in y and the RHS is linear in y, the inequality holds for all y [ 1, 1]. The lemma now follows from Lemma 4.8 with x = xt 1 and y = xt xt 1. We now prove a discrete analogue of the second assertion of Lemma 4.1. Note that for technical reasons we start at t = 1 instead of t = 0. One can directly deal with the case t = 0 by customizing Eq. (10). However, this cumbersome approach does not yield improved bounds. Lemma 4.9 Fix any T 2. Then Φ(T, RT ) Φ(1, R1) Φt(t, Rt 1) + 2 i=1 Φii(t, Rt 1) φt(t, Rt 1,i/2) + 1 2φxx(t, Rt 1,i/2) . Continuous Prediction with Experts Advice Proof Note that the equality follows from the definition of Φii and Φt. Here, we prove that the term on the left is an upper bound on the final term on the right. Recall that Φ(t, Rt) = Pn i=1 φ(t, Rt,i/2). Since the gains are in [ 1, 1], it follows that |(Rt,i Rt 1,i)/2| 1. From Lemma 4.7, since φ is concave in its second argument, we have φ(T, Rt,i/2) φ(1, R1,i/2) i=1 φx(t, Rt 1,i/2) (Rt,i Rt 1,i) 2φxx(t, Rt 1,i/2) + φt(t, Rt 1,i/2) . To prove the lemma, it suffices to show that the first sum on the RHS of Eq. (17) is exactly 0. To that end, fix t [T] such that t = 1. We have Rt,i Rt 1,i = gt,i p T t gt. Hence, i=1 φx(t, Rt 1,i/2) (Rt,i Rt 1,i) = i=1 φx(t, Rt,i/2) (gt,i p T t gt). (18) If φx(t, Rt 1,i/2) = 0 i [n] then the RHS of (18) is 0. Otherwise, with pt as defined in (15), p T t gt = Pn i=1 φx(t, Rt 1,i/2) gt,i Pn i=1 φx(t, Rt 1,i/2) . (19) Plugging (19) into (18) gives Pn i=1 φx(t, Rt 1,i/2) (Rt,i Rt 1,i) = 0, as required. In the continuous setting, we used the key fact that for any t R 0 and x R, we have t + 1 2 xx φ(t, x) 0. Luckily, the same fact holds in the discrete setting. Lemma 4.10 For any t > 1 and x R, we have φt(t, x) + 1 2φxx(t, x) 0. Before we prove the above lemma, let us combine it together with the results in this section to prove Theorem 4.6. Proof of Theorem 4.6 Lemma 4.9 and Lemma 4.10 imply that Φ(T, RT ) Φ(1, R1) for all T 1. Note that Φ(1, R1) = Pn i=1 φ(1, R1,i/2) n M0(1/2) > n M0(γ2/2) = 0. The bound on the quantile regret now follows from Lemma 4.4. Finally, let us conclude this section with the proof of Lemma 4.10. Proof of Lemma 4.10 Recalling the definition of the discrete derivatives (see Eq. (14)), we have that φt(t, x) + 1 2φxx(t, x) = φ(t, x + 1) + φ(t, x 1) 2 φ(t 1, x). Hence, it suffices to prove that φ(t, x + 1) + φ(t, x 1) 2φ(t 1, x). (20) We consider several cases depending on the value of x. Harvey, Liaw, Portella Case 1: x 1. In this case, Eq. (20) is equivalent to Note that t > 1 so all terms are well-defined. Rearranging, this is equivalent to which follows from Lemma A.4 by setting x and z in Lemma A.4 to x/ t, respectively. Case 2: x [0, 1]. Note that Eq. (21) holds for all x R (because Lemma A.4 holds for all x R). Next, observe that φ(t, x) M0(x2/2t) with equality whenever x 0 (this is because M0(x2/2t) is increasing on the interval ( , 0] while φ(t, x) = φ(t, 0) due to the truncation defined in Eq. (10)). So the LHS of Eq. (21) is upper bounded by the LHS of Eq. (20) and the RHS of Eq. (21) is equal to the RHS of Eq. (20). So Eq. (21) holds in this case as well. Case 3: x [ 1, 0]. Note that φ(t, x + 1) is the only term in Eq. (20) that is not constant on the interval [ 1, 0]. Further, note that Eq. (20) holds for x = 0 by the previous case, and it holds for x = 1 since the LHS is 2 t and the RHS is 2 t 1. So the inequality holds since φ is concave in its second argument (Lemma A.2). Case 4: x 1. Due to the truncation in Eq. (10), Eq. (20) becomes 2 5. Minimax Optimal Continuous Regret with Independent Experts We have shown that an algorithm using the hypergeometric potential (11) suffers no more than 2 2T ln n at any round T (Theorem 4.5). Similarly, we saw that the anytime continuous MWU suffers at most 2 T ln n regret at any round T (Theorem 3.4) while its fixed-time variant suffers no more than 2T ln n regret. A natural question is whether there are anytime algorithms in the continuous setting that enjoy regret almost surely smaller than 2 T ln n. In the discrete version of the problem, we know that no algorithm can guarantee regret smaller than 2T ln n, although there is no known anytime algorithm that attains the lower-bound. This lower-bound comes from the fact that the expected regret against a simple adversary that assigns 1 gains uniformly at random the uniformly random adversary, for short on the experts gets arbitrarily close to 2T ln n as n grows, regardless of the player s strategy. Interestingly, no tighter lower-bounds are known for anytime algorithms, that is, algorithms that do not know the length of the game. Furthermore, for two experts (n = 2) we know that there is a separation between the minimax optimal regret in the fixed-time and anytime settings, and both lower bounds arise from the expected regret against a simple adversary similar to the uniformly random one! Namely, for fixed-time algorithms the best possible regret is p 2T/π (Cover, 1967) while for anytime algorithms the best possible regret is λ(0) T (Harvey et al., 2020b), where λ(0) 1.3069 > p 2/π 0.798. In both cases the lower bound arises from the expected regret (with a suitably chosen stopping-time in the anytime case) against an adversary the picks uniformly at random a cost vector from Continuous Prediction with Experts Advice {( 1, 1)T, (1, 1)T} each round. Yet, to this day we do not know if the minimax regret attained by fixed-time and anytime algorithms are different as n tends to infinity. At this point, one may hope that we can get a better lower bound with the uniformly random adversary by constructing a good stopping time as in the n = 2 case. Intriguingly, we show that in continuous time with independent experts that is, when each (Gi(t))t 0 is an independent Brownian motion there is an anytime algorithm whose regret is at most λ(3n) 2T ln n for all T 0. Furthermore, in Proposition 5.3 we show a matching lower-bound, just as in the discrete-time case. Therefore, if there is a gap between the best regret attained by fixed-time and anytime algorithms in continuous time, this shows that the anytime lower-bound does not arise from independent experts. We conjecture that this algorithm can be discretized, and a 2T ln n anytime regret bound guarantee that holds with high probability against independent experts is possible in discrete time. 5.1 Algorithm The player strategy we use in this section is similar to the one from the last section, with a crucial difference on Φ: we do not divide the second argument by 2. Namely, define i=1 φ(t, xi) t 0, x Rn, (22) and we set (p(t))t 0 as in (6) but using the above potential. We can still use Lemma 4.1 to get a formula for Φ(t, R(t)). However, the lower-bound given in (8) is not of much use anymore since we do not have ( t +2 Pn i=1 ii)Φ(t, x) 0 anymore. Thus, we need to directly analyze the formula in (7). It turns out that when the instantaneous correlation matrix Σ(t) is the identity matrix for all t 0, which happens when we have independent experts, we can show that this term does not become too negative. This term will be denoted s BHT(t) := tΦ(t, R(t)) + 1 i=1 iiΦ(t, R(t))(ei p(t))T(ei p(t)). That is, the s BHT is the integrand that appears in Lemma 4.1. Intuitively, since Φ satisfies the backwards-heat inequality, we should expect s BHT(t) to not be too negative. That is exactly what we show in the next lemma. A complete proof is in Section 5.3, but we give a brief sketch here. Lemma 5.1 Let Φ be as in (22). Then s BHT(t) (2 n)/2 t for all t > 0. Proof sketch For simplicity, fix t > 0 and assume R1(t) R2(t) Rn(t). Moreover, explicitly write the dependency of the s BHT on the regret vector by writing s BHT(t, R(t)). The first step is to show that forcefully setting Rn(t) to zero, denote such a vector by R(t), can only decrease the s BHT. That is, s BHT(t, R(t)) s BHT(t, R(t)). Then, we show that s BHT(t, R(t)) + 1/2 t is equal to the s BHT restricted to the experts 1, . . . , n 1. Then, by induction we get that s BHT(t, R(t)) + (n 1)/2 t is greater than the s BHT for a single expert, which one can verify that is at least 1/2 t, concluding the proof. Harvey, Liaw, Portella Finally, the above lemma together with the expression for Φ(t, R(t)) given by (7) yields the desired regret bound when the instantaneous covariance matrix Σ(t) is always the identity matrix. Theorem 5.2 Let Φ be defined as in (22) and let (p(t))t 0 be as in (6). Suppose we have Σ(t) = I for all t 0. Then, almost surely for all T 0 we have Cont Regret(T) λ(3n 1) 2T ln n + 6 Proof Let T 0. By Lemma 4.1 we have11 Φ(T, R(T)) = Z T tΦ(t, R(t)) + 1 i,j [n] ijΦ(t, R(t))(ei p(t))TΣ(t)(ej p(t)) dt. (23) Since Σ(t) = I for all t 0, the above integrand is exactly s BHT(t). Therefore, by Lemma 5.1, 0 s BHT(t) dt Z T t dt = (2 n) Finally, we translate this lower bound Φ(T, R(T)) to an upper bound on Cont Regret(T). By an easy modification of Lemma 4.4 for ε = 1/n and non-zero lower-bounds on Φ(T, R(T)), we get Cont Regret(T) λ(2n 1) 2T ln(2n) + 4 2T ln n + 6 5.2 A matching lower-bound The next proposition shows that, against independent experts, the expected regret is always roughly 2T ln n. This matches the discrete-time lower bound and shows that Theorem 5.2 is tight. The proof is a straightforward modification of the analogous discrete-time result (Cesa-Bianchi and Lugosi, 2006, Theorem 3.7). It is important to note that Theorem 5.2 is considerably stronger than Proposition 5.3: the latter bounds the expectation separately for each t, whereas the former gives a bound that holds almost surely for all t. Proposition 5.3 Assume w(i) = ei for each i [n]. Then, for any player strategy, 2T ln n(1 on(1)) E [ Cont Regret(T) ] 2T ln n T > 0, where on(1) is some function of n whose limit is 0 as n goes to + . Proof Since the functions in the stochastic differential equations in Section 2 are at least left-continuous and bounded, the stochastic integrals are well-defined, vanish at time 0, and are martingales (Revuz and Yor, 1999, Def. IV.2.1 and IV.2.3). In particular, E [ A(t) ] = 0 for all t 0. For each i [n] we have Gi(t) = Bi(t) since w(i)(t) = ei. Thus, each gain process is an independent Brownian motion. Therefore, E [ Cont Regret(T) ] = E max i [n] Bi(T) [ 2T ln n(1 on(1)), 11. In fact, we should be careful with Φ(0, 0) as we were in the proof of Theorem 4.5 in Appendix D. Since the exact same trick works in this case as well, we take Φ(0, 0) := 0 for the sake of simplicity. Continuous Prediction with Experts Advice where in the last equation we used the well-known asymptotics for the maximum of n Gaussian random variables (e.g. Wainwright, 2019, Exercise 2.11 or Orabona and Pál, 2015, Theorem 3) and the fact that for any i [n] the random variable Bi(t) follows a Gaussian distribution with mean zero and variance t. 5.3 Proof of Lemma 5.1 We will prove the lower-bound on the s BHT evaluated at any point x Rn instead of only when it is evaluated at R(t). Thus, we shall define some notation to analyze the s BHT evaluated at arbitrary points. Namely, let x Rn. Define p(t, x) := 1 1T xΦ(t, x) xΦ(t, x), t 0, setting p(t, x) = (1/n)1 if 1T xΦ(t, x) = 0. Moreover, define s BHT(t, x) := tΦ(t, x) + 1 i=1 iiΦ(t, x)(ei p(t, x))T(ei p(t, x)). Note now that we may assume x 0. Indeed, assume xi < 0 for some xi and take t > 0. Then, due to the truncation in the definition of φ, we have iiΦ(t, x) = xxφ(t, xi) = 0. Since tφ(t, xi) = 1 2t exp(x2 i 2t ) 0, we conclude that s BHT(t, x) s BHT(t, x eixi), that is, setting the i-th entry of x to zero can only decrease the value of the s BHT. Let T > 0. For each i [n] define qi := ex2 i /2T , Qi := erfi xi j=1 Qj, pi := Qi Θ p(T, x)i. Then, evaluating the derivatives according to Lemma A.2 and using that pi xφ(t, xi) Qi we have s BHT(T, x) = 1 2 i=1 ex2 i /2t 1 (ei p)T(ei p) i=1 ex2 i /2t 2pi p Tp i=1 qi 2ΘQi QTQ . Let us now make a couple of simplifying assumptions: Assume x1 x2 xn; Harvey, Liaw, Portella We may assume without loss of generality that T = 1 since, from the above calculations, one can see that s BHT(T, x) = T s BHT(1, x/ Now, it suffices to show that 2 xn(Θ2 s BHT(1, x)) 0 (24) to prove the desired claim by induction. Indeed, note that if xn(Θ2 s BHT(1, x)) 0, then decreasing xn all the way to zero only decreases the value of Θ2 s BHT(1, x). More specifically, let x := x xnen and define Θ , Q , and q accordingly (that is, substituting x by x in the original definition of these terms). In this case, we have Q n = 0 and q n = 1, yielding Θ2 s BHT(1, x) (Θ )2 s BHT(1, x ) = 1 i=1 q i(2ΘQ i Q TQ ) i 0, we have π 2 erfi(z) = Z z 0 et2 dt < ez2 1 In particular, xi , i [n]. Proof The inequality from (26) can be found in Olver et al. (2010, Section 7.8). For the second inequality, note that Qi = erfi xi (26) < 2 π ex2 i /2 1 xi/ The above lemma together with 2Qn Qi 0 implies, for each i [n], xnqn Qi(2Qn Qi) xn 2 π (qi 1) | {z } qi (2Qn Qi) qn2 2 πqi(2Qn Qi). 2 πqiqn(Qi Qn) + xnqn Qi(2Qn Qi) 2 πqiqn(Qi Qn) + qn2 2 πqi(2Qn Qi) 2 πqnqi Qn 0. This completes the proof of (24) and, thus, of Lemma 5.1. Harvey, Liaw, Portella Figure 2: Left: Plot of s BHTΣ(1, v(α, β)) where v(α, β) := (α, β, . . . , β)T Rn with n = 105, with Σ := v( 1, 1)v( 1, 1)T, and with α and β such that α [(6/7)λ(3n), λ(3n)] and β [1, λ(3n)]. The highest value displayed in the plot is (4 3n)/2. Right: Plot of upper bounds to f(n) := min { s BHTΣ(1, v(α, β) : α, β [0, λ(3n)] } (found using L-BFGS) for n {10, . . . , 104} and with Σ := v( 1, 1)v( 1, 1)T. This suggests that f(n) is not in O(n). 6. Challenges for Optimal Anytime Discrete Regret Bounds Throughout the paper, we studied player strategies derived from potentials that use the hypergeometric function M0 as defined in (9). Namely, we used potentials of the form i=1 φ(t, β xi), x Rn, (27) where φ(t, α) := t M0(α2/(2t)) and β > 0. In Section 4 we set β := 1/2 and showed that, in this case, Φ(T, R(T)) 0 for all T > 0 almost surely, where (R(t))t 0 is the regret process. The proof of this fact boiled down to showing that a generalized form of the skewed Backwards Heat Term introduced in the last section is non-negative. That is, showing that s BHTΣ(t, x) := tΦ(t, R(t)) + 1 i=1 iiΦ(t, R(t))(ei p(t))TΣ(ei p(t)) is non-negative for all t > 0, all vectors x Rd, and for any Σ 0 with all diagonal entries equal to 1. From the lower bound on Φ(T, R(T)) we derived quantile regret bounds in Lemma 4.4. In particular, we have a continuous regret bound of the form Cont Regret(T) 2λ(n 1) T, which is asymptotically suboptimal if compared to the lower bound from Proposition 5.3. In general, one may modify Lemma 4.4 to show that if we use the potential in (27) and show that Φ(T, R(T)) O(n T) for all T almost surely, we obtain the bound Cont Regret(T) 1 T, for all T almost surely. From Lemma 4.4 we know that λ(O(n)) 3 + p 2 ln(O(n) + 1). Thus, for β = 1 the above anytime regret bound of λ(O(n)) T would asymptotically match the fixed-time regret from continuous MWU and the lower-bound from Proposition 5.3 up to the constant multiplying the T ln n term. Continuous Prediction with Experts Advice In Lemma 5.1 we showed that for Σ = I we have s BHTΣ(T, x) lim ε 0 s BHTΣ(T, ε e1) = (2 n)/ T, x Rn. (28) This implies continuous-time bounds when the gain processes are independent. To remove the independence assumption (and possibly transfer the result to discrete time), it suffices to extend this lemma and show that s BHTΣ(T, x) O(n/ T) for all T > 0, all x Rn, and all Σ 0 with ones in the diagonal entries. In fact, one may verify that we need only show this for T = 1. One may conjecture that the inequality in (28) still holds, which would yield the desired regret bounds since we have s BHTΣ(1, x) limε 0 s BHT(1, ε e1) (4 3n)/2. However, this is not true, as show in Figure 2. In fact, these experiments show that it is unlikely that s BHTΣ(T, x) O(n T) for all x Rn. We still conjecture that Φ(T, R(T)) O(n T) with reasonable probability, but these plots show that a proof of this fact would require the use of more structure on R(T) since there are x Rn where s BHT(T, x) is too low. We believe p(t) acts on R(t) in such a way that s BHT(t, R(t)) cannot be too low for all t in a long interval. Acknowledgments We would like to thank the anonymous reviewers for their feedback on earlier versions of this work, and in particular for pointing out the connection with the work of Zhang et al. (2022) and for the suggestion to add Section 6. Harvey, Liaw, Portella Appendix A. Properties of the Confluent Hypergeometric Function In this section we outline some of the main properties of the confluent hypergeometric function M0 that we use throughout the paper. Many of the properties we use in this section come from Harvey et al. (2020a, Section 2.6). Fact A.1 (Harvey et al., 2020a, Facts 2.4, 2.5, and 2.6) For any x R we have (i) M 0(x) = π 2 x erfi( x) (ii) M0(x) is strictly decreasing and concave on [0, ) From the above facts together with M0(0) = 1 shows us that M0 is strictly decreasing on [0, + ) and its image over this domain is ( , 1] (since its derivative is negative and strictly decreasing). Furthermore, the above properties of M0 allow us to derive many properties about the function φ(t, x) = t M0([x]2 +/2t), as we show in the next lemma. Lemma A.2 Let t R 0 and x R. Then (i) φ(t, x) is concave and non-increasing in x; (ii) xφ(t, x) = pπ 2t) for x > 0; (iii) xxφ(t, x) = 1 t exp(x2/2t) for x > 0; (iv) tφ(t, x) = 1 2 t exp([x]2 +/2t); Proof Properties (i) and (ii) follow directly from Fact A.1 and the chain rule. Property (iii) follows from the fundamental theorem of calculus together with the chain rule since p π 2 R x 0 ez2 dz. For (iv), assume for notational simplicity only that x > 0. Then, π2t 2x erfi x The next lemma gives an upper-bound to M0(x2/2) for x 0. Beyond other uses, this will be useful to upper-bound the regret bounds we derive with better-known functions. Lemma A.3 For every x 0, 1 M0(x2/2) exp(x2/2) x2 + 1 + 2/x2 x 0. Continuous Prediction with Experts Advice Proof Define f(x) = 1 M0(x2/2) and g(x) = exp(x2/2)/(x2 + 1 + 2/x2). The derivatives are 2) and g (x) = exp(x2/2) x7 x5 + 2x3 + 4x (x4 + x2 + 2)2 . The second derivatives are f (x) = exp(x2/2) and g (x) = exp(x2/2) x12 x10 + 9x8 + 7x6 32x4 + 8x2 + 8 (x4 + x2 + 2)3 . We will show that f (x) g (x). By rearranging, this amounts to showing that x12 x10 + 9x8 + 7x6 32x4 + 8x2 + 8 (x4 + x2 + 2)3. Expanding the right-hand side, we get = x12 + 3x10 + 9x8 + 13x6 + 18x4 + 12x2 + 8. The right-hand side coefficients are no smaller, which shows that f (x) g (x) for x 0. By integrating and using that f (0) = g (0) = 0, we obtain that f (x) g (x) for all x 0. Finally, by integrating and using that f(0) = g(0) = 0, we obtain f(x) g(x) for all x 0. The next lemma from Harvey et al. (2020a, Lemma 3.10) is used to proved that the hypergeometric potential satisfies the discrete version of the backwards heat inequality. Lemma A.4 (Harvey et al., 2020a, Lemma 3.10) For all z [0, 1) and x R, we have Recall from Definition 4.3 that λ is the non-negative inverse of x 7 M0(x2/2). The bound on M0 from Lemma A.3 allows us to upper-bound λ as follows. Lemma A.5 Let n R be positive. Then, 2 ln(n + 1). Consequently, lim n λ(n) p Proof Define ℓ= 3+ p 2 ln(n + 1). Note that ℓ2 = 2 ln(n+1)+6ℓ 9. So, by Lemma A.3, 1 M0(ℓ2/2) exp(ℓ2/2) ℓ2 + 1 + 2/ℓ2 = exp(ln(n + 1) + 3ℓ 9/2) ℓ2 + 1 + 2/ℓ2 . (29) Since ℓ 3 we have 3ℓ 9/2 ℓ, and also eℓ ℓ2 + 1 + 2/ℓ. (30) Harvey, Liaw, Portella This may be seen by a direct calculation for ℓ= 3, then observing that the second derivative of the left-hand side exceeds the second derivative of the right-hand side for ℓ 3. Combining (29) and (30) we obtain 1 M0(ℓ2/2) exp(ln(n + 1)) = n + 1. Since M0 is monotonically increasing, it follows that λ(n) ℓ. Appendix B. Missing Proofs of Section 3 Lemma B.1 Let Φ be as in (2) and ηt 0 for all t 0. Then, i,j [n] ijΦ(t, x)Σij(t) ηt 2 , t 0, x Rn. Proof Define Θ := (Pn i=1 exp(ηtxi))2. First of all, one may verify that iiΦ(t, x) = 1 j =i eηtxj = 1 j=1 eηt(xi+xj) e2ηtxi and that, for i = j, ijΦ(t, x) = 1 Θηteηt(xi+xj). Therefore, using that Σii(t) = 1 for any i [n] and defining vi := eηtxi for each i [n] X i,j [n] ijΦ(t, x)Σij(t) = ηt i,j [n] eηt(xi+xj) i,j eηt(xi+xj)Σij(t) | {z } =v TΣ(t)v 0 Lemma B.2 Let Φ be as in (2) and ηt be either constant in t or of the form [t > 0]c/ t for some c > 0. Then, tΦ(t, x) log n 2tηt , t 0, x Rn. Proof If ηt is constant as a function of t, then tΦ(t, x) = 0. Otherwise, one may verify that (using the fact ηt = c/ tΦ(t, x) = 1 2tηt log n X i=1 eηtxi 1 t xi eηtxi Pn j=1 eηtxj 1 ηt log n X | {z } =Φ(t,x) i=1 xi eηtxi Pn j=1 eηt Cj | {z } =CTp Continuous Prediction with Experts Advice Let us now show that 1 ηt log n X i=1 eηtxi log n ηt + x Tp, (31) which completes the proof of the lemma. We have 1 ηt log n X i=1 eηtxi = log n Define zi := ηtxi for every i [n]. Then, 1 ηt log n X ezi Pn j=1 ezj zi j=1 ezj log n X 1 nezj log n X 1 nezi log(ezi), and this last inequality is true by the convexity of α R 0 7 α log α. This concludes the proof of (31) and, thus, of the lemma. Appendix C. Ensuring Predictability of the Player Strategy In Section 2, we required player strategies to be left-continuous in time. In fact, one could possibly loosen this assumption to only require (p(t))t 0 to be predictable (with respect to the filtration generated by (B(t))t 0) as defined in Revuz and Yor (1999, Definition IV.5.2). Yet, adapted left-continuous process are predictable (Mörters and Peres, 2010, Lemma 7.2) and are easier to reason about directly. One should note, however, that the player strategies defined in (6) may not be leftcontinuous. This happens due to the discontinuity when the gradient entries sum to 0. For simplicity, we will discuss here how to modify player strategies generated by the potentials in (11) and (22) to ensure left-continuity, but similar techniques work for players generated by other potentials. In particular, the discontinuity problems may happen only when the gradient is 0. Finally, we note that the exact same predictability issue arises in the continuous Normal Hedge algorithm due to Freund (2009). Let us look at an example to see when p can be discontinuous and that it is not clear how to avoid the discontinuity in our case. Suppose R(s) < 0 for s (t ε, t] for some t, ε > 0, and that R1(s) > 0 while Ri(s) 0 for s (t, t + ε) and i {2, . . . , n}. Then we for all s (t ε, t + ε) we have p(s) = (1/n)1 if s t and p(s) = e1 for s > t. Ideally, we would like a smooth transition between the uniform distribution and the point mass in the first expert, but there is no clear way to enforce that. In the case of the Log Sum Exp potential from Section 3, the gradient always lives in the relative interior of the simplex, so we never place 0 probability on any of the experts. Harvey, Liaw, Portella To ensure left-continuity of the player strategy, we can simply redefine it to be leftcontinuous. In our case, this will only modify the player strategy at the points of discontinuity and shall not affect our calculations. More precisely, let (ˆp(t))t 0 be the player strategy as described in (6) and define (p(t))t 0 by p(t) := lim s t ˆp(s), where (R(t))t 0 is still defined in terms of (p(t))t 0. One might worry that this definition becomes circular, but note that to define p(t) we only need the values of R(s) for s < t, and for t = 0 we have R(t) = 0. This ensures that p(t) and R(t) are well-defined. Furthermore, since (G(t))t 0 is continuous, we also have that (A(t))t 0, and thus (R(t))t 0, are continuous, even though (p(t))t 0 may not be continuous (Cohen and Elliott, 2015, Remark 12.1.12). Now by definition we have that p is left-continuous. Moreover, the points t of discontinuity for p are the points such that R(t) enters or leaves the non-positive orthant { x Rn : x 0 }. Thus, we need only to ensure that any claims that explicitly use the form of p given by (6) also hold in the discontinuity points. In such points it is clear we have Z t 0 Φ(s, R(s)), d R(s) = 0, as required by Lemma 4.1. This also should not affect the calculations in the proof of Lemma 5.1 since we can avoid the points of non-discontinuity my small perturbations without changing the value of the s BHT significantly. Appendix D. Missing Proofs for Section 4 Proof of Lemma 4.2 Let T 0. Intuitively, we want to show that Φ(T, R(T)) 0 by using Lemma 4.1. However, it is not clear what should be the value of Φ(0, 0). To handle this issue, let δ > 0 and define Φδ(t, x) := Φ(t + δ, x) for all t 0 and x Rn, let p(δ) be defined as in (6) but replacing Φ by Φδ, and let Rδ be the continuous regret vector of pδ. Our goal now is to show that Φδ(T, Rδ(T)) Φδ(0, 0) = δ almost surely. (32) Then, by taking the limit with δ tending to 0 we have Φ(T, R(T)) 0 almost surely. Furthermore, by a union bound we have Φ(t, R(t)) 0 for all rational t 0, and since both Φ and R are continuous in t, this implies that Φ(t, R(t)) 0 for all t 0 almost surely by density of the rationals. Note, however, that there is a subtlety in the step in which we take the limit with δ 0, since we are implicitly assuming that lim δ 0 Rδ(T) = R(T) almost surely. (33) Let us prove that (33) indeed holds. Since Rδ i (T) = Gi(T) R T 0 p(δ)(t), d G(t) and Gi(T) is independent of δ for each i [n], we only need to show that 0 p(δ)(t), d G(t) = Z T 0 p(t), d G(t) almost surely. Continuous Prediction with Experts Advice Since p(δ)(t) is bounded and predictable (since we can assume it is left-continuous, see Appendix C) and Gi is a continuous martingale (since it is given by a stochastic integral of continuous functions with respect to a Brownian motion) for each i [n], the above holds by the Dominated Convergence Theorem for stochastic integrals (Revuz and Yor, 1999, Theorem IV.2.12). It is also worth mentioning that we indeed have limδ 0 pδ = p point-wise since taking δ to 0 would not make x2/2(t + δ) cross the negative orthant where the points of discontinuity of p may be. This completes the proof of (33). We now proceed with the proof of (32). Since Φδ is separable and φ(t + δ, ) is concave for any t 0, by Lemma 4.1 we have Φδ(T, Rδ(T)) Φδ(0, 0) Z T tΦδ(t, Rδ(t)) + 2 i=1 iiΦδ(t, Rδ(t)) dt. Note that, for any x Rn, we have tΦδ(t, x) = Pn i=1 tφ(t + δ, x/2) and for all i [n] we have iiΦ(t, x) = (1/4) xxφ(t + δ, xi). Therefore, tΦδ(t, Rδ(t)) + 2 i=1 iiΦδ(t, Rδ(t)) = tφ(t + δ, Rδ i (t)) + 1 2 xxφ(t + δ, Rδ i (t)) 0, where the last equation holds since tφ(t, α) + (1/2) xxφ(t, α) 0 for any t > 0 and α R. This implies that Φδ(T, Rδ(T)) Φδ(0, 0) = δ and Φ(T, R(T)) 0 by taking the limit. Yasin Abbasi-Yadkori, Peter L. Bartlett, and Victor Gabillon. Near minimax optimal players for the finite-time 3-expert prediction problem. In Annual Conference on Neural Information Processing Systems (NIPS), pages 3033 3042, 2017. Alexandr Andoni and Rina Panigrahy. A differential equations approach to optimizing regret trade-offs. May 2013. URL https://arxiv.org/abs/1305.1359. Erhan Bayraktar, Ibrahim Ekren, and Xin Zhang. Finite-time 4-expert prediction problem. Communications in Partial Differential Equations, 45(7):714 757, 2020a. Erhan Bayraktar, Ibrahim Ekren, and Yili Zhang. On the asymptotic optimality of the comb strategy for prediction with expert advice. The Annals of Applied Probability, 30(6): 2517 2546, 2020b. Dimitri P. Bertsekas. Dynamic programming and optimal control. Vol. I. Athena Scientific, Belmont, MA, third edition, 2005. Leo Breiman. First exit times for a square root boundary. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Contributions to Probability Theory, Part 2, pages 9 16. University of California Press, 1967. Sébastien Bubeck. Introduction to online optimization, December 2011. URL http://www. cse.iitd.ac.in/~naveen/courses/CSL866/Bubeck Lecture Notes.pdf. Harvey, Liaw, Portella Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006. ISBN 978-0-521-84108-5. Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427 485, 1997. Kamalika Chaudhuri, Yoav Freund, and Daniel J. Hsu. A parameter-free hedging algorithm. In Annual Conference on Neural Information Processing Systems (NIPS), pages 297 305, 2009. Alexey V. Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI), pages 117 125, 2010. Samuel N. Cohen and Robert J. Elliott. Stochastic calculus and applications. Probability and its Applications. Springer, Cham, second edition, 2015. ISBN 978-1-4939-2866-8; 978-1-4939-2867-5. Thomas M. Cover. Behavior of sequential predictors of binary sequences. In Trans. Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pages 263 272. 1967. Burgess Davis. On the Lp norms of stochastic integrals and other martingales. Duke Math. J, 43(4):697 704, 1976. Nadejda Drenska. A PDE approach to a prediction problem involving randomized strategies. Ph D thesis, New York University, 2017. Nadejda Drenska and Robert V Kohn. Prediction with expert advice: A PDE perspective. Journal of Nonlinear Science, 30(1):137 173, 2020. Dylan J Foster, Alexander Rakhlin, and Karthik Sridharan. Adaptive online learning. Advances in Neural Information Processing Systems (NIPS), 28:3375 3383, 2015. Yoav Freund. A method for hedging in continuous time. October 2009. URL http: //arxiv.org/abs/0904.3356. Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards optimal algorithms for prediction with expert advice. In Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 528 547, 2016. Nicholas J. A. Harvey, Christopher Liaw, Edwin Perkins, and Sikander Randhawa. Optimal anytime regret with two experts. February 2020a. URL https://arxiv.org/abs/2002. 08994v2. Nicholas J. A. Harvey, Christopher Liaw, Edwin A. Perkins, and Sikander Randhawa. Optimal anytime regret for two experts. In 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1404 1415, 2020b. Continuous Prediction with Experts Advice Achim Klenke. Probability Theory: A Comprehensive Course. Springer, 2008. Vladimir A. Kobzar, Robert V. Kohn, and Zhilei Wang. New potential-based bounds for prediction with expert advice. In Conference on Learning Theory, (COLT), volume 125, pages 2370 2405, 2020. Wouter M Koolen and Tim van Erven. Second-order quantile methods for experts and combinatorial games. In Conference on Learning Theory (COLT), volume 40, pages 1155 1175, 2015. Nick Littlestone and Manfred K Warmuth. The weighted majority algorithm. Information and computation, 108(2):212 261, 1994. Haipeng Luo and Robert E. Schapire. Achieving all with no parameters: Ada Normal Hedge. In Conference on Learning Theory (COLT), volume 40, pages 1286 1304, 2015. Peter Mörters and Yuval Peres. Brownian motion, volume 30 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2010. Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, and Dan Roy. Minimax optimal quantile and semi-adversarial regret via root-logarithmic regularizers. Advances in Neural Information Processing Systems (Neur IPS), 34, 2021. Yurii Nesterov. Primal-dual subgradient methods for convex problems. Math. Program., 120 (1, Ser. B):221 259, 2009. Bernt Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An introduction with applications. Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark, editors. NIST handbook of mathematical functions. Cambridge University Press, Cambridge, 2010. Francesco Orabona and Dávid Pál. Optimal non-asymptotic lower bound on the minimax regret of learning with expert advice. November 2015. URL http://arxiv.org/abs/1511. 02176. Francesco Orabona and David Pal. Coin betting and parameter-free online learning. Advances in Neural Information Processing Systems (Neur IPS), 29:577 585, 2016. Edwin Perkins. On the Hausdorffdimension of the Brownian slow points. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 64:369 399, 1983. Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 1999. Volodimir G Vovk. Aggregating strategies. Proc. of Computational Learning Theory, 1990, 1990. Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019. Harvey, Liaw, Portella Zhiyu Zhang, Ashok Cutkosky, and Ioannis Paschalidis. PDE-based optimal strategy for unconstrained online learning. January 2022. URL https://arxiv.org/abs/2201.07877.