# Online Convex Optimization with Continuous Switching Constraint

Guanghui Wang1, Yuanyu Wan1,2, Tianbao Yang3, Lijun Zhang1,2
1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
2Peng Cheng Laboratory, Shenzhen, Guangdong, China
3Department of Computer Science, The University of Iowa, Iowa City, USA
{wanggh,wanyy,zhanglj}@lamda.nju.edu.cn, tianbao-yang@uiowa.edu

Lijun Zhang is the corresponding author. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract

In many sequential decision making applications, changing the decision brings an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the overall switching cost. We first investigate the hardness of the problem, and provide a lower bound of order $\Omega(\sqrt{T})$ when the switching cost budget $S = \Omega(\sqrt{T})$, and $\Omega(\min\{T/S, T\})$ when $S = O(\sqrt{T})$, where $T$ is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to the cumulative switching cost of the player incurred so far, based on the orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound. Finally, we show that, for strongly convex functions, the regret bound can be improved to $O(\log T)$ for $S = \Omega(\log T)$, and $O(\min\{T/\exp(S) + S, T\})$ for $S = O(\log T)$.

1 Introduction

Online convex optimization (OCO) is a fundamental framework for studying sequential decision making problems (Shalev-Shwartz, 2011). Its protocol can be seen as a game between a player and an adversary: in each round $t$, the player first selects an action $w_t$ from a convex set $\mathcal{D} \subseteq \mathbb{R}^d$; after the action is submitted, a loss function $f_t: \mathcal{D} \mapsto \mathbb{R}$ is revealed, and the player suffers a loss $f_t(w_t)$. The goal is to minimize the regret
$$R = \sum_{t=1}^{T} f_t(w_t) - \min_{w \in \mathcal{D}} \sum_{t=1}^{T} f_t(w), \quad (1)$$
which is the difference between the cumulative loss of the player and that of the best fixed action in hindsight.

Over the past decades, the problem of OCO has been extensively studied, yielding various algorithms and theoretical guarantees (Hazan, 2016; Orabona, 2019). However, most of the existing approaches allow the player to switch her action freely during the learning process. As a result, these methods become unsuitable for many real-life scenarios, such as the online shortest paths problem (Koolen et al., 2010) and portfolio management (Dekel et al., 2014; Vittori et al., 2020), where switching actions brings extra cost, and the budget for the overall switching cost is strictly constrained. To address this problem, recent advances in OCO introduced the switching-constrained problem (Altschuler & Talwar, 2018; Chen et al., 2020), where a hard constraint is imposed on the number of the player's action shifts, i.e.,
$$\sum_{t=2}^{T} \mathbb{1}\{w_t \neq w_{t-1}\} \leq K, \quad (2)$$
and the goal is to minimize regret under a fixed budget $K$. For this problem, Chen et al. (2020) have shown that, given any $K$, we can precisely control the overall switching cost in (2) while achieving a minimax regret bound of order $\Theta(T/\sqrt{K})$. One limitation of (2) is that it treats different amounts of change between $w_{t-1}$ and $w_t$ equally, since a binary function is used as the penalty for action shifts, as the toy example below illustrates.
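In the sketch below (a hypothetical example of ours, not from the paper), two action sequences incur the same cost under (2) but very different total movement:

```python
import numpy as np

# Hypothetical example: two action sequences in R^2 with the SAME number
# of switches, hence the same cost under the binary constraint (2), but
# very different total movement.
small_steps = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])
large_steps = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])

def binary_switches(w):
    # Number of rounds in which the action changes, as counted by (2).
    return sum(not np.allclose(w[t], w[t - 1]) for t in range(1, len(w)))

def continuous_cost(w):
    # Total 2-norm movement, the quantity this paper constrains.
    return sum(np.linalg.norm(w[t] - w[t - 1]) for t in range(1, len(w)))

for name, w in (("small steps", small_steps), ("large steps", large_steps)):
    print(name, binary_switches(w), round(continuous_cost(w), 2))
# Both sequences make 3 switches, but the total movement is 0.3 vs. ~17.07.
```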
However, as observed in many practical applications, e.g., thermal management (Zanini et al., 2010), video streaming (Joseph & de Veciana, 2012) and multi-timescale control (Goel et al., 2017), the price paid for large and small action changes is not the same. Specifically, for these scenarios, the switching cost between two consecutive rounds is typically characterized by a 2-norm function, i.e., $\|w_t - w_{t-1}\|$. Motivated by this observation, in this paper, we introduce a novel OCO setting, named OCO with continuous switching constraint (OCO-CSC), where the player needs to choose actions under a hard constraint on the overall 2-norm switching cost, i.e.,
$$\sum_{t=2}^{T} \|w_t - w_{t-1}\| \leq S, \quad (3)$$
where $S$ is a budget given by the environment. The main advantage of OCO-CSC is that, equipped with (3), we have a more delicate control over the overall switching cost than with the binary constraint in (2).

For the proposed problem, we first observe that an $O(T/\sqrt{S})$ regret bound can be achieved by using the method proposed for switching-constrained OCO (Chen et al., 2020) under a proper configuration of $K$. However, this bound is not tight, since there is a large gap from the lower bound established in this paper. Specifically, we provide a lower bound of order $\Omega(\sqrt{T})$ when $S = \Omega(\sqrt{T})$, and $\Omega(\min\{T/S, T\})$ when $S = O(\sqrt{T})$. Our basic framework for constructing the lower bound follows the classical linear game (Abernethy et al., 2008), while we adopt a novel continuous-constraint-related dynamic policy for the adversary, which allows it to adaptively change the loss function according to the player's cumulative switching cost. Furthermore, we prove that classical online gradient descent (OGD) with an appropriately chosen step size is able to obtain the matching upper bound. These results demonstrate a phase transition phenomenon between the large and small switching budget regimes, which is in sharp contrast to the switching-constrained setting, where the minimax bound always decreases as $\Theta(1/\sqrt{K})$. Finally, we propose a variant of OGD for $\lambda$-strongly convex functions, which can achieve an $O(\log T)$ regret bound when $S = \Omega(\log T)$, and an $O(T/\exp(S) + S)$ regret bound when $S = O(\log T)$.

2 Related Work

In this section, we briefly review related work on online convex optimization.

2.1 Classical OCO

The framework of OCO was established by the seminal work of Zinkevich (2003). For general convex functions, Zinkevich (2003) shows that online gradient descent (OGD) with step size on the order of $O(1/\sqrt{t})$ enjoys an $O(\sqrt{T})$ regret bound. For $\lambda$-strongly convex functions, Hazan et al. (2007) prove that OGD with step size of order $O(1/[\lambda t])$ achieves an $O(\log T)$ regret bound. Both bounds have been proved to be minimax optimal (Abernethy et al., 2008). For exponentially concave functions, the state-of-the-art algorithm is online Newton step (Hazan et al., 2007), which enjoys an $O(d \log T)$ regret bound, where $d$ is the dimensionality.

2.2 Switching-constrained OCO

One related line of research is the switching-constrained setting, where the player is only allowed to change her action no more than $K$ times. This setting has been studied in various online learning scenarios, such as prediction with expert advice (Altschuler & Talwar, 2018) and bandit problems (Simchi-Levi & Xu, 2019; Dong et al., 2020; Ruan et al., 2021). In this paper, we focus on online convex optimization. Jaghargh et al. (2019) first considered this problem, and developed a novel online algorithm based on the Poisson process, which achieves an expected regret of order $O(T^{3/2}/\mathrm{E}[K])$ for any given expected switching budget $\mathrm{E}[K]$. Therefore, the regret becomes sublinear for $\mathrm{E}[K] = \omega(\sqrt{T})$. Later, Chen et al. (2020) proposed a variant of classical OGD based on the mini-batch approach. Specifically, the algorithm evenly divides the time horizon into $K$ intervals, and only updates the decision at the end of each interval based on OGD; a sketch is given below. They show that this simple algorithm enjoys an $O(T/\sqrt{K})$ regret bound for any given budget $K$. They also prove that this result is minimax optimal by establishing a matching $\Omega(T/\sqrt{K})$ lower bound.
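A minimal sketch of this mini-batch idea, in our own code (the interface, e.g. `grad` and `project`, is hypothetical; Chen et al. (2020) should be consulted for the exact algorithm):

```python
import numpy as np

def minibatch_ogd(grad, project, w0, T, K, eta):
    """Sketch of mini-batch OGD: play a fixed action within each of K
    equal intervals, and take one (projected) gradient step per interval,
    so the number of action shifts is at most K.

    grad(t, w): gradient of f_t at w; project(p): projection onto D."""
    w = np.asarray(w0, dtype=float)
    block = int(np.ceil(T / K))
    g_sum = np.zeros_like(w)
    actions = []
    for t in range(T):
        actions.append(w.copy())          # same action throughout the block
        g_sum += grad(t, w)               # accumulate the block's gradients
        if (t + 1) % block == 0:
            w = project(w - eta * g_sum)  # one update at the block boundary
            g_sum = np.zeros_like(w)
    return actions
```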
We note that, when the action set is bounded (i.e., $\max_{w_1, w_2 \in \mathcal{D}} \|w_1 - w_2\| \leq D$), since $\sum_{t=2}^{T} \|w_t - w_{t-1}\| \leq DK$, we can set $K = \lfloor S/D \rfloor$ to satisfy (3) and immediately obtain an $O(T/\sqrt{S})$ regret bound for OCO-CSC, but there is still a large gap from the lower bound we provide in this paper. Very recently, a concurrent work (Sherman & Koren, 2021) considered switching-constrained OCO under the oblivious adversary setting, and derived $\widetilde{\Theta}(T/S)$ bounds. However, we note that: (i) their lower bound for switching-constrained OCO does not translate to a lower bound for our continuous switching cost setting; and (ii) their upper bound for the oblivious setting only holds in expectation, and carries an undesirable extra factor.

2.3 OCO with Ramp Constraints

Another related setting is OCO with ramp constraints, which is studied by Badiei et al. (2015). In this setting, at each round, the player must choose an action satisfying the following inequality:
$$|w_{t,i} - w_{t-1,i}| \leq X_i, \quad (4)$$
where $w_{t,i}$ denotes the $i$-th dimension of $w_t$, and $X_i$ is a constant factor. The constraint in (4) limits the player's action switching at a per-round and per-dimension level. This is very different from the constraint we propose in (3), which mainly focuses on the long-term and overall switching cost. Moreover, we note that Badiei et al. (2015) assume the player can access a sequence of future loss functions before choosing $w_t$, while in this paper we follow the classical OCO framework, in which the player can only make use of historical data.

2.4 OCO with Long-term Constraints

Our proposed problem is also related to OCO with long-term constraints (Mahdavi et al., 2012; Jenatton et al., 2016; Yu et al., 2017), where the action set is written as $m$ convex constraints, i.e.,
$$\mathcal{D} = \{w \in \mathbb{R}^d : g_i(w) \leq 0, \; i \in [m]\}, \quad (5)$$
and we only require these constraints to be satisfied in the long term, i.e., $\sum_{t=1}^{T} g_i(w_t) \leq 0$ for all $i \in [m]$. The goal is to minimize regret while keeping $\sum_{t=1}^{T} g_i(w_t)$ small (see the toy illustration at the end of this subsection). We note that, in this setting, the action set is expressed by the constraints, which is in contrast to OCO-CSC, where the constraint and the decision set $\mathcal{D}$ are independent. Moreover, the constraint in OCO-CSC is time-variant and determined by the historical decisions, while the constraints in (5) are static (or stochastic, as considered by Yu et al., 2017). Recently, several works have started to investigate OCO with long-term and time-variant constraints, but this task is proved to be impossible in general (Mannor et al., 2009). Therefore, existing studies have to consider more restricted settings, such as weaker definitions of regret (Neely & Yu, 2017; Liakopoulos et al., 2019; Yi et al., 2020; Valls et al., 2020).
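For intuition, the following tiny example (hypothetical numbers, our own illustration) shows a sequence that violates a constraint in some rounds while satisfying it in the long-term sense above:

```python
import numpy as np

# One constraint g(w) = w[0] - 1 <= 0: violated in the first round,
# yet satisfied in the long-term sense sum_t g(w_t) <= 0.
g = lambda w: w[0] - 1.0
actions = [np.array([1.5]), np.array([0.2]), np.array([0.8])]

per_round_feasible = all(g(w) <= 0 for w in actions)  # False
long_term_feasible = sum(g(w) for w in actions) <= 0  # True: 0.5 - 0.8 - 0.2 = -0.5
print(per_round_feasible, long_term_feasible)
```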
2.5 Smoothed OCO

The problem of smoothed OCO was originally proposed in the dynamic right-sizing for power-proportional data centers (Lin et al., 2012b), and has received great research interest during the past decade (Lin et al., 2012a; Bansal et al., 2015; Antoniadis & Schewior, 2017; Chen et al., 2018; Goel et al., 2019; Goel & Wierman, 2019; Zhang et al., 2021a). In smoothed OCO, at each round, the learner incurs a hitting cost $f_t(\cdot)$ as well as a switching cost $\|w_t - w_{t-1}\|$, and the goal is to minimize dynamic regret (Zinkevich, 2003) or competitive ratio (Borodin & El-Yaniv, 2005) with respect to $f_t(w_t) + \|w_t - w_{t-1}\|$. This setting is closely related to but different from OCO-CSC, where the goal is to minimize regret with respect to $f_t(\cdot)$, and the overall switching cost is limited by a given budget. Additionally, we note that, similar to Badiei et al. (2015), studies of smoothed OCO typically assume the player can see $f_t(\cdot)$, or sometimes a window of future loss functions (Chen et al., 2015, 2016; Li et al., 2018), before choosing $w_t$. By contrast, in OCO-CSC the player cannot access this additional information.

3 Main Results

In this section, we present the algorithms and theoretical guarantees for OCO-CSC. Before proceeding to the details, following previous work, we introduce some standard definitions (Boyd & Vandenberghe, 2004) and assumptions (Abernethy et al., 2008).

Definition 1 A function $f: \mathcal{D} \mapsto \mathbb{R}$ is convex if, $\forall w_1, w_2 \in \mathcal{D}$,
$$f(w_1) \geq f(w_2) + \nabla f(w_2)^\top (w_1 - w_2). \quad (6)$$

Definition 2 A function $f: \mathcal{D} \mapsto \mathbb{R}$ is $\lambda$-strongly convex if, $\forall w_1, w_2 \in \mathcal{D}$,
$$f(w_1) \geq f(w_2) + \nabla f(w_2)^\top (w_1 - w_2) + \frac{\lambda}{2}\|w_1 - w_2\|^2. \quad (7)$$

Assumption 1 $\mathcal{D}$ is a $d$-dimensional ball of radius $\frac{D}{2}$, i.e., $\mathcal{D} = \{w \in \mathbb{R}^d : \|w\| \leq \frac{D}{2}\}$.

Assumption 2 The gradients of all the online functions are bounded by $G$, i.e.,
$$\max_{w \in \mathcal{D}} \|\nabla f_t(w)\| \leq G, \; \forall t \in [T]. \quad (8)$$

3.1 Lower Bound for Convex Functions

We first describe the adversary's policy for obtaining the lower bound. Following previous work (Abernethy et al., 2008), our proposed policy is based on the linear game, i.e., in each round, the adversary chooses from a set of bounded linear functions:
$$\mathcal{F} = \{f(\cdot): \mathcal{D} \mapsto \mathbb{R} \mid f(w) = m^\top w, \|m\| = G\},$$
which is a subset of the convex functions satisfying Assumptions 1 and 2. For this setting, the regret can be written as
$$R = \sum_{t=1}^{T} m_t^\top w_t - \min_{w \in \mathcal{D}} \sum_{t=1}^{T} m_t^\top w = \sum_{t=1}^{T} m_t^\top w_t + \frac{D}{2}\Big\|\sum_{t=1}^{T} m_t\Big\|, \quad (9)$$
where the last equality holds because the minimum is only attained at $w = -\frac{D}{2} \sum_{t=1}^{T} m_t / \|\sum_{t=1}^{T} m_t\|$. According to (9), to get a tight lower bound on $R$, we have to make both $\sum_{t=1}^{T} m_t^\top w_t$ and $\|\sum_{t=1}^{T} m_t\|$ as large as possible. One classical way to achieve this goal is the orthogonal technique (Abernethy et al., 2008; Chen et al., 2020): in round $t$, the adversary chooses $m_t$ such that $m_t^\top w_t \geq 0$ and $m_t^\top (\sum_{i=1}^{t-1} m_i) \geq 0$. Note that such an $m_t$ can always be found for $d \geq 2$. For this technique, it can be easily shown that $\sum_{t=1}^{T} m_t^\top w_t \geq 0$, while $\|\sum_{t=1}^{T} m_t\| \geq G\sqrt{T}$, which implies an $\Omega(DG\sqrt{T})$ lower bound, as the short derivation below verifies.
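For completeness, the short induction behind the $G\sqrt{T}$ claim (our filling-in of the omitted step, with $M_t = \sum_{i=1}^{t} m_i$):

```latex
% Since m_t^\top M_{t-1} \ge 0 and \|m_t\| = G,
\|M_t\|^2 = \|M_{t-1}\|^2 + 2\,m_t^\top M_{t-1} + \|m_t\|^2
          \ge \|M_{t-1}\|^2 + G^2,
% so by induction over t = 1, \dots, T,
\Big\|\sum_{t=1}^{T} m_t\Big\|^2 = \|M_T\|^2 \ge T G^2,
\qquad\text{i.e.}\qquad
\Big\|\sum_{t=1}^{T} m_t\Big\| \ge G\sqrt{T}.
```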
The above policy does not take the constraint on the player's action shifts into account. In the following, we show that, by designing a more adaptive adversary which automatically adjusts its choices based on the player's historical switching cost, we can obtain a tighter lower bound when $S$ is small. The details are summarized in Algorithm 1.

Algorithm 1 Adversary's Policy
1: Set $\tau = 1$, $l_1 = 1$
2: Observe the player's action $w_1$
3: Choose $m_1$ such that $m_1^\top w_1 = 0$
4: for $t = 2$ to $T$ do
5:   Observe the player's action $w_t$
6:   if $\sum_{j=l_\tau+1}^{t} \|w_j - w_{j-1}\| \leq cS$ then
7:     Choose $m_t = m_{t-1}$
8:   else
9:     Choose $m_t$ such that $m_t^\top w_t \geq 0$ and $m_t^\top M_{t-1} \geq 0$
10:    Set $\tau = \tau + 1$, $l_\tau = t$
11:  end if
12: end for

Specifically, in the first round, after observing the player's action $w_1$, the adversary simply chooses $f_1(w) = m_1^\top w$ such that $m_1^\top w_1 = 0$ (Step 3). For rounds $t \geq 2$, the adversary divides the time horizon into several epochs. Let the number of epochs be $N$. For each round $t$ in epoch $\tau \in [N]$, after obtaining $w_t$, the adversary checks whether the cumulative switching cost of the player inside epoch $\tau$ exceeds a threshold (Step 6). To be more specific, the adversary checks whether
$$\sum_{j=l_\tau+1}^{t} \|w_j - w_{j-1}\| \leq cS,$$
where $l_\tau$ is the start point of epoch $\tau$, $l_1 = 1$, and $c > 0$ is a constant factor. If the inequality holds, the adversary keeps its action unchanged (Step 7); otherwise, the adversary finds a new $m_t$ based on the orthogonal technique, i.e., finds $m_t$ such that $m_t^\top w_t \geq 0$ and $m_t^\top M_{t-1} \geq 0$, where $M_{t-1} = \sum_{j=1}^{t-1} m_j$, and then starts a new epoch (Steps 9-10).

The essential idea behind the above policy is that the adversary adaptively divides the $T$ iterations into $N$ epochs, such that for each epoch $\tau \in [N]$, the cumulative switching cost inside epoch $\tau$ is upper bounded,
$$\sum_{j=l_\tau+1}^{l_{\tau+1}-1} \|w_j - w_{j-1}\| \leq cS,$$
and for each epoch $\tau \in [N-1]$,
$$\sum_{j=l_\tau+1}^{l_{\tau+1}} \|w_j - w_{j-1}\| > cS.$$
The above two inequalities help us obtain novel, $S$-dependent lower bounds for the two terms on the R.H.S. of (9). Specifically, we prove the following two lemmas.

Lemma 1 We have
$$\sum_{t=1}^{T} m_t^\top w_t \geq -cGST.$$

Lemma 2 We have
$$\Big\|\sum_{t=1}^{T} m_t\Big\| \geq \frac{GT\sqrt{c}}{\sqrt{1+c}}.$$

By appropriately tuning the parameter $c$, we finally prove the following lower bound.

Theorem 1 For any online algorithm, under any given switching cost budget $S$, Algorithm 1 can generate a series of loss functions $f_1(\cdot), \ldots, f_T(\cdot)$ satisfying Assumptions 1 and 2, such that
$$R \geq \begin{cases} 0.05DG\sqrt{T}, & S \in [D\sqrt{T}, DT] \\ 0.05DG \cdot \frac{DT}{S}, & S \in [D, D\sqrt{T}) \\ 0.05DGT, & S \in [0, D). \end{cases}$$

Remark 1 The above theorem implies that, when $S \leq D$, the lower bound for OCO-CSC is linear in $T$; when $S \in [D, D\sqrt{T})$, it is possible to achieve sublinear regret, and the lower bound decreases with $T/S$; for sufficiently large $S$, i.e., when $S = \Omega(D\sqrt{T})$, the lower bound is $\Omega(DG\sqrt{T})$, which matches the lower bound for the general OCO problem (Abernethy et al., 2008). Note that in this case the lower bound does not improve further as $S$ increases, which is very different from the switching-constrained setting, where the lower bound is $\Omega(T/\sqrt{K})$, meaning that increasing the budget $K$ is always beneficial. Finally, we note that, since $\|w_t - w_{t-1}\| \leq D \cdot \mathbb{1}\{w_t \neq w_{t-1}\}$, a lower bound for the binary switching cost setting (Altschuler & Talwar, 2018; Chen et al., 2020) does not translate to a lower bound for our continuous switching cost setting.

Remark 2 We summarize our main ideas for proving Theorem 1 as follows. For the proposed adversary, we first show (Lemma 1, Eq. (26)) that, inside each batch, there exists a negative error term caused by the possible non-orthogonality between the player's decisions and the adversary's choice. To keep this error term small, and thus the lower bound large, the adversary should change its choice as many times as possible (which requires a small threshold); on the other hand, Lemma 2 (Eq. (30)) indicates that the lower bound is tighter when the number of batches is small (which requires a large threshold). Based on the two lemmas, in the proof of Theorem 1, we show that the final lower bound is a function of the threshold, and, fortunately, the optimal lower bound can be derived by choosing the threshold that maximizes this function.
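To make the policy concrete, here is a toy implementation sketch of Algorithm 1 (our own code, assuming $d \geq 2$; in a real run the actions $w_t$ arrive online, whereas we pass the whole sequence for simplicity):

```python
import numpy as np

def orthogonal_choice(w, M, G):
    """Return m with ||m|| = G, m.w >= 0 and m.M >= 0 (the orthogonal
    technique; such an m always exists for d >= 2)."""
    d = len(w)
    if np.linalg.norm(w) < 1e-12:
        m = np.zeros(d)
        m[0] = 1.0
    else:
        # Remove the w-component from the basis vector least aligned with w.
        i = int(np.argmin(np.abs(w)))
        e = np.zeros(d)
        e[i] = 1.0
        m = e - (w @ e) / (w @ w) * w
        m /= np.linalg.norm(m)    # now m.w = 0 >= 0
    if m @ M < 0:                 # flip sign so m.M >= 0 (m.w unaffected)
        m = -m
    return G * m

def adversary(actions, c, S, G):
    """Sketch of Algorithm 1: produce m_1, ..., m_T for a player whose
    actions are given by `actions` (shape T x d)."""
    ms = [orthogonal_choice(actions[0], np.zeros_like(actions[0]), G)]
    M = ms[0].copy()              # running sum M_t
    drift = 0.0                   # switching cost inside current epoch
    for t in range(1, len(actions)):
        drift += np.linalg.norm(actions[t] - actions[t - 1])
        if drift <= c * S:        # Step 6: stay in the current epoch
            ms.append(ms[-1])
        else:                     # Steps 9-10: start a new epoch
            ms.append(orthogonal_choice(actions[t], M, G))
            drift = 0.0
        M += ms[-1]
    return ms
```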
3.2 Upper Bounds

In this section, we provide the algorithms for obtaining the matching upper bounds. Before introducing our method, we note that, as mentioned in Section 2.2, the mini-batch OGD algorithm proposed by Chen et al. (2020) enjoys an $O(T/\sqrt{S})$ regret bound for OCO-CSC, which is suboptimal in view of the lower bound constructed in the last section. In the following, we show that, perhaps a bit surprisingly, classical online gradient descent with an appropriately chosen step size is sufficient for obtaining the matching upper bound. Specifically, in round $t$, we update $w_t$ by
$$w_{t+1} = \Pi_{\mathcal{D}}[w_t - \eta \nabla f_t(w_t)], \quad (10)$$
where $\Pi_{\mathcal{D}}[p]$ denotes the projection of $p$ onto $\mathcal{D}$, i.e., $\Pi_{\mathcal{D}}[p] = \operatorname{argmin}_{w \in \mathcal{D}} (w - p)^\top (w - p)$. For this algorithm, we prove the following theoretical guarantee.

Theorem 2 Suppose Assumptions 1 and 2 hold, and all loss functions are convex. Then, under any given switching cost budget $S$, OGD with step size
$$\eta = \begin{cases} \frac{D}{G\sqrt{T}}, & S \in [D\sqrt{T}, DT] \\ \frac{S}{GT}, & S \in [0, D\sqrt{T}) \end{cases} \quad (11)$$
satisfies (3), and achieves the following regret bound:
$$R \leq \begin{cases} DG\sqrt{T}, & S \in [D\sqrt{T}, DT] \\ DG \cdot \frac{DT}{S}, & S \in [D, D\sqrt{T}) \\ DGT, & S \in [0, D). \end{cases}$$

Figure 1: Minimax regret of OCO-CSC. Axes are plotted in log-log scale.

Remark 3 Theorems 1 and 2 show that our proposed algorithm enjoys an $O(GD\sqrt{T})$ minimax regret bound for $S = \Omega(D\sqrt{T})$, and an $O(DG \min\{DT/S, T\})$ regret bound for $S = O(D\sqrt{T})$. We illustrate the relationship between the minimax regret and $S$ in Figure 1. Finally, we note that the algorithm requires the parameter $T$ as input, but it can easily be extended to an any-time algorithm by using the doubling trick (Shalev-Shwartz, 2011).

Although the analysis above implies that the theoretical guarantee for OCO-CSC is unimprovable in general, in the following we show that tighter bounds are still achievable when the loss functions are strongly convex. Specifically, when there are no constraints on the switching cost, the state-of-the-art algorithm is OGD with the time-varying step size $\eta_t = 1/[\lambda t]$, which enjoys an $O(\log T)$ regret bound. For the OCO-CSC setting, in order to control the overall switching cost, we propose to add a tuning parameter to the denominator of the step size. To be more specific, in round $t$, we update $w_t$ by
$$w_{t+1} = \Pi_{\mathcal{D}}[w_t - \eta_t \nabla f_t(w_t)], \quad (12)$$
where $\eta_t = \frac{1}{\lambda(t+c)}$, and $c \geq 0$ is a constant factor. By configuring $c$ properly, we obtain the following regret bound.

Theorem 3 Suppose Assumptions 1 and 2 hold, and all loss functions are $\lambda$-strongly convex. Then, under any given switching cost budget $S$, the algorithm in (12) with
$$c = \begin{cases} 0, & S \in [\frac{2G}{\lambda}\log(T+1), DT] \\ \frac{T}{\exp(\frac{\lambda S}{2G}) - 1} - 1, & S \in [0, \frac{2G}{\lambda}\log(T+1)) \end{cases} \quad (13)$$
satisfies (3), and achieves
$$R \leq \lambda D^2 + \frac{2G^2}{\lambda}\log(T+1)$$
for $S \in [\frac{2G}{\lambda}\log(T+1), DT]$, and
$$R \leq \min\Big\{\frac{\lambda D^2 T}{\exp(\frac{\lambda S}{2G}) - 1} + GS, \; DGT\Big\}$$
for $S \in [0, \frac{2G}{\lambda}\log(T+1))$.

Remark 4 Theorem 3 implies that, when $S \geq \frac{2G}{\lambda}\log(T+1)$, the proposed algorithm enjoys the optimal $O(\log T)$ regret bound; for $S \leq \frac{2G}{\lambda}\log(T+1)$, the proposed algorithm achieves an $O(T/\exp(S) + S)$ regret bound. To obtain a sublinear regret bound, consider $S = \frac{2G\alpha}{\lambda}\log(T+1)$ for some $\alpha \in (0, 1]$. In this case, we have $R \leq \lambda D^2 T^{1-\alpha} + \frac{2G^2}{\lambda}\log(T+1)$, which is sublinear for $\alpha \in (0, 1]$.
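For reference, a compact sketch of both update rules in our own code: (10) with the budget-dependent constant step size from Theorem 2, and (12) with the shifted step size from Theorem 3 (assuming the ball domain of Assumption 1 and, for the strongly convex variant, $S > 0$):

```python
import numpy as np

def project_ball(p, radius):
    # Euclidean projection onto {w : ||w|| <= radius} (Assumption 1).
    n = np.linalg.norm(p)
    return p if n <= radius else (radius / n) * p

def ogd_csc(grad, T, d, D, G, S):
    """OGD for convex losses under switching budget S (Theorem 2): the
    constant step size guarantees sum_t ||w_t - w_{t-1}|| <= eta*G*T <= S."""
    eta = D / (G * np.sqrt(T)) if S >= D * np.sqrt(T) else S / (G * T)
    w = np.zeros(d)
    for t in range(T):
        yield w.copy()
        w = project_ball(w - eta * grad(t, w), D / 2)

def ogd_csc_strongly_convex(grad, T, d, D, G, S, lam):
    """Variant for lambda-strongly convex losses (Theorem 3): step size
    1/(lam*(t+c)), where the shift c from Eq. (13) slows down the early,
    largest steps enough to respect the switching budget (requires S > 0)."""
    threshold = (2 * G / lam) * np.log(T + 1)
    c = 0.0 if S >= threshold else T / (np.exp(lam * S / (2 * G)) - 1) - 1
    w = np.zeros(d)
    for t in range(1, T + 1):
        yield w.copy()
        w = project_ball(w - grad(t - 1, w) / (lam * (t + c)), D / 2)
```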
4 Theoretical Analysis

In this section, we present the proofs of the main conclusions.

4.1 Proof of Theorem 1

When $S \in [D\sqrt{T}, DT]$, the lower bound can be directly obtained by using the minimax linear game provided by Abernethy et al. (2008). When $S \in [D, D\sqrt{T})$, by the definition of regret and (9), we have
$$R = \underbrace{\sum_{t=1}^{T} m_t^\top w_t}_{a_1} + \underbrace{\frac{D}{2}\Big\|\sum_{t=1}^{T} m_t\Big\|}_{a_2}.$$
Based on Lemmas 1 and 2, we get
$$R = a_1 + a_2 \geq \frac{0.5DGT\sqrt{c}}{\sqrt{1+c}} - cGST. \quad (14)$$
Let $c = c_0 D^2/S^2$, and we have
$$R \geq \frac{0.5\sqrt{c_0}\,GD^2T}{\sqrt{S^2 + c_0 D^2}} - \frac{c_0 GD^2T}{S} \geq \frac{GD^2T}{S}\Big(\frac{0.5\sqrt{c_0}}{\sqrt{1+c_0}} - c_0\Big),$$
where the second inequality is due to $D \leq S$. Note that the R.H.S. of the above inequality is a function of $c_0$. To maximize the lower bound, we should solve the following problem:
$$\max_{x > 0} \; \frac{0.5\sqrt{x}}{\sqrt{1+x}} - x,$$
which is equivalent to finding the solution of the following equation:
$$16x^4 + 32x^3 + 49x^2 + 15x - 1 = 0.$$
It can be easily shown that the optimal solution is $x \approx 0.056$. Thus, by setting $c_0 = 0.056$, we get
$$R \geq \frac{0.05GD^2T}{S}.$$
For $S \in (0, D]$, based on (14) and setting $c_0 = 0.056 S^2/D^2$ (i.e., $c = 0.056$), we have
$$R \geq \frac{0.5\sqrt{0.056}\,\frac{S}{D}\,GD^2T}{\sqrt{S^2 + 0.056 S^2}} - 0.056\,GST = \frac{0.5\sqrt{0.056}}{\sqrt{1.056}}\,GDT - 0.056\,GST \geq 0.05DGT,$$
where the second inequality is because $S \leq D$.
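As a quick numerical sanity check of the optimization step above (our own verification script, not part of the paper):

```python
import numpy as np

# Verify the constants used in the proof of Theorem 1: the positive real
# root of 16x^4 + 32x^3 + 49x^2 + 15x - 1 = 0 is approximately 0.056, and
# plugging it into f(x) = 0.5*sqrt(x)/sqrt(1+x) - x gives a value >= 0.05.
roots = np.roots([16, 32, 49, 15, -1])
x_star = next(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0)
f = lambda x: 0.5 * np.sqrt(x) / np.sqrt(1 + x) - x
print(x_star, f(x_star))  # ~0.056 and ~0.059, so the constant 0.05 is valid
```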
4.2 Proof of Theorem 2

We first prove that, by setting $\eta$ as in (11), the constraint in (3) always holds. Let $w'_t = w_{t-1} - \eta \nabla f_{t-1}(w_{t-1})$. We have
$$\|w_t - w_{t-1}\| = \|\Pi_{\mathcal{D}}[w'_t] - \Pi_{\mathcal{D}}[w_{t-1}]\| \leq \|w'_t - w_{t-1}\| = \|\eta \nabla f_{t-1}(w_{t-1})\| \overset{(8)}{\leq} \eta G, \quad (16)$$
where the first inequality is based on the following lemma, which describes the non-expansiveness of the projection.

Lemma 3 (McMahan & Streeter, 2010) For the projection operation, we have, $\forall w_1, w_2 \in \mathbb{R}^d$,
$$\|\Pi_{\mathcal{D}}(w_1) - \Pi_{\mathcal{D}}(w_2)\| \leq \|w_1 - w_2\|.$$

Based on (16), for $S \in [D\sqrt{T}, DT]$, we have
$$\sum_{t=2}^{T} \|w_t - w_{t-1}\| \leq \eta GT = D\sqrt{T} \leq S.$$
For $S \in [0, D\sqrt{T})$, we have
$$\sum_{t=2}^{T} \|w_t - w_{t-1}\| \leq \eta GT = \frac{S}{GT} \cdot GT = S.$$

Next, we turn to upper bounding the regret. Let $w^* = \operatorname{argmin}_{w \in \mathcal{D}} \sum_{t=1}^{T} f_t(w)$. Based on the classical analysis of OGD (Hazan, 2016), we have
$$\|w_t - w^*\|^2 \leq \|w'_t - w^*\|^2 = \|w_{t-1} - \eta \nabla f_{t-1}(w_{t-1}) - w^*\|^2 = \|w_{t-1} - w^*\|^2 + \eta^2 \|\nabla f_{t-1}(w_{t-1})\|^2 - 2\eta (w_{t-1} - w^*)^\top \nabla f_{t-1}(w_{t-1}),$$
which implies
$$(w_t - w^*)^\top \nabla f_t(w_t) \leq \frac{\|w_t - w^*\|^2 - \|w_{t+1} - w^*\|^2}{2\eta} + \frac{\eta}{2}\|\nabla f_t(w_t)\|^2.$$
Based on the convexity of the loss functions and summing the above inequality up from $t = 1$ to $T$, we get
$$R \leq \sum_{t=1}^{T} (w_t - w^*)^\top \nabla f_t(w_t) \leq \sum_{t=1}^{T} \frac{\|w_t - w^*\|^2 - \|w_{t+1} - w^*\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T} \|\nabla f_t(w_t)\|^2 \leq \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}.$$
Thus, for $S \in [D\sqrt{T}, DT]$, we have $R \leq DG\sqrt{T}$. For $S \in [0, D\sqrt{T})$, we have
$$R \leq \frac{D^2GT}{2S} + \frac{GS}{2} \leq \frac{D^2GT}{S} = DG \cdot \frac{DT}{S},$$
where the second inequality is due to $S \leq D\sqrt{T}$. Finally, note that by Assumptions 1 and 2 and the convexity of the loss functions, we always have $R \leq DGT$.

5 Conclusion and Future Work

In this paper, we propose a variant of the classical OCO problem, named OCO with continuous switching constraint, where the player suffers a 2-norm switching cost for each action shift, and the overall switching cost is constrained by a given budget $S$. We first propose an adaptive mini-batch policy for the adversary, based on which we prove that the lower bound for this problem is $\Omega(\sqrt{T})$ when $S = \Omega(\sqrt{T})$, and $\Omega(\min\{T/S, T\})$ when $S = O(\sqrt{T})$. Next, we demonstrate that OGD with a proper configuration of the step size achieves the minimax optimal regret bound. Finally, for $\lambda$-strongly convex functions, we develop a variant of OGD with a tunable parameter in the denominator of its step size, and we show that it enjoys an $O(\log T)$ regret bound when $S = \Omega(\log T)$, and an $O(T/\exp(S) + S)$ regret bound when $S = O(\log T)$. In the future, we would like to investigate how to extend the proposed constraint setting to other OCO scenarios, such as minimizing adaptive regret (Jun et al., 2017; Zhang et al., 2019), dynamic regret (Zinkevich, 2003; Zhang et al., 2018), and universal OCO (van Erven & Koolen, 2016; Wang et al., 2019; Zhang et al., 2021b). Moreover, it is also an interesting question to study the switching constraint problem under other distance metrics, such as the Bregman divergence.

Acknowledgments and Disclosure of Funding

Funding in direct support of this work: National Key Research and Development Program of China (2018AAA0101100), NSFC grants 61921006 and 61976112.

References

Abernethy, J., Bartlett, P. L., Rakhlin, A., and Tewari, A. Optimal strategies and minimax lower bounds for online convex games. In Proceedings of the 21st Annual Conference on Learning Theory, pp. 415-423, 2008.

Altschuler, J. and Talwar, K. Online learning over a finite action set with limited switching. In Proceedings of the 31st Annual Conference on Learning Theory, pp. 1569-1573, 2018.

Antoniadis, A. and Schewior, K. A tight lower bound for online convex optimization with switching costs. In International Workshop on Approximation and Online Algorithms, pp. 164-175, 2017.

Badiei, M., Li, N., and Wierman, A. Online convex optimization with ramp constraints. In 2015 54th IEEE Conference on Decision and Control, pp. 6730-6736, 2015.

Bansal, N., Gupta, A., Krishnaswamy, R., Pruhs, K., Schewior, K., and Stein, C. A 2-competitive algorithm for online convex optimization with switching costs. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, 2015.

Borodin, A. and El-Yaniv, R. Online Computation and Competitive Analysis. Cambridge University Press, 2005.

Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.

Chen, L., Yu, Q., Lawrence, H., and Karbasi, A. Minimax regret of switching-constrained online convex optimization: No phase transition. In Advances in Neural Information Processing Systems 33, 2020.

Chen, N., Agarwal, A., Wierman, A., Barman, S., and Andrew, L. L. Online convex optimization using predictions. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 191-204, 2015.

Chen, N., Comden, J., Liu, Z., Gandhi, A., and Wierman, A. Using predictions in online optimization: Looking forward with an eye on the past. ACM SIGMETRICS Performance Evaluation Review, 44(1):193-206, 2016.

Chen, N., Goel, G., and Wierman, A. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of the 31st Annual Conference on Learning Theory, pp. 1574-1594, 2018.

Dekel, O., Ding, J., Koren, T., and Peres, Y. Bandits with switching costs: $T^{2/3}$ regret. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pp. 459-467, 2014.

Dong, K., Li, Y., Zhang, Q., and Zhou, Y. Multinomial logit bandit with low switching cost. In Proceedings of the 37th International Conference on Machine Learning, pp. 2607-2615, 2020.

Gaillard, P., Stoltz, G., and van Erven, T. A second-order bound with excess losses. In Proceedings of the 27th Annual Conference on Learning Theory, pp. 176-196, 2014.

Goel, G. and Wierman, A. An online algorithm for smoothed regression and LQR control. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2504-2513, 2019.

Goel, G., Chen, N., and Wierman, A. Thinking fast and slow: Optimization decomposition across timescales. In Proceedings of the 56th IEEE Annual Conference on Decision and Control, pp. 1291-1298, 2017.

Goel, G., Lin, Y., Sun, H., and Wierman, A. Beyond online balanced descent: An optimal algorithm for smoothed online optimization. In Advances in Neural Information Processing Systems 32, pp. 1875-1885, 2019.

Hazan, E. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157-325, 2016.

Hazan, E., Agarwal, A., and Kale, S. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.

Jaghargh, M. R. K., Krause, A., Lattanzi, S., and Vassilvitskii, S. Consistent online optimization: Convex and submodular. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2241-2250, 2019.

Jenatton, R., Huang, J., and Archambeau, C. Adaptive algorithms for online convex optimization with long-term constraints. In Proceedings of the 33rd International Conference on Machine Learning, pp. 402-411, 2016.

Joseph, V. and de Veciana, G. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications, pp. 567-575, 2012.
Jun, K.-S., Orabona, F., Wright, S., and Willett, R. Online learning for changing environments using coin betting. Electronic Journal of Statistics, 11(2):5282-5310, 2017.

Koolen, W. M., Warmuth, M. K., and Kivinen, J. Hedging structured concepts. In Proceedings of the 23rd Annual Conference on Learning Theory, pp. 93-105, 2010.

Li, Y., Qu, G., and Li, N. Using predictions in online optimization with switching costs: A fast algorithm and a fundamental limit. In 2018 Annual American Control Conference, pp. 3008-3013, 2018.

Liakopoulos, N., Destounis, A., Paschos, G., Spyropoulos, T., and Mertikopoulos, P. Cautious regret minimization: Online optimization with long-term budget constraints. In Proceedings of the 36th International Conference on Machine Learning, pp. 3944-3952, 2019.

Lin, M., Liu, Z., Wierman, A., and Andrew, L. L. Online algorithms for geographical load balancing. In International Green Computing Conference, pp. 1-10, 2012a.

Lin, M., Wierman, A., Andrew, L. L., and Thereska, E. Dynamic right-sizing for power-proportional data centers. IEEE/ACM Transactions on Networking, 21(5):1378-1391, 2012b.

Mahdavi, M., Jin, R., and Yang, T. Trading regret for efficiency: Online convex optimization with long term constraints. The Journal of Machine Learning Research, 13(1):2503-2528, 2012.

Mannor, S., Tsitsiklis, J. N., and Yu, J. Y. Online learning with sample path constraints. Journal of Machine Learning Research, 10(3), 2009.

McMahan, H. B. and Streeter, M. Adaptive bound optimization for online convex optimization. In Proceedings of the 23rd Annual Conference on Learning Theory, pp. 224-256, 2010.

Neely, M. J. and Yu, H. Online convex optimization with time-varying constraints. arXiv preprint arXiv:1702.04783, 2017.

Orabona, F. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.

Ruan, Y., Yang, J., and Zhou, Y. Linear bandits with limited adaptivity and learning distributional optimal design. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 74-87, 2021.

Shalev-Shwartz, S. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194, 2011.

Sherman, U. and Koren, T. Lazy OCO: Online convex optimization on a switching budget. In Proceedings of the 34th Annual Conference on Learning Theory, 2021.

Simchi-Levi, D. and Xu, Y. Phase transitions and cyclic phenomena in bandits with switching constraints. In Advances in Neural Information Processing Systems 32, pp. 7523-7532, 2019.

Valls, V., Iosifidis, G., Leith, D., and Tassiulas, L. Online convex optimization with perturbed constraints: Optimal rates against stronger benchmarks. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pp. 2885-2895, 2020.

van Erven, T. and Koolen, W. M. MetaGrad: Multiple learning rates in online learning. In Advances in Neural Information Processing Systems 29, pp. 3666-3674, 2016.

Vittori, E., de Luca, M. B., Trovò, F., and Restelli, M. Dealing with transaction costs in portfolio optimization: Online gradient descent with momentum. In ACM International Conference on AI in Finance, pp. 1-8, 2020.

Wang, G., Lu, S., and Zhang, L. Adaptivity and optimality: A universal algorithm for online convex optimization. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence, 2019.
Yi, X., Li, X., Xie, L., and Johansson, K. H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 68:731-746, 2020.

Yu, H., Neely, M. J., and Wei, X. Online convex optimization with stochastic constraints. In Advances in Neural Information Processing Systems 30, pp. 1427-1437, 2017.

Zanini, F., Atienza, D., De Micheli, G., and Boyd, S. P. Online convex optimization-based algorithm for thermal management of MPSoCs. In Proceedings of the 20th Great Lakes Symposium on VLSI, pp. 203-208, 2010.

Zhang, L., Lu, S., and Zhou, Z.-H. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31, pp. 1323-1333, 2018.

Zhang, L., Liu, T.-Y., and Zhou, Z.-H. Adaptive regret of convex and smooth functions. In Proceedings of the 36th International Conference on Machine Learning, pp. 7414-7423, 2019.

Zhang, L., Jiang, W., Lu, S., and Yang, T. Revisiting smoothed online learning. In Advances in Neural Information Processing Systems 34, 2021a.

Zhang, L., Wang, G., Tu, W.-W., and Zhou, Z.-H. Dual adaptivity: A universal algorithm for minimizing the adaptive regret of convex functions. In Advances in Neural Information Processing Systems 34, 2021b.

Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pp. 928-936, 2003.