Revisiting Smoothed Online Learning

Lijun Zhang¹,², Wei Jiang¹, Shiyin Lu¹, Tianbao Yang³
¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
²Peng Cheng Laboratory, Shenzhen, Guangdong, China
³Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA
{zhanglj, jiangw, lusy}@lamda.nju.edu.cn, tianbao-yang@uiowa.edu

Abstract

In this paper, we revisit the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost, and target two performance metrics: competitive ratio and dynamic regret with switching cost. To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the simple idea of balancing the two costs by an optimization problem. Surprisingly, we find that minimizing the hitting cost alone is $\max(1, \frac{2}{\alpha})$-competitive for $\alpha$-polyhedral functions and $(1+\frac{4}{\lambda})$-competitive for $\lambda$-quadratic growth functions, both of which improve state-of-the-art results significantly. Moreover, when the hitting cost is both convex and $\lambda$-quadratic growth, we reduce the competitive ratio to $1+\frac{2}{\sqrt{\lambda}}$ by minimizing the weighted sum of the hitting cost and the switching cost. To bound the dynamic regret with switching cost, we follow the standard setting of online convex optimization, in which the hitting cost is convex but hidden from the learner before making predictions. We slightly modify Ader, an existing algorithm designed for dynamic regret, to take into account the switching cost when measuring the performance. The proposed algorithm, named Smoothed Ader, attains an optimal $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost, where $P_T$ is the path-length of the comparator sequence. Furthermore, if the hitting cost is accessible at the beginning of each round, we obtain a similar guarantee without the bounded gradient condition, and establish an $\Omega(\sqrt{T(1+P_T)})$ lower bound to confirm the optimality.

1 Introduction

Online learning is the process of making a sequence of predictions given knowledge of the answers to previous tasks and possibly additional information [Shalev-Shwartz, 2011]. While traditional online learning aims to make each prediction as accurate as possible, in this paper we study smoothed online learning (SOL), where the online learner incurs a switching cost for changing its predictions between rounds [Cesa-Bianchi et al., 2013]. SOL has received lots of attention recently because in many real-world applications, a change of action usually brings some additional cost. Examples include dynamic right-sizing for data centers [Lin et al., 2011], geographical load balancing [Lin et al., 2012], real-time electricity pricing [Kim and Giannakis, 2014], video streaming [Joseph and de Veciana, 2012], spatiotemporal sequence prediction [Kim et al., 2015], multi-timescale control [Goel et al., 2017], and thermal management [Zanini et al., 2010].

Specifically, SOL is performed in a sequence of consecutive rounds, where at round $t$ the learner is asked to select a point $x_t$ from the decision set $\mathcal{X}$, and suffers a hitting cost $f_t(x_t)$. Depending on the performance metric, the learner may be allowed to observe $f_t(\cdot)$ when making decisions, which is different from traditional online learning, in which $f_t(\cdot)$ is revealed to the learner only after the decision is submitted [Cesa-Bianchi and Lugosi, 2006].
Additionally, the learner incurs a switching cost $m(x_t, x_{t-1})$ for changing decisions between successive rounds. The switching cost $m(x_t, x_{t-1})$ could be any distance function, such as the $\ell_2$-norm distance $\|x_t - x_{t-1}\|$ or the squared $\ell_2$-norm distance $\|x_t - x_{t-1}\|^2/2$ [Goel et al., 2019].

In the literature, there are two performance metrics for SOL: competitive ratio and dynamic regret with switching cost. Competitive ratio is popular in the community of online algorithms [Borodin and El-Yaniv, 1998]. It is defined as the worst-case ratio of the total cost incurred by the online learner and the offline optimal cost:

$$\frac{\sum_{t=1}^{T}\big[f_t(x_t)+m(x_t,x_{t-1})\big]}{\min_{u_0,u_1,\dots,u_T\in\mathcal{X}}\sum_{t=1}^{T}\big[f_t(u_t)+m(u_t,u_{t-1})\big]}. \tag{1}$$

When focusing on the competitive ratio, the learner can observe $f_t(\cdot)$ before picking $x_t$. The problem is still nontrivial due to the coupling created by the switching cost. On the other hand, dynamic regret with switching cost is a generalization of dynamic regret, a popular performance metric in the community of online learning [Zinkevich, 2003]. It is defined as the difference between the total cost incurred by the online learner and that of an arbitrary comparator sequence $u_0, u_1, \dots, u_T \in \mathcal{X}$:

$$\sum_{t=1}^{T}\big[f_t(x_t)+m(x_t,x_{t-1})\big]-\sum_{t=1}^{T}\big[f_t(u_t)+m(u_t,u_{t-1})\big]. \tag{2}$$

Different from previous work [Chen et al., 2018, Goel et al., 2019], we do not introduce a minimization over $u_0, u_1, \dots, u_T$ in (2). The reason is that we want to bound (2) by certain regularities of the comparator sequence, such as the path-length

$$P_T(u_0, u_1, \dots, u_T) = \sum_{t=1}^{T}\|u_t - u_{t-1}\|. \tag{3}$$

When focusing on (2), $f_t(\cdot)$ is generally hidden from the learner before submitting $x_t$. The conditions for bounding the two metrics are very different, so we study competitive ratio and dynamic regret with switching cost separately. For bounding the two metrics simultaneously, we refer to Andrew et al. [2013] and Daniely and Mansour [2019], especially the meta-algorithm in the latter work.

This paper follows the line of research stemming from online balanced descent (OBD) [Chen et al., 2018, Goel and Wierman, 2019]. The key idea of OBD is to find an appropriate balance between the hitting cost and the switching cost through iterative projections. It has been shown that OBD and its variants are able to exploit analytical properties of the hitting cost (e.g., polyhedral, strongly convex) to derive dimension-free competitive ratios. At this point, it is natural to ask why not use the greedy algorithm, which minimizes the weighted sum of the hitting cost and the switching cost in each round, i.e.,

$$\min_{x\in\mathcal{X}}\ f_t(x)+\gamma\, m(x, x_{t-1}) \tag{4}$$

to balance the two costs, where $\gamma \ge 0$ is the trade-off parameter. We note that the greedy algorithm is usually treated as the baseline in competitive analysis [Borodin and El-Yaniv, 1998], but its usage for smoothed online learning is quite limited. One result is given by Goel et al. [2019], who demonstrate that the greedy algorithm, as a special case of Regularized OBD (R-OBD), is optimal for strongly convex functions. Besides, Lin et al. [2020] have analyzed the greedy algorithm with $\gamma = 0$, named the naive approach below, for polyhedral functions and quadratic growth functions.
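To make the update in (4) concrete, the following minimal sketch performs one greedy round with a generic numerical solver; the box-shaped decision set, the $\ell_2$-norm switching cost, and the function names are assumptions made for illustration, not part of the paper.

```python
# A minimal sketch of the greedy step (4): min_x f_t(x) + gamma * m(x, x_prev),
# assuming a box-shaped decision set and the l2-norm switching cost.
import numpy as np
from scipy.optimize import minimize

def greedy_step(f_t, x_prev, gamma, bounds):
    """One round of the greedy algorithm; gamma = 0 recovers the naive approach."""
    objective = lambda x: f_t(x) + gamma * np.linalg.norm(x - x_prev)
    return minimize(objective, x0=x_prev, bounds=bounds).x

# Example with the hitting cost f_t(x) = (x_1 - 1)^2 + x_2^2 over the unit box.
x_t = greedy_step(lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2,
                  x_prev=np.zeros(2), gamma=1.0, bounds=[(0, 1), (0, 1)])
```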
In this paper, we make the following contributions towards understanding the greedy algorithm.

- For $\alpha$-polyhedral functions, the competitive ratio of the naive approach is $\max(1, \frac{2}{\alpha})$, which is a significant improvement over the $3+\frac{8}{\alpha}$ competitive ratio of OBD [Chen et al., 2018] and the $1+\frac{2}{\alpha}$ ratio proved by Lin et al. [2020, Lemma 1]. When $\alpha \ge 2$, the ratio becomes 1, indicating that the naive approach is optimal in this scenario.
- For $\lambda$-quadratic growth functions, the competitive ratio of the naive approach is $1+\frac{4}{\lambda}$, which matches the lower bound of this algorithm [Goel et al., 2019, Theorem 5], and is better than the $\max(1+\frac{6}{\lambda}, 4)$ ratio obtained by Lin et al. [2020, Lemma 1].
- If the hitting cost is both convex and $\lambda$-quadratic growth, the greedy algorithm with $\gamma > 0$ attains a $1+\frac{2}{\sqrt{\lambda}}$ competitive ratio, which demonstrates the advantage of taking the switching cost into consideration. Our $1+\frac{2}{\sqrt{\lambda}}$ ratio is on the same order as that of Greedy OBD [Goel et al., 2019, Theorem 3] but with much smaller constants.
- Our analysis of the naive approach and the greedy algorithm is very simple. In contrast, both OBD and Greedy OBD rely on intricate geometric arguments.

While both OBD and R-OBD are equipped with sublinear dynamic regret with switching cost, they are unsatisfactory in the following aspects:

- The regret of OBD depends on an upper bound of the path-length instead of the path-length itself [Chen et al., 2018, Corollary 11], making it nonadaptive.
- The regret of R-OBD is adaptive, but it uses the squared $\ell_2$-norm to measure the switching cost, which may not be suitable for general convex functions [Goel et al., 2019].¹
- Both OBD and R-OBD observe $f_t(\cdot)$ before selecting $x_t$, which violates the convention of online learning.

¹We usually assume the convex function is Lipschitz continuous, and in this case choosing the $\ell_2$-norm distance as the switching cost makes it on the same order as the hitting cost.

To avoid the above limitations, we demonstrate that a small change to Ader [Zhang et al., 2018a], an existing algorithm designed for dynamic regret, is sufficient to minimize the dynamic regret with switching cost under the setting of online convex optimization [Shalev-Shwartz, 2011]. Ader runs multiple instances of online gradient descent (OGD) [Zinkevich, 2003] with different step sizes as expert-algorithms, and uses Hedge [Freund and Schapire, 1997] as the meta-algorithm to aggregate predictions from the experts. The only modification is to incorporate the switching cost into the loss of Hedge. The proposed algorithm, named Smoothed Ader (SAder), attains the optimal $O(\sqrt{T(1+P_T)})$ dynamic regret, where $P_T$ is the path-length defined in (3). Thus, our regret bound is adaptive because it automatically becomes small when the comparators change slowly. Finally, we also investigate the case where the hitting cost is available before predictions, and establish a similar result without the bounded gradient condition. To this end, we design a lookahead version of SAder, which chooses the greedy algorithm in (4) as the expert and utilizes the cost of the current round in Hedge. To show the optimality of this algorithm, we further establish an $\Omega(\sqrt{T(1+P_T)})$ lower bound under the lookahead setting.

2 Related work

This section reviews related work on smoothed online learning (SOL) and dynamic regret.

2.1 Smoothed online learning

SOL has been investigated under the settings of multi-armed bandits [Agrawal et al., 1990, Guha and Munagala, 2009, Dekel et al., 2014, Koren et al., 2017a,b], prediction with expert advice [Cesa-Bianchi et al., 2013], and online convex optimization [Lin et al., 2011, Bansal et al., 2015, Zhao et al., 2020b]. In the following, we mainly discuss smoothed online convex optimization (SOCO).
The early works on SOCO focus on designing competitive algorithms in the low-dimensional setting [Lin et al., 2011]. In particular, Bansal et al. [2015] show that for SOCO on the real line, the competitive ratio can be upper bounded by 2, which is proved to be optimal [Antoniadis and Schewior, 2018]. They also establish a competitive ratio of 3 under the memoryless setting. In the study of SOCO, it is common to assume that the learner has access to predictions of future hitting costs, and several algorithms [Lin et al., 2012, Chen et al., 2015, 2016, Li et al., 2018, Li and Li, 2020] have been developed based on receding horizon control (RHC) [Kwon and Han, 2005]. In fact, the greedy algorithm in (4) can be treated as a variant of RHC. However, previous results for RHC are limited to special problems, and (4) remains under-explored. For example, when the learner can observe the next $W$ hitting costs, Li et al. [2018] demonstrate that both the competitive ratio and the dynamic regret with switching cost decay exponentially fast with $W$. But their analysis relies on very strong conditions, including strong convexity and smoothness.

One milestone is online balanced descent (OBD) [Chen et al., 2018], which achieves a dimension-free competitive ratio even when the learner can only observe the hitting cost of the current round. Specifically, OBD iteratively projects the previous point onto a carefully chosen level set of the hitting cost so as to balance the switching cost and the hitting cost. When the hitting cost is $\alpha$-polyhedral and convex, and the switching cost is the $\ell_2$-norm distance, OBD attains a $3+\frac{8}{\alpha}$ competitive ratio. Furthermore, OBD can also be tuned to control the dynamic regret with switching cost. Let $L$ be an upper bound of the path-length of the comparator sequence, i.e., $P_T(u_0,u_1,\dots,u_T)\le L$. Chen et al. [2018, Corollary 11] have proved that

$$\sum_{t=1}^{T}\big[f_t(x_t)+\|x_t-x_{t-1}\|\big]-\min_{P_T(u_0,u_1,\dots,u_T)\le L}\sum_{t=1}^{T}\big[f_t(u_t)+\|u_t-u_{t-1}\|\big]=O\big(\sqrt{TL}\big), \tag{5}$$

leading to sublinear regret when $L = o(T)$. However, the upper bound is nonadaptive because it depends on $L$ instead of the actual path-length $P_T$. Later, Goel and Wierman [2019] demonstrate that OBD is $(3+O(\frac{1}{\lambda}))$-competitive for $\lambda$-strongly convex functions, when the switching cost is set to be the squared $\ell_2$-norm distance. In a subsequent work, Goel et al. [2019, Theorem 4] propose Regularized OBD (R-OBD), which improves the competitive ratio to $\frac{1}{2}\big(1+\sqrt{1+\frac{4}{\lambda}}\big)$, matching the lower bound of strongly convex functions exactly [Goel et al., 2019, Theorem 1]. R-OBD includes (4) as a special case, which thus also enjoys the optimal competitive ratio for strongly convex functions. Furthermore, Goel et al. [2019, Theorem 6] have analyzed the dynamic regret of R-OBD, and the following result can be extracted from that paper:

$$\sum_{t=1}^{T}\Big[f_t(x_t)+\frac{1}{2}\|x_t-x_{t-1}\|^2\Big]-\sum_{t=1}^{T}\Big[f_t(u_t)+\frac{1}{2}\|u_t-u_{t-1}\|^2\Big]=O\Bigg(\sqrt{T\Big(1+\sum_{t=1}^{T}\|u_t-u_{t-1}\|^2\Big)}\Bigg).$$

Compared with (5), this bound is adaptive because it depends on the switching cost of the comparator sequence. However, it chooses the squared $\ell_2$-norm as the switching cost, which may not be suitable for general convex functions. When the hitting cost is both quasiconvex and $\lambda$-quadratic growth, Goel et al. [2019, Theorem 3] have demonstrated that their Greedy OBD algorithm attains an $O(1/\sqrt{\lambda})$ competitive ratio, as $\lambda \to 0$.
Lin et al. [2020] have analyzed the naive approach, which ignores the switching cost and simply minimizes the hitting cost in each round, i.e., the greedy algorithm with $\gamma = 0$. It is a bit surprising that this naive approach is $(1+\frac{2}{\alpha})$-competitive for $\alpha$-polyhedral functions and $\max(1+\frac{6}{\lambda}, 4)$-competitive for $\lambda$-quadratic growth functions, without any convexity assumption [Lin et al., 2020, Lemma 1]. Argue et al. [2020a] have investigated a hybrid setting in which the hitting cost is both $\lambda$-strongly convex and $H$-smooth, but the switching cost is the $\ell_2$-norm distance instead of the squared one. They develop Constrained OBD, and establish a $4+4\sqrt{2H/\lambda}$ competitive ratio. However, their analysis relies on the strong condition that the hitting cost is non-negative over the whole space, i.e., $\min_{x\in\mathbb{R}^d} f_t(x) = 0$, as opposed to the usual condition $\min_{x\in\mathcal{X}} f_t(x) = 0$.

Finally, we note that SOCO is closely related to convex body chasing (CBC) [Friedman and Linial, 1993, Antoniadis et al., 2016, Bansal et al., 2018, Argue et al., 2019, Bubeck et al., 2019, 2020]. In this problem, the online learner receives a sequence of convex bodies $\mathcal{X}_1,\dots,\mathcal{X}_T \subseteq \mathbb{R}^d$ and must select one point from each body, and the goal is to minimize the total movement between consecutive output points. Apparently, we can treat CBC as a special case of SOCO by defining the hitting cost $f_t(\cdot)$ as the indicator function of $\mathcal{X}_t$, which means that the domains of the hitting costs are allowed to differ. On the other hand, we can also formulate a $d$-dimensional SOCO problem as a $(d+1)$-dimensional CBC problem [Lin et al., 2020, Proposition 1]. For the general setting of CBC, the competitive ratio exhibits a polynomial dependence on the dimensionality, and the state-of-the-art result is $O(\min(d, \sqrt{d\log T}))$ [Argue et al., 2020b, Sellke, 2020], which nearly matches the $\Omega(\sqrt{d})$ lower bound [Friedman and Linial, 1993]. Our paper aims to derive dimension-independent competitive ratios and sublinear dynamic regret for SOCO, under appropriate conditions.

2.2 Dynamic regret

Recently, dynamic regret has attracted considerable interest in the community of online learning [Zhang, 2020]. The motivation of dynamic regret is to deal with changing environments, in which the optimal decision may change over time. It is defined as the difference between the cumulative loss of the learner and that of a sequence of comparators $u_1,\dots,u_T\in\mathcal{X}$:

$$\text{D-Regret}(u_1,\dots,u_T)=\sum_{t=1}^{T} f_t(x_t)-\sum_{t=1}^{T} f_t(u_t). \tag{6}$$

In the general form of dynamic regret, $u_1,\dots,u_T$ could be an arbitrary sequence [Zinkevich, 2003, Hall and Willett, 2013, Zhang et al., 2018a, 2020, Cutkosky, 2020, Zhao et al., 2020a], and in the restricted form, they are chosen as the minimizers of the online functions, i.e., $u_t \in \operatorname*{argmin}_{x\in\mathcal{X}} f_t(x)$ [Jadbabaie et al., 2015, Besbes et al., 2015, Yang et al., 2016, Mokhtari et al., 2016, Zhang et al., 2017, 2018b, Wan et al., 2021, Zhao and Zhang, 2021]. While it is well-known that sublinear dynamic regret is unattainable in the worst case, one can bound the dynamic regret in terms of some regularities of the comparator sequence. An instance is given by Zinkevich [2003], who introduces the notion of path-length defined in (3) to measure the temporal variability of the comparator sequence, and derives an $O(\sqrt{T}(1+P_T))$ bound for the dynamic regret of OGD. Later, Zhang et al. [2018a] develop adaptive learning for dynamic environments (Ader), which achieves the optimal $O(\sqrt{T(1+P_T)})$ dynamic regret. In this paper, we show that a small change to Ader attains the same bound for dynamic regret with switching cost.
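Before proceeding, the following snippet, which is illustrative only and not from the paper, spells out how the two quantities in (2) and (3) are evaluated for given prediction and comparator sequences.

```python
# Illustrative helpers for the metrics (2) and (3); names are assumptions.
import numpy as np

def path_length(u):
    """(3): P_T of a sequence u = [u_0, u_1, ..., u_T] of points."""
    return sum(np.linalg.norm(u[t] - u[t - 1]) for t in range(1, len(u)))

def dynamic_regret_with_switching_cost(fs, x, u, m):
    """(2): fs = [f_1, ..., f_T]; x and u are [x_0, ..., x_T] and [u_0, ..., u_T]."""
    learner = sum(f(x[t + 1]) + m(x[t + 1], x[t]) for t, f in enumerate(fs))
    comparator = sum(f(u[t + 1]) + m(u[t + 1], u[t]) for t, f in enumerate(fs))
    return learner - comparator
```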
3 Competitive ratio

In this section, we focus on the competitive ratio. Without loss of generality, we assume the hitting cost is non-negative, since the competitive ratio can only improve if this is not the case.

3.1 Polyhedral functions

We first introduce the definition of polyhedral functions.

Definition 1 A function $f(\cdot):\mathcal{X}\mapsto\mathbb{R}$ with minimizer $v$ is $\alpha$-polyhedral if
$$f(x)-f(v)\ge\alpha\|x-v\|,\quad\forall x\in\mathcal{X}. \tag{7}$$

We note that polyhedral functions have been used for stochastic network optimization [Huang and Neely, 2011] and geographical load balancing [Lin et al., 2012]. Following Chen et al. [2018], we set the switching cost as $m(x_t,x_{t-1})=\|x_t-x_{t-1}\|$. Intuitively, we may expect that the switching cost should be taken into consideration when making decisions. However, our analysis shows that minimizing the hitting cost alone yields the tightest competitive ratio so far. Specifically, we consider the following naive approach that ignores the switching cost and selects
$$x_t=\operatorname*{argmin}_{x\in\mathcal{X}}\ f_t(x). \tag{8}$$

The theoretical guarantee of (8) is stated below.

Theorem 1 Suppose each $f_t(\cdot):\mathcal{X}\mapsto\mathbb{R}$ with minimizer $v_t$ is $\alpha$-polyhedral. We have
$$\sum_{t=1}^{T}\big[f_t(x_t)+\|x_t-x_{t-1}\|\big]\le\max\Big(1,\frac{2}{\alpha}\Big)\sum_{t=1}^{T}\big[f_t(u_t)+\|u_t-u_{t-1}\|\big],\quad\forall u_0,u_1,\dots,u_T\in\mathcal{X},$$
where we assume $x_0=u_0$.

Remark: Our $\max(1,\frac{2}{\alpha})$ competitive ratio is much better than the $3+\frac{8}{\alpha}$ ratio of OBD [Chen et al., 2018], and also better than the $1+\frac{2}{\alpha}$ ratio established by Lin et al. [2020] for (8). When $\alpha\ge 2$, the ratio becomes 1, indicating that the naive approach is optimal in this scenario. Furthermore, the proof of Theorem 1 is much simpler than that of OBD, and refines that of Lin et al. [2020, Lemma 1]. We have also analyzed the greedy algorithm (4) with $\gamma>0$, but the competitive ratio does not improve. So, the current knowledge suggests that there is no need to consider the switching cost when facing polyhedral functions. It is unclear whether this is an artifact of our analysis or an inherent property, which will be investigated in the future. Due to space limitations, all proofs are deferred to the supplementary material.

3.2 Quadratic growth functions

In this section, we consider the quadratic growth condition.

Definition 2 A function $f(\cdot):\mathcal{X}\mapsto\mathbb{R}$ with minimizer $v$ is $\lambda$-quadratic growth if
$$f(x)-f(v)\ge\frac{\lambda}{2}\|x-v\|^2,\quad\forall x\in\mathcal{X}. \tag{9}$$

The quadratic growth condition has been exploited by the optimization community [Drusvyatskiy and Lewis, 2018, Necoara et al., 2019] to establish linear convergence, and it is weaker than strong convexity [Hazan and Kale, 2011]. Following Goel et al. [2019], we set the switching cost as $m(x_t,x_{t-1})=\|x_t-x_{t-1}\|^2/2$. We again consider the naive approach in (8), and have the following theoretical guarantee.

Theorem 2 Suppose each $f_t(\cdot):\mathcal{X}\mapsto\mathbb{R}$ with minimizer $v_t$ is $\lambda$-quadratic growth. We have
$$\sum_{t=1}^{T}\Big[f_t(x_t)+\frac{1}{2}\|x_t-x_{t-1}\|^2\Big]\le\Big(1+\frac{4}{\lambda}\Big)\sum_{t=1}^{T}\Big[f_t(u_t)+\frac{1}{2}\|u_t-u_{t-1}\|^2\Big]$$
for all $u_0,u_1,\dots,u_T\in\mathcal{X}$, where we assume $x_0=u_0$.

Remark: The above theorem implies that the naive approach achieves a competitive ratio of $1+\frac{4}{\lambda}$, which matches the lower bound of this algorithm [Goel et al., 2019, Theorem 5]. Furthermore, it is also much better than the $\max(1+\frac{6}{\lambda},4)$ ratio established by Lin et al. [2020] for (8). Similar to the case of polyhedral functions, it seems safe to ignore the switching cost here; a numerical illustration is given below.
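As a sanity check, and not part of the paper's analysis, the following 1-D experiment compares the naive approach against a grid-based dynamic program for the offline optimum, assuming quadratic hitting costs $f_t(x)=\frac{\lambda}{2}(x-v_t)^2$ (which satisfy Definition 2); since the grid only restricts the offline player, the printed ratio is a conservative check of the $1+\frac{4}{\lambda}$ bound.

```python
# A 1-D sanity check of Theorem 2 (illustrative, not from the paper).
# Hitting costs f_t(x) = (lam/2) * (x - v_t)^2 are lam-quadratic growth,
# and the switching cost is the squared l2-norm distance divided by 2.
import numpy as np

rng = np.random.default_rng(0)
T, lam = 50, 0.5
v = rng.uniform(0.0, 1.0, size=T)            # minimizers v_t of the hitting costs
switch = lambda a, b: 0.5 * (a - b) ** 2
x0 = v[0]                                    # common start: x_0 = u_0

# Naive approach (8): play x_t = v_t, so the hitting cost is 0 every round.
naive = switch(v[0], x0) + sum(switch(v[t], v[t - 1]) for t in range(1, T))

# Offline optimum via dynamic programming over a grid of candidate points.
grid = np.linspace(0.0, 1.0, 201)
cost = switch(grid, x0) + 0.5 * lam * (grid - v[0]) ** 2
for t in range(1, T):
    move = cost[:, None] + switch(grid[:, None], grid[None, :])   # point i -> j
    cost = move.min(axis=0) + 0.5 * lam * (grid - v[t]) ** 2
offline = cost.min()

print(f"ratio = {naive / offline:.3f}  <=  1 + 4/lam = {1 + 4 / lam:.3f}")
```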
3.3 Convex and quadratic growth functions

When $f_t(\cdot)$ is both quasiconvex and $\lambda$-quadratic growth, Goel et al. [2019] have established an $O(1/\sqrt{\lambda})$ competitive ratio for Greedy OBD. Inspired by this result, we introduce convexity to further improve the competitive ratio. In this case, the switching cost plays a role in deriving tighter competitive ratios. Specifically, we choose the greedy algorithm with $\gamma>0$ to select $x_t$, i.e.,
$$x_t=\operatorname*{argmin}_{x\in\mathcal{X}}\ f_t(x)+\frac{\gamma}{2}\|x-x_{t-1}\|^2. \tag{10}$$

The theoretical guarantee of (10) is stated below.

Theorem 3 Suppose the domain $\mathcal{X}$ is convex, and each $f_t(\cdot):\mathcal{X}\mapsto\mathbb{R}$ with minimizer $v_t$ is $\lambda$-quadratic growth and convex. By setting $\gamma=\lambda/(\lambda+\sqrt{\lambda})$, we have
$$\sum_{t=1}^{T}\Big[f_t(x_t)+\frac{1}{2}\|x_t-x_{t-1}\|^2\Big]\le\Big(1+\frac{2}{\sqrt{\lambda}}\Big)\sum_{t=1}^{T}\Big[f_t(u_t)+\frac{1}{2}\|u_t-u_{t-1}\|^2\Big]$$
for all $u_0,u_1,\dots,u_T\in\mathcal{X}$, where we assume $x_0=u_0$.

Remark: The above theorem shows that the competitive ratio is improved to $1+\frac{2}{\sqrt{\lambda}}$ under the additional convexity condition. According to the lower bound for strongly convex functions [Goel et al., 2019, Theorem 1], the ratio in Theorem 3 is optimal up to constant factors. Compared with Greedy OBD [Goel et al., 2019, Theorem 3], our assumption is slightly stronger, since we require convexity instead of quasiconvexity. However, our algorithm and analysis are much simpler, and the constants in our bound are much smaller.
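For intuition, when the hitting cost is the quadratic $f_t(x)=\frac{\lambda}{2}\|x-v_t\|^2$ (an assumption made here purely for illustration), the update (10) admits a closed form, so no numerical solver is needed:

```python
# Worked special case of (10), assuming f_t(x) = (lam/2) * ||x - v_t||^2.
# Setting the gradient of (lam/2)||x - v_t||^2 + (gamma/2)||x - x_prev||^2
# to zero gives lam * (x - v_t) + gamma * (x - x_prev) = 0, hence:
import numpy as np

def greedy_step_quadratic(v_t, x_prev, lam, gamma):
    return (lam * v_t + gamma * x_prev) / (lam + gamma)

lam = 4.0
gamma = lam / (lam + np.sqrt(lam))      # the trade-off parameter of Theorem 3
x_t = greedy_step_quadratic(np.array([1.0, 0.0]), np.zeros(2), lam, gamma)
# x_t interpolates between the minimizer v_t and the previous decision x_prev,
# leaning towards v_t as lam grows.
```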
4 Dynamic regret with switching cost

When considering dynamic regret with switching cost, we adopt the common assumptions of online convex optimization (OCO) [Shalev-Shwartz, 2011].

Assumption 1 All the functions $f_t$'s are convex over their domain $\mathcal{X}$.

Assumption 2 The gradients of all functions are bounded by $G$, i.e.,
$$\max_{x\in\mathcal{X}}\|\nabla f_t(x)\|\le G,\quad\forall t\in[T]. \tag{11}$$

Assumption 3 The diameter of the domain $\mathcal{X}$ is bounded by $D$, i.e.,
$$\max_{x,x'\in\mathcal{X}}\|x-x'\|\le D. \tag{12}$$

Assumption 2 implies that the hitting cost is Lipschitz continuous, so it is natural to set the switching cost as $m(x_t,x_{t-1})=\|x_t-x_{t-1}\|$.

4.1 The standard setting

We first follow the standard setting of OCO, in which the learner cannot observe the hitting cost when making predictions, and develop an algorithm based on Ader [Zhang et al., 2018a]. Specifically, we demonstrate that a small change to Ader, which modifies the loss of the meta-algorithm to take into account the switching cost of the experts, is sufficient to minimize the dynamic regret with switching cost. Our proposed method is named Smoothed Ader (SAder), and is stated below.²

²In a concurrent work, Zhao et al. [2021] independently develop a similar algorithm for OCO with memory.

Algorithm 1 SAder: Meta-algorithm
Require: a step size $\beta$, and a set $\mathcal{H}$ containing step sizes for experts
1: Activate a set of experts $\{E^\eta\,|\,\eta\in\mathcal{H}\}$ by invoking the expert-algorithm for each step size $\eta\in\mathcal{H}$
2: Sort the step sizes in ascending order $\eta_1\le\eta_2\le\cdots\le\eta_N$, and set $w_1^{\eta_i}=\frac{C}{i(i+1)}$
3: for $t=1,\dots,T$ do
4:   Receive $x_t^\eta$ from each expert $E^\eta$
5:   Output the weighted average $x_t=\sum_{\eta\in\mathcal{H}} w_t^\eta x_t^\eta$
6:   Observe the loss function $f_t(\cdot)$
7:   Update the weight of each expert by (14)
8:   Send the gradient $\nabla f_t(x_t)$ to each expert $E^\eta$
9: end for

Meta-algorithm The meta-algorithm is similar to that of Ader [Zhang et al., 2018a, Algorithm 3], and is summarized in Algorithm 1. The inputs of the meta-algorithm are its own step size $\beta$ and a set $\mathcal{H}$ of step sizes for experts. In Step 1, we activate a set of experts $\{E^\eta\,|\,\eta\in\mathcal{H}\}$ by invoking the expert-algorithm for each $\eta\in\mathcal{H}$. In Step 2, we set the initial weight of each expert. Let $\eta_i$ be the $i$-th smallest step size in $\mathcal{H}$. The weight of $E^{\eta_i}$ is chosen as
$$w_1^{\eta_i}=\frac{C}{i(i+1)},\quad\text{where } C=1+\frac{1}{N}. \tag{13}$$
In each round, the meta-algorithm receives a set of predictions $\{x_t^\eta\,|\,\eta\in\mathcal{H}\}$ from all experts (Step 4), and outputs the weighted average (Step 5)
$$x_t=\sum_{\eta\in\mathcal{H}} w_t^\eta x_t^\eta,$$
where $w_t^\eta$ is the weight assigned to expert $E^\eta$. After observing the loss function, the weights of the experts are updated according to the exponential weighting scheme (Step 7) [Cesa-Bianchi and Lugosi, 2006]:
$$w_{t+1}^\eta=\frac{w_t^\eta e^{-\beta\ell_t(x_t^\eta)}}{\sum_{\mu\in\mathcal{H}} w_t^\mu e^{-\beta\ell_t(x_t^\mu)}}, \tag{14}$$
where
$$\ell_t(x_t^\eta)=\big\langle\nabla f_t(x_t),\,x_t^\eta-x_t\big\rangle+\|x_t^\eta-x_{t-1}^\eta\|. \tag{15}$$
When $t=1$, we set $x_0^\eta=0$ for all $\eta\in\mathcal{H}$. As can be seen from (15), we incorporate the switching cost $\|x_t^\eta-x_{t-1}^\eta\|$ of expert $E^\eta$ to measure its performance. This is the only modification made to Ader. In the last step, we send the gradient $\nabla f_t(x_t)$ to each expert $E^\eta$ so that they can update their own predictions.

Algorithm 2 SAder: Expert-algorithm
Require: the step size $\eta$
1: Let $x_1^\eta$ be any point in $\mathcal{X}$
2: for $t=1,\dots,T$ do
3:   Submit $x_t^\eta$ to the meta-algorithm
4:   Receive the gradient $\nabla f_t(x_t)$ from the meta-algorithm
5:   $x_{t+1}^\eta=\Pi_{\mathcal{X}}\big[x_t^\eta-\eta\nabla f_t(x_t)\big]$
6: end for

Expert-algorithm The expert-algorithm is the same as that of Ader [Zhang et al., 2018a, Algorithm 4], which is OGD over the linearized loss, i.e., the surrogate loss
$$s_t(x)=\big\langle\nabla f_t(x_t),\,x-x_t\big\rangle. \tag{16}$$
For the sake of completeness, we present its procedure in Algorithm 2. The input of the expert is its step size $\eta$. In Step 3 of Algorithm 2, each expert submits its prediction $x_t^\eta$ to the meta-algorithm, and receives the gradient $\nabla f_t(x_t)$ in Step 4. Then, in Step 5, it performs the gradient descent update
$$x_{t+1}^\eta=\Pi_{\mathcal{X}}\big[x_t^\eta-\eta\nabla f_t(x_t)\big] \tag{17}$$
to get the prediction for the next round. Here, $\Pi_{\mathcal{X}}[\cdot]$ denotes the projection onto the nearest point in $\mathcal{X}$.
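To make the interaction between Algorithms 1 and 2 explicit, here is a compact sketch assuming the domain is a centered $\ell_2$ ball of diameter $D$; the step-size grid mirrors the construction in Theorem 4 below, but the code is an illustration under these assumptions, not the authors' implementation.

```python
# A compact, illustrative sketch of SAder (Algorithms 1 and 2), assuming
# the domain is the centered l2 ball of diameter D. grads[t](x) should
# return the (sub)gradient of f_t at x.
import numpy as np

def project(x, D):
    """Projection onto the centered l2 ball of radius D / 2."""
    n = np.linalg.norm(x)
    return x if n <= D / 2 else x * (D / (2 * n))

def sader(grads, d, D, G, T, beta):
    N = int(np.ceil(0.5 * np.log2(1 + 2 * T))) + 1
    etas = [2 ** i * np.sqrt(D ** 2 / (T * (G ** 2 + 2 * G))) for i in range(N)]
    w = np.array([1.0 / (i * (i + 1)) for i in range(1, N + 1)])
    w /= w.sum()                                  # initial weights, cf. (13)
    X = np.zeros((N, d))                          # experts' predictions x_t^eta
    X_prev = np.zeros((N, d))                     # x_0^eta = 0 for all experts
    plays = []
    for t in range(T):
        x = w @ X                                 # weighted average (Step 5)
        plays.append(x)
        g = grads[t](x)
        # Surrogate loss (15): linearized hitting cost + expert's switching cost.
        ell = (X - x) @ g + np.linalg.norm(X - X_prev, axis=1)
        w = w * np.exp(-beta * ell)
        w /= w.sum()                              # exponential weighting (14)
        X_prev = X.copy()
        for i, eta in enumerate(etas):            # OGD expert update (17)
            X[i] = project(X[i] - eta * g, D)
    return plays
```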
We have the following theoretical guarantee.

Theorem 4 Set
$$\eta_i=2^{i-1}\sqrt{\frac{D^2}{T(G^2+2G)}},\quad i=1,\dots,N,\qquad N=\Big\lceil\frac{1}{2}\log_2(1+2T)\Big\rceil+1,$$
and $\beta=\sqrt{\frac{2}{T}}\cdot\frac{1}{(2G+1)D}$ in Algorithm 1. Under Assumptions 1, 2 and 3, for any comparator sequence $u_0,u_1,\dots,u_T\in\mathcal{X}$, SAder satisfies
$$\sum_{t=1}^{T}\big[f_t(x_t)+\|x_t-x_{t-1}\|\big]-\sum_{t=1}^{T} f_t(u_t)\le\sqrt{T(G^2+2G)\Big(D^2+2D\sum_{t=1}^{T}\|u_t-u_{t-1}\|\Big)}+(2G+1)D\sqrt{\frac{T}{8}}\,\big[1+2\ln(k+1)\big] \tag{18}$$
$$=O\Big(\sqrt{T(1+P_T)}+\sqrt{T}\,(1+\log\log P_T)\Big)=O\Big(\sqrt{T(1+P_T)}\Big),$$
where we define $x_0=0$, and
$$k=\Big\lfloor\frac{1}{2}\log_2\Big(1+\frac{2P_T}{D}\Big)\Big\rfloor+1. \tag{19}$$

Remark: Theorem 4 shows that SAder attains an $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost, which is on the same order as that of Ader for dynamic regret. From the $\Omega(\sqrt{T(1+P_T)})$ lower bound of dynamic regret [Zhang et al., 2018a, Theorem 2], we know that our upper bound is optimal up to constant factors. Compared with the regret bound of OBD in (5) [Chen et al., 2018], the advantage of SAder is that its regret depends on the path-length $P_T$ directly, and thus becomes tighter when focusing on comparator sequences with smaller path-lengths. Finally, note that in (18) we do not subtract the switching cost of the comparator sequence, i.e., $P_T$, because it is always smaller than $\sqrt{DT(1+P_T)}$ and does not affect the order.

4.2 The lookahead setting

It is interesting to investigate whether we can do better if the hitting cost is available before predictions. In this case, we propose a lookahead version of SAder, and demonstrate that the regret bound remains on the same order, but Assumption 2 can be dropped. That is, the gradient of the function could be unbounded, and thus the function itself could also be unbounded.

Algorithm 3 Lookahead SAder: Meta-algorithm
Require: a step size $\beta$, and a set $\mathcal{H}$ containing step sizes for experts
1: Activate a set of experts $\{E^\eta\,|\,\eta\in\mathcal{H}\}$ by invoking the expert-algorithm for each step size $\eta\in\mathcal{H}$
2: Sort the step sizes in ascending order $\eta_1\le\eta_2\le\cdots\le\eta_N$, and set $w_0^{\eta_i}=\frac{C}{i(i+1)}$
3: for $t=1,\dots,T$ do
4:   Observe the loss function $f_t(\cdot)$ and send it to each expert $E^\eta$
5:   Receive $x_t^\eta$ from each expert $E^\eta$
6:   Update the weight of each expert by (20)
7:   Output the weighted average $x_t=\sum_{\eta\in\mathcal{H}} w_t^\eta x_t^\eta$
8: end for

Meta-algorithm We design a lookahead version of Hedge, and summarize it in Algorithm 3. Compared with Algorithm 1, we make the following modifications. In the $t$-th round, the meta-algorithm first sends $f_t(\cdot)$ to all experts so that they can also benefit from the prior knowledge of $f_t(\cdot)$ (Step 4). After receiving the predictions from the experts (Step 5), the meta-algorithm makes use of $f_t(\cdot)$ to determine the weights of the experts (Step 6):
$$w_t^\eta=\frac{w_{t-1}^\eta e^{-\beta\ell_t(x_t^\eta)}}{\sum_{\mu\in\mathcal{H}} w_{t-1}^\mu e^{-\beta\ell_t(x_t^\mu)}}, \tag{20}$$
where $\ell_t(x_t^\eta)$ is defined in (15).

Algorithm 4 Lookahead SAder: Expert-algorithm
Require: the step size $\eta$
1: for $t=1,\dots,T$ do
2:   Receive the loss $f_t(\cdot)$ from the meta-algorithm
3:   Solve the optimization problem in (21) to obtain $x_t^\eta$
4:   Submit $x_t^\eta$ to the meta-algorithm
5: end for

Expert-algorithm To exploit the hitting cost of the current round, we choose an instance of the greedy algorithm in (4) as the expert-algorithm, and summarize it in Algorithm 4. The input of the expert is its step size $\eta$. After receiving $f_t(\cdot)$ (Step 2), the expert solves the following optimization problem to obtain $x_t^\eta$ (Step 3):
$$\min_{x\in\mathcal{X}}\ f_t(x)+\frac{1}{2\eta}\|x-x_{t-1}^\eta\|^2. \tag{21}$$
A sketch of this step is given at the end of this subsection. We have the following theoretical guarantee for the lookahead SAder.

Theorem 5 Set
$$\eta_i=2^{i-1}\sqrt{\frac{D^2}{2T}},\quad i=1,\dots,N,\qquad N=\Big\lceil\frac{1}{2}\log_2(1+2T)\Big\rceil+1,$$
and $\beta=\frac{1}{D}\sqrt{\frac{2}{T}}$ in Algorithm 3. Under Assumptions 1 and 3, for any comparator sequence $u_0,u_1,\dots,u_T\in\mathcal{X}$, the lookahead SAder satisfies
$$\sum_{t=1}^{T}\big[f_t(x_t)+\|x_t-x_{t-1}\|\big]-\sum_{t=1}^{T} f_t(u_t)\le\sqrt{T\Big(D^2+2D\sum_{t=1}^{T}\|u_t-u_{t-1}\|\Big)}+D\sqrt{\frac{T}{2}}\,\big[1+2\ln(k+1)\big]$$
$$=O\Big(\sqrt{T(1+P_T)}+\sqrt{T}\,(1+\log\log P_T)\Big)=O\Big(\sqrt{T(1+P_T)}\Big),$$
where $x_0=0$, and $k$ is defined in (19).

Remark: Similar to SAder, the lookahead SAder also achieves an $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost. In the lookahead setting, we do not need Assumption 2 any more, and the constants in Theorem 5 are independent of $G$.

Lower bound To show the optimality of Theorem 5, we provide a lower bound on the dynamic regret with switching cost under the lookahead setting.

Theorem 6 For any online algorithm with lookahead ability and any $\tau\in[0,TD]$, there exists a sequence of functions $f_1,\dots,f_T$ and a sequence of comparators $u_1,\dots,u_T$ satisfying Assumptions 1 and 3 such that (i) the path-length of $u_1,\dots,u_T$ is at most $\tau$, and (ii) the dynamic regret with switching cost w.r.t. $u_1,\dots,u_T$ is at least $\Omega(\sqrt{T(D^2+D\tau)})$.

Remark: The above theorem indicates an $\Omega(\sqrt{T(1+P_T)})$ lower bound, which implies that the lookahead SAder is optimal up to constant factors. Thus, even in the lookahead setting, it is impossible to improve the $O(\sqrt{T(1+P_T)})$ upper bound.
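As promised above, here is a small sketch of the lookahead expert step (21), solved numerically for a generic convex $f_t$; the box-shaped domain and the solver choice are assumptions for illustration, and for simple costs the problem admits a closed form.

```python
# Illustrative sketch of the lookahead expert step (21), assuming a
# box-shaped domain; not the authors' implementation.
import numpy as np
from scipy.optimize import minimize

def lookahead_expert_step(f_t, x_prev, eta, bounds):
    """Solve min_x f_t(x) + ||x - x_prev||^2 / (2 * eta) over a box."""
    objective = lambda x: f_t(x) + np.sum((x - x_prev) ** 2) / (2.0 * eta)
    return minimize(objective, x0=x_prev, bounds=bounds).x
```

Note that a larger step size $\eta$ downweights the proximal term, letting the expert track the current hitting cost more aggressively, which is exactly the trade-off the meta-algorithm learns over.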
5 Conclusion and future work

We investigate the problem of smoothed online learning (SOL), and derive constant competitive ratios or sublinear dynamic regret with switching cost. For the competitive ratio, we demonstrate that the naive approach, which only minimizes the hitting cost, is $\max(1,\frac{2}{\alpha})$-competitive for $\alpha$-polyhedral functions and $(1+\frac{4}{\lambda})$-competitive for $\lambda$-quadratic growth functions. Furthermore, we show that the greedy algorithm, which minimizes the weighted sum of the hitting cost and the switching cost, is $(1+\frac{2}{\sqrt{\lambda}})$-competitive for convex and $\lambda$-quadratic growth functions. For dynamic regret with switching cost, we propose Smoothed Ader (SAder), which attains the optimal $O(\sqrt{T(1+P_T)})$ bound. We also develop a lookahead version of SAder to make use of the prior knowledge of the hitting cost, and establish an $\Omega(\sqrt{T(1+P_T)})$ lower bound.

The research on SOL is still at an early stage, and there are many open problems.

1. Although we can upper bound the sum of the hitting cost and the switching cost, we do not have direct control over the switching cost. However, in many real problems there may exist a hard constraint on the switching cost, motivating the study of switch-constrained online learning, in which the number of switches is limited [Altschuler and Talwar, 2018, Chen et al., 2020]. It would be interesting to investigate how to impose a budget on the switching cost [Wang et al., 2021].
2. This work investigates competitive ratio and dynamic regret with switching cost separately. To bound the two metrics simultaneously, one possible way is to create two experts designed for competitive ratio and dynamic regret with switching cost respectively, and then aggregate their predictions by the meta-algorithm of Daniely and Mansour [2019, Algorithm 2]. But we need to assume the hitting cost is bounded, because that meta-algorithm cannot make use of the lookahead ability.
3. As mentioned above, for polyhedral functions and quadratic growth functions, the best competitive ratio is obtained by the naive approach, which ignores the switching cost, and this fact is counterintuitive. To better understand the challenge, it is important to reveal the lower bounds for polyhedral functions and quadratic growth functions.

Acknowledgments and Disclosure of Funding

The authors would like to thank Yuxuan Xiang for discussions about Theorem 4. Funding in direct support of this work: NSFC grants 62122037 and 61921006, and Jiangsu SF grant BK20200064.

References

R. Agrawal, M. Hegde, and D. Teneketzis. Multi-armed bandit problems with multiple plays and switching cost. Stochastics and Stochastic Reports, 29(4):437–459, 1990.

J. Altschuler and K. Talwar. Online learning over a finite action set with limited switching. In Proceedings of the 31st Annual Conference on Learning Theory, pages 1569–1573, 2018.

L. Andrew, S. Barman, K. Ligett, M. Lin, A. Meyerson, A. Roytman, and A. Wierman. A tale of two metrics: Simultaneous bounds on competitiveness and regret. In Proceedings of the 26th Annual Conference on Learning Theory, pages 741–763, 2013.

A. Antoniadis and K. Schewior. A tight lower bound for online convex optimization with switching costs. In Approximation and Online Algorithms, pages 164–175, 2018.

A. Antoniadis, N. Barcelo, M. Nugent, K. Pruhs, K. Schewior, and M. Scquizzato. Chasing convex bodies and functions. In Proceedings of the 12th Latin American Symposium on Theoretical Informatics, pages 68–81, 2016.

C. Argue, S. Bubeck, M. B. Cohen, A. Gupta, and Y. T. Lee. A nearly-linear bound for chasing nested convex bodies. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 117–122, 2019.

C. Argue, A. Gupta, and G. Guruganesh. Dimension-free bounds for chasing convex functions. In Proceedings of the 33rd Conference on Learning Theory, pages 219–241, 2020a.

C. Argue, A. Gupta, G. Guruganesh, and Z. Tang. Chasing convex bodies with linear competitive ratio. In Proceedings of the 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1519–1524, 2020b.
N. Bansal, A. Gupta, R. Krishnaswamy, K. Pruhs, K. Schewior, and C. Stein. A 2-competitive algorithm for online convex optimization with switching costs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 96–109, 2015.

N. Bansal, M. Böhm, M. Eliáš, G. Koumoutsos, and S. W. Umboh. Nested convex bodies are chaseable. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1253–1260, 2018.

O. Besbes, Y. Gur, and A. Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244, 2015.

A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

S. Bubeck, Y. T. Lee, Y. Li, and M. Sellke. Competitively chasing convex bodies. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 861–868, 2019.

S. Bubeck, B. Klartag, Y. T. Lee, Y. Li, and M. Sellke. Chasing nested convex bodies nearly optimally. In Proceedings of the 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1496–1508, 2020.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

N. Cesa-Bianchi, O. Dekel, and O. Shamir. Online learning with switching costs and other adaptive adversaries. In Advances in Neural Information Processing Systems 26, pages 1160–1168, 2013.

L. Chen, Q. Yu, H. Lawrence, and A. Karbasi. Minimax regret of switching-constrained online convex optimization: No phase transition. In Advances in Neural Information Processing Systems 33, pages 3477–3486, 2020.

N. Chen, A. Agarwal, A. Wierman, S. Barman, and L. L. Andrew. Online convex optimization using predictions. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 191–204, 2015.

N. Chen, J. Comden, Z. Liu, A. Gandhi, and A. Wierman. Using predictions in online optimization: Looking forward with an eye on the past. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pages 193–206, 2016.

N. Chen, G. Goel, and A. Wierman. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of the 31st Conference on Learning Theory, pages 1574–1594, 2018.

T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, second edition, 2006.

A. Cutkosky. Parameter-free, dynamic, and strongly-adaptive online learning. In Proceedings of the 37th International Conference on Machine Learning, pages 2250–2259, 2020.

A. Daniely and Y. Mansour. Competitive ratio vs regret minimization: achieving the best of both worlds. In Proceedings of the 30th International Conference on Algorithmic Learning Theory, pages 333–368, 2019.

O. Dekel, J. Ding, T. Koren, and Y. Peres. Bandits with switching costs: $T^{2/3}$ regret. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 459–467, 2014.

D. Drusvyatskiy and A. S. Lewis. Error bounds, quadratic growth, and linear convergence of proximal methods. Mathematics of Operations Research, 43(3):919–948, 2018.

J. C. Duchi, A. Agarwal, and M. J. Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic Control, 57(3):592–606, 2012.

Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
J. Friedman and N. Linial. On convex body chasing. Discrete & Computational Geometry, 9:293–321, 1993.

G. Goel and A. Wierman. An online algorithm for smoothed regression and LQR control. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, pages 2504–2513, 2019.

G. Goel, N. Chen, and A. Wierman. Thinking fast and slow: Optimization decomposition across timescales. In Proceedings of the 56th IEEE Conference on Decision and Control, pages 1291–1298, 2017.

G. Goel, Y. Lin, H. Sun, and A. Wierman. Beyond online balanced descent: An optimal algorithm for smoothed online optimization. In Advances in Neural Information Processing Systems 32, pages 1875–1885, 2019.

S. Guha and K. Munagala. Multi-armed bandits with metric switching costs. In Proceedings of the 36th International Colloquium on Automata, Languages, and Programming, pages 496–507, 2009.

E. C. Hall and R. M. Willett. Dynamical models and tracking regret in online convex programming. In Proceedings of the 30th International Conference on Machine Learning, pages 579–587, 2013.

E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proceedings of the 24th Annual Conference on Learning Theory, pages 421–436, 2011.

L. Huang and M. J. Neely. Delay reduction via Lagrange multipliers in stochastic network optimization. IEEE Transactions on Automatic Control, 56(4):842–857, 2011.

A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan. Online optimization: Competing with dynamic comparators. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pages 398–406, 2015.

V. Joseph and G. de Veciana. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications, pages 567–575, 2012.

S.-J. Kim and G. B. Giannakis. Real-time electricity pricing for demand response using online convex optimization. In Proceedings of the IEEE PES Conference on Innovative Smart Grid Technologies, pages 1–5, 2014.

T. Kim, Y. Yue, S. Taylor, and I. Matthews. A decision tree framework for spatiotemporal sequence prediction. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 577–586, 2015.

T. Koren, R. Livni, and Y. Mansour. Multi-armed bandits with metric movement costs. In Advances in Neural Information Processing Systems 30, pages 4119–4128, 2017a.

T. Koren, R. Livni, and Y. Mansour. Bandits with movement costs and adaptive pricing. In Proceedings of the 30th Annual Conference on Learning Theory, pages 1242–1268, 2017b.

W. H. Kwon and S. Han. Receding Horizon Control. Springer, 2005.

Y. Li and N. Li. Leveraging predictions in smoothed online convex optimization via gradient-based algorithms. In Advances in Neural Information Processing Systems 33, pages 14520–14531, 2020.

Y. Li, G. Qu, and N. Li. Using predictions in online optimization with switching costs: A fast algorithm and a fundamental limit. In the 2018 Annual American Control Conference, pages 3008–3013, 2018.

M. Lin, A. Wierman, L. L. Andrew, and E. Thereska. Dynamic right-sizing for power-proportional data centers. In Proceedings of the 30th IEEE International Conference on Computer Communications, pages 1098–1106, 2011.

M. Lin, Z. Liu, A. Wierman, and L. L. Andrew. Online algorithms for geographical load balancing. In Proceedings of the 2012 International Green Computing Conference, pages 1–10, 2012.
Y. Lin, G. Goel, and A. Wierman. Online optimization with predictions and non-convex losses. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(1):18:1–18:32, 2020.

A. Mokhtari, S. Shahrampour, A. Jadbabaie, and A. Ribeiro. Online optimization in dynamic environments: Improved regret rates for strongly convex problems. In Proceedings of the 55th IEEE Conference on Decision and Control, pages 7195–7201, 2016.

I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1):69–107, 2019.

M. Sellke. Chasing convex bodies optimally. In Proceedings of the 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1509–1518, 2020.

S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.

Y. Wan, B. Xue, and L. Zhang. Projection-free online learning in dynamic environments. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pages 10067–10075, 2021.

G. Wang, Y. Wan, T. Yang, and L. Zhang. Online convex optimization with continuous switching constraint. In Advances in Neural Information Processing Systems 34, 2021.

T. Yang, L. Zhang, R. Jin, and J. Yi. Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. In Proceedings of the 33rd International Conference on Machine Learning, pages 449–457, 2016.

F. Zanini, D. Atienza, G. De Micheli, and S. P. Boyd. Online convex optimization-based algorithm for thermal management of MPSoCs. In Proceedings of the 20th Great Lakes Symposium on VLSI, pages 203–208, 2010.

L. Zhang. Online learning in changing environments. In Proceedings of the 29th International Joint Conference on Artificial Intelligence, pages 5178–5182, 2020. Early Career.

L. Zhang, T. Yang, J. Yi, R. Jin, and Z.-H. Zhou. Improved dynamic regret for non-degenerate functions. In Advances in Neural Information Processing Systems 30, pages 732–741, 2017.

L. Zhang, S. Lu, and Z.-H. Zhou. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31, pages 1323–1333, 2018a.

L. Zhang, T. Yang, R. Jin, and Z.-H. Zhou. Dynamic regret of strongly adaptive methods. In Proceedings of the 35th International Conference on Machine Learning, pages 5882–5891, 2018b.

L. Zhang, S. Lu, and T. Yang. Minimizing dynamic regret and adaptive regret simultaneously. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 309–319, 2020.

P. Zhao and L. Zhang. Improved analysis for dynamic regret of strongly convex and smooth functions. In Proceedings of the 3rd Annual Learning for Dynamics and Control Conference, pages 48–59, 2021.

P. Zhao, Y.-J. Zhang, L. Zhang, and Z.-H. Zhou. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33, pages 12510–12520, 2020a.

P. Zhao, Y.-X. Wang, and Z.-H. Zhou. Non-stationary online learning with memory and non-stochastic control. arXiv preprint, arXiv:2102.03758, 2021.

Y. Zhao, Q. Zhao, X. Zhang, E. Zhu, X. Liu, and J. Yin. Understand dynamic regret with switching cost for online decision making. ACM Transactions on Intelligent Systems and Technology, 11(3), 2020b.

M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928–936, 2003.