Online convex optimization for cumulative constraints

Jianjun Yuan, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, yuanx270@umn.edu
Andrew Lamperski, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, alampers@umn.edu

We propose algorithms for online convex optimization which lead to cumulative squared constraint violations of the form $\sum_{t=1}^{T}\left([g(x_t)]_+\right)^2 = O(T^{1-\beta})$, where $\beta \in (0,1)$. Previous literature has focused on long-term constraints of the form $\sum_{t=1}^{T} g(x_t)$. There, strictly feasible solutions can cancel out the effects of violated constraints. In contrast, the new form heavily penalizes large constraint violations, and cancellation effects cannot occur. Furthermore, useful bounds on the single-step constraint violation $[g(x_t)]_+$ are derived. For convex objectives, our regret bounds generalize existing bounds, and for strongly convex objectives we give improved regret bounds. In numerical experiments, we show that our algorithm closely follows the constraint boundary, leading to low cumulative violation.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

1 Introduction

Online optimization is a popular framework for machine learning, with applications such as dictionary learning [14], auctions [1], classification, and regression [3]. It has also been influential in the development of algorithms in deep learning, such as convolutional neural networks [11], deep Q-networks [15], and reinforcement learning [8, 20].

The general formulation for online convex optimization (OCO) is as follows: At each time $t$, we choose a vector $x_t$ in a convex set $S = \{x : g(x) \le 0\}$. Then we receive a loss function $f_t : S \to \mathbb{R}$ drawn from a family of convex functions, and we incur the loss $f_t(x_t)$. In this general setting, there is no constraint on how the sequence of loss functions $f_t$ is generated. See [21] for more details. The goal is to generate a sequence of $x_t \in S$ for $t = 1, 2, \ldots, T$ that minimizes the cumulative regret, defined by:

$$\text{Regret}_T(x^*) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \qquad (1)$$

where $x^*$ is the optimal solution to the problem $\min_{x \in S} \sum_{t=1}^{T} f_t(x)$. According to [2], the solution to Problem (1) is called Hannan consistent if $\text{Regret}_T(x^*)$ is sublinear in $T$.

For online convex optimization with constraints, a projection operator is typically applied to the updated variables in order to make them feasible at each time step [21, 6, 7]. However, when the constraints are complex, the computational burden of the projection may be too high for online computation. To circumvent this dilemma, [13] proposed an algorithm which approximates the true desired projection with a simpler closed-form projection. The algorithm gives a cumulative regret $\text{Regret}_T(x^*)$ upper bounded by $O(\sqrt{T})$, but the constraint $g(x_t) \le 0$ may not be satisfied at every time step. Instead, the long-term constraint violation satisfies $\sum_{t=1}^{T} g(x_t) \le O(T^{3/4})$, which is useful when we only require the constraint violation to be non-positive on average: $\sum_{t=1}^{T} g(x_t)/T \le 0$. More recently, [10] proposed an adaptive-stepsize version of this algorithm which achieves $\text{Regret}_T(x^*) \le O(T^{\max\{\beta, 1-\beta\}})$ and $\sum_{t=1}^{T} g(x_t) \le O(T^{1-\beta/2})$. Here $\beta \in (0,1)$ is a user-determined trade-off parameter. In related work, [19] provides another algorithm which achieves $O(\sqrt{T})$ regret and a bound of $O(\sqrt{T})$ on the long-term constraint violation.
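To make the protocol and the regret (1) concrete, here is a minimal Python sketch that plays a sequence of loss functions against a first-order learner and evaluates the regret against the best fixed point in hindsight. The quadratic losses and the simple online-gradient learner are illustrative stand-ins of our choosing, not the algorithms analyzed in this paper.

```python
import numpy as np

def regret(learner, losses, grads, best_x):
    """Play the OCO protocol and return Regret_T(x*) as in Eq. (1).

    learner: callable (t, x_t, grad_f_t(x_t)) -> x_{t+1}
    losses, grads: lists of callables f_t and their gradients
    best_x: the best fixed decision in hindsight, x*
    """
    x = np.zeros_like(best_x)
    total, best_total = 0.0, 0.0
    for t, (f, df) in enumerate(zip(losses, grads)):
        total += f(x)             # loss is revealed only after x_t is chosen
        best_total += f(best_x)
        x = learner(t, x, df(x))  # update using first-order feedback
    return total - best_total

# Illustrative run: f_t(x) = ||x - y_t||^2 with plain online gradient descent.
rng = np.random.default_rng(0)
ys = [rng.uniform(-1, 1, 2) for _ in range(1000)]
losses = [lambda x, y=y: float(np.sum((x - y) ** 2)) for y in ys]
grads = [lambda x, y=y: 2 * (x - y) for y in ys]
x_star = np.mean(ys, axis=0)                   # minimizer of the cumulative loss
ogd = lambda t, x, g: x - g / np.sqrt(t + 1)   # stepsize ~ 1/sqrt(t)
print(regret(ogd, losses, grads, x_star))      # grows sublinearly in T
```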
In this paper, we propose two algorithms for the following two cases.

Convex case: The first algorithm, for the convex case, has the same user-determined trade-off as [10], while the constraint violation is stricter. Specifically, we have $\text{Regret}_T(x^*) \le O(T^{\max\{\beta, 1-\beta\}})$ and $\sum_{t=1}^{T}\left([g(x_t)]_+\right)^2 \le O(T^{1-\beta})$, where $[g(x_t)]_+ = \max\{0, g(x_t)\}$ and $\beta \in (0,1)$. Note that the squared term heavily penalizes large constraint violations, and constraint violations from one step cannot be canceled out by strictly feasible steps. Additionally, we give a bound on the cumulative constraint violation, $\sum_{t=1}^{T} [g(x_t)]_+ \le O(T^{1-\beta/2})$, which generalizes the bounds from [13, 10]. In the case of $\beta = 0.5$, which we call "balanced", both $\text{Regret}_T(x^*)$ and $\sum_{t=1}^{T} ([g(x_t)]_+)^2$ have the same upper bound of $O(\sqrt{T})$. More importantly, our algorithm guarantees that at each time step, the clipped constraint term $[g(x_t)]_+$ is upper bounded by $O(\frac{1}{T^{1/6}})$, which does not follow from the results of [13, 10]. However, our results currently cannot generalize those of [19], which gives $\sum_{t=1}^{T} g(x_t) \le O(\sqrt{T})$. As discussed below, it is unclear how to extend the work of [19] to the clipped constraints $[g(x_t)]_+$.

Strongly convex case: Our second algorithm, for strongly convex $f_t(x)$, gives improved upper bounds compared with the previous work in [10]. Specifically, we have $\text{Regret}_T(x^*) \le O(\log(T))$ and $\sum_{t=1}^{T} [g(x_t)]_+ \le O(\sqrt{\log(T)\,T})$. The improved bounds match the regret order of standard OCO from [9], while maintaining a constraint violation of reasonable order.

We show numerical experiments on three problems. A toy example is used to compare the trajectories of our algorithm with those of [10, 13], and we see that our algorithm tightly follows the constraints. The algorithms are also compared on a doubly-stochastic matrix approximation problem [10] and an economic dispatch problem from power systems. In these, our algorithms lead to reasonable objective regret and low cumulative constraint violation.

2 Problem Formulation

The basic projected gradient algorithm for Problem (1) was defined in [21]. At each step $t$, the algorithm takes a gradient step with respect to $f_t$ and then projects onto the feasible set. With some assumptions on $S$ and $f_t$, this algorithm achieves a regret of $O(\sqrt{T})$. Although the algorithm is simple, it needs to solve a constrained optimization problem at every time step, which might be too time-consuming for online implementation when the constraints are complex. Specifically, in [21], the update rule at each iteration $t$ is:

$$x_{t+1} = \Pi_S(x_t - \eta \nabla f_t(x_t)) = \arg\min_{y \in S} \|y - (x_t - \eta \nabla f_t(x_t))\|^2 \qquad (2)$$

where $\Pi_S$ is the projection onto the set $S$ and $\|\cdot\|$ is the $\ell_2$ norm.

In order to lower the computational complexity and accelerate the online processing speed, the work of [13] avoids the convex optimization by projecting the variable onto a fixed ball $B \supseteq S$, which always has a closed-form solution. That paper gives an online solution for the following problem:

$$\min_{x_1,\ldots,x_T \in B} \sum_{t=1}^{T} f_t(x_t) - \min_{x \in S} \sum_{t=1}^{T} f_t(x) \quad \text{s.t.} \quad \sum_{t=1}^{T} g_i(x_t) \le 0, \ i = 1, 2, \ldots, m \qquad (3)$$

where $S = \{x : g_i(x) \le 0, \ i = 1, 2, \ldots, m\} \subseteq B$. It is assumed that there exist constants $R > 0$ and $r < 1$ such that $rK \subseteq S \subseteq RK$, with $K$ being the unit $\ell_2$ ball centered at the origin and $B = RK$. Compared to Problem (1), which requires that $x_t \in S$ for all $t$, (3) only requires the sum of constraints to be non-positive. This sum of constraints is known as the long-term constraint.
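The computational contrast is that projection onto the ball $B = RK$ has a closed form, while $\Pi_S$ in (2) generally requires solving a convex program at every step. A minimal sketch under that reading, with a CVXPY-based $\Pi_S$ for comparison (function names are ours):

```python
import numpy as np
import cvxpy as cp

def project_ball(y, R):
    """Closed-form projection onto B = {x : ||x||_2 <= R}: rescale if outside."""
    nrm = np.linalg.norm(y)
    return y if nrm <= R else (R / nrm) * y

def project_set(y, g_list):
    """General projection onto S = {x : g_i(x) <= 0}: one convex solve per call."""
    x = cp.Variable(y.shape[0])
    cp.Problem(cp.Minimize(cp.sum_squares(x - y)),
               [g(x) <= 0 for g in g_list]).solve()
    return x.value

y = np.array([2.0, -1.0])
print(project_ball(y, R=1.0))                       # cheap, closed form
print(project_set(y, [lambda x: cp.norm1(x) - 1]))  # requires an optimizer
```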
To solve this new problem, [13] considers the following augmented Lagrangian function at each iteration $t$:

$$\mathcal{L}_t(x, \lambda) = f_t(x) + \sum_{i=1}^{m} \left\{ \lambda_i g_i(x) - \frac{\sigma\eta}{2}\lambda_i^2 \right\} \qquad (4)$$

The update rule is as follows:

$$x_{t+1} = \Pi_B(x_t - \eta \nabla_x \mathcal{L}_t(x_t, \lambda_t)), \qquad \lambda_{t+1} = \Pi_{[0,+\infty)^m}(\lambda_t + \eta \nabla_\lambda \mathcal{L}_t(x_t, \lambda_t)) \qquad (5)$$

where $\eta$ and $\sigma$ are a pre-determined stepsize and constant, respectively. More recently, an adaptive version was developed in [10], with a user-defined trade-off parameter. The algorithm proposed by [10] uses two different stepsize sequences to update $x$ and $\lambda$, instead of a single stepsize $\eta$.

In both algorithms of [13] and [10], the bound on the violation of the long-term constraint is $\sum_{t=1}^{T} g_i(x_t) \le O(T^\gamma)$ for all $i$ and some $\gamma \in (0,1)$. However, as argued in the last section, this bound does not enforce that the violation of the constraint $x_t \in S$ gets small. A situation can arise in which strictly satisfied constraints at one time step cancel out violations of the constraints at other time steps. This problem can be rectified by considering the clipped constraint $[g_i(x_t)]_+$ in place of $g_i(x_t)$.

For convex problems, our goal is to bound the term $\sum_{t=1}^{T} ([g_i(x_t)]_+)^2$, which, as discussed in the previous section, is more useful for enforcing small constraint violations, and which also recovers the existing bounds for both $\sum_{t=1}^{T} [g_i(x_t)]_+$ and $\sum_{t=1}^{T} g_i(x_t)$. For strongly convex problems, we also show improved upper bounds compared to the results in [10]. In sum, in this paper we want to solve the following problem for the general convex case:

$$\min_{x_1,\ldots,x_T \in B} \sum_{t=1}^{T} f_t(x_t) - \min_{x \in S} \sum_{t=1}^{T} f_t(x) \quad \text{s.t.} \quad \sum_{t=1}^{T} \left([g_i(x_t)]_+\right)^2 \le O(T^\gamma), \ \forall i \qquad (6)$$

where $\gamma \in (0,1)$. We call the new constraint in (6) the square-clipped long-term constraint (since it is a square-clipped version of the long-term constraint) or the square-cumulative constraint (since it encodes the square-cumulative violation of the constraints). To solve Problem (6), we modify the augmented Lagrangian function $\mathcal{L}_t$ as follows:

$$\mathcal{L}_t(x, \lambda) = f_t(x) + \sum_{i=1}^{m} \left\{ \lambda_i [g_i(x)]_+ - \frac{\theta_t}{2}\lambda_i^2 \right\} \qquad (7)$$

In this paper, we will use the following assumptions, as in [13]:

1. The convex set $S$ is non-empty, closed, bounded, and can be described by $m$ convex functions as $S = \{x : g_i(x) \le 0, \ i = 1, 2, \ldots, m\}$.
2. Both the loss functions $f_t(x), \forall t$, and the constraint functions $g_i(x), \forall i$, are Lipschitz continuous in the set $B$. That is, $|f_t(x) - f_t(y)| \le L_f\|x - y\|$ and $|g_i(x) - g_i(y)| \le L_g\|x - y\|$ for all $x, y \in B$ and all $t, i$. We write $G = \max\{L_f, L_g\}$, $F = \max_{t} \max_{x,y \in B} f_t(x) - f_t(y) \le 2L_f R$, and $D = \max_{i} \max_{x \in B} g_i(x) \le L_g R$.

Algorithm 1 Generalized Online Convex Optimization with Long-term Constraint
1: Input: constraints $g_i(x) \le 0, \ i = 1, 2, \ldots, m$, stepsize $\eta$, time horizon $T$, and constant $\sigma > 0$.
2: Initialization: $x_1$ is the center of the ball $B$.
3: for $t = 1$ to $T$ do
4:   Submit the prediction $x_t$.
5:   Receive the convex loss function $f_t(x)$ and incur the loss $f_t(x_t)$.
6:   Compute a subgradient $\partial_x \mathcal{L}_t(x_t, \lambda_t)$, where $\partial_x \mathcal{L}_t(x_t, \lambda_t) = \partial f_t(x_t) + \sum_{i=1}^{m} \lambda_t^i\, \partial [g_i(x_t)]_+$, with $\partial [g_i(x_t)]_+ = 0$ if $g_i(x_t) \le 0$ and $\partial [g_i(x_t)]_+ = \nabla g_i(x_t)$ otherwise.
7:   Update $x_t$ and $\lambda_t$ as: $x_{t+1} = \Pi_B(x_t - \eta\, \partial_x \mathcal{L}_t(x_t, \lambda_t)), \quad \lambda_{t+1}^i = \frac{[g_i(x_{t+1})]_+}{\sigma\eta}$
8: end for

3 Algorithm

3.1 Convex Case

The main algorithm for this paper is shown in Algorithm 1. For simplicity, we abuse the subgradient notation, denoting a single element of the subgradient by $\partial_x \mathcal{L}_t(x_t, \lambda_t)$. Comparing our algorithm with Eq. (5), the gradient projection step for $x_{t+1}$ is similar, while the update rule for $\lambda_{t+1}$ is different. Instead of a projected gradient step, we explicitly maximize $\mathcal{L}_{t+1}(x_{t+1}, \lambda)$ over $\lambda$.
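For concreteness, the following NumPy sketch implements the updates of Algorithm 1 for a single constraint; the function names and driver interface are ours. The $\lambda$-update uses the closed-form maximizer $[g(x)]_+/(\sigma\eta)$ of the augmented Lagrangian (7) with $\theta_t = \sigma\eta$, as described above.

```python
import numpy as np

def clipped_ogd(grad_f_seq, g, grad_g, R, eta, sigma, x0):
    """Sketch of Algorithm 1 with one constraint g(x) <= 0.

    grad_f_seq: iterable of gradient oracles for f_1, ..., f_T
    g, grad_g : constraint function and a (sub)gradient of it
    R         : radius of the outer ball B = RK
    """
    x, lam = np.array(x0, float), 0.0
    traj = [x.copy()]
    for grad_f in grad_f_seq:
        # Subgradient of the augmented Lagrangian in x; the clipped
        # constraint contributes only when it is violated.
        d = grad_f(x) + (lam * grad_g(x) if g(x) > 0 else 0.0)
        y = x - eta * d
        nrm = np.linalg.norm(y)
        x = y if nrm <= R else (R / nrm) * y   # closed-form Pi_B
        lam = max(0.0, g(x)) / (sigma * eta)   # maximizer of (7) in lambda
        traj.append(x.copy())
    return traj
```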
This explicit, projection-free update for $\lambda_{t+1}$ is possible because the constraint clipping guarantees that the maximizer is non-negative. Furthermore, this constraint-violation-dependent update helps to enforce small cumulative and individual constraint violations. Specific bounds on the constraint violation are given in Theorem 1 and Lemma 1 below. Based on the update rule in Algorithm 1, the following theorem gives upper bounds for both the regret on the loss and the squared-cumulative constraint violation $\sum_{t=1}^{T} ([g_i(x_t)]_+)^2$ in Problem (6). For space purposes, all proofs are contained in the supplementary material.

Theorem 1. Set $\sigma = \frac{(m+1)G^2}{2(1-\alpha)}$ and $\eta = \frac{1}{G\sqrt{(m+1)RT}}$. If we follow the update rule in Algorithm 1 with $\alpha \in (0,1)$ and $x^*$ the optimal solution of $\min_{x \in S} \sum_{t=1}^{T} f_t(x)$, then
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(\sqrt{T}), \qquad \sum_{t=1}^{T} \left([g_i(x_t)]_+\right)^2 \le O(\sqrt{T}), \ \forall i \in \{1, 2, \ldots, m\}.$$

From Theorem 1 we see that, with appropriate stepsize $\eta$ and constant $\sigma$, the regret of the loss function is upper bounded by $O(\sqrt{T})$, as is also shown in [13, 10]. The main difference in Theorem 1 is that the previous results of [13, 10] bound the long-term constraint $\sum_{t=1}^{T} g_i(x_t)$, whereas here an upper bound on the constraint violation of the form $\sum_{t=1}^{T} ([g_i(x_t)]_+)^2$ is achieved. Also note that the stepsize depends on $T$, which may not be known in advance. In this case, we can use the doubling trick described in [2] to convert our $T$-dependent algorithm into a $T$-free one at the cost of a constant multiplicative factor.

The proposed algorithm and the resulting bound are useful for two reasons:
1. The square-cumulative constraint implies a bound on the cumulative constraint violation $\sum_{t=1}^{T} [g_i(x_t)]_+$, while enforcing larger penalties for large violations.
2. The proposed algorithm also upper bounds the constraint violation at each single step, $[g_i(x_t)]_+$, which is not bounded in the previous literature.

The next results show how to bound the constraint violation at each step.

Lemma 1. If there is only one differentiable constraint function $g(x)$, whose gradient is Lipschitz continuous with parameter $L$, and we run Algorithm 1 with the parameters in Theorem 1 and large enough $T$, then $[g(x_t)]_+ \le O(\frac{1}{T^{1/6}})$ for all $t \in \{1, 2, \ldots, T\}$, provided $[g(x_1)]_+ \le O(\frac{1}{T^{1/6}})$.

Lemma 1 only considers the single-constraint case. For the case of multiple differentiable constraints, we have the following:

Proposition 1. For multiple differentiable constraint functions $g_i(x), \ i \in \{1, 2, \ldots, m\}$, whose gradients are Lipschitz continuous with parameters $L_i$, if we use $\tilde g(x) = \log\left(\sum_{i=1}^{m} \exp(g_i(x))\right)$ as the constraint function in Algorithm 1, then for large enough $T$ we have $[g_i(x_t)]_+ \le O(\frac{1}{T^{1/6}})$ for all $i, t$, provided $[\tilde g(x_1)]_+ \le O(\frac{1}{T^{1/6}})$.

Clearly, both Lemma 1 and Proposition 1 only deal with differentiable functions. For a non-differentiable function $g(x)$, we can first approximate it by a differentiable function $\hat g(x)$ satisfying $\hat g(x) \ge g(x)$, and then apply Lemma 1 or Proposition 1 to upper bound each individual $[g_i(x_t)]_+$. Many non-smooth convex functions can be approximated in this way, as shown in [16].

3.2 Strongly Convex Case

When $f_t(x)$ is strongly convex, Algorithm 1 remains valid. But in order to obtain tighter upper bounds for both the objective regret and the clipped long-term constraint $\sum_{t=1}^{T} [g_i(x_t)]_+$, compared with Proposition 3 in the next section, we use a time-varying stepsize, as in [9].
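The aggregation in Proposition 1 can be sketched directly: the log-sum-exp surrogate $\tilde g$ dominates $\max_i g_i$, is differentiable, and its gradient is a softmax-weighted combination of the individual gradients. A numerically stabilized version, with illustrative constraints of our choosing:

```python
import numpy as np

def lse_constraint(g_vals):
    """log(sum_i exp(g_i(x))), computed stably; upper-bounds max_i g_i(x)."""
    m = np.max(g_vals)
    return m + np.log(np.sum(np.exp(g_vals - m)))

def lse_gradient(g_vals, g_grads):
    """Gradient of the surrogate: softmax(g_vals)-weighted sum of grad g_i(x)."""
    w = np.exp(g_vals - np.max(g_vals))
    w /= np.sum(w)
    return sum(wi * gi for wi, gi in zip(w, g_grads))

# Illustrative check at a point x, with g_1(x) = x_1 + x_2 - 1 and g_2(x) = -x_1.
x = np.array([0.4, 0.8])
vals = np.array([x[0] + x[1] - 1.0, -x[0]])
grads = [np.array([1.0, 1.0]), np.array([-1.0, 0.0])]
print(lse_constraint(vals) >= np.max(vals))  # True: surrogate dominates the max
print(lse_gradient(vals, grads))
```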
Thus, we modify the update rule of $x_t, \lambda_t$ to use a time-varying stepsize:

$$x_{t+1} = \Pi_B(x_t - \eta_t\, \partial_x \mathcal{L}_t(x_t, \lambda_t)), \qquad \lambda_{t+1} = \frac{[g(x_{t+1})]_+}{\theta_t} \qquad (8)$$

If we replace the update rule in Algorithm 1 with Eq. (8), we obtain the following theorem:

Theorem 2. Assume $f_t(x)$ is strongly convex with parameter $H_1$. If we set $\eta_t = \frac{1}{H_1(t+1)}$ and $\theta_t = \eta_t(m+1)G^2$, follow the new update rule in Eq. (8), and let $x^*$ be the optimal solution of $\min_{x \in S} \sum_{t=1}^{T} f_t(x)$, then for all $i \in \{1, 2, \ldots, m\}$,
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(\log(T)), \qquad \sum_{t=1}^{T} g_i(x_t) \le \sum_{t=1}^{T} [g_i(x_t)]_+ \le O\big(\sqrt{\log(T)\,T}\big).$$

The paper [10] also discusses strongly convex functions, but only provides a bound similar to the convex one. Theorem 2 shows improved bounds for both the objective regret and the constraint violation. On one hand, the objective regret is consistent with the standard OCO result in [9]; on the other, the constraint violation is further reduced compared with the result in [10].

4 Relation with Previous Results

In this section, we extend Theorem 1 to enable direct comparison with the results from [13, 10]. In particular, it is shown how Algorithm 1 recovers the existing regret bounds, and how using the new augmented Lagrangian (7) in the previous algorithms also provides regret bounds for the clipped-constraint case. The first result bounds the clipped long-term constraint, rather than the sum of squares appearing in Theorem 1. This allows more direct comparisons with the existing results.

Proposition 2. If $\sigma = \frac{(m+1)G^2}{2(1-\alpha)}$, $\eta = O(\frac{1}{\sqrt{T}})$, $\alpha \in (0,1)$, and $x^* = \arg\min_{x \in S} \sum_{t=1}^{T} f_t(x)$, then the result of Algorithm 1 satisfies
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(\sqrt{T}), \qquad \sum_{t=1}^{T} g_i(x_t) \le \sum_{t=1}^{T} [g_i(x_t)]_+ \le O(T^{3/4}), \ \forall i \in \{1, 2, \ldots, m\}.$$

This result shows that our algorithm generalizes the regret and long-term constraint bounds of [13]. The next result shows that by changing our constant stepsize accordingly, Algorithm 1 achieves the user-defined trade-off from [10]. Furthermore, we also include the squared and clipped constraint violations.

Proposition 3. If $\sigma = \frac{(m+1)G^2}{2(1-\alpha)}$, $\eta = O(\frac{1}{T^\beta})$, $\alpha \in (0,1)$, $\beta \in (0,1)$, and $x^* = \arg\min_{x \in S} \sum_{t=1}^{T} f_t(x)$, then the result of Algorithm 1 satisfies
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(T^{\max\{\beta, 1-\beta\}}),$$
$$\sum_{t=1}^{T} g_i(x_t) \le \sum_{t=1}^{T} [g_i(x_t)]_+ \le O(T^{1-\beta/2}), \qquad \sum_{t=1}^{T} ([g_i(x_t)]_+)^2 \le O(T^{1-\beta}), \ \forall i \in \{1, 2, \ldots, m\}.$$

Proposition 3 provides a systematic way to balance the regret of the objective and the constraint violation. Next, we show that previous algorithms can use our proposed augmented Lagrangian function to obtain their own clipped long-term constraint bounds.

Proposition 4. If we run Algorithm 1 of [13] with the augmented Lagrangian defined in Eq. (7), the result satisfies
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(\sqrt{T}), \qquad \sum_{t=1}^{T} g_i(x_t) \le \sum_{t=1}^{T} [g_i(x_t)]_+ \le O(T^{3/4}), \ \forall i \in \{1, 2, \ldots, m\}.$$

For the update rule proposed in [10], we need to change $\mathcal{L}_t(x, \lambda)$ to the following:

$$\mathcal{L}_t(x, \lambda) = f_t(x) + \lambda [g(x)]_+ - \frac{\theta_t}{2}\lambda^2 \qquad (9)$$

where $g(x) = \max_{i \in \{1,\ldots,m\}} g_i(x)$.

Proposition 5. If we use the update rule and the parameter choices in [10] with the augmented Lagrangian in Eq. (9), then for all $i \in \{1, \ldots, m\}$ we have
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \le O(T^{\max\{\beta, 1-\beta\}}), \qquad \sum_{t=1}^{T} g_i(x_t) \le \sum_{t=1}^{T} [g_i(x_t)]_+ \le O(T^{1-\beta/2}).$$

Propositions 4 and 5 show that clipped long-term constraints can be bounded by combining the algorithms of [13, 10] with our augmented Lagrangian. Although these results partially resemble our Propositions 2 and 3, they do not imply the results in Theorems 1 and 2, nor the new single-step constraint violation bound in Lemma 1, which are our key contributions.
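A minimal sketch of the strongly convex variant follows; the stepsize $\eta_t = \frac{1}{H_1(t+1)}$ is our reading of Theorem 2's schedule, consistent with the logarithmic-regret stepsizes of [9], and the helper names are ours.

```python
import numpy as np

def clipped_ogd_strongly_convex(grad_f_seq, g, grad_g, R, H1, G, m, x0):
    """Sketch of update (8): eta_t = 1/(H1*(t+1)), theta_t = eta_t*(m+1)*G^2."""
    x, lam = np.array(x0, float), 0.0
    for t, grad_f in enumerate(grad_f_seq, start=1):  # t = 1, ..., T as in the paper
        eta_t = 1.0 / (H1 * (t + 1))
        theta_t = eta_t * (m + 1) * G ** 2
        d = grad_f(x) + (lam * grad_g(x) if g(x) > 0 else 0.0)
        y = x - eta_t * d
        nrm = np.linalg.norm(y)
        x = y if nrm <= R else (R / nrm) * y   # Pi_B, closed form
        lam = max(0.0, g(x)) / theta_t         # maximizer of (7) with theta_t
    return x
```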
Based on Propositions 4 and 5, it is natural to ask whether our new augmented Lagrangian (7) could be applied to the recent work in [19]. Unfortunately, we have not found a way to do so. Furthermore, since $([g_i(x_t)]_+)^2$ is also convex, we could define $\hat g_i(x) = ([g_i(x)]_+)^2$ and apply the previous algorithms [13], [10], and [19]. This results in upper bounds of $O(T^{3/4})$ for [13] and $O(T^{1-\beta/2})$ for [10], which are worse than our upper bounds of $O(T^{1/2})$ (Theorem 1) and $O(T^{1-\beta})$ (Proposition 3). Note that the algorithm in [19] cannot be applied, since the clipped constraints do not satisfy the required Slater condition.

Figure 1: Toy Example Results: Trajectories generated by the different algorithms (Clipped-OGD, A-OGD, and OGD, for β = 0.5 and β = 2/3). Note how trajectories generated by Clipped-OGD follow the desired constraints tightly. In contrast, OGD oscillates around the true constraints, and A-OGD closely follows the boundary of the outer ball.

Figure 2: Doubly-Stochastic Matrices. Fig. 2(a): Clipped Long-term Constraint Violation. Fig. 2(b): Long-term Constraint Violation. Fig. 2(c): Cumulative Regret of the Loss Function.

Figure 3: Economic Dispatch. Fig. 3(a): Power Demand Trajectory. Fig. 3(b): Constraint Violation at each time step. All of the previous algorithms incur substantial constraint violations; the right panel shows the violations of our algorithm, which are significantly smaller. Fig. 3(c): Running Average of the Objective Loss.

5 Experiments

In this section, we test the performance of the following algorithms: OGD [13], A-OGD [10], Clipped-OGD (this paper), and our proposed algorithm for the strongly convex case (Our-Strong). Throughout the experiments, our algorithm uses the fixed parameters $\alpha = 0.5$, $\sigma = \frac{(m+1)G^2}{2(1-\alpha)}$, and $\eta$ chosen as in Theorem 1. In order to better display the constraint-violation trajectories, we aggregate all the constraints into a single one using $g(x_t) = \max_{i \in \{1,\ldots,m\}} g_i(x_t)$, as done in [13].

5.1 A Toy Experiment

For illustration purposes, we solve the following 2-D toy problem with $x = [x_1, x_2]^T$:

$$\min_x \sum_{t=1}^{T} c_t^T x \quad \text{s.t.} \quad |x_1| + |x_2| - 1 \le 0 \qquad (10)$$

where the constraint is an $\ell_1$-norm constraint. The vector $c_t$ is generated as a uniform random vector over $[0, 1.2] \times [0, 1]$, rescaled to have norm 1. This leads to a slightly higher average cost on the first coordinate.
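Under the stated distribution of $c_t$, the toy problem (10) plugs directly into the Algorithm 1 sketch from Section 3.1 (`clipped_ogd` below refers to that hypothetical helper; the outer radius $R$ is an illustrative choice that contains $S$):

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 8000, 2.0  # R chosen so that B = RK contains the l1 ball S

def make_costs(T):
    """c_t uniform over [0, 1.2] x [0, 1], rescaled to unit norm."""
    C = rng.uniform([0.0, 0.0], [1.2, 1.0], size=(T, 2))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

C = make_costs(T)
grad_f_seq = [lambda x, c=c: c for c in C]  # f_t(x) = c_t^T x, gradient is c_t
g = lambda x: abs(x[0]) + abs(x[1]) - 1.0   # l1-norm constraint of (10)
grad_g = lambda x: np.sign(x)               # a subgradient of g

G, m, alpha = 1.0, 1, 0.5                   # unit-norm costs; illustrative L_g = 1
sigma = (m + 1) * G**2 / (2 * (1 - alpha))
eta = 1.0 / (G * np.sqrt((m + 1) * R * T))  # stepsize as reconstructed in Theorem 1
# traj = clipped_ogd(grad_f_seq, g, grad_g, R, eta, sigma, x0=np.zeros(2))
```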
The offline solutions for different $T$ are obtained with CVXPY [5]. All algorithms are run up to $T = 20000$ and averaged over 10 random sequences of $\{c_t\}_{t=1}^{T}$. Since the main goal here is to compare the variable trajectories generated by the different algorithms, the results for different $T$ are given in the supplementary material for space purposes. Fig. 1 shows these trajectories for one realization with $T = 8000$; the blue star is the optimal point's position. From Fig. 1 we can see that the trajectory generated by Clipped-OGD follows the boundary very tightly until reaching the optimal point. This is explained by Lemma 1, which shows that the constraint violation at each single step is upper bounded. For OGD, the trajectory oscillates widely around the boundary of the true constraint. For A-OGD, the trajectory in Fig. 1 violates the constraint most of the time, and this violation actually contributes to the lower objective regret shown in the supplementary material.

5.2 Doubly-Stochastic Matrices

We also test the algorithms on approximation by doubly-stochastic matrices, as in [10]:

$$\min_X \sum_{t=1}^{T} \frac{1}{2}\|Y_t - X\|_F^2 \quad \text{s.t.} \quad X\mathbf{1} = \mathbf{1}, \ X^T\mathbf{1} = \mathbf{1}, \ X_{ij} \ge 0 \qquad (11)$$

where $X \in \mathbb{R}^{d \times d}$ is the matrix variable, $\mathbf{1}$ is the all-ones vector, and $Y_t$ is a randomly generated permutation matrix. After converting the equality constraints into inequalities (e.g., $X\mathbf{1} = \mathbf{1}$ into $X\mathbf{1} \le \mathbf{1}$ and $X\mathbf{1} \ge \mathbf{1}$), we run the algorithms for different $T$ up to $T = 20000$ over 10 random sequences of $\{Y_t\}_{t=1}^{T}$. Since the objective function $f_t(x)$ is strongly convex with parameter $H_1 = 1$, we also include our strongly convex algorithm as another comparison. The offline optimal solutions are obtained with CVXPY [5].

The mean results for both constraint violation and objective regret are shown in Fig. 2. From these results, we can see that our strongly convex algorithm, Our-Strong, is among the best both in clipped constraint violation and in objective regret. For our most-balanced convex-case algorithm, Clipped-OGD with $\beta = 0.5$, although its clipped constraint violation is somewhat larger than A-OGD's, it also flattens out quickly, which means the algorithm quickly converges to a feasible solution.

5.3 Economic Dispatch in Power Systems

This example is adapted from [12] and [18], and considers the problem of economic power dispatch. At each time step $t$, we try to minimize the power generation cost $c_i(x_{t,i})$ of each generator $i$ while maintaining the power balance $\sum_{i=1}^{n} x_{t,i} = d_t$, where $d_t$ is the power demand at time $t$. Each generator also produces an emission level $E_i(x_{t,i})$. To bound the emissions, we impose the constraint $\sum_{i=1}^{n} E_i(x_{t,i}) \le E_{max}$. In addition to requiring this constraint to be satisfied on average, we also require bounded constraint violations at each time step. The problem is formally stated as:

$$\min \sum_{t=1}^{T} \left[ \sum_{i=1}^{n} c_i(x_{t,i}) + \xi\Big(\sum_{i=1}^{n} x_{t,i} - d_t\Big)^2 \right] \quad \text{s.t.} \quad \sum_{i=1}^{n} E_i(x_{t,i}) \le E_{max}, \quad 0 \le x_{t,i} \le x_{i,max} \qquad (12)$$

where the second constraint reflects the fact that each generator has a power generation limit. In this example, we use three generators. Following [18] and [12], we define the cost and emission functions as $c_i(x_{t,i}) = 0.5 a_i x_{t,i}^2 + b_i x_{t,i}$ and $E_i(x_{t,i}) = d_i x_{t,i}^2 + e_i x_{t,i}$, respectively. The parameters are: $a_1 = 0.2, a_2 = 0.12, a_3 = 0.14$, $b_1 = 1.5, b_2 = 1, b_3 = 0.6$, $d_1 = 0.26, d_2 = 0.38, d_3 = 0.37$, $E_{max} = 100$, $\xi = 0.5$, and $x_{1,max} = 20, x_{2,max} = 15, x_{3,max} = 18$.
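A sketch of the per-step objective and constraints of (12) with the stated parameters; note the emission coefficients $e_i$ are not recoverable from the text, so the values below are explicit placeholders:

```python
import numpy as np

# Per-step objective and emission constraint for problem (12), three generators.
a = np.array([0.2, 0.12, 0.14])
b = np.array([1.5, 1.0, 0.6])
d = np.array([0.26, 0.38, 0.37])
e = np.zeros(3)          # PLACEHOLDER: true e_i values do not appear in the text
xi, E_max = 0.5, 100.0
x_max = np.array([20.0, 15.0, 18.0])

def f_t(x, demand):
    """Generation cost plus quadratic penalty on the power imbalance."""
    return np.sum(0.5 * a * x**2 + b * x) + xi * (np.sum(x) - demand) ** 2

def g_emission(x):
    """Emission constraint in g(x) <= 0 form."""
    return np.sum(d * x**2 + e * x) - E_max

def g_box(x):
    """Generator limits 0 <= x_i <= x_i,max, aggregated via max as in Section 5."""
    return max(np.max(x - x_max), np.max(-x))

x = np.array([10.0, 8.0, 9.0])
print(f_t(x, demand=27.0), g_emission(x), g_box(x))
```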
The demand $d_t$ is adapted from real-world 5-minute-interval demand data between 04/24/2018 and 05/03/2018 (https://www.iso-ne.com/isoexpress/web/reports/load-and-demand), shown in Fig. 3(a). The offline optimal solution, or best fixed strategy in hindsight, is obtained by an implementation of SAGA [4]. The constraint violation at each time step is shown in Fig. 3(b), and the running-average objective cost is shown in Fig. 3(c). From these results, we can see that our algorithm has very small constraint violation at each time step, as required. Furthermore, our objective costs are very close to those of the best fixed strategy.

6 Conclusion

In this paper, we propose two algorithms for OCO, covering both convex and strongly convex objective functions. By applying different update strategies that utilize a modified augmented Lagrangian function, they solve OCO with a squared/clipped long-term constraint requirement. The algorithm for the general convex case provides useful bounds for both the long-term constraint violation and the constraint violation at each time step. Furthermore, the bounds for the strongly convex case improve upon previous efforts in the literature. Experiments show that our algorithms can follow the constraint boundary tightly and attain relatively small clipped long-term constraint violation with reasonably low objective regret. Future work could explore noisy versions of the constraints and aim to obtain similar upper bounds.

Acknowledgments

Thanks to Tianyi Chen for valuable discussions about the algorithm's properties.

References

[1] Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu. Online learning in online auctions. Theoretical Computer Science, 324(2-3):137–146, 2004.
[2] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[3] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7(Mar):551–585, 2006.
[4] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, pages 1646–1654, 2014.
[5] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
[6] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279. ACM, 2008.
[7] John C. Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari. Composite objective mirror descent. In COLT, pages 14–26, 2010.
[8] Maryam Fazel, Rong Ge, Sham M. Kakade, and Mehran Mesbahi. Global convergence of policy gradient methods for linearized control problems. arXiv preprint arXiv:1801.05039, 2018.
[9] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2):169–192, 2007.
[10] Rodolphe Jenatton, Jim Huang, and Cédric Archambeau. Adaptive algorithms for online convex optimization with long-term constraints. In International Conference on Machine Learning, pages 402–411, 2016.
[11] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[12] Yingying Li, Guannan Qu, and Na Li. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. arXiv preprint arXiv:1801.07780, 2018.
[13] Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: Online convex optimization with long term constraints. Journal of Machine Learning Research, 13(Sep):2503–2528, 2012.
[14] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 689–696. ACM, 2009.
[15] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[16] Yu Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
[17] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87. Springer Science & Business Media, 2013.
[18] K. Senthil and K. Manikandan. Economic thermal power dispatch with emission constraint and valve point effect loading using improved tabu search algorithm. International Journal of Computer Applications, 2010.
[19] Hao Yu, Michael Neely, and Xiaohan Wei. Online convex optimization with stochastic constraints. In Advances in Neural Information Processing Systems, pages 1427–1437, 2017.
[20] Jianjun Yuan and Andrew Lamperski. Online control basis selection by a regularized actor critic algorithm. In American Control Conference (ACC), 2017, pages 4448–4453. IEEE, 2017.
[21] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.