# Projection-Free Online Convex Optimization with Time-Varying Constraints

Dan Garber 1, Ben Kretzu 1

1 Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Ben Kretzu, Dan Garber.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

We consider the setting of online convex optimization with adversarial time-varying constraints, in which actions must be feasible w.r.t. a fixed constraint set and are also required, on average, to approximately satisfy additional time-varying constraints. Motivated by scenarios in which the fixed feasible set (the hard constraint) is difficult to project onto, we consider projection-free algorithms that access this set only through a linear optimization oracle (LOO). We present an algorithm that, on a sequence of length $T$ and using overall $T$ calls to the LOO, guarantees $O(T^{3/4})$ regret w.r.t. the losses and $O(T^{7/8})$ constraint violation (ignoring all quantities except for $T$). In particular, these bounds hold w.r.t. any interval of the sequence. This algorithm, however, also requires access to an oracle for minimizing a strongly convex nonsmooth function over a Euclidean ball. We present a more efficient algorithm that does not require the latter optimization oracle, but only first-order access to the time-varying constraints, and achieves similar bounds w.r.t. the entire sequence. We extend the latter to the setting of bandit feedback and obtain similar bounds (as a function of $T$) in expectation.

1. Introduction

We consider a particular setting of the well-studied paradigm for sequential prediction known as online convex optimization (OCO) (Hazan, 2019; Shalev-Shwartz et al., 2012). In (standard) OCO, a decision maker is required to iteratively choose an action, some point $x_t$, on each round $t$ (the total number of rounds $T$ is finite and assumed for simplicity to be known in advance), which must belong to some fixed (throughout all rounds) feasible convex set $\mathcal{K} \subseteq \mathbb{R}^n$ (for ease of presentation we consider the underlying space to be $\mathbb{R}^n$; however, any finite-dimensional Euclidean space is suitable), which we will also assume to be compact (as is often standard). After choosing $x_t$, a scalar loss $f_t(x_t)$ is incurred, where $f_t : \mathbb{R}^n \to \mathbb{R}$ is an arbitrary convex loss function (assumed, for the sake of analysis, to be chosen adversarially). The standard goal is, over the course of the $T$ rounds, to choose feasible actions $x_1, \dots, x_T$ such that the regret, given by the difference $\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)$, grows (as a function of $T$) only at a sublinear rate (the slower the better).

In the particular setting considered in this work, OCO with time-varying constraints, we assume that besides the hard and fixed constraint given by the set $\mathcal{K}$, to which all played points must belong, there are additional soft and time-varying constraints given by convex functions $g_1, \dots, g_T : \mathbb{R}^n \to \mathbb{R}$, where $g_t$ is revealed at the end of round $t$ and encodes the constraint $g_t(x) \le 0$. In this setting the standard goal is twofold: I. to guarantee sublinear regret w.r.t. the loss functions $f_1, \dots, f_T$ against the best action in hindsight in the intersection of the hard constraint and all soft constraints, i.e., to guarantee that
$$\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}:\; g_t(x) \le 0 \ \forall t \in [T]} \ \sum_{t=1}^{T} f_t(x) = o(T), \qquad (1)$$
and II. to guarantee that the cumulative violation of the soft constraints is also sublinear in $T$, i.e., that
$$\sum_{t=1}^{T} g_t^+(x_t) := \sum_{t=1}^{T} \max\{g_t(x_t), 0\} = o(T). \qquad (2)$$
Here, $g_t^+(x) := \max\{g_t(x), 0\}$ is introduced to prevent a natural undesired phenomenon in which strongly satisfying some constraints can compensate for strongly violating others.
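To make the two performance measures concrete, the following minimal sketch (Python; illustrative only, assuming access to the evaluated losses and constraints and to a given comparator point `x_star`; all names are hypothetical) evaluates the regret term from (1) against that comparator and the cumulative positive violation from (2) for a logged sequence of plays.

```python
import numpy as np

def regret_and_violation(plays, losses, constraints, x_star):
    """Evaluate the two metrics of the setting for a logged run.

    plays       : list of played points x_1, ..., x_T (numpy arrays)
    losses      : list of loss functions f_t (callables, convex)
    constraints : list of constraint functions g_t (callables, convex)
    x_star      : a fixed comparator in K with g_t(x_star) <= 0 for all t
                  (e.g., the best fixed feasible action in hindsight)
    """
    # Regret as in (1): cumulative loss of the plays minus that of the comparator.
    regret = (sum(f(x) for f, x in zip(losses, plays))
              - sum(f(x_star) for f in losses))
    # Cumulative positive violation as in (2): clipping at zero prevents
    # strongly-satisfied constraints from cancelling out violated ones.
    violation = sum(max(g(x), 0.0) for g, x in zip(constraints, plays))
    return regret, violation
```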
This setting was recently studied in (Yi et al., 2022; Guo et al., 2022; Castiglioni et al., 2022; Neely & Yu, 2017; Cao & Liu, 2018). The state-of-the-art bounds with full information (i.e., $f_t(\cdot), g_t(\cdot)$ become known to the algorithm at the end of each round $t$) for this setting are $O(\sqrt{T})$ for the regret (as given in (1)) and $O(T^{3/4})$ for the constraint violation (as given in (2)); see for instance (Guo et al., 2022). However, previous works require as a sub-routine to solve on each iteration a convex optimization problem that is at least as difficult as computing a Euclidean projection onto the feasible set $\mathcal{K}$. In high-dimensional settings, and when the set $\mathcal{K}$ admits a non-trivial structure, this highly limits the applicability of the proposed methods. Motivated by this observation, in this work, and to the best of our knowledge for the first time, we consider so-called projection-free algorithms for OCO with time-varying constraints. Concretely, motivated by the vast work on projection-free methods for OCO in recent years, e.g., (Hazan & Kale, 2012; Garber & Kretzu, 2022; Hazan & Minasyan, 2020; Garber & Kretzu, 2023; Mhammedi, 2021; 2022), we consider algorithms that access the feasible set $\mathcal{K}$ only through a linear optimization oracle (that is, an oracle that, given a linear function, can find a minimizer of it over $\mathcal{K}$), and consider online algorithms that throughout the $T$ rounds make at most $T$ calls to this oracle.

Example I: online minimum-cost capacitated flow. Consider a fixed directed acyclic graph $G$ with $n$ nodes, $m$ edges, source node $s$ and target node $e$. A decision maker (DM) must route on each round $t$ a unit flow from $s$ to $e$, i.e., choose some point $x_t$ in the corresponding unit flow polytope $\mathcal{K}$ (which is also the convex hull of the indicator vectors of all paths from node $s$ to node $e$ in the graph $G$). The DM then incurs cost $f_t(x_t)$. If each edge is associated with a linear cost, we may have $f_t(x) = f_t^\top x$ for some $f_t \in \mathbb{R}^m$. The DM also needs to respect time-varying edge capacity constraints given by $x_t(i) \le c_t(i)$, $i = 1, \dots, m$, where $c_t$ has non-negative entries. Here, the constraint function on time $t$ is simply $g_t(x) = \max_{i \in [m]}\{x(i) - c_t(i)\}$. The unit flow polytope is difficult to project onto; however, linear optimization over it corresponds to a simple weighted shortest-path computation, which takes linear time using dynamic programming.

Example II: online semidefinite optimization. Consider a sequence of $T$ semidefinite optimization problems over the bounded positive semidefinite (PSD) cone $\mathcal{K} = \{X \in \mathbb{S}^n \mid X \succeq 0, \ \mathrm{Tr}(X) \le \tau\}$, where $\mathbb{S}^n$ denotes the space of $n \times n$ real symmetric matrices and $\tau > 0$. Each instance $t$ in the sequence consists of a convex objective $f_t : \mathbb{S}^n \to \mathbb{R}$ and a set of $m_t$ linear inequalities $\mathrm{Tr}(A_{t,i} X) \le b_{t,i}$, $i \in \{1, \dots, m_t\}$. Here the fixed hard constraint is given by the convex bounded PSD cone, and the time-varying soft constraints can be encoded using $g_t(X) = \max_{i \in [m_t]}\{\mathrm{Tr}(A_{t,i} X) - b_{t,i}\}$. Projecting onto the bounded PSD cone requires a full eigen-decomposition of an $n \times n$ symmetric matrix, which in practical implementations requires $O(n^3)$ runtime. Linear optimization over this set, on the other hand, amounts to a single leading eigenvector computation, whose runtime scales only with $n^2$, and is even faster when the input matrix is sparse, using fast iterative eigenvector methods.
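To make the oracle assumption concrete, the following sketches (Python with NumPy; illustrative only, assuming the DAG's edge list is given with tail nodes in topological order in Example I and a dense symmetric cost matrix in Example II; all function and variable names are hypothetical) implement the two linear optimization oracles mentioned above.

```python
import numpy as np

def loo_unit_flow(n_nodes, edges, costs, s, e):
    """LOO for the unit flow polytope of a DAG: argmin_{x in K} <costs, x>.

    edges : list of (u, v) pairs whose tail nodes u appear in topological order
    costs : per-edge linear costs (the vector f_t of Example I)
    Returns the indicator vector of a minimum-cost s->e path, i.e., a vertex of K.
    """
    dist = np.full(n_nodes, np.inf)
    pred = [-1] * n_nodes              # index of the edge used to reach each node
    dist[s] = 0.0
    for idx, (u, v) in enumerate(edges):   # one pass suffices on a topologically ordered DAG
        if dist[u] + costs[idx] < dist[v]:
            dist[v] = dist[u] + costs[idx]
            pred[v] = idx
    x = np.zeros(len(edges))
    node = e
    while node != s:                   # backtrack from e to s along predecessor edges
        idx = pred[node]
        x[idx] = 1.0
        node = edges[idx][0]
    return x

def loo_bounded_psd(C, tau):
    """LOO for {X PSD, Tr(X) <= tau}: argmin_X <C, X>.

    If the smallest eigenvalue of C is negative, the minimizer is tau * v v^T with
    v the corresponding eigenvector; otherwise X = 0 is optimal.
    """
    eigvals, eigvecs = np.linalg.eigh(C)   # dense here for simplicity; in practice a single
    if eigvals[0] >= 0:                    # extreme eigenpair via iterative methods suffices
        return np.zeros_like(C)
    v = eigvecs[:, 0]
    return tau * np.outer(v, v)
```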
We believe this setting is well suited to capture important online scenarios with changing constraints. At one extreme, some works, such as the ones mentioned above, enforce the hard constraint $\mathcal{K}$ using costly computational procedures (projections), which can be prohibitive, as in the examples above. At the other extreme, for the sake of computational efficiency one can also model membership in $\mathcal{K}$ as a long-term soft constraint (this was the motivation of (Mahdavi et al., 2012), which pioneered the study of OCO with long-term constraints); however, this might result in a too crude approximation of real-world scenarios. For instance, in Example I above it makes sense that, even if the capacity constraints may be violated, the decision maker still must choose a feasible flow on each round, and in Example II the chosen matrix must indeed be PSD. We view our work as an appealing middle ground between these two extremes: distinguishing between hard and soft (varying) constraints, while insisting on tractable procedures for high-dimensional and complex domains.

1.1. Contributions (informally stated)

I. We give an algorithm that, using no more than $T$ calls to the LOO of $\mathcal{K}$ throughout all $T$ rounds, guarantees that on each time interval $[s, e]$, $1 \le s < e \le T$, the regret w.r.t. the set $\widehat{\mathcal{K}}_{s,e} = \{x \in \mathcal{K} \mid g_t(x) \le 0 \ \forall\, s \le t \le e\}$ is upper-bounded by $O(T^{3/4})$, and the overall non-negative constraint violation on the interval, $\sum_{t=s}^{e} g_t^+(x_t)$, is upper-bounded by $O(T^{7/8})$ (ignoring all quantities except for $T$ and $n$). Note this metric (bounds w.r.t. any interval of the sequence, also known as adaptive regret (Hazan & Seshadhri, 2009)) is substantially stronger than the ones in (1), (2), and in particular is interesting even when there is no point $x \in \mathcal{K}$ that satisfies all constraints $g_t(x) \le 0$, $t \in [T]$. Aside from a LOO for the set $\mathcal{K}$ and a first-order oracle for the observed loss functions, the algorithm also requires access to an oracle for minimizing a strongly convex nonsmooth function over a Euclidean ball (which can be efficiently implemented via standard methods for convex optimization and is independent of the set $\mathcal{K}$). See Section 3.

II. We give a more efficient algorithm than the latter, which does not require an oracle for strongly convex nonsmooth minimization over Euclidean balls, and instead only requires first-order access to the constraints $g_1, \dots, g_T$. This algorithm provides similar bounds, but only w.r.t. the entire sequence (i.e., bounds as in (1), (2)). See Section 4.

III. We extend the latter algorithm to the bandit setting in which only the scalar values $f_t(x_t), g_t(x_t)$ are observed on each round $t$, where $x_t$ is the played point. The algorithm guarantees $O(\sqrt{n}\, T^{3/4})$ expected regret and expected non-negative constraint violation $\sum_{t=1}^{T} g_t^+(x_t) = O(n^{1/4} T^{7/8})$. See Section 5.

Importantly, our regret bounds w.r.t. the loss functions $f_1, \dots$
, f T , both in the full-information and bandit settings, match up to logarithmic terms the current state-of-the-art Projection-Free Online Convex Optimization with Time-Varying Constraints bounds (in terms of the horizon T and dimension n) for projection-free OCO (i.e., without additional time-varying constraints) (Hazan & Kale, 2012; Garber & Kretzu, 2022; 2020), and are thus (up to log factors) the best one can currently expect to achieve in our more challenging setting of projection-free OCO with time-varying constraints. 2. Preliminaries 2.1. Assumptions and notation As stated, we assume that the feasible set K Rn, which encodes the hard constraints (must be satisfied on each round t), is convex and compact and we let R be such that K RB, where B denotes the origin-centered unit Euclidean ball. We also assume w.l.o.g. that 0 K. We assume all loss and constraint functions f1, g1, . . . , f T , g T are convex over RB, and we let Gf and Gg denote upper-bounds on the ℓ2 norm of the subgradients of each ft, t [T] and gt, t [T] over the ball RB, respectively. Our online algorithms will consider the T prediction rounds in disjoint blocks (batches) of size K, where K is an integer parameter. We assume throughout for simplicity and w.l.o.g. that T/K is an integer. We also use the notation m(t) := t/K to map from some t [T] to the corresponding block index m(t) [T/K]. 2.2. Fast approximately-feasible projections via linear optimization In order to construct our projection-free algorithms that rely only on solving linear optimization problems over the fixed convex set K (as opposed to projection or even more complex steps) we rely on the technique recently introduced in (Garber & Kretzu, 2022; 2023) of fast computation of approximately-feasible projections using a LOO. Definition 2.1 (Approximately-feasible Projection Oracle3). Given a convex and compact set K Rn, and a tolerance ϵ > 0, we say a function OAF P (y, ϵ, K) is an approximately-feasible projection (AFP) oracle (for the set K with parameter ϵ), if for any input point y Rn, it returns some (x, ey) K Rn such that i. for all z K, ey z y z , and ii. x ey 2 ϵ. Algorithm 1 given below is taken from (Garber & Kretzu, 2023) and implements an AFP oracle for the set K using only linear optimization steps over K. It uses as a subroutine the well known Frank-Wolfe method (Algorithm 2). If the point-to-project is far from the set, the output of Algorithm 2 is used to construct a hyperplane separating the point from K, which is then used to pull the point closer 3(Garber & Kretzu, 2023) considers arbitrary matrix-induced norms. Here we only consider the Euclidean case which is equivalent to setting A = I in their definition. to K. Otherwise, if the point is already sufficiently close to K (but not necessarily in K), Algorithm 2 outputs a point in K sufficiently close to it. Algorithm 1 Approximately-Feasible Projection Oracle via a Linear Optimization Oracle (see (Garber & Kretzu, 2023)) Data: parameters y1 Rn, x0 K, ϵ if x0 y1 2 3ϵ then return x x0, y y1 end if for i = 1, 2, . . . do xi Output of Algorithm 2 when called with tolerance ϵ, feasible point xi 1, and initial point yi if xi yi 2 > 3ϵ then yi+1 = yi 2 3 (yi xi) else return x xi, y yi end if end for Algorithm 2 Separating Hyperplane via Frank-Wolfe Data: parameters x1 K, y Rn, ϵ for i = 1, 2, . . . 
do vi argminx K{(xi y) x} {call to LOO of K} if (xi y) (xi vi) ϵ or xi y 2 3ϵ then return ex xi end if σi = argminσ [0,1]{ y xi σ(vi xi)) 2} xi+1 = xi + σi(vi xi) end for The following lemma is from (Garber & Kretzu, 2023). Lemma 2.2. Let K Rn be convex and compact such that 0 K RB for some R > 0. Algorithm 1 guarantees that it returns (x, y) K Rn such that the following three conditions hold: I. z K : y z 2 y1 z 2, II. x y 2 3ϵ, and III. y y1 . The overall number of calls to the LOO of K (in Algorithm 2) throughout the run of Algorithm 1 is upper-bounded by ϵ max n 2.25 log y1 x0 2 ϵ + 1, 0 o . 3. Drift-plus-Penalty-inspired Algorithm with Adaptive Regret Guarantees Our first online algorithm, Algorithm 3, is inspired by the drift plus penalty approach used extensively in the literature on OCO with constraints, see for instance (Guo et al., 2022; Yi et al., 2021; 2022; Yu & Neely, 2020). The algorithm considers the T rounds in disjoint blocks of fixed size K, where K is a parameter of the algorithm. At the end of each block m [T/K], the algorithm performs its two key steps: The first step is to find a minimizer of a drift-pluspenalty-style objective function (the function hm defined in Projection-Free Online Convex Optimization with Time-Varying Constraints the algorithm), involving the gradients (i.e., linearizations) of the loss functions and the constraints functions (without linearization) observed during the K iterations of the block, over a Euclidean ball enclosing the set K 4. This amounts to minimizing a strongly convex nonsmooth function over a Euclidean ball (which in particular does not involve the set K and does not require calls to the LOO of K), and can be efficiently implemented via a variety of standard and efficient convex optimization algorithms. The second step is to feed to computed minimizer into the approximatelyfeasible projection oracle Algorithm 1, to obtain the next state of the algorithm. Algorithm 3 LOO-based Drift-plus-Penalty Method Data: parameters T, K, ϵ, δ, α x1 = ey1 arbitrary point in K for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . , m K do Play xm and observe ft( ), gt( ) Set t ft(eym) end for Denote Tm = {(m 1)K + 1, . . . , m K} K P t=Tm t G+ m(x) δ P t Tm g+ t (x) ym+1 argminx RB{hm(x) := m (x eym) + G+ m(x) + α 2 x eym 2} (xm+1, eym+1) OAF P (ym+1, xm, ϵ, K) {Alg. 1} end for Theorem 3.1. Consider running Algorithm 3 with parameters δ = K = T 1 2 , α = Gf R T 1 4 , ϵ = 61R2T 1 2 log T. Fix an interval [s, e], 1 s < e T. If b Ks,e := {x K : gt(x) 0, t [s, e]} = then the regret on the interval is upper-bounded by t=s ft(xm(t)) min x b Ks,e t=s ft(x) = O RGf T 3 4 p and the constraints violation on the interval is upper bounded by Pe t=s g+ t (xm(t)) = O RGg T 7 8 . The overall number of calls to the LOO of K is upper-bounded by T. Note Theorem 3.1 in particular provides meaningful bounds in the important scenario in which no single point in K satsfies all the soft constraints given by g1, . . . , g T . Relying only on a LOO (with an overall budget of T calls) comes with a price: the bounds in Theorem 3.1 could be 4We note that our drift-plus-penalty-style objectives are simpler versions of those used in previous works such as (Guo et al., 2022; Yi et al., 2021; 2022; Yu & Neely, 2020) since the penalty multiplying the constraint function G+ m is fixed throughout the run and not adaptive. compared to the O( T) regret and O(T 3/4) constraint violation achievable by methods with unrestricted optimizations, see for instance (Guo et al., 2022). 
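Before discussing this gap in rates further, a brief implementation note: the per-block step of Algorithm 3 that computes $y_{m+1} \in \arg\min_{x \in R\mathcal{B}} h_m(x)$ is a strongly convex nonsmooth problem over a Euclidean ball, and one standard way to realize such an oracle is projected subgradient descent. The sketch below (Python; illustrative only, with hypothetical names and an untuned step-size schedule, not the authors' prescribed subroutine) shows this option.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto the origin-centered ball of radius R."""
    norm = np.linalg.norm(x)
    return x if norm <= R else (R / norm) * x

def minimize_h_over_ball(grad_sum, y_tilde, block_constraints, delta, alpha, R, iters=1000):
    """Approximately minimize
        h(x) = <grad_sum, x - y_tilde> + delta * sum_t max(g_t(x), 0)
               + (alpha / 2) * ||x - y_tilde||^2
    over the ball R*B via projected subgradient descent, using the standard
    ~2/(alpha*k) step sizes for alpha-strongly-convex nonsmooth objectives.

    block_constraints : list of pairs (g_t, subgrad_g_t), one per round of the block.
    """
    x = y_tilde.copy()
    for k in range(1, iters + 1):
        sub = grad_sum + alpha * (x - y_tilde)
        for g, sg in block_constraints:
            if g(x) > 0:            # a subgradient of max(g_t(.), 0) at x is sg(x) when g_t(x) > 0, else 0
                sub = sub + delta * sg(x)
        x = project_ball(x - (2.0 / (alpha * (k + 1))) * sub, R)
    return x
```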
This deterioration in rates is expected and is a known phenomena in projection-free methods for OCO (without soft constraints), e.g., (Hazan & Kale, 2012; Garber & Kretzu, 2022). For the proof of the theorem we need the following lemma. Lemma 3.2. Consider the (infeasible) sequence {eym(t)}t [T ] produced by Algorithm 3. Fix an interval [s, e], 1 s < e T. If b Ks,e = then the regret of {eym(t)}t [T ] is upper-bounded by: t=s ft(eym(t)) min x b Ks,e t=s ft(x) 2KR(2Gf +Rα)+ G2 f 2α T, and for every c > 0, the constraints violation of the sequence on the interval is upper-bounded by: t=s g+ t (eym(t)) G2 g 2c T + 2c R Gf T α + 4R Gf T 2δK + 4KRGg. Proof. Fix some m [T/K]. Note hm( ) is α-strongly convex. Since ym+1 RB minimizes hm(x) over RB, if follows that for any x RB, hm(x) hm(ym+1) + α 2 x ym+1 2. Thus, for every x K RB, m (x eym)+G+ m(x)+ α x eym 2 x ym+1 2 m (ym+1 eym) + G+ m(ym+1) + α 2 ym+1 eym 2 . Since eym+1 is the output of OAF P (ym+1, ϵ, K), it follows that x eym+1 x ym+1 (Lemma 2.2) and so, m (x eym)+G+ m(x)+ α x eym 2 x eym+1 2 m (ym+1 eym) + G+ m(ym+1) + α 2 ym+1 eym 2 . m (ym+1 eym) + α 2 ym+1 eym 2 2 (ym+1 eym) Combining the last two inequalities, we have for any x K, m (x eym) + G+ m(x) + α x eym 2 x eym+1 2 Projection-Free Online Convex Optimization with Time-Varying Constraints which by rearranging gives that for every x K, m (eym x) α x eym 2 x eym+1 2 2α + G+ m(x) G+ m(ym+1). (4) Let us now fix some interval [s, e], 1 s e T. Let ms and me denotes the smallest block index and the largest block index that are fully contained in the interval [s, e], respectively. Note that m Gf for every m. Recall G+ m( ) 0, and G+ m(x) = 0 for every x b Ks,e and m [ms, me]. Summing Eq.(4) over m from ms to me, we have that for every x b Ks,e it holds that, t Tm t (eym x) K α 2 x eyms 2 + G2 f 2α T. (5) Since ft( ) is convex, for every x b Ks,e it holds that t=s ft(eym(t)) ft(x) = t=s ft(eyms 1) ft(x) t Tm ft(eym) ft(x) + t=me K+1 ft(eyme+1) ft(x) t=s t eyms 1 x + t Tm t (eym x) t=me K+1 t eyme+1 x . From Lemma 2.2 it follows that eym RB for every m. plugging this, Eq.(5), and the fact that t [T] t Gf into the last inequality, we have that for every x b Ks,e, t=s ft(eym(t)) ft(x) 4KGf R + 2KR2α + G2 f 2α T. Now, we move on to bound the constraint violation on the interval [s, e]. We start with some preliminaries. Rearranging Eq.(3), we have for every x K and m [T/K] that, ym+1 eym 2 2 α m (x ym+1) α G+ m(x) G+ m(ym+1) + x eym 2 x eym+1 2 αG+ m(x)+ x eym 2 x eym+1 2 , where in the last inequality we have used the facts that x K RB and ym+1 RB, and that G+ m( ) 0 . Summing the above inequality over m from ms to me, and recalling that eym RB and m Gf for all m, we have that for every x K it holds that, m=ms ym+1 eym 2 4Gf RT m=ms G+ m(x)+4R2. (6) Rearranging Eq.(4), and summing it over m from ms to me, we have that for every x K it holds that, m=ms G+ m(ym+1) x eym 2 x eym+1 2 2α +G+ m(x)+ m (x eym) 2αR2 + G2 f T 2αK + m=ms G+ m(x) + 2Gf RT Recalling G+ m(x) = δ P t Tm g+ t (x), we have x K, t Tm g+ t (ym(t)+1) 2αR2 δ + G2 f T 2αδK m=ms G+ m(x) + 2Gf RT Since gt( ) is convex and Gg-Lipschitz over RB for every t, g+ t (x) = max{0, gt(x)} is also convex and Gg-Lipschitz over RB. Using the inequality 2ab a2 + b2 for every a, b R, we have that for every t [T] and c > 0, g+ t (eym(t)) g+ t (ym(t)+1) Gg eym(t) ym(t)+1 G2 g 2c + c eym(t) ym(t)+1 2 . 
Summing the last inequality over t, and combining it with Eq.(6), and Eq.(7), we have that for every x b Ks,e K, t Tm g+ t (eym(t)) G2 g 2c T + c K m=ms eym ym+1 2 t Tm g+ t (ym(t)+1) G2 g 2c T + 2c Gf RT α + 2c R2K + c K m=ms G+ m(x) δ + G2 f T 2αδK + 2Gf RT = G2 g T 2c +2c Gf RT α +2c R2K+2αR2 δ + G2 f T 2αδK +2Gf RT where that last equality holds since G+ m(x) = 0 for every x b Ks,e and m [ms, me]. Projection-Free Online Convex Optimization with Time-Varying Constraints The proof is concluded by noting that the number of first and last iterates in the interval [s, e] that are not contained in the blocks ms, . . . me, i.e., the set [s, e] \ Sme m=ms Tm, is at most 2K. These add an additional term which is at most 2K 2RGg to the RHS of the last inequality by using the fact that for each t [s, e] \ Sme m=ms Tm and x b Ks,e, g+ t (eym(t)) g+ t (x) + eym(t) x Gg 2RGg. Proof of Theorem 3.1. Fix some interval [s, e], 1 s e T such that b Ks,e = . Applying Lemma 3.2 and Lemma 2.2, we have that for every x b Ks,e it holds that, t=s ft(xm(t)) ft(x) = t=s ft(xm(t)) ft(eym(t)) + ft(eym(t)) ft(x) 4KGf R + 2KR2α + G2 f 2α T + t=s t xm(t) eym(t) 4KGf R + 2KR2α + G2 f 2α T + TGf where the last inequality also uses the facts that eym RB and that t Gf for every t [T]. plugging-in the values of K, α, ϵ stated in the theorem, we obtain the required bound on the regret. Now we move on to bound the constraints violation. Since for every t [T], gt( ) is convex and Gg-Lipschitz over RB, it follows that g+ t (x) is also convex and Gg-Lipschitz over RB. Using Lemma 2.2 we have that for every t [T], g+ t (xm(t)) g+ t ( ym(t)) Gg xm(t) ym(t) Gg Summing over t [s, e], and using Lemma 3.2, it holds that t=s g+ t (xm(t)) G2 g 2c T + 2c R Gf T α + RK + 2αR2 α + 4R Gf T 2δK + 4KRGg + Gg Plugging-in c = Gg 12RT 1 8 and the values of K, α, ϵ, δ listed in the theorem, the overall constraints violation bound stated in the theorem is obtained. Finally, we move on to bound the overall number of calls to the linear optimization oracle. Using Lemma 2.2, the overall number of calls to linear optimization oracle is 2.25 log ym+1 xm 2 2.25 log 4R2 + 2.25 log(e 4 9 ) 2.25 log 7R2 Plugging-in the values of ϵ, K listed in the theorem, we obtain that Ncalls T, as required. 4. Lagrangian-based Primal-Dual Algorithm Our second algorithm, Algorithm 4, is based on an online first-order primal-dual approach which has also been studied extensively, see for instance (Mahdavi et al., 2012; Cao & Liu, 2018; Jenatton et al., 2016; Sun et al., 2017). This algorithm is more efficient than Algorithm 3 since it does not require any exact minimization of an auxiliary optimization problem involving the constraint functions gt, t [T], and instead only requires first-order access (i.e., to compute subgradients) of ft, gt, t [T]. On the downside, we will only have standard (i.e., non-adaptive) guarantees for this method that hold only w.r.t. the entire sequence of functions. The algorithm performs online primal-dual (sub)gradient updates w.r.t. the dual-regularized Lagrangian functions Lt(x, λ) := ft(x) + λg+ t (x) δη and their block aggregations Lm(x, λ) := Pm K t=(m 1)K+1 Lt(x, λ), m [T/K]. Note that the vectors m,x, m,λ in Algorithm 4 are indeed the derivatives of Lm(x, λ) w.r.t. x and λ, evaluated at (xm, λm). Algorithm 4 LOO-based Primal-Dual Method data: parameters T, K, ϵ, δ, η x1 = y1 arbitrary point in K, λ1 0 for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . 
, m K do Play xm and observe ft( ), gt( ) Set ft ft(xm) and g+ t g+ t (xm) end for Denote Tm = {(m 1)K + 1, . . . , m K} m,x P t Tm ft + λm g+ t m,λ P t Tm g+ t (xm) δηλm ym+1 ΠRB[ ym η m,x] (xm+1, ym+1) OAF P (ym+1, xm, ϵ, K) {Alg. 1} λm+1 ΠR+ (λm + η m,λ) end for Projection-Free Online Convex Optimization with Time-Varying Constraints Theorem 4.1. Suppose b K = {x K : gt(x) 0 t [T]} = . For every T sufficiently large, setting δ = 32 G2 g + Gg R T 1 2 log T, η = T 3 4 , ϵ = 61R2T 1 2 log T, K = T 1 2 in Algorithm 4 guarantees that the regret is upper bounded by t=1 ft(xm(t)) min x b K t=1 ft(x) = O R (Gf + Gg) T 3 4 p log T + (G2 f + (G2 g + 1)R2)T 3/4 , the constraints violation is upper bounded by t=1 g+ t (xm(t))=O Gf R 1+Gg (Gg + R) p log T T 7 8 and the overall number of calls to the LOO is at most T. The proof of the theorem builds on the following lemma. Lemma 4.2. Algorithm 4 guarantees that for every x b K and λ R+, t=1 Lt xm(t), λ Lt x, λm(t) 3ϵT + G2 fηKT + 4G2 g R2ηKT m=1 λm + δ2η3K2 + G2 gηK2 T/K X Proof. Using Lemma 2.2 and the facts that ym+1 = ΠRB ( ym η m,x) and K RB, for every x K and m [T/K] we have that, ym+1 x 2 ym+1 x 2 ym η m,x x 2 = ym x 2 2η m,x ( ym x) + η2 m,x 2 , and rearranging, we have that m,x ( ym x) 1 ym x 2 ym+1 x 2 2 2 m,x . (8) Since λm+1 = ΠR+ (λm + η m,λ), for every λ R+ it holds that, (λm+1 λ)2 (λm + η m,λ λ)2 = (λm λ)2 + 2η m,λ (λm λ) + η2 2 m,λ, and rearranging, we have that m,λ (λ λm) 1 (λm λ)2 (λm+1 λ)2 2 2 m,λ. (9) Since Lt ( , λ) is convex, it holds that Lt (xm, λm) Lt (x, λm) x Lt (xm, λm) (xm x) , and since Lt (xm, ) is concave, it holds that Lt (xm, λ) Lt (xm, λm) λLt (xm, λm) (λ λm) . Combining the last two inequalities we have that, Lt (xm, λ) Lt (x, λm) x Lt (xm, λm) (xm x) + λLt (xm, λm) (λ λm) . Summing the above inequality over t Tm and recalling that m,x = P t Tm x Lt (xm, λm) and m,λ = P t Tm λLt (xm, λm), we have that t Tm Lt (xm, λ) Lt (x, λm) m,x (xm x) + m,λ (λ λm) = m,x (xm ym) + m,x ( ym x) + m,λ (λ λm) . Combining the last inequality with Eq. (8) and (9), we have that for every x K, λ R+ and m, it holds that X t Tm Lt (xm, λ) Lt (x, λm) m,x (xm ym) + 1 ym x 2 ym+1 x 2 2 m,x 2 + 1 (λm λ)2 (λm+1 λ)2 + η Summing the above over m, we have for every x K and λ R+ that t Tm Lt (xm, λ) Lt (x, λm) m=1 m,x xm ym + η m,x 2 + 2 m,λ + y1 x 2 + (λ1 λ)2 Using Lemma 2.2, it holds that xm ym 2 3ϵ for every m. Since y1 K and λ1 = 0, we have for every x K and λ R+ that t Tm Lt (xm, λ) Lt (x, λm) m,x 2 + 2 m,λ . (10) Projection-Free Online Convex Optimization with Time-Varying Constraints Since for every m t Tm ft + λm g+ t K(Gf + λm Gg), and m,λ = P t Tm(g+ t (xm) δηλm), we obtain t=1 Lt xm(t), λ Lt x, λm(t) m=1 λm + Gf 3ϵT + 4R2 + λ2 m=1 (Gf +λm Gg)2 + η t Tm (g+ t (xm) δηλm) 3ϵT + G2 fηKT t Tm g+ t (xm) + δ2η3K2 + G2 gηK2 T/K X m=1 λ2 m, (11) where in the last inequality we have used the inequality (a + b)2 2a2 + 2b2 for every a, b R. Note that g+ t (x) = 0 for every x b K. Since gt(x) is convex and Gg-Lipschitz, g+ t (x) = max{0, gt(x)} is also convex and Gg-Lipschitz and thus, for every x b K it holds that g+ t (xm) g+ t (x) + g+ t (xm x) 2Gg R. Plugging this into the RHS of (11) yields the lemma. Proof of Theorem 4.1. 
From Lemma 4.2, for every x b K and λ R+ it holds that, ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) m=1 λ2 m δη 3ϵT + G2 f + 4G2 g R2 ηKT m=1 λm + δ2η2 + G2 g ηK2 T/K X Rearranging, we have that ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) 3ϵT + G2 f + 4G2 g R2 ηKT + Gg + δ2η3K2 + G2 gηK2 δηK g+ t (x) = 0 for every x b K and t [T]. Also, by the choice of parameters in the theorem, we have that for any T larger than some constant, δ2η3K2+ηK2G2 g+Gg 2 . Thus, for every x b K and λ R+ we have that, ft(xm(t)) ft(x) + λ t=1 g+ t (xm(t)) + G2 f + 4G2 g R2 ηKT + Gg Since z z2 1 4 for every z R, it holds that ft(xm(t)) ft(x) + λ t=1 g+ t (xm(t)) η + G2 f + 4G2 g R2 ηKT + Let x argminx b K PT t=1 ft(x). Note that λ = PT t=1 g+ t (xm(t)) δηT +η 1 maximizes the term λ PT t=1 g+ t (xm(t)) δη 2 T + 1 2η λ2 in the last inequality. Plugging-in x and λ into the last inequality, we have that ft(xm(t)) ft(x ) + 1 PT t=1 g+ t (xm(t)) 2 η + G2 f + 4G2 g R2 ηKT + This yields the regret bound t=1 ft(xm(t)) ft(x ) η + G2 f + 4G2 g R2 ηKT + 3ϵ (Gf + Gg/4) T. Projection-Free Online Convex Optimization with Time-Varying Constraints Plugging-in the choice of parameters listed in the theorem yields the desired regret bound. Since ft(x ) ft(xm(t)) 2RGf for all t [T], from (12) we also obtain the constraints violation bound t=1 g+ t (xm(t)) p δηT + η 1 4R2 η +2 G2 f+4G2 g R2 ηKT +4RGf T + 2 3ϵ (Gf + Gg/4) T 1/2 . Plugging-in our choice of parameters, yields the bound. Finally, the bound on the overall number of calls to the LOO of K follows from exactly the same argument as in the proof of Theorem 3.1. In particular we use the same values of K, ϵ which go into the bound. 5. Bandit Feedback Setting In the bandit setting, after each round t [T], the algorithm does not get to observe the functions ft, gt (or their subgradients), but only their value at the point played on round t. Building on the standard approach pioneered in (Flaxman et al., 2005) (see also (Hazan, 2019; Garber & Kretzu, 2020; 2022), we extend Algorithm 4 to this setting by replacing the functions ft, g+ t with their µ-smoothed versions: bft(x) = Eu U(Sn)[ft(x + µu)], bg+ t (x) = Eu U(Sn)[g+ t (x + µu)], where U(Sn) denotes the uniform distribution over the origin-centered unit sphere. By sampling uniformly around the point to play, unbiased estimators for the gradients of bft, bg+ t 5 could be constructed using only the values of the functions at the sampled point. We make the additional standard assumptions: I. {ft, gt}t [T ] are chosen obliviously, i.e., before the first round and are thus independent of any randomness introduced by the algorithm. II. there is r > 0 such that r B K. For a fixed smoothing parameter µ (0, r], we denote a shrinking of the set K by Kµ/r := (1 µ/r)K = {(1 µ/r)x | x K}. It is well known that for all x Kµ/r and y r B it holds that x + y K. The proof of the following theorem is given in the appendix. Theorem 5.1. Suppose b K = {x K : gt(x) 0 t [T]} = . For any T sufficiently large, setting 183 max{Gf, Gg, Mf, Mg}2 r , n r, n, n r R}T 1 2 p η = R max{Gf ,Gg,Mf ,Mg}T 3 4 , ϵ = 61R2T 1 2 log T , K = T 1 2 , µ = nr T 1/4 in Algorithm 5, guarantees that the 5 bft, bg+ t are always differentiable, even if ft, g+ t are not Algorithm 5 LOO-based Bandit Primal-Dual Method Data: parameters T, K, ϵ, δ, η x1 = y1 arbitrary point in Kµ/r, λ1 0 for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . , m K do zt xm + µut where ut U(Sn) {ut sampled from uniform dist. 
over the unit sphere} Play zt and observe ft(zt), gt(zt) t,x n µ ft(zt) + λmg+ t (zt) ut t,λ g+ t (zt) δη end for Denote Tm = {(m 1)K + 1, . . . , m K} m,x P t Tm t,x and m,λ P t Tm t,λ ym+1 ΠRB ( ym η m,x) (xm+1, ym+1) OAF P ym+1, xm, ϵ, Kµ/r {Apply Algorithm 1 w.r.t. the set Kµ/r} λm+1 ΠR+ (λm + η m,λ) end for expected regret is upper bounded by t=1 ft(x) = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p log T = O n T 3/4p the expected constraints violation is upper bounded by t=1 g+ t (zt) RGf max{Gf, Gg, Mf, Mg} r , R r, R n, 1 r, 1 n1/4T 7/8log(T)1/4 =O n1/4T 7/8log(T)1/4 , and the overall number of calls to the LOO is upper bounded by T. Acknowledgements Dan Garber is supported by the ISRAEL SCIENCE FOUNDATION (grant No. 2267/22). Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are potential societal conse- Projection-Free Online Convex Optimization with Time-Varying Constraints quences of our work, none which we feel must be specifically highlighted here. Cao, X. and Liu, K. R. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on automatic control, 64(7):2665 2680, 2018. Castiglioni, M., Celli, A., Marchesi, A., Romano, G., and Gatti, N. A unifying framework for online optimization with long-term constraints. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 33589 33602. Curran Associates, Inc., 2022. Flaxman, A. D., Kalai, A. T., and Mc Mahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 385 394, 2005. Garber, D. and Kretzu, B. Improved regret bounds for projection-free bandit convex optimization. In International Conference on Artificial Intelligence and Statistics, pp. 2196 2206. PMLR, 2020. Garber, D. and Kretzu, B. New projection-free algorithms for online convex optimization with adaptive regret guarantees. In Loh, P.-L. and Raginsky, M. (eds.), Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp. 2326 2359. PMLR, 02 05 Jul 2022. Garber, D. and Kretzu, B. Projection-free online expconcave optimization. In Neu, G. and Rosasco, L. (eds.), Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pp. 1259 1284. PMLR, 12 15 Jul 2023. Guo, H., Liu, X., Wei, H., and Ying, L. Online convex optimization with hard constraints: Towards the best of two worlds and beyond. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 36426 36439. Curran Associates, Inc., 2022. Hazan, E. Introduction to online convex optimization. ar Xiv preprint ar Xiv:1909.05207, 2019. Hazan, E. and Minasyan, E. Faster projection-free online learning. In Conference on Learning Theory, pp. 1877 1893. PMLR, 2020. Hazan, E. and Seshadhri, C. Efficient learning algorithms for changing environments. In Proceedings of the 26th annual international conference on machine learning, pp. 393 400, 2009. Hazan, E. E. and Kale, S. Projection-free online learning. In 29th International Conference on Machine Learning, ICML 2012, pp. 521 528, 2012. Jenatton, R., Huang, J., and Archambeau, C. Adaptive algorithms for online convex optimization with long-term constraints. 
In Balcan, M. F. and Weinberger, K. Q. (eds.), Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp. 402 411, New York, New York, USA, 20 22 Jun 2016. PMLR. Mahdavi, M., Jin, R., and Yang, T. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research, 13(1):2503 2528, 2012. Mhammedi, Z. Efficient projection-free online convex optimization with membership oracle. ar Xiv preprint ar Xiv:2111.05818, 2021. Mhammedi, Z. Exploiting the curvature of feasible sets for faster projection-free online learning, 2022. URL https://arxiv.org/abs/2205.11470. Neely, M. J. and Yu, H. Online convex optimization with time-varying constraints. ar Xiv preprint ar Xiv:1702.04783, 2017. Shalev-Shwartz, S. et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107 194, 2012. Sun, W., Dey, D., and Kapoor, A. Safety-aware algorithms for adversarial contextual bandit. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3280 3288. PMLR, 06 11 Aug 2017. URL https://proceedings.mlr. press/v70/sun17a.html. Yi, X., Li, X., Yang, T., Xie, L., Chai, T., and Johansson, K. Regret and cumulative constraint violation analysis for online convex optimization with long term constraints. In International Conference on Machine Learning, pp. 11998 12008. PMLR, 2021. Yi, X., Li, X., Yang, T., Xie, L., Chai, T., and Karl, H. Regret and cumulative constraint violation analysis for distributed online constrained convex optimization. IEEE Transactions on Automatic Control, 2022. Yu, H. and Neely, M. J. A low complexity algorithm with O( T) regret and O(1) constraint violations for online Projection-Free Online Convex Optimization with Time-Varying Constraints convex optimization with long term constraints. Journal of Machine Learning Research, 21(1):1 24, 2020. URL http://jmlr.org/papers/v21/16-494. html. Projection-Free Online Convex Optimization with Time-Varying Constraints A. Proof of Theorem 5.1 Before we prove the theorem we shall require some preliminary results. Throughout this section we use the following notation: Lt(x, λ) := ft(x) + λg+ t (x) δη 2 λ2 t [T], b Lt(x, λ) := Eu U(Sn) ft(x + µu) + λg+ t (x + µu) δη 2 λ2 t [T], Lm(x, λ) := X t Tm Lt(x, λ) and b Lm(x, λ) := X b Lt(x, λ) m [T/K], where we recall that Tm = {(m 1)K + 1, . . . , m K}. In the following Lemma A.1 and Lemma A.2, the notation m,x is as defined in Algorithm 5. Lemma A.1. For every m [T/K] the following holds: x Kµ/r, λ R+ : 0 b Lm(x, λ) Lm(x, λ) µK(Gf + λGg), x b Lm(xm, λm) = E{ut}t Tm [ m,x] and λ b Lm(xm, λm) = X t Tm Eu U(Sn) g+ t (x + µu) δηKλ. Proof. The second part of the lemma is a well known result, see for instance (Flaxman et al., 2005) and (Hazan, 2019) (Lemma 6.7). As for the first part, it is also well known that x Kµ/r, λ R+ : | b Lm(x, λ) Lm(x, λ)| µK(Gf + λGg), see (Flaxman et al., 2005) and (Hazan, 2019) (Lemma 2.8). One side of this bound could be improved, as suggested by the lemma, by noticing that for any convex function h : K R and any x Kµ/r we have that, h(x) Eu U(Sn)[h(x + µu)] = Eu U(Sn)[h(x) h(x + µu)] Eu U(Sn)[ µu gx] = µEu U(Sn)[u] gx = 0, where gx is some subgradient in h(x). Lemma A.2 (Lemma 5 in (Garber & Kretzu, 2020)). For every m [T/K], E[ m,x ]2 E[ m,x 2] K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2. 
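For intuition about the quantities bounded in Lemma A.2, recall that Algorithm 5 aggregates one-point spherical-sampling gradient estimates in the spirit of (Flaxman et al., 2005). The sketch below (Python; illustrative only, all names hypothetical) shows how a single round's estimates are formed from bandit feedback.

```python
import numpy as np

def one_point_estimates(f_value, gplus_value, x, lam, mu, delta, eta, rng):
    """One round of the bandit estimates that are summed into grad_{m,x} and grad_{m,lambda}.

    f_value, gplus_value : return only the scalar values f_t(z), g_t^+(z) (bandit feedback)
    x   : current iterate x_m;   lam : current dual variable lambda_m
    mu  : smoothing radius;      delta, eta : algorithm parameters
    """
    n = x.shape[0]
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                      # u drawn uniformly from the unit sphere
    z = x + mu * u                              # the point actually played
    fz, gz = f_value(z), gplus_value(z)
    grad_x_est = (n / mu) * (fz + lam * gz) * u  # one-point estimate of the smoothed Lagrangian's x-gradient
    grad_lam_est = gz - delta * eta * lam        # derivative of the regularized Lagrangian w.r.t. lambda
    return z, grad_x_est, grad_lam_est
```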
The following lemma is analogous to Lemma 4.2 in the full-information setting. Lemma A.3. Algorithm 5 guarantees that for every x Kµ/r and λ R+, t=1 Lt(xm(t), λ) Lt(x, λm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] Kn Mf µ + KGf( + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Projection-Free Online Convex Optimization with Time-Varying Constraints Proof. Fix some block index m in Algorithm 5. Using Lemma 2.2 and the facts that ym+1 = ΠRB ( ym η m,x) and Kµ/r K RB, we have that for every x Kµ/r it holds that ym+1 x 2 ym+1 x 2 ym η m,x x 2 = ym x 2 2η m,x ( ym x) + η2 m,x 2 , and rearranging, we have that m,x ( ym x) 1 ym x 2 ym+1 x 2 + η 2 m,x 2 . (13) Since λm+1 = ΠR+ (λm + η m,λ), for every λ R+ it holds that (λm+1 λ)2 (λm + η m,λ λ)2 = (λm λ)2 + 2η m,λ (λm λ) + η2 2 m,λ, and rearranging, we have that m,λ (λ λm) 1 (λm λ)2 (λm+1 λ)2 + η 2 2 m,λ. (14) Let us now fix some t [T]. Since b Lt ( , λ) is convex, for all x Kµ/r it holds that, b Lt xm(t), λm(t) b Lt x, λm(t) x b Lt xm(t), λm(t) xm(t) x , and since b Lt xm(t), is concave, for all λ R+ it holds that, b Lt xm(t), λ b Lt xm(t), λm(t) λ b Lt xm(t), λm(t) λ λm(t) . Combining the last two inequalities we have that for any x Kµ/r and λ R+ it holds that, b Lt xm(t), λ b Lt x, λm(t) x b Lt xm(t), λm(t) xm(t) x + λ b Lt xm(t), λm(t) (λ λm) . Let us introduce the notation Em[ ] = E{ut}t Tm [ ], where we recall that Tm = {(m 1)K + 1, . . . , m K}. That is, Em[ ] denotes the expectation w.r.t. all randomness introduces by the random vectors {ut}t Tm on some block m during the run of the algorithm. Let us also recall that according to Lemma A.1, Em[ m,x] = x b Lm(xm, λm) = P t Tm x b Lt(xm, λm) and Em[ m,λ] = λ b Lm(xm, λm) = P t Tm λ b Lt(xm, λm). Summing the above inequality over t Tm, we have that for any x Kµ/r and λ R+ it holds that, X b Lt (xm, λ) b Lt (x, λm) Em[ m,x] (xm x) + Em[ m,λ] (λ λm) = Em[ m,x] (xm ym) + Em[ m,x] ( ym x) + Em[ m,λ] (λ λm) . Combining the last inequality with Eq. (13) and (14), and noting that xm, ym, λm are independent of the random vectors {ut}t Tm, we have that for every x Kµ/r and λ R+ it holds that, X b Lt (xm, λ) b Lt (x, λm) Em[ m,x (xm ym)] 2η Em h ym x 2 ym+1 x 2i + η 2Em[ m,x 2] 2η Em h (λm λ)2 (λm+1 λ)2i + η 2Em[ 2 m,λ]. Summing the above over m {1, . . . , T/K} and taking expectation w.r.t. all random variables u1, . . . , u T , we have for every x Kµ/r and λ R+ that, b Lt (xm, λ) b Lt (x, λm) m=1 E [ m,x xm ym ] E[ m,x 2] + E[ 2 m,λ] + y1 x 2 + (λ1 λ)2 Projection-Free Online Convex Optimization with Time-Varying Constraints Using Lemma 2.2 we have that xm ym 2 3ϵ for every m. Also, since y1 Kµ/r RB and λ1 = 0, for every x Kµ/r and λ R+ we have that, t=1 b Lt xm(t), λ b Lt x, λm(t) # m=1 E[ m,x ] + 4R2 + λ2 E[ m,x 2] + E[ 2 m,λ] . (15) Using Lemma A.2 we have that for every m [T/K], Em[ m,x ]2 Em[ m,x 2] K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2. Recalling also that m,λ = P t Tm g+ t (zt) δηλm , we obtain t=1 b Lt xm(t), λ b Lt x, λm(t) # K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 # t Tm (g+ t (zt) δηλm) K n(Mf + λm Mg) µ + K(Gf + λm Gg) K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 # t Tm g+ t (zt) + K2δ2η2λ2 m where in the last inequality we have used the facts that (a + b)2 2a2 + 2b2 for any a, b R, and a2 + b2 a + b for any a, b R+. Rearranging the RHS in the above using the fact that g+ t (zt) Mg for all t [T], and also using again the inequaity (a + b)2 2a2 + 2b2 gives, RHS of (16) m=1 E[λm] + ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! 
T/K X m=1 E[λ2 m] 3ϵKGf + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Projection-Free Online Convex Optimization with Time-Varying Constraints Finally, using Lemma A.1, for any x Kµ/r and λ R+ we have that, t=1 Lt(xm(t), λ) Lt(x, λm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] Kn Mf µ + KGf( + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Proof of Theorem 5.1. From Lemma 4.2 we have for every x Kµ/r and λ R+ that, ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) + δηK m=1 λ2 m δη Kn Mg µ + KGg( m=1 E[λm] + ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] 3ϵKGf + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) + µKGf Rearranging, we have that for every x Kµ/r and λ R+ it holds that, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 δηK m=1 E[λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf t=1 g+ t (x)E[λm(t)]. Fix some x b K such that x arg minw b K PT t=1 ft(w) and from now on fix x = (1 µ/r)x Kµ/r. Note that for all t [T], using the convexity of g+ t and the fact that g+ t (x ) = 0 we have that, g+ t (x) = g+ t ((1 µ/r)x + (µ/r)0) (1 µ/r)g+ t (x ) + (µ/r)g+ t (0) µ Projection-Free Online Convex Optimization with Time-Varying Constraints This gives, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( 3ϵ + µ) + µKMg ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 δηK m=1 E[λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf Suppose throughout the rest of the proof that the following condition holds: 2 ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 + Kn Mg µ + KGg( 3ϵ + µ) + µKMg n2M 2 g µ2 + KG2 g + η2Kδ2 ! + Kµ Gg + Mg In particular, for the above to hold it suffices that the following holds: ( n2M 2 g µ2 , KG2 g, Kδ2η2, Then we can write, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( 3ϵ + µ) + µKMg m=1 E[λm λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf Since z z2 1 4 for every z R, it holds that ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) 3ϵn(Mg + 4Mf) 3ϵ + µ) + µMg 3ϵGf + 4η n2M 2 f µ2 + 4ηK(G2 f + M 2 g ) + 4µGf Note that λ = PT t=1 E[g+ t (xm(t))] δηT +η 1 maximizes the term λ PT t=1 E[g+ t (xm(t))] δη 2 T + 1 2η λ2 in the above inequality as a function of λ. Plugging-in x = (1 µ/r)x and λ into the last inequality, we have that ft(xm(t)) ft((1 µ/r)x ) # PT t=1 E[g+ t (xm(t))] 2 δηT + η 1 RHS of (18). (19) Projection-Free Online Convex Optimization with Time-Varying Constraints Suppose η = c1T 3/4, K = T 1/2, ϵ = 61R2T 1/2 log T, δ = c2T 1/2 log T and µ = c3T 1/4 for some constants c1, c2, c3. This gives RHS of (18) = O R2T 3/4 c1 + T RT 1/4 log Tn(Mg + Mf) c3 + Gg RT 1/4p log T + c3T 1/4 r + Gf RT 1/4p log T + c1T 1/4n2M 2 f c2 3 + c1T 1/4(G2 f + M 2 g ) + c3Gf T 1/4 !! log T n R(Mg + Mf) c3 + R(Gg + Gf) Gf + Gg + Mg + c1n2M 2 f c2 3 + c1(G2 f + M 2 g ) Setting c1 = R n max{Gf ,Gg,Mf ,Mg}, c3 = nr, we obtain RHS of (18) = O r R(Mg + Mf) + R(Gg + Gf) +T 3/4 n max{R/r, R, 1/ r} max{Gf, Gg, Mf, Mg} ! = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p Note that for all t [T], using the convexity of ft, we have that ft(zt) = ft(xm(t) + µut) ft(xm(t)) + µGf, ft((1 µ/r)x ) ft(x ) + µGf Plugging these into (19) yields the regret bound: t=1 ft(zt) ft(x ) RHS of (18) + TµGf = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p Using the fact that for all t [T], ft((1 µ/r)x ) ft(xm(t)) 2RGf and g+ t (zt) g+ t (xm(t)) + µGg, we obtain from (19) the constraints violation bound: t=1 g+ t (zt) 2 (δηT + η 1) (RHS of (18) + 2RGf T) + µGg T. (20) Let us now get back to the condition in (17) which we assumed holds true. 
Plugging-in our choices of η, µ, ϵ, K, δ, it suffices that the constant c2 satisfies: ( n M 2 g T 1/2 r , G2 g T 1/2, c2 2R2 n max{Gf, Gg, Mf, Mg}2 log T, 183n max{Gf, Gg, Mf, Mg}2T 1/2 log T r , n max{Gf, Gg, Mf, Mg}2T 1/2( 183R log T + nr) R , n T 1/2 max{Gf, Gg, Mf, Mg}2 Projection-Free Online Convex Optimization with Time-Varying Constraints In particular, for any T large enough (so that the second term inside the max in the RHS is not dominant), it suffices to take 183 max{Gf, Gg, Mf, Mg}2 max{n r , n r, n, n r R}. Plugging this choice into (20), and noting that the RHS of (18) scales (as a function of T) only as T 3/4 log T, we have that for any T large enough, t=1 g+ t (zt) RGf max{Gf, Gg, Mf, Mg} max{R r , R r, R n, 1 r, 1 R}n1/4T 7/8log(T)1/4 ! Finally, since ϵ, K are the same as in Theorem 4.1, we have that the same upper-bound on the total number of calls to the linear optimization oracle applies, which concludes the proof.