# Projection-Free Online Convex Optimization with Time-Varying Constraints

Dan Garber 1, Ben Kretzu 1

1 Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Ben Kretzu, Dan Garber.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

We consider the setting of online convex optimization with adversarial time-varying constraints, in which actions must be feasible w.r.t. a fixed constraint set and are also required, on average, to approximately satisfy additional time-varying constraints. Motivated by scenarios in which the fixed feasible set (the hard constraint) is difficult to project onto, we consider projection-free algorithms that access this set only through a linear optimization oracle (LOO). We present an algorithm that, on a sequence of length $T$ and using overall $T$ calls to the LOO, guarantees $O(T^{3/4})$ regret w.r.t. the losses and $O(T^{7/8})$ constraint violation (ignoring all quantities except for $T$). In particular, these bounds hold w.r.t. any interval of the sequence. This algorithm, however, also requires access to an oracle for minimizing a strongly convex nonsmooth function over a Euclidean ball. We present a more efficient algorithm that does not require the latter optimization oracle, but only first-order access to the time-varying constraints, and achieves similar bounds w.r.t. the entire sequence. We extend the latter to the setting of bandit feedback and obtain similar bounds (as a function of $T$) in expectation.

1. Introduction

We consider a particular setting of the well-studied paradigm for sequential prediction known as online convex optimization (OCO) (Hazan, 2019; Shalev-Shwartz et al., 2012). In (standard) OCO, a decision maker is required to iteratively choose an action, some point $x_t$, on each round $t$ (the total number of rounds $T$ is finite and assumed for simplicity to be known in advance), which must belong to some fixed (throughout all rounds) feasible convex set $\mathcal{K} \subseteq \mathbb{R}^n$ (for ease of presentation we consider the underlying space to be $\mathbb{R}^n$; however, any finite-dimensional Euclidean space is suitable), which we will also assume to be compact (as is often standard). After choosing $x_t$, a scalar loss $f_t(x_t)$ is incurred, where $f_t : \mathbb{R}^n \to \mathbb{R}$ is an arbitrary convex loss function (assumed, for the sake of analysis, to be chosen adversarially). The standard goal is, over the course of the $T$ rounds, to choose feasible actions $x_1, \dots, x_T$ such that the regret, given by the difference $\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)$, grows (as a function of $T$) only at a sublinear rate (the slower the better).

In the particular setting considered in this work, OCO with time-varying constraints, we assume that besides the hard and fixed constraint given by the set $\mathcal{K}$, to which all played points must belong, there are additional soft and time-varying constraints given by convex functions $g_1, \dots, g_T : \mathbb{R}^n \to \mathbb{R}$, where $g_t$ is revealed at the end of round $t$ and encodes the constraint $g_t(x) \le 0$. In this setting the standard goal is twofold: I. to guarantee sublinear regret w.r.t. the loss functions $f_1, \dots, f_T$ against the best action in hindsight in the intersection of the hard constraint and all soft constraints, i.e., to guarantee that
$$\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}:\; g_t(x) \le 0 \ \forall t \in [T]} \ \sum_{t=1}^{T} f_t(x) = o(T), \qquad (1)$$
and II. to guarantee that the cumulative violation of the soft constraints is also sublinear in $T$, i.e., that
$$\sum_{t=1}^{T} g_t^+(x_t) := \sum_{t=1}^{T} \max\{g_t(x_t), 0\} = o(T). \qquad (2)$$
Here, $g_t^+(x) := \max\{g_t(x), 0\}$ is introduced to prevent a natural undesired phenomenon in which strongly satisfying some constraints can compensate for strongly violating others.
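To make the two performance measures concrete, the following minimal sketch (Python; illustrative only, assuming access to the evaluated losses and constraints and to a given comparator point `x_star`; all names are hypothetical) evaluates the regret term from (1) against that comparator and the cumulative positive violation from (2) for a logged sequence of plays.

```python
import numpy as np

def regret_and_violation(plays, losses, constraints, x_star):
    """Evaluate the two metrics of the setting for a logged run.

    plays       : list of played points x_1, ..., x_T (numpy arrays)
    losses      : list of loss functions f_t (callables, convex)
    constraints : list of constraint functions g_t (callables, convex)
    x_star      : a fixed comparator in K with g_t(x_star) <= 0 for all t
                  (e.g., the best fixed feasible action in hindsight)
    """
    # Regret as in (1): cumulative loss of the plays minus that of the comparator.
    regret = (sum(f(x) for f, x in zip(losses, plays))
              - sum(f(x_star) for f in losses))
    # Cumulative positive violation as in (2): clipping at zero prevents
    # strongly-satisfied constraints from cancelling out violated ones.
    violation = sum(max(g(x), 0.0) for g, x in zip(constraints, plays))
    return regret, violation
```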
This setting was recently studied in (Yi et al., 2022; Guo et al., 2022; Castiglioni et al., 2022; Neely & Yu, 2017; Cao & Liu, 2018). The state-of-the-art bounds with full information (i.e., $f_t(\cdot), g_t(\cdot)$ become known to the algorithm at the end of each round $t$) for this setting are $O(\sqrt{T})$ for the regret (as given in (1)) and $O(T^{3/4})$ for the constraint violation (as given in (2)); see for instance (Guo et al., 2022). However, previous works require as a sub-routine to solve on each iteration a convex optimization problem that is at least as difficult as computing a Euclidean projection onto the feasible set $\mathcal{K}$. In high-dimensional settings, and when the set $\mathcal{K}$ admits a non-trivial structure, this highly limits the applicability of the proposed methods. Motivated by this observation, in this work, and to the best of our knowledge for the first time, we consider so-called projection-free algorithms for OCO with time-varying constraints. Concretely, motivated by the vast work on projection-free methods for OCO in recent years, e.g., (Hazan & Kale, 2012; Garber & Kretzu, 2022; Hazan & Minasyan, 2020; Garber & Kretzu, 2023; Mhammedi, 2021; 2022), we consider algorithms that access the feasible set $\mathcal{K}$ only through a linear optimization oracle (that is, an oracle that, given a linear function, can find a minimizer of it over $\mathcal{K}$), and consider online algorithms that throughout the $T$ rounds make at most $T$ calls to this oracle.

Example I: online minimum-cost capacitated flow. Consider a fixed directed acyclic graph $G$ with $n$ nodes, $m$ edges, source node $s$ and target node $e$. A decision maker (DM) must route on each round $t$ a unit flow from $s$ to $e$, i.e., choose some point $x_t$ in the corresponding unit flow polytope $\mathcal{K}$ (which is also the convex hull of the indicator vectors of all paths from node $s$ to node $e$ in the graph $G$). The DM then incurs cost $f_t(x_t)$. If each edge is associated with a linear cost, we may have $f_t(x) = f_t^\top x$ for some $f_t \in \mathbb{R}^m$. The DM also needs to respect time-varying edge capacity constraints given by $x_t(i) \le c_t(i)$, $i = 1, \dots, m$, where $c_t$ has non-negative entries. Here, the constraint function on time $t$ is simply $g_t(x) = \max_{i \in [m]}\{x(i) - c_t(i)\}$. The unit flow polytope is difficult to project onto; however, linear optimization over it corresponds to a simple weighted shortest-path computation, which takes linear time using dynamic programming.

Example II: online semidefinite optimization. Consider a sequence of $T$ semidefinite optimization problems over the bounded positive semidefinite (PSD) cone $\mathcal{K} = \{X \in \mathbb{S}^n \mid X \succeq 0, \ \mathrm{Tr}(X) \le \tau\}$, where $\mathbb{S}^n$ denotes the space of $n \times n$ real symmetric matrices and $\tau > 0$. Each instance $t$ in the sequence consists of a convex objective $f_t : \mathbb{S}^n \to \mathbb{R}$ and a set of $m_t$ linear inequalities $\mathrm{Tr}(A_{t,i} X) \le b_{t,i}$, $i \in \{1, \dots, m_t\}$. Here the fixed hard constraint is given by the convex bounded PSD cone, and the time-varying soft constraints can be encoded using $g_t(X) = \max_{i \in [m_t]}\{\mathrm{Tr}(A_{t,i} X) - b_{t,i}\}$. Projecting onto the bounded PSD cone requires a full eigen-decomposition of an $n \times n$ symmetric matrix, which in practical implementations requires $O(n^3)$ runtime. Linear optimization over this set, on the other hand, amounts to a single leading eigenvector computation, whose runtime scales only with $n^2$, and is even faster when the input matrix is sparse, using fast iterative eigenvector methods.
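To make the oracle assumption concrete, the following sketches (Python with NumPy; illustrative only, assuming the DAG's edge list is given with tail nodes in topological order in Example I and a dense symmetric cost matrix in Example II; all function and variable names are hypothetical) implement the two linear optimization oracles mentioned above.

```python
import numpy as np

def loo_unit_flow(n_nodes, edges, costs, s, e):
    """LOO for the unit flow polytope of a DAG: argmin_{x in K} <costs, x>.

    edges : list of (u, v) pairs whose tail nodes u appear in topological order
    costs : per-edge linear costs (the vector f_t of Example I)
    Returns the indicator vector of a minimum-cost s->e path, i.e., a vertex of K.
    """
    dist = np.full(n_nodes, np.inf)
    pred = [-1] * n_nodes              # index of the edge used to reach each node
    dist[s] = 0.0
    for idx, (u, v) in enumerate(edges):   # one pass suffices on a topologically ordered DAG
        if dist[u] + costs[idx] < dist[v]:
            dist[v] = dist[u] + costs[idx]
            pred[v] = idx
    x = np.zeros(len(edges))
    node = e
    while node != s:                   # backtrack from e to s along predecessor edges
        idx = pred[node]
        x[idx] = 1.0
        node = edges[idx][0]
    return x

def loo_bounded_psd(C, tau):
    """LOO for {X PSD, Tr(X) <= tau}: argmin_X <C, X>.

    If the smallest eigenvalue of C is negative, the minimizer is tau * v v^T with
    v the corresponding eigenvector; otherwise X = 0 is optimal.
    """
    eigvals, eigvecs = np.linalg.eigh(C)   # dense here for simplicity; in practice a single
    if eigvals[0] >= 0:                    # extreme eigenpair via iterative methods suffices
        return np.zeros_like(C)
    v = eigvecs[:, 0]
    return tau * np.outer(v, v)
```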
We believe this setting is well suited to capture important online scenarios with changing constraints. At one extreme, some works, such as the ones mentioned above, enforce the hard constraint $\mathcal{K}$ using costly computational procedures (projections), which can be prohibitive, as in the examples above. At the other extreme, for the sake of computational efficiency one can also model membership in $\mathcal{K}$ as a long-term soft constraint (this was the motivation of (Mahdavi et al., 2012), which pioneered the study of OCO with long-term constraints); however, this might result in a too crude approximation of real-world scenarios. For instance, in Example I above it makes sense that, even if the capacity constraints may be violated, the decision maker still must choose a feasible flow on each round, and in Example II the chosen matrix must indeed be PSD. We view our work as an appealing middle ground between these two extremes: distinguishing between hard and soft (varying) constraints, while insisting on tractable procedures for high-dimensional and complex domains.

1.1. Contributions (informally stated)

I. We give an algorithm that, using no more than $T$ calls to the LOO of $\mathcal{K}$ throughout all $T$ rounds, guarantees that on each time interval $[s, e]$, $1 \le s < e \le T$, the regret w.r.t. the set $\widehat{\mathcal{K}}_{s,e} = \{x \in \mathcal{K} \mid g_t(x) \le 0 \ \forall\, s \le t \le e\}$ is upper-bounded by $O(T^{3/4})$, and the overall non-negative constraint violation on the interval, $\sum_{t=s}^{e} g_t^+(x_t)$, is upper-bounded by $O(T^{7/8})$ (ignoring all quantities except for $T$ and $n$). Note this metric (bounds w.r.t. any interval of the sequence, also known as adaptive regret (Hazan & Seshadhri, 2009)) is substantially stronger than the ones in (1), (2), and in particular is interesting even when there is no point $x \in \mathcal{K}$ that satisfies all constraints $g_t(x) \le 0$, $t \in [T]$. Aside from a LOO for the set $\mathcal{K}$ and a first-order oracle for the observed loss functions, the algorithm also requires access to an oracle for minimizing a strongly convex nonsmooth function over a Euclidean ball (which can be efficiently implemented via standard methods for convex optimization and is independent of the set $\mathcal{K}$). See Section 3.

II. We give a more efficient algorithm than the latter, which does not require an oracle for strongly convex nonsmooth minimization over Euclidean balls, and instead only requires first-order access to the constraints $g_1, \dots, g_T$. This algorithm provides similar bounds, but only w.r.t. the entire sequence (i.e., bounds as in (1), (2)). See Section 4.

III. We extend the latter algorithm to the bandit setting in which only the scalar values $f_t(x_t), g_t(x_t)$ are observed on each round $t$, where $x_t$ is the played point. The algorithm guarantees $O(\sqrt{n}\, T^{3/4})$ expected regret and expected non-negative constraint violation $\sum_{t=1}^{T} g_t^+(x_t) = O(n^{1/4} T^{7/8})$. See Section 5.

Importantly, our regret bounds w.r.t. the loss functions $f_1, \dots$
, f T , both in the full-information and bandit settings, match up to logarithmic terms the current state-of-the-art Projection-Free Online Convex Optimization with Time-Varying Constraints bounds (in terms of the horizon T and dimension n) for projection-free OCO (i.e., without additional time-varying constraints) (Hazan & Kale, 2012; Garber & Kretzu, 2022; 2020), and are thus (up to log factors) the best one can currently expect to achieve in our more challenging setting of projection-free OCO with time-varying constraints. 2. Preliminaries 2.1. Assumptions and notation As stated, we assume that the feasible set K Rn, which encodes the hard constraints (must be satisfied on each round t), is convex and compact and we let R be such that K RB, where B denotes the origin-centered unit Euclidean ball. We also assume w.l.o.g. that 0 K. We assume all loss and constraint functions f1, g1, . . . , f T , g T are convex over RB, and we let Gf and Gg denote upper-bounds on the ℓ2 norm of the subgradients of each ft, t [T] and gt, t [T] over the ball RB, respectively. Our online algorithms will consider the T prediction rounds in disjoint blocks (batches) of size K, where K is an integer parameter. We assume throughout for simplicity and w.l.o.g. that T/K is an integer. We also use the notation m(t) := t/K to map from some t [T] to the corresponding block index m(t) [T/K]. 2.2. Fast approximately-feasible projections via linear optimization In order to construct our projection-free algorithms that rely only on solving linear optimization problems over the fixed convex set K (as opposed to projection or even more complex steps) we rely on the technique recently introduced in (Garber & Kretzu, 2022; 2023) of fast computation of approximately-feasible projections using a LOO. Definition 2.1 (Approximately-feasible Projection Oracle3). Given a convex and compact set K Rn, and a tolerance ϵ > 0, we say a function OAF P (y, ϵ, K) is an approximately-feasible projection (AFP) oracle (for the set K with parameter ϵ), if for any input point y Rn, it returns some (x, ey) K Rn such that i. for all z K, ey z y z , and ii. x ey 2 ϵ. Algorithm 1 given below is taken from (Garber & Kretzu, 2023) and implements an AFP oracle for the set K using only linear optimization steps over K. It uses as a subroutine the well known Frank-Wolfe method (Algorithm 2). If the point-to-project is far from the set, the output of Algorithm 2 is used to construct a hyperplane separating the point from K, which is then used to pull the point closer 3(Garber & Kretzu, 2023) considers arbitrary matrix-induced norms. Here we only consider the Euclidean case which is equivalent to setting A = I in their definition. to K. Otherwise, if the point is already sufficiently close to K (but not necessarily in K), Algorithm 2 outputs a point in K sufficiently close to it. Algorithm 1 Approximately-Feasible Projection Oracle via a Linear Optimization Oracle (see (Garber & Kretzu, 2023)) Data: parameters y1 Rn, x0 K, ϵ if x0 y1 2 3ϵ then return x x0, y y1 end if for i = 1, 2, . . . do xi Output of Algorithm 2 when called with tolerance ϵ, feasible point xi 1, and initial point yi if xi yi 2 > 3ϵ then yi+1 = yi 2 3 (yi xi) else return x xi, y yi end if end for Algorithm 2 Separating Hyperplane via Frank-Wolfe Data: parameters x1 K, y Rn, ϵ for i = 1, 2, . . . 
do vi argminx K{(xi y) x} {call to LOO of K} if (xi y) (xi vi) ϵ or xi y 2 3ϵ then return ex xi end if σi = argminσ [0,1]{ y xi σ(vi xi)) 2} xi+1 = xi + σi(vi xi) end for The following lemma is from (Garber & Kretzu, 2023). Lemma 2.2. Let K Rn be convex and compact such that 0 K RB for some R > 0. Algorithm 1 guarantees that it returns (x, y) K Rn such that the following three conditions hold: I. z K : y z 2 y1 z 2, II. x y 2 3ϵ, and III. y y1 . The overall number of calls to the LOO of K (in Algorithm 2) throughout the run of Algorithm 1 is upper-bounded by ϵ max n 2.25 log y1 x0 2 ϵ + 1, 0 o . 3. Drift-plus-Penalty-inspired Algorithm with Adaptive Regret Guarantees Our first online algorithm, Algorithm 3, is inspired by the drift plus penalty approach used extensively in the literature on OCO with constraints, see for instance (Guo et al., 2022; Yi et al., 2021; 2022; Yu & Neely, 2020). The algorithm considers the T rounds in disjoint blocks of fixed size K, where K is a parameter of the algorithm. At the end of each block m [T/K], the algorithm performs its two key steps: The first step is to find a minimizer of a drift-pluspenalty-style objective function (the function hm defined in Projection-Free Online Convex Optimization with Time-Varying Constraints the algorithm), involving the gradients (i.e., linearizations) of the loss functions and the constraints functions (without linearization) observed during the K iterations of the block, over a Euclidean ball enclosing the set K 4. This amounts to minimizing a strongly convex nonsmooth function over a Euclidean ball (which in particular does not involve the set K and does not require calls to the LOO of K), and can be efficiently implemented via a variety of standard and efficient convex optimization algorithms. The second step is to feed to computed minimizer into the approximatelyfeasible projection oracle Algorithm 1, to obtain the next state of the algorithm. Algorithm 3 LOO-based Drift-plus-Penalty Method Data: parameters T, K, ϵ, δ, α x1 = ey1 arbitrary point in K for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . , m K do Play xm and observe ft( ), gt( ) Set t ft(eym) end for Denote Tm = {(m 1)K + 1, . . . , m K} K P t=Tm t G+ m(x) δ P t Tm g+ t (x) ym+1 argminx RB{hm(x) := m (x eym) + G+ m(x) + α 2 x eym 2} (xm+1, eym+1) OAF P (ym+1, xm, ϵ, K) {Alg. 1} end for Theorem 3.1. Consider running Algorithm 3 with parameters δ = K = T 1 2 , α = Gf R T 1 4 , ϵ = 61R2T 1 2 log T. Fix an interval [s, e], 1 s < e T. If b Ks,e := {x K : gt(x) 0, t [s, e]} = then the regret on the interval is upper-bounded by t=s ft(xm(t)) min x b Ks,e t=s ft(x) = O RGf T 3 4 p and the constraints violation on the interval is upper bounded by Pe t=s g+ t (xm(t)) = O RGg T 7 8 . The overall number of calls to the LOO of K is upper-bounded by T. Note Theorem 3.1 in particular provides meaningful bounds in the important scenario in which no single point in K satsfies all the soft constraints given by g1, . . . , g T . Relying only on a LOO (with an overall budget of T calls) comes with a price: the bounds in Theorem 3.1 could be 4We note that our drift-plus-penalty-style objectives are simpler versions of those used in previous works such as (Guo et al., 2022; Yi et al., 2021; 2022; Yu & Neely, 2020) since the penalty multiplying the constraint function G+ m is fixed throughout the run and not adaptive. compared to the O( T) regret and O(T 3/4) constraint violation achievable by methods with unrestricted optimizations, see for instance (Guo et al., 2022). 
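Before discussing this gap in rates further, a brief implementation note: the per-block step of Algorithm 3 that computes $y_{m+1} \in \arg\min_{x \in R\mathcal{B}} h_m(x)$ is a strongly convex nonsmooth problem over a Euclidean ball, and one standard way to realize such an oracle is projected subgradient descent. The sketch below (Python; illustrative only, with hypothetical names and an untuned step-size schedule, not the authors' prescribed subroutine) shows this option.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto the origin-centered ball of radius R."""
    norm = np.linalg.norm(x)
    return x if norm <= R else (R / norm) * x

def minimize_h_over_ball(grad_sum, y_tilde, block_constraints, delta, alpha, R, iters=1000):
    """Approximately minimize
        h(x) = <grad_sum, x - y_tilde> + delta * sum_t max(g_t(x), 0)
               + (alpha / 2) * ||x - y_tilde||^2
    over the ball R*B via projected subgradient descent, using the standard
    ~2/(alpha*k) step sizes for alpha-strongly-convex nonsmooth objectives.

    block_constraints : list of pairs (g_t, subgrad_g_t), one per round of the block.
    """
    x = y_tilde.copy()
    for k in range(1, iters + 1):
        sub = grad_sum + alpha * (x - y_tilde)
        for g, sg in block_constraints:
            if g(x) > 0:            # a subgradient of max(g_t(.), 0) at x is sg(x) when g_t(x) > 0, else 0
                sub = sub + delta * sg(x)
        x = project_ball(x - (2.0 / (alpha * (k + 1))) * sub, R)
    return x
```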
This deterioration in rates is expected and is a known phenomena in projection-free methods for OCO (without soft constraints), e.g., (Hazan & Kale, 2012; Garber & Kretzu, 2022). For the proof of the theorem we need the following lemma. Lemma 3.2. Consider the (infeasible) sequence {eym(t)}t [T ] produced by Algorithm 3. Fix an interval [s, e], 1 s < e T. If b Ks,e = then the regret of {eym(t)}t [T ] is upper-bounded by: t=s ft(eym(t)) min x b Ks,e t=s ft(x) 2KR(2Gf +Rα)+ G2 f 2α T, and for every c > 0, the constraints violation of the sequence on the interval is upper-bounded by: t=s g+ t (eym(t)) G2 g 2c T + 2c R Gf T α + 4R Gf T 2δK + 4KRGg. Proof. Fix some m [T/K]. Note hm( ) is α-strongly convex. Since ym+1 RB minimizes hm(x) over RB, if follows that for any x RB, hm(x) hm(ym+1) + α 2 x ym+1 2. Thus, for every x K RB, m (x eym)+G+ m(x)+ α x eym 2 x ym+1 2 m (ym+1 eym) + G+ m(ym+1) + α 2 ym+1 eym 2 . Since eym+1 is the output of OAF P (ym+1, ϵ, K), it follows that x eym+1 x ym+1 (Lemma 2.2) and so, m (x eym)+G+ m(x)+ α x eym 2 x eym+1 2 m (ym+1 eym) + G+ m(ym+1) + α 2 ym+1 eym 2 . m (ym+1 eym) + α 2 ym+1 eym 2 2 (ym+1 eym) Combining the last two inequalities, we have for any x K, m (x eym) + G+ m(x) + α x eym 2 x eym+1 2 Projection-Free Online Convex Optimization with Time-Varying Constraints which by rearranging gives that for every x K, m (eym x) α x eym 2 x eym+1 2 2α + G+ m(x) G+ m(ym+1). (4) Let us now fix some interval [s, e], 1 s e T. Let ms and me denotes the smallest block index and the largest block index that are fully contained in the interval [s, e], respectively. Note that m Gf for every m. Recall G+ m( ) 0, and G+ m(x) = 0 for every x b Ks,e and m [ms, me]. Summing Eq.(4) over m from ms to me, we have that for every x b Ks,e it holds that, t Tm t (eym x) K α 2 x eyms 2 + G2 f 2α T. (5) Since ft( ) is convex, for every x b Ks,e it holds that t=s ft(eym(t)) ft(x) = t=s ft(eyms 1) ft(x) t Tm ft(eym) ft(x) + t=me K+1 ft(eyme+1) ft(x) t=s t eyms 1 x + t Tm t (eym x) t=me K+1 t eyme+1 x . From Lemma 2.2 it follows that eym RB for every m. plugging this, Eq.(5), and the fact that t [T] t Gf into the last inequality, we have that for every x b Ks,e, t=s ft(eym(t)) ft(x) 4KGf R + 2KR2α + G2 f 2α T. Now, we move on to bound the constraint violation on the interval [s, e]. We start with some preliminaries. Rearranging Eq.(3), we have for every x K and m [T/K] that, ym+1 eym 2 2 α m (x ym+1) α G+ m(x) G+ m(ym+1) + x eym 2 x eym+1 2 αG+ m(x)+ x eym 2 x eym+1 2 , where in the last inequality we have used the facts that x K RB and ym+1 RB, and that G+ m( ) 0 . Summing the above inequality over m from ms to me, and recalling that eym RB and m Gf for all m, we have that for every x K it holds that, m=ms ym+1 eym 2 4Gf RT m=ms G+ m(x)+4R2. (6) Rearranging Eq.(4), and summing it over m from ms to me, we have that for every x K it holds that, m=ms G+ m(ym+1) x eym 2 x eym+1 2 2α +G+ m(x)+ m (x eym) 2αR2 + G2 f T 2αK + m=ms G+ m(x) + 2Gf RT Recalling G+ m(x) = δ P t Tm g+ t (x), we have x K, t Tm g+ t (ym(t)+1) 2αR2 δ + G2 f T 2αδK m=ms G+ m(x) + 2Gf RT Since gt( ) is convex and Gg-Lipschitz over RB for every t, g+ t (x) = max{0, gt(x)} is also convex and Gg-Lipschitz over RB. Using the inequality 2ab a2 + b2 for every a, b R, we have that for every t [T] and c > 0, g+ t (eym(t)) g+ t (ym(t)+1) Gg eym(t) ym(t)+1 G2 g 2c + c eym(t) ym(t)+1 2 . 
Summing the last inequality over t, and combining it with Eq.(6), and Eq.(7), we have that for every x b Ks,e K, t Tm g+ t (eym(t)) G2 g 2c T + c K m=ms eym ym+1 2 t Tm g+ t (ym(t)+1) G2 g 2c T + 2c Gf RT α + 2c R2K + c K m=ms G+ m(x) δ + G2 f T 2αδK + 2Gf RT = G2 g T 2c +2c Gf RT α +2c R2K+2αR2 δ + G2 f T 2αδK +2Gf RT where that last equality holds since G+ m(x) = 0 for every x b Ks,e and m [ms, me]. Projection-Free Online Convex Optimization with Time-Varying Constraints The proof is concluded by noting that the number of first and last iterates in the interval [s, e] that are not contained in the blocks ms, . . . me, i.e., the set [s, e] \ Sme m=ms Tm, is at most 2K. These add an additional term which is at most 2K 2RGg to the RHS of the last inequality by using the fact that for each t [s, e] \ Sme m=ms Tm and x b Ks,e, g+ t (eym(t)) g+ t (x) + eym(t) x Gg 2RGg. Proof of Theorem 3.1. Fix some interval [s, e], 1 s e T such that b Ks,e = . Applying Lemma 3.2 and Lemma 2.2, we have that for every x b Ks,e it holds that, t=s ft(xm(t)) ft(x) = t=s ft(xm(t)) ft(eym(t)) + ft(eym(t)) ft(x) 4KGf R + 2KR2α + G2 f 2α T + t=s t xm(t) eym(t) 4KGf R + 2KR2α + G2 f 2α T + TGf where the last inequality also uses the facts that eym RB and that t Gf for every t [T]. plugging-in the values of K, α, ϵ stated in the theorem, we obtain the required bound on the regret. Now we move on to bound the constraints violation. Since for every t [T], gt( ) is convex and Gg-Lipschitz over RB, it follows that g+ t (x) is also convex and Gg-Lipschitz over RB. Using Lemma 2.2 we have that for every t [T], g+ t (xm(t)) g+ t ( ym(t)) Gg xm(t) ym(t) Gg Summing over t [s, e], and using Lemma 3.2, it holds that t=s g+ t (xm(t)) G2 g 2c T + 2c R Gf T α + RK + 2αR2 α + 4R Gf T 2δK + 4KRGg + Gg Plugging-in c = Gg 12RT 1 8 and the values of K, α, ϵ, δ listed in the theorem, the overall constraints violation bound stated in the theorem is obtained. Finally, we move on to bound the overall number of calls to the linear optimization oracle. Using Lemma 2.2, the overall number of calls to linear optimization oracle is 2.25 log ym+1 xm 2 2.25 log 4R2 + 2.25 log(e 4 9 ) 2.25 log 7R2 Plugging-in the values of ϵ, K listed in the theorem, we obtain that Ncalls T, as required. 4. Lagrangian-based Primal-Dual Algorithm Our second algorithm, Algorithm 4, is based on an online first-order primal-dual approach which has also been studied extensively, see for instance (Mahdavi et al., 2012; Cao & Liu, 2018; Jenatton et al., 2016; Sun et al., 2017). This algorithm is more efficient than Algorithm 3 since it does not require any exact minimization of an auxiliary optimization problem involving the constraint functions gt, t [T], and instead only requires first-order access (i.e., to compute subgradients) of ft, gt, t [T]. On the downside, we will only have standard (i.e., non-adaptive) guarantees for this method that hold only w.r.t. the entire sequence of functions. The algorithm performs online primal-dual (sub)gradient updates w.r.t. the dual-regularized Lagrangian functions Lt(x, λ) := ft(x) + λg+ t (x) δη and their block aggregations Lm(x, λ) := Pm K t=(m 1)K+1 Lt(x, λ), m [T/K]. Note that the vectors m,x, m,λ in Algorithm 4 are indeed the derivatives of Lm(x, λ) w.r.t. x and λ, evaluated at (xm, λm). Algorithm 4 LOO-based Primal-Dual Method data: parameters T, K, ϵ, δ, η x1 = y1 arbitrary point in K, λ1 0 for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . 
, m K do Play xm and observe ft( ), gt( ) Set ft ft(xm) and g+ t g+ t (xm) end for Denote Tm = {(m 1)K + 1, . . . , m K} m,x P t Tm ft + λm g+ t m,λ P t Tm g+ t (xm) δηλm ym+1 ΠRB[ ym η m,x] (xm+1, ym+1) OAF P (ym+1, xm, ϵ, K) {Alg. 1} λm+1 ΠR+ (λm + η m,λ) end for Projection-Free Online Convex Optimization with Time-Varying Constraints Theorem 4.1. Suppose b K = {x K : gt(x) 0 t [T]} = . For every T sufficiently large, setting δ = 32 G2 g + Gg R T 1 2 log T, η = T 3 4 , ϵ = 61R2T 1 2 log T, K = T 1 2 in Algorithm 4 guarantees that the regret is upper bounded by t=1 ft(xm(t)) min x b K t=1 ft(x) = O R (Gf + Gg) T 3 4 p log T + (G2 f + (G2 g + 1)R2)T 3/4 , the constraints violation is upper bounded by t=1 g+ t (xm(t))=O Gf R 1+Gg (Gg + R) p log T T 7 8 and the overall number of calls to the LOO is at most T. The proof of the theorem builds on the following lemma. Lemma 4.2. Algorithm 4 guarantees that for every x b K and λ R+, t=1 Lt xm(t), λ Lt x, λm(t) 3ϵT + G2 fηKT + 4G2 g R2ηKT m=1 λm + δ2η3K2 + G2 gηK2 T/K X Proof. Using Lemma 2.2 and the facts that ym+1 = ΠRB ( ym η m,x) and K RB, for every x K and m [T/K] we have that, ym+1 x 2 ym+1 x 2 ym η m,x x 2 = ym x 2 2η m,x ( ym x) + η2 m,x 2 , and rearranging, we have that m,x ( ym x) 1 ym x 2 ym+1 x 2 2 2 m,x . (8) Since λm+1 = ΠR+ (λm + η m,λ), for every λ R+ it holds that, (λm+1 λ)2 (λm + η m,λ λ)2 = (λm λ)2 + 2η m,λ (λm λ) + η2 2 m,λ, and rearranging, we have that m,λ (λ λm) 1 (λm λ)2 (λm+1 λ)2 2 2 m,λ. (9) Since Lt ( , λ) is convex, it holds that Lt (xm, λm) Lt (x, λm) x Lt (xm, λm) (xm x) , and since Lt (xm, ) is concave, it holds that Lt (xm, λ) Lt (xm, λm) λLt (xm, λm) (λ λm) . Combining the last two inequalities we have that, Lt (xm, λ) Lt (x, λm) x Lt (xm, λm) (xm x) + λLt (xm, λm) (λ λm) . Summing the above inequality over t Tm and recalling that m,x = P t Tm x Lt (xm, λm) and m,λ = P t Tm λLt (xm, λm), we have that t Tm Lt (xm, λ) Lt (x, λm) m,x (xm x) + m,λ (λ λm) = m,x (xm ym) + m,x ( ym x) + m,λ (λ λm) . Combining the last inequality with Eq. (8) and (9), we have that for every x K, λ R+ and m, it holds that X t Tm Lt (xm, λ) Lt (x, λm) m,x (xm ym) + 1 ym x 2 ym+1 x 2 2 m,x 2 + 1 (λm λ)2 (λm+1 λ)2 + η Summing the above over m, we have for every x K and λ R+ that t Tm Lt (xm, λ) Lt (x, λm) m=1 m,x xm ym + η m,x 2 + 2 m,λ + y1 x 2 + (λ1 λ)2 Using Lemma 2.2, it holds that xm ym 2 3ϵ for every m. Since y1 K and λ1 = 0, we have for every x K and λ R+ that t Tm Lt (xm, λ) Lt (x, λm) m,x 2 + 2 m,λ . (10) Projection-Free Online Convex Optimization with Time-Varying Constraints Since for every m t Tm ft + λm g+ t K(Gf + λm Gg), and m,λ = P t Tm(g+ t (xm) δηλm), we obtain t=1 Lt xm(t), λ Lt x, λm(t) m=1 λm + Gf 3ϵT + 4R2 + λ2 m=1 (Gf +λm Gg)2 + η t Tm (g+ t (xm) δηλm) 3ϵT + G2 fηKT t Tm g+ t (xm) + δ2η3K2 + G2 gηK2 T/K X m=1 λ2 m, (11) where in the last inequality we have used the inequality (a + b)2 2a2 + 2b2 for every a, b R. Note that g+ t (x) = 0 for every x b K. Since gt(x) is convex and Gg-Lipschitz, g+ t (x) = max{0, gt(x)} is also convex and Gg-Lipschitz and thus, for every x b K it holds that g+ t (xm) g+ t (x) + g+ t (xm x) 2Gg R. Plugging this into the RHS of (11) yields the lemma. Proof of Theorem 4.1. 
From Lemma 4.2, for every x b K and λ R+ it holds that, ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) m=1 λ2 m δη 3ϵT + G2 f + 4G2 g R2 ηKT m=1 λm + δ2η2 + G2 g ηK2 T/K X Rearranging, we have that ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) 3ϵT + G2 f + 4G2 g R2 ηKT + Gg + δ2η3K2 + G2 gηK2 δηK g+ t (x) = 0 for every x b K and t [T]. Also, by the choice of parameters in the theorem, we have that for any T larger than some constant, δ2η3K2+ηK2G2 g+Gg 2 . Thus, for every x b K and λ R+ we have that, ft(xm(t)) ft(x) + λ t=1 g+ t (xm(t)) + G2 f + 4G2 g R2 ηKT + Gg Since z z2 1 4 for every z R, it holds that ft(xm(t)) ft(x) + λ t=1 g+ t (xm(t)) η + G2 f + 4G2 g R2 ηKT + Let x argminx b K PT t=1 ft(x). Note that λ = PT t=1 g+ t (xm(t)) δηT +η 1 maximizes the term λ PT t=1 g+ t (xm(t)) δη 2 T + 1 2η λ2 in the last inequality. Plugging-in x and λ into the last inequality, we have that ft(xm(t)) ft(x ) + 1 PT t=1 g+ t (xm(t)) 2 η + G2 f + 4G2 g R2 ηKT + This yields the regret bound t=1 ft(xm(t)) ft(x ) η + G2 f + 4G2 g R2 ηKT + 3ϵ (Gf + Gg/4) T. Projection-Free Online Convex Optimization with Time-Varying Constraints Plugging-in the choice of parameters listed in the theorem yields the desired regret bound. Since ft(x ) ft(xm(t)) 2RGf for all t [T], from (12) we also obtain the constraints violation bound t=1 g+ t (xm(t)) p δηT + η 1 4R2 η +2 G2 f+4G2 g R2 ηKT +4RGf T + 2 3ϵ (Gf + Gg/4) T 1/2 . Plugging-in our choice of parameters, yields the bound. Finally, the bound on the overall number of calls to the LOO of K follows from exactly the same argument as in the proof of Theorem 3.1. In particular we use the same values of K, ϵ which go into the bound. 5. Bandit Feedback Setting In the bandit setting, after each round t [T], the algorithm does not get to observe the functions ft, gt (or their subgradients), but only their value at the point played on round t. Building on the standard approach pioneered in (Flaxman et al., 2005) (see also (Hazan, 2019; Garber & Kretzu, 2020; 2022), we extend Algorithm 4 to this setting by replacing the functions ft, g+ t with their µ-smoothed versions: bft(x) = Eu U(Sn)[ft(x + µu)], bg+ t (x) = Eu U(Sn)[g+ t (x + µu)], where U(Sn) denotes the uniform distribution over the origin-centered unit sphere. By sampling uniformly around the point to play, unbiased estimators for the gradients of bft, bg+ t 5 could be constructed using only the values of the functions at the sampled point. We make the additional standard assumptions: I. {ft, gt}t [T ] are chosen obliviously, i.e., before the first round and are thus independent of any randomness introduced by the algorithm. II. there is r > 0 such that r B K. For a fixed smoothing parameter µ (0, r], we denote a shrinking of the set K by Kµ/r := (1 µ/r)K = {(1 µ/r)x | x K}. It is well known that for all x Kµ/r and y r B it holds that x + y K. The proof of the following theorem is given in the appendix. Theorem 5.1. Suppose b K = {x K : gt(x) 0 t [T]} = . For any T sufficiently large, setting 183 max{Gf, Gg, Mf, Mg}2 r , n r, n, n r R}T 1 2 p η = R max{Gf ,Gg,Mf ,Mg}T 3 4 , ϵ = 61R2T 1 2 log T , K = T 1 2 , µ = nr T 1/4 in Algorithm 5, guarantees that the 5 bft, bg+ t are always differentiable, even if ft, g+ t are not Algorithm 5 LOO-based Bandit Primal-Dual Method Data: parameters T, K, ϵ, δ, η x1 = y1 arbitrary point in Kµ/r, λ1 0 for m = 1, . . . , T/K do for t = (m 1)K + 1, . . . , m K do zt xm + µut where ut U(Sn) {ut sampled from uniform dist. 
over the unit sphere} Play zt and observe ft(zt), gt(zt) t,x n µ ft(zt) + λmg+ t (zt) ut t,λ g+ t (zt) δη end for Denote Tm = {(m 1)K + 1, . . . , m K} m,x P t Tm t,x and m,λ P t Tm t,λ ym+1 ΠRB ( ym η m,x) (xm+1, ym+1) OAF P ym+1, xm, ϵ, Kµ/r {Apply Algorithm 1 w.r.t. the set Kµ/r} λm+1 ΠR+ (λm + η m,λ) end for expected regret is upper bounded by t=1 ft(x) = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p log T = O n T 3/4p the expected constraints violation is upper bounded by t=1 g+ t (zt) RGf max{Gf, Gg, Mf, Mg} r , R r, R n, 1 r, 1 n1/4T 7/8log(T)1/4 =O n1/4T 7/8log(T)1/4 , and the overall number of calls to the LOO is upper bounded by T. Acknowledgements Dan Garber is supported by the ISRAEL SCIENCE FOUNDATION (grant No. 2267/22). Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are potential societal conse- Projection-Free Online Convex Optimization with Time-Varying Constraints quences of our work, none which we feel must be specifically highlighted here. Cao, X. and Liu, K. R. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on automatic control, 64(7):2665 2680, 2018. Castiglioni, M., Celli, A., Marchesi, A., Romano, G., and Gatti, N. A unifying framework for online optimization with long-term constraints. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 33589 33602. Curran Associates, Inc., 2022. Flaxman, A. D., Kalai, A. T., and Mc Mahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 385 394, 2005. Garber, D. and Kretzu, B. Improved regret bounds for projection-free bandit convex optimization. In International Conference on Artificial Intelligence and Statistics, pp. 2196 2206. PMLR, 2020. Garber, D. and Kretzu, B. New projection-free algorithms for online convex optimization with adaptive regret guarantees. In Loh, P.-L. and Raginsky, M. (eds.), Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp. 2326 2359. PMLR, 02 05 Jul 2022. Garber, D. and Kretzu, B. Projection-free online expconcave optimization. In Neu, G. and Rosasco, L. (eds.), Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pp. 1259 1284. PMLR, 12 15 Jul 2023. Guo, H., Liu, X., Wei, H., and Ying, L. Online convex optimization with hard constraints: Towards the best of two worlds and beyond. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 36426 36439. Curran Associates, Inc., 2022. Hazan, E. Introduction to online convex optimization. ar Xiv preprint ar Xiv:1909.05207, 2019. Hazan, E. and Minasyan, E. Faster projection-free online learning. In Conference on Learning Theory, pp. 1877 1893. PMLR, 2020. Hazan, E. and Seshadhri, C. Efficient learning algorithms for changing environments. In Proceedings of the 26th annual international conference on machine learning, pp. 393 400, 2009. Hazan, E. E. and Kale, S. Projection-free online learning. In 29th International Conference on Machine Learning, ICML 2012, pp. 521 528, 2012. Jenatton, R., Huang, J., and Archambeau, C. Adaptive algorithms for online convex optimization with long-term constraints. 
In Balcan, M. F. and Weinberger, K. Q. (eds.), Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp. 402 411, New York, New York, USA, 20 22 Jun 2016. PMLR. Mahdavi, M., Jin, R., and Yang, T. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research, 13(1):2503 2528, 2012. Mhammedi, Z. Efficient projection-free online convex optimization with membership oracle. ar Xiv preprint ar Xiv:2111.05818, 2021. Mhammedi, Z. Exploiting the curvature of feasible sets for faster projection-free online learning, 2022. URL https://arxiv.org/abs/2205.11470. Neely, M. J. and Yu, H. Online convex optimization with time-varying constraints. ar Xiv preprint ar Xiv:1702.04783, 2017. Shalev-Shwartz, S. et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107 194, 2012. Sun, W., Dey, D., and Kapoor, A. Safety-aware algorithms for adversarial contextual bandit. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3280 3288. PMLR, 06 11 Aug 2017. URL https://proceedings.mlr. press/v70/sun17a.html. Yi, X., Li, X., Yang, T., Xie, L., Chai, T., and Johansson, K. Regret and cumulative constraint violation analysis for online convex optimization with long term constraints. In International Conference on Machine Learning, pp. 11998 12008. PMLR, 2021. Yi, X., Li, X., Yang, T., Xie, L., Chai, T., and Karl, H. Regret and cumulative constraint violation analysis for distributed online constrained convex optimization. IEEE Transactions on Automatic Control, 2022. Yu, H. and Neely, M. J. A low complexity algorithm with O( T) regret and O(1) constraint violations for online Projection-Free Online Convex Optimization with Time-Varying Constraints convex optimization with long term constraints. Journal of Machine Learning Research, 21(1):1 24, 2020. URL http://jmlr.org/papers/v21/16-494. html. Projection-Free Online Convex Optimization with Time-Varying Constraints A. Proof of Theorem 5.1 Before we prove the theorem we shall require some preliminary results. Throughout this section we use the following notation: Lt(x, λ) := ft(x) + λg+ t (x) δη 2 λ2 t [T], b Lt(x, λ) := Eu U(Sn) ft(x + µu) + λg+ t (x + µu) δη 2 λ2 t [T], Lm(x, λ) := X t Tm Lt(x, λ) and b Lm(x, λ) := X b Lt(x, λ) m [T/K], where we recall that Tm = {(m 1)K + 1, . . . , m K}. In the following Lemma A.1 and Lemma A.2, the notation m,x is as defined in Algorithm 5. Lemma A.1. For every m [T/K] the following holds: x Kµ/r, λ R+ : 0 b Lm(x, λ) Lm(x, λ) µK(Gf + λGg), x b Lm(xm, λm) = E{ut}t Tm [ m,x] and λ b Lm(xm, λm) = X t Tm Eu U(Sn) g+ t (x + µu) δηKλ. Proof. The second part of the lemma is a well known result, see for instance (Flaxman et al., 2005) and (Hazan, 2019) (Lemma 6.7). As for the first part, it is also well known that x Kµ/r, λ R+ : | b Lm(x, λ) Lm(x, λ)| µK(Gf + λGg), see (Flaxman et al., 2005) and (Hazan, 2019) (Lemma 2.8). One side of this bound could be improved, as suggested by the lemma, by noticing that for any convex function h : K R and any x Kµ/r we have that, h(x) Eu U(Sn)[h(x + µu)] = Eu U(Sn)[h(x) h(x + µu)] Eu U(Sn)[ µu gx] = µEu U(Sn)[u] gx = 0, where gx is some subgradient in h(x). Lemma A.2 (Lemma 5 in (Garber & Kretzu, 2020)). For every m [T/K], E[ m,x ]2 E[ m,x 2] K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2. 
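For intuition about the quantities bounded in Lemma A.2, recall that Algorithm 5 aggregates one-point spherical-sampling gradient estimates in the spirit of (Flaxman et al., 2005). The sketch below (Python; illustrative only, all names hypothetical) shows how a single round's estimates are formed from bandit feedback.

```python
import numpy as np

def one_point_estimates(f_value, gplus_value, x, lam, mu, delta, eta, rng):
    """One round of the bandit estimates that are summed into grad_{m,x} and grad_{m,lambda}.

    f_value, gplus_value : return only the scalar values f_t(z), g_t^+(z) (bandit feedback)
    x   : current iterate x_m;   lam : current dual variable lambda_m
    mu  : smoothing radius;      delta, eta : algorithm parameters
    """
    n = x.shape[0]
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                      # u drawn uniformly from the unit sphere
    z = x + mu * u                              # the point actually played
    fz, gz = f_value(z), gplus_value(z)
    grad_x_est = (n / mu) * (fz + lam * gz) * u  # one-point estimate of the smoothed Lagrangian's x-gradient
    grad_lam_est = gz - delta * eta * lam        # derivative of the regularized Lagrangian w.r.t. lambda
    return z, grad_x_est, grad_lam_est
```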
The following lemma is analogous to Lemma 4.2 in the full-information setting. Lemma A.3. Algorithm 5 guarantees that for every x Kµ/r and λ R+, t=1 Lt(xm(t), λ) Lt(x, λm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] Kn Mf µ + KGf( + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Projection-Free Online Convex Optimization with Time-Varying Constraints Proof. Fix some block index m in Algorithm 5. Using Lemma 2.2 and the facts that ym+1 = ΠRB ( ym η m,x) and Kµ/r K RB, we have that for every x Kµ/r it holds that ym+1 x 2 ym+1 x 2 ym η m,x x 2 = ym x 2 2η m,x ( ym x) + η2 m,x 2 , and rearranging, we have that m,x ( ym x) 1 ym x 2 ym+1 x 2 + η 2 m,x 2 . (13) Since λm+1 = ΠR+ (λm + η m,λ), for every λ R+ it holds that (λm+1 λ)2 (λm + η m,λ λ)2 = (λm λ)2 + 2η m,λ (λm λ) + η2 2 m,λ, and rearranging, we have that m,λ (λ λm) 1 (λm λ)2 (λm+1 λ)2 + η 2 2 m,λ. (14) Let us now fix some t [T]. Since b Lt ( , λ) is convex, for all x Kµ/r it holds that, b Lt xm(t), λm(t) b Lt x, λm(t) x b Lt xm(t), λm(t) xm(t) x , and since b Lt xm(t), is concave, for all λ R+ it holds that, b Lt xm(t), λ b Lt xm(t), λm(t) λ b Lt xm(t), λm(t) λ λm(t) . Combining the last two inequalities we have that for any x Kµ/r and λ R+ it holds that, b Lt xm(t), λ b Lt x, λm(t) x b Lt xm(t), λm(t) xm(t) x + λ b Lt xm(t), λm(t) (λ λm) . Let us introduce the notation Em[ ] = E{ut}t Tm [ ], where we recall that Tm = {(m 1)K + 1, . . . , m K}. That is, Em[ ] denotes the expectation w.r.t. all randomness introduces by the random vectors {ut}t Tm on some block m during the run of the algorithm. Let us also recall that according to Lemma A.1, Em[ m,x] = x b Lm(xm, λm) = P t Tm x b Lt(xm, λm) and Em[ m,λ] = λ b Lm(xm, λm) = P t Tm λ b Lt(xm, λm). Summing the above inequality over t Tm, we have that for any x Kµ/r and λ R+ it holds that, X b Lt (xm, λ) b Lt (x, λm) Em[ m,x] (xm x) + Em[ m,λ] (λ λm) = Em[ m,x] (xm ym) + Em[ m,x] ( ym x) + Em[ m,λ] (λ λm) . Combining the last inequality with Eq. (13) and (14), and noting that xm, ym, λm are independent of the random vectors {ut}t Tm, we have that for every x Kµ/r and λ R+ it holds that, X b Lt (xm, λ) b Lt (x, λm) Em[ m,x (xm ym)] 2η Em h ym x 2 ym+1 x 2i + η 2Em[ m,x 2] 2η Em h (λm λ)2 (λm+1 λ)2i + η 2Em[ 2 m,λ]. Summing the above over m {1, . . . , T/K} and taking expectation w.r.t. all random variables u1, . . . , u T , we have for every x Kµ/r and λ R+ that, b Lt (xm, λ) b Lt (x, λm) m=1 E [ m,x xm ym ] E[ m,x 2] + E[ 2 m,λ] + y1 x 2 + (λ1 λ)2 Projection-Free Online Convex Optimization with Time-Varying Constraints Using Lemma 2.2 we have that xm ym 2 3ϵ for every m. Also, since y1 Kµ/r RB and λ1 = 0, for every x Kµ/r and λ R+ we have that, t=1 b Lt xm(t), λ b Lt x, λm(t) # m=1 E[ m,x ] + 4R2 + λ2 E[ m,x 2] + E[ 2 m,λ] . (15) Using Lemma A.2 we have that for every m [T/K], Em[ m,x ]2 Em[ m,x 2] K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2. Recalling also that m,λ = P t Tm g+ t (zt) δηλm , we obtain t=1 b Lt xm(t), λ b Lt x, λm(t) # K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 # t Tm (g+ t (zt) δηλm) K n(Mf + λm Mg) µ + K(Gf + λm Gg) K n(Mf + λm Mg) 2 + K2(Gf + λm Gg)2 # t Tm g+ t (zt) + K2δ2η2λ2 m where in the last inequality we have used the facts that (a + b)2 2a2 + 2b2 for any a, b R, and a2 + b2 a + b for any a, b R+. Rearranging the RHS in the above using the fact that g+ t (zt) Mg for all t [T], and also using again the inequaity (a + b)2 2a2 + 2b2 gives, RHS of (16) m=1 E[λm] + ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! 
T/K X m=1 E[λ2 m] 3ϵKGf + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Projection-Free Online Convex Optimization with Time-Varying Constraints Finally, using Lemma A.1, for any x Kµ/r and λ R+ we have that, t=1 Lt(xm(t), λ) Lt(x, λm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] Kn Mf µ + KGf( + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) Proof of Theorem 5.1. From Lemma 4.2 we have for every x Kµ/r and λ R+ that, ft(xm(t)) ft(x) + λg+ t (xm(t)) λm(t)g+ t (x) + δηK m=1 λ2 m δη Kn Mg µ + KGg( m=1 E[λm] + ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 ! T/K X m=1 E[λ2 m] 3ϵKGf + η Kn2M 2 f µ2 + ηK2(G2 f + M 2 g ) + µKGf Rearranging, we have that for every x Kµ/r and λ R+ it holds that, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 δηK m=1 E[λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf t=1 g+ t (x)E[λm(t)]. Fix some x b K such that x arg minw b K PT t=1 ft(w) and from now on fix x = (1 µ/r)x Kµ/r. Note that for all t [T], using the convexity of g+ t and the fact that g+ t (x ) = 0 we have that, g+ t (x) = g+ t ((1 µ/r)x + (µ/r)0) (1 µ/r)g+ t (x ) + (µ/r)g+ t (0) µ Projection-Free Online Convex Optimization with Time-Varying Constraints This gives, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( 3ϵ + µ) + µKMg ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 δηK m=1 E[λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf Suppose throughout the rest of the proof that the following condition holds: 2 ηKn2M 2 g µ2 + ηK2G2 g + K2δ2η3 + Kn Mg µ + KGg( 3ϵ + µ) + µKMg n2M 2 g µ2 + KG2 g + η2Kδ2 ! + Kµ Gg + Mg In particular, for the above to hold it suffices that the following holds: ( n2M 2 g µ2 , KG2 g, Kδ2η2, Then we can write, ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) Kn Mg µ + KGg( 3ϵ + µ) + µKMg m=1 E[λm λ2 m] 3ϵGf + η n2M 2 f µ2 + ηK(G2 f + M 2 g ) + µGf Since z z2 1 4 for every z R, it holds that ft(xm(t)) ft(x) + t=1 λg+ t (xm(t)) 3ϵn(Mg + 4Mf) 3ϵ + µ) + µMg 3ϵGf + 4η n2M 2 f µ2 + 4ηK(G2 f + M 2 g ) + 4µGf Note that λ = PT t=1 E[g+ t (xm(t))] δηT +η 1 maximizes the term λ PT t=1 E[g+ t (xm(t))] δη 2 T + 1 2η λ2 in the above inequality as a function of λ. Plugging-in x = (1 µ/r)x and λ into the last inequality, we have that ft(xm(t)) ft((1 µ/r)x ) # PT t=1 E[g+ t (xm(t))] 2 δηT + η 1 RHS of (18). (19) Projection-Free Online Convex Optimization with Time-Varying Constraints Suppose η = c1T 3/4, K = T 1/2, ϵ = 61R2T 1/2 log T, δ = c2T 1/2 log T and µ = c3T 1/4 for some constants c1, c2, c3. This gives RHS of (18) = O R2T 3/4 c1 + T RT 1/4 log Tn(Mg + Mf) c3 + Gg RT 1/4p log T + c3T 1/4 r + Gf RT 1/4p log T + c1T 1/4n2M 2 f c2 3 + c1T 1/4(G2 f + M 2 g ) + c3Gf T 1/4 !! log T n R(Mg + Mf) c3 + R(Gg + Gf) Gf + Gg + Mg + c1n2M 2 f c2 3 + c1(G2 f + M 2 g ) Setting c1 = R n max{Gf ,Gg,Mf ,Mg}, c3 = nr, we obtain RHS of (18) = O r R(Mg + Mf) + R(Gg + Gf) +T 3/4 n max{R/r, R, 1/ r} max{Gf, Gg, Mf, Mg} ! = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p Note that for all t [T], using the convexity of ft, we have that ft(zt) = ft(xm(t) + µut) ft(xm(t)) + µGf, ft((1 µ/r)x ) ft(x ) + µGf Plugging these into (19) yields the regret bound: t=1 ft(zt) ft(x ) RHS of (18) + TµGf = O n max{Gf, Gg, Mf, Mg}T 3/4 max{R/r, R, 1/ r} + R(1/ n + 1/ r) p Using the fact that for all t [T], ft((1 µ/r)x ) ft(xm(t)) 2RGf and g+ t (zt) g+ t (xm(t)) + µGg, we obtain from (19) the constraints violation bound: t=1 g+ t (zt) 2 (δηT + η 1) (RHS of (18) + 2RGf T) + µGg T. (20) Let us now get back to the condition in (17) which we assumed holds true. 
Plugging-in our choices of η, µ, ϵ, K, δ, it suffices that the constant c2 satisfies: ( n M 2 g T 1/2 r , G2 g T 1/2, c2 2R2 n max{Gf, Gg, Mf, Mg}2 log T, 183n max{Gf, Gg, Mf, Mg}2T 1/2 log T r , n max{Gf, Gg, Mf, Mg}2T 1/2( 183R log T + nr) R , n T 1/2 max{Gf, Gg, Mf, Mg}2 Projection-Free Online Convex Optimization with Time-Varying Constraints In particular, for any T large enough (so that the second term inside the max in the RHS is not dominant), it suffices to take 183 max{Gf, Gg, Mf, Mg}2 max{n r , n r, n, n r R}. Plugging this choice into (20), and noting that the RHS of (18) scales (as a function of T) only as T 3/4 log T, we have that for any T large enough, t=1 g+ t (zt) RGf max{Gf, Gg, Mf, Mg} max{R r , R r, R n, 1 r, 1 R}n1/4T 7/8log(T)1/4 ! Finally, since ϵ, K are the same as in Theorem 4.1, we have that the same upper-bound on the total number of calls to the linear optimization oracle applies, which concludes the proof.