# Online Learning with a Hint

Ofer Dekel (Microsoft Research, oferd@microsoft.com) · Arthur Flajolet (Operations Research Center, Massachusetts Institute of Technology, flajolet@mit.edu) · Nika Haghtalab (Computer Science Department, Carnegie Mellon University, nika@cmu.edu) · Patrick Jaillet (EECS, LIDS, ORC, Massachusetts Institute of Technology, jaillet@mit.edu)

Abstract. We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round. We show that the player can benefit from such a hint if the set of feasible actions is sufficiently round. Specifically, if the set is strongly convex, the hint can be used to guarantee a regret of $O(\log T)$, and if the set is $q$-uniformly convex for $q \in (2, 3)$, the hint can be used to guarantee a regret of $o(\sqrt{T})$. In contrast, we establish $\Omega(\sqrt{T})$ lower bounds on regret when the set of feasible actions is a polyhedron.

1 Introduction

Online linear optimization is a canonical problem in online learning. In this setting, a player attempts to minimize an online adversarial sequence of loss functions while incurring a small regret, compared to the best offline solution. Many online algorithms are designed to guarantee $O(\sqrt{T})$ regret in the worst case, and it is known that a regret of $\Omega(\sqrt{T})$ cannot be avoided in the worst case. While this worst-case perspective on online linear optimization has led to elegant algorithms and deep connections to other fields, such as boosting [9, 10] and game theory [4, 2], it can be overly pessimistic. In particular, it does not account for the fact that the player may have side information that allows him to anticipate the upcoming loss functions and evade the $\Omega(\sqrt{T})$ regret. In this work, we go beyond this worst-case analysis and consider online linear optimization when additional information, in the form of a function that is correlated with the loss, is presented to the player.

More formally, online convex optimization [24, 11] is a $T$-round repeated game between a player and an adversary. On each round, the player chooses an action $x_t$ from a convex set of feasible actions $\mathcal{K} \subseteq \mathbb{R}^d$, and the adversary chooses a convex bounded loss function $f_t$. Both choices are revealed and the player incurs a loss of $f_t(x_t)$. The player then uses its knowledge of $f_t$ to adjust its strategy for the subsequent rounds. The player's goal is to accumulate a small loss compared to the best fixed action in hindsight. This value is called regret, and it is a measure of success of the player's algorithm. When the adversary is restricted to Lipschitz loss functions, several algorithms are known to guarantee $O(\sqrt{T})$ regret [24, 16, 11]. If we further restrict the adversary to strongly convex loss functions, the regret bound improves to $O(\log T)$ [14]. However, when the loss functions are linear, no online algorithm can have a regret of $o(\sqrt{T})$ [5]. In this sense, linear loss functions are the most difficult convex loss functions to handle [24].

In this paper, we focus on the case where the adversary is restricted to linear Lipschitz loss functions. More specifically, we assume that the loss function $f_t(x)$ takes the form $c_t^\top x$, where $c_t$ is a bounded loss vector in $\mathcal{C} \subseteq \mathbb{R}^d$. We further assume that the player receives a hint before choosing the action on each round.
The hint in our setting is a vector that is guaranteed to be weakly correlated with the loss vector. Namely, at the beginning of round $t$, the player observes a unit-length vector $v_t \in \mathbb{R}^d$ such that $v_t^\top c_t \ge \alpha \|c_t\|_2$, where $\alpha$ is a small positive constant. So long as this requirement is met, the hint could be chosen maliciously, possibly by an adversary who knows how the player's algorithm uses the hint. Our goal is to develop a player strategy that takes these hints into account, and to understand when hints of this type make the problem provably easier and lead to smaller regret.

We show that the player's ability to benefit from the hints depends on the geometry of the player's action set $\mathcal{K}$. Specifically, we characterize the roundness of the set $\mathcal{K}$ using the notion of $(C, q)$-uniform convexity for convex sets. In Section 3, we show that if $\mathcal{K}$ is a $(C, 2)$-uniformly convex set (or, in other words, if $\mathcal{K}$ is a $C$-strongly convex set), then we can use the hint to design a player strategy that improves the regret guarantee to $O\big((C\alpha)^{-1} \log T\big)$, where our $O(\cdot)$ notation hides a polynomial dependence on the dimension $d$ and other constants. Furthermore, as we show in Section 4, if $\mathcal{K}$ is a $(C, q)$-uniformly convex set for $q \in (2, 3)$, we can use the hint to improve the regret to $O\big((C\alpha)^{\frac{1}{1-q}}\, T^{\frac{2-q}{1-q}}\big)$, when the hint belongs to a small set of possible hints at every step. In Section 5, we prove lower bounds on the regret of any online algorithm in this model. We first show that when $\mathcal{K}$ is a polyhedron, such as an $L_1$ ball, even a stronger form of hint cannot guarantee a regret of $o(\sqrt{T})$. Next, we prove a lower bound of $\Omega(\log T)$ on the regret when $\mathcal{K}$ is strongly convex.

1.1 Comparison with Other Notions of Hints

The notion of hint that we introduce in this work generalizes some of the notions of predictability in online learning. Hazan and Megiddo [13] considered, as an example, a setting where the player knows the first coordinate of the loss vector at all rounds, and showed that when $|c_{t,1}| \ge \alpha$ and the set of feasible actions is the Euclidean ball, one can achieve a regret of $O(\frac{1}{\alpha} \log T)$. Our work directly improves over this result: in our setting a hint $v_t = \pm e_1$ also achieves $O(\frac{1}{\alpha} \log T)$ regret, but we can deal with hints in different directions at different rounds and we allow for general uniformly convex action sets.

Rakhlin and Sridharan [20] considered online learning with predictable sequences, with a notion of predictability that is concerned with the gradient of the convex loss functions. They show that if the player receives a hint $M_t$ at round $t$, then the regret of their algorithm is at most $O\Big(\sqrt{\sum_{t=1}^T \|\nabla f_t(x_t) - M_t\|^2}\Big)$.

Figure 1: Comparison between the notions of hint.

In the case of linear loss functions, this implies that having an estimate vector $c'_t$ of the loss vector within distance $\sigma$ of the true loss vector $c_t$ results in an improved regret bound of $O(\sigma \sqrt{T})$. In contrast, we consider a notion of hint that pertains to the direction of the loss vector rather than its location. Our work shows that merely knowing whether the loss vector positively or negatively correlates with another vector is sufficient to achieve an improved regret bound when the set is uniformly convex. That is, rather than having access to an approximate value of $c_t$, we only need access to a halfspace that classifies $c_t$ correctly with a margin. This notion of hint is weaker than the notion of hint in the work of Rakhlin and Sridharan [20] when the approximation error satisfies $\|c_t - c'_t\|_2 \le \sigma \|c_t\|_2$ for $\sigma \in [0, 1)$.
In this case, one can use $c'_t / \|c'_t\|_2$ as the direction of the hint in our setting and achieve a regret of $O(\frac{1}{1-\sigma} \log T)$ when the set is strongly convex. This shows that when the set of feasible actions is strongly convex, a directional hint can improve the regret bound beyond what has been known to be achievable with an approximation hint. However, we note that our results require the hints to be always valid, whereas the algorithm of Rakhlin and Sridharan [19] can adapt to the quality of the hints. We discuss these works and other related works, such as [15], in more detail in Appendix A.

2 Preliminaries

We begin with a more formal definition of online linear optimization (without hints). Let $\mathcal{A}$ denote the player's algorithm for choosing its actions. On round $t$, the player uses $\mathcal{A}$ and all of the information it has observed so far to choose an action $x_t$ in a convex compact set $\mathcal{K} \subseteq \mathbb{R}^d$. Subsequently, the adversary chooses a loss vector $c_t$ in a compact set $\mathcal{C} \subseteq \mathbb{R}^d$. The player and the adversary reveal their actions and the player incurs the loss $c_t^\top x_t$. The player's regret is defined as
\[ R(\mathcal{A}, c_{1:T}) = \sum_{t=1}^T c_t^\top x_t - \min_{x \in \mathcal{K}} \sum_{t=1}^T c_t^\top x. \]
In online linear optimization with hints, the player observes $v_t \in \mathbb{R}^d$ with $\|v_t\|_2 = 1$ before choosing $x_t$, with the guarantee that $v_t^\top c_t \ge \alpha \|c_t\|_2$ for some $\alpha > 0$.

We use uniform convexity to characterize the degree of convexity of the player's action set $\mathcal{K}$. Informally, uniform convexity requires that the midpoint of any two points $x$ and $y$ on the boundary of $\mathcal{K}$ be sufficiently far from the boundary. A formal definition is given below.

Definition 2.1 (Pisier [18]). Let $\mathcal{K}$ be a convex set that is symmetric around the origin. $\mathcal{K}$ and the Banach space defined by $\mathcal{K}$ are said to be uniformly convex if for any $0 < \epsilon < 2$ there exists a $\delta > 0$ such that for any pair of points $x, y \in \mathcal{K}$ with $\|x\|_{\mathcal{K}} \le 1$, $\|y\|_{\mathcal{K}} \le 1$, and $\|x - y\|_{\mathcal{K}} \ge \epsilon$, we have $\big\|\frac{x+y}{2}\big\|_{\mathcal{K}} \le 1 - \delta$. The modulus of uniform convexity $\delta_{\mathcal{K}}(\epsilon)$ is the best possible $\delta$ for that $\epsilon$, i.e.,
\[ \delta_{\mathcal{K}}(\epsilon) = \inf\Big\{ 1 - \Big\|\tfrac{x+y}{2}\Big\|_{\mathcal{K}} \;:\; \|x\|_{\mathcal{K}} \le 1,\ \|y\|_{\mathcal{K}} \le 1,\ \|x - y\|_{\mathcal{K}} \ge \epsilon \Big\}. \]

For brevity, we say that $\mathcal{K}$ is $(C, q)$-uniformly convex when $\delta_{\mathcal{K}}(\epsilon) = C\epsilon^q$, and we omit $C$ when it is clear from the context. Examples of uniformly convex sets include $L_p$ balls for any $1 < p < \infty$, with modulus of convexity $\delta_{L_p}(\epsilon) = C_p \epsilon^p$ for $p \ge 2$ and a constant $C_p$, and $\delta_{L_p}(\epsilon) = (p-1)\epsilon^2$ for $1 < p \le 2$. On the other hand, the $L_1$ and $L_\infty$ unit balls are not uniformly convex. When $\delta_{\mathcal{K}}(\epsilon) \in \Theta(\epsilon^2)$, we say that $\mathcal{K}$ is strongly convex.

Another notion of convexity we use in this work is exp-concavity. A function $f : \mathcal{K} \to \mathbb{R}$ is exp-concave with parameter $\beta > 0$ if $\exp(-\beta f(x))$ is a concave function of $x \in \mathcal{K}$. This is a weaker requirement than strong convexity when the gradient of $f$ is uniformly bounded [14]. The next proposition shows that we can obtain regret bounds of order $O(\log T)$ in online convex optimization when the loss functions are exp-concave with parameter $\beta$.

Proposition 2.2 ([14]). Consider online convex optimization on a sequence of loss functions $f_1, \dots, f_T$ over a feasible set $\mathcal{K} \subseteq \mathbb{R}^d$ such that, for all $t$, $f_t : \mathcal{K} \to \mathbb{R}$ is exp-concave with parameter $\beta > 0$. There is an algorithm with runtime polynomial in $d$, which we call $\mathcal{A}_{\mathrm{EXP}}$, whose regret is at most $\frac{d}{\beta}(1 + \log(T+1))$.

Throughout this work, we draw intuition from basic orthogonal geometry. Given any vector $x$ and a hint $v$, we define $x^{\parallel v} = (x^\top v)\, v$ and $x^{\perp v} = x - (x^\top v)\, v$ as the parallel and the orthogonal components of $x$ with respect to $v$. When the hint $v$ is clear from the context, we simply write $x^{\parallel}$ and $x^{\perp}$. Naturally, our regret bounds involve a number of geometric parameters. Since the set of actions of the adversary, $\mathcal{C}$, is compact, we can find $G \ge 0$ such that $\|c\|_2 \le G$ for all $c \in \mathcal{C}$. When $\mathcal{K}$ is uniformly convex, we denote by $\|\cdot\|_{\mathcal{K}}$ the norm whose unit ball is $\mathcal{K}$, so that $\mathcal{K} = \{w \in \mathbb{R}^d \mid \|w\|_{\mathcal{K}} \le 1\}$. In this case, since all norms are equivalent in finite dimension, there exist $R > 0$ and $r > 0$ such that $B_r \subseteq \mathcal{K} \subseteq B_R$, where $B_r$ (resp. $B_R$) denotes the $L_2$ ball of radius $r$ (resp. $R$) centered at the origin. This implies that $\frac{1}{R}\|x\|_2 \le \|x\|_{\mathcal{K}} \le \frac{1}{r}\|x\|_2$ for all $x \in \mathbb{R}^d$.
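For concreteness, the following minimal numpy sketch implements the parallel/orthogonal decomposition and the hint-validity check described above; the helper names are ours and the snippet is purely illustrative.

```python
import numpy as np

def decompose(x, v):
    """Split x into its components parallel and orthogonal to a unit-length vector v."""
    x_par = (x @ v) * v      # x^{|| v} = (x^T v) v
    x_perp = x - x_par       # x^{perp v} = x - (x^T v) v
    return x_par, x_perp

def is_valid_hint(v, c, alpha):
    """Check the hint condition v^T c >= alpha * ||c||_2."""
    return v @ c >= alpha * np.linalg.norm(c)

# A hint that is only weakly correlated with the loss vector.
c = np.array([1.0, -0.5])
v = np.array([0.6, -0.8])                 # unit length
print(is_valid_hint(v, c, alpha=0.1))     # True: v^T c = 1.0 >= 0.1 * ||c||_2
print(decompose(c, v))
```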
3 Improved Regret Bounds for Strongly Convex K

At first sight, it is not immediately clear how one should use the hint. Since $v_t$ is guaranteed to satisfy $c_t^\top v_t \ge \alpha \|c_t\|_2$, moving the action $x$ in the direction $-v_t$ always decreases the loss. One could hope to get the most benefit out of the hint by choosing $x_t$ to be the extremal point of $\mathcal{K}$ in the direction $-v_t$. However, this naive strategy could lead to a linear regret in the worst case. For example, say that $c_t = (1, \frac{1}{2})$ and $v_t = (0, 1)$ for all $t$, and let $\mathcal{K}$ be the Euclidean unit ball. Choosing $x_t = -v_t$ would incur a loss of $-\frac{T}{2}$, while the best fixed action in hindsight, the point $\big(-\frac{2}{\sqrt{5}}, -\frac{1}{\sqrt{5}}\big)$, would incur a loss of $-\frac{\sqrt{5}}{2} T$. The player's regret would therefore be $\frac{\sqrt{5}-1}{2} T = \Omega(T)$.

Intuitively, the flaw of this naive strategy is that the hint does not give the player any information about the $(d-1)$-dimensional subspace orthogonal to $v_t$. Our solution is to use standard online learning machinery to learn how to act in this orthogonal subspace. Specifically, on round $t$, we use $v_t$ to define the following virtual loss function:
\[ \hat{c}_t(x) = \min_{w \in \mathcal{K}} c_t^\top w \quad \text{s.t.} \quad w^{\perp v_t} = x^{\perp v_t}. \]

Figure 2: The virtual loss function $\hat{c}(\cdot)$.

In words, we consider the one-dimensional subspace spanned by $v_t$ and its $(d-1)$-dimensional orthogonal subspace separately. For any action $x \in \mathcal{K}$, we find another point $w \in \mathcal{K}$ that equals $x$ in the $(d-1)$-dimensional orthogonal subspace, but otherwise incurs the optimal loss. The value of the virtual loss $\hat{c}_t(x)$ is defined to be the value of the original loss function $c_t$ at $w$. The virtual loss simulates the process of moving $x$ as far as possible in the direction $-v_t$ without changing its value in any other direction (see Figure 2). This can be seen more formally from the following equation:
\[ \operatorname*{arg\,min}_{w \in \mathcal{K}:\, w^{\perp} = x^{\perp}} c_t^\top w \;=\; \operatorname*{arg\,min}_{w \in \mathcal{K}:\, w^{\perp} = x^{\perp}} (c_t^{\parallel})^\top w \;=\; \operatorname*{arg\,min}_{w \in \mathcal{K}:\, w^{\perp} = x^{\perp}} v_t^\top w, \tag{1} \]
where the first transition holds because $(c_t^{\perp})^\top w$ is constant over the constraint set, and the last transition holds by the fact that $c_t^{\parallel} = \|c_t^{\parallel}\|_2\, v_t$, since the hint is valid.

Figure 3: Uniform convexity of the feasible set affects the convexity of the virtual loss function.

This provides an intuitive understanding of the convexity of our virtual loss functions. When $\mathcal{K}$ is uniformly convex, the function $\hat{c}_t(\cdot)$ demonstrates convexity in the subspace orthogonal to $v_t$. To see this, note that for any $x$ and $y$ that lie in the subspace orthogonal to $v_t$, their midpoint $\frac{x+y}{2}$ transforms to a point that is farther along the direction $-v_t$ than the midpoint of the transformations of $x$ and $y$. As shown in Figure 3, the modulus of uniform convexity of $\mathcal{K}$ affects the degree of convexity of $\hat{c}_t(\cdot)$. We note, however, that $\hat{c}_t(\cdot)$ is not strongly convex in all directions; in fact, $\hat{c}_t(\cdot)$ is constant in the direction of $v_t$. Nevertheless, the properties shown here allude to the fact that $\hat{c}_t(\cdot)$ demonstrates some notion of convexity.
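To make the definition concrete, the following sketch evaluates the virtual loss in the special case where $\mathcal{K}$ is the Euclidean unit ball, where the inner minimization has a closed form; for a general uniformly convex $\mathcal{K}$ the minimization is a one-dimensional search along $v_t$ inside $\mathcal{K}$. The function name is ours and the snippet is only an illustration.

```python
import numpy as np

def virtual_loss_ball(x, c, v):
    """Virtual loss c_hat(x) = min { c^T w : ||w||_2 <= 1, w^perp = x^perp },
    specialized to the Euclidean unit ball and a unit-length hint v with c^T v > 0."""
    x_perp = x - (x @ v) * v                          # component of x orthogonal to v
    slack = np.sqrt(max(0.0, 1.0 - x_perp @ x_perp))  # room left along the v direction
    w = x_perp - slack * v                            # move as far as possible along -v
    return c @ w, w

# The example from the text: c_t = (1, 1/2), v_t = (0, 1).
c, v = np.array([1.0, 0.5]), np.array([0.0, 1.0])
value, w = virtual_loss_ball(np.array([0.6, 0.8]), c, v)
print(value, w)   # the virtual loss depends on x only through x_perp = (0.6, 0)
```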
As we show in the next lemma, this notion is indeed exp-concavity.

Lemma 3.1. If $\mathcal{K}$ is $(C, 2)$-uniformly convex, then $\hat{c}_t(\cdot)$ is $\frac{8 \alpha C r}{G R^2}$-exp-concave.

Proof. Let $\gamma = \frac{8 \alpha C r}{G R^2}$. Without loss of generality, we assume that $c_t \ne 0$; otherwise $\hat{c}_t(\cdot) = 0$ is a constant function and the claim follows immediately. Based on the above discussion, it is not hard to see that $\hat{c}_t(\cdot)$ is continuous (we prove this in more detail in Appendix D.1). So, to prove that $\hat{c}_t(\cdot)$ is exp-concave, it is sufficient to show that
\[ \exp\Big(-\gamma\, \hat{c}_t\big(\tfrac{x+y}{2}\big)\Big) \;\ge\; \tfrac{1}{2} \exp\big(-\gamma\, \hat{c}_t(x)\big) + \tfrac{1}{2} \exp\big(-\gamma\, \hat{c}_t(y)\big) \qquad \text{for all } x, y \in \mathcal{K}. \]
Consider $x, y \in \mathcal{K}$ and choose corresponding $\hat{x}, \hat{y} \in \mathcal{K}$ such that $\hat{c}_t(x) = c_t^\top \hat{x}$ and $\hat{c}_t(y) = c_t^\top \hat{y}$. Without loss of generality, we have $\|\hat{x}\|_{\mathcal{K}} = \|\hat{y}\|_{\mathcal{K}} = 1$, as we can always choose corresponding $\hat{x}, \hat{y}$ that are extreme points of $\mathcal{K}$. Since $\exp(-\gamma\,\cdot\,)$ is decreasing in $\hat{c}_t(\cdot)$, we have
\[ \exp\Big(-\gamma\, \hat{c}_t\big(\tfrac{x+y}{2}\big)\Big) = \max_{w \in \mathcal{K}:\, w^{\perp} = (\frac{x+y}{2})^{\perp}} \exp\big(-\gamma\, c_t^\top w\big). \tag{2} \]
Note that $w = \frac{\hat{x}+\hat{y}}{2} - \delta_{\mathcal{K}}(\|\hat{x}-\hat{y}\|_{\mathcal{K}})\, \frac{v_t}{\|v_t\|_{\mathcal{K}}}$ satisfies $\|w\|_{\mathcal{K}} \le 1$, since $\|w\|_{\mathcal{K}} \le \big\|\frac{\hat{x}+\hat{y}}{2}\big\|_{\mathcal{K}} + \delta_{\mathcal{K}}(\|\hat{x}-\hat{y}\|_{\mathcal{K}}) \le 1$ (see also Figure 3). Moreover, $w^{\perp} = \big(\frac{x+y}{2}\big)^{\perp}$, since $\hat{x}^{\perp} = x^{\perp}$, $\hat{y}^{\perp} = y^{\perp}$, and the correction term is parallel to $v_t$. So, by using this $w$ in Equation (2), we have
\[ \exp\Big(-\gamma\, \hat{c}_t\big(\tfrac{x+y}{2}\big)\Big) \;\ge\; \exp\Big(-\tfrac{\gamma}{2}\big(c_t^\top \hat{x} + c_t^\top \hat{y}\big) + \gamma\, \tfrac{c_t^\top v_t}{\|v_t\|_{\mathcal{K}}}\, \delta_{\mathcal{K}}(\|\hat{x}-\hat{y}\|_{\mathcal{K}})\Big). \tag{3} \]
On the other hand, since $\|v_t\|_{\mathcal{K}} \le \frac{1}{r}$ and $\|\hat{x}-\hat{y}\|_{\mathcal{K}} \ge \frac{1}{R}\|\hat{x}-\hat{y}\|_2$, we have
\[ \exp\Big(\gamma\, \tfrac{c_t^\top v_t}{\|v_t\|_{\mathcal{K}}}\, \delta_{\mathcal{K}}(\|\hat{x}-\hat{y}\|_{\mathcal{K}})\Big) \;\ge\; \exp\Big(\gamma\, r\, \alpha\, \|c_t\|_2\, C\, \tfrac{\|\hat{x}-\hat{y}\|_2^2}{R^2}\Big) \;\ge\; \exp\Big(\tfrac{(\gamma/2)^2}{2}\, \big(c_t^\top \hat{x} - c_t^\top \hat{y}\big)^2\Big) \;\ge\; \tfrac{1}{2} \exp\Big(\tfrac{\gamma}{2}\big(c_t^\top \hat{x} - c_t^\top \hat{y}\big)\Big) + \tfrac{1}{2} \exp\Big(\tfrac{\gamma}{2}\big(c_t^\top \hat{y} - c_t^\top \hat{x}\big)\Big), \]
where the penultimate inequality follows from the definition of $\gamma$ together with $\|c_t\|_2 \le G$ and $(c_t^\top \hat{x} - c_t^\top \hat{y})^2 \le \|c_t\|_2^2\, \|\hat{x}-\hat{y}\|_2^2$, and the last inequality is a consequence of the inequality $\exp(z^2/2) \ge \frac{1}{2}\exp(z) + \frac{1}{2}\exp(-z)$ for all $z \in \mathbb{R}$. Plugging the last inequality into (3) yields
\[ \exp\Big(-\gamma\, \hat{c}_t\big(\tfrac{x+y}{2}\big)\Big) \;\ge\; \exp\Big(-\tfrac{\gamma}{2}\big(c_t^\top \hat{x} + c_t^\top \hat{y}\big)\Big) \Big\{ \tfrac{1}{2} \exp\Big(\tfrac{\gamma}{2}\big(c_t^\top \hat{x} - c_t^\top \hat{y}\big)\Big) + \tfrac{1}{2} \exp\Big(\tfrac{\gamma}{2}\big(c_t^\top \hat{y} - c_t^\top \hat{x}\big)\Big) \Big\} = \tfrac{1}{2} \exp\big(-\gamma\, c_t^\top \hat{y}\big) + \tfrac{1}{2} \exp\big(-\gamma\, c_t^\top \hat{x}\big) = \tfrac{1}{2} \exp\big(-\gamma\, \hat{c}_t(y)\big) + \tfrac{1}{2} \exp\big(-\gamma\, \hat{c}_t(x)\big), \]
which concludes the proof.

Now, we use the sequence of virtual loss functions to reduce our problem to a standard online convex optimization problem (without hints). Namely, the player applies $\mathcal{A}_{\mathrm{EXP}}$ (from Proposition 2.2), an online convex optimization algorithm known to have $O(\log T)$ regret with respect to exp-concave functions, to the sequence of virtual loss functions. Our algorithm then takes the action $\hat{x}_t \in \mathcal{K}$ that is prescribed by $\mathcal{A}_{\mathrm{EXP}}$ and moves it as far as possible in the direction $-v_t$. This process is formalized in Algorithm 1.

Algorithm 1 $\mathcal{A}_{\mathrm{hint}}$ for strongly convex $\mathcal{K}$
For $t = 1, \dots, T$:
1. Use algorithm $\mathcal{A}_{\mathrm{EXP}}$ with the history $\hat{c}_\tau(\cdot)$ for $\tau < t$, and let $\hat{x}_t$ be the chosen action.
2. Let $x_t = \operatorname{arg\,min}_{w \in \mathcal{K}} v_t^\top w$ s.t. $w^{\perp v_t} = \hat{x}_t^{\perp v_t}$. Play $x_t$ and receive $c_t$ as feedback.

Next, we show that the regret of algorithm $\mathcal{A}_{\mathrm{EXP}}$ on the sequence of virtual loss functions is an upper bound on the regret of Algorithm 1.

Lemma 3.2. For any sequence of loss functions $c_1, \dots, c_T$, let $R(\mathcal{A}_{\mathrm{hint}}, c_{1:T})$ be the regret of algorithm $\mathcal{A}_{\mathrm{hint}}$ on the sequence $c_1, \dots, c_T$, and let $R(\mathcal{A}_{\mathrm{EXP}}, \hat{c}_{1:T})$ be the regret of algorithm $\mathcal{A}_{\mathrm{EXP}}$ on the sequence of virtual loss functions $\hat{c}_1, \dots, \hat{c}_T$. Then, $R(\mathcal{A}_{\mathrm{hint}}, c_{1:T}) \le R(\mathcal{A}_{\mathrm{EXP}}, \hat{c}_{1:T})$.

Proof. Equation (1) provides an equivalent definition $x_t = \operatorname{arg\,min}_{w \in \mathcal{K}} c_t^\top w$ s.t. $w^{\perp v_t} = \hat{x}_t^{\perp v_t}$. Using this, we show that the loss of algorithm $\mathcal{A}_{\mathrm{hint}}$ on the sequence $c_{1:T}$ is the same as the loss of algorithm $\mathcal{A}_{\mathrm{EXP}}$ on the sequence $\hat{c}_{1:T}$:
\[ \sum_{t=1}^T \hat{c}_t(\hat{x}_t) = \sum_{t=1}^T \min_{w \in \mathcal{K}:\, w^{\perp} = \hat{x}_t^{\perp}} c_t^\top w = \sum_{t=1}^T c_t^\top \Big( \operatorname*{arg\,min}_{w \in \mathcal{K}:\, w^{\perp} = \hat{x}_t^{\perp}} c_t^\top w \Big) = \sum_{t=1}^T c_t^\top x_t. \]
Next, we show that the offline optimum on the sequence $\hat{c}_{1:T}$ is more competitive than the offline optimum on the sequence $c_{1:T}$. First note that for any $x$ and $t$, $\hat{c}_t(x) = \min_{w \in \mathcal{K}:\, w^{\perp} = x^{\perp}} c_t^\top w \le c_t^\top x$. Therefore, $\min_{x \in \mathcal{K}} \sum_{t=1}^T \hat{c}_t(x) \le \min_{x \in \mathcal{K}} \sum_{t=1}^T c_t^\top x$. The proof concludes by
\[ R(\mathcal{A}_{\mathrm{hint}}, c_{1:T}) = \sum_{t=1}^T c_t^\top x_t - \min_{x \in \mathcal{K}} \sum_{t=1}^T c_t^\top x \;\le\; \sum_{t=1}^T \hat{c}_t(\hat{x}_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^T \hat{c}_t(x) = R(\mathcal{A}_{\mathrm{EXP}}, \hat{c}_{1:T}). \]
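The following sketch illustrates one round of Algorithm 1, again specialized to the Euclidean unit ball; the exp-concave learner $\mathcal{A}_{\mathrm{EXP}}$ (e.g., Online Newton Step [14]) is abstracted behind a toy placeholder class of our own, so the snippet demonstrates only the hint-exploiting step, not the full regret guarantee.

```python
import numpy as np

class PlaceholderLearner:
    """Stands in for A_EXP; it merely replays the average of past points so that the
    reduction below is runnable. The actual algorithm feeds A_EXP the virtual losses."""
    def __init__(self, d):
        self.past = [np.zeros(d)]
    def predict(self):
        return np.mean(self.past, axis=0)
    def update(self, point):
        self.past.append(point)

def hint_step(x_hat, v_t):
    """Step 2 of Algorithm 1 on the Euclidean unit ball: keep the component of x_hat
    orthogonal to v_t and move as far as possible along -v_t."""
    x_perp = x_hat - (x_hat @ v_t) * v_t
    return x_perp - np.sqrt(max(0.0, 1.0 - x_perp @ x_perp)) * v_t

# One round: the played point never does worse than the learner's prediction.
learner = PlaceholderLearner(d=2)
v_t, c_t = np.array([0.0, 1.0]), np.array([1.0, 0.5])   # valid hint: c_t^T v_t > 0
x_hat = learner.predict()
x_t = hint_step(x_hat, v_t)
assert c_t @ x_t <= c_t @ x_hat + 1e-12
learner.update(x_t)
```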
Our main result follows from the application of Lemmas 3.1 and 3.2.

Theorem 3.3. Suppose that $\mathcal{K} \subseteq \mathbb{R}^d$ is a $(C, 2)$-uniformly convex set that is symmetric around the origin, with $B_r \subseteq \mathcal{K} \subseteq B_R$ for some $r$ and $R$. Consider online linear optimization with hints where the cost vector at round $t$ satisfies $\|c_t\|_2 \le G$ and the hint $v_t$ is such that $c_t^\top v_t \ge \alpha \|c_t\|_2$ and $\|v_t\|_2 = 1$. Algorithm 1 in combination with $\mathcal{A}_{\mathrm{EXP}}$ has a worst-case regret of
\[ R(\mathcal{A}_{\mathrm{hint}}, c_{1:T}) \le \frac{d\, G\, R^2}{8\, \alpha\, C\, r}\, \big(1 + \log(T+1)\big). \]

Since $\mathcal{A}_{\mathrm{EXP}}$ requires the coefficient of exp-concavity to be given as an input, $\alpha$ needs to be known a priori to be able to use Algorithm 1. However, we can use a standard doubling trick to relax this requirement and derive the same asymptotic regret bound. We defer the presentation of this argument to Appendix B.

4 Improved Regret Bounds for (C, q)-Uniformly Convex K

In this section, we consider any feasible set $\mathcal{K}$ that is $(C, q)$-uniformly convex for $q \ge 2$. Our results differ from those of the previous section in two aspects. First, our algorithm can be used with $(C, q)$-uniformly convex feasible sets for any $q \ge 2$, compared to the results of the previous section that only hold for strongly convex sets ($q = 2$). On the other hand, the approach in this section requires the hints to be restricted to a finite set of vectors $\mathcal{V}$. We show that when $\mathcal{K}$ is $(C, q)$-uniformly convex for $q > 2$, our regret is $O(T^{\frac{q-2}{q-1}})$. If $q \in (2, 3)$, this is an improvement over the worst-case regret of $O(\sqrt{T})$ guaranteed in the absence of hints.

We first consider the scenario where the hint always points in the same direction, i.e., $v_t = v$ for some $v$ and all $t \in [T]$. In this case, we show how one can use a simple algorithm that picks the best-performing action so far (a.k.a. the Follow-The-Leader algorithm) to obtain improved regret bounds. We then consider the case where the hint belongs to a finite set $\mathcal{V}$. In this case, we instantiate one copy of the Follow-The-Leader algorithm for each $v \in \mathcal{V}$ and combine their outcomes in order to obtain improved regret bounds that depend on the cardinality of $\mathcal{V}$, which we denote by $|\mathcal{V}|$.

Lemma 4.1. Suppose that $v_t = v$ for all $t = 1, \dots, T$ and that $\mathcal{K}$ is a $(C, q)$-uniformly convex set that is symmetric around the origin, with $B_r \subseteq \mathcal{K} \subseteq B_R$ for some $r$ and $R$. Consider the algorithm, called Follow-The-Leader (FTL), that at every round $t$ plays $x_t \in \operatorname{arg\,min}_{x \in \mathcal{K}} \sum_{\tau < t} c_\tau^\top x$. If $\sum_{\tau=1}^t v^\top c_\tau \ge 0$ for all $t$, then
\[ R(\mathcal{A}_{\mathrm{FTL}}, c_{1:T}) \;\le\; \sum_{t=1}^T \frac{1}{(2\gamma/\alpha)^{1/(q-1)}} \left( \frac{\|c_t\|_2^q}{\sum_{\tau=1}^t v^\top c_\tau} \right)^{1/(q-1)}, \]
where $\gamma = \frac{C\, \alpha}{\|v\|_{\mathcal{K}}\, R^q}$. In particular, if $v^\top c_t \ge \alpha \|c_t\|_2$ for all $t = 1, \dots, T$, then $R(\mathcal{A}_{\mathrm{FTL}}, c_{1:T}) \le \frac{G}{2\gamma}\, (\ln T + 1)$ when $q = 2$, and $R(\mathcal{A}_{\mathrm{FTL}}, c_{1:T}) \le \big(\frac{1}{2\gamma}\big)^{\frac{1}{q-1}}\, G\, \frac{q-1}{q-2}\, T^{\frac{q-2}{q-1}}$ when $q > 2$.

Proof. We use a well-known inequality, known as the FT(R)L Lemma (see, e.g., [12, 17]), on the regret incurred by the FTL algorithm:
\[ R(\mathcal{A}_{\mathrm{FTL}}, c_{1:T}) \le \sum_{t=1}^T c_t^\top (x_t - x_{t+1}). \]
Without loss of generality, we can assume that $\|x_t\|_{\mathcal{K}} = \|x_{t+1}\|_{\mathcal{K}} = 1$, since the minimum of a linear function is attained at a boundary point. Since $\mathcal{K}$ is $(C, q)$-uniformly convex, we have $\big\|\frac{x_t + x_{t+1}}{2}\big\|_{\mathcal{K}} \le 1 - \delta_{\mathcal{K}}(\|x_t - x_{t+1}\|_{\mathcal{K}})$. This implies that
\[ \frac{x_t + x_{t+1}}{2} - \delta_{\mathcal{K}}(\|x_t - x_{t+1}\|_{\mathcal{K}})\, \frac{v}{\|v\|_{\mathcal{K}}} \;\in\; \mathcal{K}. \]
Moreover, $x_{t+1} \in \operatorname{arg\,min}_{x \in \mathcal{K}} x^\top \sum_{\tau=1}^t c_\tau$. So, we have
\[ \left( \frac{x_t + x_{t+1}}{2} - \delta_{\mathcal{K}}(\|x_t - x_{t+1}\|_{\mathcal{K}})\, \frac{v}{\|v\|_{\mathcal{K}}} \right)^{\!\top} \sum_{\tau=1}^t c_\tau \;\ge\; \inf_{x \in \mathcal{K}} x^\top \sum_{\tau=1}^t c_\tau \;=\; x_{t+1}^\top \sum_{\tau=1}^t c_\tau. \]
Rearranging this last inequality and using the fact that $\sum_{\tau=1}^t v^\top c_\tau \ge 0$, we obtain
\[ \frac{C\, \|x_t - x_{t+1}\|_2^q}{\|v\|_{\mathcal{K}}\, R^q}\, \sum_{\tau=1}^t v^\top c_\tau \;\le\; \delta_{\mathcal{K}}(\|x_t - x_{t+1}\|_{\mathcal{K}})\, \frac{\sum_{\tau=1}^t v^\top c_\tau}{\|v\|_{\mathcal{K}}} \;\le\; \frac{1}{2}\, (x_t - x_{t+1})^\top \sum_{\tau=1}^t c_\tau, \]
where the first inequality uses $\delta_{\mathcal{K}}(\epsilon) = C\epsilon^q$ and $\|x_t - x_{t+1}\|_{\mathcal{K}} \ge \frac{1}{R}\|x_t - x_{t+1}\|_2$. By definition of FTL, we have $x_t \in \operatorname{arg\,min}_{x \in \mathcal{K}} x^\top \sum_{\tau=1}^{t-1} c_\tau$, which implies
\[ (x_t - x_{t+1})^\top \sum_{\tau=1}^{t-1} c_\tau \;\le\; 0. \]
Summing up the last two inequalities, applying the Cauchy-Schwarz inequality, and setting $\gamma = \frac{C\, \alpha}{\|v\|_{\mathcal{K}}\, R^q}$, we derive
\[ \frac{\gamma}{\alpha}\, \|x_t - x_{t+1}\|_2^q\, \sum_{\tau=1}^t v^\top c_\tau \;\le\; \frac{1}{2}\, c_t^\top (x_t - x_{t+1}) \;\le\; \frac{1}{2}\, \|c_t\|_2\, \|x_t - x_{t+1}\|_2. \]
Rearranging this last inequality and using the fact that $\sum_{\tau=1}^t v^\top c_\tau \ge 0$, we obtain
\[ |c_t^\top (x_t - x_{t+1})| \;\le\; \frac{1}{(2\gamma/\alpha)^{1/(q-1)}} \left( \frac{\|c_t\|_2^q}{\sum_{\tau=1}^t v^\top c_\tau} \right)^{1/(q-1)}. \tag{4} \]
Summing (4) over all $t$ completes the proof of the first claim. The regret bounds for the case $v^\top c_t \ge \alpha \|c_t\|_2$ for all $t = 1, \dots, T$ follow from the first regret bound. We defer this part of the proof to Appendix D.2.

Note that the regret bounds become $O(T)$ when $q \to \infty$. This is expected because $L_q$ balls are $q$-uniformly convex for $q \ge 2$ and converge to $L_\infty$ balls as $q \to \infty$, and it is well known that Follow-The-Leader yields $\Theta(T)$ regret in online linear optimization when $\mathcal{K}$ is an $L_\infty$ ball.
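Lemma 4.1 analyzes plain Follow-The-Leader when every hint points in the same direction. The sketch below specializes FTL to an $L_p$ unit ball, where the leader has a closed form via the equality case of Hölder's inequality; the function names and the small experiment are ours and serve only to illustrate the update rule, not to reproduce the analysis.

```python
import numpy as np

def linear_min_lp_ball(g, p):
    """argmin of g^T x over the L_p unit ball (1 < p < inf), via Hoelder's equality case."""
    if np.allclose(g, 0.0):
        return np.zeros_like(g)
    q = p / (p - 1.0)                          # dual exponent of p
    x = -np.sign(g) * np.abs(g) ** (q - 1.0)   # |x_i|^p proportional to |g_i|^q
    return x / np.linalg.norm(x, ord=p)        # scale onto the boundary of the ball

def ftl(losses, p):
    """Follow-The-Leader on the L_p unit ball: x_t minimizes the cumulative past loss."""
    plays, cum = [], np.zeros_like(losses[0])
    for c in losses:
        plays.append(linear_min_lp_ball(cum, p))   # uses c_1, ..., c_{t-1} only
        cum = cum + c
    return plays

# Losses whose direction is weakly correlated with the fixed hint v = e_2.
rng = np.random.default_rng(0)
losses = [np.array([s, 0.3]) for s in rng.choice([-1.0, 1.0], size=1000)]
plays = ftl(losses, p=2.5)
total = np.sum(losses, axis=0)
regret = sum(c @ x for c, x in zip(losses, plays)) - total @ linear_min_lp_ball(total, 2.5)
print(regret)
```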
Using the above lemma, we introduce an algorithm for online linear optimization with hints that belong to a set $\mathcal{V}$. In this algorithm, we instantiate one copy of the FTL algorithm for each possible direction of the hint. On round $t$, we invoke the copy of the algorithm that corresponds to the direction of the hint $v_t$, using the history of the game restricted to rounds with hints in that direction. We show that the overall regret of this algorithm is no larger than the sum of the regrets of the individual copies.

Algorithm 2 $\mathcal{A}_{\mathrm{set}}$: Set-of-Hints
For all $v \in \mathcal{V}$, let $T_v = \emptyset$. For $t = 1, \dots, T$:
1. Play $x_t \in \operatorname{arg\,min}_{x \in \mathcal{K}} \sum_{\tau \in T_{v_t}} c_\tau^\top x$ and receive $c_t$ as feedback.
2. Update $T_{v_t} \leftarrow T_{v_t} \cup \{t\}$.

Theorem 4.2. Suppose that $\mathcal{K} \subseteq \mathbb{R}^d$ is a $(C, q)$-uniformly convex set that is symmetric around the origin, with $B_r \subseteq \mathcal{K} \subseteq B_R$ for some $r$ and $R$. Consider online linear optimization with hints where the cost vector at round $t$ satisfies $\|c_t\|_2 \le G$ and the hint $v_t$ comes from a finite set $\mathcal{V}$ and is such that $c_t^\top v_t \ge \alpha \|c_t\|_2$ and $\|v_t\|_2 = 1$. Algorithm 2 has a worst-case regret of
\[ R(\mathcal{A}_{\mathrm{set}}, c_{1:T}) \le |\mathcal{V}|\, \frac{R^2}{2\, C\, \alpha\, r}\, G\, (\ln T + 1) \quad \text{if } q = 2, \]
\[ R(\mathcal{A}_{\mathrm{set}}, c_{1:T}) \le |\mathcal{V}| \left( \frac{R^q}{2\, C\, \alpha\, r} \right)^{1/(q-1)} G\, \frac{q-1}{q-2}\, T^{\frac{q-2}{q-1}} \quad \text{if } q > 2. \]

Proof. We decompose the regret as follows:
\[ R(\mathcal{A}_{\mathrm{set}}, c_{1:T}) = \sum_{t=1}^T c_t^\top x_t - \inf_{x \in \mathcal{K}} \sum_{t=1}^T c_t^\top x \;\le\; \sum_{v \in \mathcal{V}} \Big( \sum_{t \in T_v} c_t^\top x_t - \inf_{x \in \mathcal{K}} \sum_{t \in T_v} c_t^\top x \Big) \;\le\; |\mathcal{V}|\, \max_{v \in \mathcal{V}} R(\mathcal{A}_{\mathrm{FTL}}, c_{T_v}). \]
The proof follows by applying Lemma 4.1 and by using $\|v_t\|_{\mathcal{K}} \le \frac{1}{r}\|v_t\|_2 = \frac{1}{r}$.

Note that $\mathcal{A}_{\mathrm{set}}$ does not require $\alpha$ or $\mathcal{V}$ to be known a priori, as it can compile the set of hint directions as it sees new ones. Moreover, if the hints are not limited to a finite set $\mathcal{V}$ a priori, then the algorithm can first discretize the $L_2$ unit ball with an $\alpha/2$-net and approximate any given hint with one of the hints in the discretized set. Using this discretization technique, Theorem 4.2 can be extended to the setting where the hints are not constrained to a finite set, at the price of a regret that is linear in the size of the $\alpha/2$-net (exponential in the dimension $d$). Extensions of Theorem 4.2 are discussed in more detail in Appendix C.
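A compact sketch of Algorithm 2 follows, keeping one FTL state per observed hint direction. The per-copy minimization $\operatorname{arg\,min}_{x \in \mathcal{K}} \sum_{\tau \in T_v} c_\tau^\top x$ is shown in closed form only for the Euclidean ball, and the class and function names are ours.

```python
import numpy as np

def linear_min_euclidean_ball(g):
    """argmin of g^T x over the Euclidean unit ball; stands in for a general K."""
    n = np.linalg.norm(g)
    return np.zeros_like(g) if n == 0.0 else -g / n

class SetOfHintsFTL:
    """Algorithm 2: one Follow-The-Leader copy per hint direction seen so far."""
    def __init__(self):
        self.cumulative = {}                    # hint direction -> sum of its loss vectors

    def _key(self, v):
        return tuple(np.round(v, 12))

    def play(self, v_t):
        g = self.cumulative.get(self._key(v_t))
        return linear_min_euclidean_ball(g) if g is not None else np.zeros_like(v_t)

    def update(self, v_t, c_t):
        k = self._key(v_t)
        self.cumulative[k] = self.cumulative.get(k, np.zeros_like(c_t)) + c_t

# Usage: hints alternate between two fixed directions, losses correlate with the hint.
alg = SetOfHintsFTL()
hints = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
for t in range(10):
    v_t = hints[t % 2]
    x_t = alg.play(v_t)
    c_t = np.array([0.3, 0.9]) if t % 2 == 0 else np.array([0.9, 0.3])
    alg.update(v_t, c_t)
```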
5 Lower Bounds

The regret bounds derived in Sections 3 and 4 suggest that the curvature of $\mathcal{K}$ can make up for the lack of curvature of the loss functions and yield rates faster than $O(\sqrt{T})$ in online convex optimization, provided we receive additional information about the next move of the adversary in the form of a hint. In this section, we show that the curvature of the player's decision set $\mathcal{K}$ is necessary to get rates better than $O(\sqrt{T})$, even in the presence of a hint. As an example, consider the unit cube, i.e., $\mathcal{K} = \{x \mid \|x\|_\infty \le 1\}$. Note that this set is not uniformly convex. Since the $i$-th coordinate of points in such a set, namely $x_i$, has no effect on the range of acceptable values for the other coordinates, revealing one coordinate does not give us any information about the other coordinates $x_j$ for $j \ne i$. For example, suppose that $c_t$ has each of its first two coordinates set to $+1$ or $-1$ with equal probability and all other coordinates set to $1$. In this case, even after observing the last $d - 2$ coordinates of the loss vector, the problem is reduced to a standard online linear optimization problem in the 2-dimensional unit cube. This choice of $c_t$ is known to incur a regret of $\Omega(\sqrt{T})$ [1]. Therefore, online linear optimization with the set $\mathcal{K} = \{x \mid \|x\|_\infty \le 1\}$, even in the presence of hints, has a worst-case regret of $\Omega(\sqrt{T})$.

As it turns out, this result holds for any polyhedral set of actions. We prove this by means of a reduction to the lower bounds established in [8], which apply to the online convex optimization framework (without hint). We defer the proof to Appendix D.4.

Theorem 5.1. If the set of feasible actions is a polyhedron then, depending on the set $\mathcal{C}$, either there exists a trivial algorithm that achieves zero regret or every online algorithm has worst-case regret $\Omega(\sqrt{T})$. This is true even if the adversary is restricted to pick a fixed hint $v_t = v$ for all $t = 1, \dots, T$.

At first sight, this result may come as a surprise. After all, since any $L_p$ ball with $1 < p \le 2$ is strongly convex, one can hope to use an $L_{1+\nu}$ unit ball $\mathcal{K}'$ to approximate $\mathcal{K}$ when $\mathcal{K}$ is an $L_1$ ball (which is a polyhedron) and apply the results of Section 3 to achieve better regret bounds. The problem with this approach is that the constant in the modulus of convexity of $\mathcal{K}'$ deteriorates when $p \to 1$, since $\delta_{L_p}(\epsilon) = (p-1)\epsilon^2$; see [3]. As a result, the regret bound established in Theorem 3.3 becomes $O(\frac{1}{p-1} \log T)$. Since the best approximation of an $L_1$ unit ball using an $L_p$ ball is of the form $\{x \in \mathbb{R}^d \mid d^{1 - \frac{1}{p}} \|x\|_p \le 1\}$, the distance between the offline benchmarks in the definition of regret when using $\mathcal{K}'$ instead of $\mathcal{K}$ can be as large as $(1 - d^{\frac{1}{p} - 1})\, T$, which translates into an additive term of order $(1 - d^{\frac{1}{p} - 1})\, T$ in the regret bound when using $\mathcal{K}'$ as a proxy for $\mathcal{K}$. Due to the inverse dependence of the regret bound obtained in Theorem 3.3 on $p - 1$, the optimal choice of $p = 1 + O(\frac{1}{\sqrt{T}})$ leads to a regret of order $\tilde{O}(\sqrt{T})$ at best.

Finally, we conclude with a result that suggests that $O(\log T)$ is, in fact, the optimal achievable regret when $\mathcal{K}$ is strongly convex in online linear optimization with a hint. We defer the proof to Appendix D.4.

Theorem 5.2. If $\mathcal{K}$ is an $L_2$ ball then, depending on the set $\mathcal{C}$, either there exists a trivial algorithm that achieves zero regret or every online algorithm has worst-case regret $\Omega(\log T)$. This is true even if the adversary is restricted to pick a fixed hint $v_t = v$ for all $t = 1, \dots, T$.

6 Directions for Future Research

We conjecture that the dependence of our regret bounds on $T$ is suboptimal when $\mathcal{K}$ is $(C, q)$-uniformly convex for $q > 2$. We expect the optimal rate to converge to $\sqrt{T}$ as $q \to \infty$, since $L_q$ balls converge to $L_\infty$ balls and it is well known that the minimax regret scales as $\sqrt{T}$ in online linear optimization without hints when the decision set is an $L_\infty$ ball. However, this calls for the development of an algorithm that is not based on a reduction to the Follow-The-Leader algorithm, as discussed after Lemma 4.1. We also conjecture that it is possible to relax the assumption that there are finitely many hints when $\mathcal{K}$ is $(C, q)$-uniformly convex with $q > 2$ without incurring an exponential dependence of the regret bounds (and of the runtime) on the dimension $d$; see Appendix C. Again, this calls for the development of an algorithm that is not based on a reduction to the Follow-The-Leader algorithm. A solution that would alleviate the two aforementioned shortcomings would likely be derived through a reduction to online convex optimization with convex functions that are $(C, q)$-uniformly convex, for $q \ge 2$, in all but one direction and constant in the remaining one, in a similar fashion to what is done in Section 3 for $q = 2$. There has been progress in this direction in the literature, but, to the best of our knowledge, no conclusive result yet. For instance, Vovk [23] studies a related problem but restricts the study to the squared loss function.
It is not clear if the setting studied in this paper can be reduced to the setting of squared loss functions. Another example is given by [21], where the authors consider online convex optimization with general $(C, q)$-uniformly convex functions in Banach spaces (with no hint), achieving a regret of order $O(T^{(q-2)/(q-1)})$. Note that this rate matches the one derived in Theorem 4.2. However, as noted above, our setting cannot be reduced to theirs because our virtual loss functions are not uniformly convex in every direction.

Acknowledgments

Haghtalab was partially funded by an IBM Ph.D. fellowship and a Microsoft Ph.D. fellowship. Jaillet acknowledges the research support of the Office of Naval Research (ONR) grant N00014-15-1-2083. This work was partially done while Haghtalab was an intern at Microsoft Research, Redmond, WA.

References

[1] Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In Proceedings of the 21st Conference on Learning Theory (COLT), pages 415–424, 2008.
[2] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.
[3] Keith Ball, Eric A. Carlen, and Elliott H. Lieb. Sharp uniform convexity and smoothness inequalities for trace norms. Inventiones Mathematicae, 115(1):463–482, 1994.
[4] Avrim Blum and Yishay Mansour. Learning, regret minimization, and equilibria. In Algorithmic Game Theory, pages 79–102. 2007.
[5] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[6] Chao-Kai Chiang and Chi-Jen Lu. Online learning with queries. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 616–629, 2010.
[7] Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), 2012.
[8] Arthur Flajolet and Patrick Jaillet. No-regret learnability for piecewise linear losses. arXiv preprint arXiv:1411.5649, 2014.
[9] Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[11] Elad Hazan. The convex optimization approach to regret minimization. In Optimization for Machine Learning, pages 287–303. 2012.
[12] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of the 21st Conference on Learning Theory (COLT), 2008.
[13] Elad Hazan and Nimrod Megiddo. Online learning with prior knowledge. In Proceedings of the 20th Conference on Learning Theory (COLT), pages 499–513, 2007.
[14] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
[15] Ruitong Huang, Tor Lattimore, András György, and Csaba Szepesvári. Following the leader and fast rates in linear prediction: Curved constraint sets and other regularities. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pages 4970–4978, 2016.
[16] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
[17] H. Brendan McMahan. A survey of algorithms and analysis for adaptive online learning. Journal of Machine Learning Research, 18:1–50, 2017.
[18] Gilles Pisier. Martingales in Banach spaces (in connection with type and cotype). Manuscript, Course IHP, February 2–8, 2011.
[19] Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Conference on Learning Theory (COLT), pages 993–1019, 2013.
[20] Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS), pages 3066–3074, 2013.
[21] Karthik Sridharan and Ambuj Tewari. Convex games in Banach spaces. In Proceedings of the 23rd Conference on Learning Theory (COLT), pages 1–13, 2010.
[22] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications, pages 210–268. Cambridge University Press, 2012.
[23] Vladimir Vovk. Competing with wild prediction rules. Machine Learning, 69(2-3):193–212, 2007.
[24] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 928–936, 2003.