# Fast Rates in Time-Varying Strongly Monotone Games

Yu-Hu Yan¹  Peng Zhao¹  Zhi-Hua Zhou¹

¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. Correspondence to: Peng Zhao.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract

Multi-player online games depict the interaction of multiple players with each other over time. Strongly monotone games are of particular interest since they have benign properties and also relate to many classic games with real-life applications. Existing works mainly focus on the time-invariant case, for which provable guarantees have been established. However, the more general time-varying games in changing environments are underexplored, and the best-known result cannot match the guarantees of the time-invariant case. In this work, we present a new decentralized online algorithm for time-varying strongly monotone games, which greatly improves existing results and obtains fast rates, matching the best time-invariant guarantee without knowing the environmental non-stationarity. Furthermore, to achieve faster rates, we generalize the RVU property with smoothness and establish a series of problem-dependent bounds that also match the best time-invariant one. To realize all those results, we make comprehensive use of techniques from non-stationary and universal online learning.

1. Introduction

Multi-player online games (Daskalakis et al., 2011; Rakhlin & Sridharan, 2013b; Syrgkanis et al., 2015) provide a versatile model that depicts the interaction of multiple players over time. At each round, each player makes a decision from a convex compact set, and meanwhile, the environment selects convex loss functions (also called utility functions) for them. Then each player suffers a loss decided by her own utility function and the joint decisions of all players. A fundamental task of game-theoretic learning is to find the Nash equilibrium, a stable set of decisions where no player has an incentive to deviate (Nash Jr, 1950). However, solving an accurate Nash equilibrium is generally computation-hard or even PPAD-complete (Daskalakis et al., 2009). Fortunately, it has been revealed that when games exhibit certain benign properties, effective algorithms can be developed to achieve Nash equilibriums with provable guarantees.

Particularly, recent advances show the great potential of the regret minimization framework (Freund & Schapire, 1999) for multi-player online games. Indeed, when each player minimizes her own regret, the joint decision eventually converges to a coarse correlated equilibrium, which, however, may differ from the Nash equilibrium. Fortunately, with more structure in the games, regret minimization algorithms can provably achieve Nash equilibriums. Among those successes, strongly monotone games (Rosen, 1965) form a significant subclass that encompasses many real-world applications of interest and has been extensively studied due to its benign mathematical properties (Monderer & Shapley, 1996; Nemirovski et al., 2010). It is demonstrated that, when appropriate regret minimization algorithms are deployed by all players, the distance between their decisions and the Nash equilibrium provably converges to zero (Facchinei & Pang, 2003; Bravo et al., 2018).
Existing results on strongly monotone games primarily concern the time-invariant case, i.e., with fixed utility functions. However, in real-world applications, many game-related scenarios are time-varying. For instance, an important game-theoretic application is the Cournot competition (Monderer & Shapley, 1996), where multiple firms provide goods to the market, and the goods are then priced as a function of the total supply. Due to various factors such as weather, holidays, politics, etc., the supply-market relationship is often subject to change, while existing studies for time-invariant scenarios cannot address this issue. The only existing work for time-varying strongly monotone games is by Duvocelle et al. (2023), but the attained results are not favorable enough. Specifically, they investigated the distance tracking error, i.e., the distance between the decisions and the Nash equilibrium (a formal definition is introduced in (2.1)), and further proposed a restart-based algorithm to handle the non-stationarity, attaining an $O(\sqrt{T} + T^{2/3}P_T^{1/3})$ guarantee, where $T$ is the time horizon and $P_T$ measures the environmental non-stationarity. Notably, the result implies an $O(\sqrt{T})$ tracking error in the time-invariant case (where $P_T = 0$), which unfortunately exhibits a large gap compared to $O(\log T)$, the best-known time-invariant result (Bravo et al., 2018). It is thus natural to ask for a more thorough investigation of time-varying strongly monotone games.

Table 1: A summary of tracking error guarantees for time-varying strongly monotone games. The first column shows the two setups of non-smooth and smooth games. The second column presents where the results are from. The third column provides the main results, where $P_T$, $V_T$, and $W_T$ are different non-stationarity measures that are at most $O(T)$ and become 0 in the time-invariant case. The fourth column shows implications for the time-invariant case, where our results match the best-known results, i.e., $\widetilde{O}(1)$ for general utility functions (Bravo et al., 2018, Theorem 7) and $O(1)$ for smooth functions (Facchinei & Pang, 2003, Section 12.3.2).

| Setups | Works | Time-Varying Games | Time-Invariant Games |
| --- | --- | --- | --- |
| Non-Smooth Games | Duvocelle et al. (2023) | $O(\sqrt{T} + T^{2/3}P_T^{1/3})$ | $O(\sqrt{T})$ |
| Non-Smooth Games | This Paper (Theorem 3) | $\widetilde{O}(1 + \min\{T^{1/3}P_T^{2/3},\, W_T\})$ | $\widetilde{O}(1)$ |
| Smooth Games | This Paper (Theorem 5) | $O(\min\{\sqrt{(1+V_T+P_T)(1+P_T)},\, 1+W_T\})$ | $O(1)$ |

This paper presents a more comprehensive characterization of time-varying strongly monotone games. To optimize the distance tracking error, we propose a new decentralized online algorithm that achieves an $\widetilde{O}(1 + T^{1/3}P_T^{2/3})$ tracking error bound, where $\widetilde{O}(\cdot)$ omits the poly-logarithmic dependence on $T$. The advantages of our result lie in three aspects: (i) our result significantly improves upon the previous best-known $O(\sqrt{T} + T^{2/3}P_T^{1/3})$ rate (Duvocelle et al., 2023), and importantly, it implies an $\widetilde{O}(1)$ tracking error when specializing to the time-invariant scenario, hence matching the corresponding best-known $\widetilde{O}(1)$ result (Bravo et al., 2018); (ii) our algorithm does not require knowledge of $P_T$, the variation of Nash equilibriums (a formal definition is deferred to Section 2.2), which is unknown in advance; in contrast, this quantity is required by Duvocelle et al. (2023); (iii) our algorithm exhibits adaptivity to the strong monotonicity in the sense that the monotonicity coefficient is also not required.
We further contribute an orthogonal improvement by showing that our algorithm additionally enjoys an e O(1 + WT ) guarantee, where WT quantifies the variance of the gradients of utility functions. Consequently, our algorithm can take advantage of both slow Nash equilibrium variation and small gradient variance simultaneously. The key to our improvement lies in a novel analysis to construct carefully designed strongly convex surrogate loss functions by the virtue of strong monotonicity. To step further, we consider the possibility of obtaining even faster tracking error rates. Assuming the smoothness of the utility functions, we generalize the Regret bounded by Variation in Utilities (RVU) condition (Syrgkanis et al., 2015), a key property for fast-rate convergence in finite games, to continuous multi-player games under the time-varying scenarios inspired by the recent study of time-varying zero-sum games (Zhang et al., 2022c). Using the classic optimistic online gradient descent algorithm (Rakhlin & Sridharan, 2013a), we derive a series of problem-dependent bounds. Specifically, we obtain an O( p (1 + VT + PT )(1 + PT )) tracking error, where VT is a problem-dependent quantity that is at most O(T) but can be much smaller in benign environments. In addition, our algorithm also benefits from small gradient variance with an O(1 + WT ) bound. Note that our new results are faster than e O(1 + T 1/3P 2/3 T ) without smoothness. For instance, in a game with S switches, the non-smooth guarantee ensures an e O(1+T 1/3S2/3) rate. In contrast, the results here give an optimal O(1+S) bound, matching the performance of the oracle learner who restarts once a switch happens and runs a time-invariant algorithm within each stationary period (thus suffering an O(S) bound in total, due to O(1) tracking error in each period and overall S periods). However, the aforementioned guarantees require different configurations (of the step sizes). To this end, we leverage a two-layer framework by properly hedging over many possibilities and finally obtain the same guarantees with a single algorithm. Table 1 summarizes the existing result and ours in non-smooth and smooth cases. Notably, in common interest games, a setting usually encountered in distributed optimization problems (Gopal & Yang, 2013), where the utility functions remain the same across players, all our results hold for the newly proposed utility tracking error, which serves as an upper bound of the distance version and is thus more fundamental in this case. Techniques. To achieve all those fast-rate guarantees, we make comprehensive use of recent online convex optimization techniques for non-stationarity (Zhao et al., 2021) and universality (van Erven & Koolen, 2016). In the non-smooth case, the key improvements over (Duvocelle et al., 2023) stem from two aspects. First, it is possible to directly handle the non-stationarity with online ensemble (Zhou, 2012; Zhao, 2021), which is realized by employing a group of online gradient descent (OGD) algorithms (Zinkevich, 2003) with different configurations as base learners and a meta learner to track the best one on the fly. As a result, the two-layer algorithm can track the moving clairvoyant and does not require PT as input. 
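To make this first ingredient concrete, the following is a minimal Python sketch of such a two-layer ensemble: a grid of OGD base learners with geometrically spaced step sizes, combined by a Hedge-style meta learner over linearized losses. The ball projection, the step-size grid, and the meta learning rate are illustrative placeholders rather than the tuned choices of the paper's Algorithm 3.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}, standing in for the
    projection onto a player's feasible set X_i."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

class TwoLayerOGDEnsemble:
    """Sketch of a two-layer ensemble: M OGD base learners, each with its own
    step size (a guess of the unknown path length), and a Hedge meta learner
    that reweights them according to their linearized losses."""

    def __init__(self, dim, T, D=1.0, G=1.0):
        self.D = D
        M = int(np.ceil(np.log2(T))) + 1                           # number of base learners
        self.etas = (D / (G * np.sqrt(T))) * 2.0 ** np.arange(M)   # geometric step-size grid
        self.base = np.zeros((M, dim))                             # base learners' decisions
        self.p = np.ones(M) / M                                    # meta weights
        self.eps = np.sqrt(np.log(M) / T)                          # Hedge learning rate

    def decide(self):
        # play the weighted combination of the base decisions
        return self.p @ self.base

    def update(self, grad):
        # meta step: Hedge on the linearized losses <grad, x_{t,j}>
        losses = self.base @ grad
        self.p = self.p * np.exp(-self.eps * (losses - losses.min()))
        self.p = self.p / self.p.sum()
        # base step: each OGD learner descends with its own step size
        for j, eta in enumerate(self.etas):
            self.base[j] = project_ball(self.base[j] - eta * grad, self.D)
```

In the game, each player would run her own copy of this routine and feed it the local gradient feedback $v_{t,i}(x_t)$ at every round.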
The second improvement is carefully designed strongly convex surrogate loss functions utilizing the virtue of strong monotonicity, which allows us to leverage the recent progress of non-stationary online learning with strongly convex losses.

For the smooth case, our results draw inspiration from time-varying two-player zero-sum games (Zhang et al., 2022c) but with additional innovations. On the one hand, we generalize the key RVU condition to more complex scenarios with multiple players and general utility functions so that our bounds can safeguard the tracking error; in contrast, Zhang et al. (2022c) only bounded quantities related to the summation of the two players' regret and thus cannot protect the tracking error. On the other hand, to obtain favorable results, we need a novel usage of correction terms for the online ensemble, which hinges on the gradient-variation dynamic regret for online convex optimization (Zhao et al., 2021).

The rest of the paper is organized as follows. Section 2 formulates the problem. Section 3 proposes our fast-rate algorithm for time-varying strongly monotone games. Section 4 achieves faster rates with smoothness. Section 5 provides the empirical evaluations, and finally Section 6 concludes the work.

2. Problem Setup

This section introduces the performance and non-stationarity measures in time-varying strongly monotone games and a particular class called common interest games.

2.1. Time-Varying Strongly Monotone Games

A multi-player time-varying game contains $T$ rounds and $N \ge 2$ players. In the $t$-th round, the $i$-th player ($i \in [N]$) chooses a decision $x_{t,i}$ from a compact convex set $\mathcal{X}_i \subseteq \mathbb{R}^d$. Simultaneously, the environments reveal a group of time-varying utility functions $u_{t,i} : \mathcal{X} \mapsto \mathbb{R}$ for each player, where $\mathcal{X} \triangleq \mathcal{X}_1 \times \cdots \times \mathcal{X}_N$. Afterwards, each player receives her local gradient feedback $v_{t,i}(x_t)$, where $x_t \triangleq (x_{t,1}, \ldots, x_{t,N})$ and $v_{t,i}(x_t) \triangleq \nabla_{x_{t,i}} u_{t,i}(x_t)$. For the $i$-th player, a Nash equilibrium is a joint decision $x^*$ such that $u_{t,i}(x_i^*; x_{-i}^*) \le u_{t,i}(x_i; x_{-i}^*)$ for any $x_i \in \mathcal{X}_i$, where $x_{-i} \triangleq (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N)$. A variational inequality can also be used to describe a Nash equilibrium: $\langle v_t(x^*), x - x^*\rangle \ge 0$ for any $x \in \mathcal{X}$, where the global gradient $v_t(x) \triangleq (v_{t,1}(x), \ldots, v_{t,N}(x))$. Since solving an accurate Nash equilibrium is computation-hard in general multi-player games, we focus on those with certain benign properties, especially strongly monotone games (Rosen, 1965).

Definition 1 (Strong Monotonicity). A game with utility gradient $v$ is $\mu$-strongly monotone if $\langle v(x) - v(y), x - y\rangle \ge \mu \lVert x - y\rVert_2^2$ holds for any $x, y \in \mathcal{X}$, where $\mu > 0$.

Monotone games include many classic games close to real-world applications, including Cournot competition (Monderer & Shapley, 1996), Kelly auctions and Tullock competitions (Nemirovski et al., 2010), signal covariance and power control problems in wireless communications (d'Oro et al., 2015; Mertikopoulos & Moustakas, 2015), and so on. We refer readers to Facchinei & Kanzow (2010) for more applications. Besides, since a strongly monotone game admits a unique Nash equilibrium (Rosen, 1965), we denote by $x_t^*$ the Nash equilibrium of the game at the $t$-th round.

In game theory and convex optimization, a well-studied performance measure is the distance tracking error, which reflects the algorithm's ability to chase some target measured in norm distance. In multi-player games, a natural target is the Nash equilibrium, and thus we investigate

$$\text{DIST-ERR} \triangleq \sum_{t=1}^{T} \lVert x_t - x_t^* \rVert^2. \tag{2.1}$$
In time-invariant strongly monotone games, where $x^*$ denotes the unique Nash equilibrium, the distance tracking error $\lVert x_t - x^*\rVert^2$ enjoys an $O(t^{-1})$ last-iterate convergence (Bravo et al., 2018, Theorem 7), which can be improved to $O(\rho^t)$ with smooth utility functions (Facchinei & Pang, 2003), where $\rho \in (0, 1)$.

We end this part by listing the assumptions and notations used throughout the work.

Assumption 1. For any $i \in [N]$, the gradients satisfy $\lVert v_{t,i}(\cdot)\rVert, \lVert v_t(\cdot)\rVert \le G$, and the domain satisfies $\lVert x - y\rVert \le D$ for any $x, y \in \mathcal{X}$ and $\lVert x - y\rVert \le D$ for any $x, y \in \mathcal{X}_i$.

Assumption 2. All games with utility gradients $\{v_t\}_{t=1}^T$ are $\mu$-strongly monotone (Definition 1).

We use $\Delta_d$ for the $d$-dimensional simplex, $\{a_t\}_{t=1}^T$ for the sequence $a_1, \ldots, a_T$, and $\lVert \cdot \rVert$ for the $\ell_2$-norm by default. $a \lesssim b$ represents $a \le \widetilde{O}(b)$, where the $\widetilde{O}(\cdot)$-notation omits logarithmic factors in the time horizon $T$.

2.2. Non-Stationarity Measures

We introduce the following non-stationarity measures to capture the varying intensity of the time-varying games:

- Path length: $P_T \triangleq \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^*\rVert$ measures the variation of the time-varying Nash equilibriums.
- Gradient variation: $V_T \triangleq \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \lVert v_t(x) - v_{t-1}(x)\rVert^2$ denotes the variation in gradients.
- Gradient variance: $W_T \triangleq \sum_{t=1}^{T} \sup_{x \in \mathcal{X}} \lVert v_t(x) - \bar v_T(x)\rVert$ reflects the variance of the gradients, where $\bar v_T(\cdot) = \sum_{t=1}^{T} v_t(\cdot)/T$ is the average gradient.

The three measures reflect different aspects of the games and are generally incomparable. We will give more detailed discussions in the rest of the paper.

2.3. Common Interest Games

Our setup in Section 2.1 exactly matches that of Duvocelle et al. (2023). Besides, we also investigate a particular class called common interest games, where the utility functions remain the same across players, i.e., $u_{t,i} = u_t$ for all $i \in [N]$. This class is worth studying since it is widely encountered in distributed optimization problems, where the problem dimension can be very large (Gopal & Yang, 2013). In common interest games, we propose a more fundamental performance measure, which evaluates performance at the function level. Specifically, we define the utility tracking error:

$$\text{UTIL-ERR} \triangleq \sum_{t=1}^{T} u_t(x_t) - \sum_{t=1}^{T} u_t(x_t^*). \tag{2.2}$$

Note that (2.1) and (2.2) are incomparable in general, but we identify that, in common interest strongly monotone games, the utility tracking error serves as a natural upper bound of the distance tracking error, showing that (2.2) is more fundamental in this particular setup. See Proposition 1 for a formal statement, with the proof in Appendix B.

Proposition 1. The utility function $u_t$ of a $\mu$-strongly monotone game is $\mu$-strongly convex, and the utility tracking error (2.2) upper-bounds the distance tracking error (2.1); specifically, $\mu \cdot \text{DIST-ERR} \le 2\, \text{UTIL-ERR}$.

3. Fast Rates for Strongly Monotone Games

This section presents our fast-rate results for time-varying strongly monotone games. First, Section 3.1 reviews the latest work and identifies its unsatisfactory guarantees. Then, Section 3.2 and Section 3.3 introduce our solution with a fast tracking error rate. Finally, Section 3.4 shows that our new method can also take advantage of small gradient variance.

3.1. Reviewing the Latest Result

In this part, we review the algorithm and sketch the analysis of Duvocelle et al. (2023). They focused on the distance tracking error (2.1). By the definition of strong monotonicity and the property of Nash equilibriums, they upper-bounded the distance tracking error as $\mu \cdot \text{DIST-ERR} \le \sum_{t \in [T]} \langle v_t(x_t), x_t - x_t^*\rangle$.
Notably, the right-hand side is essentially the regret over linear losses { vt(xt), }T t=1 against a sequence of changing comparators {x t }T t=1, which is unknown and thus in general hard to handle. To overcome this issue, Duvocelle et al. (2023) leveraged a restarting strategy in analysis,1 a common way to deal with non-stationarity (Besbes et al., 2015; Zhao et al., 2020a). Specifically, dividing the time horizon T into K = T/ periods of length , P t [T ] vt(xt), xt x t (upper bound of distance tracking error) can be decomposed as t k vt(xt), xt vk + t k vt(xt), vk x t , 1We call it a method with restarts because it decomposes the regret into multiple periods, just like restart-based algorithms. Notably, the algorithm does not need to restart. where k denotes the k-th period and {vk}K k=1 is a periodwise stationary comparator sequence. The first term is the summation of the static regret of all periods, and the second term represents the gap between two comparator sequences: {vk}K k=1 and {x t }T t=1. For the first term, since the comparator is fixed as vk inside the k-th period, an algorithm with static regret guarantees, e.g., OGD (Zinkevich, 2003), gives an O( ) regret inside each period with step size 1/ . The second term, using the period comparison technique from Besbes et al. (2015), can be bounded by O( PT ), where PT PT t=2 x t x t 1 is the path length of the Nash equilibriums. Summing together gives an upper bound of O(T/ + PT ). Choosing the period length optimally as = min{(T/PT )2/3, T}, which is required in the step size of OGD, gives an O( T + T 2/3P 1/3 T ) bound. However, their result is not satisfactory enough. Consider the simplest time-invariant case, i.e., the Nash equilibrium is fixed x 1 = ... = x T (so that PT = 0). Their result implies an O( T) tracking error, which is exponentially worse than O(log T), the best-known result for static strongly monotone games (Bravo et al., 2018, Theorem 7). 3.2. Tracking the Non-Stationarity Directly The key idea of Duvocelle et al. (2023) is to divide the time horizon into multiple periods and treat the games inside each period as a time-invariant one in response to the non-stationarity of time-varying games. However, this may not be the best way. Indeed, in this online game setup, the feedback of each player is relatively adequate as they can observe the gradient information. Thus, instead of restarting a stationary algorithm, we manage to design online algorithms capable of directly tracking the non-stationarity and catching up with the changing nature. With this high-level sense in mind, we present a new decomposition for the distance tracking error, t=1 vt(xt), xt x t = t=1 vt,i(xt), xt,i x t,i , which is the summation of all players regret. The regret of the i-th player is defined on linear losses { vt,i(xt), }T t=1 against a sequence of time-varying comparators {x t,i}T t=1, and notably the latter is related to Nash equilibriums and is unknown in advance, or even after the game ends. Benchmarking online performance against a sequence of changing comparators is the purpose of dynamic regret precisely, a central target of non-stationary online learning (Zinkevich, 2003). It is demonstrated that OGD has such tracking ability. Specifically, let us consider a general online convex optimization (OCO) setup, where the online learner submits decisions {wt}T t=1 inside a convex feasible domain W Rd and competes with changing comparators Fast Rates in Time-Varying Strongly Monotone Games {ut}T t=1 over convex losses {ft( )}T t=1. 
A commonly used performance measure is the dynamic regret:

$$\text{D-REG}(\{u_t\}_{t=1}^T) \triangleq \sum_{t=1}^{T} f_t(w_t) - \sum_{t=1}^{T} f_t(u_t). \tag{3.1}$$

OGD, which updates via $w_{t+1} = \Pi_{\mathcal{W}}[w_t - \eta \nabla f_t(w_t)]$, provably optimizes (3.1) as $\text{D-REG}(\{u_t\}_{t=1}^T) \le O(P_T/\eta + \eta T)$ with step size $\eta > 0$, where the path length of the comparators $P_T \triangleq \sum_{t=2}^{T} \lVert u_t - u_{t-1}\rVert$ essentially measures the non-stationarity intensity, and the regret bound of OGD scales with it. Importantly, we now obtain an online learner that can directly track the changing comparators, without any of the restarts used in Duvocelle et al. (2023).

In our problem, if each player knew her own path length $P_{T,i}$ defined on $\{x_{t,i}^*\}_{t=1}^T$, her own regret would be bounded by at most $O(\sqrt{T(1 + P_{T,i})})$ by running OGD with step size $\eta = O(\sqrt{P_{T,i}/T})$, leading to an $O(\sqrt{T(1 + P_T)})$ tracking error, which already improves the $O(\sqrt{T} + T^{2/3}P_T^{1/3})$ bound of Duvocelle et al. (2023). However, the path-length quantities $\{P_{T,i}\}_{i=1}^N$ are in general unavailable. To address this issue, inspired by the recent progress in non-stationary OCO (Zhang et al., 2018), we deploy a two-layer online ensemble algorithm with $M$ base learners in total and a meta learner. The $i$-th base learner maintains her own decision $w_{t,i}$ and optimizes the linearized loss $\langle \nabla f_t(w_t), \cdot\rangle$ by OGD with her own step size based on a guess of the path length. The meta learner maintains weights $\{p_{t,i}\}_{i=1}^M$ over the base learners and uses an expert-tracking algorithm, e.g., Hedge (Freund & Schapire, 1997), to track the best base learner on the fly. Interested readers can refer to Algorithm 3 for algorithmic details. In the following, we provide the theoretical guarantee of our proposed algorithm, whose proof is presented in Appendix C.1.

Theorem 1. Under Assumptions 1 and 2, if each player runs the aforementioned algorithm (Algorithm 3 in Appendix C.1), the distance tracking error enjoys an upper bound of $O(\sqrt{T(1 + P_T)})$, where $P_T \triangleq \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^*\rVert$ is the path length. Notably, our algorithm does not require the path-length quantities ($P_T$ or $\{P_{T,i}\}_{i=1}^N$) as inputs.

Remark 1. Theorem 1 improves the result of Duvocelle et al. (2023, Theorem 2) in two aspects: (i) Theorem 1 improves the tracking error bound from $O(T^{2/3}P_T^{1/3})$ to $O(\sqrt{TP_T})$ since $P_T \le O(T)$; and (ii) Theorem 1 is agnostic about $P_T$ because the ensemble structure can eliminate the environmental uncertainty. In contrast, the optimal tuning of the period length in the method of Duvocelle et al. (2023) requires the path length $P_T$ as an input.

3.3. Further Exploiting Strong Monotonicity

Although Theorem 1 already improves the existing result (Duvocelle et al., 2023), we discover that a further improvement is possible by the virtue of strong monotonicity. The key to the improvement lies in a novel analysis that extracts strong convexity from the definition of strong monotonicity (Definition 1), constructing a strongly convex surrogate upper bound for the distance tracking error, as shown in Proposition 2 below. The proof is deferred to Appendix C.2.

Proposition 2. Under Assumption 2, it is guaranteed that $\mu \cdot \text{DIST-ERR} \le 2 \sum_{i=1}^{N} \sum_{t=1}^{T} \big(\ell_{t,i}(x_{t,i}) - \ell_{t,i}(x_{t,i}^*)\big)$, where

$$\ell_{t,i}(x) \triangleq \langle v_{t,i}(x_t), x\rangle + \frac{\mu}{2} \lVert x - x_{t,i}\rVert^2 \tag{3.2}$$

is a $\mu$-strongly convex surrogate loss function.

We first provide the definition of strong convexity for self-containedness. Specifically, for any $w, u \in \mathcal{W}$, a function $f$ is $\lambda$-strongly convex if $f(w) - f(u) \le \langle \nabla f(w), w - u\rangle - \frac{\lambda}{2} \lVert w - u\rVert^2$. Clearly, $\ell_{t,i}(\cdot)$ in (3.2) is $\mu$-strongly convex.
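As a small numerical companion to Proposition 2, the snippet below evaluates the surrogate (3.2) and checks its $\mu$-strong convexity on random points. It is a standalone sanity check under the stated definitions, not part of the paper's algorithm.

```python
import numpy as np

def surrogate_loss(x, v_ti, x_ti, mu):
    """The surrogate (3.2): <v_{t,i}(x_t), x> + (mu/2) * ||x - x_{t,i}||^2."""
    return v_ti @ x + 0.5 * mu * np.sum((x - x_ti) ** 2)

def surrogate_grad(x, v_ti, x_ti, mu):
    """Gradient of the surrogate at x."""
    return v_ti + mu * (x - x_ti)

# Check the strong convexity inequality
#   f(w) - f(u) <= <grad f(w), w - u> - (mu/2) ||w - u||^2
# on random point pairs; for this quadratic surrogate it holds with equality.
rng = np.random.default_rng(0)
mu, d = 0.5, 4
v_ti, x_ti = rng.normal(size=d), rng.normal(size=d)
for _ in range(1000):
    w, u = rng.normal(size=d), rng.normal(size=d)
    lhs = surrogate_loss(w, v_ti, x_ti, mu) - surrogate_loss(u, v_ti, x_ti, mu)
    rhs = surrogate_grad(w, v_ti, x_ti, mu) @ (w - u) - 0.5 * mu * np.sum((w - u) ** 2)
    assert lhs <= rhs + 1e-9
```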
The strongly convex surrogate loss function design allows us to leverage the recent progress of non-stationary online learning with strongly convex losses (Baby & Wang, 2022). Specifically, assuming the strong monotonicity µ to be known for a moment, recent studies show that if the loss functions are strongly convex, algorithms optimizing the strongly adaptive regret (Daniely et al., 2015), i.e., P t I ft(wt) P t I ft(u) inside an arbitrary interval I [1, T], can also guarantee the dynamic regret.2 To conclude, Baby & Wang (2022, Theorem 8) proves that P t [T ] ℓt,i(xt,i) P t [T ] ℓt,i(x t,i) e O(1 + T 1/3P 2/3 T,i ). Summing all players regret gives an e O(1 + T 1/3P 2/3 T ) tracking error bound, which matches the best-known timeinvariant result of e O(1) (Bravo et al., 2018). Notably, the aforementioned result requires the knowledge of strong monotonicity µ in both the construction of surrogate loss and the setting of algorithmic parameters. However, in real applications, this quantity can be hard to estimate at the beginning of the online games or even unknown, which asks for online algorithms with more adaptivity. To address so, we leverage the technique of universal OCO (van Erven & Koolen, 2016; Wang et al., 2019), which aims to achieve optimal regret bounds for multiple types of loss functions, including convex, exp-concave, and strongly convex ones, using only a single algorithm. We illustrate the high-level idea of universal methods in the standard OCO setup (introduced in Section 3.2). In OCO, universal algorithms (e.g., Wang et al. (2019)) guarantee P t [T ] ft(wt), wt u e O( QT ) for any u W, where QT P t [T ] wt u 2. If the loss functions are µ-strongly convex, the e O( QT ) bound can be then 2The strongly adaptive regret minimization algorithm is run on each dimension. Therefore the aggregated decision lies in a box domain and is then projected into the feasible domain. The detailed algorithm is deferred to Algorithm 6 in Appendix C.3. Fast Rates in Time-Varying Strongly Monotone Games Algorithm 1 TV-SMOG for the i-th player Input: parameter pool H, domain diameter D, gradient upper bound G Initialize: M instances of Algorithm 6 A1, . . . , AM for t = 1, . . . , T do Receive xt,i,j from Aj Submit xt,i = PM j=1 pt,i,jajxt,i,j/ PM j=1 pt,i,jaj Receive gradient feedback vt,i(xt) Update pt+1,i,j via pt+1,i,j pt,i,j exp( st,i,j(xt,i,j)) for j = 1, . . . , M do Construct surrogate loss st,i,j with aj H Aj updates to xt+1,i,j with surrogate loss st,i,j end end canceled by the negative term µQT induced by strong convexity, without the need to know the value of µ, while attaining the virtue of strong convexity. We note the existing universal methods only guarantee the static regret (with a fixed comparator), and we need to adapt them to the dynamic regret setting (changing comparators) to fit our problem. In the following, we introduce the algorithmic details of our solution. Without loss of generality, we consider the case of 1/T µ 1, because if µ < 1/T, even the optimal static regret Θ(log T/µ) (Hazan et al., 2007) is linear in T. On the contrary, functions with µ > 1 are also 1-strongly convex. Inspired by Wang et al. (2019), we design a sequence of strongly convex surrogate loss functions, defined as: st,i,j(x) aj vt,i(xt), x xt,i + a2 j G2 x xt,i 2, where the hyper-parameter aj is chosen from a pool H {aj|aj = 2 j/(3DG), j [M]} with M = 1 2 log2 T +1. The j-th base learner runs the algorithm of Baby & Wang (2022) (restated in Algorithm 6) with surrogate losses {st,i,j( )}T t=1. 
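For intuition, here is a sketch of how one player could materialize the parameter pool $\mathcal{H}$ and the family of surrogates $\{s_{t,i,j}\}_{j \in [M]}$ that the base learners optimize. The grid size and the constant inside $a_j$ follow the formulas as printed above; they are assumptions to be checked against the paper's appendix.

```python
import numpy as np

def make_pool(T, D, G):
    """Pool H of scaling parameters a_j (one per base learner), with
    M = (1/2) log2(T) + 1 entries and a_j = 2^{-j} / (3 D G) as printed above."""
    M = int(np.ceil(0.5 * np.log2(T))) + 1
    return [2.0 ** (-j) / (3.0 * D * G) for j in range(1, M + 1)]

def universal_surrogates(x, v_ti, x_ti, pool, G):
    """Evaluate s_{t,i,j}(x) = a_j <v_{t,i}(x_t), x - x_{t,i}> + a_j^2 G^2 ||x - x_{t,i}||^2
    for every a_j in the pool; the j-th base learner is fed the j-th surrogate."""
    diff = x - x_ti
    inner = v_ti @ diff
    sq = diff @ diff
    return [a * inner + (a ** 2) * (G ** 2) * sq for a in pool]
```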
Given that different base learners are running over heterogeneous loss functions, combining them with a small regret overhead to the best base learner becomes challenging. To this end, our meta learner employs Titled Exponentially Weighted Average (TEWA), which can combine base learners with different functions and enjoy a small meta regret (van Erven & Koolen, 2016). Algorithm 1 concludes our method for Time-Varying Strongly Monotone Online Games (abbreviated as TV-SMOG), which notably does not require strong monotonicity coefficient as an input. The guarantee is shown below, and the proof is in Appendix C.3. Theorem 2. Under Assumptions 1 and 2, Algorithm 1 enjoys the following distance tracking error bound DIST-ERR e O 1 + T 1/3P 2/3 T , without knowing strong monotonicity µ and path length PT . Theorem 2 implies an e O(1) tracking error in time-invariant games, which further improves Theorem 1 and matches the best-known result (Bravo et al., 2018, Theorem 7). Remark 2. We remind that Algorithm 1 is decentralized, in the sense that each player conducts theirss own algorithm without cooperating with others, and thus valuable for real applications (Hsieh et al., 2020; Sentenac et al., 2021). 3.4. Taking Advantage of Small Gradient Variance Theorem 2 obtains an e O(1 + T 1/3P 2/3 T ) tracking error bound, which is now the state-of-the-art scaling with the path length PT . In this part, we further demonstrate that without any modification, Algorithm 1 can take advantage of small gradient variance, defined as WT P t [T ] supx X vt(x) v T (x) , where v T ( ) = P t [T ] vt( )/T is the averaged gradient. To do so, we begin with a different decomposition: t=1 xt x T 2 + 2 t=1 x T x t 2, where x T is the Nash equilibrium of the averaged game of T rounds. Briefly, Algorithm 1 upper-bounds the first term by e O(1) via reusing Theorem 2 since x T is fixed. Thus it remains to analyze the second term, which measures the gap between the averaged and time-varying Nash equilibriums, i.e., x T and {x t }T t=1. We prove the second term is algorithm-irrelevant and can be bounded by O(WT ). To conclude, using Algorithm 1, we obtain the following variance-based, and thus best-of-both-worlds, fast tracking error rate. The proof can be found in Appendix C.4. Theorem 3 (Main Result). Under Assumptions 1 and 2, Algorithm 1 enjoys the following tracking error bound DIST-ERR e O (1 + WT ) , without knowing the strong monotonicity µ and gradient variance WT . With Theorem 2, our algorithm thus achieves DIST-ERR e O 1 + min n T 1/3P 2/3 T , WT o , without knowing strong monotonicity µ, gradient variance WT , and path length PT . Note that path length PT and gradient variance WT are generally incomparable. Thus both measures have their own merit, and our algorithm achieves the best of both worlds. Interested readers can refer to Appendix C of Zhang et al. (2022c) for more detailed discussions about their relationship in the simpler two-player zero-sum games. Finally, in Corollary 1 below, we show that in common interest strongly monotone games (defined in Section 2.3), our results in Theorem 3 also hold for the more fundamental utility tracking error. The proof is deferred to Appendix C.4. Fast Rates in Time-Varying Strongly Monotone Games Corollary 1. In common interest strongly monotone games, under the same assumptions of Theorem 3, Algorithm 1 enjoys the same guarantees for the utility tracking error. 4. 
Faster Rates in Smooth Games In this section, we investigate the possibility of obtaining even faster rates in time-varying strongly monotone games. 4.1. Leveraging RVU Property In multi-player finite games, an essential property for fastrate convergences is the Regret bounded by Variation in Utilities (RVU) (Syrgkanis et al., 2015), restated as follows. Definition 2 (RVU Property). An online algorithm satisfies the RVU property with parameters α > 0 and 0 < β γ, if its regret P t [T ] gt, wt u on any loss sequence {gt}T t=1 with respect to any comparator u is bounded by α + β P t [T ] gt gt 1 2 γ P t [T ] wt wt 1 2. Recent progress of online learning shows that the RVU property is satisfied by the well-known optimistic online gradient descent (OOGD) (Rakhlin & Sridharan, 2013b), shown as follows in our notations: xt,i = ΠXi[bxt,i ηimt,i], bxt+1,i = ΠXi[bxt,i ηivt,i(xt)], where the intermediate decision bxt,i updates using the true gradient vt,i(xt) and the submitted decision xt,i is generated from bxt,i using an optimism mt,i, which can be viewed as an optimistic estimation of the current gradient vt,i(xt). Due to the property of finite games (e.g., Theorem 4 of Syrgkanis et al. (2015)), the gradient variation of the i-th player (i.e., gt gt 1 2) can be bounded by the summation of other players switching costs (i.e., P j =i xt,j xt 1,j 2), with which the summation of all players regret brings cancellations and can thus achieve faster rates. Zhang et al. (2022c) generalize the RVU property to the time-varying setting, but only for zero-sum finite games. To achieve faster rates in our problem, we leverage RVU by exploiting the smoothness of the utility functions. 3 Assumption 3 (Smoothness). The utility gradient vt(x) is L-Lipschitz continuous: vt(x) vt(y) L x y . We remark that Assumption 3 is mild since similar assumptions naturally exist in matrix games as long as the matrix norm is bounded (Kangarshahi et al., 2018). In the following, we further generalize the RVU property to time-varying 3In literature, smooth game (Roughgarden, 2009) also has another meaning: a game is (a, b)-smooth if there exists a joint decision x such that PN i=1 ui(x i ; x i) a PN i=1 ui(x ) + b PN i=1 ui(x) for any x. In words, any player using their optimal strategy continues to do well irrespective of other players. multi-player continuous games, as shown below in Lemma 1. And the proof is deferred to Appendix D.1. Lemma 1. Under Assumptions 1-3, if the i-th player runs OOGD with optimism mt,i = vt 1,i(xt 1) and step size ηi, the regret of the i-th player, P t [T ] vt,i(xt), xt,i x t,i , can be non-asymptotically bounded by ηi + ηi(1 + VT ) + ηi ηi Si, (4.1) where Sj PT t=2 xt,j xt 1,j 2 is the cumulative switching cost of the j-th player, VT denotes the gradient variation and PT,i represents the path length of the i-th player. With specific choices of the step sizes {ηi}N i=1, the summation of the last two terms in (4.1) can be non-positive, with which the summation of all players regret can achieve faster rates that only depend on the gradient variation VT and path length PT . The distance tracking error can be consequently guaranteed due to DIST-ERR P t [T ] vt,i(xt), xt,i x t,i . Besides, using the similar analysis in Section 3.4, we show that OOGD can also take advantage of small gradient variance WT . Theorem 4 informally guarantees the tracking error in various problemdependent quantities, which matches the best known O(1) bound in time-invariant games. 
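A minimal sketch of the OOGD update described above, with the optimism set to the previous round's gradient as in Lemma 1. The ball projection stands in for $\Pi_{\mathcal{X}_i}$ and the step size is left as a free parameter; this illustrates the update rule, not the fully tuned procedure.

```python
import numpy as np

def project(x, radius=1.0):
    """Euclidean projection onto a ball, standing in for Pi_{X_i}."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

class OOGD:
    """Optimistic online gradient descent for one player:
      x_t         = Pi[ x_hat_t - eta * m_t ]            (submitted decision)
      x_hat_{t+1} = Pi[ x_hat_t - eta * v_{t,i}(x_t) ]   (intermediate iterate)
    with optimism m_t = v_{t-1,i}(x_{t-1})."""

    def __init__(self, dim, eta):
        self.eta = eta
        self.x_hat = np.zeros(dim)   # intermediate decision
        self.m = np.zeros(dim)       # optimistic guess of the coming gradient

    def decide(self):
        return project(self.x_hat - self.eta * self.m)

    def update(self, grad):
        self.x_hat = project(self.x_hat - self.eta * grad)
        self.m = grad                # next round's optimism is this round's gradient
```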
The corresponding formal version and proof can be found in Appendix D.2. Theorem 4 (informal). Under Assumptions 1-3, if the i-th player runs OOGD with mt,i = vt 1,i(xt 1), the distance tracking error enjoys O( p (1 + VT + PT )(1 + PT )) and O(1+WT ) guarantees respectively with different step sizes. Note that the algorithm requires inputs of path length PT , gradient variation VT , and gradient variance WT . Remark 3. Although the gradient variance WT is an upper bound of the variation VT , as shown in (D.6), our theoretical guarantees of O( p (1 + VT + PT )(1 + PT )) and O(1 + WT ) are in general incomparable. Remark 4. Theorem 4 obtains faster rates than Theorem 2 (without smoothness). For example, in time-invariant games, the O(1) tracking error induced by Theorem 4 improves the earlier e O(1) by log T factors. Another example is the Sswitch games, where Theorem 4 implies an O(1+S) bound, faster than e O(1 + T 1/3S2/3) of Theorem 2 since S T. 4.2. Best-of-Both-Worlds Rates Theorem 4 obtains faster tracking error guarantees than those without smoothness. However, different bounds require different step size setups, which even depend on the unknown problem-dependent quantities. In this part, we address this issue by designing a single algorithm that attains a best-of-both-worlds tracking error guarantee. Fast Rates in Time-Varying Strongly Monotone Games Algorithm 2 TV-SMOG (smooth) for the i-th player Input: step size pool Hη i (D.18) Initialize: M instances of OOGD A1, . . . , AM with step size ηi,j Hη i for t = 1, . . . , T do Receive xt,i,j from Aj Submit xt,i = PM j=1 pt,i,jxt,i,j Receive gradient feedback vt,i(xt) Construct loss ℓt,i and optimism mt,i via (4.3) Update to pt+1,i via (4.2) with learning rate εt,i (D.7) for j = 1, . . . , M do Aj updates to xt+1,i,j with optimism vt 1,i(xt 1) end end To this end, we adopt the two-layer online ensemble framework to facilitate the algorithm with different kinds of adaptivity. Taking the i-th player as an example, the decision is given by xt,i = P j [M] pt,i,jxt,i,j, where pt,i (pt,i,1, . . . , pt,i,M) M denotes the weight returned by the meta learner and the j-th base learner updates her own decision xt,i,j via OOGD with step size ηi,j. However, one caveat is that the cancellations from Lemma 1 fail. To see this, for the j-th base learner, notice that the last two terms of (4.1) become ηi,j PN j=1 Sj Si,j/ηi,j, where Sj PT t=2 xt,j xt 1,j 2 denotes the cumulative switching cost of the final decision of the i-th player. However, the negative term relies on Si,j PT t=2 xt,i,j xt 1,i,j 2, the switching cost of the j-th base learner, and thus cannot be leveraged to cancel the positive switching cost. To handle this, inspired by the work of Zhao et al. (2021) on the gradient-variation guarantees in non-stationary environments, we observe that the switching cost of the i-th player can be bounded as xt,i xt 1,i 2 pt,i pt 1,i 2 1 + P j [M] pt,i,j xt,i,j xt 1,i,j 2 (interested readers can refer to (D.10) for a detailed derivation). The first term pt,i pt 1,i 2 1 can be canceled by adopting optimistic Hedge (Rakhlin & Sridharan, 2013a) as the meta algorithm (see Lemma 7 for the negative term in the analysis): pt+1,i,j exp s=1 ℓs,i,j + mt+1,i,j where the learning rate εt,i, the optimism mt,i (mt,i,1, . . . , mt,i,M) and the loss ℓt,i (ℓt,i,1, . . . , ℓt,i,M) of the meta learner are to be specified later. 
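For concreteness, a generic sketch of one step of the optimistic Hedge meta update used above; the loss and optimism vectors passed in are the correction-augmented ones specified next in (4.3), and the subtraction of the minimum is only for numerical stability.

```python
import numpy as np

def optimistic_hedge_step(cum_loss, loss_t, optimism_next, eps):
    """One optimistic Hedge step for the meta learner:
        p_{t+1,j}  proportional to  exp(-eps * (sum_{s<=t} loss_{s,j} + m_{t+1,j})).
    `loss_t` and `optimism_next` are the loss and optimism vectors of the base
    learners (with correction terms); `cum_loss` accumulates past losses."""
    cum_loss = cum_loss + loss_t
    score = cum_loss + optimism_next
    w = np.exp(-eps * (score - score.min()))   # shift for numerical stability
    return w / w.sum(), cum_loss
```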
To illustrate the high-level solution to P j [M] pt,i,j xt,i,j xt 1,i,j 2, we consider a simpler problem with regret P t [T ] ℓt, pt P t [T ] ℓt,i . If we instead optimize the biased loss vector ℓt + bt and obtain a bound of RT , the original regret is at most RT i [N] pt,ibt,i + P t [T ] bt,i , where the negative term of P i [N] pt,ibt,i is useful. For our purpose, we set the bias term as λ xt,i,j xt 1,i,j 2 (λ to be specified), the switching cost of the j-th base learner. The loss vector ℓt,i and optimism mt,i are set accordingly as ℓt,i,j vt,i(xt), xt,i,j + λ xt,i,j xt 1,i,j 2, (4.3) mt,i,j vt 1,i(xt 1), xt,i,j + λ xt,i,j xt 1,i,j 2. Our method, TV-SMOG (smooth), is summarized in Algorithm 2. Theorem 5 gives the theoretical guarantees in terms of multiple problem-dependent quantities. Besides, the individual regret of each player (i.e., P t [T ] vt,i(xt), xt,i x t,i ), when all players agree to run the same algorithm distributedly, can also achieve faster rates. A formal version and the proof are deferred to Appendix D.3. Theorem 5 (informal). Under Assumptions 1-3, Algorithm 2 guarantees a distance tracking error bound of (1 + VT + PT )(1 + PT ), 1 + WT }). Simultaneously, the individual regret of each player is bounded by O(1) in the time-invariant case. Although our solution draws inspiration from Zhang et al. (2022c), we additionally exploit the structure of strongly monotone games and can therefore guarantee the tracking error, while their method only enjoys implicit measures of the distance to Nash equilibriums. Finally, Corollary 2 considers common interest strongly monotone games (defined in Section 2.3) and shows that the same rates hold for utility tracking error. Besides, our algorithm can take advantage of the small loss FT PT t=1 ut(x t ), which is at most O(T) but can be much smaller in benign scenarios. The proof is in Appendix D.4. Corollary 2. In common interest strongly monotone games, under the assumptions of Theorem 5, Algorithm 2 enjoys an O(min{ p (1 + min{VT , FT } + PT )(1 + PT ), 1 + WT }) best-of-three-worlds utility tracking error guarantee. The small loss FT is orthogonal to other problem-dependent measures, because it can be positive in the time-invariant case and also zero in time-varying games. Therefore, our algorithm can exploit the merit of different aspects of the environments and achieve a best-of-three-worlds guarantee. 5. Experiment This section provides empirical evaluations of our proposed method in time-varying strongly monotone games. Contenders. We compare our method, TV-Smog (Algorithm 1) and TV-Smog (Smooth) (Algorithm 2) with three contenders: (i) OGD does not consider the changing of Fast Rates in Time-Varying Strongly Monotone Games games and runs simple online gradient descent; (ii) Restart is the method of Duvocelle et al. (2023), which knows the path length PT in advance; and (iii) Ader is our Algorithm 3 proposed in Section 3.2, which is agnostic about the path length PT , but does not exploit the strong monotonicity. Game Setups. Our game setups are mostly inspired by Lin et al. (2022), and we modify them to be time-varying to adapt to our problem. We investigate three kinds of timevarying strongly monotone games, including distributed ℓ2-regularized logistic regression, Cournot competition, and strongly convex-concave zero-sum games. In the following, we introduce the aforementioned setups respectively. 
ℓ2-regularized logistic regression is an important model in machine learning, where the performance is measured by the ℓ2-regularized logistic loss $u_t(x) \triangleq \log(1 + \exp(-b_t a_t^\top x)) + \mu \lVert x\rVert_2^2$, where $a_t \in \mathbb{R}^N$ and $b_t \in \{-1, +1\}$ are chosen from a dataset $\{a_t, b_t\}_{t=1}^T$, and $\mu$ is the parameter of the regularization term that prevents overfitting. In applications, the problem dimension (i.e., $N$) may be extremely large, so distributed computation is usually considered by modeling the problem as a common interest game; see Gopal & Yang (2013) for an example. Here we use four LIBSVM datasets to initialize the logistic loss. We set $\mu = 0.005$, i.e., $u_t$ is 0.01-strongly monotone. The logistic loss is also smooth as long as $a_t$ is bounded.

In the Cournot competition model, there are $N$ firms, each supplying the market with a quantity $x_i \in [0, R]$ of some good up to the firm's production capacity. This good is then priced as a decreasing function $P(x)$ of the total supply, as determined by all firms' production. We focus on the standard linear model: $P(x) \triangleq a - b \sum_{i=1}^{N} x_i$, where $a = 0.5$ and $b = 1/(NR)$. The loss (i.e., negative profit) of the $i$-th firm is given by $u_{t,i}(x) \triangleq (c_{t,i} - P(x))\, x_i$, where $c_{t,i} \in [0, 1]$ is the time-varying marginal production cost of the $i$-th firm at the $t$-th round. This game is $b$-strongly monotone and $2b$-smooth.

We also consider time-varying zero-sum strongly convex-concave games with utility $u_t(x, y) \triangleq \frac{1}{2}\lVert x\rVert^2 + x^\top A_t y - \frac{1}{2}\lVert y\rVert^2$, which is 1-strongly monotone and 1-smooth.

Results. We report average results with standard deviations over 5 independent runs, where the only randomness comes from the initial decisions. All hyper-parameters are set to their theoretically optimal values. Figure 1 plots the tracking error of all methods; a smaller tracking error indicates better performance. The results show the superiority of our TV-Smog and TV-Smog (Smooth), supporting our theoretical results.

Figure 1: Tracking error of all methods (OGD, Restart, Ader, TV-Smog, and TV-Smog (Smooth)) in four time-varying distributed ℓ2-regularized logistic regression tasks, named a1a, mushrooms, splice, and svmguide3, a time-varying Cournot competition game named Cournot, and a time-varying strongly convex-concave zero-sum game named zero-sum.

6. Conclusion and Future Directions

This work presents an initial solution for time-varying strongly monotone games. We develop novel decentralized online algorithms and establish a series of fast tracking error rates, in both the non-smooth and smooth cases. Our results match the best-known time-invariant bounds and enjoy many favorable properties. We also investigate the more specialized common interest strongly monotone games and show that our bounds also hold in this setup with respect to the more fundamental utility tracking error. One major future direction is to explore the problem's lower bound. Another interesting question is to leverage the strong convexity induced by strong monotonicity to further improve the current results in the smooth case.
Acknowledgements This research was supported by National Key R&D Program of China (2022ZD0114800), NSFC (61921006, 62206125), and Fundamental Research Funds for the Central Universities (2023300246). The authors thank the ICML area chair for pointing out that the utility tracking error may only apply to common interest games rather than general games, which has been corrected in the camera-ready version. We are also grateful to Mengxiao Zhang for the helpful discussions. Fast Rates in Time-Varying Strongly Monotone Games Abernethy, J. D., Bartlett, P. L., Rakhlin, A., and Tewari, A. Optimal stragies and minimax lower bounds for online convex games. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pp. 415 424, 2008a. Abernethy, J. D., Hazan, E., and Rakhlin, A. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference Computational Learning Theory (COLT), pp. 263 274, 2008b. Baby, D. and Wang, Y. Optimal dynamic regret in exp-concave online learning. In Proceedings of the 34th Annual Conference Computational Learning Theory (COLT), pp. 359 409, 2021. Baby, D. and Wang, Y. Optimal dynamic regret in proper online learning with strongly convex losses and beyond. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1805 1845, 2022. Beck, A. First-order Methods in Optimization. SIAM, 2017. Besbes, O., Gur, Y., and Zeevi, A. J. Non-stationary stochastic optimization. Operations Research, 63(5):1227 1244, 2015. Bravo, M., Leslie, D. S., and Mertikopoulos, P. Bandit learning in concave N-person games. In Advances in Neural Information Processing Systems 31 (Neur IPS), pp. 5666 5676, 2018. Bregman, L. and Fokin, I. Methods of determining equilibrium situations in zero-sum polymatrix games. Optimizatsia, 40(57):70 82, 1987. Cai, Y., Candogan, O., Daskalakis, C., and Papadimitriou, C. Zero-sum polymatrix games: A generalization of minmax. Mathematics of Operations Research, 41(2): 648 655, 2016. Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press, 2006. Cutkosky, A. Parameter-free, dynamic, and stronglyadaptive online learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), pp. 2250 2259, 2020. Daniely, A., Gonen, A., and Shalev-Shwartz, S. Strongly adaptive online learning. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 1405 1411, 2015. Daskalakis, C. and Papadimitriou, C. H. On a network generalization of the minmax theorem. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP), pp. 423 434, 2009. Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. H. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195 259, 2009. Daskalakis, C., Deckelbaum, A., and Kim, A. Near-optimal no-regret algorithms for zero-sum games. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 235 254, 2011. d Oro, S., Mertikopoulos, P., Moustakas, A. L., and Palazzo, S. Interference-based pricing for opportunistic multicarrier cognitive radio systems. IEEE Transactions on Wireless Communications, 14(12):6536 6549, 2015. Drusvyatskiy, D., Fazel, M., and Ratliff, L. J. Improved rates for derivative free gradient play in strongly monotone games. In Proceedings of the 61st IEEE Conference on Decision and Control (CDC), pp. 3403 3408, 2022. 
Duvocelle, B., Mertikopoulos, P., Staudigl, M., and Vermeulen, D. Multi-agent online learning in time-varying games. Mathematics of Operations Research, 48(2):914 941, 2023. Even-Dar, E., Mansour, Y., and Nadav, U. On the convergence of regret minimization dynamics in concave games. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC), pp. 523 532, 2009. Facchinei, F. and Kanzow, C. Generalized Nash equilibrium problems. Annals of Operations Research, 175(1):177 211, 2010. Facchinei, F. and Pang, J.-S. Finite-dimensional Variational Inequalities and Complementarity Problems. Springer, 2003. Flaxman, A., Kalai, A. T., and Mc Mahan, H. B. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 385 394, 2005. Freund, Y. and Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119 139, 1997. Freund, Y. and Schapire, R. E. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79 103, 1999. Fast Rates in Time-Varying Strongly Monotone Games Golowich, N., Pattathil, S., and Daskalakis, C. Tight lastiterate convergence rates for no-regret learning in multiplayer games. In Advances in Neural Information Processing Systems 33 (Neur IPS), 2020. Gopal, S. and Yang, Y. Distributed training of large-scale logistic models. In Proceedings of the 30th International Conference on Machine Learning (ICML), pp. 289 297, 2013. Hazan, E. and Seshadhri, C. Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity, TR07-088, 2007. Hazan, E., Agarwal, A., and Kale, S. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169 192, 2007. Hsieh, K., Phanishayee, A., Mutlu, O., and Gibbons, P. B. The non-iid data quagmire of decentralized machine learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), pp. 4387 4398, 2020. Kangarshahi, E. A., Hsieh, Y., Sahin, M. F., and Cevher, V. Let s be honest: An optimal no-regret framework for zero-sum games. In Proceedings of the 35th International Conference on Machine Learning (ICML), pp. 2493 2501, 2018. Kapralov, M. and Panigrahy, R. Prediction strategies without loss. In Advances in Neural Information Processing Systems 24 (NIPS), pp. 828 836, 2011. Lin, T., Zhou, Z., Mertikopoulos, P., and Jordan, M. I. Finitetime last-iterate convergence for multi-agent learning in games. In Proceedings of the 37th International Conference on Machine Learning (ICML), pp. 6161 6171, 2020. Lin, T., Zhou, Z., Ba, W., and Zhang, J. Doubly optimal no-regret online learning in strongly monotone games with bandit feedback. Ar Xiv preprint, ar Xiv:2112.02856, 2022. Loizou, N., Berard, H., Gidel, G., Mitliagkas, I., and Lacoste-Julien, S. Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity. In Advances in Neural Information Processing Systems 34 (Neur IPS), pp. 19095 19108, 2021. Lu, Z. and Hazan, E. On the computational efficiency of adaptive and dynamic regret minimization. Ar Xiv preprint, ar Xiv:2207.00646, 2022. Mertikopoulos, P. and Moustakas, A. L. Learning in an uncertain world: Mimo covariance matrix optimization with imperfect feedback. IEEE Transactions on Signal Processing, 64(1):5 18, 2015. Monderer, D. and Shapley, L. S. 
Potential games. Games and Economic Behavior, 14(1):124 143, 1996. Nash Jr, J. F. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1): 48 49, 1950. Nemirovski, A., Onn, S., and Rothblum, U. G. Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, pp. 52 78, 2010. Pogodin, R. and Lattimore, T. On first-order bounds, variance and gap-dependent bounds for adversarial bandits. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 894 904, 2019. Rakhlin, A. and Sridharan, K. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference Computational Learning Theory (COLT), pp. 993 1019, 2013a. Rakhlin, A. and Sridharan, K. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems 26 (NIPS), pp. 3066 3074, 2013b. Rosen, J. B. Existence and uniqueness of equilibrium points for concave N-person games. Econometrica: Journal of the Econometric Society, pp. 520 534, 1965. Roughgarden, T. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC), pp. 513 522, 2009. Sentenac, F., Boursier, E., and Perchet, V. Decentralized learning in online queuing systems. In Advances in Neural Information Processing Systems 34 (Neur IPS), pp. 18501 18512, 2021. Srebro, N., Sridharan, K., and Tewari, A. Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems 23 (NIPS), pp. 2199 2207, 2010. Syrgkanis, V., Agarwal, A., Luo, H., Luo, H., and Schapire, R. E. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems 28 (NIPS), pp. 2989 2997, 2015. Tatarenko, T. and Kamgarpour, M. Learning Nash equilibria in monotone games. In Proceedings of the 58th IEEE Conference on Decision and Control (CDC), pp. 3104 3109, 2019a. Tatarenko, T. and Kamgarpour, M. Learning generalized Nash equilibria in a class of convex games. IEEE Transactions on Automatic Control, 64(4):1426 1439, 2019b. Fast Rates in Time-Varying Strongly Monotone Games Tatarenko, T. and Kamgarpour, M. On the rate of convergence of payoff-based algorithms to Nash equilibrium in strongly monotone games. Ar Xiv preprint, ar Xiv:2202.11147, 2022. Tseng, P. On linear convergence of iterative methods for the variational inequality problem. Journal of Computational and Applied Mathematics, 60(1-2):237 252, 1995. van Erven, T. and Koolen, W. M. Metagrad: Multiple learning rates in online learning. In Advances in Neural Information Processing Systems 29 (NIPS), pp. 3666 3674, 2016. Wang, G., Lu, S., and Zhang, L. Adaptivity and optimality: A universal algorithm for online convex optimization. In Proceedings of the 35th Conference on Uncertainty in Artifificial Intelligence (UAI), pp. 659 668, 2019. Zhang, L., Lu, S., and Zhou, Z.-H. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31 (Neur IPS), pp. 1330 1340, 2018. Zhang, L., Jiang, W., Yi, J., and Yang, T. Smoothed online convex optimization based on discounted-normalpredictor. In Advances in Neural Information Processing Systems 35 (Neur IPS), pp. 4928 4942, 2022a. Zhang, L., Wang, G., Yi, J., and Yang, T. A simple yet universal strategy for online convex optimization. In Proceedings of the 39th International Conference on Machine Learning (ICML), pp. 26605 26623, 2022b. 
Zhang, M., Zhao, P., Luo, H., and Zhou, Z.-H. No-regret learning in time-varying zero-sum games. In Proceedings of the 39th International Conference on Machine Learning (ICML), pp. 26772 26808, 2022c. Zhang, Z., Cutkosky, A., and Paschalidis, I. C. Adversarial tracking control via strongly adaptive online learning with memory. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 8458 8492, 2022d. Zhao, P. Online Ensemble Theories and Methods for Robust Online Learning. Ph D thesis, Nanjing University, Nanjing, China, 2021. Advisor: Zhi-Hua Zhou. Zhao, P., Zhang, L., Jiang, Y., and Zhou, Z.-H. A simple approach for non-stationary linear bandits. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 746 755, 2020a. Zhao, P., Zhang, Y.-J., Zhang, L., and Zhou, Z.-H. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33 (Neur IPS), pp. 12510 12520, 2020b. Zhao, P., Zhang, Y.-J., Zhang, L., and Zhou, Z.-H. Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization. Ar Xiv preprint, ar Xiv:2112.14368, 2021. Zhao, P., Xie, Y.-F., Zhang, L., and Zhou, Z.-H. Efficient methods for non-stationary online learning. In Advances in Neural Information Processing Systems 35 (Neur IPS), pp. 11573 11585, 2022. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC Press, 2012. Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 928 936, 2003. Fast Rates in Time-Varying Strongly Monotone Games A. Related Work In this section, we briefly review related works of non-stationary (Appendix A.1) and universal (Appendix A.2) online learning, and monotone games (Appendix A.3), and compare our work with existing time-varying results (Appendix A.4). A.1. Non-Stationary Online Convex Optimization Online convex optimization (OCO), a versatile model which stems from the seminal work of Zinkevich (2003), depicts the learning of a player in adversarial environments, aiming to minimize the game theoretical performance measure called regret (Cesa-Bianchi & Lugosi, 2006): t=1 ft(wt) min w W Zinkevich (2003) proposed the well-known OGD and proved an O( T) regret, which is minimax optimal (Abernethy et al., 2008a). In non-stationary environments, a more appropriate performance measure is dynamic regret (Zinkevich, 2003): D-REG({ut}T t=1) which aims to compete the performance of the online learner with changing comparators. For convex functions, the author proved that OGD enjoys a dynamic regret of O( T(1 + PT )), which is sub-optimal. Zhang et al. (2018) showed that the minimax lower bound is Ω( p T(1 + PT )) and closed the gap by proposing an online ensemble algorithm with an O( p T(1 + PT )) regret. When exploiting the smoothness, Zhao et al. (2020b) developed a novel online ensemble algorithm called Sword that enjoys a best-of-both-worlds problem-dependent dynamic regret bound of order O( p (1 + PT + min{VT , FT })(1 + PT )), where VT PT t=2 supw W ft(w) ft 1(w) 2 measures the variation of the gradients and FT PT t=1 ft(ut) is the cumulative loss of comparators. 
In the subsequent extended version (Zhao et al., 2021), the authors further proposed the Sword++ algorithm, which achieves the same dynamic regret guarantee while improving the per-round gradient complexity from $O(\log T)$ to $1$ via the collaborative online ensemble framework. For exp-concave functions, Baby & Wang (2021) proposed an improper learning method (the decisions can be outside the feasible domain) that enjoys the optimal $\widetilde{O}(1 + T^{1/3}P_T^{2/3})$ dynamic regret guarantee. Later, Baby & Wang (2022) showed that improper learning is not necessary for strongly convex functions and designed a proper learning method with the optimal dynamic regret bound in this case.

Another commonly used performance measure in non-stationary OCO is the adaptive regret (Hazan & Seshadhri, 2007; Daniely et al., 2015), which measures the regret inside an arbitrary interval $I \subseteq [T]$:
$$\text{A-REG}(I) \triangleq \sum_{t \in I} f_t(w_t) - \min_{w \in \mathcal{W}} \sum_{t \in I} f_t(w).$$
Hazan & Seshadhri (2007) first proposed the weakly adaptive regret, $\max_{I \subseteq [T]} \text{A-REG}(I)$, which measures the maximum regret over all possible intervals. They designed a meta algorithm called Follow the Leading History (FLH) and proved an $\widetilde{O}(\sqrt{T})$ adaptive regret with OGD as base learners. Later, Daniely et al. (2015) optimized the adaptive regret $\text{A-REG}(I)$ over an arbitrary interval $I$, which is called the strongly adaptive regret, and proved an $\widetilde{O}(\sqrt{|I|})$ regret via a new meta algorithm called SAOL with OGD as base learners. For exp-concave and strongly convex functions, FLH-OGD (Hazan & Seshadhri, 2007) can achieve the optimal regret bound of $O(\log T)$.

The aforementioned works all choose the weighted combination mechanism (i.e., $w_t = \sum_i p_{t,i} w_{t,i}$) in non-stationary online learning problems. We note that there are other methodologies for combining multiple base learners, for example, techniques from parameter-free online learning (Cutkosky, 2020; Zhang et al., 2022d) and techniques based on the discounted normal predictor (Kapralov & Panigrahy, 2011; Zhang et al., 2022a).

Finally, we mention that the computational efficiency of non-stationary online learning, including dynamic and adaptive regret minimization, has recently been considered (Zhao et al., 2022; Lu & Hazan, 2022). Typical algorithms for non-stationary online learning utilize the online ensemble framework. The overall method requires running multiple base learners (typically $O(\log T)$ of them) and a meta learner simultaneously, which may introduce some computational overheads compared to static regret minimization. Zhao et al. (2022) investigated the projection complexity in non-stationary environments, an important and potentially time-consuming operation for common online learning algorithms such as OGD, and reduced the number of projection operations needed in each round to $1$.

A.2. Universal Online Convex Optimization

When the loss functions are convex, using OGD with a fixed step size $\eta \propto 1/\sqrt{T}$ gives an optimal regret bound of $O(\sqrt{T})$ (Zinkevich, 2003). When the loss functions are $\alpha$-exp-concave, online Newton step enjoys a regret bound of $O(d\log T)$ (Hazan et al., 2007). And when the loss functions are $\sigma$-strongly convex, using the same OGD algorithm but with a time-varying step size $\eta_t = 1/(\sigma t)$ gives an optimal $O(\log T)$ guarantee (Hazan et al., 2007).
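As a concrete illustration of the step-size choices above, the following sketch contrasts the fixed step size used for convex losses with the $\eta_t = 1/(\sigma t)$ schedule for $\sigma$-strongly convex losses; the helper "ogd" and all constants are illustrative placeholders, and the online Newton step for exp-concave losses is omitted.

import numpy as np

def ogd(grads, x0, step_size, project=lambda x: x):
    # Generic projected online gradient descent: x_{t+1} = Pi[x_t - eta_t * g_t],
    # where step_size(t) returns the step size for round t (1-indexed).
    x = np.array(x0, dtype=float)
    for t, grad in enumerate(grads, start=1):
        g = grad(x)
        x = project(x - step_size(t) * g)
        yield x

def convex_schedule(t, D=1.0, G=1.0, T=1000):
    # Convex losses: fixed eta = D / (G * sqrt(T)) yields O(sqrt(T)) static regret.
    return D / (G * np.sqrt(T))

def strongly_convex_schedule(t, sigma=0.5):
    # sigma-strongly convex losses: eta_t = 1 / (sigma * t) yields O(log T) regret.
    return 1.0 / (sigma * t)

For example, list(ogd([lambda w: 2 * w] * 50, x0=np.ones(3), step_size=strongly_convex_schedule)) runs the strongly convex variant on a fixed quadratic gradient field.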
Although the theories of OCO are rich, their application requires heavy domain knowledge: (i) the learner must know the type of the functions in advance in order to select an appropriate algorithm; and (ii) for exp-concave and strongly convex functions, the learner needs the exp-concavity and strong convexity coefficients (i.e., the values of $\alpha$ and $\sigma$) as prior knowledge. Universal online learning aims to remove the above barriers. A milestone is MetaGrad, proposed by van Erven & Koolen (2016). Briefly, MetaGrad is a two-layer algorithm, with multiple base learners running on carefully designed surrogate losses to guess the curvature coefficients and a meta learner tracking the best one on the fly. Later, many improvements and extensions were proposed, which further broaden the application scope of universal methods. Specifically, Wang et al. (2019) improved the results of MetaGrad for strongly convex functions. Zhang et al. (2022b) proposed a simpler universal method with the same theoretical guarantees as previous works, but without the need to design handcrafted surrogate losses.

A.3. Monotone Games

Monotone games form a general class of games satisfying the diagonal strict convexity condition of Rosen (1965). Many common and well-studied classes of games, such as zero-sum poly-matrix games (Bregman & Fokin, 1987; Daskalakis & Papadimitriou, 2009; Cai et al., 2016) and their generalization, zero-sum socially-concave games (Even-Dar et al., 2009), are monotone. Solving the Nash equilibriums of a monotone game is equivalent to solving a variational inequality, for which there is a vast literature; we refer the readers to the work of Facchinei & Pang (2003) for further references. Besides, in the following, we only discuss the convergence of the last iterate, the decision of the last round of the game.

In general monotone games, only asymptotic convergence can be guaranteed. With an additional smoothness assumption, Golowich et al. (2020) obtained a nearly optimal convergence rate in terms of the total function gap. There are many works devoted to studying certain subclasses of monotone games. Specifically, Lin et al. (2020) considered co-coercive games, i.e., $\langle v(x) - v(y), x - y\rangle \ge \mu\|v(x) - v(y)\|^2$, in the unconstrained setting (i.e., $v_t(x^\star) = 0$) and chose $\|v_t(x_t)\|^2$ as the performance measure. Although their setting is more general than ours, the two works cannot be compared directly since their performance measure is also easier: $\|v_t(x_t)\|^2$ is observable while ours, $\|x_t - x_t^\star\|^2$, is not. Besides, Loizou et al. (2021) considered a more general setting with quasi-strong monotonicity and expected co-coercivity, and gave a linear tracking error convergence rate $O(\rho^T)$ for some $\rho \in (0, 1)$ (see Corollary 4.2 therein). We refer readers to Loizou et al. (2021) for the detailed relationship between monotonicity, co-coercivity, and strong monotonicity.

For strongly monotone games, non-asymptotic convergence rates can be established. In the full information setting, an $O(T^{-1})$ last-iterate convergence of the tracking error can be obtained (Bravo et al., 2018). With smoothness (i.e., Lipschitz continuous gradients), this rate can be improved to $O(\rho^T)$ for some $\rho \in (0, 1)$ (Tseng, 1995; Facchinei & Pang, 2003). In the bandit feedback setting, existing works cannot obtain faster rates than the single-player results. Specifically, Bravo et al. (2018) extended the FKM algorithm (Flaxman et al., 2005) to the multi-player setting, and proposed an algorithm with an $O(T^{-1/3})$ distance tracking error. With smoothness, Lin et al.
(2022) leveraged the self-concordant barrier (Abernethy et al., 2008b) and obtained an improved result of $O(T^{-1/2})$, matching the optimal result in the single-player setting. Drusvyatskiy et al. (2022) achieved the same guarantee as Lin et al. (2022), while their result depends on an additional assumption that the Jacobian of each player's gradient is Lipschitz continuous. There is also a different line of research (Tatarenko & Kamgarpour, 2019a;b; 2022) that can obtain the optimal result using Tikhonov (ridge) regularization.

A.4. Time-Varying Games

The study of online time-varying games has started only in recent years, and the corresponding related works are limited. In the following, we directly compare with the two works that are most related to ours.

Comparison with (Zhang et al., 2022c). As for the problem setup, while both their work and ours investigate time-varying games, theirs considers two-player zero-sum games and ours considers multi-player continuous strongly monotone games. As for techniques, in smooth games, the bias term injection technique is partially inspired by theirs, which originally stems from the work of Zhao et al. (2021). However, as mentioned in Section 4.2, the structure of $\|x_{t,i} - x_{t-1,i}\|^2$ naturally arises in our problem and the correction term injection is a well-studied solution to handle this term. We also note that the results of Zhang et al. (2022c) can be directly extended to the multi-player setting, but without tracking error guarantees. The key observation (Proposition 2) and the techniques used in the non-smooth games are different from theirs.

Comparison with (Duvocelle et al., 2023). Both works investigate time-varying strongly monotone games. Theirs obtained a dynamic regret of $O(\sqrt{T} + T^{2/3}P_T^{1/3})$, where $P_T \triangleq \sum_{t=2}^T \|x_t^\star - x_{t-1}^\star\|$ denotes the variation of the Nash equilibriums. The authors argued that this bound is tight according to an $\Omega(T^{2/3}(V_T^f)^{1/3})$ lower bound established by Besbes et al. (2015). We point out that this argument is flawed since the quantity $V_T^f$ is in fact the functional variation: $V_T^f \triangleq \sum_{t=2}^T \sup_{x} |f_t(x) - f_{t-1}(x)|$. Dynamic regret bounds with respect to $P_T$ and $V_T^f$ are in general incomparable. Based on the fact that $O(\sqrt{T} + T^{2/3}P_T^{1/3})$ is far from optimal, we obtain improved results in Section 3. Also, Duvocelle et al. (2023) mentioned that they cannot incorporate strong convexity into this problem. In this work, we show that this is actually possible, as illustrated in Proposition 2.

B. Proof of Proposition 1

Proof. First we prove that strong monotonicity directly implies strong convexity. Specifically, since the game of the $t$-th round is $\mu$-strongly monotone (see Assumption 2), strong monotonicity (see Definition 1) tells that for any $x, y \in \mathcal{X}$,
$$\mu\|x - y\|^2 \le \langle v_t(x) - v_t(y), x - y\rangle,$$
where $v_t(\cdot) = \nabla u_t(\cdot)$. The following result shows that the definition of strong monotonicity is in fact an equivalent condition of strong convexity in common-interest strongly monotone games, which means that the utility function $u_t$ is $\mu$-strongly convex.

Lemma 2 (Theorem 5.24 of Beck (2017)). Let $f$ be a proper closed and convex function. Then for a given $\sigma > 0$, the following three claims are equivalent: (i) $f$ is $\sigma$-strongly convex; (ii) for any $x \in \mathrm{dom}(\partial f)$, $y \in \mathrm{dom}(f)$ and $g \in \partial f(x)$, $f(y) \ge f(x) + \langle g, y - x\rangle + \frac{\sigma}{2}\|y - x\|^2$; and (iii) for any $x, y \in \mathrm{dom}(\partial f)$ and $g_x \in \partial f(x)$, $g_y \in \partial f(y)$, $\langle g_x - g_y, x - y\rangle \ge \sigma\|x - y\|^2$.
Consequently, it holds that
$$\frac{\mu}{2}\,\text{DIST-ERR} = \frac{\mu}{2}\sum_{t=1}^T \|x_t - x_t^\star\|^2 \le \sum_{t=1}^T \langle v_t(x_t^\star), x_t^\star - x_t\rangle + \sum_{t=1}^T \big(u_t(x_t) - u_t(x_t^\star)\big) \le \text{UTIL-ERR},$$
where the first inequality is due to the second property in Lemma 2 and the last inequality is by the definition of the Nash equilibrium $x_t^\star$, i.e., $\langle v_t(x_t^\star), x_t^\star - x\rangle \le 0$ for any $x \in \mathcal{X}$, which completes the proof.

C. Analysis for Section 3

In this section, we provide the detailed analysis of Section 3, including the algorithm and proof of Theorem 1 in Appendix C.1, the proof of Proposition 2 in Appendix C.2, the algorithm and proof of Theorem 2 in Appendix C.3, and the proofs of Theorem 3 and Corollary 1 in Appendix C.4 and Appendix C.5. Finally, we list some useful lemmas in Appendix C.6.

C.1. Algorithm and Proof of Theorem 1

To begin with, we give a different upper bound for the distance tracking error. Specifically, it holds that
$$\mu\sum_{t=1}^T \|x_t - x_t^\star\|^2 \le \sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle = \sum_{t=1}^T\sum_{i=1}^N \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle, \qquad \text{(C.1)}$$
where the first step uses the definition of strong monotonicity and the property of the Nash equilibrium: $\langle v_t(x_t^\star), x_t^\star - x\rangle \le 0$ for any $x \in \mathcal{X}$. In the following, we deploy ADER (Zhang et al., 2018), restated in Algorithm 3 in our notations, to optimize $\sum_{t\in[T]}\langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle$ for each player. Specifically, in the $t$-th round, the algorithm submits a weighted combination of the base learners' decisions as $x_{t,i} = \sum_{j\in[M]} p_{t,i,j}x_{t,i,j}$ and simultaneously receives a gradient feedback $v_{t,i}(x_t)$. Then each base learner updates her own decision to $x_{t+1,i,j}$ using her own step size $\eta_{i,j}$ via OGD. Finally, the meta learner updates her weights to $p_{t+1,i,j}$ using Hedge with learning rate $\varepsilon_i$ and the linearized loss $\langle v_{t,i}(x_t), x_{t,i,j}\rangle$.

Algorithm 3: Deploying ADER (Zhang et al., 2018) for the $i$-th player
Input: learning rate of the meta learner $\varepsilon_i$
Initialize: learning rate of the meta learner $\varepsilon_i = 1/\sqrt{T}$, decisions of all base learners $x_{1,i,j} \in \mathcal{X}_i$, weights of the meta learner $p_{1,i,j} \propto 1/(j^2 + j)$ for all $j \in [M]$, step size pool $\mathcal{H}_i^{\eta} \triangleq \{\eta_{i,j} = 2^{j-1}D\sqrt{7/(2T)} \mid j \in [M]\}$
for $t = 1, \ldots, T$ do
  Receive $x_{t,i,j}$ from the $j$-th base learner
  Submit $x_{t,i} = \sum_{j=1}^M p_{t,i,j}x_{t,i,j}$ and receive a gradient feedback $v_{t,i}(x_t)$
  Meta update: update the weights via $p_{t+1,i,j} \propto p_{t,i,j}\exp(-\varepsilon_i\langle v_{t,i}(x_t), x_{t,i,j}\rangle)$
  for $j = 1, \ldots, M$ do
    Base update: update via $x_{t+1,i,j} = \Pi_{\mathcal{X}_i}[x_{t,i,j} - \eta_{i,j}v_{t,i}(x_t)]$
  end
end

In the following, we provide the proof of Theorem 1.

Proof. The proof is direct by bounding the summation of all players' regret bounds and leveraging the theoretical guarantee of ADER (deferred to Lemma 3 in Appendix C.6):
$$\sum_{t=1}^T\sum_{i=1}^N \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \lesssim \sum_{i=1}^N \sqrt{T(1 + P_{T,i})} \le N\sqrt{T(1 + P_T)}, \qquad \text{(by Lemma 3)}$$
where $a \lesssim b$ means $a = \widetilde{O}(b)$, $P_{T,i} \triangleq \sum_{t=2}^T \|x_{t,i}^\star - x_{t-1,i}^\star\|$ denotes the path length of the Nash equilibriums of the $i$-th player, and the last step is true because
$$\sum_{i=1}^N \|x_{t,i}^\star - x_{t-1,i}^\star\| \le \sqrt{N\sum_{i=1}^N \|x_{t,i}^\star - x_{t-1,i}^\star\|^2} = \sqrt{N}\,\|x_t^\star - x_{t-1}^\star\|.$$
Leveraging (C.1) finishes the proof.

C.2. Proof of Proposition 2

Proof. Through the definition of strong monotonicity (Definition 1) and moving $\frac{\mu}{2}\sum_{t=1}^T \|x_t - x_t^\star\|^2$ to the right-hand side, we obtain
$$\frac{\mu}{2}\sum_{t=1}^T \|x_t - x_t^\star\|^2 \le \sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_t - x_t^\star\|^2.$$
Multiplying both sides by 2, it is easy to find that
$$\mu\sum_{t=1}^T \|x_t - x_t^\star\|^2 \le 2\bigg(\sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_t - x_t^\star\|^2\bigg) = 2\sum_{i=1}^N\bigg(\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2\bigg) \le 2\sum_{i=1}^N\bigg(\sum_{t=1}^T \ell_{t,i}(x_{t,i}) - \sum_{t=1}^T \ell_{t,i}(x_{t,i}^\star)\bigg),$$
where although the second step only holds for the $\ell_2$-norm, it can be extended to arbitrary norms up to constant factors. The last step holds by the definition of the strongly convex loss function (3.2), restated as follows:
$$\ell_{t,i}(x) \triangleq \langle v_{t,i}(x_t), x\rangle + \frac{\mu}{2}\|x - x_{t,i}\|^2,$$
which finishes the proof.

Algorithm 4: FLH-OGD (Hazan & Seshadhri, 2007)
Input: feasible domain $\mathcal{W}$, learning rate of the meta learner $\varepsilon$, loss functions $\{h_t(\cdot)\}_{t=1}^T$
Initialize: $T$ instances $\mathcal{A}_1, \ldots, \mathcal{A}_T$ of OGD
for $t = 1, \ldots, T$ do
  Receive $w_{t,j}$ from $\mathcal{A}_j$ for all $j \in [t]$
  Submit the decision $w_t = \sum_{j=1}^t p_{t,j}w_{t,j}$ and receive the loss function $h_t$
  Meta update: set $\widehat{p}_{t+1,t+1} = 0$ and update $\widehat{p}_{t+1,j} = \frac{p_{t,j}\exp(-\varepsilon h_t(w_{t,j}))}{\sum_{j=1}^t p_{t,j}\exp(-\varepsilon h_t(w_{t,j}))}$ for $j \in [t]$
  Meta update: obtain $p_{t+1,t+1} = 1/(t+1)$ and $p_{t+1,j} = (1 - (t+1)^{-1})\,\widehat{p}_{t+1,j}$ for $j \in [t]$
  for $j = 1, \ldots, t+1$ do
    Base update: update $w_{t,j}$ to $w_{t+1,j}$ via $\mathcal{A}_j$ with the loss function $h_t$
  end
end

C.3. Algorithm and Proof of Theorem 2

Before giving the proof of Theorem 2, we explain the complete algorithms of this part. Briefly, the overall algorithms are summarized in Algorithm 1, Algorithm 4, Algorithm 5, and Algorithm 6. Among them, Algorithm 1 is the top algorithm, which aims to deal with the unknown strong monotonicity parameter $\mu$ and takes Algorithm 6 as base learners. Algorithm 6 aims to handle general convex feasible domains and loss functions with a known strong convexity parameter and obtains its decisions from Algorithm 5. Algorithm 5 works on box domains and runs Algorithm 4 on each dimension. Algorithm 4 is FLH-OGD (Hazan & Seshadhri, 2007), an adaptive regret minimization algorithm which runs FLH as the meta learner and multiple OGDs as the base learners. Algorithm 1, Algorithm 5, and Algorithm 6 are illustrated in the language of online multi-player games, while Algorithm 4 is summarized in general online convex optimization notations.

Algorithm 5: Known strong monotonicity parameter $\mu$ and box domain (Baby & Wang, 2022)
Input: box domain $\mathcal{X}_i$ with diameter $B$, gradient upper bound $G$, strong monotonicity parameter $\mu$, dimension $d$, loss functions $\{g_t(\cdot)\}_{t=1}^T$
Initialize: $d$ FLH-OGD instances (Algorithm 4) $\mathcal{A}_1, \ldots, \mathcal{A}_d$ with feasible domain $[-B, B]$ and learning rate $\varepsilon_k = \frac{1}{2(2B + G/\mu)^2}$
for $t = 1, \ldots, T$ do
  Receive $x_{t,i}^{(k)}$ from $\mathcal{A}_k$ for $k \in [d]$
  Submit $x_{t,i} \triangleq (x_{t,i}^{(1)}, \ldots, x_{t,i}^{(d)})$ and receive the loss function $g_t$
  Construct the surrogate losses $h_t^{(k)}(x) \triangleq \big(x - (x_{t,i}^{(k)} - \nabla g_t(x_{t,i})^{(k)}/\mu)\big)^2$
  for $k = 1, \ldots, d$ do
    Update $x_{t,i}^{(k)}$ to $x_{t+1,i}^{(k)}$ via $\mathcal{A}_k$ with the loss function $h_t^{(k)}$
  end
end

Algorithm 6: Known strong monotonicity parameter $\mu$ (Baby & Wang, 2022)
Input: domain $\mathcal{X}_i$, gradient upper bound $G$, strong monotonicity parameter $\mu$, loss functions $\{s_t(\cdot)\}_{t=1}^T$
Initialize: the tightest box $\widehat{\mathcal{X}}_i$ that circumscribes $\mathcal{X}_i$, an instance $\mathcal{A}$ of Algorithm 5 with feasible set $\widehat{\mathcal{X}}_i$ and gradient upper bound $2G$
for $t = 1, \ldots, T$ do
  Receive $\widehat{x}_{t,i}$ from $\mathcal{A}$
  Submit $x_{t,i} = \Pi_{\mathcal{X}_i}[\widehat{x}_{t,i}] = \arg\min_{x\in\mathcal{X}_i}\|x - \widehat{x}_{t,i}\|_1$ and receive the gradient feedback $v_{t,i}(x_t)$
  Construct the surrogate loss $g_t(x) \triangleq s_t(x) + G\|x - \Pi_{\mathcal{X}_i}(x)\|_1$
  Update $\widehat{x}_{t,i}$ to $\widehat{x}_{t+1,i}$ via $\mathcal{A}$ with the loss function $g_t$ and strong monotonicity parameter $\mu$
end

In the following, we give the proof of Theorem 2.

Proof. To begin with, we decompose the regret on the surrogate loss function $s_{t,i,j}$ with respect to its $j$-th base learner:
$$\sum_{t=1}^T s_{t,i,j}(x_{t,i}) - \sum_{t=1}^T s_{t,i,j}(x_{t,i}^\star) = \underbrace{\sum_{t=1}^T s_{t,i,j}(x_{t,i}) - \sum_{t=1}^T s_{t,i,j}(x_{t,i,j})}_{\text{META-REG}} + \underbrace{\sum_{t=1}^T s_{t,i,j}(x_{t,i,j}) - \sum_{t=1}^T s_{t,i,j}(x_{t,i}^\star)}_{\text{BASE-REG}}.$$
As for the meta regret, we can reuse the result of Wang et al. (2019), restated in Lemma 4, since we use the same meta algorithm therein. It remains to bound the base regret. Denoting by $\lambda_j, G_j$ the strong convexity parameter and the gradient upper bound of the surrogate loss $s_{t,i,j}$, its gradient satisfies $\|\nabla s_{t,i,j}(\cdot)\| = \|a_jv_{t,i}(x_t) + 2a_j^2G^2(\cdot - x_{t,i})\| \le a_jG + 2a_j^2G^2D$ and its strong convexity parameter satisfies $\nabla^2 s_{t,i,j}(\cdot) \succeq 2a_j^2G^2 I$. For simplicity, we define $G_j \triangleq a_jG + 2a_j^2G^2D$ and $\lambda_j \triangleq 2a_j^2G^2$. Following Lemma 5 (about the base regret with a known strong convexity coefficient), and denoting by $\mathrm{TV}_{T,i} \triangleq \sum_{t=2}^T \|x_{t,i}^\star - x_{t-1,i}^\star\|_1$ the total variation of the comparator sequence $\{x_{t,i}^\star\}_{t=1}^T$, it holds that
$$\sum_{t=1}^T s_{t,i,j}(x_{t,i,j}) - \sum_{t=1}^T s_{t,i,j}(x_{t,i}^\star) \lesssim \frac{G_j^2}{\lambda_j}\big(1 + T^{1/3}\mathrm{TV}_{T,i}^{2/3}\big) \lesssim 1 + T^{1/3}P_{T,i}^{2/3},$$
where $a \lesssim b$ means $a = \widetilde{O}(b)$, and the last step is due to the fact that $a_j \le \frac{1}{3DG}$, which leads to $G_j^2 = O(\lambda_j)$, and the relationship between the $\ell_1$-norm and the $\ell_2$-norm: $\|x\|_1 \le \sqrt{d}\|x\|_2$ for any $x \in \mathbb{R}^d$. Denoting by $\mathrm{REG} \triangleq \text{META-REG} + \text{BASE-REG}$, we obtain a problem-dependent bound for $\sum_{t\in[T]}\langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle$:
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \frac{\mathrm{REG}}{a_j} + a_jG^2\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2 \lesssim \mathrm{REG} + G\sqrt{\mathrm{REG}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2},$$
where the last step is by considering whether the optimal value of $a_j = \sqrt{\mathrm{REG}\big/\big(G^2\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2\big)}$ is covered by the candidate pool $\mathcal{H} = \{a_j \mid a_j = 2^{-j}/(3DG),\ j \in [M]\}$ with $M = \lceil\frac{1}{2}\log_2 T\rceil + 1$. Finally, the $i$-th player's regret can be bounded by
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2 \lesssim \mathrm{REG} + G\sqrt{\mathrm{REG}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2} - \frac{\mu}{2}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2 \lesssim \Big(1 + \frac{1}{\mu}\Big)\mathrm{REG},$$
where the last step is due to $xy - ax^2 \le \frac{y^2}{4a}$ for any $x, y, a > 0$. Combining the regret bounds of all players, we obtain
$$\mu\,\text{DIST-ERR} \le 2\sum_{i=1}^N\bigg(\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_{t,i} - x_{t,i}^\star\|^2\bigg) \lesssim \sum_{i=1}^N \big(1 + T^{1/3}P_{T,i}^{2/3}\big),$$
and hence $\text{DIST-ERR} \lesssim \frac{N}{\mu}\big(1 + T^{1/3}P_T^{2/3}\big)$, which finishes the proof.

C.4. Proof of Theorem 3

Before the analysis of this part, we denote by $\bar{x}_T^\star$ the Nash equilibrium of the averaged game from $t = 1$ to $T$, whose utility gradient is $\bar{v}_T(\cdot) \triangleq \sum_{t\in[T]} v_t(\cdot)/T$. Note that the averaged Nash equilibrium $\bar{x}_T^\star$ must exist and is unique because: (i) the average of multiple strongly monotone games is still strongly monotone, due to Definition 1; and (ii) a strongly monotone game admits a unique Nash equilibrium (Rosen, 1965). In the following, we give the detailed proof of Theorem 3.

Proof. The distance tracking error can be decomposed with the averaged Nash equilibrium $\bar{x}_T^\star$ as an intermediate variable:
$$\mu\,\text{DIST-ERR} = \mu\sum_{t=1}^T \|x_t - x_t^\star\|^2 \le 2\mu\sum_{t=1}^T \|x_t - \bar{x}_T^\star\|^2 + 2\mu\sum_{t=1}^T \|\bar{x}_T^\star - x_t^\star\|^2 \triangleq 2\,\text{TERM (A)} + 2\,\text{TERM (B)}.$$
To bound TERM (A), through the definition of strong monotonicity (Definition 1), it holds that
$$\mu\sum_{t=1}^T \|x_t - \bar{x}_T^\star\|^2 \le \sum_{t=1}^T \langle v_t(x_t) - v_t(\bar{x}_T^\star), x_t - \bar{x}_T^\star\rangle.$$
Moving $\frac{\mu}{2}\sum_{t=1}^T \|x_t - \bar{x}_T^\star\|^2$ to the right-hand side and multiplying both sides by 2, we obtain
$$\text{TERM (A)} = \mu\sum_{t=1}^T \|x_t - \bar{x}_T^\star\|^2 \le 2\bigg(\sum_{t=1}^T \langle v_t(x_t), x_t - \bar{x}_T^\star\rangle - \frac{\mu}{2}\sum_{t=1}^T \|x_t - \bar{x}_T^\star\|^2\bigg) + 2\sum_{t=1}^T \langle \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x_t\rangle + 2\sum_{t=1}^T \langle v_t(\bar{x}_T^\star) - \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x_t\rangle \le 2\sum_{i=1}^N\bigg(\sum_{t=1}^T \ell_{t,i}(x_{t,i}) - \sum_{t=1}^T \ell_{t,i}(\bar{x}_{T,i}^\star)\bigg) + 2DW_T \le \widetilde{O}(1) + O(DW_T), \qquad \text{(by using Theorem 2 with } P_{T,i} = 0\text{)}$$
where the second inequality is due to the definition of the surrogate loss function $\ell_{t,i}$ (3.2), the fact that $\bar{x}_T^\star$ is the Nash equilibrium of $\bar{v}_T$, namely, $\langle \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x\rangle \le 0$ for any $x \in \mathcal{X}$, and
$$\sum_{t=1}^T \langle v_t(\bar{x}_T^\star) - \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x_t\rangle \le D\sum_{t=1}^T \|v_t(\bar{x}_T^\star) - \bar{v}_T(\bar{x}_T^\star)\| \le D\sum_{t=1}^T \sup_{x\in\mathcal{X}}\|v_t(x) - \bar{v}_T(x)\| = DW_T. \qquad \text{(C.2)}$$
TERM (B) can be bounded as
$$\text{TERM (B)} = \mu\sum_{t=1}^T \|\bar{x}_T^\star - x_t^\star\|^2 \le \sum_{t=1}^T \langle v_t(x_t^\star) - v_t(\bar{x}_T^\star), x_t^\star - \bar{x}_T^\star\rangle \qquad \text{(by Definition 1)}$$
$$= \sum_{t=1}^T \langle v_t(x_t^\star), x_t^\star - \bar{x}_T^\star\rangle + \sum_{t=1}^T \langle \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x_t^\star\rangle + \sum_{t=1}^T \langle v_t(\bar{x}_T^\star) - \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - x_t^\star\rangle \le DW_T,$$
where the first two terms in the second-to-last step are non-positive because $x_t^\star$ and $\bar{x}_T^\star$ are the Nash equilibriums of the $t$-th round game and the averaged game, respectively, i.e., $\langle v_t(x_t^\star), x_t^\star - x\rangle \le 0$ and $\langle \bar{v}_T(\bar{x}_T^\star), \bar{x}_T^\star - y\rangle \le 0$ for any $x, y \in \mathcal{X}$. The last term can be bounded using (C.2) again. Combining the bounds of TERM (A) and TERM (B) finishes the proof.

C.5. Proof of Corollary 1

Proof. The proof is direct by combining Proposition 1 and Theorem 3.
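To make the reduction used by Algorithms 5 and 6 concrete, the sketch below shows the two surrogate constructions performed in one round: the per-coordinate squared loss $h_t^{(k)}(x) = (x - (x_{t,i}^{(k)} - \nabla g_t(x_{t,i})^{(k)}/\mu))^2$ fed to the FLH-OGD instances, and the lifted loss $g_t(x) = s_t(x) + G\|x - \Pi_{\mathcal{X}_i}(x)\|_1$ that extends the loss from the true domain to its circumscribing box. The helper names and the "project_X" oracle are assumptions made only for illustration.

import numpy as np

def squared_surrogate_targets(x, grad, mu):
    # Per-coordinate targets of Algorithm 5's surrogate: the k-th FLH-OGD instance
    # receives the one-dimensional squared loss h_k(w) = (w - targets[k]) ** 2.
    return x - grad / mu

def lifted_loss(x, s, G, project_X):
    # Value of Algorithm 6's lifted loss g(x) = s(x) + G * ||x - Pi_X(x)||_1,
    # which extends the original loss s from the true domain X to the circumscribing box.
    return s(x) + G * np.linalg.norm(x - project_X(x), ord=1)

# Example usage with hypothetical numbers: one round with a 3-dimensional decision.
x_t = np.array([0.2, -0.1, 0.5])
grad_t = np.array([1.0, -0.5, 0.3])
targets = squared_surrogate_targets(x_t, grad_t, mu=0.5)
h_k = lambda w, k: (w - targets[k]) ** 2   # loss passed to the k-th FLH-OGD instance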
C.6. Useful Lemmas

Lemma 3 (Theorem 3 of Zhang et al. (2018)). Assume the domain has a diameter $D$, i.e., $\|x - y\| \le D$ for any $x, y \in \mathcal{W}$, and the gradients of the loss functions $f_t$ have an upper bound $G$, i.e., $\|\nabla f_t(\cdot)\| \le G$. Then ADER enjoys
$$\sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(u_t) \le O\big(\sqrt{T(1 + P_T)}\big),$$
where $P_T \triangleq \sum_{t=2}^T \|u_t - u_{t-1}\|$ denotes the path length.

Lemma 4 (Lemma 1 of Wang et al. (2019)). Due to the construction of the surrogate loss functions $s_{t,i,j}$, if the meta learner updates as in Algorithm 1, the meta regret on the surrogate loss can be bounded by
$$\sum_{t=1}^T s_{t,i,j}(x_{t,i}) - \sum_{t=1}^T s_{t,i,j}(x_{t,i,j}) \le O(\log T).$$

Lemma 5 (Theorem 8 of Baby & Wang (2022)). Let the feasible domain be $\mathcal{W} \subseteq \mathbb{R}^d$ and let the loss functions $\{f_t(\cdot)\}_{t=1}^T$ be $H$-strongly convex in the $\ell_2$-norm and satisfy $\|\nabla f_t(\cdot)\| \le G$. Then the algorithm (Figure 3 therein, restated in Algorithm 6 for self-containedness) guarantees that
$$\sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(u_t) \le \widetilde{O}\Big(\frac{G^2}{H}\max\big\{d,\ d^{1/3}T^{1/3}\mathrm{TV}_T^{2/3}\big\}\Big),$$
for any comparator sequence $\{u_t\}_{t=1}^T \subseteq \mathcal{W}$ with $\mathrm{TV}_T \triangleq \sum_{t=2}^T \|u_t - u_{t-1}\|_1$. $\widetilde{O}(\cdot)$ hides the dependence on $\log T$. Note that the original proof omits the dependence on the gradient upper bound $G$ and the strong convexity parameter $H$. For self-containedness, we restate the proof to illuminate the concerned parameter dependence.

Proof of Lemma 5. The regret can be decomposed as the summation over dimensions of the regret on the surrogate losses:
$$\sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(u_t) \le \frac{H}{2}\sum_{k=1}^d \bigg(\sum_{t=1}^T h_t^{(k)}(w_t^{(k)}) - \sum_{t=1}^T h_t^{(k)}(u_t^{(k)})\bigg),$$
where $w_t^{(k)}$ denotes the $k$-th dimension of $w_t$ and so does $u_t^{(k)}$, and $h_t^{(k)}(x) \triangleq \big(x - (w_t^{(k)} - \nabla f_t(w_t)^{(k)}/H)\big)^2$ is a squared surrogate loss function. For a general squared loss function $g_t(x) \triangleq (x - y)^2$, where $x \in [-B, B]$ and $y \in [-G, G]$, it is $\frac{1}{2(B+G)^2}$-exp-concave. It is known that for an $\alpha$-exp-concave loss function, FLH-OGD enjoys an $O(\alpha^{-1}\log T)$ adaptive regret on each interval inside the whole time horizon (Hazan & Seshadhri, 2007, Lemma 8). As a result, the dynamic regret of the squared loss functions $\{g_t(\cdot)\}_{t=1}^T$ is of order $O\big((B+G)^2(1 + T^{1/3}P_T^{2/3})\big)$ (Baby & Wang, 2022, Lemma 18). Since the surrogate loss $h_t^{(k)}(\cdot)$ is exp-concave with parameter $\alpha = \frac{1}{2(2B + G/H)^2}$, we obtain
$$\sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(u_t) \le \frac{H}{2}\sum_{k=1}^d 2\Big(2B + \frac{G}{H}\Big)^2\big(1 + T^{1/3}P_T^{2/3}\big) \lesssim \frac{G^2}{H}\big(1 + T^{1/3}P_T^{2/3}\big),$$
where $a \lesssim b$ means $a = \widetilde{O}(b)$, and the last step is true since we consider $H \le 1$ without loss of generality.

D. Analysis for Section 4

In this section, we provide the detailed analysis of Section 4, including the proof of Lemma 1 in Appendix D.1, the proof of Theorem 4 in Appendix D.2, the proof of Theorem 5 in Appendix D.3, and the proof of Corollary 2 in Appendix D.4. Finally, in Appendix D.5, we list some useful lemmas.

D.1. Proof of Lemma 1

Proof. First, by the standard dynamic regret analysis of OOGD (see Lemma 6), it holds that
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \frac{D^2 + 2DP_{T,i}}{2\eta_i} + \eta_i\bigg(G^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2\bigg) - \frac{1}{4\eta_i}\sum_{t=2}^T \|x_{t,i} - x_{t-1,i}\|^2.$$
Diving into the second term (i.e., the gradient variation) above, we have
$$\sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2 = \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_t) + v_{t-1,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2 \le 2\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_{t,i}(x) - v_{t-1,i}(x)\|^2 + 2\sum_{t=2}^T \|v_{t-1,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2 \le 2\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_t(x) - v_{t-1}(x)\|^2 + 2\sum_{t=2}^T \|v_{t-1}(x_t) - v_{t-1}(x_{t-1})\|^2 \le 2V_T + 2L^2\sum_{t=2}^T\sum_{j=1}^N \|x_{t,j} - x_{t-1,j}\|^2, \qquad \text{(D.1)}$$
where the second inequality holds because if we denote by $h(x) \triangleq \|v_{t,i}(x) - v_{t-1,i}(x)\|^2$ and $g(x) \triangleq \|v_t(x) - v_{t-1}(x)\|^2$, then $h(x) \le g(x)$ holds obviously, which consequently implies $\sup_{x\in\mathcal{X}} h(x) \le \sup_{x\in\mathcal{X}} g(x)$. Thus we obtain
$$\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_{t,i}(x) - v_{t-1,i}(x)\|^2 \le \sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_t(x) - v_{t-1}(x)\|^2 = V_T.$$
The last inequality in (D.1) uses Assumption 3 and the fact that $\|x_t - x_{t-1}\|^2 = \sum_{j=1}^N \|x_{t,j} - x_{t-1,j}\|^2$.
Finally, we have
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \frac{D^2 + 2DP_{T,i}}{2\eta_i} + \eta_i(G^2 + 2V_T) + 2\eta_iL^2\sum_{j=1}^N S_j - \frac{1}{4\eta_i}S_i,$$
where $S_j \triangleq \sum_{t=2}^T \|x_{t,j} - x_{t-1,j}\|^2$ denotes the cumulative switching cost of the $j$-th player.

D.2. Proof of Theorem 4

In the following, we provide a formal version of Theorem 4 and the corresponding proof.

Theorem 6. Under Assumptions 1–3, if all problem-dependent quantities are known a priori and the $i$-th player runs OOGD:
$$x_{t,i} = \Pi_{\mathcal{X}_i}[\widehat{x}_{t,i} - \eta_im_{t,i}], \qquad \widehat{x}_{t+1,i} = \Pi_{\mathcal{X}_i}[\widehat{x}_{t,i} - \eta_iv_{t,i}(x_t)], \qquad \text{(D.2)}$$
with optimism $m_{t,i} = v_{t-1,i}(x_{t-1})$, then the distance tracking error can be upper-bounded by
$$\text{DIST-ERR} \le O\big(\sqrt{(1 + V_T + P_T)(1 + P_T)}\big) \quad \text{if choosing the step size as } \eta_i = \min\bigg\{\sqrt{\frac{D^2 + 2DP_{T,i}}{2G^2 + 4V_T}},\ \frac{1}{2L\sqrt{2N}}\bigg\},$$
$$\text{DIST-ERR} \le O(1 + W_T) \quad \text{if choosing the step size as } \eta_i = \min\bigg\{\sqrt{\frac{D^2}{2G^2 + 4V_T}},\ \frac{1}{2L\sqrt{2N}}\bigg\}.$$

Proof of Theorem 6. We first prove the gradient-variation bound, then the gradient-variance bound.

Gradient-Variation Bound. Summing the regret bounds of all players using Lemma 1 gives
$$\sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \sum_{i=1}^N\bigg(\frac{D^2 + 2DP_{T,i}}{2\eta_i} + \eta_i(G^2 + 2V_T)\bigg) + \sum_{i=1}^N\bigg(2L^2\sum_{j=1}^N \eta_j - \frac{1}{4\eta_i}\bigg)S_i.$$
Our goal is to design the players' step sizes $\eta_1, \ldots, \eta_N$ such that the switching-cost terms above are non-positive, which means the step sizes need to satisfy $2L^2\sum_{j\in[N]}\eta_j \le \frac{1}{4\eta_i}$ for all $i \in [N]$. This requirement is easy to satisfy. Assume there is a constant $X$ such that $2L^2\sum_{j\in[N]}\eta_j \le X \le \frac{1}{4\eta_i}$. Solving the second inequality gives $\eta_i \le \frac{1}{4X}$ for all $i \in [N]$. Plugging this into the first inequality, we have $\frac{L^2N}{2X} \le X$. We can set $X = L\sqrt{N/2}$, and it suffices to choose $\eta_i \le \frac{1}{4X} = \frac{1}{2L\sqrt{2N}}$. Overall, the distance tracking error is at most
$$\mu\,\text{DIST-ERR} \le \sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \sum_{i=1}^N\bigg(\frac{D^2 + 2DP_{T,i}}{2\eta_i} + \eta_i(G^2 + 2V_T)\bigg) \qquad \text{(D.3)}$$
$$\lesssim \sum_{i=1}^N\Big(\sqrt{2(G^2 + 2V_T)(D^2 + 2DP_{T,i})} + L\sqrt{N}\,(D^2 + 2DP_{T,i})\Big) \lesssim \sqrt{(1 + V_T + P_T)(1 + P_T)},$$
where $a \lesssim b$ means $a = \widetilde{O}(b)$ and the last inequality is by choosing the step size as
$$\eta_i = \min\bigg\{\sqrt{\frac{D^2 + 2DP_{T,i}}{2G^2 + 4V_T}},\ \frac{1}{2L\sqrt{2N}}\bigg\}.$$
Gradient-Variance Bound. Let $\bar{x}_T^\star$ be the Nash equilibrium of the averaged game over the whole time horizon $T$. Following an analysis similar to that of Appendix C.4, we have
$$\mu\,\text{DIST-ERR} \le 2\sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - \bar{x}_{T,i}^\star\rangle + 4DW_T \le 2\sum_{i=1}^N\bigg(\frac{D^2}{2\eta_i} + \eta_i(G^2 + 2V_T)\bigg) + 4DW_T \lesssim \sqrt{2D^2(G^2 + 2V_T)} + LD^2\sqrt{N} + 4DW_T \lesssim 1 + W_T,$$
where the second step reuses Lemma 6 with $P_{T,i} = 0$, the third step is by setting the step size as
$$\eta_i = \min\bigg\{\sqrt{\frac{D^2}{2G^2 + 4V_T}},\ \frac{1}{2L\sqrt{2N}}\bigg\},$$
and the last step is due to the relationship between the gradient variation $V_T$ and the gradient variance $W_T$:
$$V_T = \sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_t(x) - v_{t-1}(x)\|^2 \le 4G\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_t(x) - v_{t-1}(x)\| \qquad \text{(by Assumption 1)}$$
$$\le 4G\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_t(x) - \bar{v}_T(x)\| + 4G\sum_{t=2}^T \sup_{x\in\mathcal{X}}\|v_{t-1}(x) - \bar{v}_T(x)\| \le 8G\sum_{t=1}^T \sup_{x\in\mathcal{X}}\|v_t(x) - \bar{v}_T(x)\| = 8GW_T, \qquad \text{(D.6)}$$
which finishes the proof.

Remark 5. The tuning of the step sizes requires the number of players $N$. We note that this requirement is mild, as it also appears in other multi-player analyses, e.g., in the work of Syrgkanis et al. (2015).

D.3. Proof of Theorem 5

To start with, we specify the step sizes of the meta learner:
$$\varepsilon_{0,i} = \varepsilon_{1,i} = \frac{1}{L}, \quad \text{and} \quad \varepsilon_{t,i} = \frac{1}{\sqrt{L^2 + \sum_{s=2}^{t-1}\|v_{s,i}(x_s) - v_{s-1,i}(x_{s-1})\|^2}} \ \text{ for } 2 \le t \le T - 1. \qquad \text{(D.7)}$$
In the following, we provide a formal version of Theorem 5 and the corresponding proof.

Theorem 7. Under Assumptions 1–3, Algorithm 2 enjoys the following guarantees simultaneously for the distance tracking error and the regret of individual players when they agree to run the same algorithm (abbreviated as the honest regret):
$$\text{DIST-ERR} \le O\Big(\min\Big\{\sqrt{(1 + V_T + P_T)(1 + P_T)},\ 1 + W_T\Big\}\Big),$$
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le O\Big(\min\Big\{\sqrt{(1 + V_T + P_T)(1 + P_{T,i})},\ \sqrt{(1 + W_T + P_{T,i})(1 + P_{T,i})}\Big\}\Big),$$
where $P_T$, $V_T$, $W_T$ denote the path length, the gradient variation, and the gradient variance, respectively.
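Before the proof, the following sketch shows how the adaptive meta learning rate (D.7) and the corresponding optimistic-Hedge weights could be computed; the function names are illustrative, the "hint" vector plays the role of the optimism $m_{t,i}$, and the snippet is a simplified stand-in for the meta update of Algorithm 2 rather than its exact implementation.

import numpy as np

def meta_learning_rate(grad_history, L):
    # Learning rate schedule (D.7): eps = 1 / sqrt(L^2 + sum_s ||v_s - v_{s-1}||^2),
    # where grad_history holds the gradients observed so far. With fewer than two
    # gradients, the variation term is zero and the rate reduces to 1 / L.
    variation = sum(np.sum((g - g_prev) ** 2)
                    for g_prev, g in zip(grad_history[:-1], grad_history[1:]))
    return 1.0 / np.sqrt(L ** 2 + variation)

def optimistic_hedge_weights(cum_loss, hint, eps):
    # Optimistic Hedge (optimistic FTRL with the entropic regularizer):
    # p_t is proportional to exp(-eps * (cumulative surrogate loss + optimistic hint)).
    scores = -eps * (cum_loss + hint)
    w = np.exp(scores - scores.max())      # subtract the max for numerical stability
    return w / w.sum()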
Proof of Theorem 7. We first prove the gradient-variation bound and then the gradient-variance bound. For simplicity, we assume all problem-dependent quantities are known for a moment and illuminate the setting of the step size pool later.

Gradient-Variation Bound. First we decompose the regret of the $i$-th player with respect to its $j$-th base learner:
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle = \underbrace{\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i,j}\rangle}_{\text{META-REG}} + \underbrace{\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i,j} - x_{t,i}^\star\rangle}_{\text{BASE-REG}}.$$
The regret of the meta learner, due to Lemma 7 (optimistic Hedge is a special case of optimistic FTRL by choosing the negative entropy as the regularizer), can be bounded as
$$\sum_{t=1}^T \langle \ell_{t,i}, p_{t,i} - e_j\rangle \le \frac{\mathrm{KL}(e_j, p_{1,i})}{\varepsilon_{T-1,i}} + \sum_{t=1}^T \varepsilon_{t-1,i}\|\ell_{t,i} - m_{t,i}\|_\infty^2 - \sum_{t=1}^T \frac{1}{8\varepsilon_{t-1,i}}\|p_{t,i} - p_{t+1,i}\|_1^2, \qquad \text{(D.8)}$$
where the first term above can be bounded by plugging in the setup of the learning rate (D.7):
$$\frac{\mathrm{KL}(e_j, p_{1,i})}{\varepsilon_{T-1,i}} = \frac{\ln(1/p_{1,i,j})}{\varepsilon_{T-1,i}} \le \ln M\cdot\sqrt{L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2}.$$
The second term in (D.8) can be analyzed as
$$\sum_{t=1}^T \varepsilon_{t-1,i}\|\ell_{t,i} - m_{t,i}\|_\infty^2 = \varepsilon_{0,i}\|\ell_{1,i}\|_\infty^2 + \sum_{t=2}^T \varepsilon_{t-1,i}\max_{j\in[M]}\langle v_{t,i}(x_t) - v_{t-1,i}(x_{t-1}), x_{t,i,j}\rangle^2 \le \frac{D^2(G + \lambda D)^2}{L} + D^2\sum_{t=2}^T \varepsilon_{t-1,i}\|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2$$
$$= O(1) + D^2\sum_{t=2}^T \frac{\|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2}{\sqrt{L^2 + \sum_{s=2}^{t-1}\|v_{s,i}(x_s) - v_{s-1,i}(x_{s-1})\|^2}} \qquad \text{(by (D.7))}$$
$$\le O(1) + 4D^2\sqrt{L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2},$$
where the first step is because $m_{1,i} = 0$, the second step is due to $\max_{t,i,j}|\ell_{t,i,j}| \le GD + \lambda D^2$ (see (4.3) for the definition of the meta learner's loss), and the last step uses $\sum_{t=1}^T a_t/\sqrt{1 + \sum_{s=1}^{t-1} a_s} \le 4\sqrt{1 + \sum_{t=1}^T a_t} + \max_{t\in[T]} a_t$ for any $a_1, a_2, \ldots, a_T > 0$ (Pogodin & Lattimore, 2019, Lemma 4.8). Overall, the regret of the meta learner is at most
$$\sum_{t=1}^T \langle \ell_{t,i}, p_{t,i} - e_j\rangle \le (4D^2 + \ln M)\sqrt{L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2} - \frac{L}{8}\sum_{t=1}^T \|p_{t,i} - p_{t+1,i}\|_1^2 + O(1).$$
Plugging in the definition of the surrogate loss function $\ell_{t,i}$ (4.3), the meta regret can be bounded as
$$\text{META-REG} \le (4D^2 + \ln M)\sqrt{L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2} - \frac{L}{8}\sum_{t=1}^T \|p_{t,i} - p_{t+1,i}\|_1^2 - \lambda\sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2 + \lambda\sum_{t=2}^T \|x_{t,i,j} - x_{t-1,i,j}\|^2 + O(1).$$
As for the base regret, following the standard analysis of OOGD (deferred to Lemma 6), it holds that
$$\text{BASE-REG} \le \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}\bigg(G^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2\bigg) - \frac{1}{4\eta_{i,j}}\sum_{t=2}^T \|x_{t,i,j} - x_{t-1,i,j}\|^2.$$
Suppose there is a uniform upper bound $X$ on $\eta_{i,j}$, the step size of the $j$-th base learner of the $i$-th player, for all $i \in [N]$ and $j \in [M]$. Overall, the regret of the $i$-th player is at most
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le O(1) - \frac{L}{8}\sum_{t=1}^T \|p_{t,i} - p_{t+1,i}\|_1^2 - \lambda\sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2 + \Big(\lambda - \frac{1}{4\eta_{i,j}}\Big)\sum_{t=2}^T \|x_{t,i,j} - x_{t-1,i,j}\|^2 + \frac{5D^2 + \ln M + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}\bigg(G^2 + L^2 + 2\sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2\bigg)$$
$$\le O(1) + \frac{c_1 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(c_2 + 4V_T) + \Big(\lambda - \frac{1}{4\eta_{i,j}}\Big)\sum_{t=2}^T \|x_{t,i,j} - x_{t-1,i,j}\|^2 + 8L^2D^2X\sum_{k=1}^N\sum_{t=2}^T \|p_{t,k} - p_{t-1,k}\|_1^2 + 8L^2X\sum_{k=1}^N\sum_{t=2}^T\sum_{j=1}^M p_{t,k,j}\|x_{t,k,j} - x_{t-1,k,j}\|^2 - \frac{L}{8}\sum_{t=1}^T \|p_{t,i} - p_{t+1,i}\|_1^2 - \lambda\sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2, \qquad \text{(by (D.1))}$$
where $c_1 = 5D^2 + \ln M$ and $c_2 = G^2 + L^2$. The first inequality holds since $xy - ax^2 \le \frac{y^2}{4a}$ for any $x, y, a > 0$, which in particular gives
$$\sqrt{L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2} \le \frac{1}{2\eta_{i,j}} + \eta_{i,j}\bigg(L^2 + \sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2\bigg),$$
and the second step is true due to (D.1) and
$$\|x_{t,i} - x_{t-1,i}\|^2 = \bigg\|\sum_{j=1}^M p_{t,i,j}x_{t,i,j} - \sum_{j=1}^M p_{t-1,i,j}x_{t-1,i,j}\bigg\|^2 \le 2\bigg\|\sum_{j=1}^M p_{t,i,j}(x_{t,i,j} - x_{t-1,i,j})\bigg\|^2 + 2\bigg\|\sum_{j=1}^M (p_{t,i,j} - p_{t-1,i,j})x_{t-1,i,j}\bigg\|^2 \le 2\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2 + 2D^2\|p_{t,i} - p_{t-1,i}\|_1^2. \qquad \text{(D.10)}$$
Summing all players' regrets, we have
$$\sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le O(1) + \sum_{i=1}^N\bigg(\frac{c_1 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(c_2 + 4V_T)\bigg) + \sum_{i=1}^N\Big(\lambda - \frac{1}{4\eta_{i,j}}\Big)\sum_{t=2}^T \|x_{t,i,j} - x_{t-1,i,j}\|^2 + \sum_{i=1}^N\Big(8NL^2D^2X - \frac{L}{8}\Big)\sum_{t=2}^T \|p_{t,i} - p_{t-1,i}\|_1^2 + \sum_{i=1}^N\big(8NL^2X - \lambda\big)\sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2. \qquad \text{(D.11)}$$
Following the above analysis, for all $i \in [N]$ and $j \in [M]$, we choose
$$\lambda = 16NLD^2, \qquad \eta_{i,j} \le \frac{1}{128NLD^2} \triangleq X,$$
which guarantees $\lambda - \frac{1}{4X} = 16NLD^2 - 32NLD^2 < 0$, $8NL^2X - \lambda \le -15NLD^2 < 0$, and $8NL^2D^2X - \frac{L}{8} < 0$, so that all the remaining switching-cost terms in (D.11) are non-positive. The distance tracking error can thus be bounded as
$$\mu\,\text{DIST-ERR} \le \sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle = \sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \lesssim \sum_{i=1}^N\bigg(\frac{c_1 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(c_2 + 4V_T)\bigg) \lesssim \sqrt{(1 + V_T + P_T)(1 + P_T)}, \qquad \text{(D.12)}$$
where $a \lesssim b$ means $a = \widetilde{O}(b)$ and the last step is by choosing the step size optimally as $\eta_{i,j} = \eta_i^\star$, where
$$\eta_i^\star \triangleq \min\bigg\{\sqrt{\frac{c_1 + 2DP_{T,i}}{2c_2 + 8V_T}},\ \frac{1}{128NLD^2}\bigg\}. \qquad \text{(D.13)}$$
As for the honest regret of individual players, to start with, since $\sum_{t\in[T]}\langle v_t(x_t), x_t - x_t^\star\rangle \ge \sum_{t\in[T]}\langle v_t(x_t) - v_t(x_t^\star), x_t - x_t^\star\rangle \ge \mu\sum_{t\in[T]}\|x_t - x_t^\star\|^2 \ge 0$, the left-hand side of (D.11) is non-negative. As a result, we can move the terms that are canceled to the left-hand side of (D.11) to upper-bound them. By doing so, we obtain
$$\sum_{t=2}^T \|x_t - x_{t-1}\|^2 \le 2\sum_{i=1}^N\bigg(D^2\sum_{t=2}^T \|p_{t,i} - p_{t-1,i}\|_1^2 + \sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2\bigg) \lesssim \sqrt{(1 + V_T + P_T)(1 + P_T)}.$$
Thus the honest regret of the $i$-th player, via Lemma 1, can be bounded as
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(G^2 + 2V_T) + 2\eta_{i,j}L^2\sum_{t=2}^T \|x_t - x_{t-1}\|^2 \le \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(G^2 + 2V_T) + 2\eta_{i,j}L^2\sqrt{(1 + V_T + P_T)(1 + P_T)} \lesssim \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(1 + V_T + P_T) \lesssim \sqrt{(1 + P_{T,i})(1 + V_T + P_T)}, \qquad \text{(D.14)}$$
where the last step is by setting the step size optimally as
$$\eta_{i,j} = \sqrt{\frac{D^2 + 2DP_{T,i}}{N(1 + V_T + P_T)}}. \qquad \text{(D.15)}$$
Gradient-Variance Bound. Following the same analysis as in Appendix C.4, the environmental non-stationarity can be extracted into the gradient-variance term:
$$\mu\,\text{DIST-ERR} \le 2\sum_{i=1}^N\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - \bar{x}_{T,i}^\star\rangle + 4DW_T \lesssim \sqrt{1 + V_T} + W_T \lesssim 1 + W_T,$$
where the last step holds due to (D.12) with $P_T = 0$ and the relationship between the gradient variation $V_T$ and the gradient variance $W_T$ (D.6). As for the honest regret of individual players, following the same argument as in the last part, we have
$$\sum_{t=2}^T \|x_t - x_{t-1}\|^2 \le 2\sum_{i=1}^N\bigg(D^2\sum_{t=2}^T \|p_{t,i} - p_{t-1,i}\|_1^2 + \sum_{t=2}^T\sum_{j=1}^M p_{t,i,j}\|x_{t,i,j} - x_{t-1,i,j}\|^2\bigg) \lesssim 1 + W_T.$$
Consequently, via Lemma 1, we have
$$\sum_{t=1}^T \langle v_{t,i}(x_t), x_{t,i} - x_{t,i}^\star\rangle \le \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(G^2 + 2V_T) + 2\eta_{i,j}L^2\sum_{t=2}^T \|x_t - x_{t-1}\|^2 \le \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(G^2 + 2V_T) + 2\eta_{i,j}L^2(1 + W_T) \lesssim \frac{D^2 + 2DP_{T,i}}{2\eta_{i,j}} + \eta_{i,j}(1 + W_T) \lesssim \sqrt{(1 + P_{T,i})(1 + W_T + P_{T,i})}, \qquad \text{(D.16)}$$
where the last step is by setting the step size optimally as
$$\eta_{i,j} = \sqrt{\frac{D^2 + 2DP_{T,i}}{N(1 + W_T)}}. \qquad \text{(D.17)}$$
Step Size Pool Configuration. In the last part of the proof, we consider together the different optimal tunings of the above guarantees, including (D.13), (D.15), and (D.17), and give a setting of the step size pool that covers all of them. Specifically, we choose the step size pool as
$$\mathcal{H} \triangleq \bigg\{\eta_j = \min\bigg\{\frac{2^{j-1}}{\sqrt{1 + T}},\ \frac{1}{\sqrt{N}L}\bigg\}\ \bigg|\ j \in [M]\bigg\},$$
where $M = \lceil\log_2(T + 1)\rceil + 1 = O(\log T)$, which finishes the proof.

D.4. Proof of Corollary 2

Proof. The proofs of the gradient-variation and gradient-variance bounds can be obtained directly by combining Proposition 1 and Theorem 5. As a result, in the following, we focus on the small-loss bound.

Small-Loss Bound. If we choose the step size of all players as $\eta$ (to be specified later), following Lemma 6, it holds that
$$\sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle \le \frac{N(D^2 + 2DP_T)}{2\eta} + \eta NG^2 + \eta\sum_{i=1}^N\sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2.$$
Next, we show that the second term above, i.e., the gradient-variation term, naturally implies a small-loss term. Specifically,
$$\sum_{i=1}^N\sum_{t=2}^T \|v_{t,i}(x_t) - v_{t-1,i}(x_{t-1})\|^2 \le 2\sum_{i=1}^N\sum_{t=2}^T \big(\|v_{t,i}(x_t)\|^2 + \|v_{t-1,i}(x_{t-1})\|^2\big) \le 4\sum_{i=1}^N\sum_{t=1}^T \|v_{t,i}(x_t)\|^2 = 4\sum_{t=1}^T \|v_t(x_t)\|^2 \le 16L\sum_{t=1}^T u_t(x_t), \qquad \text{(D.19)}$$
where the equality uses the fact that $\sum_{i\in[N]}\|v_{t,i}(x_t)\|^2 = \|v_t(x_t)\|^2$ and the last step holds due to the property of non-negative and smooth functions ($\|\nabla f(\cdot)\|^2 \le 4Lf(\cdot)$ for any $L$-smooth and non-negative function $f$) (Srebro et al., 2010, Lemma 3.1).
Rearranging gives
$$\sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle - 16\eta L\sum_{t=1}^T \big(u_t(x_t) - u_t(x_t^\star)\big) \le \frac{N(D^2 + 2DP_T)}{2\eta} + \eta NG^2 + 16\eta LF_T,$$
where $F_T \triangleq \sum_{t\in[T]} u_t(x_t^\star)$ denotes the cumulative game value. Since $\sum_{t\in[T]}\big(u_t(x_t) - u_t(x_t^\star)\big) \le \sum_{t\in[T]}\langle v_t(x_t), x_t - x_t^\star\rangle$ and the step size will be chosen such that $\eta \le \frac{1}{32L}$, i.e., $16\eta L \le \frac{1}{2}$, we have
$$\frac{1}{2}\sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle \le \frac{N(D^2 + 2DP_T)}{2\eta} + \eta(NG^2 + 16LF_T).$$
Finally, we upper-bound the utility tracking error by
$$\sum_{t=1}^T \langle v_t(x_t), x_t - x_t^\star\rangle \le \frac{N(D^2 + 2DP_T)}{\eta} + 2\eta(NG^2 + 16LF_T) \lesssim \sqrt{(1 + F_T + P_T)(1 + P_T)}, \qquad \text{(D.20)}$$
where the last step is by choosing the step sizes of all players equally as
$$\eta_1 = \ldots = \eta_N = \eta = \min\bigg\{\sqrt{\frac{N(D^2 + 2DP_T)}{2NG^2 + 32LF_T}},\ \frac{1}{32L}\bigg\}.$$
Since (D.20) is covered by (D.18), the proof is finished.

D.5. Useful Lemmas

Lemma 6 (Optimistic OGD (Zhao et al., 2021, Theorem 1)). If the domain diameter is bounded by $D$, i.e., $\|w_1 - w_2\| \le D$ for any $w_1, w_2 \in \mathcal{W}$, and the gradient norm is bounded by $G$, i.e., $\|\nabla f_t(\cdot)\| \le G$, then the dynamic regret of OOGD is bounded by
$$\sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(u_t) \le \frac{D^2 + 2DP_T}{2\eta} + \eta\sum_{t=1}^T \|\nabla f_t(w_t) - m_t\|^2 - \frac{1}{4\eta}\sum_{t=2}^T \|w_t - w_{t-1}\|^2,$$
where $P_T \triangleq \sum_{t=2}^T \|u_t - u_{t-1}\|$ denotes the path length.

Lemma 7 (Optimistic FTRL (Rakhlin & Sridharan, 2013a)). Suppose the learning rates are non-increasing: $\eta_t \ge \eta_{t+1}$ for all $t \ge 1$. Optimistic FTRL with the following update rule:
$$w_{t+1} = \arg\min_{w\in\mathcal{W}}\ \eta_t\bigg\langle \sum_{s=1}^t \nabla f_s(w_s) + m_{t+1},\ w\bigg\rangle + \psi(w),$$
where $\psi(w) \triangleq \sum_{i\in[d]} w_i\ln w_i$, guarantees that
$$\sum_{t=1}^T \langle \nabla f_t(w_t), w_t - u\rangle \le \frac{\mathrm{KL}(u, w_1)}{\eta_{T-1}} + \sum_{t=1}^T \eta_{t-1}\|\nabla f_t(w_t) - m_t\|_\infty^2 - \sum_{t=1}^T \frac{1}{8\eta_{t-1}}\|w_t - w_{t+1}\|_1^2,$$
where $\mathrm{KL}(x, y) \triangleq \sum_{i=1}^d x_i\ln(x_i/y_i)$ denotes the Kullback–Leibler divergence.
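For completeness, here is a minimal sketch of the optimistic OGD update analyzed in Lemma 6, with the optimism set to the last observed gradient as in (D.2); the generator interface and the identity projection placeholder are assumptions made for illustration only.

import numpy as np

def optimistic_ogd(grads, d, eta, project=lambda x: x):
    # Optimistic OGD sketch (cf. Lemma 6): keep an auxiliary iterate x_hat, play
    # x_t = Pi[x_hat_t - eta * m_t] with optimism m_t = g_{t-1}, then update
    # x_hat_{t+1} = Pi[x_hat_t - eta * g_t].
    x_hat = np.zeros(d)
    m = np.zeros(d)                      # optimism: the last observed gradient
    for grad in grads:
        x = project(x_hat - eta * m)     # optimistic step using the hint m_t
        g = grad(x)                      # gradient feedback at the played decision
        x_hat = project(x_hat - eta * g) # auxiliary-iterate update
        m = g                            # hint for the next round
        yield x

For example, list(optimistic_ogd([lambda w: w - 1.0] * 20, d=3, eta=0.1)) runs the sketch on a fixed quadratic gradient field; with slowly varying gradients the hint nearly cancels the new gradient, which is exactly the mechanism behind the gradient-variation terms in the analysis.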