# Online Learning in Periodic Zero-Sum Games

Tanner Fiez (University of Washington, Seattle, Washington) fiezt@uw.edu
Ryann Sim (SUTD, Singapore) ryann_sim@mymail.sutd.edu.sg
Stratis Skoulakis (SUTD, Singapore) efstratios@sutd.edu.sg
Georgios Piliouras (SUTD, Singapore) georgios@sutd.edu.sg
Lillian Ratliff (University of Washington, Seattle, Washington) ratliffl@uw.edu

A seminal result in game theory is von Neumann's minmax theorem, which states that zero-sum games admit an essentially unique equilibrium solution. Classical learning results build on this theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games. In the past several years, a key research direction has focused on characterizing the day-to-day behavior of such dynamics. General results in this direction show that broad classes of online learning dynamics are cyclic, and formally Poincaré recurrent, in zero-sum games. We analyze the robustness of these online learning behaviors in the case of periodic zero-sum games with a time-invariant equilibrium. This model generalizes the usual repeated game formulation while also being a realistic and natural model of a repeated competition between players that depends on exogenous environmental variations such as time-of-day effects, week-to-week trends, and seasonality. Interestingly, time-average convergence may fail even in the simplest such settings, in spite of the equilibrium being fixed. In contrast, using novel analysis methods, we show that Poincaré recurrence provably generalizes despite the complex, non-autonomous nature of these dynamical systems.

1 Introduction

The study of learning dynamics in zero-sum games is arguably as old a field as game theory itself, dating back to the seminal work of Brown and Robinson [7, 27], which followed shortly after the foundational minmax theorem of von Neumann [33]. The dynamics of online no-regret learning algorithms [10, 29] are of particular interest in zero-sum games as they are designed with an adversarial environment in mind. Moreover, well-known results imply that such dynamics converge in a time-average sense to a minmax equilibrium in zero-sum games [10, 15]. Despite the classical nature of the study of online no-regret learning dynamics in zero-sum games, the actual transient behavior of such dynamics was historically not well understood. However, in the past several years this topic has gained attention, with a number of works studying such dynamics in zero-sum games (and variants thereof) with a particular focus on continuous-time analysis [24, 25, 20, 6, 32, 23, 22]. The unifying emergent picture is that the dynamics are approximately "cyclic" in a formal sense known as Poincaré recurrence. Moreover, these results have acted as fundamental building blocks for understanding the limiting behavior of their discrete-time variants [2, 12, 21, 11, 3].

Joint first authors. Joint last authors. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Despite the plethora of emerging results regarding online learning dynamics in zero-sum games, an important and well-motivated aspect of this problem has only begun to receive attention. How do online learning dynamics behave if the zero-sum game evolves over time? Clearly, the answer to this question depends critically on how the game is allowed to evolve.

Problem Setting and Model.
We study periodic zero-sum games with a time-invariant equilibrium, a class of games we formally define in Section 2. In a periodic zero-sum game, the payoffs that dictate the game are both T-periodic and zero-sum at all times. We consider both periodic zero-sum bilinear games on infinite, unconstrained strategy spaces and periodic zero-sum matrix games (along with network generalizations thereof) on finite strategy spaces. The goal of this work is to evaluate the robustness of the archetypal online learning behaviors in zero-sum games, Poincaré recurrence and time-average equilibrium convergence, to this natural model of game evolution.

Connections to Repeated Game Models. The time-evolving game model we study can be seen as a generalization of usual repeated game formulations. A time-invariant game is a trivial version of a periodic game, in which case we recover the repeated static game setting. For a general periodic zero-sum game with period T, each stage game is now chosen according to a fixed length-T sequence of games, capturing interactions between the players with time-dependent payoffs. Periodic zero-sum games can also fit into the frameworks of multi-agent contextual games [28] and dynamic games (see, e.g., [5]). In a multi-agent contextual game [28], the environment selects a context from a set before each round of play, and this choice defines the game that is played. Periodic zero-sum games can be seen as a multi-agent contextual game where the environment draws contexts from the available set in a T-periodic fashion, with each context defining a zero-sum game with a common equilibrium. In the class of dynamic games, there is a game state on which the payoffs may depend and that evolves with dynamics. Periodic zero-sum games can be interpreted as a dynamic game where the state transitions do not depend on the strategies of the players, the state is T-periodic, and the payoffs are completely defined by the state. We remark that the focus of existing work on contextual games and dynamic games is distinct from the questions investigated in this paper.

The periodic zero-sum game model allows us to capture competitive settings where exogenous environmental variations manifest in an effectively periodic/epochal fashion. This naturally occurs in market competitions where time-of-day effects, week-to-week trends, and seasonality can dictate the game between players. To illustrate this point, consider a competition between service providers that wish to maximize their number of users, while the total market size evolves seasonally over time. This evolution affects the utility functions, even if the fundamentals of the market, and consequently the equilibrium, remain invariant.

Contributions and Approach. In this paper, for the classes of periodic zero-sum bilinear games and periodic zero-sum polymatrix games with a time-invariant equilibrium, we investigate the day-to-day and time-average behaviors of continuous-time gradient descent-ascent (GDA) and follow-the-regularized-leader (FTRL) learning dynamics, respectively. This study highlights the careful attention that must be given to the dynamical systems arising in periodic zero-sum games, which preclude standard proof techniques for Poincaré recurrence, while also revealing that intuition from existing results on static zero-sum games can be totally invalidated even by simple restricted examples in periodic zero-sum games.

Contribution 1: Poincaré Recurrence.
A key technical challenge in this work is that the dynamical systems which emerge from learning dynamics in periodic zero-sum games correspond to non-autonomous ordinary differential equations, whereas learning dynamics in static zero-sum games correspond to autonomous ordinary differential equations. Consequently, the usual proof methods from static zero-sum games for showing Poincaré recurrence are insufficient on their own in periodic zero-sum games. We overcome this challenge by delicately piecing together properties of periodic systems to construct a discrete-time autonomous system that we are able to show is Poincaré recurrent. This approach allows us to prove that both the GDA and FTRL learning dynamics are Poincaré recurrent in the respective classes of periodic zero-sum games (Theorems 1 & 2). Finally, we show that both periodicity and a time-invariant equilibrium are necessary for such results in evolving games (Proposition 1).

Contribution 2: Time-Average Strategy Equilibration Fails. Given that Poincaré recurrence provably generalizes from static zero-sum games to periodic zero-sum games, it may be expected that the time-average strategies in periodic zero-sum games converge to the time-invariant equilibrium as in static zero-sum games. Surprisingly, we show that counterexamples to this intuition can be constructed even in the simplest of periodic zero-sum games. In particular, we prove the negative result that the time-average GDA and FTRL strategies do not necessarily converge to the time-invariant equilibrium in the respective classes of zero-sum games (Propositions 2 & 3).

Contribution 3: Time-Average Equilibrium Utility Convergence. Despite the negative result on time-average strategy convergence, in the special case of periodic zero-sum bimatrix games we are able to show a complementary positive result on time-average utility convergence. Specifically, we show that the time-average utilities of the FTRL learning dynamics converge to the average of the equilibrium utility values of all the zero-sum games included in a single period of our time-evolving games (Theorem 3).

Organization. In Section 2, we formalize the classes of games that we study. We present characteristics of dynamical systems as they pertain to this work in Section 3. Sections 4 and 5 contain our results analyzing GDA and FTRL learning dynamics in continuous and finite strategy periodic zero-sum bilinear and polymatrix games, respectively. We present numerical experiments in Section 6 and finish with a discussion in Section 7. Proofs of all theoretical results are deferred to the appendix.

2 Game-Theoretic Preliminaries

2.1 Continuous Strategy Periodic Zero-Sum Games

For continuous strategy periodic zero-sum games, we study periodic zero-sum bilinear games. We begin by formalizing zero-sum bilinear games and then define the periodic variant.

Zero-Sum Bilinear Games. Given a matrix $A \in \mathbb{R}^{n_1 \times n_2}$, a zero-sum bilinear game on continuous strategy spaces can be defined by the max-min problem $\max_{x_1 \in \mathbb{R}^{n_1}} \min_{x_2 \in \mathbb{R}^{n_2}} x_1^\top A x_2$. Formally, the game is defined by the pair of payoff matrices $\{A, -A^\top\}$, and the action spaces of agents 1 and 2 are given by $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively. Player 1 seeks to maximize the utility function $u_1(x_1, x_2) = x_1^\top A x_2$ while player 2 optimizes the utility $u_2(x_1, x_2) = -x_2^\top A^\top x_1$. The game is zero-sum since for any $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$, the sum of the players' utilities is zero.
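To make the definition concrete, the following minimal Python sketch builds a periodic zero-sum bilinear game by rescaling a fixed base matrix with a T-periodic signal and checks the zero-sum property. The base matrix A0, the rescaling alpha(t), and the sampled strategies are illustrative choices of ours, not constructions taken from the paper.

```python
import numpy as np

# Illustrative (not from the paper): a periodic zero-sum bilinear game in which a
# fixed base matrix A0 is rescaled by a T-periodic signal alpha(t).
A0 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])          # base payoff matrix (n1 = n2 = 2)
T = 2 * np.pi                          # period of the game

def A(t):
    """Periodic payoff matrix: A(t) = alpha(t) * A0 with alpha(t + T) = alpha(t)."""
    return np.sin(2 * np.pi * t / T) * A0

def u1(x1, x2, t):
    return x1 @ A(t) @ x2              # player 1 maximizes x1^T A(t) x2

def u2(x1, x2, t):
    return -x2 @ A(t).T @ x1           # player 2's utility; u1 + u2 = 0 at all times

x1, x2 = np.array([0.3, -1.2]), np.array([0.7, 0.5])
assert np.isclose(u1(x1, x2, 1.0) + u2(x1, x2, 1.0), 0.0)   # zero-sum check
```

Note that for any rescaling of this form, the joint strategy (0, 0) remains an equilibrium at every time, which is exactly the time-invariant equilibrium property used throughout the paper.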
For zero-sum bilinear games, a Nash equilibrium corresponds to a joint strategy $(x_1^*, x_2^*)$ such that for each player $i$ and $j \neq i$, $u_i(x_i^*, x_j^*) \geq u_i(x_i, x_j^*)$ for all $x_i \in \mathbb{R}^{n_i}$. Note that $(x_1^*, x_2^*) = (0, 0)$ is always a Nash equilibrium of a zero-sum bilinear game.

Periodic Zero-Sum Bilinear Games. We study the continuous-time GDA learning dynamics in a class of games we refer to as periodic zero-sum bilinear games. The key distinction from a typical static zero-sum bilinear game is that the payoff matrix is no longer fixed in this class of games. Instead, the payoff matrix may change at each time instant as long as the game remains zero-sum and the continuous-time sequence of payoffs is periodic. The next definition formalizes this class of games.

Definition 1 (Periodic Zero-Sum Bilinear Game). A periodic zero-sum bilinear game is an infinite sequence of zero-sum bilinear games $\{A(t), -A(t)^\top\}_{t=0}^{\infty}$ in which the player set and strategy spaces are fixed and the payoff matrix is such that $A(t) = A(t + T)$ for a finite period $T$ and all $t \geq 0$.

Note that in such a game, $(0, 0)$ is always a time-invariant Nash equilibrium. Furthermore, we assume that the dependence of the payoff entries on time is smooth everywhere except for a finite set of points.

2.2 Finite Strategy Periodic Zero-Sum Games

For finite strategy periodic zero-sum games, we analyze periodic zero-sum polymatrix games. In what follows we define a zero-sum polymatrix game, which is a network generalization of a bimatrix game, and then detail the periodic variant considered in this paper.

Zero-Sum Polymatrix Games. An N-player polymatrix game is defined by an undirected graph $G = (V, E)$ where $V$ is the player set and $E$ is the edge set, with a bimatrix game played between the endpoints of each edge [8]. Each player $i \in V$ has a set of actions $\mathcal{A}_i = \{1, \ldots, n_i\}$ from which an action can be selected at random according to a distribution $x_i$ called a mixed strategy. The mixed strategy set of player $i \in V$ is the simplex in $\mathbb{R}^{n_i}$ denoted by $\mathcal{X}_i = \Delta^{n_i - 1} = \{x_i \in \mathbb{R}^{n_i}_{\geq 0} : \sum_{\alpha \in \mathcal{A}_i} x_{i\alpha} = 1\}$, where $x_{i\alpha}$ denotes the probability of action $\alpha \in \mathcal{A}_i$. The joint strategy space is denoted by $\mathcal{X} = \prod_{i \in V} \mathcal{X}_i$. The bimatrix game on edge $(i, j)$ is described by a pair of matrices $A_{ij} \in \mathbb{R}^{n_i \times n_j}$ and $A_{ji} \in \mathbb{R}^{n_j \times n_i}$. The utility or payoff of agent $i \in V$ under the strategy profile $x \in \mathcal{X}$ is given by $u_i(x) = \sum_{j:(i,j) \in E} x_i^\top A_{ij} x_j$ and corresponds to the sum of payoffs from the bimatrix games the player participates in. We further denote by $u_{i\alpha}(x) = \sum_{j:(i,j) \in E} (A_{ij} x_j)_\alpha$ the utility of player $i \in V$ under the strategy profile $x = (\alpha, x_{-i}) \in \mathcal{X}$ for $\alpha \in \mathcal{A}_i$. The game is called zero-sum if $\sum_{i \in V} u_i(x) = 0$ for all $x \in \mathcal{X}$. Each bimatrix edge game is not necessarily zero-sum in a zero-sum polymatrix game. A Nash equilibrium in a polymatrix game is a mixed strategy profile $x^* \in \mathcal{X}$ such that for each player $i \in V$, $u_i(x_i^*, x_{-i}^*) \geq u_i(x_i, x_{-i}^*)$ for all $x_i \in \mathcal{X}_i$. A Nash equilibrium is said to be interior if $\mathrm{supp}(x_i^*) = \mathcal{A}_i$ for all $i \in V$, where $\mathrm{supp}(x_i^*) = \{\alpha \in \mathcal{A}_i : x_{i\alpha}^* > 0\}$ is the support of $x_i^* \in \mathcal{X}_i$.

Periodic Zero-Sum Polymatrix Games. We analyze the continuous-time FTRL learning dynamics in a class of games we call periodic zero-sum polymatrix games. This class of games is such that the payoffs defined by the edge games evolve periodically. We consider that this periodic evolution is such that there is a common interior Nash equilibrium shared by each zero-sum polymatrix game that arrives. The following definition formalizes the games we study on finite strategy spaces.

Definition 2 (Periodic Zero-Sum Polymatrix Game).
A periodic zero-sum polymatrix game is an infinite sequence of zero-sum polymatrix games $\{G(t) = (V(t), E(t))\}_{t=0}^{\infty}$ in which the set of players, strategy spaces, and edges are fixed and each bimatrix game on an edge $(i, j)$ is such that $A_{ij}(t) = A_{ij}(t + T)$ and $A_{ji}(t) = A_{ji}(t + T)$ for some finite period $T$ and all $t \geq 0$. We assume there is a common interior Nash equilibrium $x^* \in \mathcal{X}$ of the polymatrix game $G(t)$ for all $t \geq 0$. Furthermore, we assume that the dependence of the payoff entries on time is smooth everywhere except for a finite set of points.

3 Preliminaries on Dynamical Systems

We now cover concepts from dynamical systems theory that will help us analyze learning dynamics in periodic zero-sum games and prove Poincaré recurrence. Careful attention must be given to these preliminaries in this work since the dynamical systems we study are non-autonomous, whereas typical recurrence analysis in the study of learning in games deals with autonomous dynamical systems.

3.1 Background on Dynamical Systems

We begin this section by providing dynamical systems background that is necessary both for defining Poincaré recurrence and for sketching typical proof methods along with our approach.

Flows. Consider an ordinary differential equation $\dot{x} = f(t, x)$ on a topological space $\mathcal{X}$. We can define the flow $\phi : \mathbb{R} \times \mathcal{X} \to \mathcal{X}$ of the dynamical system, for which the following hold: (i) $\phi(t, \cdot) : \mathcal{X} \to \mathcal{X}$, often denoted $\phi^t : \mathcal{X} \to \mathcal{X}$, is a homeomorphism for each $t \in \mathbb{R}$; (ii) $\phi(t + s, x) = \phi(t, \phi(s, x))$ for all $t, s \in \mathbb{R}$ and all $x \in \mathcal{X}$; (iii) for each $x \in \mathcal{X}$, $\frac{d}{dt}\phi(t, x)\big|_{t=0} = f(t, x)$; and (iv) $\phi(t, x_0) = x(t)$ is the solution.

Existence and Uniqueness. We utilize Carathéodory's existence theorem to guarantee the existence of a flow for $\dot{x} = f(t, x)$, even for some discontinuous functions $f$.

Theorem (Carathéodory's existence theorem [13, 16]). Consider a differential equation $\dot{x} = f(t, x)$ on a rectangular domain $R = \{(t, x) : |t - t_0| \leq a, \ |x - x_0| \leq b\}$. If $f$ satisfies the following conditions:
1. $f(t, x)$ is continuous in $x$ for each fixed $t$,
2. $f(t, x)$ is measurable in $t$ for each fixed $x$,
3. there is a Lebesgue-integrable function $m : [t_0 - a, t_0 + a] \to [0, \infty)$ such that $|f(t, x)| \leq m(t)$ for all $(t, x) \in R$,
then the differential equation has a solution. Moreover, if $f$ is also Lipschitz continuous, meaning $|f(t, x_1) - f(t, x_2)| \leq k(t)|x_1 - x_2|$ for some Lebesgue-integrable function $k : [t_0 - a, t_0 + a] \to [0, \infty)$, then there exists a unique solution of the differential equation.

In the settings we study, the above three conditions hold: Condition 1 holds because for every fixed $t$, the dynamics we study (specifically GDA and FTRL) are continuous functions on their state space. Condition 2 holds because the systems we study are finite and continuous almost everywhere, and so by Lusin's theorem [18] are measurable in $t$ for each fixed $x$. Finally, Condition 3 is always satisfied because the games we study always admit bounded orbits. Hence, it follows that a unique flow exists for all the dynamical systems studied in this paper.

Conservation of Volume. The flow $\phi$ of an ordinary differential equation is called volume preserving if the volume of the image of any set $U \subset \mathbb{R}^d$ under $\phi^t$ is preserved, meaning that $\mathrm{vol}(\phi^t(U)) = \mathrm{vol}(U)$. Liouville's theorem states that a flow is volume preserving if the divergence of $f$ at any point $x \in \mathbb{R}^d$ equals zero, that is, $\mathrm{div} f(t, x) = \mathrm{tr}(Df(t, x)) = \sum_{i=1}^{d} \frac{\partial f_i(t, x)}{\partial x_i} = 0$.

We now give general Poincaré recurrence statements, along with a discussion of how these results are usually applied in game theory, before outlining our proof methods.
3.2 Poincaré Recurrence in Autonomous Dynamical Systems

A number of works in the past several years show that online no-regret learning dynamics are Poincaré recurrent in repeated static zero-sum games (see, e.g., [24, 20, 6]). The proof methods for deriving such results crucially rely on the static nature of the game, for the reason that the learning dynamics amount to an autonomous dynamical system. Informally, the standard Poincaré recurrence theorem states that if an autonomous dynamical system preserves volume and every orbit remains bounded, then almost all trajectories return arbitrarily close to their initial position, and do so infinitely often [26]; this property of a dynamical system is known as Poincaré recurrence. Thus, proving the Poincaré recurrence of dynamics in repeated static zero-sum games is tantamount to verifying the volume preservation and bounded orbit properties. We now formalize the Poincaré recurrence theorem. Given a flow $\phi^t$ on a topological space $\mathcal{X}$, a point $x \in \mathcal{X}$ is nonwandering for $\phi^t$ if for each open neighborhood $U$ containing $x$, there exists $T > 1$ such that $U \cap \phi^T(U) \neq \emptyset$. The set of all nonwandering points for $\phi^t$, called the nonwandering set, is denoted $\Omega(\phi^t)$.

Theorem (Poincaré Recurrence for Continuous-Time Systems [26]). If a flow preserves volume and has only bounded orbits, then for each open set almost all orbits intersecting the set intersect it infinitely often: if $\phi^t$ is a volume preserving flow on a bounded set $Z \subset \mathbb{R}^d$, then $\Omega(\phi^t) = Z$.

In order to describe our proof methods in the following subsection for showing Poincaré recurrence in periodic zero-sum games, it will be useful to state an alternative formulation of the Poincaré recurrence theorem that is applicable to autonomous discrete-time systems.

Theorem (Poincaré Recurrence for Discrete-Time Maps [4]). Let $(X, \Sigma, \mu)$ be a finite measure space and let $\phi : X \to X$ be a measure-preserving map. For any $E \in \Sigma$, the set of those points $x$ of $E$ for which there exists $N \in \mathbb{N}$ such that $\phi^n(x) \notin E$ for all $n > N$ has zero measure. In other words, almost every point of $E$ returns to $E$. In fact, almost every point returns infinitely often. That is,
$\mu\big(\{x \in E : \exists N \in \mathbb{N} \text{ such that } \phi^n(x) \notin E \text{ for all } n > N\}\big) = 0.$

3.3 Poincaré Recurrence in Periodic Dynamical Systems

The proof methods described in the last section for showing the Poincaré recurrence of dynamics in static zero-sum games cannot directly be applied to time-evolving zero-sum games as a result of the non-autonomous nature of the systems. In fact, we can construct time-evolving zero-sum games lacking either periodic payoffs or a time-invariant equilibrium in which online learning dynamics are not Poincaré recurrent (we formalize this statement in Proposition 1 of the following section). Despite this hurdle, we show that in the natural subclass of time-evolving games covered by periodic zero-sum games (periodic payoffs and a time-invariant equilibrium), we can develop proof methods to show the Poincaré recurrence of online learning dynamics. Given the preceding claim regarding the possible non-recurrent nature of learning dynamics when periodic payoffs and a time-invariant equilibrium do not both hold, this is perhaps the most general class of time-evolving zero-sum games for which positive results in this direction are obtainable. We now give an overview of our approach, beginning by recalling properties of periodic systems.

Periodic Systems and Poincaré Maps. A system $\dot{x} = f(t, x)$ is T-periodic if $f(t + T, x) = f(t, x)$ for all $(t, x)$. Let $\phi^t : \mathbb{R}^n \to \mathbb{R}^n$ denote the mapping taking $x \in \mathbb{R}^n$ to the value of the solution at time $t$.
For a T-periodic system, $\phi^{T+s} = \phi^s \circ \phi^T$ so that $\phi^{kT} = (\phi^T)^k$ for any integer $k$. The mapping $\phi^T : \mathbb{R}^n \to \mathbb{R}^n$ is called the Poincaré map, or the mapping over a period. If the differential equation is well-defined for all $x$ and has a solution for all $t \in [0, T]$, then for each initial condition (where we have suppressed the dependence on $x_0$), the Poincaré map $\phi^T$ defines a discrete-time autonomous dynamical system $x^+ = \phi^T(x)$.

The learning dynamics we study in periodic zero-sum games form T-periodic dynamical systems. Thus, the discrete-time autonomous dynamical system $x^+ = \phi^T(x)$ formed by the Poincaré map is key to the analysis methods we pursue. In particular, our approach is to show that this system is Poincaré recurrent, which we then use to conclude that the original continuous-time non-autonomous system is Poincaré recurrent. Given the previously presented Poincaré recurrence theorem for discrete-time maps, proving the Poincaré recurrence of the system $x^+ = \phi^T(x)$ requires verifying the volume preservation and bounded orbit properties, which then imply the measure preserving property. The following result states that if a T-periodic vector field $f(t, x)$ is divergence-free, so that the flow $\phi^t$ is volume preserving, then the Poincaré map $\phi^T$, and hence the resulting discrete-time dynamical system $x^+ = \phi^T(x)$, is also volume preserving.

Theorem (Volume preservation for T-Periodic Systems [1, 3.16.B, Thm 2]). If the T-periodic system $\dot{x} = f(t, x)$ is divergence-free, then $\phi^T$ preserves volume.

Similarly, if orbits of $\dot{x} = f(t, x)$ are bounded, then clearly the orbits of $x^+ = \phi^T(x)$ are bounded. Hence, to show the system $x^+ = \phi^T(x)$ is Poincaré recurrent, we prove that $\dot{x} = f(t, x)$ has a divergence-free vector field (equivalently, that the flow is volume preserving) and only bounded orbits. This is then sufficient to conclude that $\dot{x} = f(t, x)$ is Poincaré recurrent since the discrete-time system forms a subsequence of the continuous-time system.

4 Gradient Descent-Ascent in Periodic Zero-Sum Bilinear Games

This section focuses on the continuous-time GDA learning dynamics in periodic zero-sum bilinear games. The dynamics are such that each player seeks to maximize their utility by following the gradient with respect to their choice variable and are given by
$\dot{x}_1(t) = A(t)x_2(t), \qquad \dot{x}_2(t) = -A^\top(t)x_1(t).$

4.1 Poincaré Recurrence

The focus of this section is on characterizing the transient behavior of the continuous-time GDA learning dynamics in periodic zero-sum games. Specifically, we show the following result.

Theorem 1. The continuous-time GDA learning dynamics are Poincaré recurrent in any periodic zero-sum bilinear game as given in Definition 1.

Theorem 1 establishes that the recurrent nature of continuous-time GDA dynamics in static zero-sum bilinear games is robust to the dynamic evolution of the payoffs in periodic zero-sum bilinear games. Prior to outlining the proof steps for Theorem 1, we elaborate on the claim from the previous section that without the periodicity property and a time-invariant equilibrium, such a result is unobtainable. In particular, we show that the Poincaré recurrence of the GDA dynamics is not guaranteed without both properties by constructing counterexamples in which only one of the properties holds.

Proposition 1. There exist time-evolving zero-sum games with either a time-invariant equilibrium or periodic payoffs (but not both simultaneously) in which the GDA dynamics are not Poincaré recurrent.
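To make the GDA dynamics of this section concrete, the following sketch integrates the flow for a periodically rescaled bilinear game with a simple Euler scheme and tracks the quantity $\Phi(t) = \frac{1}{2}(\|x_1(t)\|^2 + \|x_2(t)\|^2)$, which Lemma 2 below shows to be time-invariant. The base matrix, rescaling, initial condition, and step size are our own illustrative choices, not the paper's experimental setup.

```python
import numpy as np

# Illustrative numerical check (our own example): integrate the GDA dynamics
#   dx1/dt = A(t) x2,   dx2/dt = -A(t)^T x1
# for a periodically rescaled bilinear game and track
#   Phi(t) = 0.5 * (||x1||^2 + ||x2||^2),
# which is time-invariant along the continuous-time flow (Lemma 2 below).
A0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
T = 2 * np.pi

def A(t):
    return np.sin(2 * np.pi * t / T) * A0   # T-periodic payoff matrix

def gda_step(x1, x2, t, dt):
    # Simple explicit Euler discretization of the continuous-time flow (illustration only).
    dx1 = A(t) @ x2
    dx2 = -A(t).T @ x1
    return x1 + dt * dx1, x2 + dt * dx2

x1, x2 = np.array([1.0, 0.5]), np.array([-0.5, 0.25])
dt, phi = 1e-4, []
for k in range(int(5 * T / dt)):            # integrate over five periods
    phi.append(0.5 * (x1 @ x1 + x2 @ x2))
    x1, x2 = gda_step(x1, x2, k * dt, dt)

print(max(phi) - min(phi))                   # small, up to discretization error
```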
This proposition highlights the strength of our results regarding GDA, given that the assumptions needed to obtain them are more or less tight. We now outline the key intermediate results we prove to obtain Theorem 1, following the techniques described in Section 3.3. For the GDA dynamics, we utilize the observation that the corresponding vector fields are divergence-free to show that the learning dynamics are volume preserving. We now state this result formally.

Lemma 1. The GDA learning dynamics are volume preserving in any periodic zero-sum bilinear game as given in Definition 1.

We then proceed by showing that the GDA orbits are bounded by deriving a time-invariant function. This step relies on the fact that we have a time-invariant equilibrium.

Lemma 2. The function $\Phi(t) = \frac{1}{2}\big(x_1^\top(t)x_1(t) + x_2^\top(t)x_2(t)\big)$ is time-invariant. Hence, the GDA orbits are bounded in any periodic zero-sum bilinear game as given in Definition 1.

Given the volume preservation and bounded orbit characteristics of the continuous-time GDA learning dynamics in periodic zero-sum games, the proof of recurrence follows by applying the arguments described in Section 3.3.

4.2 Time-Average Convergence

The Poincaré recurrence of the continuous-time GDA learning dynamics in periodic zero-sum bilinear games indicates that the system has regularities which couple the evolving players and the evolving game despite the failure to converge to a fixed point. A natural follow-up question to the cyclic transient behavior of the dynamics is whether the long-run time-average converges to a game-theoretically meaningful outcome. We show that in periodic zero-sum bilinear games, the time-average of the GDA learning dynamics may not converge to the time-invariant Nash equilibrium. To prove this, we consider a periodic zero-sum bilinear game with the action space of each player being $\mathbb{R}$, so that the evolving payoff simply rescales the vector field. We construct the payoff sequence so that the dynamics return to the initial condition after a period of the game, while the time-average of the dynamics is not equal to the time-invariant equilibrium $(x_1^*, x_2^*) = (0, 0)$. Given the simplicity of this example, it effectively rules out hope of providing a meaningful time-average convergence guarantee in this class of games.

Proposition 2. There exist periodic zero-sum bilinear games satisfying Definition 1 where the time-average strategies of the GDA dynamics fail to converge to the time-invariant equilibrium $(0, 0)$.

5 Follow-the-Regularized-Leader in Periodic Zero-Sum Polymatrix Games

We now analyze continuous-time FTRL learning dynamics in periodic zero-sum polymatrix games. Players that follow FTRL learning dynamics in this class of games select a mixed strategy at each time that maximizes the difference between the cumulative payoff evaluated over the history of games and a regularization penalty. This adaptive strategy balances exploitation based on the past with exploration. Formally, the continuous-time FTRL learning dynamics for any player $i \in V$ in a periodic zero-sum polymatrix game with an initial payoff vector $y_i(0) \in \mathbb{R}^{n_i}$ are given by
$y_i(t) = y_i(0) + \int_0^t \sum_{j:(i,j)\in E} A_{ij}(\tau)x_j(\tau)\, d\tau, \qquad x_i(t) = \arg\max_{x_i \in \mathcal{X}_i}\{\langle x_i, y_i(t)\rangle - h_i(x_i)\}, \qquad (1)$
where $h_i : \mathcal{X}_i \to \mathbb{R}$ is a penalty term which encourages exploration away from the strategy that maximizes the cumulative payoffs in hindsight. We assume that the regularization function $h_i(\cdot)$ for each player $i \in V$ is continuous, strictly convex on $\mathcal{X}_i$, and smooth on the relative interior of every face of $\mathcal{X}_i$.
These assumptions ensure the update $x_i(t)$ is well-defined since a unique solution exists. Common FTRL learning dynamics include the multiplicative weights update and the projected gradient dynamics. The multiplicative weights dynamics for a player $i \in V$ arise from the regularization function $h_i(x_i) = \sum_{\alpha \in \mathcal{A}_i} x_{i\alpha}\log x_{i\alpha}$ and correspond to the replicator dynamics. The projected gradient dynamics for a player $i \in V$ derive from the Euclidean regularization $h_i(x_i) = \frac{1}{2}\|x_i\|_2^2$.

To simplify notation, the FTRL dynamics can equivalently be formulated as the following update
$y_i(t) = y_i(0) + \int_0^t v_i(x(\tau), \tau)\, d\tau, \qquad x_i(t) = Q_i(y_i(t)). \qquad (2)$
Observe that we denote by $v_i(x, \tau) = (u_{i\alpha}(x, \tau))_{\alpha \in \mathcal{A}_i}$ the vector of pure strategy utilities of agent $i \in V$, whose entries are the utilities under the joint profiles $x = (\alpha, x_{-i}) \in \mathcal{X}$ for $\alpha \in \mathcal{A}_i$ at time $\tau \geq 0$. Moreover, $Q_i : \mathbb{R}^{n_i} \to \mathcal{X}_i$ is known as the choice map and is defined as $Q_i(y_i(t)) = \arg\max_{x_i \in \mathcal{X}_i}\{\langle y_i(t), x_i\rangle - h_i(x_i)\}$. In this notation, the utility of player $i \in V$ under the joint strategy $x = (x_i, x_{-i}) \in \mathcal{X}$ at time $t \geq 0$ is given by $u_i(x, t) = \langle v_i(x, t), x_i\rangle$. Observe that in our notation for the utility we now include the time index to make the dependence on the evolving game and payoffs explicit. For any player $i \in V$ we denote by $h_i^* : \mathbb{R}^{n_i} \to \mathbb{R}$ the convex conjugate of the regularization function $h_i : \mathcal{X}_i \to \mathbb{R}$, which is given by $h_i^*(y_i(t)) = \max_{x_i \in \mathcal{X}_i}\{\langle x_i, y_i(t)\rangle - h_i(x_i)\}$.

5.1 Poincaré Recurrence

We now focus on characterizing the transient behavior of the continuous-time FTRL learning dynamics in periodic zero-sum polymatrix games. It is known that the continuous-time FTRL learning dynamics are Poincaré recurrent in static zero-sum polymatrix games [20]. The following result demonstrates that this characteristic holds even in games that are evolving in a periodic fashion with a time-invariant equilibrium, providing a broad generalization.

Theorem 2. The FTRL learning dynamics are Poincaré recurrent in any periodic zero-sum polymatrix game as given in Definition 2.

For the remainder of this subsection, we describe our proof methods. The general approach is to prove the Poincaré recurrence of a transformed system using the techniques described in Section 3.3. This conclusion then allows us to infer the equivalent property for the original FTRL system. The utility differences for each player $i \in V$ and pure strategy $\alpha_i \in \mathcal{A}_i \setminus \{\beta_i\}$, relative to a fixed reference strategy $\beta_i \in \mathcal{A}_i$, evolve according to the differential equation $\dot{z}_{i\alpha_i} = v_{i\alpha_i}(x(t), t) - v_{i\beta_i}(x(t), t)$. Toward proving that this system is Poincaré recurrent, we show that the vector field of the $z$ system is divergence-free and hence volume preserving.

Lemma 3. The dynamics defined by the $z$ system are volume preserving in any periodic zero-sum polymatrix game as given in Definition 2.

We then construct a time-invariant function along the evolution of the system that is sufficient to guarantee that the orbits generated by the $z$ dynamics are bounded. Recall that $x^*$ denotes the common, time-invariant interior Nash equilibrium of the periodic zero-sum polymatrix game.

Lemma 4. The function $\Phi(x^*, y(t)) = \sum_{i \in V}\big(h_i^*(y_i(t)) - \langle x_i^*, y_i(t)\rangle + h_i(x_i^*)\big)$ is time-invariant. Hence, the orbits generated by the $z$ dynamics are bounded in any periodic zero-sum polymatrix game as given in Definition 2.

From this point, we follow the arguments from Section 3.3 to conclude the $z$ dynamics are Poincaré recurrent. Finally, we show that Poincaré recurrence of the $z$ system is sufficient to guarantee the Poincaré recurrence of the FTRL learning dynamics. The proofs of the results in this section can be found in Appendix D.
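For intuition about the choice map in (2), the sketch below instantiates FTRL with the entropic regularizer, in which case $Q_i$ is the softmax map and the dynamics coincide with the multiplicative weights/replicator dynamics. The Euler discretization of the payoff integral, the two-player game, and the rescaling are our own illustrative choices and are only meant to make the update concrete.

```python
import numpy as np

# Sketch of the entropic instance of the FTRL dynamics (2): with the entropy
# regularizer h_i(x_i) = sum_a x_ia log x_ia on the simplex, the choice map Q_i is
# the softmax, yielding multiplicative weights / replicator dynamics. The Euler
# discretization of the payoff integral is for illustration only.
def softmax(y):
    z = np.exp(y - y.max())               # Q_i(y_i) = argmax <y_i, x_i> - h_i(x_i)
    return z / z.sum()

def ftrl_trajectory(A, y1, y2, horizon, dt):
    """Two-player zero-sum bimatrix game with time-varying payoff matrix A(t)."""
    for k in range(int(horizon / dt)):
        t = k * dt
        x1, x2 = softmax(y1), softmax(y2)
        y1 = y1 + dt * (A(t) @ x2)         # cumulative payoff of the maximizer
        y2 = y2 - dt * (A(t).T @ x1)       # cumulative payoff of the minimizer
    return softmax(y1), softmax(y2)

A0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = lambda t: np.sin(t) * A0               # 2*pi-periodic rescaling (illustrative)
print(ftrl_trajectory(A, np.zeros(2), np.zeros(2), horizon=20.0, dt=1e-3))
```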
5.2 Time-Average Convergence

A number of well-known properties of zero-sum bimatrix games fail to generalize to zero-sum polymatrix games. Indeed, fundamental characteristics of zero-sum bimatrix games include that each agent has a unique utility value in any Nash equilibrium and that equilibrium strategies are exchangeable. However, Cai and Daskalakis [8] show that neither of these properties is guaranteed in zero-sum polymatrix games. Consequently, in general, time-average convergence to the set of equilibrium values in the utility and strategy spaces does not equate to the stronger notion of pointwise convergence.

For the reasons just outlined, we pursue a different notion of time-average convergence in periodic zero-sum polymatrix games. That is, we consider the subclass of periodic zero-sum bimatrix games (2-player periodic zero-sum polymatrix games) and show that the time-average utility of each agent converges to the time-average of the game values (that is, the unique utility the player obtains at any Nash equilibrium) over a period of the periodic game.

Theorem 3. In periodic zero-sum bimatrix games satisfying Definition 2, if each player follows FTRL dynamics, then the time-average utility of each player converges to the time-average over a period of the game's equilibrium utility values.

Theorem 3 paints a positive view of the time-average behavior of FTRL learning dynamics in periodic zero-sum games. However, the following result demonstrates that, much like in the case of GDA in periodic zero-sum bilinear games, the time-average strategies are not guaranteed to converge to the time-invariant Nash equilibrium.

Proposition 3. There exist periodic zero-sum bimatrix games satisfying Definition 2 in which the time-average strategies of FTRL dynamics fail to converge to the time-invariant Nash equilibrium.

We prove the negative result in this proposition by constructing a simple counterexample that corresponds to a time-varying rescaling of Matching Pennies. The proofs of the results in this section can be found in Appendix E.

6 Experiments

In this section, we present several experimental simulations that illustrate our theoretical results. To begin, for the continuous-time GDA dynamics we show that Poincaré recurrence holds in a periodic zero-sum bilinear game. We consider the ubiquitous Matching Pennies game with payoff matrix
$A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$
We then use the following periodic rescaling with period $2\pi$:
$\alpha(t) = \begin{cases} \sin(t), & 0 \leq t \bmod 2\pi \leq \tfrac{3\pi}{2}, \\ \pi\big((t \bmod 2\pi) - 2\pi\big), & \tfrac{3\pi}{2} < t \bmod 2\pi \leq 2\pi. \end{cases}$
When players follow the GDA learning dynamics, we see from Figure 1 that the trajectories, when plotted alongside the value of the periodic rescaling, are bounded. A similar experimental result holds in the case of FTRL dynamics. In the supplementary material, we simulate replicator dynamics with the same periodic rescaling as in Figure 1. The trajectories in the dual/payoff space also remain bounded due to the invariance of the Kullback-Leibler (KL) divergences.

Figure 1: Bounded trajectories for a periodically rescaled Matching Pennies game updated using GDA.

Figure 2: Weighted sum of KL-divergences for a 64-player periodically rescaled Matching Pennies game. Note that despite the complicated trajectories of each player, the weighted sum of their divergences remains constant.

Lemmas 2 and 4 describe functions $\Phi$ which remain time-invariant. In the case of replicator dynamics, $\Phi(t)$ is the sum of Kullback-Leibler divergences measured between the strategy of each player and their mixed Nash strategy $[1/2, 1/2]$.
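The sketch below numerically illustrates this invariance for the replicator instance of FTRL in a periodically rescaled Matching Pennies game: each player's KL divergence to the uniform equilibrium oscillates, while the sum stays (approximately) constant along the trajectory. The rescaling, initialization, and step size here are our own choices and need not match the experimental setup behind Figures 1 and 2.

```python
import numpy as np

# Illustrative check of the invariant from Lemma 4 for replicator / multiplicative
# weights dynamics: in a periodically rescaled Matching Pennies game, the sum of KL
# divergences KL(x* || x_i(t)) to the interior equilibrium x* = [1/2, 1/2] should
# remain (approximately) constant, even though each term fluctuates.
A0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_star = np.array([0.5, 0.5])

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

y1, y2 = np.array([0.4, -0.4]), np.array([-0.2, 0.2])
dt, vals = 1e-4, []
for k in range(int(4 * 2 * np.pi / dt)):          # four periods of the rescaling
    t = k * dt
    x1, x2 = softmax(y1), softmax(y2)
    vals.append(kl(x_star, x1) + kl(x_star, x2))
    a = np.sin(t) * A0                            # periodic rescaling alpha(t) = sin(t)
    y1 = y1 + dt * (a @ x2)
    y2 = y2 - dt * (a.T @ x1)

print(max(vals) - min(vals))                      # small, up to discretization error
```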
We simulated a 64-player polymatrix extension of the Matching Pennies game, where each agent plays against the opponent immediately adjacent to them, forming a toroid-like chain of games. Furthermore, we randomly rescale each game with a different periodic function. Figure 2 depicts the claim presented in the lemmas: although each agent's specific divergence term $\mathrm{KL}(x_i^* \,\|\, x_i(t))$ fluctuates, the sum $\sum_{i \in V} \mathrm{KL}(x_i^* \,\|\, x_i(t))$ remains constant.

To generate Figure 3, we use data from a simplified 64-player polymatrix game simulation, where the graph that represents player interactions is sparse. Here, the strategy of each player informs the RGB value of a corresponding pixel on a grid. If the system exhibits Poincaré recurrence, we should eventually see similar patterns emerge as the pixels change color over time (i.e., as their corresponding mixed strategies evolve). As observed in Figure 3, the system returns near the initial image at time T = 6226. Further details about the experiments can be found in Appendix F.

Figure 3 (panels at T = 0, 700, 1500, 3000, 5000, 6226): Sequence of images showing Poincaré recurrence in an 8 × 8 zero-sum polymatrix game, where the changing color of each pixel on the grid represents the mixed strategy of the player over time. After time T = 6226, we see that an approximation of the original image is recovered, showing that the recurrence property holds.

7 Discussion

We study both GDA and FTRL learning dynamics in periodically varying zero-sum games. We prove that the recurrent nature of such dynamics carries over from static games to the classes of evolving games we study. Yet, in the settings we analyze, the time-average convergence behavior from static zero-sum games can fail to generalize. This work takes a step toward understanding the behavior of classical learning algorithms for games in the more realistic setting where the game itself is not fixed.

We conclude by discussing related works on learning dynamics in evolving games. The existing literature considers a number of models that admit distinct results [19, 30, 9, 14]. In a class of time-evolving games where the evolution can be arbitrary [9], algorithms are designed that provide a novel type of regret guarantee called Nash equilibrium regret. In a sense, these algorithms are competitive against the Nash equilibrium of the long-term-averaged payoff matrix. In an analysis of discrete-time FTRL dynamics in evolving games that are strictly/strongly monotone [14], sufficient conditions (e.g., when the evolving game stabilizes) under which the dynamics track or converge to the evolving equilibrium are derived. Unfortunately, zero-sum games do not satisfy these strong properties. Finally, in a model of endogenously evolving zero-sum games where a parametric game evolves itself adversarially against the participating agents, a transformation that treats the game as an additional "hidden" agent allows for a reduction to a more standard static network zero-sum game [19, 30], under which both time-average convergence to equilibrium and Poincaré recurrence hold. This growing literature indicates that time-varying games can exhibit distinct and oftentimes more complex behavior than their classic static counterparts. As such, there is much potential for future work toward a better understanding of their learning dynamics.
Acknowledgments and Disclosure of Funding

This research/project is supported in part by the National Research Foundation, Singapore under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016), NRF 2018 Fellowship NRF-NRFF2018-07, the NRF2019-NRF-ANR095 ALIAS grant, grant PIE-SGP-AI-2018-01, and the AME Programmatic Fund (Grant No. A20H6b0151) from the Agency for Science, Technology and Research (A*STAR). Tanner Fiez was supported by a National Defense Science and Engineering Graduate Fellowship.

References

[1] V. I. Arnol'd. Mathematical Methods of Classical Mechanics, volume 60. Springer Science & Business Media, 2013.
[2] J. P. Bailey and G. Piliouras. Multiplicative weights update in zero-sum games. In ACM Conference on Economics and Computation, pages 321-338, 2018.
[3] J. P. Bailey, G. Gidel, and G. Piliouras. Finite regret and cycles with fixed step-size via alternating gradient descent-ascent. In Conference on Learning Theory, pages 391-407, 2020.
[4] L. Barreira. Poincaré recurrence: old and new. In XIVth International Congress on Mathematical Physics, pages 415-422. World Scientific, 2006.
[5] T. Başar and G. J. Olsder. Dynamic Noncooperative Game Theory. SIAM, 1998.
[6] V. Boone and G. Piliouras. From Darwin to Poincaré and von Neumann: Recurrence and cycles in evolutionary and algorithmic game theory. In International Conference on Web and Internet Economics, pages 85-99, 2019.
[7] G. Brown. Iterative solutions of games by fictitious play. In Activity Analysis of Production and Allocation, T. C. Koopmans (Ed.), New York: Wiley, 1951.
[8] Y. Cai and C. Daskalakis. On minmax theorems for multiplayer games. In Symposium on Discrete Algorithms, pages 217-234, 2011.
[9] A. R. Cardoso, J. Abernethy, H. Wang, and H. Xu. Competing against Nash equilibria in adversarially changing zero-sum games. In International Conference on Machine Learning, pages 921-930, 2019.
[10] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[11] Y. K. Cheung. Multiplicative weights updates with constant step-size in graphical constant-sum games. In Advances in Neural Information Processing Systems, pages 3528-3538, 2018.
[12] Y. K. Cheung and G. Piliouras. Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games. In Conference on Learning Theory, pages 807-834, 2019.
[13] E. A. Coddington and N. Levinson. Theory of Ordinary Differential Equations. Tata McGraw-Hill Education, 1955.
[14] B. Duvocelle, P. Mertikopoulos, M. Staudigl, and D. Vermeulen. Learning in time-varying games. arXiv preprint arXiv:1809.03066, 2018.
[15] Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79-103, 1999.
[16] J. K. Hale. Ordinary Differential Equations. Robert E. Krieger Publishing Company, 1980.
[17] M. Heino, J. A. Metz, and V. Kaitala. The enigma of frequency-dependent selection. Trends in Ecology & Evolution, 13(9):367-370, 1998.
[18] N. Lusin. Sur les propriétés des fonctions mesurables. CR Acad. Sci. Paris, 154(25):1688-1690, 1912.
[19] T. Mai, M. Mihail, I. Panageas, W. Ratcliff, V. Vazirani, and P. Yunker. Cycles in zero-sum differential games and biological diversity. In ACM Conference on Economics and Computation, pages 339-350, 2018.
[20] P. Mertikopoulos, C. Papadimitriou, and G. Piliouras. Cycles in adversarial regularized learning. In Symposium on Discrete Algorithms, pages 2703-2717, 2018.
[21] P. Mertikopoulos, B. Lecouat, H. Zenati, C.-S. Foo, V. Chandrasekhar, and G. Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. In International Conference on Learning Representations, 2019.
[22] S. G. Nagarajan, D. Balduzzi, and G. Piliouras. From chaos to order: Symmetry and conservation laws in game dynamics. In International Conference on Machine Learning, pages 7186-7196, 2020.
[23] J. Perolat, R. Munos, J.-B. Lespiau, S. Omidshafiei, M. Rowland, P. Ortega, N. Burch, T. Anthony, D. Balduzzi, B. De Vylder, G. Piliouras, M. Lanctot, and K. Tuyls. From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization. arXiv preprint arXiv:2002.08456, 2020.
[24] G. Piliouras and J. S. Shamma. Optimization despite chaos: Convex relaxations to complex limit sets via Poincaré recurrence. In Symposium on Discrete Algorithms, pages 861-873, 2014.
[25] G. Piliouras, C. Nieto-Granda, H. I. Christensen, and J. S. Shamma. Persistent patterns: Multi-agent learning beyond equilibrium and utility. In International Conference on Autonomous Agents and Multi-Agent Systems, pages 181-188, 2014.
[26] H. Poincaré. Sur le problème des trois corps et les équations de la dynamique. Acta Mathematica, 13(1), 1890.
[27] J. Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296-301, 1951.
[28] P. G. Sessa, I. Bogunovic, A. Krause, and M. Kamgarpour. Contextual games: Multi-agent learning with side information. In Advances in Neural Information Processing Systems, 2020.
[29] S. Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194, 2011.
[30] S. Skoulakis, T. Fiez, R. Sim, G. Piliouras, and L. Ratliff. Evolutionary game theory squared: Evolving agents in endogenously evolving zero-sum games. In AAAI Conference on Artificial Intelligence, 2021.
[31] A. R. Tilman, J. B. Plotkin, and E. Akçay. Evolutionary games with environmental feedbacks. Nature Communications, 11(1):1-11, 2020.
[32] E.-V. Vlatakis-Gkaragkounis, L. Flokas, and G. Piliouras. Poincaré recurrence, cycles and spurious equilibria in gradient-descent-ascent for non-convex non-concave zero-sum games. In Advances in Neural Information Processing Systems, pages 10450-10461, 2019.
[33] J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295-300, 1928.
[34] J. S. Weitz, C. Eksin, K. Paarporn, S. P. Brown, and W. C. Ratcliff. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proceedings of the National Academy of Sciences, 113(47), 2016.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] The contributions and scope are clearly stated and derived directly from our results.
(b) Did you describe the limitations of your work? [Yes] We have explained the limitations of our work thoroughly in the Introduction, Preliminaries, and Discussion.
(c) Did you discuss any potential negative societal impacts of your work? [No] The work is primarily theoretical, and thus does not have any directly negative societal impacts.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes] We have not used any human-derived data in our work, and in general our paper conforms to the guidelines.

2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results?
[Yes] All assumptions are stated either in the main body or in the supplementary material.
(b) Did you include complete proofs of all theoretical results? [Yes] See the supplementary material.

3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code and instructions required to reproduce the results are given in several formats within the supplementary material.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A] Our simulations do not require the above.
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A] Our simulations are deterministic.
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We explain the resources and compute used in the supplementary material.

4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [N/A] All assets used are proprietary.
(b) Did you mention the license of the assets? [N/A]
(c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A] Our work does not include human-derived data.
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]

5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]