Published as a conference paper at ICLR 2024

LEARNING NASH EQUILIBRIA IN RANK-1 GAMES

Nikolas Patris
University of California, Irvine
Archimedes Research Unit
npatris@uci.edu

Ioannis Panageas
University of California, Irvine
ipanagea@ics.uci.edu

ABSTRACT

Learning Nash equilibria (NE) in games has garnered significant attention, particularly in the context of training Generative Adversarial Networks (GANs) and multi-agent Reinforcement Learning. The current state-of-the-art in efficiently learning NE in games focuses on landscapes that meet the (weak) Minty property or games characterized by a potential function, often referred to as potential games. A significant challenge in this domain is that computing Nash equilibria is a computationally intractable task Daskalakis et al. (2009). In this paper we focus on bimatrix games (A, B) called rank-1. These are games in which the sum of the payoff matrices A + B is a rank-1 matrix; note that standard zero-sum games are rank-0. We show that a modification of optimistic mirror descent converges to an ϵ-approximate NE after O(1/ϵ) iterations in rank-1 games. We achieve this by leveraging structural results about the NE landscape of rank-1 games Adsul et al. (2021). Notably, our approach bypasses the fact that these games do not satisfy the MVI property.

1 INTRODUCTION

Learning in games has its origins in Blackwell's influential work on the approachability theorem (Blackwell, 1956; Abernethy et al., 2011), which paved the way for the development of various learning algorithms with convergence guarantees to different game-theoretic solution concepts, including coarse correlated equilibrium (CCE) and Nash equilibrium (NE). For instance, a well-known result in this context is that if both players in a zero-sum game employ a no-regret learning algorithm, the empirical averages of their strategies converge to a NE. However, the typical guarantees provided by online learning frameworks do not shed light on how the system stabilizes toward a Nash equilibrium when the underlying two-player game is neither zero-sum nor a potential game (Anagnostides et al., 2022b). In such cases, one can only argue that the empirical averages converge to a CCE if both players employ a no-regret learning algorithm. This raises the following fundamental question:

For which classes of two-player games, beyond zero-sum and potential games, can we design online learning algorithms that guarantee convergence to a Nash equilibrium?

The work in Kannan & Theobald (2010) introduced a hierarchy of bimatrix games (A, B), parameterized by the rank k of A + B. These are referred to as rank-k bimatrix games. Notably, this class encompasses zero-sum games when k is zero. Intriguingly, it has been demonstrated that even for the case of k = 1, the set of Nash equilibria can comprise a potentially large number of disconnected components and lacks convexity. Furthermore, it has been proven that for k ≥ 3, computing an exact Nash equilibrium in rank-k bimatrix games is a PPAD-hard problem Mehta (2018). In other words, it is highly unlikely that a polynomial-time algorithm exists for computing Nash equilibria in such games. On the contrary, rank-1 games have been shown to admit polynomial-time algorithms based on linear programming formulations Adsul et al. (2021; 2011), while the complexity of rank-2 games remains an open question.
Our contributions and Technical Overview. In this paper, we focus on rank-1 games. First we establish that there exist rank-1 games that do not satisfy the so-called Minty criterion, and as a result optimistic mirror descent Rakhlin & Sridharan (2013) and extra gradient algorithms Korpelevich (1976) fail to converge to a Nash equilibrium. On a positive note, we introduce a decentralized learning algorithm that converges to an ϵ-approximate Nash equilibrium in O(1/ϵ) iterations, as proven in Theorem 3.7. Both agents in the game utilize an optimistic mirror descent algorithm in a parametrized zero-sum game that evolves over time. The key to our technical analysis lies in how we adapt the underlying parameter, ensuring that when the algorithm terminates, an approximate Nash equilibrium of the parametrized zero-sum game coincides with an approximate Nash equilibrium of the rank-1 game.

To provide more context, consider a rank-1 bimatrix game represented as (A, B). We can assume that B = −A + ab⊤ for some vectors a and b. It turns out that a Nash equilibrium (x, y) of the rank-1 game (A, B) is also a Nash equilibrium of the parametrized zero-sum game (A − 1λb⊤, −A + 1λb⊤) for an appropriate choice of the parameter λ, where 1 is the all-ones vector. The primary technical challenge is formulating a suitable optimization problem to ensure that a Nash equilibrium in the parameterized game translates to a Nash equilibrium in the rank-1 game, as shown in Lemma 3.4. Our proposed algorithm is based on optimistic mirror descent applied to the parameterized bilinear function x⊤(A − 1λb⊤)y with an additional regularization term (x⊤a − λ)². Depending on the sign of the difference (x⊤a − λ), λ is updated appropriately. To prove our claims, we leverage the structural insights of the Nash equilibrium landscape Adsul et al. (2021) and exploit the stationarity properties of the λ-parameterized (strongly) concave-convex function f_λ(x, y) = x⊤(A − 1λb⊤)y − (1/2)(x⊤a − λ)². Furthermore, in Appendix C we provide an experimental evaluation supporting our theoretical findings.

2 PRELIMINARIES

2.1 NOTATION AND DEFINITIONS

Notation. Let R be the set of real numbers, and [n] be the set {1, 2, . . . , n}. We define Δ_n as the probability simplex, which is the set of n-dimensional probability vectors, i.e., Δ_n := {x ∈ R^n : x_i ≥ 0, Σ_{i=1}^n x_i = 1}. We use e_i to denote the i-th elementary vector, and to refer to the i-th coordinate of x we use x_i. Superscripts indicate the timestep to which a vector refers.

Normal-form Games. We primarily focus on two-player normal-form bimatrix games. We are given a pair of payoff matrices A ∈ R^{n×m} and B ∈ R^{n×m}, where n and m are the respective numbers of pure strategies (actions) of the row and the column player. After the players choose their actions, we denote by (e_i, e_j) the joint action profile for i ∈ [n] and j ∈ [m]. Then, their respective payoffs are A_ij := e_i⊤Ae_j for the row player and B_ij := e_i⊤Be_j for the column player. Players are also allowed to randomize. A mixed strategy for the row (resp. column) player is a probability distribution x ∈ Δ_n (resp. y ∈ Δ_m) over the n rows (resp. m columns). In this case, the expected utilities of the row and column player are x⊤Ay and x⊤By respectively, where (x, y) ∈ Δ_n × Δ_m denotes the joint mixed strategy profile.

Definition 2.1 (ϵ-approximate Nash Equilibrium). A strategy profile (x*, y*) ∈ Δ_n × Δ_m is called an ϵ-approximate Nash equilibrium (NE) if and only if

(x*)⊤Ay* ≥ x⊤Ay* − ϵ for all x ∈ Δ_n  and  (x*)⊤By* ≥ (x*)⊤By − ϵ for all y ∈ Δ_m.

In words, an approximate Nash equilibrium is a strategy profile in which no player can improve their payoff significantly (by more than ϵ) by unilaterally changing their strategy, but the strategy profile may not necessarily satisfy the definition of an exact (ϵ = 0) Nash equilibrium.
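To make the definition concrete, the following minimal NumPy sketch computes the two best-response gaps of a strategy profile under the convention above that both players maximize their own payoff; the function name and layout are ours, not code from the paper. A profile is an ϵ-approximate NE exactly when both gaps are at most ϵ.

```python
import numpy as np

def nash_gaps(A, B, x, y):
    """Best-response gaps of (x, y) in the bimatrix game (A, B), both players maximizing.
    (x, y) is an eps-approximate Nash equilibrium iff max(row_gap, col_gap) <= eps."""
    row_gap = np.max(A @ y) - x @ A @ y    # best pure deviation of the row player
    col_gap = np.max(B.T @ x) - x @ B @ y  # best pure deviation of the column player
    return row_gap, col_gap
```

Maximizing over pure deviations suffices, since the expected payoff is linear in each player's own mixed strategy.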
Remark 2.2. We highlight two special cases of an ϵ-approximate Nash equilibrium. Firstly, when ϵ is equal to zero, it is referred to as an exact Nash equilibrium. Secondly, when the support of the strategies, supp(x) = {i | x_i > 0}, is of size 1, it is called a pure Nash equilibrium.

2.2 RANK-1 GAMES

In this work, we consider bimatrix games (A, B) of fixed rank, and particularly of rank 1. This hierarchy within two-player games was initially introduced by Kannan & Theobald (2010) in an attempt to offer a more detailed characterization of games that (may) admit simple polynomial-time algorithms based on the rank of the matrix (A + B). In a related model, Lipton et al. (2003) investigated games where both payoff matrices are of fixed rank k, and they proposed a simple (quasi) polynomial-time algorithm for constructing an ϵ-approximate NE. This seemingly similar model, however, has a significant drawback; not even zero-sum games, which have polynomial-time algorithms, entirely belong to this class. Motivated by this observation, Kannan & Theobald (2010) proposed that (A + B) should have fixed rank instead.

Definition 2.3 (Games of fixed rank). The rank of a bimatrix game (A, B) is the rank of the matrix A + B.

Remark 2.4. It is apparent that zero-sum games (A, −A) are rank-0 games, as A + (−A) = 0. Additionally, rank-1 games satisfy A + B = ab⊤.

Assumption 2.5. As Nash equilibria remain invariant under shifting and scaling of the payoff matrices, we assume that the elements of a and b are confined within the interval [−1, 1]. This assumption simplifies our calculations without impacting any of our results.

The following lemma presents an alternative method for describing a bimatrix game of fixed rank r.

Lemma 2.6 (Lemma 5, Adsul et al. (2021)). An (n × m) bimatrix game (A, B) of positive rank r can be written as (A, C + ab⊤) for suitable a ∈ R^n, b ∈ R^m, and a game (A, C) of rank r − 1.

The following lemma, though simple, holds significant importance as it establishes that for a bimatrix game of fixed rank r, the set of its Nash equilibria is equivalent to the intersection of the set of equilibria of a parameterized game of rank r − 1 and a hyperplane.

Lemma 2.7 (Lemma 6, Adsul et al. (2021)). Let A, C ∈ R^{n×m}, x ∈ Δ_n, y ∈ Δ_m, a ∈ R^n, b ∈ R^m, λ ∈ R. The following are equivalent:
(a) (x, y) is an equilibrium of (A, C + ab⊤).
(b) (x, y) is an equilibrium of (A, C + 1λb⊤) and x⊤a = λ.
(c) (x, y) is an equilibrium of (A − 1λb⊤, C + 1λb⊤) and x⊤a = λ.

Sketch Proof. The equivalence of (a) and (b) holds because the players get in both games the same expected payoffs for their pure strategies. The games in (b) and (c) are strategically equivalent.

While it may not be immediately evident how this rank reduction can be practically applied, the true significance of Lemma 2.7 becomes fully apparent in the context of rank-1 games.

Corollary 2.8. Let (A, B) be a bimatrix game of rank 1, i.e. A + B = ab⊤. Then, the following are equivalent.
(a) (x, y) is an equilibrium of (A, −A + ab⊤).
(b) (x, y) is an equilibrium of (A − 1λb⊤, −A + 1λb⊤) and x⊤a = λ.

The proof follows trivially from the equivalence of (a) and (c) in Lemma 2.7.
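As an illustration of Corollary 2.8, the sketch below builds the payoff matrices of the rank-1 game and of the parameterized zero-sum game (A − 1λb⊤, −A + 1λb⊤) from A, a, b, and λ; it is a minimal NumPy example under our own naming, not code from the paper.

```python
import numpy as np

def rank1_game(A, a, b):
    """Column player's payoff B = -A + a b^T in the rank-1 game (A, B) with A + B = a b^T."""
    return -A + np.outer(a, b)

def parameterized_zero_sum(A, b, lam):
    """Row player's payoff A - 1*lam*b^T in the parameterized zero-sum game;
    the column player's payoff is its negation (Corollary 2.8)."""
    ones = np.ones(A.shape[0])
    return A - lam * np.outer(ones, b)
```

By Corollary 2.8, a Nash equilibrium (x, y) of the rank-1 game is also an equilibrium of the zero-sum game above precisely when λ is chosen so that x⊤a = λ.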
Parameterized zero-sum games. Corollary 2.8 establishes a connection between rank-1 games and parameterized zero-sum games. While this link initially appears compelling, it remains uncertain whether an algorithm can compute, or more importantly learn, a Nash equilibrium (x, y) that simultaneously satisfies x⊤a = λ. The primary challenge arises from the fact that the optimal strategies vary as the parameter λ changes. It is well known, however, that zero-sum games can be efficiently solved using various methods, including LP-based techniques Adler (2013); von Stengel (2023) and no-regret algorithms Freund & Schapire (1999); Daskalakis et al. (2011); Syrgkanis et al. (2015); Chen & Peng (2020). Therefore, as long as there is an easy way of updating λ, the parameterized zero-sum game serves as an intermediary for solving the rank-1 game.

An initial attempt was made in Adsul et al. (2011), where they demonstrated that the set of fully-labeled points of certain best-response polytopes, which captures all Nash equilibria of the game, contains a single path that exhibits monotonicity with respect to λ. Additionally, they introduced an algorithm that employs binary search on the parameter to compute an exact Nash equilibrium of the rank-1 game. However, a limitation of this method is its inability to handle degenerate games. In their subsequent work Adsul et al. (2021), while retaining their structural insights and building on Adler & Monteiro (1992), they refined their approach. They proposed a polynomial-time algorithm, similar in spirit to the one in Adsul et al. (2011), to compute a Nash equilibrium for any rank-1 game.

Definition 2.9. Let N be the set of the Nash equilibria of the parameterized zero-sum game, defined as follows.

N := {(λ, x, y) ∈ R × R^n × R^m | (x, y) is a NE of (A − 1λb⊤, −A + 1λb⊤)}   (1)

The following key lemma, essential for the algorithm in Adsul et al. (2021), shows that the set N exhibits monotonicity with respect to λ.

Lemma 2.10 (Lemma 17, Adsul et al. (2021)). Let λ ≤ λ̄, x, x̄ ∈ Δ_n and y, ȳ ∈ Δ_m so that, for N in (1), (λ, x, y) ∈ N, (λ̄, x̄, ȳ) ∈ N, λ ≤ x⊤a and x̄⊤a ≤ λ̄. Then x̂⊤a = λ̂ for some (λ̂, x̂, ŷ) ∈ N with λ̂ ∈ [λ, λ̄].

This property serves as the invariant maintained by the algorithm outlined in Adsul et al. (2021). The algorithm essentially employs a binary search with respect to the parameter λ. In brief, at each iteration, it considers the midpoint λ = (λ̲ + λ̄)/2 of the current search interval [λ̲, λ̄] and, based on its value, it solves a series of linear programs. These either return a Nash equilibrium of the zero-sum game (A − 1λb⊤, −A + 1λb⊤) for which x⊤a = λ, or indicate that none of the Nash equilibrium strategies satisfy this equation. Importantly, in the latter case, and due to the monotonic nature with respect to the parameter, the sign of the difference (x⊤a − λ) is maintained. Therefore, depending on it, either λ̲ or λ̄ is updated accordingly. For instance, in case x⊤a − λ < 0 then λ̄ = λ; otherwise, it sets λ̲ = λ.¹ According to Lemma 2.10, this step is safe as there exists a Nash equilibrium (x, y) such that x⊤a = λ and λ ∈ [λ̲, λ̄]. In Lemma 3.5, we extend this property to the case of an approximate Nash equilibrium by leveraging the convexity of the set of Nash equilibria in zero-sum games.

¹We have intentionally omitted certain technical details of this step that do not impact the core aspects of the algorithm.

2.3 BASIC BACKGROUND ON MIN-MAX OPTIMIZATION

After establishing the link between rank-1 games and zero-sum games, as demonstrated in Corollary 2.8, this section presents a brief background on min-max optimization.
A more compact and thorough presentation is deferred to Appendix B.

A Nash equilibrium in zero-sum games represents a solution to a saddle point problem, which in its general form (not only for bilinear functions) is

min_{x ∈ X} max_{y ∈ Y} f(x, y)   (2)

where X, Y are convex and compact subsets of an n-dimensional (resp. m-dimensional) Euclidean space, and f is a continuously differentiable function, i.e. a smooth function.

Remark 2.11. It is widely recognized that two-player zero-sum games constitute a special case where the function f(x, y) = x⊤Ay is convex (linear) with respect to x and concave (linear) with respect to y. Then, the celebrated minimax theorem v. Neumann (1928) asserts that min_{x ∈ X} max_{y ∈ Y} f(x, y) = max_{y ∈ Y} min_{x ∈ X} f(x, y).

However, in practical scenarios, such as those in Generative Adversarial Networks (GANs) Daskalakis et al. (2017), the function f does not meet the necessary convex-concave property, and so most optimization techniques for (2) do not apply. As a result, a significant amount of research has focused instead on different notions of local min-max solutions Daskalakis & Panageas (2018b); Jin et al. (2020). The result in Daskalakis et al. (2021), however, establishes that for general smooth objectives, the computation of even approximate first-order locally optimal min-max solutions is intractable. In light of this intractability result, research has shifted its focus to investigating structural properties that circumvent this barrier, enabling the efficient computation of a solution. This usually involves imposing some (mild or not) assumptions on the objective function. An example of such an assumption is the existence of a solution to (MVI), as presented below.

Nash Gap. The goal is to find a point (x*, y*) ∈ X × Y that is close to the set of Nash equilibria. The notion of closeness is understood in terms of the best-response gap, not the distance to the set of NE², and is defined with respect to the (Nash) gap as

Gap(x*, y*) := max_{y ∈ Y} f(x*, y) − min_{x ∈ X} f(x, y*),   (Gap)

which is always non-negative and zero if and only if (x*, y*) is a NE.

Variational inequalities. We introduce the set Z = X × Y and the operator F associated with the function f, defined as F(z) = F((x, y)) = (∇_x f(x, y), −∇_y f(x, y)). It is readily verified that a saddle point problem (2) can be framed as a variational inequality problem; this unifying treatment enables the application of various techniques and results drawn from the vast literature on variational inequalities, Facchinei & Pang (2003). Given the compact convex set Z and an operator F, the Stampacchia Variational Inequality problem consists in finding z* ∈ Z such that:

⟨F(z*), z − z*⟩ ≥ 0 for all z ∈ Z   (SVI)

In this case, z* is referred to as a strong solution to the variational inequality problem corresponding to F. From a game-theoretic standpoint, z* corresponds to a Nash equilibrium of the underlying game. As a Nash equilibrium is always guaranteed to exist Nash Jr (1950), we can safely assume that there exists a solution to (SVI). Most of the existing literature on variational inequality problems Facchinei & Pang (2003) has focused on the monotone case, i.e. f is convex-concave. In this context, it is widely acknowledged that every solution to (SVI) is also a solution to the Minty Variational Inequality problem (MVI).
⟨F(z), z − z*⟩ ≥ 0 for all z ∈ Z   (MVI)

²This notion of closeness is commonly referred to in the literature as weak, while closeness in terms of the distance to a Nash equilibrium is known as the strong notion Etessami & Yannakakis (2010).

In settings beyond the monotone case, all that can be said is that the set of solutions to (MVI) is a subset of the set of solutions to (SVI). An important (Minty) criterion that ensures tractability is the existence of a solution to (MVI). In Mertikopoulos et al. (2018), they introduced the concept of coherence (refer to Definition 2.1 therein), which holds true if every saddle point of f in (2), and in turn every solution to (SVI), is a solution to (MVI) and vice versa. In Mertikopoulos et al. (2018); Song et al. (2020); Zhou et al. (2020) (and references therein) it has been established that if the Minty criterion is satisfied (e.g. bilinear problems, quasi-convex-concave objectives) then the extra gradient method Korpelevich (1976) does converge to a solution of the saddle point problem. In recent works, such as Diakonikolas et al. (2021); Pethick et al. (2022), a less stringent criterion, namely the weak Minty criterion, has been explored, and convergence to a solution of (2) was also ensured. Nevertheless, as we will demonstrate shortly in Section 3.1, even relatively simple rank-1 games fail to satisfy the (weak) Minty criterion.

3 EFFICIENTLY LEARNING NASH EQUILIBRIA IN RANK-1 GAMES

In this section, we sketch our main findings concerning the learning of Nash equilibria in rank-1 games; all the proofs are deferred to Appendix A. We start by presenting two illustrative examples that highlight the inherent challenges of these games. Despite their apparent simplicity, conventional first-order methods fail to converge to a Nash equilibrium. In Section 3.2, we introduce an efficient algorithm for learning a NE that uses and extends the structural properties detailed in Section 2.2.

3.1 TECHNICAL CHALLENGES

The rank-reduction method presented in Corollary 2.8 transforms a rank-1 game (A, B) into a parameterized zero-sum game (A − 1λb⊤, −A + 1λb⊤). A question that arises is whether the set of approximate Nash equilibria of the zero-sum game is properly included in the set of approximate Nash equilibria of the rank-1 game. Although in the case of exact NE we can readily verify that this inclusion does not hold, the question remains unclear for approximate NE. In other words, can solving the zero-sum game alone provide a solution for the rank-1 game? The following example provides a negative answer to this question, even when we have complete knowledge of a correct value λ*, i.e., one corresponding to a Nash equilibrium (x*, y*) of the rank-1 game with (x*)⊤a = λ*.

Example 3.1 (Full Version in Example A.1). Let

A − 1λb⊤ = [ 1  1 ; 1−ϵ  1−ϵ ].

This game has an exact NE of the form (x*, y*) = ((1, 0), (µ, 1 − µ)) for any µ ∈ [0, 1]. Additionally, (x̃, ỹ) = ((1/2, 1/2), (1/2, 1/2)) is an ϵ/2-approximate Nash equilibrium of the zero-sum game (A − 1λb⊤, −A + 1λb⊤), since (x*)⊤(A − 1λb⊤)y* = 1 and x̃⊤(A − 1λb⊤)ỹ = 1 − ϵ/2. We choose a = [1, 2]⊤ and b = [2, 1]⊤, so that (x*)⊤a = 1 = λ. The rank-1 game is then

A = [ 3  2 ; 3−ϵ  2−ϵ ]  and  B = −A + ab⊤ = [ −1  −1 ; 1+ϵ  ϵ ].

We confirm that (x̃, ỹ) = ((1/2, 1/2), (1/2, 1/2)) no longer qualifies as an ϵ/2-approximate Nash equilibrium for the original rank-1 game, as the Nash equilibrium condition (inequality) for the column player does not hold. As described in Section 2.3, there are additional structural properties that enable convergence to a solution even if the game is no longer monotone.
As such, our next goal is to investigate whether the Minty criterion (MVI) is satisfied by rank-1 games. Unfortunately, but importantly for the scope of this work, it fails to hold.

Algorithm 1: Rank-1 Game Solver
  Input: x^(0) = 1/n ∈ Δ_n, y^(0) = 1/m ∈ Δ_m
  Output: (x, y) NE of the rank-1 game (A, B)
  1: for k = 1, . . . , K do
  2:   (x^(k), y^(k)) = OMWU(x^(k−1), y^(k−1), λ^(k−1))
  3:   if (x^(k), y^(k)) is a NE of (A, B) then
  4:     return (x^(k), y^(k))
  5:   λ^(k) = λ^(k−1) + (1/2^k) · sgn(x^(k)⊤a − λ^(k−1))   /* update λ according to the sign */

Algorithm 2: OMWU(x, y, λ)
  Input: (x^(0), y^(0)) ∈ Δ_n × Δ_m, λ ∈ [0, 1]
  Output: (x, y) such that Gap(x, y) ≤ ϵ
  1: g_X^(0) = 0, g_Y^(0) = 0
  2: for t = 1, 2, . . . , T do
  3:   g_X^(t) = ∇_x f_λ(x^(t), y^(t))
  4:   g_Y^(t) = ∇_y f_λ(x^(t), y^(t))
  5:   x_i^(t+1) = x_i^(t) exp(η_X(2g_{X,i}^(t) − g_{X,i}^(t−1))) / Σ_{i'=1}^n x_{i'}^(t) exp(η_X(2g_{X,i'}^(t) − g_{X,i'}^(t−1)))
  6:   y_j^(t+1) = y_j^(t) exp(−η_Y(2g_{Y,j}^(t) − g_{Y,j}^(t−1))) / Σ_{j'=1}^m y_{j'}^(t) exp(−η_Y(2g_{Y,j'}^(t) − g_{Y,j'}^(t−1)))
  7: return (x^(T), y^(T))

Example 3.2 (Full Version in Example A.2). Let z* = (x*, y*) be a NE of the bimatrix rank-1 game (A, B) = (A, −A + ab⊤). We show that there exists a game such that the Minty variational inequality (MVI) is not satisfied:

(MVI)   ⟨F(z), z − z*⟩ ≥ 0 for all z = (x, y),  where F(z) = (−Ay, −B⊤x).

We examine candidate solutions z* = (x*, y*) that correspond to the Nash equilibria of the selected game:

⟨F(z), z − z*⟩ = ⟨(−Ay, −B⊤x), (x − x*, y − y*)⟩ = −⟨x − x*, Ay⟩ − ⟨y − y*, B⊤x⟩.

Let A = [ 1  0 ; 0  1 ] and B = [ 1  −2 ; −1  0 ]. This game has the two pure equilibria (x1, y1) = ((1, 0), (1, 0)) and (x2, y2) = ((0, 1), (0, 1)), and the mixed equilibrium (x3, y3) = ((1/4, 3/4), (1/2, 1/2)). However, none of these equilibria satisfy the Minty criterion.

3.2 MAIN RESULTS

In this section, we introduce our method and sketch the main ingredients required for the proof of Theorem 3.7. To facilitate the presentation, we break it down into three main steps. The first step consists of showing that only specific approximate Nash equilibria of the parameterized zero-sum game correspond to approximate Nash equilibria of the underlying rank-1 game. Then, we formulate a suitable min-max optimization problem whose stationary points are extendable, under conditions, to NE. Finally, we present an efficient algorithm for solving the optimization problem.

The following lemma relies on the correspondence established in Corollary 2.8. Although this appears to be applicable solely to exact Nash equilibria, we demonstrate that by relaxing both conditions, it also extends to approximate Nash equilibria.

Lemma 3.3. Let (x, y) be an ϵ-approximate Nash equilibrium of the zero-sum game with payoff matrix (A − 1λb⊤) and |x⊤a − λ| ≤ ϵ. Then (x, y) is a (3ϵ)-approximate Nash equilibrium of the rank-1 game (A, −A + ab⊤).

A study of Lemma 3.3 leads us to the following observation: to compute an approximate Nash equilibrium of the rank-1 game, we should seek a NE of the parameterized zero-sum game which simultaneously minimizes the distance (x⊤a − λ). By virtue of this observation, we formulate the following min-max optimization problem; its full proof is deferred to Appendix A.

Lemma 3.4. Consider the function f_λ(x, y) := x⊤(A − 1λb⊤)y − (1/2)(x⊤a − λ)². f_λ is concave with respect to x and convex (linear) with respect to y. Moreover, assume that the corresponding zero-sum game has an ϵ-approximate Nash equilibrium (x*, y*) with |(x*)⊤a − λ| ≤ ϵ. Then, any ϵ-approximate stationary point (x̃, ỹ) of f_λ must satisfy the following.
1. |x̃⊤a − λ| ≤ √(10ϵ).
2. (x̃, ỹ) must be a (6√ϵ)-NE of the zero-sum game (A − 1λb⊤, −A + 1λb⊤).
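To make Algorithm 2 and the objective f_λ of Lemma 3.4 concrete, the following minimal NumPy sketch spells out the gradients of f_λ and one optimistic multiplicative-weights step; the function names, the step-size handling, and the exact sign conventions are our reading of the pseudocode above, not verbatim code from the paper.

```python
import numpy as np

def grad_f_lambda(A, a, b, lam, x, y):
    """Gradients of f_lambda(x, y) = x^T (A - 1*lam*b^T) y - 0.5 * (x^T a - lam)^2."""
    M = A - lam * np.outer(np.ones(A.shape[0]), b)
    gx = M @ y - (x @ a - lam) * a   # gradient wrt x (the maximizer)
    gy = M.T @ x                     # gradient wrt y (the minimizer)
    return gx, gy

def omwu_step(x, y, gx, gy, gx_prev, gy_prev, eta_x, eta_y):
    """One optimistic MWU update (lines 5-6 of Algorithm 2), using 2*g_t - g_{t-1}."""
    x = x * np.exp(eta_x * (2 * gx - gx_prev))    # ascent step for the maximizer
    y = y * np.exp(-eta_y * (2 * gy - gy_prev))   # descent step for the minimizer
    return x / x.sum(), y / y.sum()
```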
Sketch Proof. Given that (x̃, ỹ) is an ϵ-approximate stationary point of f_λ, it satisfies the following variational inequalities (VI):

VI (1): ⟨∇_x f_λ(x̃, ỹ), x − x̃⟩ = ⟨(A − 1λb⊤)ỹ − (x̃⊤a − λ)a, x − x̃⟩ ≤ ϵ for all x ∈ Δ_n
VI (2): ⟨∇_y f_λ(x̃, ỹ), y − ỹ⟩ = ⟨(A − 1λb⊤)⊤x̃, y − ỹ⟩ ≥ −ϵ for all y ∈ Δ_m

First, we use the fact that f_λ(·, y) is a concave function for any y, which implies that it satisfies the inequality f_λ(x*, ỹ) ≤ f_λ(x̃, ỹ) + ⟨∇_x f_λ(x̃, ỹ), x* − x̃⟩. Therefore, setting x = x* in VI (1) and using the fact that |(x*)⊤a − λ| ≤ ϵ, we obtain

(x*)⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² + 2ϵ.

Furthermore, f_λ(x, ·) is linear for any x, which implies that it satisfies the equality f_λ(x̃, y) = f_λ(x̃, ỹ) + ⟨∇_y f_λ(x̃, ỹ), y − ỹ⟩ for any y ∈ Δ_m. Therefore, setting y = y* in VI (2), we obtain

x̃⊤(A − 1λb⊤)y* ≥ x̃⊤(A − 1λb⊤)ỹ − ϵ.

The last two inequalities follow from the approximate NE definition of (x*, y*), with x representing the maximizer and y the minimizer:

x̃⊤(A − 1λb⊤)y* ≤ (x*)⊤(A − 1λb⊤)y* + ϵ
(x*)⊤(A − 1λb⊤)y* ≤ (x*)⊤(A − 1λb⊤)ỹ + ϵ

By combining the above four inequalities, we obtain (1/2)(x̃⊤a − λ)² ≤ 5ϵ, which proves the first claim. The second claim, that (x̃, ỹ) is a Nash equilibrium (NE) of the parameterized zero-sum game, follows from VI (1) and VI (2) together with the inequalities we showed.

So far, we have verified that any stationary point (x̃, ỹ) of f_λ constitutes a Nash equilibrium of the rank-1 game under the condition that a NE does exist for that particular λ. In other words, Lemma 3.4 holds true only in the case where the parameter attains a correct value λ*, i.e., one corresponding to a Nash equilibrium (x*, y*) of the rank-1 game with (x*)⊤a = λ*. This raises the question of how we can determine the correct value of λ. Fortunately, the following simple but important lemma answers this question.

Lemma 3.5 (Approximate Nash equilibria maintain sign). Let (x̃, ỹ) and (x*, y*) be ϵ-approximate Nash equilibria of the parameterized zero-sum game (A − 1λb⊤, −A + 1λb⊤), so that x̃⊤a > λ and (x*)⊤a < λ. Then, there exists (x, y) that is an ϵ-approximate NE with x⊤a = λ.

The lemma above answers our question in the following way: whenever a stationary point (x, y) of f_λ(·, ·) fails to be an ϵ-approximate Nash equilibrium of the zero-sum game with |x⊤a − λ| less than or equal to ϵ, this necessarily means that there does not exist any NE satisfying this equation for that particular λ; otherwise, any stationary point of f_λ would suffice. More importantly, it implies that x⊤a − λ must maintain its sign over all ϵ-approximate Nash equilibria of the zero-sum game for that particular λ. In other words, if there were a pair of Nash equilibria for which (x⊤a − λ) has opposite signs, then Lemma 3.5 would guarantee the existence of a third solution where this difference equals zero.

In a similar spirit to Lemma 2.10, Lemma 3.5 serves as an invariant in our approach. Algorithm 1 performs binary search on the parameter λ. At each iteration, it employs Algorithm 2 to obtain an ϵ-approximate stationary point of f_λ. This solution either constitutes a Nash equilibrium of the rank-1 game (A, B) or indicates that no Nash equilibrium exists for that particular λ and the specified precision ϵ. In the latter case, the sign of the difference (x⊤a − λ) determines the next value of λ. For instance, if x⊤a − λ < −ϵ, then λ is updated as λ − 1/2^k; otherwise, if x⊤a − λ > +ϵ, then λ is set to λ + 1/2^k.
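A compact sketch of the outer loop of Algorithm 1 under the update rule just described; the starting point λ = 1/2, the exact halving schedule, and the reuse of the hypothetical helpers grad_f_lambda, omwu_step, and nash_gaps from the earlier sketches are our own illustrative choices.

```python
import numpy as np

def rank1_solver(A, a, b, eta, T, K, eps):
    """Binary search over lambda (Algorithm 1), running OMWU (Algorithm 2) at each step."""
    n, m = A.shape
    B = -A + np.outer(a, b)                 # rank-1 game (A, B)
    lam = 0.5                               # assume lambda is searched over [0, 1]
    for k in range(1, K + 1):
        x, y = np.ones(n) / n, np.ones(m) / m
        gx_prev, gy_prev = np.zeros(n), np.zeros(m)
        for _ in range(T):                  # inner OMWU loop on f_lambda
            gx, gy = grad_f_lambda(A, a, b, lam, x, y)
            x, y = omwu_step(x, y, gx, gy, gx_prev, gy_prev, eta, eta)
            gx_prev, gy_prev = gx, gy
        if max(nash_gaps(A, B, x, y)) <= eps:
            return x, y, lam                # approximate NE of the rank-1 game found
        lam += np.sign(x @ a - lam) / 2 ** (k + 1)   # move lambda toward x^T a, halving the step
    return x, y, lam
```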
In the final part, we introduce an efficient method for obtaining an ϵ-approximate stationary point of f_λ for a specific λ. Leveraging the structure of f_λ, we utilize an Optimistic Multiplicative Weights Update (OMWU), as detailed in Daskalakis & Panageas (2018a). This method offers two key advantages: it does not require a projection step, which can be computationally expensive, and it achieves faster convergence rates by incorporating optimism. A more comprehensive presentation of optimistic methods is deferred to Appendix B. Lemma 3.6 provides us with the specific number of iterations required.

Lemma 3.6. Let (x^(0), y^(0)) be initialized as the uniform vectors (1/n, 1/m). Suppose that both players employ OMWU for T = Ω((ln n + ln m)/ϵ) iterations with learning rates η_X = η_Y = 1/(16√2 max{‖A‖₂, ‖a‖₂}) and fixed λ. Then, Algorithm 2 returns a pair of strategies (x^(T), y^(T)) that guarantees the following:

Gap(x^(T), y^(T)) = max_{x ∈ Δ_n} f_λ(x, y^(T)) − min_{y ∈ Δ_m} f_λ(x^(T), y) ≤ ϵ.

Also, it holds that (x^(T), y^(T)) is an ϵ-approximate stationary point of f_λ.

We are now prepared to present the main technical theorem that integrates all the components established thus far into a concise statement.

Theorem 3.7 (Main Result). Let (A, B) be a rank-1 game. Suppose that both players employ OMWU for T = Ω((log n + log m)/ϵ) iterations. Then, Algorithm 1 returns a pair of strategies (x*, y*) that constitutes an 18ϵ-approximate Nash equilibrium of the rank-1 game.

4 CONCLUSION

In this work, we examined the structural properties, as well as the challenges, of rank-1 games, an important class that generalizes the well-understood zero-sum games. We introduced a decentralized learning procedure that provably converges to a Nash equilibrium of the game. Nevertheless, a number of important questions remain open. Proving a matching lower bound would certify the optimality of our solution. However, given that our approach hinges on the idea of reducing a rank-1 game to a zero-sum game and then optimizing a carefully crafted min-max problem, we presume that, unless an entirely different methodology or accelerated methods are considered, our solution remains optimal. Furthermore, a fundamental question is whether our approach, or one similar to it, can be extended to rank-2 games.

5 ACKNOWLEDGEMENTS

We are grateful to anonymous reviewers at ICLR for valuable feedback. Ioannis Panageas would like to acknowledge a startup grant from UCI and a UCI ICS Research Award. Part of this work was conducted while Nikolas and Ioannis were visiting the Archimedes Research Unit. This work has been partially supported by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union under the Next Generation EU Program.

REFERENCES

Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In 21st Annual Conference on Learning Theory, COLT 2008, 2008.

Jacob D. Abernethy, Peter L. Bartlett, and Elad Hazan. Blackwell approachability and no-regret learning are equivalent. In COLT 2011 - The 24th Annual Conference on Learning Theory, volume 19 of JMLR Proceedings, pp. 27–46. JMLR.org, 2011.

Ilan Adler. The equivalence of linear programs and zero-sum games. International Journal of Game Theory, 42:165–177, 2013.

Ilan Adler and Renato DC Monteiro. A geometric view of parametric linear programming. Algorithmica, 8:161–176, 1992.

Bharat Adsul, Jugal Garg, Ruta Mehta, and Milind Sohoni. Rank-1 bimatrix games: a homeomorphism and a polynomial time algorithm. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pp. 195–204, 2011.
Bharat Adsul, Jugal Garg, Ruta Mehta, Milind Sohoni, and Bernhard Von Stengel. Fast algorithms for rank-1 bimatrix games. Operations Research, 69(2):613 631, 2021. Ioannis Anagnostides, Gabriele Farina, Ioannis Panageas, and Tuomas Sandholm. Optimistic mirror descent either converges to nash or to strong coarse correlated equilibria in bimatrix games. Advances in Neural Information Processing Systems, 35:16439 16454, 2022a. Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, and Tuomas Sandholm. On last-iterate convergence beyond zero-sum games. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp. 536 581. PMLR, 2022b. URL https://proceedings.mlr.press/v162/anagnostides22a.html. David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6 (1):1 8, 1956. Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. Advances in Neural Information Processing Systems, 33:18990 18999, 2020. Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Conference on Learning Theory, pp. 6 1. JMLR Workshop and Conference Proceedings, 2012. Constantinos Daskalakis and Ioannis Panageas. Last-iterate convergence: Zero-sum games and constrained min-max optimization. ar Xiv preprint ar Xiv:1807.04252, 2018a. Published as a conference paper at ICLR 2024 Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization. Advances in neural information processing systems, 31, 2018b. Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a nash equilibrium. Communications of the ACM, 52(2):89 97, 2009. Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zerosum games. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pp. 235 254. SIAM, 2011. Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training gans with optimism. ar Xiv preprint ar Xiv:1711.00141, 2017. Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1466 1478, 2021. Jelena Diakonikolas, Constantinos Daskalakis, and Michael I. Jordan. Efficient methods for structured nonconvex-nonconcave min-max optimization. In Arindam Banerjee and Kenji Fukumizu (eds.), The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pp. 2746 2754. PMLR, 2021. URL http://proceedings.mlr.press/v130/diakonikolas21a.html. Kousha Etessami and Mihalis Yannakakis. On the complexity of nash equilibria and other fixed points. SIAM Journal on Computing, 39(6):2531 2597, 2010. Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequalities and complementarity problems. Springer, 2003. Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79 103, 1999. Elad Hazan and Satyen Kale. 
Extracting certainty from uncertainty: Regret bounded by variation in costs. Machine learning, 80:165 188, 2010. Chi Jin, Praneeth Netrapalli, and Michael I. Jordan. What is local optimality in nonconvex-nonconcave minimax optimization? In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp. 4880 4889. PMLR, 2020. URL http://proceedings.mlr.press/v119/jin20e.html. Ravi Kannan and Thorsten Theobald. Games of fixed rank: A hierarchy of bimatrix games. Economic Theory, 42:157 173, 2010. Galina M Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12: 747 756, 1976. Stefanos Leonardos, Will Overman, Ioannis Panageas, and Georgios Piliouras. Global convergence of multi-agent policy gradient in markov potential games. ar Xiv preprint ar Xiv:2106.01969, 2021. Richard J Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM Conference on Electronic Commerce, pp. 36 41, 2003. Ruta Mehta. Constant rank two-player games are ppad-hard. SIAM Journal on Computing, 47(5):1858 1887, 2018. doi: 10.1137/15M1032338. Published as a conference paper at ICLR 2024 Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. ar Xiv preprint ar Xiv:1807.02629, 2018. John F Nash Jr. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36 (1):48 49, 1950. Arkadij Semenoviˇc Nemirovskij and David Borisovich Yudin. Problem complexity and method efficiency in optimization. In Wiley-Interscience, 1983. Francesco Orabona. A modern introduction to online learning. ar Xiv preprint ar Xiv:1912.13213, 2019. Francesco Orabona. Saddle-Point Optimization With Optimism, November 2022. URL https:// parameterfree.com/2022/11/07/saddle-point-optimization-with-optimism. [Online; accessed 15. Mar. 2024]. Thomas Pethick, Puya Latafat, Panos Patrinos, Olivier Fercoq, and Volkan Cevher. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=2_ vhk AMARk. Sasha Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. Advances in Neural Information Processing Systems, 26, 2013. Shai Shalev-Shwartz. Thesis submitted for the degree of Doctor of Philosophy . Ph D thesis, Thesis submitted for the degree of doctor of philosophy, 2007. Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171 176, 1958. Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, and Yi Ma. Optimistic dual extrapolation for coherent non-monotone variational inequalities. Advances in Neural Information Processing Systems, 33: 14303 14314, 2020. Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. Fast convergence of regularized learning in games. Advances in Neural Information Processing Systems, 28, 2015. J v. Neumann. Zur theorie der gesellschaftsspiele. Mathematische annalen, 100(1):295 320, 1928. Bernhard von Stengel. Zero-sum games and linear programming duality. Mathematics of Operations Research, 2023. Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen P Boyd, and Peter W Glynn. 
On the convergence of mirror descent beyond stochastic convex programming. SIAM Journal on Optimization, 30(1):687–716, 2020.

A.1 TWO EXAMPLES OF SECTION 3.1

Example A.1. In this example, we demonstrate that an approximate Nash equilibrium of the zero-sum game (A − 1λb⊤, −A + 1λb⊤) does not constitute an approximate Nash equilibrium of the rank-1 game, even when we know a correct value of λ. Let

A − 1λb⊤ = [ 1  1 ; 1−ϵ  1−ϵ ].

This game has an exact NE of the form (x*, y*) = ((1, 0), (µ, 1 − µ)) for any µ ∈ [0, 1]. Additionally, (x̃, ỹ) = ((1/2, 1/2), (1/2, 1/2)) is an ϵ/2-approximate Nash equilibrium of the zero-sum game (A − 1λb⊤, −A + 1λb⊤), since

(x*)⊤(A − 1λb⊤)y* = 1  and  x̃⊤(A − 1λb⊤)ỹ = [1/2  1/2] [ 1  1 ; 1−ϵ  1−ϵ ] [1/2  1/2]⊤ = 1 − ϵ/2.

We now choose a = [1, 2]⊤ and b = [2, 1]⊤. Thus, setting λ = 1 := (x*)⊤a, the rank-1 game is defined as follows:

A = [ 1  1 ; 1−ϵ  1−ϵ ] + 1λb⊤ = [ 3  2 ; 3−ϵ  2−ϵ ]  and  B = −A + ab⊤ = [ −1  −1 ; 1+ϵ  ϵ ].

We confirm that (x̃, ỹ) = ((1/2, 1/2), (1/2, 1/2)) no longer qualifies as an ϵ/2-approximate Nash equilibrium for the original rank-1 game, as the condition for the column player does not hold:

x̃⊤Bỹ < x̃⊤B(1, 0)⊤ − ϵ/2.

Specifically, this example highlights that, irrespective of our knowledge of λ, an approximate NE of the zero-sum game that does not (approximately) satisfy x⊤a = λ cannot be a Nash equilibrium of the original rank-1 game.

Example A.2. Let z* = (x*, y*) be a NE of the bimatrix rank-1 game (A, B) = (A, −A + ab⊤). We introduce the operator

F(z) = F((x, y)) = (−Ay, −B⊤x),

i.e. the gradient operator with respect to the underlying rank-1 game. We note that, as both players aim to maximize their payoffs, the considered gradients come with a negative sign. As described in Section 2.3, a recent line of literature investigates structural properties of games that enable a learning procedure to converge to a Nash equilibrium beyond the case of a monotone operator F. An example of a compelling criterion that guarantees convergence is the existence of a solution to the Minty variational inequality (MVI) problem Mertikopoulos et al. (2018) or the weak MVI problem Diakonikolas et al. (2021). In this example, we show that there exists a game such that neither of those criteria holds.

(MVI)   ⟨F(z), z − z*⟩ ≥ 0 for all z = (x, y)
(weak MVI)   ⟨F(z), z − z*⟩ ≥ −(ρ/2) ‖F(z)‖₂² for all z = (x, y), where ρ ∈ [0, 1/L)

The parameter L is the Lipschitz constant of the operator F with respect to the ℓ2 norm: ‖F(z) − F(z')‖₂ ≤ L‖z − z'‖₂. Now, as the set of solutions to the (weak) MVI is a subset of the set of solutions to (SVI), we only have to examine points z* = (x*, y*) that correspond to Nash equilibria of the following game:

A = [ 1  0 ; 0  1 ],   B = [ 1  −2 ; −1  0 ],

⟨F(z), z − z*⟩ = ⟨(−Ay, −B⊤x), (x − x*, y − y*)⟩ = −⟨x − x*, Ay⟩ − ⟨y − y*, B⊤x⟩.

To calculate the Lipschitz constant, we can simply find the spectral norm of the block matrix C = [ 0  A ; B⊤  0 ], that is, L = ‖C‖₂ ≈ 2.288. This game has the two pure equilibria (x1, y1) = ((1, 0), (1, 0)) and (x2, y2) = ((0, 1), (0, 1)), and the mixed equilibrium (x3, y3) = ((1/4, 3/4), (1/2, 1/2)). It is important to note that when setting z = (x, y) to be another NE, the inequality must necessarily be less than or equal to zero (≤ 0), as the constituent terms (without the minus sign) are greater than or equal to zero (≥ 0) by the definition of NE.

- (x*, y*) = (x1, y1) and (x, y) = (x2, y2): ⟨x2 − x1, Ay2⟩ = [−1  1][0  1]⊤ = 1.
- (x*, y*) = (x2, y2) and (x, y) = (x1, y1): ⟨x1 − x2, Ay1⟩ = [1  −1][1  0]⊤ = 1.
- (x*, y*) = (x3, y3) and (x, y) = (x1, y1): ⟨x1 − x3, Ay1⟩ = 3/4.

Repeating the same procedure for the second term ⟨y − y*, B⊤x⟩, we get that

⟨F(z2), z2 − z1⟩ = −2,   ⟨F(z1), z1 − z2⟩ = −4,   ⟨F(z1), z1 − z3⟩ = −9/4.

Furthermore, it is relatively straightforward to verify that ‖F(z1)‖₂² = 6, ‖F(z2)‖₂² = 2, ‖F(z3)‖₂² = 1. We readily conclude that neither the Minty nor the weak Minty criterion is satisfied.
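The failure of the Minty criterion in Example A.2 can also be checked numerically. The snippet below is a sketch that uses our reconstruction of the matrices (with the dropped minus signs restored) and simply evaluates ⟨F(z), z − z*⟩ over the three equilibria.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, -2.0], [-1.0, 0.0]])
equilibria = [
    (np.array([1.0, 0.0]), np.array([1.0, 0.0])),    # (x1, y1)
    (np.array([0.0, 1.0]), np.array([0.0, 1.0])),    # (x2, y2)
    (np.array([0.25, 0.75]), np.array([0.5, 0.5])),  # (x3, y3)
]

def F(x, y):
    # Gradient operator of the game in which both players maximize their payoff.
    return np.concatenate([-A @ y, -B.T @ x])

for xs, ys in equilibria:                    # candidate MVI solution z*
    zs = np.concatenate([xs, ys])
    for x, y in equilibria:                  # test point z
        z = np.concatenate([x, y])
        val = F(x, y) @ (z - zs)
        if val < -1e-9:                      # MVI requires this inner product to be >= 0
            print("MVI violated:", round(val, 3))
```

Running it reports the negative values −2, −4, and −9/4 stated above, so no equilibrium is a Minty solution.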
A.2 PROOF OF LEMMA 3.3

Lemma A.3 (Restatement of Lemma 3.3). Let (x, y) be an ϵ-approximate Nash equilibrium of the zero-sum game with payoff matrix (A − 1λb⊤) and |x⊤a − λ| ≤ ϵ. Then (x, y) is a (3ϵ)-approximate Nash equilibrium of the rank-1 game (A, −A + ab⊤).

Proof. From the definition of the Nash equilibrium for the row player, it holds that

x'⊤(A − 1λb⊤)y ≤ x⊤(A − 1λb⊤)y + ϵ for every x' ∈ Δ_n.

Since any x' ∈ Δ_n satisfies x'⊤1 = 1, it follows that x'⊤1(λb⊤) = (x'⊤1)(λb⊤) = λb⊤, so the term λb⊤y is the same for every mixed strategy and cancels:

x'⊤Ay ≤ x⊤Ay + ϵ for every x' ∈ Δ_n.   (3)

Thus, Equation (3) is exactly the Nash equilibrium condition of the row player for the rank-1 game (A, −A + ab⊤); hence they play an approximate best response. Similarly, the Nash equilibrium condition for the column player yields the following, for every y' ∈ Δ_m:

x⊤(A − 1λb⊤)y ≤ x⊤(A − 1λb⊤)y' + ϵ
x⊤Ay ≤ x⊤Ay' + λb⊤(y − y') + ϵ
x⊤Ay ≤ x⊤Ay' + (x⊤a + ϵ) b⊤(y − y') + ϵ
x⊤Ay ≤ x⊤Ay' + x⊤ab⊤(y − y') + ϵ b⊤(y − y') + ϵ
x⊤(A − ab⊤)y ≤ x⊤(A − ab⊤)y' + 3ϵ.   (4)

The last inequality follows from an application of the Cauchy-Schwarz³ inequality and the triangle inequality: ⟨b, y − y'⟩ ≤ ‖b‖∞ ‖y − y'‖₁ ≤ max_{j ∈ [m]} |b_j| (‖y‖₁ + ‖y'‖₁) = 2. Equations (3) and (4) prove that (x, y) satisfies the NE conditions of the game (A, −A + ab⊤). Thus, (x, y) is a (3ϵ)-approximate NE.

³In its general form, the Cauchy-Schwarz inequality states |⟨a, b⟩| ≤ ‖a‖ ‖b‖*, where ‖·‖ is any norm and ‖·‖* its dual.

A.3 PROOF OF LEMMA 3.4

Lemma A.4 (Restatement of Lemma 3.4). Consider the function f_λ(x, y) := x⊤(A − 1λb⊤)y − (1/2)(x⊤a − λ)². f_λ is concave with respect to x and convex (linear) with respect to y. Moreover, assume that the corresponding zero-sum game has an ϵ-approximate Nash equilibrium (x*, y*) with |(x*)⊤a − λ| ≤ ϵ. Then, any ϵ-approximate stationary point (x̃, ỹ) of f_λ must satisfy the following.
1. |x̃⊤a − λ| ≤ √(10ϵ).
2. (x̃, ỹ) must be a (6√ϵ)-NE of the zero-sum game (A − 1λb⊤, −A + 1λb⊤).

Proof. Given that (x̃, ỹ) is an ϵ-approximate stationary point of f_λ, it satisfies the following variational inequalities (VI):

VI (1): ⟨∇_x f_λ(x̃, ỹ), x − x̃⟩ = ⟨(A − 1λb⊤)ỹ − (x̃⊤a − λ)a, x − x̃⟩ ≤ ϵ for all x ∈ Δ_n
VI (2): ⟨∇_y f_λ(x̃, ỹ), y − ỹ⟩ = ⟨(A − 1λb⊤)⊤x̃, y − ỹ⟩ ≥ −ϵ for all y ∈ Δ_m

First, we leverage the fact that f_λ(·, y) is a concave function for any y, which implies that it satisfies the inequality f_λ(x*, ỹ) ≤ f_λ(x̃, ỹ) + ⟨∇_x f_λ(x̃, ỹ), x* − x̃⟩. Therefore, setting x = x* in VI (1) and using the fact that |(x*)⊤a − λ| ≤ ϵ, we obtain the following:

(x*)⊤(A − 1λb⊤)ỹ − (1/2)((x*)⊤a − λ)² ≤ x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² + ϵ
(x*)⊤(A − 1λb⊤)ỹ − ϵ²/2 ≤ x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² + ϵ
(x*)⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² + 2ϵ   (5)

Furthermore, f_λ(x, ·) is linear for any x, which implies that it satisfies the equality f_λ(x̃, y) = f_λ(x̃, ỹ) + ⟨∇_y f_λ(x̃, ỹ), y − ỹ⟩ for any y ∈ Δ_m. Therefore, setting y = y* in VI (2), we obtain the following:

x̃⊤(A − 1λb⊤)y* − (1/2)(x̃⊤a − λ)² ≥ x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² − ϵ
x̃⊤(A − 1λb⊤)y* ≥ x̃⊤(A − 1λb⊤)ỹ − ϵ   (6)

The last two inequalities follow from the approximate NE definition of (x*, y*), with x representing the maximizer and y representing the minimizer:

x̃⊤(A − 1λb⊤)y* ≤ (x*)⊤(A − 1λb⊤)y* + ϵ   (7)
(x*)⊤(A − 1λb⊤)y* ≤ (x*)⊤(A − 1λb⊤)ỹ + ϵ   (8)

We now combine the above four inequalities appropriately to prove the claim:

x̃⊤(A − 1λb⊤)ỹ ≤(6) x̃⊤(A − 1λb⊤)y* + ϵ ≤(7) (x*)⊤(A − 1λb⊤)y* + 2ϵ ≤(8) (x*)⊤(A − 1λb⊤)ỹ + 3ϵ ≤(5) x̃⊤(A − 1λb⊤)ỹ − (1/2)(x̃⊤a − λ)² + 5ϵ,

hence (1/2)(x̃⊤a − λ)² ≤ 5ϵ, i.e., |x̃⊤a − λ| ≤ √(10ϵ), which proves the first claim. The fact that (x̃, ỹ) constitutes a Nash equilibrium (NE) of the parameterized zero-sum game arises from VI (1) and VI (2) in conjunction with the inequality we have just established.
VI (2): ⟨(A − 1λb⊤)⊤x̃, y − ỹ⟩ ≥ −ϵ  ⇒  x̃⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)y + ϵ for all y ∈ Δ_m
VI (1): ⟨(A − 1λb⊤)ỹ − (x̃⊤a − λ)a, x − x̃⟩ ≤ ϵ
  ⇒  x⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)ỹ + (x̃⊤a − λ)⟨a, x − x̃⟩ + ϵ
  ⇒  x⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)ỹ + 2|x̃⊤a − λ| + ϵ   (9)
  ⇒  x⊤(A − 1λb⊤)ỹ ≤ x̃⊤(A − 1λb⊤)ỹ + 6√ϵ for all x ∈ Δ_n

Equation (9) follows again from the Cauchy-Schwarz inequality, along with the fact that |x̃⊤a − λ| ≤ √(10ϵ). Therefore, we conclude that (x̃, ỹ) is a (6√ϵ)-approximate NE of the zero-sum game (A − 1λb⊤, −A + 1λb⊤).

A.4 PROOF OF LEMMA 3.5

Lemma A.5 (Restatement of Lemma 3.5). Let (x̃, ỹ) and (x*, y*) be ϵ-approximate Nash equilibria of the parameterized zero-sum game (A − 1λb⊤, −A + 1λb⊤), so that x̃⊤a > λ and (x*)⊤a < λ. Then, there exists (x, y) that is an ϵ-approximate NE with x⊤a = λ.

Proof. It is a well-established fact that the set of (approximate) Nash equilibria (NE) of a zero-sum game forms a convex set. In other words, any convex combination µ(x̃, ỹ) + (1 − µ)(x*, y*), where µ ∈ [0, 1], also constitutes an ϵ-NE. Consequently, since x̃⊤a > λ > (x*)⊤a, we can choose µ so that the resulting combination x satisfies x⊤a = λ. As explained earlier, this convex combination qualifies as an ϵ-NE.

A.5 PROOF OF MAIN RESULT

Theorem A.6 (Restatement of Theorem 3.7). Let (A, B) be a rank-1 game. Suppose that both players employ OMWU for T = Ω((log n + log m)/ϵ) iterations. Then, Algorithm 1 returns a pair of strategies (x*, y*) that constitutes an 18ϵ-approximate Nash equilibrium of the rank-1 game.

Proof. For a specific λ, we consider two cases. First, assuming that λ is correct, meaning that there exists an O(ϵ)-approximate Nash equilibrium of the rank-1 game for it, if we run Algorithm 2 for T = O((log n + log m)/ϵ²) iterations we obtain a pair (x̃, ỹ) that is an ϵ²-approximate stationary point of f_λ. By combining Lemma 3.4 and Lemma 3.3, this pair constitutes an 18ϵ-approximate Nash equilibrium of the rank-1 game. Conversely, if the value of λ is incorrect, it must be updated. Considering the sign of the difference (x⊤a − λ), the parameter is updated according to the rule in Algorithm 1. Since Algorithm 1 needs at most K = log(1/ϵ) binary search steps, we conclude the aforementioned complexity.

B MIN-MAX OPTIMIZATION

B.1 BASIC BACKGROUND

In this section, we provide a concise overview of how online learning, particularly regret minimization, can be employed to solve a saddle point problem. Most of the results described here can be referenced in Orabona (2019). We examine the general saddle point problem, as defined in Section 2.3.

Definition B.1 (Saddle point problem). Let X ⊆ R^n, Y ⊆ R^m and f : X × Y → R. A point (x*, y*) ∈ X × Y is a saddle point of f in X × Y if

f(x*, y) ≤ f(x*, y*) ≤ f(x, y*),   ∀x ∈ X, y ∈ Y.

We first describe the scenario where f is a convex-concave function, meaning that f(·, y) is convex for any y ∈ Y, and f(x, ·) is concave for any x ∈ X. In this case, the minimax theorem v. Neumann (1928); Sion (1958) states the following:

Theorem B.2 (Minimax theorem v. Neumann (1928); Sion (1958)). Let X and Y be compact, convex subsets of R^n and R^m respectively. Let f : X × Y → R be a continuous, convex-concave function. Then, we have that

min_{x ∈ X} max_{y ∈ Y} f(x, y) = max_{y ∈ Y} min_{x ∈ X} f(x, y).

Online Learning. The saddle point problem can be reformulated as a regret minimization problem as follows. We focus on Player 1, whose goal is to minimize the function f(x) = max_{y ∈ Y} f(x, y), and Player 2, whose objective is to maximize the function f(y) = min_{x ∈ X} f(x, y).
Then the standard suboptimality gap, that describes the improvent of Player 1 over time, is f(x(t)) min x X f(x) = max y Y f(x(t), y) min x X max y Y f(x, y) (Gap X) Similarly, for Player 2, the corresponding measure would be: max y X f(x, y) f(y(t)) = max y Y min x X f(x, y) min x X f(x, y(t)) (Gap Y ) By adding these two measures and combining them with Sion s minimax theorem, we obtain the suboptimality (Nash) gap. max y Y f(x(t), y) min x X max y Y f(x, y)+max y Y min x X f(x, y) min x X f(x, y(t)) = max y Y f(x(t), y) min x X f(x, y(t)) Regret We describe the process from the perspective of Player 1. In an online learning framework, the learner chooses a strategy x(t) X at every round t [T]. Then, the learner receives a linear loss function ℓ(t) X (x). Based on the feedback received thus far, they update their strategy for the next round. The standard objective is to minimize the cumulative external regret, defined as follows. Regret(T ) X (x) := τ=1 ℓ(τ) X (x) τ=1 ℓ(τ) X (x(τ)), where t [T] is the time horizon. In other words, performance is evaluated based on the optimal fixed strategy in hindsight. To provide further precision, it s worth noting that regret is typically defined with respect to the strategy x that maximizes Regret(T ) X (x). Nevertheless, for the sake of clarity in our explanation, we will retain the previously stated definition. Optimistic Regret Minimization Indeed, it is widely acknowledged that a wide range of online learning algorithms, including Follow-The-Regularized-Leader (FTRL) Abernethy et al. (2008); Hazan & Kale (2010) and Mirror Descent Nemirovskij & Yudin (1983), can guarantee O( T) regret under any sequence of bounded utilities. However, when dealing with more predictable utilities, one can achieve even stronger convergence guarantees. In recent years, several studies have explored this direction Chiang et al. (2012); Rakhlin & Sridharan (2013); Syrgkanis et al. (2015). In our work, we focus on the optimistic mirror descent algorithm, as detailed in Rakhlin & Sridharan (2013). Optimistic Mirror Descent Suppose now that each player employs optimistic mirror descent. The feedback that Player 1 receives in each round is set to ℓ(t)(x) = g(t) X , x where g(t) X = xf(x(t), y(t)), while Player 2 receives ℓ(t)(y) = g(t) Y , y where g(t) Y = yf(x(t), y(t)). As usual, we consider the "one-recency bias" prediction Syrgkanis et al. (2015), wherein g(t) = g(t 1). Assumption B.3. Assume that f : X Y R is smooth in an open interval containing its domain, in the sense that for any x, x X and any y, y Y , we have the following. xf(x, y) xf(x , y) X, LXX x x X xf(x, y) xf(x, y ) X, LXY y y Y yf(x, y) yf(x , y) Y, LXY x x X Published as a conference paper at ICLR 2024 Algorithm 3: Solving Saddle Point Problem with Optimistic OMD Orabona (2022) 1 Require: λX > 0, λY > 0, x1 X, y1 Y 2 g(0) X = 0, g(0) Y = 0 3 for t = 1, , T do 4 g(t) X = xf(x(t), y(t)) 5 g(t) Y = yf(x(t), y(t)) 6 x(t+1) arg minx X 2g(t) X g(t 1) X , x + λXBψX(x; x(t)) 7 y(t+1) arg miny Y 2g(t) Y g(t 1) Y , y + λY BψY (y; y(t)) 8 return x(T ) = 1 t=1 x(t), y(T ) = 1 yf(x, y) yf(x, y ) Y, LY Y y y Y where x and y denote the gradients with respect to x and y respectively. We denote by ( X, )X, and ( Y , )Y, the pair of dual norms in X and Y . The following theorem, originally proven in Rakhlin & Sridharan (2013), has been cited from Orabona (2022). Theorem B.4. Orabona (2022) Let f : X Y R a convex-concave function satisfying the Assumption B.3. 
For a fixed α > 0, let λX 2 2(LXX + LXY /α) and λY 2 2(LY Y + LXY /α). Let ψX : X R be 1-strongly convex w.r.t. X and ψY : Y R be 1-strongly convex w.r.t. Y . Assume arg maxy Y f(x(T ), y) and arg minx X f(x, y(T )) are non-empty. Then, we have max y Y f(x(T ), y) min x X f(x, y(T )) BψX(x(T ); x(1)) + BψY (y(T ); y(1)) + g(1) X 2 X, λX + g(1) Y 2 Y, λY T where x(T ) = 1 t=1 x(t), y(T ) = 1 Lemma B.5 (Lemma 16 Shalev-Shwartz (2007)). ψ(x) = n P i=1 xi ln xi is 1-strongly convex with respect to the L1 norm over the simplex n := {x Rn : xi 0, x 1 = 1}. B.2 PROOF OF LEMMA 3.6 In the proof of the lemma, we consider the function fλ(x, y) = x (A 1λb )y 1 2(x a λ)2, where x is the maximizer and y is the minimizer in our context, and we define X := n and Y := m. Due to this, we select the negative entropy as the regularizer for both X and Y , that is, ψX(x) = Pn i=1 xi ln xi. According to Lemma B.5, this choice ensures that the regularizer is 1-strongly convex concerning the L1 norm, satisfying some of the conditions outlined in Theorem B.4. Next, we examine whether Assumption B.3 are satisfied. Instead of deriving each constant individually, we utilize Proposition B.6. The Hessian matrix of fλ is a block matrix composed of four blocks, Hfλ = aa A A 0 . Therefore, we can readily bound each constant LXX, LXY , LY Y by 4 max{ A 2, a 2}. Proposition B.6 (Claim C.2 Leonardos et al. (2021)). Consider a symmetric block matrix C with n n matrices so that Cij 2 L. Then, it holds that C 2 n L. In other words, if all block matrices have spectral norm at most L, then C has spectral norm at most n L. Published as a conference paper at ICLR 2024 Regarding the initial Bregman divergences BψX(x(T ); x(1)) and BψY (y(T ); y(1)), it s worth noting that when x(1) and y(1) are initialized to the uniform distribution, they are both upper-bounded by ln n and ln m, respectively: BψX(u; x) = i=1 ui ln ui + ln n ln n and BψY (u; y) = j=1 uj ln uj + ln m ln m Lastly, we will bound the dual norm of the gradients vector. For the sake of completeness, we will present the upper bound for g(1) X 2 X, . g(1) X X, = g(1) X (A 1λb )y + |x a λ| a max i [n]{(A 1λb )iy} + 3 a (Amax + 1) + 3 g(1) Y Y, = g(1) Y (A 1λb ) x (Amax + 1) Lemma B.7 (Proof of Lemma 3.6). Let (x(0), y(0)) be initialized as uniform vectors (1/n, 1/m). Suppose that both players employ OMWU for T = Ω (ln n+ln m) ϵ with learning rates ηX = ηY = 2 max{ A 2, a 2} and fixed λ. Then, Algorithm 2 returns a pair of strategies x(T ), y(T ) that guarantees the following. x(T ), y(T ) = max x n fλ x, y(T ) min y m fλ x(T ), y ϵ. Also, it holds that x(T ), y(T ) is an ϵ-approximate stationary point of fλ. Proof. As mentioned previously, all the assumptions and required conditions of Theorem B.4 have been satisfied. By setting α = 1, λX = λY = 16 2 max{ A 2, a 2}, bounding g(1) X 2 X, , g(1) Y 2 Y, by (Amax + 4) and T O( ln n+ln m ϵ ), we obtain the following results from Theorem B.4. max x n fλ(x, y(T )) min y m fλ(x, y(T )) ϵ The fact that (x(T ), y(T )) is an ϵ-stationary point can be deduced from the definition of the Nash Gap, the non-negativity of both terms, and the definition of a convex-concave function for fλ. C EXPERIMENTS In this section, we provide experimental evaluation supporting our theoretical finding. 
Here we restrict to a single example, while we also provide an anonymous repository containing rank-1 games of multiple sizes. [Entries of the randomly generated 6 × 6 matrix A and of the vectors a, b for this instance omitted.]

Figure 1: In Figure 1a, we observe how the strategies of the players evolve over time when they employ vanilla OGA, alongside the Nash gap as defined in (Gap). In Figure 1b, we plot the respective trajectories on the simplex, signifying that OGA enters a limit cycle.

We constructed a random 6 × 6 game defined by the matrix A and the vectors a, b. The payoff matrix of the second player is then derived as usual: B = −A + ab⊤. Our solution delineated in Section 3.2 is compared against a (vanilla) optimistic variant of mirror descent. More specifically, our tests were performed with (optimistic) gradient ascent (OGA) instead of multiplicative weights to avoid any numerical imprecision that might arise due to exponentiation. In Figure 1, we (experimentally) verify that OGA enters a limit cycle, whereas our solution (Figure 2) reaches a Nash equilibrium. While this game may not have a natural interpretation, given its random construction, it is worth noting that constructing random games where OGA provably enters a limit cycle is rather challenging. This difficulty is further explained by the recent work Anagnostides et al. (2022a).

Figure 2: In Figure 2a, we observe how the strategies of the players evolve over time when they employ Algorithm 1, alongside the Nash gap as defined in (Gap). In Figure 2b, we plot the respective trajectories on the simplex as well.
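For completeness, a random rank-1 instance like the one used in this section can be generated as follows; the seed, size, and value ranges are illustrative assumptions, not the exact instance behind Figures 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = m = 6
A = np.round(rng.uniform(0.0, 1.0, size=(n, m)), 1)   # random 6 x 6 payoff matrix
a = rng.uniform(-1.0, 1.0, size=n)                     # a, b kept in [-1, 1] (Assumption 2.5)
b = rng.uniform(-1.0, 1.0, size=m)
B = -A + np.outer(a, b)                                # rank-1 game: A + B = a b^T
assert np.linalg.matrix_rank(A + B) <= 1
```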