# exploitability_minimization_in_games_and_beyond__4eca286d.pdf

Exploitability Minimization in Games and Beyond

Denizalp Goktas Department of Computer Science

Brown University Providence, RI 02906, USA denizalp_goktas@brown.edu

Amy Greenwald

Brown University Providence, RI 02906, USA amy_greenwald@brown.edu

Pseudo-games are a natural and well-known generalization of normal-form games, in which the actions taken by each player a ect not only the other players payo s, as in games, but also the other players strategy sets. The solution concept par excellence for pseudo-games is the generalized Nash equilibrium (GNE), i.e., a strategy pro le at which each player s strategy is feasible and no player can improve their payo s by unilaterally deviating to another strategy in the strategy set determined by the other players strategies. The computation of GNE in pseudo-games has long been a problem of interest, due to applications in a wide variety of elds, from environmental protection to logistics to telecommunications. Although computing GNE is PPAD-hard in general, it is still of interest to try to compute them in restricted classes of pseudo-games. One approach is to search for a strategy pro le that minimizes exploitability, i.e., the sum of the regrets across all players. As exploitability is nondi erentiable in general, developing e cient rst-order methods that minimize it might not seem possible at rst glance. We observe, however, that the exploitability-minimization problem can be recast as a min-max optimization problem, and thereby obtain polynomial-time rstorder methods to compute a re nement of GNE, namely the variational equilibria (VE), in convex-concave cumulative regret pseudo-games with jointly convex constraints. More generally, we also show that our methods nd the stationary points of the exploitability in polynomial time in Lipschitz-smooth pseudo-games with jointly convex constraints. Finally, we demonstrate in experiments that our methods not only outperform known algorithms, but that even in pseudo-games where they are not guaranteed to converge to a GNE, they may do so nonetheless, with proper initialization.

1 Introduction

Nash equilibrium [1] is the canonical solution concept used to predict the outcome of models of rational agents known as games. A game comprises a set of players, each of whom simultaneously chooses a strategy (or action) from a strategy set, and then receives a payo based on all the players choices. A Nash equilibrium (NE) is a strategy pro le, i.e., a collection of strategies, one per player, from which no player can improve their payo via a unilateral deviation.

Recently, the literature on NE has expanded beyond classic economic applications to the question of its computation, a problem known as the Nash equilibrium problem (NEP) [2, 3]. Unfortunately, it is unlikely that there exists an e cient algorithm to compute NE, as NEP is PPAD-complete, i.e., NEP is equivalent to the problem of computing solutions to a class of xed-point problems known as PPAD [4], which are believed to be computationally intractable [5, 6, 7]. Nonetheless, there has been a great deal of recent progress developing e cient algorithms to compute NE in special classes of games, such as two-player zero-sum [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,

36th Conference on Neural Information Processing Systems (Neur IPS 2022).

26, 27, 28, 29, 30, 31], potential [32, 33, 34], and monotone (or, more generally, variationally stable) games [35, 36, 37, 38, 39, 40, 41]. Interest in two-player zero-sum games, for example, stems in part from emerging applications in machine learning, e.g., the training of GANs [42, 43].

A natural generalization of games are pseudo-games or abstract economies, rst introduced by Arrow and Debreu [44] in the context of markets, in which the actions taken by each player a ect not only the other players payo s, as in games, but also the other players strategy sets. In other words, players (simultaneously) take actions that can restrict the other players strategy sets. Such a model is not technically a game, giving rise to the pseudo-game terminology. Pseudo-games generalize games, and hence are even more widely applicable. Some recently studied applications include adversarial classi cation [45, 46], energy resource allocation [47, 48], environmental protection [49, 50], cloud computing [51, 52], adversarial classi cation, ride sharing services [53], transportation [54], wireless and network communication [55, 56].

The solution concept par excellence for pseudo-games is the generalized Nash equilibrium (GNE) [44, 57, 58], i.e., a strategy pro le at which each player s strategy is feasible and no player can improve their payo s by unilaterally deviating to another strategy in the strategy set determined by the other players strategies. The aforementioned applications speak to the importance of developing algorithms that can approximate GNEs e ciently in the largest class of pseudo-games possible. Work in this direction, on the generalized Nash equilibrium problem (GNEP), is progressing; see, for example, [59, 57, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77]. Nonetheless, there are still few, if any [78], GNEnding algorithms with computational guarantees, even for restricted classes of pseudo-games.

Exploitability, or the Nikoida-Isoda potential [79], is de ned as the sum of the players payo - maximizing unilateral deviations w.r.t. a given strategy pro le. Minimizing exploitability is a logical approach to computing GNE, especially in cases where exploitability is convex, such as in the pseudo-games that result from replacing each player s payo function in a monotone pseudo-game by a second-order Taylor approximation [80]. When the players payo s are also Lipschitz-smooth and strongly concave (in their own action), this approximation can be quite good, because the error in the approximation will be bounded by the Lipschitz-smoothness and strong concavity constant. Minimizing exploitability can thus be an e ective way of nding GNEs in pseudo-games; however, to our knowledge, no convergence rates for this approach are known, and few methods have been proposed for pseudo-games whose exploitability is non-convex.

In this paper, we develop e cient exploitability-minimization algorithms that converge to variational equilibrium (VE), a re nement of GNE, exactly in a large class of pseudo-games with jointly convex constraints (which includes, but is not limited to, two-player zero-sum, n-player pairwise zero-sum, and a large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games); and approximately, in Lipschitz-smooth pseudo-games with jointly convex constraints, where exact computation is intractable. Our algorithms apply not only to games with discrete action spaces but also to games with continuous action spaces; previous exploitability-minimization algorithms concern only nite games [81, 82].

The cumulative regret of two strategy pro les is de ned as the sum, across all players, of the change in utility due to unilateral deviations from one strategy pro le to the other. Our algorithms and their analysis rely on the following key insights: assuming jointly convex constraints (i.e., symmetric and convex-valued joint constraint correspondences) the exploitability-minimization problem is equivalent to solving a cumulative regret min-max optimization problem, which although non-convex-concave in general can nonetheless be solved by gradient descent ascent (GDA) methods1 [25, 85] using a parametric variant of Moreau s envelope theorem that we introduce (Theorem 4.1). We apply this logic to develop two algorithms for Lipschitz-smooth pseudo-games with jointly convex constraints. To the best of our knowledge, this makes ours the rst exploitabilityminimization methods with convergence rate guarantees in this class of pseudo-games.

The rst (EDA; Algorithm 1) is an extragradient method that converges in average iterates to an "-VE, and hence an "-GNE, in O(1/") iterations in pseudo-games with convex-concave cumulative regret, where " is the exploitability of the game (Theorem 3.4). This class of pseudo-games includes zero-sum, potential, Cournot, a large class of monotone pseudo-games, and a class of bilinear

1When the objective function is convex-concave, a saddle point is guaranteed to exist, in which case this optimization problem can be solved using simultaneous-GDA-style algorithms (e.g., [21, 83, 84]).

pseudo-games, namely those with convex second-order approximations. Our second algorithm (ADA; Algorithm 2) is a nested GDA algorithm, whose best iterate converges more generally to a stationary point of (regularized) exploitability in all Lipschitz-smooth pseudo-games with jointly convex constraints at the slightly slower rate of O(log(1/")/") (Theorem 4.2). When the (regularized) exploitability is additionally invex, our method also converges to a GNE in best iterates at the same rate. This convergence can be strengthened to convergence in last iterates in O(log(1/")2) iterations in strongly-convex exploitability pseudo-games and in a certain class of pseudo-games with non-convex exploitability (Theorem 4.3).

Pseudo-games with jointly convex constraints are of interest in decentralized resource allocation problems with feasibility (e.g., supply) constraints. Feasibility constraints are almost always joint, and often also jointly convex, since we generally require the sum of the allocations to be less than or equal to the supply. One of the many examples of such pseudo-games is wireless and network communication, where agents communicate packets across a shared network, with limited capacities [55]. The experiments we run in this paper suggest that our algorithms have the potential to yield improved solutions to such pseudo-games as compared to other known approaches, and thus open up avenues for practical applications of our work in the future.

Related Work Following Arrow and Debreu s introduction of GNE, Rosen [86] initiated the study of the mathematical and computational properties of GNE in pseudo-games with jointly convex constraints, proposing a projected gradient method to compute GNE. Thirty years later, Uryas ev and Rubinstein [87] developed the rst relaxation methods for nding GNEs, which were improved upon in subsequent works [88, 89]. Two other types of algorithms were also introduced to the literature: Newton-style methods [59, 64, 68, 69, 70, 90] and interior-point potential methods [90]. Many of these approaches are based on minimizing the exploitability of the pseudo-game, but others use variational inequality[91, 92] and Lemke methods [93].

More recently, novel methods that transform GNEP to NEP were analyzed. These models take the form of either exact penalization methods, which lift the constraints into the objective function via a penalty term [72, 73, 76, 77, 94], or augmented Lagrangian methods [71, 74, 76, 95], which do the same, augmented by dual Lagrangian variables. Using these methods, Jordan, Lin, and Zampetakis [78] provide the rst convergence rates to a "-GNE in monotone (resp. strongly monotone) pseudogames with jointly a ne constraints in O(1/") ( O(1/p")) iterations. These algorithms, despite being highly e cient in theory, are numerically unstable in practice [78]. Nearly all of the aforementioned approaches concerned pseudo-games with jointly convex constraints.

Exploitability minimization has also been a valuable tool in multi-agent reinforcement learning; algorithms in this literature that aim to minimize exploitability are known as exploitability descent algorithms [81]. Lockhart et al. [81] analyzed exploitability descent in two-player, zero-sum, extensive-form games with nite action spaces. Variants of exploitability-descent have also been combined with entropic regularization and homotopy methods to solve for NE in large games [82].

2 Preliminaries

Notation We use caligraphic uppercase letters to denote sets (e.g., X); bold lowercase letters to denote vectors (e.g., p, ); bold uppercase letters to denote matrices (e.g., X) and lowercase letters to denote scalar quantities (e.g., x, δ). We denote the ith row vector of a matrix (e.g., X) by the corresponding bold lowercase letter with subscript i (e.g., xi). Similarly, we denote the jth entry of a vector (e.g., p or xi) by the corresponding Roman lowercase letter with subscript j (e.g., pj or xij). We denote functions by a letter determined by the value of the function, e.g., f if the mapping is scalar-valued, f if the mapping is vector-valued, and F if the mapping is set-valued. We denote the set of integers {1, . . . , n} by [n], the set of natural numbers by N, the set of real numbers by R, and the positive and strictly positive elements of a set by a + and ++ subscript, respectively, e.g., R+ and R++. We de ne the gradient rx : C1(X) ! C0(X) as the operator which takes as input a functional f : X ! R, and outputs a vector-valued function consisting of the partial derivatives of f w.r.t. x. We denote the orthogonal projection onto a set C by C, i.e., C(x) = arg miny2C kx yk2

2. Finally, we write

X(a) to denote the indicator function of the set X, i.e.,

X(x) = 0 if x 2 X and

X(x) = 1 if x /2 X.

A pseudo-game [44] G .= (n, A, X, h, u) comprises n 2 N+ players, each i 2 [n] of whom chooses an ai from an action space Ai. We denote the players joint action space by A = i2[n] Ai. Each player i aims to maximize their continuous utility ui : A ! R, which is concave in ai, by choosing a feasible action from a set of actions Xi(a i) Ai determined by the actions a i 2 A i Rm of the other players, where Xi : A i Ai is a non-empty, continuous, compactand convex-valued action correspondence, which for convenience we represent as Xi(a i) = {ai 2 Ai | hik(ai, a i) 0, for all k 2 [d]}, where for all i 2 [n], and k 2 [d], hik is a continuous and concave function in ai, which de nes the constraints. We denote the product of the feasible action correspondences of all players by X(a) = i2[n] Xi(a i), which we note is guaranteed to be non-empty, continuous, and compact-valued, but not necessarily convexvalued. Additionally, overloading notation, we de ne the set of jointly feasible strategy pro les X = {a 2 A | hik(a) 0, for all i 2 [n], k 2 [d]}.

A two player pseudo-game is called zero-sum if for all a 2 A, u1(a) = u2(a). A pseudo-game is called monotone if for all a, b 2 A P

raiui(a) raiui(b)

#T (a b) 0. A pseudogame is said to have joint constraints if there exists a function g : A ! Rd s.t. g = h1 = . . . = hd. If additionally, for all k 2 [d], gk(a) is concave in a, then we say that the pseudo-game has jointly convex constraints, as this assumption implies that X is a convex set. A game [1] is a pseudogame where, for all players i 2 [n], Xi is a constant correspondence, i.e., for all players i 2 [n], Xi(a i) = Xi(b i), for all a, b 2 A. Moreover, while X(a) is convex, for all action pro les a 2 A, X is not guaranteed to be convex unless one assumes joint convexity.

Given a pseudo-game G, an "-generalized Nash equilibrium (GNE) is strategy pro le a 2 X(a ) s.t. for all i 2 [n] and ai 2 Xi(a

i), ui(a ) ui(ai, a

i) ". An "-variational equilibrium (VE) (or normalized GNE) of a pseudo-game is a strategy pro le a 2 X(a ) s.t. for all i 2 [n] and a 2 X, ui(a ) ui(ai, a

i) ". We note that in the above de nitions, one could just as well write a 2 X(a ) as a 2 X, as any xed point of the joint action correspondence is also a jointly feasible action pro le, and vice versa. A GNE (VE) is an "-GNE (VE) with " = 0. Under our assumptions, while GNE are guaranteed to exist in all pseudo-games by Arrow and Debreu s lemma on abstract economies [44], VE are only guaranteed to exist in pseudo-games with jointly convex constraints [65]. Note that the set of "-VE of a pseudo-game is a subset of the set of the "-GNE, as X(a ) X, for all a which are GNE of G. The converse, however, is not true, unless A X. Further, when G is a game, GNE and VE coincide; we refer to this set simply as NE.

Mathematical Preliminaries Finally, we de ne several mathematical concepts that are relevant to our convergence proofs. Given A Rn, the function f : A ! R is said to be f-Lipschitzcontinuous i 8x1, x2 2 X, kf(x1) f(x2)k f kx1 x2k. If the gradient of f is rf Lipschitz-continuous, we then refer to f as rf-Lipschitz-smooth. A function f : X ! R is said to be invex w.r.t. a function h : X X ! X if f(x) f(y) h(x, y) rf(y), for all x, y 2 X. A function f : X ! R is convex if it is invex w.r.t. h(x, y) = x y and concave if f is convex. A function f : X ! R is µ-strongly convex (SC) if f(x1) f(x2) + hrxf(x2), x1 x2i +

µ/2 kx1 x1k2, and µ-strongly concave if f is µ-strongly convex.

3 From Exploitability Minimization To Min-Max Optimization

In this section, we reformulate the exploitability minimization problem as a min-max optimization problem. This reformulation is a consequence of the following observation, which is key to our analysis: In pseudo-games for which a VE exists, e.g., pseudo-games with jointly convex constraints, the quasi-optimization problem of minimizing exploitability, whose solutions characterize the set of GNEs, reduces to the reduces to a standard min-max optimization problem (with independent constraint sets), which characterizes the set of VEs, a subset of the GNEs. This new perspective allows us to e ciently solve GNEP by computing VEs in a large class of pseudo-games as an immediate consequence of known results on the computation of saddle points in convex-concave min-max optimization problems.

Given a pseudo-game G, we de ne the regret felt by any player i 2 [n] for an action ai as compared to another action bi, given the action pro le a i of other players, as follows: Regreti(ai, bi; a i) = ui(bi, a i) ui(ai, a i). Additionally, the cumulative regret, or the Nikaida-Isoda function, : A A ! R between two action pro les a 2 X and b 2 X

across all players in a pseudo-game is given by (a, b) = P

i2[n] Regreti(ai, bi; a i). Further, the exploitability, or Nikaido-Isoda potential function, ' : A ! R of an action pro le a is de ned as '(a) = P

i2[n] maxbi2Xi(a i) Regreti(ai, bi; a i) [57]. Notice that the max is taken over Xi(a i), since a player can only deviate within the set of available strategies.

A well-known [57]2 result is that any unexploitable strategy pro le in a pseudo-game is a GNE. Lemma 3.1. Given a pseudo-game G, for all a 2 A, '(a) 0. Additionally, a strategy pro le a 2 X(a ) is a GNE i it achieves the lowerbound, i.e., '(a ) = 0.

This lemma tells us that we can reformulate GNEP as the quasi-optimization problem of minimizing exploitability, i.e., mina2X(a) '(a). Here, quasi refers to the fact that a solution to this problem is both a minimizer of ' and a xed point a s.t. a 2 X(a ).3 Despite this reformulation of GNEP in terms of exploitability, no exploitability-minimization algorithms with convergence rate guarantees are known. The unexploitability (!) of exploitability may be due to the fact that it is not Lipschitz-smooth. The key insight that allows us to obtain convergence guarantees is that we treat the GNE problem not as a quasi-minimization problem, but rather as a quasi-min-max optimization problem, namely mina2X(a) '(a) = mina2X(a) maxb2X(a) (a, b).

Observation 3.2. Given a pseudo-game G, for all a 2 A, '(a) = maxb2X(a) (a, b).

Proof. The per-player maximum operators can be pulled out of the sum, as the ith player s best action in hindsight is independent of the other players best actions in hindsight, since the action pro le a is xed:

max bi2Xi(a i) Regreti(ai, bi; a i) = max

Regreti(ai, bi; a i) = max

b2X(a) (a, b)

That is, the optimization problem in the ith summand is independent of the optimization problem in the other summands, e.g., for any y 2 [0, 1/2]2, maxx12[0,1]:x1 y1 0 {x1y1} + maxx22[0,1]:x2 y2 0 {x2y2} = maxx2[0,1]2:x y 0 x1y1 + x2y2.

If we restrict our attention to VEs, a subset of GNEs, VE exploitability hereafter exploitability for short is conveniently expressed as '(a) = maxb2X (a, b). When a VE exists, e.g., in pseudo-games with jointly convex constraints, this exploitability is guaranteed to achieve the lower bound of 0 for some a 2 X. In such cases, formulation of exploitability minimization as a quasi-min-max optimization problem reduces to a standard min-max optimization problem, namely mina2X maxb2X (a, b), which characterizes VEs.

This problem is well understood when is a convex-concave objective function [21, 83, 84, 96]. Furthermore, the cumulative regret is indeed convex-concave, i.e., convex in a and concave in b, in many pseudo-games of interest: e.g., two-player zero-sum, n-player pairwise zero-sum, and a large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games.

Going forward, we restrict our attention to pseudo-games satisfying the following assumptions. Lipschitz-smoothness is a standard assumption in the convex optimization literature [97], while joint convexity is a standard assumption in the GNE literature [57]. Furthermore, these are weaker assumptions than ones made to obtain the few known results on convergence rates to GNEs [78].

Assumption 3.3. The pseudo-game G has joint constraints g : A ! Rd, and additionally, 1. (Lipschitz smoothness and concavity) for any player i 2 [n], their utility function ui is ru-Lipchitz smooth; 2. (Joint Convexity) g is component-wise concave.

Using the simple observation that every VE of a pseudo-game is the solution to a min-max optimization problem we introduce our rst algorithm (EDA; Algorithm 1), an extragradient method [83]. The algorithm works by interleaving extragradient ascent and descent steps: at iteration t, given a(t), it ascends on (a(t), ), thereby generating a better response b(t+1), and then descends on ( , b(t+1)), thereby decreasing exploitability. We combine several known results about the

2yet hard to exploit(!) 3This problem could also be formulated as minimizing over the set of jointly feasible actions, i.e., mina2X '(a); however, without assuming joint convexity, X is not necessarily convex.

Algorithm 1 Extragradient descent ascent (EDA)

Inputs: n, u, X, , T, a(0), b(0)

Outputs: (a(t), b(t))T

t=0 1: for t = 0, . . . , T 1 do 2: a(t+1/2) = X

a(t) ra (a(t), b(t))

3: b(t+1/2) = X

b(t) + rb (a(t), b(t))

4: a(t+1) = X

a(t) ra (a(t+1/2), b(t+1/2))

5: b(t+1) = X

b(t) + rb (a(t+1/2), b(t+1/2))

6: return (a(t), b(t))T

convergence of extragradient descent methods in min-max optimization problems to obtain the following convergence guarantees for EDA in pseudo-games.4

Theorem 3.4 (Convergence rate of EDA). Consider a pseudo-game G with convex-concave cumulative regret that satis es Assumption 3.3. Suppose that EDA (Algorithm 1) is run with < 1/ r , and that doing so generates the sequence of iterates (a(t), b(t))T

t=0. Let a(T) =

t=1 a(t) and b(T) = 1/T PT

t=1 b(t). Then the following convergence rate to a VE, i.e., to zero exploitability holds: maxb2X (a(T), b) maxa2X (a, b(T)) 1/T (d r ), where

d = max(a,b)2X X

))(a, b) (a(0), b(0))

2. If, additionally, is µ-strongly-convex-µ-stronglyconcave, and the learning rate = 1/4 , then the following convergence bound also holds: maxb2X (a(T), b) maxa2X (a, b(T)) d 2

/µ (1 µ/4 )T.

Remark 3.5. If the players utility functions have Lipschitz-continuous third-order derivatives, then convergence in last iterates can be obtained in O(1/"2) iterations [98].

Since the complexity of two-player zero-sum convex-concave games, a special case of our model, is (1/"), the iteration complexity we derive is optimal. The exploitability minimization via EDA is thus an optimal approach to GNEP in convex-concave cumulative regret pseudo-games. Additionally, since pseudo-games generalize games, our results show that the complexity of this class of pseudogames is the same as that of the corresponding games.

4 Pseudo-Games with Jointly Convex Constraints

More generally, the exploitability minimization problem is equivalent to a non-convex-concave min-max optimization problem, i.e., cumulative regret is non-convex-concave. In such cases, a minimax theorem does not hold, which precludes the existence of a saddle point, making the problem much harder. Solving non-convex-concave min-max optimization problems is NP-hard in general [99], so we instead set as our goal nding a ( rst-order) stationary point a 2 X of the exploitability ', i.e., a point at which rst-order deviations cannot decrease the exploitability, which we show can be found in polynomial-time. Note that any stationary point a of the exploitability is a '(a )-GNE of the associated pseudo-game.

In recent years, a variety of methods have been developed that could be applied to nding stationary points of the exploitability [19, 25, 26, 27, 31]. Using the most e cient of these methods, one could obtain convergence to an "-stationary point of the exploitability in O(1/"6) iterations [19]. However, as we will see, this rate would be very slow, and the convergence metric, not very strong, e.g., convergence in expected iterates. Instead, we demonstrate how to leverage the structure of the exploitability-minimization problem to obtain much faster convergence rates.

Unfortunately, the exploitability associated with the min-max characterization of pseudo-games with joint constraints is non-di erentiable in general. This fact poses an obstacle when trying to design e cient algorithms that nd its stationary points. However, by exploiting the structure of cumulative regret, we can regularize it to obtain a smooth objective. Observe the following: if

4All omitted proofs can be found in Appendix B.

a 2 arg mina2X maxb2X (a, b), then a 2 arg maxb2X (a , b). In other words, b = a

is a solution to the inner maximization problem. As a result, we can penalize exploitability in proportion to the distance between a and b, while still ensuring that this penalized exploitability is minimized at a VE. We thus optimize the -regularized cumulative regret : A A ! R, de ned as (a, b) .= (a, b)

2, whose associated -regularized exploitability ' : A ! R is given by ' (a) = maxb2X (a, b). Von Heusinger and Kanzow show that a strategy pro le a has no -regularized-exploitability, i.e., ' (a ) = 0, i a is a VE for all > 0 (Theorem 3.3 [65]).

To better understand the smoothness properties of the regularized exploitability, we present the parametric Moreau envelope, a Moreau envelope [100] in which the objective function is a function of the point w.r.t. which we regularize the objective function, which in our case is cumulative regret. Our next theorem is a generalization of the Moreau envelope theorem [100], which shows that -regularized exploitability is Lipschitz-smooth, by allowing us to see ' as a parametric Moreau envelope for , an observation which to our knowledge has not been made in previous work. This Lipschitz-smoothness of ' gives rise to the possibility of deriving algorithms that converge to a stationary point of ' , the key idea underlying our second algorithm (ADA; Algorithm 2).

Theorem 4.1 (Parameteric Moreau Envelope Theorem). Given > 0, consider the parametric Moreau envelope ' (a) .= maxb2X

and the associated proximal operator

{b (a)} .= arg maxb2X

, where is r -Lipschitz smooth. Then ' (a)

-Lipschitz-smooth, with gradients ra' (a) = ra (a, b (a)) (a b (a)).

Next, we formalize the de nition of a stationary point. As the exploitability ' is in general non-convex, and the optimization problem mina2X ' (a) is constrained, the rst-order condition r' (a) = 0 is not su cient for stationarity, since a solution on the boundary of the constraint set could also be rst-order stationary. As a result, we use proximal mappings in our de nition. In particular, we reformulate the constrained exploitability minimization problem as the unconstrained minimization problem mina2Rnm ' (a) +

X(a). In this optimization problem, ' is continuous and Lipschitz smooth, while

X is continuous and convex. Additionally, the set of solutions is non-empty. As a result, one can solve this problem using the proximal gradient method, which, in this case, is equivalent to the projected gradient method [97, 101]. This view of the problem as proximal minimization allows us to provide a clear de nition of a rst-order stationary point.

De ne the projected gradient operator of a function ' : A ! R for some step size a > 0 as

follows: G'

a (a) = a X

. The projected gradient operator is a special case of the gradient mapping operator, and a generalization of the gradient that is projected back onto the feasible set X [101]. The zeros of the projected gradient operator capture stationary points both at the boundary of the set X and in its interior. Indeed, a 2 A is a stationary point of ' if G'

a (a) = 0. Note that, for an action pro le a 2 A to be a VE, it is necessary that it is stationary with respect to the VE exploitability (Proposition 3, [80]). Our goal will thus be to achieve convergence to a strategy pro le a 2 X s.t. G' a (a) = 0.

ADA (Algorithm 2) is a nested GDA algorithm, similar to those of Nouiehed et al. [25] and Goktas and Greenwald [85], with the objective function augmented by a smart regularizer. The algorithm is reminiscent of EDA, interleaving gradient ascent and descent steps. The key di erence is that whereas EDA runs only a single step of extragradient ascent (to nd a better response), ADA runs multiple steps of gradient ascent to approximate a best response. Indeed, as Theorem 4.2 shows, ADA s accuracy depends on the accuracy of the best-response found.

To derive convergence rates for ADA, we rely on two technical lemmas. The rst tells us that the inner loop can approximate the gradient of the exploitability to any desired precision. More speci cally, we bound the error

))ra' (a(t)) ra (a(t), b(t))

)) " by providing a lower bound on the number of inner loop iterations Tb necessary to achieve a desired precision ", for all outer loop iterations t 2 [Ta] (Lemma B.3). The second lemma bounds the algorithm s progress at each iteration of the outer loop by relating " and G' a (at) (Lemma B.4). Finally, by telescoping the inequalities in the progress lemma, we obtain our main theorem: The best iterate found by

Algorithm 2 Augmented Descent Ascent (ADA)

Inputs: n, u, g, , a, b, Ta, Tb, a(0), b(0)

Outputs: (a(t), b(t))

Ta t=0 1: for t = 0, . . . , Ta 1 do

2: a(t+1) = X

ra (a(t), b(t)) (a(t) b(t))

3: b = 0 4: for s = 0, . . . , Tb 1 do 5: b = X

rb (a(t+1), b) (a(t+1) b)

6: b(t+1) = b

7: return (a(t), b(t))

ADA, as measured by the norm of the projected gradient operator, converges to an approximate stationary point of the exploitability in all Lipschitz-smooth pseudo-games with jointly convex constraints. We note that if the gradient of the regularized exploitability can be computed exactly, then convergence to an exact stationary point can be obtained.

Theorem 4.2 (Convergence to Stationary Point of Exploitability). Suppose that ADA is run on a pseudo-game G which satis es Assumption 3.3 with learning rates a > 2

and b = 1 ,

for any number of outer loop iterations Ta 2 N++ and for Tb

total inner loop

iterations where " > 0. Then the outputs (a(t), b(t))T

t=0 satisfy mint=0,...,Ta 1

)))G' a (a(t))

Note that the stationary points to which ADA converges are global minima of ' i ' is invex, in which case ADA converges to a VE, and hence a GNE [102]. We can also obtain faster and last iterate convergence to VE in a more restricted class of games. A pseudo-game is said to be a µ-PL-pseudo-game if the regularized exploitability ' satis es the µ-projection-Polyak Lojasiewicz (PL) condition, i.e., the projected gradient operator associated with ' satis es

)))G' a (a)

' (a) mina2X ' (a)

for all a 2 X.5 A similar PL-condition for games

was recently used by Raghunathan, Cherian, and Jha [104], which they argued is natural.

Theorem 4.3 (PL Exploitability Convergence). Suppose ADA is run on a PL-pseudo-game G

which satis es Assumption 3.3 with learning rates a 2

and b = 1/ r , for any number of outer loop iterations Ta 2 N++

log( r ) total inner loop iterations. Then the outputs

(a(t), b(t))T

t=0 satisfy ' (a(Ta))

We note that this result also implies convergence to a VE in strongly-convex exploitability games in linear time as well, since strong convexity implies the PL-condition.

5We note that this is a special case of the proximal-Polyak-Lojasiwicz condition (See Section 4, [103]).

5 Experiments

In this section, we report on experiments that demonstrate the e ectiveness of our algorithms as compared to others that were also designed to compute GNE. We present numerical results on benchmark psuedo-games that were studied empirically in previous work [104]: 1) monotone pseudo-games with a ne constraints and 2) bilinear pseudo-games with jointly convex constraints.6 We compare our algorithms to the accelerated mirror-prox quadratic penalty method (AMPQP), the only GNEnding algorithm with theoretical convergence guarantees and rates [78].7 For the AMPQP algorithm, we use the hyperparameter settings derived in theory when available, and otherwise conduct a grid search for the best ones. We compare the convergence rates of our algorithms to that of AMPQP, which converges at the same rate as EDA, but slower than ADA, in settings in which it is guaranteed to converge, i.e., in monotone pseudo-games.

In the monotone pseudo-games, for EDA we use a constant learning rate of = 0.02, while for ADA we use the constant learning rates of a = 0.02 and a = 0.05. These rates approximate the Lipschitz-smoothness parameter of the pseudo-games. In the bilinear pseudo-games, we use the theoretically-grounded (Theorem 3.4) learning rate of = 1/k Qk for EDA, and a = k Qk + k Qk2

and b = 1/k Qk for ADA. In both settings, we use a regularization of = 0.1 for ADA, which we found by grid search. Likewise, in our implementation of AMPQP, grid search led us to initialize β0 = 0.01, γ = 1.05, = 0.2 in both settings.8 Additionally, because the iterates generated by AMPQP are not necessarily feasible, we projected them onto the feasible set of actions, so as to avoid in nite exploitability. This heuristic improved the convergence rate of AMPQP signi cantly. The rst iterate for all methods was common, and initialized at random.

Monotone Games with Jointly A ne Constraints We begin by experimenting with monotone games with jointly a ne constraints, the largest class of pseudo-games for which convergence rates exist for AMPQP. In particular, we consider norm-minimization pseudo-games with payo functions ui(a) =

i2[n] ai si

))), for all players i 2 [n], and for some randomly initialized

shifting factors si, as well as jointly a ne constraints g(a) = 1 P

j2[m] a1j P

i2[m] a2j, with action spaces Ai = [ 10, 10]m, for all players i 2 [n]. Although norm-minimization games are trivial, in the sense that they can be solved in closed form, this is not the case for pseudo-games; on the contrary, they form a bedrock example for monotone pseudo-games ([57], Example 1).

In Figure 1a, we observe that all three algorithms converge to a GNE. Interestingly, EDA and ADA nd distinct GNEs from AMPQP, which is not entirely surprising, as AMPQP is not an exploitability minimization algorithm. Although all the algorithms eventually nd a GNE, exploitability does not decrease monotonically (Figure 1b); on the contrary, it increases before eventually decreasing to zero. Convergence to a GNE is expected for AMPQP, as per the theory. EDA, however, is not guaranteed to converge to a GNE in pseudo-games beyond convex exploitability; yet, we still observe convergence, albeit at a slower rate than AMPQP or ADA. Likewise, ADA converges to a GNE, although it is again not guaranteed, much faster than AMPQP.

Bilinear General-Sum Games with Jointly Convex Constraints Second, we consider the class of two-player general-sum games with jointly convex constraints with payo functions u1(a) = a1

T Q1a2, u2(a) = a1

T Q2a2, g(a) = 1 kak2

2 , A1 = A2 = [ 10, 10]m. Such games are not monotone, and hence AMPQP is not guaranteed to converge in theory. In Figure 1c, we see that convergence is not guaranteed empirically either; shown is a sample run in which ADA and EDA converge, while AMPQP exhibits erratic behavior. Non-convergence for AMPQP was

6We describe additional experiments in the supplemental work, with two-player zero-sum bilinear pseudogames with jointly a ne constraints and two-player zero-sum bilinear pseudo-games with jointly convex constraints. The results are not qualitatively di erent than those presented here. All our code can be found on https://github.com/denizalp/exploit-min.

7We note that we are not reporting on our results with the Accelerated Mirror-Prox Augmented Lagrangian Method (AMPAL), another method studied in the literature with the convergence rates as AMPQP, because we found this algorithm to be unstable on our benchmark pseudo-games.

8As suggested by Jordan, Lin, and Zampetakis [78], we replaced the convergence tolerance parameter δ, with the number of iterations for which the accelerated mirror-prox subroutine [105] was run. Then, at each iteration of AMPQP, this parameter was increased by a factor of γ of AMPQP.

(a) Phase portrait for a 2 player monotone game with m = 1

(b) Exploitability for a 5 player monotone game with m = 10

(c) Phase portrait for bilinear general-sum pseudogame with m = 1

(d) Exploitability for a bilinear general-sum pseudo-game; m = 10

Figure 1: Convergence in monotone and bilinear general-sum pseudo-games.

observed in 93 of the 100 experiments run. In fact, even though AMPQP gets very close a GNE (Figure 1d), i.e., zero exploitability, it does not stabilize there. We likewise found that ADA and EDA do not converge to a GNE from all random initializations. In such cases, we restarted the algorithms with a new random initialization; this process lead to on average 11 restarts for EDA and 10 restarts for ADA, meaning both algorithms converged to a GNE within this window.

6 Conclusion

In this paper, we provide an a rmative answer to an open question rst posed by Flam and Ruszczynski [80], who suggested that projected gradient methods could be developed to minimize exploitability in pseudo-games. Although some existing methods [57] make use of the exploitability concept to analyze algorithms that take alternative optimization approaches, we know of none that are exploitability-minimization algorithms per se, as exploitability itself is not explicitly minimized. Our main contribution is the observation that the exploitability minimization problem can be recast as a simple and seemingly trivial min-max optimization problem (Observation 3.2). Our analysis e ectively extends the class of games which can be solved as potential games, since the exploitability can be seen as a potential for any game.

We provide a de nition of stationarity for exploitability via the gradient mapping operator from proximal theory, and we use this de nition to characterize the limit points of our algorithms. Our characterization relies on novel proof techniques inspired by recent analyses of algorithms for min-max optimization problems [25], as well as a parameterized version of Moreau s envelope theorem, which may be of independent interest. This approach, which takes advantage of the structural properties of exploitability, allows us to rigorously analyze an algorithm that converges to stationary points of the VE exploitability in all Lipschitz-smooth pseudo-games with jointly convex constraints, and obtain algorithms which are orders of magnitude faster than known minmax optimization algorithms for similar types of problems. To summarize, our results 1. extend known polynomial-time convergence guarantees to VEs, and thus GNEs, to a class of pseudo-games beyond monotone, and 2. provide a general approach to approximate, with convergence guarantees, solutions to Lipschitz-smooth pseudo-games with jointly convex constraints.

The design of e cient algorithms for constrained non-convex-concave min-max optimization problems is an open problem. The literature thus far has focused on problems in which only the variable to be maximized is constrained [25, 27, 29]. Our techniques may provide a path to solving non-convex-concave min-max optimization problems in entirely constrained domains, since our algorithms solve a non-convex-concave optimization problem where both variables are constrained.

Acknowledgments and Disclosure of Funding

The ideas in this paper stemmed from conversations and seminars held during the Learning in Games workshop at the Simons Institute for the Theory of Computing. This work was also supported by NSF Grant CMMI-1761546.

[1] John F. Nash. Equilibrium points in <i>n</i>-person games . In: Proceedings of the National Academy of Sciences 36.1 (1950), pp. 48 49. : 10.1073/pnas.36.1.48. eprint: https: //www.pnas.org/doi/pdf/10.1073/pnas.36.1.48. : https://www.pnas.org/doi/ abs/10.1073/pnas.36.1.48. [2] Martin J Osborne and Ariel Rubinstein. A course in game theory. MIT press, 1994. [3] Noam Nissan and Tim Roughgarden. Algorithmic Game Theory. Cambridge University Press, 2007. : 10.1017/CBO9780511800481. [4] Christos H Papadimitriou. On the complexity of the parity argument and other ine cient proofs of existence . In: Journal of Computer and system Sciences 48.3 (1994), pp. 498 532. [5] Xi Chen and Xiaotie Deng. 3-Nash is PPAD-complete . In: Electronic Colloquium on Computational Complexity. Vol. 134. Citeseer. 2005, pp. 2 29. [6] Xi Chen and Xiaotie Deng. Settling the complexity of two-player Nash equilibrium . In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 06). IEEE. 2006, pp. 261 272. [7] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a Nash equilibrium . In: SIAM Journal on Computing 39.1 (2009), pp. 195 259. [8] Paul Tseng. On linear convergence of iterative methods for the variational inequality problem . In: Journal of Computational and Applied Mathematics 60.1 (1995). Proceedings of the International Meeting on Linear/Nonlinear Iterative Methods and Veri cation of Solution, pp. 237 252. : 0377-0427. : https://doi.org/10.1016/0377-0427(94)00094-H. : https://www.sciencedirect.com/science/article/pii/037704279400094H. [9] Yurii Nesterov and Laura Scrimali. Solving strongly monotone variational and quasivariational inequalities . In: Discrete & Continuous Dynamical Systems 31.4 (2011), pp. 1383 1396. [10] Gauthier Gidel et al. A Variational Inequality Perspective on Generative Adversarial Networks. 2020. ar Xiv: 1802.10551 [cs.LG]. [11] Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. Convergence Rate of O(1/k) for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems. 2020. ar Xiv: 1906.01115 [math.OC]. [12] Adam Ibrahim et al. Lower bounds and conditioning of di erentiable games . In: ar Xiv preprint ar Xiv:1906.07300 (2019). [13] Mingyi Hong, Junyu Zhang, Shuzhong Zhang, et al. On Lower Iteration Complexity Bounds for the Saddle Point Problems. 2020. ar Xiv: 1912.07481 [math.OC]. [14] Tianyi Lin, Chi Jin, and Michael I Jordan. Near-optimal algorithms for minimax optimization . In: Conference on Learning Theory. PMLR. 2020, pp. 2738 2779. [15] Mohammad Alkousa et al. Accelerated methods for composite non-bilinear saddle point problem. 2020. ar Xiv: 1906.03620 [math.OC]. [16] Anatoli Juditsky, Arkadi Nemirovski, et al. First order methods for nonsmooth convex largescale optimization, ii: utilizing problems structure . In: Optimization for Machine Learning 30.9 (2011), pp. 149 183. [17] Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm for general convex-concave saddle point problems . In: ar Xiv preprint ar Xiv:1803.01401 2 (2018). [18] Zhao Renbo. Optimal algorithms for stochastic three-composite convex-concave saddle point problems. 2019. ar Xiv: 1903.01687 [math.OC]. [19] Kiran Koshy Thekumparampil et al. E cient Algorithms for Smooth Minimax Optimization. 2019. ar Xiv: 1907.01543 [math.OC]. [20] Yuyuan Ouyang and Yangyang Xu. Lower complexity bounds of rst-order methods for convex-concave bilinear saddle-point problems. 2018. ar Xiv: 1808.02901 [math.OC]. [21] Arkadi Nemirovski. Prox-method with rate of convergence O (1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems . In: SIAM Journal on Optimization 15.1 (2004), pp. 229 251. [22] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems . In: Mathematical Programming 109.2 (2007), pp. 319 344.

[23] Paul Tseng. On accelerated proximal gradient methods for convex-concave optimization . In: submitted to SIAM Journal on Optimization 1 (2008). [24] Maziar Sanjabi et al. On the Convergence and Robustness of Training GANs with Regularized Optimal Transport . In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS 18. Montreal, Canada: Curran Associates Inc., 2018, pp. 7091 7101. [25] Maher Nouiehed et al. Solving a class of non-convex min-max games using iterative rst order methods . In: ar Xiv preprint ar Xiv:1902.08297 (2019). [26] Songtao Lu, Ioannis Tsaknakis, and Mingyi Hong. Block alternating optimization for non-convex min-max problems: algorithms and applications in signal processing and communications . In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2019, pp. 4754 4758. [27] Chi Jin, Praneeth Netrapalli, and Michael I. Jordan. What is Local Optimality in Nonconvex Nonconcave Minimax Optimization? 2020. ar Xiv: 1902.00618 [cs.LG]. [28] Dmitrii M Ostrovskii, Andrew Lowy, and Meisam Razaviyayn. E cient search of rstorder nash equilibria in nonconvex-concave smooth min-max problems . In: ar Xiv preprint ar Xiv:2002.07919 (2020). [29] Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems . In: International Conference on Machine Learning. PMLR. 2020, pp. 6083 6093. [30] Renbo Zhao. A Primal Dual Smoothing Framework for Max-Structured Nonconvex Optimization. 2020. ar Xiv: 2003.04375 [math.OC]. [31] Hassan Ra que et al. Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning. 2019. ar Xiv: 1810.02060 [math.OC]. [32] Dov Monderer and Lloyd S Shapley. Potential games . In: Games and economic behavior 14.1 (1996), pp. 124 143. [33] William H Sandholm. Potential games with continuous player sets . In: Journal of Economic theory 97.1 (2001), pp. 81 108. [34] Stephane Durand and Bruno Gaujal. Complexity and optimality of the best response algorithm in random potential games . In: International Symposium on Algorithmic Game Theory. Springer. 2016, pp. 40 51. [35] Yurii Nesterov and Laura Scrimali. Solving strongly monotone variational and quasivariational inequalities . In: (2006). [36] Yang Cai, Argyris Oikonomou, and Weiqiang Zheng. Tight Last-Iterate Convergence of the Extragradient Method for Constrained Monotone Variational Inequalities . In: ar Xiv preprint ar Xiv:2204.09228 (2022). [37] Yu Malitsky. Projected re ected gradient methods for monotone variational inequalities . In: SIAM Journal on Optimization 25.1 (2015), pp. 502 520. [38] Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method: O (1/K) last-iterate convergence for monotone variational inequalities and connections with cocoercivity . In: International Conference on Arti cial Intelligence and Statistics. PMLR. 2022, pp. 366 402. [39] Tatiana Tatarenko and Maryam Kamgarpour. Learning Nash equilibria in monotone games . In: 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE. 2019, pp. 3104 3109. [40] Bingsheng He et al. A new inexact alternating directions method for monotone variational inequalities . In: Mathematical Programming 92.1 (2002), pp. 103 118. [41] Yu V Malitsky and VV Semenov. An extragradient algorithm for monotone variational inequalities . In: Cybernetics and Systems Analysis 50.2 (2014), pp. 271 277. [42] Ian Goodfellow et al. Generative Adversarial Nets . In: Advances in Neural Information Processing Systems. Ed. by Z. Ghahramani et al. Vol. 27. Curran Associates, Inc., 2014. : https : / / proceedings . neurips . cc / paper / 2014 / file / 5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf. [43] Antonia Creswell et al. Generative Adversarial Networks: An Overview . In: IEEE Signal Processing Magazine 35.1 (2018), pp. 53 65. : 10.1109/MSP.2017.2765202. [44] Kenneth Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy . In: Econometrica: Journal of the Econometric Society (1954), pp. 265 290.

[45] Michael Brückner, Christian Kanzow, and Tobias Sche er. Static prediction games for adversarial learning problems . In: J. Mach. Learn. Res. 13 (2012), pp. 2617 2654. : http: //dl.acm.org/citation.cfm?id=2503326. [46] Michael Brückner and Tobias Sche er. Nash equilibria of static prediction games . In: Advances in neural information processing systems 22 (2009). [47] Benjamin F. Hobbs and J. S. Pang. Nash-Cournot Equilibria in Electric Power Markets with Piecewise Linear Demand Functions and Joint Constraints . In: Operations Research 55.1 (2007), pp. 113 127. : 10.1287/opre.1060.0342. eprint: https://doi.org/10.1287/ opre.1060.0342. : https://doi.org/10.1287/opre.1060.0342. [48] Wei Jing-Yuan and Yves Smeers. Spatial Oligopolistic Electricity Models with Cournot Generators and Regulated Transmission Prices . In: Operations Research 47.1 (1999), pp. 102 112. : 10.1287/opre.47.1.102. eprint: https://pubsonline.informs.org/doi/ pdf/10.1287/opre.47.1.102. : https://pubsonline.informs.org/doi/abs/10. 1287/opre.47.1.102. [49] Michele Breton, Georges Zaccour, and Mehdi Zahaf. A game-theoretic formulation of joint implementation of environmental projects . In: European Journal of Operational Research 168.1 (2006), pp. 221 239. [50] Jacek B. Krawczyk. Coupled constraint Nash equilibria in environmental games . In: Resource and Energy Economics 27.2 (2005), pp. 157 181. : 0928-7655. : https://doi. org/10.1016/j.reseneeco.2004.08.001. : https://www.sciencedirect.com/ science/article/pii/S0928765504000661. [51] Danilo Ardagna, Michele Ciavotta, and Mauro Passacantando. Generalized Nash Equilibria for the Service Provisioning Problem in Multi-Cloud Systems . In: IEEE Transactions on Services Computing 10.3 (2017), pp. 381 395. : 10.1109/TSC.2015.2477836. [52] Danilo Ardagna, Barbara Panicucci, and Mauro Passacantando. A Game Theoretic Formulation of the Service Provisioning Problem in Cloud Systems . In: Proceedings of the 20th International Conference on World Wide Web. WWW 11. Hyderabad, India: Association for Computing Machinery, 2011, pp. 177 186. : 9781450306324. : 10.1145/1963405.1963433. : https://doi.org/10.1145/1963405.1963433. [53] Xuegang (Je ) Ban et al. A general equilibrium model for transportation systems with ehailing services and ow congestion . In: Transportation Research Part B: Methodological 129 (2019), pp. 273 304. : 0191-2615. : https://doi.org/10.1016/j.trb.2019.08.012. : https://www.sciencedirect.com/science/article/pii/S0191261518309044. [54] Oliver Stein and Nathan Sudermann-Merx. The noncooperative transportation problem and linear generalized Nash games . In: European Journal of Operational Research 266.2 (2018), pp. 543 553. : 10.1016/j.ejor.2017.10.00. : https://ideas.repec.org/ a/eee/ejores/v266y2018i2p543-553.html. [55] Zhu Han et al. Game theory in wireless and communication networks: Theory, models, and applications. English. Vol. 9780521196963. Publisher Copyright: Cambridge University Press 2012. United Kingdom: Cambridge University Press, Jan. 2011. : 9780521196963. : 10.1017/CBO9780511895043. [56] Jong-Shi Pang et al. Distributed power allocation with rate constraints in Gaussian parallel interference channels . In: IEEE Transactions on Information Theory 54.8 (2008), pp. 3471 3489. [57] Francisco Facchinei and Christian Kanzow. Generalized Nash equilibrium problems . In: Annals of Operations Research 175.1 (2010), pp. 177 211. [58] Andreas Fischer, Markus Herrich, and Klaus Schönefeld. GENERALIZED NASH EQUILIBRIUM PROBLEMS - RECENT ADVANCES AND CHALLENGES . In: Pesquisa Operacional 34 (2014), pp. 521 558. [59] Francisco Facchinei, Andreas Fischer, and Veronica Piccialli. Generalized Nash equilibrium problems and Newton methods . In: Mathematical Programming 117.1 (2009), pp. 163 194. [60] Francisco Facchinei and Simone Sagratella. On the computation of all solutions of jointly convex generalized Nash equilibrium problems . In: Optimization Letters 5.3 (2011), pp. 531 547. [61] Dario Paccagnan et al. Distributed computation of generalized Nash equilibria in quadratic aggregative games with a ne coupling constraints . In: 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE. 2016, pp. 6123 6128.

[62] Peng Yi and Lacra Pavel. A distributed primal-dual algorithm for computation of generalized Nash equilibria via operator splitting methods . In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE. 2017, pp. 3841 3846. [63] Eleftherios Couzoudis and Philipp Renner. Computing generalized Nash equilibria by polynomial programming . In: Mathematical Methods of Operations Research 77.3 (2013), pp. 459 472. [64] Axel Dreves. Computing all solutions of linear generalized Nash equilibrium problems . In: Mathematical Methods of Operations Research 85.2 (2017), pp. 207 221. [65] Anna Von Heusinger and Christian Kanzow. Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions . In: Computational Optimization and Applications 43.3 (2009), pp. 353 377. [66] Tatiana Tatarenko and Maryam Kamgarpour. Learning generalized Nash equilibria in a class of convex games . In: IEEE Transactions on Automatic Control 64.4 (2018), pp. 1426 1439. [67] Axel Dreves and Nathan Sudermann-Merx. Solving linear generalized Nash equilibrium problems numerically . In: Optimization Methods and Software 31.5 (2016), pp. 1036 1063. [68] Anna von Heusinger, Christian Kanzow, and Masao Fukushima. Newton s method for computing a normalized equilibrium in the generalized Nash game through xed point formulation . In: Mathematical Programming 132.1 (2012), pp. 99 123. [69] Alexey F Izmailov and Mikhail V Solodov. On error bounds and Newton-type methods for generalized Nash equilibrium problems . In: Computational Optimization and Applications 59.1 (2014), pp. 201 218. [70] Andreas Fischer et al. A globally convergent LP-Newton method . In: SIAM Journal on Optimization 26.4 (2016), pp. 2012 2033. [71] Jong-Shi Pang and Masao Fukushima. Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games . In: Computational Management Science 2.1 (2005), pp. 21 56. [72] Francisco Facchinei and Lorenzo Lampariello. Partial penalization for the solution of generalized Nash equilibrium problems . In: Journal of Global Optimization 50.1 (2011), pp. 39 57. [73] Masao Fukushima. Restricted generalized Nash equilibria and controlled penalty algorithm . In: Computational Management Science 8.3 (2011), pp. 201 218. [74] Christian Kanzow. On the multiplier-penalty-approach for quasi-variational inequalities . In: Mathematical Programming 160.1 (2016), pp. 33 63. [75] Christian Kanzow and Daniel Steck. Augmented Lagrangian methods for the solution of generalized Nash equilibrium problems . In: SIAM Journal on Optimization 26.4 (2016), pp. 2034 2058. [76] Christian Kanzow and Daniel Steck. Augmented Lagrangian and exact penalty methods for quasi-variational inequalities . In: Computational Optimization and Applications 69.3 (2018), pp. 801 824. [77] Qin Ba and Jong-Shi Pang. Exact penalization of generalized Nash equilibrium problems . In: Operations Research (2020). [78] Michael I. Jordan, Tianyi Lin, and Manolis Zampetakis. First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems. 2022. : 10.48550/ARXIV.2204.03132. : https://arxiv.org/abs/2204.03132. [79] Hukukane Nikaidô and Kazuo Isoda. Note on non-cooperative convex games . In: Paci c Journal of Mathematics 5.S1 (1955), pp. 807 815. [80] Sjur Flam and Andrzej Ruszczynski. Noncooperative Convex Games: Computing Equilibrium By Partial Regularization. Working Papers. International Institute for Applied Systems Analysis, 1994. : https://Econ Papers.repec.org/Re PEc:wop:iasawp:wp94042. [81] Edward Lockhart et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent . In: Co RR abs/1903.05614 (2019). ar Xiv: 1903.05614. : http://arxiv.org/abs/1903.05614. [82] Ian M. Gemp et al. Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent . In: Co RR abs/2106.01285 (2021). ar Xiv: 2106.01285. : https: //arxiv.org/abs/2106.01285.

[83] Galina M Korpelevich. The extragradient method for nding saddle points and other problems . In: Matecon 12 (1976), pp. 747 756. [84] Angelia Nedic and Asuman Ozdaglar. Subgradient methods for saddle-point problems . In: Journal of optimization theory and applications 142.1 (2009), pp. 205 228. [85] Denizalp Goktas and Amy Greenwald. Convex-Concave Min-Max Stackelberg Games . In: Advances in Neural Information Processing Systems 34 (2021). [86] J. B. Rosen. Existence and Uniqueness of Equilibrium Points for Concave N-Person Games . In: Econometrica 33.3 (1965), pp. 520 534. : 00129682, 14680262. : http://www. jstor.org/stable/1911749 (visited on 05/16/2022). [87] S. Uryas ev and R.Y. Rubinstein. On relaxation algorithms in computation of noncooperative equilibria . In: IEEE Transactions on Automatic Control 39.6 (1994), pp. 1263 1267. : 10.1109/9.293193. [88] Jacek B. Krawczyk and Stanislav Uryasev. Relaxation algorithms to nd Nash equilibria with economic applications . English. In: Environmental Modeling & Assessment 5.1 (2000). This revised version was published online in July 2006 with corrections to the Cover Date., pp. 63 73. : 10.1023/A:1019097208499. [89] A. Heusinger and C. Kanzow. Relaxation Methods for Generalized Nash Equilibrium Problems with Inexact Line Search . In: Journal of Optimization Theory and Applications 143.1 (2009), pp. 159 183. : https://Econ Papers.repec.org/Re PEc:spr:joptap:v: 143:y:2009:i:1:d:10.1007_s10957-009-9553-0. [90] Axel Dreves et al. A Globalized Newton Method for the Computation of Normalized Nash Equilibria . In: J. of Global Optimization 56.2 (June 2013), pp. 327 340. : 0925-5001. : 10.1007/s10898-011-9824-9. : https://doi.org/10.1007/s10898-011-9824-9. [91] Francisco Facchinei, Andreas Fischer, and Veronica Piccialli. On generalized Nash games and variational inequalities . In: Operations Research Letters 35.2 (2007), pp. 159 164. : 0167-6377. : https://doi.org/10.1016/j.orl.2006.03.004. : https://www. sciencedirect.com/science/article/pii/S0167637706000484. [92] Koichi Nabetani, Paul Tseng, and Masao Fukushima. Parametrized Variational Inequality Approaches to Generalized Nash Equilibrium Problems with Shared Constraints . In: Comput. Optim. Appl. 48.3 (Apr. 2011), pp. 423 452. : 0926-6003. : 10.1007/s10589-0099256-3. : https://doi.org/10.1007/s10589-009-9256-3. [93] Dane A. Schiro, Jong-Shi Pang, and Uday V. Shanbhag. On the solution of a ne generalized Nash equilibrium problems with shared constraints by Lemke s method . English. In: Mathematical Programming 142.1-2 (2013). This work was based on research partially supported by the National Science Foundation under grant CMMI-0969600 and the Department of Energy under grant DOE DE-SC0003879., pp. 1 46. : 10.1007/s10107-012-0558-3. [94] Francisco Facchinei and Christian Kanzow. Penalty Methods for the Solution of Generalized Nash Equilibrium Problems . In: SIAM Journal on Optimization 20.5 (2010), pp. 2228 2253. : 10.1137/090749499. eprint: https://doi.org/10.1137/090749499. : https: //doi.org/10.1137/090749499. [95] Luís Felipe Bueno, Gabriel Haeser, and Frank Navarro Rojas. Optimality Conditions and Constraint Quali cations for Generalized Nash Equilibrium Problems and Their Practical Implications . In: SIAM Journal on Optimization 29.1 (2019), pp. 31 54. : 10.1137/ 17M1162524. eprint: https://doi.org/10.1137/17M1162524. : https://doi.org/ 10.1137/17M1162524. [96] J v Neumann. Zur theorie der gesellschaftsspiele . In: Mathematische annalen 100.1 (1928), pp. 295 320. [97] Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004. [98] Noah Golowich et al. Last Iterate is Slower than Averaged Iterate in Smooth Convex Concave Saddle Point Problems . In: Co RR abs/2002.00057 (2020). ar Xiv: 2002.00057. : https://arxiv.org/abs/2002.00057. [99] Ioannis Tsaknakis, Mingyi Hong, and Shuzhong Zhang. Minimax Problems with Coupled Linear Constraints: Computational Complexity, Duality and Solution Methods . In: ar Xiv preprint ar Xiv:2110.11210 (2021). [100] Jean-Jacques Moreau. Proximité et dualité dans un espace hilbertien . In: Bulletin de la Société mathématique de France 93 (1965), pp. 273 299.

[101] Amir Beck. First-order methods in optimization. SIAM, 2017. [102] Shashi K Mishra and Giorgio Giorgi. Invexity and optimization. Vol. 88. Springer Science & Business Media, 2008. [103] Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition . In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 2016, pp. 795 811. [104] Arvind Raghunathan, Anoop Cherian, and Devesh Jha. Game theoretic optimization via gradient-based nikaido-isoda function . In: International Conference on Machine Learning. PMLR. 2019, pp. 5291 5300. [105] Yunmei Chen, Guanghui Lan, and Yuyuan Ouyang. Accelerated schemes for a class of variational inequalities . In: Mathematical Programming 165.1 (2017), pp. 113 149. [106] Guido Van Rossum and Fred L Drake Jr. Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands, 1995. [107] Charles R. Harris et al. Array programming with Num Py . In: Nature 585 (2020), pp. 357 362. : 10.1038/s41586-020-2649-2. [108] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization . In: Journal of Machine Learning Research 17.83 (2016), pp. 1 5. [109] J. D. Hunter. Matplotlib: A 2D graphics environment . In: Computing in Science & Engineering 9.3 (2007), pp. 90 95. : 10.1109/MCSE.2007.55. [110] Gauthier Gidel et al. A variational inequality perspective on generative adversarial networks . In: ar Xiv preprint ar Xiv:1802.10551 (2018). [111] John M. Danskin. The Theory of Max-Min, with Applications . In: SIAM Journal on Applied Mathematics 14.4 (1966), pp. 641 664. : 00361399. : http://www.jstor.org/ stable/2946123. [112] S. N. Afriat. Theory of Maxima and the Method of Lagrange . In: SIAM Journal on Applied Mathematics 20.3 (1971), pp. 343 357. : 00361399. : http://www.jstor.org/ stable/2099955. [113] Paul Milgrom and Ilya Segal. Envelope theorems for arbitrary choice sets . In: Econometrica 70.2 (2002), pp. 583 601. [114] Murray H Protter, B Charles Jr, et al. A rst course in real analysis. Springer Science & Business Media, 2012. [115] Haim Brezis and Haim Brézis. Functional analysis, Sobolev spaces and partial di erential equations. Vol. 2. 3. Springer, 2011. [116] Elad Hazan, Karan Singh, and Cyril Zhang. E cient regret minimization in non-convex games . In: International Conference on Machine Learning. PMLR. 2017, pp. 1433 1441.

1. For all authors...

(a) Do the main claims made in the abstract and introduction accurately re ect the paper s

contributions and scope? [Yes] (b) Did you describe the limitations of your work? [Yes] Yes, details are discussed in

conclusion, and experiments sections. (c) Did you discuss any potential negative societal impacts of your work? [N/A] (d) Have you read the ethics review guidelines and ensured that your paper conforms to

them? [Yes] 2. If you are including theoretical results...

(a) Did you state the full set of assumptions of all theoretical results? [Yes] Assumptions

are summarized in Assumption 3.3 (b) Did you include complete proofs of all theoretical results? [Yes] Yes, all proofs can be

found in the appendix. 3. If you ran experiments...

(a) Did you include the code, data, and instructions needed to reproduce the main experi-

mental results (either in the supplemental material or as a URL)? [Yes] Yes, code repo can be found here: https://github.com/denizalp/exploit-min. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they

were chosen)? [Yes] Details can be found in section 5 and Appendix A. (c) Did you report error bars (e.g., with respect to the random seed after running ex-

periments multiple times)? [Yes] For experiments run multiple times, i.e., zero-sum convex constraints, exploitability always reached zero, and there was hence no error to report in all randomly initialized examples. (d) Did you include the total amount of compute and the type of resources used (e.g., type

of GPUs, internal cluster, or cloud provider)? [Yes] Details can be found in Appendix A. 4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...

(a) If your work uses existing assets, did you cite the creators? [Yes] Details can be found

in Appendix A (b) Did you mention the license of the assets? [Yes] Details can be found in Appendix A

(c) Did you include any new assets either in the supplemental material or as a URL?

[N/A] (d) Did you discuss whether and how consent was obtained from people whose data

you re using/curating? [N/A] (e) Did you discuss whether the data you are using/curating contains personally identi -

able information or o ensive content? [N/A] 5. If you used crowdsourcing or conducted research with human subjects...

(a) Did you include the full text of instructions given to participants and screenshots, if

applicable? [N/A] (b) Did you describe any potential participant risks, with links to Institutional Review

Board (IRB) approvals, if applicable? [N/A] (c) Did you include the estimated hourly wage paid to participants and the total amount

spent on participant compensation? [N/A]