# On Tractable Φ-Equilibria in Non-Concave Games

Yang Cai (Yale University, yang.cai@yale.edu), Constantinos Daskalakis (MIT CSAIL, costis@csail.mit.edu), Haipeng Luo (University of Southern California, haipengl@usc.edu), Chen-Yu Wei (University of Virginia, chenyu.wei@virginia.edu), Weiqiang Zheng (Yale University, weiqiang.zheng@yale.edu)

While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave, a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or in which agents' utilities are computed by neural networks, or both. Non-concave games introduce significant game-theoretic and optimization challenges: (i) Nash equilibria may not exist; (ii) local Nash equilibria, though they exist, are intractable; and (iii) mixed Nash, correlated, and coarse correlated equilibria generally have infinite support and are intractable. To sidestep these challenges, we revisit the classical solution concept of Φ-equilibria introduced by Greenwald and Jafari [GJ03], which is guaranteed to exist for an arbitrary set of strategy modifications Φ even in non-concave games [SL07]. However, the tractability of Φ-equilibria in such games remains elusive. In this paper, we initiate the study of tractable Φ-equilibria in non-concave games and examine several natural families of strategy modifications. We show that when Φ is finite, there exists an efficient uncoupled learning algorithm that approximates the corresponding Φ-equilibria. Additionally, we explore cases where Φ is infinite but consists of local modifications, showing that Online Gradient Descent can efficiently approximate Φ-equilibria in non-trivial regimes.

1 Introduction

Von Neumann's celebrated minimax theorem establishes the existence of Nash equilibrium in all two-player zero-sum games where the players' utilities are continuous as well as concave in their own strategy [Neu28].¹ This assumption, that players' utilities are concave, or quasi-concave, in their own strategies, has been a cornerstone for the development of equilibrium theory in Economics, Game Theory, and a host of other theoretical and applied fields that make use of equilibrium concepts. In particular, (quasi-)concavity is key for showing the existence of many types of equilibrium, from generalizations of min-max equilibrium [Fan53; Sio58] to competitive equilibrium in exchange economies [AD54; McK54], mixed Nash equilibrium in finite normal-form games [Nas50], and, more generally, Nash equilibrium in (quasi-)concave games [Deb52; Ros65].

¹Throughout this paper, we model games using the standard convention in Game Theory that each player has a utility function that they want to maximize. This is, of course, equivalent to modeling the players as loss minimizers, a convention more common in learning. When we say that a player's utility is concave (respectively non-concave) in their strategy, this is the same as saying that the player's loss is convex (respectively non-convex) in their strategy.

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

Table 1: A comparison between different solution concepts in multi-player non-concave games.
We include definitions of Nash equilibrium, mixed Nash equilibrium, (coarse) correlated equilibrium, strict local Nash equilibrium, and second-order local Nash equilibrium in Appendix B, where we also give a detailed discussion of the existence and complexity of these solution concepts.

| Solution Concept | Incentive Guarantee | Existence | Complexity |
| --- | --- | --- | --- |
| Nash equilibrium | Global stability | May not exist | |
| Mixed Nash equilibrium | Global stability | Exists | Infinite support; intractable in general |
| (Coarse) Correlated equilibrium | Global stability | Exists | Infinite support; intractable in general |
| Strict local Nash equilibrium | Local stability | May not exist | NP-hard [MK87; AZ22] |
| Second-order local Nash equilibrium | Second-order stability | May not exist | NP-hard [MK87; AZ22] |
| Local Nash equilibrium | First-order stability | Exists | PPAD-hard [DSZ21] |
| Φ-equilibrium (finite \|Φ\|) | Stability against finite deviations | Exists | Efficient ε-approximation for any ε > 0 (Theorem 2) |
| Conv(Φ(δ))-equilibrium (finite \|Φ(δ)\|) | First-order stability when ε = Ω(δ²) | Exists | Efficient ε-approximation for ε = Ω(δ²) (Theorem 4) |
| Φ_Proj(δ)-equilibrium | First-order stability when ε = Ω(δ²) | Exists | Efficient ε-approximation via GD/OG for ε = Ω(δ²) (Theorems 3 and 9) |
| Φ_Int(δ)-equilibrium | First-order stability when ε = Ω(δ²) | Exists | Efficient ε-approximation via no-regret learning for ε = Ω(δ²) (Theorem 5) |

Not only are equilibria guaranteed to exist in concave games, but it is also well established, thanks to a long line of work at the interface of game theory, learning, and optimization, whose origins can be traced to Dantzig's work on linear programming [Geo63], Brown and Robinson's work on fictitious play [Bro51; Rob51], Blackwell's approachability theorem [Bla56], and Hannan's consistency theory [Han57], that several solution concepts are efficiently computable, both centrally and via decentralized learning dynamics. For instance, it is well known that the learning dynamics produced when the players of a game iteratively update their strategies using no-regret learning algorithms, such as online gradient descent, are guaranteed to converge to Nash equilibrium in two-player zero-sum concave games, and to coarse correlated equilibrium in multi-player general-sum concave games [CL06]. The existence of such simple decentralized dynamics further justifies using these solution concepts to predict the outcome of real-life multi-agent interactions where agents deploy strategies, obtain feedback, and use that feedback to update their strategies.

While (quasi-)concave utilities have been instrumental in the development of equilibrium theory, as described above, they are also too restrictive an assumption. Several modern applications and outstanding challenges in Machine Learning, from training Generative Adversarial Networks (GANs) to Multi-Agent Reinforcement Learning (MARL), as well as generic multi-agent Deep Learning settings where the agents' strategies are parameterized by deep neural networks or their utilities are computed by deep neural networks, or both, give rise to games where the agents' utilities are non-concave in their own strategies. We call these games non-concave, following [Das22].

Unfortunately, classical equilibrium theory quickly hits a wall in non-concave games. First, Nash equilibria are no longer guaranteed to exist. Second, while mixed Nash, correlated, and coarse correlated equilibria do exist under convexity and compactness of the strategy sets [Gli52], which we have been assuming all along in our discussion so far, they have infinite support, in general [Kar14]. Finally, they are computationally intractable; so, a fortiori, they are also intractable to attain via decentralized learning dynamics.
In view of the importance of non-concave games in emerging ML applications and the afore-described state of affairs, our investigation is motivated by the following broad and largely open question:

Question from [Das22]: Is there a theory of non-concave games? What solution concepts are meaningful, universal, and tractable?

1.1 Contributions

We study Daskalakis' question through the lens of the classical solution concept of Φ-equilibria introduced by Greenwald and Jafari [GJ03]. This concept is guaranteed to exist for virtually any set of strategy modifications Φ, even in non-concave games, as demonstrated by Stoltz and Lugosi [SL07].² However, the tractability of Φ-equilibria in such games remains elusive. In this paper, we initiate the study of tractable Φ-equilibria in non-concave games and examine several natural families of strategy modifications.

²Stoltz and Lugosi [SL07] only require the elements of Φ to be measurable functions.

Figure 1: The relationship between different solution concepts in non-concave games. An arrow from one solution concept to another means the former is contained in the latter. The dashed arrow from Conv(Φ(δ))-equilibria to $\Phi_{\text{Finite}}$-equilibria means the former is contained in the latter when $\Phi(\delta) = \Phi_{\text{Finite}}$. [The figure groups $\Phi_{\text{Proj}}(\delta)$-equilibria, $\Phi_{\text{Int}}(\delta)$-equilibria, and $\mathrm{Conv}(\Phi(\delta))$-equilibria under local deviations, and $\Phi_{\text{Finite}}$-equilibria and CCE under global deviations.]

Φ-Equilibrium. The concept of Φ-equilibrium generalizes (coarse) correlated equilibrium. A Φ-equilibrium is a joint distribution over $\prod_{i=1}^n \mathcal{X}_i$, the Cartesian product of all players' strategy sets, and is defined in terms of a set, $\Phi_{\mathcal{X}_i}$, of strategy modifications for each player $i$. The set $\Phi_{\mathcal{X}_i}$ contains functions mapping $\mathcal{X}_i$ to itself. A joint distribution over strategy profiles qualifies as a $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$-equilibrium if no player $i$ can increase their expected utility by applying any strategy modification function $\phi_i \in \Phi_{\mathcal{X}_i}$ to the strategy sampled for them from the joint distribution. The larger the set Φ, the stronger the incentive guarantee offered by the Φ-equilibrium. For example, if $\Phi_{\mathcal{X}_i}$ contains all constant functions, the corresponding Φ-equilibrium coincides with the notion of coarse correlated equilibrium. Throughout the paper, we also consider ε-approximate Φ-equilibria, where no player can gain more than ε by deviating using any function from $\Phi_{\mathcal{X}_i}$. We study several families of Φ and illustrate their relationships in Figure 1.

Finite Set of Global Deviations. The first case we consider is when each player $i$'s set of strategy modifications, $\Phi_{\mathcal{X}_i}$, contains a finite number of arbitrary functions mapping $\mathcal{X}_i$ to itself. As shown in [GJ03], if there exists an online learning algorithm under which each player $i$ is guaranteed to have sublinear $\Phi_{\mathcal{X}_i}$-regret, the empirical distribution of joint strategies played converges to a $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$-equilibrium. Gordon, Greenwald, and Marks [GGM08] consider Φ-regret minimization but for concave reward functions, and their results, therefore, do not apply to non-concave games. Stoltz and Lugosi [SL07] provide an algorithm that achieves no $\Phi_{\mathcal{X}_i}$-regret in non-concave games; however, their algorithm requires a fixed-point computation per step, making it computationally inefficient.³ Our first contribution is an efficient randomized algorithm that achieves no $\Phi_{\mathcal{X}_i}$-regret for each player $i$ with high probability.
Contribution 1: Let $\mathcal{X}$ be a strategy set (not necessarily compact or convex), and $\Psi$ an arbitrary finite set of strategy modification functions for $\mathcal{X}$. We design a randomized online learning algorithm that achieves $O(\sqrt{T \log |\Psi|})$ $\Psi$-regret, with high probability, for arbitrary bounded reward functions on $\mathcal{X}$ (Theorem 2). The algorithm operates in time $O(\sqrt{T}\,|\Psi|)$ per iteration. If every player in a non-concave game adopts this algorithm, the empirical distribution of strategy profiles played forms an ε-approximate $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$-equilibrium, with high probability, for any ε > 0, after $\mathrm{poly}\left(\tfrac{1}{\varepsilon}, \log \max_i |\Phi_{\mathcal{X}_i}|, \log n\right)$ iterations.

If players have infinitely many global strategy modifications, we can extend Algorithm 1 by discretizing the set of strategy modifications under mild assumptions, such as the modifications being Lipschitz (Corollary 1). The empirical distribution of the strategy profiles still converges to the corresponding Φ-equilibrium, but at a much slower rate of $O(T^{-\frac{1}{d+2}})$, where $d$ is the dimension of the set of strategies. Additionally, the algorithm requires time exponential in the dimension per iteration, making it inefficient. This inefficiency is unavoidable, as the problem remains intractable even when Φ contains only constant functions.

³The existence of the fixed point is guaranteed by the Schauder-Cauty fixed-point theorem [Cau01], a generalization of the Brouwer fixed-point theorem. Hence, it is unlikely that such fixed points are tractable.

To address the limitations associated with infinitely large sets of global strategy modifications, a natural approach is to focus on local deviations instead. The corresponding Φ-equilibrium then guarantees local stability. The study of local equilibrium concepts in non-concave games has received significant attention in recent years; see, e.g., [RBS16; HSZ17; DP18; JNJ20; DSZ21]. However, these solution concepts either are not guaranteed to exist, are restricted to sequential two-player zero-sum games [MV21], only establish local convergence guarantees for learning dynamics (see, e.g., [DP18; WZB20; FCR20]), only establish asymptotic convergence guarantees (see, e.g., [Das+23]), or involve non-standard solution concepts where local stability is not with respect to a distribution over strategy profiles [HSZ17]. We study the tractability of Φ-equilibria with infinitely large sets Φ that consist solely of local strategy modifications. These local solution concepts are guaranteed to exist in general multi-player non-concave games. Specifically, we focus on the following three families of natural deviations (a small code sketch of these deviation maps is given right after the list).

- Projection-based local deviations: each player $i$'s set of strategy modifications, denoted by $\Phi_{\text{Proj}}^{\mathcal{X}_i}(\delta)$, contains all deviations that attempt a small step from their input in a fixed direction and project back if necessary, i.e., deviations of the form $\phi_v(x) = \Pi_{\mathcal{X}_i}[x - v]$, where $\|v\| \le \delta$ and $\Pi_{\mathcal{X}_i}$ stands for the $\ell_2$-projection onto $\mathcal{X}_i$.
- Convex combinations of finitely many local deviations: each player $i$'s set of strategy modifications, denoted by $\mathrm{Conv}(\Phi_{\mathcal{X}_i}(\delta))$, contains all deviations that can be represented as a convex combination of a finite set of δ-local strategy modifications, i.e., modifications satisfying $\|\phi(x) - x\| \le \delta$ for all $\phi \in \Phi_{\mathcal{X}_i}(\delta)$.
- Interpolation-based local deviations: each player $i$'s set of local strategy modifications, denoted by $\Phi_{\text{Int}}^{\mathcal{X}_i}(\delta)$, contains all deviations that interpolate between the input strategy and another strategy in $\mathcal{X}_i$. Formally, each element $\phi_{\lambda,x'}(x)$ of $\Phi_{\text{Int}}^{\mathcal{X}_i}(\delta)$ can be written as $(1-\lambda)x + \lambda x'$ for some $x' \in \mathcal{X}_i$ and $\lambda \le \delta / D_{\mathcal{X}_i}$, where $D_{\mathcal{X}_i}$ is the diameter of $\mathcal{X}_i$.
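To make the three families concrete, here is a minimal Python sketch of the corresponding deviation maps for a Euclidean-ball strategy set; the ball, the helper names (`project`, `phi_proj`, `phi_conv`, `phi_int`), and the numbers are illustrative assumptions of ours and not part of the paper.

```python
import numpy as np

def project(x, radius=1.0):
    """l2-projection onto a ball of the given radius centered at the origin,
    standing in for a generic convex, compact strategy set X."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def phi_proj(x, v):
    """Projection-based deviation: take a fixed step -v and project back; ||v|| <= delta."""
    return project(x - v)

def phi_conv(x, phis, p):
    """Convex combination of finitely many delta-local deviations `phis` with weights `p`;
    the result moves x by at most delta since each phi does."""
    return sum(pi * phi(x) for pi, phi in zip(p, phis))

def phi_int(x, x_target, lam):
    """Interpolation-based deviation: move a lambda-fraction toward x_target,
    with lam <= delta / diameter(X) so that ||phi(x) - x|| <= delta."""
    return (1 - lam) * x + lam * x_target

# All three maps move x by at most delta.
delta = 0.1
x = np.array([0.6, -0.3])
v = np.array([0.1, 0.0])                       # ||v|| <= delta
print(np.linalg.norm(phi_proj(x, v) - x) <= delta + 1e-12)
print(np.linalg.norm(phi_int(x, np.array([-0.5, 0.5]), delta / 2.0) - x) <= delta + 1e-12)
```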
For our three families of local strategy modifications, we explore the tractability of Φ-equilibrium within a regime we term the first-order stationary regime, where ε = Ω(δ²),⁴ with δ representing the maximum deviation allowed for a player. An ε-approximate Φ-equilibrium in this regime ensures first-order stability. This regime is particularly interesting for two reasons. (i) Daskalakis, Skoulakis, and Zampetakis [DSZ21] have demonstrated that computing an ε-approximate δ-local Nash equilibrium in this regime is intractable.⁵ This poses an intriguing question: can correlating the players' strategies, as in a Φ-equilibrium, make the problem tractable? (ii) Extending our algorithm, initially designed for finite sets of strategy modifications, to these three sets of local deviations results in inefficiency; specifically, the running time becomes exponential in one of the problem's natural parameters. Designing efficient algorithms for this regime thus presents challenges. Despite these, we show the following.

Contribution 2: For any δ > 0 and for each of the three families of infinite δ-local strategy modifications mentioned above, there exists an efficient uncoupled learning algorithm that converges to an ε-approximate Φ-equilibrium of the non-concave game in the first-order stationary regime, i.e., for ε = Ω(δ²).

We present our results for projection-based local deviations in Theorem 3 and Theorem 9. Our result for convex combinations of local deviations can be found in Theorem 4. Theorem 5 contains our result for interpolation-based local deviations. Similar to the finite case, our algorithms build on the connection between Φ-regret minimization and Φ-equilibria. Given that our strategy modifications are non-standard, it is a priori unclear how to minimize the corresponding Φ-regret. For instance, no algorithm was previously known to minimize $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret even when the reward functions are concave, and $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret is provably incomparable to external regret (Examples 3 and 4). However, via a novel analysis, we show that Online Gradient Descent (GD) and Optimistic Gradient (OG) achieve a near-optimal $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret guarantee (Theorem 3 and Theorem 8). Our results provide efficient uncoupled algorithms that compute ε-approximate Φ(δ)-equilibria in the first-order stationary regime ε = Ω(δ²). Further related work is discussed in Appendix A.

⁴The regime ε = Ω(δ) is trivial when the utility is Lipschitz.
⁵A strategy profile is an ε-approximate δ-local Nash equilibrium if no player can gain more than ε by deviating within distance δ.

2 Preliminaries

A ball of radius r > 0 centered at $x \in \mathbb{R}^d$ is denoted by $\mathcal{B}_d(x, r) := \{x' \in \mathbb{R}^d : \|x' - x\| \le r\}$. We use $\|\cdot\|$ for the $\ell_2$ norm throughout. We also write $\mathcal{B}_d(\delta)$ for the ball of radius δ centered at the origin. For $a \in \mathbb{R}$, we use $[a]_+$ to denote $\max\{0, a\}$. We denote by $D_{\mathcal{X}}$ the diameter of a set $\mathcal{X}$.

Continuous / Smooth Games. An n-player continuous game has a set of n players $[n] := \{1, 2, \ldots, n\}$. Each player $i \in [n]$ has a nonempty convex and compact strategy set $\mathcal{X}_i \subseteq \mathbb{R}^{d_i}$. For a joint strategy profile $x = (x_i, x_{-i}) \in \prod_{j=1}^n \mathcal{X}_j$, the reward of player $i$ is determined by a utility function $u_i : \prod_{j=1}^n \mathcal{X}_j \to [0, 1]$. We denote by $d = \sum_{i=1}^n d_i$ the dimensionality of the game and assume $\max_{i \in [n]} D_{\mathcal{X}_i} \le D$. A smooth game is a continuous game whose utility functions further satisfy the following assumption.

Assumption 1 (Smooth Games). The utility function $u_i(x_i, x_{-i})$ of every player $i \in [n]$ is differentiable and satisfies
1. (G-Lipschitzness) $\|\nabla_{x_i} u_i(x)\| \le G$ for all $i$ and $x \in \prod_{j=1}^n \mathcal{X}_j$;
2. (L-smoothness) there exists $L_i > 0$ such that $\|\nabla_{x_i} u_i(x_i, x_{-i}) - \nabla_{x_i} u_i(x_i', x_{-i})\| \le L_i \|x_i - x_i'\|$ for all $x_i, x_i' \in \mathcal{X}_i$ and $x_{-i} \in \prod_{j \ne i} \mathcal{X}_j$. We denote by $L = \max_i L_i$ the smoothness of the game.

Crucially, we make no assumption on the concavity of $u_i(x_i, x_{-i})$.

Φ-equilibrium and Φ-regret. Below we formally introduce the concept of Φ-equilibrium and its relationship with online learning and Φ-regret minimization.

Definition 1 (Φ-equilibrium [GJ03; SL07]). In a continuous game, a distribution σ over joint strategy profiles $\prod_{i=1}^n \mathcal{X}_i$ is an ε-approximate Φ-equilibrium, for some ε ≥ 0 and a profile of strategy modification sets $\Phi = \prod_{i=1}^n \Phi_i$, if and only if for every player $i \in [n]$,
$$\max_{\phi \in \Phi_i} \mathbb{E}_{x \sim \sigma}\big[u_i(\phi(x_i), x_{-i})\big] \le \mathbb{E}_{x \sim \sigma}\big[u_i(x)\big] + \varepsilon.$$
When ε = 0, we call σ a Φ-equilibrium.

We consider the standard online learning setting: on each day $t \in [T]$, the learner chooses an action $x^t$ from a nonempty convex compact set $\mathcal{X} \subseteq \mathbb{R}^m$ and the adversary chooses a possibly non-convex loss function $f^t : \mathcal{X} \to \mathbb{R}$; the learner then suffers the loss $f^t(x^t)$ and receives feedback. In this paper, we focus on two feedback models: (1) the learner receives an oracle for $f^t(\cdot)$; (2) the learner receives only the gradient $\nabla f^t(x^t)$. The classic goal of an online learning algorithm is to minimize the external regret, defined as $\mathrm{Reg}^T := \max_{x \in \mathcal{X}} \sum_{t=1}^T (f^t(x^t) - f^t(x))$. An algorithm is called no-regret if its external regret is sublinear in $T$. The notion of Φ-regret generalizes external regret by allowing more general strategy modifications.

Definition 2 (Φ-regret). Let Φ be a set of strategy modification functions $\{\phi : \mathcal{X} \to \mathcal{X}\}$. For $T \ge 1$, the Φ-regret of an online learning algorithm is $\mathrm{Reg}^T_\Phi := \max_{\phi \in \Phi} \sum_{t=1}^T \big(f^t(x^t) - f^t(\phi(x^t))\big)$. An algorithm is called no-Φ-regret if its Φ-regret is sublinear in $T$.

Many classic notions of regret can be interpreted as Φ-regret. For example, external regret is $\Phi_{\text{ext}}$-regret, where $\Phi_{\text{ext}}$ contains all constant strategy modifications $\phi_{x'}(x) = x'$ for $x' \in \mathcal{X}$. Swap regret on the simplex $\Delta_m$ is $\Phi_{\text{swap}}$-regret, where $\Phi_{\text{swap}}$ contains all linear transformations $\phi : \Delta_m \to \Delta_m$. A fundamental result for learning in games is that no-Φ-regret learning dynamics converge to an approximate Φ-equilibrium [GJ03].

Theorem 1 ([GJ03]). If each player $i$'s $\Phi_i$-regret is upper bounded by $\mathrm{Reg}^T_{\Phi_i}$, then the empirical distribution of the strategy profiles played is a $\big(\max_{i \in [n]} \mathrm{Reg}^T_{\Phi_i}/T\big)$-approximate Φ-equilibrium.
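As a concrete reading of Definition 2 and Theorem 1, the following Python sketch computes the Φ-regret of a played sequence against a finite Φ and the resulting per-round equilibrium gap; the quadratic losses, the strategy set [0, 1], and the function names are our illustrative choices, not the paper's.

```python
import numpy as np

def phi_regret(xs, losses, phis):
    """Phi-regret of a play sequence `xs` against loss functions `losses`
    for a finite set of modification maps `phis` (Definition 2)."""
    return max(
        sum(f(x) - f(phi(x)) for x, f in zip(xs, losses))
        for phi in phis
    )

# Toy instance: losses f^t(x) = (x - c_t)^2 on X = [0, 1].
rng = np.random.default_rng(0)
cs = rng.uniform(0, 1, size=100)
losses = [lambda x, c=c: (x - c) ** 2 for c in cs]
xs = list(np.clip(np.cumsum(rng.normal(0, 0.05, size=100)) + 0.5, 0, 1))

# Constant modifications recover external regret; adding non-constant maps
# (here, a small shift clipped back into X) strengthens the benchmark.
phis = [lambda x, a=a: a for a in np.linspace(0, 1, 11)]
phis.append(lambda x: np.clip(x - 0.1, 0, 1))

reg = phi_regret(xs, losses, phis)
print("Phi-regret:", reg, " per-round gap as in Theorem 1:", reg / len(xs))
```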
3 Tractable Φ-Equilibrium for Finite Φ via Sampling

In this section, we revisit the problem of computing and learning a Φ-equilibrium in non-concave games when each player's set of strategy modifications $\Phi_{\mathcal{X}_i}$ is finite. The pioneering work of Stoltz and Lugosi [SL07] gives a no-Φ-regret algorithm for this case, in which each player chooses a distribution over strategies in each round. This result also implies convergence to a Φ-equilibrium. However, the algorithm of Stoltz and Lugosi [SL07] is not computationally efficient. In each iteration, it requires computing a distribution that is stationary under a transformation that can be represented as a mixture of the modifications in Φ. The existence of such a stationary distribution is guaranteed by the Schauder-Cauty fixed-point theorem [Cau01], but the distribution may require exponentially large support and be intractable to find. Our main result in this section is an efficient Φ-regret minimization algorithm (Algorithm 1) that circumvents the exact computation of a stationary distribution. Consequently, our algorithm also ensures efficient convergence to a Φ-equilibrium when adopted by all players.

Algorithm 1: Φ-regret minimization for non-concave rewards via sampling
Input: $x_{\text{root}} \in \mathcal{X}$, $h \ge 2$, an external-regret minimization algorithm $\mathcal{R}_\Phi$ over Φ
Output: A Φ-regret minimization algorithm for $\mathcal{X}$
1 function NEXTSTRATEGY()
2   $p^t \leftarrow \mathcal{R}_\Phi$.NEXTSTRATEGY()  (note that $p^t$ is a distribution over Φ)
3   return $x^t \leftarrow$ SAMPLESTRATEGY($x_{\text{root}}, h, p^t$)
4 function OBSERVEREWARD($u^t(\cdot)$)
5   Set $u^t_\Phi(\phi) = u^t(\phi(x^t))$ for all $\phi \in \Phi$.
6   $\mathcal{R}_\Phi$.OBSERVEREWARD($u^t_\Phi(\cdot)$)

Algorithm 2: SAMPLESTRATEGY
Input: $x_{\text{root}} \in \mathcal{X}$, $h \ge 2$, $p^t \in \Delta(\Phi)$
Output: $x \in \mathcal{X}$
1 $x_1 \leftarrow x_{\text{root}}$
2 for $2 \le k \le h$ do
3   $\phi \leftarrow$ sample from Φ according to $p^t$
4   $x_k \leftarrow \phi(x_{k-1})$
5 return $x$ drawn from $\{x_1, \ldots, x_h\}$ uniformly at random

Theorem 2. Let $\mathcal{X}$ be a strategy set (not necessarily compact or convex), Φ be an arbitrary finite set of strategy modifications over $\mathcal{X}$, and $u^1(\cdot), \ldots, u^T(\cdot)$ be an arbitrary sequence of possibly non-concave reward functions from $\mathcal{X}$ to $[0, 1]$. If we instantiate Algorithm 1 with $\mathcal{R}_\Phi$ being the Hedge algorithm over Φ and $h = \sqrt{T}$, then with probability at least $1 - \beta$ the algorithm produces a sequence of strategies $x^1, \ldots, x^T$ with Φ-regret at most
$$\max_{\phi \in \Phi} \sum_{t=1}^T u^t(\phi(x^t)) - \sum_{t=1}^T u^t(x^t) \le 8\sqrt{T\big(\log |\Phi| + \log(1/\beta)\big)}.$$
Moreover, the algorithm runs in time $O(\sqrt{T}\,|\Phi|)$ per iteration. If all players in a non-concave continuous game employ Algorithm 1, then with probability at least $1 - \beta$, for any ε > 0, the empirical distribution of strategy profiles played forms an ε-approximate $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$-equilibrium after $\mathrm{poly}\left(\tfrac{1}{\varepsilon}, \log \max_i |\Phi_{\mathcal{X}_i}|, \log \tfrac{n}{\beta}\right)$ iterations.

High-level ideas. We adopt the framework of [SL07]. The framework contains two steps in each iteration $t$: (1) the learner runs a no-external-regret algorithm over Φ, which outputs $p^t \in \Delta(\Phi)$; (2) the learner plays a stationary distribution $\mu^t = \sum_{\phi \in \Phi} p^t(\phi)\, \phi(\mu^t)$, where we slightly abuse notation and write $\phi(\mu^t)$ for the image measure of $\mu^t$ under $\phi$. However, it is unclear how to compute the stationary distribution $\mu^t$ efficiently. We provide a computationally efficient way to carry out step (2) without computing this stationary distribution. We first construct an approximately stationary distribution by recursively applying strategy modifications from Φ. The constructed distribution can be viewed as a tree. Our construction is inspired by the recent work of Zhang, Anagnostides, Farina, and Sandholm [Zha+24] for concave games. The main difference here is that for non-concave games, the distribution needs to be approximately stationary with respect to a mixture of strategy modifications rather than a single one as in concave games. Consequently, this leads to an approximately stationary distribution with prohibitively large support size $|\Phi|^{\sqrt{T}}$, as opposed to $\sqrt{T}$ in [Zha+24] for concave games. Despite the exponentially large support size of the distribution, we exploit its tree structure to design a simple and efficient sampling procedure that runs in time $\sqrt{T}$. Equipped with such a sampling procedure, we obtain an efficient randomized algorithm that generates a sequence of strategies whose Φ-regret is, with high probability, at most $O(\sqrt{T \log |\Phi|})$. We defer the full proof of Theorem 2 to Section 3.1.

An extension of Theorem 2 to infinite Φ holds when the rewards $\{u^t\}_{t \in [T]}$ are G-Lipschitz and Φ admits an α-cover of size $N(\alpha)$. In particular, when Φ is the set of all M-Lipschitz functions over $[0, 1]^d$, Φ admits an α-cover with $\log N(\alpha)$ of order $(1/\alpha)^d$ [SL07]. In this case, we have the following.

Corollary 1. There is a randomized algorithm such that, with probability at least $1 - \beta$, the Φ-regret is bounded by $c\, T^{\frac{d+1}{d+2}} \log(1/\beta)$, where $c$ depends only on $G$ and $M$. The algorithm runs in time $\mathrm{poly}\big(T, N(T^{-1/(d+2)})\big)$.
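A compact Python sketch of Algorithms 1 and 2 is given below: Hedge over the finite set Φ plays the role of $\mathcal{R}_\Phi$, and sampling walks a random root-to-node path of the implicit |Φ|-ary tree. The class and method names are ours, and the depth and Hedge step size follow Theorem 2 only up to constants; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

class PhiRegretLearner:
    """Sketch of Algorithms 1-2: Hedge over a finite set of modification maps `phis`,
    plus sampling from the (implicit) approximately stationary tree distribution."""

    def __init__(self, phis, x_root, horizon, rng=None):
        self.phis = phis
        self.x_root = x_root
        self.h = max(2, int(np.sqrt(horizon)))           # tree depth h ~ sqrt(T)
        self.eta = np.sqrt(np.log(len(phis)) / horizon)  # Hedge step size (up to constants)
        self.w = np.zeros(len(phis))                     # log-weights of Hedge
        self.rng = rng or np.random.default_rng()
        self.x_t = None

    def next_strategy(self):
        p = np.exp(self.w - self.w.max())
        p /= p.sum()
        # SAMPLESTRATEGY: walk down a random root-to-leaf path of the |Phi|-ary tree,
        # then return a node at a uniformly random depth.
        path = [self.x_root]
        for _ in range(self.h - 1):
            phi = self.phis[self.rng.choice(len(self.phis), p=p)]
            path.append(phi(path[-1]))
        self.x_t = path[self.rng.integers(self.h)]
        return self.x_t

    def observe_reward(self, u_t):
        # Feed Hedge the reward each modification would have obtained at x^t.
        rewards = np.array([u_t(phi(self.x_t)) for phi in self.phis])
        self.w += self.eta * rewards
```

Instantiating one such learner per player and taking the empirical distribution of the jointly played profiles then yields, by Theorem 1, an approximate Φ-equilibrium.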
3.1 Proof of Theorem 2

For a distribution $\mu \in \Delta(\mathcal{X})$ over the strategy space $\mathcal{X}$, we slightly abuse notation and define its expected utility as $u^t(\mu) := \mathbb{E}_{x \sim \mu}[u^t(x)] \in [0, 1]$. We write $\phi(\mu)$ for the image of $\mu$ under the transformation $\phi$. In each iteration $t$, the learner chooses their strategy $x^t \in \mathcal{X}$ according to the distribution $\mu^t$. For a sequence of strategies $\{x^t\}_{t \in [T]}$, the Φ-regret is $\mathrm{Reg}^T_\Phi := \max_{\phi \in \Phi} \big\{ \sum_{t=1}^T (u^t(\phi(x^t)) - u^t(x^t)) \big\}$. Algorithm 1 uses an external-regret minimization algorithm $\mathcal{R}_\Phi$ over Φ, which outputs a distribution $p^t \in \Delta(\Phi)$. We can then decompose the Φ-regret into two parts:
$$\mathrm{Reg}^T_\Phi = \underbrace{\max_{\phi\in\Phi}\sum_{t=1}^T\Big(u^t(\phi(x^t)) - \mathbb{E}_{\phi'\sim p^t}\big[u^t(\phi'(x^t))\big]\Big)}_{\text{I: external regret over }\Phi} + \underbrace{\sum_{t=1}^T\Big(\mathbb{E}_{\phi'\sim p^t}\big[u^t(\phi'(x^t))\big] - u^t(x^t)\Big)}_{\text{II: approximation error of the stationary distribution}}.$$

I: Bounding the external regret over Φ. The external regret over Φ can be bounded directly. This is equivalent to an online experts problem: in each iteration $t$, the external regret minimizer $\mathcal{R}_\Phi$ chooses $p^t \in \Delta(\Phi)$, and the adversary then determines the utility of each expert $\phi \in \Phi$ as $u^t(\phi(x^t))$. We choose the external regret minimizer $\mathcal{R}_\Phi$ to be the Hedge algorithm [FS99]. Then we have
$$\max_{\phi\in\Phi}\sum_{t=1}^T\Big(u^t(\phi(x^t)) - \mathbb{E}_{\phi'\sim p^t}\big[u^t(\phi'(x^t))\big]\Big) \le 2\sqrt{T\log|\Phi|} \qquad \text{(Theorem 6)},$$
where we use the fact that the utility function $u^t$ is bounded in $[0, 1]$.

II: Bounding the error due to sampling from an approximately stationary distribution. We first define a distribution $\mu^t$ over $\mathcal{X}$ using a complete $|\Phi|$-ary tree of depth $h$. The root of this tree is an arbitrary strategy $x_{\text{root}} \in \mathcal{X}$. Each internal node $x$ has exactly $|\Phi|$ children, denoted $\{\phi(x)\}_{\phi \in \Phi}$. The distribution $\mu^t$ is supported on the nodes of this tree. Next, we define the probability of each node under $\mu^t$. The root node $x_{\text{root}}$ receives probability $\frac{1}{h}$. The probabilities of the other nodes are defined recursively: every node $x = \phi(x_p)$, where $x_p$ is its parent, receives probability $\Pr_{\mu^t}[x] = \Pr_{\mu^t}[x_p]\, p^t(\phi)$. It is then clear that the total probability of the children of a node $x_p$ equals $\Pr_{\mu^t}[x \text{ is a child of } x_p] = \sum_{\phi\in\Phi} \Pr_{\mu^t}[x_p]\, p^t(\phi) = \Pr_{\mu^t}[x_p]$. Denote the set of nodes at depth $k$ by $N_k$. We have $\Pr_{\mu^t}[x \in N_k] = \frac{1}{h}$ for every depth $1 \le k \le h$. Thus $\mu^t$ is supported on $\frac{|\Phi|^h - 1}{|\Phi| - 1}$ points and is well defined. By the construction above, the strategy $x^t$ output by Algorithm 2 is a sample from $\mu^t$.

Now we show that the approximation error of $\mu^t$ is bounded by $O(\frac{1}{h})$. We can evaluate this error as
$$\mathbb{E}_{\phi\sim p^t}\big[u^t(\phi(\mu^t))\big] - u^t(\mu^t) = \sum_{k=1}^{h}\left(\mathbb{E}_{\phi\sim p^t}\Big[\sum_{x\in N_k}\Pr_{\mu^t}[x]\,u^t(\phi(x))\Big] - \sum_{x\in N_k}\Pr_{\mu^t}[x]\,u^t(x)\right).$$
We recall that for a node $x = \phi(x_p)$ with parent $x_p$, we have $\Pr_{\mu^t}[x] = \Pr_{\mu^t}[x_p]\, p^t(\phi)$. Thus, for any $1 \le k \le h-1$,
$$\mathbb{E}_{\phi\sim p^t}\Big[\sum_{x\in N_k}\Pr_{\mu^t}[x]\,u^t(\phi(x))\Big] = \sum_{\phi\in\Phi} p^t(\phi) \sum_{x\in N_k}\Pr_{\mu^t}[x]\,u^t(\phi(x)) = \sum_{x\in N_{k+1}}\Pr_{\mu^t}[x]\,u^t(x).$$
Using the above equality, the sum telescopes and we get
$$\mathbb{E}_{\phi\sim p^t}\big[u^t(\phi(\mu^t))\big] - u^t(\mu^t) = \mathbb{E}_{\phi\sim p^t}\Big[\sum_{x\in N_h}\Pr_{\mu^t}[x]\,u^t(\phi(x))\Big] - \Pr_{\mu^t}[x_{\text{root}}]\,u^t(x_{\text{root}}) \le \frac{1}{h},$$
where in the last inequality we use the fact that $\sum_{x\in N_k}\Pr_{\mu^t}[x] = \frac{1}{h}$ for all $1\le k\le h$ and that the utility function $u^t$ is bounded in $[0, 1]$. Therefore, for $x^t \sim \mu^t$, the sequence of random variables $\big\{ \sum_{t=1}^{\tau} \big( \mathbb{E}_{\phi \sim p^t}[u^t(\phi(x^t))] - u^t(x^t) - \frac{1}{h} \big) \big\}_{\tau \ge 1}$ is a supermartingale.
Thanks to the boundedness of the utility function, we can apply the Hoeffding-Azuma inequality and get, for any ε > 0,
$$\Pr\left[\sum_{t=1}^T \left( \mathbb{E}_{\phi \sim p^t}\big[u^t(\phi(x^t))\big] - u^t(x^t) - \frac{1}{h} \right) \ge \varepsilon \right] \le \exp\left(-\frac{\varepsilon^2}{8T}\right).$$
Combining the bounds for I and II with $\varepsilon = \sqrt{8T\log(1/\beta)}$ and $h = \sqrt{T}$, we have, with probability at least $1 - \beta$,
$$\mathrm{Reg}^T_\Phi \le 2\sqrt{T\log|\Phi|} + \frac{T}{h} + \sqrt{8T\log(1/\beta)} \le 8\sqrt{T\big(\log|\Phi| + \log(1/\beta)\big)}.$$

Convergence to Φ-equilibrium. If all players in a non-concave continuous game employ Algorithm 1, then for each player $i$, with probability at least $1 - \frac{\beta}{n}$, their $\Phi_{\mathcal{X}_i}$-regret is upper bounded by $8\sqrt{T(\log|\Phi_{\mathcal{X}_i}| + \log(n/\beta))}$. By a union bound over all $n$ players, with probability at least $1 - \beta$, every player $i$'s $\Phi_{\mathcal{X}_i}$-regret is upper bounded by $8\sqrt{T(\log|\Phi_{\mathcal{X}_i}| + \log(n/\beta))}$. By Theorem 1, the empirical distribution of strategy profiles played forms an ε-approximate $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$-equilibrium as long as $T \ge 64\big(\log(\max_i |\Phi_{\mathcal{X}_i}|) + \log(n/\beta)\big)/\varepsilon^2$.

4 Approximate Φ-Equilibria under Infinite Local Strategy Modifications

This section studies Φ-equilibria when |Φ| is infinite. In general, it is computationally hard to compute a Φ-equilibrium even when Φ contains only constant deviations. Instead, we focus on sets Φ that consist solely of local strategy modifications. We introduce several natural classes of local strategy modifications and provide efficient online learning algorithms that converge to an ε-approximate Φ-equilibrium in the first-order stationary regime, where ε = Ω(δ²L). These approximate Φ-equilibria guarantee first-order stability.

Definition 3 (δ-local strategy modification). For each agent $i$, we call a set of strategy modifications $\Phi_{\mathcal{X}_i}$ δ-local if $\|\phi_i(x) - x\| \le \delta$ for all $x \in \mathcal{X}_i$ and $\phi_i \in \Phi_{\mathcal{X}_i}$. We write $\Phi_{\mathcal{X}_i}(\delta)$ for a δ-local strategy modification set of agent $i$, and $\Phi(\delta) = \prod_{i=1}^n \Phi_{\mathcal{X}_i}(\delta)$ for a profile of δ-local strategy modification sets.

Below we present a useful reduction from computing an ε-approximate Φ(δ)-equilibrium in non-concave smooth games to $\Phi_{\mathcal{X}_i}(\delta)$-regret minimization against convex losses, for any $\varepsilon \ge \frac{\delta^2 L}{2}$. The key observation is that the L-smoothness of the utility function permits, within a δ-neighborhood, a $\frac{\delta^2 L}{2}$-approximation by a linear function. We defer the proof to Appendix D.

Lemma 1 (No Φ(δ)-Regret for Convex Losses to Approximate Φ(δ)-Equilibrium in Non-Concave Games). For any $T \ge 1$ and δ > 0, let $\mathcal{A}$ be an algorithm that guarantees at most $\mathrm{Reg}^T_{\Phi_{\mathcal{X}_i}(\delta)}$ $\Phi_{\mathcal{X}_i}(\delta)$-regret on convex loss functions for each agent $i \in [n]$. Then:
1. The $\Phi_{\mathcal{X}_i}(\delta)$-regret of $\mathcal{A}$ on non-convex, L-smooth losses is at most $\mathrm{Reg}^T_{\Phi_{\mathcal{X}_i}(\delta)} + \frac{\delta^2 L T}{2}$.
2. When every agent employs $\mathcal{A}$ in a non-concave L-smooth game, the empirical distribution of the joint strategies played converges to a $\big(\max_{i \in [n]} \mathrm{Reg}^T_{\Phi_{\mathcal{X}_i}(\delta)}/T + \frac{\delta^2 L}{2}\big)$-approximate Φ(δ)-equilibrium.
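The smoothness estimate behind Lemma 1 is worth spelling out (our phrasing of a standard Taylor-type bound, not a quotation of Appendix D): for an L-smooth loss $f$ and a δ-local deviation $\phi$,
$$\big| f(\phi(x)) - f(x) - \langle \nabla f(x),\, \phi(x) - x \rangle \big| \le \tfrac{L}{2}\,\|\phi(x) - x\|^2 \le \tfrac{\delta^2 L}{2},
\quad\text{hence}\quad
f(x^t) - f(\phi(x^t)) \le \langle \nabla f(x^t),\, x^t - \phi(x^t) \rangle + \tfrac{\delta^2 L}{2}.$$
Summing over $t$ bounds the Φ(δ)-regret on the non-convex losses by the Φ(δ)-regret on the linear (hence convex) surrogates $x \mapsto \langle \nabla f^t(x^t), x\rangle$ plus $\frac{\delta^2 L T}{2}$, which is exactly the overhead appearing in Lemma 1.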
4.1 Projection-Based Local Strategy Modifications

In this section, we study a set of local strategy modifications based on projection. Specifically, the set $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$ contains all deviations that add a fixed displacement vector to the input strategy and project back to the feasible set:
$$\Phi_{\text{Proj}}^{\mathcal{X}}(\delta) := \left\{ \phi_{\text{Proj},v}(x) = \Pi_{\mathcal{X}}[x - v] : v \in \mathcal{B}_d(\delta) \right\}.$$
It is clear that $\|\phi_v(x) - x\| \le \|v\| \le \delta$. The induced $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret is
$$\mathrm{Reg}^T_{\text{Proj},\delta} := \max_{v \in \mathcal{B}_d(\delta)} \sum_{t=1}^T \left( f^t(x^t) - f^t\big(\Pi_{\mathcal{X}}[x^t - v]\big) \right).$$
We also define $\Phi_{\text{Proj}}(\delta) = \prod_{i=1}^n \Phi_{\text{Proj}}^{\mathcal{X}_i}(\delta)$. By Lemma 1, to achieve convergence to an approximate $\Phi_{\text{Proj}}(\delta)$-equilibrium, it suffices to minimize $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret against convex losses. However, to our knowledge, no existing algorithm minimizes $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret even in the convex case. In fact, external regret and $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret are provably incomparable: a sequence of actions may suffer high $\mathrm{Reg}^T$ but low $\mathrm{Reg}^T_{\text{Proj},\delta}$ (Example 3), or vice versa (Example 4). At a high level, external regret competes against a fixed action, whereas $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret is more akin to dynamic regret, competing with a sequence of varying actions. Despite this, surprisingly, we show that classical algorithms like Online Gradient Descent (GD) and Optimistic Gradient (OG), known for minimizing external regret, also attain near-optimal $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret. We defer the examples and missing proofs to Appendix E.

$\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-Regret Minimization in the Adversarial Setting. We show that (GD) enjoys an $O(G\sqrt{\delta D_{\mathcal{X}} T})$ $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret despite the difference between external regret and $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret. First, let us recall the update rule of GD: given an initial point $x^1 \in \mathcal{X}$ and step size η > 0, GD updates in each iteration $t$:
$$x^{t+1} = \Pi_{\mathcal{X}}\big[x^t - \eta \nabla f^t(x^t)\big]. \tag{GD}$$
The key step in our analysis of GD is simple but novel and general (see Appendix E.2). We extend the analysis to the Optimistic Gradient (OG) algorithm in Appendix E.4.

Theorem 3. Let δ > 0 and $T \in \mathbb{N}$. For any convex and G-Lipschitz loss functions $\{f^t : \mathcal{X} \to \mathbb{R}\}_{t \in [T]}$, the $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret of (GD) with an optimally chosen step size $\eta = \Theta\big(\sqrt{\delta(\delta + D_{\mathcal{X}})}/(G\sqrt{T})\big)$ satisfies $\mathrm{Reg}^T_{\text{Proj},\delta} \le 2G\sqrt{\delta(\delta + D_{\mathcal{X}})T}$. For any δ > 0 and any ε > 0, when all players employ GD in a smooth game, the empirical distribution of played strategy profiles converges to an $\big(\varepsilon + \frac{\delta^2 L}{2}\big)$-approximate $\Phi_{\text{Proj}}(\delta)$-equilibrium in $O(1/\varepsilon^2)$ iterations.

Remark 1. The $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret can also be viewed as dynamic regret [Zin03] with changing comparators $\{p^t := \Pi_{\mathcal{X}}[x^t - v]\}$. However, our analysis does not follow from the standard $O\big(\frac{1+P_T}{\eta} + \eta T\big)$ dynamic regret bound of GD [Zin03], since $P_T$, defined as $\sum_{t=2}^T \|p^t - p^{t-1}\|$, can be $\Omega(\eta T)$.

Lower bounds for $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret. We complement our upper bound with two lower bounds for $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret minimization. The first is an $\Omega(\delta G\sqrt{T})$ lower bound for any online learning algorithm against linear loss functions (Theorem 7). The second is an $\Omega(\delta^2 L T)$ lower bound against L-smooth non-convex losses for any algorithm that satisfies the linear span assumption (Proposition 1), which holds for GD and OG. Combined with Lemma 1, this lower bound suggests that GD attains nearly optimal $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret, even in the non-convex setting, among a natural family of gradient-based algorithms. We defer the theorem statements and detailed discussion to Appendix E.3.

Improved $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-Regret in the Game Setting. We complement the $\Omega(\sqrt{T})$ lower bound in the adversarial setting by considering the game setting, where players interact with each other using the same algorithm; this setting has been extensively studied for concave games [Syr+15; CP20; DFG21; Ana+22a; Ana+22b; Far+22a]. We prove an improved $O(T^{1/4})$ individual $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret bound for OG (Theorem 9). We defer the details to Appendix E.4.
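To illustrate the update rule (GD) and the quantity $\mathrm{Reg}^T_{\text{Proj},\delta}$ it is measured by, the sketch below runs projected gradient descent on convex quadratic losses and estimates the $\Phi_{\text{Proj}}^{\mathcal{X}}(\delta)$-regret by searching over sampled deviation vectors $v$ with $\|v\| \le \delta$; the random-search estimate (a lower bound on the exact maximum over the ball), the unit-ball strategy set, and all function names are our own illustrative choices.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # X = Euclidean ball of the given radius (a simple convex, compact strategy set).
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def gd_iterates(grad_fn, x1, eta, T):
    """Projected online gradient descent: x^{t+1} = Pi_X[x^t - eta * grad f^t(x^t)]."""
    xs, x = [], x1
    for t in range(T):
        xs.append(x)
        x = project_ball(x - eta * grad_fn(t, x))
    return xs

def proj_phi_regret(xs, losses, delta, num_dirs=200, rng=None):
    """Estimate Reg^T_{Proj,delta} = max_{||v||<=delta} sum_t f^t(x^t) - f^t(Pi_X[x^t - v])
    by sampling candidate deviation vectors v (an estimate, not the exact maximum)."""
    rng = rng or np.random.default_rng(0)
    best, d = 0.0, xs[0].shape[0]
    for _ in range(num_dirs):
        v = rng.normal(size=d)
        v *= delta / np.linalg.norm(v)
        gain = sum(f(x) - f(project_ball(x - v)) for x, f in zip(xs, losses))
        best = max(best, gain)
    return best

# Toy run: convex quadratic losses f^t(x) = ||x - c_t||^2 on the unit ball.
T, d, delta = 200, 2, 0.1
rng = np.random.default_rng(1)
cs = rng.uniform(-1, 1, size=(T, d))
losses = [lambda x, c=c: float(np.dot(x - c, x - c)) for c in cs]
grads = lambda t, x: 2 * (x - cs[t])
xs = gd_iterates(grads, np.zeros(d), eta=0.05, T=T)
print("estimated Phi_Proj(delta)-regret:", proj_phi_regret(xs, losses, delta))
```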
4.2 Convex Combination of Finite Local Strategy Modifications

This section considers Conv(Φ), where Φ is a finite set of local strategy modifications. The (infinite) set of strategy modifications Conv(Φ) is defined as
$$\mathrm{Conv}(\Phi) = \left\{ \phi_p(x) = \sum_{\phi \in \Phi} p(\phi)\, \phi(x) : p \in \Delta(\Phi) \right\}.$$
Our main result is an efficient algorithm (Algorithm 3) that guarantees convergence to an ε-approximate Conv(Φ)-equilibrium in a smooth game satisfying Assumption 1 for any ε > δ²L. Due to space constraints, we defer Algorithm 3 and the proof to Appendix F.

Theorem 4. Let $\mathcal{X}$ be a convex and compact set, Φ be an arbitrary finite set of δ-local strategy modification functions for $\mathcal{X}$, and $u^1(\cdot), \ldots, u^T(\cdot)$ be a sequence of G-Lipschitz and L-smooth but possibly non-concave reward functions from $\mathcal{X}$ to $[0, 1]$. If we instantiate Algorithm 3 with $\mathcal{R}_\Phi$ being the Hedge algorithm over $\Delta(\Phi)$ and $K = \sqrt{T}$, the algorithm guarantees that, with probability at least $1 - \beta$, it produces a sequence of strategies $x^1, \ldots, x^T$ with Conv(Φ)-regret at most $8\sqrt{T}\big(\sqrt{\log |\Phi|} + \sqrt{\log(1/\beta)}\big) + \delta^2 L T$. The algorithm runs in time $O(\sqrt{T}\,|\Phi|)$ per iteration. If all players in a non-concave smooth game employ Algorithm 3, then with probability at least $1 - \beta$, for any ε > 0, the empirical distribution of strategy profiles played forms an $(\varepsilon + \delta^2 L)$-approximate Conv(Φ)-equilibrium, where $\Phi = \prod_{i=1}^n \Phi_{\mathcal{X}_i}$, after $\mathrm{poly}\left(\tfrac{1}{\varepsilon}, G, \log \max_i |\Phi_{\mathcal{X}_i}|, \log \tfrac{n}{\beta}\right)$ iterations.

4.3 Interpolation-Based Local Strategy Modifications

We introduce a natural set of local strategy modifications and the corresponding local equilibrium notion. Given any set of (possibly non-local) strategy modifications $\Psi = \{\psi : \mathcal{X} \to \mathcal{X}\}$, we define a set of local strategy modifications as follows: for $\delta \le D_{\mathcal{X}}$ and $\lambda \in [0, 1]$, each strategy modification $\phi_{\lambda,\psi}$ interpolates the input strategy $x$ with the modified strategy $\psi(x)$:
$$\Phi_{\text{Int},\Psi}^{\mathcal{X}}(\delta) := \left\{ \phi_{\lambda,\psi}(x) := (1-\lambda)x + \lambda\psi(x) : \psi \in \Psi,\ \lambda \le \delta / D_{\mathcal{X}} \right\}.$$
Note that for any $\psi \in \Psi$ and $\lambda \le \frac{\delta}{D_{\mathcal{X}}}$, we have $\|\phi_{\lambda,\psi}(x) - x\| = \lambda\|x - \psi(x)\| \le \delta$, respecting the locality constraint. The induced $\Phi_{\text{Int},\Psi}^{\mathcal{X}}(\delta)$-regret can be written as
$$\mathrm{Reg}^T_{\text{Int},\Psi,\delta} := \max_{\psi \in \Psi,\ \lambda \le \delta/D_{\mathcal{X}}} \sum_{t=1}^T \left( f^t(x^t) - f^t\big((1-\lambda)x^t + \lambda\psi(x^t)\big) \right).$$
To guarantee convergence to the corresponding Φ-equilibrium, it suffices to minimize $\Phi_{\text{Int},\Psi}^{\mathcal{X}}(\delta)$-regret against convex losses, which we show further reduces to Ψ-regret minimization against convex losses (Theorem 10 in Appendix G).

CCE-like Instantiation. In the special case where Ψ contains only constant strategy modifications (i.e., $\psi(x) = x'$ for all $x$), we get a coarse correlated equilibrium (CCE)-like instantiation of local equilibrium, which limits the gain from interpolating with any fixed strategy. We denote the resulting set of local strategy modifications simply by $\Phi_{\text{Int}}^{\mathcal{X}}(\delta)$. We can apply any no-external-regret algorithm for efficient $\Phi_{\text{Int}}^{\mathcal{X}}(\delta)$-regret minimization and computation of ε-approximate $\Phi_{\text{Int}}(\delta)$-equilibria in the first-order stationary regime, as summarized in Theorem 5. We also discuss faster convergence rates in the game setting in Appendix G.

Theorem 5. For the Online Gradient Descent algorithm (GD) [Zin03] with step size $\eta = \frac{D_{\mathcal{X}}}{G\sqrt{T}}$, the $\Phi_{\text{Int}}^{\mathcal{X}}(\delta)$-regret is at most $2\delta G\sqrt{T}$. Furthermore, for any δ > 0 and any $\varepsilon > \frac{\delta^2 L}{2}$, when all players employ the GD algorithm in a smooth game, the empirical distribution of played strategy profiles converges to an $\big(\varepsilon + \frac{\delta^2 L}{2}\big)$-approximate $\Phi_{\text{Int}}(\delta)$-equilibrium in $O(1/\varepsilon^2)$ iterations.

5 Discussion and Future Directions

Lower Bound in the Global Regime. When δ equals the diameter of the strategy set, it is NP-hard to compute an ε-approximate Φ(δ)-equilibrium (for Φ(δ) = Φ_Proj(δ) or Φ_Int(δ)), even when ε = Θ(1) and G, L = O(poly(d)). Moreover, given black-box access to value and gradient queries, finding such equilibria requires a number of queries that is exponential in at least one of the parameters d, G, L, 1/ε. These results are presented as Theorem 12 and Theorem 13 in Appendix I.

More Efficient Φ-Equilibria. We defer to Appendix H a discussion of another natural class of local strategy modifications, based on beam search, for which GD suffers linear regret.
This result shows that even for simple local strategy modification sets Φ(δ), the landscape of efficient local Φ(δ)-regret minimization is already quite rich and many questions remain open. A fruitful future direction is to identify more classes of Φ that admit efficient regret minimization. Acknowledgements We thank the anonymous reviewers for their constructive comments that improves the paper. Y.C. acknowledge the support from the NSF Awards CCF-1942583 (CAREER) and CCF-2342642. W.Z. was supported by the NSF Awards CCF-1942583 (CAREER), CCF-2342642, and a Research Fellowship from the Center for Algorithms, Data, and Market Design at Yale (CADMY). H.L. was supported by NSF award IIS-1943607. [AD54] Kenneth J Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy . In: Econometrica (1954), pp. 265 290. [AFS23] Ioannis Anagnostides, Gabriele Farina, and Tuomas Sandholm. Near-Optimal Phi Regret Learning in Extensive-Form Games . In: International Conference on Machine Learning (ICML). 2023. [AGH19] Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in non-convex games with an optimization oracle . In: Conference on Learning Theory. PMLR. 2019, pp. 18 29. [AHK12] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications . In: Theory of computing 8.1 (2012), pp. 121 164. [ALW21] Jacob Abernethy, Kevin A Lai, and Andre Wibisono. Last-iterate convergence rates for min-max optimization: Convergence of hamiltonian gradient descent and consensus optimization . In: Algorithmic Learning Theory. PMLR. 2021, pp. 3 47. [Ana+22a] Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, and Tuomas Sandholm. Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games . In: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC). 2022. [Ana+22b] Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, and Tuomas Sandholm. Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games . In: Advances in Neural Information Processing Systems (Neur IPS). 2022. [Aue+02] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem . In: SIAM journal on computing 32.1 (2002), pp. 48 77. [AZ22] Amir Ali Ahmadi and Jeffrey Zhang. On the complexity of finding a local minimizer of a quadratic function over a polytope . In: Mathematical Programming 195.1 (2022), pp. 783 792. [AZF19] Sergul Aydore, Tianhao Zhu, and Dean P Foster. Dynamic local regret for non-convex online forecasting . In: Advances in neural information processing systems 32 (2019). [Bai+22] Yu Bai, Chi Jin, Song Mei, Ziang Song, and Tiancheng Yu. Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent . In: Advances in Neural Information Processing Systems 35 (2022), pp. 22313 22325. [Ber+23] Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Francesco Trovò, and Nicola Gatti. Constrained Phi-Equilibria . In: International Conference on Machine Learning. 2023. [Bla56] David Blackwell. An analog of the minimax theorem for vector payoffs. In: Pacific Journal of Mathematics 6.1 (Jan. 1956). Publisher: Pacific Journal of Mathematics, A Non-profit Corporation, pp. 1 8. [Bro51] George W Brown. Iterative solution of games by fictitious play . In: Act. Anal. Prod Allocation 13.1 (1951), p. 374. [Bub+15] Sébastien Bubeck et al. 
Convex optimization: Algorithms and complexity . In: Foundations and Trends in Machine Learning 8.3-4 (2015), pp. 231 357. [Cau01] Robert Cauty. Solution du problème de point fixe de Schauder . en. In: Fundamenta Mathematicae 170 (2001). Publisher: Instytut Matematyczny Polskiej Akademii Nauk, pp. 231 246. [CDT09] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria . In: Journal of the ACM (JACM) 56.3 (2009), pp. 1 57. [CL06] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006. [CP20] Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets . In: Advances in Neural Information Processing Systems (Neur IPS) 33 (2020), pp. 18990 18999. [CZ23] Yang Cai and Weiqiang Zheng. Accelerated Single-Call Methods for Constrained Min-Max Optimization . In: International Conference on Learning Representations (ICLR) (2023). [Dag+24] Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. From External to Swap Regret 2.0: An Efficient Reduction for Large Action Spaces . In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing. 2024, pp. 1216 1222. [Das+23] Constantinos Daskalakis, Noah Golowich, Stratis Skoulakis, and Emmanouil Zampetakis. STay-ON-the-Ridge: Guaranteed Convergence to Local Minimax Equilibrium in Nonconvex-Nonconcave Games . In: The Thirty Sixth Annual Conference on Learning Theory. PMLR. 2023, pp. 5146 5198. [Das22] Constantinos Daskalakis. Non-Concave Games: A Challenge for Game Theory s Next 100 Years . In: Cowles Preprints (2022). [DDJ21] Jelena Diakonikolas, Constantinos Daskalakis, and Michael Jordan. Efficient methods for structured nonconvex-nonconcave min-max optimization . In: International Conference on Artificial Intelligence and Statistics (2021). [Deb52] Gerard Debreu. A social equilibrium existence theorem . In: Proceedings of the National Academy of Sciences 38.10 (1952), pp. 886 893. [DFG21] Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal noregret learning in general games . In: Advances in Neural Information Processing Systems (Neur IPS) (2021). [DGP09] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a Nash equilibrium . In: Communications of the ACM 52.2 (2009), pp. 89 97. [DP18] Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization . In: the 32nd Annual Conference on Neural Information Processing Systems (Neur IPS). 2018. [DSZ21] Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization . In: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC). 2021. [Fan53] Ky Fan. Minimax theorems . In: Proceedings of the National Academy of Sciences 39.1 (1953), pp. 42 47. [Far+22a] Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, and Tuomas Sandholm. Near-optimal no-regret learning dynamics for general convex games . In: Advances in Neural Information Processing Systems 35 (2022), pp. 39076 39089. [Far+22b] Gabriele Farina, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Simple Uncoupled No-regret Learning Dynamics for Extensive-form Correlated Equilibrium . In: Journal of the ACM 69.6 (2022). [FCR20] Tanner Fiez, Benjamin Chasnov, and Lillian Ratliff. 
Implicit learning dynamics in stackelberg games: Equilibria characterization, convergence analysis, and empirical study . In: International Conference on Machine Learning. PMLR. 2020, pp. 3133 3144. [FR21] Tanner Fiez and Lillian J Ratliff. Local convergence analysis of gradient descent ascent with finite timescale separation . In: Proceedings of the International Conference on Learning Representation. 2021. [FS99] Yoav Freund and Robert E. Schapire. Adaptive Game Playing Using Multiplicative Weights . In: Games and Economic Behavior 29 (1999), pp. 79 103. [Geo63] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1963. [GGM08] Geoffrey J Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games . In: Proceedings of the 25th international conference on Machine learning. 2008, pp. 360 367. [GJ03] Amy Greenwald and Amir Jafari. A general class of no-regret learning algorithms and game-theoretic equilibria . In: Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings. Springer. 2003, pp. 2 12. [Gli52] Irving L Glicksberg. A further generalization of the Kakutani fixed theorem, with application to Nash equilibrium points . In: Proceedings of the American Mathematical Society 3.1 (1952), pp. 170 174. [GLZ18] Xiand Gao, Xiaobo Li, and Shuzhong Zhang. Online learning with non-convex losses and non-stationary regret . In: International Conference on Artificial Intelligence and Statistics. PMLR. 2018, pp. 235 243. [GZL23] Ziwei Guan, Yi Zhou, and Yingbin Liang. Online Nonconvex Optimization with Limited Instantaneous Oracle Feedback . In: The Thirty Sixth Annual Conference on Learning Theory. PMLR. 2023, pp. 3328 3355. [Han57] James Hannan. Approximation to Bayes risk in repeated play . In: Contributions to the Theory of Games 3 (1957), pp. 97 139. [Hél+20] Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non-convex optimization with imperfect feedback . In: Advances in Neural Information Processing Systems 33 (2020), pp. 17224 17235. [HMC21a] Nadav Hallak, Panayotis Mertikopoulos, and Volkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach . In: International Conference on Machine Learning. PMLR. 2021, pp. 4008 4017. [HMC21b] Ya-Ping Hsieh, Panayotis Mertikopoulos, and Volkan Cevher. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets . In: International Conference on Machine Learning. PMLR. 2021, pp. 4337 4348. [HSZ17] Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in nonconvex games . In: International Conference on Machine Learning. PMLR. 2017, pp. 1433 1441. [JNJ20] Chi Jin, Praneeth Netrapalli, and Michael Jordan. What is local optimality in nonconvex-nonconcave minimax optimization? In: International conference on machine learning (ICML). PMLR. 2020, pp. 4880 4889. [Kar14] Samuel Karlin. Mathematical Methods and Theory in Games, Programming, and Economics: Volume 2: The Theory of Infinite Games. Elsevier, 2014. [Kar59] Samuel Karlin. Mathematical methods and theory in games, programming, and economics: Volume II: the theory of infinite games. Vol. 2. Addision-Wesley, 1959. [Kri+15] Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum . In: International Conference on Machine Learning. PMLR. 2015, pp. 824 832. 
[Mc K54] Lionel Mc Kenzie. On equilibrium in Graham s model of world trade and other competitive systems . In: Econometrica (1954), pp. 147 161. [MK87] Katta G. Murty and Santosh N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming . en. In: Mathematical Programming 39.2 (June 1987), pp. 117 129. [MM10] Odalric-Ambrym Maillard and Rémi Munos. Online learning in adversarial lipschitz environments . In: Joint european conference on machine learning and knowledge discovery in databases. Springer. 2010, pp. 305 320. [Mor+21a] Dustin Morrill, Ryan D Orazio, Reca Sarfati, Marc Lanctot, James R Wright, Amy R Greenwald, and Michael Bowling. Hindsight and sequential rationality of correlated play . In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. 6. 2021, pp. 5584 5594. [Mor+21b] Dustin Morrill, Ryan D Orazio, Marc Lanctot, James R Wright, Michael Bowling, and Amy R Greenwald. Efficient deviation types and learning for hindsight rationality in extensive-form games . In: International Conference on Machine Learning. PMLR. 2021, pp. 7818 7828. [MRS20] Eric Mazumdar, Lillian J Ratliff, and S Shankar Sastry. On gradient-based learning in continuous games . In: SIAM Journal on Mathematics of Data Science 2.1 (2020), pp. 103 131. [MV21] Oren Mangoubi and Nisheeth K Vishnoi. Greedy adversarial equilibrium: an efficient alternative to nonconvex-nonconcave min-max optimization . In: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. 2021, pp. 896 909. [MZ19] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions . In: Mathematical Programming 173 (2019), pp. 465 507. [Nas50] John F Nash Jr. Equilibrium points in n-person games . In: Proceedings of the national academy of sciences 36.1 (1950), pp. 48 49. [Neu28] J v. Neumann. Zur theorie der gesellschaftsspiele . In: Mathematische annalen 100.1 (1928), pp. 295 320. [Pet+22] Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq, and Volkan Cevherå. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems . In: International Conference on Learning Representations (ICLR). 2022. [Pil+22] Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, and Karl Tuyls. Evolutionary dynamics and phi-regret minimization in games . In: Journal of Artificial Intelligence Research 74 (2022), pp. 1125 1158. [PR24] Binghui Peng and Aviad Rubinstein. Fast swap regret minimization and applications to approximate correlated equilibria . In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing. 2024, pp. 1223 1234. [RBS16] Lillian J Ratliff, Samuel A Burden, and S Shankar Sastry. On the characterization of local Nash equilibria in continuous games . In: IEEE transactions on automatic control 61.8 (2016), pp. 2301 2307. [Rob51] Julia Robinson. An iterative method of solving a game . In: Annals of mathematics (1951), pp. 296 301. [Ros65] J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games . In: Econometrica (1965), pp. 520 534. [RS13] Sasha Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences . In: Advances in Neural Information Processing Systems (2013). [RST11] Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning: Beyond regret . In: Proceedings of the 24th Annual Conference on Learning Theory. JMLR Workshop and Conference Proceedings. 2011, pp. 
559 594. [Sha24] Dravyansh Sharma. No Internal Regret with Non-convex Loss Functions . In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. 13. 2024, pp. 14919 14927. [Sio58] Maurice Sion. On general minimax theorems. In: Pacific J. Math. 8.4 (1958), pp. 171 176. [SL07] Gilles Stoltz and Gábor Lugosi. Learning correlated equilibria in games with compact sets of strategies . In: Games and Economic Behavior 59.1 (2007), pp. 187 208. [SMB22] Ziang Song, Song Mei, and Yu Bai. Sample-efficient learning of correlated equilibria in extensive-form games . In: Advances in Neural Information Processing Systems 35 (2022), pp. 4099 4110. [SN20] Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal . In: Algorithmic Learning Theory. PMLR. 2020, pp. 845 861. [Syr+15] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. Fast convergence of regularized learning in games . In: Advances in Neural Information Processing Systems (Neur IPS) (2015). [VF08] Bernhard Von Stengel and Françoise Forges. Extensive-form correlated equilibrium: Definition and computational complexity . In: Mathematics of Operations Research 33.4 (2008), pp. 1002 1022. [WZB20] Yuanhao Wang, Guodong Zhang, and Jimmy Ba. On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach . In: International Conference on Learning Representations (ICLR). 2020. [Zha+24] Brian Hu Zhang, Ioannis Anagnostides, Gabriele Farina, and Tuomas Sandholm. Efficient Φ-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games . In: Advances in Neural Information Processing Systems. 2024. [Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent . In: Proceedings of the 20th international conference on machine learning (ICML). 2003. 1 Introduction 1 1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Preliminaries 5 3 Tractable Φ-Equilibrium for Finite Φ via Sampling 5 3.1 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4 Approximate Φ-Equilibria under Infinite Local Strategy Modifications 8 4.1 Projection-Based Local Strategy Modifications . . . . . . . . . . . . . . . . . . . 9 4.2 Convex Combination of Finite Local Strategy Modifications . . . . . . . . . . . . 10 4.3 Interpolation-Based Local Strategy Modifications . . . . . . . . . . . . . . . . . . 10 5 Discussion and Future Directions 10 A Related Work 16 B Additional Preliminaries: Solution Concepts in Non-Concave Games 17 C Rerget Bound for Hedge 19 D Proof of Lemma 1 19 E Missing Details in Section 4.1 19 E.1 Differences between External Regret and ΦX Proj-regret . . . . . . . . . . . . . . . . 19 E.2 Proof of Theorem 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 E.3 Lower bounds for ΦX Proj-Regret . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 E.3.1 Proof of Theorem 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 E.3.2 Proof of Proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 E.4 Improved ΦX Proj(δ)-Regret in the Game Setting and Proof of Theorem 9 . . . . . . 22 E.4.1 Proof of Theorem 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 E.4.2 Proof of Theorem 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 F Missing Details in Section 4.2 24 F.1 Proof of Theorem 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
24 G Missing details in Section 4.3 26 H Beam-Search Local Strategy Modifications and Local Equilibria 27 H.1 Proof of Theorem 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 I Hardness in the Global Regime 28 I.1 Proof of Theorem 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 I.2 Proof of Corollary 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 J Removing the D dependence for ΦX Proj-regret 30 J.1 One-Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 J.2 d-Dimensional Box Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 A Related Work Non-Concave Games. An important special case of multi-player games are two-player zero-sum games, which are defined in terms of some function f : X Y R that one of the two players say the one choosing x X, wants to minimize, while the other player, the one choosing y Y, wants to maximize. Finding Nash equilibrium in such games is tractable in the convex-concave setting, i.e. when f(x, y) is convex with respect to the minimizing player s strategy, x, and concave with respect to the maximizing player s strategy, y, but it is computationally intractable in the general nonconvex-nonconcave setting. Namely, a Nash equilibrium may not exist, and it is NP-hard to determine if one exists and, if so, find it. Moreover, in this case, stable limit points of gradientbased dynamics are not necessarily Nash equilibria, not even local Nash equilibria [DP18; MRS20]. Moreover, there are examples including the Polar Game [Pet+22] and the Forsaken Matching Pennies [HMC21b] showing that for GD / OG and many other no-regret learning algorithms in nonconvex-nonconcave min-max optimization, the last-iterate does not converge and even the averageiterate fails to be a stationary point. We emphasize that the convergence guarantees we provide for GD / OG in Section 4.1 and Section 4.3 holds for the empirical distribution of play, not the average-iterate or the last-iterate. A line of work focuses on computing Nash equilibrium under additional structure in the game. This encompasses settings where the game satisfies the (weak) Minty variational inequality [MZ19; DDJ21; Pet+22; CZ23], or is sufficiently close to being bilinear [ALW21]. However, the study of universal solution concepts in the nonconvex-nonconcave setting is sparse. Daskalakis, Skoulakis, and Zampetakis [DSZ21] proved the existence and computational hardness of local Nash equilibrium. In a more recent work, [Das+23] proposes second-order algorithms with asymptotic convergence to local Nash equilibrium. Several works study sequential two-player zero-sum games with additional assumptions about the player who goes second. They propose equilibrium concepts such as local minimax points [JNJ20], differentiable Stackelberg equilibrium [FCR20], and greedy adversarial equilibrium [MV21]. Notably, local minimax points are stable limit points of Gradient-Descent Ascent (GDA) dynamics [JNJ20; WZB20; FR21] while greedy adversarial equilibrium can be computed efficiently using second-order algorithms in the unconstrained setting [MV21]. In contrast to these studies, we focus on the more general case of multi-player non-concave games. Local Equilibrium. To address the limitations associated with classical, global equilibrium concepts, a natural approach is to focus on developing equilibrium concepts that guarantee local stability instead. 
One definition of interest is the strict local Nash equilibrium, wherein each player's strategy is a local maximizer of their utility function, given the other players' strategies. Unfortunately, a strict local Nash equilibrium may not always exist, as demonstrated in Example 1. Furthermore, a weaker notion, the second-order local Nash equilibrium, in which each player has no incentive to deviate based on the second-order Taylor expansion of their utility, is also not guaranteed to exist, as illustrated in Example 1. What's more, it is NP-hard to check whether a given strategy profile is a strict local Nash equilibrium or a second-order local Nash equilibrium, as implied by the results of Murty and Kabadi [MK87] and Ahmadi and Zhang [AZ22].6 Finally, one can consider the local Nash equilibrium, a first-order stationary solution, which is guaranteed to exist [DSZ21]. Unlike non-convex optimization, where targeting first-order local optima sidesteps the intractability of global optima, this first-order local Nash equilibrium has recently been shown to be intractable, even in two-player zero-sum non-concave games with joint feasibility constraints [DSZ21].7 See Table 1 for a summary of solution concepts in non-concave games.

6 Murty and Kabadi [MK87] show that checking whether a point is a local maximizer of a multivariate quadratic function is NP-hard, and Ahmadi and Zhang [AZ22] show that computing a local maximizer of a multivariate quadratic function over a polytope is NP-hard.
7 In general-sum games, it is not hard to see that the intractability results [DGP09; CDT09] for computing global Nash equilibria in bimatrix games imply intractability for computing local Nash equilibria.

Online Learning with Non-Convex Losses. A line of work has studied online learning against non-convex losses. To circumvent the computational intractability of this problem, various approaches have been pursued: some works assume a restricted class of non-convex loss functions [GLZ18], while others assume access to a sampling oracle [MM10; Kri+15] or to an offline optimization oracle [AGH19; SN20; Hél+20], or target a weaker notion of regret [HSZ17; AZF19; HMC21a; GZL23]. The work most closely related to ours is [HSZ17]. The authors propose a notion of w-smoothed local regret against non-convex losses, and they also define a local equilibrium concept for non-concave games. They use the idea of smoothing, averaging the loss functions of the previous w iterations, and design algorithms with optimal w-smoothed local regret. The notion of regret they introduce suggests a local equilibrium concept. However, their local equilibrium concept is non-standard in that its local stability is not with respect to a distribution over strategy profiles sampled by this equilibrium concept. Moreover, the path to attaining this local equilibrium through decentralized learning dynamics remains unclear. The algorithms provided in [HSZ17; GZL23] require that every agent i experiences (over several rounds) the average utility function of the previous w iterates, denoted $F^t_{i,w} := \frac{1}{w}\sum_{\ell=0}^{w-1} u^{t-\ell}_i(\cdot\,, x^{t-\ell}_{-i})$. Implementing this imposes a significant coordination burden on the agents. In contrast, we focus on a natural concept of Φ(δ)-equilibrium, which is incomparable to that of [HSZ17], and we also show that efficient convergence to this concept is achieved via decentralized gradient-based learning dynamics.
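To make this coordination requirement concrete, here is a minimal sketch, under our own illustrative assumptions (the utility callable, its arguments, and all names are hypothetical and not from [HSZ17] or this paper), of the windowed average utility $F^t_{i,w}$ that each agent would need to evaluate under such smoothed dynamics, in contrast with the single gradient feedback used by the gradient-based dynamics studied here.

```python
# Sketch (not the algorithm of [HSZ17]): feedback an agent i would need for w-smoothed
# dynamics, namely the windowed average utility
#   F^t_{i,w}(z) = (1/w) * sum_{l=0}^{w-1} u_i^{t-l}(z, x_{-i}^{t-l}),
# which requires replaying the last w opponent profiles, versus the single gradient
# grad_{x_i} u_i(x^t) used by gradient-based dynamics.
from collections import deque

def make_windowed_utility(u_i, history, w):
    """u_i(z, x_minus_i): player i's utility; history: past opponent profiles."""
    window = list(history)[-w:]                      # last w opponent profiles x_{-i}^{t-l}
    return lambda z: sum(u_i(z, x_mi) for x_mi in window) / len(window)

# toy usage with a hypothetical one-dimensional utility
u_i = lambda z, x_mi: -(z - x_mi) ** 2               # illustrative utility, not from the paper
history = deque([0.1, 0.4, 0.3], maxlen=100)         # opponent strategies from past rounds
F = make_windowed_utility(u_i, history, w=3)
print(F(0.2))                                        # the agent re-evaluates utilities over the window
```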
Φ-regret and Φ-equilibrium. The concept of Φ-regret and the associated Φ-equilibrium was introduced by Greenwald and Jafari [GJ03] and has been broadly investigated in the context of concave games [GJ03; SL07; GGM08; RST11; Pil+22; Ber+23; Dag+24; PR24] and extensive-form games [VF08; Mor+21a; Mor+21b; Far+22b; Bai+22; SMB22; AFS23; Zha+24]. The work of [Sha24] studies internal regret minimization against non-convex losses. To our knowledge, no efficient algorithm exists for the classes of Φ-equilibria we consider in non-concave games. Specifically, all of these algorithms, when applied to compute an (ε, Φ(δ))-equilibrium for a general δ-local strategy modification set Φ(δ) (using Lemma 1), require running time exponential in either 1/ε or the dimension d. In contrast, we show that for several natural choices of Φ(δ), an ε-approximate Φ(δ)-equilibrium can be computed efficiently, i.e., in time polynomial in 1/ε and d, using simple algorithms.

B Additional Preliminaries: Solution Concepts in Non-Concave Games

We present definitions of several solution concepts from the literature, as well as the existence and computational complexity of each solution concept.

Definition 4 (Nash Equilibrium). In a continuous game, a strategy profile $x \in \prod_{j=1}^{n} \mathcal{X}_j$ is a Nash equilibrium (NE) if and only if for every player $i \in [n]$,
$u_i(x'_i, x_{-i}) \le u_i(x), \quad \forall x'_i \in \mathcal{X}_i.$

Definition 5 (Mixed Nash Equilibrium). In a continuous game, a mixed strategy profile $p \in \prod_{j=1}^{n} \Delta(\mathcal{X}_j)$ (here we denote by $\Delta(\mathcal{X}_i)$ the set of probability measures over $\mathcal{X}_i$) is a mixed Nash equilibrium (MNE) if and only if for every player $i \in [n]$,
$u_i(p'_i, p_{-i}) \le u_i(p), \quad \forall p'_i \in \Delta(\mathcal{X}_i).$

Definition 6 ((Coarse) Correlated Equilibrium). In a continuous game, a distribution $\sigma$ over joint strategy profiles $\prod_{i=1}^{n} \mathcal{X}_i$ is a correlated equilibrium (CE) if and only if for every player $i \in [n]$,
$\max_{\phi_i : \mathcal{X}_i \to \mathcal{X}_i} \mathbb{E}_{x \sim \sigma}\big[u_i(\phi_i(x_i), x_{-i})\big] \le \mathbb{E}_{x \sim \sigma}\big[u_i(x)\big].$
Similarly, a distribution $\sigma$ over joint strategy profiles $\prod_{i=1}^{n} \mathcal{X}_i$ is a coarse correlated equilibrium (CCE) if and only if for every player $i \in [n]$,
$\max_{x'_i \in \mathcal{X}_i} \mathbb{E}_{x \sim \sigma}\big[u_i(x'_i, x_{-i})\big] \le \mathbb{E}_{x \sim \sigma}\big[u_i(x)\big].$

Definition 7 (Strict Local Nash Equilibrium). In a continuous game, a strategy profile $x \in \prod_{j=1}^{n} \mathcal{X}_j$ is a strict local Nash equilibrium if and only if for every player $i \in [n]$, there exists $\delta > 0$ such that
$u_i(x'_i, x_{-i}) \le u_i(x), \quad \forall x'_i \in \mathcal{B}_{d_i}(x_i, \delta) \cap \mathcal{X}_i.$

Definition 8 (Second-Order Local Nash Equilibrium). Consider a continuous game where each utility function $u_i(x_i, x_{-i})$ is twice differentiable with respect to $x_i$ for any fixed $x_{-i}$. A strategy profile $x \in \prod_{j=1}^{n} \mathcal{X}_j$ is a second-order local Nash equilibrium if and only if for every player $i \in [n]$, $x_i$ maximizes the second-order Taylor expansion of their utility function at $x_i$; formally,
$\langle \nabla_{x_i} u_i(x),\ x'_i - x_i \rangle + (x'_i - x_i)^\top \nabla^2_{x_i} u_i(x)\,(x'_i - x_i) \le 0, \quad \forall x'_i \in \mathcal{X}_i.$

Existence. Mixed Nash equilibria exist in continuous games, and thus in smooth games [Deb52; Gli52; Fan53]. By definition, an MNE is also a CE and a CCE, which proves the existence of CE and CCE as well. In contrast, strict local Nash equilibria, second-order local Nash equilibria, and (pure) Nash equilibria may not exist in a smooth non-concave game, as we show in the following example.

Example 1. Consider a two-player zero-sum non-concave game: the action sets are $\mathcal{X}_1 = \mathcal{X}_2 = [-1, 1]$ and the utility functions are $u_1(x_1, x_2) = -u_2(x_1, x_2) = (x_1 - x_2)^2$. Let $x = (x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$ be any strategy profile: if $x_1 = x_2$, then player 1 is not at a local maximizer; if $x_1 \ne x_2$, then player 2 is not at a local maximizer. Thus $x$ is not a strict local Nash equilibrium. Since the utility function is quadratic, we conclude that the game also has no second-order local Nash equilibrium.
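As a quick numerical sanity check on Example 1 (this script, its grid resolution, and deviation size are our own illustrative additions, not part of the paper's argument), the following sketch verifies that at every profile on a grid at least one player has a strictly improving local deviation, so no strict local Nash equilibrium exists.

```python
# Numerical sanity check for Example 1: u1 = -u2 = (x1 - x2)^2 on [-1, 1]^2.
# At every grid profile, some player strictly improves by a small feasible move,
# so no profile is a strict local Nash equilibrium.  Grid size and step are arbitrary.
import numpy as np

def u1(x1, x2):
    return (x1 - x2) ** 2          # player 2's utility is -u1 (zero-sum)

delta = 1e-2                        # size of the local deviation
grid = np.linspace(-1.0, 1.0, 201)
for x1 in grid:
    for x2 in grid:
        # player 1 tries moving x1 by +/- delta (projected back to [-1, 1])
        p1_gain = max(u1(np.clip(x1 + s * delta, -1, 1), x2) - u1(x1, x2) for s in (-1, 1))
        # player 2 tries moving x2 by +/- delta; player 2's utility is -u1
        p2_gain = max(-u1(x1, np.clip(x2 + s * delta, -1, 1)) + u1(x1, x2) for s in (-1, 1))
        assert max(p1_gain, p2_gain) > 0, (x1, x2)
print("no strict local Nash equilibrium found on the grid")
```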
Computational Complexity. Consider a single-player smooth non-concave game with a quadratic utility function $f : \mathcal{X} \to \mathbb{R}$. The problem of finding a local maximizer of $f$ reduces to the problem of computing an NE, an MNE, a CE, a CCE, a strict local Nash equilibrium, or a second-order local Nash equilibrium. Since computing a local maximizer, or checking whether a given point is a local maximizer, is NP-hard [MK87], computing an NE, MNE, CE, CCE, strict local Nash equilibrium, or second-order local Nash equilibrium is NP-hard as well.

Representation Complexity. Karlin [Kar59] presents a two-player zero-sum non-concave game whose unique MNE has infinite support. Since in a two-player zero-sum game the marginal distribution of a CE or a CCE is an MNE, this also implies that the representation complexity of any CE or CCE is infinite. We present the example from Karlin [Kar59] here for completeness and also prove that the game is Lipschitz and smooth.

Example 2 ([Kar59, Chapter 7.1, Example 3]). We consider a two-player zero-sum game with action sets $\mathcal{X}_1 = \mathcal{X}_2 = [0, 1]$. Let $p$ and $q$ be two distributions over $[0, 1]$. The only requirement on $p$ and $q$ is that their cumulative distribution functions are not finite-step functions. For example, we can take $p = q$ to be the uniform distribution. Let $\mu_n$ and $\nu_n$ denote the $n$-th moments of $p$ and $q$, respectively. Define the utility function
$u(x, y) = u_1(x, y) = -u_2(x, y) = \sum_{n=1}^{\infty} \frac{1}{2^n}\,(x^n - \mu_n)(y^n - \nu_n), \qquad 0 \le x, y \le 1.$

Claim 1. The game in Example 2 is 2-Lipschitz and 6-smooth, and $(p, q)$ is its unique (mixed) Nash equilibrium.

Proof. Fix any $y \in [0, 1]$. Since $\big|\frac{1}{2^n}(y^n - \nu_n)\, n x^{n-1}\big| \le \frac{n}{2^n}$, the series defining $\partial_x u(x, y)$ converges uniformly, and $|\partial_x u(x, y)| \le \sum_{n=1}^{\infty} \frac{n}{2^n} \le 2$ for all $y \in [0, 1]$. Similarly, $|\partial^2_x u(x, y)| \le \sum_{n=1}^{\infty} \frac{n^2}{2^n} \le 6$ for all $y \in [0, 1]$. By symmetry, we also have $|\partial_y u(x, y)| \le 2$ and $|\partial^2_y u(x, y)| \le 6$ for all $x, y \in [0, 1]$. Thus, the game is 2-Lipschitz and 6-smooth. Moreover, since $\big|\frac{1}{2^n}(x^n - \mu_n)(y^n - \nu_n)\big| \le \frac{1}{2^n}$, the series defining $u(x, y)$ converges absolutely and uniformly, and
$\int_0^1 u(x, y)\, \mathrm{d}F_p(x) = \sum_{n=1}^{\infty} \frac{1}{2^n}(y^n - \nu_n)\int_0^1 (x^n - \mu_n)\, \mathrm{d}F_p(x) = 0, \qquad \int_0^1 u(x, y)\, \mathrm{d}F_q(y) = \sum_{n=1}^{\infty} \frac{1}{2^n}(x^n - \mu_n)\int_0^1 (y^n - \nu_n)\, \mathrm{d}F_q(y) = 0.$
In particular, $(p, q)$ is a mixed Nash equilibrium, and the value of the game is 0. Suppose $(p', q')$ is also a mixed Nash equilibrium. Then $(p, q')$ is a mixed Nash equilibrium. Note that $p$ is supported on every point of $[0, 1]$. As a consequence, we have
$0 = \int_0^1 u(x, y)\, \mathrm{d}F_{q'}(y) = \sum_{n=1}^{\infty} \frac{1}{2^n}(x^n - \mu_n)(\nu'_n - \nu_n) \quad \text{for all } x \in [0, 1],$
where $\nu'_n$ is the $n$-th moment of $q'$. Since the series vanishes identically, the coefficient of each power of $x$ must vanish. Thus $\nu'_n = \nu_n$ for all $n$ and $q' = q$. Similarly, we have $p' = p$, so the mixed Nash equilibrium is unique.
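The value-0 structure in the proof of Claim 1 is easy to probe numerically. The sketch below is our own illustration (the truncation level N, the choice p = q = Uniform[0, 1], and the Monte Carlo sample size are assumptions for the demo): it evaluates a truncated version of Karlin's utility and checks that the expected payoff of every pure strategy against the uniform mixed strategy is close to zero.

```python
# Truncated version of Karlin's utility from Example 2 with p = q = Uniform[0, 1],
# whose n-th moment is 1/(n+1).  We check that the expected payoff of any pure strategy x
# against the uniform mixed strategy is approximately 0, matching the value-0 argument.
import numpy as np

N = 40                                                    # truncation of the series (assumption)
moments = np.array([1.0 / (n + 1) for n in range(1, N + 1)])

def u(x, y):
    # y may be a scalar or a NumPy array; broadcast over the truncated series terms
    n = np.arange(1, N + 1)
    return ((np.power.outer(y, n) - moments) * (x ** n - moments) / 2.0 ** n).sum(axis=-1)

ys = np.random.default_rng(0).uniform(size=100_000)       # Monte Carlo samples from q
for x in (0.0, 0.25, 0.5, 0.9, 1.0):
    print(f"x = {x:4.2f}   E_y~q[u(x, y)] ~ {u(x, ys).mean():+.4f}")   # all close to 0
```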
C Regret Bound for Hedge

Theorem 6 ([AHK12]). In an $N$-expert problem, assume that all rewards are bounded, i.e., $u^t \in [-M, M]^N$. Then the Hedge algorithm with step size $\eta = \min\{\sqrt{\ln N / T},\ 1\}$ has regret
$\max_{i \in [N]} \sum_{t=1}^{T} u^t[i] - \sum_{t=1}^{T} \langle u^t, p^t \rangle \le 2M\sqrt{T \ln N}.$

D Proof of Lemma 1

Let $\{f^t\}_{t \in [T]}$ be a sequence of non-convex, $L$-smooth loss functions satisfying Assumption 1. Let $\{x^t\}_{t \in [T]}$ be the iterates produced by $\mathcal{A}$ against $\{f^t\}_{t \in [T]}$. Then $\{x^t\}_{t \in [T]}$ are also the iterates produced by $\mathcal{A}$ against the sequence of linear loss functions $\{\langle \nabla f^t(x^t), \cdot\, \rangle\}_{t \in [T]}$. For the latter, we know
$\max_{\phi \in \Phi^{\mathcal{X}}(\delta)} \sum_{t=1}^{T} \langle \nabla f^t(x^t),\ x^t - \phi(x^t) \rangle \le \mathrm{Reg}^T_{\Phi^{\mathcal{X}}(\delta)}.$
Then, using the $L$-smoothness of $\{f^t\}$ and the fact that $\|\phi(x) - x\| \le \delta$ for all $\phi \in \Phi^{\mathcal{X}}(\delta)$, we get
$\max_{\phi \in \Phi^{\mathcal{X}}(\delta)} \sum_{t=1}^{T} \big(f^t(x^t) - f^t(\phi(x^t))\big) \le \max_{\phi \in \Phi^{\mathcal{X}}(\delta)} \sum_{t=1}^{T} \langle \nabla f^t(x^t),\ x^t - \phi(x^t) \rangle + \frac{\delta^2 L T}{2} \le \mathrm{Reg}^T_{\Phi^{\mathcal{X}}(\delta)} + \frac{\delta^2 L T}{2}.$
This completes the proof of the first part.

Now let each player $i \in [n]$ employ algorithm $\mathcal{A}$ independently in a smooth game, producing iterates $\{x^t\}_{t \in [T]}$. The averaged joint strategy profile $\sigma^T$, which picks $x^t$ uniformly at random over $t \in [T]$, satisfies, for every player $i \in [n]$,
$\max_{\phi \in \Phi^{\mathcal{X}_i}(\delta)} \mathbb{E}_{x \sim \sigma^T}\big[u_i(\phi(x_i), x_{-i})\big] - \mathbb{E}_{x \sim \sigma^T}\big[u_i(x)\big] = \max_{\phi \in \Phi^{\mathcal{X}_i}(\delta)} \frac{1}{T} \sum_{t=1}^{T} \big(u_i(\phi(x^t_i), x^t_{-i}) - u_i(x^t)\big) \le \frac{\mathrm{Reg}^T_{\Phi^{\mathcal{X}_i}(\delta)}}{T} + \frac{\delta^2 L}{2}.$
Thus $\sigma^T$ is a $\big(\max_{i \in [n]} \mathrm{Reg}^T_{\Phi^{\mathcal{X}_i}(\delta)} \cdot T^{-1} + \frac{\delta^2 L}{2}\big)$-approximate Φ(δ)-equilibrium. This completes the proof of the second part.

E Missing Details in Section 4.1

E.1 Differences between External Regret and Φ^X_Proj-Regret

In the following two examples, we show that Φ^X_Proj(δ)-regret is incomparable with external regret, even for convex loss functions: a sequence of actions may suffer high Reg^T but low Reg^T_Proj,δ (Example 3), and vice versa (Example 4).

Example 3. Let $f^1(x) = f^2(x) = |x|$ for $x \in \mathcal{X} = [-1, 1]$. Then the Φ^X_Proj(δ)-regret of the sequence $\{x^1 = -\frac{1}{2},\ x^2 = \frac{1}{2}\}$ is $0$ for any $\delta \in (0, \frac{1}{2})$. However, the external regret of the same sequence is $1$. By repeating the construction $\frac{T}{2}$ times, we conclude that there exists a sequence of actions with $\mathrm{Reg}^T_{\mathrm{Proj},\delta} = 0$ and $\mathrm{Reg}^T = \frac{T}{2}$ for every even $T \ge 2$.

Example 4. Let $f^1(x) = 2x$ and $f^2(x) = -x$ for $x \in \mathcal{X} = [-1, 1]$. Then the Φ^X_Proj(δ)-regret of the sequence $\{x^1 = -\frac{1}{2},\ x^2 = 0\}$ is $\delta$ for any $\delta \in (0, \frac{1}{2})$. However, the external regret of the same sequence is $0$. By repeating the construction $\frac{T}{2}$ times, we conclude that there exists a sequence of actions with $\mathrm{Reg}^T_{\mathrm{Proj},\delta} = \frac{\delta T}{2}$ and $\mathrm{Reg}^T = 0$ for every even $T \ge 2$.

At a high level, external regret competes against a fixed action, whereas Φ^X_Proj(δ)-regret is more akin to dynamic regret, competing with a sequence of varying actions. When the environment is stationary, i.e., $f^t = f$ (Example 3), a sequence of actions that stays far from the global minimum must suffer high external regret, but may incur low Φ^X_Proj(δ)-regret, since the change in cumulative loss caused by a fixed-direction deviation can be neutralized across the different actions in the sequence. In contrast, in a non-stationary (dynamic) environment (Example 4), every fixed action performs poorly, so a sequence of actions can suffer low regret against a fixed action while the Φ^X_Proj(δ)-regret, which competes with a fixed-direction deviation, can be large. Nevertheless, despite these differences, the two notions are compatible for convex loss functions: our main results in this section provide algorithms that minimize external regret and Φ^X_Proj(δ)-regret simultaneously.
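Both examples can be verified with a few lines of code. The script below is our own illustration (the helper names and the finite search grids over fixed actions and deviations are assumptions): it computes the external regret and the projection-based local regret of the two sequences.

```python
# Numerical check of Examples 3 and 4: external regret vs projection-based local regret
# on X = [-1, 1].  Helper names and the search grids are illustrative choices.
import numpy as np

def external_regret(losses, xs, grid):
    played = sum(f(x) for f, x in zip(losses, xs))
    best_fixed = min(sum(f(b) for f in losses) for b in grid)
    return played - best_fixed

def proj_local_regret(losses, xs, grid_v):
    played = sum(f(x) for f, x in zip(losses, xs))
    # compete with a fixed deviation v applied to every iterate and projected back to X
    best_dev = min(sum(f(np.clip(x - v, -1, 1)) for f, x in zip(losses, xs)) for v in grid_v)
    return played - best_dev

delta = 0.1
grid = np.linspace(-1, 1, 2001)
grid_v = np.linspace(-delta, delta, 201)

# Example 3: stationary losses |x|; alternating sequence has zero local regret, external regret 1
ex3_losses, ex3_xs = [abs, abs], [-0.5, 0.5]
print(external_regret(ex3_losses, ex3_xs, grid), proj_local_regret(ex3_losses, ex3_xs, grid_v))

# Example 4: non-stationary linear losses; zero external regret, local regret delta
ex4_losses, ex4_xs = [lambda x: 2 * x, lambda x: -x], [-0.5, 0.0]
print(external_regret(ex4_losses, ex4_xs, grid), proj_local_regret(ex4_losses, ex4_xs, grid_v))
```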
E.2 Proof of Theorem 3

Proof. Let us denote by $v \in \mathcal{B}_d(\delta)$ a fixed deviation and define $p^t = \Pi_{\mathcal{X}}[x^t - v]$. By the standard analysis of GD [Zin03] (see also the proof of [Bub+15, Theorem 3.2]), we have
$\sum_{t=1}^{T} \big(f^t(x^t) - f^t(p^t)\big) \le \sum_{t=1}^{T} \frac{\|x^t - p^t\|^2 - \|x^{t+1} - p^t\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|\nabla f^t(x^t)\|^2 \le \frac{1}{2\eta} \sum_{t=1}^{T-1} \big(\|x^{t+1} - p^{t+1}\|^2 - \|x^{t+1} - p^t\|^2\big) + \frac{\delta^2}{2\eta} + \frac{\eta G^2 T}{2},$
where the last step uses $\|x^1 - p^1\| \le \delta$ and $\|\nabla f^t(x^t)\| \le G$. Here the terms $\|x^{t+1} - p^{t+1}\|^2 - \|x^{t+1} - p^t\|^2$ do not telescope, and we further relax them in the following key step.

Key Step: We relax the first term as
$\|x^{t+1} - p^{t+1}\|^2 - \|x^{t+1} - p^t\|^2 = \langle p^t - p^{t+1},\ 2x^{t+1} - p^t - p^{t+1} \rangle = \langle p^t - p^{t+1},\ 2x^{t+1} - 2p^{t+1} \rangle - \|p^t - p^{t+1}\|^2 = 2\langle p^t - p^{t+1}, v \rangle + 2\langle p^t - p^{t+1},\ x^{t+1} - v - p^{t+1} \rangle - \|p^t - p^{t+1}\|^2 \le 2\langle p^t - p^{t+1}, v \rangle - \|p^t - p^{t+1}\|^2,$
where in the last inequality we use the fact that $p^{t+1}$ is the projection of $x^{t+1} - v$ onto $\mathcal{X}$ and $p^t$ is in $\mathcal{X}$. Now we obtain a telescoping term $2\langle p^t - p^{t+1}, v \rangle$ and a negative term $-\|p^t - p^{t+1}\|^2$. The negative term is useful for improving the regret analysis in the game setting, but we ignore it for now. Combining the two inequalities above, we have
$\sum_{t=1}^{T} \big(f^t(x^t) - f^t(p^t)\big) \le \frac{\delta^2}{2\eta} + \frac{\eta G^2 T}{2} + \frac{1}{\eta}\langle p^1 - p^T, v \rangle \le \frac{\delta^2}{2\eta} + \frac{\eta G^2 T}{2} + \frac{\delta D_{\mathcal{X}}}{\eta}.$
Since the above holds for any $v$ with $\|v\| \le \delta$, it also upper bounds $\mathrm{Reg}^T_{\mathrm{Proj},\delta}$.
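To complement this analysis, here is a small simulation sketch under our own illustrative assumptions (the loss sequence, horizon, δ, and step size are not from the paper): it runs projected online gradient descent on linear losses and measures its Φ^X_Proj(δ)-regret against a grid of fixed deviation directions.

```python
# Sketch: projected online gradient descent on X = [-1, 1] and its projection-based
# local regret against fixed deviations v with |v| <= delta.  Losses, T, delta, and
# the step size below are illustrative choices, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
T, delta, G = 2000, 0.1, 1.0
eta = np.sqrt(delta * 2.0) / (G * np.sqrt(T))      # step size of order sqrt(delta * D_X / T) / G

signs = rng.choice([-1.0, 1.0], size=T)            # random linear losses f_t(x) = g_t * x
x, iterates, grads = 0.0, [], []
for t in range(T):
    g = G * signs[t]
    iterates.append(x); grads.append(g)
    x = float(np.clip(x - eta * g, -1.0, 1.0))     # projected GD step

iterates, grads = np.array(iterates), np.array(grads)
played = np.sum(grads * iterates)
proj_regret = max(
    played - np.sum(grads * np.clip(iterates - v, -1.0, 1.0))
    for v in np.linspace(-delta, delta, 101)
)
print(f"Phi_Proj({delta})-regret of GD over T={T}: {proj_regret:.3f} "
      f"(upper-bound order G*sqrt(delta*D_X*T) ~ {G * np.sqrt(delta * 2 * T):.1f})")
```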
E.3 Lower Bounds for Φ^X_Proj-Regret

Theorem 7 (Lower bound for Φ^X_Proj(δ)-regret against convex losses). For any $T \ge 1$, $D_{\mathcal{X}} > 0$, $0 < \delta \le D_{\mathcal{X}}$, and $G \ge 0$, there exists a distribution $\mathcal{D}$ over $G$-Lipschitz linear loss functions $f^1, \ldots, f^T$ on $\mathcal{X} = [-D_{\mathcal{X}}, D_{\mathcal{X}}]$ such that for any online algorithm, its Φ^X_Proj(δ)-regret on the loss sequence satisfies $\mathbb{E}_{\mathcal{D}}\big[\mathrm{Reg}^T_{\mathrm{Proj},\delta}\big] = \Omega(\delta G \sqrt{T})$.

Remark 2. A keen reader may notice that the $\Omega(G\delta\sqrt{T})$ lower bound in Theorem 7 does not match the $O(G\sqrt{\delta D_{\mathcal{X}} T})$ upper bound in Theorem 3, especially when $D_{\mathcal{X}} \gg \delta$. A natural question is: which of them is tight? We conjecture that the lower bound is tight. In fact, for the special case where the feasible set $\mathcal{X}$ is a box, we obtain a $D_{\mathcal{X}}$-independent bound $O(d^{1/4} G\delta\sqrt{T})$ using a modified version of GD, which is tight when $d = 1$. See Appendix J for a detailed discussion.

This lower bound suggests that GD achieves near-optimal Φ^X_Proj(δ)-regret for convex losses. For $L$-smooth non-convex loss functions, we provide another $\Omega(\delta^2 L T)$ lower bound for algorithms that satisfy the linear span assumption. The linear span assumption states that the algorithm produces $x^{t+1} \in \big\{\Pi_{\mathcal{X}}\big[\sum_{i \in [t]} a_i x^i + b_i \nabla f^i(x^i)\big] : a_i, b_i \in \mathbb{R},\ i \in [t]\big\}$, i.e., essentially a linear combination of the previous iterates and their gradients. Many online algorithms, such as online gradient descent and optimistic gradient, satisfy the linear span assumption. Combined with Lemma 1, this lower bound suggests that GD attains nearly optimal Φ^X_Proj(δ)-regret, even in the non-convex setting, among a natural family of gradient-based algorithms.

Proposition 1 (Lower bound for Φ^X_Proj(δ)-regret against non-convex losses). For any $T \ge 1$, $\delta \in (0, 1)$, and $L \ge 0$, there exists a sequence of $L$-Lipschitz and $L$-smooth non-convex loss functions $f^1, \ldots, f^T$ on $\mathcal{X} = [-1, 1]$ such that for any algorithm satisfying the linear span assumption, its Φ^X_Proj(δ)-regret on the loss sequence is $\mathrm{Reg}^T_{\mathrm{Proj},\delta} \ge \frac{\delta^2 L T}{2}$.

E.3.1 Proof of Theorem 7

Our proof technique follows the standard one used for multi-armed bandits [Aue+02, Theorem 5.1]. Suppose that $f^t(x) = g^t x$. We construct two possible environments. In the first environment, $g^t = G$ with probability $\frac{1+\varepsilon}{2}$ and $g^t = -G$ with probability $\frac{1-\varepsilon}{2}$; in the second environment, $g^t = G$ with probability $\frac{1-\varepsilon}{2}$ and $g^t = -G$ with probability $\frac{1+\varepsilon}{2}$. We use $\mathbb{E}_i$ and $\mathbb{P}_i$ to denote the expectation and probability measure under environment $i$, for $i = 1, 2$. Suppose that the true environment is chosen uniformly at random from these two environments. Below, we show that the expected regret of the learner is at least $\Omega(\delta G \sqrt{T})$.

Define $N_+ = \sum_{t=1}^{T} \mathbb{I}\{x^t \ge 0\}$, the number of times $x^t$ is non-negative, and write $f^{1:T} = (f^1, \ldots, f^T)$. Then we have
$|\mathbb{E}_1[N_+] - \mathbb{E}_2[N_+]| = \Big|\sum_{f^{1:T}} \big(\mathbb{P}_1(f^{1:T}) - \mathbb{P}_2(f^{1:T})\big)\,\mathbb{E}\big[N_+ \,\big|\, f^{1:T}\big]\Big|$ (enumerating all possible sequences $f^{1:T}$)
$\le T \sum_{f^{1:T}} \big|\mathbb{P}_1(f^{1:T}) - \mathbb{P}_2(f^{1:T})\big| = T\,\|\mathbb{P}_1 - \mathbb{P}_2\|_{\mathrm{TV}} \le T\sqrt{(2\ln 2)\,\mathrm{KL}(\mathbb{P}_1, \mathbb{P}_2)}$ (Pinsker's inequality)
$\le T\sqrt{(2\ln 2)\,T\,\mathrm{KL}\big(\mathrm{Bernoulli}(\tfrac{1+\varepsilon}{2}),\ \mathrm{Bernoulli}(\tfrac{1-\varepsilon}{2})\big)} \le T\sqrt{(2\ln 2)\,T\,\varepsilon \ln\tfrac{1+\varepsilon}{1-\varepsilon}} \le T\sqrt{(4\ln 2)\,T\varepsilon^2}. \qquad (2)$

In the first environment, we consider the regret with respect to the deviation $v = \delta$. Then we have
$\mathbb{E}_1\big[\mathrm{Reg}^T_{\mathrm{Proj},\delta}\big] \ge \mathbb{E}_1\Big[\sum_{t=1}^{T} \big(f^t(x^t) - f^t(\Pi_{\mathcal{X}}[x^t - \delta])\big)\Big] = \mathbb{E}_1\Big[\sum_{t=1}^{T} g^t\big(x^t - \Pi_{\mathcal{X}}[x^t - \delta]\big)\Big] = \mathbb{E}_1\Big[\sum_{t=1}^{T} \varepsilon G\big(x^t - \Pi_{\mathcal{X}}[x^t - \delta]\big)\Big] \ge \mathbb{E}_1\Big[\sum_{t=1}^{T} \varepsilon\delta G\,\mathbb{I}\{x^t \ge 0\}\Big] = \varepsilon\delta G\,\mathbb{E}_1[N_+],$
where in the last inequality we use the fact that if $x^t \ge 0$ then $x^t - \Pi_{\mathcal{X}}[x^t - \delta] = x^t - (x^t - \delta) = \delta$, because $D_{\mathcal{X}} \ge \delta$.

In the second environment, we consider the regret with respect to the deviation $v = -\delta$. Then, similarly, we have
$\mathbb{E}_2\big[\mathrm{Reg}^T_{\mathrm{Proj},\delta}\big] \ge \mathbb{E}_2\Big[\sum_{t=1}^{T} \big(f^t(x^t) - f^t(\Pi_{\mathcal{X}}[x^t + \delta])\big)\Big] = \mathbb{E}_2\Big[\sum_{t=1}^{T} g^t\big(x^t - \Pi_{\mathcal{X}}[x^t + \delta]\big)\Big] = \mathbb{E}_2\Big[\sum_{t=1}^{T} (-\varepsilon G)\big(x^t - \Pi_{\mathcal{X}}[x^t + \delta]\big)\Big] \ge \mathbb{E}_2\Big[\sum_{t=1}^{T} \varepsilon\delta G\,\mathbb{I}\{x^t < 0\}\Big] = \varepsilon\delta G\,\big(T - \mathbb{E}_2[N_+]\big).$

Summing up the two inequalities, we get
$\frac{1}{2}\Big(\mathbb{E}_1\big[\mathrm{Reg}^T_{\mathrm{Proj},\delta}\big] + \mathbb{E}_2\big[\mathrm{Reg}^T_{\mathrm{Proj},\delta}\big]\Big) \ge \frac{1}{2}\Big(\varepsilon\delta G T + \varepsilon\delta G\big(\mathbb{E}_1[N_+] - \mathbb{E}_2[N_+]\big)\Big) \ge \frac{\varepsilon\delta G T}{2} - \frac{\varepsilon\delta G T \cdot \varepsilon\sqrt{(4\ln 2)T}}{2} \qquad \text{(by (2))}.$
Choosing $\varepsilon = \frac{1}{\sqrt{(16\ln 2)T}}$, we can lower bound the last expression by $\Omega(\delta G\sqrt{T})$. The theorem follows by noticing that $\frac{1}{2}\big(\mathbb{E}_1[\mathrm{Reg}^T_{\mathrm{Proj},\delta}] + \mathbb{E}_2[\mathrm{Reg}^T_{\mathrm{Proj},\delta}]\big)$ is the expected regret of the learner.

E.3.2 Proof of Proposition 1

Proof. Consider $f : [-1, 1] \to \mathbb{R}$ with $f(x) = -\frac{L}{2}x^2$, and let $f^t = f$ for all $t \in [T]$. Then any first-order method that satisfies the linear span assumption and is initialized at $x^1 = 0$ produces $x^t = 0$ for all $t \in [T]$. Its Φ^X_Proj(δ)-regret is thus $\sum_{t=1}^{T} \big(f(0) - f(\delta)\big) = \frac{\delta^2 L T}{2}$.

E.4 Improved Φ^X_Proj(δ)-Regret in the Game Setting and Proof of Theorem 9

By Theorem 7, any online algorithm suffers $\Omega(\sqrt{T})$ Φ^X_Proj(δ)-regret, even against linear loss functions. This lower bound, however, holds only in the adversarial setting. In this section, we show an improved $O(T^{1/4})$ individual Φ^X_Proj(δ)-regret bound, under a slightly stronger smoothness assumption (Assumption 2), in the game setting where all players interact using the same algorithm; previous results establish similarly improved external regret in this setting [Syr+15; CP20; DFG21; Ana+22a; Ana+22b; Far+22a]. This assumption is naturally satisfied by finite normal-form games and is also made in results about concave games [Far+22a].

Assumption 2. For every player $i \in [n]$, the utility $u_i(x)$ satisfies $\|\nabla_{x_i} u_i(x) - \nabla_{x_i} u_i(x')\| \le L\|x - x'\|$ for all $x, x' \in \mathcal{X}$.

We study the Optimistic Gradient (OG) algorithm [RS13], an optimistic variant of GD that has been shown to enjoy improved individual external regret guarantees in the game setting [Syr+15]. The OG algorithm initializes $w^0 \in \mathcal{X}$ arbitrarily and sets $g^0 = 0$. In each step $t \ge 1$, the algorithm plays $x^t$, receives feedback $g^t := \nabla f^t(x^t)$, and updates $w^t$, as follows:
$x^t = \Pi_{\mathcal{X}}\big[w^{t-1} - \eta g^{t-1}\big], \qquad w^t = \Pi_{\mathcal{X}}\big[w^{t-1} - \eta g^t\big]. \qquad \text{(OG)}$
We show that OG has $O(\sqrt{T})$ Φ^X_Proj(δ)-regret in the adversarial setting, and fast $O(T^{1/4})$ Φ^X_Proj(δ)-regret and convergence to an approximate Φ_Proj(δ)-equilibrium in games.

Theorem 8 (Adversarial Regret Bound for OG). Let $\delta > 0$ and $T \in \mathbb{N}$. For convex and $G$-Lipschitz loss functions $\{f^t : \mathcal{X} \to \mathbb{R}\}_{t \in [T]}$, the Φ^X_Proj(δ)-regret of (OG) with step size $\eta > 0$ is
$\mathrm{Reg}^T_{\mathrm{Proj},\delta} \le \frac{\delta D_{\mathcal{X}}}{\eta} + \eta \sum_{t=1}^{T} \|g^t - g^{t-1}\|^2 \le \frac{\delta D_{\mathcal{X}}}{\eta} + 4\eta G^2 T.$
Choosing step size $\eta = \frac{\sqrt{\delta D_{\mathcal{X}}}}{2G\sqrt{T}}$, we have $\mathrm{Reg}^T_{\mathrm{Proj},\delta} \le 4G\sqrt{\delta D_{\mathcal{X}} T}$.

Theorem 9 (Improved Individual Φ^X_Proj(δ)-Regret of OG in the Game Setting). In a $G$-Lipschitz, $L$-smooth (in the sense of Assumption 2) game, when all players employ OG with step size $\eta > 0$, then for each player $i$, every $\delta > 0$, and every $T \ge 1$, their individual Φ^{X_i}_Proj(δ)-regret, denoted $\mathrm{Reg}^{T,i}_{\mathrm{Proj},\delta}$, satisfies
$\mathrm{Reg}^{T,i}_{\mathrm{Proj},\delta} \le \frac{\delta D}{\eta} + \eta G^2 + 3nL^2G^2\eta^3 T.$
Choosing $\eta = \min\big\{\big(\delta D/(nL^2G^2T)\big)^{1/4},\ (\delta D)^{1/2}/G\big\}$, we have $\mathrm{Reg}^{T,i}_{\mathrm{Proj},\delta} \le 4(\delta D)^{3/4}(nL^2G^2T)^{1/4} + 2\sqrt{\delta D}\,G$. Furthermore, for any $\delta > 0$ and any $\varepsilon > 0$, the empirical distribution of played strategy profiles converges to an $(\varepsilon + \frac{\delta^2 L}{2})$-approximate Φ_Proj(δ)-equilibrium in $O(1/\varepsilon^{4/3})$ iterations.

E.4.1 Proof of Theorem 8

Proof. Fix any deviation $v$ with $\|v\| \le \delta$. Let us define $p^0 = w^0$ and $p^t = \Pi_{\mathcal{X}}[x^t - v]$.
Following standard analysis of OG [RS13], we have t=1 f t(xt) f t(pt) f t(xt), xt pt wt 1 pt 2 wt pt 2 + η gt gt 1 2 1 xt wt 2 + xt wt 1 2 wt 1 pt 2 1 wt 1 pt 1 2 + η gt gt 1 2 1 xt wt 1 2 (3) Now we apply a similar analysis from Theorem 3 to upper bound the term wt 1 pt 2 wt 1 pt 1 2: wt 1 pt 2 wt 1 pt 1 2 = pt 1 pt, 2wt 1 pt 1 pt = pt 1 pt, 2wt 1 2pt pt pt 1 2 = 2 pt 1 pt, v + 2 pt 1 pt, wt 1 v pt pt pt 1 2 = 2 pt 1 pt, v + 2 pt 1 pt, xt v pt + 2 pt 1 pt, wt 1 xt pt pt 1 2 2 pt 1 pt, v + xt wt 1 2, where in the last-inequality we use pt 1 pt, xt u pt 0 since pt = ΠX [xt v] and X is a compact convex set; we also use 2 a, b b2 a2. In the analysis above, unlike the analysis of GD where we drop the negative term pt pt 1 2, we use pt pt 1 2 to get a term xt wt 1 2 which can be canceled by the last term in (3). Now we combine the above two inequalities. Since the term xt wt 1 2 cancels out and 2 pt 1 pt, v telescopes, we get t=1 f t(xt) f t(pt) p0 p T , u t=1 η gt gt 1 2 δDX E.4.2 Proof of Theorem 9 In the analysis of Theorem 8 for the adversarial setting, the term gt gt 1 2 can be as large as 4G2. In the game setting where every player i employs OG, gt i ,i.e., xiui(x), depends on other players action xt i. Note that the change of the players actions xt xt 1 2 is only O(η2). Such stability of the updates leads to an improved upper bound on gt i gt 1 i 2 and hence also an improved O(T 1 4 ) ΦX Proj(δ)-regret for the player. Proof. Let us fix any player i [n] in the smooth game. In every step t, player i s loss function f t : Xi R is xiui(xt), determined by their utility function ui and all players actions xt. Therefore, their gradient feedback is gt = xiui(xt). For all t 2, we have gt gt 1 2 = ui(xt) ui(xt 1) 2 L2 xt xt 1 2 xt i xt 1 i 2 xt i wt i 2 + wt i wt 1 i 2 + wt 1 i xt 1 i 2 3n L2η2G2, where we use L-smoothness of the utility function ui in the first inequality; we use the update rule of OG and the fact that gradients are bounded by G in the last inequality. Applying the above inequality to the regret bound obtained in Theorem 8, the individual ΦX Proj(δ)- regret of player i is upper bounded by Reg T,i Proj,δ δD η + ηG2 + 3n L2G2η3T. Choosing η = min{(δD/(n L2G2T)) 1 4 , (δD) 1 2 /G}, we have Reg T,i Proj,δ 4(δD) 3 4 (n L2G2T) 1 4 + 2 δDG. Using Lemma 1, we have the empirical distribution of played strategy profiles converge to an (ε + δ2L 2 )-approximate ΦProj(δ))-equilibrium in O(1/ε 4 3 ) iterations. F Missing Details in Section 4.2 Algorithm 3: Conv(Φ)-regret minimization for Lipschitz smooth non-concave rewards Input: x1 X, K 2, a no-external-regret algorithm RΦ against linear reward over (Φ) Output: A Conv(Φ)-regret minimization algorithm over X 1 function NEXTSTRATEGY() 2 pt RΦ.NEXTSTRATEGY(). Note that pt is a distribution over Φ. 3 xk ϕpt(xk 1), for all 2 k K 4 return xt uniformly at random from {x1, . . . , x K}. 5 function OBSERVEREWARD( xut(xt)) 6 ut Φ( ) a linear reward over (Φ) with ut Φ(ϕ) = xut(xt), ϕ(xt) xt for all ϕ Φ. 7 RΦ.OBSERVEREWARD(ut Φ( )). F.1 Proof of Theorem 4 Proof Sketch. We adopt the framework in [SL07; GGM08] (as described in Section 3) with two main modifications. First, we utilize the L-smoothness of the utilities to transform the problem of external regret over (Φ) against non-concave rewards into a linear optimization problem. Second, we use the technique of fixed point in expectation" [Zha+24] to circumvent the intractable problem of finding a fixed point. Proof. 
For a sequence of strategies {xt}t [T ], its Conv(Φ)-regret is Reg T Conv(Φ) = max ϕ Conv(Φ) ut(ϕ(xt)) ut(xt) ) = max p (Φ) t=1 ut(ϕp(xt)) ut(ϕpt(xt)) | {z } I: external regret over (Φ) t=1 ut(ϕpt(xt)) ut(xt) | {z } II: approximation error of fixed point Bounding External Regret over (Φ) We can define a new reward function f t(p) := ut(ϕp(xt)) over p (Φ). Since ut is non-concave, the reward f t is also non-concave and it is computational intractable to minimize external regret. We use locality to avoid computational barrier. Here we use the fact that Φ = Φ(δ) contains only δ-local strategy modifications. Then by L-smoothness of ut, we know for any p (Φ) ut(ϕp(xt) ut(xt) ut(xt), ϕp(xt) xt ) L ϕp(xt) xt 2 δ2L Thus we can approximate the non-concave optimization problem by a linear optimization problem over (Φ) with only second-order error δ2L 2 . Here we use the notation a = b c to mean b c a b + c. ut(ϕp(xt) ut(xt) = ut(xt), ϕp(xt) xt δ2L ϕ Φ p(ϕ)ϕ(xt) xt + ϕ Φ p(ϕ) ut(xt), ϕ(xt) xt δ2L We can then instantiate the external regret RΦ as the Hedge algorithm over reward f t(p) = P ϕ Φ p(ϕ) ut(xt), ϕ(xt) xt and get t=1 ut(ϕp(xt)) ut(ϕpt(xt)) ϕ Φ (p(ϕ) pt(ϕ)) ut(xt), ϕ(xt) xt ) T log |Φ| + δ2LT, where we use the fact that ut(xt), ϕ(xt) xt ut(xt) ϕ(xt) xt Gδ. Bounding error due to sampling from a fixed point in expectation We choose x1 as an arbitrary point in X. Then we recursively apply ϕpt to get xk = ϕpt(xk 1) = X ϕ Φ pt(ϕ)ϕ(xk 1), 2 k K. We denote µt = Uniform{xk : 1 k K}. Then the strategy xt µt is sampled from µt. We have that µt is an approximate fixed-point in expectation / stationary distribution in the sense that Eµt ut(ϕpt(xt)) ut(xt) = 1 k=1 ut(ϕpt(xk) ut(xk)) K ut(ϕpt(x K)) ut(x1) Thanks to the boundedness of ut, we can use Hoeffding-Azuma s inequality to conclude that ut(ϕpt(xt)) ut(xt) 1 for any ε > 0. Combining the above with ε = p 8T log(1/β) and K = T, we get with probability at least 1 β, Reg T Conv(Φ) 2Gδ p T log |Φ| + δ2LT + 8T log(1/β) log |Φ| + p log(1/β) + δ2LT. Convergence to Φ-equilibrium If all players in a non-concave continuous game employ Algorithm 1, then we know for each player i, with probability 1 β n, its ΦXi-regret is upper bounded by log |ΦXi| + p log(n/β) + δ2LT. By a union bound over all n players, we get with probability 1 β, every player i s ΦXi-regret is upper bounded by 8 log |Φ+Xi| + p log(n/β)) + δ2LT. Now by Theorem 1, we know the empirical distribution of strategy profiles played forms an (ε + δ2L)-approximate Φ = Πn i=1ΦXiequilibrium, as long as T 128(G2δ2 log |ΦXi|+log(n/β)) ε2 iterations. G Missing details in Section 4.3 We introduce a natural set of local strategy modifications and the corresponding local equilibrium notion. Given any set of (possibly non-local) strategy modifications Ψ = {ψ : X X}, we define a set of local strategy modifications as follows: for δ DX and λ [0, 1], each strategy modification ϕλ,ψ interpolates the input strategy x with the modified strategy ψ(x): formally, ΦX Int,Ψ(δ) := {ϕλ,ψ(x) := (1 λ)x + λψ(x) : ψ Ψ, λ δ/DX } . Note that for any ψ Ψ and λ δ DX , we have ϕλ,ψ(x) x = λ x ψ(x) δ, respecting the locality constraint. The induced ΦX Int,Ψ(δ)-regret can be written as Reg T Int,Ψ,δ := maxψ Ψ,λ δ DX PT t=1 (f t(xt) f t((1 λ)xt + λψ(xt))). We now define the corresponding ΦInt,Ψ(δ)-equilibrium. Definition 9. Define ΦInt,Ψ(δ) = Πn j=1ΦXj Int,Ψj(δ). 
In a continuous game, a distribution σ over strategy profiles is an ε-approximate Φ_Int,Ψ(δ)-equilibrium if and only if for every player $i \in [n]$,
$\max_{\psi \in \Psi_i,\ \lambda \le \delta/D_{\mathcal{X}_i}} \mathbb{E}_{x \sim \sigma}\big[u_i\big((1-\lambda)x_i + \lambda\psi(x_i),\ x_{-i}\big)\big] \le \mathbb{E}_{x \sim \sigma}\big[u_i(x)\big] + \varepsilon.$
Intuitively speaking, when a correlation device recommends strategies to the players according to an ε-approximate Φ_Int,Ψ(δ)-equilibrium, no player can increase their utility by more than ε through a local deviation that interpolates their recommended strategy with a (possibly global) strategy modification ψ ∈ Ψ. The richness of Ψ determines the incentive guarantee provided by an ε-approximate Φ_Int,Ψ(δ)-equilibrium, as well as its computational complexity. When we choose Ψ to be the set of all possible strategy modifications, the corresponding notion of local equilibrium, which limits the gain a player can obtain by interpolating with any strategy, resembles that of a correlated equilibrium.

Computation of ε-approximate Φ_Int,Ψ(δ)-Equilibrium. By Lemma 1, computing an ε-approximate Φ_Int,Ψ(δ)-equilibrium reduces to minimizing Φ^X_Int,Ψ(δ)-regret against convex loss functions. We show that minimizing Φ^X_Int,Ψ(δ)-regret against convex loss functions further reduces to Ψ-regret minimization against linear loss functions.

Theorem 10. Let A be an algorithm with Ψ-regret $\mathrm{Reg}^T_{\Psi}(G, D_{\mathcal{X}})$ for linear and $G$-Lipschitz loss functions over $\mathcal{X}$. Then, for any $\delta > 0$, the Φ^X_Int,Ψ(δ)-regret of A for convex and $G$-Lipschitz loss functions over $\mathcal{X}$ is at most $\frac{\delta}{D_{\mathcal{X}}}\big[\mathrm{Reg}^T_{\Psi}(G, D_{\mathcal{X}})\big]^+$.

Proof. By definition and convexity of $f^t$, we get
$\max_{\phi \in \Phi^{\mathcal{X}}_{\mathrm{Int},\Psi}(\delta)} \sum_{t=1}^{T} \big(f^t(x^t) - f^t(\phi(x^t))\big) = \max_{\psi \in \Psi,\ \lambda \le \delta/D_{\mathcal{X}}} \sum_{t=1}^{T} \big(f^t(x^t) - f^t((1-\lambda)x^t + \lambda\psi(x^t))\big) \le \max_{\psi \in \Psi,\ \lambda \le \delta/D_{\mathcal{X}}} \lambda \sum_{t=1}^{T} \langle \nabla f^t(x^t),\ x^t - \psi(x^t) \rangle \le \frac{\delta}{D_{\mathcal{X}}}\Big[\max_{\psi \in \Psi} \sum_{t=1}^{T} \langle \nabla f^t(x^t),\ x^t - \psi(x^t) \rangle\Big]^+ \le \frac{\delta}{D_{\mathcal{X}}}\big[\mathrm{Reg}^T_{\Psi}(G, D_{\mathcal{X}})\big]^+.$

Figure 2: Illustration of ϕ_Proj,v(x) and ϕ_Beam,v(x).

Note that when $f^t$ is linear, the reduction is without loss. Thus, any worst-case $\Omega(r(T))$ lower bound for Ψ-regret implies an $\Omega(\frac{\delta}{D_{\mathcal{X}}} r(T))$ lower bound for Φ_Int,Ψ(δ)-regret. Moreover, for any set Ψ that admits efficient Ψ-regret minimization algorithms, such as swap transformations over the simplex, and more generally any set such that (i) all modifications in the set can be represented as linear transformations in some finite-dimensional space and (ii) fixed-point computation can be carried out efficiently for any such linear transformation [GGM08], we also get an efficient algorithm for computing an ε-approximate Φ_Int,Ψ(δ)-equilibrium in the first-order stationary regime.

CCE-like Instantiation. In the special case where Ψ contains only constant strategy modifications (i.e., modifications of the form ψ(x) = x' for all x, for some fixed strategy x'), we get a coarse correlated equilibrium (CCE)-like instantiation of local equilibrium, which limits the gain from interpolating with any fixed strategy. We denote the resulting set of local strategy modifications simply as Φ^X_Int. We can apply any no-external-regret algorithm for efficient Φ^X_Int-regret minimization and for the computation of an ε-approximate Φ_Int(δ)-equilibrium in the first-order stationary regime, as summarized in Theorem 5. The above Φ^X_Int(δ)-regret bound of $O(\sqrt{T})$ is derived for the adversarial setting. In the game setting, where each player employs the same algorithm, players may have substantially lower external regret [Syr+15; CP20; DFG21; Ana+22a; Ana+22b; Far+22a], but we need a slightly stronger smoothness assumption than Assumption 1. This assumption is naturally satisfied by finite normal-form games and is also made for results about concave games [Far+22a].
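To illustrate the reduction behind Theorem 10 in its CCE-like instantiation (Ψ = constant modifications), the sketch below is our own illustration under stated assumptions (the loss sequence, horizon, δ, and step size are arbitrary choices): it runs online gradient descent on linear losses and checks that the interpolation-based local regret is bounded by (δ/D_X) times the positive part of the external regret, as the convexity argument predicts.

```python
# Sketch of the CCE-like instantiation of Theorem 10 on X = [-1, 1] (so D_X = 2):
# for convex (here linear) losses, the interpolation-based local regret of any iterate
# sequence is at most (delta / D_X) * [external regret]_+ .  Losses, T, delta, and the
# step size are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
T, delta, D_X = 1000, 0.2, 2.0
grads = rng.uniform(-1.0, 1.0, size=T)             # linear losses f_t(x) = g_t * x

# online gradient descent iterates
eta, x, xs = 1.0 / np.sqrt(T), 0.0, []
for g in grads:
    xs.append(x)
    x = float(np.clip(x - eta * g, -1.0, 1.0))
xs = np.array(xs)

played = np.sum(grads * xs)
ext_regret = played - min(np.sum(grads * b) for b in np.linspace(-1, 1, 401))

lam = delta / D_X                                   # interpolation weight lambda <= delta / D_X
int_regret = max(
    played - np.sum(grads * ((1 - lam) * xs + lam * psi))   # interpolate toward a fixed strategy psi
    for psi in np.linspace(-1, 1, 401)
)
print(f"external regret {ext_regret:.3f}, Phi_Int({delta})-regret {int_regret:.3f}, "
      f"bound (delta/D_X)*[ext]_+ = {lam * max(ext_regret, 0):.3f}")
```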
Using Assumption 2 and Lemma 1, the no-regret learning dynamics of [Far+22a] that guarantees O(log T) individual external regret in concave games can be applied to smooth non-concave games so that the individual ΦX Int(δ)-regret of each player is at most O(log T)+ δ2LT 2 . This gives an algorithm with faster O(1/ε) convergence to an (ε + δ2L 2 )-approximate ΦInt(δ)-equilibrium than GD. H Beam-Search Local Strategy Modifications and Local Equilibria In Section 4.1 and Section 4.3, we have shown that GD achieves near-optimal performance for both ΦX Int(δ)-regret and ΦX Proj(δ)-regret. In this section, we introduce another natural set of local strategy modifications, ΦX Beam(δ), which is similar to ΦX Proj(δ). Specifically, the set ΦX Beam(δ) contains deviations that try to move as far as possible in a fixed direction (see Figure 2 for an illustration of the difference between ϕBeam,v(x) and ϕProj,v(x)): ΦX Beam(δ) := {ϕBeam,v(x) = x λ v : v Bd(δ), λ = max{λ : x λv X, λ [0, 1]}}. It is clear that ϕBeam,v(x) x v δ. We can similarly derive the notion of ΦX Beam-regret and (ε, ΦBeam(δ))-equilibrium. Surprisingly, we show that GD suffers linear ΦX Beam(δ)-regret (proof deferred to Appendix H.1). Theorem 11. For any δ, η < 1 2 and T 1, there exists a sequence of linear loss functions {f t : X [0, 1]2 R}t [T ] such that GD with step size η suffers Ω(δT) ΦX Beam(δ)-regret. H.1 Proof of Theorem 11 Let X R2 be a triangle region with vertices A = (0, 0), B = (1, 1), C = (δ, 0). Consider v = ( δ, 0). The initial point is x1 = (0, 0). The adversary will choose ℓt adaptively so that xt remains on the boundary of X and cycles clockwise (i.e., A B C A ). To achieve this, the adversary will repeat the following three phases: 1. Keep choosing ℓt = u BA (u BA denotes the unit vector in the direction of BA) until xt+1 reaches B. 2. Keep choosing ℓt = u CB until xt+1 reaches C. 3. Keep choosing ℓt = u AC until xt+1 reaches A. In Phase 1, xt AB. By the choice of v = ( δ, 0), we have xt ϕv(xt) = ( δ(1 xt,1), 0), and the instantaneous regret is δ(1 xt,1) In Phase 2, xt BC. By the choice of v = ( δ, 0), we have xt ϕv(xt) = (0, 0), and the instantaneous regret is 0. In Phase 3, xt CA. By the choice of v = ( δ, 0), we have xt ϕv(xt) = ( δ + xt,1, 0), and the instantaneous regret is δ + xt,1 0. In each cycle, the number of rounds in Phase 1 is of order Θ( 2 η ), the number of rounds in Phase 2 is between O( 1 2 η ), the number of rounds in Phase 3 is of order Θ( δ Therefore, the cumulative regret in each cycle is roughly η ( 0.5δ) = 0.5δ 0.5δ2 On the other hand, the number of cycles is no less than T η = Θ(ηT). Overall, the cumulative regret is at least 0.5δ 0.5δ2 η Θ(ηT) = Θ(δT) as long as δ < 0.5. I Hardness in the Global Regime In the first-order stationary regime δ p 2ε/L, (ε, δ)-local Nash equilibrium is intractable, and we have shown polynomial-time algorithms for computing the weaker notions of ε-approximate ΦInt(δ))- equilibrium and ε-approximate ΦProj(δ))-equilibrium. A natural question is whether correlation enables efficient computation of ε-approximate Φ(δ))-equilibrium when δ is in the global regime, i.e., δ = Ω( d). In this section, we prove both computational hardness and a query complexity lower bound for both notions in the global regime To prove the lower bound results, we only require a single-player game. 
The problem of computing an ε-approximate Φ(δ)-equilibrium becomes: given scalars ε, δ, G, L > 0 and a polynomial-time Turing machine Cf evaluating a G-Lipschitz and L-smooth function f : [0, 1]d [0, 1] and its gradient f : [0, 1]d Rd, we are asked to output a distribution σ that is an ε-approximate Φ(δ)-equilibrium or if such equilibrium does not exist. Hardness of finding ε-approximate ΦX Int(δ)-equilibria in the global regime When δ = d, which equals to the diameter D of [0, 1]d, then the problem of finding an ε-approximate ΦX Int(δ)- equilibrium is equivalent to finding a (ε, δ)-local minimum of f: assume σ is an ε-approximate ΦX Int(δ)-equilibrium of f, then there exists x [0, 1]d in the support of σ such that f(x) min x [0,1]d Bd(x ,δ) f(x ) ε. Then hardness of finding an ε-approximate ΦX Int(δ)-equilibrium follows from hardness of finding a (ε, δ)-local minimum of f [DSZ21]. The following Theorem is a corollary of Theorem 10.3 and 10.4 in [DSZ21]. Theorem 12 (Hardness of finding ε-approximate ΦX Int(δ)-equilibria in the global regime). In the worst case, the following two holds. Computing an ε-approximate ΦX Int(δ)-equilibrium for a game on X = [0, 1]d with G = d, L = d, ε 1 24, δ = d is NP-hard. Ω(2d/d) value/gradient queries are needed to determine an ε-approximate ΦX Int(δ)- equilibrium for a game on X = [0, 1]d with G = Θ(d15), L = Θ(d22), ε < 1, δ = Hardness of finding ε-approximate ΦX Proj(δ)-equilibria in the global regime Theorem 13 (Hardness of of finding ε-approximate ΦX Proj(δ)-equilibria in the global regime). In the worst case, the following two holds. Computing an ε-approximate ΦX Proj(δ)-equilibrium for a game on X = [0, 1]d with G = Θ(d15), L = Θ(d22), ε < 1, δ = d is NP-hard. Ω(2d/d) value/gradient queries are needed to determine an ε-approximate ΦX Proj(δ)- equilibrium for a game on X = [0, 1]d with G = Θ(d15), L = Θ(d22), ε < 1, δ = The hardness of computing ε-approximate ΦX Proj(δ)-equilibrium also implies a lower bound on ΦX Proj(δ)-regret in the global regime. Corollary 2 (Lower bound of ΦX Proj(δ)-regret against non-convex functions). In the worst case, the ΦX Proj(δ)-regret of any online algorithm is at least Ω(2d/d, T) even for loss functions f : [0, 1]d [0, 1] with G, L = poly(d) and δ = The proofs of Theorem 13 and Corollary 2 can be found in the next two sections. I.1 Proof of Theorem 13 We will reduce the problem of finding an ε-approximate ΦX Proj(δ)-equilibrium in smooth games to finding a satisfying assignment of a boolean function, which is NP-complete. Fact 1. Given only black-box access to a boolean formula ϕ : {0, 1}d {0, 1}, at least Ω(2d) queries are needed in order to determine whether ϕ admits a satisfying assignment x such that ϕ(x ) = 1. The term black-box access refers to the fact that the clauses of the formula are not given, and the only way to determine whether a specific boolean assignment is satisfying is by querying the specific binary string. Moreover, the problem of finding a satisfying assignment of a general boolean function is NP-hard. We revisit the construction of the hard instance in the proof of [DSZ21, Theorem 10.4] and use its specific structures. Given black-box access to a boolean formula ϕ as described in Fact 1, following [DSZ21], we construct the function fϕ(x) : [0, 1]d [0, 1] as follows: 1. for each corner v V = {0, 1}d of the [0, 1]d hypercube, we set fϕ(x) = 1 ϕ(x). 2. 
for the rest of the points x [0, 1]d/V , we set fϕ(x) = P v V Pv(x) fϕ(v) where Pv(x) are non-negative coefficients defined in [DSZ21, Definition 8.9]. The function fϕ satisfies the following properties: 1. if ϕ is not satisfiable, then fϕ(x) = 1 for all x [0, 1]d since fϕ(v) = 1 for all v V ; if ϕ has a satisfying assignment v , then fϕ(v ) = 0. 2. fϕ is Θ(d12)-Lipschitz and Θ(d25)-smooth. 3. for any point x [0, 1]d, the set V (x) := {v V : Pv(x) = 0} has cardinality at most d + 1 while P v V Pv(x) = 1; any value / gradient query of fϕ can be simulated by d + 1 queries on ϕ. In the case there exists a satisfying argument v , then fϕ(v ) = 0. Define the deviation e so that e[i] = 1 if v [i] = 0 and e[i] = 1 if v [i] = 1. It is clear that e = d = δ. By properties of projection on [0, 1]d, for any x [0, 1]d, we have Π[0,1]n[x v] = v . Then any ε-approximate ΦX Proj(δ)-equilibrium σ must include some x X with fϕ(x ) < 1 in the support, since ε < 1. In case there exists an algorithm A that computes an ε-approximate ΦX Proj(δ)-equilibrium, A must have queried some x with fϕ(x ) < 1. Since fϕ(x ) = P v V (x ) Pv(x )fϕ(v) < 1, there exists ˆv V (x ) such that fϕ(ˆv) = 0. Since |V (x )| d + 1, it takes addition d + 1 queries to find ˆv with fϕ(ˆv) = 0. By Fact 1 and the fact that we can simulate every value/gradient query of fϕ by d + 1 queries on ϕ, A makes at least Ω(2d/d) value/gradient queries. Suppose there exists an algorithm B that outputs an ε-approximate ΦX Proj(δ)-equilibrium σ in time T(B) for ε < 1 and δ = d. We construct another algorithm C for SAT that terminates in time T(B) poly(d). C: (1) given a boolean formula ϕ, construct fϕ as described above; (2) run B and get output σ (3) check the support of σ to find v {0, 1}d such that fϕ(v) = 0; (3) if finds v {0, 1}d such that fϕ(v) = 0, then ϕ is satisfiable, otherwise ϕ is not satisfiable. Since we can evaluate fϕ and fϕ in poly(d) time and the support of σ is smaller than T(B), the algorithm C terminates in time O(T(B) poly(d)). The above gives a polynomial time reduction from SAT to finding an ε-approximate ΦX Proj(δ)-equilibrium and proves the NP-hardness of the latter problem. I.2 Proof of Corollary 2 Let ϕ : {0, 1}d {0, 1} be a boolean formula and define fϕ : [0, 1]d [0, 1] the same as that in Theorem 13. We know fϕ is Θ(poly(d))-Lipschitz and Θ(poly(d))-smooth. Now we let the adversary pick fϕ each time. For any T O(2d/d), in case there exists an online learning algorithm with Reg T Proj,δ < T 2 , then σ := 1 T PT t=1 1xt is an ( 1 2, δ)-equilibrium. Applying Theorem 13 and the fact that in this case, Reg T Proj,δ is non-decreasing with respect to T concludes the proof. J Removing the D dependence for ΦX Proj-regret For the regime δ DX which we are more interested in, the lower bound in Theorem 7 is Ω(Gδ T) while the upper bound in Theorem 3 is O(G δDX T). They are not tight especially when DX δ. A natural question is: which of them is the tight bound? We conjecture that the lower bound is tight. In fact, for the special case where the feasible set X is a box, we have a way to obtain a DX -independent bound O(d 1 4 Gδ T), which is tight when d = 1. Below, we first describe the improved strategy in 1-dimension. Then we show how to extend it to the d-dimensional box setting. J.1 One-Dimensional Case In one-dimension, we assume that X = [a, b] for some b a 2δ (if b a 2δ, then our original bound in Theorem 3 is already of order Gδ T). 
We first investigate the case where f t(x) is a linear function, i.e., f t(x) = gtx for some gt [ G, G]. The key idea is that we will only select xt from the two intervals [a, a + δ] and [b δ, b], and never play xt (a + δ, b δ). To achieve so, we concatenate these two intervals, and run an algorithm in this region whose diameter is only 2δ. The key property we would like to show is that the regret is preserved in this modified problem. More precisely, given the original feasible set X = [a, b], we create a new feasible set Y = [ δ, δ] and apply our algorithm GD in this new feasible set. The loss function is kept as f t(x) = gtx. Whenever the algorithm for Y outputs yt [ δ, 0], we play xt = yt + a + δ in X; whenever it outputs yt (0, δ], we play xt = yt + b δ. Below we show that the regret is the same in these two problems. Notice that when yt 0, we have for any v [ δ, δ], xt ΠX [xt v] = xt max min xt v, b , a = xt max xt v, a (xt v = yt + a + δ v a + 2δ b always holds) = yt + a + δ max yt + a + δ v, a = yt max yt v, δ = yt max min yt v, δ , δ (yt v δ always holds) = yt ΠY[yt v] Similarly, when yt > 0, we can follow the same calculation and prove xt ΠX [xt v] = yt ΠY[yt v]. Thus, the regret in the two problems: gt xt ΠX [xt v] and gt yt ΠY[yt v] are exactly the same for any v. Finally, observe that the diameter of Y is only of order O(δ). Thus, the upper bound in Theorem 3 would give us an upper bound of O(G δ δT) = O(Gδ For convex f t, we run the algorithm above with gt = f t(xt). Then by convexity we have f t(xt) f t(ΠX [xt v]) gt(xt ΠX [xt v]) = gt(yt ΠY[yt v]), so the regret in the modified problem (which is O(Gδ T)) still serves as a regret upper bound for the original problem. J.2 d-Dimensional Box Case A d-dimensional box is of the form X = [a1, b1] [a2, b2] [ad, bd]. The box case is easy to deal with because we can decompose the regret into individual components in each dimension. Namely, we have f t(xt) f t(ΠX [xt v]) f t(xt) xt ΠX [xt v] i=1 gt i xt i ΠXi[xt i vi] where we define Xi = [ai, bi], gt = f t(xt), and use subscript i to indicate the i-th component of a vector. The last equality above is guaranteed by the box structure. This decomposition allows as to view the problem as d independent 1-dimensional problems. Now we follow the strategy described in Section J.1 to deal with individual dimensions (if bi ai < 2δ then we do not modify Xi; otherwise, we shrink Xi to be of length 2δ). Applying the analysis of Theorem 3 to each dimension, we get i=1 gt i xt i ΠXi[xt i vi] v2 i 2η + η t=1 (gt i)2 + |vi| 2δ (the diameter in each dimension is now bounded by 2δ) δ Pd i=1 |vi| . (by Cauchy-Schwarz, P Choosing the optimal η = d 1 4 δ G T , we get the regret upper bound of order O d 1 4 Gδ Neur IPS Paper Checklist The checklist is designed to encourage best practices for responsible machine learning research, addressing issues of reproducibility, transparency, research ethics, and societal impact. Do not remove the checklist: The papers not including the checklist will be desk rejected. The checklist should follow the references and follow the (optional) supplemental material. The checklist does NOT count towards the page limit. Please read the checklist guidelines carefully for information on how to answer these questions. For each question in the checklist: You should answer [Yes] , [No] , or [NA] . [NA] means either that the question is Not Applicable for that particular paper or the relevant information is Not Available. 
Please provide a short (1 2 sentence) justification right after your answer (even for NA). The checklist answers are an integral part of your paper submission. They are visible to the reviewers, area chairs, senior area chairs, and ethics reviewers. You will be asked to also include it (after eventual revisions) with the final version of your paper, and its final version will be published with the paper. The reviewers of your paper will be asked to use the checklist as one of the factors in their evaluation. While "[Yes] " is generally preferable to "[No] ", it is perfectly acceptable to answer "[No] " provided a proper justification is given (e.g., "error bars are not reported because it would be too computationally expensive" or "we were unable to find the license for the dataset we used"). In general, answering "[No] " or "[NA] " is not grounds for rejection. While the questions are phrased in a binary way, we acknowledge that the true answer is often more nuanced, so please just use your best judgment and write a justification to elaborate. All supporting evidence can appear either in the main paper or the supplemental material, provided in appendix. If you answer [Yes] to a question, in the justification please point to the section(s) where related material for the question can be found. IMPORTANT, please: Delete this instruction block, but keep the section heading Neur IPS paper checklist", Keep the checklist subsection headings, questions/answers and guidelines below. Do not modify the questions and only use the provided macros for your answers. Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: We state the problem setting and our main contributions in the abstract and introduction. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Our results provide efficient uncoupled algorithms to compute ε-approximate Φ-equilibria in the first-order stationary regime. We leave the computational complexity of finding ε-approximate Φ-equilibria beyond the first-order stationary regime as an open question. Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. 
The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: We provide assumptions and proofs for our theoretical results in the main body and the appendix. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [NA] Justification: This paper does not contain experimental results. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. 
In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: This paper does not contain experimental results. Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [NA] Justification: This paper does not contain experimental results. Guidelines: The answer NA means that the paper does not include experiments. 
The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [NA] Justification: This paper does not contain experimental results. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: This paper does not contain experimental results. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: We have reviewed the Neur IPS Code of Ethics and the current paper conforms the Neur IPS Code of Ethics. Guidelines: The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. 
10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [NA]
Justification: This is a purely theoretical paper, and we do not see any immediate societal impact.
Guidelines:
• The answer NA means that there is no societal impact of the work performed.
• If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
• Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
• The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate deepfakes faster.
• The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
• If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: This is a purely theoretical paper.
Guidelines:
• The answer NA means that the paper poses no such risks.
• Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
• Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
• We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited, and are the license and terms of use explicitly mentioned and properly respected?
Answer: [NA]
Justification: This is a purely theoretical paper.
Guidelines:
• The answer NA means that the paper does not use existing assets.
• The authors should cite the original paper that produced the code package or dataset.
• The authors should state which version of the asset is used and, if possible, include a URL.
• The name of the license (e.g., CC-BY 4.0) should be included for each asset.
• For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
• If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
• For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
• If this information is not available online, the authors are encouraged to reach out to the asset's creators.

13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: This is a purely theoretical paper.
Guidelines:
• The answer NA means that the paper does not release new assets.
• Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
• The paper should discuss whether and how consent was obtained from people whose asset is used.
• At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: This paper does not contain experiments.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
• Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
• According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: This paper does not contain experiments.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
• Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
• We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
• For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.
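As referenced in item 7 of the checklist, the following is a minimal, purely illustrative Python sketch of one common way to report error bars (the present paper contains no experiments, so this is not part of its contributions): a mean with the standard error of the mean computed in closed form, and a bootstrap percentile confidence interval for when Normality is not assumed. All names, such as run_scores, are hypothetical placeholders.

# Illustrative error-bar reporting sketch (hypothetical data; not from this paper).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-seed scores of a method over independent runs.
run_scores = np.array([0.71, 0.74, 0.69, 0.73, 0.72])

# Closed-form: mean +/- standard error of the mean (state explicitly which quantity is reported).
mean = run_scores.mean()
sem = run_scores.std(ddof=1) / np.sqrt(len(run_scores))
print(f"mean = {mean:.3f} +/- {sem:.3f} (standard error of the mean, 1-sigma)")

# Bootstrap: percentile confidence interval, which avoids assuming Normally distributed errors.
boot_means = np.array([
    rng.choice(run_scores, size=len(run_scores), replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")

Whichever method is used, the checklist asks that the choice (closed form, library call, or bootstrap), the source of variability (here, independent runs), and the interpretation of the interval be stated explicitly in the text.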