Are Equivariant Equilibrium Approximators Beneficial?

Zhijian Duan 1, Yunxuan Ma 1, Xiaotie Deng 1 2

1 Center on Frontiers of Computing Studies, School of Computer Science, Peking University, Beijing, China. 2 Center for Multi-Agent Research, Institute for AI, Peking University, Beijing, China. Correspondence to: Xiaotie Deng.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract

Recently, remarkable progress has been made by approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize the benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.

1. Introduction

The equivariant equilibrium property states that, given a Nash equilibrium (NE) solution of a game, the permuted solution is also an NE for the game whose action representation is permuted in the same way. The same property also holds for correlated equilibrium (CE) and coarse correlated equilibrium (CCE), as well as for the approximate solutions of all three solution concepts. In this paper, we are interested in understanding the equivariant equilibrium property in designing neural networks that predict equilibria from game payoffs, following recent approaches that design equivariant equilibrium approximators (Feng et al., 2021; Marris et al., 2022) for normal-form games. Informally, such equivariant approximators apply the same permutation to the output strategies (represented as vectors or tensors) when the input game representations (e.g., the game payoff tensors) are permuted. We provide a formal definition of equivariant equilibrium approximators in Section 4.

While equivariant approximators have achieved empirical success, little work has theoretically discussed whether they are beneficial. We theoretically characterize the benefits and limitations of equivariant NE, CE, and CCE approximators. For the benefits, we first show that equivariant approximators enjoy better generalizability, where we evaluate the approximators using the maximum exploitability (Lockhart et al., 2019; Goktas & Greenwald, 2022) over all players. To get such a result, we derive the generalization bounds and the sample complexities of the NE, CE, and CCE approximators: the generalization bounds offer confidence intervals on the expected testing approximations based on the empirical training approximations; the sample complexities describe how many training samples the equilibrium approximators need to achieve desirable generalizability. Both the generalization bounds and the sample complexities involve covering numbers (Shalev-Shwartz & Ben-David, 2014), which measure the representativeness of the approximators' function classes.
We then prove that the equivariant approximators have lower covering numbers than general models, and therefore lower generalization bounds and sample complexities. Additionally, we show that the equivariant approximators can achieve better approximation when the payoff distribution is permutation-invariant.

As for the limitations, we find the equivariant approximators unable to find all the equilibria of some normal-form games. Such a result is caused by the limited representativeness of the equivariant approximators' function class. Besides, we find that equivariant NE approximators may lose social welfare. Specifically, in an example we construct, while the maximum NE social welfare is significant, the maximum social welfare of the NEs that equivariant NE approximators could find can be arbitrarily close to zero. Such a negative result inspires us to balance generalizability and social welfare when we design the approximator architectures.

In summary, we make the following contributions:
- We theoretically characterize the benefits of equivariant equilibrium approximators: compared to general approximators, equivariant ones offer improved generalizability and, in some instances, achieve better approximation performance.
- We demonstrate a limitation of equivariant equilibrium approximators through constructive examples: they cannot identify all equilibria in some games.
- We illustrate that equivariant NE approximators may find NEs with arbitrarily small maximum social welfare in extreme cases, whereas the actual maximum social welfare achievable by NEs can be significantly larger.

The rest of our paper is organized as follows: In Section 2 we discuss further related work. In Section 3 we introduce the preliminaries of game theory and equilibrium approximators. In Section 4 we formally define the equivariance of equilibrium approximators. We present our theoretical analysis of benefits in Section 5 and of limitations in Section 6. We conclude and point out future work in Section 7.

2. Related Work

Solving (approximate) NE, CE, and CCE for a single game has been extensively studied in the literature (Fudenberg et al., 1998; Cesa-Bianchi & Lugosi, 2006; Tsaknakis & Spirakis, 2008; Fu & Lin, 2020; Deligkas et al., 2022). However, in many practical scenarios and multi-agent learning algorithms, there is a need to solve many similar games (Marris et al., 2021; Liu et al., 2022). For instance, in repeated traffic routing games (Sessa et al., 2020), the payoffs of the games depend on the capacity of the underlying network, which can vary with time and weather conditions. In repeated sponsored search auctions, advertisers value different keywords based on the current marketing environment (Nekipelov et al., 2015). In multi-agent reinforcement learning (Deng et al., 2023), algorithms such as Nash Q-learning (Hu & Wellman, 2003), Correlated-Q learning (Greenwald et al., 2003), V-learning (Jin et al., 2022) and PSRO (Lanctot et al., 2017) need to solve an NE, CE or CCE of a normal-form game in every update step.

In these settings, it is preferable to accelerate game solving by function approximation: Duan et al. (2023) propose the construction of a neural network for approximating NE in normal-form games and analyze the PAC learnability of such an NE approximator; Marris et al. (2022) introduce a neural equilibrium approximator to approximate CE and CCE in n-player normal-form games;
Feng et al. (2021) propose a neural NE approximator within the context of PSRO (Lanctot et al., 2017); Wu & Lisser (2022) design a CNN-based NE approximator specifically for zero-sum bimatrix games. Differentiable approximators have also been developed to learn QREs (Ling et al., 2018), NE in chance-constrained games (Wu & Lisser, 2023), and an opponent's strategy (Hartford et al., 2016).

Equivariance is an ideal property of an equilibrium approximator. After formally defining equivariant approximators, we discuss the related literature in Section 4.2.

3. Preliminary

In this section, we introduce the preliminaries and notations of our paper. We also provide a brief introduction to equilibrium approximators.

3.1. Game Theory

Normal-Form Game. Let a normal-form game with joint payoff $u$ be $\Gamma_u = (n, A, u)$, in which $n \ge 2$ is the number of players. Each player is represented by an index $i \in [n]$. $A = \prod_{i \in [n]} A_i$ is the product action space of all players, where $A_i = \{1, 2, \dots, m_i\}$. For player $i \in [n]$, let $a_i \in A_i$ be a specific action of $i$ (an action is also referred to as a pure strategy). A joint action $a = (a_1, a_2, \dots, a_n) \in A$ represents one play of the game in which player $i$ takes action $a_i$. The action space $A$ is a Cartesian product that contains all possible joint actions, with $|A| = \prod_{i \in [n]} |A_i| = \prod_{i \in [n]} m_i$. $u = (u_i)_{i \in [n]}$ is the joint payoff or utility of the game. $u_i$ is an $n$-dimensional tensor (or matrix if $n = 2$) describing player $i$'s payoff for each joint action. In our paper, following previous literature (Tsaknakis & Spirakis, 2008; Deligkas et al., 2022), we normalize all elements of the payoff into $[0, 1]$.

A joint (mixed) strategy is a distribution over $A$. Let $\sigma = \prod_{i \in [n]} \sigma_i$ be a product strategy and $\pi \in \Delta_A$ be a joint (possibly correlated) strategy. Denote by $\pi_i$ the marginal strategy of player $i$ in $\pi$. The expected utility of player $i$ under $\pi$ is
$$u_i(\pi) = \mathbb{E}_{a \sim \pi}[u_i(a)] = \sum_{a \in A} \pi(a) u_i(a).$$
Besides, on behalf of player $i$, the other players' joint strategy is denoted as $\pi_{-i}$, and similarly for $a_{-i}$ and $\sigma_{-i}$.

Nash Equilibrium (NE). We say a product strategy $\sigma^* = (\sigma^*_1, \sigma^*_2, \dots, \sigma^*_n)$ is an NE if each player's strategy is a best response given the strategies of the others, i.e.,
$$u_i(\sigma_i, \sigma^*_{-i}) \le u_i(\sigma^*_i, \sigma^*_{-i}), \quad \forall i \in [n], \forall \sigma_i. \quad \text{(NE)}$$
Computing an NE even for general 2-player or 3-player games is PPAD-hard (Chen et al., 2009; Daskalakis et al., 2009), which leads to research on approximate solutions. For arbitrary $\epsilon > 0$, we say a product strategy $\hat\sigma$ is an $\epsilon$-approximate Nash equilibrium ($\epsilon$-NE) if no one can achieve more than $\epsilon$ utility gain by deviating from her current strategy. Formally,
$$u_i(\sigma_i, \hat\sigma_{-i}) \le u_i(\hat\sigma_i, \hat\sigma_{-i}) + \epsilon, \quad \forall i \in [n], \forall \sigma_i. \quad (\epsilon\text{-NE})$$
The definition of $\epsilon$-NE reflects the idea that players might not be willing to deviate from their strategies when the amount of utility they could gain by doing so is tiny (not more than $\epsilon$).

Coarse Correlated Equilibrium (CCE). We say a joint (possibly correlated) strategy $\pi^*$ is a CCE if no player can receive a higher payoff by acting independently, i.e.,
$$u_i(\sigma_i, \pi^*_{-i}) \le u_i(\pi^*), \quad \forall i \in [n], \forall \sigma_i \in \Delta_{A_i}, \quad \text{(CCE)}$$
and we say $\hat\pi$ is an $\epsilon$-approximate coarse correlated equilibrium ($\epsilon$-CCE) for $\epsilon > 0$ if
$$u_i(\sigma_i, \hat\pi_{-i}) \le u_i(\hat\pi) + \epsilon, \quad \forall i \in [n], \forall \sigma_i \in \Delta_{A_i}. \quad (\epsilon\text{-CCE})$$
The difference between NE and CCE is that in an NE, players execute their strategies individually in a decentralized way, while in a CCE, players' strategies are possibly correlated. A standard technique to correlate the strategy is sending each player a signal from a centralized controller (Shoham & Leyton-Brown, 2008).
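Before moving on to CE, the following minimal sketch (in NumPy, using a hypothetical 2x2 coordination game; not part of the paper) evaluates the expected utility $u_i(\pi)$ and the deviation gains appearing in the ($\epsilon$-)NE condition above.

```python
import numpy as np

# Hypothetical 2x2 coordination game with payoffs normalized to [0, 1]: u1 = u2 = identity.
u1 = np.eye(2)
u2 = np.eye(2)

def expected_utility(ui, pi):
    # u_i(pi) = sum_{a in A} pi(a) * u_i(a), with pi a joint distribution over A = A1 x A2.
    return float(np.sum(pi * ui))

# A product strategy sigma = (sigma1, sigma2) induces the joint distribution outer(sigma1, sigma2).
sigma1 = np.array([0.5, 0.5])
sigma2 = np.array([0.5, 0.5])
pi = np.outer(sigma1, sigma2)

# Deviation gains of the (epsilon-)NE condition: a best pure deviation suffices.
gain1 = np.max(u1 @ sigma2) - expected_utility(u1, pi)   # player 1: max_{a1} u1(a1, sigma2) - u1(sigma)
gain2 = np.max(sigma1 @ u2) - expected_utility(u2, pi)   # player 2: max_{a2} u2(sigma1, a2) - u2(sigma)
print(gain1, gain2)  # both 0.0: the uniform product strategy is an exact (mixed) NE of this game
```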
Correlated Equilibrium (CE). CE is similar to CCE, except that in a CE, each player can observe her recommended action before she acts. Thus, player $i$ deviates from her strategy through a strategy modification $\phi_i : A_i \to A_i$, which maps actions in $A_i$ to possibly different actions in $A_i$. Based on strategy modification, we say a joint (possibly correlated) strategy $\pi^*$ is a CE if
$$\sum_{a \in A} \pi^*(a) u_i(\phi_i(a_i), a_{-i}) \le u_i(\pi^*), \quad \forall i \in [n], \forall \phi_i, \quad \text{(CE)}$$
and a joint strategy $\hat\pi$ is an $\epsilon$-approximate correlated equilibrium ($\epsilon$-CE) for $\epsilon > 0$ if
$$\sum_{a \in A} \hat\pi(a) u_i(\phi_i(a_i), a_{-i}) \le u_i(\hat\pi) + \epsilon, \quad \forall i \in [n], \forall \phi_i. \quad (\epsilon\text{-CE})$$
Note that for a finite $n$-player normal-form game, at least one NE, CE, and CCE must exist. This is because an NE always exists (Nash et al., 1950) and NE $\subseteq$ CE $\subseteq$ CCE.

Equilibrium Approximation. To evaluate the quality of a joint strategy as an approximation of an equilibrium, we define approximation based on exploitability (Lockhart et al., 2019; Goktas & Greenwald, 2022).

Definition 3.1 (Exploitability and Approximation). Given a joint strategy $\pi$, the exploitability (or regret) $\mathcal{E}_i(\pi, u)$ of player $i$ is the maximum payoff gain of $i$ by deviating from her current strategy, i.e.,
$$\mathcal{E}_i(\pi, u) := \max_{\sigma'_i} u_i(\sigma'_i, \pi_{-i}) - u_i(\pi) = \max_{a'_i} u_i(a'_i, \pi_{-i}) - u_i(\pi),$$
and the exploitability under strategy modification $\mathcal{E}^{CE}_i(\pi, u)$ of player $i$ is the maximum payoff gain of $i$ by deviating through strategy modification, i.e.,
$$\mathcal{E}^{CE}_i(\pi, u) := \max_{\phi_i} \sum_{a \in A} \pi(a) u_i(\phi_i(a_i), a_{-i}) - u_i(\pi).$$
The equilibrium approximation is defined as the maximum exploitability over all players [1], i.e.,
$$\mathcal{E}(\pi, u) := \begin{cases} \max_{i \in [n]} \mathcal{E}_i(\pi, u), & \text{for NE and CCE}, \\ \max_{i \in [n]} \mathcal{E}^{CE}_i(\pi, u), & \text{for CE}. \end{cases}$$

Based on approximation, we can restate the definitions of the solution concepts. A product strategy $\sigma$ is an NE of game $\Gamma_u$ if $\mathcal{E}(\sigma, u) = 0$ and is an $\epsilon$-NE if $\mathcal{E}(\sigma, u) \le \epsilon$. A joint strategy $\pi$ is a (C)CE of $\Gamma_u$ if $\mathcal{E}(\pi, u) = 0$ and is an $\epsilon$-(C)CE if $\mathcal{E}(\pi, u) \le \epsilon$.

3.2. Equilibrium Approximator

The equilibrium approximators, including NE, CE, and CCE approximators, aim to predict solution concepts from game representations. In our paper, we fix the number of players $n$ and the action space $A$. We denote by $U$ the set of all possible game payoffs. The NE approximator $f^{NE} : U \to \prod_{i \in [n]} \Delta_{A_i}$ maps a game payoff to a product strategy, where $f^{NE}(u)_i \in \Delta_{A_i}$ is player $i$'s predicted strategy. We define $\mathcal{F}^{NE}$ as the function class of the NE approximator. Similarly, we define the (C)CE approximator as $f^{(C)CE} : U \to \Delta_A$ and the (C)CE approximator class as $\mathcal{F}^{(C)CE}$.

An equilibrium approximator can be learned through machine learning paradigms based on empirical data. For instance, Feng et al. (2021) learn the NE approximator using the game payoffs generated in the learning process of PSRO, and optimize the approximator by gradient descent on exploitability. Marris et al. (2022) learn the CE and CCE approximators using games generated i.i.d. from a manually designed distribution, and optimize the approximators using a maximum welfare minimum relative entropy loss. Such a loss balances the equilibrium approximation, the social welfare, and the relative entropy of the joint strategy.

[1] A similar metric of equilibrium approximation is called the Nikaido-Isoda function (Nikaidô & Isoda, 1955) or NASHCONV (Lockhart et al., 2019), which is the sum of exploitability over all players, i.e., $\sum_{i \in [n]} \mathcal{E}_i(\pi, u)$.
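As an illustration of Definition 3.1, the sketch below (NumPy; two-player helpers written for this paper's notation but not taken from it) computes $\mathcal{E}_i$, $\mathcal{E}^{CE}_i$, and the resulting approximation for a joint strategy represented as a matrix over $A_1 \times A_2$.

```python
import numpy as np

def approximation_ne_cce(pi, u1, u2):
    # E(pi, u) = max_i E_i(pi, u): best pure deviation against the opponent's marginal.
    v1, v2 = np.sum(pi * u1), np.sum(pi * u2)
    e1 = np.max(u1 @ pi.sum(axis=0)) - v1      # player 1 deviates; pi_{-1} = column marginal
    e2 = np.max(pi.sum(axis=1) @ u2) - v2      # player 2 deviates; pi_{-2} = row marginal
    return max(e1, e2)

def approximation_ce(pi, u1, u2):
    # E^CE_i maximizes over strategy modifications phi_i; the best phi_i can be chosen
    # independently for each recommended action, hence the column-wise max below.
    v1, v2 = np.sum(pi * u1), np.sum(pi * u2)
    e1 = np.sum(np.max(u1 @ pi.T, axis=0)) - v1
    e2 = np.sum(np.max(u2.T @ pi, axis=0)) - v2
    return max(e1, e2)

u1 = u2 = np.eye(2)                            # the coordination game from the earlier sketch
uniform = np.full((2, 2), 0.25)                # product of uniform strategies
anti = np.array([[0.0, 0.5], [0.5, 0.0]])      # correlated mass on the two mismatching outcomes
print(approximation_ne_cce(uniform, u1, u2))   # 0.0: the uniform product strategy is an NE/CCE
print(approximation_ne_cce(anti, u1, u2))      # 0.5: 'anti' is far from being a CCE
print(approximation_ce(anti, u1, u2))          # 1.0: and even further from being a CE
```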
Additionally, another simple way to learn an NE or CCE approximator is to apply minibatch stochastic gradient descent (SGD) on the approximation. Specifically, we denote by $w \in \mathbb{R}^d$ the $d$-dimensional parameter of the approximator, such as the weights of the neural network. We can optimize $w$ by the standard minibatch SGD algorithm on the approximation (see Algorithm 1).

Algorithm 1: Example: learning an NE/CCE approximator via minibatch SGD
1: Input: training set $S$
2: Parameters: number of iterations $T > 0$, batch size $B > 0$, learning rate $\eta > 0$, initial parameters $w_0 \in \mathbb{R}^d$ of the approximator model.
3: for $t = 0$ to $T$ do
4:   Receive minibatch $S_t = \{u^{(1)}, \dots, u^{(B)}\} \subseteq S$
5:   Compute the empirical average approximation of $S_t$:
6:     $L_{S_t}(f_{w_t}) \leftarrow \frac{1}{B} \sum_{i=1}^{B} \mathcal{E}(f_{w_t}(u^{(i)}), u^{(i)})$
7:   Update model parameters:
8:     $w_{t+1} \leftarrow w_t - \eta \nabla_{w_t} L_{S_t}(f_{w_t})$
9: end for
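Below is a minimal runnable sketch of Algorithm 1 for two-player games. It assumes a small MLP approximator, i.i.d. uniform payoffs, and $m$ actions per player; all architectural choices are illustrative rather than those of the papers cited above, and PyTorch's autograd replaces the explicit gradient step.

```python
import torch
import torch.nn as nn

m = 4  # illustrative number of actions per player

class MLPApproximator(nn.Module):
    """Maps the flattened payoff tensors (u1, u2) to a product strategy (sigma1, sigma2)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(2 * m * m, 64), nn.ReLU(), nn.Linear(64, 2 * m))
    def forward(self, u):                       # u: (batch, 2, m, m)
        logits = self.body(u.flatten(1))
        return torch.softmax(logits[:, :m], -1), torch.softmax(logits[:, m:], -1)

def ne_approximation(u, s1, s2):
    # E(sigma, u) = max_i [ max_{a_i} u_i(a_i, sigma_{-i}) - u_i(sigma) ], batched.
    u1, u2 = u[:, 0], u[:, 1]
    v1 = torch.einsum('bi,bij,bj->b', s1, u1, s2)
    v2 = torch.einsum('bi,bij,bj->b', s1, u2, s2)
    br1 = torch.einsum('bij,bj->bi', u1, s2).max(dim=1).values
    br2 = torch.einsum('bi,bij->bj', s1, u2).max(dim=1).values
    return torch.maximum(br1 - v1, br2 - v2)

model, T, B, eta = MLPApproximator(), 500, 32, 0.5
opt = torch.optim.SGD(model.parameters(), lr=eta)
S = torch.rand(1024, 2, m, m)                   # training set of i.i.d. payoffs in [0, 1]
for t in range(T):
    St = S[torch.randint(0, len(S), (B,))]      # minibatch of size B
    s1, s2 = model(St)
    loss = ne_approximation(St, s1, s2).mean()  # empirical average approximation L_{S_t}
    opt.zero_grad(); loss.backward(); opt.step()
```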
4. Equivariant Equilibrium Approximator

In this section, we introduce the equivariance of equilibrium approximators and how we apply orbit averaging (Elesedy & Zaidi, 2021) to construct equivariant approximators. Equivariant approximators have been developed in many works (Hartford et al., 2016; Feng et al., 2021; Marris et al., 2022; Wu & Lisser, 2022), which we discuss later.

4.1. Definition of Equivariance

We first define the permutation of a game. Let $\rho_i : A_i \to A_i$ be a permutation function of player $i$, which is a bijection from $A_i$ to itself. Define $G_i \ni \rho_i$ as the class of permutation functions of player $i$, which forms a group.

Definition 4.1 (Permutation of a game). For a normal-form game $\Gamma_u = (n, u, A)$, we define the $\rho_i$-permutation of the payoff tensor $u$ as $\rho_i u = (\rho_i u_j)_{j \in [n]}$, in which
$$(\rho_i u_j)(a_i, a_{-i}) = u_j(\rho_i^{-1}(a_i), a_{-i}), \quad \forall a \in A.$$
We also define the $\rho_i$-permutation of a joint strategy $\pi$ as $\rho_i \pi$, where
$$(\rho_i \pi)(a_i, a_{-i}) = \pi(\rho_i^{-1}(a_i), a_{-i}), \quad \forall a \in A,$$
and the $\rho_i$-permutation of a product strategy $\sigma$ as $\rho_i \sigma = (\rho_i \sigma_j)_{j \in [n]}$, where for all $a_j \in A_j$,
$$(\rho_i \sigma_j)(a_j) = \begin{cases} \sigma_j(a_j), & \text{if } j \ne i, \\ \sigma_i(\rho_i^{-1} a_i), & \text{if } j = i. \end{cases}$$

Equivariant NE Approximator. Considering the relationship between the $\rho_i$-permuted game and the $\rho_i$-permuted product strategy, we have the following result for NE:

Lemma 4.2. In a normal-form game $\Gamma_u = (n, u, A)$, for an arbitrary player $i \in [n]$ and any ($\epsilon$-)NE strategy $\sigma = (\sigma_i, \sigma_{-i})$, $\rho_i\sigma = (\rho_i\sigma_i, \sigma_{-i})$ is also an ($\epsilon$-)NE of the $\rho_i$-permuted game $\Gamma_{\rho_i u}$.

Lemma 4.2 shows that the ordering of actions is irrelevant to NE approximation. Based on it, we can summarize two ideal properties for NE approximators, defined as follows:

Definition 4.3 (Player-Permutation-Equivariance, PPE). We say an NE approximator $f^{NE}$ is player $i$ permutation-equivariant ($i$-PE) if for an arbitrary permutation function $\rho_i \in G_i$ we have
$$f^{NE}(\rho_i u)_i = \rho_i f^{NE}(u)_i. \quad (i\text{-PE})$$
Moreover, we say $f^{NE}$ is player-permutation-equivariant (PPE) if $f^{NE}$ satisfies $i$-PE for all players $i \in [n]$.

Definition 4.4 (Opponent-Permutation-Invariance, OPI). We say an NE approximator $f^{NE}$ is opponent $i$ permutation-invariant ($i$-PI) if for any other player $j \in [n] \setminus \{i\}$ and an arbitrary permutation function $\rho_i \in G_i$ we have
$$f^{NE}(\rho_i u)_j = f^{NE}(u)_j, \quad \forall j \ne i, \quad (i\text{-PI})$$
and we say $f^{NE}$ is opponent-permutation-invariant (OPI) if $f^{NE}$ satisfies $i$-PI for all players $i \in [n]$.

Equivariant (C)CE Approximator. Considering the relationship between the $\rho_i$-permuted game and the $\rho_i$-permuted joint strategy, we have a similar result for CE and CCE:

Lemma 4.5. In a normal-form game $\Gamma_u = (n, u, A)$, for an arbitrary player $i \in [n]$ and any ($\epsilon$-)CE or ($\epsilon$-)CCE strategy $\pi$, $\rho_i\pi$ is also an ($\epsilon$-)CE or ($\epsilon$-)CCE of the $\rho_i$-permuted game $\Gamma_{\rho_i u}$.

Inspired by Lemma 4.5, we can also summarize an ideal property for CE and CCE approximators, defined as follows.

Definition 4.6 (Permutation-Equivariance, PE). We say a (C)CE approximator $f^{(C)CE}$ is player $i$ permutation-equivariant ($i$-PE) if for any permutation function $\rho_i \in G_i$ we have $f^{(C)CE}(\rho_i u) = \rho_i f^{(C)CE}(u)$, and we say $f^{(C)CE}$ is permutation-equivariant (PE) if $f^{(C)CE}$ satisfies $i$-PE for all players $i \in [n]$.

4.2. Equivariant Approximators in the Literature

For two-player games, Feng et al. (2021) propose an MLP-based NE approximator that satisfies both PPE and OPI for zero-sum games. Additionally, they also design a Conv1d-based NE approximator that satisfies PPE only. Hartford et al. (2016) give a PPE approximator to predict players' strategies. The traditional algorithms of Tsaknakis & Spirakis (2008) and Deligkas et al. (2022), which approximate NE by optimization, are also PPE and OPI with respect to the payoffs and the initial strategies. For n-player general games, Marris et al. (2022) provide a permutation-equivariant approximator to approximate CE and CCE. Equivariant architectures are also adopted in optimal auction design (Rahme et al., 2021; Duan et al., 2022; Ivanov et al., 2022), and Qin et al. (2022) theoretically characterize the benefits of permutation-equivariance in auction mechanisms. We follow the rough idea of Qin et al. (2022) when analyzing the benefits of equivariant equilibrium approximators.

4.3. Orbit Averaging

Orbit averaging is a well-known method to enforce equivariance or invariance of a function (Schulz-Mirbach, 1994). It averages the inputs of a function over the orbit of a group (e.g., the permutation group in our paper).

Orbit Averaging for NE Approximators. For an NE approximator $f^{NE}$ and any player $i \in [n]$, we can construct an $i$-PI or $i$-PE NE approximator by averaging with respect to all the permutations of player $i$. Specifically, we construct an $i$-PI NE approximator by the operator $\mathcal{O}_i$ with
$$(\mathcal{O}_i f^{NE})(u)_j = \begin{cases} f^{NE}(u)_i, & \text{if } j = i, \\ \frac{1}{|A_i|!} \sum_{\rho_i \in G_i} f^{NE}(\rho_i u)_j, & \text{otherwise}, \end{cases}$$
and we construct an $i$-PE NE approximator by the operator $\mathcal{P}_i$ with
$$(\mathcal{P}_i f^{NE})(u)_j = \begin{cases} \frac{1}{|A_i|!} \sum_{\rho_i \in G_i} \rho_i^{-1} f^{NE}(\rho_i u)_i, & \text{if } j = i, \\ f^{NE}(u)_j, & \text{otherwise}. \end{cases}$$

Lemma 4.7. $\mathcal{O}_i f^{NE}$ is $i$-PI and $\mathcal{P}_i f^{NE}$ is $i$-PE. Specially, if $f^{NE}$ is already $i$-PI or $i$-PE, then we have $\mathcal{O}_i f^{NE} = f^{NE}$ or $\mathcal{P}_i f^{NE} = f^{NE}$, respectively.

To construct a PPE or OPI NE approximator, we composite the operators with respect to all players. Letting $\mathcal{O} = \mathcal{O}_1 \circ \mathcal{O}_2 \circ \dots \circ \mathcal{O}_n$ and $\mathcal{P} = \mathcal{P}_1 \circ \mathcal{P}_2 \circ \dots \circ \mathcal{P}_n$, we get the following corollary:

Lemma 4.8. $\mathcal{O}f^{NE}$ is OPI and $\mathcal{P}f^{NE}$ is PPE. If $f^{NE}$ is already OPI or PPE, we have $\mathcal{O}f^{NE} = f^{NE}$ or $\mathcal{P}f^{NE} = f^{NE}$, respectively.

Furthermore, we can also compose $\mathcal{P} \circ \mathcal{O}$ to construct an NE approximator with both PPE and OPI.

Orbit Averaging for (C)CE Approximators. For a CE or CCE approximator $f$, we define the $\mathcal{Q}_i$-projection for player $i \in [n]$ to construct an $i$-PE approximator, which averages with respect to all the permutations of player $i$:
$$(\mathcal{Q}_i f^{(C)CE})(u) = \frac{1}{|A_i|!} \sum_{\rho_i \in G_i} \rho_i^{-1} f^{(C)CE}(\rho_i u).$$
Similarly, we define $\mathcal{Q} = \mathcal{Q}_1 \circ \mathcal{Q}_2 \circ \dots \circ \mathcal{Q}_n$ as the composite operator.

Lemma 4.9. $\mathcal{Q}_i f^{(C)CE}$ is $i$-PE and $\mathcal{Q}f^{(C)CE}$ is PE. Specifically, if $f^{(C)CE}$ is already $i$-PE or PE, then we have $\mathcal{Q}_i f^{(C)CE} = f^{(C)CE}$ or $\mathcal{Q}f^{(C)CE} = f^{(C)CE}$, respectively.

Combining Lemma 4.7, Lemma 4.8 and Lemma 4.9, we directly obtain the following corollary.

Corollary 4.10. $\mathcal{O}^2 = \mathcal{O}$, $\mathcal{P}^2 = \mathcal{P}$, $\mathcal{Q}^2 = \mathcal{Q}$.

The benefit of using orbit averaging is shown in the following lemma:

Lemma 4.11. Denote by $\mathcal{X}$ an idempotent operator, i.e., $\mathcal{X}^2 = \mathcal{X}$ (e.g., $\mathcal{O}$, $\mathcal{P}$ or $\mathcal{Q}$). For a function class $\mathcal{F}$ of NE, CE or CCE approximators, let $\mathcal{F}_\mathcal{X}$ be any subset of $\mathcal{F}$ that is closed under $\mathcal{X}$; then $\mathcal{X}\mathcal{F}_\mathcal{X}$ is the largest subset of $\mathcal{F}_\mathcal{X}$ that is invariant under $\mathcal{X}$.
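Before discussing what these projections preserve, here is a minimal numerical sketch (NumPy; a toy two-player instance with an arbitrary non-equivariant base approximator chosen only for illustration) of the $\mathcal{Q}$-projection for a (C)CE approximator, together with a check that the projected function is permutation-equivariant. Because the sum runs over all permutations, the exact indexing convention for "$\rho u$" versus "$\rho^{-1} u$" does not affect the result.

```python
import numpy as np
from itertools import permutations

m = 3  # illustrative number of actions per player

def f(u1, u2):
    # Arbitrary base (C)CE approximator with no built-in symmetry (illustrative only):
    # softmax over the payoffs plus a position-dependent bias, so it is NOT equivariant.
    z = np.exp(u1 + u2 + np.arange(m * m).reshape(m, m))
    return z / z.sum()

def q_project(f):
    # Q = Q_1 ∘ Q_2: average rho^{-1} f(rho u) over all permutations of each player's actions.
    perms = [np.array(p) for p in permutations(range(m))]
    def qf(u1, u2):
        out = np.zeros((m, m))
        for p1 in perms:
            for p2 in perms:
                pi = f(u1[np.ix_(p1, p2)], u2[np.ix_(p1, p2)])     # evaluate f on the permuted game
                out += pi[np.ix_(np.argsort(p1), np.argsort(p2))]  # undo the permutation on the output
        return out / len(perms) ** 2
    return qf

# Sanity check of permutation-equivariance: qf(rho u) == rho qf(u) for a random rho.
rng = np.random.default_rng(0)
u1, u2 = rng.random((m, m)), rng.random((m, m))
q1, q2 = rng.permutation(m), rng.permutation(m)
qf = q_project(f)
lhs = qf(u1[np.ix_(q1, q2)], u2[np.ix_(q1, q2)])
rhs = qf(u1, u2)[np.ix_(q1, q2)]
print(np.allclose(lhs, rhs))   # True; the unprojected f generally fails this check
```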
According to Lemma 4.8, Lemma 4.9 and Lemma 4.11, $\mathcal{O}\mathcal{F}^{NE}$ (or $\mathcal{P}\mathcal{F}^{NE}$ / $\mathcal{Q}\mathcal{F}^{(C)CE}$) is the largest subset of $\mathcal{F}^{NE}$ (or $\mathcal{F}^{NE}$ / $\mathcal{F}^{(C)CE}$) with the corresponding property (OPI, PPE, or PE) if $\mathcal{F}^{NE}$ (or $\mathcal{F}^{NE}$ / $\mathcal{F}^{(C)CE}$) is closed under $\mathcal{O}$ (or $\mathcal{P}$ / $\mathcal{Q}$). The result tells us that the orbit-averaging operators, while enforcing the operated function to be equivariant or invariant, keep as large a capacity of the function class as possible. Therefore, we believe that orbit averaging is an ideal approach to constructing equivariant or invariant functions.

5. Theoretical Analysis of Benefits

In this section, we theoretically analyze the benefits of equivariant approximators with respect to generalizability and approximation.

5.1. Benefits for Generalization

We first derive the generalization bound and sample complexity for general approximator classes, and then we show the benefits of equivariant approximators by applying orbit averaging to the approximators. The representativeness of an approximator class is measured by its covering numbers (Shalev-Shwartz & Ben-David, 2014) under the $\ell_\infty$-distance, which are defined as follows:

Definition 5.1 ($\ell_\infty$-distance). The $\ell_\infty$-distance between two equilibrium approximators $f, g$ is
$$\ell_\infty(f, g) = \max_{u \in U} \| f(u) - g(u) \|,$$
where we define the distance between two product strategies $\sigma^1$ and $\sigma^2$ as
$$\|\sigma^1 - \sigma^2\| = \max_{i \in [n]} \sum_{a_i \in A_i} |\sigma^1_i(a_i) - \sigma^2_i(a_i)|,$$
and the distance between two joint strategies $\pi^1$ and $\pi^2$ as
$$\|\pi^1 - \pi^2\| = \sum_{a \in A} |\pi^1(a) - \pi^2(a)|.$$

Definition 5.2 ($r$-covering number). For $r > 0$, we say a function class $\mathcal{F}_r$ $r$-covers another function class $\mathcal{F}$ under the $\ell_\infty$-distance if for every function $f \in \mathcal{F}$ there exists $f_r \in \mathcal{F}_r$ such that $\ell_\infty(f, f_r) \le r$. The $r$-covering number $\mathcal{N}_\infty(\mathcal{F}, r)$ of $\mathcal{F}$ is the cardinality of the smallest function class $\mathcal{F}_r$ that $r$-covers $\mathcal{F}$ under the $\ell_\infty$-distance.

Based on covering numbers, we provide the generalization bounds of NE, CE and CCE approximators. The bounds describe the difference between the expected testing approximation and the empirical training approximation.

Theorem 5.3 (Generalization bound). For a function class $\mathcal{F}$ of NE, CE or CCE approximators, with probability at least $1 - \delta$ over the draw of a training set $S$ (of size $m$) from the payoff distribution $D$, for all approximators $f \in \mathcal{F}$ we have
$$\mathbb{E}_{u \sim D}[\mathcal{E}(f(u), u)] - \frac{1}{m}\sum_{u \in S}\mathcal{E}(f(u), u) \le 2\inf_{r>0}\left\{\sqrt{\frac{2\ln \mathcal{N}_\infty(\mathcal{F}, r)}{m}} + Lr\right\} + 4\sqrt{\frac{2\ln(4/\delta)}{m}},$$
where $L = 2n$ for NE approximators, and $L = 2$ for CE and CCE approximators.

To get the theorem, we first show that all three equilibrium approximations are Lipschitz continuous with respect to strategies. Afterward, we derive the Rademacher complexity (Bartlett & Mendelson, 2002) of the expected approximation based on the Lipschitz continuity and covering numbers. See Appendix B.1 for the detailed proof.

We can see from Theorem 5.3 that, with a large enough training set, the generalization gaps of the equilibrium approximators go to zero if the covering number $\mathcal{N}_\infty(\mathcal{F}, r)$ is bounded. As a result, we can estimate the expected testing performance through the empirical training performance. We can also derive the sample complexities the equilibrium approximators need in order to achieve the desired generalizability.

Theorem 5.4 (Sample complexity). For $\epsilon, \delta \in (0, 1)$, a function class $\mathcal{F}$ of NE, CE or CCE approximators and a distribution $D$, with probability at least $1 - \delta$ over the draw of a training set $S$ with
$$m \ge \frac{9}{2\epsilon^2}\left(\ln\frac{2}{\delta} + \ln \mathcal{N}_\infty\!\left(\mathcal{F}, \frac{\epsilon}{3L}\right)\right)$$
games sampled from $D$, for all $f \in \mathcal{F}$ we have
$$\mathbb{E}_{u \sim D}[\mathcal{E}(f(u), u)] \le \frac{1}{m}\sum_{u \in S}\mathcal{E}(f(u), u) + \epsilon,$$
where $L = 2n$ for NE approximators, and $L = 2$ for CE and CCE approximators.

The proof is based on the Lipschitz continuity of the approximation, a uniform bound, and a concentration inequality. See Appendix B.2 for details.
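To make the two quantities compared in Theorems 5.3 and 5.4 concrete, the sketch below (NumPy; a fixed, hand-crafted approximator used purely for illustration) estimates the empirical training approximation and the expected testing approximation by Monte Carlo. For a single fixed $f$ the gap is already small by standard concentration; the theorems bound it uniformly over the whole class $\mathcal{F}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3

def approximation(pi, u1, u2):
    # NE/CCE approximation E(pi, u) of Definition 3.1 for a two-player game.
    e1 = np.max(u1 @ pi.sum(axis=0)) - np.sum(pi * u1)
    e2 = np.max(pi.sum(axis=1) @ u2) - np.sum(pi * u2)
    return max(e1, e2)

def f(u1, u2):
    # A fixed, hand-crafted (C)CE approximator: softmax over the welfare matrix.
    z = np.exp(5.0 * (u1 + u2))
    return z / z.sum()

def mean_approx(games):
    return float(np.mean([approximation(f(u1, u2), u1, u2) for u1, u2 in games]))

sample = lambda k: [(rng.random((m, m)), rng.random((m, m))) for _ in range(k)]
train, test = sample(50), sample(5000)
print(mean_approx(train))   # empirical training approximation (the quantity computed in Algorithm 1)
print(mean_approx(test))    # Monte Carlo estimate of the expected testing approximation
```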
Theorem 5.4 is also called uniform convergence of the function class $\mathcal{F}$, which is a sufficient condition for agnostic PAC learnability (Shalev-Shwartz & Ben-David, 2014).

As for the benefits of equivariant approximators for generalizability, the following result indicates that the projected equilibrium approximators have smaller covering numbers.

Theorem 5.5. The $\mathcal{O}$-projected, $\mathcal{P}$-projected and $\mathcal{Q}$-projected approximator classes have smaller covering numbers, i.e., for all $r > 0$ we have
$$\mathcal{N}_\infty(\mathcal{O}\mathcal{F}^{NE}, r) \le \mathcal{N}_\infty(\mathcal{F}^{NE}, r), \quad \mathcal{N}_\infty(\mathcal{P}\mathcal{F}^{NE}, r) \le \mathcal{N}_\infty(\mathcal{F}^{NE}, r), \quad \mathcal{N}_\infty(\mathcal{Q}\mathcal{F}^{(C)CE}, r) \le \mathcal{N}_\infty(\mathcal{F}^{(C)CE}, r).$$

The proof is done by showing that all the operators are contraction mappings. See Appendix B.3 for details. Both the generalization bounds in Theorem 5.3 and the sample complexities in Theorem 5.4 decrease as the covering numbers $\mathcal{N}_\infty(\mathcal{F}, r)$ decrease. Thus, we can see from Theorem 5.5 that both PPE and OPI can improve the generalizability of NE approximators, and PE can improve the generalizability of CE and CCE approximators.

5.2. Benefits for Approximation

We then show the benefits of equivariance for approximation when the payoff distribution is invariant under permutation. A permutation-invariant distribution arises when the actions are anonymous or indifferent, or when we pre-train the equilibrium approximators using a manually designed distribution (Marris et al., 2022).

(C)CE Approximator. The following theorem shows the benefit of permutation-equivariance in decreasing the exploitability of (C)CE approximators.

Theorem 5.6. When the payoff distribution $D$ is invariant under permutation of payoffs, the $\mathcal{Q}$-projected (C)CE approximator has a smaller expected equilibrium approximation. Formally, for all $f^{(C)CE} \in \mathcal{F}^{(C)CE}$ and permutation-invariant distributions $D$, we have
$$\mathbb{E}_{u \sim D}[\mathcal{E}(\mathcal{Q}f^{(C)CE}(u), u)] \le \mathbb{E}_{u \sim D}[\mathcal{E}(f^{(C)CE}(u), u)].$$

The proof uses the convexity of the approximation. See Appendix B.4 for details. We can see from Theorem 5.6 that, when the payoff distribution is invariant under permutation, it is beneficial to use an equivariant architecture for the CE or CCE approximator.

NE Approximator. As for NE approximators, we have similar results.

Theorem 5.7. For bimatrix constant-sum games, when the payoff distribution $D$ is invariant under permutation of payoffs, the $\mathcal{X}$-projected ($\mathcal{X} \in \{\mathcal{O}, \mathcal{P}\}$) NE approximator has a smaller expected exploitability. Formally, for all $f^{NE} \in \mathcal{F}^{NE}$ and permutation-invariant distributions $D$ over bimatrix constant-sum games, we have
$$\mathbb{E}_{u \sim D}\left[\sum_{i=1}^{2} \mathcal{E}_i((\mathcal{X}f^{NE})(u), u)\right] \le \mathbb{E}_{u \sim D}\left[\sum_{i=1}^{2} \mathcal{E}_i(f^{NE}(u), u)\right].$$

Theorem 5.8. When the payoff distribution $D$ is invariant under permutation of payoffs, and $f^{NE}$ satisfies OPI, then the $\mathcal{P}$-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{NE} \in \mathcal{F}^{NE}$ that are OPI and permutation-invariant distributions $D$, we have
$$\mathbb{E}_{u \sim D}[\mathcal{E}((\mathcal{P}f^{NE})(u), u)] \le \mathbb{E}_{u \sim D}[\mathcal{E}(f^{NE}(u), u)].$$

Theorem 5.9. For bimatrix games, when the payoff distribution $D$ is invariant under permutation of payoffs, and $f^{NE}$ satisfies PPE, then the $\mathcal{O}$-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{NE} \in \mathcal{F}^{NE}$ that are PPE and permutation-invariant distributions $D$ over bimatrix games, we have
$$\mathbb{E}_{u \sim D}[\mathcal{E}((\mathcal{O}f^{NE})(u), u)] \le \mathbb{E}_{u \sim D}[\mathcal{E}(f^{NE}(u), u)].$$

Theorem 5.8 and Theorem 5.9 show that approximators with both PPE and OPI can achieve better approximation than those with only PPE or only OPI. Meanwhile, we can see from Theorem 5.7 that for bimatrix constant-sum games (such as zero-sum games), it can be preferable to introduce PPE or OPI into the architectures.
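The next sketch (NumPy) gives a small numerical illustration of Theorem 5.6 under an i.i.d. uniform payoff distribution, which is permutation-invariant: the orbit-averaged ($\mathcal{Q}$-projected) version of an arbitrary non-equivariant approximator attains a lower average approximation. The base approximator and all constants are illustrative, and the comparison is a finite-sample estimate of the expectations in the theorem.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
m = 3
perms = [np.array(p) for p in permutations(range(m))]

def approximation(pi, u1, u2):
    e1 = np.max(u1 @ pi.sum(axis=0)) - np.sum(pi * u1)
    e2 = np.max(pi.sum(axis=1) @ u2) - np.sum(pi * u2)
    return max(e1, e2)

def f(u1, u2):
    # non-equivariant base approximator: softmax with a fixed position-dependent bias
    z = np.exp(3.0 * (u1 + u2) + 0.5 * np.arange(m * m).reshape(m, m))
    return z / z.sum()

def qf(u1, u2):
    # Q-projection of f (orbit average over both players' action permutations)
    out = np.zeros((m, m))
    for p1 in perms:
        for p2 in perms:
            out += f(u1[np.ix_(p1, p2)], u2[np.ix_(p1, p2)])[np.ix_(np.argsort(p1), np.argsort(p2))]
    return out / len(perms) ** 2

games = [(rng.random((m, m)), rng.random((m, m))) for _ in range(300)]
avg_f  = np.mean([approximation(f(u1, u2),  u1, u2) for u1, u2 in games])
avg_qf = np.mean([approximation(qf(u1, u2), u1, u2) for u1, u2 in games])
print(avg_qf <= avg_f)   # typically True on such samples, consistent with Theorem 5.6
```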
6. Theoretical Analysis of Limitations

As discussed in Section 5, equivariant approximators enjoy better generalizability and, in some cases, better approximation. However, as we will show, they have limitations regarding equilibrium selection and social welfare. These limitations are attributed to the limited representativeness caused by equivariance.

6.1. Equilibrium Selection

We first show that there may be equilibrium points that equivariant approximators will never find. We illustrate this limitation in permutation-invariant games, defined as follows:

Definition 6.1 (Permutation-$\rho$-Invariant Game). We say a game $\Gamma_u$ is permutation-$\rho$-invariant, where $\rho = \prod_{i \in [n]} \rho_i$, if the payoff $u$ is permutation-invariant with respect to $\rho$, that is, $\rho u = u$.

Permutation-$\rho$-invariance indicates that one cannot distinguish joint action $a$ from $\rho a$ using only the payoff $u$. We provide an example to give more insight into permutation-$\rho$-invariant games:

Example 6.2. For a 2-player game $\Gamma_u = (2, u = (u_1, u_2), A = ([m_1], [m_2]))$, let $\rho_i = (m_i, m_i - 1, \dots, 1)$ and $\rho = \rho_1 \times \rho_2$. If one of the following conditions holds, then $u$ is permutation-$\rho$-invariant:
1. $u_1$ and $u_2$ are symmetric and persymmetric (i.e., symmetric with respect to the northeast-to-southwest diagonal) square matrices.
2. Both $u_1$ and $u_2$ are centrosymmetric, i.e., $u_i(x, y) = u_i(m_1 + 1 - x, m_2 + 1 - y)$ for $i \in \{1, 2\}$, $x \in [m_1]$ and $y \in [m_2]$.

For a permutation $\rho = \prod_{i \in [n]} \rho_i$ and player $k \in [n]$, we denote the set of non-fixed actions of player $k$ under $\rho_k$ as $V(\rho_k) := \{a_k \in A_k \mid \rho_k(a_k) \ne a_k\}$. Based on $V(\rho_k)$, we identify equilibrium points of permutation-$\rho$-invariant games that equivariant approximators will never find.

Theorem 6.3. For a permutation-$\rho$-invariant game $\Gamma_u$, if there is a pure NE $a^* = (a^*_i)_{i \in [n]}$ and at least one player $k \in [n]$ such that $a^*_k \in V(\rho_k)$, then $a^*$ will never be found by any NE approximator with both PPE and OPI. Besides, $a^*$ (as a pure CE or CCE) will also never be found by any CE or CCE approximator with PE.

We illustrate Theorem 6.3 by the following example:

Example 6.4. Consider a bimatrix game with identity utility
$$u = \begin{pmatrix} 1,1 & 0,0 \\ 0,0 & 1,1 \end{pmatrix}.$$
There are two pure NEs (the two diagonal entries of the matrix) and one mixed NE with $\sigma_1 = (0.5, 0.5)$ and $\sigma_2 = (0.5, 0.5)$. Let $\rho_i$ be the unique permutation function (other than the identity) of player $i \in [2]$, and $\rho = \rho_1 \times \rho_2$. The game is permutation-$\rho$-invariant.

Case 1: Let $f$ be a permutation-equivariant CE or CCE approximator, and denote $\pi = f(u)$. We have
$$\pi = f(u) \overset{(a)}{=} f(\rho u) \overset{(b)}{=} \rho f(u),$$
where (a) holds by permutation-$\rho$-invariance of $u$, and (b) holds by PE of $f$. Thus, we have $\pi_{1,1} = \pi_{2,2} \in [0, \frac{1}{2}]$ and $\pi_{1,2} = \pi_{2,1} \in [0, \frac{1}{2}]$. As a result, the two pure (C)CEs cannot be found.

Case 2: Let $f$ be an NE approximator with both PPE and OPI. Denote $f(u) = (\sigma_1, \sigma_2)$, where $\sigma_1 = (p_1, 1 - p_1)$ and $\sigma_2 = (p_2, 1 - p_2)$. By PPE and OPI of $f$, we have
$$f(u)_1 = (p_1, 1 - p_1) \overset{(a)}{=} f(\rho_1\rho_2 u)_1 \overset{(b)}{=} \rho_1 f(\rho_2 u)_1 \overset{(c)}{=} \rho_1 f(u)_1 = (1 - p_1, p_1),$$
where (a) holds by permutation-$\rho$-invariance of $u$, (b) holds by PPE of $f$, and (c) holds by OPI of $f$. Hence $p_1 = 1/2$, and similarly $p_2 = 1/2$. As a result, the only NE that $f$ could find is the mixed NE.

As we can see from the example and Theorem 6.3, equivariance, while introducing an inductive bias into the approximator architecture, is also a strong constraint. Such a constraint is the reason why equivariant approximators cannot find all equilibrium points.
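A small numerical companion to Example 6.4 (NumPy; the base approximator is an arbitrary illustrative choice, not from the paper): projecting any NE approximator to be both PPE and OPI via orbit averaging and evaluating it on the identity game yields exactly the uniform mixed NE, so neither pure NE can ever be output.

```python
import numpy as np
from itertools import permutations

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def base_f(u1, u2):
    # An arbitrary NE approximator with no built-in symmetry (illustrative only).
    return (softmax(3 * u1[:, 0] + np.array([0.7, 0.1])),
            softmax(3 * u2[0, :] + np.array([0.2, 0.9])))

def project_ppe_opi(f, m=2):
    # Composition of the P- and O-projections for a two-player NE approximator:
    # average over all (rho_1, rho_2), un-permuting only each player's own output strategy.
    perms = [np.array(p) for p in permutations(range(m))]
    def g(u1, u2):
        s1_acc, s2_acc = np.zeros(m), np.zeros(m)
        for p1 in perms:
            for p2 in perms:
                s1, s2 = f(u1[np.ix_(p1, p2)], u2[np.ix_(p1, p2)])
                s1_acc += s1[np.argsort(p1)]
                s2_acc += s2[np.argsort(p2)]
        return s1_acc / len(perms) ** 2, s2_acc / len(perms) ** 2
    return g

u1 = u2 = np.eye(2)          # the coordination game of Example 6.4
g = project_ppe_opi(base_f)
print(g(u1, u2))             # (array([0.5, 0.5]), array([0.5, 0.5])): only the mixed NE
```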
6.2. Social Welfare

The social welfare of a joint strategy $\pi$ is defined as the sum of all players' utilities, i.e.,
$$\mathrm{SW}(\pi, u) = \sum_{i \in [n]} u_i(\pi).$$
The equilibrium with higher social welfare is usually preferred (Marris et al., 2022). To analyze the social welfare of equivariant approximators, we define the worst social welfare ratio as follows:

Definition 6.5. For any $N, M \ge 2$ and two NE (or CE/CCE) approximator classes $\mathcal{F}_1, \mathcal{F}_2$ that target games with number of players $n \le N$ and $|A_i| \le M$, we define the worst social welfare ratio of $\mathcal{F}_1$ over $\mathcal{F}_2$ as
$$\mathrm{SWR}_{N,M}(\mathcal{F}_1, \mathcal{F}_2) := \inf_{D} \frac{\max_{f_1 \in \mathcal{F}_1} \mathbb{E}_{u \sim D}\,\mathrm{SW}(f_1(u), u)}{\max_{f_2 \in \mathcal{F}_2} \mathbb{E}_{u \sim D}\,\mathrm{SW}(f_2(u), u)}.$$

$\mathrm{SWR}_{N,M}(\mathcal{F}_1, \mathcal{F}_2)$ measures the relative representativeness of $\mathcal{F}_1$ over $\mathcal{F}_2$ in terms of social welfare. Based on that, we have the following result for equivariant CE and CCE approximator classes:

Theorem 6.6. Given $N, M \ge 2$, let $\mathcal{F}^{(C)CE}_{PE}$ be the function class (targeting games with number of players $n \le N$ and $|A_i| \le M$) of all (C)CE approximators with PE. Denote by $\mathcal{F}^{(C)CE}_{general}$ the function class of all (C)CE approximators. Then we have
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{(C)CE}_{PE}, \mathcal{F}^{(C)CE}_{general}) = 1.$$

Theorem 6.6 tells us that, while the permutation-equivariant (C)CE approximator class may not be able to find all the (C)CEs of a game, it can keep the social welfare of the output solutions. However, when considering equivariant NE approximators, we have the following negative result:

Theorem 6.7. Given $N, M \ge 2$, let $\mathcal{F}^{NE}_{OPI}$, $\mathcal{F}^{NE}_{PPE}$ and $\mathcal{F}^{NE}_{both}$ be the function classes (targeting games with number of players $n \le N$ and $|A_i| \le M$) of all NE approximators with OPI, PPE, and both, respectively. Denote by $\mathcal{F}^{NE}_{general}$ the function class of all NE approximators. Then we have
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) = \frac{1}{M^{N-1}}, \quad (1)$$
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{PPE}, \mathcal{F}^{NE}_{general}) = 1, \quad (2)$$
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) = \frac{1}{M^{N-1}}. \quad (3)$$
Additionally, when $M \ge 3$, denote by $\tilde{\mathcal{F}}^{NE}_{both}$ the function class of all NE oracles (functions that always output exact NE solutions of the input games) with both PPE and OPI, and by $\tilde{\mathcal{F}}^{NE}_{general}$ the function class of all NE oracles. Then we have
$$\mathrm{SWR}_{N,M}(\tilde{\mathcal{F}}^{NE}_{both}, \tilde{\mathcal{F}}^{NE}_{general}) = 0. \quad (4)$$

The proof is done by construction (see Appendix C.3 for details). As an illustration of Equation (4), consider a bimatrix game with the following payoff:
$$u = \begin{pmatrix} 1,1 & 0,0 & 0,\ 1/2+\epsilon \\ 0,0 & 1,1 & 0,\ 1/2+\epsilon \\ 1/2+\epsilon,\ 0 & 1/2+\epsilon,\ 0 & \epsilon,\epsilon \end{pmatrix}$$
for $\epsilon \in (0, 1/2)$. The maximum NE social welfare (attained at the upper-left corner of $u$) is 2, which can be found by at least one NE oracle in $\tilde{\mathcal{F}}^{NE}_{general}$. However, the only NE (the lower-right corner of $u$) that the NE oracles in $\tilde{\mathcal{F}}^{NE}_{both}$ could find has a social welfare of only $2\epsilon$. As a result, $\mathrm{SWR}_{2,3}(\tilde{\mathcal{F}}^{NE}_{both}, \tilde{\mathcal{F}}^{NE}_{general}) \le 2\epsilon$, which goes to zero as $\epsilon \to 0$. Recall that we always have $\mathrm{SWR}_{N,M} \ge 0$; thus Equation (4) holds when $N = 2$ and $M = 3$.

Theorem 6.7 tells us that equivariant NE approximators may lose some social welfare while enjoying better generalizability. Such a result inspires us to balance generalizability and social welfare when we design the architectures of NE approximators.
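Returning to the bimatrix example above, the sketch below (NumPy, with an arbitrary choice of $\epsilon$) enumerates its pure NEs by brute force and prints their social welfare, confirming the gap between 2 and $2\epsilon$ described in the text.

```python
import numpy as np
from itertools import product

eps = 0.1   # any value in (0, 1/2)
u1 = np.array([[1, 0, 0], [0, 1, 0], [0.5 + eps, 0.5 + eps, eps]])
u2 = np.array([[1, 0, 0.5 + eps], [0, 1, 0.5 + eps], [0, 0, eps]])

def pure_nash(u1, u2):
    # brute-force enumeration of the pure NEs of a bimatrix game
    return [(a1, a2) for a1, a2 in product(range(3), range(3))
            if u1[a1, a2] >= u1[:, a2].max() and u2[a1, a2] >= u2[a1, :].max()]

for a in pure_nash(u1, u2):
    print(a, u1[a] + u2[a])
# (0, 0) and (1, 1) have social welfare 2.0; (2, 2) -- the NE the paper argues an
# approximator with both PPE and OPI is forced towards -- has social welfare 2*eps.
```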
7. Conclusion and Future Work

In this paper, we theoretically analyze the benefits and limitations of equivariant equilibrium approximators, including player-permutation-equivariant (PPE) and opponent-permutation-invariant (OPI) NE approximators, and permutation-equivariant (PE) CE and CCE approximators. For the benefits, we first show that these equivariant approximators enjoy better generalizability. To get the result, we derive generalization bounds and sample complexities based on covering numbers, and then we prove that the symmetric approximators have lower covering numbers. We then show that the equivariant approximators can decrease the exploitability when the payoff distribution is invariant under permutation. For the limitations, we find that the equivariant approximators may fail to find some equilibrium points due to their limited representativeness caused by equivariance. Besides, while equivariant (C)CE approximators can keep the social welfare, equivariant NE approximators reach a small worst social welfare ratio compared to general approximators. Such a result indicates that equivariance may reduce social welfare; therefore, we should balance generalizability and social welfare when designing the architectures of NE approximators.

As for future work, since in our paper we assume the training and testing payoff distributions are the same, an interesting topic is to study the benefits of equivariant approximators under payoff distribution shift. Moreover, since we consider a fixed and discrete action space, another interesting future direction is to analyze the benefits of equivariant approximators in varying or continuous action spaces.

Acknowledgements

This work is supported by Science and Technology Innovation 2030 - New Generation Artificial Intelligence Major Project (No. 2022ZD0114904). We thank all anonymous reviewers for their helpful feedback.

References

Bartlett, P. L. and Mendelson, S. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463-482, 2002.

Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press, 2006.

Chen, X., Deng, X., and Teng, S.-H. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):1-57, 2009.

Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. H. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195-259, 2009.

Deligkas, A., Fasoulakis, M., and Markakis, E. A polynomial-time algorithm for 1/3-approximate Nash equilibria in bimatrix games. In 30th Annual European Symposium on Algorithms, ESA, 2022.

Deng, X., Li, N., Mguni, D., Wang, J., and Yang, Y. On the complexity of computing Markov perfect equilibrium in general-sum stochastic games. National Science Review, 10(1):nwac256, 2023.

Duan, Z., Tang, J., Yin, Y., Feng, Z., Yan, X., Zaheer, M., and Deng, X. A context-integrated transformer-based neural network for auction design. In International Conference on Machine Learning, pp. 5609-5626. PMLR, 2022.

Duan, Z., Huang, W., Zhang, D., Du, Y., Wang, J., Yang, Y., and Deng, X. Is Nash equilibrium approximator learnable? In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 233-241, 2023.

Dütting, P., Feng, Z., Narasimhan, H., Parkes, D., and Ravindranath, S. S. Optimal auctions through deep learning. In International Conference on Machine Learning, pp. 1706-1715. PMLR, 2019.

Elesedy, B. and Zaidi, S. Provably strict generalisation benefit for equivariant models. In International Conference on Machine Learning, pp. 2959-2969. PMLR, 2021.

Feng, X., Slumbers, O., Wan, Z., Liu, B., McAleer, S., Wen, Y., Wang, J., and Yang, Y. Neural auto-curricula in two-player zero-sum games. Advances in Neural Information Processing Systems, 34:3504-3517, 2021.

Fu, H. and Lin, T. Learning utilities and equilibria in non-truthful auctions. Advances in Neural Information Processing Systems, 33:14231-14242, 2020.

Fudenberg, D., Drew, F., Levine, D. K., and Levine, D. K. The theory of learning in games, volume 2.
MIT press, 1998. Goktas, D. and Greenwald, A. Exploitability minimization in games and beyond. In Advances in Neural Information Processing Systems, 2022. Greenwald, A., Hall, K., Serrano, R., et al. Correlated Q-learning. In ICML, volume 3, pp. 242 249, 2003. Hartford, J. S., Wright, J. R., and Leyton-Brown, K. Deep learning for predicting human strategic behavior. Advances in neural information processing systems, 29, 2016. Hu, J. and Wellman, M. P. Nash Q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039 1069, 2003. Are Equivariant Equilibrium Approximators Beneficial? Ivanov, D., Safiulin, I., Filippov, I., and Balabaeva, K. Optimal-er auctions through attention. In Advances in Neural Information Processing Systems, 2022. Jin, C., Liu, Q., Wang, Y., and Yu, T. V-learning a simple, efficient, decentralized algorithm for multiagent RL. In ICLR 2022 Workshop on Gamification and Multiagent Solutions, 2022. Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., P erolat, J., Silver, D., and Graepel, T. A unified game-theoretic approach to multiagent reinforcement learning. Advances in neural information processing systems, 30, 2017. Ling, C., Fang, F., and Kolter, J. Z. What game are we playing? End-to-end learning in normal and extensive form games. In IJCAI, pp. 396 402, 2018. Liu, S., Lanctot, M., Marris, L., and Heess, N. Simplex neural population learning: Any-mixture bayes-optimality in symmetric zero-sum games. In International Conference on Machine Learning, ICML, 2022. Lockhart, E., Lanctot, M., P erolat, J., Lespiau, J.-B., Morrill, D., Timbers, F., and Tuyls, K. Computing approximate equilibria in sequential adversarial games by exploitability descent. In Kraus, S. (ed.), IJCAI, pp. 464 470. ijcai.org, 2019. Marris, L., Muller, P., Lanctot, M., Tuyls, K., and Graepel, T. Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers. In International Conference on Machine Learning, pp. 7480 7491. PMLR, 2021. Marris, L., Gemp, I., Anthony, T., Tacchetti, A., Liu, S., and Tuyls, K. Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers. In Advances in Neural Information Processing Systems, 2022. Nash, J. F. et al. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36(1): 48 49, 1950. Nekipelov, D., Syrgkanis, V., and Tardos, E. Econometrics for learning agents. In Proceedings of the sixteenth acm conference on economics and computation, pp. 1 18, 2015. Nikaidˆo, H. and Isoda, K. Note on non-cooperative convex games. Pacific Journal of Mathematics, 5(S1):807 815, 1955. Qin, T., He, F., Shi, D., Huang, W., and Tao, D. Benefits of permutation-equivariance in auction mechanisms. In Advances in Neural Information Processing Systems, 2022. Rahme, J., Jelassi, S., Bruna, J., and Weinberg, S. M. A permutation-equivariant neural network architecture for auction design. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021. Schulz-Mirbach, H. Constructing invariant features by averaging techniques. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3Conference C: Signal Processing (Cat. No. 94CH3440-5), volume 2, pp. 387 390. IEEE, 1994. Sessa, P. G., Bogunovic, I., Krause, A., and Kamgarpour, M. Contextual games: Multi-agent learning with side information. Advances in Neural Information Processing Systems, 33:21912 21922, 2020. Shalev-Shwartz, S. and Ben-David, S. 
Understanding machine learning: From theory to algorithms. Cambridge university press, 2014. Shoham, Y. and Leyton-Brown, K. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008. Tsaknakis, H. and Spirakis, P. G. An optimization approach for approximate nash equilibria. Internet Mathematics, 5 (4):365 382, 2008. Wu, D. and Lisser, A. Using CNN for solving two-player zero-sum games. Expert Systems with Applications, pp. 117545, 2022. Wu, D. and Lisser, A. CCGnet: A deep learning approach to predict Nash equilibrium of chance-constrained games. Information Sciences, 2023. Are Equivariant Equilibrium Approximators Beneficial? A. Omitted Proofs in Section 4 A.1. Useful Lemma We first introduce a lemma, which will be frequently used in the following proofs. Lemma A.1. i, j [n], ρi Gi we have (ρiu)j(σi, σ i) = uj(ρ 1 i σi, σ i) and (ρiu)j(π) = uj(ρ 1 i π). Proof. Define bai := ρ 1 i ai. For product strategy σ = (σi)i [n], (ρiu)j(σi, σ i) = X a i A i (ρiu)j(ai, a i) σi(ai) σ i(a i) a i A i uj(ρ 1 i ai, a i) σi(ai) σ i(a i) a i A i uj(ρ 1 i ai, a i) (ρ 1 i σi)(ρ 1 i ai) σ i(a i) a i A i uj(bai, a i) (ρ 1 i σi)(bai) σ i(a i) =uj(ρ 1 i σi, σ i). For joint strategy π, (ρiu)j(π) = X a i A i (ρiuj)(ai, a i) π(ai, a i) a i A i uj(ρ 1 i ai, a i) π(ai, a i) a i A i uj(ρ 1 i ai, a i) (ρ 1 i π)(ρ 1 i ai, a i) a i A i uj(bai, a i) (ρ 1 i π)(bai, a i) =uj(ρ 1 i π). A.2. Proof of Lemma 4.2 Lemma 4.2. In a normal-form game Γu = (n, u, A), for arbitrary player i [n] and any (ϵ-)NE strategy σ = (σi, σ i), ρiσ = (ρiσi, σ i) is also an (ϵ-)NE for the ρi-permuted game Γρiu. Proof. For player i, we have Ei(ρiσ, ρiu) = max ai Ai ρiui(ai, ρiσ i) ρiui(ρiσ) = max ai Ai ρiui(ai, σ i) ρiui(ρiσi, σ i) = max ai Ai ui(ρ 1 i ai, σ i) ui(ρ 1 i ρiσi, σ i) (a) = max ai Ai ui(ai, σ i) ui(σi, σ i) = Ei(σ, u), where (a) holds since ρi is a bijection on Ai. For player j = i, we have Ej(ρiσ, ρiu) = max aj A ρiuj(aj, ρiσ j) ρiuj(ρiσ) = max aj Aj uj(aj, ρ 1 i ρiσ j) uj(ρ 1 i ρiσ) = max aj Aj uj(aj, σ j) uj(σ) = Ej(σ, u). From above, we have E(ρiσ, ρiu) = E(σ, u), thus if σ is a ε-NE of Γu, then ρiσ must be a ε-NE of Γρiu. Are Equivariant Equilibrium Approximators Beneficial? A.3. Proof of Lemma 4.5 Lemma 4.5. In a normal-form game Γu = (n, u, A), for an arbitrary player i [n] and any (ε-)CE or (ϵ-)CCE strategy π, ρiπ is also an (ε-)CE or (ϵ-)CCE for the ρi-permuted game Γρiu. CCE For player i, we have Ei(ρiπ, ρiu) = max ai Ai(ρiui)(ai, (ρiπ) i) (ρiui)(ρiπi) = max ai Ai(ρiui)(ai, (ρiπ) i) ui(ρ 1 i ρiπi) = max ai Ai(ρiui)(ai, (ρiπ) i) ui(πi) = max ai Ai b A (ρiui)(ai, b i) (ρiπ)(b) ui(πi) = max ai Ai b i A i ui(ρ 1 i ai, b i) π(ρ 1 i bi, b i) ui(πi) = max ai Ai b i A i ui(ai, b i) π(bi, b i) ui(πi) , ρi is a bijection on Ai. For player j = i, we have Ej(ρiπ, ρiu) = max aj Aj(ρiuj)(aj, (ρiπ) j) (ρiuj)(ρiπj) = max aj Aj(ρiuj)(aj, (ρiπ) j) uj(ρ 1 i ρiπj) = max aj Aj(ρiuj)(aj, (ρiπ) j) uj(πj) = max aj Aj b A (ρiuj)(aj, b j) (ρiπ)(b) uj(πj) = max aj Aj b i A i uj(aj, (b j) i, ρ 1 i bi) π(ρ 1 i bi, b i) uj(πj) = max aj Aj b i A i uj(aj, (b j) i, bi) π(bi, b i) uj(πj) , ρi is a bijection on Ai. Thus, we have E(ρiπ, ρiu) = E(π, u). Thus, if π is a ε-CCE of Γu, then ρiπ must be a ε-CCE of Γρiu. CE For player j = i, we have ECE j (ρiπ, ρiu) = max ϕj:Aj Aj a A (ρiπ)(a) (ρiuj)(ϕj(aj), a j) (ρiuj)(ρiπ) = max ϕj:Aj Aj a A π(ρ 1 i ai, a i) uj(ϕj(aj), a i,j, ρ 1 i ai) uj(π) = max ϕj:Aj Aj a A π(ai, a i) uj(ϕj(aj), a i,j, ai) uj(π) , ρi is a bijection on Ai. =ECE j (π, u). 
For player i, we define operator ρi as ( ρiϕi)(ai) = ρ 1 i ϕi(ρiai). We can verify that ρi is a bijection on {ϕi : Ai Ai}, because is a homomorphism in the sense that ρ1 i ρ2 i = ρ2 i ρ1 i and maps the identity mapping of Ai to the identity mapping of {Ai Ai}. Specifically, ρ1 i ρ2 i ϕi(ai) = (ρ1 i ) 1(ρ2 i ϕi)(ρ1 i ai) = (ρ1 i ) 1(ρ2 i ) 1ϕi(ρ2 i ρ1 i ai) = ρ2 i ρ1 i ϕi(ai), Are Equivariant Equilibrium Approximators Beneficial? eiϕi(ai) = e 1 i ϕi(eiai) = ϕi(ai). Based on ρi, we have ECE i (ρiπ, ρiu) = max ϕi:Ai Ai a A (ρiπ)(a) (ρiui)(ϕi(ai), a i) ui(π) = max ϕi:Ai Ai a A π(ρ 1 i ai, a i)ui(ρ 1 i ϕi(ai), a i) ui(π) = max ϕi:Ai Ai a A π(ρ 1 i ai, a i)ui(ρ 1 i ϕi(ρi(ρ 1 i ai)), a i) ui(π) = max ϕi:Ai Ai a A π(ai, a i)ui(ρ 1 i ϕi(ρiai), a i) ui(π) , ρi is a bijection on Ai. = max ϕi:Ai Ai a A π(ai, a i)ui(( ρiϕi)(ai), a i) ui(π) = max ϕi:Ai Ai a A π(ai, a i)ui(ϕi(ai), a i) ui(π) , ρi is a bijection on {Ai Ai}. =ECE i (π, u). Thus, we have E(ρiπ, ρiu) = E(π, u), thus if π is a ε-CE of Γu, then ρiπ must be a ε-CE of Γρiu. A.4. Proof of Lemma 4.7 to Lemma 4.9 Lemma 4.7. Oif NE is i-PI and Pif NE is i-PE. Specially, if f NE is already i-PI or i-PE, then we have Oif NE = f NE or Pif NE = f NE, respectively. Proof. j = i, ρ0 Gi, for operator Oi we have (Oif NE)(ρ0u)j = 1 |Ai|! ρi Gi f NE(ρiρ0u)j (a) = 1 |Ai|! bρi Gi f NE(bρiu)j = (Oif NE)(u)j, where in (a) we define bρi = ρiρ0, and (a) holds since ρ0 is a bijection on Gi. As a result, Oif NE is i-PI. For operator Pi we have (Pif NE)(ρ0u)i = 1 |Ai|! ρi Gi ρ 1 i f NE(ρiρ0u)j = ρ0 1 |Ai|! ρi Gi ρ 1 0 ρ 1 i f NE(ρiρ0u)j =ρ0 1 |Ai|! bρi Gi bρ 1 i f NE(bρiu)j = ρ0(Pif NE)(u)i, therefore Pif NE is i-PE. If f NE is already i-PI, j = i we have Oif NE(u)j = 1 |Ai|! ρi Gi f NE(ρiu)j = 1 |Ai|! ρi Gi f NE(u)j = f NE(u)j, and Oif NE(u)i = f NE(u)i according to definition of Oi. Therefore, Oif NE = f NE for i-PI f NE. If f NE is already i-PE, we have Pif NE(u)i = 1 |Ai|! ρi Gi ρ 1 i f NE(ρiu)i = 1 |Ai|! ρi Gi ρ 1 i ρif NE(u)i = 1 |Ai|! ρi Gi f NE(u)i = f NE(u)i, Are Equivariant Equilibrium Approximators Beneficial? and j = i, Pif NE(u)j = f NE(u)j according to definition of Pi. Therefore, Pif NE = f NE for i-PE f NE. Lemma 4.8. Of NE is OPI and Pf NE is PPE. If f NE is already OPI or PPE, we have Of NE = f NE or Pf NE = f NE, respectively. Proof. A direct inference from Lemma 4.7 Lemma 4.9. Qif (C)CE is i-PE and Qf (C)CE is PE. Specifically, If f (C)CE is already i-PE or PE, then we have Qif (C)CE = f (C)CE or Qf (C)CE = f (C)CE, respectively. Proof. ρ0 Gi, we have (Qif (C)CE)(ρ0u) = 1 |Ai|! ρi Gi ρ 1 i f (C)CE(ρiρ0u) = ρ0 1 |Ai|! ρi Gi ρ 1 0 ρ 1 i f (C)CE(ρiρ0u) =ρ0 1 |Ai|! bρi Gi bρ 1 i f (C)CE(bρiu) = ρ0(Qif (C)CE)(u). If f (C)CE is already i-PE, we have Qif (C)CE(u) = 1 |Ai|! ρi Gi ρ 1 i f (C)CE(ρiu) = 1 |Ai|! ρi Gi ρ 1 i ρif (C)CE(u) = 1 |Ai|! ρi Gi f (C)CE(u) = f (C)CE(u). A.5. Proof of Lemma 4.11 Lemma 4.11. Denote X as an idempotent operator, i.e. X 2 = X (e.g. O, P or Q). For function class F of NE, CE or CCE approximator, let FX be any subset of F that is closed under X, then XFX is the largest subset of FX that is invariant under X. Proof. We prove the three claims below. 1. XFX FX . 2. X 2FX = XFX . 3. If XY = Y FX , then Y XFX . The first claim holds because FX is closed under X, and the second claim holds because X is idempotent. For the third claim, from Y FX we know XY XFX , then Y = XY XFX . We immediately know XFX is the largest subset of FX that is invariant under X. B. Omitted Proofs in Section 5 B.1. Proof of Theorem 5.3 Theorem 5.3. 
[Generalization bound] For function class F of NE, CE or CCE approximator, with probability at least 1 δ over draw of the training set S (with size m) from payoff distribution D, for all approximator f F we have Eu D[E(f(u), u)] 1 u S E(f(u), u) 2 ln N (F, r) m + Lr} + 4 Are Equivariant Equilibrium Approximators Beneficial? where L = 2n for NE approximator, and L = 2 for CE and CCE approximators. Some of the proof techniques come from D utting et al. (2019) and Duan et al. (2023). We first introduce some useful lemmas. Denote ℓ: F U R as the loss function (such as ℓ(f, u) := E(f(u), u)). We measure the capacity of the composite function class ℓ F using the empirical Rademacher complexity (Bartlett & Mendelson, 2002) on the training set S, which is defined as: RS(ℓ F) := 1 m Ex {+1, 1}m h sup f F i=1 xi ℓ(f, u(i)) i , where x is distributed i.i.d. according to uniform distribution in {+1, 1}. We have Lemma B.1 (Shalev-Shwartz & Ben-David (2014)). Let S be a training set of size m drawn i.i.d. from distribution D over U. Then with probability at least 1 δ over draw of S from D, for all f F, Eu D[ℓ(f, u)] 1 u S ℓ(l, u) 2RS(ℓ F) + 4 Lemma B.2. If |ℓ( )| c for constant c > 0 and f, f F, |ℓ(f, u) ℓ(f , u)| L f f , then we have Eu D[ℓ(f, u)] 1 u S ℓ(l, u) 2 inf r>0 2 ln N (F, r) Proof. For function class F, let Fr with |Fr| = N (F, r) be the function class that r-covers F for some r > 0. Similarly, f F, denote fr Fr be the function that r-covers f. We have RS(ℓ F) = 1 m Ex h sup f F i=1 xi ℓ(f, u(i)) i m Ex h sup f F i=1 xi ℓ(fr, u(i)) + ℓ(f, u(i)) ℓ(fr, u(i)) i m Ex h sup fr Fr i=1 xi ℓ(fr, u(i)) i + 1 m Ex h sup f F i=1 |xi Lr| i , |ℓ(f, u) ℓ(fr, u)| L f fr = Lr. i=1 ℓ2(fr, u(i)) 2 ln N (F, r) m Ex x , the first term holds by Massart s lemma. 2 ln N (F, r) 2 ln N (F, r) Combining Lemma B.1 and Equation (5), we get Eu D[ℓ(f, u)] 1 u S ℓ(l, u) 2 inf r>0 2 ln N (F, r) B.1.1. NE APPROXIMATOR Lemma B.3. For arbitrary product mixed strategy σ and σ , we have |E(σ, u) E(σ , u)| 2n σ σ . Are Equivariant Equilibrium Approximators Beneficial? Proof. σ, σ , we define y j := (σ1, . . . , σj 1, σ j+1, . . . , σ n). Then, i [n] we have |ui(σ) ui(σ )| =|ui(σ1, σ2, . . . , σn) ui(σ , σ 2, . . . , σ n)| ui(σ1, . . . , σj, σ j+1, . . . , σ n) ui(σ1, . . . , σ j, σ j+1, . . . , σ n) ui(σj, y j) ui(σ j, y j) aj (σj(aj) σ j(aj)) X a j ui(aj, a j)y j(a j) σj(aj) σ j(aj) X a j ui(aj, a j)y j(a j) σj(aj) σ j(aj) X a j y j(a j) , ui( ) [0, 1] σj(aj) σ j(aj) n max j [n] σj(aj) σ j(aj) Therefore, ai Ai, ui(ai, σ i) ui(σ) =ui(ai, σ i) ui(ai, σ i) + ui(ai, σ i) ui(σ ) + ui(σ ) ui(σ) n σ σ + E(σ , u) + n σ σ =E(σ , u) + 2n σ σ . Based on that, we get E(σ, u) = max i N,ai Ai[ui(ai, σ i) ui(σ)] E(σ , u) + 2n σ σ . Similarly, we also have E(σ , u) E(σ, u) + 2n σ σ . Based on Lemma B.3, f, f FNE, we have E(f(u), u) E(f (u), u) 2 f(u) f (u) 2 f f . Considering that |E( )| 1, according to Lemma B.2, we have: Eu D[E(f NE(u), u)] 1 u S E(f NE(u), u) 2 inf r>0 2 ln N (FNE, r) m + 2nr o + 4 B.1.2. CCE APPROXIMATOR Lemma B.4. For arbitrary joint mixed strategy π and π , we have |E(π, u) E(π , u)| 2 π π , Proof. π, π , i [n] we have |ui(π) ui(π )| = X a A (π(a) π (a))ui(a) (a) X a A |π(a) π (a)| = π π , (6) Are Equivariant Equilibrium Approximators Beneficial? where (a) holds since ui( ) [0, 1]. Therefore, ai Ai, ui(ai, π i) ui(π) =ui(ai, π i) ui(ai, π i) + ui(ai, π i) ui(π ) + ui(π ) ui(π) π π + E(π , u) + π π =E(π , u) + 2 π π . Based on that, we get E(π, u) = max i N,ai Ai[ui(ai, π i) ui(π)] E(π , u) + 2 π π . 
Similarly, we also have E(π , u) E(π, u) + 2 π π . Based on Lemma B.4, f, f FCCE, we have E(f(u), u) E(f (u), u) 2 f(u) f (u) 2 f f . Considering that |E( )| 1, according to Lemma B.2, we have: Eu D[E(f CCE(u), u)] 1 u S E(f CCE(u), u) 2 inf r>0 2 ln N (FCCE, r) m + 2r o + 4 B.1.3. CE APPROXIMATOR Lemma B.5. For arbitrary joint mixed strategy π and π , we have |ECE(π, u) ECE(π , u)| 2 π π . Proof. ai Ai, ϕi, we have X a A π(a)ui(ϕ(ai), a i) ui(π) = X a A π(a)ui(ϕ(ai), a i) X a A π (a)ui(ϕ(ai), a i) a A π (a)ui(ϕ(ai), a i) ui(π ) + ui(π ) ui(π) π π + ECE(π , u) + π π =ECE(π , u) + 2 π π . Based on that, we get ECE(π, u) = max i N max ϕi a A π(a)ui(ϕ(ai), a i) ui(π) ECE(π , u) + 2 π π . Similarly, we also have ECE(π , u) ECE(π, u) + 2 π π . Based on Lemma B.4, f, f FCE, we have ECE(f(u), u) ECE(f (u), u) 2 f(u) f (u) 2 f f . Considering that |E( )| 1, according to Lemma B.2, we have: Eu D[ECE(f CE(u), u)] 1 u S ECE(f CE(u), u) 2 inf r>0 2 ln N (FCE, r) m + 2r o + 4 Are Equivariant Equilibrium Approximators Beneficial? B.2. Proof of Theorem 5.4 Theorem 5.4. [Sample complexity] For ϵ, δ (0, 1), function class F of NE, CE or CCE approximator and distribution D, with probability at least 1 δ over draw of the training set S with δ + ln N (F, ϵ games sampled from D, f F we have Eu D[E(f(u), u)] 1 u S E(f(u), u) + ϵ, where L = 2n for NE approximators, and L = 2 for CE and CCE approximators. Proof. For function class F of NE, CE or CCE approximators, according to Lemma B.3, Lemma B.4 and Lemma B.5, f, g F we have E(CE)(f(u), u) E(CE)(g(u), u) L f(u) g(u) L f g , (7) where L = 2n for NE approximators, and L = 2 for CE and CCE approximators. For simplicity, we denote LS(f) = 1 m P u S E(CE)(f(u), u) and LD(f) = Eu D[E(CE)(f(u), u)]. let Fr with |Fr| = N (F, r) be the function class that r-covers F for some r > 0. ϵ (0, 1), by setting r = ϵ 3L we have PS Dm h f F, LS(f) LD(f) > ϵ i PS Dm h f F, LS(f) LS(fr) + LS(fr) LD(fr) + LD(fr) LD(f) > ϵ i (a) PS Dm h f F, Lr + LS(fr) LD(fr) + Lr > ϵ i PS Dm h fr Fr, LS(fr) LD(fr) > ϵ 2Lr i (b) N (F, r)PS Dm h LS(f) LD(f) > ϵ 2Lr i (c) 2N (F, r) exp( 2m(ϵ 2Lr)2), where (a) holds by Equation (7), (b) holds by union bound, and (c) holds by Hoeffding inequality. As a result, when m 9 2ϵ2 ln 2 δ + ln N (F, ϵ 3L) , we have PS Dm h f F, LS(f) LD(f) > ϵ i < δ. B.3. Proof of Theorem 5.5 Theorem 5.5. The O-projected, P-projected and Q-projected approximator classes have smaller covering numbers, i.e., r > 0 we have N (OFNE, r) N (FNE, r), N (PFNE, r) N (FNE, r), N (QF(C)CE, r) N (F(C)CE, r). We first provide an auxiliary lemma. Lemma B.6. For function class F and orbit averaging operator X, if f, g F, ℓ (Xf, Xg) ℓ (f, g), then N (XF, r) N (F, r) for any r > 0. Proof. r > 0, Denote Fr as the smallest r-covering set that covers F with size N (F, r). f F, let fr Fr be the function that r-covers f. We have ℓ (Xfr, Xf) ℓ (fr, f) r. Therefore, XFr is a r-covering set of XF, and we have N (XF, r) |XFr| |Fr| = N . Are Equivariant Equilibrium Approximators Beneficial? Proof of Theorem 5.5. For player i [n] and f NE, g NE FNE, assuming U is closed under any ρi Gi. For Oi, l (Oif NE, Oig NE) = max u U Oif NE(u) Oig NE(u) = max j [n] max u U (Oif NE)(u)j (Oig NE)(u)j = max n max u U f NE(u)i g NE(u)i , max j =i max u U 1 |Ai|! ρi Gi (f NE(ρiu)j g NE(ρiu)j) o max n max u U f NE(u)i g NE(u)i , max j =i 1 |Ai|! ρi Gi max u U f NE(ρiu)j g NE(ρiu)j o = max n max u U f NE(u)i g NE(u)i , max j =i 1 |Ai|! 
ρi Gi max u U f NE(u)j g NE(u)j o = max n max u U f NE(u)i g NE(u)i , max j =i max u f NE(u)j g NE(u)j o =l (f NE, g NE). Since O = O1 On, we have ℓ (Of NE, Og NE) ℓ (f NE, g NE). (8) l (Pif NE, Pig NE) = max u U max j [n] (Pif NE)(u)j (Pig NE)(u)j = max n max j =i max u f NE(u)j g NE(u)j , max u 1 |Ai|! ρi Gi ρ 1 i (f NE(ρiu)i g NE(ρiu)i) o = max n max j =i max u f NE(u)j g NE(u)j , max u 1 |Ai|! ρi Gi (f NE(ρiu)i g NE(ρiu)i) o max n max j =i max u f NE(u)j g NE(u)j , 1 |Ai|! ρi Gi max u f NE(ρiu)i g NE(ρiu)i o = max n max j =i max u f NE(u)j g NE(u)j , 1 |Ai|! ρi Gi max u f NE(u)i g NE(u)i o = max n max j =i max u f NE(u)j g NE(u)j , max u f NE(u)i g NE(u)i o =l (f NE, g NE). Since P = P1 Pn, we have ℓ (Pf NE, Pg NE) ℓ (f NE, g NE). (9) Are Equivariant Equilibrium Approximators Beneficial? For CE or CCE approximator f (C)CE F(C)CE and Qi, we have l (Qif (C)CE, Qig(C)CE) = max u U (Qif (C)CE)(u) (Qig(C)CE)(u) = max u 1 |Ai|! ρi Gi ρ 1 i (f (C)CE(ρiu) g(C)CE(ρiu)) max u 1 |Ai|! ρi Gi ρ 1 i (f (C)CE(ρiu) g(C)CE(ρiu)) ρi Gi max u ρ 1 i (f (C)CE(ρiu) g(C)CE(ρiu)) ρi Gi max u f (C)CE(ρiu) g(C)CE(ρiu) ρi Gi max u f (C)CE(u) g(C)CE(u) =l (f (C)CE, g(C)CE). Since Q = Q1 Qn, we have ℓ (Qf (C)CE, Qg(C)CE) ℓ (f (C)CE, g(C)CE). (10) Combing Lemma B.6, Equation (8), Equation (9) and Equation (10), we finish the proof. B.4. Proof of Theorem 5.6 Theorem 5.6. When the payoff distribution D is invariant under the permutation of payoffs, the Q-projected (C)CE approximator has a smaller expected equilibrium approximation. Formally, for all f (C)CE F(C)CE and permutationinvariant distribution D, we have Eu D[E(Qf (C)CE(u), u)] Eu D[E(f (C)CE(u), u)], We first prove a lemma about the property of Ei(π, u) and ECE i (π, u). Lemma B.7. Ei(π, u) and ECE i (π, u) are convex on π, i.e. p E(CE) i (π1, u) + (1 p)E(CE) i (π2, u) E(CE) i (pπ1 + (1 p)π2, u), p [0, 1]. Proof. We recall the definition Ei(π, u) = maxai Ai ui(ai, π i) ui(π) for CCE approximator and ECE i (π, u) = maxϕi Ai Ai P a π(a)ui(ϕi(ai), a i) ui(π) for CE approximator. ui(ai, π i) is linear on π. Given ϕ, P a π(a)ui(ϕi(ai), a i) is also linear on π. Moreover, the maximum operator on a set of linear functions will induce a convex function. Are Equivariant Equilibrium Approximators Beneficial? Proof of Theorem 5.6. For f F(C)CE and i, j [n], Eu D[E(CE) i (Qjf(u), u)] =Eu D[E(CE) i ( 1 |Aj|! ρj Gj ρ 1 j f(ρju), u)] , by definition. ρj Gj Eu D[E(CE) i (ρ 1 j f(ρju), u)] , by convexity. ρj Gj Ev D[E(CE) i (ρ 1 j f(v), ρ 1 j v)] , let v = ρju. ρj Gj Ev D[E(CE) i (f(v), v)] , invariance of E(CE) i (π, u) under ρ 1 j Gj. =Eu D[E(CE) i (f(u), u)]. Since Q = i Qi and E = maxi Ei, we have Eu D[E(Qf(u), u)] Eu D[E(f(u), u)]. B.5. Proof of Theorem 5.7 Theorem 5.7. For bimatrix constant-sum games, when the payoff distribution D is invariant under the permutation of payoffs, then the X-projected (X {O, P}) NE approximator has a smaller expected exploitability. Formally, for all f NE FNE and permutation-invariant distribution D for bimatrix constant-sum games, we have i=1 Ei((Xf NE)(u), u) i=1 Ei(f NE(u), u) Proof. We only prove for the P-projected case; the proof for O-projected case is similar and therefore omitted. Ei(σ, u) = max ai Ai ui(ai, σ i) ui(σ). Denote u1(σ) + u2(σ) c, then i=1 Ei(σ, u) = max a1 A1,a2 A2 u1(a1, σ2) + u2(a2, σ1) c. Then we have i=1 Ei((Pf NE)(u), u)] =Eu D[max a1,a2 u1(a1, (Pf NE)(u)2) + u2(a2, (Pf NE)(u)1) c] =Eu D[max a1 u1(a1, (Pf NE)(u)2)] + Eu D[max a2 u2(a2, (Pf NE)(u)1)] c. Are Equivariant Equilibrium Approximators Beneficial? 
B.5. Proof of Theorem 5.7

Theorem 5.7. For bimatrix constant-sum games, when the payoff distribution $\mathcal{D}$ is invariant under the permutation of payoffs, the X-projected ($\mathcal{X} \in \{\mathcal{O}, \mathcal{P}\}$) NE approximator has a smaller expected exploitability. Formally, for all $f^{NE} \in \mathcal{F}_{NE}$ and permutation-invariant distributions $\mathcal{D}$ over bimatrix constant-sum games, we have
$$\mathbb{E}_{u \sim \mathcal{D}}\Big[\sum_{i=1}^{2}\mathcal{E}_i((\mathcal{X}f^{NE})(u), u)\Big] \le \mathbb{E}_{u \sim \mathcal{D}}\Big[\sum_{i=1}^{2}\mathcal{E}_i(f^{NE}(u), u)\Big].$$

Proof. We only prove the P-projected case; the proof for the O-projected case is similar and therefore omitted. Recall $\mathcal{E}_i(\sigma, u) = \max_{a_i \in A_i}u_i(a_i, \sigma_{-i}) - u_i(\sigma)$. Denote $u_1(\sigma) + u_2(\sigma) \equiv c$; then
$$\sum_{i=1}^{2}\mathcal{E}_i(\sigma, u) = \max_{a_1 \in A_1}u_1(a_1, \sigma_2) + \max_{a_2 \in A_2}u_2(a_2, \sigma_1) - c.$$
Then we have
$$\mathbb{E}_{u \sim \mathcal{D}}\Big[\sum_{i=1}^{2}\mathcal{E}_i((\mathcal{P}f^{NE})(u), u)\Big] = \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1(a_1, (\mathcal{P}f^{NE})(u)_2)\Big] + \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_2}u_2(a_2, (\mathcal{P}f^{NE})(u)_1)\Big] - c.$$
For the first term,
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1(a_1, (\mathcal{P}f^{NE})(u)_2)\Big]
&= \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1\Big(a_1, \frac{1}{|A_2|!}\sum_{\rho_2 \in G_2}\rho_2^{-1}f^{NE}(\rho_2u)_2\Big)\Big] \\
&\le \frac{1}{|A_2|!}\sum_{\rho_2 \in G_2}\mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1(a_1, \rho_2^{-1}f^{NE}(\rho_2u)_2)\Big] \\
&= \frac{1}{|A_2|!}\sum_{\rho_2 \in G_2}\mathbb{E}_{v \sim \mathcal{D}}\Big[\max_{a_1}(\rho_2^{-1}v)_1(a_1, \rho_2^{-1}f^{NE}(v)_2)\Big] \\
&= \frac{1}{|A_2|!}\sum_{\rho_2 \in G_2}\mathbb{E}_{v \sim \mathcal{D}}\Big[\max_{a_1}v_1(a_1, f^{NE}(v)_2)\Big] = \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1(a_1, f^{NE}(u)_2)\Big],
\end{aligned}$$
where the inequality holds by the convexity of $\max_{a_1}u_1(a_1, \cdot)$ and the second equality substitutes $v = \rho_2u$ and uses the invariance of $\mathcal{D}$. Similarly, for the second term,
$$\mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_2}u_2(a_2, (\mathcal{P}f^{NE})(u)_1)\Big] \le \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_2}u_2(a_2, f^{NE}(u)_1)\Big].$$
Therefore,
$$\mathbb{E}_{u \sim \mathcal{D}}\Big[\sum_{i=1}^{2}\mathcal{E}_i((\mathcal{P}f^{NE})(u), u)\Big] \le \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_1}u_1(a_1, f^{NE}(u)_2)\Big] + \mathbb{E}_{u \sim \mathcal{D}}\Big[\max_{a_2}u_2(a_2, f^{NE}(u)_1)\Big] - c = \mathbb{E}_{u \sim \mathcal{D}}\Big[\sum_{i=1}^{2}\mathcal{E}_i(f^{NE}(u), u)\Big].$$

B.6. Proof of Theorem 5.8

Theorem 5.8. When the payoff distribution $\mathcal{D}$ is invariant under the permutation of payoffs and $f^{NE}$ satisfies OPI, the P-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{NE} \in \mathcal{F}_{NE}$ that are OPI and permutation-invariant distributions $\mathcal{D}$, we have
$$\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}((\mathcal{P}f^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}(f^{NE}(u), u)].$$

We first introduce a useful lemma about the properties of $\mathcal{E}_i(\sigma, u)$.

Lemma B.8. $\mathcal{E}_i(\sigma, u)$ is
1. linear in $\sigma_i$, i.e., $p\,\mathcal{E}_i((\sigma_i^1, \sigma_{-i}), u) + (1-p)\,\mathcal{E}_i((\sigma_i^2, \sigma_{-i}), u) = \mathcal{E}_i((p\sigma_i^1 + (1-p)\sigma_i^2, \sigma_{-i}), u)$, $\forall p \in [0, 1]$;
2. convex in $\sigma_j$ for $j \ne i$, i.e., $p\,\mathcal{E}_i((\sigma_j^1, \sigma_{-j}), u) + (1-p)\,\mathcal{E}_i((\sigma_j^2, \sigma_{-j}), u) \ge \mathcal{E}_i((p\sigma_j^1 + (1-p)\sigma_j^2, \sigma_{-j}), u)$, $\forall p \in [0, 1]$.

Proof. Recall the definition $\mathcal{E}_i(\sigma, u) = \max_{a_i \in A_i}u_i(a_i, \sigma_{-i}) - u_i(\sigma)$. Notice that $u_i(\sigma)$ is linear in $\sigma_k$ for all $k \in [n]$, and both $u_i(a_i, \sigma_{-i})$ and $u_i(\sigma)$ are linear in $\sigma_k$ for any $k \in [n]$. For the first claim, note that $\max_{a_i}u_i(a_i, \sigma_{-i})$ does not depend on $\sigma_i$, so $\mathcal{E}_i(\sigma, u)$ is linear in $\sigma_i$. For the second claim, the maximum operator over a set of linear functions induces a convex function.

Proof of Theorem 5.8. We prove the theorem in two steps.

Step 1. First, we show that $\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i((\mathcal{P}_if^{NE})(u), u)] = \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i(f^{NE}(u), u)]$ for all $f^{NE} \in \mathcal{F}_{NE}$. By definition,
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i((\mathcal{P}_if^{NE})(u), u)]
&= \mathbb{E}_{u \sim \mathcal{D}}\Big[\mathcal{E}_i\Big(\Big(\tfrac{1}{|A_i|!}\textstyle\sum_{\rho_i \in G_i}\rho_i^{-1}f(\rho_iu)_i,\ f(u)_{-i}\Big), u\Big)\Big] \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_i((\rho_i^{-1}f(\rho_iu)_i, f(u)_{-i}), u)\big] &&\text{linearity of } \mathcal{E}_i(\sigma, u) \text{ in } \sigma_i \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_i((\rho_i^{-1}f(v)_i, f(\rho_i^{-1}v)_{-i}), \rho_i^{-1}v)\big] &&\text{let } v = \rho_iu \text{ and use the invariance of } \mathcal{D} \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_i((\rho_i^{-1}f(v)_i, f(v)_{-i}), \rho_i^{-1}v)\big] &&\text{OPI of } f \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_i((f(u)_i, f(u)_{-i}), u)\big] &&\text{invariance of } \mathcal{E}_i(\sigma, u) \text{ under } \rho_i^{-1} \in G_i \\
&= \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i(f^{NE}(u), u)].
\end{aligned}$$

Step 2. Then we show that $\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j((\mathcal{P}_if^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j(f^{NE}(u), u)]$ for all $f^{NE} \in \mathcal{F}_{NE}$ and $j \ne i$:
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j((\mathcal{P}_if^{NE})(u), u)]
&= \mathbb{E}_{u \sim \mathcal{D}}\Big[\mathcal{E}_j\Big(\Big(\tfrac{1}{|A_i|!}\textstyle\sum_{\rho_i \in G_i}\rho_i^{-1}f(\rho_iu)_i,\ f(u)_{-i}\Big), u\Big)\Big] \\
&\le \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_j((\rho_i^{-1}f(\rho_iu)_i, f(u)_{-i}), u)\big] &&\text{convexity of } \mathcal{E}_j(\sigma, u) \text{ in } \sigma_i \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_j((\rho_i^{-1}f(v)_i, f(\rho_i^{-1}v)_{-i}), \rho_i^{-1}v)\big] &&\text{let } v = \rho_iu \text{ and use the invariance of } \mathcal{D} \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_j((\rho_i^{-1}f(v)_i, f(v)_{-i}), \rho_i^{-1}v)\big] &&\text{OPI of } f \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_j((f(u)_i, f(u)_{-i}), u)\big] &&\text{invariance of } \mathcal{E}_j(\sigma, u) \text{ under } \rho_i^{-1} \in G_i \\
&= \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j(f^{NE}(u), u)].
\end{aligned}$$
Since $\mathcal{P} = \mathcal{P}_1 \circ \cdots \circ \mathcal{P}_n$ and $\mathcal{E} = \max_i\mathcal{E}_i$, we have $\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}((\mathcal{P}f^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}(f^{NE}(u), u)]$.

B.7. Proof of Theorem 5.9

Theorem 5.9. For bimatrix games, when the payoff distribution $\mathcal{D}$ is invariant under the permutation of payoffs and $f^{NE}$ satisfies PPE, the O-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{NE} \in \mathcal{F}_{NE}$ that are PPE and permutation-invariant distributions $\mathcal{D}$ of bimatrix games, we have
$$\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}((\mathcal{O}f^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}(f^{NE}(u), u)].$$

Proof. We prove the theorem in two steps, similar to the proof of Theorem 5.8.

Step 1. First we show that for player $i \in \{1, 2\}$, letting $\{j\} = \{1, 2\}\setminus\{i\}$,
$$\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i((\mathcal{O}_if^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i(f^{NE}(u), u)].$$
This is because
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i((\mathcal{O}_if^{NE})(u), u)]
&= \mathbb{E}_{u \sim \mathcal{D}}\Big[\mathcal{E}_i\Big(\Big(f^{NE}(u)_i,\ \tfrac{1}{|A_i|!}\textstyle\sum_{\rho_i \in G_i}f^{NE}(\rho_iu)_j\Big), u\Big)\Big] \\
&\le \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_i((f^{NE}(u)_i, f^{NE}(\rho_iu)_j), u)\big] &&\text{convexity of } \mathcal{E}_i \text{ in } \sigma_j \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_i((f^{NE}(\rho_i^{-1}v)_i, f^{NE}(v)_j), \rho_i^{-1}v)\big] &&\text{let } v = \rho_iu \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_i((\rho_i^{-1}f^{NE}(v)_i, f^{NE}(v)_j), \rho_i^{-1}v)\big] &&\text{PPE of } f^{NE} \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_i((f^{NE}(v)_i, f^{NE}(v)_j), v)\big] &&\text{invariance of } \mathcal{E}_i(\sigma, u) \text{ under } \rho_i^{-1} \in G_i \\
&= \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_i(f^{NE}(u), u)].
\end{aligned}$$

Step 2. Then we show that if $j \ne i$ and $\{i, j\} = \{1, 2\}$,
$$\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j((\mathcal{O}_if^{NE})(u), u)] = \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j(f^{NE}(u), u)].$$
This is because
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j((\mathcal{O}_if^{NE})(u), u)]
&= \mathbb{E}_{u \sim \mathcal{D}}\Big[\mathcal{E}_j\Big(\Big(f^{NE}(u)_i,\ \tfrac{1}{|A_i|!}\textstyle\sum_{\rho_i \in G_i}f^{NE}(\rho_iu)_j\Big), u\Big)\Big] \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}_j((f^{NE}(u)_i, f^{NE}(\rho_iu)_j), u)\big] &&\text{linearity of } \mathcal{E}_j \text{ in } \sigma_j \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_j((f^{NE}(\rho_i^{-1}v)_i, f^{NE}(v)_j), \rho_i^{-1}v)\big] &&\text{let } v = \rho_iu \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_j((\rho_i^{-1}f^{NE}(v)_i, f^{NE}(v)_j), \rho_i^{-1}v)\big] &&\text{PPE of } f^{NE} \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\big[\mathcal{E}_j((f^{NE}(v)_i, f^{NE}(v)_j), v)\big] &&\text{invariance of } \mathcal{E}_j(\sigma, u) \text{ under } \rho_i^{-1} \in G_i \\
&= \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}_j(f^{NE}(u), u)].
\end{aligned}$$
Since $\mathcal{O} = \mathcal{O}_1 \circ \cdots \circ \mathcal{O}_n$ and $\mathcal{E} = \max_i\mathcal{E}_i$, we have $\mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}((\mathcal{O}f^{NE})(u), u)] \le \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}(f^{NE}(u), u)]$.

C. Omitted Proofs in Section 6

C.1. Proof of Theorem 6.3

Theorem 6.3. For a permutation-$\rho$-invariant game $\Gamma_u$, if there is a pure NE $a^* = (a^*_i)_{i \in [n]}$ and at least one player $k \in [n]$ such that $a^*_k \in V(\rho_k)$, then $a^*$ will never be found by any NE approximator with both PPE and OPI. Besides, $a^*$ (as a pure CE or CCE) will also never be found by any CE or CCE approximator with PE.

Proof. Let $f$ be a PPE and OPI NE approximator. Denote $f(u) = (\sigma_i)_{i \in [n]}$. For the player $k$ with $a^*_k \in V(\rho_k)$, we get
$$\sigma_k = f(u)_k \overset{(a)}{=} f(\rho u)_k \overset{(b)}{=} f(\rho_ku)_k \overset{(c)}{=} \rho_kf(u)_k = \rho_k\sigma_k, \qquad (11)$$
where (a) holds since $u$ is permutable w.r.t. $\rho$, (b) holds by OPI of $f$, and (c) holds by PPE of $f$. If $a^*$ could be found by $f$, we would have
$$1 = \sigma_k(a^*_k) \overset{(d)}{=} \rho_k\sigma_k(a^*_k) = \sigma_k(\rho_k^{-1}(a^*_k)),$$
where (d) holds by Equation (11). However, this leads to a contradiction, because $\rho_k^{-1}(a^*_k) \ne a^*_k$ while $\sigma_k(a^*_k) = \sigma_k(\rho_k^{-1}(a^*_k)) = 1$.

Let $f$ be a PE (C)CE approximator and denote $f(u) = \pi$. We have
$$\pi = f(u) \overset{(a)}{=} f(\rho u) \overset{(b)}{=} \rho f(u) = \rho\pi, \qquad (12)$$
where (a) holds since $u$ is permutable w.r.t. $\rho$ and (b) holds by PE of $f$. If $a^*$ could be found by $f$, we would have
$$1 = \pi(a^*) \overset{(c)}{=} \rho\pi(a^*) = \pi(\rho^{-1}a^*) = \pi(\rho_1^{-1}a^*_1, \dots, \rho_n^{-1}a^*_n),$$
where (c) holds by Equation (12). However, from $a^*_k \in V(\rho_k)$ we know $\rho_k^{-1}(a^*_k) \ne a^*_k$, hence $\rho^{-1}a^* \ne a^*$, while $\pi(a^*) = \pi(\rho^{-1}a^*) = 1$, a contradiction.

C.2. Proof of Theorem 6.6

Theorem 6.6. Given $N, M \ge 2$, let $\mathcal{F}^{(C)CE}_{PE}$ be the function class (targeting games with number of players $n \le N$ and $|A_i| \le M$) of all the (C)CE approximators with PE. Denote by $\mathcal{F}^{(C)CE}_{general}$ the function class of all the (C)CE approximators. Then we have $\mathrm{SWR}_{N,M}(\mathcal{F}^{(C)CE}_{PE}, \mathcal{F}^{(C)CE}_{general}) = 1$.

Proof. Assume $f \in \mathcal{F}^{(C)CE}_{general}$ is a (C)CE approximator that always finds the strategy that maximizes social welfare. We construct another approximator $f_0$ that satisfies PE and also always finds a welfare-maximizing strategy. $f_0$ is constructed by orbit averaging: $f_0(u) = \mathcal{Q}f(u)$, thus $f_0$ is PE. Denote by $\mathcal{D}$ an arbitrary payoff distribution of $u$ such that $\mathcal{D}$ is invariant under permutation and the cardinality of its support is finite. We have
$$\begin{aligned}
\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(\mathcal{Q}_if(u), u)
&= \mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}\Big(\frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\rho_i^{-1}f(\rho_iu), u\Big) = \mathbb{E}_{u \sim \mathcal{D}}\sum_{k=1}^{n}u_k\Big(\frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\rho_i^{-1}f(\rho_iu)\Big) \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{u \sim \mathcal{D}}\sum_{k=1}^{n}u_k(\rho_i^{-1}f(\rho_iu)) = \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\sum_{k=1}^{n}(\rho_i^{-1}v)_k(\rho_i^{-1}f(v)) &&\text{let } v = \rho_iu \\
&= \frac{1}{|A_i|!}\sum_{\rho_i \in G_i}\mathbb{E}_{v \sim \mathcal{D}}\sum_{k=1}^{n}v_k(f(v)) = \mathbb{E}_{u \sim \mathcal{D}}\sum_{k=1}^{n}u_k(f(u)) = \mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u).
\end{aligned}$$
Due to $\mathcal{Q} = \mathcal{Q}_1 \circ \cdots \circ \mathcal{Q}_n$, we have $\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f_0(u), u) = \mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)$. Due to the arbitrariness of $\mathcal{D}$, we know that $f_0$ maximizes the social welfare w.r.t. any $u$. From the above, we immediately get $\mathrm{SWR}_{N,M}(\mathcal{F}^{(C)CE}_{PE}, \mathcal{F}^{(C)CE}_{general}) = 1$.
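The orbit-averaging (Q-projection) operator used in the proofs above can be implemented by brute force for small bimatrix games. The following sketch is our own illustration under our own conventions (names are not the authors'): `f` is any function mapping the two payoff matrices to a joint distribution over action profiles, and the projection averages $\rho_i^{-1}f(\rho_iu)$ over all permutations of one player's actions.

```python
import itertools
import math
import numpy as np

def q_project(f, u1, u2, player):
    """(Q_i f)(u) = (1/|A_i|!) * sum_{rho_i in G_i} rho_i^{-1} f(rho_i u).

    f maps two (m1, m2) payoff matrices to a joint distribution of shape (m1, m2);
    `player` (0 or 1) selects the axis whose action permutations are averaged over.
    """
    m = u1.shape[player]
    out = np.zeros_like(f(u1, u2))
    for perm in itertools.permutations(range(m)):
        perm = np.array(perm)
        inv = np.argsort(perm)                         # rho_i^{-1}
        pu1 = np.take(u1, perm, axis=player)           # rho_i u: permute player i's axis of both payoffs
        pu2 = np.take(u2, perm, axis=player)
        out += np.take(f(pu1, pu2), inv, axis=player)  # rho_i^{-1} f(rho_i u)
    return out / math.factorial(m)

def q_full(f, u1, u2):
    """Q = Q_1 . Q_2: project for player 0, then for player 1; if f is already PE, Qf(u) = f(u)."""
    g = lambda a, b: q_project(f, a, b, player=0)
    return q_project(g, u1, u2, player=1)
```

By Theorem 5.6, when the payoff distribution is permutation-invariant, replacing `f` with `q_full(f, ...)` can only decrease the expected (C)CE approximation, and the proof of Theorem 6.6 applies exactly this projection to a welfare-maximizing approximator.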
C.3. Proof of Theorem 6.7

Theorem 6.7. Given $N, M \ge 2$, let $\mathcal{F}^{NE}_{OPI}$, $\mathcal{F}^{NE}_{PPE}$ and $\mathcal{F}^{NE}_{both}$ be the function classes (targeting games with number of players $n \le N$ and $|A_i| \le M$) of all the NE approximators with OPI, PPE and both, respectively. Denote the function class of all the NE approximators as $\mathcal{F}^{NE}_{general}$. Then we have
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) = \frac{1}{M^{N-1}}, \qquad (1)$$
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{PPE}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M}, \qquad (2)$$
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) = \frac{1}{M^{N-1}}. \qquad (3)$$
Additionally, when $M \ge 3$, denote by $\widetilde{\mathcal{F}}^{NE}_{both}$ the function class of all the NE oracles (functions that always output exact NE solutions of the input games) with both PPE and OPI, and by $\widetilde{\mathcal{F}}^{NE}_{general}$ the function class of all the NE oracles. Then we have
$$\mathrm{SWR}_{N,M}(\widetilde{\mathcal{F}}^{NE}_{both}, \widetilde{\mathcal{F}}^{NE}_{general}) = 0. \qquad (4)$$

C.3.1. PROOF OF EQUATION (1) AND EQUATION (3)

We first prove the theorem with respect to $\mathcal{F}^{NE}_{OPI}$ and $\mathcal{F}^{NE}_{both}$.

Step 1. On the one hand, we prove $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M^{N-1}}$ and $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M^{N-1}}$. We prove this by construction. Consider a game with $N$ players and $A_i = [M]$ for all $i \in [N]$. $\forall a \in A, i \in [N]$, define the payoff $\bar{u}$ as follows:
$$\bar{u}_i(a) = \begin{cases}1, & \text{if } a_1 = a_2 = \cdots = a_N, \\ 0, & \text{otherwise.}\end{cases}$$
Define $U = \{u' \mid u' = \otimes_i\rho_i\bar{u},\ \rho_i \in G_i\}$ and let $\mathcal{D}$ be the uniform distribution on $U$. It is easy to certify that $\mathcal{D}$ is a permutation-invariant distribution. Let $f^* \in \mathcal{F}^{NE}_{general}$ be the NE oracle with $f^*(\bar{u})_i = 1$ and, for any $u' = \otimes_i\rho_i\bar{u} \in U$, $f^*(u')_i = \rho_i(1)$. Intuitively, the oracle chooses the action profile that gives every player payoff 1, leading to a social welfare of $N$. Since each player obtains her maximum possible utility, we have
$$\max_{f \in \mathcal{F}^{NE}_{general}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u) = \max_{\tilde{f} \in \widetilde{\mathcal{F}}^{NE}_{general}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(\tilde{f}(u), u) = N. \qquad (13)$$
For any $j_1, j_2 \in [M]$ with $j_1 < j_2$, let $\rho^{(j_1,j_2)}_i = (1, \dots, j_2, \dots, j_1, \dots, M)$ for every player $i \in [N]$ be the swap permutation that exchanges actions $j_1$ and $j_2$ and keeps the other actions fixed. Then $\otimes_{i \ne j}\rho^{(j_1,j_2)}_i\bar{u} = \rho^{(j_1,j_2)}_j\bar{u}$ for player $j$. For $f \in \mathcal{F}^{NE}_{OPI}$, we have $f(\bar{u})_j = f(\otimes_{i \ne j}\rho^{(j_1,j_2)}_i\bar{u})_j = f(\rho^{(j_1,j_2)}_j\bar{u})_j$ for an arbitrary swap permutation $\rho^{(j_1,j_2)}_j$. Since any permutation can be achieved by composing swap permutations, we have $\forall\rho_j \in G_j$, $f(\bar{u})_j = f(\rho_j\bar{u})_j$. Based on that, and by OPI of $f$, for $\rho = \otimes_{i \in [N]}\rho_i$ we have $f(\bar{u})_j = f(\rho\bar{u})_j$, i.e., $f$ is a constant function on $U$. Without loss of generality, we denote $f(u') \equiv \sigma$ for all $u' \in U$. Then
$$\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u) = \frac{1}{|U|}\sum_{u' \in U}\mathrm{SW}(\sigma, u') = \frac{1}{(M!)^{N-1}}\mathrm{SW}\Big(\sigma, \sum_{u' \in U}u'\Big).$$
Additionally, we have $\big(\sum_{u' \in U}u'\big)_i(a) = ((M-1)!)^{N-1}$ for any $a \in A$. Based on that, we have
$$\max_{f \in \mathcal{F}^{NE}_{OPI}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u) = \frac{1}{(M!)^{N-1}}\cdot N\cdot((M-1)!)^{N-1} = \frac{N}{M^{N-1}}. \qquad (14)$$
Combining Equation (13) and Equation (14), we have $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M^{N-1}}$. Due to $\mathcal{F}^{NE}_{both} \subseteq \mathcal{F}^{NE}_{OPI}$, we immediately know $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M^{N-1}}$.

Step 2. On the other hand, we prove $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) \ge \frac{1}{M^{N-1}}$ and $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) \ge \frac{1}{M^{N-1}}$. Define the maximum possible utility (MPU) for player $i$ with respect to utility $u_i$ and action $a_i$ as
$$\mathrm{MPU}(u_i, a_i) := \max_{a_{-i} \in A_{-i}}u_i(a_i, a_{-i}). \qquad (15)$$
Define the set of maximum-possible-utility best responses for player $i$ w.r.t. $u_i$ as
$$B_i(u_i) := \{a_i \in A_i : \mathrm{MPU}(u_i, a_i) = \max_{a'_i \in A_i}\mathrm{MPU}(u_i, a'_i)\}.$$
We first conduct some simplification of the target:
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) = \inf_{\mathcal{D}}\frac{\max_{f \in \mathcal{F}^{NE}_{both}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)}{\max_{f \in \mathcal{F}^{NE}_{general}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)} \ge \inf_{\mathcal{D}}\frac{\max_{f \in \mathcal{F}^{NE}_{both}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, u)}.$$
Then we constrain $u$ to be a cooperation game. For a normal-form game $\Gamma_u$, we define $\tilde{u} = (\tilde{u}_i)_{i \in [n]}$ in which $\tilde{u}_i = \frac{1}{n}\sum_{k=1}^{n}u_k$. Then we have $\mathrm{SW}(\sigma, u) = \mathrm{SW}(\sigma, \tilde{u})$, which means that constraining $u$ to be a cooperation game induces the same social welfare. Then
$$\inf_{\mathcal{D}}\frac{\max_{f \in \mathcal{F}^{NE}_{both}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, u)} = \inf_{\mathcal{D}}\frac{\max_{f \in \mathcal{F}^{NE}_{both}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), \tilde{u})}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, \tilde{u})}.$$
Denote by $f_0$ the approximator that always outputs the uniform strategy on $B_i(\tilde{u}_i)$ for each player $i$. It is obvious that $f_0$ is both OPI and PPE, because the operations from $u$ to $f_0(u)$ are all permutation-equivariant. Then
$$\inf_{\mathcal{D}}\frac{\max_{f \in \mathcal{F}^{NE}_{both}}\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, u)} \ge \inf_{\mathcal{D}}\frac{\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f_0(u), u)}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, u)}.$$
Ignoring the infimum and the expectation operators, consider $\frac{\mathrm{SW}(f_0(u), u)}{\max_{\sigma}\mathrm{SW}(\sigma, u)}$ for an arbitrary $u$, and denote by $b$ the maximum element appearing in $\tilde{u}$; then the denominator equals $Nb$. As for the numerator, no matter which action $a_i \in B_i(\tilde{u}_i)$ player $i$ chooses, she always has probability at least $\prod_{j \ne i}\frac{1}{|B_j(\tilde{u}_j)|} \ge \frac{1}{M^{N-1}}$ of achieving payoff $b$, therefore inducing $\mathrm{SW}(f_0(u), u) \ge \frac{Nb}{M^{N-1}}$. Then $\frac{\mathrm{SW}(f_0(u), u)}{\max_{\sigma}\mathrm{SW}(\sigma, u)} \ge \frac{1}{M^{N-1}}$, and so are $\inf_{\mathcal{D}}\frac{\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f_0(u), u)}{\mathbb{E}_{u \sim \mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma, u)}$, $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general})$ and $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general})$. Combining the two steps,
$$\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{OPI}, \mathcal{F}^{NE}_{general}) = \mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{both}, \mathcal{F}^{NE}_{general}) = \frac{1}{M^{N-1}}.$$

C.3.2. PROOF OF EQUATION (2)

We next prove, with respect to $\mathcal{F}^{NE}_{PPE}$, that $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{PPE}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M}$. Consider a bimatrix game with $A_i = [M]$ for $i \in [2]$. $\forall a \in A, i \in [2]$, define the payoff $\bar{u}$ as follows:
$$\bar{u}_i(a) = \begin{cases}1, & \text{if } a_1 = a_2, \\ 0, & \text{otherwise.}\end{cases}$$
Define $U := \{u' \mid u' = \rho_1\rho_2\bar{u},\ \rho_i \in G_i\}$ and let $\mathcal{D}$ be the uniform distribution on $U$. It is easy to certify that $U = \{u' \mid u' = \rho_1\bar{u},\ \rho_1 \in G_1\} = \{u' \mid u' = \rho_2\bar{u},\ \rho_2 \in G_2\}$ and that $\mathcal{D}$ is a permutation-invariant distribution. Let $f^* \in \mathcal{F}^{NE}_{general}$ be the NE oracle with $f^*(\bar{u})_i = 1$ and, for any $u' = \otimes_i\rho_i\bar{u} \in U$, $f^*(u')_i = \rho_i(1)$. Intuitively, the oracle chooses the action profile that gives both players payoff 1, leading to a social welfare of 2.

For a permutation $\varrho$ on $[M]$, let $P_\varrho \in \{0, 1\}^{M \times M}$ be the corresponding permutation matrix, and denote by $\mathbb{P}$ the set of all permutation matrices. As a result, $\forall u \in U$: $\forall\rho_1 \in G_1$, $\rho_1u = (P_{\rho_1}u_1, P_{\rho_1}u_2) =: P_{\rho_1}u$, and $\forall\rho_2 \in G_2$, $\rho_2u = (u_1P^T_{\rho_2}, u_2P^T_{\rho_2}) =: uP^T_{\rho_2}$. Specially, we have $P_\varrho\bar{u}P^T_\varrho = \bar{u}$.

For $f \in \mathcal{F}^{NE}_{PPE}$, denote $f(\bar{u}) = \sigma = (\sigma_1, \sigma_2)$. For a permutation $\varrho$ on $[M]$ and payoff $u' = P_\varrho\bar{u} = \bar{u}(P^T_\varrho)^{-1}$, by PPE of $f$ we have $f(u')_1 = f(P_\varrho\bar{u})_1 = P_\varrho\sigma_1 = \varrho\sigma_1$ and $f(u')_2 = f(\bar{u}(P^T_\varrho)^{-1})_2 = (P_\varrho)^{-1}\sigma_2 = \varrho^{-1}\sigma_2$. Then we have
$$\mathrm{SW}(f(u'), u') = \sum_{i=1}^{2}(P_\varrho\bar{u})_i(\varrho\sigma_1, \varrho^{-1}\sigma_2) = \sum_{i=1}^{2}\bar{u}_i(\sigma_1, \varrho^{-1}\sigma_2) = \sum_{i=1}^{2}(\bar{u}P^T_\varrho)_i(\sigma_1, \sigma_2) = \mathrm{SW}(f(\bar{u}), \bar{u}P^T_\varrho).$$
Therefore,
$$\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u) = \frac{1}{|U|}\sum_{u' \in U}\mathrm{SW}(f(u'), u') = \frac{1}{|U|}\sum_{P_\varrho \in \mathbb{P}}\mathrm{SW}(f(\bar{u}), \bar{u}P^T_\varrho) = \frac{1}{|U|}\sum_{u' = \bar{u}P^T_\varrho \in U}\mathrm{SW}(f(\bar{u}), u') = \frac{1}{|U|}\mathrm{SW}\Big(\sigma, \sum_{u' \in U}u'\Big).$$
Since $|U| = M!$ and $\sum_{u' \in U}u'$ is a tensor with all elements equal to $(M-1)!$, we have $\mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u) = \frac{2}{M}$, and thus $\mathrm{SWR}_{N,M}(\mathcal{F}^{NE}_{PPE}, \mathcal{F}^{NE}_{general}) \le \frac{1}{M}$.
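The constructions behind Equations (13) and (14) are easy to check numerically for small $N$ and $M$. The sketch below is our own illustration (no code accompanies the paper's proof): it builds the base game $\bar{u}$ in which every player receives 1 iff all players choose the same action, enumerates its orbit $U$ and verifies $|U| = (M!)^{N-1}$, and confirms that any constant joint strategy, which is what an OPI approximator is forced to play on $U$, earns average welfare $N/M^{N-1}$, whereas the oracle earns $N$.

```python
import itertools
import math
import numpy as np

def all_equal_game(n, m):
    """bar-u: every player receives 1 iff all n players pick the same action."""
    u = np.zeros((m,) * n)
    for a in range(m):
        u[(a,) * n] = 1.0
    return u                                   # identical payoff tensor for every player

def orbit(u, n, m):
    """U = { (rho_1 x ... x rho_n) bar-u }: permute each player's action axis independently."""
    games = set()
    for perms in itertools.product(itertools.permutations(range(m)), repeat=n):
        v = u
        for axis, p in enumerate(perms):
            v = np.take(v, np.array(p), axis=axis)
        games.add(v.tobytes())
    return [np.frombuffer(g).reshape((m,) * n) for g in games]

n, m = 3, 3
u_bar = all_equal_game(n, m)
U = orbit(u_bar, n, m)
assert len(U) == math.factorial(m) ** (n - 1)       # |U| = (M!)^(N-1)

rng = np.random.default_rng(0)
sigma = [rng.dirichlet(np.ones(m)) for _ in range(n)]   # an arbitrary product strategy
joint = sigma[0]
for s in sigma[1:]:
    joint = np.multiply.outer(joint, s)
avg_sw = np.mean([n * float((joint * v).sum()) for v in U])
print(avg_sw, n / m ** (n - 1))                     # both equal N / M^(N-1) = 1/3 here
```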
C.3.3. PROOF OF EQUATION (4)

Consider the following $3 \times 3$ game, where $\varepsilon \in (0, 1/2)$:
$$u = \begin{pmatrix} 1, 1 & 0, 0 & 0, 1/2+\varepsilon \\ 0, 0 & 1, 1 & 0, 1/2+\varepsilon \\ 1/2+\varepsilon, 0 & 1/2+\varepsilon, 0 & \varepsilon, \varepsilon \end{pmatrix}$$
It is obvious that $\max_{\sigma^* \in \mathrm{NE}(\Gamma_u)}\mathrm{SW}(\sigma^*, u) = 2$, attained at the pure NEs in which both players play their first action or both play their second action. However, an NE oracle with both PPE and OPI can only output the unique NE whose pure strategies induce utility $(\varepsilon, \varepsilon)$. Indeed, let $\rho_1 = \rho_2 = (2, 1, 3)$; then $\rho_1\rho_2u = u$. From the analysis above we know that if $f^{NE} \in \widetilde{\mathcal{F}}^{NE}_{both}$ and $f^{NE}(u) = (\sigma_1, \sigma_2)$, then $\sigma_1(1) = \sigma_1(2)$ and $\sigma_2(1) = \sigma_2(2)$. Merging the first two actions of player 1 and of player 2 into a new action that randomizes uniformly between them yields the utility matrix
$$\tilde{u} = \begin{pmatrix} 1/2, 1/2 & 0, 1/2+\varepsilon \\ 1/2+\varepsilon, 0 & \varepsilon, \varepsilon \end{pmatrix}$$
There is a unique NE in this Prisoner's Dilemma, namely the profile with utility $(\varepsilon, \varepsilon)$. The game $\tilde{u}$ is the same as the game $u$ under the constraint that $\sigma_1(1) = \sigma_1(2)$ and $\sigma_2(1) = \sigma_2(2)$ in $u$. Then $\max_{f \in \widetilde{\mathcal{F}}^{NE}_{both}}\mathrm{SW}(f(u), u) = 2\varepsilon$. Since $\varepsilon$ can be arbitrarily small, we have $\mathrm{SWR}_{2,3}(\widetilde{\mathcal{F}}^{NE}_{both}, \widetilde{\mathcal{F}}^{NE}_{general}) = 0$. As a result, we have $\mathrm{SWR}_{N,M}(\widetilde{\mathcal{F}}^{NE}_{both}, \widetilde{\mathcal{F}}^{NE}_{general}) = 0$ for all $N \ge 2$ and $M \ge 3$.

D. Experiments of NE Approximation Performance

In this section, we present experimental results that assess the approximation performance of orbit-averaged NE approximators. Our primary objective is to verify the validity of Theorems 5.7, 5.8 and 5.9. Additionally, we examine the social welfare observed in each experiment.

D.1. Experimental Setup

We employ a 5-layer fully connected neural network as the NE approximator, with 512 nodes in each hidden layer. ReLU serves as the activation function for each hidden layer, and batch normalization is applied before activation. During training, the Nash approximation loss function is utilized, and the model is optimized with the Adam optimizer. A batch size of 1024 is employed, with 65536 data points used for training and 1000 data points for testing. Data is generated on the fly for each training epoch. The initial learning rate is set to $5 \times 10^{-4}$, with a decay ratio of $\gamma = 0.3$ at the 60th and 80th epochs.

Following the training phase, we test four different versions of the model: Raw, PPE, OPI and Both. The Raw model is the original network; the PPE model applies the operator $\mathcal{P}$ to the Raw model; the OPI model applies the operator $\mathcal{O}$ to the Raw model; and the Both model applies both $\mathcal{P}$ and $\mathcal{O}$ to the Raw model. Due to the exponentially increasing computational complexity of orbit averaging, we keep the game size sufficiently small: in our experiments, we use bimatrix games of size $4 \times 4$.
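For concreteness, here is a PyTorch sketch of the kind of approximator and loss described above. It is our own reading of the setup (layer sizes, the flattened-payoff input, the two softmax heads and the helper names are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class NEApproximator(nn.Module):
    """Fully connected NE approximator for an m x m bimatrix game (sketch)."""
    def __init__(self, m=4, hidden=512, num_layers=5):
        super().__init__()
        layers, dim = [], 2 * m * m                 # input: both payoff matrices, flattened
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            dim = hidden
        self.backbone = nn.Sequential(*layers)
        self.head1 = nn.Linear(hidden, m)           # logits of player 1's mixed strategy
        self.head2 = nn.Linear(hidden, m)           # logits of player 2's mixed strategy

    def forward(self, u1, u2):                      # u1, u2: (batch, m, m)
        h = self.backbone(torch.cat([u1.flatten(1), u2.flatten(1)], dim=-1))
        return self.head1(h).softmax(-1), self.head2(h).softmax(-1)

def nash_approx_loss(u1, u2, s1, s2):
    """Batched E(sigma, u) = max_i [ max_{a_i} u_i(a_i, sigma_{-i}) - u_i(sigma) ]."""
    dev1 = torch.einsum('bij,bj->bi', u1, s2).max(dim=1).values
    dev2 = torch.einsum('bij,bi->bj', u2, s1).max(dim=1).values
    val1 = torch.einsum('bi,bij,bj->b', s1, u1, s2)
    val2 = torch.einsum('bi,bij,bj->b', s1, u2, s2)
    return torch.maximum(dev1 - val1, dev2 - val2).mean()

model = NEApproximator()
opt = torch.optim.Adam(model.parameters(), lr=5e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60, 80], gamma=0.3)  # decay at epochs 60/80
```

At test time, the PPE/OPI/Both variants are obtained by wrapping `model` with the corresponding orbit-averaging operators (averaging of the kind sketched after Appendix C.2, applied to the per-player outputs).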
D.2. Experimental Results

Different distributions. In this experiment, we assess the performance of the NE approximator on games with different payoff distributions, namely uniform, Gaussian and exponential. In the uniform games, each element of $u$ is sampled i.i.d. from the uniform distribution on $[0, 1]$; in the Gaussian games, from a Gaussian distribution with mean 0 and standard deviation 1; and in the exponential games, from an exponential distribution with rate parameter 1.

The experimental results are presented in Table 1. We observe that the NE approximations of the PPE and OPI models are consistently higher than those of the Both model, while the Both model yields higher NE approximations than the Raw model. Furthermore, the social welfare decreases after orbit averaging in comparison to the Raw model, aligning with our theoretical findings. Overall, the experimental results align with Theorem 5.8 and Theorem 5.9.

         Uniform                            Gaussian                           Exponential
         Approximation    Social Welfare    Approximation    Social Welfare    Approximation    Social Welfare
Raw      0.0142 ± 0.0005  1.5211 ± 0.0048   0.0470 ± 0.0010  1.7695 ± 0.0295   0.0422 ± 0.0022  3.8628 ± 0.0410
PPE      0.0224 ± 0.0012  1.4905 ± 0.0086   0.0725 ± 0.0017  1.6859 ± 0.0281   0.0658 ± 0.0021  3.7801 ± 0.0448
OPI      0.0227 ± 0.0012  1.4905 ± 0.0089   0.0714 ± 0.0026  1.6876 ± 0.0298   0.0645 ± 0.0020  3.7792 ± 0.0448
Both     0.0196 ± 0.0005  1.4908 ± 0.0087   0.0630 ± 0.0011  1.6868 ± 0.0309   0.0581 ± 0.0017  3.7773 ± 0.0466

Table 1. Approximation and social welfare of games from different payoff distributions. Each experiment is repeated five times, and the average results along with the standard deviation are presented.

         General                            Cooperation                        Zero-Sum                       Permutation-ρ-Invariant
         Approximation    Social Welfare    Approximation    Social Welfare    Approximation    Social Welfare  Approximation          Social Welfare
Raw      0.0142 ± 0.0005  1.5211 ± 0.0048   0.0035 ± 0.0002  1.8226 ± 0.0033   0.0138 ± 0.0006  0.0000          4.6404E-5 ± 1.3778E-5  0.9996 ± 0.0044
PPE      0.0224 ± 0.0012  1.4905 ± 0.0086   0.1660 ± 0.0081  1.4238 ± 0.0100   0.0120 ± 0.0007  0.0000          3.5154E-6 ± 1.2956E-6  0.9996 ± 0.0044
OPI      0.0227 ± 0.0012  1.4905 ± 0.0089   0.1657 ± 0.0064  1.4247 ± 0.0085   0.0119 ± 0.0004  0.0000          4.4566E-5 ± 1.2887E-5  0.9996 ± 0.0044
Both     0.0196 ± 0.0005  1.4908 ± 0.0087   0.1056 ± 0.0007  1.4224 ± 0.0064   0.0116 ± 0.0005  0.0000          2.9783E-7 ± 1.2921E-7  0.9996 ± 0.0044

Table 2. Approximation and social welfare of different game classes. Each experiment is repeated five times, and the average results along with the standard deviation are presented.

Different game types. In this experiment, we evaluate the performance of the NE approximator on different game classes: general, cooperation, zero-sum and permutation-$\rho$-invariant games. In general games, each element of $u$ is sampled i.i.d. from the uniform distribution on $[0, 1]$. In cooperation games, we first generate $u_1$ and then set $u_2 = u_1$. In zero-sum games, we first generate $u_1$ and then set $u_2 = -u_1$. In permutation-$\rho$-invariant games, both $u_1$ and $u_2$ are set to be permutation-$\rho$-invariant, where $\rho = \rho_1\rho_2$ and $\rho_i \sim U(R)$, with $R$ being the set of circulations of $[n]$ and $n = 4$. The choice of these game classes ensures that each game distribution is permutation-invariant, which aligns with the assumptions of our theorems.

The results are presented in Table 2. The Both model consistently achieves smaller approximations than the PPE and OPI models. In zero-sum games, both the PPE and OPI models yield smaller approximations than the Raw model, which aligns with Theorem 5.7. In permutation-$\rho$-invariant games, all models converge to the ordinary solution $\sigma_1 = \sigma_2 = (\frac{1}{n}, \dots, \frac{1}{n})$, resulting in almost zero error regardless of the inputs. These experimental results provide empirical support for the theoretical findings presented in our theorems.

Different training modes. In this part, we train the four versions of the model, and at test time we also transform the trained model into each of the four versions, yielding $4 \times 4 = 16$ experimental results. Due to the exponential computation costs, we set the game size to $3 \times 3$; moreover, we set the hidden size to 128 and the data size per epoch to 20000 to reduce costs. The results are presented in Table 3, where the diagonal elements represent matching training and testing approaches. We observe that in Raw training mode, both the PPE and OPI models show larger approximations than the Raw model. However, when trained in their respective modes, both the PPE and OPI models outperform the Raw model in terms of approximation. These findings highlight the impact of the training mode on the model's performance and demonstrate that the PPE and OPI models, trained in their respective modes, achieve better approximation results than the Raw model in Raw training mode.

E. Experiments of Social Welfare

In Theorem 6.7, we discussed the limitation of NE approximators in terms of social welfare. To prove this theorem, we constructed a bimatrix game with the following payoff matrix, where $\varepsilon \in (0, 1/2)$:
$$u = \begin{pmatrix} 1, 1 & 0, 0 & 0, 1/2+\varepsilon \\ 0, 0 & 1, 1 & 0, 1/2+\varepsilon \\ 1/2+\varepsilon, 0 & 1/2+\varepsilon, 0 & \varepsilon, \varepsilon \end{pmatrix}$$
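Before turning to learned models, the claims about this game can be checked directly. The short script below is our own illustration (names and the specific $\varepsilon$ are ours): it enumerates the pure NEs and their social welfare, confirming that the two diagonal profiles reach welfare 2 while the third pure NE reaches only $2\varepsilon$.

```python
import numpy as np

eps = 0.01                            # any value in (0, 1/2)
u1 = np.array([[1.0, 0.0, 0.0],       # row player's payoffs
               [0.0, 1.0, 0.0],
               [0.5 + eps, 0.5 + eps, eps]])
u2 = np.array([[1.0, 0.0, 0.5 + eps], # column player's payoffs
               [0.0, 1.0, 0.5 + eps],
               [0.0, 0.0, eps]])

def is_pure_ne(a1, a2):
    """No profitable unilateral deviation for either player."""
    return u1[a1, a2] >= u1[:, a2].max() and u2[a1, a2] >= u2[a1, :].max()

pure_nes = [(a1, a2) for a1 in range(3) for a2 in range(3) if is_pure_ne(a1, a2)]
welfare = {a: round(float(u1[a] + u2[a]), 4) for a in pure_nes}
print(pure_nes)   # [(0, 0), (1, 1), (2, 2)]
print(welfare)    # {(0, 0): 2.0, (1, 1): 2.0, (2, 2): 0.02}, i.e., 2, 2 and 2*eps
```

As argued in the proof of Theorem 6.7, an approximator with both PPE and OPI must place equal probability on the first two actions for both players (Equation (11) with the swap $\rho_1 = \rho_2 = (2, 1, 3)$), and the only exact NE satisfying this symmetry is the third pure profile, with welfare $2\varepsilon$.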
We showed that, although the welfare-maximizing NE of this game achieves a social welfare of 2, the maximum social welfare of the exact NEs that any equivariant (PPE and OPI) NE approximator can find is only $2\varepsilon$.

Test \ Train    Raw                 PPE                 OPI                 Both
                E        SW         E        SW         E        SW         E        SW
Raw             0.0122   1.4395     0.0515   1.4311     0.0462   1.4334     0.0543   1.4397
PPE             0.0148   1.4291     0.0091   1.4324     0.0421   1.4318     0.0323   1.4353
OPI             0.0145   1.4288     0.0427   1.4314     0.0091   1.4304     0.0278   1.4339
Both            0.0125   1.4283     0.0078   1.4324     0.0078   1.4323     0.0053   1.4355

Table 3. Approximation (E) and social welfare (SW) under different training and testing modes. Diagonal entries use the same mode for training and testing.

Model          Social Welfare    Approximation
General        1.9930            9.9655 × 10⁻⁴
Equivariant    0.1854            9.7611 × 10⁻⁴

Table 4. Social welfare and corresponding approximation on the constructed games in the proof of Theorem 6.7.

On top of that, we conduct experiments on this game to empirically verify the findings of Theorem 6.7.

E.1. Experimental Setup

We set our general model to be a 5-layer fully connected neural network with 256 nodes in each hidden layer. ReLU is used as the activation function for each hidden layer, and batch normalization is added before the activation. The equivariant model is the (PPE and OPI) orbit-averaged general model. Both models are trained using a constrained optimization framework, where the learning problem is formulated as follows:
$$\max_{\theta}\ \mathbb{E}_{u \sim \mathcal{D}}[\mathrm{SW}(f_\theta(u), u)], \quad \text{s.t.}\ \mathbb{E}_{u \sim \mathcal{D}}[\mathcal{E}(f_\theta(u), u)] = 0.$$
In this formulation, we introduce an objective that maximizes social welfare while maintaining the constraint of minimizing the approximation. This encourages both models to identify NE solutions with high social welfare. Similar to previous works (Dütting et al., 2019; Duan et al., 2022), we employ the Augmented Lagrangian Method to optimize our model. The loss function with respect to the training set $S$ is defined as follows:
$$\mathcal{L}_\rho(\theta; \lambda) = -\frac{1}{m}\sum_{u \in S}\mathrm{SW}(f_\theta(u), u) + \lambda\cdot\frac{1}{m}\sum_{u \in S}\mathcal{E}(f_\theta(u), u) + \frac{\rho}{2}\Big(\frac{1}{m}\sum_{u \in S}\mathcal{E}(f_\theta(u), u)\Big)^2,$$
where $\lambda > 0$ is the Lagrange multiplier, $m$ is the data size, and $\rho > 0$ is a hyperparameter controlling the penalty weight for constraint violation. During optimization, we alternate between updating the model parameters by gradient descent and updating the Lagrange multiplier by
$$\lambda_{\mathrm{new}} = \lambda_{\mathrm{old}} + \rho\cdot\frac{1}{m}\sum_{u \in S}\mathcal{E}(f_\theta(u), u).$$
The initial value of the Lagrange multiplier $\lambda$ is set to 1, and the initial value of $\rho$ is set to 1 and increased by 5 after each epoch. We use the Adam optimizer with a batch size of 512 and an initial learning rate of $5 \times 10^{-4}$. Each epoch involves training on 4096 data points and testing on 1000 data points. Data is generated on the fly for each training epoch. Specifically, the training and testing data are sampled from a fixed distribution in which the $\varepsilon$ parameter of the game is drawn from the uniform distribution $U(0.005, 0.01)$.
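A minimal training-loop sketch of the Augmented Lagrangian procedure above is shown below. It assumes the `NEApproximator` and `nash_approx_loss` from the earlier sketch, plus a hypothetical `sample_games` generator yielding batches of the constructed game with $\varepsilon \sim U(0.005, 0.01)$ and a `num_epochs` setting; all names are ours, not the authors', and the schedule is our reading of the description above.

```python
import torch

def social_welfare(u1, u2, s1, s2):
    """Batched SW(sigma, u) = sigma_1^T (u_1 + u_2) sigma_2."""
    return torch.einsum('bi,bij,bj->b', s1, u1 + u2, s2)

lam, rho = 1.0, 1.0                                  # initial multiplier and penalty weight
opt = torch.optim.Adam(model.parameters(), lr=5e-4)

for epoch in range(num_epochs):
    for u1, u2 in sample_games(batch_size=512):      # fresh games every epoch
        s1, s2 = model(u1, u2)
        sw_hat = social_welfare(u1, u2, s1, s2).mean()   # (1/m) sum_S SW(f_theta(u), u)
        e_hat = nash_approx_loss(u1, u2, s1, s2)         # (1/m) sum_S E(f_theta(u), u)
        loss = -sw_hat + lam * e_hat + 0.5 * rho * e_hat ** 2   # L_rho(theta; lambda)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                            # multiplier update (here on a fresh batch)
        u1, u2 = next(iter(sample_games(batch_size=4096)))
        s1, s2 = model(u1, u2)
        lam = lam + rho * nash_approx_loss(u1, u2, s1, s2).item()
    rho += 5.0                                       # penalty schedule described in E.1
```

The equivariant variant would apply the PPE and OPI orbit-averaging projections to `model`'s outputs before computing `sw_hat` and `e_hat`, matching the "(PPE and OPI) orbit-averaged general model" described above.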
E.2. Experimental Results

The results of our experiments are presented in Table 4. Both approaches achieve good approximations, with an approximation loss below $10^{-3}$. However, there is a notable difference in the achieved social welfare between the general model and the equivariant model: while the general model attains nearly the highest possible social welfare, the equivariant model only achieves a significantly lower one. It is important to highlight that both models were encouraged to find NE solutions with high social welfare. The reason the equivariant model achieves social welfare higher than $2\varepsilon$ is that it finds an approximate NE rather than an exact one. This result suggests that the equivariant model may sometimes prioritize other factors over social welfare. In summary, our findings demonstrate that even in this small-scale setting, equivariant approximators can exhibit limitations in terms of social welfare, particularly in extreme cases.