# Predicting Deliberative Outcomes

Vikas K. Garg (CSAIL, MIT) · Tommi Jaakkola (CSAIL, MIT). Correspondence to: Vikas Garg, Tommi Jaakkola.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

We extend structured prediction to deliberative outcomes. Specifically, we learn parameterized games that can map any inputs to equilibria as the outcomes. Standard structured prediction models rely heavily on global scoring functions and are therefore unable to model individual player preferences or how they respond to others asymmetrically. Our games take as input a context, e.g., a UN resolution to be voted on, and map such contexts to initial strategies, player utilities, and interactions. Players are then assumed to repeatedly update their strategies in response to weighted aggregates of other players' choices, towards maximizing their individual utilities. The output of the game is a sample from the resulting (near) equilibrium mixed strategy profile. We characterize conditions under which players' strategies converge to an equilibrium in such games, and when the game parameters can be provably recovered from observations. Empirically, we demonstrate on two real voting datasets that our games can recover interpretable strategic interactions, and predict strategies for players in new settings.

1. Introduction

Structured prediction methods (Lafferty et al., 2001; Taskar et al., 2003; Tsochantaridis et al., 2005; Nowozin and Lampert, 2011) typically operate on parametric scoring functions whose maximizing assignment is used as the predicted configuration. Since the parameters can be learned directly to maximize prediction accuracy, often via surrogate losses, these methods have been successful across areas (Chen et al., 2015; Globerson et al., 2015; London et al., 2016; Cortes et al., 2016; Osokin et al., 2017; Pan and Srikumar, 2018).

Not all structured observations can be naturally modeled as extrema of scoring functions. For instance, votes cast in response to a bill in the US Congress likely involve rounds of negotiations prior to casting the final votes. As a result, the votes reflect the individual utilities of representatives as well as their interactions with subsets of others. Absent some unifying scoring function, contentious votes are more naturally modeled in game-theoretic terms as an equilibrium rather than a maximizing assignment (Waugh et al., 2011; Garg and Jaakkola, 2017). For example, one could consider the voting outcome directly as a pure strategy Nash equilibrium (PSNE) of some fixed underlying graphical game (Irfan and Ortiz, 2014; Honorio and Ortiz, 2015; Garg and Jaakkola, 2016; 2017). The observed votes would in this case directly be actions in the game, and complete observations help estimation. However, most games do not admit pure strategy equilibria (Tardos and Vazirani, 2007). The voting outcome is therefore best viewed as a sample from a product distribution that represents a mixed strategy equilibrium (guaranteed to exist for any game).

Our goal is to use games in a predictive sense. We must therefore explicitly model the game dynamics, i.e., how an equilibrium outcome is reached from the initial conditions. Previous methods for estimating game parameters from data often side-step the issue of game dynamics. It is therefore not possible in such approaches to directly output an equilibrium as the predicted outcome.
For games to be useful in structured prediction, the game parameters, as well as the players' initial strategies prior to deliberation, must also be conditioned on a common context such as a bill to be voted on.

Strategic prediction. In this paper, we turn games into structured prediction methods and call such approaches strategic prediction methods. At a high level, our model takes the available context, such as the written bill, as the input and maps it to a mixed strategy equilibrium as the output. The input-output mapping is obtained via a deliberative process akin to multiple rounds of negotiations between the countries. The observed actions by players, e.g., how countries voted on the bill, are then viewed as samples from this mixed strategy equilibrium. We model the impact of context by parametrically mapping it to each player as player types. These types are simply learned vectors quantifying what each player derives from the context so as to guide their behavior in the game. The types also specify the utility functions and initial strategies for players.

A key part of strategic prediction pertains to how players interact with each other. In this paper, we limit our focus to so-called aggregative games, where the players respond to weighted aggregates of other players' actions or strategies. The interactions are weighted and asymmetric, and the weights can be positive or negative, representing cooperative or competing relationships between the players. We follow a best-response game dynamics to evolve the players' initial strategies towards the predicted output, a (near) mixed strategy equilibrium. We call our parametrized games strategic aggregative prediction (SAP) models. Our games can be viewed as conditional versions of directed graphical games (Kearns et al., 2001; Kearns and Mansour, 2002), restricted to a rich subclass of aggregative games that subsumes Cournot oligopoly, mean field, public goods, and population games (Garg and Jaakkola, 2017). These games shield each player from specific information about any neighbor, since players respond to an aggregate (weighted sum) of their neighbors' strategies.

Modeling game dynamics. A key novelty of our approach is to explicitly incorporate game dynamics. In our implementation, we follow a k-step best response dynamics, seeded with predicted initial strategies. As we operate on continuous strategies and our best response updates are differentiable, we can back-propagate through the k-step strategy updates, and thus learn parameters efficiently, unlike, e.g., Garg and Jaakkola (2016; 2017). Since game dynamics plays a critical role in the predictive use of games, we also introduce and provide a deeper analysis of more general dynamics and types of aggregation strategies. Our work yields exact conditions under which strategies converge to different types of equilibria.

Identifiability. We provide identifiability guarantees for the game parameters. For the analysis, we adopt a simpler one-shot setting, where the observed outcome is sampled from player strategies after just one round of communication instead of following k steps. We characterize conditions under which the one-shot interaction weights are identifiable, in the sense that we can recover the neighbors of any player as well as the correct sign (positive or negative) of their interaction.
Prior guarantees exist for the recovery of equilibria, not game parameters, showing conditions for the recovery of a set of pure strategy equilibria under various noise assumptions (Ghoshal and Honorio, 2017).

The rest of this paper is structured as follows. The basic setup is introduced in section 2. We describe parameter estimation in section 3, and the extension to new players in section 4. Identifiability is discussed in section 5, and convergence of dynamics in section 6. We present detailed experiments on two real datasets to illustrate the benefits of our approach in section 7. We defer the details of our proofs to the Supplementary.

2. Basic strategic prediction model

We first introduce our basic strategic prediction model. To this end, we need to define several components of the model. These include (a) the graphical layout of the game, and how players influence each other; (b) the player types and how these are derived from the context; (c) individual utilities for the players; (d) initial strategies for the players before witnessing the play of others; and (e) the game dynamics, i.e., how players respond to others. Later, in the transferable setting, we will no longer individuate players through their identities, but instead introduce feature vectors to predict the strategies of new players in new contexts.

Let G = (V, E) be a connected digraph such that vertex i identifies player i ∈ [n] ≜ {1, 2, …, n}, n = |V|. Let A be the finite discrete set of actions, the same for all the players. Each player i plays a randomized (mixed) strategy, i.e., a distribution over actions σ_i ∈ Δ(A), from the simplex

Δ(A) ≜ { σ_i : Σ_{a_i ∈ A} σ_i(a_i) = 1, σ_i(a_i) ≥ 0 ∀ a_i ∈ A }.

We will denote a joint strategy profile of all the players by (σ_i, σ_{−i}) to emphasize the distinction between player i and the other players.

We model the influence of players on others through a weighted aggregation of the strategies. The weight w_{ij} ∈ ℝ denotes the strength of influence of player j on player i. We will call the players j ∈ [n] \ {i} that have w_{ij} ≠ 0 the neighbors of player i. We define a weight matrix W ∈ ℝ^{n×n} such that W(i, i) = 0 and W(i, j) = w_{ij}. Player i communicates with the other players only through an aggregator A_i that maps the strategies of the other players, i.e., σ_{−i}, to the weighted sum Σ_{j≠i} w_{ij} σ_j, the effective influence of the others.

The context influences the game through the types. The type of a player is her valuation that quantifies what she derives from the context. The mapping from context to types could be defined in multiple ways. For simplicity, we parameterize the private type z_i of each player i via a matrix θ_i that maps any context x ∈ X ⊆ ℝ^d to the vector z_i(x) = θ_i x. Henceforth, we will keep the dependence on context implicit for simplicity, and write z_i(x) as z_i when the context is clear.

We model the utility or payoff of player i ∈ [n], of type z_i, under strategy profile (σ_i, σ_{−i}) as

U_i(σ_i, σ_{−i}, z_i) = σ_i^⊤ ( A_i(σ_{−i}) − z_i ) + τ H(σ_i),   (1)

where H(σ_i) is the entropy associated with σ_i and τ ≥ 0. The entropy encourages completely mixed strategy choices, in the interior of the simplex Δ(A). The type of a player is by design hidden from the other players. Thus, by subtracting the player type, the payoff function mitigates the (indirect) role of the player's valuation in guiding the aggregate response of the others. Our games are flexible enough to accommodate a wide variety of other payoff functions. Our choice of payoff function (1) maintains a linear dependence on the strategies of others for two reasons. First, our games are related to linear influence games that have been studied previously, e.g., as models of how people decide to vaccinate against a disease, install antivirus software, or get home insurance (Irfan and Ortiz, 2014; Honorio and Ortiz, 2015; Ghoshal and Honorio, 2017); we extend these games by allowing multi-way actions and types. Second, the linear dependence also helps establish provable recovery guarantees for the game parameters.
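To make (1) concrete, the following is a minimal NumPy sketch of the aggregator and the entropy-regularized payoff; the function names and the small constant guarding log(0) are ours, not part of the model.

```python
import numpy as np

def aggregate(W, sigma, i):
    """Effective influence A_i(sigma_{-i}) = sum_{j != i} w_ij * sigma_j.

    W     : (n, n) influence matrix with zero diagonal, W(i, i) = 0.
    sigma : (n, |A|) mixed strategies, one row per player.
    """
    return W[i] @ sigma  # player i's own row drops out since W[i, i] = 0

def utility(W, sigma, z, i, tau=0.1):
    """Payoff (1): U_i = sigma_i^T (A_i(sigma_{-i}) - z_i) + tau * H(sigma_i)."""
    s_i = sigma[i]
    entropy = -np.sum(s_i * np.log(s_i + 1e-12))  # H(sigma_i), guarded against log(0)
    return s_i @ (aggregate(W, sigma, i) - z[i]) + tau * entropy
```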
The utilities naturally define the best response of player i when it observes the aggregate input A_i(σ_{−i}):

β^τ_i( A_i(σ_{−i}), z_i ) ∈ argmax_{σ_i ∈ Δ(A)} U_i(σ_i, σ_{−i}, z_i).   (2)

We say that (σ*_i, σ*_{−i}) forms a mixed strategy Nash equilibrium (MSNE), or simply NE, for types (z_i, z_{−i}) iff

U_i(σ*_i, σ*_{−i}, z_i) ≥ U_i(σ_i, σ*_{−i}, z_i)  ∀ i ∈ [n], σ_i ∈ Δ(A).   (3)

It is easy to show that our game has at least one MSNE. For τ > 0, our best response function is continuous, and maps each strategy profile from the product simplex (a compact space) to a unique point in itself. A Nash equilibrium is therefore guaranteed to exist by Brouwer's fixed point theorem. For τ = 0, the mapping may be set-valued, but a Nash equilibrium is still guaranteed to exist by Nash's theorem (Nash, 1951) via Kakutani's fixed point argument. We say that an MSNE is strict (SNE) when (3) is strict for all i ∈ [n], σ_i ∈ Δ(A) \ {σ*_i}; completely mixed (CMNE) when σ*_i(a_i) > 0 for all i ∈ [n], a_i ∈ A; and pure (PSNE) when for all i ∈ [n] there exists an a_i ∈ A such that σ*_i(a_i) = 1.

It remains to specify how an equilibrium is reached, i.e., the game dynamics. To begin with, players observe the context x and evaluate their types z_i. The types, in turn, give rise to initial strategies σ⁰_i = ψ(z_i), where ψ : T_i → Δ(A) (e.g., softmax). The best response dynamics from this point on depends on the details of the aggregator and whether the dynamics is defined over strategies or actions directly. We provide details on general game dynamics for two different aggregators, along with associated convergence guarantees, in section 6. In our empirical analysis, we adopt a k-step dynamics as described in the following section. Once a (near) equilibrium is reached, a sample action profile y ∈ Y ≜ A^n is drawn from the player strategies. We emphasize that our setting dispenses with the restrictive assumption made by Bayesian games (Harsanyi, 1967; Kalai, 2004; Jiang and Leyton-Brown, 2010) that the conditional distribution P(z_{−i} | z_i) is known to player i.

3. Parameter estimation

We learn our games from data as structured prediction methods. Specifically, given a dataset D = {(x^{(m)}, y^{(m)}) ∈ X × Y : m ∈ [M]} linking contexts to sampled action profiles, our objective is to estimate the type parameters θ_i and the influences of neighbors w_i ≜ (w_{ij})_{j≠i}, i ∈ [n]. Each pair (x^{(m)}, y^{(m)}) is treated as follows. A linear transformation θ̂ = (θ̂_1, …, θ̂_n) maps the context x^{(m)} to the types ẑ(x^{(m)}) ≜ (ẑ^{(m)}_1, …, ẑ^{(m)}_n), which yield initial strategies σ̂⁰_i(x^{(m)}) = ζ(ẑ^{(m)}_i) for the players i ∈ [n], where ζ is the softmax nonlinearity. The aggregators Â_i evaluate weighted sums, and are parametrized by weights ŵ_i = (ŵ_{ij})_{j≠i}. A sequence of k update steps

σ̂^{t+1}_i(x^{(m)}) = ζ( ν(σ̂^t_i) + α( Â_i(σ̂^t_{−i}) − ẑ^{(m)}_i ) ),  t ∈ {0, 1, …, k−1},   (4)

is then followed: k and α are hyperparameters, and ν defines the type of update. Several choices of ν are possible; e.g., ν(σ̂^t_i) = 0 pertains to the best response β^{1/α}_i( Â_i(σ̂^t_{−i}), ẑ^{(m)}_i ) defined in (2); the identity mapping ν(σ̂^t_i) = σ̂^t_i defines a gradient step; and ν(σ̂^t_i) = log σ̂^t_i corresponds to a proximal update based on the KL-divergence, where we define σ̂^{t+1}_i as

argmax_{σ_i} σ_i^⊤( Â_i(σ̂^t_{−i}) − ẑ^{(m)}_i ) − (1/α) KL(σ_i ∥ σ̂^t_i).

Note that we can use functions other than softmax for our strategy updates. For instance, we can take a gradient step and then project onto the probability simplex. One attractive feature of softmax is that it is differentiable and its gradient can be computed in closed form, so it can readily be backpropagated through in a neural model. On the other hand, projection onto the probability simplex typically requires specialized optimization algorithms such as those of Duchi et al. (2008) or Condat (2016).
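The following is a minimal PyTorch sketch of the unrolled k-step dynamics (4) in the basic fixed-player setting, with ν as a pluggable choice; the function and argument names are ours. Because every operation is differentiable, the loss described next can be backpropagated through all k steps.

```python
import torch
import torch.nn.functional as F

def unroll(theta, W, x, k=5, alpha=0.1, nu="identity"):
    """k-step strategy updates of Eq. (4), differentiable end to end.

    theta : (n, |A|, d) per-player type maps, z_i = theta_i @ x.
    W     : (n, n) influence weights, zero diagonal.
    x     : (d,) context vector.
    """
    z = torch.einsum("nad,d->na", theta, x)       # types z_i(x) = theta_i x
    sigma = F.softmax(z, dim=-1)                  # initial strategies sigma^0_i
    for _ in range(k):
        prev = {"zero": torch.zeros_like(sigma),  # best response beta^{1/alpha}_i
                "identity": sigma,                # gradient-style step
                "log": torch.log(sigma)}[nu]      # KL-proximal update
        sigma = F.softmax(prev + alpha * (W @ sigma - z), dim=-1)
    return sigma                                  # (near) equilibrium strategies
```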
Our estimation criterion for the game is to minimize the expected cross-entropy loss E[ℓ(σ̂^k(x^{(m)}), y^{(m)})] between the predicted mixed strategies and the observed profiles, where the expectation is with respect to the empirical distribution over pairs (x^{(m)}, y^{(m)}), ℓ is the cross-entropy loss, and σ̂^k(x^{(m)}) ≜ (σ̂^k_i(x^{(m)}))_{i∈[n]}. We use standard backpropagation to evaluate gradients through the k-step strategy updates efficiently. Note that since we do not assume that the observed configurations y^{(m)} correspond to pure strategy Nash equilibria, we naturally avoid having to enforce specific consistency constraints. Instead, we can measure a degree of agreement that is amenable to more complex parameterizations such as neural networks. Our estimation procedure also scales better than methods that rely on PSNEs, e.g., (Garg and Jaakkola, 2016; 2017): the overall dependence of our approach is linear in the number of observed game outcomes and the number of update steps, and quadratic in the number of players.

4. Transferable strategic prediction

Here we generalize SAP models to permit different players from one game to another. Unlike in section 2, we can no longer assume a fixed interaction structure across games. Instead, the neighbor influences are determined by the context and player feature vectors. Specifically, we construct a feature vector b_i ∈ B for each player i ∈ [n]. Such information is often available, e.g., the education and gender of judges, or the human development indicators of countries. This enables us to predict the behavior of new players in new contexts.

Each game is played with a different subset of players I ⊆ [n], and is unrolled as follows. A context x ∈ X is mapped to latent (more general) player types using a parametric function f_z : X × B → Z, taking each pair (x, b_v), v ∈ I, as input, and mapping it to z_{x,v} ∈ Z. These types define initial strategies as before, σ⁰_{x,v} = φ(Γ z_{x,v}), where Γ is a transformation matrix that yields a vector in ℝ^{|A|}, and φ (e.g., softmax) maps the result to a distribution in the simplex Δ(A). Unlike before, the (asymmetric) influences between players are now calculated parametrically from the types, w_{x,v,v′} = f_w(z_{x,v}, z_{x,v′}), using a parametric mapping f_w : Z × Z → ℝ. Each player v still responds to the other players v′ ∈ I \ {v} through its aggregator

A_{x,v,I}(σ_{x,−v}) ≜ Σ_{v′∈I : v′≠v} w_{x,v,v′} σ_{x,v′}.

We can extend the definition of the payoffs slightly to incorporate the more general types:

U_{v,I}(σ_{x,v}, σ_{x,−v}, z_{x,v}) = σ_{x,v}^⊤ ( A_{x,v,I}(σ_{x,−v}) − Γ z_{x,v} ),

where Γ is an additional parameter matrix to be learned. The game dynamics dictates the course of play in the same fashion as in the basic strategic setting. We learn the model, now parameterized by f_z, f_w, and Γ, by minimizing the loss between the predicted k-step strategies and the observed action profiles.
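A minimal sketch of the transferable model follows; the specific architectures for f_z (a small feed-forward network) and f_w (a bilinear form) are our illustrative assumptions, since the parametric maps are left abstract above.

```python
import torch
import torch.nn as nn

class TransferableSAP(nn.Module):
    """Sketch of the transferable model: f_z maps (context, player features)
    to a type, f_w maps an ordered pair of types to an influence weight, and
    Gamma maps types to R^{|A|}. Class and layer choices are ours."""

    def __init__(self, d_ctx, d_feat, d_type, n_actions):
        super().__init__()
        self.f_z = nn.Sequential(nn.Linear(d_ctx + d_feat, 64), nn.Tanh(),
                                 nn.Linear(64, d_type))
        self.f_w = nn.Bilinear(d_type, d_type, 1)
        self.Gamma = nn.Linear(d_type, n_actions, bias=False)

    def forward(self, x, B, k=5, alpha=0.01):
        # x: (d_ctx,) context; B: (p, d_feat) features of this game's p players.
        p = B.size(0)
        z = self.f_z(torch.cat([x.expand(p, -1), B], dim=-1))      # types z_{x,v}
        zi = z.unsqueeze(1).expand(p, p, -1).reshape(p * p, -1)    # all ordered pairs
        zj = z.unsqueeze(0).expand(p, p, -1).reshape(p * p, -1)
        W = self.f_w(zi, zj).view(p, p)                            # w_{x,v,v'}
        W = W * (1.0 - torch.eye(p))                               # zero self-influence
        gz = self.Gamma(z)                                         # Gamma z_{x,v}
        sigma = torch.softmax(gz, dim=-1)                          # initial strategies
        for _ in range(k):                                         # k-step dynamics, as in (4)
            sigma = torch.softmax(sigma + alpha * (W @ sigma - gz), dim=-1)
        return sigma
```

Any differentiable choices for f_z and f_w would do; the bilinear f_w simply makes the asymmetry of w_{x,v,v′} explicit.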
5. Identifiability of the games

We now characterize the conditions under which the strategic interactions in our games become identifiable. We focus on the one-shot setting (i.e., k = 1) with the gradient update, so that the observed outcome is sampled from player strategies after one round of communication. We also use binary actions to simplify the exposition. Specifically, we estimate from data D the support S_i, i.e., the set of neighbors of i, defined with respect to the unknown influences w*_{ij} ≠ 0. We also recover the correct sign of these influences.

Our recovery procedure is a novel adaptation of the primal-dual witness method (Wainwright, 2009) for structure estimation in games. The method has previously been applied in several non-strategic settings such as the Lasso (Wainwright, 2009) and Ising models (Ravikumar et al., 2010). In contrast to the Lasso and Ising models, our setting poses some new challenges. Unlike those models, the nodes of our games are players that actively communicate with others, and refine their strategies toward maximizing their individual utilities. Moreover, context and dynamics play no part in Ising models, whereas in our setting each outcome is sampled from a separate joint strategy profile following one step of dynamics initiated under a different context. Ghoshal and Honorio (2017) employed the witness method to recover the entire set of PSNE from data consisting of a subset of PSNE and a small fraction of non-equilibrium outcomes assumed to be sampled under their noise models, in the setting of linear influence games. However, the problem of structure recovery is significantly harder: it is known (Honorio and Ortiz, 2015; Ghoshal and Honorio, 2017) that the problem becomes non-identifiable in the setting of PSNE, since multiple game structures may give rise to the same set of PSNE. We leverage the dynamics to circumvent these issues, and characterize the conditions under which we can provably recover the structure of our games. Our analysis on recovering the influences makes the simplifying assumption that the type parameters θ are known.

We denote by φ^{(m)}_j the probability assigned by the initial strategy σ⁰_j(x^{(m)}) to action 1 for player j on example m. Let ℓ_i(w_i; D) be the average cross-entropy loss between one-step strategies under candidate weights w_i ≜ (w_{ij})_{j≠i} and the observed actions for player i, and let λ > 0 be a regularization parameter. We solve the following problem for each player i ∈ [n]:

argmin_{w_i ∈ ℝ^{n−1}} ℓ_i(w_i; D) + λ ‖w_i‖₁.   (5)
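A minimal sketch of (5) for one player follows, solved by proximal gradient descent (ISTA). We treat the one-step probability of action 1 as logistic in w_i, collecting the known type and initial-strategy terms in an offset b and eliding the exact feature encoding; the optimizer, step sizes, and support threshold tol are our assumptions.

```python
import torch
import torch.nn.functional as F

def recover_neighbors(Phi0, b, Y, i, alpha=0.1, lam=0.1, lr=0.05, steps=2000, tol=1e-3):
    """Sketch of the per-player problem (5) for binary actions.

    Phi0 : (M, n) initial probabilities phi_j^{(m)} of action 1.
    b    : (M,) known per-example offset from sigma^0_i and z_i (types known).
    Y    : (M,) observed binary actions of player i.
    """
    M, n = Phi0.shape
    w = torch.zeros(n, requires_grad=True)
    mask = torch.ones(n); mask[i] = 0.0                  # exclude self-influence
    for _ in range(steps):
        logits = b + alpha * (Phi0 @ (w * mask))         # one-step strategy, action 1
        loss = F.binary_cross_entropy_with_logits(logits, Y.float())
        loss.backward()
        with torch.no_grad():                            # gradient + soft-threshold step
            w -= lr * w.grad
            w.copy_(torch.sign(w) * torch.clamp(w.abs() - lr * lam, min=0.0))
            w.grad.zero_()
    S_hat = {j: int(torch.sign(w[j])) for j in range(n)  # signed neighborhood estimate
             if j != i and abs(w[j]) > tol}
    return w.detach(), S_hat
```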
Let H^M_i be the sample Hessian ∇²ℓ_i(w_i; D) under w_i, and let H^{*M}_i pertain to the true weights w*_i. Let Λ_min(·) and Λ_max(·) denote the minimum and the maximum eigenvalues. The following assumptions serve as our analogues of the conditions for support recovery in the Lasso (Wainwright, 2009) and Ising models (Ravikumar et al., 2010):

Λ_min( H^{*M}_{i,SS} ) ≥ α² C_min,   (6)

Λ_max( (1/M) Σ_{m=1}^{M} Φ^{(m)}_i (Φ^{(m)}_i)^⊤ ) ≤ C_max,   (7)

||| H^{*M}_{i,SᶜS} ( H^{*M}_{i,SS} )^{−1} |||_∞ ≤ 1 − γ,   (8)

where Φ^{(m)}_i ≜ (φ^{(m)}_j)_{j≠i}, C_min > 0, C_max < ∞, γ ∈ (0, 1]; |||A|||_∞ is the maximum over the L1-norms of the rows of A; H^{*M}_{i,SS} is the submatrix obtained by restricting H^{*M}_i to the rows and columns corresponding to neighbors, i.e., players in S_i; and H^{*M}_{i,SᶜS} is restricted to the rows pertaining to Sᶜ_i (non-neighbors) and the columns to S_i. Let the number of neighbors of any player be at most d ≤ n − 1. We have the following result.

Theorem 1. Let M > (80² C²_max / (C⁴_min γ⁴)) d² log(n). Suppose the sample satisfies assumptions (6), (7), and (8). Define C_{α,γ} = γ² / (32 α² (2 − γ)²), and let λ_M denote the regularization parameter in (5) for sample size M. Consider any player i ∈ [n]. The following results hold for i with probability at least 1 − 2 exp(−C_{α,γ} λ²_M M), which tends to 1 as M → ∞.

1. The corresponding L1-regularized optimization problem has a unique solution, i.e., a unique set of neighbors for i.

2. The set of predicted neighbors of i is a subset of the true neighbors. Additionally, the predicted set contains all true neighbors j for which |w*_{ij}| ≥ (10 / (α² C_min)) √d λ_M.

In particular, the set of true neighbors of i is exactly recovered if min_{j∈S_i} |w*_{ij}| ≥ (10 / (α² C_min)) √d λ_M.

Note that by taking a union bound over the players, our results imply that we recover the true signed neighborhoods for all players in our game with high probability. Our analysis can possibly be extended to include unknown type parameters θ by computing the gradient and the Hessian of the loss with respect to θ, similarly to the influences. However, additional assumptions may be needed to establish recovery.

6. General game dynamics and convergence

We now give an overview of general game dynamics along with the associated convergence guarantees. The aggregator acts as a privacy-preserving component, hiding specific neighbor actions or strategies and offering only aggregate statistics. We design dynamics under two different kinds of feedback from the aggregator. In our active aggregator (AA) setting, the players get a prediction about the anticipated aggregate of their neighbors. In contrast, a passive aggregator (PA) only provides the aggregate of the empirical frequencies used by the neighbors, and changes in the aggregate are estimated by the player. Intuitively, AA reveals more information about the evolution of the neighbors' strategies.

Note that the payoff received by player i when she samples a pure action a_i ∈ A according to σ_i (denoted a_i ∼ σ_i), and the others sample actions a_{−i} according to σ_{−i}, is

U_i(e_{a_i}, e_{a_{−i}}, z_i) = e_{a_i}^⊤ ( A_i(e_{a_{−i}}) − z_i ) + τ H(σ_i),

where e_{a_i} ∈ Δ(A) is the strategy corresponding to the pure action a_i, and e_{a_{−i}} ≜ {e_{a_j} : j ≠ i}.

We devise two new protocols that may be viewed as derivative action adaptations (Shamma and Arslan, 2005) of smooth fictitious play (FP) and gradient play (GP) (Brown, 1951; Robinson, 1951; Fudenberg and Kreps, 1993; Shapley, 1964) in an aggregative game setting. The protocols differ in how players respond to the (predicted) aggregate: one can play the best response or adapt the strategy via a gradient update. Formally, in our protocols, player i samples an action a^k_i ∼ σ^k_i at time k > 0 based on

q^k_i = q^{k−1}_i + ( e_{a^{k−1}_i} − q^{k−1}_i ) / k ∈ Δ(A),
σ^k_i = g_i( A_i( h_{−i}(q^k_{−i}) ), z_i ),   (9)

where q^k_i is the empirical frequency of the actions played by i till time k, and g_i : ℝ^{|A|} × ℝ^{|A|} → Δ(A) and h_i : ℝ^{|A|} → ℝ^{|A|} are appropriately defined Lipschitz mappings, possibly involving small input noise. We let q^{k−1}_{−i} ≜ {q^{k−1}_j : j ≠ i}, and h_{−i}(q^{k−1}_{−i}) ≜ {h_j(q^{k−1}_j) : j ≠ i}. We also define the base case q⁰_i = σ⁰_i = φ(z_i). Note that player i communicates only with A_i.

We define a passive aggregator (PA) by letting h_i be the identity mapping, i.e., h_i(v_i) = v_i. Alternatively, when h_i(v_i) = v_i + γ Δv_i for some γ > 0 and a difference approximation Δv_i of a temporal derivative v̇_i, we obtain an active aggregator (AA). Intuitively, AA views each q_j as a discretization of a continuous signal q_j(t), so that when Δq_j(t) ≈ q̇_j(t) for the neighbors j of i, we have h_j(q_j(t)) ≈ q_j(t) + γ q̇_j(t) ≈ q_j(t + γ), and hence A_i( h_{−i}(q_{−i}(t)) ) ≈ A_i( q_{−i}(t + γ) ); therefore A_i offers a predicted aggregate to player i.
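A minimal simulation sketch of the sampled dynamics (9) follows, instantiating g_i as the entropy-smoothed best response for concreteness (the two concrete choices of g_i are described next); the crude finite-difference derivative estimate standing in for Δv_i is our assumption.

```python
import numpy as np

def best_response(u, z, tau):
    """beta^tau_i: softmax((u - z) / tau), the entropy-smoothed best response."""
    v = (u - z) / tau
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def run_dynamics(W, z, steps=1000, tau=0.5, gamma=0.1, active=True, seed=0):
    """Sketch of (9). `active` mimics an active aggregator (AA) by adding a
    derivative term before aggregation; passive (PA) aggregates the raw
    empirical frequencies. W: (n, n) with diag(W) = 0; z: (n, |A|) types."""
    rng = np.random.default_rng(seed)
    n, A = z.shape
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    q = e / e.sum(axis=-1, keepdims=True)                # base case q^0 = softmax(z)
    q_prev = q.copy()
    for k in range(1, steps + 1):
        dq = (q - q_prev) * k                            # finite-difference dq/dt (steps ~ 1/k)
        h = q + gamma * dq if active else q              # h_j; applying to all rows is
        agg = W @ h                                      # harmless since diag(W) = 0
        sigma = best_response(agg, z, tau)               # sigma^k_i = g_i(A_i(h_{-i}), z_i)
        a = np.array([rng.choice(A, p=sigma[j]) for j in range(n)])
        q_prev = q.copy()
        q = q + (np.eye(A)[a] - q) / k                   # empirical frequencies, Eq. (9)
    return q, sigma
```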
We consider two forms of best response dynamics encoded in g_i, SAP-FP and SAP-GP, based on derivative FP and derivative GP, respectively. In SAP-FP we set τ > 0 in the utility functions, which gives a unique best response: under AA, the aggregator output u^k yields g_i(u^k, z_i) = β^τ_i(u^k, z_i); under PA, g_i(u^k, z_i) = β^τ_i(u^k + γ Δû^k, z_i). The AA case differs from PA in terms of where the difference approximation happens. In AA, it happens prior to aggregation, so g_i is defined directly in terms of the output u^k of AA, which absorbs any temporal approximation error. In the PA case, the player constructs a temporal prediction of the aggregate, and the approximation error is (Δu^k − Δû^k).

In SAP-GP, we set τ = 0, and player i takes a gradient step to maximize the anticipated payoff, followed by a Euclidean projection to get a unique mapping g_i (since any such projection onto a closed convex set is unique): under AA, g_i(u^k, z_i) = Π_Δ( q^k_i + u^k − z_i ); under PA, g_i(u^k, z_i) = Π_Δ( q^k_i + u^k + γ Δû^k − z_i ), where

Π_Δ(q̃) ≜ argmin_{q ∈ Δ(A)} ‖q̃ − q‖₂.

Thus, under both protocols, players take actions stochastically according to σ^k_i, and the best response mapping g_i is unique for each k.
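The projection Π_Δ can be computed exactly in O(|A| log |A|) time with the sort-based scheme of Duchi et al. (2008); a sketch:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex,
    Pi_Delta(v) = argmin_{q in Delta} ||q - v||_2 (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                          # sort coordinates descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)        # shared shift
    return np.maximum(v + theta, 0.0)             # clip at zero

# e.g. project_simplex(np.array([2.0, 0.0])) -> [1.0, 0.0]
```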
For the analysis, we associate an ODE system to characterize the evolution of the player strategies, and specify conditions under which the fixed points of this ODE are locally asymptotically stable (l.a.s.). An equilibrium point s is said to be l.a.s. if every ODE trajectory that starts at a point in a small neighborhood of s remains forever in that neighborhood and eventually converges to s. As a consequence, our discrete updates would converge to a Nash equilibrium with positive probability (Shamma and Arslan, 2005). Our updates in (9) lead to the implicit ODEs (10)-(13) for SAP-FP and SAP-GP under the AA and PA settings, where λ > 0, r_i is an estimate (of q_i under AA, and of the aggregate A_i(q_{−i}) under PA), and ṙ_{−i} ≜ {ṙ_j : j ≠ i, w_{ij} ≠ 0}:

(SAP-FP/AA)  q̇_i = β^τ_i( A_i(q_{−i} + γ ṙ_{−i}), z_i ) − q_i,  ṙ_i = λ(q_i − r_i)   (10)
(SAP-FP/PA)  q̇_i = β^τ_i( A_i(q_{−i}) + γ ṙ_i, z_i ) − q_i,  ṙ_i = λ( A_i(q_{−i}) − r_i )   (11)
(SAP-GP/AA)  q̇_i = Π_Δ[ q_i + A_i(q_{−i} + γ ṙ_{−i}) − z_i ] − q_i,  ṙ_i = λ(q_i − r_i)   (12)
(SAP-GP/PA)  q̇_i = Π_Δ[ q_i + A_i(q_{−i}) + γ ṙ_i − z_i ] − q_i,  ṙ_i = λ( A_i(q_{−i}) − r_i )   (13)

We will call a matrix stable if all its eigenvalues have strictly negative real parts. Let I denote the identity matrix. We now state results that characterize the conditions under which the different dynamics lead to asymptotically stable equilibria. We state and prove our convergence results as theorems in the Supplementary (see Table 1 for a summary). Specifically, we prove the convergence of player strategies to SNE via carefully crafted Lyapunov functions that are locally positive definite and have a locally negative semidefinite time derivative. The other proofs track the evolution of the game dynamics around an equilibrium: we analyze conditions under which the Jacobian matrix of the linearization of the ODE is Hurwitz stable, i.e., all its eigenvalues have negative real parts, and exploit the fact that the behavior of the ODE near the equilibrium is the same as that of its linear approximation when all eigenvalues have non-zero real parts.

Table 1: Convergence results (details in the Supplementary)

| Setting    | Converges to | Conditions |
|------------|--------------|------------|
| SAP-FP/AA  | NE           | Theorem 2  |
| SAP-FP/PA  | NE           | Theorem 3  |
| SAP-GP/AA  | CMNE         | Theorem 4  |
| SAP-GP/AA  | SNE          | Theorem 5  |
| SAP-GP/PA  | CMNE         | Theorem 6  |
| SAP-GP/PA  | SNE          | Theorem 7  |

Our results have several important implications. Garg and Jaakkola (2016; 2017) enforced margin constraints on the payoff functions in their discriminative PSNE setup, without establishing an SNE that guarantees the players would be strictly worse off by a unilateral switch to another strategy. In contrast, Theorems 5 and 7 specify exactly the conditions for an SNE, which, under additional assumptions, may be sampled to get a PSNE (Azrieli and Shmaya, 2013) if desired. It is known that classic FP fails to converge in some simple games that have a unique CMNE, e.g., Shapley (1964). Theorems 4 and 6 specify conditions that circumvent such negative results. Our stability conditions can be simplified further when λ is sufficiently large, whence the behavior may be understood solely in terms of γ.

7. Experiments

We now describe the results of our experiments on two real datasets, which provide insights into some important aspects of our games. We first show that our SAP model qualitatively recovers the known strategic behavior of the Justices from US Supreme Court data. We also provide quantitative evidence, in terms of two measures, (a) the fraction of correct edges recovered and (b) the coherence of groups in terms of cut size, that the SAP game outperforms prior methods. We then show that the structure estimated by the SAP game on the UN General Assembly data (Voeten et al., 2009) is meaningful, and helps unravel the subtle behavior of the member countries. Finally, we demonstrate that SAP games incur a lower cross-entropy loss than our baseline, and so can be effectively transferred to predict strategies in new settings with different sets of players.

Experimental setup. We implemented the models of sections 3 and 4, with L1 regularization for structure estimation as described in section 5. We found that SAP games performed well over a wide range of the hyperparameters α, λ, and k. We report the results with k = 5, α = 0.1, and λ = 0.1 for all our experiments, except the transferable setting, where we set α = 0.01 and λ = 0. We set ν to be the identity function in (4). We trained our models in batches of size 200, with the default settings of the RMSprop optimizer in PyTorch.
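A minimal training sketch under these settings follows; it reuses the unroll function sketched in section 3, and the placeholder batch stands in for a real data loader (both are our assumptions, not released code).

```python
import torch

# `unroll` is the k-step dynamics sketched in section 3.
n, d, n_actions = 9, 32, 3                       # e.g., the Supreme Court dimensions
theta = torch.randn(n, n_actions, d, requires_grad=True)
W = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.RMSprop([theta, W])            # default settings, as reported

loader = [(torch.randn(200, d),                  # placeholder batch of 200 contexts
           torch.randint(0, n_actions, (200, n)))]  # and observed action indices

for x_batch, y_batch in loader:
    sigma = torch.stack([unroll(theta, W, x, k=5, alpha=0.1) for x in x_batch])
    nll = -torch.log(sigma.gather(-1, y_batch.unsqueeze(-1)) + 1e-12).mean()
    loss = nll + 0.1 * W.abs().sum()             # lambda = 0.1, L1 penalty on influences
    loss.backward()
    opt.step(); opt.zero_grad()
    with torch.no_grad():
        W.fill_diagonal_(0.0)                    # keep W(i, i) = 0
```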
To account for the effect of randomness in neural training on structure estimation, we averaged the parameters of each model across 5 independent runs. We characterize the influence of the other players j on player i in terms of the ordering of the corresponding estimated average weight values, from most positive to most negative. We did not impose an L1 penalty in the transferable setting, since the interactions are learned for new individuals, i.e., they are not specific to any fixed set of players, unlike in the basic setting.

7.1. US Supreme Court Data

We included all the cases from the Rehnquist Court during the period 1994-2005 that had votes documented for all 9 Justices (data available at http://scdb.wustl.edu/data.php). Justices Rehnquist (R), Scalia (Sc), and Thomas (T) represented the conservative side; Justices Stevens (St), Souter (So), Ginsburg (G), and Breyer (B) formed the liberal bloc; and Justices Kennedy (K) and O'Connor (O), often called swing votes, followed a moderate ideology. Each Justice is treated as a player in our game. Our contexts comprise 32 binary attributes about the specifics of the appeal, e.g., the disposition of the lower court. Our observed outcome y^{(m)} for each context x^{(m)} pertains to the corresponding votes of the Justices. The votes belong to one of three categories: yes, no, or complex.

Structure recovery from Supreme Court data. Fig. 1 describes in detail how our method yields a structure that is qualitatively consistent with the known jurisprudence of the Rehnquist Court. Specifically, the conservatives and the liberals form two separate coherent, strongly-connected blocs that are well-segregated from each other.

Figure 1 ((a) Positive Influence, (b) Negative Influence): Supreme Court structure recovery with our SAP model. (a) and (b) show, respectively, the justices with the most positive and the most negative influence, quantified by ŵ_{ij}, on each Justice i. The estimated connections are consistent with the known jurisprudence of the Court. In particular, (a) shows coherence among the conservatives (red), coherence among the liberals (blue), and the separation of these ideologies from the moderates (green). Likewise, (b) shows that all the negative connections are between the blocs. Determining influences by heuristics such as ordering by pairwise vote agreements does not work; e.g., that would imply K had a strong positive influence on R, since R agreed with K more than with anyone else.

Comparison with previous works. In order to position our work with respect to prior work on structure recovery, we also quantitatively compare SAP games with the PSNE-based Potential Game (Garg and Jaakkola, 2016) and Local Aggregative Game (Garg and Jaakkola, 2017) methods. Fig. 2 provides a detailed quantitative comparison, which reveals that SAP compares favorably with these methods in terms of the size of the cut, i.e., the minimum number of edges that must be removed in order to decompose the recovered structure into the constituent blocs. Note that a low cut value indicates coherence within each component, and thus quantifies a good structure.

Figure 2 ((a) SAP (Ours), (b) Local AG, (c) Tree Structured Potential Game): (a), (b), and (c) show the structures estimated by SAP, the local aggregative game, and the tree structured potential game on the US Supreme Court data. We evaluate the different methods using the size of the cut, i.e., the minimum number of edges to be removed to decompose the structure into its three components (reds, blues, and greens). A low cut value quantifies high coherence within each component, and thus pertains to a good structure. The cut size for the SAP game (3) is much lower than for the other methods (6 each).
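The cut metric itself is straightforward to compute once the blocs are fixed; a small sketch (the edge-list and label encodings are our assumptions):

```python
def cut_size(edges, bloc):
    """Edges whose endpoints lie in different blocs; equivalently, the minimum
    number of edge removals needed to separate the recovered structure into
    its ideological components."""
    return sum(1 for i, j in edges if bloc[i] != bloc[j])

bloc = {"R": "red", "Sc": "red", "T": "red", "K": "green", "O": "green",
        "St": "blue", "So": "blue", "G": "blue", "B": "blue"}
print(cut_size([("R", "Sc"), ("K", "G"), ("St", "B")], bloc))  # -> 1
```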
7.2. United Nations General Assembly (UNGA) Data

Our second dataset consists of the roll call votes of the member countries on the resolutions considered in the UN General Assembly. Each resolution is a textual description that provides the context, while the votes of the countries on the resolution pertain to the observed outcome. We compiled data on all resolutions in the UNGA during 1992-2017 to understand the interactions of the member nations since the dissolution of the Soviet Union. Besides the USA, we considered 25 countries that have dominated UN politics and are generally known to belong to one of two blocs: pro-USA, namely, Australia (AUS), Canada (CAN), France (FRA), Germany (GER), Israel (ISR), Italy (ITA), Japan (JPN), South Korea (KOR), Norway (NOR), and Ukraine (UKR); and others, namely, Afghanistan (AFG), Belarus (BLR), China (CHN), Cuba (CUB), Iran (IRN), Iraq (IRQ), Mexico (MEX), Pakistan (PAK), Philippines (PHL), North Korea (PRK), Kazakhstan (KAZ), Russia (RUS), Syria (SYR), Venezuela (VEN), and Vietnam (VNM). Each country is viewed as a player in our game. We used pretrained GloVe embeddings to represent each resolution as a 50-dimensional context vector x^{(m)}, obtained by taking the mean embedding of the words in the resolution. Each vote was interpreted to take one of three values: 1 (yes), 2 (absent/abstain), or 3 (no), and we represented y^{(m)} as a 26-dimensional vector.
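A small sketch of this featurization; the whitespace tokenization and the glove word-to-vector dictionary are our assumptions:

```python
import numpy as np

def embed_resolution(text, glove, d=50):
    """Mean of pretrained 50-d GloVe vectors over a resolution's words."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(d)

VOTE = {"yes": 1, "absent": 2, "abstain": 2, "no": 3}   # vote encoding used above
```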
Structure and type recovery from UNGA data. Fig. 3 shows the structure estimated by our method, i.e., the weights ŵ_i learned for each country i. The weights ŵ_{ij}, j ≠ i, have a natural interpretation in terms of influence: the more positive w_{ij} is, the more positive the influence of j on i. A similar connotation holds for the negative weights. To aid visualization, we disentangle the positive and the negative connections, and depict only the most influential connections in either case. Fig. 3 describes how our method estimated a meaningful structure, and unraveled subtle influences beyond the prominent two-bloc structure. The learned type parameters θ̂_i were also found to be similar for members of the same bloc (Fig. 4).

Figure 3 ((a) Positive influence, (b) Negative influence): UNGA structure recovery with SAP games. Incoming arrows show (a) the 2 countries with the most positive influence (black edges), and (b) the 2 countries with the most negative influence (magenta edges), quantified by ŵ_{ij}, on each country i. The estimated links are largely consistent with the expected alignments. In particular, (a) shows that the two blocs (in yellow and green) are well segregated from each other. More interesting alignments are also revealed, e.g., (1) strong affinity between the NATO members on one side, and Syria, Iran, Venezuela, etc. on the other; (2) links from Germany and Korea to Russia, hinting at trade influence despite their differences; and (3) the geographical influence of Russia and Ukraine on Belarus. Additionally, (b) reveals that a significant fraction of the negative connections emanate from or end at Israel and the USA on one side, and some yellow node on the other.

Figure 4 ((a) Type similarity, (b) Transfer performance): (Left: UNGA type recovery) Incoming arrows show the 2 countries with the highest cosine similarity θ̂_i^⊤θ̂_j / (‖θ̂_i‖ ‖θ̂_j‖) for each country i. The type vectors were reasonably well aligned for members of the same bloc. (Right: transfer performance) SAP models were more effective in predicting strategies for new players, as quantified by a lower transfer loss than that incurred by the baseline strategy.

Prediction and transfer results. Our final set of experiments focused on transferring SAP games. Besides the UNGA data, we compiled a 73-dimensional feature vector for each country from its HDI indicators (UN2, 2007), which span various spheres including health, education, trade, and socio-economic sustainability. We kept aside three-fourths of the data to set up games over small sets of players, by randomly sampling 5 countries independently for each context. The left-out players from these contexts were then sampled independently to get another set of 5 countries per context. We call these two sets A and B, respectively. We formed a third set C of 5 countries per game from the untouched data (i.e., the one-fourth fraction). Thus, A and B were defined on the same contexts, which were disjoint from those of C. We averaged our results over 10 such independent triplets (A, B, C) to mitigate sampling effects.

We trained a model for A using the procedure in section 4, and computed the loss on B. Our baseline, train-empirical, used the empirical distribution of actions for each player i over the games it participated in within A as its predicted strategy for the games in B and C. The cross-entropy loss of SAP games (0.852) turned out to be lower than the baseline's (0.865) on C. Moreover, as Fig. 4 shows, the loss of SAP games (0.696) was significantly lower than the baseline's (0.720) on B, even though HDI does not fully reflect complex country characteristics. Thus, our results clearly underscore the benefits of using SAP games for strategic prediction.

8. Conclusion

We extended structured prediction to strategic settings, and specified conditions under which our games are identifiable from data, and under which strategies converge to an equilibrium. Empirical results demonstrate the effectiveness of our models in uncovering meaningful strategic interactions from data, predicting player strategies in new contexts, and transferring knowledge to predict strategies for new players. We side-stepped the issue of the rate of convergence to an equilibrium in our analysis. Since there might be multiple mixed strategy equilibria in a game, the convergence rate to an equilibrium depends on the proximity of the initial joint strategy profile, and on the spectrum of the Jacobian matrix resulting from the linearization of the associated ODE.

Acknowledgments

TJ and VG acknowledge support from ONR and the MIT-IBM collaboration.

References

Human Development Reports. Available at http://hdr.undp.org/en/countries/profiles/. United Nations Development Programme, 2007.

Y. Azrieli and E. Shmaya. Lipschitz games. Mathematics of Operations Research, 38(2):350-357, 2013.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44:328-348, 2005.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673-695, 2006.

V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.

G. W. Brown. Iterative solutions of games by fictitious play. In Activity Analysis of Production and Allocation, 1951.

L.-C. Chen, A. Schwing, A. Yuille, and R. Urtasun. Learning deep structured models. In International Conference on Machine Learning (ICML), 2015.

L. Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 158(1-2):575-585, 2016.

C. Cortes, V. Kuznetsov, M. Mohri, and S. Yang. Structured prediction theory based on factor graph complexity. In Neural Information Processing Systems (NIPS), 2016.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In International Conference on Machine Learning (ICML), 2008.

D. Fudenberg and D. M. Kreps. Learning mixed equilibria. Games and Economic Behavior, 5(3):320-367, 1993.

V. K. Garg and T. Jaakkola. Learning tree structured potential games. In Neural Information Processing Systems (NIPS), 2016.

V. K. Garg and T. Jaakkola. Local aggregative games. In Neural Information Processing Systems (NIPS), 2017.

A. Ghoshal and J. Honorio. Learning graphical games from behavioral data: Sufficient and necessary conditions. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

A. Globerson, T. Roughgarden, D. Sontag, and C. Yildirim. How hard is inference for structured prediction? In International Conference on Machine Learning (ICML), 2015.
J. C. Harsanyi. Games with incomplete information played by Bayesian players (part I, the basic model). Management Science, 14(3):159-182, 1967.

J. Honorio and L. Ortiz. Learning the structure and parameters of large-population graphical games from behavioral data. Journal of Machine Learning Research (JMLR), 16:1157-1210, 2015.

M. T. Irfan and L. E. Ortiz. On influence, stable behavior, and the most influential individuals in networks: A game-theoretic approach. Artificial Intelligence, 215:79-119, 2014.

A. X. Jiang and K. Leyton-Brown. Bayesian action-graph games. In Neural Information Processing Systems (NIPS), 2010.

E. Kalai. Large robust games. Econometrica, 72(6):1631-1665, 2004.

M. Kearns and Y. Mansour. Efficient Nash computation in large population games with bounded influence. In Uncertainty in Artificial Intelligence (UAI), 2002.

M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Uncertainty in Artificial Intelligence (UAI), 2001.

J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML), 2001.

B. London, B. Huang, and L. Getoor. Stability and generalization in structured prediction. Journal of Machine Learning Research (JMLR), 2016.

J. Nash. Non-cooperative games. Annals of Mathematics, 54(2):286-295, 1951.

S. Nowozin and C. H. Lampert. Structured learning and prediction in computer vision, 2011.

A. Osokin, F. Bach, and S. Lacoste-Julien. On structured prediction theory with calibrated convex surrogate losses. In Neural Information Processing Systems (NIPS), 2017.

X. Pan and V. Srikumar. Learning to speed up structured output prediction. In International Conference on Machine Learning (ICML), pages 3996-4005, 2018.

P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287-1319, 2010.

J. Robinson. An iterative method of solving a game. Annals of Mathematics, 54(2), 1951.

J. S. Shamma and G. Arslan. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria. IEEE Transactions on Automatic Control, 50(3):312-327, 2005.

L. S. Shapley. Some topics in two-person games. Advances in Game Theory, pages 1-29, 1964.

E. Tardos and V. Vazirani. Introduction to game theory: Basic solution concepts and computational issues, 2007.

B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Neural Information Processing Systems (NIPS), 2003.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6(2):1453-1484, 2005.

E. Voeten, A. Strezhnev, and M. Bailey. United Nations General Assembly voting data, 2009. URL https://hdl.handle.net/1902.1/12379.

M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183-2202, 2009.

K. Waugh, B. D. Ziebart, and J. A. Bagnell. Computational rationalization: The inverse equilibrium problem. In International Conference on Machine Learning (ICML), 2011.