# Predicting Deliberative Outcomes

Vikas K. Garg (CSAIL, MIT) · Tommi Jaakkola (CSAIL, MIT). Correspondence to: Vikas Garg, Tommi Jaakkola.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

We extend structured prediction to deliberative outcomes. Specifically, we learn parameterized games that can map any inputs to equilibria as the outcomes. Standard structured prediction models rely heavily on global scoring functions and are therefore unable to model individual player preferences or how they respond to others asymmetrically. Our games take as input a context, e.g., a UN resolution to be voted on, and map such contexts to initial strategies, player utilities, and interactions. Players are then assumed to repeatedly update their strategies in response to weighted aggregates of other players' choices, towards maximizing their individual utilities. The output of the game is a sample from the resulting (near) equilibrium mixed strategy profile. We characterize conditions under which players' strategies converge to an equilibrium in such games, and when the game parameters can be provably recovered from observations. Empirically, we demonstrate on two real voting datasets that our games can recover interpretable strategic interactions, and predict strategies for players in new settings.

1. Introduction

Structured prediction methods (Lafferty et al., 2001; Taskar et al., 2003; Tsochantaridis et al., 2005; Nowozin and Lampert, 2011) typically operate on parametric scoring functions whose maximizing assignment is used as the predicted configuration. Since the parameters can be learned directly to maximize prediction accuracy, often via surrogate losses, these methods have been successful across areas (Chen et al., 2015; Globerson et al., 2015; London et al., 2016; Cortes et al., 2016; Osokin et al., 2017; Pan and Srikumar, 2018).

Not all structured observations can be naturally modeled as extrema of scoring functions. For instance, votes cast in response to a bill in the US Congress likely involve rounds of negotiations prior to casting the final votes. As a result, the votes reflect the individual utilities of representatives as well as their interactions with subsets of others. Absent some unifying scoring function, contentious votes are more naturally modeled in game-theoretic terms as an equilibrium rather than a maximizing assignment (Waugh et al., 2011; Garg and Jaakkola, 2017). For example, one could consider the voting outcome directly as a pure strategy Nash equilibrium (PSNE) of some fixed underlying graphical game (Irfan and Ortiz, 2014; Honorio and Ortiz, 2015; Garg and Jaakkola, 2016; 2017). The observed votes would in this case directly be actions in the game, and complete observations help estimation. However, most games do not admit pure strategy equilibria (Tardos and Vazirani, 2007). The voting outcome is therefore best viewed as a sample from a product distribution that represents a mixed strategy equilibrium (guaranteed to exist for any game).

Our goal is to use games in a predictive sense. We must therefore explicitly model the game dynamics, i.e., how an equilibrium outcome is reached from the initial conditions. Previous methods for estimating game parameters from data often side-step the issue of game dynamics. It is therefore not possible in such approaches to directly output an equilibrium as the predicted outcome.
For games to be useful in structured prediction, the game parameters, as well as the players' initial strategies prior to deliberation, must also be conditioned on a common context such as a bill to be voted on.

Strategic prediction. In this paper, we turn games into structured prediction methods and call such approaches strategic prediction methods. At a high level, our model takes the available context, such as the written bill, as the input and maps it to a mixed strategy equilibrium as the output. The input-output mapping is obtained via a deliberative process akin to multiple rounds of negotiations between the countries. The observed actions by players, e.g., how countries voted on the bill, are then viewed as samples from this mixed strategy equilibrium. We model the impact of context by parametrically mapping it to each player as player types. These types are simply learned vectors quantifying what each player derives from the context so as to guide their behavior in the game. The types also specify the utility functions and initial strategies for players.

A key part of strategic prediction pertains to how players interact with each other. In this paper, we limit our focus to so-called aggregative games, where the players respond to weighted aggregates of other players' actions or strategies. The interactions are weighted and asymmetric, and the weights can be positive or negative, representing cooperative or competing relationships between the players. We follow a best-response game dynamics to evolve the players' initial strategies towards the predicted output, a (near) mixed strategy equilibrium. We call our parametrized games strategic aggregative prediction (SAP) models. Our games can be viewed as conditional versions of directed graphical games (Kearns et al., 2001; Kearns and Mansour, 2002), restricted to a rich subclass of aggregative games that subsumes Cournot oligopoly, mean field, public goods, and population games (Garg and Jaakkola, 2017). These games shield each player from specific information about any neighbor, since players respond to an aggregate (weighted sum) of their neighbors' strategies.

Modeling game dynamics. A key novelty of our approach is to explicitly incorporate game dynamics. In our implementation, we follow a k-step best response dynamics, seeded with predicted initial strategies. As we operate on continuous strategies and our best response updates are differentiable, we can back-propagate through the k-step strategy updates, and thus learn parameters efficiently, unlike, e.g., Garg and Jaakkola (2016; 2017). Since game dynamics plays a critical role in the predictive use of games, we also introduce and provide a deeper analysis of more general dynamics and types of aggregation strategies. Our work yields exact conditions under which strategies converge to different types of equilibria.

Identifiability. We provide identifiability guarantees for the game parameters. For the analysis, we adopt a simpler one-shot setting, where the observed outcome is sampled from player strategies after just one round of communication instead of following k steps. We characterize conditions under which the one-shot interaction weights are identifiable, in the sense that we can recover the neighbors of any player as well as the correct sign (positive or negative) of their interaction.
Prior guarantees exist for the recovery of equilibria, not game parameters, showing conditions for the recovery of a set of pure strategy equilibria under various noise assumptions (Ghoshal and Honorio, 2017).

The rest of this paper is structured as follows. The basic setup is introduced in section 2. We describe parameter estimation in section 3, and the extension to new players in section 4. Identifiability is discussed in section 5, and convergence of dynamics in section 6. We present detailed experiments on two real datasets to illustrate the benefits of our approach in section 7. We defer the details of our proofs to the Supplementary.

2. Basic strategic prediction model

We first introduce our basic strategic prediction model. To this end, we need to define several components of the model. These include (a) the graphical layout of the game, and how players influence each other; (b) the player types and how these are derived from the context; (c) individual utilities for the players; (d) initial strategies for the players before witnessing the play of others; and (e) the game dynamics, i.e., how players respond to others. Later, in the transferable setting, we will no longer individuate players through their identities, but instead introduce feature vectors to predict the strategies of new players in new contexts.

Let G = (V, E) be a connected digraph such that vertex i identifies player i ∈ [n] ≜ {1, 2, …, n}, n = |V|. Let A be the finite discrete set of actions, the same for all the players. Each player i plays a randomized (mixed) strategy, i.e., a distribution over actions σ_i ∈ Δ(A), from the simplex

Δ(A) ≜ { σ_i : Σ_{a_i ∈ A} σ_i(a_i) = 1, σ_i(a_i) ≥ 0 ∀ a_i ∈ A }.

We will denote a joint strategy profile of all the players by (σ_i, σ_{−i}) to emphasize the distinction between player i and the other players.

We model the influence of players on others through a weighted aggregation of the strategies. The weight w_{ij} ∈ ℝ denotes the strength of influence of player j on player i. We will call the players j ∈ [n] \ {i} that have w_{ij} ≠ 0 the neighbors of player i. We define a weight matrix W ∈ ℝ^{n×n} such that W(i, i) = 0 and W(i, j) = w_{ij}. Player i communicates with the other players only through an aggregator A_i that maps the strategies of the other players, i.e., σ_{−i}, to the weighted sum Σ_{j≠i} w_{ij} σ_j, the effective influence of the others.

The context influences the game through the types. The type of a player is her valuation that quantifies what she derives from the context. The mapping from context to types could be defined in multiple ways. For simplicity, we parameterize the private type z_i of each player i via a matrix θ_i that maps any context x ∈ X ⊆ ℝ^d to the vector z_i(x) = θ_i x. Henceforth, we will keep the dependence on context implicit for simplicity, and write z_i(x) as z_i when the context is clear.

We model the utility or payoff of player i ∈ [n], of type z_i, under strategy profile (σ_i, σ_{−i}) as

U_i(σ_i, σ_{−i}, z_i) = σ_i^⊤ ( A_i(σ_{−i}) − z_i ) + τ H(σ_i),   (1)

where H(σ_i) is the entropy associated with σ_i and τ ≥ 0. The entropy encourages completely mixed strategy choices, in the interior of the simplex Δ(A). The type of a player is by design hidden from the other players. Thus, by subtracting the player type, the payoff function mitigates the (indirect) role of the player's valuation in guiding the aggregate response of the others. Our games are flexible enough to accommodate a wide variety of other payoff functions. Our choice of payoff function (1) maintains a linear dependence on the strategies of others for two reasons. First, our games are related to linear influence games that have been studied previously, e.g., as models of how people decide to vaccinate against a disease, install antivirus software, or get home insurance (Irfan and Ortiz, 2014; Honorio and Ortiz, 2015; Ghoshal and Honorio, 2017); we extend these games by allowing multi-way actions and types. Second, the linear dependence also helps establish provable recovery guarantees for the game parameters.
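To make (1) concrete, the following is a minimal NumPy sketch of the aggregator and the entropy-regularized payoff; the function names and the small constant guarding log(0) are ours, not part of the model.

```python
import numpy as np

def aggregate(W, sigma, i):
    """Effective influence A_i(sigma_{-i}) = sum_{j != i} w_ij * sigma_j.

    W     : (n, n) influence matrix with zero diagonal, W(i, i) = 0.
    sigma : (n, |A|) mixed strategies, one row per player.
    """
    return W[i] @ sigma  # player i's own row drops out since W[i, i] = 0

def utility(W, sigma, z, i, tau=0.1):
    """Payoff (1): U_i = sigma_i^T (A_i(sigma_{-i}) - z_i) + tau * H(sigma_i)."""
    s_i = sigma[i]
    entropy = -np.sum(s_i * np.log(s_i + 1e-12))  # H(sigma_i), guarded against log(0)
    return s_i @ (aggregate(W, sigma, i) - z[i]) + tau * entropy
```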
The utilities naturally define the best response of player i when it observes the aggregate input A_i(σ_{−i}):

β^τ_i( A_i(σ_{−i}), z_i ) ∈ argmax_{σ_i ∈ Δ(A)} U_i(σ_i, σ_{−i}, z_i).   (2)

We say that (σ*_i, σ*_{−i}) forms a mixed strategy Nash equilibrium (MSNE), or simply NE, for types (z_i, z_{−i}) iff

U_i(σ*_i, σ*_{−i}, z_i) ≥ U_i(σ_i, σ*_{−i}, z_i)  ∀ i ∈ [n], σ_i ∈ Δ(A).   (3)

It is easy to show that our game has at least one MSNE. For τ > 0, our best response function is continuous, and maps each strategy profile from the product simplex (a compact space) to a unique point in itself. A Nash equilibrium is therefore guaranteed to exist by Brouwer's fixed point theorem. For τ = 0, the mapping may be set-valued, but a Nash equilibrium is still guaranteed to exist by Nash's theorem (Nash, 1951) via Kakutani's fixed point argument. We say that an MSNE is strict (SNE) when (3) is strict for all i ∈ [n], σ_i ∈ Δ(A) \ {σ*_i}; completely mixed (CMNE) when σ*_i(a_i) > 0 for all i ∈ [n], a_i ∈ A; and pure (PSNE) when for all i ∈ [n] there exists an a_i ∈ A such that σ*_i(a_i) = 1.

It remains to specify how an equilibrium is reached, i.e., the game dynamics. To begin with, players observe the context x and evaluate their types z_i. The types, in turn, give rise to initial strategies σ⁰_i = ψ(z_i), where ψ : T_i → Δ(A) (e.g., softmax). The best response dynamics from this point on depends on the details of the aggregator and whether the dynamics is defined over strategies or actions directly. We provide details on general game dynamics for two different aggregators, along with associated convergence guarantees, in section 6. In our empirical analysis, we adopt a k-step dynamics as described in the following section. Once a (near) equilibrium is reached, a sample action profile y ∈ Y ≜ A^n is drawn from the player strategies. We emphasize that our setting dispenses with the restrictive assumption made by Bayesian games (Harsanyi, 1967; Kalai, 2004; Jiang and Leyton-Brown, 2010) that the conditional distribution P(z_{−i} | z_i) is known to player i.

3. Parameter estimation

We learn our games from data as structured prediction methods. Specifically, given a dataset D = {(x^{(m)}, y^{(m)}) ∈ X × Y : m ∈ [M]} linking contexts to sampled action profiles, our objective is to estimate the type parameters θ_i and the influences of neighbors w_i ≜ (w_{ij})_{j≠i}, i ∈ [n]. Each pair (x^{(m)}, y^{(m)}) is treated as follows. A linear transformation θ̂ = (θ̂_1, …, θ̂_n) maps the context x^{(m)} to the types ẑ(x^{(m)}) ≜ (ẑ^{(m)}_1, …, ẑ^{(m)}_n), which yield initial strategies σ̂⁰_i(x^{(m)}) = ζ(ẑ^{(m)}_i) for the players i ∈ [n], where ζ is the softmax nonlinearity. The aggregators Â_i evaluate weighted sums, and are parametrized by weights ŵ_i = (ŵ_{ij})_{j≠i}. A sequence of k update steps

σ̂^{t+1}_i(x^{(m)}) = ζ( ν(σ̂^t_i) + α( Â_i(σ̂^t_{−i}) − ẑ^{(m)}_i ) ),  t ∈ {0, 1, …, k−1},   (4)

is then followed: k and α are hyperparameters, and ν defines the type of update. Several choices of ν are possible; e.g., ν(σ̂^t_i) = 0 pertains to the best response β^{1/α}_i( Â_i(σ̂^t_{−i}), ẑ^{(m)}_i ) defined in (2); the identity mapping ν(σ̂^t_i) = σ̂^t_i defines a gradient step; and ν(σ̂^t_i) = log σ̂^t_i corresponds to a proximal update based on the KL-divergence, where we define σ̂^{t+1}_i as

argmax_{σ_i} σ_i^⊤( Â_i(σ̂^t_{−i}) − ẑ^{(m)}_i ) − (1/α) KL(σ_i ∥ σ̂^t_i).

Note that we can use functions other than softmax for our strategy updates. For instance, we can take a gradient step and then project onto the probability simplex. One attractive feature of softmax is that it is differentiable and its gradient can be computed in closed form, so it can readily be backpropagated through in a neural model. On the other hand, projection onto the probability simplex typically requires specialized optimization algorithms such as those of Duchi et al. (2008) or Condat (2016).
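The following is a minimal PyTorch sketch of the unrolled k-step dynamics (4) in the basic fixed-player setting, with ν as a pluggable choice; the function and argument names are ours. Because every operation is differentiable, the loss described next can be backpropagated through all k steps.

```python
import torch
import torch.nn.functional as F

def unroll(theta, W, x, k=5, alpha=0.1, nu="identity"):
    """k-step strategy updates of Eq. (4), differentiable end to end.

    theta : (n, |A|, d) per-player type maps, z_i = theta_i @ x.
    W     : (n, n) influence weights, zero diagonal.
    x     : (d,) context vector.
    """
    z = torch.einsum("nad,d->na", theta, x)       # types z_i(x) = theta_i x
    sigma = F.softmax(z, dim=-1)                  # initial strategies sigma^0_i
    for _ in range(k):
        prev = {"zero": torch.zeros_like(sigma),  # best response beta^{1/alpha}_i
                "identity": sigma,                # gradient-style step
                "log": torch.log(sigma)}[nu]      # KL-proximal update
        sigma = F.softmax(prev + alpha * (W @ sigma - z), dim=-1)
    return sigma                                  # (near) equilibrium strategies
```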
Our estimation criterion for the game is to minimize the expected cross-entropy loss E[ℓ(σ̂^k(x^{(m)}), y^{(m)})] between the predicted mixed strategies and the observed profiles, where the expectation is with respect to the empirical distribution over pairs (x^{(m)}, y^{(m)}), ℓ is the cross-entropy loss, and σ̂^k(x^{(m)}) ≜ (σ̂^k_i(x^{(m)}))_{i∈[n]}. We use standard backpropagation to evaluate gradients through the k-step strategy updates efficiently. Note that since we do not assume that the observed configurations y^{(m)} correspond to pure strategy Nash equilibria, we naturally avoid having to enforce specific consistency constraints. Instead, we can measure a degree of agreement that is amenable to more complex parameterizations such as neural networks. Our estimation procedure also scales better than methods that rely on PSNEs, e.g., (Garg and Jaakkola, 2016; 2017): the overall dependence of our approach is linear in the number of observed game outcomes and the number of update steps, and quadratic in the number of players.

4. Transferable strategic prediction

Here we generalize SAP models to permit different players from one game to another. Unlike in section 2, we can no longer assume a fixed interaction structure across games. Instead, the neighbor influences are determined by the context and player feature vectors. Specifically, we construct a feature vector b_i ∈ B for each player i ∈ [n]. Such information is often available, e.g., the education and gender of judges, or the human development indicators of countries. This enables us to predict the behavior of new players in new contexts.

Each game is played with a different subset of players I ⊆ [n], and is unrolled as follows. A context x ∈ X is mapped to latent (more general) player types using a parametric function f_z : X × B → Z, taking each pair (x, b_v), v ∈ I, as input, and mapping it to z_{x,v} ∈ Z. These types define initial strategies as before, σ⁰_{x,v} = φ(Γ z_{x,v}), where Γ is a transformation matrix that yields a vector in ℝ^{|A|}, and φ (e.g., softmax) maps the result to a distribution in the simplex Δ(A). Unlike before, the (asymmetric) influences between players are now calculated parametrically from the types, w_{x,v,v′} = f_w(z_{x,v}, z_{x,v′}), using a parametric mapping f_w : Z × Z → ℝ. Each player v still responds to the other players v′ ∈ I \ {v} through its aggregator

A_{x,v,I}(σ_{x,−v}) ≜ Σ_{v′∈I : v′≠v} w_{x,v,v′} σ_{x,v′}.

We can extend the definition of the payoffs slightly to incorporate the more general types:

U_{v,I}(σ_{x,v}, σ_{x,−v}, z_{x,v}) = σ_{x,v}^⊤ ( A_{x,v,I}(σ_{x,−v}) − Γ z_{x,v} ),

where Γ is an additional parameter matrix to be learned. The game dynamics dictates the course of play in the same fashion as in the basic strategic setting. We learn the model, now parameterized by f_z, f_w, and Γ, by minimizing the loss between the predicted k-step strategies and the observed action profiles.
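A minimal sketch of the transferable model follows; the specific architectures for f_z (a small feed-forward network) and f_w (a bilinear form) are our illustrative assumptions, since the parametric maps are left abstract above.

```python
import torch
import torch.nn as nn

class TransferableSAP(nn.Module):
    """Sketch of the transferable model: f_z maps (context, player features)
    to a type, f_w maps an ordered pair of types to an influence weight, and
    Gamma maps types to R^{|A|}. Class and layer choices are ours."""

    def __init__(self, d_ctx, d_feat, d_type, n_actions):
        super().__init__()
        self.f_z = nn.Sequential(nn.Linear(d_ctx + d_feat, 64), nn.Tanh(),
                                 nn.Linear(64, d_type))
        self.f_w = nn.Bilinear(d_type, d_type, 1)
        self.Gamma = nn.Linear(d_type, n_actions, bias=False)

    def forward(self, x, B, k=5, alpha=0.01):
        # x: (d_ctx,) context; B: (p, d_feat) features of this game's p players.
        p = B.size(0)
        z = self.f_z(torch.cat([x.expand(p, -1), B], dim=-1))      # types z_{x,v}
        zi = z.unsqueeze(1).expand(p, p, -1).reshape(p * p, -1)    # all ordered pairs
        zj = z.unsqueeze(0).expand(p, p, -1).reshape(p * p, -1)
        W = self.f_w(zi, zj).view(p, p)                            # w_{x,v,v'}
        W = W * (1.0 - torch.eye(p))                               # zero self-influence
        gz = self.Gamma(z)                                         # Gamma z_{x,v}
        sigma = torch.softmax(gz, dim=-1)                          # initial strategies
        for _ in range(k):                                         # k-step dynamics, as in (4)
            sigma = torch.softmax(sigma + alpha * (W @ sigma - gz), dim=-1)
        return sigma
```

Any differentiable choices for f_z and f_w would do; the bilinear f_w simply makes the asymmetry of w_{x,v,v′} explicit.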
5. Identifiability of the games

We now characterize the conditions under which the strategic interactions in our games become identifiable. We focus on the one-shot setting (i.e., k = 1) with the gradient update, so that the observed outcome is sampled from player strategies after one round of communication. We also use binary actions to simplify the exposition. Specifically, we estimate from data D the support S_i, i.e., the set of neighbors of i, defined with respect to the unknown influences w*_{ij} ≠ 0. We also recover the correct sign of these influences.

Our recovery procedure is a novel adaptation of the primal-dual witness method (Wainwright, 2009) for structure estimation in games. The method has previously been applied in several non-strategic settings such as the Lasso (Wainwright, 2009) and Ising models (Ravikumar et al., 2010). In contrast to the Lasso and Ising models, our setting poses some new challenges. Unlike those models, the nodes of our games are players that actively communicate with others, and refine their strategies toward maximizing their individual utilities. Moreover, context and dynamics play no part in Ising models, whereas in our setting each outcome is sampled from a separate joint strategy profile following one step of dynamics initiated under a different context. Ghoshal and Honorio (2017) employed the witness method to recover the entire set of PSNE from data consisting of a subset of PSNE and a small fraction of non-equilibrium outcomes assumed to be sampled under their noise models, in the setting of linear influence games. However, the problem of structure recovery is significantly harder: it is known (Honorio and Ortiz, 2015; Ghoshal and Honorio, 2017) that the problem becomes non-identifiable in the setting of PSNE, since multiple game structures may give rise to the same set of PSNE. We leverage the dynamics to circumvent these issues, and characterize the conditions under which we can provably recover the structure of our games. Our analysis on recovering the influences makes the simplifying assumption that the type parameters θ are known.

We denote by φ^{(m)}_j the probability assigned by the initial strategy σ⁰_j(x^{(m)}) to action 1 for player j on example m. Let ℓ_i(w_i; D) be the average cross-entropy loss between one-step strategies under candidate weights w_i ≜ (w_{ij})_{j≠i} and the observed actions for player i, and let λ > 0 be a regularization parameter. We solve the following problem for each player i ∈ [n]:

argmin_{w_i ∈ ℝ^{n−1}} ℓ_i(w_i; D) + λ ‖w_i‖₁.   (5)
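A minimal sketch of (5) for one player follows, solved by proximal gradient descent (ISTA). We treat the one-step probability of action 1 as logistic in w_i, collecting the known type and initial-strategy terms in an offset b and eliding the exact feature encoding; the optimizer, step sizes, and support threshold tol are our assumptions.

```python
import torch
import torch.nn.functional as F

def recover_neighbors(Phi0, b, Y, i, alpha=0.1, lam=0.1, lr=0.05, steps=2000, tol=1e-3):
    """Sketch of the per-player problem (5) for binary actions.

    Phi0 : (M, n) initial probabilities phi_j^{(m)} of action 1.
    b    : (M,) known per-example offset from sigma^0_i and z_i (types known).
    Y    : (M,) observed binary actions of player i.
    """
    M, n = Phi0.shape
    w = torch.zeros(n, requires_grad=True)
    mask = torch.ones(n); mask[i] = 0.0                  # exclude self-influence
    for _ in range(steps):
        logits = b + alpha * (Phi0 @ (w * mask))         # one-step strategy, action 1
        loss = F.binary_cross_entropy_with_logits(logits, Y.float())
        loss.backward()
        with torch.no_grad():                            # gradient + soft-threshold step
            w -= lr * w.grad
            w.copy_(torch.sign(w) * torch.clamp(w.abs() - lr * lam, min=0.0))
            w.grad.zero_()
    S_hat = {j: int(torch.sign(w[j])) for j in range(n)  # signed neighborhood estimate
             if j != i and abs(w[j]) > tol}
    return w.detach(), S_hat
```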
Let H^M_i be the sample Hessian ∇²ℓ_i(w_i; D) under w_i, and let H^{*M}_i pertain to the true weights w*_i. Let Λ_min(·) and Λ_max(·) denote the minimum and the maximum eigenvalues. The following assumptions serve as our analogues of the conditions for support recovery in the Lasso (Wainwright, 2009) and Ising models (Ravikumar et al., 2010):

Λ_min( H^{*M}_{i,SS} ) ≥ α² C_min,   (6)

Λ_max( (1/M) Σ_{m=1}^{M} Φ^{(m)}_i (Φ^{(m)}_i)^⊤ ) ≤ C_max,   (7)

||| H^{*M}_{i,SᶜS} ( H^{*M}_{i,SS} )^{−1} |||_∞ ≤ 1 − γ,   (8)

where Φ^{(m)}_i ≜ (φ^{(m)}_j)_{j≠i}, C_min > 0, C_max < ∞, γ ∈ (0, 1]; |||A|||_∞ is the maximum over the L1-norms of the rows of A; H^{*M}_{i,SS} is the submatrix obtained by restricting H^{*M}_i to the rows and columns corresponding to neighbors, i.e., players in S_i; and H^{*M}_{i,SᶜS} is restricted to the rows pertaining to Sᶜ_i (non-neighbors) and the columns to S_i. Let the number of neighbors of any player be at most d ≤ n − 1. We have the following result.

Theorem 1. Let M > (80² C²_max / (C⁴_min γ⁴)) d² log(n). Suppose the sample satisfies assumptions (6), (7), and (8). Define C_{α,γ} = γ² / (32 α² (2 − γ)²), and let λ_M denote the regularization parameter in (5) for sample size M. Consider any player i ∈ [n]. The following results hold for i with probability at least 1 − 2 exp(−C_{α,γ} λ²_M M), which tends to 1 as M → ∞.

1. The corresponding L1-regularized optimization problem has a unique solution, i.e., a unique set of neighbors for i.

2. The set of predicted neighbors of i is a subset of the true neighbors. Additionally, the predicted set contains all true neighbors j for which |w*_{ij}| ≥ (10 / (α² C_min)) √d λ_M.

In particular, the set of true neighbors of i is exactly recovered if min_{j∈S_i} |w*_{ij}| ≥ (10 / (α² C_min)) √d λ_M.

Note that by taking a union bound over the players, our results imply that we recover the true signed neighborhoods for all players in our game with high probability. Our analysis can possibly be extended to include unknown type parameters θ by computing the gradient and the Hessian of the loss with respect to θ, similarly to the influences. However, additional assumptions may be needed to establish recovery.

6. General game dynamics and convergence

We now give an overview of general game dynamics along with the associated convergence guarantees. The aggregator acts as a privacy-preserving component, hiding specific neighbor actions or strategies and offering only aggregate statistics. We design dynamics under two different kinds of feedback from the aggregator. In our active aggregator (AA) setting, the players get a prediction about the anticipated aggregate of their neighbors. In contrast, a passive aggregator (PA) only provides the aggregate of the empirical frequencies used by the neighbors, and changes in the aggregate are estimated by the player. Intuitively, AA reveals more information about the evolution of the neighbors' strategies.

Note that the payoff received by player i when she samples a pure action a_i ∈ A according to σ_i (denoted a_i ∼ σ_i), and the others sample actions a_{−i} according to σ_{−i}, is

U_i(e_{a_i}, e_{a_{−i}}, z_i) = e_{a_i}^⊤ ( A_i(e_{a_{−i}}) − z_i ) + τ H(σ_i),

where e_{a_i} ∈ Δ(A) is the strategy corresponding to the pure action a_i, and e_{a_{−i}} ≜ {e_{a_j} : j ≠ i}.

We devise two new protocols that may be viewed as derivative action adaptations (Shamma and Arslan, 2005) of smooth fictitious play (FP) and gradient play (GP) (Brown, 1951; Robinson, 1951; Fudenberg and Kreps, 1993; Shapley, 1964) in an aggregative game setting. The protocols differ in how players respond to the (predicted) aggregate: one can play the best response or adapt the strategy via a gradient update. Formally, in our protocols, player i samples an action a^k_i ∼ σ^k_i at time k > 0 based on

q^k_i = q^{k−1}_i + ( e_{a^{k−1}_i} − q^{k−1}_i ) / k ∈ Δ(A),
σ^k_i = g_i( A_i( h_{−i}(q^k_{−i}) ), z_i ),   (9)

where q^k_i is the empirical frequency of the actions played by i till time k, and g_i : ℝ^{|A|} × ℝ^{|A|} → Δ(A) and h_i : ℝ^{|A|} → ℝ^{|A|} are appropriately defined Lipschitz mappings, possibly involving small input noise. We let q^{k−1}_{−i} ≜ {q^{k−1}_j : j ≠ i}, and h_{−i}(q^{k−1}_{−i}) ≜ {h_j(q^{k−1}_j) : j ≠ i}. We also define the base case q⁰_i = σ⁰_i = φ(z_i). Note that player i communicates only with A_i.

We define a passive aggregator (PA) by letting h_i be the identity mapping, i.e., h_i(v_i) = v_i. Alternatively, when h_i(v_i) = v_i + γ Δv_i for some γ > 0 and a difference approximation Δv_i of a temporal derivative v̇_i, we obtain an active aggregator (AA). Intuitively, AA views each q_j as a discretization of a continuous signal q_j(t), so that when Δq_j(t) ≈ q̇_j(t) for the neighbors j of i, we have h_j(q_j(t)) ≈ q_j(t) + γ q̇_j(t) ≈ q_j(t + γ), and hence A_i( h_{−i}(q_{−i}(t)) ) ≈ A_i( q_{−i}(t + γ) ); therefore A_i offers a predicted aggregate to player i.
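A minimal simulation sketch of the sampled dynamics (9) follows, instantiating g_i as the entropy-smoothed best response for concreteness (the two concrete choices of g_i are described next); the crude finite-difference derivative estimate standing in for Δv_i is our assumption.

```python
import numpy as np

def best_response(u, z, tau):
    """beta^tau_i: softmax((u - z) / tau), the entropy-smoothed best response."""
    v = (u - z) / tau
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def run_dynamics(W, z, steps=1000, tau=0.5, gamma=0.1, active=True, seed=0):
    """Sketch of (9). `active` mimics an active aggregator (AA) by adding a
    derivative term before aggregation; passive (PA) aggregates the raw
    empirical frequencies. W: (n, n) with diag(W) = 0; z: (n, |A|) types."""
    rng = np.random.default_rng(seed)
    n, A = z.shape
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    q = e / e.sum(axis=-1, keepdims=True)                # base case q^0 = softmax(z)
    q_prev = q.copy()
    for k in range(1, steps + 1):
        dq = (q - q_prev) * k                            # finite-difference dq/dt (steps ~ 1/k)
        h = q + gamma * dq if active else q              # h_j; applying to all rows is
        agg = W @ h                                      # harmless since diag(W) = 0
        sigma = best_response(agg, z, tau)               # sigma^k_i = g_i(A_i(h_{-i}), z_i)
        a = np.array([rng.choice(A, p=sigma[j]) for j in range(n)])
        q_prev = q.copy()
        q = q + (np.eye(A)[a] - q) / k                   # empirical frequencies, Eq. (9)
    return q, sigma
```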
We consider two forms of best response dynamics encoded in g_i, SAP-FP and SAP-GP, based on derivative FP and derivative GP, respectively. In SAP-FP we set τ > 0 in the utility functions, which gives a unique best response: under AA, the aggregator output u^k yields g_i(u^k, z_i) = β^τ_i(u^k, z_i); under PA, g_i(u^k, z_i) = β^τ_i(u^k + γ Δû^k, z_i). The AA case differs from PA in terms of where the difference approximation happens. In AA, it happens prior to aggregation, so g_i is defined directly in terms of the output u^k of AA, which absorbs any temporal approximation error. In the PA case, the player constructs a temporal prediction of the aggregate, and the approximation error is (Δu^k − Δû^k).

In SAP-GP, we set τ = 0, and player i takes a gradient step to maximize the anticipated payoff, followed by a Euclidean projection to get a unique mapping g_i (since any such projection onto a closed convex set is unique): under AA, g_i(u^k, z_i) = Π_Δ( q^k_i + u^k − z_i ); under PA, g_i(u^k, z_i) = Π_Δ( q^k_i + u^k + γ Δû^k − z_i ), where

Π_Δ(q̃) ≜ argmin_{q ∈ Δ(A)} ‖q̃ − q‖₂.

Thus, under both protocols, players take actions stochastically according to σ^k_i, and the best response mapping g_i is unique for each k.
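The projection Π_Δ can be computed exactly in O(|A| log |A|) time with the sort-based scheme of Duchi et al. (2008); a sketch:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex,
    Pi_Delta(v) = argmin_{q in Delta} ||q - v||_2 (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                          # sort coordinates descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)        # shared shift
    return np.maximum(v + theta, 0.0)             # clip at zero

# e.g. project_simplex(np.array([2.0, 0.0])) -> [1.0, 0.0]
```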
For the analysis, we associate an ODE system to characterize the evolution of the player strategies, and specify conditions under which the fixed points of this ODE are locally asymptotically stable (l.a.s.). An equilibrium point s is said to be l.a.s. if every ODE trajectory that starts at a point in a small neighborhood of s remains forever in that neighborhood and eventually converges to s. As a consequence, our discrete updates would converge to a Nash equilibrium with positive probability (Shamma and Arslan, 2005). Our updates in (9) lead to the implicit ODEs (10)-(13) for SAP-FP and SAP-GP under the AA and PA settings, where λ > 0, r_i is an estimate (of q_i under AA, and of the aggregate A_i(q_{−i}) under PA), and ṙ_{−i} ≜ {ṙ_j : j ≠ i, w_{ij} ≠ 0}:

(SAP-FP/AA)  q̇_i = β^τ_i( A_i(q_{−i} + γ ṙ_{−i}), z_i ) − q_i,  ṙ_i = λ(q_i − r_i)   (10)
(SAP-FP/PA)  q̇_i = β^τ_i( A_i(q_{−i}) + γ ṙ_i, z_i ) − q_i,  ṙ_i = λ( A_i(q_{−i}) − r_i )   (11)
(SAP-GP/AA)  q̇_i = Π_Δ[ q_i + A_i(q_{−i} + γ ṙ_{−i}) − z_i ] − q_i,  ṙ_i = λ(q_i − r_i)   (12)
(SAP-GP/PA)  q̇_i = Π_Δ[ q_i + A_i(q_{−i}) + γ ṙ_i − z_i ] − q_i,  ṙ_i = λ( A_i(q_{−i}) − r_i )   (13)

We will call a matrix stable if all its eigenvalues have strictly negative real parts. Let I denote the identity matrix. We now state results that characterize the conditions under which the different dynamics lead to asymptotically stable equilibria. We state and prove our convergence results as theorems in the Supplementary (see Table 1 for a summary). Specifically, we prove the convergence of player strategies to SNE via carefully crafted Lyapunov functions that are locally positive definite and have a locally negative semidefinite time derivative. The other proofs track the evolution of the game dynamics around an equilibrium: we analyze conditions under which the Jacobian matrix of the linearization of the ODE is Hurwitz stable, i.e., all its eigenvalues have negative real parts, and exploit the fact that the behavior of the ODE near the equilibrium is the same as that of its linear approximation when all eigenvalues have non-zero real parts.

Table 1: Convergence results (details in the Supplementary)

| Setting    | Converges to | Conditions |
|------------|--------------|------------|
| SAP-FP/AA  | NE           | Theorem 2  |
| SAP-FP/PA  | NE           | Theorem 3  |
| SAP-GP/AA  | CMNE         | Theorem 4  |
| SAP-GP/AA  | SNE          | Theorem 5  |
| SAP-GP/PA  | CMNE         | Theorem 6  |
| SAP-GP/PA  | SNE          | Theorem 7  |

Our results have several important implications. Garg and Jaakkola (2016; 2017) enforced margin constraints on the payoff functions in their discriminative PSNE setup, without establishing an SNE that guarantees the players would be strictly worse off by a unilateral switch to another strategy. In contrast, Theorems 5 and 7 specify exactly the conditions for an SNE, which, under additional assumptions, may be sampled to get a PSNE (Azrieli and Shmaya, 2013) if desired. It is known that classic FP fails to converge in some simple games that have a unique CMNE, e.g., Shapley (1964). Theorems 4 and 6 specify conditions that circumvent such negative results. Our stability conditions can be simplified further when λ is sufficiently large, whence the behavior may be understood solely in terms of γ.

7. Experiments

We now describe the results of our experiments on two real datasets, which provide insights into some important aspects of our games. We first show that our SAP model qualitatively recovers the known strategic behavior of the Justices from US Supreme Court data. We also provide quantitative evidence, in terms of two measures, (a) the fraction of correct edges recovered and (b) the coherence of groups in terms of cut size, that the SAP game outperforms prior methods. We then show that the structure estimated by the SAP game on the UN General Assembly data (Voeten et al., 2009) is meaningful, and helps unravel the subtle behavior of the member countries. Finally, we demonstrate that SAP games incur a lower cross-entropy loss than our baseline, and so can be effectively transferred to predict strategies in new settings with different sets of players.

Experimental setup. We implemented the models of sections 3 and 4, with L1 regularization for structure estimation as described in section 5. We found that SAP games performed well over a wide range of the hyperparameters α, λ, and k. We report the results with k = 5, α = 0.1, and λ = 0.1 for all our experiments, except the transferable setting, where we set α = 0.01 and λ = 0. We set ν to be the identity function in (4). We trained our models in batches of size 200, with the default settings of the RMSprop optimizer in PyTorch.
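A minimal training sketch under these settings follows; it reuses the unroll function sketched in section 3, and the placeholder batch stands in for a real data loader (both are our assumptions, not released code).

```python
import torch

# `unroll` is the k-step dynamics sketched in section 3.
n, d, n_actions = 9, 32, 3                       # e.g., the Supreme Court dimensions
theta = torch.randn(n, n_actions, d, requires_grad=True)
W = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.RMSprop([theta, W])            # default settings, as reported

loader = [(torch.randn(200, d),                  # placeholder batch of 200 contexts
           torch.randint(0, n_actions, (200, n)))]  # and observed action indices

for x_batch, y_batch in loader:
    sigma = torch.stack([unroll(theta, W, x, k=5, alpha=0.1) for x in x_batch])
    nll = -torch.log(sigma.gather(-1, y_batch.unsqueeze(-1)) + 1e-12).mean()
    loss = nll + 0.1 * W.abs().sum()             # lambda = 0.1, L1 penalty on influences
    loss.backward()
    opt.step(); opt.zero_grad()
    with torch.no_grad():
        W.fill_diagonal_(0.0)                    # keep W(i, i) = 0
```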
To account for the effect of randomness in neural training on structure estimation, we averaged the parameters of each model across 5 independent runs. We characterize the influence of the other players j on player i in terms of the ordering of the corresponding estimated average weight values, from most positive to most negative. We did not impose an L1 penalty in the transferable setting, since the interactions are learned for new individuals, i.e., they are not specific to any fixed set of players, unlike in the basic setting.

7.1. US Supreme Court Data

We included all the cases from the Rehnquist Court during the period 1994-2005 that had votes documented for all 9 Justices (data available at http://scdb.wustl.edu/data.php). Justices Rehnquist (R), Scalia (Sc), and Thomas (T) represented the conservative side; Justices Stevens (St), Souter (So), Ginsburg (G), and Breyer (B) formed the liberal bloc; and Justices Kennedy (K) and O'Connor (O), often called swing votes, followed a moderate ideology. Each Justice is treated as a player in our game. Our contexts comprise 32 binary attributes about the specifics of the appeal, e.g., the disposition of the lower court. Our observed outcome y^{(m)} for each context x^{(m)} pertains to the corresponding votes of the Justices. The votes belong to one of three categories: yes, no, or complex.

Structure recovery from Supreme Court data. Fig. 1 describes in detail how our method yields a structure that is qualitatively consistent with the known jurisprudence of the Rehnquist Court. Specifically, the conservatives and the liberals form two separate coherent, strongly-connected blocs that are well-segregated from each other.

Figure 1 ((a) Positive Influence, (b) Negative Influence): Supreme Court structure recovery with our SAP model. (a) and (b) show, respectively, the justices with the most positive and the most negative influence, quantified by ŵ_{ij}, on each Justice i. The estimated connections are consistent with the known jurisprudence of the Court. In particular, (a) shows coherence among the conservatives (red), coherence among the liberals (blue), and the separation of these ideologies from the moderates (green). Likewise, (b) shows that all the negative connections are between the blocs. Determining influences by heuristics such as ordering by pairwise vote agreements does not work; e.g., that would imply K had a strong positive influence on R, since R agreed with K more than with anyone else.

Comparison with previous works. In order to position our work with respect to prior work on structure recovery, we also quantitatively compare SAP games with the PSNE-based Potential Game (Garg and Jaakkola, 2016) and Local Aggregative Game (Garg and Jaakkola, 2017) methods. Fig. 2 provides a detailed quantitative comparison, which reveals that SAP compares favorably with these methods in terms of the size of the cut, i.e., the minimum number of edges that must be removed in order to decompose the recovered structure into the constituent blocs. Note that a low cut value indicates coherence within each component, and thus quantifies a good structure.

Figure 2 ((a) SAP (Ours), (b) Local AG, (c) Tree Structured Potential Game): (a), (b), and (c) show the structures estimated by SAP, the local aggregative game, and the tree structured potential game on the US Supreme Court data. We evaluate the different methods using the size of the cut, i.e., the minimum number of edges to be removed to decompose the structure into its three components (reds, blues, and greens). A low cut value quantifies high coherence within each component, and thus pertains to a good structure. The cut size for the SAP game (3) is much lower than for the other methods (6 each).
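The cut metric itself is straightforward to compute once the blocs are fixed; a small sketch (the edge-list and label encodings are our assumptions):

```python
def cut_size(edges, bloc):
    """Edges whose endpoints lie in different blocs; equivalently, the minimum
    number of edge removals needed to separate the recovered structure into
    its ideological components."""
    return sum(1 for i, j in edges if bloc[i] != bloc[j])

bloc = {"R": "red", "Sc": "red", "T": "red", "K": "green", "O": "green",
        "St": "blue", "So": "blue", "G": "blue", "B": "blue"}
print(cut_size([("R", "Sc"), ("K", "G"), ("St", "B")], bloc))  # -> 1
```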
7.2. United Nations General Assembly (UNGA) Data

Our second dataset consists of the roll call votes of the member countries on the resolutions considered in the UN General Assembly. Each resolution is a textual description that provides the context, while the votes of the countries on the resolution pertain to the observed outcome. We compiled data on all resolutions in the UNGA during 1992-2017 to understand the interactions of the member nations since the dissolution of the Soviet Union. Besides the USA, we considered 25 countries that have dominated UN politics and are generally known to belong to one of two blocs: pro-USA, namely, Australia (AUS), Canada (CAN), France (FRA), Germany (GER), Israel (ISR), Italy (ITA), Japan (JPN), South Korea (KOR), Norway (NOR), and Ukraine (UKR); and others, namely, Afghanistan (AFG), Belarus (BLR), China (CHN), Cuba (CUB), Iran (IRN), Iraq (IRQ), Mexico (MEX), Pakistan (PAK), Philippines (PHL), North Korea (PRK), Kazakhstan (KAZ), Russia (RUS), Syria (SYR), Venezuela (VEN), and Vietnam (VNM). Each country is viewed as a player in our game. We used pretrained GloVe embeddings to represent each resolution as a 50-dimensional context vector x^{(m)}, obtained by taking the mean embedding of the words in the resolution. Each vote was interpreted to take one of three values: 1 (yes), 2 (absent/abstain), or 3 (no), and we represented y^{(m)} as a 26-dimensional vector.
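A small sketch of this featurization; the whitespace tokenization and the glove word-to-vector dictionary are our assumptions:

```python
import numpy as np

def embed_resolution(text, glove, d=50):
    """Mean of pretrained 50-d GloVe vectors over a resolution's words."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(d)

VOTE = {"yes": 1, "absent": 2, "abstain": 2, "no": 3}   # vote encoding used above
```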
Structure and type recovery from UNGA data. Fig. 3 shows the structure estimated by our method, i.e., the weights ŵ_i learned for each country i. The weights ŵ_{ij}, j ≠ i, have a natural interpretation in terms of influence: the more positive w_{ij} is, the more positive the influence of j on i. A similar connotation holds for the negative weights. To aid visualization, we disentangle the positive and the negative connections, and depict only the most influential connections in either case. Fig. 3 describes how our method estimated a meaningful structure, and unraveled subtle influences beyond the prominent two-bloc structure. The learned type parameters θ̂_i were also found to be similar for members of the same bloc (Fig. 4).

Figure 3 ((a) Positive influence, (b) Negative influence): UNGA structure recovery with SAP games. Incoming arrows show (a) the 2 countries with the most positive influence (black edges), and (b) the 2 countries with the most negative influence (magenta edges), quantified by ŵ_{ij}, on each country i. The estimated links are largely consistent with the expected alignments. In particular, (a) shows that the two blocs (in yellow and green) are well segregated from each other. More interesting alignments are also revealed, e.g., (1) strong affinity between the NATO members on one side, and Syria, Iran, Venezuela, etc. on the other; (2) links from Germany and Korea to Russia, hinting at trade influence despite their differences; and (3) the geographical influence of Russia and Ukraine on Belarus. Additionally, (b) reveals that a significant fraction of the negative connections emanate from or end at Israel and the USA on one side, and some yellow node on the other.

Figure 4 ((a) Type similarity, (b) Transfer performance): (Left: UNGA type recovery) Incoming arrows show the 2 countries with the highest cosine similarity θ̂_i^⊤θ̂_j / (‖θ̂_i‖ ‖θ̂_j‖) for each country i. The type vectors were reasonably well aligned for members of the same bloc. (Right: transfer performance) SAP models were more effective in predicting strategies for new players, as quantified by a lower transfer loss than that incurred by the baseline strategy.

Prediction and transfer results. Our final set of experiments focused on transferring SAP games. Besides the UNGA data, we compiled a 73-dimensional feature vector for each country from its HDI indicators (UN2, 2007), which span various spheres including health, education, trade, and socio-economic sustainability. We kept aside three-fourths of the data to set up games over small sets of players, by randomly sampling 5 countries independently for each context. The left-out players from these contexts were then sampled independently to get another set of 5 countries per context. We call these two sets A and B, respectively. We formed a third set C of 5 countries per game from the untouched data (i.e., the one-fourth fraction). Thus, A and B were defined on the same contexts, which were disjoint from those of C. We averaged our results over 10 such independent triplets (A, B, C) to mitigate sampling effects.

We trained a model for A using the procedure in section 4, and computed the loss on B. Our baseline, train-empirical, used the empirical distribution of actions for each player i over the games it participated in within A as its predicted strategy for the games in B and C. The cross-entropy loss of SAP games (0.852) turned out to be lower than the baseline's (0.865) on C. Moreover, as Fig. 4 shows, the loss of SAP games (0.696) was significantly lower than the baseline's (0.720) on B, even though HDI does not fully reflect complex country characteristics. Thus, our results clearly underscore the benefits of using SAP games for strategic prediction.

8. Conclusion

We extended structured prediction to strategic settings, and specified conditions under which our games are identifiable from data, and under which strategies converge to an equilibrium. Empirical results demonstrate the effectiveness of our models in uncovering meaningful strategic interactions from data, predicting player strategies in new contexts, and transferring knowledge to predict strategies for new players. We side-stepped the issue of the rate of convergence to an equilibrium in our analysis. Since there might be multiple mixed strategy equilibria in a game, the convergence rate to an equilibrium depends on the proximity of the initial joint strategy profile, and on the spectrum of the Jacobian matrix resulting from the linearization of the associated ODE.

Acknowledgments

TJ and VG acknowledge support from ONR and the MIT-IBM collaboration.

References

Human Development Reports. Available at http://hdr.undp.org/en/countries/profiles/. United Nations Development Programme, 2007.

Y. Azrieli and E. Shmaya. Lipschitz games. Mathematics of Operations Research, 38(2):350-357, 2013.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44:328-348, 2005.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673-695, 2006.

V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.

G. W. Brown. Iterative solutions of games by fictitious play. In Activity Analysis of Production and Allocation, 1951.

L.-C. Chen, A. Schwing, A. Yuille, and R. Urtasun. Learning deep structured models. In International Conference on Machine Learning (ICML), 2015.

L. Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 158(1-2):575-585, 2016.

C. Cortes, V. Kuznetsov, M. Mohri, and S. Yang. Structured prediction theory based on factor graph complexity. In Neural Information Processing Systems (NIPS), 2016.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In International Conference on Machine Learning (ICML), 2008.

D. Fudenberg and D. M. Kreps. Learning mixed equilibria. Games and Economic Behavior, 5(3):320-367, 1993.

V. K. Garg and T. Jaakkola. Learning tree structured potential games. In Neural Information Processing Systems (NIPS), 2016.

V. K. Garg and T. Jaakkola. Local aggregative games. In Neural Information Processing Systems (NIPS), 2017.

A. Ghoshal and J. Honorio. Learning graphical games from behavioral data: Sufficient and necessary conditions. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

A. Globerson, T. Roughgarden, D. Sontag, and C. Yildirim. How hard is inference for structured prediction? In International Conference on Machine Learning (ICML), 2015.
J. C. Harsanyi. Games with incomplete information played by Bayesian players (part I, the basic model). Management Science, 14(3):159-182, 1967.

J. Honorio and L. Ortiz. Learning the structure and parameters of large-population graphical games from behavioral data. Journal of Machine Learning Research (JMLR), 16:1157-1210, 2015.

M. T. Irfan and L. E. Ortiz. On influence, stable behavior, and the most influential individuals in networks: A game-theoretic approach. Artificial Intelligence, 215:79-119, 2014.

A. X. Jiang and K. Leyton-Brown. Bayesian action-graph games. In Neural Information Processing Systems (NIPS), 2010.

E. Kalai. Large robust games. Econometrica, 72(6):1631-1665, 2004.

M. Kearns and Y. Mansour. Efficient Nash computation in large population games with bounded influence. In Uncertainty in Artificial Intelligence (UAI), 2002.

M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Uncertainty in Artificial Intelligence (UAI), 2001.

J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML), 2001.

B. London, B. Huang, and L. Getoor. Stability and generalization in structured prediction. Journal of Machine Learning Research (JMLR), 2016.

J. Nash. Non-cooperative games. Annals of Mathematics, 54(2):286-295, 1951.

S. Nowozin and C. H. Lampert. Structured learning and prediction in computer vision, 2011.

A. Osokin, F. Bach, and S. Lacoste-Julien. On structured prediction theory with calibrated convex surrogate losses. In Neural Information Processing Systems (NIPS), 2017.

X. Pan and V. Srikumar. Learning to speed up structured output prediction. In International Conference on Machine Learning (ICML), pages 3996-4005, 2018.

P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287-1319, 2010.

J. Robinson. An iterative method of solving a game. Annals of Mathematics, 54(2), 1951.

J. S. Shamma and G. Arslan. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria. IEEE Transactions on Automatic Control, 50(3):312-327, 2005.

L. S. Shapley. Some topics in two-person games. Advances in Game Theory, pages 1-29, 1964.

E. Tardos and V. Vazirani. Introduction to game theory: Basic solution concepts and computational issues, 2007.

B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Neural Information Processing Systems (NIPS), 2003.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6(2):1453-1484, 2005.

E. Voeten, A. Strezhnev, and M. Bailey. United Nations General Assembly voting data, 2009. URL https://hdl.handle.net/1902.1/12379.

M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183-2202, 2009.

K. Waugh, B. D. Ziebart, and J. A. Bagnell. Computational rationalization: The inverse equilibrium problem. In International Conference on Machine Learning (ICML), 2011.