# local_aggregative_games__991ed7dd.pdf

Local Aggregative Games

Vikas K. Garg CSAIL, MIT vgarg@csail.mit.edu

Tommi Jaakkola CSAIL, MIT tommi@csail.mit.edu

Aggregative games provide a rich abstraction to model strategic multi-agent interactions. We introduce local aggregative games, where the payoff of each player is a function of its own action and the aggregate behavior of its neighbors in a connected digraph. We show the existence of a pure strategy ϵ-Nash equilibrium in such games when the payoff functions are convex or sub-modular. We prove an information theoretic lower bound, in a value oracle model, on approximating the structure of the digraph with non-negative monotone sub-modular cost functions on the edge set cardinality. We also deﬁne a new notion of structural stability, and introduce γ-aggregative games that generalize local aggregative games and admit ϵ-Nash equilibrium that is stable with respect to small changes in some speciﬁed graph property. Moreover, we provide algorithms for our models that can meaningfully estimate the game structure and the parameters of the aggregator function from real voting data.

1 Introduction

Structured prediction methods have been remarkably successful in learning mappings between input observations and output conﬁgurations [1; 2; 3]. The central guiding formulation involves learning a scoring function that recovers the conﬁguration as the highest scoring assignment. In contrast, in a game theoretic setting, myopic strategic interactions among players lead to a Nash equilibrium or locally optimal conﬁguration rather than highest scoring global conﬁguration. Learning games therefore involves, at best, enforcement of local consistency constraints as recently advocated [4].

[4] introduced the notion of contextual potential games, and proposed a dual decomposition algorithm for learning these games from a set of pure strategy Nash equilibria. However, since their setting was restricted to learning undirected tree structured potential games, it cannot handle (a) asymmetries in the strategic interactions, and (b) higher order interactions. Moreover, a wide class of strategic games (e.g. anonymous games [5]) do not admit a potential function and thus locally optimal conﬁgurations do not coincide with pure strategy Nash equilibria. In such games, the existence of only (approximate) mixed strategy equilibria is guaranteed [6].

In this work, we focus on learning local aggregative games to address some of these issues. In an aggregative game [7; 8; 9], every player gets a payoff that depends only on its own strategy and the aggregate of all the other players strategies. Aggregative games and their generalizations form a very rich class of strategic games that subsumes Cournot oligopoly, public goods, anonymous, mean ﬁeld, and cost and surplus sharing games [10; 11; 12; 13]. In a local aggregative game, a player s payoff is a function of its own strategy and the aggregate strategy of its neighbors (i.e. only a subset of other players). We do not assume that the interactions are symmetric or conﬁned to a tree structure, and therefore the game structure could, in general, be a spanning digraph, possibly with cycles.

We consider local aggregative games where each player s payoff is a convex or submodular Lipschitz function of the aggregate of its neighbors. We prove sufﬁcient conditions under which such games admit some pure strategy ϵ-Nash equilibrium. We then prove an information theoretic lower bound that for a speciﬁed ϵ, approximating a game structure that minimizes a non-negative monotone submodular cost objective on the cardinality of the edge set may require exponentially many queries under a zero-order or value oracle model. Our result generalizes the approximability of the submodular minimum spanning tree problem to degree constrained spanning digraphs [14]. We argue that this lower bound might be averted with a dataset of multiple ϵ-Nash equilibrium conﬁgurations sampled

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

from the local aggregative game. We also introduce γ-aggregative games that generalize local aggregative games to accommodate the (relatively weaker) effect of players that are not neighbors. These games are shown to have a desirable stability property that makes their ϵ-Nash equilibria robust to small ﬂuctuations in the aggregator input. We formulate learning these games as optimization problems that can be efﬁciently solved via branch and bound, outer approximation decomposition, or extended cutting plane methods [17; 18]. The information theoretic hardness results do not apply to our algorithms since they have access to the (sub)gradients as well, unlike the value oracle model where only the function values may be queried. Our experiments strongly corroborate the efﬁcacy of the local aggregative and γ-aggregative games in estimating the game structure on two real voting datasets, namely, the US Supreme Court Rulings and the Congressional Votes.

We consider an n-player game where each player i [n] {1, 2, . . . , n} plays a strategy (or action) from a ﬁnite set Ai. For any strategy proﬁle a, ai denotes the strategy of the ith player, and a i the strategies of the other players. We are interested in local aggregative games that have the property that the payoff of each player i depends only on its own action and the aggregate action of its neighbors NG(i) = {j V (G) : (j, i) E(G)} in a connected digraph G = (V, E), where |V | = n. Since, the graph is directed, the neighbors need not be symmetric, i.e., (j, i) E does not imply (i, j) E.

For any strategy proﬁle a, we will denote the strategy vector of neighbors of player i by a NG(i). We assume that player i has a payoff function of the form ui(ai, f G(a, i)), where f G(a, i) f(a NG(i)) is a local aggregator function, and ui is convex and Lipschitz in the aggregate f G(a, i) for all ai Ai. Since f G(a, i) may take only ﬁnitely many values, we will assume interpolation between these values such that they form a convex set. We can deﬁne the Lipschitz constant of G as

δ(G) max i,ai,a i,a i {ui(ai, f G(a , i)) ui(ai, f G(a , i))}, (1)

where the vectors a i and a i differ in exactly one coordinate. Clearly, the payoff of any player in the network does not change by more than δ(G) when one of the neighbors changes its strategy. We can now talk about a class of aggregative games characterized by the Lipschitz constant:

L( , n) = {G : V (G) = n, δ(G) }.

A strategy proﬁle a = (ai, a i) is said to be a pure strategy ϵ-Nash equilibrium (ϵ-PSNE) if no player can improve its payoff by more than ϵ by unilaterally switching its strategy. In other words, any player i cannot gain more than ϵ by playing an alternative strategy a i if the other players continue to play a i. More generally, instead of playing deterministic actions in response to the actions of others, each player can randomize its actions. Then, the distributions over players actions constitute a mixed strategy ϵ-Nash equilibrium if any unilateral deviation could improve the expected payoff by at most ϵ. We will prove the existence of ϵ-PSNE in our setting. We will assume a training set S = {a1, a2, . . . , a M}, where each ai is an ϵ-PSNE sampled from our game. Our objective is to recover the game digraph G and the payoff functions ui, i [n] from the set S.

The rest of the paper is organized as follows. We ﬁrst establish some important theoretical paraphernalia on the local aggregative games in Section 3. In Section 4, we introduce γ-aggregative games and show that γ-aggregators are structurally stable. We formulate the learning problem in Section 5, and describe our experimental set up and results in Section 6. We state the theoretical results in the main text, and provide the detailed proofs in the Supplementary (Section 7) for improved readability.

3 Theoretical foundations

Any ﬁnite game is guaranteed to admit a mixed strategy ϵ-equilibrium due to a seminal result by Nash [6]. However, general games may not have any ϵ-PSNE (for small ϵ). We ﬁrst prove a sufﬁcient condition for the existence of ϵ-PSNE in local aggregative games with small Lipschitz constant. A similar result holds when the payoff functions ui( ) are non-negative monotone submodular and Lipschitz (see the supplementary material for details).

Theorem 1. Any local aggregative game on a connected digraph G, where G L( , n) and max i |Ai| m, admits a 10 p

ln(8mn)-PSNE.

Proof. (Sketch.) The main idea behind the proof is to sample a random strategy proﬁle from a mixed strategy Nash equilibrium of the game, and show that with high probability the sampled proﬁle corresponds to an ϵ-PSNE when the Lipschitz constant is small. The proof is based on a novel application of the Talagrand s concentration inequality.

Theorem 1 implies the minimum degree d (which depends on number of players n, the local aggregator function A, Lipschitz constant , and ϵ) of the game structure that ensures the existence of at least one ϵ-PSNE. One example is the following local generalization of binary summarization games [8]. Each player i plays ai {0, 1} and has access to an averaging aggregator that computes the fraction of its neighbors playing action 1. Then, the Lipschitz constant of G is 1/k, where k is the minimum degree the underlying game digraph. Then, an ϵ-PSNE is guaranteed for k = Ω(

ln n/ϵ). In other words, k needs to grow slowly (i.e., only sub-logarithmically) in the number of players n.

An important follow-up question is to determine the complexity of recovering the underlying game structure in a local aggregative game with an ϵ-PSNE. We will answer this question in a combinatorial setting with non-negative monotone submodular cost functions on the edge set cardinality. Speciﬁcally, we consider the following problem. Given a connected digraph G(V, E), a degree parameter d, and a submodular cost function h : 2E R+ that is normalized (i.e. h( ) = 0) and monotone (i.e. h(S) h(T) for all S T 2E), we would like to ﬁnd a spanning directed subgraph1 Gs of G such that f(Gs) is minimized, the in-degree of each player is at least d, and Gs admits some ϵ-Nash equilibrium when players play to maximize their individual payoffs. We ﬁrst establish a technical lemma that provides tight lower and upper bounds on the probability that a directed random graph is disconnected, and thus extends a similar result for Erd os-Rényi random graphs [25] to the directed setting. The lemma will be invoked while proving a bound for the recovery problem, and might be of independent interest beyond this work.

Lemma 2. Consider a directed random graph DG(n, p) where p (0, 1) is the probability of choosing any directed edge independently of others. Deﬁne q = 1 p. Let Pn be the probability that DG is connected. Then, the probability that DG is disconnected is 1 PN = nq2(n 1) +O n2q3n .

We will now prove an information theoretic lower bound for the recovery problem under the value oracle model [14]. A problem with an information theoretic lower bound of β has the property that any randomized algorithm that approximates the optimum to within a factor β with high probability needs to make superpolynomial number of queries under the speciﬁed oracle model. In the value oracle model, each query Q corresponds to obtaining the cost/value of any candidate set by issuing Q to the value oracle (which acts as a black-box). We invoke the Yao s minimax principle [28], which states the relation between distributional complexity and randomized complexity. Using Yao s principle, the performance of randomized algorithms can be lower bounded by proving that no deterministic algorithm can perform well on an appropriately deﬁned distribution of hard inputs.

Theorem 3. Let ϵ > 0, and α, δ (0, 1). Let n be the number of players in a local aggregative game, where each player i [n] is provided with some convex -Lipschitz function ui and an aggregator A. Let Dn Dn( , ϵ, A, (ui)i [n]) be the sufﬁcient in-degree (number of incoming edges) of each player such that the game admits some ϵ-PSNE when the players play to maximize their individual payoffs ui according to the local information provided by the aggregator A. Assume any non-negative monotone submodular cost function on the edge set cardinality. Then for any d max{Dn, nα ln n}/(1 α), any randomized algorithm that approximates the game structure to a factor n1 α/(1 + δ)d requires exponentially many queries under the value oracle model.

Proof. (Sketch.) The main idea is to construct a digraph that has exponentially many spanning directed subgraphs, and deﬁne two carefully designed submodular cost functions over the edges of the digraph, one of which is deterministic in query size while the other depends on a distribution. We make it hard for the deterministic algorithm to tell one cost function from the other. This can be accomplished by ensuring two conditions: (a) these cost functions map to the same value on almost all the queries, and (b) the discrepancy in the optimum value of the functions (on the optimum query) is massive. The proof invokes Lemma 2, exploits the degree constraint for ϵ-PSNE, argues about the optimal query size, and appeals to the Yao s minimax principle.

1A spanning directed graph spans all the vertices, and has the property that the (multi)graph obtained by replacing the directed edges with undirected edges is connected.

Theorem 3 might sound pessimistic from a practical perspective, however, a closer look reveals why the query complexity turned out to be prohibitive. The proof hinged on the fact that all spanning subgraphs with same edge cardinality that satisﬁed the sufﬁciency condition for existence of any ϵ-PSNE were equally good with respect to our deterministic submodular function, and we created an instance with exponentially such spanning subgraphs. However, we might be able to circumvent Theorem 3 by breaking the symmetry, e.g., by using data that speciﬁes multiple distinct ϵ-Nash equilibria. Then, since the digraph instance would be required to satisfy these equilibria, fooling the deterministic algorithm would be more difﬁcult. Thus data could, in principle, help us avoid the complexity result of Theorem 3. We will formulate optimization problems that would enforce margin separability on the equilibrium proﬁles, which will further limit the number of potential digraphs and thus facilitate learning the aggregative game. Moreover, the hardness result does not apply to our estimation algorithms that will have access to the (sub)gradients in addition to the function values.

4 γ-Aggregative Games

We now describe a generalization of the local aggregative games, which we call the γ-aggregative games. The main idea behind these games is that a player i [n] may, often, be inﬂuenced not only by the aggregate behavior of its neighbors, but also to a lesser extent on the aggregate behavior of the other players, whose inﬂuence on the payoff of i decreases with increase in their distance to i. Let d G(i, j) be the number of intermediate nodes on a shortest path from j to i in the underlying digraph G = (V, E). That is, d G(i, j) = 0 if (j, i) E, and 1 + mink V \{i,j} d G(i, k) + d G(k, j) otherwise. Let WG maxi,j V d G(i, j) be the width of G. For any strategy proﬁle a {0, 1}n and t {0, 1, . . . , WG}, let It G(i) = {j : d G(i, j) = t} be the set of nodes that have exactly t intermediaries on a shortest path to i, and let a It G(i) be a strategy proﬁle of the nodes in this set. We deﬁne aggregator functions f t G(a, i) f(a It G(i)) that return the aggregate at level t with respect to player i. Let γ (0, 1) be a discount rate. Deﬁne the γ-aggregator function

g G(a, γ, ℓ, i)

t=0 γtf t G(a, i)/

which discounts the aggregates based on the distance ℓ {0, 1, . . . , WG} to i. We assume that player i [n] has a payoff function of the form ui(ai, ), which is convex and η-Lipschitz in its second argument for each ﬁxed ai. Finally, we deﬁne the Lipschitz constant of the γ-aggregative game as

δγ(G) max i,ai,a i,a i {ui(ai, g G(a , γ, WG, i)) ui(ai, g G(a , γ, WG, i))},

where the vectors a i and a i differ in exactly one coordinate.

The main criticism of the concept of ϵ-Nash equilibrium concerns lack of stability: if any player deviates (due to ϵ-incentive), then in general, some other player may have a high incentive to deviate as well, resulting in a non-equilibrium proﬁle. Worse, it may take exponentially many steps to reach an ϵ-equilibrium again. Thus, stability of ϵ-equilibrium is an important consideration. We will now introduce an appropriate notion of stability, and prove that γ-aggregative games admit stable pure strategy ϵ-equilibrium in that any deviation by a player does not affect the equilibrium much.

Structurally Stable Aggregator (SSA): Let G = (E, V ) be a connected digraph and PG(w) be a property of G, where w denotes the parameters of PG. Let A be an aggregator function that depends on PG. Suppose M = (a1, a2, . . . , an) be an ϵ-PSNE when A aggregates information according to PG(w), where ai is the strategy of player i V = [n]. Suppose now A aggregates information according to PG(w ). Then, A is a (α, β)P,w,w -structurally stable aggregator (SSA) with respect to G, where α and β are functions of the gap between w, w , if it satisﬁes these conditions: (a) M is a (ϵ + α)-equilibrium under PG(w ), and (b) the payoff of each player at the equilibrium proﬁle M under PG(w ) is at most β = O(α) worse than that under PG(w).

A SSA with small values of α and β with respect to a small change in w is desirable since that would discourage the players from deviating from their ϵ-equilibrium strategy, however, such an aggregator might not exist in general. The following result shows the γ-aggregator is a SSA.

Theorem 4. Let γ (0, 1), and g G( , , ℓ, ) be the γ-aggregator deﬁned above. Let PG(ℓ) be the property the number of maximum permissible intermediaries in a shortest path of length ℓin G . Then, g G is a (2ηκG, ηκG)P,WG,LSSA, where L < WG and κG depends on γ and WG L.

5 Learning formulation

We now formulate an optimization problem to recover the underlying graph structure, the parameters of the aggregator function, and the payoff functions. Let S = {a1, a2, . . . , a M} be our training set, where each strategy proﬁle am {0, 1}n is an ϵ-PSNE, and am i is the action of player i in example m [M]. Let f be a local aggregator function, and let am Ni be the actions of neighbors Ni of player i [n] on training example m. We will also represent N as a 0-1 adjacency matrix, with the interpretation that Nij = 1 implies that j Ni, and Nij = 0 otherwise. We will use the notation Ni {Nij : j = i}. Note that since the underlying game structure is represented as a digraph, Nij and Nji need not be equal. Let h be a concave function such that h(0) = 0. Then Fi(h) h(|Ni|) is submodular since the concave transformation of the cardinality function results in a submodular function. Moreover F(h) = P

i [n] Fi(h) is submodular since it is a sum of submodular functions. We will use F(h) as a sparsity-inducing prior. Several choices of h have been advocated in the literature, including suitably normalized geometric, log, smooth log and square root functions [15].

We would denote the parameters of the aggregator function f by θf. The payoff functions will depend on the choice of this parameterization. For a ﬁxed aggregator f (such as the sum aggregator), linear parameterization is one possibility, where the payoff function for player i [n] takes the form,

uf i (am, Ni ) = am i wi1(wf f(am Ni) + bf) + (1 am i )wi0(wf f(am Ni) + bf),

where wi = (wi0, wi1) and Ni denote the independent parameters for player i and θf = (wf, bf) are the shared parameters. Our setting is ﬂexible, and we can easily accommodate more complex aggregators instead of the standard aggregators (e.g. sum). Exchangeable functions over sets [16] provide one such example. An interesting instantiation is a neural network comprising one hidden layer, an output sum layer, with tied weights. Speciﬁcally, let W Rn (n 1) where all entries of W are equal to w NN. Let σ be an element-wise non-linearity (e.g. we used the Re LU function, σ(x) = max{x, 0} for our experiments). Then, using the element-wise multiplication operator and a vector 1 with all ones, ui may be expressed as uf NN i (am, Ni ) = am i wi1f NN(am Ni)+(1 am i )wi0f NN(am Ni), where the permutation invariant neural aggregator, parameterized by θf NN = (w NN, b NN) ,

f NN(am Ni) = 1 σ(W am i Ni + b NN).

We could have more complex functions such as deeper neural nets, with parameter sharing, at the expense of increased computation. We believe this versatility makes local aggregative games particularly attractive, and provides a promising avenue for modeling structured strategic settings.

Each am is an ϵ-PSNE, so it ensures a locally (near) optimal reward for each player. We will impose a margin constraint on the difference in the payoffs when player i unilaterally deviates from am i . Note that Ni = {j Ni : Nij = 1}. Then, introducing slack variables ξm i , and hyperparameters C, C , Cf > 0, we obtain the following optimization problem in O(n2) variables:

min θf ,w1 ,...,wn ,Ni ,...,Nn ,ξ 1 2

i=1 ||wi ||2 + Cf

2M ||θf||2 + C

i=1 Fi(h) + C

s.t. i [n], m [M] : uf i (am, Ni ) uf i (1 am, Ni ) e(am, a ) ξm i i [n], m [M] : ξm i 0 i [n] : Ni {0, 1}n 1,

where am and a differ in exactly one coordinate, and e is a margin speciﬁc loss term, such as Hamming loss e H(a, a) = 1{a = a} or scaled 0-1 loss es(a, a) = 1{a = a}/n. From a game theoretic perspective, the scaled loss has a natural asymptotic interpretation: as the number of players n , es(am, a ) 0, and we get i [n], m [M] : uf i (am, Ni ) uf i (1 am, Ni ) ξm i , i.e., each training example am is an ϵ-PSNE, where ϵ = maxi [n],m [M] ξm i .

Once θf are ﬁxed, the problem clearly becomes separable, i.e., each player i can solve an independent sub-problem in O(n) variables. Each sub-problem includes both continuous and binary variables,

and may be solved via branch and bound, outer approximation decomposition, or extended cutting plane methods (see [17; 18] for an overview of these techniques). The individual solutions can be forced to agree on θf via a standard dual decomposition procedure, and methods like alternating direction method of multipliers (ADMM) [19] could be leveraged to facilitate rapid agreement of the continuous parameters wf and bf. The extension to learning the γ-aggregative games is immediate.

We now describe some other optimization variants for the local aggregative games. Instead of constraining each player to a hard neighborhood, one might relax the constraints Nij {0, 1} to Nij [0, 1], where Nij might be interpreted as the strength of the edge (j, i). The Lovász convex relaxation of F [20] is a natural prior for inducing sparsity in this case. Speciﬁcally, for an ordering of values |Ni(0)| |Ni(1)| . . . |Ni(n 1)|, i [n], this prior is given by

i=1 Γh(N, i), where Γh(N, i) =

k=0 [h(k + 1) h(k)]|Ni(k)|.

Since the transformation h encodes the preference for each degree, Γh(N) will act as a prior that encourages structured sparsity. One might also enforce other constraints on the structure of the local aggregative game. For instance, an undirected graph could be obtained by adding constraints Nij = Nji, for i [n], j = i. Likewise, a minimum in-degree constraint may be enforced on player i by requiring P

j Nij d. Both these constraints are linear in Ni , and thus do not add to the complexity of the problem. Finally, based on cues such as domain knowledge, one may wish to add a degree of freedom by not enforcing sharing of the parameters of the aggregator among the players.

6 Experiments

We now present strong empirical evidence to demonstrate the efﬁcacy of local aggregative games in unraveling the aggregative game structure of two real voting datasets, namely, the US Supreme Court Rulings dataset and the Congressional Votes dataset. Our experiments span the different variants for recovering the structure of the aggregative games including settings where (a) parameters of the aggregator are learned along with the payoffs, (b) in-degree of each node is lower bounded, (c) γ-discounting is used, or (d) parameters of the aggregator are ﬁxed. We will also demonstrate that our method compares favorably with the potential games method for tree structured games [4], even when we relax the digraph setting to let weights Nij [0, 1] instead of {0, 1} or force the game structure to be undirected by adding the constraints Nij = Nji. For our purposes, we used the smoothed square-root concave function, h(i) = i + 1 1 + αi parameterized by α, the sum and neural aggregators, and the scaled 0-1 loss function es(a, a) = 1{a = a}/n. We found our model to perform well across a very wide range of hyperparameters. All the experiments described below used the following setting of values: α = 1, C = 100, and Cf = 1. C was also set to 0.01 in all settings except when the parameters of the aggregator were ﬁxed, when we set C = 0.01 n.

Conservatives Liberals

Figure 1: Supreme Court Rulings (full bench): The digraph recovered by the local aggregative and γ-aggregative games (ℓ 2, all γ) with the sum aggregator as well as the neural aggregator is consistent with the known behavior of the Justices: conservative and liberal sides of the bench are well segregated from each other, while the moderate Justice Kennedy is positioned near the center. Numbers on the arrows are taken from an independent study [21] on Justices mutual voting patterns.

6.1 Dataset 1: Supreme Court Rulings

We experimented with a dataset containing all non-unanimous rulings by the US Supreme court bench during the year 2013. We denote the Justices of the bench by their last name initials, and add a second character to some names to avoid the conﬂicts in the initials: Alito (A), Breyer (B), Ginsburg(G), Kennedy (K), Kagan (Ka), Roberts (R), Scalia (S), Sotomayor (So), and Thomas (T). We obtained a binary dataset following the procedure described in [4].

(a) Local Aggregative

(b) Potential Exhaustive Enumeration

(c) Local Aggregative (Undirected & Relaxed)

(d) Potential Hamming

Figure 2: Comparison with the potential games method [4]: (a) The digraph produced by our method with the sum as well as the neural aggregator is consistent with the expected voting behavior of the Justices on the data used by [4] in their experiments. (c) Relaxing all Nij [0, 1] and enforcing Nij = Nji still resulted in a meaningful undirected structure. (b) & (d) The tree structures obtained by the brute force and the Hamming distance restricted methods [4] fail to capture higher order interactions, e.g., the strongly connected component between Justices A, T, S and R. .

(a) Local Aggregative (d >= 2)

(b) γ aggregative (ℓ= 2, γ = 0.9)

Figure 3: Degree constrained and γ-aggregative games: (a) Enforcing the degree of each node to be at least 2 reinforces the intra-republican and the intra-democrat afﬁnity, reafﬁrming their respective jurisprudences, and (b) γ-aggregative games also support this observation: the same digraph as Fig. 2(a) is obtained unless ℓand γ are set to high values (plot generated with ℓ= 2, γ = 0.9), when the strong effect of one-hop and two-hop neighbors overpowers the direct connection between B and G.

Fig. 1 shows the structure recovered by the local aggregative method. The method was able to distinguish the conservative side of the court (Justices A, R, S, and T) from the left side (B, G, Ka, and So). Also, the structure places Justice Kennedy in between the two extremes, which is consistent with his moderate jurisprudence. To put our method in perspective, we also compare the result of applying our method on the same subset of the full bench data that was considered by [4] in their experiments. Fig. 2 demonstrates how the local aggregative approach estimated meaningful structures consistent with the full bench structure, and compared favorably with both the methods of [4]. Finally, Fig. 3(a)

and 3(b) demonstrate the effect of enforcing minimum in-degree constraints in the local aggregative games, and increasing ℓand γ in the γ-aggregative games respectively. As expected, the estimated γ-aggregative structure is stable unless γ and ℓare set to high values when non-local effects kick in. We provide some additional results on the degree-constrained local aggregative games (Fig. 4 ) and the γ-aggregative games (Fig. 5). In particular, we see that the γ-aggregative games are indeed robust to small changes in the aggregator input as expected in the light of stability result of Theorem 4.

Conservatives

Figure 4: Degree constrained local aggregative games (full bench): The digraph recovered by the local aggregative method when the degree of each node was constrained to be at least 2. Clearly, the cohesion among the Justices on the conservative side got strengthened by the degree constraint (likewise for the liberal side of the bench). On the other hand, no additional edges were added between the two sides.

Conservatives Liberals

Figure 5: γ-Aggregative Games (full bench): The digraph estimated by the γ-aggregative method for ℓ= 2, γ = 0.9, and lower values of γ and/or ℓ. Note that an identical structure was obtained by the local aggregative method (Fig. 1). This indicates that despite heavily weighting the effect of the nodes on a shortest path with one or two intermediary hops, the structure in Fig. 1 is very stable. Also, this substantiates our theoretical result about the stability of the γ-aggregative games.

6.2 Dataset 2: Congressional Votes

We also experimented with the Congressional Votes data [22], that contains the votes by the US Senators on all the bills of the 110 US Congress, Session 2. Each of the 100 Senators voted in favor of (treated as 1) or against each bill (treated as 0). Fig. 6 shows that the local aggregative method provides meaningful insights into the voting patterns of the Senators as well. In particular, few connections exist between the nodes in red and those in blue, making the bipartisan structure quite apparent. In some cases, the intra-party connections might be bolstered due to same state afﬁliations, e.g. Senators Corker (28) and Alexander (2) represent Tennessee. The cross connections may also capture some interesting collaborations or inﬂuences, e.g., Senators Allard (3) and Clinton (22) introduced the Autism Act. Likewise, Collins (26) and Carper (19) reintroduced the Fire Grants Reauthorization Act. The potential methods [4] failed to estimate some of these strategic interactions. Likewise, Fig. 7 provides some interesting insights regarding the ideologies of some Senators that follow a more centrist ideology than their respective political afﬁliations would suggest.

Figure 6: Comparison with [4] on the Congressional Votes data: The digraph recovered by local aggregative method, on the data used by [4], when the parameters of the sum aggregator were ﬁxed (wf = 1, bf = 0). The segregation between the Republicans (shown in red) and the Democrats (shown in blue) strongly suggests that they are aligned according to their party policies.

Figure 7: Complete Congressional Votes data: The digraph recovered on ﬁxing parameters, relaxing Nij to [0, 1], and thresholding at 0.05. The estimated structure not only separates majority of the reds from the blues, but also associates closely the then independent Senators Sanders (82) and Lieberman (62) with the Democrats. Moreover, the few reds among the blues generally identify with a more centrist ideology - Collins (26) and Snowe (87) are two prominent examples.

An overwhelming majority of literature on machine learning is restricted to modeling non-strategic settings. Strategic interactions in several real world systems such as decision/voting often exhibit local structure in terms of how players are guided by or respond to each other. In other words, different agents make rational moves in response to their neighboring agents leading to locally stable conﬁgurations such as Nash equilibria. Another challenge with modeling the strategic settings is that they are invariably unsupervised. Consequently, standard learning techniques such as structured prediction that enforce global consistency constraints fall short in such settings (cf. [4]). As substantiated by our experiments, local aggregative games nicely encapsulate various strategic applications, and could be leveraged as a tool to glean important insights from voting data. Furthermore, the stability of approximate equilibria is a primary consideration from a conceptual viewpoint, and the γ-aggregative games introduced in this work add a fresh perspective by achieving structural stability.

[1] J. Lafferty, A. Mc Callum, and F. Pereira. Conditional random ﬁelds: Probabilistic models for segmenting and labeling sequence data, ICML, 2001.

[2] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks, NIPS, 2003.

[3] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables, JMLR, 6(2), pp. 1453-1484, 2005.

[4] V. K. Garg and T. Jaakkola. Learning Tree Structured Potential Games, NIPS, 2016.

[5] C. Daskalakis and C. H. Papadimitriou. Approximate Nash equilibria in anonymous games, Journal of Economic Theory, 156, pp. 207-245, 2015.

[6] J. Nash. Non-Cooperative Games, Annals of Mathematics, 54(2), pp. 286-295, 1951.

[7] R. Selten. Preispolitik der Mehrproduktenunternehmung in der Statischen Theorie, Springer-Verlag, 1970.

[8] M. Kearns and Y. Mansour. Efﬁcient Nash computation in large population games with bounded inﬂuence, UAI, 2002.

[9] R. Cummings, M. Kearns, A. Roth, and Z. S. Wu. Privacy and truthful equilibrium selection for aggregative games, WINE, 2015.

[10] R. Cornes and R. Harley. Fully Aggregative Games, Economic Letters, 116, pp. 631-633, 2012.

[11] W. Novshek. On the Existence of Cournot Equilibrium, Review of Economic Studies, 52, pp. 86-98, 1985.

[12] M. K. Jensen. Aggregative Games and Best-Reply Potentials, Economic Theory, 43, pp. 45-66, 2010.

[13] J. M. Lasry and P. L. Lions. Mean ﬁeld games, Japanese Journal of Mathematics, 2(1), pp. 229-260, 2007.

[14] G. Goel, C. Karande, P. Tripathi, and L. Wang. Approximability of Combinatorial Problems with Multiagent Submodular Cost Functions, FOCS, 2009.

[15] A. J. Defazio and T. S. Caetano. A convex formulation for learning scale-free networks via submodular relaxation, NIPS, 2012.

[16] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, and A. Smola. Deep Sets, ar Xiv:1703.06114, 2017.

[17] P. Bonami et al. An algorithmic framework for convex mixed integer nonlinear programs, Discrete Optimization, 5(2), pp. 186-204, 2008.

[18] P. Bonami, M. Kilinç, and J. Linderoth J. Algorithms and Software for Convex Mixed Integer Nonlinear Programs, Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, 154, Springer, 2012.

[19] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3, 2011.

[20] F. Bach. Structured sparsity-inducing norms through submodular functions, NIPS, 2010.

[21] J. Bowers, A. Liptak and D. Willis. Which Supreme Court Justices Vote Together Most and Least Often, The New York Times, 2014.

[22] J. Honorio and L. Ortiz. Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data, JMLR, 16, pp. 1157-1210, 2015.

[23] Y. Azrieli and E. Shmaya. Lipschitz Games, Mathematics of Operations Research, 38(2), pp. 350-357, 2013.

[24] E. Kalai. Large robust games. Econometrica, 72(6), pp. 1631-1665, 2004.

[25] E. N. Gilbert. Random Graphs, The Annals of Mathematical Statistics, 30(4), pp. 1141-1144,1959.

[26] W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1, Second edition, Wiley, 1957.

[27] M.-F. Balcan and N. J. A. Harvey. Learning Submodular Functions, STOC, 2011.

[28] A. Yao. Probabilistic computations: Toward a uniﬁed measure of complexity, FOCS, 1977.

[29] U. Feige, V. S. Mirrokni, and J. Vondrak. Maximizing non-monotone submodular functions, FOCS, 2007.

[30] M. X. Goemans, N. J. A. Harvey, S. Iwata, and V. S. Mirrokni. Approximating submodular functions everywhere, SODA, 2009.

[31] Z. Svitkina and L. Fleischer. Submodular approximation: Sampling based algorithms and lower bounds, FOCS, 2008.