# Biased Games

Ioannis Caragiannis (University of Patras, caragian@ceid.upatras.gr), David Kurokawa (Carnegie Mellon University, dkurokaw@cs.cmu.edu), Ariel D. Procaccia (Carnegie Mellon University, arielpro@cs.cmu.edu)

**Abstract.** We present a novel extension of normal form games that we call biased games. In these games, a player's utility is influenced by the distance between his mixed strategy and a given base strategy. We argue that biased games capture important aspects of the interaction between software agents. Our main result is that biased games satisfying certain mild conditions always admit an equilibrium. We also tackle the computation of equilibria in biased games.

## 1 Introduction

Since the seminal work of Nash (1950), the notion of Nash equilibrium has been the cornerstone of game theory (and its interaction with AI, and computer science more broadly). Nash equilibrium is most commonly employed as a solution concept for normal form games, which are defined by a set of $n$ players, their sets of pure strategies, and their payoff functions, which map profiles of pure strategies (reflecting the choice of each player) to payoffs. In a Nash equilibrium, each player's choice of strategy is a best response to the other players' strategies, in the sense that the player cannot secure a higher payoff by deviating to a different strategy.

Nash's key mathematical contribution was to show that if players have the option of playing mixed strategies (probability distributions over pure strategies), then a Nash equilibrium always exists. More specifically, the payoffs under a mixed strategy profile are simply the expected payoffs. To compute this expectation, note that the probability of each possible pure strategy profile $(s_1, \ldots, s_n)$ is the product of the probabilities of each player $i$ playing strategy $s_i$, and the payoffs for this pure strategy profile are given by the players' payoff functions. Importantly, even when players play mixed strategies, their utilities are completely determined by the payoff functions, which only take pure strategies as input.

In this paper, we are interested in a fundamentally different way in which a player's mixed strategy can directly affect his utility. Specifically, the player may be biased towards (or away from) a specific base strategy, so his utility may also depend on the distance between his mixed strategy and the base strategy.

We believe that these issues have been largely overlooked, as normal form games are typically seen as one-shot interactions: the mixed strategy is only important insofar as it selects a pure strategy. However, the mixed strategy itself can play a more significant role in some settings, justifying the preceding notions of bias.

In computational environments (e.g., networks), the mixed strategies of software agents can be encoded as programs that are submitted to a server, and therefore the mixed strategies themselves are visible to certain parties. Such game-theoretic settings were nicely motivated in the work of Rozenfeld and Tennenholtz (2007), and their justification is also implicit in the earlier work of Tennenholtz (2004).
Once mixed strategies are visible, bias towards certain mixed strategies can arise due to social norms (agents are expected to play certain strategies), mediation (Monderer and Tennenholtz 2009; Rozenfeld and Tennenholtz 2007) (agents are told to play certain strategies), and privacy (certain mixed strategies reveal more about the agent's preferences than others).

In other settings, mixed strategies may be instantiated multiple times before the agents actually interact. For example, security games (Tambe 2012) are 2-player games played by a defender and an attacker. The defender's strategy specifies a random allocation of security resources to potential targets, and the attacker's strategy pinpoints a target that will be attacked. It is typically assumed that the defender plays a mixed strategy for a period of time before the attacker makes his move. The crux of this example is that redeploying security resources (such as boats, in the case of the US Coast Guard) is costly, and different instantiations of a mixed strategy lead to different deployments. This can bias the defender, say, towards pure strategies, or away from high-entropy strategies. While security games are often viewed as Stackelberg games (where the defender moves first), they have also been studied as normal form games that are solved using Nash equilibrium (Korzhyk et al. 2011).

Rather than representing an agent's actual utility, a bias towards a specific strategy may be hard-coded in order to lead the agent closer to certain behaviors that are known to be beneficial. For example, state-of-the-art computer poker agents rely on computing an approximation of one equilibrium strategy (Sandholm 2010; Billings et al. 2003). One can bias agents towards strategies that, e.g., maximize opponent exploitation or minimize exploitability, and directly find an equilibrium of the biased game (as we will show). This approach is somewhat reminiscent of the work of Johanson et al. (2007), but is fundamentally different from most existing approaches to opponent exploitation, which either create a model of the opponent's strategy that is biased towards a precomputed equilibrium strategy (of the original game), such that it is consistent with the agent's observations of its play thus far (Ganzfried and Sandholm 2011), or perform the bias on the agent's own strategy, requiring its exploitative strategy to be close to an equilibrium strategy of the original game (Ganzfried and Sandholm 2012).

### 1.1 Overview of Our Model and Results

Our definition of a biased game starts from a normal form game, but lets the utility of player $i$ for a strategy profile $p = (p_1, \ldots, p_n)$ be his (expected) payoff in the normal form game, minus a bias term of the form $f_i(\|p_i - \bar{p}_i\|)$. Here, $f_i(\cdot)$ is some real-valued function (possibly 0 everywhere), $\|\cdot\|$ is a norm, and $\bar{p}_i$ is the base strategy of $i$. Given a biased game that includes all of these components, we can define an equilibrium in the same way that Nash equilibrium is defined (each strategy is a best response to the other strategies). We remark that a biased game and its equilibria cannot be captured as a normal form game and its Nash equilibria, due to the potential nonlinearity of the utilities. From a mathematical viewpoint, we believe that biased games are a very natural (strict) generalization of normal form games. And from a conceptual viewpoint, as we have argued above, they provide a new perspective on important, AI-related problems.
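To make the bias term concrete, here is a minimal sketch in Python with NumPy of how a player's utility in a two-player biased game would be evaluated. The function name, the quadratic choice of $f_i$, and the specific numbers are illustrative assumptions on our part, not part of the model:

```python
import numpy as np

def biased_utility(A, p_own, p_other, base, f):
    """Utility of a player in a two-player biased game: the expected payoff
    p_own^T A p_other, minus the bias term f(||p_own - base||_2)."""
    payoff = p_own @ A @ p_other              # bilinear expected payoff
    bias = f(np.linalg.norm(p_own - base))    # f applied to an L2 distance
    return payoff - bias

# Illustrative numbers: a 2x2 identity payoff matrix and a quadratic f.
A = np.eye(2)
p_own, p_other = np.array([0.5, 0.5]), np.array([0.25, 0.75])
base = np.array([0.75, 0.25])
print(biased_utility(A, p_own, p_other, base, f=lambda t: t ** 2))  # 0.375
```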
In Section 3 we prove our main result: the existence of equilibria in biased games. Specifically, we show that an equilibrium always exists if each $f_i$ is a non-decreasing continuous convex function and the norms are all $L_p$ norms. We also construct a biased game that satisfies these conditions, except for having decreasing $f_i$ functions, in which equilibria do not exist. In Section 4 we make some progress on the computation of equilibria in biased games, by (significantly) generalizing a basic algorithm for Nash equilibrium computation.

### 1.2 Related Work

A line of work that is somewhat related to ours includes the classic paper on psychological games (Geanakoplos, Pearce, and Stacchetti 1989), and influential papers that extended these ideas (see, e.g., (Battigalli and Dufwenberg 2009)). Psychological games are extensions of normal form games, where each player's utility depends on: (i) the mixed strategies played by all players (as usual), and (ii) his beliefs. A psychological game also includes, for each player, a coherent response mapping, which maps the beliefs held by others and the strategies played by others to the player's belief. A psychological equilibrium consists of a vector of beliefs and a vector of strategies, such that the beliefs are coherent, and given these beliefs, the strategies form a Nash equilibrium. Of course, in order to obtain technical results (such as equilibrium existence), one must restrict the structure of these various components of the game. One may wonder, though, whether our bias term can be encoded in a player's beliefs. This is not the case, because in psychological games the utility functions are defined in a way that, when the beliefs are fixed, we obtain a normal form game. That is, the beliefs only play a role to the extent that they determine the payoffs in the induced normal form game. Therefore, the psychological games framework cannot model our novel notion of bias.

## 2 Our Model

Before we begin, it will be helpful to establish some notation which we will use throughout the paper:

- For $m \in \mathbb{N}$: $[m] = \{1, 2, \ldots, m\}$.
- $P_n = \{x \in \mathbb{R}^n \mid \sum_{i=1}^n x_i = 1,\ \forall i : x_i \geq 0\}$ is the space of allowed probability vectors in $n$ dimensions.
- Given a vector $x \in \mathbb{R}^n$, we define $(y, x_{-i})$ as $(x_1, x_2, \ldots, x_{i-1}, y, x_{i+1}, x_{i+2}, \ldots, x_n)$.

We begin with the definition of the centerpiece of study in game theory.

**Definition 1.** A normal form game with $n$ players is given by:

- An $n$-tuple of finite strategy spaces $(S_1, S_2, \ldots, S_n)$.
- An $n$-tuple of multilinear payoff functions $(T_1, T_2, \ldots, T_n)$, where $T_i : P_{|S_1|} \times P_{|S_2|} \times \cdots \times P_{|S_n|} \to \mathbb{R}$ is linear in each of its $n$ parameters for all $i$.

In such a game, player $i$ plays a mixed strategy, which is a probability distribution $p_i \in P_{|S_i|}$ over his finite strategy space $S_i$, and receives an (expected) utility of $T_i(p)$, where $p = (p_1, p_2, \ldots, p_n)$. Alternatively, one can think of the payoff functions as assigning a payoff to every profile of pure strategies in $S_1 \times \cdots \times S_n$. These functions induce payoffs for a mixed strategy profile by calculating expected payoffs. We will find it more convenient, though, to think of the domain of the functions $T_i$ as mixed strategy profiles.

Importantly, when using a normal form game to analyze or predict the actions of agents, it is sensible to assume that they act rationally: they will maximize their utility to the best of their ability. As a result, the equilibrium notion where each agent $i$ best responds to the others' strategies by choosing a distribution $p_i$ such that $T_i(p) \geq \max_{p'_i} T_i(p'_i, p_{-i})$ (i.e., such that no other mixed strategy gives a higher payoff) is a natural focal point of study. This is known as a Nash equilibrium of the normal form game, and is so named due to Nash's proof of its existence in any such game (Nash 1950).
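In a normal form game, linearity means that the maximum over mixed strategies is always attained at some pure strategy, so a best response can be found by inspecting the rows of the payoff matrix. The sketch below (Python/NumPy; the helper name is ours) illustrates this for a two-player game. As discussed in Section 4, it is precisely this property that fails in biased games:

```python
import numpy as np

def pure_best_response(A, p_other):
    """A best response of the row player with payoff matrix A against the
    column player's mixed strategy p_other. Since T_i is linear in p_i,
    some pure strategy always attains the maximum expected payoff."""
    expected = A @ p_other               # expected payoff of each row
    best = np.zeros(A.shape[0])
    best[np.argmax(expected)] = 1.0      # play a maximizing row with prob. 1
    return best

# Matching pennies: against a 50-50 opponent every strategy is a best
# response, and the function returns (one of) the pure ones.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(pure_best_response(A, np.array([0.5, 0.5])))  # [1. 0.]
```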
The main focus of our paper is on an extension of normal form games that we call biased games. Informally, a biased game is a normal form game in which the utilities of the players are determined by a multilinear payoff function (as in the standard setting), combined with a function of the distance of the player's strategy from some base strategy.

**Definition 2.** A biased game with $n$ players includes an $n$-tuple of finite strategy spaces $(S_1, S_2, \ldots, S_n)$. For all $i$, player $i$'s utility for a mixed strategy profile is given by

$$u_i(p) := T_i(p) - f_i(\|p_i - \bar{p}_i\|),$$

where:

- $p = (p_1, p_2, \ldots, p_n)$.
- The $T_i$ are multilinear payoff functions, as in the classic setting.
- The $f_i$ are real-valued functions.
- The norms are any valid norms (possibly different for different players; we do not attach an index to the norm to avoid confusion, as we later use subscripts for $L_p$ norms and superscripts for exponents).
- The $\bar{p}_i$ describe base strategies in $\mathbb{R}^{|S_i|}$; it is not necessary, but intuitive, that they be in $P_{|S_i|}$.

For ease of exposition, we refer to $f_i(\|p_i - \bar{p}_i\|)$ as the bias term. Notice that despite the name biased game, we allow all $f_i = 0$, which reduces immediately to the definition of a standard game.

**Definition 3.** An equilibrium of a biased game is a strategy profile $p$ such that $u_i(p) \geq u_i(q_i, p_{-i})$ for all $i \in [n]$ and all $q_i \in P_{|S_i|}$.

Below we illustrate these definitions using a simple example.

**Example 1.** Let us consider a security game with two players, the defender and the attacker, and two targets. The defender has one resource that can cover a single target. He receives a utility of 1 if he catches the attacker and 0 otherwise. Similarly, the attacker receives a utility of 1 if he attacks an undefended target and 0 otherwise. Now suppose the defender has a base strategy of $[3/4, 1/4]^\top$ (for pedagogical purposes). The utilities may then be described by:

$$u_1(p_1, p_2) = p_1^\top \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} p_2 - 2 \left\| p_1 - \begin{pmatrix} 3/4 \\ 1/4 \end{pmatrix} \right\|_2^2, \qquad u_2(p_1, p_2) = p_1^\top \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} p_2.$$

If $p_1 = [x, 1-x]^\top$ and $p_2 = [y, 1-y]^\top$, then simple analysis shows that simultaneously maximizing $u_1$ w.r.t. $x$ and $u_2$ w.r.t. $y$ is equivalent to:

$$x = \frac{2y + 5}{8}, \qquad y = \begin{cases} 1 & \text{if } x < 1/2 \\ 0 & \text{if } x > 1/2 \\ \text{anything} & \text{otherwise.} \end{cases}$$

We can then see that the only equilibrium is $x = 5/8$ and $y = 0$.
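The equilibrium of Example 1 is easy to verify numerically. The following sketch (Python with NumPy/SciPy; purely illustrative) fixes the attacker at $y = 0$ and maximizes the defender's utility $u_1$ over $x \in [0, 1]$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def u1(x, y):
    """Defender's utility in Example 1 with p1 = [x, 1-x], p2 = [y, 1-y]."""
    payoff = x * y + (1 - x) * (1 - y)                    # p1^T I p2
    bias = 2 * ((x - 0.75) ** 2 + ((1 - x) - 0.25) ** 2)  # 2*||p1 - base||_2^2
    return payoff - bias

# Against y = 0 the defender's best response is x = 5/8; since 5/8 > 1/2,
# attacking target 2 (y = 0) is indeed a best response for the attacker.
res = minimize_scalar(lambda x: -u1(x, 0.0), bounds=(0.0, 1.0), method="bounded")
print(round(res.x, 4))  # 0.625
```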
## 3 Existence of Equilibria

In this section we examine the question of the existence of equilibria in biased games, and prove our main result:

**Theorem 1.** Every biased game in which $f_i$ is a non-decreasing continuous convex function for all $i \in [n]$, and the norms are all $L_p$ norms, has an equilibrium.

*Proof.* Consider the function $h(p) = (q_1, q_2, \ldots, q_n)$, where $p_i, q_i \in P_{|S_i|}$ are probability distributions over $i$'s actions and $q_i = \arg\max_{p'_i} v_i(p'_i, p)$ with

$$v_i(p'_i, p) = u_i(p'_i, p_{-i}) - \|p'_i - p_i\|_2^2.$$

We first show that this function is well-defined (that is, the argmax produces exactly one probability distribution) and continuous, and thus, because it acts upon a convex compact set, must have a fixed point by Brouwer's theorem. We then proceed to show that any fixed point of $h$ must be an equilibrium.

**Lemma 1.** $h$ is well-defined.

*Proof.* Let $i$ be given. We show that $q_i$ is well-defined. Since $v_i(p'_i, p)$, as a function of $p'_i$, is a continuous function on a compact space, it must achieve its maximum. It therefore suffices to show that there exists a unique maximizer of $v_i$. Suppose for the purposes of contradiction that there exist two such maximizers, denoted $x$ and $y$. Then let $\alpha \in (0, 1)$ and $z = \alpha x + (1 - \alpha) y$. Now consider the value $v_i$ would achieve at $z$:

$$v_i(z, p) = u_i(z, p_{-i}) - \|z - p_i\|_2^2 = T_i(z, p_{-i}) - f_i(\|z - \bar{p}_i\|) - \|z - p_i\|_2^2.$$

Specifically, let us consider each term separately:

$$T_i(z, p_{-i}) = T_i(\alpha x + (1-\alpha) y, p_{-i}) = \alpha T_i(x, p_{-i}) + (1-\alpha) T_i(y, p_{-i}).$$

$$f_i(\|z - \bar{p}_i\|) = f_i(\|\alpha x + (1-\alpha) y - \alpha \bar{p}_i - (1-\alpha) \bar{p}_i\|) = f_i(\|\alpha (x - \bar{p}_i) + (1-\alpha)(y - \bar{p}_i)\|) \leq f_i(\alpha \|x - \bar{p}_i\| + (1-\alpha) \|y - \bar{p}_i\|) \leq \alpha f_i(\|x - \bar{p}_i\|) + (1-\alpha) f_i(\|y - \bar{p}_i\|),$$

where we have used the triangle inequality (together with the fact that $f_i$ is non-decreasing) and the definition of convexity.

$$\|z - p_i\|_2^2 = \|\alpha x + (1-\alpha) y - p_i\|_2^2 = \|\alpha (x - p_i) + (1-\alpha)(y - p_i)\|_2^2 \leq \left(\alpha \|x - p_i\|_2 + (1-\alpha) \|y - p_i\|_2\right)^2 \leq \alpha \|x - p_i\|_2^2 + (1-\alpha) \|y - p_i\|_2^2,$$

where we have again used the triangle inequality and the definition of convexity. Importantly, there are two critical differences between this term and the previous one. First, in this case the first inequality (created by the use of the triangle inequality of $\|\cdot\|_2$) achieves equality if and only if $x - p_i = \beta (y - p_i)$ for some $\beta \geq 0$. This is a well-known fact about $L_p$ norms (due to the biconditional conditions for equality in Minkowski's inequality), and intuitively the condition is equivalent to the two vectors facing the same direction. Second, in this case the second inequality (created by the use of convexity) is over a strictly convex function. We thus have equality if and only if $\|x - p_i\|_2 = \|y - p_i\|_2$; that is, the two vectors have the same magnitude. From these two observations, we see that we have equality if and only if $x - p_i = y - p_i$, or more explicitly, $x = y$. As we are assuming this is not the case, we have strict inequality.

With this analysis of each term in separation, we get the following:

$$v_i(z, p) = T_i(z, p_{-i}) - f_i(\|z - \bar{p}_i\|) - \|z - p_i\|_2^2 > \alpha T_i(x, p_{-i}) + (1-\alpha) T_i(y, p_{-i}) - \alpha f_i(\|x - \bar{p}_i\|) - (1-\alpha) f_i(\|y - \bar{p}_i\|) - \alpha \|x - p_i\|_2^2 - (1-\alpha) \|y - p_i\|_2^2 = \alpha v_i(x, p) + (1-\alpha) v_i(y, p) = \alpha v_i(x, p) + (1-\alpha) v_i(x, p) = v_i(x, p).$$

Thus, $z$ is strictly better than the assumed maximizers $x$ and $y$! This is a clear contradiction. (Lem 1)

**Lemma 2.** $h$ is continuous.

*Proof.* We first establish some notation which we will use throughout the proof:

- $P$ will denote the set of values that $p$ can be drawn from.
- $\|\cdot\|$ will be the $L_2$ norm for vectors in $\mathbb{R}^n$, and for $p \in P$, $\|p\|^2 = \sum_{i=1}^n \|p_i\|^2$. The induced metrics of these norms will allow us to formally discuss continuity.
- $p \in P$ will denote the arbitrary input point to $h$ at which we wish to show continuity.

Now it suffices to show that $q_i$ is continuous at $p$ for arbitrary $i$. Suppose $\varepsilon > 0$ is given. We wish to show that there exists a $\delta$ such that $\|p - p'\| < \delta$ implies $\|q_i - q'_i\| < \varepsilon$, where $q_i = \arg\max_x v_i(x, p)$ and $q'_i = \arg\max_x v_i(x, p')$.

Let $X = \{x \mid \|x - q_i\| < \varepsilon\}$ and assume that $X^c$ is nonempty (if this is not the case, decrease $\varepsilon$ until it is). Furthermore, as $X^c$ is a closed subset of a compact set, it is compact as well. Now note that $v_i(y, p)$ is a continuous function of $y$ and therefore achieves its maximum on the compact set $X^c$; denote this maximum by $M_\varepsilon$. Importantly, Lemma 1 shows that $M > M_\varepsilon$, where $M = \max_x v_i(x, p) = v_i(q_i, p)$.

We now further observe that $v_i(x, q)$ is a continuous function of $q$ on a compact set and thus, by the Heine-Cantor theorem, is uniformly continuous. We therefore have that there exists some $\delta > 0$ such that if $\|p - p'\| < \delta$ then

$$|v_i(x, p) - v_i(x, p')| < \frac{M - M_\varepsilon}{2}$$

for all $x$. Two important consequences of this are:

- $v_i(q_i, p') = v_i(q_i, p) - v_i(q_i, p) + v_i(q_i, p') \geq v_i(q_i, p) - |v_i(q_i, p) - v_i(q_i, p')| > M - \frac{M - M_\varepsilon}{2} = \frac{M + M_\varepsilon}{2}$.
- For all $y \in X^c$: $v_i(y, p') = v_i(y, p) - v_i(y, p) + v_i(y, p') \leq v_i(y, p) + |v_i(y, p) - v_i(y, p')| < M_\varepsilon + \frac{M - M_\varepsilon}{2} = \frac{M + M_\varepsilon}{2}$.

Together, these imply that $v_i(q_i, p') > v_i(y, p')$ for all $y \in X^c$, and so $\arg\max_x v_i(x, p') \in X$. Thus, $\|q_i - q'_i\| < \varepsilon$. (Lem 2)

We have shown that $h$ is indeed well-defined and continuous. By Brouwer's fixed point theorem, it must therefore have a fixed point.
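The map $h$ from the proof is easy to realize numerically. Below is a sketch (Python with NumPy/SciPy; the solver choice and all names are ours) that computes each $q_i = \arg\max_x v_i(x, p)$ with an off-the-shelf concave maximizer and iterates $h$ on the game of Example 1. Note that Brouwer's theorem only guarantees that a fixed point exists, not that iterating $h$ converges; on this particular game the iteration happens to settle on the equilibrium:

```python
import numpy as np
from scipy.optimize import minimize

def argmax_v(A, mine, other, base, f):
    """q_i = argmax over the simplex of v_i(x, p) = x^T A other
    - f(||x - base||_2) - ||x - mine||_2^2; the quadratic proximal term
    makes the inner problem strictly concave (cf. Lemma 1)."""
    def neg_v(x):
        return -(x @ A @ other - f(np.linalg.norm(x - base))
                 - np.sum((x - mine) ** 2))
    res = minimize(neg_v, mine, bounds=[(0.0, 1.0)] * len(mine),
                   constraints=({"type": "eq", "fun": lambda x: x.sum() - 1.0},),
                   method="SLSQP")
    return res.x

# Example 1: the defender has bias 2*||p1 - (3/4, 1/4)||_2^2, the attacker none.
A1, A2 = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
f1, f2 = (lambda t: 2.0 * t ** 2), (lambda t: 0.0)
base1, base2 = np.array([0.75, 0.25]), np.zeros(2)
p1, p2 = np.full(2, 0.5), np.full(2, 0.5)
for _ in range(100):
    p1, p2 = (argmax_v(A1, p1, p2, base1, f1),
              argmax_v(A2.T, p2, p1, base2, f2))
print(np.round(p1, 3), np.round(p2, 3))  # ~[0.625 0.375] [0. 1.]
```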
Our next lemma demonstrates that such fixed points are equilibria.

**Lemma 3.** Any fixed point of $h$ represents an equilibrium of the biased game.

*Proof.* Suppose for the purposes of contradiction that we have a fixed point of $h$ that is not an equilibrium. That is, suppose $h(p) = p$ and there exists some $i$ such that player $i$ gains by unilaterally deviating from $p_i$ to $p'_i$. Now let $\alpha \in [0, 1]$ and $z = \alpha p'_i + (1 - \alpha) p_i$. Then we have:

$$v_i(z, p) = u_i(z, p_{-i}) - \|z - p_i\|_2^2 = T_i(z, p_{-i}) - f_i(\|z - \bar{p}_i\|) - \|z - p_i\|_2^2.$$

Similarly to the proof of Lemma 1, we find that:

$$T_i(z, p_{-i}) = \alpha T_i(p'_i, p_{-i}) + (1-\alpha) T_i(p_i, p_{-i}),$$
$$f_i(\|z - \bar{p}_i\|) \leq \alpha f_i(\|p'_i - \bar{p}_i\|) + (1-\alpha) f_i(\|p_i - \bar{p}_i\|),$$
$$\|z - p_i\|_2^2 = \|\alpha p'_i + (1-\alpha) p_i - p_i\|_2^2 = \|\alpha (p'_i - p_i)\|_2^2 = \alpha^2 \|p'_i - p_i\|_2^2.$$

We therefore find that:

$$v_i(z, p) = T_i(z, p_{-i}) - f_i(\|z - \bar{p}_i\|) - \|z - p_i\|_2^2 \geq \alpha T_i(p'_i, p_{-i}) + (1-\alpha) T_i(p_i, p_{-i}) - \alpha f_i(\|p'_i - \bar{p}_i\|) - (1-\alpha) f_i(\|p_i - \bar{p}_i\|) - \alpha^2 \|p'_i - p_i\|_2^2 = \alpha u_i(p'_i, p_{-i}) + (1-\alpha) u_i(p_i, p_{-i}) - \alpha^2 \|p'_i - p_i\|_2^2.$$

Now let us consider this lower bound, which we call LB. For $\alpha = 0$ we have $\mathrm{LB} = u_i(p_i, p_{-i})$, which is tight (as $z = p_i$ in this case); that is, we have equality. Moreover,

$$\frac{\partial}{\partial \alpha} \mathrm{LB} \Big|_{\alpha = 0} = u_i(p'_i, p_{-i}) - u_i(p_i, p_{-i}).$$

Importantly, we assumed that it was advantageous for player $i$ to deviate from $p_i$ to $p'_i$, and thus we have that $u_i(p'_i, p_{-i}) - u_i(p_i, p_{-i}) > 0$. Therefore, there exists a small $\alpha > 0$ such that

$$v_i(z, p) > u_i(p_i, p_{-i}) = u_i(p_i, p_{-i}) - \|p_i - p_i\|_2^2 = v_i(p_i, p).$$

This is a clear contradiction, since $h(p) = p$ implies that $p_i$ maximizes $v_i(\cdot, p)$. (Lem 3)

To conclude, we have argued that $h$ has a fixed point, and every fixed point is an equilibrium of the biased game. This completes the proof of the theorem. (Thm 1)

Let us now examine the assumptions that Theorem 1 makes. The assumptions that the $f_i$ are continuous and convex, and that the norms are $L_p$ norms, can be seen as mild technical assumptions. But the assumption that the $f_i$ are non-decreasing has more conceptual bite. What would a biased game with decreasing $f_i$ look like? For example, recall our discussion of security games, where a mixed strategy equates to a potentially costly redeployment of resources. One possible interpretation is to set the base strategy to be the uniform distribution (that is, a uniformly random assignment of resources), and let the defender be biased away from this strategy; that is, the $f_i$ functions are negative and strictly decreasing.

Unfortunately, it turns out that Theorem 1 is simply not true for decreasing $f_i$ functions: a biased game may not admit equilibria even when the other assumptions hold, i.e., the $f_i$ are continuous and the norms are $L_p$ norms. Below we construct such an example.

**Example 2.** Suppose we have a two-player game with

$$T_1 = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 0 & 4 \\ 0 & 0 \end{pmatrix}, \quad B_1 = -\left\| p_1 - \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\|_2^2, \quad B_2 = -\left\| p_2 - \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\|_2^2,$$

where the $B_i$ describe the bias terms, so that $u_i(p_1, p_2) = p_1^\top T_i\, p_2 - B_i$. Writing $p_1 = [x, 1-x]^\top$ and $p_2 = [y, 1-y]^\top$, the utility of the (row) player 1 as a function of $x$ and $y$ is

$$u_1(x, y) = 2(x^2 + 2(y - 1)x + 1).$$

Similarly, the utility of the (column) player 2 is

$$u_2(x, y) = 2(y^2 - 2xy + 2x).$$

Now note that as $u_1$ is an upward-facing parabola in $x$, its maximum over the set $x \in [0, 1]$ is reached at one of the endpoints (i.e., $x \in \{0, 1\}$). So let us consider these two cases.

Suppose first that $x = 0$. Then $u_2(x, y) = u_2(0, y) = 2y^2$, and so $u_2$ is maximized for $y \in [0, 1]$ when $y = 1$. However, this implies that $u_1(x, y) = u_1(x, 1) = 2x^2 + 2$, and thus $u_1$ is maximized when $x = 1$: a contradiction.

Now suppose instead that $x = 1$. Then $u_2(x, y) = u_2(1, y) = 2(y^2 - 2y + 2)$, and so $u_2$ is maximized for $y \in [0, 1]$ when $y = 0$. However, this implies that $u_1(x, y) = u_1(x, 0) = 2(x^2 - 2x + 1)$, and thus $u_1$ is maximized when $x = 0$: a contradiction.
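The best-response cycle in Example 2 can be traced mechanically. A small sketch (Python/NumPy, illustrative only) restricts attention to the endpoint strategies, as justified above, and follows the players' best responses:

```python
def u1(x, y): return 2 * (x ** 2 + 2 * (y - 1) * x + 1)
def u2(x, y): return 2 * (y ** 2 - 2 * x * y + 2 * x)

# Both utilities are upward-facing parabolas in the player's own variable,
# so best responses over [0, 1] lie at the endpoints {0, 1}.
x = 0.0
for _ in range(3):
    y = max((0.0, 1.0), key=lambda t: u2(x, t))      # player 2's best response
    x_new = max((0.0, 1.0), key=lambda t: u1(t, y))  # player 1's best response
    print(f"x={x:.0f} -> y={y:.0f} -> x={x_new:.0f}")
    x = x_new
# Prints x=0 -> y=1 -> x=1, then x=1 -> y=0 -> x=0, and so on: a cycle,
# matching the case analysis showing that no equilibrium exists.
```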
## 4 Computation of Equilibria

In this section we investigate the computation of equilibria in biased games. From the outset our expectations are quite low, as even in normal form games, computing a Nash equilibrium is computationally hard (Daskalakis, Goldberg, and Papadimitriou 2009; Conitzer and Sandholm 2008). However, there are various equilibrium computation algorithms that work well on average or in special cases, such as the famous Lemke-Howson (1964) algorithm.

A major challenge in our (much more general) setting is that algorithms for computing Nash equilibria in normal form games typically rely on the property that if $p$ is a Nash equilibrium and $s_i$ is a pure strategy in the support of $p_i$, then $s_i$ is itself a best response to $p_{-i}$. This is clearly not the case when it comes to biased games and their equilibria, as can be seen in Example 1, where pure strategies in the support of $p_1$ are never best responses to $p_2$.

To further illustrate the difficulties that one encounters in our setting, we note that in normal form games with two players and rational payoffs, there exist Nash equilibria defined by rational probabilities (Nash 1950), and this is, of course, true if there is only one player. In contrast, below we give an example of a biased game with one player and rational parameters whose only equilibrium is irrational. Note that this also demonstrates that our setting strictly generalizes the standard one.

**Example 3.** Consider the game where the sole player has two strategies with payoffs 1 and 0, and the bias term is $\|p - [1/2, 1/2]^\top\|_2^4$. A simple analysis then yields that the sole equilibrium is to play the first strategy with probability $\frac{1}{2} + \frac{1}{2\sqrt[3]{2}}$.

Due to these difficulties, we focus on certain subsets of biased games (which, in particular, circumvent Example 3). Specifically, we consider the two-player (and later, more generally, the $n$-player) setting with a bias term of the form $c\|\cdot\|_1$ or $c\|\cdot\|_2^2$, where $c \geq 0$ is some constant. Crucially, this still generalizes the classic setting. Our goal is to generalize the (extremely simple) support enumeration algorithm for computing Nash equilibria (see, e.g., (von Stengel 2007, Algorithm 3.4)).

Let us first consider the $L_2$ case: player $i$ has a bias term of the form $c_i \|\cdot\|_2^2$, where $c_i \geq 0$. Recall that if the strategy of the other player is fixed, then player $i$ simply wishes to maximize his utility $u_i$. That is, for every player $i$, we wish to have that $u_i$ is maximized subject to the constraints that the entries of $p_i$ are nonnegative and sum to one. The Karush-Kuhn-Tucker (KKT) conditions on a player's utility then give necessary and sufficient conditions for maximization; sufficiency is due to the concavity of the objective and the affine nature of the constraints. Thus, equilibrium computation is equivalent to solving the following system. For all $i$ and pure strategies $j$ of $i$:

$$p_{i,j} \geq 0, \qquad \mu_{i,j} \geq 0, \qquad \mu_{i,j}\, p_{i,j} = 0, \qquad p_i^\top \mathbf{1} = 1,$$
$$\text{STANDARD}(i, j) - 2\,\text{BIAS}(i, j) + \lambda_i + \mu_{i,j} = 0,$$

where $\text{STANDARD}(i, j) = \frac{\partial}{\partial p_{i,j}} T_i(p_1, p_2)$ and $\text{BIAS}(i, j) = c_i (p_{i,j} - \bar{p}_{i,j})$.

Crucially, aside from the $\mu_{i,j}\, p_{i,j} = 0$ conditions, this complete characterization is a linear feasibility program. We can thus consider the $2^{|S_1| + |S_2|}$ possibilities of which of $\mu_{i,j}$ and $p_{i,j}$ is zero (recall that $|S_i|$ is the number of pure strategies of player $i$) to find the equilibria. That is, for every player $i$ and strategy $j$ of $i$, we set one of $\mu_{i,j}$ and $p_{i,j}$ to zero and solve the resulting linear program. This computes an equilibrium exactly (albeit in exponential time).
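Here is a sketch of this procedure for the $\|\cdot\|_2^2$ case (Python with NumPy/SciPy; the variable layout and helper names are implementation choices of ours, not from the paper): for each guess of which of $p_{i,j}$ and $\mu_{i,j}$ vanishes, we assemble the stationarity and simplex conditions into one linear feasibility program and hand it to an LP solver.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def biased_equilibrium_l2(A1, A2, c1, c2, b1, b2):
    """Support enumeration for a two-player biased game with bias terms
    c_i * ||p_i - b_i||_2^2. Variables: p1, p2, mu1, mu2, lambda1, lambda2."""
    m, n = A1.shape
    nv = 2 * (m + n) + 2                      # lambda1, lambda2 sit at the end
    rows, rhs = [], []
    r = np.zeros(nv); r[:m] = 1.0             # p1 on the simplex
    rows.append(r); rhs.append(1.0)
    r = np.zeros(nv); r[m:m + n] = 1.0        # p2 on the simplex
    rows.append(r); rhs.append(1.0)
    for j in range(m):    # STANDARD(1,j) - 2*BIAS(1,j) + lambda1 + mu_{1,j} = 0
        r = np.zeros(nv)
        r[m:m + n] = A1[j]; r[j] = -2 * c1; r[m + n + j] = 1.0; r[-2] = 1.0
        rows.append(r); rhs.append(-2 * c1 * b1[j])
    for j in range(n):    # STANDARD(2,j) - 2*BIAS(2,j) + lambda2 + mu_{2,j} = 0
        r = np.zeros(nv)
        r[:m] = A2[:, j]; r[m + j] = -2 * c2; r[2 * m + n + j] = 1.0; r[-1] = 1.0
        rows.append(r); rhs.append(-2 * c2 * b2[j])
    base_A, base_b = np.array(rows), np.array(rhs)
    bounds = [(0, None)] * (2 * (m + n)) + [(None, None)] * 2
    for pattern in itertools.product([0, 1], repeat=m + n):
        extra = []
        for k, bit in enumerate(pattern):     # bit 0: p_{i,j} = 0; bit 1: mu = 0
            r = np.zeros(nv); r[k if bit == 0 else m + n + k] = 1.0
            extra.append(r)
        res = linprog(np.zeros(nv), A_eq=np.vstack([base_A] + extra),
                      b_eq=np.concatenate([base_b, np.zeros(m + n)]),
                      bounds=bounds, method="highs")
        if res.success:
            return res.x[:m], res.x[m:m + n]
    return None

# Example 1: c1 = 2 with base (3/4, 1/4) for the defender, no bias (c2 = 0)
# for the attacker; this recovers p1 = (5/8, 3/8), p2 = (0, 1).
A1, A2 = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
print(biased_equilibrium_l2(A1, A2, 2.0, 0.0, np.array([0.75, 0.25]), np.zeros(2)))
```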
Dealing with bias terms of the form $c_i \|\cdot\|_1$, where $c_i \geq 0$, is largely analogous. The important difference arises from the discontinuity of the derivative of the $L_1$ norm. Via a simple case analysis, which we omit here, we see that for all $i$ and pure strategies $j$ of $i$:

$$p_{i,j} \geq 0, \qquad \mu_{i,j} \geq 0, \qquad \mu_{i,j}\, p_{i,j} = 0,$$

and at least one of two discontinuity-caused requirements must be satisfied: either

$$\text{STANDARD}(i, j) - c_i + \lambda_i + \mu_{i,j} = 0 \quad \text{and} \quad \text{BIAS}(i, j) \geq 0,$$

or

$$\text{STANDARD}(i, j) + c_i + \lambda_i + \mu_{i,j} = 0 \quad \text{and} \quad \text{BIAS}(i, j) \leq 0,$$

where the sign condition on $\text{BIAS}(i, j) = c_i(p_{i,j} - \bar{p}_{i,j})$ fixes, in each branch, the sign of the derivative of the $L_1$ term. Therefore, to compute an equilibrium when all players have $c\|\cdot\|_1$ bias terms, we can consider the $4^{|S_1| + |S_2|}$ linear programs that arise. That is, for every player $i$ who has a $c\|\cdot\|_1$ bias term and strategy $j$ of $i$, we set one of $\mu_{i,j}$ and $p_{i,j}$ to zero (as before) and also determine which of the two discontinuity-caused requirements to include in our linear program; the players who have $c\|\cdot\|_2^2$ bias terms are dealt with as before. The next statement summarizes the discussion above.

**Theorem 2.** Given a biased game with two players and bias terms of the form $c_i \|p_i - \bar{p}_i\|_2^2$ or $c_i \|p_i - \bar{p}_i\|_1$, an equilibrium can be computed by solving at most $4^{|S_1| + |S_2|}$ linear programs.

Although we have discussed the algorithm only for the two-player setting, this approach can be extended to multiple players at the cost of solving multilinear programs instead of linear ones. The discrepancy is caused by the $\text{STANDARD}(i, j)$ term: the derivative in its expression is multilinear for $n \geq 3$. This is a generalization of the sequential approach discussed by Widger and Grosu (2009) for normal form games.

## 5 Discussion

In our model of biased games, a player's bias term is $f_i(\|p_i - \bar{p}_i\|)$. Consider a more general definition of the bias term, $f_i(\|p_i - \bar{p}_i + \sum_{j \in [n]} \gamma_{i,j} p_j\|)$, where the $\gamma_{i,j}$ are real weights such that $\gamma_{i,j} = 0$ if players $i$ and $j$ do not have the same number of pure strategies. The new term $\sum_{j \in [n]} \gamma_{i,j} p_j$ captures a different type of bias: towards playing a mixed strategy that is similar to, or different from, those played by other players. This extension can help capture well-known social phenomena, such as homophily (the propensity of individuals to bond with others who are similar to themselves). Interestingly, all our results (existence and computation) also hold for this more general definition of biased games. However, it seems poorly motivated in the context of the settings discussed in Section 1, which is why we chose to simplify the presentation by focusing on the more restricted definition of biased games.

We feel that our existence result (Theorem 1) is rather general, but the computation of equilibria in biased games poses difficult problems even for zero-sum games. Still, we have been able to identify some rather interesting properties of two-player zero-sum biased games with bias terms of the form $c\|p_i - d\,\bar{p}_i\|_2^2$ for some constants $c > 0$ and $d$, which lead us to conjecture that, under these conditions, there exists a unique equilibrium. Of course, the conditions of this conjecture are quite restrictive. In order to tackle equilibrium computation in biased games under assumptions as general as those of Theorem 1, fundamentally new ideas are required.

## Acknowledgements

We thank Sam Ganzfried and the anonymous AAAI'14 reviewers for helpful comments. Kurokawa and Procaccia were supported in part by the NSF under grants IIS-1350598 and CCF-1215883. Caragiannis was supported in part by the European Social Fund and Greek national funds through the research funding program Thales on "Algorithmic Game Theory," and by a Caratheodory research grant from the University of Patras.

## References

Battigalli, P., and Dufwenberg, M. 2009. Dynamic psychological games. Journal of Economic Theory 144(1):1–35.
Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 661–668.

Conitzer, V., and Sandholm, T. 2008. Complexity results about Nash equilibria. Games and Economic Behavior 63(2):621–641.

Daskalakis, C.; Goldberg, P. W.; and Papadimitriou, C. H. 2009. The complexity of computing a Nash equilibrium. SIAM Journal on Computing 39(1):195–259.

Ganzfried, S., and Sandholm, T. 2011. Game theory-based opponent modeling in large imperfect-information games. In Proceedings of the 10th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 533–540.

Ganzfried, S., and Sandholm, T. 2012. Safe opponent exploitation. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), 587–604.

Geanakoplos, J.; Pearce, D.; and Stacchetti, E. 1989. Psychological games and sequential rationality. Games and Economic Behavior 1:60–79.

Johanson, M.; Zinkevich, M.; and Bowling, M. 2007. Computing robust counter-strategies. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS).

Korzhyk, D.; Yin, Z.; Kiekintveld, C.; Conitzer, V.; and Tambe, M. 2011. Stackelberg vs. Nash in security games: An extended investigation of interchangeability, equivalence, and uniqueness. Journal of Artificial Intelligence Research 41:297–327.

Lemke, C. E., and Howson, J. T. 1964. Equilibrium points of bimatrix games. SIAM Journal on Applied Mathematics 12(2):413–423.

Monderer, D., and Tennenholtz, M. 2009. Strong mediated equilibrium. Artificial Intelligence 173(1):180–195.

Nash, J. F. 1950. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences of the United States of America 36:48–49.

Rozenfeld, O., and Tennenholtz, M. 2007. Routing mediators. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 1488–1493.

Sandholm, T. 2010. The state of solving large incomplete-information games, and application to poker. AI Magazine 31(4):13–32.

Tambe, M. 2012. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.

Tennenholtz, M. 2004. Program equilibrium. Games and Economic Behavior 49(2):363–373.

von Stengel, B. 2007. Equilibrium computation for two-player games in strategic and extensive form. In Nisan, N.; Roughgarden, T.; Tardos, E.; and Vazirani, V., eds., Algorithmic Game Theory. Cambridge University Press. Chapter 9.

Widger, J., and Grosu, D. 2009. Parallel computation of Nash equilibria in N-player games. In Proceedings of the 12th International Conference on Computational Science and Engineering (CSE), 209–215.