# learning_equilibria_of_games_via_payoff_queries__a8692bd0.pdf

Journal of Machine Learning Research 16 (2015) 1305-1344 Submitted 12/13; Revised 12/14; Published 8/15

Learning Equilibria of Games via PayoﬀQueries

John Fearnley John.Fearnley@liverpool.ac.uk Martin Gairing Gairing@liverpool.ac.uk Ashton Building, Ashton Street, University of Liverpool, United Kingdom

Paul W. Goldberg Paul.Goldberg@cs.ox.ac.uk Wolfson Building, Parks Road, University of Oxford, United Kingdom

Rahul Savani Rahul.Savani@liverpool.ac.uk Ashton Building, Ashton Street, University of Liverpool, United Kingdom

Editor: Vahab Mirrokni

A recent body of experimental literature has studied empirical game-theoretical analysis, in which we have partial knowledge of a game, consisting of observations of a subset of the pure-strategy proﬁles and their associated payoﬀs to players. The aim is to ﬁnd an exact or approximate Nash equilibrium of the game, based on these observations. It is usually assumed that the strategy proﬁles may be chosen in an on-line manner by the algorithm. We study a corresponding computational learning model, and the query complexity of learning equilibria for various classes of games. We give basic results for exact equilibria of bimatrix and graphical games. We then study the query complexity of approximate equilibria in bimatrix games. Finally, we study the query complexity of exact equilibria in symmetric network congestion games. For directed acyclic networks, we can learn the cost functions (and hence compute an equilibrium) while querying just a small fraction of pure-strategy proﬁles. For the special case of parallel links, we have the stronger result that an equilibrium can be identiﬁed while only learning a small fraction of the cost values. Keywords: query complexity, bimatrix game, congestion game, equilibrium computation, approximate Nash equilibrium

1. Introduction

Suppose that we have a game G with a known set of players, and known strategy sets for each player. We want to design an algorithm to solve G, where the algorithm can only obtain information about G via payoﬀqueries. In a payoﬀquery, the algorithm proposes pure strategies for the players, and is told the resulting payoﬀs. The general research issue is to identify bounds on the number of payoﬀqueries needed to ﬁnd an equilibrium, subject to the assumption that G belongs to some given class of games. A general motivation for this topic is the observation that many data sets are generated by economic or competitive agents (for example, transactions on ﬁnancial or housing markets, or data on competitive sports). In attempting to learn from such data sets, it seems natural to model the data-generating process in game-theoretic terms. To some extent, the work in agent-based modelling takes this approach: artiﬁcial selﬁsh agents are simulated, and a general objective is to replicate various economic phenomena and behaviour observed in practice. We believe that there is considerable future potential to study data

c 2015 John Fearnley, Martin Gairing, Paul Goldberg, and Rahul Savani.

Fearnley, Gairing, Goldberg, and Savani

sets through the game-theoretic lens in this way. This has already been successfully applied in the AI literature on adversarial security games, where for example, Yang et al. (2013) apply existing models of bounded rationality of an opponent, so as to improve competitive performance in an artiﬁcial online game. Nguyen et al. (2013) develop an adversary-based model (SUQR), and shows that SUQR s performance (using parameters learned from realworld data) improves over previous work that does not model adversaries; an extension of SUQR has been deployed in the context of ﬁshery protection (Brown et al., 2014). Suppose we have a detailed computational simulation of a game, and we want to check whether it gives rise to behaviour that corresponds with real-world observations. A key observation is that it is not too hard to take such a simulation, and feed into it some chosen behaviour of the (simulated) players, check their payoﬀs, and (with a bit more eﬀort) check on whether players have a proﬁtable deviation. From this, we get to the challenge of searching for an equilibrium of the game (ideally a Nash equilibrium; failing that, search for something weaker, like an approximate equilibrium). In terms of the theoretical model that is studied in this paper, our choice of which behaviour to simulate corresponds to the choices of payoﬀqueries. Below, we discuss some of the literature in this setting.

1.1 Motivation for the Payoﬀ-Query Model

Given a game, especially one with many players, it is unreasonable to assume that anyone maintains an explicit representation of its payoﬀfunction, even if the game in question has a concise representation. However, in practice, a reasonable modelling assumption is that given, say, a strategy proﬁle for the players, we can determine their payoﬀs, or some estimate of the payoﬀs. We are interested in algorithms that ﬁnd Nash equilibria using a sequence of queries, where a query proposes a strategy proﬁle and gets told the payoﬀs. We would like to know under what conditions an algorithm can ﬁnd a solution based on knowledge of some but not all of the game s payoﬀs, which is particularly important when there are many players, and the number of pure-strategy proﬁles is large. This kind of challenge (where you get observations of proﬁle/payoﬀ-vector pairs, and you want to ﬁnd an approximate equilibrium, as opposed to the unobserved payoﬀs) has been the subject of experimental work (Vorobeychik et al., 2007; Wellman, 2006; Jordan et al., 2008; Duong et al., 2009), where Jordan et al. (2008) focuses on the case (highly relevant to this work) where the algorithm selects a sequence of pure proﬁles and gets told the resulting payoﬀs. In this paper, we introduce the study of payoﬀ-query algorithms from the algorithmic complexity viewpoint.1 We are interested in upper and lower bounds on the query complexity of classes of games. From the theoretical perspective, we are studying a constrained class of algorithms for computing equilibria of games. The study of such constraints especially when they lead to lower bounds or impossibility results informs us about the approaches that a successful algorithm needs to apply. In the context of equilibrium computation, other kinds of constraint include uncoupled algorithms for computing equilibria (Hart and Mas-Colell, 2003, 2006), communication-constrained algorithms (Hart and Mansour, 2010; Daskalakis et al.,

1. The ﬁrst discussion of this query model (that we are aware of) appears in a 2009 blog article by Noam Nisan: https://agtb.wordpress.com/2009/11/19/the-computational-complexity-of-pure-nash/. It also mentions best-reply queries, which deserve further attention in the context of adversarial security games.

Learning Equilibria of Games via Payoff Queries

2010; Goldberg and Pastink, 2012), and oblivious algorithms (Daskalakis and Papadimitriou, 2009). Of course, the restriction to polynomial-time algorithms is the best-known example of such a constraint. Based on the algorithms and open problems identiﬁed in this paper, we ﬁnd this to be a compelling motivation for the further study of the payoﬀ-query model. There are various related kinds of query models that are suggested by the payoﬀ queries studied here, which may also be of similar theoretical interest; we discuss these in Section 6.

1.2 Games and Query Models

In this paper we introduce the study of payoﬀ-queries for strategic-form games. We also consider two models of concisely represented games: graphical games (Kearns et al., 2001), where players are nodes in a given graph and the payoﬀof a player only depends on the strategies of its neighbors in the graph, and symmetric network congestion games (Fabrikant et al., 2004), where the strategy space of the players corresponds to the set of paths that connect two nodes in a network. For a strategic-form game, we assume that initially the querying algorithm only knows n, the number of players, and k, the number of pure strategies that each player has.

Deﬁnition 1 A payoﬀquery to a strategic-form game G selects a pure-strategy proﬁle s for G, and is given as response, the payoﬀs that G s players derive from s.

There are kn pure-strategy proﬁles in a game, and one could learn the game exhaustively using this many payoﬀqueries. We are interested in algorithms that require only a small fraction of this trivial upper bound on the number of queries required. For our results on symmetric network congestion games, we assume that initially the algorithm only knows the number of players n, and the set of pure strategies, given by a graph and the common origin/destination pair. In this paper, we will consider two diﬀerent query models, which are described in the following deﬁnition.

Deﬁnition 2 For a symmetric congestion game with m pure strategies and n players, a query is a tuple q = (q1, q2, . . . , qm), where for each pure strategy i = 1, 2, . . . , m, we have that qi {0, 1, 2, . . . , n} is the number of players assigned to i under the query. In response to the query q, the querier learns the costs of each pure strategy under the assigned loads. Let Q = P 1 i m qi. We consider two diﬀerent types of queries:

In a normal-query, we require that Q = n;

in an under-query, we require that Q < n.

Normal-queries correspond to the query model that we use for strategic-form games. For a congestion game, m, which is the number of paths from the origin to the destination in a graph, may be exponential. While we deﬁned a query for congestion as a tuple of length m, both normal-queries and under-queries require at most n positions of this tuple to be non-zero, so the query can be speciﬁed succinctly. We use under-queries in our query algorithm for games played on directed acyclic graphs. We feel that under-queries are a reasonable query model for congestion games, because we can ask some players to refrain from playing when we conduct our query.

Fearnley, Gairing, Goldberg, and Savani

Deﬁnition 3 The payoﬀquery complexity of a class of games G, with respect to some solution concept such as exact or approximate Nash equilibrium, is deﬁned as follows. It is the smallest N such that there is some algorithm A that, given N payoﬀqueries to any game G G (where initially none of the payoﬀs of G are known) can ﬁnd a solution of G.

The deﬁnition imposes no computational bound on the algorithm A. It is to some extent inspired by the work on query-based learning initiated by Angluin (1987, 1988), in the context of computational learning theory. Note that A may select the queries in an on-line manner, so queries can depend on the responses to previous queries.

1.3 Overview of Results

We study a variety of diﬀerent settings. In Section 3, we consider bimatrix games. Our ﬁrst result is a lower bound for computing an exact Nash equilibrium: in Theorem 4, we show that computing an exact Nash equilibrium in a k k bimatrix game has payoﬀquery complexity k2, even for zero-sum games. In other words, we have to query every pure strategy proﬁle. We then turn our attention to approximate Nash equilibria, where we obtain some more positive results. With the standard assumption that all payoﬀs lie in the range [0, 1], we show that, when 2 i k 1, the payoﬀquery complexity of computing a (1 1

i )- approximate Nash equilibrium is at most 2k i + 1 (Theorem 5) and at least k i + 1 (Theorem 7.) We also observe that, when ϵ 1 1

k, no payoﬀqueries are needed at all, because an ϵ-Nash equilibrium is achieved when both players mix uniformly over their pure strategies. The query complexity of computing an approximate Nash equilibrium when ϵ < 1 2 appears to be a challenging problem, and we provide an initial lower bound in this direction in Theorem 13: we show that the payoﬀquery complexity of ﬁnding a ϵ-approximate Nash equilibrium for ϵ = O( 1 log k) is Ω(k log k). This gives an interesting contrast with the ϵ 1

2 case. Whereas we can always compute a 1

2-approximate with 2k 1 payoﬀqueries, there exists a constant ϵ < 1

2 for which this is not the case, as shown in Corollary 14. Having studied payoﬀquery complexity in bimatrix games, it is then natural to look for improved payoﬀquery complexity results in the context of structured games. In particular, we are interested in concisely represented games, where the payoﬀquery complexity may be much smaller than the number of pure strategy proﬁles. As an initial result in this direction, in Section 4 we consider graphical games, where we show (Theorem 15) that for graphical games with constant degree d, a Nash equilibrium can be found with a polynomial number of payoﬀ-queries. This algorithm works by discovering every payoﬀin the game, however unlike bimatrix games, this can be done without querying every pure strategy proﬁle. Finally, we focus on two diﬀerent models of congestion games. In Section 5.1, we consider the case of parallel links, where the game has a origin and destination vertex, and m parallel links between them. We show both lower and upper bounds for this setting. If n denotes the number of players, then we obtain a log(n) + m payoﬀquery lower bound (Theorem 17), which applies to both query models. We obtain an upper bound of O log(n) log2(m) log log(m) + m normal-queries (Theorem 26). Note that there are n m diﬀerent

Learning Equilibria of Games via Payoff Queries

payoﬀs in a parallel links game, and so our upper bound implies that you do not need to discover the entire payoﬀfunction in order to solve a parallel links game. In Sections 5.2, 5.3, 5.4, we consider the more general case of symmetric network congestion games on directed acyclic graphs. We show that if the game has m edges and n players, then we can ﬁnd a Nash equilibrium using m n payoﬀqueries (Theorem 38). The algorithm discovers every payoﬀin the game, but it only queries a small fraction of the pure strategy proﬁles.

2. Related Work

In Section 2.1 we review some very recent work on the payoﬀquery complexity of related game-theoretic solution concepts. In Section 2.2 we review the experimental work that motivated this paper. Finally, in Section 2.3 we discuss the relationship with work that analyzes best-response dynamics in a game-theoretic context.

2.1 PayoﬀQuery Complexity

A preliminary version of this paper appeared at the ACM conference on Electronic Commerce (Fearnley et al., 2013). Work that has appeared subsequently has studied query complexity bounds for general multi-player games, where the main parameter of interest is the number of players n, who usually just have a small number of pure strategies. Hart and Nisan (2013) obtain an exponential in n lower bound on the query complexity of ﬁnding an exact correlated equilibrium of a general n-player game. Note that any lower bounds for correlated equilibria apply immediately to Nash equilibria, since Nash equilibria are a more restrictive solution concept. For approximate correlated equilibria, no-regret learning dynamics can be simulated by a randomized payoﬀquery algorithm, so that the query complexity of approximate correlated equilibria is polynomial in the number of players (Babichenko and Barman, 2013; Hart and Nisan, 2013). Goldberg and Roth (2014) studied the dependence in more detail, obtaining upper and lower bounds that are logarithmic in n. However, randomness is needed: Babichenko and Barman (2013) show that ﬁnding an exact correlated equilibrium in an n-player games using a deterministic querying strategy requires exponentially many queries in n. This result is strengthened by Hart and Nisan (2013), where it is shown that deterministic querying strategies require exponentially many queries to ﬁnd even a 1

2-approximate correlated equilibrium. Approximate well-supported Nash equilibria are another approximate solution concept that have been studied in the context of strategic form games (Kontogiannis and Spirakis, 2010; Fearnley et al., 2012). Babichenko (2014) has shown that ﬁnding a 10 8-well supported Nash equilibrium in an n-player game requires exponentially many queries in n. The query complexity of computing an ϵ-approximate Nash equilibrium (that need not be well-supported) for constant ϵ remains open, although Goldberg and Roth (2014) show that it is polynomial if the unknown game can be speciﬁed concisely. These negative results for n-player games motivate the consideration of more structured classes of games, such as congestion games, which we study in this paper. Finally, Fearnley and Savani (2014) have continued the study of query complexity for bimatrix games that was initiated in this paper. In particular, they show that randomized payoﬀquery algorithms can achieve better approximation ratios: there is a randomized

Fearnley, Gairing, Goldberg, and Savani

algorithm for ﬁnding a (3

5 2 +ϵ)-Nash equilibrium in a bimatrix game using O(k log k

ϵ2 ) payoﬀ queries, and there is a randomized algorithm for ﬁnding a (2

3 +ϵ)-WSNE in a bimatrix game using O(k log k

ϵ4 ) payoﬀqueries. They also provide lower bounds for ﬁnding well-supported Nash equilibria in bimatrix games: ﬁnding an ϵ-well-supported Nash equilibrium requires k 1 payoﬀqueries for any ϵ < 1, even in win-lose games, and ﬁnding a 1 3k-well-supported Nash equilibrium requires Ω(k2) payoﬀqueries, even in win-lose constant-sum games.

2.2 Experimental Work

In empirical game-theoretic analysis (Wellman, 2006; Jordan et al., 2010), a game is presented to the analyst via a set of observations of strategy proﬁles (usually, pure) and their corresponding payoﬀs. This set of proﬁles/payoﬀ-vector pairs is called an empirical game. In some settings the strategy proﬁles are randomly generated, but it is typically feasible to obtain observations via the payoﬀqueries we study here. The proﬁle selection problem (Jordan et al., 2008) is the challenge of choosing helpful strategy proﬁles. The strategy exploration problem (Jordan et al., 2010) is the special case of ﬁnding the best way to limit the search to a small subset of a large set of strategies. Jordan et al. (2008) envisage a setting where a game (called a base game) has a corresponding game simulator, an implementation in software, which is amenable to payoﬀ queries; a more general scenario allows the observed payoﬀs to be sampled from a distribution associated with the strategy proﬁle. The distribution is sometimes considered to be due to a noise process, and called the noisy payoﬀmodel in Jordan et al. (2008). In this paper we just consider deterministic payoﬀs, the revealed payoﬀmodel in Jordan et al. (2008). As noted in Vorobeychik et al. (2007), a proﬁle can be repeatedly queried to sample from the distribution of payoﬀs, and thus get an estimate of the expected values. The two interacting challenges are to identify helpful queries, and to use them to ﬁnd pure-strategy proﬁles that have low regret (where regret refers to the largest incentive to deviate, amongst the players.) Vorobeychik et al. (2007) study the payoﬀfunction approximation task, in which a game belongs to a known class, and there is a regression challenge to determine certain parameters; the information about the game consists of a random sample of pure proﬁles and resulting payoﬀvectors. However, success is measured by the extent that the players predicted behaviour is close to the behaviour associated with the true payoﬀs, rather than how well the true payoﬀfunctions are estimated. Work on speciﬁc classes of multi-player games includes the following. Duong et al. (2009) studies algorithms for learning graphical games; we consider a graphical game learning algorithm in Section 4. Jordan et al. (2008) apply payoﬀ-query learning to various kinds of games generated by GAMUT (Nudelman et al., 2004), including a class of congestion games. Vorobeychik et al. (2007) investigate a ﬁrst-price auction and also a scheduling game, where payoﬀs are described via a ﬁnite random sample of proﬁle/payoﬀvector pairs. Earlier, Sureka and Wurman (2005) study search for pure Nash equilibria of strategic-form games (mostly with 5 players and 10 pure strategies). Most of the experimental work (e.g., Sureka and Wurman 2005; Jordan et al. 2008; Duong et al. 2009) uses local search, in which proﬁles that get queried are typically very similar (diﬀering in just one player s strategy) from previously queried proﬁles. Jordan

Learning Equilibria of Games via Payoff Queries

et al. (2008) experiment with local-search type algorithms in which when a player has the incentive to deviate, the tested proﬁle is updated with that deviation. Sureka and Wurman (2005) study search for pure equilibria via best-response dynamics while maintaining a tabu list, introduced to reduce the risk of cycles.

2.3 Best-Response Dynamics and Local Search

There is a large body of literature that studies bestand better-response dynamics for classes of potential games, and gives bounds on the number of steps required for convergence to pure-strategy equilibria. These dynamics relate to the payoﬀquery model since they work by exploring the space of pure proﬁles, and receiving feedback consisting of payoﬀs. The diﬀerence is that they purport to model a decentralized process of selﬁsh behaviour by the players, while the payoﬀquery model envisages a centralized algorithm that is less constrained. In this section, we discuss some of the relevant literature. Local search processes in that each pure proﬁle is obtained from the previous one by letting a single player move have been studied extensively in the literature. Bounds on the convergence of deterministic best-response dynamics were considered in Even-Dar et al. (2003) and Feldmann et al. (2003). Gairing and Savani (2010, 2011) showed polynomial convergence of better-response dynamics for certain hedonic games. The better-response dynamics considered by Goldberg (2004) is the basic randomized local search algorithm, and bounds are obtained for its convergence to exact equilibrium. The work in Bei et al. (2013) shows that a Nash equilibrium of a bimatrix game can be found using a polynomial number of better-response queries. Chien and Sinclair (2011) study another local search, the ϵ-Nash dynamics, and its convergence to approximate equilibria. Gairing et al. (2010) employ controlled local search dynamics (where a sequence of players moves simultaneously) to compute pure Nash equilibria. Other papers (e.g., Fischer et al. 2006; Berenbrink et al. 2007) analyze strongly-distributed dynamics in which multiple players can move in the same time step; consequently the dynamics is not a local search. However, these dynamical systems could all be simulated by payoﬀquery algorithms in which at each step, at most nk queries are made to determine the change in payoﬀs available to players as a result of unilateral deviations. This paper begins to answer the question: how much better could a payoﬀquery algorithm do, if it were not subject to that constraint? Finally, Alon et al. (2011) consider payoﬀ-query algorithms for ﬁnding the costs of paths in graphs. They consider weight discovery protocols where the aim is to determine the costs of edges, and shortest path discovery protocols where the aim is to ﬁnd a shortest path. The latter objective is more similar to what we consider, since it can avoid the need to learn the entire payoﬀfunction; also a shortest path is an equilibrium strategy for the one-player case of a network congestion game.

3. Bimatrix Games

In this section, we give bounds on the payoﬀ-query complexity of computing approximate Nash equilibria of bimatrix games. A bimatrix game is a pair (R, C) of two k k matrices: R gives payoﬀs for the row player, and C gives payoﬀs for the column player. We use [n] to denote the set {1, 2, . . . , n}. A mixed strategy is a probability distribution over [k]. A

Fearnley, Gairing, Goldberg, and Savani

mixed strategy proﬁle is a pair s = (x, y), where x is a mixed strategy for the row player, and y is a mixed strategy for the column player. Let s = (x, y) be a mixed strategy proﬁle in a k k bimatrix game (R, C). We say that a row i [k] is a best response for the row player if Ri y = maxj [k] Rj y. We say that a column i [k] is a best response for the column player if (x C)i = maxj [k](x C)j. We deﬁne the row player s regret under s = (x, y) as the diﬀerence between the payoﬀof a best response and the payoﬀthat the row player obtains under s. More formally, the regret that the row player suﬀers under s is:

max j [k](Rj y) x R y.

Similarly, the column player s regret is deﬁned to be:

max j [k]((x C)j) x C y.

We say that s is a mixed Nash equilibrium if both players have regret 0 under s. An ϵ-Nash equilibrium is an approximate solution concept: for every ϵ [0, 1], we say that s is an ϵ-Nash equilibrium if both players suﬀer regret at most ϵ under s. We begin with the following simple observation: there are no query-eﬃcient algorithms for ﬁnding exact Nash equilibria, even in zero-sum games. The following theorem shows that, in order to ﬁnd an exact Nash equilibrium, we must query all k k pure strategy proﬁles.

Theorem 4 The payoﬀquery complexity of ﬁnding an exact Nash equilibrium of a zero-sum k k bimatrix game is k2.

Proof Consider a generalized version of matching pennies, where the column player pays 1 to the row player whenever both players choose the same strategy, otherwise the row player pays 1 to the column player. Note that this is a zero-sum game, and that it has a unique Nash equilibrium, namely when both players randomize uniformly over their strategies. Now suppose each payoﬀin the game is perturbed by a small quantity, in such a way as to maintain the zero-sum property. For small perturbations, there will still be a unique fullymixed equilibrium proﬁle, but it can only be known exactly if all the payoﬀs are known exactly. Thus, we cannot ﬁnd an exact Nash equilibrium in a zero-sum bimatrix game without querying all k k pure strategy proﬁles.

Theorem 4 implies that we cannot devise query-eﬃcient algorithms for ﬁnding exact Nash equilibria. This naturally raises the question of whether there are query-eﬃcient algorithms for ﬁnding approximate Nash equilibria, and we continue by presenting results on this topic. From now on, we will assume that all payoﬀs lie in the range [0, 1], which is a standard assumption when ﬁnding approximate Nash equilibria. Our ﬁrst result is an upper bound. The work of Daskalakis, Mehta, and Papadimitriou (Daskalakis et al., 2009b) gives a simple algorithm for ﬁnding a 1

2-Nash equilibrium. We adapt their algorithm to prove the following result.

Learning Equilibria of Games via Payoff Queries

Theorem 5 Let i be chosen such that 2 i k 1. The payoﬀquery complexity of ﬁnding a (1 1

i )-approximate equilibrium of a k k bimatrix game is at most 2k i + 1.

Proof We begin by querying all k pure proﬁles where the row player plays row 1. This allows us to ﬁnd the column player s best response to row 1. Without loss of generality, we can assume that this is column 1. Now query column 1 against rows 2 through k i + 2. Note that we have made a total of 2k i + 1 queries. Let row b be a row that maximizes the row player s payoﬀagainst column 1, among those that we have queried. Let B = {1, b} [k i + 3, k]. We propose the following mixed strategy proﬁle s: the column player plays column 1 with probability 1, and the row player mixes uniformly over the strategies in B. Note that the row player is mixing between i rows, and thus plays each of them with probability 1

i . We claim that s is a (1 1

i )-approximate Nash equilibrium. Let R and C be the actual payoﬀmatrices for the row and column player, respectively. Note that the row player s best response to column 1 is either b, or one of the strategies between k i + 3 and k. Call this row j, and observe that j B. The row player s regret can be expressed as

i Rℓ,1 = (1 1

Let j be a pure best response of the column player under s. Observe that, since column 1 is a best response against row 1, we have that C1,j C1,1 0. The column player s regret can be expressed as:

i (Cℓ,j Cℓ,1)

i (Cℓ,j Cℓ,1)

Thus we have shown that both players suﬀer regret at most 1 1

Note that, when i = 2, the algorithm of Theorem 5 ﬁnds a 1

2-Nash equilibrium using the same technique as the algorithm from Daskalakis et al. (2009b). For i > 2, our algorithm uses fewer payoﬀqueries in exchange for a worse approximation. When i = k 1, our algorithm uses k+2 payoﬀqueries in order to ﬁnd a (1 1 k 1)-Nash equilibrium. It turns out that, for ϵ 1 1

k, we do not need to make any payoﬀqueries at all: an ϵ-Nash equilibrium

Fearnley, Gairing, Goldberg, and Savani

is obtained when both players play the uniform distribution over their strategies, because both players must place at least 1

k of their probability on a pure best response. We now turn our attention to lower bounds. We complement the result of Theorem 5 by showing lower bounds for ﬁnding (1 1

i )-Nash equilibria, when i is in the range 2 i k 1. First, we prove an auxiliary lemma.

Lemma 6 Suppose that every payoﬀquery that is made by the algorithm returns 0 for both players. Let i be chosen such that 2 i k 1, and let s be a (1 1

i )-Nash equilibrium. Any column that receives no queries must be assigned at least 1

i probability by s.

Proof Suppose, for the sake of contradiction, that c is a column that received no queries, and that c is assigned strictly less than 1

i probability by s. We construct a column player matrix C as follows:

( 1 if j = c, 0 otherwise.

Since c received no queries, C is consistent with all queries that have been made. Note that the column player s payoﬀunder s is strictly less than 1

i , and that the payoﬀof playing c as a pure strategy is 1. Thus, the column player s regret is strictly greater than 1 1

i , which contradicts the fact that s is a (1 1

i )-Nash equilibrium

Now we can show our lower bound.

Theorem 7 Let i be chosen such that 2 i k 1. The payoﬀquery complexity of ﬁnding a (1 1

i )-approximate Nash equilibrium of a k k bimatrix game is at least k i + 1.

Proof Assume that all payoﬀqueries return 0 for both players. Suppose, for the sake of contradiction, that an algorithm makes fewer than k i + 1 payoﬀqueries, and then outputs s as a (1 1

i )-Nash equilibrium. It follows that there must be at least i columns that have received no payoﬀqueries at all, and without loss of generality, we can assume that these are columns 1 through i. By Lemma 6, we know that s must assign exactly 1

i probability to each of the columns 1 through i. Since there are k rows, there is at least one row r that receives probability at most 1

k under s. We construct a row player payoﬀ matrix R as follows:

( 1 if j = r and 1 j i, 0 otherwise.

Since columns 1 through i were not queried, R is consistent with all queries that have been made so far. The row player s payoﬀunder s is at most 1

k. On the other hand, the row player would receive payoﬀ1 for playing r as a pure strategy. Thus, the row player s regret is at least: 1 1

This contradicts the fact that s is a (1 1

i )-Nash equilibrium.

Learning Equilibria of Games via Payoff Queries

As a consequence of the previous two theorems, when 2 i k 1, we have that the payoﬀquery complexity of ﬁnding a (1 1

i )-Nash equilibrium lies somewhere in the range [k i + 1, 2k i + 1]. Determining the precise payoﬀquery complexity for this case is an open problem. So far, we have only considered ϵ-Nash equilibria with ϵ 1 2. Of course, the most interesting challenge is to determine the payoﬀquery complexity for values of ϵ < 1

2. By our previous results, we know that the payoﬀquery complexity for ﬁnding a 1

2-Nash equilibrium is O(k), and the payoﬀquery complexity for ﬁnding a 0-Nash equilibrium is O(k2), but we do not know how the payoﬀquery complexity behaves as we vary ϵ between 0 and 1

2. Our ﬁnal result in this section will be to show a lower bound for ϵ = O( 1 log k). We will show that ﬁnding a O( 1 log k)-Nash equilibrium requires Ω(k log k) payoﬀqueries. This establishes that there are some positive values of ϵ, for which computing an ϵ-Nash equilibrium is asymptotically harder than computing a 1

2-Nash equilibrium. We will use the following class of bimatrix games, which have been previously used in Theorem 1 of Feder et al. (2007).

Deﬁnition 8 Let Gℓbe the class of strategic-form games where the column player has ℓ pure strategies and the row player has ℓ ℓ/2 pure strategies (where we assume ℓis even). Let Gℓ Gℓbe the win-lose constant-sum game in which each row of the row player s payoﬀ matrix has ℓ

2 1 s and ℓ

2 0 s, all rows being distinct. The column player s payoﬀs are one minus the row player s payoﬀs.

It is well-known that every zero-sum game has a unique value, which is the payoﬀthat both players can guarantee for themselves, independent of what the other player does. The value of each game Gℓ Gℓis 1

2 since either player can obtain payoﬀ1

2 by using the uniform distribution over their pure strategies. Our ﬁrst lemma shows that, if the column player deviates from this by placing too much probability on a single column, then the row player can take advantage and increase his payoﬀ.

Lemma 9 Suppose that in game Gℓ Gℓ, the column player places probability α > 1/ℓon some column. Then the row player can obtain a payoﬀstrictly greater than 1

Proof Let j be a column that the column player plays with probability α. Let Rj be the set of rows where the row player obtains payoﬀ1 against column j. Suppose the row player plays the uniform distribution over rows in Rj. When the column player plays j, the row player receives payoﬀ1. Let j = j be a column, and consider the payoﬀs to the row player where j intersects Rj. A fraction ℓ/2 1

ℓ 1 of these entries pay the row player 1, while a

fraction ℓ/2

ℓ 1 pay the row player 0. Consequently whenever the column player plays j = j,

the row player s expected payoﬀis ℓ/2 1

ℓ 1 . Thus with probability α the row player receives

payoﬀ1, and with probability 1 α he receives payoﬀℓ/2 1

ℓ 1 . Thus, the payoﬀto the row

Fearnley, Gairing, Goldberg, and Savani

α + (1 α)ℓ/2 1

2α 1 α 2(ℓ 1)

which completes the proof.

We now use the bound from the previous lemma to show that, in an approximate Nash equilibrium for Gℓ, the column player cannot place too much probability on any individual column.

Corollary 10 Let α > 1

k, and let ϵ = 1

ℓ). In every ϵ-Nash equilibrium of Gℓ Gℓ, the column player plays each individual column with probability at most α.

Proof Suppose, for the sake of contradiction, that there is an ϵ-Nash equilibrium s in which that the column player assigns column j probability strictly greater than α. Then, by Lemma 9, the row player s payoﬀis strictly greater than 1

2ℓ, and therefore the row player s payoﬀin s must be strictly greater than:

Therefore, the column player obtains payoﬀstrictly less than 1

2 ϵ. Since the value of Gℓ is 1

2, the column player s regret in s is strictly greater than ϵ, and therefore s is not an ϵ-Nash equilibrium.

We can now provide a lower bound for the payoﬀquery complexity of ﬁnding an approximate Nash equilibrium for the games in Gℓ.

Lemma 11 For any ϵ < 1

12, and any even ℓ 8, the payoﬀquery complexity of ﬁnding an ϵ-Nash equilibrium for the games in Gℓis at least 1

2 ℓ ℓ/2 ( 1 16ϵ+4/ℓ).

Proof Let A be a payoﬀquery algorithm for ﬁnding an ϵ-Nash equilibrium, and, for the sake of contradiction, suppose that A makes fewer than 1

2 ℓ ℓ/2 ( 1 16ϵ+4/ℓ) many payoﬀ queries when processing Gℓ. Let s be the mixed strategy proﬁle that A outputs for Gℓ. By Corollary 10, we know that no column in s is assigned more than α = 4ϵ+ 1

ℓprobability. We also know that in s, the row player s payoﬀis at most 1

2 +ϵ, since s is an ϵ-Nash equilibrium of a constant-sum game with value 1

2. Since A made fewer than 1

2 ℓ ℓ/2 ( 1 16ϵ+4/ℓ) payoﬀ queries, at least half of the rows received fewer than ( 1 16ϵ+4/ℓ) queries. Since ℓ 8, this

implies that there are at least 1

2 8 4 = 45 such rows. Thus, there is one such row, call it r, that is played with probability strictly less than 1 12 in s.

Learning Equilibria of Games via Payoff Queries

Since s assigns at most α probability to each column, the total amount of probability that s assigns to the queried portion of r is at most α( 1 16ϵ+4/ℓ) = 1

4. Now suppose that we modify Gℓby replacing all un-queried entries of r with payoﬀs of 1 for the row player. Call this new game G ℓ. Note that A outputs the same strategy proﬁle s for both Gℓand G ℓ. Let p be the payoﬀto the row player of playing s in Gℓ, and let p be the payoﬀto the row player of playing s in G ℓ. Since r is played with probability less than 1 12 we have:

However, the row player s best response payoﬀis at least 3

4 in G ℓ, so we have:

Therefore, we can conclude that:

However, this is impossible because ϵ < 1

Finally, we can extend the lower bound to square bimatrix games.

Lemma 12 For k k bimatrix games, the payoﬀquery complexity of ﬁnding an ϵ-Nash equilibrium, for ϵ 1

8, is at least k ( 1 32/ log k+64ϵ).

Proof Let k be the largest number of the form ℓ ℓ/2 that is smaller than k. We have k k/4 and ℓ log k/2. By Lemma 11, the number of payoﬀqueries needed to ﬁnd an ϵ-Nash equilibrium for games in Gk is at least: ℓ ℓ/2

1 16ϵ + 4/ℓ

= k 1 4/ℓ+ 16ϵ

1 8/ log(k) + 16ϵ

= k 1 32/ log(k) + 64ϵ

The games in Gℓcan be written down as a k k game, by duplicating rows and columns. Note that these operations preserve approximate equilibria.

By taking ϵ O( 1 log k) in the previous lemma, we arrive at our ﬁnal theorem.

Fearnley, Gairing, Goldberg, and Savani

Theorem 13 For k k bimatrix games, the payoﬀquery complexity of ﬁnding a ϵ-Nash equilibrium for ϵ O( 1 log k), is Ω(k log k).

Recall, from Theorem 5, that we can always ﬁnd a 1

2-Nash equilibrium using 2k 1 payoﬀqueries. The following corollary of Lemma 12 shows that there are some constant values of ϵ that require more payoﬀqueries.

Corollary 14 There is a constant value of ϵ > 0 for which ﬁnding an ϵ-Nash equilibrium of a k k bimatrix game requires strictly more than 2k 1 payoﬀqueries.

Proof Consider, for example, setting ϵ = 1 512 in Lemma 12. Then, for the family of games in Gl with l > 2256, we have a lower bound of

1 32 log k + 0.0064

> k 1 0.125 + 0.125 = 4 k,

on the number of payoﬀqueries.

An interesting question that remains is whether one can a show a superlinear lower bound on the number of payoﬀqueries required for a constant ϵ.

4. Graphical Games

In this section, we give a simple payoﬀquery-based algorithm for graphical games. In a n-player graphical game (Kearns et al., 2001) the players lie at the vertices of a degree-d graph, and a player s payoﬀis a function of the strategies of just himself and his neighbors. If every player has k pure strategies, then the number of payoﬀvalues needed to specify such a game is n kd+1 which, in contrast with strategic-form games, is polynomial (assuming d is a constant). Previously, Duong et al. (2009) have carried out experimental work on payoﬀqueries for graphical games. They compare a number of techniques; the algorithm we give here is polynomial-time but would likely be less eﬃcient in practice. Similar to Duong et al. (2009), we assume the underlying graph G is unknown, and we want to induce the structure of G, and corresponding payoﬀs.

Theorem 15 For constant d, the payoﬀquery complexity of degree d graphical games is polynomial.

Proof Algorithm 1 constructs a directed graph G for the (initially unknown) game, along with the payoﬀfunction. G is the aﬀects graph (Goldberg and Papadimitriou, 2006) in which a directed edge (p , p) has the meaning that the behaviour of p may aﬀect p s payoﬀ. Note that in Step 2, |S| < (n k)d+1. In a degree-d graphical game, any player p s payoﬀs may be aﬀected by his own strategy, and the strategies of at most d neighbours p for which edges (p , p) exist. The existence of edge (p , p) is equivalent to the existence of strategy proﬁles s, s that diﬀer only in p s strategy and p s payoﬀ. This is what Algorithm 1 checks for. Finally, when the edges, and hence neighborhoods of the graph game have been found,

Learning Equilibria of Games via Payoff Queries

Algorithm 1 Graphical Games

1: Initialize graph G s vertices to be the player set, with no edges

2: Let S be the set of pure proﬁles in which at least n (d + 1) players play 1.

3: Query each element of S.

4: for all players p, p do

5: if s, s S that diﬀer only in p s payoﬀand p s strategy then

6: add directed edge (p, p ) to graph

9: for all players p do

10: Let Np be p s neighborhood in G

11: Use elements of S to ﬁnd p s payoﬀs as a function of strategies of Np 12: end for

it is simple to read oﬀeach player s payoﬀmatrix from the data in Step 3.

Algorithm 1 learns the entire payoﬀfunction with polynomially many queries, but there are a couple of important caveats. First, although the payoﬀquery complexity is polynomial, the computational complexity is probably not polynomial, since it is PPAD-complete to actually compute an approximate Nash equilibrium for graphical games (Daskalakis et al., 2009a). Second, while Algorithm 1 avoids querying all of the exponentially-many purestrategy proﬁles, it works in a brute-force manner that learns the entire payoﬀfunction. It is natural to prefer algorithms that ﬁnd a solution without learning the entire game, such as those that we give for Theorem 5 and Theorem 26.

5. Congestion Games

In this section, we give bounds on the payoﬀ-query complexity of ﬁnding a pure Nash equilibrium in symmetric network congestion games. A congestion game is deﬁned by a tuple Γ = (N, E, (Si)i N, (fe)e E). Here, N = {1, 2, . . . , n} is a set of n players and E is a set of resources. Each player chooses as her strategy a set si E from a given set of available strategies Si 2E. Associated with each resource e E is a non-negative, non-decreasing function fe : N 7 R+. These functions describe costs (latencies) to be charged to the players for using resource e. An outcome (or strategy proﬁle) is a choice of strategies s = (s1, s2, ..., sn) by players with si Si. For an outcome s deﬁne ne(s) = |i N : e si| as the number of players that use resource e. The cost for player i is deﬁned by ci(s) = P e si fe(ne(s)). A pure Nash equilibrium is an outcome s where no player has an incentive to deviate from her current strategy. Formally, s is a pure Nash equilibrium if for each player i N and s i Si, which is an alternative strategy for player i, we have ci(s) ci(s i, s i). Here (s i, s i) denotes the outcome that results when player i changes her strategy in s from si to s i. In a network congestion game, resources correspond to the edges in a directed multigraph G = (V, E). Each player i is assigned an origin node oi, and a destination node di. A strategy for player i consists of a sequence of edges that form a directed path from oi to

Fearnley, Gairing, Goldberg, and Savani

di, and the strategy set Si consists of all such paths. In a symmetric network congestion game all players have the same origin and destination nodes. We write a symmetric network congestion game as Γ = (N, V, E, (fe)e E, o, d), where collectively V , E, o, and d succinctly deﬁne the strategy space (Si)i N. We consider two types of networks, directed acyclic graphs, and the special case of parallel links. We assume that initially we only know the number of players n and the strategy space. The latency functions are completely unknown initially. As discussed in Section 1.2, we use several diﬀerent querying models for congestion games.

5.1 Parallel Links

In this section, we consider congestion games on m parallel links. We present a lower bound and an upper bound on the query complexity of ﬁnding an exact pure equilibrium of these games. To simplify the presentation of the algorithmic ideas of our upper bound we introduce a stronger type of query that we call an over-query. Recall from Deﬁnition 2 that for a query q = (q1, q2, . . . , qm), we denote by Q the total number of players used in the query, i.e., Q = P 1 i m qi.

Deﬁnition 16 An over-query is a query with n < Q mn.

First, we present a simple lower bound. Then, we present an algorithm, Algorithm 2, that uses over-queries. Finally, we extend Algorithm 2 to Algorithm 3, which uses only normal queries.

5.1.1 Lower Bound

In the following construction, we show that, if there are two links, the querier can do no better than performing binary search in order to ﬁnd an equilibrium, which gives a lower bound of log(n) many queries.

Theorem 17 A querier must make log(n) queries to determine a pure equilibrium of a symmetric network congestion game played on parallel links.

Proof We ﬁx a graph G with two parallel links e1 and e2, and we ﬁx the cost of e2 so that fe2(i) = 1 for all i N. We consider functions fe1 that only return costs of 0 or 2. Since fe1 is non-decreasing, this implies that it will be a step function with a single step. We say that the step is at location i N if fe1(j) = 0 for all j i, and fe1(j) = 2 for all j > i. The precise location of the step will be decided by an adversary, in response to the queries that are received. The adversary s strategy maintains two integers ℓand u with ℓ< u, and initially the adversary sets ℓ= 0 and u = n. Intuitively, for all values below ℓthe adversary has ﬁxed fe1 to 0, and for all values above u the adversary has ﬁxed fe1 to 2. The range of values between u and ℓare yet to be ﬁxed, and all values in this range could potentially be the location of the step. Suppose that the adversary receives the query s. The adversary will respond with a pair (c1, c2), where c1 is the cost of e1, and c2 is the cost of e2. The adversary uses the following strategy:

Learning Equilibria of Games via Payoff Queries

If ne1(s) ℓ, then the adversary responds with (0, 1). If ne1(s) u, then the adversary responds with (2, 1).

If ne1(s) < u+ℓ

2 , that is, if ne1(s) is closer to ℓthan it is to u, then the adversary sets ℓ= ne1(s), and responds with (0, 1).

If ne1(s) u+ℓ

2 , that is, if ne1(s) is closer to u than it is to ℓ, then the adversary sets u = ne1(s), and responds with (2, 1).

Note that, if there exists an i with ℓ< i < u, then the querier cannot correctly determine the Nash equilibrium. This is because the step could be at location i, or it could be at location i 1. In the former case, the unique Nash equilibrium assigns i players to e1 and n i players to e2, and in the latter case the unique Nash equilibrium assigns i 1 players to e1 and n i + 1 players to e2. By construction, the adversary s strategy ensures that, in response to each query, the gap between u and ℓmay decrease by at most one half. Thus, the querier must make log(n) queries to correctly determine the Nash equilibrium.

Consider a one-player game with m links. Clearly, we can solve this game with a single over-query, but it requires m normal-queries. Thus we have the following:

Corollary 18 If over-queries are not allowed, then log(n) + m queries are required to determine a pure equilibrium of a symmetric network congestion game played on parallel links.

5.1.2 Upper Bound

In the rest of the section, we provide an upper bound, by constructing a payoﬀquery algorithm that ﬁnds a pure Nash equilibrium using O log(n) log2(m) log log(m) + m normal-queries. In order to simplify the presentation, we ﬁrst present an algorithm that makes use of over-queries; later we show how this can be translated into an algorithm that uses only normal-queries. Our algorithm is based on an algorithm from Gairing et al. (2008). Before we present the full algorithm, we give an overview of the techniques by describing a simpliﬁed version of the algorithm. The basic idea is to group the players into blocks, where all players in a block must play on the same link. In each round of the algorithm, we maintain the property that the blocks are in equilibrium: no block of players can collectively deviate in order to reduce their latency. Initially, we place all of the players into a single block, and then in each round of the algorithm, we split each block into smaller blocks, and compute a new equilibrium for the smaller block size. Eventually, the block size will be reduced to 1, and we recover a Nash equilibrium for the congestion game. In this simpliﬁed overview, we will assume that the number of players n is equal to 2i

for some i N, and in each round we will split each block in half. Our full algorithm will be more complicated, because it must deal with an arbitrary number of players, and it will split each block into more than two pieces. At the start of the algorithm, we place all n players into a single block. In order to ﬁnd an equilibrium for this block, we simply have to ﬁnd the link i [m] that minimizes fi(n). We can do this with a single over-query q = (n, n, . . . , n).

Fearnley, Gairing, Goldberg, and Savani

Now suppose that we have found an equilibrium s for block size δ. We split each block into two equal-sized pieces, and our task is to transform s into an equilibrium for block size δ/2 by moving blocks between the links. The key observation is that no link can receive two or more blocks of size δ/2, because this would contradict the fact that s is an equilibrium for block size δ. So, when we move blocks between the links, we know that each link can receive at most one block, and therefore each link can lose at most m 1 blocks. We can make a single over-query in order to discover the cost of adding one block of δ/2 players to each link: we simply query p = (n1(s) + δ/2, n2(s) + δ/2, . . . , nm(s) + δ/2). On the other hand, we also need to determine how many blocks each link loses, and a naive approach would use m queries. We now describe a method that uses only log2(m) under-queries. Suppose that we guess that q, where 0 q m, is the number of blocks that move. We give an algorithm that veriﬁes whether this guess is correct. Let c be the (q + 1)th smallest cost returned by the query p. For each link i, we determine qi, which is the number of δ/2-sized blocks that would want to move to a link with cost c. This can be done by binary search, in parallel for all links, using log(m) many under-queries. There are three possible outcomes:

If Pm i=1 qi = q, then our guess was correct, and exactly q blocks move.

If Pm i=1 qi < q, then our guess was too high, and fewer than q blocks move.

If Pm i=1 qi > q, then our guess was too low, and more than q blocks move.

Thus, to determine exactly how many blocks move between the links, we can use a nested binary search approach: in the outer level we guess how many blocks move, and in the inner level we use the above method to determine if our guess was too high or too low. Therefore, we have a method for constructing an equilibrium with block size δ/2 from an equilibrium with block size δ using log2(m) many queries. Since we start with block size n, and we halve the block size in every round, this gives us an algorithm that ﬁnds a Nash equilibrium using log(n) log2(m) many payoﬀqueries. In the rest of this section, we formalize this approach, and we deal with the issues that were ignored in this high level overview. In particular, we present an algorithm that works for any number of players n, and we obtain a slightly better query complexity by splitting each block into log(m) many pieces in each round.

5.1.3 The Algorithm With Over-Queries

The algorithm Parallel Links is depicted in Algorithm 2. We will show how this algorithm can be implemented with O log(n) log2(m) log log(m) queries. The integer k is a parameter to

the algorithm that determines the block size: in each round we consider blocks of size kt

for some t. To deal with the fact that n may not be an exact power of k, the algorithm will maintain a special link a. This link is deﬁned to be the link upon which all n players are placed at the start of the algorithm. Since every subsequent step of the algorithm only moves players in blocks of size kt for some t, link a will be the only link where the number of players is not a multiple of the block size.

Learning Equilibria of Games via Payoff Queries

We start by formalizing the notion of an equilibrium with respect to a certain block size. For a congestion game Γ, an integer δ, and a special link a we deﬁne a δ-equilibrium as follows:

Deﬁnition 19 (δ-equilibrium) A strategy proﬁle s is δ-equilibrium if δ|ni(s) for all i [m] \ {a}, and for all links i, j [m] with ni(s) δ we have fi(ni(s)) fj(nj(s) + δ).

Intuitively, we can think of a δ-equilibrium s as a Nash equilibrium in a transformed game where the players (of the original game) are partitioned into blocks of size δ and each block represents a player in the transformed game, and the remaining (n mod δ) players are ﬁxed to link a. We start with an informal description of algorithm Parallel Links. On Line 1 we initialize the algorithm by using one over-query to ﬁnd the cheapest link a, and assigning all n players to link a. Note that a is the special link, as discussed earlier. The algorithm then works in T + 1 phases, where T = log(n)

log(k) . Each phase is one iteration of the forloop. The for-loop is governed by a variable t, which is initially T and decreases by 1 in each iteration. Within any iteration, the algorithm uses the function Refine Profile to transform a kt+1-equilibrium into a kt-equilibrium. Recall, from the overview, that when k = 2, we observed that each link can receive at most one block when we transform a 2t+1-equilibrium into a 2t-equilibrium. In the following lemma, we establish a similar property for the case where k = 2: each link can receive at most 2k blocks. Intuitively, one might expect each link to receive at most k blocks, but the extra factor of two here arises due to the special link a, which was not considered in our simpliﬁed overview.

Lemma 20 We can convert a kt+1-equilibrium s into a kt-equilibrium s by moving at most 2k blocks of δ = kt players to any individual link and at most km blocks of δ players in total.

Proof Since s is kt+1-equilibrium, we have fi(ni(s)) fj(nj(s) + kt+1) for all i [m] \ {a}, j [m]. Moreover, either (a) fa(na(s)) fj(nj(s) + kt+1) for all j [m] or (b) na(s) < kt+1. In case (a), this implies that each link j [m] can in total receive at most k blocks of size δ = kt from links i [m]. In case (b), this implies that each link j [m] can in total receive at most k blocks of size δ = kt from links i [m] \ {a}. Moreover, since na(s ) < kt+1, we can move at most k blocks of size δ = kt from link a. In either case, in total we move at most km blocks. All links receive and lose players only in multiples of δ = kt, which ensures that kt|ni(s ) for all i [m] \ {a} is maintained.

Refine Profile determines the number of blocks q which have to be moved by binary search on q in [0, km]. Since, by Lemma 20, each link receives at most 2k blocks of players, we spend 2k over-queries to determine the cost function values fi(ni(s) + r δ) for all integers r 2k and all links i [m]. We deﬁne Q as the multi-set of these cost function values and Cmin(q) as the (q + 1)-th smallest value in Q. Intuitively, Cmin(q) is the cost of the (q + 1)-th block of players that we would move. We use Cmin(q) to ﬁnd out how many blocks of players qi we need to remove from each link i [m] so that on each link i [m] the cost is at most Cmin(q) or we can t remove any further blocks as there are less than δ players

Fearnley, Gairing, Goldberg, and Savani

Algorithm 2 Parallel Links

1: a arg mini [m] fi(n) 1 over-query

2: initialize strategy proﬁle s by putting all players on link a

3: T log(n)

4: for t = T, T 1, . . . , 1, 0 do

6: s Refine Profile(s, δ, 0, km)

8: return s

9: function Refine Profile(s, δ, qmin, qmax)

10: q qmin+qmax

11: Parallel for all links i [m]

12: Query for costs fi(ni(s) + rδ) for all integer 1 r 2k 2k queries

13: End Parallel

14: Q the ordered multiset of 2km non-decreasing costs from the above queries

15: Cmin(q) (q + 1)-th smallest element of Q

16: pi number of times i [m] contributes a cost to the q smallest elements of Q

17: Parallel for all links i [m]

18: if fi(ni(s) ni(s)

δ δ) > Cmin(q) then 1 query; only relevant for link a

19: qi ni(s)

20: else (using binary search on qi [0, min{km, ni(s)

21: qi min {qi : fi(ni(s) qiδ) Cmin(q)} log(km) queries

23: End Parallel

24: if P i [m] qi = q then

25: modify s by removing qi and adding pi blocks of δ players to every link i [m]

26: return s

27: else if P i [m] qi < q then

28: return Refine Profile(s, δ, qmin, q 1)

29: else (P i [m] qi > q)

30: return Refine Profile(s, δ, q + 1, qmax)

32: end function

Learning Equilibria of Games via Payoff Queries

assigned to it (which can only happen on link a). By Lemma 20, we need to remove at most km blocks of players in total. Therefore, we can determine qi [0, min{km, ni(s)

δ }] by binary search in parallel on all links, with O(log(km)) under-queries. Now, if Pm i=1 qi = q, we can construct a kt-equilibrium by removing qi and adding pi blocks of δ players to link i [m]; note that for every i [m], either qi = 0 or pi = 0. If Pm i=1 qi = q, our guess for q was not correct and we have to continue the binary search on q. The algorithm maintains the following invariant:

Lemma 21 Refine Profile(s, δ, 0, km) returns a δ-equilibrium.

Proof Observe that δ = kt. In the ﬁrst iteration of the for-loop t = T and Refine Profile(s, δ, 0, km) gets a n-equilibrium as input, which is also a k T+1-equilibrium as all players are assigned to link a and k T+1 > n. So to prove the claim, it suﬃces to show that Refine Profile(s, kt, 0, km) returns a kt-equilibrium if s is a kt+1-equilibrium. For the s returned by Refine Profile and the q in its returning call, we have fi(ni(s)) Cmin(q) fi(ni(s) + δ) for all i [m] \ {a}. The left inequality follows from line 21 of the algorithm. The right inequality follows from the deﬁnition of Cmin(q) as the (q+1)-th smallest element in Q in line 15 of the algorithm. For link a, we have fa(na(s)) Cmin(q) fa(na(s) + δ) or we have fa(na(s)) > Cmin(q) and na(s) < δ, where the ﬁrst case follows from lines 21 and 15 as before, and the second case corresponds to line 18. Noting that Refine Profile maintains that for the returned s we have δ|ni(s) for all i [m] \ {a}, as it only moves blocks of size δ, the claim follows.

We now give the payoﬀquery complexity of Refine Profile. We split our analysis into over-queries and non-over-queries (i.e., under-queries or normal-queries), because we will later show how the over-queries made by our algorithm can be translated into a sequence of non-over-queries.

Lemma 22 Refine Profile(s, δ, 0, km) can be implemented to make 2k over-queries and O(log2(km)) non-over-queries.

Proof Note that, as long as δ is not changed, the queries made on line 12 are the same for each pair of qmin and qmax. Therefore, we can perform these 2k over-queries when we ﬁrst call Refine Profile(s, δ, 0, km), and reuse these values during each recursive call. For each value of q in the binary search, we make O(log(km)) under-queries to determine the qi s in parallel for all links i [m]. The binary search on q adds a factor log(km) to give O(log2(km)) under-queries in total.

Using Lemmas 21 and 22 we can prove the following.

Theorem 23 Algorithm Parallel Links returns a pure Nash equilibrium and can be implemented with O log(n) log2(m) log log(m) queries, of which 2k log n log log m are over-queries.

Proof In the last iteration of the for-loop, we have δ = 1, so Lemma 21 implies that s is a pure Nash equilibrium. To ﬁnd the best link in line 1 of the algorithm, we need one

Fearnley, Gairing, Goldberg, and Savani

over-query. For any k 2, the algorithm does T + 1 = O log(n) log(k) iterations of the for-

loop. In each iteration we do O(log2(km)) under-queries and 2k over-queries. Choosing k = Θ(log(m)) yields the stated upper bound.

5.1.4 Using Only Normal-Queries

We now show how Algorithm 2 can be implemented without the use of over-queries. Before doing so, we remark that in the parallel links setting, we can also avoid using under-queries.

Lemma 24 If a parallel links congestion game has at least two links, then every underquery can be translated into two normal-queries.

Proof Suppose that the game has m 2 links, and let q = (i1, i2, . . . , im) be an underquery. Let n = Pm j=1 ij be the total number of players used by q. We deﬁne the following queries:

q1 = (i1 + n n , i2, . . . , im),

q2 = (i1, i2, . . . , im + n n ).

Clearly both q1 and q2 are normal-queries. Query q1 tells us the cost of links 2 through m under q, and query q2 tells us the cost of link 1 under q.

We now turn our attention to over-queries. The following lemma gives a general method for translating over-queries into non-over-queries.

Lemma 25 Suppose we have a parallel links game with m links and n players. Let q = (i1, i2, . . . , im) be an over-query, and deﬁne n = Pm j=1 ij. We can translate q into a sequence of O(n /n) non-over-queries.

Proof Consider the following greedy algorithm: ﬁnd the smallest index b such that P 1 k b ik n and assign links 1 through b to query q1. Set i1 = i2 = = ib = 0, and repeat. Clearly each query that we generate during this algorithm is a non-over-query. Let q1, q2, . . . , ql be the sequence of non-over-queries generated by the above algorithm for some l N. For each j, let nj be the total number of players used by qj, and observe that P 1 j l nj = n . Furthermore, for each j, let rj = n nj be the total number of players not used by qj. Due to the nature of our algorithm, for every j > 1 we must have rj 1 < nj, since the ﬁrst link assigned to qj would not ﬁt in qj 1. Thus, we have:

1 j l rj < X

2 j l nj + r1

Learning Equilibria of Games via Payoff Queries

Since the total number of queries in the sequence is l, we can argue that:

1 j l (nj + rj)

< n + n + n

Thus, our greedy algorithm generates at most O(n /n) non-over-queries.

In order to optimize the number of non-over queries we have to adjust Algorithm 2 slightly, because with k = Θ(log(m)) in early iterations of the for loop, i.e., when T is large, the number of players used in the over queries in line (12) is large and applying Lemma 25 would yield to a total of O log(n) log2(m) log log(m) + m log(m) non-over queries. In contrast, we will now show that our adjusted Algorithm 3 can be implemented to do at most O log(n) log2(m) log log(m) + m non-over queries. The main idea is to divide the block size by 2 until the number of players in a block is small enough and then switch to k = Θ(log(m)).

Algorithm 3 Parallel Links avoiding over-queries

1: a arg mini [m] fi(n) 1 over-query

2: initialize strategy proﬁle s by putting all players on link a

3: T j log(n/m)

4: T0 largest t such that k T 2t < n

5: for t = T0, T0 1, . . . , 1 do

6: δ k T 2t

7: s Refine Profile(s, δ, 0, 2m)

9: for t = T, T 1, . . . , 1, 0 do

11: s Refine Profile(s, δ, 0, km)

12: end for

13: return s

To initialize our algorithm, we make an over-query that uses m n players. By Lemma 25, we can translate this into O(m) non-over-queries. In each iteration of the ﬁrst for-loop with value t, by Lemma 22, we make O(1) overqueries. Each of these uses at most n + m 4 k T 2t players. By Lemma 25, these can be simulated by O(1 + mk T 2t

n ) non-over-queries. Summing up over all iterations and using the deﬁnition of T0, we can argue that all over-queries of the ﬁrst for-loop can be simulated by

t=1 O 1 + mk T 2t

= O(T0) + O mk T 2T0

Fearnley, Gairing, Goldberg, and Savani

non-over-queries. In each iteration of the second for-loop with value t, by Lemma 22, we make make 2k over-queries that each use at most n + m 2k kt players. By Lemma 25, these can be simulated by O(mkt+1

n ) non-over-queries. Summing up over all iterations, we can argue that all over-queries of the second for-loop can be simulated by

t=0 O mkt+1

log(n) log(m)+log(k)

log(n) log(k)

non-over-queries. Combining this discussion with Theorem 23, we get the following result:

Theorem 26 Algorithm 3 returns a pure Nash equilibrium and can be implemented with O log(n) log2(m) log log(m) + m queries.

The upper bound in Theorem 26 should be contrasted with the lower bound of log(n)+m (Corollary 18).

5.2 Symmetric Network Congestion Games on Directed Acyclic Graphs

In this section, we consider symmetric network congestion games on directed acyclic graphs. Throughout this section, we consider the game Γ = (N, V, E, (fe)e E, o, d), where (V, E) is a directed acyclic graph (DAG). We use the relation to denote a topological ordering over the vertices in V . We assume that, for every vertex v V , there exists a path from o to v, and there exists a path from v to d. If either of these conditions does not hold for some vertex v, then v cannot appear on an o-d path, and so it is safe to delete v. We provide an algorithm that discovers a cost function for each edge. One immediate observation is that we can never hope to ﬁnd the actual cost functions. Consider the following one-player congestion game.

If we set fa(1) = fb(1) = 1 and fc(1) = fd(1) = 0, then all o-d paths have cost 1. However, we could also achieve the same property by setting fa(1) = fb(1) = 0 and setting fc(1) = fd(1) = 1. Thus, it is impossible to learn the actual cost functions using payoﬀqueries. To deal with this issue, we introduce the notion of an equivalent cost function: two cost functions are said to be equivalent if they assign the same cost to every strategy proﬁle. We show that, while it is impossible to ﬁnd the actual cost function via payoﬀqueries, we can use payoﬀqueries to ﬁnd an equivalent cost function.

Learning Equilibria of Games via Payoff Queries

Our algorithm proceeds inductively over the number of players in the game. For the base case, we give an algorithm that ﬁnds an equivalent cost function f such that f e(1) is deﬁned for every edge e. This corresponds to learning all the costs in a one-player congestion game played on Γ. Then, for the inductive step, we show how the costs for an i-player game can be used to ﬁnd the costs in an i + 1 player game. That is, we use the known values of f e(j) for j i to ﬁnd the cost of f e(i + 1) for every edge e. Therefore, at the end of the algorithm, we have an equivalent cost function f for an n-player game on Γ, and we can then apply a standard congestion game algorithm (Fabrikant et al., 2004) in order to solve our game. Unlike our work on parallel links, in this section we will not use over-queries at all. In each inductive step, when we are considering an i-player congestion game, we will make queries that use exactly i players. Thus, in the ﬁrst n 1 rounds we will use under-queries, and in the ﬁnal round we will use normal-queries. For the sake of brevity, in this section we will use the word query to refer to both normal and under-queries. As a shorthand for deﬁning queries, we use notation of the form s (1 7 p, 3 7 q). This example deﬁnes s to be a four-player query that assigns 1 player to p and 3 players to q, where p and q are paths from the origin to the destination in a symmetric network congestion game. We use Query(s) to denote the outcome of querying s. It returns a function cs, which gives the cost of each strategy when s is played.

5.2.1 Preprocessing

Our algorithm requires a preprocessing step. We say that edges e and e are dependent if visiting one implies that we must visit the other. More formally, e and e are dependent if, for every o-d path p, we either have e, e p, or we have e, e / p. We preprocess the game to ensure that there are no pairs of dependent edges. To do this, we check every pair of edges e and e , and test whether they are dependent. If they are, then we contract e , i.e., if e = (v, u), then we delete e , and set v = u. The following lemma shows that this preprocessing is valid, and therefore, from now on, we can assume that our congestion game contains no pair of dependent edges.

Lemma 27 There is an algorithm that, given a congestion game Γ, where (V, E) is a DAG, produces a game Γ with no pair of dependent edges, such that every Nash equilibrium of Γ

can be converted to a Nash equilibrium of Γ. The algorithm and conversion of equilibria take polynomial time and make zero payoﬀqueries. Moreover, payoﬀqueries to Γ can be trivially simulated with payoﬀqueries to Γ.

Proof Our algorithm will check, for each pair of edges e = (v, u) and e = (v , u ), whether e and e are dependent. This is done in the following way. Note that if v = v , then e and e

cannot possibly be dependent. Thus, we can assume without loss of generality that v v . The algorithm performs two checks:

Delete e and verify that there is no path from o to v .

Delete e and verify that there is no path from u to d.

The ﬁrst check ensures that every path that uses e must also use e. The second check ensures that every path that uses e must also use e . Thus, if both checks are satisﬁed,

Fearnley, Gairing, Goldberg, and Savani

then e and e are dependent. On the other hand, if one of the checks is not satisﬁed, then we can construct an o-d path that uses e and not e , or a path that uses e and not e, which veriﬁes that e and e are not dependent. Whenever the algorithm ﬁnds a pair of edges e, e E that are dependent, it contracts e . More formally, if e = (v, u), then the algorithm constructs a new congestion game Γ = (N, V , E , (f e)e E , o, d) where V = V \ {u}, and E contains:

every edge (w, x) E with and w = u, and

an edge (v, x) for every edge (u, x) E.

Note that E does not contain e . Moreover, we deﬁne the cost functions f as follows. For each edge e = e, we set f e (i) = fe (i) for all i. For the edge e, we deﬁne f e(i) = fe(i)+fe (i) for all i. We argue that this operation is correct. Since e and e are dependent, we have that, for every strategy proﬁle s, and for every o-d path p: X

e p f e (i) = X

e p fe (i).

Therefore, we can easily translate every Nash equilibrium of Γ into a Nash equilibrium for Γ. Moreover, every payoﬀquery for Γ can be translated into a payoﬀquery for Γ by adding the edge e where appropriate. Thus, the algorithm constructs a sequence of games Γ1, Γ2, . . . , where each game Γi+1 is obtained by contracting an edge in Γi. Moreover, the Nash equilibria for Γi+1 can be translated to Γi, which implies that the algorithm is correct. This algorithm can obviously be implemented in polynomial time. Moreover, since the algorithm only inspects structural properties of the graph, it does not make any payoﬀqueries.

5.2.2 Equivalent Cost Functions

As we have mentioned, we cannot hope to ﬁnd the actual cost function of Γ using payoﬀ queries. To deal with this, we introduce the following notion of equivalence.

Deﬁnition 28 (Equivalence) Two cost functions f and f are equivalent if for every strategy proﬁle s = (s1, s2, . . . , sn), we have P e si fe(ne(s)) = P e si f e(ni(s)), for all i.

Clearly, the Nash equilibria of a game cannot change if we replace its cost function f with an equivalent cost function f . We say that (f e)e E is a partial cost function if for some e E and some i n, f e(i) is undeﬁned. We say that f is an extension of f if f is a partial cost function, and if f e (i) = f e(i) for every e E and i n for which f e(i) is deﬁned. We say that f is a total extension of f if f is an extension of f , and if f e (i) is deﬁned for all e E and all i n.

Deﬁnition 29 (Partial equivalent cost function) Let f be a cost function. We say that f is a partial equivalent of f if f is a partial cost function, and if there exists a total extension f of f such that f is equivalent to f.

Learning Equilibria of Games via Payoff Queries

Our goal is to ﬁnd a total equivalent cost function by learning the costs one edge at a time. Thus, our algorithm will begin with a partial cost function f0 such that f0 e (i) is undeﬁned for all e E and all i n. Since it is undeﬁned everywhere, it is obvious that f0 is a partial equivalent of f. At every step of the algorithm, we will take a partial equivalent cost function f of f, and produce an extension f of f , such that f is still a partial equivalent of f. This guarantees that, when the algorithm terminates, the ﬁnal cost function is equivalent to f.

5.3 The One-Player Case

For the one player case, our algorithm is relatively straightforward. The algorithm proceeds iteratively by processing the vertices according to their topological order, starting from the origin vertex o, and moving towards the destination vertex d. Each time we process a vertex k, we determine the cost of every incoming edge (u, k). There are two diﬀerent cases: the case where k = d, and the case where k = d. For the latter case, we will observe that, once we know the cost of every edge other than the incoming edges to d, we can easily ﬁnd the cost of the incoming edges to d. The former case is slightly more complicated. When we consider a vertex k = d, it turns out that we cannot ﬁnd the actual costs for the incoming edges at k. Instead, we can use payoﬀqueries to discover the diﬀerence in cost between each pair of incoming edges, and therefore, we can ﬁnd the cheapest incoming edge e to k. We proceed by ﬁxing the cost of e to be 0. Once we have done this, we can then set the cost of each other incoming edge e according to the diﬀerence between the cost of e and the cost of e , which we have already discovered. We prove that this approach is correct by showing that it yields a partial equivalent cost function. We now formally describe our algorithm. The algorithm begins with the partial cost function f0. The algorithm processes vertices iteratively according to the topological ordering . Suppose that we are in iteration a + 1 of the algorithm, and that we are processing a vertex k V . We have a partial equivalent cost function fa such that fa e (1) is deﬁned for every edge e = (v, u) with u k, for some vertex k. We then produce a partial equivalent cost function fa+1 such that fa+1 e (1) is deﬁned for every edge e = (v, u) with u k. We now consider the two cases.

5.3.1 The k = d Case

We use the procedure shown in Algorithm 4 to process k. Lines 1 through 3 simply copy the old cost function fa into the new cost function fa+1. This ensures that fa+1 is an extension of fa. The algorithm then picks an arbitrary k-d path p. The loop on lines 5 through 10 compute the function t, which for each incoming edge e = (v, k), gives the cost t(ep) of allocating one player to ep. Note, in particular, that the value of the expression P e p fa e (1) is known to the algorithm, because every vertex visited by p has already been processed. The algorithm then selects e to be the edge that minimizes t, and sets the cost of e to be 0. Once it has done this, lines 13 through 15 compute the costs of the other edges relative to e . When we set the cost of e to be 0, we are making use of equivalence. Suppose that the actual cost of e is ce . Setting the cost of e to be 0 has the following eﬀects:

Fearnley, Gairing, Goldberg, and Savani

Algorithm 4 Process K Input: A partial equivalent cost function fa, such that fa e (1) is deﬁned for all edges (v, u) with u k. Output: A partial equivalent cost function fa+1, such that fa+1 e (1) is deﬁned for all edges (v, u) with u k.

1: for all e for which fa e (1) is deﬁned do

2: fa+1 e (1) fa e (1)

4: p an arbitrary k-d path

5: for all e = (v, k) E do

6: p an arbitrary o-v path

7: s (1 7 p ep)

8: cs Query(s)

9: t(ep) cs(p ep) P e p fa e (1)

10: end for

11: e edge e = (v, k) that minimizes t(ep)

12: fa+1 e (1) 0

13: for all e = (v, k) E with e = e do

14: fa+1 e (1) t(ep) t(e p)

15: end for

Every incoming edge at k has its cost reduced by ce .

Every outgoing edge at k has its cost increased by ce .

This maintains equivalence with the original cost function, because for every path p that passes through k, the total cost of p remains unchanged. The following lemma formalizes this and proves that fa+1 is indeed a partial equivalent cost function.

Lemma 30 Let k = d be a vertex, and let fa be a partial equivalent cost function such that fa e (1) is deﬁned for all edges e = (v, u) with u k. When given these inputs, Algorithm 4 computes a partial equivalent cost function fa+1 such that fa+1 e (1) is deﬁned for all edges e = (v, u) with u k.

Proof It can be veriﬁed that the algorithm assigns a cost to fa+1 e (1) for every edge e = (v, u) with u k. To complete the proof of the lemma, we must show that fa+1 is a partial equivalent cost function. Since fa is a partial equivalent cost function, there must exist a total extension of fa that is equivalent to f. Let f denote such an extension. We use f to construct f , which is a total extension of fa+1 that is equivalent to f. Let e = (v, k) be an incoming edge at k. We begin by deriving a formula for t(ep), which is computed on line 9. Note that, since f is equivalent to f, we have cs(p ep) = P e p ep f e (1). Note also that f e (1) = fa e (1) for every edge e p . Therefore, we have the

Learning Equilibria of Games via Payoff Queries

t(ep) = cs(p ep) X

e p fa e (1)

e p ep f e (1) X

e p f e (1)

e ep f e (1).

For each edge e = (v, k) with e = e , line 14 sets:

fa+1 e (1) = t(ep) t(e p)

e ep f e (1) X

e e p f e (1)

= f e(1) f e (1).

Note also that line 12 sets:

fa+1 e (1) = 0 = f e (1) f e (1).

Hence, we can conclude that fa+1 e (1) = f e(1) f e (1) for every incoming edge e = (v, k). We construct the total cost function f as follows. For every edge e = (v, u), and every i n, we set:

f e(i) f e (1) if u = k, f e(i) + f e (1) if v = k, f e(i) otherwise.

Since we have shown that fa+1 e (1) = f e(1) f e (1) for every incoming edge e = (v, k), we have that f e (1) is a total extension of fa+1. We must now show that f e and f are equivalent. We will do this by showing that f

and f are equivalent. Let s = (s1, s2, . . . , sn) be an arbitrarily chosen strategy proﬁle. If si does not visit k, then we have: X

e si f e (ne(s)) = X

e si f e(ne(s)).

On the other hand, if si does visit k, then it must use exactly one edge (v, u) with u = k, and exactly one edge (v, u) with v = k. Therefore, we have: X

e si f e (ne(s)) = X

e si f e(ne(s)) f e (1) + f e (1)

e si f e(ne(s)).

Therefore, f is equivalent to f , which also implies that it is equivalent to f. Thus, we have found a total extension of fi+1 that is equivalent to f, as required.

Fearnley, Gairing, Goldberg, and Savani

5.3.2 The k = d Case

When the algorithm processes d, it will have a partial cost function fa such that fa e (1) is deﬁned for every edge e = (v, u) with u = d. The algorithm is required to produce a partial cost function fa+1 such that fa+1 e (1) is deﬁned for all e E. We use Algorithm 5 to do this. Lines 1 through 3 ensure that fa+1 is equivalent to fa. Then, the algorithm loops

Algorithm 5 Process D Input: A partial equivalent cost function fa, such that fa e (1) is deﬁned for all edges e = (v, u) with u d. Output: A partial equivalent cost function fa+1, such that fa e (1) is deﬁned for all edges e E.

1: for all e for which fa e (1) is deﬁned do

2: fa+1 e (1) fa e (1)

4: for all e = (v, d) E do

5: p an arbitrary o-v path

6: s (1 7 pe)

7: cs Query(s)

8: fa+1 e (1) cs(pe) P e p fa e (1)

through each incoming edge e = (v, d), and line 8 computes fa+1 e (1). Note, in particular, that fa e (1) is deﬁned for every edge e p, and thus the computation on line 8 can be performed. Lemma 31 shows that Algorithm 5 is correct.

Lemma 31 Let k = d be a vertex, and let fa be a partial equivalent cost function deﬁned for all edges (v, u) with u d. When given these inputs, Algorithm 5 computes a partial equivalent cost function fa+1.

Proof Since fa is a partial equivalent cost function, there must exist a cost function f

that is an extension of fa, where f is equivalent to f. We show that f is also an extension of fa+1. Let e = (v, d) be an incoming edge at d. Consider line 8 of the algorithm. Note that, since f is equivalent to f, we have cs(pe) = P e pe f e (1). Furthermore, since f is an extension of fa+1, we have fa e (1) = f e (1) for every e p. Therefore, we have:

fa+1 e (1) = cs(pe) X

e p fa e (1)

e pe f e (1) X

e p f e (1)

We also have fa+1 e (1) = f e(1) for every edge e = (v, u) with u d, and we have shown that fa+1 e (1) = f e(1) for every edge e = (v, u) with u = d. Therefore f is an extension of fa+1, which implies that fa+1 is a partial equivalent cost function.

Learning Equilibria of Games via Payoff Queries

The algorithm makes exactly |E| payoﬀqueries in order to ﬁnd the one-player costs. When Algorithm 4 processes a vertex k, it makes exactly one query for each incoming edge (v, k) at k. The same property holds for Algorithm 5. This implies that, in total, the algorithm makes |E| queries.

5.4 The Many-Player Case

In this section, we will assume that we have a partial equivalent cost function fa such that fa e (j) is deﬁned whenever j i. We will give an algorithm that goes through a sequence of iterations and produces a partial cost function fa , such that fa e (j) is deﬁned whenever j i + 1. The algorithm for the many-player case proceeds in a similar fashion to the algorithm for the one-player case. The algorithm is still iterative, and it still processes vertices according to their topological order, starting from the origin o, and moving towards the destination d. In this algorithm, when we process a vertex k, we will discover, for each incoming edge e to k, the cost of placing i + 1 players on e. However, there is an additional complication. Our technique for discovering the cost of placing i + 1 players on the incoming edge at k requires two edge disjoint paths from k to d, but there is no reason at all to assume that two such paths exist. We say that an edge e is a bridge between two vertices v and u, if every v-u path contains e. Furthermore, if we ﬁx a vertex k V , then we say that an edge e is a k-bridge if e is a bridge between k-d. The following lemma can be proved using the max-ﬂow min-cut theorem and is a variant of Menger s theorem.

Lemma 32 Let v and u be two vertices. There are two edge disjoint paths between v and u if, and only if, there is no bridge between v and u.

Proof Let (V, E) be a graph, and let v, u V be two vertices. We construct a network ﬂow instance where every edge e E has capacity 1, and we ask for the maximum ﬂow between v and u. Since each edge has capacity 1, we have that the maximum ﬂow between v and u is greater than 1 if, and only if, there are two edge-disjoint paths between v and u. Moreover, by the max-ﬂow min-cut theorem, the maximum ﬂow from v to u is greater than 1 if and only if there is no bridge between v and u.

As a consequence of Lemma 32, we can only process k if there are no k-bridges. To resolve this, before attempting to process k, we ﬁrst use a separate algorithm to determine the cost of placing i + 1 players on each k-bridge. After doing this, we can then ﬁnd two k-d paths that are edge disjoint except for k bridges. This, combined with the fact that we know the cost of placing i+1 players on each k-bridge, is suﬃcient to allow us to process k. The remainder of this section will proceed as follows. We ﬁrst describe our algorithm for ﬁnding the costs of the k bridges. After doing so, we then describe our algorithm for processing k.

Fearnley, Gairing, Goldberg, and Savani

5.4.1 Bridges

Given a vertex k, we show how to determine the cost of the k-bridges. Let b1, b2, . . . , bm denote the list of k-bridges sorted according to the topological ordering . That is, if b1 = (v1, u1), and b2 = (v2, u2), then we have v1 v2, and so on. Our algorithm is given a partial cost function fa, such that fa e (j) is deﬁned for all j i, and returns a cost function fa+1 that is an extension of fa where, for all ℓ, we have that fa+1 bℓ (i + 1) is deﬁned. Our algorithm processes the k-bridges in reverse topological order, starting with the ﬁnal bridge bm. Suppose that we are processing the bridge bj = (v, u). We will make one payoﬀquery to ﬁnd the cost of bj, which is described by the following diagram.

o k v u d bj

The dashed lines in the diagram represent paths. They must satisfy some special requirements, which we now describe. The paths p4 and p5 must be edge disjoint, apart from k-bridges. The following lemma shows that we can always select two such paths.

Lemma 33 For each k-bridge bj = (v, u), there exists two paths p4 and p5 from u to d such that p4 p5 = {bj+1, bj+2, . . . bm}.

Proof Note that for each ℓ, there cannot exist a bridge between bℓand bℓ+1. Therefore, we can apply Lemma 32 to argue that there must exist two edge-disjoint paths between bℓ and bℓ+1 For the same reason, we can ﬁnd two edge-disjoint paths between bm and d. To complete the proof, we simply concatenate these paths.

On the other hand, the paths p1, p2, and p3 must satisfy a diﬀerent set of constraints, which are formalized by the following lemma.

Lemma 34 Let bj = (v, u) be a k-bridge, let p2 be an arbitrarily chosen o-k path. There exists an o-k path p1 and a k-v path p3 such that: p1 and p3 are edge disjoint; and if p1 visits k, then p2 and p1 use diﬀerent incoming edges for k.

Proof We show how p1 and p3 can be constructed. This splits into two cases, and we begin by considering the bridges bj with j > 1. Due to our preprocessing from Lemma 27, bj and bj 1 cannot be dependent. Note that every o-d path that uses bj 1 must also use bj. Therefore, there must exist an o-d path p that uses bj and not bj 1. We ﬁx p1 to be the preﬁx of p up to the point where it visits bj. Let p 3 be an arbitrarily selected path from k to bj 1. Note that p1 cannot share an edge with p 3, because otherwise p1 would be forced to visit bj 1. We now show how p 3 can be extended to reach bj without intersecting p1. Since there are no bridges between bj 1 and bj, we can apply Lemma 32 to obtain two edge-disjoint paths q and q from bj 1 to bj. If one of these paths does not intersect with p1, then we are done. Otherwise suppose, without loss of generality, that p1 intersects with q before it intersects with q . We create a path p 1 that follows p1 until the ﬁrst intersection with q, and

Learning Equilibria of Games via Payoff Queries

follows q after that. Since q and q are disjoint, the paths p 1 and p 3q satisfy the required conditions. Now we consider the bridge b1. If k has at least two incoming edges, then we can apply Lemma 32 to ﬁnd two edge disjoint paths from k to b1, and we can easily construct p1 and p3 using these paths. Otherwise, let e be the sole incoming edge at k. Since e and b1 are not dependent, we can ﬁnd a path p1 from o to b1 which does not use e, and we can use the same technique as we did for j > 1 to ﬁnd a path p3 from k to b1 that does not intersect with p1.

Algorithm 6 Find KBridges(k) Input: A vertex k, and a partial equivalent cost function fa, such that fa e (j) is deﬁned for every j i. Output: A partial equivalent cost function fa+1, such that fa+1 is an extension of fa, and fa+1 e is deﬁned for every e that is a k bridge.

1: for all e and j for which fa e (j) is deﬁned do

2: fa+1 e (j) fa e (j)

4: for j = m to 1 do

5: p4, p5 paths chosen according to Lemma 33

6: p1, p2, p3 paths chosen according to Lemma 34

7: s (1 7 p1bjp4, i 7 p2p3bjp5)

8: cs Query(s)

9: fa+1 bj (i + 1) cs(p1bjp4) P e p1 fa+1 e (ne(s)) P e p4 fa+1 e (ne(s))

10: end for

Algorithm 6 shows how the cost of placing i + 1 players on each of the k-bridges can be discovered. Note that on line 9, since s assigns one player to p1, we have ne(s) = 1 for every e p1. Therefore, fa+1 e (ne(s)) is known for every edge e p1. Moreover, for every edge e p4, we have that ne(s) = i+1 if e is a k-bridge, and we have ne(s) = 1, otherwise. Since the algorithm processes the k-bridges in reverse order, we have that fa+1 e (ne(s)) is deﬁned for every edge e p4. The following lemma shows that line 9 correctly computes the cost of bj.

Lemma 35 Let k be a vertex, and let fa be a partial equivalent cost function, such that fa e (j) is deﬁned for every j i. Algorithm 6 computes a partial equivalent cost function fa+1, such that fa+1 is an extension of fa, and fa+1 e is deﬁned for every e that is a k-bridge.

Proof It can be veriﬁed that the algorithm constructs a partial cost function fa+1 that is an extension of fa, where fa+1 e is deﬁned for every e that is a k-bridge. We must show that fa+1 is partially equivalent to f. Since fa is partially equivalent to f, there exists some total cost function f that is an extension of fa, such that f is equivalent to f. We will show that f is also an extension of fa+1. We will do so inductively. The inductive hypothesis is that fa+1 e (i + 1) = f e(i + 1) for every e = bl with ℓ> j. The base case, where j = m, is trivial, because there are no k-bridges bl with ℓ> m. Now suppose that we have shown the inductive hypothesis for

Fearnley, Gairing, Goldberg, and Savani

some j. We show that fa+1 bj (i + 1) = f bj(i + 1). Let s be the strategy queried when the algorithm considers bj. Consider an edge e p1. By Lemma 34, we have that ne(s) = 1. By assumption, we have that fa+1 e (1) = fa e (1) for every edge e, and therefore fa+1 e (ne(s)) = f e(ne(s)) for every edge e p1. Now consider an edge e p4. By Lemma 33, we have that ne(s) = 1 whenever e is not a k-bridge, and we have ne(s) = i + 1 whenever e is a k-bridge. Therefore, by the inductive hypothesis, we have that fa+1 e (ne(s)) = f e(ne(s)) for every e p4. Since f is equivalent to f, we have that cs(p1blp3) = P e p1blp3 f e. Therefore, line 9 sets:

fa+1 bj (i + 1) = cs(p1bjp4) X

e p1 fa+1 e (ne(s)) X

e p4 fa+1 e (ne(s))

e p1bjp4 f e(ne(s)) X

e p1 f e(ne(s)) X

e p4 f e(ne(s))

= f bj(ne(s)) = f bj(i + 1).

Thus, the algorithm correctly sets fa+1 bj (i + 1) = f bj(i + 1).

5.4.2 Incoming Edges of k

We now describe the second part of the many-player case. After ﬁnding the cost of each k-bridge, we ﬁnd the cost of each incoming edge at k. The following diagram describes how we ﬁnd the cost of e = (v, k), an incoming edge at k .

o v k d e p

The path p is an arbitrarily chosen path from o to v. The paths p1 and p2 are chosen according to the following lemma.

Lemma 36 There exist two k-d paths p1, p2 such that every edge in p1 p2 is a k-bridge.

Proof Let b1 be the ﬁrst k-bridge. By Lemma 32 there exists edge disjoint paths from k to b1. The proof can then be completed by applying Lemma 33.

Algorithm 7 shows how we ﬁnd the cost of putting i + 1 players on each edge e that is incoming at k. Apart from the consideration of k-bridges, this algorithm uses the same technique as Algorithm 4. Consider line 9. Note that every vertex in p is processed before k is processed, and therefore fa+1 e (i+1) is known for every e p. Moreover, for every edge e p1, we have that ne (s) = i + 1 if e is a k-bridge, and we have ne (s) = 1 otherwise. In either case, the fa+1 e (ne (s)) is known for every edge e p1. The following lemma show that line 9 correctly computes fa+1 e (i + 1).

Learning Equilibria of Games via Payoff Queries

Algorithm 7 Multi Process K Input: A vertex k, and a partial equivalent cost function fa, such that fa e (j) is deﬁned for all e E when j i, all e = (v, u) with u k when j = i + 1, and all k-bridges when j = i + 1. Output: A partial equivalent cost function fa, such that fa e (j) is deﬁned for all e E when j i, and for all e = (v, u) with u k when j = i + 1.

1: for all e and j for which fa e (j) is deﬁned do

2: fa+1 e (j) fa e (j)

4: for all e = (v, k) E do

5: p an arbitrary o-v path

6: p1, p2 paths chosen according to Lemma 36

7: s (1 7 pep1, i 7 pep2)

8: cs Query(s)

9: fa+1 e (i + 1) cs(pep1) P e p fa+1 e (i + 1) P e p1 fa+1 e (ne (s)).

10: end for

Lemma 37 Let k be a vertex, and let fa be a partial equivalent cost function, such that fa e (j) is deﬁned for all e E when j i, all e = (v, u) with u k when j = i + 1, and all k-bridges when j = i + 1. Algorithm 7 produces a partial equivalent cost function fa+1, such that fa+1 e (j) is deﬁned for all e E when j i, and for all e = (v, u) with u k when j = i + 1.

Proof It can be veriﬁed that the algorithm constructs a partial cost function fa+1 that is deﬁned for the correct parameters. We must show that fa+1 is partially equivalent to f. Note that fa+1 is an extension of fa. Since fa is partially equivalent to f, there exists some total cost function f that is an extension of fa, such that f is equivalent to f. We will show that f is also an extension of fa+1. Let e = (v, k) be an incoming edge at k. We will show that fa+1 e (i + 1) = f e(i + 1). Let s be the strategy that the algorithm queries while processing e. Since f is equivalent to f, we have that cs(pep1) = P e pep1 f e (ne (s)). For every edge e p1, we have ne (s) = i + 1. Since every vertex w visited by p satisﬁes w k, for every e p1 we must have fa+1 e (ne (s)) = fa e (ne (s)) = f e (ne (s)). For every edge e p1, we have ne (s) = 1 if e is not a k-bridge, and we have ne (s) = i + 1 if e is a k-bridge. In either case, we have that fa+1 e (ne (s)) = fa e (ne (s)) = f e (ne (s)) for every edge e p1. Therefore, line 9 sets:

fa+1 e (i + 1) = cs(pep1) X

e p fa+1 e (ne (s)) X

e p1 fa+1 e (ne (s))

e pep1 f e (ne (s)) X

e p f e (ne (s)) X

e p1 f e (ne (s))

= f e(ne(s)) = f e(i + 1).

Therefore, for each incoming edge e = (v, k), we have that fa+1 e (i + 1) = f e(i + 1). Hence, f is an extension of fa+1, which implies that fa+1 is partially equivalent to f.

Fearnley, Gairing, Goldberg, and Savani

5.4.3 Query Complexity

We argue that the algorithm can be implemented so that the costs for (i + 1) players can be discovered using at most |E| many payoﬀqueries. Every time Algorithm 6 discovers the cost of placing i + 1 players on a k-bridge, it makes exactly one payoﬀquery. Every time Algorithm 7 discovers the cost of an incoming edge (v, k), it makes exactly one payoﬀ query. The key observation is that the costs discovered by Algorithm 6 do not need to be rediscovered by Algorithm 7. That is, we can modify Algorithm 7 so that it ignores every incoming edge (v, k) that has already been processed by Algorithm 6. This modiﬁcation ensures that the algorithm uses precisely |E| payoﬀqueries to discover the edge costs for i + 1 players. This gives us the following theorem.

Theorem 38 Let Γ be a symmetric network congestion game with n-players played on a DAG with |E| edges. The payoﬀquery complexity of ﬁnding a Nash equilibrium in Γ is at most n |E|.

6. Conclusions and Further Work

We ﬁrst consider open questions in the setting of payoﬀqueries, which has been the main setting for the results in this paper. We then consider alternative query models.

6.0.1 Open Questions Concerning Payoff Queries

In the context of strategic-form games, there are a number of open problems. In Theorem 13, we show a super-linear lower bound on the payoﬀquery complexity when ϵ is allowed to depend on k. Can we prove a super-linear lower bound for a constant ϵ? Is there a deterministic algorithm that can ﬁnd an ϵ-Nash equilibrium with ϵ < 1

2 without querying the entire payoﬀmatrices? Fearnley and Savani (2014) achieve ϵ < 1 2 with the use of randomization, but doing so with a deterministic algorithm appears to be challenging. Finally, when 2 i k 1, we have shown that the payoﬀquery complexity of ﬁnding a (1 1

i )-Nash equilibrium lies somewhere in the range [k i + 1, 2k i + 1]. Determining the precise payoﬀquery complexity for this case is an open problem. For congestion games, our lower bound of log n+m arises from a game with two parallel links and a one-player game with m links. The upper bound of O log(n) log2(m) log log(m) + m

is a poly-logarithmic factor oﬀfrom this lower bound, with the factor depending on m. Can this factor be improved? It seems unlikely that the dependence of this factor on m can be completely removed, in which case, in order to provide tight bounds, a single lower bound construction that depends simultaneously on n and m would be necessary. For symmetric network congestion games on DAGs it is unclear whether the payoﬀ query complexity is sub-linear in n. Non-trivial lower and upper bounds for more general settings, such as asymmetric network congestion games (DAG or not) or general (nonnetwork) congestion games would also be interesting.

6.0.2 Other Query Models

We have deﬁned a payoﬀquery as given by a pure (not mixed) proﬁle s, since that is of main relevance to empirical game-theoretic modelling. Furthermore, if s was a mixed

Learning Equilibria of Games via Payoff Queries

proﬁle, it could be simulated by sampling a number of pure proﬁles from s and making the corresponding sequence of pure payoﬀqueries. An alternative deﬁnition might require a payoﬀquery to just report a single speciﬁed player s payoﬀ, but that would change the query complexity by a factor at most n. Our main results have related to exact payoﬀqueries, though other query models are interesting too. A very natural type of query is a best-response query, where a strategy s is chosen, and the algorithm is told the players best responses to s. In general s may have to be a mixed strategy; it is not hard to check that pure-strategy best response queries are insuﬃcient; even for a two-player two-action game, knowledge of the best responses to pure proﬁles is not suﬃcient to identify an ϵ-Nash equilibrium for ϵ < 1

2. Fictitious Play (Fudenberg and Levine 1998, Chapter 2) can be regarded as a query protocol that uses best-response queries (to mixed strategies) to ﬁnd a Nash equilibrium in zero-sum games, and essentially a 1/2-Nash equilibrium in general-sum games (Goldberg et al., 2013). We can always synthesize a pure best-response query with n(k 1) payoﬀqueries. Hence, for questions of polynomial query complexity, payoﬀqueries are at least as powerful as bestresponse queries. Are there games where best-response queries are much more useful than payoﬀqueries? If k is large then it is expensive to synthesize best-response queries with payoﬀqueries. The DMP-algorithm (Daskalakis et al., 2009b) ﬁnds a 1

2-Nash equilibrium via only two best-response queries, whereas Theorem 5 notes that O(k) payoﬀqueries are needed. A noisy payoﬀquery outputs an observation of a random variable taking values in [0, 1] whose expected value is the true payoﬀ. Alternative versions might assume that the observed payoﬀis within some distance ϵ from the true payoﬀ. Noisy query models might be more realistic, and they are suggested by by the experimental papers on querying games. However in a theoretical context, one could obtain good approximations of the expected payoﬀs for a proﬁle s, by repeated sampling. It would interesting to understand the power of diﬀerent query models.

Acknowledgments

We would like to thank Michael Wellman for interesting discussions on this topic, and Milind Tambe for discussions on its relationship with adversarial security games. This work was supported by ESRC grant ESRC/BSB/09, and EPSRC grants EP/K01000X/1, EP/J019399/1, EP/H046623/1, and EP/L011018/1.

N. Alon, Y. Emek, M. Feldman, and M. Tennenholtz. Economical graph discovery. In Proc. of ICS, pages 476 486, 2011.

D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87 106, 1987.

D. Angluin. Queries and concept learning. Machine Learning, 2(4):319 342, 1988.

Y. Babichenko. Query complexity of approximate Nash equilibria. In Proc. of STOC, 2014.

Fearnley, Gairing, Goldberg, and Savani

Y. Babichenko and S. Barman. Query complexity of correlated equilibrium. Co RR, abs/1306.2437, 2013.

X. Bei, N. Chen, and S. Zhang. On the complexity of trial and error. In Proc. of STOC, pages 31 40, 2013.

P. Berenbrink, T.K. Friedetzky, L.A. Goldberg, P.W. Goldberg, and R. Martin. Distributed selﬁsh load balancing. SIAM Journal on Computing, 37:1163 1181, 2007.

M. Brown, W.B. Haskell, and M. Tambe. Addressing scalability and robustness in security games with multiple boundedly rational adversaries. In Proc. of Game Sec, 2014.

S. Chien and A. Sinclair. Convergence to approximate Nash equilibria in congestion games. Games and Economic Behavior, 71(2):315 327, 2011.

C. Daskalakis and C.H. Papadimitriou. On oblivious PTAS s for Nash equilibrium. In Proc. of STOC, pages 75 84, 2009.

C. Daskalakis, P.W. Goldberg, and C.H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195 259, May 2009a.

C. Daskalakis, A. Mehta, and C.H. Papadimitriou. A note on approximate Nash equilibria. Theoretical Computer Science, 410(17):1581 1588, 2009b.

C. Daskalakis, R. Frongillo, C.H. Papadimitriou, G. Pierrakos, and G. Valiant. On learning algorithms for Nash equilibria. In Proc. of SAGT, pages 114 125, Oct 2010.

Q. Duong, Y. Vorobeychik, S. Singh, and M. Wellman. Learning graphical game models. In Proc. of IJCAI, pages 116 121, 2009.

E. Even-Dar, A. Kesselmann, and Y. Mansour. Convergence time to Nash equilibria. In Proc. of ICALP, pages 502 513, 2003.

A. Fabrikant, C.H. Papadimitriou, and K. Talwar. The complexity of pure Nash equilibria. In Proc. of STOC, pages 604 612, 2004.

J. Fearnley and R. Savani. Finding approximate Nash equilibria of bimatrix games via payoﬀqueries. In Proc. of EC, pages 657 674, 2014.

J. Fearnley, P.W. Goldberg, R. Savani, and T.B. Sørensen. Approximate well-supported Nash equilibria below two-thirds. In Proc. of SAGT, pages 108 119, 2012.

J. Fearnley, M. Gairing, P.W. Goldberg, and R. Savani. Learning equilibria of games via payoﬀqueries. In Proc. of EC, pages 397 414, 2013.

T. Feder, H. Nazerzadeh, and A. Saberi. Approximating Nash equilibria using small-support strategies. In Proc. of EC, pages 352 354, 2007.

R. Feldmann, M. Gairing, T. L ucking, B. Monien, and M. Rode. Nashiﬁcation and the coordination ratio for a selﬁsh routing game. In Proc. of ICALP, pages 514 526, 2003.

Learning Equilibria of Games via Payoff Queries

S. Fischer, H. R acke, and B. V ocking. Fast convergence to Wardrop equilibria by adaptive sampling methods. In Proc. of STOC, pages pp. 653 662, 2006.

D. Fudenberg and D.K. Levine. The Theory of Learning in Games. MIT Press, 1998.

M. Gairing and R. Savani. Computing stable outcomes in hedonic games. In Proc. of SAGT, pages 174 185, 2010.

M. Gairing and R. Savani. Computing stable outcomes in hedonic games with voting based deviations. In Proc. of AAMAS, pages 559 566, 2011.

M. Gairing, T. L ucking, M. Mavronicolas, B. Monien, and M. Rode. Nash equilibria in discrete routing games with convex latency functions. Journal of Computer and System Sciences, 74:1199 1225, 2008.

M. Gairing, T. L ucking, M. Mavronicolas, and B. Monien. Computing Nash equilibria for scheduling on restricted parallel links. Theory Comput. Syst., 47(2):405 432, 2010.

P.W. Goldberg. Bounds for the convergence rate of randomized local search in a multiplayer, load-balancing game. In Proc. of PODC, pages 131 140, 2004.

P.W. Goldberg and C.H. Papadimitriou. Reducibility among equilibrium problems. In Proc. of STOC, pages 61 70, 2006.

P.W. Goldberg and A. Pastink. On the communication complexity of approximate Nash equilibria. In Proc. of SAGT, pages 192 203, 2012.

P.W. Goldberg and A. Roth. Bounds for the query complexity of approximate equilibria. In Proc. of EC, pages 639 656. ACM, June 2014.

P.W. Goldberg, R. Savani, T.B. Sørensen, and C. Ventre. On the approximation performance of ﬁctitious play in ﬁnite games. International Journal of Game Theory, 42(4): 1059 1083, 2013.

S. Hart and Y. Mansour. How long to equilibrium? The communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69:107 126, 2010.

S. Hart and A. Mas-Colell. Uncoupled dynamics do not lead to Nash equilibrium. American Economic Review, 93(5):1830 1836, 2003.

S. Hart and A. Mas-Colell. Stochastic uncoupled dynamics and Nash equilibrium. Games and Economic Behavior, 57(2):286 303, 2006.

S. Hart and N. Nisan. The query complexity of correlated equilibria. In Proc. of SAGT, 2013.

P.R. Jordan, Y. Vorobeychik, and M.P. Wellman. Searching for approximate equilibria in empirical games. In Proc. of AAMAS, pages 1063 1070, 2008.

P.R. Jordan, L.J. Schvartzman, and M.P. Wellman. Strategy exploration in empirical games. In Proc. of AAMAS, pages 1131 1138, 2010.

Fearnley, Gairing, Goldberg, and Savani

M. Kearns, M. Littman, and S. Singh. Graphical models for game theory. In Proc. of UAI, pages 253 260, 2001.

S.C. Kontogiannis and P.G. Spirakis. Well supported approximate equilibria in bimatrix games. Algorithmica, 57(4):653 667, 2010.

T.H. Nguyen, R. Yang, A. Azaria, S. Kraus, and M. Tambe. Analyzing the eﬀectiveness of adversary modeling in security games. In Proc. of AAAI, 2013.

E. Nudelman, J. Wortman, Y. Shoham, and K. Leyton-Brown. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proc. of AAMAS, pages 880 887, 2004.

A. Sureka and P.R. Wurman. Using tabu best-response search to ﬁnd pure strategy Nash equilibria in normal form games. In Proc. of AAMAS, pages 1023 1029, 2005.

Y. Vorobeychik, M.P. Wellman, and S. Singh. Learning payoﬀfunctions in inﬁnite games. Machine Learning, 67:145 168, 2007.

M.P. Wellman. Methods for empirical game-theoretic analysis. In Proc. of AAAI, pages 1552 1555, 2006.

R. Yang, C. Kiekintveld, F. Ord o nez, M. Tambe, and R. John. Improving resource allocation strategies against human adversaries in security games: An extended study. Artiﬁcial Intelligence, 195:440 469, 2013.