# disarmament_games_with_resource__971d3696.pdf

Disarmament Games with Resources

Yuan Deng, Vincent Conitzer Department of Computer Science Duke University Durham, NC 27708, USA {ericdy,conitzer}@cs.duke.edu

A paper by Deng and Conitzer in AAAI 17 introduces disarmament games, in which players alternatingly commit not to play certain pure strategies. However, in practice disarmament usually does not consist in removing a strategy, but rather in removing a resource (and doing so rules out all the strategies in which that resource is used simultaneously). In this paper, we introduce a model of disarmament games in which resources, rather than strategies, are removed. We prove NP-completeness of several formulations of the problem of achieving desirable outcomes via disarmament. We then study the case where resources can be fractionally removed, and prove a result analogous to the folk theorem that all desirable outcomes can be achieved. We show that we can approximately achieve any desirable outcome in a polynomial number of rounds, though determining whether a given outcome can be obtained in a given number of rounds remains NP-complete.

Introduction Computing game-theoretic solutions has long been a topic of interest to AI researchers and other computer scientists. In earlier days, the main focus was on computing Nash equilibria (Gilboa and Zemel 1989; Porter, Nudelman, and Shoham 2008; Conitzer and Sandholm 2008; Daskalakis, Goldberg, and Papadimitriou 2009; Chen, Deng, and Teng 2009). While computing Nash equilibria remains a very active topic of research, much of the recent work on computing gametheoretic solutions done in the AI community has focused on computing optimal strategies to commit to, also referred to as Stackelberg strategies (Conitzer and Sandholm 2006; von Stengel and Zamir 2010). Algorithms for computing these solutions have been deployed in a variety of real-world domains (Tambe 2011). All this work concerns models in which one player commits, and then another responds to the commitment. A recent exception is a paper by Deng and Conitzer (2017), who show that sometimes, much more can be achieved by allowing players to go back and forth, alternatingly committing not to play certain pure strategies. They call the resulting games disarmament games. In these disarmament games, there is a disarmament phase before the players play the game. In a step of this

Copyright c 2018, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

phase, the current player can remove any of his pure strategies in the normal-form game. Once players stop disarming, the players play the normal-form game consisting of the remaining pure strategies. This results in a larger (extensiveform) game that includes the disarmament phase, and the goal is to identify desirable equilibria of this larger game, which generally result in much better outcomes than the equilibria of the game without disarmament.1 That is, we try to have players disarm so as to achieve a better outcome, but we must ensure that players are incentivized to take these disarmament steps. However, in real disarmament games, the removal of individual pure strategies would often be hard to achieve. It is often possible to remove a resource (say, a particular weapon), but doing so generally eliminates many strategies, namely all those that involved the use of that resource. Moreover, removing resources does not allow one to remove the pure strategy of not using any resources at all. Finally, even in an environment where it is somehow possible to remove individual pure strategies (i.e., subsets of resources), the number of pure strategies is exponential. Moreover, Deng and Conitzer (2017) only consider Nash equilibrium as the solution concept, which may consist of incredible threats in subgames of the extensive-form disarmament games. However, it is hard to generalize to subgame perfect equilibrium when considering normal-form games since the computation of Nash equilibrium in a normal-form game is already hard. Motivated by these observations, in this paper, we directly model the removal of resources instead of pure strategies. To do so, we deﬁne multi-resource games, in which each player has some resources, the use of which will impact the players differently. We then consider a disarmament model for these games and investigate whether and how we can reach desirable outcomes for several variants (requiring subgame perfection or not and allowing fractional removals or not).

Example 1. Suppose that there are two players, 0 and 1. Each of them has two resources, x and y. Using x results in 1 utility for the player who plays it and 2 utility for the opponent; using y results in 0 utility for the player who plays

1There will also always be an undesirable equilibrium of the disarmament game where nobody disarms, but it seems reasonable to believe that better equilibria will be played in practice, if they can be identiﬁed.

The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

it and 2 utility for the opponent. Without disarmament, the Pareto optimal equilibrium of this game is that both players use only x, resulting in 1 utility for each. However, with disarmament, it is possible to remove all resources, so that both players end up with utility 0, a clear Pareto improvement. To achieve this, we can use the following disarmament schedule. First, player 0 removes x; then, player 1 removes x and y; and ﬁnally, player 0 removes y. This disarmament schedule can be sustained in an equilibrium of the larger disarmament game, because if player 1 decides to deviate from the schedule and not disarm, then 0 can play resource y, resulting in a utility of at most 1 for player 1, lower than what 1 would obtain for not deviating.

Deﬁnitions We ﬁrst introduce multi-resource games and then deﬁne disarmament games based on the multi-resource games. We only consider multi-resource games in which there are no interactions between the resources, i.e., their effects on the players utilities are additive. This restriction only strengthens the hardness results that we prove, because hardness results are stronger when they are proved in a more restricted setting. The positive results for fractional disarmament that we prove later in the paper can be generalized to hold under a much weaker assumption than additivity, but we do not do so here for the sake of readability. Deﬁnition 1. A 2-player multi-resource game is deﬁned by G = R0, R1, v0, v1 . For each player b {0, 1}, his set of resources is Rb and his strategy set is the power set 2Rb of his resource set Rb. His valuation function is vb : R0 R1 R, where vb(s) denotes player b s valuation for resource s R0 R1. The utility ub(S0, S1) of player b when player 0 plays S0 2R0 and 1 plays S1 2R1 is

ub(S0, S1) =

s S0 S1 vb(s).

In multi-resource games, we have the following simple characterization of Nash equilibria, which allows us to quickly compute a Nash equilibrium or even an optimal Nash equilibrium. For convenience, let Pb = {s R0 R1 | vb(s) > 0} be the set of resources that give player b positive utility and let Nb = {s R0 R1 | vb(s) < 0} be the set of resources that give player b negative utility. Proposition 1. In a multi-resource game G = R0, R1, v0, v1 , the strategy proﬁle (S0, S1) forms a Nash equilibrium if and only if Rb Pb Sb Rb \ Nb for b {0, 1}. The game induced by T0 R0 and T1 R1 is the two-player multi-resource game GT0,T1 = T0, T1, v0, v1 , where v0 and v1 are restricted to T0 T1. As usual, we use b to denote the player other than b. We now deﬁne the disarmament game GD(G) on top of this multi-resource game. The remaining deﬁnitions in this section very closely mirror (and are in some cases identical to) those in (Deng and Conitzer 2017), and so should be seen as background rather than as a new contribution. The main difference is that in that earlier work, speciﬁc strategies of the normal-form game are removed, whereas in this

work, resources are removed (where the removal of a single resource generally results in the elimination of many strategies). We also consider subgame-perfect Nash equilibrium in this paper. The disarmament game consists of two stages, a disarmament stage and a game play stage. During the disarmament stage, players alternatingly remove nonempty sets of resources from R0 and R1. In the game play stage, triggered when a player removes nothing, they play whatever multiresource game remains. We then consider the resulting game as a perfect information extensive-form game. To be more precise, we consider a game DAG rather than a game tree, because multiple sequences of disarmaments may lead to the same state. Also, we associate terminal nodes of this DAG with multi-resource games, rather than (directly) with payoffs. This DAG still has exponential size, however, so we never represent it explicitly. All of this is consistent with prior work (Deng and Conitzer 2017). Deﬁnition 2 (Disarmament Game). Given a multi-resource game G = R0, R1, v0, v1 , the disarmament game GD(G) is deﬁned as an extensive-form game by the following. The set of disarmament actions A: {X0 | X0 R0} {X1 | X1 R1} {Play}, where Xb denotes the set of resources to keep and Play denotes ending the disarmament stage. The set of non-terminal nodes H: {[T0, T1, b] | T0 R0, T1 R1, b {0, 1}}, where T0 and T1 denote the remaining resources and b denotes the player to move. The set of terminal nodes Z: {[T0, T1] | T0 R0, T1 R1}. The player selection function ρ : H {0, 1}: ρ([T0, T1, b]) = b. The available-actions function χ : H 2A, where χ([T0, T1, b]) = {Xb | Xb Tb} {Play}. (Playing Xb corresponds to keeping those resources.) The successor function γ : H A H Z: γ([T0, T1, 0], X0) = [X0, T1, 1]; γ([T0, T1, 1], X1) = [T0, X1, 0]; γ([T0, T1, b], Play) = [T0, T1]. The root of the game is root = [R0, R1, 0]. In the terminal nodes z = [T0, T1], player 0 and 1 play the multi-resource game GT0,T1, resulting in their ﬁnal utilities. Deﬁnition 3 (Strategy in GD). A strategy σb = (α, β) for GD consists of disarmament strategy α h H | ρ(h)=b χ(h) for non-terminal nodes and play strategy β

[T0,T1] Z 2Tb (i.e., subsets of remaining resources) for terminal nodes. Note that we restrict our attention to deterministic behavior during the disarmament stage (in which there is perfect information). Deﬁnition 4 (On-path history & outcome). For a strategy proﬁle (σ0, σ1) with σ0 = (α0, β0) and σ1 = (α1, β1), denote the on-path history by P = (h0 = root, h1, , h K, h K+1 = z = [T0, T1]) where for all i, γ(hi, αρ(hi)(hi)) = hi+1. The outcome of the strategy proﬁle (σ0, σ1) is (β0(z), β1(z)).

We say an on-path history has length K if it contains K non-terminal nodes, excluding the root. In a slight abuse of notation, let ub(o) be player b s utility for outcome o. We are interested in two equilibrium notions, namely Nash equilibrium (NE) and sub-game perfect Nash equilibrium (SPNE). A strategy proﬁle (σ0, σ1) forms an NE if and only if no player can increase his utility by deviating to another strategy; and a strategy proﬁle (σ0, σ1) forms an SPNE if and only if no player can increase his utility starting from any node by deviating to another strategy. Deng and Conitzer (2017) shows that on-path histories of a strategy proﬁle can serve as certiﬁcates to check whether a strategy proﬁle forms a Nash equilibrium or not, placing the equilibrium computation in NP, if one can compute the security level for each player at each node of the game. Deﬁnition 5 (Security level). The security level secb for player b in a two-player multi-resource game G = T0, T1, v0, v1 is the utility that player b can guarantee himself no matter how the other player plays. Formally,

secb(GT0,T1) = max βb 2Tb min β b 2T b ub(βb, β b)

Proposition 2. In a multi-resource game G = T0, T1, v0, v1 , the security level for player b is

secb(GT0,T1) =

s Pb Tb vb(s) +

s Nb T b vb(s)

Lemma 1 (Deng and Conitzer (2017)). P is an on-path history induced by a Nash equilibrium strategy proﬁle with outcome o if and only if o is an equilibrium of the terminal node in P and for each non-terminal node [T0, T1, b] P, secb(GT0,T1) ub(o).

Computing One Equilibrium In disarmament games with resources, it is easy to compute one SPNE of the disarmament game, which implies that it is also easy to compute one NE. Theorem 1. In a multi-resource game G = R0, R1, v0, v1 , given a Nash equilibrium strategy proﬁle (S0, S1) (without disarmament) with utility u0(S0, S1) = u 0 and u1(S0, S1) = u 1, we can efﬁciently construct a subgame perfect Nash equilibrium of the corresponding disarmament game GD(G) with the same utility (u 0, u 1).

Proof. Because (S0, S1) is a Nash equilibrium, by Proposition 1 we have Rb Pb Sb Rb \ Nb for b {0, 1}. Consider the following strategy proﬁle: for each player b,

for every non-terminal node, choose Play; for every terminal node [T0, T1], choose Tb Sb.

First, note that the players play an NE in every terminal node. As for the non-terminal nodes [T0, T1, b], a deviation must change the state to [Xb, T b, 1 b] with Xb Tb. Next, b will choose Play, and subsequently b will play T b S b. But this is exactly what b would have played if b had not deviated; moreover, without deviating, b would have best-responded to T b S b from within its resources T0. Since Xb Tb, b can do no better after deviating.

Computing Equilibrium with Speciﬁc Outcome Therefore, we are interested in the computation of an equilibrium that results in a particular outcome. We now deﬁne the two computational problems:

Deﬁnition 6 (RESOURCE-DISARMC problem). In RESOURCE-DISARMC (where C {NE, SPNE}) given a disarmament game GD and an outcome o = (β 0, β 1), the objective is to determine whether there exists an equilibrium (σ 0, σ 1) of type C such that the outcome is o .

We also consider a variation of the RESOURCE-DISARM problem called K-RESOURCE-DISARM.

Deﬁnition 7 (K-RESOURCE-DISARMC problem). In KRESOURCE-DISARMC, given a disarmament game GD and an outcome o = (β 0, β 1), the objective is to determine whether there exists an equilibrium (σ 0, σ 1) of type C such that the outcome is o and the length of its induced on-path history is at most K.

RESOURCE-DISARMNE

For an on-path history with length K, the total number of non-terminal nodes, excluding the root, is K. For K 2, K-RESOURCE-DISARMNE is in P: since both players have only one step to remove resources, it sufﬁces to check the path in which player b {0, 1} removes all resources from Rb Pb, except the resources played in the outcome. We now show that 3-RESOURCE-DISARMNE is weakly NP-hard by a reduction from the SUBSET-SUM problem.

Lemma 2. 3-RESOURCE-DISARMNE is weakly NP-hard. This remains true even if the input outcome o , if feasible, would uniquely maximize welfare.

Note that the last condition implies that ﬁnding a welfaremaximizing equilibrium of the disarmament game (rather than being given a speciﬁc o to achieve) is weakly NP-hard.

Proof. A SUBSET-SUM problem instance is deﬁned by K = L, w, W , where L = [n]2 is a set of goods and each good is associated with a positive value wi, indicating good i has weight wi. The objective is to determine whether there exists a subset L L with total weight exactly W. Let sum W = n i=1 wi; W.L.O.G, W < sum W. Given a SUBSET-SUM instance, we construct a multi-resource game G(K) = R0, R1, v0, v1 as follows:

R0 = [n] { }, R1 = {x, y}; v0(i) = wi ε and v1(i) = wi ε for all i [n]; moreover, v0( ) = 0 and v1( ) = W; v0(x) = sum W γ, v1(x) = sum W, v0(y) = sum W + W, and v1(y) = 0.

where γ, ε > 0 are sufﬁciently small. The target outcome is o = ( , ), i.e., nobody uses any resources. (This does not necessarily mean that they discarded all the resources.) The condition in the theorem about maximizing welfare is satisﬁed with outcome o , because every resource gives negative social welfare.

2Let [n] be the set {1, , n}; as a special case, [0] = .

If there exists a solution L to the SUBSET-SUM problem K, then the on-path history with h1 = [R0 \ L , R1, 1], h2 = [R0 \ L , {y}, 0] and h3 = [ , {y}, 1] is induced by a Nash equilibrium strategy proﬁle with outcome o . One can check this on-path history satisﬁes the requirement of Lemma 1 by computing the security level of every nonterminal node according to Proposition 2: at the root, player 0 s security level is nε γ sum W + W < 0; at h1, player 1 s security level is (n |L |)ε + sum W W i L\L wi W +

i L wi = 0; at h2, player 0 s security level is sum W + W +

i L\L wi ε < W

i L wi = 0; and at h3 it is an equilibrium to play ( , ). For the other direction, suppose there exists an on-path history leading to o induced by a Nash equilibrium strategy proﬁle. Note that ( , ) cannot be a Nash equilibrium in the induced game if for the terminal node z = [T0, T1], L T0 = or x T1. Suppose the state after player 0 s ﬁrst move is [R0 \ R 0, R1, 1]. We ﬁrst argue that R 0. For the sake of contradiction, suppose R 0; then it must be that R 0 = { }, because otherwise sec1(GR0\R 0,R1) > sum W

i [n]\R 0(wi + γ) > 0. But, letting R 1 denote what 1 discards next, it cannot be the case that x R 1, because this would result in sec0(GR0\R 0,R1\R 1)

i [n] wi sum W + W nε > 0. But we have already shown that player 1 needs to discard x at this point. Hence we have our desired contradiction and can conclude that R 0 [n]. Again using Lemma 1, we must have sec1(GR0\R 0,R1) = W (n |R 0|)ε +

i R 0 wi 0, that is,

i R 0 wi W + (n |R 0|)ε. As for player 1 s action in state [R0 \ R 0, R1, 1], notice that player 1 cannot remove resources x and y together; otherwise, the security level of player 0 will be larger than 0. Thus, player 1 must remove only resource x at this turn, leading to state [R0 \ R 0, {y}, 0]. Again using Lemma 1, we must have sec0(GR0\R 0,{y}) =

i R 0 wi + W (n |R 0|)ε 0, that is,

i R 0 wi W (n |R 0|)ε. With sufﬁciently small ε, these two inequalities turn to i R 0 wi = W. Therefore, R 0 is a solution to the SUBSET-SUM instance.

Although K-RESOURCE-DISARMNE is weakly NPhard, it can be solved using dynamic programming to obtain a pseudopolynomial algorithm. Theorem 2. K-RESOURCE-DISARMNE is weakly NPcomplete. This remains true even if the input outcome o , if feasible, would uniquely maximize welfare. When there exists no limitation on the number of rounds, the problem becomes strongly NP-complete. Theorem 3. RESOURCE-DISARMNE is strongly NPcomplete. This remains true even if the input outcome o , if feasible, would uniquely maximize welfare.

Proof. A 3-PARTITION problem instance is deﬁned by a set of integers I = {w1, , wq} with q divisible by 3; the objective is to partition {1, , q} into subsets W1, , Wq/3 such that for each i,

j Wi wj = B where

B = (3/q) q j=1 wj. 3-PARTITION remains strongly NPcomplete even when we have B/4 < wj < B/2 for each j, in which case we must have |Wi| = 3 for each i. We reduce from this special case, also assuming W.L.O.G that B > 0 (and hence all the wj are positive too). Construct a multi-resource game G(I) = R0, R1, v0, v1 as follows:

R0 = [q] {x, y}, R1 = {z1, z2, , zq/3} { };

v0(i) = wi and v1(i) = wi γ for all i [q]; moreover, v0(x) = 0, v1(x) = B, v0(y) = ε δ, and v1(y) = δ; v0(zj) = B 3ε/q and v1(zj) = B for all j [q/3]; moreover, v0( ) = ε and v1( ) = 0;

where ε/2 < δ < ε, 0 < γ, ε B, and the target outcome is o = ( , ). The condition in the theorem about maximizing welfare is satisﬁed with outcome o , again because every resource gives negative social welfare. If there exists a solution W 1 , , W q/3 to the 3PARTITION instance I, then consider the on-path history with h2k 1 = [R0 \ k j=1 W j , R1 \ {z1, , zk 1}, 1],

h2k = [R0 \ k j=1 W j , R1 \ {z1, , zk}, 0] for 1 k q and h2q+1 = [{x}, { }, 1]. We argue that this corresponds to a a Nash equilibrium strategy proﬁle with outcome o , by checking that this on-path history satisﬁes the requirement of Lemma 1 by computing the security level at every non-terminal node and appealing to Proposition 2. For 1 k q, at h2k 1, player 1 s security level is (q/3 k + 1)(B + 3γ) δ + (q/3 k + 1)B < δ; at h2k, player 0 s security level is (q/3 k)B (q/3 k)(B+3ε/q) δ = ε δ + 3kε/q < 0; and at h2q+1, it is an equilibrium to play ( , ). For the other direction, suppose there exists an on-path history leading to o induced by a Nash equilibrium strategy proﬁle. Note that ( , ) cannot be a Nash equilibrium in the induced game if for the terminal node z = [T0, T1], ([q] {y}) T0 = or {z1, z2, , zq/3} T1 = . Moreover, it is without loss of generality to assume that player 1 removes strategy in his ﬁnal disarmament round if he does, since removing strategy earlier only increases player 0 s security level for certain non-terminal nodes. Similarly, it is W.L.O.G to assume that player 0 removes strategy x in his ﬁnal disarmament round if he does. Consider the ﬁrst iteration, i.e., the ﬁrst disarmament step by both players. Suppose player 0 removes a subset R 0, and player 1 removes c 1 strategies from {z1, , zq/3} (note that both players have identical valuations for any of these). After player 0 s move, by Lemma 1, the security level of player 1 must satisfy B +(q/3) B δ i R0\R 0(wi + γ) = B δ (n |R 0|)γ +

i R 0 wi 0, that is, i R 0 wi B + δ + (n |R 0|)γ. Moreover, after player 1 s move, the security level of player 0 must satisfy (q/3 c)( B 3ε/q) δ +

i R0\R 0 wi 0, that is, i R 0 wi c(B + 3ε/q) ε δ. Combining the two inequalities (and the fact that c 1), by the choice of ε, δ and the fact that B/4 < wi < B/2 for all i [q], we must

have c = 1 and

i R 0 wi = B exactly. Repeating this argument inductively, we can show that for each iteration j, the subset Rj 0 of strategies removed by player 0 must satisfy

i Rj 0 wi = B and player 1 removes one strategy from

{z1, , zq/3}. Thus, the subsets R1 0, , Rq/3 0 constitute a solution for the 3-PARTITION problem.

RESOURCE-DISARMSPNE

We now turn to the computation of SPNE. We ﬁrst obtain a version of Lemma 1 for the SPNE setting. Under SPNE, the players can only play an NE of the remaining multiresource game even for the nodes not in the on-path history. By Proposition 1, in an NE of a multi-resource game (without disarmament), a player cannot use a resource that gives himself negative utility. Hence, such a resource has no impact at all under SPNE; in particular, it does not affect the other player s security level. (In contrast, in a non-subgameperfect NE of the disarmament game, the resource may be used in parts of the disarmament game tree that are off the path of play, constituting a non-credible threat, and thereby help to implement a particular outcome.) Also, in an NE of a multi-resource game, a player must use any resource that provides himself positive utility. Hence, if such a resource also gives the other player positive utility, while in an NE of the disarmament game it is, off the path of play, possible to avoid playing that resource in order to punish the other player, this is not possible in an SPNE. We use the following notation for the set of resources that b will use to punish b in SPNE:

Q b(T b) = (P b T b) (Nb (T b \ N b)).

This is the union of the resources that b must use to bestrespond in SPNE, and those that it is able to use in order to punish b. Deﬁne the security level against a self-preserving player (SLAP) as the maximum utility a player can guarantee himself against a player that is not willing to use resources that hurt himself. Proposition 3. The SLAP for player b in a two-player multiresource game G = T0, T1, v0, v1 is

SLAPb(GT0,T1) =

s Pb Tb vb(s) +

s Q b(T b) vb(s)

Now, we are ready to prove a similar lemma as Lemma 1 for the SLAP level. Using this, we can characterize the computational complexity of 3-RESOURCEDISARMSP NE and RESOURCE-DISARMSP NE. Lemma 3. P is an on-path history induced by a subgame perfect Nash equilibrium strategy proﬁle with outcome o if and only if o is an equilibrium of the terminal node in P and for each non-terminal node [T0, T1, b] P, SLAPb(GT0,T1) ub(o).

Proof. : If o is not an equilibrium of the terminal node in P then we clearly do not have an SPNE. If there exists a non-terminal node h = [T0, T1, b] P such that SLAPb(GT0,T1) > ub(o), then at node h, player b can deviate to choose Play and then play the strategy Pb Tb in

the induced game. By Proposition 1, player b must play a strategy X b for which P b T b X b T b \ N b in the induced game. Therefore, the utility of player b is s Pb Tb vb(s) +

s X b vb(s)

s Pb Tb vb(s) +

s Q b(T b) vb(s) = SLAPb(GT0,T1) > ub(o). : If o is an equilibrium of the terminal node in P and for each non-terminal node h = [T0, T1, b] P, SLAPb(GT0,T1) ub(o), consider a strategy proﬁle that speciﬁes to:

for nodes in the on-path history, choose the action to follow the on-path history; and for every non-terminal node not in the on-path history, choose Play;

for each terminal node z = [Tb, T b] not in P, player b plays Qb(Tb) and player b plays Q b(T b);

for the terminal node z = [Tb, T b] in P, both players play according to o.

First, note that for any subgame rooted at an off-path node, as in the proof of Theorem 1 (noting that Qb(Tb), Q b(T b) form a Nash equilibrium of the multi-resource game GTb,T b), the strategy proﬁle forms a Nash equilibrium. As for an on-path non-terminal node [Tb, T b, b], if player b deviates to choose Play, according to the deﬁnition of the SLAP level, his utility is at most SLAPb(GTb,T b) in the induced multi-resource game. If player b deviates to remove some resource, his utility can only decrease compared to choosing Play immediately. Finally, the players play the equilibrium o at the terminal node on the path.

Theorem 4. 3-RESOURCE-DISARMSP NE is weakly NPcomplete and RESOURCE-DISARMSP NE is strongly NPcomplete. These remain true even if the input outcome o , if feasible, would uniquely maximize welfare.

Fractional Disarmament

We now turn our attention to fractional disarmament, where resources are divisible and each player can reduce each resource by a fraction. In the previous section, the integral disarmament case, hardness results already appeared with two players. It turns out that with fractional disarmament, we can get certain positive results.

Example 2. We now modify Example 1 so that each player only has resource x. Without resource y, it is impossible to successfully disarm in the integral disarmament model, because once a player removes x, the other player has a strict incentive to not remove his x. However, in the fractional disarmament model, successful disarmament is possible: at the k-th move of each player, he removes an additional fraction 1/2k of resource x; in other words, he keeps a fraction 1/2k of resource x after his kth move. Assuming player 0 moves ﬁrst, after player 0 s kth move, the equilibrium utility for ending disarmament at that point is ( 3/2k, 0), while after player 1 s k-th move, that utility is ( 1/2k, 1/2k). Therefore, both players are

incentivized to follow the path, and they can obtain utilities arbitrarily close to (0, 0) after sufﬁciently many steps.3

The above example demonstrates that with fractional disarmament, we can achieve objectives that cannot be obtained under integral disarmament. In fact, we show that any feasible outcome can be obtained for any number of players. Before formalizing our results, we ﬁrst extend the deﬁnition of two-player multi-resource games to a multi-player version. Deﬁnition 8. An n-player multi-resource game is deﬁned by G = R1, , Rn, v1, , vn , where player i s resource set is Ri and his valuation function is vi : n j=1 Rj R. Given strategy proﬁle (S1, , Sn), the utility of player i is computed as ui(S1, , Sn) =

s n j=1 Sj vi(s).

For simplicity, assume Ri = [m] for all i [n] and let 0 (and 1) denote a vector of all zeroes (ones).

Fractional disarmament game We represent a remaining game in a terminal node as p = [p1, p2, , pn]. Here, pi [0, 1]m represents the vector of the remaining fractions of player i s resources. We use pi(s) to denote the s-th element in the vector pi. In the induced game Gp, player i can play any vector qi [0, 1]m with qi pi. The utility function for player i becomes ui(q1, q2, , qn) = n j=1

s Rj qj(s)vi(s). To deﬁne a multi-player disarmament game GF D(G), we introduce another option for the players during the disarmament stage (in addition to the options of Play and removing resources), namely a Pass action, which keeps the disarmament stage alive unless all players use it in sequence. The reason we need this action once we move beyond two players is that sometimes, in order to make progress, two players alternatingly need to remove actions while a third player waits and does nothing. Therefore, we associate tuples {[p1, , pn, i, c] | i [n], pi [0, 1]m; 0 c < n} with the non-terminal nodes, where i denotes the player to move and c denotes a counter that records the number of consecutive players that have chosen Pass. The counter is initialized to 0 and when a player chooses Pass, if the counter is less than n 1, the counter is incremented by 1 and the game turns to player i + 1 s move; if it is equal to n 1, since all n players have chosen Pass, the game stage will be triggered by choosing Pass. If a player chooses to remove some fractions of resources, the counter is reset to 0. As in the previous (integral) deﬁnition of disarmament games, player i can still choose Play to trigger the game stage immediately, or choose a new vector of remaining resource fractions from {p i | p i pi, p i [0, 1]m}. The root of the game is [1, , 1, 1, 0]. Strategies, on-path histories, and outcomes in fractional disarmament games are deﬁned similarly as before. Again,

3Of course, if it is common knowledge that a certain round k is the last disarmament round, then the last player to move has a strict incentive not to disarm at that point. This problem can be circumvented by tossing a public coin in each round that with small probability will signal the end of all disarmament, so that players are never sure that their current step is the last. This is discussed in more detail by Deng and Conitzer (2017).

let Pi = {s n j=1 Rj | vi(s) > 0} be the set of resources that give player i positive utility and Nj = {s n j=1 Rj | vi(s) < 0} the set of resources that give player i negative utility.

Security and SLAP level The notions of security level and SLAP are modiﬁed accordingly. Formally, the security level of player b in Gp is deﬁned by

s Pi pi(s)vi(s) +

s Ni pj(s)vi(s)

We use Qi,j = (Rj Pj) (Ni (Rj \Nj)) to represent the set of resources that j will use to punish i in an SPNE. This is the union of the resources that j must use to best-respond in an SPNE, and those that it is able to use in order to punish i. Then we have

SLAPi(Gp) =

s Pi pi(s)vi(s) +

s Qi,j pj(s)vi(s)

In fractional disarmament games, Lemmas 1 and 3 still hold.

A folk theorem Deng and Conitzer (2017) shows a folk theorem that all desirable utility proﬁles can be obtained for two-player disarmament games in which individual mixed strategies can be removed.4 In that result, desirable utility proﬁles consist of utility proﬁles that are feasible i.e., the utility proﬁle results from some mixed-strategy proﬁle and enforceable i.e., for each player, his utility is higher than his security level at the root of the disarmament game. However, it is not immediately clear how to extend this result to general games with more than two players and/or SPNE. For fractional disarmament games based on multi-resource games, however, we obtain a folk theorem result for games with more than two players and obtain a better convergence rate to approximate the equilibrium (whether NE or SPNE is the solution concept). In fractional disarmament games, a utility proﬁle (u 1, , u n) is feasible if and only if there exists a strategy proﬁle (p 1, , p n) such that for all i [n]: ui(p 1, , p n) = u i and j [m], p i (j) > 0 only if vi(j) 0.5

Theorem 5. In fractional disarmament games GF D(G), for any vector of feasible utilities (u 1, , u n) and its associated strategy proﬁle (p 1, , p n) with u i > seci(G1, ,1) for all i [n], there exists a Nash equilibrium of GF D(G) such that the terminal node of its induced on-path history is z = [p 1, , p n] and the outcome o obtained at the terminal node satisﬁes ui(o) = u i for all i [n].

4The removal of mixed strategies is technically similar to the fractional removal we consider here, though in our context it is more natural to think about resources being divisible rather than removing probability distributions. 5The folk theorem result in (Deng and Conitzer 2017) does not require the last condition. It does not need to, because that folk theorem requires that it is possible to remove fractions of any strategy. In our context, however, it is not possible to, say, remove the strategy of playing no resources at all. This makes it impossible to get someone to play a resource that harms himself.

Theorem 6. In fractional disarmament games GF D(G), for any vector of feasible utilities (u 1, , u n) with u i > seci(G1, ,1) for all i [n], there exists a V ε (additive)- Nash equilibrium of GF D(G), where V is the sum of absolute values associated with the resources, such that the length of its on-path history is O(n2 log 1

ε) and the outcome o obtained at the terminal node satisﬁes satisﬁes ui(o) > u i V ε for all i [n].

All the results appear in this section also hold for SPNE if we replace the security level with the SLAP level.

A constructive proof

In this subsection, we provide a constructive proof for Theorems 5 and 6 via an algorithm to generate the corresponding on-path history. We ﬁrst demonstrate an essential property of fractional disarmament games.

Lemma 4. Given a multi-resource game and utility proﬁle (u 1, , u n) for every player i, let F = {[p1, , pn] | i [n], pi [0, 1]m, seci(Gp1, ,pn) u i }. Then, F is a convex set.

Given target utility proﬁle (u 1, , u n) with corresponding strategy proﬁle (p 1, , p n), let p i = 1 p i . We introduce parameter θi for each player i, initialized to 1, such that the vector of fractions of player i s remaining resources is pi = p i +θip i. For convenience, let Gθ θ1, ,θn = Gp 1+θ1p 1, ,p n+θnp n. Deﬁne an update function f:

fi(θ i) = inf{θi 0 | j, secj(Gθ θi,θ i) u j}

That is, given θ i, function f returns the minimum θi such that every player s security level in the induced game is still lower than or equal to his target utility. We now present the algorithm to generate the on-path histories for Theorem 5 and 6. The idea behind the algorithm is to, in each stage, decrease all players θi to the same value l, one by one, until we achieve the desired ϵ. This avoids getting stuck at a boundary but requires us to be conservative in choosing l.

Algorithm 1: Generate on-path strategy for feasible utilities that exceed security levels

Input: A multi-resource game G, a target utility proﬁle (u 1, , u n) associated with its strategy proﬁle (p 1, , p n), and the approximation ratio ε Output: An on-path history P for GF D(G) For all i [n], let p i = 1 p i , θi = 1; Let h0 = [1, , 1, 1, 0], t = 0, l = 1; while l > ε do

l = ((n 1)l + maxi [n] fi(θ i))/n; for i : 1 to n do

θi = l; t = t + 1; ht = [p 1 + θ1p 1, , p n + θnp n, i + 1, 0];

ht+1 = [p 1 + θ1p 1, , p n + θnp n]; return h;

Lemma 5. When setting ε = 0, Algorithm 1 produces an on-path history that is part of a Nash equilibrium and in the limit leads to the terminal node z = [p 1, , p n]. Lemma 6. Given an ε, Algorithm 1 produces an on-path history with length O(n2 log 1

ε) that is part of a Nash equilibrium and leads to a terminal node z = [p 1 + ε p 1, , p n + ε p 1] with ε < ε, in which ui(Gθ ε , ,ε ) > u i V ε for all i [n].

Computation of minimum rounds In the previous subsection, we showed that for any desirable utility proﬁle, there exists a Nash equilibrium of a fractional disarmament game achieving those utilities; moreover, to get an ε-approximate Nash equilibrium, we need at most O(n2 log 1

ε) rounds. Still, we may wish to compute the minimum number of rounds. In fractional disarmament games, deﬁne the number of removal rounds of an on-path history P = (h0 = root, h1, , h K, h K+1 = z) to be6

K |{hi P | χ(hi) = Pass}|

Deﬁnition 9 (MIN-FRACTIONAL-DISARMC problem). In MIN-FRACTIONAL-DISARMC, given a fractional disarmament game GD(G)F , an outcome o = (p 1, , p n), and a parameter K, the objective is to determine whether there exists an equilibrium (σ 1, , σ n) of type C such that the outcome is o and the number of removal rounds in its induced on-path history is at most K. Unlike in the case where resources must be removed integrally, the problem is easy with two players by implementing the following scheme: (1) For a resource of player b {0, 1} that gives non-negative utilities to both players, remove the required fraction in player b s ﬁrst disarmament round; (2) For a resource of player b {0, 1} that gives non-positive utility to player b, remove the required fraction in player b s last disarmament round; (3) For the remaining resources that give player b positive utility and player b negative utility; remove the resources i with larger vb(i)/v b(i) ﬁrst until the security level of player b equals u b(o ) in each round. Theorem 7. In 2-player fractional disarmament games, MIN-FRACTIONAL-DISARMNE is in P. However, when we have multiple players, the problem becomes NP-complete. Theorem 8. In n-player fractional disarmament games, MIN-FRACTIONAL-DISARMNE is strongly NP-complete even if each player has only one resource.

Future Research Future research could investigate game representations other than the multi-resource games deﬁned here, for example ones that allow interaction between resources. (Of course, any representation scheme that can compactly represent all multi-resource games will inherit the hardness results given here.) It could also attempt to make the model more realistic by modeling imperfect observation of others disarmament

6We do not count the Pass rounds because we can immediately jump to the next player instead.

actions, the possibility of re-arming (potentially at a cost), etc. These steps all may contribute to these techniques becoming practical enough for real applications.

Acknowledgement We are thankful for support from ARO under grants W911NF-12-1-0550 and W911NF-11-1-0332, NSF under awards IIS-1527434 and CCF-1337215, the Future of Life Institute, and a Guggenheim Fellowship.

References Chen, X.; Deng, X.; and Teng, S.-H. 2009. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM 56(3). Conitzer, V., and Sandholm, T. 2006. Computing the optimal strategy to commit to. In Proceedings of the ACM Conference on Electronic Commerce (EC), 82 90. Conitzer, V., and Sandholm, T. 2008. New complexity results about Nash equilibria. Games and Economic Behavior 63(2):621 641. Daskalakis, C.; Goldberg, P.; and Papadimitriou, C. H. 2009. The complexity of computing a Nash equilibrium. SIAM Journal on Computing 39(1):195 259. Deng, Y., and Conitzer, V. 2017. Disarmament games. In Proceedings of the Thirty-First AAAI Conference on Artiﬁcial Intelligence. Gilboa, I., and Zemel, E. 1989. Nash and correlated equilibria: Some complexity considerations. Games and Economic Behavior 1:80 93. Porter, R.; Nudelman, E.; and Shoham, Y. 2008. Simple search methods for ﬁnding a Nash equilibrium. Games and Economic Behavior 63(2):642 662. Tambe, M. 2011. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press. von Stengel, B., and Zamir, S. 2010. Leadership games with convex strategy sets. Games and Economic Behavior 69:446 457.