# Quasi-Perfect Stackelberg Equilibrium

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Alberto Marchesi,1 Gabriele Farina,2 Christian Kroer,2 Nicola Gatti,1 Tuomas Sandholm2
1 Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133, Milan, Italy
2 Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
alberto.marchesi@polimi.it, gfarina@cs.cmu.edu, ckroer@cs.cmu.edu, nicola.gatti@polimi.it, sandholm@cs.cmu.edu

Abstract

Equilibrium refinements are important in extensive-form (i.e., tree-form) games, where they amend weaknesses of the Nash equilibrium concept by requiring sequential rationality and other beneficial properties. One of the most attractive refinement concepts is quasi-perfect equilibrium. While quasi-perfection has been studied in extensive-form games, it is poorly understood in Stackelberg settings, that is, settings where a leader can commit to a strategy, which are important for modeling, for example, security games. In this paper, we introduce the axiomatic definition of quasi-perfect Stackelberg equilibrium. We develop a broad class of game perturbation schemes that lead to them in the limit. Our class of perturbation schemes strictly generalizes prior perturbation schemes introduced for the computation of (non-Stackelberg) quasi-perfect equilibria. Based on our perturbation schemes, we develop a branch-and-bound algorithm for computing a quasi-perfect Stackelberg equilibrium. It leverages a perturbed variant of the linear program for computing a Stackelberg extensive-form correlated equilibrium. Experiments show that our algorithm can be used to find an approximate quasi-perfect Stackelberg equilibrium in games with thousands of nodes.

1 Introduction

The main solution concept in game theory, Nash equilibrium (NE), may prescribe non-credible strategies in extensive-form (i.e., tree-form) games (EFGs). To solve that problem, equilibrium refinements have been proposed for such games (Selten 1975). Among the plethora of NE refinements (see van Damme (1987) for details), the quasi-perfect equilibrium (QPE), proposed by Van Damme (1984), plays a central role, and it is considered one of the most attractive NE refinement concepts, as argued, for example, by Mertens (1995). The rationale behind the QPE concept is that every player, in every information set, plays her best response to perturbed (that is, subject to trembles) strategies of the opponents. Unlike the normal-form perfect equilibrium, the QPE guarantees that the strategies of the players are sequentially rational, and furthermore, quasi-perfection implies normal-form perfection. Unlike the extensive-form perfect equilibrium (EFPE), in a QPE every player (reasonably) assumes that she will not make mistakes in the future, and this excludes some unreasonable strategies (Mertens 1995).

Computation of NE refinements has received extensive attention in the literature. In the two-player case, Miltersen and Sørensen (2010) provide algorithms for finding a QPE, while Farina and Gatti (2017) provide algorithms for finding an EFPE. In particular, Miltersen and Sørensen (2010) show that a strict subset of the QPEs can be found when the sequence form is subject to a specific perturbation, while Farina and Gatti (2017) do the same for the EFPE.
Iterative algorithms for such perturbed games in the zero-sum EFPE setting were introduced by Kroer, Farina, and Sandholm (2017) and Farina, Kroer, and Sandholm (2017).1

1 Normal-form proper equilibrium is a refinement of QPE (Van Damme 1984), but it has drawbacks: (1) it requires players to assume a very specific structure on trembles which is not necessarily well-motivated, (2) the minimum tremble magnitudes depend on the action probabilities, which begets additional computational challenges, and (3) it is unknown whether it can be represented via perturbation schemes, even in the non-Stackelberg setting. For the zero-sum case, Miltersen and Sørensen (2008) show a polynomial-time approach using the sequence form, but it is based on solving a large (possibly linear in game size) number of LPs, and thus may not be practical. For the general-sum case, it is not even known whether the sequence form can be applied; the only known approach relies on conversion to normal form (which causes an exponential blow-up) and then applying a pivoting algorithm (Sørensen 2012).

In Stackelberg games, a leader commits to a (possibly mixed) strategy first, and then a follower best responds to that strategy (von Stackelberg 1934). Stackelberg games have received significant attention in recent years (Conitzer and Sandholm 2006) due to their applications, for example, in security domains (Tambe 2011). Work on equilibrium refinements in the context of Stackelberg extensive-form games has only started recently. Akin to usual extensive-form game refinements, Stackelberg equilibrium (SE) refinements should guarantee both the optimality of the commitment off the equilibrium path and some form of robustness against small trembles of the opponent. To our knowledge, there is only one prior study of refinements for Stackelberg extensive-form games (Farina et al. 2018). They characterize a set of SE refinements based on what solutions can be obtained by imposing a perturbation scheme on the game, where players tremble onto suboptimal strategies with some small probabilities, and taking the limit as the trembling probability approaches zero. They prove that, for any perturbation scheme, all the limit points of sequences of SEs in a perturbed game are SEs of the original, unperturbed game. Interestingly, they prove that when restricting attention to the common tie-breaking rules for the follower (strong SE assumes the follower breaks ties in the best way for the leader, and weak SE assumes the follower breaks ties in the worst way for the leader), this is no longer the case. Their approach does not start from a game-theoretic, axiomatic definition of the refinement concept. As we show in this paper, their approach captures only a strict subset of the solutions that are consistent with our natural, game-theoretically defined refinement concept. One way to view this is that their operational definition is deficient in that it does not characterize all the solutions that are consistent with the natural, axiomatic definition of the refinement concept. Another view is that they have an operational definition and we provide a generalization. In terms of complexity, they prove that finding any SE is NP-hard. (Hardness had previously been proven for finding a strong SE (Letchford and Conitzer 2010).) Therefore, finding any SE refinement is also NP-hard.

Our contributions.
In this paper, we formally define the quasi-perfect Stackelberg equilibrium (QPSE) refinement game-theoretically, in the same axiomatic fashion as QPE was defined for non-Stackelberg games (Van Damme 1984). As in the case of QPEs, our definition is based on a set of properties of the players' strategies, and it cannot be directly used to search for a QPSE. Subsequently, we define a class of perturbation schemes for the sequence form such that any limit point of a sequence of SEs in a perturbed game is a QPSE. This class of perturbation schemes strictly includes those used to find a QPE by Miltersen and Sørensen (2010). Then, we extend the algorithm by Čermák et al. (2016) to the case of QPSE computation. We derive the corresponding mathematical program for computing a Stackelberg extensive-form correlated equilibrium (SEFCE) when a perturbation scheme is introduced, and we discuss how the individual steps of the algorithm change. In particular, the implementation of our algorithm is much more involved, requiring the combination of branch-and-bound techniques with arbitrary-precision arithmetic to deal with small perturbations. This does not allow a direct application of off-the-shelf solvers. Finally, we experimentally evaluate the scalability of our algorithm.

2 Preliminaries

Using standard notation (Shoham and Leyton-Brown 2008), a Stackelberg extensive-form game (SEFG) of imperfect information is a tuple (N, H, Z, A, ρ, χ, C, u, I). N = {ℓ, f} is the set of players, the leader and the follower. H = Hc ∪ Hℓ ∪ Hf is the set of nonterminal nodes, where Hc is the set of chance nodes, while Hℓ and Hf are the sets of the leader's and the follower's decision nodes, respectively. Z is the set of terminal nodes. A = Ac ∪ Aℓ ∪ Af is the set of actions, where Ac contains chance moves, while Aℓ and Af are the sets of the leader's and the follower's actions, respectively. ρ : H → 2^A is the action function that assigns to each nonterminal node a set of available actions. χ : H × A → H ∪ Z is the successor function that defines the node reached when an action is performed in a nonterminal node. C : H ∪ Z → [0, 1] assigns to each node its probability of being reached given chance moves. u = {uℓ, uf}, where uℓ, uf : Z → R specify the leader's and the follower's payoffs, respectively, in each terminal node. Finally, I = {Iℓ, If}, where Iℓ and If define partitions of Hℓ and Hf, respectively, into information sets, that is, groups of nodes that are indistinguishable by the player. For every information set I ∈ Ii and nodes h, ĥ ∈ I, it must be the case that ρ(h) = ρ(ĥ) = A(I), otherwise player i would be able to distinguish the two nodes. As usual, w.l.o.g., we assume that each action a ∈ A belongs to only one set A(I). We focus on perfect-recall SEFGs, in which no player forgets what she did or knew in the past, that is, for every i ∈ N and I ∈ Ii, all nodes belonging to I share the same player i's moves on their paths from the root. Thus, we can restrict attention to behavioral strategies (Kuhn 1953), which define, for every player i ∈ N and information set I ∈ Ii, a probability distribution over the actions A(I). For i ∈ N, let πi ∈ Πi be a behavioral strategy of player i, with πia denoting the probability of playing action a ∈ Ai. Overloading notation, we use ui as if it were defined over strategies instead of terminal nodes. Specifically, ui(πℓ, πf) is player i's expected utility when πℓ ∈ Πℓ and πf ∈ Πf are played.

Perfect-recall SEFGs admit an equivalent representation called the sequence form (Von Stengel 1996; Romanovskii 1962).
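For readers who prefer to see the tuple spelled out in code, the following is a minimal sketch (not part of the paper's implementation) of one way the components (N, H, Z, A, ρ, χ, C, u, I) of an SEFG could be stored; all class and field names are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SEFG:
    """Container mirroring the tuple (N, H, Z, A, rho, chi, C, u, I); "l" is the leader, "f" the follower."""
    players: Tuple[str, str] = ("l", "f")
    chance_nodes: List[str] = field(default_factory=list)                 # H_c
    decision_nodes: Dict[str, List[str]] = field(default_factory=dict)    # H_l, H_f keyed by player
    terminal_nodes: List[str] = field(default_factory=list)               # Z
    actions: Dict[str, List[str]] = field(default_factory=dict)           # rho: node -> available actions
    successor: Dict[Tuple[str, str], str] = field(default_factory=dict)   # chi: (node, action) -> node
    chance_prob: Dict[str, float] = field(default_factory=dict)           # C: node -> prob. given chance moves
    utility: Dict[str, Dict[str, float]] = field(default_factory=dict)    # u: player -> terminal node -> payoff
    infosets: Dict[str, List[List[str]]] = field(default_factory=dict)    # I: player -> partition into info sets

    def infosets_consistent(self) -> bool:
        """Nodes in the same information set must offer the same actions: rho(h) = rho(h') = A(I)."""
        return all(
            len({tuple(self.actions[h]) for h in info_set}) == 1
            for player in self.players
            for info_set in self.infosets.get(player, [])
        )
```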
Every node h ∈ H ∪ Z defines a sequence σi(h) for player i ∈ N, which is the ordered set of player i's actions on the path from the root to h. Let Σi be the set of player i's sequences. As usual, let σ∅ ∈ Σi be a fictitious element representing the empty sequence. In perfect-recall games, given an information set I ∈ Ii, for any pair of nodes h, ĥ ∈ I it holds that σi(h) = σi(ĥ) = σi(I). Given σi ∈ Σi and a ∈ A(I) with I ∈ Ii : σi = σi(I), we denote by σia the extended sequence obtained by appending a to σi. Moreover, for any pair σi, σ̂i ∈ Σi, we write σ̂i ⊑ σi whenever σ̂i is a prefix of σi, that is, σi can be obtained by extending σ̂i with a finite number of actions. Given σi ∈ Σi, we also let Ii(σi) be the information set I ∈ Ii : σi = σi(I)a for some a ∈ A(I). In the sequence form, a strategy, called a realization plan, assigns to each sequence its probability of being played. For i ∈ N, let ri ∈ Ri be a realization plan of player i. In order to be well-defined, a realization plan ri must be such that ri(σ∅) = 1 and, for I ∈ Ii, ri(σi(I)) = ∑_{a ∈ A(I)} ri(σi(I)a). Finally, letting Σ = Σℓ × Σf be the set of sequence pairs σ = (σℓ, σf), overloading notation, ui : Σ → R is player i's utility function in the sequence form, with ui(σ) = ∑_{h ∈ Z : σℓ(h) = σℓ, σf(h) = σf} ui(h)C(h). Moreover, we also use ui as if it were defined over realization plans. Formally, ui(rℓ, rf) = ∑_{σ ∈ Σ} ui(σ)rℓ(σℓ)rf(σf).

The sequence form is usually expressed with matrix notation as follows. Player i's utility function is a |Σℓ| × |Σf| matrix Ui whose entries are the utilities ui(σ), for σ ∈ Σ. Constraints defining ri ∈ Ri are expressed as Fi ri = fi, where: Fi is a (|Ii| + 1) × |Σi| matrix, fi ∈ R^{|Ii|+1}, and, overloading notation, ri ∈ R^{|Σi|} is a vector representing ri. Specifically, introducing a fictitious information set I∅, the entry of Fi indexed by (I∅, σ∅) is 1, and, for I ∈ Ii and σi ∈ Σi, the entry indexed by (I, σi) is −1 if σi = σi(I), while it is 1 if σi = σi(I)a for some a ∈ A(I). Fi is zero everywhere else. Moreover, fi^T = (1 0 ... 0). Finally, given rℓ ∈ Rℓ and rf ∈ Rf, we can write ui(rℓ, rf) = rℓ^T Ui rf.

In perfect-recall games, behavioral strategies and realization plans are equally expressive. Given ri ∈ Ri, we obtain an equivalent πi ∈ Πi by setting, for all I ∈ Ii and a ∈ A(I), πia = ri(σi(I)a) / ri(σi(I)) when ri(σi(I)) > 0, while πia can be arbitrary otherwise. Similarly, πi ∈ Πi has an equivalent ri ∈ Ri with ri(σi) = ∏_{a ∈ σi} πia for all σi ∈ Σi.2

The solution concept associated with SEFGs is the SE. An SEFG may have many SEs, depending on the leader's assumption on how the follower breaks ties among multiple best responses. A leader's strategy is part of an SE if it is optimal for some tie-breaking rule of the follower. Letting BRΓ(πℓ) = arg max_{πf ∈ Πf} uf(πℓ, πf) be the set of the follower's best responses to πℓ ∈ Πℓ in an SEFG Γ, we have the following formal definition of SE.3

Definition 1. Given an SEFG Γ, (πℓ, πf) is an SE of Γ if πf ∈ BRΓ(πℓ) and, for all π̂ℓ ∈ Πℓ, there exists π̂f ∈ BRΓ(π̂ℓ) such that uℓ(πℓ, πf) ≥ uℓ(π̂ℓ, π̂f).

Many papers on SEs focus on strong SEs (SSEs), which assume that the follower breaks ties in favor of the leader.

Definition 2. Given an SEFG Γ, (πℓ, πf) is an SSE of Γ if πf ∈ BRΓ(πℓ) and, for all π̂ℓ ∈ Πℓ and π̂f ∈ BRΓ(π̂ℓ), it holds that uℓ(πℓ, πf) ≥ uℓ(π̂ℓ, π̂f).

Finally, SEs and SSEs can be defined analogously for SEFGs in sequence form (using the equivalence between behavioral strategies and realization plans).

2 Here, the equivalence is in terms of the probabilities that the strategies induce on terminal nodes, i.e., it is realization equivalence.
3 In this paper, we define SEs following a characterization introduced by Farina et al. (2018) (Lemma 2 in their paper).
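The two conversions between realization plans and behavioral strategies described above (πia = ri(σi(I)a)/ri(σi(I)) and ri(σi) = ∏_{a∈σi} πia) are mechanical; the following sketch shows them in Python, assuming sequences are represented as tuples of a player's own actions. The data layout is our illustrative assumption, not the paper's.

```python
from typing import Dict, Iterable, Tuple

Sequence = Tuple[str, ...]  # a player's own actions from the root; () is the empty sequence

def behavioral_from_realization(r: Dict[Sequence, float],
                                infosets: Iterable[Tuple[Sequence, Tuple[str, ...]]]
                                ) -> Dict[Tuple[Sequence, str], float]:
    """pi_{Ia} = r(sigma_i(I)a) / r(sigma_i(I)) when r(sigma_i(I)) > 0; `infosets` lists pairs
    (sigma_i(I), A(I)). At information sets with r(sigma_i(I)) = 0 the behavior is arbitrary,
    so we simply use the uniform distribution."""
    pi: Dict[Tuple[Sequence, str], float] = {}
    for seq_I, acts in infosets:
        denom = r.get(seq_I, 0.0)
        for a in acts:
            pi[(seq_I, a)] = r.get(seq_I + (a,), 0.0) / denom if denom > 0 else 1.0 / len(acts)
    return pi

def realization_from_behavioral(pi: Dict[Tuple[Sequence, str], float],
                                sequences: Iterable[Sequence]) -> Dict[Sequence, float]:
    """r(sigma_i) is the product of pi over the actions of sigma_i; by perfect recall the
    information set where the last action of sigma_i is taken has sequence sigma_i[:-1]."""
    r: Dict[Sequence, float] = {(): 1.0}
    for seq in sorted(sequences, key=len):   # prefixes are processed before their extensions
        if seq:
            r[seq] = r[seq[:-1]] * pi[(seq[:-1], seq[-1])]
    return r
```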
3 Definition of Quasi-Perfect Stackelberg Equilibrium

In this section, we introduce QPSEs, which refine SEs in SEFGs using an approach resembling that adopted by Van Damme (1984) for defining QPEs in EFGs. First, we provide some needed additional notation. We say that πi ∈ Πi is completely mixed if πia > 0 for all a ∈ Ai. Given two information sets I, Î ∈ Ii, we write I ⪯ Î whenever Î follows I, i.e., there exists a path from some h ∈ I to some ĥ ∈ Î. We assume I∅ ⪯ Î for all Î ∈ Ii such that there is no I ≠ Î ∈ Ii with I ⪯ Î. In perfect-recall games, ⪯ is a partial order over Ii ∪ {I∅}. Given πi, π̂i ∈ Πi and I ∈ Ii ∪ {I∅}, the strategy πi /I π̂i is equal to π̂i at all Î ∈ Ii : I ⪯ Î, while it is equal to πi everywhere else. Moreover, for I ∈ Ii, we write πi =I π̂i if πia = π̂ia for all a ∈ A(I). Finally, given completely mixed strategies πℓ ∈ Πℓ, πf ∈ Πf and I ∈ Ii, ui,I(πℓ, πf) denotes player i's expected utility given that I has been reached and strategies πℓ and πf are played.

Next, we introduce a fundamental building block: the idea of a follower's best response at an information set I ∈ If. Intuitively, πf is an I-best response to πℓ whenever playing as prescribed by πf at the information set I is part of some follower's best response to πℓ in the game following I, given that I has been reached during play. Formally:

Definition 3. Given an SEFG Γ, a completely mixed πℓ ∈ Πℓ, and I ∈ If, we say that πf ∈ Πf is an I-best response to πℓ, written πf ∈ BR_I(πℓ), if the following holds:

max_{π̂f ∈ Πf : πf =I π̂f} uf,I(πℓ, πf /I π̂f) = max_{π̂f ∈ Πf} uf,I(πℓ, π̂f).

For i ∈ N and πi ∈ Πi, let {πi,k}k∈N be a sequence of completely mixed strategies of player i with πi as a limit point. We are now ready to define the refinement concept. In words, in a QPSE, the leader selects an optimal strategy to commit to in all information sets, given that the follower best responds to it at every information set, following some tie-breaking rule. Specifically, point (ii) in Definition 4 ensures that the leader's commitment is optimal also in those information sets that are unreachable in the absence of players' errors. Notice that the leader only accounts for the follower's future errors, while the follower assumes that only the leader can make mistakes in the future. This is in line with the idea underlying QPEs in EFGs (Van Damme 1984).4

Definition 4. Given an SEFG Γ, (πℓ, πf) is a quasi-perfect Stackelberg equilibrium (QPSE) of Γ if there exist sequences {πi,k}k∈N, defined for every i ∈ N and πi ∈ Πi, such that: (i) πf ∈ BR_I(πℓ,k) for all I ∈ If; (ii) for all I ∈ Iℓ ∪ {I∅} and π̂ℓ ∈ Πℓ, there exists π̂f ∈ Πf : π̂f ∈ BR_Î(πℓ,k /I π̂ℓ,k) for all Î ∈ If, with:

uℓ(πℓ,k /I πℓ, πf,k) ≥ uℓ(πℓ,k /I π̂ℓ, π̂f,k).   (1)

As with SEs, we introduce the strong version of QPSEs.5

Definition 5. Given an SEFG Γ, (πℓ, πf) is a quasi-perfect strong Stackelberg equilibrium (QPSSE) of Γ if there exist {πi,k}k∈N, defined for every i ∈ N and πi ∈ Πi, such that: (i) πf ∈ BR_I(πℓ,k) for all I ∈ If; (ii) for all I ∈ Iℓ ∪ {I∅}, π̂ℓ ∈ Πℓ, and π̂f ∈ Πf : π̂f ∈ BR_Î(πℓ,k /I π̂ℓ,k) for all Î ∈ If, Equation (1) holds.

As we will show in Section 4, QPSEs are refinements of SEs, that is, any QPSE is also an SE.

4 Van Damme (1984) defines a QPE of an n-player extensive-form game as a strategy profile (πi)i∈N obtained as a limit point of a sequence of completely mixed strategy profiles {(πi,k)i∈N}k∈N such that πi ∈ BR_I((πj,k)j≠i∈N) for all i ∈ N and I ∈ Ii.
5 Since Equation (1) must hold for every π̂ℓ ∈ Πℓ and π̂f ∈ Πf : π̂f ∈ BR_Î(πℓ,k /I π̂ℓ,k) for all Î ∈ If, Definition 5 assumes that the follower breaks ties in favor of the leader.
4 Family of Perturbation Schemes for QPSE

We now introduce a family of perturbation schemes for SEFGs in sequence form that satisfies the following fundamental property: limits of SEs in perturbed sequence-form SEFGs are QPSEs of the original unperturbed SEFGs as the magnitude of the perturbation goes to zero. In addition to being theoretically relevant, this result enables us to design an algorithm for computing QPSEs in SEFGs (Section 7).

Definition 6 (ξ-perturbation scheme). Given an SEFG Γ and i ∈ N, let ξi : (0, 1] × Σi → R+ be a function that maps a perturbation magnitude ϵ ∈ (0, 1] and a sequence σi ∈ Σi to a positive lower bound ξi(ϵ, σi) on the probability of playing σi such that: (i) ξi(ϵ, σi) is a polynomial in ϵ, for all σi ∈ Σi; (ii) lim_{ϵ→0+} ξi(ϵ, σi) = 0, for all σi ∈ Σi \ {σ∅}; (iii) lim_{ϵ→0+} ξi(ϵ, σi(I)a) / ξi(ϵ, σi(I)) = 0, for all I ∈ Ii, a ∈ A(I). Then, a ξi-perturbation scheme for Ri is a function ϵ ↦ Ri(ϵ) defined over ϵ ∈ (0, 1] in which Ri(ϵ) is the set of all ri ∈ Ri such that ri(σi) ≥ ξi(ϵ, σi) for all σi ∈ Σi.

In words, the lower bounds on sequence probabilities enjoy the following properties: (i) they are polynomials in the variable ϵ; (ii) they approach zero as ϵ goes to zero; and (iii) ξi(ϵ, σi(I)a) approaches zero faster than ξi(ϵ, σi(I)).

We denote by (Γ, ξℓ, ξf) a ξ-perturbed SEFG with ξi-perturbation schemes. We let Γ(ϵ) be a particular sequence-form ξ-perturbed game instance obtained from Γ by restricting each set of realization plans Ri to be Ri(ϵ). We denote by ri(ϵ) any realization plan in Ri(ϵ), and we let ξi(ϵ) ∈ R^{|Σi|} be a vector whose components are the lower bounds ξi(ϵ, σi). We denote by r̃i(ϵ) = ri(ϵ) − ξi(ϵ) the residual of ri(ϵ), which represents the part of player i's strategy that is not fixed by the perturbation.6

Next, we state our main result about sequences of SEs in ξ-perturbed games. We postpone the proof to Section 6.

Theorem 1. Given a ξ-perturbed SEFG (Γ, ξℓ, ξf), let {ϵk}k∈N → 0 and let {(rℓ(ϵk), rf(ϵk))}k∈N be a sequence of SEs in Γ(ϵk). Then, any limit point (πℓ, πf) of the sequence {(πℓ,k, πf,k)}k∈N is a QPSE of Γ, where (πℓ,k, πf,k) are equivalent to (rℓ(ϵk), rf(ϵk)) for all k ∈ N.

Theorem 1 also allows us to conclude the following, as a consequence of Theorem 1 of Farina et al. (2018).

Corollary 1. Any QPSE of an SEFG Γ is an SE of Γ.

Requirements (ii)-(iii) in Definition 6 cannot be removed:

Observation 1. There are ξ-perturbed SEFGs (Γ, ξℓ, ξf) with ξi-perturbation schemes that violate point (ii) or (iii) in Definition 6 for which Theorem 1 does not hold.

Proof. Consider the SEFG in Figure 1b with ξℓ(ϵ, a1ℓ) = ξℓ(ϵ, a2ℓ) = ϵ and ξℓ(ϵ, a2ℓa3ℓ) = ξℓ(ϵ, a2ℓa4ℓ) = ϵ/3, which violates requirement (iii) in Definition 6. Clearly, any SE of Γ(ϵ) requires rℓ(ϵ, a1ℓ) = 1 − ϵ, rℓ(ϵ, a2ℓ) = ϵ, rℓ(ϵ, a2ℓa3ℓ) = ϵ/3, and rℓ(ϵ, a2ℓa4ℓ) = 2ϵ/3. Thus, any limit point of a sequence of SEs has πℓ(a3ℓ) = 1/3 and πℓ(a4ℓ) = 2/3, which cannot be the case in a QPSE of Γ, as the leader's optimal strategy at ℓ.2 is to play a4ℓ.

6 We assume without loss of generality that Γ(ϵ) is well-defined, that is, each set Ri(ϵ) is non-empty for every ϵ ∈ (0, 1].
As for requirement (ii), we can build a similar example by setting ξℓ(ϵ, a2ℓ) to a positive constant that does not vanish as ϵ goes to zero.

Figure 1: Example SEFGs.

Miltersen and Sørensen (2010) introduced the idea of perturbing sequence-form EFGs in order to find a QPE. Our perturbation scheme generalizes theirs, in which ξi(ϵ, σi) = ϵ^{|σi|} for all σi ∈ Σi \ {σ∅}, with |σi| being the number of actions in σi. There are games where our perturbation captures QPSEs that are not obtainable with theirs. For instance, in the SEFG in Figure 1a, (πℓ, πf), with πℓ(a1ℓ) = πℓ(a3ℓ) = 1, πℓ(a2ℓ) = πℓ(a4ℓ) = 0, and πf(a1f) = πf(a2f) = 1/2, is a QPSE that cannot be obtained with their perturbation scheme, while it is reachable by setting ξℓ(ϵ, a2ℓ) = ϵ². We observe that (πℓ, πf) is also a QPE when we look at the game as an EFG without commitment; this shows that our perturbation scheme generalizes theirs also for QPEs.

Finally, when restricting attention to SSEs, one can state the following: limits of SSEs in ξ-perturbed games are QPSSEs of the unperturbed game. This is made formal by Theorem 4 in (Marchesi et al. 2018).

5 Best Responses in ξ-Perturbed Games

We now study properties of the follower's best responses to the leader's strategy in ξ-perturbed games. These properties will be useful for proving our results later in the paper. In the following, letting Σi(a) = {σi ∈ Σi | a ∈ σi} for all a ∈ Ai, Σi(I) = ∪_{a ∈ A(I)} Σi(a) denotes player i's sequences that pass through information set I ∈ Ii. For ease of presentation, given I ∈ Ii, gi,I(rℓ, rf) = ∑_{σ ∈ Σ : σi ∈ Σi(I)} ui(σ)rℓ(σℓ)rf(σf) denotes player i's expected utility contribution from terminal nodes reachable from I. Finally, for I ∈ Ii, let Ri(I) ⊆ Ri be the set of ri ∈ Ri : ri(σi(I)) = 1, while, for a ∈ A(I), Ri(a) ⊆ Ri(I) is the set of ri ∈ Ri : ri(σi(I)a) = 1.

Let BRΓ(ϵ)(rℓ(ϵ)) = arg max_{rf(ϵ) ∈ Rf(ϵ)} uf(rℓ(ϵ), rf(ϵ)) be the set of the follower's best responses to rℓ(ϵ) ∈ Rℓ(ϵ) in Γ(ϵ). The next lemma gives a mathematical programming formulation of the follower's best-response problem in Γ(ϵ).

Lemma 1. For every rℓ(ϵ) ∈ Rℓ(ϵ), rf(ϵ) ∈ BRΓ(ϵ)(rℓ(ϵ)) if and only if its residual r̃f(ϵ) is optimal for Problem P(ϵ) below:

max_{r̃f} rℓ(ϵ)^T Uf r̃f
s.t. Ff r̃f = ff − Ff ξf(ϵ), r̃f ≥ 0.

All omitted proofs are in (Marchesi et al. 2018). The dual of Problem P(ϵ) above is as follows.

Proposition 1. For rℓ(ϵ) ∈ Rℓ(ϵ), Problem D(ϵ) below is the dual of Problem P(ϵ), where vf ∈ R^{|If|+1} is the vector of dual variables:

min_{vf} (ff − Ff ξf(ϵ))^T vf   (2a)
s.t. Ff^T vf ≥ Uf^T rℓ(ϵ).   (2b)

Remark 1. Constraints (2b) in Problem D(ϵ) defined above ensure that, for every I ∈ If and a ∈ A(I), we have

vf,I ≥ ∑_{σ ∈ Σ : σf = σf(I)a} uf(σ)rℓ(ϵ, σℓ) + ∑_{Î ∈ If : σf(Î) = σf(I)a} vf,Î.   (3)

The optimal solutions to Problem D(ϵ) enjoy important properties that are stated in the following lemmas. The first one says that, in an optimal solution, each variable vf,I must equal the maximum possible expected utility the follower can achieve following information set I ∈ If. The second lemma says that if an optimal solution to Problem D(ϵ) satisfies Constraint (3) with equality for an information set I ∈ If and an action a ∈ A(I), then playing a at I is optimal in the game following I.

Lemma 2. For every rℓ(ϵ) ∈ Rℓ(ϵ), if v*f ∈ R^{|If|+1} is optimal for Problem D(ϵ), then for every I ∈ If:

v*f,I = max_{r̂f ∈ Rf(I)} gf,I(rℓ(ϵ), r̂f).   (4)

Lemma 3. For every rℓ(ϵ) ∈ Rℓ(ϵ), I ∈ If, and a ∈ A(I), if Constraint (3) holds with equality in an optimal solution to Problem D(ϵ), then

max_{r̂f ∈ Rf(a)} gf,I(rℓ(ϵ), r̂f) = max_{r̂f ∈ Rf(I)} gf,I(rℓ(ϵ), r̂f).   (5)
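To make Lemma 1 concrete, the sketch below shows how Problem P(ϵ) could be solved with an off-the-shelf LP solver, assuming the sequence-form data Uf, Ff, ff and the lower-bound vector ξf(ϵ) are already available as NumPy arrays. Our actual implementation relies on exact rational arithmetic (Section 8), so this floating-point version is only indicative.

```python
import numpy as np
from scipy.optimize import linprog

def follower_best_response(U_f: np.ndarray,   # |Sigma_l| x |Sigma_f| follower payoff matrix
                           F_f: np.ndarray,   # (|I_f| + 1) x |Sigma_f| sequence-form constraint matrix
                           f_f: np.ndarray,   # right-hand side (1, 0, ..., 0)
                           r_l: np.ndarray,   # leader's realization plan r_l(eps)
                           xi_f: np.ndarray   # lower bounds xi_f(eps, sigma_f)
                           ) -> np.ndarray:
    """Solve Problem P(eps): maximize r_l^T U_f r_tilde over residuals r_tilde >= 0 subject to
    F_f r_tilde = f_f - F_f xi_f, then return the full plan r_f = r_tilde + xi_f (Lemma 1)."""
    c = -(U_f.T @ r_l)                                 # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=F_f, b_eq=f_f - F_f @ xi_f,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError("P(eps) is infeasible; the perturbation magnitude may be too large")
    return res.x + xi_f                                # add back the mass fixed by xi_f(eps)
```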
Now we are ready to prove a fundamental property of the follower's best responses in ξ-perturbed game instances Γ(ϵ). Intuitively, in a perturbed game instance, the follower best responds by playing a sequence σf(I)a with probability strictly greater than its lower bound ξf(ϵ, σf(I)a) only if playing a is optimal in the game following I. Theorem 2 formally expresses the idea that, in a perturbed game instance Γ(ϵ), when the follower decides how to best respond to a leader's commitment at a given information set, she does not take into account her own future trembles, but only the opponent's.

Theorem 2. Given rℓ(ϵ) ∈ Rℓ(ϵ), rf(ϵ) ∈ BRΓ(ϵ)(rℓ(ϵ)), I ∈ If, and a ∈ A(I), if rf(ϵ, σf(I)a) > ξf(ϵ, σf(I)a), then max_{r̂f ∈ Rf(a)} gf,I(rℓ(ϵ), r̂f) = max_{r̂f ∈ Rf(I)} gf,I(rℓ(ϵ), r̂f).

Proof. By Lemma 1, rf(ϵ) ∈ BRΓ(ϵ)(rℓ(ϵ)) if and only if r̃f(ϵ) = rf(ϵ) − ξf(ϵ) is optimal for Problem P(ϵ). By applying the complementary slackness theorem to Problems P(ϵ) and D(ϵ) we have that, if r̃f(ϵ) and v*f ∈ R^{|If|+1} are optimal, then, whenever r̃f(ϵ, σf(I)a) > 0, that is, rf(ϵ, σf(I)a) > ξf(ϵ, σf(I)a), Constraint (3) for information set I and action a must hold with equality, which, by Lemma 3, yields Equation (5).

6 Limits of SEs in ξ-Perturbed Games are QPSEs of the Unperturbed Games

Here, we prove Theorem 1. First, we introduce two lemmas. The first provides a characterization of I-best responses in terms of the sequence form. Intuitively, a follower's strategy πf is an I-best response to πℓ if and only if it places positive probability only on actions a ∈ A(I) that are part of some best response of the follower below information set I.

Lemma 4. Given an SEFG Γ, a completely mixed πℓ ∈ Πℓ, and I ∈ If, πf ∈ BR_I(πℓ) if, for every a ∈ A(I), πfa > 0 implies max_{r̂f ∈ Rf(a)} gf,I(rℓ, r̂f) = max_{r̂f ∈ Rf(I)} gf,I(rℓ, r̂f), where rℓ ∈ Rℓ is equivalent to πℓ.

The next lemma shows that any limit point of a sequence of the follower's best responses in ξ-perturbed games is a follower's best response at every information set in Γ.

Lemma 5. Given a ξ-perturbed SEFG (Γ, ξℓ, ξf), let {ϵk}k∈N → 0 and let {(rℓ(ϵk), rf(ϵk))}k∈N be a sequence of realization plans in Γ(ϵk) with rf(ϵk) ∈ BRΓ(ϵk)(rℓ(ϵk)). Then, any limit point (πℓ, πf) of {(πℓ,k, πf,k)}k∈N is such that, eventually, πf ∈ BR_I(πℓ,k) for all I ∈ If, where (πℓ,k, πf,k) are equivalent to (rℓ(ϵk), rf(ϵk)) for all k ∈ N.

Finally, we can prove Theorem 1.

Proof of Theorem 1. First, since rf(ϵk) ∈ BRΓ(ϵk)(rℓ(ϵk)) for all k ∈ N, Lemma 5 allows us to conclude that requirement (i) in Definition 4 holds. Therefore, in order to prove Theorem 1, we need to show that requirement (ii) holds as well. For contradiction, suppose that point (ii) does not hold, that is, no matter how we choose the sequences {πi,k}k∈N, for i ∈ N and πi ∈ Πi, there is an information set I ∈ Iℓ ∪ {I∅} and a leader's strategy π̂ℓ ∈ Πℓ such that, for every π̂f ∈ Πf : π̂f ∈ BR_Î(πℓ,k /I π̂ℓ,k) for all Î ∈ If, we have:

uℓ(πℓ,k /I πℓ, πf,k) < uℓ(πℓ,k /I π̂ℓ, π̂f,k).

By continuity, there must exist an index k̄ ∈ N such that, for all k ∈ N : k ≥ k̄, the following holds:

uℓ(πℓ,k /I πℓ,k, πf,k) < uℓ(πℓ,k /I π̂ℓ,k, π̂f,k).

Moreover, uℓ(πℓ,k /I πℓ,k, πf,k) = uℓ(πℓ,k, πf,k). Let the sequence {π̂ℓ,k}k∈N be such that r̂ℓ(ϵk) ∈ Rℓ(ϵk) for all k ∈ N, where each realization plan r̂ℓ(ϵk) is equivalent to the strategy πℓ,k /I π̂ℓ,k. This is always possible since requirement (iii) in Definition 6 is satisfied. Consider a sequence {(r̂ℓ(ϵk), r̂f(ϵk))}k∈N with r̂f(ϵk) ∈ BRΓ(ϵk)(r̂ℓ(ϵk)), and let {(πℓ,k /I π̂ℓ,k, π̂f,k)}k∈N be a sequence such that each strategy π̂f,k is equivalent to r̂f(ϵk).
By Lemma 5, any limit point (πℓ /I π̂ℓ, π̂f) of {(πℓ,k /I π̂ℓ,k, π̂f,k)}k∈N satisfies π̂f ∈ BR_Î(πℓ,k /I π̂ℓ,k) for all Î ∈ If. Thus, using the equivalence between strategies and realization plans, for all k ∈ N : k ≥ k̄ we have the following: uℓ(rℓ(ϵk), rf(ϵk)) < uℓ(r̂ℓ(ϵk), r̂f(ϵk)). Notice that this holds no matter how we choose r̂f(ϵk) ∈ BRΓ(ϵk)(r̂ℓ(ϵk)), which contradicts the fact that (rℓ(ϵk), rf(ϵk)) is an SE of Γ(ϵk).

7 Algorithm

One can use our perturbation scheme to compute an (approximate) QPSE. We do this by developing an LP for computing an SEFCE in a given ξ-perturbed game instance, where we maximize the leader's value. We then conduct a branch-and-bound search on this SEFCE LP. It branches on which actions to force to be recommended to the follower (by the correlation device of the SEFCE). The idea is that, as long as we only recommend a single action to the follower at any given information set, we get an SE of the perturbed game (specifically an SSE, and an SSE has maximum value among all SEs), and, thus, according to Theorem 1, a QPSE (specifically a QPSSE) if we take the limit point of the perturbations. As in prior papers on EFCE computation in general-sum games, we focus on games without chance nodes (von Stengel and Forges 2008; Čermák et al. 2016).

For computing an SEFCE we need to specify joint probabilities over sequence pairs (σℓ, σf) ∈ Σ. However, we do not need to specify probabilities for all pairs, only for pairs such that choosing σf is affected by the probability put on σℓ (we do not need to care about the converse of this, as only the follower needs to be induced to follow the recommended strategy). Intuitively, the set of the leader's sequences relevant to a given σf ∈ Σf is made of those sequences that affect the expected value of σf or of any alternative sequence σ̂f ∈ Σf whose last action is available at If(σf).

Definition 7 (Relevant sequences). A pair (σℓ, σf) ∈ Σ is relevant if either σℓ = σ∅, or there exist h, ĥ ∈ H s.t. ĥ precedes h, h ∈ If(σf), and ĥ ∈ Iℓ(σℓ), or if the condition holds with the roles of σℓ and σf reversed.

For every information set I ∈ Ii, we let rel(I) be the set of sequences relevant to each child sequence σi(I)a for a ∈ A(I). We let p(σℓ, σf) be the probability that we recommend that the leader plays sequence σℓ, and that the follower sends her residual (i.e., the probability that is not fixed by the perturbation) to σf. Moreover, we let η(σf) be the maximum probability that the follower can put on a sequence σf ∈ Σf given the ξf-perturbation scheme. First, we introduce a new value function representing the value to the leader of the sequence pair (σℓ, σf) ∈ Σ given that σf represents an assignment of residual probability:

u^ϵ_ℓ(σℓ, σf) = ∑_{h ∈ Z : σℓ(h) = σℓ, σf(h) = σf} η(σf)uℓ(h) + ∑_{σ̂f ∈ Σf} ξf(ϵ, σ̂f)uℓ(σℓ, σ̂f).

The following LP finds an SEFCE in a ξ-perturbed SEFG:

max ∑_{(σℓ,σf) ∈ Σ} p(σℓ, σf) u^ϵ_ℓ(σℓ, σf)   (6a)
s.t. p(σ∅, σ∅) = 1,  p(σℓ, σf) ≥ 0   ∀(σℓ, σf) ∈ Σ   (6b)
∑_{σf ∈ rel(σℓ)} p(σℓ, σf) ≥ ξℓ(ϵ, σℓ)   ∀σℓ ∈ Σℓ   (6c)
p(σℓ(I), σf) = ∑_{a ∈ A(I)} p(σℓ(I)a, σf)   ∀I ∈ Iℓ, σf ∈ rel(I)   (6d)
p(σℓ, σf(I)) = ∑_{a ∈ A(I)} p(σℓ, σf(I)a)   ∀I ∈ If, σℓ ∈ rel(I)   (6e)
v(σf) = η(σf) ∑_{σℓ ∈ rel(σf)} p(σℓ, σf)uf(σℓ, σf) + ∑_{I ∈ If : σf(I) = σf} ∑_{a ∈ A(I)} v(σfa)   ∀σf ∈ Σf   (6f)
v(I, σf) ≥ η(σf(I)a) ∑_{σℓ ∈ rel(σf)} p(σℓ, σf)uf(σℓ, σf(I)a) + ∑_{Î ∈ If : σf(Î) = σf(I)a} v(Î, σf)   ∀I ∈ If, a ∈ A(I), σf ∈ prec(I)   (6g)
v(σf(I)a) = v(I, σf(I)a)   ∀I ∈ If, a ∈ A(I).   (6h)

In (6g) of this LP, prec(I), where I ∈ If, is the set of the follower's sequences σf that precede I in the sense that there is Î ∈ If with σf(Î) ⊑ σf(I) and σf = σf(Î)a for some a ∈ A(Î). This LP is a modification of the SEFCE LP given by Čermák et al. (2016).
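The branch-and-bound search built on top of LP (6), whose branching rules are described in detail below (shallowest information set with more than one recommended action first, actions explored in descending order of mass, pruning with the LP value as an upper bound), can be sketched as follows. The sketch is only illustrative: solve_lp6, follower_infosets, and mass are hypothetical helpers that would have to be built from LP (6) and the game, and the floating-point tolerance stands in for the exact arithmetic our implementation uses.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Seq = Tuple[str, ...]  # a follower sequence, e.g. sigma_f(I)a, as a tuple of actions

def branch_and_bound(
    solve_lp6: Callable[[FrozenSet], Optional[Tuple[float, Dict]]],
    follower_infosets: List[Tuple[str, List[Seq]]],
    mass: Callable[[Dict, Seq], float],
) -> Tuple[float, Optional[Dict]]:
    """Hypothetical driver: `solve_lp6(zeroed)` solves LP (6) with p(sigma_l, sigma_f) = 0 added
    for every follower sequence sigma_f in `zeroed` (None if infeasible, else the objective value
    and solution p); `follower_infosets` lists, shallowest first, each follower information set
    with its child sequences; `mass(p, sigma_f)` returns sum_{sigma_l in rel(sigma_f)} p(sigma_l, sigma_f)."""
    best_val, best_sol = float("-inf"), None

    def recurse(zeroed: FrozenSet) -> None:
        nonlocal best_val, best_sol
        solved = solve_lp6(zeroed)
        if solved is None:
            return                       # this branch is infeasible
        val, p = solved
        if val <= best_val:
            return                       # the LP value upper-bounds the branch: prune
        # Find the shallowest information set where more than one action is still recommended.
        impure = next(
            ((I, seqs) for I, seqs in follower_infosets
             if sum(mass(p, s) > 1e-9 for s in seqs) > 1),
            None,
        )
        if impure is None:
            best_val, best_sol = val, p  # pure recommendations everywhere: an SSE of Gamma(eps)
            return
        _, seqs = impure
        # Branch: force one action by zeroing out all sibling sequences; heaviest action first.
        for s in sorted(seqs, key=lambda t: -mass(p, t)):
            recurse(zeroed | frozenset(t for t in seqs if t != s))

    recurse(frozenset())
    return best_val, best_sol
```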
The new LP has two modifications to allow perturbation. First, it has constraints (6c) to ensure that the sum of recommendation probabilities on any leader's sequence is at least ξℓ(ϵ, σℓ). Second, because we are now recommending where to send residual probability for the follower, we must modify the objective in order to give the correct expected value for the leader.7

We can branch-and-bound on recommendations to the follower in a way that ensures that the final outcome is an SSE. That is guaranteed by the following theorem, which shows that we can add and remove constraints on which follower actions to recommend in a way that guarantees an SSE of the perturbed game, as long as the follower is recommended a pure strategy with respect to the residual probabilities.

Theorem 3. If a solution to LP (6) is such that for all I ∈ If there exists a ∈ A(I) such that p(σℓ, σf(I)â) = 0 for all â ∈ A(I), σℓ ∈ rel(σf(I)a) with â ≠ a, then a strategy profile can be extracted in polynomial time such that it is an SSE of the perturbed game instance.

Now it is obvious that LP (6) upper-bounds the value of any SSE, since the SSE is a feasible solution to the LP. Theorem 3 shows that one way to find an SSE is to find a solution to LP (6) where the follower is recommended a pure strategy with respect to the residual probabilities. Since any SSE represents such a solution, we can branch on which actions we make pure at each information set, and use branch-and-bound to prune the space of possible solutions. This approach was proposed by Čermák et al. (2016) for computing SSEs in unperturbed games, where they showed that it performs better than a single MIP. Because our LP for perturbed games uses residual probabilities for the follower, we can apply the branching methodology of Čermák et al. (2016). At each node in the search we choose some information set I where more than one action is recommended. We then branch on which action in A(I) to recommend. Forcing a given action is accomplished by requiring all other action probabilities to be zero. Our branch-and-bound chooses information sets according to depth, always branching on the shallowest one with at least two recommended actions. We explore actions in descending order of mass, where the mass on a ∈ A(I) (with sequence σf) is ∑_{σℓ ∈ rel(σf)} p(σℓ, σf).

The algorithm finds an SSE of the perturbed game. In the limit as the perturbation approaches zero, this yields a QPSE. No algorithm is currently known for computing such an exact limit. In practice, we pick a small perturbation and solve the branch-and-bound using that value. This immediately leads to an approximate notion of QPSE (akin to approximate refinement notions in non-Stackelberg EFGs (Farina, Kroer, and Sandholm 2017; Kroer, Farina, and Sandholm 2017)). Another approach is to use our algorithm as an anytime algorithm, where one runs it repeatedly with smaller and smaller perturbation values.

7 We use the definition of relevant sequences and the LP from von Stengel and Forges (2008) rather than those of Čermák et al. (2016). The latter are not well defined for (6d) and (6e).

8 Experiments

We conducted experiments with our algorithm on two common benchmark EFGs. The first is a search game played on the graph shown in Figure 2. It is a simultaneous-move game (which can be modeled as a turn-taking EFG with appropriately chosen information sets).
The leader controls two patrols that can each move within their respective shaded areas (labeled P1 and P2), and at each time step the leader chooses a move for both patrols. The follower is always at a single node of the graph, initially the leftmost node labeled S, and can move freely to any adjacent node (except that the follower cannot move from a patrolled node to another patrolled node). The follower can also choose to wait in place for a time step in order to clean up her traces. If a patrol visits a node that was previously visited by the follower, and the follower did not wait to clean up her traces, the leader can see that the follower was there. If the follower reaches any of the rightmost nodes, she receives the respective payoff at the node (5 and 10, respectively). If the follower and any patrol are on the same node at any time step, the follower is captured, which leads to a payoff of 0 for the follower and a payoff of 1 for the leader. Finally, the game times out after k simultaneous moves, in which case the leader receives payoff 0 and the follower receives a payoff lower than that of reaching an end node (because we are interested in games where the follower attempts to reach an end node). This is the game considered by Kroer, Farina, and Sandholm (2018), except with the bottom layer removed, and is similar to games considered by Bošanský et al. (2014) and Bošanský and Čermák (2015).

Figure 2: The graph on which the search game is played.

The second game is a variant of Goofspiel (Ross 1971), a bidding game where each player has a hand of cards numbered 1 to 3. There are 3 prizes worth 1, . . . , 3. In each turn, the prize up for bid is the smallest among the remaining prizes. Within the turn, each of the two players simultaneously chooses some private card to play. The player with the larger card wins the prize. In case of a tie, the prize is discarded, so this is not a constant-sum game. The cards that were played get discarded. Once all cards have been played, a player's score is the sum of the prizes that she has won.

The LP solver we used is GLPK 4.63 (GLPK 2017). We had to make the following changes to GLPK. First, we had to expose some internal routines so that we could input to the solver rational numbers rather than double-precision numbers. Second, we fixed a glitch in GLPK's rational LP solver in its pivoting step (it was not correct when the rational numbers were too small). Our code and GLPK use the GNU GMP library to provide arbitrary-precision arithmetic. The code, written in the C++14 language, was compiled with the g++ 7.2.0 compiler. It was run on a single thread on a 2.3 GHz Intel Xeon processor. The results are shown in Figure 3.

Figure 3: Experiments on the search game and on Goofspiel, plotting the leader's utility difference and the compute time (minutes for the search game, seconds for Goofspiel) against the perturbation magnitude ϵ. Dashed lines show compute time. Solid lines show the loss in the leader's utility compared to the SSE value in the unperturbed game.

9 Conclusions and Future Research

Quasi-perfect equilibrium has been studied in extensive-form games, but was poorly understood in Stackelberg settings. We provided a game-theoretic, axiomatic definition of quasi-perfect Stackelberg equilibrium (QPSE). We developed a family of game perturbation schemes that lead to a QPSE in the limit. Our family generalizes prior perturbation schemes introduced for finding (even non-Stackelberg) quasi-perfect equilibria. Using our perturbation schemes, we developed a branch-and-bound algorithm for QPSE.
It leverages a perturbed variant of the linear program for computing a Stackelberg extensive-form correlated equilibrium. Experiments show that our algorithm can be used to find an approximate QPSE in games with thousands of nodes.

We showed that some perturbation schemes outside our family do not lead to QPSEs in some games. It remains an open question whether our perturbation family fully characterizes the whole set of QPSEs. As to requirement (i) in Definition 6, can all QPSEs be captured by perturbation schemes that only use polynomial lower bounds on trembles? It was recently shown that in non-Stackelberg extensive-form games, there exists a perturbation size that is small enough (while still strictly positive) that an exact refined (e.g., quasi-perfect) equilibrium can be found by solving a mathematical program with that perturbation size (Miltersen and Sørensen 2010; Farina and Gatti 2017; Farina, Gatti, and Sandholm 2018), and Farina, Gatti, and Sandholm (2018) provide an algorithm for checking whether a given guess of perturbation size is small enough. That obviates the need to try to explicitly compute a limit of a sequence. It would be interesting to see whether such theory can also be developed for Stackelberg extensive-form games, and for our perturbation family in particular.

Acknowledgements

This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and by the ARO under award W911NF-17-1-0082. Christian Kroer was also sponsored by a Facebook Fellowship.

References

Bošanský, B., and Čermák, J. 2015. Sequence-form algorithm for computing Stackelberg equilibria in extensive-form games. In AAAI, 805–811.
Bošanský, B.; Kiekintveld, C.; Lisý, V.; and Pěchouček, M. 2014. An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. Journal of Artificial Intelligence Research 829–866.
Čermák, J.; Bošanský, B.; Durkota, K.; Lisý, V.; and Kiekintveld, C. 2016. Using correlated strategies for computing Stackelberg equilibria in extensive-form games. In AAAI, 439–445.
Conitzer, V., and Sandholm, T. 2006. Computing the optimal strategy to commit to. In EC, 82–90.
Farina, G., and Gatti, N. 2017. Extensive-form perfect equilibrium computation in two-player games. In AAAI, 502–508.
Farina, G.; Marchesi, A.; Kroer, C.; Gatti, N.; and Sandholm, T. 2018. Trembling-hand perfection in extensive-form games with commitment. In IJCAI, 233–239.
Farina, G.; Gatti, N.; and Sandholm, T. 2018. Practical exact algorithm for trembling-hand equilibrium refinements in games. In AAMAS-IJCAI workshop on Agents and Incentives in Artificial Intelligence (AI3).
Farina, G.; Kroer, C.; and Sandholm, T. 2017. Regret minimization in behaviorally-constrained zero-sum games. In ICML, 1107–1116.
GLPK. 2017. GNU Linear Programming Kit, version 4.63.
Kroer, C.; Farina, G.; and Sandholm, T. 2017. Smoothing method for approximate extensive-form perfect equilibrium. In IJCAI, 295–301.
Kroer, C.; Farina, G.; and Sandholm, T. 2018. Robust Stackelberg equilibria in extensive-form games and extension to limited lookahead. In AAAI, 1130–1137.
Kuhn, H. W. 1953. Extensive games and the problem of information. Annals of Mathematics Studies 28:193–216.
Letchford, J., and Conitzer, V. 2010. Computing optimal strategies to commit to in extensive-form games. In EC, 83–92.
Marchesi, A.; Farina, G.; Kroer, C.; Gatti, N.; and Sandholm, T. 2018. Quasi-perfect Stackelberg equilibrium. CoRR abs/1811.03871.
Mertens, J.-F. 1995. Two examples of strategic equilibrium. Games and Economic Behavior 8(2):378–388.
Miltersen, P. B., and Sørensen, T. B. 2008. Fast algorithms for finding proper strategies in game trees. In SODA, 874–883.
Miltersen, P. B., and Sørensen, T. B. 2010. Computing a quasi-perfect equilibrium of a two-player game. Economic Theory 42(1):175–192.
Romanovskii, I. 1962. Reduction of a game with complete memory to a matrix game. Soviet Mathematics 3.
Ross, S. M. 1971. Goofspiel – the game of pure strategy. Journal of Applied Probability 8(3):621–625.
Selten, R. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory 4(1):25–55.
Shoham, Y., and Leyton-Brown, K. 2008. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
Sørensen, T. B. 2012. Computing a proper equilibrium of a bimatrix game. In EC, 916–928.
Tambe, M. 2011. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
Van Damme, E. 1984. A relation between perfect equilibria in extensive form games and proper equilibria in normal form games. International Journal of Game Theory 13(1):1–13.
van Damme, E. 1987. Stability and Perfection of Nash Equilibria. Berlin, Heidelberg: Springer-Verlag.
von Stackelberg, H. 1934. Marktform und Gleichgewicht. Springer, Vienna.
von Stengel, B., and Forges, F. 2008. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research 33(4):1002–1022.
Von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior 14(2):220–246.