# on_computational_tractability_for_rational_verification__9e701771.pdf

On Computational Tractability for Rational Veriﬁcation

Julian Gutierrez1 , Muhammad Najib1 , Giuseppe Perelli2 , Michael Wooldridge1

1Department of Computer Science, University of Oxford, UK 2Department of Informatics, University of Leicester, UK {julian.gutierrez, mnajib, michael.wooldridge}@cs.ox.ac.uk, giuseppe.perelli@leicester.ac.uk

Rational veriﬁcation involves checking which temporal logic properties hold of a concurrent/multiagent system, under the assumption that agents in the system choose strategies in game theoretic equilibrium. Rational veriﬁcation can be understood as a counterpart of model checking for multiagent systems, but while model checking can be done in polynomial time for some temporal logic speciﬁcation languages such as CTL, and polynomial space with LTL speciﬁcations, rational veriﬁcation is much more intractable: 2EXPTIMEcomplete with LTL speciﬁcations, even when using explicit-state system representations. In this paper we show that the complexity of rational veriﬁcation can be greatly reduced by restricting speciﬁcations to GR(1), a fragment of LTL that can represent most response properties of reactive systems. We also provide improved complexity results for rational veriﬁcation when considering players goals given by mean-payoff utility functions arguably the most widely used quantitative objective for agents in concurrent and multiagent systems. In particular, we show that for a number of relevant settings, rational veriﬁcation can be done in polynomial space or even in polynomial time.

1 Introduction The formal veriﬁcation of systems using temporal logics such as LTL and CTL [Emerson, 1990] is a major research area, which has led to the development of an impressive number of industrial-strength veriﬁcation tools and techniques. Arguably the most successful technique within formal veriﬁcation is model checking, which can be done in polynomial space for LTL speciﬁcations and even in polynomial time for CTL speciﬁcations [Clarke et al., 2018]. In the context of multiagent systems, rational veriﬁcation forms a natural counterpart of model checking [Gutierrez et al., 2015b; Wooldridge et al., 2016; Gutierrez et al., 2017a]. This is the problem of checking whether a given property ϕ, expressed as a temporal logic formula, is satisﬁed in a computation of a system that might be generated if agents within the system choose strategies for selecting actions that form a game-

theoretic (e.g., Nash) equilibrium. Unlike model checking, rational veriﬁcation is still in its infancy: the main ideas, formal models, and reasoning techniques underlying rational veriﬁcation are under development, while current tool support is limited and cannot yet handle systems of industrial size [Toumi et al., 2015; Gutierrez et al., 2018a]. One key difﬁculty is that rational veriﬁcation is computationally much harder than model checking, because checking equilibrium properties requires quantifying over the strategies available to players in the system. Rational veriﬁcation is also different from model checking in the kinds of properties that each technique tries to check: while model checking is interested in correctness with respect to any possible behaviour of a system, rational veriﬁcation is interested only in behaviours that can be sustained by a Nash equilibrium, when a multiagent system is modelled as a multi-player game. This, in particular, adds a new ingredient to the veriﬁcation problem, as it is now necessary to take into account the preferences of players with respect to the possible runs of the system. Typically, in rational veriﬁcation, such preferences are given by associating an LTL goal γi with each player i in the game. In this case, rational veriﬁcation with respect to a speciﬁcation ϕ is 2EXPTIME-complete, regardless of whether the representation of the system is given succinctly [Gutierrez et al., 2017a; Gutierrez et al., 2015b] or explicitly simply as a ﬁnite-state labelled transition graph [Gutierrez et al., 2015a]. In this paper, we address this issue and provide complexity results that greatly improve on the 2EXPTIME-complete result of the general case. In particular, we consider games where the goals of players are represented as either GR(1) formulae (an important fragment of LTL that can express most response properties of a concurrent and reactive system [Bloem et al., 2012]), or mean-payoff utility functions (one of the most studied reward and quality measures used in games for automated formal veriﬁcation). In each case, we study the rational veriﬁcation problem for system speciﬁcations ϕ given as GR(1) formulae and as LTL formulae, with respect to system models that are formally represented as concurrent game structures [Alur et al., 2002]. Our main results, summarised in Table 1, show that in the cases above mentioned, the 2EXPTIME result can be dramatically improved, to settings where rational veriﬁcation can be solved in polynomial space, NP, or even in polynomial time if the number of players in the game is assumed to be ﬁxed.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Players goals Speciﬁcation E-NASH

LTL LTL 2EXPTIME-complete GR(1) LTL PSPACE-complete (Corollary 1) GR(1) GR(1) FPT (Theorem 3) mp LTL PSPACE-complete (Corollary 2) mp GR(1) NP-complete (Theorem 5)

Table 1: Summary of main complexity results.

Related Work:

Rational veriﬁcation has been studied for a number of settings, including iterated Boolean games, reactive modules games, and concurrent game structures [Gutierrez et al., 2015b; Gutierrez et al., 2017a; Gutierrez et al., 2015a; Gutierrez et al., 2017b]. In all cases, the problem is 2EXPTIMEcomplete. Rational veriﬁcation is also closely related to rational synthesis, which is also 2EXPTIME-complete both in the Boolean case [Fisman et al., 2010] and with rational environments [Kupferman et al., 2016]. All of the above cases only consider perfect information. In settings with imperfect information, the problem has been shown to be undecidable both for games with succinct and explicit model representations [Gutierrez et al., 2018b; Filiot et al., 2018]. Our work also relates to LTL and mean-payoff (mp) games at large. While the former are already 2EXPTIME-complete even for two-player games (and in fact already 2EXPTIMEhard for many LTL fragments [Alur and La Torre, 2004]), the latter are NP-complete for multi-player games [Ummels and Wojtczak, 2011] and in NP co NP for two-player games [Zwick and Paterson, 1996], and in fact solvable in quasipolynomial time since they can be reduced to two-player perfect-information parity games [Calude et al., 2017].

2 Preliminaries

Linear Temporal Logic. LTL extends propositional logic with two operators, X ( next ) and U ( until ), for expressing properties of paths [Pnueli, 1977; Emerson, 1990]. The syntax of LTL is deﬁned with respect to a set AP of atomic propositions as follows:

ϕ ::= | p | ϕ | ϕ ϕ | Xϕ | ϕ U ϕ

where p AP. As usual, we deﬁne ϕ1 ϕ2 ( ϕ1 ϕ2), ϕ1 ϕ2 ϕ1 ϕ2, Fϕ U ϕ, and Gϕ F ϕ. We interpret LTL formulae with respect to pairs (α, t), where α (2AP)ω is an inﬁnite sequence of sets of atomic proposition that indicates which propositional variables are true in every time point and t N is a temporal index into α. Formally, the semantics of LTL is given by the following rules:

(α, t) |= (α, t) |= p iff p αt (α, t) |= ϕ iff it is not the case that (α, t) |= ϕ (α, t) |= ϕ ψ iff (α, t) |= ϕ or (α, t) |= ψ (α, t) |= Xϕ iff (α, t + 1) |= ϕ (α, t) |= ϕ U ψ iff for some t t : (α, t ) |= ψ and for all t t < t : (α, t ) |= ϕ .

If (α, 0) |= ϕ, we write α |= ϕ and say that α satisﬁes ϕ. General Reactivity of rank 1. The language of General Reactivity of rank 1, denoted GR(1), is the fragment of LTL of formulae written in the following form [Bloem et al., 2012]:

(GFψ1 . . . GFψm) (GFϕ1 . . . GFϕn),

where each subformula ψi and ϕi is a Boolean combination of atomic propositions. Mean-Payoff value. For an inﬁnite sequence β Rω of real numbers, let mp(β) be the mean-payoff value of β, that is,

mp(β) = lim inf n avgn(β)

where, for n N, we deﬁne avgn(β) = 1

n Pn 1 j=0 βj.

Arenas. An arena is a tuple

A = N, Ac, St, s0, tr, λ

where N, Ac, and St are ﬁnite non-empty sets of players (write N = |N|), actions, and states, respectively; s0 St is the initial state; tr : St Ac St is a transition function mapping each pair consisting of a state s St and an action proﬁle a Ac = Ac N, one for each player, to a successor state; and λ : St 2AP is a labelling function, mapping every state to a subset of atomic propositions. We sometimes call an action proﬁle a = (a1, . . . , an) Ac a decision, and denote ai the action taken by player i. We also consider partial decisions. For a set of players C N and action proﬁle a, we let a C and a C be two tuples of actions, respectively, one for all players in C and one for all players in N\C. We also write ai for a{i} and a i for a N\{i}. For two decisions a and a , we write ( a C, a C) to denote the decision where the actions for players in C are taken from a and the actions for players in N \ C are taken from a . A path π = (s0, a0), (s1, a1) is an inﬁnite sequence in (St Ac)ω such that tr(sk, ak) = sk+1 for all k. Paths are generated in the arena by each player i selecting a strategy σi that will deﬁne how to make choices over time. We model strategies as ﬁnite state machines with output. Formally, for arena A, a strategy σi = (Qi, q0 i , δi, τi) for player i is a ﬁnite state machine with output (a transducer), where Qi is a ﬁnite and non-empty set of internal states, q0 i is the initial state, δi : Qi Ac Qi is a deterministic internal transition function, and τi : Qi Aci an action function, Aci Ac for all i N. Let Stri be the set of strategies for player i. A strategy proﬁle σ = (σ1, . . . , σn) is a vector of strategies, one for each player. As with actions, σi denotes the strategy assigned to player i in proﬁle σ. Moreover, by ( σB, σ C) we

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

denote the combination of proﬁles where players in disjoint B and C are assigned their corresponding strategies in σ and σ , respectively. Once a state s and a strategy proﬁle σ are ﬁxed, the game has an outcome, a path in A, which we denote by π( σ, s). Because strategies are deterministic, π( σ, s) is the unique path induced by σ, that is, the sequence s0, s1, s2, . . . such that sk+1 = tr(sk, (τ1(qk 1), . . . , τn(qk n))), and

qk+1 i = δi(qk i , (τ1(qk 1), . . . , τn(qk n))), for all k 0. Arenas deﬁne the dynamic structure of games, but lack a central aspect of a game: preferences, which give games their strategic structure. A multi-player game is obtained from an arena A by associating each player with a goal. We consider multi-player games with GR(1) and mp goals. A multi-player GR(1) game is a tuple GGR(1) = A, (γi)i N where A is an arena and γi is the GR(1) goal for player i. A multi-player mp game is a tuple Gmp = A, (wi)i N , where A is an arena and wi : St Z is a function mapping every state of the arena into an integer number. When it is clear from the context, we refer to a multi-player GR(1) or mp game as a game and denote it by G. In any game with arena A, a path π in A induces a sequence λ(π) = λ(s0)λ(s1) of sets of atomic propositions; if, in addition, A is the arena of an mp game, then, for each player i, the sequence wi(π) = wi(s0)wi(s1) of weights is also induced. For a GR(1) game and a path π in it, the payoff of a player i is payi(π) = 1 if λ(π) |= γi and payi(π) = 0 otherwise. Regarding an mp game, the payoff of player i is payi(π) = mp(wi(π)). Moreover, for a GR(1) game and a path π, by Win(π) = {i N : λ(π) |= γi} and Lose(π) = {j N : λ(π) |= γj} we denote the set of winners and losers, respectively, over π, that is, the set of players that get their goal satisﬁed and not satisﬁed, respectively, over π. With an abuse of notation, we sometime denote Win( σ, s) = Win(π( σ, s)) and Lose( σ, s) = Lose(π( σ, s)), respectively, the set of winners and losers over the path generated by strategy proﬁle σ when starting the game from s. Furthermore, we simply write π( σ) for π( σ, s0). Nash equilibrium. Using payoff functions, we can deﬁne the concept of Nash equilibrium [Osborne and Rubinstein, 1994]. For a game G, a strategy proﬁle σ is a Nash equilibrium of G if, for every player i and strategy σ i Stri, we have payi(π( σ)) payi(π(( σ i, σ i))) . Let NE(G) be the set of Nash equilibria of G. E-NASH and rational veriﬁcation. In rational veriﬁcation, a key question/problem is E-NASH, which is concerned with the existence of a Nash equilibrium that fulﬁls a given temporal speciﬁcation ϕ. Formally, E-NASH is deﬁned as follows: Deﬁnition 1 (E-NASH). Given a game G and a formula ϕ: Does there exist σ NE(G) such that π( σ) |= ϕ? This problem can be instantiated in many ways. For instance, in [Gutierrez et al., 2015b], E-NASH was investigated over iterated Boolean Games with speciﬁcations and players goals in LTL, and was proved to be 2EXPTIME-complete. Iterated Boolean games is a very natural framework, but it is computationally intractable.

Motivated by this computational limitation, in this paper, we study E-NASH for a number of relevant instantiations of the problem, which we show to have better computational complexity. In particular, we study cases where

Speciﬁcations ϕ are LTL and players goals are GR(1);

Speciﬁcations ϕ are LTL and players have mp goals;

Both the speciﬁcation ϕ and the goals are GR(1);

Speciﬁcations ϕ are GR(1) and players have mp goals.

Automata. Some of the algorithms we present for the ENASH problem use techniques from automata theory. Specifically, we use deterministic automata on inﬁnite words with Streett acceptance conditions. Formally, a deterministic Streett automaton on inﬁnite words (DSW) is a tuple A = (Σ, Q, q0, δ, Ω) where Σ is the input alphabet, Q is a ﬁnite set of states, δ : Q Σ Q is a transition function, q0 is an initial state, and Ωis a Streett acceptance condition. A Streett condition Ωis a set of pairs {(E1, C1), . . . , (En, Cn)} where Ek Q and Ck Q for all k [1, n]. A run ρ is accepting in a DSW A with condition Ωif ρ either visits Ek ﬁnitely many times or visits Ck inﬁnitely often, i.e., if for every k either inf (ρ) Ek = or inf (ρ) Ck = .

3 Games of General Reactivity of Rank 1

As indicated before, we solve GR(1) games in two cases: the ﬁrst one is when the speciﬁcation formula is expressed in LTL, while the goals are in GR(1); the second one when the speciﬁcation formula as well as the goals belong to GR(1). First, we provide a general result about a characterization of Nash Equilibrium for GR(1) given in terms of punishments. We ﬁrst require some notation. For a GR(1) game G, player j N, and state s St, the strategy proﬁle σ j is punishing for player j in s if π(( σ j, σ j), s) |= γj, for every possible strategy σ j of player j. We say that a state s is punishing for j if there exists a punishing strategy proﬁle for j on s. Moreover, we denote by Punj(G) the set of punishing states in G. A pair (s, a) St Ac is punishing-secure for player j, if tr(s, ( a j, a j)) Punj(G) for every action a j.

Theorem 1. In a given GR(1) game G, there exists a Nash Equilibrium if and only if there exists an ultimately periodic path π such that, for every k N, the pair (sk, ak) of the k-th iteration of π is punishing secure for every j Lose(π).

Proof sketch. From left to right, let σ NE(G) and π be the ultimately periodic path generated by σ. Assume by contradiction that π is not punishing secure for some j N, that is, there is k N and action a j such that tr(sk, ( a j, a j)k) / Punj(G). Thus, j can deviate at sk and satisfy γj, which is a contradiction to σ being a Nash equilibrium. From right to left, recall that π can be generated by a ﬁnite transducer, say T. Moreover, for every losing player j, there is a punishing strategy proﬁle for j in every s Punj(G). Combining T with such punishment strategies, we build a proﬁle σ that follows the actions prescribed by T, until a losing player j deviates. In such a case, σ would start punishing player j. Observe that GR(1) objectives are preﬁx-independent, which

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

is not true for general LTL objectives. That means that the punishment from the k-th iteration takes effect no matter what preﬁx π k has been played so far. Thus, there is no beneﬁcial deviation for j and σ is a Nash equilibrium.

At this point, solving E-NASH can be done as follows:

1. Guess a set W N of winners; 2. For each player j L = N \ W, a loser in the game, compute its punishment region Punj(G); 3. Remove from G the states that are not punishing for players j L and the edges (s, s ) that are labelled with an action proﬁle a such that (s, a) is not punishingsecure for some j L, thus obtaining a game G L; 4. Check whether there exists an ultimately periodic path π in G L such that π |= ϕ V i W γi holds.

The four steps described in the above procedure yield Algorithm 1, which solves the problem at hand.

Algorithm 1: E-NASH of GR(1) games.

1 Input: A game GGR(1) and a speciﬁcation formula ϕ.

2 for i N do

3 Compute Puni(G)

4 for W N do

5 Compute G L

6 if π |= (ϕ V i W γi) for some π G L then

7 return Accept

8 return Reject

While line 6 requires solving the model checking problem for an LTL formula, which can be done in polynomial space, line 5 can be done in polynomial time. Line 4, on the other hand, makes the procedure run in exponential time in the number of players, but still in polynomial space. We then only need to check line 3: this step can be done in polynomial time, as we now show. Theorem 2. For a given GR(1) game G over the arena A = N, Ac, St, s0, tr, λ and a player j N, computing the winning region Punj(G) of player j can be done in polynomial time with respect to the size of both G and γj.

Proof. We reduce the problem to computing the winning region of a suitably deﬁned Streett game of index k = 1, whose complexity is known to be O(mnk+1kk!) [Piterman and Pnueli, 2006]. Given that in our case we have k = 1, we obtain a polynomial time algorithm. Recall that the goal of player j is of the form:

r=1 GFθj r,

where ψj l s and θj r s are boolean combinations of atomic propositions. Then, consider the arena A = N, Ac, St , s 0, tr 1 where

1We omit the deﬁnition of labelling function, as not needed here.

St = St {0, . . . , mj} {0, . . . , nj}; s 0 = (s0, 0, 0); tr ((s, ι1, ι2), a) = (tr(s, a), ι 1, ι 2) where

1, if ι1 = 0 ι1, if ι1 = 0 and s |= ψj ι1 (ι1 (mj+1) 1), otherwise , and

1, if ι2 = 0 ι2, if ι2 = 0 and s |= θj ι2 (ι2 (nj+1) 1), otherwise

Intuitively, arena A mimics the behaviour of A and carries two indexes, ι1 and ι2. Index ι1 is increased by one every time the path visits a state that satisﬁes ψj ι1 and resets to 0 every time the path visits a state that satisﬁes ψj mj. Clearly, ι1 is reset inﬁnitely many times if and only if the path satisﬁes every ψj l inﬁnitely many times, and so if and only if it satisﬁes the temporal speciﬁcation Vmj l=1 GFψj l . The same argument applies to index ι2, but with respect to the boolean combinations θj r s. Now, consider the sets Cj = St {0} {0, . . . , nj} and Ej = St {0, . . . , mj} {0}. Clearly, the Streett pair (Cj, Ej) is satisﬁed by all and only the paths in A that satisfy γj. Therefore, the winning region of γj can be computed as the winning set of the Streett game of index 1 with (Cj, Ej) being the only Streett pair. As this can be done in polynomial time, we proved the statement.

Based on Theorem 2, we have the following result. Corollary 1. The E-NASH problem for GR(1) games with an LTL speciﬁcation is PSPACE-complete.

Proof. The upper-bound follows from the procedure described above. Regarding the lower-bound, note that modelchecking an LTL formula ϕ against a Kripke structure K can be easily encoded as an instance of E-NASH where G is played over a Kripke structure K, taken to be its arena, players goals being tautologies, and the speciﬁcation being ϕ. In such a case, we have that K |= ϕ if and only if E-NASH for the pair (G, ϕ) has a negative answer.

Corollary 1 sharply contrasts with the same result in case the goals of the players are general LTL formulae. In this more general case, E-NASH is 2EXPTIME-complete. The special case of GR(1) speciﬁcations. One of hardest parts of Algorithm 1 is line 6, where an LTL model checking problem has to be solved, making the running time of the whole procedure exponential in the size of the speciﬁcation and goals of the players. As we show in this section, a way to drastically reduce the complexity of our decision procedure is to let the speciﬁcation be in GR(1) too. In such a case, the LTL model checking procedure in line 6 of Algorithm 1 can be avoided, leading to a much simpler construction, which runs in polynomial time for every ﬁxed number of players. In this section, we provide precisely such a simpler construction. Recall that every GR(1) speciﬁcation ϕ can be regarded as a Streett condition of index 1 over an arena A suitably

2By k we denote the addition modulo k.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

constructed from the original arena A. Thus, by denoting (Cϕ, Eϕ) and (Ci, Ei) the Streett pairs corresponding to the GR(1) conditions ϕ and γi, respectively, the problem of ﬁnding a path in A satisfying the formula ϕ V i W γi amounts to deciding the emptiness of the Streett automaton A = Ac, St , s 0, tr, Ω where Ω= {(Cϕ, Eϕ), (Cγi, Eγi)i W }. Note that the size of A is polynomial in the size of the GR(1) formulae involved, polynomial in the number of states and actions in the original arena A, and exponential in the number of players. More speciﬁcally, we have that |St | = |St| |γ||N| and so the number of edges is at most |St | 2. Moreover, the emptiness problem of a deterministic Streett word automaton can be solved in time that is polynomial in the automaton s index and its number of states and transitions [Rauch Henzinger and Telle, 1996; Kupferman, 2015]. The complexity of the E-NASH problem takes 2|N| times a procedure for computing at most |N| punishing regions (that is polynomial in the size of both G and ϕ, γ1, . . . , γN) plus the complexity of the emptiness problem for a Streett automaton whose size is polynomial in G ϕ, γ1, . . . , γN, and exponential in the number of players. Based on the constructions described above, we can show the following (ﬁxed-parameter tractable) complexity result.

Theorem 3. For a given GR(1) game G and a GR(1) formula ϕ, the E-NASH problem can be solved in time that is polynomial in |St|, |Ac|, and |ϕ|, |γ1|, . . . , |γN| and exponential in the number of players |N|. Therefore, the problem is ﬁxedparameter tractable, parametrized in the number of players.

4 Mean-Payoff Games

We now focus on multi-player mean-payoff (mp) games. As in the previous case, we ﬁrst characterise the Nash Equilibria of a game in terms of punishments and then reduce E-NASH to a suitable path-ﬁnding problem in the underlying arena. To do this, we ﬁrst need to recall the notion of secure values for mean-payoff games [Ummels and Wojtczak, 2011]. For a player i and a state s St, by puni(s) we denote the punishment value of i over s, that is, the maximum payoff that i can achieve from s, when all other players behave adversarially. Such a value can be computed by considering the corresponding two-player zero-sum mean-payoff game [Zwick and Paterson, 1996]. Thus, it is in NP co NP, and note that both player i and coalition N \ {i} can achieve the optimal value of the game using memoryless strategies. For a player i and a value z R, a pair (s, a) is z-secure for i if puni(tr(s, ( a i, a i))) z for every a i Ac.

Theorem 4. For every mp game G and ultimately periodic path π = (s0, a0), (s1, a1), . . ., the following are equivalent

1. There is σ NE(G) such that π = π( σ, s0);

2. There exists z RN, where zi {puni(s) : s St} such that, for every i N

(a) for all k N, the pair (sk, ak) is zi-secure for i, and (b) zi payi(π).

Proof sketch. (1) (2): We prove by contraposition. Let zi be the largest value player i can get by deviating from π, and let k N be such that zi = puni(tr(sk, ( a i, a i))), i.e., i can get as much as zi by going alone at sk. Suppose further that payi(π) < zi. Thus, player i would deviate at sk since i can get better payoff by not following π which is a contradiction to π being a path induced by a Nash equilibrium. (2) (1): Deﬁne strategy proﬁle σ that follows π as long as no-one has deviated from π. In such a case where player i deviates on the k-th iteration, the strategy proﬁle σ i starts playing the zi-secure strategy for player i that guarantees the payoff of player i to be less than zi. Therefore, we have payi(π( σ i, σ i)) zi payi(π), for every possible strategy σ i of player i (the second inequality is due to condition 2(b)). Thus, there is no beneﬁcial deviation for player i and π is a path induced by a Nash equilibrium.

The characterization of Nash Equilibria provided in Theorem 4 allows us to turn the E-NASH problem for mp games into a path ﬁnding problem over G. Similarly to the case of GR(1) games, we have the following procedure.

1. For every i N and s St, compute the value puni(s);

2. Guess a vector z RN of values, each of them being a punishment value for a player i;

3. Compute the game G[z] by removing the states s such that puni(s) zi for some player i and the transitions (s, a) that are not zi secure for some player i;

4. Find an ultimately periodic path π in game G[z] such that π |= ϕ and zi payi(π) for every player i N.

Step 1 can be done in NP for every pair (i, s), step 2 can be done in exponential time and polynomial space in the number of z-secure values, and step 3 can be done in polynomial time, similar to the case of GR(1) games. Regarding the last step, its complexity depends on the speciﬁcation language. For the case of ϕ being an LTL formula, consider the formula

ϕE-NASH := ϕ

i N (mp(i) zi),

written in the language LTLLim, an extension of LTL where statements about mean-payoff values over a given weighted arena can be made [Boker et al., 2014]. Observe that formula ϕE-NASH corresponds exactly to requirement 2(b) in Theorem 4. Moreover, since every path in G[z] satisﬁes condition 2(a) by construction, every path that satisﬁes ϕE-NASH is a solution of the E-NASH problem and viceversa. We can solve the latter problem by model checking the formula against the arena underlying G[z]. Since this can be done in PSPACE [Boker et al., 2014], we have the following result.

Corollary 2. The E-NASH problem for mp games with an LTL speciﬁcation formula ϕ is PSPACE-complete.

As for the case of GR(1) games, we can summarize the procedure in the following algorithm (Algorithm 2). The special case of GR(1) speciﬁcations. As in the case of GR(1) games, here we show that restricting the speciﬁcation language to GR(1) lowers the complexity also for mp games.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Algorithm 2: E-NASH of mp games.

1 Input: A game Gmp and a speciﬁcation formula ϕ.

2 for i N and s St do

3 Compute puni(G)

4 for z {puni(s) : s St}N do

5 Compute G[z]

6 if π |= ϕE-NASH for some π G[z] then

7 return Accept

8 return Reject

The reason is that the path ﬁnding problem for GR(1) speciﬁcations can be done while avoiding model-checking of an LTLLim formula. In order to do this, we follow a different approach. Using an mp game G and a GR(1) speciﬁcation ϕ we deﬁne a linear program such that the linear program has a solution if and only if the pair (G, ϕ) is an instance of E-NASH. In particular, this approach is similar to the technique used in [Gutierrez et al., 2017c, Theorem 2], where Linear Programming is used to ﬁnd the complexity of solving a variant of E-NASH. Formally, we have the following result.

Theorem 5. The E-NASH problem for mp games with a GR(1) speciﬁcation ϕ is NP-complete.

Proof. We will deﬁne a linear program of size polynomial in G having a solution if and only if there exists an ultimately periodic path whose payoff for every player i is at least a minimum threshold zi and satisﬁes the GR(1) speciﬁcation. In order to do that, ﬁrst recall that ϕ has the following form

and let V (ψl) and V (θr) be the subset of states in G that satisfy the Boolean combinations ψl and θr, respectively. Observe that property ϕ is satisﬁed over a path π if, and only if, either π visits every V (θr) inﬁnitely many times or visits some of the V (ψl) only a ﬁnite number of times. For the game G[z], let V, E be the underlying graph, and for every edge e E introduce a variable xe. Informally, the value xe is the number of times that the edge e is used on a cycle. Formally, let src(e) = {v V : w e = (v, w) E}; trg(e) = {v V : w e = (w, v) E}; out(v) = {e E : src(e) = v}; and in(v) = {e E : trg(e) = v}. Consider ψl for some 1 l m, and deﬁne the linear program LP(ψl) with the following inequalities and equations:

Eq1: xe 0 for each edge e a basic consistency criterion;

Eq2: Σe Exe 1 ensures that at least one edge is chosen;

Eq3: for each a A, Σe Ewa(src(e))xe 0 this enforces that the total sum of any solution is positive;

Eq4: Σsrc(e) V (ψl) = xe = 0 this ensures that no state in V (ψl) is in the cycle associated with the solution;

Eq5: for each v V , Σe out(v)xe = Σe in(v)xe this condition says that the number of times one enters a vertex is equal to the number of times one leaves that vertex.

By construction, it follows that LP(ψl) admits a solution if and only if there exists a path π in G such that zi payi(π) for every player i and visits V (ψl) only ﬁnitely many times. In addition, consider the linear program LP(θ1, . . . , θn) deﬁned with the following inequalities and equations: Eq1: xe 0 for each edge e a basic consistency criterion; Eq2: Σe Exe 1 ensures that at least one edge is chosen; Eq3: for each a A, Σe Ewa(src(e))xe 0 this enforces that the total sum of any solution is positive; Eq4: for all 1 r n, Σsrc(e) V (θr) = xe 1 this ensures that for every V (θr) at least one state is in the cycle; Eq5: for each v V , Σe out(v)xe = Σe in(v)xe this condition says that the number of times one enters a vertex is equal to the number of times one leaves that vertex. In this case, LP(θ1, . . . , θn) admits a solution if and only if there exists a path π such that zi payi(π) for every player i and visits every V (θr) inﬁnitely many times. Since the constructions above are polynomial in the size of both G and ϕ, we can conclude it is possible to check in NP the statement that there is a path π satisfying ϕ such that zi payi(π) for every player i in the game if and only if one of the two linear programs deﬁned above has a solution. For the lower bound, we use [Ummels and Wojtczak, 2011] and observe that if ϕ is true, then the problem is equivalent to checking whether the mp game has a Nash equilibrium.

5 Other Rational Veriﬁcation Problems E-NASH is, arguably, the most fundamental problem in the rational veriﬁcation framework, but it is not the only one. The two other key problems are A-NASH and NON-EMPTINESS. The former is the dual problem of E-NASH, which asks, given a game G and a speciﬁcation ϕ, whether ϕ is satisﬁed in all Nash equilibria of G. The latter simply asks whether the game G has at least one Nash equilibrium, and it is the special case of E-NASH where the speciﬁcation ϕ is any tautology. We can conclude from (the proofs of) the results presented so far, which are summarised in Table 1, that while A-NASH for GR(1) games is also PSPACE and FPT, respectively, in case of LTL and GR(1) speciﬁcations, for mp games the problem is, respectively, PSPACE and co NP, in each case. In addition, we can also conclude that whereas NON-EMPTINESS for GR(1) games is FPT, for mp games is NP-complete. These results contrast with those when players goals are general LTL formulae, where all problems are 2EXPTIME-complete since LTL synthesis, which is 2EXPTIME-hard [Pnueli and Rosner, 1989], can be encoded. These results also contrast with those presented in [Gao et al., 2017], where it is shown that, in succinct model representations given by iterated Boolean games or reactive modules, all problems in the rational veriﬁcation framework can be reduced to NON-EMPTINESS, which clearly cannot be the case here, unless the whole polynomial hierarchy collapses.

Acknowledgments Najib acknowledges the support of the Indonesia Endowment Fund for Education (LPDP), and Perelli the support of the ERC project d Syn MA (grant agreement No 772459).

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

References [Alur and La Torre, 2004] Rajeev Alur and Salvatore La Torre. Deterministic Generators and Games for LTL Fragments. ACM Transactions on Computational Logic, 5(1):1 25, 2004. [Alur et al., 2002] Rajeev Alur, Thomas Henzinger, and Orna Kupferman. Alternating-Time Temporal Logic. Journal of the ACM, 49(5):672 713, 2002. [Bloem et al., 2012] Roderick Bloem, Barbara Jobstmann, Nir Piterman, Amir Pnueli, and Yaniv Sa ar. Synthesis of reactive(1) designs. Journal of Computer and System Sciences, 78(3):911 938, 2012. [Boker et al., 2014] Udi Boker, Krishnendu Chatterjee, Thomas Henzinger, and Orna Kupferman. Temporal Speciﬁcations with Accumulative Values. ACM Transactions on Computational Logic, 15(4):27:1 27:25, 2014. [Calude et al., 2017] Cristian S. Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasipolynomial time. In STOC, pages 252 263. ACM, 2017. [Clarke et al., 2018] Edmund M. Clarke, Orna Grumberg, Daniel Kroening, Doron Peled, and Helmut Veith. Model Checking (2nd edition). MIT Press, 2018. [Emerson, 1990] E. Allen Emerson. Temporal and modal logic. In Handbook of Theoretical Computer Science Volume B: Formal Models and Semantics, pages 996 1072. Elsevier, 1990. [Filiot et al., 2018] Emmanuel Filiot, Raffaella Gentilini, and Jean-Franc ois Raskin. Rational Synthesis Under Imperfect Information. In LICS, pages 422 431. ACM, 2018. [Fisman et al., 2010] Dana Fisman, Orna Kupferman, and Yoad Lustig. Rational Synthesis. In TACAS, volume 6015 of LNCS, pages 190 204. Springer, 2010. [Gao et al., 2017] Tong Gao, Julian Gutierrez, and Michael Wooldridge. Iterated Boolean Games for Rational Veriﬁcation. In AAMAS, pages 705 713. ACM, 2017. [Gutierrez et al., 2015a] Julian Gutierrez, Paul Harrenstein, and Michael Wooldridge. Expresiveness and Complexity Results for Strategic Reasoning. In CONCUR, volume 42 of LIPIcs, pages 268 282. Schloss Dagstuhl, 2015. [Gutierrez et al., 2015b] Julian Gutierrez, Paul Harrenstein, and Michael Wooldridge. Iterated Boolean Games. Information and Computation, 242:53 79, 2015. [Gutierrez et al., 2017a] Julian Gutierrez, Paul Harrenstein, and Michael Wooldridge. From Model Checking to Equilibrium Checking: Reactive Modules for Rational Veriﬁcation. Artiﬁcial Intelligence, 248:123 157, 2017. [Gutierrez et al., 2017b] Julian Gutierrez, Paul Harrenstein, and Michael Wooldridge. Reasoning about Equilibria in

Game-like Concurrent Systems. Annals of Pure and Applied Logic, 168(2):373 403, 2017. [Gutierrez et al., 2017c] Julian Gutierrez, Aniello Murano, Giuseppe Perelli, Sasha Rubin, and Michael Wooldridge. Nash Equilibria in Concurrent Games with Lexicographic Preferences. In IJCAI, pages 1067 1073, 2017. [Gutierrez et al., 2018a] Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, and Michael Wooldridge. EVE: A Tool for Temporal Equilibrium Analysis. In ATVA, volume 11138 of LNCS, pages 551 557. Springer, 2018. [Gutierrez et al., 2018b] Julian Gutierrez, Giuseppe Perelli, and Michael Wooldridge. Imperfect Information in Reactive Modules games. Information and Computation, 261(Part):650 675, 2018. [Kupferman et al., 2016] Orna Kupferman, Giuseppe Perelli, and Moshe Vardi. Synthesis with Rational Environments. Annals of Mathematics and Artiﬁcial Intelligence, 78(1):3 20, 2016. [Kupferman, 2015] Orna Kupferman. Automata Theory and Model Checking. Handbook of TCS, 2015. [Osborne and Rubinstein, 1994] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, 1994. [Piterman and Pnueli, 2006] Nir Piterman and Amir Pnueli. Faster Solutions of Rabin and Streett Games. In LICS, pages 275 284, 2006. [Pnueli and Rosner, 1989] Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In POPL, pages 179 190. ACM Press, 1989. [Pnueli, 1977] Amir Pnueli. The Temporal Logic of Programs. In FOCS, pages 46 57. IEEE, 1977. [Rauch Henzinger and Telle, 1996] Monika Rauch Henzinger and Jan Telle. Faster Algorithms for the Nonemptiness of Streett Automata and for Communication Protocol Pruning. In SWAT, pages 16 27, 1996. [Toumi et al., 2015] Alexis Toumi, Julian Gutierrez, and Michael Wooldridge. A Tool for the Automated Veriﬁcation of Nash Equilibria in Concurrent Games. In ICTAC, volume 9399 of LNCS, pages 583 594. Springer, 2015. [Ummels and Wojtczak, 2011] Michael Ummels and Dominik Wojtczak. The Complexity of Nash Equilibria in Limit-Average Games. In CONCUR, pages 482 496, 2011. [Wooldridge et al., 2016] Michael Wooldridge, Julian Gutierrez, Paul Harrenstein, Enrico Marchioni, Giuseppe Perelli, and Alexis Toumi. Rational Veriﬁcation: From Model Checking to Equilibrium Checking. In AAAI, pages 4184 4191. AAAI Press, 2016. [Zwick and Paterson, 1996] Uri Zwick and Mike Paterson. The Complexity of Mean Payoff Games on Graphs. Theoretical Computer Science, 158(1):343 359, 1996.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)