# Probabilistic Strategy Logic

Benjamin Aminof¹,², Marta Kwiatkowska³, Bastien Maubert⁴, Aniello Murano⁴ and Sasha Rubin⁴
¹JKU Linz, Austria  ²TU Wien, Austria  ³University of Oxford, UK  ⁴Università degli Studi di Napoli Federico II, Italy
aminof@forsyte.at, marta.kwiatkowska@cs.ox.ac.uk, bastien.maubert@gmail.com, murano@unina.it, rubin@unina.it

**Abstract.** We introduce Probabilistic Strategy Logic, an extension of Strategy Logic for stochastic systems. The logic has probabilistic terms that allow it to express many standard solution concepts, such as Nash equilibria in randomised strategies, as well as constraints on probabilities, such as independence. We study the model-checking problem for agents with perfect- and imperfect-recall. The former is undecidable, while the latter is decidable in space exponential in the system and triple-exponential in the formula. We identify a natural fragment of the logic, in which every temporal operator is immediately preceded by a probabilistic operator, and show that it is decidable in space exponential in the system and the formula, and double-exponential in the nesting depth of the probabilistic terms. Taking a fixed nesting depth, this gives a fragment that still captures many standard solution concepts, and is decidable in exponential space.

## 1 Introduction

There are a number of logical formalisms for expressing properties of strategic behaviour in multi-agent systems, including Alternating-time Temporal Logics (ATL/ATL*) [Alur et al., 2002], variants such as ATL with strategy contexts [Laroussinie and Markey, 2015], Strategy Logic (SL) [Chatterjee et al., 2010; Mogavero et al., 2014], and extensions of SL, e.g., allowing imperfect information [Belardinelli et al., 2017; Čermák et al., 2018]. The general trend of these logics is to capture increasingly sophisticated game-theoretic concepts, such as winning strategies, Nash equilibria, and secure equilibria, in systems where agents have linear-temporal goals (e.g., safety, achievement, liveness). In order to capture settings that combine probability and multiple agents, such logics have typically been extended by adding an operator that expresses the probability that an agent's goal holds. Existing strategic logics with probabilistic aspects suffer from one of two drawbacks: either, it is widely believed, they cannot express classic solution concepts such as Nash equilibria, or they can express some classic solution concepts but only via dedicated operators for each solution concept, and thus are not general-purpose logics. In this work we define a logic that does not suffer these drawbacks. In particular, we define Probabilistic Strategy Logic (PSL) and study its model-checking problem. The syntax of PSL is based on SL, but has additional arithmetic terms that allow one to compare the probabilities of formulas holding. The models of PSL are multi-agent stochastic transition systems (in contrast, the models of SL are deterministic). These transition systems capture many fundamental stochastic models in which agents have temporal goals, such as Markov chains and Markov decision processes with linear-temporal goals [Courcoubetis and Yannakakis, 1995]. We study the model-checking problem for two classic types of agents [Fagin et al., 1995]: perfect-recall agents (who use memoryful strategies), and imperfect-recall agents (who use memoryless strategies, also called Markovian strategies or policies).
We observe that the model-checking problem for perfect-recall agents is undecidable. On the other hand, we prove that the model-checking problem for imperfect-recall agents is decidable in EXPSPACE in the system and 3EXPSPACE in the formula. We also define a natural fragment, which we call the vanilla fragment, in which every temporal operator is immediately preceded by a probabilistic operator, and show that its model-checking problem is decidable in EXPSPACE in the system and the formula, and 2EXPSPACE in the nesting depth of the probabilistic terms (we show that this depth is equal to 1 for many natural formulas). To prove these results we reduce to the first-order theory of real arithmetic, known to be decidable in EXPSPACE [Ben-Or et al., 1986]. Interestingly, the precise complexity of real arithmetic is a longstanding open problem [Berman, 1980]. Thus, any improvement in algorithms for real arithmetic also applies to model-checking PSL. On the other hand, we also prove that real arithmetic is polynomial-time reducible to model-checking vanilla PSL with no nesting of probabilistic terms and a single agent using memoryless strategies. Thus, PSL and real arithmetic are intimately connected logics.

## 2 Probabilistic Strategy Logic (PSL)

In this section we define Probabilistic Strategy Logic (PSL). The syntax is inspired by Strategy Logic (SL) [Mogavero et al., 2014] and the semantics are finite-state multi-agent stochastic transition systems.

### 2.1 PSL Syntax

Fix finite non-empty sets of atoms $AP$, agents $Ag$, and strategy variables $Var$. The syntax of PSL is defined by the following grammar:

$$\varphi ::= p \mid \neg\varphi \mid \varphi \vee \varphi \mid \exists x.\varphi \mid \tau \le \tau$$
$$\tau ::= c \mid \tau^{-1} \mid \tau \cdot \tau \mid \tau + \tau \mid \tau - \tau \mid \mathbb{P}_\beta(\psi)$$
$$\psi ::= \varphi \mid \neg\psi \mid \psi \vee \psi \mid X\,\psi \mid \psi\, U\, \psi$$

where $p \in AP$, $x \in Var$, $c \in \mathbb{Q}$ and $\beta : Ag \to Var$. Functions $\beta : Ag \to Var$ are called *bindings*; intuitively, they tell an agent which strategy to use. Formulas $\varphi$ are called *history formulas* (they will be interpreted over histories). Formulas $\tau$ are called *arithmetic terms*, and the ones of the form $\mathbb{P}_\beta(\psi)$ are called *probabilistic terms*. The formulas $\psi$ are called *path formulas* (they will be interpreted over paths). Every history formula is also a path formula. Note that every linear-temporal logic (LTL) formula [Pnueli, 1977] is a path formula (i.e., one that does not mention quantification or arithmetic terms). We introduce the usual shorthands for Boolean and arithmetic operations, e.g., $\varphi \wedge \varphi'$ is shorthand for $\neg(\neg\varphi \vee \neg\varphi')$, and $\mathbb{P}_\beta(\psi) = c$ is shorthand for $(\mathbb{P}_\beta(\psi) \le c) \wedge (c \le \mathbb{P}_\beta(\psi))$. Write $[n]$ for the set $\{1, 2, \ldots, n\}$, and assume $Ag = [n]$ for some $n \in \mathbb{N}$.
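Purely as an illustration of the three-level grammar (history formulas, arithmetic terms, path formulas), the following sketch shows one way the syntax could be represented as an abstract syntax tree in Python; all class and field names are our own and are not part of the logic's definition.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Union

Binding = Dict[str, str]     # beta : Ag -> Var, agent name -> strategy variable

@dataclass
class Atom:                  # p, with p in AP
    p: str

@dataclass
class Not:                   # negation (history or path level)
    sub: "Formula"

@dataclass
class Or:                    # disjunction (history or path level)
    left: "Formula"
    right: "Formula"

@dataclass
class Exists:                # ∃x.φ — quantification over strategies
    var: str
    sub: "Formula"

@dataclass
class Leq:                   # τ ≤ τ'
    left: "Term"
    right: "Term"

@dataclass
class Const:                 # rational constant c
    c: Fraction

@dataclass
class Inverse:               # τ⁻¹
    sub: "Term"

@dataclass
class BinOp:                 # τ·τ', τ+τ', τ−τ'
    op: str                  # one of '*', '+', '-'
    left: "Term"
    right: "Term"

@dataclass
class Prob:                  # P_β(ψ)
    binding: Binding
    psi: "Formula"

@dataclass
class Next:                  # X ψ (path level only)
    sub: "Formula"

@dataclass
class Until:                 # ψ1 U ψ2 (path level only)
    left: "Formula"
    right: "Formula"

Formula = Union[Atom, Not, Or, Exists, Leq, Next, Until]
Term = Union[Const, Inverse, BinOp, Prob]

# Example: the sentence ∃x. 1 ≤ P_β(F win), with F win written as (win ∨ ¬win) U win.
beta = {"1": "x"}
f_win = Until(Or(Atom("win"), Not(Atom("win"))), Atom("win"))
sentence = Exists("x", Leq(Const(Fraction(1)), Prob(beta, f_win)))
```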
**Semantics (Intuition).** Formulas of PSL are interpreted over stochastic transition systems whose transitions are determined by the simultaneous actions of agents. A strategy is a rule that tells an agent, in every situation, what action to take with what probability. In general, this decision is based on the history of the evolution of the system (such strategies are known as randomised history-dependent strategies in the stochastic-games literature, and are similar to behavioural strategies in extensive-form games in the game-theory literature). In PSL, there are first-order variables $x$ that range over agent strategies; they are quantified in state formulas of the form $\exists x.\varphi$. The temporal operators are those of LTL, i.e., the path formula $X\,\psi$ is read "$\psi$ holds in the next step", and $\psi_1\, U\, \psi_2$ is read "$\psi_1$ holds until $\psi_2$ holds". The meaning of the term $\mathbb{P}_\beta(\psi)$ is this: if agent $i$ uses the strategy $\beta(i)$ (for each $i \in Ag$), then $\mathbb{P}_\beta(\psi)$ is the probability that a random path in the system satisfies $\psi$. Note that arithmetic terms are polynomials over the constants $c$ and over the probabilistic terms $\mathbb{P}_\beta(\psi)$; in particular, there are no arithmetic variables in our logic.

### 2.2 PSL Examples

PSL can express central game-theoretic notions about probabilistic agents, e.g., domination (strict, weak, very weak), ε-best-response, ε-Nash equilibria, Pareto optimality, social-welfare-maximising equilibria, k-resilience and t-immunity, cf. [Halpern, 2008]. Here, agents' goals can be given in LTL. This declarative specification of goals (in contrast to the classic case of having agents maximise the expected value of a reward function [Russell and Norvig, 1995]) is in line with the literature on stochastic games with ω-regular goals [Chatterjee and Henzinger, 2012]. Here are some examples. Suppose there are $n$ agents, and let $\psi_1, \ldots, \psi_k$ be LTL formulas representing agent goals (in general, these can be arbitrary path formulas of PSL). Let player $i$ receive reward $c_{i,j} \in \mathbb{Q}$ if $\psi_j$ holds on the outcome of the game. Thus, in an outcome $\pi$ of the game, player $i$'s utility is $\sum_{j : \pi \models \psi_j} c_{i,j}$. A player's payoff is defined to be its expected utility.

**Notation.** Let $\vec{x} = (x_1, \ldots, x_n)$ be a tuple of $n$ strategy variables. To express that agent $i$ uses strategy $x_i$ we use bindings. Define the binding $\beta : Ag \to Var$ by $\beta(i) = x_i$ for $i \in Ag$. We write $\beta[i \mapsto y]$ to denote the binding that agrees with $\beta$ on $j \neq i$, and that maps $i$ to variable $y$.

**Example 1 (Expected payoff).** The arithmetic term $E_{i,\beta}(\vec{x})$, defined as $\sum_{j \in [k]} c_{i,j} \cdot \mathbb{P}_\beta(\psi_j)$, is the expected payoff for agent $i$ given that each agent $l$ plays strategy $x_l$.

We now show how to express that a strategy is a best response. Such strategies are stable in the sense that a player, knowing the other players' strategies, has no incentive to switch.

**Example 2 (Best response).** Define the PSL formula $BR_{i,\beta}(\vec{x})$ to be $\forall y.\ E_{i,\beta[i \mapsto y]} \le E_{i,\beta}$. This formula is read "for every strategy $y$, the expected payoff for agent $i$ if she plays $y$ (and each agent $j \neq i$ plays $x_j$) is no larger than the expected payoff for agent $i$ if she plays $x_i$ (and each agent $j \neq i$ plays $x_j$)". In other words, this expresses that agent $i$ playing $x_i$ is a best response to the other agents playing $x_j$ (for $j \neq i$). Note that replacing $E_{i,\beta}$ by $E_{i,\beta} + \epsilon$ (for non-negative rational $\epsilon$) defines ε-best-response.

If every player is playing a best-response strategy, the strategy profile is called a Nash equilibrium. This is the foundational solution concept in game theory.

**Example 3 (Nash equilibrium).** Define the PSL formula $NE_\beta(\vec{x})$ to be $\bigwedge_{i \in [n]} BR_{i,\beta}(\vec{x})$. This formula is read "every agent $i$ is playing a best response (given that each agent $j$ plays $x_j$)", i.e., that $\vec{x}$ is a Nash equilibrium. Moreover, $NE_\beta(\vec{x}) \wedge \bigwedge_{i \in Ag} l_i \le E_{i,\beta}(\vec{x}) \le r_i$ expresses an equilibrium where the expected payoff for agent $i$ falls into a given interval $[l_i, r_i]$ [Ummels and Wojtczak, 2011].

**Example 4 (Polynomial expressions).** The syntax can express polynomial constraints on probabilities. For instance, the formula $\mathbb{P}_\beta(\psi_1 \wedge \psi_2) = \mathbb{P}_\beta(\psi_1) \cdot \mathbb{P}_\beta(\psi_2)$ expresses that the events $\psi_1$ and $\psi_2$ (under the binding $\beta$) are independent.

**Example 5 (Social-welfare maximisation).** The following expresses that $\vec{x}$ is a Nash equilibrium that maximises social welfare, where $\beta'(i) = x'_i$ for $i \in Ag$:
$$NE_\beta(\vec{x}) \wedge \forall \vec{x}'.\Big[ NE_{\beta'}(\vec{x}') \rightarrow \sum_{i \in [n]} E_{i,\beta'}(\vec{x}') \le \sum_{i \in [n]} E_{i,\beta}(\vec{x}) \Big].$$
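To see how Examples 1–3 fit together computationally, here is a small illustrative Python sketch. It assumes the probabilities $\mathbb{P}_\beta(\psi_j)$ for each candidate strategy profile are already available (in practice they would come from a model checker); the function and variable names are ours, not the paper's.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[str, ...]  # one strategy name per agent

def expected_payoff(i: int, profile: Profile,
                    rewards: Dict[int, List[float]],
                    prob: Callable[[Profile, int], float]) -> float:
    """Example 1: E_i = sum_j c_{i,j} * P_beta(psi_j); prob(profile, j) stands in for P_beta(psi_j)."""
    return sum(c * prob(profile, j) for j, c in enumerate(rewards[i]))

def is_best_response(i: int, profile: Profile, candidates: Sequence[str],
                     rewards: Dict[int, List[float]],
                     prob: Callable[[Profile, int], float], eps: float = 0.0) -> bool:
    """Example 2: no unilateral deviation of agent i gains more than eps."""
    current = expected_payoff(i, profile, rewards, prob)
    for y in candidates:
        deviation = profile[:i] + (y,) + profile[i + 1:]
        if expected_payoff(i, deviation, rewards, prob) > current + eps:
            return False
    return True

def is_nash(profile: Profile, candidates: Sequence[str],
            rewards: Dict[int, List[float]],
            prob: Callable[[Profile, int], float], eps: float = 0.0) -> bool:
    """Example 3: every agent plays an (eps-)best response."""
    return all(is_best_response(i, profile, candidates, rewards, prob, eps)
               for i in range(len(profile)))

# Toy usage: two agents, one goal each; prob() is a stand-in for the model checker.
rewards = {0: [1.0, 0.0], 1: [0.0, 1.0]}
prob = lambda profile, j: 1.0 if profile[j] == "try" else 0.5
print(is_nash(("try", "try"), ["try", "skip"], rewards, prob))  # True
```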
**Example 6 (Rational synthesis).** Recent studies of synthesis for rational agents have proposed asking for the existence of a strategy for the system (modelled as agent 1) that enforces a path formula $\psi$ against all strategies of the rational environment (modelled as agents $2, 3, \ldots, n$) that form a Nash equilibrium [Fisman et al., 2010; Kupferman et al., 2016]. In a stochastic system, one might ask that the path formula $\psi$ should hold with probability one. This is expressed by the PSL formula
$$\exists x_1.\ \forall x_2\,\forall x_3 \ldots \forall x_n.\ \Big[ \bigwedge_{i \in [2,n]} BR_{i,\beta}(\vec{x}) \Big] \rightarrow \mathbb{P}_\beta(\psi) = 1.$$
A variant formula can be written to express that $\psi$ holds with probability one against *some* equilibrium.

### 2.3 PSL Semantics

PSL formulas are interpreted over multi-agent stochastic transition systems. Here are the definitions.

**Distributions.** For a finite non-empty set $X$ let $Dist(X)$ denote the set of distributions over $X$, i.e., functions $d : X \to [0, 1]$ such that $\sum_{x \in X} d(x) = 1$. Write $x \in d$ for $d(x) > 0$. A *point distribution* is one for which $d(x) = 1$ for some element $x \in X$. If $d_i$ is a distribution over $X_i$, then, writing $X = \prod_i X_i$, the *product distribution* is the distribution $d : X \to [0, 1]$ defined by $d(\vec{x}) = \prod_i d_i(x_i)$.

**Systems.** A multi-agent stochastic transition system (or simply *system*) $G$ is a tuple $(Ac, St, Tr, Lab)$ where $Ac$ is a finite non-empty set of actions, $St$ is a finite non-empty set of states, $Tr : St \times Ac^{Ag} \to Dist(St)$ is a transition function, and $Lab : St \to 2^{AP}$ is a labelling function. We say that $G$ is *deterministic* (instead of stochastic) if every $Tr(v, \vec{a})$ is a point distribution. Multi-agent stochastic transition systems can model games in extensive form, repeated one-stage games, Markov chains (no agent), Markov decision processes (one agent), decentralised Markov decision processes (multiple agents), and stochastic games (however, instead of agents being rewarded at states, in our setting agents are rewarded if given path formulas hold). Such systems have been used to model multi-agent path-planning with unreliable actuators, human-in-the-loop UAV mission planning, autonomous urban driving, randomised communication and security protocols, energy management systems, etc. [Kwiatkowska et al., 2012; Chen et al., 2013b; Svoreňová and Kwiatkowska, 2016].

**Markov chains.** A Markov chain $M$ is a tuple $(Q, p)$ where $Q$ is a set of states and $p \in Dist(Q \times Q)$ is a distribution. The values $p(s, t)$ are called the transition probabilities of $M$. A Markov chain $M$ and a state $q \in Q$ induce a canonical probability space on the set of infinite paths starting in $q$ [Kemeny and Snell, 1976].

**Actions and paths.** A *joint action* $\vec{a}$ is an element of $Ac^{Ag}$. A *path* $\pi$ is a non-empty sequence $v_1 v_2 \cdots$ of states such that there exists a sequence $\vec{a}_1 \vec{a}_2 \cdots$ of joint actions such that $v_{i+1} \in Tr(v_i, \vec{a}_i)$ for every $i$. We write $len(\pi)$ for the length of $\pi$, and $\pi_{\le i}$ for the prefix of $\pi$ of length $i$ (for $i \le len(\pi)$). Finite paths are called *histories*, and the set of all histories is denoted $Hist$. Write $last(h)$ for the last state of a history $h$.

**Strategies.** A *strategy* is a function $\sigma : Hist \to Dist(Ac)$. Let $\Sigma$ denote the set of all strategies. A *strategy profile* is a tuple $\rho$ of strategies, one for each agent. We write $\rho_i$ for the strategy of agent $i$. We overload notation and let $\rho$ also denote the function $Hist \to Dist(Ac^{Ag})$ that maps $h$ to the product distribution of the $\rho_i(h)$'s. That is, a strategy profile assigns, for each history $h$, a distribution over the set of joint actions.
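As a concrete (and purely illustrative) reading of this last definition, a memoryless strategy can be stored as a map from states to action distributions, and a profile then induces at each state the product distribution over joint actions; the names below are our own.

```python
from itertools import product
from typing import Dict, Tuple

Dist = Dict[str, float]       # finite distribution over action names
Strategy = Dict[str, Dist]    # memoryless case: state -> Dist(Ac); in general the key would be a history

def joint_action_distribution(profile: Tuple[Strategy, ...], state: str) -> Dict[Tuple[str, ...], float]:
    """The product distribution rho(h) over joint actions Ac^Ag, specialised to a memoryless profile."""
    per_agent = [sigma[state] for sigma in profile]
    joint: Dict[Tuple[str, ...], float] = {}
    for actions in product(*(d.keys() for d in per_agent)):
        p = 1.0
        for a, d in zip(actions, per_agent):
            p *= d[a]
        if p > 0:
            joint[actions] = p
    return joint

# Toy usage: two agents in state "s".
sigma1 = {"s": {"a": 0.5, "b": 0.5}}
sigma2 = {"s": {"c": 1.0}}
print(joint_action_distribution((sigma1, sigma2), "s"))  # {('a', 'c'): 0.5, ('b', 'c'): 0.5}
```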
**Probability space on outcomes.** An *outcome* of a strategy profile $\rho$ and a history $h$ is a (finite or infinite) path $\pi$ that starts with $h$ and is extended by $\rho$, i.e., (i) $\pi_{\le len(h)} = h$, and (ii) for every $k \ge len(h)$ there exists $\vec{a}_k \in \rho(\pi_{\le k})$ such that $\pi_{k+1} \in Tr(\pi_k, \vec{a}_k)$. The set of outcomes of a strategy profile $\rho$ and history $h$ is denoted $out(\rho, h)$. A given transition system $G$, strategy profile $\rho$, and history $h$ induce an infinite-state Markov chain $G_{\rho,h}$ whose states are the histories in $out(\rho, h)$ and whose transition probabilities $p(h', h's')$ are defined as $\sum_{\vec{a}} \rho(h')(\vec{a}) \cdot Tr(last(h'), \vec{a})(s')$. The Markov chain $G_{\rho,h}$ induces a canonical probability space: its sample space can be identified with the set of infinite paths in $out(\rho, h)$, and its measure is denoted $\mu_{\rho,h}$.

**Valuations and bindings.** Recall (from the syntax) that a binding is a function $\beta : Ag \to Var$, i.e., it assigns variables to agents. A *valuation* is a partial function $\nu : Var \to \Sigma$, i.e., it assigns strategies to some variables. If the range of $\beta$ is contained in the domain of $\nu$, then composing them (left to right), we see that $\nu \circ \beta : Ag \to \Sigma$ is a joint strategy.

**Free variables.** A variable $x$ is *free* in a PSL formula if the formula has a subformula of the form $\mathbb{P}_\beta(\psi)$ such that $x \in \beta(Ag)$ but $\mathbb{P}_\beta(\psi)$ is not in the scope of $\exists x$. A history formula with no free variables is called a *sentence*.

**PSL semantics.** Formulas of PSL are interpreted given a transition system $G$, a valuation $\nu : Var \to \Sigma$ whose domain contains the free variables of the formula, and either a history $h$ (for history formulas), or an infinite path $\pi$ and an index $i \in \mathbb{N}$ (for path formulas).¹ The semantics of history formulas is as follows:

- $G, \nu, h \models p$ iff $p \in Lab(last(h))$
- $G, \nu, h \models \neg\varphi$ iff $G, \nu, h \not\models \varphi$
- $G, \nu, h \models \varphi_1 \vee \varphi_2$ iff $G, \nu, h \models \varphi_j$ for some $j$
- $G, \nu, h \models \exists x.\varphi$ iff $\exists \sigma \in \Sigma.\ G, \nu[x \mapsto \sigma], h \models \varphi$
- $G, \nu, h \models \tau_1 \le \tau_2$ iff $val_{\nu,h}(\tau_1) \le val_{\nu,h}(\tau_2)$²

where $val_{\nu,h}(c) = c$, $val_{\nu,h}(\tau^{-1}) = (val_{\nu,h}(\tau))^{-1}$, $val_{\nu,h}(\tau \circ \tau') = val_{\nu,h}(\tau) \circ val_{\nu,h}(\tau')$ for $\circ \in \{\cdot, +, -\}$, and $val_{\nu,h}(\mathbb{P}_\beta(\psi)) = \mu_{\nu\circ\beta,h}(\{\pi : G, \nu, \pi, 1 \models \psi\})$.³

The semantics of the path formulas is defined as follows:

- $G, \nu, \pi, i \models \varphi$ iff $G, \nu, \pi_{\le i} \models \varphi$
- $G, \nu, \pi, i \models \neg\psi$ iff $G, \nu, \pi, i \not\models \psi$
- $G, \nu, \pi, i \models \psi_1 \vee \psi_2$ iff $G, \nu, \pi, i \models \psi_j$ for some $j$
- $G, \nu, \pi, i \models X\,\psi$ iff $G, \nu, \pi, i+1 \models \psi$
- $G, \nu, \pi, i \models \psi_1\,U\,\psi_2$ iff there is $k \ge i$ such that $G, \nu, \pi, k \models \psi_2$ and $G, \nu, \pi, j \models \psi_1$ for all $j \in [i, k)$

The *model-checking problem* is to decide, given a system $G$, a state $s \in St$, and a sentence $\varphi$, whether or not $G, s \models \varphi$.

¹ The standard semantics of SL shifts strategies during the evaluation of temporal operators [Mogavero et al., 2014; Bouyer et al., 2016]. Here instead, inspired by [Berthon et al., 2017], we carry the accumulated history with us and do not shift strategies.
² To avoid the anomaly of division by zero, we let $G, \nu, h \not\models \tau_1 \le \tau_2$ if $\tau_1$ or $\tau_2$ contain a subterm $\tau^{-1}$ for which $val_{\nu,h}(\tau) = 0$.
³ Recall from the definitions that $\mu_{\nu\circ\beta,h}(\cdot)$ is the probability measure on the infinite paths starting with $h$ and consistent with the joint strategy $\nu \circ \beta$.

### 2.4 Discussion of the Semantics

**Restricted classes of strategies.** A strategy $\sigma$ is *deterministic* (or *pure*) if $\sigma(h)$ is a point distribution for every history $h$. A strategy $\sigma$ is *finite-memory* if there is a deterministic finite-state machine $(St, Q, q_0, \delta)$ (over alphabet $St$) and an output function $out : Q \times St \to Dist(Ac)$ such that $\sigma(h) = out(\delta(q_0, h), last(h))$ for all $h \in Hist$. Intuitively, the strategy $\sigma$ can be implemented with $\log |Q|$ bits of memory.
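For illustration only, such a finite-memory strategy can be implemented as a small machine with a memory-update function and an output function, mirroring $\sigma(h) = out(\delta(q_0, h), last(h))$; all names below are our own.

```python
from typing import Callable, Dict, Sequence

Dist = Dict[str, float]  # distribution over actions

class FiniteMemoryStrategy:
    """sigma(h) = out(delta*(q0, h), last(h)), where delta* runs delta over the whole history h."""

    def __init__(self, q0: str,
                 delta: Callable[[str, str], str],   # (memory state, system state) -> memory state
                 out: Callable[[str, str], Dist]):   # (memory state, system state) -> Dist(Ac)
        self.q0, self.delta, self.out = q0, delta, out

    def __call__(self, history: Sequence[str]) -> Dist:
        q = self.q0
        for state in history:
            q = self.delta(q, state)
        return self.out(q, history[-1])

# Toy usage: one bit of memory remembering whether "danger" was ever visited.
delta = lambda q, s: "seen" if (q == "seen" or s == "danger") else "fresh"
out = lambda q, s: {"retreat": 1.0} if q == "seen" else {"explore": 0.7, "wait": 0.3}
sigma = FiniteMemoryStrategy("fresh", delta, out)
print(sigma(["start", "danger", "safe"]))  # {'retreat': 1.0}
```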
In case $|Q| = 1$, we call the strategy *memoryless*. Equivalently, a strategy $\sigma$ is memoryless if $last(h) = last(h')$ implies $\sigma(h) = \sigma(h')$. We can safely write a memoryless strategy as a function $\sigma : St \to Dist(Ac)$ whose domain is $St$ instead of $Hist$, and a strategy profile of memoryless strategies as a function $\rho : St \to Dist(Ac^{Ag})$. In the definition of the PSL semantics, if we let $\Sigma$ be the set of memoryless strategies, we can replace histories $h$ by states $s$ and write, e.g., $G, \nu, s \models \varphi$. Also, in the semantics of $\mathbb{P}_\beta(\psi)$, the induced Markov chain $G_{\rho,s}$, where $\rho = \nu \circ \beta$, can be taken to be finite-state, i.e., the state set is $St$ and the transition probability from $v$ to $w$ is $\sum_{\vec{a} \in Ac^{Ag}} \rho(v)(\vec{a}) \cdot Tr(v, \vec{a})(w)$.

**PSL and non-probabilistic strategic logics.** Restricting the semantics of PSL to deterministic transition systems and deterministic strategies results in (a notational variant of) SL. Indeed, in this case $\mathbb{P}_\beta(\psi) \in \{0, 1\}$, and $\mathbb{P}_\beta(\psi) = 1$ iff $\psi$ holds. This way, PSL subsumes SL, and thus also, e.g., ATL*.

**Expressing Nash equilibria.** Since PSL includes comparisons between probabilities, it can express the standard meaning of Nash equilibrium (NE), i.e., that no deviation can increase an agent's expected payoff, see Example 3. In comparison, ordinary Strategy Logic (SL) [Mogavero et al., 2014] can express NE for deterministic arenas and pure strategies. Stochastic Game Logic (SGL) [Baier et al., 2012] is a logic similar to PSL, except that it can only compare probabilities to constants (see Sec. 4 for a detailed comparison with SGL).

**Vanilla PSL.** We define a syntactic fragment of PSL that we call *vanilla PSL*. Intuitively, the restriction is similar to that put on CTL* to obtain CTL; a similar restriction is defined for a fragment of SL closely related to ATL*, see [Malvone et al., 2018]. The syntax of vanilla PSL is the same as that of PSL except that the path formulas are given by the grammar: $\psi ::= \varphi \mid X\,\varphi \mid \varphi\,U\,\varphi$. The relevance of this fragment can be seen by noting that all the PSL formulas in the examples in Section 2.2 are also vanilla PSL formulas, assuming that the goal of agent $i$ (for every $i \in Ag$) is a vanilla PSL path formula $\psi_i$. For instance, safety and reachability goals can be expressed in vanilla PSL. We will see that our algorithm for model-checking vanilla PSL (assuming all strategies are memoryless) has a substantially lower complexity than the one for full PSL.

## 3 Model Checking PSL

PSL with unrestricted strategies is too expressive to be decidable. In fact, this is already true when restricted to a single agent, deterministic finite-memory strategies, and formulas of the form $\exists x.\varphi$ where $\varphi$ is a formula of the grammar $p \mid \neg\varphi \mid \varphi \vee \varphi \mid \mathbb{P}_\beta(F\,\varphi) \rhd c$ for $\rhd \in \{>, =\}$ and $c \in \mathbb{Q} \cap [0, 1]$ (with $\beta(1) = x$); see [Brázdil et al., 2006]. For memoryless strategies, the news is better:

**Theorem 1.**
1. Model-checking PSL restricted to memoryless strategies is decidable in space exponential in the system and triple-exponential in the formula.
2. Model-checking vanilla PSL restricted to memoryless strategies is decidable in space exponential in the system and the formula, and double-exponential in the nesting depth of probabilistic terms of the formula.

In particular, for vanilla PSL with a fixed nesting depth of probabilistic terms, this gives an EXPSPACE algorithm. Note that in all the examples of Section 2.2 this depth is 1. To prove Theorem 1, we reduce to model-checking real arithmetic. *Real arithmetic* refers to the first-order logic of the structure $\mathfrak{R} := (\mathbb{R}, +, \cdot, \le, 0, 1)$.
This logic is very expressive: e.g., it is not hard to see that the solutions of systems of polynomial inequalities with rational coefficients are definable in this logic, and that one can define the relation $=$, as well as the operations $r - r'$ and $r^{-1}$; e.g., $r^{-1}$ can be represented by a new variable $r'$ and the constraint $r \cdot r' = 1$ (note that if $r = 0$ this constraint evaluates to false). A formula $\Phi$ of real arithmetic, with free variables in the set $X$, is interpreted given an evaluation $\chi : X \to \mathbb{R}$, and satisfaction is denoted $\mathfrak{R}, \chi \models \Phi$.

**Theorem 2** ([Ben-Or et al., 1986]). Model-checking the structure $\mathfrak{R}$ of real arithmetic against first-order logic sentences is decidable in space logarithmic in the size of the formula and exponential in the number of quantifiers in the formula.

**Translation of PSL to $\mathfrak{R}$.** Fix a transition system $G$. Introduce real-valued first-order variables $r_{x,s,a}$ for every strategy variable $x \in Var$, state $s \in St$, and action $a \in Ac$. Intuitively, $r_{x,s,a}$ represents the probability that strategy $x$ does action $a$ in state $s$. An evaluation $\chi$ that maps $r_{x,s,a}$ to $\nu(x)(s)(a)$ (for every $x, s, a$) is said to be *compatible* with $\nu$. The algorithm inductively translates a PSL history formula $\varphi$ with free variables $\vec{x}$, and a state $t$, into a real-arithmetic formula $\Phi_{\varphi,t}$ of $\mathfrak{R}$ with free variables $r_{x,s,a}$ for $x \in \vec{x}$, $s \in St$ and $a \in Ac$, such that for every valuation $\nu$ and every evaluation $\chi$ compatible with it, we have that

$$G, \nu, t \models \varphi \quad \text{iff} \quad \mathfrak{R}, \chi \models \Phi_{\varphi,t}. \qquad (1)$$

If $\varphi$ is a sentence then so is $\Phi_{\varphi,t}$. Thus, to model check $G, t \models \varphi$, simply model check $\mathfrak{R} \models \Phi_{\varphi,t}$.

**Translating the Boolean cases.** The Boolean cases are as follows: $\Phi_{p,t} := \mathit{true}$ if $p \in Lab(t)$, and $\mathit{false}$ otherwise; $\Phi_{\varphi_1 \vee \varphi_2,t} := \Phi_{\varphi_1,t} \vee \Phi_{\varphi_2,t}$; and $\Phi_{\neg\varphi,t} := \neg\Phi_{\varphi,t}$.

**Translating the existential case.** The translation of $\exists x.\varphi$ is done by quantifying the corresponding real variables $r_{x,s,a}$, and expressing that they code a distribution. Formally:
$$\Phi_{\exists x.\varphi,t} := (\exists r_{x,s,a})_{s \in St, a \in Ac}\,[\mathit{Dist}_x \wedge \Phi_{\varphi,t}]$$
where $\mathit{Dist}_x$ is the conjunction of $\bigwedge_{s \in St, a \in Ac} r_{x,s,a} \ge 0$ and $\bigwedge_{s \in St} \sum_{a \in Ac} r_{x,s,a} = 1$.

**Translating inequalities.** For every term $\tau$ introduce a variable $r_\tau$, and let $ST(\tau)$ be all the arithmetic subterms of $\tau$ (including $\tau$ itself), i.e., the subterms treating $\mathbb{P}_\beta(\psi)$ as atomic. The translation $\Phi_{\tau_1 \le \tau_2,t}$ of $\tau_1 \le \tau_2$ is
$$(\exists r_\tau)_{\tau \in ST(\tau_1) \cup ST(\tau_2)}\,\big[\mathit{EQN}_t(\tau_1) \wedge \mathit{EQN}_t(\tau_2) \wedge r_{\tau_1} \le r_{\tau_2}\big]$$
where the construction of $\mathit{EQN}_t(\tau_i)$ and the meaning of the variables $r_\tau$ are given below. In particular, the formula $\mathit{EQN}_t(\tau_i)$ has free variables $r_{x,s,a}$ (for all $x, s, a$) and $r_\tau$ (for $\tau \in ST(\tau_i)$), and satisfies the following property for every valuation $\nu$ and every evaluation $\chi$ compatible with it:

$$\mathfrak{R}, \chi \models \mathit{EQN}_t(\tau_i) \quad \text{iff} \quad \forall \tau \in ST(\tau_i),\ \chi(r_\tau) = val_{\nu,t}(\tau). \qquad (2)$$

Observe that (2) implies that (1) holds in case $\varphi = \tau_1 \le \tau_2$. Indeed, let $\nu$ be a valuation and $\chi$ a compatible evaluation. Suppose $G, \nu, t \models \tau_1 \le \tau_2$. Then, by (2), $\mathit{EQN}_t(\tau_1) \wedge \mathit{EQN}_t(\tau_2) \wedge r_{\tau_1} \le r_{\tau_2}$ holds under $\chi$ extended with the assignments $r_\tau \mapsto val_{\nu,t}(\tau)$, and thus $\mathfrak{R}, \chi \models \Phi_{\tau_1 \le \tau_2,t}$. Conversely, suppose $\mathfrak{R}, \chi \models \Phi_{\tau_1 \le \tau_2,t}$ and pick the $r_\tau$'s so that $\mathit{EQN}_t(\tau_1) \wedge \mathit{EQN}_t(\tau_2) \wedge r_{\tau_1} \le r_{\tau_2}$ holds under $\chi$. Then, in particular, by (2), $r_{\tau_i} = val_{\nu,t}(\tau_i)$, and thus $r_{\tau_1} \le r_{\tau_2}$ implies $G, \nu, t \models \tau_1 \le \tau_2$.

**Translating terms.** Each subterm $\tau$ of $\varphi$ is associated with a fresh variable $r_\tau$, and the formula $\mathit{EQN}_t(\tau)$ is defined inductively as follows: $\mathit{EQN}_t(c) := (r_c = c)$; $\mathit{EQN}_t(\tau^{-1}) := \mathit{EQN}_t(\tau) \wedge r_{\tau^{-1}} = (r_\tau)^{-1}$; and $\mathit{EQN}_t(\tau \circ \tau') := \mathit{EQN}_t(\tau) \wedge \mathit{EQN}_t(\tau') \wedge r_{\tau \circ \tau'} = r_\tau \circ r_{\tau'}$ where $\circ \in \{\cdot, +, -\}$. It is not hard to see that (2) holds for these cases.
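As a rough illustration of the shape of these translated formulas, the sketch below builds the $\mathit{Dist}_x$ constraint and the quantifier case as Z3 expressions (assuming the `z3-solver` Python package is installed). This is only to visualise the constraints: the paper's complexity results rely on the decision procedure of [Ben-Or et al., 1986], not on an SMT solver, and all helper names are ours.

```python
from z3 import And, Exists, Real, Sum  # pip install z3-solver

def strategy_vars(x, states, actions):
    """One real variable r_{x,s,a} per state and action, as in the translation."""
    return {(s, a): Real(f"r_{x}_{s}_{a}") for s in states for a in actions}

def dist_constraint(rx, states, actions):
    """Dist_x: every r_{x,s,a} is non-negative and, for each state, they sum to 1."""
    nonneg = [rx[(s, a)] >= 0 for s in states for a in actions]
    one_per_state = [Sum([rx[(s, a)] for a in actions]) == 1 for s in states]
    return And(nonneg + one_per_state)

def translate_exists(x, states, actions, inner):
    """Phi_{Ex.phi,t} := (exists r_{x,s,a})_{s,a} [ Dist_x AND Phi_{phi,t} ], with `inner` = Phi_{phi,t}."""
    rx = strategy_vars(x, states, actions)
    return Exists(list(rx.values()), And(dist_constraint(rx, states, actions), inner))

# Toy usage: wrap the quantifier case around a trivial stand-in for the inner translation.
states, actions = ["s0", "s1"], ["a", "b"]
rx = strategy_vars("x", states, actions)
print(translate_exists("x", states, actions, rx[("s0", "a")] >= 0.5))
```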
It remains to show how to construct $\mathit{EQN}_t(\mathbb{P}_\beta(\psi))$. To do this, we use an automata-theoretic approach, as follows. As in the automata construction for CTL* [Kupferman et al., 2000], we view $\psi$ as an LTL formula over the atoms $max(\psi)$, the set of maximal history-subformulas of $\psi$, and we construct a deterministic Rabin word automaton $A_\psi$ (say with state set $Q_\psi$) that accepts exactly the set of words $w \in (2^{max(\psi)})^\omega$ such that $w \models \psi$. Note, in particular, that this step translates the temporal operators.

We recall how to compute the probability that an LTL formula $\psi$ holds in the Markov chain $G_{\nu\circ\beta,t}$ [Vardi, 1985; Courcoubetis and Yannakakis, 1995]. This amounts to solving a system of equations over the product Markov chain $G_{\nu\circ\beta,t} \times A_\psi$, with variables $r_{s,q}$ representing the probability that a path generated from $s$ in the Markov chain is accepted by the automaton $A_\psi$ starting in state $q$. Note, however, that in our case the transition probabilities of the chain $G_{\nu\circ\beta,t}$ are not given explicitly (as numbers), but implicitly as variables $r_{x,s,a}$. It is not hard to see that these equations can be encoded in real arithmetic (cf. [Baier et al., 2012]), i.e., given a state $t$ of $G$, a binding $\beta$ and the automaton $A_\psi$, one can define a formula $\varphi^t_{\beta,A_\psi}$ over the variables $r_{x,s,a}$ and the variable $r_{t,q_\iota}$ (here $q_\iota$ is the initial state of the automaton $A_\psi$), such that for every valuation $\nu$ and every evaluation $\chi$ compatible with it,

$$\mathfrak{R}, \chi \models \varphi^t_{\beta,A_\psi} \quad \text{iff} \quad \chi(r_{t,q_\iota}) = val_{\nu,t}(\mathbb{P}_\beta(\psi)). \qquad (3)$$

We have space for only the briefest outline. The formula $\varphi^t_{\beta,A_\psi}$ has the form
$$(\exists r_{\varphi',s})_{s \in St,\, \varphi' \in max(\psi)}\,\Big(\mathit{Max} \wedge (\exists r_{v,v'})_{v,v' \in St \times Q_\psi}\,(\mathit{Prod} \wedge \mathit{Rabin})\Big)$$
where the fresh variables $r_{\varphi',s}$ are constrained by $\mathit{Max}$ so that $r_{\varphi',s} > 0$ iff $\varphi'$ holds in state $s$; the fresh variables $r_{v,v'}$, for states $v, v'$ of the product Markov chain $G_{\nu\circ\beta,t} \times A_\psi$, are constrained by $\mathit{Prod}$ to be the transition probabilities in the product Markov chain; and the variable $r_{t,q_\iota}$ is constrained by $\mathit{Rabin}$ to be the probability that a random path in the product chain is accepted by the automaton $A_\psi$. Finally, we can define $\mathit{EQN}_t(\mathbb{P}_\beta(\psi))$ to be $\exists r_{t,q_\iota}.\big(\varphi^t_{\beta,A_\psi} \wedge r_{\mathbb{P}_\beta(\psi)} = r_{t,q_\iota}\big)$. Observe that (3) implies that (2) holds in case $\tau_i = \mathbb{P}_\beta(\psi)$.

**Computational complexity.** Not surprisingly, the cost of translating $\mathbb{P}_\beta(\psi)$ dominates the other steps in the translation of a PSL formula. The size of the translation of $\mathbb{P}_\beta(\psi)$ is exponential in $G$ and double-exponential in $\psi$; and the number of quantifiers it uses is polynomial in $G$ and double-exponential in $\psi$. Thus, for a PSL formula $\varphi$, a worst-case blowup occurs if there is a linearly deep nesting of such probabilistic terms in $\varphi$. In this case, the size of $\Phi_{\varphi,s}$ is at most exponential in $G$ and double-exponential in $\varphi$, and the number of quantifiers is polynomial in $G$ and double-exponential in $\varphi$. For vanilla PSL, each $\psi$ has constant size, and so the size of $\Phi_{\varphi,s}$ is polynomial in $\varphi$ and exponential in $G$ and in the nesting depth $d$ of the probabilistic terms; and the number of quantifiers is polynomial in $G$ and $\varphi$, and exponential in $d$. Theorem 1 then follows from Theorem 2.
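For intuition about the equation systems that the formula $\varphi^t_{\beta,A_\psi}$ expresses symbolically, here is the fully numeric special case: computing reachability probabilities in a finite Markov chain by solving a linear system (the paper's construction does the analogous thing over the product with the Rabin automaton, with the transition probabilities left as the variables $r_{x,s,a}$). The sketch uses numpy and our own names.

```python
import numpy as np

def reachability_probabilities(P: np.ndarray, target: set) -> np.ndarray:
    """Probability, from each state of a finite Markov chain with transition matrix P
    (rows sum to 1), of eventually reaching `target` (a set of state indices).
    Solves x_s = sum_t P[s, t] * x_t, with x_s = 1 on the target and x_s = 0 for
    states that cannot reach it."""
    n = P.shape[0]
    # States that can reach the target: graph search along positive-probability edges.
    can_reach = set(target)
    changed = True
    while changed:
        changed = False
        for s in range(n):
            if s not in can_reach and any(P[s, t] > 0 for t in can_reach):
                can_reach.add(s)
                changed = True
    unknown = [s for s in sorted(can_reach) if s not in target]
    x = np.zeros(n)
    x[list(target)] = 1.0
    if unknown:
        A = np.array([[P[s, t] for t in unknown] for s in unknown])
        b = np.array([sum(P[s, t] for t in target) for s in unknown])
        x[unknown] = np.linalg.solve(np.eye(len(unknown)) - A, b)
    return x

# Toy usage: from state 0, go to 1 or 2 with probability 1/2 each; states 1 and 2 are absorbing.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(reachability_probabilities(P, {1}))  # [0.5, 1.0, 0.0]
```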
**Extensions.** PSL and our model-checking procedure (for memoryless strategies) are highly extensible. (1) The algorithm immediately gives better complexity for other fragments of PSL; e.g., the fragment of PSL whose path formulas are $\psi ::= \varphi \mid X\,\varphi \mid \varphi\,U\,\varphi \mid G\,F\,\varphi$ is an extension of vanilla PSL that allows one to express some fairness properties, and it has the same complexity as the vanilla fragment. (2) We can add variables for deterministic strategies, and extend the translation of $\exists y.\varphi$ for such $y$ to $(\exists r_{y,w,a})_{w \in St, a \in Ac}\,[\mathit{Dist}_y \wedge \mathit{Point}_y \wedge \Phi_{\varphi,\beta,s}]$, where $\mathit{Point}_y$ is the formula $\bigwedge_w \bigvee_a r_{y,w,a} = 1$. This allows one to express, e.g., the notion of pure-strategy ε-equilibrium. (3) We can treat agents with partial observation, i.e., agent $i$ has an indistinguishability relation $\sim_i$ over states, and the $\exists y.\varphi$ case is given by $(\exists r_{y,w,a})_{w \in St, a \in Ac}\,[\mathit{Dist}_y \wedge \mathit{Unif}_y \wedge \Phi_{\varphi,\beta,s}]$, where $\mathit{Unif}_y$ is $\bigwedge_w \bigwedge_{v : v \sim_i w} \bigwedge_a (r_{y,w,a} = r_{y,v,a})$. For comparison, strategy logic under imperfect information is studied in [Berthon et al., 2017; Belardinelli et al., 2017], and probabilistic extensions/variants of ATL are studied in [Huang and Luo, 2013; Schnoor, 2013].

**A lower bound.** We have shown that one can model check, in EXPSPACE, the fragment of vanilla PSL in which the nesting depth $d$ of probabilistic terms is bounded. As discussed in the introduction, we now show that there is a polynomial reduction from real arithmetic to this fragment.

**Theorem 3.** There is a polynomial reduction from model-checking $\mathfrak{R}$ to model-checking vanilla PSL restricted to a single agent, memoryless strategies, and a bounded nesting of probabilistic terms.

*Proof.* We show how to translate a given real-arithmetic sentence $\varphi$, in linear time, into a deterministic system $G$, a state $s_0$ of $G$, and a vanilla PSL formula $\widehat{\varphi}$ with no nesting of probabilistic terms, such that $\mathfrak{R} \models \varphi$ iff $G, s_0 \models \widehat{\varphi}$. Assume, w.l.o.g., that variable names in $\varphi$ are not reused, and let $X$ be the variables occurring in $\varphi$. Let $Var := X$ and $AP := \{x_1, x_2 : x \in X\}$. Define the deterministic system $G = (Ac, St, Tr, Lab)$, in which $|Ag| = 1$, where $Ac := \{\bot\} \cup AP$; $St := \{s_0\} \cup AP$; $Tr$ is given by $s_0 \xrightarrow{x_i} x_i$, $s_0 \xrightarrow{\bot} s_0$, and $x_i \xrightarrow{a} x_i$, for every $a, x, i$; and $Lab(s_0) = \emptyset$ and $Lab(x_i) = \{x_i\}$ for every $x, i$. Let $\beta_x$ be the binding assigning the strategy variable $x$ to the single agent. Construct $\widehat{\varphi}$ from $\varphi$ by: replacing every occurrence of a term $x$ by the term $enc(x) := (\mathbb{P}_{\beta_x}(X\,x_1))^{-1} - (\mathbb{P}_{\beta_x}(X\,x_2))^{-1}$; and then replacing every inequality $\varphi' := \tau_1 \le \tau_2$ with the formula $\varphi' \wedge \bigwedge_{x \in free(\varphi')} \bigwedge_{i=1,2} \mathbb{P}_{\beta_x}(X\,x_i) \neq 0$. For every subformula $\varphi'$ of $\varphi$, the corresponding translated subformula $\widehat{\varphi'}$ has the following property: for every evaluation $\chi$ there exists a valuation $\nu$ such that $\mathfrak{R}, \chi \models \varphi'$ iff $G, \nu, s_0 \models \widehat{\varphi'}$. The proof is a straightforward induction that uses the fact that every real number $r$ can be written as $(r_1)^{-1} - (r_2)^{-1}$ for $r_1, r_2 \in (0, \tfrac{1}{2})$ (e.g., take $r_1^{-1} = |r| + 3$ and $r_2^{-1} = |r| + 3 - r \ge 3$). Then, a strategy that, for example, assigns the probability $r_i$ to the action $x_i$ (for $i = 1, 2$) and the probability $1 - r_1 - r_2$ to the action $\bot$ has the property that $enc(x) = r$.

## 4 Related Work

**Probabilistic strategic logics.** Early probabilistic extensions of ATL are PATL/PATL* [Chen and Lu, 2007]. They use operators of the form $\langle\langle A \rangle\rangle\,\Pr(\psi) \rhd c$, read "the agents in $A$ have strategies such that for all strategies of the agents not in $A$, the probability that the path formula $\psi$ holds is $\rhd\, c$", where $\rhd \in \{<, =, >\}$ and $c \in [0, 1]$ is rational. Just as SL subsumes ATL/ATL*, so PSL subsumes PATL/PATL*. To see this, note that $\langle\langle A \rangle\rangle\,\Pr(\psi) \rhd c$ is equivalent to the PSL formula (allowing shorthands) $(\exists x_i)_{i \in A} (\forall x_j)_{j \notin A}\, \mathbb{P}_\beta(\psi) \rhd c$. Another variant of probabilistic ATL, ATL with probabilistic success, is studied in [Bulling and Jamroga, 2009]. The logics rPATL/rPATL* [Chen et al., 2013a; Kwiatkowska et al., 2018] extend PATL/PATL* with operators that can enforce an expected reward $\rhd\, c$. The main differences with PSL are: (i) rPATL* can express cumulative rewards given by the transition system, while PSL can only express rewards for path formulas holding; (ii) unlike PSL, rPATL* is not designed to express solution concepts. Model-checking rPATL* is 2EXPTIME-complete.
**Closely related works.** rPATL+NE [Kwiatkowska et al., 2019] extends rPATL with an equilibrium operator expressing that there exists a subgame-perfect Nash equilibrium between the coalitions $A$ and $Ag \setminus A$ under which the sum of the (probability or reward) objectives for the coalitions is $\rhd\, c$. The main differences with PSL are: (i) rPATL+NE (like rPATL before it) can express more complex rewards than PSL; (ii) PSL can express other complex concepts from game theory. SGL (Stochastic Game Logic) [Baier et al., 2012] is like ATL with strategy contexts, and uses a probabilistic operator $\mathbb{P}_{\rhd c}(D, \varphi_1, \ldots, \varphi_k)$, where $D$ is a Rabin word automaton over the alphabet $2^{[k]}$ and each $\varphi_i$ is an SGL formula; this operator expresses that the probability is $\rhd\, c$ that a path $\pi$, when viewed as the sequence whose $j$-th element consists of the indices $i \in [k]$ such that $\varphi_i$ holds in the suffix $\pi_{\ge j}$, is accepted by $D$. Model-checking SGL is undecidable in general; restricted to memoryless strategies it is decidable via a translation to real arithmetic (an analysis of their algorithm yields a 2EXPSPACE upper bound). Our proof of Theorem 1 is based on this translation, and differs from it in two ways: we compile path formulas into automata over truth values of subformulas, and we translate inequalities between arithmetic terms. The main differences with PSL are: (i) there are no terms in SGL that compare probabilities of path formulas, and thus one cannot, in general, express any of the examples of PSL formulas in Section 2.2, including Nash equilibrium (intuitively, the common fragment of SGL and PSL is the fragment of PSL in which the terms $\tau \le \tau'$ are replaced by $\mathbb{P}_\beta(\psi) \rhd c$); (ii) SGL uses automata instead of LTL to represent path formulas; if we had used automata instead of LTL, our model-checking algorithm would work in 2EXPSPACE (instead of 3EXPSPACE), but we chose not to do so because automata are not as readable as formulas. Overall, PSL stands out for two reasons: (i) it is not subsumed by any of the logics mentioned (and it subsumes aspects of each of them), and (ii) its syntax is, we believe, simpler and more natural than those of the logics mentioned.

## 5 Conclusion

Strategy Logic (SL) can elegantly express important solution concepts in game theory, e.g., Nash equilibria, dominant strategies, and subgame-perfect equilibria [Mogavero et al., 2014]. Although extended in a number of useful ways, e.g., with imperfect information [Berthon et al., 2017; Belardinelli et al., 2017] and counting strategies [Aminof et al., 2018; Malvone et al., 2018], until now it has only been investigated in the restricted setting of pure strategies and deterministic systems. This is unfortunate, as game theory and MAS often involve randomised strategies and stochastic systems. We fill this gap by introducing PSL, a probabilistic extension of SL. The importance of PSL is further highlighted by a combination of three facts, as we have shown: it has a natural syntax which mimics first-order logic; it is expressive; and it is highly extensible. Thus, PSL takes an important step towards the ideal of a natural, expressive, and decidable logic for strategic reasoning in multi-agent systems. There are a number of non-trivial open problems regarding the exact complexity of model-checking PSL and its fragments.
Also, the decidability of model-checking the qualitative fragment of PSL (with no restriction on the strategies) is open; the qualitative fragment restricts terms $\tau$ to be of the form $0 \mid 1 \mid \mathbb{P}_\beta(\psi)$, and thus one can express, e.g., $\mathbb{P}_\beta(\psi) = 1$, but not $\mathbb{P}_\beta(\psi) = 0.5$.

## Acknowledgements

Benjamin Aminof acknowledges the Austrian Science Fund (FWF) grant P 32021, and Marta Kwiatkowska acknowledges partial support from the EPSRC Mobile Autonomy Programme Grant EP/M019918/1.

## References

[Alur et al., 2002] R. Alur, T.A. Henzinger, and O. Kupferman. Alternating-time temporal logic. J. ACM, 49(5):672–713, 2002.
[Aminof et al., 2018] B. Aminof, V. Malvone, A. Murano, and S. Rubin. Graded modalities in strategy logic. Inf. Comput., 261:634–649, 2018.
[Baier et al., 2012] C. Baier, T. Brázdil, M. Größer, and A. Kučera. Stochastic game logic. Acta Inf., 49(4):203–224, 2012.
[Belardinelli et al., 2017] F. Belardinelli, A. Lomuscio, A. Murano, and S. Rubin. Verification of broadcasting multi-agent systems against an epistemic strategy logic. In IJCAI, 2017.
[Ben-Or et al., 1986] M. Ben-Or, D. Kozen, and J. H. Reif. The complexity of elementary algebra and geometry. JCSS, 32(2):251–264, 1986.
[Berman, 1980] L. Berman. The complexity of logical theories. Theor. Comput. Sci., 11(1):71–77, 1980.
[Berthon et al., 2017] R. Berthon, B. Maubert, A. Murano, S. Rubin, and M.Y. Vardi. Strategy logic with imperfect information. In LICS, 2017.
[Bouyer et al., 2016] P. Bouyer, P. Gardy, and N. Markey. On the semantics of strategy logic. IPL, 116(2):75–79, 2016.
[Brázdil et al., 2006] T. Brázdil, V. Brozek, V. Forejt, and A. Kučera. Stochastic games with branching-time winning objectives. In LICS, 2006.
[Bulling and Jamroga, 2009] N. Bulling and W. Jamroga. What agents can probably enforce. Fundam. Inform., 93(1-3):81–96, 2009.
[Čermák et al., 2018] P. Čermák, A. Lomuscio, F. Mogavero, and A. Murano. Practical verification of multi-agent systems against SLK specifications. Inf. Comput., 261:588–614, 2018.
[Chatterjee and Henzinger, 2012] K. Chatterjee and T.A. Henzinger. A survey of stochastic ω-regular games. JCSS, 78(2):394–413, 2012.
[Chatterjee et al., 2010] K. Chatterjee, T.A. Henzinger, and N. Piterman. Strategy Logic. Inf. Comput., 208(6):677–693, 2010.
[Chen and Lu, 2007] T. Chen and J. Lu. Probabilistic alternating-time temporal logic and model checking algorithm. In FSKD, 2007.
[Chen et al., 2013a] T. Chen, V. Forejt, M. Kwiatkowska, D. Parker, and A. Simaitis. Automatic verification of competitive stochastic systems. FMSD, 43(1):61–92, 2013.
[Chen et al., 2013b] T. Chen, V. Forejt, M. Kwiatkowska, D. Parker, and A. Simaitis. PRISM-games: A model checker for stochastic multi-player games. In TACAS, 2013.
[Courcoubetis and Yannakakis, 1995] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verification. J. ACM, 42(4):857–907, 1995.
[Fagin et al., 1995] R. Fagin, J. Y. Halpern, Y. Moses, and M.Y. Vardi. Reasoning about Knowledge. MIT Press, 1995.
[Fisman et al., 2010] D. Fisman, O. Kupferman, and Y. Lustig. Rational synthesis. In TACAS, 2010.
[Halpern, 2008] J. Y. Halpern. Beyond Nash equilibrium: Solution concepts for the 21st century. In KR, 2008.
[Huang and Luo, 2013] X. Huang and C. Luo. A logic of probabilistic knowledge and strategy. In AAMAS, pages 845–852, 2013.
[Kemeny and Snell, 1976] J. G. Kemeny and J. L. Snell. Finite Markov Chains. Springer, 1976.
[Kupferman et al., 2000] O. Kupferman, M.Y. Vardi, and P. Wolper. An automata-theoretic approach to branching-time model checking. J. ACM, 47(2):312–360, 2000.
[Kupferman et al., 2016] O. Kupferman, G. Perelli, and M.Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016.
[Kwiatkowska et al., 2012] M. Kwiatkowska, G. Norman, and D. Parker. The PRISM benchmark suite. In QEST, 2012.
[Kwiatkowska et al., 2018] M. Kwiatkowska, G. Norman, D. Parker, and G. Santos. Automated verification of concurrent stochastic games. In QEST, 2018.
[Kwiatkowska et al., 2019] M. Kwiatkowska, G. Norman, D. Parker, and G. Santos. Equilibria-based probabilistic model checking for concurrent stochastic games. CoRR, abs/1811.07145, 2019.
[Laroussinie and Markey, 2015] F. Laroussinie and N. Markey. Augmenting ATL with strategy contexts. Inf. Comput., 245:98–123, 2015.
[Malvone et al., 2018] V. Malvone, F. Mogavero, A. Murano, and L. Sorrentino. Reasoning about graded strategy quantifiers. Inf. Comput., 259(3):390–411, 2018.
[Mogavero et al., 2014] F. Mogavero, A. Murano, G. Perelli, and M.Y. Vardi. Reasoning about strategies: On the model-checking problem. ACM Trans. Comput. Log., 15(4):34:1–34:47, 2014.
[Pnueli, 1977] A. Pnueli. The temporal logic of programs. In FOCS, pages 46–57, 1977.
[Russell and Norvig, 1995] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[Schnoor, 2013] H. Schnoor. Epistemic and probabilistic ATL with quantification and explicit strategies. In ICAART, volume 449, pages 131–148, 2013.
[Svoreňová and Kwiatkowska, 2016] M. Svoreňová and M. Kwiatkowska. Quantitative verification and strategy synthesis for stochastic games. EJC, 30:15–30, 2016.
[Ummels and Wojtczak, 2011] M. Ummels and D. Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. LMCS, 7(3), 2011.
[Vardi, 1985] M.Y. Vardi. Automatic verification of probabilistic concurrent finite-state programs. In FOCS, 1985.