# Natural Strategic Ability in Stochastic Multi-Agent Systems

Raphaël Berthon¹, Joost-Pieter Katoen¹, Munyque Mittelmann², Aniello Murano²
¹ RWTH Aachen University, Germany
² University of Naples Federico II, Italy
{berthon,katoen}@cs.rwth-aachen.de, {munyque.mittelmann, aniello.murano}@unina.it

Abstract

Strategies synthesized using formal methods can be complex and often require infinite memory, which does not correspond to the expected behavior when trying to model Multi-Agent Systems (MAS). To capture such behaviors, natural strategies are a recently proposed framework striking a balance between the ability of agents to strategize with memory and the model-checking complexity, but until now it has been restricted to fully deterministic settings. For the first time, we consider the probabilistic temporal logics PATL and PATL* under natural strategies (NatPATL and NatPATL*, resp.). As main result we show that, in stochastic MAS, NatPATL model checking is NP-complete when the active coalition is restricted to deterministic strategies. We also give a 2NEXPTIME complexity result for NatPATL* with the same restriction. In the unrestricted case, we give an EXPSPACE complexity for NatPATL and a 3EXPSPACE complexity for NatPATL*.

Introduction

In the last decade, much attention has been devoted to the verification of Multi-Agent Systems (MAS). One of the most important early developments was the Alternating-time Temporal Logics ATL and ATL* (Alur, Henzinger, and Kupferman 2002). Since its initial proposal, ATL has been extended in various directions, considering, for instance, strategy contexts (Laroussinie and Markey 2015) or adding imperfect information and epistemic operators (Jamroga and Bulling 2011). Strategy Logic (SL) (Chatterjee, Henzinger, and Piterman 2010; Mogavero et al. 2014) extends ATL* to treat strategies as first-order variables. The probabilistic logics PATL and PATL* (Chen and Lu 2007), Stochastic Game Logic (Baier et al. 2012), and PSL (Aminof et al. 2019) enhance ATL, ATL*, ATL* with strategy contexts, and SL, resp., to the probabilistic setting. Those logics allow us to express that a coalition can enforce that the probability of satisfying their goal meets a specified constraint. The importance of the aforementioned logics lies in the uncertainty often faced by MAS, due to the occurrence of randomization, such as natural events and the behavior of their components (i.e., the agents). While those aspects cannot be known with certainty, they can be measured based on experiments or past observations. Examples include, among others, the affluence of users interacting with the system, unknown preferences of its agents modeled with probability distributions, and errors of its sensorial components.

All the aforementioned logics also have downsides, either complexity-wise or memory-wise. PSL model checking is undecidable, and is still in 3EXPSPACE when restricted to memoryless strategies. PATL model checking is in NP ∩ co-NP but requires infinite-memory strategies. Model checking Stochastic Game Logic is in PSPACE with memoryless deterministic strategies, and in EXPSPACE with memoryless probabilistic strategies. These last two results are of interest, but the memoryless assumption is quite restrictive. Natural strategies, first defined in (Jamroga, Malvone, and Murano 2019a), are lists of condition-action pairs with a bounded memory representation.
This definition contrasts with combinatorial strategies (i.e., functions from histories to actions), typically considered in the semantics of logics for MAS, including ATL and ATL*. The motivation for natural strategies, as argued in (Jamroga, Malvone, and Murano 2019a), is that combinatorial strategies are not realistic in the context of human behavior, because of the difficulty of executing and designing complex plans. In particular, systems that are difficult to use are often ignored by the users, even if they respect design specifications such as security constraints. Artificial agents with limited memory or computational power cannot use combinatorial strategies either. On the other end of the spectrum, memoryless strategies that depend only on the current state cannot provide adequate solutions to many planning problems. Natural strategies encompass both bounded memory and specifications of agents with simple strategies, by allowing agents to use some past observations without requiring infinite memory. They aim at capturing the intuitive approach a human would use when describing strategies. As a result, these strategies are easier to explain using natural language. They also intrinsically feature imperfect information, since they reason about the sequence of propositional variables observed in previous states, instead of the states themselves. Although the systems with which these agents interact may be stochastic, the study of natural strategies has until now been restricted to fully deterministic settings. For the first time, we consider PATL and PATL* under natural strategies and investigate their model checking problem for stochastic MAS. Remarkably, the logics we consider can also be seen as an extension of POMDPs to a setting with multiple agents with bounded memory strategies (Chatterjee, Chmelik, and Davies 2016).

Contribution. In this paper, we propose variants of the probabilistic logics PATL and PATL* with natural strategies (denoted NatPATL and NatPATL*, resp.) and study their model checking complexity. We present complexity results for deterministic and, for the first time, probabilistic natural strategies. With respect to the agents' memory, we investigate both the memoryless and bounded-recall settings¹. Table 1 summarizes the results of this paper.

              Det. Strategies   Prob. Strategies
NatPATL_r     NP-complete       EXPSPACE
NatPATL*_r    2NEXPTIME         3EXPSPACE
NatPATL_R     NP-complete       EXPSPACE
NatPATL*_R    2NEXPTIME         3EXPSPACE

Table 1: Summary of model checking complexities for NatPATL and NatPATL* with stochastic MAS.

The main advantage of the proposed logics is that they enable expressing and verifying the strategic abilities of stochastic MAS in which agents have limited memory and/or computational power, with a reasonably good model checking complexity. In particular, model checking of NatPATL_R is NP-complete for deterministic natural strategies, and in EXPSPACE for probabilistic natural strategies.

Related Work

Several works consider the verification of stochastic MAS with specifications given in probabilistic logics. In particular, Huang and Luo (2013) study an ATL-like logic for stochastic MAS when agents play deterministic strategies and have probabilistic knowledge. The model checking problem has also been studied for the Probabilistic Alternating-Time µ-Calculus (Song et al. 2019).
Huang, Su, and Zhang (2012) consider the logic Probabilistic ATL* (PATL*) under incomplete information and synchronous perfect recall. PATL was also considered under imperfect information and memoryless strategies (Belardinelli et al. 2023), and with accumulated costs/rewards (Chen et al. 2013). Also in the context of MAS, probabilistic logics were used for the verification of unbounded parameterized systems (Lomuscio and Pirovano 2020), resource-bounded systems (Nguyen and Rakib 2019), and under assumptions over opponents' strategies (Bulling and Jamroga 2009).

Our work is also related to the research on representation of strategies with limited memory. This includes the representation of finite-memory strategies by input/output automata (Vester 2013), decision trees (Brázdil et al. 2015), ATL with bounded memory (Ågotnes and Walther 2009), as well as the use of bounded memory as an approximation of perfect recall (Belardinelli, Lomuscio, and Malvone 2018). More recently, Deuser and Naumov (2020) represented strategies as Mealy machines and studied how bounded recall affects the agents' abilities to execute plans.

¹As usual, we denote no recall with r and recall with R.

Natural strategies have first been studied in (Jamroga, Malvone, and Murano 2019a) on multiple deterministic settings: finding winning strategies in concurrent games with LTL specifications, deciding if a set of strategies defines a Nash equilibrium, and model checking ATL. This last use of natural strategies has later been extended to ATL with imperfect information (Jamroga, Malvone, and Murano 2019b) and SL (Belardinelli et al. 2022).

The study of partially observable MDPs (POMDPs) also considers a variety of strategy representations, as discussed in (Vlassis, Littman, and Barber 2012). When allowing infinite-memory strategies, finding an almost-sure winning strategy with a Büchi or reachability objective requires exponential time on POMDPs, while finding strategies for almost-sure parity objectives (Baier, Bertrand, and Größer 2008; Chatterjee, Doyen, and Henzinger 2010) and for maximizing a reachability objective (Madani, Hanks, and Condon 2003) is undecidable. However, when restricting the memory of the strategies to some fixed bound (Pajarinen and Peltonen 2011; Junges et al. 2018), the complexity of threshold reachability becomes ETR-complete (the existential theory of the reals) with probabilistic strategies and NP-complete with deterministic strategies (Junges 2020). The complexity of almost-sure reachability with bounded-memory probabilistic strategies is also NP-complete (Chatterjee, Chmelik, and Davies 2016).

Preliminaries

In this paper, we fix finite non-empty sets of agents Ag, actions Ac, and atomic propositions AP. We write c for a tuple of actions (c_a)_{a∈Ag}, one for each agent, and such tuples are called action profiles. Given an action profile c and C ⊆ Ag, we let c_C be the components of the agents in C, and c_{-C} is (c_b)_{b∉C}. Similarly, we let Ag_{-C} = Ag \ C.

Distributions. Let X be a finite non-empty set. A (probability) distribution over X is a function d : X → [0, 1] such that ∑_{x∈X} d(x) = 1. Let Dist(X) be the set of distributions over X. We write x ∈ d for d(x) > 0. If d(x) = 1 for some element x ∈ X, then d is a point (a.k.a. Dirac) distribution. If, for i ∈ I, d_i is a distribution over X_i, then, writing X = ∏_{i∈I} X_i, the product distribution of the d_i is the distribution d : X → [0, 1] defined by d(x) = ∏_{i∈I} d_i(x_i).

Markov Chains.
A Markov chain M is a tuple (St, p) where St is a countable non-empty set of states and p ∈ Dist(St × St) is a distribution. For s, t ∈ St, the values p(s, t) are called the transition probabilities of M. A path is an infinite sequence of states.

Concurrent Game Structures. A stochastic concurrent game structure (or simply CGS) G is a tuple (St, L, δ, ℓ) where (i) St is a finite non-empty set of states; (ii) L : St × Ag → 2^Ac \ {∅} is a legality function defining the available actions for each agent in each state; we write L(s) for the tuple (L(s, a))_{a∈Ag}; (iii) for each state s ∈ St and each move c ∈ L(s), the stochastic transition function δ gives the (conditional) probability δ(s, c)(s') of a transition from state s to s', for all s' ∈ St, if each player a ∈ Ag plays the action c_a; remark that δ(s, c) ∈ Dist(St); (iv) ℓ : St → 2^AP is a labelling function. For each state s ∈ St and joint action c ∈ ∏_{a∈Ag} L(s, a), we assume that there is a state s' ∈ St such that δ(s, c)(s') is non-zero, that is, every state has a successor state from every legal move c ∈ L(s).

Example 1 (Secure voting²). Assume a voting system with two types of agents: voters and coercers, represented by the disjoint sets V ⊆ Ag and C ⊆ Ag, resp. We consider finite sets of receipts and signatures. The actions of the voters are scanBallot, enterVote, cnlVote, conf, checkSig_s, checkRec_r, shred_r, and noop, which represent that the agent is scanning the ballot, entering their vote, canceling it, confirming it, checking the signature s, checking the receipt r, shredding the receipt r, and doing nothing, resp. On its turn, the coercer can perform the actions coerce_v, request_v, punish_v, and noop, representing that she is coercing the voter v, requesting v to vote, punishing v, and doing nothing, resp. The CGS has propositions denoting the state of the voting system. Specifically, they describe whether the voter v was coerced (coerced_v), punished (punished_v), requested to vote (requested_v), has a ballot available (hasBallot_v), scanned the ballot (scanned_v), entered the vote which has the signature s (entVote_{v,s}), and has already voted (vot_v). For a signature s, the proposition sigOk_s denotes whether the signature s was checked and corresponds to the one in the system, while the proposition sigFail_s denotes that it was checked but did not correspond. For a receipt r, the propositions rec_{v,r} and shredded_r denote whether r is associated with voter v and whether r was destroyed (and is no longer visible), resp.

Actions performed by the agents may fail and may not change the state of the system as intended by them. For instance, the coercer may not succeed when attempting to coerce a voter with the action coerce_v (and thus coerced_v may not be true in the next state). Similarly, a voter's request to shred her receipt may fail, and the information on the receipt may still be visible. The probability of an action failing is described by the stochastic transition function of the CGS.

Plays. A play or path in a CGS G is an infinite sequence π = s_0 s_1 ⋯ of states such that there exists a sequence c_0 c_1 ⋯ of joint actions such that c_i ∈ L(s_i) and s_{i+1} ∈ δ(s_i, c_i) (i.e., δ(s_i, c_i)(s_{i+1}) > 0) for every i ≥ 0. We write π_i for s_i, and π_{≥i} for the suffix of π starting at position i. Finite paths are called histories, and the set of all histories is denoted Hist. We write last(h) for the last state of a history h and len(h) for the size of h.
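To make the CGS definition above concrete, the following is a minimal Python sketch of a stochastic CGS together with the sampling of a history. All identifiers (CGS, sample_history, the toy one-agent game) are hypothetical illustrations of the definition, not an implementation from the paper.

```python
# A minimal sketch of a stochastic CGS (St, L, delta, l) as defined above.
# All names here are hypothetical and only illustrate the definition.
import random
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Action = str
State = str
JointAction = Tuple[Action, ...]          # one action per agent, in agent order

@dataclass
class CGS:
    agents: Tuple[str, ...]
    states: FrozenSet[State]
    legal: Dict[Tuple[State, str], FrozenSet[Action]]            # L(s, a)
    delta: Dict[Tuple[State, JointAction], Dict[State, float]]   # delta(s, c) in Dist(St)
    label: Dict[State, FrozenSet[str]]                           # l(s), a subset of AP

    def step(self, s: State, c: JointAction) -> State:
        """Sample a successor of s under joint action c according to delta."""
        dist = self.delta[(s, c)]
        succs, probs = zip(*dist.items())
        return random.choices(succs, weights=probs)[0]

def sample_history(g: CGS, s0: State, policy, length: int) -> List[State]:
    """Sample a history of the given length, choosing joint actions with `policy`."""
    h = [s0]
    for _ in range(length):
        s = h[-1]
        c = policy(s)                      # any legal joint action for state s
        assert all(c[i] in g.legal[(s, a)] for i, a in enumerate(g.agents))
        h.append(g.step(s, c))
    return h

# A toy one-agent game: in state "s", action "go" reaches "t" with probability 0.7.
g = CGS(
    agents=("a",),
    states=frozenset({"s", "t"}),
    legal={("s", "a"): frozenset({"go", "wait"}), ("t", "a"): frozenset({"wait"})},
    delta={
        ("s", ("go",)): {"t": 0.7, "s": 0.3},
        ("s", ("wait",)): {"s": 1.0},
        ("t", ("wait",)): {"t": 1.0},
    },
    label={"s": frozenset(), "t": frozenset({"target"})},
)
print(sample_history(g, "s", lambda s: ("go",) if s == "s" else ("wait",), 5))
```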
Behavioral Natural Strategies

In this section, we define behavioral³ natural strategies over CGSs, based on the definition in (Jamroga, Malvone, and Murano 2019a), and use them to provide the semantics of NatPATL*.

²Our running example on secure voting is adapted from the case study in (Jamroga, Kurpiewski, and Malvone 2020, 2022).
³Behavioral strategies define the probability of taking an action in a state. This is different from mixed strategies, which define the probability of taking a strategy in a game. The relation between behavioral and mixed strategies is discussed in (Kaneko and Kline 1995).

Natural strategies are conditional plans, represented through an ordered list of condition-action rules. The intuition is that the first rule whose condition holds in the history of the game is selected, and the corresponding action is executed. The conditions are regular expressions over Boolean formulas over AP, denoted Bool(AP) and given by the following BNF grammar:

φ ::= p | φ ∨ φ | ¬φ

where p ∈ AP. Given a state s ∈ St and a formula φ ∈ Bool(AP), we inductively define the satisfaction value of φ in s, denoted s |= φ, as follows:

s |= p iff p ∈ ℓ(s)
s |= φ_1 ∨ φ_2 iff s |= φ_1 or s |= φ_2
s |= ¬φ iff not s |= φ

Let Reg(Bool(AP)) be the set of regular expressions over the conditions Bool(AP), defined with the constructors ·, +, and *, representing concatenation, nondeterministic choice, and Kleene iteration, respectively. Given a regular expression r and the language L(r) of finite words generated by r, a history h is consistent with r iff there exists a word b ∈ L(r) such that |h| = |b| and h[i] |= b[i] for all 0 ≤ i < |h|. Intuitively, a history h is consistent with a regular expression r if the i-th epistemic condition in r holds in the i-th state of h (for any position i in h).

A behavioral natural strategy σ with recall for an agent a ∈ Ag is a sequence of pairs (r, d), where r ∈ Reg(Bool(AP)) is a regular expression representing recall, and d ∈ Dist(Ac) is a distribution over the actions with d(c) = 0 if c is not available for a in last(h) (i.e., for c ∉ L(last(h), a)), for all histories h consistent with r. The last pair in the sequence is required to be (⊤*, d), with d(c) = 1 for some action c such that c ∈ L(s, a) for every s ∈ St. A behavioral memoryless natural strategy is a behavioral natural strategy without recall: each condition is a Boolean formula (i.e., all regular expressions have length 1). A strategy σ is deterministic if for all pairs (r, d), we have |{c ∈ Ac | d(c) ≠ 0}| = 1. For readability of the examples, given a pair (r, d), we write (r, c) if d(c) = 1 for some action c ∈ Ac.

Example 2 (Secure voting, continued). Recall the voting system introduced in Example 1. The following is a deterministic memoryless natural strategy for the voter v:
1. (hasBallot_v ∧ ¬scanned_v, scanBallot)
2. (¬vot_v ∧ scanned_v, enterVote)
3. (¬vot_v ∧ entVote_{v,s} ∧ ¬(sigOk_s ∨ sigFail_s), checkSig_s), for each signature s
4. (¬vot_v ∧ entVote_{v,s} ∧ sigFail_s, cnlVote), for each s
5. (¬vot_v ∧ entVote_{v,s} ∧ sigOk_s, conf), for each s
6. (vot_v ∧ rec_{v,r} ∧ ¬shredded_r, shred_r), for each receipt r
7. (⊤, noop)

This strategy specifies that the agent first scans the ballot in case there is one and it was not scanned yet (Pair 1). Otherwise, if the agent has not voted yet and has scanned, she enters her vote (Pair 2). If the agent did not vote, entered the vote, and did not check the signature, she checks it (Pair 3).
When the signature is checked, the agent chooses to cancel or confirm the vote, depending on whether the verification has failed or succeeded (Pairs 4 and 5). If the agent has voted and there is an unshredded visible receipt, the agent requests it to be shredded (Pair 6). Finally, if none of the previous conditions apply, the agent does not do any action (Pair 7).

A behavioral natural strategy with recall for a coercer is:
1. (⊤* · ⋀_{v∈V} ¬coerced_v, d_V), where d_V is a probability distribution over the actions for coercing the voters, with ∑_{v∈V} d_V(coerce_v) = 1
2. (⊤* · (coerced_v ∧ ¬requested_v), request_v), for each v ∈ V
3. (⊤* · requested_v · (requested_v ∧ ¬vot_v)* · ¬punished_v, punish_v), for each v ∈ V
4. (⊤* · (¬coerced_v ∧ ¬coerced_{v'}), d_{v,v'}), where d_{v,v'} is a probability distribution over the actions for coercing the voters v and v', with d_{v,v'}(coerce_v) + d_{v,v'}(coerce_{v'}) = 1, for each pair (v, v') of distinct voters in V
5. (⊤*, noop)

This behavioral strategy considers first the situation in which no voter was already coerced, and the agent chooses the action to coerce one of them randomly (Pair 1). The next condition-action pair in the strategy says that if a voter was coerced but her vote was not requested, the agent requests her vote (Pair 2). If the voter was requested (at least once in the history), but (continually) did not vote and was not punished, the agent punishes her (Pair 3). The next pair says that if there are two distinct non-coerced voters, the agent randomly chooses one to coerce (Pair 4). If none of those conditions apply, no operation is performed (Pair 5).

Throughout this paper, let ρ ∈ {r, R} denote whether we consider memoryless or recall strategies, respectively. Let Str^ρ_a be the set of behavioral natural strategies for agent a and Str^ρ = ∏_{a∈Ag} Str^ρ_a. Let size(σ_a) denote the number of guarded actions in σ_a, cond_i(σ_a) be the i-th guarded condition in σ_a, cond_i(σ_a)[j] be the j-th Boolean formula of that guarded condition, and act_i(σ_a) be the corresponding probability distribution on actions. Finally, match(h, σ_a) is the smallest index i ≤ size(σ_a) such that h[j] |= cond_i(σ_a)[j] for all 0 ≤ j < |h|⁴, and act_i(σ_a) only uses actions available in last(h) (i.e., its support is included in L(last(h), a)). In other words, match(h, σ_a) matches the history h with the first condition in σ_a that holds on h and whose action is available in last(h). Given a natural strategy σ_a for agent a and a history h, we denote by σ_a(h) the probability distribution over actions assigned by σ_a at the last state of h, i.e., σ_a(h) = act_{match(h,σ_a)}(σ_a).

Complexity of Natural Strategies. The complexity c(σ) of a strategy σ is the total size of its representation and is defined as follows: c(σ) := ∑_{(r,d)∈σ} |r|, where |r| is the number of symbols in r, except parentheses.

⁴Note that, as in (Jamroga, Malvone, and Murano 2019a), we consider the case in which the condition has the same length as the history. There is also the case in which the condition is shorter than the history. This is due to the usage of the Kleene iteration operator. In the latter case, we need to check a finite number of times the same Boolean formula in different states of the history.

Relation to Game Theory. From a game-theoretic point of view, natural strategies can be encoded as finite-memory strategies using finite-state machines (e.g., finite-state transducers) that only read the propositional variables holding in a state (akin to imperfect information). Natural strategies and finite-state machines are nonetheless fundamentally different in their encoding: in particular, the finite-state machine may be exponential in the size of the natural strategy it is associated with, as proved in Theorem 12 of (Jamroga, Malvone, and Murano 2019a).
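Before moving to the logics, here is a minimal Python sketch of how memoryless natural strategies (all conditions of length 1), together with the functions match and c(σ) above, could be represented. It is only an illustration under simplifying assumptions: guards are plain Python predicates rather than parsed Boolean formulas, the symbol count in `complexity` is approximate, and all identifiers are hypothetical.

```python
# Sketch of memoryless natural strategies as ordered (guard, distribution) pairs.
# All names are hypothetical illustrations of cond_i, match(h, sigma) and c(sigma).
from typing import Callable, Dict, FrozenSet, List, Tuple

Labelling = FrozenSet[str]          # propositions true in the current state
Condition = Callable[[Labelling], bool]
Distribution = Dict[str, float]     # action -> probability

# A strategy is an ordered list of (guard text, guard predicate, distribution);
# the last guard is the always-true condition, as the definition requires.
NatStrategy = List[Tuple[str, Condition, Distribution]]

def match(strategy: NatStrategy, labelling: Labelling, legal: FrozenSet[str]) -> Distribution:
    """Return the distribution of the first pair whose guard holds and whose
    support only uses actions that are legal in the current state."""
    for _, guard, dist in strategy:
        if guard(labelling) and all(a in legal for a, p in dist.items() if p > 0):
            return dist
    raise ValueError("ill-formed strategy: the last guard must always apply")

def complexity(strategy: NatStrategy) -> int:
    """A rough version of c(sigma): number of tokens in the guards, parentheses
    excluded (a token is a proposition or a connective, separated by spaces)."""
    return sum(
        len(text.replace("(", " ").replace(")", " ").split())
        for text, _, _ in strategy
    )

# Fragment of the voter strategy of Example 2, for a single signature s.
voter: NatStrategy = [
    ("hasBallot & !scanned", lambda l: "hasBallot" in l and "scanned" not in l,
     {"scanBallot": 1.0}),
    ("!vot & scanned",       lambda l: "vot" not in l and "scanned" in l,
     {"enterVote": 1.0}),
    ("T",                    lambda l: True,
     {"noop": 1.0}),
]
print(match(voter, frozenset({"hasBallot"}), frozenset({"scanBallot", "noop"})))
print(complexity(voter))
```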
PATL and PATL* with Natural Strategies

Next, we introduce the Probabilistic Alternating-Time Temporal Logics PATL and PATL* with Natural Strategies (denoted NatPATL and NatPATL*, resp.).

Definition 1. The syntax of NatPATL* is defined by the grammar:

φ ::= p | φ ∨ φ | ¬φ | Xφ | φUφ | ⟨⟨C⟩⟩^{▷◁d}_k φ

where p ∈ AP, k ∈ ℕ, C ⊆ Ag, d is a rational constant in [0, 1], and ▷◁ ∈ {≤, <, >, ≥}.

The intuitive reading of the operators is as follows: next X and until U are the standard temporal operators. ⟨⟨C⟩⟩^{▷◁d}_k φ asserts that there exists a strategy with complexity at most k for the coalition C to collaboratively enforce φ with a probability in relation ▷◁ with the constant d. We make use of the usual syntactic sugar Fφ := ⊤Uφ and Gφ := ¬F¬φ for temporal operators. A NatPATL* formula of the form ⟨⟨C⟩⟩^{▷◁d}_k φ is also called a state formula. An important syntactic restriction of NatPATL*, namely NatPATL, is defined as follows.

Definition 2 (NatPATL syntax). The syntax of NatPATL is defined by the grammar

φ ::= p | φ ∨ φ | ¬φ | ⟨⟨C⟩⟩^{▷◁d}_k (Xφ) | ⟨⟨C⟩⟩^{▷◁d}_k (φUφ)

where p, k, C, d, and ▷◁ are as above.

Before presenting the semantics, we show how to define the probability space on outcomes.

Probability Space on Outcomes. An outcome of a strategy profile σ = (σ_a)_{a∈Ag} and a state s is a play π that starts in state s and is extended by σ: formally, π_0 = s and, for every k ≥ 0, there exists a joint action c_k with c_{k,a} ∈ σ_a(π_{≤k}) for every a ∈ Ag such that π_{k+1} ∈ δ(π_k, c_k), where π_{≤k} denotes the prefix π_0 ⋯ π_k. The set of outcomes of a strategy profile σ and a state s is denoted Out(σ, s). A given CGS G, strategy profile σ, and state s induce an infinite-state Markov chain M_{σ,s} whose states are the histories of the plays in Out(σ, s), and whose transition probabilities are defined as p(h, hs') = ∑_{c∈Ac^Ag} σ(h)(c) · δ(last(h), c)(s'), where σ(h)(c) is the product of the probabilities σ_a(h)(c_a). The Markov chain M_{σ,s} induces a canonical probability space on its set of infinite paths (Kemeny, Snell, and Knapp 1976), which can be identified with the set of plays in Out(σ, s); the corresponding measure is denoted out(σ, s).⁵

Given a coalition strategy σ_C ∈ ∏_{a∈C} Str^ρ_a, we define the set of possible outcomes of σ_C from a state s ∈ St to be the set out_C(σ_C, s) = {out((σ_C, σ_{-C}), s) : σ_{-C} ∈ ∏_{a∈Ag∖C} Str^ρ_a} of probability measures that the players in C enforce when they follow the strategy σ_C, namely, for each a ∈ C, player a follows strategy σ_a in σ_C. We use µ^{σ_C}_s to range over the measures in out_C(σ_C, s).

⁵This is a classic construction, see for instance (Clarke et al. 2018; Berthon et al. 2020).

Definition 3 (NatPATL and NatPATL* semantics). Given a setting ρ ∈ {r, R}, NatPATL and NatPATL* formulas are interpreted over a stochastic CGS G and a path π:

G, π |=ρ p iff p ∈ ℓ(π_0)
G, π |=ρ ¬φ iff G, π ̸|=ρ φ
G, π |=ρ φ_1 ∨ φ_2 iff G, π |=ρ φ_1 or G, π |=ρ φ_2
G, π |=ρ ⟨⟨C⟩⟩^{▷◁d}_k φ iff there exists σ_C ∈ ∏_{a∈C} {α ∈ Str^ρ_a : c(α) ≤ k} s.t. for all µ^{σ_C}_{π_0} ∈ out_C(σ_C, π_0), µ^{σ_C}_{π_0}({π' : G, π' |=ρ φ}) ▷◁ d
G, π |=ρ Xφ iff G, π_{≥1} |=ρ φ
G, π |=ρ ψ_1 U ψ_2 iff there exists k ≥ 0 s.t. G, π_{≥k} |=ρ ψ_2 and, for all j ∈ [0, k), G, π_{≥j} |=ρ ψ_1
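To illustrate the transition probabilities p(h, hs') of the induced Markov chain M_{σ,s} defined above, here is a small Python sketch. The identifiers (step_probability and the two-agent toy input) are hypothetical and only illustrate the construction, not code from the paper.

```python
# Sketch of one step of the induced Markov chain M_{sigma,s}:
# p(h, h s') = sum over joint actions c of sigma(h)(c) * delta(last(h), c)(s'),
# where sigma(h)(c) is the product of the agents' action probabilities.
from itertools import product
from typing import Dict, Tuple

Action = str
State = str

def step_probability(
    sigma_h: Tuple[Dict[Action, float], ...],                   # sigma_a(h) for each agent a
    delta_last: Dict[Tuple[Action, ...], Dict[State, float]],   # delta(last(h), c)
    target: State,
) -> float:
    """Probability p(h, h·target) in the induced Markov chain."""
    total = 0.0
    # Enumerate joint actions c in the product of the agents' supports.
    for c in product(*(d.keys() for d in sigma_h)):
        joint_prob = 1.0
        for agent_dist, action in zip(sigma_h, c):
            joint_prob *= agent_dist[action]
        total += joint_prob * delta_last.get(c, {}).get(target, 0.0)
    return total

# Two agents: the first mixes between "a" and "b", the second always plays "x".
sigma_h = ({"a": 0.5, "b": 0.5}, {"x": 1.0})
delta_last = {("a", "x"): {"s1": 1.0}, ("b", "x"): {"s1": 0.2, "s2": 0.8}}
print(step_probability(sigma_h, delta_last, "s1"))   # 0.5*1.0 + 0.5*0.2 = 0.6
```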
Motivating Examples

In this section, we present problems that motivate reasoning in stochastic MAS, and we illustrate how NatPATL* formulas can be used to express properties of those systems. Let us start with an example of door access control with a random robot. This example illustrates a setting in which it suffices to have deterministic strategies in stochastic CGSs.

Example 3 (Access control). We consider the example illustrated in Figure 1. We are given a set Ag of agents and a set of square tiles, where a non-controlled robot moves randomly either one tile right, left, up, or down at every time step. Between every two tiles, there is either a wall, a door controlled by some agent with actions open and close, or nothing. The robot can cross an empty space, cannot cross a wall, and can only cross a door if the agent controlling it has taken action open. Given a set of targets represented by atomic propositions T = {t_i ∈ AP, i ∈ {1, ..., n}} labelling some tiles, and related coalitions {C_i ⊆ Ag, i ∈ {1, ..., n}}, we use NatPATL* to state that some coalition C ⊆ Ag has a strategy with memory k ∈ ℕ reaching all targets infinitely often with probability at least 0.7, formally:

⟨⟨C⟩⟩^{≥0.7}_k ⋀_{t_i∈T} G (t_i → ⋀_{t_j∈T, t_j≠t_i} F t_j)    (1)

In the example of Figure 1, where n = 2, the coalition controls two doors adjacent to the initial state. Even though the structure is probabilistic, memoryless strategies are sufficient. Opening the left door gives the robot a chance to move left, which brings it closer to target t_0, but in this center-leftmost square, the agents not in the coalition may open the door leading to the bottom-left square, where they can then trap the robot: the robot only has probability 1/2 to successfully reach t_0, and otherwise may be trapped forever. The other option available to the coalition in the initial state is to close the left door and open all other doors. The robot will take longer, but has probability 1 to eventually reach target t_1. Thus, we can reach t_1 with probability 1, but t_0 only with probability 1/2, and property (1) does not hold.

Going back to Example 1, we now illustrate how NatPATL and NatPATL* can be used for the formal security analysis of voting systems.

Example 4 (Secure voting, continued). A requirement usually considered for electronic voting systems is voter-verifiability, which captures the ability of the voter to verify her vote (Jamroga, Kurpiewski, and Malvone 2022). In our example, this is represented by the propositions sigOk_s and sigFail_s. The NatPATL formula

⟨⟨v⟩⟩^{≥0.9}_k F(sigOk_s ∨ sigFail_s)

says that the voter v has a strategy of size at most k so that, at some point and with probability at least 0.9, she obtains either the positive or the negative outcome of verifying the signature s. Another requirement is receipt-freeness, which expresses that the voters cannot gain a receipt to prove that they voted in a certain way. In our example, the propositions rec_{v,r} and shredded_r represent that a receipt r is associated with the voter v and that the information on it was destroyed. The NatPATL formula

¬⟨⟨v⟩⟩^{≥0.5}_k F ⋁_{receipts r} (rec_{v,r} ∧ ¬shredded_r)

says that there is no strategy of complexity at most k to ensure with probability at least 0.5 that, eventually, there will be an unshredded receipt for her.

Model Checking Complexity

In this section, we look at the complexity of model checking for different versions of NatPATL.

Definition 4. Given a setting ρ ∈ {r, R}, a CGS G, a state s ∈ St, and a formula φ in NatPATL_ρ (NatPATL*_ρ, resp.), the model checking problem for NatPATL_ρ (NatPATL*_ρ, resp.) consists in deciding whether G, s |=ρ φ.

Theorem 1. Model checking NatPATL_r (respectively NatPATL_R) with deterministic natural strategies for the coalition is in NP.
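The membership argument, sketched in the next paragraph, guesses the coalition's natural strategies of complexity at most k and then verifies the guess. As a hypothetical illustration of that verification step only, assuming memoryless deterministic strategies and a reachability objective: once the coalition's actions are fixed, the remaining choices form an MDP, on which worst-case reachability probabilities can be computed, e.g., by value iteration. All identifiers below are illustrative; this is not the paper's algorithm.

```python
# Rough sketch of the check step for <<C>>^{>=d}_k F goal with a fixed,
# guessed deterministic coalition strategy: the opponents' remaining choices
# form an MDP and we compute the opponent-minimising reachability probability.
from typing import Dict, FrozenSet, Tuple

State = str
JointAction = Tuple[str, ...]

def min_reach_probability(
    delta: Dict[Tuple[State, JointAction], Dict[State, float]],
    legal_opponent_moves: Dict[State, FrozenSet[JointAction]],  # moves still possible once
                                                                # the coalition strategy is fixed
    goal: FrozenSet[State],
    states: FrozenSet[State],
    iterations: int = 1000,
) -> Dict[State, float]:
    """Value iteration for the minimal probability of reaching `goal`
    in the MDP induced by the fixed coalition strategy."""
    value = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(iterations):
        new = {}
        for s in states:
            if s in goal:
                new[s] = 1.0
                continue
            # The opponents pick the joint action minimising the reachability value.
            new[s] = min(
                sum(p * value[t] for t, p in delta[(s, c)].items())
                for c in legal_opponent_moves[s]
            )
        value = new
    return value

# Toy check: in state "s" the opponent chooses between a 0.7-chance move to the
# goal "t" and a self-loop, so the guaranteed probability of F t from "s" is 0.
delta = {
    ("s", ("open",)): {"t": 0.7, "s": 0.3},
    ("s", ("close",)): {"s": 1.0},
    ("t", ("noop",)): {"t": 1.0},
}
moves = {"s": frozenset({("open",), ("close",)}), "t": frozenset({("noop",)})}
print(min_reach_probability(delta, moves, frozenset({"t"}), frozenset({"s", "t"}))["s"])
```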
The main idea is that, since we know the memory bound of every strategy of the coalition, we can guess these strategies and polynomially verify their correctness, since we only have to check MDPs with reachability or invariance objectives.

Figure 1: A robot in a maze, with doors controlled either by the coalition or by the agents not in the coalition. Full lines represent walls. t_0 and t_1 are two targets.

Theorem 2. Model checking NatPATL_r (respectively NatPATL_R) with deterministic natural strategies for the coalition is NP-hard.

Proof. We start by showing that NatPATL_r with deterministic natural strategies for the coalition extends POMDPs with memoryless deterministic strategies and an almost-sure reachability objective. Indeed, given a POMDP represented as a CGS G = (St, L, δ, ℓ) (two states are indistinguishable if they are labelled by the same propositional variables), a single agent A, and a set of target states distinguished by some propositional variable t that holds only in these states, there exists a strategy almost surely reaching t from an initial state s ∈ St if and only if the NatPATL_r formula ⟨⟨A⟩⟩^{=1}_{|St|}(F t) holds on G. Indeed, memoryless strategies cannot have a complexity higher than |St|, the number of states in the MDP, and so the available strategies coincide. In Proposition 2 of (Chatterjee, Chmelik, and Davies 2016), it is shown that finding strategies for POMDPs with memoryless randomized strategies and an almost-sure reachability objective is NP-hard. It uses a reduction from Lemma 1 of (Chatterjee, Kößler, and Schmid 2013) that only uses deterministic strategies. As such, finding strategies for POMDPs with memoryless deterministic strategies and an almost-sure reachability objective is NP-hard, and so is model checking NatPATL_r with behavioral natural deterministic strategies for the coalition.

Theorem 3. Model checking NatPATL*_r (respectively NatPATL*_R) with deterministic natural strategies for the coalition is in 2NEXPTIME.

Sketch of proof. The idea is similar to Theorem 1: we guess strategies of bounded size for the coalition, but we now have to check LTL formulas on MDPs, which takes doubly-exponential time. We also show 2EXPTIME-hardness with a reduction from LTL model checking on MDPs. This complexity is not surprising, since classical ATL* already has doubly-exponential complexity (Alur, Henzinger, and Kupferman 2002).

Theorem 4. Model checking NatPATL*_r (respectively NatPATL*_R) with deterministic natural strategies for the coalition is 2EXPTIME-hard.

Proof. We use a reduction from LTL model checking on MDPs, which is 2EXPTIME-complete. Given an LTL formula φ, a threshold d ∈ [0, 1], and a CGS G with only one agent, φ holds with probability at least d on G if and only if the NatPATL*_r formula ⟨⟨∅⟩⟩^{>1-d}_k (¬φ) does not hold on the MDP G: this formula states that for any strategy of the agent (without any complexity bound, since the coalition is empty), formula ¬φ holds with probability greater than 1 − d, which happens exactly when there is no strategy ensuring φ with probability at least d.

When considering probabilistic strategies, we follow the same technique as (Aminof et al. 2019) to reduce the problem to model checking real arithmetic. Since LTL is subsumed by NatPATL*_R, we also have a doubly-exponential blowup. On the other hand, with NatPATL_R, we roughly follow an idea introduced for Stochastic Game Logic (Baier et al. 2012) to avoid this blowup.
As in the proof of Theorem 1, it is sufficient to consider reachability and invariance problems, both of which are polynomial. The same holds for NatPATL_r. Next, we consider the model checking of behavioral strategies. We remark that the 2EXPTIME-hardness from Theorem 4 also applies to NatPATL*_r and NatPATL*_R with behavioral natural strategies. We now give model checking algorithms and their complexity.

Theorem 5. Model checking NatPATL*_r (respectively NatPATL*_R) with behavioral natural strategies for the coalition is in 3EXPSPACE.

Sketch of proof. Probabilistic Strategy Logic (PSL) with an additional behavioral natural strategy quantifier ∃^{nat}_k captures NatPATL*_R, and we show that its model checking is in 3EXPSPACE. When translating a NatPATL*_R formula into a PSL formula in a bottom-up manner, assuming a formula φ can already be translated into some PSL(φ) without any complexity blowup, the ⟨⟨C⟩⟩^{▷◁d}_k φ subformulas can be translated as ∃^{nat}_k σ. ∀µ ∈ P_{C↦σ, Ag∖C}. µ(PSL(φ)) ▷◁ d: a coalition satisfies φ iff there exists a natural strategy for the coalition such that, for all strategies of the other agents, the PSL translation of φ holds. To model check the operator ∃^{nat}_k, we modify the proof of Theorem 1 of (Aminof et al. 2019), which shows that model checking PSL with memoryless strategies is in 3EXPSPACE. This proof translates PSL into real arithmetic, and a variable r_{x,s,a} represents the probability for strategy x to take action a in state s. We extend this notation to behavioral natural strategies: for a strategy with complexity k, we replace the variables r_{x,s,a} by r_{x,s,a,q}, where q is the current state of the automata representing the regular expressions of a behavioral natural strategy:

⋁_{strategies σ, compl(σ)≤k} ⋀_{(r,a)∈σ} "A_{(r,a)} is an automaton with current state q[r]".

We state that two probabilities are equal if they are accepted by the same regular expressions:

⋀_{(r,a)∈σ} (acc(q[r], A_{(r,a)}) ↔ acc(q'[r], A_{(r,a)})) → r_{x,s,a,q} = r_{x,s,a,q'}.

Both are exponential in the largest k in the formula, since there are exponentially many possible automata of size less than or equal to k, and we need to describe at most k of them in every conjunction. Nothing else is changed in the proof of (Aminof et al. 2019), and thus we have a complexity that is 3EXPSPACE in the size of the NatPATL*_R formula, exponential in the size of the system, and 2EXPSPACE in the largest complexity bound k used in the formula.

Theorem 6. Model checking NatPATL_r (respectively NatPATL_R) with behavioral natural strategies for the coalition is in EXPSPACE.

Sketch of proof. The proof is slightly more involved than the previous one. We again adapt the proof of Theorem 1 of (Aminof et al. 2019). In the proof of Theorem 5, we translate our fragment of PSL with natural strategies to real arithmetic. The only exponential blowup comes when translating the coalition operator into ∃^{nat}_k σ. ∀µ ∈ P_{C↦σ, Ag∖C}. µ(PSL(φ)) ▷◁ d, where φ is assumed to be an LTL formula whose atoms are either propositional variables or variables representing other formulas starting with P. This translation constructs a deterministic Rabin automaton whose size is exponential in the size of the CGS and doubly exponential in the size of the LTL formula, and uses a number of quantifiers doubly exponential in the size of the LTL formula. Since we consider NatPATL_R, this LTL formula may only be either Xφ' or φ'Uφ'', where φ' and φ'' have been inductively represented as Boolean formulas.
Proposition 5.1 and Theorem 5.2 of (Alur, Henzinger, and Kupferman 2002) show that such formulas can be polynomially translated to either reachability or invariance games, which can be done using an automaton and a number of variables both polynomial in the size of the CGS and of the formula. Since model checking real arithmetic is exponential in the number of quantifiers of the formula (Ben-Or, Kozen, and Reif 1986; Fitchas, Galligo, and Morgenstern 1987), we obtain that model checking NatPATL_R with behavioral natural strategies for the coalition is in EXPSPACE.

Expressivity

We now compare the expressive power of NatPATL* to that of PATL*. We first recall the notions of distinguishing and expressive power.

Definition 5 (Distinguishing power and expressive power (Wang and Dechesne 2009)). Consider two logical systems L1 and L2, with their semantics (denoted |=_{L1} and |=_{L2}, resp.) defined over the same class of models M. We say that L1 is at least as distinguishing as L2 (written: L2 ⪯_d L1) iff for every pair of models M, M' ∈ M, if there exists a formula φ2 ∈ L2 such that M |=_{L2} φ2 and M' ̸|=_{L2} φ2, then there is also φ1 ∈ L1 with M |=_{L1} φ1 and M' ̸|=_{L1} φ1. Moreover, L1 is at least as expressive as L2 (written: L2 ⪯_e L1) iff for every φ2 ∈ L2 there exists φ1 ∈ L1 such that, for every M ∈ M, we have M |=_{L2} φ2 iff M |=_{L1} φ1.

NatPATL* and PATL* are based on different notions of strategic ability. As for the deterministic setting with ATL*, each behavioral natural strategy can be translated to a behavioral combinatorial one (i.e., mappings from sequences of states to actions), but not vice versa. Consequently, PATL* can express that a given coalition has a combinatorial strategy to achieve their goal, which is not expressible in NatPATL*. On the other hand, NatPATL* allows expressing that a winning natural strategy with bounded complexity does not exist, which cannot be captured in PATL*. Now we show that NatPATL* allows expressing properties that cannot be captured in PATL*, and vice versa.

Theorem 7. For both the memoryless and recall semantics:
- NatPATL (resp. NatPATL*) and PATL (resp. PATL*) have incomparable distinguishing power over CGSs.
- NatPATL (resp. NatPATL*) and PATL (resp. PATL*) have incomparable expressive power over CGSs.

Sketch of proof. The proof is obtained by an adjustment of the proofs of Prop. 8 and 9 in (Belardinelli et al. 2022), comparing natural strategies with combinatorial strategies.

Conclusion

In this work, we have defined multiple variations of PATL with natural strategies and studied their model checking complexity. We have illustrated with multiple examples the relevance of the probabilistic setting, which can represent uncertainty in a very precise way, and the interest of natural strategies, which are both efficient and much closer to what a real-world agent is expected to manipulate. In terms of model checking, the NP-completeness of NatPATL with deterministic strategies is promising, and shows that we can capture POMDPs with bounded memory without any significant loss. While the 2NEXPTIME complexity for NatPATL* with deterministic strategies is high, we have shown a close lower bound, namely 2EXPTIME-hardness. With probabilistic strategies, the EXPSPACE membership of NatPATL is quite similar to the result of (Baier et al. 2012), and the 3EXPSPACE membership of NatPATL* is also similar to (Aminof et al. 2019). Since this exponential space blowup comes from the use of real arithmetic to encode probabilities, any improvement would likely come from the introduction of a totally new technique.
Similarly, the doubly-exponential blowup between PATL and PATL* comes from the 2EXPTIME-completeness of LTL model checking on MDPs. We also keep the 2EXPTIME-hardness from the deterministic case. To our knowledge, similar works (Baier et al. 2012; Aminof et al. 2019) also do not give different lower bounds for deterministic and probabilistic strategies. A possible approach would be to use a construction from POMDPs, more precisely either (Junges et al. 2018), showing that synthesis on POMDPs with reachability objectives and bounded memory is NP-complete for deterministic strategies and ETR-complete for probabilistic finite-memory strategies, or (Oliehoek 2012), showing that finding a policy maximizing a reward on decentralized POMDPs with full memory is NEXPTIME-complete.

Our results on expressivity mean that there are properties of stochastic MAS with natural strategies that cannot be equivalently translated to properties based on combinatorial strategies, and vice versa. The proof of Theorem 5 shows that we could extend natural strategies to PSL, but it would be difficult to get a better result than our 3EXPSPACE complexity. Considering qualitative PATL or PSL (i.e., only thresholds > 0 and = 1) may yield a better complexity. For the quantitative setting, i.e., thresholds such as > 1/2, techniques from the field of probabilistic model checking can be applied, e.g., graph analysis, bisimulation minimization, symbolic techniques, and partial-order reduction (Katoen 2016). Another direction would be to consider epistemic operators. Indeed, many applications involving agents with a reasonable way to strategize also have to take into account the knowledge and beliefs of these agents. As such, we would have to find a good epistemic framework such that natural strategies keep the desired balance between expressivity and complexity.

Acknowledgements

This research has been supported by the EU H2020 Marie Skłodowska-Curie project with grant agreement No 101105549, the PRIN project RIPER (No. 20203FFYLK), the PNRR MUR project PE0000013-FAIR, the INdAM 2023 project "Strategic Reasoning in Mechanism Design", and the DFG Project POMPOM (KA 1462/6-1).

References

Ågotnes, T.; and Walther, D. 2009. A Logic of Strategic Ability Under Bounded Memory. Journal of Logic, Language and Information, 18(1): 55–77.
Alur, R.; Henzinger, T.; and Kupferman, O. 2002. Alternating-time temporal logic. J. ACM, 49(5): 672–713.
Aminof, B.; Kwiatkowska, M.; Maubert, B.; Murano, A.; and Rubin, S. 2019. Probabilistic Strategy Logic. In Proc. of IJCAI 2019, 32–38. ijcai.org.
Baier, C.; Bertrand, N.; and Größer, M. 2008. On Decision Problems for Probabilistic Büchi Automata. In FOSSACS.
Baier, C.; Brázdil, T.; Größer, M.; and Kučera, A. 2012. Stochastic game logic. Acta Informatica, 49(4): 203–224.
Belardinelli, F.; Jamroga, W.; Malvone, V.; Mittelmann, M.; Murano, A.; and Perrussel, L. 2022. Reasoning about Human-Friendly Strategies in Repeated Keyword Auctions. In AAMAS-22.
Belardinelli, F.; Jamroga, W.; Mittelmann, M.; and Murano, A. 2023. Strategic Abilities of Forgetful Agents in Stochastic Environments. In Proc. of KR-23.
Belardinelli, F.; Lomuscio, A.; and Malvone, V. 2018. Approximating Perfect Recall When Model Checking Strategic Abilities. In Proc. of KR 2018.
Ben-Or, M.; Kozen, D.; and Reif, J. H. 1986. The Complexity of Elementary Algebra and Geometry. J. Comput. Syst. Sci., 32(2): 251–264.
Berthon, R.; Fijalkow, N.; Filiot, E.; Guha, S.; Maubert, B.; Murano, A.; Pinault, L.; Pinchinat, S.; Rubin, S.; and Serre, O. 2020. Alternating Tree Automata with Qualitative Semantics. ACM Trans. Comput. Logic, 22(1): 1–24.
Brázdil, T.; Chatterjee, K.; Chmelik, M.; Fellner, A.; and Křetínský, J. 2015. Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. In CAV 2015, LNCS 9206, 158–177. Springer.
Bulling, N.; and Jamroga, W. 2009. What Agents Can Probably Enforce. Fundam. Informaticae, 93(1-3): 81–96.
Chatterjee, K.; Chmelik, M.; and Davies, J. 2016. A Symbolic SAT-Based Algorithm for Almost-Sure Reachability with Small Strategies in POMDPs. In AAAI, 3225–3232. AAAI Press.
Chatterjee, K.; Doyen, L.; and Henzinger, T. A. 2010. Qualitative Analysis of Partially-Observable Markov Decision Processes. In Proc. of MFCS.
Chatterjee, K.; Henzinger, T. A.; and Piterman, N. 2010. Strategy Logic. Inf. Comput., 208(6): 677–693.
Chatterjee, K.; Kößler, A.; and Schmid, U. 2013. Automated analysis of real-time scheduling using graph games. In HSCC, 163–172. ACM.
Chen, T.; Forejt, V.; Kwiatkowska, M.; Parker, D.; and Simaitis, A. 2013. Automatic verification of competitive stochastic systems. Formal Methods in System Design, 43: 61–92.
Chen, T.; and Lu, J. 2007. Probabilistic alternating-time temporal logic and model checking algorithm. In Proc. of FSKD, 35–39.
Clarke, E.; Grumberg, O.; Kroening, D.; Peled, D.; and Veith, H. 2018. Model checking. MIT Press.
Deuser, K.; and Naumov, P. 2020. On composition of bounded-recall plans. Artificial Intelligence, 289: 103399.
Fitchas, N.; Galligo, A.; and Morgenstern, J. 1987. Algorithmes rapides en séquentiel et en parallèle pour l'élimination des quantificateurs en géométrie élémentaire. Séminaire sur les structures algébriques ordonnées.
Huang, X.; and Luo, C. 2013. A logic of probabilistic knowledge and strategy. In Proc. of AAMAS 2013, 845–852.
Huang, X.; Su, K.; and Zhang, C. 2012. Probabilistic Alternating-Time Temporal Logic of Incomplete Information and Synchronous Perfect Recall. In Proc. of AAAI 2012.
Jamroga, W.; and Bulling, N. 2011. Comparing variants of strategic ability. In Proc. of IJCAI 2011, 252–257. IJCAI/AAAI.
Jamroga, W.; Kurpiewski, D.; and Malvone, V. 2020. Natural strategic abilities in voting protocols. In Int. Workshop on Socio-Technical Aspects in Security and Trust, 45–62.
Jamroga, W.; Kurpiewski, D.; and Malvone, V. 2022. How to measure usable security: Natural strategies in voting protocols. Journal of Computer Security, 30(3): 381–409.
Jamroga, W.; Malvone, V.; and Murano, A. 2019a. Natural strategic ability. Artificial Intelligence, 277: 103170.
Jamroga, W.; Malvone, V.; and Murano, A. 2019b. Natural Strategic Ability under Imperfect Information. In Proc. of AAMAS 2019.
Junges, S. 2020. Parameter synthesis in Markov models. Ph.D. thesis, RWTH Aachen University, Germany.
Junges, S.; Jansen, N.; Wimmer, R.; Quatmann, T.; Winterer, L.; Katoen, J.; and Becker, B. 2018. Finite-State Controllers of POMDPs using Parameter Synthesis. In Proc. of UAI 2018.
Kaneko, M.; and Kline, J. J. 1995. Behavior strategies, mixed strategies and perfect recall. International Journal of Game Theory, 24: 127–145.
Katoen, J.-P. 2016. The probabilistic model checking landscape. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, 31–45.
Kemeny, J. G.; Snell, J. L.; and Knapp, A. W. 1976. Stochastic Processes. In Denumerable Markov Chains, 40–57. Springer.
Laroussinie, F.; and Markey, N. 2015. Augmenting ATL with strategy contexts. Inf. Comput., 245: 98–123.
Lomuscio, A.; and Pirovano, E. 2020. Parameterised verification of strategic properties in probabilistic multi-agent systems. In Proc. of AAMAS 2020, 762–770.
Madani, O.; Hanks, S.; and Condon, A. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell., 147(1-2): 5–34.
Mogavero, F.; Murano, A.; Perelli, G.; and Vardi, M. Y. 2014. Reasoning About Strategies: On the Model-Checking Problem. ACM Trans. Comput. Log., 15(4).
Nguyen, H. N.; and Rakib, A. 2019. A Probabilistic Logic for Resource-Bounded Multi-Agent Systems. In Proc. of IJCAI 2019, 521–527.
Oliehoek, F. A. 2012. Decentralized POMDPs. In Reinforcement Learning: State-of-the-Art, 471–503. Springer.
Pajarinen, J.; and Peltonen, J. 2011. Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning. In Proc. of NIPS 2011, 2636–2644.
Song, F.; Zhang, Y.; Chen, T.; Tang, Y.; and Xu, Z. 2019. Probabilistic alternating-time µ-calculus. In Proc. of AAAI, 6179–6186.
Vester, S. 2013. Alternating-time temporal logic with finite-memory strategies. In GandALF 2013, 194–207.
Vlassis, N.; Littman, M. L.; and Barber, D. 2012. On the Computational Complexity of Stochastic Controller Optimization in POMDPs. ACM Trans. Comput. Theory, 4(4): 12:1–12:8.
Wang, Y.; and Dechesne, F. 2009. On expressive power and class invariance. arXiv preprint arXiv:0905.4332.