# Reactive Synthesis of Dominant Strategies

Benjamin Aminof1,3, Giuseppe De Giacomo2,3, Sasha Rubin4
1 TU Wien  2 University of Oxford  3 Università degli Studi di Roma La Sapienza  4 University of Sydney
aminof@forsyte.at, giuseppe.degiacomo@cs.ox.ac.uk, degiacomo@diag.uniroma1.it, sasha.rubin@sydney.edu.au

We study the synthesis under environment specifications problem for LTL/LTLf which, in particular, generalizes FOND (strong) planning with these temporal goals. We consider the case where the agent cannot enforce its goal, for which the argument for using best-effort strategies has been made, and study the intermediate ground between enforcing and best-effort strategies: dominant strategies. Intuitively, such strategies achieve the goal against any environment for which it is achievable. We show that dominant strategies may exist when enforcing ones do not, while still sharing with the latter many desirable properties, such as being interchangeable with each other and being monotone with respect to tightening of environment specifications. We give necessary and sufficient conditions for the existence of dominant strategies, and show that deciding if they exist is 2EXPTIME-complete, the same as for enforcing strategies. Finally, we give a uniform, optimal, game-theoretic algorithm for simultaneously solving the three synthesis problems of enforcing, dominant, and best-effort strategies.

## Introduction

In Reasoning about Actions, the agent has an explicit representation of the environment, corresponding to what it knows about the behavior of the environment it is operating in (McCarthy and Hayes 1969; Green 1969). Nondeterministic environments are often represented as nondeterministic domains, e.g., in Planning on Fully Observable Nondeterministic Domains (FOND) (Geffner and Bonet 2013). Planning on nondeterministic domains can be seen as a prominent case of Reactive Synthesis (Church 1963) under environment specifications (Aminof et al. 2018, 2019), which is the problem of automatically constructing a reactive system from a logical specification. That is, environments can be specified as sets of environment strategies that enforce a given temporal logic formula, and the computed reactive system is the sought-after plan. In this paper, we consider specifications expressed in linear-time temporal logic over infinite traces (LTL) and finite traces (LTLf), cf. (Pnueli and Rosner 1989; De Giacomo and Vardi 2015).

The standard solution concept in Reactive Synthesis is that of an enforcing strategy (Pnueli and Rosner 1989) (cf. strong policies in FOND planning), i.e., an agent strategy that achieves the goal against all the environment strategies that conform to the environment specification. However, often an enforcing strategy does not exist, leading to the introduction of other solution concepts, including strong-cyclic policies, which exploit the fact that often the environment resolves nondeterminism stochastically (Daniele, Traverso, and Vardi 1999; Cimatti et al. 2003), and more recently best-effort strategies, i.e., those that achieve the goal against a maximal set of environment strategies that conform to the specification (Berwanger 2007; Faella 2009; Aminof et al. 2020, 2021; Aminof, De Giacomo, and Rubin 2021).
In this work we study dominant agent strategies, i.e., those that achieve the goal against the maximum set of environment strategies that conform to the specification. Clearly, enforcing strategies are dominant, and dominant strategies are best-effort. However, while best-effort strategies always exist, neither dominant nor enforcing strategies always exist. Intuitively, an enforcing strategy ensures getting the best imaginable outcome, a best-effort strategy ensures that one has made the best possible decision (given that the decision has to be made before the exact environment is known), and a dominant strategy ensures that one will always get the best possible outcome.

Our contributions are as follows:
1. We show that dominant strategies, like enforcing ones, are (i) interchangeable with each other and (ii) monotone with respect to tightening of environment specifications.
2. We provide a local characterization of the dominant strategies in terms of what the strategy should achieve from any given history. The characterization is inspired by a similar characterization of the best-effort strategies (Berwanger 2007; Aminof et al. 2020).
3. We study the problem of finding a dominant strategy given goals and environments specified in LTL/LTLf. We prove that the corresponding realizability problem is 2EXPTIME-complete, the same as that for enforcing strategies. Based on the local characterization, we give a uniform, optimal, game-theoretic algorithm for simultaneously solving the three synthesis problems of enforcing, dominant, and best-effort strategies. The algorithm generalizes the one for best-effort strategies (Aminof, De Giacomo, and Rubin 2021).

## Preliminaries

For a sequence x, we write xi for its ith element; the first element is x0; the length of x is |x| ∈ ℕ ∪ {∞} (with the convention that ∞ − 1 = ∞); the prefix of x of length 0 ≤ i ≤ |x| is denoted by x_{<i}.

The agent controls a finite set 𝒴 of Boolean variables and the environment controls a disjoint finite set 𝒳. In each time step the environment first makes a move X ⊆ 𝒳 and the agent responds with a move Y ⊆ 𝒴. A trace is an alternating sequence X0 Y0 X1 Y1 ··· of such moves, and a history is a finite prefix of a trace. An agent strategy σag maps histories ending in an environment move to agent moves; an environment strategy σenv maps histories ending in an agent move (including the empty history) to environment moves; an environment strategy is oblivious if its moves do not depend on the agent's moves. A pair of strategies σag, σenv induces a unique trace, denoted PLAY(σag, σenv). Goals ϕ and environment specifications E are LTL/LTLf formulas over 𝒳 ∪ 𝒴, and we write π ⊨ ϕ when the trace π satisfies ϕ. An environment strategy σenv enforces a formula E if PLAY(σag, σenv) ⊨ E for every agent strategy σag; we write Σ^E_env for the set of environment strategies enforcing E, and Σ^{E,h}_env for those strategies in Σ^E_env that are consistent with the history h. An agent strategy σ enforces a goal ϕ under the environment specification E if PLAY(σ, σenv) ⊨ ϕ for every σenv ∈ Σ^E_env. The synthesis under environment specifications problem (in the literature this is sometimes called the synthesis under assumptions problem) asks, given ϕ and E, to find such an enforcing agent strategy.

Definition 1 (Dominance). Given a goal ϕ and an environment specification E, an agent strategy σ1 dominates an agent strategy σ2, written σ1 ≥_{ϕ|E} σ2, iff for every σenv ∈ Σ^E_env, PLAY(σ2, σenv) ⊨ ϕ implies PLAY(σ1, σenv) ⊨ ϕ. Moreover, σ1 >_{ϕ|E} σ2 iff σ1 ≥_{ϕ|E} σ2 and σ2 ≱_{ϕ|E} σ1. If σ1 >_{ϕ|E} σ2 we say that σ1 strictly dominates σ2 (for goal ϕ under environment specification E).

Intuitively, σ1 >_{ϕ|E} σ2 means that σ1 does at least as well as σ2 against every environment strategy enforcing E, and strictly better against at least one such strategy. The relation ≥_{ϕ|E} is a preorder, and >_{ϕ|E} is a strict partial order. A maximum element (if it exists) in the preorder ≥_{ϕ|E} is called dominant:

Definition 2 (Dominant). An agent strategy σ1 is dominant, aka maximum, for the goal ϕ under the environment specification E, iff σ1 ≥_{ϕ|E} σ2 for every agent strategy σ2. If there is no environment specification, i.e., E = true, we say that σ1 is dominant for the goal ϕ.

A slightly weaker notion than being dominant is being best-effort (Berwanger 2007; Aminof, De Giacomo, and Rubin 2021):

Definition 3 (Best-effort). An agent strategy σ1 is best-effort, aka maximal, for the goal ϕ under the env. specification E, iff there is no agent strategy σ2 s.t. σ2 >_{ϕ|E} σ1.

Given ϕ, E, and an agent strategy σ, let Σ^E_env(ϕ, σ) ⊆ Σ^E_env be the set of environment strategies σenv ∈ Σ^E_env such that PLAY(σ, σenv) ⊨ ϕ, i.e., the set of environment strategies σenv that enforce E and against which σ achieves the goal. Using this notation we can characterize enforcing, dominant, and maximal strategies (for a goal ϕ under the environment specification E) as follows: σ is enforcing iff Σ^E_env(ϕ, σ) = Σ^E_env; it is dominant iff Σ^E_env(ϕ, σ′) ⊆ Σ^E_env(ϕ, σ) for every σ′; and it is maximal iff there is no σ′ such that Σ^E_env(ϕ, σ) ⊊ Σ^E_env(ϕ, σ′).
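To make the set-based characterization above concrete, here is a minimal finite sketch in Python (our own illustration, not part of the paper): a table stands in for Σ^E_env(ϕ, σ), and all names (achieves, is_enforcing, dominates, and so on) are hypothetical.

```python
# Toy encoding of the set-based characterization (hypothetical names).
# achieves[s] stands in for Sigma^E_env(phi, s): the environment strategies
# (among those enforcing E) against which agent strategy s achieves phi.

ENV = {"e1", "e2", "e3"}            # stands in for Sigma^E_env
achieves = {
    "sigma_a": {"e1", "e2", "e3"},  # achieves phi against every env strategy
    "sigma_b": {"e1", "e2"},
    "sigma_c": {"e3"},
}

def is_enforcing(s):
    return achieves[s] == ENV

def dominates(s1, s2):              # s1 >=_{phi|E} s2
    return achieves[s2] <= achieves[s1]

def is_dominant(s):
    return all(dominates(s, t) for t in achieves)

def is_best_effort(s):              # maximal: no strategy strictly dominates s
    return not any(achieves[s] < achieves[t] for t in achieves)

for s in achieves:
    print(s, is_enforcing(s), is_dominant(s), is_best_effort(s))
# Here sigma_a is enforcing (hence dominant and best-effort), while sigma_b
# and sigma_c are neither, since sigma_a strictly dominates both.
```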
In particular, if σ2 is not maximal, say σ1 strictly dominates it, then an agent that uses σ2 is not "doing its best": if it used σ1 instead it could achieve the goal against a strictly larger set of environment strategies. This explains why, within this framework, maximal strategies are called "best-effort". When the goal ϕ and the env. specification E are clear from the context we may say "enforcing", "dominant", and "best-effort" without explicitly saying "for the goal ϕ under the environment specification E". An easy consequence of the definitions is the following:

Theorem 1 (Hierarchy). For a goal ϕ and environment specification E, every enforcing strategy is dominant, and every dominant strategy is best-effort. Moreover:
1. If there exists an enforcing agent strategy then the dominant agent strategies are exactly the enforcing ones.
2. If there exists a dominant agent strategy then the best-effort agent strategies are exactly the dominant ones.

Best-effort strategies always exist, and finding them is not harder than solving the classic synthesis problem.

Theorem 2. (Aminof, De Giacomo, and Rubin 2021) Given LTL formulas ϕ, E, there exists a best-effort strategy for ϕ under E. Furthermore, finding such a strategy can be done in 2EXPTIME.

The following example shows that dominant strategies (and thus also enforcing strategies) do not always exist.

Example 1. Let the environment and the agent each have a single variable (x and y, respectively), and consider a matching pennies game where the agent moves first. This can be encoded by the goal ϕ ≐ y ↔ X x. The environment specification is E ≐ true. Clearly, the agent has no strategy that enforces the goal. Moreover, there is no dominant strategy since the two agent strategies, i.e., σi in which the agent's first move is y = i for i ∈ {false, true}, are incomparable (technically, there are infinitely many agent strategies; however, effectively there are only two since only the first agent move is meaningful for the given goal). Indeed, let δi be the environment strategy that always does x = i, for i ∈ {false, true}, and note that PLAY(σi, δj) ⊨ ϕ iff i = j. Finally, note that all agent strategies are best-effort.

The following example shows a case where enforcing strategies do not exist, but dominant ones do.

Example 2. Consider a robotic vacuum cleaner which at each time step cleans one of two possible rooms. This is encoded using the agent variable set 𝒴 = {CA}, where CA = true (resp. CA = false) means that the robot cleans room A (resp. B). Also, at each time step, the environment can optionally add dirt to each room. This is encoded using the environment variables 𝒳 = {DA, DB}, where a true variable means that dirt is added to the corresponding room. Consider the goal ϕ ≐ G((DA → CA) ∧ (DB → ¬CA)) stating that when dirt is added to a room it should be cleaned in the same time step. The environment specification is E ≐ true. The agent has no enforcing strategy since the goal is violated if at any time both environment variables are true; however, it has dominant strategies: e.g., the strategy σ that at every time step cleans room A if dirt was added to it, and otherwise cleans room B. Finally, note that if we take E ≐ F(DA ∧ X DA) → G ¬DB, then more agent strategies become dominant: e.g., the strategy σ′ that behaves like σ except that, once it sees that DA holds for two consecutive time steps, it always cleans room A.
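As a small sanity check of Example 2, the following Python sketch (our own illustration; the names sigma and violates_goal are hypothetical) runs the dominant strategy σ on a finite sequence of environment moves and checks the safety condition of ϕ step by step.

```python
# Example 2 (sketch): the dominant cleaning strategy and a stepwise goal check.
# Environment move: (DA, DB) -- dirt added to room A / room B this step.
# Agent move: CA -- True cleans room A, False cleans room B.

def sigma(da: bool, db: bool) -> bool:
    """Dominant strategy from Example 2: clean A iff dirt was just added to A."""
    return da

def violates_goal(trace_prefix) -> bool:
    """Check the safety goal G((DA -> CA) and (DB -> not CA)) on a finite prefix."""
    for (da, db), ca in trace_prefix:
        if (da and not ca) or (db and ca):
            return True
    return False

# Any environment behaviour that never dirties both rooms at once is handled.
env_moves = [(True, False), (False, False), (False, True), (True, False)]
prefix = [((da, db), sigma(da, db)) for da, db in env_moves]
assert not violates_goal(prefix)

# No agent strategy can survive a step in which both rooms get dirty,
# which is why no enforcing strategy exists.
bad_prefix = [((True, True), sigma(True, True))]
assert violates_goal(bad_prefix)
```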
In Theorem 10 we show that finding a dominant strategy is 2EXPTIME-hard. This fact, combined with Theorem 1 and Theorem 2, implies that from a practical standpoint (at least when viewing things through the lens of complexity classes) one should always look for a best-effort strategy, instead of a dominant or an enforcing one. Indeed, the cost of finding a best-effort strategy is not higher than looking for a dominant or enforcing one, and in case a dominant (resp. enforcing) strategy exists, the returned best-effort strategy will be dominant (resp. enforcing). Hence, the main interest when it comes to dominant strategies (or enforcing ones for that matter) is deciding or characterizing when they exist.

## Characterizing Dominant Strategies

While the appeal of enforcing strategies is obvious, and the appeal of best-effort strategies was argued before (see (Berwanger 2007; Aminof, De Giacomo, and Rubin 2021) for more on this), one may justifiably ask what is the appeal of dominant strategies. One answer is the following: dominant strategies (like enforcing strategies) have the property that they are interchangeable. In other words, two dominant agent strategies σ1 and σ2 perform exactly the same in all environments: they either both achieve the goal or both fail to do so. On the other hand, choosing between two best-effort strategies σ1 and σ2 (that are not dominant) is always a gamble in the sense that against some non-empty set of environment strategies, σ1 achieves the goal and σ2 does not, while against another non-empty set of environment strategies the situation will be reversed. Thus: selecting an enforcing strategy ensures always getting the best imaginable outcome; selecting a dominant strategy ensures that one will always get the best possible outcome; whereas selecting a best-effort strategy ensures that one has made the best possible decision (given that the decision has to be made before the exact environment is known).

### Hopeful characterization

Another answer to the question of why dominant strategies are appealing is given by the following characterization, which states, intuitively, that a dominant strategy achieves the goal in all cases except when it is absolutely impossible. Call an environment strategy σenv hopeless (wrt ϕ) if it enforces ¬ϕ; otherwise call it hopeful. Thus, the agent has no way of achieving ϕ against hopeless environments.

Proposition 3. Given ϕ, E, an agent strategy σ is dominant (wrt ϕ|E) iff PLAY(σ, σenv) ⊨ ϕ for every hopeful environment strategy σenv ∈ Σ^E_env.

Proof. For the "if" direction: clearly, for every agent strategy σ′ we have that Σ^E_env(ϕ, σ′) does not contain any hopeless strategy; since σ achieves ϕ against every hopeful strategy in Σ^E_env, it follows that Σ^E_env(ϕ, σ′) ⊆ Σ^E_env(ϕ, σ), so σ is dominant. For the other direction, assume that σ is dominant and consider a hopeful environment strategy σenv ∈ Σ^E_env. Being hopeful, there is an agent strategy σ′ such that σenv ∈ Σ^E_env(ϕ, σ′), and since σ is dominant then Σ^E_env(ϕ, σ′) ⊆ Σ^E_env(ϕ, σ). Hence, σenv ∈ Σ^E_env(ϕ, σ) as required.
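Proposition 3 can also be checked on the finite-table encoding used in the earlier sketch; in the following Python fragment (again our own, with hypothetical names, and with E = true so that Σ^E_env is the whole set ENV) a strategy is dominant exactly when it achieves the goal against every hopeful environment strategy.

```python
# Toy check of Proposition 3 (hypothetical encoding, E = true).
wins = {                       # env strategies each agent strategy achieves phi against
    "sigma_1": {"e1", "e2"},
    "sigma_2": {"e1"},
}
ENV = {"e1", "e2", "e3"}       # e3 is achieved-against by nobody, i.e., it is hopeless
hopeful = {e for e in ENV if any(e in wins[s] for s in wins)}

def dominant(s):
    return all(wins[t] <= wins[s] for t in wins)

for s in wins:                 # dominant  <=>  wins against every hopeful env strategy
    assert dominant(s) == (hopeful <= wins[s])
```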
When there is no environment specification, i.e., when we take E = true, and there are no enforcing agent strategies, there is a somewhat surprising connection between the existence of dominant agent strategies and the existence of oblivious hopeless environment strategies:

Proposition 4. Given a goal ϕ, if the agent has a dominant strategy for ϕ that does not enforce ϕ, then there is an oblivious environment strategy enforcing ¬ϕ.

Proof. Let σag be a dominant but not enforcing strategy. By Theorem 1, the agent cannot enforce ϕ and thus, by the Borel determinacy (Martin 1975) of the corresponding game, the environment can enforce ¬ϕ, say using the strategy σenv. Let π ≐ PLAY(σag, σenv), and let σ′env be the oblivious strategy defined by letting σ′env(h) = π_{|h|}. Observe that PLAY(σag, σ′env) = π, and that π ⊭ ϕ. Since σag is dominant it follows that PLAY(σ, σ′env) ⊭ ϕ for every agent strategy σ, so σ′env enforces ¬ϕ, i.e., is hopeless.

Proposition 4 gives a necessary condition, namely the existence of an oblivious hopeless environment strategy, for the existence of dominant non-enforcing strategies for a goal ϕ. This condition is, however, not sufficient. To see that, take Example 1 but modify the goal to be ϕ ≐ ¬x ∧ (y ↔ X x). Note that an environment strategy that starts with x is hopeless, but there is no dominant strategy (use the same arguments as for the original goal in Example 1). At first glance, Proposition 4 suggests that dominant strategies may be somewhat rare: why should one expect that ¬ϕ can be enforced by an oblivious environment that is blind to the agent's actions? Note, however, that this proposition only covers the case with no environment specification, where it is quite likely that the modeling over-approximates the possible environments so much that it admits spurious oblivious hopeless strategies. Once an environment specification is introduced, such strategies would most probably fail to enforce it, and will be ignored by Definition 1.

### Monotonicity wrt tightening

Another desirable property shared by enforcing and dominant strategies is monotonicity with respect to tightening of environment specifications. Given E1, E2, we say that E1 is a tightening of E2 if Σ^{E1}_env ⊆ Σ^{E2}_env. Proposition 3 implies that, given a goal ϕ, every dominant strategy for ϕ under E2 is also dominant for ϕ under a tightening E1 of E2. I.e., tightening the environment specification cannot remove dominant strategies, but may add new ones. (The same holds for enforcing strategies, as can be easily deduced from the definition.)

Proposition 5. For a goal ϕ and env. specifications E1, E2 s.t. Σ^{E1}_env ⊆ Σ^{E2}_env, if σ is dominant (resp. enforcing) for ϕ under E2 then it is dominant (resp. enforcing) for ϕ under E1.

It is interesting to note that this monotonicity is not in general true for best-effort strategies. That is, a strategy that is best-effort for a goal under one environment specification may or may not be best-effort for the same goal under a tightening of that specification. In fact, the situation is quite intricate: (Aminof et al. 2021) show an example where tightening the specification reduces the number of best-effort strategies, as well as an example with three environments Σ^{E1}_env ⊆ Σ^{E2}_env ⊆ Σ^{E3}_env and an agent strategy σ that is best-effort for a goal ϕ under E1 as well as E3, but not under E2. We remark that the possible lack of monotonicity for best-effort strategies is mitigated by the following facts. First, by Theorem 1, when dominant or enforcing strategies exist then they are exactly the best-effort strategies. Second, it is shown in (Aminof et al. 2021) that given a goal ϕ and a chain of environment specifications Σ^{E1}_env ⊆ Σ^{E2}_env ⊆ ··· ⊆ Σ^{En}_env, one can always find an agent strategy σ that is simultaneously best-effort for ϕ under Ei for all 1 ≤ i ≤ n.
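The following small Python fragment (our own illustration, reusing the finite-table encoding of the earlier sketches; all names are hypothetical) illustrates Proposition 5 and the surrounding discussion: tightening the environment specification, i.e., shrinking the set of environment strategies, preserves dominant strategies and may create new ones.

```python
# Toy illustration of Proposition 5 (hypothetical encoding).
wins = {
    "sigma_1": {"e1", "e2"},
    "sigma_2": {"e1", "e3"},
    "sigma_3": {"e1", "e2", "e3"},
}

def dominant(s, env_set):
    """s is dominant under an env spec whose strategy set is env_set."""
    return all((wins[t] & env_set) <= (wins[s] & env_set) for t in wins)

E2 = {"e1", "e2", "e3"}    # looser specification
E1 = {"e1", "e2"}          # a tightening of E2 (a subset of E2)

assert dominant("sigma_3", E2) and dominant("sigma_3", E1)      # dominance is preserved
assert not dominant("sigma_1", E2) and dominant("sigma_1", E1)  # tightening may add dominant strategies
```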
### Local Characterization of Dominant Strategies

We now provide a local characterization of the dominant strategies. Let H_E(σag) be the set of histories h that are consistent with the agent strategy σag and for which Σ^{E,h}_env ≠ ∅.

Definition 4. Given ϕ, E, an agent strategy σag, and a history h ∈ H_E(σag), define val_{ϕ|E}(σag, h) as follows:
- val_{ϕ|E}(σag, h) := 1 (winning) if PLAY(σag, σenv) ⊨ ϕ for every σenv ∈ Σ^{E,h}_env;
- val_{ϕ|E}(σag, h) := −1 (losing) if PLAY(σag, σenv) ⊭ ϕ for every σenv ∈ Σ^{E,h}_env;
- val_{ϕ|E}(σag, h) := 0 (pending) otherwise.

We can compare agent strategies by looking at histories at which they make a different decision, i.e., "split". Formally, σ1, σ2 split at a history h if h ends in an environment move, is consistent with σ1 and with σ2, and σ1(h) ≠ σ2(h). The following proposition characterizes dominance by comparing the values of agent strategies at their split points.

Proposition 6 (Characterization of Dominance). (Aminof et al. 2020; Berwanger 2007) Given agent strategies σ1, σ2, we have that σ1 ≥_{ϕ|E} σ2 iff for every history h with Σ^{E,h}_env ≠ ∅ at which σ1, σ2 split: (1) val_{ϕ|E}(σ1, h) ≥ val_{ϕ|E}(σ2, h), and (2) it does not hold that val_{ϕ|E}(σ1, h) = val_{ϕ|E}(σ2, h) = 0.

The next definition assigns values to histories alone:

Definition 5. (Aminof et al. 2020; Berwanger 2007) For a history h, define val_{ϕ|E}(h) as the maximum of val_{ϕ|E}(σag, h), where σag varies over all agent strategies for which h ∈ H_E(σag); if there are no such strategies then write val_{ϕ|E}(h) = und (which stands for "undefined"). We also call a history h "winning", "pending", or "losing" if val_{ϕ|E}(h) is 1, 0, or −1, respectively.

Proposition 7 (Local characterisation of best-effort). (Aminof et al. 2020; Berwanger 2007) An agent strategy σ is best-effort (wrt ϕ|E) iff val_{ϕ|E}(σ, h) = val_{ϕ|E}(h) for every history h ∈ H_E(σ) that ends in an environment move.

The following characterizes dominant strategies.

Theorem 8 (Local characterisation of dominant). An agent strategy σ1 is dominant (wrt ϕ|E) iff for every h ∈ H_E(σ1) that ends in an environment move: (a) if val_{ϕ|E}(h) = 1 then val_{ϕ|E}(σ1, h) = 1, and (b) if val_{ϕ|E}(h) = 0 then (i) val_{ϕ|E}(σ1, h) = 0, and (ii) val_{ϕ|E}(h · Y) = −1 for every agent move Y ≠ σ1(h).

Proof. We start with the "if" direction. Let σ2 be any agent strategy, and let h be any history with Σ^{E,h}_env ≠ ∅ at which σ1, σ2 split. Note that this means, in particular, that h ∈ H_E(σ1), and thus conditions (a) and (b) above apply to h. To show that σ1 is dominant it is enough to show that conditions (1) and (2) of Proposition 6 hold. There are three cases to consider. First, if val_{ϕ|E}(h) = 1, then by (a) also val_{ϕ|E}(σ1, h) = 1, and so (1) and (2) hold. Second, if val_{ϕ|E}(h) = −1, then val_{ϕ|E}(σ1, h) = val_{ϕ|E}(σ2, h) = −1, and so (1) and (2) hold. Third, if val_{ϕ|E}(h) = 0 then by the first part of (b) we have that val_{ϕ|E}(σ1, h) = 0, so (1) holds; since σ1, σ2 split at h we have that σ1(h) ≠ σ2(h) and thus, by the second part of (b), val_{ϕ|E}(σ2, h) = val_{ϕ|E}(σ2, h · σ2(h)) = −1 and (2) holds.

For the other direction, assume that σ1 is dominant and let h be a history as described in the condition of this theorem. Since a dominant strategy is in particular maximal, by Proposition 7 we have val_{ϕ|E}(h) = val_{ϕ|E}(σ1, h), and thus (a) and the first part of (b) hold. Assume by contradiction that the second part of (b) does not hold, i.e., that there is some Y ≠ σ1(h) with val_{ϕ|E}(h · Y) ≠ −1. Since h ends in an environment move, val_{ϕ|E}(h · Y) ≤ val_{ϕ|E}(h) and thus, since val_{ϕ|E}(h) = 0, it must be that val_{ϕ|E}(h · Y) = 0.
Let σ2 be an agent strategy that witnesses the fact that val_{ϕ|E}(h · Y) = 0 (i.e., h · Y ∈ H_E(σ2) and val_{ϕ|E}(σ2, h · Y) = 0), and note that h · Y ∈ H_E(σ2) implies that σ2(h) = Y. It follows that val_{ϕ|E}(σ2, h) = 0 and that σ1, σ2 split at h. Recall that the assumption that σ1 is dominant implies that σ1 ≥_{ϕ|E} σ2, and obtain a contradiction to item (2) in Proposition 6.

Theorem 8 highlights an interesting fact about dominant strategies: all dominant strategies behave exactly the same on pending histories (that do not extend winning ones). In other words, the difference between two dominant strategies is only in how they win from winning histories, or lose from losing histories. This is the basis of the following characterization of when dominant strategies exist:

Theorem 9. The agent has a dominant strategy for ϕ under E iff for every history h that ends in an environment move: (a) either val_{ϕ|E}(h′) = 1 for some prefix h′ of h, or (b) val_{ϕ|E}(h · Y) = 0 for at most one agent move Y.

Proof. For the forward direction, let σ be a dominant strategy, and take some history h ending in an environment move. Consider the longest prefix h′ ∈ H_E(σ) of h ending in an environment move (h′ always exists since the empty history is in H_E(σ)). If val_{ϕ|E}(h′) = 1 then (a) holds. If val_{ϕ|E}(h′) ∈ {−1, und} then the same is true for all of the extensions of h′ and (b) holds. If val_{ϕ|E}(h′) = 0 then, by Theorem 8, we have that val_{ϕ|E}(h′ · Y) = −1 for every Y ≠ σ(h′) and that val_{ϕ|E}(h′ · σ(h′)) = 0; hence, if h = h′ then (b) holds and we are done. If, however, h ≠ h′, then either h extends h′ · Y for some Y ≠ σ(h′), in which case h and all its extensions are also losing and (b) holds; or h extends h′ · σ(h′) and thus (recall that h ends in an environment move) there is some environment move X such that h′′ ≐ h′ · σ(h′) · X is a prefix of h. Observe that h′′ is consistent with σ, and by our choice of h′ we have that h′′ ∉ H_E(σ). This implies that Σ^{E,h′′}_env = ∅, i.e., that h′′ is not consistent with any strategy in Σ^E_env. The same is obviously true for any extension of h′′ and thus, for any agent move Y, we have that val_{ϕ|E}(h · Y) = und, and (b) holds.

For the other direction, assume (a) and (b) above hold. Given a history h, observe that (⋆): if val_{ϕ|E}(h) = 0 then it must be that (b) holds (since extensions of winning histories cannot be pending). Let σ be a best-effort strategy (by Theorem 2 there is such a strategy), and apply Proposition 7 to σ in order to obtain, together with (⋆), that the conditions of Theorem 8 hold, and thus σ is dominant.

We remark that in the statement of Theorem 9, the case where val_{ϕ|E}(h) is undefined falls under condition (b).

## Computing Dominant Strategies

The main computational problem addressed in this work is to establish the complexity of, and give algorithms for, deciding whether a dominant strategy exists and, if so, finding one.

Definition 6 (Computational problem). Given an LTL/LTLf goal ϕ and environment specification E, the dominant synthesis problem under environment specifications is to find an agent strategy that is dominant for ϕ under E, or say there is none.

Theorem 10. The dominant synthesis problem for LTL/LTLf goals, even without env. specifications, is 2EXPTIME-hard. This is true even for the problem of deciding the existence of dominant strategies.

Proof. We will show the latter statement, from which the former follows. The proof is by reduction from the LTL/LTLf synthesis problem, known to be 2EXPTIME-hard (Pnueli and Rosner 1989; De Giacomo and Vardi 2015). Assume w.l.o.g. that x ∈ 𝒳 and y, y′ ∈ 𝒴 are some of the environment and agent variables, respectively.
Given a goal ϕ, consider the formula ψ ≐ (y → X ϕ) ∧ (¬y → (y′ ↔ X x)). Intuitively, ψ says that if the agent asserts y in its first move then it wants to play for ϕ, and otherwise it wants to play "rigged" matching pennies. Consider the history h ≐ {x} consisting of the single environment move asserting x (this choice is arbitrary, and any other move will do). Observe that this history is winning for ψ iff the agent can enforce ϕ. Furthermore, if the agent cannot enforce ϕ then there are at least two agent moves Y (namely Y′ ≐ {¬y, y′} and Y′′ ≐ {¬y, ¬y′}) with val_{ψ|true}(h · Y) = 0. Hence, by Theorem 9, there is a dominant strategy for ψ iff there is an enforcing strategy for ϕ. Note that for LTLf, the agent may terminate the matching pennies game too early, and thus we replace the second conjunct by ¬y → (X true ∧ (y′ ↔ X x)).

In the remainder of this section we develop algorithms for solving the dominant synthesis problem under environment specifications for LTL/LTLf. Our algorithm returns an enforcing strategy if there is one, a dominant strategy if there is one, and a best-effort strategy otherwise.

### Deterministic Transition Systems

A deterministic transition system D = (Σ, Q, ι, δ) consists of a finite input alphabet Σ (typically, Σ = 2^AP), a finite set Q of states, an initial state ι ∈ Q, and a transition function δ : Q × Σ → Q. The size of D is the number of its states. Let α = α0α1··· be a finite or infinite sequence of letters in Σ. The run (aka path) induced by α is the sequence q0q1··· of states where q0 = ι and qi+1 = δ(qi, αi) for every i < |α|. We extend δ to a function Q × Σ* → Q as follows: δ(q, λ) = q, and for n > 0, if qn = δ(q, α0···αn−1) then δ(q, α0α1···αn) = δ(qn, αn).

Definition 7. The product of two transition systems Di = (Σ, Qi, ιi, δi) (for i = 1, 2) over the same input alphabet is the transition system D1 × D2 = (Σ, Q, ι, δ) where Q = Q1 × Q2; ι = (ι1, ι2); and δ((q1, q2), x) = (δ1(q1, x), δ2(q2, x)). The product D1 × D2 × ··· × Dk can be similarly defined for any finite sequence D1, D2, ..., Dk of transition systems over the same input alphabet Σ.

### Automata

A Deterministic Finite Word automaton (DFA) A = (D, F) consists of a deterministic transition system D and a set F ⊆ Q of accepting states. A finite run ρ = q0q1···qn in D is called accepting if qn ∈ F. A finite string α ∈ Σ* is accepted by A iff its run is accepting. The set of finite strings accepted by A is the language of A. A Deterministic Parity Word automaton (DPA) A = (D, c) consists of a deterministic transition system D and a coloring function c : Q → ℤ. The index of A is the number of integers in the image of c, i.e., |{n ∈ ℤ | c⁻¹(n) ≠ ∅}|. An infinite run ρ = q0q1··· in D satisfies c (aka is accepting) iff the smallest n such that c(qi) = n for infinitely many i is even. An infinite string α ∈ Σ^ω is accepted by A iff the run induced by it satisfies c. The set of infinite strings accepted by A is the language of A.

Theorem 11 (Formulas to Automata). (cf. (Vardi 1995) and (De Giacomo and Vardi 2013))
1. One can build a DFA Aϕ, accepting exactly the models of an LTLf formula ϕ, whose size is at most 2EXP in |ϕ|.
2. One can build a DPA Aϕ, accepting exactly the models of an LTL formula ϕ, whose size is at most 2EXP in |ϕ| and whose index is at most EXP in |ϕ|.

We can lift the acceptance conditions of the components of D to D itself as follows:

Definition 8. For k-many DPAs Ai = (Di, ci) (or DFAs Ai = (Di, Fi)), and writing D for the product of their transition systems D1, ..., Dk, define the lifting of cj to D (resp. the lifting of Fj to D) to be the coloring function dj : Q → ℤ defined by dj(q1, ..., qk) ≐ cj(qj) (resp. the set Q1 × ··· × Qj−1 × Fj × Qj+1 × ··· × Qk). Note that (D, dj) is a DPA (resp. (D, Fj) is a DFA), and its language is the same as the language of the DPA (Dj, cj) (resp. the DFA (Dj, Fj)).
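The product and lifting constructions of Definitions 7 and 8 are purely mechanical; the following Python sketch (our own illustration, with hypothetical class and function names) shows one way to implement them for explicitly represented transition systems.

```python
from dataclasses import dataclass
from itertools import product as cartesian

@dataclass
class DTS:
    """Deterministic transition system: explicit alphabet, states, initial state, transitions."""
    alphabet: frozenset
    states: frozenset
    init: object
    delta: dict          # maps (state, letter) -> state

def product(systems):
    """Definition 7: synchronous product of transition systems over the same alphabet."""
    alphabet = systems[0].alphabet          # assumed common to all components
    states = frozenset(cartesian(*(s.states for s in systems)))
    init = tuple(s.init for s in systems)
    delta = {
        (q, a): tuple(s.delta[(qi, a)] for s, qi in zip(systems, q))
        for q in states for a in alphabet
    }
    return DTS(alphabet, states, init, delta)

def lift_coloring(color_j, j):
    """Definition 8: lift the j-th component's coloring to the product state space."""
    return lambda q: color_j(q[j])

def lift_final(final_j, j, product_states):
    """Definition 8: lift the j-th component's accepting set to the product state space."""
    return frozenset(q for q in product_states if q[j] in final_j)
```

The explicit product has as many states as the product of the component sizes, which is why the algorithm below only ever builds one such product.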
We sometimes limit the environment to moves that do not leave a set Q′ of states. We do this by adding a sink state, and redirecting to it those moves of the environment that start in Q′ and for which some response of the agent leaves Q′:

Definition 9. Let D = (Σ, Q, ι, δ) be a transition system and let Q′ ⊆ Q be a set of states. The environment-restriction of D to Q′ is the transition system (Σ, Q ∪ {sink}, ι, δ′), where δ′ agrees with δ except that δ′(q, (X ∪ Y)) = sink in case q = sink, or q ∈ Q′ and δ(q, (X ∪ Z)) ∉ Q′ for some Z ⊆ 𝒴.

### Games on Deterministic Automata

It is known that solving synthesis (without environment specifications) for an LTL/LTLf goal ϕ can be reduced to solving a two-player game of perfect information on the DPA/DFA corresponding to ϕ. We describe this process below since we will later generalize it. The contents of this section are an adaptation of standard material on game graphs (Apt and Grädel 2011). Informally, the current position in the game is a state q of the automaton; first the environment moves by setting X ⊆ 𝒳, then the agent follows by setting Y ⊆ 𝒴, and the position in the game is updated to the state δ(q, (X ∪ Y)). This interaction generates an infinite run, and the agent is declared the winner if the run is accepting.

Formally, a DPA-game (resp. a DFA-game) is played on a DPA A = (D, c) (resp. DFA A = (D, F)) by two players: agent and environment. The transition system D = (2^{𝒳∪𝒴}, Q, ι, δ) is called the arena, and the acceptance condition of A is called the objective. An agent strategy σag is winning if for every environment strategy σenv, the trace PLAY(σag, σenv) is accepted by A. Similarly, an environment strategy σenv is winning if for every agent strategy σag, the trace PLAY(σag, σenv) is not accepted by A (the objective is stated from the agent's point of view). A player's winning region in the game on A is the set of states q for which that player has a winning strategy in the game with the same objective but on the arena (2^{𝒳∪𝒴}, Q, q, δ), i.e., starting from q. A strategy winning from every state in the winning region is called uniform winning.

If the agent makes its moves just by looking at the current environment move and the current state of the automaton (instead of the exact full history of environment moves), then we say that the agent is using a positional strategy. Formally, a function fag : Q × 2^𝒳 → 2^𝒴 induces the positional agent strategy σag: σag(X0) = fag(ι, X0), and for n > 0 and α = X0Y0X1Y1···Xn−1Yn−1, define σag(α · Xn) = fag(qn, Xn) where qn ≐ δ(ι, α). We can similarly define environment positional strategies as functions fenv : Q → 2^𝒳.

The relevant computational problem associated with a DPA/DFA-game is to compute, for a given player, that player's winning region W, as well as a uniform winning positional strategy for that player. We call this solving the game. This can be done for DPA/DFA-games by a simple fixed-point algorithm.

Theorem 12 (cf. (Apt and Grädel 2011)). DFA-games can be solved in time polynomial in the size of the given DFA A; DPA-games can be solved in time polynomial in the size, and EXP in the index, of the given DPA A.
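To give a flavor of the fixed-point computation behind Theorem 12, here is a sketch of the standard least-fixed-point solver for DFA-games (our own illustration, not the paper's algorithm), under the usual assumption that the agent wins a DFA-game by forcing the run into the accepting set F; the data layout (dicts indexed by state and moves) is hypothetical.

```python
def solve_dfa_game(states, env_moves, ag_moves, delta, final):
    """
    Least fixed point for DFA-games: W_0 = F, and
    W_{i+1} = W_i + {q : for every env move X there is an agent move Y
                         with delta[(q, X, Y)] in W_i}.
    Returns the agent's winning region and a positional choice f(q)[X] -> Y.
    """
    win = set(final)
    strategy = {}
    changed = True
    while changed:
        changed = False
        for q in states:
            if q in win:
                continue
            choice = {}
            for X in env_moves:
                Y = next((Y for Y in ag_moves if delta[(q, X, Y)] in win), None)
                if Y is None:
                    break                  # this env move has no good answer yet
                choice[X] = Y
            else:                          # every env move has a good agent answer
                win.add(q)
                strategy[q] = choice
                changed = True
    return win, strategy
```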
We also consider the case where the agent and environment co-operate. Formally, a pair of strategies σag, σenv is co-operatively winning if the trace PLAY(σag, σenv) is accepted by the automaton A. The co-operative winning region W′ of a DPA/DFA-game on A is the set of states q for which there is a pair of strategies that are co-operatively winning in the game with the same objective but on the arena (2^{𝒳∪𝒴}, Q, q, δ), i.e., starting from q. Solving a co-operative DPA/DFA-game on A means finding the set W′, as well as a pair of uniform positional strategies (although we will only use the agent's) that co-operatively win from every state in W′. Note that this amounts to solving the emptiness problem for A.

Theorem 13 (cf. (Apt and Grädel 2011)). Co-operative DFA-games can be solved in time polynomial in the size of the given DFA A; co-operative DPA-games can be solved in time polynomial in the size and index of the given DPA A.

### Unified Synthesis Algorithm

We now present our algorithms for solving the dominant synthesis problem under environment specifications for LTL/LTLf. We present both algorithms together as a single algorithm since the steps are the same, and the only difference between LTL and LTLf is handled by the type of automata and games we use. The algorithm is an extension of the best-effort synthesis algorithm from (Aminof, De Giacomo, and Rubin 2021), and the fact that it correctly finds a best-effort strategy is shown there. We enhance this algorithm by adding checks of whether an enforcing or a dominant strategy exists; by Theorem 1, if an enforcing (resp. dominant) strategy exists then the best-effort strategy found is enforcing (resp. dominant).

Unifying Algorithm. Given LTL/LTLf formulas ϕ, E:
1. Using Theorem 11, for every ξ ∈ {¬E, E → ϕ, E ∧ ϕ} compute the DPA (resp. DFA) Aξ = (Dξ, Fξ).
2. Using Definition 7, form (in linear time) the product D = D_{¬E} × D_{E→ϕ} × D_{E∧ϕ}. Using Definition 8, lift the final states of each component to the product, e.g., the lifted condition F_{¬E} consists of all states (q_{¬E}, q_{E→ϕ}, q_{E∧ϕ}) ∈ Q s.t. q_{¬E} ∈ F_{¬E}. Let δ denote D's transition function.
3. Using Theorem 12, compute a uniform positional winning agent strategy fag in the DPA-game (resp. DFA-game) (D, F_{E→ϕ}). Let W ⊆ Q be the agent's winning region.
4. If the initial state of D is in W then set EnforceFlag := true, and go to step 8.
5. Using Theorem 12, find the environment's winning region V ⊆ Q in the DPA-game (resp. DFA-game) (D, F_{¬E}).
6. Using Definition 9, compute (in linear time) the environment-restriction D′ of D to the set V.
7. Using Theorem 13, compute a co-operatively winning uniform positional strategy gag in the DPA-game (resp. DFA-game) (D′, F_{E∧ϕ}). Let W′ ⊆ Q be the co-operative winning region.
8. Compute the agent strategy σag induced by the positional strategy kag : Q × 2^𝒳 → 2^𝒴 defined as follows: if EnforceFlag is true then define kag(q, X) = fag(q, X) for q ∈ W, and kag(q, X) arbitrarily for q ∉ W; otherwise, define kag(q, X) = fag(q, X) if q ∈ W, and kag(q, X) = gag(q, X) for q ∉ W.
9. If EnforceFlag is true, then return σag and the statement "the strategy σag is enforcing".
10. Otherwise, if there is some state q ∈ Q, reachable from the initial state, such that (a) there is no path from the initial state to q that visits a state in W; and (b) there is some env. move X and two distinct agent moves Y′, Y′′ such that both δ(q, X ∪ Y′) and δ(q, X ∪ Y′′) are in W′, then return σag and the statement "the strategy σag is best-effort, and there is no dominant strategy". These conditions can be checked in polynomial time in the size of D.
11. Otherwise, return σag and the statement "the strategy σag is dominant, and there is no enforcing strategy".
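The control flow of the Unifying Algorithm can be summarized by the following Python-style sketch (our own rendering, not the authors' implementation). All helpers are hypothetical stand-ins: to_automaton for Theorem 11, solve_game and solve_coop_game for Theorems 12 and 13, product and lift for Definitions 7 and 8, restrict for Definition 9, and exists_split_state for the reachability check of step 10.

```python
# Structural sketch of the Unifying Algorithm (hypothetical helpers throughout).
def unified_synthesis(phi, E, to_automaton, product, lift, solve_game,
                      solve_coop_game, restrict, exists_split_state):
    # Steps 1-2: automata for not E, E -> phi, E and phi, and their product arena.
    A_notE  = to_automaton(("not", E))
    A_E_phi = to_automaton(("implies", E, phi))
    A_E_and = to_automaton(("and", E, phi))
    D = product([A_notE.system, A_E_phi.system, A_E_and.system])
    F_notE, F_E_phi, F_E_and = (lift(A, i, D) for i, A in
                                enumerate([A_notE, A_E_phi, A_E_and]))

    # Step 3: try to win the adversarial game for E -> phi.
    f_ag, W = solve_game(D, F_E_phi)

    # Steps 4 and 9: if the initial state is winning, the strategy is enforcing.
    if D.init in W:
        return f_ag, "enforcing"

    # Steps 5-7: restrict the environment to its winning region for E,
    # then co-operate for E and phi inside that restriction.
    _, V = solve_game(D, F_notE, player="environment")
    D_restricted = restrict(D, V)
    g_ag, W_coop = solve_coop_game(D_restricted, F_E_and)

    # Step 8: combine the two positional strategies.
    def k_ag(q, X):
        return f_ag(q, X) if q in W else g_ag(q, X)

    # Steps 10-11: report whether k_ag is merely best-effort or in fact dominant.
    if exists_split_state(D, W, W_coop):
        return k_ag, "best-effort (no dominant strategy exists)"
    return k_ag, "dominant (no enforcing strategy exists)"
```

The sketch only shows how the three solution concepts are read off a single pass over the same product arena; all the actual work happens inside the hypothetical helpers.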
The last two steps of the algorithm are correct by Theorem 9 and Theorem 1. The algorithm above, together with Theorem 10, gives us:

Theorem 14. Dominant synthesis under environment specifications (for LTL/LTLf goals and environment specifications) is 2EXPTIME-complete.

## Related Work

We have discussed the closely related work on (classic) synthesis (Pnueli and Rosner 1989), synthesis under environment specifications (Aminof et al. 2019), and best-effort synthesis (Aminof, De Giacomo, and Rubin 2021). We now discuss trace-based variants of the notion of dominant synthesis. (Damm and Finkbeiner 2011, 2014; Finkbeiner and Passing 2020) study remorse-free dominance, which can be expressed using our terminology as follows: for an agent goal ϕ, define σ1 ≥tr σ2 if PLAY(σ2, σenv) ⊨ ϕ implies PLAY(σ1, σenv) ⊨ ϕ for every oblivious strategy σenv; dominant strategies are then defined to be maximums wrt ≥tr. They provide an optimal automata-theoretic algorithm that solves synthesis of dominant strategies for LTL goals; prove that a dominant strategy exists iff there is a unique weakest environment assumption that would guarantee a winning strategy; and apply this to provide an incremental algorithm for solving distributed synthesis, which finds a dominant strategy for one process at a time, propagating the assumption to the processes yet to be synthesized. An identical notion, called good-enough synthesis, is studied in (Almagor and Kupferman 2020) (although these papers define their notions differently, they can easily be seen to be equivalent by a slight variant of Proposition 3), mainly in a multi-valued setting, and in (Li et al. 2021) in the single-agent setting for LTLf goals. We remark that the restriction to oblivious environment strategies inherent in these trace-based works is severe, and results in different properties than those we have in our much broader setting.

## Acknowledgments

This work is partially supported by the Austrian Science Fund (FWF) P 32021, by the ERC Advanced Grant WhiteMech (No. 834228), by the EU ICT-48 2020 project TAILOR (No. 952215), and by the PRIN project RIPER (No. 20203FFYLK).

## References

Almagor, S.; and Kupferman, O. 2020. Good-Enough Synthesis. In Lahiri, S. K.; and Wang, C., eds., CAV, volume 12225 of Lecture Notes in Computer Science, 541–563. Springer.
Aminof, B.; De Giacomo, G.; Lomuscio, A.; Murano, A.; and Rubin, S. 2020. Synthesizing strategies under expected and exceptional environment behaviors. In IJCAI.
Aminof, B.; De Giacomo, G.; Lomuscio, A.; Murano, A.; and Rubin, S. 2021. Synthesizing Best-effort Strategies under Hierarchical Environment Specifications. In KR.
Aminof, B.; De Giacomo, G.; Murano, A.; and Rubin, S. 2018. Synthesis under Assumptions. In KR.
Aminof, B.; De Giacomo, G.; Murano, A.; and Rubin, S. 2019. Planning under LTL Environment Specifications. In ICAPS.
Aminof, B.; De Giacomo, G.; and Rubin, S. 2021. Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. In IJCAI.
Apt, K.; and Grädel, E. 2011. Lectures in Game Theory for Computer Scientists. Cambridge.
Bacchus, F.; and Kabanza, F. 2000. Using temporal logics to express search control knowledge for planning. Artif. Intell., 116(1-2).
Baier, J. A.; and McIlraith, S. A. 2006. Planning with First Order Temporally Extended Goals using Heuristic Search. In AAAI.
Berwanger, D. 2007. Admissibility in Infinite Games. In STACS.
Church, A. 1963. Logic, arithmetics, and automata.
In Proc. Int. Cong. Mathematicians, 1962.
Cimatti, A.; Pistore, M.; Roveri, M.; and Traverso, P. 2003. Weak, Strong, and Strong Cyclic Planning via Symbolic Model Checking. AIJ, 147(1–2).
Damm, W.; and Finkbeiner, B. 2011. Does It Pay to Extend the Perimeter of a World Model? In FM 2011. Springer.
Damm, W.; and Finkbeiner, B. 2014. Automatic Compositional Synthesis of Distributed Systems. In FM.
Daniele, M.; Traverso, P.; and Vardi, M. 1999. Strong Cyclic Planning Revisited. In ECP.
De Giacomo, G.; and Vardi, M. 2015. Synthesis for LTL and LDL on finite traces. In IJCAI.
De Giacomo, G.; and Vardi, M. Y. 2013. Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. In IJCAI.
Faella, M. 2009. Admissible Strategies in Infinite Games over Graphs. In MFCS.
Finkbeiner, B.; and Passing, N. 2020. Dependency-Based Compositional Synthesis. In ATVA.
Geffner, H.; and Bonet, B. 2013. A Concise Introduction to Models and Methods for Automated Planning. Morgan & Claypool.
Green, C. 1969. Theorem Proving by Resolution as Basis for Question-Answering Systems. In Machine Intelligence, volume 4, 183–205. American Elsevier.
Li, Y.; Turrini, A.; Vardi, M. Y.; and Zhang, L. 2021. Synthesizing Good-Enough Strategies for LTLf Specifications. In IJCAI, 4144–4151. ijcai.org.
Martin, D. A. 1975. Borel determinacy. Annals of Mathematics, 363–371.
McCarthy, J.; and Hayes, P. J. 1969. Some Philosophical Problems from the Standpoint of Artificial Intelligence. Machine Intelligence, 4: 463–502.
Pnueli, A.; and Rosner, R. 1989. On the Synthesis of a Reactive Module. In POPL.
Vardi, M. Y. 1995. An Automata-Theoretic Approach to Linear Temporal Logic. In Moller, F.; and Birtwistle, G. M., eds., Logics for Concurrency: Structure versus Automata, volume 1043 of LNCS.