# Moral Planning Agents with LTL Values

Umberto Grandi, Emiliano Lorini, Timothy Parker
IRIT, CNRS, University of Toulouse, Toulouse, France
{umberto.grandi,emiliano.lorini,timothy.parker}@irit.fr

A moral planning agent (MPA) seeks to compare two plans or compute an optimal plan in an interactive setting with other agents, where the relative ideality and optimality of plans are defined with respect to a prioritized value base. We model MPAs whose values are expressed by formulas of linear temporal logic (LTL) and define comparison for both joint plans and individual plans. We introduce different evaluation criteria for individual plans, including an optimistic (risk-seeking) criterion, a pessimistic (risk-averse) one, and two criteria based on the use of anticipated responsibility. We provide complexity results for a variety of MPA problems.

## 1 Introduction

Evaluation is a core concept in cognitive theories of action [Gollwitzer, 1996], emotion [Moors et al., 2013], knowledge and beliefs [Abelson, 1979], and in the connection between epistemic and motivational attitudes [Miceli and Castelfranchi, 2000]. It is the operation of comparing the goodness or desirability of options from a given set of alternatives in relation to a set of goals. It is also important for ethics, where the philosophical doctrine of pluralistic consequentialism [Sen, 1985; Sen, 1987] and recent theories of reason-based choice [Dietrich and List, 2013; Dietrich and List, 2017] have emphasized its crucial role in decision-making. According to pluralistic consequentialism, a moral agent has to weigh different and sometimes conflicting values to assess the relative ideality of different alternatives. Evaluation also plays a pivotal role in some existing computational models of ethical deliberation and planning in robotics [Arkin et al., 2012; Vanderelst and Winfield, 2018] and in AI [Rodriguez-Soto et al., 2020; Serramia et al., 2018; Ajmeri et al., 2020; Cranefield et al., 2017]. More recently, it was also formalized in a multi-modal logic of values, knowledge and preferences [Lorini, 2021].

In this paper we study the role of evaluation in a novel multi-agent planning setting where agents share a set of ethical values expressed by formulas of linear temporal logic (LTL), ranked according to their priority. In particular, we focus on evaluation in the context of moral planning agents (MPAs). MPAs compare plans, which are either joint plans or individual plans, lexicographically with respect to their prioritized value base. A joint plan is a finite sequence of joint actions performed by a coalition of agents, while an individual plan is a finite sequence of individual actions performed by a single agent. The notion of joint plan is applicable whenever planning is delegated to a central planner which has to compute the optimal solution for all agents, each of whom will then execute their part of the joint plan on their own. Alternatively, the notion of individual plan is relevant for decentralized applications where agents do not know each other's plans. In order for an agent to compare individual plans, it must consider all possible outcomes given the possible actions of the other agents. We study several evaluation criteria for individual plans: an optimistic (risk-seeking) criterion, a pessimistic (risk-averse) one, and criteria based on intrinsically ethical notions of responsibility.
An optimistic agent compares individual plans by considering, for each plan, the best possible history that could result. A pessimistic agent considers only the worst possible histories. Finally, an agent sensitive to anticipated blameworthiness will be concerned by the possible violation of some ethical values that it brought about (anticipated active blameworthiness) or that it could have prevented (anticipated passive blameworthiness). These two notions of blameworthiness rely on more primitive concepts of active and passive responsibility. Similar notions of optimism and pessimism have been studied in the domain of qualitative decision theory [Brafman and Tennenholtz, 2018], while concepts of responsibility have been formalized in logical settings [Lorini et al., 2014], in causal models [Chockler and Halpern, 2004] and in game-theoretic settings [Braham and van Hees, 2012; Lorini and Mühlenbernd, 2015; Lorini and Mühlenbernd, 2018]. However, to the best of our knowledge, the use of responsibility and blameworthiness in plan evaluation is novel.

The paper is organised as follows: Section 2 defines our model of multi-agent planning and introduces an illustrative example of our model in action. Section 3 describes how we evaluate joint plans and demonstrates this in our example. Section 4 focuses on individual plans, introducing and discussing optimistic and pessimistic comparison, as well as our notion of blameworthiness. Section 5 analyses the computational complexity of the comparison notions introduced in Section 4. Section 6 situates our paper in the wider fields of ethics in AI and planning, and compares our work to a number of similar papers. Finally, Section 7 summarises the paper and suggests directions for future work.

## 2 The Model

In this section we introduce and define our planning framework for Moral Planning Agents. Our model is grounded on the logical theory of ethical choice presented by Lorini [2015], which is in turn based on the situation calculus [Reiter, 2001]. We use a compact representation of values as formulas of linear temporal logic, in line with the work of Bienvenu et al. [2006] on preference-based temporal planning.

### 2.1 Agents, Actions, and Histories

Let $Agt$ be a set of agents, and let $Prop$ be a countable set of atomic propositions, defining a set of states $S = 2^{Prop}$, with elements $s, s', \ldots$. Let $Act$ be a finite non-empty set of action names. Elements of $Prop$ are noted $p, q, \ldots$, while elements of $Act$ are noted $a, b, \ldots$. We also assume the existence of a special action $\mathit{skip}$. We define a $k$-history to be a pair $H = (H_{st}, H_{act})$ with $H_{st} : \{0, \ldots, k\} \rightarrow S$ and $H_{act} : Agt \times \{0, \ldots, k-1\} \rightarrow Act$. A history specifies the actual configuration of the environment (i.e., the state) at a certain time point and the actions executed by each of the agents, which determine the next state. The set of $k$-histories is noted $Hist_k$. The set of all histories is $Hist = \bigcup_{k \in \mathbb{N}} Hist_k$.
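To make these basic objects concrete, the following minimal Python sketch (our own illustration; the class and field names are not from the paper) represents a state as a set of atomic propositions and a $k$-history as a pair of a state sequence and per-step action assignments.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A state is the set of atomic propositions that hold in it (S = 2^Prop).
State = frozenset  # e.g. frozenset({"Has(Rob,1)", "Has(Rob,2)"})

@dataclass(frozen=True)
class History:
    """A k-history H = (H_st, H_act): k+1 states and, for each of the
    k transitions, the action performed by every agent."""
    states: Tuple[State, ...]            # H_st(0), ..., H_st(k)
    actions: Tuple[Dict[str, str], ...]  # H_act(., t) for t = 0, ..., k-1

    def __post_init__(self):
        assert len(self.actions) == len(self.states) - 1

    @property
    def k(self) -> int:
        return len(self.states) - 1

# Example: Rob hands toy 1 to Adam while the children skip.
h = History(
    states=(frozenset({"Has(Rob,1)", "Has(Rob,2)"}),
            frozenset({"Has(Adam,1)", "Has(Rob,2)"})),
    actions=({"Rob": "Move(Rob,Adam,1)", "Adam": "skip", "Beth": "skip"},),
)
print(h.k)  # 1
```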
### 2.2 Multi-Agent Action Theory

To represent action effects and preconditions compactly, we use a multi-agent action theory inspired by the situation calculus [Reiter, 2001] and developed by Lorini [2015]. Let $L_{PL}$ be the standard propositional language built from the atomic formulas $p$, for $p \in Prop$, and $do(i, a)$, for $i \in Agt$ and $a \in Act$.

We suppose actions in $Act$ are described by an action theory $\gamma = (\gamma^+, \gamma^-)$, where $\gamma^+$ and $\gamma^-$ are, respectively, the positive and negative effect precondition functions $\gamma^+ : Agt \times Act \times Prop \rightarrow L_{PL}$ and $\gamma^- : Agt \times Act \times Prop \rightarrow L_{PL}$. If $\gamma^+(i, a, p)$ holds at the current state, then proposition $p$ is guaranteed to be true in the next state when action $a$ is executed by agent $i$ (provided no other action interferes), while if $\gamma^-(i, a, p)$ holds, then $p$ is guaranteed to be false in the next state when $a$ is executed by $i$ (without interference). In case of conflicts between actions, we use an inertial principle: if $\gamma^+(i, a, p)$ and $\gamma^-(j, b, p)$ are both true at a given state and actions $a$ and $b$ (which may be the same action) are executed by agents $i$ and $j$ (which may also be the same), then the truth value of $p$ will not change in the next state. Furthermore, in cases where not all actions are available to all agents, we can simply set $\gamma^+(i, a, p) = \gamma^-(i, a, p) = \bot$ for all $p \in Prop$ to signal that action $a$ is not available to agent $i$. We also assume the existence of the special action $\mathit{skip}$, such that $\gamma^+(i, \mathit{skip}, p) = \gamma^-(i, \mathit{skip}, p) = \bot$ for all $i$ and $p$. Equivalently, we could define $\mathit{skip}$ as follows: $\gamma^+(i, \mathit{skip}, p) = p$ and $\gamma^-(i, \mathit{skip}, p) = \neg p$ for all $i$ and $p$.

**Definition 1** (Action-compatible histories). Let $\gamma = (\gamma^+, \gamma^-)$ be an action theory and let $H = (H_{st}, H_{act})$ be a $k$-history. We say $H$ is compatible with $\gamma$ if the following condition holds for every $t \in \{0, \ldots, k-1\}$: $H_{st}(t + 1) = (H_{st}(t) \setminus X) \cup Y$, where:

$X = \{p \in Prop : \exists i \in Agt, \exists a \in Act \text{ such that } H_{act}(i, t) = a \text{ and } H, t \models \gamma^-(i, a, p), \text{ and } \forall j \in Agt, \forall b \in Act, \text{ if } H_{act}(j, t) = b \text{ then } H, t \not\models \gamma^+(j, b, p)\}$

$Y = \{p \in Prop : \exists i \in Agt, \exists a \in Act \text{ such that } H_{act}(i, t) = a \text{ and } H, t \models \gamma^+(i, a, p), \text{ and } \forall j \in Agt, \forall b \in Act, \text{ if } H_{act}(j, t) = b \text{ then } H, t \not\models \gamma^-(j, b, p)\}$

In words, a history $H$ is compatible with the action theory $\gamma = (\gamma^+, \gamma^-)$ if its state transitions respect the theory. This means that all propositional facts for which the negative effect precondition of some executed action holds, while the positive effect preconditions of all executed actions do not hold, become false. In addition, all propositional facts for which the positive effect precondition of some executed action holds, while the negative effect preconditions of all executed actions do not hold, become true. All other propositional facts do not change their truth values.

### 2.3 Action Sequences and Joint Plans

Let us now move from the notion of action to the notion of plan. Given $k \in \mathbb{N}$, a $k$-action-sequence is a function $\pi : \{0, \ldots, k-1\} \rightarrow Act$. The set of $k$-action-sequences is noted $Seq_k$. For a (non-empty) coalition of agents $J \in 2^{Agt} \setminus \{\emptyset\}$ we can define a joint $k$-plan as a function $\Pi : J \rightarrow Seq_k$. If $J$ is a singleton set then we call $\Pi$ an individual plan. The set of joint $k$-plans for a coalition $J$ is written $Plan^J_k$. The set of all joint plans for a non-empty coalition $J$ is $Plan^J = \bigcup_{k \in \mathbb{N}} Plan^J_k$. Given a joint plan $\Pi$ for coalition $J$ and a sub-coalition $J' \subseteq J$, we write the joint plan of coalition $J'$ in $\Pi$ as $\Pi_{J'}$; we also write $\Pi_{-J'}$ for the joint plan of the sub-coalition $J \setminus J'$.

**Definition 2** (Plan compatibility). Given $\Pi_1 \in Plan^J_{k_1}$ and $\Pi_2 \in Plan^{J'}_{k_2}$ for a sub-coalition $J' \subseteq J$ and $k_2 \leq k_1$, we say that $\Pi_1$ is compatible with $\Pi_2$ if $\Pi_1(j)(t) = \Pi_2(j)(t)$ for all $t \in \{0, \ldots, k_2 - 1\}$ and all $j \in J'$, and $\Pi_1(j)(t) = \mathit{skip}$ for all $t \in \{k_2, \ldots, k_1 - 1\}$ and all $j \in J'$.

Given two $k$-plans $\Pi_1$ and $\Pi_2$ for disjoint coalitions $J_1, J_2$, we write $\Pi_1 \oplus \Pi_2$ (the combination of $\Pi_1$ and $\Pi_2$) for the shortest joint plan for $J_1 \cup J_2$ such that $(\Pi_1 \oplus \Pi_2)_{J_1}$ is compatible with $\Pi_1$ and $(\Pi_1 \oplus \Pi_2)_{J_2}$ is compatible with $\Pi_2$.
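As a rough illustration of Definition 2 and of plan combination, here is a small Python sketch (our own naming and representation, under the assumption that a joint $k$-plan is stored as a mapping from agent names to action sequences); it checks compatibility with a shorter plan for a sub-coalition and combines plans for disjoint coalitions by padding with skip.

```python
from typing import Dict, List

JointPlan = Dict[str, List[str]]   # agent name -> action sequence of length k

def is_compatible(plan1: JointPlan, plan2: JointPlan) -> bool:
    """Definition 2: plan1 (length k1) is compatible with plan2 (for a
    sub-coalition, length k2 <= k1) if they agree on the first k2 steps of
    every agent in the sub-coalition, and plan1 only uses 'skip' afterwards."""
    if not set(plan2) <= set(plan1):
        return False
    k1 = max((len(seq) for seq in plan1.values()), default=0)
    k2 = max((len(seq) for seq in plan2.values()), default=0)
    if k2 > k1:
        return False
    return all(plan1[j][:k2] == plan2[j] and
               all(a == "skip" for a in plan1[j][k2:])
               for j in plan2)

def combine(plan1: JointPlan, plan2: JointPlan) -> JointPlan:
    """Combination of two plans for disjoint coalitions: the shortest joint
    plan whose restrictions are compatible with plan1 and plan2 (shorter
    sequences are padded with 'skip')."""
    assert not set(plan1) & set(plan2), "coalitions must be disjoint"
    k = max((len(seq) for seq in list(plan1.values()) + list(plan2.values())),
            default=0)
    return {j: seq + ["skip"] * (k - len(seq))
            for j, seq in {**plan1, **plan2}.items()}
```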
The following definition introduces the notion of the history generated by a joint $k$-plan $\Pi$ at an initial state $s_0$. It is the action-compatible $k$-history along which the agents jointly execute the plan $\Pi$ starting at state $s_0$.

**Definition 3** (History generated by a joint $k$-plan). Let $\gamma = (\gamma^+, \gamma^-)$ be an action theory, $s_0 \in S$ and $\Pi \in Plan^{Agt}_k$. Then, the history generated by $\Pi$ from state $s_0$ in conformity with $\gamma$ is the $k$-history $H^{\Pi,s_0,\gamma} = (H^{\Pi,s_0,\gamma}_{st}, H^{\Pi,s_0,\gamma}_{act})$ such that: i) $H^{\Pi,s_0,\gamma} \in Hist(\gamma)$, the set of $\gamma$-compatible histories; ii) $H^{\Pi,s_0,\gamma}_{st}(0) = s_0$; iii) for all $t$ with $0 \leq t \leq k-1$ and all $i \in Agt$: $H^{\Pi,s_0,\gamma}_{act}(i, t) = \Pi(i)(t)$.

**Example 1** (Toy-sharing). Consider a childcare robot that shares a set of four toys between two children. We model this by supposing that there are three agents, called Adam, Beth and Rob (the robot). The available actions are the skip action and 24 actions written in the form Move(i, j, A), where the acting agent attempts to move toy A from agent i to agent j (where i ≠ j). There are 12 atomic propositions, each written Has(i, A), representing that agent i has toy A. The set of toys is written toys. The action theory γtoys states that the action Move(i, j, A) succeeds exactly when toy A is with agent i and no other agent attempts to take toy A from agent i. In case of such a conflict, the toy remains where it is. The full formalisation of γtoys is in the supplementary material.

Consider an initial state with only two toys, both of which are held by Rob, that is s0 = {Has(Rob,1), Has(Rob,2)}. Consider the following joint 2-plan Π1:

Rob: [0 ↦ Move(Rob,Adam,1), 1 ↦ Move(Rob,Beth,2)], Adam: [0 ↦ skip, 1 ↦ skip], Beth: [0 ↦ skip, 1 ↦ skip]

In this plan Rob gives one toy to each child in turn, meaning that the final state will be s2 = {Has(Adam,1), Has(Beth,2)}.

### 2.4 Linear Temporal Logic

In order to represent agents' values in a temporal planning situation, we introduce the language of LTLf (Linear Temporal Logic over finite traces) [Pnueli, 1977; De Giacomo and Vardi, 2013; De Giacomo and Vardi, 2015], which we denote $L_{LTL_f}$, defined by the following grammar:

$\varphi ::= p \mid do(i, a) \mid \neg\varphi \mid \varphi \wedge \varphi \mid X\varphi \mid \varphi \,U\, \varphi$,

with $p \in Prop$, $i \in Agt$ and $a \in Act$. Atomic formulas in this language are those that consist of a single proposition $p$ or a single instance of $do(i, a)$. $X$ and $U$ are the operators *next* and *until* of LTLf. The operators *henceforth* ($G$) and *eventually* ($F$) are defined in the usual way: $G\varphi \stackrel{def}{=} \neg(\top \,U\, \neg\varphi)$ and $F\varphi \stackrel{def}{=} \neg G \neg\varphi$.

The semantic interpretation of formulas in $L_{LTL_f}$ relative to a $k$-history $H \in Hist$ and a time point $t \in \{0, \ldots, k\}$ goes as follows (we omit the boolean cases, which are defined as usual):

$H, t \models p \iff p \in H_{st}(t)$
$H, t \models do(i, a) \iff t < k \text{ and } H_{act}(i, t) = a$
$H, t \models X\varphi \iff t < k \text{ and } H, t+1 \models \varphi$
$H, t \models \varphi_1 \,U\, \varphi_2 \iff \exists t' \geq t : t' \leq k \text{ and } H, t' \models \varphi_2 \text{ and } \forall t'' \geq t : \text{if } t'' < t' \text{ then } H, t'' \models \varphi_1$

Given a set of $L_{LTL_f}$-formulas $\Sigma$, we define $Sat(\Sigma, H) = \{\varphi \in \Sigma : H, 0 \models \varphi\}$ to be the set of formulas from $\Sigma$ that are true in history $H$. Similarly, $Sat(\Sigma, \Pi, s_0, \gamma) = Sat(\Sigma, H^{\Pi,s_0,\gamma})$ for a joint plan $\Pi$ starting from state $s_0$ under the action theory $\gamma$.
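To make the finite-trace semantics concrete, the following Python sketch (our own encoding, not the paper's; formulas are represented as nested tuples) evaluates an LTLf formula on a history given as a pair of a state sequence and an action-assignment sequence, following the satisfaction clauses above, and computes Sat(Σ, H).

```python
# Formulas as nested tuples, e.g. ("F", ("atom", "Has(Adam,1)")) for F Has(Adam,1),
# or ("do", "Rob", "Move(Rob,Adam,1)").  This encoding is ours, not the paper's.

def holds(phi, hist, t: int) -> bool:
    """Check H, t |= phi over a finite history hist = (states, actions),
    where states has length k+1 and actions has length k."""
    states, actions = hist
    k = len(states) - 1
    op = phi[0]
    if op == "atom":    # H, t |= p  iff  p in H_st(t)
        return phi[1] in states[t]
    if op == "do":      # H, t |= do(i, a)  iff  t < k and H_act(i, t) = a
        return t < k and actions[t].get(phi[1]) == phi[2]
    if op == "not":
        return not holds(phi[1], hist, t)
    if op == "and":
        return holds(phi[1], hist, t) and holds(phi[2], hist, t)
    if op == "X":       # next
        return t < k and holds(phi[1], hist, t + 1)
    if op == "U":       # until
        return any(holds(phi[2], hist, t2) and
                   all(holds(phi[1], hist, t1) for t1 in range(t, t2))
                   for t2 in range(t, k + 1))
    if op == "G":       # derived: G phi = not(true U not phi)
        return all(holds(phi[1], hist, t2) for t2 in range(t, k + 1))
    if op == "F":       # derived: F phi = not G not phi
        return any(holds(phi[1], hist, t2) for t2 in range(t, k + 1))
    raise ValueError(f"unknown operator {op}")

def sat(values, hist):
    """Sat(Sigma, H): the subset of formulas satisfied at time 0."""
    return {phi for phi in values if holds(phi, hist, 0)}
```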
### 2.5 Planning with Moral Agents

Moral values are not always consistent and occasionally conflict with each other [McConnell, 2022]. Furthermore, it is more serious to violate some values than others (murder is worse than lying). We therefore define the concept of a value base as an ordered sequence of sets of values (written as $L_{LTL_f}$-formulas) $\Omega_1, \ldots, \Omega_n$, with $\Omega_1$ containing the most important values and $\Omega_n$ the least important.

**Definition 4** (Moral Planning Agent Problem). A moral planning agent problem (MPAP) is a tuple $(\gamma, s_0, \Omega)$ where $\gamma = (\gamma^+, \gamma^-)$ is an action theory, $s_0$ is an initial state, and $\Omega = (\Omega_1, \ldots, \Omega_m)$ is a value base with $\Omega_k \subseteq L_{LTL_f}$ for every $1 \leq k \leq m$.

We assume that all agents have a single, shared value base. This value base is used to compute the relative ideality of histories, namely, whether a history $H_1$ is at least as ideal as another history $H_2$. Following Lorini [2021], we call evaluation the operation of computing an ideality ordering over plans from a value base. Inspired by work in preference representation languages [Lang, 2004] and preference-based temporal planning [Bienvenu et al., 2006; Grandi et al., 2022], we define the following qualitative criterion of evaluation, noted $\preceq^{qual}$, which compares two histories lexicographically on the basis of inclusion between the sets of satisfied values.

**Definition 5** (Qualitative ordering of histories). Let $(\gamma, s_0, \Omega)$ be an MPAP with value base $\Omega = (\Omega_1, \ldots, \Omega_m)$ and let $H_1, H_2 \in Hist_k$ be two $\gamma$-compatible histories. Then, $H_1 \preceq^{qual} H_2$ if and only if: i) there exists $n$ with $1 \leq n \leq m$ such that $Sat(\Omega_n, H_1) \subset Sat(\Omega_n, H_2)$ and, for all $n'$ with $1 \leq n' < n$, $Sat(\Omega_{n'}, H_1) = Sat(\Omega_{n'}, H_2)$; or ii) for all $n$ with $1 \leq n \leq m$, $Sat(\Omega_n, H_1) = Sat(\Omega_n, H_2)$.

This method of evaluation compares histories by checking whether either history satisfies a strict subset of the values satisfied by the other history at priority level 1. If they satisfy exactly the same set of values, then we compare instead according to the values of priority level 2, and so on. If at some point the two histories satisfy different sets of values and neither is a subset of the other, then the two histories are incomparable according to the qualitative ordering. Since incomparability is not always desirable, we can also define a notion of quantitative comparison $\preceq^{quant}$. This is equivalent to qualitative comparison except that we compare the number of values satisfied by each history at each priority level. This ensures that any two histories will always be comparable.
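A minimal sketch of the lexicographic comparisons of Definition 5 and its quantitative variant (our own code; the value names in the example are placeholders): given, for each priority level, the set of values satisfied by each history, it decides whether one history is at most as ideal as the other.

```python
from typing import List, Set

def qual_leq(sat1: List[Set[str]], sat2: List[Set[str]]) -> bool:
    """H1 <=_qual H2, where sat1[n] / sat2[n] are the values of priority
    level n satisfied by H1 / H2, i.e. Sat(Omega_n, H1) and Sat(Omega_n, H2)."""
    for s1, s2 in zip(sat1, sat2):
        if s1 == s2:
            continue            # tie at this level: move to the next one
        return s1 < s2          # strict subset at the first differing level
    return True                 # identical at every level

def quant_leq(sat1: List[Set[str]], sat2: List[Set[str]]) -> bool:
    """Quantitative variant: compare the number of satisfied values per level,
    which makes any two histories comparable."""
    for s1, s2 in zip(sat1, sat2):
        if len(s1) != len(s2):
            return len(s1) < len(s2)
    return True

# Example: H2 satisfies strictly more top-priority values than H1.
sat_H1 = [{"subsistence_Adam"}, {"property"}]
sat_H2 = [{"subsistence_Adam", "subsistence_Beth"}, {"property"}]
print(qual_leq(sat_H1, sat_H2), qual_leq(sat_H2, sat_H1))  # True False
```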
### 2.6 From Values to LTL Formulas

So far we have said very little about either the source of the values forming a value base (whether they should be imposed from above or learned by the agent) or their nature (whether they should be consequentialist, deontological, or something else). This is a deliberate choice, as we want our model to be able to accommodate a wide variety of different moral agents and their values (some of which may be practical or social rather than moral). While defining a general method for formalising moral values in LTLf would be outside the scope of this paper, we will give an example of how this could be done in practice, by formalising the value of equality. More specifically, when confronted with situations where a finite set of objects $It$ is distributed amongst a finite group of agents $J$, equality requires an equal distribution (at the end of the plan). Assuming the existence of some general ownership atom $\mathit{Has}(i, A)$, we can formalise such a value as $\mathit{EQUALITY}(It, J) = FG\, \mathit{Eq\_Share}(It, J)$, where:

$\mathit{Eq\_Share}(It, J) \stackrel{def}{=} \bigvee_{n \leq |It|} \bigwedge_{j \in J} \big(\mathit{At\_Least}(It, j, n) \wedge \neg\mathit{At\_Least}(It, j, n+1)\big)$

$\mathit{At\_Least}(It, j, n) \stackrel{def}{=} \bigvee_{I \subseteq It : |I| = n} \bigwedge_{A \in I} \mathit{Has}(j, A)$

**Example 2** (Toy sharing, continued). We suppose that the shared value base Ωtoys contains three values. Firstly, both children must be given at least one toy (subsistence). Secondly, nobody should take a toy away from someone else, though they may give away their own toys (property). Finally, both children should have the same number of toys (equality). The formalisation of these values is as follows (where $\Omega_1$ is subsistence, $\Omega_2$ is property and $\Omega_3$ is equality):

$\Omega_1 = \{FG\, \mathit{At\_Least}(\mathit{toys}, \mathrm{Adam}, 1),\; FG\, \mathit{At\_Least}(\mathit{toys}, \mathrm{Beth}, 1)\}$

$\Omega_2 = \{G \neg \bigvee_{i \neq j,\, j \neq l,\, A \in [1,4]} do(i, \mathrm{Move}(j, l, A))\}$

$\Omega_3 = \{FG\, \mathit{Eq\_Share}(\mathit{toys}, \{\mathrm{Adam}, \mathrm{Beth}\})\}$

Note the different temporal characteristics of these values. Subsistence and equality both describe a state that must be satisfied at the end of the plan, whereas property must be satisfied at all points in the plan. Note also how our model is able to handle both consequentialist values (subsistence and equality both describe states of affairs that must hold at some point) and deontological values (property forbids agents from taking certain kinds of actions).

## 3 Evaluating Joint Plans

Once we fix a moral planning agent problem, every joint plan $\Pi$ induces a single generated history $H^{\Pi,s_0,\gamma}$ (recall Definition 3). Thus, we can lift Definition 5 of comparison among histories to comparison among joint plans.

**Definition 6.** Let $(\gamma, s_0, \Omega)$ be an MPAP and let $\Pi_1, \Pi_2 \in Plan^{Agt}$ be two joint plans for $Agt$. Then, $\Pi_1 \preceq^{qual} \Pi_2$ if and only if $H^{\Pi_1,s_0,\gamma} \preceq^{qual} H^{\Pi_2,s_0,\gamma}$. Similarly, $\Pi_1 \preceq^{quant} \Pi_2$ if and only if $H^{\Pi_1,s_0,\gamma} \preceq^{quant} H^{\Pi_2,s_0,\gamma}$.

Given an MPAP $(\gamma, s_0, \Omega)$ and a joint plan $\Pi \in Plan^{Agt}$, we say that $\Pi$ is $k$-non-dominated if there is no plan $\Pi'$ of length at most $k$ such that $\Pi \preceq^{qual} \Pi'$ and not $\Pi' \preceq^{qual} \Pi$ (an equivalent definition can be given using $\preceq^{quant}$). In conclusion, comparing joint plans with reference to a value base is conceptually equivalent to the problem of comparing two histories. Thus, we can import the analysis and the computational complexity results of Grandi et al. [2022], who showed that the problem of comparing single-agent plans (and hence histories) can be solved in polynomial time, while the problem of testing whether a given plan or history is non-dominated is PSPACE-complete.

**Example 3** (Toy sharing, continued). Consider again the initial state with only two toys, both of which are held by Rob, s0 = {Has(Rob,1), Has(Rob,2)}. In this instance there are several joint plans that satisfy all values (and therefore are, by necessity, non-dominated). The first is the plan that we introduced before:

Rob: [0 ↦ Move(Rob,Adam,1), 1 ↦ Move(Rob,Beth,2)], Adam: [0 ↦ skip, 1 ↦ skip], Beth: [0 ↦ skip, 1 ↦ skip]

It is straightforward to see that this plan satisfies the values of subsistence, property and equality. Consider now a different initial state with only one toy, held by Rob (s0 = {Has(Rob,1)}). In this case there is no plan satisfying all values, as our subsistence values conflict with our equality value. A non-dominated plan will therefore be one that prioritises values according to the lexicographic ordering. In this case, the best Rob can do is to satisfy the property value and one of the subsistence values:

Π2 = (Rob: [0 ↦ Move(Rob,Beth,1)], Adam: [0 ↦ skip], Beth: [0 ↦ skip])

However, if we change the value base by swapping the contents of $\Omega_1$ and $\Omega_3$, then Π2 is dominated by the following plan, which would itself be non-dominated (as now the best option for Rob is to keep the toy and satisfy equality):

Rob: [0 ↦ skip], Adam: [0 ↦ skip], Beth: [0 ↦ skip]
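The following sketch (our own, with hypothetical helper parameters) ties Definitions 3 and 6 together: it unrolls a joint plan into its generated history using a user-supplied successor-state function implementing the transition of Definition 1, and then compares two joint plans through a history ordering such as the qual_leq function sketched earlier; sat_per_level is assumed to map a history to the list of sets Sat(Ω_n, H), e.g. built with the LTLf evaluator above.

```python
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[str]
JointPlan = Dict[str, List[str]]   # agent -> action sequence of length k

def generate_history(plan: JointPlan, s0: State, successor):
    """Definition 3: unroll the joint plan from s0, one transition at a time.
    `successor(state, joint_action)` must return the next state according to
    the action theory (Definition 1)."""
    k = max((len(seq) for seq in plan.values()), default=0)
    states: List[State] = [s0]
    actions: List[Dict[str, str]] = []
    for t in range(k):
        joint_action = {i: plan[i][t] for i in plan}
        actions.append(joint_action)
        states.append(successor(states[-1], joint_action))
    return tuple(states), tuple(actions)

def compare_joint_plans(plan1, plan2, s0, successor, sat_per_level, leq) -> bool:
    """Definition 6: plan1 is at most as ideal as plan2 iff the same holds for
    their generated histories under the chosen history ordering `leq`."""
    h1 = generate_history(plan1, s0, successor)
    h2 = generate_history(plan2, s0, successor)
    return leq(sat_per_level(h1), sat_per_level(h2))
```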
## 4 Evaluating Individual Plans

Evaluating joint plans is helpful whenever agents are able to coordinate, for example in the presence of a central planner. Otherwise, agents may not have any knowledge about the other agents' plans. This means that their individual plans may be part of many possible joint plans, leading to many possible histories and many possible sets of satisfied values.

### 4.1 Optimistic and Pessimistic Comparison

Two intuitive ways of comparing individual plans under complete ignorance of the other agents' plans are to compare according to the best-case outcome of each individual plan (risk-seeking) or according to its worst-case outcome (risk-averse). In line with Lorini [2015], we call these methods of comparison *optimistic* and *pessimistic*.

**Definition 7** (Optimistic $k$-comparison). Let $(\gamma, s_0, \Omega)$ be an MPAP, $i \in Agt$, and let $\Pi_1, \Pi_2$ be individual plans for agent $i$. Let $k$ be an integer greater than or equal to the length of both $\Pi_1$ and $\Pi_2$. Then, $\Pi_1 \preceq^{opt}_{i,k} \Pi_2$ iff there exists $\Pi'_2 \in Plan^{Agt}_k$ compatible with $\Pi_2$ such that, for all $\Pi'_1 \in Plan^{Agt}_k$ compatible with $\Pi_1$, we have $\Pi'_1 \preceq^{quant} \Pi'_2$.

**Definition 8** (Pessimistic $k$-comparison). Let $(\gamma, s_0, \Omega)$ be an MPAP, $i \in Agt$, and let $\Pi_1, \Pi_2$ be individual plans for agent $i$. Let $k$ be an integer greater than or equal to the length of both $\Pi_1$ and $\Pi_2$. Then, $\Pi_1 \preceq^{pess}_{i,k} \Pi_2$ iff there exists $\Pi'_1 \in Plan^{Agt}_k$ compatible with $\Pi_1$ such that, for all $\Pi'_2 \in Plan^{Agt}_k$ compatible with $\Pi_2$, we have $\Pi'_1 \preceq^{quant} \Pi'_2$.

The reading of the plan comparison statement $\Pi_1 \preceq^{opt}_{i,k} \Pi_2$ (resp. $\Pi_1 \preceq^{pess}_{i,k} \Pi_2$) is "agent i's k-plan Π2 is optimistically (resp. pessimistically) at least as ideal as agent i's k-plan Π1". We use the standard notation of writing Π1