# topological_planning_with_postunique_and_unary_actions__840fd89f.pdf

Topological Planning with Post-unique and Unary Actions

Guillaume Pr evost1 , St ephane Cardon1 , Tristan Cazenave2 , Christophe Guettier3 and Eric Jacopin4

1Acad emie Militaire de Saint-Cyr Co etquidan, CRe C Saint-Cyr, France 2Universit e Paris Dauphine - PSL, LAMSADE, CNRS, France 3Safran Electronics and Defense, France 4Hawkswell Studios, France guillaume.prevost@st-cyr.terre-net.defense.gouv.fr

We are interested in realistic planning problems to model the behavior of Non-Playable Characters (NPCs) in video games. Search-based action planning, introduced by the game F.E.A.R. in 2005, has an exponential time complexity allowing to control only a dozen NPCs between two frames. A close study of the plans generated in first-person shooters shows that: (1) actions are unary, (2) actions are contextually post-unique and (3) there is no two instances of the same action in an NPC s plan. By considering (1), (2) and (3) as restrictions, we introduce new classes of problems with the Simplified Action Structure formalism which indeed allow to model realistic problems and whose instances are solvable by a linear-time algorithm. We also experimentally show that our algorithm is capable of managing millions of NPCs per frame.

1 Introduction

In 2005, F.E.A.R. was released and it is the first video game to use action planning for the control in real-time of the Non-Playable Characters (NPCs). The planning is executed through the well-known Goal-Oriented Action Planning (GOAP) system [Orkin, 2005]. The behavior of these NPCs were so relevant that reviews praised the approach when released [Ocampo, 2007], and it is still recognized as such today [Horti, 2017]. This Artificial Intelligence (AI) technique has then be implemented in several other video games such as Rise of the Tomb Raider [Conway, 2015], Middle Earth: Shadow of Mordor [Higley, 2015], Immortals Fenyx rising and the Assassin s Creed series since Odyssey [Girard, 2021]. Action planning is a sub-field of AI that aims to give agents, or NPCs in our case, the capability to build sequences of actions to plan and behave in their environment. Given the description of an initial state, the description of a goal state and the description of an action set, the purpose of a planner is to find a sequence of actions, known as plan, to reach the goal state and that is applicable in the initial state, or to return that no such plan exists. GOAP uses finite domain variables to represent the planning problem of the NPCs. These prob-

lems are decidable but planning remains intractable in general [Bylander, 1994]. In particular, the planning algorithm of GOAP is based on A* whose worst-case time complexity is exponential with the number of actions and the size of the plan. To give a plan to as many NPCs as possible while respecting the real-time constraint, which represents less than 10% of the time between two frames1, and avoiding triggering the worst-case scenarios, the GOAP developers of these game companies have relied on tricks2. Among these tricks are: 1. The action representation is simple. Each action has few preand post-conditions. In F.E.A.R., almost all actions are unary, i.e. each action only has one post-condition [Monolith Productions, 2006]. 2. There are pruning techniques to avoid exploring the entire action search space while planning [Orkin, 2003; Girard, 2021]. In F.E.A.R., each action has a Context Precondition method whose role is to check whether the action is contextually viable. If not, the action is merely withdrawn from the search. It means that even if several actions may have the same post-condition, only some of them (sometimes none) will be considered. 3. Plans are short [Jacopin, 2014]. In Middle Earth: Shadow of Mordor, to keep plans short, Higley explains that some actions are just animations [Higley, 2015]. It means these actions will never be considered by the planner but they will still be animated to make the player believes NPCs have long plans. Even though game companies use these tricks, the planning problems of their NPCs remain intractable. Points 1., 2. and 3. can be used to define assumptions, however:

1. Actions are unary. Each action only has one postcondition.

2. Actions are (contextually) post-unique. Given a situation, no two actions have the same post-condition.

3. Each action has at most one occurrence in the plan.

In this paper, we propose to use these assumptions as restrictions so as to create classes of planning problems that are

1It represents 1, 67ms for a 60 frames per second video game. 2The Software Development Kit of F.E.A.R. is freely available online.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

tractable, and with an algorithm capable of solving each instance. The remainder of the paper is organized as follows: we first give a background on action planning restrictions. We then define an NPC planning problem with post-unique and unary actions to introduce the SAS formalism and the issues with these two restrictions. It follows the definition of new tractable classes of planning problems and the introduction of our linear time algorithm capable of solving the problem instances of these classes, along with complexity and correctness theorems. We eventually present a concrete experiment performed on abstract settings to show that our classes of problems allow the creation of realistic problems and to highlight the potential of our planner.

2 Background

The use of restrictions to create tractable classes of problems is not new [Cooper et al., 2012]. C. B ackstr om, in his thesis, has developed the Simplified Action Structure (SAS) formalism and has applied restrictions on his action representation to create the first tractable class of problems: SAS-PUS [B ackstr om, 1992]. PUS are the restrictions and stand for Post-uniqueness, Unariness, and Single-Valuedness, which fits two of our assumptions. In SAS, actions have post-conditions and two types of preconditions: the preand prevail-conditions. The pre-conditions define what must be true before the action execution and what will be changed by the post-conditions. Whereas the prevail-conditions define what must be true before the action execution and what must hold during the entire action execution. If an NPC fills a bucket with water, the bucket is previously empty (precondition), then filled (post-condition). And the bucket must remain in hands while being filled (prevail-condition). The (S) restriction implies that if a prevail-condition is defined for an action, then all the other actions must either have the same prevail-condition or not be affected by it. In our example, if an NPC has an action requiring the bucket in hands, all the other actions of the set must either require the bucket in hands or not care about having it. In other words, (S) implies that there is no action in the NPC s set whose prevail-condition is to not have the bucket in hands. (S) is very restrictive and prevents the creation of some realistic problems. To generalize our bucket example, one cannot create On/Off situations with the SAS-PUS class of problems. Considering our assumptions 1. and 2., it would be great to get rid of the single-valuedness. (S) cannot be simply removed, however, C. B ackstr om proved that the resulting class SAS-PU is intractable [B ackstr om, 1992]. The reason for the intractability is the exponentially-sized minimal3 solution plans of some problem instances. An underlying conclusion is that some actions have an exponential number of occurrences inside these plans, which, in addition to making the SAS-PU class intractable, does not respect our third assumption. According to Theorem 4.4 [B ackstr om, 1992, p.76], the class SAS-PUS does not respect our third assumption either: the minimal solution plan of each solvable SAS-PUS

3A minimal solution plan is a plan with the lowest number of actions.

A pre post prv Name

a1 v0 = 1 v0 = 0 u, u, u Drop Haystack a2 v0 = 0 v0 = 1 u, 0, u Take Haystack a3 v0 = 1 v0 = 2 u, u, u Fill Horse Feeder a4 v1 = 1 v1 = 0 u, u, u Drop Bucket a5 v1 = 0 v1 = 1 0, u, u Pick Up Bucket a6 v2 = 0 v2 = 1 u, 1, u Fill Bucket With Water a7 v2 = 1 v2 = 2 u, 1, u Fill Horse Trough

M = {v0 : Haystack, v1 : Bucket, v2 : Water} Dv0 = {0 : none, 1 : in Hands, 2 : in Feeder} Dv1 = {0 : none, 1 : in Hands} Dv2 = {0 : in Source, 1 : in Bucket, 2 : in Trough}

Table 1: The SAS action set of the Horse Breeder. Actions are (P) and (U). Variable u means undefined.

problem instance contains actions with at most two occurrences. Thus, the question is: does there exist a restriction which, once combined with (P) and (U), creates a tractable class whose solvable problem instances are solved by a minimal solution plans containing actions with at most one occurrence? (P), (U) and (S) are syntactical restrictions because they affect the action representation. There also exists structural restrictions that restrict the structure of a planning problem [Jonsson and B ackstr om, 1998]. Most of the time, the structure of a problem is represented as a graph: [Domshlak and Brafman, 2002; Helmert, 2006] have used the causal graph which focuses on the post-prevail dependencies; [Jonsson and B ackstr om, 1998] have introduced the domaintransition graph which, unlike the causal graph, focuses on the post-pre dependencies. The idea is then to take advantage of this representation to find a structural restriction. The causal graph and the domain-transition graph, however, respectively ignore the post-pre dependencies or the postprevail dependencies. To fully captures the intractability of SAS-PU problems and to define a new structural restriction that respect our 3rd assumption, we need to consider both the post-pre and the post-prevail dependencies. To this end, we define in this paper two other types of graph: the domain action graph presented in Section 3 and the action graph presented in Section 4. They focus on SAS-PU actions and the relations between them.

3 SAS-PU Planning Problems

A SAS planning problem is defined as a couple (M, A), with M a set of state variables and A a set of actions. Each state variable vi M has a domain of values denoted Dvi. A list of size |M| of such values defines a state and the ith element of a state is a value of Dvi assigned to the state variable vi. Then, each action of A is defined with post-conditions (post) and two types of preconditions: the pre-conditions (pre) and the prevail-conditions (prv). The unariness (U) implies each action only has one post-condition. The post-uniqueness (P) implies there are no two actions with the same post-condition. In SAS, the preand post-conditions both define the same state variables, so, due to (U), they both only affect one state variable. We thus define them with the state variable af-

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Drop Haystack (a1) Take Haystack (a2) Fill Horse Feeder (a3)

( 1) Drop Bucket (a4) (a5) Pick Up Bucket

Fill Bucket With

(a7) Trough

Figure 1: The domain-action graphs of the Horse Breeder.

fected (vi) and the value assigned (p, q Dvi, p = q): a A, pre(a) = (vi = q) post(a) = (vi = p). The prevail-conditions, on the contrary, are represented as a partially-defined state: a A, prv(a) = vi = p | vi M, p Dvi {u} , and u is the undefined value. Table 1 gives an example of a SAS-PU planning problem with the Horse Breeder, a NPC from the game Red Dead Redemption 2 [Defend The House, 2018]. The purpose of a SAS planner is then to solve a problem instance defined as the quadruple (M, A, s0, s ), with s0 the start state and s the goal state. These two states are totally defined in this paper as we work with SAS and not with SAS+ nor SAS [Jonsson and B ackstr om, 1998]. Let Ihb denote a problem instance of the Horse Breeder, Ihb = (M, A, 0, 0, 0 , 2, 0, 2 ) is a possible problem instance to define the daily routine: feed the horses. His goal is to have the water in the horse trough (s [v2] = 2) and the horse feeder filled with a haystack (s [v0] = 2). At the beginning of his daily routine, he carries nothing (s0[v0] = 0, s0[v1] = 0) and the water is in the source (s0[v2] = 0). To solve a problem instance, the planner will eventually build a minimal solution plan . Such plans are composed of several chains of actions, one (possibly empty) for each state variable from the start to the goal state [B ackstr om, 1992]. A chain of actions on vi from s0[vi] to s [vi], denoted chainvi(s0[vi], s [vi]), is a linear sequence of actions that focus on the relation between the preand post-condition. Once all the chains are known, the actions of these chains are ordered among themselves according to their prevail-conditions to create . For example, = a5, a6, a7, a4, a2, a3 is a linear and minimal solution plan that solves Ihb, and is composed of: chainv0(0, 2) = a2, a3 , chainv1(0, 0) = a5, a4 and chainv2(0, 2) = a6, a7 . In section 2, we explained that there exist problem instances in SAS that can be solved by minimal solution plans containing actions with several occurrences. In SAS, planners are allowed to instantiate several times the same action to build a plan. This is feasible if and only if these actions are looping with some other actions via the post-pre dependencies. The Horse Breeder can pick up and drop the bucket as many times as he wants because Drop Bucket and Pick Up Bucket are looping together. On the contrary, in this problem representation, the planner can only instantiate Fill Bucket With Water, Fill Horse Feeder or Fill Horse Trough once to solve a problem instance. The domain-action graph enables the visualization of such cycles:

Definition 1 (Domain-action graph). Let vi M, a domainaction graph of vi, denoted G(vi), is a weakly connected and

Drop Haystack (a1)

Take Haystack

Fill Horse Feeder

Drop Bucket (a4)

Pick Up Bucket

Fill Bucket With

Trough (a7)

Figure 2: The action graph (G) of the Horse Breeder. The blue edges are the post-pre dependencies (Epre(A)) while the green edges are the post-prevail ones (Eprv(A)).

directed graph. G(vi) = (A(vi), Epre(A(vi))) s.t.:

A(vi) = {a | a A, post(a) Dvi} is the vertex set.

Epre(A(vi)) = {(a, b) | a, b A(vi), post(a) = pre(b)} is the directed edge set.

The Horse Breeder has three domain-action graphs, one for each state variable (cf. Figure 1). G(v0) and G(v1) have one cycle with two actions while G(v2) has no cycle. It can be highlighted that, due to (P), there are at most |Dvi| actions in A(vi). Then, if there are less than |Dvi| 1 actions in A(vi) then G(vi) is disconnected and the actions of each of the disconnected part will never end up in the same action plan. Although this is theoretically feasible, with a SAS structure, i.e. with totally defined start and goal states, this is equivalent to considering that there are as many state variables as there are disconnected parts. Hence the weakly connected characteristic of the domain-action graphs. Eventually, if there are exactly |Dvi| 1 actions in A(vi), then there is one value in Dvi that cannot be set by an action but only by the environment. In our example, we consider that (Water = in Source) can only be set by the environment and not by an action of the Horse Breeder.

Lemma 1. Let vi M,

If |A(vi)| = |Dvi| 1, then G(vi) is a directed tree.

If |A(vi)| = |Dvi|, then G(vi) has a unique cycle.

Proof. (Idea of proof) This lemma is proven by recursion and by using the previous observations4.

This Lemma is significant as it highlights that actions responsible for the intractability of SAS-PU problems are contained in identifiable cycles:

Definition 2. We denote Cycle(vi) the set of actions that are inside the unique, possibly empty, cycle of G(vi).

For the Horse Breeder, we have: Cycle(v0) = {a1, a2}, Cycle(v1) = {a4, a5}, Cycle(v2) = . Although, for each vi M, Cycle(vi) contains actions that are likely to be instantiated several times, if none of them satisfy one defined prevail-condition of another action, then the planner does not need to pass through Cycle(vi) several times. Let consider that the Horse Breeder only has two actions: Drop Bucket and Pick Up Bucket: the minimal solution plan to solve s0[v1] = s [v1] is = ; and, the minimal solution plan to

4Feel free to contact the authors to get the technical appendices.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

solve s0[v1] = s [v1] is of size 1 and contains either an instance of Drop Bucket or an instance of Pick Up Bucket depending on which one satisfies s [v1]. Now, no matter how many actions the planning problem is composed of, if none of them requires Drop Bucket or Pick Up Bucket to satisfy their prevailcondition, the planner will never loop through Cycle(v1). In fact, this is crucial to identify what we called the requestable actions:

Definition 3 (Requestable Action). A requestable action is an action that solves the prevail-condition of at least another action of A. We denote Req the set of requestable actions and Req(vi) the set of requestable actions affecting vi M. vi M, Req(vi) = {a | a A(vi), b A, post(a) = prv(b)[vi]} and Req = S

vi M Req(vi)

For the Horse Breeder, the requestable actions are: Req = {a1, a4, a5}. Definitions 2 and 3 allow us to define a new structural restriction in function of the number of actions per cycle:

Definition 4. Let vi M, k N. Given a SAS-PU problem, we denote Ck the structural restriction that limits to at most k the number of actions inside each Cycle(vi) having at least one requestable action. If Req(vi) Cycle(vi) = , then |Cycle(vi)| k.

In the next section, we introduce the new classes of problems SAS-PUCk. In particular, with respect to some k N, we introduce new classes of tractable problems and we indicate from which k our 3rd assumption is no longer respected.

4 The Classes of Problems SAS-PUCk Our 3rd assumption is: there is no two times the same action in an NPC s plan. This assumption is an output restriction, however, and does not give information on how to design a SAS planning problem. The purpose of this section is thus to study SAS-PUCk problems for some k N so as to find problems whose solvable instances are necessarily solved by a solution plan respecting our 3rd assumption. If such classes of problems exist, then we will say that they respect the 3rd assumption. In the following, we only consider k = 0, k = 2 and k 3. More precisely, we prove that the class SAS-PUC0 respects our 3rd assumption but not the class SAS-PUC2 nor the class SAS-PUCk when k 3. With additional restrictions on SAS-PUC2 problems, however, we created two subclasses, namely SAS-PUCS 2 and SAS-PUC 2, that respect our 3rd assumption. Concerning k = 1, this case is meaningless as it implies there is an action whose pre-condition is equal to its post-condition. This is not possible due to inner restrictions of the SAS formalism [B ackstr om, 1992, p.52].

Theorem 1. SAS-PUC0 problems respect our 3rd assumption.

Proof. (Idea of proof) In these problems, for each vi M, there is no Cycle(vi) containing a requestable action. If the planner instantiates some actions from these cycles, it is just to link the start to the goal state. The planner will never loop through these cycles to seek for a requestable action. For

the actions outside such cycles, they can obviously only be instantiated once. The theorem is then proved by recursion.

Theorem 2. SAS-PUC2 problems are intractable.

Proof. (Idea of proof) The Gray Code problem defined in Proof 6.14 [B ackstr om, 1992, p.138] is also of the class SASPUC2. The conclusion is that it exists problem instances with an exponentially-sized minimal solution plan.

Lemma 2. For k 3, every SAS-PUCk problem has at least one problem instance whose minimal solution plan has at least one action with two occurrences in it.

Proof. (Idea of proof) For each of these problems, we can build a problem instance similar to the one presented in Figure 4.3 [B ackstr om, 1992, p.75]. It results the statement of this Lemma.

According to our 3rd assumption and Lemma 2, we cannot model a NPC problem with the class SAS-PUCk with k 3. The Horse Breeder problem is of the class SAS-PUC2. It is (P) and (U) and both Cycle(v0) and Cycle(v1) are concerned by the restriction (C2). |Cycle(v0)| = |Cycle(v1)| = 2 and they both contain at least one requestable action: Req(v0) Cycle(v0) = {a1} and Req(v1) Cycle(v1) = {a4, a5}. It can be proved by hands that the Horse Breeder problem respects our 3rd assumption. So there exists sub-classes of SAS-PUC2 problems that respect our 3rd assumption: SASPUCS 2 and SAS-PUC 2 are two of them.

Definition 5. SAS-PUCS 2 is a sub-class of SAS-PUC2 and there is at most one requestable action per domain-action graph cycle: vi M, |Req(vi) Cycle(vi)| 1.

The letter S in (CS 2 ) recalls (S), the single-valuedness restriction.

Theorem 3. SAS-PUCS 2 problems respect our 3rd assumption.

Proof. (Idea of proof) Let vi M, consider Cycle(vi) = {a , a+} such that post(a+) = + and post(a ) = , and a+ is the requestable action. If s0[vi] = +, then the planner does not need to pass through Cycle(vi) as + is satisfied by s0[vi] and a is not requestable. If s0[vi] = , then the planner can search for a+ in Cycle(vi). it does not need to do it more than once, however, thus respecting the 3rd assumption. If s0[vi] Dvi \ { , +}, then the planner cannot reach the actions in Cycle(vi). So the problem instance is not solvable if a+ is requested while planning. The theorem is proved by recursion using this idea of proof.

SAS-PUCS 2 problems suffer from the same issue as the SAS-PUS problems: On/Off situations cannot be modeled. The Horse Breeder is not of the class SAS-PUCS 2 as Drop Bucket and Pick Up Bucket are both requestable and looping together: |Req(v1) Cycle(v1)| = 2. A corollary of both Theorems 2 and 3 is that a domain-action graph cycle, in which both actions are requestable, is the underlying cause of intractability in certain SAS-PUC2 problems. Proof 6.14 [B ackstr om, 1992, p.138] is a very good example

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

A ID Npre Nprv Name

a1 a0 v0 {a2} Drop Haystack a2 a1 v0 {a1} {a4} Take Haystack a3 a2 v0 {a2} Fill Horse Feeder a4 a0 v1 {a5} Drop Bucket a5 a1 v1 {a4} {a1} Pick Up Bucket a6 a1 v2 {ghost} {a5} Fill Bucket With Water a7 a2 v2 {a6} {a5} Fill Horse Trough

Table 2: Identifiers and predecessor sets for the Horse Breeder.

to understand how a planner can loop through such a cycle: the two requestable actions must be instantiated several times to alternatively satisfy the prevail-condition of some others action instances who already have a specific ordering. SAS-PUC 2 problems, on the contrary, allow the use of domain-action graph cycles with two requestable actions, which is essential to model some On/Off situations such as Cycle(v1) in the Horse Breeder problem, and still respect our 3rd assumption. To define this class, we need to introduce the action graph, which is a graph that captures both the post-pre and the post-prevail dependencies between the actions. Figure 2 gives the action graph of the Horse Breeder.

Definition 6 (Action graph). An action graph is the directed graph G = (A, EA), with A the vertex set and EA the directed edge set. EA = Epre(A) Eprv(A) such that:

Eprv(A) = {(a, b) | a Req(vi), b A, post(a) = prv(b)[vi]}, stores the post-prevail dependencies.

Epre(A) = S

vi M (Epre(A(vi))), stores the post-pre de-

pendencies.

Definition 7. SAS-PUC 2 is a sub-class of SAS-PUC2 and for each Cycle(vi) = {a+, a } with two requestable actions, the actions requesting a+ must not be related to the actions requesting a in the action graph G \ G(vi).

The Horse Breeder is of the class SAS-PUC 2: a4, a5 Cycle(v1) are both requestable and a2, a6, a7 A are such that {(a4, a2), (a5, a6), (a5, a7)} Eprv(A) and a2 is not related to either a6 nor a7 in G \ G(v1). That is, if a4 and a5 are removed from G, a2 is in a different subgraph as a6 and a7. Let Cycle(vi) = {a+, a }, the idea of SAS-PUC 2 is that, in any instance of these problems, the actions requesting a+ can be ordered independently of the actions requesting a . Thus, the planner can instantiate a+ and ordered all the actions requesting a+ before instantiating a and ordered all the actions requesting a .

Theorem 4. SAS-PUC 2 problems respect our 3rd assumption.

Proof. (Idea of the proof) When G(vi) is removed from G, the results is a disconnected graph with two or more parts. And each part is an action graph that models either a SASPUC0 problem, a SAS-PUCS 2 problem or a SAS-PUC 2 problem. The theorem is then proved by recursion.

Procedure 1 Build Chain(vi, s, g, D, ED, A) Input: vi M, s, g Dvi, s = g and ag vi is white; D, a set of yellow actions; ED, a set of orders between the actions of D; Parameters: x, y, two values of Dvi. Output: Each browsed action a is colored yellow, added to D and the order (Npre(a), a) is added to ED. 1: x ; y 2: if ag vi is a ghost action then fail 3: end if 4: Color(ag vi) yellow; D D {ag vi}; y pre(ag vi); ED ED {(ay vi, ag vi)} 5: if Next(ay vi) = {ay vi can be a ghost action.} then 6: Next(ay vi) ag vi 7: end if 8: while y = s do 9: if ay vi is a ghost action then fail 10: end if 11: if Color(ay vi) = yellow then fail 12: end if 13: Color(ay vi) yellow; D D {ay vi}; x y; y pre(ay vi); ED ED {(ay vi, ax vi)} 14: if Next(ay vi) = {ay vi can be a ghost action.} then 15: Next(ay vi) ax vi 16: end if 17: end while

5 Topological Planning In this section, we present our algorithm Topo Plan and two procedures that composes it: Build Chain and DFSTopo. The specification of our algorithm is the following: Definition 8. (Topo Plan s specification) Input: (M, A, s0, s ), a problem instance of the class SAS-PUC0, SAS-PUCS 2 or SAS-PUC 2. Output: If the instance is solvable, then Topo Plan returns , a linear and minimal solution plan with at most one occurrence of each action of A. If the instance is not solvable, Topo Plan yields a failure.

5.1 Pre-processing First of all, there is a pre-processing phase for our algorithm in which each action is associated with the state variable it affects, the value of its preand post-condition, and its prevailconditions. (P) and (U) allow the use of the post-condition of each action as an identifier (ID) for creating a hashing table. Let vi M, p Dvi, a A : ap vi identifies a iff post(a) = (vi = p). We also created two sets of predecessors: Npre(a) = {b | (b, a) Epre(A)} is the set of predecessors that establish the pre-condition of a. Due to (P), this set is a singleton. Nprv(a) = {b | (b, a) Eprv(A)} is the set of predecessors that establish the defined prevailconditions of a. Eventually, if |A(vi)| = |Dvi| 1, then there is exactly one ID that identifies a ghost action. A ghost action is an action with an ID but that does not exist in the action set. The concatenation of Table 1 and Table 2 is an example of this pre-processing applied to the Horse Breeder. The post-uniqueness does not mean pre-uniqueness. For example, the Horse Breeder s actions Fill Horse Feeder and

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Procedure 2 DFSTopo(ap vi, D, ED, s0, ) Input: ap vi, a yellow action affecting vi with the value p Dvi; s0 S; , a linear sequence of green actions. Output: ap vi is colored in green once all its neighbors have been topologically sorted; It is then enqueued to . 1: Color(ap vi) blue 2: for aq vj ED(ap vi){i.e. (aq vj, ap vi) ED} do 3: if aq vj / Npre(ap vi) ap vi is not the first action to modify s0[vi] then 4: if Color(aq vj) = blue then fail {Cycle spotted.} 5: end if 6: if Color(aq vj) = yellow then 7: DFSTopo(aq vj, D, ED, s0, ) 8: end if 9: end if 10: end for 11: Color(ap vi) green; + {ap vi};

Take Haystack are post-unique but both have the same precondition. Topo Plan plans backwards to benefit from the post-uniqueness, but we gave actions a pointer, called Next, that Topo Plan will dynamically set (1.6, 1.15) and use (3.20, 3.23, 3.24) while planning to refer in constant time to the successor of an action in a chain5. Eventually, each action can have four different colors: (white) the action is not in the solution plan, (yellow) the action is in the solution plan, (blue) the action is being topologically sorted, (green) the action is topologically sorted.

5.2 How Topo Plan Works

According to Theorems 1, 3 and 4, solvable instances of a SAS-PUC0, SAS-PUCS 2 or SAS-PUC 2 problem are solved by a minimal solution plan that respects our 3rd assumption. We previously explained that such action plans are composed of chains of actions. The strategy of our algorithm, Topo Plan, is therefore to find and create every required chain of actions, then to order the actions of these chains via their post-prevail dependencies. The building of the chains is done backwards by the procedure Build Chain (3.4, 3.15 and 3.21). The postprevail orders are done by browsing each action a colored yellow by Build Chain (3.9, a is therefore in a chain) and by looking after the predecessors of a via Nprv(a) (3.10). Due to the fact that we work with totally defined start and goal states, and due to the shape of a domain-action graph with (P) and (U) actions, i.e. a directed tree with a (possibly empty) unique cycle (cf. Lemma 1), a chain of actions derived from this graph that respects our 3rd assumption can only have 3 different forms. The first form (σ1) consists of a unique path from the start value to the goal value. For instance with the Horse Breeder, chainv0(0, 2) = a2, a3 and chainv2(0, 2) = a6, a7 are of the form of (σ1). In our algorithm, these chains are built during Phase 1 (3.2 to 3.6) by the Build Chain procedure (3.4) for each state variable whose value in the start and goal states is different. The

5We refer to the procedures with the notation (procedure.line).

Procedure 3 Topo Plan(M, A, s0, s ) Input/Output: (Definition 8). Parameters: D, T , two yellow action sets; EDT , an order set for the actions of D T . 1: ; D ; T ; EDT 2: for vi M do {Phase 1} 3: if s0[vi] = s [vi] then 4: Build Chain(vi, s0[vi], s [vi], D, EDT , A) 5: end if 6: end for 7: if D = then return {s0 and s are equal.} 8: end if 9: for ap vi D T do {Phase 2} 10: for aq vj Nprv(ap vi) do 11: if q = s0[vj] then 12: if aq vj / A then fail 13: end if 14: if Color(aq vj) = white then 15: Build Chain(vj, s0[vj], q, T , EDT , A) 16: end if 17: EDT EDT {(aq vj, ap vi)} 18: end if 19: if q = s [vj] then 20: if Next(aq vj) = then 21: Build Chain(vj, q, s0[vj], T , EDT , A) 22: end if 23: if prv(Next(aq vj))[vi] = p then 24: EDT EDT {(ap vi,Next(aq vj))} 25: end if 26: end if 27: if q = s0[vj] aq vj Cycle(vj) Cycle(vj) is concerned by (C 2) then

28: EDT EDT {(as0[vj] vj , ap vi)} 29: end if 30: end for 31: end for 32: for a D T do {Phase 3} 33: if Color(a) = yellow then 34: DFSTopo(a, D T , EDT , s0, ) 35: end if 36: end for 37: return

second form (σ2) consists of the unfolding of the domainaction graph cycle. It happens when the start value is equal to the post-condition of one of the two actions inside that cycle and the other action of the cycle is requested to solve the problem instance. This action chain (σ2) therefore leaves and returns to the start value6. For instance with the Horse Breeder, chainv1(0, 0) = a5, a4 is of the form of (σ2). These chains can be built during Phase 2 (3.9 to 3.31). In this phase, actions instantiated in a chain are ordered in relation to each other according to their post-prevail dependencies (3.17, 3.24 and 3.28). But some requested actions

6It can be highlighted that if the cycle is unfolded more than once, the 3rd assumption is no longer respected.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

may not be instantiated yet because they are in a domainaction graph cycle. The two Build Chain procedures (3.15) and (3.21) thus build the missing chains. Let consider the Horse Breeder instance Ihb = (M, A, 0, 0, 0 , 2, 0, 2 ), at the end of Phase 1, the chains chainv0(0, 2) = a2, a3 and chainv2(0, 2) = a6, a7 are built but chainv1(0, 0) = . Yet a5 is required by a6 and a7, so the first Build Chain procedure (3.15) builds chainv1(0, 1) = a5 and the other one (3.21) builds chainv1(1, 0) = a4 . The concatenation of the two results in chainv1(0, 0) = a5, a4 which is a chain of the form of (σ2). The last possible form (σ3) is the concatenation of (σ2) followed by (σ1). In the Horse Breeder problem, it can happen for the state variable v0: if s0[v0] = 1 and s [v0] = 2, then chainv0(1, 2) = a1, a2, a3 is feasible such that: σ1 = a3 and σ2 = a1, a2 . It should be noticed that, in this case, the first action of (σ1) and the first action of (σ2) both have their pre-condition equal to the start value, hence the second stop condition (2.3) in the DFSTopo procedure. Eventually, in Phase 3 (3.32 to 3.36), Topo Plan topologically sorts the partially ordered sequence of actions D T , EDT returned by Phase 2.

Theorem 5. Topo Plan satisfies its specification so it is correct and complete.

Proof. (Idea of the proof) We prove that the three phases of Topo Plan are correct and complete. To do so, we define a loop invariant for each phase and we show that each loop terminates.

Theorem 6 (Time Complexity). Topo Plan worst-case time complexity is O(|A| + |EA|), with A the set of actions and EA the set of orders between the actions of A.

Proof. We explained that the Build Chain calls (3.4), (3.15) and (3.21) never build the same chain of actions: (3.4) builds σ1 chains, (3.15) and (3.21) builds two different parts of σ2. This is also ensured by the if statements (3.14), (3.20) and (1.11): they check the Build Chain procedure only builds chains with white actions, i.e. actions that are not yet in the plan. It results D T = . It also results that, between (3.2) and (3.31) the three Build Chain calls execute O(|D T |) O(|A|) instructions: |D| instructions in Phase 1 (3.2 to 3.6) plus |T | instructions in Phase 2 (3.9 to 3.31). Now, for the two for loops (3.9) and (3.10), assume that all the necessary chains have been built, then the core of the second for loop (from 3.11 to 3.29) has a constant cst number of instructions. The first for loop (3.9) only take 1 instruction to execute. It results the formula: X

a D T (1 + X

b Nprv(a) cst) = X

a A (1 + cst X

b Nprv(a) 1)

=|A| + cst X

(b,a) Eprv(A) 1

=O(|A| + |Eprv(A)|) (1)

Eventually, at the end of Phase 2 (3.31), Topo Plan has executed O(|A|) + O(|A| + |Eprv(A)|) O(|A| + |Eprv(A)|) instructions. Phase 3 (3.32 to 3.36), performs a topological sort to sort the non-linear plan D T , EDT . It takes

O(|D T | + |EDT |) O(|A| + |EA|) instructions which dominates the whole.

Theorem 7 (Space Complexity). Topo Plan requires at most O(|A|2) space.

Proof. A (PU) action a takes O(|M|) space: the preand post-condition can be reduced to one variable each, plus a variable for the ID; the set Npre(a) is a singleton due to restrictions (P) and (U); the prevail-conditions, on the contrary, are stored on a list of |M| elements and the set Nprv(a) has at most |M| elements. Then, the hashing table that stores IDs takes O(|A|) space. Finally, the pre-processing phase creates an action graph G which takes at most O(|A|2) space: The set of actions takes O(|A|) space and each action can have at most O(|A|) predecessors. Hence O(|A|2), which dominates the whole.

6 Experiments We have carried out experiments on abstract settings to test Topo Plan. Given the description of three different realistic SAS-PUC 2 problems (Different NPCs from Red Dead Redemption 2 (including the Horse Breeder), citizens from Assassin s Creed: Origins [Ubisoft, 2017] and the acquisition machines from Horizon Zero Dawn [Games, 2017]), we wanted to test how many of these NPCs can get a plan in real time by our C++ implementation of Topo Plan? With the following configuration: AMD Ryzen 7 2700X (8-Core) CPU (3.7GHz), 32Gb of RAM and Windows 10 (64 bits), Topo Plan was able to provide more than 3.000.000 plans with up to 10 actions in it in less than 1.67ms, which is 10% of the time between two frames in a 60FPS video game7.

7 Conclusion Based on three assumptions made by studying commercial video games using planning systems, we have introduced three new tractable classes of planning problems (SAS-PUC0, SAS-PUCS 2 and SAS-PUC 2) that allow the design of realistic NPCs. The instances of these problems are all solvable by a correct, complete and a linear-time planning algorithm, called Topo Plan, that we provide in this paper. We used the SAS formalism which imposes the start and the goal states to be totally defined. In a future work, we can study other structures such as SAS [Jonsson and B ackstr om, 1998] that allows s to be partially-defined, or SAS+ [B ackstr om, 1992] that allows both s0 and s to be partially defined. Such structures may be easier to use to define NPC problem instances for example. Among our assumptions are the post-uniqueness (P), although it is an acceptable restriction to create NPC planning problems, as shown in this paper, it remains a restriction GOAP developers would like to relieve. Our new structural restriction defined in this paper can surely help to study how (P) can be relieved to create new tractable classes of problems without (P).

7More details can be found in [Pr evost et al., 2022] about the benchmarks, the experiments and the results. There is no mention of the classes SAS-PUC0, SAS-PUCS 2 and SAS-PUC 2 in this paper which, instead, groups them under the class SAS-PUT1, where T1 is the concatenation of T (totally-ordered) and T1 (our 3rd assumption).

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Acknowledgments

Musical thanks to Aline Hufschmitt for her reviews and constant support during this work. We also would like to thank Gabriel Robert and Simon Girard, AI experts from Ubisoft, for sharing their game development experience which definitely helped us to define our assumptions and create our NPC models. Eventually, we would like to express our sincere gratitude to Peter Jonsson and Aur elie Beynier, the reviewers of Guillaume s thesis, for their invaluable appreciation and their encouragement to publish in this conference.

[B ackstr om, 1992] Christer B ackstr om. Computational Complexity of Reasoning about Plans. Ph D thesis, Department of Computer and Information Science, Link oping University, september 1992.

[Bylander, 1994] Tom Bylander. The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1-2):165 204, 1994.

[Conway, 2015] Chris Conway. GOAP in Tomb Raider. https://youtu.be/gm7K68663r A?t=1200, March 2015. Accessed on May 12th, 2023.

[Cooper et al., 2012] Martin Cooper, Fr ed eric Maris, and Pierre R egnier. Tractable monotone temporal planning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 22, pages 20 28, 2012.

[Defend The House, 2018] Defend The House. NPC daily life in Read Dead Redemption 2. https://youtu.be/Mr UJJgpp Mn4?t=434, November 2018. Accessed on May 12th, 2023.

[Domshlak and Brafman, 2002] Carmel Domshlak and Ronen Brafman. Structure and complexity in planning with unary operators. In Proceedings of the 6th International Conference on Artificial Intelligence Planning Systems, pages 34 43. AAAI Press, 2002.

[Games, 2017] Guerilla Games. Horizon Zero Dawn. https://www.guerrilla-games.com/games, December 2017. Accessed on May 12th, 2023.

[Girard, 2021] Simon Girard. Postmortem: AI action planning on Assassin s Creed Odyssey and Immortals Fenyx Rising. https://www.gamedeveloper.com/programming/postmortem AI-action-planning-on-Assassins-Creed-Odyssey-and Immortals-Fenyx-Rising-, November 2021. Accessed on May 12th, 2023.

[Helmert, 2006] Malte Helmert. The fast downward planning system. Journal of Artificial Intelligence Research, 26:191 246, 2006.

[Higley, 2015] Peter Higley. GOAP at monolith productions. https://youtu.be/gm7K68663r A, March 2015. Accessed on May 12th, 2023.

[Horti, 2017] Samuel Horti. Why F.E.A.R. s AI is still the best in first-person shooters Flank, cover and run away. https://www.rockpapershotgun.com/why-fears-aiis-still-the-best-in-first-person-shooters, April 2017. Accessed on May 12th, 2023.

[Jacopin, 2014] Eric Jacopin. Game AI planning analytics: The case of three first-person shooters. In Proceedings of the 10th AIIDE, pages 119 124. AAAI Press, 2014. [Jonsson and B ackstr om, 1998] Peter Jonsson and Christer B ackstr om. State-variable planning under structural restrictions: algorithms and complexity. Artificial Intelligence, 100(1-2):125 176, April 1998. [Monolith Productions, 2006] Monolith Productions. F.E.A.R. public tools. https://www.gamefront.com/games/f-e-a-r/file/f-e-a-rv1-08-sdk, June 2006. Accessed on May 12th, 2023. [Ocampo, 2007] Jason Ocampo. F.E.A.R. review. https://www.gamespot.com/reviews/fear-review/19006169771/, April 2007. Accessed on May 12th, 2023. [Orkin, 2003] Jeff Orkin. Applying goal-oriented action planning to games. In Steve Rabin, editor, AI Game Programming Wisdom 2, volume 2, chapter 3.4, pages 217 227. Charles River Media, 2003. [Orkin, 2005] Jeff Orkin. Agent architecture considerations for real-time planning in games. In Proceedings of the 1st AIIDE, pages 105 110, 2005. [Pr evost et al., 2022] Guillaume Pr evost, St ephane Cardon, Eric Jacopin, Tristan Cazenave, and Christophe Guettier. Planning for millions of NPCs in real-time. In 2022 IEEE Symposium Series on Computational Intelligence (SSCI), pages 330 336. IEEE, 2022. [Ubisoft, 2017] Ubisoft. Assassin s Creed: Origins. https://store.ubi.com/fr/assassin-s-creed-origins-allgames, Octobre 2017. Accessed on May 12th, 2023.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)