The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Efficient Temporal Planning Using Metastates

Amanda Coles, Andrew Coles, J. Christopher Beck
Department of Informatics, King's College London, UK.
Department of Mechanical & Industrial Engineering, University of Toronto, Canada.
email: {amanda,andrew}.coles@kcl.ac.uk, jcb@mie.utoronto.ca

When performing temporal planning as forward state-space search, effective state memoisation is challenging. Whereas in classical planning, two states are equal if they have the same facts and variable values, in temporal planning this is not the case: as the plans that led to the two states are subject to temporal constraints, one might be extendable into a temporally valid plan while the other might not. In this paper, we present an approach for reducing the state-space explosion that arises from having to keep many copies of the same classically equal state: states that are classically equal are aggregated into metastates, and these are separated lazily, only in the case of temporal inconsistency. Our evaluation shows that this approach, implemented in OPTIC and compared to existing state-of-the-art memoisation techniques, improves performance across a range of temporal domains.

1 Introduction

Planning is fundamental to intelligent autonomous behavior, and reasoning about time is essential to planning in many real-world domains. One of the most popular paradigms for planning is forward state-space search, and key to its success is memoisation: in classical planning, if a given state (a set of propositions and variable assignments) has been seen before, then it need not be explored again if it is reached by a different sequence of actions. This reasoning extends easily to a setting where actions have costs, by keeping only the lowest-cost path to a given state.
Temporal planning brings with it further challenges for memoisation (Coles and Coles 2016). Not only are the values of variables and propositions important, as in classical planning, but the path taken to reach a state can also determine whether the partial plan can be extended into a temporally valid plan that reaches the goal. For example, if we took a longer path to reach a state, we might no longer be able to meet a deadline. In the worst case, this forces a temporal planner to keep a different state for every possible path to every classically equal state. Coles and Coles (2016) made some progress on this issue with a technique to prune isomorphic plans and to identify special cases where achieving facts earlier can always be proven to be better. However, there are many cases where states that are unlikely to be interestingly different must still be treated as such.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In this paper we propose a radically different approach to the need to explore classically identical states in temporal planning, based on the idea of metastates. Instead of inserting multiple copies of a classically identical state (reached by different paths) into the open list for search to explore, we maintain a single metastate that aggregates them. Each time a new state is generated, we either create a new metastate, if the state is classically unique; or, if we have previously seen a classically identical state, we add information to the existing metastate recording that there is an alternative path to the state (which may lead to different temporal constraints). Search then proceeds over the space of metastates, nominally expanding only one member from each. Since we are interested here in satisficing planning, we need only consider expanding the other members of a metastate if its descendants are temporally inconsistent.
We empirically evaluate our approach on temporal planning domains; our results show a significant improvement in performance over the state of the art in memoisation for temporal planning.

2 Background

2.1 Problem Definition

A PDDL2.1 (Fox and Long 2003) planning problem is defined over a collection of propositions P and a vector of numeric variables v. These are manipulated and referred to by actions. The executability of actions is determined by their preconditions: conjunctions of conditions. A condition is either a single proposition p ∈ P (or its negation ¬p), or a numeric constraint over v. We assume all such constraints are linear, and hence can be represented in the form w·v {>, ≥, <, ≤, =} c, where w is a vector of constants and c is a constant. Each durative action A has three sets of preconditions: pre⊢A, pre↔A and pre⊣A. These represent the conditions that must hold at its start, throughout its execution (invariants), and at its end, respectively. Instantaneous effects can occur at the start or end of A: eff+⊢A (eff−⊢A) denotes propositions added (resp. deleted) at the start, and effnum⊢A denotes numeric effects. Similarly, eff+⊣A, eff−⊣A and effnum⊣A record effects at the end. We assume numeric effects are of the form v {+=, -=, =} w·v + c (v ∈ v). Semantically, the values of instantaneous effects become available a small amount of time, ϵ, after they occur. Finally, the action has a duration constraint: a conjunction of numeric constraints applied to a special variable durA denoting its duration. As a special case, instantaneous actions have duration ϵ, and only one set of preconditions preA and effects eff+A, eff−A, and effnumA.

[Figure 1: Example Driverlog Problem]

A durative action A can be split into two instantaneous snap-actions, A⊢ and A⊣, representing the start and end of the action respectively, together with a set of constraints (invariant and duration constraints). Action A⊢ has preconditions pre⊢A and effects eff+⊢A, eff−⊢A and effnum⊢A; A⊣ is the analogous action for the end of A.
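The splitting of a durative action into snap-actions can be sketched as a small data transformation. This is an illustrative sketch, not OPTIC's internal representation: all class and field names are assumptions, and the invariant/duration constraints are (as in the text) assumed to be tracked separately by the search.

```python
from dataclasses import dataclass

@dataclass
class DurativeAction:
    name: str
    pre_start: set   # pre at start
    pre_inv: set     # invariant conditions
    pre_end: set     # pre at end
    add_start: set   # propositions added at start
    del_start: set   # propositions deleted at start
    add_end: set     # propositions added at end
    del_end: set     # propositions deleted at end

@dataclass
class SnapAction:
    name: str
    pre: set
    add: set
    delete: set

def split(a: DurativeAction):
    """Return (start snap-action, end snap-action). Invariant and duration
    constraints are not attached here; the search enforces them separately."""
    start = SnapAction(a.name + "-start", a.pre_start, a.add_start, a.del_start)
    end = SnapAction(a.name + "-end", a.pre_end, a.add_end, a.del_end)
    return start, end

drive = DurativeAction("drive-AB", {"at-A"}, {"road-AB"}, set(),
                       set(), {"at-A"}, {"at-B"}, set())
start, end = split(drive)
assert start.delete == {"at-A"} and end.add == {"at-B"}
```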
A solution to the problem is a plan: a timestamped sequence of actions with associated durations that transforms the initial state I into one that satisfies the goal G. All pre/invariant conditions must be satisfied at the time of/during execution, and all actions that have started must have finished.

2.2 Memoisation in Temporal Planning

Forward-search temporal planning begins from the initial state: a set of propositions that are known to be true, and assignments to the numeric variables in v. At each state S during search the planner generates successor states, each S′ corresponding to the application of a logically applicable snap-action: one whose preconditions are satisfied in S, and whose effects do not violate the invariant conditions pre↔A of any action A that has started but not yet ended. Searching in this way ensures all plans are logically valid (all preconditions are satisfied), but does not ensure they are temporally valid (respect the duration constraints of actions). In this work we build on the planner OPTIC, which records temporal constraints in each state in the form of a Simple Temporal Problem (STP) (Dechter, Meiri, and Pearl 1991) or Mixed Integer Program (MIP), and updates these as search progresses.

Because of these temporal constraints, state memoisation (i.e. determining when two states are equivalent, to avoid redundant search) is more difficult than in classical planning. Suppose we have two states A and B with the same facts, variable values, and executing actions, but different temporal constraints. While these may have the same logically applicable actions, applying the same action in each to yield A′ and B′ may yield different temporal constraints, such that those in A′ are satisfiable while those in B′ are not. Thus, we cannot say that A and B are equal, as the search trees reachable under each may differ. To understand why this might be, consider the Driverlog Shift problem depicted in Figure 1.
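The temporal side of this check reduces to STP consistency: constraints of the form t_v − t_u ≤ w form a weighted graph, and the STP is consistent iff that graph has no negative cycle (Dechter, Meiri, and Pearl 1991). A minimal sketch via Bellman-Ford negative-cycle detection; the function name and the toy constraints are illustrative, not OPTIC's code:

```python
def stp_consistent(n, edges):
    """Bellman-Ford negative-cycle detection over n time points.
    edges: list of (u, v, w) encoding the STP constraint t_v - t_u <= w."""
    dist = [0.0] * n                  # implicit source with 0-weight edge to all
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True               # relaxation converged: no negative cycle
    return False                      # still relaxing after n passes: inconsistent

# t1-t0 <= 6 (a deadline), t2-t1 <= 0 (t2 no later than t1),
# t0-t2 <= -7 (t2 at least 7 after t0): together a negative cycle.
assert stp_consistent(3, [(0, 1, 6), (1, 2, 0), (2, 0, -7)]) is False
# Relaxing the last constraint to "t2 at least 3 after t0" restores consistency:
assert stp_consistent(3, [(0, 1, 6), (1, 2, 0), (2, 0, -3)]) is True
```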
The Driverlog Shift domain extends the Driverlog domain by adding a shift action that adds a predicate (available ?driver); all actions involving the driver (boarding, disembarking and driving the truck) have this predicate as an invariant, and so must take place during the execution of shift. Let us assume that the duration of the shift action is 6 time units and that it takes a nominal 0.1 time units to load and unload packages. This means that the plan to deliver the package to E (the goal) via B is not feasible, as it will take too long; whereas going via D is feasible. Suppose that in forward search we encounter the state where the driver and package are in the truck at C, having followed the plan that drives via B, and memoise that we have seen the state with these facts true. If we later reach this state again, having taken the path via D, classical memoisation would prune the state, as it has the same facts as one seen before. In temporal planning this would render the problem unsolvable: the state reached via D has an STN whose temporal constraints allow it to be extended into a solution plan, whereas the state reached via B (the only one we kept) does not. In general, in temporal planning, we therefore cannot prune a state simply because it has the same facts as one we have already seen.

Coles and Coles (2016) took a step towards addressing this issue by testing for equality based on the plans used to reach states, checking whether these plans are isomorphic. When OPTIC adds a new snap-action to a plan, it is ordered only after other actions as required for logical soundness, e.g. after the adders of its preconditions. Suppose in our Driverlog example there are two packages p and q at A to be loaded onto the truck. OPTIC will generate two states, one resulting from applying load p, load q and the other from load q, load p.
However, since the two load actions are independent they will not be ordered with respect to each other, so these plans are isomorphic partial orders. In terms of the planning problem, isomorphic partial orders are effectively identical, so by identifying such partial orders and keeping only one such state, the search space can be reduced. This was effective at reducing the number of nodes expanded by search, but has the limitation that if two plans contain different actions, they cannot be isomorphic: the technique can detect permutations of the same actions, but not interchanges of different but effectively equivalent actions. It is this point we explore in this paper. For example, in our Driverlog Shift problem, two plans that start shift then go from A to C (one via B, the other via D) cannot be isomorphic, as the move actions are different, even if the truck in both is at C. We need to keep both for completeness when the duration of shift is tight enough to preclude one of the plans from reaching a goal state; but if the duration of shift is sufficiently long, it would not matter for the purposes of solving the problem which route was taken to C (although one route might admit a better-quality plan). Regardless of the duration of shift, however, a temporal planner would have to keep both and expand them as different states, blowing up the search tree, whereas a classical planner would keep only one.

While we implement our ideas in OPTIC, this is without loss of generality. Our work is applicable to other approaches to managing temporal constraints in forward planning, such as the decision-epoch approach (Cushing et al. 2007) used in SAPA (Do and Kambhampati 2003) and TFD (Eyerich, Mattmüller, and Röger 2009), as these must still consider the queue of pending action ends when determining whether states are equal; again, the search tree reachable under two otherwise-equal states may differ.
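A quick back-of-envelope check of the Driverlog Shift scenario. The shift duration (6) and the 0.1 load/unload time come from the text; the individual drive durations below are invented purely for illustration, chosen so that the route via B overruns the shift while the route via D does not:

```python
shift_duration = 6.0
unload = 0.1
# Hypothetical drive durations (not given in the paper):
drive = {("A", "B"): 3.0, ("B", "C"): 2.5, ("A", "D"): 1.0,
         ("D", "C"): 1.0, ("C", "E"): 1.5}

def plan_length(route):
    """Total time to drive along route and unload at the end."""
    legs = zip(route, route[1:])
    return sum(drive[leg] for leg in legs) + unload

via_B = plan_length(["A", "B", "C", "E"])   # 3.0 + 2.5 + 1.5 + 0.1 = 7.1
via_D = plan_length(["A", "D", "C", "E"])   # 1.0 + 1.0 + 1.5 + 0.1 = 3.6
assert via_B > shift_duration   # infeasible: cannot finish within the shift
assert via_D < shift_duration   # feasible: the classically-equal state we must keep
```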
3 Metastate-Space Search

In this section we introduce the notion of metastates and explain the details of forward search in metastate space.

3.1 Preliminaries

A state is defined as follows:

Definition 3.1 (State) A state comprises:
- f, the facts that are true in the state.
- v, the values of each state variable.
- P = [p0..pn], a partial plan. Each pi represents an instantaneous action, a start snap-action, or an end snap-action.
- Q, a list of actions that started in P but have not yet finished. For each ⟨a, i, dmin, dmax⟩ ∈ Q: a identifies the ground durative action; i is its step index in the plan P; and dmin, dmax are the minimum/maximum duration of a, calculated from the values of v in the state at step i.
- T, temporal constraints over the steps in the plan P.

In classical forward-search planning, two states S and S′ can then be said to be equal if S.f = S′.f and S.v = S′.v. The other parts of the tuple are irrelevant: Q is always empty, as all actions are instantaneous; T is unnecessary, as the steps in P are totally ordered. Even if P differs between S and S′, this has no bearing on further state expansion: what determines expansion is which preconditions are satisfied, and since f and v are equal for the two states, the same preconditions must be satisfied in both. Thus, as the reachable search spaces under S and S′ are identical, S′ can be pruned if it is equal to a state S that has already been seen.

In temporal forward-search planning, we must make three important distinctions:
- The list of actions Q matters: we can only end an action a, i.e. apply the snap-action a⊣, if a ∈ Q.
- Whenever an action has been applied, all the invariant conditions of the actions recorded in Q must be respected.
- T must be consistent. Otherwise, while the plan under construction may be logically consistent (all preconditions are satisfied), it may be temporally inconsistent: for instance, if T constrains a long durative action to be contained entirely within a shorter one.
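The state structure of Definition 3.1, together with the classical equality test that the three distinctions above undermine, can be sketched as follows. Field names follow the definition; this is an illustrative data structure, not OPTIC's internal representation:

```python
from dataclasses import dataclass, field

@dataclass
class State:
    f: frozenset                              # facts true in the state
    v: tuple                                  # values of the numeric variables
    P: list = field(default_factory=list)     # partial plan of snap-actions
    Q: list = field(default_factory=list)     # open actions: (a, i, dmin, dmax)
    T: list = field(default_factory=list)     # temporal constraints over P

def classically_equal(s1: State, s2: State) -> bool:
    """Sound for pruning in classical planning; in temporal planning it is
    only safe when both Q lists are empty and both T are consistent."""
    return s1.f == s2.f and s1.v == s2.v

# Two states reached by different plans are still classically equal:
s1 = State(frozenset({"at-C"}), (0.0,), P=["move-AB", "move-BC"])
s2 = State(frozenset({"at-C"}), (0.0,), P=["move-AD", "move-DC"])
assert classically_equal(s1, s2)
```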
The first two of these are logical in flavor: a straightforward inspection of Q will suitably restrict the applicable actions to respect invariant constraints and start-end planning semantics, without reference to temporal information per se. The last, however, poses an issue for state memoisation. If two states S and S′ have the same facts, the same variable values, and the same actions in Q, then the same actions will be logically consistent extensions of S and S′. But they might not be temporally consistent extensions of both, due to their effect on the temporal constraints T. Thus, it is not completeness-preserving to keep only one of S or S′.

The current state of the art (Coles and Coles 2016) addresses the memoisation issue in two ways. First, it proves that for states in which no actions are executing, i.e. Q is empty (there are no open actions), if S and S′ are both temporally consistent, it is completeness-preserving to keep only one of them. Because search proceeds in a forwards direction, all subsequent plan steps will be ordered after those already in the plan; and as these constraints w.r.t. the existing plan steps are only ever minimum-separation constraints (a later step is ordered 0 or ϵ after some existing step), any temporally consistent plan extension from S will also be consistent from S′. Second, for all other states (those with open actions), this observation no longer applies, in which case S and S′ are both kept unless their plans are isomorphic, i.e. exactly the same actions were applied in a different order, leading to identical temporal constraints. This was shown to be effective compared to the simpler alternative of keeping all states with open actions, but the requirement that S and S′ be reached by identical actions limits the pruning that can be achieved.

3.2 Strong and Weak Equality

The core limitation of standard memoisation is that, to preserve completeness, it must use strict state duplicate detection.
As noted, this is particularly restrictive when comparing states with open actions. For instance, a state S reached by moving from A to B to C and then starting an action a is different from a state S′ reached by moving from A to D to C and then starting a: even if all the facts in the states are the same, a is an open action, so plan isomorphism is used; and as the two plans contain different actions, they cannot be isomorphic, so both S and S′ are kept. In practice, we hypothesize that it is often sufficient to expand only one of S or S′, as they are sufficiently similar; but we want to maintain completeness in the case where expanding the other was necessary. As a step towards this, we first define notions of strong and weak state equality. Strong equality is suitable for completeness-preserving memoisation; weak equality is not, but captures the concept of two states being "sufficiently similar".

Ancillary to this, we define two helper functions: the number of times action a is executing in the queue Q of a state,

num_exec(Q, a) = |{i | ⟨a, i, dmin, dmax⟩ ∈ Q}|

and a set of pairs, each comprising an action and the number of times that action is executing in a state:

exec(Q) = {⟨a, num_exec(Q, a)⟩ | num_exec(Q, a) > 0}

Recall that the semantics of PDDL 2.1 permit an action to self-overlap (Rintanen 2007), i.e. a new instance of a can start before a previous instance of a has ended: this is why action a may be executing more than once in the same state.

We now define strong equality as follows:

Definition 3.2 (Strong equality) States S = ⟨f, v, P, Q, T⟩ and S′ = ⟨f′, v′, P′, Q′, T′⟩, where T and T′ are temporally consistent, are strongly equal iff f = f′, v = v′ and either (i) Q = Q′ = ∅; or (ii) the partial plans P and P′ are isomorphic. The notation S ≡ S′ denotes strong equality.

Note that if S ≡ S′ and S′ ≡ S′′ then, trivially, S ≡ S′′: in the first case all queues must necessarily be empty, and in the second case isomorphism is transitive.
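The two helper functions can be written out directly. In this sketch, Q entries are (action, step, dmin, dmax) tuples as in Definition 3.1, and exec is renamed exec_set to avoid shadowing Python's builtin:

```python
def num_exec(Q, a):
    """Number of instances of action a currently executing in Q."""
    return len([i for (act, i, dmin, dmax) in Q if act == a])

def exec_set(Q):
    """Set of (action, multiplicity) pairs over the executing actions."""
    return {(act, num_exec(Q, act)) for (act, _, _, _) in Q}

# Self-overlap: two instances of drive are executing in the same state.
Q = [("drive", 2, 1.0, 1.0), ("shift", 0, 6.0, 6.0), ("drive", 4, 1.0, 1.0)]
assert num_exec(Q, "drive") == 2
assert exec_set(Q) == {("drive", 2), ("shift", 1)}
```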
Weak equality broadens this:

Definition 3.3 (Weak equality) States S = ⟨f, v, P, Q, T⟩ and S′ = ⟨f′, v′, P′, Q′, T′⟩, where T and T′ are temporally consistent, are weakly equal iff f = f′, v = v′ and exec(Q) = exec(Q′). The notation S ≈ S′ denotes weak equality.

As with strong equality, weak equality is trivially transitive. Additionally, it is a strict relaxation of strong equality: (S ≡ S′) ⇒ (S ≈ S′). Consider the two cases of strong equality. The first is a restriction of weak equality: if Q = Q′ = ∅ then exec(Q) = exec(Q′) = ∅. In the second, if P and P′ are isomorphic they must contain the same actions; so if P contains a⊢ more times than a⊣, then so does P′, and trivially exec(Q) = exec(Q′).

3.3 Metastates

We now make use of strong and weak equality to define a metastate space over which to search, and a search algorithm for doing so. We define a metastate as follows:

Definition 3.4 (Metastate) A metastate M is a tuple ⟨Σ, Π, Γ, q, ex, re, c_re⟩, where:
- Σ is a list of member states [σ0..σn] that are pairwise weakly equal.
- Π is a list of parent metastates, each paired with the action applied to reach M from that parent: [⟨M0, a0⟩..⟨Mn, an⟩].
- Γ is a list of child metastates, each paired with the action applied to reach it: [⟨M0, a0⟩..⟨Mn, an⟩].
- q ∈ {⊤, ⊥} is a boolean flag set to ⊤ iff the metastate is queued for expansion.
- ex ∈ Z≥0 is a counter of how many of the states in Σ have been explicitly expanded.
- re ∈ {⊤, ⊥} is a boolean flag set to ⊤ iff either M has not yet been expanded, or the most recent expansion of M was partial due to one or more successors being inconsistent according to the temporal constraints.
- c_re ∈ Z≥0 counts how many of the children of M could in principle lead to a larger reachable search space if given an additional member.

In this definition, Σ records the states in a metastate, while Π and Γ define the structure of the metastate space: if there is an edge in the metastate space between two metastates M and M′ labeled with the action a, then ⟨M′, a⟩ ∈ M.Γ and ⟨M, a⟩ ∈ M′.Π.
The other entries in the tuple are bookkeeping information to support search; we explain their meaning and use later.

3.4 Searching with Metastates: Overview

We begin with a high-level overview of search:

a) A search queue of metastates is initialized to the metastate containing the initial state.

b) At each iteration, a metastate is popped from the search queue and expanded. The expansion of a metastate M expands one of its member states; specifically, σM.ex, the state in M.Σ with index M.ex.

c) Search aims to expand each metastate only once; however, to ensure completeness, there are two cases in which a metastate must be re-expanded:
  i. If there was an action that was logically applicable in the last-expanded member of M, but which led to a temporally inconsistent child state (i.e. if M.re). It is necessary to consider other member states of M, as they have different temporal constraints and hence may lead to a temporally consistent child.
  ii. If one of the children of M, transitively, needs to be re-expanded. In this case, we must consider other member states of M, as expanding them will add new members to the children of M, supporting their re-expansion.

d) Search attempts to avoid explicitly generating all the member states of a metastate. New members are only explicitly generated under one of two conditions:
  i. If search reaches a state S that is weakly equal to the states in an existing metastate M, S is added to M.
  ii. If a metastate needs to be re-expanded, c(i), but all the members of M have already been expanded, a traversal back via the parents of M (M.Π) is used to generate additional member states for M.

[Figure 2: Example metastate spaces, with the order in which metastates were expanded, and their open lists. The letter denotes the truck location; the number is |M.Σ|.]
If no new members can be found, the c_re values of M's ancestors are incremented, so that if they acquire new members they are re-expanded as per c(ii).

To illustrate why re-expansion of metastates, or generation of new metastate members, is sometimes necessary, two example metastate spaces are depicted in Figure 2. These are based on the Driverlog Shift example (Figure 1), where the task is to reach location E and unload a package. For simplicity we assume that in metastate A (where search begins in this example) the package and driver are already in the truck and the shift action has started. Further, we only show states reachable by applying move actions: the letter representing each metastate corresponds to the location of the truck. Each state space demonstrates a different search scenario.

Algorithm 1: Memoise
Data: memoised, a set of all metastates generated; S, a new state; ⟨MP, a⟩, the metastate parent of S and the action applied to reach it
Result: enqueue, a list of metastates to subsequently enqueue
1 if ∃M ∈ memoised | M.σ0 ≈ S then
2   return Add Member To Meta State(M, S, ⟨MP, a⟩)
4 M ← ⟨[S], [⟨MP, a⟩], [], ⊥, 0, ⊤, 0⟩;
5 memoised ← memoised ∪ {M};
6 return [M]

In Figure 2(a), the metastate C has two members, corresponding to the states reached by plans ABC and ADC. Metastate E, however, has only one member: when C was expanded, only its first member (reached via ABC) was used, generating one state in E for the plan ABCE. The metastate E is then popped from the open list and expanded. The unload action needed here cannot be applied, due to the shift action constraining how much time can pass: this is because the long route to E, via B, was taken. E must now be re-expanded (c(i)), but it has no other member states. Thus, a traversal back via the parents of E attempts to generate additional members for E (d(ii)).
This traversal finds the second member state of C (reached via ADC), appends move CE to the plan to yield another state weakly equal to the existing member of E, and then enqueues E for re-expansion. This state is then expanded and successfully reaches the goal in time.

In Figure 2(b), B has been expanded, but D has not. Thus, C has only one member, and when E is unsuccessfully expanded, a traversal back via its parents (d(ii)) cannot generate any additional members, as they all have only one member state. Hence, the c_re values in E's ancestors (C, B and A) are incremented, to note that they have a child that needs to be re-expanded. When D is popped from the open list, the state generated (ADC) becomes an additional member of C (d(i)). C is then put on the open list, as C.c_re > 0 (c(ii)); C is re-expanded, generating a new member for E (ADCE) and putting E on the open list, as E.re = ⊤ (c(i)); E is re-expanded using this new member, leading to the successful application of unload.

3.5 Searching with Metastates: Algorithms

Algorithm 1 defines how state memoisation is performed with reference to the metastate space. For each state S encountered in search, a check is first made at line 1 to see if there is an existing metastate whose members are weakly equal to S. If there is, Add Member To Meta State is used to update the metastate. In the simpler case, where there is no such metastate, a new one is created, containing just S.

Algorithm 2 deals with adding a new member S to a metastate M. At line 1 we check whether an existing member is strongly equal to S: if so, S does not need to be kept. Otherwise, it is added to the members of the metastate.
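Algorithms 1 and 2 can be rendered as Python. This is a simplified sketch: states are (f, v, Q) tuples, strong equality is approximated by tuple identity, and weak equality compares f, v and the multiset of executing actions; the real planner uses Definitions 3.2 and 3.3 in full:

```python
from dataclasses import dataclass, field

@dataclass
class Metastate:
    Sigma: list                               # member states
    Pi: list                                  # [(parent, action)]
    Gamma: list = field(default_factory=list) # [(child, action)]
    q: bool = False                           # queued?
    ex: int = 0                               # members expanded
    re: bool = True                           # needs (re-)expansion
    c_re: int = 0                             # needy children

def weakly_equal(s1, s2):
    (f1, v1, Q1), (f2, v2, Q2) = s1, s2
    return f1 == f2 and v1 == v2 and sorted(Q1) == sorted(Q2)

def add_member(M, S, parent, action):
    """Algorithm 2: keep S unless strongly equal to an existing member."""
    if any(S == S2 for S2 in M.Sigma):        # line 1 (stand-in strong equality)
        return []
    M.Sigma.append(S)                         # line 2
    if (parent, action) not in M.Pi:          # line 3
        M.Pi.append((parent, action))
    if M.q:                                   # line 4: already queued
        return []
    if M.re or M.c_re > 0:                    # line 5
        M.ex = len(M.Sigma) - 1; M.q = True   # line 6: expand the new member
        return [M]
    return []

def memoise(memoised, S, parent, action):
    """Algorithm 1: route S to a weakly-equal metastate, or make a new one."""
    for M in memoised:
        if weakly_equal(M.Sigma[0], S):       # line 1 (transitivity: σ0 suffices)
            return add_member(M, S, parent, action)
    M = Metastate(Sigma=[S], Pi=[(parent, action)])
    memoised.append(M)
    return [M]

memoised = []
# Same executing actions, recorded in a different order in Q:
at_C_via_B = (frozenset({"at-C"}), (), ("shift", "load"))
at_C_via_D = (frozenset({"at-C"}), (), ("load", "shift"))
memoise(memoised, at_C_via_B, None, "move-BC")
memoise(memoised, at_C_via_D, None, "move-DC")
assert len(memoised) == 1 and len(memoised[0].Sigma) == 2
```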
As our aim is to avoid redundant expansion of the search space, when a new member has been added to a metastate, M.q is checked to see if M is already queued for expansion: if it is, it need not be queued again. Otherwise, the metastate is returned to be enqueued only if one of two conditions holds (c(i)/(ii), Section 3.4):

- If M.re, then the last expansion of M (i.e. an expansion based on a pre-existing member state) did not lead to a temporally consistent child for each logically applicable action. In this case, expanding M again will derive children from the new member state.
- If M.c_re > 0, then the last expansion of M did lead to a temporally consistent child for each logically applicable action, but one or more of these, when expanded, either did not lead to a temporally consistent child for each logically applicable action, or is the ancestor of such a metastate. In this case, because there is a new member for M, expanding it again will yield new members for its children.

Algorithm 2: Add Member To Meta State
Data: M, a metastate; S, a new weakly-equal member state; ⟨MP, a⟩, the metastate parent of S and the action applied to reach it
Result: enqueue, a list of zero or one metastates to subsequently enqueue
1 if ∃S′ ∈ M.Σ | S′ ≡ S then return [];
2 M.Σ ← M.Σ + [S];
3 if ⟨MP, a⟩ ∉ M.Π then M.Π ← M.Π + [⟨MP, a⟩];
4 if M.q then return [];
5 if M.re ∨ M.c_re > 0 then
6   M.ex ← |M.Σ| − 1; M.q ← ⊤; return [M];
7 return []

Algorithm 3 presents search itself: a priority-queue-based search using a heuristic function. We use WA* with W=5, and a temporal RPG heuristic, as in OPTIC. At its core (lines 21-26) is an ordinary forward-search expansion loop, generating one successor per applicable action and keeping it if it is returned by the Memoise function (Algorithm 1). There are three possible reasons a metastate M could be on the open list: M.re = ⊤, i.e. either
if the successors of the previous expansion of M were not all temporally consistent (Section 3.4 c(i)), or if M has not yet been expanded. The forward-search expansion loop described above is used to expand the metastate in these first two cases. The final case is when M.c_re > 0 (Section 3.4 d(ii)): one or more of M's children needs to be re-expanded but needs additional member states before this can occur, and by re-expanding M we will generate an additional member for this child. This is handled by a selective state expansion that only generates member states for the child metastates that need them (lines 9-18). If M.re = ⊥, this selective loop alone is sufficient, as the re and c_re values in the child metastates record which need additional members: child metastates reached by the other applicable actions can be ignored.

After expanding the current member S of the current metastate M, lines 27-30 determine whether M then needs to be added to the open list again for re-expansion. This occurs either because, when M was expanded, some of its successors were not temporally consistent (i.e. M.re was changed back to ⊤ at line 26) (Section 3.4 c(i)); or because it has at least one child that would still benefit from an additional member state being generated (Section 3.4 c(ii)).

Algorithm 3: Search
Data: the initial state I
Result: a solution plan
1 IM ← ⟨[I], [], [], ⊥, 0, ⊤, 0⟩;
2 memoised ← {IM};
4 while Q is not empty do
5   M ← pop the next metastate from Q;
6   S ← M.σM.ex;
7   M.q ← ⊥; M.ex ← M.ex + 1;
8   enqueue ← [];
9   if M.c_re > 0 then
10    foreach ⟨Mi, ai⟩ ∈ M.Γ do
11      if Mi.q then continue;
12      if Mi.re ∨ Mi.c_re > 0 then
13        Si ← apply ai to S;
14        if Si.T is temporally consistent ∧ ¬∃S′ ∈ Mi.Σ | S′ ≡ Si then
15          Mi.Σ ← Mi.Σ + [Si];
17          enqueue ← enqueue + [Mi];
18          M.c_re ← M.c_re − 1;
19  if M.re then
20    M.re ← ⊥;
21    foreach a that is logically applicable in S do
22      S′ ← apply a in S;
23      if S′.T is temporally consistent then
24        if S′ is a goal state then return S′;
25        enqueue ← enqueue + Memoise(memoised, S′, ⟨M, a⟩);
26      else M.re ← ⊤;
27  if M.re ∨ (M.c_re > 0) then
28    if M.ex = |M.Σ| then M.Σ ← M.Σ + Find Another Member(M, M, []);
29    enqueue ← enqueue + [M];
31  foreach M′ ∈ enqueue do
32    while M′.ex < |M′.Σ| do
33      hM′ ← heuristic evaluation of M′.σM′.ex;
34      if hM′ ≠ ∞ then
35        insert M′ into Q with h-value hM′;
36        break;
37      M′.ex ← M′.ex + 1;
38      if M′.ex = |M′.Σ| then M′.Σ ← M′.Σ + Find Another Member(M′, M′, []);
39      if M′.ex = |M′.Σ| then
41        Increment Parent CREValues(M′);
42 return problem unsolvable

The complication is what to do when there are no more members of M left for expansion, i.e. when M.ex = |M.Σ| (Section 3.4 d(ii)). In this case, a helper function traverses back from M to its parents, to generate additional candidate members of M from the members of its parents. This is presented in Algorithm 4: it recursively generates candidate plans for reaching M from a parent and, if one is found that is temporally consistent and not strongly equal to an existing member of M, it is returned to be added as a member of M.

Algorithm 4: Find Another Member
Data: MC, a metastate; MT, a metastate; tail, the plan from MC to MT
Result: enqueue, a list of zero or one metastates to subsequently enqueue
1 foreach ⟨Mi, ai⟩ ∈ MC.Π do
2   foreach S ∈ Mi.Σ do
3     S′ ← apply ([ai] + tail) to S;
4     if S′.T is temporally consistent ∧ ¬∃S′′ ∈ MT.Σ | S′′ ≡ S′ then
5       MT.Σ ← MT.Σ + [S′];
7       return [MT];
8   rc ← Find Another Member(Mi, MT, [ai] + tail);
9   if rc ≠ [] then return rc;
10 return []

This corresponds to the case in Figure 2(a): when the metastate E is expanded for the first time, unload cannot be applied without violating the temporal constraints; thus, Find Another Member is used to look for another member state for E. Traversing back to its parent, C, and looping over its two members (ABC and ADC), extending the first (to yield ABCE) would duplicate the existing member of E, but extending the second (to yield ADCE) produces a new member for E, supporting its re-expansion.
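The remaining mechanism, for the Figure 2(b) scenario where no new member can be found right away, is the upward propagation of c_re values. A sketch of that propagation (the Metastate stand-in carries only the fields the sketch needs):

```python
class Metastate:
    """Stand-in with just the parent list and the c_re counter."""
    def __init__(self, parents=()):
        self.Pi = [(p, None) for p in parents]   # (parent metastate, action)
        self.c_re = 0

def increment_parent_cre(M):
    """Mark every ancestor of M as having a (transitive) child that is
    awaiting an additional member, recursing only on the first such child."""
    for (Mi, ai) in M.Pi:
        if Mi.c_re == 0:          # first needy descendant seen: propagate upward
            increment_parent_cre(Mi)
        Mi.c_re += 1

# Chain A -> B -> C -> E, as in Figure 2(b): E needs a member it cannot get.
A = Metastate(); B = Metastate([A]); C = Metastate([B]); E = Metastate([C])
increment_parent_cre(E)
assert (A.c_re, B.c_re, C.c_re, E.c_re) == (1, 1, 1, 0)
```

When D later reaches C with a new member, the selective expansion loop (lines 9-18 of Algorithm 3) uses these counters to push a member down to E and decrement them again.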
Of course, Find Another Member cannot always generate a new member, in which case it returns an empty list. The final loop in Algorithm 3, at lines 31-41, considers each metastate M′ that is to be enqueued on Q. In the nominal case, a member state M′.σM′.ex is heuristically evaluated, and its heuristic value is used when placing M′ in Q. There are two caveats, though. First, if M′.σM′.ex is a dead end, one of the other members of M′ (if there is one) may not be; hence, M′.ex is incremented, to allow further heuristic evaluation. As this may exhaust the generated members of M′, we again see a call to Algorithm 4 to attempt to generate another member state. Second, if no additional members can be generated, it may be that search later finds another plan reaching one of the parents of M′, from which it is possible to generate another member of M′. To preserve completeness, it is important that this can happen. This is achieved through the c_re values, as updated by Algorithm 5: these count how many of the immediate children of a metastate are awaiting an additional member. Referring back to Algorithm 3, lines 9-18 then use these values to selectively push additional member states down to the appropriate children until, in turn, the new member for M′ is generated.

Algorithm 5: Increment Parent CREValues
Data: M′, a metastate
1 foreach ⟨Mi, ai⟩ ∈ M′.Π do
2   if Mi.c_re = 0 then Increment Parent CREValues(Mi);
3   Mi.c_re ← Mi.c_re + 1;

Finally, we sketch a proof of the completeness of our algorithm. Trivially, any goal states generated are returned (Algorithm 3, line 24). Thus, to be incomplete, there must be a state S from which applying a sequence of snap-actions tail = [a0..an] was the only way to reach a goal state, but S was never expanded. In the simple case, S is the first member of a metastate M; as all metastates are queued for expansion at least once, S would have been expanded.
Otherwise, S must have been added as a member of an existing metastate M. In this case, because applying tail to S is posited as being the only way to reach a goal state, there must be a subsequence tail_sub = [a0..ai], i ≤ n, of the tail snap-actions that, when applied to any other S′ ∈ M.Σ, leads to an inconsistent state. Then, either (i) S ∈ M.Σ when this occurred, so Algorithm 4 would be used to apply tail_sub to S; or (ii) S ∉ M.Σ when this occurred, so Algorithm 5 would increment M.c_re, causing M (and hence S) to be expanded later, as soon as S ∈ M.Σ (Algorithm 2 line 5). Thus, regardless of when S was added to M.Σ, tail will be applied to it, and hence the goal would be reached.

4 Evaluation

To evaluate our metastate search algorithm, we use as a control the state memoisation techniques of Coles and Coles (2016), also implemented in OPTIC. In both cases we use WA* search (with W=5) guided by OPTIC's TRPG heuristic. For evaluation domains, we take International Planning Competition benchmarks and temporally interesting domains from the literature. We exclude any domains where all actions are compression safe. In these domains, during search there will never be any states with open actions; hence, as discussed in Coles and Coles (2016), memoising based on the classical planning notion of state equality is completeness preserving. Moreover, we would only have one member per metastate, as states with no open actions are strongly equal to other states with the same facts and variables, so our search would behave exactly as the control. This leaves 10 domains with required concurrency (Cushing et al. 2007): Cafe and Driverlog Shift (Coles et al. 2009); P2P (Huang et al. 2009); TMS and Turn and Open (IPC 2011); and the compiled time-windows/deadlines variants of Pipes No-Tankage, Satellite and UMTS (IPC 2004); plus 5 other IPC domains: Rovers (2002); Pipes Tankage (2004); and Transport, Elevators and Openstacks (2008).
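The distinction between the two equality notions used above can be made concrete. Below is a minimal sketch, under an assumed state representation (facts, variable values, temporal constraints), of aggregating states into metastates keyed on weak equality, with strong equality used to discard true duplicates; all names are illustrative, not OPTIC's internals.

```python
# Hedged sketch: weak equality = same facts and variable values;
# strong equality = weak equality plus identical temporal constraints.
# Weakly-but-not-strongly equal states become members of one metastate.

def weak_key(state):
    """Classical-planning identity: facts and variable values only."""
    facts, values, _temporal = state
    return (frozenset(facts), tuple(sorted(values.items())))

def record(metastates, state):
    """Insert state; return 'new', 'member', or 'duplicate'."""
    key = weak_key(state)
    if key not in metastates:
        metastates[key] = [state]      # first classically unique state
        return 'new'
    _f, _v, temporal = state
    if any(m[2] == temporal for m in metastates[key]):
        return 'duplicate'             # strongly equal: prune outright
    metastates[key].append(state)      # weakly equal: extra member
    return 'member'
```

This also shows why compression-safe domains gain nothing from metastates: with no open actions, weakly equal states are always strongly equal, so every metastate has exactly one member and the scheme degenerates to classical memoisation.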
An overview of the performance of our algorithm compared to the control is shown in Figures 3 and 4.

[Figure 3: Scatterplot comparing nodes expanded (10–100,000), with metastates vs. without. Domains, with the percentage of generated states that had open actions in brackets: Cafe (99%), Driverlog Shift (92%), Match (98%), TMS (99%), P2P (99%), Turn and Open (1%), Pipes Deadlines Compiled (97%), Satellite TW Compiled (21%), UMTS TW Compiled (97%), Elevators (64%), Openstacks (51%), Pipes Tankage (78%), Rovers (7%), Transport (45%)]

[Figure 4: Scatterplot comparing time taken (0.01–1000 s), with metastates vs. without: Cafe, Driverlog Shift, Match*, TMS, P2P*, Turn and Open*, Pipes Deadlines Compiled, Satellite TW Compiled, UMTS TW Compiled, Elevators, Openstacks*, Pipes Tankage, Rovers*, Transport*]

Where one configuration failed to solve a problem, this is shown as 100,000 nodes or 1800 seconds. Points below the plotted line y=x indicate that using metastates improved the performance of the planner; those above, that it was worsened. In total, 194 problems were solved by at least one configuration; in 73 of these the planner using metastates expanded fewer nodes, and only in 6 did it expand more. The only reason the planner without metastates might expand fewer nodes is that the expansion order of our search is not the same as unmodified WA*, and perturbing the expansion order means that the control might happen to reach the goal first. The results for time taken to solve problems closely mirror those for nodes expanded, indicating that the overheads of the bookkeeping required for metastates are minimal. In terms of coverage, using metastates we can solve 189 problem instances; without, we solve only 164. A two-tailed Wilcoxon signed-rank test confirms, with P > 0.99, that using metastates out-performs the control, both in terms of nodes expanded and time taken to solve problems.
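The paired, two-tailed Wilcoxon signed-rank comparison used above is easy to reproduce; here is a self-contained sketch using the normal approximation (reasonable at the roughly 190 paired samples reported). The data in the accompanying check is invented for illustration, not the paper's results.

```python
# Two-tailed Wilcoxon signed-rank test, normal approximation.
import math

def wilcoxon_signed_rank(xs, ys):
    """Return (W, p) for paired samples xs, ys.

    Zero differences are discarded; tied |d| values receive the
    average of the ranks they span; p uses the normal approximation.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    # rank |d| ascending, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 2) / 2.0          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-tailed p-value
    return w, p
```

In practice one would reach for `scipy.stats.wilcoxon`, which additionally offers exact p-values for small samples; the sketch above keeps the computation visible and dependency-free.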
Of our 14 domains we see a performance improvement in 9, while performance in the other 5 remains stable. We can gain greater insight by examining the domains more closely. The numbers in round brackets in Figure 3 show the percentage of states generated by the control planner that had open (currently executing) actions. Recall that it suffices to use weak equality in states without open actions; both planners exploit this. Therefore, if there are few states with open actions, the potential of our technique to improve search is limited. This explains why we see relatively few gains in the Rovers and Turn and Open domains; but, again, we also see no significant performance degradation: the number of states generated was identical to the control, and both planners produced the same plan. Perhaps surprisingly, performance also remains the same on two of the domains with required concurrency, Match and P2P. In each of these domains, states with open actions arise due to an envelope action (light match/serve file) inside which other actions (mend fuse/download) must fit. However, whilst the envelope action is open, the heuristic easily guides search to perform the relevant activities; very little search is therefore required to solve these problems, and so the potential gains are limited. Another domain with similar results is Openstacks, as the benchmark set for this problem is trivial and requires little to no search. The remaining 9 domains, where we do see performance improvements, are generally the more challenging ones: those in which the planner without metastates expands a large number of weakly equal states when any one of them would allow it to reach a goal state. Many of these problems are solved an order of magnitude faster with metastates, thanks to the reduced size of the metastate search space.
A final note on solution quality: the planners found plans with the same makespan in 154 out of the 161 problems that were mutually solved; in the remaining 7, the solutions produced using metastates were slightly longer.

5 Conclusions

In this paper, we presented a novel search framework based on a notion of weak equality between the states reached, which allows them to be grouped together into metastates. The motivation is that, typically, only one member of each metastate needs to be expanded. As this is not always the case, additional members are expanded on an as-needed basis, where inconsistent states are reached. In the context of temporal planning, where weak equality corresponds to states that differ only in terms of their temporal constraints, our approach reduces the overheads of searching with temporal models. In temporally simple problems, where there cannot be inconsistent states, metastate re-expansion is never needed. In temporally complex problems, some but not all metastates need to be re-expanded; doing this on an as-needed basis allows us to achieve significant performance improvements.

Our approach separates logical state information from constraint-based temporal information. The latter represents relationships that must hold among actions in the current partial plan, and hence implicitly defines a set of states and a set of plans. Such a constraint representation need not be limited to temporal relations. In the future, we hope to apply our approach to other expressive planning models, including hybrid numeric planning, PDDL3 preference models (Gerevini et al. 2009), and planning with richer temporal relations such as interval constraints (Coles et al. 2019).

Acknowledgments

This project has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No.
730086 (ERGO); from EPSRC grant EP/P008410/1 (AI Planning with Continuous Non-Linear Change); and from the Natural Sciences and Engineering Research Council of Canada.

References

Coles, A. J., and Coles, A. I. 2016. Have I Been Here Before? State Memoisation in Temporal Planning. In Proceedings of ICAPS.
Coles, A.; Fox, M.; Halsey, K.; Long, D.; and Smith, A. 2009. Managing concurrency in temporal planning using planner-scheduler interaction. Artificial Intelligence 173(1).
Coles, A.; Coles, A.; Martinez, M.; Savas, E.; Delfa, J. M.; de la Rosa, T.; E-Martínez, Y.; and García-Olaya, A. 2019. Efficiently Reasoning with Interval Constraints in Forward Search Planning. In Proceedings of AAAI.
Cushing, W.; Kambhampati, S.; Mausam; and Weld, D. 2007. When is temporal planning really temporal planning? In Proceedings of IJCAI.
Dechter, R.; Meiri, I.; and Pearl, J. 1991. Temporal constraint networks. Artificial Intelligence 49.
Do, M. B., and Kambhampati, S. 2003. Sapa: Multi-objective Heuristic Metric Temporal Planner. Journal of Artificial Intelligence Research (JAIR) 20.
Eyerich, P.; Mattmüller, R.; and Röger, G. 2009. Using the Context-enhanced Additive Heuristic for Temporal and Numeric Planning. In Proceedings of ICAPS.
Fox, M., and Long, D. 2003. PDDL2.1: An extension of PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research (JAIR) 20.
Gerevini, A. E.; Long, D.; Haslum, P.; Saetti, A.; and Dimopoulos, Y. 2009. Deterministic Planning in the Fifth International Planning Competition: PDDL3 and Experimental Evaluation of the Planners. Artificial Intelligence.
Hoffmann, J., and Edelkamp, S. 2005. The Deterministic Part of IPC-4: An Overview. Journal of Artificial Intelligence Research (JAIR) 24.
Huang, R.; Chen, Y.; and Zhang, W. 2009. An Optimal Temporally Expressive Planner: Initial Results and Application to P2P Network Optimization. In Proceedings of ICAPS.
Linares López, C.; Celorrio, S. J.; and Olaya, A. G. 2015.
The deterministic part of the seventh international planning competition. Artificial Intelligence 223.
Long, D., and Fox, M. 2003. The 3rd International Planning Competition: Results and Analysis. Journal of Artificial Intelligence Research (JAIR) 20.
Rintanen, J. 2007. Complexity of concurrent temporal planning. In Proceedings of ICAPS.