# optimising_partialorder_plans_via_action_reinstantiation__1b04ae26.pdf

Optimising Partial-Order Plans Via Action Reinstantiation

Max Waters , Lin Padgham and Sebastian Sardina RMIT University, Melbourne, Victoria 3000, Australia {max.waters, lin.padgham, sebastian.sardina}@rmit.edu.au

This work investigates the problem of optimising a partial-order plan s (POP) ﬂexibility through the simultaneous transformation of its action ordering and variable binding constraints. While the former has been extensively studied through the notions of deordering and reordering, the latter has received much less attention. We show that a plan s variable bindings are often related to resource usage and their reinstantiation can yield more ﬂexible plans. To do so, we extend existing POP optimality criteria to support variable reinstantiation, and prove that checking if a plan can be optimised further is NP-complete. We also propose a MAXSAT-based technique for increasing plan ﬂexibility and provide a thorough experimental evaluation that suggests that there are beneﬁts in action reinstantiation.

1 Introduction This work investigates the modiﬁcation of objects and resources in a plan to achieve higher ﬂexibility w.r.t. action orderings. While the optimisation of action ordering has been extensively studied through the notions of plan deordering and plan reordering, both from a theoretical [B ackstr om, 1998; Aghighi and B ackstr om, 2017] and practical [Kambhampati and Kedar, 1994; Muise et al., 2016; Siddiqui and Haslum, 2012; Say et al., 2016] perspective, the optimisation of resources (or actions domain objects) that are used in the course of executing a plan has received much less attention. Ultimately, the aim is to achieve a least commitment approach to planning [Weld, 1994] that maximises executiontime ﬂexibility by delaying decisions regarding action ordering and resource utilisation for as long as possible. Consider a planning instance from (a reduced version of) the IPC rovers domain, in which a ﬂeet of rovers must navigate the surface of a planet, collecting soil and rock samples. The domain objects comprise two rovers (R1 and R2) and three waypoints (W1, W2, and W3). There are soil and rock samples at waypoints W2 and W3, resp., and both rovers begin at W1. The goal is to gather both samples. Plan P1 in Figure 1a is an optimal solution: R1 navigates from W1 to W2 and collects the soil sample, then navigates to W3 and collects the rock sample. An examination of the causal structure of P1 shows that

1. move1(R1, W1, W2) 2. get-soil(R1, W2) 3. move2(R1, W2, W3) 4. get-rock(R1, W3)

(a) Plan P1.

1. move1(R1, W1, W2) 2. get-soil(R1, W2) 3. move2(R2, W1, W3) 4. get-rock(R2, W3)

(b) Plan P2.

O = { move1(x1, x2, x3) θ = { x1\R1, x2\W1, x3\W2 get-soil(x4, x5) x4\R1, x5\W2 move2(x6, x7, x8) x6\R2, x7\W1, x8\W3 get-rock(x9, x10) } x9\R2, x10\W3 } = { move1 get-soil, move2 get-rock }

(c) Partial-order plan P3.

Figure 1: Three plans from the rovers domain. Subscripts have been used to better distinguish between actions of the same type.

the order of its actions cannot be altered without compromising its validity (e.g., step 3 threatens the causal link between steps 1 and 2, as it moves R1 away from W2). Plan P2 in Figure 1b is a modiﬁcation of P1 that instead uses rover R2 to collect the rock sample. This modiﬁcation allows the plan s ordering constraints to be relaxed: the actions can be executed in any order so long as 1 2 and 3 4. This simple example shows that an over-commitment to a particular set of resources (in this case, R1) can reduce a plan s possible orderings. Thus, this paper introduces the notions of reinstantiated deorderings and reinstantiated reorderings, namely, transformations of a plan under which ordering constraints can be removed or arbitrarily modiﬁed, resp., and the plan s actions variable bindings can be changed. The idea is to seek alternative variable bindings supporting greater minimisation of the plan s ordering constraints than is possible if the bindings remained unchanged. More concretely, we deﬁne optimality criteria for partialorder plans (POP) that take variable bindings into consideration (Section 3) and propose a technique for computing optimally reinstantiated POPs by encoding the problem into a MAXSAT instance (Section 4). Finally, we experimentally assess the beneﬁts of reinstantiation by comparing it with existing de/reordering techniques (Section 5). Results show that optimising both bindings and orderings incurs a much greater computational cost than optimising orderings alone. However, when (even non-optimal) reinstantiated de/reorderings can be found, they are signiﬁcantly more ﬂexible than plans produced by standard optimisation methods.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

2 Preliminaries

2.1 Logical Preliminaries The logical structures will be expressed in a function-free ﬁrst-order language L with ﬁnitely many variable, constant, and predicate symbols, and the usual logical connectives, including equality and precedence [Brachman and Levesque, 2004, Chapter 2]. The letters x, y and z indicate variables, c constants and t, u and v terms (all possibly with annotations). The notation t denotes ordered lists of possibly nonunique terms, with t[i] indicating the i-th element of the list and t1 = t2 being shorthand for | t1 | = | t2 | t1[1] = t2[1] t1[| t1 |] = t2[| t2 |]. For any structure η expressed in L, vars(η) and consts(η) denote the variables and constants appearing in η, resp. A substitution θ is a mapping from variables to terms, for example θ = {x1/t1, . . . , xn/tn} maps each variable xi to term ti, for 1 i n, and every other variable to itself. We use domain(θ) to denote the set of variables explicitly mapped by substitution θ, and θ(x) to denote the term corresponding to variable x under θ. The result of applying a substitution θ to a logical structure η is written ηθ, and means to simultaneously replace every variable x in η with θ(x). A substitution θ is ground if every variable in its domain is mapped to a ground term, and is complete w.r.t. η iff vars(η) domain(θ).

2.2 Classical Planning Formalism Classical planning tasks will be expressed in a standard ﬁrstorder STRIPS formalism [Lifschitz, 1987].

Classical Planning Tasks A classical planning task is a tuple Π = C, O, s I, s G , where C is a set of constants, O is a set of operators, and s I and s G are sets of ground literals describing the initial state and goal, respectively. An operator is a tuple o = name(o), vars(o), pre(o), post(o) , where name(o) is the name, or type of the operator, vars(o) is a list of variables and pre(o) and post(o) are ﬁnite sets of (ground or nonground) literals with variables taken from vars(o). It will be assumed that for all operators, name(o) = o, and so operators will be referred to by name. When distinguishing between different operators of the same name, the notation o( x) is used, where vars(o) = x. An action is a ground operator a = pre(a), post(a) , where pre(a) and post(a) are ﬁnite sets of ground literals. If o is an operator and substitution θ is ground and complete w.r.t. vars(o), then pre(o)θ, post(o)θ is the action resulting from instantiating o with θ.

Classical and Partial-Order Plans A classical plan a is a ﬁnite sequence of actions. Assuming that no variable appears in more than one operator, a plan can also be represented as oθ, where o is a list of operators and θ is a ground substitution that is complete w.r.t. o. A planning task can be embedded into a plan by including the distinguished actions a I and a G whose pre/postconditions are the task s initial state and goal, resp. In this form, a classical plan is valid iff it is executable (i.e., the actions can be applied in sequence without violating their precondition requirements).

A partial-order plan (POP) is a generalised classical plan that need not completely deﬁne the order of its operators: Deﬁnition 1. A partial-order plan (POP) is a tuple P = O, θ, where O is a set of operators, θ is a ground substitution which is complete w.r.t. to O, and is a strict, transitively closed partial order over O. As with classical plans, a planning task is embedded into a POP via the inclusion of the distinguished parameter-free operators o I (which yields the initial state) and o G (which checks the goals). A POP is a compact representation of a set of classical plans, known as its linearisations: Deﬁnition 2. A classical plan o1, . . . , on θ is a linearisation of POP P = O, θ, iff O = {o1, . . . , on} and {ai aj : 1 i < j n}. A POP s validity is deﬁned w.r.t. its linearisations: Deﬁnition 3. A POP is valid iff all of its linearisations are executable. Classical plans are special cases of POPs, and any classical plan o1, . . . , on θ can be expressed as an equivalent partialorder plan {o1, . . . , on}, θ, {ai aj : 1 i < j n} .

The Producer-Consumer-Threat Formalism The producer-consumer-threat formalism (PCT) [B ackstr om, 1998] is typically used to describe a POP s causal structure by identifying which actions produce, consume or threaten which ground literals. Here, it describes how operators produce, consume or threaten (possibly non-ground) literals:

Deﬁnition 4. Let o be an operator and q( t) a literal. Then:

prods(o, q( t)) def = q( t) post(o).

cons(o, q( t)) def = q( t) pre(o).

thrtns(o, q( t)) def = q( t) post(o).

In the ﬁeld of partial-order causal link planning (POCL), a PCT-based notion of POP validity is used that derives from the implied causal dependencies between a POP s operators [Weld, 1994], not the validity of its linearisations. A causal link associates a consumer, (i.e., an operator precondition), with a producer (i.e., an operator postcondition):

Deﬁnition 5. A causal link is a 4-tuple op, q( t), oc, q( u) s.t. prods(op, q( t)) and cons(oc, q( u)). The consumer is supported by the causal link iff it is preceded by, and codesignated with, the producer (i.e., θ( t) = θ( u)). The causal link is threatened iff a codesignated threat (i.e., some ot, q( v) s.t. thrtns(ot, q( v)) and θ( v) = θ( u)) can be ordered between the producer and consumer. A POP P s implicit unthreatened causal links are denoted LP , and deﬁned as follows: Deﬁnition 6. The unthreatened causal links of a POP P = O, θ, is the set LP s.t. op, q( t), oc, q( u) LP iff: 1. op oc,

2. θ( t) = θ( u), and 3. for all ot, q( v) s.t. thrtns(ot, q( v)), either θ( u) = θ( v), ot op or oc ot.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

A POP P is POCL-valid iff every consumer is supported by an unthreatened causal link:

Deﬁnition 7. A POP P = O, θ, is POCL-valid iff for all oc O and q( u) s.t. cons(oc, q( u)), there exist op O and q( t) s.t. op, q( t), oc, q( u) LP .

This notion of validity is stronger than that in Deﬁnition 3, meaning that a POP might not be POCL-valid despite its linearisations being executable [Kambhampati and Nau, 1996]. However, any such POP can always be made POCL-valid by adding ordering constraints [Bercher and Olz, 2020].

2.3 Partial Weighted MAXSAT The partial weighted maximum satisﬁability (partial weighted MAXSAT) problem is a generalised optimisation variant of SAT that distinguishes between soft clauses, which are assigned a numeric weight, and hard clauses, which are unweighted. The aim of the partial weighted MAXSAT problem is to ﬁnd an interpretation that satisﬁes all hard clauses and maximises the total weight of the satisﬁed soft clauses. The

syntax k φ indicates that clause φ has weight k, and no weight marking indicates a hard clause. For example, the formula 3 (p1 p2) 2 p1 p2 is trivially unsatisﬁable, but the interpretation {p1/ , p2/ } is an optimal solution to the partial weighted MAXSAT problem.

3 Optimality Criteria for Partial-Order Plans

B ackstr om [1998] s seminal work on action ordering and plan ﬂexibility considers two types of modiﬁcation: deordering, in which constraints can be removed but not added, and reordering, which allows both. Both must result in a valid POP:

Deﬁnition 8. Let P = O, θ, and Q = O, θ, be two POPs. Then:

Q is a reordering of P iff they are both valid.

Q is a deordering of P iff it is a reordering and .

Q is a strict deordering of P iff it is a deordering and .

B ackstr om deﬁnes the relative optimality of two POPs by comparing their ordering constraints. A minimal deordering of a POP cannot be relaxed any further without rendering the POP invalid, and a minimum deordering is the smallest of all valid deorderings of a POP. A minimum reordering has the fewest possible ordering constraints while remaining valid:

Deﬁnition 9. Let P = O, θ, and Q = O, θ, be two POPs. Then:

Q is a minimal deordering of P iff Q is a deordering of P and there is no POP R such that R is a strict deordering of Q.

Q is a minimum deordering of P iff Q is a deordering of P and there is no POP R = O, θ, such that R is a deordering of P and | | < | |.

Q is a minimum reordering of P iff Q is a reordering of P and there is no POP R = O, θ, such that R is a reordering of P and | | < | |.

While a minimal deordering of a given POP can be found in polynomial time, deciding whether there exists a de/reordering with fewer than k ordering constraints is NPcomplete, and ﬁnding a minimum de/reordering is NP-hard and cannot be approximated within a constant factor. A key limitation of the optimality criteria above is the assumption that the POP s variable bindings remain static. We thus deﬁne more generalised criteria that accommodate modiﬁcations to a POP s variable bindings. In all cases, the resulting POP must remain both ground (i.e., its variable bindings must be complete and ground w.r.t. O) and valid: Deﬁnition 10. Let P = O, θ, and Q = O, θ , be two POPs. Then: Q is a reinstantiated reordering of P iff both P and Q are valid. Q is a reinstantiated deordering of P iff Q is a reinstantiated reordering of P and . Q is a reinstantiated strict deordering of P iff Q is a reinstantiated deordering of P and . For example, plans P1 P3 in Figure 1 use the same operators, and so they are reinstantiated reorderings of each other, whereas P3 is a reinstantiated strict deordering of P1 and P2. The notion of a least-constrained POP can now be generalised to allow for changes in variable bindings. A minimal reinstantiated deordering of a POP is a reinstantiated deordering that cannot be relaxed any further: no modiﬁcation to the POP s variable bindings will allow for the removal of any ordering constraints. A minimum reinstantiated deordering of a POP has the fewest ordering constraints of all of its reinstantiated deorderings. Of all the possible modiﬁcations to a POP s ordering and binding, a minimum reinstantiated reordering contains the fewest ordering constraints: Deﬁnition 11. Let P = O, θ, and Q = O, θ , be two POPs. Then: Q is a minimal reinstantiated deordering of P iff Q is a reinstantiated deordering of P and there is no POP R such that R is a reinstantiated strict deordering of Q. Q is a minimum reinstantiated deordering of P iff Q is a reinstantiated deordering of P and there is no POP R = O, θ , such that R is a reinstantiated deordering of P and | | < | |. Q is a minimum reinstantiated reordering of P iff Q is a reinstantiated reordering of P and there is no POP R = O, θ , such that R is a reinstantiated reordering of P and | | < | |. For example, plan P3 has two ordering constraints, and as there is no reinstantiation of P1 which allows for fewer than this, P3 is a minimum reinstantiated reordering of P1.

Complexity Results Deciding whether a POP has a reinstantiated de/reordering with fewer than k ordering constraints is NP-complete, and the optimisation problem of ﬁnding a minimum reinstantiated de/reordering cannot be approximated within a constant factor (proofs for minimum reinstantiated deordering are provided, those for reordering are a trivial modiﬁcation):

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

Theorem 1. Given a POP P = O, θ, and an integer k > 0, determining whether there exists a POP Q = O, θ , s.t. Q is a reinstantiated deordering (or reordering) of P and | | < k is NP-complete.

Proof. For membership, guess a POP Q = O, θ , and verify in P-time that Q is valid, and | | < k. Hardness is by reduction from the NP-complete [B ackstr om, 1998] decision problem of minimum deordering, which asks whether P has a deorder with < k ordering constraints. Construct P = O , θ, where o O iff there exists a o O s.t. vars(o ) = vars(o), post(o ) = post(o) and pre(o ) = pre(o) {x = θ(x) : x vars(o )}. As the operators in O have preconditions that ﬁx the variable bindings, P has a deorder with < k ordering constraints iff P has a reinstantiated deorder with < k ordering constraints.

Theorem 2. The problem of ﬁnding a minimum reinstantiated deordering (or reordering) of a POP is not in APX unless NP DTIME(npoly log n).

Proof. Proof is by reductio. Assume that minimum reinstantiated deordering is in APX. Then there must be a function, A, that approximates it within a constant factor. As the reduction in the Theorem 1 proof preserves solutions, A also approximates minimum deordering within a constant factor, which is impossible unless NP DTIME(npoly log n) [B ackstr om, 1998].

Optimality Under POCL-validity Under the additional requirement that the input and optimised POPs be POCL-valid, minimum de/reordering remain NPcomplete (a trivial corollary of Theorem 4.8 in [B ackstr om, 1998]). However, due to the stronger requirements of POCLvalidity there exist minimum POCL-valid de/reorderings that can be further optimised while remaining valid under Deﬁnition 3 [Kambhampati and Nau, 1996]. These complexity and completeness results can be trivially extended to minimum reinstantiated POCL-valid de/reordering.

4 Partial Weighted MAXSAT Encoding

The problem of optimising a POP s ordering can be naturally expressed as an instance of the partial weighted MAXSAT problem (Section 2.3). This approach was introduced by Muise et al. [2016], whose MD and MR encodings transform a POP into MAXSAT instances with optimal solutions corresponding to minimum POCL-valid de/reorderings, resp. This section introduces MRD and MRR: generalised encodings that allow the POP to be reinstantiated. Their optimal solutions therefore correspond to minimum POCL-valid reinstantiated de/reorderings, resp. They also optimise MD and MR by breaking symmetries and removing propositions and clauses that are not required to enforce plan validity. We note due to the difﬁculty of expressing Deﬁnition 3 into MAXSAT, all four encodings instead preserve plan validity with the POCL requirements in Deﬁnition 7. While easier to encode, this raises the possiblity that the resulting optimised POPs contain unnecessary ordering constraints.

4.1 The MD and MR Encodings

MD and MR use two types of propositional variables. If P = O, , θ is the input POP, then for each pair of actions a1, a2 Oθ, a proposition pa1 a2 is introduced to indicate that a1 must precede a2 in the resulting POP. For all ac, ap and q such that ac consumes q and ap produces q, a proposition pap,q,ac is introduced to encode the requirement that in the ﬁnal POP, ap be causally linked to ac with respect to q. Formulae 1 3 ensure that the output POP is acyclic and transitively closed with all actions ordered between the initial state and goal:

a pa a. (1)

a1,a2,a3 pa1 a2 pa2 a3 pa1 a3. (2)

a {a I,a G} pa I a pa a G. (3)

Formulae 4 and 5 encode POCL-validity as in Deﬁnition 7:

ap,ac,q: cons(ac,q), prods(ap,q))

at:at =ac, thrtns(at,q)

pat ap pac at . (4)

ac,q: cons(ac,q)

ap:prods(ap,q) pap ac pap,q,ac. (5)

Formula 6 adds a soft unit clause with the negation of each ordering proposition, meaning that higher weight solutions will have fewer ordering constraints:

1 pa1 a2. (6)

Deﬁnition 12. MR encodes a POP into a partial weighted MAXSAT instance through Formulae 1 6.

The MD encoding extends MR with clauses that disallow any ordering constraints that were not in the input plan, thus forcing the resulting POP to be deordering of the input:

(a1,a2) pa1 a2. (7)

Deﬁnition 13. MD encodes a POP into a partial weighted MAXSAT instance through Formulae 1 7.

4.2 The MRD and MRR Encodings

MRD and MRR allow the input POP to be reinstantiated, and so use different variable types to express the allowable rebindings. Three types are used, as per the following sets:

P contains propositions of the form po1 o2, which indicate that o1 o2 in the resulting POP.

Pθ contains propositions of the form pt=u, which encode a requirement that in the ﬁnal POP θ(t) = θ(u).

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

o I p(1) p(2)

a1(x1) p(x1) q(x1)

a2(x2) p(x2) q(x2)

b(x3, x4) q(x3) q(x4) g o G g

Figure 2: Plan P4 from a synthetic domain. Pre/postconditions are to the left/right of operators, resp., and arrows indicate causal links.

PC contains propositions of the form pop, t,oc, u, indicating that there must be some q s.t. causal link op, q( t), oc, q( u) is unthreatened in the ﬁnal POP. Muise et al. [2016] observed that the cubic number of clauses generated by Formula 2 make MD and MR infeasable for large plans. As this is exacerbated in MRD and MRR by the need to also close the variable equality relation, a number of optimisations are introduced. The ﬁrst removes any ordering or binding constraints (and propositions that encode them) that are not required to preserve causal links. For example, consider Plan P4 in Figure 2. Operator a1 cannot be used to produce any of a2 s preconditions, and nor does it threaten any of its postconditions (nor vice-versa). Therefore, no minimum de/reordering of P4 will require that a1 a2, a2 a1 or that x1 and x2 be (non-) codesignated. MRD and MRR therefore exclude the propositions pa1 a2, pa2 a1, px1=x2 and px2=x1. The second optimisation prevents the encoding of possible causal links if the required ordering constraints are known to be unsatisﬁable. For example, Formula 7 in MD enforces deordering with negated unit clauses. Rather than include propositions that are a priori false, MRD removes the proposition and limits the possible causal links to those where the consumer did not precede the producer in the input POP. Additionally, POPs can contain multiple operators of the same type (e.g., the two instances of a in plan P4), resulting in an exponential number of de/reorderings that are equivalent under permutation of the operators. As is standard [Say et al., 2016; Shlyakhter, 2007], we (partially) break these symmetries by limiting ordering constraints between operators of the same type to those found in an arbitrary linearisation. However, rather than achieve this with additional symmetry breaking clauses, we limit P and PC as above. The deﬁnition below formalises and generalises these ideas. If o1 and o2 are operators and A is a set of allowable orderings, then o1 and o2 are causally related iff o1 A o2 and there is a possible causal or threat relationship between them: Deﬁnition 14. o1, o2, q( t) and q( u) are causally related w.r.t. A, iff o1 A o2 and either: 1. prods(o1, q( t)) and cons(o2, q( u)),

2. thrtns(o1, q( t)) and prods(o2, q( u)), or

3. cons(o1, q( t)) and thrtns(o2, q( u)). The sets of propositional variables can now be deﬁned: Deﬁnition 15. Let P = O, θ, be a POP and A O2 be a set of allowable orderings. Then P , Pθ, and PC are the smallest sets s.t.: po1 o2 P iff either (i) there exists a q( t) and q( u) s.t. o1, o2, q( t) and q( u) are causally related w.r.t. A, or (ii) there exists an o3 s.t. po1 o3, po3 o2 P .

pt=u Pθ iff either (i) there exists a o1, o2, q( t) and q( u) that are causally related w.r.t. A and some i s.t. t[i] = t, u[i] = u, or (ii) there exists a v s.t. pt=v, pv=u Pθ.1

pop, t,oc, u PC iff there exists a q s.t. prods(op, q( t)), cons(oc, q( u)) and op A oc. MRD and MRR encode a POP P = O, θ, and a set of allowable orderings A O2 through the following sets of clauses. Formulae 8 10 ensure that any solution represents an acyclic, transitively closed POP where all operators precede the goal and are preceded by the initial state. They derive from Formulae 1 3 in MR, but have removed any propositions and constraints not required to preserve validity:

po1 o2,po2 o1 P po1 o2 po2 o1. (8)

po1 o2,po1 o3, po2 o3 P

po1 o2 po2 o3 po1 o3. (9)

po I o,po o G P po I o po o G. (10)

Formulae 11 and 12 ensures that the equality relation over variables and constants is symmetric and transitive, and Formula 13 ensures that each variable is bound to exactly one object. They are extensions to MD and MR that ensure the consistency of the POP s variable bindings:

pt=u Pθ pt=u pu=t. (11)

pt=u,pt=v, pu=v Pθ

pt=u pu=v pt=v. (12)

c consts(O): px=c Pθ

c1,c2 consts(O): px=c1,px=c2 Pθ, c1 =c2

px=c1 px=c2 . (13)

Formulae 14 and 15 encode POCL-validity. Formula 14 requires that each consumer be causally linked to at least one producer, and Formula 15 deﬁnes the ordering, binding and threat protection constraints required for a causal link to hold. These derive from Formulae 4 and 5 in MD and MR, but have been optimised to remove redundant propositions, and generalised to allow the POP s variable bindings to change:

oc,q( u): cons(oc,q( u))

pop, t,oc, u PC pop, t,oc, u. (14)

pop, t,oc, u PC pop, t,oc, u

1 i | t| p t[i]= u[i] pop oc

ot,q( v):ot =oc, thrtns(ot,q( v))

p P {pot op,poc ot}

p Pθ {p t[i]= v[i]:1 i | t|}

1Implemented code merges all pt=u and pu=t and updates Formulae 11 and 12 accordingly.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

Formula 16, an optimised version of Formula 6 in MD and MR, adds soft clauses that minimise the ordering constraints:

1 po1 o2. (16)

The MRD and MRR encodings can now be deﬁned. The allowed orderings for MRD are those from the input plan, forcing the resulting POP to be a deordering of the input:

Deﬁnition 16. MRD encodes a POP P = O, θ, into a partial weighted MAXSAT instance through Formulae 8 16, with the allowable orderings A= .

The MRR encoding only disallows orderings for the purposes of symmetry breaking:

Deﬁnition 17. If P = O, θ, is a POP and is an arbitrary linearisation of P, then MRR encodes P into a partialweighted MAXSAT instance through Formulae 8 16, with the allowable orderings A= O2 \ {(o2, o1) : name(o1) = name(o2), o1 o2}).

5 Experimental Evaluation

While a minimum reinstantiated de/reorder of a POP can never be less ﬂexible than a minimum de/reorder (Deﬁnitions 9 and 11), the question remains whether, in practice, reinstantiation provides any signiﬁcant increase in ﬂexibility. The search for a minimum de/reorder often times out before an optimal (or indeed any) solution has been found [Muise et al., 2016], and allowing variable bindings creates a harder problem with an exponentially larger search space. This evaluation will thus address the question of whether reinstantiated de/reordering can provide more ﬂexibility than standard de/reordering under the same resource constraints.

5.1 Experimental Setup

We answer the question above by comparing optimised plans produced by the Loandra MAXSAT solver [Berg et al., 2019] using the MD, MR, MRD and MRR encodings. We also compare the MAXSAT-based techniques with the explanation-based order generalisation (EOG) deordering technique of Kambhampati and Kedar [1994]. First, a validation structure is constructed: a subset of LP (Deﬁnition 6) that links each consumer to its earliest producer:

Deﬁnition 18. If P = O, θ, is a POP then VP is a validation stucture of P iff VP LP and there exists a total order s.t. if op, q( t), oc, q( u) VP and o p, q( t ), oc, q( u) LP then op o p.

This validation structure serves as a deordering heuristic:

Deﬁnition 19. If P is a POP with validation structure VP , then the explanation-based order generalisation of P w.r.t. VP is the POP P = O, θ, + where o1 o2 iff there exists a q( t), q( u) s.t. either:

1. o1, q( t), o2, q( u) VP ,

2. there exists an o3 and q( v) s.t. o2, q( u), o3, q( v) VP , o1 o2 and thrtns(o1, q( u)), or

P NP-hard Ground Reinstantiated

Deorder Reorder

Figure 3: POP optimisation hierarchy. Arrows lead from less to more general optimisations.

3. there exists an o3 and q( v) s.t. o3, q( v), o1, q( u) VP , o1 o2 and thrtns(o2, q( u)).

While EOG cannot guarantee even a minimal deorder of its input [B ackstr om, 1998], it can be computed in polynomial time and in practice consistently ﬁnds minimum deorderings [Muise et al., 2016]. It thus serves an effective baseline for evaluating the MAXSAT-based optimisation techniques. Figure 3 depicts the hierarchy of (the problems solved by) these algorithms. EOG is in P, all others are NP-hard to compute optimally. MD (minimum deorder) searches for a minimally ordered POP but disallows reorderings and changes to variable bindings. This is relaxed by both MR (minimum reorder), which allows reorderings, and MRD (minimum reinstantiated deorder), which allows bindings to change. The most general is MRR (minimum reinstantiated reorder), which allows both reorderings and changes to bindings. Test cases (i.e., input plans) were generated by giving all ﬁrst-order IPC STRIPS planning instances to three planners. To ensure a variety of plans, three planners of distinct types were used: the novelty-driven best-ﬁrst search planner Dual BFWS [Lipovetzky and Geffner, 2017], the heuristic forwardsearch planner LAMA [Richter and Westphal, 2010] and the SAT planner Madagascar [Rintanen, 2010]. Each (unique) plan was encoded with each encoding2, and the resulting MAXSAT instances were preprocessed with Max Pre [Korhonen et al., 2017] and given to the Loandra MAXSAT solver. Plan generation and encoding/optimisation were both limited to 8GB and 30m at 2.60GHz. Because the optimality criteria in Deﬁnitions 9 and 11 cannot compare POPs with different operators, POP ﬂexibilty is here assessed with the commonly-used ﬂex measure [Nguyen and Kambhampati, 2001; Muise et al., 2016; Siddiqui and Haslum, 2012], which is computed from the proportion of operators that are not (transitively) ordered:

Deﬁnition 20. If P = O, θ, is a POP then ﬂex(P) def = 1 (| |/ P{i : 1 i < | O |}).

A POP s ﬂex ranges from 0 to 1 (a totally-ordered and unordered POP, resp.) and is a strong indicator of the number of linearisations it represents [Muise et al., 2016].

5.2 Results

Results are summarised in Table 1. Domains where no plan was improved by any technique (agricola, pegsol, snake, organic-synth-split, sokoban, visitall) have been excluded.

2Implementation is available at bitbucket.org/max waters/mrr

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

Domain f EOG T C EOG T C EOG T C EOG

airport (115) 0.25 0.13 63% 30% 1.28 28% 0% 1.9 24% 30% barman (74) 0.01 30 41% 2.75 77% 28.79 36% childsnack (90) 0.69 15.77 100% 4% 20.64 100% 6% 29.99 99% 6% data (22) 0.36 6.93 86% 25.53 50% 30 14% depot (63) 0.38 9.85 90% 8% 15.02 71% 28% 26.42 44% 44% driverlog (59) 0.37 4 97% 2% 7.35 78% 17% 15.9 73% 34% elevators (101) 0.43 3.71 65% 6.28 59% 12% 24.09 56% 28% ﬂoortile (65) 0.39 20.2 100% 16.58 98% 30 5% freecell (147) 0.14 1.08 100% 2% 7.07 95% 40% 21.59 56% 94% ged (91) 0.11 9.6 92% 9.7 65% 0% 10.21 60% gripper (60) 0.02 26.33 85% 19% 0.39 100% 0% 25.15 42% 29% hiking (82) 0.13 4.03 94% 6% 8.08 96% 22% 20.96 44% 42% logistics98 (101) 0.64 5.34 67% 3% 20.55 45% 8% 26.92 28% 9% mprime (97) 0.22 0.15 100% 13.11 77% 54% 14.1 78% 60% mystery (51) 0.22 0.06 100% 4.92 96% 44% 8.12 94% 48% nomystery (95) 0.04 2.72 100% 12% 9.22 96% 17.88 60% 39% parcprinter (118) 0.7 0.12 100% 0% 14.63 100% 24.95 99% parking (80) 0 0.86 100% 0% 26.73 35% 29.32 5% pipesworld (89) 0.19 0.85 100% 2% 7.42 91% 74% 21.99 62% 96% rovers (120) 0.67 9.82 83% 1% 16 61% 6% 23.8 49% 12% satellite (100) 0.5 0.94 88% 21.39 66% 4% 25.76 40% 8% scanalyzer (104) 0.37 6.53 97% 8% 20.67 80% 35% 24.97 53% 38% termes (22) 30 5% 8.44 27% 30 0% tetris (79) 0.63 0.78 96% 20.67 49% 6% 25.16 32% 9% thoughtful (40) 0.14 5.54 100% 12.21 100% 22.66 33% tpp (89) 0.34 13.66 66% 13% 19.46 52% 32% 23.11 43% 49% transport (98) 0.34 8.44 70% 18.21 58% 37% 26.84 46% 46% w-working (120) 0.88 0.1 100% 0% 16.65 86% 3% 20.92 73% 3% zenotravel (57) 0.41 2.03 95% 1% 11.63 77% 17% 20.54 65% 21%

ALL (2429) 0.4 6.06 87% 2% 13.52 74% 13% 22.43 52% 20%

Table 1: For each domain, f EOG is the mean ﬂex produced by EOG, and for each encoder, C is the % of plans for which a (satisﬁcing or optimal) solution was found, T is the mean run time in minutes, and EOG is the mean % ﬂex increase over EOG. All ﬂex values are computed from plans for which every encoder found a solution, and all EOG values are statistically signiﬁcant with p < 0.01 as calculated from a single-tailed, paired t-test. Empty cells indicate no data, or no signiﬁcant difference.

Effectiveness of EOG Our results conﬁrm those of Muise et al. [2016] that EOG always ﬁnds a minimum deorder of the input plan, despite a lack of optimality guarantees. Also, EOG takes < 3s in 88% of plans, with nearly 100% coverage. We thus exclude MD from the rest of the discussion. As well as consistently ﬁnding minimum deorders, EOG found a minimum reorder in 69% of plans the plans that were solved optimally by MR, a minimum reinstantiated deorder in 53% of the plans solved optimally by MRD, and a minimum reinstantiated reorder in 45% of the plans solved optimally by MRR. As EOG can neither reorder actions nor rebind variables, this means that in a signiﬁcant number of plans, no reordering or reinstantiation was necessary in order to ﬁnd an optimally ordered POP.

Beneﬁt of Reinstantiation To assess the practical beneﬁt of reinstantiation, we compare the coverage, run time and ﬁnal ﬂex of MRD and MRR with EOG and MR. To allow a meaningful comparison, mean ﬂex values are computed from plans for which a solution was found by all encoders. Overall, MRR provides a 20% ﬂex increase over EOG.3 This contrasts with the 3% ﬂex increase provided by MR,

3All stated ﬂex differences are statistically signiﬁcant with p < 0.01 as calculated from a single-tailed, paired t-test.

Encoding Size n P C n P C n P C

0 10.0 104 1,197 100% 358 100% 268 100% 10.0 104 2.0 105 210 100% 221 100% 180 99% 2.0 105 4.0 105 181 100% 241 100% 213 94% 4.0 105 8.0 105 154 99% 265 95% 248 85% 8.0 105 1.6 106 167 99% 250 89% 259 68% 1.6 106 3.2 106 112 85% 269 82% 241 44% 3.2 106 6.4 106 82 30% 202 57% 220 27% 6.4 106 1.3 107 33 15% 211 50% 229 14% 1.3 107 2.6 107 6 0% 119 38% 206 15% 2.6 107+ 1 0% 7 14% 79 0%

Table 2: Plan count (n P ) and coverage (C) by encoding size (in number of clauses) for MR, MRD and MRR. Note that the encoding size is a log scale, each size range being twice that of the previous.

meaning that reinstantiation has yielded a further ﬂex increase of 17%. This improvement is consistent: MRR times out before ﬁnding as ﬂexible a solution as EOG in < 1% of the 1260 plans solved by both, with similar results for MR. MRR improves on EOG in 24 domains. The largest differences are in freecell and pipesworld, where MRR provides ﬂex increases of 94% and 96%, resp., while MR provides 2%, and mprime and transport, where MRR provides increases of 60% and 46%, resp., and MR provides no signiﬁcant beneﬁt. However, this additional ﬂexibility comes at signiﬁcant computational cost. MRR has a mean coverage and run time of 52% and 22.43m, resp., with coverage dropping signiﬁcantly in ﬂoortile (5%), parking (5%) and termes (0%). The less general MRD approach provides less additional ﬂex at less computational cost. With a mean coverage and run time of 74% and 13.52m, resp., it provides an overall ﬂex increase of 13% over EOG and statistically signiﬁcant increases in 21 domains, pipesworld being the highest (74%). Interestingly, the decrease in coverage is not simply due to larger MAXSAT formulae. Table 2 shows that, as expected, MRD and MRR generate the largest encodings, and coverage always decreases with encoding size. However, within each size range, the encoders coverages differ signiﬁcantly (e.g., MRD and MR always have more coverage than MRR), suggesting that the problem is the formulae s structure, not size. A likely cause is so-called object symmetries: combinations of differently labelled but functionally equivalent domain objects (e.g., rovers R1 and R2 in Figure 1). Their presence can result in a small increase in formula size but an exponential number of equivalent (non-) solutions.

Inﬂuence of Planner Finally, we note that the source of the input plan inﬂuences plan ﬂexibility. Madagascar produces POPs, more speciﬁcally parallel plans where each step is a set of unordered actions. Interestingly, EOG can still increase their initial mean ﬂex of 0.08 to 0.5 by removing unnecesary ordering constraints produced by the division of the plan into steps. MRR further improves on this by 15%. Over the sequential (ﬂex = 0) plans generated by Dual-BFWS and LAMA, EOG gives a mean ﬂex of 0.38 and 0.36, resp., which is improved upon by MRR by 30% and 14%, resp. Thus, while Dual-BFWS beneﬁts most from reinstantiation, optimised Madagascar plans are the most ﬂexible.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

6 Discussion

This paper presented a practical technique for plan optimisation through the modiﬁcation of both ordering and variable binding constraints. Results show that in 52% of cases, MRR provides a ﬂex increase of 20% over the EOG baseline in 22m, with results varying by domain. However, as reﬂected by the low coverage of MRR, this comes with considerable commputational cost. Additionally, in a signiﬁcant number of cases, the additional computation simply reveals that the original bindings were already optimal. Whether MRR is preferable to less costly methods such as EOG depends on the application. If ofﬂine preprocessing time is available and execution-time ﬂexibilty is paramount, or the application domain is one where reinstantiation can quickly and consistently improve plan ﬂexibility (e.g., mystery), then reinstantiation is clearly worthwhile. This work generalises the POP optimality deﬁnitions of B ackstr om [1998] and optimisation techniques of Muise et al. [2016] to allow changes in variable bindings. Other POP optimisation techniques have been studied. Muise et al. present extensions to MD and MR that also minimise the POP s size by removing any actions without dependent consumers (a problem shown to be NP-complete by Olz and Bercher [2019]), and Bercher and Olz [2020] study the theoretical aspects of modifying a POP s orderings in order to minimise its makespan. Siddiqui and Haslum [2012] s block decomposition partitions a POP s actions into macrooperator blocks. The implicit requirement that blocks not be interleaved means that this approach can sometimes deorder plans that cannot be deordered by standard methods. Say et al. [2016] present MILP models for optimising POP size and ﬂexibility. They avoid the cubic size of Formula 2 by using variables to represent action start and end times, resulting in signiﬁcant efﬁciency gains over MD and MR. Unlike the work presented here, the above techniques do not allow variable bindings to change. Thus, further work could investigate whether generalising these techniqes by allowing reinstantiation results in more ﬂexible block decompositions or further reductions in plan size or makespan, and whether a MILP encoding of MRD and MRR can provide an improvement in execution time. More general optimisation approaches relax a plan into a partial plan, which need not completely specify orderings or bindings. While a POP s linearisations are different orderings of the same set of actions, a partial plan deﬁnes of a set of classical plans that differ in both their ordering and variable bindings. Kambhampati and Kedar [1994] generalise EOG, and keep only the ordering and binding constraints needed to enforce POCL validity. Waters et al. [2018] show that partial plan constraints are intractable, limiting their execution-time use, and so present an algorithm that searches for partial plans that can be instantiated in polynomial time. In Section 5, we suggest that object symmetries prevent effective reinstantiation. As there is much work on symmetry in planning [Joslin and Roy, 1997; Riddle et al., 2016; Fox and Long, 1999; Rintanen, 2003; Pochter et al., 2011], symmetry breaking in the context of plan optimisation is an interesting area for future work.

References [Aghighi and B ackstr om, 2017] Meysam Aghighi and Christer B ackstr om. Plan reordering and parallel execution - A parameterized complexity view. In Proc. of AAAI, pages 3540 3546, 2017. [B ackstr om, 1998] Christer B ackstr om. Computational aspects of reordering plans. Journal of Artiﬁcial Intelligence Research, 9:99 137, 1998. [Bercher and Olz, 2020] Pascal Bercher and Conny Olz. POP POCL, right? Complexity results for partial order (causal link) makespan minimization. In Proc. of AAAI, 2020. [Berg et al., 2019] Jeremias Berg, Emir Demirovic, and Peter J. Stuckey. Core-boosted linear search for incomplete Max SAT. In Proc. of the 16th International Conference on the Integration of Constraint Programming, AI, and Operations Research, pages 39 56, 2019. [Brachman and Levesque, 2004] Ronald Brachman and Hector Levesque. Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, USA, 2004. [Fox and Long, 1999] Maria Fox and Derek Long. The detection and exploitation of symmetry in planning problems. In Proc. of IJCAI, pages 956 961, 1999. [Joslin and Roy, 1997] David Joslin and Amitabha Roy. Exploiting symmetry in lifted CSPs. In Proc.of AAAI, page 197 202, 1997. [Kambhampati and Kedar, 1994] Subbarao Kambhampati and Smadar Kedar. A uniﬁed framework for explanationbased generalization of partially ordered and partially instantiated plans. Artiﬁcial Intelligence, 67(1):29 70, 1994. [Kambhampati and Nau, 1996] Subbarao Kambhampati and Dana S. Nau. On the nature and role of modal truth criteria in planning. Artiﬁcal Intelligence, 82(1-2):129 155, 1996. [Korhonen et al., 2017] Tuukka Korhonen, Jeremias Berg, Paul Saikko, and Matti J arvisalo. Max Pre: An extended Max SAT preprocessor. In Proc. of the 20th Int. Conference on Theory and Applications of Satisﬁability Testing, volume 10491 of LNCS, pages 449 456. Springer, 2017. [Lifschitz, 1987] Vladimir Lifschitz. On the semantics of STRIPS. In Michael Georgeff and Amy Lansky, editors, Reasoning about Actions and Plans, pages 1 9. Morgan Kaufmann, San Mateo, CA, 1987. [Lipovetzky and Geffner, 2017] Nir Lipovetzky and Hector Geffner. Best-ﬁrst width search: Exploration and exploitation in classical planning. In Proc. of AAAI, pages 3590 3596, 2017. [Muise et al., 2016] Christian Muise, J. Christopher Beck, and Sheila A. Mc Ilraith. Optimal partial-order plan relaxation via Max SAT. Journal of Artiﬁcial Intelligence Research, 57:113 149, 2016. [Nguyen and Kambhampati, 2001] Xuan Long Nguyen and Subbarao Kambhampati. Reviving partial order planning. In Proc. of IJCAI, pages 459 464, 2001.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)

[Olz and Bercher, 2019] Conny Olz and Pascal Bercher. Eliminating redundant actions in partially ordered plans - A complexity analysis. In Proc. of ICAPS, pages 310 319, 2019. [Pochter et al., 2011] Nir Pochter, Aviv Zohar, and Jeffrey S. Rosenschein. Exploiting problem symmetries in statebased planners. In Proc. of AAAI, 2011. [Richter and Westphal, 2010] Silvia Richter and Matthias Westphal. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artiﬁcial Intelligence Research, 39(1):127 177, 2010. [Riddle et al., 2016] Pat Riddle, Jordan Douglas, Mike Barley, and Santiago Franco. Improving performance by reformulating PDDL into a bagged representation. In Proc. of the 8th Workshop on Heuristics and Search for Domainindependent Planning, pages 28 36, 2016. [Rintanen, 2003] Jussi Rintanen. Symmetry reduction for SAT representations of transition systems. In Enrico Giunchiglia, Nicola Muscettola, and Dana S. Nau, editors, Proc. of ICAPS, pages 32 40, 2003. [Rintanen, 2010] Jussi Rintanen. Heuristics for planning with SAT. In David Cohen, editor, Proc. of CP, pages 414 428, 2010. [Say et al., 2016] Buser Say, Andr e A. Cir e, and J. Christopher Beck. Mathematical programming models for optimizing partial-order plan ﬂexibility. In Proc. of ECAI, pages 1044 1052, 2016. [Shlyakhter, 2007] Ilya Shlyakhter. Generating effective symmetry-breaking predicates for search problems. Discrete Applied Mathematics, 155(12):1539 1548, 2007. [Siddiqui and Haslum, 2012] Fazlul Hasan Siddiqui and Patrik Haslum. Block-structured plan deordering. In Proc. of the 25th Australasian Joint Conference on Advances in Artiﬁcial Intelligence, pages 803 814, 2012. [Waters et al., 2018] Max Waters, Bernhard Nebel, Lin Padgham, and Sebastian Sardi na. Plan relaxation via action debinding and deordering. In Proc. of ICAPS, pages 278 287, 2018. [Weld, 1994] Daniel S. Weld. An introduction to least commitment planning. AI Magazine, 15(4):27 61, 1994.

Proceedings of the Twenty-Ninth International Joint Conference on Artiﬁcial Intelligence (IJCAI-20)