# relaxed_existsstep_plans_in_planning_as_smt__d9919a1f.pdf Relaxed -Step Plans in Planning as SMT Miquel Bofill, Joan Espasa, Mateu Villaret University of Girona, Spain. {miquel.bofill, joan.espasa, mateu.villaret}@udg.edu Planning Modulo Theories (PMT), inspired by Satisfiability Modulo Theories (SMT), allows the integration of arbitrary first order theories, such as linear arithmetic, with propositional planning. Under this setting, planning as SAT is generalized to planning as SMT. In this paper we introduce a new encoding for planning as SMT, which adheres to the relaxed relaxed -step (R2 -step) semantics for parallel plans. We show the benefits of relaxing the requirements on the set of actions eligible to be executed at the same time, even though many redundant actions can be introduced. We also show how, by a Max SMT based post-processing step, redundant actions can be efficiently removed, and provide experimental results showing the benefits of this approach. 1 Introduction The possibility of several actions being planned at the same time step, i.e., the notion of parallel plans, is crucial for planning as satisfiability approaches [Rintanen et al., 2006], in which the feasibility of a plan in a given number of time steps is encoded as a SAT formula; the greater the number of time steps, the bigger the formula. Hence, parallel plans may help improve the performance of the planner because they enable the reduction of the time horizon. Pioneering work on parallel plans [Kautz and Selman, 1996] was based on the so-called -step semantics, which requires that parallel actions can be executed sequentially in any order, and the same terminal state is reached. Therefore, parallel application of actions is allowed only if they do not interfere in any way. The -step semantics, proposed in [Dimopoulos et al., 1997] and further developed in [Rintanen et al., 2006], is less restrictive since it only requires that parallel actions can be executed in some order. This relaxation allows much more parallelism. Another improvement is the relaxed -step semantics [Wehrle and Rintanen, 2007], which allows actions to be scheduled in parallel in a state s even if not all of them are applicable in s. Basically, those actions preconditions will be satisfied by the effects of other previously executed actions, when serialized. In the relaxed relaxed -step (R2 -step) semantics [Balyo, 2013] the application of action effects is relaxed similarly to precondition requirements in the relaxed -step semantics. Therefore the only requirement in the R2 -step semantics is that parallel actions can be ordered to form a valid sequential plan. From the -step semantics and its subsequent relaxations, actions planned in a time step cannot be executed in parallel anymore, but the terms parallel plan and parallel step are used anyway. In this work we go a step further in the pursuit of parallelism in the context of planning as SMT. Inspired by the highly relaxed semantics of [Balyo, 2013] for planning as SAT, we introduce a non-trivial encoding that lifts the R2 - step semantics to SMT. The rest of the paper is structured as follows. In Section 2 we introduce PMT problems, together with the notation used throughout the paper. In Section 3 we present our encoding for planning as SMT, that adheres to the R2 -step semantics. In Section 4 we show how the redundant actions occurring in the resulting plans can be easily removed. In Section 5 we provide experimental results showing a dramatic decrease in the number of time steps needed to reach a valid plan in many numeric domains, compared to other similar exact (non-heuristic) planners. Since the presented encoding is parametric to a given order between actions (which governs how, given a parallel plan, actions are serialized and executed) in the experimental section we give some insights on the impact of this order on the efficiency, and some hints on suitable orders. Finally, in Sections 6 and 7 we explain the connections of this work to similar ideas and present some conclusions. 2 Planning Modulo Theories We follow the notation defined in [Gregory et al., 2012] for Planning Modulo Theories (PMT). A state is a valuation over a finite set of variables X, i.e., an assignment function, mapping each variable x X to a value in its domain, Dx. A state space for a set of variables X is the set of all valuations over X. In a state space modulo T, T is a theory defining the domains of the state space variables and interpretations for the constants, functions and predicates. Definition 1 (Action). An action a, for a state space S modulo T, is a state transition function, comprising: A first-order sentence over S modulo T, Prea (the precondition of a). Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) A set Effa (the effects of a), of assignments to a subset of the state variables in S, each setting a distinct variable to a value defined by an expression over S modulo T. An action a, for a state space S modulo T, is applicable (or executable) in a state s S if T, s |= Prea (that is, the theory together with the valuation s satisfies the precondition of a). We represent actions a as pairs Prea, Effa , with the effects Effa often written as a substitution σa = {x1 7 exp1, . . . , xn 7 expn}, where expi is an expression that defines the value of variable xi in the resulting state, for each i in 1..n (e.g. x 7 x + k, for increasing a numeric variable x by k). We use and to denote the Boolean true and false values, respectively. Given an expression e and a substitution σ, by eσ we will denote the expression e after applying the substitution σ, i.e., after simultaneously replacing the variables in e by the expressions indicated in the substitution. Making abuse of notation, we will talk of a substitution as an assignment. Following the application of a, the state is updated by the assignments in Effa to the variables that they affect, leaving all other variables unchanged. We denote the unique state resulting from applying an action a, in a state s in which is applicable, by appa(s). For any given sequence of actions a1; a2; . . . ; an we define appa1;a2;...;an(s) as appan( appa2(appa1(s)) ). Definition 2 (Planning Modulo Theory). A Planning Modulo T problem, for a theory T, is a tuple π = S, A, I, G where: S is a state space in which all variable domains are defined in T, A is a set of actions for S modulo T, I is a valuation in S (the initial state), and G is a first order sentence over S modulo T (the goal). A (sequential) plan for π is a sequence of actions a1; . . . ; an such that, for all i in 1..n, ai is applicable in state si 1 and si is the result of applying ai to si 1, where s0 = I and T, sn |= G. 3 Encoding Plans under the R2 -Step Semantics in PMT The notion of R2 -step plan in the context of PMT is the following. Definition 3 (R2 -Step Plan). Given a set of actions A and an initial state I, for a state space S modulo T, a relaxed relaxed -step (R2 -step) plan for A and I is a sequence P = [A0, . . . , Al 1] of sets of actions together with a sequence of states s0, . . . , sl (the execution of P), for some l 0, such that s0 = I, and for all i {0, . . . , l 1} there is a total ordering a1 < < an of Ai, such that T, appa1;...;aj 1(si) |= Preaj for all aj = Preaj, Effaj Ai, and appa1;...;an(si) = si+1. This is a weakening of the definition of relaxed -plan in [Wehrle and Rintanen, 2007], since we remove the consistency requirement between effects of actions occurring at the same time step and, hence, the only requirement left is that those actions can be ordered to form a valid sequential plan. The definition also generalizes to the setting of PMT. It is equivalent to the definition of -step plan in [Bofill et al., 2016b], as well as to the definition of -step plan in [Rintanen et al., 2006] for the propositional case, which already captures R2 -step plans. In those works, however, the encodings given for -step plans are restricted to happenings which require, among other things, that preconditions of actions in each parallel step hold at the same time. On the contrary, here we are properly considering R2 -step plans in the sense of [Balyo, 2013]. Notice however that no formal definition of R2 -step plan is given in [Balyo, 2013]. Let us introduce a motivating example. Example 1. A merchant is looking to amass a small fortune in the fastest way. We will use the variable x to represent his gains. Suppose the merchant starts with no money (x = 0) and can perform two actions: work carrying boxes of tulips and gaining 10 coins a month, or invest some of his money in the tulip industry, doubling his earnings. His objective is reaching the sum of 20 coins. The actions can be modeled as follows: work: a1 = , x 7 x + 10 invest: a2 = x > 5, x 7 x 2 If we consider the -step semantics of [Rintanen et al., 2006], one of the requirements is that preconditions of parallel actions must hold at the start of the time step in which they are planned. Since the precondition of a2 is not satisfied in the initial state, this would make the shortest -step plan the following: Q = [{a1}, {a2}]. With the Relaxed -step semantics of [Wehrle and Rintanen, 2007], there is no requirement that forces preconditions of parallel actions to hold at the start of the time step in which they are planned. Still, there is a requirement of consistency between the effects applied at the same time step. The concept of consistency with numerical variables could be generalized as that all effects should be commutative. Since the effects x 7 x + 10 and x 7 x 2 are not commutative, the shortest plan would also be Q = [{a1}, {a2}]. With the R2 -step semantics, these requirements are lifted, so, considering the ordering between actions a1 < a2, our merchant can reach its goal with the one step plan Q = [{a1, a2}]. This ordering can be used to model the application of non-commutative effects as follows, where xt (resp. at) denotes the value of variable x (resp. execution of action a) at time step t: x0 = 0 initial state a0 1 precondition of a1 a0 1 x0 1 = x0 0 + 10, a0 1 x0 1 = x0 0 effects of a1 a0 2 x0 1 > 5 precondition of a2 a0 2 x0 2 = x0 1 2, a0 2 x0 2 = x0 1 effects of a2 x0 = x0 0, x0 2 = x1 tying constraints x1 = 20 goal The additional variables x0, x1 and x2 (superscripted with the time step) permit us to accumulate effects over the variable x. The variable x0 denotes the initial value of x. The Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) variable x1 embodies the value of x after the possible execution of a1. Note that x1 gets an updated value if a1 is executed, or keeps the previous value x0 otherwise. Therefore, in the precondition of action a2 we need to check x1 instead of x. The effects of action a2 are applied on x1 and captured on x2. Finally, the tying constraints link these auxiliary variables to the initial and final values of x. This encoding of chained effects will be the key to increase parallelism. 3.1 Encoding into SMT In this section we propose an encoding for planning as SMT, as a particular case of PMT, that adheres to the R2 -step semantics. The given encoding is valid for any theory T under quantifier-free first-order logic with equality. In particular, for numeric planning we could take T as the theory of the integers (or the reals) and use quantifier free linear integer (or real) arithmetic formulae. Let π = S, A, I, G be a planning problem modulo theory T under a quantifier-free first-order logic with equality. For each variable x in var(S) and each time step t, a new variable xt of the corresponding type is introduced, denoting the value of x at step t. Moreover, for each action a and each time step t, a Boolean variable at is introduced, denoting whether a is executed at step t. Given a term s, by st we denote term s, where all variables x in var(S) have been replaced by xt, and analogously for formulas. For example (x + y)t = xt + yt, and (p x > 0)t = pt xt > 0. For the case of effects, we define {x 7 }t = xt+1, {x 7 }t = xt+1 and {x 7 s}t = (xt+1 = st), where s is a non-Boolean term belonging to theory T. For example, for an assignment {x 7 x + k}, where k is a constant, we have {x 7 x + k}t = (xt+1 = xt + k). For sets of assignments, i.e., action effects, we define ({x 7 s} Eff)t = {x 7 s}t Efft and t = where s is a term (either Boolean or not) and Effis a set of assignments. For each action a = Prea, Effa and each variable x var(S), let Effa,x denote the assignment {x 7 exp} in Effa if any, or the empty set if there is no such assignment. For each x var(S), let Ax = {a | a A Effa,x = }, i.e., the set of actions that modify x. As it has already been said, the only requirement in the R2 -step semantics is that actions in each parallel step can be ordered to form a valid sequential plan. Then, let L = [a1, a2, . . . , a|A|] be a list enumerating all actions. The relative position of each action in L will determine the total ordering 1. Let n = 1. Let a1 0, . . . , ak 0 be the, according to 0 we need to prove that each action ai 0 is applicable after applying a1 0; . . . ; ai 1 0 . This is guaranteed by constraints (1), (2) and (3). By (2) the effects of the previous actions are applied, resulting in a temporal state . By (1) the precondition of ai 0 is satisfied by this temporal state. Roughly, this temporal state consists of the valuation assigning, to each x S, its updated value due to the effects of actions a1 0; . . . ; ai 1 0 . This value is captured by the closest previous chaining variable x0 prev(i), thanks to constraints (2) and (3). For n > 1, constraint (4) properly links, for each m in 1..n, the variables Xm 1 to the first temporal state of step m and the last temporal state of step m to Xm. Hence, if a1 0, . . . , ak0 0 , . . . , a1 n, . . . , akn n are the action variables set to true in the model of E(π, n,