# temporal_planning_with_clockbased_smt_encodings__9a9c3f3a.pdf Temporal Planning with Clock-Based SMT Encodings Jussi Rintanen Aalto University, Department of Computer Science, Helsinki, Finland We propose more scalable encodings of temporal planning in SMT. The first contribution is practical clock-based encodings of resources and effect delays. Existing encodings of effect delays (Shin and Davis, 2015) have a quadratic size, due to the necessity to determine the time differences between steps for a linear number of steps. Clocks improve this to linear. The second contribution is a new relaxed scheme for steps. Existing schemes require a step for every time point with discontinuous change. This is relaxed, improving scalability. 1 Introduction After the successes of SAT in solving the classical planning problem [Kautz and Selman, 1996; 1999], Shin and Davis [2005] proposed the encodings of temporal and hybrid systems planning in the SAT modulo Theories (SMT) framework. While the work solves conceptual problems of temporal planning with SMT, it has not proved successful in terms of performance. Shin & Davis adopted a temporal model with ϵ-separation [Fox and Long, 2003], which can double the number of steps, with a high performance penalty for constraint-based methods [Rintanen, 2015b]. ITSAT [Rankooh and Ghassem-Sani, 2015] probably the best scalable temporal planner avoids the problems of ϵ-separation by reducing the temporal problem to a non-temporal one. ITSAT, however, often generates plans with makespans twice the optimal, which in practice is unacceptable. Rintanen [2015a] adopts a temporal model without ϵ-separation and with opportunities for discretization to integer time, which is not possible with ϵ-separation. The first application of SMT was classical planning with numeric variables [Wolfman and Weld, 1999]. SMT was little used before the recent works on temporal planning, and very recently on continuous change [Bryce et al., 2015]. Also affiliated with Griffith University, Brisbane, Australia, and the Helsinki Institute for Information Technology, Finland. This work was funded by the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170). We acknowledge the computational resources provided by the Aalto Science-IT project. In this work, we pursue the fundamentals of temporal (and hybrid) planning further, believing that full temporal model is needed during search to achieve good-quality plans. First, we address the number of steps in SMT encodings, analogous to the number of time points in SAT encodings of classical planning [Kautz and Selman, 1996]. Shin and Davis [2005] require a step for every time point in which an action or a discrete change takes place. We propose a relaxed scheme in which one step can summarize discrete changes in multiple preceding time points. Second, we address the asymptotic size of encodings, reducing a encoding of time delays from quadratic to linear, while avoiding the excessive number of clocks of earlier encodings [Rintanen, 2015a]. The resulting encodings improve earlier state of the art in SMT-based temporal planning, solving dozens more of the standard benchmark problems, and sometimes improving runtimes by two orders of magnitude. 2 Model of Temporal Planning We adopt the formal framework of Rintanen [2015a]. A temporal action consists of a precondition φ (a propositional formula), and effects which are negated or unnegated state variables associated with times 0, indicating how much after the action the effect will take place. When the action is at absolute time t, then an effect for time t is at absolute time t+t . We call t the delay. An action can allocate a resource for a duration d0, at some time t0 relative to the starting point of the action: the resource is allocated for the interval t+t0 to t+t0+d0, and no other action can take place if it allocates the same resource for an intersecting interval. Definition 1 (Actions with Resources) Let X be a finite set of state variables and R a finite set of resources. An action is a triple p,q,e where the precondition p is a propositional formula over X, the resource requirement q is a set of tuples (ts,td,r) Q Q R such that td 0, and (ts,td,r,n) Q Q R N such that td 0, and the effect e is a set of pairs (t,l) where t 0 is a rational number and l is a literal over X. We refer to the precondition p of an action a by prec(a), the resource requirement q by rreq(a), and the effects e by Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) eff(a). The set R of resources is divided in two types. Unary resources, expressed by triples (ts,td,r), model absolute exclusion inside a group of actions: only one action can allocate r at a time. State resources, expressed by tuples (ts,td,r,n), where n is the state, can be used by multiple actions as long as the state n is the same. Definition 2 (Plans and Executions with Resources) For a problem instance X,R,I,A,G , with state variables X, resources R, initial state I, actions A, and goal G, a plan π is a finite set of pairs (a,t) such that the following holds. 1. t 0 is a rational number and a A is an action. 2. For all {(t1,a1),(t2,a2)} π such that for some r R (ts 1,td 1,r) rreq(a1) and (ts 2,td 2,r) rreq(a2), or (ts 1,td 1,r,n1) rreq(a1), (ts 2,td 2,r,n2) rreq(a2), and n1 =n2, (a) t1+ts 1+td 1 t2+ts 2, or (b) t2+ts 2+td 2 t1+ts 1. 3. There is an execution v:Q X {0,1} which is a mapping from non-negative rational time points and state variables to 0 and 1 such that (a) v(0,x)=I(x) for all x X, (b) v(t,prec(a))=1 for all (a,t) π (c) if (a,t) π and (t ,l) eff(a), then v(t+t ,l)=1, (d) state variables not changed by actions retain their values: for any tl and tu such that tlt. Effects (0,l) that make preconditions of simultaneous actions true (including the action itself) are not allowed. 3 SMT Encodings of Temporal Planning Timelines in temporal planning are continuous, but it is sufficient to represent explicitly only a finite sequence of time points in which an action starts or an effect takes place, called steps. For Boolean state variables X={x1,...,xn}, actions A={a1,...,am}, and N +1 steps, we need SMT variables x@i and a@i for all x X, a A, and i {0,...,N}. For each step, these SMT variables indicate the values of state variables and whether an action is taken. Variables τ@i denote the absolute time at step i, and @i=τ@i τ@(i 1). We constrain these values by @i>0. If φ is the precondition of action a, we have the formula a@i φ@i (1) where φ@i is the formula obtained from φ by replacing each x by x@i. By causes(x)@i we denote the disjunction of all the conditions under which x becomes true at step i. These formulas typically refer to atomic propositions for earlier steps. Similarly causes( x)@i for x becoming false. Hence when causes(l)@i is true, the effect l takes place. causes(x)@i x@i (2) causes( x)@i x@i (3) Frame axioms allow inferring that the value of a state variable remains unchanged. (x@i x@(i 1)) causes(x)@i (4) ( x@i x@(i 1)) causes( x)@i (5) How the disjuncts of causes(x)@i are defined depends on when the action causing the change takes place, and how the time difference between the action and the effect is expressed. Here we first present the Shin&Davis style encoding of delay, and will consider other options later. A disjunct of causes(x)@i for effects x at a relative time t>0 of an action a is j=0 (a@j ((τ@i τ@j)=t)) (6) if encoded in the spirit of Shin and Davis [2005] formula (4.3). These constraints have a quadratic size. When t>0, to guarantee that there is a step for every effect, we need j=i+1 (τ@j τ@i=t). (7) 3.1 Resource Constraints Consider actions a1 and a2 such that (t1,d1,r) rreq(a1) and (t2,d2,r) rreq(a2). If the intervals ]t1,t1+d1[ and ]t2,t2+ d2[ overlap, then the actions cannot be at the same step. The following constraint prevents these two actions from starting at the same step. a1@i a2@i (8) Now consider the case in which a1 may have been taken earlier at time 0 t, and this may prevent a2 from taken at the current step at time 0. The requirement that the allocations of the resource by the two actions do not overlap means that the relative time t where the action a1 may have been taken satisfies one of the following. 0 t+t1+d1 t2 0 t+t1 t2+d2 This means that either the first action frees the resource before the second allocates it, or vice versa. Simplified these constraints are as follows. t1+d1 t2 t t1 t2 d2 t Hence a2 is not allowed if a1 is at ] t1+t2+d2, t1 d1+ t2[ relative to the current time point. a1@j (t1 t2 d2> i j>t1+d1 t2) (9) Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) Here i j=Pi k=j+1 @k, or, equivalently, i j=τ@i τ@j. If t10) occurring at least once, use clock ct,d r . Any action allocating r with (t,d,r) resets ct,d r to 0. Next we discuss the conditions under which resource conflicts between actions can be detected by using the clocks associated with the resource allocations. Note that constraints on actions taken at the same step are always handled by static constraints as described earlier, and here we only address constraints on actions taken at different steps. Consider actions a1 and a2 (this includes the case a1=a2) which respectively allocate the same unary resource r with (0,d1,r) and (t2,d2,r). The constraint for handling conflicts of this form is the following. a2@i c0 r@i t2 (15) Consider actions a1 and a2 which respectively allocate the same unary resource r with (t1,d1,r) and (t2,d2,r). This is the general case, which cannot always be handled with clocks. The question is whether the clock ct1,d1 r represent a sufficient amount of information of earlier allocations (t1,d1,r) of r so that the conflict with allocations (t2,d2,r) can be detected with it, by using the following constraint. a2@i t1+d1 ct1,d1 r @i t2 t1 ct1,d1 r @i t2+d2 (16) It turns out that the clock is insufficient only if the immediately preceding action allocates the resource relatively later than the current action does, and some still earlier action allocates the resource in a conflicting way. Proposition 1 The clock ct1,d1 r is sufficient if t2+d2 t1. Proof: Consider action a2, taken at time t and allocating r with (t2,d2,r), as well as actions a0 and a1, both of which allocate the unary resource r with (t1,d1,r), with a1 taken at some time t t , and hence the second condition cannot be true, that is, a1 could not possibly allocate the resource after a2 has freed it. Therefore the only possibility of not having a conflict is that t +t1+d1 t+t2. Now, because t 0, then we use Proposition 1 to check if the clock ct1,d1 with formula (16) is sufficient, and if not, we use formula (9) instead. For standard benchmarks this scheme generally leads to the very compact constraints (15). Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) 6 Shared Clocks for Effect Delays Section 3.2 described how can could be used for representing effects delay (for relative time points >0). The high number of real-valued variables required makes that representation impractical. However, a different kind of scheme for using clocks typically avoids the high number of clocks. The basic observation is that two (or more) actions can share a clock whenever the actions cannot temporally overlap. Temporal overlap is not possible when the actions use the same exclusive resource for their whole duration. Example 4 Consider actions for moving an object from location to location. To prevent taking two actions moving the object from one location to two different locations, all these action allocate the same resource specific to the object for their whole duration. Now, in cases like above we can use one clock for a large number of actions. The clock is reset when an action is taken, and the actions effects take place when the clock reaches a value corresponding to the delay of the effect. The basic idea is to use the clock for those resource allocations that start at relative time point 0 of the action, and that last until the last effect of the action. The actions cannot overlap because they use the same unary resource for their whole duration. Hence this one clock can be shared by multiple actions. This way, practically all standard benchmark problems require only some dozens of clocks. 6.1 Qualitative Clocks Since a shared clock does not tell which action is active, we must use qualitative (Boolean) clocks to do that. These indicate a qualitative value range for the clock (a single value, or a range of values). Note that while the clock is initialized to d when the action allocates the resource with (t,d,r), the time values of the qualitative clocks run from 0 until d, for clarity. The constraints for connecting the real-valued clock to the qualitative clocks are given as (17)-(19), (27), (28), (36). The connection between the real-valued clock c and the qualitative clock cq representing the time since the start of the action, to the reset value of the clock da for this action, is Let T ={t1,...,tn} be time points, for example those (relative) time points in which the effects of a given action take place.We use the propositional variables, qa