# Optimal Planning Modulo Theories

Francesco Leofante1, Enrico Giunchiglia2, Erika Ábrahám3 and Armando Tacchella2
1Imperial College London, United Kingdom
2University of Genoa, Italy
3RWTH Aachen University, Germany
f.leofante@imperial.ac.uk

We consider the problem of planning with arithmetic theories, and focus on generating optimal plans for numeric domains with constant and state-dependent action costs. Solving these problems efficiently requires a seamless integration between propositional and numeric reasoning. We propose a novel approach that leverages Optimization Modulo Theories (OMT) solvers to implement a domain-independent optimal theory-planner. We present a new encoding for optimal planning in this setting and we evaluate our approach using well-known, as well as new, numeric benchmarks.

1 Introduction

In this work we focus on numeric planning [Fox and Long, 2003], an extension of classical planning where preconditions and effects of actions may require reasoning on fragments of arithmetic theories. Despite numeric planning being undecidable in the general case [Helmert, 2002], admissible heuristics have been extended to handle simple numeric planning problems [Scala et al., 2016a], in which actions have linear conditions and may only increase or decrease numeric variables by a constant (see, e.g., [Scala et al., 2016a; Scala et al., 2017; Piacentini et al., 2018b]). Several heuristics have also been proposed for numeric planning problems where both conditions and effects are expressed as linear expressions over numeric state variables [Hoffmann, 2003; Illanes and McIlraith, 2017; Li et al., 2018]. However, these are inadmissible and cannot be used in the cost-optimal setting. Notably, cost-optimal planning with both simple and linear effects can be handled by [Piacentini et al., 2018a] via a compilation to mixed-integer linear programming (MILP) that proved to be competitive with heuristic search approaches.

We propose a new planning approach that can solve to optimality numeric problems for which (integrated) algorithmic solutions have not been proposed yet. We focus on extending cost-optimal numeric planning to problems where conditions may be simple or linear and actions are equipped with constant and state-dependent costs, i.e., costs encoded by arithmetic expressions over numeric state variables. Previous works have studied state-dependent action costs (SDAC) [Ivankovic et al., 2014] in the classical setting [Geißer et al., 2015; Geißer et al., 2016] and in the presence of global numerical state constraints [Ivankovic et al., 2014; Haslum et al., 2018; Ivankovic et al., 2019], but did not explore extensions towards numeric planning. Support and scalability are challenging in the setting we target, but we show that such challenges can be met by leveraging recent advances in Optimization Modulo Theories (OMT) [Sebastiani and Tomasi, 2015], an extension of SMT that combines efficient propositional reasoning with dedicated procedures for theory-optimization. After a brief review of background notions in Section 2, Sections 3 and 4 detail our contributions, summarized as follows:
- A novel SMT encoding of numeric planning that enables an efficient relaxed reachability analysis. We extend standard encodings with a suffix that performs a Boolean abstraction of the transition function of the planning problem. Reasoning on this abstraction, we can conclude whether a goal is reachable with only a modest increase in the size of the formula and without resorting to expensive decision procedures for theory reasoning.
- An extension of this construction that leads to Optimal Planning Modulo Theories. We show how our encoding needs to be modified in the OMT setting and provide optimality guarantees for our approach. While this is not the first application of OMT to planning (see, e.g., [Leofante et al., 2019; Leofante et al., 2018]), this work presents the first domain-independent results based on OMT.
- An empirical analysis of our planner on domains previously reported in the literature [Scala et al., 2017; Li et al., 2018], and on a new numeric domain featuring SDAC which state-of-the-art tools struggle to solve.

2 Background

Numeric planning. We consider a fragment of numeric planning expressible in PDDL2.1 level 2 [Fox and Long, 2003]. A (numeric) planning problem is a tuple Π = ⟨V_B, V_Q, A, I, G⟩, where V_B and V_Q are finite disjoint sets of propositional and numeric variables of Π, respectively; for a variable v ∈ V_B ∪ V_Q let dom(v) denote the domain of v; we use the Booleans B for propositional variables and, for numeric variables, the rationals Q equipped with the usual order and arithmetic operations. A state of Π is a function s : (V_B ∪ V_Q) → (B ∪ Q) assigning to each variable v ∈ V_B ∪ V_Q a value s(v) ∈ dom(v) from its corresponding domain. Let S denote the set of all states of Π. A propositional constraint is either a propositional variable v ∈ V_B or its negation ¬v. Arithmetic expressions e are composed from numeric variables and constants using arithmetic operators. An arithmetic expression is linear iff for each multiplication operator in it, at least one of the operands contains no variables. A numeric constraint e1 ▷◁ e2 compares two arithmetic expressions using a comparison operator ▷◁ ∈ {<, ≤, =, ≥, >}. A constraint ϕ is either a propositional or a numeric constraint. A condition Φ is a (possibly empty) set of constraints. The evaluation function ⟦·⟧_s and the satisfaction relation |= for expressions, constraints and conditions are as usual.

A propositional assignment has the form v := e, where v ∈ V_B is a propositional variable and e ∈ {⊤, ⊥}. A numeric assignment has the form v := e, where v ∈ V_Q is a numeric variable and e is an arithmetic expression; we say that v := e is an assignment to v. An assignment is either a propositional or a numeric assignment. An effect Ψ is a set of assignments that contains at most one assignment v := e for each variable v ∈ V_B ∪ V_Q; we say that v is assigned in Ψ iff there is an assignment v := e in Ψ. For any v ∈ V_Q, d ∈ dom(v), and e a linear arithmetic expression, we call v := v+d a constant increment, v := v−d a constant decrement, v := v+e a linear increment, and v := v−e a linear decrement. Given a state s ∈ S and an effect Ψ, the successor of s and Ψ is the (unique) state s′ ∈ S such that s′(v) = ⟦e⟧_s for each v := e in Ψ, and s′(v) = s(v) for each v ∈ V_B ∪ V_Q that is not assigned in Ψ; we write s, s′ |= Ψ.

A is the set of actions a = (pre_a, eff_a, c_a), where pre_a is a condition, eff_a is an effect and c_a : S → Q_{≥1} is a state-dependent positive cost function, specified by a numeric expression. An action a = (pre_a, eff_a, c_a) is applicable in state s iff (i) s |= ϕ for each ϕ ∈ pre_a and (ii) ⟦e⟧_s is defined for each assignment v := e in eff_a.
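To make the definitions above concrete, here is a minimal Python sketch of the state and action semantics; the class and function names (Action, applicable, successor) are illustrative rather than part of the paper's formalism, and the two actions mirror Example 1 below.

```python
# A minimal sketch of the Section 2 semantics; names are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

State = Dict[str, Union[bool, float]]  # maps variables in V_B and V_Q to values

@dataclass
class Action:
    pre: List[Callable[[State], bool]]                     # condition: set of constraints
    eff: Dict[str, Callable[[State], Union[bool, float]]]  # effect: one assignment per variable
    cost: Callable[[State], float]                         # state-dependent cost function c_a

def applicable(a: Action, s: State) -> bool:
    """a is applicable in s iff every constraint in pre_a holds in s."""
    return all(phi(s) for phi in a.pre)

def successor(s: State, a: Action) -> State:
    """Unique successor state: assigned variables take the value of their
    right-hand side evaluated in s; all other variables keep their value."""
    s2 = dict(s)
    for v, e in a.eff.items():
        s2[v] = e(s)
    return s2

# Two Example-1-style actions: a_plus increments x by 2; a_minus requires b,
# decrements x by 1 and sets b to false; both have the state-dependent cost x.
a_plus = Action(pre=[],
                eff={"x": lambda s: s["x"] + 2},
                cost=lambda s: s["x"])
a_minus = Action(pre=[lambda s: s["b"]],
                 eff={"x": lambda s: s["x"] - 1, "b": lambda s: False},
                 cost=lambda s: s["x"])

s0 = {"b": True, "x": 0.0}
s1 = successor(s0, a_plus)                               # {'b': True, 'x': 2.0}
print(applicable(a_minus, s1), successor(s1, a_minus))   # True {'b': False, 'x': 1.0}
```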
A numeric constraint e ▷◁ 0 is simple iff e is linear and, for each assignment in ⋃_{a∈A} eff_a, either the assigned variable does not appear in e or the assignment is a constant increment or a constant decrement. A numeric constraint e ▷◁ 0 is linear iff e is linear and, for each assignment in ⋃_{a∈A} eff_a, either the assigned variable does not appear in e or the assignment is a linear increment or a linear decrement. A set M ⊆ A of actions is independent if any variable assigned in the effect of an action in the set appears in no other action in the set (neither in conditions nor in effects nor in cost functions).

I is a condition, called the initial condition, which is satisfied by exactly one state, called the initial state; G is a condition called the goal condition; states satisfying G are called goal states. A (sequential) plan π_n = a_0, . . . , a_{n−1} is a sequence of actions a_0, . . . , a_{n−1} ∈ A such that there exist (unique) states s_0, . . . , s_n ∈ S with s_0 |= I, s_{i−1} |= pre_{a_{i−1}} and s_{i−1}, s_i |= eff_{a_{i−1}} for each i = 1, . . . , n, and s_n |= G. The cost of π_n is C(π_n) = Σ_{i=0}^{n−1} c_{a_i}(s_i). A parallel plan π_n = A_0, . . . , A_{n−1} is a sequence of independent action sets A_i = {a_{i,1}, . . . , a_{i,k_i}} ⊆ A, for i = 0, . . . , n−1, such that a_{0,1}, . . . , a_{0,k_0}, . . . , a_{n−1,1}, . . . , a_{n−1,k_{n−1}} is a plan for Π. The cost of a parallel plan π_n is C(π_n) = Σ_{i=0}^{n−1} Σ_{a∈A_i} c_a(s_i). A plan π_n is optimal for Π iff C(π_n) ≤ C(π′) for all plans π′ of Π.

Planning as satisfiability. Given a planning problem Π = ⟨V_B, V_Q, A, I, G⟩, planning as satisfiability [Kautz and Selman, 1992] tries to solve Π by encoding plans of bounded length as the solutions of a logical formula. We consider state-based encodings to SAT [Rintanen et al., 2006; Rintanen, 2009] and SMT [Shin and Davis, 2005]. To encode the existence of parallel plans of length up to n, we encode a sequence of n ground action sets as well as the semantics of their parallel execution. This requires n variable sets A_0, . . . , A_{n−1}, where each A_i consists of a unique proposition a_i for each action a ∈ A (stating whether or not a is executed in step i), and also n+1 copies V_i = {v_i | v ∈ V_B ∪ V_Q}, i = 0, . . . , n, of the propositional and numeric variable sets, storing the initial variable values in V_0 and the values after the ith step in V_i. For each constraint ϕ in the plan specification, we denote by ϕ_i the formula obtained from ϕ by replacing each variable v ∈ V with v_i ∈ V_i. The same renaming applies to effects. Thus in formula (1) below, I_0 is the initial condition with each variable v replaced by v_0, and similarly G_n results from the goal condition G after replacing each v with v_n. Furthermore, let T_{i,i+1} be a formula describing how actions executed in the ith step of a plan affect states, i.e., T_{i,i+1} enforces that each action implies its preconditions over V_i and its effects over V_{i+1}. With T_{i,i+1} we also encode explanatory frame axioms and mutual exclusion (mutex) axioms. The former state that variables not affected by actions do not change their values; the latter enforce that multiple actions can be performed simultaneously if and only if they are independent. For a given planning problem Π, we can now encode bounded plans for horizon n with the following formula Π_n:

Π_n := I_0 ∧ ⋀_{i=0}^{n−1} T_{i,i+1} ∧ G_n    (1)

Example 1. Assume a planning problem with one proposition V_B = {b} and one numeric variable V_Q = {x}, initial condition b ∧ (x = 0), goal condition x = 1, and two actions A = {a⁺, a⁻}, with a⁺ = (∅, {x := x+2}, x) and a⁻ = ({b}, {x := x−1, b := ⊥}, x). Without considering costs, the encoding uses the following constructs:

I_0 : (x_0 = 0) ∧ b_0
G_n : x_n = 1
T⁺_{i,i+1} : a⁺_i → (x_{i+1} = x_i + 2)
T⁻_{i,i+1} : a⁻_i → (b_i ∧ (x_{i+1} = x_i − 1) ∧ ¬b_{i+1})
T_{i,i+1} : ¬(a⁺_i ∧ a⁻_i) ∧ T⁺_{i,i+1} ∧ T⁻_{i,i+1} ∧ (¬b_i ∨ b_{i+1} ∨ a⁻_i) ∧ (b_i ∨ ¬b_{i+1}) ∧ ((x_i = x_{i+1}) ∨ a⁻_i ∨ a⁺_i)

where the first conjunct of T_{i,i+1} enforces mutual exclusion between a⁺_i and a⁻_i (the two actions are not independent) and the remaining clauses are the explanatory frame axioms for b and x. The formulas Π_0 and Π_1 are unsatisfiable, but Π_2 yields a plan executing both a⁻ and a⁺ once, in arbitrary order.
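As an illustration of the bounded encoding, the following sketch builds Π_2 for Example 1 with the z3 Python API (the solver interface our prototype also relies on); the variable names and the plan decoding are ours and are only meant to mirror the constructs above.

```python
# A sketch of formula Pi_2 for Example 1, built with the z3 Python API.
from z3 import Bool, Real, Solver, And, Or, Not, Implies, sat, is_true

n = 2
x = [Real(f"x{i}") for i in range(n + 1)]    # numeric copies x_0 .. x_n
b = [Bool(f"b{i}") for i in range(n + 1)]    # propositional copies b_0 .. b_n
ap = [Bool(f"aplus{i}") for i in range(n)]   # a+ executed at step i
am = [Bool(f"aminus{i}") for i in range(n)]  # a- executed at step i

s = Solver()
s.add(x[0] == 0, b[0])                                          # I_0
for i in range(n):
    s.add(Not(And(ap[i], am[i])))                               # mutex: a+ and a- are not independent
    s.add(Implies(ap[i], x[i + 1] == x[i] + 2))                 # T+_{i,i+1}
    s.add(Implies(am[i], And(b[i], x[i + 1] == x[i] - 1, Not(b[i + 1]))))  # T-_{i,i+1}
    s.add(Or(Not(b[i]), b[i + 1], am[i]), Or(b[i], Not(b[i + 1])))         # frame axioms for b
    s.add(Or(x[i] == x[i + 1], ap[i], am[i]))                   # frame axiom for x
s.add(x[n] == 1)                                                # G_n

if s.check() == sat:
    m = s.model()
    plan = []
    for i in range(n):
        if is_true(m.evaluate(ap[i], model_completion=True)):
            plan.append("a+")
        elif is_true(m.evaluate(am[i], model_completion=True)):
            plan.append("a-")
    print(plan)   # e.g. ['a+', 'a-'] or ['a-', 'a+']
```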
3 Optimal Planning Modulo Theories

When dealing with unit costs, the standard encoding above can be adapted to find plans with a minimal number of actions by enforcing that exactly one action is executed in each step. If Π_0, . . . , Π_{n−1} are unsatisfiable and Π_n has a solution, then the shortest plan has length n. However, for constant and state-dependent costs the shortest plan is not necessarily cost-optimal, therefore new conditions are needed to determine when to stop searching. To put a bound on this cost-optimal search, we need at least a sufficient condition to detect that no plans longer than a certain bound exist. The idea is that if after n steps no future action can modify any of the variables in the goal any more, then the problem does not admit any solution.

3.1 A Novel SMT Encoding

To formalize such a sufficient condition Π^{bound}_n for a given horizon n, we enforce the initial condition and transitions as per formula (1) and extend it with requirements T^{abs}_n and G^{abs}_n as explained below:

Π^{bound}_n := I_0 ∧ ⋀_{i=0}^{n−1} T_{i,i+1} ∧ T^{abs}_n ∧ G^{abs}_n    (2)

Having executed n steps, a necessary condition for satisfying the goal condition G at n or in the future is that for each constraint ϕ in G, either ϕ is true at n or at least one variable in ϕ will be modified by further steps. To express a necessary condition for the latter, we introduce for each variable v ∈ V_B ∪ V_Q a fresh proposition mod_v, and we ensure that if mod_v is false then there exists no plan that is able to modify the value of v after the first n steps:

G^{abs}_n := ⋀_{ϕ∈G} ( ϕ_n ∨ ⋁_{v∈ϕ} mod_v )    (3)

To achieve such a result, we observe that, having executed n steps, a necessary condition for executing an action a in the future is that each precondition of a either is true in state n or has at least one of its variables modified by further executions. Since executing an action might in turn enable further actions, we encode fixed points for this chain to obtain an over-approximation of all variables whose value could still be potentially changed if a longer horizon were given:

T^{abs}_n := ⋀_{a∈A} ( a^abs → ⋀_{ϕ∈pre_a} ( ϕ_n ∨ ⋁_{v∈ϕ} mod_v ) ) ∧ ⋀_{v∈V} ( mod_v → ⋁_{a∈A, v:=e ∈ eff_a} a^abs )    (4)

Note that a^abs represents only a necessary condition for executing a in the future, on a path starting in the nth state. However, this condition is not sufficient for the existence of plans of length n or longer, because after the first n steps we disregard the concrete effects of actions. Those steps can be seen as executing abstract actions, where for each concrete action a ∈ A we define one abstract action a^abs by relaxing each constraint ϕ ∈ pre_a in the precondition of a to ϕ ∨ ⋁_{v∈ϕ} mod_v and relaxing each assignment v := e in the effect of a to mod_v := ⊤, assuming that mod_v is initially false for all v ∈ V. Note that, since all abstract action effects set a subset of the mod_v variables to true, they are not conflicting, and therefore we do not encode mutex axioms.
Example 2. Consider the planning problem of Example 1. Without considering costs, the encoding of Π^{bound}_n uses the following new constructs:

G^{abs}_n : (x_n = 1) ∨ mod_x
T^{abs}_n : ( a⁻^abs → (b_n ∨ mod_b) ) ∧ ( mod_b → a⁻^abs ) ∧ ( mod_x → (a⁺^abs ∨ a⁻^abs) )

Note that a⁺^abs needs no precondition clause, since pre_{a⁺} is empty.

Enforcing the smallest fixed point. Our encoding should express the existence of a plan of length n or longer when, after the first n steps, we switch from executing concrete actions to executing abstract actions. However, there is one remaining problem: formula (4) does not yet enforce the smallest fixed point. For example, consider an action a with a single precondition constraint that refers to a variable v ∈ V and a single assignment in its effect that assigns v. Then a solution which evaluates both a^abs and mod_v to true satisfies the corresponding formulas in (4), but this way a^abs might enable itself. Though the encoding is already correct in its current form, in order to make it efficient we compute the smallest fixed point of all facts that are true at n under the abstract transition relation, to obtain all validly reachable states. We borrow ideas from Answer Set Programming (ASP) [Gelfond and Lifschitz, 1988] to perform this computation in our framework¹. Consider a program P corresponding to formula (4) where, for each abstract action a^abs, a set of rules is built such that: (i) premises contain one of the disjuncts appearing in the precondition of a^abs and (ii) conclusions of each rule contain all mod_v appearing in the effects of a^abs.

Theorem 1. (closure) For any fixed truth values of constraints in action preconditions and the goal condition, the smallest closure of the abstract transition relation corresponds to the answer set of P.

Previous works on compilation from ASP to the satisfiability setting can be leveraged to encode this computation in our construction. Here we use an approach based on loop formulas, as proposed by [Lin and Zhao, 2002]. We start by constructing a dependency graph for abstract actions as a directed graph D such that: (i) the nodes of D correspond to all concrete constraints and all mod_v, v ∈ V, in action preconditions, and (ii) for each abstract action a^abs and each variable mod_v assigned by a^abs, there is an edge connecting mod_v to the nodes corresponding to a^abs's precondition. From [Lin and Zhao, 2002] we know that strongly connected components of D represent loops in the formula, i.e., circular dependencies that hinder the computation of the smallest set closed under action rules. For each loop visiting nodes L, we forbid circular self-enabling of the corresponding actions with the formula

( ⋁_{l∈L} l ) → ( ⋁_{l′∈R(L)} l′ )    (5)

where R(L) is the set of variables appearing in the premises of those abstract rules whose conclusions intersect L but whose premises do not.

Example 3. Consider the planning problem of Example 1, where action a⁺ is now removed, i.e., A = {a⁻}. The dependency graph for a⁻^abs has edges from mod_b and from mod_x to the nodes b_n and mod_b of its precondition. The graph contains one loop, L = {mod_b}, and the corresponding loop formula is mod_b → b_n.

¹Our idea bears some similarities to the relaxed reachability analysis of [Helmert, 2009], although here we propose a new construction that can be embedded in the planning-as-satisfiability framework. Moreover, we use this construction with a different purpose: while Helmert uses it to perform reachability analysis, we use this abstraction to provide termination guarantees for our planning algorithm.
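To see the abstract suffix at work, the sketch below encodes the constructs of Example 2 directly on the initial state (horizon n = 0) and then adds the loop formula of Example 3; the second solver uses a variant initial state with b false, an assumption made here purely to show the loop formula blocking circular self-enabling, and all identifiers are ours.

```python
# A sketch of G_abs, T_abs and a loop formula for Examples 2 and 3 (z3 Python API).
from z3 import Bool, Real, Solver, Or, Not, Implies

n = 0                                            # suffix attached directly to the initial state
xn, bn = Real(f"x{n}"), Bool(f"b{n}")
mod_x, mod_b = Bool("mod_x"), Bool("mod_b")      # "v may still change after step n"
ap_abs, am_abs = Bool("aplus_abs"), Bool("aminus_abs")

s = Solver()
s.add(xn == 0, bn)                               # I_0 (no concrete steps for n = 0)
s.add(Or(xn == 1, mod_x))                        # G_abs: goal holds at n, or x may still change
s.add(Implies(am_abs, Or(bn, mod_b)))            # relaxed precondition of a- (a+ has none)
s.add(Implies(mod_b, am_abs))                    # only a- assigns b
s.add(Implies(mod_x, Or(ap_abs, am_abs)))        # both actions assign x
print(s.check())                                 # sat: the goal is abstractly reachable

# Example 3: drop a+ and start from b = false (an assumed variant); without the
# loop formula, mod_b and aminus_abs could support each other circularly.
s2 = Solver()
s2.add(xn == 0, Not(bn))
s2.add(Or(xn == 1, mod_x))
s2.add(Implies(am_abs, Or(bn, mod_b)))
s2.add(Implies(mod_b, am_abs))
s2.add(Implies(mod_x, am_abs))                   # a+ removed: only a- assigns x
s2.add(Implies(mod_b, bn))                       # loop formula (5) for L = {mod_b}
print(s2.check())                                # unsat: a- can never become applicable
```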
Correctness. In the following, let A^abs = {a^abs | a ∈ A} be the set of Boolean variables modelling (the execution of) abstract actions. Furthermore, let Π^{bound+}_n be the conjunction of Π^{bound}_n and the axioms (5) for each loop in the dependency graph D. We can formulate the following correctness statement for Π^{bound+}_n.

Theorem 2. (unsolvability) For any n ≥ 0, if Π^{bound+}_n is unsatisfiable then the planning problem Π does not admit a solution.

Proof sketch. The full proof of Theorem 2 is complex; due to space constraints we only provide an intuition. Assume Π^{bound+}_n is unsatisfiable while Π admits a plan π_m = A_0, . . . , A_{m−1} of length m ≥ n. By construction, the sequence A_0, . . . , A_{n−1} satisfies I_0 ∧ ⋀_{i=0}^{n−1} T_{i,i+1}. The abstraction-enabling axioms ensure that the abstract counterparts of the actions in A_n are still applicable in the nth state, and that their effects enable the execution of subsequent abstract action sets in place of A_n, . . . , A_{m−1}. Since A_{m−1} achieves the goal in Π, the abstract goal can be reached in Π^{bound+}_n as well, leading to a contradiction.

3.2 Extension to OMT

In the first part of this section we introduced a new encoding that enables relaxed reachability analysis. In the following we show how to leverage the properties of this encoding to provide termination conditions and optimality guarantees for our planning algorithm. The main result can be informally stated as follows: if, for a given horizon, the OMT solver finds a solution that does not require abstract actions, this solution is also a globally optimal plan. To achieve this we extend formula Π^{bound+}_n as follows. First, in the encoding T^{abs}_n of abstract actions a^abs we remove the abstraction of effects on cost variables and instead add a metric for plan quality: for each concrete action a with cost function c_a, the abstract action a^abs is assigned the minimal value of c_a over all states, i.e., c_{a^abs} := min_{s∈S} c_a(s). With the extensions above, we ensure that abstract actions always have a cost that is lower than or equal to that of their concrete counterpart. Under this cost schema, the solver will favour the execution of abstract actions to achieve the (abstract) goal, as these have minimum cost. While this is needed to ensure optimality of our solutions, as we will show, we must make sure the solver does not abuse this and push the execution of all actions into the suffix. Not only would this affect optimal reasoning, it would also affect termination of bounded planning procedures: we would need to increase the planning horizon indefinitely hoping to find a valid plan (i.e., one containing only concrete actions), but this would never happen. Hence, to ensure termination, we augment the OMT encoding with the following axioms. For each action a ∈ A, let M_a ⊆ A be the set of actions that are not independent from a. For each action a ∈ A and for each step 0 < i < n we add

a_i → ( a_{i−1} ∨ ⋁_{ϕ∈pre_a} ¬ϕ_{i−1} ∨ ⋁_{a′∈M_a} a′_{i−1} )    (6)

With axioms (6), an action a can be executed at step i only if: (i) a was already performed at step i−1, or (ii) a was not applicable at step i−1, or (iii) another action a′, mutex with a, was performed at i−1. We then ensure that abstract actions are executed only if all previous steps (i.e., those with index < n) are filled with at least one action:

( ⋁_{a^abs ∈ A^abs} a^abs ) → ⋀_{i=0}^{n−1} ⋁_{a∈A} a_i

Let Π^{opt}_n denote the planning formula Π^{bound+}_n extended with the axioms above; then the following result holds.

Lemma 1. For any n ≥ 0, Π^{bound+}_n and Π^{opt}_n are equisatisfiable.
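The cost layer can be sketched on Example 1 at horizon n = 1 using z3's Optimize engine (the Python interface to νZ). The constant action costs below (3 for a⁺, 2 for a⁻) are assumptions made only for this illustration (Example 1 actually uses the state-dependent cost x), and axioms (6) are omitted for brevity; the point is that the optimum uses an abstract action, so the horizon must be increased.

```python
# A sketch of Pi_opt_1 for Example 1 with assumed constant costs (z3 Optimize / nuZ).
from z3 import Optimize, Bool, Real, And, Or, Not, Implies, If, sat, is_true

x0, x1 = Real("x0"), Real("x1")
b0, b1 = Bool("b0"), Bool("b1")
ap0, am0 = Bool("aplus0"), Bool("aminus0")              # concrete actions at step 0
ap_abs, am_abs = Bool("aplus_abs"), Bool("aminus_abs")  # abstract suffix actions
mod_x, mod_b = Bool("mod_x"), Bool("mod_b")

opt = Optimize()
opt.add(x0 == 0, b0)                                    # I_0
opt.add(Not(And(ap0, am0)))                             # mutex
opt.add(Implies(ap0, x1 == x0 + 2))                     # T+_{0,1}
opt.add(Implies(am0, And(b0, x1 == x0 - 1, Not(b1))))   # T-_{0,1}
opt.add(Or(Not(b0), b1, am0), Or(b0, Not(b1)))          # frame axioms for b
opt.add(Or(x0 == x1, ap0, am0))                         # frame axiom for x
opt.add(Or(x1 == 1, mod_x))                             # G_abs_1
opt.add(Implies(am_abs, Or(b1, mod_b)))                 # relaxed precondition of a-
opt.add(Implies(mod_b, am_abs), Implies(mod_x, Or(ap_abs, am_abs)))
opt.add(Implies(mod_b, b1))                             # loop formula for L = {mod_b}
opt.add(Implies(Or(ap_abs, am_abs), Or(ap0, am0)))      # suffix only after a non-empty step

# Assumed costs: a+ costs 3, a- costs 2; abstract actions are charged the same
# values here, since min_s c_a(s) of a constant cost is the constant itself.
cost = If(ap0, 3, 0) + If(am0, 2, 0) + If(ap_abs, 3, 0) + If(am_abs, 2, 0)
opt.minimize(cost)
if opt.check() == sat:
    m = opt.model()
    used = [str(a) for a in (ap0, am0, ap_abs, am_abs)
            if is_true(m.evaluate(a, model_completion=True))]
    print(used, m.evaluate(cost))   # an abstract action appears, so the horizon must grow
```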
With the additions above we can formulate the following theorem.

Theorem 3. (optimality) For any n ≥ 0, let µ be the optimal solution of Π^{opt}_n. If µ |= G_n then µ is a valid optimal plan.

Proof sketch. The intuition behind this proof is that the goal state could be reached without resorting to abstract actions. Assume µ |= G_n but a cheaper plan π_m = A_0, . . . , A_{n−1}, . . . , A_{m−1} exists at horizon m > n. If a cheaper solution existed for m > n, then a relaxed version of it would be encoded by a model µ′ of Π^{opt}_n where the action sets A_0, . . . , A_{n−1} appear as in π_m and the actions in A_n, . . . , A_{m−1} are abstracted by actions in A^abs, which have lower (or equal) cost by definition, i.e., C(µ′) ≤ C(π_m). We can now distinguish three cases: (i) C(µ′) < C(µ), which contradicts the assumption that µ is optimal; (ii) C(µ) = C(µ′) ≤ C(π_m) and (iii) C(µ) < C(µ′) ≤ C(π_m), which contradict the assumption that π_m is cheaper than µ.

Algorithm 1 Optimal Planning Modulo Theories
1: procedure OMTPLAN(Π, ub)
2:   set initial horizon n := 0
3:   while n ≤ ub do
4:     build formula Π^{opt}_n
5:     if Π^{opt}_n is UNSAT then
6:       return "Π does not admit a solution"
7:     else
8:       extract model µ for Π^{opt}_n
9:       if µ satisfies G_n then
10:        return µ
11:      else
12:        increase horizon n
13:  return "no plan found within bound ub"

Informally, Theorem 3 states that if a goal can be achieved within a given horizon without using abstract actions, then the plan computed for said horizon is also a global optimum. This condition, in combination with the cost schema we adopt, guarantees optimality of our solutions. As an example, consider a problem where the goal could be achieved by a single, costly action or by a sequence of cheaper actions. Under such circumstances, Π^{opt}_n would be satisfiable for n = 1; however, our cost schema forces the solver to satisfy the formula by using abstract actions, which have lower costs. This, in turn, causes the condition of Theorem 3 not to hold and our planning algorithm to continue its search using longer horizons. These ideas are formalized in the following paragraph.

Planning algorithm. We embed our encoding in the OMTPLAN procedure shown in Algorithm 1. Given a planning problem Π and a user-specified upper bound ub, our procedure builds bounded encodings for increasing horizons (lines 2-4). At each iteration, we check whether formula Π^{opt}_n is satisfiable. If it is not, the procedure terminates according to Theorem 2 and signals that the planning problem does not admit a solution (lines 5-6). Notice that, for the purpose of checking reachability, this check could be done only once, for n = 0. If formula Π^{opt}_n is satisfiable instead, we extract a model µ for it in line 8. Notice that µ has minimum cost among all possible models of Π^{opt}_n, being the result of an OMT check. We then check the condition expressed in Theorem 3: if µ does not contain abstract actions (i.e., µ |= G_n, so the goal can be achieved within the given horizon without resorting to abstraction), we return the optimal plan represented by µ; otherwise we increment the horizon for the next iteration (lines 9-12). Notice that the strategy used to increase n does not affect the optimality results of OMTPLAN, as Theorem 3 holds for any n ≥ 0. Finally, if no solution can be found within the given upper bound ub, the procedure terminates signalling this fact in line 13.
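For readers who prefer code to pseudocode, a direct Python rendering of Algorithm 1 follows. The four callables are hypothetical stand-ins for the encoding and OMT steps described above (they are not part of the paper's implementation); they are passed in as parameters so the sketch stays self-contained.

```python
# A Python rendering of Algorithm 1; build_formula constructs Pi_opt_n,
# is_unsat runs the satisfiability check, optimal_model returns the
# minimum-cost model, and is_concrete tests whether mu |= G_n (no abstract
# actions used). All four are hypothetical stand-ins supplied by the caller.
def omtplan(problem, ub, build_formula, is_unsat, optimal_model, is_concrete):
    n = 0                                      # line 2: initial horizon
    while n <= ub:                             # line 3
        phi = build_formula(problem, n)        # line 4: build Pi_opt_n
        if is_unsat(phi):                      # lines 5-6: Theorem 2 applies
            return "Pi does not admit a solution"
        mu = optimal_model(phi)                # line 8: minimum-cost model (OMT check)
        if is_concrete(mu):                    # lines 9-10: Theorem 3 applies
            return mu
        n += 1                                 # line 12: increase the horizon
    return "no plan found within bound ub"     # line 13
```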
4 Empirical Evaluation

To evaluate OMTPLAN, we developed a prototypical implementation in Python². Our implementation leverages the modules developed in [Eyerich et al., 2009] for parsing, and uses the Python API³ of νZ [Bjørner et al., 2015] to build and solve OMT formulas. Our experimental analysis compares with search-based approaches implemented in the ENHSP planner [Scala et al., 2016b] and with the MILP compilation (CSC) of [Piacentini et al., 2018a]. Experiments are carried out using a timeout of 30 minutes and a 4 GB memory limit on a machine running Debian 3.16 with an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz processor. Our analysis considers numeric problems with simple and linear conditions, and also numeric domains with SDACs: simple numeric domains are taken from [Scala et al., 2017]; linear domains are from [Li et al., 2018], with two additions, ROVER-METRIC and FO-ZENOTRAVEL, developed starting from their simple counterparts; finally, planning problems with SDACs are developed specifically to test OMTPLAN.

²OMTPlan is available at: https://github.com/fraleo/OMTPlan.
³https://github.com/Z3Prover/z3/wiki/Documentation

Simple and linear numeric domains. For domains with simple effects we compare against the ĥ^rmax heuristic of [Scala et al., 2017] and the MILP compilation (CSC) of [Piacentini et al., 2018a]. Table 1 shows coverage and total solving time.

| Domain | # | ĥ^rmax C | ĥ^rmax T | CSC C | CSC T | OMTPlan C | OMTPlan T |
|---|---|---|---|---|---|---|---|
| COUNTERS | 15 | 6 | 28.22 | 15 | 1.36 | 7 | 524.59 |
| DEPOTS | 20 | 3 | 1050.02 | 1 | 4.9 | 1 | 78.48 |
| FARMLAND | 30 | 30 | 193.68 | 28 | 32.86 | 1 | 211.57 |
| GARDENING | 63 | 63 | 599.85 | 63 | 887.33 | 18 | 3031.23 |
| SAILING | 20 | 16 | 2101.13 | 17 | 2813.55 | 5 | 345.23 |
| SATELLITE | 20 | 2 | 293.1 | 4 | 459.8 | 1 | 17.85 |
| ROVER | 20 | 4 | 25.91 | 4 | 10.93 | 4 | 61.5 |
| ZENOTRAVEL | 20 | 6 | 579.3 | 7 | 699.65 | 4 | 107.74 |
| Total | 213 | 130 | 4871.21 | 139 | 4910.38 | 31 | 4378.19 |

Table 1: Coverage (C) and total solving time (T) in seconds for domains with simple conditions.

Results confirm the efficiency of CSC on simple numeric problems, outperforming the other approaches on almost all instances. On the other hand, our approach suffers from two main drawbacks. The first is that domains like FARMLAND, GARDENING and SAILING feature optimal plans with relatively many steps and little parallelism. Such long and narrow plans force us to produce large encodings that exceed the capabilities of νZ before finding optimal solutions. The second drawback has to do with axioms (6), and affects OMTPLAN adversely even in domains, e.g., COUNTERS, where optimal plans are short and wide, i.e., featuring relatively few steps and lots of parallelism. Indeed, while (6) tries to make sure that actions are taken before entering the suffix, it may still happen that the optimal solution for a fixed horizon is an abstract solution which also satisfies (6). In such cases, we are still forced to increment the horizon until we exceed the capability of the underlying solver. Note that the interaction between abstraction and axioms (6) is not always harmful, as the adverse effect depends on the structure of the domain and the associated costs.

In domains with linear effects we compare our encodings with CSC and with a blind search using a simple goal-sensitive heuristic (hblind) that returns 0 if the state is a goal state and 1 otherwise. Table 2 reports the results obtained in linear domains.
| Domain | # | hblind C | hblind T | CSC C | CSC T | OMTPlan C | OMTPlan T |
|---|---|---|---|---|---|---|---|
| FO-COUNT | 20 | 4 | 339.84 | 3 | 223.83 | 9 | 2104.71 |
| FO-COUNT-INV | 20 | 3 | 77.29 | 2 | 48.82 | 6 | 937.41 |
| FO-COUNT-RND | 60 | 14 | 1411.79 | 10 | 520.29 | 23 | 2835.46 |
| FO-FARMLAND | 50 | 13 | 1035.09 | 2 | 47.07 | 1 | 6.21 |
| FO-SAILING | 20 | 2 | 610.85 | 0 | - | 1 | 71.56 |
| ROVER-METRIC (1-10) | 10 | 4 | 151.69 | 4 | 14.02 | 5 | 303.39 |
| TPP-METRIC (1-10) | 10 | 5 | 20.51 | n.a. | n.a. | 3 | 524.12 |
| ZENOTRAVEL-LINEAR | 20 | 4 | 145.5 | 2 | 1.55 | 4 | 888.24 |
| Total | 220 | 49 | 3792.29 | 23 | 855.58 | 52 | 7671.10 |

Table 2: Coverage (C) and total solving time (T) in seconds for domains with linear conditions. Entries reporting n.a. indicate that the planner could not be run on the corresponding domain.

Solving 52 problems, OMTPLAN outperforms the other approaches, while still leaving room for improvement on specific domains. The overall result is probably due to the increased complexity of the numerical part, which OMTPLAN handles comparatively better than the other approaches, except on domains like TPP-METRIC, a variant of the Travelling Salesman Problem with no parallelism.

A new SDAC domain. We introduce a numeric planning domain with SDACs called SECURITY CLEARANCE. In this domain, an intelligence agency has to manage clearance authorizations for several documents across different security levels. The agency can authorize a level to read a document, but doing so changes the clearance of the document: authorizations at lower levels are revoked, while those at higher levels remain unchanged. Authorizing a level has a cost which directly corresponds to the level involved, e.g., authorizing level 2 costs 2. Since some documents may be more important than others, each document is initially assigned a priority. When needed, the agency can increase a document's priority, incurring a cost proportional to the current one. If a document has high priority, the agency can decide to authorize all levels at once by paying the appropriate price. Starting from an initial state where no level is authorized, the goal is to authorize all levels to read all documents while minimizing expenses.

Experiments on SDACs. For our experiments we generated 36 instances of the domain, varying the number of documents (from 2 to 10) and the number of levels (from 2 to 5). Exploring this domain both in depth and breadth, we can investigate weaknesses of constraint- and search-based methods, respectively. Here, CSC cannot be considered for our analysis as it does not provide support for state-dependent cost structures. Hence, we compare only with hblind. Figure 1 shows a cactus plot of the results obtained: for each planner, we sort the instances according to that planner's run time and plot them. In this arrangement, two points on the same abscissa are not necessarily the results for the same instance, but they correspond to the same percentile.

Figure 1: Cactus plot for the SECURITY CLEARANCE domain. Instances are ordered by increasing CPU time, reported in seconds.

The domain is challenging for both approaches, with OMTPLAN being able to solve 26 instances and hblind solving 16. The performance of hblind degrades when the number of documents is increased, incurring what could be explained as worst-case behaviour of A*. Indeed, the planner times out on almost all instances having strictly more than 5 documents, while already failing to solve some problems with fewer documents. OMTPLAN's performance is comparable to hblind for instances with up to 4 documents (all levels), while a considerable difference can be noticed for instances with a higher number of documents. In particular, OMTPLAN always manages to solve instances with 2 or 3 levels, even in domains with 10 documents. Still, domains with 4 or 5 levels proved challenging and could not be solved for instances having 6 documents or more.
5 Conclusion

We considered the problem of generating optimal plans for numeric domains with constant and state-dependent action costs. Since solving these problems requires an efficient interplay between propositional and arithmetic reasoners, we proposed Optimization Modulo Theories as the framework of choice. We presented a novel encoding of planning problems that enables efficient reasoning about optimality via OMT and abstraction. We further provided empirical evidence of the usefulness of our approach, demonstrating state-of-the-art results on some expressive classes of numeric problems. Our future work will focus on extending the Optimal Planning Modulo Theories framework to problems dealing with other theories, such as that of structured data types. OMT solvers support reasoning over such theories, and we intend to leverage these capabilities to extend our approach. This step could be seen both as a straightforward implementation of the Planning Modulo Theories framework of [Gregory et al., 2012] and as its extension to the optimal setting.

References

[Bjørner et al., 2015] Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. νZ - an optimizing SMT solver. In TACAS, pages 194-199, 2015.
[Eyerich et al., 2009] Patrick Eyerich, Robert Mattmüller, and Gabriele Röger. Using the context-enhanced additive heuristic for temporal and numeric planning. In ICAPS, pages 130-137, 2009.
[Fox and Long, 2003] Maria Fox and Derek Long. PDDL2.1: An extension to PDDL for expressing temporal planning domains. J. Artif. Intell. Res. (JAIR), 20:61-124, 2003.
[Geißer et al., 2015] Florian Geißer, Thomas Keller, and Robert Mattmüller. Delete relaxations for planning with state-dependent action costs. In IJCAI, pages 1573-1579, 2015.
[Geißer et al., 2016] Florian Geißer, Thomas Keller, and Robert Mattmüller. Abstractions for planning with state-dependent action costs. In ICAPS, pages 140-148, 2016.
[Gelfond and Lifschitz, 1988] Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In ICLP, pages 1070-1080, 1988.
[Gregory et al., 2012] Peter Gregory, Derek Long, Maria Fox, and J. Christopher Beck. Planning modulo theories: Extending the planning paradigm. In ICAPS, pages 65-73, 2012.
[Haslum et al., 2018] Patrik Haslum, Franc Ivankovic, Miquel Ramírez, Dan Gordon, Sylvie Thiébaux, Vikas Shivashankar, and Dana S. Nau. Extending classical planning with state constraints: Heuristics and search for optimal planning. J. Artif. Intell. Res., 62:373-431, 2018.
[Helmert, 2002] Malte Helmert. Decidability and undecidability results for planning with numerical state variables. In AIPS, pages 44-53, 2002.
[Helmert, 2009] Malte Helmert. Concise finite-domain representations for PDDL planning tasks. Artif. Intell., 173(5-6):503-535, 2009.
[Hoffmann, 2003] Jörg Hoffmann. The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. J. Artif. Intell. Res., 20:291-341, 2003.
[Illanes and McIlraith, 2017] Leon Illanes and Sheila A. McIlraith. Numeric planning via abstraction and policy guided search. In IJCAI, pages 4338-4345, 2017.
[Ivankovic et al., 2014] Franc Ivankovic, Patrik Haslum, Sylvie Thiébaux, Vikas Shivashankar, and Dana S. Nau. Optimal planning with global numerical state constraints. In ICAPS, pages 145-153, 2014.
[Ivankovic et al., 2019] Franc Ivankovic, Dan Gordon, and Patrik Haslum. Planning with global state constraints and state-dependent action costs. In ICAPS, pages 232-236, 2019.
[Kautz and Selman, 1992] Henry A. Kautz and Bart Selman. Planning as satisfiability. In ECAI, pages 359-363, 1992.
[Leofante et al., 2018] Francesco Leofante, Erika Ábrahám, and Armando Tacchella. Task planning with OMT: an application to production logistics. In IFM, pages 316-325, 2018.
[Leofante et al., 2019] Francesco Leofante, Erika Ábrahám, Tim Niemueller, Gerhard Lakemeyer, and Armando Tacchella. Integrated synthesis and execution of optimal plans for multi-robot systems in logistics. Information Systems Frontiers, 2019.
[Li et al., 2018] Dongxu Li, Enrico Scala, Patrik Haslum, and Sergiy Bogomolov. Effect-abstraction based relaxation for linear numeric planning. In IJCAI, pages 4787-4793, 2018.
[Lin and Zhao, 2002] Fangzhen Lin and Yuting Zhao. ASSAT: computing answer sets of a logic program by SAT solvers. In AAAI, pages 112-118, 2002.
[Piacentini et al., 2018a] Chiara Piacentini, Margarita P. Castro, André Augusto Ciré, and J. Christopher Beck. Compiling optimal numeric planning to mixed integer linear programming. In ICAPS, pages 383-387, 2018.
[Piacentini et al., 2018b] Chiara Piacentini, Margarita P. Castro, André Augusto Ciré, and J. Christopher Beck. Linear and integer programming-based heuristics for cost-optimal numeric planning. In AAAI, pages 6254-6261, 2018.
[Rintanen et al., 2006] Jussi Rintanen, Keijo Heljanko, and Ilkka Niemelä. Planning as satisfiability: parallel plans and algorithms for plan search. Artif. Intell., 170(12-13):1031-1080, 2006.
[Rintanen, 2009] Jussi Rintanen. Planning and SAT. In Handbook of Satisfiability, pages 483-504. 2009.
[Scala et al., 2016a] Enrico Scala, Patrik Haslum, and Sylvie Thiébaux. Heuristics for numeric planning via subgoaling. In IJCAI, pages 3228-3234, 2016.
[Scala et al., 2016b] Enrico Scala, Patrik Haslum, Sylvie Thiébaux, and Miquel Ramírez. Interval-based relaxation for general numeric planning. In ECAI, pages 655-663, 2016.
[Scala et al., 2017] Enrico Scala, Patrik Haslum, Daniele Magazzeni, and Sylvie Thiébaux. Landmarks for numeric planning problems. In IJCAI, pages 4384-4390, 2017.
[Sebastiani and Tomasi, 2015] Roberto Sebastiani and Silvia Tomasi. Optimization modulo theories with linear rational costs. ACM Trans. Comput. Log., 16(2):12:1-12:43, 2015.
[Shin and Davis, 2005] Ji-Ae Shin and Ernest Davis. Processes and continuous change in a SAT-based planner. Artif. Intell., 166(1-2):194-253, 2005.