# totsat__totallyordered_hierarchical_planning_through_sat__849b3e86.pdf

tot SAT Totally-Ordered Hierarchical Planning through SAT

Gregor Behnke, Daniel H oller, Susanne Biundo Institute of Artiﬁcial Intelligence, Ulm University, D-89069 Ulm, Germany {gregor.behnke, daniel.hoeller, susanne.biundo}@uni-ulm.de

In this paper, we propose a novel SAT-based planning approach for hierarchical planning by introducing the SATbased planner tot SAT for the class of totally-ordered HTN planning problems. We use the same general approach as SAT planning for classical planning does: bound the problem, translate the problem into a formula, and if the formula is not satisﬁable, increase the bound. In HTN planning, a suitable bound is the maximum depth of decomposition. We show how totally-ordered HTN planning problems can be translated into a SAT formula, given this bound. Furthermore, we have conducted an extensive empirical evaluation to compare our new planner against state-of-the-art HTN planners. It shows that our technique outperforms any of these systems.

Introduction Hierarchical planning provides a natural and expressive, but nevertheless easily usable means for planning. Its most common formalism is Hierarchical Task Network (HTN) planning (Erol, Hendler, and Nau 1996), which extends classical STRIPS-based planning in two ways. It introduces ﬁrst the concept of abstract tasks tasks which cannot be executed directly, and second decomposition methods, which relate abstract tasks to primitive STRIPS-like actions by providing a course of (primitive and/or abstract) tasks that needs to be performed to achieve the abstract task. The various decomposition methods for an abstract task deﬁne the allowed portfolio of ways to achieve that task. They can, e.g., be gained from domain experts, whose knowledge can be encoded and exploited this way. This makes HTN planning particularly useful for planning in complex real-world domains. Its topdown and decomposition-based reasoning also bears similarities with human thought processes (Byrne 1977). Although HTN planning is often used in real-world applications (Nau et al. 2005; Champandard, Verweij, and Straatman 2009; Hartanto and Hertzberg 2008; Morisset and Ghallab 2002; Dvorak et al. 2014; Bercher et al. 2015), there has so far only been little research in developing fast, domain-independent HTN planners. Systems are either guided by instructional information, hard-coded in the domain model, use blind search, or heuristic search in the plan space, relying on quite simple heuristics (Bercher et al.

Copyright c 2018, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

2017). Recently, a new HTN planning technique based on a bounded translation into classical planning as been proposed (Alford et al. 2016a). Most search processes of HTN planners consider both parts of the planning problem the hierarchy and the primitive action model separately and do not analyse interactions between them. This is especially true for heuristics, which have so far either been based solely on the hierarchy or solely on the primitive actions. The work by Alford et al. (2016a) provides an integration of both worlds. The planner FAPE (Dvorak et al. 2014) does this too (Bit-Monnot, Smith, and Do 2016), but restricts the decomposition hierarchy to be non-recursive, which greatly reduces the expressiveness of HTN planning. In this paper, we present a new technique for HTN planning based on a translation of the planning problem into a propositional satisﬁability problem. It enables a uniﬁed view on both parts of the problem and leads to a signiﬁcantly better performance compared to previous approaches. We restrict ourselves to the class of totally-ordered HTN planning, thereby easing the SAT formula s construction. This restriction is justiﬁed by various observations. First, we retain most of the expressive power of HTN planning and are still more expressive than STRIPS-based planning (H oller et al. 2014; H oller et al. 2016). Second, HTN domains often have a high degree of total order (Alford et al. 2016a), so that planners have even been speciﬁcally designed for this problem class (Nau et al. 1999; Alford, Kuter, and Nau 2009). Third, restricting models to be totally-ordered enables the creation of complete planning algorithms, as partially ordered HTN planning is undecidable (Erol, Hendler, and Nau 1996), while totally-ordered planning is EXPTIME-complete (Alford, Bercher, and Aha 2015). Finally, studying the structurally simpler class of totally-ordered HTN planning problems is a suitable ﬁrst step towards creating a SAT-based planner for general HTN planning problems. The remainder of the paper is organised as follows. Next, we introduce the totally-ordered HTN formalism and survey related approaches. We then describe our planning algorithm tot SAT and present its empirical evaluation.

HTN Planning In this paper, we use an adaptation of the formulation of HTN planning of Geier and Bercher (2011), in which we omit everything related to partial order. Hierarchical plan-

The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

ning distinguishes between two types of tasks: primitive and abstract. Primitive tasks a are identical to standard STRIPS actions: they can be executed directly and are speciﬁed by prec, add, and del lists with the usual state transition semantics. In contrast, abstract tasks must be reﬁned through methods into primitive actions before they can be executed. To describe the allowed reﬁnements, HTN planning problems use task networks, which describe a partially ordered multiset of both primitive and abstract tasks. In the totallyordered case a task network is merely a sequence of tasks.

Deﬁnition 1. Given a set of tasks T, a task network tn is any element of T . We deﬁne tn(i) to be the ith task of tn.

A totally-ordered HTN planning problem is deﬁned as:

Deﬁnition 2. A totally-ordered HTN planning problem is a 6-tuple P = (L, C, O, M, c I, s I), s.t.

L is a ﬁnite set of proposition symbols C is a ﬁnite set of abstract (or compound) tasks O is a ﬁnite set of primitive actions (or operators) M C (C O) is a set of decomposition methods c I C is the initial abstract task s I L is the initial state.

We deﬁne M(t) = {(t, tn) | (t, tn) M} to be the set of methods for the abstract task t and M(T) =

t T M(t). Decomposition methods are essentially rules that describe how an abstract task can be replaced by a sequence of (primitive or abstract) tasks, deﬁned as follows.

Deﬁnition 3. Let P = (L, C, O, M, c I, s I) be a planning problem, tn = ω1cω2 a task network where c C, and m = (c, ϕ) M a decomposition method. Then the application of m to tn results in the task network tn = ω1ϕω2, written tn m D tn . Likewise we write tn D tn if any such method exists, and tn D tn if any sequence of decompositions exists, which transforms tn into tn .

We deﬁne a solution to an HTN planning problem as:

Deﬁnition 4. Let P = (L, C, O, M, c I, s I) be a planning problem. A task network tn O is a solution to P if and only if c I D tn and tn is executable in s I. The set of all such task networks is deﬁned as S(P).

In this paper, we consider the problem of satisﬁcing planning, i.e., the decision problem to decide whether a task network tn S(P) exists. We want to emphasise that the HTN part of the planning problem is much more than a mere heuristic guide to solve some (in its core) classical planning problem it is a restriction on the allowed plans. This additional way to restrict the model makes HTN planning both more expressive than classical planning and computationally more difﬁcult. Our formalism differs from the one used by the HTN planning system SHOP (Nau et al. 1999) in that we do not allow for methods to have preconditions. This is not a restriction, as SHOP-style domains can be compiled into our formalism by introducing tasks with each method s precondition, which are ordered before all other tasks in their respective methods. Three of our evaluation domains contain method preconditions, which are compiled by the planner.

The reduced formalism bears a striking similarity to context-free grammars (CFGs). In fact, the set of solutions S(P) can be described by a CFG and for all CFGs there is a planning problem generating the same language (H oller et al. 2014). In theory, the task of planning boils down to ﬁnding a word generated by a context-free grammar, which can be done in linear time. The planning problem is a compact representation of the corresponding grammar, as it deﬁnes the solutions as the intersection of two properties: being a valid decomposition and being executable. Constructing the grammar explicitly would lead to an exponential blow-up, which makes any such approach to planning useless.

Related Work The idea of translating hierarchical planning problems into logical formulae is not entirely new. Mali and Kambhampati (1998) have proposed a translation of HTN planning problems into a SAT formula. Their notion of HTN planning signiﬁcantly differs from the (by now) established HTN formalism, making their formula much simpler and structurally different from ours. First, their formalism allows inserting additional tasks and does not have an initial task, which makes it equivalent to STRIPS planning. Second, they allow task sharing (Alford et al. 2016b), which leads both to signiﬁcantly shorter or even new solutions. Consider a planning problem with a single abstract task A which decomposes into two instances of the primitive task a, which can only be executed once. Using standard HTN semantics this problem has no solution, while with task sharing it does have one of length one. Lastly, their technique is also far less powerful than ours, as it cannot handle recursive domains, which greatly reduces the expressive power of hierarchical planning. Such domains can always be translated into an equivalent STRIPS planning problem, which is not the case for general or totally-ordered domains (H oller et al. 2014). Dix, Kuter, and Nau (2003) have proposed an encoding of totally-ordered HTN planning into answer set programming (ASP), based on the mechanics of SHOP. Their empirical evaluation shows that the translated domain performs signiﬁcantly worse than the standard SHOP algorithm (up to a factor of 1.000). We assume that this is due to the high expressive power needed for their models it requires numbers. Also, ASP is commonly solved by a translation into SAT, thus we deem a direct encoding to be more efﬁcient. We have not included their approach in our evaluation as no code for their translation is publicly available and (as stated) its performance is worse than SHOP, which we did include. In recent work, we presented a SAT-based technique for verifying HTN plans (Behnke, H oller, and Biundo 2017), a task which was shown to be NP-complete (Behnke, H oller, and Biundo 2015). In plan veriﬁcation, the objective is, given a sequence of actions, to determine whether this sequence can be obtained via decomposition from the initial abstract task. In theory, our proposed encoding could also be used as the basis for a planner, but its size and complexity make it impractical for that purpose. Also, the encoding we present here differs signiﬁcantly from the veriﬁcation encoding in that it represents decomposition in a tree-like structure rather than as a process iterating over sequences of actions.

As noted, the problem of ﬁnding a plan for a totallyordered HTN planning problem is equivalent to ﬁnding a word of a CFL given in a highly compressed form. For this test, it would however be necessary to construct the respective grammar G explicitly, which would require to construct the state space explicitly which is not feasible. Due to the speciﬁc nature of our input, there has been no work in the formal languages community as far as we know. There has been work relating to the usage of SAT techniques when analysing properties of CFLs, like universality (Axelsson, Heljanko, and Lange 2008), but their techniques are not applicable here, as they need the expanded CFG as their input.

Height-bounded Decomposition Trees

Our translation of HTN planning problems into SAT formulae is based on the notion of decomposition trees (DTs), which were originally introduced by Geier and Bercher (2011). A DT is a witness that a task network is a solution to a given planning problem. We adapt their deﬁnition and simplify it for the case of totally-ordered domains as follows:

Deﬁnition 5. Let P = (L, C, O, M, c I, s I) be an HTN planning problem. A valid totally-ordered decomposition tree T is a 4-tuple T = (V, E, α, β), where

V are the nodes of a directed tree, whose edges are given by the function E : V V 1, mapping each node in the tree to a ordered list of vertices its children. Let r be the root-node of this tree. α : V C O assigns each inner node an abstract task and each leaf a primitive task. β : V M assigns each inner node a method. α(r) = c I for all inner nodes v V with β(v) = (c, tn) and children E(v) = c1, . . . , cn, it holds that c = α(v) and α(c1) . . . α(cn) = tn.

The function E implicitly deﬁnes an order of all nodes of the tree. Two nodes a and b are ordered a b, if any only if for the lowest common ancestor c of a and b in the tree, the child a of c which is the ancestor of a occurs in E(c) before the respective ancestor b of b. The solution represented by a DT T are the tasks assigned to its leafs in the order implicitly induced by the function E. Geier and Bercher (2011) showed that for every solution tn S(P) a DT exists, whose leafs are assigned the tasks of tn in the correct order and vice versa. Thus, instead of generating a formula that describes all solutions, we can generate one that describes all possible DTs. This, however, is not possible in general, since the HTN can contain recursive methods, requiring to describe inﬁnitely many DTs in one formula. We take the same approach as SAT-planning does for classical planning: Restrict the solution by some bound and increase the bound until a solution has been found. We propose to use the height of the DTs as this bound in con-

1Let X be the set of all sequences over X.

trast to the plan length in classical planning2. I.e., given a height-bound K, we generate a SAT formula that describes all possible decomposition trees whose height is at most K. Here the question arises, which values should be tried for K? As the lower bound, we use the minimal height necessary to derive a task network solely containing primitive tasks from c I. Formally this height is the smallest K s.t. there exists an DT with height K whose leafs are assigned only primitive tasks. This bound can be computed inductively: For any primitive task it is 0, for every method it is the maximum of any contained task, and for any abstract task it is 1 plus the minimum over its decomposition methods. For cycles in the decomposition hierarchy, we can iterate until convergence. The following theorem provides an upper bound, showing that a planning procedure based on iteratively increasing the allowed height will terminate in ﬁnite time. Similar to classical planning, where there is also such an upper bound (2|L|), the determined bound is quite high, but plays no practical role, as solutions are usually found before that bound is reached.

Theorem 1. Let P = (L, C, O, M, c I, s I) be an HTN planning problem. If S(P) = , then S(P) also contains a solution whose decomposition height is at most |C| 2|L| 2.

Proof. Assume that S(P) is not empty, let tn be a solution with the minimal decomposition height, and K > |C| 2|L| 2 be this height. Then the decomposition tree of tn contains a path of length K. Label every node a in the tree with the states immediately before and after executing the primitive tasks resulting from a (i.e. the leafs below a). Since this path is longer than |C| 2|L| 2, at least two nodes will be labelled with the same abstract task and states. We can now remove the part of the tree between these two occurrences (including one of the two duplicated nodes). The resulting tree still represents a task network, which is a solution to P, as it only contains valid decompositions and the resulting task network will be executable. However the height of this path in the tree has decreased. If we repeat this process for all paths with length K, the resulting tree will represent a solution with a decomposition height <K.

Before we start to construct the formula itself, we ﬁrst need a suitable abstraction of DTs, which can describe sets of DTs in our case all DTs up to a given height. We have developed such an abstraction called Path Decomposition Trees (PDTs). A PDT P is required to have every DT of height K as a sub-tree. Figure 1 show an example for a PDT and a DT as its subgraph. The nodes of the PDT belonging to the DT are labelled with the respective task α(v). Our formula will describe such a subgraph DT and then checks that its leafs contain primitive tasks, which are executable in the order induced by the DT. Conveniently, this will be the same order as those nodes have in the PDT, which is marked with dashed lines in Figure 1.

2Bounding the height of a DT implicitly also bounds the plan length, but it is much simpler to construct a SAT-formula bounding the height, e.g., due to methods with empty task networks.

We start by detailing the formal criteria we impose on a PDT and show a theoretical property ensuring the completeness of our SAT-translation. There are several possible PDTs for every planning problem and height K that fulﬁl these criteria. As such, we introduce a broader framework upon which further research could be based. As a second step, we describe a way to generate PDTs for a given planning problem, leading to a single PDT for every problem/height combination. We then construct our SAT formula based on this single PDT.

Deﬁnition 6. Let P = (L, C, O, M, c I, s I) be a planning problem and K a height bound. A Path Decomposition Tree PK of height K is a triple PK = (V, E, α) where

V are the nodes of a directed tree of height K, whose edges are given by the function E : V V , and which has a single root node r. α : V 2C O assigns each node a set of possible tasks. c I α(r) for all inner nodes v V , for each abstract task c α(v) C that can be assigned to that node, and for each method (c, tn) M(c), there exists a sub-sequence v1, . . . , v|tn| of the children E(v), such that tn(i) α(vi) for all i {1, . . . , |tn|} all leaf nodes are either assigned only primitive tasks, or are at height K

A DT T = (VT , ET , αT , βT ) is contained in PK iff there is a sub-tree G = (V , E ) of (V, E), s.t. r V , which is isomorphic to (VT , ET ) with the isomorphism ψ : V VT s.t. v V : αT (ψ(v )) α(v ).

Theorem 2. Let P = (L, C, O, M, c I, s I) be a planning problem and K a height bound. Then every DT T which has a height of at most K is contained in a PDT PK.

Proof. Let T = (VT , ET , T , αT , βT ) be a DT of height at most K and PK = (V, E, , α) be a PDT, then we have to ﬁnd a sub-tree G of PK which is isomorphic to T and satisﬁes the conditions from the deﬁnition. We can construct this sub-tree recursively, starting with the roots of PK and T. For the root node r T of T we select the root r of PK to be in G and set the isomorphism respectively. β(r T ) assigns a decomposition method m = (αT (r T ), tn) to r T . Since PK is a PDT and αT (r T ) is contained in α(r), i.e., m is applicable to a task in α(r), we know that there is a sequence of children (v1, . . . , v|tn|) of r in PK s.t. for each vi the label set α(vi) contains the necessary label tn(i). Let v T 1 , . . . , v T |tn| be that sequence of children of r T in T. As T is a DT, we have αT (v T i ) = tn(i) and thus αT (v T i ) α(vi). Thus we can extend G by the ci and the isomorphism by the pairs (v T i , vi). Lastly, we continue constructing the sub-tree with each (v T i , vi) pair, as if they were the roots of the trees, until both are assigned a primitive task. The sub-tree G then fulﬁls the criteria from the deﬁnition by construction.

The deﬁnition of the PDT leaves several choice-points, e.g., how to select the sequence of children for a method or how to select the label sets for the nodes. When constructing our SAT formula, we are interested in a small PDT, i.e., one

Figure 1: A PDT. A subgraph-DT is marked black.

with the least amount of nodes and/or a PDT where the label set of each node is as small as possible. Since both parameters are relevant for the size of the PDT, deﬁning a formal optimisation problem is complicated. Also, performing this optimisation might be infeasible as we can t construct all DTs in beforehand. Given a planning problem P and a heigh bound K, we can construct a PDT PK using only a single function σ. σ essentially determines for a node v labelled with a set of tasks T = α(v) which children it should have and how they should be labelled in the PDT. Formally σ has the signature 2C O N (M N), and returns for every possible label set of a node of PK, the number of its children, and for every applicable method the positions of its subtasks. We call such a function a child arrangement function. The label set of the ith child must contain a task t, if there is a method for a task in T , whose jth task is t, s.t. σ will assign the jth task in the method to the child i. More formally, the set of labels induced by σ for the ith child is:

Σ(i, σ, T ) = {t | σ(T ) = (n, μ), t T , m = (t, tn) M, μ(m) = (c1, . . . , cn), j {1, . . . , n} : cj = i, tn(j) = t}

Lastly, we can deﬁne the PDT induced by a child arrangement σ. To ease the deﬁnition, we here assume a slightly extended HTN formalism, in which we do not have an initial task, but a set of initial tasks T , from which any can be chosen arbitrarily. In this construction, we use σ to construct the PDT locally and then recursively expand the tree until the height-limit K has been reached.

Deﬁnition 7. Let P = (L, C, O, M, T , s I) be a planning problem, K be a height bound, and σ be a child arrangement function. Then the PDT P σ K of height K induced by σ is deﬁned as follows: If K = 0, then P σ K = ( , , {( , T )}). Else let c be the number of children determined by σ(T ). Let for every child i be P i = P σ K 1 = (Vi, Ei, αi) the PDT of height K 1 induced by σ, where the set of initial tasks is Σ(i, σ, T ). Assume that the vertex-sets of all P i are disjunct (else rename them appropriately). Let ri be the root vertex of each P i. Then P σ K = (V, E, α) where

V = { } c i=1 Vi E = {( , (r1, . . . , rc))} c i=1 Ei α = {( , T )} c i=1 αi Clearly, the constructed tree is a valid PDT.

Proposition 1. Given a planning problem P and a child arrangement function σ, P σ K is a valid PDT of height K.

The last thing that remains to deﬁne is the child arrangement function σ. In our planner, we have used a simple scheme, were we have set the number of children always to the size of the largest applicable method. For each method its subtasks are assigned greedily to the children. We process the tasks in a method in the order occurring in that method and assign each task to one of the child nodes. When processing a task, we determine the ﬁrst child node that this task can be assigned to (the one after the node that the previous task was assigned to). If that child is already labelled with the task, we use that child, else we try to use the next child if enough children are left.

Translating PDTs into SAT Based on the PDT P σ K we have described in the previous section, we now construct a SAT formula F(P, K) that is satisﬁable if and only if there is a DT of height K whose leafs form an executable sequence of actions. As shown in Theorem 2, such a DT is essentially a sub-tree G of P σ K. In addition, its nodes must be labelled with tasks allowed by P σ K and its nodes must represent correct decompositions. Lastly, to be a solution the primitive tasks assigned to the leafs of G must form an executable action sequence. The formula describes such a sub-tree G of P σ K and will thus closely resemble the structure of the PDT P σ K. To ease the construction of the clauses expressing executability, we will enforce that the leafs of G are also leafs of P σ K. We can achieve this by adding (as appropriate) new nodes to the leafs of G , which simply repeat the primitive task of the original leaf. Initially, we separate the formula into two parts: one FD(P) describing the sub-tree G of the PDT, and one FE(P) describing that the resulting primitive task network must be executable. The formula will contain subformulae enforcing that at most one of a set of atoms can be true. As the naive quadratic exclusion encoding produces to many clauses, we have employed a binary encoding for all our atmost-one constraints, which only results in n log n clauses while introducing log n additional variables (Sinz 2005). We will denote this construct with M(L) for a set of literals L. Further, we will not describe the formula in disjunctive normal form due to readability3. Every satisfying valuation of FD(P) has to describe a single sub-tree G contained in P σ K that is a valid DT. We use the structure of the PDT and deﬁne for every node v in the PDT the formula necessary to represent a (potential) subtree DT rooted at that node. Each node in P σ K is uniquely identiﬁed by the path π = (p1, p2, . . . , pi) from the root to that node, where the path π i identiﬁes the ith child of the node identiﬁed by π. From here on, we use the path of a node to denote that node, and write K(π) for the height of π in P σ K. The function f(π) constructs said formula for each node π of the PDT P σ K. The formula contains the local conditions necessary for the described sub-tree to be valid DT and the formulae necessary for all its children. While constructing

3The formula can easily be transformed into DNF, which is required by most SAT solvers.

the formula, we will use two types of decision variables with the following meaning: tπ stating that in G the node π of P σ K is labelled with the task t, i.e., that αG (π) = t. mπ stating that in G the method m is chosen to decompose the task αG (π), i.e., that βG (π) = m If for some node π all tπ atoms are false, it is not part of the described DT G . Essentially, the sub-tree G will be deﬁned by the nodes of P σ K that are assigned a task. If α(π) only contains primitive tasks or K(π) equals K, then f(π) returns the formula M({tπ | t α(π) O}), i.e., if π is part of G , then it must be labelled with one primitive task in α(π). These nodes are the leafs of P σ K. If the mentioned condition is not true, we are dealing with an actual inner node of the PDT. If π is not part of G , then no task is assigned to it and thus all of its children cannot have a task assigned to them. Else, if the assigned task is abstract, a suitable decomposition is chosen, or if it is primitive, then the primitive task is inherited to the ﬁrst child. Here we split f(π) into six sub-formulae as follows:

f(π) = M({tπ | t α(π)}) select Method(π) apply Method(π) inherit Primitive(π) none Present(π) sub Tree(π)

The ﬁrst formula again ensures that π receives if any a unique task, as required for a DT. select Method ensures that a decomposition method is chosen if and only if the node π is assigned an abstract task, and that a single applicable method for that task is chosen. This also ensures that if a primitive task is chosen, no method can be applied.

selected Method(π) = M({mπ | M(α(π) C)})

t α(π) C tπ

m M(α(t) C) mπ tπ

The next formula apply Method states the results of an applied method. We have to assign each task in a method s task network to one of π s children π i. This assignment is determined by the child arrangement function σ, which we have described in the previous section. To ease notation, we denote the two values returned by σ(α(π)) as c and μ (the number of children to produce and the task assignment function, respectively) and use the function Σ(i, σ, T ) from the previous chapter, which returns all tasks that can be assigned to the ith child. The formula itself forces that if a method is selected for a node, then the correct tasks are assigned to its children and that all children in P σ K that are not assigned a task cannot contain a task, i.e., are not part of G .

apply Method(π) =

m=(t,tn) M(α(π))

i=1 tnπ μ(m)i i

i {1,...,c}\μ(m)

k Σ(j,σ,α(π)) kπ j

The next two formulae take care of the cases where the node π is either assigned a primitive action or none at all. In the

former case it is passed on to one of the children in the tree, in the latter none of the children can be assigned a task.

inherit Primitive(π) =

k Σ(j,σ,α(π)) kπ j

none Present(π) =

t Σ(i,σ,α(π)) tπ i

The formula sub Tree(π) = c i=1 f(π i) is responsible for the recursive construction of the tree. As the full decomposition formula FD(P), we choose f(()) c() I , i.e., the decomposition formula for the root and the assertion that the initial task is contained in G . The second part of FE(P) expresses that the tasks assigned to the leafs of G form an executable sequence of actions. We use an adaptation of the formula introduced by Kautz and Selman (1996) for this purpose. The difference is that we have to restrict the set of possible actions to the set α(π) for each task, and to connect the chosen action with the decomposition formula by using the variables tπ. Let Π = (π1, . . . , πn) be the leafs of the PDT in their order induced by P σ K. We introduce for every proposition symbol l L the decision variable li for 0 i n, stating that l is true after executing the action assigned to πi. Then FE(P) is deﬁned as

l L\s I l0 n 1 i=0 (action(i) maintain(i)) where

action(i) =

t α(πi+1) O tπ

l prec(t) li

l add(t) li+1

l del(t) li+1

maintain(i) =

t α(πi+1) O with l add(t) tπ

As a last step in this chapter, we provide a proof sketch ensuring that the constructed formula is satisﬁable iff the respective planning problem is solvable.

Theorem 3. Let P = (L, C, O, M, c I, s I) be a planning problem and K a height bound. Then F(P, K) is satisﬁable if and only if P has a solution with a decomposition tree of height K.

Proof. : Let γ be a satisfying valuation of F(P, K). We can construct a DT G = (V, E, α, β) as follows:

V = {π | t O C : tπ is true and for the parent π of π no oπ for o O is true } E(π) = (π i | t O C : tπ i is true and π V ), sorted by i α(π) = t for the one atom tπ that is true there is at most one due to the M constraints in the formula β(π) = m for the one atom mπ that is true select Method requires such an atom for all inner nodes and there is at most one due to the M constraint.

By the assertion of the atom c() I , we have α(r) = c I for the root r of G . The only thing left to show for G is that for each inner node π, its children are labelled according to the method β(π). This is ensured by the constraints of the formula apply Method. Since only the nodes contained in V are assigned tasks, the sequence of primitive actions in FE(P) is the same as the leafs of G . As FE(P) is satisﬁed, the sequence of primitive actions induced by G is executable. : Let s be a solution of P and let T be a decomposition tree for s whose height is at most K. By Theorem 2 there exists a sub-tree G of P σ K that is isomorphic to T and fulﬁls the criterion in Deﬁnition 6. Using G , we can construct a satisfying valuation of F(P, K) as follows:

tπ is true iff π G and the node v of T that is isomorphic to π is labelled with t, i.e., αT (v) = t, or if t is primitive and a predecessor π of π whose isomorphic partner v in G is labelled with t such that π is reachable from π only through ﬁrst children. mπ is true iff π G and the node v of T that is isomorphic to π is decomposed using m, i.e., βT (v) = m. li if l is true after executing i actions of s.

At most a single tπ and mπ is true for all nodes π. If tπ is true for some abstract task t, then a suitable method atom mπ is also true, as π must be an inner node of G and π has a correct β-label. Also, if t is primitive no method is selected, as π is a leaf of G . Since G is a valid DT, the constraints in apply Method are also satisﬁed (by the fourth point of a DT s deﬁnition). By the or-clause of the ﬁrst bullet point, we also ensure that inherit Primitive is satisﬁed. As G is a tree none Present is satisﬁed. Lastly, since s is a solution, it executable and so FE(P) is satisﬁed.

Evaluation To show that our HTN planner tot SAT performs well in practice, we have conducted the following empirical evaluation. Our implementation of tot SAT uses the parser and preprocessor of the planning system PANDA (Bercher, Keen, and Biundo 2014). We will release the code of tot SAT publicly. Domains. We have taken four commonly used HTN benchmark domains (UM-Translog, Woodworking, Satellite, and Smart Phone) from Bercher, Keen, and Biundo (2014). Since we have seen that our planner solves all these instances rather quickly, we have also added three new combinatorially more difﬁcult domains. ENTERTAINMENT is based on the non-hierarchical Assembly domain (Bercher et al. 2014), which describes setting up Hi Fi and video devices that can be connected with different types of cables. The goal is to connect the devices s.t. all available signals are transmitted to the appropriate signal sinks (e.g. TVs). The original domain is rather restrictive, e.g., cables can only be inserted if one of the connected devices already has a signal and unplugging cables is not possible. Our domain uses the additional expressivity of HTN planning to lift these two restrictions. ROVER is the HTN-structure developed for SHOP together with the problem instances of the IPC3 domain ROVER.

min max average media Variables 61 331132 18335 1627 Literals per clause 1.993 12.82 2.5 2.2 Horn clauses 83.8% 98.7% 91.4% 91.9% Assertional clauses 0.0005% 5.63% 0.6% 0.2%

Table 1: Statistical data on all SAT formulae generated during planning. Percentages are given as average percentages over all formulae. Horn clauses are those containing exactly one positive literal. Assertional clauses are those containing only one literal.

min max avg min % max % avg % SHOP vs optimal 0 5 +0.37 0% +41% 3.2% HTN2STRIPS vs optimal 0 1 +0.4 0% +10% +0.3% T DGc vs optimal 0 0 0 0% 0% 0% T DGm vs optimal 0 3 +0.14 0% +25% 1.4% tot SAT vs optimal 0 9 0.58 0% +90% 5.4%

tot SAT vs SHOP -17 20 +0.79 -53.8% +90% +0.79% tot SAT vs HTN2STRIPS -2 +58 +5.48 -9.1 % +287% +36.9%

Table 2: Relative sizes of solutions. Given are statistics for the difference in number of actions. The ﬁrst part of the table shows the comparison against known optimal solutions, computed using Dijkstra s Algorithm (Dijkstra 1959). The second part of the table shows the relative comparison of planners solving instances the optimal plan is unknown.

TRANSPORT is a domain describing a standard deliver-withtrucks scenario. There are several trucks (which do not need fuel) to deliver packages from their start location to a destination in a road network. For UM-TRANSLOG, WOODWORKING, SATELLITE, and SMARTPHONE, we have added ordering constraints to ensure that the domains are totally-ordered. For all these domains solvability was retained by this change. ENTERTAINMENT, ROVER and TRANSPORT were constructed s.t. the domains are already totally-ordered. We will release these domains publicly. Planners. Each planner was given 10 minutes runtime and 4 GB of RAM per instance on an Intel Xeon E5-2660. We compare tot SAT against two types of planning approaches: Those speciﬁcally designed for totally-ordered domains and those capable of handling unrestricted HTN planning problems. The ﬁrst category consists of the planning strategies SHOP (Nau et al. 1999) and HTN2STRIPS (Alford et al. 2016a). We have compared us against both the original implementation of SHOP2 and the re-implementation of SHOP recently added to PANDA. Note that SHOP was originally developed for totally-ordered domains and SHOP2 is its extension to partial order, i.e., using SHOP2 on totally-ordered instances will lead to the behaviour described in the original SHOPpaper. HTN2STRIPS is based on a translation of HTN planing problems into classical planning (Alford et al. 2016a). This planner needs similar to tot SAT a bound K for the translation, for which both a lower and an upper bound can be computed (Alford et al. 2016a). Given a time-bound t, the procedure of the planner is: run the classical planner (we used jasper (Xie, M uller, and Holte 2014), as did the original paper) on bound K with a timelimit of t minutes. If no solution was found, increment K and repeat. If the planner reached K s upper bound, we let it run until the total timelimit. We tried all possible values for t and found that the coverage of the planner is highest for t = 3min. The second group of planners consists of general HTN planning techniques. All of them are based on plan-space search in contrast to SHOP (using progression) and HTN2STRIPS. We have used the two currently best know heuristics for plan-based search in HTN planning (TDGc and TDGm, with and without recomputing (PR) (Bercher

et al. 2017)) implemented in PANDA using Greedy-A with a weight of 2. We also included the strategies DFS, BFS, and Dijkstra provided by PANDA, as well as its emulation of the UMCP planner (Erol, Hendler, and Nau 1994). Lastly, we also included the planner FAPE (Dvorak et al. 2014). However it does not accept all instances as input, as it cannot handle recursion in the domain. As such, we ran it only on the domains SATELLITE, WOODWORKING, and ROVER. We have tested four SAT solvers for tot SAT. Three of them are top performers at the SAT Competition 2016: cryptominisat5 (Soos 2016), Maple COMSPS (Liang et al. 2016), and Riss6 (Manthey, Stephan, and Werner 2016). The fourth solver is standard unmodiﬁed minisat. Similar to HTN2STRIPS, tot SAT has to try several values for the bound K in order to ﬁnd a solution to the planning problem. The ﬁrst value of K used by tot SAT is the lowest height needed to reach a primitive task network. We then construct F(P, K) and pass it to the SAT-solver. We have always set the timeout of the solver to the remaining runtime. Learnt clauses are not reused if the bound K is increased. Discussion. In Table 1 we present statistical data on the generated SAT formulae. We can see that in all instances, the size of the formula is moderately large (never above 4 105). Also, most clauses of the formulae are horn which are algorithmically easier to handle for a SAT solver. From this data we assume that the formulae are structurally relatively simple, which is wilful by construction. We think that our encoding actually exposes structural information about the domain in a way that is useful to the solvers. Table 3 shows the overall coverage of all evaluated planners. We can see that tot SAT solves depending on the SAT solver 123 to 125 of the 127 benchmark instances. For all domains except transport, tot SAT solves all instances, while only two transport instances (#29 and #30) cannot be solved in time. Even using the simple SAT-solver minisat, only thee instance remain unsolved. tot SAT would solve both these instance with 30 minute timelimit. Further we can see that the planners which are speciﬁcally designed for totally-ordered domains have a higher coverage than general HTN planner. This is expected, as they can exploit this special property during search. However, it is extremely surprising that the SHOP-strategy which is a blind depth-ﬁrst progressionsearch outperforms the best known heuristic HTN plan-

tot SAT cms

tot SAT Maple

tot SAT Riss6

tot SAT minisat

SHOP original

UM-TRANSLOG 22 22 22 22 22 22 22 18 22 22 22 22 22 0 22 22 22 SATELLITE 25 25 25 25 25 25 25 23 24 25 24 25 25 25 24 18 20 WOODWORKING 11 11 11 11 11 11 10 5 6 9 6 9 9 0 6 6 6 SMARTPHONE 7 7 7 7 7 5 4 6 4 4 4 5 6 0 4 4 4 ENTERTAINMENT 12 12 12 12 12 9 5 5 5 5 5 9 9 0 5 5 6 ROVER 20 20 20 20 20 18 16 5 0 3 0 2 2 15 0 0 0 TRANSPORT 30 28 27 26 27 2 0 20 1 0 1 2 1 0 1 0 1 total 127 125 124 123 124 92 82 82 62 68 62 74 74 40 62 55 59

Table 3: Number of solved instances per planner per domain. Maxima are indicated in bold. cms = cryptominisat5

ners signiﬁcantly. UM-Translog and Satellite are solved by practically every planner, i.e., we might suggest to remove these two from future benchmarks. When looking at the new domains transport and rover we can clearly see the different advantages of SHOP and HTN2STRIPS on these domains. While SHOP is able to exploit the structure of the HTN in the rover domain, it cannot solve the combinatoric problem in transport. In contrast, HTN2STRIPS can solve transport easier, but struggles to utilise the information encoded in the rover domain. tot SAT can solve both domains, showing that it combines the ability to solve combinatorially hard problems with the ability to exploit the HTN hierarchy. Especially the general HTN planners perform worse on these domains, showing the need for the development of better informed heuristics for HTN planning. This becomes even more obvious if we compare these heuristics with blind BFS it outperforms them all. Figure 2 shows the number of solved instances as a function of time. The choice of the SAT-solver seems to be unimportant, as all three show almost the same behaviour. We also see that tot SAT solves more instances than all other approaches starting roughly after 4 seconds of runtime. Over all instances, tot SAT calls the SAT-solver 3.71 (media 4) times, with a maximum of 8 calls before either a solution is found or the timeout is reached. Lastly, although we did not aim for optimal planning, we have also evaluated the relatives quality of solutions in terms of their length. The results are shown in Table 2. tot SAT tends to produce longer solutions, as it is allowed by the formula to assign tasks to all leafs. However, we think that it would be possible to create an optimal planner based on our encoding.

Conclusion We have presented a new planning technique for HTN planning SAT-based HTN planning. It is based on a translation of decomposition height bounded totally-ordered HTN planning problems into a SAT formula. This way, a planning problem can be solved by iteratively increasing this bound. We have evaluated it against state-of-the-art HTN planning systems and have shown that it outperforms all of

Figure 2: Runtime vs number of solved instances per planner

them. The presented technique solves totally-ordered HTN planning problems which are occurring naturally in practice. In view of our results, extending tot SAT to partially ordered domains is the natural next step to further improve the performance of HTN planning. Furthermore, the presented SAT-encoding seems to be well suited as the basis for allowing additional constraints during planning. This occurs often in so-called mixed-initiative planning systems where users inﬂuence the plan generation process (Nothdurft et al. 2015). This integration can, e.g., be achieved by enabling the planner to process certain requests to change a current plan (Behnke et al. 2016), which could be transformed into additional SAT formulae.

Acknowledgments This work was done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG).

Alford, R.; Bercher, P.; and Aha, D. W. 2015. Tight bounds for HTN planning. In Proc. of ICAPS 2015. Alford, R.; Behnke, G.; H oller, D.; Bercher, P.; Biundo, S.; and Aha, D. W. 2016a. Bound to plan: Exploiting classical heuristics via automatic translations of tail-recursive HTN problems. In Proc. of ICAPS 2016. Alford, R.; Shivashankar, V.; Roberts, M.; Frank, J.; and Aha, D. W. 2016b. Hierarchical planning: relating task and goal decomposition with task sharing. In Proc. of IJCAI 2016. Alford, R.; Kuter, U.; and Nau, D. S. 2009. Translating HTNs to PDDL: A small amount of domain knowledge can go a long way. In Proc. of IJCAI. Axelsson, R.; Heljanko, K.; and Lange, M. 2008. Analyzing context-free grammars using an incremental sat solver. In Proc. of ICALP 2008. Behnke, G.; H oller, D.; Bercher, P.; and Biundo, S. 2016. Change the plan - how hard can that be? In Proc. of ICAPS 2016. Behnke, G.; H oller, D.; and Biundo, S. 2015. On the complexity of HTN plan veriﬁcation and its implications for plan recognition. In Proc. of ICAPS 2015. Behnke, G.; H oller, D.; and Biundo, S. 2017. This is a solution! (... but is it though?) verifying solutions of hierarchical planning problems. In Proc. of ICAPS 2017. Bercher, P.; Biundo, S.; Geier, T.; H ornle, T.; Nothdurft, F.; Richter, F.; and Schattenberg, B. 2014. Plan, repair, execute, explain - how planning helps to assemble your home theater. In Proc. of ICAPS 2014. Bercher, P.; Richter, F.; H ornle, T.; Geier, T.; H oller, D.; Behnke, G.; Nothdurft, F.; Honold, F.; Minker, W.; Weber, M.; and Biundo, S. 2015. A planning-based assistance system for setting up a home theater. In Proc. of AAAI 2015. Bercher, P.; Behnke, G.; H oller, D.; and Biundo, S. 2017. An admissible HTN planning heuristic. In Proc. of IJCAI 2017. Bercher, P.; Keen, S.; and Biundo, S. 2014. Hybrid planning heuristics based on task decomposition graphs. In Proc. of So CS 2014. Bit-Monnot, A.; Smith, D. E.; and Do, M. 2016. Delete-free reachability analysis for temporal and hierarchical planning. In Proc. of HSDIP 2016. Byrne, R. 1977. Planning meals: Problem solving on a real data-base. Cognition 5:287 332. Champandard, A.; Verweij, T.; and Straatman, R. 2009. The AI for Killzone 2 s multiplayer bots. In Proc. of Game Developers Conference. Dijkstra, E. W. 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1(1):269 271. Dix, J.; Kuter, U.; and Nau, D. 2003. Planning in answer set programming using ordered task decomposition. In Proc. of KI 2003.

Dvorak, F.; Bit-Monnot, A.; Ingrand, F.; and Ghallab, M. 2014. A ﬂexible ANML actor and planner in robotics. In Proc. of Plan Rob 2014. Erol, K.; Hendler, J. A.; and Nau, D. S. 1994. UMCP: A sound and complete procedure for hierarchical task-network planning. In Proc. of AIPS 1994. Erol, K.; Hendler, J. A.; and Nau, D. S. 1996. Complexity results for HTN planning. Annals of Mathematics and Artiﬁcial Intelligence 18(1). Geier, T., and Bercher, P. 2011. On the decidability of HTN planning with task insertion. In Proc. of IJCAI 2011. Hartanto, R., and Hertzberg, J. 2008. Fusing DL reasoning with HTN planning. In Proc. of KI 2008. H oller, D.; Behnke, G.; Bercher, P.; and Biundo, S. 2014. Language classiﬁcation of hierarchical planning problems. In Proc. of ECAI 2014. H oller, D.; Behnke, G.; Bercher, P.; and Biundo, S. 2016. Assessing the expressivity of planning formalisms through the comparison to formal languages. In Proc. of ICAPS 2016. Kautz, H., and Selman, B. 1996. Pushing the envelope: Planning, propositional logic, and stochastic search. In Proc. of AAAI 1996. Liang, J. H.; Oh, C.; Ganesh, V.; Czarnecki, K.; and Poupart, P. 2016. Maple COMSPS, Maple COMSPS LRB, Maple COMSPS CHB. In Proc. of SAT Competition 2016. Mali, A. D., and Kambhampati, S. 1998. Encoding HTN planning in propositional logic. In Proc. of AIPS 1998. Manthey, N.; Stephan, A.; and Werner, E. 2016. Riss 6 solver and derivatives. In Proc. of SAT Competition 2016. Morisset, B., and Ghallab, M. 2002. Learning how to combine sensory-motor modalities for a robust behavior. In Advances in Plan-Based Control of Robotic Agents. Nau, D.; Cao, Y.; Lotem, A.; and Munoz-Avila, H. 1999. SHOP: Simple hierarchical ordered planner. In Proc. of IJCAI 1999. Nau, D. S.; Au, T.-C.; Ilghami, O.; Kuter, U.; Wu, D.; Yaman, F.; Mu noz-Avila, H.; and Murdock, J. W. 2005. Applications of SHOP and SHOP2. Intelligent Systems, IEEE 20:34 41. Nothdurft, F.; Behnke, G.; Bercher, P.; Biundo, S.; and Minker, W. 2015. The interplay of user-centered dialog systems and AI planning. In Proc. of SIGDIAL 2015, 344 353. Association for Computational Linguistics. Sinz, C. 2005. Towards an optimal CNF encoding of boolean cardinality constraints. In Proc. of CP 2005, volume 3709. Soos, M. 2016. The Crypto Mini Sat 5 set of solvers at SAT Competition 2016. In Proc. of SAT Competition 2016. Xie, F.; M uller, M.; and Holte, R. 2014. Jasper: The art of exploration in greedy best ﬁrst search. In The 8th Int. Planning Competition, 39 42.