# Completeness-Preserving Dominance Techniques for Satisficing Planning

Álvaro Torralba
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
torralba@cs.uni-saarland.de

Dominance pruning methods have recently been introduced for optimal planning. They compare states based on their goal distance in order to prune those that can be proven to be worse than others. In this paper, we introduce dominance techniques for satisficing planning. We extend the definition of dominance, showing that being closer to the goal is not a prerequisite for dominance in the satisficing setting. We develop a new method to automatically find dominance relations in which a state dominates another if it has achieved more serializable sub-goals. We take advantage of dominance relations in different ways: while in optimal planning their usage focused on dominance pruning and action selection, we also use them to guide enforced hill-climbing search, resulting in a complete algorithm.

## 1 Introduction

Satisficing planning is the problem of, given an input planning task, finding a sequence of actions that goes from the initial state to a state that satisfies the goal condition. Most satisficing planners use search algorithms like Greedy Best-First Search (GBFS) or Enforced Hill-Climbing (EHC) guided by heuristics such as the delete-relaxation heuristic and extensions thereof [Hoffmann and Nebel, 2001; Domshlak et al., 2015], plus certain diversification techniques [Richter et al., 2011; Röger and Helmert, 2010] and/or sub-goal selection strategies [Chen et al., 2004; Porteous et al., 2001; Hoffmann et al., 2004]. Both GBFS and EHC use heuristics, but they use them in different ways. In GBFS, heuristics determine the order in which states are expanded.
EHC, on the other hand, uses heuristics to compare newly generated states against the initial state, restarting the search when it finds a state with a lower heuristic value than the initial state. The success of EHC highly depends on the accuracy of the heuristic. When the heuristic is accurate, EHC finds solutions very quickly, but it is incomplete in tasks with unrecognized dead-end states, i.e., states that the heuristic finds promising but that have no solution [Hoffmann, 2005].

Dominance pruning techniques have recently been introduced for optimal planning [Hall et al., 2013; Torralba and Hoffmann, 2015]. They reduce the search space by pruning states that are dominated by others. The definition of dominance is based on goal distance: a state dominates another state if it can be proven to be at least as close to the goal. In this paper we explore the use of dominance methods to compare states in satisficing search. We redefine the notion of dominance for satisficing planning, substituting the optimality guarantee by a completeness guarantee that ensures that at least one plan (not necessarily optimal) will be preserved. We also consider how dominance relations can be used to reduce the size of the search space. As in optimal planning, one can prune states that are dominated by others, but lifting any considerations with respect to the cost of reaching such states. Also, a state s can be replaced by any of its successors s′ if s′ strictly dominates s. Based on this, we define a variant of EHC that is complete. Our work builds on previous methods to automatically find dominance relations for a given planning task. We strengthen their reasoning and specialize them for satisficing search. To do this, we define a new dominance relation that serializes the planning task, inspired by sub-goal serialization approaches [Barrett and Weld, 1993].
Our experiments show that these serialized dominance relations are able to identify dominance in a number of domains to guide a dominance-based EHC.

## 2 Background

A labeled transition system (LTS) is a tuple Θ = ⟨S, L, T, s_I, S_G⟩ where S is a finite set of states, L is a finite set of labels, T ⊆ S × L × S is a set of transitions, s_I ∈ S is the start state, and S_G ⊆ S is the set of goal states. A plan for a state s is a path from s to some s_G ∈ S_G. A state s is reachable if there exists a path from s_I to s. A state is solvable if there exists a plan from s; otherwise we say that s is a dead end. By h*(s) (g*(s)) we denote the length of a shortest plan for s (path from s_I to s). A plan for s is optimal iff its cost equals h*(s). Since our goal is to find solutions fast, regardless of their cost, we assume unit-cost domains. We also simplify the explanation of previous work on dominance for optimal planning based on this assumption.

Following previous work on dominance pruning, we consider a planning task as a set of LTSs on a common set of labels, {Θ1, . . . , Θn}. Given a planning task in the more common SAS+ formalism [Bäckström and Nebel, 1995], the atomic transition systems representation with one LTS for each SAS+ variable can be easily obtained [Helmert et al., 2014].

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

Figure 1: Example based on the Visitall domain where a robot must visit all tiles in a square grid. (A) Robot position. (B) Cell i visited. (C) Fuel available. The robot has two units of fuel, which are consumed when moving into striped cells, so the robot must not enter the square grid via the shortest path.

The state space of the task is the synchronized product of all the LTSs: Θ = Θ1 ⊗ · · · ⊗ Θn. The synchronized product of two LTSs Θ1 ⊗ Θ2 is another LTS with states S = {(s1, s2) | s1 ∈ Θ1, s2 ∈ Θ2}, transitions T = {((s1, s2), l, (s′1, s′2)) | (s1, l, s′1) ∈ T1, (s2, l, s′2) ∈ T2}, such that
(s1, s2) ∈ S_G iff s1 ∈ S_G1 and s2 ∈ S_G2. We write s →l s′ as a shorthand for (s, l, s′) ∈ Θ. Let τ be a set of labels; we write s →τ s′ to denote a path from s to s′ where all labels belong to τ. We use subscripts to differentiate states in the state space Θ (e.g., s, s′, t) and their projection onto some Θi (e.g., si, s′i, ti). We say that a transition s → s′ in Θ affects Θi if it modifies its value, si ≠ s′i.

Consider a planning task represented as a set of LTSs {Θ1, . . . , Θn} like our running example shown in Figure 1, where a robot must visit all tiles in a square grid. There is an LTS representing the position of the robot (A), an LTS for each cell in the square grid that represents whether the cell has been visited or not (B), and an LTS describing the available fuel (C). In (A) we depict the grid. The corresponding LTS has a node for each cell, and transitions between adjacent cells. Transitions moving the robot to cell i are labeled with label mi. Only walking into striped cells (x) consumes fuel. All other labels have a self-loop transition in every state and are omitted.

A heuristic is a function h : S → ℕ ∪ {∞} that estimates the distance from every state to the goal. A state is reachable if there exists a sequence of actions from s_I to it. A state is alive iff it is solvable, reachable, and not a goal state. A heuristic h is descending if all alive states have a successor with lower heuristic value. A heuristic is dead-end aware if h(s) = ∞ for all dead-end states s. Most common search algorithms in satisficing planning (e.g., hill-climbing or GBFS) will solve the planning task with at most h(s_I) expansions if h is a descending and dead-end aware heuristic [Seipp et al., 2016].¹

A relation ⪯ is a set of pairs of states. A relation is a preorder iff it is reflexive and transitive. We write s ≺ t as a shorthand for s ⪯ t and not t ⪯ s (i.e., ≺ is a strict partial order). We say that ⪯ approximates a heuristic h iff s ⪯ t implies h(t) ≤ h(s).
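The synchronized product defined above is straightforward to compute explicitly. A minimal sketch, where the dict-based LTS encoding (keys `states`, `trans`, `init`, `goals`) is an assumption of this example, not notation from the paper:

```python
from itertools import product

def sync_product(l1, l2):
    """Synchronized product of two LTSs: states are pairs, a label-l
    transition exists iff both components have an l-transition, and a
    pair is a goal iff both components are goals."""
    return {
        'states': set(product(l1['states'], l2['states'])),
        'trans': {((s1, s2), l, (t1, t2))
                  for (s1, l, t1) in l1['trans']
                  for (s2, l2_, t2) in l2['trans'] if l2_ == l},
        'init': (l1['init'], l2['init']),
        'goals': set(product(l1['goals'], l2['goals'])),
    }

# Toy instance of the running example: robot with two cells, one cell flag.
robot = {'states': {'c0', 'c1'}, 'init': 'c0', 'goals': {'c1'},
         'trans': {('c0', 'm1', 'c1'), ('c1', 'm1', 'c1')}}
cell1 = {'states': {'u', 'v'}, 'init': 'u', 'goals': {'v'},
         'trans': {('u', 'm1', 'v'), ('v', 'm1', 'v')}}  # m1 marks cell 1 visited
prod = sync_product(robot, cell1)
```

Moving into cell 1 (`m1`) synchronizes the robot LTS with the visited-flag LTS, so in `prod` the transition from ('c0', 'u') under `m1` reaches ('c1', 'v').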
Dominance relations approximate the goal distance: whenever s ⪯ t (t dominates s), then t must be at least as close to the goal as s (h*(t) ≤ h*(s)).

¹ Seipp et al. [2016] consider the more general case of dead-end avoiding heuristics instead of dead-end aware heuristics.

Torralba and Hoffmann [2015] introduced label-dominance simulation, a method to compute a set of relations {⪯1, . . . , ⪯n} that can be combined to derive a dominance relation ⪯ for Θ where s ⪯ t iff si ⪯i ti for all i. In a best-first search with open list open and closed list closed, dominance pruning consists of removing a state s from the open list without expanding it whenever there exists t ∈ open ∪ closed such that s ⪯ t and g(t) ≤ g(s). If ⪯ is a dominance relation, then at least one optimal plan is preserved.

Quantitative dominance extends the previous method by considering numeric functions instead of relations [Torralba, 2017]. A function D : S × S → ℚ ∪ {−∞} is a quantitative dominance function (QDF) if D(s, t) ≤ h*(s) − h*(t). QDFs are computed as a set of functions {D1, . . . , Dn} such that D(s, t) = Σi Di(si, ti). To guarantee that the sum of all Di is a QDF, they must fulfill the equation:

Di(si, ti) = min over si →l s′i of max over ti →τ ui →l′ u′i of ( Di(s′i, u′i) − hτ(ti, ui) + Σ_{j≠i} D^L_j(l, l′) )

In words: whatever we can do from si (si →l s′i), we can do from ti via a path ti →τ ui →l′ u′i, taking into account the comparison of the goal distance between the final results of both paths (Di(s′i, u′i)), the cost from ti to ui (hτ(ti, ui)), and how much cost we incur for applying l′ instead of l in all other LTSs (Σ_{j≠i} D^L_j(l, l′)). This requires defining hτ and D^L_j:

hτ accounts for transitions that only affect a single LTS Θi. A label is a τ-label for Θi iff it can always be applied to change the value of Θi without affecting other LTSs. Formally, if l is a τ-label for Θi, then sj →l sj ∈ Θj for all Θj ≠ Θi and all sj ∈ Θj.
The τ-distance from si to ti, written hτ(si, ti), is the length of a shortest path from si to ti in Θi using only transitions with τ-labels, or ∞ if no such path exists. For example, moving the robot to a non-striped cell outside the square part of the grid is a τ-label because it changes the position of the robot without affecting other variables.

D^L_j(l, l′) measures how good it is to apply l′ instead of l in Θj. If D^L_j(l, l′) ≥ 0, it means that any time we can apply l to reach some sj, we can also apply l′ to reach some tj s.t. Dj(sj, tj) ≥ 0. For example, in the LTS that represents the available fuel, D^L_F(mx, mi) = 0 for any striped cell x and non-striped cell i.

In the example of Figure 1, we can obtain a QDF {D1, . . . , Dn} where each Di compares states only according to their value in Θi. For the position of the robot we obtain D(x, y) = −d(x, y), where d(x, y) is the distance from cell x to cell y using only movements that do not consume fuel. For the fuel, we obtain D(s, t) = 0 if t has at least as much fuel as s, and −∞ otherwise. For each cell, we have a value of −∞ if the cell has been visited in s and not in t, and 0 otherwise. In optimal planning, t dominates s if Σi Di(si, ti) ≥ 0. In our example this means that a state is better if it has visited more cells, has at least as much fuel, and the position of the robot is the same.

Figure 2: Example with two alternative paths to the goal.

QDFs can be used, apart from dominance pruning, to perform action selection. Action selection is a type of pruning where a state s ∈ open may be replaced by one of its immediate successors t if D(s, t) ≥ c(s, t), where c(s, t) is the cost of the transition from s to t. In that case, such a transition starts an optimal plan from s, so at least one optimal solution is always preserved.
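The τ-distance hτ used above is just a shortest-path computation restricted to τ-labeled transitions. A minimal BFS sketch under the paper's unit-cost assumption (the triple-set encoding of transitions is this example's own convention):

```python
from collections import deque
from math import inf

def tau_distance(trans, tau_labels, s, t):
    """h_tau(s, t): length of a shortest path from s to t within one LTS
    using only transitions whose label is a tau-label; inf if t cannot be
    reached that way. `trans` is a set of (state, label, state) triples."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for (a, l, b) in trans:
            if a == u and l in tau_labels and b not in dist:
                dist[b] = dist[u] + 1
                queue.append(b)
    return inf

# Tiny LTS: a -m-> b -m-> c, plus a fuel-consuming shortcut a -x-> c.
grid_trans = {('a', 'm', 'b'), ('b', 'm', 'c'), ('a', 'x', 'c')}
```

With only `m` as a τ-label the distance from a to c is 2; if the fuel-consuming label `x` were also a τ-label it would drop to 1, illustrating why adding τ-labels can only decrease hτ.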
## 3 Satisficing Dominance

In optimal planning, a dominance relation is one where, for any s ⪯ t, t must be at least as close to the goal as s. However, this is sometimes too restrictive for satisficing search. For example, consider a problem where there are two paths to the goal: one requires solving a hard combinatorial problem and the other follows a straightforward, but potentially longer, path. Assuming that providing any guarantees about the cost of solving the combinatorial problem is hard, no dominance can be proven for optimal planning. However, it is simple to manually design a dominance relation where the states in the simpler path dominate those related to solving the combinatorial problem, directly guiding the search towards the goal. With this aim, we define a satisficing dominance relation as one that preserves solutions, no matter their cost or length.

Definition 1 (Satisficing Dominance Relation) A preorder ⪯ is a satisficing dominance relation if there exists a descending and dead-end aware heuristic h such that ⪯ approximates h (s ⪯ t ⟹ h(t) ≤ h(s)).

Intuitively, h should be dead-end aware so that unsolvable states do not dominate solvable states, and descending to avoid the case where a state dominates all its successors, hence rendering the search incomplete. Note that simply requiring each state not to dominate one of its solvable successors is not enough to guarantee that a plan is preserved. Consider the example of Figure 2, where dominance pruning with a relation where t′ ⪯ s and s′ ⪯ t could prune both s′ and t′, causing all solutions to be pruned.

This is a generalization of dominance relations used in optimal planning, since the perfect heuristic h* is descending and dead-end aware. Note that any descending and dead-end aware heuristic can be defined via computing h* after changing the cost of the transitions in Θ. Therefore, Definition 1 can also be interpreted as a dominance relation for an instance with a different cost function.
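The failure mode of Figure 2 can be reproduced in a few lines: with a relation that relates the two paths in both directions (and thus approximates no single descending heuristic), pruning eliminates every solution. The state names and encoding below are a hypothetical toy instance, not the paper's:

```python
def prune_search(succ, goal, dominated_by):
    """FIFO search that discards a generated state if some already-seen
    state dominates it; returns True iff a goal is reached.
    `dominated_by[u]` is the set of states that u dominates."""
    open_, seen = ['sI'], {'sI'}
    while open_:
        s = open_.pop(0)
        if s == goal:
            return True
        for t in succ.get(s, []):
            if t in seen:
                continue
            if any(t in dominated_by.get(u, ()) for u in seen):
                continue  # t is pruned: a seen state dominates it
            seen.add(t)
            open_.append(t)
    return False

# Two parallel paths: sI -> s -> s2 -> G and sI -> t -> t2 -> G.
succ = {'sI': ['s', 't'], 's': ['s2'], 't': ['t2'],
        's2': ['G'], 't2': ['G']}
# An unsafe relation: t dominates s2 and s dominates t2 (s2 <= t, t2 <= s).
bad = {'t': {'s2'}, 's': {'t2'}}
```

Running `prune_search(succ, 'G', bad)` prunes s2 (t was seen) and t2 (s was seen), so the goal is never reached, while the empty relation leaves the task solvable; this is why Definition 1 demands that ⪯ approximates one common descending, dead-end aware heuristic.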
In optimal planning, dominance relations have been used in two different ways: for dominance pruning (eliminating states that are dominated by others) and action selection pruning (automatically applying an action if this action is guaranteed to start an optimal plan). Next, we adapt these types of pruning to satisficing planning. Dominance pruning can be applied in a similar way as in optimal planning, but slightly stronger, since the cost of reaching each state does not matter.

Theorem 1 Let ⪯ be a satisficing dominance relation. Then, a best-first search with open list open and closed list closed, in which a state s ∈ open may be pruned if there exists another t ∈ open ∪ closed such that s ⪯ t, is complete.

Proof Sketch: Let h be the dead-end aware and descending heuristic approximated by ⪯. Since s was pruned, there must exist t ∈ open ∪ closed such that h(t) ≤ h(s). Let u be the state with the lowest h value in the open list. Then h(u) ≤ h(t), since if t is closed, one of its successors with lower h value (h is descending) was inserted into open. Since h(u) ≤ h(s) < ∞, u is solvable. As h is descending, there exists a plan for u that does not contain any state dominated by s.

Action selection can also be adapted to the satisficing case. In this case, we do not care about the solution cost, so quantitative dominance is not required anymore. Instead, we consider strict dominance to avoid loops in which two states that dominate each other are constantly replaced by one another. We can also generalize action selection to consider not only immediate successors, but any successor that is reached by a sequence of actions. This is far more useful in satisficing than in optimal planning because the cost of the action sequence can be ignored.

Theorem 2 Let ≺ be a strict satisficing dominance relation. A best-first search where a state s ∈ open can be replaced by some t, such that t is the result of executing any sequence of actions in s and s ≺ t, is complete.
Proof Sketch: As h(t) ≤ h(s) and h is descending, t must have a solution that does not traverse s, since all states ti in that solution have h(ti) < h(s), so ti ≠ s and ti ⪯̸ s. By transitivity, neither t nor any state in its solution can be substituted by s or by any state s′ such that s′ ⪯ s.

It should be noted that both types of pruning can be applied at the same time, but only if they use the same relation.

Theorem 3 Let ⪯ be a satisficing dominance relation. Then:
1. Let ≺ be the strict relation such that s ≺ t iff s ⪯ t and not t ⪯ s. Performing dominance pruning with ⪯ and action selection with ≺ is always safe.
2. Let ≺′ be a different strict satisficing dominance relation. Then, there exist cases where performing dominance pruning with ⪯ and action selection with ≺′ is not safe.

Proof Sketch: To show (2), consider again the example of Figure 2. Let ⪯ be a relation such that s ⪯ s′ ⪯ s_I ⪯ t ⪯ t′ ⪯ s_G, and let ≺′ be a strict relation such that t ≺′ t′ ≺′ s_I ≺′ s ≺′ s′ ≺′ s_G. Then, by Theorem 2, s_I can be replaced by t, and later t could be pruned according to Theorem 1 because it is dominated by a previously expanded state (s_I). To show (1), if both relations ⪯ and ≺ approximate the same heuristic h, then the minimum h value of any state in the open list monotonically decreases. A loop like the one in the example above cannot happen because the value of h can only decrease along the path to the goal. Since h(t′) < h(t), t′ cannot be pruned by any state that has a larger h value (e.g., any state that is replaced by t′).

## 4 Dominance-Based Enforced Hill-Climbing

Enforced Hill-Climbing (EHC) is a well-known search algorithm for planning [Hoffmann and Nebel, 2001]. EHC performs a breadth-first search from the initial state s_I until finding a state s such that h(s) < h(s_I). At that point, if s is a goal state, a plan has been found. Otherwise, the initial state is replaced by s and the search is restarted.
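This restart scheme can be sketched generically: an inner search runs until it finds a goal or a state judged strictly better than the current root, by a heuristic in plain EHC or by a strict dominance test. A minimal sketch with hypothetical helper names, using BFS as the inner search:

```python
from collections import deque

def bfs_until(root, successors, is_goal, better):
    """Breadth-first search from `root` until a goal or a state strictly
    better than `root` is found; None if the space is exhausted."""
    seen, queue = {root}, deque([root])
    while queue:
        s = queue.popleft()
        if is_goal(s):
            return ('goal', s)
        if s != root and better(s):
            return ('better', s)
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None

def restart_search(initial, successors, is_goal, strictly_dominates):
    """EHC-style loop: restart from any state that strictly dominates the
    current root. `strictly_dominates(s, t)` is the strict test (s < t)."""
    root = initial
    while True:
        found = bfs_until(root, successors, is_goal,
                          lambda t: strictly_dominates(root, t))
        if found is None:
            return None           # inner search exhausted
        kind, state = found
        if kind == 'goal':
            return state
        root = state              # restart from the better state
```

Because the comparison is strict, the roots form a strictly increasing chain, so on a finite state space the loop terminates.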
According to Theorem 2, any time we find a state strictly better than s according to a satisficing dominance relation ≺, we can remove s and all its other successors from consideration. This may be expensive to do for all states in the search, but it can easily be done for the special case where s is the initial state s_I. In that case, the search is restarted from the newly found state that dominates s_I. This is a form of EHC where the search is restarted whenever a state better than s_I is found, substituting the heuristic function by a dominance relation to determine which states are better than s_I. Also, while the original EHC algorithm used breadth-first search to escape the current plateau, there is no reason not to consider other best-first search algorithms with different priority functions as well. We define the dominance-based EHC algorithm DEHC≺(X) relative to any best-first search algorithm X and strict preorder ≺. DEHC≺(X) runs algorithm X until finding a goal state or any state s such that s_I ≺ s. In the latter case, it restarts from s.

Theorem 4 Let X be a sound and complete best-first search algorithm, and let ≺ be a strict preorder such that for any pair of reachable states s, t, if s ≺ t and s is solvable, then t is solvable. Then DEHC≺(X) is sound and complete.

Proof: Soundness follows from the soundness of X. Completeness: if the instance is solvable, each run of X can finish either by finding a goal, or by finding a state t such that s_I ≺ t, in which case another instance of X is started from t. Then, as t must be solvable, X can never be restarted on an unsolvable state. The algorithm always terminates because ≺ is a strict preorder, so the number of times X may be called is bounded by the number of reachable states, which is always finite.

The conditions required for ≺ are weaker than what is required for a satisficing dominance relation. If ≺ is based on a satisficing dominance relation ⪯, then by Theorem 3, dominance pruning can be used in X.
In this case, any state dominated by the initial state in each call of X can be pruned, thereby ensuring that the search does not re-expand any previous initial state. However, DEHC can also be used with relations defined from heuristic functions in the following way.

Definition 2 (Heuristic-based Relation for DEHC) Let D be a quantitative dominance function and h be any heuristic. We define ≺h as a relation such that s ≺h t if and only if D(s, t) > −∞ and h(t) < h(s).

As D(s, t) > −∞ implies that h*(s) − h*(t) > −∞, if s is solvable then t must be solvable as well. This results in a complete variant of EHC with any heuristic function, using dominance only to avoid dead ends. The role of ≺h in this context is to select when to be more or less greedy following the heuristic advice, interpolating between GBFS (when no dominance is found) and EHC.

## 5 Practical Methods for Computing Satisficing Dominance Relations

In this section, we introduce a new method to compute dominance relations for satisficing planning.

### 5.1 Serialized Dominance Relations

Consider the example of Figure 1. Dominance relations for optimal planning will consider a state better if more cells have been visited, the robot has at least as much fuel, and the position of the robot is the same. The latter condition is an important limitation because, in order to find a state that dominates the initial state, the robot must go back to the initial position every time it visits more cells. This is undesirable since it will be harder to find a state that dominates s_I, and it will result in longer plans. Intuitively, we prefer states where more cells have been visited, regardless of the position of the robot. This is possible because these sub-goals are serializable, i.e., no sub-goal must be undone in order to achieve the rest.
To obtain such a relation, we serialize the LTSs so that a state dominates another if it is as good for the first j − 1 LTSs (has not un-visited any position), it is strictly better in Θj (has visited a new position), and there exists a solution for all other LTSs without using any label that is dangerous for the previous LTSs. We define a label as dangerous for an LTS Θi according to ⪯i if applying it on some state si may require going to a potentially worse state s′i s.t. si ⪯̸i s′i.

Definition 3 (Dangerous label) Let ⪯i be a relation for Θi. We say that a label l is dangerous for ⪯i if there exists a state si ∈ Θi such that there exists si →l s′i and there does not exist si →l ti s.t. si ⪯i ti.

For example, labels associated with movements that consume fuel are dangerous for the LTS that represents the amount of available fuel. However, movements of the robot are not dangerous for the LTSs that correspond to whether a cell has been visited or not.

Now we can serialize the LTSs that define our task. The serialized dominance gives preference to those states that are better according to the first LTS, as long as a solution can be found for the other LTSs without using any label that is dangerous for the first LTS (i.e., the sub-goals achieved do not need to be undone). To model this, we re-define label dominance (i.e., the component D^L_j(l, l′) used in the equation that defines a QDF) so that dangerous labels do not dominate any label.

Definition 4 (Serialized Dominance) Let ⪯1, . . . , ⪯n be a label-dominance simulation for a list of LTSs Θ1, . . . , Θn, and D1, . . . , Dn a list of functions that satisfy the equations of a QDF where D^L_j(l, l′) = −∞ for all l ∈ L and all l′ ∈ L that are dangerous for ⪯i for any i < j. We define the serialized dominance relation as s ⪯S t iff sj ⪯j tj for all j ∈ [1, n], or there exists i such that sj ⪯j tj for all j ∈ [1, i), si ≺i ti, and Dj(sj, tj) > −∞ for all j ∈ (i, n].
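Definition 3 can be checked directly per LTS. A minimal sketch on the fuel LTS of Figure 1 (the integer fuel encoding and helper names are this example's own assumptions):

```python
def is_dangerous(label, states, trans, leq):
    """Definition 3 as a check on one LTS: `label` is dangerous for the
    relation `leq` if some state has a `label`-transition but no
    `label`-transition leading to an at-least-as-good state."""
    for s in states:
        targets = [t for (a, l, t) in trans if a == s and l == label]
        if targets and not any(leq(s, t) for t in targets):
            return True
    return False

# Fuel LTS: states = remaining fuel units, more fuel is at least as good.
fuel_states = [2, 1, 0]
fuel_trans = {(2, 'mx', 1), (1, 'mx', 0),               # striped moves burn fuel
              (2, 'mi', 2), (1, 'mi', 1), (0, 'mi', 0)}  # self-loops
more_fuel = lambda s, t: t >= s
```

Here the striped-cell move `mx` is dangerous (it always decreases fuel, and fuel can never be regained), while `mi` is not, matching the discussion above.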
Theorem 5 A serialized dominance ⪯S is a satisficing dominance relation.

Proof Sketch: We show that ⪯S approximates a descending and dead-end aware heuristic function h (s ⪯S t ⟹ h(t) ≤ h(s)). As h is dead-end aware, if s is a dead end then h(s) = ∞ and the condition holds. If s is not a dead end, then t cannot be a dead end because s ⪯S t implies Σ Di(si, ti) > −∞. Therefore, it suffices to consider the case where s and t are both solvable. We define h as the perfect goal distance under a cost function constructed from ⪯S such that (i) transitions affecting Θi cost more than those that only affect Θj for i < j, and (ii) if si ≺i ti, transitions from si cost more than transitions from ti. In both cases, the cost difference must be large enough so that the most expensive transition dominates the cost of the entire path. To prove that s ⪯S t ⟹ h(t) ≤ h(s), we assume WLOG that s1 ≺1 t1.² Then, for any path from s1, πs = s1 →l1 s1¹ · · · →lk s1ᵏ, there exists a path from t1, πt = t1 →l′1 t1¹ · · · →l′k t1ᵏ, such that s1ⁱ ⪯1 t1ⁱ for all i ∈ [1, k]. Then, the cost of πt is lower than that of πs because the first transition is more expensive from s1 (s1 ≺1 t1) and the rest are not. Since Di(si, ti) > −∞ for all i ∈ [2, n], by the properties of a QDF, the path πt can always be extended into a plan for t by inserting additional actions. As D^L_i(l, l′) = −∞ for dangerous labels, these additional actions are not dangerous for ⪯1. Since in our cost function the cost of the most expensive transition dominates the overall cost, the complete path for t is still cheaper than the one for s under this cost function, so h(t) < h(s).

The resulting dominance relation is heavily influenced by the ordering chosen for the LTSs. To preserve completeness, this order must be the same throughout the entire search. However, one does not need to decide the order a priori; rather, it can be chosen dynamically during the search.
Initially, we keep the set {Θ1, . . . , Θn} unsorted, and the list of serialized LTSs is initialized empty. When comparing a state s against s_I, we check whether s_I,i ≺i si for some i, and insert Θi into the list of serialized LTSs if and only if, thanks to this, we get that s_I ≺S s. Using this policy in our running example, the order in which the cells are serialized in the dominance relation is exactly the order in which they are visited during the search.

### 5.2 Recursive and Positive τ-Labels

The method above is most interesting in situations where dominance relations in optimal planning cannot prove t to be closer to the goal than s, but where it can be shown that t is not a dead end, i.e., −∞ < D(s, t) < 0. For this, τ-labels are of great importance. Having more τ-labels can only decrease the τ-distance between states (hτ), which may in turn increase the value of D. Previous work considered l a τ-label for Θi if it has self-loop transitions for every state in all other Θj. In other words, transitions labeled with l may be used to modify the value of Θi in any state without affecting the value of other LTSs. Hence, all τ-labels had to fulfill two properties:

1. They do not have preconditions on other LTSs, so they are always applicable, and
2. They do not have side effects on other LTSs.

Here, we extend the notion of τ-labels in two different ways, relaxing each of these assumptions in order to find coarser dominance relations.

Recursive τ-labels. Some labels are not τ-labels because they have preconditions on other LTSs. For example, in a typical logistics transportation task, loading a package at some location is not a τ-label for the position of the package because it is not applicable in all states (the truck must be at the same location).

² Let j be the smallest index for which sj ≺j tj. If j > 1, we can consider instead the synchronized product Θ1 ⊗ · · · ⊗ Θj. By the properties of label-dominance simulation, (s1, . . . , sj) ≺1,...,j (t1, . . . , tj).
However, the truck can always be driven from any given location to the position of the package, the package loaded, and the truck driven back to the original position, reaching a state where the package is in the truck without affecting any other variable. Hence, we could introduce new transitions that correspond to such macro-actions. We use this in order to redefine the set of τ-labels for each LTS. For every si such that there exists Θj with a path πτl(si, sj) = (si, sj) →τ (si, s′j) →l (s′i, s′j) →τ (s′i, sj) for all sj ∈ Θj, we may introduce a new self-loop transition sj →l sj. The cost of this new transition is equal to the cost of the τ actions in πτl(si, sj). Thanks to these self-loops, l may become a τ-label. In that case, we do not introduce these transitions into the definition of the planning task. Instead, we simply consider label l to be a τ-label with a cost equal to the maximum cost of πτl(si, sj) over all (si, sj). After introducing new τ-labels, the process can be repeated.

Positive τ-labels. Some labels are not τ-labels because they have side effects. In our running example, movements are not τ-labels for the LTS representing the position of the robot because they have the side effect of marking a cell as visited. However, these side effects are always positive according to our dominance relation, so they can be ignored for the computation of τ-labels. A label l is a positive τ-label for Θi iff for all Θj ≠ Θi and all sj ∈ Θj, there exists sj →l tj ∈ Θj s.t. Dj(sj, tj) ≥ 0. When using this definition one must be careful, due to the circular dependency between the values of D and the set of τ-labels. D is typically computed by assuming a very coarse dominance relation and then iteratively refining it until a fixpoint is reached. However, the values of D during this computation have not been proven correct until it ends, so positive τ-labels cannot be defined in terms of the D that is being computed.
Hence, to avoid such circular dependencies, we first compute D based on the previous notion of τ-labels, and then compute a new set of τ-labels and re-compute D. This process can be repeated until no more labels are added to the set of τ-labels.

| Domain | GBFS | L | M | DEHC(hB) ⪯D | DEHC(hB) ⪯S | DEHC(hFF) ⪯D | DEHC(hFF) ⪯S | GBFS(hFF) pruning ⪯D | GBFS(hFF) pruning ⪯S |
|---|---|---|---|---|---|---|---|---|---|
| Logistics (63) | 54 | 63 | 63 | 43 | 43 | 57 | 58 | 53 | 53 |
| Miconic (150) | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 150 |
| Openstacks (30) | 30 | 30 | 30 | 7 | 7 | 30 | 30 | 30 | 30 |
| Rovers (40) | 23 | 40 | 40 | 14 | 12 | 22 | 24 | 23 | 23 |
| Satellite (36) | 30 | 36 | 36 | 10 | 10 | 21 | 25 | 22 | 26 |
| Scanalyzer (50) | 44 | 50 | 50 | 46 | 46 | 44 | 44 | 44 | 44 |
| Visitall (40) | 5 | 40 | 40 | 3 | 31 | 6 | 31 | 4 | 4 |
| Woodwork (50) | 49 | 50 | 50 | 6 | 6 | 49 | 49 | 49 | 49 |
| Zenotravel (20) | 20 | 20 | 20 | 13 | 13 | 20 | 20 | 20 | 20 |
| Σ (top) | 405 | 479 | 479 | 292 | 318 | 399 | 431 | 395 | 399 |
| Floortile (40) | 8 | 8 | 8 | 8 | 8 | 14 | 14 | 14 | 14 |
| Maintenance (20) | 5 | 2 | 7 | 0 | 0 | 6 | 6 | 6 | 6 |
| Nomystery (20) | 9 | 13 | 15 | 20 | 11 | 20 | 17 | 20 | 16 |
| Parking (40) | 29 | 40 | 40 | 0 | 0 | 26 | 28 | 27 | 28 |
| Pathways (30) | 11 | 24 | 30 | 4 | 4 | 12 | 12 | 12 | 12 |
| Pipes-NT (50) | 30 | 43 | 44 | 15 | 14 | 29 | 29 | 29 | 29 |
| Tidybot (20) | 14 | 16 | 14 | 1 | 1 | 7 | 13 | 7 | 12 |
| TPP (30) | 22 | 30 | 30 | 6 | 6 | 23 | 22 | 21 | 23 |
| Σ (bottom) | 128 | 176 | 188 | 54 | 44 | 137 | 141 | 136 | 140 |
| Total (1636) | 1231 | 1462 | 1491 | 624 | 640 | 1217 | 1252 | 1212 | 1219 |

Table 1: Coverage on IPC instances. We highlight the best configuration apart from LAMA (L) and Mercury (M). At the top are domains containing an instance where DEHC restarts from a state that dominates the initial state at least 10 times in a single instance. At the bottom, domains where dominance pruning has a non-negligible effect on coverage.

## 6 Experiments

We run experiments on all satisficing-track STRIPS planning instances from the international planning competitions (IPC'98 to IPC'14). All experiments were conducted on a cluster of Intel Xeon E5-2650v3 machines with time (memory) cut-offs of 30 minutes (4 GB). Our goal is to evaluate the potential of current dominance techniques to enhance search in satisficing planning.
As a simple baseline, we use lazy GBFS in Fast Downward [Helmert, 2006] with the hFF heuristic [Hoffmann and Nebel, 2001], and compare the results against dominance-based EHC guided by blind search (hB) and by the hFF heuristic. We also include the performance of LAMA [Richter and Westphal, 2010] and Mercury [Katz and Hoffmann, 2014; Domshlak et al., 2015] as representatives of more modern planners. We run several configurations comparing our new serialized dominance (⪯S) against the quantitative dominance relations (⪯D) used in previous work on optimal planning [Torralba, 2017]. However, satisficing planning benchmarks are much larger than those for optimal planning, so we change the dominance pruning setting in two important respects. On the one hand, previous work considered using merge-and-shrink [Helmert et al., 2014] to reduce the number of LTSs. But, as the overhead is too large for these benchmarks, we consider instead only the atomic transition systems. This reduces the number of domains in which state-of-the-art methods are effective at finding dominance. On the other hand, previous work considered pruning states that are dominated by any previously expanded state. This check is too expensive in satisficing planning, so instead we only compare each state against its parent and the initial state.

Table 1 shows coverage results on domains where dominance has a non-negligible effect, either by restarting the search from states that dominate the initial state (top part of the table) or by pruning states (bottom part). The results show that our dominance techniques are able to find useful dominance relations in a number of IPC domains, even when only considering atomic transition systems. Compared to the baseline, the results are quite good in most domains. Results could be even better, but our implementation is not able to finish the preprocessing in the largest tasks of some domains (e.g., Logistics, Rovers, Satellite, and Visitall).
This explains why not all instances of Visitall are solved by DEHC(hB), and the difference with respect to the baseline in domains where no dominance pruning occurs. The general trend is that serialized dominance relations (S) are most useful for comparing states against the initial state in the context of DEHC (domains in the upper part of the table), while the dominance relation based on distance to the goal (D) is more effective for dominance pruning and action selection. The reason is that the serialization is global for the entire search, slightly reducing the ability of dominance pruning and action selection. Focusing on domains where dominance is useful for DEHC, one can observe that there is not much synergy with heuristics, and they can even be harmful, as in Scanalyzer. The reason is that the heuristic is not aware of the dominance relation, so it may guide the search in a direction where no states dominating the initial state can be found. Even though the results of DEHC are quite good on these domains compared to the baseline, they are still far behind LAMA and Mercury, which easily solve all instances in those domains. LAMA uses landmarks to achieve sub-goals very greedily, so it is extremely effective in domains where all sub-goals are serializable. Therefore, one may wonder whether there are cases where DEHC can beat LAMA. One example is Nomystery, where action selection pruning is especially effective. But our running example illustrates the strengths of dominance even better. Figure 3 shows the comparison of DEHC_S against LAMA in our running example. The particular feature of our running example is that it combines sub-goals that are easily serializable (visiting normal cells) with others that are not (visiting striped cells). Heuristic approaches that greedily try to maximize the number of achieved sub-goals fall into a dead-end trap and are unable to solve the task.
However, S is able to identify which sub-goals are safe and which ones could potentially be dangerous. As the heuristics are not aware of the dominance relation, this only works in combination with blind search. Otherwise, the search is guided by the heuristic towards a part of the state space where no dominance can be found (correctly so, because it is a dead-end trap). DEHC_D(hB) also beats LAMA in this domain, but it is still worse than DEHC_S(hB) because, with dominance based purely on goal distance, the robot needs to go back to the initial state every time it visits a new cell, which is hard to do without any heuristic guidance.

[Figure 3: Total time (s) of DEHC and LAMA in our running example, plotted over the height/width of the square part of the grid; configurations shown: LAMA, DEHC_S(hFF), DEHC_D(hB), DEHC_S(hB).]

7 Related Work

The notion of dominance is related to approaches that characterize when a task can be solved in polynomial time. Our serialized dominance relation is reminiscent of serializable sub-goals [Korf, 1987; Barrett and Weld, 1993]. A set of sub-goals is serializable if they can always be achieved sequentially without undoing any of them. Our dominance relation also imposes a serialization on the LTSs that form the planning task. This differs slightly from sub-goals in that we may obtain dominance if progress has been made in an LTS (e.g. a package being in the truck is better than being at its initial position), while sub-goals only consider its goal value (the package being at its destination). If a problem has several serializable sub-goals, we can always construct a dominance relation that represents this information. To the best of our knowledge, there are no automatic algorithms to prove that a set of sub-goals is serializable. Serialized dominance could be tailored for this purpose.
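As a rough illustration of the sub-goal-based comparison just described, consider the following minimal sketch. It is not the paper's algorithm: the flat dict-based state representation, the `subgoals` list, and the helper names are hypothetical, and the actual serialized dominance relation reasons over LTSs rather than flat sub-goal sets. Here, a state s dominates t if s has achieved a superset of t's serializable sub-goals while agreeing with t everywhere else:

```python
# Hypothetical sketch of a sub-goal-based dominance check. States are
# flat dicts from variable names to values; 'subgoals' lists
# (variable, goal_value) pairs assumed to be serializable.

def achieved(state, subgoals):
    """Serializable sub-goals already satisfied in this state."""
    return {(v, g) for (v, g) in subgoals if state.get(v) == g}

def dominates(s, t, subgoals):
    """s dominates t if s achieved a superset of t's sub-goals and
    agrees with t on every variable outside the sub-goal set."""
    subgoal_vars = {v for (v, _) in subgoals}
    if not achieved(t, subgoals) <= achieved(s, subgoals):
        return False
    return all(s.get(v) == t.get(v)
               for v in set(s) | set(t) if v not in subgoal_vars)

subgoals = [("visited_c1", True), ("visited_c2", True)]
s = {"visited_c1": True, "visited_c2": True, "robot_at": "c2"}
t = {"visited_c1": True, "visited_c2": False, "robot_at": "c2"}
print(dominates(s, t, subgoals))  # True: s achieved strictly more sub-goals
print(dominates(t, s, subgoals))  # False
```

Because the comparison only looks at sub-goal sets and the remaining variables, it is independent of any heuristic estimate, which is what allows it to be combined with blind search.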
There is a long list of works that identify tractable fragments of the optimal and satisficing planning problem [Bäckström and Klein, 1991; Jonsson and Bäckström, 1998; Brafman and Domshlak, 2003; Giménez and Jonsson, 2008; Katz and Domshlak, 2008; Chen and Giménez, 2010]. Our dominance techniques can capture some of the structure exploited by these tractable fragments, like acyclic causal graphs (using recursive τ-labels). But, at the same time, we are not limited to such features of the planning tasks (e.g. the causal graph of our running example is not acyclic). Moreover, dominance relations can be useful in tasks where planning is intractable but some part of the problem is easy to solve. In those cases, the use of dominance techniques can still dramatically reduce the search space. There are several parametrized search algorithms that run in polynomial time in a width parameter w. These algorithms are based on a substantial amount of pruning, either by allowing only the values of up to w variables to change [Chen and Giménez, 2007] or by pruning states with a novelty greater than w [Lipovetzky and Geffner, 2012]. In both cases, these algorithms do not solve the entire planning task, but are used to find a state better than the initial state in terms of the achieved sub-goals. Dominance relations offer an alternative way to compare states, which is more general than the criterion used by Chen and Giménez and, unlike simple sub-goal-based criteria, offers completeness guarantees, suggesting potential in combining these approaches. Seipp et al. [2016] introduced another notion of width based on how hard it is to represent a dead-end-aware and descending potential heuristic [Pommerening et al., 2015]. Many typical domains have a width of 2, meaning that it is easy to represent a heuristic that solves them in polynomial time.
No method to automatically find such heuristics is known yet, but satisficing dominance relations approximate this kind of heuristic, so any such algorithm could potentially be used to obtain dominance relations as well. A question that naturally comes up is what the advantage of using dominance relations over heuristic functions is. Dominance relations are more expressive than heuristics because they are partial preorders, while heuristics are total preorders. For example, we may have a relation where s ⪯ t and s' ⪯ t', but the relation between s and t', and between s' and t, remains unknown (e.g. s ⋠ t' and s' ⋠ t). No assignment of heuristic values can represent such a relation. In practice, this matters most in cases where dominance is able to discover some local information that can be exploited independently of the rest of the problem. Consider the Nomystery domain, where a truck transports a set of packages using a limited amount of fuel. The use of dominance allows us to identify that having a package at its destination is always good, so we can unload it directly without considering any other alternative. The problem with heuristics is that they aggregate all estimations into a single value, making it very difficult to identify in which parts of the state space the heuristic is wrong. In Nomystery, most heuristics will correctly estimate that all packages need to be loaded and unloaded exactly once, but underestimate the number of truck movements. Hence, the search will equally explore the possibility of loading/unloading packages in different locations due to the heuristic inaccuracies. Our Dominance-Based Enforced Hill-Climbing algorithm uses quantitative dominance functions to guarantee completeness by ensuring that the search is never restarted from a dead end. Recently, other approaches based on heuristic refinement have also devised a variant of EHC that preserves completeness [Fickert and Hoffmann, 2017].
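The expressiveness argument can be checked mechanically. The following is a small self-contained sketch (the four states, written s, t, s2, t2 for s, t, s', t', and the value range are purely illustrative): a heuristic h induces the total preorder x ⪯ y iff h(y) ≤ h(x), so exhaustively enumerating value assignments confirms that none reproduces a partial preorder in which some pair stays incomparable:

```python
from itertools import product

# Hypothetical four-state example illustrating that a partial preorder
# with incomparable pairs cannot be induced by any heuristic.
# Target relation: s ⪯ t and s2 ⪯ t2 (plus reflexivity), nothing else.
states = ["s", "t", "s2", "t2"]
target = {("s", "t"), ("s2", "t2")} | {(x, x) for x in states}

def induced(h):
    """Total preorder induced by heuristic h: x ⪯ y iff h(y) <= h(x)."""
    return {(x, y) for x, y in product(states, repeat=2) if h[y] <= h[x]}

# Exhaustively try heuristic value assignments over a small range: none
# matches, since any h makes every pair comparable (e.g. s vs. t2).
found = any(induced(dict(zip(states, vals))) == target
            for vals in product(range(4), repeat=4))
print(found)  # False
```

The enumeration over a small range is only a demonstration; the general argument is that any h relates s and t2 one way or the other, while the target relation leaves them incomparable.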
8 Discussion

In this paper, we have introduced the notion of dominance for satisficing planning. Dominance can be used for dominance pruning as well as to identify states that are strictly better than others. This allows the search algorithm to be extremely greedy, restarting the search from any state that dominates the initial state, while still preserving completeness. We have adapted the algorithms to automatically find dominance relations for these purposes. Dominance can be a very powerful instrument to compare states, especially in instances with a mixture of complex and simple sub-goals. Our experiments show the ability of dominance to guide EHC search. However, there is still a gap when compared against state-of-the-art planners. Our results also point out some limitations of current dominance techniques that may be worth exploring in future work. Considering more than one variable at a time could help to find stronger dominance relations in many domains. Also, heuristics could use the information captured by the dominance relation, increasing the synergy between them.

Acknowledgments

This paper has been supported by the German Research Foundation (DFG), under grant HO 2169/6-1, "Star-Topology Decoupled State Space Search".

References

[Bäckström and Klein, 1991] Christer Bäckström and Inger Klein. Planning in polynomial time: The SAS-PUBS class. Computational Intelligence, 7(4), 1991.

[Bäckström and Nebel, 1995] Christer Bäckström and Bernhard Nebel. Complexity results for SAS+ planning. Computational Intelligence, 11(4):625–655, 1995.

[Barrett and Weld, 1993] Anthony Barrett and Daniel S. Weld. Characterizing subgoal interactions for planning. pages 1388–1393, 1993.

[Brafman and Domshlak, 2003] Ronen Brafman and Carmel Domshlak. Structure and complexity in planning with unary operators. Journal of Artificial Intelligence Research, 18:315–349, 2003.
[Chen and Giménez, 2007] Hubie Chen and Omer Giménez. Act local, think global: Width notions for tractable planning. In Mark Boddy, Maria Fox, and Sylvie Thiébaux, editors, Proc. of the 17th International Conference on Automated Planning and Scheduling (ICAPS'07), pages 73–80, 2007.

[Chen and Giménez, 2010] Hubie Chen and Omer Giménez. Causal graphs and structurally restricted planning. Journal of Computer and System Sciences, 76(7):579–592, 2010.

[Chen et al., 2004] Y. Chen, C. Hsu, and B. Wah. SGPlan: Subgoal partitioning and resolution in planning. In Stefan Edelkamp, Jörg Hoffmann, Michael Littman, and Håkan Younes, editors, Proc. of the 4th International Planning Competition, 2004.

[Domshlak et al., 2015] Carmel Domshlak, Jörg Hoffmann, and Michael Katz. Red-black planning: A new systematic approach to partial delete relaxation. Artificial Intelligence, 221:73–114, 2015.

[Fickert and Hoffmann, 2017] Maximilian Fickert and Jörg Hoffmann. Complete local search: Boosting hill-climbing through online heuristic-function refinement. In Proc. of the 27th International Conference on Automated Planning and Scheduling (ICAPS'17), 2017.

[Giménez and Jonsson, 2008] Omer Giménez and Anders Jonsson. The complexity of planning problems with simple causal graphs. Journal of Artificial Intelligence Research, 31:319–351, 2008.

[Hall et al., 2013] David Hall, Alon Cohen, David Burkett, and Dan Klein. Faster optimal planning with partial-order pruning. In Daniel Borrajo, Simone Fratini, Subbarao Kambhampati, and Angelo Oddi, editors, Proc. of the 23rd International Conference on Automated Planning and Scheduling (ICAPS'13), 2013.

[Helmert et al., 2014] Malte Helmert, Patrik Haslum, Jörg Hoffmann, and Raz Nissim. Merge & shrink abstraction: A method for generating lower bounds in factored state spaces. Journal of the Association for Computing Machinery, 61(3), 2014.

[Helmert, 2006] Malte Helmert. The Fast Downward planning system.
Journal of Artificial Intelligence Research, 26:191–246, 2006.

[Hoffmann and Nebel, 2001] Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14:253–302, 2001.

[Hoffmann et al., 2004] Jörg Hoffmann, Julie Porteous, and Laura Sebastia. Ordered landmarks in planning. Journal of Artificial Intelligence Research, 22:215–278, 2004.

[Hoffmann, 2005] Jörg Hoffmann. Where ignoring delete lists works: Local search topology in planning benchmarks. Journal of Artificial Intelligence Research, 24:685–758, 2005.

[Jonsson and Bäckström, 1998] Peter Jonsson and Christer Bäckström. State-variable planning under structural restrictions: Algorithms and complexity. Artificial Intelligence, 100(1–2):125–176, 1998.

[Katz and Domshlak, 2008] Michael Katz and Carmel Domshlak. New islands of tractability of cost-optimal planning. Journal of Artificial Intelligence Research, 32:203–288, 2008.

[Katz and Hoffmann, 2014] Michael Katz and Jörg Hoffmann. Mercury planner: Pushing the limits of partial delete relaxation. In IPC 2014 planner abstracts, pages 43–47, 2014.

[Korf, 1987] Richard E. Korf. Planning as search: A quantitative approach. Artificial Intelligence, 33(1):65–88, 1987.

[Lipovetzky and Geffner, 2012] Nir Lipovetzky and Hector Geffner. Width and serialization of classical planning problems. In Luc De Raedt, editor, Proc. of the 20th European Conference on Artificial Intelligence (ECAI'12), pages 540–545, 2012.

[Pommerening et al., 2015] Florian Pommerening, Malte Helmert, Gabriele Röger, and Jendrik Seipp. From non-negative to general operator cost partitioning. In Blai Bonet and Sven Koenig, editors, Proc. of the 29th AAAI Conference on Artificial Intelligence (AAAI'15), pages 3335–3341, 2015.

[Porteous et al., 2001] Julie Porteous, Laura Sebastia, and Jörg Hoffmann. On the extraction, ordering, and usage of landmarks in planning. In A. Cesta and D. Borrajo, editors, Proc.
of the 6th European Conference on Planning (ECP'01), pages 37–48, 2001.

[Richter and Westphal, 2010] Silvia Richter and Matthias Westphal. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research, 39:127–177, 2010.

[Richter et al., 2011] Silvia Richter, Matthias Westphal, and Malte Helmert. LAMA 2008 and 2011 (planner abstract). In IPC 2011 planner abstracts, pages 50–54, 2011.

[Röger and Helmert, 2010] Gabriele Röger and Malte Helmert. The more, the merrier: Combining heuristic estimators for satisficing planning. In Ronen I. Brafman, Hector Geffner, Jörg Hoffmann, and Henry A. Kautz, editors, Proc. of the 20th International Conference on Automated Planning and Scheduling (ICAPS'10), pages 246–249, 2010.

[Seipp et al., 2016] Jendrik Seipp, Florian Pommerening, Gabriele Röger, and Malte Helmert. Correlation complexity of classical planning domains. In Subbarao Kambhampati, editor, Proc. of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), pages 3242–3250, 2016.

[Torralba and Hoffmann, 2015] Álvaro Torralba and Jörg Hoffmann. Simulation-based admissible dominance pruning. In Qiang Yang, editor, Proc. of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), pages 1689–1695, 2015.

[Torralba, 2017] Álvaro Torralba. From qualitative to quantitative dominance pruning for optimal planning. In Carles Sierra, editor, Proc. of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), pages 4426–4432, 2017.