Operator-Potential Heuristics for Symbolic Search

Daniel Fišer1,2, Álvaro Torralba3, Jörg Hoffmann1
1 Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
2 Czech Technical University in Prague, Faculty of Electrical Engineering, Czech Republic
3 Aalborg University, Denmark
danfis@danfis.cz, alto@cs.aau.dk, hoffmann@cs.uni-saarland.de

Symbolic search, using Binary Decision Diagrams (BDDs) to represent sets of states, is a competitive approach to optimal planning. Yet heuristic search in this context remains challenging. The many advances on admissible planning heuristics are not directly applicable, as they evaluate one state at a time. Indeed, progress using heuristic functions in symbolic search has been limited, and even very informed heuristics have been shown to be detrimental. Here we show how this connection can be made to work for LP-based potential heuristics. Our key observation is that, for this family of heuristic functions, the change of heuristic value induced by each operator can be precomputed. This facilitates their smooth integration into symbolic search. Our experiments show that this can pay off significantly: we establish a new state of the art in optimal symbolic planning.

1 Introduction

A* search with admissible heuristics and symbolic search are currently the two main contenders for the state of the art in cost-optimal planning. In principle, these are two orthogonal enhancements of a vanilla search algorithm: admissible heuristics aim to reduce the number of explored states, and symbolic search uses Binary Decision Diagrams (BDDs) (Bryant 1986) to efficiently represent and manipulate sets of states, greatly speeding up exhaustive search. A natural idea is to combine the two, and indeed that idea was presented decades ago in the BDDA* algorithm (Edelkamp and Reffel 1998; Edelkamp 2002). Yet that combination has not been an unqualified success.
For a heuristic to be effective in symbolic search, two properties are required: (1) it must be possible to efficiently evaluate sets of states represented as BDDs, without evaluating the heuristic on each represented state individually; and (2) it must induce a good partitioning, so that sets of states with the same g- and h-value can be efficiently represented as BDDs. Property (1) is fulfilled by some of the strongest heuristics for explicit-state search (e.g., symbolic PDBs (Kissmann and Edelkamp 2011; Franco et al. 2017; Torralba, López, and Borrajo 2018)), so they can be used in BDDA*. However, it has been shown that even very informative heuristics can be detrimental in symbolic search (Speck, Geißer, and Mattmüller 2020) when they do not fulfill property (2). The main reason is that reducing the number of expanded states may be detrimental if the underlying BDD representation is less efficient. Due to all this, symbolic bidirectional blind search (without heuristics) is considered the dominant symbolic search approach, and the use of heuristic search in this context has lost traction.

Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Here we challenge this trend by showing that potential heuristics (Pommerening et al. 2015) yield fresh synergy between heuristic and symbolic search. Such heuristics assign a numeric value (a potential) to each fact of the planning task, in such a way that the sum of the potentials of the facts true in a state is an admissible estimate of the state's goal distance. As we show, potential heuristics are particularly well suited for combination with symbolic search. Our key observation is that potentials can be computed for each operator rather than for each fact.
Such operator potentials combine synergistically with symbolic search as they have property (1): under certain conditions, the operator potential of an operator o is equal to the difference in heuristic values h(s′) − h(s) for any state transition s → s′ induced by the operator o. This allows for an efficient encoding of potential heuristics in symbolic search without having to compute the heuristic value of each explored state (Jensen, Veloso, and Bryant 2008). The main difficulty in doing so is that these operator potentials are real (floating-point) numbers, which can lead to rounding and precision issues. Naively rounding these values may lead to a path-dependent and inconsistent heuristic. We show that this can be dealt with by rounding operator potentials within the mixed-integer linear program (MIP) that derives the potential heuristics.

Our empirical analysis shows that potential heuristics also fulfill property (2). That is, they not only reduce the number of explored states, but also lead to improvements in the number of BDD nodes on average. This makes potential heuristics very helpful in symbolic search across a large number of benchmark domains. Overall, symbolic forward search with potential heuristics soundly outperforms symbolic bidirectional blind search, thus establishing a new state of the art in optimal symbolic planning.

The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

2 Preliminaries

We consider the finite domain representation (FDR) of planning tasks (Bäckström and Nebel 1995). An FDR planning task Π is specified by a tuple Π = ⟨V, O, I, G⟩. V is a finite set of variables; each variable V ∈ V has a finite domain dom(V). A fact ⟨V, v⟩ is a pair of a variable V ∈ V and one of its values v ∈ dom(V). The set of all facts is denoted by F = {⟨V, v⟩ | V ∈ V, v ∈ dom(V)}, and the set of facts of variable V is denoted by F_V = {⟨V, v⟩ | v ∈ dom(V)}. A partial state p is a variable assignment over some variables vars(p) ⊆ V.
We write p[V] for the value assigned to the variable V ∈ vars(p) in the partial state p. We also identify p with the set of facts contained in p, i.e., p = {⟨V, p[V]⟩ | V ∈ vars(p)}. A partial state s is a state if vars(s) = V. I is the initial state. G is a partial state called the goal, and a state s is a goal state iff G ⊆ s. Let p, t be partial states. We say that t extends p if p ⊆ t. O is a finite set of operators; each operator o ∈ O has a precondition pre(o) and effect eff(o), which are partial states over V, and a cost cost(o) ∈ ℝ₀⁺. An operator o is applicable in a state s iff pre(o) ⊆ s. The resulting state of applying an applicable operator o in a state s is another state o⟦s⟧ such that o⟦s⟧[V] = eff(o)[V] for every V ∈ vars(eff(o)), and o⟦s⟧[V] = s[V] for every V ∈ V \ vars(eff(o)). We also assume that pre(o)[V] ≠ eff(o)[V] for every V ∈ vars(pre(o)) ∩ vars(eff(o)).

Given a non-negative integer n ∈ ℕ₀, [n] denotes the set {1, . . . , n}, with [0] defined as the empty set. A sequence of operators π = ⟨o₁, . . . , o_n⟩ is applicable in a state s₀ if there are states s₁, . . . , s_n such that o_i is applicable in s_{i−1} and s_i = o_i⟦s_{i−1}⟧ for every i ∈ [n]. The resulting state of this application is π⟦s₀⟧ = s_n, and cost(π) = Σ_{i=1}^{n} cost(o_i) denotes the cost of this sequence of operators. A sequence of operators π is called an s-plan iff π is applicable in the state s and π⟦s⟧ is a goal state. An s-plan π is called optimal if its cost is minimal among all s-plans. A state s is reachable if there exists an operator sequence π applicable in I such that π⟦I⟧ = s. Otherwise, we say that s is unreachable. The set of all reachable states is denoted by R. An operator o is reachable iff it is applicable in some reachable state. A state s is a dead-end state iff G ⊈ s and there is no s-plan. A set of facts M ⊆ F is a mutex if M ⊈ s for every reachable state s ∈ R.

A heuristic h : R ↦ ℝ ∪ {∞} estimates the cost of optimal s-plans.
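The FDR semantics above are mechanical enough to sketch directly. The following minimal Python model is an illustration only (the names and the dict-based encoding are ours, not the paper's C implementation): states are dicts mapping variables to values, and operators carry precondition, effect, and cost.

```python
# Minimal FDR model: states are dicts mapping variables to values;
# partial states (preconditions, effects, goal) are smaller dicts.

def applicable(op, state):
    """o is applicable in s iff pre(o) is a subset of s."""
    return all(state[v] == val for v, val in op["pre"].items())

def apply_op(op, state):
    """o[[s]]: effect variables are overwritten, all other values are kept."""
    succ = dict(state)
    succ.update(op["eff"])
    return succ

def plan_cost(plan):
    """cost(pi) = sum of the costs of the operators in the sequence."""
    return sum(op["cost"] for op in plan)

def run(plan, state):
    """Apply a sequence of operators, or return None if some step is inapplicable."""
    for op in plan:
        if not applicable(op, state):
            return None
        state = apply_op(op, state)
    return state

# A toy two-variable task.
I = {"V1": 0, "V2": 0}
o1 = {"pre": {"V1": 0}, "eff": {"V1": 1}, "cost": 1}
o2 = {"pre": {"V1": 1}, "eff": {"V2": 1}, "cost": 2}
goal = {"V2": 1}

s = run([o1, o2], I)
is_goal = all(s[v] == val for v, val in goal.items())
```

Here ⟨o1, o2⟩ is an I-plan of cost 3, while ⟨o2⟩ alone is not applicable in I.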
The optimal heuristic h* maps each reachable state s to the cost of an optimal s-plan, or to ∞ if s is a dead-end state. A heuristic h is called (a) admissible iff h(s) ≤ h*(s) for every reachable state s ∈ R; (b) goal-aware iff h(s) ≤ 0 for every reachable goal state s; and (c) consistent iff h(s) ≤ h(o⟦s⟧) + cost(o) for all reachable states s ∈ R and operators o ∈ O applicable in s. It is well known that goal-aware and consistent heuristics are also admissible. In the context of heuristic search, the h-value of a state s refers to the heuristic value of s, the g-value to the cost of the sequence of operators leading to s, and the f-value is the sum of the g-value and the maximum of the h-value and zero (since we allow negative h-values).

3 Symbolic Search Background

Explicit state-space search operates on individual states, whereas symbolic search (McMillan 1993) works on sets of states represented by their characteristic functions. A characteristic function f_S of a set of states S is a Boolean function assigning 1 to states that belong to S and 0 to states that do not. Operations like the union (∪), intersection (∩), and complement of sets of states correspond to the disjunction (∨), conjunction (∧), and negation (¬) of their characteristic functions, respectively. Binary Decision Diagrams (BDDs) (Bryant 1986) are an efficient data structure for representing Boolean functions in the form of a directed acyclic graph. The size of a BDD is the number of nodes in this representation. The main advantage of using BDDs is that a BDD is often much smaller than the number of states it represents. In fact, BDDs can be exponentially smaller, as certain sets containing exponentially many states can be represented by BDDs of polynomial size (Edelkamp and Kissmann 2008). Most operations on BDDs take only polynomial time in the size of the BDD, which enables the efficient manipulation of large sets of states.
The most prominent implementation of symbolic heuristic search in the context of automated planning is BDDA* (Edelkamp and Reffel 1998), which is a variant of A* (Hart, Nilsson, and Raphael 1968) using BDDs to represent sets of states. In BDDA*, operators of planning tasks are represented as transition relations (TRs), also using BDDs. A TR of an operator o is a characteristic function of the pairs of states (s, o⟦s⟧) for all states s such that o is applicable in s. Having a TR T_o for every operator o ∈ O, we can construct a TR of the set of operators with the same cost c as T_c = ∨_{o∈O, cost(o)=c} T_o. As the size of T_c may be exponential in the number of operators with cost c, in practice it is often a good idea to use disjunctive partitioning to keep the size at bay (Jensen, Veloso, and Bryant 2008; Torralba et al. 2017). Moreover, mutexes can be used for a more accurate approximation of reachable states (Torralba et al. 2017).

Like A*, BDDA* expands states in ascending order of their f-value. To take advantage of the symbolic representation, BDDA* represents all states with the same g- and h-value in a single BDD S_{g,h} (disjunctive partitioning of S_{g,h} can also be used). Given a set of states S_{g,h} and a TR T_c, image(S_{g,h}, T_c) computes the set of successor states reachable from any state in S_{g,h} by applying any operator represented by T_c.¹ The g-value of the resulting set of successor states is simply g + c. These successor states have to be split according to their h-value. This can usually be performed efficiently (e.g., with symbolic PDBs (Kissmann and Edelkamp 2011)) by representing the heuristic as a BDD S_h per heuristic value that represents the set of states with that value and performing a conjunction. GHSETA* and FSETA* (Jensen, Veloso, and Bryant 2008) encode the heuristic function as part of the transition relation, creating multiple TRs depending on the impact of the operators on the heuristic value.
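To make the TR and image operations concrete, here is a toy sketch (our own illustration; a real implementation stores the relation as a BDD, not as an explicit set of pairs) where a TR is a set of (state, successor) pairs and image computes all successors of a set of states in one step:

```python
from itertools import product

# States are tuples of variable values; a TR is a set of (s, s') pairs.

def make_tr(ops, states):
    """TR of a set of operators: all pairs (s, o[[s]]) with o applicable in s."""
    tr = set()
    for op, s in product(ops, states):
        pre, eff = op
        if all(s[v] == val for v, val in pre):
            succ = list(s)
            for v, val in eff:
                succ[v] = val
            tr.add((s, tuple(succ)))
    return tr

def image(state_set, tr):
    """Successors of any state in state_set under any operator in the TR."""
    return {succ for s, succ in tr if s in state_set}

# Two binary variables; operators are (pre, eff) with (var-index, value) pairs.
all_states = set(product((0, 1), repeat=2))
flip0 = ((), ((0, 1),))          # no precondition, sets variable 0 to 1
flip1 = (((0, 1),), ((1, 1),))   # requires var 0 = 1, sets variable 1 to 1

tr = make_tr([flip0, flip1], all_states)
succs = image({(0, 0), (1, 0)}, tr)
```

The call computes all successors of the two input states at once, which is the operation BDDA*-style algorithms perform on whole BDDs.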
This is a very efficient way of evaluating the heuristic within symbolic search. However, up to now, all heuristics known to be suitable for this representation were either non-informative, inadmissible, or domain-dependent.

¹ The details of how the function image works are not important here; Torralba et al. (2017) provide a detailed description.

4 Potential Heuristics Background

Potential heuristics (Pommerening et al. 2015) assign a numerical value to each fact, and the heuristic value for a state s is then simply the sum of the potentials of all facts in s.

Definition 1. Let Π denote a planning task with facts F. A potential function is a function P : F ↦ ℝ. A potential heuristic for P maps each state s ∈ R to the sum of the potentials of the facts in s, i.e., h^P(s) = Σ_{f∈s} P(f).

We will leverage prior work on so-called disambiguation (Alcázar et al. 2013) to strengthen potential heuristics (Fišer, Horčík, and Komenda 2020). A disambiguation of a variable V for a given set of facts p is simply a set of facts F ⊆ F_V from variable V such that every reachable state extending p contains one of the facts from F.

Definition 2. Let Π denote a planning task with facts F and variables V, let V ∈ V denote a variable, and let p denote a partial state. A set of facts F ⊆ F_V is called a disambiguation of V for p if for every reachable state s ∈ R such that p ⊆ s it holds that F ∩ s ≠ ∅ (i.e., ⟨V, s[V]⟩ ∈ F).

Clearly, every F_V is a disambiguation of V for all possible partial states, and if ⟨V, v⟩ ∈ p and there exists a reachable state extending p, then {⟨V, v⟩} is a disambiguation of V for p. Moreover, if a disambiguation of V for p is the empty set (for any V), then all states extending p are unreachable. Therefore, we can use empty disambiguations to determine unsolvability of planning tasks (if G extends p), or to prune unreachable operators (if a precondition of the operator extends p). So, from now on we will consider only nonempty disambiguations.
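One simple way to obtain a disambiguation from mutexes (a simplified sketch under our own encoding, not the exact algorithm of the cited work) is to start from F_V and discard every value of V that forms a mutex together with some fact of p:

```python
# Facts are (variable, value) pairs; a mutex is a frozenset of facts that
# can never hold together in any reachable state.

def disambiguate(var, domain, p, mutexes):
    """Return a disambiguation of `var` for partial state `p` (a set of facts):
    drop each fact of `var` that completes a mutex together with p, so every
    reachable state extending p must contain one of the remaining facts."""
    candidates = {(var, val) for val in domain}
    return {f for f in candidates
            if not any(m <= (p | {f}) for m in mutexes)}

# Toy example: variable "pos" with three values; holding the key ("key", 1)
# is mutex with standing at position 2.
mutexes = [frozenset({("key", 1), ("pos", 2)})]
p = {("key", 1)}

D = disambiguate("pos", (0, 1, 2), p, mutexes)
```

If the domain were restricted to the mutex value alone, the result would be the empty set, signalling that no reachable state extends p.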
Fišer, Horčík, and Komenda (2020) showed how to use mutexes to find disambiguations, so here we will assume the disambiguations have already been inferred. Furthermore, to simplify the notation, we introduce a disambiguation map.

Definition 3. A mapping D : (O × V) ∪ V ↦ 2^F is called a disambiguation map if (i) for every operator o ∈ O and every variable V ∈ vars(eff(o)) it holds that D(o, V) ⊆ F_V is a disambiguation of V for pre(o) such that |D(o, V)| ≥ 1; and (ii) for every variable V ∈ V it holds that D(V) ⊆ F_V is a disambiguation of V for G such that |D(V)| ≥ 1.

Now we can state sufficient conditions for a potential heuristic to be admissible, which we will need later on.

Theorem 4. (Fišer, Horčík, and Komenda 2020) Let Π = ⟨V, O, I, G⟩ denote a planning task with facts F, let P denote a potential function, and let D denote a disambiguation map. If

Σ_{V∈V} max_{f∈D(V)} P(f) ≤ 0    (1)

and for every operator o ∈ O it holds that

Σ_{V∈vars(eff(o))} max_{f∈D(o,V)} P(f) − Σ_{f∈eff(o)} P(f) ≤ cost(o),    (2)

then the potential heuristic for P is admissible.

In practice, we can obtain potentials as a solution to a linear program (LP) with constraints corresponding to the conditions from Theorem 4: for each f ∈ F, we create a (real-valued) variable P(f), add the constraints Eq. (1) and Eq. (2), and then a solution of such an LP for any objective function results in a goal-aware and consistent potential function. So far, potential heuristics have been used as described in Definition 1, i.e., each fact gets assigned a potential value and the heuristic value for a state s is the sum of the potentials of all facts in s.

5 Operator-Potential Heuristics

Our key observation is that potentials can also be designed in a different way, yielding a new synergy with symbolic search: we can assign a potential to each operator and compute an admissible heuristic value for a state s reached by a sequence of operators π as a sum of the operator potentials of all operators in π. We start by introducing an operator-potential function.
Definition 5. Given a potential function P and a disambiguation map D, a function Q : O ↦ ℝ is called an operator-potential function for P and D if

Q(o) = Σ_{f∈eff(o)} P(f) − Σ_{V∈vars(eff(o))} max_{f∈D(o,V)} P(f)    (3)

for every operator o ∈ O.

Note that the value of Q(o) is just the value of the left-hand side of Eq. (2) with the opposite sign. In other words, the operator-potential function for an operator o gives us a lower bound on the change of the heuristic value of the corresponding potential heuristic for the given potential function P and disambiguation map D. This immediately leads to another observation: for every sequence of operators π = ⟨o₁, . . . , o_n⟩ applicable in the initial state I such that s = π⟦I⟧, it holds that h^P(I) + Σ_{i∈[n]} Q(o_i) ≤ h^P(s). So, we can compute an admissible heuristic estimate for any state s reachable by a sequence of operators π as h^P(I) + Σ_{i∈[n]} Q(o_i). However, this estimate is path-dependent and therefore not necessarily consistent. Consider a planning task with a binary variable V, and an operator o with an empty precondition and eff(o) = {⟨V, 1⟩}. It is easy to see that the value Q(o) is fixed for the operator as the smallest change in the heuristic value it can induce, but the actual change of the heuristic value may be different depending on whether the operator is applied in a state where ⟨V, 0⟩ or ⟨V, 1⟩ holds.

To avoid this difficulty, we now show that, if the disambiguation map D maps every operator o and every effect variable V ∈ vars(eff(o)) to a singleton, then h^P(I) + Σ_{i∈[n]} Q(o_i) = h^P(s) for every sequence of operators π = ⟨o₁, . . . , o_n⟩ such that π⟦I⟧ = s. That is, as long as the preconditions on the variables affected by any operator o are known precisely, the potential h-value for any state s can be computed as the potential h-value for the initial state plus the sum of the operator potentials of the operators from any sequence π leading to s.
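Eq. (3) makes Q directly computable from P and D. The sketch below (hypothetical data structures; facts as (variable, value) tuples) evaluates Q(o) for the binary-variable example above, showing that with a non-singleton disambiguation Q(o) is only the smallest possible change of h^P, while singleton disambiguations yield the exact change:

```python
def op_potential(op_eff, D_o, P):
    """Q(o) per Eq. (3): the potentials gained by the effect facts, minus,
    for each effect variable V, the maximum potential over D(o, V)."""
    gain = sum(P[f] for f in op_eff)
    loss = sum(max(P[f] for f in D_o[v]) for v, _ in op_eff)
    return gain - loss

# Potentials for the two facts of a single binary variable V.
P = {("V", 0): 2.0, ("V", 1): 0.5}
eff = [("V", 1)]   # operator with empty precondition and eff(o) = {<V,1>}

# Non-singleton disambiguation: Q(o) is only a lower bound on the change.
q_lb = op_potential(eff, {"V": {("V", 0), ("V", 1)}}, P)

# Singleton disambiguations: Q(o) is the exact change h^P(s') - h^P(s).
q_from_0 = op_potential(eff, {"V": {("V", 0)}}, P)   # applied where V = 0
q_from_1 = op_potential(eff, {"V": {("V", 1)}}, P)   # applied where V = 1
```

Here q_lb = min(q_from_0, q_from_1): the fixed operator potential is the smallest change the operator can induce.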
Lemma 6 shows that equality holds for any two consecutive states, and Lemma 7 shows it holds over any sequence of operators applicable in the initial state.

Lemma 6. Let P denote a potential function, D a disambiguation map, Q an operator-potential function for P and D, s a reachable state, and let o denote an operator applicable in s. If |D(o, V)| = 1 for every V ∈ vars(eff(o)), then Σ_{f∈s} P(f) + Q(o) = Σ_{f∈o⟦s⟧} P(f).

Proof. Let A = ∪_{V∈vars(eff(o))} D(o, V). Since |D(o, V)| = 1 for every V ∈ vars(eff(o)), Equation (3) can be rewritten as Q(o) = Σ_{f∈eff(o)} P(f) − Σ_{f∈A} P(f). And since s is reachable and o is applicable in s, it holds that A ⊆ s. Let B = s \ A. Clearly, o⟦s⟧ = B ∪ eff(o) and B ∩ eff(o) = ∅. Therefore, Σ_{f∈s} P(f) + Q(o) = Σ_{f∈o⟦s⟧} P(f) can be rewritten as Σ_{f∈B} P(f) + Σ_{f∈A} P(f) + Q(o) = Σ_{f∈B} P(f) + Σ_{f∈eff(o)} P(f), and further simplified to Σ_{f∈A} P(f) + Q(o) = Σ_{f∈eff(o)} P(f). Expanding Q(o) gives us Σ_{f∈A} P(f) + Σ_{f∈eff(o)} P(f) − Σ_{f∈A} P(f) = Σ_{f∈eff(o)} P(f), which concludes the proof.

Lemma 7. Let P, D, and Q be as before, let π = ⟨o₁, . . . , o_n⟩ denote a sequence of operators applicable in I, and let s = π⟦I⟧. If |D(o, V)| = 1 for every o ∈ O and every V ∈ vars(eff(o)), then Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) = Σ_{f∈s} P(f).

Proof. (By induction.) The claim clearly holds for the empty sequence π. Let s′ denote a state reachable from I by a sequence π′ = ⟨o₁, . . . , o_{n−1}⟩, let o_n ∈ O denote an operator applicable in s′, and let s = o_n⟦s′⟧. Now, assume that Σ_{f∈I} P(f) + Σ_{i∈[n−1]} Q(o_i) = Σ_{f∈s′} P(f), and we need to prove that Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) = Σ_{f∈s} P(f). From the assumption, it follows that Σ_{f∈I} P(f) + Σ_{i∈[n−1]} Q(o_i) + Q(o_n) = Σ_{f∈s′} P(f) + Q(o_n), so it is enough to show that Σ_{f∈s′} P(f) + Q(o_n) = Σ_{f∈s} P(f), which holds by Lemma 6.

Now, getting to the main result of this section, we formulate an operator-potential heuristic and prove that this heuristic is well-defined and equal to the corresponding (fact) potential heuristic.

Definition 8.
Let Q denote an operator-potential function for P and D such that |D(o, V)| = 1 for every o ∈ O and every V ∈ vars(eff(o)). An operator-potential heuristic h^Q : R ↦ ℝ ∪ {∞} for Q is defined as

h^Q(s) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i)    (4)

for any sequence of operators π = ⟨o₁, . . . , o_n⟩ such that π⟦I⟧ = s.

Theorem 9. Let D denote a disambiguation map such that |D(o, V)| = 1 for every o ∈ O and every V ∈ vars(eff(o)), let P denote a potential function, and let Q denote an operator-potential function for P and D. Then h^Q is well-defined, h^Q(s) = h^P(s) for every reachable state s, and h^Q is admissible (goal-aware, consistent) if h^P is admissible (goal-aware, consistent).

Proof. It follows directly from Lemma 7.

Figure 1: Let cost(o_i) = 0 for all i ∈ [4] and cost(o₅) = 1, and let Q(o₁) = 1, Q(o₂) = 0, Q(o₃) = 0.9, Q(o₄) = 0.1, and Q(o₅) = −1, and let h^Q(I) = 0. A simple example showing inconsistency after rounding operator potentials down to the nearest integers.

Note that every planning task can be transformed into another task where |D(o, V)| = 1 holds for every operator o and variable V ∈ vars(eff(o)). One possibility is the use of the transition normal form (Pommerening and Helmert 2015), which is polynomial, but introduces a set of auxiliary operators and requires a transformation of the resulting plan back to the original planning task. Another option is to enforce this property simply by enumerating all possible combinations of facts from all disambiguations D(o, V) such that |D(o, V)| > 1 for all operators' preconditions. For example, given an operator o with vars(eff(o)) = {v₁, v₂}, D(o, v₁) = {f₁, f₂}, and D(o, v₂) = {f₃, f₄}, we replace the operator o with four new operators o₁, . . . , o₄ with the same effect eff(o₁) = eff(o₂) = eff(o₃) = eff(o₄) = eff(o), but different preconditions pre(o₁) = pre(o) ∪ {f₁, f₃}, pre(o₂) = pre(o) ∪ {f₁, f₄}, pre(o₃) = pre(o) ∪ {f₂, f₃}, and pre(o₄) = pre(o) ∪ {f₂, f₄}.
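The enumeration just described is a plain Cartesian product over the non-singleton disambiguations. A sketch (our own representation; facts as (variable, value) tuples) replaying the four-way split from the example:

```python
from itertools import product

def split_operator(op, D_o):
    """Replace op by one copy per combination of facts from its (possibly
    non-singleton) disambiguations D(o, V), V in vars(eff(o)); each copy's
    precondition pins the pre-effect value of every effect variable."""
    eff_vars = sorted({v for v, _ in op["eff"]})
    choices = [sorted(D_o[v]) for v in eff_vars]
    return [{"pre": op["pre"] | set(combo), "eff": op["eff"], "cost": op["cost"]}
            for combo in product(*choices)]

# Operator with two effect variables and two candidate facts each, as in
# the example with D(o, v1) = {f1, f2} and D(o, v2) = {f3, f4}.
f1, f2 = ("v1", 0), ("v1", 1)
f3, f4 = ("v2", 0), ("v2", 1)
op = {"pre": set(), "eff": [("v1", 2), ("v2", 2)], "cost": 1}
D_o = {"v1": {f1, f2}, "v2": {f3, f4}}

split = split_operator(op, D_o)
```

Each of the four resulting operators has a singleton disambiguation for both effect variables, which is the precondition needed for Theorem 9.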
This way, the transformed planning task can grow exponentially in the number of effects not appearing in the preconditions of an operator. However, we can prune operators whose preconditions contain a mutex, and in our experiments, we ran out of memory in only one task.

6 Handling Floating-Point Potentials

Although Theorem 9 identifies conditions under which operator-potential heuristics are consistent and equal to the corresponding potential heuristics, in practice there is an additional complication resulting from the fact that the Q(o) values are typically represented as floating-point numbers. So, we should not compare Q(o) values for equality. Moreover, having floating-point h-values is an even larger issue in symbolic search, as states are aggregated based on their h-value. If floating-point numbers are used, one could get a different BDD for every state in the search, greatly reducing the efficiency of symbolic search. So, it is desirable to represent in a single BDD all states whose h-values round to the same integer value.

Rounding operator potentials down to the nearest integers would resolve this problem and would keep the heuristic function admissible. Unfortunately, this kind of rounding can make the heuristic inconsistent. Consider the planning task depicted in Figure 1. Clearly, the operator-potential heuristic h^Q is both admissible and consistent. Now, let Q̂ denote an operator-potential function obtained by rounding Q down to the nearest integers, i.e., Q̂(o₁) = 1, Q̂(o₂) = 0, Q̂(o₃) = 0, Q̂(o₄) = 0, and Q̂(o₅) = −1. The sum Σ_{f∈I} P(f) + Σ_{i∈[n]} Q̂(o_i) (cf. Definition 8) provides an admissible estimate, because rounding down can only make the sum smaller. However, rounding down can also make this estimate path-dependent, i.e., we can obtain different values for a state depending on the path by which we reached the state, and inconsistent. Consider the states s₁ and s₂, and the operator sequences π = ⟨o₁⟩ and π′ = ⟨o₃, o₄⟩ from Figure 1.
Since the heuristic value for the initial state is zero, the inequality h^Q(s₁) − h^Q(s₂) ≤ cost(o₂) holds, because h^Q(s₁) = 1, h^Q(s₂) = 1, and cost(o₂) = 0. But after rounding, we get Q̂(o₁) = 1 and Q̂(o₃) + Q̂(o₄) = 0, resulting in a higher estimate for s₁ than for s₂ under Q̂.

We resolve this issue by encoding the rounding directly into the mixed-integer linear program (MIP) expressing the potentials. Since the operator potential is just the left-hand side of Eq. (2) with the opposite sign, we can extend the original LP for the inference of potentials with new integer variables, each corresponding to an operator potential Q(o), and add a new constraint Eq. (3) for each Q(o). That is, the new MIP has a real-valued variable P(f) for each f ∈ F, an integer variable Q(o) for each operator o ∈ O, and the set of constraints Eq. (1), Eq. (2), and Eq. (3). A solution to such a MIP for any objective function yields a goal-aware and consistent operator-potential function. Moreover, since the variables corresponding to Q(o) are integer-valued, the resulting operator potentials are also integer-valued, while the fact potentials can remain real-valued. Therefore, the MIP solver finds operator potentials that are properly rounded, so we can compare the operator potentials for equality without running into problems with floating-point numbers.

7 Symbolic Search with Operator-Potentials

Using potential heuristics in BDDA* is not straightforward, as the standard way of evaluating the heuristic by constructing a BDD S_h representing all states with a heuristic value equal to h may not always be feasible. The naive approach would be to enumerate all possible subsets of facts whose potentials add up to h. However, this requires enumerating exponentially many subsets and may result in an exponentially large intermediate BDD. We overcome this difficulty by using operator-potential heuristics instead.
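The Figure 1 example can be replayed numerically. The sketch below uses the stated Q-values (with the sign of Q(o₅) taken as −1 so that the heuristic is goal-aware) and checks that the exact potentials give equal estimates along both paths to s₂, while naive floor-rounding inflates the estimate along one path relative to the other:

```python
import math

# Operator potentials and costs from the Figure 1 example; h^Q(I) = 0.
Q = {"o1": 1.0, "o2": 0.0, "o3": 0.9, "o4": 0.1, "o5": -1.0}
cost = {"o1": 0, "o2": 0, "o3": 0, "o4": 0, "o5": 1}

def estimate(path, q):
    """h-estimate of the state reached by `path`: h(I) + sum of potentials."""
    return sum(q[o] for o in path)

# Exact potentials: the two paths to the region of s1/s2 agree.
h_s1 = estimate(["o1"], Q)               # h(s1) = 1
h_s2 = estimate(["o3", "o4"], Q)         # h(s2) = 0.9 + 0.1 = 1
consistent = h_s1 - h_s2 <= cost["o2"]   # 1 - 1 <= 0: consistent across o2

# Floor-rounded potentials: the estimate becomes path-dependent.
Qr = {o: math.floor(v) for o, v in Q.items()}
h_s1_r = estimate(["o1"], Qr)            # 1
h_s2_r = estimate(["o3", "o4"], Qr)      # 0 + 0 = 0
inconsistent = h_s1_r - h_s2_r > cost["o2"]   # 1 - 0 > 0: inconsistent
```

The rounded estimates are still admissible (flooring only lowers each sum), but the o₂ transition from s₁ to s₂ now violates consistency, which is exactly what the integer MIP variables prevent.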
To do so, we use GHSETA* (Jensen, Veloso, and Bryant 2008), a symbolic heuristic search algorithm which partitions the TRs not only by the cost of the corresponding operators, but also by the change of the heuristic value they induce.² That is, instead of creating a TR T_c for all operators o having cost(o) = c, we create a TR T_{c,q} representing all operators o such that cost(o) = c and Q(o) = q. For the initial state, the g-value is set to zero, and the h-value is set to Σ_{f∈I} P(f). For all subsequent sets of states S_{g,h} expanded by the TR T_{c,q}, the g-value and h-value of the resulting set of states S_{g′,h′} = image(S_{g,h}, T_{c,q}) are set to g′ = g + c and h′ = h + q, respectively.

² The alternative FSETA* does not directly support negative heuristic values (Jensen, Veloso, and Bryant 2008).

The pseudocode for the algorithm using a consistent operator-potential heuristic is encapsulated in Algorithm 1.

Algorithm 1: Symbolic forward A* with a consistent operator-potential heuristic.
Input: A planning task Π, a consistent operator-potential function Q.
Output: An optimal plan or "unsolvable".
1  for each ⟨c, q⟩ ∈ {⟨cost(o), Q(o)⟩ | o ∈ O} do
2      Construct T_{c,q} from {o ∈ O | cost(o) = c, Q(o) = q};
3  h_I ← Σ_{f∈I} P(f);
4  S_{0,h_I} ← {I};
5  open ← {⟨max(0, h_I), S_{0,h_I}⟩};
6  closed ← ∅;
7  while open ≠ ∅ do
8      S_{g,h} ← PopMin(open) \ closed;
9      if S_{g,h} contains a goal state then
10         return ExtractPlan(S_{g,h});
11     closed ← closed ∪ S_{g,h};
12     for each T_{c,q} do
13         S_{g+c,h+q} ← image(S_{g,h}, T_{c,q}) \ closed;
14         if S_{g+c,h+q} ≠ ∅ then
15             f ← g + c + max(0, h + q);
16             InsertOrUpdate(open, ⟨f, S_{g+c,h+q}⟩);
17 return "unsolvable";

On lines 1 and 2, the TRs corresponding to all unique pairs of operator costs and Q(o) values are constructed. The heuristic value for the initial state is computed on line 3. The open list of sets of states (represented by BDDs), ordered by f = g + max(0, h) values, is initialized with the initial state on lines 4 and 5. On line 6, a BDD representing all closed states is constructed. The while-cycle on lines 7–16 is an A* algorithm adapted to symbolic search.
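The control flow of Algorithm 1 can be prototyped with Python frozensets standing in for BDDs (a sketch of ours: it keeps states with equal (g, h) in one set and looks up TRs by (c, q), but simplifies the InsertOrUpdate merging of open-list entries to plain pushes):

```python
import heapq
from itertools import count

def ghseta(init, h_init, goal_test, trs):
    """A*-style search over sets of states. `trs` maps (c, q) to an image
    function taking a frozenset of states to the frozenset of successors."""
    tie = count()  # tie-breaker so the heap never compares frozensets
    open_list = [(max(0, h_init), next(tie), 0, h_init, frozenset([init]))]
    closed = set()
    while open_list:
        _, _, g, h, states = heapq.heappop(open_list)
        states = states - closed                  # drop already closed states
        if not states:
            continue
        if any(goal_test(s) for s in states):     # goal reached optimally
            return g                              # cost of an optimal plan
        closed |= states
        for (c, q), image in trs.items():         # apply each partitioned TR
            succ = image(states) - closed
            if succ:
                f = g + c + max(0, h + q)
                heapq.heappush(open_list, (f, next(tie), g + c, h + q, succ))
    return None                                   # task is unsolvable

# Toy chain task: states 0..3, goal is 3; a single operator family of
# cost 1 that increments the state, with operator potential q = -1.
trs = {(1, -1): lambda S: frozenset(s + 1 for s in S if s < 3)}
opt = ghseta(0, 3, lambda s: s == 3, trs)
```

In a real symbolic implementation only the data structure changes: the frozensets become BDDs and the image functions become relational products with T_{c,q}.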
On line 8, we extract the set of states with the lowest f-value from the open list and remove all closed states from this set. If a goal state is reached (lines 9 and 10), an optimal plan is extracted and returned (for details see (Torralba et al. 2017)). If the current set of states S_{g,h} does not contain a goal state, then all these states are added to the set of closed states (line 11). On lines 12–16, all operators are applied and the resulting states that were not yet closed are assigned the correct g- and h-values and are either inserted into the open list (if there is no S_{g+c,h+q} in the open list), or the corresponding set of states in the open list is extended with the new set of states.

Note that we need the operator-potential heuristic to be consistent in order to avoid re-opening states (cf. lines 8 and 13). This also means that if the computation of consistent operator potentials requires the transformation of the planning task described in Section 5, then the same transformed task must also be used for the symbolic search.

GHSETA* can also be adapted to work with the path-dependent (and thus possibly inconsistent) variant of the operator-potential heuristic that does not require transforming the planning task. We need to allow re-opening states that were previously closed with a higher g-value. We can achieve that by maintaining not one BDD representing closed states, but one BDD for each g-value (cf. lines 6 and 11). And when removing closed states from a set of states, we remove only the closed states with the same or lower g-value (cf. lines 8 and 13).

8 Experimental Evaluation

We implemented our search algorithm in C.³ Operators and facts are pruned with the h² heuristic in forward and backward direction (Alcázar and Torralba 2015), and the translation from PDDL to FDR uses the inference of mutex groups proposed by Fišer (2020).

³ https://gitlab.com/danfis/cpddl, branch aaai22-symba-op-pot

Figure 2: (a)–(d): Comparison of symbolic forward uniform-cost search (bfw) against GHSETA* with the best-performing variant of the consistent operator-potential heuristic (A+I) on commonly solved tasks: (a) mean number of BDD nodes per expanded BDD; (b) number of expanded BDDs; (c) sum of expanded BDD nodes; (d) runtime in seconds. (e): Time (s) spent computing potentials for the original formulation (LP) and with added constraints on integer operator potentials (MIP) on transformed tasks.

We used all planning domains from the optimal track of the International Planning Competitions (IPCs) from 1998 to 2018, excluding the ones containing conditional effects after translation. We merged, for each domain, all benchmark suites across different IPCs. This leaves 48 domains overall. We used a cluster of computing nodes with Intel Xeon Scalable Gold 6146 processors and the CPLEX (I)LP solver v12.10. The time and memory limits were set to 30 minutes and 8 GB, respectively. We used a time limit of 30 seconds for applying mutexes on the goal BDD and 10 seconds for merging transition relation BDDs (Torralba et al. 2017).

We evaluated GHSETA* with the following variants of the consistent operator-potential heuristics obtained on the transformed planning tasks:

- I: maximize the h^Q-value of the initial state (Pommerening et al. 2015).
- A+I: maximize the h^Q-value for the average (syntactic) state while enforcing the maximum h^Q(I) (Seipp, Pommerening, and Helmert 2015; Fišer, Horčík, and Komenda 2020).
- S1k+I: maximize the h^Q-value for 1 000 states sampled using random walks, while enforcing the maximum h^Q(I) (Seipp, Pommerening, and Helmert 2015; Fišer, Horčík, and Komenda 2020).
- M2+I: maximize the h^Q-value for all reachable states approximated with mutex pairs, while enforcing the maximum h^Q(I) (Fišer, Horčík, and Komenda 2020).
We also evaluated the same consistent operator-potential heuristics with the tasks transformed to the transition normal form (Pommerening and Helmert 2015) (prefixed with tnf-), and the path-dependent operator-potential heuristics on the original planning tasks (prefixed with pd-). We compare these to symbolic uniform-cost search using forward (bfw) and bidirectional search (bbi).⁴ Furthermore, we compare to other state-of-the-art planners. We ran A* with the LM-Cut heuristic (lmc) (Helmert and Domshlak 2009), with the merge-and-shrink heuristic (ms) with the SCC-DFP merge strategy and non-greedy bisimulation shrink strategy (Helmert et al. 2014; Sievers, Wehrle, and Helmert 2016), and with the potential heuristic (pot A+I), i.e., a variant of A+I for A*. We further compare to two of the best-performing non-portfolio planners from IPC 2018: Complementary2 (comp2) (Franco et al. 2017; Franco, Lelis, and Barley 2018) and Scorpion (scrp) (Seipp 2018; Seipp and Helmert 2018).

⁴ Our implementation does not lag in performance behind the IPC 2018 SymBA* planner: its overall coverage is 942 and 852 for blind bidirectional and blind forward search, respectively.

|         | scrp | A+I | M2+I | comp2 | S1k+I | pd-A+I | bbi | pot A+I | I  | bfw | lmc | ms | tnf-A+I | tot  |
|---------|------|-----|------|-------|-------|--------|-----|---------|----|-----|-----|----|---------|------|
| scrp    | –    | 23  | 23   | 19    | 26    | 24     | 27  | 26      | 30 | 31  | 36  | 37 | 39      | 1112 |
| A+I     | 13   | –   | 7    | 17    | 17    | 9      | 24  | 26      | 26 | 30  | 33  | 37 | 36      | 1109 |
| M2+I    | 13   | 2   | –    | 17    | 16    | 10     | 23  | 25      | 23 | 31  | 33  | 36 | 35      | 1103 |
| comp2   | 16   | 16  | 18   | –     | 23    | 18     | 22  | 27      | 25 | 32  | 33  | 38 | 37      | 1096 |
| S1k+I   | 13   | 3   | 6    | 16    | –     | 9      | 22  | 22      | 22 | 29  | 29  | 35 | 34      | 1081 |
| pd-A+I  | 12   | 2   | 5    | 16    | 15    | –      | 22  | 22      | 23 | 27  | 30  | 32 | 34      | 1080 |
| bbi     | 14   | 14  | 14   | 12    | 16    | 15     | –   | 23      | 20 | 23  | 24  | 29 | 31      | 1004 |
| pot A+I | 5    | 9   | 9    | 13    | 14    | 11     | 18  | –       | 19 | 23  | 24  | 22 | 29      | 1000 |
| I       | 10   | 2   | 3    | 9     | 5     | 5      | 16  | 15      | –  | 20  | 21  | 24 | 29      | 996  |
| bfw     | 10   | 7   | 6    | 7     | 9     | 7      | 7   | 17      | 10 | –   | 17  | 22 | 28      | 936  |
| lmc     | 2    | 4   | 5    | 6     | 8     | 8      | 13  | 14      | 15 | 20  | –   | 21 | 28      | 901  |
| ms      | 1    | 1   | 2    | 4     | 5     | 4      | 11  | 4       | 10 | 16  | 15  | –  | 24      | 859  |
| tnf-A+I | 4    | 0   | 1    | 6     | 3     | 2      | 7   | 7       | 6  | 12  | 11  | 12 | –       | 803  |

Table 1: Summary of domain coverage. A value in row x and column y is the number of domains where x solved more tasks than y. The tot column shows the overall number of solved tasks.
Table 1 shows the comparison across all planners in terms of total coverage, and in terms of the number of domains in which each planner is superior to the others.

Operator Potentials in Symbolic Search

The comparison of the symbolic potential variants against the baseline forward search without heuristics (bfw) clearly demonstrates that potential heuristics are beneficial for the performance of symbolic search over a wide range of domains. The best variant of GHSETA* with an operator-potential heuristic (A+I) solves 173 more tasks than the baseline (bfw). It increases coverage in 30 different domains, and is detrimental in only 7 domains. Among the different variants of potential heuristics, we observe that the optimization criterion can have a significant impact on performance. The best variant is A+I, closely followed by S1k+I and M2+I, and significantly better than I. These results are well in line with the results of the same potential heuristics in explicit search (Fišer, Horčík, and Komenda 2020). For a heuristic to be beneficial in symbolic search, sets of states with the same g- and h-value must be efficiently representable as BDDs. The positive coverage results from Table 1 suggest that this is indeed the case for the operator-potential heuristics. To confirm this, Figure 2 compares the performance of the baseline, symbolic search without any heuristics (bfw), against the best configuration of our symbolic search with potential heuristics (A+I). First, we observe that the average number of BDD nodes per expanded BDD was almost always lower (Figure 2a). This means that, indeed, the partitioning induced by operator-potential heuristics is often beneficial, resulting in sets of states during the search that have a concise BDD representation. This property, however, is not guaranteed by the method: the average number of BDD nodes per expanded BDD was higher with A+I than with bfw in only ten tasks, though only by a small margin.
On the other hand, the number of expanded BDDs almost always increases (Figure 2b), which is not surprising, as sets of states during the search are partitioned not only by g-value, but also by h-value. Most remarkably, the total number of BDD nodes across all expanded BDDs often decreased with potential heuristics (Figure 2c). This confirms that these heuristics are not only informative for explicit-state search, avoiding the expansion of certain states, but also beneficial in symbolic search by inducing a good BDD partitioning. This contrasts with previous results on very informative heuristics (e.g., the perfect heuristic h*), which despite their accuracy are often detrimental to the performance of symbolic search (Speck, Geißer, and Mattmüller 2020). Furthermore, the runtime of the planner often decreases (Figure 2d), confirming that partitioning the TRs using operator potentials is an effective way of evaluating the heuristic in symbolic search. So not only can potential heuristics be informative for symbolic search, they can also be evaluated efficiently. The increase in runtime on some tasks is often due to the increased computational effort of inferring integer operator potentials. Compared against symbolic bidirectional uniform-cost search (bbi), which has state-of-the-art performance in symbolic search planning (Torralba et al. 2014; Torralba, Linares López, and Borrajo 2016), A+I solves 105 more tasks. The two algorithms are still quite complementary, though, with A+I being superior in 24 domains and bbi in 14. This suggests that there is potential to further improve bidirectional heuristic search by integrating operator-potential heuristics.

Explicit-State Search with Potential Heuristics

Compared to the potential heuristics in explicit-state search, A+I solves 109 more instances than pot A+I. Moreover, there are only 9 domains where using symbolic search with the same heuristics is detrimental, compared to 26 domains where it is beneficial.
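This cheap evaluation rests on the paper's key observation: given fact potentials, the change of heuristic value induced by each operator is a state-independent constant that can be precomputed, provided every effect variable also occurs in the precondition. A minimal sketch of such incremental evaluation, with hypothetical facts and potential values (not the paper's implementation):

```python
# Fact potentials (hypothetical toy values) and operators whose preconditions
# mention every effect variable, so the heuristic change is state-independent.
pot = {("loc", "A"): 2.0, ("loc", "B"): 1.0, ("loc", "C"): 0.0}

ops = {
    # name: (precondition, effect), both as {variable: value}
    "move_AB": ({"loc": "A"}, {"loc": "B"}),
    "move_BC": ({"loc": "B"}, {"loc": "C"}),
}

def h(state):
    """Potential heuristic: sum of potentials of the facts true in state."""
    return sum(pot[(v, d)] for v, d in state.items())

def op_delta(pre, eff):
    """Precomputable change of h induced by the operator: h(s') - h(s).

    Well-defined only because every effect variable occurs in pre."""
    return sum(pot[(v, d)] for v, d in eff.items()) \
         - sum(pot[(v, pre[v])] for v in eff)

delta = {name: op_delta(pre, eff) for name, (pre, eff) in ops.items()}

# Track h incrementally along a path, never re-evaluating states from scratch:
state, hval = {"loc": "A"}, h({"loc": "A"})
for name in ["move_AB", "move_BC"]:
    pre, eff = ops[name]
    state = {**state, **eff}
    hval += delta[name]
    assert hval == h(state)  # incremental value matches direct evaluation
print(delta)  # {'move_AB': -1.0, 'move_BC': -1.0}
```

In the symbolic setting this per-operator constant is what allows partitioning the transition relations by heuristic change, so whole BDDs of states are shifted between h-buckets without per-state evaluation.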
Note that these two configurations use the same optimization criteria to compute the potentials. However, as explained in Section 6, in order to obtain consistent operator-potential heuristics, we must (1) split the operators so that all variables mentioned in the effect also appear in the precondition; and (2) use a MIP to ensure that operator potentials take integer values. In most domains this is not an issue, and there are only a few domains where the number of operators increases significantly (the task size at least doubled only in agricola and maintenance). Nevertheless, this sometimes adds overhead to the computation of the potential heuristics, as illustrated by Figure 2e, which makes the improvements in coverage even more remarkable. To analyze whether this overhead could be easily avoided, we also compared A+I against pd-A+I and tnf-A+I. The path-dependent variant pd-A+I partially avoids the overhead shown in Figure 2e, because it does not require the task transformation (1). Yet, in terms of coverage, the results are inferior in general, reducing coverage in 9 domains and improving only in agricola and caldera. We also tried a variant of pd-A+I with floating-point potentials (i.e., only an LP is solved) and rounding down to the nearest integer (to deal with the partitioning of states), but it solved 107 fewer tasks across 20 domains. This shows that it pays off to force the heuristic to be consistent, unless the task size increases significantly. Finally, using the transition normal form (tnf-A+I) turns out to be detrimental for symbolic search and never pays off, because the symbolic search must then also use the task in transition normal form, with many auxiliary zero-cost operators.

Comparison Against State of the Art

Finally, we compare the performance of our best-performing variants (A+I, M2+I, and S1k+I) against the unrelated approaches lmc, ms, comp2, and scrp.
Our planners clearly beat lmc and ms in terms of overall coverage and frequency of per-domain superiority. The state-of-the-art planners comp2 and scrp are roughly on par in overall coverage. In terms of individual domains, the clear conclusion is that our new techniques are highly complementary to the previous state of the art: A+I, M2+I, and S1k+I outperform comp2 and scrp in 12 to 16 domains.

9 Conclusion

While heuristic search and symbolic search are both contenders for the throne in optimal planning, and their combination is a natural and promising avenue, the results with that combination have thus far been disappointing. As we show, this picture changes dramatically when leveraging the fact that potential heuristics can be viewed as potentials over operators, which enables their smooth integration into symbolic search. We have shown that, and how, this can be done, in particular while retaining consistency. Our empirical results show that this boosts the performance of optimal symbolic planning, which is now on par with the best heuristic-search-based optimal planners.

Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 389792660, TRR 248 (see https://perspicuous-computing.science). The experimental evaluation was supported by the OP VVV funded project CZ.02.1.01/0.0/0.0/16 019/0000765 "Research Center for Informatics".

References

Alcázar, V.; Borrajo, D.; Fernández, S.; and Fuentetaja, R. 2013. Revisiting Regression in Planning. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13), 2254–2260.
Alcázar, V.; and Torralba, A. 2015. A Reminder about the Importance of Computing and Exploiting Invariants in Planning. In Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS'15), 2–6.
Bäckström, C.; and Nebel, B. 1995. Complexity Results for SAS+ Planning. Computational Intelligence, 11(4): 625–655.
Bryant, R.
E. 1986. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, 35(8): 677–691.
Edelkamp, S. 2002. Symbolic Pattern Databases in Heuristic Search Planning. In Proceedings of the 6th International Conference on Artificial Intelligence Planning and Scheduling (AIPS'02), 274–283.
Edelkamp, S.; and Kissmann, P. 2008. Limits and Possibilities of BDDs in State Space Search. In Proceedings of the 23rd National Conference of the American Association for Artificial Intelligence (AAAI'08), 1452–1453.
Edelkamp, S.; and Reffel, F. 1998. OBDDs in Heuristic Search. In Proceedings of the 22nd Annual German Conference on Advances in Artificial Intelligence (KI'98), volume 1504 of Lecture Notes in Computer Science, 81–92. Springer.
Fišer, D. 2020. Lifted Fact-Alternating Mutex Groups and Pruned Grounding of Classical Planning Problems. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20).
Fišer, D.; Horčík, R.; and Komenda, A. 2020. Strengthening Potential Heuristics with Mutexes and Disambiguations. In Proceedings of the 30th International Conference on Automated Planning and Scheduling (ICAPS'20), 124–133.
Franco, S.; Lelis, L. H. S.; and Barley, M. 2018. The Complementary2 Planner in IPC 2018. In IPC 2018 Planner Abstracts, 32–36.
Franco, S.; Torralba, A.; Lelis, L. H.; and Barley, M. 2017. On Creating Complementary Pattern Databases. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), 4302–4309.
Hart, P. E.; Nilsson, N. J.; and Raphael, B. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 4(2): 100–107.
Helmert, M.; and Domshlak, C. 2009. Landmarks, Critical Paths and Abstractions: What's the Difference Anyway? In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS'09), 162–169.
Helmert, M.; Haslum, P.; Hoffmann, J.; and Nissim, R. 2014.
Merge-and-Shrink Abstraction: A Method for Generating Lower Bounds in Factored State Spaces. Journal of the Association for Computing Machinery, 61(3): 16:1–16:63.
Jensen, R. M.; Veloso, M. M.; and Bryant, R. E. 2008. State-Set Branching: Leveraging BDDs for Heuristic Search. Artificial Intelligence, 172(2–3): 103–139.
Kissmann, P.; and Edelkamp, S. 2011. Improving Cost-Optimal Domain-Independent Symbolic Planning. In Proceedings of the 25th National Conference of the American Association for Artificial Intelligence (AAAI'11), 992–997.
McMillan, K. L. 1993. Symbolic Model Checking. Kluwer Academic Publishers.
Pommerening, F.; and Helmert, M. 2015. A Normal Form for Classical Planning Tasks. In Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS'15), 188–192.
Pommerening, F.; Helmert, M.; Röger, G.; and Seipp, J. 2015. From Non-Negative to General Operator Cost Partitioning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI'15), 3335–3341.
Seipp, J. 2018. Fast Downward Scorpion. In IPC 2018 Planner Abstracts, 77–79.
Seipp, J.; and Helmert, M. 2018. Counterexample-Guided Cartesian Abstraction Refinement for Classical Planning. Journal of Artificial Intelligence Research, 62: 535–577.
Seipp, J.; Pommerening, F.; and Helmert, M. 2015. New Optimization Functions for Potential Heuristics. In Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS'15), 193–201.
Sievers, S.; Wehrle, M.; and Helmert, M. 2016. An Analysis of Merge Strategies for Merge-and-Shrink Heuristics. In Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS'16), 294–298.
Speck, D.; Geißer, F.; and Mattmüller, R. 2020. When Perfect Is Not Good Enough: On the Search Behaviour of Symbolic Heuristic Search. In Proceedings of the 30th International Conference on Automated Planning and Scheduling (ICAPS'20), 263–271.
Torralba, A.; Alcázar, V.; Borrajo, D.; Kissmann, P.; and Edelkamp, S. 2014. SymBA*: A Symbolic Bidirectional A* Planner. In IPC 2014 Planner Abstracts, 105–109.
Torralba, A.; Alcázar, V.; Kissmann, P.; and Edelkamp, S. 2017. Efficient Symbolic Search for Cost-Optimal Planning. Artificial Intelligence, 242: 52–79.
Torralba, A.; Linares López, C.; and Borrajo, D. 2016. Abstraction Heuristics for Symbolic Bidirectional Search. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), 3272–3278.
Torralba, A.; López, C. L.; and Borrajo, D. 2018. Symbolic Perimeter Abstraction Heuristics for Cost-Optimal Planning. Artificial Intelligence, 259: 1–31.