# Iterative Budgeted Exponential Search

Malte Helmert¹, Tor Lattimore², Levi H. S. Lelis³, Laurent Orseau² and Nathan R. Sturtevant⁴*

¹Department of Mathematics and Computer Science, University of Basel, Switzerland
²DeepMind, London, UK
³Departamento de Informática, Universidade Federal de Viçosa, Brazil
⁴Department of Computing Science, University of Alberta, Edmonton, AB, Canada

malte.helmert@unibas.ch, {lattimore, lorseau}@google.com, levi.lelis@ufv.br, nathanst@ualberta.ca

*Alphabetical order. This paper is the result of merging two independent submissions to IJCAI 2019 [Orseau et al., 2019; Sturtevant and Helmert, 2019].

We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $\Omega(2^n)$ expansions, where $n$ is the number of states within the final $f$ bound. Existing algorithms that address this problem, like B and B′, improve this bound to $\Omega(n^2)$. For tree search, IDA* can also require $\Omega(n^2)$ expansions. We describe a new algorithmic framework that iteratively controls an expansion budget and a solution cost limit, giving rise to new graph and tree search algorithms for which the number of expansions is $O(n \log C^*)$, where $C^*$ is the optimal solution cost. Our experiments show that the new algorithms are robust in scenarios where existing algorithms fail. In the case of tree search, our new algorithms have no overhead over IDA* in scenarios to which IDA* is well suited and can therefore be recommended as a general replacement for IDA*.

1 Introduction

There are two long-standing problems in heuristic search where existing algorithms struggle to balance the number of expansions and re-expansions performed in comparison to an oracle. One is in graph search, the other in tree search.

The first problem concerns admissible but inconsistent heuristics in graph search. With some caveats [Holte, 2010], A* with an admissible and consistent heuristic expands the minimum required number of states [Hart et al., 1968; Dechter and Pearl, 1985]. However, with inconsistent heuristics it may expand exponentially more states than more cautious algorithms such as B [Martelli, 1977] and B′ [Mérő, 1984], which have a quadratic worst case.

The second problem concerns heuristic tree search algorithms that use memory that grows only linearly with the search depth. In contrast, the memory usage of A* grows linearly with time and often exponentially with the depth of the search. To satisfy such low memory requirements, linear-memory tree search algorithms perform successive depth-first searches with an increasing limit on the cost. They also forgo global duplicate elimination, meaning that they do not detect if multiple paths from the initial state lead to the same state, which can lead to exponentially worse runtime compared to algorithms like A* when such duplicates are frequent. Hybrid algorithms that use bounded memory for duplicate elimination are possible [Akagi et al., 2010, for example].

IDA* [Korf, 1985] is a cautious linear-memory algorithm that increases the f-cost bound minimally (see also RBFS [Korf, 1993]). At each iteration, IDA* searches all nodes up to the f-cost bound. The minimum cost of the nodes pruned in one iteration becomes the cost bound for the next iteration. This approach ensures that the last cost bound will be exactly the minimum solution cost.
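As a concrete illustration of this bound-update rule, here is a minimal IDA* sketch in Python. The `succ`, `cost`, `h` and `is_goal` callables and the toy chain domain are our own illustrative assumptions, not the paper's implementation.

```python
import math

def ida_star(start, succ, cost, h, is_goal):
    """Minimal IDA* sketch: repeat depth-first searches, raising the
    f-cost bound to the smallest f-cost pruned in the previous pass."""
    def dfs(state, g, bound):
        f = g + h(state)
        if f > bound:
            return None, f              # pruned: report this f-cost
        if is_goal(state):
            return g, math.inf
        min_pruned = math.inf
        for s2 in succ(state):
            sol, pruned = dfs(s2, g + cost(state, s2), bound)
            if sol is not None:
                return sol, math.inf
            min_pruned = min(min_pruned, pruned)
        return None, min_pruned

    bound = h(start)
    while True:
        sol, pruned = dfs(start, 0.0, bound)
        if sol is not None:
            return sol                  # cost of an optimal solution
        if pruned == math.inf:
            return None                 # search space exhausted
        bound = pruned                  # minimal possible increase

# Toy chain domain (hypothetical): states 0..5, goal at 5, unit costs, h = 0.
print(ida_star(0, lambda s: [s + 1] if s < 5 else [],
               lambda s, t: 1, lambda s: 0, lambda s: s == 5))   # -> 5
```

On this chain every iteration re-expands all earlier nodes to reach a single new one, which is exactly the quadratic worst case discussed next.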
This approach is efficient when the number of nodes within the current cost bound grows exponentially with the number of iterations, as the total number of expansions is then dominated by the last iteration. In the worst case, however, IDA* may expand only one new node in each iteration, leading to a quadratic number of (re-)expansions. Several methods have been developed to mitigate the re-expansion overhead of IDA* [Burns and Ruml, 2013; Sharon et al., 2014; Hatem et al., 2018; Sarkar et al., 1991; Wah and Shang, 1994] by increasing the cost bound more aggressively at each iteration with the aim of achieving an exponential growth rate. However, with this approach the last cost bound can be larger than the minimum solution cost, which may incur an arbitrarily large performance penalty. For these algorithms, theoretical guarantees (when provided) require strong assumptions such as uniformity of the costs or branching factor [Hatem et al., 2015, for example].

We propose a novel framework called Iterative Budgeted Exponential Search (IBEX) guaranteeing, for both problems described above, that the number of expansions is near-linear in the number of nodes whose cost is at most the minimum solution cost. This is achieved by combining two ideas: (1) a budget on the number of expansions and (2) an exponential search for the maximum f-cost that can be searched exhaustively within the given budget. The framework proves that no solution can be found for the current budget, and then doubles the budget until a solution is found. This ensures that the last budget is always within twice the minimum required budget, while the work of early iterations is amortized by the exponential growth of the budget.

We develop two simple and fast algorithms that enjoy near-linear expansion guarantees, propose a number of enhancements, show how the tree search and graph search problems can be reduced to our framework, and show that these algorithms perform at least as well as state-of-the-art algorithms on a number of traditional domains without exhibiting any of the catastrophic failure cases.

2 Heuristic Search Problems

A black-box heuristic search problem is defined by a finite state space $S$, a set of goal states $S_{\text{goal}} \subseteq S$, a cost function $c : S \times S \to [0, \infty]$ and an initial state $s_{\text{init}} \in S$. The successors of a state $s$ are those states $s'$ that can be reached by a finite-cost edge: $\text{succ}(s) = \{s' : c(s, s') < \infty\}$. This defines a directed graph $G = (S, E)$ where states correspond to vertices in the graph and the edges are the finite-cost successors $E = \{(s, s') : c(s, s') < \infty\}$. A path is a sequence of states $\pi = (s_t)_{t=1}^m$ with $s_1 = s_{\text{init}}$, and its cost is $g(\pi) = \sum_{t=1}^{m-1} c(s_t, s_{t+1})$, which may be infinite if there is no edge between adjacent states. The end state of path $\pi = (s_t)_{t=1}^m$ is $\text{state}(\pi) = s_m$ and its successor paths are $\text{succ}(\pi) = \{(s_t)_{t=1}^{m+1} : s_{m+1} \in \text{succ}(s_m)\}$. A state $s$ is expanded when the function $\text{succ}(s)$ is called for $s$; a state $s'$ is generated when $\text{succ}(s)$ is called, creating $s' \in \text{succ}(s)$. A node corresponds to a single path $\pi$ from the root of a search tree. Expanding a node corresponds to expanding $\text{state}(\pi)$ and generating all the corresponding successor paths (nodes). Search algorithms may expand the same state multiple times because multiple nodes might represent the same state. Let $\pi^*(s) = \operatorname{argmin}_{\pi : \text{state}(\pi) = s} g(\pi)$ be a least-cost path to state $s$ and $g^*(s) = g(\pi^*(s))$ be the cost of such a path.
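These definitions translate almost literally into code. A minimal sketch of the formal model (illustrative names and a hypothetical 3-edge graph, not from the paper):

```python
import math

# A node is a path, i.e. a tuple of states; g sums edge costs along the
# path and f adds the heuristic value of the path's end state.

def g(path, cost):
    return sum(cost(a, b) for a, b in zip(path, path[1:]))

def f(path, cost, h):
    return g(path, cost) + h(path[-1])

def expand(path, succ):
    """Expanding a node generates one successor path per successor state."""
    return [path + (s2,) for s2 in succ(path[-1])]

# Hypothetical example: edges a->b (1), a->c (4), b->c (2); goal state c.
edges = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "c"): 2.0}
cost = lambda s, t: edges.get((s, t), math.inf)
succ = lambda s: [t for (u, t) in edges if u == s]
h = {"a": 3.0, "b": 2.0, "c": 0.0}.get

for node in expand(("a",), succ):
    print(node, g(node, cost), f(node, cost, h))
# Both ('a','b') and ('a','c') are nodes; the paths a-b-c and a-c reach
# the same state c, which is why a tree search may re-expand states.
```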
Let $\Pi$ be the set of all paths from the initial state to all goal states. Then the cost of a least-cost path to a goal state is $C^* = \min_{\pi \in \Pi} g(\pi)$. The objective is to find a least-cost path from the initial state to a goal state. Let $h^*(s)$ be the minimal cost over all paths from $s$ to any goal state. A heuristic is a function $h : S \to [0, \infty]$ that provides an estimate of $h^*$. A heuristic is admissible if $h(s) \le h^*(s)$ for all states $s \in S$ and consistent if $h(s) \le h(s') + c(s, s')$ for all pairs of states $s, s'$. The f-cost of a path is $f(\pi) = g(\pi) + h(\text{state}(\pi))$. Note that if $s = \text{state}(\pi)$ is a goal state and the heuristic is admissible, we must have $h(s) = 0$. We say that a search algorithm is a graph search if it eliminates duplicates of states generated by the algorithm; otherwise it is called a tree search.

2.1 Graph Search

With a consistent heuristic, f-costs along a path are non-decreasing, so a graph search algorithm must expand all states in the graph with $f(s) = g^*(s) + h(s) < C^*$. In this setting, A* has optimal behaviour [Dechter and Pearl, 1985]. When the heuristic is admissible but inconsistent, one could compare algorithms against the ideal number of nodes that A* would expand if the heuristic were made consistent. Unfortunately, not only does there exist no optimal algorithm for this case [Mérő, 1984], but it can even be shown that all algorithms may need to expand exponentially too many nodes in some cases (see supplementary material). Hence we focus our attention on the following relaxed notion of optimality. Let $n^*_G = |\{s : \min_{\pi : \text{state}(\pi) = s} \max_{s' \in \pi} f(s') \le C^*\}|$ be the number of states that can be reached by a path along which all states have f-cost at most $C^*$; this is the definition used by Martelli [1977]. Then there exist problems where A* performs up to $\Omega(2^{n^*_G})$ expansions [Martelli, 1977]. This limitation has been partially addressed with the B [Martelli, 1977] and B′ [Mérő, 1984] algorithms, for which the number of expansions is at most $O((n^*_G)^2)$. We improve on this result with a new algorithm for which the number of expansions is at most $O(n^*_G \log C^*)$.

2.2 Tree Search

Tree search algorithms work on the tree expansion of the state space, where every path from $s_{\text{init}}$ corresponds to a tree node. Consequently, states reached on multiple paths will be expanded multiple times. We say a path $\pi$ is necessarily expanded if $\max_{s' \in \pi} f(s') < C^*$ and possibly expanded if $\max_{s' \in \pi} f(s') \le C^*$. A tree search algorithm must always expand all necessarily expanded paths and will usually also expand some paths that are possibly but not necessarily expanded. To avoid the subtleties of tie-breaking, we discuss upper bounds in terms of possibly expanded paths, leaving a more detailed analysis for future work. We write $n^*_T$ for the number of possibly expanded paths in a tree search. In the worst case, IDA* may perform $\Omega((n^*_T)^2)$ expansions; a simple worked instance of this quadratic behaviour is sketched below. To mitigate this issue, algorithms such as IDA*_CR [Sarkar et al., 1991] and EDA* [Sharon et al., 2014] increase the f-cost bound more aggressively. These methods are effective when the growth of the tree is regular enough, but can fail catastrophically when the tree grows rapidly near the optimal f-cost, as will be observed in the experiments. We provide an algorithm that performs at most a logarithmic factor more expansions than $n^*_T$ and uses memory that is linear in the search depth.

Supplementary material. Detailed proofs of the various claims made in this paper are provided in the extended version [Helmert et al., 2019].
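To make the quadratic worst case concrete, consider (our illustrative instance, in the spirit of the Chain domain used in Section 10) a single chain of unit-cost edges with $h \equiv 0$ and the goal at depth $d$. IDA* raises the bound by exactly 1 per iteration, so the total work is quadratic:

```latex
% IDA* on a unit-cost chain of depth d with h = 0: iteration k uses
% bound k and expands the nodes at depths 0..k-1 before pruning.
\sum_{k=1}^{d} k \;=\; \frac{d(d+1)}{2} \;=\; \Omega(d^2) \;=\; \Omega\big((n^*_T)^2\big),
\qquad \text{since } n^*_T = d + 1 .
```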
Notation. The natural numbers are $\mathbb{N}_0 = \{0, 1, 2, \dots\}$ and $\mathbb{N}_1 = \{1, 2, 3, \dots\}$. For real-valued $x$ and $a$, let $x \vee a = \max\{x, a\}$ and similarly $x \wedge a = \min\{x, a\}$.

3 Abstract View

We now introduce a useful abstraction that allows us to treat tree and graph search in a unified manner. Tree search is used as a motivating example. The problem with algorithms like EDA* that aggressively increase the f-cost limit is the possibility of a significant number of wasted expansions once the f-cost limit is above $C^*$. The core insight of our framework is that this can be mitigated by stopping the search if the number of expansions exceeds a budget and increasing the budget slowly, in a careful manner.

A depth-first search with an f-cost limit and expansion budget reveals that either (a) the expansion budget was insufficient to search the whole tree with f-cost smaller than or equal to the limit, or (b) the expansion budget was sufficient. In the latter case, if the goal is found, then the algorithm can return a certifiably optimal solution. Furthermore, when the budget is insufficient, the largest f-cost of a node visited by the search serves as an upper bound on the largest f-cost for which the budget will be exceeded. When the budget is sufficient, the smallest f-cost in the fringe is a lower bound on the same. This information means that combining exponential search [Bentley and Yao, 1976] with repeated depth-first searches with a varying f-cost limit and fixed expansion budget can be used to quickly find a solution if the budget is sufficient to expand all nodes with f-cost less than $C^*$, and otherwise to produce a certificate that the budget is insufficient, a process we explain in detail in Section 5.

Based on this idea, the basic version of our new algorithm operates in iterations. Within each iteration the algorithm makes multiple depth-first searches with a fixed expansion budget and varying f-cost limits. An iteration ends once the algorithm finds the optimal solution and expands all paths with f-cost less than $C^*$, or once it can prove that the present expansion budget is insufficient to find the optimal solution. At the end of the iteration the expansion budget is doubled.

In what follows we abstract the search procedure into a query function that accepts as input an f-cost limit and an expansion budget, and either returns an interval that contains the smallest f-cost limit for which the budget is insufficient, or throws an exception with an optimal solution if the f-cost limit is at least $C^*$ and the expansion budget is larger than or equal to the number of nodes with f-cost less than the limit.

Formal model. We consider an increasing list $A$ of real numbers $v \ge 1$, possibly with repetition. Define a function $n : [1, \infty) \to \mathbb{N}_0$ by $n(C) = |\{v \in A : v \le C\}|$, where multiple occurrences are counted separately. Next, let $C^* \in A$ and $n^* = n(C^*)$. In our application to tree search, $A$ is the list of all node f values (including duplicates), $n(C)$ is the number of paths in the search tree whose f value is at most $C$, and $C^*$ is the cost of the optimal solution. We can require that all f values are at least 1 with no loss of generality: if $h(s_{\text{init}}) < 1$ we introduce an artificial new initial state with heuristic value 1 and an edge of cost $1 - h(s_{\text{init}})$ from the new state to $s_{\text{init}}$. This shifts all path costs by at most 1, so if $C^*$ is the original optimal solution cost, the new optimal solution cost is at most $C^* + 1$.

Query functions. Let $C_{\text{crit}}(b) = \min\{v \in A : n(v) > b\}$ be the smallest value in $A$ for which expansion budget $b$ is insufficient.
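Before defining the query functions, here is a tiny runnable check of these definitions (a Python sketch of ours, using the example list that also appears in Figure 1 below):

```python
import bisect, math

A = [1, 3, 5, 5, 8, 12, 13, 13, 15]      # sorted, duplicates allowed

def n(C):
    """Number of values in A that are <= C (duplicates counted separately)."""
    return bisect.bisect_right(A, C)

def C_crit(b):
    """Smallest v in A with n(v) > b, i.e. where expansion budget b fails."""
    return next((v for v in A if n(v) > b), math.inf)

print(n(10.2), C_crit(3))   # -> 5, 5   (cf. Figure 1 with b = 3, C = 10.2)
```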
We define three functions, query_lim, query_int and query_ext, all accepting as input an f-cost limit $C$ and an expansion budget $b$. A call to any of the functions makes at most $\min\{b, n(C)\}$ expansions and throws an exception with an optimal solution if $n^* \le n(C) \le b$. Otherwise all three functions return an interval containing $C_{\text{crit}}(b)$, on which we make different assumptions, as described next. The abstract objective is to make a query that finds an optimal solution using as few expansions as possible.

Limited feedback. In the limited feedback model the query function returns an interval that only provides information about whether the expansion budget $b$ was smaller or larger than $n(C)$:

$$\text{query}_{\text{lim}}(C, b) = \begin{cases} [C, \infty] & \text{if } C < C_{\text{crit}}(b) \quad \text{(budget sufficient)}, \\ [1, C] & \text{if } C \ge C_{\text{crit}}(b) \quad \text{(budget exceeded)}. \end{cases}$$

Integer feedback. In many practical problems the list $A$ contains only integers. In this case we consider the feedback model:

$$\text{query}_{\text{int}}(C, b) = \begin{cases} [\lfloor C \rfloor + 1, \infty] & \text{if } C < C_{\text{crit}}(b), \\ [1, C] & \text{if } C \ge C_{\text{crit}}(b). \end{cases}$$

The discrete nature of the returned interval means that if $C_{\text{crit}}(b) \in [C, C + 1]$, then

$$\text{query}_{\text{int}}(C, b) \cap \text{query}_{\text{int}}(C + 1, b) = \{C_{\text{crit}}(b)\}.$$

In other words, the exact value of $C_{\text{crit}}(b)$ can be identified by making queries on either side of an interval of unit width containing it. By contrast, query_lim cannot be used to identify $C_{\text{crit}}(b)$ exactly.

Extended feedback. For heuristic search problems the interval returned by the query function can be refined more precisely by using the smallest observed f-cost in the fringe and the largest f-cost of an expanded path. Define $A_{>}(C) = \min\{v \in A : v > C\}$ and $A_{\le}(C) = \max\{v \in A : v \le C\}$, and define $\delta(C) = A_{>}(C) - A_{\le}(C)$ and $\delta_{\min} = \min\{\delta(C) : C \le C^*, C \in A\}$. These concepts are illustrated in Fig. 1. In the extended feedback model, when the expansion budget is sufficient the response of the query is the interval $[A_{>}(C), \infty]$. Otherwise the query returns an interval $[1, v]$ where $v$ is any value in $A \cap [C_{\text{crit}}(b), C]$ (for example $v = A_{\le}(C)$):

$$\text{query}_{\text{ext}}(C, b) = \begin{cases} [A_{>}(C), \infty] & \text{if } C < C_{\text{crit}}(b), \\ [1, v], \ v \in [C_{\text{crit}}(b), C] \cap A & \text{if } C \ge C_{\text{crit}}(b). \end{cases}$$

In tree search the value of $v$ when $C \ge C_{\text{crit}}(b)$ is the largest f-cost over paths expanded by the search, which may depend on the expansion order. As for integer feedback, the information provided by extended feedback allows the algorithm to prove that an expansion budget is insufficient to find a solution.
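The three feedback models can be simulated directly on the abstract list (a sketch of ours; the solution-throwing case is omitted, and Section 4 implements the real queries on top of depth-first and uniform-cost search):

```python
import math

A = [1, 3, 5, 5, 8, 12, 13, 13, 15]              # the list from Figure 1
n = lambda C: sum(v <= C for v in A)             # n(C) as defined above

def query_lim(C, b):
    # Limited feedback: reveals only whether budget b suffices at limit C.
    return [C, math.inf] if n(C) <= b else [1, C]

def query_int(C, b):
    # Integer feedback: a sufficient budget tightens the lower end to floor(C)+1.
    return [math.floor(C) + 1, math.inf] if n(C) <= b else [1, C]

def query_ext(C, b):
    # Extended feedback: report the smallest fringe value A_>(C), or the
    # largest expanded value A_<=(C) when the budget is exceeded.
    if n(C) <= b:
        return [min((v for v in A if v > C), default=math.inf), math.inf]
    return [1, max(v for v in A if v <= C)]

print(query_lim(4, 3), query_ext(4, 3), query_ext(9, 3))
# -> [4, inf] [5, inf] [1, 8]   (matching the example queries in Figure 1)
```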
Summary of results. In the following sections we describe algorithms for all query models for which the number of node expansions is at most a logarithmic factor more than $n^*$. The limited feedback model is the most challenging and is detailed last, while the extended feedback model provides the cleanest illustration of our ideas. The logarithmic factor depends on $C^*$ and on $\delta_{\min}$ or $\delta(C^*)$. The theorems are summarized in Table 1, with precise statements given in the relevant sections.

Overview. In the next section we implement query_ext for tree search and for graph search (Section 4). We then introduce a variant of exponential search that uses the query function to find the critical cost for a given budget (Section 5). Our main algorithm (IBEX) uses the exponential search with a growing budget until an optimal solution is found (Section 6). The DovIBEX algorithm is then provided to deal with the more general limited feedback setting (Sections 7 and 8).

Figure 1: The function $n(\cdot)$, generated by $b = 3$, $C = 10.2$, $C^* = 15$ and $A = [1, 3, 5, 5, 8, 12, 13, 13, 15]$. $C_{\text{crit}}(b) = 5$ is the smallest value in $A$ for which there are more than $b = 3$ values of at most 5. The largest value at most $C$ in $A$ is $A_{\le}(C) = 8$, and $A_{>}(C) = 12$ is the next value in $A$. We also have $\delta_{\min} = 1$ and $\delta(C) = 4$. Example queries include: query_lim(C = 4, b = 3) = [4, ∞], query_ext(C = 4, b = 3) = [5, ∞] and query_ext(C = 9, b = 3) = [1, 8].

| Feedback model | Worst-case expansions |
|---|---|
| Limited feedback | $O(Z \log Z)$ with $Z = n^* \log \frac{C^*}{\delta(C^*)}$ |
| Extended feedback | $O\!\left(n^* \log \frac{C^*}{\delta_{\min}}\right)$ |
| Integer feedback | $O(n^* \log C^*)$ |

Table 1: Number of expansions in the worst case of our algorithms for the different types of feedback.

4 Reductions

We now explain how to reduce tree search and graph search to the abstract framework and implement query_ext for these domains. These query functions will be used in the next sections as part of the main algorithms. Recall that the number of expansions performed by query(C, ·) must be at most $n(C)$, that $n(\cdot)$ is non-decreasing, and that $n^* = n(C^*)$. The list $A$ is composed of the f-costs of the nodes encountered during the search. Recall from our problem definition that each node corresponds to a path $\pi$, and that expanding a node corresponds to expanding a state $s = \text{state}(\pi)$. For a search cost-bounded by $C$, let the fringe be all generated nodes with $f(\pi) > C$. Also, let the set of visited nodes be the generated nodes such that $f(\pi) \le C$.

Tree search. For tree search we implement query_ext in Algorithm 1, a variant of depth-first search with an f-cost limit $C$ and expansion budget $b$. The algorithm terminates before exceeding the expansion budget and tracks the smallest f-cost observed in the fringe and the largest f-cost of any visited node, which are used to implement the extended feedback model. Thus, for an f-cost bound $C$, $n(C) = |\{\pi : \max_{s' \in \pi} f(s') \le C\}|$. When an optimal solution is found, the algorithm throws an exception, which is expected to be caught and handled by the user. The function query_lim could be implemented like query_ext without tracking min_fringe and max_visited, but that would throw away valuable information.

Graph search. In graph search, query_ext is implemented in a similar way as in Algorithm 1, but DFS is replaced with graph_search, which appears in the supplementary material.
Algorithm 1 Query with extended feedback for tree search

```
def query_ext(C, budget):
    data.min_fringe = ∞         # will be > C
    data.max_visited = 0        # will be <= C
    data.expanded = 0           # number of expansions
    data.best_path = none       # f(none) = ∞
    try:
        DFS(C, budget, (s_init), data)
    catch "budget exceeded":
        return [1, data.max_visited]
    if data.best_path != none:  # solution found
        throw data              # to be dealt with by the user
    return [data.min_fringe, ∞]

# data: return info, passed by reference
# π: path
def DFS(C, budget, π, data):
    if f(π) > C:
        data.min_fringe = min(data.min_fringe, f(π))
        return
    data.max_visited = max(data.max_visited, f(π))
    if f(π) >= f(data.best_path):
        return                  # branch and bound
    if is_goal(state(π)):
        data.best_path = π
        # Here we could throw the solution if its
        # cost is equal to a known lower bound
        return
    if data.expanded == budget:
        throw "budget exceeded"
    data.expanded++
    for s' in succ(state(π)):
        DFS(C, budget, π + (s'), data)
```

The graph_search function is equivalent to BFIDA* [Zhou and Hansen, 2004] with the breadth-first search replaced by Uniform-Cost Search (UCS) [Russell and Norvig, 2009; Felner, 2011], using an f-cost limit $C$ on the generated nodes and tracking the maximum f-cost among visited states and the minimum f-cost in the fringe. As in UCS, states are processed in increasing g-cost order. Since $g$ is non-decreasing, states are not expanded more than once in each query. Therefore the number of expansions is at most $n(C) = |\{s : \min_{\pi : \text{state}(\pi) = s} \max_{s' \in \pi} f(s') \le C\}|$, and the number of expansions made by query(C*, ·) is at most $n(C^*) = n^*_G$, as required.

5 Exponential Search

With the reductions out of the way, we now introduce a budgeted variant of exponential search [Bentley and Yao, 1976], which is closely related to the bracketed bisection method [Press et al., 1992, §9]. Algorithm 2 accepts as input a budget $b$, an initial cost limit start $\le C_{\text{crit}}(b)$ and a function query ∈ {query_ext, query_int, query_lim}.

Algorithm 2 Exponential search with budgeted queries

```
def exp_search(start, b, query):
    low = start
    high = ∞
    repeat:
        if high == ∞:
            C = 2 · low              # exponential phase
        else:
            C = (low + high) / 2     # binary phase
        [low, high] = [low, high] ∩ query(C, b)
    until low == high
    return low
```

The algorithm starts by setting low = start and initiates an exponential phase where low is repeatedly doubled until query(2·low, b) reports an insufficient budget. The algorithm then sets high accordingly and performs a binary search on the interval [low, high] until low = high. See Fig. 2 for an illustration. The discrete structure in the integer and extended feedback models ensures that the algorithm halts after at most logarithmically many queries and returns $C_{\text{crit}}(b)$. In the limited feedback model the algorithm generally does not halt, but will make a terminating query if $b \ge n^*$. These properties are summarized in the next two propositions, which use the following definition:

$$n_{\exp}(\varepsilon, x, \Delta) = \left(1 + \left\lceil \log_2 \tfrac{x}{\varepsilon} \right\rceil\right) + \left(1 + \left\lfloor \log_2 \tfrac{x}{\Delta} \right\rfloor\right).$$

This is an upper bound on the number of calls to query needed when starting at start = $\varepsilon$, finding an upper bound high on $x$, and then reducing the interval [low, high] to a size of at most $\Delta$ (leading to a query within that interval). Recall that making a query with cost limit $C$ and expansion budget $b$ will find an optimal solution if $n^* \le n(C) \le b$.
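Algorithm 2 can be exercised end-to-end on the abstract model. A runnable sketch of ours (intervals are intersected componentwise, and the solution-throwing case of the query is omitted), reusing the Figure 1 list:

```python
import math

A = [1, 3, 5, 5, 8, 12, 13, 13, 15]           # the list from Figure 1
n = lambda C: sum(v <= C for v in A)

def query_ext(C, b):
    # Simulated extended-feedback query over the abstract list A.
    if n(C) <= b:                             # budget sufficient
        return (min((v for v in A if v > C), default=math.inf), math.inf)
    return (1, max(v for v in A if v <= C))   # budget exceeded

def exp_search(start, b, query):
    low, high = start, math.inf
    while low != high:
        C = 2 * low if high == math.inf else (low + high) / 2
        lo2, hi2 = query(C, b)
        low, high = max(low, lo2), min(high, hi2)   # intersect intervals
    return low

print(exp_search(1, 3, query_ext))   # -> 5, which is C_crit(3) for this A
```

With budget b = 3 the sketch queries C = 2 (sufficient, low becomes 3), C = 6 (exceeded, high becomes 5) and C = 4 (sufficient, low becomes 5 = high), returning C_crit(3) = 5.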
Proposition 1. Suppose $b \ge n^*$. Then for any feedback model, Algorithm 2 makes a query that terminates the interaction after at most $n_{\exp}(\text{start}, C^*, C_{\text{crit}}(b) - C^*)$ calls to query.

Proposition 2. Suppose $b < n^*$. Then, for the extended feedback model, Algorithm 2 returns $C_{\text{crit}}(b)$ with at most $n_{\exp}(\text{start}, C_{\text{crit}}(b), \delta_{\min})$ queries.

6 Iterative Budgeted Exponential Search

The Iterative Budgeted Exponential Search (IBEX) algorithm uses the extended query model, which is available in our applications to tree and graph search. Algorithm 3 initializes a lower bound on the optimal cost with the lowest value in the list, $C_1 = \min A$; we denote this quantity by $C_{\min}$. It subsequently operates in iterations $k \in \mathbb{N}_0$. In iteration $k$ it sets the budget to $b_k = 2^k$ and calls $C_{k+1} = \text{exp\_search}(C_k, b_k, \text{query\_ext})$ to obtain a better lower bound.

Theorem 3. The number of expansions made by Algorithm 3 is at most

$$4 n^* \, n_{\exp}(C_{\min}, C^*, \delta_{\min}) = O\!\left(n^* \log \frac{C^*}{\delta_{\min}}\right).$$

Figure 2: The four queries made by exp_search with the extended feedback model, start = 1.3 and budget b = 7. The successive intervals shown in the plot are [1.3, ∞] with C = 2.6; [2.9, ∞] with C = 5.8; [2.9, 5.0] with C = 4.0; [4.5, 5.0] with C = 4.8; and finally [4.5, 4.5]. The ticks below the x-axis indicate the elements of A = [1.4, 1.5, 1.8, 2.3, 2.9, 3.5, 3.6, 3.9, 4.5, 5, 6]. The small circles are the values of C in each query. In the first call to query_ext the budget was sufficient, so low is set to A>(C) = 2.9, which is doubled to produce the next query. In the second call, C = 5.8 leads to an insufficient budget and high is set to 5.0. In the third query, low is increased to 4.5, which is Ccrit(b). In the fourth query the budget is insufficient and the algorithm halts.

Proof. Define $r_1 = n_{\exp}(C_{\min}, C^*, \delta_{\min})$. Let $k^* = \lceil \log_2 n^* \rceil$ be the first iteration $k$ for which $b_k \ge n^*$. Proposition 2 shows that for iterations $k < k^*$ the number of queries performed in the call to exp_search is at most $n_{\exp}(C_k, C_{\text{crit}}(b_k), \delta_{\min}) \le r_1$. Proposition 1 shows that the game ends during iteration $k^*$ after a number of queries bounded by $n_{\exp}(C_{k^*}, C^*, C_{\text{crit}}(b_{k^*}) - C^*) \le r_1$. Since each call to query with budget $b_k$ expands at most $b_k = 2^k$ nodes, the total number of expansions is bounded by $\sum_{k=1}^{k^*} 2^k r_1 \le 2^{k^*+1} r_1 \le 2^{2 + \log_2 n^*} r_1 = 4 n^* r_1$.

Remark 4. Algorithm 3 also works when query_ext is replaced by query_int, in which case $\delta_{\min} = 1$. When used for graph search, we call the resulting IBEX variant Budgeted Graph Search (BGS), and for tree search, Budgeted Tree Search (BTS).

Remark 5. Observe that if DFS or graph_search is called with a budget of at least $n(C) \ge n^*$, it throws an optimal solution. Since the state space is finite, both BTS and BGS return an optimal solution if one exists. If no solution exists, BGS will exhaust the graph and return "no solution", but BTS may run forever unless additional duplicate detection is performed.

Algorithm 3 Iterative Budgeted Exponential Search

```
def IBEX():  # simple version
    C_1 = C_min
    for k = 1, 2, ...:
        b_k = 2^k
        C_{k+1} = exp_search(C_k, b_k, query_ext)
```
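For intuition, the whole simple IBEX loop can be run on the abstract model. A self-contained sketch of ours (the simulated query raises Solved where Algorithm 1 would throw the optimal path):

```python
import math

A = [1, 3, 5, 5, 8, 12, 13, 13, 15]      # abstract model of Figure 1; C* = 15
C_star = 15
n = lambda C: sum(v <= C for v in A)
n_star = n(C_star)                       # here n* = 9

class Solved(Exception):
    """Raised in place of 'throw data' in Algorithm 1."""

def query_ext(C, b):
    if n_star <= n(C) <= b:              # query proves optimality
        raise Solved(C_star)             # a real query throws the best path
    if n(C) <= b:                        # budget sufficient
        return (min((v for v in A if v > C), default=math.inf), math.inf)
    return (1, max(v for v in A if v <= C))

def exp_search(start, b, query):
    low, high = start, math.inf
    while low != high:
        C = 2 * low if high == math.inf else (low + high) / 2
        lo2, hi2 = query(C, b)
        low, high = max(low, lo2), min(high, hi2)
    return low

def ibex():
    C = min(A)                           # C_1 = C_min
    for k in range(1, 64):               # b_k = 2^k, doubled each iteration
        try:
            C = exp_search(C, 2 ** k, query_ext)
        except Solved as s:
            return s.args[0]

print(ibex())                            # -> 15
```

The budgets 2, 4, 8 each end with a proof of insufficiency (lower bounds 5, 8, 15), and the budget 16 covers all $n^* = 9$ values, at which point the query throws the solution.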
7 Uniform Budgeted Scheduler

While IBEX handles the integer and extended feedback models, it cannot handle the limited feedback model. That model is handled by a new algorithm, presented in the next section, which uses a finely balanced dovetailing idea that we call the Uniform Budgeted Scheduler (UBS, see Algorithm 4) and takes inspiration from Luby et al.'s (1993) speedup algorithm. UBS runs a growing and unbounded number of programs in a dovetailing fashion, for varying segments of steps. The notion of step is to be defined by the user; in heuristic search we take it to be a single node expansion. During one segment, the selected program can make arbitrary computations but must use no more steps than its current budget. Program $k$ halts when it reaches exactly $\tau_k$ steps, which may be infinite.

Algorithm 4 Uniform Budgeted Scheduler

```
def UBS(T, run_prog):
    q = make_priority_queue(T)
    q.insert((1, 1))                     # k=1, r=1
    while not q.empty():
        (k, r) = q.extract_min()         # remove prog of minimum T cost
        budget = T(k, r) - T(k, r - 1)
        if run_prog(k, budget) != "halted":
            q.insert((k, r + 1))
        if r == 1:
            q.insert((k + 1, 1))
    return none
```

UBS maintains a priority queue of pairs (program index $k$, segment number $r$), initialized with (1, 1) and ordered by a function $T : \mathbb{N}_1 \times \mathbb{N}_0 \to \mathbb{N}_0$ with $T(k, r) < T(k, r + 1)$ and $T(k, r) \le T(k + 1, r)$ for all $(k, r)$, and $T(k, 0) = 0$ for all $k$. In each iteration UBS removes the T-minimal element $(k, r)$ from the front of the queue and calls run_prog(k, b) with $b = T(k, r) - T(k, r - 1)$, which means that program $k$ is run for its $r$th segment with a budget of $b$ steps, leading for program $k$ to a total of at most $T(k, r)$ steps over the $r$ segments. If this does not cause program $k$ to halt, then $(k, r + 1)$ is added to the priority queue. Finally, if $r = 1$, then $(k + 1, 1)$ is added too. The function run_prog is defined by the user and may store and restore the state of program $k$ as well as allow access to a shared memory space.

The monotonicity assumptions on $T$ mean that UBS essentially executes program/segment pairs $(k, r)$ according to a Uniform Cost Search where a pair $(k, r)$ is a node in the search tree with cost $T(k, r)$ (Fig. 3). In this sense UBS tries (asymptotically) to maintain a uniform number of steps used among all non-halting programs. Let $T_{\text{UBS}}(k, r)$ be the number of steps used by UBS after executing $(k, r)$, and let $T_j(k, r) = \max\{T(j, m) : T(j, m) \le T(k, r), m \in \mathbb{N}_0\}$ be an upper bound on the number of steps used by program $j$ when UBS executes $(k, r)$.

Figure 3: An example tree used by UBS, with nodes $(k, r)$ for programs $k \in \{1, 2, 3\}$ and segments $r \in \{1, 2, 3\}$. Program $k = 2$ halts after $\tau_2 \le T(2, 2)$ steps.

Theorem 6. For any $(k, r)$ with $T(k, r - 1) < \tau_k$,

$$T_{\text{UBS}}(k, r) \;\le\; \sum_{j \in \mathbb{N}_1} \min\{\tau_j, T_j(k, r)\} \;\le\; T(k, r) \cdot \max\{j : T(j, 1) \le T(k, r)\}.$$

Example 7. A good choice is $T(k, r) = r 2^k$. Then Theorem 6 implies that $T_{\text{UBS}}(k, r) \le r 2^k (k + \lfloor \log_2 r \rfloor)$.
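A runnable UBS sketch with $T(k, r) = r 2^k$ (our illustration: run_prog only counts steps against hypothetical halting times τ, and max_cost is a demo cut-off of ours, needed because program 1 never halts):

```python
import heapq

def ubs(T, run_prog, max_cost=200):
    """Uniform Budgeted Scheduler sketch: run segment r of program k
    in order of increasing T(k, r)."""
    q = [(T(1, 1), 1, 1)]                    # entries are (priority, k, r)
    while q and q[0][0] <= max_cost:         # max_cost: demo cut-off only
        _, k, r = heapq.heappop(q)
        budget = T(k, r) - T(k, r - 1)
        if run_prog(k, budget) != "halted":
            heapq.heappush(q, (T(k, r + 1), k, r + 1))
        if r == 1:                           # spawn the next program
            heapq.heappush(q, (T(k + 1, 1), k + 1, 1))

T = lambda k, r: r * 2 ** k
tau = {1: float("inf"), 2: 5, 3: 9}          # hypothetical halting times
used = {}

def run_prog(k, budget):
    t = tau.get(k, float("inf"))
    used[k] = min(t, used.get(k, 0) + budget)   # use at most the budget
    return "halted" if used[k] >= t else "running"

ubs(T, run_prog)
print(used)   # steps stay roughly uniform across the non-halting programs
```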
8 DovIBEX: Limited Feedback

DovIBEX uses UBS to dovetail multiple instances of exponential search (see Algorithm 5). The algorithm can use any of the three queries, while still exploiting the additional information provided in the extended and integer feedback models. Following Example 7, we use $T(k, r) = r 2^k$ so that $T(k, r) - T(k, r - 1) = 2^k$. Program $k$ executes one iteration of the loop of exponential search with budget $b_k = 2^k$; if the budget is not entirely used during a segment, the segment ends early. In the limited feedback model, exponential search may continue halving the interval [Clow, Chigh] indefinitely and never halt. The scheduler solves this issue: by interleaving multiple instantiations of exponential search with increasing budgets, the total time can be bounded as a function of the time required by the first program $k^*$ that finds a solution.

Algorithm 5 Dovetailing IBEX

```
def run_prog(k, b):
    [Clow, Chigh] = get_state(k, default = [Cmin, ∞])
    if Chigh == ∞:                         # exponential phase
        C = 2 · Clow
    else:                                  # binary phase
        C = (Clow + Chigh) / 2
    [Clow, Chigh] = [Clow, Chigh] ∩ query(C, b)
    if Clow >= Chigh: return "halted"
    store_state(k, [Clow, Chigh])

def DovIBEX():
    UBS((k, r) ↦ r · 2^k, run_prog)
```

Theorem 8. For query ∈ {query_ext, query_int, query_lim}, the number of expansions made by Algorithm 5 is at most $Z \log_2 Z$ with $Z = 2 n^* \, n_{\exp}(C_{\min}, C^*, \delta(C^*))$, and also at most $O(Z' \log Z')$ with

$$Z' = \min_{b \ge n^*} b \, n_{\exp}(C_{\min}, C^*, C_{\text{crit}}(b) - C^*). \tag{1}$$

The dependency on $\delta(C^*)$ is gentler than the dependency on $\delta_{\min}$ in Theorem 3: the behaviour of DovIBEX only depends on the structure of the search space at the solution, rather than on the worst case over all iterations.

Remark 9. Sometimes there exists a $b > n^*$ for which $b \, n_{\exp}(C_{\min}, C^*, C_{\text{crit}}(b) - C^*) < n^* \, n_{\exp}(C_{\min}, C^*, \delta_{\min})$. In these cases the combination of UBS and exponential search can improve on Algorithm 3. Furthermore, when using extended feedback, programs $k < k^*$ will now halt if they can prove that they cannot find a solution. This allows us to provide the following complementary bound, which is only an additive $2 n^* r_2 \lceil \log_2 r_2 \rceil$ term away from Theorem 3.

Theorem 10. When query = query_ext, the number of expansions made by Algorithm 5 is at most $2 n^* (r_1 + r_2 (1 + \lceil \log_2 r_2 \rceil))$ with $r_1 = n_{\exp}(C_{\min}, C^*, \delta_{\min})$ and $r_2 = n_{\exp}(C_{\min}, C^*, \delta(C^*))$.

9 Enhancements

We now describe three enhancements for Algorithm 3 (see Algorithm 6). The enhancements use a modified query function called query⁺_ext that returns the same interval as query_ext together with the number of expansions, which is nused = min{b, n(C)}. For notational simplicity we write [x, y], nused = query⁺_ext(C, b).

First, in each iteration of the enhanced IBEX a query is performed with an infinite expansion budget and minimum cost limit Clow (Line 7). If the resulting number of expansions is at least 2b, where b is the current budget, then IBEX updates Clow, skips the exponential search and moves directly to the next iteration. If, furthermore, the DFS algorithm is given the lower bound $C_k$ and throws a solution if its cost is $C_k$, then this guarantees that IBEX performs exactly like IDA* in domains in which the number of expansions grows by at least a factor of 2 in each iteration. If the queries with infinite budget (Line 7) do not help skipping iterations, in the worst case they cost an additive $2 n^*$ expansions.

The second enhancement is an early stopping condition for proceeding to the next iteration. Algorithm 3 terminates an iteration once it finds Ccrit(b), which can be slow when δ(Ccrit(b)) is small. Algorithm 6 uses a budget window defined by 2b and αb, where α ≥ 2, so that whenever a query is made and the number of expansions is in the interval [2b, αb], the algorithm moves on to the next iteration (line 19). Hence iteration k ends when a query is made within budget with a cost within [Ccrit(2b), Ccrit(αb)].

The third enhancement is the option of using an additive variant of the exponential search algorithm, which increases Clow by increments of $2^j$ at inner iteration $j$ during the exponential phase (line 12).
This variant is based on the assumption that costs increase linearly when the budget doubles, which often happens in heuristic search when the search space grows exponentially with the depth. These enhancements can also be applied to DovIBEX (see the supplementary material).

Algorithm 6 Enhanced IBEX

```
1   # α: factor on budget, which must be >= 2
2   # is_additive: True if using additive search
3   def IBEX_enhanced(α=8, is_additive):
4       Clow = Cmin
5       b = 1
6       for k = 1, 2, ...:
7           [Clow, Chigh], nused = query⁺_ext(Clow, ∞)
8           if nused < 2b:
9               for j = 1, 2, ...:
10                  if Chigh == ∞:             # exponential phase
11                      if is_additive:
12                          C = Clow + 2^j
13                      else:
14                          C = Clow · 2
15                  else:                       # binary phase
16                      C = (Clow + Chigh) / 2
17                  [C'low, C'high], nused = query⁺_ext(C, α · b)
18                  [Clow, Chigh] = [Clow, Chigh] ∩ [C'low, C'high]
19                  if (C'high == ∞ and nused >= 2b)
20                          or Clow == Chigh:
21                      break
22          b = max(2b, nused)
```

10 Experiments

We test¹ IBEX (BTS, enhanced), DovIBEX (DovBTS, enhanced), IDA* [Korf, 1985], IDA*_CR [Sarkar et al., 1991] and EDA* [Sharon et al., 2014]. EDA*(γ) is a variant of IDA* designed for polynomial domains that repeatedly calls DFS with unlimited budget and a cost threshold of $\gamma^k$ at iteration $k$. In our experiments we take γ ∈ {2, 1.01}. IDA*_CR behaves similarly, but adapts the next cost threshold by collecting the costs of the nodes in the fringe into buckets and selecting the first cost that is likely to expand at least $b_k$ nodes in the next iteration. Our implementation uses 50 buckets and sets $b_k = 2 b_{k-1}$. The number of nodes (states) of cost strictly below $C^*$ is reported as $n^<_T$ ($n^<_G$).

¹All these algorithms are implemented in C++ in the publicly available HOG2 repository, https://github.com/nathansttt/hog2/.

These algorithms are tested for tree search on the 15-Puzzle [Doran and Michie, 1966] with the Manhattan distance heuristic, both with unit costs and with varied edge costs of 1 + 1/(t + 1) to move tile t; on (12, 4)-TopSpin [Lammertink, 1989] with random action costs between 40 and 60 and the maximum of three 4-tile pattern database heuristics; on long chains (branching factor of 1 and unit edge costs, solution depth in [1..10 000]); and on a novel domain, which we explain next.

In order to evaluate the robustness of the search algorithms, we introduce the Coconut problem, a domain with varied branching factor and small solution density. The heuristic is 0 everywhere, except at the root where h = 1. At each node there are 3 actions, {1, 2, 3}. The solution path follows the same action (sampled uniformly in [1..3]) for D steps, then follows a random path of length q sampled uniformly, where D is sampled uniformly in [1..10 000] and q is sampled from a geometric distribution of parameter 1/4. The first action costs 1. At depth less than D, taking the same action as at the parent node costs 1, while taking another action costs 2D. At depth larger than D, each action costs 1/10.

10.1 Results

We use 100 instances for each problem domain, a time limit of 4 hours for the 15-Puzzle and TopSpin, 1 hour for the Coconut problem and no limit for the Chain problem. The results are shown in Table 2.
| Algorithm | α | add? | 15-Puzzle (unit) | 15-Puzzle (real) | (12,4)-TopSpin | Chain | Coconut |
|---|---|---|---|---|---|---|---|
| BTS | 2 | y | 100 / 242.5 | 97 / 3 214.1 | 100 / 1 521.9 | 100 / 302.0 | 100 / 72.9 |
| BTS | 8 | y | 100 / 242.5 | 100 / 673.1 | 100 / 597.0 | 100 / 198.2 | 100 / 84.7 |
| BTS | 2 | n | 100 / 242.5 | 97 / 3 549.1 | 100 / 1 600.0 | 100 / 111.8 | 100 / 58.5 |
| BTS | 8 | n | 100 / 242.5 | 99 / 1 320.3 | 100 / 614.6 | 100 / 26.7 | 100 / 86.8 |
| DovBTS | 2 | y | 100 / 390.5 | 100 / 1 087.7 | 100 / 1 083.1 | 100 / 287.2 | 100 / 107.4 |
| DovBTS | 8 | y | 100 / 322.0 | 100 / 767.4 | 100 / 606.1 | 100 / 125.9 | 100 / 1 136.0 |
| DovBTS | 2 | n | 100 / 322.0 | 98 / 2 355.2 | 100 / 1 145.1 | 100 / 33.8 | 100 / 121.8 |
| DovBTS | 8 | n | 100 / 474.6 | 100 / 2 432.2 | 100 / 590.5 | 100 / 24.9 | 100 / 2 882.9 |
| EDA* (γ = 2) | | | 99 / 5 586.0 | 100 / 2 882.3 | 100 / 807.5 | 100 / 19.4 | 3 / 554 249.0 |
| EDA* (γ = 1.01) | | | 100 / 1 023.8 | 100 / 742.0 | 100 / 730.2 | 100 / 990.2 | 10 / 528 137.9 |
| IDA*_CR | | | 100 / 868.4 | 100 / 700.6 | 100 / 346.0 | 100 / 988.3 | 3 / 516 937.7 |
| IDA* | | | 100 / 242.5 | 57 / 62 044.3 | 100 / 2 727.5 | 100 / 162 129.0 | 100 / 5 484.2 |
| $n^<_T$ | | | 100 / 100.8 | 100 / 258.1 | 100 / 35.8 | 100 / 4.9 | 100 / 2.7 |

Table 2: Results on tree search domains. Each domain has 100 instances; each cell shows instances solved / average expansions (×10⁶ for the 15-Puzzle columns, ×10³ for TopSpin, ×10⁴ for Chain and Coconut). Expansions are averaged over solved tasks only (except Coconut). BTS is the tree-search implementation of the IBEX framework; add? is the is_additive parameter.

We also report results for α = 2 to show the gain in efficiency when using a budget factor window of [2, 8] instead of the narrower window [2, 2] (see Section 9). IBEX (BTS) and DovIBEX (DovBTS) with α = 8 are the only algorithms that are robust across all domains while remaining competitive on all of them, whereas every other algorithm tested fails hard on at least one domain. IBEX (BTS) has exactly the same behaviour as IDA* when the number of expansions grows by at least a factor of 2 at each call to query with infinite budget; see 15-Puzzle (unit). Taking is_additive=y helps on exponential domains, whereas is_additive=n helps on polynomial domains, as expected.

To explain the behaviour of IDA*_CR and EDA* on the Coconut problem, consider a randomly chosen instance where D = 2 690 and q = 6. The cost set by EDA* (γ = 2) in the last iteration is 4 096. Beyond depth D each action costs 1/10, so the cost slack of 4 096 − 2 690 = 1 406 allows roughly 14 060 extra steps, resulting in a search tree with approximately $3^{(4096 - 2690)/0.1} \approx 10^{6700}$ nodes. The same issue arises for IDA*_CR, and EDA*(1.01) performs only marginally better. This is not a carefully selected example: such behaviour occurs on almost all Coconut instances.

Finally, we evaluate our algorithms on graph search problems with inconsistent heuristics, parameterizing Mérő's (1984) graph by d to have 2d + 2 states (see supplementary material). All states have a heuristic of 0, except each state $t_i$, which has heuristic $d + i - 1$. A*, B [Martelli, 1977] and B′ [Mérő, 1984] are all expected to perform $O(n_G^2)$ expansions on this graph. The results are in Table 3.

| Algorithm | α | add? | d = 100 Exp. | d = 1 000 Exp. | d = 10 000 Exp. | Time (s) |
|---|---|---|---|---|---|---|
| BGS | 2 | y | 2 592 | 35 478 | 752 392 | 0.4 |
| BGS | 8 | y | 1 276 | 22 275 | 312 497 | 0.2 |
| BGS | 2 | n | 2 429 | 26 030 | 513 573 | 0.4 |
| BGS | 8 | n | 513 | 8 821 | 84 434 | 0.1 |
| DovBGS | 2 | y | 2 195 | 31 862 | 564 720 | 0.1 |
| DovBGS | 8 | y | 1 495 | 15 757 | 189 883 | 0.1 |
| DovBGS | 2 | n | 1 547 | 12 987 | 185 500 | 0.1 |
| DovBGS | 8 | n | 449 | 4 017 | 36 093 | 0.1 |
| A*/B/B′ | | | 7 652 | 751 502 | 75 015 002 | 22.4 |
| $n^<_G$ | | | 200 | 2 000 | 20 000 | 0.0 |

Table 3: Results for inconsistent heuristics in graph search. BGS is the graph-search implementation of the IBEX framework.

While A* shows quadratic growth in the number of expansions, our algorithms exhibit near-linear performance, as expected.

11 Conclusion

We have developed a new framework called IBEX that combines exponential search with an increasing node expansion budget to resolve two long-standing problems in heuristic search. The resulting algorithms for tree and graph search improve existing guarantees on the number of expansions from $\Omega(n^2)$ to $O(n \log C^*)$. Our algorithms are fast and practical.
They significantly outperform existing baselines in known failure cases while being at least as good, if not better, on traditional domains; hence, for tree search we recommend using our algorithms instead of IDA*. On graph search problems our algorithms outperform A*, B and B′ when the heuristic is inconsistent, and pay only a small log C* factor otherwise. We also expect the IBEX framework to be able to tackle re-expansion problems of other algorithms, such as Best-First Levin Tree Search [Orseau et al., 2018] and Weighted A* [Chen et al., 2019].

The IBEX framework and algorithms have potential applications beyond search, in domains that exhibit a dependency between a parameter and the amount of work (computation steps, energy, etc.) required to either succeed or fail. Some of these applications may not be well suited to the extended feedback model, which further justifies the interest in the analysis of the limited feedback model.

Acknowledgements

This research was enabled in part by a sabbatical grant from the University of Denver and by Compute Canada (www.computecanada.ca). Many thanks to Csaba Szepesvári, András György, János Kramár, Roshan Shariff, Ariel Felner, and the reviewers for their feedback.

References

[Akagi et al., 2010] Yuima Akagi, Akihiro Kishimoto, and Alex Fukunaga. On transposition tables for single-agent search and planning: Summary of results. In Proceedings of the 3rd Annual Symposium on Combinatorial Search (SoCS 2010), pages 2–9, 2010.

[Bentley and Yao, 1976] Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Letters, 5(3):82–87, 1976.

[Burns and Ruml, 2013] Ethan Burns and Wheeler Ruml. Iterative-deepening search with on-line tree size prediction. Annals of Mathematics and Artificial Intelligence, 69(2):183–205, 2013.

[Chen et al., 2019] Jingwei Chen, Nathan R. Sturtevant, William Doyle, and Wheeler Ruml. Revisiting suboptimal search. In Proceedings of the 12th Annual Symposium on Combinatorial Search (SoCS 2019), 2019.

[Dechter and Pearl, 1985] Rina Dechter and Judea Pearl. Generalized best-first search strategies and the optimality of A*. Journal of the ACM, 32(3):505–536, 1985.

[Doran and Michie, 1966] James E. Doran and Donald Michie. Experiments with the graph traverser program. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 294(1437):235–259, 1966.

[Felner, 2011] Ariel Felner. Position paper: Dijkstra's algorithm versus uniform cost search or a case against Dijkstra's algorithm. In Proceedings of the 4th Annual Symposium on Combinatorial Search (SoCS 2011), pages 47–51, 2011.

[Hart et al., 1968] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.

[Hatem et al., 2015] Matthew Hatem, Scott Kiesel, and Wheeler Ruml. Recursive best-first search with bounded overhead. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015), pages 1151–1157, 2015.

[Hatem et al., 2018] Matthew Hatem, Ethan Burns, and Wheeler Ruml. Solving large problems with heuristic search: General-purpose parallel external-memory search. Journal of Artificial Intelligence Research, 62:233–268, 2018.

[Helmert et al., 2019] Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, and Nathan R. Sturtevant.
Iterative budgeted exponential search. CoRR, 2019.

[Holte, 2010] Robert C. Holte. Common misconceptions concerning heuristic search. In Proceedings of the 3rd Annual Symposium on Combinatorial Search (SoCS 2010), pages 46–51, 2010.

[Korf, 1985] Richard E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27(1):97–109, 1985.

[Korf, 1993] Richard E. Korf. Linear-space best-first search. Artificial Intelligence, 62(1):41–78, 1993.

[Lammertink, 1989] Ferdinand Lammertink. Puzzle or game having token filled track and turntable. US Patent 4,871,173, October 3, 1989.

[Luby et al., 1993] Michael Luby, Alistair Sinclair, and David Zuckerman. Optimal speedup of Las Vegas algorithms. Information Processing Letters, 47(4):173–180, September 1993.

[Martelli, 1977] Alberto Martelli. On the complexity of admissible search algorithms. Artificial Intelligence, 8(1):1–13, 1977.

[Mérő, 1984] László Mérő. A heuristic search algorithm with modifiable estimate. Artificial Intelligence, 23(1):13–27, 1984.

[Orseau et al., 2018] Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, and Theophane Weber. Single-agent policy tree search with guarantees. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NeurIPS 2018), pages 3205–3215, 2018.

[Orseau et al., 2019] Laurent Orseau, Tor Lattimore, and Levi H. S. Lelis. Zooming cautiously: Linear-memory heuristic search with node expansion guarantees. CoRR, abs/1906.03242, 2019.

[Press et al., 1992] William H. Press, Saul A. Teukolsky, William Vetterling, and Brian P. Flannery. Numerical Recipes in C. Cambridge University Press, 1992.

[Russell and Norvig, 2009] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.

[Sarkar et al., 1991] U. K. Sarkar, P. P. Chakrabarti, S. Ghose, and S. C. De Sarkar. Reducing reexpansions in iterative-deepening search by controlling cutoff bounds. Artificial Intelligence, 50(2):207–221, 1991.

[Sharon et al., 2014] Guni Sharon, Ariel Felner, and Nathan R. Sturtevant. Exponential deepening A* for real-time agent-centered search. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014), pages 871–877, 2014.

[Sturtevant and Helmert, 2019] Nathan R. Sturtevant and Malte Helmert. Exponential-binary state-space search. CoRR, abs/1906.02912, 2019.

[Wah and Shang, 1994] Benjamin W. Wah and Yi Shang. Comparison and evaluation of a class of IDA* algorithms. International Journal on Artificial Intelligence Tools, 3(4):493–524, 1994.

[Zhou and Hansen, 2004] Rong Zhou and Eric A. Hansen. Structured duplicate detection in external-memory graph search. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI 2004), pages 683–689, 2004.