# lookahead_with_minibucket_heuristics_for_mpe__7646fc70.pdf

Look-Ahead with Mini-Bucket Heuristics for MPE

Rina Dechter, Kalev Kask, William Lam University of California, Irvine Irvine, California, USA

Javier Larrosa UPC Barcelona Tech Barcelona, Spain

The paper investigates the potential of look-ahead in the context of AND/OR search in graphical models using the Mini Bucket heuristic for combinatorial optimization tasks (e.g., MAP/MPE or weighted CSPs). We present and analyze the complexity of computing the residual (a.k.a. Bellman update) of the Mini-Bucket heuristic and show how this can be used to identify which parts of the search space are more likely to beneﬁt from look-ahead and how to bound its overhead. We also rephrase the look-ahead computation as a graphical model, to facilitate structure exploiting inference schemes. We demonstrate empirically that augmenting Mini-Bucket heuristics by look-ahead is a cost-effective way of increasing the power of Branch-And-Bound search.

Introduction Look-ahead is known to be useful and even essential in the context of online search algorithms (e.g., game playing schemes such as alpha-beta, planning under uncertainty (Geffner and Bonet 2013; Vidal 2004) where the search space is enormous and traversing it to completion is out of the question. Look-ahead can improve the heuristic function h of a node by expanding the search tree below it to a certain depth d, evaluate the static h at tip nodes and roll back this information to n (known as a Bellman update). Here we investigate the potential of look-ahead (Bu et al. 2014) in the different context of complete search. We consider the min-sum problem over a Graphical model (which includes, among others, the most probable explanation task of Markov Networks (Pearl 1988)). This problem is usually solved by depth-ﬁrst search Branch-and-Bound (DFBn B). The search is guided by a heuristic function that provides an optimistic bound on the cost of the best extension of any partial assignment. It is often observed that these types of algorithms spend a signiﬁcant time proving the optimality of their output (where the only use of h is pruning). Our framework is AND/OR Branch-And-Bound (AOBB) guided by the static Mini-Bucket Elimination heuristic (MBE), which, assisted by variational cost-shifting methods, is one of the best current approaches for the min-sum problem (Marinescu and Dechter 2009; L. Otten and Dechter 2012). However, this heuristic can be weak, (e.g., mainly for

Copyright c 2016, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

problems having a high induced width). A natural approach is to improve it via dynamic look-ahead during search itself. In the context of depth-ﬁrst Bn B a better heuristic may i) allow ﬁnding an optimal solution early with less search (and thus the best upper bound for pruning), and ii) once the optimal bound is reached, it will help proving optimality by better pruning of the search space. What we show in this paper is that in the context of graphical models and when using the mini-bucket heuristic, we can bound the computation of d-level look-ahead sufﬁciently, making it cost effective. This is accomplished by i) exploiting a relationship between the residual of the heuristic and a new concept of bucket error that can be pre-compiled, to facilitate a selective look-ahead process combined with effective pruning of the d-level lookahead tree whenever it is applied, and ii) using an inference-based message-passing algorithm to compute the look-ahead, thus exploiting the graphical model structure at each node. In Section 2, we provide background. Section 3 introduces the notions of residual and local bucket-error of the minibucket heuristic that captures the mini-bucket approximation error. Section 4 presents the experiments and discussion. Section 5 concludes the paper.

Graphical Models A graphical model is a tuple M = (X, D, F), where X = {Xi : i V } is a set of variables indexed by a set V and D = {Di : i V } is the set of ﬁnite domains of values for each Xi, F is a set of discrete functions. In this paper we focus on the min-sum problem, C = minx

f F f(x) wich is applicable in a wide range of reasoning problems (e.g, MPE/MAP). The scope of f (i.e. the subset of X relevant to f) is noted scope(f). We will use lower case y to denote the assignment of a set of variables Y X. Thus, f(x, Y ) is a function with scope X Y partially assigned by x. In an abuse of notation we will sometimes write f(y) with scope(f) Y assuming that the irrelevant parts of the assignment will be ignored.

Primal graph, pseudo-tree. The primal graph G of a graphical model M has each variable Xi in a node and an edge (Xi, Xj) is in G iff the pair of variables appears in the scope of any f F. A pseudo-tree T = (V, E ) of the

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

(a) A primal graph of a graphical model over 8 variables. We assume pair-wise functions.

(b) A pseudo-tree T. Grayed area is TB,1

f(B,C) C f(B,F) F

f(A,G) f(F,G) G f(B,E) f(C,E) E f(B,D) f(C,D) D

mini-buckets

(c) Mini-bucket tree with i-bound of 3.

(d) The AND/OR context minimal search graph.

(e) Depth 2, look-ahead tree from B. Minimal pruned tree in grayed area.

(f) Depth 2, look-ahead search space from B. Minimal pruned search space in grayed area.

primal graph G = (V, E) is a rooted tree over the same set of vertices V such that every edge in E E connects a node to one of its ancestors in T.

Notation. Let T be a pseudo-tree. Xp denotes Xp and its ancestors in T. ch(Xp) denote the children of Xp in T. The subtrees rooted by each ch(Xp) is denoted Tp. The same subtrees of Tp pruned below depth d is denoted Tp,d. We will often abuse notation and will consider a tree as the set of its nodes, or its variables.

Example. Figure 1a displays the primal graph of a graphical model with 10 binary cost functions, each one corresponds to an edge in the graph. Figure 1b displays a pseudotree. The ancestors of C are A and B. Its children are D and E. The grayed area corresponds to TB,1.

AND/OR search. A state-of-the-art method to solve reasoning tasks on graphical models is to search their AND/OR graph (Marinescu and Dechter 2009). The AND/OR search tree, which is guided by a pseudo-tree T of the primal graph G, consists of alternating levels of OR and AND nodes. Each OR node is labeled with a variable Xp X. Its children are AND nodes, each labeled with an instantiation xp of Xp. The weight of the edge, w(Xp, xp), is the sum of costs of all the functions f F that are completely instantiated at xp but are not at its parent Xp. Children of AND nodes are OR nodes, labeled with the children of Xp in T. Each child represents a conditionally independent subproblems given assignments to their ancestors. Those edges are not weighted. The root

of the AND/OR search tree is the root of T. The path from an AND node labeled xp to the root corresponds to a partial assignment to Xp denoted xp. A solution tree in the AND/OR tree is a subtree that (1) contains its root , (2) if an OR node is in the solution subtree, then exactly one of its children is in the solution tree, (3) if an AND node is in the solution tree, then all its children are. A solution tree corresponds to a complete assignment and its cost is the sum-cost of its arc weights. The AND/OR search tree can be converted into a graph by merging nodes that root identical subproblems. It was shown that identical subproblems can be identiﬁed by their context (a partial instantiation that decomposes the subproblem from the rest of the problem graph), yielding an AND/OR search graph (Marinescu and Dechter 2009).

Example. Figure 1d displays the AND/OR search graph of the running example. A solution tree, corresponding to the assignment (A = 0, B = 1, C = 1, D = 0, E = 0, F = 0, G = 0), is highlighted.

AND/OR Branch-And-Bound (AOBB) is a state-of-theart algorithm for solving combinatorial optimization problems over graphical models (Marinescu and Dechter 2009). It explores the set of (partial) assignments by traversing the weighted AND/OR context-minimal search graph in a depthﬁrst manner, while keeping track of the current best solution (upper bound, ub) of the subtree rooted at each node xp, denoted ub( xp). Each AND node has an associated heuristic value h( xp) which is a lower bound on the best cost

extension of xp to its descendents in Tp. Then whenever h( xp) ub( xp), search below xp can be safely pruned. The heuristic is also used to guide the order in which OR nodes and AND nodes are expanded (namely, in which order independent subproblems are considered and in which order values for the current variable are considered).

Mini-Bucket Elimination Heuristic. Current implementations of AOBB (Kask and Dechter 2001; Marinescu and Dechter 2009) are guided by the mini-bucket heuristic. This heuristic is based on Mini-Bucket Elimination and is called MBE(i) where i, called i-bound, allows trading off preprocessing time and space for heuristic accuracy with actual search. It belongs to the class of static heuristics, meaning that it is pre-processed before AOBB starts. MBE(i) works relative to the same pseudo-tree T which deﬁnes the AND/OR search graph. Each node Xp of T is associated with a bucket Bp that includes a set of functions. The scope of a bucket is the union of scopes of all its functions before it is processed as described next. First, each function f of the graphical model is placed in a bucket Bp if Xp is the deepest variable in T s.t. Xp scope(f). Then MBE(i) processes the buckets of the pseudo-tree from leaves towards the root. Processing a bucket may require partitioning the bucket s functions into mini-buckets Bp = r Br p, where each Br p includes no more than i variables. Then, processing each mini-bucket separately, all its functions are combined (by sum in our case) and the bucket s variable (Xp) is eliminated (by min in our case). Each resulting new function, also called a message, is placed in the closest ancestor bucket whose variable is contained in its scope. MBE(i) is time and space exponential in the i-bound.

Bucket Elimination (BE). In the special case in which the i-bound is high enough so there is no partitioning into minibuckets, the algorithm is the exact bucket-elimination (BE) scheme (Dechter 2013). The time and space complexity of BE is exponential in the size of the largest bucket scope encountered which is called the induced width and is denoted w , for that ordering.

Notation. In the following, fp denotes an original function placed in Bucket Bp (if there is more than one, they are all summed into a single function), and λj p denotes a message created at bucket Bj and placed in bucket Bp. Processing bucket Bp produces messages λp i for some ancestors Xi of Xp (i.e, Xi Xp Xp).

Example. Figure 1c illustrates the execution of MBE(3) in our running example by means of the so-called minibucket tree (nodes are buckets and tree-edges show message exchanges). In this example, bucket BD is the only one that needs to be partitioned into mini-buckets. Each mini-bucket generates a message. In the ﬁgure, messages are displayed along an edge to emphasize the bucket where they are generated and the bucket where they are placed. For instance, λD C is generated in bucket D (as λD C = min D{f(B, D) + f(B, D)}) and placed in bucket C.

Extracting the mini-bucket heuristic (Kask and Dechter

1999). The (static) mini-bucket heuristic used by AOBB requires a MBE(i) execution prior to search keeping all the message functions. Let xp be a partial assignment. Λj denotes the sum of the messages sent from bucket Bj to all of the instantiated ancestor variables,

Xq Xp λj q( xp) (1)

The heuristic value at node xp is,

Xj Tp Λj( xp) (2)

Example. In our example, the heuristic function of partial assignment (A = 0, B = 1) is h(A = 0, B = 1) = λD A(A = 0) + λC B(B = 1) + λF B(A = 0, B = 1)

AND/OR Look-Ahead OR Look-Ahead In the ﬁeld of heuristic search the general task is to ﬁnd the least cost path in a weighted directed graph from one initial node to one of the possibly many goal nodes. The search is guided by an admissible heuristic function h(n) which underestimates the least cost path from n to a goal state. More accurate h(n) decreases the expanded search space (Nilsson 1980). If h is not accurate enough it can be improved by means of a look-ahead, a technique useful in game playing and in planning with uncertainty, especially for anytime solvers (Geffner and Bonet 2013; Vidal 2004). DEFINITION 1 (look-ahead, residual). Let ch(n) be the children of node n in the search graph and h be a heuristic function. The d look-ahead of n is,

hd(n) = min ni ch(n){w(n, ni) + hd 1(ni)}

where w(n, ni) is the weight of the arc, and h0 = h. The look-ahead residual, deﬁned by,

res(n) = h1(n) h(n)

measures the error of h with respect to the next level.

AND/OR Look-Ahead. Focusing now on AND/OR search for graphical models, let n be an AND node labeled by xp terminating a partial assignment xp. Extending the deﬁnition of look-ahead to the next level of AND nodes (i.e, two levels in the AND/OR graph) we get:

Xq ch(Xp) min xq {w(Xq, xq) + hd 1( xq)} (3)

Specializing to the MBE heuristic we can show PROPOSITION 1. If h denotes the mini-bucket heuristic, xp is a partial assignment and x p,d is an extension of the assignment to all the variables in Tp,d

hd( xp) = h( xp) + Ld( xp)

Xk Tp,d Λk( xp) (4)

Ld( xp) = min x p,d {

Xk Tp,d [fk( xp, x p,d) +

Xj Tp Tp,d λj k( xp, x p,d)]}

(5) Proof. (sketch) The reformulation is obtained after expanding the recursive deﬁnition of lookahead (eq. 3) and replacing the generic heuristic by the MBE heuristic (eq. 2 and 1)

Ld( xp) is a min-sum problem over a graphical model, as deﬁned next. DEFINITION 2 (look-ahead graphical model). Given a graphical model M = (X, D, F) and messages generated by MBE(i) along pseudo-tree T, M( xp, d) is the depth-d look-ahead graphical model at xp. It has as variables the set {Xk| Xk Tp,d} and has functions {fk(X, xp)| Xk Tp,d} {λj k(X, xp)| Xj Tp Tp,d, Xk Tp,d}. Clearly, Tp,d is a valid pseudo-tree of M( xp, d), that we call the look-ahead tree. The induced width of M( xp, d) is noted lwp,d.

Example. In Figure 1, the depth 1 look-ahead graphical model relative to assignment (A = 0, B = 1) contains variables {C, F}, and the functions, {f(B = 1, C), f(B = 1, F), λD C(B = 1, C), λE C(B = 1, C), λG F (A = 0, F)}, whose induced width is 1. Figure 1e displays the depth 2 look-ahead tree at variable B. Clearly, lwd min{d, w } and the bound is tight. In practice, only shallow look-aheads may be feasible because the complexity of solving the look-ahead graphical model is exponential in lwd. A main source for improving the efﬁciency in our look-ahead is using effective variable elimination algorithms to compute the look-ahead at each node. We next focus on the main method for bounding the look-ahead overhead using the notion of bucket error.

Bounding the Look-Ahead Overhead Bucket Error The residual measures the error of one level look-ahead. For the mini-buckets heursitic the error is originated by the minibucket partitioning. In the following we will compare the message a bucket would have computed without partitioning (called exact bucket message and noted μ k) against the messages computed by mini-buckets of the bucket (called combined mini-bucket message and noted μk). The difference between the two captures the error introduced by partitioning the bucket. DEFINITION 3 (bucket and mini-bucket messages at Bk). Given a mini-bucket partition Bk = k Br k, we deﬁne the combined mini-bucket message at Bk,

f Br k f) (6)

In contrast, the exact message that would have been generated with no partitioning at Bk is,

μ k = min Xk

Notice that while μ k is exact for Bk it may contain partitioning errors introduced in earlier buckets. DEFINITION 4 (local bucket-error at Bk). Given a run of MBE, the local bucket error function at Bk denoted Errk is,

Errk = μ k μk (8)

The scope of Errk is a subset of Xk Xk

Example. In an MBE(3) execution as in Figure 1c the bucket error of D is Err D = min D[f(A, D) + f(B, D) + f(C, D)] - (λD A(A) + λD C(B, C)). Algorithm 1 (BEE) computes the bucket errors. Following MBE(i), it executes a second pass from leaves to root along the pseudo-tree. When considering bucket Bk, it computes μk, μ k and Errk. The complexity is exponential in the scope of the bucket after the MBE(i) execution. The total cost is dominated by the largest scope. This number is captured by a new parameter that we call pseudo-width.

Algorithm 1: Bucket Error Evaluation (BEE)

Input: A Graphical model M = (X, D, F), a pseudo-tree T, i-bound Output: The error function Errk for each bucket Initialization: Run MBE(i) w.r.t. T. for each Bk, Xk X do

Let Bk = k Br k be the partition used by MBE(i) μk =

f Br k f) μ k = min Xk

f Bk f Errk μ k μk return Err functions

DEFINITION 5 (pseudo-width(i)). Given a a run of MBE(i) along pseudo-tree T, the pseudo-width of Bj, psw(i) j is the number of variables in the bucket at the moment of being processed. The pseudo-width of T relative to MBE(i) is denoted psw(i) = maxj psw(i) j .

THEOREM 1 (complexity of BEE). The complexity of BEE is O(n kpsw(i)), where n is the number of variables, k bounds the domain size and psw(i) is the pseudo-width along T relative to MBE(i). The pseudo-width lies between the width and the induced width w of the ordering, and it grows with the i-bound. When the i-bound of MBE is large, computing the error exactly may not be possible. We next show (Theorem 2) the relationship between the residual of the mini-bucket heuristic and the bucket errors. To prove it we need the following lemmas which relates μk (Eq. 6) to the MBE heuristic of its parent Xp, and relates μ (Eq. 7) to the combinatorial part of the look-ahead computation (Eq. 5). Lemma 1. If Xk is a child of a variable Xp. Then, Λk( Xp) = μk( Xp)

Proof. Λk( Xp) (see Eq. 1) is the sum of messages that MBE(i) sends from Bk to the buckets of variables in Xp.

Since Xp is the parent of Xk, Λk( Xp) is the sum of all the messages departing from Xk, which is the deﬁnition of μk( Xp) Lemma 2. If Xk is a child of a variable Xp T. Then, L1( xp) =

Xk ch(Xp) μ k( xp)

Proof. (sketch) In the expression of L1( xp) (see Eq. 5) it is possible to push the minimization into the summation. Thus,

Xk ch(Xp) min xk {fk( xp, xk) +

Xj Tp ch(Xp) λj k( xp, xk)}

The set of functions inside of each minxk are, by deﬁnition, the set of functions in Bk placed there either originally or are messages received from its descendent, yielding (Eq. 7)

Xk ch(Xp) min xk

Xk ch(Xp) μ k

THEOREM 2 (residual and bucket-error). Assume an execution of MBE(i) along T yielding heuristic h Then, for every xp res( xp) =

Xk ch(Xp) Errk( xp) (9)

Proof. (sketch) From the deﬁnition of residual and Prop 1,

res( xp) = L1( xp) (

Xk ch(Xp) Λk( xp))

From the Lemma 1 and Lemma 2,

Xk ch(Xp) μ k( xp)

Xk ch(Xp) μk( xp)

Grouping together expressions that refer to the same child proves the theorem.

Corollary 1. When a bucket is not partitioned into minibuckets, its bucket error is 0, and therefore it contributes zero to the residual and the look-ahead of its parent.

Pruned Look-Ahead Trees Given the error functions, we can now identify variables that are irrelevant for lookahead (that is, their look-ahead does not produce any improved heuristic). Once those are identiﬁed, for each variable and look-ahead depth d, we will prune its look-ahead subtree to include only paths that touch variables that are relevant. If the bucket error function is zero for a variable, the variable is clearly irrelevant for 1level look-ahead. However to allow more ﬂexibility we will identify those buckets whose average relative bucket error is below a given threshold ϵ. Below, dom(Bj) denotes all the assignments to variables in the scope of bucket Bj. DEFINITION 6 (average relative bucket error). The average relative bucket error of Xj given a run of MBE(i) is

Ej = 1 |dom(Bj)|

DEFINITION 7 (irrelevant variable). A variable Xj is ϵirrelevant iff Ej ϵ.

DEFINITION 8 (minimal pruned d-level subtree). Given a threshold ϵ, a minimal pruned look-ahead subtree T p,d for variable Xp is obtained from Tp,d by removing ϵ-irrelevant leaves Xj (recursively until quiescence).

Example. Corollary 1 identiﬁes trivial 0-irrelevant variables. For instance, in the MBE execution displayed in Figure 1c, all variables but D are 0-irrelevant. It may happen that variable D is also 0-irrelevant or ϵ-irrelevant for some small ϵ, but we cannot tell from the symbolic execution displayed in the ﬁgure. It can be detected using bucket errors. Figure 1e shows a depth 2 look-ahead tree for B and (grayed) the minimal pruned subtree assuming ϵ = 0 and only irrelevant variables as for Corollary 1.

Algorithm 2: Minimal Pruned Look-Ahead Subtree

Input: A Graphical model M = (X, D, F), a pseudo-tree T, i-bound, threshold ϵ, depth d Output: Minimal pruned look-ahead subtrees Initialization: Run MBE(i) w.r.t. T. X X Run BEE(M, T, i) for each Bj, Xj X do

Ej 1 |dom(Bj)|

xj Errj( xj)

μ ( xj) if Ej < ϵ then X X {Xj} ; for each Xp in X do

Initialize T p,d to Tp,d while T p,d has leaves in X do Remove leaves Xj / X from T p,d

return T p,d for each Xp

Algorithm 2 describes the generation of pruned look-ahead trees for each variable.

Experimental Evaluation

Algorithm Setup. We experimented with running AND/OR branch-and-bound (AOBB) guided by MBE(i) with moment-matching (Ihler et al. 2012). To ensure a ﬁxed search space for all algorithms we used pre-computed variable orderings. We tried 2-3 different i-bounds for each problem instance, aiming at a diversity of heuristic accuracy. We run Algorithm 2 for pre-processing, yielding a minimal pruned look-ahead subtree for each variable, When the BEE computation (and its table) gets too large (e.g. over 106) we sample (e.g. 105 assignments). Within each i-bound setting, we varied the look-ahead depth from 0 to 6. Benchmarks. Includes instances from genetic linkage analysis (pedigree, large Fam) and medical diagnosis (promedas) (see Table 2). In total, we evaluated 59 problem instances with induced widths ranging from 19 to 120.

instances Lookahead i-bound (n,w,h,k,fns,ar) depth ϵ / rv / nel / ptime / time / spd / nodes / red ϵ / rv / nel / ptime / time / spd / nodes / red ϵ / rv / nel / ptime / time / spd / nodes / red Pedigree networks

i=6 i=10 i=20

LH(0) - / - / - / 0 / 8628 / 1.00 / 4764 / 1.00 - / - / - / 0 / 311 / 1.00 / 175 / 1.00 - / - / - / 48 / 55 / 1.00 / 3 / 1.00 pedigree7 LH(1) 1.0 / 111 / 88 / 0 / 6895 / 1.25 / 3637 / 1.31 1.0 / 76 / 67 / 1 / 257 / 1.21 / 135 / 1.30 0.01 / 52 / 51 / 52 / 59 / 0.93 / 3 / 1.16

(867,32,90,4,1069,4) LH(3) 1.0 / 111 / 137 / 0 / 5761 / 1.50 / 2020 / 2.36 1.0 / 57 / 89 / 1 / 233 / 1.33 / 96 / 1.82 0.01 / 52 / 83 / 52 / 59 / 0.93 / 2 / 1.80

LH(6) 1.0 / 111 / 164 / 0 / 4681 / 1.84 / 519 / 9.17 1.0 / 57 / 116 / 1 / 481 / 0.65 / 53 / 3.30 0.01 / 52 / 94 / 52 / 65 / 0.85 / <1 / 3.88

i=5 i=8 LH(0) - / - / - / 0 / 1262 / 1.00 / 826 / 1.00 - / - / - / 0 / 35 / 1.00 / 23 / 1.00 - / - / - / - / - / - / - / - pedigree18 LH(1) 0.01 / 151 / 134 / 0 / 912 / 1.38 / 564 / 1.47 0.01 / 92 / 82 / 0 / 20 / 1.75 / 13 / 1.76 - / - / - / - / - / - / - / -

(931,19,102,5,1185,5) LH(3) 0.01 / 151 / 203 / 0 / 691 / 1.83 / 311 / 2.66 0.01 / 92 / 132 / 0 / 12 / 2.92 / 6 / 4.00 - / - / - / - / - / - / - / -

LH(6) 0.01 / 93 / 192 / 0 / 300 / 4.21 / 66 / 12.47 0.01 / 92 / 152 / 0 / 13 / 2.69 / 2 / 11.06 - / - / - / - / - / - / - / - Large Fam linkage networks i=17 i=18 LH(0) - / - / - / 33 / 10427 / 1.00 / 6730 / 1.00 - / - / - / 34 / 4349 / 1.00 / 2809 / 1.00 - / - / - / - / - / - / - / - lf3 11 53 LH(1) 0.01 / 78 / 73 / 36 / 8611 / 1.21 / 4875 / 1.38 0.01 / 75 / 70 / 36 / 3653 / 1.19 / 2116 / 1.33 - / - / - / - / - / - / - / -

(1094,39,71,3,1567,4) LH(3) 0.01 / 78 / 96 / 35 / 5481 / 1.90 / 1674 / 4.02 0.01 / 75 / 91 / 36 / 2750 / 1.58 / 901 / 3.12 - / - / - / - / - / - / - / - LH(6) 0.01 / 78 / 101 / 36 / 20147 / 0.52 / 583 / 11.55 0.01 / 75 / 98 / 36 / 10918 / 0.40 / 323 / 8.68 - / - / - / - / - / - / - / - i=10 i=12 i=14 LH(0) - / - / - / 0 / 1507 / 1.00 / 926 / 1.00 - / - / - / 0 / 39 / 1.00 / 26 / 1.00 - / - / - / 0 / 18 / 1.00 / 12 / 1.00 lf3 15 53 LH(1) 0.01 / 80 / 71 / 1 / 1149 / 1.31 / 671 / 1.38 0.01 / 80 / 77 / 2 / 37 / 1.05 / 22 / 1.20 0.01 / 80 / 76 / 3 / 18 / 1.00 / 10 / 1.20

(1480,32,71,3,2171,4) LH(3) 0.01 / 80 / 112 / 1 / 887 / 1.70 / 413 / 2.25 0.01 / 80 / 116 / 2 / 31 / 1.26 / 14 / 1.90 0.01 / 80 / 114 / 3 / 13 / 1.38 / 5 / 2.17 LH(6) 0.01 / 80 / 134 / 1 / 1401 / 1.08 / 198 / 4.68 0.01 / 80 / 138 / 2 / 65 / 0.60 / 8 / 3.28 0.01 / 80 / 129 / 3 / 28 / 0.64 / 3 / 3.41 Promedas networks i=20 i=23 LH(0) - / - / - / 7 / 10645 / 1.00 / 3528 / 1.00 - / - / - / 43 / 9228 / 1.00 / 3048 / 1.00 - / - / - / - / - / - / - / - or chain 151 LH(1) 0.01 / 106 / 98 / 15 / 9398 / 1.13 / 2930 / 1.20 0.01 / 103 / 96 / 53 / 21603 / 0.43 / 3948 / 0.77 - / - / - / - / - / - / - / -

(1911,94,165,2,1928,3) LH(3) 0.01 / 106 / 148 / 15 / 8278 / 1.29 / 2176 / 1.62 0.01 / 103 / 136 / 53 / 5895 / 1.57 / 1559 / 1.96 - / - / - / - / - / - / - / - LH(6) 0.01 / 106 / 172 / 14 / 9402 / 1.13 / 1123 / 3.14 0.01 / 103 / 160 / 52 / 5934 / 1.56 / 757 / 4.03 - / - / - / - / - / - / - / - i=15 i=20 i=24 LH(0) - / - / - / 0 / 2958 / 1.00 / 1328 / 1.00 - / - / - / 4 / 917 / 1.00 / 403 / 1.00 - / - / - / 64 / 542 / 1.00 / 210 / 1.00 or chain 230 LH(1) 0.01 / 82 / 78 / 3 / 2590 / 1.14 / 1086 / 1.22 0.01 / 68 / 63 / 8 / 863 / 1.06 / 364 / 1.11 0.01 / 64 / 62 / 72 / 506 / 1.07 / 184 / 1.14

(1338,61,109,2,1357,3) LH(3) 0.01 / 82 / 111 / 3 / 2169 / 1.36 / 739 / 1.80 0.01 / 60 / 95 / 8 / 800 / 1.15 / 290 / 1.39 0.01 / 64 / 95 / 70 / 435 / 1.25 / 130 / 1.61 LH(6) 0.01 / 82 / 123 / 3 / 3552 / 0.83 / 379 / 3.50 0.01 / 40 / 91 / 8 / 1180 / 0.78 / 202 / 2.00 0.01 / 64 / 107 / 70 / 615 / 0.88 / 83 / 2.52

Table 1: Statistics on pedigree, largefam, promedas instances for exact solutions. We selected 2-3 i-bounds for each instance, aiming to vary magnitudes of search, ranging from signiﬁcant to minimal (dictated by our memory limit of 4Gb for constructing the MBE heuristic). The time and nodes are emphasized in bold. For each i-bound, we box the best particular time across the different look-ahead depths.

Benchmark # inst n k w h |F| a

Pedigree 17 387 3 19 58 438 4 1015 7 39 143 1290 5

Large Fam 14 950 3 32 66 1383 4 1530 3 40 95 2352 4

Promedas 28 735 2 41 77 749 3 2113 2 120 180 2134 3

Table 2: Benchmark statistics. # inst - n. of instances, n - n. of variables, w - induced width, h - pseudotree height, k - max. domain size, |F| - n. of functions, a - max. arity. The top value is the min. and the bottom value is the max. for that statistic.

In Table 1, we report detailed statistics across a representative set of instances from our benchmarks. The main three columns cover the different i-bounds. Each tuple reports the ϵ, and the number of ϵ-relevant variables (rv), as well as the number of variables where look-ahead was applied (i.e., whose pruned trees were not empty) (nel). We report preprocessing time (in seconds) (ptime), total CPU time (in seconds) (time), its associated speedup (spd), the number of nodes generated (in millions of nodes), and its associated ratio of node reduction (red). We report a subset of depths

for space reasons. Deeper look-ahead yields smaller search space. As expected, we see universally across all our experiments that more look-ahead results in smaller search space . However, this is not always cost effective. Look-ahead beneﬁts weaker heuristics more. For example, on pedigree7 the best speedup at i=6 is 1.84 while the best speedup at i=10 is 1.33; on pedigree18 the best speedup at i=5 is 4.21 while the best speedup at i=8 is 2.92. For weaker heuristics, deeper look-ahead is more effective. For example, we observe that on pedigree7 the best look-ahead at i=6 is d=6 while the best look-ahead at i=10 is d=3; on pedigree18 the best look-ahead at i=5 is d=6 while the best look-ahead at i=8 is d=3. Weaker heuristics will invite more look-ahead for the same ϵ. We observe how the threshold of 0.01 yields a fraction of ϵ-relevant variables leading to a small subset of variables where look-ahead is applied (lhn). These numbers also differ depending on the heuristic strength. Figure 2 presents best speedups using scatter diagrams as well as avg/min/max best speedups per benchmark set.

Figure 3 provides a comparison of average speedups between the full look-ahead tree vs the minimal pruned tree, as a function of the look-ahead level d, using a ﬁxed threshold of 0.01. The full look-ahead tree corresponds to applying

Figure 2: Total CPU times for using no look-ahead plotted against best total CPU times for any level of look-ahead (in log scale). Each point in the gray region represents an instance where the look-ahead heuristic has better time performance for some setting of the depth. We also report the number of instances which were actually solved and the average speedup across these instances.

Figure 3: Average speedups relative to using no look-ahead per depth for the minimal and full look-ahead subtrees.

lookahead naively. We observe here the advantage of using minimal pruned trees. For example, on the pedigree instances, there is almost no speedup on average when using the full look-ahead subtree by a depth of 3, but using the minimal pruned tree allows look-ahead to remain cost-effective.

This paper opens up an investigation on the use of look-ahead in graphical models. We address the challenge of ﬁnding an effective balance between the look-ahead overhead and its punning power, by exploiting the graphical structure. We have showed a general relationship between the residual of the mini-bucket heuristic and a local bucket-error due to the mini-bucket algorithm. This relationship shows that the residual (and consequently depth-1 look-ahead) can be precompiled using a message-passing like computation. As such, we showed that the bucket-errors can assist in controlling the

look-ahead at any depth. We showed empirically that even this simple mechanism of identifying irrelevant variables can be instrumental in making deep-level look-head costeffective. In addition, for the mini-bucket heuristic, we have characterized look-ahead as an inference task, which allows a full inference solving approach. In the future, we plan to automate parameter selection for optimizing look-ahead, and explore the potential for anytime combinatorial optimization.

Acknowledgements

We thank the reviewers for their valuable feedback. This work was suppported in part by NSF grants IIS-1065618, IIS1526842, and IIS-1254071, the US Air Force under Contract No. FA8750-14-C-0011 under the DARPA PPAML program, and MINECO under projects TIN2013-45732-C4-3-P and TIN2015-69175-C4-3-R.

References Bu, Z.; Stern, R.; Felner, A.; and Holte, R. C. 2014. A* with lookahead re-evaluated. In Edelkamp, S., and Bart ak, R., eds., Proceedings of the Seventh Annual Symposium on Combinatorial Search, SOCS 2014, Prague, Czech Republic, 15-17 August 2014. AAAI Press. Dechter, R. 2013. Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms. Synthesis Lectures on Artiﬁcial Intelligence and Machine Learning. Morgan & Claypool Publishers. Geffner, H., and Bonet, B. 2013. A Concise Introduction to Models and Methods for Automated Planning. Synthesis Lectures on Artiﬁcial Intelligence and Machine Learning. Morgan & Claypool Publishers. Ihler, A.; Flerova, N.; Dechter, R.; and Otten, L. 2012. Joingraph based cost-shifting schemes. In Uncertainty in Artiﬁcial Intelligence (UAI). Corvallis, Oregon: AUAI Press. 397 406. Kask, K., and Dechter, R. 1999. Branch and bound with mini-bucket heuristics. Proc. IJCAI-99. Kask, K., and Dechter, R. 2001. A general scheme for automatic search heuristics from speciﬁcation dependencies. Artiﬁcial Intelligence 91 131. L. Otten, A. Ihler, K. K., and Dechter, R. 2012. Winning the pascal 2011 map challenge with enhanced and/or branchand-bound. In Workshop on DISCML 2012 (a workshop of NIPS 2012). Marinescu, R., and Dechter, R. 2009. Memory intensive and/or search for combinatorial optimization in graphical models. Artif. Intell. 173(16-17):1492 1524. Nilsson, N. J. 1980. Principles of Artiﬁcial Intelligence. Tioga, Palo Alto, CA. Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann. Vidal, V. 2004. A lookahead strategy for heuristic search planning. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), June 3-7 2004, Whistler, British Columbia, Canada, 150 160.