# Scaling Up AND/OR Abstraction Sampling

Kalev Kask, Bobak Pezeshki, Filjor Broka, Alexander Ihler and Rina Dechter
University of California, Irvine, Irvine, CA 92697, USA
{kkask, pezeshkb, fbroka, ihler, dechter}@ics.uci.edu

## Abstract

Abstraction Sampling (AS) is a recently introduced enhancement of Importance Sampling that exploits stratification by using a notion of abstractions: groupings of similar nodes into abstract states. It was previously shown that AS performs particularly well when sampling over an AND/OR search space; however, existing schemes were limited to proper abstractions in order to ensure unbiasedness, severely hindering scalability. In this paper, we introduce AOAS, a new Abstraction Sampling scheme on AND/OR search spaces that allows more flexible use of abstractions by circumventing the properness requirement. We analyze the properties of this new algorithm and, in an extensive empirical evaluation on five benchmarks with over 480 problems, comparing against other state-of-the-art algorithms, illustrate AOAS's properties and show that it provides a far more powerful and competitive Abstraction Sampling framework.

## 1 Introduction

An important task in probabilistic graphical model inference is estimating the partition function (Z). Monte Carlo methods have traditionally been used for generating Z estimates, with Importance Sampling (IS) being a primary framework [Rubinstein and Kroese, 2007; Liu et al., 2015; Gogate and Dechter, 2011]. In IS, each sample represents a single full configuration of the variables and is given an importance weight. A newly introduced framework, Abstraction Sampling (AS) [Broka et al., 2018], inspired by [Knuth, 1975; Chen, 1992], expands on IS by allowing each sample to represent multiple configurations rather than just one. These samples of multiple configurations are called probes.

In Abstraction Sampling, probes are generated by grouping nodes together into abstract states based on an abstraction function, and then picking a representative node from each group to build a sampled subtree. As such, each probe is a subtree of the full search tree. Each probe is reweighted based on its chosen representative nodes and contributes to a Monte Carlo estimate of the partition function of the full tree. In particular, Abstraction Sampling smoothly interpolates between standard IS (where all of a variable's states are abstracted into a single abstract state) and standard search (where only provably equivalent states are abstracted into a single abstract state in the search tree). As such, Abstraction Sampling can be considered a generalization of Stratified Sampling [Rubinstein and Kroese, 2007; Rizzo, 2007] combined with compact search [Dechter and Mateescu, 2007].

[Broka et al., 2018] defined AS over OR and AND/OR search spaces and provided an analysis of complexity and conditions for unbiasedness. They introduced abstraction functions for which AS significantly improved performance over the baseline of Importance Sampling, and AS was shown to be highly competitive against two state-of-the-art schemes: Weighted Mini-Bucket Importance Sampling [Liu et al., 2015] and IJGP-SampleSearch [Gogate and Dechter, 2011]. However, a serious limitation of the existing AND/OR AS algorithm is that abstractions need to adhere to a property called properness to ensure unbiased estimation.
This restriction (to be described) inhibits the algorithm's ability to group nodes into abstract states and thus to limit the size of its probes, severely hindering scalability.

**Contributions.**
1. We introduce AOAS, a new Abstraction Sampling scheme over AND/OR search spaces. This new scheme lifts the properness restriction altogether, significantly enhancing the scalability of Abstraction Sampling for AND/OR spaces.
2. We provide new analysis and extensive empirical evaluation of all Abstraction Sampling schemes, and highlight how the new AOAS algorithm empowers formation of abstractions.
3. We compare against a new highly competitive scheme, Dynamic Importance Sampling (DIS), and show that AOAS is largely superior. We also strengthen the empirical results on instances lacking exact answers by placing the estimates within bounds produced by DIS.

## 2 Background and Definitions

A graphical model, such as a Bayesian or a Markov network [Pearl, 1988; Darwiche, 2009; Dechter, 2013], can be defined by a 3-tuple $M = (\mathbf{X}, \mathbf{D}, \Phi)$, where $\mathbf{X} = \{X_i : i \in V\}$ is a set of variables indexed by a set $V$ and $\mathbf{D} = \{D_i : i \in V\}$ is the set of finite domains of values for each $X_i$. Each function $\psi_\alpha \in \Phi$ is defined over a subset of the variables called its scope, $X_\alpha$, where $\alpha \subseteq V$ are the indices of the variables in its scope and $D_\alpha$ denotes the Cartesian product of their domains, so that $\psi_\alpha : D_\alpha \to \mathbb{R}_{\geq 0}$. The primal graph $G = (V, E)$ of a graphical model associates each variable with a node ($V = \mathbf{X}$), while arcs $e \in E$ connect nodes whose variables appear in the scope of the same local function.

Figure 1: Example of an AND/OR search space. (a) A primal graph; (b) a possible pseudo tree for primal graph (a); (c) an AND/OR search tree guided by pseudo tree (b). To calculate the value of an OR node, we sum the values of its children multiplied by their arc weights; for an AND node, we multiply the values of its children. The partition function of the model is the resulting value of the root node.

Graphical models can be used to represent a global function, often a probability distribution, defined by $\Pr(\mathbf{X}) \propto \prod_\alpha \psi_\alpha(X_\alpha)$. An important task is to compute the normalizing constant, also known as the partition function,
$$Z = \sum_{\mathbf{X}} \prod_\alpha \psi_\alpha(X_\alpha).$$

**Search Spaces of Graphical Models.** A graphical model can be transformed into a weighted state space graph. In an OR search space, which is constructed layer-by-layer relative to a variable ordering, paths from the root to the leaves represent full configurations (assignments to all variables), where each successive level corresponds to an assignment of the next variable in the ordering. A graphical model can also be transformed into a more compact AND/OR search space by capturing its conditional independencies, thus facilitating more effective algorithms [Dechter and Mateescu, 2007].

An AND/OR search space is defined relative to a pseudo tree of a primal graph. A pseudo tree $\mathcal{T} = (V, E')$ of a primal graph $G = (V, E)$ is a directed rooted tree that spans $G$ such that every arc of $G$ not in $E'$ is a back-arc in $\mathcal{T}$ connecting a node to one of its ancestors (Figure 1(a),(b)). A variable is a branching variable if it has multiple children in $\mathcal{T}$. Given a pseudo tree $\mathcal{T}$ of a primal graph $G$, the AND/OR search tree $T_{\mathcal{T}}$ guided by $\mathcal{T}$ has alternating levels of OR nodes corresponding to variables, and AND nodes corresponding to assignments from their domains, with edge costs extracted from the original functions [Dechter and Mateescu, 2007].
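To ground the definition of $Z$ above, here is a minimal Python sketch (our own illustration, not from the paper; the toy factors and domains are assumptions) that computes the partition function of a tiny model by brute-force enumeration, exactly as $Z = \sum_{\mathbf{X}} \prod_\alpha \psi_\alpha(X_\alpha)$ prescribes.

```python
from itertools import product

# A toy model over three binary variables: X = {A, B, C}.
# Each factor maps an assignment of its scope to a non-negative value.
domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
factors = [
    (("A", "B"), lambda a, b: 1.0 if a == b else 0.5),   # psi over {A,B}
    (("B", "C"), lambda b, c: 2.0 if b != c else 1.0),   # psi over {B,C}
]

def brute_force_Z(domains, factors):
    """Z = sum over all full configurations of the product of factor values."""
    variables = list(domains)
    Z = 0.0
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, psi in factors:
            weight *= psi(*(assignment[v] for v in scope))
        Z += weight
    return Z

print(brute_force_Z(domains, factors))  # exact partition function of the toy model
```

Of course, this enumeration is exponential in the number of variables; the search-space view below is what makes more compact computation and sampling possible.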
Let $n$ be an AND node in $T_{\mathcal{T}}$, also denoted $n^X$ if $X$ is the last variable of its partial configuration. Each arc into an AND node $n$ has a cost $c(n)$ defined to be the product of all factors $\psi_\alpha$ in $M$ that are instantiated at $n$ but not before. A solution tree is a subtree of $T_{\mathcal{T}}$ satisfying: (1) it contains the root of $T_{\mathcal{T}}$; (2) if an OR node is in the solution tree, exactly one of its AND child nodes is in the solution tree; (3) if an AND node is in the tree, then all of its OR children are in the solution tree. Finally, its leaves are leaves of $T_{\mathcal{T}}$. The cost of a solution tree is the product of its arc costs and is equal to the cost of the corresponding full configuration of the graphical model [Dechter and Mateescu, 2007].

**Value of a Node.** The partition function $Z$ of a graphical model $M$ can be obtained by computing the value of the AND/OR search tree $T$, $V(T)$, by summing the costs of all its solution trees. This can be done recursively from leaves to root [Dechter and Mateescu, 2007] by associating a value function $V(n)$ with every node $n$ in the AND/OR search tree. The value of a node, $V(n)$, is defined as the sum cost of all partial solution trees rooted at $n$. In particular, $V(n^X)$ denotes the value of an AND node $n^X$ of variable $X$, and $V(Y_{n^X})$ denotes the value of an OR node $Y$ that is the child of an AND node $n^X$, $Y$ being the child of $X$ in the pseudo tree. The value function can be computed by the following recursive expressions:

$$V(n^X) = \prod_{Y \in ch_{\mathcal{T}}(X)} V(Y_{n^X}) \tag{1}$$

$$V(Y_{n^X}) = \sum_{n^Y \in ch_Y(n^X)} c(n^Y)\, V(n^Y) \tag{2}$$

where $ch()$ denotes child variables either in the pseudo tree or in the search tree itself (depending on the context). Here, $ch_Y(n^X)$ are the child AND nodes of $Y$ descended from AND node $n^X$. Note that $V(T) = V(root)$, where $root$ is the root node of $T_{\mathcal{T}}$. It follows that $V(root) = Z_M$, the partition function of the underlying model (see Figure 1(c)).

But what is the meaning of $V(n)$ for an arbitrary AND node $n$ w.r.t. the partition function of $M$? Let $path(n)$ denote the configuration of variables on the path from the root to $n$ in $T_{\mathcal{T}}$, let $Z|_{path(n)}$ be $Z$ conditioned on the assignment to $path(n)$, and let $Out(path(n))$ be the set of OR nodes (corresponding to variables) emanating from $path(n)$, which are outside $path(n)$. Then $Z$, when restricted to the partial configuration of $path(n)$, is obtained by multiplying $g(n)$, the cost of $path(n)$, by $V(n)$ and by all the values of the OR nodes branching out of $path(n)$ (the latter referred to as the value of the ancestor branching). It is easy to see that

$$Z|_{path(n)} = g(n) \cdot V(n) \cdot \prod_{Y_{n'} \in Out(path(n))} V(Y_{n'}) \tag{3}$$

where $V(Y_{n'})$ is as defined in Eq. (2). Defining the product factor in Eq. (3), which we call the ancestral branching value, as

$$R(n) = \prod_{Y_{n'} \in Out(path(n))} V(Y_{n'}) \tag{4}$$

we get that $Z|_{path(n)} = g(n) \cdot V(n) \cdot R(n)$, an expression that will be used in the algorithm for estimating the partition function at nodes of the search tree.
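As an illustration of the recursion in Eqs. (1)-(2), the following minimal Python sketch (our own; the node classes and tiny example tree are assumptions, not the paper's code) evaluates an explicit AND/OR tree bottom-up, returning $V(root) = Z_M$ for the toy tree.

```python
# Minimal AND/OR tree evaluation per Eqs. (1)-(2); structure is illustrative.
# An OR node holds children as (arc_cost, and_node) pairs; an AND node holds
# its child OR nodes. Leaf AND nodes have value 1 (an empty product).

class AndNode:
    def __init__(self, or_children=()):
        self.or_children = list(or_children)

class OrNode:
    def __init__(self, and_children):
        self.and_children = list(and_children)  # [(c(n^Y), AndNode), ...]

def value_and(n):
    # Eq. (1): product over child OR nodes (empty product = 1 at leaves).
    result = 1.0
    for Y in n.or_children:
        result *= value_or(Y)
    return result

def value_or(Y):
    # Eq. (2): sum over child AND nodes of arc cost times AND-node value.
    return sum(c * value_and(m) for c, m in Y.and_children)

# Tiny example: a root variable with two values (arc costs 0.6 and 0.4), each
# leading to one child OR variable with two values of arc cost 1.0 each.
leafY = lambda: OrNode([(1.0, AndNode()), (1.0, AndNode())])
root = OrNode([(0.6, AndNode([leafY()])), (0.4, AndNode([leafY()]))])
print(value_or(root))  # 0.6*2 + 0.4*2 = 2.0, the partition function V(root)
```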
Figure 2: An AND/OR search tree and possible probes generated by Abstraction Sampling. Nodes are abstracted based on having the same domain value (denoted by having the same color). (a) A full AND/OR tree, representing all 16 solutions. (b) Boxed in green is the ancestor branching subtree for the path (B=0)(C=1). (c) A probe generated by proper abstractions, representing 8 solutions. (d) An invalid probe generated by improper abstractions, with nodes abstracted across different sub-branches under the branching variable B (it contains 4 partial configurations and no full solution). (e) A valid probe generated by AOAS with improper abstractions, representing 4 solutions.

**Example 1.** An example of expression (3) is given in Figure 2(b). Consider the path to node $n = (B{=}0, C{=}1)$ marked in red. There is only one branching variable ancestor (B) and, in this case, the only $Out(path(n))$ node is the OR node A emanating from under (B=0). Thus, $Z|_{path(B=0,C=1)} = g(B{=}0, C{=}1) \cdot V(B{=}0, C{=}1) \cdot V(A_{(B=0)})$.

**Stratified Importance Sampling.** Abstraction Sampling builds on Importance Sampling and Stratified Sampling. Importance Sampling (IS) is a Monte Carlo scheme for approximating likelihood queries over graphical models. Stratified Sampling is a variance reduction technique for sampling a search space by first dividing it into disjoint strata, and it can be combined with importance sampling to further reduce variance. In Stratified Importance Sampling, we first divide the sample space into k strata of equal area under the proposal distribution q, then choose re-weighted representatives from each stratum. To maximize the reduction in variance, the variance between strata should be maximized (see [Rizzo, 2007]).

## 3 AND/OR Abstraction Sampling

Abstraction Sampling (AS) algorithms [Broka et al., 2018] emulate Stratified Importance Sampling with the modification that stratification and selection of representatives is done variable-by-variable along a search tree. Guided by an abstraction function $a$ that groups nodes of a variable together, Abstraction Sampling generates a compact representative subtree of the full search tree $T_{\mathcal{T}}$ called a probe. An abstraction function $a$ is formally defined as $a : T_{\mathcal{T}} \to \mathbb{I}^+$, where $\mathbb{I}^+$ are the positive integers, and it partitions the nodes in $T_{\mathcal{T}}$ layer by layer. From each partition (i.e., abstraction), a representative node is stochastically chosen based on a proposal $p$ and is reweighted accordingly. AS can use any Importance Sampling proposal. For OR search trees we use $p(n) \propto w(n)\,g(n)\,h(n)$, where $h(n)$ is a heuristic that provides an upper bound on $V(n)$, $g(n)$ is the cost of $path(n)$, and $w(n)$ is the importance weight. The ORAS algorithm from [Broka et al., 2018], which operates on an OR search tree, is unbiased for any Importance Sampling proposal. However, it was shown that the accuracy of a heuristic-based proposal function can significantly impact convergence. When $h(n) = h^*(n) = V(n)$ the proposal is exact and only one sample is needed, as is the case for general Importance Sampling.

**pAOAS.** Abstraction Sampling algorithms for AND/OR search spaces operate by expanding and abstracting an AND/OR probe and can traverse the pseudo tree $\mathcal{T}$ in a depth-first or breadth-first manner. However, performing these abstractions naively may leave partial (invalid) configurations within the resulting probe (e.g., Figure 2(d)), potentially leading to biased estimates. To avoid this issue, [Broka et al., 2018] forced abstractions to be proper, with the scope of abstractions restricted to nodes descending from the same AND node of the most recent branching variable ancestor. A resulting probe can be seen in Figure 2(c); a code-level sketch of this grouping constraint is given below.
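One way to read the properness restriction (our own illustrative sketch, not the paper's implementation; all node fields and helpers here are assumptions) is that the abstraction key is refined with the identity of the AND node of the most recent branching ancestor, so nodes under different branches can never share an abstract state. The next paragraph states this formally.

```python
# Illustrative sketch of the properness restriction; fields and helpers
# (.and_parent, .variable, is_branching, base_key) are our own assumptions.

def nearest_branching_and_ancestor(node, is_branching):
    """Return the AND node of the most recent branching variable on path(node).
    `node.and_parent` is assumed to chain the AND ancestors along the path;
    `is_branching(X)` tests whether variable X has multiple children in T."""
    anc = node.and_parent
    while anc is not None:
        if is_branching(anc.variable):
            return anc
        anc = anc.and_parent
    return None  # no branching ancestor: properness imposes no restriction

def proper_abstraction_key(node, base_key, is_branching):
    # Nodes may share an abstract state only if, in addition to agreeing on
    # the base abstraction, they descend from the same AND node of their
    # most recent branching variable ancestor.
    return (base_key(node), nearest_branching_and_ancestor(node, is_branching))
```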
More formally, consider a variable W whose most recent branching ancestor in $\mathcal{T}$ is U. When performing proper abstractions for W, only the AND nodes of W that are in the common subtree rooted under the same AND node of U can be grouped together [Broka et al., 2018]. This ensures expansion to only valid probes.

Unfortunately, properness limits the compactness of the sampled AND/OR trees, as it constrains the set of AND nodes that can be abstracted together. In particular, instead of bounding the size of the probe by $O(nm)$ AND nodes (as with ORAS), the number of AND nodes in a probe constructed via proper abstractions is bounded only by $O(n \cdot m^{b+1})$, where $n$ is the number of variables, $b$ bounds the number of branching variables along any path, and $m$ bounds the maximum number of abstract states per variable. This means that proper AND/OR abstraction schemes are not scalable whenever $b \gg 0$, unless $m = 1$, in effect limiting us to basic Importance Sampling. Various superficial schemes were used in [Broka et al., 2018] to bound probe sizes in this case.

**AOAS.** Our solution to the above is AOAS (Algorithm 1), a new AND/OR Abstraction Sampling algorithm. We will show the algorithm to be unbiased and effective far beyond the earlier versions. Given a pseudo tree $\mathcal{T}$ of a graphical model $M = (\mathbf{X}, \mathbf{D}, \Phi)$ rooted at a dummy variable D, a heuristic function $h(n)$ providing an upper bound on $V(n)$, and an abstraction function $a$, AOAS traverses the pseudo tree $\mathcal{T}$ variable-by-variable in a depth-first manner, using the traversal to guide the generation of a partial search tree.

**Algorithm 1: AOAS.**
Input: a graphical model $M = (\mathbf{X}, \mathbf{D}, \Phi)$; a pseudo tree $\mathcal{T}$ for $M$ rooted at a dummy singleton variable D; an abstraction function $a$; a heuristic function $h$. For any node $n$, $g(n)$ is its path cost, $w(n)$ its importance weight, and $\hat{V}(n)$ its estimated value (initialized to $h(n)$).
Output: $\hat{Z}_M$, an estimate of the partition function of $M$.

     1  Function AOAS(T, h, a)
     3      PROBE ← {n^D};  g(n^D), w(n^D), r(n^D), V̂(n^D) ← 1
     4      STACK ← push(empty stack, D)
     5      while STACK is not empty do
     6          X ← top(STACK)
     7          if X has unvisited children in T then        // forward step
     8              Y ← the next unvisited child of X
     9              foreach n^X ∈ PROBE do
    10                  PROBE ← PROBE expanded from n^X to Y
    11              F_Y ← newly added AND nodes of Y in PROBE
    12              foreach n^Y ∈ F_Y do
    13                  w(n^Y) ← w(n^X)
    14                  g(n^Y) ← g(n^X) · c(n^Y)
    15                  r(n^Y) ← r(n^X) · ∏_{S ∈ ch_T(X), S ≠ Y} V̂(S_{n^X})
    18              A ← {A_i | A_i = {n^Y ∈ PROBE | a(n^Y) = i}}   // abstract
    19              foreach A_i ∈ A do
    20                  foreach n ∈ A_i do
    21                      p(n) ← w(n)·g(n)·h(n)·r(n) / Σ_{m ∈ A_i} w(m)·g(m)·h(m)·r(m)
    23                  n^Y_i ∼ p over A_i               // randomly select
    24                  w(n^Y_i) ← w(n^Y_i) / p(n^Y_i)
    25                  V̂(n^Y_i) ← 1
    26                  PROBE ← (PROBE \ A_i) ∪ {n^Y_i}
    28              push(STACK, Y)
                else                                      // backtrack step
    30              pop(STACK);  W ← top(STACK)
    31              PROBE ← PROBE with every n^W without descendants pruned
    32              foreach n^W ∈ PROBE do
    33                  V̂(n^W) ← V̂(n^W) · Σ_{n^X ∈ child(n^W)} V̂(n^X)·c(n^X)·w(n^X)
    35              if X = D then Ẑ_M ← V̂(D)
    38      return Ẑ_M

The algorithm uses a data structure referred to as PROBE to store the generated subtree of the full search tree $T_{\mathcal{T}}$. As it progresses forward in the traversal, it expands and abstracts nodes within PROBE. Expansion is done by adding nodes of the newly visited variable across the tree in a breadth-first manner. During backtracking, the algorithm prunes nodes and updates values. More specifically, in the forward step (lines 8-28), PROBE is extended from all current leaf AND nodes of a variable X to the next variable Y (the child of X in $\mathcal{T}$) in a breadth-first manner.
The newly added AND nodes $n = n^Y$ are new leaves in PROBE. Each newly generated node $n$ inherits the weight of its parent AND node $n^X$ and has its path cost $g(n)$ computed. $r(n)$ is also computed (line 15); it estimates $R(n)$ in the expression for $Z|_{path(n)}$ in Eqs. (3)-(4). Note that $r(n)$ is simply inherited from $r(n^X)$ if the parent variable X is not a branching variable. Otherwise, the $r(n^X)$ from its parent is multiplied by the estimated values contributed by its sibling branches, yielding an $r(n)$ that bounds the ancestral branching value for $n$.

Once nodes are expanded, we perform abstractions (lines 18-27). Nodes are partitioned into abstract states according to an abstraction function $a$, and for each abstract state, proposals are generated from which a representative node is stochastically selected. The proposal at node $n$ is proportional to the quantity $w(n) \cdot g(n) \cdot h(n) \cdot r(n)$. The portion $g(n) \cdot h(n) \cdot r(n)$ estimates $Z|_{path(n)}$ (Eqs. (3)-(4)), where $V(n)$ is approximated by $h(n)$ and $R(n)$ is estimated by $r(n)$. We multiply by $w(n)$ to account for all the mass $n$ represents.

When backtracking to a variable W from variable X (lines 30-35), we prune any $n^W$ that has no descendant $n^X$, thus preventing the creation of invalid probes (line 31). For the surviving nodes, we back up the values from the children from which we have backtracked (line 33). Once the algorithm backtracks to the dummy node, it will have backed up the estimated value of the full model, which is returned.

**AOAS Trace.** Figure 3 shows a step-by-step trace of AOAS. We follow a DFS traversal of the pseudo tree $\mathcal{T}$ from Figure 1(b) and sample from the corresponding AND/OR search tree $T_{\mathcal{T}}$ (Figure 2(a)). The abstraction function we use groups nodes that have the same domain value, depicted by the color of the nodes. When performing abstractions, we box nodes that are being abstracted in gray. A red X marks a pruned subtree.

Figure 3: A sample trace of AOAS ((a) step 1 through (f) final), based on the AND/OR tree in Figure 2(a). Nodes with the same domain values are abstracted (also indicated by node color). A red X indicates pruning. Transparent nodes indicate portions of the tree not pruned and yet to be explored.

Starting with variable B (Figure 3(a)), each node belongs to a different abstraction and is therefore kept. Next, we expand to A and abstract across its nodes (Figure 3(b)). Not being restricted to proper abstractions, we partition across all nodes of A, regardless of whether they fall under B=0 or B=1. We see two nodes in each abstract state (denoted by the red and blue coloring). Next, we calculate their respective proposals (line 21). Note that the proposal of each node $n$ relies on $r(n)$ (line 15), which captures the values of the nodes in its $Out(path(n))$, in this case nodes of C. Since the nodes of C have not been expanded yet, we use their heuristic values as an approximation of their values. We then stochastically choose a representative from each abstract state (line 23). Suppose that both the red and blue representatives are stochastically chosen from under B=0 (Figure 3(c)). Since A has no descendants in the pseudo tree, we backtrack to B, updating its node values (line 33) and performing a pruning step (line 31).
In pruning, we remove AND nodes of B that do not extend to AND nodes of A, and thus prune B=1 (denoted by the red X in Figure 3(c)), in order to ensure formation of valid AND/OR probes. Finally, we expand and abstract C and D (Figures 3(d)-3(f)). The $r(n)$ for D's nodes is inherited from the $r(n^C)$ of each node's respective $n^C$ parent. We backtrack from D to the root, updating values (no further pruning is necessary). The result is a valid probe (Figure 3(f)) containing four solutions: (B=0, A=0, C=0, D=0), (B=0, A=0, C=1, D=1), (B=0, A=1, C=0, D=0), and (B=0, A=1, C=1, D=1). We estimate the partition function by computing $\hat{V}(B)$.

**Properties of AOAS.** Algorithm AOAS does away with the properness requirement of pAOAS while ensuring unbiasedness. The proof is based on the fact that there is always a corresponding equivalent OR sampling algorithm (e.g., using a DFS ordering of the pseudo tree) such that there is a one-to-one correspondence between the AND nodes in AND/OR sampling and those in the equivalent OR sampling, and such that the proposals for matching nodes are the same, implying that the two algorithms produce the same estimate. Since ORAS is known to be unbiased for any proposal, our claim follows. A key to this is that our choice of proposal, which includes $r(n)$, captures the estimated $\hat{Z}$ of the full configuration in the same way as would be done in the equivalent OR case. In other words, we can show the following:

**Theorem 1 (simulation of AOAS with ORAS).** Given a graphical model $M = (\mathbf{X}, \mathbf{D}, \Phi)$, a pseudo tree $\mathcal{T}_{AO}$ of $M$, an abstraction $a_{AO}$, and a proposal function $p$, there exists a variable ordering $d$ defining a chain-like pseudo tree $\mathcal{T}_{OR}$ of $M$ and an abstraction function $a_{OR}$ defined on $\mathcal{T}_{OR}$ such that for every execution trace of AOAS on $\mathcal{T}_{AO}$ using $a_{AO}$ and proposal $p$ and producing a corresponding $\hat{Z}_M$ estimate, there is an execution trace of ORAS on $\mathcal{T}_{OR}$ using $a_{OR}$ and proposal $p$ producing the same $\hat{Z}_M$ estimate.

**Theorem 2 (unbiasedness).** Given a graphical model $M = (\mathbf{X}, \mathbf{D}, \Phi)$, algorithm AOAS provides an unbiased estimate of the partition function of $M$.

**Complexity.** It is easy to see that, as for ORAS, probe sizes for AOAS will not be larger than $O(m \cdot n)$, where $n$ is the number of variables and $m$ bounds the number of abstract states per variable. Note that Theorem 1 does not imply that AOAS on $\mathcal{T}$ is the same as ORAS on the DFS ordering of $\mathcal{T}$ with the same $a$. In fact, the former can lead to more informed probes that express more solutions, as we show next.

**Contrasting Scalability.** Figures 4(b), 4(c), and 4(d) show probes generated by ORAS, pAOAS, and AOAS, respectively, sampling over a search tree guided by the pseudo tree in Figure 4(a). ORAS ends up expanding 26 nodes, 8 of which are retained in the final probe, constituting only two solutions: (X, Y, Z, T, R, L=0, M=0) and (X, Y, Z, T, R, L=0, M=1). pAOAS inevitably expands and keeps the entire AND/OR search tree of 42 nodes! However, since it samples a compact AND/OR search space, this nevertheless corresponds to relatively many captured solutions (128 in this case). AOAS expands only 20 nodes (fewer than both of the other schemes) and results in a probe of size 11 (comparable to ORAS). Furthermore, its final probe corresponds to 16 solutions, far more than that of ORAS. Thus, we see that AOAS is far better equipped than pAOAS to limit its probe size while still taking advantage of an AND/OR search space.
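To summarize the core sampling step concretely, here is a minimal Python sketch (ours, not the paper's C++ implementation; the node fields and the `abstraction` callable are assumptions) of the per-layer grouping and stochastic selection performed in lines 18-26 of Algorithm 1, with the proposal proportional to $w(n) \cdot g(n) \cdot h(n) \cdot r(n)$.

```python
import random
from collections import defaultdict

# Illustrative sketch of the abstraction step (Algorithm 1, lines 18-26).
# A node is any object with fields w, g, h, r (importance weight, path cost,
# heuristic value, ancestral-branching estimate) and V (value estimate);
# `abstraction` maps a node to its abstract-state id. All names are ours.

def abstract_and_select(frontier, abstraction, rng=random):
    """Group newly expanded AND nodes; keep one reweighted representative each."""
    groups = defaultdict(list)
    for n in frontier:                        # line 18: partition by a(n)
        groups[abstraction(n)].append(n)

    survivors = []
    for nodes in groups.values():             # line 19: per abstract state A_i
        scores = [n.w * n.g * n.h * n.r for n in nodes]  # line 21 numerators
        total = sum(scores)
        probs = [s / total for s in scores]   # line 21: normalized proposal
        chosen = rng.choices(nodes, weights=probs, k=1)[0]  # line 23
        chosen.w /= probs[nodes.index(chosen)]              # line 24: reweight
        chosen.V = 1.0                                      # line 25
        survivors.append(chosen)              # line 26: A_i replaced by n^Y_i
    return survivors
```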
**Cutset Sampling.** An improvement to any IS scheme can be made by sampling over only a subset of the variables using cutset decomposition, called cutset sampling [Dechter, 2013; Bidyuk and Dechter, 2007]. Using the weighted mini-bucket heuristic, it is possible to identify when we reach a node for which the heuristic is exact (i.e., $h(n) = h^*(n) = V(n)$). When this occurs, there is no longer a need to sample below the node, as the value of the subtree is known. Utilizing this method, the number of variables sampled is bounded by what is known as the i-cutset, which in turn bounds the size of the probes. Consequently, this leads to a greater number of samples and improves accuracy, as we reference in our empirical analysis.

Figure 4: Contrasting scalability. AS probes sampled from a search space corresponding to the pseudo tree in (a). Nodes are abstracted based on having the same domain value. White nodes were discarded through the abstraction process. Light-yellow nodes are selected representatives that are later pruned. Pruning is depicted by a red X. Dark-yellow nodes constitute the final probe. (a) Example pseudo tree; (b) a probe by ORAS (8 nodes, 2 solutions); (c) a probe by pAOAS (42 nodes, 128 solutions); (d) a probe by AOAS (11 nodes, 16 solutions).

## 4 Empirical Evaluation

**Algorithms.** Experiments were run on three classes of AS algorithms: AOAS, pAOAS, and ORAS (using a depth-first ordering of the pseudo tree). The AS algorithms were also tested against the high-performing state-of-the-art Dynamic Importance Sampling (DIS) [Lou et al., 2019] using an equal-time policy. All algorithms were implemented in C++. All experiments were run for 1 hour on a 2.66 GHz processor with 8 GB of memory (24 GB for the Linkage-Type4 benchmark).

**Abstraction Functions.** We design abstractions to reduce variance within abstract states by using the notion of a variable's context. Formally, the context of a variable X identifies a subset C(X) of its ancestors in a pseudo tree $\mathcal{T}$ whose assignment uniquely determines the AND/OR subtree below it [Dechter and Mateescu, 2007]. Intuitively, the context of a variable is the set of its ancestors that directly affect its value. Thus, abstractions based on a subset of the context aim to group nodes based on having similar values.

| Problem | Size | Total | Bnds | AOAS ≥ | AOAS > |
|---|---|---|---|---|---|
| DBN | small | 66 | 62 | 57 | 47 |
|  | large | 48 | 40 | 38 | 35 |
| Grids | small | 8 | 5 | 5 | 2 |
|  | large | 19 | 7 | 7 | 6 |
| Linkage-Type4 | large | 82 | 82 | 82 | 82 |
| Pedigree | small | 24 | 24 | 24 | 19 |
| Promedas | small | 65 | 58 | 49 | 29 |
|  | large | 173 | 165 | 141 | 113 |

Table 1: How often AOAS estimates: fall within DIS probabilistic bounds (Bnds), were comparable to or better than DIS's (AOAS ≥), and were strictly better than DIS's (AOAS >).

AOAS and ORAS utilize relaxed context-based (Rel CB) and randomized context-based (Rand CB) abstractions as in [Broka et al., 2018]. Rel CB is parametrized by a level j, selecting the closest j−1 variables from a variable's context (i.e., its relaxed context) plus the variable itself. It abstracts nodes of the same domain value that also share the same assignment to the relaxed context. This yields $k^j$ abstract states at each level, assuming domain size $k$; a sketch is given below. Rel CB-0 vacuously groups all nodes of a variable into a single abstract state and is referred to as a Knuth abstraction. The randomized scheme, Rand CB, is parameterized by a level d determining the number of abstractions per level, leading to a varying number of nodes in each abstract state.
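The following Python sketch (illustrative only; the `context` and `assignment` helpers and node fields are our assumptions, not the paper's implementation) shows the idea behind a Rel CB-j abstraction: two nodes of the same variable share an abstract state iff they agree on their own domain value and on the assignment to the j−1 closest context variables.

```python
# Illustrative Rel CB-j abstraction key; fields and helpers are assumptions.
# `context(X)` lists X's context variables ordered closest-first, and
# `assignment(node)` maps each variable instantiated on path(node) to a value.

def rel_cb_key(node, j, context, assignment):
    """Abstract-state key: own value plus the j-1 closest context assignments."""
    if j == 0:
        return ()  # Rel CB-0: one abstract state per variable (Knuth abstraction)
    relaxed = context(node.variable)[: j - 1]       # closest j-1 context vars
    asg = assignment(node)
    return (node.value,) + tuple(asg[v] for v in relaxed)
```

With domain size $k$, this key takes at most $k^j$ distinct values per level, matching the count stated above.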
pAOAS utilizes a variant of Rel CB that maintains properness. It is parameterized by levels j and k, resulting in j-level abstractions for nodes having no more than k branching variables on the path to the root, and 0-level abstractions below that.

**Heuristics.** To inform the sampling proposal, Weighted Mini-Bucket Elimination (WMBE) [Dechter and Rish, 2003; Liu and Ihler, 2011] is used as a heuristic. The i-bound parameter controls the strength of WMBE, where higher i-bounds generally lead to stronger heuristics and, thus, better proposals, at the expense of higher computation and memory. We standardize our experiments by using i-bound 10.

**Benchmarks.** We perform high-throughput experiments on over 480 problems from five well-known benchmarks: DBN, Grids, Linkage-Type4, Pedigree, and Promedas. For brevity, we show detailed aggregated statistics on only large problem instances, thus excluding Pedigree, whose problems were all small and whose results were relatively uniform across all algorithms. Average statistics for the benchmarks can be found in Figure 5.

**Performance Measure.** To evaluate the performance of the various algorithms, we calculate error as
$$error = |\log_{10}\hat{Z} - \log_{10}Z|,$$
where $\log_{10}\hat{Z}$ is the log of the experimentally obtained estimate of the partition function $Z$, and $\log_{10}Z$ is the reference value. When the exact $Z$ value is unknown, an empirical estimate based on an average over 100 one-hour runs of Abstraction Sampling is used as the reference. We verified that 98% of these estimates fell within the 95% probabilistic bounds determined by DIS.

Figure 5: Aggregated statistics, one panel per benchmark (e.g., (c) Linkage-Type4, (d) Promedas). Displayed are the number of problems solved (n*), average log10 Z error (log(err)), count of problems solved within an error threshold (error distr.), average number of probes (#probes), and average size of a probe (#nodes/probe). Color bars visually show the magnitude of the values, with darker colors showing greater values. Red n* cells indicate an algorithm's inability to solve relatively many problems. Lines in bold indicate the best-performing algorithms. Each benchmark also displays the average number of nodes (n), domain size (d), tree width (w), and AND/OR and OR search tree heights (h) of its problems.

### 4.1 Results

For extended results, please see the full paper at https://www.ics.uci.edu/~dechter/publications.html.

**Aggregation Tables.** Data from experiments on the same benchmark using a specific AS scheme and abstraction level are combined in the aggregation tables (Figure 5). Importance Sampling corresponds to Rel CB-0. At the top of each table (in red) we also report the average log10 Z error for DIS. To help the reader identify AS algorithm configurations that performed particularly well, rows are bolded where (1) a relatively large number of problems are solved and (2) the error is within a factor of 0.2 of the minimum error within the set of algorithms solving the greatest number of problems.

**Representative Plots.** Figure 6 provides a representative plot from the Grids benchmark. The legend includes probe statistics and log10 Z error values for each algorithm. A DIS plot is also overlaid onto each subplot for comparison. Although plots vary between benchmarks and problem instances, here we attempt to show a plot that captures the main trend of our data.
In the majority of plots, we noticed that AOAS converged towards the reference Z value faster and had overall better performance than both ORAS-DFS and pAOAS. This was also true compared to DIS in many cases. We can also see the anytime nature of the algorithms as they continue to converge towards the reference Z value over time.

**AOAS vs. DIS Table.** Table 1 compares AOAS Rand CB-256 and DIS. For each benchmark (partitioned into small and large instances), we tally how often AOAS's Z estimates: (i) fall within DIS probabilistic bounds, (ii) were comparable to or better than DIS's (where comparable means falling within 0.1 or 0.5 of the DIS log10 Z value, for small and large problems respectively), or (iii) were strictly better than DIS's. Note that in order to compare Abstraction Sampling to DIS fairly, we selected only a single set of Abstraction Sampling parameters (AOAS Rand CB-256).

Figure 6: Plots of AS and DIS on a representative large Grids problem. The dashed line marks the reference log10 Z value. The legend gives the algorithm and abstraction level, number of probes (#p), size of a probe (#n/p), and log10 Z error (est. error).

### 4.2 Analysis

**AOAS Performance.** The aggregation tables (Figure 5) show that for each benchmark (except Promedas) AOAS algorithms are among the top performers. Furthermore, AOAS stands out as the only best-performing algorithm when solving both large Grids and Linkage-Type4 problems. Note that, for Linkage-Type4 (Figure 5(c)), AOAS with Rel CB-8 is able to produce estimates for 41 problems, whereas the best of any other scheme, ORAS-DFS, is only able to produce estimates for 16 problems.

**AOAS vs. pAOAS.** A noticeable empirical difference between AOAS and pAOAS is their respective abilities to control their probe sizes with more refined abstractions. For example, within the Grids aggregation table (Figure 5(b)), both AOAS and pAOAS with Rel CB-0 have an average probe size of 3326; however, upon increasing the abstraction level of each individually, probe sizes for pAOAS explode. This is a crucial difference, as being able to temper the growth of probe sizes is instrumental to the scalability of the algorithms with increasing abstraction levels.

**AOAS vs. ORAS-DFS.** Despite the relationship stated in Theorem 1, AOAS regularly outperforms ORAS-DFS at the same abstraction level (see Figure 5). We believe that this is due to the smaller probe size of AOAS as compared with ORAS-DFS (for the same abstraction scheme), which we believe is impacted significantly by the cutset effect [Mateescu and Dechter, 2005]. Namely, the number of i-cutset variables in the AND/OR space is smaller than that of the corresponding OR space and, as noted earlier, since the heuristic is exact below the cutset, only the cutset variables are sampled by the abstraction sampling schemes.

**Comparing with Non-Abstraction-Sampling Schemes.** [Broka et al., 2018] already showed that earlier versions of AS are highly competitive against Weighted Mini-Bucket Importance Sampling [Liu et al., 2015] and IJGP-SampleSearch [Gogate and Dechter, 2011]. By superseding the performance of the earlier AS algorithms, we further solidify superiority over these non-AS state-of-the-art schemes. In addition, we also test against a new competitive sampling scheme, Dynamic Importance Sampling (DIS) [Lou et al., 2017; Lou et al., 2019], which produces probabilistic bounds as well as estimates in an anytime manner.
Through Table 1, we see not only that AOAS estimates fall within the bounds produced by DIS in most cases, but also that the estimates are comparable to or better than those of DIS. This is particularly true of hard problems. We also note that, unlike the AS algorithms (AOAS in particular), DIS was unable to generate estimates for Linkage-Type4 problems. Furthermore, the aggregated errors for AS are often smaller than those of DIS (Figure 5).

## 5 Summary and Conclusion

The paper presents a new algorithm for Abstraction Sampling over AND/OR search spaces that can accommodate any abstraction function. In particular, it frees AND/OR Abstraction Sampling from its earlier handicapping restriction of properness and opens up the horizon for far more scalable and effective performance. We provided analysis and extensive empirical evaluation over five benchmarks with large and hard problems, showing clearly that the new AOAS algorithm is overall superior to some of the most competitive state-of-the-art IS schemes and to previous AS schemes. We illustrated that, like previous AND/OR AS schemes, AOAS maintains the ability to exploit the decomposition expressed in AND/OR search spaces, yet it also has far better control of probe size, making it uniquely scalable. This gives AOAS the power to more smoothly interpolate between (stochastic) sampling and (systematic) search [Broka et al., 2018]. With our findings, we can now turn our attention to advancing the scheme along its central components, including the development of good abstraction functions beyond context-based abstractions, understanding the desired strength of a heuristic function and how it should be balanced with the strength of abstractions, and, finally, how to fit an abstraction level to a problem instance.

## References

[Bidyuk and Dechter, 2007] Bozhena Bidyuk and Rina Dechter. Cutset sampling for Bayesian networks. Journal of Artificial Intelligence Research (JAIR), 28:1–48, 2007.

[Broka et al., 2018] Filjor Broka, Rina Dechter, Alexander T. Ihler, and Kalev Kask. Abstraction sampling in graphical models. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2018), Monterey, California, USA, pages 632–641, 2018.

[Chen, 1992] P.-C. Chen. Heuristic sampling: A method for predicting the performance of tree searching programs. SIAM Journal on Computing, 21:295–315, 1992.

[Darwiche, 2009] A. Darwiche. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.

[Dechter and Mateescu, 2007] Rina Dechter and Robert Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73–106, 2007.

[Dechter and Rish, 2003] Rina Dechter and Irina Rish. Mini-buckets: A general scheme for bounded inference. Journal of the ACM, 50(2):107–153, 2003.

[Dechter, 2013] Rina Dechter. Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2013.

[Gogate and Dechter, 2011] Vibhav Gogate and Rina Dechter. SampleSearch: Importance sampling in presence of determinism. Artificial Intelligence, 175(2):694–729, 2011.

[Knuth, 1975] D. E. Knuth. Estimating the efficiency of backtracking algorithms. Mathematics of Computation, 29:121–136, 1975.

[Liu and Ihler, 2011] Qiang Liu and Alexander T. Ihler. Bounding the partition function using Hölder's inequality.
In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), Bellevue, Washington, USA, pages 849–856, 2011.

[Liu et al., 2015] Qiang Liu, John W. Fisher III, and Alexander T. Ihler. Probabilistic variational bounds for graphical models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 1432–1440, Montreal, Canada, 2015. Curran Associates, Inc.

[Lou et al., 2017] Qi Lou, Rina Dechter, and Alexander T. Ihler. Dynamic importance sampling for anytime bounds of the partition function. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, pages 3199–3207, 2017.

[Lou et al., 2019] Qi Lou, Rina Dechter, and Alexander Ihler. Interleave variational optimization with Monte Carlo sampling: A tale of two approximate inference paradigms. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

[Mateescu and Dechter, 2005] Robert Mateescu and Rina Dechter. AND/OR cutset conditioning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2005), 2005.

[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.

[Rizzo, 2007] Maria L. Rizzo. Statistical Computing with R. Chapman & Hall/CRC, 2007.

[Rubinstein and Kroese, 2007] Reuven Y. Rubinstein and Dirk P. Kroese. Simulation and the Monte Carlo Method. Wiley Series in Probability and Statistics, 2nd edition, Wiley, 2007.