# Lifted Weighted Mini-Bucket

Nicholas Gallo
University of California Irvine
Irvine, CA 92697-3435
ngallo1@uci.edu

Alexander Ihler
University of California Irvine
Irvine, CA 92697-3435
ihler@ics.uci.edu

Abstract

Many graphical models, such as Markov Logic Networks (MLNs) with evidence, possess highly symmetric substructures but no exact symmetries. Unfortunately, there are few principled methods that exploit these symmetric substructures to perform efficient approximate inference. In this paper, we present a lifted variant of the Weighted Mini-Bucket elimination algorithm which provides a principled way to (i) exploit the highly symmetric substructure of MLN models, and (ii) incorporate high-order inference terms, which are necessary for high-quality approximate inference. Our method has significant control over the accuracy-time trade-off of the approximation, allowing us to generate any-time approximations. Experimental results demonstrate the utility of this class of approximations, especially in models with strong repulsive potentials.

1 Introduction

Many applications require computing likelihoods and marginal probabilities over a distribution defined by a graphical model, tasks which are intractable in general [24]. This has motivated the development of approximate inference techniques with controlled computational cost. Inference in these settings often involves reasoning over a set of regions (subsets of variables), with larger regions providing higher accuracy at a higher cost. This paper builds on the Weighted Mini-Bucket (WMB) algorithm [10], which employs a simple heuristic method of region selection that mimics a variable elimination procedure.

Recently, there has been interest in modeling large problems with repeated potentials and structure, often described with a Markov Logic Network (MLN) language [16]. Such models arise in many settings such as social network analysis (e.g. estimating voting habits), collective classification (e.g. classifying text in connected web pages), and many others [16]. In these settings, lifted inference refers to a broad class of techniques, both exact [3, 15, 21] and approximate [4, 11, 13, 5, 12, 9], that exploit model symmetries. Most of these methods work well when the model possesses well-defined symmetries [4, 11, 13, 5, 12, 14], but break down in unpredictable ways in the presence of the unstructured model perturbations found in most practical settings. The problem of asymmetries in the approximate inference structure is compounded when higher-order inference terms (for which lifted inference requires higher-order model symmetry) are incorporated [14, 20, 5].

Methods to control computational cost in the presence of asymmetries are largely heuristic, such as [19], which presents a belief propagation procedure that approximates messages in a symmetric form. Other works create an over-symmetric approximate model [23, 22] on which inference is run, but provide no guarantees on its relation to the original problem. Similar to our work, [17] employs (non-weighted) mini-bucket inference; however, it too relies on over-symmetric model approximation heuristics to control computational cost.

This paper addresses the shortcomings of the methods described above with a lifted variant of weighted mini-bucket (LWMB) that is able to (i) trade off inference cost with accuracy in a controlled
manner in the presence of asymmetries, and (ii) incorporate higher-order approximate inference terms, which are crucial for high-quality inference. This work can be seen as a high-order (property (ii)) extension of [9] (which is qualitatively identical in property (i)). Additionally, this work employs efficient region selection and representation for MLN models, and hence never grounds the graph, as many other methods must do for symmetry detection (e.g. [9, 20, 14]).

2 Background

A Markov random field (MRF) over $n$ discrete random variables (RVs) $X = [X_1 \ldots X_n]$ taking values $x = [x_1 \ldots x_n] \in \mathcal X_1 \times \cdots \times \mathcal X_n$ has probability density function
$$p(X = x) = \frac{1}{Z} \prod_{\alpha \in I} f_\alpha(x_\alpha), \qquad Z = \sum_x \prod_{\alpha \in I} f_\alpha(x_\alpha)$$
where $I$ indexes subsets of variables and each $\alpha \in I$ is associated with a potential table $f_\alpha$. The partition function $Z$ normalizes the distribution. Calculating $Z$ is a central problem in many learning and inference tasks, but exact evaluation of the summation is exponential in $n$, and hence intractable.

2.1 Bucket and Mini-Bucket Elimination

Bucket Elimination (BE) [6] is an exact inference algorithm that directly eliminates RVs along a sequence $o$ called the elimination order. Without loss of generality, we assume that each factor index $\alpha \in I$ is ordered according to $o$. BE operates by performing the summation defining $Z$ along each RV in sequence. The computation is organized with a set of buckets $B_1 \ldots B_n$, where initially each $B_v = \{f_\alpha \mid \alpha_1 = v\}$ is the set of model factors whose earliest-eliminated RV index is $v$. Proceeding sequentially along $o$, we multiply the factors in $B_v$, then sum over $x_v$, producing a message
$$m_{v \to w}(x_{pa(v)}) = \sum_{x_v} \prod_{\alpha \in B_v} f_\alpha(x_\alpha) \qquad (1)$$
where $pa(v)$ is the set of arguments of factors in $B_v$ not including $v$. The message is then placed in bucket $B_w$; if $pa(v) = \emptyset$ the message is a scalar. All such scalar messages are multiplied to form $Z$. The computational cost is exponential in the scope of the largest message, which is prohibitive in most applications.

Mini-Bucket Elimination (MBE). MBE [7] avoids the complexity of BE by upper (or lower) bounding the message (1) with a product of terms, each over a controlled number (the iBound) of RVs. During elimination, factors in bucket $B_v$ are grouped into a partition $Q_v = \{q^1_v \ldots q^k_v\}$, where each $q^j_v \in Q_v$ is called a mini-bucket and is associated with factors that (collectively) use at most iBound + 1 RVs. The true message is bounded using the inequality
$$\sum_{x_v} \prod_{\alpha \in B_v} f_\alpha(x_\alpha) \;\le\; \Big(\sum_{x_v} \prod_{\alpha \in q^1_v} f_\alpha(x_\alpha)\Big) \prod_{j=2}^{k} \Big(\max_{x_v} \prod_{\alpha \in q^j_v} f_\alpha(x_\alpha)\Big) \qquad (2)$$
Each mini-bucket message is an upper bound on the exact message, hence the full procedure yields an upper bound on $Z$.

2.2 Weighted mini-bucket elimination (WMB)

WMB [10] generalizes MBE by using a tighter bound based on Hölder's inequality,
$$\sum_x g(x)\, h(x) \;\le\; \Big({\sum_x}^{w_1} g(x)\Big) \Big({\sum_x}^{w_2} h(x)\Big), \quad \text{where} \quad {\sum_x}^{w} f(x) = \Big(\sum_x f(x)^{1/w}\Big)^{w} \qquad (3)$$
is the power-sum operator, $w_1 + w_2 = 1$, $w_i \ge 0$, and $g(x) \ge 0$, $h(x) \ge 0$. The power-sum reduces to the standard sum when $w = 1$ and approaches $\max_x f(x)$ as $w \to 0^+$. Thus, Hölder's inequality generalizes the inequality $\sum_x g(x)\, h(x) \le \big(\sum_x g(x)\big)\big(\max_x h(x)\big)$ used by mini-bucket elimination (MBE).

WMB associates a weight $w_q \ge 0$ with each mini-bucket $q \in Q_v$, where $\sum_{q \in Q_v} w_q = 1$ for all $v$, then forms the bound
$$\sum_{x_v} \prod_{\alpha \in B_v} f_\alpha(x_\alpha) \;\le\; \prod_{q \in Q_v} {\sum_{x_v}}^{w_q} \prod_{\alpha \in q} f_\alpha(x_\alpha). \qquad (4)$$
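As a concrete illustration of the power-sum (3) and the Hölder bound underlying WMB, the following minimal numpy sketch (function and variable names are ours, not from the paper's implementation) checks the single-variable, two-factor case:

```python
import numpy as np

def power_sum(f, w, axis=0):
    """Weighted power-sum of eq. (3): (sum_x f(x)^(1/w))^w.
    Reduces to the ordinary sum at w = 1 and approaches max as w -> 0+."""
    return np.sum(f ** (1.0 / w), axis=axis) ** w

rng = np.random.default_rng(0)
g = rng.random(4)      # factor g(x) over one variable with 4 states
h = rng.random(4)      # factor h(x) over the same variable

exact = np.sum(g * h)                         # exact elimination of x
wg, wh = 0.5, 0.5                             # weights summing to 1
bound = power_sum(g, wg) * power_sum(h, wh)   # Holder upper bound

assert bound >= exact - 1e-12                 # the bound (3) holds
print(exact, bound)
```

With $w_g = w_h = 0.5$ this is the Cauchy-Schwarz special case of Hölder's inequality; pushing either weight toward 0 recovers the sum-max bound of MBE.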
Variational Optimization. The weights can be optimized to provide tighter bounds. Additionally, applying WMB to any reparameterization of the distribution yields a bound on $Z$; thus it makes sense to optimize over all valid reparameterizations as well. Each reparameterization is obtained by shifting factor potentials between mini-buckets. That is, for each $v$, associated with each mini-bucket $q \in Q_v$ is a cost-shifting parameter $\phi_q(x_v)$ such that
$$m_{q \to q'}(x_{q'}) = {\sum_{x_v}}^{w_q} f^{\phi}_q(x_q), \quad \text{where} \quad f^{\phi}_q(x_q) = \phi_q(x_v)^{-1} \prod_{\alpha \in q} f_\alpha(x_\alpha) \quad (\forall q \in Q_v) \qquad (5)$$
is the reparameterized potential of mini-bucket $q$. The cost-shifting terms that were divided out of each $q \in Q_v$ are multiplied into an aggregated cost-shifting term
$$\phi^0_v(x_v) = \prod_{q \in Q_v} \phi_q(x_v) \qquad (6)$$
such that $\prod_{q \in Q_v} f_q(x_q) = \big(\prod_{q \in Q_v} f^{\phi}_q(x_q)\big)\, \phi^0_v(x_v)$. We then have the following bound (rather than (4)) on the exact BE message:
$$\sum_{x_v} \prod_{\alpha \in B_v} f_\alpha(x_\alpha) \;\le\; \Big({\sum_{x_v}}^{w_v} \phi^0_v(x_v)\Big) \prod_{q \in Q_v} m_{q \to q'}(x_{q'}) \qquad (7)$$
Augmented with a weight $w_v \ge 0$, the aggregated term is simply another factor in the product bounded with Hölder's inequality; we now require $w_v + \sum_{q \in Q_v} w_q = 1$. We search for the tightest bound by performing convex optimization over $(\delta, w)$, where $\log(\phi_q) = \delta_q$ for all $q$. Gradients can be computed and a black-box solver used, or a fixed-point iteration [10] can be applied.

2.3 Symmetric models and lifted inference

Many models of interest, such as MLNs, are defined by repeated potentials organized in a symmetric structure. Lifted inference refers to a broad class of techniques that exploit this structure for exact or approximate inference. The basic idea is to represent identical terms in the model, and identical terms generated during ground inference, implicitly with a single template.

The simplest form of symmetry used for lifted inference is based on the stable vertex coloring of a graph [1], in which two vertices of the same color have identically colored neighborhoods. In the context of lifted inference, we require a stable coloring of the ground factor graph where factor nodes of the same color are required to have the same ordered node neighborhood and the same factor potential table. Nodes of the same color behave identically during approximate inference (e.g. [18, 12]). RV nodes of the same color are grouped together to form an index set $V_i \subseteq \{1 \ldots n\}$; we denote $\mathcal V = \{V_1 \ldots V_N\}$. Factor nodes of the same color are grouped together to form $A_j \subseteq I$; we denote $\mathcal I = \{A_1 \ldots A_M\}$. Thus, given a stable partition (coloring) we can define

Definition 2.1. The lifted scope of $A \in \mathcal I$ (relative to $\mathcal V$) is $\sigma^A = [V_{b_1} \ldots V_{b_k}]$, where each $\alpha \in A$ has $|\alpha| = k$ and $\alpha_i \in \sigma^A_i$ for $i = 1 \ldots k$.

This simple symmetry will help us organize higher-order symmetries throughout the LWMB algorithm. Note that the lifted scope (unlike the scope of a ground factor) may have repeated elements. This occurs, for example, in a complete symmetric graph, where $N = 1$ and $\sigma^A = [V_1, V_1]$.
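A stable coloring of this kind can be computed by standard color refinement. Below is a minimal Python sketch (our own simplification: it compares unordered multisets of neighbor colors, whereas the factor-graph coloring above additionally requires ordered factor neighborhoods and matching potential tables, which can be encoded into the initial colors):

```python
def color_refinement(neighbors, init_color):
    """Refine node colors until stable: two nodes keep the same color only
    if their own colors and the multisets of their neighbors' colors agree.
    `neighbors[v]` lists v's neighbors; `init_color[v]` is any hashable,
    mutually comparable initial color (e.g. "rv" vs. a potential-table id)."""
    color = dict(init_color)
    while True:
        # signature = own color + sorted multiset of neighbor colors
        sig = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
               for v in neighbors}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_color = {v: relabel[sig[v]] for v in neighbors}
        if new_color == color:     # partition no longer refines: stable
            return color
        color = new_color

# toy factor graph: RVs x1..x4 on a 4-cycle with identical pairwise factors
nb = {"x1": ["f12", "f41"], "x2": ["f12", "f23"], "x3": ["f23", "f34"],
      "x4": ["f34", "f41"], "f12": ["x1", "x2"], "f23": ["x2", "x3"],
      "f34": ["x3", "x4"], "f41": ["x4", "x1"]}
init = {v: "rv" if v.startswith("x") else "factor" for v in nb}
print(color_refinement(nb, init))   # one RV color, one factor color
```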
2.3.1 Markov Logic Networks

A Markov Logic Network (MLN) [16] defines a large symmetric model implicitly via a compact first-order logic (FOL) language. The MLN predicates define the set of RVs, and each MLN formula defines a set of factors with identical potential. Both are parameterized compactly by a set of logical variables (LVs) taking values in a finite domain (for example, the set of all people $\Delta_P$). The predicates of an MLN represent an attribute associated with domain elements or a relationship among domain elements. The instantiation of a predicate with a domain element is the index of a model RV. For example, the attribute predicate $Sm$ (for "smokes") over the domain of all people $\Delta_P$ corresponds to the set of ground RV indices $\{Sm(y) \mid y \in \Delta_P\}$ (meaning that $Sm(Ana)$ indexes $x_{Sm(Ana)}$). An example relationship predicate $Fr$ (for "friends") among all pairs of people is the set of indices $\{Fr(x, y) \mid x \neq y \in \Delta_P\}$.¹ A formula specifies a soft-logic rule applied identically to all people (or groups of people).

An example relating smoking habits between friends is $\langle \forall\, y \neq z \in \Delta_P:\; Fr(y, z) \Rightarrow (Sm(y) \Leftrightarrow Sm(z)),\ \gamma \rangle$. This corresponds to the set of ground factor index vectors $R = \{[Fr(y, z), Sm(y), Sm(z)] \mid y \neq z \in \Delta_P\}$, where for each $\alpha \in R$, $f_\alpha(x_\alpha) = \bar f_R(\bar x_R)$ and $\bar f_R(\bar x_R)$ is a template whose log-potential takes value $\gamma$ if $Fr(y, z) \Rightarrow (Sm(y) \Leftrightarrow Sm(z))$ is true and 0 otherwise ($Fr(y, z)$ corresponds to $\bar x_1$ in the template). The FOL expressions defining formulas can be arbitrary, but they often take the form of simple domain constraints on LVs, together with all-diff constraints on LVs ranging over identical domains (note that all-diff$(y, z)$ is equivalent to the constraint $y \neq z$ used in the Fr-Sm formula above). The stable coloring of the factor graph groups the RVs of each predicate together and the factors of each formula together. Formulas of this form also possess many higher-order symmetries, which we exploit later.

3 Lifted Weighted Mini-Bucket (LWMB)

This section presents a variant of WMB that operates on lifted factors, each of which is a group of identical ground factors, and eliminates blocks of random variables simultaneously. A key difficulty is choosing an approximating structure that guarantees that symmetric messages (which can be represented as lifted factors) are produced and, furthermore, that allows forming high-order (high-iBound) symmetric inference terms. We first discuss these operations in models that possess the necessary symmetric structure, then discuss modifications that allow us to control the size of the LWMB graph in the presence of model asymmetries. Algorithm 1 summarizes the LWMB tree construction algorithm (similar to ground mini-bucket construction [7]) developed in this section.

3.1 LWMB in Symmetric Models

First-order symmetries, specified by a stable partition of variables $\mathcal V = \{V_1 \ldots V_N\}$ and model factors $\mathcal I = \{A_1 \ldots A_M\}$, are necessary to provide a LWMB bound. We further require that the lifted scope $\sigma^A$ not have repeated elements (we relax both of these restrictions later). The computations of LWMB will be described by their equivalence to a set of ground operations. To this end, we define

Definition 3.1. A lifted factor $F_G(x_G) = \prod_{\alpha \in G} \bar f_G(x_\alpha)$ is the product of the template potential $\bar f_G(\bar x)$ applied to all sets of ground RVs indexed by elements of $G$, which range over the same domain as the $\bar x$ used to define the template.²

Blocks of ground RVs indexed by $V \in \mathcal V$ are eliminated simultaneously, along a lifted elimination order $\bar O$. We assume the lifted scopes $\sigma^A$ of all lifted factors in the input, and of those generated during inference, are ordered by $\bar O$. The computation is organized with a set of buckets $\{B_{V_1} \ldots B_{V_N}\}$, where initially each $B_V = \{F_A \mid \sigma^A_1 = V\}$ is the set of lifted model factors whose earliest-eliminated lifted RV block is $V$.

Lifted multiplication. Having collected lifted factors in lifted buckets, an RV partition $V$ is processed by first forming lifted mini-buckets $Q_V = \{q^1_V \ldots q^k_V\}$, each of which groups together and multiplies lifted factors. The lifted product corresponds to a product of ground terms and may, in general, not have a lifted factor representation. This situation can cause the symmetric structure to break in arbitrarily complex ways (e.g., during lifted variable elimination; see [15, 21]). We need to understand lifted multiplication to design LWMB bounds that avoid this situation.

¹ In general, a predicate or formula can have more than 2 LVs and more than one domain type.
² $x_G$ abuses notation, referring to $x_{\bar G}$ where $\bar G = \bigcup_{\alpha \in G} \alpha$ is the set of all RVs used by elements of $G$.

Algorithm 1 LWMB Tree Build
 1: Input: lifted model factors $\mathcal I$, RV partition $\mathcal V$, lifted elimination order $\bar O$, iBound
 2: $B_V = \{F_A \mid \sigma^A_1 = V,\ A \in \mathcal I\}$, $\forall V \in \mathcal V$; initialize an empty set of messages
 3: for ($V$ = First($\bar O$); $V \neq \emptyset$; $V$ = Next($V, \bar O$)) do
 4:   $Q_V \leftarrow \{\{F\} \mid F \in B_V\}$  ▷ each lifted factor or message starts as its own mini-bucket
 5:   repeat
 6:     Select $q, q' \in Q_V$ s.t. the lifted multiply $\prod_{A \in q \cup q'} F_A(x_A)$ is valid and has template size $\le$ iBound + 1
 7:     $Q_V \leftarrow Q_V \cup \{q \cup q'\} \setminus \{q, q'\}$  ▷ merge mini-buckets, delete the old ones
 8:     For all $b$, replace $M_{b \to q}$ or $M_{b \to q'}$ with $M_{b \to (q \cup q')}$  ▷ re-route incoming messages
 9:   until no such pair $q, q'$ exists
10:   for $q \in Q_V$ do  ▷ simulate a mini-bucket message pass
11:     $F_C = \prod_{A \in q} F_A(x_A)$  ▷ $C$: ground regions associated with the product
12:     Set $p$ to the indices $i$ where $\sigma^C_i = \sigma^C_1$
13:     $D = \{c_{\setminus p} \mid c \in C\}$, $\sigma^D = \sigma^C_{\setminus p}$  ▷ $D$: scope indices of the ground messages
14:     Add $F_D$ to $B_{\sigma^D_1}$ and add the message pointer $m_{q \to D}$
15:   end for
16: end for
17: Output: mini-buckets $\{Q_V \mid V \in \mathcal V\}$ and message structure $\{m_{q \to q'} \mid q \in \bigcup_{V \in \mathcal V} Q_V\}$
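To make Definition 3.1 concrete, a lifted factor can be stored as a template potential plus its ground index set $G$. The hypothetical sketch below (the representation and all names are ours) instantiates the Fr-Sm formula over a three-person domain:

```python
import itertools
import numpy as np

class LiftedFactor:
    """Template potential applied to every ground tuple in an index set G
    (Definition 3.1): F_G(x_G) = prod_{alpha in G} f_G(x_alpha)."""
    def __init__(self, template, groundings):
        self.template = template      # np.array, one axis per template RV
        self.G = list(groundings)     # each element: a tuple of ground RV names

    def log_value(self, x):
        """Evaluate log F_G at a full assignment dict x: RV name -> state."""
        logf = np.log(self.template)
        return sum(logf[tuple(x[v] for v in alpha)] for alpha in self.G)

P = ["ana", "bob", "carl"]
gamma = 0.5
# template over (Fr(y,z), Sm(y), Sm(z)): log-potential gamma iff
# Fr => (Sm(y) <=> Sm(z)) holds, 0 otherwise
f = np.ones((2, 2, 2))
for fr, sy, sz in itertools.product(range(2), repeat=3):
    if (not fr) or (sy == sz):
        f[fr, sy, sz] = np.exp(gamma)

G = [(f"Fr({y},{z})", f"Sm({y})", f"Sm({z})")
     for y, z in itertools.permutations(P, 2)]   # all-diff constraint y != z
F = LiftedFactor(f, G)

x = {v: 0 for alpha in G for v in alpha}         # an all-false assignment
print(F.log_value(x))                             # 6 groundings * gamma = 3.0
```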
To compute the lifted product $F_T(x_T) = F_R(x_R)\, F_S(x_S)$, we require a symmetric join to exist for given index vectors $p$ and $q$. This means there exists a $T$ where for each $t \in T$, $t_p \in R$ and $t_q \in S$, and furthermore that for each $r \in R$, $|\{t \mid t_p = r,\ t \in T\}| = |T|/|R|$, meaning that each $r$ participates in the same number of elements of $T$ at positions $p$ (and similarly for $S$). We then have
$$F_T(x_T) = \prod_{t \in T} \bar f_T(x_t) = \prod_{t \in T} \bar f_R(x_{t_p})^{|R|/|T|}\, \bar f_S(x_{t_q})^{|S|/|T|} \qquad (8)$$
with template $\bar f_T(\bar x) = \bar f_R(\bar x_p)^{|R|/|T|}\, \bar f_S(\bar x_q)^{|S|/|T|}$. In the simplest case, $|R| = |S| = |T|$ and there is a one-to-one mapping, corresponding to a series of standard ground multiplications. Otherwise, it corresponds to spreading a ground factor across many identical (up to renaming of RVs) ground multiplications. If the symmetric join does not exist, we say the lifted multiplication is invalid. Only one choice of $p$ and $q$ can be valid when the lifted scopes have unique elements. The lifted scope of the product (if valid) is $\sigma^T = [\sigma^R, \sigma^S]$; hence we set $p$ and $q$ such that $\sigma^T_p = \sigma^R$ and $\sigma^T_q = \sigma^S$. This matches the lifted factors on first-order symmetries, which is a necessary but not sufficient condition for higher-order symmetries.

Symmetric join with FOL formulas. If $R$ and $S$ are represented with FOL formulas and each contains only domain constraints, the symmetric join can be performed quickly (or its non-existence determined). If the domain constraints of $R$ and $S$ are either disjoint or identical, we simply match the two on their LVs with identical domains. For example, multiplying factors defined by $\{[A(x), B(x)] \mid x \in \Delta_1\}$ and $\{[A(x), C(x)] \mid x \in \Delta_1\}$ produces $\{[A(x), B(x), C(x)] \mid x \in \Delta_1\}$. As another example, $\{[A(x), B(x)] \mid x \in \Delta_1\}$ and $\{[A(x), C(z)] \mid x \in \Delta_1, z \in \Delta_3\}$ will produce $\{[A(x), B(x), C(z)] \mid x \in \Delta_1, z \in \Delta_3\}$ if $\Delta_1 \cap \Delta_3 = \emptyset$. The main algorithm in Section 4 uses many symmetric lifted factors of this form.

Lifted cost-shifting. For each $V$, each lifted mini-bucket $q \in Q_V$ is associated with a weight $W_q$ and a cost-shifting lifted factor $\Phi^q_V(x_V)$ with template $\bar\phi_q(\bar x)$. We form (analogous to (5))
$$F^{\Phi}_Q(x_Q) = \Phi^q_V(x_V)^{-1} \prod_{A \in q} F_A(x_A) \qquad (9)$$
where each $F_A$ is a lifted factor arising from the model or from a message in the mini-bucket, and $Q$ is the set of ground factors associated with the product of all of mini-bucket $q$'s lifted factors.

Figure 1: (a) Ground graph: a symmetric graph with potential $f$ on each edge and a distinct unary potential at each node. (b) Initial LWMB graph with partition $A = \{a(1), a(2)\}$, $B = \{b(1), b(2), b(3)\}$ and lifted elimination order $(A, B)$. (c) LWMB graph after $B$ is split into $B(1) = \{b(1)\}$ and $B(2) = \{b(2), b(3)\}$. Each RV partition is connected to its associated ground RVs by a horizontal edge through a solid square node; every other horizontally oriented edge (e.g. between nodes $(A)$ and $(A, B)$ in panel (b)) is associated with a cost-shifting term.
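A minimal sketch of the FOL symmetric join described above, under the stated assumption that per-LV domain constraints are either identical or disjoint (all-diff constraints and the participation-count check are omitted; the (atoms, domains) representation is ours):

```python
def symmetric_join(R, S):
    """Join two FOL-style lifted factor scopes. R and S are (atoms, domains)
    pairs, e.g. R = ([("A", "x"), ("B", "x")], {"x": "D1"}). LVs of R and S
    with the same domain are identified; remaining LVs are kept as-is (their
    domains are assumed disjoint from all others). Returns None if a factor
    uses one domain for two LVs, where the match would be ambiguous."""
    atoms_R, dom_R = R
    atoms_S, dom_S = S
    if len(set(dom_R.values())) < len(dom_R) or len(set(dom_S.values())) < len(dom_S):
        return None
    rename = {}                       # rename S's LVs to R's on domain match
    for lv_s, d in dom_S.items():
        match = [lv_r for lv_r, dr in dom_R.items() if dr == d]
        rename[lv_s] = match[0] if match else lv_s
    atoms_S = [(pred, rename[lv]) for pred, lv in atoms_S]
    doms = dict(dom_R, **{rename[lv]: d for lv, d in dom_S.items()})
    atoms = atoms_R + [a for a in atoms_S if a not in atoms_R]
    return atoms, doms

# second example from the text: {[A(x),B(x)] | x in D1} joined with
# {[A(x),C(z)] | x in D1, z in D3}
R = ([("A", "x"), ("B", "x")], {"x": "D1"})
S = ([("A", "x"), ("C", "z")], {"x": "D1", "z": "D3"})
print(symmetric_join(R, S))
# ([('A','x'), ('B','x'), ('C','z')], {'x': 'D1', 'z': 'D3'})
```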
Lifted message passing. The message that mini-bucket $q$ (with ground regions $Q$) sends should be a lifted factor equal to the product of the identical messages sent from each $\alpha \in Q$. The result is a lifted factor over $D = \{\alpha \setminus \alpha_1 \mid \alpha \in Q\}$. Since each $v \in V$ appears $|Q|/|V|$ times in $Q$ (due to symmetry), each ground factor should be eliminated with weight $w_q = W_q/(|Q|/|V|)$, yielding the template
$$\bar m_{q \to q'}(\bar x \setminus \bar x_1) = \Big({\sum_{\bar x_1}}^{w_q} \bar f^{\Phi}_Q(\bar x)\Big)^{|Q|/|D|} \qquad (10)$$
The power $|Q|/|D|$ arises since each $d \in D$ receives (by symmetry) $|Q|/|D|$ copies of the message. The lifted factor message, denoted $M_{q \to q'}(x_D)$, has template $\bar m_{q \to q'}$ applied at all indices in $D$.

3.2 Handling asymmetries

The exact symmetries necessary to perform lifted inference rarely exist in practice. In the extreme case, model asymmetries cause lifted algorithms to ground the model. Here, we extend LWMB to handle asymmetries (i) induced by the elimination order (to prevent, for example, grounding a complete symmetric graph), and (ii) induced by unstructured unary evidence.

Sequential Asymmetries. Suppose for a lifted factor $F_G$ there are $K = |\{i \mid \sigma^G_i = \sigma^G_1\}| > 1$ copies of the earliest variable partition in the lifted scope $\sigma^G$. In this case, any ground elimination order will treat RVs in the partition differently (and hence requires grounding). The way around this is to eliminate the $K$ RVs simultaneously, ${\sum_{\bar x_K}}^{w_q} \cdots\, {\sum_{\bar x_1}}^{w_q} \bar f_G(\bar x)$, where $w_q = W_q/\big((|Q|/|V|) \cdot K\big)$. This can be justified by applying Hölder's inequality with any elimination order and appropriately tied weights, noting that power-sums with tied weights commute (details omitted for space).

Distinct Unary Evidence. The LWMB bound can be modified to incorporate distinct single-RV potentials. The trick (similar to [9]) is to aggregate the lifted cost-shifting terms and multiply the result with the ground unary terms. That is, for each $v \in V$ we compute
$${\sum_{x_v}}^{w_V} f_v(x_v)\, \bar\phi_V(x_v) \qquad (11)$$
where $\bar\phi_V(\bar x) = \prod_{q \in Q_V} \bar\phi^q_V(\bar x)$.³ Figure 1b illustrates a LWMB graph with aggregated approximate evidence terms.

4 Coarse-to-fine LWMB for MLN models

In this section we build a sequence of LWMB approximations of gradually increasing accuracy and computational cost. Starting with a LWMB tree using the coarsest possible partition, we iteratively improve the approximation with Splitting and Joining operations. Splitting partitions the cost-shifting parameters (eq. (11)) into finer groupings, allowing a more flexible interaction with evidence. Joining incorporates higher-order inference terms by performing LWMB Tree Build with a higher iBound. This procedure is summarized in Algorithm 2 and described in this section. An example of the effect of splitting on the LWMB graph of Figure 1b is shown in Figure 1c.

³ Identical potentials, such as hard evidence, can be grouped into a single computation (omitted for space).

Algorithm 2 Coarse-to-Fine LWMB for MLNs
 1: Initialize: choose an elimination order $\bar O$ on the MLN predicates
 2: Initialize: build the LWMB tree from the MLN predicates and formulas
 3: repeat
 4:   Associate unary evidence with lifted RVs  ▷ to compute (11) during inference
 5:   Optimize the bound over $(\delta, W)$
 6:   $F_Q \leftarrow F^{\Phi}_Q / M_{q \to q'}$, $\forall q$  ▷ see Maintaining Monotonicity
 7:   Choose a domain partition $\Delta_d \to \{\Delta^1_d, \Delta^2_d\}$ via gradient clustering or another method
 8:   Split $(\delta, W)$, $\mathcal V$, and the lifted factors that use domain $\Delta_d$  ▷ Section 4.1
 9:   Update the lifted elimination order $\bar O$
10:   Build the LWMB tree with the new lifted mini-bucket regions and RVs as input
11: until the exact answer is computed
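Before turning to splitting, here is a small numpy sketch of the lifted message template (10) combined with the tied-weight simultaneous elimination of Section 3.2; the array layout and the weight formula $w_q = W_q/((|Q|/|V|) \cdot K)$ follow our reading of the text, and all names are ours:

```python
import numpy as np

def power_sum(f, w, axis=0):
    """Weighted power-sum (eq. 3) along one axis."""
    return np.sum(f ** (1.0 / w), axis=axis) ** w

def lifted_message(f_phi, W_q, V, Q, D, K=1):
    """Template of the lifted message (10). `f_phi` is the reparameterized
    template potential with the K axes of the eliminated RV block first;
    V, Q, D are |V|, |Q|, |D|. Each ground elimination uses the tied weight
    w_q = W_q / ((Q/V) * K); the result is raised to the power Q/D since
    each element of D receives that many identical copies (by symmetry)."""
    w = W_q / ((Q / V) * K)
    m = f_phi
    for _ in range(K):             # eliminate the K tied copies in sequence
        m = power_sum(m, w, axis=0)
    return m ** (Q / D)

# toy sizes: |V| = 3 RVs, |Q| = 6 ground factors, |D| = 3 message scopes
f = np.random.default_rng(1).random((2, 2))   # pairwise template, K = 1
print(lifted_message(f, W_q=0.5, V=3, Q=6, D=3, K=1))
```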
4.1 Splitting

LWMB splitting is similar to the splitting operation presented in [9] for factor-graph models, but operates by partitioning a group of MLN domain elements rather than a group of variational parameters (as in [9]). That is, we partition a single domain $\Delta$ into two disjoint domains ($\Delta^{(1)} \cup \Delta^{(2)} = \Delta$ and $\Delta^{(1)} \cap \Delta^{(2)} = \emptyset$). Then, we split every lifted factor and RV partition that uses $M \ge 1$ LVs with domain $\Delta$ into $2^M$ finer lifted factors. An important property of this splitting scheme is that lifted factors with FOL form are split into lifted factors with FOL form.

Example 1. A variable partition $V = \{Sm(y) \mid y \in \Delta\}$ splits into $\{\{Sm(y) \mid y \in \Delta^{(i)}\} \mid i \in \{1, 2\}\}$. A lifted factor $F_R$ with $R = \{[Fr(y, z), Sm(y), Sm(z)] \mid y \neq z \in \Delta\}$ splits into 4 lifted factors $F_{R'}$, where $\bar f_{R'} = \bar f_R$, $w_{R'} = w_R$, and $R' \in \{\{[Fr(x, y), Sm(x), Sm(y)] \mid x \neq y,\ x \in \Delta^{(i)},\ y \in \Delta^{(j)}\} \mid (i, j) \in \{1, 2\}^2\}$.

Another important property is that lifted factors with $M > 1$ LVs over the same domain split into lifted factors whose LVs all have distinct domains. Lifted factors of this form can participate in lifted multiplications resulting in higher-order joins (Section 3). For example, in Example 1, when $i \neq j$ the $x \neq y$ constraint is superfluous and can be dropped.

Updating the lifted elimination order. A domain split causes an RV partition $V \in \mathcal V$ to split into $V^1, \ldots, V^{2^M}$. Since the previous LWMB bound eliminated the RVs $x_V$ simultaneously, we must eliminate the blocks $V^1, \ldots, V^{2^M}$ sequentially, in the same position in $\bar O$ relative to the other RV partitions. For example, in Figure 1c, $(A, B(1), B(2))$ is a valid elimination order while $(B(2), A, B(1))$ is not.

Maintaining Monotonicity. Modifications to the LWMB tree (by either splitting or joining) can cause unpredictable changes in the message structure. Qualitatively, we must ensure that each new region can send a forward message, creating new regions as necessary. Such structural modifications to the flow of messages can cause an increase in the bound. To guarantee a monotonically improving bound, we replace each lifted factor with the cost-shifted mini-bucket function divided by its forward message, $F_Q \leftarrow F^{\Phi}_Q / M_{q \to q'}$ (a similar idea was used in the ground case in [8]). This is simply a reparameterization of the model, but it ensures that each node in the LWMB tree sends a uniform message of all 1's. Hence, after a split or join we can simply call LWMB Tree Build with the reparameterized terms and guarantee a monotonic bound improvement. In practice, we apply this technique only to the nodes in the tree affected by a split or join operation, leaving the message structure of other nodes unchanged.

Choosing the domain partition. The goal is to cheaply find a reasonably good split grouping. A similar problem was considered in [9], which computes a 2-way clustering of the gradient of the inference objective with respect to the variational parameters. The main idea is that if parameters are constrained to be identical (for the computational benefit of lifting) but their greedy unconstrained next steps would have been similar, then the lifting restriction incurs little error. Here, we perform a similar operation; however, a 2-way split of a domain partition $\Delta_d$ induces a split of many parameters, those associated with lifted RVs that use domain $\Delta_d$, into 2 groups. Our clustering objective is a sum of the squared errors of all of these terms (details omitted for space).

Figure 2: Repulsive ($\gamma < 0$) vs. attractive ($\gamma > 0$) collective classification example, plotting the $\log Z$ upper bound against inference time (seconds) for iBound = 5, iBound = 2, and "No Join": (a) $\gamma = -0.5$, (b) $\gamma = -0.05$, (c) $\gamma = +0.05$. (a)-(b) A high iBound is extremely important in the presence of strong repulsive potentials, (c) but performs slightly worse than the baseline ("No Join") in the attractive case. Dashed black lines indicate when a batch of splitting occurs for the blue curve (split transitions for other curves occur at similar locations).
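To make the splitting bookkeeping concrete, the hypothetical sketch below (a variant of the earlier (atoms, domains) representation, with predicate arguments as LV tuples) splits a domain and reproduces the four copies of Example 1; templates and weights are shared by all copies, so only the index-set descriptions change:

```python
import itertools

def split_domain(factors, d, d1, d2):
    """Split of Section 4.1: partition domain `d` into new domains `d1`, `d2`.
    A lifted factor whose index set uses M logical variables of domain `d`
    splits into 2^M finer factors, one per assignment of those LVs to
    {d1, d2}. Each factor is an (atoms, domains) pair."""
    out = []
    for atoms, doms in factors:
        lvs = [lv for lv, dom in doms.items() if dom == d]
        if not lvs:
            out.append((atoms, doms))     # factor does not use domain d
            continue
        for choice in itertools.product([d1, d2], repeat=len(lvs)):
            new_doms = dict(doms)
            new_doms.update(zip(lvs, choice))
            out.append((atoms, new_doms))
    return out

# Example 1: the Fr-Sm factor has two LVs over domain "P", so it splits 4 ways
R = ([("Fr", ("y", "z")), ("Sm", ("y",)), ("Sm", ("z",))],
     {"y": "P", "z": "P"})
for atoms, doms in split_domain([R], "P", "P1", "P2"):
    print(doms)
# {'y':'P1','z':'P1'}, {'y':'P1','z':'P2'}, {'y':'P2','z':'P1'}, {'y':'P2','z':'P2'}
```

Note that in the two mixed copies the LVs range over disjoint domains, so the $y \neq z$ all-diff constraint becomes vacuous and can be dropped, which is what enables the higher-order joins of Section 3.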
5 Experiments

This section provides an empirical illustration of our LWMB algorithm. We demonstrate the superiority of high-order LWMB approximations for models with repulsive potentials. In models with strictly attractive potentials, low-order approximations work slightly better, likely due to their ability to split more, obtaining better approximations of the evidence.

Setup. We consider a standard collective classification MLN with formula $\langle \forall\, x \neq y \in \Delta:\ L(x, y) \Rightarrow (C(x) \Leftrightarrow C(y)),\ \gamma \rangle$. If $\gamma < 0$, a hard true observation on link $L(x, y)$ induces a repulsive potential between $C(x)$ and $C(y)$; $\gamma > 0$ induces attractive potentials. We run experiments with $N = |\Delta| = 512$ and clustered evidence. We randomly assign elements of $\Delta$ to one of $K = 16$ clusters. Evidence on the $C$ predicate has potentials $[0; a]$: each cluster generates a (scalar) center drawn from $N(0, 2)$, and each member of the cluster is then perturbed from its center by $N(0, 0.4)$ noise. Relational evidence on $L$ is generated as follows: each of the $K$ blocks has all-true evidence with each other block with probability 0.25; we then flip 25% of the evidence uniformly at random.

Optimization. We call a black-box convex optimizer (non-linear conjugate gradients), allowing a maximum of 1000 function evaluations. The gradient of the LWMB objective (derivation omitted for space) is computable in time roughly equal to the cost of evaluating the objective.

Timing. We report only time spent doing inference (optimization), and allow each method 250 seconds total. Inference is the algorithmic bottleneck, and that code is written in C++. The rest of Algorithm 2 simply updates the LWMB structure (performing less work than a single inference iteration) but is coded in MATLAB and thus yields unreliable timing. We note that other works on lifted inference also report only inference time [12, 2, 9], yet incur significant overhead for symmetry detection (which may require touching the ground model), which we never do.

Results. Figure 2 shows results for (a) the strongly repulsive case, (b) the weakly repulsive case, and (c) the weakly attractive case. The strongly attractive case ($\gamma = +0.5$) was qualitatively similar to (c) and is omitted for space.⁴ We see in (a) that lifted inference with higher-order terms significantly outperforms the fully relaxed ("No Join") method. In the attractive case (c), higher order performs slightly worse. We believe this is because the cheaper inference method builds approximations of finer resolution: for panel (c), by the end, Blue performed 31 splits, Red 60 splits, and Green 73 splits.

⁴ Bumps in the curves are due to optimization initialization. Numerical issues arise when power-sum weights are small; hence at the beginning of each optimization we floor them at $10^{-4}$.
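For concreteness, a hypothetical numpy sketch of the evidence-generation procedure just described (parameter names, the use of standard deviations in $N(\cdot,\cdot)$, and the handling of link symmetry are our assumptions; the paper's actual code is C++/MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 512, 16
cluster = rng.integers(K, size=N)          # random cluster assignment

# unary evidence on C: cluster center ~ N(0, 2), member noise ~ N(0, 0.4)
# (we read the second argument as a standard deviation)
center = rng.normal(0.0, 2.0, size=K)
a = center[cluster] + rng.normal(0.0, 0.4, size=N)   # potential [0; a] per RV

# relational evidence on L: each pair of blocks is all-true w.p. 0.25,
# then 25% of all entries are flipped uniformly at random
block_on = rng.random((K, K)) < 0.25
L = block_on[cluster[:, None], cluster[None, :]]
flip = rng.random((N, N)) < 0.25
L = np.where(flip, ~L, L)
np.fill_diagonal(L, False)                 # no self-links (x != y)
# whether L should be symmetrized is not specified; we leave it directed
```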
Acknowledgements

This work is sponsored in part by NSF grants IIS-1526842 and IIS-1254071, by the United States Air Force under Contract No. FA9453-16-C-0508, and by DARPA Contract No. W911NF-18-C-0015.

References

[1] C. Berkholz, P. Bonsma, and M. Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. In European Symposium on Algorithms, pages 145-156. Springer, 2013.
[2] G. Van den Broeck and M. Niepert. Lifted probabilistic inference for asymmetric graphical models. arXiv preprint arXiv:1412.0315, 2014.
[3] H. B. Bui, T. N. Huynh, and R. de Salvo Braz. Exact lifted inference with distinct soft evidence on every object. In AAAI, 2012.
[4] H. H. Bui, T. N. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. arXiv preprint arXiv:1207.4814, 2012.
[5] H. H. Bui, T. N. Huynh, and D. Sontag. Lifted tree-reweighted variational inference. arXiv preprint arXiv:1406.4200, 2014.
[6] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41-85, 1999.
[7] R. Dechter and I. Rish. A scheme for approximating probabilistic inference. In Proc. Uncertainty in Artificial Intelligence, pages 132-141. Morgan Kaufmann Publishers Inc., 1997.
[8] S. Forouzan and A. T. Ihler. Incremental region selection for mini-bucket elimination bounds. In UAI, pages 268-277, 2015.
[9] N. Gallo and A. Ihler. Lifted generalized dual decomposition. In AAAI. AAAI Press, 2018.
[10] Q. Liu and A. T. Ihler. Bounding the partition function using Hölder's inequality. In ICML, pages 849-856, 2011.
[11] M. Mladenov, B. Ahmadi, and K. Kersting. Lifted linear programming. In AISTATS, pages 788-797, 2012.
[12] M. Mladenov, A. Globerson, and K. Kersting. Lifted message passing as reparametrization of graphical models. In UAI, pages 603-612, 2014.
[13] M. Mladenov and K. Kersting. Equitable partitions of concave free energies. In UAI, pages 602-611, 2015.
[14] M. Mladenov, K. Kersting, and A. Globerson. Efficient lifting of MAP LP relaxations using k-locality. In AISTATS, pages 623-632, 2014.
[15] D. Poole. First-order probabilistic inference. In IJCAI, volume 3, pages 985-991, 2003.
[16] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1):107-136, 2006.
[17] P. Sen, A. Deshpande, and L. Getoor. Bisimulation-based approximate lifted inference. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 496-505. AUAI Press, 2009.
[18] P. Singla and P. M. Domingos. Lifted first-order belief propagation. In AAAI, volume 8, pages 1094-1099, 2008.
[19] P. Singla, A. Nath, and P. M. Domingos. Approximate lifting techniques for belief propagation. In AAAI, pages 2497-2504, 2014.
[20] D. Smith, P. Singla, and V. Gogate. Lifted region-based belief propagation. arXiv preprint arXiv:1606.09637, 2016.
[21] N. Taghipour, D. Fierens, J. Davis, and H. Blockeel. Lifted variable elimination with arbitrary constraints. In International Conference on Artificial Intelligence and Statistics, pages 1194-1202, 2012.
[22] G. Van den Broeck and A. Darwiche. On the complexity and approximation of binary evidence in lifted inference. In Advances in Neural Information Processing Systems, pages 2868-2876, 2013.
[23] D. Venugopal and V. Gogate. Evidence-based clustering for scalable inference in Markov logic. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 258-273. Springer, 2014.
[24] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.