# The Limits of Tractable Marginalization

Oliver Broadrick *1, Sanyam Agarwal *2, Guy Van den Broeck #1, Markus Bläser #2

Abstract. Marginalization, summing a function over all assignments to a subset of its inputs, is a fundamental computational problem with applications from probabilistic inference to formal verification. Despite its computational hardness in general, there exist many classes of functions (e.g., probabilistic models) for which marginalization remains tractable, and they can be commonly expressed by polynomial-size arithmetic circuits computing multilinear polynomials. This raises the question: can all functions with polynomial-time marginalization algorithms be succinctly expressed by such circuits? We give a negative answer, exhibiting simple functions with tractable marginalization yet no efficient representation by known models, assuming FP ≠ #P (an assumption implied by P ≠ NP). To this end, we identify a hierarchy of complexity classes corresponding to stronger forms of marginalization, all of which are efficiently computable on the known circuit models. We conclude with a completeness result, showing that whenever there is an efficient real RAM performing virtual evidence marginalization for a function, then there are small circuits for that function's multilinear representation.

1. Introduction

Marginalization is a fundamental computational problem. In machine learning, for example, many architectures are designed around an instance of marginalization. In Bayesian learning (e.g., Bayesian neural networks), model parameters are marginalized to obtain a posterior distribution; variational autoencoders approximate a marginalization over

*,# Equal contribution. 1 Department of Computer Science, University of California, Los Angeles, United States. 2 Department of Computer Science, Saarland University, Saarbrücken, Germany. Correspondence to: Oliver Broadrick, Sanyam Agarwal.
Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

their latent space; autoregressive and discrete diffusion models are trained to predict (conditional) marginal probabilities (Kingma et al., 2019; Bishop & Bishop, 2023). In the case of boolean-valued functions, marginalization corresponds to the well-known task of model counting, with applications from formal verification to cryptography (Gomes et al., 2021; Arteta et al., 2016). Unfortunately, marginalization is a notoriously hard problem in general. As models become increasingly expressive-efficient, able to succinctly express larger classes of distributions, they often sacrifice the tractability of marginalization (Cooper, 1990; Roth, 1996). However, there is a vast body of work aimed at identifying classes of functions for which marginalization remains tractable, including well-known probabilistic models like hidden Markov models, determinantal point processes, and probabilistic circuits in general, as well as logical representation languages like d-DNNF (Rabiner & Juang, 1986; Kulesza et al., 2012; Choi et al., 2020; Darwiche, 2001). Moreover, a recent line of work has shown how known models with tractable marginalization can be commonly viewed as arithmetic circuits computing multilinear polynomials, resulting in a unified model that we formalize as uniform finally multilinear arithmetic circuits (UFMACs) (Zhang et al., 2020; 2021; Agarwal & Bläser, 2024; Broadrick et al., 2024b). The fact that known functions with tractable marginalization can be expressed by polynomial-size UFMACs begs the question:

Do all probability distributions over binary random variables with tractable marginalization have polynomial-size (uniform) finally multilinear arithmetic circuits?
A positive answer to this question would show that the long line of work on tractable models has successfully arrived at a maximally expressive-efficient modeling language for tractable marginalization. We give a negative answer, suggesting the possibility of yet more expressive-efficient tractable models. To rigorously answer this question, we define the tractability of marginalization for a function in terms of its polynomial-time computability. Then, we observe that functions with UFMACs are actually tractable for more powerful forms of marginalization than standard variable marginalization. Specifically, we show that UFMACs support tractable summation over inputs of a given Hamming weight (Section 4)

Figure 1. Relationships between complexity classes for tractable marginalization. UFMAC contains functions efficiently expressible using known tractable models including determinantal point processes (DPPs) and probabilistic generating circuits (PGCs), as well as the subclass USMAC containing, for example, deterministic decomposable negation normal forms (d-DNNFs) and sum-product networks (SPNs). PM is the set of all functions with polynomial-time marginalization. PHM and PVM consist of functions that are tractable for Hamming weight marginalization and virtual evidence marginalization, respectively (see Sections 4 and 5). We exhibit the function $f_{\mathrm{aff}}$, which is contained in PM \ PHM assuming FP ≠ #P. FP is the class of polynomial-time computable functions, included for context.

and support marginalization in the presence of virtual evidence (Section 5). We show that the classes of functions tractable for each of these queries form a hierarchy of inclusions, and we prove that certain inclusions are strict by exhibiting separating functions assuming standard complexity-theoretic conjectures (i.e., P ≠ NP): see Figure 1 for a visual summary.
This reveals distributions that are tractable for marginalization but cannot be computed by a UFMAC, answering our central question. While it turns out that not all distributions with tractable marginalization can be expressed by UFMACs, we then ask the corresponding question for the stronger forms of marginalization. We find that UFMACs are complete for tractable virtual evidence marginalization in the real RAM model of computation, in the sense that if there is a polynomial-time real RAM that performs virtual evidence marginalization for a function, then the function has UFMACs.

2. Tractable Arithmetic Circuits

Classical probabilistic models like hidden Markov models (Rabiner & Juang, 1986) and Chow-Liu trees (Chow & Liu, 1968) have efficient algorithms for marginalization based on sums and products of probabilities. For general graphical models this remains true, albeit with the caveat that the efficiency of the algorithm depends on the underlying graph structure. In the nineties, it was observed that such algorithms naturally correspond to the computation of certain multilinear polynomials (Castillo et al., 1995; Darwiche & Provan, 1997). Actually, a direct representation of such a polynomial can be exponentially more succinct than an equivalent graphical model (Roth & Samdani, 2009), and so recent tractable probabilistic models have focused on representing the underlying polynomial as an arithmetic circuit, a computation graph consisting of sums and products.
Known models with tractable marginalization can be viewed in this common language, including bounded-treewidth graphical models (Chow & Liu, 1968; Meila & Jordan, 2000; Rabiner & Juang, 1986; Koller, 2009), determinantal point processes (Kulesza et al., 2012), probabilistic sentential decision diagrams (Kisa et al., 2014), sum-product networks (Poon & Domingos, 2011), cutset networks (Rahman et al., 2014), characteristic circuits (Yu et al., 2024), and probabilistic generating circuits (Zhang et al., 2021; Harviainen et al., 2023; Bläser, 2023; Broadrick et al., 2024a). In this section, we formalize the unified tractable model as uniform finally multilinear arithmetic circuits (UFMACs). A UFMAC expresses a function $f : \{0,1\}^n \to \mathbb{Q}$ by its multilinear polynomial and represents the polynomial by an arithmetic circuit.

2.1. Multilinear Polynomials

Consider a function $f : \{0,1\}^n \to \mathbb{Q}$. Among the infinitely many polynomials that compute f on $\{0,1\}^n$, there is a unique multilinear polynomial (a fundamental object; for example, see O'Donnell (2014)). A polynomial is multilinear if its degree in every individual variable is at most one. The unique multilinear polynomial computing f can be found by interpolation:

$$p(x_1, \ldots, x_n) = \sum_{S \subseteq [n]} f(v_S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$$

where $[n] = \{1, 2, \ldots, n\}$ and $v_S$ denotes the characteristic vector of the set S, that is, the element of $\{0,1\}^n$ with $i$th entry 1 for $i \in S$ and $i$th entry 0 for $i \notin S$. It turns out that the tractability of marginalization for a function f can be characterized to some extent in terms of the ability to evaluate its multilinear polynomial p on elements of $\mathbb{Q}^n$ beyond just $\{0,1\}^n$, as we will see in Sections 3 to 5.

2.2. Arithmetic Circuits

Arithmetic circuits are a thoroughly studied model for the computation of polynomials, an algebraic analogue of boolean circuits: see Shpilka et al. (2010) and Saptharishi (2015) for surveys.

Definition 2.1 (Arithmetic Circuits).
An arithmetic circuit over a field $\mathbb{F}$ in variables X is a directed acyclic graph. A node with in-degree 0 is an input node and is labeled with an element of $\mathbb{F}$ or X. All other nodes are sum or product nodes, labeled accordingly by $+$ or $\times$. The node with out-degree 0 is the output node.¹ Each node in an arithmetic circuit computes a polynomial: (i) each leaf computes the polynomial by which it is labeled, (ii) each sum node computes the sum of the polynomials computed by its children, and (iii) each product node computes the product of the polynomials computed by its children. The polynomial computed by a circuit is the polynomial computed by its output node.

Classes of arithmetic circuits that compute multilinear polynomials are themselves well studied (Nisan & Wigderson, 1996; Raz, 2009; Raz et al., 2008). A circuit is called syntactically multilinear if every product node has the property that its children mention disjoint sets of variables. An arithmetic circuit is called (semantically) multilinear if every node in the circuit computes a multilinear polynomial. However, there exist multilinear polynomials for which the only known polynomial-size circuits do not conform to either of these (syntactic or semantic) multilinearity properties. One notable example is the determinant, a multilinear polynomial central to determinantal point processes (Kulesza et al., 2012), a popular tractable probabilistic model.

2.3. Uniform Finally Multilinear Arithmetic Circuits

In this work we consider a class of arithmetic circuits computing multilinear polynomials with (minimal) restrictions to guarantee efficient evaluation by a uniform algorithm, general enough to express known probabilistic models with tractable marginalization. In particular, while syntactically and semantically multilinear circuits force all internal nodes to compute multilinear polynomials, we allow intermediate nodes to accumulate higher degree in each variable.
However, if intermediate degrees are allowed to grow arbitrarily, then the circuits can compute polynomials like $x^{2^n} + 2^{2^n}$ which cannot be efficiently evaluated (in polynomial time). We thus strike a balance, defining finally multilinear arithmetic circuits whose internal nodes compute polynomials of polynomially bounded degree, high enough to allow flexible algorithms like determinants, but low enough to guarantee polynomial-time evaluation of the circuit at rational points. Indeed, it is an open question whether polynomials like the determinant can be efficiently computed without this additional flexibility (Shpilka et al., 2010, Open Problem 3). In the following we deal with families of circuits $(C_n)_{n=1,2,\ldots} = (C_1, C_2, \ldots)$ with each $C_n$ a circuit, often using $(C_n)$ as shorthand for $(C_n)_{n=1,2,\ldots}$.

¹ We consider arithmetic circuits with a single output node, though in general they can have multiple.

Definition 2.2 (Finally Multilinear Circuits). The arithmetic circuit family $(C_n)$ is finally multilinear if for $n = 1, 2, \ldots$, $C_n$ computes a multilinear polynomial, and after replacing each of the constants $c_1, \ldots, c_k \in \mathbb{F}$ in $C_n$ by fresh variables $z_1, \ldots, z_k$, the polynomial computed by any internal node (in variables $x_1, \ldots, x_n, z_1, \ldots, z_k$) has total degree at most polynomial in n.

The only additional restriction needed to ensure that a circuit family can be efficiently evaluated at rational points is uniformity, which requires that there be an efficient method for obtaining a description of any circuit from the family. Uniformity rules out the unrealistic power of nonuniform circuit families, which can, for example, compute undecidable languages.

Definition 2.3 (Uniform Circuits). An arithmetic circuit family $(C_n)_{n=1,2,\ldots}$ is uniform if there exists a polynomial-time Turing machine which on input $1^n$ outputs $C_n$ (in some reasonable fixed representation: see Appendix A.1).
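Once $C_n$ is in hand, it can be evaluated bottom-up in the natural way. The following sketch uses exact rational arithmetic and an ad hoc node encoding of our own (not the representation of Appendix A.1), just to make the evaluation order concrete:

```python
from fractions import Fraction

def eval_circuit(nodes, inputs):
    """Evaluate an arithmetic circuit given as a list of nodes in topological order.
    Node encodings (our own illustrative convention):
      ('var', name)    -- input node labeled with a variable
      ('const', c)     -- input node labeled with a field element
      ('+', [ids...])  -- sum node with child indices
      ('*', [ids...])  -- product node with child indices
    Exact rationals (Fraction) keep the arithmetic faithful."""
    vals = []
    for kind, arg in nodes:
        if kind == 'var':
            v = Fraction(inputs[arg])
        elif kind == 'const':
            v = Fraction(arg)
        elif kind == '+':
            v = sum(vals[i] for i in arg)
        else:  # '*'
            v = Fraction(1)
            for i in arg:
                v *= vals[i]
        vals.append(v)
    return vals[-1]  # single output node

# A tiny circuit computing x1*x2 + 3.
circuit = [('var', 'x1'), ('var', 'x2'), ('const', 3), ('*', [0, 1]), ('+', [2, 3])]
```

For example, evaluating this circuit at $x_1 = x_2 = 1/2$ yields $13/4$.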
We note that if an arithmetic circuit family $(C_n)$ is uniform, then $(C_n)$ is of polynomial size (since time bounds space). Definitions 2.2 and 2.3 together provide sufficient conditions for a circuit to be efficiently evaluated at rational points.

Lemma 2.4. Let $(C_n)$ be a uniform finally multilinear arithmetic circuit family. Then there exists a polynomial-time Turing machine which on input $x_1, x_2, \ldots, x_n \in \mathbb{Q}$ computes $C_n(x_1, x_2, \ldots, x_n)$.

The efficient evaluation algorithm promised by Lemma 2.4 is simple: on input n rationals, first obtain the circuit $C_n$ by uniformity, and then evaluate it in the natural way. We give the full proof that this is guaranteed to take polynomial time in Appendix A.2. The class of functions whose multilinear polynomial is computable by uniform finally multilinear arithmetic circuits constitutes the class of functions for which efficient marginalization algorithms are currently known, and thus the class of circuits central to this paper.

Definition 2.5 (The Class UFMAC). A function family $(f_n)$ with² $f_n : \{0,1\}^n \to \mathbb{Q}$ is in UFMAC if there exists a uniform finally multilinear arithmetic circuit family computing the family of multilinear representations of $(f_n)$.

² Note that fixing an encoding $\mathbb{Q} \to \{0,1\}^*$ allows functions $\{0,1\}^n \to \mathbb{Q}$ to be viewed as functions $\{0,1\}^n \to \{0,1\}^*$.

X1 X2 X3 | f
 0  0  0 | 0.05
 1  0  0 | 0.15
 0  1  0 | 0.1
 1  1  0 | 0.3
 0  0  1 | 0.06
 1  0  1 | 0.18
 0  1  1 | 0.04
 1  1  1 | 0.12

$$p(x_1, x_2, x_3) = 0.1x_1 + 0.05x_2 + 0.1x_1x_2 + 0.01x_3 - 0.07x_2x_3 + 0.02x_1x_3 - 0.14x_1x_2x_3 + 0.05$$

$$\bar{p}(x_1, \bar{x}_1, x_2, \bar{x}_2, x_3, \bar{x}_3) = 0.05\,\bar{x}_1\bar{x}_2\bar{x}_3 + 0.15\,x_1\bar{x}_2\bar{x}_3 + 0.1\,\bar{x}_1x_2\bar{x}_3 + 0.3\,x_1x_2\bar{x}_3 + 0.06\,\bar{x}_1\bar{x}_2x_3 + 0.18\,x_1\bar{x}_2x_3 + 0.04\,\bar{x}_1x_2x_3 + 0.12\,x_1x_2x_3$$

Figure 2. An example function f (probability mass function over three binary random variables) and its multilinear polynomial p, network polynomial $\bar{p}$, and a multilinear arithmetic circuit computing $\bar{p}$ [circuit diagram not reproduced here].
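The multilinear polynomial in Figure 2 can be recovered from its truth table by expanding the interpolation formula of Section 2.1. The following brute-force sketch (exponential in n, for illustration only) collects the monomial coefficients:

```python
from collections import defaultdict
from itertools import combinations, product

def multilinear_coeffs(f, n):
    """Expand p(x) = sum_S f(v_S) prod_{i in S} x_i prod_{i not in S} (1 - x_i)
    into monomial coefficients: returns {frozenset S: coefficient of prod_{i in S} x_i}."""
    c = defaultdict(float)
    for v in product([0, 1], repeat=n):
        S = frozenset(i for i in range(n) if v[i] == 1)
        comp = [i for i in range(n) if v[i] == 0]
        # prod_{i in comp} (1 - x_i) expands over subsets T of comp with sign (-1)^|T|
        for r in range(len(comp) + 1):
            for T in combinations(comp, r):
                c[S | frozenset(T)] += (-1) ** r * f(v)
    return dict(c)

# The example function of Figure 2 as a lookup table (indices 0,1,2 for X1,X2,X3).
f_table = {(0, 0, 0): 0.05, (1, 0, 0): 0.15, (0, 1, 0): 0.1, (1, 1, 0): 0.3,
           (0, 0, 1): 0.06, (1, 0, 1): 0.18, (0, 1, 1): 0.04, (1, 1, 1): 0.12}
coeffs = multilinear_coeffs(f_table.__getitem__, 3)
```

The recovered coefficients match Figure 2, e.g., the constant term is 0.05 and the coefficient of $x_1x_2x_3$ is $-0.14$.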
An important subclass of UFMAC is the analogously defined class of uniform syntactically multilinear arithmetic circuits (USMAC). A majority of known tractable models (both probabilistic and logical) belong to this subclass, including probabilistic sentential decision diagrams (Kisa et al., 2014), sum-product networks (Poon & Domingos, 2011), d-DNNF (Darwiche, 2001), and more. Determinantal point processes are the best known example of a tractable model known to live in UFMAC but not known to live in USMAC. In the next section we formalize the problem of marginalization, and show that marginalization is easy for functions in UFMAC, giving a simple, unified view of existing marginalization algorithms.

Complexity basics. For notions from basic computational complexity theory, including the complexity classes P, FP, NP, and #P, we use standard definitions and notation (Arora & Barak, 2009). We note that while the famous classes P and NP contain decision problems (functions of the form $f : \{0,1\}^* \to \{0,1\}$), FP and #P can be viewed as natural analogues for general functions $f : \{0,1\}^* \to \{0,1\}^*$. In particular, FP contains all functions $f : \{0,1\}^* \to \{0,1\}^*$ computable in polynomial time. Moreover, FP ≠ #P is a well-known conjecture in computational complexity theory and is a weaker assumption than P ≠ NP in the sense that P ≠ NP implies FP ≠ #P. Therefore, our results which assume FP ≠ #P also hold if P ≠ NP.

3. Tractable Marginalization

We define tractable marginalization in a natural way. For every function family $f = (f_n)$ with $f_n : \{0,1\}^n \to \mathbb{Q}$ we define a corresponding marginalization problem and then call f tractable for marginalization if its marginalization problem is polynomial-time computable.

Definition 3.1 (MAR(f)). Let $f = (f_n)$ with $f_n : \{0,1\}^n \to \mathbb{Q}$. The marginalization problem for f, denoted MAR(f), gets input $m \in \{0, 1, *\}^n$ and outputs $\sum_{x \in X_m} f_n(x)$ where $X_m = \{x \in \{0,1\}^n : x_i = m_i \text{ if } m_i \in \{0,1\}\}$.
Intuitively, $m_i \in \{0,1\}$ means that the $i$th input is fixed ($x_i = m_i$), and $m_i = *$ means the $i$th input is marginalized, taking values freely in $\{0,1\}$.

Example 3.2 (Marginalization). Consider the function f shown for n = 3 in Figure 2. Suppose we wish to marginalize over $X_2$ and $X_3$ with the evidence $X_1 = 0$. Specifically, on input $m = (0, *, *)$ to MAR(f), we get $X_m = \{000, 010, 001, 011\}$ and

$$\mathrm{MAR}(f)(m) = \sum_{v \in X_m} p(v) = 0.05 + 0.1 + 0.06 + 0.04 = 0.25.$$

Definition 3.3 (The Class PM). Let $f = (f_n)$ with $f_n : \{0,1\}^n \to \mathbb{Q}$. Then $f \in \mathrm{PM}$ if MAR(f) is polynomial-time computable.

If $f \in \mathrm{PM}$, we say f is tractable for variable marginalization. If f has boolean codomain, then variable marginalization can be viewed as the task of model counting on f after the substitution of variables given by any (partial) assignment. Of course, if a boolean function (family) f is tractable for variable marginalization, then the function itself, f(x), can be computed in polynomial time, by taking m = x.

Proposition 3.4. PM ⊆ FP.

Moreover, it is not hard to see that a function being tractable for marginalization is a much stronger condition than the function itself being tractable to compute. There exist many natural problems in FP \ PM. For example, if FP ≠ #P, then every #P-complete problem provides a function in FP \ PM.

Proposition 3.5. If FP ≠ #P, then for every #P-complete function f, there is a function g ∈ FP \ PM.

Examples of functions in FP \ PM given by Proposition 3.5 include verifiers for NP-complete problems (e.g., satisfiability problems) as well as functions that do not correspond to NP-complete problems (e.g., a suitably defined indicator function for perfect matchings).

3.1. UFMAC has Tractable Marginalization

Having shown functions in FP \ PM (assuming FP ≠ #P), we now consider functions in PM. Such functions include those expressible by the many existing tractable models that can be commonly viewed as uniform finally multilinear circuits.
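For concreteness, Definition 3.1 can be written out as a brute-force procedure. This is a specification of the problem, not a tractable algorithm (it is exponential in the number of marginalized coordinates):

```python
from itertools import product

def mar(f, m):
    """MAR(f)(m): sum f over all x in {0,1}^n agreeing with m on its 0/1 entries.
    '*' marks a marginalized coordinate. Brute force, for illustration only."""
    free = [i for i, mi in enumerate(m) if mi == '*']
    total = 0
    for bits in product([0, 1], repeat=len(free)):
        x = list(m)
        for i, b in zip(free, bits):
            x[i] = b  # fill each marginalized coordinate
        total += f(tuple(x))
    return total

# Example 3.2 on the Figure 2 table: evidence X1 = 0, marginalize X2 and X3.
f_table = {(0, 0, 0): 0.05, (1, 0, 0): 0.15, (0, 1, 0): 0.1, (1, 1, 0): 0.3,
           (0, 0, 1): 0.06, (1, 0, 1): 0.18, (0, 1, 1): 0.04, (1, 1, 1): 0.12}
result = mar(f_table.__getitem__, (0, '*', '*'))
```

Here `result` reproduces the value 0.25 of Example 3.2.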
Efficient marginalization for functions in UFMAC follows from (i) the fact that FMACs can be evaluated at rational points in polynomial time and (ii) the fact that to compute any marginalization of f it suffices to evaluate the multilinear polynomial representation of f at a single point. The former follows from Lemma 2.4. The elegant latter fact, which has been observed before (Juma et al., 2009), follows from a very simple argument using linearity of expectation, given below for completeness.

Lemma 3.6. Let $f : \{0,1\}^n \to \mathbb{Q}$ have multilinear polynomial p. Then

$$\sum_{x \in \{0,1\}^n} f(x) = 2^n p(1/2, \ldots, 1/2).$$

Proof. We have

$$\sum_{x \in \{0,1\}^n} p(x) = 2^n \mathop{\mathbb{E}}_x[p(x_1, \ldots, x_n)] = 2^n p(\mathbb{E}[x_1], \ldots, \mathbb{E}[x_n]) = 2^n p(1/2, \ldots, 1/2)$$

where the expectation is over the uniform distribution, and the middle equality follows by linearity of expectation (together with the independence of the coordinates and the multilinearity of p).

(The above lemma can also be viewed as extracting the first Fourier coefficient of f, by first translating the domain $\{0,1\}$ to $\{-1,1\}$ by $x \mapsto 1 - 2x$ and evaluating at zero.)

With the ability to compute sums of a multilinear polynomial with a single evaluation, we get that functions with uniform finally multilinear arithmetic circuits are tractable for marginalization.

Proposition 3.7. UFMAC ⊆ PM.

Proof. Let f ∈ UFMAC be computed by the UFMAC $(C_n)$. We need to provide a polynomial-time Turing machine that computes MAR(f). On input $m \in \{0,1,*\}^n$, first obtain $C_n$ by uniformity of f. Then, for any $m_i \in \{0,1\}$, replace any node in $C_n$ labeled by $x_i$ with $m_i$. Note that multilinearity of the polynomial computed is preserved under this substitution, and all remaining variables now need to be marginalized. By Lemma 3.6, it suffices to evaluate the remaining circuit with all inputs set to 1/2 (and scale by $2^{n'}$, where $n'$ is the number of marginalized variables), which takes polynomial time by Lemma 2.4.

4. Hamming Weight Marginalization

A key observation of our work is that UFMACs actually enable more powerful forms of marginalization. In this section we introduce the first such form, Hamming weight marginalization.
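Before moving on, Lemma 3.6 is easy to check numerically on the polynomial of Figure 2: summing p over all of $\{0,1\}^3$ agrees with $2^3\,p(1/2, 1/2, 1/2)$, and both equal 1 since the table is a probability mass function.

```python
from itertools import product

def p(x1, x2, x3):
    """The multilinear polynomial of Figure 2."""
    return (0.05 + 0.1 * x1 + 0.05 * x2 + 0.1 * x1 * x2 + 0.01 * x3
            - 0.07 * x2 * x3 + 0.02 * x1 * x3 - 0.14 * x1 * x2 * x3)

brute = sum(p(*v) for v in product([0, 1], repeat=3))  # sum over {0,1}^3
via_half = 2 ** 3 * p(0.5, 0.5, 0.5)                   # single evaluation (Lemma 3.6)
```

Both `brute` and `via_half` come out to 1.0 (up to floating-point rounding).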
The Hamming weight of a binary string $x \in \{0,1\}^n$ is the number of ones in it. Where standard marginalization asks for the sum of a function's values over all assignments with some subset of the inputs fixed, Hamming weight marginalization asks for the sum of its values only over the inputs of a given Hamming weight. In the case that the underlying function represents a probability distribution, Hamming weight marginalization enables conditioning on the Hamming weight of the binary random variables, or on the size of the random set, if the variables are interpreted as the characteristic vector of a set. Algorithms for this problem are known in some special cases of UFMACs, for example in structured syntactically multilinear circuits (Vergari et al., 2021) and in determinantal point processes (Kulesza & Taskar, 2011; Calandriello et al., 2020).

Definition 4.1 (HMAR(f) and PHM). Let $f = (f_n)$ with $f_n : \{0,1\}^n \to \mathbb{Q}$. The Hamming weight marginalization problem for f, denoted HMAR(f), takes input $m \in \{0,1,*\}^n$ and $k \in \{0, 1, \ldots, n\}$ and gives output $\sum_{x \in X_{m,k}} f_n(x)$ where $X_{m,k} = X_m \cap \{x \in \{0,1\}^n : |x| = k\}$. Moreover, $f \in \mathrm{PHM}$ if HMAR(f) is polynomial-time computable.

Intuitively, the input m is the same as in MAR(f), and the additional input k further restricts the $x \in X_{m,k}$ being summed over to those with Hamming weight $|x| = k$. If $f \in \mathrm{PHM}$, we say f is tractable for Hamming weight marginalization.

Example 4.2 (Hamming Weight Marginalization). Recall the distribution shown in Figure 2. Suppose we want to sum, over $X_2$ and $X_3$ with $X_1 = 0$ fixed, only the strings of Hamming weight k = 1. Specifically, we have input $m = (0, *, *)$, k = 1 to HMAR(f). From Example 3.2, $X_m = \{000, 001, 010, 011\}$.
So $X_{m,1} = \{010, 001\}$ and

$$\mathrm{HMAR}(f)(m, 1) = \sum_{v \in X_{m,1}} p(v) = 0.1 + 0.06 = 0.16.$$

We first observe that a function tractable for Hamming weight marginalization is also tractable for variable marginalization, since any variable marginalization query is the sum of n + 1 Hamming weight marginalization queries, one for each $k \in \{0, 1, \ldots, n\}$.

Proposition 4.3. PHM ⊆ PM.

We now show that UFMACs support tractable Hamming weight marginalization. To do so, we use a useful circuit transformation. For any multilinear polynomial $p(x_1, \ldots, x_n) = \sum_{S \subseteq [n]} c_S \prod_{i \in S} x_i$ with $c_S \in \mathbb{F}$, the network polynomial for p is the multilinear polynomial

$$\bar{p}(x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n) = \sum_{S \subseteq [n]} p(v_S) \prod_{i \in S} x_i \prod_{i \notin S} \bar{x}_i$$

where $v_S \in \{0,1\}^n$ is the characteristic vector of S. There is a polynomial-time algorithm for transforming an arithmetic circuit computing a multilinear polynomial into a circuit computing its network form.

Lemma 4.4 (Broadrick et al. (2024b)). Given an arithmetic circuit of size s computing polynomial p in n variables, an arithmetic circuit computing the network polynomial for p can be constructed in time O(sn).

With the ability to efficiently transform UFMACs to compute their network polynomials, we can efficiently perform Hamming weight marginalization for UFMACs.

Proposition 4.5. UFMAC ⊆ PHM.

Proof. Let f ∈ UFMAC. We need to provide a polynomial-time Turing machine that computes HMAR(f). On input $m \in \{0,1,*\}^n$, $k \in \{0,1,\ldots,n\}$, first obtain a FMAC that computes $f_n$ by uniformity of f. Using Lemma 4.4, obtain a FMAC computing its network polynomial $\bar{p}(x, \bar{x})$. We will now evaluate $\bar{p}$ as follows. For each i, if $m_i = 0$, set $x_i = 0$ and $\bar{x}_i = 1$; if $m_i = 1$, set $x_i = t$ and $\bar{x}_i = 0$; if $m_i = *$, set $x_i = t$ and $\bar{x}_i = 1$. Note that t is a single indeterminate/symbol used for all such $x_i$. Evaluating the circuit produces a univariate polynomial in t. The coefficient of the monomial of degree k is the desired quantity. To see this, consider some monomial $p(v_S) \prod_{i \in S} x_i \prod_{i \notin S} \bar{x}_i$ in $\bar{p}$.
The monomial evaluates to zero if $v_S$ does not agree with m (i.e., if there is some $m_i = 1$ but $i \notin S$, or $m_i = 0$ but $i \in S$). Moreover, the degree of the monomial after the substitution is |S|, the Hamming weight of $v_S$, as needed.

Example 4.6 (Hamming Weight Marginalization). Recall Example 4.2. We summed over $X_2$ and $X_3$ with $X_1 = 0$, only strings of Hamming weight equal to 1. Following the proof of Proposition 4.5, we compute

$$\bar{p}(0, 1, t, 1, t, 1) = 0.05 + 0.1t + 0.06t + 0.04t^2,$$

and extract the coefficient of $t^k = t$. Hence, HMAR(f)(m, 1) = 0.16, as in Example 4.2.

4.1. Separating PHM from PM with CSPs

Our construction of a function that is tractable for marginals but #P-hard for Hamming queries exploits known dichotomy theorems for constraint satisfaction problems (CSPs). Constraint satisfaction problems concern conjunctions of constraints, each an application of some boolean relation $R \subseteq \{0,1\}^k$ to variables $x_1, \ldots, x_k$ (not necessarily distinct), i.e., $R(x_1, \ldots, x_k) = 1$ if and only if $(x_1, \ldots, x_k) \in R$. For some finite set Γ of boolean relations (called a constraint language), a Γ-formula is a conjunction of constraints, each an application of some relation in Γ. The constraint satisfaction problem CSP(Γ) gets a Γ-formula ϕ as input and asks whether there is a satisfying assignment for ϕ. The problem k-ONES(Γ) gets a Γ-formula ϕ and an integer k as input and asks whether there is a satisfying assignment for ϕ with exactly k ones. The counting versions of these problems, #CSP(Γ) and #k-ONES(Γ), ask for the number of satisfying assignments (in total and with k ones, respectively). Clearly, the complexity of these problems depends on the set of relations Γ. There are elegant dichotomy theorems for these problems, stating that, depending on Γ, these problems are either polynomial-time computable or #P-hard. We say a relation is affine if it is logically equivalent to a system of linear equations over GF(2), the finite field containing two elements.
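The coefficient extraction in Example 4.6 can also be checked directly: each admissible assignment v contributes $f(v)\,t^{|v|}$ to $q(t)$, so the coefficient of $t^k$ is exactly HMAR(f)(m, k). A brute-force sketch over the table (bypassing the circuit, for illustration only):

```python
from itertools import product

def hmar_coeffs(f, m):
    """Coefficients of the univariate q(t) from Proposition 4.5, brute force:
    q[k] accumulates f(v) over admissible v of Hamming weight k."""
    q = [0.0] * (len(m) + 1)
    for v in product([0, 1], repeat=len(m)):
        if all(mi == '*' or mi == vi for mi, vi in zip(m, v)):
            q[sum(v)] += f(v)
    return q

# Example 4.6 on the Figure 2 table: m = (0, *, *).
f_table = {(0, 0, 0): 0.05, (1, 0, 0): 0.15, (0, 1, 0): 0.1, (1, 1, 0): 0.3,
           (0, 0, 1): 0.06, (1, 0, 1): 0.18, (0, 1, 1): 0.04, (1, 1, 1): 0.12}
q = hmar_coeffs(f_table.__getitem__, (0, '*', '*'))
```

The result matches $q(t) = 0.05 + 0.16t + 0.04t^2$ from Example 4.6.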
Theorem 4.7 (Creignou & Hermann (1996)). If Γ contains only affine relations, then #CSP(Γ) is computable in polynomial time. Otherwise, #CSP(Γ) is #P-complete.

To the extent that #CSP(Γ) is a CSP analogue to variable marginalization, #k-ONES(Γ) is an analogue to Hamming weight marginalization. There is a similar dichotomy, but with fewer tractable cases, for #k-ONES(Γ). We say a relation is width-k affine if it is logically equivalent to a system of linear equations over GF(2) with each equation having at most k variables with a nonzero coefficient.

Theorem 4.8 (Creignou et al. (2010)). If Γ contains only width-2 affine relations, then #k-ONES(Γ) is computable in polynomial time. Otherwise, #k-ONES(Γ) is #P-complete.

Leveraging these results from the CSP literature, we now construct a simple function for which standard variable marginalization is tractable but Hamming weight marginalization is #P-hard; in other words, we find a function tractable for marginals for which the existence of a UFMAC would imply FP = #P. Consider the function $f_{\mathrm{aff}} : \{0,1\}^{2n^3+n} \to \{0,1\}$ in variables $(x_i)_{i \in [n]}$, $(y_{ijk})_{(i,j,k) \in [n]^3}$, and $(z_{ijk})_{(i,j,k) \in [n]^3}$, abbreviated to x, y, z:

$$f_{\mathrm{aff}}(x, y, z) = \bigwedge_{(i,j,k) \in [n]^3} (y_{ijk} \oplus x_i \oplus x_j \oplus x_k) \wedge \bigwedge_{(i,j,k) \in [n]^3} (y_{ijk} \oplus z_{ijk}). \quad (1)$$

We first observe that marginalization of $f_{\mathrm{aff}}$ is computable in polynomial time, mirroring a tractable case of Theorem 4.7. After substituting in any partial assignment, $f_{\mathrm{aff}}$ specifies a linear system over GF(2). Performing Gaussian elimination either reveals the system to be inconsistent, in which case there are no satisfying assignments, or reveals k linearly independent equations over the n' variables that remain free, in which case there are $2^{n'-k}$ satisfying assignments, as each affine equation (parity constraint) independently halves the number of solutions.

Proposition 4.9. $f_{\mathrm{aff}} \in \mathrm{PM}$.

To see that Hamming weight marginalization on $f_{\mathrm{aff}}$ is hard, we reduce from #k-ONES(Γ) for a chosen Γ consisting of only width-three affine relations.
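The counting step behind Proposition 4.9 can be sketched concretely: eliminate over GF(2), detect inconsistency, and otherwise count $2^{n - \mathrm{rank}}$. The bitmask encoding of equations below is our own convention:

```python
def count_gf2_solutions(rows, n):
    """Count satisfying assignments of an affine system over GF(2) in n variables.
    Each row is (mask, b): bitmask of variables with coefficient 1, right-hand
    side b; the equation reads XOR of the selected variables == b.
    Returns 0 if inconsistent, else 2**(n - rank)."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        # find a pivot row with a 1 in this column
        pivot = next((i for i in range(rank, len(rows))
                      if (rows[i][0] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        pm, pb = rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i][0] >> col) & 1:
                rows[i] = (rows[i][0] ^ pm, rows[i][1] ^ pb)
        rank += 1
    # a row reduced to "0 == 1" witnesses inconsistency
    if any(m == 0 and b == 1 for m, b in rows):
        return 0
    return 2 ** (n - rank)
```

For instance, the system $\{x_1 \oplus x_2 = 1,\; x_2 \oplus x_3 = 0\}$ over three variables has rank 2, hence $2^{3-2} = 2$ solutions.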
The reduction encodes a width-three formula as evidence (a partial assignment) to $f_{\mathrm{aff}}$, maintaining a relationship between the Hamming weight of the original formula and the resulting query. Intuitively, this may work because of the additional width of $f_{\mathrm{aff}}$, though we do not rule out the possibility of a lower-width function also working. In the reduction, the variables $y_{ijk}$ are used to turn on and turn off the width-3 equations in variables $x_i$ as needed. The variables $z_{ijk}$ and the constraints $y_{ijk} \oplus z_{ijk}$ are then used to balance the Hamming weight of satisfying assignments, since exactly one of $y_{ijk}$ and $z_{ijk}$ will be one.

Theorem 4.10. HMAR($f_{\mathrm{aff}}$) is #P-hard. That is, $f_{\mathrm{aff}} \notin \mathrm{PHM}$ unless FP = #P.

Proof. Consider the following constraint language: $\Gamma = \{a \oplus b \oplus c\} = \{\{(0,0,1), (0,1,0), (1,0,0), (1,1,1)\}\}$. By Theorem 4.8, #k-ONES(Γ) is #P-complete. We show the #P-hardness of HMAR($f_{\mathrm{aff}}$) by providing a deterministic, polynomial-time reduction from #k-ONES(Γ). We get as input a Γ-formula $\phi(x_1, \ldots, x_n)$ and an integer $k \in \{0, 1, \ldots, n\}$. We construct an evidence string m for HMAR(f) as follows, consisting of an entry in $\{0, 1, *\}$ for each variable $x_i, y_{ijk}, z_{ijk}$. For any constraint $x_i \oplus x_j \oplus x_k = 1$ in ϕ, set $y_{ijk} = 0$ and $z_{ijk} = 1$. All other variables are marginalized, i.e., their entry in m is set to $*$. Then

$$\#k\text{-ONES}(\Gamma)(\phi, k) = \mathrm{HMAR}(f_{\mathrm{aff}})(m, k + n^3).$$

To see this, suppose $\phi(x) = 1$; we find the only y and z such that $f(x, y, z) = 1$, and we then observe that $|x, y, z| = |x| + n^3$. Every constraint of ϕ is satisfied, and so every width-4 constraint in f with $y_{ijk} = 0$ is satisfied. For constraints $x_i \oplus x_j \oplus x_k$ not in ϕ, the corresponding width-4 clause in f is satisfied by setting the free variable $y_{ijk}$ to whichever value (0 or 1) is necessary. The values $z_{ijk}$ are then set to the opposite of the values of $y_{ijk}$, which satisfies the remaining width-2 clauses of f. For every $(i,j,k) \in [n]^3$ we have $z_{ijk} = \neg y_{ijk}$, and so $|y| + |z| = n^3$.
Having shown that HMAR($f_{\mathrm{aff}}$) is #P-hard, we have established that, unless FP = #P, the function $f_{\mathrm{aff}}$ is not in UFMAC. In particular, this means that $f_{\mathrm{aff}}$ cannot be efficiently represented by known tractable logical representation languages like d-DNNF or BDD, nor by tractable probabilistic models like SPNs or DPPs. This answers our central question negatively, showing that known models for tractable marginalization do not characterize the class of functions with tractable marginalization, suggesting the potential for yet more expressive-efficient tractable models. We remark that our separating example avoids the known efficient reductions from weighted to unweighted variants of counting problems, for example in weighted model counting (Chakraborty et al., 2015) and weighted CSP counting variants (Bulatov et al., 2012). We also note that Theorem 4.10 partially resolves an open question on the succinctness of languages in the knowledge compilation literature (Koriche et al., 2013), as described in Appendix D.

5. Virtual Evidence Marginalization

Interestingly, the computational problems we have studied so far can be characterized by the ability to evaluate the multilinear polynomial representation of the underlying function on increasingly large domains. To efficiently compute f itself, one need only efficiently evaluate its multilinear polynomial p on $\{0,1\}^n$. To efficiently compute MAR(f), it suffices to efficiently evaluate p on $\{0, 1/2, 1\}^n$. For HMAR(f), it suffices to evaluate p on a set of the form $\bigcup_{i=1}^{n+1} \{0, 1/2, 1, a_i\}^n$ for distinct $a_i \in \mathbb{R}$, as shown below in Proposition 5.3. This begs the question: What power is afforded by the ability to evaluate the multilinear polynomial at all rational points? Nicely, this task corresponds naturally to probabilistic inference in the presence of virtual evidence. Virtual evidence is a well-known generalization of the standard notion of (hard) evidence (Bilmes, 2004).
Whereas observing hard evidence means that the probabilities of some subset of outcomes in a distribution are zeroed out, virtual evidence allows rescaling those probabilities by arbitrary (nonnegative) weights. Given a probability distribution over variables X_1, . . . , X_n, observing (conditioning on) the hard evidence that X_i = 1 can be viewed as scaling the probability of every outcome with X_i ≠ 1 by a factor of 0 (and then renormalizing the distribution). On the other hand, it is often necessary, for example in Bayesian belief updates, to multiply the probabilities of such outcomes by factors other than 0. We consider a general version of the problem of updating a distribution given virtual evidence, though multiple formulations are possible (Chan & Darwiche, 2005). Let X_1, . . . , X_n be random variables supported on {0, 1} with joint probability mass function p(x_1, . . . , x_n). Given scaling factors α_1, ᾱ_1, . . . , α_n, ᾱ_n ∈ ℝ_{≥0} with either α_i > 0 or ᾱ_i > 0 for each i, the mass function obtained after observing (α_1, ᾱ_1, . . . , α_n, ᾱ_n)-virtual evidence is given by

p(x_1, . . . , x_n) ∏_{i=1}^{n} (α_i x_i + ᾱ_i (1 − x_i))    (2)

up to normalization. (Note that ignoring normalization for distributions in PM is not a computational issue, as normalization requires a single call to MAR(f), namely with all coordinates marginalized.) We formulate the problem of virtual evidence marginalization as follows.

Definition 5.1 (VMAR(f) and PVM). Let f = (f_n) with f_n : {0, 1}^n → ℚ, and let p = (p_n) be the family of multilinear polynomials computing f. The virtual evidence marginalization problem for f, denoted VMAR(f), gets as input x_1, x_2, . . . , x_n ∈ ℚ and outputs p_n(x_1, . . . , x_n). Moreover, f ∈ PVM if VMAR(f) is polynomial-time computable. If f ∈ PVM, we say f is tractable for virtual evidence marginalization.
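As a concrete instance of Equation (2), the update can be sketched in a few lines on an explicit joint table (the function name and example numbers are ours):

```python
from fractions import Fraction
from itertools import product

def virtual_evidence_update(pmf, alphas, alphabars):
    """Rescale a joint pmf over {0,1}^n by the factors of Equation (2):
    p(x) * prod_i (alpha_i * x_i + alphabar_i * (1 - x_i)), then renormalize."""
    scaled = {}
    for x in pmf:
        w = Fraction(pmf[x])
        for xi, a, ab in zip(x, alphas, alphabars):
            w *= a * xi + ab * (1 - xi)
        scaled[x] = w
    z = sum(scaled.values())
    return {x: w / z for x, w in scaled.items()}

# Uniform distribution over {0,1}^2. Hard evidence X_1 = 1 is the special
# case (alpha_1, alphabar_1) = (1, 0): outcomes with X_1 = 0 get factor 0.
uniform = {x: Fraction(1, 4) for x in product((0, 1), repeat=2)}
hard = virtual_evidence_update(uniform, alphas=[1, 1], alphabars=[0, 1])
assert hard[(1, 0)] == Fraction(1, 2) and hard[(0, 0)] == 0

# Virtual (soft) evidence: outcomes with X_1 = 1 scaled by 3 instead.
soft = virtual_evidence_update(uniform, alphas=[3, 1], alphabars=[1, 1])
assert soft[(1, 0)] == Fraction(3, 8) and soft[(0, 0)] == Fraction(1, 8)
```

The point of Section 5 is that for f ∈ PVM this update never needs the exponential table above; it is absorbed into the inputs of a single polynomial evaluation.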
That VMAR(f) enables the incorporation of virtual evidence in the sense of Equation (2) follows by a straightforward application of Lemma 4.4, which we give in Appendix B. We immediately get that VMAR(f) can be solved for UFMACs by simply evaluating the circuit (Lemma 2.4).

Proposition 5.2. UFMAC ⊆ PVM.

The fact that functions in UFMAC permit efficient conditioning on virtual evidence generalizes known algorithms for special cases, including Bayesian networks with bounded treewidth (Bilmes, 2004) and certain syntactically multilinear circuits (Chan, 2017; Liu & Van den Broeck, 2021). It is also straightforward to reduce Hamming weight marginalization to virtual evidence marginalization.

Proposition 5.3. PVM ⊆ PHM.

Proof. Suppose VMAR(f) is polynomial-time computable. Consider the algorithm given in the proof of Proposition 4.5. This algorithm defines a univariate polynomial q(t) in terms of the multivariate (multilinear) polynomial p(x_1, . . . , x_n) by setting each x_i to either a constant or to t. The degree of q(t) is at most n. Therefore, all the coefficients of q(t) can be recovered by interpolation, evaluating at n + 1 distinct points using the polynomial-time algorithm for VMAR(f). Now, HMAR(f) is simply the appropriate coefficient, as per the proof of Proposition 4.5.

Intuitively, this shows that despite Definition 5.1 containing no explicit summation, there is in some sense a hidden marginalization problem in every VMAR query. On the one hand, multiplying the inputs by arbitrary rationals can be viewed as updating the distribution given virtual evidence; on the other hand, we have observed (Lemma 3.6) that marginalization of f can be reduced to evaluating the multilinear polynomial p for f at (1/2, . . . , 1/2).
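The interpolation argument behind Proposition 5.3 can be sketched end to end. Proposition 4.5's exact construction is not reproduced in this excerpt, so the sketch below takes an equivalent route through the weight generating polynomial g(t) = (1+t)^n p(t/(1+t), . . . , t/(1+t)) = Σ_v f(v) t^{|v|}, whose t^w coefficient is the weight-w marginal; the brute-force VMAR oracle is a stand-in for a real polynomial-time algorithm.

```python
from fractions import Fraction
from itertools import product

def vmar(f, n, point):
    """Stand-in VMAR(f) oracle: evaluate the multilinear representation p
    of f at an arbitrary rational point (brute force, for illustration)."""
    total = Fraction(0)
    for v in product((0, 1), repeat=n):
        term = Fraction(f(v))
        for yi, vi in zip(point, v):
            term *= yi if vi else 1 - yi
        total += term
    return total

def interpolate_coeffs(pts, vals):
    """Coefficients (low degree first) of the unique polynomial of degree
    < len(pts) through the given points (Lagrange interpolation)."""
    m = len(pts)
    coeffs = [Fraction(0)] * m
    for j in range(m):
        basis = [Fraction(1)]       # running product of (t - pts[l]), l != j
        denom = Fraction(1)
        for l in range(m):
            if l == j:
                continue
            denom *= pts[j] - pts[l]
            new = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):
                new[d + 1] += c
                new[d] -= pts[l] * c
            basis = new
        for d, c in enumerate(basis):
            coeffs[d] += vals[j] * c / denom
    return coeffs

def hmar_from_vmar(f, n, weight):
    """HMAR (all variables marginalized) from n + 1 VMAR queries: recover
    g(t) = sum_v f(v) t^{|v|} by interpolation, read off one coefficient."""
    pts = [Fraction(j) for j in range(n + 1)]
    vals = [(1 + t) ** n * vmar(f, n, [t / (1 + t)] * n) for t in pts]
    return interpolate_coeffs(pts, vals)[weight]

parity3 = lambda v: v[0] ^ v[1] ^ v[2]
assert [hmar_from_vmar(parity3, 3, w) for w in range(4)] == [0, 3, 0, 1]
```

Each of the n + 1 interpolation points costs exactly one VMAR query, matching the proposition's polynomial-time reduction.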
Therefore the evaluation of p at arbitrary rational points a_1, . . . , a_n can be viewed as first applying virtual evidence with factors 2a_1, . . . , 2a_n to x_1, . . . , x_n, replacing each x_i by 2a_i x_i, and then evaluating at x_1 = · · · = x_n = 1/2, i.e., marginalizing. In particular, to compute a marginal probability given some virtual evidence, a single VMAR(f) query suffices. Intuitively, VMAR(f) captures the computational power afforded by a UFMAC, which raises the question: are UFMACs complete for virtual evidence marginalization, in the sense that any function f with an efficient algorithm (Turing machine) for VMAR(f) in fact has a UFMAC? That is, do we have UFMAC = PVM? This is a restriction (to multilinear polynomials) of a question which has been asked before and appears challenging to answer (Koiran & Perifel, 2011). We are able, however, to give such a completeness result in the real RAM model of computation.

5.1. Completeness for Real RAMs

Given the apparent difficulty of answering this question for discrete computation models (i.e., Turing machines), and since the question naturally extends to real inputs, we consider the real RAM model of computation, in which real numbers are stored in constant space and manipulated by constant-time arithmetic. See Appendix C for a brief introduction to the model, though we refer the reader to Erickson et al. (2024) for a full presentation. First, we extend the notion of UFMACs to the real RAM model. We say a function family f = (f_n) with f_n : {0, 1}^n → ℝ and multilinear representation p = (p_n) has a real RAM UFMAC if there exists a polynomial-time real RAM which on input 1^n outputs an arithmetic circuit C_n computing p_n, in some fixed standard representation (see Appendix C.1). Note that we no longer require the bound on degree, since intermediate values are always a single word in the real RAM computation, avoiding the problem of circuit evaluation by Turing machines.
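The single-query claim can be checked concretely. By multilinearity, the unnormalized mass after applying virtual evidence factors (α_i, ᾱ_i) equals ∏_i (α_i + ᾱ_i) times one evaluation of p at the normalized point (α_i/(α_i + ᾱ_i))_i. A sketch, with a brute-force multilinear extension standing in for the VMAR oracle (helper names are ours):

```python
from fractions import Fraction
from itertools import product

def mle(f, n, point):
    """Multilinear extension of f : {0,1}^n -> Q at a rational point
    (a stand-in VMAR oracle, brute force for illustration)."""
    total = Fraction(0)
    for v in product((0, 1), repeat=n):
        term = Fraction(f(v))
        for yi, vi in zip(point, v):
            term *= yi if vi else 1 - yi
        total += term
    return total

def virtual_evidence_marginal(f, n, alphas, alphabars):
    """Unnormalized total mass after virtual evidence, via ONE VMAR query:
    sum_x f(x) prod_i (a_i x_i + abar_i (1 - x_i))
      = prod_i (a_i + abar_i) * p(a_1/(a_1+abar_1), ..., a_n/(a_n+abar_n))."""
    scale = Fraction(1)
    point = []
    for a, ab in zip(alphas, alphabars):
        scale *= a + ab
        point.append(Fraction(a, a + ab))
    return scale * mle(f, n, point)

f = lambda v: v[0] | v[1]
brute = sum(Fraction(f(x)) * (3 * x[0] + 1 * (1 - x[0]))
                           * (1 * x[1] + 2 * (1 - x[1]))
            for x in product((0, 1), repeat=2))
assert virtual_evidence_marginal(f, 2, [3, 1], [1, 2]) == brute
```

With uniform factors α_i = ᾱ_i = 1 the query point is (1/2, . . . , 1/2) and the scale is 2^n, recovering the plain marginalization reduction of Lemma 3.6.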
We also extend the definitions of MAR, HMAR, and VMAR in the natural way to the real RAM model. We then observe that if f has a polynomial-time real RAM computing VMAR(f), then f has real RAM UFMACs, showing that UFMACs are in that sense complete for virtual evidence marginalization. The proof, given in Appendix C, observes that if the real RAM computes the polynomial correctly on all points, then there is some branch of the computation (i.e., ignoring comparisons) which computes it on all points, from which an arithmetic circuit can be recovered.

Proposition 5.4. Let f = (f_n) with f_n : {0, 1}^n → ℚ. If there exists a polynomial-time real RAM computing VMAR(f), then there exists a real RAM UFMAC computing f.

6. Related Work and Conclusion

We highlight some closely related research areas. While we focus on theoretical limits of marginalization, there is much work on leveraging theoretical progress in practice (Sladek et al., 2023; Loconte et al., 2023; 2024; Wang & Van den Broeck, 2025). Moreover, probabilistic models with tractable marginalization, in addition to quickly developing as performant probabilistic models in their own right (Liu et al., 2024a), also appear as integral components in recent proposals for control and alignment of deep generative models (Zhang et al., 2023; 2024; Liu et al., 2024b; Ahmed et al., 2022) as well as numerous other applications (Wedenig et al., 2024; Saad et al., 2021). Where we study exact marginalization, approximate methods form their own vast area (Hoffman et al., 2013; Liu et al., 2023). We considered functions of binary variables and note work on productive reductions to this setting (Garg et al., 2024; Cao et al., 2023).
Given our use of affine relations, we observe that such parity constraints appear in other work on boolean knowledge representation, both in algorithms and languages (Fargier & Marquis, 2008; Koriche et al., 2013; Fargier & Marquis, 2014; de Colnet & Mengel, 2021; Chakraborty et al., 2021). In summary, despite known models with tractable marginalization being expressible as UFMACs, we show that there exist functions with tractable marginalization yet no UFMAC, assuming FP ≠ #P. On the other hand, UFMACs support the more powerful Hamming weight marginalization and virtual evidence marginalization, being complete for the latter in the real RAM model of computation. We conclude with two open questions. First, is the inclusion PVM ⊆ PHM strict? Second, observing that all marginalization algorithms in this paper are amenable to parallelization, do there exist inherently sequential (P-hard) marginalization problems?

Acknowledgements

This work was funded in part by the DARPA ANSR, CODORD, and SAFRON programs under awards FA8750-23-2-0004, HR00112590089, and HR00112530141, NSF grant IIS1943641, and gifts from Adobe Research, Cisco Research, and Amazon. Approved for public release; distribution is unlimited.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Agarwal, S. and Bläser, M. Probabilistic generating circuits - demystified. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 329-342. PMLR, 21-27 Jul 2024. URL https://proceedings.mlr.press/v235/agarwal24c.html. Ahmed, K., Teso, S., Chang, K.-W., Van den Broeck, G., and Vergari, A. Semantic probabilistic layers for neurosymbolic learning.
Advances in Neural Information Processing Systems, 35:29944-29959, 2022. Arora, S. and Barak, B. Computational complexity: a modern approach. Cambridge University Press, 2009. Arteta, C., Lempitsky, V., and Zisserman, A. Counting in the wild. In Computer Vision - ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII, pp. 483-498. Springer, 2016. Bilmes, J. On virtual evidence and soft evidence in Bayesian networks. University of Washington Electrical Engineering Technical Report UWEETR-2004-0016, 2004. Bishop, C. M. and Bishop, H. Deep learning: Foundations and concepts. Springer Nature, 2023. Bläser, M. Not all strongly Rayleigh distributions have small probabilistic generating circuits. In Proceedings of the 40th International Conference on Machine Learning, ICML'23. JMLR.org, 2023. Broadrick, O., Cao, W., Wang, B., Trapp, M., and Van den Broeck, G. Probabilistic circuits for cumulative distribution functions. In Proceedings of the UAI Workshop on Tractable Probabilistic Modeling (TPM), aug 2024a. URL http://starai.cs.ucla.edu/papers/BroadrickTPM24.pdf. Broadrick, O., Zhang, H., and Van den Broeck, G. Polynomial semantics of tractable probabilistic circuits. In 40th Conference on Uncertainty in Artificial Intelligence, 2024b. Bulatov, A., Dyer, M., Goldberg, L. A., Jalsenius, M., Jerrum, M., and Richerby, D. The complexity of weighted and unweighted #CSP. Journal of Computer and System Sciences, 78(2):681-688, 2012. Calandriello, D., Derezinski, M., and Valko, M. Sampling from a k-DPP without looking at all items. Advances in Neural Information Processing Systems, 33:6889-6899, 2020. Cao, W. X., Garg, P., Tjoa, R., Holtzen, S., Millstein, T., and Van den Broeck, G. Scaling integer arithmetic in probabilistic programs. In Evans, R. J. and Shpitser, I.
(eds.), Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine Learning Research, pp. 260-270. PMLR, 31 Jul-04 Aug 2023. URL https://proceedings.mlr.press/v216/cao23b.html. Castillo, E., Gutiérrez, J. M., and Hadi, A. S. Parametric structure of probabilities in Bayesian networks. In European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty, pp. 89-98. Springer, 1995. Chakraborty, S., Fried, D., Meel, K. S., and Vardi, M. Y. From weighted to unweighted model counting. In IJCAI, pp. 689-695, 2015. Chakraborty, S., Meel, K. S., and Vardi, M. Y. Approximate model counting. In Handbook of Satisfiability, pp. 1015-1045. IOS Press, 2021. Chan, H. Incorporating uncertain evidence into arithmetic circuits representing probability distributions. In Advanced Methodologies for Bayesian Networks, pp. 105-116. PMLR, 2017. Chan, H. and Darwiche, A. On the revision of probabilistic beliefs using uncertain evidence. Artificial Intelligence, 163(1):67-90, 2005. Choi, Y., Vergari, A., and Van den Broeck, G. Probabilistic circuits: A unifying framework for tractable probabilistic models. UCLA, 2020. URL http://starai.cs.ucla.edu/papers/ProbCirc20.pdf. Chow, C. and Liu, C. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467, 1968. Cooper, G. F. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2-3):393-405, 1990. Creignou, N. and Hermann, M. Complexity of generalized satisfiability counting problems. Information and Computation, 125(1):1-12, 1996. doi: 10.1006/inco.1996.0016. URL https://doi.org/10.1006/inco.1996.0016. Creignou, N., Schnoor, H., and Schnoor, I. Nonuniform boolean constraint satisfaction problems with cardinality constraint. ACM Trans. Comput. Logic, 11(4), jul 2010. ISSN 1529-3785. doi: 10.1145/1805950.1805954.
URL https://doi.org/10.1145/1805950.1805954. Darwiche, A. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1-2):11-34, 2001. Darwiche, A. and Provan, G. Query DAGs: A practical paradigm for implementing belief-network inference. Journal of Artificial Intelligence Research, 6:147-176, 1997. de Colnet, A. and Mengel, S. A compilation of succinctness results for arithmetic circuits. Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, 2021. Erickson, J., van der Hoog, I., and Miltzow, T. Smoothing the gap between NP and ∃R. SIAM Journal on Computing, 53(6):FOCS20-102-FOCS20-138, 2024. doi: 10.1137/20M1385287. Fargier, H. and Marquis, P. Extending the knowledge compilation map: Krom, Horn, affine and beyond. In 23rd AAAI Conference on Artificial Intelligence (AAAI'08), pp. 442-447, 2008. Fargier, H. and Marquis, P. Disjunctive closures for knowledge compilation. Artificial Intelligence, 216:129-162, 2014. Garg, P., Holtzen, S., Van den Broeck, G., and Millstein, T. Bit blasting probabilistic programs. Proceedings of the ACM on Programming Languages, 8(PLDI):865-888, 2024. Gomes, C. P., Sabharwal, A., and Selman, B. Model counting. In Handbook of Satisfiability, pp. 993-1014. IOS Press, 2021. Harviainen, J., Ramaswamy, V. P., and Koivisto, M. On inference and learning with probabilistic generating circuits. In Uncertainty in Artificial Intelligence, pp. 829-838. PMLR, 2023. Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. Stochastic variational inference. Journal of Machine Learning Research, 2013. Juma, A., Kabanets, V., Rackoff, C., and Shpilka, A. The black-box query complexity of polynomial summation. Computational Complexity, 18(1):59-79, 2009. Karp, R. M. and Lipton, R. J. Some connections between nonuniform and uniform complexity classes.
In Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing, pp. 302-309, 1980. Kingma, D. P., Welling, M., et al. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 12(4):307-392, 2019. Kisa, D., Van den Broeck, G., Choi, A., and Darwiche, A. Probabilistic sentential decision diagrams. In Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning, 2014. Koiran, P. and Perifel, S. Interpolation in Valiant's theory. Computational Complexity, 20:1-20, 2011. Koller, D. Probabilistic graphical models: Principles and techniques, 2009. Koriche, F., Lagniez, J.-M., Marquis, P., and Thomas, S. Knowledge compilation for model counting: Affine decision trees. In IJCAI, pp. 947-953, 2013. Kulesza, A. and Taskar, B. k-DPPs: Fixed-size determinantal point processes. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 1193-1200, 2011. Kulesza, A., Taskar, B., et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123-286, 2012. Liu, A. and Van den Broeck, G. Tractable regularization of probabilistic circuits. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 3558-3570. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/1d0832c4969f6a4cc8e8a8fffe083efb-Paper.pdf. Liu, A., Ahmed, K., and Van den Broeck, G. Scaling tractable probabilistic circuits: A systems perspective. In Proceedings of the 41st International Conference on Machine Learning (ICML), jul 2024a. URL http://starai.cs.ucla.edu/papers/LiuICML24.pdf. Liu, A., Niepert, M., and Van den Broeck, G. Image inpainting via tractable steering of diffusion models. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024b.
URL http://starai.cs.ucla.edu/papers/LiuICLR24.pdf. Liu, S., Ramadge, P. J., and Adams, R. P. Generative marginalization models. arXiv preprint arXiv:2310.12920, 2023. Loconte, L., Sladek, A. M., Mengel, S., Trapp, M., Solin, A., Gillis, N., and Vergari, A. Subtractive mixture models via squaring: Representation and learning. arXiv preprint arXiv:2310.00724, 2023. Loconte, L., Mengel, S., and Vergari, A. Sum of squares circuits. arXiv preprint arXiv:2408.11778, 2024. Meila, M. and Jordan, M. I. Learning with mixtures of trees. Journal of Machine Learning Research, 1(Oct):1-48, 2000. Nisan, N. and Wigderson, A. Lower bounds on arithmetic circuits via partial derivatives. Computational Complexity, 6:217-234, 1996. O'Donnell, R. Analysis of boolean functions. Cambridge University Press, 2014. Poon, H. and Domingos, P. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 689-690. IEEE, 2011. Preparata, F. P. and Shamos, M. I. Computational geometry: an introduction. Springer Science & Business Media, 2012. Rabiner, L. and Juang, B. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4-16, 1986. Rahman, T., Kothalkar, P., and Gogate, V. Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part II, pp. 630-645. Springer, 2014. Raz, R. Multi-linear formulas for permanent and determinant are of super-polynomial size. Journal of the ACM (JACM), 56(2):1-17, 2009. Raz, R., Shpilka, A., and Yehudayoff, A. A lower bound for the size of syntactically multilinear arithmetic circuits. SIAM Journal on Computing, 38(4):1624-1647, 2008. Roth, D. On the hardness of approximate reasoning. Artificial Intelligence, 82(1):273-302, 1996. ISSN 0004-3702.
doi: https://doi.org/10.1016/0004-3702(94)00092-1. URL https://www.sciencedirect.com/science/article/pii/0004370294000921. Roth, D. and Samdani, R. Learning multi-linear representations of distributions for efficient inference. Machine Learning, 76:195-209, 2009. Saad, F. A., Rinard, M. C., and Mansinghka, V. K. SPPL: probabilistic programming with fast exact symbolic inference. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021, pp. 804-819, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383912. doi: 10.1145/3453483.3454078. URL https://doi.org/10.1145/3453483.3454078. Saptharishi, R. A survey of lower bounds in arithmetic circuit complexity. Github survey, 95, 2015. Shpilka, A., Yehudayoff, A., et al. Arithmetic circuits: A survey of recent results and open questions. Foundations and Trends in Theoretical Computer Science, 5(3-4):207-388, 2010. Sladek, A. M., Trapp, M., and Solin, A. Encoding negative dependencies in probabilistic circuits. In The 6th Workshop on Tractable Probabilistic Modeling, 2023. Strassen, V. Vermeidung von Divisionen. Journal für die reine und angewandte Mathematik, 264:184-202, 1973. URL http://eudml.org/doc/151394. Vergari, A., Choi, Y., Liu, A., Teso, S., and Van den Broeck, G. A compositional atlas of tractable circuit operations for probabilistic inference. Advances in Neural Information Processing Systems, 34:13189-13201, 2021. Wang, B. and Van den Broeck, G. On the relationship between monotone and squared probabilistic circuits. In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence, feb 2025. Wedenig, T., Nagpal, R., Cassiers, G., Mangard, S., and Peharz, R. Exact soft analytical side-channel attacks using tractable circuits. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F.
(eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 52472-52483. PMLR, 21-27 Jul 2024. URL https://proceedings.mlr.press/v235/wedenig24a.html. Yu, Z., Trapp, M., and Kersting, K. Characteristic circuits. Advances in Neural Information Processing Systems, 36, 2024. Zhang, H., Holtzen, S., and Van den Broeck, G. On the relationship between probabilistic circuits and determinantal point processes. In Conference on Uncertainty in Artificial Intelligence, pp. 1188-1197. PMLR, 2020. Zhang, H., Juba, B., and Van den Broeck, G. Probabilistic generating circuits. In International Conference on Machine Learning, pp. 12447-12457. PMLR, 2021. Zhang, H., Dang, M., Peng, N., and Van den Broeck, G. Tractable control for autoregressive language generation. In Proceedings of the 40th International Conference on Machine Learning (ICML), jul 2023. URL http://starai.cs.ucla.edu/papers/ZhangICML23.pdf. Zhang, H., Kung, P.-N., Yoshida, M., Van den Broeck, G., and Peng, N. Adaptable logical control for large language models. In Advances in Neural Information Processing Systems 37 (NeurIPS), dec 2024. URL https://starai.cs.ucla.edu/papers/ZhangNeurIPS24.pdf.

A. Uniform Finally Multilinear Arithmetic Circuits

A.1. Note on Representation

The choice of representation is not important up to polynomial changes in length. For concreteness and completeness, one can take the description following (Arora & Barak, 2009, Definition 6.5), substituting node types in the natural way, and indicating an edge not with zero or one but with a rational a/b with a, b ∈ ℤ given in base two.

A.2. Efficient Circuit Evaluation

We show that uniform finally multilinear arithmetic circuits can be evaluated at rational points in polynomial time.
To do so, we first show that if a circuit family can be evaluated at integral points in polynomial time, then it can be evaluated at rational points in polynomial time, a reduction which indeed holds for general arithmetic circuit families. (We also remark that multilinearity is not used in any of the following arguments, and so, indeed, any uniform arithmetic circuit family with the degree bound of Definition 2.2 can be efficiently evaluated in the same way.)

Lemma A.1. If there is a polynomial-time Turing machine for evaluating the arithmetic circuit family C = (C_n)_{n=1,2,...} at integral points x_1, . . . , x_n ∈ ℤ, then there is a polynomial-time Turing machine that evaluates C at rational points x_1, . . . , x_n ∈ ℚ.

Proof. Consider the input a_1/b_1, . . . , a_n/b_n, with each a_i, b_i ∈ ℤ. Let D = ∏_{i=1}^{n} b_i, a common denominator, and let c_i = a_i ∏_{j ∈ [n]\{i}} b_j ∈ ℤ, so that c_i/D = a_i/b_i. Denote by p the polynomial computed by C_n, with the degree of p being d′. Writing p(x_1, . . . , x_n) = Σ_{(d_1,...,d_n) ∈ {0,1,...,d′}^n} α_{d_1,...,d_n} ∏_{i=1}^{n} x_i^{d_i} with α_{d_1,...,d_n} ∈ ℚ, we have

p(c_1/D, . . . , c_n/D) = Σ_{k=0}^{d′} D^{−k} Σ_{(d_1,...,d_n) : Σ_i d_i = k} α_{d_1,...,d_n} ∏_{i=1}^{n} c_i^{d_i}.    (3)

In particular, consider the univariate polynomial in a new variable t,

f(t) = p(t c_1, . . . , t c_n) = Σ_{(d_1,...,d_n)} α_{d_1,...,d_n} ∏_{i=1}^{n} (t c_i)^{d_i} = Σ_{k=0}^{d′} t^k Σ_{(d_1,...,d_n) : Σ_i d_i = k} α_{d_1,...,d_n} ∏_{i=1}^{n} c_i^{d_i}.

The polynomial f(t) is of degree at most d′ and thus can be recovered explicitly by interpolation given the evaluation of f at d′ + 1 distinct (indeed, integral) points. Then, having f(t), the value p(c_1/D, . . . , c_n/D) = p(a_1/b_1, . . . , a_n/b_n) can be recovered, following Equation (3), by multiplying the coefficient of t^k in f(t) by D^{−k} for k = 0, 1, . . . , d′ and summing. (Note that, while recovering f(t) in this black-box manner suffices in general, f(t) can be obtained more directly in the case of uniform circuits by evaluating the circuit symbolically according to Equation (3), requiring 1 rather than d′ + 1 evaluations.)
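Lemma A.1's black-box reduction can be sketched directly with exact rational arithmetic; the helper names below are ours, and the brute-force interpolation stands in for the assumed integer-point evaluator.

```python
from fractions import Fraction
from math import prod

def interpolate_coeffs(pts, vals):
    """Coefficients (low degree first) of the unique polynomial of degree
    < len(pts) through the given points (Lagrange interpolation)."""
    m = len(pts)
    coeffs = [Fraction(0)] * m
    for j in range(m):
        basis = [Fraction(1)]       # running product of (t - pts[l]), l != j
        denom = Fraction(1)
        for l in range(m):
            if l == j:
                continue
            denom *= pts[j] - pts[l]
            new = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):
                new[d + 1] += c
                new[d] -= pts[l] * c
            basis = new
        for d, c in enumerate(basis):
            coeffs[d] += vals[j] * c / denom
    return coeffs

def eval_at_rationals(int_eval, deg, rationals):
    """Lemma A.1: evaluate a polynomial p, available only through an
    integer-point oracle int_eval, at rational points a_i/b_i, using
    deg + 1 integer evaluations of f(t) = p(t*c_1, ..., t*c_n)."""
    a = [r.numerator for r in rationals]
    b = [r.denominator for r in rationals]
    D = prod(b)                                  # common denominator
    c = [ai * (D // bi) for ai, bi in zip(a, b)] # c_i / D = a_i / b_i
    pts = list(range(deg + 1))
    vals = [Fraction(int_eval([t * ci for ci in c])) for t in pts]
    coeffs = interpolate_coeffs([Fraction(t) for t in pts], vals)
    # Equation (3): p(a/b) = sum_k coeffs[k] * D^{-k}
    return sum(ck * Fraction(1, D) ** k for k, ck in enumerate(coeffs))

# Example: p(x, y) = x*y + 2*x has degree 2
p_int = lambda x: x[0] * x[1] + 2 * x[0]
r = [Fraction(1, 2), Fraction(2, 3)]
assert eval_at_rationals(p_int, 2, r) == r[0] * r[1] + 2 * r[0]
```

Note that the oracle is only ever called on integer inputs, exactly as the lemma requires.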
We now show that uniform finally multilinear arithmetic circuits can be efficiently evaluated at rational points; this is Lemma 2.4, restated here for convenience.

Lemma 2.4. Let C = (C_n)_{n=1,2,...} be a uniform finally multilinear arithmetic circuit family. Then there exists a polynomial-time Turing machine which on input x_1, x_2, . . . , x_n ∈ ℚ computes C_n(x_1, x_2, . . . , x_n).

Proof. By Lemma A.1, it suffices to show that C can be evaluated at integral points. First, obtain the circuit C_n in polynomial time by the uniformity of C. Next, we show that evaluating C_n(x_1, . . . , x_n) in the natural way takes polynomial time, owing to the polynomial bound on the degree of internal nodes. Assume inputs a_1, . . . , a_n ∈ ℤ are given in base two (with an additional sign bit). For a ∈ ℤ, denote by ‖a‖ the number of bits needed to write a in this way (e.g., ‖a‖ ≤ 2 + log₂|a|). Then, for integers a, b, we have ‖a + b‖ ≤ 1 + max{‖a‖, ‖b‖} (Fact 1) and ‖a · b‖ ≤ ‖a‖ + ‖b‖ (Fact 2), and, moreover, such additions and multiplications are polynomial-time computable. Consider the augmented circuit C′_n obtained by replacing the constants of C_n by fresh variables. We say a node is of degree d if the polynomial it computes is of degree d (in both the original and fresh variables). Let B(d) be the maximum number of bits needed to write the integer computed by any node of degree d in C′_n. We show by induction that B(d) ≤ (3d − 1)q(N), where N is the length of the input (i.e., N = Σ_{i=1}^{n} ‖a_i‖). This completes the proof, since the additions and multiplications themselves are polynomial-time computable at this bitwidth, and d is bounded by a polynomial in n ≤ N by Definition 2.3. Note that by uniformity of C there exists a (univariate) polynomial q upper bounding the size of the representation of C′_n by q(n). For the base case, we upper bound B(1) by considering all nodes of degree 1 by cases (there are no nodes in C′_n of degree zero).
For input nodes x_i in C′_n which were already variables in C_n, the bitwidth is at most N ≤ q(N), since they are evaluated as a_i. For input nodes z_i in C′_n corresponding to the constants of C_n, the bitwidth is at most q(n) ≤ q(N) by uniformity of C. The nodes of degree one considered so far thus require at most q(N) bits each. All other degree-one nodes must be the sum of two³ degree-one nodes, and since there are at most q(n) ≤ q(N) nodes total in C′_n, by Fact 1 we get B(1) ≤ 2q(N). Proceeding by induction, assume B(k) ≤ (3k − 1)q(N), and consider B(k + 1). Nodes of degree k + 1 are obtained either (i) as a product of lower-degree nodes or (ii) as a sum of nodes with degree at most k + 1. For case (i), by Fact 2, the maximum resulting bitwidth is at most max_{i ∈ [k]} B(i) + B(k + 1 − i) ≤ (3(k + 1) − 2)q(N), where the inequality follows from the inductive hypothesis. For case (ii), since there are at most q(n) ≤ q(N) nodes in the circuit, we can have at most q(N) many additions of degree-(k + 1) nodes. Hence, by Fact 1, we have B(k + 1) ≤ q(N) + (3(k + 1) − 2)q(N) = (3(k + 1) − 1)q(N), as needed.

B. Virtual Evidence

Consider a function f = (f_n) with f_n : {0, 1}^n → ℚ and multilinear polynomial p = (p_n). Here, we interpret the function as a probability mass function. We show how to use VMAR(f) to observe virtual evidence on this distribution, in a way which allows multiple such observations of virtual evidence (by its commutativity), and so maintaining the ability to compute VMAR queries (and so HMAR and MAR queries) on the resulting distributions. Consider the following expression, where x_i ∈ {0, 1}, x̄_i := 1 − x_i, and α_i, ᾱ_i are not both zero (as is guaranteed for valid virtual evidence scaling factors), so that α_i x_i and ᾱ_i x̄_i are not both zero:

∏_{i=1}^{n} (α_i x_i + ᾱ_i x̄_i) · p( α_1 x_1 / (α_1 x_1 + ᾱ_1 x̄_1), . . . , α_n x_n / (α_n x_n + ᾱ_n x̄_n) )    (4)

= ∏_{i=1}^{n} (α_i x_i + ᾱ_i x̄_i) · Σ_{S ⊆ [n]} p(v_S) ∏_{i ∈ S} [α_i x_i / (α_i x_i + ᾱ_i x̄_i)] ∏_{i ∉ S} [1 − α_i x_i / (α_i x_i + ᾱ_i x̄_i)]    (by interpolation)

= Σ_{S ⊆ [n]} p(v_S) ∏_{i ∈ S} α_i x_i ∏_{i ∉ S} ᾱ_i x̄_i    (5)

= ∏_{i=1}^{n} (α_i x_i + ᾱ_i x̄_i) · Σ_{S ⊆ [n]} p(v_S) ∏_{i ∈ S} x_i ∏_{i ∉ S} x̄_i    (using x̄_i = 1 − x_i and x² = x for x ∈ {0, 1})

= p(x_1, . . . , x_n) · ∏_{i=1}^{n} (α_i x_i + ᾱ_i x̄_i)    (by interpolation again)

Thus evaluating the expression in Equation (4) allows one to condition on virtual evidence given an algorithm for VMAR(f). By inspection, Equation (5) already gives the desired effect of virtual evidence (i.e., scaling the probability of each outcome by the corresponding factor); we show the final steps to demonstrate alignment with the definition given in Equation (2) (and so relying on x_i, x̄_i ∈ {0, 1}).

³Enforcing at most two children per node is straightforward in polynomial time (changing the size of the circuit by at most a polynomial factor).

C. Real RAM

The real RAM is a standard computational model used extensively for computations involving real numbers, widely popular in the computational geometry community; we refer to Preparata & Shamos (2012) for more details. Several variants of the model have been developed, and here we briefly describe a recent formalization provided by Erickson et al. (2024), which has the advantage of describing uniform algorithms. Every real RAM algorithm takes as input a pair of vectors (a, b) ∈ ℝ^n × ℤ^m, and has a fixed bit size w for every word. Essentially, a word is any integer in {0, . . . , 2^w − 1}, which can be represented as a sequence of w bits. A word size w = Ω(log(n + m)) suffices, with the additional advantage of providing O(1) access to input data. The machine contains two random access arrays, W[0 . . . 2^w − 1] and R[0 . . . 2^w − 1], to store words and exact real values, respectively. There is a machine program counter that is incremented at each step, and at any point the machine executes the step indicated by this counter. The runtime of the machine is the number of instructions executed before halting.
On the word array, we are allowed constant and memory assignments, comparisons between word values, rounded arithmetic operations (addition, subtraction, multiplication, and division), and bitwise boolean operations. On the real array, we are allowed constant assignments to 0 or 1, memory assignments, comparison with 0, and exact arithmetic operations (addition, subtraction, multiplication, and division). Note that here we consider real RAMs without the ability to take square roots of real inputs. The major points of difference between the real array and the word array are: while we are allowed to cast integers as reals, we do not allow casting reals to integers (for example, by the floor function), testing whether a real register stores an integer value, or any access to the binary representation of the real numbers.

C.1. Real RAM UFMACs

We say that f = (f_n) with f_n : {0, 1}^n → ℝ has a real RAM UFMAC if there exists a polynomial-time real RAM which on input 1^n outputs an arithmetic circuit C_n computing p_n, the multilinear representation of f_n. The real RAM operates on the padded input (1^{p(n)}, 1^{p(n)}), where p(n) is a polynomial bound on the size of C_n, i.e., at most the running time of the real RAM. The output is a circuit in some standard representation, with the gates, edge relations, and edge weights encoded in the memory arrays on output, using real registers to store the edge weights of the circuit.

C.2. Proof of Proposition 5.4

Proof. Consider any input for VMAR(f) given to the real RAM as the tuple (W, R), where W represents the word array (storing w-bit numbers) and R represents the array of reals. The important thing to note is that the input points for the VMAR query are given to us in R, and W is empty initially. Also, as mentioned in Appendix C above, the only real operation affected by the word array is the cast of integers to reals.
However, this can be replicated purely on the real registers by creating the constant on the reals (by repeated addition and multiplication) and then using it. Since every word is bounded by 2^w, we can construct the constant in poly(w) time by repeated doubling. Hence, computation on the word array does not actually contribute to the final output. For completeness, we note that since all words are w-bit integers, any computation involving them is a bit operation and hence can be arithmetized with only a polynomial blowup. We now move to the main argument of the proof: showing how to convert any arithmetic operations on the reals into an arithmetic circuit computation in polynomial time. We are allowed four operations: addition, subtraction, multiplication, and division. The first three directly give rise to an arithmetic circuit. Whenever we see a division operator, we use the classic result of Strassen (1973) to eliminate it. Since, by assumption, the real RAM computes a multilinear polynomial in the end, we know that the degree is bounded by n, and hence to remove the division operator we only need to consider truncations up to degree n. Further, to avoid degree blowup at intermediate nodes, we only need to track homogeneous parts up to degree n. This can be achieved with only a polynomial blowup in size (Lemma 5.2 in Saptharishi (2015)). Let us now look at the computation of the real RAM. First assume that the real RAM makes no comparison operations. Then, by the arguments above, the computation of the real RAM can be turned into an arithmetic circuit computing some polynomial q. Note that at each point we only compute the homogeneous parts up to degree n. By the definition of virtual evidence, q(x) coincides with p(x) for all x ∈ ℚⁿ, where p is the multilinear polynomial representation of f. Since q and p are polynomials, they must also be equal as polynomials. 
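The division-elimination step can be made concrete on univariate polynomials. The following is a minimal sketch of the truncated-geometric-series idea underlying Strassen's construction (not the exact transformation used in the proof): if the denominator f has nonzero constant term c, then 1/f agrees with (1/c) ∑_{k=0}^{n} (1 − f/c)^k up to degree n, and every step of this sum is division-free apart from the scalar 1/c.

```python
from fractions import Fraction

def mul_trunc(p, q, n):
    """Multiply coefficient lists p and q, discarding all terms of degree > n."""
    r = [Fraction(0)] * (n + 1)
    for i, a in enumerate(p[:n + 1]):
        for j, b in enumerate(q[:n + 1]):
            if i + j <= n:
                r[i + j] += a * b
    return r

def inverse_trunc(f, n):
    """Coefficients of 1/f up to degree n; f[0] must be nonzero."""
    c = Fraction(f[0])
    assert c != 0
    # g = 1 - f/c has zero constant term, so g^k only contributes degree >= k,
    # and the geometric series for 1/(1 - g) may be cut off after n + 1 terms.
    g = [(1 if i == 0 else 0) - Fraction(a) / c for i, a in enumerate(f)]
    inv = [Fraction(0)] * (n + 1)
    power = [Fraction(1)] + [Fraction(0)] * n   # running truncated power g^k
    for _ in range(n + 1):
        inv = [u + v for u, v in zip(inv, power)]
        power = mul_trunc(power, g, n)
    return [u / c for u in inv]

# f = 2 + x + 3x^2; multiplying f by its truncated inverse gives 1 up to degree 4
f = [Fraction(2), Fraction(1), Fraction(3)]
inv = inverse_trunc(f, 4)
```

Each truncated power is itself a small division-free circuit, which is how the construction avoids a degree blowup.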
Since p is multilinear, the obtained circuit is finally multilinear by definition. Now assume that the real RAM does make comparisons. We then obtain a computation tree. At each leaf u, we compute a polynomial q_u, which coincides with p on every input x whose computation path reaches u. For each leaf u, the set of all such x forms a semialgebraic set. Since these sets together cover all of ℝⁿ, there is at least one u whose semialgebraic set contains a product I₁ × · · · × Iₙ of nonempty open intervals. On this product, q_u(x) = p(x), and therefore q_u and p are the same polynomial. D. Succinctness Corollary In the literature on knowledge compilation, representation languages for boolean functions are compared both in their tractability for various transformations and queries and in their succinctness. Marginalization corresponds to the query of counting in the boolean setting (up to partial assignments), and Theorem 4.10 settles an open question on the succinctness of languages with tractable counting. Several well known languages support polynomial time counting (e.g., OBDD, SDD), but most of them are special cases of (and so no more succinct than) d-DNNF. Moreover, d-DNNF translates straightforwardly (in linear time) into syntactically multilinear arithmetic circuits, thus falling within UFMAC. However, Koriche et al. (2013) introduced the representation language Extended Affine Decision Tree (EADT), which supports polynomial time counting, and they left open the succinctness relationship between d-DNNF and EADT. While we refer the reader to Koriche et al. (2013) for full definitions, it is not hard to see that our function f_aff can be represented efficiently in EADT (in fact even in the much less succinct, though incomplete, language AFF (Fargier & Marquis, 2008)). We are therefore able to partially resolve the succinctness relationship between d-DNNF and EADT. 
For a formula ϕ in some representation language L, let |ϕ| denote the size of ϕ (say, the number of bits in some reasonable fixed representation). For representation languages L₁, L₂, we say L₁ is at least as succinct as L₂ (denoted L₁ ≤_s L₂) if there is a polynomial p such that for every ϕ₂ ∈ L₂ there exists a ϕ₁ ∈ L₁ with ϕ₁ ≡ ϕ₂ and |ϕ₁| ≤ p(|ϕ₂|). Here ϕ₁ ≡ ϕ₂ means that ϕ₁ and ϕ₂ represent the same boolean function. Corollary D.1. Unless PH = Σ₂, d-DNNF ≰_s EADT. Proof. Let f_aff be as defined in Equation (1), so that HMAR(f_aff) is #P-hard by Theorem 4.10. Note that f_aff can be represented in polynomial size in either AFF or EADT. But if f_aff could be represented by a polynomial size d-DNNF, then by the algorithm of Proposition 4.5 we would have HMAR(f_aff) ∈ FP/poly. In particular, this gives #P ⊆ FP/poly and so NP ⊆ P/poly, collapsing the polynomial hierarchy to Σ₂ by the Karp–Lipton Theorem (Karp & Lipton, 1980).
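For intuition on the tractability side of this comparison, here is a minimal sketch of why d-DNNF supports polynomial-time model counting: on a smooth, deterministic, decomposable circuit, a single bottom-up pass suffices, with literals counted as 1, decomposable ANDs multiplied, and deterministic ORs added. The tuple encoding and the toy circuit below are our own illustration, not a construction from the proof.

```python
def count_models(node):
    """Model count of a smooth d-DNNF in one bottom-up pass.

    node is ('lit', v) for a literal, ('and', children) for a
    decomposable AND (children over disjoint variable sets), or
    ('or', children) for a deterministic OR (disjoint model sets).
    Assumes smoothness: all OR branches mention the same variables.
    """
    kind = node[0]
    if kind == 'lit':
        return 1                       # a literal has exactly one satisfying value
    counts = [count_models(c) for c in node[1]]
    if kind == 'and':                  # disjoint variable sets: counts multiply
        result = 1
        for c in counts:
            result *= c
        return result
    return sum(counts)                 # deterministic OR: disjoint models add

# Toy circuit for (x AND y) OR (NOT x AND y): deterministic (the branches
# disagree on x), decomposable, and smooth; it has exactly 2 models over {x, y}.
circuit = ('or', [('and', [('lit', 'x'), ('lit', 'y')]),
                  ('and', [('lit', 'not x'), ('lit', 'y')])])
```

Determinism and smoothness are exactly what make the OR case a plain sum; without them the branches could share models and the pass would overcount.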