A Compositional Atlas for Algebraic Circuits

Benjie Wang, University of California, Los Angeles, benjiewang@ucla.edu
Denis Deratani Mauá, University of São Paulo, ddm@ime.usp.br
Guy Van den Broeck, University of California, Los Angeles, guyvdb@cs.ucla.edu
YooJung Choi, Arizona State University, yj.choi@asu.edu

Abstract

Circuits based on sum-product structure have become a ubiquitous representation to compactly encode knowledge, from Boolean functions to probability distributions. By imposing constraints on the structure of such circuits, certain inference queries become tractable, such as model counting and most probable configuration. Recent works have explored analyzing probabilistic and causal inference queries as compositions of basic operators to derive tractability conditions. In this paper, we take an algebraic perspective for compositional inference, and show that a large class of queries including marginal MAP, probabilistic answer set programming inference, and causal backdoor adjustment correspond to a combination of basic operators over semirings: aggregation, product, and elementwise mapping. Using this framework, we uncover simple and general sufficient conditions for tractable composition of these operators, in terms of circuit properties (e.g., marginal determinism, compatibility) and conditions on the elementwise mappings. Applying our analysis, we derive novel tractability conditions for many such compositional queries. Our results unify tractability conditions for existing problems on circuits, while providing a blueprint for analysing novel compositional inference queries.

1 Introduction

Circuit-based representations, such as Boolean circuits, decision diagrams, and arithmetic circuits, are of central importance in many areas of AI and machine learning. For example, a primary means of performing inference in many models, from Bayesian networks [16, 9] to probabilistic programs [20, 24, 26, 43], is to convert them into equivalent circuits; this is commonly known as knowledge compilation. Inference via knowledge compilation has also been used for many applications in neuro-symbolic AI, such as constrained generation [2, 54] and neural logic programming [34, 28]. Circuits can also be learned as probabilistic generative models directly from data [25, 41, 40, 32], in which context they are known as probabilistic circuits [11]. Compared with neural generative models, probabilistic circuits enjoy tractable evaluation of inference queries such as marginal probabilities, which has been used for tasks such as fair machine learning [12] and causal reasoning [53, 50, 49].

The key feature of circuits is that they enable one to precisely characterize tractability conditions (structural properties of the circuit) under which a given inference query can be computed exactly and efficiently. One can then enforce these circuit properties when compiling or learning a model to enable tractable inference. For many basic inference queries, such as computing a marginal probability, tractability conditions are well understood [48, 8]. However, for more complex queries, the situation is less clear, and the exercise of deriving algorithms and tractability conditions for a given query has usually been carried out in an instance-specific manner requiring significant effort.
Figure 1: Example applications of our compositional inference framework for (Left) MMAP and (Right) success probability in probabilistic logic programming under the stable model semantics (MaxEnt).

In Figure 1, we illustrate two such queries. The marginal MAP (MMAP) [13] query takes a probabilistic circuit p and some evidence e and asks for the most likely assignment of a subset of variables. The success probability inference in probabilistic logic programming [6, 45] takes a circuit representation ϕ of a logic program, a weight function ω and some query q, and computes the probability of the query under the program's semantics (MaxEnt, in the example). At first glance, these seem like very different queries, involving different types of input circuits (logical and probabilistic), and different types of computations. However, they share similar algebraic structure: logical and probabilistic circuits can be interpreted as circuits defined over different semirings, while maximization and summation can be viewed as aggregation over different semirings.

In this paper, inspired by the compositional atlas for probabilistic circuits [48], we take a compositional approach to algebraic inference problems, breaking them down into a series of basic operators: aggregation, product, and elementwise mapping. For example, the MMAP and probabilistic logic programming queries involve multiple interleaved aggregations and products, along with one elementwise mapping each. Given a circuit algorithm (and associated tractability condition) for each basic operator, we can reuse these algorithms to construct algorithms for arbitrary compositions. The key challenge is then to check if each intermediate circuit satisfies the requisite tractability conditions.

Our contributions can be summarized as follows. We introduce a compositional inference framework for algebraic circuits (Section 3) over arbitrary semirings, generalizing existing results on logical [18] and probabilistic [48] circuits. In particular, we provide a language for specifying inference queries involving different semirings as a composition of basic operators (Section 3.1). We then prove sufficient conditions for the tractability of each basic operator (Section 3.2) and novel conditions for composing such operators (Section 3.3). We apply our compositional framework to a number of inference problems (Section 4), showing how our compositional approach leads to more systematic derivation of tractability conditions and algorithms, and in some cases improved complexity analysis. In particular, we discover a tractability hierarchy for inference queries captured under the 2AMC framework [29], and reduce the complexity of causal backdoor/frontdoor adjustment on probabilistic circuits [38, 49] from quadratic/cubic to linear/quadratic respectively.

2 Preliminaries

Notation. We use capital letters (e.g., X, Y) to denote variables and lowercase for assignments (values) of those variables (e.g., x, y). We use boldface to denote sets of variables/assignments (e.g., X, y) and write Assign(V) for the set of all assignments to V. Given a variable assignment v of V and a subset of variables W ⊆ V, we write v_W to denote the assignment of W corresponding to v.

Semirings. In this paper, we consider inference problems over commutative semirings. Semirings are sets closed w.r.t. operators of addition (⊕) and multiplication (⊗) that satisfy certain properties:

Definition 1 (Commutative Semiring).
A commutative semiring S is a tuple (S, ⊕, ⊗, 0_S, 1_S), where ⊕ and ⊗ are associative and commutative binary operators on a set S (called the domain) such that ⊗ distributes over ⊕ (i.e., a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c) for all a, b, c ∈ S); 0_S ∈ S is the additive identity (i.e., 0_S ⊕ a = a for all a ∈ S) and annihilates S through multiplication (i.e., 0_S ⊗ a = 0_S for all a ∈ S); and 1_S ∈ S is the multiplicative identity (i.e., 1_S ⊗ a = a for all a ∈ S).

For example, the probability semiring P = (R_{≥0}, +, ×, 0, 1) employs standard addition and multiplication (⊕ = + and ⊗ = ×) over the non-negative reals, the (max, ×) semiring M = (R_{≥0}, max, ×, 0, 1) replaces addition with maximization, while the Boolean semiring B = ({⊥, ⊤}, ∨, ∧, ⊥, ⊤) employs disjunction and conjunction operators (⊕ = ∨ and ⊗ = ∧) over truth values.

Figure 2: Examples of Algebraic Circuits. Input, sum and product nodes are drawn with distinct node symbols. (a) A Boolean circuit that is smooth, decomposable, deterministic, but not {X}-deterministic. (b) A probabilistic circuit that is smooth, decomposable, and {X1, X2}-deterministic; ⟦·⟧ maps ⊤ to 1 and ⊥ to 0.

Algebraic Circuits. We now define the concept of an algebraic circuit, which is a computational-graph representation of a function taking values in an arbitrary semiring.

Definition 2 (Algebraic Circuit). Given a semiring S = (S, ⊕, ⊗, 0_S, 1_S), an algebraic circuit C over variables V is a rooted directed acyclic graph (DAG), whose nodes α have the following syntax: α ::= l | ⊕_{i=1}^{k} α_i | ⊗_{i=1}^{k} α_i, where α_i ∈ C are circuit nodes, k ∈ N_{>0}, and l : Assign(W) → S is a function over a (possibly empty) subset W ⊆ V of variables, called its scope. That is, each circuit node may be an input (l), a sum (⊕), or a product (⊗). The scope of any internal node is defined to be vars(α) := ∪_{i=1}^{k} vars(α_i). Each node α represents a function p_α taking values in S, defined recursively by: p_α(w) := l(w) if α = l; p_α(w) := ⊕_{i=1}^{k} p_{α_i}(w) if α = ⊕_{i=1}^{k} α_i; and p_α(w) := ⊗_{i=1}^{k} p_{α_i}(w) if α = ⊗_{i=1}^{k} α_i, where W is the scope of α. The function p_C represented by the circuit is defined to be the function of the root node. The size |C| of a circuit is defined to be the number of edges in the DAG.

For simplicity, we will restrict to circuits with binary products (i.e. k = 2 for products); this can be enforced with at most a linear increase in size. Prominent examples of algebraic circuits include negation normal forms (NNF) and binary decision diagrams [4], which are over the Boolean semiring and represent Boolean functions, and probabilistic circuits [11], which are over the probability semiring and represent probability distributions.¹

By imposing simple restrictions on the circuit, which we call circuit properties, various inference queries that are computationally hard in general become tractable. In particular, smoothness and decomposability ensure tractable marginal inference:

Definition 3 (Smoothness, Decomposability). A circuit is smooth if for every sum node α = ⊕_i α_i, its children have the same scope: ∀i, j, vars(α_i) = vars(α_j). A circuit is decomposable if for every product node α = α_1 ⊗ α_2, its children have disjoint scopes: vars(α_1) ∩ vars(α_2) = ∅.
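To make Definitions 1-3 concrete, here is a minimal sketch (our own illustration; the class, helper names, and example circuit are not from the paper) of a few semirings and a bottom-up evaluation of a smooth, decomposable algebraic circuit:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Semiring:
    add: Callable   # associative, commutative "oplus"
    mul: Callable   # associative, commutative "otimes"; distributes over add
    zero: object    # additive identity (annihilates under mul)
    one: object     # multiplicative identity

PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)        # probability semiring P
MAXTIMES = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                   # (max, x) semiring M
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)  # Boolean semiring B

# A circuit node is ("input", scope, fn), ("sum", children) or ("prod", children).
def evaluate(node, assignment: Dict[str, object], S: Semiring):
    if node[0] == "input":
        _, scope, fn = node
        return fn({v: assignment[v] for v in scope})
    values = [evaluate(child, assignment, S) for child in node[1]]
    result = values[0]
    for v in values[1:]:
        result = S.add(result, v) if node[0] == "sum" else S.mul(result, v)
    return result

# A smooth, decomposable circuit computing (X and Y) or (not X and not Y).
lit = lambda var, val: ("input", [var], lambda a: a[var] == val)
circuit = ("sum", [("prod", [lit("X", True), lit("Y", True)]),
                   ("prod", [lit("X", False), lit("Y", False)])])
print(evaluate(circuit, {"X": True, "Y": True}, BOOL))   # True
```

Evaluating the same circuit structure under B, P, or M is what later allows a single compositional query to switch semirings via elementwise mappings.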
Aside from the scopes of circuit nodes, we can also specify properties relating to their supports [11]:

Definition 4 (X-Support). Given a partition (X, Y) of variables V and a node α in circuit C, the X-support of α is the projection of its support on X: supp_X(α) = {x ∈ Assign(X ∩ vars(α)) : ∃y ∈ Assign(vars(α) \ X) s.t. p_α(x, y) ≠ 0_S}.

Definition 5 (X-Determinism). Given a circuit C and a partition (X, Y) of V, we say that C is X-deterministic if for all sum nodes α = ⊕_{i=1}^{k} α_i, either: (i) vars(α) ∩ X = ∅; or (ii) supp_X(α_i) ∩ supp_X(α_j) = ∅ for all i ≠ j.

X-determinism refers to a family of properties indexed by sets X. In particular, V-determinism is usually referred to simply as determinism. Note that, as defined, scope and support, and thus these circuit properties, apply to any semiring: the scope only depends on the variable decomposition of the circuit, while the support only refers to the scope and the semiring additive identity 0_S. Figure 2a shows a simple example of a smooth, decomposable, and deterministic circuit that is not X-deterministic, while Figure 2b shows a smooth, decomposable, and {X1, X2}-deterministic circuit.

¹Probabilistic circuits are sometimes written with weights on the edges; this can easily be translated to our formalism by replacing the child of a weighted edge with a product of itself and an input function with empty scope corresponding to the weight [44, 42].

3 Compositional Inference: A Unifying Approach

Many inference problems can be written as compositions of basic operators, which take as input one or more functions and output another function. For example, the marginal MAP query on probability distributions max_x Σ_y p(x, y) is a composition of the Σ and max operators. Similarly, for Boolean functions ϕ, ψ, the query Σ_x ∃y. ϕ(x, y) ∧ ψ(x, y) composes the Σ, ∃ and ∧ operators. Although these queries appear to involve four different operators, three of them (Σ, max, ∃) can be viewed as an aggregation operation over different semirings. Thus, we begin this section by consolidating to a simple set of three operators applicable to functions taking values in some semiring: namely, aggregation, product, and elementwise mapping (Section 3.1).

Equipped with this language for specifying compositional inference queries, we then move on to analyzing their tractability when the input functions are given as circuits. The thesis of this paper is that algebraic structure is often the right level of abstraction to derive useful sufficient (and sometimes necessary) conditions for tractability. We first show tractability conditions for each of the basic operators (Section 3.2), before deriving composability conditions showing how circuit properties are maintained through operators (Section 3.3). This enables us to systematically derive conditions on the input circuits that enable efficient computation of a compositional inference query. Algorithms and detailed proofs of all theorems can be found in Appendix A.

3.1 Basic Operators

Aggregation. Given a function f : Assign(V) → S, aggregating f over W ⊆ V returns the function f' : Assign(Z) → S for Z = V \ W defined by f'(z) := ⊕_{w ∈ Assign(W)} f(z, w). For example, aggregation corresponds to forgetting variables W in the Boolean semiring, marginalizing out W in the probability semiring, and maximizing over assignments in the (max, ×) semiring.

Next, some queries, such as divergence measures between probability distributions, take two functions as input, and many others involve combining two or more intermediate results, as is the case in probabilistic answer set programming inference and causal backdoor/frontdoor queries. We define the product operator to encapsulate such combination of functions in general.

Product. Given two functions f : Assign(W) → S and f' : Assign(W') → S, the product of f and f' is a function f'' : Assign(V) → S, where V = W ∪ W', defined by f''(v) := f(v_W) ⊗ f'(v_{W'}). For example, a product corresponds to the conjoin operator in the Boolean semiring, and standard multiplication in the probability semiring.
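To illustrate the semantics of these first two operators (a sketch under our own explicit table representation; the function names and toy numbers are ours, not the paper's), here they are applied to functions stored as dictionaries from assignments to semiring values:

```python
# Functions f: Assign(V) -> S are dicts mapping assignments
# (tuples of (variable, value) pairs, sorted by variable name) to semiring values.
def aggregate(f, W, add, zero):
    """Aggregate (sum out) the variables in W from f, using the semiring addition."""
    out = {}
    for assign, val in f.items():
        kept = tuple((v, x) for v, x in assign if v not in W)
        out[kept] = add(out.get(kept, zero), val)
    return out

def product(f, g, mul):
    """Product of f and g over the union of their variables (shared variables must agree)."""
    out = {}
    for a, fa in f.items():
        for b, gb in g.items():
            merged, ok = dict(a), True
            for v, x in b:
                if merged.get(v, x) != x:
                    ok = False
                    break
                merged[v] = x
            if ok:
                out[tuple(sorted(merged.items()))] = mul(fa, gb)
    return out

# Probability semiring example: f(X, Y) times g(Y, Z), then marginalize out Y.
f = {(("X", 0), ("Y", 0)): 0.1, (("X", 0), ("Y", 1)): 0.2,
     (("X", 1), ("Y", 0)): 0.3, (("X", 1), ("Y", 1)): 0.4}
g = {(("Y", 0), ("Z", 0)): 0.5, (("Y", 1), ("Z", 0)): 0.5,
     (("Y", 0), ("Z", 1)): 0.5, (("Y", 1), ("Z", 1)): 0.5}
fg = product(f, g, lambda a, b: a * b)
print(aggregate(fg, {"Y"}, lambda a, b: a + b, 0.0))   # a table over X and Z
```

Swapping the `add`/`mul` arguments for those of another semiring (e.g. `max` and `*`, or `or` and `and`) gives maximization or forgetting instead of marginalization.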
Lastly, we introduce the elementwise mapping operator, defined by a mapping τ from a semiring to a (possibly different) semiring. When applied to a function f, it returns the function composition τ ∘ f. This is the key piece that distinguishes our framework from prior analysis of sum-of-product queries over specific semirings, allowing us to express queries such as causal inference and probabilistic logic programming inference under the same framework.

Elementwise Mapping. Given a function f : Assign(V) → S and a mapping τ : S → S' from semiring S to semiring S' satisfying τ(0_S) = 0_{S'}, an elementwise mapping of f by τ results in a function f' : Assign(V) → S' defined by f'(v) := τ(f(v)).²

In practice, we use elementwise mappings as an abstraction predominantly for two purposes. The first is for switching between semirings, while the second is to map between elements of the same semiring. For the former, one of the most important elementwise mappings we will consider is the support mapping, which maps between any two semirings as follows.

Definition 6 (Support Mapping). Given a source semiring S and a target semiring S', the support mapping ⟦·⟧_{S→S'} is defined as: ⟦a⟧_{S→S'} = 0_{S'} if a = 0_S; ⟦a⟧_{S→S'} = 1_{S'} otherwise.

In particular we will often use the source semiring S = B, in which case the support mapping maps ⊥ to 0_{S'} and ⊤ to 1_{S'} in the target semiring. This is useful for encoding a logical function for inference in another semiring, e.g. probabilistic inference in the probability semiring.

²In a slight abuse of notation, we will write τ : S → S' to indicate that τ maps between the respective sets.

Example 1 (Marginal MAP). Suppose that we are given a Boolean formula ϕ(X, Y) and a weight function ω : Assign(X ∪ Y) → R_{≥0}. The marginal MAP query for variables X is defined by MMAP(ϕ, ω) = max_x Σ_y ϕ(x, y) · ω(x, y), where we interpret ⊤ as 1 and ⊥ as 0. We can break this down into a compositional query as follows:

MMAP(ϕ, ω) = max_x τ_{id,P→M}( Σ_y ⟦ϕ(x, y)⟧_{B→P} ⊗ ω(x, y) ).

The support mapping ensures ϕ and ω are both functions over the probability semiring, so that we can apply the product operation. Notice also the inclusion of an identity mapping τ_{id,P→M} from the probability to the (max, ×) semiring, defined by τ_{id,P→M}(x) = x for all x ∈ R_{≥0}. While differentiating between semirings over the same domain may seem superfluous, the explicit identity operator will become important when we analyze the tractability of these compositions on circuits.
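As a sanity check of Example 1, the following brute-force sketch (our own code, with an illustrative toy formula and weight function) evaluates the composition max_x Σ_y ⟦ϕ(x, y)⟧ · ω(x, y) directly:

```python
from itertools import product

# Toy instance: phi(X1, X2, Y) = (X1 or X2) and Y, with a fully factorized weight function.
def phi(x1, x2, y):
    return (x1 or x2) and y

def omega(x1, x2, y):
    wx1 = 0.6 if x1 else 0.4
    wx2 = 0.3 if x2 else 0.7
    wy = 0.9 if y else 0.1
    return wx1 * wx2 * wy

def support_map(b):
    # [[.]]_{B -> P}: True -> 1.0 (multiplicative identity), False -> 0.0 (additive identity)
    return 1.0 if b else 0.0

# MMAP(phi, omega) = max over x of ( sum over y of [[phi(x, y)]] * omega(x, y) )
best = max(
    sum(support_map(phi(x1, x2, y)) * omega(x1, x2, y) for y in (False, True))
    for x1, x2 in product((False, True), repeat=2)
)
print(best)   # approximately 0.378, attained at (X1, X2) = (True, False)
```

The circuit algorithms developed next compute the same quantity without enumerating assignments, provided the circuits satisfy the appropriate structural properties.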
3.2 Tractability Conditions for Basic Operators

We now consider the tractability of applying each basic operator to circuits: that is, computing a circuit whose function corresponds to the result of applying the operator to the functions given by the input circuit(s). First, it is well known that forgetting and marginalization of any subset of variables can be performed in polynomial time if the input circuits in the respective semirings (NNF and PC) are smooth and decomposable [18, 11]. This can be generalized to arbitrary semirings:

Theorem 1 (Tractable Aggregation). Let C be a smooth and decomposable circuit representing a function p : Assign(V) → S. Then for any W ⊆ V, it is possible to compute the aggregate as a smooth and decomposable circuit C' (i.e., p_{C'}(Z) = ⊕_w p_C(Z, w), where Z = V \ W) in O(|C|) time and space.

Next, let us consider the product operator. In the Boolean circuits literature, it is well known that the conjoin operator can be applied tractably if the circuits both follow a common structure known as a vtree [17]. In [48] a more general property known as compatibility was introduced that directly specifies conditions with respect to two (probabilistic) circuits, without reference to a vtree. We now define a generalization of this property (X-compatibility) and also identify a new condition (X-support-compatibility) that enables tractable products.

Definition 7 (X-Compatibility). Given two smooth and decomposable circuits C, C' over variables V, V' respectively, and a variable set X ⊆ V ∩ V', we say that C, C' are X-compatible if for every product node α = α_1 ⊗ α_2 ∈ C and α' = α'_1 ⊗ α'_2 ∈ C' such that vars(α) ∩ X = vars(α') ∩ X, the scope is partitioned in the same way, i.e. vars(α_1) ∩ X = vars(α'_1) ∩ X and vars(α_2) ∩ X = vars(α'_2) ∩ X. We say that C, C' are compatible if they are (V ∩ V')-compatible.

Intuitively, compatibility states that the scopes of the circuits decompose in the same way at product nodes. Compatibility of two circuits suffices to be able to tractably compute their product:

Theorem 2 (Tractable Product - Compatibility). Let C, C' be compatible circuits over variables V, V', respectively, and the same semiring. Then it is possible to compute their product as a circuit C'' compatible with them (i.e., p_{C''}(V ∪ V') = p_C(V) ⊗ p_{C'}(V')) in O(|C||C'|) time and space.

We remark that if we are given a fully factorized function f(V) = ⊗_{V_i ∈ V} f_i(V_i), this can be arranged as a circuit (series of binary products) compatible with any other decomposable circuit; thus, we say this type of function is omni-compatible. We also say that a circuit is structured decomposable if it is compatible with itself. Now, our more general definition of X-compatibility states that the scopes of the circuits restricted to X decompose in the same way at product nodes. This will be important when we consider composing products with other operators, such as aggregation. The following result shows that compatibility w.r.t. a subset is a weaker condition:

Proposition 1 (Properties of X-Compatibility). If two circuits C, C' are X-compatible, then they are X'-compatible for any subset X' ⊆ X.

Compatibility is a sufficient but not necessary condition for tractable products. Some non-compatible circuits can be efficiently restructured to be compatible, such that we can then apply Theorem 2; we refer readers to [55] for details. Alternatively, it is also known that deterministic circuits can be multiplied with themselves in linear time, even when they are not structured decomposable [48, 27]. We formalize this idea with a new property that we call support-compatibility.

Definition 8 (X-Support Compatibility). Given two smooth and decomposable circuits C, C' over variables V, V' respectively, and a set of variables X ⊆ V ∩ V', let C[X], C'[X] be the DAGs obtained by restricting to nodes with scope overlapping with X. We say that C, C' are X-support-compatible if there is an isomorphism ι between C[X], C'[X] such that: (i) for any node α ∈ C[X], vars(α) ∩ X = vars(ι(α)) ∩ X; (ii) for any sum node α ∈ C[X], supp_X(α_i) ∩ supp_X(ι(α_j)) = ∅ whenever i ≠ j. We say that C, C' are support-compatible if they are (V ∩ V')-support-compatible.

To unpack this definition, we note that any smooth, decomposable, and X-deterministic circuit is X-support-compatible with itself, with the obvious isomorphism.
However, this property is more general in that it allows for circuits over different sets of variables and does not require that the nodes represent exactly the same function, merely that the sum nodes have compatible support decompositions. As we will later see, the significance of this property is that it can often be maintained through applications of operators, making it useful for compositions.

Theorem 3 (Tractable Product - Support Compatibility). Let C, C' be support-compatible circuits over variables V, V', respectively, and the same semiring. Then, given the isomorphism ι, it is possible to compute their product as a smooth and decomposable circuit C'' support-compatible with them (i.e., p_{C''}(V ∪ V') = p_C(V) ⊗ p_{C'}(V')) in O(max(|C|, |C'|)) time and space.

We now examine the tractability of general elementwise mappings τ : S → S' on a circuit C. It is tempting here to simply construct a new circuit C' over the semiring S' with the same structure as C, and replace each input function l in the circuit with τ(l). However, the resulting circuit p_{C'}(V) is not guaranteed to correctly compute τ(p_C(V)) in general. For example, consider the support mapping ⟦·⟧_{B→S'} which maps ⊥ to 0_{S'} and ⊤ to 1_{S'} for the probability semiring S' = (R_{≥0}, +, ×, 0, 1). Then the transformation of the smooth and decomposable circuit C = X ∨ X produces C' = ⟦X⟧ + ⟦X⟧, which evaluates to p_{C'}(X = ⊤) = 2, whereas τ(p_C(X = ⊤)) = 1. In order for this simple algorithm to be correct, we need to impose certain conditions on the elementwise mapping τ and/or the circuit C it is being applied to.

Theorem 4 (Tractable Mapping). Let C be a smooth and decomposable circuit over semiring S, and τ : S → S' a mapping such that τ(0_S) = 0_{S'}. Then it is possible to compute the mapping of C by τ as a smooth and decomposable circuit C' (i.e., p_{C'}(V) = τ(p_C(V))) in O(|C|) time and space if τ distributes over sums and over products.
τ distributes over sums if either: (Additive) τ is an additive homomorphism, i.e. τ(a ⊕ b) = τ(a) ⊕ τ(b); or (Det) C is deterministic.
τ distributes over products if either: (Multiplicative) τ is a multiplicative homomorphism, i.e. τ(a ⊗ b) = τ(a) ⊗ τ(b); or (Prod 0/1) τ(1_S) = 1_{S'}, and for all product nodes α = α_1 ⊗ α_2 ∈ C and every value v ∈ Assign(vars(α)), either p_{α_1}(v_{vars(α_1)}) ∈ {0_S, 1_S} or p_{α_2}(v_{vars(α_2)}) ∈ {0_S, 1_S}.

We can apply Theorem 4 to immediately derive the following property of support mappings:

Corollary 1 (Support Mapping). Given a circuit C over a semiring S and any target semiring S', a circuit representing ⟦p_C⟧_{S→S'} can be computed tractably if (i) S satisfies a ⊕ b = 0_S ⟹ a = b = 0_S and S' is idempotent (i.e., 1_{S'} ⊕ 1_{S'} = 1_{S'}), or (ii) C is deterministic.

Proof. First note that ⟦·⟧_{S→S'} satisfies (Multiplicative), and thus distributes over products. If (i) holds, consider ⟦a ⊕ b⟧_{S→S'}. If a = b = 0_S, then this is equal to ⟦0_S⟧_{S→S'} = 0_{S'} = ⟦a⟧_{S→S'} ⊕ ⟦b⟧_{S→S'}; otherwise, at least one of a, b is not 0_S, so by (i) we have a ⊕ b ≠ 0_S and ⟦a ⊕ b⟧_{S→S'} = 1_{S'} = ⟦a⟧_{S→S'} ⊕ ⟦b⟧_{S→S'} (using idempotence of S' when both a and b are nonzero). Thus ⟦·⟧_{S→S'} satisfies (Additive). Alternatively, if (ii) holds, then (Det) holds. In either case ⟦·⟧_{S→S'} distributes over sums in the circuit.
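To see numerically why Theorem 4 needs the (Det) or (Additive) conditions, here is a small sketch (our own illustration; the helper name and example circuits are ours): pushing the support mapping ⟦·⟧_{B→P} through a sum node turns disjunction into addition, which overcounts exactly when the children's supports overlap.

```python
# [[.]]_{B -> P}: True -> 1.0, False -> 0.0
def smap(b):
    return 1.0 if b else 0.0

x, y = True, True

# Non-deterministic circuit C1 = X OR X (both children are the literal X, identical supports).
naive_c1 = smap(x) + smap(x)          # naively mapped circuit evaluates to 2.0 ...
exact_c1 = smap(x or x)               # ... but tau(p_C1(X=True)) = 1.0
print(naive_c1, exact_c1)             # 2.0 1.0

# Deterministic circuit C2 = (X AND Y) OR (X AND NOT Y): children have disjoint supports,
# so the naively mapped circuit agrees with tau(p_C2) at every assignment.
naive_c2 = smap(x) * smap(y) + smap(x) * smap(not y)
exact_c2 = smap((x and y) or (x and (not y)))
print(naive_c2, exact_c2)             # 1.0 1.0
```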
The following examples illustrate the generality of elementwise mappings and Theorem 4:

Example 2 (Partition Function and MAP). Given a probability distribution p(V), consider the task of computing the partition function Σ_v p(v) and the MAP value max_v p(v). These can be viewed as aggregations over the probability and (max, ×) semirings respectively. p is often given either as a probabilistic circuit C_prob, or as a combination of a Boolean circuit C_bool and weights w (in weighted model counting). In the former case, the partition function is tractable because the circuit is already over the probability semiring, while in the latter case, MAP is tractable because the S' = (max, ×) semiring is idempotent, so ⟦C_bool⟧_{B→S'} is tractable. On the other hand, the partition function for Boolean circuits and MAP for PCs require determinism for the conditions of Theorem 4 to hold; in fact, these problems are known to be NP-hard without determinism [18, 39].

Example 3 (Power Function in Probability Semiring). For the probability semiring S = S' = (R_{≥0}, +, ×, 0, 1), consider the power function τ_β(a) := a^β if a ≠ 0, and τ_β(a) := 0 if a = 0, for some β ∈ R. This mapping satisfies (Multiplicative), and is tractable if we enforce (Det) on the circuit.

It is worth noting that semiring homomorphisms (i.e. mappings that are both additive and multiplicative) are always tractable. In the case when S = S' = P, it was shown in [48] that the only such mapping is the identity function. However, this is not the case for other semirings: the power function τ_β is an example in the (max, ×) semiring.

To summarize, we have shown sufficient tractability conditions for aggregation, products, and elementwise mappings. Notice that the conditions for aggregation and products only depend on variable scopes and supports, and as such apply to any semiring; in contrast, for elementwise mappings, we take advantage of specific properties of the semiring(s) in question.

3.3 Tractable Composition of Operators

We now analyze compositions of these basic operators. As such, we need to consider not only circuit properties that enable tractability, but how these properties are maintained through each operator, so that the output circuit can be used as input to another operator. We call these composability conditions. In all cases, the output circuit is smooth and decomposable. Thus, we focus on the properties of X-determinism, X-compatibility, and X-support-compatibility. We emphasize that these are not singular properties, but rather families of properties indexed by a variable set X. We present the intuitive ideas behind our results below, while deferring full proofs to the Appendix.

Theorem 5 (Composability Conditions). The results in Table 1 hold.

Table 1: Tractability Conditions for Operations on Algebraic Circuits. Sm: Smoothness; Dec: Decomposability; X-Det: X-Determinism; X-Cmp: X-Compatibility; X-SCmp: X-Support-Compatibility. Each row lists the conditions required of the input circuit(s), which of the properties X-Det, X-Cmp w/ C_other, X-SCmp w/ C_other the output circuit retains when the input(s) have them (A.4), and the complexity.
- Aggregation over W (A.1): requires Sm, Dec; the output retains X-Det, X-Cmp w/ C_other, and X-SCmp w/ C_other provided W ∩ X = ∅; complexity O(|C|).
- Product (A.2.1): requires Cmp; the output retains X-Det and X-Cmp w/ C_other (X-SCmp w/ C_other: N/A); complexity O(|C||C'|).
- Product (A.2.2): requires SCmp; the output retains X-Det, X-Cmp w/ C_other, and X-SCmp w/ C_other; complexity O(max(|C|, |C'|)).
- Elementwise Mapping (A.3): requires Sm, Dec, (Additive) or (Det), and (Multiplicative) or (Prod 0/1); the output retains X-Det, X-Cmp w/ C_other, and X-SCmp w/ C_other; complexity O(|C|).

X-determinism. Intuitively, X-determinism is maintained through products because the resulting sum nodes partition the X-support in a "finer" way than the original circuits, and through elementwise mappings since they do not expand the support of any node (because τ(0_S) = 0_{S'}). For aggregation, the X-support is maintained if aggregation does not occur over any of the variables in X.
X-compatibility. Here, we are interested in the following question: if the input circuit(s) to some operator are X-compatible with some other circuit C_other for any fixed X, is the same true of the output of the operator? X-compatibility with C_other is maintained through aggregation because it weakens the condition (by Proposition 1) and through elementwise mapping as it does not change variable scopes. As for taking the product of circuits, the output circuit will maintain similar variable partitionings at products, such that it remains X-compatible with C_other. Notably, this result does not hold for compatibility where the scope X may be different for each pair of circuits under consideration; we show a counterexample in Example 4 in the Appendix.

X-support-compatibility. X-support-compatibility is maintained through elementwise mappings and aggregation (except on X) for similar reasons to X-determinism. For products, the result retains a similar X-support structure, so X-support-compatibility is maintained.

We conclude by remarking that, once we determine that a compositional query is tractable, one immediately obtains a correct algorithm for computing the query by application of the generic algorithms for aggregation, product, and elementwise mapping (see Appendix A). An upper bound on the complexity (attained by the algorithm) is also given by considering the complexities of each individual operator; in particular, the algorithm is polytime for a bounded number of operators.

4 Case Studies

In this section, we apply our compositional framework to analyze the tractability of several different problems involving circuits found in the literature (Table 2). Some of the results are known, but can now be cast in a general framework (with often simpler proofs). We also present new results, deriving tractability conditions that are less restrictive than reported in existing literature.

Theorem 6 (Tractability of Compositional Queries). The results in Table 2 hold.

Table 2: Tractability Conditions and Complexity for Compositional Inference Problems.
2AMC:
- PASP (Max-Credal): Sm, Dec, X-Det; complexity O(|C|)
- PASP (MaxEnt), MMAP: Sm, Dec, Det, X-Det; complexity O(|C|)
- SDP: Sm, Dec, Det, X-Det, X-First; complexity O(|C|)
Causal Inference:
- Backdoor: Sm, Dec, SD, (X ∪ Z)-Det; complexity O(|C|²); alternatively Sm, Dec, Z-Det, (X ∪ Z)-Det; complexity O(|C|)
- Frontdoor: Sm, Dec, SD, X-Det, (X ∪ Z)-Det; complexity O(|C|²)
Other:
- MFE: Sm, Dec, H-Det, I'-Det, (H ∪ I')-Det; complexity O(|C|)
- Reverse-MAP: Sm, Dec, X-Det; complexity O(|C|)

4.1 Algebraic Model Counting

In algebraic model counting [30] (a generalization of weighted model counting), one is given a Boolean function ϕ(V) and a fully-factorized labeling function ω(V) = ⊗_{V_i ∈ V} ω_i(V_i) in some semiring S, and the goal is to aggregate these labels over all satisfying assignments of ϕ. This can be easily cast in our framework as ⊕_v ( ⟦ϕ(v)⟧_{B→S} ⊗ ω(v) ). Here, the support mapping ⟦·⟧_{B→S} transfers the Boolean function to the semiring S over which aggregation occurs. Assuming that ϕ(V) is given as a smooth and decomposable Boolean circuit (DNNF), then by Corollary 1 AMC is tractable if S is idempotent or if the circuit is additionally deterministic (note that ω(V) is omni-compatible, so the product is tractable); this matches the results of [30].
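As a concrete illustration of the AMC pipeline in the probability semiring (i.e. weighted model counting), here is a sketch with our own toy circuit and labels, not code from the paper. The circuit for ϕ is smooth, decomposable and deterministic, so by Corollary 1 replacing literals with their labels and evaluating suffices:

```python
from itertools import product

# phi(X, Y) = (X and Y) or (not X and not Y); labels in the probability semiring.
w = {("X", True): 0.6, ("X", False): 0.4, ("Y", True): 0.7, ("Y", False): 0.3}

# Circuit evaluation: literals -> labels, AND -> *, OR -> + (valid because the circuit is
# deterministic, so the support mapping distributes over the sums).
amc_circuit = w[("X", True)] * w[("Y", True)] + w[("X", False)] * w[("Y", False)]

# Brute-force reference: sum of labels over all models of phi.
phi = lambda x, y: (x and y) or ((not x) and (not y))
amc_brute = sum(w[("X", x)] * w[("Y", y)]
                for x, y in product((False, True), repeat=2) if phi(x, y))

print(amc_circuit, amc_brute)   # both 0.54
```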
2AMC. A recent generalization of algebraic model counting is the 2AMC (second-level algebraic model counting) problem [29], which encompasses a number of important bilevel inference problems such as marginal MAP and inference in probabilistic answer set programs. Given a partition of the variables V = (X, Y), a Boolean function ϕ(X, Y), outer and inner semirings S_X, S_Y, labeling functions ω_Y(Y) = ⊗_{Y_i ∈ Y} ω_{Y,i}(Y_i) over S_Y and ω_X(X) = ⊗_{X_i ∈ X} ω_{X,i}(X_i) over S_X, and an elementwise mapping τ_{S_Y→S_X} : S_Y → S_X, the 2AMC problem is given by:

2AMC = ⊕^{S_X}_x ( τ_{S_Y→S_X}( ⊕^{S_Y}_y ⟦ϕ(x, y)⟧_{B→S_Y} ⊗ ω_Y(y) ) ⊗ ω_X(x) ).    (1)

To tackle this type of bilevel inference problem, [29] identified a circuit property called X-firstness.

Definition 9 (X-Firstness). Suppose C is a circuit over variables V and (X, Y) a partition of V. We say that a node α ∈ C is X-only if vars(α) ⊆ X, Y-only if vars(α) ⊆ Y, and mixed otherwise. Then we say that C is X-first if for all product nodes α = α_1 ⊗ α_2, we have that either: (i) each α_i is X-only or Y-only; or (ii) exactly one α_i is mixed, and the other is X-only.

It was stated in [29] that smoothness, decomposability, determinism, and X-firstness suffice to ensure tractable computation of 2AMC problems, by simply evaluating the circuit in the given semirings (caching values if necessary). We now show that this is neither sufficient nor necessary in general.

To build intuition, consider the simple NNF circuit ϕ(X, Y) = (X ∧ Y) ∨ (X ∧ ¬Y). Note that ϕ trivially satisfies X-firstness and is smooth, decomposable, and deterministic. Let S be the probability semiring, S' be the (max, ×) semiring, the labeling functions be ω(y) = ω(¬y) = 1 and ω'(x) = ω'(¬x) = 1, and the mapping function be the identity τ(a) = a. Then, noting that the labels are the multiplicative identity 1, the 2AMC value is

max_x τ( Σ_y ⟦ϕ(x, y)⟧_{B→S} ) = max{ τ(⟦ϕ(x, y)⟧_{B→S} + ⟦ϕ(x, ¬y)⟧_{B→S}), τ(⟦ϕ(¬x, y)⟧_{B→S} + ⟦ϕ(¬x, ¬y)⟧_{B→S}) } = max{ τ(1 + 1), τ(0) } = 2.

On the other hand, the algorithm of [29] returns the value 2AMC = 1, as shown in Figure 3.

Figure 3: Failure case of the 2AMC algorithm on a smooth, decomposable, X-first circuit. (a) Boolean circuit ϕ(X, Y); (b) inner semiring evaluation; (c) outer semiring evaluation.

This is not just a flaw in the specific evaluation algorithm, but rather a provable intractability of the problem given these properties:

Theorem 7 (Hardness of 2AMC with X-firstness). 2AMC is #P-hard, even for circuits that are smooth, decomposable, deterministic, and X-first, and with a constant-time elementwise mapping.

Analyzing using our compositional framework, the issue is that the tractability conditions for τ do not hold; whilst the Boolean circuit is deterministic, this is no longer true once Y is aggregated. In fact, we show that also enforcing X-determinism suffices to tractably compute arbitrary 2AMC instances.

Theorem 8 (Tractability Conditions for 2AMC). Every 2AMC instance is tractable in O(|C|) time for Boolean circuits that are smooth, decomposable, deterministic, X-first, and X-deterministic.

Proof sketch. The key point to notice is that the elementwise mapping effecting the transformation from the inner to the outer semiring operates over an aggregation of an X-first and X-deterministic circuit, obtained as the product of the Boolean function (mapped to the inner semiring by a support mapping) and a weight function over Y. Hence, it satisfies (Det) and (Prod 0/1): all of the X-only children of product nodes are 0/1-valued (in the inner semiring).
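The failure case above can also be checked numerically; the following sketch (our own code, with the evaluation of the X-first circuit simplified to a few lines) compares the ground-truth 2AMC value with what bottom-up evaluation of the circuit returns:

```python
# phi(X, Y) = (X and Y) or (X and not Y), all labels equal to 1,
# inner semiring = probability, outer semiring = (max, *), tau = identity.
phi = lambda x, y: (x and y) or (x and (not y))
support = lambda b: 1.0 if b else 0.0          # [[.]]_{B -> probability semiring}

# Ground-truth 2AMC value: max_x tau( sum_y [[phi(x, y)]] * omega(y) ) * omega'(x).
truth = max(sum(support(phi(x, y)) * 1.0 for y in (False, True)) * 1.0
            for x in (False, True))

# The evaluation of [29] on the X-first circuit aggregates and maps each Y-only child
# *before* the root sum, which is then taken as a max in the outer semiring.
left = support(True) * 1.0      # product (X and Y): [[X]] times aggregated/mapped Y-part
right = support(True) * 1.0     # product (X and not Y)
naive = max(left, right)

print(truth, naive)             # 2.0 1.0: smooth + dec + det + X-first is not sufficient
```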
For specific instances of 2AMC, depending on the semirings S, S' and the mapping function τ, we also find that it is possible to remove the requirement of X-firstness or (V)-determinism, as we summarize in Table 2. One might thus wonder if there is a difference in terms of compactness between requiring X-determinism and X-firstness, as opposed to X-determinism alone. For example, for sentential decision diagrams (SDDs) [17], a popular knowledge compilation target, these notions coincide: an SDD is X-deterministic iff it is X-first (in which context this property is known as X-constrainedness [37, 22]). However, as shown in Figure 2b, there exist X-deterministic but not X-first circuits. We now show that X-deterministic circuits can be exponentially more succinct than X-deterministic circuits that are additionally X-first, as the size of X grows.³

³If the size of X is fixed, a circuit can always be rearranged to be X-first with at most a 2^{|X|} blowup.

Theorem 9 (Exponential Separation). Given sets of variables X = {X_1, ..., X_n}, Y = {Y_1, ..., Y_n}, there exists a smooth, decomposable and X-deterministic circuit C of size poly(n) such that the smallest smooth, decomposable, and X-first circuit C' with p_{C'} ≡ p_C has size 2^{Ω(n)}.

Thus, to summarize, some instances of 2AMC can be solved efficiently when ϕ is smooth, decomposable and X-deterministic. A larger number of instances can be solved when, additionally, ϕ is deterministic; and all 2AMC problems are tractable if we also impose X-firstness.

4.2 Causal Inference

In causal inference, one is often interested in computing interventional distributions, denoted using the do(·) operator, as a function of the observed distribution p. This function depends on the causal graph linking the variables, and can be derived using the do-calculus [38]. For example, the well-known backdoor and frontdoor graphs induce the following formulae:

p(y | do(x)) = Σ_z p(z) p(y | x, z),    (2)
p(y | do(x)) = Σ_z p(z | x) Σ_{x'} p(x') p(y | x', z).    (3)

Assuming that the observed joint distribution p(X, Y, Z) is given as a probabilistic circuit C, we consider the problem of obtaining a probabilistic circuit C' over variables X ∪ Y representing p(Y | do(X)). Tractability conditions for the backdoor/frontdoor cases were derived by [49], with quadratic/cubic complexity respectively. However, we observe that in some cases we can avoid the requirement of structured decomposability and/or obtain reduced complexity relative to their findings.

In the backdoor case, it is known that structured decomposability and (X ∪ Z)-determinism suffice for a quadratic-time algorithm. This can be seen by decomposing into a compositional query:

p(y | do(x)) = ⊕_z ( (⊕_{x,y} p(v)) ⊗ p(v) ⊗ τ_{-1}(⊕_y p(v)) ),    (4)

where V = (X, Y, Z), and τ_{-1}(a) := a^{-1} if a ≠ 0, and τ_{-1}(a) := 0 if a = 0. Assuming (X ∪ Z)-determinism and structured decomposability, τ_{-1}(⊕_y p(V)) is tractable by (Det) and (Multiplicative), the product p(V) ⊗ τ_{-1}(⊕_y p(V)) by support-compatibility, and the final product by compatibility. However, if we additionally have Z-determinism, then the final product becomes tractable by support-compatibility. This has linear rather than quadratic complexity, and does not require the circuit to be structured decomposable.

In the frontdoor case, [49] showed that X-determinism, (X ∪ Z)-determinism, and structured decomposability suffice for cubic complexity. However, we note that under such conditions, the inner product p(X') ⊗ p(Y | X', Z) is tractable by support-compatibility. As such, the complexity of this query is actually quadratic rather than cubic as previously shown.
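As a numeric sanity check that the compositional query (4) matches the backdoor formula (2), here is a brute-force sketch over explicit probability tables (our own code and toy distribution, not the paper's circuit algorithm):

```python
from itertools import product
import random

random.seed(0)
# A random joint distribution p(X, Y, Z) over binary variables, stored as a dict.
vals = [random.random() for _ in range(8)]
total = sum(vals)
p = {(x, y, z): v / total
     for (x, y, z), v in zip(product((0, 1), repeat=3), vals)}

def backdoor_direct(x, y):
    """Equation (2): p(y | do(x)) = sum_z p(z) * p(y | x, z)."""
    out = 0.0
    for z in (0, 1):
        pz = sum(p[(a, b, z)] for a in (0, 1) for b in (0, 1))
        pxz = sum(p[(x, b, z)] for b in (0, 1))
        out += pz * (p[(x, y, z)] / pxz)
    return out

def backdoor_compositional(x, y):
    """Equation (4): aggregate over x,y; multiply by p(v) and tau_{-1}(aggregate over y)."""
    out = 0.0
    for z in (0, 1):
        agg_xy = sum(p[(a, b, z)] for a in (0, 1) for b in (0, 1))   # p(z)
        agg_y = sum(p[(x, b, z)] for b in (0, 1))                     # p(x, z)
        inv = 1.0 / agg_y if agg_y != 0 else 0.0                      # tau_{-1}
        out += agg_xy * p[(x, y, z)] * inv
    return out

print(round(backdoor_direct(1, 1), 6), round(backdoor_compositional(1, 1), 6))  # equal
```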
We summarize our findings in Table 2 and refer the reader to the Appendix for full proofs.

5 Related Work

Our work builds upon the observation that many inference problems can be characterized as a composition of basic operators. Prior works have considered compositional inference for circuits in the Boolean [18] and probabilistic semirings [48, 49], deriving tractability conditions for operators specific to these semirings. Aside from generalizing to arbitrary semirings, we also introduce extended composability conditions that enable interleaving of aggregation, products, and mappings. Meanwhile, algebraic model counting [30] deals (implicitly) with mappings from the Boolean semiring to an arbitrary semiring, but does not consider compositional queries. Closest to our work, [29] consider a generalization of algebraic model counting that allows for an additional semiring translation; however, this still assumes input Boolean circuits and has incomplete tractability characterizations. Our framework resolves these limitations, permitting arbitrary compositional queries over semirings.

Many works have considered (unbounded) sums-of-products queries on arbitrary semirings [21, 5, 1, 23], encompassing many important problems such as constraint satisfaction problems [7], graphical model inference [56], and database queries [52], which are often computationally hard in the worst case. Algorithms for such queries often utilize compact intermediate representations and/or assume compact input representations, such as circuits [35, 17, 36, 3]. Our framework focuses on queries where the number of operators is bounded, and characterizes conditions under which inference is tractable in polynomial time. It also includes elementwise mappings as a key additional abstraction that can be used to express queries involving more than sums and products.

6 Conclusion

In summary, we have introduced a framework for analysing compositional inference problems on circuits, based on algebraic structure. In doing so, we were able to derive new tractability conditions and simplified algorithms for a number of existing problems, including 2AMC and causal inference. Our framework focuses on simple and composable sufficient tractability conditions for the aggregation, product, and elementwise mapping operators; a limitation of this generality is that these conditions may not be necessary for specific queries on specific semirings. Our work motivates the development of knowledge compilation and learning algorithms that target the requisite circuit properties, such as X-determinism. Finally, while we focus on exact inference, for many problems (e.g. marginal MAP) approximate algorithms exist and are of significant interest; an interesting direction for future work is to investigate whether these can also be generalized using the compositional approach.

Acknowledgements

We thank Antonio Vergari for helpful discussions, and acknowledge him for proposing an early version of support compatibility and Theorem 3, and for pointing out a potential reduction in complexity for the causal inference queries. This work was done in part while the authors were visiting the Simons Institute for the Theory of Computing. This work was funded in part by the DARPA ANSR program under award FA8750-23-2-0004, the DARPA PTG Program under award HR00112220005, and NSF grant #IIS-1943641.
DM received generous support from the IBM Corporation, the Center for Artificial Intelligence at University of São Paulo (C4AI-USP), the São Paulo Research Foundation (FAPESP grants #2019/07665-4 and #2022/02937-9), the Brazilian National Research Council (CNPq grant no. 305136/2022-4) and CAPES (Finance Code 001). YC was partially supported by a gift from Cisco University Research Program.

References

[1] Mahmoud Abo Khamis, Hung Q Ngo, and Atri Rudra. FAQ: Questions asked frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 13-28, 2016.
[2] Kareem Ahmed, Stefano Teso, Kai-Wei Chang, Guy Van den Broeck, and Antonio Vergari. Semantic probabilistic layers for neuro-symbolic learning. In Advances in Neural Information Processing Systems 35 (NeurIPS), December 2022.
[3] Antoine Amarilli and Florent Capelli. Tractable circuits in database theory. ACM SIGMOD Record, 53(2):6-20, 2024.
[4] Antoine Amarilli, Marcelo Arenas, YooJung Choi, Mikaël Monet, Guy Van den Broeck, and Benjie Wang. A circus of circuits: Connections between decision diagrams, circuits, and automata. arXiv preprint arXiv:2404.09674, 2024.
[5] Fahiem Bacchus, Shannon Dalmao, and Toniann Pitassi. Solving #SAT and Bayesian inference with backtracking search. Journal of Artificial Intelligence Research, 34:391-442, 2009.
[6] Chitta Baral, Michael Gelfond, and J. Nelson Rushton. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming, 9(1):57-144, 2009.
[7] Stefano Bistarelli, Ugo Montanari, and Francesca Rossi. Semiring-based constraint satisfaction and optimization. Journal of the ACM (JACM), 44(2):201-236, 1997.
[8] Oliver Broadrick, Honghua Zhang, and Guy Van den Broeck. Polynomial semantics of tractable probabilistic circuits. In Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI), July 2024.
[9] Mark Chavira and Adnan Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6):772-799, 2008.
[10] Arthur Choi, Yexiang Xue, and Adnan Darwiche. Same-decision probability: A confidence measure for threshold-based decisions. International Journal of Approximate Reasoning, 53(9):1415-1428, 2012. ISSN 0888-613X. doi: https://doi.org/10.1016/j.ijar.2012.04.005. URL https://www.sciencedirect.com/science/article/pii/S0888613X12000485. Fifth European Workshop on Probabilistic Graphical Models (PGM-2010).
[11] YooJung Choi, Antonio Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. arXiv preprint, 2020.
[12] YooJung Choi, Meihua Dang, and Guy Van den Broeck. Group fairness by probabilistic modeling with latent fair decisions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12051-12059, 2021.
[13] YooJung Choi, Tal Friedman, and Guy Van den Broeck. Solving marginal MAP exactly by probabilistic circuit transformations. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
[14] Fabio Gagliardi Cozman and Denis Deratani Mauá. On the semantics and complexity of probabilistic logic programs. Journal of Artificial Intelligence Research, 60:221-262, 2017.
[15] Adnan Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1-2):11-34, 2001. doi: 10.3166/jancl.11.11-34.
[16] Adnan Darwiche. A differential approach to inference in Bayesian networks.
Journal of the ACM (JACM), 50(3):280-305, 2003.
[17] Adnan Darwiche. SDD: A new canonical representation of propositional knowledge bases. In Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[18] Adnan Darwiche and Pierre Marquis. A knowledge compilation map. Journal of Artificial Intelligence Research, 17:229-264, 2002.
[19] Cassio P De Campos. New complexity results for MAP in Bayesian networks. In IJCAI, volume 11, pages 2100-2106. Citeseer, 2011.
[20] Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), volume 7, pages 2462-2467, 2007.
[21] Rina Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41-85, 1999.
[22] Vincent Derkinderen and Luc De Raedt. Algebraic circuits for decision theoretic inference and learning. In ECAI 2020, pages 2569-2576. IOS Press, 2020.
[23] Thomas Eiter and Rafael Kiesel. On the complexity of sum-of-products problems over semirings. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6304-6311, 2021.
[24] Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theory and Practice of Logic Programming, 15(3):358-401, 2015.
[25] Robert Gens and Pedro Domingos. Learning the structure of sum-product networks. In International Conference on Machine Learning, pages 873-880. PMLR, 2013.
[26] Steven Holtzen, Guy Van den Broeck, and Todd Millstein. Scaling exact inference for discrete probabilistic programs. Proceedings of the ACM on Programming Languages, 4(OOPSLA):1-31, 2020.
[27] Haiying Huang and Adnan Darwiche. Causal unit selection using tractable arithmetic circuits. arXiv preprint arXiv:2404.06681, 2024.
[28] Jiani Huang, Ziyang Li, Binghong Chen, Karan Samel, Mayur Naik, Le Song, and Xujie Si. Scallop: From probabilistic deductive databases to scalable differentiable reasoning. Advances in Neural Information Processing Systems, 34:25134-25145, 2021.
[29] Rafael Kiesel, Pietro Totis, and Angelika Kimmig. Efficient knowledge compilation beyond weighted model counting. In Proceedings of the 38th International Conference on Logic Programming (ICLP 2022), 2022.
[30] Angelika Kimmig, Guy Van den Broeck, and Luc De Raedt. Algebraic model counting. Journal of Applied Logic, 22:42-62, 2017.
[31] Johan Kwisthout. Most frugal explanations in Bayesian networks. Artificial Intelligence, 218:56-73, 2015. ISSN 0004-3702. doi: https://doi.org/10.1016/j.artint.2014.10.001.
[32] Anji Liu, Honghua Zhang, and Guy Van den Broeck. Scaling up probabilistic circuits by latent variable distillation. In Proceedings of the International Conference on Learning Representations (ICLR), May 2023.
[33] T. Lukasiewicz. Probabilistic description logic programs. International Journal of Approximate Reasoning, 45(2):288-307, 2007.
[34] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. DeepProbLog: Neural probabilistic logic programming. Advances in Neural Information Processing Systems, 31, 2018.
[35] Robert Mateescu, Rina Dechter, and Radu Marinescu. AND/OR multi-valued decision diagrams (AOMDDs) for graphical models. Journal of Artificial Intelligence Research, 33:465-519, 2008.
[36] Dan Olteanu and Maximilian Schleich.
Factorized databases. ACM SIGMOD Record, 45(2):5-16, 2016.
[37] Umut Oztok, Arthur Choi, and Adnan Darwiche. Solving PP^PP-complete problems using knowledge compilation. In Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning, 2016.
[38] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-688, 1995.
[39] Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(10):2030-2044, 2016.
[40] Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Xiaoting Shao, Martin Trapp, Kristian Kersting, and Zoubin Ghahramani. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Uncertainty in Artificial Intelligence, pages 334-344. PMLR, 2020.
[41] Tahrima Rahman, Prasanna Kothalkar, and Vibhav Gogate. Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014, Proceedings, Part II, pages 630-645. Springer, 2014.
[42] Amirmohammad Rooshenas and Daniel Lowd. Learning sum-product networks with direct and indirect variable interactions. In International Conference on Machine Learning, pages 710-718. PMLR, 2014.
[43] Feras A Saad, Martin C Rinard, and Vikash K Mansinghka. SPPL: Probabilistic programming with fast exact symbolic inference. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 804-819, 2021.
[44] Amir Shpilka, Amir Yehudayoff, et al. Arithmetic circuits: A survey of recent results and open questions. Foundations and Trends in Theoretical Computer Science, 5(3-4):207-388, 2010.
[45] Pietro Totis, Luc De Raedt, and Angelika Kimmig. smProbLog: Stable model semantics in ProbLog for probabilistic argumentation. Theory and Practice of Logic Programming, 23(6):1198-1247, 2023.
[46] Leslie G Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410-421, 1979.
[47] Guy Van den Broeck, Anton Lykov, Maximilian Schleich, and Dan Suciu. On the tractability of SHAP explanations. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 2021.
[48] Antonio Vergari, YooJung Choi, Anji Liu, Stefano Teso, and Guy Van den Broeck. A compositional atlas of tractable circuit operations for probabilistic inference. In Advances in Neural Information Processing Systems, volume 34, pages 13189-13201, 2021.
[49] Benjie Wang and Marta Kwiatkowska. Compositional probabilistic and causal inference using tractable circuit models. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 9488-9498. PMLR, 2023.
[50] Benjie Wang, Matthew R Wicker, and Marta Kwiatkowska. Tractable uncertainty for structure learning. In International Conference on Machine Learning, pages 23131-23150. PMLR, 2022.
[51] Zhun Yang, Adam Ishay, and Joohyung Lee. NeurASP: Embracing neural networks into answer set programming. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 1755-1762, 2020.
[52] Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82-94, 1981.
[53] Matej Zečević, Devendra Dhami, Athresh Karanam, Sriraam Natarajan, and Kristian Kersting. Interventional sum-product networks: Causal inference with tractable probabilistic models. Advances in Neural Information Processing Systems, 34:15019-15031, 2021.
[54] Honghua Zhang, Meihua Dang, Nanyun Peng, and Guy Van den Broeck. Tractable control for autoregressive language generation. In Proceedings of the 40th International Conference on Machine Learning (ICML), July 2023.
[55] Honghua Zhang, Benjie Wang, Marcelo Arenas, and Guy Van den Broeck. Restructuring tractable probabilistic circuits. arXiv preprint arXiv:2411.12256, 2024.
[56] Nevin L Zhang and David Poole. A simple approach to Bayesian network computations. In Proc. of the Tenth Canadian Conference on Artificial Intelligence, 1994.

A Algorithms and Proofs

In Algorithms 1-4 we present algorithms for the aggregation, product (with compatibility), product (with support-compatibility), and elementwise mapping operators respectively (the initial call is to the root of the circuit(s)). In the following, we present proofs that the algorithms soundly compute smooth and decomposable output circuits for the respective operators.

Algorithm 1: AGG
Input: Smooth and decomposable algebraic circuit C(V); node α ∈ C; subset of variables W ⊆ vars(α)
Output: Node encoding ⊕_{w ∈ Assign(W)} p_α
1 if α is an input node then
2   return AGG-INPUT(α; W)
3 else if α is a product or sum node and vars(α) = W then
4   return NEWNODE(⊗_{i=1}^{k} p_{AGG(α_i; W ∩ vars(α_i))}) if α is a product, else NEWNODE(⊕_{i=1}^{k} p_{AGG(α_i; W)})
5 else if α is a product or sum node and W ⊊ vars(α) then
6   return ⊗_{i=1}^{k} AGG(α_i; W ∩ vars(α_i)) if α is a product, else ⊕_{i=1}^{k} AGG(α_i; W)

Algorithm 2: PROD-CMP
Input: Compatible algebraic circuits C(V), C'(V'); nodes α ∈ C, α' ∈ C' s.t. vars(α) ∩ (V ∩ V') = vars(α') ∩ (V ∩ V')
Output: Node encoding p_C(V) ⊗ p_{C'}(V')
1 if vars(α) ∩ vars(α') = ∅ then
2   return α ⊗ α'
3 else if α is a product or input node and α' = ⊕_{j=1}^{k'} α'_j is a sum node then
4   return ⊕_{j=1}^{k'} PROD-CMP(α, α'_j)
5 else if α, α' are input nodes then
6   return PROD-INPUT(α, α')
7 else if α = α_1 ⊗ α_2, α' = α'_1 ⊗ α'_2 are product nodes then
8   return PROD-CMP(α_1, α'_1) ⊗ PROD-CMP(α_2, α'_2)
9 else if α = ⊕_{i=1}^{k} α_i, α' = ⊕_{j=1}^{k'} α'_j are sum nodes then
10  return ⊕_{i=1}^{k} ⊕_{j=1}^{k'} PROD-CMP(α_i, α'_j)
Algorithm 3: PROD-SCMP
Input: Support-compatible algebraic circuits C(V), C'(V'); nodes α ∈ C, α' ∈ C' s.t. ι(α) = α'
Output: Circuit encoding p_C(V) ⊗ p_{C'}(V')
1 if vars(α) ∩ vars(α') = ∅ then
2   return α ⊗ α'
3 else if α, α' are input nodes then
4   return PROD-INPUT(α, α')
5 else if α = α_1 ⊗ α_2, α' = α'_1 ⊗ α'_2 are product nodes then
6   return PROD-SCMP(α_1, α'_1) ⊗ PROD-SCMP(α_2, α'_2)
7 else if α = ⊕_{i=1}^{k} α_i, α' = ⊕_{i=1}^{k} α'_i are sum nodes then
8   return ⊕_{i=1}^{k} PROD-SCMP(α_i, α'_i)

Algorithm 4: MAPPING
Input: Smooth and decomposable algebraic circuit C(V) over semiring S; node α ∈ C; mapping function τ : S → S'
Output: Node encoding τ(p_C(V))
1 if α is an input node then
2   return MAPPING-INPUT(α; τ)
3 else if α is a product or sum node then
4   return ⊗_{i=1}^{k} MAPPING(α_i; τ) if α is a product, else ⊕_{i=1}^{k} MAPPING(α_i; τ)

A.1 Tractable Aggregation

Theorem 1 (Tractable Aggregation). Let C be a smooth and decomposable circuit representing a function p : Assign(V) → S. Then for any W ⊆ V, it is possible to compute the aggregate as a smooth and decomposable circuit C' (i.e., p_{C'}(Z) = ⊕_w p_C(Z, w), where Z = V \ W) in O(|C|) time and space.

Proof. We prove this inductively, starting from the input nodes of the circuit. Our claim is that for each node α ∈ C, AGG(α; W) (Algorithm 1) returns a node α' with scope vars(α') = vars(α) \ W such that p_{α'}(vars(α')) = ⊕_w p_α(vars(α)), and which is decomposable (if a product) and smooth (if a sum).

If α is an input node (Lines 1-2), then this is possible by assumption; we denote this with AGG-INPUT in the algorithm. Note that if vars(α) = W, then this is just a scalar/constant (i.e. an input node with empty scope).

If α is a product node α_1 ⊗ α_2, then by decomposability, W ∩ vars(α_1) and W ∩ vars(α_2) partition W. Thus we have that:

⊕_w p_α(vars(α)) = ⊕_w ( p_{α_1}(vars(α_1)) ⊗ p_{α_2}(vars(α_2)) )
= ⊕_{w_1} ⊕_{w_2} ( p_{α_1}(vars(α_1)) ⊗ p_{α_2}(vars(α_2)) )
= ( ⊕_{w_1} p_{α_1}(vars(α_1)) ) ⊗ ( ⊕_{w_2} p_{α_2}(vars(α_2)) )
= p_{AGG(α_1; W ∩ vars(α_1))}(vars(α_1) \ W) ⊗ p_{AGG(α_2; W ∩ vars(α_2))}(vars(α_2) \ W),

where w_1, w_2 range over Assign(W ∩ vars(α_1)) and Assign(W ∩ vars(α_2)) respectively. The second equality follows by the partition (and associativity of the addition and multiplication), the third follows by distributivity of multiplication over addition, and the last by the inductive hypothesis. In the case where vars(α) = W (Lines 3-4), p_{AGG(α_i; W ∩ vars(α_i))} is just a scalar for each i, so we can directly perform this computation, returning a new scalar node α'. Otherwise (Lines 5-6), we construct a new product node α' = α'_1 ⊗ α'_2 = AGG(α_1; W ∩ vars(α_1)) ⊗ AGG(α_2; W ∩ vars(α_2)). By the inductive hypothesis, α'_i has scope vars(α'_i) = vars(α_i) \ W, so α' is clearly decomposable and has scope vars(α') = (vars(α_1) \ W) ∪ (vars(α_2) \ W) = vars(α) \ W.

If α = ⊕_{i=1}^{k} α_i is a sum node, then we note that by smoothness, vars(α_i) = vars(α) for all i. Thus we have that:

⊕_w p_α(vars(α)) = ⊕_w ⊕_{i=1}^{k} p_{α_i}(vars(α)) = ⊕_{i=1}^{k} ⊕_w p_{α_i}(vars(α_i)) = ⊕_{i=1}^{k} p_{AGG(α_i; W)}(vars(α_i) \ W).

In the case where vars(α) = W (Lines 3-4), p_{AGG(α_i; W)} is just a scalar, so we can directly perform this computation, returning a new scalar node α'. Otherwise (Lines 5-6), we construct a new sum node α' = ⊕_{i=1}^{k} α'_i = ⊕_{i=1}^{k} AGG(α_i; W). By the inductive hypothesis, each α'_i has scope vars(α_i) \ W = vars(α) \ W, so α' is smooth and also has scope vars(α) \ W.

A.2 Tractable Product

A.2.1 Tractable Product with Compatibility

Theorem 2 (Tractable Product - Compatibility). Let C, C' be compatible circuits over variables V, V', respectively, and the same semiring. Then it is possible to compute their product as a circuit C'' compatible with them (i.e., p_{C''}(V ∪ V') = p_C(V) ⊗ p_{C'}(V')) in O(|C||C'|) time and space.

Proof. We prove this inductively bottom-up, for nodes α ∈ C, α' ∈ C' such that vars(α) ∩ (V ∩ V') = vars(α') ∩ (V ∩ V'). Our claim is that PROD-CMP(α, α') (Algorithm 2) returns a node α'' such that p_{α''} = p_α ⊗ p_{α'}, has scope vars(α'') = vars(α) ∪ vars(α'), and is decomposable (if a product) and smooth (if a sum).

If vars(α) ∩ vars(α') = ∅ (i.e. vars(α) ∩ (V ∩ V') = vars(α') ∩ (V ∩ V') is empty), then the algorithm (Lines 1-2) simply constructs a new product node α'' = α ⊗ α'. By definition, p_{α''} = p_α ⊗ p_{α'}, it has scope vars(α'') = vars(α) ∪ vars(α'), and α'' is decomposable. If α, α' are input nodes, then we can construct a new input node α'' satisfying the requisite properties (Lines 5-6). If α is an input or product node and α' = ⊕_{j=1}^{k'} α'_j is a sum node, then the algorithm constructs a new sum node α'' = ⊕_{j=1}^{k'} PROD-CMP(α, α'_j) (Lines 3-4). This computes the correct function as p_{α''} = ⊕_{j=1}^{k'} ( p_α ⊗ p_{α'_j} ) = p_α ⊗ ( ⊕_{j=1}^{k'} p_{α'_j} ) = p_α ⊗ p_{α'}. Each child has scope vars(α) ∪ vars(α'_j) = vars(α) ∪ vars(α'), so smoothness is retained.
If α = α1 α2, α = α 1 α 2 are product nodes such that vars(α) (V V ) = vars(α ) (V V ) is non-empty, then writing X := V V , by compatibility we also have vars(α1) X = vars(α 1) X and vars(α2) X = vars(α 2) X, so we can apply the inductive hypothesis for PROD-CMP(α1, α 1) and PROD-CMP(α2, α 2). Algorithm 2 constructs a new product node α = PROD-CMP(α1, α 1) PROD-CMP(α2, α 2). To show that this is decomposable, we need the following lemma: Lemma 1 (Decomposability of Product). Suppose α C, α C are decomposable product nodes which decompose in the same way over X, i.e. vars(α1) X = vars(α 1) X and vars(α2) X = vars(α 2) X. Then (vars(α1) vars(α 1)) (vars(α2) vars(α 2)) = . Proof. We have that: (vars(α1) vars(α 1)) (vars(α2) vars(α 2)) = (vars(α1) vars(α2)) (vars(α 1) vars(α 2)) (vars(α1) vars(α 2)) (vars(α2) vars(α 1)) Note that the first two intersections are empty due to decomposability of α, α . For the third intersection (vars(α1) vars(α 2)), any variable in this intersection must be in the common variables X. But we know that vars(α 2) X = vars(α2) X in both cases above; by decomposability, (vars(α 2) X) (vars(α1) X) = . Thus the third intersection is also empty; a similar argument applies for the fourth. Applying this Lemma, we see that α is decomposable as vars(PROD-CMP(α1, α 1)) = (vars(α1) vars(α 1)) and vars(PROD-CMP(α2, α 2)) = (vars(α2) vars(α 2)). We can also verify that pα = p PROD-CMP(α1,α 1) p PROD-CMP(α2,α 2) = pα1 pα 1 pα2 pα 2 = pα pα by the inductive hypothesis, and associativity of . If α = +k i=1αi, α = +k i=1α i are sum nodes, then the algorithm produces a new sum node α = +k i=1 +k j=1 PROD-CMP(αi, α j) (Lines 7-8). This computes the correct function as pα = k i=1 k j=1 PROD-CMP(αi, α j) = k i=1 k j=1 pαipα j = ( k i=1pαi) ( k j=1pα j) = pα pα . It also retains smoothness. The complexity of this algorithm is O(|C||C |) because we perform recursive calls for pairs of nodes in C and C . A.2.2 Linear-time Product with Support Comptibility Theorem 3 (Tractable Product - Support Compatibility). Let C, C be support-compatible circuits over variables V , V , respectively, and the same semiring. Then, given the isomorphism ι, it is possible to compute their product as a smooth and decomposable circuit C support-compatible with them (i.e., p C (V V ) = p C(V ) p C (V )) in O(max(|C|, |C |)) time and space. Proof. We prove this inductively bottom up, for nodes α C such that α C either satisfies α = ι(α) or vars(α) vars(α ) = . Our claim is that PROD-SCMP(α, α ) (Algorithm 3) returns a node α such that pα = pα pα , has scope vars(α ) = vars(α) vars(α ), and is decomposable (if product) and smooth (if sum). If vars(α) vars(α ) = , then the algorithm (Lines 1-2) simply constructs a new product node α = α α . By definition, pα = pα pα , has scope vars(α ) = vars(α) vars(α ), and α is decomposable. If the α, α are input nodes, then we can construct a new input node α satisfying the requisite properties (Lines 3-4). If α = α1 α2, α = α 1 α 2 are product nodes and ι(α) = α , then the Algorithm (Lines 5-6) constructs a product node α = PROD-SCMP(α1, α 1) PROD-SCMP(α2, α 2). Define X = V V . By support compatibility (i.e. X-support compatibility), α, α are part of the restricted circuits C[X], C [X] respectively and so vars(α) X = , vars(α ) X = . There are two cases to consider; we first show that in both of these cases, we can apply the inductive hypothesis to PROD-SCMP(α1, α 1) and PROD-SCMP(α2, α 2). Firstly, suppose that both α1 and α2 have scope overlapping with X. 
Then by the isomorphism, we have α 1 = ι(α1), α 2 = ι(α2). By the definition of support compatibility, this also means vars(α1) X = vars(α 1) X and vars(α2) X = vars(α 2) X and these are both non-empty; thus we can apply the inductive hypothesis for PROD-SCMP(α1, α 1) and PROD-SCMP(α2, α 2). Second, suppose instead that only α1 has scope overlapping with X, and so vars(α2) X = . Then α 1 = ι(α1) and vars(α1) X = vars(α 1) X = vars(α) X = vars(α ) X. Since vars(α 2) = vars(α ) \ vars(α 1), it follows that vars(α2) X = (vars(α ) X) \ (vars(α 1) X) = , i.e. α 2 also does not have scope overlapping with X. Since X are the shared variables V , V , it follows that vars(α2) vars(α 2) = , and so we can apply the inductive hypothesis for PROD-SCMP(α2, α 2) (and for PROD-SCMP(α1, α 1)). By the inductive hypothesis, PROD-SCMP(α1, α 1) has scope vars(α1) vars(α 1) and PROD-SCMP(α2, α 2) has scope vars(α2) vars(α 2). We can thus apply Lemma 1. Thus PROD-SCMP(α1, α 1) and PROD-SCMP(α2, α 2) have disjoint scopes and α is decomposable. We can also verify that pα = p PROD-SCMP(α1,α 1) p PROD-SCMP(α2,α 2) = pα1 pα 1 pα2 pα 2 = pα pα by the inductive hypothesis, and associativity of . If α = +k i=1αi, α = +k i=1α i are sum nodes and ι(α) = α , then by smoothness, all of the children of α have the same support and all the children of α have the same support; thus all the children are in C[X], C [X] respectively, k = k , and ι(αi) = α i. By support compatibility, we also that (i) vars(αi) X = vars(α j) X for all i, j; and (ii) that supp X(αi) supp X(α j) for i = j. We claim that pαi pα j 0S whenever i = j. To see this, recall the definition of X-support: we have that: supp X(αi) = {x Assign(X vars(αi)) : y Assign(vars(αi) \ X) s.t. pαi(x, y) = 0S} supp X(α j) = {x Assign(X vars(α j)) : y Assign(vars(α j) \ X) s.t. pα j(x, y) = 0S} Since X vars(αi) = X vars(α j) and is nonempty, by (ii) we know that there is no assignment of X vars(αi) such that pαi and pα j can be simultaneously not equal to 0S. Thus there is no assignment of X vars(αi) such that pαi pα j is not 0S, since 0S is the multiplicative annihilator. Thus, the product function is given by: j=1 (pαi pα j) i=1 (pαi pα i) i=1 PROD-SCMP(αi, α i) The second equality follows by the Lemma and the fact that 0S is the additive identity, and the third equality by the inductive hypothesis. Thus α = +k i=1PROD-SCMP(αi, α i) computes the correct function (Lines 7-8). We conclude by noting that vars(α ) = Sk i=1(vars(αi) vars(αi)) = Sk i=1 vars(αi) Sk i=1 vars(αi) = vars(α) vars(α ). The complexity of this procedure applied to the root nodes is O(max(|C|, |C |), as we only perform recursive calls for (i) α C[X] and its corresponding node α = ι(α) and (ii) nodes with nonoverlapping scope, upon which the recursion ends; so the overall number of recursive calls is linear in the size of the circuits. A.3 Tractable Elementwise Mapping Theorem 4 (Tractable Mapping). Let C be a smooth and decomposable circuit over semiring S, and τ : S S a mapping such that τ(0S) = 0S . Then it is possible to compute the mapping of C by τ as a smooth and decomposable circuit C (i.e., p C (V ) = τ(p C(V ))) in O(|C|) time and space if τ distributes over sums and over products. τ distributes over sums if: either (Additive) τ is an additive homomorphism, i.e. τ(a b) = τ(a) τ(b); or (Det) C is deterministic. τ distributes over products if: either (Multiplicative) τ is an multiplicative homomorphism, i.e. 
τ(a b) = τ(a) τ(b); or (Prod 0/1) τ(1S) = 1S , and for all product nodes α = α1 α2 C, and for every value v Assign(vars(α)), either pα1(vvars(α1)) {0S, 1S} or pα2(vvars(α2)) {0S, 1S}. Proof. First, let us consider sum nodes. Given any sum node α = +k i=1αi C, we consider computing a circuit representing τ pα(vars(α)) τ k M i=1 pαi(vars(α)) (5) If (Additive) holds, then we immediately have that τ Lk i=1 pαi(vars(α)) Lk i=1 τ(pαi(vars(α))) by associativity of . Alternatively, if (Det) holds, then given any v Assign((vars(α))), there is at most one child, say αj, such that pαj(v) = 0S. Then we have that τ k i=1pαi(v) = τ pαj(v) k M i=1,i =j pαi(v) = τ pαj(v) k M i=1,i =j 0S = τ pαj(v) k M i=1,i =j 0S = τ pαj(v) k M i=1,i =j τ(0S) = τ pαj(v) k M i=1,i =j τ(pαi(v)) i=1 τ(pαi(v)) and so again τ Lk i=1 pαi(v) Lk i=1 τ(pαi(v)). Second, let us consider product nodes. If (Multiplicative) holds, then we immediately have that τ Nk i=1 pαi(vars(α)) Nk i=1 τ(pαi(vars(α))) by associativity of . Otherwise, if (Prod 0/1) holds, then given any v Assign(vars(α)), there is at most one child, say αj, such that pαj(v) {0S, 1S}. Thus, we have that: i=1 pαi(v) = τ pαj(v) k O i=1,i =j pαi(v) = τ pαj(v) τ k O i=1,i =j pαi(v) = τ pαj(v) k O i=1,i =j τ(pαi(v) i=1 τ(pαi(v)) The second equality follows because Nk i=1,i =j pαi(v) {0S, 1S}, and we have that τ(a 0S) = 0S = τ(a) τ(0S) and τ(a 1S) = 1S = τ(a) τ(1S) for any a S. The third equality follows as both τ Nk i=1,i =j pαi(v) and Nk i=1,i =j τ(pαi(v)) are equal to 1S iff no pαi(v) is 0S. Thus, we have that τ Nk i=1 pαi(v) Nk i=1 τ(pαi(v)). By applying these identities recursively to sum and product nodes, and assuming that τ can be applied tractably to input nodes, we obtain a circuit C such that p C (V ) τ(p C(V )). A.4 Tractable Composition of operators Theorem 5 (Composability Conditions). The results in Table 1 hold. If the Input Circuit(s) are ... Tractability Conditions X-Det X-Cmp w/ Cother X-SCmp w/ Cother Complexity Then the Output Circuit is ... Aggregation (W ) Sm AND Dec X-Det if W X = (5.1) X-Cmp w/ Cother if W X = (5.5) X-SCmp w/ Cother if W X = (5.9) O(|C|) Product Cmp X-Det (5.2) X-Cmp w/ Cother (5.6) N/A O(|C||C |) SCmp X-Det (5.3) X-Cmp w/ Cother (5.7) X-SCmp w/ Cother (5.10) O(max(|C|, |C |)) Elem. Mapping (Sm AND Dec) AND (Add OR Det) AND (Mult OR Prod01) X-Det (5.4) X-Cmp w/ Cother (5.8) X-SCmp w/ Cother (5.11) O(|C|) Table 3: Tractability Conditions for Operations on Algebraic Circuits. Sm: Smoothness, Dec: Decomposability; X-Det(erminism), X-Cmp: X-Compatibility, X-SCmp: X-Support-Compatibility. Proof. We look at each property in turn, and show that they are maintained under the aggregation, product, and mapping operators as stated in the Table. For convenience, we reproduce the table in Table 3, with each result highlighted with a number that is referenced in the proof below. X-determinism Suppose that circuit C is X-deterministic; that is, for any sum node α = +k i=1αi C, either (i) vars(α) X = , or else (ii) supp X(αi) supp X(αj) = for all i = j. (5.1) Consider aggregating with respect to a set of variables W such that W X = . According to Algorithm 1 and the proof of Theorem 1, this produces an output circuit where each node α corresponds to some node α in the original circuit, such that pα = L w vars(α) pα and with scope vars(α) \ W . 
In particular, for sum nodes α = +k i=1αi C, either vars(α) W , in which case α is an input node (and X-determinism is not applicable), or else α = +k i=1α i is also a sum node, where each α i corresponds to αi. If (i) vars(α) X = , then vars(α ) X = also. If (ii) supp X(αi) supp X(αj) = for all i = j, we claim that supp X(α i) supp X(αi) for all i. To see this, first note that by smoothness, vars(α i) = vars(α j) = vars(α ). Suppose that xi Assign(X vars(α )) satisfies x supp X(α i). Then there exists yi Assign(vars(α ) \ X) such that pα i(xi, yi) = 0S. Since α i corresponds to αi in the original circuit, we have: M w Assign(W ) vars(α) pαi(xi, yi, wi) = pα i(x, y) = 0S This means that there must be some wi Assign(W ) vars(α) such that pαi(x, yi, wi) = 0S (since 0S is the additive identity); thus x supp X(αi). To finish the proof, note that supp X(α i) supp X(αi) and supp X(α l) supp X(αl) are disjoint unless i = l (by X-determinism of α, i.e. supp X(αi) supp X(αl) = unless i = l). Thus (ii) holds for α . In either case, we have shown that α is also X-deterministic. (5.2) Consider taking the product of two compatible circuits C, C over variables V , V , outputting a circuit C . According to Algorithm 2 and the proof of Theorem 2, every sum node α C corresponds to either the product of (a) an input or product node α C and a sum node α = +k j=1α j C , such that α = +k j=1α j or (b) two sum nodes α = +k i=1αi C and α = +k j=1α j C , such that α = +k i=1 +k j=1 α ij. Further, α and α have the same scope over the common variables V V , i.e. vars(α) (V V ) = vars(α ) (V V ). Assume that C and C are both X-deterministic; then X V V . We note that since α, α have the same scope over the common variables, they also have the same scope over X, i.e. vars(α) X = vars(α ) X. In case (a), X-determinism of α means that either (i) vars(α ) X = or (ii) supp X(α i) supp X(α j) = for all i = j. If (i), then vars(α ) X = (vars(α) vars(α )) X = also. If (ii), note that supp X(α j ) supp X(α j) for all j as a 0S = 0S for any semiring S and a S. Thus supp X(α i ) supp X(α j ) = for all i = j. Thus α is X-deterministic. In case (b), since α, α have the same scope over X, either (i) holds for both α, α , or (ii) holds for both. If (i), then vars(α ) X = (vars(α) vars(α )) X = also. If (ii), then for any i, j, consider the restricted support supp X(α ij). Noting that vars(αi) X = vars(α j) X = vars(α ij) X by smoothness, we claim that supp X(α ij) supp X(αi) supp X(α j). Suppose that x supp X(α ij). Then there exists some y vars(α ij)\X such that pα ij(x, y) = pαi(x, yvars(αi))\X) pα j(x, yvars(α j)\X) = 0S. This means that both pαi(x, yvars(αi))\X), pα j(x, yvars(α j)\X) cannot be 0S, and so x supp X(αi) and x supp X(α j) also. To finish the proof, we note that supp X(α ij) supp X(αi) supp X(α j) and suppx(α lm) supp X(αl) supp X(α m) are disjoint unless i = l, j = m (by X-determinism of α and α ). Thus α is X-deterministic by (ii). (5.3) Consider taking the product of two support-compatible circuits C, C over variables V , V , outputting a circuit C . According to Algorithm 3 and the proof of Theorem 3, every sum node α = +k i=1α i C corresponds to some sum nodes α = +k i=1αi C and α = +k i=1α i C such that α = ι(α), pα i = pαi pα i, and has scope vars(α) vars(α ). Further, α and α have the same scope over the common variables V V , i.e. vars(α) (V V ) = vars(α ) (V V ). Assume that C and C are both X-deterministic; then X V V . 
We note that since α, α have the same scope over the common variables, they also have the same scope over X, i.e. vars(α) X = vars(α ) X. Thus, either (i) holds for both α, α , or (ii) holds for both. If (i), then vars(α ) X = (vars(α) vars(α )) X = also. If (ii), then for any i, consider the restricted support supp X(α ij). Noting that vars(αi) X = vars(α j) X = vars(α i ) X by smoothness, we claim that supp X(α i ) supp X(αi) supp X(α i). Suppose that x supp X(α i ). Then there exists some y vars(α i ) \ X such that pα i (x, y) = pαi(x, yvars(αi))\X) pα i(x, yvars(α i)\X) = 0S. This means that both pαi(x, yvars(αi))\X), pα i(x, yvars(α i)\X) cannot be 0S, and so x supp X(αi) and x supp X(α i) also. To finish the proof, we note that supp X(α i ) supp X(αi) supp X(α i) and suppx(α l ) supp X(αl) supp X(α l) are disjoint unless i = l (by X-determinism of α and α ). Thus α is X-deterministic by (ii). (5.4) Consider applying an elementwise mapping τ to a circuit C, outputting a circuit C . According to Algorithm 4 and Theorem 4, every sum node α = +k i=1α i C corresponds to some node α = +k i=1αi C , such that pα = τ(pα), and further pα i = τ(pαi) and vars(α i) = vars(αi) for each i. Assume that C is X-deterministic. If (i) vars(α) X = , then vars(α )X = also. Otherwise, (ii) supp X(αi) supp X(αj) = for all i = j. We claim that supp X(α i) supp X(αi) for each i. To see this, recall that elementwise mappings satisfy τ(0S) = 0S . If x supp X(α i), then there exists y s.t. pα i(x, y) = 0S . Since pα i(x, y) = τ(pαi(x, y)), pαi(x, y) = 0S. So x supp X(αi). To finish the proof, note that suppx(α i) supp X(αi) and suppx(α l) supp X(αl) are disjoint unless i = l (by X-determinism of α). Thus α is X-deterministic by (ii). X-compatibility Recall that two smooth and decomposable circuits C, Cother over variables V , Vother are X-compatible for X V Vother if for every product node α = α1 α2 C and αother = αother,1 αother,2 Cother such that vars(α) X = vars(αother) X, it holds that vars(α1) X = vars(αother,1) X and vars(α2) X = vars(αother,2) X. (5.5) Suppose that C, Cother are X-compatible. We wish to show that Cother, C are X-compatible where C is the output circuit from Algorithm 1 that aggregates C over W , where W X = . Suppose α = α 1 α 2 C and αother = αother,1 αother,2 Cother are product nodes such that vars(α ) X = vars(αother) X. Let α = α1 α2 be the corresponding node in C such that pα = L w pα. The scope vars(α ) = vars(α) \ W ; since W X = , we have vars(α) X = vars(αother) X also. Thus, by X-compatibility of C, Cother, we have that vars(α1) X = vars(αother,1) X and vars(α2) X = vars(αother,2) X. Since vars(α 1) = vars(α1) \ W and vars(α 2) = vars(α2) \ W , this means that vars(α 1) X = vars(αother,1) X and vars(α 2) X = vars(αother,2) X. Thus C , Cother are X-compatible. (5.6) Suppose that C over V and C over V are both X-compatible with Cother. We wish to show that Cother, C are X-compatible where C is the output circuit from Algorithm 2 that computes the product of the two compatible (i.e. (V V )-compatible) circuits C, C . Suppose α = α 1 α 2 C is a product node, and αother = αother,1 αother,2 Cother such that vars(α ) X = vars(αother) X; we need to show that these decompose in the same way over X. By Algorithm 2 and the proof of Theorem 2, this was created as the product of nodes α = α1 α2 C and α = α 1 α 2 C such that vars(α ) (V V ) = vars(α) (V V ) = vars(α ) (V V ) (and similarly for their children). 
Thus by (V V )-compatibility of C, C , α and α decompose the same way over (V V ), i.e. vars(α1) (V V ) = vars(α 1) (V V ) and vars(α2) (V V ) = vars(α 2) (V V ). Since X V V (by definition of compatibility), this also holds over X, i.e. vars(α1) X = vars(α 1) X and vars(α2) X = vars(α 2) X. Now, since vars(α 1) = vars(α1) vars(α 1) and vars(α 2) = vars(α2) vars(α 2), we have that: vars(α ) X = (vars(α) X) (vars(α ) X) = vars(α) X vars(α 1) X = (vars(α1) X) (vars(α 1) X) = vars(α1) X vars(α 2) X = (vars(α2) X) (vars(α 2) X) = vars(α2) X By compatibility of C, Cother, we have that vars(αother1) X = vars(α1) X and vars(αother2) X = vars(α2) X. Thus vars(αother1) X = vars(α 1) X and vars(αother2) X = vars(α 2) X. This shows X-compatibility of C , Cother. Example 4 (Counterexample to (5.6) for Compatibility). While X-compatibility is maintained through multiplying compatible circuits, the same is not true for compatibility, due to the different variable overlaps between the circuits. For example, suppose that C over variable sets A, B, C has product nodes with scope decomposing as α = α1(A) α2(B C), and C over variable sets A, B, D has product nodes with scope decomposing as α = α 1(A) α 2(B D). Then these circuits are compatible (i.e. A B-compatible), and their product is a circuit with product nodes with scope decomposing as α = α 1(A) α 2(B C D). Now consider Cother with product nodes with scope decomposing as αother = αother(C) αother(D). This is compatible with α and α , but not with α . (5.7) This holds by the same argument as (5.6). (5.8) The circuit C obtained by applying an elementwise mapping to C does not change the scopes of any node. Thus, if C is compatible with Cother, then C is also compatible with Cother. X-support-compatibility Recall that two smooth and decomposable circuits C, Cother over variables V , Vother are X-support-compatible for X V Vother if there is an isomorphism ι between the nodes C[X] and Cother[X], such that: For any node α C[X], vars(α) X = vars(ι(α)) X; For all sum nodes α = +k i=1αi C[X], we have that supp X(αi) supp X(ι(αj)) = whenever i = j. (5.9) Suppose that C, Cother are X-support-compatible; and let ιCother,C be the isomorphism from Cother[X] to C[X]. We wish to show that Cother, C are X-support-compatible where C is the output circuit from Algorithm 1 that aggregates C over W , where W X = . We define the isomorphism as follows. Consider the set of nodes C [X]. Since W X = , these nodes are not scalars and so are not propagated away by Lines 3-4. Moreover, since the algorithm retains the node types and connectivity of the circuit, there is an isomorphism ιC,C between C[X] and C [X]. There is thus an isomorphism ιCother,C := ιC,C ιCother,C between Cother[X] and C [X]. It remains to show the two conditions. Given a node αother Cother, let us write α := ιCother,C(αother) and α := ιC,C (α). By Xsupport compatibility of Cother, C, we have that vars(αother) X = vars(α) X. By the proof of Theorem 1, we know that vars(α ) = vars(α) \ W . Since W X = , this implies that vars(αother) X = vars(α ) X as required. For the second part, suppose that these are sum nodes, i.e. αother = +k i=1αother,i, α = +k i=1αi and α = +k i=1α i. We know by X-support-compatibility that supp X(αother,i) supp X(αj) = whenever i = j. By the same argument as in (5.1), we have that supp X(α i) supp X(αi) for all i. Thus we can conclude that supp X(αother,i) supp X(α j) = whenever i = j. So Cother, C are X-support-compatible. 
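These scope-based conditions are mechanical to verify on small circuits, which can be helpful when experimenting with the constructions in this appendix. The following is a purely illustrative Python sketch, under an assumed toy node representation with explicit scopes (none of these names come from the paper), of a brute-force test of the X-compatibility condition recalled earlier; the two children of a product node are treated as unordered here.

```python
# Illustrative only: brute-force check of X-compatibility on toy circuits.
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Node:
    kind: str                                 # "input", "sum", or "product"
    scope: FrozenSet[str]
    children: Tuple["Node", ...] = ()

def product_nodes(root: Node) -> List[Node]:
    """Collect all product nodes reachable from the root."""
    out, stack, seen = [], [root], set()
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        if n.kind == "product":
            out.append(n)
        stack.extend(n.children)
    return out

def x_compatible(c1: Node, c2: Node, X: FrozenSet[str]) -> bool:
    """Any two product nodes that agree on scope intersected with X must split X
    over their children in the same way (children treated as unordered)."""
    for a in product_nodes(c1):
        for b in product_nodes(c2):
            if a.scope & X != b.scope & X:
                continue
            if {c.scope & X for c in a.children} != {c.scope & X for c in b.children}:
                return False
    return True
```

For instance, the two circuits described in Example 4 can be encoded in this toy representation to reproduce that counterexample.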
(5.10) Suppose that C over V and C over V are both X-support-compatible with Cother; write ιCother,C for the isomorphism from Cother[X] to C, and ιCother,C for the isomorphism from Cother[X] to C . We wish to show that Cother, C are X-support-compatible where C is the output circuit from Algorithm 3 that computes the product of the two support-compatible (i.e. (V V )-supportcompatible) circuits C, C . We define the isomorphism as follows. Consider the set of nodes C [X]. The algorithm for multiplying C, C makes use of the isomorphism ιC,C between C[V V ] and C [V V ], with C [V V ] retaining the same connectivity and node types; thus there is an isomorphism ιC,C from C[V V ] to C [V V ], also. Since X (V V ), this isomorphism also holds between the circuits restricted to X. Thus, we define the isomorphism ι = ιC,C ιCother,C between Cother[X] and C [X]. It remains to show the two conditions. Given a node αother Cother, let us write α := ιCother,C(αother), α = ιC,C (α) and α := ιC,C (α). By X-support-compatibility of Cother, C, we have that vars(αother) X = vars(α) X. By support-compatibility of C, C , we have that vars(α) (V V ) = vars(α ) (V V ) and so vars(α) X = vars(α ) X, and both are equal to vars(α ) X since vars(α ) = vars(α) vars(α ) (as in Theorem 3). Thus vars(αother) X = vars(α ) X as required. For the second part, suppose that these are sum nodes, i.e. αother = +k i=1αother,i, α = +k i=1αi, α = +k i=1α i and α = +k i=1α i . We know by X-support-compatibility that supp X(αother,i) supp X(αj) = whenever i = j. By the same argument as in (5.3), we have that supp X(α ) supp X(α) supp X(α ). Thus we can conclude that supp X(αother,i) supp X(α ) = . So Cother, C are X-support-compatible. (5.11) Suppose that C, Cother are X-support-compatible; and let ιCother,C be the isomorphism from Cother[X] to C[X]. We wish to show that Cother, C are X-support-compatible where C is the output circuit from Algorithm 4 that applies an elementwise mapping τ to C. Algorithm 4 maps each node α C to another node α C, keeping the node type and connectivity; this defines an isomorphism ιC,C from C[X] to C [X]. Thus we have an isomorphism ιCother,C := ιC,C ιCother,C. It remains to show the two conditions. Given a node αother Cother, let us write α := ιC0,C(αother) and α := ιC,C (α). By Xsupport-compatibility of Cother, C, we have that vars(αother) X = vars(α) X. The mapping algorithm does not change the scope of the nodes, i.e. vars(α ) = vars(α), so we have that vars(αother) X = vars(α ) X as required. For the second part, suppose that these are sum nodes, i.e. αother = +k i=1αother,i, α = +k i=1αi and α = +k i=1α i. We know by X-support-compatibility that supp X(αother,i) supp X(αj) = whenever i = j. We know by the same argument as in (5.4) that supp X(α i) supp X(αi) for all i. Thus we can conclude that supp X(αother,i) supp X(α j) = whenever i = j. So Cother, C are X-support-compatible. Theorem 7 (Hardness of 2AMC with X-firstness). 2AMC is #P-hard, even for circuits that are smooth, decomposable, deterministic, and X-first, and a constant-time elementwise mapping. Proof. Take a DNF ϕ with terms ϕ1, . . . , ϕm over variables X1, . . . , Xn. Let l = log m + 1. Let us construct another DNF ϕ with terms ϕ 1, . . . , ϕ m over variables X1 . . . , Xn and Y1, . . . , Yl+1 such that each ϕ i is the conjunction of ϕi, Yl+1 and a term over Y1, . . . , Yl encoding a binary representation of i. For example: ϕ 5 = ϕ5 Y1 Y2 Y3 Y4 Yl Yl+1. Now, efficiently manipulate ϕ to make it smooth [15]. 
The circuit ϕ′ is thus smooth, decomposable, deterministic and trivially satisfies X-firstness (since the children of every ∧-gate are literals). Take the probability semiring as S_X, take S_Y = (ℕ², +, ×, (0, 0), (1, 1)) with coordinatewise addition and multiplication, and let τ((n_1, n_2)) = n_1/n_2 (define 0/0 = 0). Also, define ω(x) = 1, and ω′(Y_{l+1} = 0) = (0, 1) and ω′(y) = (1, 1) for all other literals. Then 2AMC counts the models of ϕ, which is #P-hard [46]:

Σ_x ( Σ_{y : y_{l+1}=1} ϕ′(x, y) ) / ( Σ_y ϕ′(x, y) ) = Σ_x ϕ(x),

where we assume 0/0 = 0. The last equality follows because the circuit is deterministic (hence Σ_y ϕ′(x, y) = max_y ϕ′(x, y) ≤ 1) and logically equivalent to ϕ (i.e., for all x: ϕ(x) = 1 if and only if there exists y such that ϕ′(x, y) = 1).

Theorem 8 (Tractability Conditions for 2AMC). Every 2AMC instance is tractable in O(|C|) time for Boolean circuits that are smooth, decomposable, deterministic, X-first, and X-deterministic.

Figure 4: Illustration of a PC computing a hidden Markov model (HMM). (a) HMM graphical model; (b) circuit; (c) vtree/scope decomposition; (d) component.

Algorithm 5: 2AMC
Input: Decomposable, smooth, deterministic, X-first and X-deterministic logic circuit C over X ∪ Y; weight circuits ω_X, ω_Y; semirings S_X, S_Y; mapping function τ_{S_Y→S_X}
Output: 2AMC value (scalar in semiring S_X)
1 C_{S_Y}(X, Y) ← MAPPING(C(X, Y); ⟦·⟧_{B→S_Y})
2 C_{S_Y,ω_Y}(X, Y) ← PROD-CMP(C_{S_Y}(X, Y), ω_Y)
3 C_{S_Y,ω_Y}(X) ← AGG(C_{S_Y,ω_Y}(X, Y); Y)
4 C_{S_X,ω_Y}(X) ← MAPPING(C_{S_Y,ω_Y}(X); τ_{S_Y→S_X})
5 C_{S_X,ω_Y,ω_X}(X) ← PROD-CMP(C_{S_X,ω_Y}(X), ω_X)
Result: AGG(C_{S_X,ω_Y,ω_X}(X); X)

Proof. In Algorithm 5, we show the algorithm for 2AMC, which is simply a composition of aggregations, products, and elementwise mappings. To show tractability of 2AMC, we simply need to show that the input circuits to each of these operators satisfy the requisite tractability conditions. We start with a smooth, decomposable, deterministic, X-deterministic, and X-first circuit C(X, Y).

In line 1, we use the support mapping (Definition 6) from the Boolean semiring to S_Y; this is tractable by Corollary 1 due to determinism, and the output C_{S_Y}(X, Y) retains all the properties by Table 3.

In line 2, we take the product of C_{S_Y}(X, Y) and ω_Y(Y). ω_Y is omni-compatible, so we can apply PROD-CMP. This results in a circuit C_{S_Y,ω_Y}(X, Y) that is smooth, decomposable and X-first. ω_Y(Y) is both deterministic and X-deterministic as it has no sum nodes, so this output circuit is also deterministic and X-deterministic by (5.2).

In line 3, we aggregate C_{S_Y,ω_Y}(X, Y) over Y. The output circuit C_{S_Y,ω_Y}(X) is smooth and decomposable. It is also X-deterministic by (5.1), as Y ∩ X = ∅. Since C_{S_Y,ω_Y}(X, Y) satisfied X-firstness, each product node α = α_1 × α_2 in that circuit had at most one child (say α_1) with scope overlapping with Y. Then, in the product in the previous step, α_2 must have been produced through Lines 1-2 (otherwise it would contain some variable in Y); thus it was produced by applying ⟦·⟧_{B→S_Y} to some node in C. Thus, for any value v ∈ Assign(vars(α_2)), p_{α_2}(v) ∈ {0_{S_Y}, 1_{S_Y}}. So (Prod 0/1) is satisfied.

In line 4, we apply the mapping τ_{S_Y→S_X} to C_{S_Y,ω_Y}(X). This circuit is over X and is X-deterministic, i.e. deterministic, so (Det) holds. As shown in the previous step, it also satisfies (Prod 0/1). Thus the mapping algorithm produces the correct result, producing a smooth, decomposable and deterministic circuit C_{S_X,ω_Y}(X) as output.

In line 5, we take the product of C_{S_X,ω_Y}(X) with ω_X(X).
ωX is omni-compatible so we can apply PROD-CMP, producing a circuit CSX,ωY ,ωX that is smooth and decomposable (and also deterministic). Finally, we aggregate CSX,ωY ,ωX(X) over X, producing a scalar. Theorem 9 (Exponential Separation). Given sets of variables X = {X1, ..., Xn}, Y = {Y1, ..., Yn}, there exists a smooth, decomposable and X-deterministic circuit C of size poly(n) such that the smallest smooth, decomposable, and X-first circuit C such that p C p C has size 2Ω(n). Proof. Consider representing the distribution given by a hidden Markov model (HMM) over (hidden) variables X n = {X1, ..., Xn} and (observed) variables Y n = {Y1, ..., Yn}, as depicted in Figure 4a. Figure 4b shows a structured decomposable circuit that computes the hidden Markov model distribution, where the components Ci(j) have scope {Xi, Yi}. The corresponding vtree/scopedecomposition (with nodes notated using their scopes) is shown in Figure 4c. It can easily be checked that the circuit is X n-deterministic, and that the circuit size is linear in n. It remains to show that the smallest X n-first and X n-deterministic circuit computing the HMM distribution is exponential in size. Explicitly, we will choose a HMM such that the emission distribution is given by p(Yi|Xi) = 1Yi=Xi. Then we have that p C (x n, Y n) = p C (x n)p C (Y n|x n) = p C (x n)1Y n=x n, for any circuit C that expresses the distribution of the HMM. Consider any such circuit C . Then, let α = {α1, ..., αK} be the set of nodes with scope Y n in the circuit. We will need the following lemma: Lemma 2. For any value x n of X n, there exists constants c1, .., c K R 0 such that: p C (x n, Y n) k=1 ckpαk(Y n) (6) In other words, the output of the circuit is a linear function of the nodes with scope Y n. Proof. We show this proof by bottom-up induction (child before parent), for the set of nodes whose scope contains Y n: Input node: If the scope is Y n, then it must be some node αk α; then we take ck = 1 and ck = 0 for all k = k. Sum node: By smoothness, all the children must have the same scope (containing Y n). The sum node is then just a linear combination of its children, so the result holds by the inductive hypothesis. Product node P: Let P1, P2 be the children of P. By X n-firstness, either both children are pure (have scope entirely contained in X n or Y n), or one of them is pure, and the scope of the other one (say P1) contains Y n. In the first case, if there is exactly one node (say P1), with scope contained in Y n, then it must have scope exactly Y n. Then we have that: p P (x n, Y n) = p P1(Y n)p P2(x n vars(P2)) p P2(x n vars(P2)) here is a constant, so by the inductive hypothesis we are done. If both nodes have scope contained in Y n, then P is in α, say P = αk. Then we set ck = 1 and ck = 0 for k = k. In the second case, we have that: p P (x n, Y n) = p P1(x n vars(P1), Y n)p P2(x n vars(P2)) Here p P2(x n vars(P2)) is a constant, so by the inductive hypothesis we are done. Note that X n-firstness was crucial to avoid the case where a product has two mixed nodes (containing variables in X n and Y n) as children. For any k = 1, .., K, define vk R2n 0 to be the vector with entries vk,i = αk(i) (where we interpret i as a value of Y n). Then we have the following Corollary: Corollary 2. The set of vectors {v1, ..., v K} forms a spanning set for R2n. Proof. 
By the Lemma and the fact that C′ expresses the HMM distribution, we have that for any x^n ∈ {0, 1}^n, there exist constants c_1, ..., c_K ∈ ℝ_{≥0} such that:

p_{C′}(x^n) 1_{Y^n = x^n} = Σ_{k=1}^K c_k p_{α_k}(Y^n)

Rearranging, and writing in vector form, we have:

e_{x^n} = Σ_{k=1}^K (c_k / p_{C′}(x^n)) v_k

where e_{x^n} ∈ ℝ^{2^n}_{≥0} is the standard basis vector corresponding to the value x^n. Thus {v_1, ..., v_K} is a spanning set.

Any spanning set for ℝ^{2^n} must contain at least 2^n elements. Thus, K ≥ 2^n, and the circuit C′ must be exponentially sized.

One might attempt to remedy the situation by replacing X-firstness with X-determinism. For the general case, that however is insufficient:

Theorem 10 (Hardness of 2AMC with X-determinism). 2AMC is #P-hard even for decomposable, smooth, deterministic and X-deterministic circuits, and a constant-time elementwise transformation function.

Proof. By reduction from the counting version of number partitioning: given positive integers k_1, ..., k_n, count the number of index sets S ⊆ {1, ..., n} such that Σ_{i∈S} k_i = c. That problem is known to be #P-hard [47]. Define ϕ = ⋀_{i=1}^n (X_i ↔ Y_i). Then ϕ is a deterministic, X-deterministic, decomposable and smooth circuit.⁴ Let the inner labeling function be ω′(y_i) = k_i/c and ω′(¬y_i) = 1 (the multiplicative identity of the inner semiring). Then for a fixed configuration x of the variables X = {X_1, ..., X_n}, we have exactly one model of ϕ, whose value is ⊗_{i : x_i=1} k_i/c. If we select the inner semiring so that ⊗ is addition (e.g., the max tropical semiring or the log semiring), then the inner AMC problem returns Σ_{i : x_i=1} k_i/c, which equals 1 iff S = {i : x_i = 1} is a solution to the number partitioning instance. Now, define the outer labeling function to be ω = 1, and let the transformation function be τ(s) = 1 if s = 1 and τ(s) = 0 otherwise. Then the 2AMC problem with the probability semiring as outer semiring counts the number of solutions of the number partitioning instance.

⁴ While this circuit is not X-first, it does satisfy a property known as X-firstness modulo definability [29]; thus that property is insufficient for 2AMC even together with X-determinism.

Table 4: Tractability Conditions and Complexity for Compositional Inference Problems. We denote new results with an asterisk.

Problem                          | Tractability Conditions                 | Complexity
2AMC: PASP (Max-Credal)          | Sm, Dec, X-Det                          | O(|C|)
2AMC: PASP (Max Ent), MMAP       | Sm, Dec, Det, X-Det                     | O(|C|)
2AMC: SDP                        | Sm, Dec, Det, X-Det, X-First            | O(|C|)
Causal Inference: Backdoor       | Sm, Dec, SD, (X ∪ Z)-Det                | O(|C|²)
Causal Inference: Backdoor       | Sm, Dec, Z-Det, (X ∪ Z)-Det             | O(|C|)
Causal Inference: Frontdoor      | Sm, Dec, SD, X-Det, (X ∪ Z)-Det         | O(|C|²)
Other: MFE                       | Sm, Dec, H-Det, I⁻-Det, (H ∪ I⁻)-Det    | O(|C|)
Other: Reverse-MAP               | Sm, Dec, X-Det                          | O(|C|)

B Case Studies

In this section, we provide more details about the compositional inference problems in Table 2 (reproduced in Table 4 for convenience), and prove the tractability conditions for each (Theorem 6). For all of them, we assume that we are given a Boolean formula represented as a circuit. Such a circuit would usually come from knowledge compilation of some source language, such as Bayesian networks [9] or probabilistic logic programs [24]; our results thus show what properties the compiled circuit must have in order for a query of interest to be tractable. Note that the problems are generally computationally hard [19, 10] on the source language, which means that compact circuits satisfying the required properties do not exist in the worst case.

Theorem 6 (Tractability of Compositional Queries). The results in Table 2 hold.
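When analysing a compositional pipeline such as those below, it is often convenient to have a ground-truth value to compare against on small instances. The following Python sketch is illustrative only (hypothetical names, exponential-time enumeration over binary variables) and computes the 2AMC value of Equation (1) directly from its definition.

```python
# Illustrative only: brute-force reference evaluator for the 2AMC value.
from dataclasses import dataclass
from itertools import product
from typing import Callable, Sequence

@dataclass(frozen=True)
class Semiring:
    add: Callable
    mul: Callable
    zero: object
    one: object

def two_amc_brute_force(phi: Callable[[dict], bool],
                        X: Sequence[str], Y: Sequence[str],
                        SX: Semiring, SY: Semiring,
                        omega_X: Callable[[dict], object],
                        omega_Y: Callable[[dict], object],
                        tau: Callable[[object], object]):
    """sum_x omega_X(x) * tau( sum_y [[phi(x,y)]] * omega_Y(y) ), by enumeration."""
    outer = SX.zero
    for xs in product([0, 1], repeat=len(X)):
        x = dict(zip(X, xs))
        inner = SY.zero
        for ys in product([0, 1], repeat=len(Y)):
            y = dict(zip(Y, ys))
            if phi({**x, **y}):               # [[phi(x,y)]] is 1_SY iff phi holds
                inner = SY.add(inner, omega_Y(y))
        outer = SX.add(outer, SX.mul(omega_X(x), tau(inner)))
    return outer

# With both semirings set to the probability semiring, unit labels and the
# identity mapping, the 2AMC value is just the model count of phi.
PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
n_models = two_amc_brute_force(lambda v: bool(v["A"] or v["B"]), ["A"], ["B"],
                               PROB, PROB, lambda x: 1.0, lambda y: 1.0, lambda a: a)
assert n_models == 3.0
```

Such a reference value can then be compared against the output of a circuit pipeline like Algorithm 5 on toy instances.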
B.1 2AMC Queries

Firstly, we consider instances of 2AMC queries. Recall the general form of a 2AMC query: given a partition of the variables V = (X, Y), a Boolean function ϕ(X, Y), outer and inner semirings S_X, S_Y, labeling functions ω_Y(y) = ⊗_{Y_i ∈ Y} ω_{Y,i}(y_i) over S_Y and ω_X(x) = ⊗_{X_i ∈ X} ω_{X,i}(x_i) over S_X, and an elementwise mapping τ_{S_Y→S_X} : S_Y → S_X, the 2AMC problem is given by:

⊕_x τ_{S_Y→S_X}( ⊕_y ⟦ϕ(x, y)⟧_{B→S_Y} ⊗ ω_Y(y) ) ⊗ ω_X(x)    (1, revisited)

By Theorem 8, any 2AMC problem is tractable if ϕ is given as a smooth, decomposable, deterministic, X-deterministic, and X-first circuit C. However, in some instances, we can relax these conditions, as we show shortly.

B.1.1 Marginal MAP

In Marginal Maximum A Posteriori (MMAP) inference, we are given a Boolean function ϕ(V), an (unnormalized) fully factorized distribution p(V) = Π_i p_i(V_i), a partition X ∪ Y = V and some evidence e on E ⊆ V. The goal is to compute the probability of the maximum-probability assignment of X consistent with e:

max_x p(X = x, E = e) = max_x Σ_{y ⊨ ϕ(x, Y) ∧ e} p(x, y)

To cast it as a 2AMC problem, take the inner semiring S_Y to be the probability semiring and define the inner labeling function to assign ω_Y(Y_i) = 0 if Y_i ∈ E and Y_i is inconsistent with e, and ω_Y(Y_i) = p_i(Y_i) otherwise. The outer semiring is the (max, ×) semiring with labeling function ω_X(X_i) = 1. The elementwise mapping function τ_{S_Y→S_X}(a) = a is the identity function.

The proof of the tractability conditions follows Theorem 8, except that we note that the mapping function τ_{S_Y→S_X} from the inner to the outer semiring satisfies (Multiplicative). As such, we do not need the (Prod 0/1) circuit property, which was the reason we needed the X-firstness condition.

B.1.2 Probabilistic Answer Set Programming (PASP)

The Probabilistic Answer Set Programming (PASP) inference query takes a Boolean formula ϕ(V), a partition X ∪ Y = V, an (unnormalized) fully factorized distribution p(X) = Π_i p(X_i), and a query variable and value {Q = q}, for some Q ∈ V. The goal is to compute:

p(Q = q) = Σ_x p(x) Σ_{y ⊨ ϕ(x, Y) ∧ q} p′(y | x)

The function p′(Y | X) depends on the semantics adopted. Let mod(Y | X) := {y : ϕ(X, y)} be the set of assignments of Y such that ϕ(X, ·) is true. In the Maximum Entropy semantics (Max Ent) [6, 51, 45], one distributes the probability mass p(X) uniformly over the models of ϕ consistent with X, i.e. p′(y | X) = 1/|mod(Y | X)|. On the other hand, in the Credal semantics [33, 14] (Max-Credal), one places all probability mass p(X) on some assignment y of Y consistent with X and q. To obtain an upper bound on the query probability regardless of which y is chosen, one sets p′(y | X) := 1 for all y if there exists an assignment y ⊨ ϕ(X, Y) ∧ q, and p′(y | X) := 0 otherwise.

The 2AMC formulation of the problem uses the probability semiring as the outer semiring S_X, with labeling function ω_X(X_i) = p(X_i) for X_i ∈ X. In the (Max Ent) semantics, for the inner semiring, we take the semiring of pairs of naturals S_Y = (ℕ², +, ×, (0, 0), (1, 1)), with coordinatewise addition and multiplication. The inner labeling function sets ω_Y(Q) = (1_{Q=q}, 1), and sets ω_Y(Y_i) = (1, 1) for all other variables Y_i ∈ Y. The mapping function is defined by τ_{S_Y→S_X}((a, b)) = a/b (with 0/0 = 0). In the (Max-Credal) semantics, we simply set the inner semiring to be the Boolean semiring S_Y = B. The inner labeling function sets ω_Y(Q) = ⊤ if Q = q and ⊥ otherwise, and sets ω_Y(Y_i) = ⊤ for all other variables Y_i ∈ Y. The mapping function is defined by τ_{S_Y→S_X}(a) = ⟦a⟧_{S_Y→S_X}.

As with marginal MAP, we can see that in both cases, the mapping function τ_{S_Y→S_X} satisfies (Multiplicative), so X-firstness of the circuit is not required.
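For concreteness, the semirings and mappings of these two instantiations can be written down directly. The sketch below is illustrative only (the names are not the paper's) and checks the (Multiplicative) property numerically on sample values; the formal verification is given next.

```python
# Illustrative only: 2AMC instantiations for MMAP and PASP, and a numeric
# spot-check that the inner-to-outer mappings are multiplicative.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    add: Callable
    mul: Callable
    zero: object
    one: object

PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)     # MMAP inner semiring
MAX_TIMES = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                # MMAP outer semiring
PAIRS = Semiring(lambda p, q: (p[0] + q[0], p[1] + q[1]),              # Max Ent inner semiring:
                 lambda p, q: (p[0] * q[0], p[1] * q[1]),              # coordinatewise (N^2, +, x)
                 (0, 0), (1, 1))
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)  # Max-Credal inner semiring

def tau_mmap(a):
    return a                                   # identity: probability -> max-times

def tau_maxent(pair):
    a, b = pair
    return 0.0 if b == 0 else a / b            # tau((a, b)) = a/b, with 0/0 := 0

def tau_credal(a):
    return 1.0 if a else 0.0                   # support mapping: Boolean -> probability

# (Multiplicative): tau(p (x) q) = tau(p) * tau(q), spot-checked on sample values.
p, q = (2, 3), (1, 4)
assert abs(tau_maxent(PAIRS.mul(p, q)) - tau_maxent(p) * tau_maxent(q)) < 1e-12
assert tau_credal(BOOL.mul(True, False)) == tau_credal(True) * tau_credal(False)
assert tau_mmap(PROB.mul(0.3, 0.5)) == tau_mmap(0.3) * tau_mmap(0.5)
```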
In particular, for (Max Ent) we have τSY SX((a, b) (c, d)) = τSY SX((a c, b d)) = a c d = τSY SX(a, b) τSY SX(c, d) = τSY SX(a, b) τSY SX(c, d) (this holds also if (a, b) = (0, 0) and/or (c, d) = (0, 0)). Meanwhile, for (Max-Credal) we have τSY SX(a b) = τSY SX(a b) = Ja b KSY SX = Ja KSY SX Jb KSY SX = τSY SX(a) τSY SX(b) = τSY SX(a) τSY SX(b). For the (Max-Credal) semantics, we note additionally since SY is just the Boolean semiring, we do not need determinism in Line 1 of Algorithm 5. So the only conditions required are smoothness, decomposability, and X-determinism. B.1.3 Same-Decision Probability In the Same Decision Probability (SDP) query [37], we are given a Boolean formula ϕ(V ), a fully factorized distribution p(V ) = Q i p(Vi), a partition X, {Y } of V , a query {Y = y}, some evidence e on a subset E X of variables and a threshold value T (0, 1]. The goal is to compute a confidence measure on some threshold-based classification made with the underlying probabilistic model: X x p(x|e)1p(Y =y|x,e) T , To cast this as a 2AMC instance, we use the inner semiring S = (R2 0, +, , (0, 0), (1, 1)), with coordinate-wise addition and multiplication. The inner labeling function assigns ωY (Y ) = (p(Y )1Y =y, p(Y )). The outer semiring is the probability semiring and the mapping τSY SX from inner to outer semirings is τSY SX((a, b)) = [[a b T]]. Last, the outer labeling function assigns ωX(Xi) = 1Xi|=e if Xi E, and ωX(Xi) = p(Xi) otherwise. Unlike marginal MAP and PASP inference, there is no special structure in SDP that allows us to relax the general tractability conditions for 2AMC. However, it is still a 2AMC instance, and we have the tractability conditions from Theorem 8. In particular this justifies the use of X-constrained sentential decision diagrams for this problem. B.2 Causal Inference In Section 4.2, we discussed computing causal interventional distributions. In particular, in the backdoor and frontdoor cases, we had the following formulae: p(y|do(x)) = X z p(z)p(y|x, z), (2) p(y|do(x))= X x p(x )p(y|x , z). (3) B.2.1 Backdoor query The backdoor query can be written as a compositional query as follows: BACKDOOR(p; x, y) := M x,y p(v) p(v) τ 1 M y p(v) . (7) where V = (X, Y , Z), and τ 1(a) = a 1 if a = 0 0 if a = 0. Note that τ 1 satisfies (Multiplicative), and so for this mapping to be tractable we just need the circuit it is applied to to be deterministic. Assume that p(V ) is given as a smooth, structured decomposable, and (X Z)-deterministic circuit (over the probabilistic semiring). We now show that this query is tractable, by showing that each operator in the composition is tractable. For readability, we label each circuit constructed with the function that it represents ( boxed ). p(X, Z) C1(X, Z) := AGG(C, Y ) is tractable by smoothness and decomposability. By (5.1) in Table 3, since Y (X Z) = , C1 is (X Z)-deterministic (i.e. deterministic). 1 p(X,Z) C2(X, Z) := MAPPING(C1, τ 1) is tractable since C1 is deterministic. p(Y |X, Z) C3(X, Y , Z) := PROD-SCMP(C(X, Y , Z), C2(X, Z)). C is (X Z)- support-compatible with itself as it is (X Z)-deterministic = C is also (X Z)- support-compatible with C1 by (5.9) = C is also (X Z)-support-compatible with C2 by (5.11). As C and C2 share variables (X Z), this means they are support-compatible. Thus this product is tractable in linear time. p(Z) C4(Z) := AGG(C, X Y ) is tractable by smoothness and decomposability. p(Z)p(Y |X, Z) C5(X, Y , Z) := PROD-CMP(C4, C3). 
C is V -compatible with itself (structured decomposable) = C is Z-compatible with itself by Proposition 1 = C is also Z-compatible with C4 by (5.5) = C4 is Z-compatible with C1 by (5.5) = C4 is Z-compatible with C2 by (5.8) = C4 is Z-compatible with C3 by (5.6). Since C4 and C3 share variables Z, this means they are compatible and so this product is tractable in quadratic time. z p(z)p(Y |X, z) C6(X, Y ) = AGG(C5, Z) is tractable by smoothness and decomposability. Thus, we have recovered the tractability conditions derived by [49], with the same complexity of O(|C|2) (induced by the compatible product to construct C5). However, we also have an alternative tractability condition. Suppose that C were additionally Z-deterministic, but not necessarily structured decomposable. Then we could replace the derivation of C5 above with the following: p(Z)p(Y |X, Z) C5(X, Y , Z) := PROD-SCMP(C4, C3). C is Z-support-compatible with itself as it is Z-deterministic = C is also Z-support-compatible with C4 by (5.9) = C4 is Z-support-compatible with C1 by (5.9) = C4 is Z-compatible with C2 by (5.11) = C4 is Z-compatible with C3 by (5.10). Since C4 and C3 share variables Z, this means they are compatible and so this product is tractable in linear time. In this case, the overall complexity is also reduced to O(|C|). B.2.2 Frontdoor query Now, consider the frontdoor case. In this case, we have the following compositional query: FRONTDOOR(p; x, y, z) = M y p(v) τ 1 M y,z p(v) BACKDOOR(p; z, y) (8) Assume that p(V ) is given as a smooth, structured decomposable, X-deterministic, and (X Z)- deterministic circuit (over the probabilistic semiring). We continue the analysis from the backdoor case: p(X) C7(X) := AGG(C, Y Z) is tractable by smoothness and decomposability. By (5.1) in Table 3, since (Y Z) X = , C7 is X-deterministic (i.e. deterministic). 1 p(X) C8(X) := MAPPING(C7, τ 1) is tractable since C7 is deterministic. p(Z|X) C9(X, Z) := PROD-SCMP(C8, C1). C is X-support-compatible with itself as it is X-deterministic = C is X-support-compatible with C1 by (5.9) = C1 is X-support-compatible with C7 by (5.9) = C1 is X-support-compatible with C8 by (5.11). Thus this product is tractable in linear time. x p(x)p(Y |x, Z) C10(Y , Z). This is just like C6, but with variables X and Z swapped. Thus it is tractable for a smooth, X-deterministic and (X Z)-deterministic circuit in linear time. x p(x )p(Y |x , Z) C11(X, Y , Z) := PROD-CMP(C9, C10). We can chain applications of (5.5), (5.7) and (5.8) in a similar way to the other steps to show that C9, C10 are Z-compatible (i.e. compatible), so this product is tractable in quadratic time. x p(x )p(Y |x , z) C12(X, Y ) := AGG(C11; Z). This is tractable by smoothness and decomposability. Thus, this algorithm has complexity O(|C|2), as opposed to the O(|C|3) complexity algorithm in [49]. The key difference is that we exploit support compatibility for a linear time product when constructing C10. B.3 Other Problems B.3.1 Most Frugal Explanation In [31], the most frugal explanation (MFE) query was introduced. Given a partition of variables V into (H, I+, I , E), some evidence e Assign(E), and a probability distribution p(V ), the MFE query asks for the following: i 1[h arg max h p(h , i , e)] (9) In words, we want the explanation (assignment to H) that is the most probable for the most number of assignments to I , when I+ is marginalized out. We can rewrite as follows: i 1 p(h, i , e) maxh p(h , i , e) = 1 (10) This can be written as a compositional query as follows. 
h τS S (p(h , i , e)) p(h, i , e) where S is the probability semiring, S is the (max, )-semiring, S is ([0, 1], +, , 0, 1) (i.e. the probability semiring with domain [0, 1]), and S is the counting semiring (N, +, , 0, 1), and the mapping functions are defined as follows: τS S (a) = a τS S (a) = a τ 1(a) = a 1 if a = 0 0 if a = 0 τS S (a) = 1a=1 τS S (a) = a Suppose we are given a probabilistic circuit representing p(H, I , e). While this query appears extremely intimidating at first glance, we note that the only operators we need to consider are the mappings and single product. Note that all of these mappings satisfy (Multiplicative) (τS S because the domain of S is [0, 1] so τS S (a b) = 1 iff a = b = 1); thus the mappings are tractable if the input circuits are deterministic. By checking the scopes of the inputs to each mapping, we can see that (H I )-determinism, I -determinism, and H-determinism suffices. This also enables tractability of the product in linear time by support compatibility. No tractability conditions for exact inference for this query were previously known. While the motivation behind the MFE query is as a means of approximating marginal MAP, and so this exact algorithm is not practically useful in this case, this example illustrates the power of the compositional framework to tackle even very complex queries. B.3.2 Reverse MAP Recently, in [27], the reverse-MAP query was introduced, defined by: max X p(e1|X, e2) (12) where the variables are partitioned as V = (E1, E2, X, H). In our compositional framework, this can be written as: M h p(e1, x, e2, h) τ 1 M h,e 1 p(e 1, x, e2, h) (13) Here, the mapping τ 1 is tractable if the circuit for p is X-deterministic. Since p is X-deterministic, it is X-support-compatible with itself; chaining this with (5.9) and (5.11) in Table 3, the inputs to the product are X-compatible; since they have scope X, this means the product is tractable by support-compatibility. The resulting circuit remains X-deterministic (i.e. deterministic as the scope is X), which means that the mapping τP M from the probability to (max, ) semiring is tractable. Thus, this query is tractable for smooth, decomposable and X-deterministic circuits in linear time (same as derived by the authors). Neur IPS Paper Checklist Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: The abstract and introduction reference the results in the rest of the paper. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Yes, in the conclusion we discuss the fact that while our results provide simple and general sufficient tractability conditions, these are not necessary conditions. 
Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: Yes, assumptions are laid out in the statements of the results, and proofs are provided in the supplementary material. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [NA] Justification: This paper does not include experiments. Guidelines: The answer NA means that the paper does not include experiments. 
If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: This paper does not include experiments requiring code. Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. 
If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [NA] Justification: This paper does not contain experiments. Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [NA] Justification: This paper does not contain experiments. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: This paper does not contain experiments. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. 
9. Code of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: The paper does conform to the NeurIPS Code of Ethics.
Guidelines:
- The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
- If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
- The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [No]
Justification: This paper presents foundational research that is not tied to any particular application/deployment; there are no particular societal impacts that we feel need to be highlighted.
Guidelines:
- The answer NA means that there is no societal impact of the work performed.
- If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
- Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
- The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.
- The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
- If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: The paper does not propose to release any data or models that would pose such a risk.
Guidelines:
- The answer NA means that the paper poses no such risks.
- Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
- Datasets that have been scraped from the Internet could pose safety risks.
- The authors should describe how they avoided releasing unsafe images.
- We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [NA]
Justification: This paper does not use existing assets.
Guidelines:
- The answer NA means that the paper does not use existing assets.
- The authors should cite the original paper that produced the code package or dataset.
- The authors should state which version of the asset is used and, if possible, include a URL.
- The name of the license (e.g., CC-BY 4.0) should be included for each asset.
- For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
- If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
- For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
- If this information is not available online, the authors are encouraged to reach out to the asset's creators.

13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: This paper does not release new assets.
Guidelines:
- The answer NA means that the paper does not release new assets.
- Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
- The paper should discuss whether and how consent was obtained from people whose asset is used.
- At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: This paper does not involve crowdsourcing nor research with human subjects.
Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
- According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: This paper does not involve crowdsourcing nor research with human subjects.
Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
- We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
- For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.