# on_relaxing_determinism_in_arithmetic_circuits__b7d6c3b7.pdf On Relaxing Determinism in Arithmetic Circuits Arthur Choi 1 Adnan Darwiche 1 The past decade has seen a significant interest in learning tractable probabilistic representations. Arithmetic circuits (ACs) were among the first proposed tractable representations, with some subsequent representations being instances of ACs with weaker or stronger properties. In this paper, we provide a formal basis under which variants on ACs can be compared, and where the precise roles and semantics of their various properties can be made more transparent. This allows us to place some recent developments on ACs in a clearer perspective and to also derive new results for ACs. This includes an exponential separation between ACs with and without determinism; completeness and incompleteness results; and tractability results (or lack thereof) when computing most probable explanations (MPEs). 1. Introduction Arithmetic circuits (ACs) were introduced into the AI literature more than fifteen years ago as a tractable representation of probability distributions. The original work on these circuits proposed the compilation of such circuits from Bayesian networks, while identifying and assuming three circuit properties, called determinism, decomposability and smoothness (Darwiche, 2001a;b; 2003). Since then, the literature on using arithmetic circuits for probabilistic reasoning has seen two key developments. The first is the proposal made by (Lowd & Domingos, 2008) to learn these circuits directly from data instead of just compiling them from models therefore creating two distinct construction modes for these circuits. The second development, reported by (Poon & Domingos, 2011), amounted to proposing a class of arithmetic circuits that does not satisfy determinism, under the name of sum-product networks (SPNs). 1University of California, Los Angeles, California, USA. Correspondence to: Arthur Choi , Adnan Darwiche . Proceedings of the 34 th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s). An examination of the literature surrounding arithmetic circuits and their variants suggests that the implications of relaxing determinism are not very well understood, even leading to conflicting claims in some cases. The treatment of smoothness has also not been very consistent as far as its necessity for certain operations on arithmetic circuits, and the complexity of enforcing it. Our goal in this paper is to address some of these issues by providing a systematic and formal treatment of arithmetic circuits, while focusing on the precise roles and semantics of their various properties and the implications of relaxing determinism. We make several contributions in this paper. We start by reconstructing the original definition of arithmetic circuits given in (Darwiche, 2002; 2003), while assuming that these circuits represent arbitrary factors, instead of just distributions induced by Bayesian networks (a particular class of factors). We then provide definitions for decomposability, smoothness and determinism in the context of this reconstruction, while isolating precisely the role that each one of these properties play. Some of what we report on this has already been observed in the literature, but we provide alternate or more formal proofs for the sake of a systematic and inclusive treatment. We also derive new results. The first of these is a separation theorem showing that relaxing determinism can lead to exponentially smaller arithmetic circuits, while preserving the ability of these circuits to compute marginals in linear time. This begs the question of whether anything is lost from relaxing determinism. On this front, we highlight a finding that has already been reported in the literature and introduce new ones. In particular, we provide an expanded proof for the observation that relaxing determinism deprives arithmetic circuits from the ability to compute MPE in linear time. We also add a new result showing that enforcing decomposability has the power of solving MPE, even though the MPE query is not tractable for decomposable circuits. Moreover, we show that relaxing determinism leads to a type of incompleteness that we call parametric incompleteness, with important implications on the compilability of circuits from models. Our final contribution is a formal correctness proof of the linear-time MPE algorithm, originally proposed by (Chan & Darwiche, 2006), but with respect to the reconstructed definition of arithmetic circuits satisfying decomposability, determinism and smoothness. On Relaxing Determinism in Arithmetic Circuits This paper is structured as follows. We reconstruct the definition of arithmetic circuits as given by (Darwiche, 2002; 2003) in Section 2, but with respect to factors instead of distributions (of Bayesian networks). We then provide a new treatment of decomposability and smoothness in Section 3, followed by a new treatment of determinism in Section 4. We finally focus on the relaxation of determinism in Section 5, where we provide a new set of results and insights. An extended version of this paper includes some proofs that have been omitted here for space limitations.1 2. Arithmetic Circuits Capital letters (X) denote variables and lower-case letters (x) denote their values. Bold capital letters (X) denote sets of variables and bold lower-case letters (x) denote their instantiations. Value x is compatible with instantiation y iff y assigns value x to X or does not assign any value to X. Definition 1 A factor f(X) over variables X maps each instantiation x of variables X into a non-negative number f(x). The factor is a distribution when P x f(x) = 1. The classical, tabular representation of a factor f(X) is clearly exponential in the number of variables X, yet it allows one to answer key probabilistic queries efficiently. The interest here is in a more compact representation of these factors, using arithmetic circuits, while preserving the ability to answer some of these queries efficiently. We focus on the following queries, all with respect to a factor f(X), with its variables X partitioned into sets Y and Z: Computing the value of factor f(X) at instantiation y, defined as f(y) = P z f(y, z) and called a marginal in this paper. This corresponds to the probability of evidence y when the factor is a distribution. Computing the MPE of factor f(X), defined as argmaxx f(x). This corresponds to the most likely instantiation when the factor is a distribution. Computing the MAP over variables Y, defined as argmaxy P z f(y, z). This is the most likely state of variables Y when the factor is a distribution. For Bayesian networks (interpreted as factors), the decision variants of the MPE, marginals, and MAP problems are, respectively, NP-complete (Shimony, 1994), PPcomplete (Roth, 1996), and NPPP-complete (Park & Darwiche, 2004); see also (Darwiche, 2009). Hence, computing marginals is more difficult than computing MPE an observation that will be quite relevant later. 1Available at http://reasoning.cs.ucla.edu. A f1 1 1 0 2 (a) f1(A) A B f2 1 1 3 1 0 4 0 1 5 0 0 6 (b) f2(A, B) A B f 1 1 3 1 0 4 0 1 10 0 0 12 (c) f = f1f2 Figure 1. Two factors and their product. We also need to define the projection of factor f(X) on variables Y as the factor g(Y) with g(y) = P z f(y, z). This projection will be denoted by P We next define an arithmetic circuit over discrete variables X, as utilized in (Darwiche, 2002; 2003) to represent distributions, except that we will utilize it to represent factors. A key observation here is that the circuit inputs are not variables X, but indicator variables that are derived from the values of variables X (Darwiche, 2002; 2003). Definition 2 An arithmetic circuit AC(X) over variables X is a rooted DAG whose internal nodes are labeled with + or and whose leaf nodes are labeled with either indicator variables λx or non-negative parameters θ. Here, x is the value of some variable X in X. The value of the circuit at instantiation y, where Y X, is denoted AC(y) and obtained by assigning indicators λx the value 1 if x is compatible with instantiation y and 0 otherwise, then evaluating the circuit in the standard way (in linear time). The following definition makes a distinction that has not been made explicit in the literature as far as we know, but is critical for a clear semantics of arithmetic circuits. Definition 3 The circuit AC(X) computes factor f(X) iff AC(x) = f(x) for each instantiation x of variables X. The circuit computes the factor marginals iff AC(y) = f(y) for each instantiation y of every Y X. The notion of computes a factor constrains the value of an arithmetic circuit under a strict subset of its inputs (i.e., those corresponding to complete instantiations). However, the notion of computes marginals constrains its value under every input. Hence, two arithmetic circuits that represent distinct functions (over indicator variables) may still compute the same factor. Consider an arithmetic circuit that computes a factor f(X, . . .), where X has values x and x. Replacing λx + λ x with 1 in this circuit preserves its ability to compute the factor since λx+λ x = 1 for every input that is relevant to computing the factor. This replacement, however, will change the function represented by the circuit and its ability to compute the factor marginals. Corollary 1 An arithmetic circuit that computes the marginals of a factor also computes the factor. However, On Relaxing Determinism in Arithmetic Circuits an arithmetic circuit that computes a factor does not necessarily compute its marginals. Consider the following arithmetic circuit which computes the factor in Figure 1(c): [λa + 2λ a][3λaλb + 4λaλ b + 5λ aλb + 6λ aλ b]. This circuit does not compute the factor marginals. Moreover, this circuit is the product of two circuits, one computing factor f1, the other computing factor f2, as in Figure 1. To get further insights into the notion of computing marginals, we appeal to the notion of a network polynomial (Darwiche, 2003), while lifting it to factors. Definition 4 The polynomial of factor f(X) is defined as: X where x x means that the value x of variable X X is compatible with instantiation x of variables X. The polynomial of factor f(A, B) in Figure 1(c) is 3λaλb+ 4λaλ b+10λ aλb+12λ aλ b. The polynomial of factor f(X) corresponds to the simplest circuit that computes the factor marginals. It is a two-level circuit though, which has an exponential size. The interest, however, is in deeper circuits that may not be exponentially sized. We later discuss circuit properties that allows us to achieve this, sometimes. One can construct an arithmetic circuit that computes the distribution of a Bayesian network or the partition function of a Markov network in time and space that are linear in the size of these models. Each of these models correspond to a set of factors f1, . . . , fn, with the model representing the product of these factors. We can construct a circuit that computes each factor as given in Definition 4, then simply combine these circuits using a multiplication node. The result will compute the factor f = f1 . . . fn but it will not necessarily compute its marginals. We next show that if we enforce the properties of decomposability and smoothness on such a circuit, while maintaining its ability to compute the factor f, the resulting circuit will also compute the factor marginals. Therefore, these two properties turn the circuit into a tractable representation of the factor, allowing one to compute marginals by simply evaluating the circuit as in Definition 2 (in time linear in the circuit size). 3. Decomposability and Smoothness The property of decomposability (Darwiche, 2001b) was used for tractable probabilistic reasoning in (Darwiche, 2003) by compiling Bayesian networks into arithmetic circuits that are guaranteed to be decomposable. This property was also enforced by the algorithm proposed in (Lowd & Domingos, 2008) for learning arithmetic circuits. Figure 2. An AC that computes factor f = f1 f2 from Figure 1, where an ab-subcircuit (left) and an ab-subcircuit (right) are highlighted. This circuit is deterministic, decomposable and smooth. Definition 5 (Decomposability) Let n be a node in an arithmetic circuit AC(X). The variables of n, denoted vars(n), are all variables X X with some indicator λx appearing at or under node n. An arithmetic circuit is decomposable iff every pair of children c1 and c2 of a -node satisfies vars(c1) vars(c2) = . The property of smoothness (Darwiche, 2001b) was also used for probabilistic reasoning in (Darwiche, 2003) by compiling circuits that are smooth. It was also enforced by the learning algorithm of (Lowd & Domingos, 2008). This property was later called completeness in the works on SPNs, initially in (Poon & Domingos, 2011). Definition 6 (Smoothness) An arithmetic circuit AC(X) is smooth iff (1) it contains at least one indicator for each variable in X, and (2) for every child c of +-node n, we have vars(n) = vars(c). Consider a factor f(X), where variable X is binary and f(x) = f( x) = θ. A circuit that consists of the single parameter θ will compute this factor but is not smooth. The circuit θλx + θλ x computes this factor and is smooth. Consider a variable X with values x1, . . . , xm. Multiplying a circuit node by λx1 +. . .+λxm preserves the circuit s ability to compute a given factor since λx1 +. . .+λxm = 1 under any circuit input that is relevant to this computation. One can use this technique to ensure the smoothness of any circuit, while incurring only a polynomial overhead.2 Hence, contrary to decomposability and determinism, enforcing smoothness is not difficult computationally, yet it is necessary for an arithmetic circuit to compute marginals as we discuss later. We also state the following observation, used extensively in inductive proofs that we utilize later. Lemma 1 Consider a decomposable and smooth arithmetic circuit AC(X) and define Xn = vars(n) for each circuit node n. Each arithmetic circuit AC(Xn) rooted at node n is also decomposable and smooth. 2(Darwiche, 2001a) shows how to smooth an NNF circuit in O(mn) space and time, where n is the size of the circuit and m is the number of variables (the method can be adapted to ACs). On Relaxing Determinism in Arithmetic Circuits A main insight in this paper is the use of subcircuits, first introduced in (Chan & Darwiche, 2006) for a different purpose. They were also adopted in (Dennis & Ventura, 2015; Zhao et al., 2016) to motivate SPN learning algorithms. Definition 7 (Subcircuits) A complete subcircuit α of an arithmetic circuit is obtained by traversing the circuit topdown, while choosing one child of each visited +-node and all children of each visited -node. The term of α is the set of values x for which indicator λx appears in α. The coefficient of α is the product of all parameters in α. The circuit 2 λx+1 λ x+3 λx computes factor f(X) with f(x) = 5 and f( x) = 1. It is decomposable, smooth and has three complete subcircuits, with (x, 2), ( x, 1) and (x, 3) as their term/coefficient pairs. Note that two subcircuits may have the same term but different coefficients. The following lemma and its proof reveal the precise roles of decomposability and smoothness. Given decomposability, the term of a complete subcircuit will not contain conflicting values for any variable. Given smoothness, the term must contain a value for each variable. Lemma 2 Let α be a complete subcircuit of an arithmetic circuit AC(X) that is decomposable and smooth. The term of subcircuit α must be an instantiation of variables X. Proof Given smoothness, every variable X X must have at least one indicator λx in α (no variables are dropped from the circuit if we keep only a single child of a +-node). Given decomposability, each variable X X must have at most one indicator λx in α. Hence, α will contain exactly one indicator for each variable X X. The term of α is therefore an instantiation x of variable X. A complete subcircuit with term x will be called an x-subcircuit. Figure 2 depicts an ab-subcircuit (in red) and an ab-subcircuit (in blue). In a decomposable and smooth circuit AC(X), every complete subcircuit is an x-subcircuit for some instantiation x of variables X. The circuit can then be treated as a collection of x-subcircuits (multiple subcircuits can have the same term). Our proofs utilize this implication heavily. Definition 8 An input Λ for arithmetic circuit AC(X) assigns a value in {0, 1} to each circuit indicator λx. An instantiation x is compatible with a circuit input Λ, denoted x Λ, iff the input sets λx = 1 when x sets X =x. A circuit input can be viewed as the set of instantiations compatible with it. Consider the binary variables X = {A, B, C} for an example. The circuit input Λ = {λa = 1, λ a = 0, λb = 1, λ b = 0, λc = 1, λ c = 0} has a single compatible instantiation abc. The input Λ = {λa = 0, λ a = 0, λb = 1, λ b = 0, λc = 1, λ c = 0} has no compatible instantiations, and the circuit input: Λ = {λa = 1, λ a = 1, λb = 1, λ b = 0, λc = 1, λ c = 0} has two compatible instantiations abc and abc. In this latter case, evaluating the circuit at instantiation bc, as discussed in Definition 2, leads to evaluating it under input Λ. The following lemma brings us one step away from showing why decomposability and smoothness force an AC that computes a factor to also compute the factor marginals. Lemma 3 Given a decomposable and smooth arithmetic circuit, let θ1, . . . , θm be the coefficients of complete subcircuits whose terms are compatible with circuit input Λ. The circuit will evaluate to θ1 + . . . + θm under input Λ. Proof Given Lemma 1, we will use induction on the circuit structure. The base case is a leaf circuit node (indicator or parameter). The lemma holds trivially in this case. The inductive case is when n is an internal circuit node with children c1, . . . , ck. Suppose the lemma hold for these children. If n is a -node, the lemma holds for n by decomposability (the complete subcircuits of n correspond to the Cartesian product of complete subcircuits for its children). If n is a +-node, the lemma holds for n since the complete subcircuits of n correspond to the union of complete subcircuits of its children. Corollary 2 Let θ1, . . . , θm be the coefficients of xsubcircuits in a decomposable and smooth arithmetic circuit AC(X). We then have AC(x) = θ1 + . . . + θm. We are now ready for the key result we are after. Theorem 1 Consider an arithmetic circuit AC(X) that computes factor f(X). If the circuit is decomposable and smooth, then it also computes the marginals of factor f(X). Proof Consider an instantiation y of some variables Y X, let x1, . . . , xm be all instantiations of variables X that are compatible with y, and let Λ be the circuit input corresponding to these instantiations. Let θi be the sum of coefficients of all xi-subcircuits. Since the circuit computes factor f(X), we have AC(xi) = f(xi) and, hence, f(xi) = θi by Corollary 2. By Lemma 3, the circuit evaluates to θ1+. . .+θm under input Λ, which is f(x1)+. . .+f(xm) = f(y). Hence, the circuit computes the factor marginals. This theorem justifies the standard algorithm for computing marginals on arithmetic circuits, in linear time, as proposed in (Darwiche, 2003) that is, by simply evaluating the circuit as in Definition 2. In that work, however, the property On Relaxing Determinism in Arithmetic Circuits of determinism was also assumed (discussed in the next section). Determinism is not necessary though for computing marginals as initially observed in (Poon & Domingos, 2011).3 Our proof above uses different tools than those used in (Poon & Domingos, 2011) and is set in a more general context. Moreover, these tools and associated lemmas turn out to be essential for the rest of our treatment on the role of determinism, which we discuss in the next section. As for the necessity of smoothness, consider the circuits AC1(A, B) = λaλb + λ a and AC2(A, B) = λaλb + λ a(λb + λ b). Both circuits are decomposable and compute the same factor: f(a, b) = 1, f(a, b) = 0, f( a, b) = 1 and f( a, b) = 1. However, circuit AC1 is not smooth while AC2 is smooth. Only AC2 is guaranteed to compute factor marginals by Theorem 1. For example, evaluating AC1 at instantiation a gives AC1( a) = 0 1 + 1 = 1 = f( a) = 2 according to Definition 2, while AC2( a) = f( a).4 Before we discuss determinism, we note that decomposability and determinism were exploited recently in tractable, propositional reasoning within a semi-ring setting; initially in (Kimmig et al., 2012; 2016), then followed by (Friesen & Domingos, 2016). 4. Determinism The property of determinism (Darwiche, 2001a) was employed for probabilistic reasoning in (Darwiche, 2003) by compiling Bayesian networks into arithmetic circuits that are deterministic. It was also enforced by the algorithm of (Lowd & Domingos, 2008) for learning arithmetic circuits. The property was later called selectivity in the works on SPNs, initially in (Peharz et al., 2014). Using the terminology of our current formulation, the original definition of determinism would amount to this: An arithmetic circuit is deterministic iff the terms of each two of its complete subcircuits are conflicting. We will adopt a weaker definition, which allows conflicting subcircuits as long as at most one of them has a non-zero coefficient. Definition 9 (Determinism) An arithmetic circuit AC(X) is deterministic iff each +-node has at most one non-zero input when the circuit is evaluated under any instantiation x of variables X. 3(Poon & Domingos, 2011) introduced sum-product networks (SPNs), which are equivalent to decomposable and smooth ACs. More precisely, each can be converted into the other in linear time and space (Rooshenas & Lowd, 2014). The conversion is straightforward and amounts to adjusting for graphical notation. 4Theorem 1 of (Friesen & Domingos, 2016) implies that factor marginals can be computed in time linear in the size of an arithmetic circuit, when the circuit is decomposable but not smooth. This complexity is not justified (but assumed) in the proof of the theorem. In fact, we are unaware of any justified algorithm that attains this complexity without smoothness; see also Footnote 2. As mentioned earlier, the original proposal for using arithmetic circuits as a tractable representation of probability distributions (Darwiche, 2003) ensured that these circuits are deterministic, in addition to being decomposable and smooth. Moreover, several methods were proposed in (Darwiche, 2003) for compiling Bayesian networks into ACs with these properties. One of these methods ensures that the size of the AC is proportional to the size of a jointree for the network. Another method yields circuits that can sometimes be exponentially smaller, and is implemented in the publicly available ace system (Chavira & Darwiche, 2008); see also (Darwiche et al., 2008) for an empirical evaluation of this system in one of the UAI inference evaluations. While determinism is not needed to compute factor marginals, it is needed for the correctness of the linear-time MPE algorithm of (Chan & Darwiche, 2006). This was missed in some earlier works (Poon & Domingos, 2011), which used this algorithm on non-deterministic ACs (i.e., SPNs) without realizing that it is no longer correct. This oversight was noticed in later works (Peharz et al., 2016; Mau a & de Campos, 2017).5 We next reveal the reason why computing MPE without determinism is hard. Later in the section, we reveal the reason why the MPE algorithm of (Chan & Darwiche, 2006) fails without determinism. The key observation is this. Consider variables X which are partitioned into Y and Z. Given a decomposable and smooth arithmetic circuit AC(X) that computes factor f(X), one can obtain in linear time a decomposable and smooth AC(Y) that computes the projection P Z f(X). This is achieved by simply setting all indicators λz to 1; see (Darwiche, 2001b) for the root of this observation. Moreover, an MPE for the projection P Z f(X) is a MAP for the original factor f(X). Hence, a polytime MPE algorithm implies a polytime MAP algorithm on decomposable and smooth ACs. We know, however, that Naive Bayes networks have linear-size decomposable and smooth ACs, while MAP is hard on these networks (de Campos, 2011). Therefore, the existence of a polytime MPE algorithm on such circuits will contradict standard complexity assumptions. These observations can be abstracted into the following lemma, which succinctly and intuitively explains why MPE is not tractable on decomposable and smooth circuits. Lemma 4 A circuit representation that supports projection and MPE in polytime also supports MAP in polytime. Note that deterministic, decomposable and smooth ACs do not support projection in polytime so the above argument 5(Peharz et al., 2016) proposed a polytime algorithm that converts an SPN into one that is deterministic and smooth (called an augmented SPN), but this new SPN computes a different factor than the one computed by the original SPN. Hence, its MPEs cannot be generally converted into MPEs of the original SPN. On Relaxing Determinism in Arithmetic Circuits does not apply in this case (setting indicators λz to 1 will generally destroy determinism). More formally, let AC(X) be a decomposable and smooth arithmetic circuit that computes a factor f(X). For a given value k, consider the decision problems: D-MPE-AC: Is there an instantiation x where AC(x) > k? D-MAP-AC: Is there an instantiation y where P z AC(y, z) > k? (X is partitioned into Y and Z). We now have the following result, whose proof expands the one given in (Peharz et al., 2016) for SPNs based on the above observations; see also (Mau a & de Campos, 2017) for an in-depth discussion of MPE hardness on SPNs. Theorem 2 The problem D-MPE-AC is NP-complete. Proof Given instantiation x and value k, we can test whether f(x) > k by evaluating the circuit AC in time linear in the size of the circuit. Hence, the problem is in NP. To show NP-hardness, we reduce the (decision) problem of computing (partial) MAP in a naive Bayes network, which is NP-complete (de Campos, 2011), to MPE in a decomposable and smooth arithmetic circuit. Suppose we have a naive Bayes network with a root node X0 and leaf nodes X, and inducing a distribution Pr(X0, X). We can compile this network into a polysize decomposable, deterministic and smooth arithmetic circuit AC0(X0, X) that computes Pr(X0, X), e.g., as in (Chavira & Darwiche, 2008). We can sum-out variable X0 in the circuit AC0(X0, X) by setting the indicators of X0 to one. The resulting circuit AC1(X) is decomposable and smooth, and computes the (marginals of) factor P X0 Pr(X0, X). For a given value k, there exists an instantiation x where AC1(x) > k iff there exists an instantiation x where P x0 Pr(x0, x) > k, which is an NP-complete problem (de Campos, 2011). Corollary 3 The problem D-MAP-AC is NP-complete. The following lemma reveals the precise role of determinism, which stands behind the correctness of the linear-time MPE algorithm of (Chan & Darwiche, 2006). It basically shows a one-to-one correspondence between the non-zero rows of the factor computed by a circuit and the complete subcircuits with non-zero coefficients. Lemma 5 Consider an arithmetic circuit AC(X) that computes factor f(X) and is deterministic, decomposable and smooth (hence, can be viewed as a collection of xsubcircuits). For each instantiation x, we have: (a) If the circuit has two distinct x-subcircuits, one of them must have a zero coefficient. (b) If f(x) > 0, the circuit contains a unique x-subcircuit with coefficient f(x). Proof To prove (a), suppose the circuit contains two distinct x-subcircuits α1 and α2 that have non-zero coefficients. We will now establish a contradiction. Since α1 and α2 are distinct, each αi must include a distinct child ci of some +-node in the circuit. If we evaluate the circuit at instantiation x, both c1 and c2 will have non-zero values. Hence, the circuit cannot be deterministic, which is a contradiction. To prove (b), suppose f(x) > 0 and let α1, . . . , αm be all x-subcircuits. At most one αi can have a non-zero coefficient by (a). Since the circuit computes the factor, it must evaluate to f(x) under instantiation x. Hence, exactly one αi has f(x) as its coefficient. Lemma 5 allows us to prove the correctness of the MPE algorithm given by (Chan & Darwiche, 2006) under the more general setting we have in this paper. This original algorithm is based on converting a deterministic, decomposable and smooth AC that computes a distribution Pr(X) into a maximizer circuit. Evaluating this circuit under evidence y, as in Definition 2, gives the MPE value argmaxx y Pr(x). An arithmetic circuit AC(X) is converted into a maximizer circuit, denoted ACmax(X), by replacing every +-node with a max-node. The complete subcircuits of ACmax are defined as in Definition 7, but where exactly one child of each visited max-node is selected. Theorem 3 Let AC(X) be a deterministic, decomposable and smooth arithmetic circuit that computes a factor f(X) and let ACmax(X) be its maximizer circuit. Then ACmax(y) = maxx y f(x) for Y X. Proof By Lemma 5, there is a one-to-one correspondence between the non-zero rows of factor f(X) and xsubcircuits with non-zero coefficients. Let θ1, . . . , θm be the coefficients of x-subcircuits, where x is compatible with y. Hence, max{θ1, . . . , θm} = maxx y f(x). That is, the MPE value is a coefficient of some x-subcircuit call it an MPE-subcircuit. We will think of the algorithm as composing an MPE-subcircuit in addition to computing its coefficient and show that ACmax(y) = max{θ1, . . . , θm} by induction on the circuit structure (see Lemma 1). The base case trivially holds for leaf circuit nodes (indicators and parameters). Assume n is an internal circuit node and the above equality holds for its children c1, . . . , ck having MPE-subcircuits αi and coefficient ηi. If n is a -node, then by decomposability, an MPE-subcircuit for n can be found by joining α1, . . . , αk with η1 . . . ηk as its coefficient. If n is a max-node, then by determinism, an MPEsubcircuit for n can be found from the αi with the largest ηi with maxk i=1 ηi as its coefficient. Once a maximizer circuit is evaluated to θ, one can identify On Relaxing Determinism in Arithmetic Circuits an x-subcircuit that has θ as its coefficient, with x being an MPE instantiation; see (Chan & Darwiche, 2006).6 Without determinism, a circuit may have multiple xsubcircuits for a given x, each having a non-zero coefficient. By Corollary 2, the value of x, AC(x) = f(x), is the sum of these coefficients. An MPE algorithm that does not perform this sum cannot be correct.7 Before we further discuss the impact of relaxing determinism, we point to a new class of arithmetic circuits, the Probabilistic Sentential Decision Diagram (PSDD) (Kisa et al., 2014), which imposes stronger versions of decomposability and determinism. This enables the multiplication of two ACs in polytime, which is otherwise hard under the standard versions of these properties (Shen et al., 2016). 5. The Impact of Relaxing Determinism We now consider two new implications of relaxing determinism, one positive and one negative. We also address an apparent paradox: How could a representation (decomposable and smooth ACs) allow the computation of marginals easily (a PP-complete problem), yet not allow the computation of MPE easily (an NP-complete problem)? Recall that the complexity class NP is included in the class PP. The positive implication is that relaxing determinism can lead to exponentially smaller arithmetic circuits. Theorem 4 (Separation) There is a family of factors fn(Xn) where (1) there exists a decomposable and smooth arithmetic circuit ACn(Xn) that computes the marginals of fn, with a size polynomial in n; (2) every deterministic, decomposable and smooth circuit that computes the marginals of factor fn must have a size exponential in n. Proof (Bova et al., 2016) identifies a family of Boolean functions (the Sauerhoff functions) Sn that have decomposable NNFs (DNNFs) with sizes polynomial in n, but where their deterministic DNNFs (d-DNNFs) must have sizes exponential in n. Previously known separations were conditional on the polynomial hierarchy not collapsing (Darwiche & Marquis, 2002), but (Bova et al., 2016) does not make such an assumption (and neither do we). 6Smoothness is not strictly needed for this algorithm, except that it ensures that a full variable instantiation is returned. 7This MPE algorithm was used on selective SPNs (equivalent to deterministic and decomposable ACs) in (Peharz et al., 2016). It was also adapted to algebraic model counting (AMC) in (Kimmig et al., 2016) and to Sum-Product Functions (SPFs) in (Friesen & Domingos, 2016). Determinism was not required in (Kimmig et al., 2016). This is sound since AMC problems correspond to Boolean circuits where the weight of an instantiation is a product of literal weights, and is independent of how many times the instantiation appears as a subcircuit. Let gn denote a polysize DNNF for function Sn and let ACn denote the polysize decomposable and smooth arithmetic circuit obtained by: replacing the inputs of gn with the corresponding indicator variables, replacing conjunctions and disjunctions by products and sums, respectively, then smoothing if necessary. The resulting arithmetic circuit ACn has a positive value (possibly > 1) on input x iff the original function Sn evaluates to true. We now show that if fn is the factor computed by arithmetic circuit ACn, then any deterministic, decomposable and smooth AC that computes fn must have an exponential size. Let AC n be such a circuit. Consider the d-DNNF g n obtained by: replacing the indicator variables with the corresponding literals of variables X, replacing products and sums with conjunctions and disjunctions, respectively, and by replacing positive parameters with true and zero parameters with false. Note that g n(x) is true iff AC n(x) > 0, i.e., a complete subcircuit for g n evaluates to true iff the corresponding subcircuit for AC n has a positive coefficient. Hence, if AC n had a sub-exponential size, then function Sn would have a sub-exponentially sized d-DNNF, which we know does not exist (Bova et al., 2016). We now get to a newly identified, negative implication of relaxing determinism. It pertains to compiling ACs from probabilistic models and requires the following notion. Definition 10 A set of parameters Θ is complete for factor f(X) iff for every instantiation x, the parameter f(x) can be expressed as a product of parameters in Θ. The parameters of a Bayesian network are complete for its distribution; those of a Markov network are complete for its partition function; and the parameters Θ = {0, 1} are complete for Boolean factors: f(X) with f(x) {0, 1}. We will write AC(X, Θ) to denote an arithmetic circuit whose parameters are in Θ. The following theorem states a key property which is lost when relaxing determinism. Theorem 5 (Completeness) Consider factor f(X) and complete parameters Θ. There must exist an arithmetic circuit AC(X, Θ) that computes the factor marginals and is deterministic, decomposable and smooth. Proof Consider the factor polynomial P x x λx in Definition 4 and replace each f(x) by a product of parameters from Θ. The result can be represented by an AC that is deterministic, decomposable and smooth. The standard methods for compiling Bayesian networks, and graphical models more generally, into arithmetic circuits do indeed limit the circuit parameters to those appearing in the model factors. Hence, the compilation process amounts only to finding a (small) circuit structure since On Relaxing Determinism in Arithmetic Circuits the circuit parameters are already predetermined. As mentioned earlier, these methods can yield relatively small circuits for some graphical models with very high treewidth (Chavira & Darwiche, 2008; Darwiche et al., 2008). The above property is lost if one insists on constructing arithmetic circuits that are decomposable and smooth, but not deterministic. This is shown in the following theorem, which refers to dead circuit nodes: ones that appear only in complete subcircuits that have zero coefficients.8 Theorem 6 (Parametric Incompleteness) Let f(X) be a Boolean factor and Θ = {0, 1} (Θ is complete for f). A circuit AC(X, Θ) cannot compute f(X) if it is decomposable, smooth and free of dead nodes, but not deterministic. Proof If the AC has no +-node, then it is vacuously deterministic. Otherwise, it has a +-node. Since the circuit is not deterministic, there is a +-node n that violates determinism. This node is included in some complete subcircuit with a non-zero coefficient (otherwise, the node n is dead). Since node n violates determinism, we can find two distinct x-subcircuits, with non-zero coefficients, that differ by the branch selected at node n. Since the circuit computes factor f(X), Lemma 3 implies that the coefficients of these x-subcircuits must add up to f(x) = 1. There must then exist an x-subcircuit whose coefficient is in (0, 1), exclusive, i.e., the circuit has a parameter not in {0, 1}. In other words, if a decomposable and smooth circuit AC(X, {0, 1}) computes the marginals of a Boolean factor, it must also be non-trivially deterministic. This result has a major implication on compiling probabilistic graphical models into ACs that are not deterministic. That is, one cannot generally restrict the circuit parameters to those appearing in the model, otherwise a circuit may not exist. Therefore, while relaxing determinism can lead to exponentially smaller circuits, finding these circuits is now more involved as it may require searching for parameters. This demands new techniques as all techniques we are aware of for compiling models into deterministic circuits assume that the circuit parameters come from model parameters. Our last contribution relates to the following apparent paradox. Suppose we have a set of factors f1(X1), . . . , fn(Xn), representing a probabilistic graphical model that has a corresponding joint factor f = f1 fn. Consider now the following decision problems, over such probabilistic graphical models, which correspond to computing the MPE and marginals: 8Dead nodes can be replaced by the constant zero without changing the factor computed by the circuit. One can relax determinism trivially by adding dead nodes, but that does not help as far as obtaining smaller circuits. D-MPE: Is there an instantiation x where f(x) > k? x f(x) > k? D-MPE is NP-complete, whereas D-PR is PP-complete. Moreover, the complexity class PP includes NP. Yet, decomposable and smooth ACs allow one to compute marginals in linear time, while computing MPE, which is no harder, is hard on these circuits! To resolve this apparent paradox, one must observe the sometimes subtle distinction between a representation and the computation needed to produce that representation. The representation here is decomposable and smooth ACs, and the computation is the algorithm used to compile a graphical model into this representation. While the representation itself does not facilitate the computation of MPE, the compilation algorithm must be sufficient to compute the MPE query without additional complexity (beyond polynomial). To formalize this, we need the following lemma. Lemma 6 D-MPE can be reduced to D-PR. We now have the following result, which implies that a polytime compilation algorithm for decomposable and smooth ACs can be used as a sub-routine for a polytime algorithm for computing MPEs (which we do not expect to exist, under typical complexity theoretic assumptions). Theorem 7 Consider an algorithm Ξ that takes a set of factors f1(X1), . . . , fn(Xn) as input and returns a decomposable and smooth arithmetic circuit that computes the marginals of factor f = f1 fn. Let s be the size of input factors and let O(t(s)) be the time complexity of algorithm Ξ. One can compute the MPE of factor f in time O(t(poly(s))). These findings highlight an interesting property of decomposable and smooth ACs. They store the results to an exponential number of marginal queries, where each result can be retrieved by a simple traversal of the circuit. Yet, they do not store the answers to MPE queries, even though these queries are easier. The implication of this can be seen from two angles, depending on whether these circuits are compiled from models or learned from data. In the former case, the compilation algorithm is readily available to answer MPE queries, but at the cost of invoking this algorithm for each query. In the latter case, however, answering MPE queries remains a challenge. Hence, learning circuits that are not deterministic needs to yield an additional benefit that compensates for this loss in tractability. This could be a simpler learning algorithm; a smaller learned circuit; or a learned circuit whose factor is superior from a statistical learning viewpoint. On Relaxing Determinism in Arithmetic Circuits Acknowledgements This work has been partially supported by NSF grant #IIS1514253, ONR grant #N00014-15-1-2339 and DARPA XAI grant #N66001-17-2-4032. We thank Yoo Jung Choi, Umut Oztok, Yujia Shen, and Guy Van den Broeck for comments and discussions on this paper. Bova, Simone, Capelli, Florent, Mengel, Stefan, and Slivovsky, Friedrich. Knowledge compilation meets communication complexity. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1008 1014, 2016. Chan, Hei and Darwiche, Adnan. On the robustness of most probable explanations. In Proceedings of the 22nd Conference in Uncertainty in Artificial Intelligence (UAI), 2006. Chavira, Mark and Darwiche, Adnan. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6 7): 772 799, April 2008. Darwiche, Adnan. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1-2):11 34, 2001a. Darwiche, Adnan. Decomposable negation normal form. Journal of the ACM, 48(4):608 647, 2001b. Darwiche, Adnan. A logical approach to factoring belief networks. In Proceedings of KR, pp. 409 420, 2002. Darwiche, Adnan. A differential approach to inference in Bayesian networks. J. ACM, 50(3):280 305, 2003. Darwiche, Adnan. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009. Darwiche, Adnan and Marquis, Pierre. A knowledge compilation map. JAIR, 17:229 264, 2002. Darwiche, Adnan, Dechter, Rina, Choi, Arthur, Gogate, Vibhav, and Otten, Lars. Results from the probabilistic inference evaluation of UAI-08. 2008. de Campos, Cassio Polpo. New complexity results for MAP in Bayesian networks. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pp. 2100 2106, 2011. Dennis, Aaron W. and Ventura, Dan. Greedy structure search for sum-product networks. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), pp. 932 938, 2015. Friesen, Abram L. and Domingos, Pedro M. The sum-product theorem: A foundation for learning tractable models. In Proceedings of the 33nd International Conference on Machine Learning (ICML), pp. 1909 1918, 2016. Kimmig, Angelika, Van den Broeck, Guy, and De Raedt, Luc. Algebraic model counting. Co RR, abs/1211.4475, 2012. URL http://arxiv.org/abs/1211.4475. Kimmig, Angelika, Van den Broeck, Guy, and De Raedt, Luc. Algebraic model counting. International Journal of Applied Logic, November 2016. Kisa, Doga, Van den Broeck, Guy, Choi, Arthur, and Darwiche, Adnan. Probabilistic sentential decision diagrams. In KR, 2014. Lowd, Daniel and Domingos, Pedro M. Learning arithmetic circuits. In Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence (UAI), pp. 383 392, 2008. Mau a, Denis Deratani and de Campos, Cassio Polpo. Approximation complexity of maximum a posteriori inference in sumproduct networks. Co RR, abs/1703.06045, 2017. Park, James D. and Darwiche, Adnan. Complexity results and approximation strategies for MAP explanations. J. Artif. Intell. Res. (JAIR), 21:101 133, 2004. Peharz, Robert, Gens, Robert, and Domingos, Pedro M. Learning selective sum-product networks. In LTPM workshop, 2014. Peharz, Robert, Gens, Robert, Pernkopf, Franz, and Domingos, Pedro. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016. Poon, Hoifung and Domingos, Pedro M. Sum-product networks: A new deep architecture. In UAI, pp. 337 346, 2011. Rooshenas, Amirmohammad and Lowd, Daniel. Learning sumproduct networks with direct and indirect variable interactions. In ICML, pp. 710 718, 2014. Roth, Dan. On the hardness of approximate reasoning. Artif. Intell., 82(1-2):273 302, 1996. Shen, Yujia, Choi, Arthur, and Darwiche, Adnan. Tractable operations for arithmetic circuits of probabilistic models. In Advances in Neural Information Processing Systems 29 (NIPS), 2016. Shimony, Solomon Eyal. Finding MAPs for belief networks is NP-hard. Artif. Intell., 68(2):399 410, 1994. Zhao, Han, Poupart, Pascal, and Gordon, Geoffrey J. A unified approach for learning the parameters of sum-product networks. In Advances in Neural Information Processing Systems 29 (NIPS), pp. 433 441, 2016.