# Submodel Decomposition Bounds for Influence Diagrams

Junkyu Lee^{1,2}, Radu Marinescu^{2}, Rina Dechter^{1}
^{1} University of California, Irvine; ^{2} IBM Research
junkyul@uci.edu, radu.marinescu@ie.ibm.com, dechter@ics.uci.edu

Influence diagrams (IDs) are graphical models for representing and reasoning with sequential decision-making problems under uncertainty. Limited memory influence diagrams (LIMIDs) model a decision-maker (DM) who forgets the history in the course of making a sequence of decisions. The standard inference task in IDs and LIMIDs is to compute the maximum expected utility (MEU), which is one of the most challenging tasks in graphical models. We present a model decomposition framework for both IDs and LIMIDs, called submodel decomposition, that generates a tree of single-stage decision problems through a tree clustering scheme. We also develop a valuation algebra over the submodels that leads to a hierarchical message passing algorithm, which propagates conditional expected utility functions over a submodel-tree as external messages. We show that the overall complexity is bounded by the maximum treewidth over the submodels, as is common in graphical model algorithms. Finally, we present a new method for computing upper bounds over a submodel-tree by first exponentiating the utility functions, yielding a standard probabilistic graphical model as an upper bound, and then applying standard variational upper bounds for marginal MAP inference, yielding tighter upper bounds compared with state-of-the-art bounding schemes for the MEU task.

## Introduction

Influence diagrams (IDs) (Howard and Matheson 1981), which extend Bayesian networks (BNs) with decision variables and utility functions, are discrete graphical models for single-agent sequential decision making under uncertainty. The standard inference task in IDs is finding the maximum expected utility (MEU) and a set of optimal policy functions that jointly achieve the MEU under given constraints on the available information at each decision. IDs assume perfect recall, namely that the agent or decision maker (DM) is no-forgetting: the sequence of previous decisions and immediate observations is fully available when making a decision. The naive evaluation of the MEU leads to algorithms that are space and time exponential in the length of the history (Howard and Matheson 1981); hence, earlier works focused on variable elimination algorithms with complexity bounded by the treewidth (Bodlaender 1988), derived from tree decomposition methods tied to the algorithms for solving IDs. A valuation algebra (Kohlas and Shenoy 2000) for IDs was introduced by (Jensen, Jensen, and Dittmer 1994) together with a variable elimination algorithm over a corresponding constrained junction tree. More recently, (Dechter 2000) presented an improved variable elimination algorithm with lower width parameters, while (Pralet, Schiex, and Verfaillie 2006) developed a multi-operator cluster DAG (MC-DAG) decomposition to capture the lowest possible treewidth of IDs. Despite this progress, the MEU task remains clearly intractable, so current research focuses on developing bounding schemes.
One type of bounds uses the principle of information relaxation, relaxing the constrained ordering of observations and decision variables and allowing more observations to be available to the DM (Nilsson and Hohle 2001; Jensen and Gatti 2010; Yuan, Wu, and Hansen 2010). Other approaches are based on translation and exploit the equivalence between the MEU and marginal MAP (MMAP) tasks (Mauá 2016), thus applying bounding schemes for MMAP to the translated MEU (Liu and Ihler 2012; Marinescu, Dechter, and Ihler 2014; Mauá and Cozman 2016). Finally, state-of-the-art decomposition bounds (Lee, Ihler, and Dechter 2018; Lee et al. 2019) apply decomposition schemes directly to IDs using methods such as weighted mini-bucket (WMB) (Liu and Ihler 2011; Marinescu et al. 2018) and generalized dual decomposition (GDD) (Ping, Liu, and Ihler 2015). Yet, the quality of those decomposition bounds quickly degrades due to the non-convex optimization routines that must be applied.

Relaxing the perfect recall condition leads to limited memory influence diagrams (LIMIDs), which pose additional challenges to the MEU task. A limited memory agent only has access to partial knowledge of the sequence of decisions and observations, which means that exact algorithms for LIMIDs must jointly optimize multiple policy functions by exhaustive enumeration (Mauá, de Campos, and Zaffalon 2012). Iterative local policy update algorithms can reduce the complexity by locally improving a subset of policy functions. However, local policy search algorithms assume that the local policy evaluation is easy (Lauritzen and Nilsson 2001; Detwarasiti and Shachter 2005; Mauá and Cozman 2016). To the best of our knowledge, the only upper bounds available in the literature for LIMIDs are the theoretical error bounds presented in (Mauá and Cozman 2016).

Another line of research related to the decomposition of IDs and LIMIDs studies the structure of DAGs to identify the random variables relevant for optimizing a single policy function. Specifically, (Nielsen and Jensen 1999) introduced d-separation criteria for identifying the relevant utility functions and observed variables for a decision, and (Nielsen 2001) presented a decomposition of IDs. For LIMIDs, (Zhang and Poole 1992; Lauritzen and Nilsson 2001) defined soluble LIMIDs, to which the variable elimination algorithms for IDs can be applied. (Detwarasiti and Shachter 2005) extended the d-separation criteria to identify relevant random variables in LIMIDs while updating a single policy function.

Contribution: In this paper, we take a fresh look at IDs and LIMIDs by viewing the underlying DAG as a causal diagram and provide a new unified tree decomposition method, called submodel-tree decomposition (STD). For IDs, each node in the submodel-tree is a single-time-step decision problem for evaluating value functions, and it is equivalent to MC-DAGs (Pralet, Schiex, and Verfaillie 2006). For LIMIDs, each node in the submodel-tree captures multiple decision variables that must be optimized jointly. The STD offers a new tree clustering framework using d-separation criteria over the causal DAG, and it facilitates hierarchical message passing algorithms whose complexity can be characterized by a graph width parameter applicable to both IDs and LIMIDs.
Since exact message passing algorithms over the STD are intractable, we develop convex variational decomposition bounds over the STD, which allow us to reuse MMAP bounds via Jensen's inequality applied to log moment generating functions. Our experimental results show that the new bounding scheme generates bounds that are orders of magnitude tighter than those obtained by current methods for IDs and LIMIDs.

## Graphical Models

A graphical model M is a tuple ⟨X, D, F⟩, where X = {X_1, ..., X_n} is a set of discrete random variables, D = {D_1, ..., D_n} are their finite domains of values, and F = {F_1(X_1), ..., F_r(X_r)} is a set of non-negative functions, each function F_k ∈ F being defined over a subset of variables X_k ⊆ X called its scope and denoted by sc(F_k). The graphical model encodes a joint probability distribution P(X) = (1/Z) Π_{F_k∈F} F_k(X_k), where Z is the normalization constant or partition function. The primal graph of M is an undirected graph in which nodes are associated with variables and two nodes are connected if they appear together in the scope of some function. The join-tree decomposition obtained by triangulating the primal graph offers exact message passing algorithms with worst-case complexity characterized by the treewidth of the primal graph (Dechter 2013). The join-graph decomposition further decomposes a join-tree into a possibly loopy join-graph G_J = ⟨C, S⟩ that satisfies the running intersection property. Each cluster C ∈ C is assigned a subset of variables X_C ⊆ X and functions F_C ⊆ F such that sc(F_k) ⊆ X_C for all F_k ∈ F_C, and a separator S_ij ∈ S between clusters C_i and C_j is labeled by X_{S_ij} = X_{C_i} ∩ X_{C_j} (Mateescu et al. 2010).

The MMAP inference task eliminates variables either by maximization or summation, and it can be formulated as computing the weighted log partition function,

Φ_W(θ) = log Σ^W_X exp(θ(X)),   (1)

where θ(X) = Σ_{F_k∈F} log F_k(X_k), and the powered-sum elimination operator Σ^w_X f(X) = [Σ_X f(X)^{1/w}]^w generalizes the maximization (w = 0⁺) and the summation (w = 1) with weights 0 ≤ w ≤ 1 for a variable X (Liu and Ihler 2011). When combined with variational inference (Wainwright and Jordan 2008), a decomposition scheme provides variational upper bounds derived from a constrained optimization problem in which the decomposition graph encodes the constraints. The generalized dual decomposition bound (GDD) (Ping, Liu, and Ihler 2015) provides convex upper bounds that are obtained by duplicating all the shared variables between cluster nodes in the join-graph,

Φ_W(θ) ≤ Σ_{C_i∈C} log Σ^{W_{C_i}}_{X_{C_i}} exp[ θ_{C_i}(X_{C_i}) + Σ_{S_ij∈S} δ_{S_ij}(X_{S_ij}) ],   (2)

where θ_{C_i}(X_{C_i}) = Σ_{F_k∈F_{C_i}} log F_k(X_k), the weights W_{C_i} = {W^{C_i}_k | X_k ∈ X_{C_i}} are assigned to each variable X_k such that W_k = Σ_{C_i∈C} W^{C_i}_k, and the δ_{S_ij}(X_{S_ij}) are cost-shifting functions over the separators that cancel each other, δ_{S_ij}(X_{S_ij}) = −δ_{S_ji}(X_{S_ij}). GDD offers a message passing algorithm that tightens the bound by optimizing the weights W_{C_i} and the cost-shifting functions δ_{C_i,C_j}(X_{C_i} ∩ X_{C_j}) between clusters. Empirical studies often report that a simpler weighted mini-bucket with moment matching (WMB-MM) algorithm, which optimizes the cost-shifting functions only once over a decomposition graph with larger cluster sizes, turns out to be effective on difficult problems (Liu and Ihler 2011; Marinescu, Dechter, and Ihler 2014).

## Influence Diagrams

An ID M is a tuple ⟨X, D, P, U, O⟩ consisting of a set of discrete chance variables X = {X_1, ..., X_n}, a set of discrete decision variables
D = {D_1, ..., D_m}, a set of conditional probability functions P = {P_1, ..., P_n} associated with the chance variables, a set of additive utility functions U = {U_1, ..., U_r}, and a set of precedence relations O that we will specify shortly. We denote the finite domain of a variable X ∈ X ∪ D by Ω_X, and Ω_S is the Cartesian product of the individual domains Ω_X of the variables X in a set S ⊆ (X ∪ D), i.e., Ω_S = ×_{X∈S} Ω_X. Given a node X in a DAG G, we denote the set of parents, children, ancestors, descendants, and the family of X by pa(X), ch(X), an(X), de(X), and fa(X) = pa(X) ∪ {X}, respectively. Clearly, these definitions extend to a set of nodes X by taking the union of the individual sets.

[Figure 1. Example of an ID and LIMID: (a) a perfect recall ID; (b) a limited memory ID.]

Figure 1 illustrates an example of IDs with and without the perfect recall condition, which we adapted from the Pigs example (Lauritzen and Nilsson 2001). We associate the nodes in a DAG G with the chance variables drawn as circles, the decision variables drawn as squares, and the utility functions drawn as diamonds. The directed arcs toward a chance node X_i in G specify the scope of the conditional probability function P_i(X_i|pa(X_i)). The parents of a value node U_i define the scope of the utility or value function U_i(pa(U_i)). The directed arcs toward decision variables are called informational arcs, and they specify the set of precedence relations O_C = {pa(D_k) ≺ {D_k} | D_k ∈ D}, where pa(D_k) is the set of variables observed immediately before decision D_k. Under perfect recall, the DM knows the decision order O_D = {D_1 ≺ D_2 ≺ ... ≺ D_T}, which constitutes the history h_t = h_{t−1} ∪ pa(D_t), with h_0 = ∅, for making decision D_t at the t-th time step. In the case of LIMIDs, the DM does not have access to O_D, and each decision D_k is made solely based on the immediate observations pa(D_k). The solid arcs directed toward decision variables in Figure 1a and Figure 1b specify O_C. The dotted directed arcs in Figure 1a are the explicit informational arcs specifying the history, which are omitted in conventional IDs. By explicitly including these informational arcs in the DAG of an ID, we can express h_t simply as h_t = pa(D_k) for D_k ∈ D_t. In this paper, we encode the perfect recall condition of IDs by explicit informational arcs, and "IDs" will refer to IDs both with and without the perfect recall condition.

A deterministic policy function Δ_k(D_k|pa(D_k)) for a decision variable D_k, in both IDs and LIMIDs, is a mapping Δ_k : Ω_{pa(D_k)} ↦ Ω_{D_k}, and the set of all policy functions Δ = {Δ_k(D_k|pa(D_k)) | D_k ∈ D} is called a strategy. The MEU task is to compute the MEU and the global maximum (optimal) strategy Δ*. Namely,

MEU := max_Δ E_Δ[ Σ_{U_i∈U} U_i ]   (3)
     = max_Δ Σ_{X∪D} Π_{P_i∈P} P_i Π_{Δ_k∈Δ} Δ_k Σ_{U_k∈U} U_k,   (4)
Δ* = argmax_Δ E_Δ[ Σ_{U_i∈U} U_i ].   (5)
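As a concrete illustration of Eqs. (3)-(5), the following minimal sketch evaluates the MEU by brute force on a hypothetical one-decision model (a binary chance variable C observed by a binary decision D); all numbers and names are illustrative placeholders, not taken from the paper.

```python
import itertools
import numpy as np

# Toy model: chance variable C ~ P(C); decision D with pa(D) = {C};
# a single utility U(C, D). All values here are illustrative only.
P_C = np.array([0.3, 0.7])            # P(C)
U = np.array([[1.0, 0.0],             # U(C=0, D=0), U(C=0, D=1)
              [0.0, 2.0]])            # U(C=1, D=0), U(C=1, D=1)

best_value, best_policy = -np.inf, None
# A deterministic policy maps each observation c to a decision d, so the
# strategy space enumerated in Eq. (3) has |Omega_D|^|Omega_C| elements.
for policy in itertools.product(range(2), repeat=2):
    value = sum(P_C[c] * U[c, policy[c]] for c in range(2))  # E_Delta[U]
    if value > best_value:
        best_value, best_policy = value, policy               # Eq. (5)

print(best_value, best_policy)   # the MEU and an optimal strategy Delta*
```

This enumeration is exponential in the number of observed configurations, which is precisely what the decomposition developed below avoids.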
## Submodel-Tree Decomposition

A DM controls the decision variables in an ID M. Deliberate choices by the DM based on some strategy Δ result in the expected utility E_Δ[Σ_{U_i∈U} U_i]. Therefore, we view the DAG G of M as a causal diagram and make explicit use of graph concepts from causal graphs (Pearl 2009). In this section, we identify the subset of M relevant for computing the local MEU in a partial evaluation of M, denoted by LMEU_M(D′, U′), which is a local maximum conditional expected utility obtained by maximizing the sum of expected utilities over a subset of utility functions U′ ⊆ U with respect to the policy functions of a subset of decision variables D′ ⊆ D. We call the subset of M relevant for computing LMEU_M(D′, U′) the submodel M′(D′, U′), and show that it can be identified by the backdoor and front door criteria (Pearl 2009).

Next, we define graph-based combination and marginalization operations over the submodels and develop a valuation algebra over the system of submodels in an ID. It is known that a valuation algebra satisfies the axioms of local computation (Shenoy and Shafer 1990), so we can obtain a cluster tree decomposition, which we call submodel-tree decomposition, that assigns submodels to each cluster. We subsequently develop a hierarchical message passing algorithm over the submodel-tree. The complexity of exact algorithms over this tree can be characterized by the maximum width of the individual submodel clusters, which we call the submodel width w_s.

### Submodels in the MEU Task

Given M = ⟨X, D, P, U, O⟩, we define a partially specified strategy for a subset of decision variables D′ ⊆ D by (Δ_{D′} ∪ Δ_{D\D′}), where Δ_{D′} = {Δ(D_k|pa(D_k)) | D_k ∈ D′} and Δ_{D\D′} is a set of arbitrary policy functions for D\D′. The partial evaluation of M over (D′, U′) computes a local maximum conditional expected utility and finds the local maximum strategy (Δ*_{D′} ∪ Δ_{D\D′}) as follows.

Definition 1. The local maximum conditional expected utility evaluated on D′ ⊆ D and U′ ⊆ U in M is defined by

LMEU_M(D′, U′) := max_{Δ_{D′}} E[ Σ_{U_i∈U′} U_i | pa(D′) ],   (6)

where the conditional expectation is taken over the conditional probability P(X ∪ D)/P(pa(D′)), and the local maximum conditional expected utility is obtained by a local maximum strategy (Δ*_{D′} ∪ Δ_{D\D′}).

Definition 2. The set of relevant variables for evaluating LMEU_M(D′, U′) with an arbitrary fixed Δ_{D\D′} is the subset of observed variables RELO(D′, U′) ⊆ pa(D′) and the subset of hidden variables RELH(D′, U′) ⊆ (X ∪ D) \ pa(D′) such that Eq. (6) can be evaluated by

LMEU_M(D′, U′) = max_{Δ_{D′}} E[ Σ_{U_i∈U′} U_i | RELO(D′, U′) ],   (7)

where the conditional expectation is taken over the relevant subset of variables, P(RELH(D′, U′) | RELO(D′, U′)). The local maximum strategy remains the same under the relevant observed variables RELO(D′, U′).

Definition 3. A submodel M′(D′, U′) of an ID M relevant to LMEU_M(D′, U′) is a tuple ⟨X′, D′, P′, U′, O′⟩ such that X′ = X ∩ (RELO(D′, U′) ∪ RELH(D′, U′)), P′ = {P_k(X_k|pa(X_k)) | X_k ∈ X′, sc(P_k) ⊆ (X′ ∪ D′)}, and O′ = RELO(D′, U′) ≺ D′ ≺ RELH(D′, U′).

The relevant observed and hidden variables can be found by graph separation criteria. In the following, the backdoor set for a pair (X, Y) is denoted by BD(X, Y), and the front door set is denoted by FD(X, Y) (Pearl 2009).

Proposition 1. For computing LMEU_M(D′, U′), the set of relevant observed variables RELO(D′, U′) is a subset of pa(D′) that forms BD(D′, U′), and the set of relevant hidden variables RELH(D′, U′) is a subset of (X ∪ D) \ pa(D′), where each variable in RELH(D′, U′) is a member of some FD(pa(D′), ch(U′)), where ch(U′) is an auxiliary child node for the value nodes U′.

We say that a relevant set is minimal if we cannot reduce the set further. Intuitively, RELO(D′, U′) is a subset of observed variables that shields all the influence between the decision nodes D′ and the value nodes U′, and RELH(D′, U′) is a subset of hidden variables that mediates the influence between the decision nodes D′ and the value nodes U′. Next, we introduce utility random variables and show lemmas for proving the above proposition.

Definition 4 (Utility Random Variables). Given a utility function U_k ∈ U in an ID M, we define W_k as a random variable with its domain defined by

Ω_{W_k} = range(U_k) s.t. U_k : sc(U_k) → range(U_k),   (8)

where range(U_k) is the set of outcomes of the utility function U_k. The probability associated with W_k can be defined by

P(W_k = w_k) = Σ_{pa(U_k)} P(pa(U_k)) · I[U_k(pa(U_k)) = w_k],   (9)

where w_k ∈ range(U_k), I[U_k(pa(U_k)) = w_k] is the indicator function, and P(pa(U_k)) is the marginal probability over the set of random variables pa(U_k) in M. Using the utility random variable, we can rewrite the expected utility as E[U_k] = Σ_{w_k∈Ω_{W_k}} w_k · P(W_k = w_k). In the presence of multiple utility functions U′, we first add a new child value node to all the value nodes in U′ and define a utility random variable for the newly added value node.
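To make Definition 4 and Eq. (9) concrete, here is a small sketch that builds the distribution of a utility random variable W_k from a joint marginal over pa(U_k) and a utility table; both tables are made-up placeholders, not values from the paper.

```python
import numpy as np
from collections import defaultdict

# Hypothetical marginal P(pa(U_k)) over two binary parents, and a utility
# table U_k(pa(U_k)) with range(U_k) = {0.0, 5.0}. Illustrative values only.
P_pa = np.array([[0.1, 0.2],
                 [0.3, 0.4]])
U_k = np.array([[5.0, 0.0],
                [0.0, 5.0]])

P_W = defaultdict(float)
for idx, p in np.ndenumerate(P_pa):
    # the indicator I[U_k(pa(U_k)) = w_k] groups parent configurations
    # by their utility outcome, as in Eq. (9)
    P_W[U_k[idx]] += p

print(dict(P_W))                               # {5.0: 0.5, 0.0: 0.5}
print(sum(w * p for w, p in P_W.items()))      # E[U_k] = sum_w w * P(W_k = w)
```

The same construction, applied to the auxiliary child of several value nodes, yields the single variable W used in the proofs of Lemmas 1 and 2.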
Lemma 1 (Identifying Relevant Observed Nodes). Let ⟨B_k⟩_{k∈N} be a nested sequence of subsets of pa(D′) such that (1) B_0 = pa(D′), and (2) B_k ⊆ B_{k−1}, where each B_k is a BD(D′, U′). Then, for a set RELO(D′, U′), there exists a sequence ⟨B_k⟩_{k∈N} that reaches the minimal RELO(D′, U′).

Proof. By induction, assume that B_k is a BD(D′, U′) and let Y = pa(D′) \ B_k. Since B_k is a backdoor set, B_k blocks every path between D′ and U′ that contains an arrow into D′. LMEU_M(D′, U′) can be written as

max_{Δ(D′|pa(D′))} E[ Σ_{U_i∈U′} U_i | pa(D′) ]   (10)
= max_{Δ(D′|pa(D′))} Σ_{D′} Δ(D′|pa(D′)) Σ_W W · P(W|fa(D′))   (11)
= max_{Δ(D′|B_k)} Σ_Y Σ_{D′} Δ(D′|B_k) Σ_W W · P(W|D′, B_k).   (12)

Since B_k is a BD(D′, U′), the joint policy function Δ(D′|pa(D′)) over D′ depends only on B_k, i.e., Δ(D′|pa(D′)) = Δ(D′|B_k), and P(W|Y, B_k) = P(W|B_k). The input ID has a finite number of nodes, and by removing irrelevant observed nodes we eventually reach the minimal RELO(D′, U′).

Lemma 2 (Identifying Relevant Hidden Nodes). Let ⟨F_k⟩_{k∈N} be a nested sequence of subsets of (X ∪ D) \ fa(D′) such that F_0 = ∅ and F_{k−1} ⊆ F_k, where each F_k is generated by adding a node X ∈ FD(fa(D′), ch(U′)), where ch(U′) is an auxiliary utility node added as a child of all the utility nodes U′. Then, for a set RELH(D′, U′), there exists a sequence ⟨F_k⟩_{k∈N} that reaches the minimal RELH(D′, U′).

Proof. Suppose a set of hidden variables Z satisfies Z ⊆ FD(fa(D′), ch(U′)). Introducing a new utility random variable W for the node ch(U′), LMEU_M(D′, U′) can be written as

max_{Δ(D′|pa(D′))} Σ_W W · P(W|pa(D′))   (13)
= max_{Δ(D′|pa(D′))} Σ_{D′} Δ(D′|pa(D′)) Σ_W W · P(W|fa(D′))   (14)
= max_{Δ(D′|pa(D′))} Σ_{D′} Δ(D′|pa(D′)) Σ_W W Σ_Z P(Z|fa(D′)) P(W|fa(D′), Z).   (15)

Since Z ⊆ FD(fa(D′), W), Eq. (15) shows that the hidden variables Z are relevant to LMEU_M(D′, U′). Now suppose that a hidden node T is not a member of any FD(fa(D′), W). Following the definition of the front door criteria, there are two possible cases in IDs. First, the node T is not an ancestor of W; namely, T is a barren node for evaluating LMEU_M(D′, U′). Second, the node T is an ancestor of W, but it is d-separated from W given pa(D′). In both cases, the hidden random variable T is not relevant to LMEU_M(D′, U′), since the variable T can be marginalized out without inducing any change to LMEU_M(D′, U′). The input ID has a finite number of nodes, and by removing all irrelevant hidden nodes from the set of all hidden nodes, we obtain the minimal RELH(D′, U′).

The MEU task in a submodel M′(D′, U′) is subject to change due to changes in U \ U′ and Δ_{D\D′}. Therefore, we want to capture a submodel M′(D′, U′) such that the evaluation of the local maximum conditional expected utility LMEU_M(D′, U′) and the local maximum policy functions Δ*_{D′} remain stable regardless of the changes in the utility and policy functions outside of the submodel.

Definition 5. We say a submodel M′(D′, U′) is stable if the set of relevant hidden variables does not contain any unobserved decision variable.
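The backdoor test used in Proposition 1 and in the proof of Lemma 1 can be checked mechanically with a d-separation query. The sketch below assumes networkx ≥ 3.3 (`nx.is_d_separator`; older releases call it `d_separated`), and its edge list is only a rough guess at a fragment of Figure 1b's structure, for illustration.

```python
import networkx as nx

def blocks_backdoor_paths(G, D, U, Z):
    """Check whether Z blocks every path between D and U that contains an
    arrow into D (the condition used in the proof of Lemma 1): remove the
    edges leaving D, then test d-separation of D and U given Z."""
    G_bar = G.copy()
    G_bar.remove_edges_from([(d, c) for d in D for c in G.successors(d)])
    return nx.is_d_separator(G_bar, set(D), set(U), set(Z))

# A guessed fragment of the LIMID in Figure 1b (illustrative only).
G = nx.DiGraph([("C3", "D2"), ("C3", "C5"), ("D2", "C5"), ("C5", "C6"),
                ("C5", "U2"), ("C6", "D3"), ("D3", "U3")])
print(blocks_backdoor_paths(G, {"D2", "D3"}, {"U2", "U3"}, {"C3", "C6"}))
```

Shrinking pa(D′) while this test remains true is exactly the nested sequence ⟨B_k⟩ of Lemma 1.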
Given a fixed set of policy functions Δ_{D\D′}, irrelevant utility functions do not change the local maximum policy functions Δ*_{D′}. The relevant utility functions can be identified from G by RELU(D′) = de(D′) ∩ U (Nielsen and Jensen 1999).

Proposition 2. If a submodel M′(D′, RELU(D′)) is stable, the MEU task in M′ is independent of the policy functions Δ_{D\D′} outside of the submodel, and the local maximum policy functions Δ*_{D′} extend to the global maximum strategy Δ*.

[Figure 2. Example of submodels: (a) M′({D3}, {U3}); (b) M′({D2, D3}, {U2, U3}). See Example 1.]

Example 1. Figure 2 shows submodels in the LIMID from Figure 1b. In both examples, the decision nodes and value nodes of the submodel are shaded, the relevant observed nodes are marked with stripes, and the relevant hidden nodes are marked with a double-lined border. Figure 2a depicts M′({D3}, {U3}), where RELO({D3}, {U3}) = {C6} since removing C6 from pa(D3) opens a backdoor path between D3 and U3. RELH({D3}, {U3}) contains all the hidden variables {C1, C2, D1, C3, C4, D2, C5} since the influence between pa(D3) and ch(U3) propagates through C5. The submodel M′({D3}, {U3}) is not stable since RELH({D3}, {U3}) contains the unobserved decision variables D1 and D2, which may change the local maximum policy function Δ(D3|C6). On the other hand, Figure 2b shows a stable submodel M′({D2, D3}, {U2, U3}). We can see that C4 can be removed from pa({D2, D3}) because C4 does not open any backdoor path, and the relevant observed variables are {C3, C6}. RELH({D2, D3}, {U2, U3}) is {C5} since the other unobserved variables C1, C2, and D1 are not in any FD({D2, D3}, {U2, U3}). The set of utility functions {U2, U3} in the submodel is RELU({D2, D3}).

### Valuation Algebra Over Submodels

A valuation algebra is a system of potentials with combination and marginalization operations (Kohlas and Shenoy 2000). In this section, we develop a valuation algebra for solving IDs that uses submodels as potentials. We introduce the structural concepts and necessary definitions for the valuation algebra over submodels.

Proposition 3. Given a submodel M′, the probability and policy functions can be factorized as

P(X′_{H,up}|X′_{O,out}) P(X′_{O,in}|X′_{O,out}, X′_{H,up}) Δ(D′|X′_{O,in}, X′_{O,out}) P(X′_{H,down}|X′_{O,out}, X′_{H,up}, X′_{O,in}, D′),   (16)

where X′_{O,in} is the subset of relevant observed variables whose parents are included in the submodel, X′_{O,out} is the rest of the relevant observed variables, X′_{H,up} is the set of relevant hidden variables that are ancestors of relevant observed variables, and X′_{H,down} is the rest of the relevant hidden variables. The joint policy function Δ(D′|X′_{O,in}, X′_{O,out}) is factorized as Π_{Δ_k∈Δ_{D′}} Δ_k(D_k|pa(D_k)), where pa(D_k) ⊆ X′_{O,in} ∪ X′_{O,out}.

In Figure 2b, we can inspect X′_{O,in} = {C6}, X′_{O,out} = {C3}, and X′_{H,up} = {C5}. Using the factorization in Eq. (16), we can compute the conditional expected utility from the submodel M′ by

E[ Σ_{U_i∈U′} U_i | X′_{O,out} ]   (17)

and avoid the unnecessary marginalization involved in Eq. (7). Since the conditional expected utility is a function over X′_{O,out}, we will call X′_{O,out} the set of interface variables.

Definition 6. The domain of a submodel, Ω_{M′}, is the set of all variables X′ ∪ D′ in the submodel.

Definition 7. The combination operation ⊗ is a binary operation M′(D′, U′) = M′_1(D′_1, U′_1) ⊗ M′_2(D′_2, U′_2) such that D′ = D′_1 ∪ D′_2 and U′ = U′_1 ∪ U′_2. The marginalization operation is defined by a projection operation ↓_Y that marginalizes out the variables (X′ ∪ D′) \ Y.
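Definition 7's combination is a component-wise union, which is what makes the semigroup argument of Lemma 4 below immediate. The toy rendering that follows uses sets of names as simplified stand-ins for the actual variable and function sets; it is a sketch, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Submodel:
    X: frozenset   # chance variables
    D: frozenset   # decision variables
    P: frozenset   # probability functions (by name)
    U: frozenset   # utility functions (by name)

    def combine(self, other):
        # the combination operation: component-wise union of the sets
        return Submodel(self.X | other.X, self.D | other.D,
                        self.P | other.P, self.U | other.U)

m1 = Submodel(frozenset({"C5", "C6"}), frozenset({"D3"}),
              frozenset({"P5", "P6"}), frozenset({"U3"}))
m2 = Submodel(frozenset({"C3", "C5"}), frozenset({"D2"}),
              frozenset({"P5"}), frozenset({"U2"}))
assert m1.combine(m2) == m2.combine(m1)   # commutativity of the combination
```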
Definition 8. We denote the projection of a submodel M′(D′, U′) to M″(D″, U″) by M″(D″, U″) = ↓_Y M′(D′, U′), where M″ is a tuple ⟨X″, D″, P″, U″, O″⟩ with the variables X′ and D′ projected on Y. For a probability function P_k ∈ P′, if sc(P_k) ⊆ Y, then P_k remains the same in P″. Otherwise, we marginalize sc(P_k) \ Y from P_k by

Σ_{sc(P_k)\Y} P_k(X_k, pa(X_k)) / Σ_{sc(P_k)\Y} P_k(pa(X_k)).   (18)

For a utility function U_k ∈ U′, if sc(U_k) ⊄ Y, then we marginalize sc(U_k) \ Y from U_k by

Σ_{sc(U_k)\Y} U_k · P( an(U_k) \ Y | pa(an(U_k) ∪ {U_k}) \ Y ).   (19)

Although the definition of the marginalization operation is lengthy, the graph manipulation is intuitive, as shown in Figure 3.

[Figure 3. Example of the marginalization operation for a submodel: (a) the ID and M′({D3}, {U2}); (b) ↓_{{C4}} M′({D3}, {U2}); (c) ↓_{{C3,C4,D3}} M′({D3}, {U2}). See Example 2 for the details.]

Example 2. Figure 3 illustrates the marginalization operation applied to the ID shown in Figure 3a. We consider the submodel M′({D3}, {U2}) highlighted by the shaded nodes. Figure 3b shows the case where we marginalize out all but X_{O,out} = {C4}. We can see that all the nodes in M′({D3}, {U2}) are removed and a new value node V is added according to Eq. (19). Figure 3c shows another case where we marginalize out {C5, C6}. We can see that both hidden variables are removed and U2 is also replaced by a new value function V(C3, C4). Although the diagram does not show the changes in the parameters of the probability functions, P(C3) is also updated according to Eq. (18).

While solving LIMIDs, we can refine the decision order O_D from the reverse topological order of the decision nodes in G. Note that the decision order O_D is already specified in O for IDs with perfect recall. We next define a decision order O_D relative to G as follows.

Definition 9. Given an M, O_D = {D_1 ≺ ... ≺ D_T} is a sequence of disjoint subsets of decision variables, found along with a sequence of submodels ⟨M′_t⟩_{t=1}^T and DAGs ⟨G_t⟩_{t=1}^T such that:
1. G_T = G,
2. M′_t is a stable submodel M′(D_t, RELU(D_t)) in G_t, where D_t is a set of decision variables in G_t chosen from the last to the first decision nodes in the reverse topological order in G_t,
3. G_{t−1} is generated from G_t by deleting D_t and RELU(D_t) and adding a new value node V(X_{O,out}) from M′_t.

Following the decision order O_D relative to G, we can obtain a set of stable submodels M_{O_D} in M. To define the valuation algebra, let M̄_{O_D} denote the closure of the submodels in M_{O_D}, and let D_{O_D} denote the set of domains of the submodels in M̄_{O_D}.

Theorem 1. Given an ID M and the decision ordering O_D relative to G, the tuple Υ_M = ⟨M̄_{O_D}, D_{O_D}, ⊗, ↓⟩ is a valuation algebra.

Next, we prove that the system of stable submodels in an ID M relative to the decision order O_D, Υ_M, satisfies the axioms of a valuation algebra (Kohlas and Shenoy 2000).

Definition 10 (Neutral Submodels). Given a submodel M′ := ⟨X′, D′, P′, U′, O′⟩ of an ID M, we say M′ is a neutral submodel of M if P′ = ∅ and U′ = {0(X) | X ∈ X′}, where 0(X) is the constant function 0(X) : X ↦ 0. We denote a neutral submodel over a variable X by M′_0(X), a neutral submodel over a set of variables X by M′_0(X) = ⊗_{X∈X} M′_0(X), and the set of neutral submodels over each variable X ∈ X by M_0(X) = {M′_0(X) | X ∈ X}.

Lemma 3 (Neutral Elements). For X, Y ∈ D_{O_D}, M′_0(X) ⊗ M′_0(Y) = M′_0(X ∪ Y).

Proof. The sets of utility functions in M′_0(X) and M′_0(Y) are {0(X_i) | X_i ∈ X} and {0(X_i) | X_i ∈ Y}, respectively. It is immediate to see that the sets of utility functions in both M′_0(X) ⊗ M′_0(Y) and M′_0(X ∪ Y) are {0(X_i) | X_i ∈ X ∪ Y}.

Lemma 4 (Semigroup). M̄_{O_D} is a semigroup with the combination operation over submodels.
Proof. The combination operation is closed in M̄_{O_D}. Given M′_1, M′_2 ∈ M̄_{O_D}, the combination of submodels M′_1 ⊗ M′_2 is defined as the component-wise union of the sets in the submodels. Therefore, the combination operator is commutative and associative, showing that M̄_{O_D} is a semigroup.

Lemma 5 (Domain of Combination). For M′_1, M′_2 ∈ M̄_{O_D}, Ω_{M′_1 ⊗ M′_2} = Ω_{M′_1} ∪ Ω_{M′_2}.

Proof. Since M′_1(D′_1, U′_1) and M′_2(D′_2, U′_2) are stable submodels, each domain contains all the relevant variables for computing LMEU_M(D′_1, U′_1) and LMEU_M(D′_2, U′_2), respectively. Without loss of generality, we assume that D′_1 ∩ D′_2 = ∅ and U′_1 ∩ U′_2 = ∅; any non-neutral submodel in M̄_{O_D} can be reduced to a combination of submodels in M_{O_D}. Let D′_1 ≺ D′_2 in O_D. Then, U′_1 is not relevant to D′_2 by the construction of M_{O_D}, so RELO(D′_2, U′_1) = ∅ and RELH(D′_2, U′_1) = ∅. If D′_2 ⊄ de(D′_1), then RELO(D′_1, U′_2) = ∅ and RELH(D′_1, U′_2) = ∅. If D′_2 ⊆ de(D′_1), then RELO(D′_1 ∪ D′_2, U′_2) = RELO(D′_2, U′_2) and RELH(D′_1 ∪ D′_2, U′_2) = RELH(D′_2, U′_2). Therefore, RELO(D′_1 ∪ D′_2, U′_1 ∪ U′_2) = RELO(D′_1, U′_1) ∪ RELO(D′_2, U′_2), and RELH(D′_1 ∪ D′_2, U′_1 ∪ U′_2) = RELH(D′_1, U′_1) ∪ RELH(D′_2, U′_2).

Lemma 6 (Marginalization). For any stable submodel M′ ∈ M̄_{O_D} and X ∈ D_{O_D}, the marginalization satisfies ↓_X M′ = ↓_{X∩Ω_{M′}} M′, Ω_{↓_X M′} = X ∩ Ω_{M′}, and ↓_{Ω_{M′}} M′ = M′.

Proof. The two projections M′_1 := ↓_X M′ and M′_2 := ↓_{X∩Ω_{M′}} M′ are equivalent by definition, given a stable submodel M′ := ⟨X′, D′, P′, U′, O′⟩. Since Ω_{M′} is X′ ∪ D′, Ω_{↓_X M′} is X ∩ (X′ ∪ D′). Finally, projecting a submodel to its domain leaves the submodel unchanged.

Lemma 7 (Transitivity of Marginalization). For a stable submodel M′ ∈ M̄_{O_D} and X, Y ∈ D_{O_D}, ↓_X(↓_Y M′) = ↓_{X∩Y} M′.

Proof. Let a stable submodel M′ in M̄_{O_D} be denoted by M′ := ⟨X′, D′, P′, U′, O′⟩, and denote the submodels after marginalization by M′_L := ⟨X′_L, D′_L, P′_L, U′_L, O′_L⟩ = ↓_X(↓_Y M′) and M′_R := ⟨X′_R, D′_R, P′_R, U′_R, O′_R⟩ = ↓_{X∩Y} M′. Two submodels are equivalent if both have the same sets of variables, domains, functions, and constrained ordering. By the definition of the marginalization operation, X′_L = (X′ ∩ Y) ∩ X = X′ ∩ (X ∩ Y) = X′_R. For the probability functions, consider a conditional probability function P(X′_i|pa(X′_i)). There are three possible cases when marginalizing out Y and then X in sequence. First, the probability function remains the same in the marginalized submodel M′_L if sc(P(X′_i|pa(X′_i))) ⊆ Y and sc(P(X′_i|pa(X′_i))) ⊆ X. Second, the probability function P(X′_i|pa(X′_i)) is removed from M′_L if X′_i ∈ X′ \ Y or X′_i ∈ (X′ ∩ Y) \ X, which is equivalent to the condition X′_i ∈ X′ \ (X ∩ Y). The last case is marginalizing out a subset of pa(X′_i): projecting first on Y and then on X eliminates all but pa(X′_i) ∩ (Y ∩ X), which is equivalent to eliminating all but pa(X′_i) ∩ (X ∩ Y) when projecting on X ∩ Y directly. From the above cases, we can see that the probability functions in M′_L and M′_R are the same. For a utility function U_i, if U_i remains in M′_L, it also remains in M′_R by reasoning similar to that for the probability functions. Lastly, D′_L = D′_R is immediate due to X′_L = X′_R, and O′_L = O′_R since the relevant sets are the same in M′_L and M′_R.

Lemma 8 (Distributivity of Marginalization). For a pair of stable submodels M′_1, M′_2 ∈ M̄_{O_D} with Ω_{M′_1} = X, ↓_X(M′_1 ⊗ M′_2) = M′_1 ⊗ (↓_X M′_2).

Proof. We show the equivalence by comparing the two submodels M′_L := ⟨X′_L, D′_L, P′_L, U′_L, O′_L⟩ = ↓_X(M′_1 ⊗ M′_2) and M′_R := ⟨X′_R, D′_R, P′_R, U′_R, O′_R⟩ = M′_1 ⊗ (↓_X M′_2). First, X′_L = (X′_1 ∪ X′_2) ∩ X = X′_1 ∪ (X′_2 ∩ X) = X′_R. The projection operation applies to each element in the set of functions. Since Ω_{M′_1} = X, the functions in M′_1 remain the same in M′_L; the rest of the functions in M′_L can be obtained by ↓_X M′_2. Therefore, P′_L = P′_R and U′_L = U′_R. The equivalence of D′_L and D′_R is immediate due to X′_L = X′_R.
Lastly, O′_L = O′_R since the relevant sets of observed variables and hidden variables in both X′_L and X′_R are the same.

Any valuation algebra satisfies the axioms for local computation. Therefore, the MEU task in IDs can also be solved by local computations using the valuation algebra over the stable submodels relative to O_D.

### Submodel-Tree Clustering

The tree decomposition framework for inference tasks over graphical models offers a generic architecture for local computation (Shenoy 1997; Kask et al. 2005). We now present the submodel-tree decomposition for IDs.

[Figure 4. Submodel-tree decomposition T_ST.]

Definition 11. Given an ID M and the set of stable submodels M_{O_D} relative to the decision ordering O_D, the submodel-tree decomposition is a tuple T_ST := ⟨T, χ, ψ⟩, where T = (C, S) is a tree of cluster nodes C and separator edges S, χ is a labelling function that maps a node C ∈ C to a set of chance and decision variables χ(C) = X_C ∪ D_C, and ψ maps a node C ∈ C to a set of stable submodels ψ(C) ⊆ M_{O_D}. In addition, T = (C, S) must satisfy the running intersection property; namely, for any variable X, the set of cluster nodes including the variable induces a connected subtree of T.

We now use an example to illustrate the submodel-tree decomposition T_ST.

Example 3. Figure 4 shows the T_ST of the LIMID shown in Figure 1b. The submodel-tree decomposition starts by identifying O_D and a set of stable submodels M_{O_D}. Beginning with M′({D3}, {U3}), we see that RELH({D3}, {U3}) contains unobserved decision variables, so we combine M′({D3}, {U3}) and M′({D2}, {U2, U3}) to find a stable submodel M′_2({D2, D3}, {U2, U3}) with D_2 = {D2, D3}. After adding M′_2 to the submodel-tree, we delete D_2 and {U2, U3} and add a new value node V(C3), which is the conditional expected utility computed from M′_2. We can find D_1 = {D1} and M′_1({D1}, {U1, V}) in the same way and complete the submodel-tree decomposition of M.

## Evaluating IDs over Submodel-Trees

The submodel-tree decomposition T_ST facilitates the MEU task in IDs by identifying independent single-stage decision problems, where each submodel cluster defines a single-time-step decision problem in perfect recall IDs. In the case of LIMIDs, a submodel cluster may span decision variables over multiple time steps, and it is required to perform joint optimization over the policy functions in the submodel. If a LIMID is soluble (Lauritzen and Nilsson 2001), each cluster contains only a single decision variable, which allows applying the exact variable elimination algorithm.

### Hierarchical Message Passing

Algorithm 1 presents a hierarchical message passing scheme over a submodel-tree decomposition. The message propagation over the submodel-tree is a single-pass procedure starting from the last decision stage to the first. The evaluation of each submodel is performed by a subroutine Eval(M_C), which can be any exact algorithm for solving IDs or LIMIDs. This step is internal message passing for computing the conditional expected utility V_C at cluster M_C. The V_C messages propagate from the leaf nodes to the root node, and the algorithm returns the MEU at the root.

Algorithm 1: Hierarchical Message Passing over T_ST
Require: T_ST := ⟨T(C, S), χ, ψ⟩
Ensure: Submodel-tree augmented with the messages sent out from the clusters; the MEU
1: MEU ← 0
2: for each cluster C from the leaves to the root in T do
3:     Pull messages from the incoming edges
4:     Update the submodel M_C at C
5:     V_C ← Eval(M_C)
6:     if C is the root node or V_C is a constant then
7:         MEU ← MEU + V_C
8:     else
9:         Push V_C to the outgoing edge
10: return MEU
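A schematic rendering of Algorithm 1 follows. The `tree`/`cluster` interface and the message objects are hypothetical stand-ins, and `eval_submodel` plays the role of Eval(M_C), i.e., any exact ID/LIMID solver that returns the conditional expected utility V_C over the interface variables.

```python
def hierarchical_message_passing(tree, eval_submodel):
    """Single upward pass over the submodel-tree T_ST (Algorithm 1 sketch)."""
    meu = 0.0
    for cluster in tree.leaves_to_root():            # last decision stage first
        for v_in in tree.incoming_messages(cluster): # pull V_{C'} messages
            cluster.submodel.add_value_node(v_in)    # update M_C at C
        v_c = eval_submodel(cluster.submodel)        # internal message passing
        if cluster.is_root() or not v_c.scope:       # constant message V_C
            meu += v_c.value                         # accumulate into the MEU
        else:
            tree.send_to_parent(cluster, v_c)        # push external message V_C
    return meu
```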
The message at each cluster C can be computed by

V_C(X^C_{O,out}) = max_{Δ_{D_C}} E_{Δ_{D_C}}[ Σ_{U_i∈U_C} U_i | X^C_{O,out} ],   (20)

where D_C and U_C denote the decision variables and utility functions at cluster C, and X^C_{O,out} is the set of interface variables at cluster C.

Theorem 2. The time and space complexity of solving IDs over the submodel-tree decomposition is exponential in the submodel-tree width w_s := max_{C∈C} w_c(C), which is the maximum of the constrained treewidths w_c(C) of the individual submodels.

## Variational Decomposition Bounds

Exact evaluation of individual submodels is still infeasible for practical problems, and recent work focuses on developing bounding schemes with anytime behavior that can trade computational resources for the quality of the bounds. The main obstacle to developing efficient variational bounds for the MEU task is the additive utility functions, which hinder formulating a dual representation of the inference task. Therefore, we propose to use Jensen's inequality (Jensen 1906) applied to the log moment generating function, E[X] ≤ log E[e^X], which exponentiates the utility functions in IDs so that the MEU is bounded by the log partition function of a probabilistic graphical model.

Proposition 4. The MEU task in a submodel can be bounded by a marginal MAP task after exponentiating the utility functions,

V_C(X^C_{O,out}) ≤ max_{Δ_{D_C}} log E_{Δ_{D_C}}[ e^{Σ_{U_i∈U_C} U_i} | X^C_{O,out} ]   (21)
≤ log Σ^{W_C}_{X_C} P(X_C|X^C_{O,out}) exp( Σ_{U_i∈U_C} U_i + Σ_{(C′,C)∈S} V_{C′}(X_{C′}) ).   (22)

The first inequality, Eq. (21), is obtained by Jensen's inequality, and the second inequality, Eq. (22), is the decomposition bound for the MMAP task (Liu 2014; Ping, Liu, and Ihler 2015).

[Figure 5. ST-GDD(i): decomposition bounds over T_ST.]

We briefly present two new algorithms based on the variational decomposition bounds for MMAP: (1) submodel-tree decomposition with generalized dual decomposition (ST-GDD), and (2) submodel-tree decomposition with weighted mini-bucket and moment matching (ST-WMBMM). Both GDD and WMBMM provide state-of-the-art upper bounds for the MMAP task, and they use the i-bound parameter for controlling the space and time complexity in an anytime manner. Next, we illustrate ST-GDD(i) using the example in Figure 5.

Example 4. For the 2-stage submodel-tree in Figure 4, Eval(M_C) bounds the two submodels M′_1 and M′_2. The internal GDD message passing algorithm decomposes each submodel into a join-graph using some i-bound. For M′_2, the internal message passing tightens the upper bounds and sends an approximate message from the internal clusters by marginalizing out all the variables except for the interface variable C3. The MEU can be computed by adding all the values from the internal clusters at the root submodel M′_1.
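As a quick numeric sanity check of the first inequality in Proposition 4, the following sketch verifies E[U] ≤ log E[e^U] on an arbitrary made-up distribution; the numbers are placeholders only.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))        # a toy distribution P(X)
u = rng.uniform(0.0, 3.0, size=6)    # toy total utilities U(X)

expected = float(p @ u)              # E[U]
bound = float(np.log(p @ np.exp(u))) # log E[e^U], the Jensen upper bound
assert expected <= bound
print(expected, bound)
```

Exponentiating the utilities turns the additive utility into a product of non-negative factors, which is why the standard MMAP machinery (GDD, WMB-MM) becomes applicable in Eq. (22).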
## Experiments

We compare the upper bounds from the proposed algorithms ST-GDD(i) and ST-WMBMM(i) with state-of-the-art methods in the following three experiments: (1) synthetic IDs with perfect recall, comparing against the state-of-the-art decomposition bounds JGDID(i) (Lee, Ihler, and Dechter 2018) and WMBEID(i) (Lee et al. 2019); (2) upper bounds on LIMIDs, comparing against the error bounds presented by (Mauá 2016); and (3) a case study that evaluates the upper bounds on large-scale problems adopted from an online planning domain.

### Benchmarks

Domains. The synthetic benchmark domains are generated as follows:

1. Factored FH-MDP instances are generated from two-stage factored MDP templates by varying the number of state and action variables, and the time horizon.
2. Factored FH-POMDP instances are generated similarly to the FH-MDP instances, using factored POMDP templates.
3. Random influence diagram instances (RAND) are generated by randomly generating DAGs for influence diagrams.
4. BN instances are existing Bayesian networks used in the UAI-2006 probabilistic inference competitions, which we converted to IDs.
5. The LIMID benchmark converts the above four benchmark domains into LIMIDs by removing the temporal constraints.
6. System administrator MDP/POMDP instances are generated by translating RDDL instances into 2-stage influence diagrams and unrolling them to the desired time horizons (up to 10).

| Domain | n | w_c | w_s | ST-GDD(i=1) | ST-GDD(i=5) | ST-WMB(i=10) | JGDID(i=1) | WMBEID(i=10) |
|---|---|---|---|---|---|---|---|---|
| ID-BN | 84.6 | 30.2 | 21.8 | 0.19 | 0.15 | 0.13 | 0.33 | 0.74 |
| IDBN14w57d12 | 115 | 57 | 42 | 103.89 | 96.24 | 95.37 | 1420 | 2.2E+4 |
| FH-MDP | 105.7 | 25.5 | 25.4 | 0.06 | 0.07 | 0.18 | 0.16 | 0.44 |
| mdp9-32-3-8-3 | 99 | 43 | 43 | 18.92 | 19.71 | 25.31 | 23.09 | 111.81 |
| FH-POMDP | 55.9 | 28.1 | 28.1 | 0.31 | 0.22 | 0.06 | 0.56 | 0.72 |
| pomdp8-14-9-3-12-14 | 96 | 47 | 46 | 73.53 | 76.37 | 67.18 | 5.E+08 | 5.E+09 |
| RAND | 56.2 | 20.5 | 17.9 | 0.22 | 0.24 | 0.24 | 0.23 | 0.46 |
| rand-c70d21o1 | 84 | 32 | 34 | 1309.89 | 1791.93 | 1752.47 | 1743.6 | 2.E+04 |

Table 1. Quality of upper bounds on the ID benchmarks. n is the number of variables, w_c is the constrained induced width used by JGDID and WMBEID, and w_s is the submodel induced width used by ST-GDD and ST-WMB. Each domain row shows the averages of the statistics and the average gap (U − U_min)/U; each instance row shows the bounds from the hardest instance.

Results from the ID domains. Table 1 summarizes the experiments comparing the quality of the upper bounds for IDs. We can see that the submodel induced width w_s is lower than the constrained induced width w_c from a strong junction tree. The average gaps (the lower, the better) from all four benchmarks show that the proposed ST-GDD and ST-WMBMM generate higher-quality upper bounds than the others. From the results of ST-GDD(i=5) and ST-WMBMM(i=10), we can see that a simple single-pass internal message passing algorithm with higher i-bounds can generate comparable upper bounds.

Results from the LIMID domains. Table 2 shows the results from the LIMID benchmark domains. To the best of our knowledge, there is no practical upper bounding scheme available for LIMIDs, so we compare against the theoretical bounds given in (Mauá and Cozman 2016). We can see that the upper bounds from ST-WMB with i=10 are orders of magnitude tighter than the theoretical kpu-UB.

| Domain | n | w_s | ST-WMB(i=10) | kpu-UB |
|---|---|---|---|---|
| IDBN14w42d6 | 115 | 33 | 45.3 | 1.22E+8 |
| IDBN14w57d12 | 115 | 44 | 93.73 | 4.70E+8 |
| mdp9-32-3-8-3 | 99 | 43 | 25.33 | 1.68E+11 |
| pomdp9-14-8-3-10-4 | 92 | 44 | 41.35 | 4.27E+8 |
| rand-c50d15o1 | 84 | 27 | 1250 | 1.13E+6 |
| rand-c70d7o1 | 91 | 21 | 658 | 4.38E+5 |

Table 2. Upper bounds on the LIMID benchmarks. w_s is the submodel induced width of the LIMID instances, and kpu-UB is the analytical bound (Mauá and Cozman 2016).
### A Case Study on the Sys Admin Domain

The Sys Admin domain (Guestrin et al. 2003) is one of the problems used in the international planning competitions. We compare the upper bounds from ST-WMB(i=20) with the expected value returned by an online MDP/POMDP planner (Cui and Khardon 2016, 2019), because the other schemes, JGDID, WMBEID, and kpu-UB, could not generate any meaningful bounds at this scale. The values from the online planner and ST-WMB are not directly comparable, since the online planner computes a Monte-Carlo estimate of a suboptimal online strategy while ST-WMB computes deterministic upper bounds on the optimal offline strategy. Nevertheless, we can see that the upper bounds on the finite-horizon MDP and POMDP instances are close to the lower bounds, demonstrating the potential of applying our bounding schemes to planning under uncertainty.

| Instance | w_s | t=4 (OP) | t=4 (UB) | t=10 (OP) | t=10 (UB) |
|---|---|---|---|---|---|
| mdp6 | 68 | 110.51 | 130.63 | 239.00 | 328.25 |
| mdp7 | 88 | 147.94 | 172.62 | 312.81 | 433.28 |
| mdp8 | 92 | 146.62 | 174.22 | 300.06 | 439.45 |
| mdp9 | 109 | 184.25 | 218.21 | 385.26 | 547.76 |
| mdp10 | 113 | 183.65 | 217.52 | 364.57 | 549.87 |
| pomdp6 | 360 | 110.55 | 155.24 | 229.13 | 420.17 |
| pomdp7 | 480 | 147.50 | 195.35 | 304.58 | 526.01 |
| pomdp8 | 480 | 147.90 | 207.96 | 298.83 | 568.49 |
| pomdp9 | 701 | 182.78 | 244.42 | 372.75 | 666.88 |
| pomdp10 | 300 | 184.93 | 253.04 | 375.90 | 707.23 |

Table 3. Upper bounds for the Sys Admin domain. w_s is the submodel induced width, OP denotes the value from the online planner, UB denotes the upper bound from ST-WMB(i=20), and t is the number of stages.

## Conclusion

This paper presents a graph-based decomposition scheme, submodel-tree decomposition, for solving influence diagrams. For perfect recall IDs, the submodel-tree decomposition identifies the minimal relevant sets for computing the MEU. For LIMIDs, it identifies the subsets of decision variables that should be optimized jointly under imperfect recall. Since exact algorithms are intractable, we also present a variational decomposition bounding scheme for the MEU task, built on top of the proposed submodel-tree decomposition and decomposition bounds for MMAP inference, achieving tighter upper bounds compared with state-of-the-art approaches. In future work, the presented decomposition methods can be extended to more general influence diagrams such as multi-agent influence diagrams. Since the submodel decomposition bounds provide a cost-to-go function over a submodel-tree, we can also utilize the bounding scheme for heuristic search.

## Acknowledgments

We thank the reviewers for their valuable feedback. This work was supported in part by NSF grant IIS-2008516.

## Broader Impact

In this paper, we present a new method for solving influence diagrams that also applies to the more general limited memory influence diagrams. The main contribution is twofold. First, we presented a graph-based method for reducing influence diagrams; the proposed method applies to a more general case than earlier works, namely decision making under limited memory (the decision-maker does not have access to information about the history). Second, we provide a simple yet efficient way to generate upper bounds on the optimal solution. Regarding the potential impact of the current work, the proposed method can improve other approaches, such as heuristic search or sampling methods for decision making. More importantly, the ideas shown in the current work can provide a new perspective on sequential decision making by relaxing the restriction of the underlying models from MDPs or POMDPs to more general structures. Concerning the limitations of our work, care must be taken when applying it in a real-world setting: our proposed method provides upper bounds on the value of the optimal solution together with suboptimal policies that may not achieve the optimal value. In addition, the input problem has to be provided in a structured format, which requires knowledge engineering or machine learning.

## References
Bodlaender, H. L. 1988. Dynamic programming on graphs with bounded treewidth. In International Colloquium on Automata, Languages, and Programming, 105–118. Springer.
Cui, H.; and Khardon, R. 2016. Online Symbolic Gradient-Based Optimization for Factored Action MDPs. In Proceedings of the International Joint Conference on Artificial Intelligence, 3075–3081.
Cui, H. J.; and Khardon, R. 2019. Sampling Networks and Aggregate Simulation for Online POMDP Planning. In Advances in Neural Information Processing Systems, 9218–9228.
Dechter, R. 2000. A New Perspective on Algorithms for Optimizing Policies under Uncertainty. In Proceedings of the 5th Conference on Artificial Intelligence and Planning Systems, 72–81.
Dechter, R. 2013. Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms. Synthesis Lectures on Artificial Intelligence and Machine Learning 7(3): 1–191.
Detwarasiti, A.; and Shachter, R. D. 2005. Influence diagrams for team decision analysis. Decision Analysis 2(4): 207–228.
Guestrin, C.; Koller, D.; Parr, R.; and Venkataraman, S. 2003. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research 19: 399–468.
Howard, R. A.; and Matheson, J. E. 1981. Influence Diagrams. Readings on the Principles and Applications of Decision Analysis, 721–762.
Jensen, F.; Jensen, F. V.; and Dittmer, S. L. 1994. From Influence Diagrams to Junction Trees. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 367–373.
Jensen, F. V.; and Gatti, E. 2010. Information enhancement for approximate representation of optimal strategies from influence diagrams. In Proceedings of the European Workshop on Probabilistic Graphical Models, 161.
Jensen, J. L. W. V. 1906. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica 30: 175–193. doi:10.1007/BF02418571. URL https://doi.org/10.1007/BF02418571.
Kask, K.; Dechter, R.; Larrosa, J.; and Dechter, A. 2005. Unifying tree decompositions for reasoning in graphical models. Artificial Intelligence 166(1-2): 165–193.
Kohlas, J.; and Shenoy, P. P. 2000. Computation in valuation algebras. In Handbook of Defeasible Reasoning and Uncertainty Management Systems, 5–39. Springer.
Lauritzen, S. L.; and Nilsson, D. 2001. Representing and Solving Decision Problems with Limited Information. Management Science 47(9): 1235–1251.
Lee, J.; Ihler, A.; and Dechter, R. 2018. Join Graph Decomposition Bounds for Influence Diagrams. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, 1053–1062.
Lee, J.; Marinescu, R.; Ihler, A.; and Dechter, R. 2019. A Weighted Mini-Bucket Bound for Solving Influence Diagrams. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI).
Liu, Q. 2014. Reasoning and Decisions in Probabilistic Graphical Models: A Unified Framework. University of California, Irvine.
Liu, Q.; and Ihler, A. 2011. Bounding the Partition Function using Hölder's Inequality. In Proceedings of the International Conference on Machine Learning, 849–856.
Liu, Q.; and Ihler, A. 2012. Belief Propagation for Structured Decision Making. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, 523–532.
Marinescu, R.; Dechter, R.; and Ihler, A. 2014. AND/OR Search for Marginal MAP. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 563–572. Quebec City, Canada.
Marinescu, R.; Lee, J.; Dechter, R.; and Ihler, A. 2018. AND/OR Search for Marginal MAP. Journal of Artificial Intelligence Research 63: 875–921.
Mateescu, R.; Kask, K.; Gogate, V.; and Dechter, R. 2010. Join-graph propagation algorithms. Journal of Artificial Intelligence Research 37: 279–328.
Mauá, D. D. 2016. Equivalences Between Maximum a Posteriori Inference in Bayesian Networks and Maximum Expected Utility Computation in Influence Diagrams. International Journal of Approximate Reasoning 68(C): 211–229.
Mauá, D. D.; and Cozman, F. G. 2016. Fast Local Search Methods for Solving Limited Memory Influence Diagrams. International Journal of Approximate Reasoning 68(C): 230–245.
Mauá, D. D.; de Campos, C. P.; and Zaffalon, M. 2012. Solving Limited Memory Influence Diagrams. Journal of Artificial Intelligence Research 44: 97–140.
Nielsen, T. D. 2001. Decomposition of influence diagrams. In European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty, 144–155. Springer.
Nielsen, T. D.; and Jensen, F. V. 1999. Well-defined Decision Scenarios. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 502–511.
Nilsson, D.; and Hohle, M. 2001. Computing Bounds on Expected Utilities for Optimal Policies Based on Limited Information. Dina Research Report.
Pearl, J. 2009. Causality, 78–85. Cambridge University Press.
Ping, W.; Liu, Q.; and Ihler, A. T. 2015. Decomposition Bounds for Marginal MAP. In Proceedings of Advances in Neural Information Processing Systems 28, 3267–3275.
Pralet, C.; Schiex, T.; and Verfaillie, G. 2006. From Influence Diagrams to Multi-operator Cluster DAGs. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 393–400.
Shenoy, P. P. 1997. Binary join trees for computing marginals in the Shenoy-Shafer architecture. International Journal of Approximate Reasoning 17(2-3): 239–263.
Shenoy, P. P.; and Shafer, G. 1990. Axioms for Probability and Belief-Function Propagation. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 169–198.
Wainwright, M. J.; and Jordan, M. I. 2008. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1-2): 1–305.
Yuan, C.; Wu, X.; and Hansen, E. A. 2010. Solving Multistage Influence Diagrams Using Branch-and-Bound Search. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 691–700.
Zhang, N. L.; and Poole, D. L. 1992. Stepwise Decomposable Influence Diagrams. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, 141–152.