# Do Not Marginalize Mechanisms, Rather Consolidate!

Moritz Willig (Technical University of Darmstadt, moritz.willig@cs.tu-darmstadt.de)
Matej Zečević (Technical University of Darmstadt, matej.zecevic@tu-darmstadt.de)
Devendra Singh Dhami (Eindhoven University of Technology; Hessian Center for AI (hessian.AI), d.s.dhami@tue.nl)
Kristian Kersting (Technical University of Darmstadt; Hessian Center for AI (hessian.AI); German Research Center for AI (DFKI), kersting@cs.tu-darmstadt.de)

Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and the complexity of the interactions between them grow as well, rendering the models convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where an ever increasing amount of data demands new methods for simplifying and compressing large-scale SCM. While methods for marginalizing and abstracting SCM already exist today, they may destroy the causality of the marginalized model. To alleviate this, we introduce the concept of consolidating causal mechanisms to transform large-scale SCM while preserving consistent interventional behaviour. We show that consolidation is a powerful method for simplifying SCM, discuss the resulting reduction of computational complexity, and give a perspective on the generalizing abilities of consolidated SCM.

1 Introduction

Even complex real-world systems can be modeled using structural causal models (SCM) [Pearl, 2009], and several methods exist for doing so automatically from data [Spirtes et al., 2000, Pearl, 2009, Peters et al., 2017]. While technically reflecting the causal structure of the systems under consideration, SCM might not offer intuitive interpretations to the user.
Large-scale SCM, appearing for example in genomics, medical data [Squires et al., 2022, Ribeiro-Dantas et al., 2023] or machine learning [Schölkopf et al., 2021, Berrevoets et al., 2023], may become increasingly complex and thereby less interpretable. Contrary to this, computing average treatment effects might be too uninformative for the specific application, as the complete causal mechanism is compressed into a single number. Ideally, a user could express the factors of interest and obtain a reduced causal system that isolates the relevant mechanism from the rest of the model. In contrast to other probabilistic models, SCM model the additional aspect of interventions. Consider for example a row of dominoes and its corresponding causal graph as shown in Figure 1. If the starting stone is tipped over, it will affect the following stones, causing the whole row to fall. Humans usually have a good intuition about predicting the unfolding of such physical systems [Gerstenberg, 2022, Beck and Riggs, 2014, Zhou et al., 2023]. Beyond that, it is easy to imagine what would happen if we were to hold onto a domino stone, that is, if we intervened actively upon the domino sequence. Alternatively, we can programmatically simulate these systems to reason about their outcomes. A simulator tediously computes and updates positions, rotations and collision states of all objects in

DSD contributed while being with hessian.AI and TU Darmstadt, before joining TU/e.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Figure 1: Consolidation vs. Marginalization. Even simple real-world systems, like this row of dominoes, are composed of numerous intermediate steps. Classical structural causal models require the explicit evaluation of the individual structural equations to respect possible interventions along the computational chain and yield the final value of X5.
The intermediate steps (X2, X3, X4) might be marginalized to obtain a simplified functional representation. Marginalization, however, loses some of the causal interpretation of the process, as interventions on the marginalized variables can no longer be performed. Consolidation of causal models simplifies the graph structure (compare to Appendix D.1) while respecting interventions on the marginalized variables, thus preserving the ability to intervene on the underlying causal mechanisms. (Best viewed in color.)

the system. Depending on the abstraction level of our SCM, computations might be simplified to represent individual stones as binary variables, indicating whether a stone is standing up or getting pushed over. Nonetheless, classical evaluation of a simplified SCM is still performed step by step in order to respect possible interventions on the individual stones. Given that we might only be interested in the outcome, that is, whether or not the last stone will tip over, computing all intermediate steps seems to be a waste of computation, as already noted by Peters and Halpern [2021]. Under these premises, we are interested in preserving the ability to intervene while also being computationally efficient. Classical marginalization [Pearl, 2009, Rubenstein et al., 2017] is of no help to us, as it destroys the causal aspect of interventions attached to the variables.

Consolidation vs. Marginalization. By marginalizing we do not only remove variables, but also all associated interventions, destroying the causal mechanisms of the marginalized variables. The insight of this paper, as alluded to in Figure 1 (center), is that there exists an intermediate tier of consolidated models that fill the gap between unaltered SCM and ones with classical marginalization applied.
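This gap can be illustrated with a minimal, hypothetical sketch of the domino chain of Figure 1 (all function names are ours and not part of the paper's formalism):

```python
# Domino chain X1 -> X2 -> X3 -> X4 -> X5 with structural equations X_i := X_{i-1}.
# Interventions are passed as a dict {i: value} for do(X_i = value).

def evaluate_full(x1, do=None):
    """Step-by-step evaluation of the unaltered SCM; respects every intervention."""
    do = do or {}
    value = do.get(1, x1)
    for i in range(2, 6):
        value = do.get(i, value)  # X_i copies X_{i-1} unless intervened upon
    return value

def evaluate_marginalized(x1):
    """Classical marginalization of X2..X4: only X5 := X1 remains.
    Interventions such as do(X3 = 0) can no longer be expressed at all."""
    return x1

def evaluate_consolidated(x1, do=None):
    """Consolidated equation: functionally equivalent to the full model without
    computing X2..X4 explicitly; the most downstream intervention overwrites."""
    do = do or {}
    return do[max(do)] if do else x1
```

Without interventions all three evaluations agree; under do(X3 = 0) the marginalized model cannot even pose the query, while the consolidated one still answers it consistently with the full model.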
Consolidation simplifies the causal graph by compressing the computations of the consolidated variables into the equations of compositional variables that are functionally equivalent to the initial model, while still respecting the causal effects of possible interventions. As such, consolidation generalizes marginalization in the sense that marginalization can be modeled by consolidating without interventions (I = ∅; see Def. 1 and Sec. 3). If questions involve causality, then consolidation necessarily needs to be considered, since it can actually handle interventions (all cases where I ≠ ∅). If causal questions are not of concern, then standard marginalization remains the procedure of choice. One perspective on our approach is to describe consolidation as "intervention-preserving marginalization".

Structure and Contributions of this Paper. In section two we discuss the foundation of SCM and related work. In section three we formally establish the composition of structural causal equations, partitioned SCM, and finally consolidated SCM. In section four we discuss the possible compression and computational simplifications resulting from consolidating models. We present an applied example of simplifying time series data and, in a second example, demonstrate how consolidation reveals the policy of a game agent. Finally, in section five, we discuss the generalizing abilities of our method and provide a perspective on the broader impact and further research. The technical contributions of this paper are as follows:

- We define causal compositional variables that yield functionally equivalent distributions to SCM under intervention.
- We formalize (partially) consolidated SCM by partitioning SCM under a constraint that guarantees consistent evaluation with respect to the initial SCM.
- We discuss conditions under which consolidation leads to compressed causal equations.
- We demonstrate consolidation on two examples:
First, obtaining a causal model of reduced size and, second, revealing the underlying policy of a causal decision-making process.

2 Preliminaries and Related Work

In general, we write sets of variables in bold upper-case (X) and their values in lower-case (x). Single variables and their values are written in normal style (X, x). Specific elements of a set are indicated by a subscript index (Xi). Probability distributions over a variable X or a set of variables X are denoted by PX and PX, respectively. A detailed list of notation can be found in Appendix E.

Structural Causal Models provide a framework to formalize a notion of causality via graphical models [Pearl, 2009]. From a computational perspective, structural equation models (SEM) can be considered instead of SCM [Halpern, 2000, Spirtes et al., 2000]. While focusing on computational aspects of consolidating causal equations, we use Pearl's formalism of SCM. Modeling causal systems using SCM over SEM does not restrict our freedom, as Rubenstein et al. [2017] show consistency between both frameworks. Similar to the earlier works of Halpern [2000], Beckers and Halpern [2019] and Rubenstein et al. [2017], we do not assume independence of exogenous variables and model SCM with an explicit set of allowed interventions.

Definition 1. A structural causal model is a tuple M = (V, U, F, I, P_U) forming a directed acyclic graph G over variables X = {X_1, . . . , X_K} taking values in 𝒳 = ∏_{k∈{1,...,K}} 𝒳_k, subject to a strict partial order < over X.

Allowing for a deviation ε > 0 provides a relaxed consolidation constraint for noisy systems. SCM constitute a well-suited framework for representing causal knowledge in the form of graphical models. The ability to trace effects through structural equations, which yields explanations about the role of variables within causal models, is required to make results accessible to non-experts.
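A minimal sketch of the ingredients of Definition 1 (the class and all names below are illustrative stand-ins, not the paper's implementation): an SCM carries an explicit set I of allowed interventions, and evaluation rejects any do() outside of it.

```python
class SCM:
    """Toy structural causal model (V, U, F, I, P_U) with an explicit
    set I of allowed intervention targets."""

    def __init__(self, exogenous, equations, allowed_interventions):
        self.U = exogenous              # dict: name -> sampler for P_U
        self.F = equations              # dict: name -> (parents, function), in topological order
        self.I = allowed_interventions  # set of variable names that may be intervened upon

    def evaluate(self, do=None):
        do = do or {}
        if not set(do) <= self.I:
            raise ValueError("intervention outside the allowed set I")
        values = {name: sample() for name, sample in self.U.items()}
        for name, (parents, f) in self.F.items():
            values[name] = do[name] if name in do else f(*(values[p] for p in parents))
        return values

# A small chain A -> B -> C with a single allowed intervention target:
chain = SCM(
    exogenous={"A": lambda: 1},
    equations={"B": (("A",), lambda a: a + 1), "C": (("B",), lambda b: 2 * b)},
    allowed_interventions={"B"},
)
```

Here do(B = 0) is honoured, while do(C = 1) is rejected because C is not in I, mirroring the role of the explicit intervention set in the definition.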
Actual causation and, only recently, causal abstractions and constrained causal models have received attention in the field of causality [Halpern, 2016, Zennaro et al., 2023, Beckers and Halpern, 2019, Blom et al., 2020] and might be beneficial for future work on consolidation. Apart from computational advantages, consolidation of SCMs presents itself as a method that enables researchers to break down complex structures and present aspects of causal systems in a broadly accessible manner. Without such tools, SCMs run the danger of being useful only to specialized experts.

Acknowledgments and Disclosure of Funding

The authors acknowledge the support of the German Science Foundation (DFG) project "Causality, Argumentation, and Machine Learning" (CAML2, KE 1686/3-2) of the SPP 1999 "Robust Argumentation Machines" (RATIO). The work was supported by the Hessian Ministry of Higher Education, Research, Science and the Arts (HMWK) via the DEPTH group CAUSE of the Hessian Center for AI (hessian.AI). This work was partly funded by the ICT-48 Network of AI Research Excellence Center "TAILOR" (EU Horizon 2020, GA No 952215) and by the Federal Ministry of Education and Research (BMBF; project "PlexPlain", FKZ 01IS19081). It benefited from the Hessian research priority programme LOEWE within the project "WhiteBox", the HMWK cluster project "The Third Wave of AI", and the Collaboration Lab "AI in Construction" (AICO) of the TU Darmstadt and HOCHTIEF. We thank the anonymous reviewers for their valuable feedback and constructive criticism, which significantly improved this paper. We sincerely appreciate their time and expertise in evaluating our research. We acknowledge the use of ChatGPT for restructuring some sentences while writing the paper.

References

Tara V Anand, Adèle H Ribeiro, Jin Tian, and Elias Bareinboim. Effect identification in cluster causal diagrams. arXiv preprint arXiv:2202.12263, 2022.

Sarah R Beck and Kevin J Riggs. Developing thoughts about what might have been.
Child Development Perspectives, 8(3):175-179, 2014.

Sander Beckers and Joseph Y Halpern. Abstracting causal models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2678-2685, 2019.

Sander Beckers, Frederick Eberhardt, and Joseph Y Halpern. Approximate causal abstractions. In Uncertainty in Artificial Intelligence, pages 606-615. PMLR, 2020.

Jeroen Berrevoets, Krzysztof Kacprzyk, Zhaozhi Qian, and Mihaela van der Schaar. Causal deep learning. arXiv preprint arXiv:2303.02186, 2023.

Tineke Blom, Stephan Bongers, and Joris M Mooij. Beyond structural causal models: Causal constraints models. In Uncertainty in Artificial Intelligence, pages 585-594. PMLR, 2020.

Stephan Bongers, Tineke Blom, and Joris M Mooij. Causal modeling of dynamical systems. arXiv preprint arXiv:1803.08784, 2018.

Stephan Bongers, Patrick Forré, Jonas Peters, and Joris M Mooij. Foundations of structural causal models with cycles and latent variables. The Annals of Statistics, 49(5):2885-2915, 2021.

Johann Brehmer, Pim De Haan, Phillip Lippe, and Taco S Cohen. Weakly supervised causal representation learning. Advances in Neural Information Processing Systems, 35:38319-38331, 2022.

Krzysztof Chalupka, Frederick Eberhardt, and Pietro Perona. Multi-level cause-effect systems. In Artificial Intelligence and Statistics, pages 361-369. PMLR, 2016.

Tobias Gerstenberg. What would have happened? Counterfactuals, hypotheticals and causal judgements. Philosophical Transactions of the Royal Society B, 377(1866):20210339, 2022.

Joseph Y Halpern. Axiomatizing causal reasoning. Journal of Artificial Intelligence Research, 12:317-337, 2000.

Joseph Y Halpern. Actual causality. MIT Press, 2016.

Joseph Y Halpern and Spencer Peters. Reasoning about causal models with infinitely many variables. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 5668-5675, 2022.

Mark Hopkins and Judea Pearl. Causality and counterfactuals in the situation calculus.
Journal of Logic and Computation, 17(5):939-953, 2007.

Andrei N Kolmogorov. On tables of random numbers. Sankhyā: The Indian Journal of Statistics, Series A, pages 369-376, 1963.

Judea Pearl. Causality. Cambridge University Press, 2009.

Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: Foundations and learning algorithms. The MIT Press, 2017.

Jonas Peters, Stefan Bauer, and Niklas Pfister. Causal models for dynamical systems. In Probabilistic and Causal Inference: The Works of Judea Pearl, pages 671-690. 2022.

Spencer Peters and Joseph Y Halpern. Causal modeling with infinitely many variables. arXiv preprint arXiv:2112.09171, 2021.

Marcel da Câmara Ribeiro-Dantas, Honghao Li, Vincent Cabeli, Louise Dupuis, Franck Simon, Liza Hettal, Anne-Sophie Hamy, and Hervé Isambert. Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients. arXiv preprint arXiv:2303.06423, 2023.

Paul K Rubenstein, Sebastian Weichwald, Stephan Bongers, Joris M Mooij, Dominik Janzing, Moritz Grosse-Wentrup, and Bernhard Schölkopf. Causal consistency of structural equation models. arXiv preprint arXiv:1707.00819, 2017.

Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612-634, 2021.

Peter Spirtes, Clark N Glymour, Richard Scheines, and David Heckerman. Causation, prediction, and search. MIT Press, 2000.

Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, and Caroline Uhler. Causal structure discovery between clusters of nodes induced by latent factors. In Conference on Causal Learning and Reasoning, pages 669-687. PMLR, 2022.

Fabio Massimo Zennaro, Máté Drávucz, Geanina Apachitei, W Dhammika Widanage, and Theodoros Damoulas. Jointly learning consistent causal abstractions over multiple interventional distributions.
arXiv preprint arXiv:2301.05893, 2023.

Liang Zhou, Kevin A Smith, Joshua B Tenenbaum, and Tobias Gerstenberg. Mental Jenga: A counterfactual simulation model of causal judgments about physical support. Journal of Experimental Psychology: General, 2023.

Alexander K Zvonkin and Leonid A Levin. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25(6):83, 1970.

Supplementary Material: Do Not Marginalize Mechanisms, Rather Consolidate!

A Evaluation of Partitioned SCM

A partitioned SCM M_A consists of several sub-SCM M_A that, in sum, cover all variables and structural equations of an initial SCM M. Thus, the evaluation of a partitioned SCM yields the same set of values v ∈ V as the original M. Similar to the evaluation of the structural equations in the initial M, the sub-SCM need to be evaluated in a specific order to guarantee that all required inputs u′ exist. As such, sub-SCM can be considered multivariate variables that establish another high-level DAG. The evaluation order is determined via the relation R_X as defined in Sec. 3.1 and depends on the graph partition A and the order of X imposed by the initial SCM.

Algorithm 2 Evaluation of partitioned SCM
1: procedure PartitionedSCMEval(M_A, u, I)
2:   x ← u                        ▷ x will gradually collect all values x ∈ X of M
3:   for A in sort(A, R_X) do     ▷ sort clusters by the strict partial order imposed by M
4:     M_A′ ← M_A′ ∈ M_A where A′ = A
5:     u′ ← {x_i ∈ x | X_i ∈ M_U′}
6:     I′ ← ψ_A(I)
7:     v ← M_A′^I′(u′)
8:     x ← x ∪ v
9:   end for
10:  v ← {x_i ∈ x | X_i ∈ M_V}    ▷ filter all u ∈ U to get v ∈ V
11:  return v
12: end procedure

Algorithm 2 shows the evaluation of a partitioned SCM, where M_A is the partitioned SCM we want to evaluate, u are the values of the exogenous variables of the initial model M, and I is the set of applied interventions. The outcomes of sub-SCM that are not related via R_X are invariant to the evaluation order among each other.
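The loop of Algorithm 2 can be sketched in Python; the data structures below are hypothetical stand-ins for M_A and R_X, and ψ_A is simplified to restricting the do-dictionary to the cluster's variables:

```python
from graphlib import TopologicalSorter

def partitioned_scm_eval(clusters, cluster_dag, equations, u, do=None):
    """clusters: cluster name -> its variables in local topological order;
    cluster_dag: cluster name -> set of predecessor clusters (high-level DAG);
    equations: variable -> (parents, function); u: exogenous values of M."""
    do = do or {}
    x = dict(u)                                  # x gradually collects all values
    for cluster in TopologicalSorter(cluster_dag).static_order():
        for v in clusters[cluster]:              # evaluate this cluster's sub-SCM
            parents, f = equations[v]
            x[v] = do[v] if v in do else f(*(x[p] for p in parents))
    return x
```

Any total order returned by `static_order()` that is consistent with the cluster DAG yields the same result, which mirrors the freedom sort(A, R_X) has in picking a valid total ordering.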
Even though R_X defines the ordering of the sub-SCM only up to a partial order, sort(A, R_X) can pick any total ordering that is consistent with R_X.

Proof 1 (Consistency of Partitioned SCM Evaluation) Every evaluation of M_A′ in step 7 computes all variables V_i ∈ A by evaluating the f_i of the original SCM, yielding the same values as the evaluation of A in M. Therefore P_{M_A′} = P_{M_A}. By Def. 4 every variable V ∈ V is contained within some sub-SCM M_A′. The evaluation of PartitionedSCMEval is complete, in the sense that all V = ⋃_{A∈A} A are evaluated, as the evaluation of all M_A′ ∈ M_A is guaranteed by iterating over all A in step 3. Finally, ⋃_{A∈A} P_{M_A′} = ⋃_{A∈A} P_{M_A} = P_{M_V}.

B Complexity reduction in function composition

The reduction of encoding length might vary depending on the type and structure of the equations under consideration. No compression of the structural equations is gained when the system of consolidated equations is already minimal. Compression of an equation to the identity function is showcased in the following.

B.1 Compression of chained inverses

Reduction to constant complexity for the unintervened system is reached in the case of f_B = f_A^(-1). Consider the equation chain X → A → B with A getting marginalized. Immediately f_B′ := f_B ∘ f_A = f_A^(-1) ∘ f_A = Id follows. Therefore, B := X, which is a single assignment of the value(s) of X into B. The remaining complexity within the consolidated function is then only due to conditional branching in the cases do(A = a), do(B = b) ∈ I.

B.2 Matrix composition is not sufficient for compressing equations

The operation of matrix multiplication, as a way of expressing the composition of linear functions, stays within the class of matrices. Matrix multiplication, therefore, serves as a possible candidate to be considered when consolidating equations and reducing the encoding length of a linear structural system.
When written down at a high level, matrices can be expressed in terms of single variables A, B ∈ R^(M×N) and matrix multiplication ·: R^(M×N) × R^(N×O) → R^(M×O). Assuming the equations f_Y := B·X and f_Z := A·Y, we can reduce the length of the composed equation f_Z := A·B·X by multiplying the matrices A and B together, f_Z = C·X with C = A·B. While we effectively reduced the number of high-level symbols written in the equation, we are hiding computational complexity in the structure of the matrix C. The following simple counterexample demonstrates a situation where the size, as well as the number of non-zero entries, even increases:

A = [0 1; 0 1; 0 1],   B = [0 0 0; 0 1 1],   C = A·B = [0 1 1; 0 1 1; 0 1 1]

Thus, pure matrix multiplication is not suitable to keep, or even minimize, the size of composed function representations.

B.3 Compression over Finite Discrete Domains

Consolidation may reduce the number of variables within a graph, but burdens the remaining equations with the complexity of the consolidated variables. Without the need to explicitly compute the values of consolidated variables, we might leverage cancellation effects to simplify equations, as outlined in the main paper. In terms of compression, no guarantees can be given in the general case. However, we will now show that the often considered case of chained maps between finite discrete domains simplifies, or at least preserves, complexity. The cardinality of the image of a deterministic function f : X → Y between two finite discrete sets X, Y is bounded by the cardinality of its domain: |Img(f)| ≤ |Dom(f)| = |X|, where Img(f) is the image and Dom(f) the domain of f. In particular, the strict inequality |Img(f)| < |Dom(f)| holds for all non-injective maps. Function composition may further reduce the effective domain Dom_effective(f) of a function, by only considering values in the image of the previous map as inputs to the next function.
In contrast to considering all possible values of X, as in the case of the non-composed map, the image of the previous function may only be a subset of X. Therefore, for f_2 ∘ f_1: |Img_effective(f_2)| ≤ |Dom_effective(f_2)| = |Img(f_1)| ≤ |Dom(f_1)|. In particular, the effective image of a composition chain f_n ∘ ... ∘ f_1 is bounded by the function with the smallest image: |Img_effective(f_n ∘ ... ∘ f_1)| ≤ min_i |Img(f_i)|. Thus, equation chains over finite discrete domains strictly preserve or reduce the effective size of the image, allowing for a possibly simpler combined representation in comparison to representing the functions individually.

C Reparameterization of non-deterministic structural equations

Consolidation of structural equations might lead to duplication of non-deterministic terms within consolidated systems, for example when consolidating fork structures (compare to Sec. 4.1). Without further precautions, different values might be sampled from the duplicated non-deterministic equations. An example where consolidating a variable B with a non-deterministic equation f_B (indicated by a squiggly line) leads to inconsistent behaviour is shown in Figure 6. In M1, C and D both copy the value of B; therefore, c = d always holds. M1′ shows the graph where B is consolidated out of M1. As a result, the non-deterministic equation f_B is duplicated into the equations of C and D, such that f_C := Bern(A) and f_D := Bern(A). Within the consolidated model M1′, different values might be sampled from the two noise terms Bern(A) in f_C and f_D. Consequently, c ≠ d might occur in M1′. To obtain behaviour consistent with the initial M1, we need to ensure agreement about the value of Bern(A) across all instances of the duplicated equation. To do so, we reparameterize M1 and explicitly store a fixed value, sampled from Bern(A), into a new exogenous variable R. The equation f_B is then reparameterized into a deterministic structural equation taking the variable R as an additional argument, resulting in M2.
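A minimal sketch of this reparameterization (the function names are ours; we realize Bern(a) via the standard uniform comparison r < a):

```python
import random

def m1_consolidated_naive(a):
    """Consolidating B duplicates the noise term Bern(A): the copies in f_C and
    f_D are sampled independently, so c and d may disagree."""
    c = random.random() < a      # f_C := Bern(A)
    d = random.random() < a      # f_D := Bern(A), drawn independently
    return c, d

def m2_consolidated(a, r):
    """After reparameterization the randomness is fixed in the exogenous R, so
    every duplicated instance of f_B yields the same value and c = d always holds."""
    c = r < a                    # f_C := deterministic in A and R
    d = r < a                    # f_D := the same deterministic equation
    return c, d
```

Sampling r once per evaluation of M2 restores the perfect correlation between C and D that naive consolidation destroys.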
When consolidating B within M2, all instances of f_B now yield the same value, as the noise term is fixed via R, and finally P_{M2′} = P_{M1}.

Figure 6: Reparameterization of non-deterministic models. (1) Non-Deterministic Model: yields C = D. (2) Inconsistent Behaviour: C ≠ D. (3) Reparameterization to Deterministic Equations. (4) Consistent Model Behaviour: C = D. The SCM M1 contains a non-deterministic equation B := Bern(A) (marked with a squiggly line). With C := B and D := B, M1 always yields C = D. Simply consolidating (or marginalizing) B creates a model M1′ with C := Bern(A) and D := Bern(A), such that possibly C ≠ D. Reparameterizing f_B by introducing an exogenous random variable R := U(0, 1) and B := A < R yields the SCM M2 with only deterministic equations. Consolidating (or marginalizing) B in M2 leads to M2′ where C := A < R and D := A < R, thus always C = D.

D Consolidation Examples

In this section we show further detailed applications of consolidation. Section D.1 presents the worked-out consolidation of the dominoes motivating example of the paper, with regard to the generalizing abilities of consolidated models. Section D.2 considers consolidation of the classical firing squad example. In contrast to the other examples, we focus on consolidating graphs with multiple edges in the causal graph. Lastly, we provide the causal graph and structural equations of the game agent policy discussed in the main paper in Section D.4.

D.1 Motivating Example: Dominoes

While we applied consolidation to particular SCMs in the main paper, we will now discuss the motivating example with a focus on obtaining representations that generalize over populations of SCM. We demonstrate this on the particular example of a row of dominoes, as a simple SCM with a highly homogeneous structure. Regardless of whether the SCM is obtained by using methods for the direct identification of causal graphs from image data, as presented by Brehmer et al.
[2022], or by abstracting a physical simulation using τ-abstractions [Beckers and Halpern, 2019], we assume to be provided with a binary representation of the domino stones. The state of every domino S_i indicates whether it is standing up or getting pushed over. In this case, the structural equations for all dominoes are the same: f_i := S_{i-1}. As a result, tipping over the first stone in a row will lead to all stones falling. Also, we are only interested in the final outcome of the chain, that is, whether the last stone will fall or not (E = {S_n}). Again, we use consolidation to collapse the structural equations in the unintervened case: S_n := f_n ∘ ... ∘ f_1 = S_1. We consider a single active allowed intervention of holding up any of the dominoes or tipping it over, I = {do(S_i = 0), do(S_i = 1)}. Upon evaluation, the unconsolidated model needs to check for every domino whether it is being intervened upon or not, requiring n conditional branches. Using the fact that perfect interventions overwrite the variable state for the following dominoes, we introduce a first-order quantifier that handles all interventions in a unified way. Finally, by combining the formulas of the intervened and unintervened case, we find the following simple equation:

S_n := x_i   if ∃i. do(S_i = x_i) ∈ I
       S_1   else

The resulting equation no longer has a notion of the actual number of dominoes and, in fact, is invariant to it. We realise that introducing the first-order for-all and exists quantifiers allows for a unified representation of arbitrary chains of dominoes. Similar observations are discussed in Peters and Halpern [2021] and Halpern and Peters [2022], which introduce generalized SEM (GSEM). As the intermediate equations are no longer computed explicitly, the structural equations of consolidated models for different row lengths only differ in the set of allowed interventions I. That is, for a row of three domino stones I = {do(V_1 = v_1), do(V_2 = v_2), do(V_3 = v_3)}, while for four stones the additional do(V_4 = v_4) is defined.
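The length-invariance of the consolidated equation can be checked with a small sketch (our own illustrative code; `s_n_stepwise` plays the role of the explicit chain with its n conditional branches):

```python
def s_n_consolidated(s1, do=None):
    """Consolidated domino equation: the intervention do(S_i = x_i) overwrites
    everything upstream of it, independently of the row length n."""
    do = do or {}
    return do[max(do)] if do else s1

def s_n_stepwise(s1, n, do=None):
    """Reference evaluation of f_i := S_{i-1} with one conditional check per stone."""
    do = do or {}
    s = do.get(1, s1)
    for i in range(2, n + 1):
        s = do.get(i, s)
    return s
```

For any row length n and any single intervention, both evaluations agree, while the consolidated form performs no per-stone work at all.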
As set out in the introduction of this paper, we consider consolidation as a tool for obtaining more interpretable SCM. Towards this end, consolidation might help us in detecting similar structures within an SCM. Doing so eases the understanding of causal systems, as the user only has to understand the general mechanisms of a particular SCM once and is then able to apply the gained knowledge to all newly appearing SCM of the same type.

f_B(A) := 0 if A ≤ 5; 1 if A > 5
f_C(B) := true if B = 0; false if 0 < B ≤ 10; true otherwise
f_E(A) := A % 5 = 0
f_F(A) := A % 10 = 0
f_G(E, F) := E ∧ F
f_D(C) := C
f_H(C, G) := C ∧ G

Figure 7: Example SCM for the Application of CONSOLIDATE. The figure shows a toy SCM for demonstrating the application of the CONSOLIDATE algorithm (endogenous variables are B, C, D, E, F, G, H with only one exogenous variable A; the subscript on f denotes the variable to be determined, e.g., B ← f_B(A)).

D.2 Firing Squad Example

While the dominoes and tool wear examples were mainly concerned with the consolidation of sequential structures, we want to briefly demonstrate the consolidation of structural equations that are arranged in a parallel fashion. We consider a variation of the well-known firing squad example [Hopkins and Pearl, 2007] with a variable number N of riflemen. A captain (C) gives orders to the riflemen (R_i, i ∈ {1 . . . N}), which shoot accurately, and the prisoner (P) dies. For the sequential stacking of equations we found that interventions exert an overwriting effect.
That is, every intervention fixes the value of a variable, making the unfolding of the following equations independent of all previous computations. To yield a similar effect for parallel equations, we need to block all paths between the cause and the effect. In this scenario, this can easily be expressed by using an all-quantifier. When consolidating the SCM, we consider only the captain C and the prisoner P, E = {C, P}, while allowing for any combination of interventions that prevent the riflemen from shooting, I = 𝒫({do(R_i = 0)}_{i∈{1...N}}). After consolidation, we obtain the following equation:

P := lives   if C = 0 ∨ (∀R_i. do(R_i = 0) ∈ I)
     dies    else

As with the dominoes example, we are again in a situation where the consolidated equation intuitively summarizes the effects of the individual equations: "The prisoner lives if the captain does not give orders, or if all riflemen are prevented from shooting."

D.3 Step-by-step CONSOLIDATE Application

In this section we provide a step-by-step application of the CONSOLIDATE algorithm given in Algorithm 1. Consider the SCM shown in Figure 7 with its structural equations and resulting graph. The endogenous variables are B, C, D, E, F, G, H with only one exogenous variable A. The structural equations are highlighted on the right-hand side. Note that the subscript on f_X denotes the variable to be determined, e.g., B ← f_B(A). In a first step, the algorithm's user has to decide on a suitable partition.
Consider for instance the following partition (indicated by dashed lines in the figure) and the following allowed intervention and consolidation sets:

A = {{E, F, G}, {B, C}, {D, H}}
I = {{do(D = true)}, {do(D = false)}, {do(G = false)}}
E = {C, F, H}

The following example presents a step-by-step application of the CONSOLIDATE algorithm for the cluster A_1 = {E, F, G}:

Step 3: E_1 ← {E, F, G} ∩ {C, F, H} = {F}
Step 4: E′_1 ← {F} ∪ (pa(V \ {E, F, G}) ∩ {E, F, G}) = {F} ∪ ({A, B, C, G} ∩ {E, F, G}) = {F, G}
Step 5: U_{A_1} ← pa({E, F, G}) \ {E, F, G} = {A, E, F} \ {E, F, G} = {A}
Step 6: I_{A_1} ← {{do(X_i = v) ∈ I : X_i ∈ {E, F, G}} : I ∈ I} = {{do(G = false)}}
Step 7: ρ_{E′_1} ← {f_E(A) := A mod 5 = 0; f_F(A) := A mod 10 = 0; f_G(E, F) := E ∧ F}
Step 8: ρ′_{E′_1} ← argmin K(ρ_{E′_1}) = {ρ_F(A) := A mod 10 = 0; ρ_G(F, I_{A_1}) := F ∧ (do(G = false) ∉ I_{A_1})}
Step 9: M_{A_1,E} ← ({F, G}, {F}, ρ′_{E′_1}, {{do(G = false)}}, P_A)

Note how computing f_E is no longer required. In a similar fashion, the equations in A_2 resemble a chain that can be composed: f_C ∘ f_B (previously called "stacked"; cf. Sec. 4.1). Since |Img(f_B)| = 2, at least one of the three conditions of f_C (f_C is a 3-case function) will be discarded, eventually yielding ρ′_{E′_2} ← {ρ_C(A) := A ≤ 5}. As D is not in E and not required by any other sub-SCM, it can be marginalized. A_3 then reduces to ρ′_{E′_3} ← {ρ_H(C, G) := C ∧ G}.

D.4 Revealing Agent Policy: Causal Graph and Equations

In this section we explicitly list the structural equations representing the observed interactions between a platformer environment and a possible rule-based agent. The resulting causal graph is shown in Fig. 8 at the end of the appendix.
Except for the parentless variables coin_reward, powerup_reward, enemy_reward, flag_reward, player_position, position_coin, position_powerup, position_enemy, position_flag and target_flag, which are exogenous and determined by the environment, all variables are considered endogenous:

player_position, position_coin, position_powerup, position_enemy, position_flag ∈ [0..1]^2
coin_reward := 3; powerup_reward := 1; enemy_reward := 9; flag_reward := 2

With X in {coin, powerup, enemy, flag}:
distance_X := ‖position_X − player_position‖_2
near_X := distance_X < 3.0
targeting_cost_X := 1.0 + 0.5 · distance_X

target_coin := targeting_cost_coin < coin_reward
target_powerup := targeting_cost_powerup < powerup_reward
target_enemy := targeting_cost_enemy < enemy_reward ∧ powered_up
target_flag := True
powered_up := target_powerup

towards_coin := target_coin ∧ coin_reward > max({X_reward | target_X}, X ∈ {powerup, enemy, flag})
towards_powerup := target_powerup ∧ powerup_reward > max({X_reward | target_X}, X ∈ {coin, enemy, flag})
towards_enemy := target_enemy ∧ enemy_reward > max({X_reward | target_X}, X ∈ {coin, powerup, flag})
towards_flag := target_flag ∧ flag_reward > max({X_reward | target_X}, X ∈ {coin, powerup, enemy})

jump := near_enemy ∧ ¬powered_up

planning_sequence_i :=
  finished  if towards_flag ∧ (flag ∈ ⋃_{j=1}^{i−1} planning_sequence_j)
  coin      if towards_coin ∧ (coin ∉ ⋃_{j=1}^{i−1} planning_sequence_j)
  powerup   if towards_powerup ∧ (powerup ∉ ⋃_{j=1}^{i−1} planning_sequence_j)
  enemy     if towards_enemy ∧ (enemy ∉ ⋃_{j=1}^{i−1} planning_sequence_j)
  flag      if towards_flag ∧ (flag ∉ ⋃_{j=1}^{i−1} planning_sequence_j)
  finished  else

score := 20 − time_taken
  + coin_reward if coin ∈ ⋃_i planning_sequence_i
  + powerup_reward if powerup ∈ ⋃_i planning_sequence_i
  + enemy_reward if enemy ∈ ⋃_i planning_sequence_i ∧ powerup ∈ ⋃_i planning_sequence_i
  + flag_reward if flag ∈ ⋃_i planning_sequence_i

E Mathematical symbols and notation

The following table contains mathematical functions and notation used throughout the paper.

Notation          Meaning
X; X              A (set of) variable(s).
x; x              Value(s) of X; X.
X_i               The i-th variable of X.
X_S               The subset {X_i : i ∈ S} of X.
P_X               A probability distribution over variables X.
x ∼ P_X           A value x sampled from a distribution over X.
𝒫(·)              The power set.
f ∘ g             Function composition, (f ∘ g)(x) = f(g(x)).
∏_{X_i∈X} 𝒳_i     N-ary Cartesian product over the domains of X.
‖·‖_2             l2 vector norm.
U(a, b)           Uniform distribution.
N(µ, σ²)          Normal distribution.
Bern(p)           Bernoulli distribution; takes value 1 with probability p and 0 otherwise.
P_M               Probability distribution over the SCM M.
P^I_M             Probability distribution over the SCM M under intervention I.
V_i               An endogenous variable of an SCM M.
U_i               An exogenous variable of an SCM M.
f_i               Structural equation of the variable X_i.

Figure 8: Causal graph of an agent policy. The causal graph of a greedy agent inside a platformer environment, containing the distance_X, near_X, targeting_cost_X, target_X, towards_X and planning_sequence_i variables. The parentless variables (player_position, the position_X and the X_reward variables) are exogenous; their value is determined via the game environment. The final score variable is left out for clarity.