# Emergence-Inspired Multi-Granularity Causal Learning

Hanwen Luo1,2, Guoxian Yu1,2,*, Jun Wang2, Yanyu Xu1, Yongqing Zheng2,3, Qingzhong Li2,3
1School of Software, Shandong University, Jinan, China
2SDU-NTU Joint Centre for AI Research, Shandong University, Jinan, China
3Dareway Software Co., Ltd., Jinan, China
hwluo@mail.sdu.edu.cn, {gxyu, kingjun, xu yanyu, lqz}@sdu.edu.cn, zhengyongqing@dareway.com.cn

Abstract

Existing causal learning algorithms focus on micro-level causal discovery, confronting significant challenges in identifying the influence of macro systems, composed of micro-level variables, on other variables. This difficulty arises because the causal relationships in macro systems are often mediated through micro-level causal interactions, which, when dispersed, can lead to erroneous or omitted causal discoveries. To address this issue, we propose the Emergence-inspired Multi-granularity Causal learning (EMCausal) method. Inspired by the emergent phenomena that arise when micro-level variables aggregate into macro-level representations, EMCausal introduces a progressive mapping encoder to simulate this process, thus capturing the causal relationships driven by these macro entities. Next, it introduces a causal consistency constraint to collaboratively reconstruct micro variables using macro-level representations, enabling the learning of a multi-granular causal structure. Experimental results on both synthetic and real datasets demonstrate that EMCausal can identify causal graphs under the influence of causal emergence, outperforming competitive baselines in terms of accuracy and robustness.

Introduction

Causal discovery is integral to scientific research, as it focuses on uncovering the underlying mechanisms that drive observed phenomena.
In complex systems, distinguishing causal relationships from mere correlations is essential for the effective design of interventions, policy development, and the formulation of predictive models that can adapt to dynamic environments (Cui and Athey 2022; Cai et al. 2024). Early causal discovery algorithms primarily rely on conditional independence tests (Spirtes et al. 2000; Colombo, Maathuis et al. 2014), which require large sample sizes. Alternatively, another line of work searches for the causal graph with the highest score using various search strategies (Chickering 2002b; Yang et al. 2023). Since Zheng et al. (2018) proposed the continuous-optimization acyclicity constraint, researchers have further extended continuous optimization for causal learning, including nonlinear extensions (Zheng et al. 2020; Yu et al. 2019), optimization techniques (Lee et al. 2020), and robustness enhancements (He et al. 2021). These studies aim to improve the performance of causal learning algorithms when handling complex datasets, and to increase their robustness and adaptability to noise.

*Corresponding author: Guoxian Yu. Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

The exploration of multi-granularity data (Liang et al. 2024b) has emerged as a current research focus, presenting new challenges for the application of causal learning methods in complex systems. While micro structures capture the direct causality between micro variables, we define macro structures as the collective influence of certain micro variables on other micro ones. At the micro level, it is possible to capture more detailed and comprehensive information. However, when the number of micro-level variables is large and the structure becomes overly complex, this complexity may lead to inaccuracies or omissions in causal discovery.
In contrast, observing from a macro perspective allows certain causal relationships to manifest with stronger causal effects, thereby facilitating their identification. Contemporary causal learning algorithms predominantly focus on micro-level causal discovery, neglecting the potential insights that can be gained from learning causal structures at the macro level. Emergence phenomena manifest as overall behaviors that cannot be directly inferred from the behavior of individual parts, exhibiting entirely new properties not present in the individuals (Tononi and Koch 2015; Epstein and Axtell 1996; Wei et al. 2022). This reveals that the collective influence of micro-level groups on other variables cannot simply be decomposed into the individual effects of each variable within the group. Inspired by the emergence phenomenon, we develop a multi-granularity causal discovery approach, EMCausal. Our method first aggregates micro-level variables into macro-level representations using a Progressive Mapping Encoder (PME), motivated by the concept of supervenience (Horgan 1993), wherein the complete observation of the micro system determines the state of the macro system. It then utilizes both the micro variables and the macro representations to reconstruct the micro variables, simulating the process of causal emergence to extract a multi-granularity causal structure. Additionally, EMCausal imposes a micro-macro consistency constraint to ensure coherence in the multi-granularity causal structure. Upon completion of training, these multi-granularity causal structures can be derived directly from the model parameters. The code of EMCausal is shared at http://www.sdu-idea.cn/codes.php?name=EMCausal.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

Our main contributions can be outlined as follows: (i) We propose EMCausal, a multi-granularity causal discovery approach inspired by the emergence phenomenon.
It introduces a Progressive Mapping Encoder to aggregate micro-level variables into macro-level representations and reconstructs micro variables from these representations by simulating causal emergence. (ii) EMCausal defines a micro-macro consistency constraint to ensure that the inferred causal structures at the macro level align with those at the micro level, thereby preserving the coherence of the multi-granularity causal structure. (iii) Experimental results on synthetic multi-granularity datasets and real datasets demonstrate that EMCausal outperforms competitive baselines (Liang et al. 2024b; Yu et al. 2019; Ng et al. 2019; Chickering 2002b; Spirtes et al. 2000).

Related Work

Causal discovery infers causal structures from data, helping to identify the causal direction between variables. Existing causal discovery methods can be classified into constraint-based (Colombo, Maathuis et al. 2014; Vandenbroucke, Broadbent, and Pearce 2016; Peters et al. 2014; Spirtes, Meek, and Richardson 2013; Zhang, Zhou, and Guan 2018) and score-based (Chen et al. 2024; Ng et al. 2022; Shimizu et al. 2006; Zhang et al. 2023; Liang et al. 2023; Chickering 2002a) algorithms. The former infer causal structures by leveraging conditional independence information within the data. Nevertheless, conditional independence testing is highly demanding in terms of sample size and is computationally expensive, limiting its applicability to large-scale datasets. The latter category (Chickering 2002b; Zheng et al. 2020; Yu et al. 2019; Lu and Gao 2023; Wang et al. 2022; Liang et al. 2024a; Yang et al. 2024), on the other hand, defines a scoring function to evaluate the suitability of a given causal graph and employs search algorithms to seek the graph with the highest score. However, existing algorithms often overlook the impact that macro causal effects, diluted at the micro level, have on the discovery of causal graphs. This oversight can lead to erroneous causal inference in complex systems.
Our innovation lies in addressing this gap, enabling more accurate identification of causal relationships across different granularities and improving the robustness of causal discovery in intricate environments. Integrating the study of emergence with causal analysis makes it possible to further dissect the relationships in complex systems, helping to explain overall system behavior. Some studies consider emergence phenomena with stronger causal effects, but do not discuss how to uncover these causal relationships (Klein et al. 2022; Breakspear 2017; Rubenstein et al. 2017). They utilize metrics like Effective Information (EI) to design strategies for reducing system uncertainty and for exploring the appropriate granularity at which to observe the system. In contrast to these studies, MgCSL (Liang et al. 2024b) reconstructs the micro variables to learn appropriate representations of macro variables, using a path-product to uncover the causal relations among micro and macro variables. Compared with the MgCSL path-product calculation, EMCausal trains a PME with an attention matrix to mask the macro-micro causal structure, enhancing the certainty of the identified macro-micro causal relationships. In addition, EMCausal incorporates a micro-macro consistency constraint to account for the interactions between the learned micro and macro structures and to avoid structural conflicts.

Preliminaries

This section introduces the fundamental concepts relevant to this work. Causal Graph: A causal graph is a DAG (Directed Acyclic Graph) used to represent causal relationships between variables. In a causal graph G = (V, E), the set V represents variables, and the edge set E represents causal relationships between them. A directed edge X → Y indicates that variable X is a direct cause of variable Y. Data Generation Process: The data generation process generates observational data based on the structure and relationships in the causal graph.
Suppose we have a causal graph G with variables V = {V_1, V_2, ..., V_d}. To avoid confusion between nodes and variables, we will consistently use X_i. The data generation process can be expressed as:

X_i = f_i(Pa(X_i)) + e_i   (1)

where Pa(X_i) denotes the parent set of X_i (i.e., its direct causes), f_i is a specific functional form, and e_i is the noise term. The noise terms affecting different variables are mutually independent.

Methodology

The multi-granularity causal structure can be described by G_M = (V_M, E_M). Unlike the causal graph of the micro system, V_M = {X_1, ..., X_d, Y_1, ..., Y_q} incorporates variables of varying granularity, allowing G_M to simultaneously reflect causal relationships at both the micro and macro levels. Given the observed variables X = (X_1, X_2, ..., X_d) with X ∈ R^{n×d}, our goal is to discover the multi-granularity causal structure G_M, which not only clarifies the causal relationships within the original micro-level system but also uncovers potential macro variables composed of micro variables.

Model Structure Overview

To obtain reliable multi-granularity causal structures from observational data X, we design the PME to aggregate micro variables toward macro variables and achieve micro-macro relationship discovery. Next, we perform cross-layer collaborative causal learning to learn multi-granularity causal structures by reconstructing the micro variables. Our framework is illustrated in Figure 1.

Figure 1: The conceptual framework of EMCausal. Given the micro-level input X, the PME aggregates micro-level variables into macro-level representations and outputs the encoded results E. As M encodes the relations from micro variables to macro representations, EMCausal uses the Gumbel-Softmax operation gs(M) to perform the Hadamard product with E, and then sums the results to obtain the macro representation Y. Under the micro-macro consistency constraint H(C) = 0, EMCausal reconstructs the original micro-level variables X based on X and Y with an MLP parameterized by W_ma and W_mi, where W_ma encodes the relationship between macro representations and micro variables, W_mi encodes the relationships among micro variables, and C = W_mi + gs(M)^T W_ma stores the reachable edges in the causal graph.

Progressive Mapping Encoder

Emergence phenomena manifest as overall behaviors that cannot be directly inferred from the behavior of individual parts, and should not be overlooked, as doing so may lead to erroneous causal discoveries. To address this gap, we introduce abstract macro variables via the PME to discover the collective influence of variable groups. A robust macro-level representation is therefore necessary, relying not only on the correspondence between micro and macro variables but also on a thorough understanding of the macro-variable generation process. The PME integrates an attention mechanism to capture the relationship between micro variables and abstract macro variables, and iteratively adjusts the model parameters during training, ultimately deriving a definitive cross-granularity variable correspondence and a model for generating macro-level representations. The PME comprises a stacked encoder coupled with a reweighting procedure, which can be considered an attention mechanism. Unlike MgCSL (Liang et al. 2024b), which directly obtains the macro-level variable representations via a reconstruction task, EMCausal solely uses the encoder part to generate intermediate encoding states. Suppose the numbers of micro and macro variables are d and q, respectively. Given the input X, we first utilize each micro-level variable to predict the various macro-level variables.
E^i ∈ R^{n×q} denotes the encoding result obtained from the i-th micro-level variable:

E^i = MLP(X_i; θ^i_enc)   (2)

where θ^i_enc is the parameter of the i-th encoder in the stacked encoder. To learn the correspondence between micro-level and macro-level variables while aggregating the micro-level variables, we then employ the Gumbel-Softmax (Jang, Gu, and Poole 2017), a technique that draws samples from a categorical distribution, to reweight all predictions. Denote the internal model parameters M ∈ R^{d×q}; then we obtain the representation of the macro variables Y ∈ R^{n×q} as follows:

Y = Σ_i gs(M_i) ⊙ E^i   (3)

where ⊙ denotes the Hadamard product, M_i is the i-th row of M, and gs(M_i) is the outcome of the Gumbel-Softmax. This operation facilitates sampling while preserving differentiability through the Softmax function, thereby enabling gradient-based optimization of discrete variables. For micro variable i, M_i = [M_{i,1}, M_{i,2}, ..., M_{i,q}], where M_{i,j} corresponds to the probability of the i-th micro variable generating the j-th macro variable, and ε_j is the Gumbel noise for each variable:

ε_j = −log(−log(M_{i,j}))   (4)

Next, we calculate the Gumbel-Softmax probability p_{i,j} for associating the i-th micro variable with the j-th macro one:

p_{i,j} = exp((log(M_{i,j}) + ε_j)/τ) / Σ_{j'} exp((log(M_{i,j'}) + ε_{j'})/τ)   (5)

where τ is the temperature parameter that controls the smoothness of the Softmax function. By continuously training the parameters of the PME, we can optimize them to approximate a binary matrix, facilitating the establishment of a threshold to identify the correspondence between micro-level and macro-level variables. With the PME, macro state representations can be derived from the input and expressed mathematically as:

Y = PME(X)   (6)

The following section details how the PME is integrated into subsequent processes and jointly trained within the reconstruction task.
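The aggregation in Eqs. (3)-(5) can be sketched in a few lines of NumPy. The sketch below treats each row of M directly as logits, draws standard Gumbel noise, and sums the reweighted per-variable predictions; the function names, array shapes, and seeding are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gumbel_softmax(logits, tau=0.2, rng=None):
    # Draw Gumbel noise eps = -log(-log(u)), u ~ Uniform(0, 1), and apply
    # a temperature-controlled softmax over the macro-variable axis (Eq. 5).
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    z = (logits + g) / tau
    z -= z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate(E, M, tau=0.2, rng=None):
    # E: (d, n, q) per-micro-variable predictions of the q macro variables.
    # M: (d, q) learnable micro-to-macro correspondence scores.
    # Y = sum_i gs(M_i) ⊙ E^i  (Eq. 3): reweight each micro variable's
    # prediction by its (near one-hot) macro assignment, then sum over i.
    W = gumbel_softmax(M, tau=tau, rng=rng)  # (d, q)
    return np.einsum('iq,inq->nq', W, E)     # (n, q) macro representation Y

rng = np.random.default_rng(0)
d, n, q = 6, 100, 2
E = rng.normal(size=(d, n, q))
M = rng.normal(size=(d, q))
Y = aggregate(E, M, rng=rng)
print(Y.shape)  # (100, 2)
```

Because gs(M_i) approaches a one-hot vector at low temperature, each micro variable effectively contributes to a single macro representation, which is what makes thresholding M meaningful after training.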
Cross-Layer Collaborative Causal Learning

To jointly learn micro and macro structures, we introduce a collaborative learning strategy that enables the simultaneous identification of causal relationships within both the micro and macro systems. To avoid structural conflicts between the micro and macro structures, we develop a micro-macro consistency constraint mechanism to prevent erroneous causal structures. Suppose W_mi ∈ R^{d×d} is the weight matrix connecting micro variables, and W_ma ∈ R^{q×d} connects macro variables to micro ones. Different from NOTEARS-MLP (Zheng et al. 2020) and its variants, which only tune W_mi at the micro level, we use micro variables and macro representations to jointly reconstruct the original variables as follows:

X̂ = MLP(XW_mi ∥ PME(X)W_ma; θ_mlp)   (7)

where θ_mlp is the parameter of the reconstruction model and ∥ is the concatenation operator. Then we define the reconstruction optimization target as:

L_re = J(X, X̂)   (8)

where J(X, X̂) = ||X − X̂||²₂ is the square loss. We omit the regularization term to simplify the presentation. To prevent the macro-variable representations from depending solely on the model's initial parameters, which would bias the results, we first adjust the parameters of W_mi, and then learn the causal relationships at the macro level after the micro-level causal structure converges. In the first optimization stage of EMCausal, our optimization objective is:

min L_re  s.t.  H(W_mi) = 0   (9)

where H(W) is the acyclicity constraint:

H(W) = tr(e^{W⊙W}) − d = 0   (10)

After learning the micro-level causal relations, EMCausal moves to the second optimization stage to learn the causal relations at the macro level.
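The acyclicity measure H(W) = tr(e^{W⊙W}) − d in Eq. (10) is straightforward to evaluate numerically; a minimal NumPy sketch follows (the truncated power series for the matrix exponential is our illustrative shortcut, not the paper's numerical scheme):

```python
import numpy as np

def acyclicity(W, terms=20):
    # H(W) = tr(exp(W ∘ W)) - d  (Eq. 10): zero iff the weighted graph W
    # is acyclic. The matrix exponential is evaluated by its power
    # series, truncated at `terms`.
    A = W * W                        # Hadamard product keeps entries >= 0
    d = A.shape[0]
    P, trace = np.eye(d), float(d)   # k = 0 term contributes tr(I) = d
    for k in range(1, terms):
        P = P @ A / k                # P accumulates A^k / k!
        trace += np.trace(P)
    return trace - d

# A DAG (strictly upper-triangular weights) satisfies H(W) = 0 ...
W_dag = np.triu(np.ones((3, 3)), k=1)
print(acyclicity(W_dag))   # ~0.0
# ... while a 2-cycle does not.
W_cyc = np.array([[0., 1.], [1., 0.]])
print(acyclicity(W_cyc))   # > 0
```

Intuitively, tr(A^k) counts weighted closed walks of length k, so any directed cycle makes some power-series term, and hence H(W), strictly positive.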
To ensure that the causal relationships at the macro level are effectively guided by those at the micro level during optimization, we impose a consistency constraint on macro-level causal discovery, ensuring consistency between the micro-level causal structures and the underlying macro-level causality:

C = W_mi + gs(M)^T W_ma   (11)

where C ∈ R^{d×d} stores the reachable edges in the causal graph. The optimization objective for this stage is:

min L_re  s.t.  H(C) = tr(e^{C⊙C}) − d = 0   (12)

where the regularization term is again omitted to simplify the presentation. During this phase, we freeze the parameters of W_mi, but optimize M (induced by the PME) and W_ma to ensure that the causal relationships at the macro level do not contradict those at the micro level. Algorithm 1 outlines the main procedure of EMCausal. Lines 2-6 learn the micro-level causal relations between the d variables, while lines 7-11 optimize the causal relations at the macro level.

Algorithm 1: EMCausal: Emergence-inspired Multi-granularity Causal Learning
Input: observed data X and threshold δ
Output: multi-granularity causal graph G_M
1: Initialize the set of model parameters {M, W_ma, θ_enc, W_mi, θ_mlp}
2: // Learning on the micro-level graph
3: while not reaching maximal iterations or triggering termination conditions do
4:    Fix M, W_ma, θ_enc, and calculate L by Eq. (9)
5:    Update W_mi, θ_mlp to minimize L in Eq. (9)
6: end while
7: // Learning on the macro-level graph
8: while not reaching maximal iterations or triggering termination conditions do
9:    Fix W_mi, and calculate L by Eq. (12)
10:   Update W_ma, θ_mlp, θ_enc, M to minimize L in Eq. (12)
11: end while
12: Update C by Eq. (11)
13: G_M(i, j) = I(C_{i,j} > δ)   // prune low-weight edges
14: return G_M

Optimization

Both phases of our optimization (Eq. (9) and Eq. (12)) can be handled by an augmented Lagrangian method:

min L = J(X, X̂) + (µ/2) H²(W) + γ H(W)   (13)

where µ is the penalty coefficient and γ is the Lagrange multiplier.
The constrained problem in Eq. (13) can be converted into a sequence of unconstrained subproblems, which are solved by a gradient-based optimization strategy as follows:

θ^(κ+1) ← argmin_θ { L = J(X, X̂) + (µ/2) H²(W) + γ H(W) }   (14)

µ^(κ+1) ← ηµ^(κ) if H(W^(κ+1)) > ρH(W^(κ)), otherwise µ^(κ)   (15)

γ^(κ+1) ← γ^(κ) + µH(W)   (16)

where θ ∈ {θ_mlp, W_mi} for the first stage, θ ∈ {M, θ_mlp, θ_enc, W_ma} for the second stage, and θ^(κ) is the iteratively optimized θ in the κ-th iteration. In this work, we apply the L-BFGS-B algorithm (Zhu et al. 1997) to solve the optimization problem. Although EMCausal incorporates several procedures, it introduces an acceptable computational complexity. The computational complexity of one iteration step of EMCausal is O(ndmq + nd(d + q)k + d²(d + q)), where m and k are the hidden-layer dimensions of the PME and the reconstruction model, respectively. As the computational complexity of one step of NOTEARS-MLP is O(nd²k + d²k + d³), our method introduces extra complexity through the PME and the q macro variables, which are indispensable. As long as the number of macro variables q is of the same magnitude as the number of micro variables d, EMCausal maintains an acceptable computational complexity.
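The update rules in Eqs. (14)-(16) amount to a standard augmented Lagrangian loop. The sketch below is schematic: the toy inner solver stands in for the L-BFGS-B subproblem, and the numeric values of η and ρ are our illustrative choices:

```python
import numpy as np

def h(W):
    # Acyclicity measure H(W) = tr(exp(W∘W)) - d, via truncated series.
    A = W * W
    P, tr = np.eye(W.shape[0]), 0.0
    for k in range(1, 15):
        P = P @ A / k
        tr += np.trace(P)
    return tr

def augmented_lagrangian(solve_inner, W0, mu=1.0, gamma=0.0,
                         eta=10.0, rho=0.25, outer=10):
    # Eqs. (14)-(16): alternately minimise L = J + (mu/2) H^2 + gamma H,
    # inflate mu when H shrinks too slowly, and take a dual-ascent step
    # on gamma. `solve_inner` stands in for the L-BFGS-B subproblem.
    W, h_prev = W0, h(W0)
    for _ in range(outer):
        W = solve_inner(W, mu, gamma)        # Eq. (14)
        h_cur = h(W)
        if h_cur > rho * h_prev:             # Eq. (15)
            mu *= eta
        gamma += mu * h_cur                  # Eq. (16)
        h_prev = h_cur
    return W, mu, gamma

# Toy inner step: damp the cycle-inducing (lower-triangular) weights,
# a stand-in for the real gradient-based subproblem solver.
def toy_inner(W, mu, gamma):
    return W - 0.5 * np.tril(W, k=-1)

W0 = np.array([[0., 1.0], [0.8, 0.]])
W, mu, gamma = augmented_lagrangian(toy_inner, W0)
print(h(W) < h(W0))  # True: the cycle weight has been suppressed
```

The same outer loop serves both of EMCausal's stages; only the set of parameters exposed to the inner solver changes between them.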
| d | Method | ER Precision (%) ↑ | ER Recall (%) ↑ | ER F1 (%) ↑ | ER SHD ↓ | SF Precision (%) ↑ | SF Recall (%) ↑ | SF F1 (%) ↑ | SF SHD ↓ |
|---|---|---|---|---|---|---|---|---|---|
| 20 | PC | 50.89 ± 6.61 | 36.04 ± 9.54 | 41.91 ± 8.47 | 26.77 ± 4.47 | 53.63 ± 6.93 | 45.00 ± 10.29 | 48.70 ± 8.57 | 23.80 ± 3.16 |
| 20 | GES | 45.06 ± 5.81 | 38.70 ± 7.37 | 41.38 ± 6.11 | 31.40 ± 4.55 | 46.61 ± 10.88 | 43.21 ± 8.80 | 44.51 ± 9.05 | 27.67 ± 5.41 |
| 20 | DAG-GNN | 61.41 ± 9.14 | 39.56 ± 11.17 | 47.03 ± 9.56 | 37.50 ± 6.22 | **92.92 ± 7.39** | 6.46 ± 1.84 | 12.01 ± 3.18 | 162.40 ± 8.10 |
| 20 | GAE | 73.56 ± 30.06 | 13.04 ± 9.11 | 21.39 ± 12.95 | 41.60 ± 4.88 | 80.33 ± 17.06 | 24.64 ± 15.73 | 34.47 ± 19.27 | 34.67 ± 7.23 |
| 20 | GraN-DAG | **95.83 ± 9.00** | 15.00 ± 6.10 | 25.59 ± 9.21 | 39.10 ± 2.81 | 91.33 ± 9.19 | 15.56 ± 2.98 | 26.54 ± 4.67 | 30.90 ± 1.52 |
| 20 | MgCSL | 33.42 ± 6.49 | 60.87 ± 9.11 | 42.96 ± 7.01 | 49.00 ± 11.49 | 28.40 ± 7.68 | 52.50 ± 18.35 | 36.25 ± 9.78 | 45.60 ± 8.39 |
| 20 | EMCausal | 85.70 ± 13.79 | **71.30 ± 11.40** | **76.69 ± 8.82** | **18.50 ± 7.01** | 78.40 ± 13.77 | **71.10 ± 12.22** | **73.46 ± 8.21** | **17.70 ± 5.77** |
| 40 | PC | 44.55 ± 5.83 | 19.82 ± 3.24 | 27.41 ± 4.13 | 140.75 ± 11.30 | 43.30 ± 7.16 | 19.59 ± 3.31 | 26.96 ± 4.50 | 138.36 ± 7.05 |
| 40 | GES | 43.58 ± 6.32 | 29.87 ± 4.36 | 35.41 ± 4.99 | 160.00 ± 12.68 | 45.89 ± 10.32 | 36.94 ± 6.75 | 40.88 ± 8.14 | 109.70 ± 14.80 |
| 40 | DAG-GNN | 54.89 ± 10.46 | 19.83 ± 5.03 | 28.92 ± 6.51 | 161.40 ± 16.06 | 50.86 ± 7.96 | 21.70 ± 3.20 | 30.04 ± 2.87 | 161.20 ± 8.46 |
| 40 | GAE | 59.21 ± 22.46 | 3.24 ± 2.56 | 5.92 ± 4.32 | 170.80 ± 7.52 | 60.42 ± 14.91 | 4.70 ± 2.41 | 8.60 ± 4.18 | 160.60 ± 3.63 |
| 40 | GraN-DAG | **92.92 ± 7.39** | 6.46 ± 1.84 | 12.01 ± 3.18 | 162.40 ± 8.10 | **87.65 ± 7.26** | 5.31 ± 1.84 | 9.96 ± 3.31 | 156.40 ± 3.17 |
| 40 | MgCSL | 27.93 ± 2.91 | 38.70 ± 4.98 | 32.29 ± 2.66 | 220.70 ± 20.51 | 41.16 ± 9.71 | 31.71 ± 4.95 | 31.71 ± 4.95 | 228.20 ± 23.28 |
| 40 | EMCausal | 66.11 ± 15.77 | **63.91 ± 6.38** | **63.67 ± 7.11** | **97.80 ± 29.69** | 57.36 ± 16.54 | **69.62 ± 7.96** | **61.10 ± 7.66** | **111.67 ± 37.28** |

Table 1: Results on nonlinear models over Erdős–Rényi (ER) and Scale-Free (SF) graphs, reported as mean ± standard deviation. The best result is highlighted in bold font. ↑ (↓) means the larger (smaller) the score, the better the model performance.

Experimental Results and Analysis

In this section, we conduct a series of experiments to study the effectiveness of EMCausal. We first compare the performance of various causal discovery algorithms using synthetic datasets and give a detailed analysis of the strengths and limitations of the compared methods.
We then investigate the impact of hidden macro variables to assess the ability of EMCausal to identify elusive causal relationships in the presence of emergent phenomena. Next, we analyze the consistency of causal relationships between the micro and macro levels to further investigate the contributing factors within EMCausal. Finally, we conduct experiments on real-world datasets to test EMCausal in practical applications. We implement EMCausal using PyTorch 1.13 and conduct experiments on a server with an Intel(R) Xeon(R) Gold 6248R CPU, 512 GB memory, 8 NVIDIA GeForce RTX 3090 GPUs, and Ubuntu 22.04.

Performance Analysis

Experimental Setup: We choose several representative causal discovery algorithms as our baselines, including the classical PC (Spirtes et al. 2000), which is based on conditional independence tests, and score-based optimization methods such as GES (Chickering 2002b), DAG-GNN (Yu et al. 2019), GAE (Ng et al. 2019), and MgCSL (Liang et al. 2024b). The parameter configurations of these methods are summarized in Table 2. To synthesize multi-granularity data, we set the number of micro variables that constitute each macro variable to 4 or 8, with the total number of micro variables set to 20 or 40. The edge density is set to 2. We use randomly generated two-layer MLPs, whose hidden size is set to 100, to model the nonlinear relationships between variables. For each experiment, we sample 1000 data points from the data generation process. To reduce randomness, we perform ten independent runs and report the mean and standard deviation. We generate nonlinear multi-granularity causal graphs and the corresponding data, and conduct experiments on this basis to study the effectiveness of EMCausal.
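The synthetic data generation described above can be sketched as follows, assuming an Erdős–Rényi-style DAG and random two-layer tanh MLPs for the nonlinear mechanisms of Eq. (1); the weight scales, noise level, and function names are illustrative assumptions rather than the paper's exact generator:

```python
import numpy as np

def sample_er_dag(d, density=2, rng=None):
    # Erdős–Rényi-style DAG: keep only edges i -> j with i < j, so the
    # natural variable ordering is already topological (a shortcut).
    # Edge probability is chosen to give roughly `density * d` edges.
    rng = np.random.default_rng() if rng is None else rng
    p = min(1.0, 2.0 * density / (d - 1))
    return np.triu((rng.random((d, d)) < p).astype(float), k=1)

def simulate(B, n, hidden=100, noise=0.1, rng=None):
    # X_j = f_j(Pa(X_j)) + e_j  (Eq. 1): each f_j is a randomly weighted
    # two-layer tanh MLP of the parents, matching the nonlinear setup
    # described in the text.
    rng = np.random.default_rng() if rng is None else rng
    d = B.shape[0]
    X = np.zeros((n, d))
    for j in range(d):                    # order is topological by design
        pa = np.flatnonzero(B[:, j])
        e = noise * rng.normal(size=n)
        if pa.size == 0:
            X[:, j] = e
            continue
        W1 = rng.normal(size=(pa.size, hidden))
        W2 = rng.normal(size=hidden)
        X[:, j] = np.tanh(X[:, pa] @ W1) @ W2 + e
    return X

rng = np.random.default_rng(0)
B = sample_er_dag(20, density=2, rng=rng)
X = simulate(B, 1000, rng=rng)
print(X.shape)  # (1000, 20)
```

A multi-granularity variant would additionally group columns of B into macro blocks; the sketch only covers the micro-level mechanism shared by all compared methods.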
| Method | Parameter | Value |
|---|---|---|
| PC | significance level α | 0.05 |
| GES | score function | BIC |
| DAG-GNN | learning rate | 0.003 |
| DAG-GNN | VAE layers | [1, 64, 1, 64, 1] |
| DAG-GNN | initial λ | 0 |
| DAG-GNN | initial c | 1 |
| GAE | learning rate | 0.001 |
| GAE | input units | 1 |
| GAE | hidden layers | 1 |
| GAE | hidden units | 4 |
| GAE | λ | 0 |
| GAE | initial α | 0 |
| GAE | initial ρ | 1 |
| GraN-DAG | learning rate | 0.001 |
| GraN-DAG | hidden layers | 2 |
| GraN-DAG | hidden units | 10 |
| GraN-DAG | PNS threshold | 0.75 |
| GraN-DAG | initial λ | 0 |
| GraN-DAG | initial µ | 0.001 |
| MgCSL | initial µ | 0.001 |
| MgCSL | tolerance of H | 0.1 |
| MgCSL | maximum of µ | 10^16 |
| MgCSL | ϵ | 0.2 |
| MgCSL | SAE layers | [d, 0.75d, 5, 0.75d, d] |
| MgCSL | MLP layers | [d+5, 10, 1] |
| EMCausal | ϵ | 0.3 |
| EMCausal | τ | 0.2 |
| EMCausal | PME structure | [d, 0.75d, 5] |
| EMCausal | MLP layers | [d+5, 10, 1] |

Table 2: Parameter configuration of the compared methods.

Since contemporary algorithms (except MgCSL) are unable to directly uncover multi-granular causal relations, we map the true and predicted results of each algorithm to the micro-level graph for comparison. We mainly use four metrics to quantify performance, namely Precision, Recall, F1, and Structural Hamming Distance (SHD). For the first three metrics, larger values indicate better performance; the opposite applies to SHD.

EMCausal vs. constraint-based methods: The results in Table 1 clearly show that our EMCausal significantly outperforms constraint-based algorithms across all metrics on the synthetic datasets, with the most pronounced advantage observed in precision. This performance disparity is primarily due to the limitations of conditional independence tests in capturing nonlinear causal relationships, leading to a substantial number of missed causal directions. Furthermore, as the number of variables increases, this difference becomes even more evident, since the reliability of conditional independence tests heavily relies on the number of samples, and more variables require more samples.

EMCausal vs. score-based methods: Compared to other score-based methods, such as GraN-DAG and DAG-GNN, EMCausal exhibits significant differences in both F1 score and SHD. While GraN-DAG and DAG-GNN attain higher precision, their recall rates are notably lower.
This discrepancy arises primarily because these score-based methods only report predicted edges with higher confidence, thereby reducing the number of predictions and lowering recall. As the causal effects of macro structures become dispersed across micro variables, the causal influence on each individual micro variable weakens, making it difficult for these score-based methods to capture a comprehensive causal structure, which in turn lowers their recall rates. Furthermore, the hidden noise of macro variables leads DAG-GNN to focus more on recovering causal relationships among the micro-level variables decomposed within the same macro-level variable, rather than reflecting the causal effects driven by macro variables. Our EMCausal is explicitly designed to encourage the discovery of causal relationships at the macro level. Although this design may result in a slight decrease in precision, it significantly enhances recall. The observed increase in F1 score and the reduction in SHD further validate its effectiveness, showing its superior capability in capturing macro structures within complex systems and thereby revealing multi-granularity causal relationships.

EMCausal vs. MgCSL: Both EMCausal and MgCSL are multi-granularity causal discovery algorithms, warranting a comprehensive comparative analysis. Although both methods involve identifying macro variables and joint causal discovery, they differ in how they identify macro variables and in the subsequent causal relationship discovery. Therefore, we standardize the comparison by evaluating Precision, Recall, SHD, and the Number of Non-Zero elements (NNZ) related to macro-level causal relationships within the multi-granular graph. This allows us to assess the performance difference in uncovering macro-level causal structures. As shown in Figure 2, our EMCausal significantly outperforms MgCSL in macro-level causal discovery across all metrics.
Figure 2: Precision, Recall, SHD, and NNZ of MgCSL and EMCausal on synthetic datasets with q = 4 macro variables and d ∈ {10, 20, 30, 40} micro variables.

This indicates that EMCausal is more accurate and reliable in identifying macro-level causal structures. The Number of Non-Zero elements (NNZ) reflects the number of edges in the predicted causal graph. While delivering superior performance, EMCausal consistently exhibits higher NNZ values than MgCSL, suggesting that EMCausal can discover more true causal edges and more accurately identify causal relationships among micro and macro variables. In contrast, MgCSL misses many causal edges and thus has lower precision and recall values. This is because EMCausal defines a consistency constraint between the macro-level and micro-level causal structures. In particular, it ensures that the learning process of the macro-level causal graph is informed by the micro-level graph. This alignment guarantees the correctness of micro-level causal discovery and its effective guidance in constructing macro-level causal relationships, whereas MgCSL disregards this consistency and merely treats the macro and micro variables as a whole when learning the causal structure.

Impact of Macro-Level Variables

We further investigate the impact of hidden macro-level variables on various causal learning algorithms using synthetic multi-granular data. We fix the number of micro variables to 20, set the edge density to 2, and let each macro variable be composed of 2 micro variables. We gradually vary the number of macro variables and record the changes in Precision, Recall, F1, and SHD across the different methods. The results presented in Figure 3 demonstrate that the performance of nearly all algorithms declines as the number of macro variables decreases.
This decline occurs because, in our experiments, a reduction in macro variables leads to an increase in causal relationships mediated by those macro variables. This observation aligns with our previous analysis. However, since EMCausal simulates the process of macro-variable generation to account for emergence-inspired causality, it generally maintains better robustness to this change. Compared to conventional algorithms, our EMCausal can effectively resist the impact caused by macro variables and maintain relatively stable performance, demonstrating its advantages when dealing with complex macro systems.

Figure 3: The performance of causal learning algorithms (PC, GES, DAG-GNN, GAE, GraN-DAG, MgCSL, and EMCausal) on synthetic multi-granularity data with the number of macro variables ranging from 2 to 8.

Consistency of Causal Relationships

To verify that the macro variables identified by EMCausal encapsulate the common information of their corresponding micro variables, we conducted experiments on synthetic data. These experiments compare the consistency of causal relationships between micro and macro variables, aiming to demonstrate the coherence and effectiveness of the macro-level representations generated by EMCausal. According to our criterion, if all micro variables associated with a macro variable point to another variable, the macro variable should also exhibit that causal relationship. Based on this principle, we calculate the Precision, Recall, and F1 score of the generated graphs and report the results in Table 3. The performance of both EMCausal and MgCSL declines somewhat as the number of variables increases. However, EMCausal is generally much less affected, demonstrating its superior ability to maintain the consistency between the learned causal structures and the ground truths at both the micro and macro levels.
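The consistency criterion above amounts to a small check: a macro variable is expected to point at a target exactly when all of its member micro variables do. A sketch under assumed data layouts (binary adjacency matrix, explicit membership lists; both are our illustrative conventions):

```python
import numpy as np

def expected_macro_edges(G_micro, groups):
    # Consistency criterion from the text: macro variable m should point
    # to target j exactly when *all* of m's member micro variables point
    # to j. G_micro is a binary (d, d) adjacency matrix with
    # G_micro[i, j] = 1 meaning X_i -> X_j; `groups[m]` lists the micro
    # indices that compose macro variable m.
    d = G_micro.shape[0]
    out = np.zeros((len(groups), d))
    for m, members in enumerate(groups):
        out[m] = np.all(G_micro[members, :] == 1, axis=0)
    return out

G = np.zeros((4, 4))
G[0, 3] = G[1, 3] = 1          # both members of macro 0 point to X_4
groups = [[0, 1], [2]]
print(expected_macro_edges(G, groups))  # macro 0 -> X_4 only
```

Comparing this expected macro adjacency against the macro rows of a predicted multi-granularity graph yields the Precision, Recall, and F1 values reported in Table 3.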
| d | Method | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| 10 | MgCSL | 72.14 ± 25.58 | 27.44 ± 16.54 | 38.15 ± 18.49 |
| 10 | EMCausal | 86.96 ± 16.01 | 76.41 ± 13.54 | 79.84 ± 10.28 |
| 20 | MgCSL | 45.71 ± 15.48 | 14.13 ± 9.08 | 21.12 ± 11.98 |
| 20 | EMCausal | 86.81 ± 10.78 | 71.65 ± 11.28 | 77.70 ± 8.47 |
| 30 | MgCSL | 50.89 ± 13.12 | 15.44 ± 7.73 | 22.94 ± 9.51 |
| 30 | EMCausal | 71.86 ± 13.14 | 71.89 ± 7.09 | 70.93 ± 6.47 |
| 40 | MgCSL | 54.81 ± 17.34 | 9.32 ± 5.43 | 15.54 ± 8.50 |
| 40 | EMCausal | 73.37 ± 14.04 | 53.53 ± 11.07 | 60.18 ± 6.44 |

Table 3: EMCausal vs. MgCSL on causal consistency.

Result on the Sachs Dataset

To further validate the applicability of EMCausal in real-world scenarios, we conducted experiments on the Sachs dataset (Sachs et al. 2005). The Sachs dataset, comprising 853 samples, records the expression levels of 11 proteins and phosphorylated proteins measured from human T cells in the immune system. The results of our experiments are presented in Table 4.

| Method | Precision (%) | Recall (%) | F1 (%) | SHD |
|---|---|---|---|---|
| PC | 42.86 | 35.29 | 38.71 | **11** |
| GES | 42.86 | 35.29 | 38.71 | **11** |
| DAG-GNN | 40.00 | 35.29 | 37.50 | 16 |
| GAE | 50.00 | 5.88 | 10.53 | 16 |
| GraN-DAG | **66.67** | 11.76 | 20.00 | 15 |
| MgCSL | 42.86 | 17.65 | 36.50 | 18 |
| EMCausal | 46.67 | **41.18** | **43.75** | **11** |

Table 4: Results on the Sachs dataset. The best result in each column is highlighted in bold font.

Our EMCausal achieves the highest recall and F1 score. Although GraN-DAG gives superior precision, its overly conservative predictions result in a low recall value, leading to suboptimal SHD and F1 scores. In contrast, while PC and GES achieve SHD values as low as ours, their F1 scores do not demonstrate the same level of superiority. The results on the Sachs dataset further underscore EMCausal's broad potential for real-world applications, highlighting its effectiveness in uncovering complex causal structures.

Conclusion

Inspired by emergence phenomena, we propose a novel multi-granularity causal discovery algorithm, EMCausal. This approach integrates the discovery of micro-macro relationships with the aggregation of micro variables into macro-level representations.
By reconstructing micro variables from these macro-level representations, EMCausal effectively learns multi-granularity causal structures. Experimental results demonstrate that EMCausal not only uncovers the underlying causal mechanisms within complex systems more accurately, but also provides deeper insights into the behavior of these systems. This capability positions EMCausal as a powerful tool for analyzing intricate causal relationships, offering significant improvements over existing methods in both accuracy and interpretability.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2023YFF0725500), NSFC (62031003, 62272276 and 62432006), Shandong Provincial Natural Science Foundation (No. ZR2024JQ001) and the Taishan Scholars Program (No. tsqn202306007).

References

Breakspear, M. 2017. Dynamic models of large-scale brain activity. Nat. Neurosci., 20(3): 340–352.
Cai, R.; Zhu, Y.; Qiao, J.; Liang, Z.; Liu, F.; and Hao, Z. 2024. Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples. In AAAI, 11132–11140.
Chen, W.; Qiao, J.; Cai, R.; and Hao, Z. 2024. On the role of entropy-based loss for learning causal structure with continuous optimization. TNNLS, 99(1): 1–12.
Chickering, D. M. 2002a. Learning equivalence classes of Bayesian-network structures. JMLR, 2: 445–498.
Chickering, D. M. 2002b. Optimal structure identification with greedy search. JMLR, 3(11): 507–554.
Colombo, D.; Maathuis, M. H.; et al. 2014. Order-independent constraint-based causal structure learning. JMLR, 15(1): 3741–3782.
Cui, P.; and Athey, S. 2022. Stable learning establishes some common ground between causal inference and machine learning. Nat. Mach. Intell., 4(2): 110–115.
Epstein, J. M.; and Axtell, R. 1996. Growing artificial societies: social science from the bottom up. Brookings Institution Press.
He, Y.; Cui, P.; Shen, Z.; Xu, R.; Liu, F.; and Jiang, Y. 2021.
DARING: Differentiable causal discovery with residual independence. In KDD, 596–605.
Horgan, T. 1993. From Supervenience to Superdupervenience: Meeting the Demands of a Material World. Mind, 102(408): 555–586.
Jang, E.; Gu, S.; and Poole, B. 2017. Categorical Reparameterization with Gumbel-Softmax. In ICLR.
Klein, B.; Swain, A.; Byrum, T.; Scarpino, S. V.; and Fagan, W. F. 2022. Exploring noise, degeneracy and determinism in biological networks with the einet package. Methods in Ecology and Evolution, 13(4): 799–804.
Lee, H.-C.; Danieletto, M.; Miotto, R.; Cherng, S. T.; and Dudley, J. T. 2020. Scaling structural learning with NO-BEARS to infer causal transcriptome networks. In PSB, 391–402.
Liang, J.; Wang, J.; Yu, G.; Domeniconi, C.; Zhang, X.; and Guo, M. 2024a. Gradient-based local causal structure learning. TCYB, 54(1): 486–495.
Liang, J.; Wang, J.; Yu, G.; Guo, W.; Domeniconi, C.; and Guo, M. 2023. Directed acyclic graph learning on attributed heterogeneous network. TKDE, 35(10): 10845–10856.
Liang, J.; Wang, J.; Yu, G.; Xia, S.; and Wang, G. 2024b. Multi-Granularity Causal Structure Learning. In AAAI, 13727–13735.
Lu, S.; and Gao, T. 2023. Meta-DAG: Meta causal discovery via bilevel optimization. In ICASSP, 1–5.
Ng, I.; Zhu, S.; Chen, Z.; and Fang, Z. 2019. A graph autoencoder approach to causal structure learning. arXiv preprint arXiv:1911.07420.
Ng, I.; Zhu, S.; Fang, Z.; Li, H.; Chen, Z.; and Wang, J. 2022. Masked gradient-based causal structure learning. In SDM, 424–432.
Peters, J.; Mooij, J. M.; Janzing, D.; and Schölkopf, B. 2014. Causal Discovery with Continuous Additive Noise Models. JMLR, 15: 2009–2053.
Rubenstein, P.; Weichwald, S.; Bongers, S.; Mooij, J.; Janzing, D.; Grosse-Wentrup, M.; and Schölkopf, B. 2017. Causal Consistency of Structural Equation Models. In UAI, 808–817.
Sachs, K.; Perez, O.; Pe'er, D.; Lauffenburger, D. A.; and Nolan, G. P. 2005. Causal protein-signaling networks derived from multiparameter single-cell data.
Science, 308(5721): 523–529.
Shimizu, S.; Hoyer, P. O.; Hyvärinen, A.; Kerminen, A.; and Jordan, M. 2006. A linear non-Gaussian acyclic model for causal discovery. JMLR, 7(10): 2003–2030.
Spirtes, P.; Glymour, C. N.; Scheines, R.; and Heckerman, D. 2000. Causation, prediction, and search. MIT Press.
Spirtes, P. L.; Meek, C.; and Richardson, T. S. 2013. Causal inference in the presence of latent variables and selection bias. arXiv preprint arXiv:1302.4983.
Tononi, G.; and Koch, C. 2015. Consciousness: here, there and everywhere? Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1668): 20140167.
Vandenbroucke, J. P.; Broadbent, A.; and Pearce, N. 2016. Causality and causal inference in epidemiology: the need for a pluralistic approach. Int. J. of Epi., 45(6): 1776–1786.
Wang, Y.; Zhang, A.; Wang, X.; Yuan, Y.; He, X.; and Chua, T.-S. 2022. Differentiable invariant causal discovery. arXiv preprint arXiv:2205.15638.
Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. 2022. Emergent Abilities of Large Language Models. TMLR.
Yang, D.; He, X.; Wang, J.; Yu, G.; Domeniconi, C.; and Zhang, J. 2024. Federated Causality Learning with Explainable Adaptive Optimization. In AAAI, 16308–16315.
Yang, D.; Yu, G.; Wang, J.; Wu, Z.; and Guo, M. 2023. Reinforcement causal structure learning on order graph. In AAAI, 10737–10744.
Yu, Y.; Chen, J.; Gao, T.; and Yu, M. 2019. DAG-GNN: DAG structure learning with graph neural networks. In ICML, 7154–7163.
Zhang, A.; Liu, F.; Ma, W.; Cai, Z.; Wang, X.; and Chua, T.-S. 2023. Boosting differentiable causal discovery via adaptive sample reweighting. arXiv preprint arXiv:2303.03187.
Zhang, H.; Zhou, S.; and Guan, J. 2018. Measuring conditional independence by independent residuals: Theoretical results and application in causal discovery. In AAAI, 2029–2036.
Zheng, X.; Aragam, B.; Ravikumar, P.; and Xing, E. P. 2018.
DAGs with NO TEARS: continuous optimization for structure learning. In NeurIPS, 9492–9503.
Zheng, X.; Dan, C.; Aragam, B.; Ravikumar, P.; and Xing, E. 2020. Learning sparse nonparametric DAGs. In AISTATS, 3414–3425.
Zhu, C.; Byrd, R. H.; Lu, P.; and Nocedal, J. 1997. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM TOMS, 23(4): 550–560.