Published in Transactions on Machine Learning Research (07/2025)

Disentangled and Self-Explainable Node Representation Learning

Simone Piaggesi simone.piaggesi@di.unipi.it University of Pisa, Pisa, Italy
André Panisson andre.panisson@centai.eu CENTAI Institute, Turin, Italy
Megha Khosla M.Khosla@tudelft.nl Delft University of Technology, Delft, Netherlands

Reviewed on OpenReview: https://openreview.net/forum?id=s51TQ8Eg1e

Node embeddings are low-dimensional vectors that capture node properties, typically learned through unsupervised structural similarity objectives or supervised tasks. While recent efforts have focused on post-hoc explanations for graph models, intrinsic interpretability in unsupervised node embeddings remains largely underexplored. To bridge this gap, we introduce DiSeNE (Disentangled and Self-Explainable Node Embedding), a framework that learns self-explainable node representations in an unsupervised fashion. By leveraging disentangled representation learning, DiSeNE ensures that each embedding dimension corresponds to a distinct topological substructure of the graph, thus offering clear, dimension-wise interpretability. We introduce new objective functions grounded in principled desiderata, jointly optimizing for structural fidelity, disentanglement, and human interpretability. Additionally, we propose several new metrics to evaluate representation quality and human interpretability. Extensive experiments on multiple benchmark datasets demonstrate that DiSeNE not only preserves the underlying graph structure but also provides transparent, human-understandable explanations for each embedding dimension.
1 Introduction

Self-supervised and unsupervised node representation learning (Hamilton, 2020) provides a powerful toolkit for extracting meaningful insights from complex networks, making it essential in modern AI and machine learning applications for network analysis (Ding et al., 2024). These methods offer flexible and efficient ways to analyze high-dimensional networks by transforming them into low-dimensional vector spaces. This transformation enables dimensionality reduction, automatic feature extraction, and the deployment of standard machine learning algorithms for tasks such as node classification, clustering, and link prediction (Khosla et al., 2021). Moreover, unsupervised node representations, or embeddings, enable visualization of complex networks and can be transferred across similar networks, enhancing understanding and predictive power in fields ranging from social networks to biological systems. Despite their widespread utility, these approaches often face substantial challenges in terms of interpretability, typically relying on complex, post-hoc techniques to understand the latent information encoded within the embeddings (Piaggesi et al., 2024; Idahl et al., 2019; Gogoglou et al., 2019). This limitation raises a critical question: What information do these embeddings encode? Despite a large body of research on explainable GNN models, embedding methods, the fundamental building blocks of graph-based systems, have received comparatively little attention. Most existing attempts to explain embeddings are predominantly post-hoc (Piaggesi et al., 2024; Gogoglou et al., 2019; Khoshraftar et al., 2021; Dalmia et al., 2018) and heavily dependent on the specific embedding techniques used.

Figure 1: DiSeNE generates dimension-wise disentangled representations in which each embedding dimension is mapped to a mesoscale substructure in the input graph. The vector represents the embedding for the node marked in blue and the bars depict feature values.

Figure 2: The overlap in dimension explanations aligns with the correlation between the node feature values for those dimensions. The dimension referenced by the blue subgraph shows a stronger correlation with the red dimensions and a lower correlation with the green dimension.

Figure 3: The node feature value indicates its proximity to the explanation substructure mapped to the corresponding dimension. The black node has a higher value for the dimension corresponding to the green subgraph (since it is 1 hop away) than for the dimension corresponding to the red subgraph (3 hops away).

Some approaches (Piaggesi et al., 2024) extend methods from the post-processing of word embeddings (Subramanian et al., 2018; Chen & Zaki, 2017) by minimizing reconstruction errors through over-complete auto-encoders to improve sparsity. Other works (Gogoglou et al., 2019; Khoshraftar et al., 2021; Dalmia et al., 2018) focus solely on extracting meaningful explanations, without addressing the underlying embedding learning process. We propose DiSeNE (Disentangled and Self-Explainable Node Embedding), a framework that addresses the interpretability gap in unsupervised node embeddings by generating inherently self-explainable node representations. In our approach, "self-explainable" means that each embedding dimension corresponds to a distinct subgraph that globally explains the structural information encoded within that dimension. These dimensional subgraphs highlight human-comprehensible functional components of the input graph, providing clear and meaningful insights. DiSeNE leverages disentangled representation learning, an approach that encodes latent variables corresponding to distinct factors of variation in the data (Wang et al., 2024), to produce node embeddings that are interpretable on a per-dimension basis.
In graph data, node behaviour is strongly influenced by mesoscale structures such as communities, which shape the network's organization and drive its dynamics (Barrat et al., 2008). By leveraging disentangled representation learning, we capture these "topological" substructures more effectively, with each embedding dimension reflecting an independent graph unit (see Figure 1). We achieve this through a novel objective function that ensures structural disentanglement. Specifically, we optimize the embeddings so that each dimension is predictive of a unique substructure in the input graph. To avoid degenerate solutions, we incorporate an entropy-based regularizer that ensures the resulting substructures are non-empty and informative. Our paradigm represents a shift in the language of explanations compared to those often considered when dealing with GNNs (Yuan et al., 2023). Explainability for GNNs often involves understanding which parts of the local computation graph (nodes, edges) and node attributes significantly influence the model's predictions (Funke et al., 2023; Ying et al., 2019; Schnake et al., 2022). In contrast, the explanations that we aim to discover are inherently non-local, since they could involve mesoscale structures such as node clusters (Piaggesi et al., 2024), usually not included in the GNN computational graph. We provide a comprehensive evaluation of the embeddings and uncover novel insights by proposing several new metrics that capture the interplay between disentanglement and explainability (see Section 3.3 for details). For instance, our overlap consistency metric (illustrated in Figure 2) shows that the overlap of the topological substructures used as explanations matches the correlation among the corresponding embedding dimensions, providing insights into the interdependencies among node characteristics within the graph.
Moreover, we attribute meaning to the embedding values by showing that the magnitude of a node's entry in a specific dimension correlates with its proximity (as depicted in Figure 3) to the topological subgraphs associated with that dimension. This relationship enhances our understanding of the relative positioning of nodes with respect to different graph components, thereby improving spatial awareness of the graph structure. Contributions are summarized as follows: (i) we formalize new and essential criteria for achieving disentangled and explainable node representations, offering a fresh perspective on interpretability in unsupervised graph-based learning; (ii) we introduce novel evaluation metrics to help quantify the quality of node representation learning in disentangled and explainable settings; (iii) we perform extensive experimental analyses on synthetic and real-world data to establish state-of-the-art results in self-explainable node feature learning. We release our code and data at https://github.com/simonepiaggesi/disene.

2 Preliminaries and Related Work

Given an undirected graph $G = (V, E)$, node embeddings are obtained through an encoding function $h : V \to \mathbb{R}^K$ that maps each node to a point in a $K$-dimensional vector space $\mathbb{R}^K$, where typically $K \ll |V|$. We denote the $K$-dimensional embedding of a node $v \in V$ as $h(v) = [h_1(v), \ldots, h_K(v)]$, where $h_d(v)$ represents the value of the $d$-th feature of the embedding for node $v$. Alternatively, we can represent all node embeddings collectively as a matrix $H(G) \in \mathbb{R}^{|V| \times K}$, where each entry $H_{vd} = h_d(v)$ corresponds to the $d$-th feature for node $v$. We can also refer to the columns of this matrix, $H_{:,d}$, as the dimensions of the embedding model space.

Node embedding interpretability. Node embeddings are shallow encoding techniques, often based on matrix factorization or random walks (Qiu et al., 2018).
Since the latent dimensions in these models are not aligned with high-level semantics (Senel et al., 2018; Prouteau et al., 2022), interpreting embeddings typically involves post-hoc explanations of their latent features (Gogoglou et al., 2019; Khoshraftar et al., 2021). Other works propose alternative methods that modify existing node embeddings, making them easier to explain with human-understandable graph features (Piaggesi et al., 2024; Shafi et al., 2024). From a different viewpoint, Shakespeare & Roth (2024) explore how understandable the embedded distances between nodes are. Similarly, Dalmia et al. (2018) investigate whether specific topological features are predictable, and thus encoded, in node representations.

Graph neural network interpretability. Graph Neural Networks (GNNs) (Wu et al., 2021) are deep models that operate via complex feature transformations and message passing. In recent years, GNNs have gained significant research attention, including efforts to address their opaque decision-making process. Several approaches have been proposed to explain the GNN decision process (Yuan et al., 2023), including perturbation approaches (Ying et al., 2019; Yuan et al., 2021; Funke et al., 2023), surrogate model-based methods (Vu & Thai, 2020; Huang et al., 2023), and gradient-based methods (Pope et al., 2019; Sánchez-Lengeling et al., 2020). In parallel, alternative research directions have focused on concept-based explanations, i.e., high-level units of information that further facilitate human understandability (Magister et al., 2021; Xuanyuan et al., 2023).

Disentangled learning on graphs. Disentangled representation learning seeks to uncover and isolate the fundamental explanatory factors within data (Wang et al., 2024). In recent years, these techniques have gained traction for graph-structured data (Liu et al., 2020; Li et al., 2021; Yang et al., 2020; Fan & Gao, 2024).
For instance, FactorGCN (Yang et al., 2020) disentangles an input graph into multiple factorized graphs, resulting in distinct disentangled feature spaces that are aggregated afterwards. IPGDN (Liu et al., 2020) proposes a disentanglement using a neighborhood routing mechanism, enforcing independence between the latent representations as a regularization term for GNN outputs. Meanwhile, DGCL (Li et al., 2021) focuses on learning disentangled graph-level representations through self-supervision, ensuring that the factorized components capture expressive information from distinct latent factors independently.

3 Our Proposed Framework: DiSeNE

In this section, we begin by outlining the key desiderata for achieving disentangled and self-explainable node representations. Next, we design a novel framework that meets these objectives by ensuring that the learned node representations are both disentangled and interpretable. Finally, we introduce new evaluation metrics to effectively assess the quality of node representation learning in both disentangled and explainable settings.

3.1 Core Objectives and Desiderata

In the context of unsupervised graph representation learning, we argue that learning self-explainable node embeddings amounts to reconstructing the input graph in a human-interpretable fashion. Traditionally, dot-product models based on NMF (Yang & Leskovec, 2013) and LPCA (Chanpuriya et al., 2023) decompose the set of graph nodes into clusters, where each entry of the node embedding vector represents the strength of the node's participation in a cluster. In this scenario, the dot product of node embeddings becomes intuitively understandable, as it reflects the extent of shared community memberships between nodes, thereby providing a clear interpretation of edge likelihoods.
This concept is also related to distance encoding methods (Li et al., 2020; Klemmer et al., 2023), where a node feature $h_d(u)$ is expressed as a function of the node's proximity $\zeta(u, S_d) = \mathrm{AGG}(\{\zeta(u, v) : v \in S_d\})$ to an anchor set $S_d \subseteq V$, using a specific aggregation function AGG. Typically, distance encodings are constructed by randomly sampling anchor sets (You et al., 2019), and used as augmented node features to enhance expressiveness and improve performance on downstream tasks. Inspired by this idea, our goal is to optimize unsupervised node embeddings encoded by a GNN function $h : V \to \mathbb{R}^K$ trained on the graph $G = (V, E)$, such that node features resemble non-random, structurally meaningful anchor sets, thus improving human-interpretability. To achieve this, we propose three key desiderata for learning general-purpose node representations: (i) connectivity preservation, (ii) dimensional interpretability, and (iii) structural disentanglement. These desiderata serve as the foundational components of our approach, as detailed below.

Connectivity preservation. Ideally, node embeddings are constructed so that the geometric relationships in the low-dimensional space mirror the connectivity patterns of the original graph. Nodes with greater similarity in the network should be placed close to each other in the embedding space, and vice versa. To implement this, we train the node embedding function to recover the graph structure. We employ a random walk optimization framework based on the skip-gram model with negative sampling (Huang et al., 2021). The loss function for this framework is defined as:

$$\mathcal{L}_{rw} = -\sum_{(u,v) \sim P_{rw}} \log \sigma\big(h(u)^\top h(v)\big) - \sum_{(u',v) \sim P_n} \log \sigma\big(-h(u')^\top h(v)\big),$$

where $\sigma(\cdot)$ is the sigmoid function, $P_{rw}$ is the distribution of node pairs co-occurring on simulated random walks (positive samples), $P_n$ is a distribution over randomly sampled node pairs (negative samples), and $h(u)^\top h(v)$ represents the dot product between the embeddings of nodes $u$ and $v$.
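As an illustration, the loss above can be sketched in NumPy. This is a minimal didactic version under our own naming; the actual implementation would batch pairs and backpropagate through the encoder:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_neg_loss(H, pos_pairs, neg_pairs, eps=1e-9):
    """Skip-gram loss with negative sampling over an embedding matrix H.

    H         : (|V|, K) node embedding matrix (row v = h(v)).
    pos_pairs : (u, v) pairs co-occurring on random walks (~ P_rw).
    neg_pairs : (u', v) randomly sampled node pairs (~ P_n).
    """
    loss = 0.0
    for u, v in pos_pairs:              # pull co-occurring nodes together
        loss -= np.log(sigmoid(H[u] @ H[v]) + eps)
    for u, v in neg_pairs:              # push random pairs apart
        loss -= np.log(sigmoid(-(H[u] @ H[v])) + eps)
    return loss
```

Embeddings that align positive pairs and separate negative pairs yield a lower loss than misaligned ones.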
By optimizing this loss function, we encourage nodes that co-occur in random walks to have similar embeddings, effectively preserving the graph's structural information in the embedding space.

Dimensional interpretability. Embedding representations are multi-dimensional, with each dimension representing specific latent factors (e.g., social similarity, functional proximity, shared interests). A given edge between $u$ and $v$ may rely more heavily on certain dimensions. This perspective shows how local relationships (edges) are directly informed by the global structure of the embeddings, with each dimension contributing uniquely to reconstructing particular relationships. Given structure-preserving embeddings, meaning they effectively encode the input structure, we should be able to interpret each embedding dimension in terms of the graph's topological structure. Specifically, our framework attributes "responsibility" for reconstructing a local relationship (an edge) to specific dimensions, based on their contribution to the edge likelihood, i.e., the probability of reconstructing that edge through the embeddings. We achieve this by assigning local subgraphs to different latent dimensions. Consider the likelihood of an edge $(u, v)$, defined as $\hat{y}(u, v; h) = \sigma\left(\sum_{d=1}^{K} h_d(u) h_d(v)\right)$. The likelihood score is obtained by applying the sigmoid to a linear combination of per-dimension products $h_d(u) h_d(v)$. Because the sigmoid is monotonic, each product's sign and magnitude directly determine its influence on $\hat{y}$. Thus, we can interpret the edge likelihood by inspecting the raw products. This decomposition requires only linear additivity and does not depend on any statistical relationships among the terms.
To understand how each dimension $d$ contributes to this likelihood, we compute the edge-wise dimension importance $\phi_d(u, v; h)$ as the deviation of the dimension-specific contribution from its average over all edges:

$$\phi_d(u, v; h) = h_d(u) h_d(v) - \frac{1}{|E|} \sum_{(u', v') \in E} h_d(u') h_d(v'). \qquad (1)$$

Since the dot product is a linear function $\sum_{d=1}^{K} \alpha_d h_d(u) h_d(v) + \beta$ with unitary coefficients $\alpha_d \equiv 1$ and zero intercept $\beta \equiv 0$, Eq. (1) corresponds to the formulation of Linear SHAP attribution scores (Lundberg & Lee, 2017), using the set of training edges as the background dataset. Essentially, the attribution function $\phi_d(u, v; h)$ indicates whether a specific dimension $d$ contributes positively to an edge's likelihood. A positive attribution score means that the dimension increases the likelihood of predicting the edge. If two dimensions are correlated, they may often push an edge's likelihood in the same direction with similar attributions, but each still provides an observable marginal deviation that we can measure. In other words, $\phi_d$ quantifies marginal responsibility, not exclusive responsibility; therefore it remains valid in the presence of correlations. Each dimension contributes to reconstructing specific edges or substructures, and together, these contributions create a unified representation of the entire graph. Thus, the global nature of embeddings emerges as a direct consequence of their contributions to local relationships. From this observation, we generate dimension-wise (global) explanations for the latent embedding model by collecting edge subsets with positive contributions:

$$E_d = \{(u, v) \in E : \phi_d(u, v; h) > 0\}. \qquad (2)$$

These self-explanations take the form of global edge masks $M^{(d)} \in \mathbb{R}_{\geq 0}^{|V| \times |V|}$, where each entry is defined as $M^{(d)}_{uv}(\phi_d; h) = \max\{0, \phi_d(u, v; h)\}$. By applying these masks to the adjacency matrix $A$ through the Hadamard product ($\odot$), we obtain $A^{(d)} = A \odot M^{(d)}$. Each masked adjacency matrix $A^{(d)}$ highlights the subgraph associated with dimension $d$.
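The attribution scores of Eq. (1) and the explanation edge sets of Eq. (2) can be computed directly from an embedding matrix. A minimal NumPy sketch (function names are ours, not the paper's):

```python
import numpy as np

def edge_dimension_importance(H, edges):
    """Per-edge, per-dimension attribution phi_d(u, v): the product
    h_d(u) * h_d(v) minus its mean over all training edges (Linear SHAP
    with the edge set as background dataset, as in Eq. (1))."""
    U = np.array([u for u, _ in edges])
    V = np.array([v for _, v in edges])
    prod = H[U] * H[V]                   # (|E|, K) per-dimension products
    return prod - prod.mean(axis=0)      # deviation from the edge average

def explanation_edge_sets(H, edges):
    """E_d = edges with positive attribution for dimension d (Eq. (2))."""
    phi = edge_dimension_importance(H, edges)
    K = H.shape[1]
    return [{edges[i] for i in np.nonzero(phi[:, d] > 0)[0]} for d in range(K)]
```

On a toy graph where two node pairs activate disjoint dimensions, each dimension claims exactly its own edge.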
From these masked adjacency matrices, we construct edge-induced subgraphs $G_d = (V_d, E_d)$, where $V_d$ is the set of nodes involved in the edges $E_d$. These subgraphs act as anchor sets for the model, providing interpretable representations of how each embedding dimension relates to specific structural patterns within the graph. We will refer to edge-induced subgraphs computed with the aforementioned procedure (pseudo-code in Appendix E) as explanation subgraphs/substructures or topological components of the embedding model.

Structural disentanglement. To enhance the effectiveness of dimensionally interpretable encodings, each dimension of the latent space should encode an independent structure of the input graph, effectively acting as an anchor subgraph. Inspired by community-affiliation models (Yang & Leskovec, 2013; 2012), we introduce a node affiliation matrix $F \in \mathbb{R}^{|V| \times K}$ that captures the association between each node $u \in V$ and anchor subgraph $G_d = (V_d, E_d)$. Specifically, each entry $F_{ud}$ is proportional to the magnitude of predicted meaningful connections between node $u$ and other nodes in $G_d$, expressed using the per-dimension attribution scores from Eq. (1): $F_{ud}(h) = \sum_{v \in V_d} \phi_d(u, v; h)$. This aggregates the contributions of dimension $d$ to the likelihood of edges involving node $u$. To achieve structure-aware disentanglement, we enforce soft-orthogonality among the columns of the affiliation matrix¹. This ensures that different embedding dimensions capture independent structures, leading to nearly non-overlapping sets of predicted links for each dimension. We express the columns of the affiliation matrix as $F_{:,d}$ and obtain the disentanglement loss function as:

$$\mathcal{L}_{dis} = \sum_{d=1}^{K} \sum_{l=1}^{K} \left[\cos(F_{:,d}, F_{:,l}) - \delta_{d,l}\right]^2 \qquad (3)$$

where $\cos(F_{:,d}, F_{:,l})$ denotes the cosine similarity between the $d$-th and the $l$-th columns of $F$, and $\delta_{d,l}$ is the Kronecker delta function (1 if $d = l$, 0 otherwise). This objective penalizes correlation between the edge importance attribution scores of different dimensions.
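A sketch of the affiliation matrix (using the footnote's approximation that drops the small background term) and of the soft-orthogonality penalty, assuming a squared deviation of pairwise cosines from the identity:

```python
import numpy as np

def affiliation_matrix(H):
    """Approximate node-dimension affiliation F_ud ~ sum_v h_d(u) h_d(v),
    i.e. each column of H scaled by its total activation mass."""
    col_mass = H.sum(axis=0)        # sum_v h_d(v) for each dimension d
    return H * col_mass             # broadcasts to shape (|V|, K)

def disentanglement_loss(F, eps=1e-12):
    """Soft-orthogonality penalty between columns of F: squared deviation
    of the pairwise cosine-similarity matrix from the identity."""
    norms = np.linalg.norm(F, axis=0) + eps
    C = (F / norms).T @ (F / norms)      # (K, K) cosine similarity matrix
    K = F.shape[1]
    return np.sum((C - np.eye(K)) ** 2)
```

Orthogonal columns yield zero loss; identical columns are maximally penalized.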
By reducing redundancy, we ensure that each dimension contributes uniquely to the reconstruction of a small set of edges, promoting interpretability. This approach enables us to obtain disentangled representations (Wang et al., 2024), where embedding dimensions correspond to orthogonal latent factors of the input graph. Although higher-order disentanglement is possible (e.g., with groups of dimensions), we focus on single-feature disentanglement to achieve dimension-wise interpretability.

¹Note that $F_{ud}(h) \approx \sum_{v \in V} h_d(u) h_d(v) - \frac{|V|}{|E|} \sum_{(u',v') \in E} h_d(u') h_d(v')$ and, assuming $|V| \ll |E|$, the second term becomes negligible, allowing us to approximate $F_{ud}(h)$ and reduce computational costs.

3.2 Our Approach: DiSeNE

Building upon the above components, introduced to satisfy our desiderata for interpretable node embeddings, we present our approach, DiSeNE. Specifically, DiSeNE takes as input the identity matrix $I_{|V| \times |V|}$ as node attributes and, depending on the encoder architecture, also the adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$. The input is encoded into an intermediate embedding layer $Z \in \mathbb{R}^{|V| \times D}$. Next, DiSeNE processes the embedding matrix $Z$ to compute the likelihood of link formation between node pairs, given by $\hat{y}(u, v; h) = \sigma(h(u)^\top h(v))$, where $h(v) = \rho(W z(v))$ are the final node representations in $H \in \mathbb{R}^{|V| \times K}$, obtained by applying a linear transformation $W \in \mathbb{R}^{D \times K}$ followed by a non-linear activation function $\rho$. To encode $z$, we employ architectures incorporating fully connected layers and graph convolutional layers (Wu et al., 2019). This process can be further enhanced by integrating more complex message-passing mechanisms or MLP operations. For example, the message passing could initiate from an MLP-transformed node attribute matrix, MLP(X), or incorporate more sophisticated architectures beyond simple graph convolutions for increased expressiveness (Xu et al., 2019; Velickovic et al., 2017).
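The head described above reduces to a linear map plus a non-negative activation followed by a dot-product decoder. A hypothetical minimal sketch (names and shapes are our assumptions, not the paper's code):

```python
import numpy as np

def relu(x):
    """Non-negative activation rho, keeping all features h_d(u) >= 0."""
    return np.maximum(0.0, x)

def disene_head(Z, W):
    """Map intermediate embeddings Z (|V| x D) through a linear layer
    W (D x K) and a non-negative activation to final features H (|V| x K)."""
    return relu(Z @ W)

def edge_likelihood(H, u, v):
    """Link probability sigma(h(u)^T h(v)) from the final features."""
    return 1.0 / (1.0 + np.exp(-(H[u] @ H[v])))
```

With orthogonal embeddings for two nodes, the predicted link probability is exactly 0.5 (the sigmoid of a zero dot product).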
The embeddings are optimized by combining the previously described objective functions for preserving structural faithfulness and achieving structural disentanglement, thereby improving dimensional interpretability. To avoid degenerate disentanglement solutions, we introduce a regularization strategy. Specifically, we aim to avoid the emergence of empty clusters, characterized by near-zero columns in $F$ that, while orthogonal to the others, fail to convey meaningful information. This regularization ensures a minimal but significant level of connectivity within each topological substructure. Specifically, we enforce the total amount of contributions to predicted edges in each anchor subgraph $G_k$, $\sum_{u,v \in V} \phi_k(u, v; h)$, to be non-zero. We found a more stable and precise approach is to enforce that the aggregated node features of each embedding dimension are non-zero, achieved by maximizing the entropy² $\mathcal{H} = -\sum_{d=1}^{K} p_d \log p_d$ of the normalized per-dimension feature mass $p_d \propto \sum_{u \in V} h_d(u)$. The model is optimized by minimizing the following comprehensive loss function:

$$\mathcal{L} = \mathcal{L}_{rw} + \mathcal{L}_{dis} - \lambda_{ent} \mathcal{H}$$

The hyperparameter $\lambda_{ent}$ determines the strength of the regularization, controlling the stability of explanation subgraph sizes across the various latent dimensions. We report the pseudo-code of DiSeNE in Appendix B, along with a space-time complexity analysis, showing that our method has $O(|E|K + |V|K^2 + |V|KTL)$ runtime complexity and $O(|V|K + |V|L)$ space complexity ($T$ refers to the window size, $L$ to the simulated walk length), which is in line with well-established techniques for node embeddings (Tsitsulin et al., 2021). Our approach deviates from typical GNNs by focusing solely on learning from graph topology, as node attributes may not always align with structural information (e.g., in cases of heterophily (Zhu et al., 2024)). While semantic features can be integrated in various ways (Tan et al., 2024), we chose a more straightforward, broadly applicable method.
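The entropy regularizer and the combined objective can be sketched as follows, assuming (as footnote 2 suggests) that the entropy is taken over the normalized per-dimension feature mass and enters the loss with a minus sign because it is maximized:

```python
import numpy as np

def dimension_mass_entropy(H, eps=1e-12):
    """Entropy of the normalized per-dimension feature mass. Maximizing it
    keeps every dimension's aggregate activation away from zero, preventing
    empty explanation subgraphs. Assumes non-negative activations (ReLU)."""
    mass = H.sum(axis=0)                    # total activation per dimension
    p = mass / (mass.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def total_loss(l_rw, l_dis, H, lam_ent=0.1):
    """Combined objective sketch: L = L_rw + L_dis - lam_ent * entropy."""
    return l_rw + l_dis - lam_ent * dimension_mass_entropy(H)
```

A uniform activation mass attains the maximum entropy log K, while a dimension with zero mass drives the entropy toward its minimum.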
Our node embeddings produce interpretable structural features $h(u)$, which can be concatenated with node semantic attributes $x(u)$, enabling transparent feature sets for effective explanations in downstream tasks via post-hoc tabular techniques.

3.3 Proposed Evaluation Metrics

In the following, we present novel metrics to quantify interpretability and disentanglement in unsupervised node embeddings, which we use to compare models in our experiments. Unlike prior works, which primarily focus on explaining graph model decisions, our approach offers a novel perspective by targeting the explanation of graph model encodings. Traditional graph-based explainers focus on interpreting GNN model decisions for specific tasks, highlighting subgraphs or motifs responsible for the predictions (Longa et al., 2025; Li et al., 2025). In contrast, our objective is to assess the interpretability of GNN models in a task-agnostic setting, by evaluating explanatory subgraphs derived from node embedding dimensions that are learned without supervision. Next, we define our comprehensive vocabulary of evaluation metrics. While alternative taxonomies exist in the literature, we propose a consistent framework highlighting similarities and differences between our indicators and other established metrics in the field of graph-based explainability (Yuan et al., 2023; Kakkad et al., 2023).

²We assume non-negative activation functions $\rho$ (e.g., ReLU), thus every post-activation feature $h_d(u)$ is itself non-negative. Therefore, their sum $\sum_d h_d(u)$ is non-negative as well, and an absolute-value operator in the entropy mass term is unnecessary.

Topological Alignment. This metric measures how well the explanations of embedding dimensions align with human-interpretable graph structures.
Given their importance in the organization of complex real-world systems (Girvan & Newman, 2002; Hric et al., 2014), community modules (or clusters) are often the most intuitive units for understanding a graph. We evaluate topological alignment by treating the edges in the explanation masks $\{M^{(d)}\}_{d=1,\ldots,K}$ as items retrieved by a query, and measuring their overlap with the edges in the ground-truth communities using precision, recall, and F1-score. While this metric captures one aspect of human-comprehensibility, it does not exhaust the space of interpretable structures, such as motifs, roles, or domain-specific patterns. For example, in biological networks, structures like protein complexes or regulatory circuits are often considered more interpretable. In social networks, roles like hubs, bridges, or peripheral nodes may give better explanations than communities alone. Future work could incorporate alternative structural annotations to evaluate broader forms of topological alignment. Let $\mathcal{C}(E) = \{C^{(1)}, \ldots, C^{(m)}\}$ denote the set of truthful link communities of the input graph³. Associated with each partition $C^{(i)}$, we define ground-truth edge masks $C^{(i)} \in \{0, 1\}^{|V| \times |V|}$ with binary entries $C^{(i)}_{uv} = [(u, v) \in C^{(i)}]$. For a given mask $M^{(d)}$, the topological alignment score is given by the maximum edge overlap (as F1-score) of the explanation substructure computed across the community structures in $\mathcal{C}(E)$, quantifying how closely the explanations match a human-understandable ground-truth:

$$\mathrm{Align}(M^{(d)}) = \max_i \left\{ F_1(M^{(d)}, C^{(i)}) \right\} = \max_i \frac{2}{\mathrm{prec}(M^{(d)}, C^{(i)})^{-1} + \mathrm{rec}(M^{(d)}, C^{(i)})^{-1}}$$

For precision, we weigh relevant item scores with normalized embedding mask values: $\mathrm{prec}(M^{(d)}, C^{(i)}) = \frac{\sum_{u,v} M^{(d)}_{uv} C^{(i)}_{uv}}{\sum_{u,v} M^{(d)}_{uv}}$. For recall, we weigh binarized embedding mask values with normalized ground-truth scores⁴: $\mathrm{rec}(M^{(d)}, C^{(i)}) = \frac{\sum_{u,v} [M^{(d)}_{uv} > 0]\, C^{(i)}_{uv}}{|C^{(i)}|}$.
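The alignment score above admits a short implementation; a sketch with our own function names, where masks and community indicators are dense symmetric arrays:

```python
import numpy as np

def alignment(M, communities):
    """Topological alignment of a soft explanation mask M (|V| x |V|)
    against binary ground-truth community masks: the best F1 over all
    communities, with weighted precision and binarized recall."""
    best = 0.0
    for C in communities:
        prec = (M * C).sum() / M.sum()          # weighted by soft mask values
        rec = ((M > 0) * C).sum() / C.sum()     # binarized mask vs |C^(i)|
        if prec + rec > 0:                      # skip disjoint communities
            best = max(best, 2 * prec * rec / (prec + rec))
    return best
```

A mask matching a community exactly scores 1; a mask spreading over two communities is penalized in precision.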
This approach is similar to the accuracy assessment used in GNNExplainer and subsequent works (Ying et al., 2019; Luo et al., 2020), where explanations for model decisions are compared to planted graph substructures, perceived as human-readable justifications for a node's prediction. However, instead of evaluating individual decision explanations by using synthetic ground-truth motifs, we perform a global assessment of the unsupervised embedding dimensions by comparing their explanation subgraphs with a reasonable ground-truth about the graph structure. An analogous evaluation of explanation correctness, which specifically focuses on individual model decisions, will be provided later by the plausibility metric.

Sparsity. We refer to sparsity as a measure of the localization of subgraph explanations; it is generally defined as the ratio of the number of bits needed to encode an explanation compared to those required to encode the input (Pope et al., 2019; Funke et al., 2023). As we produce soft masks, we use entropy for this quantification. Given that compact explanations are more effective in delivering clear insights, we evaluate sparsity by measuring the normalized Shannon entropy over the mask distribution:

$$\mathrm{Sp}(M^{(d)}) = -\frac{1}{\log |E|} \sum_{u,v} \frac{M^{(d)}_{uv}}{\sum_{u',v'} M^{(d)}_{u'v'}} \log \frac{M^{(d)}_{uv}}{\sum_{u',v'} M^{(d)}_{u'v'}}$$

A lower entropy in the mask distribution indicates higher sparsity/compactness. Motivation for using entropy to quantify explanation size can be found in existing work (Funke et al., 2023).

Overlap Consistency. This metric assesses whether the correlation between two embedding dimensions is mirrored in their corresponding explanations. A well-structured, disentangled latent space should correspond to distinct, uncorrelated topological structures. For example, if two embedding dimensions are correlated, their explanation substructures should also overlap. In our context, this is a proxy measure for explanation contrastivity (Wang et al., 2023; Pope et al., 2019), based on the intuition that explanations for unrelated dimensions should differ substantially, particularly in the specific substructures they highlight. We aim to quantify how different topological components affect pairwise feature correlations in the latent space. To achieve this, we propose a metric that measures the strength of association between the physical overlap of the explanation substructures $\{G_d\}$ and the correlation among the corresponding latent dimensions $\{H_{:,d}\}$. We compute the overlap between two subgraph components using the Jaccard similarity index of their edge sets from Eq. (2): $J(d, l; h) = \frac{|E_d \cap E_l|}{|E_d \cup E_l|}$. The overlap consistency (OvC) metric measures the linear correlation between pairwise Jaccard values and squared Pearson correlation coefficients ($\rho^2$) of the embedding features: OvC(h) = ρ [J(d, l; h)]d

³Synthetic graphs can be constructed with ground-truth relevant sub-structures (like BA-Shapes (Ying et al., 2019) or SBM graphs). In real-world graphs, it is usually reasonable to assume that the community structure (Fortunato, 2010) can serve as ground-truth.

⁴For the precision, we normalize with the sum of scores because they are continuous. For recall, we use the cardinality in place of the sum because the ground-truth has binary scores.
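Both the sparsity and overlap consistency metrics admit short implementations; a NumPy sketch under our own naming assumptions:

```python
import numpy as np

def sparsity(M, n_edges):
    """Normalized Shannon entropy of the soft mask distribution;
    lower values indicate a more compact (sparser) explanation."""
    p = M[M > 0].astype(float).ravel()
    p = p / p.sum()
    return float(-np.sum(p * np.log(p)) / np.log(n_edges))

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def overlap_consistency(edge_sets, H):
    """Pearson correlation between pairwise Jaccard overlaps of the
    explanation edge sets and squared Pearson correlations of the
    corresponding embedding columns."""
    K = H.shape[1]
    xs, ys = [], []
    for d in range(K):
        for l in range(d + 1, K):
            xs.append(jaccard(edge_sets[d], edge_sets[l]))
            ys.append(np.corrcoef(H[:, d], H[:, l])[0, 1] ** 2)
    return float(np.corrcoef(xs, ys)[0, 1])
```

A mask spread uniformly over all edges has sparsity 1, a single-edge mask has sparsity 0, and overlap consistency approaches 1 when correlated dimensions share overlapping explanation edge sets.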