# commute_graph_neural_networks__f262aec7.pdf

Commute Graph Neural Networks

Wei Zhuo 1 2 Han Yu 2 Guang Tan 1 Xiaoxiao Li 3 4

Graph Neural Networks (GNNs) have shown remarkable success in learning from graphstructured data. However, their application to directed graphs (digraphs) presents unique challenges, primarily due to the inherent asymmetry in node relationships. Traditional GNNs are adept at capturing unidirectional relations but fall short in encoding the mutual path dependencies between nodes, such as asymmetrical shortest paths typically found in digraphs. Recognizing this gap, we introduce Commute Graph Neural Networks (CGNN), an approach that seamlessly integrates node-wise commute time into the message passing scheme. The cornerstone of CGNN is an efficient method for computing commute time using a newly formulated digraph Laplacian. Commute time is then integrated into the neighborhood aggregation process, with neighbor contributions weighted according to their respective commute time to the central node in each layer. It enables CGNN to directly capture the mutual, asymmetric relationships in digraphs. Extensive experiments on 8 benchmarking datasets confirm the superiority of CGNN against 13 state-of-the-art methods.

1. Introduction

Directed graphs (digraphs) are widely employed to model relational structures in diverse domains, such as social networks (Cross et al., 2001) and recommendation systems (Qiu et al., 2020). Recently, the advances of graph neural networks (GNNs) have inspired various attempts to adopt GNNs for analyzing digraphs (Tong et al., 2020a;b; 2021; Zhang et al., 2021; Rossi et al., 2023; Geisler et al.,

Most of this work was done when Wei Zhuo <wei.zhuo@ntu.edu.sg> was with Shenzhen Campus of Sun Yat-sen University. 1Shenzhen Campus of Sun Yat-sen University, China 2Nanyang Technological University, Singapore 3The University of British Columbia, Canada 4Vector Institute, Canada. Correspondence to: Guang Tan <tanguang@mail.sysu.edu.cn>.

Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

Digraph Undirected

asymmetric relationships symmetric relationships

Figure 1: A digraph and its undirected counterpart. Blue arrows indicate unidirectional paths, together with longer paths in the gray area, forming commute closed loops between the central node vi and its outgoing neighbors vj and vk. In the undirected graph, shortest path distances (SPD) between nodes are symmetric. However, in the digraph, the fact that unidirectional SPDs are equal does not imply that mutual SPDs will also be equal. For instance, while the SPDs from vi to vj and vk are identical, the reverse SPD from vj and vk back to vi do not necessarily match these distances.

2023). The essence of GNN-based digraph analysis lies in utilizing GNNs to learn expressive node representations that encode edge direction information.

To achieve this, modern digraph neural networks are designed to integrate edge direction information into the message passing process by distinguishing between incoming and outgoing edges. This distinction enables the central node to learn directionally discriminative information from its neighbors. As illustrated in the digraph of Figure 1, given a central node vi, a 1-layer digraph neural network can aggregate messages from vi s incoming neighbor vm and outgoing neighbor vj, and simultaneously capture edge directions by applying direction-specific aggregation functions (Rossi et al., 2023), or by predefining edge-specific weights (Zhang et al., 2021; Tong et al., 2020b).

Despite the advancements, current digraph neural networks primarily capture unidirectional1 relationships between nodes, neglecting the complexity arising from path asymme-

1 unidirectional refers to relationships in digraphs where edges have a specific direction from one node to another.

Commute Graph Neural Networks

try. For instance, a k-layer GNN aggregates the neighbors within the shortest path k for the central node. If the graph is undirected, the shortest path between any two nodes is symmetric, as shown in the undirected graph of Figure 1. This symmetry simplifies the representation of node relationships, implying that if the shortest path distances (SPDs) from one node to two other nodes are identical, then the SPDs from these two nodes back to the source node must also be the same. Conversely, such symmetry is absent in digraphs. Considering the digraph in Figure 1, the shortest paths between vi and vj are asymmetric. Therefore, although vj and vk are both immediate outgoing neighbors of vi, the strength of their relationships with the central node differs significantly. Existing methods (Rossi et al., 2023; Tong et al., 2020b; Zhang et al., 2021), by focusing solely on unidirectional shortest paths (blue and red arrows), fail to capture the asymmetry phenomenon, which conveys valuable information of node relationships. Take social networks as an example: an ordinary user can directly follow a celebrity, yielding a short path to the celebrity, yet the reverse path from the celebrity to the follower might be much longer. Considering only the short path from the follower to celebrity could falsely suggest a level of closeness that does not exist. In contrast, accounting for the mutual paths between users yields a more precise and robust measure of their relationship, with stronger mutual interactions implying stronger connections.

To capture the mutual path interactions in GNNs, we adapt the concept of commute time, the expected number of steps to traverse from a source node to a target and back, from the Markov chain theory to the domain of graph learning. To this end, we first generalize the graph Laplacian to the digraph by defining the divergence of the gradient on the digraph. Utilizing this digraph-specific Laplacian, we develop an efficient method to compute commute time, ensuring sparsity and computational feasibility. Then we incorporate the commute-time-based proximity measure into the message passing process by assigning aggregation weights to neighbors. The intuition behind is that the immediate and unidirectional neighboring relationships do not necessarily imply strong similarity, but the mutual proximity is a more reliable indicator of relationship closeness. Our experimental results demonstrate the efficacy of CGNN. Particularly, compared with the best-performing baseline, it achieves an average improvement of 2.64% and 4.17% in accuracy on the Squirrel and Citeseer datasets, respectively.

2. Preliminary

Notations Consider G = (V, E, X) as an unweighted digraph comprising N nodes, where V = {vi}N i=1 is the node set, E (V V ) is the edge set with size M, X RN d

is the node feature matrix. Y = {y1, , y N} is the set

of labels for V . Let A RN N be the adjacency matrix and D = diag(d1, , d N) RN N be the degree matrix of A, where di = P

vj V Aij is the out-degree of vi.

Let A = A + I and D = D + I denote the adjacency and degree matrix with self-loops, respectively. The transition probability matrix of the Markov chain associated with random walks on G is defined as P = D 1A, where Pij = Aij/deg(vi) is the probability of a 1-step random walk starting from vi to vj. Graph Laplacian formalized as L = D A is defined on the undirected graph whose adjacency matrix is symmetric. The symmetrically normalized Laplacian with self-loops (Wu et al., 2019) is defined as ˆL = D 1

2 , where L = D A.

Digraph Neural Networks Dir GNN (Rossi et al., 2023) is a general framework that generalizes the message passing paradigm to digraphs by adapting to the directionality of edges. It involves separate aggregation processes for incoming and outgoing neighbors of each node as follows:

m(ℓ) i,in = Agg(ℓ) in n h(ℓ 1) j : vj N in i o

m(ℓ) i,out = Agg(ℓ) out n h(ℓ 1) j : vj N out i o

h(l) i = Comb(ℓ) h(ℓ 1) i , m(ℓ) i,in, m(ℓ) i,out ,

where N in i and N out i are respectively incoming and outgoing neighbors of vi. Agg(ℓ) in ( ) and Agg(ℓ) out( ) are specialized aggregation functions of N in i and N out i at layer ℓ, used to encode the directional characteristics of the edges connected to vi.

3. Random Walk Distance and GNNs

Based on the established notations, we then show that message passing based GNNs naturally capture the concept of hitting time during information propagation across the graph, due to the unidirectional nature of the neighborhood aggregation. Subsequently, we argue for the significance of commute time, highlighting it as a more compact measure of mutual node-wise interactions in random walks.

3.1. Can GNNs Capture Random Walk Distance?

In the context of random walks on a digraph, hitting time and commute time, collectively referred to as random walk distances, serve as key metrics for assessing node connectivity and interaction strength. Hitting time h(vi, vj) is the expected number of steps a random walk takes to reach a specific target node vj for the first time, starting from a given source node vi. Commute time c(vi, vj) is the expected number of steps required for a random walk to start at vi, reach vj, and come back. A high hitting (commute) time indicates difficulty in achieving unidirectional (mutual)

Commute Graph Neural Networks

visits to each other in a random walk. As illustrated in the digraph of Figure 1, commute time c(vi, vj) > c(vi, vk), while the hitting time h(vm, vi) = h(vi, vj) = h(vi, vk).

Motivation Given these definitions, two questions arise: How crucial is it to retain these measures in graph learning? Also, are message-passing GNNs capable of preserving these characteristics? Firstly, both hitting time and commute time are critical in understanding the structural dynamics of graphs. Hitting time, analogous to the shortest path, measures the cost of reaching one node from another, reflecting the directed influence or connectivity. Commute time, encompassing the round-trip journey, offers insights into the mutual relationships between nodes, which is especially evident in social networks, as illustrated by celebrity-follower relationships. Secondly, message-passing GNNs are somewhat effective in capturing hitting time, as they propagate information across the graph in a manner similar to a random walk, where quickly reached nodes are preferentially aggregated, and the influence of nodes exponentially diminishes with increasing distance (Topping et al., 2022). However, GNNs face challenges in preserving commute time due to their requirement for comprehending mutual path relations, which are inherently asymmetric and often involve longerrange interactions especially in digraphs, which are not naturally captured in the basic message-passing framework.

Taking the digraph in Figure 1 as an example, a 1-layer Dir GNN defined in Eq. (1) can encode vm, vj and vk into the representation of vi, while also capturing the directionality of edges from these neighbors by using distinct aggregation functions for incoming and outgoing neighbors. It shows that Dir GNN can capture the hitting time, as neighbors with lower hitting times, h(vi, vk), h(vi, vj) and h(vm, vi), are aggregated preferentially. However, Dir GNN inherently focuses on unidirectional interactions and overlooks mutual path dependencies. Notably, a 1-layer Dir GNN is insufficient for capturing the asymmetric interactions indicated by the paths from vj and vk returning to the central node vi (gray areas). One potential approach to address this limitation is to stack additional message passing layers to encompass the entire commute path between nodes, thereby capturing mutual path interactions. Nevertheless, this strategy is non-trivial because the commute paths vary considerably across different node pairs, complicating the determination of an appropriate number of layers. Additionally, stacking multiple layers to cover these paths can introduce irrelevant non-local information and lead to oversmoothing.

Goal We expect to directly encode node-wise commute time into the node representations to accurately reflect the true interaction strength between adjacent nodes during neighbor aggregation, accounting for both forward and back-

ward paths. For instance, even though h(vi, vj) = h(vi, vk), a shorter commute time c(vi, vk) < c(vi, vj) suggests a stronger interaction from vk to vi compared to vj to vi. Consequently, the contribution of neighbor vk to the representation of vi should be greater than that of vj.

3.2. Commute Time Computation

Based on the standard Markov chain theory, a useful tool to study random walk distances is the fundamental matrix (Aldous & Fill, 2002). We first establish the following assumptions required to support the theorem.

Assumption 3.1. The digraph G is irreducible and aperiodic.

These two properties pertain to the Markov chain s stationary probability distribution π (i.e., Perron vector) corresponding to the given graph. Irreducibility ensures that it is possible to reach any node (state) from any other node, preventing π from converging to 0. Aperiodicity ensures that the Markov chain does not get trapped in cycles of a fixed length, thus guaranteeing the existence of a unique π. Existence and uniqueness of π facilitate deterministic analysis and computation. For a more intuitive understanding of the assumptions, we give the sufficient conditions of digraph under the irreducibility and aperiodicity assumptions.

Proposition 3.2. A strongly connected digraph, in which a directed path exists between every pair of vertices, is irreducible. A digraph with self-loops in each node is aperiodic.

Given the above assumption, the fundamental matrix Z is defined as the sum of an infinite matrix series:

Pt eπ , (2)

where e is the all-one column vector, then we have J = e e

is the all-one matrix, and Π = diag(π) is the diagonal matrix of π.

Theorem 3.3. (Li & Zhang, 2012) Given Assumption 3.1, the fundamental matrix Z defined in Eq. (2) converges to:

Z = (I P + JΠ) 1 JΠ, (3)

where I is an identity matrix.

The hitting time and commute time on G can then be expressed as Z (Aldous & Fill, 2002) as follows:

h(vi, vj) = Zjj Zij

πj , c(vi, vj) = h(vi, vj)+h(vj, vi).

However, directly calculating the complete fundamental matrix Z and the commute times for all node pairs is computationally expensive and yields a dense matrix. Moreover,

Commute Graph Neural Networks

integrating the random walk distances computation, defined in Eq. (3) and Eq. (4), into the message passing framework is non-trivial, which concerns the scalability of the model.

4. Commute Graph Neural Networks

In this section, we present CGNN to encode the commute time information into message passing. We first establish the relationship between random walk distances and the digraph Laplacian.

4.1. Digraph Laplacian (Di Lap)

Contrary to the traditional graph Laplacian, typically defined as a symmetric positive semi-definite matrix derived from the symmetric adjacency matrix, our proposed Di Lap is built upon the transition matrix to preserve the directed structure. Specifically, the classical graph Laplacian L = D A is interpreted as the divergence of the gradient of a signal on an undirected graph (Shuman et al., 2013; Hamilton, 2020): given a graph signal s RN, (Ls)i = P

j Ni Aij(si sj). Intuitively, graph Laplacian corresponds to the difference operator on the signal s, and acts as a node-wise measure of local smoothness. In line with this conceptual foundation, we generalize the graph Laplacian to digraphs by defining the divergence of the gradient on digraphs with Di Lap T:

Ts = GDs, T = Bdiag {Pij}M (vi,vj) E B (5)

where G is the gradient operator on graph signals, and D is the divergence operator. B RN M is an incidence matrix, where the dimensions represent nodes and edges, respectively. For edge indices {e1, , e M} E, if ek = (vi, vj) E, then the k-th column of B corresponding to ek has +1 in row i and 1 in row j. diag {Pij}M (vi,vj) E

is a diagonal matrix whose entries are the transition probabilities corresponding to the edges in the graph. The detailed derivation of T is included in Appendix A.1, which illustrates how T functions as a measure of smoothness in directed graphs, taking into account their directional properties. Although the structure of Di Lap depends on the indices of edges and nodes, such as the ordering of edge transition probabilities in diag {Pij}M (vi,vj) E , the following property holds (for proof, see Appendix A.2).

Proposition 4.1. Di Lap T is permutation equivariant w.r.t. node indices and permutation invariant w.r.t. edge indices.

Given the Laplacian operator s role in assessing signal smoothness throughout the graph, it is essential to allocate greater weights to nodes of higher structural importance. This prioritization ensures that the smoothness at nodes central to the graph s structure more significantly influences the overall smoothness measurement. Thus, we further define

the Weighted Di Lap T :

(T s)i = (ΠGDs)i =πi

(Gs)(vj,vi) X

vj N out i (Gs)(vi,vj)

T = ΠT (6) Here we utilize the i-th element of the Perron vector π to quantify the structural importance of vi, reflecting its eigenvector centrality. This is based on the principle that a node s reachability is directly proportional to its corresponding value in the Perron vector (Xu et al., 2018). Therefore, π effectively indicates the centrality and influence over the long term in the graph. Perron-Frobenius Theorem (Horn & Johnson, 2012) establishes that π satisfies P

i πi = 1, is strictly positive, and converges to the left eigenvector of the dominant eigenvalue of P.

4.2. Similarity-based Graph Rewiring

Both the fundamental matrix defined in Eq. (3) and Weighted Di Lap requires Assumption 3.1 to ensure the existence and uniqueness of the Perron vector π, conditions that are not universally met in general graphs. To fulfill the irreducibility and aperiodicity assumptions, Tong et al. (2020a) introduce a teleporting probability uniformly distributed across all nodes. This method, inspired by Page Rank (Page et al., 1999), amends the transition matrix to Ppr = γP + (1 γ) ee

N , where γ (0, 1). Ppr allows for the possibility that a random walker may choose a nonneighbor node for the next step with a probability of 1 γ

N . This adjustment ensures that Ppr is irreducible and aperiodic, so it has a unique π. However, this approach leads to a complete graph represented by a dense matrix Ppr, posing significant challenges for subsequent computational processes.

Rather than employing Ppr as the transition matrix, we introduce a graph rewiring method based on feature similarity to make a given graph irreducible, while maintaining the sparsity. As outlined in Proposition 3.2, to transform the digraph into a strongly connected structure, it is essential that each node possesses a directed path to every other node. To this end, we initially construct a simple and irreducible graph G with all N nodes, then add all edges from G to the original digraph G, thereby ensuring G s irreducibility. The construction of G begins with the calculation of the mean of node features as the anchor vector a. Then we determine the similarity between each node and the anchor, sort the similarity values, and return the sorted node indices, denoted as Ω RN:

i Xi N , ωi = cos(a, Xi), S = arg sort({ωi}N i=1) (7) where cos(a, Xi) is the cosine similarity between node fea-

Commute Graph Neural Networks

Add edges to

Figure 2: The sorted node indices in Ωare connected one by one with undirected edges to construct G , then adding all edges from G to G to generate e G.

tures of vi and a, and argsort( ) yields the indices of nodes that sort similarity values {ωi}N i=1. We then connect the nodes one by one with undirected (bidirectional) edges following the order in S to construct G , as shown in Figure 2. Given that G is strongly connected, adding all its edges into G results in a strongly connected digraph e G, which is irreducible. To achieve aperiodicity, self-loops are further added to e G. This rewiring approach satisfies Assumption 3.1 and maintains graph sparsity. Additionally, adding edges between nodes with similar features only minimally alters the overall semantics of the original graph. Based on e G and its corresponding e P, e B, and eΠ, we have the deterministic Weighted Di Lap e T .

Note that graph rewiring is a common strategy in graph analysis research. For instance, (Attali et al., 2024) rewires graphs to eliminate negatively curved edges and thus alleviate over-squashing, while (Rubio-Madrigal et al., 2025) modifies the graph to enlarge the spectral gap and, in turn, adjust community structure for the same purpose. In contrast, we rewire the graph to ensure irreducibility and aperiodicity, which guarantees that commute times are uniquely and deterministically defined. Hence, both the motivation and the objective of our approach differ fundamentally from previous work.

4.3. From Di Lap to Commute Time

Given the Weighted Di Lap e T , we can unify the commute time information into the message passing by building the connection between e T and the fundamental matrix Z:

Lemma 4.2. Given a rewired graph e G, the Weighted Di Lap is defined as e T =

eΠe Bdiag n e Pij o M

e B . Then the fundamental

matrix Z of e G can be solved by:

Z = e T eΠ = e T , (8)

where the superscript means Moore Penrose pseudoinverse of the matrix.

Leveraging Lemma 4.2 and using Eq. (4), we can further compute the hitting times and commute times in terms of e T with the following theorem.

Theorem 4.3. Given e G, the hitting time and commute time from vi to vj on e G can be computed as follows:

h(vi, vj) = e T jj πj e T ij πiπj ,

c(vi, vj) = e T jj πj + e T ii πi e T ij πiπj e T ji πiπj .

Then we can derive the matrix forms of the hitting time H and commute time C as per Eq. (9):

H = (e π 1)(e T I) e T (π 1

C = H + H (10)

where denotes Hadamard product, and is outer product. Hij and Cij correspond to the hitting and commute time from vi to vj respectively. The computation of commute times via Di Lap, in contrast to the method delineated in Theorem 3.3, is primarily motivated by efficiency concerns. Specifically, Eq. (3) necessitates the inversion of a dense matrix with complexity O(N 3), whereas our Di Lap-based method hinges on computing the pseudoinverse of a sparse matrix e T. The pseudoinverse of e T can be efficiently determined using SVD. Given the sparse nature of e T, we can employ well-established techniques such as the randomized truncated SVD algorithm (Halko et al., 2011; Cai et al., 2023), which takes advantage of sparsity, to reduce the time complexity to O(q|E|), where |E| denotes the number of edges reflecting the sparsity (See Appendix A.4). Next, we present CGNN based on C.

Connection with Over-squashing Commute time has been used to analyze the cause of over-squashing (Di Giovanni et al., 2023; Arnaiz-Rodr ıguez et al., 2022), and extending the analytical framework to digraphs is a promising direction. While our work primarily focuses on using commute time to encode the directional strength of relationships between nodes, rather than studying the over-squashing problem, we still see two ways that our method might enhance over-squashing analysis: 1) Direction-Aware Jacobian: Di Giovanni et al. (2023) model over-squashing with a symmetric Jacobian suitable for undirected graphs. In contrast, our model captures the directionality of edges, allowing us to construct an asymmetric Jacobian for more targeted analysis in digraphs. 2) Quantifying Long Commutes : while previous over-squashing work identifies that over-squashing tends to occur between nodes that are far apart, it does not quantify exactly how large a commute time triggers over-squashing. Our method, anchored by the Di Lap-based commute time computation, can explicitly quantify these distances.

Commute Graph Neural Networks

C RN N quantifies the strength of mutual relations between node pairs in the random walk context. Notably, smaller values in C correspond to stronger mutual reachability, indicating stronger relations between node pairs. Thus, C is a positive symmetric matrix, and the commute-time-based node proximity matrix can be expressed as e C = exp( C). Since the directed adjacency matrix A represents the outgoing edges of each node, A therefore accounts for all incoming edges. Then we have e Cout = A e C and e Cin = A e C represent the proximity between adjacent nodes under outgoing and incoming edges, respectively. We further perform row-wise max-normalization on e Cout and e Cin to rescale the maximum value in each row to 1. Given the original graph G as input, we define the ℓ-th layer of CGNN as:

m(ℓ) i,in = Agg(ℓ) in n e Cin ij h(ℓ 1) j : vj N in i o

m(ℓ) i,out = Agg(ℓ) out n e Cout ij h(ℓ 1) j : vj N out i o

h(l) i = Comb(ℓ) h(ℓ 1) i , m(ℓ) i,in, m(ℓ) i,out ,

where Agg(ℓ) in ( ) and Agg(ℓ) out( ) are mean aggregation functions with different feature transformation weights, and Comb(ℓ)( ) is a mean operator. Within each layer, the influence of vj on the central node vi is modulated by the commute-time-based proximity e C based on the edge directionality. The pseudocode of CGNN is shown in Algorithm 1.

Complexity Analysis The randomized truncated SVD to compute e T is O(q|E|) where q is the required rank, and the message passing iteration has the same time complexity as Dir GNN with O(L|E|d2). Therefore, the overall time complexity of CGNN is O((Ld2 + q)|E|). In practice, q is typically set to 5, rendering the time complexity effectively linear with respect to the number of edges |E|. In GNN domain, models with a complexity less than O(N 2) are generally considered feasible by researchers (Wu et al., 2020). Given that real-world networks are often extremely sparse, i.e., |E| N 2, CGNN demonstrates its feasibility as a model within the GNN family.

5. Experiments

We conduct extensive experiments to evaluate the effectiveness of CGNN on eight digraph datasets. Experimental details and data statistics are provided in Appendix C.1 and Appendix C.2.

5.1. Overall Results and Analysis

Table 1 reports the node classification results across eight digraph datasets. Our method CGNN achieves new state-ofthe-art results on 6 out of 8 datasets, and comparable results

Chameleon Squirrel Dataset

(A + A ) 2 2 ( in + out) 2 2

(a) Heterophilic graph.

Cora ML Citeseer Dataset

(A + A ) 2 2 ( in + out) 2 2

(b) Homophilic graph.

Figure 3: Distance between M and A, and between M and e C.

on the remaining ones, validating the superiority of CGNN. We provide more observations as follows. Firstly, while nonlocal GNNs have the potential to cover the commute paths between adjacent nodes by stacking multiple layers, they do not consistently outperform general, shallow GNN models. It suggests that coarsely aggregating all nodes in commute paths is ineffective. The reason is that deeper models may aggregate excessive irrelevant information for the central node. Our goal is to encode mutual relationships between adjacent nodes by considering their commute times. Aggregating all nodes along the entire path introduces excessive information about other nodes unrelated to the direct relationship between the target nodes. Secondly, GNNs tailored for digraphs do not seem to bring substantial gains. Our results show that with careful hyper-parameter tuning, general GNNs can achieve results comparable to, or even better than, some of GNNs tailored for digraphs (Di GCN, Mag Net and Di GCL), as evidenced in the Squirrel, Chameleon, and AM-Photo datasets. Thirdly, CGNN achieves state-of-the-art results on both homophilic and heterophilic digraph benchmarks. Notably, Dir GNN also performs comparably on heterophilic graphs (e.g., Squirrel and Chameleon), confirming the findings of Rossi et al. (2023) that distinguishing edge directionality during message passing enables the central node to adaptively balance information flows from both heterophilic and homophilic neighbors, effectively mitigating the impact of heterophily. Moreover, CGNN, an enhanced version of Dir GNN, further improves performance on these graphs by effectively incorporating commute times to refine the strength of relationships between nodes, enhancing model robustness under heterophily.

To illustrate this, we further examine the relations between commute-time-based proximity and label similarity along edges. As shown in Eq. (11), we use commute-time-based proximity e C to weigh the neighbors during neighbor aggregation. Then we define a label similarity matrix M where Mij = 1 if vj Ni and yi = yj; otherwise Mij = 0. Essentially, M extracts the edges connecting nodes with the same classes from the adjacency matrix A. Thus a higher value of M (A + A ) 2 2 indicates a more pronounced

Commute Graph Neural Networks

Table 1: Node classification results. We highlight/underline the best/second-best method. For general GNN and non-local GNN baselines, we conduct experiments on both symmetrized versions and their directed counterparts, reporting better results from these two settings. OOM indicates out-of-memory. In Table 9 and Table 10 of Appendix D.1, we present detailed experimental results for both directed and undirected settings of all available baselines.

Method Squirrel Chameleon Citeseer Cora ML AM-Photo Snap-Patents Roman-Empire Arxiv-Year

GCN 52.43 2.01 67.96 1.82 66.03 1.88 70.92 0.39 88.52 0.47 51.02 0.06 73.69 0.74 46.02 0.26 GAT 40.72 1.55 60.69 1.95 65.58 1.39 72.22 0.57 88.36 1.25 OOM 49.18 1.35 45.30 0.23 Graph SAGE 41.61 0.74 62.01 1.06 66.81 1.38 74.16 1.55 89.71 0.57 67.45 0.53 86.37 0.80 55.43 0.75

APPNP 51.91 0.56 45.37 1.62 66.90 1.82 70.31 0.67 87.43 0.98 51.23 0.54 72.96 0.38 50.31 0.42 Mix Hop 43.80 1.48 60.50 2.53 56.09 2.08 65.89 1.50 87.17 1.34 41.22 0.19 50.76 0.14 45.30 0.26 GPRGNN 50.56 1.51 66.31 2.05 61.74 1.87 73.31 1.37 90.23 0.34 40.19 0.03 64.85 0.27 45.07 0.21 GCNII 38.47 1.58 63.86 3.04 58.32 1.93 64.84 0.71 83.40 0.79 48.09 0.09 74.27 0.13 57.36 0.17

DGCN 37.16 1.72 50.77 3.31 66.37 1.93 75.02 0.50 87.74 1.02 OOM 51.92 0.43 OOM Di GCN 33.44 2.07 50.37 4.31 64.99 1.72 77.03 0.70 88.66 0.51 OOM 52.71 0.32 48.37 0.19 Mag Net 39.01 1.93 58.22 2.87 65.04 0.47 76.32 0.10 86.80 0.65 OOM 88.07 0.27 60.29 0.27 DUPLEX 57.60 0.98 61.25 0.94 67.60 0.72 72.26 0.71 87.80 0.82 66.54 0.11 79.02 0.08 64.37 0.27 Di GCL 35.82 1.73 56.45 2.77 67.42 0.14 77.53 0.14 89.41 0.11 70.65 0.07 87.94 0.10 63.10 0.06 Dir GNN 75.19 1.26 79.11 2.28 66.57 0.74 75.33 0.32 88.09 0.46 73.95 0.05 91.23 0.32 64.08 0.26

CGNN 77.83 1.52 79.62 2.33 71.59 0.16 77.08 0.54 90.42 0.10 72.89 0.24 92.87 0.45 66.16 0.32

negative impact of heterophily on the model s performance. On the other hand, we compute M ( e Cin + e Cout) 2 2 to evaluate the efficacy of e C in filtering heterophilic information. The closer ( e Cin + e Cout) is to M, the more effectively it aids the model in discarding irrelevant heterophilic information. Figure 3 visually demonstrates these relationships. We observe that in heterophilic datasets, the commute-time-based proximity matrix ( e Cin + e Cout), aligns more closely with the label similarity matrix M than (A + A ). It indicates that e C effectively filters out irrelevant information during message passing by appropriately weighting neighbors, thereby explaining CGNN s superior performance on heterophilic graphs as well as its strong results on homophilic graphs.

Application scope analysis. Can commute times always enhance message passing on directed graphs? To answer this, we analyze the scope of use for CGNN based on Table 1. For example, on the Snap-Patents and Cora ML dataset, we observed that adding commute time-based weights during message passing did not significantly enhance performance. Now we can analyze the reason from the perspective of dataset. Cora ML is a directed citation network where nodes predominantly link to other nodes within the same research area. However, in such networks, reciprocal citations between two papers are impossible due to their chronological sequence. Consequently, mutual path dependencies do not exist, and thus, incorporating commute times to adjust neighbor weights might (slightly) hurt performance. A similar situation exists with the Snap-Patents dataset, where each directed edge represents a citation from one patent to another, again indicating the absence of mutual path depen-

80 100 120 140 160 180 200 Running Time (s)

APPNP GPRGNN GCNII DGCN Di GCN Di GCL Mag Net CGNN

(a) Squirrel

100 120 140 160 180 200 Running Time (s)

APPNP GPRGNN GCNII DGCN Di GCN Di GCL Mag Net CGNN

(b) AM-Photo

Figure 4: Accuracy vs. running time.

In conclusion, in networks like citation networks where mutual relationships inherently do not exist, applying commute times is unnecessary. Our model is particularly effective in networks like webpage networks and social networks examples being Squirrel and AM-Photo where mutual relationships are prevalent. For instance, in a social network, an ordinary user may follow a celebrity, creating a short path to the celebrity. However, the reverse path from the celebrity back to the user might be considerably longer. For such cases, considering mutual relationships based on commute times can provide a more accurate description of node relationships.

5.2. Efficiency Comparsion

Figure 4 compares the accuracy of different models along with running times. Despite the additional computational load of calculating commute-time-based proximity, the re-

Commute Graph Neural Networks

sults show that CGNN provides the best trade-off between effectiveness and efficiency. In particular, on the Squirrel dataset, CGNN has the third-fastest calculation speed while yielding accuracy nearly double that of all other methods. On AM-Photo, CGNN achieves the highest accuracy while maintaining moderate efficiency.

5.3. Component Analysis

Comparison between graph rewiring and PPR. In Section 4.2, we construct a rewired graph e G based on feature similarity to guarantee the irreducibility and aperiodicity. This approach introduces at most two additional edges per node, specifically targeting those with the highest feature similarity, while minimally altering the original graph structure to preserve semantic information. To investigate changes in commute times before and after rewiring, for the original graph, we use its largest connected component, removing absorbing nodes (i.e., nodes without outgoing edges) to ensure that we can compute meaningful and deterministic commute times. We denote the average normalized commute time for the original graph as corig; For the rewired graph, we directly compute the commute time and denote this average normalized value as crew. We then use δ = corig crew 2

corig to quantify the changes, which can be intepretated as the proportion of commute information changed in the original graph. As shown in Table 2, the graph rewiring method can effectively preserve the original commute times of the graph.

Table 2: Changes in commute times before and after rewiring.

Cora ML Cite Seer Chameleon Roman-Empire

δ 0.03280 0.028746 0.00157 0.06242

In contrast, the classic Page Rank transition matrix, defined as Ppr = γP + (1 γ) ee

N , achieves a similar objective but results in a completely connected graph Gpr. However, this approach tends to overlook the sparse structure of the original graph, which may alter the semantic information in the graph. Additionally, computing commute times using a dense transition matrix incurs a high computational cost. To validate the effectiveness of the rewiring approach over the PPR method, we conduct an experiment where e G is replaced with Gpr in the computation of commute-timebased proximity. We denote this variant as CGNNppr and the results of accuracy and efficiency are reported in Table 3. The findings reveal that the PPR approach is suboptimal in terms of both accuracy and efficiency, thereby underscoring the effectiveness of our rewiring-based approach.

Directed vs. Undirected. To validate the critical role of directed structures in our model, we transform all directed

Table 3: Accuracy and running time (s) of CGNN and CGNNppr.

Squirrel Chameleon Citeseer Cora ML AM-Photo

Method Acc. Time Acc. Time Acc. Time Acc. Time Acc. Time

CGNN 77.83 99.25 79.62 115.77 71.59 89.25 77.08 125.20 90.42 124.17 CGNNppr 68.37 257.84 71.69 253.05 68.59 137.82 76.23 192.09 88.52 203.04

edges into undirected ones by adding their reverse counterparts. This process results in a symmetric adjacency matrix, denoted as Asym. Subsequently, the commute time is calculated based on the transition matrix derived from Asym. We refer this variant as CGNNsym . Table 4 shows the accuracy of CGNN and CGNNsym on three datasets. We find that edge direction can significantly influence the prediction accuracy for our model.

Table 4: Impact of directed structure.

Squirrel Cora ML AM-Photo

CGNN 77.83 77.08 90.42 CGNNsym 72.37 71.29 88.53

Impact of the SVD rank q. To efficiently compute the pseudoinverse of e T and obtain the fundamental matrix, we employ a randomized truncated SVD algorithm with rank q, detailed in Appendix A.4. We perform an ablation study to analyze the influence of the SVD rank q across three datasets of varying sizes in Table 5. The results indicate that increasing q beyond a certain point does not yield continuous performance gains and incurs additional computational cost. Our findings suggest that setting q = 5 or 7 is optimal, as a small rank is sufficient for nodes to capture neighbor relationship strengths.

Table 5: Impact of q.

AM-Photo Snap-Patent Arxiv-Year

3 88.36 71.53 64.20 5 90.42 72.89 66.16 7 90.19 73.01 66.21 10 90.37 72.97 66.45

Impact of rewiring on graph structure To explore how the rewiring procedure only minimally alters the overall semantics of the original graph, we define edge density as δ = M Mmax , where Mmax is the maximum possible number

of edges (N 2 for both G and e G) in the graph and M is the actual number of edges. We denote the edge density of the original graph G as δ and that of the rewired graph e G as eδ. Thus, the change of graph density after rewiring can be represented as = eδ δ

δ (0, 1), the smaller indicates that the less effect of our methods on graph density. In the Table 6 we calculate on AM-Photo, Snap-Patent

Commute Graph Neural Networks

and Arxiv-Year datasets. The results reveal that on the AMPhoto dataset, graph rewiring increases density by 10.3%, while on the Snap-Patent and Arxiv-Year datasets, the increases are only 6.7% and 3.2% respectively. These findings demonstrate that our rewiring method generally has a modest effect on graph density.

Table 6: Changes in graph structure before and after rewiring.

AM-Photo Snap-Patent Arxiv-Year

0.103 0.067 0.032

6. Related Work

6.1. Digraph Laplacian

While the Laplacian for undirected graphs has been extensively studied, the area of Laplace operator digraphs remains underexplored. Chung (2005) pioneers this area by defining a normalized Laplace operator specifically for strongly connected directed graphs with nonnegative weights. This operator is expressed as I π1/2Pπ 1/2+π 1/2P π1/2

2 . Key to this formulation is the use of the transition probability operator P and the Perron vector π, with the operator being self-adjoint. Building on the undirected graph Laplacian, Singh et al. (2016) adapt this concept to accommodate the directed structure, focusing particularly on the in-degree matrix. They define the directed graph Laplacian as Din A, where Din = diag din i N i=1

represents the in-degree matrix. Li & Zhang (2012) uses stationary probabilities of the Markov chain governing random walks on digraphs to define the Laplacian as π 1 2 (I P)π 1

2 , which underscores the importance of random walks and their stationary distributions in understanding digraph dynamics. Hermitian Laplacian (Furutani et al., 2020) consider the edge directionality and node connectivity separately, and encode the edge direction into the argument in the complex plane. Diverging from existing Laplacians, our proposed Di Lap is grounded in graph signal processing principles, conceptualized as the divergence of a signal s gradient on the digraph. It encompasses the degree matrix D to preserve local connectivity, the transition matrix P to maintain the graph s directed structure, and the diagonalized Perron vector Π, capturing critical global graph attributes such as node structural importance, global connectivity, and expected reachability (Chung, 1997).

6.2. Digraph Neural Networks

To effectively capture the directed structure with GNNs, spectral-based methods (Zhang et al., 2021; Tong et al., 2020a;b) have been proposed to preserve the underlying

spectral properties of the digraph by performing spectral analysis based on the digraph Laplacian proposed by Chung (2005). Koke & Cremers (2024) introduce holomorphic filters as spectral filters for digraphs, and investigate their optimal filter-bank. Mag Net (Zhang et al., 2021) and its extensions (Lin & Gao, 2023; Fiorini et al., 2023) utilize the magnetic Laplacian to derive a complex-valued Hermitian matrix to encode the asymmetric nature of digraphs. Spatial GNNs also offer a natural approach to capturing directed structures. Graph SAGE (Hamilton et al., 2017) allows for controlling the direction of information flow by considering in-neighbors or out-neighbors separately. Dir GNN (Rossi et al., 2023) further extends Graph SAGE by segregating neighbor aggregation according to edge directions, offering a more refined method to handle the directed nature of graphs. DUPLEX (Ke et al., 2024) utilizes Hermitian adjacency matrix decomposition for neighbor aggregation and incorporates a dual GAT encoder for modeling directional neighbors. Transformer-based methods capture directed structure by specific positional encoding modules, such as directional random walk encoding (Geisler et al., 2023) and partial order encoding (Luo et al., 2024).

7. Conclusion

Identifying and encoding asymmetric mutual path dependencies in digraphs is essential for accurately representing real relationships between entities. In this work, we utilize the concept of commute time to assess the strength of asymmetric relationships in digraphs and introduce CGNN to enhance node representations. To achieve this, we propose Di Lap, a novel Laplacian formulation derived from the divergence of the gradient of signals on digraphs, along with an efficient computational method for deterministic commute times. By integrating commute times into GNN message passing, CGNN effectively harnesses path asymmetry in digraphs, significantly improving learning performance, as validated through extensive experiments.

Limitations and Future work CGNN s primary limitation is its memory overhead. The commute time matrix C is dense by definition, incurring quadratic memory complexity relative to the number of nodes. A promising direction for future study is to compute and maintain commute times locally rather than globally. By restricting computations to each node s immediate neighborhood within a limited number of hops, we may substantially reduce the memory footprint while still capturing the essential structural information needed for effective message passing.

Acknowledgements

The research was supported in part by the Shenzhen Basic Research Fund (Grant No. JCYJ20241202130025030). W.

Commute Graph Neural Networks

Zhuo and H. Yu were supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 (RG101/24); the National Research Foundation, Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No. AISG2-RP-2020-019). X. Li was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-202400445087).

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

Abu-El-Haija, S., Perozzi, B., Kapoor, A., Alipourfard, N., Lerman, K., Harutyunyan, H., Ver Steeg, G., and Galstyan, A. Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In International Conference on Machine Learning, pp. 21 29. PMLR, 2019.

Aldous, D. and Fill, J. Reversible markov chains and random walks on graphs, 2002.

Arnaiz-Rodr ıguez, A., Begga, A., Escolano, F., and Oliver, N. M. Diff Wire: Inductive Graph Rewiring via the Lov asz Bound. In Proceedings of the First Learning on Graphs Conference, Proceedings of Machine Learning Research. PMLR, 2022.

Attali, H., Buscaldi, D., and Pernelle, N. Delaunay graph: Addressing over-squashing and over-smoothing using delaunay triangulation. In Forty-first International Conference on Machine Learning, 2024.

Bojchevski, A. and G unnemann, S. Deep gaussian embedding of attributed graphs: Unsupervised inductive learning via ranking. ar Xiv preprint ar Xiv:1707.03815, 2017.

Cai, X., Huang, C., Xia, L., and Ren, X. Lightgcl: Simple yet effective graph contrastive learning for recommendation. ar Xiv preprint ar Xiv:2302.08191, 2023.

Chen, M., Wei, Z., Huang, Z., Ding, B., and Li, Y. Simple and deep graph convolutional networks. In International conference on machine learning, pp. 1725 1735. PMLR, 2020.

Chien, E., Peng, J., Li, P., and Milenkovic, O. Adaptive universal generalized pagerank graph neural network. In International Conference on Learning Representations,

2021. URL https://openreview.net/forum? id=n6jl7f Lxr P.

Chung, F. Laplacians and the cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1 19, 2005.

Chung, F. R. Spectral graph theory, volume 92. American Mathematical Soc., 1997.

Cross, R., Borgatti, S. P., and Parker, A. Beyond answers: Dimensions of the advice network. Social networks, 23 (3):215 235, 2001.

Di Giovanni, F., Giusti, L., Barbero, F., Luise, G., Lio, P., and Bronstein, M. M. On over-squashing in message passing neural networks: The impact of width, depth, and topology. In International Conference on Machine Learning, pp. 7865 7885. PMLR, 2023.

Fiorini, S., Coniglio, S., Ciavotta, M., and Messina, E. Sigmanet: One laplacian to rule them all. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7568 7576, 2023.

Furutani, S., Shibahara, T., Akiyama, M., Hato, K., and Aida, M. Graph signal processing for directed graphs based on the hermitian laplacian. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, W urzburg, Germany, September 16 20, 2019, Proceedings, Part I, pp. 447 463. Springer, 2020.

Geisler, S., Li, Y., Mankowitz, D. J., Cemgil, A. T., G unnemann, S., and Paduraru, C. Transformers meet directed graphs. In International Conference on Machine Learning, pp. 11144 11172. PMLR, 2023.

Halko, N., Martinsson, P.-G., and Tropp, J. A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217 288, 2011.

Hamilton, W. L. Graph representation learning. Morgan & Claypool Publishers, 2020.

Hamilton, W. L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025 1035, 2017.

Horn, R. A. and Johnson, C. R. Matrix analysis. Cambridge university press, 2012.

Ke, Z., Yu, H., Li, J., and Zhang, H. DUPLEX: Dual GAT for complex embedding of directed graphs. In Fortyfirst International Conference on Machine Learning, 2024. URL https://openreview.net/forum? id=M3uv4q DKOL.

Commute Graph Neural Networks

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.

Klicpera, J., Bojchevski, A., and G unnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In International Conference on Learning Representations (ICLR), 2019.

Koke, C. and Cremers, D. Holonets: Spectral convolutions do extend to directed graphs. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum? id=Ehm Ewfav OW.

Li, Y. and Zhang, Z.-L. Digraph laplacian and the degree of asymmetry. Internet Mathematics, 8(4):381 401, 2012.

Lin, L. and Gao, J. A magnetic framelet-based convolutional neural network for directed graphs. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1 5. IEEE, 2023.

Luo, Y., Thost, V., and Shi, L. Transformers over directed acyclic graphs. Advances in Neural Information Processing Systems, 36, 2024.

Page, L., Brin, S., Motwani, R., and Winograd, T. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Info Lab, 1999.

Pei, H., Wei, B., Chang, K. C.-C., Lei, Y., and Yang, B. Geom-gcn: Geometric graph convolutional networks. In International Conference on Learning Representations, 2019.

Qiu, R., Yin, H., Huang, Z., and Chen, T. Gag: Global attributed graph neural network for streaming sessionbased recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 669 678, 2020.

Rossi, E., Charpentier, B., Giovanni, F. D., Frasca, F., G unnemann, S., and Bronstein, M. M. Edge directionality improves learning on heterophilic graphs. In The Second Learning on Graphs Conference, 2023. URL https: //openreview.net/forum?id=T4LRb AMWFn.

Rozemberczki, B., Allen, C., and Sarkar, R. Multi-scale attributed node embedding. Journal of Complex Networks, 9(2):cnab014, 2021.

Rubio-Madrigal, C., Jamadandi, A., and Burkholz, R. GNNs getting comfy: Community and feature similarity guided rewiring. In The Thirteenth International Conference on Learning Representations, 2025.

Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI magazine, 29(3):93 93, 2008.

Shchur, O., Mumme, M., Bojchevski, A., and G unnemann, S. Pitfalls of graph neural network evaluation. ar Xiv preprint ar Xiv:1811.05868, 2018.

Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine, 30(3):83 98, 2013.

Singh, R., Chakraborty, A., and Manoj, B. Graph fourier transform based on directed laplacian. In 2016 International Conference on Signal Processing and Communications (SPCOM), pp. 1 5. IEEE, 2016.

Tong, Z., Liang, Y., Sun, C., Li, X., Rosenblum, D., and Lim, A. Digraph inception convolutional networks. Advances in neural information processing systems, 33, 2020a.

Tong, Z., Liang, Y., Sun, C., Rosenblum, D. S., and Lim, A. Directed graph convolutional network. ar Xiv preprint ar Xiv:2004.13970, 2020b.

Tong, Z., Liang, Y., Ding, H., Dai, Y., Li, X., and Wang, C. Directed graph contrastive learning. Advances in neural information processing systems, 34:19580 19593, 2021.

Topping, J., Giovanni, F. D., Chamberlain, B. P., Dong, X., and Bronstein, M. M. Understanding over-squashing and bottlenecks on graphs via curvature. In International Conference on Learning Representations, 2022.

Veliˇckovi c, P., Cucurull, G., Casanova, A., Romero, A., Li o, P., and Bengio, Y. Graph Attention Networks. International Conference on Learning Representations, 2018. URL https://openreview.net/forum? id=r JXMpik CZ. accepted as poster.

Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. Simplifying graph convolutional networks. In International conference on machine learning, pp. 6861 6871. PMLR, 2019.

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Philip, S. Y. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1):4 24, 2020.

Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In International conference on machine learning, pp. 5453 5462. PMLR, 2018.

Commute Graph Neural Networks

Zhang, X., He, Y., Brugnone, N., Perlmutter, M., and Hirn, M. Magnet: A neural network for directed graphs. Advances in neural information processing systems, 34: 27003 27015, 2021.

Zhu, J., Yan, Y., Zhao, L., Heimann, M., Akoglu, L., and Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in Neural Information Processing Systems, 33, 2020.

Commute Graph Neural Networks

A. Proofs and Derivations

A.1. Derivation of Di Lap T

The gradient operator G maps a signal defined on the nodes of the graph to a signal on the edges. For a directed graph G and a signal s RN on the nodes, the gradient Gs is defined on the edges as:

(Gs)(vi,vj) = Pij(si sj) (12)

for each directed edge (vi, vj) E. This captures the difference in the signal between the source node vi and the target node vj.

The divergence operator D maps a signal defined on the edges back to a signal on the nodes. For a signal Gs RM on the edges, the divergence at node vi is:

(D(Gs))i = X

(Gs)(vj,vi) X

vj N out i (Gs)(vi,vj) (13)

This computes the net incoming minus outgoing signal flow at each node. The digraph Laplacian Di Lap T is then defined as the composition of the divergence and gradient operators on the original signal s:

Ts = DGs (14)

Eq. (12) and Eq. (13) demonstrate that the composed operator forming Di Lap effectively measures how the signal diverges from each node considering the graph s directionality. Therefore, analogous to the traditional Laplacian in undirected graphs, Di Lap acts as a measure of smoothness specifically tailored for directed graphs.

To express T in matrix form, we initially define the incidence matrix B RN M, which encapsulates both the connectivity and the directionality of the edges within the digraph:

+1, ek = (vi, vj) 1, ek = (vj, vi) 0, otherwise , (15)

where k {1, , M} represents fixed edge indices, and each undirected edge is treated as comprising two unidirectional edges. Then We construct a diagonal matrix representing the edge transition probabilities, denoted as diag {Pij}M (vi,vj) E RM M, where the principal diagonal elements are indexed according to the edge indices. Based

on the above definitions, the gradient operator G can be represented as G = diag {Pij}M (vi,vj) E B , and the divergence operator as D = B. Therefore, the Di Lap becomes:

T = Bdiag {Pij}M (vi,vj) E B (16)

A.2. Proof of Proposition 4.1

Proof. Let Qnode RN N be a node permutation matrix that reorders the nodes in G. The permuted incidence matrix can be represented as B = Q node B. Then we have the permuted Di Lap T :

T = B diag {Pij}M (vi,vj) E B

= Q node B diag {Pij}M (vi,vj) E B Qnode

= Q node TQnode

Eq. (17) shows that T is obtained by conjugating T with the node permutation matrix Qnode, which means T is T with its rows and columns permuted according to Qnode. Thus T is permutation equivariant up to a relabeling of nodes.

Commute Graph Neural Networks

Let Qedge RM M be an edge permutation matrix that reorders the edges of G. The permuted incidence matrix can be rep-

resented as B = BQedge, and the permuted diagonal matrix diag {Pij}M (vi,vj) E = Q edgediag {Pij}M (vi,vj) E Qedge.

Then we have the permuted Di Lap T :

T = B diag {Pij}M (vi,vj) E B

= (BQedge) Q edgediag {Pij}M (vi,vj) E Qedge Q edge B

= Bdiag {Pij}M (vi,vj) E B

Eq. (18) shows that T remains unchanged under edge permutation when B and diag {Pij}M (vi,vj) E are adjusted accordingly. Thus T is fully invariant to the ordering of edges.

A.3. Proof of Lemma 4.2

Proof. We first define the weighted out-transition matrix as F = diag n πi P

. Based on F, the weight Di Lap

T can be written as T = F ΠP. P can be expressed as:

P = Π 1(F T ). (19)

Since the transition matrix P is row-stochastic, it follows that Pt J = J. In light of Eq. (2) and considering that π is stochastic, we have ZJ = 0n n and Π 1

Let K = Π 1

2 , J = Π 1 2 JΠ 1 2 , and Z = Π 1 2 ZΠ 1

2 , we have J 2 = J . As π Z = 0, we have ZJ = Π 1 2 ZJΠ 1 2 = 0N N and J Z = 0N N. Since B J = 0N N, TJ = 0N N and JT = 0N N holds. Incorporating these into Eq. (3), we have: Z + J = (K + J ) 1. (20)

By post-multiplying Eq. (20) from the right by (K + J ), we have:

I J = ZK + J K, (21)

where J K = (Π 1 2 JΠ 1 2 )(Π 1

2 ) = Π 1 2 JTΠ 1

2 = 0N N. Then we have:

ZK = I J (22)

Similarly, by multiplying from the left, we establish that KZ = I J . Since J Z = 0N N, ZKZ = Z. Furthermore, KJ = (Π 1

2 )(Π 1 2 JΠ 1 2 ) = 0N N leads to KZK = K. Considering the symmetry of the left part of Eq. (22), we have (ZK) = ZK. Similarly, (KZ) = KZ. These derivations satisfy the sufficient conditions for the Moore Penrose pseudoinverse, such that Z = K (23)

Finally, recovering Z and K as: Z = T Π (24)

which concludes the proof.

A.4. SVD for e T

Given a matrix e T RN N, its Moore-Penrose pseudoinverse can be directly computed with an SVD-based method. Specifically, we first perform truncated SVD on e T UqΣq V q , where Uq RN q and Vq RN q contains the first q columns of U and V. Σq Rq q is the diagonal matrix of q largest singular values. It is a q-rank approximation of e T, which holds that rank(R) = q. Then the Moore-Penrose pseudoinverse of e T can be easily computed as follows:

e T = UqΣ 1 q V q . (25)

Commute Graph Neural Networks

To leverage sparsity of e T to avoid O(N 3) complexity, we adopt the randomized SVD algorithm proposed by (Halko et al., 2011; Cai et al., 2023) to first approximate the range of the input matrix with a low-rank orthonormal matrix, and then perform SVD on this smaller matrix:

ˆU q, ˆΣq, ˆV q = Approx SVD(e T, q), ˆe TSV D = ˆU q ˆΣq ˆV q , (26)

where ˆU q, ˆΣq, and ˆV q are the approximated versions of Uq, Σq, and Vq. Then the Moore-Penrose pseudoinverse of e T can be computed by: e T = ˆU q ˆΣ 1 q ˆV q . (27)

The computation cost of randomized truncated SVD takes O(q K), where K is the number of non-zero elements in e T, so we have K = |E|. Thus, the sparsity degree of e T can determine the time complexity of its Moore-Penrose pseudoinverse, which demonstrates the importance of Lemma 4.2.

B. Pseudo Code for CGNN

Algorithm 1 CGNN Input: Digraph G = (V, E, X); Depth L; Hidden size d ; Number of classes K Output: Logits ˆY RN K

1: Compute the anchor a and node-anchor similarities to construct G with Eq. (7). 2: Add all edges from G to G to generate e G. 3: Compute the Weight Di Lap e T for e G with Eq. (6). 4: Compute R and its Moore-Penrose pseudoinverse with Eq. (8) and Eq. (27). 5: Compute the commute time matrix C with Eq. (10). 6: Compute the normalized proximity matrix e C with e Cout = A e C and e Cin = A e C. 7: for ℓ {1, , L} do 8: Layer-wise message passing with Eq. (11). 9: end for 10: H = MLP(H(L)). 11: ˆY = Softmax(H).

C. Implementation Details

C.1. Experimental Settings

We provide a performance comparison with 12 baselines including 1) General GNNs: GCN (Kipf & Welling, 2017), GAT (Veliˇckovi c et al., 2018), and Graph SAGE (Hamilton et al., 2017); 2) Non-local GNNs: APPNP (Klicpera et al., 2019), Mix Hop (Abu-El-Haija et al., 2019), GPRGNN (Chien et al., 2021), and GCNII (Chen et al., 2020); 3) Digraph NNs: DGCN (Tong et al., 2020b), Di GCN (Tong et al., 2020a), Mag Net (Zhang et al., 2021), Di GCL (Tong et al., 2021), DUPLEX (Ke et al., 2024), and Dir GNN (Rossi et al., 2023). We evaluate the performance by node classification accuracy with standard deviation for 10 runs in the semi-supervised setting. For Squirrel and Chameleon, we use 10 public splits (48%/32%/20% for training/validation/testing) provided by (Pei et al., 2019). For the remaining datasets, we adopt the same splits as (Tong et al., 2020a; 2021), which chooses 20 nodes per class for the training set, 500 for the validation set, and allocates the rest to the test set. We conduct our experiments on 2 Intel Xeon Gold 5215 CPUs and 1 NVIDIA Ge Force RTX 3090 GPU.

C.2. Data Statistics

The datasets used in Section 5 are Squirrel, Chameleon (Rozemberczki et al., 2021), Citeseer (Sen et al., 2008), Cora ML (Bojchevski & G unnemann, 2017), AM-Photo (Shchur et al., 2018), Snap-Patents, Roman-Empire, and Arxiv-Year (Rossi et al., 2023). We summarize their statistics in Table 7. homo ratio represents the homophily ratio, a metric proposed by Zhu et al. (2020). which is employed to gauge the degree of homophily within the graph. A lower homo ratio signifies a greater degree of heterophily, indicating a higher prevalence of edges that connect nodes of differing classes.

Commute Graph Neural Networks

Table 7: Statistics of the datasets.

Dataset N |E| # Feat. # Classes homo ratio

Squirrel 5,201 217,073 2,089 5 0.22 Chameleon 2,277 36,101 2,325 5 0.23 Cora-ML 2,995 8,416 2,879 7 0.79 Citeseer 3,312 4,715 3,703 6 0.74 AM-Photo 7,650 238,162 745 8 0.83 Snap-Patents 2,923,922 13,975,791 269 5 0.22 Roman-Empire 22,662 44,363 300 18 0.05 Arxiv-Year 169,343 1,166,243 128 40 0.22

C.3. Hyperparameter Settings

For our model, we tune the hyperparameters based on the highest average validation accuracy. We utilize the randomized truncated SVD algorithm for computing the Moore-Penrose pseudoinverse of matrix R, setting the required rank q to 5 for all datasets. The learning rate lr is selected from {0.01, 0.005}, and the weight decay wd from {0, 5e 5, 5e 4}. In the model architecture, the number of layers L vary among {1, 2, 3, 4, 5} and the dimension d is selected from {32, 64, 128, 256, 512}. The comprehensive hyperparameter configurations for CGNN are detailed in Table 8.

Table 8: Hyperparameters specifications.

Dataset lr wd L d

Squirrel 0.005 0 5 128 Chameleon 0.01 0 4 128 Cora ML 0.01 0 2 64 Citeseer 0.01 0 2 128 AM-Photo 0.005 0 2 512 Snap-Patents 0.01 0 2 32 Roman-Empire 0.01 5e 4 2 64 Arxiv-Year 0.01 5e 4 2 64

D. Additional Experiments

D.1. Detailed Experimental Results on Node Classification

Table 1 in Section 5 presents the results from experiments conducted on all eight directed graph datasets. For each baseline, experiments were carried out on both the original directed graph datasets and their undirected counterparts, which feature symmetrized adjacency matrices. The superior accuracy results from these two settings are reported in Table 1. This section provides a detailed exposition of the experimental outcomes for these configurations in Table 9 and Table 10. It is important to note that while GCN is traditionally a spectral method suited only for undirected graphs, it can be adapted to directed graphs by interpreting it from a spatial perspective, specifically, by aggregating outgoing neighbors with the weight 1

This adaptation allows GCN to be applicable in both experimental settings. Additionally, APPNP, GPRGNN, and GCNII are spectral methods that require symmetrized adjacency matrices for spectral filters. Therefore, we only report their results under the undirected settings in Table 1. For Dir GNN and CGNN, in the case of undirected graphs, these models degenerate to Graph SAGE.

D.2. Sensitivity Analysis

We investigate the sensitivity of CGNN to key hyperparameters that influence its performance, specifically focusing on the number of layers L and the dimension of the hidden layer d . We explore a range of values for L, considering {1, 2, 3, 4, 5}, and for d , considering {32, 64, 128, 256, 512}. From Figure 5, we observe that we observe that CGNN achieves optimal performance with L = 5 and d = 128 on Squirrel, and with L = 2 and d = 64 on Cora ML. This suggests that deeper models are necessary to effectively aggregate valuable information in heterophilic graphs, whereas in homophilic graphs, leveraging local neighborhood information is generally adequate.

Commute Graph Neural Networks

32 64 128 256 512 d

1 2 3 4 5 L

74.01 71.36 74.92 76.01 72.18

72.85 73.82 74.16 74.05 71.99

74.22 74.20 74.19 74.64 74.51

73.81 73.15 74.52 73.89 74.20

74.24 75.83 75.83 75.74 74.88

(a) Squirrel

32 64 128 256 512 d

1 2 3 4 5 L

76.88 76.64 75.82 75.81 76.18

76.99 77.08 77.05 76.24 76.15

76.13 76.66 76.84 75.52 76.55

75.19 73.38 75.62 74.11 74.56

73.27 72.88 72.10 72.11 72.88

(b) Cora ML

Figure 5: Sensitivity analysis on Squirrel and Cora ML.

Table 9: Comparison of node classification accuracy between original directed graphs and their undirected counterparts on Squirrel, Chameleon, Citeseer, and Cora ML.

Squirrel Chameleon Citeseer Cora ML

Method Dir. Undir. Dir. Undir. Dir. Undir. Dir. Undir.

GCN 52.43 2.01 51.93 1.19 63.37 0.92 67.96 1.82 64.27 1.56 66.03 1.88 68.73 0.24 70.92 0.39 GAT 40.72 1.55 40.50 1.47 60.69 1.95 59.37 1.52 65.58 1.39 54.22 0.98 72.20 0.49 72.22 0.57 Graph SAGE 35.19 0.54 41.61 0.74 58.20 1.19 62.01 1.06 62.57 0.71 66.81 1.38 74.16 1.55 72.98 0.90 Mix Hop 39.25 0.91 43.80 1.48 60.50 2.53 60.15 1.22 56.09 2.08 54.71 0.50 65.89 1.50 61.20 0.91 DGCN 37.16 1.72 38.24 1.19 50.7 3.31 48.26 1.97 66.37 1.93 62.15 0.80 75.02 0.50 73.11 0.68 Di GCN 33.44 2.07 28.17 1.90 50.37 4.31 43.08 5.77 64.99 1.72 64.35 1.64 77.03 0.70 76.98 1.00 Mag Net 39.01 1.93 35.20 1.65 58.22 2.87 55.46 3.10 65.04 0.47 64.90 0.51 76.32 0.10 76.29 0.08 DUPLEX 57.60 0.98 55.26 1.10 61.25 0.94 61.20 0.75 67.60 0.72 67.35 0.70 72.26 0.71 72.21 0.65 Di GCL 35.82 1.73 33.10 0.94 56.45 2.77 51.16 3.85 67.42 0.14 66.53 0.10 77.53 0.14 76.24 0.05

Table 10: Comparison of node classification accuracy between original directed graphs and their undirected counterparts on AM-Photo, Snap-Patents, Roman-Empire, and Arxiv-Year.

AM-Photo Snap-Patents Roman-Empire Arxiv-Year

Method Dir. Undir. Dir. Undir. Dir. Undir. Dir. Undir.

GCN 88.52 0.47 85.33 0.25 51.02 0.06 50.15 0.04 73.69 0.74 73.58 0.37 46.02 0.26 44.81 0.19 GAT 88.36 1.25 87.50 1.77 OOM OOM 49.18 1.35 43.37 1.02 45.30 0.23 43.27 0.09 Graph SAGE 89.71 0.57 86.23 1.25 67.45 0.53 60.10 0.26 86.37 0.80 84.26 0.28 55.43 0.75 51.19 0.73 Mix Hop 87.17 1.30 85.50 1.01 40.17 0.10 41.22 0.19 43.00 0.06 50.76 0.14 45.30 0.26 41.25 0.50 DGCN 87.74 1.02 86.53 1.77 OOM OOM 51.92 0.43 50.50 0.47 OOM OOM Di GCN 88.66 0.51 87.94 0.23 OOM OOM 52.71 0.32 50.43 0.21 48.37 0.19 47.26 0.11 Mag Net 86.80 0.65 85.21 0.20 OOM OOM 88.07 0.27 82.99 0.80 60.29 0.27 55.25 0.10 DUPLEX 85.19 0.73 87.80 0.82 64.92 0.10 66.54 0.11 79.02 0.08 77.64 0.07 64.37 0.27 62.12 0.18 Di GCL 89.41 0.11 87.36 0.20 70.65 0.07 68.62 0.08 87.94 0.10 84.00 0.28 63.10 0.06 59.02 0.02