# Dynamic Spiking Graph Neural Networks

Nan Yin1, Mengzhu Wang2, Zhenghan Chen3, Giulia De Masi4, Huan Xiong5,1*, Bin Gu6,1*

1Mohamed bin Zayed University of Artificial Intelligence 2School of Artificial Intelligence, Hebei University of Technology 3Microsoft Corporation 4Technology Innovation Institute 5Harbin Institute of Technology 6Jilin University

{yinnan8911,dreamkily,pandaarych,jsgubin,huan.xiong.math}@gmail.com, giulia.demasi@tii.ae

*Corresponding authors. Copyright 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

## Abstract

The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to their low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often replaces Recurrent Neural Networks (RNNs) with SNNs and uses binary features instead of continuous ones for efficient training, which overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires the propagation of information across time steps, which increases memory requirements. To address these challenges, we present a framework named Dynamic Spiking Graph Neural Networks (Dy-SIGN). To mitigate the information loss problem, Dy-SIGN propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used in static settings, Dy-SIGN extends them to the dynamic graph setting. Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of Dy-SIGN on dynamic node classification tasks with lower computational costs.

## Introduction

Graph Neural Networks (GNNs) (Scarselli et al. 2008) have been widely applied in various fields to learn graph representations by capturing the dependencies of nodes and edges, such as relation detection (Schlichtkrull et al. 2018; Chen, Li, and Tang 2020; Mi and Chen 2020; Xu et al. 2019b) and recommendation (Wu et al. 2022). However, most GNNs are primarily designed for static or non-temporal graphs, which cannot meet the requirement of dynamic evolution over time in practice. The established solution is to extend dynamic graphs into sequence models directly. Typically, these methods (Yin et al. 2023b; Shi et al. 2021; Kumar, Zhang, and Leskovec 2019; Yin et al. 2022a) utilize Recurrent Neural Networks (RNNs) (Cho et al. 2014) to capture the dynamic evolution of graphs for numerous downstream tasks such as time series prediction or graph property prediction (Yin et al. 2022b; Wieder et al. 2020; Yin et al. 2023a). Despite the promising performance of dynamic graph methods, the majority of these approaches involve complex structures that consume significant computational resources during training and testing. Inspired by the way the brain processes information, Spiking Neural Networks (SNNs) use event- or clock-driven spike signals for inference and for updating neuron parameters (Brette et al. 2007).
Different from traditional deep learning methods, SNNs utilize discrete spike signals instead of continuous features, resulting in significantly lower power consumption during model training. Considering the inherent characteristics of SNNs (Maass 1997; Pfeiffer and Pfeil 2018; Schliebs and Kasabov 2013), a few recent works (Zhu et al. 2022; Xu et al. 2021; Li et al. 2023) have attempted to integrate SNNs into the GNN framework to tackle the issue of high computational complexity. These methods transform the node features into a series of spikes with Poisson rate coding, followed by graph convolution layers with SNN neurons, which employ a membrane potential threshold to convert continuous features into spike information (Kim et al. 2020; Bu et al. 2022). Although Spiking Graph Networks (SGNs) are gradually gaining attention, the use of SNNs in dynamic graphs, a more common scenario in practice, is still largely unexplored. To address this gap, this work focuses on the problem of spiking dynamic graphs, i.e., applying SNNs to dynamic graph node classification. The problem is highly challenging for the following reasons. (1) Information Loss. The representations of GNNs include information about the graph structure and neighboring nodes, which is crucial for downstream tasks. However, SNNs employ spike signals instead of continuous features, leading to the loss of details regarding the structure and neighbors. Moreover, as the graph evolves over time, the information loss issue may further deteriorate the graph representation. (2) Memory Consumption. RNN-based dynamic graph methods typically require significant memory resources to store the temporal node information (Sak, Senior, and Beaufays 2014). Moreover, SNNs inherently operate with multiple time latency steps, i.e., they compute spike signals over several latency steps in each SNN layer. If we simply replace the GNN layer with an SNN layer at each time step, we need to store the temporary spikes of each SNN layer and the temporal information at each time step simultaneously, which further exacerbates memory consumption.

In this paper, we present a novel framework named Dynamic Spiking Graph Neural Networks (Dy-SIGN) for node classification. The primary insight of the proposed Dy-SIGN is to thoroughly explore how SNNs can be applied to dynamic graphs, and to address the challenges of information loss and memory consumption through an information compensation mechanism and implicit differentiation on the equilibrium state. On the one hand, the information compensation mechanism aims to compensate for the information lost during forward propagation. However, implementing this mechanism in every layer of the SNN would significantly increase the model complexity. Thus, we propose to establish an information channel between the shallow and final layers to incorporate the original information directly into the feature representations. This approach not only reduces the model complexity but also mitigates the impact of information loss. On the other hand, inspired by recent advances in implicit methods (Bai, Koltun, and Kolter 2020; Xiao et al. 2021), which view neural networks as solving an equilibrium equation for fixed points and provide alternative implicit models defined by that equation, we provide a variation training method that is suitable for dynamic spiking graph neural networks.
Specifically, Dy-SIGN simplifies the non-differentiable terms in backpropagation, thereby avoiding the large computational overhead that traditional SNNs incur from surrogate-learning techniques. In this way, the gradient computation requires significantly less memory. We conduct extensive experiments to demonstrate the effectiveness of the proposed Dy-SIGN in comparison with state-of-the-art methods across various scenarios. In summary, our contributions are as follows: (1) Motivation: from the perspective of practical applications and data analysis, we propose Dy-SIGN, which is the first attempt to introduce implicit differentiation into dynamic graphs. (2) Methodology: we propose a novel approach, Dy-SIGN, that incorporates SNNs into dynamic graphs to alleviate the information loss and memory consumption problems. (3) Experiments: extensive experiments validate the superiority of the proposed Dy-SIGN over state-of-the-art methods.

## Related Work

### SNN Training Methods

In the supervised training of SNNs, there are two primary research directions. One direction focuses on establishing a connection between the spike representations of SNNs, such as firing rates, and equivalent Artificial Neural Network (ANN) activation mappings. This connection enables ANN-to-SNN conversion (Diehl et al. 2015; Hunsberger and Eliasmith 2015; Rueckauer et al. 2017; Rathi et al. 2020) and the optimization of SNNs using gradients computed from these equivalent mappings (Thiele, Bichler, and Dupret 2020; Wu et al. 2021; Zhou et al. 2021; Xiao et al. 2021; Meng et al. 2022). These methods usually require a relatively large number of time steps to achieve performance comparable to ANNs, and therefore suffer from high latency and usually higher energy consumption. The other direction is to directly train SNNs with backpropagation (Bohte, Kok, and La Poutre 2000; Esser et al. 2015; Bellec et al. 2018; Huh and Sejnowski 2018), which typically employs surrogate gradients (Shrestha and Orchard 2018) to overcome the non-differentiable nature of the binary spiking function and trains SNNs from scratch, following the backpropagation through time (BPTT) framework. BPTT with surrogate gradients can achieve extremely low latency; however, it requires a large amount of training memory to maintain the computational graph unrolled over time.

### Dynamic GNNs

Dynamic GNNs have achieved impressive performance in various tasks. With the help of RNNs (Cho et al. 2014), static GNNs can be extended to model dynamic processes by employing RNN architectures (Rossi et al. 2020; Pareja et al. 2020; Shi et al. 2021; Xu et al. 2019a). TGN (Rossi et al. 2020) and JODIE (Kumar, Zhang, and Leskovec 2019) update the node hidden states with RNN units for representation learning. EvolveGCN (Pareja et al. 2020) uses an RNN to regulate the model parameters at each time step. Although RNN-based dynamic graph methods can preserve historical information for graph representation, they typically incur massive computational and memory costs. To effectively model the dynamic evolution of graphs while minimizing computational and memory requirements, we introduce implicit models and SNNs into dynamic GNNs.

### Feedback Models

Implicit models are promising approaches to deep learning that utilize implicit layers to determine the outputs.
In contrast to explicit models, which typically require storing intermediate activations for backpropagation, implicit models use a fixed-point solution (Bai, Kolter, and Koltun 2019; Bai, Koltun, and Kolter 2020) to perform backpropagation without saving these intermediate activations. This results in constant memory complexity, which is a significant advantage for large models. DEQ (Bai, Kolter, and Koltun 2019) demonstrates the ability of implicit models in sequence modeling. MDEQ (Bai, Koltun, and Kolter 2020) incorporates multiscale modeling into implicit deep networks, enabling tasks such as image classification and semantic segmentation. To further enhance the efficiency of implicit models, Gu et al. (2020) extend the concept of the implicit fixed-point equilibrium to graph learning to address the problem of evaluation and training for recurrent GNNs. Liu et al. (2022) propose a multiscale graph neural network with implicit layers to model multiscale information on graphs at multiple resolutions. Although implicit models have shown promise in various areas, their application to dynamic spiking graphs is still relatively unexplored.

## Preliminaries

### Spiking Neuron Models

SNNs utilize binary activations in each layer, which limits their representation capacity. To address this issue, SNNs introduce a temporal dimension, known as the latency $K$. In the forward pass of an SNN, inputs are presented as streams of events and repeated for $K$ time steps to produce the final result. The leaky-integrate-and-fire (LIF) model is commonly used to describe the dynamics of spiking neurons. In LIF, each neuron integrates the received spikes into its membrane potential $u_i$, which can be formulated as a first-order differential equation,

$$\lambda \frac{\mathrm{d}u_i(t)}{\mathrm{d}t} = -\left(u_i(t) - u_{rest}\right) + R \cdot I_i(t), \quad u_i(t) < V_{th}, \tag{1}$$

where $I_i(t)$ is the input current, $V_{th}$ is the spiking threshold, and $R$ and $\lambda$ are the resistance and time constant, respectively. When $u_i$ reaches $V_{th}$ at time $t_f$, a spike is generated and $u_i$ is reset to the resting potential $u_{rest}$, which is usually taken as 0. The spike train is expressed by the Dirac delta function: $s_i(t) = \sum_{t_f} \delta(t - t_f)$. We consider a simple current model $I_{\tau,i} = \sum_j w_{ij} s_{\tau,j} + b$, where $w_{ij}$ is the weight from neuron $j$ to neuron $i$. Then, the general discrete form of LIF is described as

$$u_{\tau+1,i} = \lambda \left(u_{\tau,i} - V_{th}\, s_{\tau,i}\right) + \sum_j w_{ij} s_{\tau,j} + b, \qquad s_{\tau+1,i} = H\!\left(u_{\tau+1,i} - V_{th}\right), \tag{2}$$

where $H(x)$ is the Heaviside step function, i.e., the non-differentiable spiking function, $s_{\tau,i}$ is the binary spike train of neuron $i$, and $\lambda < 1$ is a leaky term related to the constant $\tau_m$ and the discretization time interval used in the LIF model. The constants $R$, $\lambda$, and the time step size are absorbed into the weights $w_{ij}$ and the bias $b$. The training of SNNs follows BPTT, and the gradients over $K$ time latency steps are calculated as

$$\frac{\partial L}{\partial W^{l}} = \sum_{\tau=1}^{K} \frac{\partial L}{\partial s^{l+1}_{\tau}} \frac{\partial s^{l+1}_{\tau}}{\partial u^{l+1}_{\tau}} \sum_{k \le \tau} \prod_{i=k}^{\tau-1} \left( \frac{\partial u^{l+1}_{i+1}}{\partial u^{l+1}_{i}} + \frac{\partial u^{l+1}_{i+1}}{\partial s^{l+1}_{i}} \frac{\partial s^{l+1}_{i}}{\partial u^{l+1}_{i}} \right) \frac{\partial u^{l+1}_{k}}{\partial W^{l}}, \tag{3}$$

where $W^l$ is the trainable matrix of the $l$-th layer and $L$ is the loss. The terms $\frac{\partial s^{l}_{\tau}}{\partial u^{l}_{\tau}}$ are non-differentiable, and surrogate derivatives are typically used instead.
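To make the discrete LIF dynamics of Equation 2 concrete, below is a minimal sketch of a single spiking layer unrolled over $K$ latency steps. It is not the authors' implementation; the threshold, leak value, tensor shapes, and the use of PyTorch are assumptions made for illustration.

```python
import torch

V_TH, LAMBDA = 1.0, 0.9  # spiking threshold V_th and leaky term lambda (illustrative values)

def lif_layer(spike_in, weight, bias, K):
    """Unroll the discrete LIF update of Equation 2 for K latency steps.

    spike_in: (K, N, d_in) binary input spike trains s_{tau,j}
    weight:   (d_in, d_out) synaptic weights w_ij
    bias:     (d_out,) bias b
    returns:  (K, N, d_out) binary output spike trains
    """
    N = spike_in.shape[1]
    u = torch.zeros(N, weight.shape[1])   # membrane potential, u_{0,i} = 0
    s = torch.zeros_like(u)               # previous output spikes s_{0,i}
    spikes = []
    for tau in range(K):
        # u_{tau+1,i} = lambda * (u_{tau,i} - V_th * s_{tau,i}) + sum_j w_ij * s_{tau,j} + b
        u = LAMBDA * (u - V_TH * s) + spike_in[tau] @ weight + bias
        # s_{tau+1,i} = H(u_{tau+1,i} - V_th)
        s = (u >= V_TH).float()
        spikes.append(s)
    return torch.stack(spikes)

# usage: K=4 latency steps, N=8 neurons, 16 -> 32 features
K, N, d_in, d_out = 4, 8, 16, 32
x = torch.bernoulli(torch.rand(K, N, d_in))            # rate-coded binary inputs
W, b = 0.1 * torch.randn(d_in, d_out), torch.zeros(d_out)
out = lif_layer(x, W, b, K)
```

Because the inputs are binary, the dense product `spike_in[tau] @ weight` can in principle be replaced by accumulations over the active indices, which is the source of the energy savings discussed above.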
### Dynamic GNNs

Given a dynamic graph $G = \{G^1, \ldots, G^t, \ldots, G^T\}$ with $T$ time steps, each snapshot is $G^t = (A^t, X^t, V^t, E^t)$, where $A^t$ is the adjacency matrix, $X^t \in \mathbb{R}^{N \times d}$ contains the $N$ node features of dimension $d$, and $V^t = \{v^t_1, \ldots, v^t_N\}$ and $E^t$ are the sets of nodes and edges at time step $t$. Dynamic graph methods typically extract the graph features at each time step and then model the evolution over time, which is formulated as

$$h^{t,l}_v = C^{t,l}\!\left(h^{t,l-1}_v,\ A^{t,l}\!\left(\{h^{t,l-1}_u\}_{u \in N(v)}\right)\right), \qquad h^{t+1}_v = \mathrm{Evo}\!\left(h^{1,L}_v, \ldots, h^{t,L}_v\right), \tag{4}$$

where $N(v)$ denotes the neighbors of $v$, and $A^{t,l}$ and $C^{t,l}$ denote the aggregation and combination operations of the $l$-th layer at time step $t$, respectively. $L$ is the number of graph layers, and $\mathrm{Evo}$ is the evolution operation over time steps $1$ to $t$, which is typically implemented with an RNN (Cho et al. 2014) or an LSTM (Hochreiter and Schmidhuber 1997).

## The Proposed Dy-SIGN

### Overview

This paper introduces a novel approach named Dy-SIGN for semi-supervised dynamic spiking graph node classification. We recognize that if the time latency of the SNN tends to infinity, the model retains the detailed information of the input. However, since a large latency would cause the vanishing gradient problem (Hochreiter 1998) and significantly increase the complexity, we propose an information compensation mechanism that bridges the features from the first layer to the last layer and includes the shallow representation in the final embedding. Additionally, we propose a variation of the training method for dynamic graphs, which is proven to be equivalent to implicit differentiation. This method simplifies the gradient calculation, which relies on the implicit differentiation equation rather than the forward procedure, thereby reducing memory consumption. A detailed illustration of Dy-SIGN is shown in Figure 1, and we introduce each component in detail below.

[Figure 1: An overview of the proposed Dy-SIGN. The Spiking Graph Neural Network (SGN) combines SNNs and GNNs for node representation learning. The information compensation mechanism transfers information from the shallow layer to the last layer to mitigate the information loss issue. The variation training method is applied to calculate the fixed point at each time latency step in the SGN and is used for dynamic prediction.]

### Information Compensation Spiking Graph Neural Network

Spiking Graph Networks (SGNs) (Zhu et al. 2022; Xu et al. 2021) usually apply Bernoulli encoding to transform the node representations into spike signals for propagation. Specifically, at time step $t \in [1, \ldots, T]$, where $T$ denotes the length of the dynamic graph time window, we have the graph $G^t = (A^t, X^t)$, where $A^t$ is the adjacency matrix of $G^t$ and $X^t$ are the node features. The SGN first encodes the initial features into binary signals $\{\tilde{X}^{t,1}, \ldots, \tilde{X}^{t,\tau}, \ldots, \tilde{X}^{t,K}\}$, where $\tau \in [1, \ldots, K]$ indexes the time latency step in the SGN and $K$ is the length of the SGN time latency window. Then, the layer-wise (update across layers) spiking graph propagation is defined as

$$s^{l}_{t,\tau} = \Phi\!\left(\mathrm{Cov}(A, s^{l-1}_{t,\tau}),\ s^{l}_{t,\tau-1}\right), \tag{5}$$

where $\mathrm{Cov}$ is the graph convolutional layer, as in Equation 4, $s^{l}_{t,\tau}$ denotes the node spiking features at time step $t$ and time latency step $\tau$ in the $l$-th layer, $s^{1}_{t,1} = \tilde{X}^{t,1}$, and $\Phi(\cdot)$ is the non-linear function that combines the historical information $s^{l}_{t,\tau-1}$ into the current state. After that, the temporal-wise (update at each time latency step in the SGN) membrane potentials and firing states follow

$$u^{l}_{t,\tau+1} = \lambda u^{l}_{t,\tau}\left(1 - s^{l}_{t,\tau}\right) + W^{l}\,\mathrm{Cov}\!\left(A, s^{l-1}_{t,\tau+1}\right), \qquad s^{l}_{t,\tau+1} = H\!\left(u^{l}_{t,\tau+1} - V_{th}\right). \tag{6}$$
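As a reading aid, the following is a minimal sketch of the temporal-wise update in Equations 5–6 for one SGN layer. Treating Cov as a single normalized-adjacency product and resetting the potential with the factor $(1 - s)$ are interpretations of the equations above, and all constants and names are illustrative; this is not the released implementation.

```python
import torch

V_TH, LAMBDA = 1.0, 0.9  # illustrative threshold and leaky term

def sgn_layer(A_norm, s_prev_layer, W, K):
    """Temporal-wise update of one SGN layer, following Equations 5-6.

    A_norm:       (N, N) normalized adjacency matrix of the current snapshot
    s_prev_layer: (K, N, d_in) binary spikes of layer l-1 over the K latency steps
    W:            (d_in, d_out) trainable weights W^l
    returns:      (K, N, d_out) binary spikes s^l_{t,tau}
    """
    N, d_out = A_norm.shape[0], W.shape[1]
    u = torch.zeros(N, d_out)   # membrane potential u^l_{t,0}
    s = torch.zeros(N, d_out)   # spikes s^l_{t,0}
    out = []
    for tau in range(K):
        pre = A_norm @ s_prev_layer[tau] @ W        # Cov(A, s^{l-1}) as a single product
        u = LAMBDA * u * (1.0 - s) + pre            # leak with reset through (1 - s^l_{t,tau})
        s = (u >= V_TH).float()                     # Heaviside firing against V_th
        out.append(s)
    return torch.stack(out)

# usage: a 2-layer stack fed with Bernoulli-encoded features of shape (K, N, d)
K, N, d = 4, 6, 8
A = torch.eye(N)                                    # placeholder adjacency
X_tilde = torch.bernoulli(torch.rand(K, N, d))
s1 = sgn_layer(A, X_tilde, 0.1 * torch.randn(d, d), K)
s2 = sgn_layer(A, s1, 0.1 * torch.randn(d, d), K)
```

Here the spikes of layer $l-1$ are assumed to be precomputed for all $K$ latency steps; feeding the first layer with the Bernoulli-encoded features $\tilde{X}^{t,\tau}$ yields the layer-wise propagation of Equation 5.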
The variation training method is applied to calculate the fix-point on each time latency in SGN and is used for dynamic prediction. is denoted as xi = { x1,i, , x ,i, , x K,i} with x ,i = [ x ,i1, , x ,id]. Then we have P( x ,ij = 1) = xij and P( x ,ij = 0) = 1 xij. Assume the parameters of each spike neuron are wi = [wi1, , wid], the combined spike input zi for the next SGN layer holds: j=1 wij x ,ij, j=1 wij E( x ,ij) = j=1 wijxij. According to (Chung and Lu 2002), the error bound of SGN holds: lim !1 P (z ,i < E(z ,i) ) e 2/2σ, lim !1 P (z ,i > E(z ,i) + ) e 2/2(σ+ ˆ wi /3), (7) where ˆwi = max{wi1, , wid}. From Equation 7, we observe that as ! 1, the difference between SGN and GNN will be with the probability of p = e 2/2(σ+ ˆ wi /3) to exceed the upper and lower bounds. This reveals that the spiking signals would preserve the details information of continuous features when ! 1. However, with the increase of , SGN becomes difficult to train and may suffer from the vanishing gradient problem due to the coefficient of λ in Equation 6. To address the issue, we design the information compensation mechanism for SGN that directly transfers features from the first layer to the last layer for node embeddings. Formally: u1 = λu1 1 + W 1s N + F 1x Vths1 , ul = λul 1 + F lsl 1 Vthsl , l = 2, , N, where ul denotes the neuronal membrane potential at time on l-th layer. F l and W l are the trainable parameters on the l-th layer, F 1 is the information compensation matrix. In this way, the information compensation SGN follows the form of multi-layer structures of feedback model (Bai, Kolter, and Koltun 2019; Bai, Koltun, and Kolter 2020). This type of structure has several potential advantages: (1) The forward and backward procedures are decoupled, avoiding the problem of non-differentiable spiking functions. (2) Using implicit differentiation in the equilibrium state, we can compute the gradient without saving the exact forward procedure, thus reducing memory consumption. Variation of Training SGN The traditional training method of SGN follows BPTT, which replaces the non-differentiable term @sl @ul with the surrogate derivatives in Equation 3. However, BPTT relies on multiple backpropagation paths, with would consume a large amount of memory. Similarly to (Xiao et al. 2022), we set the gradient of the Heaviside step function to 0, which is formulated as @ul +1 @sl+1 @sl+1 @ul+1 = 0. Then, the gradient of pa- The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24) rameters W l is: @sl+1 @ul+1 k λ k @ul+1 k @W l @sl+1 @ul+1 During the forward procedure is ˆal +1 = λˆal + sl +1, where the presynaptic activities can be denoted as ˆal +1 = P k λ ksl k. By calculating the gradient of @L @sl+1 @sl+1 @ul+1 , we can directly compute the value of Equation 8 with- out considering the backpropagation through @ul+1 +1 @ul+1 , which would decrease the complexity and memory consumption of the model. On each time latency , the output is denoted as ˆal+1 , and the output of the SGN is {ˆal 1, , ˆal , , ˆal K} on l-th layer. Comparing with Feedback Model Note that, in the spiking dynamic graph framework, t 2 [1, , T] stands for the time steps of each graph, and 2 [1, , K] is the time latency in SGN. At each time step t, we apply the information compensation SGN to extract the graph features as Equation 8. 
### Comparing with the Feedback Model

Note that, in the spiking dynamic graph framework, $t \in [1, \ldots, T]$ indexes the time steps of the graph, and $\tau \in [1, \ldots, K]$ indexes the time latency in the SGN. At each time step $t$, we apply the information compensation SGN to extract the graph features and train it with the gradient in Equation 8. In the traditional feedback model, the weighted average firing rate and input are denoted as

$$a_{t,K} = \frac{\sum_{\tau=1}^{K} \lambda^{K-\tau} s_{t,\tau}}{\sum_{\tau=1}^{K} \lambda^{K-\tau}}, \qquad x_{t,K} = \frac{\sum_{\tau=0}^{K} \lambda^{K-\tau} x_{t,\tau}}{\sum_{\tau=0}^{K} \lambda^{K-\tau}},$$

where $s_{t,\tau}$ and $x_{t,\tau}$ denote the firing state and input at time latency step $\tau$ of the SGN and time step $t$ of the dynamic graph. The LIF model approximates an equilibrium point $a^{\star}_{t}$ that satisfies $a^{\star}_{t} = \sigma\!\left(\frac{1}{V_{th}}\left(W a^{\star}_{t} + F x^{\star}_{t}\right)\right)$. This characteristic is similar to the presynaptic activities $\hat{a}^{l}_{t,\tau+1} = \sum_{k \le \tau+1} \lambda^{\tau+1-k} s^{l}_{t,k}$. Considering the last time latency step $K$ of layer $l$, we have $\hat{a}^{l}_{t,K} = \sum_{\tau=1}^{K} \lambda^{K-\tau} s^{l}_{t,\tau}$, which equals $C\, a^{l}_{t,K}$ with $C = \sum_{\tau=1}^{K} \lambda^{K-\tau}$. In other words, the firing rate $\hat{a}$ at the last time latency step $K$ of layer $l$ is equivalent, up to the constant $C$, to that of the traditional feedback model. Thus, the variation of training the SGN is equivalent to the traditional feedback model, and we can calculate the gradients of the parameters directly with Equation 8.

### Dynamic Spiking Graph Neural Network

The proposed information compensation SGN is designed for a fixed time step $t$. However, since graphs may change over time, it remains a challenge to integrate the temporal dynamics of the SGN with dynamic graphs. We propose a novel method that propagates the intermediate states $a_{t,K} = \{a^{1}_{t,K}, \ldots, a^{l}_{t,K}, \ldots, a^{L}_{t,K}\}$ across time steps. At time step $t+1$, we set the initial membrane potential to $u_{t+1,1} = a_{t,K}$, and the update process of the membrane potentials is

$$u_{t+1,K} = \lambda u_{t+1,K-1} + W s_{t+1,K-1} + F x_{t+1,K-1} - V_{th}\, s_{t+1,K-1}. \tag{9}$$

Algorithm 1: Learning Algorithm of Dy-SIGN
Input: Dynamic graph $G = \{G^1, \ldots, G^T\}$; labels $y$; network parameters; number of layers $L$; SGN time latency $K$.
Output: Trained model parameters.
1: Initialize the parameters.
2: // Forward:
3: for $t = 1, \ldots, T$ do
4:   for $l = 1, \ldots, L$ do
5:     Calculate the average firing rate $a^{l}_{t,K}$ with Equation 10;
6:     Collect the fixed-point representation $a^{\star}_{t}$ of layer $L$ at time step $t$;
7:   end for
8: end for
9: Calculate the output of Dy-SIGN $\hat{y}$ and the loss $L$ with Equation 11;
10: // Backward:
11: for $l = L, \ldots, 1$ do
12:   Calculate the gradient of the SGN with Equation 8;
13:   Update the parameters.
14: end for

The average firing rate is defined as $a_{t+1,K} = \frac{\sum_{\tau=1}^{K} \lambda^{K-\tau} s_{t+1,\tau}}{\sum_{\tau=1}^{K} \lambda^{K-\tau}}$, the average input as $x_{t+1,K} = \frac{\sum_{\tau=0}^{K} \lambda^{K-\tau} x_{t+1,\tau}}{\sum_{\tau=0}^{K} \lambda^{K-\tau}}$, with $u_{t+1,1} = a_{t,K}$ and $s_{t+1,1} = 0$. Then, we have

$$a_{t+1,K} = \sigma\!\left(\frac{\sum_{i=0}^{K-2} \lambda^{i}}{\sum_{i=0}^{K-1} \lambda^{i}} \cdot \frac{W a_{t+1,K-1} + F x_{t+1,K-1}}{V_{th}} - \frac{u_{t+1,K}}{V_{th} \sum_{i=0}^{K-1} \lambda^{i}} + \frac{a_{t,K}}{V_{th} \sum_{i=0}^{K-1} \lambda^{i}}\right), \tag{10}$$

where $\sigma(x) = 1$ for $x > 1$, $\sigma(x) = x$ for $0 \le x \le 1$, and $\sigma(x) = 0$ for $x < 0$. For each time step $t$, we first calculate the equilibrium point $a_{t,K} \to a^{\star}_{t}$, which is then fixed for time step $t+1$. Therefore, the LIF model gradually approximates the equilibrium point $a^{\star}_{t}$ that satisfies $a^{\star}_{t} = \sigma\!\left(\frac{1}{V_{th}}\left(W a^{\star}_{t} + F x^{\star}_{t}\right)\right)$ as $x_{t,K} \to x^{\star}_{t}$. Finally, we obtain the fixed-point representations over time, i.e., $a^{\star} = \{a^{\star}_{1}, \ldots, a^{\star}_{t}, \ldots, a^{\star}_{T}\}$.
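To illustrate the forward pass of Algorithm 1, the sketch below iterates the equilibrium equation $a^{\star}_t = \sigma\!\left(\frac{1}{V_{th}}(W a^{\star}_t + F x^{\star}_t)\right)$ toward a fixed point at each graph time step and reuses that state to initialize the next step. The single-layer form, iteration count, tolerance, and random inputs are illustrative assumptions, not the paper's exact procedure.

```python
import torch

V_TH = 1.0  # illustrative threshold

def sigma(x):
    """Piecewise-linear sigma of Equation 10: 0 below 0, identity on [0, 1], 1 above 1."""
    return torch.clamp(x, 0.0, 1.0)

def solve_equilibrium(W, F, x_bar, a_init, n_iter=50, tol=1e-4):
    """Iterate a <- sigma((W a + F x) / V_th) toward the fixed point a*_t."""
    a = a_init
    for _ in range(n_iter):
        a_next = sigma((a @ W.t() + x_bar @ F.t()) / V_TH)
        if (a_next - a).abs().max() < tol:
            break
        a = a_next
    return a

def forward_dynamic(W, F, x_bars):
    """Forward pass of Algorithm 1: the fixed point a*_t initializes the solve at t+1."""
    N, d = x_bars[0].shape[0], W.shape[0]
    a = torch.zeros(N, d)                 # zero initial state at the first time step
    fixed_points = []
    for x_bar in x_bars:                  # x_bar is the averaged input x_{t,K} of step t
        a = solve_equilibrium(W, F, x_bar, a_init=a)
        fixed_points.append(a)            # collect a*_t for the final prediction
    return fixed_points

# usage: T=3 snapshots, 8 nodes, 16-dimensional averaged inputs and states
W, F = 0.1 * torch.randn(16, 16), 0.1 * torch.randn(16, 16)
x_bars = [torch.rand(8, 16) for _ in range(3)]
a_stars = forward_dynamic(W, F, x_bars)
```

In the backward pass, only the fixed-point condition is differentiated through Equation 8, so none of these forward iterations needs to be stored.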
We concatenate all the embeddings for the final node classification, which is formulated as

$$\hat{y} = \mathrm{FC}\!\left(\big\Vert_{t=1}^{T} a^{\star}_{t}\right), \qquad L = -\sum_{r \in y_{L}} y_{r} \ln \hat{y}_{r}, \tag{11}$$

where $\Vert$ denotes the concatenation operation, $\mathrm{FC}$ is the fully connected layer, $\hat{y}$ is the node prediction, $y$ is the ground-truth label, $y_{L}$ is the set of labeled nodes, and $L$ is the cross-entropy loss.

In summary, the proposed dynamic spiking graph neural network has several advantages. Firstly, $a^{\star}_{t}$ is the fixed point of each time step across the layers, which can be considered the hidden embedding of time step $t$. Compared to traditional RNN-based dynamic graph methods, directly propagating $a^{\star}_{t}$ over time reduces the computational cost of calculating the hidden states and lowers the model complexity. Secondly, traditional static feedback models usually set the initial state to 0, which cannot meet the requirements of dynamic graphs. By passing the previous state to the next time step, the model is able to capture long-term temporal dependencies for prediction. The detailed algorithm is shown in Algorithm 1.
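The readout in Equation 11 amounts to concatenating the per-step fixed points and applying a linear classifier with a cross-entropy loss over the labeled nodes. Below is a minimal sketch under assumed shapes; the helper name and the labeled-node mask are illustrative.

```python
import torch
import torch.nn.functional as Fnn

def dy_sign_readout(fixed_points, fc_weight, fc_bias, labels, labeled_mask):
    """Equation 11: concatenate a*_1..a*_T, apply an FC layer, cross-entropy on labeled nodes.

    fixed_points: list of T tensors of shape (N, d) holding the equilibrium states a*_t
    fc_weight:    (num_classes, T * d) weight of the fully connected layer
    fc_bias:      (num_classes,) bias of the fully connected layer
    labels:       (N,) integer class labels y
    labeled_mask: (N,) boolean mask selecting the labeled node set y_L
    """
    h = torch.cat(fixed_points, dim=1)                  # ||_{t=1}^T a*_t -> (N, T*d)
    logits = Fnn.linear(h, fc_weight, fc_bias)          # FC layer
    loss = Fnn.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    return logits, loss

# usage with assumed shapes (e.g., the a_stars from the sketch above)
T, N, d, C = 3, 8, 16, 4
a_stars = [torch.rand(N, d) for _ in range(T)]
Wfc, bfc = 0.1 * torch.randn(C, T * d), torch.zeros(C)
y = torch.randint(0, C, (N,))
mask = torch.zeros(N, dtype=torch.bool)
mask[: N // 2] = True                                   # pretend the first half of the nodes is labeled
logits, loss = dy_sign_readout(a_stars, Wfc, bfc, y, mask)
```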
## Experiments

### Experimental Settings

To verify the effectiveness of the proposed Dy-SIGN, we conduct experiments on three large real-world graph datasets, i.e., DBLP (Lu et al. 2019), Tmall (Lu et al. 2019), and Patent (Hall, Jaffe, and Trajtenberg 2001). The statistics and detailed descriptions are presented in the Appendix. We compare Dy-SIGN with various competing methods, including two static graph methods (DeepWalk (Perozzi, Al-Rfou, and Skiena 2014) and Node2Vec (Grover and Leskovec 2016)), seven dynamic graph methods (HTNE (Zuo et al. 2018), M2DNE (Lu et al. 2019), DyTriad (Zhou et al. 2018), MPNN (Panagopoulos, Nikolentzos, and Vazirgiannis 2021), JODIE (Kumar, Zhang, and Leskovec 2019), EvolveGCN (Pareja et al. 2020), and TGAT (da Xu et al. 2020)), and one spiking method, SpikeNet (Li et al. 2023). Their details are introduced in the Appendix. For the implementation, we follow the same settings as (Li et al. 2023) and report the Macro-F1 and Micro-F1 results under different training ratios (i.e., 40%, 60%, and 80%); we additionally use 5% of the data for validation. The hidden dimension of all methods is set to 128 and the batch size to 1024. The number of training epochs is 100 and the learning rate is 0.001.

[Figure 2: (a) The memory consumption of SpikeNet and Dy-SIGN on the DBLP dataset. (b) The training time (sec/epoch) of different methods. (c) Hyperparameter sensitivity analysis of λ.]

### Performance Comparison

Comparison of Performance. The classification results on the different datasets under various training ratios are presented in Tables 1 and 2.

Table 1: Macro-F1 score comparisons on three dynamic graph datasets with different training ratios. The results are averaged over five runs, and the best results are in boldface. "-" denotes that the method is too time-consuming.

| Methods | DBLP 40% | DBLP 60% | DBLP 80% | Tmall 40% | Tmall 60% | Tmall 80% | Patent 40% | Patent 60% | Patent 80% |
|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 67.08 | 67.17 | 67.12 | 49.09 | 49.29 | 49.53 | 72.32 ± 0.9 | 72.25 ± 1.2 | 72.05 ± 1.1 |
| Node2Vec | 66.07 | 66.81 | 66.93 | 54.37 | 54.55 | 54.58 | 69.01 ± 0.9 | 69.08 ± 0.9 | 68.99 ± 1.0 |
| HTNE | 67.68 | 68.24 | 68.36 | 54.81 | 54.89 | 54.93 | - | - | - |
| M2DNE | 69.02 | 69.48 | 69.75 | 57.75 | 57.99 | 58.47 | - | - | - |
| DyTriad | 60.45 | 64.77 | 66.42 | 44.98 | 48.97 | 51.16 | - | - | - |
| MPNN | 64.19 ± 0.4 | 63.91 ± 0.3 | 65.05 ± 0.5 | 47.71 ± 0.8 | 47.78 ± 0.7 | 50.27 ± 0.5 | - | - | - |
| JODIE | 66.73 ± 1.0 | 67.32 ± 1.1 | 67.53 ± 1.3 | 52.62 ± 0.8 | 54.02 ± 0.6 | 54.17 ± 0.2 | 77.57 ± 0.8 | 77.69 ± 0.6 | 77.67 ± 0.4 |
| EvolveGCN | 67.22 ± 0.3 | 69.78 ± 0.8 | 71.20 ± 0.7 | 53.02 ± 0.7 | 54.99 ± 0.7 | 55.78 ± 0.6 | 79.67 ± 0.4 | 79.76 ± 0.5 | 80.13 ± 0.4 |
| TGAT | 71.18 ± 0.4 | 71.74 ± 0.5 | 72.15 ± 0.3 | 56.90 ± 0.6 | 57.61 ± 0.7 | 58.01 ± 0.7 | 81.51 ± 0.4 | 81.56 ± 0.6 | 81.57 ± 0.5 |
| SpikeNet | 70.88 ± 0.4 | 71.98 ± 0.3 | 74.64 ± 0.5 | 58.84 ± 0.4 | 61.13 ± 0.8 | 62.40 ± 0.6 | 83.53 ± 0.6 | 83.85 ± 0.7 | 83.90 ± 0.6 |
| Dy-SIGN | 70.94 ± 0.1 | 72.07 ± 0.1 | 74.67 ± 0.5 | 57.48 ± 0.1 | 60.94 ± 0.2 | 61.89 ± 0.1 | 83.57 ± 0.3 | 83.77 ± 0.2 | 83.91 ± 0.2 |

Table 2: Micro-F1 score comparisons on three dynamic graph datasets with different training ratios. The results are averaged over five runs, and the best results are in boldface. "-" denotes that the method is too time-consuming.

| Methods | DBLP 40% | DBLP 60% | DBLP 80% | Tmall 40% | Tmall 60% | Tmall 80% | Patent 40% | Patent 60% | Patent 80% |
|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 66.53 | 66.89 | 66.38 | 57.11 | 57.34 | 57.88 | 71.57 ± 1.3 | 71.53 ± 1.0 | 71.38 ± 1.2 |
| Node2Vec | 66.80 | 67.37 | 67.31 | 60.41 | 60.56 | 60.66 | 69.01 ± 0.9 | 69.08 ± 0.9 | 68.99 ± 1.0 |
| HTNE | 67.68 | 68.24 | 68.36 | 54.81 | 54.89 | 54.93 | - | - | - |
| M2DNE | 69.23 | 69.47 | 69.71 | 64.21 | 64.38 | 64.65 | - | - | - |
| DyTriad | 65.13 | 66.80 | 66.95 | 53.24 | 56.88 | 60.72 | - | - | - |
| MPNN | 65.72 ± 0.4 | 66.79 ± 0.6 | 67.74 ± 0.3 | 57.82 ± 0.7 | 57.66 ± 0.5 | 58.07 ± 0.6 | - | - | - |
| JODIE | 68.44 ± 0.6 | 68.51 ± 0.8 | 68.80 ± 0.9 | 58.36 ± 0.5 | 60.28 ± 0.3 | 60.49 ± 0.3 | 77.64 ± 0.7 | 77.89 ± 0.5 | 77.93 ± 0.4 |
| EvolveGCN | 69.12 ± 0.8 | 70.43 ± 0.6 | 71.32 ± 0.5 | 59.96 ± 0.7 | 61.19 ± 0.6 | 61.77 ± 0.6 | 79.39 ± 0.5 | 79.75 ± 0.3 | 80.01 ± 0.3 |
| TGAT | 71.10 ± 0.2 | 71.85 ± 0.4 | 73.12 ± 0.3 | 62.05 ± 0.5 | 62.92 ± 0.4 | 63.32 ± 0.7 | 80.79 ± 0.7 | 80.81 ± 0.6 | 80.93 ± 0.6 |
| SpikeNet | 71.98 ± 0.3 | 72.35 ± 0.8 | 74.86 ± 0.5 | 63.52 ± 0.7 | 64.84 ± 0.4 | 66.10 ± 0.3 | 83.48 ± 0.8 | 83.80 ± 0.7 | 83.88 ± 0.9 |
| Dy-SIGN | 71.90 ± 0.1 | 72.61 ± 0.4 | 74.96 ± 0.2 | 62.93 ± 0.3 | 64.10 ± 0.3 | 65.82 ± 0.2 | 83.50 ± 0.2 | 83.47 ± 0.1 | 83.90 ± 0.2 |

From the results, we find that the proposed Dy-SIGN achieves competitive performance compared with the other methods, and we make the following observations. (1) The static methods DeepWalk and Node2Vec perform worse than the others, which indicates that simply applying static graph methods to dynamic graphs ignores the contribution of historical information to the representation. (2) HTNE, M2DNE, DyTriad, and MPNN fail to learn meaningful representations on the large-scale Patent dataset. The potential reason is the high computational complexity of these models, which leads to excessive time consumption for both training and prediction and makes it impractical to achieve competitive performance within an acceptable time frame. (3) Although the spiking methods SpikeNet and Dy-SIGN use binary information for representation learning, their performance is still better than that of JODIE, EvolveGCN, and TGAT. This indicates that spike-based learning methods are competitive with traditional methods in both computational complexity and performance. (4) The proposed Dy-SIGN outperforms SpikeNet in half of the settings. However, as shown in Figure 2 (a), the memory consumption of Dy-SIGN is only about half that of SpikeNet under the same experimental environment, demonstrating the superiority of Dy-SIGN. We attribute this to the variation training method, which makes Dy-SIGN more efficient.

Comparison of Runtime Complexity. We further compare the runtime complexity of Dy-SIGN with SpikeNet, JODIE, EvolveGCN, and TGAT, as shown in Figure 2 (b). From the results, we find that the SNN-based methods are significantly more efficient than the ANN-based methods.
The reason is that SNN-based methods use binary signals instead of continuous features, allowing matrix multiplications to be replaced by accumulation operations. Additionally, the proposed Dy-SIGN is slightly more efficient than SpikeNet; a potential reason is that Dy-SIGN uses the simple form in Equation 8 to calculate the gradient, avoiding the time-consuming gradient computation of BPTT.

### Sensitivity Analysis

In this section, we examine the impact of hyperparameters on the performance of the proposed Dy-SIGN. Specifically, the parameter λ determines the amount of information retained for the representation at the next time latency step. We test values of λ in the range {0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2} with the other parameters fixed to determine the optimal value. The results are depicted in Figure 2 (c). We observe that as the value of λ increases, the performance first improves and then gradually declines. We attribute this to the fact that a smaller λ cannot provide sufficient historical information for effective representation learning, while a larger λ may lead to worse performance since the representation at the current time latency step can be influenced by too much historical information. Thus, we set the default value of λ to 1.

## Conclusion

In this paper, we study the problem of combining SNNs with dynamic graphs using implicit differentiation for node classification and propose a novel method named Dy-SIGN. To tackle the loss of graph structure and detail information during SNN propagation, we propose an information compensation mechanism that passes the original structure and features to the last layer of the network, where they participate in node representation learning. This structure is very similar to traditional feedback models. Based on this, we use implicit differentiation and a variation training method to address the issue of high memory consumption when combining SNNs with dynamic graphs. Extensive experiments on real-world large-scale datasets validate the superiority of the proposed Dy-SIGN.

## Acknowledgments

This work is part of the research project "ENERGY-BASED PROBING FOR SPIKING NEURAL NETWORKS", performed at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), funded by Technology Innovation Institute (TII) (Contract No. TII/ARRC/2073/2021).

## References

Bai, S.; Kolter, J. Z.; and Koltun, V. 2019. Deep equilibrium models. In NeurIPS.

Bai, S.; Koltun, V.; and Kolter, J. Z. 2020. Multiscale deep equilibrium models. In NeurIPS.

Bellec, G.; Salaj, D.; Subramoney, A.; Legenstein, R.; and Maass, W. 2018. Long short-term memory and learning-to-learn in networks of spiking neurons. In NeurIPS, volume 31.

Bohte, S.; Kok, J.; and La Poutre, H. 2000. SpikeProp: Backpropagation for networks of spiking neurons. In ESANN.

Brette, R.; Rudolph, M.; Carnevale, T.; Hines, M.; Beeman, D.; Bower, J. M.; Diesmann, M.; Morrison, A.; Goodman, P. H.; Harris, F. C.; et al. 2007. Simulation of networks of spiking neurons: A review of tools and strategies. Journal of Computational Neuroscience, 23: 349–398.

Bu, T.; Ding, J.; Yu, Z.; and Huang, T. 2022. Optimized potential initialization for low-latency spiking neural networks. In AAAI.
Chen, S.; Li, Z.; and Tang, Z. 2020. Relation R-CNN: A Graph Based Relation-Aware Network for Object Detection. IEEE Signal Processing Letters, 27.

Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Chung, F.; and Lu, L. 2002. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6(2): 125–145.

Da Xu; Chuanwei Ruan; Evren Korpeoglu; Sushant Kumar; and Kannan Achan. 2020. Inductive representation learning on temporal graphs. In ICLR.

Diehl, P. U.; Neil, D.; Binas, J.; Cook, M.; Liu, S.-C.; and Pfeiffer, M. 2015. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IJCNN.

Esser, S. K.; Appuswamy, R.; Merolla, P.; Arthur, J. V.; and Modha, D. S. 2015. Backpropagation for energy-efficient neuromorphic computing. In NeurIPS, volume 28.

Fey, M.; and Lenssen, J. E. 2019. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428.

Grover, A.; and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In KDD.

Gu, F.; Chang, H.; Zhu, W.; Sojoudi, S.; and El Ghaoui, L. 2020. Implicit graph neural networks. In NeurIPS.

Hall, B. H.; Jaffe, A. B.; and Trajtenberg, M. 2001. The NBER patent citation data file: Lessons, insights and methodological tools.

Hochreiter, S. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02): 107–116.

Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. Neural Computation, 9(8): 1735–1780.

Huh, D.; and Sejnowski, T. J. 2018. Gradient descent for spiking neural networks. In NeurIPS.

Hunsberger, E.; and Eliasmith, C. 2015. Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829.

Kim, S.; Park, S.; Na, B.; and Yoon, S. 2020. Spiking-YOLO: Spiking neural network for energy-efficient object detection. In AAAI.

Kumar, S.; Zhang, X.; and Leskovec, J. 2019. Predicting dynamic embedding trajectory in temporal interaction networks. In KDD.

Li, J.; Yu, Z.; Zhu, Z.; Chen, L.; Yu, Q.; Zheng, Z.; Tian, S.; Wu, R.; and Meng, C. 2023. Scaling Up Dynamic Graph Representation Learning via Spiking Neural Networks. In AAAI.

Liu, J.; Hooi, B.; Kawaguchi, K.; and Xiao, X. 2022. MGNNI: Multiscale Graph Neural Networks with Implicit Layers. In NeurIPS.

Lu, Y.; Wang, X.; Shi, C.; Yu, P. S.; and Ye, Y. 2019. Temporal network embedding with micro- and macro-dynamics. In CIKM.

Maass, W. 1997. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9): 1659–1671.

Meng, Q.; Xiao, M.; Yan, S.; Wang, Y.; Lin, Z.; and Luo, Z.-Q. 2022. Training high-performance low-latency spiking neural networks by differentiation on spike representation. In CVPR.

Mi, L.; and Chen, Z. 2020. Hierarchical graph attention network for visual relationship detection. In CVPR.

Panagopoulos, G.; Nikolentzos, G.; and Vazirgiannis, M. 2021. Transfer graph neural networks for pandemic forecasting. In AAAI.

Pareja, A.; Domeniconi, G.; Chen, J.; Ma, T.; Suzumura, T.; Kanezashi, H.; Kaler, T.; Schardl, T.; and Leiserson, C. 2020. EvolveGCN: Evolving graph convolutional networks for dynamic graphs. In AAAI.

Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in PyTorch.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In KDD, 701–710.

Pfeiffer, M.; and Pfeil, T. 2018. Deep learning with spiking neurons: Opportunities and challenges. Frontiers in Neuroscience, 12: 774.

Rathi, N.; Srinivasan, G.; Panda, P.; and Roy, K. 2020. Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation. In ICML.

Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; and Bronstein, M. 2020. Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637.

Rueckauer, B.; Lungu, I.-A.; Hu, Y.; Pfeiffer, M.; and Liu, S.-C. 2017. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 11: 682.

Sak, H.; Senior, A.; and Beaufays, F. 2014. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv preprint arXiv:1402.1128.

Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2008. The graph neural network model. IEEE Transactions on Neural Networks, 20(1): 61–80.

Schlichtkrull, M.; Kipf, T. N.; Bloem, P.; Van Den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference.

Schliebs, S.; and Kasabov, N. 2013. Evolving spiking neural network: A survey. Evolving Systems, 4: 87–98.

Shi, M.; Huang, Y.; Zhu, X.; Tang, Y.; Zhuang, Y.; and Liu, J. 2021. GAEN: Graph attention evolving networks. In IJCAI.

Shrestha, S. B.; and Orchard, G. 2018. SLAYER: Spike layer error reassignment in time. In NeurIPS.

Thiele, J. C.; Bichler, O.; and Dupret, A. 2020. SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes. In ICML.

Wieder, O.; Kohlbacher, S.; Kuenemann, M.; Garon, A.; Ducrot, P.; Seidel, T.; and Langer, T. 2020. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies, 37: 1–12.

Wu, H.; Zhang, Y.; Weng, W.; Zhang, Y.; Xiong, Z.; Zha, Z.-J.; Sun, X.; and Wu, F. 2021. Training spiking neural networks with accumulated spiking flow. In AAAI.

Wu, S.; Sun, F.; Zhang, W.; Xie, X.; and Cui, B. 2022. Graph neural networks in recommender systems: A survey. ACM Computing Surveys, 55(5): 1–37.

Xiao, M.; Meng, Q.; Zhang, Z.; He, D.; and Lin, Z. 2022. Online Training Through Time for Spiking Neural Networks. In NeurIPS.

Xiao, M.; Meng, Q.; Zhang, Z.; Wang, Y.; and Lin, Z. 2021. Training feedback spiking neural networks by implicit differentiation on the equilibrium state. In NeurIPS.

Xu, D.; Cheng, W.; Luo, D.; Liu, X.; and Zhang, X. 2019a. Spatio-Temporal Attentive RNN for Node Classification in Temporal Attributed Graphs. In IJCAI.

Xu, H.; Jiang, C.; Liang, X.; and Li, Z. 2019b. Spatial-aware graph relation network for large-scale object detection. In CVPR.

Xu, M.; Wu, Y.; Deng, L.; Liu, F.; Li, G.; and Pei, J. 2021. Exploiting spiking dynamics with spatial-temporal feature normalization in graph learning. In IJCAI.

Yin, N.; Feng, F.; Luo, Z.; Zhang, X.; Wang, W.; Luo, X.; Chen, C.; and Hua, X.-S. 2022a. Dynamic hypergraph convolutional network. In ICDE.

Yin, N.; Shen, L.; Li, B.; Wang, M.; Luo, X.; Chen, C.; Luo, Z.; and Hua, X.-S. 2022b. DEAL: An Unsupervised Domain Adaptive Framework for Graph-level Classification. In Proceedings of the 30th ACM International Conference on Multimedia, 3470–3479.
Yin, N.; Shen, L.; Wang, M.; Lan, L.; Ma, Z.; Chen, C.; Hua, X.-S.; and Luo, X. 2023a. CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification. arXiv preprint arXiv:2306.04979.

Yin, N.; Shen, L.; Xiong, H.; Gu, B.; Chen, C.; Hua, X.-S.; Liu, S.; and Luo, X. 2023b. Messages are never propagated alone: Collaborative hypergraph neural network for time-series forecasting. IEEE Transactions on Pattern Analysis & Machine Intelligence, (01): 1–15.

Zhou, L.; Yang, Y.; Ren, X.; Wu, F.; and Zhuang, Y. 2018. Dynamic network embedding by modeling triadic closure process. In AAAI.

Zhou, S.; Li, X.; Chen, Y.; Chandrasekaran, S. T.; and Sanyal, A. 2021. Temporal-coded deep spiking neural network with easy training and robust performance. In AAAI.

Zhu, Z.; Peng, J.; Li, J.; Chen, L.; Yu, Q.; and Luo, S. 2022. Spiking Graph Convolutional Networks. In IJCAI.

Zuo, Y.; Liu, G.; Lin, H.; Guo, J.; Hu, X.; and Wu, J. 2018. Embedding temporal network via neighborhood formation. In KDD.