# Differential Coding for Training-Free ANN-to-SNN Conversion

Zihan Huang 1, Wei Fang 2, Tong Bu 1, Peng Xue 3 4, Zecheng Hao 1, Wenxuan Liu 1, Yuanhong Tang 1, Zhaofei Yu 1 5, Tiejun Huang 1

Abstract

Spiking Neural Networks (SNNs) exhibit significant potential due to their low energy consumption. Converting Artificial Neural Networks (ANNs) to SNNs is an efficient way to obtain high-performance SNNs. However, many conversion methods are based on rate coding, which requires numerous spikes and longer time-steps than directly trained SNNs, leading to increased energy consumption and latency. This article introduces differential coding for ANN-to-SNN conversion, a novel coding scheme that reduces spike counts and energy consumption by transmitting changes in rate information rather than rates directly, and explores its application across various layers. Additionally, a threshold iteration method is proposed to optimize thresholds based on the activation distribution when converting Rectified Linear Units (ReLUs) to spiking neurons. Experimental results on various Convolutional Neural Networks (CNNs) and Transformers demonstrate that the proposed differential coding significantly improves accuracy while reducing energy consumption, particularly when combined with the threshold iteration method, achieving state-of-the-art performance. The source code of the proposed method is available at https://github.com/h-z-h-cell/ANN-to-SNN-DCGS.

1 School of Computer Science, Peking University, Beijing, China. 2 School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China. 3 Peng Cheng Laboratory, Shenzhen, China. 4 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. 5 Institute for Artificial Intelligence, Peking University, Beijing, China. Correspondence to: Wei Fang, Yuanhong Tang.
Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

1. Introduction

Spiking Neural Networks (SNNs) are sometimes regarded as the third generation of neural network models (Maass, 1997) for their unique neural dynamics and high biological plausibility (Gerstner et al., 2014), making them a competitive alternative to Artificial Neural Networks (ANNs) (Li et al., 2024). A significant difference between ANNs and SNNs lies in the information representation. ANNs transmit dense floating-point values between layers. In SNNs, by contrast, communication between layers is based on sparse, binary spikes, which are triggered when the membrane potentials of spiking neurons cross the threshold, bringing event-driven computation and extremely low power consumption on neuromorphic chips (Merolla et al., 2014; Davies et al., 2018; DeBole et al., 2019; Pei et al., 2019). However, the discrete and non-differentiable spike firing process poses major learning challenges for SNNs. Recently, this issue has been partly addressed by the surrogate gradient method (Neftci et al., 2019), which redefines the gradient of the spike firing process with a smooth, differentiable surrogate function. Enabled by surrogate gradients, deep SNNs can be trained with powerful backpropagation and gradient descent methods, and their performance has greatly improved (Fang et al., 2021; Duan et al., 2022; Shi et al., 2024). The applications of SNNs have also been extended to complex event-based vision tasks (Cordone et al., 2022; Liu et al., 2024; Chen et al., 2024; Liu et al., 2025). Unfortunately, the surrogate gradient is a coarse approximation and may mislead the gradient descent direction in multi-layer SNNs (Gygax & Zenke, 2024). The time dimension of SNNs necessitates backpropagation through time (BPTT), which requires nearly T times the training resources of ANNs.
Here T is the sequence length, which is also the number of time-steps of the SNN. Although some online learning methods (Xiao et al., 2022; Bohnstingl et al., 2023; Meng et al., 2023; Zhu et al., 2024) can estimate the full gradients of BPTT by accumulating single-step gradients, their task accuracy is sub-optimal. In addition to the surrogate gradient methods, ANN-to-SNN conversion methods (Cao et al., 2015; Han et al., 2020; Li et al., 2021; Deng & Gu, 2021; Bu et al., 2022a; 2024) are another spiking deep learning methodology that eliminates the training challenges of SNNs. They convert pre-trained ANNs to SNNs by replacing nonlinear activation functions with spiking neurons. The converted SNNs enjoy high performance and accuracy close to the source ANNs, even on the complex ImageNet dataset. Most conversion methods are based on rate coding, which represents the activations of ANNs by the firing rates of SNNs. However, the precise estimation of firing rates requires a large number of time-steps, resulting in noticeably higher latency and energy consumption for conversion methods than for surrogate gradient methods. In this article, we propose differential coding and its implementation scheme for different layers in ANN-to-SNN conversion. Instead of treating the average firing rate as the encoded activation value, differential coding treats time-weighted spikes as corrections to the encoded activation value. This approach not only improves network accuracy but also allows neurons to stop firing once a certain approximation precision is achieved, thereby reducing energy consumption without any extra training.
Additionally, by minimizing the expected error between Rectified Linear Units (ReLUs) and the encoded values in SNNs, we propose a threshold iteration method to determine the optimal thresholds of the spiking neurons that convert ReLUs, further enhancing the performance of the SNN. Our main contributions are summarized as follows:

- We propose differential coding for ANN-to-SNN conversion and establish the dynamics for various modules in SNNs.
- We design a threshold iteration method to determine the optimal thresholds of spiking neurons for converting ReLUs.
- We design two equivalent implementations of the employed multi-threshold (MT) neuron to facilitate hardware-friendly execution.
- By converting different CNNs and Transformers into SNNs for evaluation, our extensive experiments demonstrate that the proposed method achieves state-of-the-art accuracy while significantly reducing network energy consumption.

2. Related Works

2.1. Rate-based ANN-to-SNN Conversion

Rate coding was observed early in biological neural systems (Adrian, 1926): stronger stimulation causes more frequent spikes. This straightforward coding method builds a bridge between the activations of ReLUs in ANNs and the firing rates of Integrate-and-Fire (IF) neurons in SNNs, on which the primary ANN-to-SNN conversion method (Cao et al., 2015) was based. As the firing rate is defined as the average number of spikes over all time-steps, its range is restricted between zero and one. For the negative part, IF neurons fit ReLUs perfectly, as both output zero. The outputs of ReLUs, however, are unbounded; weight normalization for regulating activations (Rueckauer et al., 2017) and threshold balancing for spiking neurons (Han et al., 2020) were proposed to relieve this range mismatch during conversion. Time is discretized into time-steps in SNNs; consequently, the firing rates are also rounded to a fixed interval.
While the floating-point activations in source ANNs are continuous, the discrete firing rates cannot fit them precisely, causing quantization errors. To further reduce conversion errors, quantized ANN-to-SNN methods have been proposed (Bu et al., 2022b; Hu et al., 2023). These methods quantize and clip the activations of ANNs, relieving the quantization errors and the range mismatch at the same time. However, the source ANNs must be re-trained, which increases conversion costs, and their performance declines due to the change in the activation function. Spikes may also not arrive evenly during inference, which causes the unevenness error in conversion (Bu et al., 2022b). A typical case is when no spikes arrive during the first T/2 time-steps, and more than T/2 spikes from different synapses arrive at the neuron during the last T/2 time-steps. Although the input firing rate is larger than 0.5, the neuron cannot produce an output firing rate of 0.5 because it does not have enough remaining time-steps in which to fire. Several methods have been proposed to reduce this error, including a two-stage inference strategy (Hao et al., 2023a) and shifting the initial membrane potential (Hao et al., 2023b). Recent research has extended conversion to ANNs with activations beyond ReLUs. Oh & Lee (2024) introduced a sign-gradient-descent-based neuron that can approximate various nonlinear activation functions. Wang et al. (2023) and You et al. (2024) trained modified Transformers and converted them into spiking Transformers. Meanwhile, Jiang et al. (2024) and Huang et al. (2024) developed modules to approximate nonlinear layers, enabling training-free conversion of Transformers to SNNs.

2.2. Temporal Coding Conversions

Rate coding is inefficient and causes large latency in rate-based ANN-to-SNN conversion methods. Surrogate gradient methods avoid this issue through end-to-end training, although the interpretation of the coding scheme used by such SNNs is not yet clear (Li et al., 2023).
For conversion methods, a manually designed coding strategy is indispensable. Beyond rate coding, several temporal coding methods have been explored. The time-to-first-spike (TTFS) coding method (Rueckauer & Liu, 2018; Zhang et al., 2019; Stanojevic et al., 2023) encodes a value in the firing time of a spike. Each neuron fires only one spike in TTFS SNNs, which brings extremely high power efficiency. However, these methods rely on layer-by-layer processing, i.e., one layer can only start to compute after it has received all input spikes from the previous layer. Thus, TTFS SNNs suffer from high latency that increases with network depth. Phase coding methods (Kim et al., 2018; Wang et al., 2022b) are similar to binary/decimal conversion: they decode values from spikes with power-of-2 weights. For a given number of time-steps T, these methods can represent 2^T different values, while rate coding can represent only T values. Their drawback is similar to the latency problem of TTFS SNNs: a value can only be obtained after all weighted spikes in a phase have arrived. Burst coding methods (Park et al., 2019; Li & Zeng, 2022; Wang et al., 2025) imitate the bursts of spikes emitted over short periods in biological neural systems. Burst spikes are implemented as the multiplication of spikes by a coefficient; they carry more information than binary spikes but may lose the advantages that SNNs draw from their binary nature.

3. Preliminaries

3.1. Multi-Threshold Neuron

Many previous works have proposed ternary-valued neurons to represent negative values and reduce conversion errors (Li et al., 2022; Wang et al., 2022a; You et al., 2024). The ternary representation, with outputs of -1, 0, and 1, does not disrupt the event-driven nature and significantly enhances expressive capability.
Furthermore, Huang et al. (2024) introduced multi-channel methods to implement the multi-threshold (MT) neuron for spike communication. In this article, we similarly adopt this approach to simulate $y = x$ using identity spiking MT neurons. The MT neuron is characterized by several parameters, including the base threshold $\theta$ and a total of $2n$ thresholds, $n$ positive and $n$ negative. The threshold values are indexed by $i$, where $\lambda^l_i$ denotes the $i$-th threshold value in layer $l$:

$$\lambda^l_1 = \theta^l,\ \lambda^l_2 = \frac{\theta^l}{2},\ \dots,\ \lambda^l_n = \frac{\theta^l}{2^{n-1}},\qquad \lambda^l_{n+1} = -\theta^l,\ \lambda^l_{n+2} = -\frac{\theta^l}{2},\ \dots,\ \lambda^l_{2n} = -\frac{\theta^l}{2^{n-1}}. \tag{1}$$

Let the variables $I^l[t]$, $W^l$, $s^l_i[t]$, $x^l[t]$, $m^l[t]$, and $v^l[t]$ represent the input current, the weight, the output spike of the $i$-th threshold, the total output signal, and the membrane potentials before and after spiking in the $l$-th layer at time-step $t$. The dynamics of the MT neuron are described by the following equations:

$$m^l[t] = v^l[t-1] + I^l[t] = v^l[t-1] + x^{l-1}[t], \tag{2}$$

$$s^l_i[t] = \mathrm{MTH}_{\theta,n}(m^l[t], i), \tag{3}$$

$$x^l[t] = \sum_{i=1}^{2n} s^l_i[t]\, W^l \lambda^l_i, \tag{4}$$

$$v^l[t] = m^l[t] - x^l[t], \tag{5}$$

$$\mathrm{MTH}_{\theta,n}(m^l[t], i) = \begin{cases} 0, & \text{if } \lambda^l_{2n} < m^l[t] < \lambda^l_n, \\ 1, & \text{else if } i = \arg\min_p \left|m^l[t] - \lambda^l_p\right|, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Figure 1. Diagram of the MT neuron (soma, dendrite, axon, and synapse). The MT neuron receives input from the previous module and emits at most one spike at each time-step.

Figure 1 illustrates the dynamics of MT neurons; when $n = 1$, the model reduces to an IF neuron with an additional negative threshold. Since at most one threshold can emit a spike per time-step, and $\lambda^l_i$ can be derived by bit-shifting $\theta^l$, we can implement MT neurons by computing $W^l \lambda^l_i$ from the weighted value $W^l \theta^l$ followed by bit-shifting.

3.2. Rate Coding in ANN-to-SNN Conversion

Traditional ANN-to-SNN conversion methods employ rate coding, which can be expressed as

$$r^l[t] = \frac{1}{t}\sum_{i=1}^{t} x^l[i], \tag{7}$$

where $x^l[i]$ represents the encoded output of layer $l$ in the SNN at time-step $i$, and $r^l[t]$ denotes the encoded activation that aims to match the activation value $\alpha^l$ of the corresponding ANN layer.
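The single-step dynamics in Equations (2)-(6) can be sketched in a few lines. The following is an illustrative sketch for an identity MT neuron (i.e., $W^l = 1$); the function name and NumPy formulation are ours, not the authors' released implementation:

```python
import numpy as np

def mt_neuron_step(v, x_in, theta=1.0, n=4):
    """One time-step of an identity MT neuron (W^l = 1), following
    Equations (2)-(6): charge, fire at most one threshold, soft reset."""
    m = v + x_in                                    # Eq. (2): membrane potential
    pos = theta / 2.0 ** np.arange(n)               # lambda_1 .. lambda_n
    thresholds = np.concatenate([pos, -pos])        # lambda_{n+1} .. lambda_{2n}
    if -pos[-1] < m < pos[-1]:                      # Eq. (6): strictly between the two
        x_out = 0.0                                 # smallest-magnitude thresholds: silent
    else:
        i = int(np.argmin(np.abs(m - thresholds)))  # the nearest threshold fires
        x_out = float(thresholds[i])                # Eq. (4) with identity weight
    v_new = m - x_out                               # Eq. (5): reset by subtraction
    return v_new, x_out
```

Starting from $v = 0$ with a single charge of 0.8 and $n = 4$, successive steps emit 1, then -0.25, then nothing, leaving a residual potential of 0.05; the emitted thresholds telescope toward the input, which is the behavior the soft reset in Equation (5) encodes.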
Derived from Equations (2) and (5), we have

$$v^l[t] - v^l[t-1] = x^{l-1}[t] - x^l[t], \tag{8}$$

$$v^l[t] - v^l[0] = \sum_{i=1}^{t} x^{l-1}[i] - \sum_{i=1}^{t} x^l[i], \tag{9}$$

$$\frac{1}{t}\sum_{i=1}^{t} x^l[i] = \frac{1}{t}\sum_{i=1}^{t} x^{l-1}[i] - \frac{v^l[t] - v^l[0]}{t}, \quad\text{i.e.,}\quad r^l[t] = r^{l-1}[t] - \frac{v^l[t] - v^l[0]}{t}. \tag{10}$$

Assuming $r^{l-1}[t] = \alpha^{l-1}$, and given that $\alpha^l = \alpha^{l-1}$ when simulating $y = x$, the encoded value $r^l[t]$ in the SNN approximates the activation value $\alpha^l$ in the ANN when $t$ is sufficiently large or $v^l[t]$ is close to $v^l[0]$:

$$r^l[t] = r^{l-1}[t] - \frac{v^l[t] - v^l[0]}{t} \approx r^{l-1}[t] = \alpha^{l-1} = \alpha^l. \tag{11}$$

4. Methods

In this section, we propose differential coding with graded units and spiking neurons (DCGS), a training-free approach for converting ANNs to SNNs. We begin by introducing the differential coding scheme, from which we develop differential graded units, differential spiking neurons, and differential coding for linear layers. These enable the conversion of various network modules. Additionally, we provide the threshold iteration method to find the optimal thresholds of spiking neurons for converting ReLUs. The overall algorithm can be found in Appendix A. Furthermore, we design two equivalent implementations of the MT neuron to support hardware-friendly execution.

4.1. Differential Coding in ANN-to-SNN Conversion

Traditional ANN-to-SNN conversion uses rate coding to transmit information, where the firing rate $r^l[t]$ encodes the activation value at each time-step. Equation (12) shows the relationship between the output firing rate $r^l[t]$ and the output signal $x^l[t]$. When layer $l$ consists of spiking neurons, the output can be described as $x^l[t] = \theta^l s^l[t]$, where $s^l[t]$ denotes the spike at time-step $t$ and $\theta^l$ represents the threshold:

$$r^l[t] = \frac{1}{t}\sum_{i=1}^{t} x^l[i] = \frac{t-1}{t}\, r^l[t-1] + \frac{x^l[t]}{t}. \tag{12}$$

When the neuron does not emit a spike, the encoded value decays to the proportion $(t-1)/t$ of its previous value; when the neuron emits a spike, the encoded value becomes $\frac{t-1}{t} r^l[t-1] + \frac{x^l[t]}{t}$. Herein lies a problem with rate coding: over time, the encoded value gradually decays.
As the time-step $t$ increases, the influence $\frac{1}{t}$ of earlier inputs becomes smaller, and the system requires more spikes to compensate for this decay, thus increasing the number of spikes required. To address this issue, we propose a novel encoding scheme, referred to as differential coding.

Definition 4.1. In differential coding, denote by $x^l[t]$ the actual output of the neuron. Define $e^l[t]$ as the encoded output value at time-step $t$, and the encoded activation value $r^l[t]$ as the average of $e^l[i]$ from time-step 1 to $t$. The relationship between the two is expressed by Equations (13) and (14):

$$e^l[t] = r^l[t-1] + x^l[t], \tag{13}$$

$$r^l[t] = r^l[t-1] + \frac{x^l[t]}{t} = \frac{1}{t}\sum_{i=1}^{t} e^l[i], \tag{14}$$

where $t$ starts from 1 and $r^l[0] = 0$. A detailed explanation of Definition 4.1 is provided in Appendix B. Comparing Equation (7) with (14), the key difference is that differential coding only updates the encoded activation value when an output spike occurs, rather than decaying it at every time-step as rate coding does. Figure 2 shows the ideal fitting results of rate coding and differential coding for the input $y = x$ with $T = 3$ and thresholds $\pm 1$. Differential coding can represent a wider range of values and achieve higher precision than rate coding, given the same threshold and number of time-steps.

Figure 2. Comparison of ideal fitting results: rate coding vs. differential coding for the input $y = x$ with $T = 3$ and thresholds $\pm 1$. Differential coding shows a wider representation range and higher precision.

4.1.1. DIFFERENTIAL GRADED UNITS

Existing ANN-to-SNN conversion methods struggle with nonlinear functions such as Gaussian Error Linear Units (GELU) (Hendrycks & Gimpel, 2023) and LayerNorm (Lei Ba et al., 2016). For these nonlinear layers, we utilize specific neuron dynamic units to implement them.
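The two updates can be compared numerically. The decoders below are an illustrative sketch (function names are ours): `rate_decode` applies the rate-coding update of Equation (12) and `differential_decode` the update of Equation (14) to a given output sequence $x^l[1..T]$:

```python
def rate_decode(outputs):
    """Rate coding, Eq. (12): r[t] = ((t-1)/t) * r[t-1] + x[t]/t.
    Earlier contributions decay as t grows."""
    r = 0.0
    for t, x in enumerate(outputs, start=1):
        r = (t - 1) / t * r + x / t
    return r

def differential_decode(outputs):
    """Differential coding, Eq. (14): r[t] = r[t-1] + x[t]/t.
    Each output is a correction; the running value never decays,
    so a silent neuron leaves the encoded value unchanged."""
    r = 0.0
    for t, x in enumerate(outputs, start=1):
        r = r + x / t
    return r
```

With threshold 1 and T = 3, rate coding of the spike train (1, 1, 0) decays to 2/3, whereas differential decoding of (1, 0, 0) holds the value 1 exactly once the neuron falls silent, and (1, 1, 0) reaches 1.5, above the rate-coding ceiling of 1, matching the wider range shown in Figure 2.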
Based on the expectation compensation idea of Huang et al. (2024), we introduce differential graded units to replace those nonlinear modules that cannot be directly converted. Derived from the differential coding scheme in Definition 4.1, this article proposes two types of differential graded units. Theorem 4.2 corresponds to nonlinear layers with a single input $x^{l-1}[t]$, and Theorem 4.3 applies to operations with two inputs $x^{l-1}_A[t]$ and $x^{l-1}_B[t]$.

Theorem 4.2. Let $F^l$ be a nonlinear layer $l$ with a single input $x^{l-1}[t]$, such as GELU, SiLU, MaxPool, LayerNorm, or Softmax. In ANN-to-SNN conversion, the mapping from $F^l$ to the dynamics of the differential graded unit under differential coding is given by Equations (15) and (16):

$$m^l[t] = r^{l-1}[t] = m^l[t-1] + \frac{x^{l-1}[t]}{t}, \tag{15}$$

$$x^l[t] = t\left(F^l(m^l[t]) - F^l(m^l[t-1])\right), \tag{16}$$

where $m^l[t]$ is the membrane potential at time-step $t$, initialized as $m^l[0] = b^{l-1}$ if the previous layer has a bias and $m^l[0] = 0$ otherwise, and $r^{l-1}[t]$ is the encoded activation value over the previous $t$ time-steps. The output of layer $l$ at time-step $t$, which serves as the input to layer $l+1$, is $x^l[t]$. The proof of Theorem 4.2 is detailed in Appendix C. From Theorem 4.2, a single-input unit requires two variables, one recording $m^l[t]$ and another recording $F^l(m^l[t])$, to avoid redundant calculations at each time-step.

Theorem 4.3. Let $\otimes$ be an operation with two inputs, such as matrix multiplication or element-wise multiplication. In ANN-to-SNN conversion, the mapping from the operation $\otimes$ to the dynamics of the differential graded units under differential coding is given by Equations (17) to (19).
$$m^l_A[t] = r^{l-1}_A[t] = m^l_A[t-1] + \frac{x^{l-1}_A[t]}{t}, \tag{17}$$

$$m^l_B[t] = r^{l-1}_B[t] = m^l_B[t-1] + \frac{x^{l-1}_B[t]}{t}, \tag{18}$$

$$x^l[t] = -\frac{x^{l-1}_A[t] \otimes x^{l-1}_B[t]}{t} + x^{l-1}_A[t] \otimes m^l_B[t] + m^l_A[t] \otimes x^{l-1}_B[t], \tag{19}$$

where $m^l_A[t]$ and $m^l_B[t]$ are the membrane potentials at time-step $t$, and $r^{l-1}_A[t]$ and $r^{l-1}_B[t]$ are the encoded activation values of the previous layers at time-step $t$. The output of layer $l$ at time-step $t$, which serves as the input to layer $l+1$, is $x^l[t]$. The proof of Theorem 4.3 is detailed in Appendix D. From Theorem 4.3, a unit with two inputs requires two variables to record $m^l_A[t]$ and $m^l_B[t]$, respectively. Graded units provide the ability to integrate information about nonlinear layer changes, enabling the conversion of various complex networks, including CNNs and Transformers.

Figure 3. (a) Conversion of a linear layer followed by a nonlinear layer in an ANN into SNN modules (identity spiking neurons, linear layers, and graded units). (b) Conversion of a matrix product or element-wise multiplication in the ANN into SNN modules (full spiking neurons and a graded matrix product).

4.1.2. DIFFERENTIAL SPIKING NEURONS

Since the majority of computations occur in fully connected layers, convolutional layers, and matrix multiplication layers, we recommend placing spiking neuron layers before these layers so that their computation is event-driven, thereby effectively reducing the network's energy consumption. Theorem 4.4 demonstrates how to convert a spiking neuron under rate coding into a differential neuron under differential coding.

Theorem 4.4. In rate coding, the output of the previous layer, $x^{l-1}[t]$, is directly used as the input current of the current layer: $I^l[t] = x^{l-1}[t]$.
In differential coding, the input current $I^l[t]$ can be adjusted as shown in Equations (20) and (21), which converts any spiking neuron into a differential spiking neuron:

$$I^l[t] = m^l_r[t] + x^{l-1}[t], \tag{20}$$

$$m^l_r[t+1] = m^l_r[t] + \frac{x^{l-1}[t]}{t}, \tag{21}$$

where $m^l_r[0]$ is $b^{l-1}$ if the previous layer has a bias and 0 otherwise. The proof of Theorem 4.4 is detailed in Appendix E.

In contrast to rate coding, which is constrained by a decay that limits the output range to below the threshold $\theta$, differential coding allows adaptive adjustment of the neuron's output range by directly modifying the encoded activation $r^l[t]$. This flexibility is especially beneficial in scenarios with multiple or dynamically adjustable thresholds, as the combination of different thresholds enhances representation accuracy. We therefore employ a differential version of the identity multi-threshold spiking neuron in our experiments.

4.1.3. DIFFERENTIAL CODING FOR LINEAR LAYERS

Theorem 4.5 shows the conversion of linear layers under differential coding in ANN-to-SNN conversion.

Theorem 4.5. Consider linear layers, including fully connected and convolutional layers, that can be represented by Equation (22),

$$x^l = W^l x^{l-1} + b^l, \tag{22}$$

where $W^l$ and $b^l$ are the weight and bias of layer $l$. Under differential coding in SNNs, this is equivalent to eliminating the bias term $b^l$ and initializing the membrane potential of the subsequent layer with the bias value. The proof of Theorem 4.5 is detailed in Appendix F.

Figure 3 shows the overall method of replacing ANN modules with SNN modules under differential coding.

4.2. Optimal Threshold for ReLU

When replacing the ReLU function in CNNs with spiking neurons, we propose an algorithm called the threshold iteration method for determining the optimal threshold.

Assumption 4.6. Following de G. Matthews et al. (2018), assume that the input $x$ to the neuron follows a normal distribution $X$ with mean $\mu$ and variance $\sigma^2$.
Based on Assumption 4.6, we introduce Definition 4.7 to define the overall error function, obtained by integrating the function error over the distribution of activation values.

Definition 4.7. In a $T$ time-step conversion, the quantization and clipping errors of the ReLU function can be expressed as

$$\mathrm{QE}(\theta) = \int_{-\infty}^{+\infty} \left(f(x,\theta) - \max(x,0)\right)^2 \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx, \tag{23}$$

$$f(x,\theta) = \frac{\theta}{N}\,\mathrm{clamp}\!\left(\left\lfloor \frac{Nx + \theta/2}{\theta} \right\rfloor,\ 0,\ N\right), \tag{24}$$

where $f(x,\theta)$ represents the expected encoded activation in the SNN for a threshold $\theta$, as proposed by Bu et al. (2022a). For an IF neuron, $N = T$. For a multi-threshold neuron with $n$ thresholds, roughly let $N = 2^n T$.

Finding the optimal threshold by directly differentiating this function is challenging. However, we can take an alternative approach by introducing a variable $k$ to help determine the optimal threshold. We consider two cases: $k$ multiplies the output threshold amplitude, as in Equation (25), and $k$ multiplies the threshold during spike calculation, as in Equation (28). These cases yield the following two lemmas.

Lemma 4.8. Let

$$\mathrm{QE}_1(\theta,k) = \int_{-\infty}^{+\infty} \left(f_1(x,\theta,k) - \max(x,0)\right)^2 \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx, \tag{25}$$

$$f_1(x,\theta,k) = k\,\frac{\theta}{N}\,\mathrm{clamp}\!\left(\left\lfloor \frac{Nx + \theta/2}{\theta} \right\rfloor,\ 0,\ N\right). \tag{26}$$

When $\theta$ is fixed, $\mathrm{QE}_1(\theta,k)$ reaches its minimum value at the least-squares optimum

$$k_1 = \frac{\int_{-\infty}^{+\infty} f(x,\theta)\,\max(x,0)\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx}{\int_{-\infty}^{+\infty} f(x,\theta)^2\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx}, \tag{27}$$

which evaluates in closed form to sums of Gaussian and error-function (erf) terms over the $N$ quantization levels.

Lemma 4.9. Let

$$\mathrm{QE}_2(\theta,k) = \int_{-\infty}^{+\infty} \left(f_2(x,\theta,k) - \max(x,0)\right)^2 \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx, \tag{28}$$

$$f_2(x,\theta,k) = \frac{\theta}{N}\,\mathrm{clamp}\!\left(\left\lfloor \frac{Nx + k\theta/2}{k\theta} \right\rfloor,\ 0,\ N\right). \tag{29}$$

When $\theta$ is fixed, $\mathrm{QE}_2(\theta,k)$ reaches its minimum value when $k = 1$.

Algorithm 1. Threshold iteration method to find the best threshold.
1: Input: pre-trained ANN model $F_{\mathrm{ANN}}(W)$, dataset $D$.
2: Initialize: set $\theta \leftarrow 1$ (any positive initial value).
3: Run the model $F_{\mathrm{ANN}}(W)$ on dataset $D$ to statistically compute the mean $\mu$ and variance $\sigma^2$ of the pre-activations of each ReLU separately.
4: repeat
5: Update $k_1$ based on $\mu$ and $\sigma^2$ according to Equation (27).
6: Update $\theta \leftarrow k_1 \theta$.
7: until $1 - \epsilon < k_1 < 1 + \epsilon$, where $\epsilon$ tends to 0.
8: Output: threshold $\theta$.

According to Lemmas 4.8 and 4.9, we obtain the following inequality and Theorem 4.10:

$$\mathrm{QE}(k_1\theta) < \mathrm{QE}_2\!\left(k_1\theta,\ \tfrac{1}{k_1}\right) = \mathrm{QE}_1(\theta, k_1) < \mathrm{QE}(\theta). \tag{30}$$

Theorem 4.10. Starting from any positive initial value of $\theta$, the rate of change $k_1$ can be computed repeatedly from the pre-computed mean $\mu$, variance $\sigma^2$, and the current threshold $\theta$ using Equation (27). The iteration $\theta \leftarrow k_1\theta$ continues until convergence, at which point the globally optimal threshold $\theta$ is obtained. The process is guaranteed to converge as long as the threshold is greater than 0.

The proofs of Lemmas 4.8 and 4.9 and Theorem 4.10 are detailed in Appendices G, H, and I. Therefore, the optimal $\theta$ can be determined by Theorem 4.10 and Algorithm 1.

4.3. Hardware Implementation of the MT Neuron

Equation (6) in Section 3.1 is presented for ease of understanding; in hardware implementations, the argmin module is not used. We develop a hardware-friendly version of the MT neuron model, which can efficiently map the membrane potential to the appropriate threshold using the potential's sign bit and exponent bits at extremely low cost. Compared with previous ANN-to-SNN methods, the MT neuron must transmit an extra index $i$ for the threshold. When implementing the MT neuron, two implementations can be considered:

1. Send $V_{th}[i]\, S[t]$ to the next layer.
2. Add an extra threshold dimension with $2n$ elements to $S[t]$, setting $S[t][i] = 1$ and $S[t][j] = 0$ for all $j \neq i$. At the same time, an extra threshold dimension is added to the weight of the next layer, whose elements are the multi-level thresholds.

For simplicity, we use implementation 1 on GPUs, which is not purely binary but is equivalent to implementation 2 with binary outputs. The MT neuron is also compatible with asynchronous neuromorphic chips because its outputs are still sparse events. Take the Speck chip (Yao et al., 2024) as an example.
The LIF neuron in a convolutional layer of the Speck chip outputs an event $(c, x, y)$ to the next layer. When using the MT neuron, the only modification is adding a threshold index, i.e., $(c, x, y, i)$. The computation of the next layer is changed to use a bit-shift operation on the weights; as each threshold is a power of 2, this avoids multiplication. After these modifications, the computation remains asynchronous and event-driven. The implementation that avoids the argmin in Equation (6) in hardware can be described in the following two steps.

Step 1: Obtain the SNN weights using the weight normalization strategy (Rueckauer et al., 2017):

$$W^l_{\mathrm{SNN}} = W^l_{\mathrm{ANN}}\, \frac{\theta^l}{\theta^{l+1}}, \tag{31}$$

$$b^l_{\mathrm{SNN}} = \frac{b^l_{\mathrm{ANN}}}{\theta^{l+1}}. \tag{32}$$

We then set all base thresholds $\theta^l = 1$, resulting in the following thresholds for the MT neuron:

$$\lambda^l_i = \begin{cases} \dfrac{1}{2^{\,i-1}}, & 1 \le i \le n, \\[6pt] -\dfrac{1}{2^{\,i-n-1}}, & n < i \le 2n. \end{cases} \tag{33}$$

Step 2: Write $\frac{4}{3} m^l[t] = (-1)^S\, 2^E (1 + M)$ in IEEE-754 single precision, with 1 sign bit $S$, 8 exponent bits $E$, and 23 mantissa bits $M$. Since the midpoint of $\frac{1}{2^{k-1}}$ and $\frac{1}{2^k}$ is $\frac{3}{4}\cdot\frac{1}{2^{k-1}}$, we can select the correct threshold index $i$ using only $E$ and $S$ of $\frac{4}{3} m^l[t]$, without performing the $2n$ subtractions required by the argmin in Equation (6):

$$\mathrm{MTH}_{\theta,n}(m^l[t], i) = \begin{cases} 1, & i \le n,\ S = 0 \text{ and } i = 1 - E, \\ 1, & i > n,\ S = 1 \text{ and } i - n = 1 - E, \\ 0, & \text{otherwise}. \end{cases} \tag{34}$$

For differential neurons, the memory overhead relative to the base neurons, such as IF or MT neurons, is only one additional membrane potential, used to adjust the input current as described in Theorem 4.4. To enable fast execution on GPUs, we also design an efficient algorithm, detailed in Appendix P.

5. Experimental Results

In this section, we first evaluate the performance of the proposed method on the ImageNet dataset across different models, comparing our results with state-of-the-art ANN-to-SNN conversion methods. We then compute and analyze the energy consumption of the converted SNNs.
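Step 2's trick can be checked in software. In the sketch below (our own naming), `math.frexp` stands in for reading the raw IEEE-754 sign and exponent fields, and clamping inputs above the largest threshold to index 1 is our addition; `threshold_index_argmin` is the explicit Equation (6) reference. The two agree except in the narrow band just below the smallest threshold magnitude, where the exponent rule fires the smallest threshold slightly earlier than Equation (6):

```python
import math

def threshold_index_bits(m, n):
    """Select the MT threshold index from the sign and exponent of (4/3)*m,
    mirroring Eq. (34) with base threshold 1. Returns 0 when silent."""
    if m == 0.0:
        return 0
    _, exp = math.frexp(4.0 / 3.0 * abs(m))  # |4m/3| = mant * 2**exp, mant in [0.5, 1)
    E = exp - 1                              # unbiased IEEE-754 exponent of |4m/3|
    i = 1 - E                                # Eq. (34): index from the exponent alone
    if i > n:                                # |m| below the firing range: silent
        return 0
    i = max(i, 1)                            # clamp inputs above the top threshold (our addition)
    return i if m > 0 else i + n             # negative thresholds use indices n+1 .. 2n

def threshold_index_argmin(m, n):
    """Reference: nearest-threshold selection by explicit argmin, as in Eq. (6)."""
    lam = [2.0 ** -i for i in range(n)]      # 1, 1/2, ..., 1/2^(n-1)
    thresholds = lam + [-v for v in lam]
    if -lam[-1] < m < lam[-1]:               # strictly inside the silent band
        return 0
    diffs = [abs(m - th) for th in thresholds]
    return diffs.index(min(diffs)) + 1       # 1-based threshold index
```

Because the midpoint between consecutive power-of-2 thresholds lands on a power-of-2 boundary after scaling by 4/3, a single exponent comparison replaces the $2n$ subtractions of the argmin.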
Finally, we conduct comparative experiments to validate the effectiveness of differential coding and the threshold iteration method.

Table 1. Accuracy and energy ratio of DCGS (ours) for different converted models on the ImageNet dataset.

| Model (config) | Metric | T=2 | T=4 | T=8 | T=12 | T=16 |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet34-4/1 (Param: 21.8M, ANN Acc: 76.42%) | Acc (%) | 59.71 | 73.35 | 76.04 | 76.26 | 76.35 |
| | Energy ratio | 0.14 | 0.24 | 0.37 | 0.46 | 0.53 |
| VGG16-4/1 (Param: 138M, ANN Acc: 73.25%) | Acc (%) | 70.69 | 72.72 | 73.17 | 73.23 | 73.26 |
| | Energy ratio | 0.10 | 0.15 | 0.22 | 0.26 | 0.29 |
| ViT-Small-8/4 (Param: 22.1M, ANN Acc: 81.38%) | Acc (%) | 77.84 | 81.11 | 81.43 | 81.39 | 81.38 |
| | Energy ratio | 0.32 | 0.62 | 1.05 | 1.39 | 1.71 |

5.1. Comparison with State-of-the-art ANN-to-SNN Conversion Methods

We conducted conversion experiments on 11 different CNNs and Transformers using the ImageNet dataset. We denote a converted model as model-n/c, where the multi-threshold neurons have n positive and n corresponding negative thresholds, and the statistically calculated channel-wise thresholds are scaled by a factor c. E.g., ResNet34-4/2 denotes the conversion of the ResNet34 model using multi-threshold spiking neurons with 4 positive and 4 negative thresholds, with the actual thresholds obtained by multiplying the statistical thresholds by a factor of 2. When n = 1, the neuron can be treated as an IF neuron with an additional negative threshold. Table 2 shows a comparison of our method with other ANN-to-SNN conversion methods; detailed results can be found in Appendix K.

Table 2. Comparison between the proposed method and previous ANN-to-SNN conversion works on the ImageNet dataset.
| Method | Type | Arch. | Param. (M) | ANN Acc (%) | T | SNN Acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| TS (Deng & Gu, 2021) | CNN-to-SNN | VGG-16 | 138 | 72.40 | 64 | 70.97 |
| SNM (Wang et al., 2022a) | CNN-to-SNN | VGG-16 | 138 | 73.18 | 64 | 71.50 |
| MMSE (Li et al., 2021) | CNN-to-SNN | ResNet-34 | 21.8 | 75.66 | 64 | 71.12 |
| | | VGG-16 | 138 | 75.36 | 64 | 70.69 |
| QCFS (Bu et al., 2022b) | CNN-to-SNN | ResNet-34 | 21.8 | 74.32 | 64 | 72.35 |
| | | VGG-16 | 138 | 74.29 | 64 | 72.85 |
| SRP (Hao et al., 2023a) | CNN-to-SNN | ResNet-34 | 21.8 | 74.32 | 4, 64 | 66.71, 68.61 |
| | | VGG-16 | 138 | 74.29 | 4, 64 | 66.47, 69.43 |
| MST (Wang et al., 2023) | Transformer-to-SNN | Swin-T (BN) | 28.5 | 80.51 | 128, 512 | 77.88, 78.51 |
| STA (Jiang et al., 2024) | Transformer-to-SNN | ViT-B/32 | 86 | 83.60 | 32, 256 | 78.72, 82.79 |
| SpikeZIP-TF (You et al., 2024) | Transformer-to-SNN | SViT-S-32Level | 22.05 | 81.59 | 64 | 81.45 |
| | | SViT-B-32Level | 86.57 | 82.83 | 64 | 82.71 |
| | | SViT-L-32Level | 304.33 | 83.86 | 64 | 83.82 |
| ECMT (Huang et al., 2024) | Transformer-to-SNN | ViT-S/16 | 22 | 78.04 | 8, 10 | 76.03, 77.07 |
| | | EVA-G | 1074 | 89.62 | 4, 8 | 88.60, 89.40 |
| DCGS (ours) | CNN-to-SNN | ResNet18-1/1 | 11.7 | 71.49 | 32, 64 | 69.89, 71.08 |
| | | ResNet34-1/1 | 21.8 | 76.42 | 32, 64 | 58.86, 74.11 |
| | | VGG-1/1 | 138 | 73.25 | 32, 64 | 72.04, 73.13 |
| | | ResNet18-4/1 | 11.7 | 71.49 | 4, 8 | 70.07, 71.31 |
| | | ResNet34-4/1 | 21.8 | 76.42 | 4, 8 | 73.35, 76.04 |
| | | VGG-4/1 | 138 | 73.25 | 4, 8 | 72.72, 73.17 |
| DCGS (ours) | Transformer-to-SNN | ViT-S-8/4 | 22.1 | 81.38 | 2, 4 | 77.84, 81.11 |
| | | ViT-B-8/4 | 86.6 | 84.54 | 2, 4 | 80.34, 83.98 |
| | | ViT-L-8/4 | 304.3 | 85.84 | 2, 4 | 83.73, 85.45 |
| | | EVA02-T-8/4 | 5.8 | 80.63 | 2, 4 | 66.32, 79.56 |
| | | EVA02-S-8/4 | 22.1 | 85.73 | 2, 4 | 71.37, 84.70 |
| | | EVA02-B-8/4 | 87.1 | 88.69 | 2, 4 | 84.62, 88.16 |
| | | EVA02-L-8/4 | 305.1 | 90.05 | 2, 4 | 88.25, 89.72 |

For CNNs, when n = 1, our method outperforms existing methods on the same architectures, achieving state-of-the-art results; when n > 1, we achieve better performance with far shorter time-steps. For Transformers, the threshold iteration method is not applicable, and using the top 99.9% of activation values does not yield optimal thresholds. As a result, achieving high performance with n = 1 in short time-steps is challenging.
Therefore, we scale the statistical thresholds by c = 4 and set n = 8. Our method requires no training and achieves high performance within extremely short time-steps.

5.2. Energy Estimation and Result Analysis

Based on Horowitz (2014), we use Equation (35) to estimate the energy consumption of the converted SNN relative to the ANN, with $E_{\mathrm{MAC}} = 4.6\,\mathrm{pJ}$ and $E_{\mathrm{AC}} = 0.9\,\mathrm{pJ}$:

$$\frac{E_{\mathrm{SNN}}}{E_{\mathrm{ANN}}} = \frac{\mathrm{MACs_{SNN}}\cdot E_{\mathrm{MAC}} + \mathrm{ACs_{SNN}}\cdot E_{\mathrm{AC}}}{\mathrm{MACs_{ANN}}\cdot E_{\mathrm{MAC}}}. \tag{35}$$

Since most computations in the network occur in the fully connected, convolutional, and matrix multiplication layers, which in SNNs are implemented primarily by additions (with $\mathrm{ACs_{SNN}} \gg \mathrm{MACs_{SNN}}$), we approximate $\mathrm{MACs_{SNN}} \approx 0$. We then use the statistical spike emission rate $\eta$ to estimate $\mathrm{ACs_{SNN}}/\mathrm{MACs_{ANN}}$, thereby estimating the energy consumption of the SNN relative to the pre-conversion ANN. Table 1 presents partial results; detailed results for all converted SNN models can be found in Appendix K. For CNNs, our method achieves SNN performance comparable to the ANN with low power consumption and extremely short time-steps. Notably, for the VGG16 model, it achieves an accuracy of 73.17% with only a 0.08% accuracy loss at 22% of the ANN's power consumption.

Figure 4. Effect of differential coding compared to rate coding: accuracy and energy ratio versus time-step T for ResNet34-4/2 and ViT-Small-8/4. DC and RC denote differential coding and rate coding, respectively.

For Transformers, although our method achieves high accuracy with extremely short time-steps and shows a decreasing growth rate of energy consumption, there is still significant room for optimization. This is primarily due to the lack of an optimal threshold calculation method, which causes inefficient spike firing in the SNNs.
This leads to larger errors when matching the ANN activation values, resulting in more premature spike emissions. This is an area we aim to improve in future research.

5.3. Effectiveness of the Differential Coding

To validate the effectiveness of the differential coding, we compared the performance of differential coding and rate coding using the same models. Partial visualization results are presented in Figure 4, with a more detailed table provided in Appendix L. The model using differential coding not only outperforms the rate-coding model in accuracy but also consumes less energy. This is because differential coding directly updates the current encoding value based on previous results, avoiding decay: it can represent a broader range of values and steadily improve representation accuracy, and once the representation precision reaches a certain level, no further spikes are emitted.

5.4. Effectiveness of the Threshold Iteration Method

To verify the effectiveness of the threshold iteration method, we compared the performance of SNNs converted with two different methods, the threshold iteration method and the top-99.9% activation method, using different numbers of threshold neurons in ResNet34. Here we set the scale factor c = 2 to prevent the accuracy from being too low when using the 99.9% large-activation method. Partial visualization results are presented in Figure 5, and more information can be found in Appendix M.

Figure 5. Effectiveness of the threshold iteration method: accuracy and energy ratio versus time-step T for ResNet34-1/2 and ResNet34-4/2. TI and LA denote the Threshold Iteration method and the 99.9% large-activation method, respectively.
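The threshold-iteration update compared here can be sketched numerically as follows. This is a minimal scalar sketch under a Gaussian prior N(mu, sigma^2) on the pre-ReLU activations; `k_update` implements the closed-form rescaling factor of Equation (27) as reconstructed in Appendix G, and all function names are illustrative rather than the released implementation:

```python
import math

def k_update(theta, mu, sigma, n):
    """Closed-form optimal rescaling factor k for a fixed threshold theta,
    assuming Gaussian pre-activations N(mu, sigma^2) with mu > 0 and an
    n-level clipped quantizer with decision boundaries (2i-1)*theta/(2n)."""
    num = mu
    den = theta
    for i in range(1, n + 1):
        a = ((2 * i - 1) * theta / (2 * n) - mu) / (math.sqrt(2.0) * sigma)
        num -= mu * math.erf(a) / n
        num += math.sqrt(2.0 / math.pi) * sigma * math.exp(-a * a) / n
        den -= theta * (2 * i - 1) / (n * n) * math.erf(a)
    return num / den

def threshold_iteration(mu, sigma, n, theta0=1.0, tol=1e-12, max_iter=500):
    """Iterate theta <- k * theta until the fixed point k = 1 is reached."""
    theta = theta0
    for _ in range(max_iter):
        k = k_update(theta, mu, sigma, n)
        theta *= k
        if abs(k - 1.0) < tol:  # fixed point: theta is optimal
            break
    return theta
```

Because the fixed-point equation has a unique root (Theorem 4.10), the iteration reaches the same threshold from any positive starting value, which is the property the comparison against the 99.9% large-activation baseline relies on.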
The experimental results show that thresholds derived with the threshold iteration method outperform those obtained with the 99.9% large-activation method, achieving better accuracy and lower energy consumption at each time-step.

6. Conclusion

This article introduces a training-free ANN-to-SNN conversion method based on differential coding. Instead of directly encoding rate information, it uses spikes to encode differential information, improving both network accuracy and energy efficiency. For ReLU conversions, it includes a threshold iteration method to find the optimal thresholds, which further enhances network performance. However, the proposed method also has some limitations. Differential coding requires spiking neurons to have at least one negative threshold to generate negative spikes for error correction; otherwise, excessive spike errors accumulate continuously. Meanwhile, we have not developed a method to determine the optimal thresholds for Transformers, which limits the conversion performance on Transformers. Future research could focus on addressing this challenge.

Acknowledgments

This work was supported by STI 2030-Major Projects 2021ZD0200300, the National Natural Science Foundation of China (62422601, U24B20140, and 62088102), Beijing Municipal Science and Technology Program (Z241100004224004), Beijing Nova Program (20230484362, 20240484703), and the National Key Laboratory for Multimedia Information Processing.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Adrian, E. D. The impulses produced by sensory nerve endings: Part I. The Journal of Physiology, 61(1):49, 1926.

Bohnstingl, T., Woźniak, S., Pantazi, A., and Eleftheriou, E. Online spatio-temporal learning in deep neural networks.
IEEE Transactions on Neural Networks and Learning Systems, 34(11):8894–8908, 2023. doi: 10.1109/TNNLS.2022.3153985.

Bu, T., Ding, J., Yu, Z., and Huang, T. Optimized potential initialization for low-latency spiking neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1):11–20, 2022a.

Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., and Huang, T. Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. In International Conference on Learning Representations, 2022b.

Bu, T., Li, M., and Yu, Z. Training-free conversion of pretrained ANNs to SNNs for low-power and high-performance applications. arXiv preprint arXiv:2409.03368, 2024.

Cao, Y., Chen, Y., and Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision, 113(1):54–66, 2015.

Chen, K., Chen, S., Zhang, J., Zhang, B., Zheng, Y., Huang, T., and Yu, Z. SpikeReveal: Unlocking temporal sequences from real blurry inputs with spike streams. In Advances in Neural Information Processing Systems, volume 37, pp. 62673–62696. Curran Associates, Inc., 2024.

Cordone, L., Miramond, B., and Thierion, P. Object detection with spiking neural networks on automotive event data. In 2022 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2022. doi: 10.1109/IJCNN55064.2022.9892618.

Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P., Imam, N., Jain, S., Liao, Y., Lin, C.-K., Lines, A., Liu, R., Mathaikutty, D., McCoy, S., Paul, A., Tse, J., Venkataramanan, G., Weng, Y.-H., Wild, A., Yang, Y., and Wang, H. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99, 2018. doi: 10.1109/MM.2018.112130359.

de G. Matthews, A. G., Hron, J., Rowland, M., Turner, R. E., and Ghahramani, Z.
Gaussian process behaviour in wide deep neural networks. In International Conference on Learning Representations, 2018.

DeBole, M. V., Taba, B., Amir, A., Akopyan, F., Andreopoulos, A., Risk, W. P., Kusnitz, J., Ortega Otero, C., Nayak, T. K., Appuswamy, R., Carlson, P. J., Cassidy, A. S., Datta, P., Esser, S. K., Garreau, G. J., Holland, K. L., Lekuch, S., Mastro, M., McKinstry, J., di Nolfo, C., Paulovicks, B., Sawada, J., Schleupen, K., Shaw, B. G., Klamo, J. L., Flickner, M. D., Arthur, J. V., and Modha, D. S. TrueNorth: Accelerating from zero to 64 million neurons in 10 years. Computer, 52(5):20–29, 2019.

Deng, S. and Gu, S. Optimal conversion of conventional artificial neural networks to spiking neural networks. In International Conference on Learning Representations, 2021.

Duan, C., Ding, J., Chen, S., Yu, Z., and Huang, T. Temporal effective batch normalization in spiking neural networks. In Advances in Neural Information Processing Systems, volume 35, pp. 34377–34390. Curran Associates, Inc., 2022.

Fang, W., Yu, Z., Chen, Y., Huang, T., Masquelier, T., and Tian, Y. Deep residual learning in spiking neural networks. In Advances in Neural Information Processing Systems, volume 34, pp. 21056–21069, 2021.

Gerstner, W., Kistler, W. M., Naud, R., and Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, 2014.

Gygax, J. and Zenke, F. Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks, 2024.

Han, B., Srinivasan, G., and Roy, K. RMP-SNN: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13558–13567, 2020.

Hao, Z., Bu, T., Ding, J., Huang, T., and Yu, Z.
Reducing ANN-SNN conversion error through residual membrane potential. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):11–21, 2023a. doi: 10.1609/aaai.v37i1.25071.

Hao, Z., Ding, J., Bu, T., Huang, T., and Yu, Z. Bridging the gap between ANNs and SNNs by calibrating offset spikes. In The Eleventh International Conference on Learning Representations, 2023b.

Hendrycks, D. and Gimpel, K. Gaussian error linear units (GELUs), 2023.

Horowitz, M. 1.1 Computing's energy problem (and what we can do about it). In Proceedings of the IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 10–14, 2014.

Hu, Y., Zheng, Q., Jiang, X., and Pan, G. Fast-SNN: Fast spiking neural network by converting quantized ANN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):14546–14562, 2023. doi: 10.1109/TPAMI.2023.3275769.

Huang, Z., Shi, X., Hao, Z., Bu, T., Ding, J., Yu, Z., and Huang, T. Towards high-performance spiking transformers from ANN to SNN conversion. In Proceedings of the 32nd ACM International Conference on Multimedia (MM 2024), pp. 10688–10697. ACM, 2024. doi: 10.1145/3664647.3680620.

Jiang, Y., Hu, K., Zhang, T., Gao, H., Liu, Y., Fang, Y., and Chen, F. Spatio-temporal approximation: A training-free SNN conversion for transformers. In The Twelfth International Conference on Learning Representations, 2024.

You, K., Xu, Z., Nie, C., Deng, Z., Guo, Q., Wang, X., and He, Z. SpikeZIP-TF: Conversion is all you need for transformer-based SNN. In Forty-first International Conference on Machine Learning, 2024.

Kim, J., Kim, H., Huh, S., Lee, J., and Choi, K. Deep neural networks with weighted spikes. Neurocomputing, 311:373–386, 2018.

Lei Ba, J., Kiros, J. R., and Hinton, G. E. Layer normalization.
arXiv preprint arXiv:1607.06450, 2016. doi: 10.48550/arXiv.1607.06450.

Li, C., Ma, L., and Furber, S. Quantization framework for fast spiking neural networks. Frontiers in Neuroscience, 16:918793, 2022.

Li, G., Deng, L., Tang, H., Pan, G., Tian, Y., Roy, K., and Maass, W. Brain-inspired computing: A systematic survey and future trends. Proceedings of the IEEE, 112(6):544–584, 2024. doi: 10.1109/JPROC.2024.3429360.

Li, Y. and Zeng, Y. Efficient and accurate conversion of spiking neural network with burst spikes. 2022.

Li, Y., Deng, S., Dong, X., Gong, R., and Gu, S. A free lunch from ANN: Towards efficient, accurate spiking neural networks calibration. In International Conference on Machine Learning, pp. 6316–6325. PMLR, 2021.

Li, Y., Kim, Y., Park, H., and Panda, P. Uncovering the representation of spiking neural networks trained with surrogate gradient. Transactions on Machine Learning Research, 2023. ISSN 2835-8856.

Liu, Z., Guan, B., Shang, Y., Yu, Q., and Kneip, L. Line-based 6-DoF object pose estimation and tracking with an event camera. IEEE Transactions on Image Processing, 33:4765–4780, 2024. doi: 10.1109/TIP.2024.3445736.

Liu, Z., Guan, B., Shang, Y., Bian, Y., Sun, P., and Yu, Q. Stereo event-based, 6-DoF pose tracking for uncooperative spacecraft. IEEE Transactions on Geoscience and Remote Sensing, 63:1–13, 2025. doi: 10.1109/TGRS.2025.3530915.

Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.

Meng, Q., Xiao, M., Yan, S., Wang, Y., Lin, Z., and Luo, Z.-Q. Towards memory- and time-efficient backpropagation for training spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6166–6176, 2023.

Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., Jackson, B. L., Imam, N., Guo, C., Nakamura, Y., et al.
A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.

Neftci, E. O., Mostafa, H., and Zenke, F. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6):51–63, 2019.

Oh, H. and Lee, Y. Sign gradient descent-based neuronal dynamics: ANN-to-SNN conversion beyond ReLU network. In Forty-first International Conference on Machine Learning, 2024.

Park, S., Kim, S., Choe, H., and Yoon, S. Fast and efficient information transmission with burst spikes in deep spiking neural networks. In Proceedings of the Annual Design Automation Conference (DAC), pp. 1–6, 2019.

Pei, J., Deng, L., Song, S., Zhao, M., Zhang, Y., Wu, S., Wang, G., Zou, Z., Wu, Z., He, W., et al. Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature, 572(7767):106–111, 2019.

Rueckauer, B. and Liu, S.-C. Conversion of analog to spiking neural networks using sparse temporal coding. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. IEEE, 2018.

Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M., and Liu, S.-C. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 11, 2017. ISSN 1662-453X. doi: 10.3389/fnins.2017.00682.

Shi, X., Hao, Z., and Yu, Z. SpikingResformer: Bridging ResNet and Vision Transformer in spiking neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5610–5619, 2024.

Stanojevic, A., Woźniak, S., Bellec, G., Cherubini, G., Pantazi, A., and Gerstner, W. An exact mapping from ReLU networks to spiking neural networks. Neural Networks, 168:74–88, 2023.

Wang, Y., Zhang, M., Chen, Y., and Qu, H.
Signed neuron with memory: Towards simple, accurate and high-efficient ANN-SNN conversion. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 2501–2508, 2022a.

Wang, Z., Gu, X., Goh, R. S. M., Zhou, J. T., and Luo, T. Efficient spiking neural networks with radix encoding. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3689–3701, 2022b.

Wang, Z., Fang, Y., Cao, J., Zhang, Q., Wang, Z., and Xu, R. Masked spiking transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1761–1771, 2023.

Wang, Z., Fang, Y., Cao, J., Ren, H., and Xu, R. Adaptive calibration: A unified conversion framework of spiking neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2):1583–1591, 2025. doi: 10.1609/aaai.v39i2.32150.

Xiao, M., Meng, Q., Zhang, Z., He, D., and Lin, Z. Online training through time for spiking neural networks. In Advances in Neural Information Processing Systems, volume 35, pp. 20717–20730. Curran Associates, Inc., 2022.

Yao, M., Richter, O., Zhao, G., Qiao, N., Xing, Y., Wang, D., Hu, T., Fang, W., Demirci, T., De Marchi, M., Deng, L., Yan, T., Nielsen, C., Sheik, S., Wu, C., Tian, Y., Xu, B., and Li, G. Spike-based dynamic computing with asynchronous sensing-computing neuromorphic chip. Nature Communications, 15:4464, 2024. doi: 10.1038/s41467-024-47811-6.

Zhang, L., Zhou, S., Zhi, T., Du, Z., and Chen, Y. TDSNN: From deep neural networks to deep spike neural networks with temporal-coding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 1319–1326, 2019.

Zhu, Y., Ding, J., Huang, T., Xie, X., and Yu, Z. Online stabilization of spiking neural networks. In The Twelfth International Conference on Learning Representations, 2024.

A.
Overall Algorithm

Algorithm 2 outlines the whole procedure we adopt.

Algorithm 2: Differential Coding with Graded Units and Spiking Neurons (DCGS) Conversion Method
1: Input: Pre-trained ANN model F_ANN(W), dataset D; time-step T, or threshold percentage p and scaling factor c.
2: Output: Converted SNN model F_SNN(W, θ, v).
3: Step 1: Determine the threshold:
4: if F_ANN(W) is a ReLU network then
5:   Use the threshold iteration method with T to calculate the threshold θ on dataset D.
6: else
7:   Set the threshold θ statically as the top p% of activation values on dataset D, multiplied by the scaling factor c.
8: end if
9: Step 2: Replace modules:
10: Replace each nonlinear layer with a differential graded unit.
11: Insert a differential identity spiking neuron before each linear layer.
12: Remove the bias b from each linear layer and set the initial membrane potential v = b of the next layer.
13: Return the converted SNN model F_SNN(W, θ, v).

B. Explanation of Definition 4.1

Definition B.1. (Repeated from Definition 4.1) In differential coding, the encoded activation value r^l[t] is defined as shown in Equation (14), where e^l[t] represents the encoded output value of the neuron at time-step t and x^l[t] represents the actual output value of the neuron. The relationship between the two is expressed by Equation (13), as follows:

e^l[t] = r^l[t-1] + x^l[t], (36)
r^l[t] = r^l[t-1] + x^l[t] / t = (1/t) Σ_{i=1}^{t} e^l[i], (37)

where t starts from 1 and r^l[0] = 0.

Proof. In this definition, e^l[t] is essentially adjusted based on the historical encoded values. If no spike is emitted (x^l[t] = 0), then e^l[t] = r^l[t-1], which also ensures that the encoded value satisfies r^l[t] = r^l[t-1]. The second equality of Equation (37) follows by induction:

r^l[t] = r^l[t-1] + x^l[t] / t = ((t-1) r^l[t-1] + e^l[t]) / t = (Σ_{i=1}^{t-1} e^l[i] + e^l[t]) / t = (1/t) Σ_{i=1}^{t} e^l[i].

C. Proof of Theorem 4.2

Theorem C.1. (Repeated from Theorem 4.2) Let F^l be a nonlinear layer l with only one input x^{l-1}[t], such as GELU, SiLU, MaxPool, LayerNorm, or Softmax.
In ANN-to-SNN conversion, the mapping from F^l to the dynamics of the differential graded unit in differential coding is given by Equations (15) and (16):

m^l[t] = r^{l-1}[t] = m^l[t-1] + x^{l-1}[t] / t,
x^l[t] = t (F^l(m^l[t]) - F^l(m^l[t-1])), (40)

where m^l[t] is the membrane potential at time-step t, which is equal to the encoded input value, and r^l[t] is the encoded output activation value over the previous t time-steps. The output of layer l at time-step t, which serves as the input to layer l+1, is given by x^l[t].

Proof.

m^l[t] = r^{l-1}[t] = r^{l-1}[t-1] + x^{l-1}[t] / t = m^l[t-1] + x^{l-1}[t] / t,
x^l[t] = t (r^l[t] - r^l[t-1]) = t (F^l(r^{l-1}[t]) - F^l(r^{l-1}[t-1])) = t (F^l(m^l[t]) - F^l(m^l[t-1])). (42)

From Theorem 4.2, a single-input unit requires two variables: one to record m^l[t] and another to record F^l(m^l[t]), in order to reduce redundant calculations at each time-step.

D. Proof of Theorem 4.3

Theorem D.1. (Repeated from Theorem 4.3) Let ⊗ be an operation with two inputs, such as matrix multiplication or element-wise multiplication. In ANN-to-SNN conversion, the mapping from the operation ⊗ to the dynamics of the differential graded unit in differential coding is given by Equations (43) to (45):

m_A^l[t] = r_A^{l-1}[t] = m_A^l[t-1] + x_A^{l-1}[t] / t, (43)
m_B^l[t] = r_B^{l-1}[t] = m_B^l[t-1] + x_B^{l-1}[t] / t, (44)
x^l[t] = x_A^{l-1}[t] ⊗ x_B^{l-1}[t] / t + x_A^{l-1}[t] ⊗ m_B^l[t-1] + m_A^l[t-1] ⊗ x_B^{l-1}[t], (45)

where m_A^l[t] and m_B^l[t] are the membrane potentials at time-step t, and r_A^{l-1}[t] and r_B^{l-1}[t] are the encoded activation values of the previous layers at time-step t. The output of layer l at time-step t, which serves as the input to layer l+1, is given by x^l[t].
Proof.

m_A^l[t] = r_A^{l-1}[t] = r_A^{l-1}[t-1] + x_A^{l-1}[t] / t = m_A^l[t-1] + x_A^{l-1}[t] / t,
m_B^l[t] = r_B^{l-1}[t] = r_B^{l-1}[t-1] + x_B^{l-1}[t] / t = m_B^l[t-1] + x_B^{l-1}[t] / t,
x^l[t] = t (r^l[t] - r^l[t-1])
       = t (m_A^l[t] ⊗ m_B^l[t] - m_A^l[t-1] ⊗ m_B^l[t-1])
       = t ((m_A^l[t-1] + x_A^{l-1}[t]/t) ⊗ (m_B^l[t-1] + x_B^{l-1}[t]/t) - m_A^l[t-1] ⊗ m_B^l[t-1])
       = x_A^{l-1}[t] ⊗ x_B^{l-1}[t] / t + x_A^{l-1}[t] ⊗ m_B^l[t-1] + m_A^l[t-1] ⊗ x_B^{l-1}[t].

From Theorem 4.3, a unit with two inputs requires two variables to record m_A^l[t] and m_B^l[t], respectively.

E. Proof of Theorem 4.4

Theorem E.1. (Repeated from Theorem 4.4) In rate coding, the output of the previous layer, x^{l-1}[t], is directly used as the input current of the current layer: I^l[t] = x^{l-1}[t]. In differential coding, the input current I^l[t] can be adjusted as shown in Equation (49), which converts any spiking neuron into a differential spiking neuron:

I^l[t] = m_r^l[t] + x^{l-1}[t], (49)
m_r^l[t+1] = m_r^l[t] + (x^{l-1}[t] - x^l[t]) / t, (50)

where m_r^l[0] is b^{l-1} if the previous layer has a bias, and 0 otherwise.

Proof. Let the expected input encoding value of the differential neuron at the l-th layer at time-step t be r^{l-1}[t], and the expected output encoding value be r^l[t]. Due to soft resetting, the total expected membrane-potential correction is m_r^l[t] = r^{l-1}[t-1] - r^l[t-1]. The total input current is then:

I^l[t] = r^{l-1}[t-1] - r^l[t-1] + x^{l-1}[t] = m_r^l[t] + x^{l-1}[t]. (51)

Since, by Definition 4.1,

r^{l-1}[t] = r^{l-1}[t-1] + x^{l-1}[t] / t,
r^l[t] = r^l[t-1] + x^l[t] / t,

we have

m_r^l[t+1] = r^{l-1}[t] - r^l[t] (54)
           = r^{l-1}[t-1] + x^{l-1}[t]/t - r^l[t-1] - x^l[t]/t
           = m_r^l[t] + (x^{l-1}[t] - x^l[t]) / t.

F. Proof of Theorem 4.5

Theorem F.1. (Repeated from Theorem 4.5) Consider linear layers, including fully connected and convolutional layers, that can be represented by Equation (57):

x^l = W^l x^{l-1} + b^l, (57)

where W^l and b^l are the weight and bias of layer l.
Under differential coding, this is equivalent to eliminating the bias term b^l by initializing the membrane potential of the subsequent layer with the bias value (or adding the bias to the first input current of that layer), and then running the dynamics at each time-step according to:

x^l[t] = W^l x^{l-1}[t]. (58)

Proof. In ANN-to-SNN conversion using rate coding, the output x^l[t] of layer l can be expressed as:

x^l[t] = W^l x^{l-1}[t] + b^l. (59)

Under differential coding as defined in Definition B.1, we have:

e^l[t] = r^l[t-1] + x^l[t], (60)
r^l[t] = r^l[t-1] + x^l[t] / t, (61)
r^l[0] = 0, (62)
e^l[t] = W^l e^{l-1}[t] + b^l, (63)
r^l[t] = W^l r^{l-1}[t] + b^l. (64)

For t > 1, the following transformation holds:

x^l[t] = e^l[t] - r^l[t-1] = t r^l[t] - (t-1) r^l[t-1] - r^l[t-1]
       = t (r^l[t] - r^l[t-1])
       = t (W^l r^{l-1}[t] + b^l - W^l r^{l-1}[t-1] - b^l)
       = W^l x^{l-1}[t].

When t = 1, we can let r^l[0] = b^l, i.e., initialize the membrane potential of the subsequent layer with the bias b^l (or add the bias to the first input current of that layer), and start running from time-step 1:

x^l[1] = e^l[1] - r^l[0] = W^l e^{l-1}[1] + b^l - b^l = W^l (x^{l-1}[1] + r^{l-1}[0]) = W^l x^{l-1}[1],

since r^{l-1}[0] = 0. Therefore, for any t > 0 in differential coding, we have:

x^l[t] = W^l x^{l-1}[t]. (67)

G. Proof of Lemma 4.8

We first prove an auxiliary lemma before proving Lemma 4.8.

Lemma G.1.

∫_a^b (c - x)^2 e^{-(x-µ)^2/(2σ^2)} dx
  = √(π/2) σ (σ^2 + µ^2 - 2cµ + c^2) [erf((b-µ)/(√2 σ)) - erf((a-µ)/(√2 σ))]
    + σ^2 (a + µ - 2c) e^{-(a-µ)^2/(2σ^2)} - σ^2 (b + µ - 2c) e^{-(b-µ)^2/(2σ^2)}.

Proof. We first calculate ∫_a^b e^{-(x-µ)^2/(2σ^2)} dx, ∫_a^b x e^{-(x-µ)^2/(2σ^2)} dx, and ∫_a^b x^2 e^{-(x-µ)^2/(2σ^2)} dx separately.
Since erf(x) = (2/√π) ∫_0^x e^{-t^2} dt, substituting t = (x-µ)/(√2 σ) gives

∫_a^b e^{-(x-µ)^2/(2σ^2)} dx = √(π/2) σ [erf((b-µ)/(√2 σ)) - erf((a-µ)/(√2 σ))].

Writing x = (x - µ) + µ,

∫_a^b x e^{-(x-µ)^2/(2σ^2)} dx = σ^2 (e^{-(a-µ)^2/(2σ^2)} - e^{-(b-µ)^2/(2σ^2)}) + µ √(π/2) σ [erf((b-µ)/(√2 σ)) - erf((a-µ)/(√2 σ))].

Writing x^2 = (x-µ)^2 + 2µ(x-µ) + µ^2 and integrating the first term by parts,

∫_a^b x^2 e^{-(x-µ)^2/(2σ^2)} dx = √(π/2) σ (σ^2 + µ^2) [erf((b-µ)/(√2 σ)) - erf((a-µ)/(√2 σ))] + σ^2 (a + µ) e^{-(a-µ)^2/(2σ^2)} - σ^2 (b + µ) e^{-(b-µ)^2/(2σ^2)}.

Finally, expanding (c - x)^2 = x^2 - 2cx + c^2 and aggregating the three integrals yields the stated result.

Then, we can prove Lemma 4.8 based on Lemma G.1.

Lemma G.2. (Repeated from Lemma 4.8)

QE1(θ, k) = ∫_{-∞}^{+∞} (f1(x, θ, k) - max(x, 0))^2 e^{-(x-µ)^2/(2σ^2)} dx, (73)
f1(x, θ, k) = (kθ/n) clip(round(n x / θ), 0, n),

where round(·) rounds to the nearest integer and clip(y, 0, n) = min(max(y, 0), n). When θ is fixed, QE1(θ, k) reaches its minimum value when

k = [µ (1 - (1/n) Σ_{i=1}^{n} erf(a_i)) + √(2/π) σ (1/n) Σ_{i=1}^{n} e^{-a_i^2}] / [θ (1 - Σ_{i=1}^{n} ((2i-1)/n^2) erf(a_i))],  with a_i = ((2i-1)θ/(2n) - µ) / (√2 σ).

Proof.
According to Lemma G.1, every piece of QE1 can be integrated in closed form. Expanding QE1 over the quantization intervals of f1 (the error vanishes for x < 0),

QE1(θ, k) = ∫_0^{θ/(2n)} x^2 e^{-(x-µ)^2/(2σ^2)} dx
          + Σ_{i=1}^{n-1} ∫_{(2i-1)θ/(2n)}^{(2i+1)θ/(2n)} (k iθ/n - x)^2 e^{-(x-µ)^2/(2σ^2)} dx
          + ∫_{(2n-1)θ/(2n)}^{+∞} (kθ - x)^2 e^{-(x-µ)^2/(2σ^2)} dx.

Only the last two groups of terms depend on k, and together they form a quadratic function of k with a positive leading coefficient, so the minimum is attained at

k = [Σ_{i=1}^{n-1} (iθ/n) X_i + θ X_n] / [Σ_{i=1}^{n-1} (iθ/n)^2 W_i + θ^2 W_n],

where W_i and X_i denote ∫ e^{-(x-µ)^2/(2σ^2)} dx and ∫ x e^{-(x-µ)^2/(2σ^2)} dx over the i-th interval (the n-th interval extending to +∞). Substituting the closed forms from Lemma G.1 and telescoping the resulting sums (using Σ_{i=1}^{n} (2i-1) = n^2) gives exactly the expression for k stated in the lemma.

H. Proof of Lemma 4.9

Lemma H.1. (Repeated from Lemma 4.9)

QE2(θ, k) = ∫_{-∞}^{+∞} (f2(x, θ, k) - max(x, 0))^2 e^{-(x-µ)^2/(2σ^2)} dx, (79)
f2(x, θ, k) = (θ/n) clip(round(n x / (kθ)), 0, n).

When θ is fixed, QE2(θ, k) reaches its minimum value when k = 1.

Proof.
Expanding QE2 over the quantization intervals of f2, whose boundaries x_i = (2i-1)kθ/(2n) are scaled by k while the output levels iθ/n remain fixed,

QE2(θ, k) = ∫_0^{kθ/(2n)} x^2 e^{-(x-µ)^2/(2σ^2)} dx
          + Σ_{i=1}^{n-1} ∫_{(2i-1)kθ/(2n)}^{(2i+1)kθ/(2n)} (iθ/n - x)^2 e^{-(x-µ)^2/(2σ^2)} dx
          + ∫_{(2n-1)kθ/(2n)}^{+∞} (θ - x)^2 e^{-(x-µ)^2/(2σ^2)} dx.

We only need the part of the derivative that contains k. Since the integrands do not depend on k, only the boundaries do, the Leibniz rule gives

dQE2/dk = Σ_{i=1}^{n} ((2i-1)θ/(2n)) [((i-1)θ/n - x_i)^2 - (iθ/n - x_i)^2] e^{-(x_i-µ)^2/(2σ^2)}
        = Σ_{i=1}^{n} ((2i-1)^2 θ^3 / (2n^3)) (k - 1) e^{-(x_i-µ)^2/(2σ^2)}.

This expression is negative for k < 1, zero at k = 1, and positive for k > 1. So, the value of k that minimizes the function is k = 1.

I. Proof of Theorem 4.10

Theorem I.1. (Repeated from Theorem 4.10) Starting from any positive initial value of θ, the rate of change k1 can be continuously calculated from the prior mean µ, the variance σ^2, and the current threshold θ using Equation (27). The iteration θ ← k1 θ continues until convergence, at which point the globally optimal threshold θ* is obtained. The process is guaranteed to converge as long as the threshold is greater than 0.

Proof. Assume the optimal threshold is θ*; then, according to Lemma 4.8, after one update k1 θ* should equal θ*, that is, k1 = 1. To prove that θ* is the optimal value, it is equivalent to proving that the equation k1(θ) = 1, with k1 given by Equation (27), has a unique solution. It is equivalent to proving that f(θ) has a unique root.
Writing a_i = ((2i-1)θ/(2n) - µ)/(√2 σ) and taking f(θ) as the difference between the numerator and the denominator of k1(θ) θ in Equation (27),

f(θ) = µ (1 - (1/n) Σ_{i=1}^{n} erf(a_i)) + √(2/π) σ (1/n) Σ_{i=1}^{n} e^{-a_i^2} - θ (1 - Σ_{i=1}^{n} ((2i-1)/n^2) erf(a_i)),

so that f(θ) = 0 exactly when k1(θ) = 1.

Step 1: Calculate the first derivative f'(θ). Since erf'(x) = (2/√π) e^{-x^2}, every erf term differentiates into a Gaussian term, and the Gaussian contributions of the µ and σ terms cancel. At the endpoints,

f'(0) = -(1 + erf(µ/(√2 σ))) < 0,  f'(+∞) = 0. (90)

Step 2: Calculate the second derivative f''(θ) in the same way; it satisfies f''(0) > 0 and f''(+∞) = 0. (89)

Step 3: Analyze the trend of f''(θ). Since the common Gaussian factor is positive, the sign of f''(θ) is determined by the remaining factor, which decreases from a positive value to negative infinity as θ increases. Thus, f''(θ) first decreases from a positive value to a negative one and then increases back toward zero within the negative range.

Step 4: Analyze the trend of f'(θ). Given f'(0) < 0, f'(+∞) = 0, and the behaviour of f''(θ), f'(θ) first increases from a negative value to a positive one and then decreases back to zero.

Step 5: Analyze the trend of f(θ). Since

f(0) = µ (1 + erf(µ/(√2 σ))) + √(2/π) σ e^{-µ^2/(2σ^2)} > 0, (91)
f(+∞) = µ(1 - 1) + 0 - 0 = 0, (92)

and f'(θ) first increases from negative to positive and then decays to zero, f(θ) first decreases from a positive value to a minimum and then increases toward zero. Therefore, f(θ) has a unique root, which implies that the locally optimal threshold reached by the iteration is the globally optimal threshold.

J. The Employed Neuron Model

We use a differential version of the Multi-Threshold (MT) neuron, as introduced in (Huang et al., 2024). The differential MT neuron is characterized by several parameters, including the base threshold θ, and a total of 2n thresholds, with n positive and n negative thresholds.
The threshold values of the differential MT neuron are indexed by i, where λ_i^l represents the i-th threshold value in layer l:

λ_1^l = θ^l, λ_2^l = θ^l/2, ..., λ_n^l = θ^l/2^{n-1},
λ_{n+1}^l = -θ^l, λ_{n+2}^l = -θ^l/2, ..., λ_{2n}^l = -θ^l/2^{n-1}. (93)

Let the variables I^l[t], s_i^l[t], x^l[t], m^l[t], v^l[t], and m_r^l[t] represent the input current, the output spike of the i-th threshold, the encoded output value, the membrane potentials before and after spiking in the l-th layer at time-step t, and an additional membrane potential recording the encoded input rate information, respectively. The dynamics of the MT neuron are described by the following equations:

I^l[t] = m_r^l[t] + x^{l-1}[t], (94)
m_r^l[t+1] = m_r^l[t] + (x^{l-1}[t] - x^l[t]) / t, (95)
m^l[t] = v^l[t-1] + I^l[t], (96)
s_i^l[t] = MTH_{θ,n}(m^l[t], i), (97)
x^l[t] = Σ_i s_i^l[t] λ_i^l, (98)
v^l[t] = m^l[t] - x^l[t], (99)

MTH_{θ,n}(x, i) = { 0, if λ_{2n} < x < λ_n;  1, else if i = argmin_p |x - λ_p|;  0, otherwise }. (100)

When n = 1, this model reduces to a differential IF neuron with a negative threshold.

K. Results of Different Models on the ImageNet Dataset

Tables 3 and 4 present the evaluation results for various CNN-based and Transformer-based models. The variable 2n denotes the number of positive and negative thresholds in the multi-threshold neurons, where the negative thresholds are the opposites of the corresponding positive thresholds. The energy ratio is the energy consumption of the SNN divided by that of the ANN. For the ResNet18, ResNet34, and VGG16 models, the threshold scale is set to 1. For the ViT and EVA02 models, the threshold scale is 4.

Table 3.
Accuracy and energy efficiency of DCGS (ours) on CNN-based models for the ImageNet dataset.

Architecture / Param. (M) | ANN Acc. (%) | n | Metric | T=2 | T=4 | T=8 | T=16 | T=32 | T=64
ResNet18 / 11.7 | 71.49 | 1 | Acc (%) | 0.10 | 0.11 | 1.57 | 51.89 | 69.89 | 71.08
 | | | Energy ratio | 0.05 | 0.09 | 0.18 | 0.31 | 0.49 | 0.76
 | | 4 | Acc (%) | 61.45 | 70.07 | 71.31 | 71.47 | 71.49 | -
 | | | Energy ratio | 0.14 | 0.22 | 0.33 | 0.46 | 0.66 | -
 | | 8 | Acc (%) | 65.30 | 70.96 | 71.40 | 71.51 | - | -
 | | | Energy ratio | 0.17 | 0.32 | 0.48 | 0.63 | - | -
ResNet34 / 21.8 | 76.42 | 1 | Acc (%) | 0.10 | 0.14 | 0.46 | 8.76 | 58.86 | 74.11
 | | | Energy ratio | 0.04 | 0.09 | 0.19 | 0.34 | 0.61 | 0.97
 | | 4 | Acc (%) | 59.71 | 73.35 | 76.04 | 76.35 | 76.38 | -
 | | | Energy ratio | 0.14 | 0.24 | 0.37 | 0.53 | 0.76 | -
 | | 8 | Acc (%) | 65.23 | 74.68 | 76.17 | 76.37 | - | -
 | | | Energy ratio | 0.18 | 0.34 | 0.55 | 0.75 | - | -
VGG16 / 138 | 73.25 | 1 | Acc (%) | 0.08 | 0.16 | 1.03 | 55.48 | 72.04 | 73.13
 | | | Energy ratio | 0.03 | 0.06 | 0.13 | 0.19 | 0.29 | 0.40
 | | 4 | Acc (%) | 70.69 | 72.72 | 73.17 | 73.26 | 73.24 | -
 | | | Energy ratio | 0.10 | 0.15 | 0.22 | 0.29 | 0.38 | -
 | | 8 | Acc (%) | 72.26 | 73.16 | 73.22 | 73.22 | - | -
 | | | Energy ratio | 0.15 | 0.28 | 0.37 | 0.45 | - | -

L. Effectiveness of Differential Coding

Table 5 presents a comparative experiment between differential coding and rate coding. In most cases, differential coding outperforms rate coding in both accuracy and energy ratio, particularly as n increases.

M. Effectiveness of the Threshold Iteration Method

Table 6 presents a comparative experiment between the threshold iteration method and the 99.9% large-activation method. The threshold iteration method outperforms the 99.9% large-activation method across different threshold numbers n and threshold scales.

N. Evaluation Results of the Object Detection Task on the COCO Dataset

We evaluated the performance of our approach on the object detection task on the COCO dataset using three different models provided by torchvision under various parameter settings, along with ablation studies, as shown in Tables 7 and 8. The results show that both differential coding and the threshold iteration method improve the network's performance.

O.
Evaluation results of the semantic segmentation task on the Pascal VOC dataset

Additionally, we evaluated our method on the semantic segmentation task on the Pascal VOC dataset using two different models provided by torchvision under various parameter settings, also conducting ablation experiments, as presented in Tables 9 and 10. The results show that both differential coding and the threshold iteration method improve the network's performance.

P. Algorithm of MT Neuron on GPU

To enable fast execution on the GPU, we also design an efficient algorithm for each time step, as illustrated in Algorithm 3. This algorithm leverages the torch.float32 data type and takes advantage of the IEEE 754 single-precision floating-point format, where the exponent field has a bias of 127, as an example.

Algorithm 3 Algorithm of MT Neuron on GPU
1: Input: total input x and membrane potential m of the MT neuron; number-of-thresholds parameter n.
2: Output: the weighted spike sum $\sum_i \lambda_i s_i[t]$, denoted spike_sum.
3: Step 1: Add input to membrane potential
4: m = m + x
5: Step 2: Scale by 4/3 and set the mantissa to zero (rounds the magnitude to the nearest power of two)
6: int_tensor = (m * 4 / 3).view(torch.int32)
7: mantissa_mask = (1 << 23) - 1
8: int_tensor = int_tensor & ~mantissa_mask
9: Step 3: Extract the exponent field of the spike
10: exponent_mask = 0xFF << 23
11: spike_exponent = (int_tensor & exponent_mask) >> 23
12: Step 4: Find the appropriate threshold to output
13: spike_exponent = torch.where(spike_exponent - 127 <= -n, torch.zeros_like(spike_exponent), torch.where(spike_exponent - 127 > 0, torch.full_like(spike_exponent, 127), spike_exponent))
14: Step 5: Construct a new exponent field (keeping the sign bit) to build the spike output sum
15: int_tensor = (int_tensor & ~(mantissa_mask | exponent_mask)) | (spike_exponent << 23)
16: spike_sum = int_tensor.view(torch.float32)
17: Step 6: Reset membrane potential
18: m = m - spike_sum

This algorithm leverages the IEEE 754 float32 format to efficiently compute spike outputs in the neuron model.
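The core of Step 2, multiplying by 4/3 and clearing the 23 mantissa bits, rounds a positive value to its arithmetically nearest power of two: x lies nearest to $2^k$ exactly when $x \in [3 \cdot 2^{k-1}/2,\ 3 \cdot 2^k/2)$, and scaling by 4/3 maps that interval onto $[2^k, 2^{k+1})$, where truncating the mantissa yields $2^k$. This can be checked in isolation with a short standalone sketch (function name ours) using Python's struct module in place of tensors:

```python
import struct

def nearest_power_of_two(x: float) -> float:
    """Round positive x to the arithmetically nearest power of two.

    Demonstrates the bit trick behind Algorithm 3: scale by 4/3,
    reinterpret as IEEE 754 binary32, and clear the 23 mantissa bits.
    """
    # Reinterpret the scaled float32 value as a 32-bit unsigned integer
    bits = struct.unpack("<I", struct.pack("<f", x * 4.0 / 3.0))[0]
    # Zero the mantissa, keeping sign and exponent fields
    bits &= ~((1 << 23) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, 0.7 is closer to 0.5 than to 1.0 (the arithmetic midpoint is 0.75), and indeed 0.7 × 4/3 ≈ 0.93 still truncates to 0.5, while 0.8 × 4/3 ≈ 1.07 truncates to 1.0.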
At each time step, the input is added to the membrane potential m, which is then scaled by 4/3 and reinterpreted as int32 to access the exponent field directly. By zeroing the mantissa and extracting the exponent, the algorithm quickly determines the magnitude of m. Using the bias value 127 and the threshold parameter n, it determines which threshold to apply when generating a spike. The exponent is adjusted accordingly and used to reconstruct spike_sum as a float, which is then subtracted from m to complete the reset. This approach avoids conditional branches and relies on efficient bitwise operations and vectorization on the GPU, making it suitable for high-speed MT neuron simulations.

Table 4. Accuracy and energy efficiency of DCGS (Ours) on Transformer-based models on the ImageNet dataset.

Architecture / Params (ANN %) | n | Metric | T=2 | T=4 | T=8 | T=16 | T=32 | T=64
ViT-Small / 22.1M (81.38) | 1 | Acc | 0.1 | 0.1 | 0.1 | 0.11 | 0.19 | 64.92
 | | Energy ratio | 0.00 | 0.001 | 0.008 | 0.06 | 0.28 | 1.34
ViT-Small / 22.1M (81.38) | 4 | Acc | 0.1 | 0.16 | 62.76 | 79.25 | 80.95 | -
 | | Energy ratio | 0.03 | 0.12 | 0.45 | 1.06 | 2.06 | -
ViT-Small / 22.1M (81.38) | 6 | Acc | 44.59 | 78.15 | 81.02 | 81.44 | - | -
 | | Energy ratio | 0.20 | 0.44 | 0.83 | 1.44 | - | -
ViT-Small / 22.1M (81.38) | 8 | Acc | 77.84 | 81.11 | 81.43 | 81.38 | - | -
 | | Energy ratio | 0.32 | 0.62 | 1.05 | 1.71 | - | -
ViT-Base / 86.6M (84.54) | 4 | Acc | 0.10 | 0.12 | 29.36 | 80.78 | 83.70 | -
 | | Energy ratio | 0.01 | 0.05 | 0.28 | 0.54 | 0.74 | 1.44
ViT-Base / 86.6M (84.54) | 6 | Acc | 0.10 | 46.03 | 82.72 | 84.69 | - | -
 | | Energy ratio | 0.05 | 0.20 | 0.52 | 1.04 | - | -
ViT-Base / 86.6M (84.54) | 8 | Acc | 80.34 | 83.98 | 84.23 | 84.27 | - | -
 | | Energy ratio | 0.28 | 0.54 | 0.92 | 1.46 | - | -
ViT-Large / 304.3M (85.84) | 4 | Acc | 0.10 | 0.10 | 0.18 | 80.74 | 85.00 | -
 | | Energy ratio | 0.00 | 0.01 | 0.09 | 0.45 | 0.92 | -
ViT-Large / 304.3M (85.84) | 6 | Acc | 0.12 | 78.99 | 84.76 | 85.59 | - | -
 | | Energy ratio | 0.06 | 0.23 | 0.51 | 0.94 | - | -
ViT-Large / 304.3M (85.84) | 8 | Acc | 83.73 | 85.45 | 85.68 | 85.74 | - | -
 | | Energy ratio | 0.24 | 0.46 | 0.80 | 1.33 | - | -
EVA02-Tiny / 5.8M (80.63) | 4 | Acc | 0.09 | 0.15 | 0.88 | 65.75 | 78.46 | -
 | | Energy ratio | 0.01 | 0.05 | 0.24 | 0.94 | 2.30 | -
EVA02-Tiny / 5.8M (80.63) | 6 | Acc | 0.32 | 52.86 | 77.72 | 80.04 | - | -
 | | Energy ratio | 0.11 | 0.34 | 0.86 | 1.81 | - | -
EVA02-Tiny / 5.8M (80.63) | 8 | Acc | 66.32 | 79.56 | 80.38 | 80.578 | - | -
 | | Energy ratio | 0.29 | 0.66 | 1.26 | 2.28 | - | -
EVA02-Small / 22.1M (85.73) | 4 | Acc | 0.09 | 0.10 | 0.14 | 34.86 | 82.66 | -
 | | Energy ratio | 0.01 | 0.04 | 0.19 | 0.67 | 1.98 | -
EVA02-Small / 22.1M (85.73) | 6 | Acc | 0.14 | 30.83 | 82.20 | 85.48 | - | -
 | | Energy ratio | 0.09 | 0.29 | 0.80 | 1.71 | - | -
EVA02-Small / 22.1M (85.73) | 8 | Acc | 71.37 | 84.70 | 85.64 | 85.72 | - | -
 | | Energy ratio | 0.28 | 0.63 | 1.21 | 2.17 | - | -
EVA02-Base / 87.1M (88.69) | 6 | Acc | 3.25 | 81.00 | 87.86 | - | - | -
 | | Energy ratio | 0.11 | 0.36 | 0.82 | - | - | -
EVA02-Base / 87.1M (88.69) | 8 | Acc | 84.62 | 88.16 | 88.46 | - | - | -
 | | Energy ratio | 0.30 | 0.64 | 1.17 | - | - | -
EVA02-Large / 305.1M (90.05) | 6 | Acc | 12.41 | 87.02 | 89.57 | - | - | -
 | | Energy ratio | 0.13 | 0.39 | 0.84 | - | - | -
EVA02-Large / 305.1M (90.05) | 8 | Acc | 88.25 | 89.72 | 89.90 | - | - | -
 | | Energy ratio | 0.31 | 0.64 | 1.15 | - | - | -

Table 5. Effectiveness of differential coding.

Architecture (ANN %) | Coding @ threshold scale | n | Metric | T=2 | T=4 | T=8 | T=16 | T=32 | T=64
ResNet34 (76.42) | Differential @ 1 | 1 | Acc | 0.10 | 0.14 | 0.46 | 8.76 | 58.86 | 74.11
 | | | Energy ratio | 0.04 | 0.09 | 0.19 | 0.34 | 0.61 | 0.97
 | Differential @ 1 | 4 | Acc | 59.71 | 73.35 | 76.04 | 76.35 | 76.38 | -
 | | | Energy ratio | 0.14 | 0.24 | 0.37 | 0.53 | 0.76 | -
 | Differential @ 1 | 8 | Acc | 65.23 | 74.68 | 76.17 | 76.37 | - | -
 | | | Energy ratio | 0.18 | 0.34 | 0.55 | 0.75 | - | -
 | Differential @ 2 | 1 | Acc | 0.10 | 0.13 | 0.31 | 2.78 | 46.29 | 73.07
 | | | Energy ratio | 0.02 | 0.05 | 0.13 | 0.25 | 0.46 | 0.77
 | Differential @ 2 | 4 | Acc | 46.1 | 69.53 | 75.77 | 76.33 | 76.43 | -
 | | | Energy ratio | 0.11 | 0.20 | 0.32 | 0.48 | 0.71 | -
 | Differential @ 2 | 8 | Acc | 72.03 | 76.24 | 76.39 | 76.41 | - | -
 | | | Energy ratio | 0.17 | 0.32 | 0.50 | 0.66 | - | -
 | Rate @ 1 | 1 | Acc | 0.11 | 0.29 | 11.03 | 52.78 | 68.05 | 71.04
 | | | Energy ratio | 0.04 | 0.11 | 0.26 | 0.55 | 1.11 | 2.22
 | Rate @ 1 | 4 | Acc | 58.46 | 68.46 | 71.20 | 71.77 | 71.99 | -
 | | | Energy ratio | 0.15 | 0.31 | 0.62 | 1.22 | 2.42 | -
 | Rate @ 1 | 8 | Acc | 59.79 | 68.61 | 71.1 | 71.8 | - | -
 | | | Energy ratio | 0.19 | 0.38 | 0.74 | 1.45 | - | -
 | Rate @ 2 | 1 | Acc | 0.10 | 0.10 | 1.50 | 41.08 | 69.78 | 74.95
 | | | Energy ratio | 0.02 | 0.07 | 0.17 | 0.38 | 0.77 | 1.54
 | Rate @ 2 | 4 | Acc | 51.34 | 71.22 | 75.11 | 75.78 | 75.92 | -
 | | | Energy ratio | 0.13 | 0.26 | 0.53 | 1.05 | 2.09 | -
 | Rate @ 2 | 8 | Acc | 65.26 | 74.24 | 75.72 | 75.96 | - | -
 | | | Energy ratio | 0.19 | 0.46 | 0.73 | 1.43 | - | -
ViT-Small (81.38) | Differential @ 4 | 1 | Acc | 0.1 | 0.1 | 0.1 | 0.11 | 0.19 | 64.92
 | | | Energy ratio | 0.00 | 0.001 | 0.008 | 0.06 | 0.28 | 1.34
 | Differential @ 4 | 4 | Acc | 0.1 | 0.16 | 62.76 | 79.25 | 80.95 | -
 | | | Energy ratio | 0.03 | 0.12 | 0.45 | 1.06 | 2.06 | -
 | Differential @ 4 | 8 | Acc | 77.84 | 81.11 | 81.43 | 81.38 | - | -
 | | | Energy ratio | 0.32 | 0.62 | 1.05 | 1.71 | - | -
 | Rate @ 4 | 1 | Acc | 0.1 | 0.1 | 0.1 | 0.1 | 0.12 | 54.26
 | | | Energy ratio | 0.00 | 0.001 | 0.008 | 0.05 | 0.22 | 1.17
 | Rate @ 4 | 4 | Acc | 0.1 | 0.14 | 51.46 | 78.07 | 80.69 | -
 | | | Energy ratio | 0.03 | 0.12 | 0.44 | 1.13 | 2.47 | -
 | Rate @ 4 | 8 | Acc | 75.64 | 80.29 | 81.18 | 81.36 | - | -
 | | | Energy ratio | 0.32 | 0.67 | 1.38 | 2.81 | - | -

Table 6. Effectiveness of the threshold iteration method.

Architecture (ANN %) | Method @ threshold scale | n | Metric | T=2 | T=4 | T=8 | T=16 | T=32 | T=64
ResNet34 (76.42) | Threshold iteration @ 1 | 1 | Acc | 0.10 | 0.14 | 0.46 | 8.76 | 58.86 | 74.11
 | | | Energy ratio | 0.04 | 0.09 | 0.19 | 0.34 | 0.61 | 0.97
 | Threshold iteration @ 1 | 4 | Acc | 59.71 | 73.35 | 76.04 | 76.35 | 76.38 | -
 | | | Energy ratio | 0.14 | 0.24 | 0.37 | 0.53 | 0.76 | -
 | Threshold iteration @ 1 | 8 | Acc | 65.23 | 74.68 | 76.17 | 76.37 | - | -
 | | | Energy ratio | 0.18 | 0.34 | 0.55 | 0.75 | - | -
 | Threshold iteration @ 2 | 1 | Acc | 0.10 | 0.13 | 0.31 | 2.78 | 46.29 | 73.07
 | | | Energy ratio | 0.02 | 0.05 | 0.13 | 0.25 | 0.46 | 0.77
 | Threshold iteration @ 2 | 4 | Acc | 46.1 | 69.53 | 75.78 | 76.33 | 76.43 | -
 | | | Energy ratio | 0.11 | 0.20 | 0.32 | 0.48 | 0.71 | -
 | Threshold iteration @ 2 | 8 | Acc | 72.03 | 76.24 | 76.39 | 76.41 | - | -
 | | | Energy ratio | 0.17 | 0.32 | 0.50 | 0.66 | - | -
 | 99.9% large activation @ 1 | 1 | Acc | 0.10 | 0.14 | 0.74 | 7.08 | 45.68 | 69.50
 | | | Energy ratio | 0.04 | 0.09 | 0.19 | 0.35 | 0.64 | 1.08
 | 99.9% large activation @ 1 | 4 | Acc | 34.08 | 63.18 | 74.41 | 75.82 | 76.14 | -
 | | | Energy ratio | 0.13 | 0.24 | 0.39 | 0.60 | 0.91 | -
 | 99.9% large activation @ 1 | 8 | Acc | 49.60 | 71.53 | 75.46 | 76.02 | - | -
 | | | Energy ratio | 0.18 | 0.33 | 0.55 | 0.78 | - | -
 | 99.9% large activation @ 2 | 1 | Acc | 0.10 | 0.10 | 0.22 | 0.93 | 31.32 | 71.36
 | | | Energy ratio | 0.02 | 0.05 | 0.13 | 0.25 | 0.50 | 0.89
 | 99.9% large activation @ 2 | 4 | Acc | 15.74 | 50.97 | 73.88 | 76.12 | 76.32 | -
 | | | Energy ratio | 0.11 | 0.20 | 0.35 | 0.55 | 0.86 | -
 | 99.9% large activation @ 2 | 8 | Acc | 67.92 | 75.54 | 76.25 | 76.40 | - | -
 | | | Energy ratio | 0.18 | 0.32 | 0.50 | 0.71 | - | -

Table 7. Accuracy and energy efficiency of DCGS (Ours) across different models for the object detection task on the COCO dataset.
Architecture (ANN mAP% [IoU=0.50:0.95]) | n | Metric | T=2 | T=4 | T=6 | T=8
FCOS ResNet50 (39.2) | 2 | mAP% | 0.0 | 0.2 | 1.6 | 6.3
 | | Energy ratio | 0.12 | 0.24 | 0.35 | 0.47
FCOS ResNet50 (39.2) | 4 | mAP% | 21.0 | 33.9 | 36.7 | 38.2
 | | Energy ratio | 0.16 | 0.31 | 0.43 | 0.55
FCOS ResNet50 (39.2) | 8 | mAP% | 30.5 | 38.5 | 39.2 | 39.2
 | | Energy ratio | 0.22 | 0.42 | 0.61 | 0.75
RetinaNet ResNet50 (36.4) | 8 | mAP% | 25.6 | 33.9 | 35.8 | 36.0
 | | Energy ratio | 0.23 | 0.44 | 0.63 | 0.78
RetinaNet ResNet50 v2 (41.5) | 8 | mAP% | 19.7 | 32.6 | 37.9 | 39.7
 | | Energy ratio | 0.22 | 0.43 | 0.64 | 0.84

Table 8. Ablation study of DCGS (Ours) on the FCOS ResNet50 model for the object detection task on the COCO dataset.

Coding type | Threshold searching method | n | Metric | T=2 | T=4 | T=6 | T=8
Differential | Threshold iteration | 8 | mAP% | 30.5 | 38.5 | 39.2 | 39.2
 | | | Energy ratio | 0.22 | 0.42 | 0.61 | 0.75
Rate | Threshold iteration | 8 | mAP% | 21.8 | 31.5 | 34.3 | 35.5
 | | | Energy ratio | 0.22 | 0.44 | 0.66 | 0.88
Differential | 99.9% large activation | 8 | mAP% | 25.8 | 36.2 | 38.4 | 39.0
 | | | Energy ratio | 0.22 | 0.43 | 0.62 | 0.78

Table 9. Accuracy and energy efficiency of DCGS (Ours) across different models for the semantic segmentation task on the Pascal VOC dataset.

Architecture (ANN mIoU%) | n | Metric | T=2 | T=4 | T=6 | T=8
FCN ResNet50 (64.2) | 2 | mIoU% | 4.0 | 10.1 | 19.8 | 36.0
 | | Energy ratio | 0.03 | 0.10 | 0.15 | 0.22
FCN ResNet50 (64.2) | 4 | mIoU% | 51.8 | 60.5 | 62.7 | 64.0
 | | Energy ratio | 0.10 | 0.20 | 0.27 | 0.35
FCN ResNet50 (64.2) | 8 | mIoU% | 61.0 | 64.3 | 64.6 | 64.5
 | | Energy ratio | 0.18 | 0.34 | 0.50 | 0.63
DeepLabv3 ResNet50 (69.3) | 8 | mIoU% | 66.6 | 69.1 | 69.3 | 69.3
 | | Energy ratio | 0.08 | 0.32 | 0.46 | 0.58

Table 10. Ablation study of DCGS (Ours) on the FCN ResNet50 model for the semantic segmentation task on the Pascal VOC dataset.
Coding type | Threshold searching method | n | Metric | T=2 | T=4 | T=6 | T=8
Differential | Threshold iteration | 8 | mIoU% | 61.0 | 64.3 | 64.6 | 64.5
 | | | Energy ratio | 0.18 | 0.34 | 0.50 | 0.63
Rate | Threshold iteration | 8 | mIoU% | 58.2 | 62.9 | 63.7 | 63.9
 | | | Energy ratio | 0.18 | 0.37 | 0.54 | 0.71
Differential | 99.9% large activation | 8 | mIoU% | 61.2 | 64.3 | 64.5 | 64.4
 | | | Energy ratio | 0.18 | 0.35 | 0.51 | 0.64