# Theoretically Provable Spiking Neural Networks

Shao-Qun Zhang, Zhi-Hua Zhou
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
{zhangsq,zhouzh}@lamda.nju.edu.cn
(Zhi-Hua Zhou is the corresponding author.)

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

**Abstract.** Spiking neural networks have attracted increasing attention in recent years due to their potential for handling time-dependent data. Many algorithms and techniques have been developed; however, theoretical understanding of many aspects of spiking neural networks remains far from clear. A recent work [44] disclosed that typical spiking neural networks can hardly work on spatio-temporal data due to their bifurcation dynamics, and suggested adding a self-connection structure. In this paper, we theoretically investigate the approximation ability and computational efficiency of spiking neural networks with self connections, and show that the self-connection structure enables spiking neural networks to approximate discrete dynamical systems using a polynomial number of parameters within polynomial time complexity. Our theoretical results may shed some light on future studies of spiking neural networks.

## 1 Introduction

The past decades have witnessed an increasing interest in spiking neural networks (SNNs) due to their great potential for modeling time-dependent data [28, 32, 37]. The fundamental units of SNNs are usually formulated as the combination of an integration process (e.g., some first-order parabolic equation) and a step firing function. There has been significant progress on computational and implementation techniques for SNNs in computer vision [13, 33], speech recognition [30, 39], reinforcement learning [11, 38], few-shot learning [16, 22, 25], etc. However, theoretical understandings of many aspects of SNNs, such as their approximation ability and computational efficiency on spatio-temporal systems, are far from clear.

Some researchers [19, 20, 21, 31] focused on the approximation universality of SNNs, showing that some typical SNNs can simulate standard computational models such as Turing machines, random access machines, threshold circuits, sequence-to-sequence mappings, etc. There are also efforts on the computational efficiency of SNNs for specific problems, such as convergence-in-the-limit results and the computational complexity of SNNs for the sparse coding problem [34, 35] and temporal quadratic programming [5, 8], respectively. Surprisingly, a recent study [44] theoretically proved that, contrary to previous beliefs, typical SNNs can hardly work well on spatio-temporal data, because they are in nature bifurcation dynamical systems with fixed eigenvalues, so that many patterns inherently cannot be learned. They further suggested adding a self-connection structure, which fully connects the spiking neurons within the same layer, so that the resulting networks realize adaptive eigenvalues of discrete dynamical systems and thus gain stronger representation ability on spatio-temporal data.

In this paper, we theoretically investigate the approximation ability and computational efficiency of self-connection spiking neural networks (scSNNs). Our theoretical results show that, equipped with self connections, scSNNs can approximate discrete dynamical systems using a polynomial number of parameters within polynomial time complexity.
Our main contributions are summarized as follows:

- We prove that the proposed scSNNs are universal approximators (Theorem 1).
- For spatial approximation, we prove that a broad range of radial functions can be well approximated by scSNNs with polynomially many spiking neurons (Theorem 2).
- For temporal approximation, we prove that multivariate spike flows can be approximated by scSNNs within polynomial time (Theorem 3), and we verify this conclusion in simulation experiments.

The rest of this paper is organized as follows. Section 2 introduces notations. Section 3 presents the scSNNs and some related concepts. Section 4 provides three key theorems that show the universal approximation ability and the (parameter and time) complexity of scSNNs for approximating discrete dynamical systems. Section 5 develops in-depth discussions of the effects led by self connections. Section 6 concludes this work.

## 2 Notations

Here, we provide some useful notations, which are detailed in the Appendix. Let $[N] = \{1, 2, \ldots, N\}$ be an integer set for $N \in \mathbb{N}_+$, and let $|\cdot|_\#$ denote the number of elements in a collection, e.g., $|[N]|_\# = N$. Let $i = \sqrt{-1}$ be the imaginary unit, and let $x \succeq 0$ mean that every element satisfies $x_i \geq 0$ for any $i$. Let the sphere $S(r)$ and ball $B(r)$ be $S(r) = \{x \mid \|x\| = r\}$ and $B(r) = \{x \mid \|x\| \leq r\}$ for any $r \in \mathbb{R}$, respectively.

Given a function $g(n)$, we write $h_1(n) = \Theta(g(n))$ if there exist positive constants $c_1$, $c_2$, and $n_0$ such that $c_1 g(n) \leq h_1(n) \leq c_2 g(n)$ for every $n \geq n_0$; $h_2(n) = O(g(n))$ if there exist positive constants $c$ and $n_0$ such that $h_2(n) \leq c\, g(n)$ for every $n \geq n_0$; $h_3(n) = \Omega(g(n))$ if there exist positive constants $c$ and $n_0$ such that $h_3(n) \geq c\, g(n)$ for every $n \geq n_0$; and $h_4(n) = o(g(n))$ if there exists a positive constant $n_0$ such that $h_4(n) < c\, g(n)$ for every $c > 0$ and $n \geq n_0$.

Let $C(K, \mathbb{R})$ be the set of all scalar functions $f: K \to \mathbb{R}$ continuous on $K \subseteq \mathbb{R}^n$. Given $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m) \in \mathbb{N}^m$, we define
$$D^\alpha f(x) = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_m^{\alpha_m}},$$
where $x = (x_1, x_2, \ldots, x_n) \in K$. Further, we define
$$C^l(K, \mathbb{R}) = \{f \mid f \in C(K, \mathbb{R}) \text{ and } D^r f \in C(K, \mathbb{R}) \text{ for } r \in [l]\}.$$
For $1 \leq p < \infty$, we say $f \in L^p(K, \mathbb{R})$ if $f \in C(K, \mathbb{R})$ and
$$\|f\|_{p,K} = \left( \int_K |f(x)|^p \, \mathrm{d}x \right)^{1/p} < \infty.$$
This work considers the Sobolev space $W^{l,p}_\mu(K, \mathbb{R})$, defined as the collection of all functions $f \in C^l(K, \mathbb{R})$ with $D^\alpha f \in L^p(K, \mathbb{R})$ for all $|\alpha| \in [l]$, that is,
$$\|D^\alpha f\|_{p,K} = \left( \int_K |D^\alpha f(x)|^p \, \mathrm{d}x \right)^{1/p} < \infty.$$

This paper employs $E_n$ to denote the $n \times n$ identity matrix and $\det(\cdot)$ to indicate the determinant of a matrix. Two $n \times n$ matrices $A$ and $B$ are called similar, denoted $A \sim B$, if there exists an invertible $n \times n$ matrix $P$ such that $B = P^{-1} A P$. The general linear group over a field $F$, denoted $\mathrm{GL}(n, F)$, is the set of $n \times n$ invertible matrices with entries in $F$. In particular, the special linear group $\mathrm{SL}(n, F)$ is the subgroup of $\mathrm{GL}(n, F)$ consisting of matrices with determinant 1. For any field $F$, the $n \times n$ orthogonal matrices form the subgroup
$$\mathrm{O}(n, F) = \{P \in \mathrm{GL}(n, F) \mid P^\top P = P P^\top = E_n\}$$
of the general linear group $\mathrm{GL}(n, F)$. Similarly, we have the special orthogonal group, denoted $\mathrm{SO}(n, F)$, which consists of all orthogonal matrices of determinant 1 and is a normal subgroup of $\mathrm{O}(n, F)$. This group is also called the rotation group; its elements linearly transform geometries while preserving orientation.

Let $\varphi^2(x)$ be the density function of some probability measure $\mu$, which satisfies
$$\int_{x \in \mathbb{R}^m} \varphi^2(x) \, \mathrm{d}x = \int_{x \in B(1)} 1 \, \mathrm{d}x = 1.$$
For continuous functions $f$ and $g$, we have the following equalities under the Fourier transform:
$$\langle \widehat{f\varphi}, \widehat{g\varphi} \rangle_{L^2(\mu)} = \langle f\varphi, g\varphi \rangle_{L^2(\mu)} \quad \text{and} \quad \widehat{f\varphi} = \hat{f} * \hat{\varphi},$$
for the convolution operator $*$.

*Figure 1: Illustrations of scSNNs and Self Connections.*
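The second identity above is the standard convolution theorem. As a sanity check (our own aside, not part of the paper), the following NumPy snippet verifies its discrete dual form, namely that the Fourier transform of a linear convolution equals the pointwise product of the (zero-padded) Fourier transforms; the array sizes and random seed are arbitrary choices.

```python
import numpy as np

# Discrete check of the convolution theorem: FFT(a * b) == FFT(a) . FFT(b),
# where * denotes linear convolution and both FFTs are zero-padded to full length.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(64), rng.standard_normal(64)
full = a.size + b.size - 1
lhs = np.fft.fft(np.convolve(a, b), full)
rhs = np.fft.fft(a, full) * np.fft.fft(b, full)
assert np.allclose(lhs, rhs)
```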
## 3 Self-connection Spiking Neural Networks

Let $n$ be the number of spiking neurons, and let $u(t) \in \mathbb{R}^n$ and $s(t) \in \mathbb{R}^n$ denote the vectors of membrane potentials and spikes, in which $u_i(t)$ and $s_i(t)$ are the membrane potential and spikes of neuron $i \in [n]$ at time $t \geq 0$, respectively. Following the formulation in [44], we here consider the self-connection SNN (scSNN) as follows:
$$\tau_m \frac{\mathrm{d} u_i(t)}{\mathrm{d} t} = -u_i(t) + \sum_{j \in [n]} V_{ij}\, s_j(t) + \sum_{k \in [m]} W_{ik}\, I_k(t), \qquad (1)$$
where $\tau_m$ is a positive-valued hyper-parameter (the membrane time constant), $I_k(t)$ indicates the signal from input channel $k \in [m]$ at time $t$, and $V \in \mathbb{R}^{n \times n}$ and $W \in \mathbb{R}^{n \times m}$ denote the self-connection and connection weight matrices, respectively. Hence, $\sum_{k \in [m]} W_{ik} I_k(t)$ is the signal received by neuron $i$ at time $t$, and $\sum_{j \in [n]} V_{ij} s_j(t)$ denotes the effect on the membrane potential of neuron $i$ when neuron $j$ fires a spike, as illustrated in Figure 1. In general, we force the self-connection matrix $V$ to be symmetric.

Based on the spike response model scheme [12], Eq. (1) has the following solution with the boundary condition $u_{\text{rest}} = 0$:
$$u_i(t) = \int_{t'}^{t} \frac{1}{\tau_m} \exp\!\left(-\frac{t - s}{\tau_m}\right) \left( \sum_{j \in [n]} V_{ij}\, s_j(s) + \sum_{k \in [m]} W_{ik}\, I_k(s) \right) \mathrm{d}s, \qquad (2)$$
where $t'$ denotes the last firing time $t' = \max\{s \mid u_i(s) = u_{\text{firing}},\ s < t\}$ for a pre-given firing threshold $u_{\text{firing}} > 0$. The spiking neuron model employs the typical threshold rule, that is, neuron $i$ fires a spike $s_i(t)$ at time $t$ if and only if $u_i(t) \geq u_{\text{firing}}$. After firing, the membrane potential is instantaneously reset to a lower value $u_{\text{rest}}$ (the rest voltage). Note that this work does not consider absolute refractory periods [14] or refractory kernels [9].

Let the timing set $T_i = \{t \mid u_i(t) = u_{\text{firing}},\ t \in [0, T]\}$ record the firing times of neuron $i$, and let $N_i = |T_i|_\#$. In general, one uses the firing rate $f_i^{\text{ave}}(T) = N_i / T$ to characterize the spike dynamics, where $f_i^{\text{ave}}(T)$ indicates the average number of spikes of neuron $i$ over the time interval $[0, T]$. It is clearly observed that $f_i^{\text{ave}}(t)$ is discontinuous, since $N_i$ is a step function with respect to $u_i(t)$ and time $t$. To ensure well-posedness of the firing rate functions, we employ a spike excitation function $f_e: u \to s$ to smooth the firing procedure, for example,
$$\text{Linear:}\quad f_e(u_i(t)) \propto u_i(t) - u_{\text{firing}}, \qquad \text{Cos-based:}\quad f_e(u_i(t)) \propto 1 - \cos\!\left( \pi\, \frac{t - t^{(k)}}{t^{(k+1)} - t^{(k)}} \right), \qquad (3)$$
where $t^{(k)}$ and $t^{(k+1)}$ are two adjacent timings from $T_i$ with $k \in \mathbb{N}_+$ and $t^{(k)} < t^{(k+1)}$. Similar smoothing treatments of spike excitation functions can be found in [13, 33]. Thus, $f_i^{\text{ave}}(T)$ can be approximated by an Instantaneous Firing Rate (IFR) function
$$f_i(x, t) = \frac{f_e(u_i(t))}{t - t'}, \qquad (4)$$
where $x \in \mathbb{R}^m$, $t' \in T_i$, and $t \in [T]$. It is observed that the proposed IFR function $f_i(x, t)$ induces a discrete dynamical system, in which $f(\cdot, t): \mathbb{R}^m \to \mathbb{R}$ is a spatial function and $f(x, \cdot): \mathbb{R} \to \mathbb{R}$ is a temporal flow.

**About Neural Encoding.** The actual input data (e.g., images or videos) should usually be pre-converted into a spiking version before being fed into SNNs. This conversion procedure is called neural encoding, as shown in Figure 1. There are two main categories of neural encoding approaches: temporal encoding and rate-based encoding. The former encodes input data by exploiting the distances between time instances at which spikes fire [40], and the latter encodes input data as a count sequence of the fired spikes within temporal windows [18]. Rate-based encoding is the simplest and most popular scheme in SNNs; representative techniques usually encode inputs by a Poisson distribution or record them by a dynamic vision sensor [3, 27].
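To make Eqs. (1)-(4) and rate-based encoding concrete, the following minimal NumPy sketch Euler-discretizes Eq. (1), applies the threshold-and-reset rule, and reports the average firing rate $f_i^{\text{ave}}(T)$ rather than the smoothed IFR. It is our own illustration, not the authors' implementation: the step size, Poisson rates, the scaling of $W$, and the example input vector are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_encode(x, T, dt, max_rate=500.0):
    """Rate-based encoding: each channel emits a Bernoulli spike per step
    with probability proportional to its normalized intensity."""
    p = np.clip(x / x.max(), 0.0, 1.0) * max_rate * dt
    steps = int(T / dt)
    return (rng.random((steps, x.size)) < p).astype(float)   # shape (steps, m)

def run_scsnn(I, V, W, tau_m=0.2, dt=1e-3, u_firing=1.0, u_rest=0.0):
    """Euler simulation of Eq. (1): tau_m * du/dt = -u + V s + W I,
    with threshold firing and instantaneous reset to u_rest."""
    steps = I.shape[0]
    n = V.shape[0]
    u = np.full(n, u_rest)
    s = np.zeros(n)
    spikes = np.zeros((steps, n))
    for t in range(steps):
        u = u + dt * (-u + V @ s + W @ I[t]) / tau_m
        s = (u >= u_firing).astype(float)     # threshold rule
        u = np.where(s > 0, u_rest, u)        # reset after firing
        spikes[t] = s
    return spikes

m, n, T, dt = 4, 10, 1.0, 1e-3
V = rng.random((n, n)); V = (V + V.T) / 2     # symmetric self-connection matrix
W = 3.0 * rng.random((n, m))                  # input weights (scaled so the toy net fires)
I = poisson_encode(np.array([5.1, 3.5, 1.4, 0.2]), T, dt)   # e.g., one Iris-like sample
spikes = run_scsnn(I, V, W)
print("average firing rates over [0, T]:", spikes.sum(axis=0) / T)
```

Replacing the hard threshold inside the loop with a smooth excitation $f_e$, as in Eq. (3), is what makes the resulting rate function well posed for the analysis that follows.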
Recent years have witnessed many efforts on the information capacity of neural encoding, especially rate-based encoding, from both empirical [6, 17] and theoretical [24, 35, 36] sides. Throughout this paper, we adopt rate-based encoding as the default and focus on the firing rates of SNNs, which generalizes computational analyses based on spike counts.

**About Firing Rates.** When we investigate the dynamics and neural computation of SNNs, the firing rates, or equivalently the numbers of fired spikes, are the key measure of network activity because of the close relation between firing rates and network function (including neural input, connectivity, spiking function, and firing process) [1, 2]. There have been great efforts to use firing rates in SNNs for real-world tasks, such as vision [13, 33, 44] and speech recognition [30, 39]. Besides, Barrett et al. [5] and Chou et al. [8] showed that the averaged firing rate can approximate the optimal solutions of some quadratic programs within polynomial complexity. This work employs the instantaneous firing rate rather than the averaged firing rate or the total number of fired spikes used in previous studies. This choice provides a feasible way to construct discrete dynamical systems using the IFR functions, based on which we can develop in-depth understandings of SNNs from both spatial and temporal aspects.

## 4 Approximation to Discrete Dynamical Systems

In this section, we first present the universal approximation result for scSNNs, and then show the parameter and time complexities of the IFR functions led by scSNNs for approximating discrete dynamical systems, from spatial and temporal aspects, respectively.

### 4.1 Universal Approximation

Now, we present our first theorem, about the universal approximation of scSNNs, as follows.

**Definition 1.** Let $l \in \mathbb{N}_+$. The function $f(x)$ is said to be $l$-finite if $f$ is an $l$-times differentiable scalar function that satisfies $\int_{\mathbb{R}} |D^l f(x)| \, \mathrm{d}x < \infty$.

**Theorem 1.** Let $K \subseteq \mathbb{R}^m$ be a compact set. If the spike excitation function $f_e$ is $l$-finite and $w_i \in \mathbb{R}$ for $i \in [n]$, then for all $r \in [l]$, there exists some time $t$ such that the set of IFR functions $f(\cdot, t): K \to \mathbb{R}$ of the form $f(x, t) = \sum_{i \in [n]} w_i f_i(x, t)$ is dense in $C^r(K, \mathbb{R})$.

Theorem 1 shows that the scSNN is a universal approximator, which provides a solid cornerstone for developing SNN theory. The proof idea of this theorem can be summarized as follows. We utilize the invertibility of the Fourier transform on the Sobolev space $W^{l,p}_\mu(K, \mathbb{R})$ ($p > 1$) to project the concerned functional space $C^r(K, \mathbb{R})$ into a characteristic space, in which the corresponding objective function is expressed as a single integral over the characteristic space. According to Fubini's theorem, the approximation problem on $C^r(K, \mathbb{R})$, originally involving multiple integrals, can thus be converted into one involving a single integral on the characteristic space. The subsequent proof can then be completed along the lines of the technical exposition given by Carslaw and Rogosinski [7]. The full proof of Theorem 1 can be found in Appendix A.

### 4.2 ϵ-Approximation with Parameter Complexity

This subsection presents our second theorem, about the parameter complexity of scSNNs, as follows.

**Definition 2.** We say that $g$ is a radial function if $g(x') = g(x)$ for any $\|x'\| = \|x\|$.

**Theorem 2.** Given a compact set $K_m \subseteq \mathbb{R}^m$, a probability measure $\mu$, and a radial function $g: K_m \to \mathbb{R}$,
for some apposite spike excitation function $f_e$ and any $\epsilon > 0$, there exists some time $t$ such that the radial function $g$ can be well approximated by a one-hidden-layer scSNN of $O(C m^{15/4})$ spiking neurons, that is,
$$\| f(x, t) - g(x) \|_{L^2(\mu)} < \epsilon.$$

Theorem 2 shows that scSNNs with polynomially many spiking neurons can well approximate a broad scope of radial functions, which may shed some light on the observation that scSNNs admit input rotations [36], connecting to rotation-based data augmentation techniques [29] and invariant models [23]. This paper provides two ways of proving this theorem. Here, we only introduce the proof idea of the more interesting one (the full proof is shown in Appendix B); the other proof can be found in Appendix C. The proof idea can be summarized as follows. A radial function is invariant to rotations and depends only on the input norm, corresponding to the phase and the norm (i.e., radius) of the inputs, respectively. It is observed that $s(t)$ in Eq. (1) induces a local recurrent function of the input $x(t)$. Thus, there exists a collection of parameters such that the IFR function depends on the norm of the inputs. Besides, the IFR function admits the phase since $V$ is a symmetric matrix. Thus, there exist some linear connections (including rotation transformations) such that the weighted aggregation of these spiking neurons is invariant to rotations. Putting these pieces together, a family of radial functions can be well approximated by one-hidden-layer scSNNs.

We formally begin our proof of Theorem 2 with some useful lemmas.

**Lemma 1.** Let $g: [r, R] \to \mathbb{R}$ be an $L$-Lipschitz function for $r \leq R$. For any $\delta > 0$, $C_s \geq 1$, and $n \geq C_s R^2 L m / (r \delta)$, there exist some time $t$ and an IFR function $f(x, t)$ led by a one-hidden-layer scSNN of $n$ spiking neurons such that
$$\sup_{x \in \mathbb{R}^m} |g(\|x\|) - f(x, t)| \leq \delta.$$

**Lemma 2.** For $m > C_2 > 0$, let $g: \mathbb{R}^m \to \mathbb{R}$ be an $L$-Lipschitz radial function supported on the set
$$S = \{x : C_2 \sqrt{m} \leq \|x\| \leq 2 C_2 \sqrt{m}\}.$$
For any $\delta > 0$, there exist some time $t$ and an IFR function $f(x, t)$ led by a one-hidden-layer scSNN with width at most $C_s (C_2)^{3/2} L\, m^{7/4} / \delta$ such that
$$\sup_{x \in \mathbb{R}^m} |g(x) - f(x, t)| < \delta.$$

Lemma 2 shows that $L$-Lipschitz radial functions can be approximated by the IFR function $f(x, t)$ led by a one-hidden-layer scSNN with polynomially many parameters.

**Lemma 3.** Define
$$g(x) = \sum_{i=1}^{N} \epsilon_i g_i(x) \quad \text{with} \quad g_i(x) = \mathbb{I}\{\|x\| \in \Omega_i\}, \qquad (5)$$
where $\epsilon_i \in \{-1, +1\}$, $N$ is a polynomial function of $m$, and the $\Omega_i$'s are disjoint intervals of width $O(1/N)$ taking values in the range $\Theta(\sqrt{m})$. For any $\epsilon_i \in \{-1, +1\}$ ($i \in [N]$), there exists a Lipschitz function $h: S \to [-1, +1]$ such that
$$\int_{\mathbb{R}^m} (g(x) - h(x))^2 \, \varphi^2(x) \, \mathrm{d}x \leq \frac{3}{(C_2)^2 \sqrt{m}}.$$

Lemma 3 shows that the (generally non-Lipschitz) function $g$ of Eq. (5) can be approximated, with respect to the density $\varphi^2$, by a bounded Lipschitz function $h$.

*Proof of Theorem 2.* Let $g(x) = \sum_{i=1}^{N} \epsilon_i g_i(x)$ be defined by Eq. (5) with $N \geq 4 C_2^{5/2} m^2$. According to Lemma 3, there exists a Lipschitz function $h$ with range $[-1, +1]$ such that
$$\| h(x) - g(x) \|_{L^2(\mu)} \leq \frac{\sqrt{3}}{C_2\, m^{1/4}}.$$
Based on Lemmas 1 and 2, any Lipschitz radial function supported on $S$ can be approximated by an IFR function $f(x, t)$ led by a one-hidden-layer scSNN with width at most $C_3 C_s\, m^{15/4}$, where $C_3$ is a constant depending on $C_2$ and $\delta$. This means that there exists some time $t$ such that
$$\sup_{x \in \mathbb{R}^m} |h(x) - f(x, t)| \leq \delta,$$
and thus $\| h(x) - f(x, t) \|_{L^2(\mu)} \leq \delta$. Hence, the range of $f(\cdot, t)$ lies in $[-1-\delta, +1+\delta] \subseteq [-2, +2]$. For the radial function defined by Eq. (5), we have
$$\| g(x) - f(x, t) \|_{L^2(\mu)} \leq \| g(x) - h(x) \|_{L^2(\mu)} + \| h(x) - f(x, t) \|_{L^2(\mu)} \leq \frac{\sqrt{3}}{C_2\, m^{1/4}} + \delta.$$
This implies that, given constants $m > C_2 > 0$ and $C_3 > 0$, for any $\delta > 0$ and $\epsilon_i \in \{-1, +1\}$ ($i \in [N]$), there exists some time $t$ such that the target radial function $g$ can be approximated by an IFR function $f(x, t)$, led by a one-hidden-layer scSNN with range in $[-2, +2]$ and width at most $C_3 C_s\, m^{15/4}$, that is,
$$\| g(x) - f(x, t) \|_{L^2(\mu)} \leq \frac{\sqrt{3}}{C_2\, m^{1/4}} + \delta < \delta_1.$$
This completes the proof.

### 4.3 ϵ-Approximation with Time Complexity

The IFR function generates a discrete dynamical system comprising a spatial function and a temporal flow. Theorems 1 and 2 both focus on the spatial approximation characteristics of the IFR function. This subsection investigates its temporal characteristics, especially how long it takes a self-connection SNN to achieve a specified task or target function. Now, we present our third theorem, about the time complexity of scSNNs, as follows.

**Definition 3.** A matrix $V \in \mathbb{R}^{m \times n}$ (with $m \leq n$) is said to be non-degenerate if every $m \times m$ sub-matrix of $V$ has full rank.

**Theorem 3.** Let $n \geq m$. Let $x_k(t) \sim \pi(\lambda_k)$ with $\mathbb{E}(x_k) = \lambda_k$ for $\lambda_k > 0$, $k \in [m]$, and $t \in [T]$. If $V$ is a non-degenerate matrix, then for any $\epsilon > 0$ and any matrix $G \in \mathbb{R}^{m \times n}$ with $\|G\|_2 < \infty$, when the time complexity satisfies
$$T \geq \Omega\!\left( \frac{\sqrt{n}\, \|G\|_2}{\|V\|_2} \right),$$
there exists some one-hidden-layer scSNN with the IFR vector $f = (f_1, \ldots, f_n)$ such that
$$\left\| \mathbb{E}_x [V f(x, T)] - G \lambda \right\|_2 < \epsilon. \qquad (6)$$

Theorem 3 shows that, without training, scSNNs can well approximate multivariate spike flows (MSF, i.e., linear functions of $\lambda$) within an explicit polynomial time complexity. In contrast to conventional ANN theory and the theorems above, which concern spatial approximation ability, this theorem portrays the computational efficiency of a flow function led by scSNNs. There are few studies on the computational efficiency of SNNs; we summarize the comparative results in Table 1. Notice that approximating specific functions, such as MSF in this work, is more challenging than solving some problems with locally competitive algorithms [34]. Thus, Theorem 3 provides a rigorous guarantee for SNNs to solve some algorithmic problems. Besides, Theorem 3 relaxes the dimension of the self-connection matrix $V$ from $n \times n$ to $m \times n$, which provides a more general result for approximating MSF of any dimension.

Table 1: Comparative Results on Computational Efficiency of SNNs.

| Works | Models | Objects | Computational Complexity |
| --- | --- | --- | --- |
| Tang et al. [35] | configured SNNs | solving Sparse Coding Problem | Convergence |
| Chou et al. [8] | simple SNNs | solving Quadratic Programming | Polynomial Complexity |
| Our Work | scSNNs with Poisson inputs | approximating MSF | Polynomial Complexity |

The proof idea of this theorem originates from the attractor theory of dynamical systems. One key lemma is as follows.

**Lemma 4.** Given $U = \Lambda_U e_U$, $\lambda \in \mathbb{R}^n$, and $\epsilon > 0$, when $t \geq \Omega\!\left( \sqrt{n}\, \|\Lambda G\|_2 / \|U\|_2 \right)$ we have $\|U f(\bar{\lambda}, t) - \lambda\| \leq \epsilon$, where the modified IFR function $f(\bar{\lambda}, t)$ is led by an scSNN without the connection matrix $W$ and fed with the constant spike sequence $\bar{\lambda}$ at every timestamp.

Lemma 4 shows that, when fed with a constant spike sequence $\bar{\lambda}$ at every timestamp, a weighted sum of IFR functions converges to an attractor around the inputs within polynomial temporal computation. Based on Lemma 4, it suffices to prove that the expectation of the concerned IFR function $f(x, t)$ can approximate $f(\bar{\lambda}, t')$ within the time interval $[0, T]$, where $t' \leq T$ and $T \geq \Omega(\sqrt{n}\, \|G\|_2 / \|V\|_2)$. The full proof of Theorem 3 can be found in Appendix D. The aforementioned results, in particular the time complexity bound, can be verified by a simulated experiment.
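Before describing the conducted experiment, the sketch below illustrates, in a simplified form we wrote for exposition (not the released code of [44]), the kind of measurement used in the verification: track the approximation error $\epsilon(t) = \|V \hat{f}(t) - G\lambda\|_2$ of a randomly configured scSNN over time and compare it against the indicator $t_c$. Here we take $m = n$ so that all matrix shapes line up, and the Poisson rates, weights, and $G$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def error_curve(lam, G, V, W, T=10.0, dt=1e-3, tau_m=0.2, u_firing=1.0):
    """Track eps(t) = ||V @ f_hat(t) - G @ lam||_2 over time, where f_hat(t) is
    the running average firing-rate vector of an Euler-discretized scSNN driven
    by Poisson (Bernoulli-per-step) input spikes with rates lam."""
    steps = int(T / dt)
    n, m = W.shape
    u, s, counts = np.zeros(n), np.zeros(n), np.zeros(n)
    target = G @ lam
    eps = np.empty(steps)
    for k in range(steps):
        I = (rng.random(m) < lam * dt).astype(float)   # rate-based encoding of lam
        u = u + dt * (-u + V @ s + W @ I) / tau_m      # Eq. (1), Euler step
        s = (u >= u_firing).astype(float)
        u = np.where(s > 0, 0.0, u)                    # reset to u_rest = 0
        counts += s
        eps[k] = np.linalg.norm(V @ (counts / ((k + 1) * dt)) - target)
    return eps

# Toy sizes with m = n so that all shapes line up (an assumption of this sketch).
n = m = 4
lam = np.array([50.0, 30.0, 20.0, 10.0])               # assumed Poisson rates
V = rng.random((n, n)); V = (V + V.T) / 2              # symmetric self connections
W = rng.random((n, m))
G = 0.1 * rng.standard_normal((n, m))

eps = error_curve(lam, G, V, W)
t_c = np.sqrt(n) * np.linalg.norm(G, 2) / np.linalg.norm(V, 2)
print(f"t_c = {t_c:.4f}, eps(0.01s) = {eps[9]:.4f}, eps(10s) = {eps[-1]:.4f}")
# Repeating this for several G (hence several t_c) and fitting
# np.polyfit(np.log(t_c_list), np.log(t_list), 1) gives the log-log slope
# used to check the polynomial bound.
```

A randomly configured toy network is of course not guaranteed to reach the error level promised by Theorem 3; the point is only the protocol of relating $\epsilon$, $t$, and $t_c$ on log scales, as done next.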
We here simulate a $4 \times 10{,}000$ spike sequence from the Iris data set, with a time step of 0.001, using Poisson encoding. We employ a one-hidden-layer scSNN [44] that contains the self-connection structure as the SNN model. The numbers of input channels and hidden neurons are 4 and 10, respectively. The self-connection matrix $V$ is randomly sampled from $[0, 1]$ with bias $= 1/3$. The above configurations meet the conditions of Theorem 3. Table 2 lists the hyper-parameter values of the conducted scSNN.

Table 2: Hyper-parameter Setting of scSNNs.

| Parameters | Value | Parameters | Value |
| --- | --- | --- | --- |
| Time Step | 0.001 | Firing Threshold | 1 |
| Expect Spike Count (True) | 100 | Membrane Time $\tau_m$ | 0.2 |
| Expect Spike Count (False) | 10 | Time Constant of Synapse $\tau_s$ | 0.008 |
| Encoding Length $T$ | 10,000 | Maximum Firing | 10 |
| Refractory Period | 0.016 | Maximum Time | 10 |

For any linear matrix $G \in \mathbb{R}^{4 \times 10}$ with $\|G\|_2 < \infty$, we define the indicator $t_c = \sqrt{n}\, \|G\|_2 / \|V\|_2$. Thus, by exploiting the relations among $\epsilon$, $t$, and $t_c$, we can verify the explicit polynomial bound, especially the order-of-magnitude function $\Omega(\cdot)$, in Theorem 3. Figure 2 plots the experimental results. From Figure 2(a), the conducted scSNN first approximates the objective function at a fast rate and then maintains a low approximation error. Figure 2(b) signifies that $\log(\epsilon)$ is inversely proportional to $\log(t_c)$, i.e., $a \log(t_c) + b \log(\epsilon) = c$ with $a, b, c > 0$. Figure 2(c) shows the relation between $\log(t_c)$ and $\log(t)$. Notice that since the conducted scSNN can achieve the same approximation error at different timestamps, we only care about the shortest one, that is, the blue solid curve below the red dashed line. It is observed that $\log(t)$ is proportional to $\log(t_c)$, i.e., $\log(t) = a \log(t_c) + b$ with $a > 0$ and $b \in \mathbb{R}$. In summary, we have demonstrated the effectiveness of our theoretical results.

*Figure 2: Magnitude Curves of $\epsilon$, $t$, and $t_c$. (a) Curve: $t$-$\epsilon$; (b) Curve: $\log(t_c)$-$\log(\epsilon)$; (c) Curve: $\log(t_c)$-$\log(t)$.*

## 5 Discussions

Despite an increasing focus on the potential of handling spatio-temporal data, the theoretical understanding of many aspects (e.g., approximation ability and computational efficiency) of SNNs is still far from clear. Some seminal studies focused on the universality of SNNs. Maass et al. [19, 20, 21] showed that some typical SNNs can simulate standard computational models such as Turing machines, random access machines, threshold circuits, etc. She et al. [31] showed some universal approximation properties of SNNs by exploiting spike propagation paths. However, to the best of our knowledge, few theoretical guarantees on the approximation complexity and computational efficiency of SNNs have been provided; there are only a few studies on convergence-in-the-limit results for SNNs solving the sparse coding problem [34, 35] and on convergence rates for SNNs solving temporal quadratic programming [8]. Recently, two studies, [41] and [44], provided calculable ways of approaching the lower and upper bounds of adaptive SNN systems, respectively. Our starting point is a recent advance proposed by Zhang and Zhou [44], in which they showed that adding self connections enables SNNs to achieve adaptive dynamical systems with spatio-temporal representation. This work develops an in-depth analysis of the approximation ability and computational efficiency brought by self connections in SNNs.
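A side note on the role of symmetry, which the next paragraph builds on: by the spectral theorem, a symmetric self-connection matrix factors as $V = Q \Lambda Q^\top$ with $Q \in \mathrm{O}(n, \mathbb{R})$, the orthogonal group of Section 2, which is the property linking self connections to rotation transformations. A small NumPy check of this standard fact (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
V = rng.random((n, n)); V = (V + V.T) / 2        # symmetric self-connection matrix

# Spectral theorem: V = Q diag(eigvals) Q^T with Q orthogonal.
eigvals, Q = np.linalg.eigh(V)
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, V)
assert np.allclose(Q.T @ Q, np.eye(n))           # Q lies in O(n, R)
print("det(Q) =", np.linalg.det(Q))              # +1 (rotation) or -1 (reflection)
```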
It is observed that self connections facilitate our results: re-using the firing spike variables $s(t)$ contributes to approximating the norm of the input sequences $x(t)$, and the symmetric self-connection matrix coincides with rotation and dual transformations, which are crucial in the proofs of Theorems 2 and 3. Our theoretical derivations not only show the universal approximation properties of scSNNs but also disclose the parameter and time complexities of scSNNs for approximating discrete dynamical systems, which we expect to provide solid support for the development of SNN theory. This paper has not yet shown the theoretical advantages of scSNNs over SNNs without self connections (a valuable direction for future study), but we believe that the current results have disclosed the importance and power of self connections for enhancing the approximation and computational abilities of SNNs, which may shed some light on developing provable and sound SNNs.

In light of the preceding merits, many issues are worth studying in the future. One important future issue is to explore practical techniques for scSNNs. For example, Theorem 2 shows the power of scSNNs in representing radial, or equivalently rotation-invariant, functions. We thus conjecture that adding self connections may be especially compatible with invariant models and data augmentation techniques such as image rotation [23, 29]. Besides, the current implementation roughly follows the ideas of Zhang and Zhou [44], i.e., the effect from the $k$-th neuron on the $i$-th neuron equals the last spike of neuron $k$ multiplied by a weight factor, as shown in Eq. (1). Such a self-connection graph enables SNNs to maintain adaptive characteristics for representing discrete dynamical systems. However, it inevitably leads to larger memory consumption when the input spike sequences are high-dimensional and high-frequency. Therefore, it is promising to explore more practical techniques or modules for scSNNs, which may open up possibilities for achieving sound SNNs. Another important future issue is to develop in-depth theoretical understandings of SNNs, from the aspects of approximation complexity, computational efficiency, representation ability [15, 26, 43], and over-parameterized architectures [4, 45]. Furthermore, it is interesting to explore the theoretical advantages of SNNs with self connections over SNNs without them, especially from the perspectives of approximation, optimization, and generalization.

## 6 Conclusions

In this paper, we presented theoretical understandings of the approximation ability and computational efficiency of self-connection SNNs, i.e., scSNNs. We provided three main theorems showing the universal approximation property, the parameter complexity of spatial approximation, and the time complexity of temporal approximation of scSNNs. Our theoretical results disclose the effects of self connections in scSNNs for approximating discrete dynamical systems using a polynomial number of parameters within polynomial time complexity.

## Acknowledgments and Disclosure of Funding

This research was supported by the National Natural Science Foundation of China (NSFC 61921006) and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

## References

[1] E. D. Adrian. The impulses produced by sensory nerve endings. The Journal of Physiology, 61(49):156–193, 1926.

[2] A. Aertsen, P. I. M. Johannesma, and D. J. Hermes. Spectro-temporal receptive fields of auditory neurons in the grassfrog.
Biological Cybernetics, 38(4):235–248, 1980.

[3] J. Anumula, D. Neil, T. Delbruck, and S.-C. Liu. Feature representations for neuromorphic audio spike streams. Frontiers in Neuroscience, 12:23, 2018.

[4] S. Arora, S. Du, W. Hu, Z. Li, and R. Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In Proceedings of the 36th International Conference on Machine Learning, pages 322–332, 2019.

[5] D. G. Barrett, S. Denève, and C. K. Machens. Firing rate predictions in optimal balanced networks. In Advances in Neural Information Processing Systems 26, pages 1538–1546, 2013.

[6] W. Bialek, F. Rieke, R. van Steveninck, and D. Warland. Reading a neural code. Science, 252(5014):1854–1857, 1991.

[7] H. S. Carslaw and W. Rogosinski. Introduction to the theory of Fourier's series and integrals. Bulletin of the American Mathematical Society, 37:510–511, 1931.

[8] C.-N. Chou, K.-M. Chung, and C.-J. Lu. On the algorithmic power of spiking neural networks. In Proceedings of the 10th Innovations in Theoretical Computer Science Conference, pages 26:1–26:20, 2019.

[9] G. Dumont, A. Payeur, and A. Longtin. A stochastic-field description of finite-size spiking neural networks. PLoS Computational Biology, 13(8):1005691, 2017.

[10] R. Eldan and O. Shamir. The power of depth for feedforward neural networks. In Proceedings of the 29th Annual Conference on Learning Theory, pages 907–940, 2016.

[11] R. Florian. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19(6):1468–1502, 2007.

[12] W. Gerstner. Time structure of the activity in neural network models. Physical Review E, 51(1):738, 1995.

[13] D. Huh and T. J. Sejnowski. Gradient descent for spiking neural networks. In Advances in Neural Information Processing Systems 32, pages 1440–1450, 2018.

[14] E. Hunsberger and C. Eliasmith. Spiking deep networks with LIF neurons. arXiv:1510.08829, 2015.

[15] A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems 32, pages 8571–8580, 2018.

[16] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier. STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 99:56–67, 2018.

[17] N. Kuwabara and N. Suga. Delay lines and amplitude selectivity are created in subthalamic auditory nuclei: The brachium of the inferior colliculus of the mustached bat. Journal of Neurophysiology, 69(5):1713–1724, 1993.

[18] J. L. Lobo, J. Del Ser, A. Bifet, and N. Kasabov. Spiking neural networks and online learning: An overview and perspectives. Neural Networks, 121:88–100, 2020.

[19] W. Maass. Lower bounds for the computational power of networks of spiking neurons. Neural Computation, 8(1):1–40, 1996.

[20] W. Maass. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.

[21] W. Maass and C. M. Bishop. Pulsed Neural Networks. MIT Press, 2001.

[22] T. Masquelier and S. J. Thorpe. Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Computational Biology, 3(2):e31, 2007.

[23] S. Mei, T. Misiakiewicz, and A. Montanari. Learning with invariances in random features and kernel models. In Proceedings of the 34th Annual Conference on Learning Theory, pages 3351–3418, 2021.

[24] B. A. Olshausen and D. J. Field.
Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996.

[25] P. Panda and K. Roy. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In Proceedings of the 2016 International Joint Conference on Neural Networks, pages 299–306, 2016.

[26] T. Poggio, A. Banburski, and Q. Liao. Theoretical issues in deep networks. Proceedings of the National Academy of Sciences, 117(48):30039–30045, 2020.

[27] R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, and I. Fried. Invariant visual representation by single neurons in the human brain. Nature, 435(7045):1102, 2005.

[28] K. Roy, A. Jaiswal, and P. Panda. Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784):607–617, 2019.

[29] R. Roy and T. Kailath. ESPRIT - Estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7):984–995, 1989.

[30] B. Schrauwen, M. D'Haene, D. Verstraeten, and J. Van Campenhout. Compact hardware liquid state machines on FPGA for real-time speech recognition. Neural Networks, 21(2-3):511–523, 2008.

[31] X. She, S. Dash, and S. Mukhopadhyay. Sequence approximation using feedforward spiking neural network for spatiotemporal learning: Theory and optimization methods. In Proceedings of the 9th International Conference on Learning Representations, 2021.

[32] T. Shimokawa, K. Pakdaman, and S. Sato. Time-scale matching in the response of a leaky integrate-and-fire neuron model to periodic stimulus with additive noise. Physical Review E, 59(3):3427, 1999.

[33] S. B. Shrestha and G. Orchard. SLAYER: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems 32, pages 1419–1428, 2018.

[34] P. T. P. Tang. Convergence of LCA flows to (C)LASSO solutions. arXiv:1603.01644, 2016.

[35] P. T. P. Tang, T.-H. Lin, and M. Davies. Sparse coding by spiking neural networks: Convergence theory and computational results. arXiv:1705.05475, 2017.

[36] S. Thorpe, A. Delorme, and R. Van Rullen. Spike-based strategies for rapid processing. Neural Networks, 14(6-7):715–725, 2001.

[37] R. Van Rullen, R. Guyonneau, and S. J. Thorpe. Spike times make sense. Trends in Neurosciences, 28(1):1–4, 2005.

[38] E. Vasilaki, N. Frémaux, R. Urbanczik, W. Senn, and W. Gerstner. Spike-based reinforcement learning in continuous state and action space: When policy gradient methods fail. PLoS Computational Biology, 5(12):e1000586, 2009.

[39] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout. Isolated word recognition with the liquid state machine: A case study. Information Processing Letters, 95(6):521–528, 2005.

[40] Q. Yu, H. Tang, K. C. Tan, and H. Yu. A brain-inspired spiking neural network model with temporal encoding and learning. Neurocomputing, 138:3–13, 2014.

[41] G. Zhang and S.-Q. Zhang. Structural stability of spiking neural networks. arXiv:2207.04876, 2022.

[42] S.-Q. Zhang, W. Gao, and Z.-H. Zhou. Towards understanding theoretical advantages of complex-reaction networks. Neural Networks, 151:80–93, 2022.

[43] S.-Q. Zhang, F. Wang, and F.-L. Fan. Neural network Gaussian processes by increasing depth. IEEE Transactions on Neural Networks and Learning Systems, pages 1–6, 2022.

[44] S.-Q. Zhang, Z.-Y. Zhang, and Z.-H. Zhou. Bifurcation spiking neural network. Journal of Machine Learning Research, 22(253):1–21, 2021.

[45] Z.-H. Zhou.
Why over-parameterization of deep neural networks does not overfit? Science China Information Sciences, 64(1):1–3, 2021.

## Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes]
   (c) Did you discuss any potential negative societal impacts of your work? [Yes]
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [Yes]
   (b) Did you include complete proofs of all theoretical results? [Yes]
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The simulation experiments employ the code of Zhang and Zhou [44], and the data, settings, and configurations are detailed on Pages 7-8, so we believe it is easy to reproduce our experimental results. We will publish our code as soon as possible if this paper is accepted.
   (b) Did you specify all the training details (e.g., data splits, hyper-parameters, how they were chosen)? [Yes] See Table 2.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A]
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [Yes]
   (b) Did you mention the license of the assets? [Yes]
   (c) Did you include any new assets either in the supplemental material or as a URL? [Yes]
   (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A]
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]