# Mesoscopic modeling of hidden spiking neurons

Shuqi Wang*, Valentin Schmutz*, Guillaume Bellec, Wulfram Gerstner
Laboratory of Computational Neuroscience
École polytechnique fédérale de Lausanne (EPFL)
first.lastname@epfl.ch

*Equal contributions.

## Abstract

Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, the neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for the neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking optogenetic stimulation.

## 1 Introduction

The progress of large-scale electrophysiological recording techniques [1] raises the following question: can we reverse engineer the probed neural microcircuit from the recorded data? If so, should we try to design large spiking neural networks (SNN), representing the whole microcircuit, capable of generating the recorded spike trains? Such networks would constitute fine-grained mechanistic models and would make in silico experiments possible. However appealing this endeavor may appear, it faces a major obstacle: that of unobserved neurons. Indeed, despite the large number of neurons that can be simultaneously recorded, they add up to a tiny fraction of the total number of neurons involved in any given task [2], making the problem largely underdetermined. Training SNNs with large numbers of hidden neurons is challenging because a huge number of possible latent spike patterns result in the same recurrent input to the recorded neurons, making training algorithms nontrivial [3-6].

From the perspective of a single recorded neuron, the spike activity of all the other neurons can be reduced to a single causal variable: the total recurrent input (Figure 1A). Hence, we argue that fine-grained SNNs are not necessary to model the inputs from hidden neurons but can be replaced by a coarse-grained model of the sea of unobserved neurons. One possible coarse-graining scheme consists in clustering neurons into homogeneous populations with uniform intra- and inter-population connectivity. With the help of mean-field neuronal population equations [7-10], this approach enables the reduction of large SNNs to low-dimensional mesoscopic models composed of neuronal populations interacting with each other [11-13]. Clusters can reflect the presence of different cell types [11, 14, 15] or groups of highly interconnected excitatory neurons [16-21]. From a computational point of view, coarse-grained SNNs offer biologically plausible implementations of rate coding by ensembles of neurons [22, 23] and computation through neural population dynamics [24].

Figure 1: Training SNNs with large numbers of hidden neurons: challenge and approaches. (A) The challenge of modeling the input to the observed neurons (black) coming from the hidden neurons (white) while only a small fraction of neurons is observed. (B) Modeling strategies for the input coming from the hidden neurons: SNNs (left) model the fine-grained spike trains of all hidden neurons; the neuLVM (middle) uses a mesoscopic description of the population activity, clustering neurons into homogeneous populations; classic LVMs (right) model the latent activity with low-dimensional phenomenological variables (the link to SNNs is lost).
In this paper, we show that, after clustering the unobserved neurons into several homogeneous populations, the finite-size neuronal population equation of Schwalger et al. [11] can be used to derive a neuronally-grounded latent variable model (neuLVM) where the activity of the unobserved neurons is summarized in a coarse-grained mesoscopic description. The hallmark of the neuLVM is its direct correspondence to a multi-population SNN. As a result, both model parameters and latent variables have a transparent biological interpretation: the model is parametrized by single-neuron properties and synaptic connectivity; the latent variables are the summed spiking activities of the neuronal populations. Coarse-graining by clustering, therefore, turns an underdetermined problem (fitting an SNN with a large number of hidden neurons) into a tractable problem with interpretable solutions.

Switching, metastable activity patterns that are not stimulus-locked have attracted a large amount of attention in systems neuroscience [25-27] for their putative role in decision-making [28], attention [29], and sensory processing [30]. Since generative SNN models of these metastable dynamics are available [11, 21, 31, 32], metastable networks constitute ready-to-use testbeds for bottom-up mechanistic latent variable models. Therefore, we propose metastable networks as a new benchmark for mechanistic latent variable models.

## 2 Relation to prior work

While many latent variable models (LVM), including Poisson Linear Dynamical Systems (PLDS) [33] and Switching Linear Dynamical Systems (SLDS) [34-39], have been designed for inferring low-dimensional population dynamics [40-53], their account of the population activity is a phenomenological one. By contrast, the LVM derived in this work is a true multiscale model, as the latent population dynamics directly stem from neuronal dynamics.

Our method builds on René et al. [54], who showed that the mesoscopic model of Schwalger et al. [11] enables the inference of neuronal and connectivity parameters of multi-population SNNs via likelihood-based methods when the mesoscopic population activity is fully observable. Here, for the first time, we show that mesoscopic modeling can also be applied to unobserved neurons, relating LVMs to mean-field theories for populations of spiking neurons [7-12]. Our neuLVM approach towards unobserved neurons differs from the Generalized Linear Model (GLM) approach [55-57] (and recent extensions [58-60]), which either neglects unobserved neurons or replaces them by stimulus-locked inputs. Our approach also avoids microscopic simulations of hidden spiking neurons [3, 4, 6], which scale poorly with the number of hidden neurons.
## 3 Background: mesoscopic modeling of the population activity

Biophysical neuron models can be accurately approximated (neglecting the nonlinearity of dendritic integration) by simple spiking point neurons [61-63], which can be efficiently fitted to neural data [64-67]. Stochastic spiking neuron models where the neuron's memory of its own spike history is restricted to the time elapsed since its last spike (its "age") are said to be of renewal type.² Examples of renewal-type neurons include noisy leaky integrate-and-fire and Spike Response Model 0 neurons [9, 10]. The dynamics of a homogeneous population of interacting renewal-type neurons can be described, in the mean-field limit, by an exact integral equation [7, 9, 10, 12, 69] (see [70-72] for rigorous proofs). In the case of homogeneous but finite populations, Schwalger et al. [11] derived a stochastic integral equation that provides a mesoscopic description (i.e. a description including finite-size fluctuations) of the population dynamics.

² Traditional renewal theory in the mathematical literature [68] is restricted to stationary input, whereas we use "renewal-type" in the broader sense that also includes time-dependent input.

For clarity of exposition, in this section and the next, we focus on the case of a single homogeneous population with no external input. All the arguments presented below can be readily generalized to the case of multiple interacting populations with external input (Appendices A, B, C).

Let us consider a homogeneous SNN of $N$ recurrently connected renewal-type spiking neurons. For $T$ discrete time steps of length $\Delta t$, let $y \in \{0,1\}^{N \times T}$ (an $N \times T$ binary matrix) denote the spike trains generated by the $N$ neurons. The fact that the neurons are of renewal type implies, by definition, that the probability for neuron $i$ to emit a spike at time $t$ can be written

$$p(y^i_t = 1 \mid y_{1:t-1}, \theta_i) = f^{\Delta t}_{\theta_i}\Big(a^i_t,\ \sum_j J_{ij}\, y^j_{1:t-1}\Big),$$

where $a^i_t$ is the age of neuron $i$ (i.e. the number of time steps elapsed since the last spike of neuron $i$), the $J_{ij}$ are the recurrent synaptic weights of the network, and $\theta_i$ are the parameters of neuron $i$. The sum $\sum_j J_{ij}\, y^j_{1:t-1}$ represents the past input received by neuron $i$ in all time steps up to $t-1$. The superscript $\Delta t$ of the function $f^{\Delta t}_{\theta_i}$ indicates that we consider here the discrete-time escape rate of the neuron, but the transition to continuous time is possible [10]. The explicit expression for $f^{\Delta t}_{\theta_i}$ in the case of leaky integrate-and-fire neurons with escape noise (LIF) is given in Appendix A.

A crucial notion in this work is that of a "homogeneous population". The SNN described above forms a homogeneous population if all the recurrent synaptic weights are identical, that is, $J_{ij} = J/N$ (mean-field approximation), and if all the neurons share the same parameters, that is, $\theta_i = \theta$. In a homogeneous population, all the neurons share the same past input $J n_{1:t-1}/N$, where $n_{1:t-1} = (n_1, n_2, \ldots, n_{t-1})$ denotes the total numbers of spikes in the population in time steps $1, 2, \ldots, t-1$, with $n_{t'} = \sum_{i=1}^{N} y^i_{t'}$ being the total number of spikes in the population at time $t'$. Then, for any neuron in the population, the probability to emit a spike at time $t$, given its age $a$, is

$$p^{\mathrm{fire}}_{t,a} := f^{\Delta t}_{\theta}\big(a,\ J\, n_{1:t-1}/N\big). \tag{1}$$

Importantly, Eq. (1) is independent of the identity of the neuron.

In a microscopic description of the spiking activity, the vector $y_t$ depends nonlinearly on the past $y_{1:t-1}$. A mesoscopic description aims to reduce the high-dimensional microscopic dynamics to a lower-dimensional dynamical system involving the population activity $n_t$ only (in the case of multiple interacting populations, $n_t$ is a vector of dimension $K$ equal to the number of populations, Appendix A).
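To make the microscopic picture concrete, the following is a minimal simulation sketch of a single homogeneous population obeying Eq. (1). It is illustrative only: the constants, the sigmoidal escape rate, and the leaky filtering of the common recurrent input $J n_t / N$ are placeholder assumptions, not the paper's exact LIF escape-noise model (which is specified in its Appendix A).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder constants (not the paper's values).
N, T, dt = 600, 1000, 1e-3     # neurons, time steps, step length [s]
J = 60.0                       # homogeneous coupling: J_ij = J / N
tau_m, refrac = 20e-3, 4e-3    # input filter time constant, refractory period

def p_fire(ages, drive):
    """Discrete-time firing probability f^{dt}_theta(a, input) of Eq. (1):
    zero during an absolute refractory period, then sigmoidal in the drive."""
    rate = 50.0 / (1.0 + np.exp(-(drive - 1.0)))   # escape rate [Hz], placeholder
    p = 1.0 - np.exp(-rate * dt)                   # spike probability per step
    return np.where(ages * dt < refrac, 0.0, p)

ages = rng.integers(1, 100, size=N)   # steps since each neuron's last spike
drive = 0.0                           # filtered common input J * n / N
n = np.zeros(T, dtype=int)            # population activity n_t

for t in range(T):
    spikes = rng.random(N) < p_fire(ages, drive)   # all neurons see the same drive
    n[t] = spikes.sum()                            # mesoscopic summary n_t
    ages = np.where(spikes, 0, ages + 1)           # reset the ages of spiking neurons
    drive += dt / tau_m * (-drive) + J * n[t] / N  # leaky trace of the common input
```

The key observation for the rest of the paper is visible in the loop: the only quantity coupling the neurons is the scalar $n_t$, which is exactly what the mesoscopic description keeps.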
While an exact reduction is not possible in general (neuron models being nonlinear), a close approximation in the form of a stochastic integral equation was proposed by Schwalger et al. [11]. In discrete time, the stochastic integral equation reads

$$n_t \sim \mathrm{Binomial}\big(N,\ \bar n_t / N\big), \qquad \bar n_t = \sum_{a \geq 1} p^{\mathrm{fire}}_{t,a}\, S_{t,a}\, n_{t-a} + \underbrace{\Lambda_t \Big(N - \sum_{a \geq 1} S_{t,a}\, n_{t-a}\Big)}_{\text{finite-size correction}}. \tag{2}$$

The variable $\bar n_t$ can be interpreted as the expected number of neurons firing at time $t$. The survival $S_{t,a} = \prod_{s=0}^{a-1} \big(1 - p^{\mathrm{fire}}_{t-a+s,\,s}\big)$ is the probability for a neuron to stay silent between time $t-a$ and $t-1$. The finite-size correction term stabilizes the model by enforcing the approximate neuronal mass conservation $\sum_{a \geq 1} S_{t,a}\, n_{t-a} \approx N$ (see [13] for an in-depth mathematical discussion). The modulating factor $\Lambda_t$ has an explicit expression [11, 13] in terms of $p^{\mathrm{fire}}$, $S$ and $\bar n$ (indices are dropped here for simplicity; complete formulas are presented in Appendix A, as well as explanations on how to initialize Eq. (2)). Importantly, for a population of interacting neurons, $p^{\mathrm{fire}}$, $S$ and $\Lambda$ depend on $n$, which makes the stochastic equation (2) highly nonlinear. While the mesoscopic model (2) is not mathematically exact, it provides an excellent approximation of the first- and second-order statistics of the population activity [11], and is much more tractable than the exact field equation [73, 74]. Also, the mesoscopic model (2) can be generalized to the case of non-renewal neurons with spike-frequency adaptation [11] and short-term synaptic plasticity [75].

Formally, Eq. (2) is reminiscent of the point-process Generalized Linear Model (GLM) [55-57] for single neurons, with the notable difference that Eq. (2) contains additional nonlinearities beyond those of the GLM because $p^{\mathrm{fire}}$, $S$ and $\Lambda$ all depend on $n$ (Appendix A). Importantly, Equation (2) readily defines an expression for the probability $p(n \mid \Theta)$ [54], where $\Theta = \{J, \theta\}$ denotes the model parameters. Thus, the mesoscopic model (2) allows us to avoid the intractable sum encountered if we naively try to derive $p(n \mid \Theta)$ directly from the microscopic description (the intractable sum stems from the fact that the identity of neurons is lost in the observation $n_{t'}$ at each time step $t'$, Figure 1B).

## 4 Theoretical result: neuronally-grounded latent variable model

In this section, we first recall why training SNNs with large numbers of hidden neurons via the maximum likelihood estimator is computationally expensive. Then, we show that the mesoscopic description, Eq. (2), allows us to derive a tractable, neuronally-grounded latent variable model, which can be mapped to a multi-population SNN. For the sake of simplicity, all the arguments are presented for a single homogeneous population, but the generalization to multiple interacting populations is straightforward (Appendices B and C).

Let us assume that we observe, during $T$ time steps, the spike trains of $q$ simultaneously recorded neurons that are part of a homogeneous population of $N$ neurons, with $N > q$. We split the spike trains of the entire population $y \in \{0,1\}^{N \times T}$ into the observed spike trains $y^o$ ($q$ neurons) and the hidden spike trains $y^h$ ($N - q$ neurons). Even for a single population, it is difficult to infer the parameters $\Theta = \{J, \theta\}$ of the SNN from the observation $y^o$ using the maximum likelihood estimator because, in the presence of hidden neurons, the likelihood $\mathcal{L}$ involves a marginalization over the latent spike trains $y^h$:

$$\mathcal{L} = p(y^o \mid \Theta) = \sum_{y^h} p(y^o, y^h \mid \Theta). \tag{3}$$
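To preview why Eq. (2) removes this intractability: the mesoscopic transition $p(n_t \mid n_{1:t-1}, \Theta)$ can be sampled, and its log-density evaluated, in closed form. Below is a minimal, illustrative sketch of one such step; the function `p_fire_fn` and, in particular, the crude approximation of $\Lambda_t$ by the mean firing probability are placeholder assumptions (the exact expressions are given in the paper's Appendix A, together with the initialization of the history).

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)

def meso_step(n_hist, p_fire_fn, N):
    """Sample n_t given the history n_{1:t-1}, following the structure of Eq. (2).
    p_fire_fn(t, a) must return the age-dependent firing probability p^fire_{t,a}."""
    t = len(n_hist)
    a_vals = np.arange(1, t + 1)
    pf = np.array([p_fire_fn(t, a) for a in a_vals])
    # survival S_{t,a}: probability of staying silent between t-a and t-1
    S = np.array([np.prod([1.0 - p_fire_fn(t - a + s, s) for s in range(a)])
                  for a in a_vals])
    n_past = np.array(n_hist[::-1])                 # n_{t-a} for a = 1..t
    deficit = N - np.sum(S * n_past)                # mass-conservation deficit
    Lam = pf.mean() if pf.size else 0.0             # crude placeholder for Lambda_t
    n_bar = np.sum(pf * S * n_past) + Lam * deficit
    p = np.clip(n_bar / N, 0.0, 1.0)
    n_t = rng.binomial(N, p)                        # n_t ~ Binomial(N, n_bar / N)
    logp = binom.logpmf(n_t, N, p)                  # transition log-density
    return n_t, logp
```

The returned log-density is the term $p(n_t \mid n_{1:t-1}, \Theta)$ that enters the tractable likelihood below; its cost grows with the history length and the number of populations, not with the number of hidden neurons.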
While different variants of the Expectation-Maximization (EM) algorithm [76] relying on sampling $y^h$ have been used to maximize the likelihood [3, 4, 6], these algorithms scale poorly with the number of hidden neurons. Instead, we exploit the fact that, for a homogeneous population, fine-grained knowledge of the latent activity $y^h$ is not necessary, since all the observed neurons receive at time $t$ the same input $J n_t / N$, where $n_t = \sum_{i=1}^{N} y^i_t$ is the population activity of Section 3. Hence, we rewrite the likelihood (3) as

$$\mathcal{L} = p(y^o \mid \Theta) = \sum_{n} p(y^o, n \mid \Theta), \tag{4a}$$

where the probability $p(y^o, n \mid \Theta)$ factorizes into $T$ terms of the form

$$p(y^o_t, n_t \mid y^o_{1:t-1}, n_{1:t-1}, \Theta) = \underbrace{p(y^o_t \mid y^o_{1:t-1}, n_{1:t-1}, \Theta)}_{\text{given by neuron model, Eq. (1)}}\ \underbrace{p(n_t \mid n_{1:t-1}, \Theta)}_{\text{approx. by meso. model, Eq. (2)}}. \tag{4b}$$

A comparison of Eqs. (3) and (4) shows that the high-dimensional latent activity $y^h$ has been reduced to a low-dimensional mesoscopic description. Importantly, the $q$ observed spike trains are conditionally independent given the population activity $n$. While the conditional dependence structure implied by Eq. (4b) is typical of standard latent variable models of multi-neuronal recordings [33, 40, 77, 78], in our approach, the latent variable explicitly represents the population activity of the generative SNN and the parameters of the model are identical to those of the SNN. As the latent population dynamics directly stem from neuronal dynamics, we call our LVM the neuronally-grounded latent variable model (neuLVM).

The nonlinearity and the non-Markovianity of Eq. (2) prevent us from using previous EM algorithms [33, 40, 77, 78]. Therefore, we fit the neuLVM via the Baum-Viterbi algorithm [79] (also known as Viterbi training or hard EM [80]), which alternates an estimation (E) step and a maximization (M) step:

$$\text{E-step:}\quad \hat n^{(m)} = \operatorname*{arg\,max}_{n}\ \log p\big(y^o, n \mid \hat\Theta^{(m-1)}\big),$$
$$\text{M-step:}\quad \hat\Theta^{(m)} = \operatorname*{arg\,max}_{\Theta}\ \log p\big(y^o, \hat n^{(m)} \mid \Theta\big).$$

The estimated parameters $\hat\Theta$ and the estimated latent population activity $\hat n$ are the result of many iterations of E-steps and M-steps (Appendix C). Note that the computational cost of this algorithm does not depend on the number of hidden neurons (it only depends on the number of populations).³

³ Our implementation of the algorithm is openly available at https://github.com/EPFL-LCN/neuLVM.

## 5 Experimental results

### 5.1 Single homogeneous population: SNN with metastable cluster states

Although seemingly simple, homogeneous populations of leaky integrate-and-fire (LIF) neurons without external stimulation are SNNs with a rich repertoire of population dynamics, including asynchronous states, synchronous states, and cluster states [7, 9]. In an $m$-cluster state (with $m \geq 2$), the population activity oscillates at a frequency $m$ times higher than the average neuronal firing rate: a neuron spikes once every $m$ cycles on average; conversely, approximately $N/m$ neurons fire in each cycle ($N$ being the total number of neurons). Cluster states have therefore been described as higher harmonics of the synchronous state (or 1-cluster state) [9, 81-83], where all neurons fire in synchrony.

In this set of experiments, we always consider the same network of 600 LIF neurons (Figure 2A), where only the connectivity parameter $J$ varies. When initialized at time 0 in the same unstable asynchronous state, the network can spontaneously settle in an $m$-cluster state, where $m$ depends on the recurrent connectivity parameter $J$ (Figure 2B): finite-size fluctuations break the symmetry of the asynchronous state and the population splits into $m$ groups of synchronized neurons.
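The next paragraph identifies the realized cluster state from the power spectrum of the spike trains. As a rough illustration of that readout, here is a minimal sketch operating on the binned population activity; the use of Welch's method is our assumption, not necessarily the paper's analysis pipeline.

```python
import numpy as np
from scipy.signal import welch

def dominant_frequency(n, dt):
    """Location of the strongest spectral peak of the population activity n_t.
    In an m-cluster state this peak sits at roughly m times the mean
    single-neuron firing rate (the mean is removed to suppress the DC bin)."""
    f, pxx = welch(n - n.mean(), fs=1.0 / dt, nperseg=min(len(n), 4096))
    return f[np.argmax(pxx)]

# Hypothetical usage: estimate m by comparing the peak to the mean firing rate.
# m_hat = dominant_frequency(n, dt) / mean_single_neuron_rate
```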
The cluster state to which the network converges can be read from the power spectrum of the neuronal spike trains (Figure 2B): the fundamental frequency of the $m$-cluster state is approximately $m$ times lower than that of the 1-cluster state. Generating spike trains for 6 observed neurons (1% of the population), we tested whether the neuLVM could recover the connectivity parameter $J$ (neuronal parameters were given) for different $J$'s in the 1-, 2-, and 3-cluster state ranges (Figure 2C, Table S3). The Pearson correlation between the learned $\hat J$ and the true $J$ was 0.81 (p-value $2.8 \times 10^{-17}$), showing that, statistically, the neuLVM could recover the connectivity parameter of the SNN.

To assess how well the neuLVM can infer the latent population activity, and how the neuLVM compares with methods assuming full observability (like René et al. [54]), we studied in detail a single trial showing a transition from a metastable 4-cluster state to a 3-cluster state (Figure 2D,E). To generate this trial, we chose $J = 60.32$ mV and initialized the network in the 4-cluster state. From the spike trains of only two neurons (red stars in Figure 2D), the neuLVM could infer the ground truth population activity $n$ during the 4-cluster state and during the 3-cluster state, and could approximately detect the transition between the two states (Figure 2E). While the summed, smoothed spike trains missed two out of four population activity peaks in the 4-cluster state, and one out of three peaks in the 3-cluster state (purple curve in Figure 2E), the strong inductive biases contained in the neuLVM enabled the inference of the missing peaks (blue curve in Figure 2E).

Finally, the neuLVM and a method assuming full observability (equivalent to a naive application of René et al. [54]) were compared through their ability to recover the connectivity parameter $J$ for varying numbers of observed neurons (Figure 2F, Table S4). Since a naive application of the method of René et al. [54] does not take hidden neurons into account, it led, as expected, to wildly inaccurate estimates $\hat J$ when the summed spike train was far from the ground truth population activity (which happened when the number of observed neurons was small; see Figure 2E for an example). In contrast, the neuLVM managed to recover the connectivity parameter, thanks to the fact that the Baum-Viterbi algorithm of Section 4 also infers the population activity (see Figure 2E for an example).

Figure 2: Single-population SNN with metastable cluster states. (A) Network architecture (for visualization purposes, only a few connections are drawn). (B) Spike-train power spectrum for different choices of the connectivity parameter $J$. All simulations start from the same unstable asynchronous state. The corresponding cluster states are indicated below. The blue region around $J = 58$ mV indicates the absence of activity. (C) Connectivity $\hat J$ recovered by the neuLVM vs. ground truth $J$. The neuLVM was fitted on one-second single-trial recordings of six neurons (1% of the population). For each ground truth $J$ value (seven in total), ten different trials were generated; bars indicate the standard deviations of the recovered $\hat J$. The Pearson correlation coefficient between the recovered $\hat J$ and $J$ is $r = 0.81$ and the associated p-value is $2.8 \times 10^{-17}$ (see Table S3). (D) Spike trains generated by the ground truth SNN for a trial showing a transition from a metastable 4-cluster state to a 3-cluster state. The spike trains of two randomly sampled neurons (red stars) formed the training data (for visualization purposes, only the first 0.6 s of the one-second trial is shown) on which the neuLVM was fitted. (E) The inferred population activity $\hat n \mid y^o$ is compared to the ground truth $n$ and to the summed, smoothed spike trains (Gaussian smoothing window with $\sigma = 1.4$ ms, Appendix D) of the two observed neurons. (F) Absolute difference between the recovered $\hat J$ and the ground truth $J$ for the neuLVM algorithm and the method of René et al. (2020), for varying numbers of observed neurons. Using the same trial as in D, for each number of observed neurons, the two methods were tested on 10 different samples of observed neurons (see Table S4). The marker indicates that the difference $|\hat J - J|$ is larger than 30 mV. The median samples are linked with dashed lines to show the trends.
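For reference, the summed-and-smoothed baseline of panel E (purple curve) is easy to reproduce; a minimal sketch follows. The $\sigma = 1.4$ ms Gaussian window is from the caption above, while the rescaling by $N/q$ from $q$ observed neurons to the full population is our assumption about the normalization (the paper's exact procedure is in its Appendix D).

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def summed_smoothed_activity(y_obs, dt, N, sigma=1.4e-3):
    """Naive population-activity estimate: sum the q observed binary spike
    trains (rows of y_obs), smooth with a Gaussian window, and rescale from
    q neurons to the full population of N neurons."""
    q = y_obs.shape[0]
    summed = y_obs.sum(axis=0).astype(float)          # shape (T,)
    smoothed = gaussian_filter1d(summed, sigma / dt)  # sigma given in steps
    return smoothed * (N / q)
```

With few observed neurons, this estimate misses population-activity peaks in which none of the observed neurons happened to participate; this is exactly the failure mode that the neuLVM's inductive biases repair in Figure 2E.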
### 5.2 Multiple populations: SNN with metastable point attractors

#### 5.2.1 Latent population activity inference and reproduction of metastable dynamics

As a second benchmark, we tested the neuLVM on synthetic data generated by three interacting populations: two populations of 400 excitatory LIF neurons each and one population of 200 inhibitory neurons (Figure 3A). Recurrent connections between these populations drive winner-take-all dynamics, with switches of activity between the two excitatory populations induced by finite-size fluctuations [11, 84], an example of itinerancy between attractor states [25]. The population activities of this metastable, three-population SNN constitute the ground truth against which the different models will be tested.

Figure 3: Network architecture and example trial. (A) Architecture of the three-population, metastable, winner-take-all SNN. (B) Example trial from the spiking benchmark dataset: 10-second recordings of 9 observed neurons (3 neurons from each of the three populations) and (C) the corresponding ground truth latent population activity (for the two excitatory populations).

To build a spiking benchmark dataset, we randomly selected 9 neurons (3 neurons from each of the three populations) and considered the spike trains of these neurons as the observed data. For simplicity, the correct partitioning of the 9 neurons into 3 groups is given, since it can be reliably obtained by k-means clustering [85] using the van Rossum distance [86] between spike trains (a sketch of this partitioning step is given below). The complete dataset consists of 20 trials of 10 seconds. An example trial is shown in Figure 3B. In contrast with the experiments of Section 5.1, where the neuronal parameters were given, here neither neuronal nor connectivity parameters are given to the neuLVM (see Appendix F).

We compared the performance of the neuLVM with that of other generative models of spiking data (PLDS [33], SLDS [39], and GLM [55-57]) on single trials of the spiking benchmark dataset in two ways: (i) we measured the Pearson correlation $r$ between the inferred latent population activity $\hat n \mid y^o$ and the ground truth population activity $n$ (Table 1, first column); (ii) we assessed how well the fitted models could reproduce metastable dynamics by counting the occurrences of stochastic switches in free simulations (in other words, samples) of the fitted models (Table 1, second column). Tests (i) and (ii) on an example trial are shown in Figure 4.
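As mentioned above, the observed neurons can be partitioned into populations by k-means under van Rossum geometry. Below is a minimal sketch, assuming that each spike train is embedded as its exponentially filtered trace (the Euclidean distance between such traces is, up to a constant, the van Rossum distance [86]); the kernel time constant $\tau = 20$ ms and the normalization are our assumptions.

```python
import numpy as np
from scipy.signal import lfilter
from sklearn.cluster import KMeans

def van_rossum_embedding(y, dt, tau=20e-3):
    """Causally convolve each binary spike train (rows of y) with an
    exponential kernel exp(-t/tau); Euclidean distance between the
    resulting traces is proportional to the van Rossum distance."""
    decay = np.exp(-dt / tau)
    traces = lfilter([1.0], [1.0, -decay], y, axis=1)
    return traces * np.sqrt(dt / tau)   # common van Rossum normalization

# Stand-in data: 9 spike trains of 10 s at 1 ms resolution.
rng = np.random.default_rng(2)
y = (rng.random((9, 10_000)) < 0.005).astype(float)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    van_rossum_embedding(y, dt=1e-3))   # population index of each neuron
```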
The Poisson Linear Dynamical Systems approach (PLDS, [33]) assumes that the recorded spikes can be explained by point processes driven by a low-dimensional latent linear dynamical system. The Poisson Switching Linear Dynamical System (SLDS, [34-39]) extends PLDS by allowing the latent variables to switch randomly between two dynamical systems with distinct parameters. We should stress that, in PLDS and SLDS, the latent variables are phenomenological representations of the neural population activity, which have no direct link with the ground truth population activity $n$. In order to still make test (i) possible for PLDS and SLDS, we will consider the best linear transformation of the inferred latent variables, i.e. the one which minimizes the mean squared error with respect to the ground truth population activity $n$.

On test (i), the neuLVM gave better estimates $\hat n \mid y^o$ of the latent population activity $n$ (Pearson correlation $r = 0.81$) than the best linear transformation of the latent activity inferred by PLDS and SLDS ($r = 0.69$ and $r = 0.73$, respectively) (Table 1, first column). The GLM approach cannot be included in test (i) since it ignores unobserved neurons. Interestingly, the example trial in Figure 4A shows that the latent population activity $\hat n \mid y^o$ inferred by the neuLVM is smoother than the ground truth $n$ before and after the switch (finite-size fluctuations are reduced), but $\hat n \mid y^o$ and $n$ closely match around the time of the switch. In contrast, fluctuations are exaggerated for PLDS and SLDS. The population activity estimated by simply summing and smoothing the observed spike trains (Appendix D) is shown in Figure S6.

Figure 4: Three-population SNN with metastable point attractors. (A) Latent population activity of the two excitatory populations inferred by the neuLVM / PLDS / SLDS for one example trial (the same as in Figure 3). The value $r$ is the Pearson correlation coefficient between the inferred $\hat n \mid y^o$ and the ground truth $n$ population activities. (B-C) Examples of free simulations of the fitted neuLVM / PLDS / SLDS.

Table 1: Model performance summary (corresponding to Figure 4). Mean and (±) standard deviation were computed over 20 different trials. Parentheses for PLDS and SLDS indicate that these results are for the best linear transformation of the inferred latent variables.

| Model | Pearson correlation $r$ between $\hat n \mid y^o$ and $n$ | Switches during 100 s free simulations of the fitted models (10.3 ± 2.7 for the ground truth SNN) |
| --- | --- | --- |
| neuLVM | 0.81 ± 0.02 | 7.8 ± 4.1 |
| PLDS | (0.68 ± 0.11) | not visible |
| SLDS | (0.73 ± 0.02) | 11.9 ± 8.9 |
| GLM | - | not visible |

On test (ii), the neuLVM, fitted on a single trial of 10 seconds, was able to reproduce stochastic switches similar to those of the ground truth SNN (Table 1, second column): free simulations of the fitted neuLVM showed 7.8 switches in 100 seconds on average (10.3 switches on average for the ground truth SNN). To make sure that the stochastic switches were the result of parameter learning via the Baum-Viterbi algorithm, we verified that, before learning, the neuLVM did not show any metastable dynamics (Figure S7). Examples of simulated trials are shown in Figure 4B. PLDS failed to reproduce stochastic switches, which is not surprising since winner-take-all dynamics are typically nonlinear.
SLDS could reproduce stochastic switches at the correct mean frequency (11.9 instead of the ground truth 10.3), but the standard deviation of the simulated switch count, 8.9 (versus 2.7 for the ground truth SNN), indicates that a single 10-second trial was probably not sufficient for SLDS to learn the switching probabilities reliably. Finally, neuronal stochasticity and the small network size (9 neurons) did not allow the GLM to produce stochastic switches, even when the training trial was prolonged to 500 seconds.

Taken together, only the neuLVM could infer the latent population activity and reliably learn the metastable dynamics from single trials of 10 seconds, demonstrating the effectiveness of its neuronally-grounded inductive biases. Of course, these results do not guarantee that the inductive biases of the neuLVM would be effective on real data, since real data is most certainly out-of-distribution for the neuLVM. While applications to real data are beyond the scope of this paper, we show in Appendix F that the neuLVM is robust, to a certain extent, to within-population heterogeneity and out-of-distribution data.

Figure 5: Network responses to perturbations mimicking optogenetic stimulation. (A) Activities of the excitatory populations when the active population is stimulated (100 trials; ratios indicate the numbers of No-switch and Switch trials). (B) Same as A, but the silent population is stimulated.

#### 5.2.2 Generalization: towards experimental predictions with neuronally-grounded modeling

Bottom-up, mechanistic models allow us to perform in silico experiments and generate predictions about neural microcircuits, which can then be tested experimentally. So we wondered: can the neuLVM, fitted on a single trial of spontaneous activity (as in Section 5.2.1), predict the response of the SNN when an external perturbation is applied? As a preliminary step in that direction, we tested whether an external stimulation of the fitted model would generate the same response as that of the microscopic SNN when subjected to the same perturbation.

Using the same multi-population network as in Section 5.2 (Figure 3A) and the neuLVM fitted on a single trial of spontaneous activity (Figure 3B), we compared the response of the ground truth SNN with that of the neuLVM when one of the populations was stimulated by a 4 ms current pulse, mimicking the stimulation of an optogenetically modified population by a short light pulse. We simulated 100 trials in which the momentarily active excitatory population was stimulated, and 100 in which the momentarily silent excitatory population was stimulated (Figure 5A and B, respectively). Each stimulation led to one of two possible outcomes: the stimulation could trigger a state switch (Switch trials) or no state switch (No-switch trials). In both the ground truth SNN and the fitted neuLVM, we found that stimulating the silent population triggered more frequent state switches (Figure 5B) than stimulating the active population (Figure 5A). Moreover, in both the ground truth SNN and the fitted neuLVM, we could induce excitatory rebound switches by stimulating the active population (Figure 5A, lower half).

## 6 Discussion

Understanding the neural dynamics underlying computation in the brain is one of the main goals of latent variable modeling of multi-neuronal recordings [45, 47, 51, 53, 87, 88]. We contribute to this effort by proposing a bottom-up, mechanistic LVM, the neuronally-grounded latent variable model (neuLVM), which can be explicitly mapped to a multi-population SNN.
Using SNN-based generative models, which are biologically more plausible than RNN-based models [45], could allow systems neuroscientists to test hypotheses about the architecture of the probed microcircuit and provide a neuronally-grounded understanding of computation.

While this work shows the potential of the neuLVM approach, the application of the neuLVM to real data faces two methodological challenges. First, there is the problem of identifiability: although the neuLVM could recover a single unknown connectivity parameter (Section 5.1), our method could not always recover the SNN parameters when many parameters were unknown (Section 5.2). Bayesian inference could circumvent the problem of non-identifiability by estimating the full posterior distribution over model parameters [54, 89]. In addition, perturbing the probed network, with optogenetic stimulation for example, could help model parameter recovery by providing richer data. Second, in the case of real data, choosing a good generative SNN model is a nontrivial task. For example, how many homogeneous populations should the SNN have? Clustering the recorded spike trains could guide the design of possible generative models, and Bayesian model comparison, as used in biophysical modeling of neuroimaging data [90-92], could help in selecting the most likely model among several possible models.

The model proposed here is only one particular example of an SNN-based, tractable latent variable model. Whether other such neuronally-grounded models of partially observed spike trains can be formulated and efficiently applied to real data is a question left for future work.

## Acknowledgments and Disclosure of Funding

We thank Johanni Brea for several discussions and for his comments on an early version of this work. We also thank Tilo Schwalger for the discussions and for sharing his code. Code from Joachim Koerfer was also used. This research was supported by the Swiss National Science Foundation (no. 200020_184615).

## References

[1] Nicholas A Steinmetz, Christof Koch, Kenneth D Harris, and Matteo Carandini. Challenges and opportunities for large-scale electrophysiology with Neuropixels probes. Current Opinion in Neurobiology, 50:92–100, 2018.
[2] Peiran Gao and Surya Ganguli. On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology, 32:148–155, 2015.
[3] Jonathan W Pillow and Peter Latham. Neural characterization in partially observed populations of spiking neurons. Advances in Neural Information Processing Systems, 20:1161–1168, 2008.
[4] Johanni Brea, Walter Senn, and Jean-Pascal Pfister. Sequence learning with hidden units in spiking neural networks. Advances in Neural Information Processing Systems, 24:1422–1430, 2011.
[5] Danilo Rezende, Daan Wierstra, and Wulfram Gerstner. Variational learning for recurrent spiking networks. Advances in Neural Information Processing Systems, 24:136–144, 2011.
[6] Guillaume Bellec, Shuqi Wang, Alireza Modirshanechi, Johanni Brea, and Wulfram Gerstner. Fitting summary statistics of neural data with a differentiable spiking network simulator. Advances in Neural Information Processing Systems, 34:18552–18563, 2021.
[7] Wulfram Gerstner. Time structure of the activity in neural network models. Physical Review E, 51(1):738, 1995.
[8] Nicolas Brunel and Vincent Hakim. Fast global oscillations in networks of integrate-and-fire neurons with low firing rates. Neural Computation, 11(7):1621–1671, 1999.
[9] Wulfram Gerstner.
Population dynamics of spiking neurons: fast transients, asynchronous states, and locking. Neural Computation, 12(1):43–89, 2000.
[10] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.
[11] Tilo Schwalger, Moritz Deger, and Wulfram Gerstner. Towards a theory of cortical columns: From spiking neurons to interacting neural populations of finite size. PLoS Computational Biology, 13(4):e1005507, 2017.
[12] Tilo Schwalger and Anton V Chizhov. Mind the last spike: firing rate models for mesoscopic populations of spiking neurons. Current Opinion in Neurobiology, 58:155–166, 2019.
[13] Valentin Schmutz, Eva Löcherbach, and Tilo Schwalger. On a finite-size neuronal population equation. arXiv preprint arXiv:2106.14721, 2021.
[14] Sandrine Lefort, Christian Tomm, J-C Floyd Sarria, and Carl CH Petersen. The excitatory neuronal network of the C2 barrel column in mouse primary somatosensory cortex. Neuron, 61(2):301–316, 2009.
[15] Tobias C Potjans and Markus Diesmann. The cell-type specific cortical microcircuit: relating structure and activity in a full-scale spiking network model. Cerebral Cortex, 24(3):785–806, 2014.
[16] Yumiko Yoshimura, Jami LM Dantzker, and Edward M Callaway. Excitatory cortical neurons form fine-scale functional networks. Nature, 433(7028):868–873, 2005.
[17] Sen Song, Per Jesper Sjöström, Markus Reigl, Sacha Nelson, and Dmitri B Chklovskii. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3):e68, 2005.
[18] Claudia Clopath, Lars Büsing, Eleni Vasilaki, and Wulfram Gerstner. Connectivity reflects coding: a model of voltage-based STDP with homeostasis. Nature Neuroscience, 13(3):344–352, 2010.
[19] Rodrigo Perin, Thomas K Berger, and Henry Markram. A synaptic organizing principle for cortical neuronal groups. Proceedings of the National Academy of Sciences, 108(13):5419–5424, 2011.
[20] Ho Ko, Sonja B Hofer, Bruno Pichler, Katherine A Buchanan, P Jesper Sjöström, and Thomas D Mrsic-Flogel. Functional specificity of local synaptic connections in neocortical networks. Nature, 473(7345):87–91, 2011.
[21] Ashok Litwin-Kumar and Brent Doiron. Slow dynamics and high variability in balanced cortical networks with clustered connections. Nature Neuroscience, 15(11):1498–1505, 2012.
[22] Michael N Shadlen and William T Newsome. Noise, neural codes and cortical organization. Current Opinion in Neurobiology, 4(4):569–579, 1994.
[23] Michael N Shadlen and William T Newsome. The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. Journal of Neuroscience, 18(10):3870–3896, 1998.
[24] Saurabh Vyas, Matthew D Golub, David Sussillo, and Krishna V Shenoy. Computation through neural population dynamics. Annual Review of Neuroscience, 43:249–275, 2020.
[25] Paul Miller. Itinerancy between attractor states in neural systems. Current Opinion in Neurobiology, 40:14–22, 2016.
[26] Giancarlo La Camera, Alfredo Fontanini, and Luca Mazzucato. Cortical computations via metastable activity. Current Opinion in Neurobiology, 58:37–45, 2019.
[27] Braden AW Brinkman, Han Yan, Arianna Maffei, Il Memming Park, Alfredo Fontanini, Jin Wang, and Giancarlo La Camera. Metastable dynamics of neural circuits and networks. Applied Physics Reviews, 9(1):011313, 2022.
[28] Kenneth W Latimer, Jacob L Yates, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow.
Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science, 349(6244):184–187, 2015.
[29] Tatiana A Engel, Nicholas A Steinmetz, Marc A Gieselmann, Alexander Thiele, Tirin Moore, and Kwabena Boahen. Selective modulation of cortical state during spatial attention. Science, 354(6316):1140–1144, 2016.
[30] Luca Mazzucato, Giancarlo La Camera, and Alfredo Fontanini. Expectation-induced modulation of metastable activity underlies faster coding of sensory stimuli. Nature Neuroscience, 22(5):787–796, 2019.
[31] Rubén Moreno-Bote, John Rinzel, and Nava Rubin. Noise-induced alternations in an attractor network model of perceptual bistability. Journal of Neurophysiology, 98(3):1125–1139, 2007.
[32] Luca Mazzucato, Alfredo Fontanini, and Giancarlo La Camera. Dynamics of multistable states during ongoing and evoked cortical activity. Journal of Neuroscience, 35(21):8214–8231, 2015.
[33] Jakob H Macke, Lars Buesing, John P Cunningham, Byron M Yu, Krishna V Shenoy, and Maneesh Sahani. Empirical models of spiking in neural populations. Advances in Neural Information Processing Systems, 24:1350–1358, 2011.
[34] Guy Ackerson and K Fu. On state estimation in switching environments. IEEE Transactions on Automatic Control, 15(1):10–17, 1970.
[35] Zoubin Ghahramani and Geoffrey E Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
[36] David Barber. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research, 7(11), 2006.
[37] Emily Fox, Erik Sudderth, Michael Jordan, and Alan Willsky. Nonparametric Bayesian learning of switching linear dynamical systems. Advances in Neural Information Processing Systems, 21:457–464, 2008.
[38] Biljana Petreska, Byron M Yu, John P Cunningham, Gopal Santhanam, Stephen Ryu, Krishna V Shenoy, and Maneesh Sahani. Dynamical segmentation of single trials from population neural data. Advances in Neural Information Processing Systems, 24:756–764, 2011.
[39] Scott Linderman, Matthew Johnson, Andrew Miller, Ryan Adams, David Blei, and Liam Paninski. Bayesian learning and inference in recurrent switching linear dynamical systems. In Artificial Intelligence and Statistics, pages 914–922. PMLR, 2017.
[40] Jayant E Kulkarni and Liam Paninski. Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(4):375–407, 2007.
[41] Vernon Lawhern, Wei Wu, Nicholas Hatsopoulos, and Liam Paninski. Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods, 189(2):267–280, 2010.
[42] Michael Vidne, Yashar Ahmadian, Jonathon Shlens, Jonathan W Pillow, Jayant Kulkarni, Alan M Litke, EJ Chichilnisky, Eero Simoncelli, and Liam Paninski. Modeling the impact of common noise inputs on the network activity of retinal ganglion cells. Journal of Computational Neuroscience, 33(1):97–121, 2012.
[43] Yuanjun Gao, Evan W Archer, Liam Paninski, and John P Cunningham. Linear dynamical neural population models through nonlinear embeddings. Advances in Neural Information Processing Systems, 29:163–171, 2016.
[44] Anqi Wu, Nicholas A Roy, Stephen Keeley, and Jonathan W Pillow. Gaussian process based nonlinear latent structure discovery in multivariate spike train data. Advances in Neural Information Processing Systems, 30:3496–3505, 2017.
[45] Chethan Pandarinath, Daniel J O'Shea, Jasmine Collins, Rafal Jozefowicz, Sergey D Stavisky, Jonathan C Kao, Eric M Trautmann, Matthew T Kaufman, Stephen I Ryu, Leigh R Hochberg, et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods, 15(10):805–815, 2018.
[46] Lea Duncker and Maneesh Sahani. Temporal alignment and latent Gaussian process factor inference in population spike trains. Advances in Neural Information Processing Systems, 31:10445–10455, 2018.
[47] Lea Duncker, Gergo Bohner, Julien Boussard, and Maneesh Sahani. Learning interpretable continuous-time models of latent stochastic dynamical systems. In International Conference on Machine Learning, pages 1726–1734. PMLR, 2019.
[48] Josue Nassar, Scott Linderman, Monica Bugallo, and Il Memming Park. Tree-structured recurrent switching linear dynamical systems for multi-scale modeling. In International Conference on Learning Representations, 2019.
[49] Joshua Glaser, Matthew Whiteway, John P Cunningham, Liam Paninski, and Scott Linderman. Recurrent switching dynamical systems models for multiple interacting neural populations. Advances in Neural Information Processing Systems, 33:14867–14878, 2020.
[50] David Zoltowski, Jonathan Pillow, and Scott Linderman. A general recurrent state space framework for modeling neural dynamics during decision-making. In International Conference on Machine Learning, pages 11680–11691. PMLR, 2020.
[51] Virginia Rutten, Alberto Bernacchia, Maneesh Sahani, and Guillaume Hennequin. Non-reversible Gaussian processes for identifying latent dynamical structure in neural data. Advances in Neural Information Processing Systems, pages 9622–9632, 2020.
[52] Stephen Keeley, Mikio Aoi, Yiyi Yu, Spencer Smith, and Jonathan W Pillow. Identifying signal and noise structure in neural population activity with Gaussian process factor models. Advances in Neural Information Processing Systems, 33:13795–13805, 2020.
[53] Timothy D Kim, Thomas Z Luo, Jonathan W Pillow, and Carlos Brody. Inferring latent dynamics underlying neural population activity via neural differential equations. In International Conference on Machine Learning, pages 5551–5561. PMLR, 2021.
[54] Alexandre René, André Longtin, and Jakob H Macke. Inference of a mesoscopic population model from population spike trains. Neural Computation, 32(8):1448–1498, 2020.
[55] Liam Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243–262, 2004.
[56] Wilson Truccolo, Uri T Eden, Matthew R Fellows, John P Donoghue, and Emery N Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074–1089, 2005.
[57] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke, EJ Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995–999, 2008.
[58] David Zoltowski and Jonathan W Pillow. Scaling the Poisson GLM to massive neural datasets through polynomial approximations. Advances in Neural Information Processing Systems, 31:3517–3527, 2018.
[59] Regis C Lambert, Christine Tuleau-Malot, Thomas Bessaih, Vincent Rivoirard, Yann Bouret, Nathalie Leresche, and Patricia Reynaud-Bouret. Reconstructing the functional connectivity of multiple spike trains using Hawkes models. Journal of Neuroscience Methods, 297:9–21, 2018.
[60] Ryota Kobayashi, Shuhei Kurita, Anno Kurth, Katsunori Kitano, Kenji Mizuseki, Markus Diesmann, Barry J Richmond, and Shigeru Shinomoto. Reconstructing neuronal circuitry from parallel spike trains. Nature Communications, 10(1):1–13, 2019.
[61] Werner M Kistler, Wulfram Gerstner, and J Leo van Hemmen. Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9(5):1015–1045, 1997.
[62] Renaud Jolivet, Timothy J Lewis, and Wulfram Gerstner. Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy. Journal of Neurophysiology, 92(2):959–976, 2004.
[63] Romain Brette and Wulfram Gerstner. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology, 94(5):3637–3642, 2005.
[64] Renaud Jolivet, Alexander Rauch, Hans-Rudolf Lüscher, and Wulfram Gerstner. Predicting spike timing of neocortical pyramidal neurons by simple threshold models. Journal of Computational Neuroscience, 21(1):35–49, 2006.
[65] Ryota Kobayashi, Yasuhiro Tsubo, and Shigeru Shinomoto. Made-to-order spiking neuron model equipped with a multi-timescale adaptive threshold. Frontiers in Computational Neuroscience, 3:9, 2009.
[66] Christian Pozzorini, Skander Mensi, Olivier Hagens, Richard Naud, Christof Koch, and Wulfram Gerstner. Automated high-throughput characterization of single neurons by means of simplified spiking models. PLoS Computational Biology, 11(6):e1004275, 2015.
[67] Corinne Teeter, Ramakrishnan Iyer, Vilas Menon, Nathan Gouwens, David Feng, Jim Berg, Aaron Szafer, Nicholas Cain, Hongkui Zeng, Michael Hawrylycz, et al. Generalized leaky integrate-and-fire models classify multiple neuron types. Nature Communications, 9(1):1–15, 2018.
[68] David Roxbee Cox. Renewal theory. Methuen, 1962.
[69] Hugh R Wilson and Jack D Cowan. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical Journal, 12(1):1–24, 1972.
[70] Anna De Masi, Antonio Galves, Eva Löcherbach, and Errico Presutti. Hydrodynamic limit for interacting neurons. Journal of Statistical Physics, 158(4):866–902, 2015.
[71] Nicolas Fournier and Eva Löcherbach. On a toy model of interacting neurons. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 52:1844–1876, 2016.
[72] Julien Chevallier. Mean-field limit of generalized Hawkes processes. Stochastic Processes and their Applications, 127(12):3870–3912, 2017.
[73] Julien Chevallier. Fluctuations for mean-field interacting age-dependent Hawkes processes. Electronic Journal of Probability, 22, 2017.
[74] Grégory Dumont, Alexandre Payeur, and André Longtin. A stochastic-field description of finite-size spiking neural networks. PLoS Computational Biology, 13(8):e1005691, 2017.
[75] Valentin Schmutz, Wulfram Gerstner, and Tilo Schwalger. Mesoscopic population equations for spiking neural networks with synaptic short-term plasticity. The Journal of Mathematical Neuroscience, 10(1):1–32, 2020.
[76] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.
[77] Anne C Smith and Emery N Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965–991, 2003.
[78] Byron M Yu, John P Cunningham, Gopal Santhanam, Stephen I Ryu, Krishna V Shenoy, and Maneesh Sahani.
Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology, 102(1):614–635, 2009.
[79] Yariv Ephraim and Neri Merhav. Hidden Markov processes. IEEE Transactions on Information Theory, 48(6):1518–1569, 2002.
[80] Armen Allahverdyan and Aram Galstyan. Comparative analysis of Viterbi training and maximum likelihood estimation for HMMs. Advances in Neural Information Processing Systems, 24:1674–1682, 2011.
[81] Wulfram Gerstner and J Leo van Hemmen. Coherence and incoherence in a globally coupled ensemble of pulse-emitting units. Physical Review Letters, 71(3):312, 1993.
[82] David Golomb and John Rinzel. Clustering in globally coupled inhibitory neurons. Physica D: Nonlinear Phenomena, 72(3):259–282, 1994.
[83] U Ernst, K Pawelzik, and T Geisel. Synchronization induced by temporal delays in pulse-coupled oscillators. Physical Review Letters, 74(9):1570, 1995.
[84] Kong-Fatt Wong and Xiao-Jing Wang. A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4):1314–1328, 2006.
[85] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[86] Mark CW van Rossum. A novel spike distance. Neural Computation, 13(4):751–763, 2001.
[87] Yuan Zhao and Il Memming Park. Interpretable nonlinear dynamic modeling of neural trajectories. Advances in Neural Information Processing Systems, 29:3333–3341, 2016.
[88] Matthew R Whiteway and Daniel A Butts. The quest for interpretable models of neural population activity. Current Opinion in Neurobiology, 58:86–93, 2019.
[89] Jan-Matthis Lueckmann, Pedro J Goncalves, Giacomo Bassetto, Kaan Öcal, Marcel Nonnenmacher, and Jakob H Macke. Flexible statistical inference for mechanistic models of neural dynamics. Advances in Neural Information Processing Systems, 30:1289–1299, 2017.
[90] William D Penny, Klaas E Stephan, Andrea Mechelli, and Karl J Friston. Comparing dynamic causal models. NeuroImage, 22(3):1157–1172, 2004.
[91] Will D Penny, Klaas E Stephan, Jean Daunizeau, Maria J Rosa, Karl J Friston, Thomas M Schofield, and Alex P Leff. Comparing families of dynamic causal models. PLoS Computational Biology, 6(3):e1000709, 2010.
[92] Kay H Brodersen, Thomas M Schofield, Alexander P Leff, Cheng Soon Ong, Ekaterina I Lomakina, Joachim M Buhmann, and Klaas E Stephan. Generative embedding for model-based classification of fMRI data. PLoS Computational Biology, 7(6):e1002079, 2011.