
Targeting EEG/LFP Synchrony with Neural Nets

Yitong Li¹, Michael Murias², Samantha Major², Geraldine Dawson², Kafui Dzirasa², Lawrence Carin¹ and David E. Carlson³,⁴

¹Department of Electrical and Computer Engineering, Duke University
²Departments of Psychiatry and Behavioral Sciences, Duke University
³Department of Civil and Environmental Engineering, Duke University
⁴Department of Biostatistics and Bioinformatics, Duke University

{yitong.li,michael.murias,samantha.major,geraldine.dawson,kafui.dzirasa,lcarin,david.carlson}@duke.edu

We consider the analysis of Electroencephalography (EEG) and Local Field Potential (LFP) datasets, which are large in terms of recorded data but rarely have the labels required to train complex models (e.g., conventional deep learning methods). Furthermore, in many scientific applications, the goal is to understand the underlying features related to the classification, which prohibits the blind application of deep networks. This motivates the development of a new model based on parameterized convolutional filters guided by previous neuroscience research; the filters learn relevant frequency bands while targeting synchrony, i.e., frequency-specific power and phase correlations between electrodes. This results in a highly expressive convolutional neural network with only a few hundred parameters, applicable to smaller datasets. The proposed approach is demonstrated to yield competitive (often state-of-the-art) predictive performance in our empirical tests while yielding interpretable features. Furthermore, a Gaussian process adapter is developed to combine analysis over distinct electrode layouts, allowing the joint processing of multiple datasets to address overfitting and improve generalizability. Finally, it is demonstrated that the proposed framework effectively tracks neural dynamics in children in a clinical trial on Autism Spectrum Disorder.

1 Introduction

There is significant current research on methods for Electroencephalography (EEG) and Local Field Potential (LFP) data in a variety of applications, such as Brain-Computer Interfaces (BCIs) [21], seizure detection [24, 26], and fundamental research in fields such as psychiatry [11]. The wide variety of applications has resulted in many analysis approaches and packages, such as Independent Component Analysis in EEGLAB [8], and a variety of standard machine learning approaches in FieldTrip [22]. While in many applications prediction is key, such as for BCIs [18, 19], in applications such as emotion processing and psychiatric disorders, clinicians are ultimately interested in the dynamics of the underlying neural signals, both to improve understanding and to design future experiments. This goal necessitates development of interpretable models, such that a practitioner may understand the features and their relationships to outcomes. Thus, the focus here is on developing an interpretable and predictive approach to understanding spontaneous neural activity.

A popular feature in these analyses is based on spectral coherence, where a specific frequency band is compared between pairwise channels, to analyze both amplitude and phase coherence. When two regions have a high power (amplitude) coherence in a spectral band, it implies that these areas are coordinating in a functional network to perform a task [3]. Spectral coherence has been previously used to design classification algorithms on EEG [20] and LFP [30] data. Furthermore, these features have underlying neural relationships that can be used to design causal studies using neurostimulation [11]. However, fully pairwise approaches face significant challenges with limited data because of the proliferation of features when considering pairwise properties. Recent approaches to this problem include first partitioning the data to spatial areas and considering only broad relationships between spatial regions [33], or enforcing a low-rank structure on the pairwise relationships [30].

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

To analyze both LFP and EEG data, we follow [30] to focus on low-rank properties; however, this previous approach focused on a Gaussian process implementation for LFPs, that does not scale to the greater number of electrodes used in EEG. We therefore develop a new framework whereby the low-rank spectral patterns are approximated by parameterized linear projections, with the parametrization guided by neuroscience insights from [30]. Critically, these linear projections can be included in a convolutional neural network (CNN) architecture to facilitate end-to-end learning with interpretable convolutional ﬁlters and fast test-time performance. In addition to being interpretable, the parameterization dramatically reduces the total number of parameters to ﬁt, yielding a CNN with only hundreds of parameters. By comparison, conventional deep models require learning millions of parameters. Even special-purpose networks such as EEGNet [15], a recently proposed CNN model for EEG data, still require learning thousands of parameters.

The parameterized convolutional layer in the proposed model is followed by max-pooling, a single fully-connected layer, and a cross-entropy classiﬁcation loss; this leads to a clear relationship between the proposed targeted features and outcomes. When presenting the model, interpretation of the ﬁlters and the classiﬁcation algorithms are discussed in detail. We also discuss how deeper structures can be developed on top of this approach. We demonstrate in the experiments that the proposed framework mitigates overﬁtting and yields improved predictive performance on several publicly available datasets.

In addition to developing a new neuroscience-motivated parametric CNN, there are several other contributions of this manuscript. First, a Gaussian Process (GP) adapter [16] within the proposed framework is developed. The idea is that the input electrodes are ﬁrst mapped to pseudo-inputs by using a GP, which allows straightforward handling of missing (dropped or otherwise noise-corrupted) electrodes common in real datasets. In addition, this allows the same convolutional neural network to be applied to datasets recorded on distinct electrode layouts. By combining data sources, the result can better generalize to a population, which we demonstrate in the results by combining two datasets based on emotion recognition. We also developed an autoencoder version of the network to address overﬁtting concerns that are relevant when the total amount of labeled data is limited, while also improving model generalizability. The autoencoder can lead to minor improvements in performance, which is included in the Supplementary Material.

2 Basic Model Setup: Parametric CNN

The following notation is employed: scalars are lowercase italicized letters, e.g. $x$, vectors are bolded lowercase letters, e.g. $\mathbf{x}$, and matrices are bolded uppercase letters, e.g. $\mathbf{X}$. The convolution operator is denoted $*$, and $j = \sqrt{-1}$. $\otimes$ denotes the Kronecker product and $\odot$ denotes an element-wise product.

The input data are $\mathbf{X}_i \in \mathbb{R}^{C \times T}$, where $C$ is the number of simultaneously recorded electrodes/channels, and $T$ is given by the sampling rate and time length; $i = 1, \ldots, N$, where $N$ is the total number of trials. The data can also be represented as $\mathbf{X}_i = [\mathbf{x}_{i1}, \ldots, \mathbf{x}_{iC}]^\top$, where $\mathbf{x}_{ic} \in \mathbb{R}^T$ is the data restricted to the $c$th channel. The associated label is denoted $y_i$, an integer corresponding to a class. The trial index $i$ is added only when necessary for clarity.

An example signal is presented in Figure 1 (Left). The data are often windowed, the ith of which yields Xi and the associated label yi. Clear identiﬁcation of phase and power relationships among channels motivates the development of a structured neural network model for which the convolutional ﬁlters target this synchrony, or frequency-speciﬁc power and phase correlations.

2.1 SyncNet

Inspired both by the success of deep learning and spectral coherence as a predictive feature [12, 30], a CNN is developed to target these properties. The proposed model, termed SyncNet, performs a structured 1D convolution to jointly model the power, frequency and phase relationships between channels.

Figure 1: (Left) Visualization of EEG dataset on 8 electrodes split into windows. The markers (e.g., FP1) denote electrode names, which have corresponding spatial locations. (Right) 8 channels of synthetic data. Refer to Section 2.2 for more detail.

Figure 2: SyncNet follows a convolutional neural network structure. The right side is the SyncNet (Section 2.1), which is parameterized to target relevant quantities. The left side is the GP adapter, which aims at unifying different electrode layouts and reducing overfitting (Section 3).

This goal is achieved by using parameterized 1-dimensional convolutional filters. Specifically, the $k$th of $K$ filters for channel $c$ is

$$f_c^{(k)}(\tau) = b_c^{(k)} \cos\left(\omega^{(k)} \tau + \phi_c^{(k)}\right) \exp\left(-\beta^{(k)} \tau^2\right). \quad (1)$$

The frequency $\omega^{(k)} \in \mathbb{R}_+$ and decay $\beta^{(k)} \in \mathbb{R}_+$ parameters are shared across channels, and they define the real part of a (scaled) Morlet wavelet¹. These two parameters define the spectral properties targeted by the $k$th filter, where $\omega^{(k)}$ controls the center of the frequency spectrum and $\beta^{(k)}$ controls the frequency-time precision trade-off. The amplitude $b_c^{(k)} \in \mathbb{R}_+$ and phase shift $\phi_c^{(k)} \in [0, 2\pi]$ are channel-specific. Thus, the convolutional filter in each channel will be a discretized version of a scaled and rotated Morlet wavelet. By parameterizing the model in this way, all channels are targeted collectively. The form in (1) is motivated by the work in [30], but the resulting model we develop is far more computationally efficient. A fuller discussion of the motivation for (1) is detailed in Section 2.2.
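The parameterization in (1) can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function name `syncnet_filters`, the choice to express the centre frequency in Hz (converted to radians per second), the sampling-rate argument `fs`, and the centred tap grid are all assumptions introduced here.

```python
import numpy as np

def syncnet_filters(b, phi, omega_hz, beta, n_tau, fs):
    """Build a (K, C, n_tau) bank of real Morlet-like filters following Eq. (1).

    b, phi   : (K, C) channel-specific amplitudes and phase shifts
    omega_hz : (K,) centre frequencies in Hz (assumed unit, shared across channels)
    beta     : (K,) decay parameters in 1/s^2 (assumed unit, shared across channels)
    n_tau    : filter length in samples
    fs       : sampling rate in Hz (assumed argument)
    """
    # Centred integer taps (assumed grid), converted to seconds.
    taps = np.arange(n_tau) - n_tau // 2
    tau = taps / fs
    # cos(omega * tau + phi_c), broadcast to (K, C, n_tau).
    phase = 2 * np.pi * omega_hz[:, None, None] * tau[None, None, :] + phi[:, :, None]
    # Gaussian envelope exp(-beta * tau^2), shared across channels.
    envelope = np.exp(-beta[:, None, None] * tau[None, None, :] ** 2)
    return b[:, :, None] * np.cos(phase) * envelope

# Two filters over three channels, 40 taps at 200 Hz.
bank = syncnet_filters(
    b=np.ones((2, 3)), phi=np.zeros((2, 3)),
    omega_hz=np.array([10.0, 20.0]), beta=np.array([40.0, 40.0]),
    n_tau=40, fs=200.0,
)
```

Each filter carries only two shared parameters plus two per channel, which is what keeps the total parameter count small.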

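For practical reasons, the filters are restricted to have finite length $N_\tau$, and each time step $\tau$ takes an integer value from $[-\frac{N_\tau}{2}, \frac{N_\tau}{2} - 1]$ when $N_\tau$ is even and from $[-\frac{N_\tau - 1}{2}, \frac{N_\tau - 1}{2}]$ when $N_\tau$ is odd. For typical learned $\beta^{(k)}$ values, the convolutional filter vanishes by the edges of the window. Succinctly, the output of the $k$th convolutional filter bank is given by $\mathbf{h}^{(k)} = \sum_{c=1}^{C} f_c^{(k)}(\tau) * \mathbf{x}_c$.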

The simplest form of SyncNet contains only one convolution layer, as in Figure 2. The output from each filter bank $\mathbf{h}^{(k)}$ is passed through a Rectified Linear Unit (ReLU), followed by max pooling over the entire window, to return $\tilde{h}^{(k)}$ for each filter. The filter outputs $\tilde{h}^{(k)}$ for $k = 1, \ldots, K$ are concatenated and used as input to a softmax classifier with the cross-entropy loss to predict $\hat{y}$. Because of the temporal and spatial redundancies in EEG, dropout is instituted at the channel level, with

$$\mathrm{dropout}(\mathbf{x}_c) = \begin{cases} \mathbf{x}_c / p, & \text{with probability } p, \\ \mathbf{0}, & \text{with probability } 1 - p. \end{cases} \quad (2)$$

$p$ determines the typical percentage of channels included, and was set as $p = 0.75$. It is straightforward to create deeper variants of the model by augmenting SyncNet with additional standard convolutional layers. However, in our experiments, adding more layers typically resulted in overfitting due to the limited numbers of training samples; deeper variants will likely be beneficial in larger datasets.

¹ It is straightforward to use the Morlet wavelet directly, define the outputs as complex variables, and define the neural network to target the same properties, but this leads to both computational and coding overhead.
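The single-layer forward pass described in this section (summed per-channel convolution, ReLU, max pooling over the window, softmax) and the channel-level dropout of (2) can be sketched as follows. This is a hedged NumPy illustration, not the authors' TensorFlow code; the function names, the `"valid"` convolution mode, and the classifier weights `w` and `bias` are assumptions made for the example.

```python
import numpy as np

def channel_dropout(x, p, rng):
    """Eq. (2): keep each channel (row of x) with probability p, rescaled by 1/p."""
    keep = rng.random(x.shape[0]) < p
    return x * keep[:, None] / p

def syncnet_forward(x, filters, w, bias):
    """Class probabilities for one window.

    x       : (C, T) input window
    filters : (K, C, n_tau) filter bank, e.g. built from Eq. (1)
    w, bias : (K, n_classes) and (n_classes,) softmax classifier (assumed shapes)
    """
    n_filters, n_channels, _ = filters.shape
    pooled = np.empty(n_filters)
    for k in range(n_filters):
        # h^(k): sum over channels of the 1D convolution of each channel with its filter.
        h = sum(np.convolve(x[c], filters[k, c], mode="valid") for c in range(n_channels))
        # ReLU followed by max pooling over the entire window.
        pooled[k] = np.maximum(h, 0.0).max()
    logits = pooled @ w + bias
    logits = logits - logits.max()  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 100))
filters = rng.standard_normal((3, 4, 11))
probs = syncnet_forward(channel_dropout(x, 0.75, rng), filters,
                        rng.standard_normal((3, 2)), np.zeros(2))
```

Dropout would only be applied during training; at test time the window is passed through unchanged.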

2.2 SyncNet Targets Class Differences in Cross-Spectral Densities

The cross-spectral density [3] is a widely used metric for understanding the synchronous nature of signals in frequency bands. The cross-spectral density is typically constructed by converting a time-series into a frequency representation, and then calculating the complex covariance matrix in each frequency band. In this section we sketch how the SyncNet filter bank targets cross-spectral densities to make optimal classifications. The discussion will be in the complex domain first, and then it will be demonstrated why the same result occurs in the real domain.

In the time-domain, it is possible to understand the cross-spectral density of a single frequency band by using a cross-spectral kernel [30] to define the covariance function of a Gaussian process. Letting $\tau = t - t'$, the cross-spectral kernel is defined

$$K^{\mathrm{CSD}}_{cc'tt'} = \mathrm{cov}(x_{ct}, x_{c't'}) = A_{cc'}\,\kappa(\tau), \qquad \kappa(\tau) = \exp\left(-\tfrac{1}{2}\beta\tau^2 + j\omega\tau\right). \quad (3)$$

Here, $\omega$ and $\beta$ control the frequency band, and $c$ and $c'$ are channel indexes. $\mathbf{A} \in \mathbb{C}^{C \times C}$ is a positive semi-definite matrix that defines the cross-spectral density for the frequency band controlled by $\kappa(\tau)$. Each entry $A_{cc'}$ is made of a magnitude $|A_{cc'}|$ that controls the power (amplitude) coherence between electrodes in that frequency band and a complex phase that determines the optimal time offset between the signals. The covariance over the complete multi-channel time series is given by $\mathbf{K}^{\mathrm{CSD}} = \mathbf{A} \otimes \kappa(\tau)$. The power (magnitude) coherence is given by the absolute value of the entry, and the phase offset can be determined by the rotation in the complex space.

A generative model for oscillatory neural signals is given by a Gaussian process with this kernel [30], where $\mathrm{vec}(\mathbf{X}) \sim \mathcal{CN}(\mathbf{0}, \mathbf{K}^{\mathrm{CSD}} + \sigma^2 \mathbf{I}_{C \times T})$. The entries of $\mathbf{K}^{\mathrm{CSD}}$ are given by (3). $\mathcal{CN}$ denotes the circularly symmetric complex normal. The additive noise term $\sigma^2 \mathbf{I}_{C \times T}$ is excluded in the following for clarity.
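As an illustration of this generative model, the sketch below builds the Kronecker-structured covariance and draws one sample from the circularly symmetric complex normal. It is a minimal NumPy example under assumptions: the function names, the unit conventions for the frequency and decay parameters, and the Cholesky-based sampler are choices made here, not details taken from the paper.

```python
import numpy as np

def csd_covariance(A, omega_hz, beta, n_time, fs):
    """Covariance A (Kronecker) kappa over the channel-major flattening of X.

    A        : (C, C) Hermitian positive semi-definite cross-spectral density
    omega_hz : band centre in Hz (assumed unit)
    beta     : decay of the Gaussian envelope in 1/s^2 (assumed unit)
    """
    t = np.arange(n_time) / fs
    tau = t[:, None] - t[None, :]  # all pairwise lags t - t'
    kappa = np.exp(-0.5 * beta * tau ** 2 + 1j * 2 * np.pi * omega_hz * tau)
    return np.kron(A, kappa)

def sample_complex_normal(cov, sigma2, rng):
    """Draw z ~ CN(0, cov + sigma2 * I) via a Cholesky factor."""
    n = cov.shape[0]
    chol = np.linalg.cholesky(cov + sigma2 * np.eye(n))
    # Unit-variance circular noise: real and imaginary parts each have variance 1/2.
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return chol @ w

rng = np.random.default_rng(0)
n_channels, n_time = 3, 50
s = rng.standard_normal(n_channels) + 1j * rng.standard_normal(n_channels)
A = np.eye(n_channels) + np.outer(s, s.conj())
K_csd = csd_covariance(A, omega_hz=10.0, beta=40.0, n_time=n_time, fs=200.0)
X = sample_complex_normal(K_csd, sigma2=0.1, rng=rng).reshape(n_channels, n_time)
```

The dense Cholesky factor costs cubic time in $CT$, so this is only practical for short windows; it is meant to show the structure of the model, not to scale.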

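Note that the complex form of (1) in SyncNet across channels is given as $\mathbf{f}(\tau) = f_\omega(\tau)\,\mathbf{s}$, where $f_\omega(\tau) = \exp(-\frac{1}{2}\beta\tau^2 + j\omega\tau)$ is the filter over time and $\mathbf{s} = \mathbf{b} \odot \exp(j\boldsymbol{\phi})$ are the weights and rotations of a single SyncNet filter. Suppose that each channel was filtered independently by the filter $\mathbf{f}_\omega = f_\omega(\boldsymbol{\tau})$ with a vector input $\boldsymbol{\tau}$. Writing the convolution in matrix form as $\tilde{\mathbf{x}}_c = \mathbf{f}_\omega * \mathbf{x}_c = \mathbf{F}_\omega^\dagger \mathbf{x}_c$, where $\mathbf{F}_\omega \in \mathbb{C}^{T \times T}$ is a matrix formulation of the convolution operator, results in a filtered signal $\tilde{\mathbf{x}}_c \sim \mathcal{CN}(\mathbf{0}, A_{cc}\,\mathbf{F}_\omega^\dagger \kappa(\tau) \mathbf{F}_\omega)$. For a filtered version over all channels, $\tilde{\mathbf{X}}^\top = [\tilde{\mathbf{x}}_1, \ldots, \tilde{\mathbf{x}}_C]$, the distribution would be given by

$$\mathrm{vec}(\tilde{\mathbf{X}}) = \mathrm{vec}(\mathbf{F}_\omega^\dagger \mathbf{X}^\top) \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{A} \otimes \mathbf{F}_\omega^\dagger \kappa(\tau) \mathbf{F}_\omega\right), \qquad \tilde{\mathbf{x}}_t \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{A}\,[\mathbf{F}_\omega^\dagger \kappa(\tau) \mathbf{F}_\omega]_{tt}\right). \quad (4)$$

$\tilde{\mathbf{x}}_t \in \mathbb{R}^C$ is defined as the observation at time $t$ for all $C$ channels. The diagonal of $\mathbf{F}_\omega^\dagger \kappa(\tau) \mathbf{F}_\omega$ will reach a steady-state quickly away from the edge effects, so we state this as $\mathrm{const} = [\mathbf{F}_\omega^\dagger \kappa(\tau) \mathbf{F}_\omega]_{tt}$. The output from the SyncNet filter bank prior to the pooling stage is then given by $h_t = \mathbf{s}^\dagger \tilde{\mathbf{x}}_t \sim \mathcal{CN}(0, \mathrm{const} \times \mathbf{s}^\dagger \mathbf{A} \mathbf{s})$. We note that the signal-to-noise ratio would be maximized by matching the frequency properties of the filter $f_\omega$ to the generated frequency properties; i.e. $\beta$ and $\omega$ from (1) should match $\beta$ and $\omega$ from (3).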

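We next focus on the properties of an optimal $\mathbf{s}$. Suppose that two classes are generated from (3) with cross-spectral densities of $\mathbf{A}_0$ and $\mathbf{A}_1$ for classes 0 and 1, respectively. Thus, the signals are drawn from $\mathcal{CN}(\mathbf{0}, \mathbf{A}_y \otimes \kappa(\tau))$ for $y \in \{0, 1\}$. The optimal projection $\mathbf{s}^*$ would maximize the differences in the distribution of $h_t$ depending on the class, which is equivalent to maximizing the ratio between the variances of the two cases. Mathematically, this is equivalent to finding

$$\mathbf{s}^* = \arg\max_{\mathbf{s}} \max\left\{ \frac{\mathbf{s}^\dagger \mathbf{A}_1 \mathbf{s}}{\mathbf{s}^\dagger \mathbf{A}_0 \mathbf{s}}, \frac{\mathbf{s}^\dagger \mathbf{A}_0 \mathbf{s}}{\mathbf{s}^\dagger \mathbf{A}_1 \mathbf{s}} \right\} = \arg\max_{\mathbf{s}} \left| \log(\mathbf{s}^\dagger \mathbf{A}_1 \mathbf{s}) - \log(\mathbf{s}^\dagger \mathbf{A}_0 \mathbf{s}) \right|. \quad (5)$$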

Note that the constant dropped out due to the ratio. Because the SyncNet filter is attempting to classify the two conditions, it should learn to best differentiate the classes and match the optimal $\mathbf{s}^*$. We demonstrate in Section 5.1 on synthetic data that SyncNet filters do in fact align with this optimal direction and are therefore targeting properties of the cross-spectral densities.
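The variance-ratio objective in (5) is a generalized Rayleigh quotient, so one way to compute the optimal projection numerically is through a generalized Hermitian eigenproblem. The sketch below is an assumption-laden illustration (the paper does not state how the optimum is computed): it whitens by the class-0 matrix via a Cholesky factor and then compares the largest ratio against the largest inverse ratio. Both matrices are assumed to be Hermitian positive definite.

```python
import numpy as np

def optimal_projection(A0, A1):
    """Unit-norm s maximising max(s^H A1 s / s^H A0 s, s^H A0 s / s^H A1 s)."""
    # Whiten with A0 = L L^H: for s = L^{-H} v the ratio becomes v^H M v / v^H v.
    chol = np.linalg.cholesky(A0)
    chol_inv = np.linalg.inv(chol)
    M = chol_inv @ A1 @ chol_inv.conj().T
    evals, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # The largest eigenvalue gives the best ratio, the smallest the best inverse ratio.
    idx = -1 if evals[-1] >= 1.0 / evals[0] else 0
    s = chol_inv.conj().T @ evecs[:, idx]
    return s / np.linalg.norm(s)

rng = np.random.default_rng(0)
s_true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
A0 = np.eye(8, dtype=complex)
A1 = A0 + np.outer(s_true, s_true.conj())
s_hat = optimal_projection(A0, A1)
# Agreement up to an arbitrary complex phase.
alignment = abs(np.vdot(s_hat, s_true)) / np.linalg.norm(s_true)
```

With an identity class-0 matrix and a rank-one perturbation for class 1, the recovered direction coincides with the perturbation vector up to a global phase, which is the degeneracy inherent in (5).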

In the above discussion, the argument was made with respect to complex signals and models; however, a similar result holds when only the real domain is used. Note that if the signals are oscillatory, then the result after filtering in the real domain and max-pooling will be essentially the same as max-pooling on the absolute value of the complex filter outputs. This is because the filtered signal is rotated through the complex domain, and will align with the real domain within the max-pooling period for standard signals. This is shown visually in Supplemental Figure 9.

3 Gaussian Process Adapter

A practical issue in EEG datasets is that electrode layouts are not constant, either due to inconsistent device design or electrode failure. Secondly, nearby electrodes are highly correlated and contain redundant information, so fitting parameters to all electrodes results in overfitting. These issues are addressed by developing a Gaussian Process (GP) adapter, in the spirit of [16], trained with SyncNet as shown in the left side of Figure 2. Regardless of the electrode layout, the observed signal $\mathbf{X}$ at electrode locations $\mathbf{p} = \{\mathbf{p}_1, \ldots, \mathbf{p}_C\}$ is mapped to a shared number of pseudo-inputs at locations $\mathbf{p}^* = \{\mathbf{p}^*_1, \ldots, \mathbf{p}^*_L\}$ before being input to SyncNet.

In contrast to prior work, the proposed GP adapter is formulated as a multi-task GP [4] and the pseudo-input locations $\mathbf{p}^*$ are learned. A GP is used to map $\mathbf{X} \in \mathbb{R}^{C \times T}$ at locations $\mathbf{p}$ to the pseudo-signals $\mathbf{X}^* \in \mathbb{R}^{L \times T}$ at locations $\mathbf{p}^*$, where $L < C$ is the number of pseudo-inputs. Distances are constructed by projecting each electrode into a 2D representation by the Azimuthal Equidistant Projection. When evaluated at a finite set of points, the multi-task GP [4] can be written as a multivariate normal

$$\mathrm{vec}(\mathbf{X}) \sim \mathcal{N}(\mathbf{f}, \sigma^2 \mathbf{I}_{C \times T}), \qquad \mathbf{f} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}). \quad (6)$$

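$\mathbf{K}$ is constructed by a kernel function $K(\tau, c, c')$ that encodes separable relationships through time and through space. The full covariance matrix can be calculated as $\mathbf{K} = \mathbf{K}_{pp} \otimes \mathbf{K}_{tt}$, where $K_{p_c p_{c'}} = \alpha_1 \exp(-\alpha_2 \|\mathbf{p}_c - \mathbf{p}_{c'}\|_1)$ and $\mathbf{K}_{tt}$ is set to the identity matrix $\mathbf{I}_T$. $\mathbf{K}_{pp} \in \mathbb{R}^{C \times C}$ targets the spatial relationship across channels using the exponential kernel. Note that this kernel $\mathbf{K}$ is distinct from $\mathbf{K}^{\mathrm{CSD}}$ used in Section 2.2.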

Let the pseudo-input locations be defined as $\mathbf{p}^*_l$ for $l = 1, \ldots, L$. Using the GP formulation, the signal can be inferred at the $L$ pseudo-input locations from the original signal. Following [16], only the expectation of the signal is used (to facilitate fast computation), which is given by $\mathbf{X}^* = \mathbb{E}(\mathbf{X}^* | \mathbf{X}) = \mathbf{K}_{p^* p}(\mathbf{K}_{pp} + \sigma^2 \mathbf{I}_C)^{-1} \mathbf{X}$. An illustration of the learned new locations is shown under $\mathbf{X}^*$ in Figure 2. The derivation of this mathematical form and additional details on the GP adapter are included in Supplemental Section A.
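The posterior-mean mapping from observed electrodes to pseudo-inputs can be written in a few lines. The sketch below is a hedged NumPy illustration with fixed (not learned) pseudo-input locations; the function names and the parameter names `alpha1` and `alpha2` for the amplitude and inverse length-scale of the exponential kernel are assumptions introduced here.

```python
import numpy as np

def exp_kernel(pa, pb, alpha1, alpha2):
    """Exponential kernel on L1 distances between 2D positions pa (n, 2) and pb (m, 2)."""
    dist = np.abs(pa[:, None, :] - pb[None, :, :]).sum(axis=-1)
    return alpha1 * np.exp(-alpha2 * dist)

def gp_adapter(X, p, p_star, alpha1, alpha2, sigma2):
    """Posterior mean of the signal at pseudo-input locations.

    X      : (C, T) observed signal
    p      : (C, 2) projected electrode positions
    p_star : (L, 2) pseudo-input positions
    """
    K_pp = exp_kernel(p, p, alpha1, alpha2)
    K_sp = exp_kernel(p_star, p, alpha1, alpha2)
    # Solve the linear system instead of forming the inverse explicitly.
    return K_sp @ np.linalg.solve(K_pp + sigma2 * np.eye(len(p)), X)

rng = np.random.default_rng(0)
p = rng.uniform(-1, 1, size=(8, 2))
p_star = rng.uniform(-1, 1, size=(3, 2))
X = rng.standard_normal((8, 100))
X_star = gp_adapter(X, p, p_star, alpha1=1.0, alpha2=2.0, sigma2=0.1)
```

Because the time kernel is the identity, the same spatial weights apply at every time step, so the mapping is a single $L \times C$ matrix applied to the window. A dropped electrode can be handled by removing its row from `X` and `p`.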

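The GP adapter parameters $\mathbf{p}^*, \alpha_1, \alpha_2$ are optimized jointly with SyncNet. The input signal $\mathbf{X}_i$ is mapped to $\mathbf{X}^*_i$, which is then input to SyncNet. The predicted label $\hat{y}_i$ is given by $\hat{y}_i = \mathrm{Sync}(\mathbf{X}^*_i; \theta)$, where $\mathrm{Sync}(\cdot)$ is the prediction function of SyncNet. Given the SyncNet loss function $\sum_{i=1}^{N} \ell(\hat{y}_i, y_i) = \sum_{i=1}^{N} \ell(\mathrm{Sync}(\mathbf{X}^*_i; \theta), y_i)$, the overall training loss function

$$\mathcal{L} = \sum_{i=1}^{N} \ell\left(\mathrm{Sync}(\mathbb{E}[\mathbf{X}^*_i | \mathbf{X}_i]; \theta), y_i\right) = \sum_{i=1}^{N} \ell\left(\mathrm{Sync}(\mathbf{K}_{p^* p}(\mathbf{K}_{pp} + \sigma^2 \mathbf{I}_C)^{-1} \mathbf{X}_i; \theta), y_i\right)$$

is jointly minimized over the SyncNet parameters $\theta$ and the GP adapter parameters $\{\mathbf{p}^*, \alpha_1, \alpha_2\}$. The GP uncertainty can be included in the loss at the expense of significantly increased optimization cost, but does not result in performance improvements to justify the increased cost [16].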

4 Related Work

Frequency-spectrum features are widely used for processing EEG/LFP signals. Often this requires calculating synchrony- or entropy-based features within predefined frequency bands, such as [20, 5, 9, 14]. There are many hand-crafted features and classifiers for a BCI task [18]; however, in our experiments, these hand-crafted features did not perform well on long oscillatory signals. The EEG signal is modeled in [1] as a matrix-variate model with spatial and spectral smoothing. However, the number of parameters scales with time length, rendering the approach ineffective for longer time series. A range-EEG feature has been proposed [23], which measures the peak-to-peak amplitude. In contrast, our approach learns frequency bands of interest and can handle the long time series evaluated in our experiments.

Deep learning has been a popular recent area of research in EEG analysis. This includes Restricted Boltzmann Machines and Deep Belief Networks [17, 36], CNNs [32, 29], and RNNs [2, 34]. These approaches focus on learning both spatial and temporal relationships. In contrast to hand-crafted features and SyncNet, these deep learning methods are typically used as a black box classifier. EEGNET [15] considered a four-layer CNN to classify event-related potentials and oscillatory EEG signals, demonstrating improved performance over low-level feature extraction. This network was designed to have limited parameters, requiring 2200 for their smallest model. In contrast, the SyncNet filters are simple to interpret and require learning only a few hundred parameters.

An alternative approach is to design GP kernels to target synchrony properties and learn appropriate frequency bands. The phase/amplitude synchrony of LFP signals has been modeled [30, 10] with the cross-spectral mixture (CSM) kernel. This approach was used to define a generative model over differing classes and may be used to learn an unsupervised clustering model. A key issue with the CSM approach is the computational complexity, where gradients cost $O(NTC^3)$ (using approximations), which is infeasible with the larger number of electrodes in EEG data. In contrast, the proposed GP adapter requires only a single matrix inversion shared by most data points, which is $O(C^3)$.

The use of wavelets has previously been considered in scattering networks [6]. Scattering networks used Morlet wavelets for image classiﬁcation, but did not consider the complex rotation of wavelets over channels nor the learning of the wavelet widths and frequencies considered here.

5 Experiments

To demonstrate that SyncNet is targeting synchrony information, we first apply it to synthetic data in Section 5.1. Notably, the learned filter bank recovers the optimal separating filter. Empirical performance is given for several EEG datasets in Section 5.2, where SyncNet often has the highest hold-out accuracy while maintaining interpretable features. The usefulness of the GP adapter to combine datasets is demonstrated in Section 5.3, where classification performance is dramatically improved via data augmentation. Empirical performance on an LFP dataset is shown in Section 5.4. Both the LFP signals and the EEG signals measure broad voltage fluctuations from the brain, but the LFP has a significantly cleaner signal because it is measured inside the cortical tissue. In all tested cases, SyncNet methods have essentially state-of-the-art prediction while maintaining interpretable features.

The code is written in Python and TensorFlow. The experiments were run on a 6-core i7 machine with an Nvidia Titan X Pascal GPU. Details on training are given in Supplemental Section C.

5.1 Synthetic Dataset

Figure 3: Each dot represents one of 8 electrodes. The dots give complex directions for optimal and learned filters (legend: Optimal, Learned), demonstrating that SyncNet approximately recovers optimal filters.

Synthetic data are generated for two classes by drawing data from a circularly symmetric normal matching the synchrony assumptions discussed in Section 2.2. The frequency band is pre-defined as $\omega = 10$ Hz and $\beta$ is defined as 40 (frequency variance of 2.5 Hz) in (3). The number of channels is set to $C = 8$. Example data generated by this procedure are shown in Figure 1 (Right), where only the real part of the signal is kept.

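$\mathbf{A}_1$ and $\mathbf{A}_0$ are set such that the optimal vector from solving (5) is given by the shape visualized in Figure 3. This is accomplished by setting $\mathbf{A}_0 = \mathbf{I}_C$ and $\mathbf{A}_1 = \mathbf{I} + \mathbf{s}^* (\mathbf{s}^*)^\dagger$. Data are then simulated by drawing from $\mathrm{vec}(\mathbf{X}) \sim \mathcal{CN}(\mathbf{0}, \mathbf{K}^{\mathrm{CSD}} + \sigma^2 \mathbf{I}_{C \times T})$ and keeping only the real part of the signal. $\mathbf{K}^{\mathrm{CSD}}$ is defined in equation (3) with $\mathbf{A}$ set to $\mathbf{A}_0$ or $\mathbf{A}_1$ depending on the class. In this experiment, the goal is to relate the filter learned in SyncNet to this optimal separating plane $\mathbf{s}^*$.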

To show that SyncNet is targeting synchrony, it is trained on this synthetic data using only a single convolutional filter. The learned filter parameters are projected to the complex space by $\mathbf{s} = \mathbf{b} \odot \exp(j\boldsymbol{\phi})$, and are shown overlaid (rotated and rescaled to handle degeneracies) with the optimal rotations in Figure 3. As the amount of data increases, the SyncNet filter recovers the expected relationship between channels and the predefined frequency band. In addition, the learned $\omega$ is centered at 11 Hz, which is close to the generated feature band $\omega$ of 10 Hz. These synthetic data results demonstrate that SyncNet is able to recover frequency bands of interest and target synchrony properties.
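The comparison between learned and optimal directions is only meaningful up to a global rotation and rescaling, since multiplying the projection by any complex scalar leaves the objective in (5) unchanged. One possible way to remove this degeneracy, shown here as an assumption (the text does not specify the exact procedure), is to fit the single complex scalar that best maps the learned vector onto the optimal one in the least-squares sense. Function names are hypothetical.

```python
import numpy as np

def to_complex(b, phi):
    """Project per-channel amplitude and phase to the complex plane: s = b * exp(j * phi)."""
    return b * np.exp(1j * phi)

def align_to(s_learned, s_opt):
    """Scale and rotate s_learned by the complex scalar minimising ||c * s_learned - s_opt||."""
    c = np.vdot(s_learned, s_opt) / np.vdot(s_learned, s_learned)
    return c * s_learned

rng = np.random.default_rng(0)
s_opt = rng.standard_normal(8) + 1j * rng.standard_normal(8)
# A learned filter that matches the optimum up to a global rotation and rescaling.
scaled = 0.3 * np.exp(1j * 1.2) * s_opt
s_learned = to_complex(np.abs(scaled), np.angle(scaled))
s_aligned = align_to(s_learned, s_opt)
residual = np.linalg.norm(s_aligned - s_opt) / np.linalg.norm(s_opt)
```

A small relative residual after alignment indicates that the learned filter has recovered the optimal relationship between channels.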

5.2 Performance on EEG Datasets

We consider three publicly available datasets for EEG classification, described below. After validation on the publicly available data, we then apply the method to new clinical-trial data to demonstrate that the approach can learn interpretable features that track the brain dynamics as a result of treatment.

UCI EEG: This dataset² has a total of 122 subjects, with 77 diagnosed with alcoholism and 45 control subjects. Each subject undergoes 120 separate trials. The stimuli are pictures selected from the 1980 Snodgrass and Vanderwart picture set. The EEG signal is of length one second and is sampled at 256 Hz with 64 electrodes. We evaluate the data both within subject, with trials randomly split 7 : 1 : 2 for training, validation and testing, and across subjects, using a rotating test set of 11 subjects. The classification task is to recover whether the subject has been diagnosed with alcoholism or is a control subject.

DEAP dataset: The Database for Emotion Analysis using Physiological signals [14] has a total of 32 participants. Each subject has EEG recorded from 32 electrodes while they are shown a total of 40 one-minute long music videos with strong emotional score. After watching each video, each subject gave an integer score from one to nine to evaluate their feelings in four different categories. The self-assessment standards are valence (happy/unhappy), arousal (bored/excited), dominance (submissive/empowered) and personal liking of the video. Following [14], this is treated as a binary classiﬁcation with a threshold at a score of 4.5. The performance is evaluated with leave-one-out testing, and the remaining subjects are split to use 22 for training and 9 for validation.

SEED dataset: This dataset [35] involves repeated tests on 15 subjects. Each subject watches 15 movie clips 3 times. Each clip is designated with a negative/neutral/positive emotion label, while the EEG signal is recorded at 1000 Hz from 62 electrodes. For this dataset, leave-one-out cross-validation is used, and the remaining 14 subjects are split with 10 for training and 4 for validation.

ASD dataset: The Autism Spectrum Disorder (ASD) dataset involves 22 children from ages 3 to 7 years undergoing treatment for ASD with EEG measurements at baseline, 6 months post treatment, and 12 months post treatment. Each recording session involves 3 one-minute videos designed to measure responses to social stimuli and controls, measured with a 121 electrode array. The trial was approved by the Duke Hospital Institutional Review Board and conducted under IND #15949. Full details on the experiments and initial clinical results are available in [7]. The classification task is to predict the time relative to treatment to track the change in neural signatures post-treatment. The cross-patient predictive ability is estimated with leave-one-out cross-validation, where 17 patients are used to train the model and 4 patients are used as a validation set.
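The leave-one-subject-out protocol used for DEAP, SEED and ASD can be sketched as a generator of boolean masks over trials. This is an illustrative NumPy example, not the authors' evaluation code; the function name, the random assignment of the remaining subjects to training and validation, and the seed are assumptions.

```python
import numpy as np

def leave_one_subject_out(subject_ids, n_val, seed=0):
    """Yield (train, val, test) boolean masks, holding out one subject per fold.

    subject_ids : (n_trials,) subject label of every trial
    n_val       : number of remaining subjects used for validation
    """
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    for test_subject in subjects:
        # Shuffle the remaining subjects, then carve off the validation subjects.
        rest = rng.permutation(subjects[subjects != test_subject])
        val_subjects, train_subjects = rest[:n_val], rest[n_val:]
        yield (np.isin(subject_ids, train_subjects),
               np.isin(subject_ids, val_subjects),
               subject_ids == test_subject)

# SEED-like layout: 15 subjects, 10 for training and 4 for validation in each fold.
ids = np.repeat(np.arange(15), 4)
folds = list(leave_one_subject_out(ids, n_val=4))
```

Splitting by subject rather than by trial keeps all windows from the held-out subject out of training, which is what makes the reported accuracy a cross-subject estimate.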

| Method | UCI Within | UCI Cross | DEAP [14] Arousal | DEAP [14] Valence | DEAP [14] Domin. | DEAP [14] Liking | SEED [35] Emotion | ASD Stage |
|---|---|---|---|---|---|---|---|---|
| DE [35] | 0.821 | 0.622 | 0.529 | 0.517 | 0.528 | 0.577 | 0.491 | 0.504 |
| PSD [35] | 0.816 | 0.605 | 0.584 | 0.559 | 0.595 | 0.644 | 0.352 | 0.499 |
| rEEG [23] | 0.702 | 0.614 | 0.549 | 0.538 | 0.557 | 0.585 | 0.468 | 0.361 |
| Spectral [14] | * | * | 0.620 | 0.576 | * | 0.554 | * | * |
| EEGNET [15] | 0.878 | 0.672 | 0.536 | 0.572 | 0.589 | 0.594 | 0.533 | 0.363 |
| MC-DCNN [37] | 0.840 | 0.300 | 0.593 | 0.604 | 0.635 | 0.621 | 0.527 | 0.584 |
| SyncNet | 0.918 | 0.705 | 0.611 | 0.608 | 0.651 | 0.679 | 0.558 | 0.630 |
| GP-SyncNet | 0.923 | 0.723 | 0.592 | 0.611 | 0.621 | 0.659 | 0.516 | 0.637 |

Table 1: Classification accuracy on EEG datasets.

The accuracy of predictions on these EEG datasets, from a variety of methods, is given in Table 1. We also implemented other hand-crafted spatial features, such as the brain symmetric index [31]; however, their performance was not competitive with the results here. EEGNET is an EEG-specific convolutional network proposed in [15]. The Spectral method from [14] uses an SVM on extracted spectral power features from each electrode in different frequency bands. MC-DCNN [37] denotes a 1D CNN where the filters are learned without the constraints of the parameterized structure. SyncNet used 10 filter sets both with (GP-SyncNet) and without the GP adapter. Remarkably, the basic SyncNet already delivers state-of-the-art performance on most tasks. In contrast, the hand-crafted features did not effectively capture the available information, and the alternative CNN-based methods severely overfit the training data due to the large number of free parameters.

Figure 4: Learned filter centered at 14 Hz on the ASD dataset. (a) Spatial pattern of learned amplitude $\mathbf{b}$. (b) Spatial pattern of learned phase $\boldsymbol{\phi}$. Figures made with FieldTrip [22].

² https://kdd.ics.uci.edu/databases/eeg/eeg.html

In addition to state-of-the-art classification performance, a key component of SyncNet is that the features extracted and used in the classification are interpretable. Specifically, on the ASD dataset, the proposed method significantly improves the state-of-the-art. However, the end goal of this experiment is to understand how the neural activity is changing in response to the treatment. On this task, the ability of SyncNet to visualize features is important for dissemination to medical practitioners. To demonstrate how the filters can be visualized and communicated, we show one of the filters learned in SyncNet on the ASD dataset in Figure 4. This filter, centered at 14 Hz, is highly associated with the session at 6 months post-treatment. Notably, this filter bank is dominantly using the signals measured at the forward part of the scalp (Figure 4, Left). Intriguingly, the phase relationships are primarily in phase for the frontal regions, but note that there are off-phase relationships between the midfrontal and the frontal part of the scalp (Figure 4, Right). Additional visualizations of the results are given in Supplemental Section E.

5.3 Experiments on GP adapter

In the previous section, it was noted that the GP adapter can improve performance within an existing dataset, demonstrating that the GP adapter is useful to reduce the number of parameters. However, our primary designed use of the GP adapter is to unify different electrode layouts. This is explored further by applying the GP-SyncNet to the UCI EEG dataset and changing the number of pseudo-inputs. Notably, a mild reduction in the number of pseudo-inputs improves performance over directly using the measured data (Supplemental Figure 6(a)) by reducing the total number of parameters. This is especially true when comparing the GP adapter to using a random subset of channels to reduce dimensionality.

| Dataset | SyncNet | GP-SyncNet | GP-SyncNet Joint |
|---|---|---|---|
| DEAP [14] | 0.521 ± 0.026 | 0.557 ± 0.025 | 0.603 ± 0.020 |
| SEED [35] | 0.771 ± 0.009 | 0.762 ± 0.015 | 0.779 ± 0.009 |

Table 2: Accuracy mean and standard errors for training two datasets separately and jointly.

To demonstrate that the GP adapter can be used to combine datasets, the DEAP and SEED datasets were trained jointly using a GP adapter. The SEED data were downsampled to 128 Hz to match the sampling frequency of the DEAP dataset, and the data were separated into 4 second windows due to their different lengths. The label for the trial is attached to each window. To combine the labeling space, only the negative and positive emotion labels were kept in SEED and valence was used in the DEAP dataset. The number of pseudo-inputs is set to $L = 26$. The results are given in Table 2, which demonstrates that combining datasets can lead to dramatically improved generalization ability due to the data augmentation. Note that the basic SyncNet performances in Table 2 differ from the results in Table 1. Specifically, the DEAP dataset performance is worse; this is due to significantly reduced information when considering a 4 second window instead of a 60 second window. Second, the performance on SEED has improved; this is due to considering only 2 classes instead of 3.
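The windowing step used to put the two datasets on a common footing can be sketched as below: a trial is cut into fixed-length, non-overlapping windows and the trial label is copied to every window. This is a hedged NumPy illustration; the function name and the decision to drop the incomplete trailing window are assumptions, and the downsampling step is not shown.

```python
import numpy as np

def window_trial(x, label, fs, win_sec):
    """Split a (C, T) trial into non-overlapping windows that inherit the trial label.

    Returns windows of shape (n_windows, C, win_len) and labels of shape (n_windows,).
    Samples that do not fill a complete final window are dropped (assumed behaviour).
    """
    win_len = int(round(fs * win_sec))
    n_windows = x.shape[1] // win_len
    trimmed = x[:, :n_windows * win_len]
    windows = trimmed.reshape(x.shape[0], n_windows, win_len).transpose(1, 0, 2)
    return windows, np.full(n_windows, label)

# A 2-channel trial of 1300 samples at 128 Hz gives two complete 4-second windows.
trial = np.arange(2 * 1300, dtype=float).reshape(2, 1300)
windows, labels = window_trial(trial, label=1, fs=128, win_sec=4)
```

Since every window carries the label of the whole trial, shorter windows trade per-example information for a larger number of training examples, consistent with the lower DEAP accuracy noted above.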

5.4 Performance on an LFP Dataset

Because few multi-region LFP datasets are publicly available, only a single LFP dataset was included in the experiments. The intention of this experiment is to show that the method is broadly applicable to neural measurements and will be useful as multi-region datasets become increasingly available. The LFP dataset was recorded from 26 mice from two genetic backgrounds (14 wild-type and 12 CLOCKΔ19). CLOCKΔ19 mice are an animal model of a psychiatric disorder. The data are sampled at 200 Hz from 11 channels. The recording from each mouse comprises five minutes in its home cage, five minutes in an open field test, and ten minutes in a tail-suspension test. The data are split into temporal windows of five seconds. SyncNet is evaluated on two distinct prediction tasks. The first is to predict the genotype (wild-type or CLOCKΔ19), and the second is to predict the current behavior condition (home cage, open field, or tail-suspension test). We randomly split the data 7:1:2 into training, validation, and testing sets.
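As a rough illustration of the data volume and the split, the sketch below counts the five-second windows and draws a random 7:1:2 partition. It assumes that every window from every mouse is retained and that the split is made over individual windows; the text does not state either, and the helper `split_indices` and the fixed seed are hypothetical.

```python
import numpy as np

def split_indices(n_items, ratios=(7, 1, 2), seed=0):
    """Randomly partition range(n_items) into train/validation/test indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    total = sum(ratios)
    n_train = n_items * ratios[0] // total
    n_val = n_items * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# 5 + 5 + 10 = 20 minutes per mouse, cut into 5-second windows, for 26 mice.
windows_per_mouse = 20 * 60 // 5
n_windows = 26 * windows_per_mouse
train, val, test = split_indices(n_windows)
print(n_windows, len(train), len(val), len(test))  # 6240 4368 624 1248
```

Splitting over windows means that windows from the same mouse can appear in both training and test sets; splitting by mouse would be a stricter alternative, and this sketch does not claim to match the protocol used for Table 3.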

| | PCA + SVM | DE [35] | PSD [35] | rEEG [23] | EEGNET [15] | SyncNet |
|---|---|---|---|---|---|---|
| Behavior | 0.911 | 0.874 | 0.858 | 0.353 | 0.439 | 0.946 |
| Genotype | 0.724 | 0.771 | 0.761 | 0.449 | 0.689 | 0.926 |

Table 3: Comparison between different methods on an LFP dataset.

Results from these two predictive tasks are shown in Table 3. SyncNet used K = 20 filters of length 40. These results demonstrate that SyncNet adapts straightforwardly to both EEG and LFP data. The LFP data will be released with publication of the paper.

6 Conclusion

We have proposed SyncNet, a new framework for EEG and LFP data classification that learns interpretable features. In addition to our original architecture, we have proposed a GP adapter to unify electrode layouts. Experimental results on both LFP and EEG data show that SyncNet outperforms conventional CNN architectures and all compared classification approaches. Importantly, the features from SyncNet can be clearly visualized and described, allowing them to be used to understand the dynamics of neural activity.

Acknowledgements

In working on this project, L.C. received funding from the DARPA HIST program; K.D., L.C., and D.C. received funding from the National Institutes of Health through grant R01MH099192-05S2; K.D. received funding from the W.M. Keck Foundation; and G.D. received funding from the Marcus Foundation, Perkin Elmer, the Stylli Translational Neuroscience Award, and NICHD 1P50HD093074.

References

[1] A. S. Aghaei, M. S. Mahanta, and K. N. Plataniotis. Separable common spatio-spectral patterns for motor imagery BCI systems. IEEE TBME, 2016.

[2] P. Bashivan, I. Rish, M. Yeasin, and N. Codella. Learning representations from EEG with deep recurrent-convolutional neural networks. arXiv:1511.06448, 2015.

[3] A. M. Bastos and J.-M. Schoffelen. A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Frontiers in Systems Neuroscience, 2015.

[4] E. V. Bonilla, K. M. A. Chai, and C. K. Williams. Multi-task Gaussian process prediction. In NIPS, volume 20, 2007.

[5] W. Bosl, A. Tierney, H. Tager-Flusberg, and C. Nelson. EEG complexity as a biomarker for autism spectrum disorder risk. BMC Medicine, 2011.

[6] J. Bruna and S. Mallat. Invariant scattering convolution networks. IEEE PAMI, 2013.

[7] G. Dawson, J. M. Sun, K. S. Davlantis, M. Murias, L. Franz, J. Troy, R. Simmons, M. Sabatos-DeVito, R. Durham, and J. Kurtzberg. Autologous cord blood infusions are safe and feasible in young children with autism spectrum disorder: Results of a single-center phase I open-label trial. Stem Cells Translational Medicine, 2017.

[8] A. Delorme and S. Makeig. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neuroscience Methods, 2004.

[9] R.-N. Duan, J.-Y. Zhu, and B.-L. Lu. Differential entropy feature for EEG-based emotion classification. In IEEE/EMBS Conference on Neural Engineering. IEEE, 2013.

[10] N. Gallagher, K. Ulrich, K. Dzirasa, L. Carin, and D. Carlson. Cross-spectral factor analysis. In NIPS, 2017.

[11] R. Hultman, S. D. Mague, Q. Li, B. M. Katz, N. Michel, L. Lin, J. Wang, L. K. David, C. Blount, R. Chandy, et al. Dysregulation of prefrontal cortex-mediated slow-evolving limbic dynamics drives stress-induced emotional pathology. Neuron, 2016.

[12] V. Jirsa and V. Müller. Cross-frequency coupling in real and virtual brain networks. Frontiers in Computational Neuroscience, 2013.

[13] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.

[14] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras. DEAP: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing, 2012.

[15] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance. EEGNet: A compact convolutional network for EEG-based brain-computer interfaces. arXiv:1611.08024, 2016.

[16] S. C.-X. Li and B. M. Marlin. A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. In NIPS, 2016.

[17] W. Liu, W.-L. Zheng, and B.-L. Lu. Emotion recognition using multimodal deep learning. In International Conference on Neural Information Processing. Springer, 2016.

[18] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi. A review of classification algorithms for EEG-based brain computer interfaces. Journal of Neural Engineering, 2007.

[19] K.-R. Müller, M. Tangermann, G. Dornhege, M. Krauledat, G. Curio, and B. Blankertz. Machine learning for real-time single-trial EEG-analysis: from brain computer interfacing to mental state monitoring. J. Neuroscience Methods, 2008.

[20] M. Murias, S. J. Webb, J. Greenson, and G. Dawson. Resting state cortical connectivity reflected in EEG coherence in individuals with autism. Biological Psychiatry, 2007.

[21] E. Nurse, B. S. Mashford, A. J. Yepes, I. Kiral-Kornek, S. Harrer, and D. R. Freestone. Decoding EEG and LFP signals using deep learning: heading TrueNorth. In ACM International Conference on Computing Frontiers. ACM, 2016.

[22] R. Oostenveld, P. Fries, E. Maris, and J.-M. Schoffelen. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011.

[23] D. O'Reilly, M. A. Navakatikyan, M. Filip, D. Greene, and L. J. Van Marter. Peak-to-peak amplitude in neonatal brain monitoring of premature infants. Clinical Neurophysiology, 2012.

[24] A. Page, C. Sagedy, E. Smith, N. Attaran, T. Oates, and T. Mohsenin. A flexible multichannel EEG feature extractor and classifier for seizure detection. IEEE Circuits and Systems II: Express Briefs, 2015.

[25] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, 2016.

[26] Y. Qi, Y. Wang, J. Zhang, J. Zhu, and X. Zheng. Robust deep network with maximum correntropy criterion for seizure detection. BioMed Research International, 2014.

[27] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko. Semi-supervised learning with ladder networks. In NIPS, 2015.

[28] O. Tsinalis, P. M. Matthews, Y. Guo, and S. Zafeiriou. Automatic sleep stage scoring with single-channel EEG using convolutional neural networks. arXiv:1610.01683, 2016.

[29] K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for cross-spectrum analysis. In NIPS, 2015.

[30] M. J. van Putten. The revised brain symmetry index. Clinical Neurophysiology, 2007.

[31] H. Yang, S. Sakhavi, K. K. Ang, and C. Guan. On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification. In EMBC. IEEE, 2015.

[32] Y. Yang, E. Aminoff, M. Tarr, and K. E. Robert. A state-space model of cross-region dynamic connectivity in MEG/EEG. In NIPS, 2016.

[33] N. Zhang, W.-L. Zheng, W. Liu, and B.-L. Lu. Continuous vigilance estimation using LSTM neural networks. In International Conference on Neural Information Processing. Springer, 2016.

[34] W.-L. Zheng and B.-L. Lu. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development, 2015.

[35] W.-L. Zheng, J.-Y. Zhu, Y. Peng, and B.-L. Lu. EEG-based emotion classification using deep belief networks. In IEEE ICME. IEEE, 2014.

[36] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao. Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management. Springer, 2014.