# on_linearly_constrained_minimum_variance_beamforming__cf37772d.pdf

Journal of Machine Learning Research 16 (2015) 2099-2145 Submitted 10/12; Published 12/15

On Linearly Constrained Minimum Variance Beamforming

Jian Zhang jz79@kent.ac.uk Chao Liu cl304@kent.ac.uk School of Mathematics, Statistics and Actuarial Science University of Kent Canterbury, Kent CT2 7NF, UK

Editor: Xiaotong Shen

Beamforming is a widely used technique for source localization in signal processing and neuroimaging. A number of vector-beamformers have been introduced to localize neuronal activity by using magnetoencephalography (MEG) data in the literature. However, the existing theoretical analyses on these beamformers have been limited to simple cases, where no more than two sources are allowed in the associated model and the theoretical sensor covariance is also assumed known. The information about the eﬀects of the MEG spatial and temporal dimensions on the consistency of vector-beamforming is incomplete. In the present study, we consider a class of vector-beamformers deﬁned by thresholding the sensor covariance matrix, which include the standard vector-beamformer as a special case. A general asymptotic theory is developed for these vector-beamformers, which shows the extent of eﬀects to which the MEG spatial and temporal dimensions on estimating the neuronal activity index. The performances of the proposed beamformers are assessed by simulation studies. Superior performances of the proposed beamformers are obtained when the signalto-noise ratio is low. We apply the proposed procedure to real MEG data sets derived from ﬁve sessions of a human face-perception experiment, ﬁnding several highly active areas in the brain. A good agreement between these ﬁndings and the known neurophysiology of the MEG response to human face perception is shown.

Keywords: MEG neuroimaging, vector-beamforming, sparse covariance estimation, source localization and reconstruction

1. Introduction

MEG is a non-invasive imaging technique that records brain activity with high temporal resolution. Postsynaptic current ﬂow within the dendrites of active neurons generates a magnetic ﬁeld that can be measured close to the scalp surface by use of sensors (H am al ainen et al., 1993). The magnitude of these measured ﬁelds is directly related to neuronal current strength, and hence their measurement will reﬂect the amplitude of brain activity. The major challenge, however, is to localize active regions inside the head, given the measured magnetic ﬁelds outside the head (i.e., given MEG data). This is an ill-posed problem of source localization since the magnetic ﬁelds could be caused by an inﬁnite number of neuronal regions. Mathematically, the problem can be stated as follows: one observes a vector of time-series Y(t) = (Y1(t), ..., Yn(t))T Rn, t = tj, 1 j J from n sensors,

c 2015 Jian Zhang and Chao Liu.

Zhang and Liu

which are linked to candidate sources located at rk, 1 k p in the brain via the model

k=1 Hkmk(t) + ε(t), (1.1)

where Hk is an n 3 lead ﬁeld matrix at rk (i.e., the unit output of the candidate source at location rk, which is derived from Maxwell s equations), mk(t) with covariance matrix Σk is a 3 1 moment (time-course) at time t and location rk, ε(t) with covariance matrix σ2 0In represents white noises at the MEG channels, and In is the n n identity matrix. See Mosher et al. (1999) for more details. In practice, when candidate source locations (i.e., voxels) are created by discretizing the source space in the brain, the number of these sources can be substantially larger than the number of available sensors. Moreover, unlike the traditional functional data, not only source time courses but also sensor readings are spatially correlated. Therefore, searching for a small set of latent sources of non-null powers from a large number of candidates poses a challenge to standard i.i.d. sample-based methods in functional data analysis (Ramsay, 2006). Here, the source power at location rk is referred as the trace of the covariance matrix Σk. Two types of approaches have been proposed for handling the above problem in the literature: global approach and local approach (e.g., Henson et al., 2011; Bolstad et al., 2009; Van Veen et al., 1997; Robinson, 1999; Huang et al., 2004; Quraan et al., 2011). In the global approach, one puts all candidate sources into the model and solves a sparse estimation problem. In the local approach, on other hand, one invokes a list of local models, each is tailored to a particular candidate region. The global approach often requires to specify parametric models, while the local approach is model-free. When the number of candidate sources p is small or moderate compared to the number of available sensors n, one may use a Bayesian method to infer latent sources, with helps of computationally intensive algorithms (e.g., Henson et al., 2011). To make an accurate inference, a large p should be chosen. However, when p is large, the global approach may be computationally intractable and the local approach is preferred. Here, we focus on the so-called linearly constrained minimum variance (LCMV) beamforming (also called vector-beamforming), a local method for solving the above large-p-small-n problem. It involves two steps as follows:

Projection step. For location rk in the source space, one searches for the optimal n 3 weighting-matrix W by minimizing the trace of the sample covariance of the projected data W T Y (tj), 1 j J, subject to W T Hk = I3, where I3 is a 3 3 identity matrix. This gives the optimal trace

ˆSk = tr([HT k ˆC 1Hk] 1), (1.2)

where ˆC is a sensor covariance estimator and for any invertible matrix A, A 1 denotes its inverse, and tr( ) stands for the matrix trace operator. See Van Veen et al. (1997) for the details.

Mapping step. For each location rk, calculate the neuronal activity index ˆSk/(σ2 0tr([HT k Hk] 1)), where σ2 0 is estimated by certain baseline noise data such as the pre-stimulus data. Plot the index against the grid points, creating a neuronal activity map over a given temporal window.

LCMV Beamforming

In the projection step, the procedure aims at estimating the desired signal from each chosen location while minimizing the contributions of other unknown locations in the presence of noises by optimizing the variation of the projected data. This can be easily seen from the following decomposition of the projected covariance under the constrain W T Hk = I3:

tr cov(W T Y(t)) = tr(Σk) + tr(W T cov( X

j =k Hjmj(t) + ε(t))W)

+2tr(cov(mk(t), W T ( X

j =k Hjmj(t) + ε(t)))),

where the ﬁrst term is the underlying signal strength at rk and the last two terms are the contributions of other locations and background noises to the estimated strength of the signal at rk. Therefore, minimizing the trace of the projected covariance of the data with respect to W is equivalent to minimizing the the contributions of other locations and background noises to estimating the true signal strength at rk. The further mathematical details can be found in Sekihara and Nagarajan (2008). As pointed out before, in practice, we often have the baseline noise data. Performing the above projection procedure on the noise data under the assumption that the noise covariance matrix is approximately σ2 0In, we obtain the optimal trace of the covariance matrix of the projected noise at rk, σ2 0tr([HT k Hk] 1). This implies that the above neuronal activity index is a signal-to-noise ratio (SNR) at location rk. Therefore, the map generated in the mapping step is a SNR map. A similar formula can be derived under a general model of the noise covariance. However, to avoid highdimensional eﬀects on estimating sensor covariance matrices, we often employed a diagonal noise covariance model even when the true one is not diagonal. Both theoretical and empirical studies have suggested that the vector-beamforming can provide excellent performance given a suﬃcient number of observations (e.g., Sekihara et al., 2004; Brookes et al., 2008; Quraan et al., 2011). However, the existing theoretical analyses have been limited to simple cases, where no more than two sources are allowed in the model and the theoretical sensor covariance is assumed known. In limited data scenarios the estimated sensor covariance may possess considerable variation and thus deteriorate the performance of localization. Empirical studies have also demonstrated that the sampling window and rate are generally required to increase as the number of spatial sensors increases. For example, when using the sample covariance matrix to estimate the sensor covariance matrix, the number of statistically independent data records should be three or more times the number of sensors in order to obtain statistically stable source location estimates (e.g., Rodr ıguez-Rivera et al., 2006). Consequently, the potential advantages of having a large number of sensors are oﬀset by the requirement for increased sampling window and rate. Therefore, it is important to develop a general framework for users to examine the extent of eﬀects to which the spatial dimension (i.e., the lead ﬁeld matrix) and the temporal dimension (i.e., the temporal correlations of sensor measurements) of MEG on the accuracy of source localization. Furthermore, most brain activities are conducted by neural networks which consist of multiple sources. For example, in the so-called evoked median-nerve MEG response study, scientists have found the relatively large number of neuronal sources activated in a relatively short period of time by the median-nerve stimulation with typical repetition rates, which challenges covariance-based analysis techniques such as beamformer due to source cancellations (Huang et al., 2004). We need to understand how the accuracy

Zhang and Liu

of localization is aﬀected by source cancellations both theoretically and empirically. In particular, we need to address the fundamental questions of whether the neuronal activity map can reveal the true sources when the number of sensors and the width of the sampling window are large enough and of how much multiple source cancellation eﬀects are reduced by increasing spatial and temporal dimensions of MEG. The goal of the present study is to demonstrate at both theoretical and empirical levels the behavior of a class of vector-beamforming techniques which includes the standard vectorbeamformer as a special example. These beamformers are based on thresholding the sample sensor covariance matrix. By thresholding, we aim at reducing the noise level in the sample sensor covariance. We provide an asymptotic theory on these beamformers when the sensor covariance matrix is consistently estimated and when multiple sources exist. We show that the estimated source power is consistent when multiple sources are asymptotically separable in terms of a lead ﬁeld distance. We further assess the performance of the proposed procedure by both simulations and real data analyses. The paper is organized as follows. The details of the proposed procedures are given in Section 2. The asymptotic analysis is provided in Section 3. Other covariance estimatorbased beamformers are introduced in Section 4. The simulation studies on these beamformers and an application to face-perception data are conducted in Section 5. The discussion and conclusion are made in Section 6. The proofs of the theorems and corollaries are deferred to Section 7. Throughout the paper, let ||A|| denote the operator norm of matrix A. For a sequence of matrix An, we mean by An = O(1) that ||An|| is bounded and by An = o(1) that ||An|| = o(1). Similarly, we deﬁne the notations Op and op for a sequence of random matrices An. For non-negative matrices A and B, we say A < B if a T Aa < a T Ba for any a with ||a|| = 1. We say that random matrix An is asymptotically larger than random matrix Bn in probability if min||a||=1 a T (An Bn)a is asymptotically bounded below from zero in probability.

2. Methodology

Suppose that the sensor measurements (Y(tj) : 1 j J) are weakly stationary timecourses observed from n sensors. We want to identify a small set of non-null sources that underpin these observations. To this end, we introduce a family of vector-beamformers based on thresholding sensor covariance as follows.

2.1 Thresholding the sensor covariance matrix

The sensor covariance matrix of Y(t), C can be estimated by the sample covariance matrix

ˆC = (ˆcij) = 1

j=1 Y(tj)Y(tj)T Y YT ,

where Y is the sample mean of (Y(tj) : 1 j J). It is well-known that the sample covariance estimator can breakdown when the dimension n is large (Bickel and Levina, 2008). In the statistical literature (Bickel and Levina, 2008), various sparse estimation procedures have been proposed to ﬁx the sample covariance, including the following thresholded

LCMV Beamforming

estimator: ˆC(τn J) = (ˆcij(τn J))

with ˆcij(τn J) = ˆcij I(|ˆcij| τn J), where τn J is a varying constant in n and J. As with the i.i.d. case (Bickel and Levina, 2008), the above thresholded estimator will be shown to converges to positive deﬁnite limit with probability tending to 1 in the Lemma 7.2 in Section 7 below. Although the thresholded estimator has good theoretical properties, it may not be always positive deﬁnite when the sample size is ﬁnite or when sensors are spatially too close to each other. To tackle the issue, we assume that ˆC(τn J) has the eigendecomposition ˆC(τn J) = Pn k=1 ˆλkv T k vk and then a positive semideﬁnite estimator can be obtained by setting these negative eigenvalues to zeros. We further shrinkage the covariance matrix estimator by artiﬁcially adding ϵ0In to it in our implementation, where we choose ϵ0 to be a tuning constant which is equal to or slightly larger than the maximum eigenvalue of the noise covariance matrix. We will show in the following sections that adding ϵ0In to the thresholded covariance matrix does not aﬀect the consistency of the neuronal activity index.

2.2 Beamforming

As before, let Σk denote the covariance matrix of the moment mk(t) at the location rk. Based on the thresholded sensor covariance estimator ˆC(τn J), we estimate Σk, 1 k p and create a neuronal activity map in the following two steps. In the projection step, for 1 k p, we search for an n 3 weight matrix ˆWk which attains the minimum trace of W T ˆC(τn J)W subject to W T Hk = I3. When ˆC(τn J) is invertible, it follows from Van Veen et al. (1997) that

ˆWk = ˆC(τn J) 1Hk h HT k ˆC(τn J) 1Hk i 1

with the resulting moment covariance matrix and trace estimators

ˆΣk = h HT k ˆC 1(τn J)Hk i 1 , ˆSk = tr h HT k ˆC(τn J) 1Hk i 1

respectively. In the mapping step, we calculate the so-called neuronal activity index

NAI(rk) = ˆSk/ σ2 0tr HT k Hk 1 ,

creating a brain activity map, where σ2 0 is estimated from baseline data (i.e., called prestimulus data in the next subsection). One of the underlying sources can be then estimated by the global peak on the map with the associated latent time-course estimated by projecting the data along the optimal weighting vector. The multiple sources can also be identiﬁed by grouping the local peaks on the transverse slices of the brain.

2.3 Choosing the thresholding level

In practice, the MEG imaging is often run on a subject ﬁrst without stimulus and then with stimulus. This allows us to calculate the sample covariance ˆC for the stimulus data

Zhang and Liu

as well as the sample covariance ˆC0 for the pre-stimulus data. The latter can provide an estimator of the background noise level. In the next section, we will show that the convergence rate of the thresholded sample covariance is O( p

log(n)/J). In light of this, we set τn J = c0ˆσ2 0 p

log(n)/J with a tuning constant c0 and threshold ˆC by τn J, where ˆσ2 0 is the minimum diagonal element in ˆC0 and c0 is a tuning constant. Note that, when c0 = 0, the proposed procedure reduces to the standard vector-beamformer implemented in the software Field Trip (Oostenveld et al., 2010). For each value of c0, we apply the proposed procedure to the data and calculate the maximum neuronal activity index

NAIc0 = max{NAI(r) : r is running over the grid}. (2.3)

In simulations, we will show that c0 D0 = {0, 0.5, 1, 1.5, 2} has covered its useful range. Our simulations also suggests that there is an optimal value of c0, which depends on several factors including the strengths of signals and source interferences. To exploit these two factors, we choose c0 in which NAIc0 attains maximum or minimum, resulting in two procedures called ma and mi respectively. By choosing c0, the procedure ma intends to increase the maximum SNR value, while the procedure mi tries to reduce source interferences. The simulation studies in Section 5 suggest that mi can perform better than ma when sources are correlated.

2.4 Two sets of stimuli

Suppose now that MEG measurements (Y(1)(t)) and (Y(2)(t)) are made under two different sets of stimuli and pre-stimuli with the associated neuronal activity indices denoted by NAI(1)(rk) and NAI(2)(rk) respectively. The previous strategy for selecting the tuning constant c0 can be adopted here when we calculate these indices. To identify source locations that respond to the change of stimulus set, we calculate a log-contrast log(NAI(1)(rk)/NAI(2)(rk)) between the two sets of stimuli at location rk, 1 k p, creating a log-contrast map. The resulting log-contrast map is equivalent to the map based on index ratio NAI(1)(rk)/NAI(2)(rk), which was often seen in the literature (e.g., Hillebrand et al., 2005). We further take the global peak of the log-contrast as the maximum location estimator for a source location that contributes to the diﬀerence between the two sets of MEG measurements.

In this section, we develop a theory on the consistency as well as the convergence rate of the hard thresholding-based beamformer estimator under regularity conditions. In particular, we show that the consistency holds true under regularity conditions if we let the hard threshold τn J = A p

log(n)/J with constant A. This provides a theoretical basis for using the proposed procedures ma and mi. Without loss of generality, we assume that the ﬁrst q p moment vectors are of nonzero covariance matrices Σk, 1 k q, where q is unknown and often much smaller than p in practice. For the simplicity of mathematical derivations, we also assume that Σk does

LCMV Beamforming

not grow with the number of sensors n. Our task is to identify the unknown true model

k=1 Hkmk(t) + ε(t), (3.4)

from the working model (1.1) by using the proposed procedure, where the unknown moments mk(t), 1 k q are of non-zero powers tr(Σk), 1 k q. To establish a theory for the proposed procedures, we assume that (A1): Both the moment vectors (mk(t) : 1 k q) and the white noise process (ε(t)) are stationary with zero means and temporally uncorrelated with each other. Also, mk(t) is temporally uncorrelated with mj(t) for k = j. Under Condition (A1), the sensor covariance matrix of Y(t), C can be expressed in the form

k=1 HkΣk HT k + σ2 0In.

As pointed out by Sekihara and Nagarajan (2008, Chapter 9), Condition (A1) is one of fundamental assumptions in the vector-beamforming. However, source activities in the brain are inevitably correlated to some degree, and in strict sense, (A1) cannot be satisﬁed. The theoretical inﬂuence of temporally correlated sources has been investigated by Sekihara and Nagarajan (2008, Chapter 9). The equation (9.3) in Sekihara and Nagarajan (2008, Chapter 9) implies that the inﬂuence can be ignored if the partial correlations between sources are close to zeros in order of o(1/n) when the number of sensors n is suﬃciently large. Note that although in practice the number of sensors is limited to a few hundreds, we still ideally let n tend to inﬁnity to identify potential spatial factors that aﬀect the performance of a vector-beamformer. In the next section, by using simulations, we will demonstrate that the source correlations can mask some true sources. To show the consistency of the estimators ˆΣk and Sk, we need more notations and condition as follows. Let Hk denote the lead ﬁeld matrix at the location rk. For the simplicity of the technical derivations later, we further assume that the lead ﬁeld matrices satisfy the condition that for any location rk, HT k Hk/n G in terms of the operator norm as n tends inﬁnity, where G is a 3 3 positive deﬁnite matrix. Under the above condition, we can ﬁnd a positive deﬁnite matrix Qk satisfying that HT k Hk = n Qk QT k and Q 1 k HT k Hk Q T k = n I3 when n is large enough, where I3 is an identity matrix. Letting H k = Hk Q T k , m k(t) = QT k mk and Σ k = QT k Σk Qk, we reparametrize the model (1.1) as follows:

k=1 H km k + ε(t)

with the covariance matrix C = Pp k=1 H kΣ k H T k + σ2 0In. Then, under the reparametrized model, the estimators

ˆΣ k = h H T k ˆC(τn J) 1H k i 1 = h Q 1 k HT k ˆC(τn J) 1Hk Q T k i 1 = QT k ˆΣk Qk.

ˆSk = tr(Q T k ˆΣ k Q 1 k ).

Zhang and Liu

Consequently, ˆΣ k is consistent with Σ k if and only if ˆΣk is consistent with Σk. Therefore, without loss of generality, hereinafter we assume that (A2): HT k Hk = n I3 for any location rk. We process the remaining analysis in two stages: In the ﬁrst stage, we develop an asymptotic theory for the proposed vector-beamformers when the sensor covariance matrix C is known. The sensor covariance matrix can be assumed known if the width of the sampling window can be arbitrarily large. In the second stage, we extend the theory to the case where C is estimated by ˆC(τn J).

3.1 Beamformer analysis when C is known

We begin with introducing some more notations. For any locations rx and ry, let Hx and Hy denote their lead ﬁeld matrices. Deﬁne the lead ﬁeld coherent matrix by ρxy = ρ(rx, ry) = HT x Hy/n. Note that ρxy +ρyx = I3 (Hx Hy)T (Hx Hy)/(2n). Therefore, I3 (ρxy +ρyx) indicates how close rx is to ry. In general, the partial coherence factor matrices (or called partial correlation matrices) ayx|k, 1 k q are deﬁned iteratively by the so-called sweep operation (Goodnight, 1979) as follows:

ayx|1 = σ 2 0 ρ(ry, r1, rx) = σ 2 0 (ρ(ry, rx) ρ(ry, r1)ρ(r1, rx)) ,

ayx|(k+1) = ayx|k ay(k+1)|k a(k+1)(k+1)|k 1 a(k+1)x|k, 1 k q 1.

For example, we have

σ2 0ayx|1 = ρyx ρy1ρ1x, σ2 0a22|1 = I3 ρT 12ρ12,

σ2 0a33|2 = I3 ρT 13ρ13 ρ23 ρT 12ρ13 T I3 ρT 12ρ12 1 ρ23 ρT 12ρ13 .

Note that σ2 0a(k+1)(k+1)|k gauges the partial variability of rk+1 given the previous r ks while σ2 0ayx|(k+1) shows the partial coherence between rx and ry given {r1, ..., rk+1}. We expect that ayx|(k+1) will be small if ry and rx are spatially far away from each other. We deﬁne byx|k, 1 k q, by letting byx|1 = ρy1Σ 1 1 ρ1x and

byx|k = byx|(k 1) byk|(k 1) akk|(k 1) 1 akx|(k 1) ayk|(k 1) akk|(k 1) 1 bkx|(k 1)

+ayk|(k 1) akk|(k 1) 1 Σ 1 k + bkk|(k 1) akk|(k 1) 1 akx|(k 1).

We also deﬁne cjj|k, 1 j k q by

( Σ 1 k akk|(k 1) 1 Σ 1 k , j = k cjj|(k 1) bjk|(k 1) akk|(k 1) 1 b T jk|(k 1), 1 j k 1.

Let anq = n min1 k q 1 ||a(k+1)(k+1)|k||, and let km = 0 if anq and km = min{1 k q 1 : n||a(k+1)(k+1)|k|| = O(1)} if anq = O(1). Let dx|q = max2 k q ||akx|(k 1)a 1 kk|(k 1)||, which measures the maximum absolute partial correlation among q sources by using their lead ﬁeld matrix. As the lead ﬁeld matrix measures the unit outputs of sources recorded by sensors, the maximum absolute partial correlation may increase when the number of sensors n increases. In the following theorem, for any location rx of interest, the condition that

LCMV Beamforming

dx|q = O(1) (i.e., the maximum absolute partial correlation will be bounded) is imposed on the lead ﬁeld matrix. The condition is used to ensure the coherence stability for the grid approximation to the lead ﬁeld. Our numerical experience suggests that the condition roughly holds when the underlying sources are asymptotically not close to each other. See the discussion in Section 7. The following theorem shows when the source covariance estimator is consistent and when it is not.

Theorem 1 Under Conditions (A1) (A2) and C is known, we have:

(1) If anq = O(1) and max1 k q dk|q = O(1), then the estimated source covariance at

rkm+1 Hkm+1T C 1Hkm+1 1 is asymptotically larger than Σkm+1.

(2) If anq , then for 1 j q, the estimated source covariance at rj admits

HT j C 1Hj 1 = Σj + 1

nΣjcjj|qΣj + O(a 2 nq ),

provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n .

(3) If anq , then for rx {r1, ..., rq}, the estimated source covariance at rx admits

HT x C 1Hx 1 = 1 na 1 xx|q 1

n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq ),

provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to inﬁnity, where bxx|q = O(1) as n .

The following lemma shows when the source power estimator is consistent.

Corollary 2 Under Condition (A1) (A2), we have:

(1) If anq = O(1) and max1 k q dk|q = O(1), then the estimated source power at rkm+1 tr Hkm+1T C 1Hkm+1 1 is asymptotically larger than tr (Σkm+1).

(2) If anq , then for 1 j q, the estimated source power at rj admits

tr HT j C 1Hj 1 = tr(Σj) + 1

ntr(Σjcjj|qΣj) + O(a 2 nq ),

provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n .

(3) If anq , then for rx {r1, ..., rq}, the estimated source power at rx admits

tr HT x C 1Hx 1 = 1 ntr(a 1 xx|q) 1

n2 tr(a 1 xx|qbxx|qa 1 xx|q) + O(a 3 nq ),

provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to inﬁnity, where bxx|q = O(1) as n .

Zhang and Liu

Remark 3 It follows from the deﬁnition that cjj|q is proportional to σ2 0, which implies the convergence rate of the neuronal activity index is of order O(σ2 0/(σ2 0anq)), where σ2 0anq is independent of σ2 0. Therefore, the eﬀect of adding ϵ0In to C on the above convergence rate is increasing or decreasing the rate by the amount of O(ϵ0/((σ2 0 + ϵ0)anq)). In particular, adding ϵ0In to C does not aﬀect the consistency of the neuronal activity index if anq tends inﬁnity.

Remark 4 From the proof in Section 7, we can see that if we relax the coherence stability condition max1 k q dk|q = O(1) to max1 k q dk|q = O(log(n)), then the convergence rates in the theorem will be reduced by a factor of log(n).

Remark 5 If there are MEG measurements made under two diﬀerent sets of stimuli and pre-stimuli, we let C(1) = Pp k=1 HT k Σ(1) k Hk + σ2 01In and C(2) = Pp k=1 HT k Σ(2) k Hk + σ2 02In be the corresponding sensor covariance matrices. We perform the proposed beamformers on C(1) and C(2) respectively. Then, under certain conditions, Theorem 1 can be extended to this setting. When rk is a source location for both sets of stimuli, the log-contrast tends to the true one as n ; when rk is a source for stimulus set 1 but not for stimulus set 2, the log-contrast tends to inﬁnite; when rk is a source location for stimulus set 2 but not for stimulus set 1, the log-contrast tends to ; when rj is neither a source for stimulus set 1 nor a source for stimulus 2, the log-contrast tends to a ﬁnite value depending on the associated values of axx|q. The details are omitted here.

3.2 Beamformer analysis when C is estimated

We now estimate the sensor covariance matrix by using the sensor observations over J time instants. Following Bickel and Levina (2008) and Fan et al. (2011), we establish the asymptotic theory for the resulting beamformer estimators when both n and J are tending to inﬁnity. In addition to Conditions (A1) and (A2), we need the following two conditions for conducting the asymptotic analysis above. The ﬁrst one is imposed to regularize the tail behavior of the sensor processes.

(A3): There exist positive constants κ1 and τ1 such that for any u > 0 and all t,

max 1 i n P(||Yi(t)|| > u) exp(1 τ1uκ1)

and max1 i n E||Yi(t)||2 < + , where the noise covariance matrix is σ2 0In and || || is the L2 norm.

Note that Condition (A3) holds if Y(t) is a multivariate normal. In the second additional condition, we assume that the sensor processes are strong mixing. Let F0 and F k denote the σ-algebras generated by {Y(t) : t 0} and {Y(t) : t k} respectively. Deﬁne the mixing coeﬃcient

α(k) = sup A F0 ,B F k |P(A)P(B) P(AB)|.

The mixing coeﬃcient α(k) quantiﬁes the degree of the temporal dependence of the process {Y(t)} at lag k. We assume that α(k) is decreasing exponentially fast as lag k is increasing.

LCMV Beamforming

(A4): There exist positive constants κ2 and τ2 such that

α(k) exp( τ2kκ2).

Condition (A4) is a commonly used assumption for studying asymptotic behavior of time series. For a constant A, let τn J = A p

log(n)/J. As before, let Yi be the sample mean of the i-th sensor and

j=1 (Yi(tj) Yi)(Yk(tj) Yk), ˆC(τn J) = (ˆcik I(ˆcik τn J)),

where I( ) is the indicator. We are now in position to generalize Theorem 1 to the case where the sensor covariance is estimated by the thresholded covariance estimator.

Theorem 6 Under Conditions (A1) (A4) and assuming that n2p

log(n)/J = o(1) as n and J tend to inﬁnity, we have:

(1) If anq = O(1) and max1 k q dk|q = O(1), then as n and J tend to inﬁnity, the estimated source covariance at rkm+1 ˆΣkm+1 is asymptotically larger than Σkm+1 in probability.

(2) If anq , then as n and J tend to inﬁnity, for 1 j q, the estimated source covariance at rj admits

ˆΣj = Σj + 1

nΣjcjj|qΣj + Op(a 2 nq + n2p

provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n .

(3) If anq , then as n and J tend to inﬁnity, for rx {r1, ..., rq}, the estimated source covariance at rx admits

ˆΣx = 1 na 1 xx|q 1

n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq + n2p

provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to inﬁnity, where bxx|q = O(1) as n .

Corollary 7 Under Conditions (A1) (A4) and assuming that n2p

log(n)/J = o(1) as n and J tend to inﬁnity, we have:

(1) If anq = O(1), max1 k q dk|q = O(1), as n and J tend to inﬁnity, the estimated source power at rkm+1, ˆSkm+1 is asymptotically larger than tr (Σkm+1).

(2) If anq , then as n and J tend to inﬁnity, for 1 j q, the estimated source power at rj admits

ˆSj = tr(Σj) + 1

ntr(Σjcjj|qΣj) + O(a 2 nq + n2p

provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n .

Zhang and Liu

(3) If anq , then as n and J tend to inﬁnity, for rx {r1, ..., rq}, the estimated source power at rx admits

ˆSx = 1 ntr(a 1 xx|q) 1

n2 tr(a 1 xx|qbxx|qa 1 xx|q) + O(a 3 nq + n2p

provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to inﬁnity, where bxx|q = O(1) as n .

Remark 8 Theorem 6 indicates the convergence rate of the vector-beamformer estimation is much slower than the empirical rate suggested by Rodr ıguez-Rivera et al. (2006). However, the result is in agreement with an empirical result of Brookes et al. (2008). In fact, using their heuristic arguments, we can show that the error of the power estimation at location rx is determined by the factor Hx( ˆC(τn J) 1 C 1)Hx, which has a rate of n2p

log(n)/J. Theorem 6 can be also extended to the scenarios where MEG data are obtained under two diﬀerent sets of stimuli.

Remark 9 From the proof of Theorem 6, we can see that the thresholded covariance is still consistent with the true C even when the underlying sources are correlated.

4. Other covariance estimator-based beamformers

There are various ways to estimate the sensor covariance matrix. Each can be used to construct a beamformer. These covariance estimators can be roughly divided into two categories, namely global shrinkage-based methods and elementwise thresholding-based methods. In shrinkage-based settings, the sample covariance is shrinking toward a target structure (for example, a diagonal matrix). The so-called optimal shrinkage estimator belongs to this category (Ledoit and Wolf, 2004). In thresholding-based settings, an elementwise thresholding is applied to the sample covariance estimator. Examples of these approaches include hard thresholding, generalized thresholding and adaptive thresholding (Bickel and Levina, 2008; Rothman et al., 2009; Cai and Liu, 2011). Here, we focus on the following three methods recommended by the above authors. The optimal shrinkage covariance matrix is deﬁned by

ˆCopt = b2 n d2n µn In + d2 n b2 n d2n ˆC,

µn = D ˆC, In E , d2 n = D ˆC µn In, ˆC µn In E ,

b2 n = 1 J2

D Yj YT j ˆC, Yj YT j ˆC E , b2 n = min( b2 n, d2 n),

and the operator < A, B >= tr(ABT )/n for any n n matrices A and B. The idea behind the above estimator is to ﬁnd the optimal weighted average of the sample covariance matrix ˆC and the identity matrix via minimizing the expected squared loss. Under certain

LCMV Beamforming

conditions ˆCopt converges to the true covariance C as n tends inﬁnity, implying that ˆCopt can be degenerate if C is degenerate (Ledoit and Wolf, 2004). As before, we tackle the issue by adding ϵ0In to ˆCopt, where ϵ0 is determined by the maximum eigenvalue of the pre-stimulus sample covariance matrix. The beamformer based on the above covariance estimator is denoted as sh. A family of generalized thresholding-based covariance estimators indexed by tuning constants c0 0 and δ0 > 0 can be deﬁned by replacing the hard thresholding in Subsection 2.1 with the generalized thresholding, i.e.,

ˆCg = (g(ˆcij))

with g(ˆcij) = ˆcij(1 (τn J/|ˆcij|)δ0), where τn J = c0ˆσ2 0 p

log(n)/J and ˆσ2 0 is estimated from a baseline sample. Following the suggestion of Rothman et al. (2009), we choose δ0 = 4. The same maximum/minimum strategy as in Subsection 2.3 can be adapted to choose the tuning constant c0 when we use the above estimator to construct a beamformer. The corresponding beamformers are denoted by gmax and gmin respectively. Similarly, an adaptive thresholding estimator can be introduced by replacing the above

τn J in the g function by λij = 2 q

ˆθij log(n)/J, where ˆθij is the estimated variance of the (i, j)-th entry ˆcij and is deﬁned by

k=1 [(Yik Yi)(Yjk Yj) ˆcij]2

and Yi and Yj are the sample means of the i-th and the j-th sensors. See Cai and Liu (2011). The corresponding beamformer is denoted by adp.

5. Numerical results

In this section, we compare the proposed procedures to the standard vector-beamformer (with the tuning c0 = 0) and to the other covariance estimator-based beamformers in terms of localization bias by simulation studies and real data analyses. Here, for any estimator ˆr of a source location r, the localization bias |ˆr r| is the L1 distance between ˆr and r. The spatial correlation ρmax between locations r1 and r2 is measured by the maximum correlation between the projected lead ﬁeld vectors at r1 and r2:

ρmax(r1, r2) = max ||η1||=1,||η2||=1 (l(r1)η1)T l(r1)η1 ||l(r1)η1||| |l(r2)η2||

By simulations, we attempted to answer the following questions:

Has the vector-beamformer been improved by using the thresholded covariance estimator?

To what extent will the performance of the proposed beamformer procedure deteriorate by source interferences (or source cancellations) and source spatial correlations?

Can the proposed beamformers ma and mi be superior to the other covariance estimator-based beamformers?

Zhang and Liu

5.1 Simulated data

We started with specifying the following two head models (Sarvas, 1987). The simple head model that uses a homogeneous sphere in simulating the magnetic ﬁelds emanating from current electric dipole neuronal activity possesses the advantage that the lead ﬁeld matrix can be calculated analytically. However, with more realistic head models, the numerical approximations such as a ﬁnite element method have to be used when we calculate the lead ﬁeld matrix. Here, we considered both of them: the simple one is a spherical volume conductor with 10cm radius from the origin and with 91 sensors, created by using the software Field Trip (Oostenveld et al., 2010), and the realistic one is a single shell head model by using the magnetic resonance imaging (MRI) scan of a human brain provided by Henson et al. (2011). We then discretized the inside brain space into a 3D-grid of resolution 1 cm. This yielded a grid with 2222 points for the simple model and 1487 points for the realistic model. The grids was further sliced into 10 and 14 transverse layers along the z-axis of the brain respectively. We put two non-null sources at r1 and r2 or three sources at r1, r2 and r3 respectively, where two sources {r1, r2} are equal to {(3, 1, 4)T , ( 5, 2, 6)T } cm or {( 5, 5, 6)T , ( 6, 2, 5)T } cm, and three sources {r1, r2, r3} are equal to {(3, 1, 4)T , ( 5, 2, 6)T , (5, 5, 6)T } in the Subject Coordinate System (SCS/CTF). Note that the second set of source locations was obtained in our real data analyses which will be presented later. These sources were located in the region of the parietal and occipital lobes, where visual, auditory and touch information is processed. We considered two types of sources in the brain: evoked responses that are phase-locked to the stimulus and induced responses that are not. The induced responses often have oscillatory patterns. Combining these sources with the two head models, we had the following four scenarios:

Scenario 1: For the simple head model, we put two oscillatory sources at locations r1 = (3, 1, 4)T and r2 = ( 5, 2, 6)T with time-courses

mk(t) = ηk cos(20tπ), k = 1, 2,

respectively, where η1 = (10, 1, 1)T and η2 = (8, 0, 0)T . We considered two values of the signal-to-noise-ratio (SNR): 0.04 and 1/0.64 = 1.5625.

Scenario 2: For the simple head model, we put the above oscillatory sources at locations r1 = ( 5, 5, 6)T and r2 = ( 6, 2, 5)T . We also considered two values of the SNR: 0.04 and 1/0.64 = 1.5625.

Scenario 3: For the realistic head model, we put the following evoked response sources at locations r1 = (3, 1, 4)T and r2 = ( 5, 2, 6)T with moments (i.e., timecourses)

mk(t) = αk exp( (t τk1)2/ω2 k) sin(fk2π(t τk2)), k = 1, 2,

respectively, where α1 = (5, 0, 0)T , α2 = (20, 0, 0)T , τ11 = 0.239, τ12 = 0.139, τ21 = 0.199, τ22 = 0.139, f1 = 4.75, f2 = 6.25, and ω1 = ω2 = 0.067. We considered three values of the SNR: 1/0.352 = 8.16, 1/0.42 = 6.25, 1/0.52 = 4.

LCMV Beamforming

Scenario 4: For the realistic head model, we put the following evoked response sources at locations r1 = ( 5, 5, 6)T and r2 = ( 6, 2, 5)T with moments (i.e., timecourses) mk(t) = αk exp( (t τk1)2/ω2 k) sin(fk2π(t τk2)), k = 1, 2,

respectively, where α1 = (2, 0, 0)T , α2 = (18, 0, 0)T , τ11 = 0.439, τ12 = 0.139, τ21 = 0.399, τ22 = 0.139, f1 = 6, f2 = 9, and ω1 = ω2 = 2. We considered three values of the SNR: 1/0.72 = 2.04, 1/0.762 = 1.73, 1/0.782 = 1.64.

Scenario 5: We added another evoked response source at location r3 = (5, 5, 6)T to the model in Scenario 3 with moment

m3(t) = α3 exp( (t τ31)2/ω2 3) sin(f32π(t τ32)),

where α3 = (2.5, 0.25, 0.25), τ31 = 0.1, τ32 = 0.139, f3 = 1.25, and w3 = 0.067. The three source locations are highly spatially correlated with the pairwise spatial correlations ρ(r1, r2) = 0.7289, ρ(r1, r3) = 0.7935, and ρ(r2, r3) = 0.5924. We considered the same SNR values as in Scenario 3.

The pair sources mk(t), k = 1, 2 for the ﬁrst four scenarios and the treble sources mk(t), k = 1, 2, 3 for Scenario 5 are plotted respectively in Figure 1. By Scenarios 1 and 2, we compared the proposed procedure to the standard vector-beamformer (with c0 = 0) and to the other estimator-based beamformer, when there existed two highly correlated oscillatory sources (they have the same frequency and phase, but with slightly diﬀerent amplitudes). By Scenarios 3 and 4, we tested these beamformers when there existed two unbalanced evoked response (or slightly dumped-oscillatory) sources. By Scenario 5, we assessed these beamformers when there were three spatially correlated source locations. In each scenario, with time-window width 1 and sample rate J, we sampled 30 data sets of Y(t) from the model

k=1 Hkmk(t) + ε(t), (5.5)

where in Scenarios 1 4, mk(t), k = 1, 2 are non-null time-courses at the two locations and mk(t), 3 k p are null time-courses at other grid points, while in Scenario 5, mk(t), k = 1, 2, 3 are non-null time-courses at the three locations and mk(t), 4 k p are null time-courses at other grid points. As before, {ε(t)} is a white noise process with noise level σ2 0. We considered various combinations of (n, p) = (91, 2222) and (102, 1487), and J = 500, 1000, 2000, and 3000. Note that p is substantially larger than n and that the sources are sparse in the sense that there are only two or three non-null sources among p candidates. We ﬁrst applied the proposed procedures ma, mi and sh to each data set. We calculated the maximum indices over the grids and the L1-biases of the maximum location estimates to two sources respectively. For each combination of (n, p, J) and the SNR, we then summarized these values in the form of a box-whisker plot as in Figures 2, 3, 4, and 5 corresponding to Scenarios 1, 2, 3, and 4 respectively. The results demonstrate that the proposed hard thresholding-based procedure mi can outperform both the conventional vector-beamformer

Zhang and Liu

0 0.2 0.4 0.6 0.8 1 20

Sources at (3, 1,4) and ( 5,2,6)

0 0.2 0.4 0.6 0.8 1 20

Sources at ( 5,5,6) and ( 6, 2,5)

0 0.2 0.4 0.6 0.8 1 20

Sources at (3, 1,4) and ( 5,2,6)

0 0.2 0.4 0.6 0.8 1 20

Sources at ( 5,5,6) and ( 6, 2,5)

0 0.2 0.4 0.6 0.8 1 20

Sources at (3, 1,4),( 5,2,6) and (5,5,6)

Figure 1: The amplitude plots of mk(t), k = 1, 2 for Scenarios 1 to 4 and the amplitude plots of mk(t), k = 1, 2, 3 for Scenario 5. In these plots, the blue, green and red colored curves are corresponding to the amplitudes of mk(t), k = 1, 2, 3 respectively.

and the procedures ma and sh in all four scenarios, in particular when the SNR is low. We note that in several cases, the localization bias and the maximum index were degenerate to a single value with some outliers, indicating that random variations have not changed the global peak location although they have eﬀects on local peaks on the map. The simulations also suggest that the proposed procedure may be unable to detect evoked response sources of low SNR values. The local peak box-whisker plots in these ﬁgures reveal that all the local peaks on the transverse slices are not close to the source location r1, implying that the source at r1 has been masked on the neuronal activity index-based map even when two sources have a similar power level. This may be due to source cancellations as the lead ﬁeld vectors at these two locations were correlated and the sensor positions might favor the detection of r2. Finally, we note that the results are robust to the choice of J in the sense that increasing sampling frequency has only slightly reduced both the mean and standard error of localization bias.

LCMV Beamforming

To compare the procedures ma, mi and sh with the procedures gma, gmi and adp based on the generalized and adaptive thresholding, we again generated 30 data sets from model (5.5) for each of the above four scenarios and for each combination of (n, p) = (91, 2222) and (102, 1487), and J = 500, 1000, 2000, and 3000. We applied these procedures to each data set and calculated their localization biases respectively. As before, we displayed these biases by multiple box-whisker plots in Figures 6, 7 and 8. From these ﬁgures, we can see a dramatic improvement in localization performance of the hard thresholdingbased procedure mi over the other procedures in Scenarios 1 and 2 and a slightly better or similar performance to ma, gma, gmi, adp and sh in Scenarios 3 and 4. This is striking because the existing studies have already shown that the soft (or generalized) and adaptive thresholding-based covariance estimators can improve the hard thresholdingbased covariance estimator in terms of estimation loss. The potential explanations for this phenomena are as follows: (1) The procedure adp may lose eﬃciency by not using the prestimulus data. (2) The existing covariance estimators were aimed to improve the estimation accuracy by reducing the estimation loss (the distance between the estimator and the true covariance matrix) or by increasing the sensitivity and speciﬁcity in recovering sparse entries in the true covariance matrix (Rothman et al., 2009; Cai and Liu, 2011). Unfortunately, the sparsity in MEG means a sparse signal distribution, which is quite diﬀerent from the entry sparsity of the sensor covariance matrix. Therefore, these estimators may be not eﬃcient for improving the accuracy of the beamformer estimation which is related to the signal sparsity. In fact, our simulation experience suggests that besides the covariance estimation, there are other factors that can aﬀect the performance of a beamformer such as the lead ﬁeld matrix and the spatial distribution of signals in the brain. Therefore, the covariance estimator with a smaller estimation loss may not give rise to a beamformer with a lower localization bias.

To assess the performances of the six procedures ma, mi, gma, gmi, adp and sh when there are more than two spatially correlated sources, we applied these procedures to the 30 data sets generated for Scenario 5. We calculated the average localization bias for each procedure and presented them in Figure 9. It can be seen from these plots that like in twosource scenarios, mi can have superior performance over the other procedures. However, compared the above result to those in Scenario 3, we can see that the source cancellation from r3 has increased the average localization bias from zero to the value of three.

Note that although Theorem 2 suggests that in general the localization bias will be reduced as the sampling rate increases, it does not implies the localization bias is a monotone function of the sampling rate (or the number of time instances). In fact, from row 4 in Figure 2 and row one in Figure 9, it can be seen that the localization bias when J = 500 is smaller than when J = 1000, 2000 and 3000. A potential explanation is that in ﬁnite cases a higher sampling rate may cause a higher amount of leakage of background noises (in a neighborhood of the target location) into the neuronal activity index calculation.

Finally, we notice that we also carried out simulations with the soft thresholding (δ0 = 1). The result is very similar to the case with δ0 = 4. For reasons of space, we do not report it here.

Zhang and Liu

5.2 Face-perception data

We applied the proposed methodology to human MEG data acquired in ﬁve sessions by Wakeman and Henson (Henson et al., 2011). In each session, 96 face trials and 50 scrambled face trials were performed on a healthy young adult subject. Each trial started with a central ﬁxation cross (presented for a random duration of 400 to 600 ms), followed by a face or scrambled face (presented for a random duration of 800 to 1000 ms), and followed by a central circle for 1700 ms. The subject used either his/her left or right index ﬁnger to report whether he/she thought the stimulus was symmetrical or asymmetrical vertically through its center. The data were collected with a Neuromag Vector View system, containing a magnetometer and two orthogonal, planar gradiometers located at each of 102 positions within a hemispherical array situated in a light, magnetically shielded room. The sampling rate was 1100Hz. We focused our analysis on localizing non-null source positions, where neuronal activity increases for the face stimuli relative to the scrambled face stimuli.

For this purpose, we normalized the subject s MRI scan to a MRI template by using the Field Trip, on which a grid CTF system of 1 cm resolution was created with 1487 points. For each session, we applied the neuroimaging software SPM8 to read and preprocess the recorded data, and to epoch and average the data generated from the face stimulus trials and the scrambled face stimulus trials respectively. This gives rise to ﬁve 306 771 data matrices: the ﬁrst 220 columns for 200ms pre-stimuli and the later 551 columns for the stimuli. For each session, we calculated the sample covariance ˆC and noise covariance ˆC0 by using the stimulus data and the pre-stimulus data respectively. We estimated the baseline noise level by ˆσ2 0, the minimum diagonal element in ˆC0. We applied the beamforming procedures ma, mi, gma, gmi, adp, and sh to the face data set and the scrambled face data set respectively, obtaining the log-contrasts at each grid point. Here, if there exist the negative eigenvalues of the covariance estimators (used in ma, mi, gma, gmi, adp and sh), we set them to zeros and added ϵ0 to them to make the resulting covariance estimators positive deﬁnite, where ϵ0 was determined by the maximum eigenvalue of the noise matrix ˆC0. For each procedure, we interpolated and overlaid its log-contrasts on the structural MRI of the subject, obtaining its index map. There were no visible diﬀerences among the maps derived from ma, mi, gma, gmi and sh. The map derived from the adp slightly diﬀered from the rest. So, we reported only the mi-based and adp-based maps below.

For each session, we ﬁrst identiﬁed the global peak location from each map, followed by slicing the maps through their global peak locations as shown in Figure 10. For sessions 1 4, the global peaks derived from the mi and adp were the same, which were located at ( 4, 3, 8)cm, ( 1, 6, 8)cm, ( 6, 2, 5)cm, and ( 4, 4, 6)cm respectively. However, for session 5, the global peaks derived from the mi and the adp were located at two slightly diﬀerent positions, ( 4, 4, 6)cm and ( 7, 3, 6)cm. We then projected the data along the associated optimal weight directions, obtaining estimated time-courses at these global peaks. For reasons of space, we presented only these time-courses derived from the procedure mi. See Figure 12. Finally, we made 20 transverse slices along the z-axis to identify the local peaks. There were some subtle diﬀerences between the mi-based and the adp-based local peaks. For example, in session 1, the mi-based local peaks were located at (1, 5, 2) cm, (0, 1, 11) cm, (3, 2, 10) cm, (3, 4, 9) cm, ( 5, 3, 3) cm, ( 4, 3, 4) cm, ( 2, 1, 1) cm,( 4, 3, 1) cm,( 2, 1, 0) cm,( 4, 5, 5) cm,( 4, 2, 6) cm,( 5, 3, 7) cm and

LCMV Beamforming

( 4, 3, 8) cm, whereas the adp-based local peaks were located at (3, 2, 2)cm, (0, 1, 11)cm, ( 4, 3, 9)cm, ( 6, 2, 1)cm, ( 4, 3, 4)cm, (2, 3, 10)cm, ( 4, 3, 1) cm, ( 1, 1, 0) cm, ( 3, 6, 3) cm, ( 4, 4, 5) cm, ( 4, 2, 6) cm, ( 5, 3, 7) cm, and ( 4, 3, 8)cm. They are not the same as shown in Figure 11. Note that the previous simulations demonstrated that the procedure mi was expected to give a more accurate localization result than did the procedure adp. Although the areas highlighted in Figures 10 and 11 were varying over sessions, they did reveal the following known regions of face perception: the occipital face area (OFA), the inferior occipital gyrus (IOG), and the superior temporal sulcus (STS), and the precuneus (PCu). Interestingly, in each session, we identiﬁed a pair of nearly symmetric sources, of which one was strongly powered while the other was weakly powered. This phenomenon occurred due to source cancellations that prevented the second source from identiﬁcation as we have demonstrated in our simulation studies. The time-courses plots in Figure 12 showed the response diﬀerences under face stimuli and scrambled face stimuli during the time period 100ms 300ms. The results are consistent with recent ﬁndings in face-perception studies by using an MEG-based multiple sparse prior approach (Friston et al., 2006; Henson et al., 2011) and by other empirical approaches (e.g., Pitcher et al., 2011; Kanwisher et al., 1997). However, in the ﬁrst two papers, the authors made a parametric model assumption on source temporal correlation structures and imposed a limit on the number of candidate sources in the model, whereas in our approach, the model is non-parametric and allows for arbitrary number of candidate sources.

6. Discussion and Conclusion

In the present study, we have proposed a class of vector-beamformers by thresholding the sensor sample covariance matrix. The consistency and the convergence rate of the proposed vector-beamformer estimation have been proved in the presence of multiple sources. The theory has provided a basis for choosing the threshold τn J = c0σ2 0 p

log(n)/J in the beamformer construction. However, it requires a number of conditions. As pointed out in Section 3, conditions (A1) (A4) are commonly used assumptions in literature for studying multiple time series (Sekihara and Nagarajan, 2008; Fan et al., 2011). We only need to validate the coherence stability condition which is new. Intuitively, the strength of correlations between sensors (therefore the absolute partial correlation) will increase when the number of sensors increases in general. Taking the face-perception data (session 1) as an example, we show how to validate it empirically by random sub-samples of the 306 sensors below. We take the ﬁrst two peaks in Figure 8 as two true sources. They are located at CTF (-4,3,8) cm and (-4,-5,5) cm respectively. First, we reparametrize the lead ﬁeld matrix as in Section 3. Then, for k = 1, 2, ..., 306, we randomly choose k sensors, obtaining a k 4461 sub lead ﬁeld matrix for the 1487 voxels in the brain. We calculate the maximum absolute partial correlation d12(k) = max{d1|2, d2|2} between the two sources and the maximum absolute correlation dmax(k) = max dx|2 for all voxels, where x is running over these voxels. Finally, we plot d12(k), dmax(k), and log(log(k)) against k = 1, 2, ..., 306 respectively as displayed in Figure 13. As expected, the result shows that both d12(k) and dmax(k) change very slowly when the number of sensors k changes, with a rate much slower than log(log(k)). This implies that the coherence stability condition nearly holds.

Zhang and Liu

In real world situations, the underlying number of true sources, q needs to be estimated. The inﬂuence of q on the beamformer estimators can be measured by the lead ﬁeld partial correlation coeﬃcient anq deﬁned in Section 3. In this paper, local peaks on transverse slices have been used to reduce the search space of sources. We can cluster the local peak values into two groups, one of which is taken as a group of potential sources. The size of the selected group gives an estimate of q. In the face-perception data, we have only presented the ﬁrst two sources which are ranked higher than the remaining local peaks, because these two are of clear neurological implications. Our approach is non-parametric in the sense that we have not made any parametric assumptions on the model (1.1). However, if we are willing to assume a family of parametric models for background noises, then we can determine q via model selection criteria such as Bayesian information criterion. By theoretical and empirical studies, we have shown that due to source cancellations, the beamformer power estimator can be inconsistent if the underlying multiple sources are not well separated in terms of a lead ﬁeld distance. Unlike the existing theories in the literature, the new theory is applicable to more general scenarios, where multiple sources exist and the sensor covariance matrix are estimated from the data. In the new theory, we do assume that the powers of the unknown no-null sources as well as the underlying number q are not growing with the number of sensors n. This assumption is natural to neurologists and has simpliﬁed mathematical derivations of the theory very much. However, the theory can be extended to the case where these quantities are growing with n. In the theory, we have not impose any constraint on p as we only consider local behavior of beamformers. If we want to investigate global properties of the neuronal activity map, then some constraints need to be imposed on the growth rate of p with respect to n. The performances of the proposed beamformers have further been assessed by simulations and real data analyses. We have demonstrated that thresholding the sensor covariance matrix can help reduce the source localization bias when the data have a low SNR value. We have applied the vector-beamformer to an MEG data set for identifying the active regions related to human face perception. Some excellent agreements have been found between the current results and the existing neurological facts on human face perception. Finally, we note that there are other ways to measure the contrast between two source covariances such as the information-divergence. The theory can be easily extended to this case. The details will be presented elsewhere.

In this section we prove the theorems and corollaries in Section 3. To prove Theorem 1, we need the following lemma.

Lemma 10 If anq as n , then we have

HT j C 1 k Hj = bjj|k + cjj|k

n + O(a 2 nk ), bjj|k = Σ 1 j , for 1 j k

HT j1C 1 k Hj2 = cj1j2|k

n + O(a 2 nk ), for 1 j1 = j2 k

HT j C 1 k Hx = bjx|k + cjx|k

n + O(a 2 nk ), for 1 j k, x / Rk

HT y C 1 k Hx = nayx|k + byx|k + O(a 1 nk ), for x, y / Rk

LCMV Beamforming

where ank = n min1 j k 1 tr a(j+1)(j+1)|j , Rk = {r1, . . . , rk}, Ck = Pk j=1 HT j Σj Hj +σ2 0In, and ayx|k, byx|k and cjj|k are deﬁned before and the other c s are deﬁned iteratively as follows:

bj1k|(k 1)Σ 1 k a 1 kk|(k 1), 1 j1 k 1, j2 = k Σ 1 k a 1 kk|(k 1)bkj2|(k 1), 1 j2 k 1, j1 = k cj1j2|(k 1) bj1k|(k 1)a 1 kk|(k 1)bkj2|(k 1), 1 j1 = j2 k 1.

akk|(k 1)Σk 1 bkx|(k 1) I3 + bkk|(k 1) akk|(k 1)Σk 1 akx|(k 1) o , j = k

cjx|(k 1) cjk|(k 1)a 1 kk|(k 1)akx|(k 1) bjk|(k 1)a 1 kk|(k 1)bkx|(k 1) 1 j k 1 +bjk|(k 1)a 1 kk|(k 1) Σ 1 k + bkk|(k 1) a 1 kk|(k 1)akx|(k 1).

Proof Note that under the stability condition and the assumption that anq , we have byx|k = O(1), 1 k q. And for any rx in the source space,

n = O(n 1), cyx|k

n = O(a 1 n(k 1)), 1 y k, 2 k q.

We prove the lemma by induction. For k = 1, we have

C 1 1 = σ 2 0 In σ 4 0 H1(Σ 1 1 + nσ 2 0 I3) 1HT 1 ,

HT 1 C 1 1 H1 = nσ 2 0 I3 n2σ 4 0 (Σ 1 1 + nσ 2 0 I3) 1

I3 I3 + Σ 1 1 σ2 0 n

= nσ 2 0 I3 + nΣ1σ 2 0 1

= Σ 1 1 I3 σ2 0Σ 1 1 /n + O(n 2)

= Σ 1 1 Σ 1 1 σ2 0Σ 1 1 /n + O(n 2)

= b11|1 + c11|1

n + O(n 2),

where b11|1 = Σ 1 1 , c11|1 = σ2 0Σ 2 1 .

Analogously,

HT 1 C 1 1 Hx = σ 2HT 1 Hx σ 4 0 n(Σ 1 1 + nσ 2 0 I3) 1HT 1 Hx

= I3 I3 + Σ 1 1 σ2 0/n 1 HT 1 Hx

I3 + σ2 0 n Σ 1 1

= Σ 1 1 ρ1x Σ 1 1 σ2 0Σ 1 1 ρ1x/n + O(n 2)

= b1x|1 + c1x|1

n + O(n 2),

Zhang and Liu

where b1x|1 = Σ 1 1 ρ1x, c11|1 = Σ 2 1 σ2 0ρ1x.

HT y C 1 1 Hx = σ 2 0 HT y Hx σ 4 0 HT y H1(Σ 1 1 + nσ 2 0 I3) 1HT 1 Hx

= nσ 2 0 ρyx σ 4 0 HT y H1 σ2 0 n I3 + σ2 0Σ 1 1 /n 1 HT 1 Hx

= nσ 2 0 ρyx nσ 2 0 ρy1 I3 σ2 0Σ 1 1 /n ρ1x + O(n 1)

= nσ 2 0 ρy1x + ρy1Σ 1 1 ρ1x + O(n 1)

= nayx|1 + byx|1 + O(n 1),

where ρy1x = ρyx ρy1ρ1x, ayx|1 = σ 2 0 ρy1x, byx|1 = ρy1Σ 1 1 ρ1x.

This implies the lemma holds for k = 1. Assuming the lemma holds for the cases with less or equal to k sources, we show that it is also true for the case with k + 1 sources by invoking the matrix inversion formulas

C 1 k+1 = C 1 k C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k , (7.6)

C 1 k = C 1 k+1 + C 1 k+1Hk+1Σk+1HT k+1C 1 k .

The details are as follows. For 1 j k,

HT j C 1 k+1Hj = HT j C 1 k Hj HT j C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hj

= bjj|k + cjj|k

n + O(a 2 nk ) bj(k+1)|k + cj(k+1)|k

n + O(a 2 nk )

Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1

bj(k+1)|k + cj(k+1)|k

n + O(a 2 nk ) T

= bjj|k + cjj|k

n + O(a 2 nk )

bj(k+1)|k + O(a 1 nk (na(k+1)(k+1)|k) 1 O(a 2 n(k+1))

bj(k+1)|k + O(a 1 nk ) T

= bjj|k + cjj|k

nbj(k+1)|ka 1 (k+1)(k+1)|kb T j(k+1)|k + O(a 2 n(k+1))

= bjj|(k+1) + cjj|(k+1)

n + O(a 2 n(k+1)).

For j = k + 1, we have

HT j C 1 k+1Hj = HT k+1C 1 k+1Hk+1 = HT k+1C 1 k Hk+1 I3 + Σk+1HT k+1C 1 k Hk+1 1

= na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk )

I3 + Σk+1 na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1

LCMV Beamforming

= na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) na(k+1)(k+1)|k 1

I3 + (na(k+1)(k+1)|k) 1Σ 1 k+1 + O(a 2 n(k+1)) 1 Σ 1 k+1

= Σ 1 k+1 1

nΣ 1 k+1a 1 (k+1)(k+1)|kΣ 1 k+1 + O(a 2 n(k+1))

= b(k+1)(k+1)|(k+1) + c(k+1)(k+1)|(k+1)

n + O(a 2 n(k+1)).

This completes the proof of the ﬁrst equation in the lemma. To prove the second equation in the lemma, we let

B = Σk+1na(k+1)(k+1)|k 1 + Σk+1na(k+1)(k+1)|k 1

Σk+1b(k+1)(k+1)|k Σk+1na(k+1)(k+1)|k 1

Then, when 1 j1 k, j2 = k + 1, we have

HT j1C 1 k+1Hj2 = HT j1C 1 k+1Hk+1 = HT j1C 1 k Hk+1 I3 + Σk+1HT k+1C 1 k Hk+1 1

= bj1(k+1)|k + 1

ncj1(k+1)|k + O(a 2 nk )

I3 + Σk+1 na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk 1

= bj1(k+1)|k + 1

ncj1(k+1)|k + O(a 2 nk ) Σk+1na(k+1)(k+1)|k 1

(I3 + B + O(a 2 n(k+1))) 1 Σk+1na(k+1)(k+1)|k 1

= bj1(k+1)|k Σk+1na(k+1)(k+1)|k 1

I3 + O (na(k+1)(k+1)|k) 1 Σk+1na(k+1)(k+1)|k 1

2 + O(a 2 n(k+1))

= 1 nbj1(k+1)|kΣ 1 k+1a 1 (k+1)(k+1)|k + O(a 2 n(k+1))

= cj1(k+1)|(k+1)

n + O(a 2 n(k+1)).

Similarly, when 1 j1 = j2 k, we have

HT j1C 1 k+1Hj2 = HT j1C 1 k Hj2 HT j1C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hj2

= 1 ncj1j2|k + O(a 2 nk ) 1

nbj1(k+1|k)a 1 (k+1)(k+1)|kb(k+1)j2|k + O(a 2 n(k+1))

= 1 ncj1j2|(k+1) + O(a 2 n(k+1)).

We complete the proof of the second equation in the lemma. To prove the third equation in the lemma, we let

D = na(k+1)(k+1)|kΣk+1 1 + na(k+1)(k+1)|kΣk+1 1

b(k+1)(k+1|k)Σk+1 na(k+1)(k+1)|kΣk+1 1

F = (na(k+1)(k+1)|k) 1

2 Σ 1 k+1 + b(k+1)(k+1)|k (na(k+1)(k+1)|k) 1

Zhang and Liu

Then, for j = k + 1,

HT j C 1 k+1Hx = I3 + HT k+1C 1 k Hk+1Σk+1 1 HT k+1C 1 k Hx

= I3 + na(k+1)(k+1)|kΣk+1 + b(k+1)(k+1)|kΣk+1 + O(a 1 nk ) 1

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= na(k+1)(k+1)|kΣk+1 1

2 (I3 + D + O(a 2 n(k+1))) 1 na(k+1)(k+1)|kΣk+1 1

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= a(k+1)(k+1)|kΣk+1 1

2 n I3 D + O(a 2 n(k+1)) o

a(k+1)(k+1)|kΣk+1 1

2 a(k+1)x|k + b(k+1)x|k/n + O(a 1 nk )/n

= a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k

a(k+1)(k+1)|kΣk+1 1/2 D a(k+1)(k+1)|kΣk+1 1/2 a(k+1)x|k

+O(a 3 n(k+1)) + 1

n a(k+1)(k+1)|kΣk+1 1 b(k+1)x|k + O(a 2 n(k+1))

= a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k + 1

n a(k+1)(k+1)|kΣk+1 1

n b(k+1)x|k I3 + b(k+1)(k+1)|kΣk+1 a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k o

+O(a 2 n(k+1))

= b(k+1)x|(k+1) + 1

nc(k+1)x|(k+1) + O(a 2 n(k+1)).

For 1 j k, we have

HT j C 1 k+1Hx = HT j C 1 k Hx HT j C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hx

= bjx|k + 1

ncjx|k + O(a 2 nk ) bj(k+1)|k + 1

ncj(k+1)|k + O(a 2 nk )

Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 2 nk ) 1

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= bjx|k + 1

ncjx|k + O(a 2 nk ) bj(k+1)|k + 1

ncj(k+1)|k + O(a 2 nk )

(na(k+1)(k+1)|k) 1/2 I3 + F + O(a 2 nk ) 1 (na(k+1)(k+1)|k) 1/2

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= bjx|k + 1

ncjx|k + O(a 2 nk ) bj(k+1)|k + 1

ncj(k+1)|k + O(a 2 nk )

a 1 (k+1)(k+1)|ka(k+1)x|k a 1/2 (k+1)(k+1)|k Fa 1/2 (k+1)(k+1)|ka(k+1)x|k + O(a 2 nk )

(na(k+1)(k+1)|k) 1b(k+1)x|k a 1/2 (k+1)(k+1)|k Fa 1/2 (k+1)(k+1)|kb(k+1)x|k/n

+O(a 2 n(k+1))

LCMV Beamforming

= bjx|k + 1

ncjx|k + O(a 2 nk ) bj(k+1)|k + 1

ncj(k+1)|k + O(a 2 nk )

a 1 (k+1)(k+1)|ka(k+1)x|k 1

na 1 (k+1)(k+1)|k(Σ 1 k+1 + b(k+1)(k+1)|k)

a 1 (k+1)(k+1)|ka(k+1)x|k 1 na 1 (k+1)(k+1)|kb(k+1)x|k + O(a 2 n(k+1))

= bjx|k + 1

ncjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k

n cj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k + bj(k+1)|ka 1 (k+1)(k+1)|kb(k+1)x|k

bj(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|ka(k+1)x|k o

+O(a 2 n(k+1))

= bjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k + 1

ncjx|(k+1) + O(a 2 n(k+1)). (7.7)

Note that for k = j,

ajx|j = ajx|(j 1) ajj|(j 1)a 1 jj|(j 1)ajx|(j 1) = 0.

Assuming that for k = j + m, m > 0, the statement is true, i.e., ajx|(k+m) = 0 for all x. Then,

ajx|(j+m+1) = ajx|(j+m) aj(j+m+1)|(j+m)a 1 (j+m+1)(j+m+1)|(j+m)a(j+m+1)x|(j+m) = 0.

By induction, we have that ajx|k = 0 for all x, j k. This implies that and

bjx|(k+1) = bjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k

by the deﬁnition of bjx|(k+1). Combining this with (7.7), we complete the proof of the third equation in the lemma. Finally, we turn to the last equation in the lemma. Assume that the equation holds for the case k. We show that it also holds for k + 1 below. For x, y / Rk+1 (thus x, y / Rk), by the assumption, we have

HT y C 1 k Hx = nayx|k + byx|k + O(a 1 nk ).

This together with (7.6) yields

HT y C 1 k+1Hx = HT y C 1 k Hx HT y C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hx

= nayx|k + byx|k + O(a 1 nk ) nay(k+1)|k + by(k+1)|k + O(a 1 nk )

Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= nayx|k + byx|k + O(a 1 nk ) nay(k+1)|k + by(k+1)|k + O(a 1 nk )

Zhang and Liu

(na(k+1)(k+1)|k) 1 1

n2 a 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k

a 1 (k+1)(k+1)|k + O(a 3 n(k+1)) nay(k+1)|k + by(k+1)|k + O(a 1 nk )

= nayx|k + byx|k + O(a 1 nk ) n ay(k+1)|ka 1 (k+1)(k+1)|k

ay(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|k/n

+by(k+1)|ka 1 (k+1)(k+1)|k/n + O(a 2 n(k+1)) o

na(k+1)x|k + b(k+1)x|k + O(a 1 nk )

= n h ayx|k ay(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k i

+ h byx|k by(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k ay(k+1)|ka 1 (k+1)(k+1)|kb(k+1)x|k

+ay(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|ka(k+1)x|k i

+O(a 1 n(k+1))

= nayx|(k+1) + byx|(k+1) + O(a 1 n(k+1)).

The proof is completed.

Proof of Theorem 1. Note that byx|1 = ρ(ry, r1)Σ 1 1 ρ(x1, x), ayx|1 = σ 2 0 (ρyx ρy1ρ1x), and both are bounded. By induction and the stability condition, it can be shown that ayx|k and byx|k are bounded for 2 k q. If anq is bounded, then there exists km such that na(km+1)(km+1)|km = O(1) and ankm = min1 j km 1 na(j+1)(j+1)|j as n tends to inﬁnity. By Lemma 10, we have

HT km+1C 1 km Hkm+1 = na(km+1)(km+1)|km + b(km+1)(km+1)|km + O(a 1 nkm),

which is bounded and non-negative deﬁnite. Furthermore, there exists an orthogonal matrix Q and a diagonal matrix D = diag(d1, d2, d3) such that

Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 = QDQT .

HT km+1C 1 km+1Hkm+1

= HT km+1C 1 km Hkm+1 I3 Σ 1 km+1 + HT km+1C 1 km Hkm+1 1 HT km+1C 1 km Hkm+1

= HT km+1C 1 km Hkm+1 Σ 1 km+1 + HT km+1C 1 km Hkm+1 1 Σ 1 km+1

= HT km+1C 1 km Hkm+1Σ1/2 km+1 I3 + Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 1 Σ 1/2 km+1

= Σ 1/2 km+1QDQT I3 + QDQT 1 Σ 1/2 km+1

= Σ 1/2 km+1 I3 + QD 1QT 1 Σ 1/2 km+1

= Σ 1/2 km+1 Q(I3 + D 1)QT 1 Σ 1/2 km+1

= Σ 1/2 km+1Q(I3 + D 1) 1QT Σ 1/2 km+1. (7.8)

LCMV Beamforming

Note that Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 = O(1), which implies that dk 0, 1 k 3 are bounded. We can ﬁnd a positive constant ϵ0 such that max1 k 3(1 + d 1 k ) 1 < (1 + ϵ0) 1

when n is large enough. Consequently, for any vector a R3 with ||a|| = 1, we have

a T Σ1/2 km+1Q(I3 + D 1)QT Σ1/2 km+1a > (1 + ϵ0)a T Σ1/2 km+1QQT Σ1/2 km+1a,

which shows that Σ1/2 km+1Q(I3 + D 1)QT Σ1/2 km+1 (thus HT km+1C 1 km+1Hkm+1 1 due to (7.8)) is asymptotically larger than Σkm+1(1 + ϵ0). We now consider the case where anq . For j = q, by Lemma 10, we have

n = Σ 1 q naqq|(q 1) 1 Σ 1 l = O(a 1 nq ),

HT q C 1 q Hq 1 = h Σ 1 q + cqq|q

n + O(a 2 nq ) i 1 ,

= Σ1/2 q h I3 + Σ1/2 q cqq|q

n Σ1/2 q + O(a 2 nq ) i 1 Σ1/2 q

= Σ1/2 q h I3 Σ1/2 q cqq|q

n Σ1/2 q + O(a 2 nq ) i Σ1/2 q

= Σq naqq|(q 1) 1 + O(a 2 nq )

as n . For 1 j q 1, by Lemma 10, we have

HT j C 1 q Hj = Σ 1 j + cjj|q

n + O(a 2 nq ),

where cjj|q

n = O(a 1 nq ). This entails

HT j C 1 q Hj 1 = Σ1/2 j

nΣ1/2 j cjj|qΣ1/2 j + O(a 2 nq ) 1 Σ1/2 j

nΣ1/2 j cjj|qΣ1/2 j + O(a 2 nq ) Σ1/2 j

nΣjcjj|qΣj + O(a 2 nq ).

For any location rx, by Lemma 10, we have

HT x C 1 q Hx 1 = 1 n

na 1 xx|qbxx|q + O(a 2 nq ) 1 a 1 xx|q

na 1 xx|qbxx|q + O(a 2 nq ) a 1 xx|q

= 1 na 1 xx|q 1

n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq ).

The proof is completed. Proof of Corollary 2. First, let An = Hkm+1T C 1 l Hkm+1 1. If anq = O(1) and max1 k q dk|q = O(1), then by Theorem (1), there exists a positive constant ϵ0 such that

Zhang and Liu

min||a||=1 a T (An Σkm+1)a > ϵ0 for large n. Let a1 = (1, 0, 0)T , a2 = (0, 1, 0)T and a3 = (0, 0, 1)T . Then, we have

tr(An) = tr(An

k=1 aka T k ) =

k=1 tr(Anaka T k )

k=1 a T k Anak > 3ϵ0 +

k=1 akΣkm+1a T k

k=1 tr(Σkm+1aka T k ) = 3ϵ0 + tr(Σkm+1

k=1 aka T k )

= 3ϵ0 + tr(Σkm+1),

which implies tr(An) is asymptotically larger than Σkm+1. To prove Theorem 2, we need two more lemmas as follows and the following condition

(A1 ) : {Y(tj) : 1 j J} is stationary and has a ﬁnite covariance matrix.

Lemma 11 Under Conditions (A1 ) and (A3) (A4), if τn J = O( p

log(n)/J) and nτn J = o(1) as n and J , then

(i) max1 i,j n |ˆcij cij| = Op( p

(ii) || ˆC(τn J) C|| = Op(mn p

(iii) || ˆC(0) C|| (mn + n)τn J,

where mn = max1 i n Pn j=1 I(cij = 0) n.

Proof. Let κ3 = max{2(2/κ1+1/κ2) 1, (4/3)(1/κ1+1/κ2) 1/3, 1}. Then n p

log(n)/J = o(1) yields (log(n))κ3/J = o(1). We adopted the techniques of Bickel and Levina (2008); Fan et al. (2011); Zhang et al. (2014) to prove it. To prove (i), we set up more notations. Let τ(t) be the so-called Dedecker-Prieur τ-mixing coeﬃcients (Merlev ede et al., 2011, see). Let

Θ(u, t) = {v > 0 : P(|y1(t)y2(t)| > v) u}, ψy(M, t) = max{min{yi(t)yj(t), M}, M}.

It follows from Lemma 7 in Dedecker and Prieur (2004) that

sup t Θ(u, t) b1(1 log(u))2/κ1,

which, under Condition (A4), gives τ(t) b2 exp( b3tκ2). Similarly, it is derived from Remark 3 in Merlev ede et al. (2011) that

sup M>0 [sup t var(ψy(M, t)) + 2 X

t1>t2 |cov(ψy(M, t1), ψy(M, t2))]

sup M>0 sup t var(ψy(M, t))

sup M>0 sup t var(ψy(M, t)) + 4 X

0 (sup t Θ(u))2du

LCMV Beamforming

Let 1/κ = 2/κ1 + 1/κ2. By Theorem 1 in Merlev ede et al. (2011), we can ﬁnd positive constants dk, 1 k 5 that only depend on τ1, κ2, b2, b3 such that

t=1 yi(t)yj(t) cij| u

J exp (Ju)κ

+ exp (Ju)2

d2(1 + Jd3))

d5(log(Ju))κ

Consequently,

max 1 i,j n | 1

t=1 yi(t)yj(t) cij| > u

n2 max 1 i,j n P

t=1 yi(t)yj(t) cij| > u

n2J exp (Ju)κ

+ n2 exp (Ju)2

d2(1 + Jd3)

d5(log(Ju))κ

Let u = A p

log(n)/J. Then Ju = p

J log(n). When both n and J tend to inﬁnity, we have

n2J exp (Ju)κ

2 log(n) + log(J) (A p

(2(log(n))1 κ/2

d1 )(J log(n))κ/2 + log(J)

since (log(n))1 κ/2/Jκ/2 = o(1). Similarly, if we choose A > p

2d2(d3 + 1), we have

n2 exp (Ju)2

d2(1 + Jd3)

= n2 exp A2J log(n)

d2(1 + Jd3)

d2(d3 + 1/J)

log(n) = o(1).

d5(log(Ju))κ

Aκ(1 κ)(J log(n))κ(1 κ)/2

J log(n)))κ

Zhang and Liu

max 1 i,j n | 1

t=1 yi(t)yj(t) cij| > u

= o(1). (7.9)

Note that for u = A p

log(n)/J, there exist positive constants dk, 1 k 5 so that

P( max 1 i,j n | yi|| yj| > u) = P( max 1 i n | yi| > u)

n max 1 i n P(| yi| > u)

= n J exp (J u)κ1

+ n exp (J u)2

d2(1 + Jd3)

(J u)κ1(1 κ1)

d5(log(Ju))κ1

since (log(n))4/(3κ1) 1/3/J = o(1) and log(n)/J = o(1). This together with (7.9) yields that for u = O( p

P max 1 i,j n |ˆcij cij| > u P

max 1 i,j n | 1

t=1 yi(t)yj(t) cij| > u

+P max 1 i,j n | yi|| yj| > u = o(1),

which implies max 1 i,j n |ˆcij cij| = Op p

We turn to ˆC(τn J) in (ii). Let T1 = ||(ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J))||. We have

|| ˆC(τn J) C|| ||(ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J))|| + ||(cij I(|cij| τn J))||

j=1 |cij|I(|cij| τn J)

T1 + τn Jmn. (7.10)

Similarly, we have

|| ˆC(0) C|| T1 + τn Jmn + max i

j=1 |ˆcij|I(|ˆcij| τn J)

T1 + (mn + n)τn J.

j=1 |ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J)|

LCMV Beamforming

j=1 |ˆcij (I(|ˆcij| > τn J, |cij| τn J) + I(|ˆcij| > τn J, |cij| > τn J))

cij (I(|cij| > τn J, |ˆcij| > τn J) + I(|cij| > τn J, |ˆcij| τn J)) |

I +II +III,

j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| > τn J),

j=1 |ˆcij|I(|ˆcij| > τn J, |cij| τn J),

III = max i

J |cij|I(|cij| > τn J, |ˆcij| τn J).

We bound the above three terms as follows.

I max i,j |ˆcij cij| max i

j=1 I(|cij| > 0)

log(n)/J . (7.11)

For δ > 0, using the equality in (i), we have

j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| τn J)

j=1 |cij|I(|cij| τn J)

j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| δτn J)

j=1 |ˆcij cij|I(|ˆcij| > τn J, δτn J < |cij| < τn J) + τn Jmn

max i,j |ˆcij cij|

j=1 I(|ˆcij| > τn J, |cij| δτn J) + mn

j=1 I(|ˆcij cij| (1 δ)τn J) + mn

log(n)/J)(op(1) + mn) + τn Jmn = Op(τn Jmn), (7.12)

j=1 I(|ˆcij cij| (1 δ)τn J) > ϵ

P max i,j |ˆcij ci,j| (1 δ)τn J

Zhang and Liu

j=1 (|ˆcij cij| + |ˆcij||) I(|cij| > τn J, |ˆcij| τn J)

max i,j |ˆcij cij|

j=1 I(|cij| > τn J) + τn J max i

j=1 I(|cij| > τn J)

Op(τn J)mn + τn Jmn = Op(τn Jmn).

Combining this with (7.11), (7.12) and (7.10), we obtain the desired result in (ii). The proof is completed.

Lemma 12 Under Conditions (A1 ) and (A3) (A4), if τn J = O( p

log(n)/J) and nτn J = o(1) as n and J , then

(i) || ˆC(τn J) 1 C 1|| = Op(mnτn J) and || ˆC(τn J) 2 C 2|| = Op(mnτn J),

(ii) || ˆC(0) 1 C 1|| Op(τn J(mn + n)); || ˆC(0) 2 C 2|| Op(τn J(mn + n)),

where mn = max1 i n Pn j=1 I(cij = 0) n.

Proof. Let κ3 = max{2(2/κ1+1/κ2) 1, (4/3)(1/κ1+1/κ2) 1/3, 1}. Then n p

log(n)/J = o(1) yields (log(n))κ3/J = o(1). If let λmin(C) denote the minimum eigenvalue of C, then we have that λmin(C) σ2 0. If let λmin( ˆC(τn J)) denote the minimum eigenvalue of ˆC(τn J), then it follows from Lemma 11 that

λmin( ˆC(τn J)) = λmin(C) + Op(mnτn J)

σ2 0 + Op(mnτn J),

which is bounded below by σ2 0/2 if τn Jmn is small enough. Therefore, we have

|| ˆC(τn J) 1 C 1|| = || ˆC(τn J) 1(C ˆC(τn J))C 1||

|| ˆC(τn J) 1||(||C ˆC(τn J)||)||C 1||

λmin( ˆC(τn J)) 1λmin(C) 1||C ˆC(τn J)|| = Op(τn Jmn).

|| ˆC(τn J) 2 C 2|| || ˆC(τn J) 1( ˆC(τn J) 1 C 1)|| + ||( ˆC(τn J) 1 C 1)C 1||

|| ˆC(τn J) 1|||| ˆC(τn J) 1 C 1|| + || ˆC(τn J) 1 C 1||||C 1||

= (λmin( ˆC(τn J)) 1 + λmin(C) 1)|| ˆC(τn J) 1 C 1||

= Op(τn Jmn).

|| ˆC(0) 1 C 1|| Op(τn J(mn + n)),

|| ˆC(0) 2 C 2|| Op(τn J(mn + n)).

LCMV Beamforming

Proof of Theorem 6. Note that for any x, HT x Hx = n. We have

|| h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 ||

= || h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj HT j ˆC(τn J) 1Hj HT j C 1Hj 1 ||

n|| h HT j ˆC(τn J) 1Hj/n i 1 ||||HT j C 1Hj/n HT j ˆC(τn J) 1Hj/n||

|| HT j C 1Hj/n 1 ||

n|| h HT j ˆC(τn J) 1Hj/n i 1 ||||C 1 ˆC(τn J) 1|||| HT j C 1Hj/n 1 ||,

which combining with Lemma 11 yields

|| h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 ||

log(n)/J)|| h HT j ˆC(τn J) 1Hj i 1 |||| HT j C 1Hj 1 ||. (7.13)

Let λm and ˆλm denote the smallest eigenvalues of HT j C 1Hj and HT j ˆC(τn J) 1Hj respectively. Invoking Theorem 1, (HT j C 1Hj) 1 = Σj + o(1). There exists a positive constant ϵ0 such that for large n, λm ϵ0. By the deﬁnition, there exists am R3 with ||am|| = 1, such that ˆλm = a T m HT j ˆC(τn J) 1Hjam. So

|ˆλm a T m HT j C 1Hjam| = |(Hiam)T ( ˆC 1 C 1)Hiam| n|| ˆC(τn J) 1 C 1||

which implies

ˆλm a T m HT j C 1Hjam Op(n2p

log(n)/J) ϵ0 Op(n2p

This shows that for large n, ˆλm is bounded below from zero. Consequently, we have

|| h HT j ˆC(τn J) 1Hj i 1 || = O(1), || HT j C 1Hj 1 || = O(1).

This together with (7.13) proves that

|| h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 || = Op(n2p

The proof is completed. Proof of Corollary 7. It follows from Theorem 6 directly. The details are omitted.

Acknowledgments

We thank Professor Richard Henson from MRC Cognition and Brain Sciences Unit, Cambridge for sharing his MEG data with us. The software SPM8 is available at http://www.fil.ion.ucl.ac.uk/spm/software/spm8/. We also thank the Editor and three anonymous reviewers for their helpful comments.

Zhang and Liu

P. J. Bickel and E. Levina. Covariance regularization by thresholding. The Annals of Statistics, pages 2577 2604, 2008.

A. Bolstad, B. Van Veen, and R. Nowak. Space time event sparse penalization for magneto- /electroencephalography. Neuro Image, 46(4):1066 1081, 2009.

M. J. Brookes, J. Vrba, S. E. Robinson, C. M. Stevenson, A. M. Peters, G. R. Barnes, A. Hillebrand, and P. G. Morris. Optimising experimental design for meg beamformer imaging. Neuroimage, 39(4):1788 1802, 2008.

T. Cai and W. Liu. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106(494):672 684, 2011.

J. Dedecker and C. Prieur. Coupling for τ-dependent sequences and applications. Journal of Theoretical Probability, 17(4):861 885, 2004.

J. Fan, Y. Liao, and M. Mincheva. High dimensional covariance matrix estimation in approximate factor models. Annals of statistics, 39(6):3320, 2011.

K. Friston, R. Henson, C. Phillips, and J. Mattout. Bayesian estimation of evoked and induced responses. Human brain mapping, 27(9):722 735, 2006.

J. H. Goodnight. A tutorial on the sweep operator. The American Statistician, 33(3): 149 158, 1979.

M. H am al ainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalographytheory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of modern Physics, 65(2):413, 1993.

R. N. Henson, D. G. Wakeman, V. Litvak, and K. J. Friston. A parametric empirical bayesian framework for the eeg/meg inverse problem: generative models for multi-subject and multi-modal integration. Frontiers in human neuroscience, 5, 2011.

A. Hillebrand, K. D. Singh, I. E. Holliday, P. L. Furlong, and G. R. Barnes. A new approach to neuroimaging with magnetoencephalography. Human brain mapping, 25(2):199 211, 2005.

M. X. Huang, J. J. Shih, R. R. Lee, D. L. Harrington, R. J. Thoma, M. P. Weisend, F. Hanlon, K. M. Paulson, T. Li, K. Martin, et al. Commonalities and diﬀerences among vectorized beamformers in electromagnetic source imaging. Brain topography, 16(3):139 158, 2004.

N. Kanwisher, J. Mc Dermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17(11): 4302 4311, 1997.

O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of multivariate analysis, 88(2):365 411, 2004.

LCMV Beamforming

F. Merlev ede, M. Peligrad, and E. Rio. A bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields, 151(3-4):435 474, 2011.

J. C. Mosher, R. M. Leahy, and P. S. Lewis. Eeg and meg: forward solutions for inverse methods. Biomedical Engineering, IEEE Transactions on, 46(3):245 259, 1999.

R. Oostenveld, P. Fries, E. Maris, and J. Schoﬀelen. Fieldtrip: open source software for advanced analysis of meg, eeg, and invasive electrophysiological data. Computational intelligence and neuroscience, 2011, 2010.

D. Pitcher, D. D. Dilks, R. R. Saxe, C. Triantafyllou, and N. Kanwisher. Diﬀerential selectivity for dynamic versus static information in face-selective cortical regions. Neuroimage, 56(4):2356 2363, 2011.

M. A. Quraan, S. N. Moses, Y. Hung, T. Mills, and M. J. Taylor. Detection and localization of hippocampal activity using beamformers with meg: a detailed investigation using simulations and empirical data. Human brain mapping, 32(5):812 827, 2011.

J. O. Ramsay. Functional data analysis. Wiley Online Library, 2006.

S. E. Robinson. Functional neuroimaging by synthetic aperture magnetometry (sam). Recent advances in biomagnetism, pages 302 305, 1999.

A. Rodr ıguez-Rivera, B. V. Baryshnikov, B. D. Van Veen, and R. T. Wakai. Meg and eeg source localization in beamspace. Biomedical Engineering, IEEE Transactions on, 53(3): 430 441, 2006.

A. J. Rothman, E. Levina, and J. Zhu. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104(485):177 186, 2009.

J. Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in medicine and biology, 32(1):11, 1987.

K. Sekihara and S. S. Nagarajan. Adaptive spatial ﬁlters for electromagnetic brain imaging. Springer Science & Business Media, 2008.

K. Sekihara, S. S. Nagarajan, D. Poeppel, and A. Marantz. Asymptotic snr of scalar and vector minimum-variance beamformers for neuromagnetic source reconstruction. Biomedical Engineering, IEEE Transactions on, 51(10):1726 1734, 2004.

B. D. Van Veen, W. Van Drongelen, M. Yuchtman, and A. Suzuki. Localization of brain electrical activity via linearly constrained minimum variance spatial ﬁltering. Biomedical Engineering, IEEE Transactions on, 44(9):867 880, 1997.

J. Zhang, C. Liu, and G. Green. Source localization with meg data: A beamforming approach based on covariance thresholding. Biometrics, 70(1):121 131, 2014.

Zhang and Liu

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=1000

2 4 6 8 10 12 14

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=2000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=3000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=500

8 10 12 14 16

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=1000

8 10 12 14 16

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=2000

8 10 12 14 16

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=3000

Figure 2: Scenario 1: Two sources located at CTF coordinates (3, 1, 4)T cm and ( 5, 2, 6)T cm respectively. The ﬁrst four rows display the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 respectively. Here, ma and mi stand for the proposed hard-thresholded covariance based methods. sh stands for the optimal shrinkage-based method. With a slightly abuse of notation, c0 = ma, mi, sh refer to that ma, mi, and sh are used. The remaining rows present the box-and-whisker plots of the local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 was selected by the minimum strategy for the above combinations respectively. The red colored lines in the boxes are the medians. Note that when the distribution of the localization biases are degenerate, the upper and lower quartiles and medians of localization biases will be equal. Consequently, the box in the plot will reduce to a red colored line. The plots in the last four rows show that all the local peaks on the transverse slices are not close to the source location r1, implying that the source 1 has been masked on the neuronal activity map by source cancellations.

LCMV Beamforming

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=500

2 4 6 8 10 12 14

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=1000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=2000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=2000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/25,J=3000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/25,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1/0.64,J=3000

2 4 6 8 10 12

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1/0.64,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=500

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=1000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=1000

12 14 16 18 20 22

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=2000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/25,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/25,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 1 SNR=1/0.64,J=3000

1 2 3 4 5 6 7 8 9 10 Slice

Biases to source 2 SNR=1/0.64,J=3000

Figure 3: Scenario 2: Two sources located at CTF coordinates ( 5, 5, 6)T cm and ( 6, 2, 5)T cm respectively. The ﬁrst four rows display the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 respectively. The remaining rows present the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 is selected by the minimum strategy for the above combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases have the same value, the box in the plot will reduce to a red colored line. The plots in the last four rows show that all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations.

Zhang and Liu

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=8.16,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=8.16,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=6.25,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=6.25,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=4,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=4,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=8.16,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=8.16,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=6.25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=6.25,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=4,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=4,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=8.16,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=8.16,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=6.25,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=6.25,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=4,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=4,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=8.16,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=8.16,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=6.25,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=6.25,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=4,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=4,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=8.16,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=8.16,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=6.25,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=6.25,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=4,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=4,J=500

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=8.16,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=8.16,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=6.25,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=6.25,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=4,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=4,J=1000

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=8.16,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=8.16,J=2000

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=6.25,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=6.25,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=4,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=4,J=2000

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=8.16,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=8.16,J=3000

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=6.25,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=6.25,J=3000

468 10 12 14

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=4,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=4,J=3000

Figure 4: Scenario 3: Two sources located at CTF coordinates (3, 1, 4)T cm and ( 5, 2, 6)T cm respectively. The ﬁrst six rows show the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 102 sensors, SNR= 1/0.352, 1/0.42, 1/0.52, and the sample rates J = 500, 1000, 2000, 3000 respectively. The last six rows give the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 is selected by the minimum strategy for these combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. The last six rows of the plots show all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations. 2136

LCMV Beamforming

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.7313,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.7313,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.6437,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.6437,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=2.0408,J=500

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=2.0408,J=500

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.7313,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.7313,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.6437,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.6437,J=1000

0.92 0.94 0.96 0.981

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=2.0408,J=1000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=2.0408,J=1000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.7313,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.7313,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.6437,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.6437,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=2.0408,J=2000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=2.0408,J=2000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.7313,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.7313,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=1.6437,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=1.6437,J=3000

0 0.5 1 1.5 2 sh ma mi c0

LCMV: SNR=2.0408,J=3000

0 0.5 1 1.5 2 sh ma mi c0

Biases: SNR=2.0408,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.7313,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.7313,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.6437,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.6437,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=2.0408,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=2.0408,J=500

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.7313,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.7313,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.6437,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.6437,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=2.0408,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=2.0408,J=1000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.7313,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.7313,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.6437,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.6437,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=2.0408,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=2.0408,J=2000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.7313,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.7313,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=1.6437,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=1.6437,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 1 SNR=2.0408,J=3000

1 2 3 4 5 6 7 8 9 101112 Slice

Biases to source 2 SNR=2.0408,J=3000

Figure 5: Scenario 4: Two sources located at CTF coordinates ( 5, 5, 6)T cm and ( 6, 2, 5)T cm respectively. The ﬁrst six rows show the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 102 sensors, SNR= 1/0.352, 1/0.42, 1/0.52, and the sample rates J = 500, 1000, 2000, 3000 respectively. The last six rows give the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 was selected by the minimum strategy for these combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. The last six rows of the plots show all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations. 2137

Zhang and Liu

ma mi gma gmi adp sh

Biases: SNR=1/25,J=500

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=500

ma mi gma gmi adp sh

Biases: SNR=1/25,J=1000

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=1000

ma mi gma gmi adp sh

Biases: SNR=1/25,J=2000

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=2000

ma mi gma gmi adp sh c0

Biases: SNR=1/25,J=3000

ma mi gma gmi adp sh c0

Biases: SNR=1/0.64,J=3000

ma mi gma gmi adp sh

Biases: SNR=1/25,J=500

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=500

ma mi gma gmi adp sh

Biases: SNR=1/25,J=1000

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=1000

ma mi gma gmi adp sh

Biases: SNR=1/25,J=2000

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=2000

ma mi gma gmi adp sh

Biases: SNR=1/25,J=3000

ma mi gma gmi adp sh

Biases: SNR=1/0.64,J=3000

Figure 6: Performance comparison of the six diﬀerent beamformers, namely ma, mi, gma, gmi, adp and sh in Scenarios 1 and 2. Here, ma and mi stand for the hardthresholded covariance based methods when the tuning constant c0 is chosen by use of the maximum strategy and the minimum strategy respectively; gma and gmi stand for the generalized thresholded covariance based methods when the tuning constant c0 is chosen by use of the maximum strategy and the minimum strategy respectively; adp and sh stand for the adaptive thresholding-based method and the optimal shrinkage-based method. The upper two rows of multiple box-whisker plots are for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 in Scenario 1, while the lower two rows are for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 in Scenario 2. Each panel shows the localization biases against the above six different beamformer methods. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line.

LCMV Beamforming

ma mi gma gmi adp sh

Biases: SNR=8.16,J=500

ma mi gma gmi adp sh

Biases: SNR=6.25,J=500

ma mi gma gmi adp sh

Biases: SNR=4,J=500

ma mi gma gmi adp sh

Biases: SNR=8.16,J=1000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=1000

ma mi gma gmi adp sh

Biases: SNR=4,J=1000

ma mi gma gmi adp sh

Biases: SNR=8.16,J=2000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=2000

ma mi gma gmi adp sh

Biases: SNR=4,J=2000

ma mi gma gmi adp sh

Biases: SNR=8.16,J=3000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=3000

ma mi gma gmi adp sh

Biases: SNR=4,J=3000

Figure 7: Performance comparison of the six diﬀerent beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 3. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000. Each panel shows the localization biases against the six diﬀerent beamformer methods, namely ma, mi, gma, gmi, adp and sh. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line.

Zhang and Liu

ma mi gma gmi adp sh

Biases: SNR=1.7313,J=500

ma mi gma gmi adp sh

Biases: SNR=1.6437,J=500

ma mi gma gmi adp sh

Biases: SNR=2.0408,J=500

ma mi gma gmi adp sh

Biases: SNR=1.7313,J=1000

ma mi gma gmi adp sh

Biases: SNR=1.6437,J=1000

ma mi gma gmi adp sh

Biases: SNR=2.0408,J=1000

ma mi gma gmi adp sh

Biases: SNR=1.7313,J=2000

ma mi gma gmi adp sh

Biases: SNR=1.6437,J=2000

ma mi gma gmi adp sh

Biases: SNR=2.0408,J=2000

ma mi gma gmi adp sh

Biases: SNR=1.7313,J=3000

ma mi gma gmi adp sh

Biases: SNR=1.6437,J=3000

ma mi gma gmi adp sh

Biases: SNR=2.0408,J=3000

Figure 8: Performance comparison of the six diﬀerent beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 4. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000 in Scenario 4. Each panel shows the localization biases against the six diﬀerent beamformer methods, namely ma, mi, gma, gmi, adp and sh. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line.

LCMV Beamforming

ma mi gmagmi adp sh

Biases: SNR=8.16,J=500

ma mi gmagmi adp sh

Biases: SNR=6.25,J=500

ma mi gmagmi adp sh

Biases: SNR=4,J=500

ma mi gma gmi adp sh

Biases: SNR=8.16,J=1000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=1000

ma mi gmagmi adp sh

Biases: SNR=4,J=1000

ma mi gma gmi adp sh

Biases: SNR=8.16,J=2000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=2000

ma mi gmagmi adp sh

Biases: SNR=4,J=2000

ma mi gma gmi adp sh

Biases: SNR=8.16,J=3000

ma mi gma gmi adp sh

Biases: SNR=6.25,J=3000

ma mi gmagmi adp sh

Biases: SNR=4,J=3000

Figure 9: Performance comparison of the six diﬀerent beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 5. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000 respectively. Each panel shows the localization biases against the six diﬀerent beamformer methods, namely ma, mi, gma, gmi, adp and sh.

Zhang and Liu

Figure 10: Plots of the log-contrasts between the faces and scrambled faces on three orthogonal slices through the peak locations for each of ﬁve sessions, which are overlaid on the subject s MRI scan. The plots in the left-hand two columns and the right-hand two columns are derived from the procedures mi and adp respectively. Rows 1 and 2, 3 and 4, 5 and 6, and 7 and 8 are for sessions 1 5 respectively. The highlighted yellow colored areas revealed neuronal activity increases or decreases for the faces relative to the scrambled faces. The areas shown in the left-hand two columns are in or close to the IOG, STS, and PCu regions which are known to be related to the human face perception.

LCMV Beamforming

Figure 11: Plots of the log-contrasts between the faces and scrambled faces on 20 transverse slices for each of ﬁve sessions, which are overlaid on the subject s MRI scan. The plots in the left-hand column and the right-hand column are derived from the procedures mi and adp respectively. Rows 1 5 are for sessions 1 5 respectively. The highlighted yellow colored areas revealed neuronal activity increases for the faces relative to the scrambled faces. The areas highlighted in the ﬁrst column are in or close to the OFA, IOG, STS, and PCu regions which are known to be related to the human face perception.

Zhang and Liu

time course

time course

0 500 0.005

time course

time course

time course

time course

time course

time course

time course

0 500 0.025

time course

time course

time course

time course

0 500 0.025

time course

time course

Figure 12: Plots of the estimated time-courses at the global peaks along x, y and z-axes respectively for each of ﬁve sessions. The solid curve and the dashed curve in each plot stand for the estimated time-courses under the faces and the scrambled faces respectively. The plots are ordered from the top left panel to the right panel to the bottom panel corresponding to sessions 1 5.

LCMV Beamforming

0 50 100 150 200 250 300 350 1

number of sensors

Maximum partial correlation among sources

0 50 100 150 200 250 300 350 1

number of sensors

Maximum partial correlation among voxels

Figure 13: Plots of d12(k) and dmax(k) against k = 1, 2, ..., 306 respectively, where k stands for k randomly chosen sensors from the 306 sensors in the face-perception data, two sources are located at CTF (-4, 3,8)cm and (-4,-5,5) cm respectively, and the dashed curve in each plot is for the function log(log(k)).