# on_linearly_constrained_minimum_variance_beamforming__cf37772d.pdf Journal of Machine Learning Research 16 (2015) 2099-2145 Submitted 10/12; Published 12/15 On Linearly Constrained Minimum Variance Beamforming Jian Zhang jz79@kent.ac.uk Chao Liu cl304@kent.ac.uk School of Mathematics, Statistics and Actuarial Science University of Kent Canterbury, Kent CT2 7NF, UK Editor: Xiaotong Shen Beamforming is a widely used technique for source localization in signal processing and neuroimaging. A number of vector-beamformers have been introduced to localize neuronal activity by using magnetoencephalography (MEG) data in the literature. However, the existing theoretical analyses on these beamformers have been limited to simple cases, where no more than two sources are allowed in the associated model and the theoretical sensor covariance is also assumed known. The information about the effects of the MEG spatial and temporal dimensions on the consistency of vector-beamforming is incomplete. In the present study, we consider a class of vector-beamformers defined by thresholding the sensor covariance matrix, which include the standard vector-beamformer as a special case. A general asymptotic theory is developed for these vector-beamformers, which shows the extent of effects to which the MEG spatial and temporal dimensions on estimating the neuronal activity index. The performances of the proposed beamformers are assessed by simulation studies. Superior performances of the proposed beamformers are obtained when the signalto-noise ratio is low. We apply the proposed procedure to real MEG data sets derived from five sessions of a human face-perception experiment, finding several highly active areas in the brain. A good agreement between these findings and the known neurophysiology of the MEG response to human face perception is shown. Keywords: MEG neuroimaging, vector-beamforming, sparse covariance estimation, source localization and reconstruction 1. Introduction MEG is a non-invasive imaging technique that records brain activity with high temporal resolution. Postsynaptic current flow within the dendrites of active neurons generates a magnetic field that can be measured close to the scalp surface by use of sensors (H am al ainen et al., 1993). The magnitude of these measured fields is directly related to neuronal current strength, and hence their measurement will reflect the amplitude of brain activity. The major challenge, however, is to localize active regions inside the head, given the measured magnetic fields outside the head (i.e., given MEG data). This is an ill-posed problem of source localization since the magnetic fields could be caused by an infinite number of neuronal regions. Mathematically, the problem can be stated as follows: one observes a vector of time-series Y(t) = (Y1(t), ..., Yn(t))T Rn, t = tj, 1 j J from n sensors, c 2015 Jian Zhang and Chao Liu. Zhang and Liu which are linked to candidate sources located at rk, 1 k p in the brain via the model k=1 Hkmk(t) + ε(t), (1.1) where Hk is an n 3 lead field matrix at rk (i.e., the unit output of the candidate source at location rk, which is derived from Maxwell s equations), mk(t) with covariance matrix Σk is a 3 1 moment (time-course) at time t and location rk, ε(t) with covariance matrix σ2 0In represents white noises at the MEG channels, and In is the n n identity matrix. See Mosher et al. (1999) for more details. In practice, when candidate source locations (i.e., voxels) are created by discretizing the source space in the brain, the number of these sources can be substantially larger than the number of available sensors. Moreover, unlike the traditional functional data, not only source time courses but also sensor readings are spatially correlated. Therefore, searching for a small set of latent sources of non-null powers from a large number of candidates poses a challenge to standard i.i.d. sample-based methods in functional data analysis (Ramsay, 2006). Here, the source power at location rk is referred as the trace of the covariance matrix Σk. Two types of approaches have been proposed for handling the above problem in the literature: global approach and local approach (e.g., Henson et al., 2011; Bolstad et al., 2009; Van Veen et al., 1997; Robinson, 1999; Huang et al., 2004; Quraan et al., 2011). In the global approach, one puts all candidate sources into the model and solves a sparse estimation problem. In the local approach, on other hand, one invokes a list of local models, each is tailored to a particular candidate region. The global approach often requires to specify parametric models, while the local approach is model-free. When the number of candidate sources p is small or moderate compared to the number of available sensors n, one may use a Bayesian method to infer latent sources, with helps of computationally intensive algorithms (e.g., Henson et al., 2011). To make an accurate inference, a large p should be chosen. However, when p is large, the global approach may be computationally intractable and the local approach is preferred. Here, we focus on the so-called linearly constrained minimum variance (LCMV) beamforming (also called vector-beamforming), a local method for solving the above large-p-small-n problem. It involves two steps as follows: Projection step. For location rk in the source space, one searches for the optimal n 3 weighting-matrix W by minimizing the trace of the sample covariance of the projected data W T Y (tj), 1 j J, subject to W T Hk = I3, where I3 is a 3 3 identity matrix. This gives the optimal trace ˆSk = tr([HT k ˆC 1Hk] 1), (1.2) where ˆC is a sensor covariance estimator and for any invertible matrix A, A 1 denotes its inverse, and tr( ) stands for the matrix trace operator. See Van Veen et al. (1997) for the details. Mapping step. For each location rk, calculate the neuronal activity index ˆSk/(σ2 0tr([HT k Hk] 1)), where σ2 0 is estimated by certain baseline noise data such as the pre-stimulus data. Plot the index against the grid points, creating a neuronal activity map over a given temporal window. LCMV Beamforming In the projection step, the procedure aims at estimating the desired signal from each chosen location while minimizing the contributions of other unknown locations in the presence of noises by optimizing the variation of the projected data. This can be easily seen from the following decomposition of the projected covariance under the constrain W T Hk = I3: tr cov(W T Y(t)) = tr(Σk) + tr(W T cov( X j =k Hjmj(t) + ε(t))W) +2tr(cov(mk(t), W T ( X j =k Hjmj(t) + ε(t)))), where the first term is the underlying signal strength at rk and the last two terms are the contributions of other locations and background noises to the estimated strength of the signal at rk. Therefore, minimizing the trace of the projected covariance of the data with respect to W is equivalent to minimizing the the contributions of other locations and background noises to estimating the true signal strength at rk. The further mathematical details can be found in Sekihara and Nagarajan (2008). As pointed out before, in practice, we often have the baseline noise data. Performing the above projection procedure on the noise data under the assumption that the noise covariance matrix is approximately σ2 0In, we obtain the optimal trace of the covariance matrix of the projected noise at rk, σ2 0tr([HT k Hk] 1). This implies that the above neuronal activity index is a signal-to-noise ratio (SNR) at location rk. Therefore, the map generated in the mapping step is a SNR map. A similar formula can be derived under a general model of the noise covariance. However, to avoid highdimensional effects on estimating sensor covariance matrices, we often employed a diagonal noise covariance model even when the true one is not diagonal. Both theoretical and empirical studies have suggested that the vector-beamforming can provide excellent performance given a sufficient number of observations (e.g., Sekihara et al., 2004; Brookes et al., 2008; Quraan et al., 2011). However, the existing theoretical analyses have been limited to simple cases, where no more than two sources are allowed in the model and the theoretical sensor covariance is assumed known. In limited data scenarios the estimated sensor covariance may possess considerable variation and thus deteriorate the performance of localization. Empirical studies have also demonstrated that the sampling window and rate are generally required to increase as the number of spatial sensors increases. For example, when using the sample covariance matrix to estimate the sensor covariance matrix, the number of statistically independent data records should be three or more times the number of sensors in order to obtain statistically stable source location estimates (e.g., Rodr ıguez-Rivera et al., 2006). Consequently, the potential advantages of having a large number of sensors are offset by the requirement for increased sampling window and rate. Therefore, it is important to develop a general framework for users to examine the extent of effects to which the spatial dimension (i.e., the lead field matrix) and the temporal dimension (i.e., the temporal correlations of sensor measurements) of MEG on the accuracy of source localization. Furthermore, most brain activities are conducted by neural networks which consist of multiple sources. For example, in the so-called evoked median-nerve MEG response study, scientists have found the relatively large number of neuronal sources activated in a relatively short period of time by the median-nerve stimulation with typical repetition rates, which challenges covariance-based analysis techniques such as beamformer due to source cancellations (Huang et al., 2004). We need to understand how the accuracy Zhang and Liu of localization is affected by source cancellations both theoretically and empirically. In particular, we need to address the fundamental questions of whether the neuronal activity map can reveal the true sources when the number of sensors and the width of the sampling window are large enough and of how much multiple source cancellation effects are reduced by increasing spatial and temporal dimensions of MEG. The goal of the present study is to demonstrate at both theoretical and empirical levels the behavior of a class of vector-beamforming techniques which includes the standard vectorbeamformer as a special example. These beamformers are based on thresholding the sample sensor covariance matrix. By thresholding, we aim at reducing the noise level in the sample sensor covariance. We provide an asymptotic theory on these beamformers when the sensor covariance matrix is consistently estimated and when multiple sources exist. We show that the estimated source power is consistent when multiple sources are asymptotically separable in terms of a lead field distance. We further assess the performance of the proposed procedure by both simulations and real data analyses. The paper is organized as follows. The details of the proposed procedures are given in Section 2. The asymptotic analysis is provided in Section 3. Other covariance estimatorbased beamformers are introduced in Section 4. The simulation studies on these beamformers and an application to face-perception data are conducted in Section 5. The discussion and conclusion are made in Section 6. The proofs of the theorems and corollaries are deferred to Section 7. Throughout the paper, let ||A|| denote the operator norm of matrix A. For a sequence of matrix An, we mean by An = O(1) that ||An|| is bounded and by An = o(1) that ||An|| = o(1). Similarly, we define the notations Op and op for a sequence of random matrices An. For non-negative matrices A and B, we say A < B if a T Aa < a T Ba for any a with ||a|| = 1. We say that random matrix An is asymptotically larger than random matrix Bn in probability if min||a||=1 a T (An Bn)a is asymptotically bounded below from zero in probability. 2. Methodology Suppose that the sensor measurements (Y(tj) : 1 j J) are weakly stationary timecourses observed from n sensors. We want to identify a small set of non-null sources that underpin these observations. To this end, we introduce a family of vector-beamformers based on thresholding sensor covariance as follows. 2.1 Thresholding the sensor covariance matrix The sensor covariance matrix of Y(t), C can be estimated by the sample covariance matrix ˆC = (ˆcij) = 1 j=1 Y(tj)Y(tj)T Y YT , where Y is the sample mean of (Y(tj) : 1 j J). It is well-known that the sample covariance estimator can breakdown when the dimension n is large (Bickel and Levina, 2008). In the statistical literature (Bickel and Levina, 2008), various sparse estimation procedures have been proposed to fix the sample covariance, including the following thresholded LCMV Beamforming estimator: ˆC(τn J) = (ˆcij(τn J)) with ˆcij(τn J) = ˆcij I(|ˆcij| τn J), where τn J is a varying constant in n and J. As with the i.i.d. case (Bickel and Levina, 2008), the above thresholded estimator will be shown to converges to positive definite limit with probability tending to 1 in the Lemma 7.2 in Section 7 below. Although the thresholded estimator has good theoretical properties, it may not be always positive definite when the sample size is finite or when sensors are spatially too close to each other. To tackle the issue, we assume that ˆC(τn J) has the eigendecomposition ˆC(τn J) = Pn k=1 ˆλkv T k vk and then a positive semidefinite estimator can be obtained by setting these negative eigenvalues to zeros. We further shrinkage the covariance matrix estimator by artificially adding ϵ0In to it in our implementation, where we choose ϵ0 to be a tuning constant which is equal to or slightly larger than the maximum eigenvalue of the noise covariance matrix. We will show in the following sections that adding ϵ0In to the thresholded covariance matrix does not affect the consistency of the neuronal activity index. 2.2 Beamforming As before, let Σk denote the covariance matrix of the moment mk(t) at the location rk. Based on the thresholded sensor covariance estimator ˆC(τn J), we estimate Σk, 1 k p and create a neuronal activity map in the following two steps. In the projection step, for 1 k p, we search for an n 3 weight matrix ˆWk which attains the minimum trace of W T ˆC(τn J)W subject to W T Hk = I3. When ˆC(τn J) is invertible, it follows from Van Veen et al. (1997) that ˆWk = ˆC(τn J) 1Hk h HT k ˆC(τn J) 1Hk i 1 with the resulting moment covariance matrix and trace estimators ˆΣk = h HT k ˆC 1(τn J)Hk i 1 , ˆSk = tr h HT k ˆC(τn J) 1Hk i 1 respectively. In the mapping step, we calculate the so-called neuronal activity index NAI(rk) = ˆSk/ σ2 0tr HT k Hk 1 , creating a brain activity map, where σ2 0 is estimated from baseline data (i.e., called prestimulus data in the next subsection). One of the underlying sources can be then estimated by the global peak on the map with the associated latent time-course estimated by projecting the data along the optimal weighting vector. The multiple sources can also be identified by grouping the local peaks on the transverse slices of the brain. 2.3 Choosing the thresholding level In practice, the MEG imaging is often run on a subject first without stimulus and then with stimulus. This allows us to calculate the sample covariance ˆC for the stimulus data Zhang and Liu as well as the sample covariance ˆC0 for the pre-stimulus data. The latter can provide an estimator of the background noise level. In the next section, we will show that the convergence rate of the thresholded sample covariance is O( p log(n)/J). In light of this, we set τn J = c0ˆσ2 0 p log(n)/J with a tuning constant c0 and threshold ˆC by τn J, where ˆσ2 0 is the minimum diagonal element in ˆC0 and c0 is a tuning constant. Note that, when c0 = 0, the proposed procedure reduces to the standard vector-beamformer implemented in the software Field Trip (Oostenveld et al., 2010). For each value of c0, we apply the proposed procedure to the data and calculate the maximum neuronal activity index NAIc0 = max{NAI(r) : r is running over the grid}. (2.3) In simulations, we will show that c0 D0 = {0, 0.5, 1, 1.5, 2} has covered its useful range. Our simulations also suggests that there is an optimal value of c0, which depends on several factors including the strengths of signals and source interferences. To exploit these two factors, we choose c0 in which NAIc0 attains maximum or minimum, resulting in two procedures called ma and mi respectively. By choosing c0, the procedure ma intends to increase the maximum SNR value, while the procedure mi tries to reduce source interferences. The simulation studies in Section 5 suggest that mi can perform better than ma when sources are correlated. 2.4 Two sets of stimuli Suppose now that MEG measurements (Y(1)(t)) and (Y(2)(t)) are made under two different sets of stimuli and pre-stimuli with the associated neuronal activity indices denoted by NAI(1)(rk) and NAI(2)(rk) respectively. The previous strategy for selecting the tuning constant c0 can be adopted here when we calculate these indices. To identify source locations that respond to the change of stimulus set, we calculate a log-contrast log(NAI(1)(rk)/NAI(2)(rk)) between the two sets of stimuli at location rk, 1 k p, creating a log-contrast map. The resulting log-contrast map is equivalent to the map based on index ratio NAI(1)(rk)/NAI(2)(rk), which was often seen in the literature (e.g., Hillebrand et al., 2005). We further take the global peak of the log-contrast as the maximum location estimator for a source location that contributes to the difference between the two sets of MEG measurements. In this section, we develop a theory on the consistency as well as the convergence rate of the hard thresholding-based beamformer estimator under regularity conditions. In particular, we show that the consistency holds true under regularity conditions if we let the hard threshold τn J = A p log(n)/J with constant A. This provides a theoretical basis for using the proposed procedures ma and mi. Without loss of generality, we assume that the first q p moment vectors are of nonzero covariance matrices Σk, 1 k q, where q is unknown and often much smaller than p in practice. For the simplicity of mathematical derivations, we also assume that Σk does LCMV Beamforming not grow with the number of sensors n. Our task is to identify the unknown true model k=1 Hkmk(t) + ε(t), (3.4) from the working model (1.1) by using the proposed procedure, where the unknown moments mk(t), 1 k q are of non-zero powers tr(Σk), 1 k q. To establish a theory for the proposed procedures, we assume that (A1): Both the moment vectors (mk(t) : 1 k q) and the white noise process (ε(t)) are stationary with zero means and temporally uncorrelated with each other. Also, mk(t) is temporally uncorrelated with mj(t) for k = j. Under Condition (A1), the sensor covariance matrix of Y(t), C can be expressed in the form k=1 HkΣk HT k + σ2 0In. As pointed out by Sekihara and Nagarajan (2008, Chapter 9), Condition (A1) is one of fundamental assumptions in the vector-beamforming. However, source activities in the brain are inevitably correlated to some degree, and in strict sense, (A1) cannot be satisfied. The theoretical influence of temporally correlated sources has been investigated by Sekihara and Nagarajan (2008, Chapter 9). The equation (9.3) in Sekihara and Nagarajan (2008, Chapter 9) implies that the influence can be ignored if the partial correlations between sources are close to zeros in order of o(1/n) when the number of sensors n is sufficiently large. Note that although in practice the number of sensors is limited to a few hundreds, we still ideally let n tend to infinity to identify potential spatial factors that affect the performance of a vector-beamformer. In the next section, by using simulations, we will demonstrate that the source correlations can mask some true sources. To show the consistency of the estimators ˆΣk and Sk, we need more notations and condition as follows. Let Hk denote the lead field matrix at the location rk. For the simplicity of the technical derivations later, we further assume that the lead field matrices satisfy the condition that for any location rk, HT k Hk/n G in terms of the operator norm as n tends infinity, where G is a 3 3 positive definite matrix. Under the above condition, we can find a positive definite matrix Qk satisfying that HT k Hk = n Qk QT k and Q 1 k HT k Hk Q T k = n I3 when n is large enough, where I3 is an identity matrix. Letting H k = Hk Q T k , m k(t) = QT k mk and Σ k = QT k Σk Qk, we reparametrize the model (1.1) as follows: k=1 H km k + ε(t) with the covariance matrix C = Pp k=1 H kΣ k H T k + σ2 0In. Then, under the reparametrized model, the estimators ˆΣ k = h H T k ˆC(τn J) 1H k i 1 = h Q 1 k HT k ˆC(τn J) 1Hk Q T k i 1 = QT k ˆΣk Qk. ˆSk = tr(Q T k ˆΣ k Q 1 k ). Zhang and Liu Consequently, ˆΣ k is consistent with Σ k if and only if ˆΣk is consistent with Σk. Therefore, without loss of generality, hereinafter we assume that (A2): HT k Hk = n I3 for any location rk. We process the remaining analysis in two stages: In the first stage, we develop an asymptotic theory for the proposed vector-beamformers when the sensor covariance matrix C is known. The sensor covariance matrix can be assumed known if the width of the sampling window can be arbitrarily large. In the second stage, we extend the theory to the case where C is estimated by ˆC(τn J). 3.1 Beamformer analysis when C is known We begin with introducing some more notations. For any locations rx and ry, let Hx and Hy denote their lead field matrices. Define the lead field coherent matrix by ρxy = ρ(rx, ry) = HT x Hy/n. Note that ρxy +ρyx = I3 (Hx Hy)T (Hx Hy)/(2n). Therefore, I3 (ρxy +ρyx) indicates how close rx is to ry. In general, the partial coherence factor matrices (or called partial correlation matrices) ayx|k, 1 k q are defined iteratively by the so-called sweep operation (Goodnight, 1979) as follows: ayx|1 = σ 2 0 ρ(ry, r1, rx) = σ 2 0 (ρ(ry, rx) ρ(ry, r1)ρ(r1, rx)) , ayx|(k+1) = ayx|k ay(k+1)|k a(k+1)(k+1)|k 1 a(k+1)x|k, 1 k q 1. For example, we have σ2 0ayx|1 = ρyx ρy1ρ1x, σ2 0a22|1 = I3 ρT 12ρ12, σ2 0a33|2 = I3 ρT 13ρ13 ρ23 ρT 12ρ13 T I3 ρT 12ρ12 1 ρ23 ρT 12ρ13 . Note that σ2 0a(k+1)(k+1)|k gauges the partial variability of rk+1 given the previous r ks while σ2 0ayx|(k+1) shows the partial coherence between rx and ry given {r1, ..., rk+1}. We expect that ayx|(k+1) will be small if ry and rx are spatially far away from each other. We define byx|k, 1 k q, by letting byx|1 = ρy1Σ 1 1 ρ1x and byx|k = byx|(k 1) byk|(k 1) akk|(k 1) 1 akx|(k 1) ayk|(k 1) akk|(k 1) 1 bkx|(k 1) +ayk|(k 1) akk|(k 1) 1 Σ 1 k + bkk|(k 1) akk|(k 1) 1 akx|(k 1). We also define cjj|k, 1 j k q by ( Σ 1 k akk|(k 1) 1 Σ 1 k , j = k cjj|(k 1) bjk|(k 1) akk|(k 1) 1 b T jk|(k 1), 1 j k 1. Let anq = n min1 k q 1 ||a(k+1)(k+1)|k||, and let km = 0 if anq and km = min{1 k q 1 : n||a(k+1)(k+1)|k|| = O(1)} if anq = O(1). Let dx|q = max2 k q ||akx|(k 1)a 1 kk|(k 1)||, which measures the maximum absolute partial correlation among q sources by using their lead field matrix. As the lead field matrix measures the unit outputs of sources recorded by sensors, the maximum absolute partial correlation may increase when the number of sensors n increases. In the following theorem, for any location rx of interest, the condition that LCMV Beamforming dx|q = O(1) (i.e., the maximum absolute partial correlation will be bounded) is imposed on the lead field matrix. The condition is used to ensure the coherence stability for the grid approximation to the lead field. Our numerical experience suggests that the condition roughly holds when the underlying sources are asymptotically not close to each other. See the discussion in Section 7. The following theorem shows when the source covariance estimator is consistent and when it is not. Theorem 1 Under Conditions (A1) (A2) and C is known, we have: (1) If anq = O(1) and max1 k q dk|q = O(1), then the estimated source covariance at rkm+1 Hkm+1T C 1Hkm+1 1 is asymptotically larger than Σkm+1. (2) If anq , then for 1 j q, the estimated source covariance at rj admits HT j C 1Hj 1 = Σj + 1 nΣjcjj|qΣj + O(a 2 nq ), provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n . (3) If anq , then for rx {r1, ..., rq}, the estimated source covariance at rx admits HT x C 1Hx 1 = 1 na 1 xx|q 1 n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq ), provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to infinity, where bxx|q = O(1) as n . The following lemma shows when the source power estimator is consistent. Corollary 2 Under Condition (A1) (A2), we have: (1) If anq = O(1) and max1 k q dk|q = O(1), then the estimated source power at rkm+1 tr Hkm+1T C 1Hkm+1 1 is asymptotically larger than tr (Σkm+1). (2) If anq , then for 1 j q, the estimated source power at rj admits tr HT j C 1Hj 1 = tr(Σj) + 1 ntr(Σjcjj|qΣj) + O(a 2 nq ), provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n . (3) If anq , then for rx {r1, ..., rq}, the estimated source power at rx admits tr HT x C 1Hx 1 = 1 ntr(a 1 xx|q) 1 n2 tr(a 1 xx|qbxx|qa 1 xx|q) + O(a 3 nq ), provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to infinity, where bxx|q = O(1) as n . Zhang and Liu Remark 3 It follows from the definition that cjj|q is proportional to σ2 0, which implies the convergence rate of the neuronal activity index is of order O(σ2 0/(σ2 0anq)), where σ2 0anq is independent of σ2 0. Therefore, the effect of adding ϵ0In to C on the above convergence rate is increasing or decreasing the rate by the amount of O(ϵ0/((σ2 0 + ϵ0)anq)). In particular, adding ϵ0In to C does not affect the consistency of the neuronal activity index if anq tends infinity. Remark 4 From the proof in Section 7, we can see that if we relax the coherence stability condition max1 k q dk|q = O(1) to max1 k q dk|q = O(log(n)), then the convergence rates in the theorem will be reduced by a factor of log(n). Remark 5 If there are MEG measurements made under two different sets of stimuli and pre-stimuli, we let C(1) = Pp k=1 HT k Σ(1) k Hk + σ2 01In and C(2) = Pp k=1 HT k Σ(2) k Hk + σ2 02In be the corresponding sensor covariance matrices. We perform the proposed beamformers on C(1) and C(2) respectively. Then, under certain conditions, Theorem 1 can be extended to this setting. When rk is a source location for both sets of stimuli, the log-contrast tends to the true one as n ; when rk is a source for stimulus set 1 but not for stimulus set 2, the log-contrast tends to infinite; when rk is a source location for stimulus set 2 but not for stimulus set 1, the log-contrast tends to ; when rj is neither a source for stimulus set 1 nor a source for stimulus 2, the log-contrast tends to a finite value depending on the associated values of axx|q. The details are omitted here. 3.2 Beamformer analysis when C is estimated We now estimate the sensor covariance matrix by using the sensor observations over J time instants. Following Bickel and Levina (2008) and Fan et al. (2011), we establish the asymptotic theory for the resulting beamformer estimators when both n and J are tending to infinity. In addition to Conditions (A1) and (A2), we need the following two conditions for conducting the asymptotic analysis above. The first one is imposed to regularize the tail behavior of the sensor processes. (A3): There exist positive constants κ1 and τ1 such that for any u > 0 and all t, max 1 i n P(||Yi(t)|| > u) exp(1 τ1uκ1) and max1 i n E||Yi(t)||2 < + , where the noise covariance matrix is σ2 0In and || || is the L2 norm. Note that Condition (A3) holds if Y(t) is a multivariate normal. In the second additional condition, we assume that the sensor processes are strong mixing. Let F0 and F k denote the σ-algebras generated by {Y(t) : t 0} and {Y(t) : t k} respectively. Define the mixing coefficient α(k) = sup A F0 ,B F k |P(A)P(B) P(AB)|. The mixing coefficient α(k) quantifies the degree of the temporal dependence of the process {Y(t)} at lag k. We assume that α(k) is decreasing exponentially fast as lag k is increasing. LCMV Beamforming (A4): There exist positive constants κ2 and τ2 such that α(k) exp( τ2kκ2). Condition (A4) is a commonly used assumption for studying asymptotic behavior of time series. For a constant A, let τn J = A p log(n)/J. As before, let Yi be the sample mean of the i-th sensor and j=1 (Yi(tj) Yi)(Yk(tj) Yk), ˆC(τn J) = (ˆcik I(ˆcik τn J)), where I( ) is the indicator. We are now in position to generalize Theorem 1 to the case where the sensor covariance is estimated by the thresholded covariance estimator. Theorem 6 Under Conditions (A1) (A4) and assuming that n2p log(n)/J = o(1) as n and J tend to infinity, we have: (1) If anq = O(1) and max1 k q dk|q = O(1), then as n and J tend to infinity, the estimated source covariance at rkm+1 ˆΣkm+1 is asymptotically larger than Σkm+1 in probability. (2) If anq , then as n and J tend to infinity, for 1 j q, the estimated source covariance at rj admits ˆΣj = Σj + 1 nΣjcjj|qΣj + Op(a 2 nq + n2p provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n . (3) If anq , then as n and J tend to infinity, for rx {r1, ..., rq}, the estimated source covariance at rx admits ˆΣx = 1 na 1 xx|q 1 n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq + n2p provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to infinity, where bxx|q = O(1) as n . Corollary 7 Under Conditions (A1) (A4) and assuming that n2p log(n)/J = o(1) as n and J tend to infinity, we have: (1) If anq = O(1), max1 k q dk|q = O(1), as n and J tend to infinity, the estimated source power at rkm+1, ˆSkm+1 is asymptotically larger than tr (Σkm+1). (2) If anq , then as n and J tend to infinity, for 1 j q, the estimated source power at rj admits ˆSj = tr(Σj) + 1 ntr(Σjcjj|qΣj) + O(a 2 nq + n2p provided max1 k q dk|q = O(1), where ||Σjcjj|qΣj/n|| = O(a 1 nq ) as n . Zhang and Liu (3) If anq , then as n and J tend to infinity, for rx {r1, ..., rq}, the estimated source power at rx admits ˆSx = 1 ntr(a 1 xx|q) 1 n2 tr(a 1 xx|qbxx|qa 1 xx|q) + O(a 3 nq + n2p provided max1 j q dj|q = O(1), ||naxx|q|| , and dx|q = O(1) as n tends to infinity, where bxx|q = O(1) as n . Remark 8 Theorem 6 indicates the convergence rate of the vector-beamformer estimation is much slower than the empirical rate suggested by Rodr ıguez-Rivera et al. (2006). However, the result is in agreement with an empirical result of Brookes et al. (2008). In fact, using their heuristic arguments, we can show that the error of the power estimation at location rx is determined by the factor Hx( ˆC(τn J) 1 C 1)Hx, which has a rate of n2p log(n)/J. Theorem 6 can be also extended to the scenarios where MEG data are obtained under two different sets of stimuli. Remark 9 From the proof of Theorem 6, we can see that the thresholded covariance is still consistent with the true C even when the underlying sources are correlated. 4. Other covariance estimator-based beamformers There are various ways to estimate the sensor covariance matrix. Each can be used to construct a beamformer. These covariance estimators can be roughly divided into two categories, namely global shrinkage-based methods and elementwise thresholding-based methods. In shrinkage-based settings, the sample covariance is shrinking toward a target structure (for example, a diagonal matrix). The so-called optimal shrinkage estimator belongs to this category (Ledoit and Wolf, 2004). In thresholding-based settings, an elementwise thresholding is applied to the sample covariance estimator. Examples of these approaches include hard thresholding, generalized thresholding and adaptive thresholding (Bickel and Levina, 2008; Rothman et al., 2009; Cai and Liu, 2011). Here, we focus on the following three methods recommended by the above authors. The optimal shrinkage covariance matrix is defined by ˆCopt = b2 n d2n µn In + d2 n b2 n d2n ˆC, µn = D ˆC, In E , d2 n = D ˆC µn In, ˆC µn In E , b2 n = 1 J2 D Yj YT j ˆC, Yj YT j ˆC E , b2 n = min( b2 n, d2 n), and the operator < A, B >= tr(ABT )/n for any n n matrices A and B. The idea behind the above estimator is to find the optimal weighted average of the sample covariance matrix ˆC and the identity matrix via minimizing the expected squared loss. Under certain LCMV Beamforming conditions ˆCopt converges to the true covariance C as n tends infinity, implying that ˆCopt can be degenerate if C is degenerate (Ledoit and Wolf, 2004). As before, we tackle the issue by adding ϵ0In to ˆCopt, where ϵ0 is determined by the maximum eigenvalue of the pre-stimulus sample covariance matrix. The beamformer based on the above covariance estimator is denoted as sh. A family of generalized thresholding-based covariance estimators indexed by tuning constants c0 0 and δ0 > 0 can be defined by replacing the hard thresholding in Subsection 2.1 with the generalized thresholding, i.e., ˆCg = (g(ˆcij)) with g(ˆcij) = ˆcij(1 (τn J/|ˆcij|)δ0), where τn J = c0ˆσ2 0 p log(n)/J and ˆσ2 0 is estimated from a baseline sample. Following the suggestion of Rothman et al. (2009), we choose δ0 = 4. The same maximum/minimum strategy as in Subsection 2.3 can be adapted to choose the tuning constant c0 when we use the above estimator to construct a beamformer. The corresponding beamformers are denoted by gmax and gmin respectively. Similarly, an adaptive thresholding estimator can be introduced by replacing the above τn J in the g function by λij = 2 q ˆθij log(n)/J, where ˆθij is the estimated variance of the (i, j)-th entry ˆcij and is defined by k=1 [(Yik Yi)(Yjk Yj) ˆcij]2 and Yi and Yj are the sample means of the i-th and the j-th sensors. See Cai and Liu (2011). The corresponding beamformer is denoted by adp. 5. Numerical results In this section, we compare the proposed procedures to the standard vector-beamformer (with the tuning c0 = 0) and to the other covariance estimator-based beamformers in terms of localization bias by simulation studies and real data analyses. Here, for any estimator ˆr of a source location r, the localization bias |ˆr r| is the L1 distance between ˆr and r. The spatial correlation ρmax between locations r1 and r2 is measured by the maximum correlation between the projected lead field vectors at r1 and r2: ρmax(r1, r2) = max ||η1||=1,||η2||=1 (l(r1)η1)T l(r1)η1 ||l(r1)η1||| |l(r2)η2|| By simulations, we attempted to answer the following questions: Has the vector-beamformer been improved by using the thresholded covariance estimator? To what extent will the performance of the proposed beamformer procedure deteriorate by source interferences (or source cancellations) and source spatial correlations? Can the proposed beamformers ma and mi be superior to the other covariance estimator-based beamformers? Zhang and Liu 5.1 Simulated data We started with specifying the following two head models (Sarvas, 1987). The simple head model that uses a homogeneous sphere in simulating the magnetic fields emanating from current electric dipole neuronal activity possesses the advantage that the lead field matrix can be calculated analytically. However, with more realistic head models, the numerical approximations such as a finite element method have to be used when we calculate the lead field matrix. Here, we considered both of them: the simple one is a spherical volume conductor with 10cm radius from the origin and with 91 sensors, created by using the software Field Trip (Oostenveld et al., 2010), and the realistic one is a single shell head model by using the magnetic resonance imaging (MRI) scan of a human brain provided by Henson et al. (2011). We then discretized the inside brain space into a 3D-grid of resolution 1 cm. This yielded a grid with 2222 points for the simple model and 1487 points for the realistic model. The grids was further sliced into 10 and 14 transverse layers along the z-axis of the brain respectively. We put two non-null sources at r1 and r2 or three sources at r1, r2 and r3 respectively, where two sources {r1, r2} are equal to {(3, 1, 4)T , ( 5, 2, 6)T } cm or {( 5, 5, 6)T , ( 6, 2, 5)T } cm, and three sources {r1, r2, r3} are equal to {(3, 1, 4)T , ( 5, 2, 6)T , (5, 5, 6)T } in the Subject Coordinate System (SCS/CTF). Note that the second set of source locations was obtained in our real data analyses which will be presented later. These sources were located in the region of the parietal and occipital lobes, where visual, auditory and touch information is processed. We considered two types of sources in the brain: evoked responses that are phase-locked to the stimulus and induced responses that are not. The induced responses often have oscillatory patterns. Combining these sources with the two head models, we had the following four scenarios: Scenario 1: For the simple head model, we put two oscillatory sources at locations r1 = (3, 1, 4)T and r2 = ( 5, 2, 6)T with time-courses mk(t) = ηk cos(20tπ), k = 1, 2, respectively, where η1 = (10, 1, 1)T and η2 = (8, 0, 0)T . We considered two values of the signal-to-noise-ratio (SNR): 0.04 and 1/0.64 = 1.5625. Scenario 2: For the simple head model, we put the above oscillatory sources at locations r1 = ( 5, 5, 6)T and r2 = ( 6, 2, 5)T . We also considered two values of the SNR: 0.04 and 1/0.64 = 1.5625. Scenario 3: For the realistic head model, we put the following evoked response sources at locations r1 = (3, 1, 4)T and r2 = ( 5, 2, 6)T with moments (i.e., timecourses) mk(t) = αk exp( (t τk1)2/ω2 k) sin(fk2π(t τk2)), k = 1, 2, respectively, where α1 = (5, 0, 0)T , α2 = (20, 0, 0)T , τ11 = 0.239, τ12 = 0.139, τ21 = 0.199, τ22 = 0.139, f1 = 4.75, f2 = 6.25, and ω1 = ω2 = 0.067. We considered three values of the SNR: 1/0.352 = 8.16, 1/0.42 = 6.25, 1/0.52 = 4. LCMV Beamforming Scenario 4: For the realistic head model, we put the following evoked response sources at locations r1 = ( 5, 5, 6)T and r2 = ( 6, 2, 5)T with moments (i.e., timecourses) mk(t) = αk exp( (t τk1)2/ω2 k) sin(fk2π(t τk2)), k = 1, 2, respectively, where α1 = (2, 0, 0)T , α2 = (18, 0, 0)T , τ11 = 0.439, τ12 = 0.139, τ21 = 0.399, τ22 = 0.139, f1 = 6, f2 = 9, and ω1 = ω2 = 2. We considered three values of the SNR: 1/0.72 = 2.04, 1/0.762 = 1.73, 1/0.782 = 1.64. Scenario 5: We added another evoked response source at location r3 = (5, 5, 6)T to the model in Scenario 3 with moment m3(t) = α3 exp( (t τ31)2/ω2 3) sin(f32π(t τ32)), where α3 = (2.5, 0.25, 0.25), τ31 = 0.1, τ32 = 0.139, f3 = 1.25, and w3 = 0.067. The three source locations are highly spatially correlated with the pairwise spatial correlations ρ(r1, r2) = 0.7289, ρ(r1, r3) = 0.7935, and ρ(r2, r3) = 0.5924. We considered the same SNR values as in Scenario 3. The pair sources mk(t), k = 1, 2 for the first four scenarios and the treble sources mk(t), k = 1, 2, 3 for Scenario 5 are plotted respectively in Figure 1. By Scenarios 1 and 2, we compared the proposed procedure to the standard vector-beamformer (with c0 = 0) and to the other estimator-based beamformer, when there existed two highly correlated oscillatory sources (they have the same frequency and phase, but with slightly different amplitudes). By Scenarios 3 and 4, we tested these beamformers when there existed two unbalanced evoked response (or slightly dumped-oscillatory) sources. By Scenario 5, we assessed these beamformers when there were three spatially correlated source locations. In each scenario, with time-window width 1 and sample rate J, we sampled 30 data sets of Y(t) from the model k=1 Hkmk(t) + ε(t), (5.5) where in Scenarios 1 4, mk(t), k = 1, 2 are non-null time-courses at the two locations and mk(t), 3 k p are null time-courses at other grid points, while in Scenario 5, mk(t), k = 1, 2, 3 are non-null time-courses at the three locations and mk(t), 4 k p are null time-courses at other grid points. As before, {ε(t)} is a white noise process with noise level σ2 0. We considered various combinations of (n, p) = (91, 2222) and (102, 1487), and J = 500, 1000, 2000, and 3000. Note that p is substantially larger than n and that the sources are sparse in the sense that there are only two or three non-null sources among p candidates. We first applied the proposed procedures ma, mi and sh to each data set. We calculated the maximum indices over the grids and the L1-biases of the maximum location estimates to two sources respectively. For each combination of (n, p, J) and the SNR, we then summarized these values in the form of a box-whisker plot as in Figures 2, 3, 4, and 5 corresponding to Scenarios 1, 2, 3, and 4 respectively. The results demonstrate that the proposed hard thresholding-based procedure mi can outperform both the conventional vector-beamformer Zhang and Liu 0 0.2 0.4 0.6 0.8 1 20 Sources at (3, 1,4) and ( 5,2,6) 0 0.2 0.4 0.6 0.8 1 20 Sources at ( 5,5,6) and ( 6, 2,5) 0 0.2 0.4 0.6 0.8 1 20 Sources at (3, 1,4) and ( 5,2,6) 0 0.2 0.4 0.6 0.8 1 20 Sources at ( 5,5,6) and ( 6, 2,5) 0 0.2 0.4 0.6 0.8 1 20 Sources at (3, 1,4),( 5,2,6) and (5,5,6) Figure 1: The amplitude plots of mk(t), k = 1, 2 for Scenarios 1 to 4 and the amplitude plots of mk(t), k = 1, 2, 3 for Scenario 5. In these plots, the blue, green and red colored curves are corresponding to the amplitudes of mk(t), k = 1, 2, 3 respectively. and the procedures ma and sh in all four scenarios, in particular when the SNR is low. We note that in several cases, the localization bias and the maximum index were degenerate to a single value with some outliers, indicating that random variations have not changed the global peak location although they have effects on local peaks on the map. The simulations also suggest that the proposed procedure may be unable to detect evoked response sources of low SNR values. The local peak box-whisker plots in these figures reveal that all the local peaks on the transverse slices are not close to the source location r1, implying that the source at r1 has been masked on the neuronal activity index-based map even when two sources have a similar power level. This may be due to source cancellations as the lead field vectors at these two locations were correlated and the sensor positions might favor the detection of r2. Finally, we note that the results are robust to the choice of J in the sense that increasing sampling frequency has only slightly reduced both the mean and standard error of localization bias. LCMV Beamforming To compare the procedures ma, mi and sh with the procedures gma, gmi and adp based on the generalized and adaptive thresholding, we again generated 30 data sets from model (5.5) for each of the above four scenarios and for each combination of (n, p) = (91, 2222) and (102, 1487), and J = 500, 1000, 2000, and 3000. We applied these procedures to each data set and calculated their localization biases respectively. As before, we displayed these biases by multiple box-whisker plots in Figures 6, 7 and 8. From these figures, we can see a dramatic improvement in localization performance of the hard thresholdingbased procedure mi over the other procedures in Scenarios 1 and 2 and a slightly better or similar performance to ma, gma, gmi, adp and sh in Scenarios 3 and 4. This is striking because the existing studies have already shown that the soft (or generalized) and adaptive thresholding-based covariance estimators can improve the hard thresholdingbased covariance estimator in terms of estimation loss. The potential explanations for this phenomena are as follows: (1) The procedure adp may lose efficiency by not using the prestimulus data. (2) The existing covariance estimators were aimed to improve the estimation accuracy by reducing the estimation loss (the distance between the estimator and the true covariance matrix) or by increasing the sensitivity and specificity in recovering sparse entries in the true covariance matrix (Rothman et al., 2009; Cai and Liu, 2011). Unfortunately, the sparsity in MEG means a sparse signal distribution, which is quite different from the entry sparsity of the sensor covariance matrix. Therefore, these estimators may be not efficient for improving the accuracy of the beamformer estimation which is related to the signal sparsity. In fact, our simulation experience suggests that besides the covariance estimation, there are other factors that can affect the performance of a beamformer such as the lead field matrix and the spatial distribution of signals in the brain. Therefore, the covariance estimator with a smaller estimation loss may not give rise to a beamformer with a lower localization bias. To assess the performances of the six procedures ma, mi, gma, gmi, adp and sh when there are more than two spatially correlated sources, we applied these procedures to the 30 data sets generated for Scenario 5. We calculated the average localization bias for each procedure and presented them in Figure 9. It can be seen from these plots that like in twosource scenarios, mi can have superior performance over the other procedures. However, compared the above result to those in Scenario 3, we can see that the source cancellation from r3 has increased the average localization bias from zero to the value of three. Note that although Theorem 2 suggests that in general the localization bias will be reduced as the sampling rate increases, it does not implies the localization bias is a monotone function of the sampling rate (or the number of time instances). In fact, from row 4 in Figure 2 and row one in Figure 9, it can be seen that the localization bias when J = 500 is smaller than when J = 1000, 2000 and 3000. A potential explanation is that in finite cases a higher sampling rate may cause a higher amount of leakage of background noises (in a neighborhood of the target location) into the neuronal activity index calculation. Finally, we notice that we also carried out simulations with the soft thresholding (δ0 = 1). The result is very similar to the case with δ0 = 4. For reasons of space, we do not report it here. Zhang and Liu 5.2 Face-perception data We applied the proposed methodology to human MEG data acquired in five sessions by Wakeman and Henson (Henson et al., 2011). In each session, 96 face trials and 50 scrambled face trials were performed on a healthy young adult subject. Each trial started with a central fixation cross (presented for a random duration of 400 to 600 ms), followed by a face or scrambled face (presented for a random duration of 800 to 1000 ms), and followed by a central circle for 1700 ms. The subject used either his/her left or right index finger to report whether he/she thought the stimulus was symmetrical or asymmetrical vertically through its center. The data were collected with a Neuromag Vector View system, containing a magnetometer and two orthogonal, planar gradiometers located at each of 102 positions within a hemispherical array situated in a light, magnetically shielded room. The sampling rate was 1100Hz. We focused our analysis on localizing non-null source positions, where neuronal activity increases for the face stimuli relative to the scrambled face stimuli. For this purpose, we normalized the subject s MRI scan to a MRI template by using the Field Trip, on which a grid CTF system of 1 cm resolution was created with 1487 points. For each session, we applied the neuroimaging software SPM8 to read and preprocess the recorded data, and to epoch and average the data generated from the face stimulus trials and the scrambled face stimulus trials respectively. This gives rise to five 306 771 data matrices: the first 220 columns for 200ms pre-stimuli and the later 551 columns for the stimuli. For each session, we calculated the sample covariance ˆC and noise covariance ˆC0 by using the stimulus data and the pre-stimulus data respectively. We estimated the baseline noise level by ˆσ2 0, the minimum diagonal element in ˆC0. We applied the beamforming procedures ma, mi, gma, gmi, adp, and sh to the face data set and the scrambled face data set respectively, obtaining the log-contrasts at each grid point. Here, if there exist the negative eigenvalues of the covariance estimators (used in ma, mi, gma, gmi, adp and sh), we set them to zeros and added ϵ0 to them to make the resulting covariance estimators positive definite, where ϵ0 was determined by the maximum eigenvalue of the noise matrix ˆC0. For each procedure, we interpolated and overlaid its log-contrasts on the structural MRI of the subject, obtaining its index map. There were no visible differences among the maps derived from ma, mi, gma, gmi and sh. The map derived from the adp slightly differed from the rest. So, we reported only the mi-based and adp-based maps below. For each session, we first identified the global peak location from each map, followed by slicing the maps through their global peak locations as shown in Figure 10. For sessions 1 4, the global peaks derived from the mi and adp were the same, which were located at ( 4, 3, 8)cm, ( 1, 6, 8)cm, ( 6, 2, 5)cm, and ( 4, 4, 6)cm respectively. However, for session 5, the global peaks derived from the mi and the adp were located at two slightly different positions, ( 4, 4, 6)cm and ( 7, 3, 6)cm. We then projected the data along the associated optimal weight directions, obtaining estimated time-courses at these global peaks. For reasons of space, we presented only these time-courses derived from the procedure mi. See Figure 12. Finally, we made 20 transverse slices along the z-axis to identify the local peaks. There were some subtle differences between the mi-based and the adp-based local peaks. For example, in session 1, the mi-based local peaks were located at (1, 5, 2) cm, (0, 1, 11) cm, (3, 2, 10) cm, (3, 4, 9) cm, ( 5, 3, 3) cm, ( 4, 3, 4) cm, ( 2, 1, 1) cm,( 4, 3, 1) cm,( 2, 1, 0) cm,( 4, 5, 5) cm,( 4, 2, 6) cm,( 5, 3, 7) cm and LCMV Beamforming ( 4, 3, 8) cm, whereas the adp-based local peaks were located at (3, 2, 2)cm, (0, 1, 11)cm, ( 4, 3, 9)cm, ( 6, 2, 1)cm, ( 4, 3, 4)cm, (2, 3, 10)cm, ( 4, 3, 1) cm, ( 1, 1, 0) cm, ( 3, 6, 3) cm, ( 4, 4, 5) cm, ( 4, 2, 6) cm, ( 5, 3, 7) cm, and ( 4, 3, 8)cm. They are not the same as shown in Figure 11. Note that the previous simulations demonstrated that the procedure mi was expected to give a more accurate localization result than did the procedure adp. Although the areas highlighted in Figures 10 and 11 were varying over sessions, they did reveal the following known regions of face perception: the occipital face area (OFA), the inferior occipital gyrus (IOG), and the superior temporal sulcus (STS), and the precuneus (PCu). Interestingly, in each session, we identified a pair of nearly symmetric sources, of which one was strongly powered while the other was weakly powered. This phenomenon occurred due to source cancellations that prevented the second source from identification as we have demonstrated in our simulation studies. The time-courses plots in Figure 12 showed the response differences under face stimuli and scrambled face stimuli during the time period 100ms 300ms. The results are consistent with recent findings in face-perception studies by using an MEG-based multiple sparse prior approach (Friston et al., 2006; Henson et al., 2011) and by other empirical approaches (e.g., Pitcher et al., 2011; Kanwisher et al., 1997). However, in the first two papers, the authors made a parametric model assumption on source temporal correlation structures and imposed a limit on the number of candidate sources in the model, whereas in our approach, the model is non-parametric and allows for arbitrary number of candidate sources. 6. Discussion and Conclusion In the present study, we have proposed a class of vector-beamformers by thresholding the sensor sample covariance matrix. The consistency and the convergence rate of the proposed vector-beamformer estimation have been proved in the presence of multiple sources. The theory has provided a basis for choosing the threshold τn J = c0σ2 0 p log(n)/J in the beamformer construction. However, it requires a number of conditions. As pointed out in Section 3, conditions (A1) (A4) are commonly used assumptions in literature for studying multiple time series (Sekihara and Nagarajan, 2008; Fan et al., 2011). We only need to validate the coherence stability condition which is new. Intuitively, the strength of correlations between sensors (therefore the absolute partial correlation) will increase when the number of sensors increases in general. Taking the face-perception data (session 1) as an example, we show how to validate it empirically by random sub-samples of the 306 sensors below. We take the first two peaks in Figure 8 as two true sources. They are located at CTF (-4,3,8) cm and (-4,-5,5) cm respectively. First, we reparametrize the lead field matrix as in Section 3. Then, for k = 1, 2, ..., 306, we randomly choose k sensors, obtaining a k 4461 sub lead field matrix for the 1487 voxels in the brain. We calculate the maximum absolute partial correlation d12(k) = max{d1|2, d2|2} between the two sources and the maximum absolute correlation dmax(k) = max dx|2 for all voxels, where x is running over these voxels. Finally, we plot d12(k), dmax(k), and log(log(k)) against k = 1, 2, ..., 306 respectively as displayed in Figure 13. As expected, the result shows that both d12(k) and dmax(k) change very slowly when the number of sensors k changes, with a rate much slower than log(log(k)). This implies that the coherence stability condition nearly holds. Zhang and Liu In real world situations, the underlying number of true sources, q needs to be estimated. The influence of q on the beamformer estimators can be measured by the lead field partial correlation coefficient anq defined in Section 3. In this paper, local peaks on transverse slices have been used to reduce the search space of sources. We can cluster the local peak values into two groups, one of which is taken as a group of potential sources. The size of the selected group gives an estimate of q. In the face-perception data, we have only presented the first two sources which are ranked higher than the remaining local peaks, because these two are of clear neurological implications. Our approach is non-parametric in the sense that we have not made any parametric assumptions on the model (1.1). However, if we are willing to assume a family of parametric models for background noises, then we can determine q via model selection criteria such as Bayesian information criterion. By theoretical and empirical studies, we have shown that due to source cancellations, the beamformer power estimator can be inconsistent if the underlying multiple sources are not well separated in terms of a lead field distance. Unlike the existing theories in the literature, the new theory is applicable to more general scenarios, where multiple sources exist and the sensor covariance matrix are estimated from the data. In the new theory, we do assume that the powers of the unknown no-null sources as well as the underlying number q are not growing with the number of sensors n. This assumption is natural to neurologists and has simplified mathematical derivations of the theory very much. However, the theory can be extended to the case where these quantities are growing with n. In the theory, we have not impose any constraint on p as we only consider local behavior of beamformers. If we want to investigate global properties of the neuronal activity map, then some constraints need to be imposed on the growth rate of p with respect to n. The performances of the proposed beamformers have further been assessed by simulations and real data analyses. We have demonstrated that thresholding the sensor covariance matrix can help reduce the source localization bias when the data have a low SNR value. We have applied the vector-beamformer to an MEG data set for identifying the active regions related to human face perception. Some excellent agreements have been found between the current results and the existing neurological facts on human face perception. Finally, we note that there are other ways to measure the contrast between two source covariances such as the information-divergence. The theory can be easily extended to this case. The details will be presented elsewhere. In this section we prove the theorems and corollaries in Section 3. To prove Theorem 1, we need the following lemma. Lemma 10 If anq as n , then we have HT j C 1 k Hj = bjj|k + cjj|k n + O(a 2 nk ), bjj|k = Σ 1 j , for 1 j k HT j1C 1 k Hj2 = cj1j2|k n + O(a 2 nk ), for 1 j1 = j2 k HT j C 1 k Hx = bjx|k + cjx|k n + O(a 2 nk ), for 1 j k, x / Rk HT y C 1 k Hx = nayx|k + byx|k + O(a 1 nk ), for x, y / Rk LCMV Beamforming where ank = n min1 j k 1 tr a(j+1)(j+1)|j , Rk = {r1, . . . , rk}, Ck = Pk j=1 HT j Σj Hj +σ2 0In, and ayx|k, byx|k and cjj|k are defined before and the other c s are defined iteratively as follows: bj1k|(k 1)Σ 1 k a 1 kk|(k 1), 1 j1 k 1, j2 = k Σ 1 k a 1 kk|(k 1)bkj2|(k 1), 1 j2 k 1, j1 = k cj1j2|(k 1) bj1k|(k 1)a 1 kk|(k 1)bkj2|(k 1), 1 j1 = j2 k 1. akk|(k 1)Σk 1 bkx|(k 1) I3 + bkk|(k 1) akk|(k 1)Σk 1 akx|(k 1) o , j = k cjx|(k 1) cjk|(k 1)a 1 kk|(k 1)akx|(k 1) bjk|(k 1)a 1 kk|(k 1)bkx|(k 1) 1 j k 1 +bjk|(k 1)a 1 kk|(k 1) Σ 1 k + bkk|(k 1) a 1 kk|(k 1)akx|(k 1). Proof Note that under the stability condition and the assumption that anq , we have byx|k = O(1), 1 k q. And for any rx in the source space, n = O(n 1), cyx|k n = O(a 1 n(k 1)), 1 y k, 2 k q. We prove the lemma by induction. For k = 1, we have C 1 1 = σ 2 0 In σ 4 0 H1(Σ 1 1 + nσ 2 0 I3) 1HT 1 , HT 1 C 1 1 H1 = nσ 2 0 I3 n2σ 4 0 (Σ 1 1 + nσ 2 0 I3) 1 I3 I3 + Σ 1 1 σ2 0 n = nσ 2 0 I3 + nΣ1σ 2 0 1 = Σ 1 1 I3 σ2 0Σ 1 1 /n + O(n 2) = Σ 1 1 Σ 1 1 σ2 0Σ 1 1 /n + O(n 2) = b11|1 + c11|1 n + O(n 2), where b11|1 = Σ 1 1 , c11|1 = σ2 0Σ 2 1 . Analogously, HT 1 C 1 1 Hx = σ 2HT 1 Hx σ 4 0 n(Σ 1 1 + nσ 2 0 I3) 1HT 1 Hx = I3 I3 + Σ 1 1 σ2 0/n 1 HT 1 Hx I3 + σ2 0 n Σ 1 1 = Σ 1 1 ρ1x Σ 1 1 σ2 0Σ 1 1 ρ1x/n + O(n 2) = b1x|1 + c1x|1 n + O(n 2), Zhang and Liu where b1x|1 = Σ 1 1 ρ1x, c11|1 = Σ 2 1 σ2 0ρ1x. HT y C 1 1 Hx = σ 2 0 HT y Hx σ 4 0 HT y H1(Σ 1 1 + nσ 2 0 I3) 1HT 1 Hx = nσ 2 0 ρyx σ 4 0 HT y H1 σ2 0 n I3 + σ2 0Σ 1 1 /n 1 HT 1 Hx = nσ 2 0 ρyx nσ 2 0 ρy1 I3 σ2 0Σ 1 1 /n ρ1x + O(n 1) = nσ 2 0 ρy1x + ρy1Σ 1 1 ρ1x + O(n 1) = nayx|1 + byx|1 + O(n 1), where ρy1x = ρyx ρy1ρ1x, ayx|1 = σ 2 0 ρy1x, byx|1 = ρy1Σ 1 1 ρ1x. This implies the lemma holds for k = 1. Assuming the lemma holds for the cases with less or equal to k sources, we show that it is also true for the case with k + 1 sources by invoking the matrix inversion formulas C 1 k+1 = C 1 k C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k , (7.6) C 1 k = C 1 k+1 + C 1 k+1Hk+1Σk+1HT k+1C 1 k . The details are as follows. For 1 j k, HT j C 1 k+1Hj = HT j C 1 k Hj HT j C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hj = bjj|k + cjj|k n + O(a 2 nk ) bj(k+1)|k + cj(k+1)|k n + O(a 2 nk ) Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1 bj(k+1)|k + cj(k+1)|k n + O(a 2 nk ) T = bjj|k + cjj|k n + O(a 2 nk ) bj(k+1)|k + O(a 1 nk (na(k+1)(k+1)|k) 1 O(a 2 n(k+1)) bj(k+1)|k + O(a 1 nk ) T = bjj|k + cjj|k nbj(k+1)|ka 1 (k+1)(k+1)|kb T j(k+1)|k + O(a 2 n(k+1)) = bjj|(k+1) + cjj|(k+1) n + O(a 2 n(k+1)). For j = k + 1, we have HT j C 1 k+1Hj = HT k+1C 1 k+1Hk+1 = HT k+1C 1 k Hk+1 I3 + Σk+1HT k+1C 1 k Hk+1 1 = na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) I3 + Σk+1 na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1 LCMV Beamforming = na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) na(k+1)(k+1)|k 1 I3 + (na(k+1)(k+1)|k) 1Σ 1 k+1 + O(a 2 n(k+1)) 1 Σ 1 k+1 = Σ 1 k+1 1 nΣ 1 k+1a 1 (k+1)(k+1)|kΣ 1 k+1 + O(a 2 n(k+1)) = b(k+1)(k+1)|(k+1) + c(k+1)(k+1)|(k+1) n + O(a 2 n(k+1)). This completes the proof of the first equation in the lemma. To prove the second equation in the lemma, we let B = Σk+1na(k+1)(k+1)|k 1 + Σk+1na(k+1)(k+1)|k 1 Σk+1b(k+1)(k+1)|k Σk+1na(k+1)(k+1)|k 1 Then, when 1 j1 k, j2 = k + 1, we have HT j1C 1 k+1Hj2 = HT j1C 1 k+1Hk+1 = HT j1C 1 k Hk+1 I3 + Σk+1HT k+1C 1 k Hk+1 1 = bj1(k+1)|k + 1 ncj1(k+1)|k + O(a 2 nk ) I3 + Σk+1 na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk 1 = bj1(k+1)|k + 1 ncj1(k+1)|k + O(a 2 nk ) Σk+1na(k+1)(k+1)|k 1 (I3 + B + O(a 2 n(k+1))) 1 Σk+1na(k+1)(k+1)|k 1 = bj1(k+1)|k Σk+1na(k+1)(k+1)|k 1 I3 + O (na(k+1)(k+1)|k) 1 Σk+1na(k+1)(k+1)|k 1 2 + O(a 2 n(k+1)) = 1 nbj1(k+1)|kΣ 1 k+1a 1 (k+1)(k+1)|k + O(a 2 n(k+1)) = cj1(k+1)|(k+1) n + O(a 2 n(k+1)). Similarly, when 1 j1 = j2 k, we have HT j1C 1 k+1Hj2 = HT j1C 1 k Hj2 HT j1C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hj2 = 1 ncj1j2|k + O(a 2 nk ) 1 nbj1(k+1|k)a 1 (k+1)(k+1)|kb(k+1)j2|k + O(a 2 n(k+1)) = 1 ncj1j2|(k+1) + O(a 2 n(k+1)). We complete the proof of the second equation in the lemma. To prove the third equation in the lemma, we let D = na(k+1)(k+1)|kΣk+1 1 + na(k+1)(k+1)|kΣk+1 1 b(k+1)(k+1|k)Σk+1 na(k+1)(k+1)|kΣk+1 1 F = (na(k+1)(k+1)|k) 1 2 Σ 1 k+1 + b(k+1)(k+1)|k (na(k+1)(k+1)|k) 1 Zhang and Liu Then, for j = k + 1, HT j C 1 k+1Hx = I3 + HT k+1C 1 k Hk+1Σk+1 1 HT k+1C 1 k Hx = I3 + na(k+1)(k+1)|kΣk+1 + b(k+1)(k+1)|kΣk+1 + O(a 1 nk ) 1 na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = na(k+1)(k+1)|kΣk+1 1 2 (I3 + D + O(a 2 n(k+1))) 1 na(k+1)(k+1)|kΣk+1 1 na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = a(k+1)(k+1)|kΣk+1 1 2 n I3 D + O(a 2 n(k+1)) o a(k+1)(k+1)|kΣk+1 1 2 a(k+1)x|k + b(k+1)x|k/n + O(a 1 nk )/n = a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k a(k+1)(k+1)|kΣk+1 1/2 D a(k+1)(k+1)|kΣk+1 1/2 a(k+1)x|k +O(a 3 n(k+1)) + 1 n a(k+1)(k+1)|kΣk+1 1 b(k+1)x|k + O(a 2 n(k+1)) = a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k + 1 n a(k+1)(k+1)|kΣk+1 1 n b(k+1)x|k I3 + b(k+1)(k+1)|kΣk+1 a(k+1)(k+1)|kΣk+1 1 a(k+1)x|k o +O(a 2 n(k+1)) = b(k+1)x|(k+1) + 1 nc(k+1)x|(k+1) + O(a 2 n(k+1)). For 1 j k, we have HT j C 1 k+1Hx = HT j C 1 k Hx HT j C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hx = bjx|k + 1 ncjx|k + O(a 2 nk ) bj(k+1)|k + 1 ncj(k+1)|k + O(a 2 nk ) Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 2 nk ) 1 na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = bjx|k + 1 ncjx|k + O(a 2 nk ) bj(k+1)|k + 1 ncj(k+1)|k + O(a 2 nk ) (na(k+1)(k+1)|k) 1/2 I3 + F + O(a 2 nk ) 1 (na(k+1)(k+1)|k) 1/2 na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = bjx|k + 1 ncjx|k + O(a 2 nk ) bj(k+1)|k + 1 ncj(k+1)|k + O(a 2 nk ) a 1 (k+1)(k+1)|ka(k+1)x|k a 1/2 (k+1)(k+1)|k Fa 1/2 (k+1)(k+1)|ka(k+1)x|k + O(a 2 nk ) (na(k+1)(k+1)|k) 1b(k+1)x|k a 1/2 (k+1)(k+1)|k Fa 1/2 (k+1)(k+1)|kb(k+1)x|k/n +O(a 2 n(k+1)) LCMV Beamforming = bjx|k + 1 ncjx|k + O(a 2 nk ) bj(k+1)|k + 1 ncj(k+1)|k + O(a 2 nk ) a 1 (k+1)(k+1)|ka(k+1)x|k 1 na 1 (k+1)(k+1)|k(Σ 1 k+1 + b(k+1)(k+1)|k) a 1 (k+1)(k+1)|ka(k+1)x|k 1 na 1 (k+1)(k+1)|kb(k+1)x|k + O(a 2 n(k+1)) = bjx|k + 1 ncjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k n cj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k + bj(k+1)|ka 1 (k+1)(k+1)|kb(k+1)x|k bj(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|ka(k+1)x|k o +O(a 2 n(k+1)) = bjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k + 1 ncjx|(k+1) + O(a 2 n(k+1)). (7.7) Note that for k = j, ajx|j = ajx|(j 1) ajj|(j 1)a 1 jj|(j 1)ajx|(j 1) = 0. Assuming that for k = j + m, m > 0, the statement is true, i.e., ajx|(k+m) = 0 for all x. Then, ajx|(j+m+1) = ajx|(j+m) aj(j+m+1)|(j+m)a 1 (j+m+1)(j+m+1)|(j+m)a(j+m+1)x|(j+m) = 0. By induction, we have that ajx|k = 0 for all x, j k. This implies that and bjx|(k+1) = bjx|k bj(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k by the definition of bjx|(k+1). Combining this with (7.7), we complete the proof of the third equation in the lemma. Finally, we turn to the last equation in the lemma. Assume that the equation holds for the case k. We show that it also holds for k + 1 below. For x, y / Rk+1 (thus x, y / Rk), by the assumption, we have HT y C 1 k Hx = nayx|k + byx|k + O(a 1 nk ). This together with (7.6) yields HT y C 1 k+1Hx = HT y C 1 k Hx HT y C 1 k Hk+1 Σ 1 k+1 + HT k+1C 1 k Hk+1 1 HT k+1C 1 k Hx = nayx|k + byx|k + O(a 1 nk ) nay(k+1)|k + by(k+1)|k + O(a 1 nk ) Σ 1 k+1 + na(k+1)(k+1)|k + b(k+1)(k+1)|k + O(a 1 nk ) 1 na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = nayx|k + byx|k + O(a 1 nk ) nay(k+1)|k + by(k+1)|k + O(a 1 nk ) Zhang and Liu (na(k+1)(k+1)|k) 1 1 n2 a 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|k + O(a 3 n(k+1)) nay(k+1)|k + by(k+1)|k + O(a 1 nk ) = nayx|k + byx|k + O(a 1 nk ) n ay(k+1)|ka 1 (k+1)(k+1)|k ay(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|k/n +by(k+1)|ka 1 (k+1)(k+1)|k/n + O(a 2 n(k+1)) o na(k+1)x|k + b(k+1)x|k + O(a 1 nk ) = n h ayx|k ay(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k i + h byx|k by(k+1)|ka 1 (k+1)(k+1)|ka(k+1)x|k ay(k+1)|ka 1 (k+1)(k+1)|kb(k+1)x|k +ay(k+1)|ka 1 (k+1)(k+1)|k Σ 1 k+1 + b(k+1)(k+1)|k a 1 (k+1)(k+1)|ka(k+1)x|k i +O(a 1 n(k+1)) = nayx|(k+1) + byx|(k+1) + O(a 1 n(k+1)). The proof is completed. Proof of Theorem 1. Note that byx|1 = ρ(ry, r1)Σ 1 1 ρ(x1, x), ayx|1 = σ 2 0 (ρyx ρy1ρ1x), and both are bounded. By induction and the stability condition, it can be shown that ayx|k and byx|k are bounded for 2 k q. If anq is bounded, then there exists km such that na(km+1)(km+1)|km = O(1) and ankm = min1 j km 1 na(j+1)(j+1)|j as n tends to infinity. By Lemma 10, we have HT km+1C 1 km Hkm+1 = na(km+1)(km+1)|km + b(km+1)(km+1)|km + O(a 1 nkm), which is bounded and non-negative definite. Furthermore, there exists an orthogonal matrix Q and a diagonal matrix D = diag(d1, d2, d3) such that Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 = QDQT . HT km+1C 1 km+1Hkm+1 = HT km+1C 1 km Hkm+1 I3 Σ 1 km+1 + HT km+1C 1 km Hkm+1 1 HT km+1C 1 km Hkm+1 = HT km+1C 1 km Hkm+1 Σ 1 km+1 + HT km+1C 1 km Hkm+1 1 Σ 1 km+1 = HT km+1C 1 km Hkm+1Σ1/2 km+1 I3 + Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 1 Σ 1/2 km+1 = Σ 1/2 km+1QDQT I3 + QDQT 1 Σ 1/2 km+1 = Σ 1/2 km+1 I3 + QD 1QT 1 Σ 1/2 km+1 = Σ 1/2 km+1 Q(I3 + D 1)QT 1 Σ 1/2 km+1 = Σ 1/2 km+1Q(I3 + D 1) 1QT Σ 1/2 km+1. (7.8) LCMV Beamforming Note that Σ1/2 km+1HT km+1C 1 km Hkm+1Σ1/2 km+1 = O(1), which implies that dk 0, 1 k 3 are bounded. We can find a positive constant ϵ0 such that max1 k 3(1 + d 1 k ) 1 < (1 + ϵ0) 1 when n is large enough. Consequently, for any vector a R3 with ||a|| = 1, we have a T Σ1/2 km+1Q(I3 + D 1)QT Σ1/2 km+1a > (1 + ϵ0)a T Σ1/2 km+1QQT Σ1/2 km+1a, which shows that Σ1/2 km+1Q(I3 + D 1)QT Σ1/2 km+1 (thus HT km+1C 1 km+1Hkm+1 1 due to (7.8)) is asymptotically larger than Σkm+1(1 + ϵ0). We now consider the case where anq . For j = q, by Lemma 10, we have n = Σ 1 q naqq|(q 1) 1 Σ 1 l = O(a 1 nq ), HT q C 1 q Hq 1 = h Σ 1 q + cqq|q n + O(a 2 nq ) i 1 , = Σ1/2 q h I3 + Σ1/2 q cqq|q n Σ1/2 q + O(a 2 nq ) i 1 Σ1/2 q = Σ1/2 q h I3 Σ1/2 q cqq|q n Σ1/2 q + O(a 2 nq ) i Σ1/2 q = Σq naqq|(q 1) 1 + O(a 2 nq ) as n . For 1 j q 1, by Lemma 10, we have HT j C 1 q Hj = Σ 1 j + cjj|q n + O(a 2 nq ), where cjj|q n = O(a 1 nq ). This entails HT j C 1 q Hj 1 = Σ1/2 j nΣ1/2 j cjj|qΣ1/2 j + O(a 2 nq ) 1 Σ1/2 j nΣ1/2 j cjj|qΣ1/2 j + O(a 2 nq ) Σ1/2 j nΣjcjj|qΣj + O(a 2 nq ). For any location rx, by Lemma 10, we have HT x C 1 q Hx 1 = 1 n na 1 xx|qbxx|q + O(a 2 nq ) 1 a 1 xx|q na 1 xx|qbxx|q + O(a 2 nq ) a 1 xx|q = 1 na 1 xx|q 1 n2 a 1 xx|qbxx|qa 1 xx|q + O(a 3 nq ). The proof is completed. Proof of Corollary 2. First, let An = Hkm+1T C 1 l Hkm+1 1. If anq = O(1) and max1 k q dk|q = O(1), then by Theorem (1), there exists a positive constant ϵ0 such that Zhang and Liu min||a||=1 a T (An Σkm+1)a > ϵ0 for large n. Let a1 = (1, 0, 0)T , a2 = (0, 1, 0)T and a3 = (0, 0, 1)T . Then, we have tr(An) = tr(An k=1 aka T k ) = k=1 tr(Anaka T k ) k=1 a T k Anak > 3ϵ0 + k=1 akΣkm+1a T k k=1 tr(Σkm+1aka T k ) = 3ϵ0 + tr(Σkm+1 k=1 aka T k ) = 3ϵ0 + tr(Σkm+1), which implies tr(An) is asymptotically larger than Σkm+1. To prove Theorem 2, we need two more lemmas as follows and the following condition (A1 ) : {Y(tj) : 1 j J} is stationary and has a finite covariance matrix. Lemma 11 Under Conditions (A1 ) and (A3) (A4), if τn J = O( p log(n)/J) and nτn J = o(1) as n and J , then (i) max1 i,j n |ˆcij cij| = Op( p (ii) || ˆC(τn J) C|| = Op(mn p (iii) || ˆC(0) C|| (mn + n)τn J, where mn = max1 i n Pn j=1 I(cij = 0) n. Proof. Let κ3 = max{2(2/κ1+1/κ2) 1, (4/3)(1/κ1+1/κ2) 1/3, 1}. Then n p log(n)/J = o(1) yields (log(n))κ3/J = o(1). We adopted the techniques of Bickel and Levina (2008); Fan et al. (2011); Zhang et al. (2014) to prove it. To prove (i), we set up more notations. Let τ(t) be the so-called Dedecker-Prieur τ-mixing coefficients (Merlev ede et al., 2011, see). Let Θ(u, t) = {v > 0 : P(|y1(t)y2(t)| > v) u}, ψy(M, t) = max{min{yi(t)yj(t), M}, M}. It follows from Lemma 7 in Dedecker and Prieur (2004) that sup t Θ(u, t) b1(1 log(u))2/κ1, which, under Condition (A4), gives τ(t) b2 exp( b3tκ2). Similarly, it is derived from Remark 3 in Merlev ede et al. (2011) that sup M>0 [sup t var(ψy(M, t)) + 2 X t1>t2 |cov(ψy(M, t1), ψy(M, t2))] sup M>0 sup t var(ψy(M, t)) sup M>0 sup t var(ψy(M, t)) + 4 X 0 (sup t Θ(u))2du LCMV Beamforming Let 1/κ = 2/κ1 + 1/κ2. By Theorem 1 in Merlev ede et al. (2011), we can find positive constants dk, 1 k 5 that only depend on τ1, κ2, b2, b3 such that t=1 yi(t)yj(t) cij| u J exp (Ju)κ + exp (Ju)2 d2(1 + Jd3)) d5(log(Ju))κ Consequently, max 1 i,j n | 1 t=1 yi(t)yj(t) cij| > u n2 max 1 i,j n P t=1 yi(t)yj(t) cij| > u n2J exp (Ju)κ + n2 exp (Ju)2 d2(1 + Jd3) d5(log(Ju))κ Let u = A p log(n)/J. Then Ju = p J log(n). When both n and J tend to infinity, we have n2J exp (Ju)κ 2 log(n) + log(J) (A p (2(log(n))1 κ/2 d1 )(J log(n))κ/2 + log(J) since (log(n))1 κ/2/Jκ/2 = o(1). Similarly, if we choose A > p 2d2(d3 + 1), we have n2 exp (Ju)2 d2(1 + Jd3) = n2 exp A2J log(n) d2(1 + Jd3) d2(d3 + 1/J) log(n) = o(1). d5(log(Ju))κ Aκ(1 κ)(J log(n))κ(1 κ)/2 J log(n)))κ Zhang and Liu max 1 i,j n | 1 t=1 yi(t)yj(t) cij| > u = o(1). (7.9) Note that for u = A p log(n)/J, there exist positive constants dk, 1 k 5 so that P( max 1 i,j n | yi|| yj| > u) = P( max 1 i n | yi| > u) n max 1 i n P(| yi| > u) = n J exp (J u)κ1 + n exp (J u)2 d2(1 + Jd3) (J u)κ1(1 κ1) d5(log(Ju))κ1 since (log(n))4/(3κ1) 1/3/J = o(1) and log(n)/J = o(1). This together with (7.9) yields that for u = O( p P max 1 i,j n |ˆcij cij| > u P max 1 i,j n | 1 t=1 yi(t)yj(t) cij| > u +P max 1 i,j n | yi|| yj| > u = o(1), which implies max 1 i,j n |ˆcij cij| = Op p We turn to ˆC(τn J) in (ii). Let T1 = ||(ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J))||. We have || ˆC(τn J) C|| ||(ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J))|| + ||(cij I(|cij| τn J))|| j=1 |cij|I(|cij| τn J) T1 + τn Jmn. (7.10) Similarly, we have || ˆC(0) C|| T1 + τn Jmn + max i j=1 |ˆcij|I(|ˆcij| τn J) T1 + (mn + n)τn J. j=1 |ˆcij I(|ˆcij| > τn J) cij I(|cij| > τn J)| LCMV Beamforming j=1 |ˆcij (I(|ˆcij| > τn J, |cij| τn J) + I(|ˆcij| > τn J, |cij| > τn J)) cij (I(|cij| > τn J, |ˆcij| > τn J) + I(|cij| > τn J, |ˆcij| τn J)) | I +II +III, j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| > τn J), j=1 |ˆcij|I(|ˆcij| > τn J, |cij| τn J), III = max i J |cij|I(|cij| > τn J, |ˆcij| τn J). We bound the above three terms as follows. I max i,j |ˆcij cij| max i j=1 I(|cij| > 0) log(n)/J . (7.11) For δ > 0, using the equality in (i), we have j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| τn J) j=1 |cij|I(|cij| τn J) j=1 |ˆcij cij|I(|ˆcij| > τn J, |cij| δτn J) j=1 |ˆcij cij|I(|ˆcij| > τn J, δτn J < |cij| < τn J) + τn Jmn max i,j |ˆcij cij| j=1 I(|ˆcij| > τn J, |cij| δτn J) + mn j=1 I(|ˆcij cij| (1 δ)τn J) + mn log(n)/J)(op(1) + mn) + τn Jmn = Op(τn Jmn), (7.12) j=1 I(|ˆcij cij| (1 δ)τn J) > ϵ P max i,j |ˆcij ci,j| (1 δ)τn J Zhang and Liu j=1 (|ˆcij cij| + |ˆcij||) I(|cij| > τn J, |ˆcij| τn J) max i,j |ˆcij cij| j=1 I(|cij| > τn J) + τn J max i j=1 I(|cij| > τn J) Op(τn J)mn + τn Jmn = Op(τn Jmn). Combining this with (7.11), (7.12) and (7.10), we obtain the desired result in (ii). The proof is completed. Lemma 12 Under Conditions (A1 ) and (A3) (A4), if τn J = O( p log(n)/J) and nτn J = o(1) as n and J , then (i) || ˆC(τn J) 1 C 1|| = Op(mnτn J) and || ˆC(τn J) 2 C 2|| = Op(mnτn J), (ii) || ˆC(0) 1 C 1|| Op(τn J(mn + n)); || ˆC(0) 2 C 2|| Op(τn J(mn + n)), where mn = max1 i n Pn j=1 I(cij = 0) n. Proof. Let κ3 = max{2(2/κ1+1/κ2) 1, (4/3)(1/κ1+1/κ2) 1/3, 1}. Then n p log(n)/J = o(1) yields (log(n))κ3/J = o(1). If let λmin(C) denote the minimum eigenvalue of C, then we have that λmin(C) σ2 0. If let λmin( ˆC(τn J)) denote the minimum eigenvalue of ˆC(τn J), then it follows from Lemma 11 that λmin( ˆC(τn J)) = λmin(C) + Op(mnτn J) σ2 0 + Op(mnτn J), which is bounded below by σ2 0/2 if τn Jmn is small enough. Therefore, we have || ˆC(τn J) 1 C 1|| = || ˆC(τn J) 1(C ˆC(τn J))C 1|| || ˆC(τn J) 1||(||C ˆC(τn J)||)||C 1|| λmin( ˆC(τn J)) 1λmin(C) 1||C ˆC(τn J)|| = Op(τn Jmn). || ˆC(τn J) 2 C 2|| || ˆC(τn J) 1( ˆC(τn J) 1 C 1)|| + ||( ˆC(τn J) 1 C 1)C 1|| || ˆC(τn J) 1|||| ˆC(τn J) 1 C 1|| + || ˆC(τn J) 1 C 1||||C 1|| = (λmin( ˆC(τn J)) 1 + λmin(C) 1)|| ˆC(τn J) 1 C 1|| = Op(τn Jmn). || ˆC(0) 1 C 1|| Op(τn J(mn + n)), || ˆC(0) 2 C 2|| Op(τn J(mn + n)). LCMV Beamforming Proof of Theorem 6. Note that for any x, HT x Hx = n. We have || h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 || = || h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj HT j ˆC(τn J) 1Hj HT j C 1Hj 1 || n|| h HT j ˆC(τn J) 1Hj/n i 1 ||||HT j C 1Hj/n HT j ˆC(τn J) 1Hj/n|| || HT j C 1Hj/n 1 || n|| h HT j ˆC(τn J) 1Hj/n i 1 ||||C 1 ˆC(τn J) 1|||| HT j C 1Hj/n 1 ||, which combining with Lemma 11 yields || h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 || log(n)/J)|| h HT j ˆC(τn J) 1Hj i 1 |||| HT j C 1Hj 1 ||. (7.13) Let λm and ˆλm denote the smallest eigenvalues of HT j C 1Hj and HT j ˆC(τn J) 1Hj respectively. Invoking Theorem 1, (HT j C 1Hj) 1 = Σj + o(1). There exists a positive constant ϵ0 such that for large n, λm ϵ0. By the definition, there exists am R3 with ||am|| = 1, such that ˆλm = a T m HT j ˆC(τn J) 1Hjam. So |ˆλm a T m HT j C 1Hjam| = |(Hiam)T ( ˆC 1 C 1)Hiam| n|| ˆC(τn J) 1 C 1|| which implies ˆλm a T m HT j C 1Hjam Op(n2p log(n)/J) ϵ0 Op(n2p This shows that for large n, ˆλm is bounded below from zero. Consequently, we have || h HT j ˆC(τn J) 1Hj i 1 || = O(1), || HT j C 1Hj 1 || = O(1). This together with (7.13) proves that || h HT j ˆC(τn J) 1Hj i 1 HT j C 1Hj 1 || = Op(n2p The proof is completed. Proof of Corollary 7. It follows from Theorem 6 directly. The details are omitted. Acknowledgments We thank Professor Richard Henson from MRC Cognition and Brain Sciences Unit, Cambridge for sharing his MEG data with us. The software SPM8 is available at http://www.fil.ion.ucl.ac.uk/spm/software/spm8/. We also thank the Editor and three anonymous reviewers for their helpful comments. Zhang and Liu P. J. Bickel and E. Levina. Covariance regularization by thresholding. The Annals of Statistics, pages 2577 2604, 2008. A. Bolstad, B. Van Veen, and R. Nowak. Space time event sparse penalization for magneto- /electroencephalography. Neuro Image, 46(4):1066 1081, 2009. M. J. Brookes, J. Vrba, S. E. Robinson, C. M. Stevenson, A. M. Peters, G. R. Barnes, A. Hillebrand, and P. G. Morris. Optimising experimental design for meg beamformer imaging. Neuroimage, 39(4):1788 1802, 2008. T. Cai and W. Liu. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106(494):672 684, 2011. J. Dedecker and C. Prieur. Coupling for τ-dependent sequences and applications. Journal of Theoretical Probability, 17(4):861 885, 2004. J. Fan, Y. Liao, and M. Mincheva. High dimensional covariance matrix estimation in approximate factor models. Annals of statistics, 39(6):3320, 2011. K. Friston, R. Henson, C. Phillips, and J. Mattout. Bayesian estimation of evoked and induced responses. Human brain mapping, 27(9):722 735, 2006. J. H. Goodnight. A tutorial on the sweep operator. The American Statistician, 33(3): 149 158, 1979. M. H am al ainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalographytheory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of modern Physics, 65(2):413, 1993. R. N. Henson, D. G. Wakeman, V. Litvak, and K. J. Friston. A parametric empirical bayesian framework for the eeg/meg inverse problem: generative models for multi-subject and multi-modal integration. Frontiers in human neuroscience, 5, 2011. A. Hillebrand, K. D. Singh, I. E. Holliday, P. L. Furlong, and G. R. Barnes. A new approach to neuroimaging with magnetoencephalography. Human brain mapping, 25(2):199 211, 2005. M. X. Huang, J. J. Shih, R. R. Lee, D. L. Harrington, R. J. Thoma, M. P. Weisend, F. Hanlon, K. M. Paulson, T. Li, K. Martin, et al. Commonalities and differences among vectorized beamformers in electromagnetic source imaging. Brain topography, 16(3):139 158, 2004. N. Kanwisher, J. Mc Dermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17(11): 4302 4311, 1997. O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of multivariate analysis, 88(2):365 411, 2004. LCMV Beamforming F. Merlev ede, M. Peligrad, and E. Rio. A bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields, 151(3-4):435 474, 2011. J. C. Mosher, R. M. Leahy, and P. S. Lewis. Eeg and meg: forward solutions for inverse methods. Biomedical Engineering, IEEE Transactions on, 46(3):245 259, 1999. R. Oostenveld, P. Fries, E. Maris, and J. Schoffelen. Fieldtrip: open source software for advanced analysis of meg, eeg, and invasive electrophysiological data. Computational intelligence and neuroscience, 2011, 2010. D. Pitcher, D. D. Dilks, R. R. Saxe, C. Triantafyllou, and N. Kanwisher. Differential selectivity for dynamic versus static information in face-selective cortical regions. Neuroimage, 56(4):2356 2363, 2011. M. A. Quraan, S. N. Moses, Y. Hung, T. Mills, and M. J. Taylor. Detection and localization of hippocampal activity using beamformers with meg: a detailed investigation using simulations and empirical data. Human brain mapping, 32(5):812 827, 2011. J. O. Ramsay. Functional data analysis. Wiley Online Library, 2006. S. E. Robinson. Functional neuroimaging by synthetic aperture magnetometry (sam). Recent advances in biomagnetism, pages 302 305, 1999. A. Rodr ıguez-Rivera, B. V. Baryshnikov, B. D. Van Veen, and R. T. Wakai. Meg and eeg source localization in beamspace. Biomedical Engineering, IEEE Transactions on, 53(3): 430 441, 2006. A. J. Rothman, E. Levina, and J. Zhu. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104(485):177 186, 2009. J. Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in medicine and biology, 32(1):11, 1987. K. Sekihara and S. S. Nagarajan. Adaptive spatial filters for electromagnetic brain imaging. Springer Science & Business Media, 2008. K. Sekihara, S. S. Nagarajan, D. Poeppel, and A. Marantz. Asymptotic snr of scalar and vector minimum-variance beamformers for neuromagnetic source reconstruction. Biomedical Engineering, IEEE Transactions on, 51(10):1726 1734, 2004. B. D. Van Veen, W. Van Drongelen, M. Yuchtman, and A. Suzuki. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. Biomedical Engineering, IEEE Transactions on, 44(9):867 880, 1997. J. Zhang, C. Liu, and G. Green. Source localization with meg data: A beamforming approach based on covariance thresholding. Biometrics, 70(1):121 131, 2014. Zhang and Liu 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=1000 2 4 6 8 10 12 14 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=2000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=3000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=500 8 10 12 14 16 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=1000 8 10 12 14 16 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=2000 8 10 12 14 16 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=3000 Figure 2: Scenario 1: Two sources located at CTF coordinates (3, 1, 4)T cm and ( 5, 2, 6)T cm respectively. The first four rows display the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 respectively. Here, ma and mi stand for the proposed hard-thresholded covariance based methods. sh stands for the optimal shrinkage-based method. With a slightly abuse of notation, c0 = ma, mi, sh refer to that ma, mi, and sh are used. The remaining rows present the box-and-whisker plots of the local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 was selected by the minimum strategy for the above combinations respectively. The red colored lines in the boxes are the medians. Note that when the distribution of the localization biases are degenerate, the upper and lower quartiles and medians of localization biases will be equal. Consequently, the box in the plot will reduce to a red colored line. The plots in the last four rows show that all the local peaks on the transverse slices are not close to the source location r1, implying that the source 1 has been masked on the neuronal activity map by source cancellations. LCMV Beamforming 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=500 2 4 6 8 10 12 14 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=1000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=2000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=2000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/25,J=3000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/25,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1/0.64,J=3000 2 4 6 8 10 12 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1/0.64,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=500 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=1000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=1000 12 14 16 18 20 22 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=2000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/25,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/25,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 1 SNR=1/0.64,J=3000 1 2 3 4 5 6 7 8 9 10 Slice Biases to source 2 SNR=1/0.64,J=3000 Figure 3: Scenario 2: Two sources located at CTF coordinates ( 5, 5, 6)T cm and ( 6, 2, 5)T cm respectively. The first four rows display the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 respectively. The remaining rows present the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 is selected by the minimum strategy for the above combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases have the same value, the box in the plot will reduce to a red colored line. The plots in the last four rows show that all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations. Zhang and Liu 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=8.16,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=8.16,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=6.25,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=6.25,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=4,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=4,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=8.16,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=8.16,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=6.25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=6.25,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=4,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=4,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=8.16,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=8.16,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=6.25,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=6.25,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=4,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=4,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=8.16,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=8.16,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=6.25,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=6.25,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=4,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=4,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=8.16,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=8.16,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=6.25,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=6.25,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=4,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=4,J=500 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=8.16,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=8.16,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=6.25,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=6.25,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=4,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=4,J=1000 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=8.16,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=8.16,J=2000 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=6.25,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=6.25,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=4,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=4,J=2000 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=8.16,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=8.16,J=3000 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=6.25,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=6.25,J=3000 468 10 12 14 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=4,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=4,J=3000 Figure 4: Scenario 3: Two sources located at CTF coordinates (3, 1, 4)T cm and ( 5, 2, 6)T cm respectively. The first six rows show the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 102 sensors, SNR= 1/0.352, 1/0.42, 1/0.52, and the sample rates J = 500, 1000, 2000, 3000 respectively. The last six rows give the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 is selected by the minimum strategy for these combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. The last six rows of the plots show all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations. 2136 LCMV Beamforming 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.7313,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.7313,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.6437,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.6437,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=2.0408,J=500 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=2.0408,J=500 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.7313,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.7313,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.6437,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.6437,J=1000 0.92 0.94 0.96 0.981 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=2.0408,J=1000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=2.0408,J=1000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.7313,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.7313,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.6437,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.6437,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=2.0408,J=2000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=2.0408,J=2000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.7313,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.7313,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=1.6437,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=1.6437,J=3000 0 0.5 1 1.5 2 sh ma mi c0 LCMV: SNR=2.0408,J=3000 0 0.5 1 1.5 2 sh ma mi c0 Biases: SNR=2.0408,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.7313,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.7313,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.6437,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.6437,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=2.0408,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=2.0408,J=500 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.7313,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.7313,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.6437,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.6437,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=2.0408,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=2.0408,J=1000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.7313,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.7313,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.6437,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.6437,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=2.0408,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=2.0408,J=2000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.7313,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.7313,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=1.6437,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=1.6437,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 1 SNR=2.0408,J=3000 1 2 3 4 5 6 7 8 9 101112 Slice Biases to source 2 SNR=2.0408,J=3000 Figure 5: Scenario 4: Two sources located at CTF coordinates ( 5, 5, 6)T cm and ( 6, 2, 5)T cm respectively. The first six rows show the box-and-whisker plots of the index values and the localization biases against the tuning constant c0 = 0, 0.5, 1, 1.5, 2, ma, mi and sh for the combinations of n = 102 sensors, SNR= 1/0.352, 1/0.42, 1/0.52, and the sample rates J = 500, 1000, 2000, 3000 respectively. The last six rows give the box-and-whisker plots of the minimum local localization bias to the sources r1 and r2 against the transverse slice indices from 0 to 10 when c0 was selected by the minimum strategy for these combinations respectively. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. The last six rows of the plots show all the local peaks on the transverse slices are not close to the source location r1, implying the source 1 has been masked on the neuronal activity map by source cancellations. 2137 Zhang and Liu ma mi gma gmi adp sh Biases: SNR=1/25,J=500 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=500 ma mi gma gmi adp sh Biases: SNR=1/25,J=1000 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=1000 ma mi gma gmi adp sh Biases: SNR=1/25,J=2000 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=2000 ma mi gma gmi adp sh c0 Biases: SNR=1/25,J=3000 ma mi gma gmi adp sh c0 Biases: SNR=1/0.64,J=3000 ma mi gma gmi adp sh Biases: SNR=1/25,J=500 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=500 ma mi gma gmi adp sh Biases: SNR=1/25,J=1000 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=1000 ma mi gma gmi adp sh Biases: SNR=1/25,J=2000 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=2000 ma mi gma gmi adp sh Biases: SNR=1/25,J=3000 ma mi gma gmi adp sh Biases: SNR=1/0.64,J=3000 Figure 6: Performance comparison of the six different beamformers, namely ma, mi, gma, gmi, adp and sh in Scenarios 1 and 2. Here, ma and mi stand for the hardthresholded covariance based methods when the tuning constant c0 is chosen by use of the maximum strategy and the minimum strategy respectively; gma and gmi stand for the generalized thresholded covariance based methods when the tuning constant c0 is chosen by use of the maximum strategy and the minimum strategy respectively; adp and sh stand for the adaptive thresholding-based method and the optimal shrinkage-based method. The upper two rows of multiple box-whisker plots are for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 in Scenario 1, while the lower two rows are for the combinations of n = 91, SNR= 1/25, 1/0.64, and J = 500, 1000, 2000, 3000 in Scenario 2. Each panel shows the localization biases against the above six different beamformer methods. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. LCMV Beamforming ma mi gma gmi adp sh Biases: SNR=8.16,J=500 ma mi gma gmi adp sh Biases: SNR=6.25,J=500 ma mi gma gmi adp sh Biases: SNR=4,J=500 ma mi gma gmi adp sh Biases: SNR=8.16,J=1000 ma mi gma gmi adp sh Biases: SNR=6.25,J=1000 ma mi gma gmi adp sh Biases: SNR=4,J=1000 ma mi gma gmi adp sh Biases: SNR=8.16,J=2000 ma mi gma gmi adp sh Biases: SNR=6.25,J=2000 ma mi gma gmi adp sh Biases: SNR=4,J=2000 ma mi gma gmi adp sh Biases: SNR=8.16,J=3000 ma mi gma gmi adp sh Biases: SNR=6.25,J=3000 ma mi gma gmi adp sh Biases: SNR=4,J=3000 Figure 7: Performance comparison of the six different beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 3. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000. Each panel shows the localization biases against the six different beamformer methods, namely ma, mi, gma, gmi, adp and sh. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. Zhang and Liu ma mi gma gmi adp sh Biases: SNR=1.7313,J=500 ma mi gma gmi adp sh Biases: SNR=1.6437,J=500 ma mi gma gmi adp sh Biases: SNR=2.0408,J=500 ma mi gma gmi adp sh Biases: SNR=1.7313,J=1000 ma mi gma gmi adp sh Biases: SNR=1.6437,J=1000 ma mi gma gmi adp sh Biases: SNR=2.0408,J=1000 ma mi gma gmi adp sh Biases: SNR=1.7313,J=2000 ma mi gma gmi adp sh Biases: SNR=1.6437,J=2000 ma mi gma gmi adp sh Biases: SNR=2.0408,J=2000 ma mi gma gmi adp sh Biases: SNR=1.7313,J=3000 ma mi gma gmi adp sh Biases: SNR=1.6437,J=3000 ma mi gma gmi adp sh Biases: SNR=2.0408,J=3000 Figure 8: Performance comparison of the six different beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 4. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000 in Scenario 4. Each panel shows the localization biases against the six different beamformer methods, namely ma, mi, gma, gmi, adp and sh. The red colored lines in the boxes are the medians. When the upper and lower quartiles and medians of localization biases are equal, the box in the plot will reduce to a red colored line. LCMV Beamforming ma mi gmagmi adp sh Biases: SNR=8.16,J=500 ma mi gmagmi adp sh Biases: SNR=6.25,J=500 ma mi gmagmi adp sh Biases: SNR=4,J=500 ma mi gma gmi adp sh Biases: SNR=8.16,J=1000 ma mi gma gmi adp sh Biases: SNR=6.25,J=1000 ma mi gmagmi adp sh Biases: SNR=4,J=1000 ma mi gma gmi adp sh Biases: SNR=8.16,J=2000 ma mi gma gmi adp sh Biases: SNR=6.25,J=2000 ma mi gmagmi adp sh Biases: SNR=4,J=2000 ma mi gma gmi adp sh Biases: SNR=8.16,J=3000 ma mi gma gmi adp sh Biases: SNR=6.25,J=3000 ma mi gmagmi adp sh Biases: SNR=4,J=3000 Figure 9: Performance comparison of the six different beamformers, namely ma, mi, gma, gmi, adp and sh in Scenario 5. Multiple box-whisker plots of localization biases are displayed for the combinations of n = 102, SNR= 1/0.352, 1/0.42, 1/0.52, and J = 500, 1000, 2000, 3000 respectively. Each panel shows the localization biases against the six different beamformer methods, namely ma, mi, gma, gmi, adp and sh. Zhang and Liu Figure 10: Plots of the log-contrasts between the faces and scrambled faces on three orthogonal slices through the peak locations for each of five sessions, which are overlaid on the subject s MRI scan. The plots in the left-hand two columns and the right-hand two columns are derived from the procedures mi and adp respectively. Rows 1 and 2, 3 and 4, 5 and 6, and 7 and 8 are for sessions 1 5 respectively. The highlighted yellow colored areas revealed neuronal activity increases or decreases for the faces relative to the scrambled faces. The areas shown in the left-hand two columns are in or close to the IOG, STS, and PCu regions which are known to be related to the human face perception. LCMV Beamforming Figure 11: Plots of the log-contrasts between the faces and scrambled faces on 20 transverse slices for each of five sessions, which are overlaid on the subject s MRI scan. The plots in the left-hand column and the right-hand column are derived from the procedures mi and adp respectively. Rows 1 5 are for sessions 1 5 respectively. The highlighted yellow colored areas revealed neuronal activity increases for the faces relative to the scrambled faces. The areas highlighted in the first column are in or close to the OFA, IOG, STS, and PCu regions which are known to be related to the human face perception. Zhang and Liu time course time course 0 500 0.005 time course time course time course time course time course time course time course 0 500 0.025 time course time course time course time course 0 500 0.025 time course time course Figure 12: Plots of the estimated time-courses at the global peaks along x, y and z-axes respectively for each of five sessions. The solid curve and the dashed curve in each plot stand for the estimated time-courses under the faces and the scrambled faces respectively. The plots are ordered from the top left panel to the right panel to the bottom panel corresponding to sessions 1 5. LCMV Beamforming 0 50 100 150 200 250 300 350 1 number of sensors Maximum partial correlation among sources 0 50 100 150 200 250 300 350 1 number of sensors Maximum partial correlation among voxels Figure 13: Plots of d12(k) and dmax(k) against k = 1, 2, ..., 306 respectively, where k stands for k randomly chosen sensors from the 306 sensors in the face-perception data, two sources are located at CTF (-4, 3,8)cm and (-4,-5,5) cm respectively, and the dashed curve in each plot is for the function log(log(k)).