# Deterministic Symmetric Positive Semidefinite Matrix Completion

William E. Bishop¹˒², Byron M. Yu²˒³˒⁴
¹Machine Learning, ²Center for the Neural Basis of Cognition, ³Biomedical Engineering, ⁴Electrical and Computer Engineering, Carnegie Mellon University
{wbishop, byronyu}@cmu.edu

We consider the problem of recovering a symmetric, positive semidefinite (SPSD) matrix from a subset of its entries, possibly corrupted by noise. In contrast to previous matrix recovery work, we drop the assumption of a random sampling of entries in favor of a deterministic sampling of principal submatrices of the matrix. We develop a set of sufficient conditions for the recovery of a SPSD matrix from a set of its principal submatrices, present necessity results based on this set of conditions and develop an algorithm that can exactly recover a matrix when these conditions are met. The proposed algorithm generalizes naturally to the problem of noisy matrix recovery, and we provide a worst-case bound on reconstruction error for this scenario. Finally, we demonstrate the algorithm's utility on noiseless and noisy simulated datasets.

## 1 Introduction

There are multiple scenarios in which we might wish to reconstruct a symmetric positive semidefinite (SPSD) matrix from a sampling of its entries. In multidimensional scaling, for example, pairwise distance measurements are used to form a kernel matrix, and PCA is performed on this matrix to embed the data in a low-dimensional subspace. However, due to constraints, it may not be possible to measure pairwise distances for all variables, rendering the kernel matrix incomplete. In neuroscience, a population of neurons is often modeled as driven by a low-dimensional latent state [1], producing a low-rank covariance structure in the observed neural recordings. However, with current technology, it may only be possible to record from a large population of neurons in small, overlapping sets [2,3], leaving holes in the empirical covariance matrix. More generally, SPSD matrices in the form of Gram matrices play a key role in a broad range of machine learning problems, such as support vector machines [4], Gaussian processes [5] and nonlinear dimensionality reduction techniques [6], and the reconstruction of such matrices from a subset of their entries is of general interest.

In real-world scenarios, the constraints that make it difficult to observe a whole matrix often also constrain which particular entries of a matrix are observable. In such settings, existing matrix completion results, which assume matrix entries are revealed in an unstructured, random manner [7-14] or assume the ability to finely query individual entries of a matrix in an adaptive manner [15,16], might not be applicable. This motivates us to examine the problem of recovering a SPSD matrix from a given, deterministic set of its entries. In particular, we focus on reconstructing a SPSD matrix from a revealed set of its principal submatrices. Recall that a principal submatrix of a matrix is a submatrix obtained by symmetrically removing rows and columns of the original matrix. When individual entries of a matrix are formed by pairwise measurements between experimental variables, principal submatrices are a natural way to formally capture how entries are revealed.

Figure 1: (A) An example $A$ matrix with two principal submatrices, showing the correspondence between $A(\rho_l, \rho_l)$ and $C(\rho_l, :)$. (B) Mapping of $C_1$ and $C_2$ to $C$, illustrating the role of $\iota_l$, $\phi_l$ and $\eta_l$.
Sampling principal submatrices also allows for an intuitive method of matrix reconstruction. As shown in Fig. 1, any $n \times n$ rank $r$ SPSD matrix $A$ can be decomposed as $A = CC^T$ for some $C \in \mathbb{R}^{n \times r}$. Any principal submatrix of $A$ can be decomposed in the same way. Further, if $\rho_i$ is an ordered set indexing the $i$th principal submatrix of $A$, it must be that $A(\rho_i, \rho_i) = C(\rho_i, :)C(\rho_i, :)^T$.¹ This suggests we can decompose each $A(\rho_i, \rho_i)$ to learn the rows of $C$ and then reconstruct $A$ from the learned $C$, but there is one complication. Any matrix $C(\rho_i, :)$ such that $A(\rho_i, \rho_i) = C(\rho_i, :)C(\rho_i, :)^T$ is only defined up to an orthonormal transformation. The naïve algorithm just suggested has no way of ensuring the rows of $C$ learned from two different principal submatrices are consistent with respect to this degeneracy. Fortunately, the situation is easily remedied if the principal submatrices in question have some overlap, so that the $C(\rho_i, :)$ matrices have some rows that map to each other. Under appropriate conditions explored below, we can learn unique orthonormal transformations rendering these rows equal, allowing us to align the $C(\rho_i, :)$ matrices to learn a proper $C$. (A brief numerical sketch of this degeneracy and its resolution is given at the end of this section.)

¹Throughout this work we will use MATLAB indexing notation, so $C(\rho_i, :)$ is the submatrix of $C$ made up of the rows indexed by the ordered set $\rho_i$.

**Contributions** In this paper, we make the following contributions.

1. We prove sufficient conditions, which are also necessary in certain situations, for the exact recovery of a SPSD matrix from a given set of its principal submatrices.
2. We present a novel algorithm which exactly recovers a SPSD matrix when the sufficient conditions are met.
3. We generalize the algorithm to the case where the observed principal submatrices of a matrix are corrupted by noise, and we present a theorem guaranteeing a bound on reconstruction error.

### 1.1 Related Work

The low-rank matrix completion problem has received considerable attention since the work of Candès and Recht [17], who demonstrated that a simple convex problem could exactly recover many low-rank matrices with high probability. This work, as did much of what followed (e.g., [7-9]), made three key assumptions. First, entries of a matrix were assumed to be uncorrupted by noise and, second, revealed in a random, unstructured manner. Finally, requirements such as incoherence were also imposed to rule out matrices with most of their mass concentrated in only a few entries. These assumptions have been reexamined and relaxed in additional work. The case of noisy observed entries has been considered in [10-14]. Others have reduced or removed the requirements for incoherence by using iterative, adaptive sampling schemes [15,16]. Finally, recent work [18,19] has considered the case of matrix recovery when entries are selected in a deterministic manner.

Our work differs considerably from this earlier work. Our applications of interest allow us to assume much structure, i.e., that matrices are SPSD, which our algorithm exploits, and our sufficient conditions make no appeal to incoherence. Our work also differs from previous results for deterministic sampling schemes (e.g., [18,19]), which do not consider noise nor provide sufficient conditions for exact recovery, instead approaching the problem as one of matrix approximation. Previous work has also considered the problem of completing SPSD matrices of any rank [20] or of low rank [21,22]. Our work to identify conditions for a unique completion of a given rank can be viewed as a continuation of this work, where our sufficient and necessary conditions can be understood in a particularly intuitive manner due to our sampling scheme. Finally, the Nyström method [23] is a well known technique for approximating a SPSD matrix as low rank. It can also be applied to the matrix recovery problem, and in the noiseless case sufficient conditions for exact recovery are known [24]. However, the Nyström method requires sampling full columns and rows of the original matrix, a sampling scheme which may not be possible in many of our applications of interest.
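As promised above, the following minimal sketch (a hypothetical numpy example of ours, not taken from the paper) illustrates the degeneracy: a factor of a principal submatrix obtained from its eigendecomposition reproduces that submatrix exactly, but matches the corresponding rows of $C$ only after an orthonormal transformation, which can be recovered because the rows overlap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 2
C = rng.standard_normal((n, r))
A = C @ C.T                                    # rank-r SPSD matrix, A = C C^T

rho = np.arange(0, 5)                          # indices of one principal submatrix
A_sub = A[np.ix_(rho, rho)]                    # A(rho, rho) = C(rho, :) C(rho, :)^T

# An alternative factor of the same submatrix, obtained from its eigendecomposition.
w, V = np.linalg.eigh(A_sub)                   # ascending eigenvalues
C_alt = V[:, -r:] * np.sqrt(w[-r:])
assert np.allclose(C_alt @ C_alt.T, A_sub)     # also a valid factor of A(rho, rho)

# C_alt matches C(rho, :) only after an orthonormal transformation W, found here by
# solving the Procrustes problem that reappears in Section 3.
U, _, Vt = np.linalg.svd(C_alt.T @ C[rho, :])
W = U @ Vt
assert np.allclose(C_alt @ W, C[rho, :])
```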
## 2 Preliminaries

### 2.1 Deterministic Sampling for SPSD Matrices

We denote the set of index pairs for the revealed entries of a matrix by $\Omega$. Formally, an index pair $(i, j)$ is in $\Omega$ if and only if we observe the corresponding entry of an $n \times n$ matrix, so that $\Omega \subseteq [n] \times [n]$.² In this work, we assume $\Omega$ indexes a set of principal submatrices of a matrix. Let $\Omega_l \subseteq \Omega$ indicate a subset of $\Omega$. If $\Omega_l$ indexes a principal submatrix of a matrix, it can be compactly described by the unique set of row (or equivalently column) indices it contains. Let $\rho\{\Omega_l\} = \{i \mid (i, j) \in \Omega_l\}$ be the set of row indices contained in $\Omega_l$. For compactness, let $\rho_l = \rho\{\Omega_l\}$. Finally, let $|\cdot|$ indicate cardinality. Then, for an $n \times n$ matrix $A$ of rank $r$ we make the following assumptions on $\Omega$.

(A1) $\rho\{\Omega\} = [n]$.

(A2) There exists a collection $\Omega_1, \ldots, \Omega_k$ of subsets of $\Omega$ such that $\Omega = \cup_{l=1}^{k} \Omega_l$, and for each $\Omega_l$, $(i, i) \in \Omega_l$ and $(j, j) \in \Omega_l$ if and only if $(i, j) \in \Omega_l$ and $(j, i) \in \Omega_l$.

(A3) There exists a collection $\Omega_1, \ldots, \Omega_k$ of subsets of $\Omega$ such that A2 holds and, if $k > 1$, there exists an ordering $\tau_1, \ldots, \tau_k$ such that for all $i \geq 2$, $|\rho_{\tau_i} \cap \cup_{j=1}^{i-1} \rho_{\tau_j}| \geq r$.

The first assumption ensures $\Omega$ indexes at least one entry for each row of $A$. Assumption A2 requires that $\Omega$ indexes a collection of principal submatrices of $A$, and A3 allows for the possible alignment of rows of $C$ (recall, $A = CC^T$) estimated from each principal submatrix.

²We use the notation $[n]$ to indicate the set $\{1, \ldots, n\}$.

### 2.2 Additional Notation

Denote the set of real, $n \times n$ SPSD matrices by $S^n_+$, and let $A \in S^n_+$ be the rank $r$ matrix to be recovered. For the noisy case, $\tilde{A}$ will indicate a perturbed version of $A$. We will use the notation $A_l$ to indicate the principal submatrix of a matrix $A$ indexed by $\Omega_l$. Denote the eigendecomposition of $A$ as $A = E \Lambda E^T$ for the diagonal matrix $\Lambda \in \mathbb{R}^{r \times r}$ containing the non-zero eigenvalues of $A$, $\lambda_1 \geq \cdots \geq \lambda_r$, along its diagonal and the matrix $E \in \mathbb{R}^{n \times r}$ containing the corresponding eigenvectors of $A$ in its columns. Let $n_l$ denote the size of $A_l$ and $r_l$ its rank. Because $A_l$ is a principal submatrix of $A$, it follows that $A_l \in S^{n_l}_+$. Denote the eigendecomposition of each $A_l$ as $A_l = E_l \Lambda_l E_l^T$ for the matrices $\Lambda_l \in \mathbb{R}^{r_l \times r_l}$ and $E_l \in \mathbb{R}^{n_l \times r_l}$. We add tildes to the appropriate symbols for the eigendecomposition of $\tilde{A}$ and its principal submatrices.

Finally, let $\iota_l = \rho_{\tau_l} \cap (\cup_{j=1, \ldots, l-1}\, \rho_{\tau_j})$ be the intersection of the indices for the $l$th principal submatrix with the indices of all of the principal submatrices ordered before it. Let $C_l$ be a matrix such that $C_l C_l^T = A_l$. If $A_l$ is a principal submatrix of $A$, there will exist some $C_l$ such that $C(\rho_l, :) = C_l$. For such a $C_l$, let $\phi_l$ be an index set that assigns the rows of the matrix $C(\iota_l, :)$ to their location in $C_l$, so that $C(\iota_l, :) = C_l(\phi_l, :)$, and let $\eta_l$ assign the rows of $C(\rho_l \setminus \iota_l, :)$ to their location in $C_l$, so that $C(\rho_l \setminus \iota_l, :) = C_l(\eta_l, :)$. The role of $\rho_l$, $\iota_l$, $\eta_l$ and $\phi_l$ is illustrated for the case of two principal submatrices with $\tau_1 = 1$, $\tau_2 = 2$ in Figure 1.
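To make the index-set bookkeeping concrete, the sketch below (our own hypothetical example, using 0-based numpy indexing in place of the paper's 1-based MATLAB-style notation) builds two overlapping principal submatrices of a small rank-$r$ matrix and constructs $\iota_2$, $\phi_2$ and $\eta_2$.

```python
import numpy as np

n, r = 6, 2
rng = np.random.default_rng(0)
C = rng.standard_normal((n, r))
A = C @ C.T                                               # rank-r SPSD matrix, A = C C^T

rho = [np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5])]    # rho_{tau_1}, rho_{tau_2}
A_blocks = [A[np.ix_(p, p)] for p in rho]                 # observed principal submatrices A_l

# iota_2: indices of the 2nd submatrix already covered by earlier submatrices.
iota2 = np.intersect1d(rho[1], rho[0])                            # -> [2, 3]
phi2 = np.array([np.where(rho[1] == i)[0][0] for i in iota2])     # rows of C_2 matching C(iota_2, :)
eta2 = np.array([np.where(rho[1] == i)[0][0]
                 for i in np.setdiff1d(rho[1], iota2)])           # rows of C_2 for the new indices

C2 = C[rho[1], :]                                         # one valid factor of A_2, A_2 = C_2 C_2^T
assert np.allclose(C[iota2, :], C2[phi2, :])              # C(iota_2, :) = C_2(phi_2, :)
assert np.allclose(C[np.setdiff1d(rho[1], iota2), :], C2[eta2, :])
```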
**Algorithm 1** SPSD Matrix Recovery $(r, \{\tilde{E}_l, \tilde{\Lambda}_l, \tau_l, \rho_l, \phi_l, \iota_l, \eta_l\}_{l=1}^{k})$

Initialize $\hat{C}$ as an $n \times r$ matrix.

1. $\hat{C}(\rho_{\tau_1}, :) \leftarrow \tilde{E}_{\tau_1}(:, 1{:}r)\, \tilde{\Lambda}_{\tau_1}^{1/2}(1{:}r, 1{:}r)$
2. For $l \in \{2, \ldots, k\}$
   (a) $\hat{C}_l \leftarrow \tilde{E}_{\tau_l}(:, 1{:}r)\, \tilde{\Lambda}_{\tau_l}^{1/2}(1{:}r, 1{:}r)$
   (b) $\hat{W}_l \leftarrow \arg\min_{W : WW^T = I} \|\hat{C}(\iota_l, :) - \hat{C}_l(\phi_l, :)W\|_F^2$
   (c) $\hat{C}(\rho_{\tau_l} \setminus \iota_l, :) \leftarrow \hat{C}_l(\eta_l, :)\hat{W}_l$
3. Return $\hat{A} = \hat{C}\hat{C}^T$

## 3 The Algorithm

Before establishing a set of sufficient conditions for exact matrix completion, we present our algorithm. Except for minor notational differences, the algorithms for the noiseless and noisy matrix recovery scenarios are identical, and for brevity we present the algorithm for the noisy scenario. Let $\Omega$ sample the observed entries of $\tilde{A}$ so that A1 through A3 hold. Assume each perturbed principal submatrix $\tilde{A}_l$ indexed by $\Omega$ is SPSD and of rank $r$ or greater. These assumptions on each $\tilde{A}_l$ will be further explored in Section 5. Decompose each $\tilde{A}_l$ as $\tilde{A}_l = \tilde{E}_l \tilde{\Lambda}_l \tilde{E}_l^T$, and form a rank $r$ matrix $\hat{C}_l$ as $\hat{C}_l = \tilde{E}_l(:, 1{:}r)\, \tilde{\Lambda}_l^{1/2}(1{:}r, 1{:}r)$. The rows of the $\hat{C}_l$ matrices contain estimates for the rows of $C$ such that $A = CC^T$, though rows estimated from different principal submatrices may be expressed with respect to different orthonormal transformations.

Without loss of generality, assume the principal submatrices are labeled so that $\tau_1 = 1, \ldots, \tau_k = k$. Our algorithm begins to construct $\hat{C}$ by estimating $\hat{C}(\rho_1, :) = \hat{C}_1$. In this step, we also implicitly choose to express $\hat{C}$ with respect to the basis for $\hat{C}_1$. We then iteratively add rows to $\hat{C}$, for each $\hat{C}_l$ adding the rows $\hat{C}_l(\eta_l, :)$ to $\hat{C}$. To estimate the orthonormal transformation that aligns the rows of $\hat{C}_l$ with the rows of $\hat{C}$ estimated in previous iterations, we solve the following optimization problem

$$\hat{W}_l = \arg\min_{W : WW^T = I} \|\hat{C}(\iota_l, :) - \hat{C}_l(\phi_l, :)W\|_F^2. \qquad (1)$$

In words, equation (1) estimates $\hat{W}_l$ so that the rows of $\hat{C}_l$ which overlap with the previously estimated rows of $\hat{C}$ match as closely as possible. In the noiseless case, (1) is equivalent to finding $\hat{W}_l = W$ such that $\hat{C}(\iota_l, :) - \hat{C}_l(\phi_l, :)W = 0$. Equation (1) is known as the Procrustes problem and is non-convex, but its solution can be found in closed form and sufficient conditions for its unique solution are known [25]. After learning $\hat{W}_l$ for each $\hat{C}_l$, we build up the estimate for $\hat{C}$ by setting $\hat{C}(\rho_l \setminus \iota_l, :) = \hat{C}_l(\eta_l, :)\hat{W}_l$. This step adds the rows of $\hat{C}_l$ that do not overlap with those already added to $\hat{C}$ to the growing estimate of $\hat{C}$. If we process principal submatrices in the order specified by A3, this algorithm will generate a complete estimate for $\hat{C}$. The full matrix $\hat{A}$ can then be estimated as $\hat{A} = \hat{C}\hat{C}^T$. The pseudocode for this algorithm is given in Algorithm 1.
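The following is a minimal numpy sketch of Algorithm 1 (function and variable names are ours; the blocks are assumed to be supplied in an order satisfying A3, and the Procrustes step uses the standard closed-form SVD solution [25]).

```python
import numpy as np

def spsd_complete(blocks, rho, r, n):
    """Sketch of Algorithm 1: recover an SPSD matrix from principal submatrices.

    blocks[l] is the (possibly noisy) principal submatrix indexed by rho[l];
    rho is assumed to already be ordered so that A3 holds (overlap of at least r).
    """
    C_hat = np.zeros((n, r))
    seen = np.zeros(n, dtype=bool)
    for l, (Al, p) in enumerate(zip(blocks, rho)):
        # Step 1 / 2(a): rank-r factor of the block from its top eigenpairs.
        w, V = np.linalg.eigh(Al)                     # ascending eigenvalues
        w, V = w[::-1][:r], V[:, ::-1][:, :r]
        Cl = V * np.sqrt(np.clip(w, 0, None))         # C_l = E_l(:,1:r) Lambda_l^{1/2}
        if l == 0:
            C_hat[p] = Cl                             # fixes the overall basis
        else:
            overlap = seen[p]                         # rows of C_l already estimated (iota_l)
            # Step 2(b): Procrustes alignment, solved in closed form via an SVD.
            M = Cl[overlap].T @ C_hat[p[overlap]]
            U, _, Vt = np.linalg.svd(M)
            W = U @ Vt
            # Step 2(c): add only the rows not previously estimated (eta_l).
            C_hat[p[~overlap]] = Cl[~overlap] @ W
        seen[p] = True
    return C_hat @ C_hat.T                            # A_hat = C_hat C_hat^T
```

For example, with `A_blocks` and `rho` as in the sketch at the end of Section 2, `spsd_complete(A_blocks, rho, r=2, n=6)` should return $A$ up to numerical precision in the noiseless case.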
## 4 The Noiseless Case

We begin this section by stating one additional assumption on $A$.

(A4) There exists a collection $\Omega_1, \ldots, \Omega_k$ of subsets of $\Omega$ such that A2 holds and, if $k > 1$, there exists an ordering $\tau_1, \ldots, \tau_k$ such that the rank of $A(\iota_l, \iota_l)$ is equal to $r$ for each $l \in \{2, \ldots, k\}$.

In Theorem 2 we show that A1-A4 are sufficient to guarantee the exact recovery of $A$. Conditions A1-A4 can also be necessary for the unique recovery of $A$ by any method, as we show next in Theorem 1. Theorem 1 may at first glance appear quite simple, but it is a restatement of Lemma 6 in the appendix, from which more general necessity results can be derived. Specifically, Corollary 7 in the appendix can be used to establish that the above conditions are necessary to recover $A$ from a set of its principal submatrices which can be aligned in an overlapping sequence (e.g., submatrices running down the diagonal of $A$), which might be encountered when constructing a covariance matrix from sequentially sampled subgroups of variables. Corollary 8 establishes a similar result when there exists a set of principal submatrices which have no overlap among themselves but all overlap with one other submatrix not in the set, and Corollary 9 establishes that it is sufficient to find just one principal submatrix that obeys certain conditions with respect to the rest of the sampled entries of the matrix to certify the impossibility of matrix completion. This last corollary in fact applies even when the rest of the sampled entries do not fall into a union of principal submatrices of the matrix.

**Theorem 1.** Let $\Omega \subseteq [n] \times [n]$ index $A$ so that A2 holds for some $\Omega_1 \subseteq \Omega$ and $\Omega_2 \subseteq \Omega$. Then A1, A3 and A4 must hold with respect to $\Omega_1$ and $\Omega_2$ for $A$ to be recoverable by any method.

The proof can be found in the appendix. Here we briefly provide the intuition. Key to understanding the proof is recognizing that recovering $A$ from the set of entries indexed by $\Omega$ is equivalent to learning a matrix $C$ from the same set of entries such that $A = CC^T$. If A1 is not met, a complete row and the corresponding column of $A$ is not sampled, and there is nothing to constrain the estimate for the corresponding row of $C$. If A3 and A4 are not met, we can construct a $C$ such that all of the entries of the matrices $A$ and $CC^T$ indexed by $\Omega$ are identical yet $A \neq CC^T$.

We now show that our algorithm can recover $A$ as soon as the above conditions are met, establishing their sufficiency.

**Theorem 2.** Algorithm 1 will exactly recover $A$ from a set of its principal submatrices indexed by $\Omega_1, \ldots, \Omega_k$ which meets conditions A1 through A4.

The proof, which is provided in the appendix, shows that in the noiseless case, for each principal submatrix $A_l$ of $A$, step 2a of Algorithm 1 will learn an exact $\hat{C}_l$ such that $A_l = \hat{C}_l\hat{C}_l^T$. Further, when assumptions A3 and A4 are met, step 2b will correctly learn the orthonormal transformation to align each $\hat{C}_l$ to the previously added rows of $\hat{C}$. Therefore, progressive iterations of step 2 correctly learn more and more rows of a unified $\hat{C}$. As the algorithm progresses, all of the rows of $\hat{C}$ are learned and the entirety of $A$ can be recovered in step 3 of the algorithm.

It is instructive to ask what we have gained or lost by constraining ourselves to sampling principal submatrices. In particular, we can ask how many individual entries must be observed before we can recover a matrix. A SPSD matrix has at least $nr$ degrees of freedom, and we would not expect any matrix recovery method to succeed before at least this many entries of the original matrix are revealed. The next theorem establishes that our sampling scheme is not necessarily wasteful with respect to this bound.

**Theorem 3.** For any rank $r \geq 1$ matrix $A \in S^n_+$ there exists an $\Omega$ such that A1-A3 hold and $|\Omega| \leq n(2r + 1)$.

Of course, this work is motivated by real-world scenarios where we are not at liberty to finely select the principal submatrices we sample, and in practice we may often have to settle for a set of principal submatrices which sample more of the matrix. However, it is reassuring to know that our sampling scheme does not necessarily require a wasteful number of samples.

We note that assumptions A1 through A4 have an important benefit with respect to a requirement of incoherence. Incoherence is an assumption about the entire row and column space of a matrix and cannot be verified to hold with only the observed entries of a matrix. However, assumptions A1 through A4 can be verified to hold for a matrix of known rank using its observed entries. Thus, it is possible to verify that these assumptions hold for a given $\Omega$ and $A$ and provide a certificate guaranteeing exact recovery before matrix completion is attempted.
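As an illustration of this point, the sketch below (our own helper, not part of the paper) checks A1, A3 and A4 directly from the observed blocks for a given ordering and a known rank $r$; A2 is taken as given because the blocks are principal submatrices by construction, and a full certificate would in general also have to search over orderings.

```python
import numpy as np

def certify_recovery(rho, blocks, r, n, tol=1e-10):
    """Sketch: check A1, A3 and A4 for principal submatrices blocks[l] indexed by
    rho[l], treating the given order as the ordering tau_1, ..., tau_k."""
    if set(np.concatenate(rho)) != set(range(n)):            # A1: every row index covered
        return False
    covered = set(rho[0].tolist())
    for p, Al in zip(rho[1:], blocks[1:]):
        iota = np.array(sorted(covered & set(p.tolist())))   # overlap with earlier blocks
        if len(iota) < r:                                     # A3: overlap of at least r indices
            return False
        pos = np.array([np.where(p == i)[0][0] for i in iota])
        overlap_block = Al[np.ix_(pos, pos)]                  # A(iota_l, iota_l), observed entries
        if np.linalg.matrix_rank(overlap_block, tol=tol) < r: # A4: overlap block has rank r
            return False
        covered |= set(p.tolist())
    return True
```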
## 5 The Noisy Case

We analyze the behavior of Algorithm 1 in the presence of noise. For simplicity, we assume each observed, noise-corrupted principal submatrix is SPSD, so that the eigendecompositions in steps 1 and 2a of the algorithm are well defined. In the noiseless case, to guarantee the uniqueness of $\hat{A}$, A4 required each $A(\iota_l, \iota_l)$ to be of rank $r$. In the noisy case, we place a similar requirement on $\tilde{A}(\iota_l, \iota_l)$, where we recognize that the rank of each $\tilde{A}(\iota_l, \iota_l)$ may be larger than $r$ due to noise.

(A5) There exists a collection $\Omega_1, \ldots, \Omega_k$ of subsets of $\Omega$ such that A2 holds and, if $k > 1$, there exists an ordering $\tau_1, \ldots, \tau_k$ such that the rank of $\tilde{A}(\iota_l, \iota_l)$ is greater than or equal to $r$ for each $l \in \{2, \ldots, k\}$.

(A6) There exists a collection $\Omega_1, \ldots, \Omega_k$ of subsets of $\Omega$ such that A2 holds and $\tilde{A}_l \in S^{n_l}_+$ for each $l \in \{1, \ldots, k\}$.

In practice, any $\tilde{A}_l$ which is not SPSD can be decomposed into the sum of a symmetric and an antisymmetric matrix. The negative eigenvalues of the symmetric matrix can then be set to zero, rendering a SPSD matrix. As long as this resulting matrix meets the rank requirement in A5, it can be used in place of $\tilde{A}_l$. Our algorithm can then be used without modification to estimate $\hat{A}$.
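A minimal sketch of this preprocessing step, assuming numpy and a noisy square block `M` (the helper name is ours):

```python
import numpy as np

def project_to_spsd(M):
    """Sketch of the fix described above: symmetrize a noisy block and clip its
    negative eigenvalues so that the result is SPSD."""
    S = (M + M.T) / 2.0                 # drop the antisymmetric part
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)           # set negative eigenvalues to zero
    return (V * w) @ V.T
```

The resulting matrix can then stand in for $\tilde{A}_l$, provided its overlap block still has rank at least $r$ as required by A5.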
**Theorem 4.** Let $\Omega$ index an $n \times n$ matrix $\tilde{A}$ which is a perturbed version of the rank $r$ matrix $A$ such that A1-A6 simultaneously hold for a collection of principal submatrices indexed by $\Omega_1, \ldots, \Omega_k$. Let $b \geq \max_{l \in [k]} \|C_l\|_F$ for some $C_l \in \mathbb{R}^{n_l \times r}$ such that $A_l = C_l C_l^T$, let $\zeta \geq \lambda_{l,1}$, and let $\delta \leq \min\{\min_{i \in [r-1]} |\lambda_{l,i} - \lambda_{l,i+1}|,\ \lambda_{l,r}\}$ for all $l$. Assume $\|\tilde{A}_l - A_l\|_F \leq \epsilon$ for all $l$ for some $\epsilon < \min\{b^2/r, \delta/2, 1\}$. Then, if in step 2 of Algorithm 1 $\mathrm{rank}\{\hat{C}_l(\phi_l, :)^T \hat{C}(\iota_l, :)\} = r$ for all $l \geq 2$, Algorithm 1 will estimate an $\hat{A}$ from the set of principal submatrices of $\tilde{A}$ indexed by $\Omega$ such that
$$\|A - \hat{A}\|_F \leq 2 G^{k-1} L \|C\|_F \sqrt{r}\,\epsilon + G^{2k-2} L^2 r \epsilon,$$
where $C \in \mathbb{R}^{n \times r}$ is some matrix such that $A = CC^T$, $G = 4 + 12/v$ for $v \leq \lambda_r(A(\iota_l, \iota_l))/b^2$ for all $l$, and $L = \sqrt{2\zeta^{1/2}/\delta^{3/2}}$.

The proof is left to the appendix and is accomplished in two parts. In the first part, we guarantee that the ordered eigenvalues and eigenvectors of each $\tilde{A}_l$, which are the basis for estimating each $\hat{C}_l$, will not be too far from those of the corresponding $A_l$. In the second part, we bound the amount of additional error that can be introduced by learning imperfect $\hat{W}$ matrices, which result in slight misalignments as each $\hat{C}_l$ matrix is incorporated into the final estimate for the complete $\hat{C}$. This second part relies on a general perturbation bound for the Procrustes problem, derived as Lemma 16 in the appendix.

Our error bound is non-probabilistic and applies in the presence of adversarial noise. While we know of no existing results for the recovery of matrices from deterministic samplings of noise-corrupted entries, we can compare our work to bounds obtained for various results applicable to random sampling schemes (e.g., [10-13]). These results require either incoherence [10,11] or boundedness [13] of the entries of the matrix to be recovered, or assume the sampling scheme obeys the restricted isometry property [12]. Error is measured with various norms, but in all cases shows a linear dependence on the size of the original perturbation. For this initial analysis, our bound establishes that reconstruction error consistently goes to 0 with perturbation size, and we conjecture that with a refinement of our proof technique we can prove a linear dependence on $\epsilon$. We provide initial evidence for this conjecture in the results below.

## 6 Simulations

We demonstrate our algorithm's performance on simulated data, starting with the noiseless setting in Fig. 2. Fig. 2A shows three sampling schemes, referred to as masks, that meet assumptions A1 through A3 for a randomly generated $40 \times 40$ rank 4 matrix. In all of the noiseless simulations, we simulate a rank $r$ matrix $A \in S^n_+$ by first randomly generating a $C \in \mathbb{R}^{n \times r}$ with entries individually drawn from a $N(0, 1)$ distribution and forming $A$ as $A = CC^T$. The block diagonal mask is formed from $5 \times 5$ principal submatrices running down the diagonal, each principal submatrix overlapping the one to its upper left. Such a mask might be encountered in practice if we obtain pairwise measurements from small sets of variables sequentially. The $l$th principal submatrix of the full columns mask is formed by sampling all pairs of entries $(i, j)$ indexed by $i, j \in \{1, 2, 3, 4, l+4\}$ and might be encountered when obtaining pairwise measurements between sets of variables, where some small number of variables is present in all sets. The random mask is formed from principal submatrices randomly generated to conform to assumptions A1 through A3 and demonstrates that masks with non-obvious structure in the underlying principal submatrices can conform to assumptions A1 through A3. Algorithm 1 correctly recovers the true matrix from all three masks.

Figure 2: Noiseless simulation results. (A) Example masks (random, block diagonal, and full columns) for successful completion of a rank 4 matrix. (B) Completion success as rank is varied for masks with minimal overlap ($\min_l |\iota_l|$) of 10. (C) Completion success for rank 1-55 matrices with block diagonal masks with minimal overlap ranging between 0-54.
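The noiseless simulation setup described above can be sketched as follows (helper names are ours; the exact overlap of the $5 \times 5$ blocks in Fig. 2A is not stated in the text, so we use an overlap of 4, the minimum allowed by A3 for a rank 4 matrix).

```python
import numpy as np

def random_spsd(n, r, rng):
    """Rank-r SPSD test matrix as in the noiseless simulations: A = C C^T, C_ij ~ N(0, 1)."""
    C = rng.standard_normal((n, r))
    return C @ C.T

def block_diagonal_rho(n, block, overlap):
    """Index sets for a block diagonal mask: block-sized principal submatrices running
    down the diagonal, each overlapping the previous one in `overlap` indices."""
    rho, start = [], 0
    while start < n:
        stop = min(start + block, n)
        rho.append(np.arange(start, stop))
        if stop == n:
            break
        start = stop - overlap
    return rho

rng = np.random.default_rng(0)
A = random_spsd(40, 4, rng)
rho = block_diagonal_rho(40, block=5, overlap=4)   # overlap >= r is required by A3
blocks = [A[np.ix_(p, p)] for p in rho]
```

Passing `blocks` and `rho` to the `spsd_complete` sketch given after Algorithm 1 should reproduce the exact recovery reported for the block diagonal mask.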
In panel Fig. 2B, we modify these three types of masks so that $\min_l |\iota_l|$, the minimal overlap of a principal submatrix with those ordered before it, is 10 for each, and we attempt to reconstruct random matrices of size $55 \times 55$ and increasing rank. Corollaries 7-9 in the appendix, which can be derived from Theorem 1 above, can be applied to these scenarios to establish the necessity that $\min_l |\iota_l|$ be at least $r$ for a rank $r$ matrix. As predicted, for all masks recovery is successful for all matrices of rank 10 or less and unsuccessful for matrices of greater rank. In Fig. 2C, we show this is not unique to masks with minimal overlap of 10. Here we generate block diagonal masks with minimal overlap between the principal submatrices varying between 0 and 54. For each overlap value, we then attempt to recover matrices of rank 1 through $o + 1$, where $o$ is the minimal overlap value. To guard against false positives, we randomly generated 10 matrices of a specified rank for each mask and only indicated success in black if matrix completion was successful in all cases. As predicted by theory, matrix completion failed exactly when the rank of the underlying matrix exceeded the minimal overlap value of the mask. Identical results were obtained for the full column and random masks.

We provide evidence in Fig. 3 that the dependence on $\epsilon$ in Theorem 4 should be linear. We generate random $55 \times 55$ matrices of rank 1 through 10. Matrices were generated as in the noiseless scenario and normalized to have a Frobenius norm of 1. We use a block diagonal mask with $25 \times 25$ blocks and an overlap of 15, and we randomly generate SPSD noise, scaled so that $\|\tilde{A}_l - A_l\|_F = \epsilon$ for each principal submatrix. We sweep through a range of $\epsilon \in [\epsilon_{\min}, \epsilon_{\max}]$ for an $\epsilon_{\min} > 0$ and an $\epsilon_{\max}$ determined by the matrix with the tightest constraint on $\epsilon$ in Theorem 4. Fig. 3A shows that reconstruction error generally increases with $\epsilon$ and the rank of the matrix to be recovered. To better visualize the dependence on $\epsilon$, in Fig. 3B we plot $\|A - \hat{A}\|_F / \|A - \hat{A}\|_{F,\epsilon_{\min}}$, where $\|A - \hat{A}\|_{F,\epsilon_{\min}}$ indicates the reconstruction error obtained with $\epsilon = \epsilon_{\min}$. All of the lines coincide, suggesting a linear dependence on $\epsilon$.

Figure 3: Noisy simulation results. (A) Reconstruction error with increasing amounts of noise applied to the original matrix. (B) Traces in panel (A), each divided by its value at $\epsilon = \epsilon_{\min}$.
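For concreteness, one way to generate noise of the kind described above (the paper does not specify its construction, so this particular recipe is an assumption of ours) is to draw a random SPSD perturbation and rescale it to Frobenius norm $\epsilon$; adding it to an SPSD block also keeps the observed block SPSD, as required by A6.

```python
import numpy as np

def add_scaled_spsd_noise(Al, eps, rng):
    """Sketch: perturb a principal submatrix with a random SPSD matrix rescaled so
    that the perturbation has Frobenius norm eps."""
    nl = Al.shape[0]
    N = rng.standard_normal((nl, nl))
    N = N @ N.T                                   # random SPSD perturbation
    return Al + eps * N / np.linalg.norm(N, 'fro')
```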
## 7 Discussion

In this work we present an algorithm for the recovery of a SPSD matrix from a deterministic sampling of its principal submatrices. We establish sufficient conditions for our algorithm to exactly recover a SPSD matrix and present a set of necessity results demonstrating that our stated conditions can be quite useful for determining when matrix recovery is possible by any method. We also show that our algorithm recovers matrices obscured by noise with increasing fidelity as the magnitude of noise goes to zero.

Our algorithm incorporates no tuning parameters and can be computationally light, as the majority of computations concern potentially small principal submatrices of the original matrix. Implementations of the algorithm which estimate each $\hat{C}_l$ in parallel are also easy to construct. Additionally, our results can be generalized when the principal submatrices our method uses for reconstruction are themselves not fully observed. In this case, existing matrix recovery techniques can be used to estimate each complete underlying principal submatrix with some bounded error. Our algorithm can then reconstruct the full matrix from these estimated principal submatrices.

An open question is the computational complexity of finding a set of principal submatrices which satisfy conditions A1 through A4. However, in many practical situations there is an obvious set of principal submatrices and ordering which satisfy these conditions. For example, in the neuroscience application described in the introduction, a set of recording probes are independently movable and each probe records from a given number of neurons in the brain. Each configuration of the probes corresponds to a block of simultaneously recorded neurons, and by moving the probes one at a time, blocks with overlapping variables can be constructed. When learning a low-rank covariance structure for this data, the overlapping blocks of variables naturally define observed blocks of a low-rank covariance matrix to use in Algorithm 1.

## Acknowledgements

This work was supported by an NDSEG fellowship, NIH grant T90 DA022762, NIH grant R90 DA023426-06 and by the Craig H. Nielsen Foundation. We thank Martin Azizyan, Geoff Gordon, Akshay Krishnamurthy and Aarti Singh for their helpful discussions and Rob Kass for his guidance.

## References

[1] John P. Cunningham and Byron M. Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500–1509, 2014.

[2] Srini Turaga, Lars Buesing, Adam M. Packer, Henry Dalgleish, Noah Pettit, Michael Häusser, and Jakob Macke. Inferring neural population dynamics from multiple partial recordings of the same neural circuit. In Advances in Neural Information Processing Systems, pages 539–547, 2013.

[3] Suraj Keshri, Eftychios Pnevmatikakis, Ari Pakman, Ben Shababo, and Liam Paninski. A shotgun sampling solution for the common input problem in neural connectivity inference. arXiv preprint arXiv:1309.3724, 2013.

[4] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

[5] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, Cambridge, MA, 2006.

[6] John A. Lee and Michel Verleysen. Nonlinear Dimensionality Reduction. Springer, 2007.

[7] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, May 2010.

[8] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.

[9] Benjamin Recht. A simpler approach to matrix completion. The Journal of Machine Learning Research, 12:3413–3430, 2011.

[10] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010.

[11] Emmanuel J. Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.

[12] Emmanuel J. Candès and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.

[13] Vladimir Koltchinskii, Karim Lounici, and Alexandre B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302–2329, 2011.

[14] Sahand Negahban and Martin J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 13:1665–1697, 2012.

[15] Akshay Krishnamurthy and Aarti Singh. Low-rank matrix and tensor completion via adaptive sampling. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 836–844, 2013.

[16] Jie Chen, Nannan Cao, Kian Hsiang Low, Ruofei Ouyang, Colin Keng-Yan Tan, and Patrick Jaillet. Parallel Gaussian process regression with low-rank covariance matrix approximations. arXiv preprint arXiv:1305.5826, 2013.

[17] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[18] Eyal Heiman, Gideon Schechtman, and Adi Shraibman. Deterministic algorithms for matrix completion. Random Structures & Algorithms, 2013.

[19] Troy Lee and Adi Shraibman. Matrix completion from any given set of observations. In Advances in Neural Information Processing Systems, pages 1781–1787, 2013.

[20] Monique Laurent. Matrix completion problems. Encyclopedia of Optimization, pages 1967–1975, 2009.

[21] Monique Laurent and Antonios Varvitsiotis. A new graph parameter related to bounded rank positive semidefinite matrix completions. Mathematical Programming, 145(1-2):291–325, 2014.

[22] Monique Laurent and Antonios Varvitsiotis. Positive semidefinite matrix completion, universal rigidity and the strong Arnold property. Linear Algebra and its Applications, 452:292–317, 2014.

[23] Christopher Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13. Citeseer, 2001.

[24] Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. Sampling techniques for the Nyström method. In International Conference on Artificial Intelligence and Statistics, pages 304–311, 2009.

[25] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.