Universal Matrix Completion

Srinadh Bhojanapalli (BSRINADH@UTEXAS.EDU), The University of Texas at Austin
Prateek Jain (PRAJAIN@MICROSOFT.COM), Microsoft Research, India

Abstract: The problem of low-rank matrix completion has recently generated a lot of interest, leading to several results that offer exact solutions to the problem. However, in order to do so, these methods make assumptions that can be quite restrictive in practice. More specifically, the methods assume that: a) the observed indices are sampled uniformly at random, and b) for every new matrix, the observed indices are sampled afresh. In this work, we address these issues by providing a universal recovery guarantee for matrix completion that works for a variety of sampling schemes. In particular, we show that if the set of sampled indices comes from the edges of a bipartite graph with large spectral gap (i.e., gap between the first and the second singular value), then the nuclear norm minimization based method exactly recovers all low-rank matrices that satisfy certain incoherence properties. Moreover, we also show that under certain stricter incoherence conditions, $O(nr^2)$ uniformly sampled entries are enough to recover any rank-$r$ $n \times n$ matrix, in contrast to the $O(nr \log n)$ sample complexity required by other matrix completion algorithms as well as by existing analyses of the nuclear norm method.

1. Introduction

In this paper, we study the problem of universal low-rank matrix completion. Low-rank matrix completion is an important problem with several applications in areas such as recommendation systems, sketching, and quantum tomography (Recht et al., 2010; Candès & Recht, 2009; Gross et al., 2010). The goal in matrix completion is to recover a rank-$r$ matrix, given a small number of entries of the matrix. That is, to find a matrix $M$ given the values $\{M_{ij} : (i, j) \in \Omega\}$, where $\Omega$ is the set of observed indices. Recently, several methods with provable guarantees have been proposed for solving the problem under the following two assumptions: a) $M$ is incoherent, and b) $\Omega$ is sampled uniformly with $|\Omega| \geq c\,nr \log n$. Moreover, $\Omega$ needs to be re-sampled for each matrix $M$ that is to be recovered, i.e., the same $\Omega$ cannot be re-used without worsening the guarantees significantly. While the first assumption can be shown to be necessary for any matrix-oblivious sampling, the second assumption is relatively restrictive and might not hold in several practical settings. The main goal of this work is to develop a general result that can handle other sampling schemes as well. Moreover, we aim to develop a universal method, where one fixed $\Omega$ would be enough to recover any low-rank matrix $M$. Such a universal recovery result is highly desirable in several signal processing applications, where the goal is to design one $\Omega$ that can recover any low-rank signal matrix $M$ by observing $M$ over $\Omega$ alone. To this end, we reduce the problem of recoverability using an index set $\Omega$ to the spectral gap (gap between the largest and the second largest singular values) of $G_\Omega$, where $G_\Omega$ is a bipartite graph whose biadjacency matrix $G \in \mathbb{R}^{n \times n}$ is given by: $G_{ij} = 1$ iff $(i, j) \in \Omega$ and $G_{ij} = 0$ otherwise.
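As a concrete illustration of this reduction, the following minimal sketch (ours, not from the paper; the function names and the uniform-sampling example are purely illustrative) builds the biadjacency matrix $G$ from an index set $\Omega$ and computes the relative spectral gap that our results are stated in terms of:

```python
import numpy as np

def biadjacency(omega, n1, n2):
    """Biadjacency matrix G of the sampling graph: G_ij = 1 iff (i, j) is in Omega."""
    G = np.zeros((n1, n2))
    rows, cols = zip(*omega)
    G[list(rows), list(cols)] = 1.0
    return G

def relative_spectral_gap(G):
    """Relative spectral gap 1 - sigma_2(G) / sigma_1(G)."""
    s = np.linalg.svd(G, compute_uv=False)
    return 1.0 - s[1] / s[0]

# Example: observe each entry of a 200 x 200 matrix independently with probability 0.2.
rng = np.random.default_rng(0)
mask = rng.random((200, 200)) < 0.2
omega = list(zip(*np.nonzero(mask)))
print(relative_spectral_gap(biadjacency(omega, 200, 200)))
```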
In particular, we show that if $G_\Omega$ has a large enough spectral gap and if the rank-$r$ matrix $M$ satisfies the standard incoherence property, then the best rank-$r$ approximation of $P_\Omega(M)$ (see (1)) itself is enough to get a reasonable approximation to $M$ (see Theorem 4.1 for details). Note that our approximation result is similar to Theorem 1.1 of (Keshavan et al., 2010), but our result holds for any $\Omega$ with large spectral gap, unlike (Keshavan et al., 2010), which requires uniform sampling. On the other hand, we require an explicit incoherence condition on the singular vectors of $M$, while the result of (Keshavan et al., 2010) only requires a bound on $M_{\max} = \max_{ij} M_{ij}$. The latter assumption is strictly weaker; the two assumptions coincide for PSD matrices.

Next, we show that, by assuming certain stronger incoherence properties, the number of samples required by the popular nuclear-norm minimization method (Candès & Recht, 2009) to recover back $M$ depends only on $n$, $r$, and the spectral gap of the $d$-regular bipartite graph $G_\Omega$. In particular, we require $d \geq c\,\sigma_2(G)\,r$ for a constant $c$ depending on the incoherence parameter, where $\sigma_2(G)$ is the second largest singular value of $G$. Hence, if $\sigma_2(G) = O(\sqrt{d})$, i.e., if $G_\Omega$ is an expander, then $|\Omega| = nd = O(nr^2)$ samples suffice for exact recovery. Our recovery result applies to any low-rank matrix $M$ that satisfies the stronger incoherence property, given that the fixed graph $G_\Omega$ has a large spectral gap. To the best of our knowledge, this is the first universal guarantee for matrix completion. Furthermore, using recent results by (Feige & Ofek, 2005), we show that for the standard uniform sampling of $\Omega$, only $O(nr^2)$ samples suffice for exact recovery of a rank-$r$ matrix $M$ that satisfies a stronger incoherence condition (see A2 in Section 3).

Next, we discuss the stronger incoherence property that we require for our universal recovery guarantees. In particular, we show that the standard incoherence condition alone cannot provide universal recovery with any graph $G_\Omega$, and hence a stronger incoherence property is required. Finally, we empirically demonstrate our observation that, instead of the number of samples, the spectral gap of $G_\Omega$ is what really governs recoverability of the true matrix. In particular, we construct a family of graphs based on the stochastic block model and show that the probability of success grows with the spectral gap, irrespective of the number of samples.

Notation: We denote matrices by capital letters (e.g., $U$) and vectors by small letters (e.g., $u$). $U^T$ denotes the transpose of the matrix $U$. $U_{ij}$ denotes the $(i, j)$-th element of $U$. $U_i$ denotes the $i$-th column of $U$ and $U^i$ denotes the $i$-th row of $U$ (but in column format). $\|u\|$ denotes the $L_2$ norm of $u$, and $\|U\|$ denotes the spectral norm of $U$, i.e., $\|U\| = \max_{x: \|x\| \leq 1} \|Ux\|$. $\|U\|_F$ denotes the Frobenius norm and $\|U\|_\infty$ is the maximum absolute element of $U$. $C = A.B$ denotes the Hadamard product of $A$ and $B$, i.e., $C_{ij} = A_{ij} B_{ij}$. Similarly, $(u.v)_i = u_i v_i$. $\mathbf{1}$ denotes the all-ones vector. $\mathbf{1}_\perp$ denotes a unit vector that is perpendicular to $\mathbf{1}$ and is determined appropriately by the context.

Paper Organization: In the next section, we discuss related work. Then, in Section 3, we define the problem of matrix completion and the bipartite graph $G_\Omega$ that we use. We present our main results in Section 4 and discuss the additional incoherence assumption in Section 5. In Section 6, we present observations from our empirical study. Finally, in Section 7, we provide the proof of our exact recovery result.
2. Related Work

Matrix completion: In a seminal paper on matrix completion, (Candès & Recht, 2009) showed that any $n \times n$ incoherent matrix of rank $r$ can be recovered from $Cn^{1.2} r \log(n)$ uniformly random samples using nuclear norm minimization. Later, assuming the matrix to be strongly incoherent, (Candès & Tao, 2010) improved the sample complexity of the nuclear norm minimization method to $O(nr \log^6(n))$. Subsequently, (Recht, 2009; Gross, 2011) generalized this result to any incoherent matrix using matrix Bernstein inequalities and presented significantly simpler proofs. Concurrently, other algorithms were shown to recover incoherent matrices using $O(nr \log(n))$ (or worse) samples, such as SVD followed by descent on the Grassmannian manifold (Keshavan et al., 2010) and alternating minimization (Jain et al., 2012). We note that all the above mentioned results assume a rather restrictive sampling scheme, i.e., each entry is sampled uniformly at random, and they furthermore require a fresh set of samples for each new matrix. Moreover, the number of samples required is at least $O(nr \log n)$.

Other sampling schemes: Recently, there have been some results for different types of sampling schemes, such as power-law distributions (Meka et al., 2009), but here again universal results are not known and, furthermore, the proposed algorithms are not robust to noise. Another line of work has been to devise sampling schemes dependent on the data matrix (Chen et al., 2013; Király & Tomioka, 2012). Naturally, these schemes cannot be universal, as the sampling scheme itself is dependent on the data matrix. Furthermore, the practicality of such schemes is not clear a priori.

Universality: Universality is an important property in signal processing or sketching applications, as the goal there is to have one fixed sampling operator that performs well for all the given signals. While universality results are well known for several other sensing problems, such as sparse vector recovery (Candès & Tao, 2005) and one-bit compressive sensing (Gopi et al., 2013), similar results for low-rank matrix sensing are mostly restricted to RIP-type operators (Recht et al., 2010; Liu, 2011). Unfortunately, RIP-type operators are typically dense, requiring knowledge of all elements of the matrix to form observations, and also require large storage/computational complexity. On the other hand, sampling individual elements is a sparse operator and hence computationally efficient. Hence, for several signal processing applications, universal matrix completion results are critical.

In fact, several recent works have studied problems similar to that of universal matrix completion, for example (Király & Tomioka, 2012; Heiman et al., 2013) and (Lee & Shraibman, 2013). However, there are critical differences in our results/approaches that we now highlight. In (Király & Tomioka, 2012), the authors consider an algebraic approach to analyze sufficient conditions for matrix completion. While they propose interesting deterministic sufficient conditions, the algorithm analyzed in their paper requires solving an NP-hard problem. In contrast, we analyze the nuclear norm minimization method, which is known to have several efficient implementations. In (Heiman et al., 2013) and (Lee & Shraibman, 2013), the authors consider sampling based on expanders but only provide generalization error bounds rather than exact recovery guarantees. Moreover, the matrix recovered by their algorithms need not be low-rank.
3. Problem Definition & Assumptions

Let $M \in \mathbb{R}^{n_1 \times n_2}$ be a rank-$r$ matrix and let $n_1 \geq n_2$. Define $n = \max\{n_1, n_2\} = n_1$. Let $M = U \Sigma V^T$ be the SVD of $M$ and let $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ be the singular values of $M$. We observe a small number of entries of $M$, indexed by a set $\Omega \subseteq [n_1] \times [n_2]$. That is, we observe $M_{ij}, \forall (i, j) \in \Omega$. Define the sampling operator $P_\Omega: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^{n_1 \times n_2}$ as:
$$(P_\Omega(M))_{ij} = \begin{cases} M_{ij}, & \text{if } (i, j) \in \Omega, \\ 0, & \text{if } (i, j) \notin \Omega. \end{cases} \qquad (1)$$
Next, we define a bipartite graph associated with the sampling operator $P_\Omega$. That is, let $G_\Omega = (V, E)$ be a bipartite graph where $V = \{1, 2, \dots, n_1\} \cup \{1, 2, \dots, n_2\}$ and $(i, j) \in E$ iff $(i, j) \in \Omega$. Let $G \in \mathbb{R}^{n_1 \times n_2}$ be the biadjacency matrix of the bipartite graph $G_\Omega$, with $G_{ij} = 1$ iff $(i, j) \in \Omega$. Note that $P_\Omega(M) = M.G$, where $.$ denotes the Hadamard product.

Now, the goal in universal matrix completion is to design a set $\Omega$ and a recovery algorithm such that all rank-$r$ matrices $M$ can be recovered using only $P_\Omega(M)$. In the next section, we present two results for this problem. Our first result gives an approximate solution to the matrix completion problem and our second result gives exact recovery guarantees. For our results, we require $G_\Omega$, the graph associated with $\Omega$, to be a $d$-regular bipartite graph with large spectral gap. More concretely, we require the following two properties of the sampling graph $G_\Omega$:

Assumptions on $G_\Omega$/$\Omega$:
G1 The top left and right singular vectors of $G$ are the (normalized) all-ones vectors.
G2 $\sigma_1(G) = d$ and $\sigma_2(G) \leq C\sqrt{d}$.

Note that, as the graph is $d$-regular, $|\Omega| = nd$. The eigenvalues of the adjacency matrix of the bipartite graph $G_\Omega$ are $\{\pm\sigma_i(G)\}$, $i = 1, \dots, n$. We state all the definitions in terms of the singular values of $G$ instead of the eigenvalues of the adjacency matrix. The above two properties are satisfied by a class of expander graphs called Ramanujan graphs; in fact, Ramanujan graphs are defined by this spectral gap property:

Definition 3.1 (Ramanujan graph (Hoory et al., 2006)). Let $\sigma_1(G), \sigma_2(G), \dots, \sigma_n(G)$ be the singular values of $G$ in decreasing order. Then, a $d$-regular bipartite graph $G_\Omega$ is a Ramanujan graph if $\sigma_2(G) \leq 2\sqrt{d - 1}$.

Ramanujan graphs are well studied in the literature and there exist several randomized/deterministic methods to generate such graphs. We briefly discuss a couple of popular constructions in Section 4.2.

Incoherence assumptions: Now, we present the incoherence assumptions that we impose on $M$:

A1 $\|U^i\|^2 \leq \frac{\mu_0 r}{n_1}, \forall i$, and $\|V^j\|^2 \leq \frac{\mu_0 r}{n_2}, \forall j$.

A2 $\left\|\frac{n_1}{d} \sum_{k \in S} U^k (U^k)^T - I\right\| \leq \delta_d$, $\forall S \subseteq [n_1], |S| = d$, and $\left\|\frac{n_2}{d'} \sum_{k \in S} V^k (V^k)^T - I\right\| \leq \delta_d$, $\forall S \subseteq [n_2], |S| = d'$, where $d' = d\,n_2/n_1$.

Note that A1 is the standard incoherence assumption required by most of the existing matrix completion results. However, A2 is a stricter assumption than A1 and is similar to the stronger incoherence property introduced by (Candès & Tao, 2010). We discuss the necessity of such an assumption for universal matrix completion in Section 5.
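Properties G1 and G2 are easy to check numerically for a candidate sampling graph. The following sketch (ours, not from the paper; the permutation-model construction is merely one convenient way to generate a nearly $d$-regular bipartite graph) compares $\sigma_2(G)$ with the Ramanujan bound of Definition 3.1:

```python
import numpy as np

def random_regular_biadjacency(n, d, rng):
    """Approximately d-regular bipartite graph via the permutation model:
    superimpose d independent random perfect matchings. Parallel edges are
    collapsed by the 0/1 assignment, so a few degrees may fall slightly below d."""
    G = np.zeros((n, n))
    for _ in range(d):
        G[np.arange(n), rng.permutation(n)] = 1.0
    return G

rng = np.random.default_rng(0)
n, d = 500, 25
G = random_regular_biadjacency(n, d, rng)
s = np.linalg.svd(G, compute_uv=False)
print("sigma_1 (should be ~ d):", s[0])
print("sigma_2:", s[1], " Ramanujan bound 2*sqrt(d-1):", 2 * np.sqrt(d - 1))
```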
4. Main Results

We now present our main results for the matrix completion problem. We assume that $\Omega$ is generated using a bipartite $d$-regular expander and satisfies G1 and G2 (see Section 3). Our first result shows that, if $M$ satisfies A1, then the best rank-$r$ approximation of $P_\Omega(M)$ is close to $M$ and hence serves as a good approximation of $M$, which can also be used to initialize other methods like alternating least squares. Our second result shows that if $M$ satisfies both A1 and A2, then, using the nuclear-norm minimization based method, $P_\Omega(M)$ can be used to recover back $M$ exactly.

4.1. Matrix approximation

Theorem 4.1. Let $G_\Omega$ be a $d$-regular bipartite graph satisfying G1 and G2, and let $M$ be a rank-$r$ matrix that satisfies assumption A1. Then,
$$\left\|\frac{n}{d} P_k(P_\Omega(M)) - M\right\| \leq \frac{2C\mu_0 r}{\sqrt{d}} \|M\|, \quad \text{for any } k \leq r,$$
where $P_k(A)$ is the best rank-$k$ approximation of $A$ and can be obtained from the top-$k$ singular vectors of $A$.

Now, if $M$ is a PSD matrix, then the above result is exactly the same as Theorem 1.1 of (Keshavan et al., 2010). For non-PSD matrices, our result requires a bound on the norm of each row of the singular vectors of $M$, while the result of (Keshavan et al., 2010) only requires a bound on the largest element of $M$, which is similar to our requirement but strictly weaker. On the other hand, our result holds for all $M$ for a given $\Omega$, if $\Omega$'s associated graph $G_\Omega$ satisfies both G1 and G2. Moreover, if $\Omega$ is generated using an Erdős–Rényi graph, then, after a standard trimming step, the above theorem directly implies (for PSD matrices) Theorem 1.1 of (Keshavan et al., 2010). Finally, we would like to stress that our proof is significantly simpler and is able to exploit the fact that Erdős–Rényi graphs have good spectral gap in a fairly straightforward and intuitive manner. We now present a detailed proof of the above theorem.

Proof. Let $M = U \Sigma V^T$, $U, V \in \mathbb{R}^{n \times r}$. Note that
$$\left\|\frac{n}{d} P_\Omega(M) - M\right\| = \max_{x, y: \|x\| = 1, \|y\| = 1} y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x,$$
and, since $P_\Omega(M) = M.G$,
$$y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x = \sum_{i=1}^r \sigma_i \left[\frac{n}{d} (y.U_i)^T G (x.V_i) - (y^T U_i)(x^T V_i)\right].$$
Let $y.U_i = \alpha_i \frac{\mathbf{1}}{\sqrt{n}} + \beta_i \mathbf{1}_\perp^i$, so that $\alpha_i = \frac{\mathbf{1}^T (y.U_i)}{\sqrt{n}}$. By G1 and $d$-regularity, $\mathbf{1}^T G = d\,\mathbf{1}^T$, so the component of $y.U_i$ along $\mathbf{1}$ contributes exactly $(y^T U_i)(x^T V_i)$ and cancels the second term. Therefore,
$$y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x = \sum_{i=1}^r \sigma_i \frac{n}{d} \beta_i (\mathbf{1}_\perp^i)^T G (x.V_i) \overset{\zeta_1}{\leq} \frac{nC}{\sqrt{d}} \sum_{i=1}^r \sigma_i |\beta_i| \|x.V_i\| \overset{\zeta_2}{\leq} \frac{nC}{\sqrt{d}} \|M\| \sqrt{\sum_{i=1}^r \|y.U_i\|^2} \sqrt{\sum_{i=1}^r \|x.V_i\|^2},$$
where $\zeta_1$ follows from assumption G2 and $\zeta_2$ follows from $|\beta_i| \leq \|y.U_i\|$ and the Cauchy–Schwarz inequality. Now,
$$\sum_{i=1}^r \|y.U_i\|^2 = \sum_j y_j^2 \|U^j\|^2 \overset{\zeta_1}{\leq} \frac{\mu_0 r}{n} \sum_j y_j^2 \overset{\zeta_2}{=} \frac{\mu_0 r}{n},$$
where $\zeta_1$ follows from A1 and $\zeta_2$ follows from $\|y\| = 1$. Using a similar argument, $\sum_{i=1}^r \|x.V_i\|^2 \leq \frac{\mu_0 r}{n}$. The theorem now follows by combining the above inequalities. The proof of the second part (for $P_k$) is given in Appendix A.

4.2. Nuclear norm minimization

We now present our result for exact recovery of the matrix $M$ using $P_\Omega(M)$ alone. For recovery, we use the standard nuclear norm minimization algorithm, i.e., we obtain a matrix $X$ by solving the following convex optimization problem:
$$\min_X \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \qquad (7)$$
where $\|X\|_*$ denotes the nuclear norm of $X$; the nuclear norm of $X$ is equal to the sum of its singular values. As mentioned in Section 2, nuclear norm minimization is a popular technique for the low-rank matrix completion problem and has been shown to provably recover the true matrix, assuming that $\Omega$ is sampled uniformly at random and $|\Omega| \geq c\,nr \log n$ (Candès & Tao, 2010).
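For illustration, program (7) can be prototyped directly in an off-the-shelf convex solver; the paper's experiments instead use an ALM solver (see Section 6), and the small-scale sketch below is ours, with illustrative names and sizes:

```python
import numpy as np
import cvxpy as cp

def complete(P_omega_M, mask):
    """Program (7): min ||X||_* subject to P_Omega(X) = P_Omega(M).
    `mask` is the 0/1 biadjacency matrix G, and P_omega_M = M.G."""
    X = cp.Variable(P_omega_M.shape)
    constraints = [cp.multiply(mask, X) == P_omega_M]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value

# Small rank-2 sanity check with a uniformly sampled mask.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
mask = (rng.random((40, 40)) < 0.5).astype(float)
M_hat = complete(M * mask, mask)
print("relative error:", np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro"))
```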
Below, we provide a universal recovery result for the nuclear-norm minimization method, as long as the samples come from a graph $G_\Omega$ that satisfies G1 and G2.

Theorem 4.2. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a $d$-regular graph $G_\Omega$ that satisfies assumptions (G1) and (G2). Also, let $d \geq 36 C^2 \mu_0^2 r^2$, i.e., $|\Omega| = nd \geq 36 C^2 \mu_0^2 r^2 \max\{n_1, n_2\}$. Then $M$ is the unique optimum of problem (7).

Note that the above result requires only deterministic constraints on the sampling operator $P_\Omega$ and guarantees exact recovery for any matrix $M$ that satisfies A1 and A2. As mentioned earlier, A2 is a stronger assumption than A1. But, as we show in Section 5, universal recovery is not possible under assumption A1 alone.

We can use the above theorem to derive results for several interesting sampling schemes, such as random $d$-regular graphs. By Theorem 1 of (Friedman, 2003), the second singular value of a random $d$-regular graph is at most $2\sqrt{d - 1} + \epsilon$, for every $\epsilon > 0$, with high probability. Hence, a random $d$-regular graph, with high probability, obeys G1 and G2, which implies the following exact recovery result:

Corollary 4.3. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a random $d$-regular graph. Then $M$ is the unique optimum of program (7) when $d \geq 36 \cdot 4\, \mu_0^2 r^2$, with high probability.

Note that the standard completion results, such as (Candès & Recht, 2009) and (Keshavan et al., 2010), generate $\Omega$ using an Erdős–Rényi graph, which is slightly different from the random $d$-regular graph we considered above. However, (Feige & Ofek, 2005) showed that the second largest singular value of the Erdős–Rényi graph $G(n_1, n_2, p)$ is $O(\sqrt{d})$ when $p = \Omega(\log(n_1)/n_2)$. Interestingly, even when $p = c/n_2$, i.e., when $n_2\,p$ is a constant, trimming the graph (i.e., removing a few nodes with high degree) gives a graph $G$ s.t. $\sigma_2(G) = O(\sqrt{d})$. Hence, we can again apply Theorem 4.2 to obtain the following result:

Corollary 4.4. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a $G(n, p)$ graph after trimming. Then $M$ is the unique optimum of program (7) when $p \geq \frac{36 c\, \mu_0^2 r^2}{\min\{n_1, n_2\}}$, with high probability.

While the above two results exploit the fact that a random graph is almost a Ramanujan expander, and hence our general recovery result can be applied, the graph construction is still randomized. Interestingly, (Lubotzky et al., 1988; Margulis, 1988; Morgenstern, 1994) proposed explicit deterministic constructions of Ramanujan graphs when $d - 1$ is a prime power. Moreover, (Marcus et al., 2013) showed that bipartite Ramanujan graphs exist for all $n$ and $d$. However, an explicit construction for all $n$ and $d$ still remains an open problem.
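The trimming step behind Corollary 4.4 is easy to emulate numerically. The sketch below (ours, with an illustrative degree threshold) drops rows and columns whose degree far exceeds the average, in the spirit of (Feige & Ofek, 2005), and compares $\sigma_2$ before and after:

```python
import numpy as np

def trim(mask, c=2.0):
    """Zero out rows/columns whose degree exceeds c times the average degree
    (a sketch of the trimming step used in (Feige & Ofek, 2005))."""
    G = mask.copy()
    avg = G.sum() / G.shape[0]
    G[G.sum(axis=1) > c * avg, :] = 0.0
    G[:, G.sum(axis=0) > c * avg] = 0.0
    return G

rng = np.random.default_rng(0)
n, d = 2000, 8                                    # p = d/n, so n*p is a small constant
mask = (rng.random((n, n)) < d / n).astype(float)
for name, G in [("raw", mask), ("trimmed", trim(mask))]:
    s = np.linalg.svd(G, compute_uv=False)
    print(name, "sigma_2 =", round(s[1], 2), "vs O(sqrt(d)) scale:", round(2 * np.sqrt(d), 2))
```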
5. Discussion

In this section, we discuss the two assumptions, A1 and A2, that were introduced in Section 3. Note that A1 is a standard assumption used by most of the existing approaches (Candès & Recht, 2009; Keshavan et al., 2010). Moreover, it is easy to show that, for any matrix-oblivious sampling approach, this assumption is necessarily required for exact recovery. For example, if $G_{ij} = 0$, i.e., the $(i, j)$-th element is not observed, then we cannot recover $M = e_i e_j^T$.

However, A2 is a slightly non-standard assumption; intuitively, it requires the singular vectors of $M$ to satisfy an RIP-type property. Note that A2 is similar to the strong incoherence property introduced by (Candès & Tao, 2010). Below, we show the connection between the strong incoherence property (SIP) assumed in (Candès & Tao, 2010) and assumption A2.

Claim 5.1. Let $M \in \mathbb{R}^{n_1 \times n_2}$ be a rank-$r$ matrix. Let $M = U \Sigma V^T$ satisfy SIP, i.e.,
$$\left|\langle e_i, U U^T e_j \rangle - \frac{r}{n_1} \delta_{ij}\right| \leq \frac{\mu_1 \sqrt{r}}{n_1}, \forall\, 1 \leq i, j \leq n_1, \qquad \left|\langle e_i, V V^T e_j \rangle - \frac{r}{n_2} \delta_{ij}\right| \leq \frac{\mu_1 \sqrt{r}}{n_2}, \forall\, 1 \leq i, j \leq n_2. \qquad (8)$$
Then, $M$ satisfies A2 for all $d \geq r$ with $\delta_d \leq \mu_1 \sqrt{r}$.

Note that the above claim holds with $\delta_d = \mu_1 \sqrt{r}, \forall d \geq r$. This bound is independent of $d$ and hence weak; as $d$ becomes close to $n_1$, $\left\|\frac{n_1}{d} \sum_{k \in S} U^k (U^k)^T - I\right\|$ gets close to $0$, since $U^T U = I$. We leave the task of obtaining a stronger bound as an open problem.

In the context of universal recovery, a natural question is whether any additional assumption is required or the standard assumption A1 alone suffices. Here, we answer this question in the negative. Specifically, we show that if $M$ satisfies A1 only, then a universal recovery guarantee cannot be provided even for rank-2 matrices, using as many as $n_1 n_2 / 4$ observations.

Claim 5.2. Let $\Omega$ be a fixed set of indices and let $P_\Omega$ be the sampling operator as defined in (1). Let $n_1 = n_2 = n$ and let $|\Omega| = n^2/4$. Then, there exists a rank-2 matrix $M$ that cannot be recovered exactly from $P_\Omega(M)$.

Now, another question is whether we require a property as strong as A2, or whether just a lower bound on $\|U^i\|^2, \|V^j\|^2$ is enough for universal recovery. The proof of the above claim (in the appendix) shows that even if $\|U^i\|^2, \|V^j\|^2$ are lower-bounded, exact recovery is still not possible.

6. Simulations

In this section, we present a few empirical results on both synthetic and real data sets. The goal of this section is to demonstrate the effect of the spectral gap of the sampling graph $G_\Omega$ (associated with $\Omega$) on successful recovery of a matrix.

First, we use synthetic data sets generated in the following manner. We sample $U, V \in \mathbb{R}^{500 \times 10}$ using the standard normal distribution and generate a rank-10 matrix $M$ using $M = UV^T$. As $U, V$ are sampled from the normal distribution, w.h.p., $M$ satisfies the incoherence assumptions A1 and A2 mentioned in Section 3. Next, we generate a sequence of sampling operators $P_\Omega$ (and the associated graphs $G_\Omega$) with varying (relative) spectral gap $(1 - \sigma_2(G)/\sigma_1(G))$ by using a stochastic block model. In the basic stochastic block model, the nodes can be thought of as being divided into two clusters. Each intra-cluster edge is sampled uniformly with probability $p$ and each inter-cluster edge is sampled uniformly with probability $q$. Note that when $p = q$, the spectral gap is largest, and when $p = 1, q = 0$, the spectral gap is smaller, as there are two distinct clusters in that case (Nadakuditi & Newman, 2012). Note that the number of samples generated in this model depends only on the value of $p + q$. To generate $\Omega$ (i.e., $G_\Omega$), we first fix a value of $p + q$, hence fixing the number of samples, and then vary $p$ and $q$, which gives graphs of different spectral gaps. As the value of $p$ goes from $0$ to $\frac{p+q}{2}$, the spectral gap goes up, and from $\frac{p+q}{2}$ to $p + q$, the spectral gap goes down. Figure 1(a) clearly demonstrates this trend.

Figure 1. (a): Spectral gap (dotted lines) and fraction of successful recoveries (solid lines) with varying $p$, the parameter in the stochastic block model; the value of $p + q$ dictates the number of observed entries $|\Omega|$ (shown: $p + q = 0.2, d = 50$; $p + q = 0.4, d = 100$; $p + q = 0.6, d = 150$). (b): Fraction of successful recoveries vs. spectral gap of the sampling operator. Clearly, the success ratio for matrix recovery shows a phase-transition-type phenomenon w.r.t. the spectral gap. Also, different values of $p + q$, i.e., different numbers of samples, do not affect the success ratio too much.

Figure 2. (a): Noisy samples case: relative error in Frobenius norm vs. spectral gap of the sampling operator. Perturbed samples of the matrix $M + Z$ are observed, where $Z$ is a Gaussian noise matrix with $\sigma = \|Z\|_F / \|M\|_F$; solid lines correspond to $\sigma = 0.1$ and dotted lines to $\sigma = 0.2$. (b): Temperature prediction: relative error in spectral norm vs. spectral gap of the sampling operator, for different numbers of samples ($p + q$). (c): Comparison with the r-closure algorithm: running time of the r-closure algorithm vs. the nuclear norm minimization (NNM) algorithm.
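The block-model sampling graphs underlying Figures 1 and 2 can be reproduced in a few lines. The following sketch (ours, with illustrative sizes) sweeps $p$ at fixed $p + q$ and reports the resulting relative spectral gap:

```python
import numpy as np

def block_model_mask(n, p, q, rng):
    """Two-cluster stochastic block model sampling graph: entries inside the
    two diagonal blocks are observed w.p. p, off-diagonal blocks w.p. q."""
    half = n // 2
    probs = np.full((n, n), q)
    probs[:half, :half] = p
    probs[half:, half:] = p
    return (rng.random((n, n)) < probs).astype(float)

rng = np.random.default_rng(0)
for p in [0.0, 0.05, 0.1, 0.15, 0.2]:            # sweep p with p + q = 0.2 fixed
    G = block_model_mask(500, p, 0.2 - p, rng)
    s = np.linalg.svd(G, compute_uv=False)
    print(f"p = {p:.2f}   relative gap = {1 - s[1] / s[0]:.3f}")
```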
We use an Augmented Lagrange Multiplier (ALM) based method (Lin et al., 2010) to solve the nuclear norm minimization problem (7). A trial is considered successful if the relative error (in Frobenius norm) is less than 0.01, and we average over 50 such trials to determine the success ratio. Figure 1(a) plots the (relative) spectral gap (dotted lines) and the success ratio (solid lines) as $p$ varies. Lines of different colors indicate different numbers of samples ($p + q$). As expected, the spectral gap increases initially, as $p$ varies from $0$ to $\frac{p+q}{2}$, and then it decreases. Moreover, the trend of successful recovery follows a similar trajectory and hence is more or less independent of the number of samples (given a particular spectral gap). Figure 1(b) shows the fraction of successful recoveries as the spectral gap increases. Here again, lines of different colors indicate different numbers of samples ($p + q$). Clearly, the success ratio is positively correlated with the spectral gap of the sampling operator and in fact exhibits a phase-transition-type phenomenon. We expect the difference in success ratio for different $p + q$ values to decrease with increasing problem size, i.e., dimensionality of $M$.

Next, we conduct an experiment to show that a large spectral gap of $G_\Omega$ helps in reducing the effect of noise as well. To this end, we generate a noisy input matrix $M + Z$, where $Z$ is a random Gaussian matrix, and let $\sigma = \|Z\|_F / \|M\|_F$. We consider two values of $\sigma$: $\sigma = 0.1$ and $\sigma = 0.2$. Figure 2(a) plots the error (in Frobenius norm) of the recovered matrix against the relative spectral gap in the noisy setting. Solid lines represent $\sigma = 0.1$ and dotted lines represent $\sigma = 0.2$. Clearly, a larger spectral gap leads to a smaller recovery error. Moreover, the matrix completion denoising effect (Candès & Plan, 2010) can also be observed: for example, when $\sigma = 0.1$ and $p + q = 0.4$, the output error is less than 0.05, and when $\sigma = 0.2$, the error is less than 0.1.

Temperature prediction: Finally, we take a real dataset of temperature values ($T$) for 365 days at 316 different locations from (NCDC), which has been used before to test matrix completion algorithms (Candès & Plan, 2010). Note that $T$ is approximately a rank-1 matrix, with $\sigma_1(T)/\|T\|_F = 0.98$ (where $\sigma_i(T)$ are the singular values of $T$). We use the block model sampling scheme to sample entries from $T$, and let $\hat{T}$ be the output of (7). In Figure 2(b), we plot the error $\|\hat{T} - T\|/\|T\|$ for different values of the spectral gap of $G_\Omega$ and for different numbers of samples ($p + q$). Note that $\|X - T\|/\|T\| \geq \sigma_2(T)/\sigma_1(T)$ for any rank-1 matrix $X$, and we see that for a large enough spectral gap we achieve this bound.

Finally, in Figure 2(c), we compare the running time of the r-closure algorithm proposed in (Király & Tomioka, 2012) with that of the nuclear norm minimization algorithm. While the r-closure algorithm is noted to have better error guarantees, it is combinatorial and takes exponential time to compute.

7. Proof of Theorem 4.2

In this section, we present the proof of our main result (Theorem 4.2). The main steps of our proof are similar to those in the proof given by (Recht, 2009). The main difference is that the bounds in the existing proof assume that $\Omega$ is sampled independently of $M$, so that $\Omega$ is not adversarial and the bounds hold with high probability. In contrast, our bounds are deterministic, and our proofs have to work under the assumption that $M$ is adversarially selected for a given $\Omega$.
The key steps in the proof are: a) provide conditions that an optimal dual solution (or dual certificate) of problem (7) should satisfy, so that the true matrix $M$ is the unique optimum of (7); b) construct such a dual certificate and hence guarantee that $M$ is the unique optimum of (7).

We first introduce a few notations required by our proof. For simplicity, we assume that $n_1 = n_2 = n$; our proof easily generalizes to the case $n_1 \neq n_2$. Define $T$, a subspace of $\mathbb{R}^{n \times n}$, as the span of all matrices of the form $U X^T$ and $Y V^T$, i.e., all matrices whose row space is contained in the span of $V$ or whose column space is contained in the span of $U$. Hence, the projection operator $P_T$ is defined as follows:
$$P_T(Z) = U U^T Z + Z V V^T - U U^T Z V V^T = U U^T Z + (I - U U^T) Z V V^T.$$
Hence, any matrix in $T$ can be written as $U X^T + Y V^T$ for some $X$ and $Y$ such that $Y$ and $U$ are orthogonal to each other. Similarly, we can define the projection operator onto $T^\perp$, the orthogonal complement of $T$:
$$P_{T^\perp}(Z) = (I - U U^T) Z (I - V V^T).$$
Now, before presenting the conditions on the dual certificate and its construction, we provide a few structural lemmas that show the goodness of the operators $P_T$ and $P_\Omega$. We would like to stress that the key differences of our proof from that of (Recht, 2009) lie in the proofs of these structural lemmas and in the way we apply Lemma 7.3. We specifically show (using Lemma 7.3) that each matrix in the series used in the construction of the dual certificate $Y$ (discussed later in this section) is incoherent and has small infinity norm.

The first lemma proves injectivity of the operator $P_\Omega$ on the subspace $T$:

Lemma 7.1. Let $M = U \Sigma V^T$ satisfy A1 and A2, and let the graph $G_\Omega$ that generates $\Omega$ satisfy G1 and G2 (see Section 3). Then, for any matrix $Z \in T$,
$$\left\|\frac{n}{d} P_T P_\Omega(Z) - Z\right\|_F \leq \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right) \|Z\|_F.$$

Next, we provide a lemma that characterizes the difference between $\frac{n}{d} P_\Omega(Z)$ and $Z$ for any incoherent-type $Z \in T$:

Lemma 7.2. Let $Z \in T$, i.e., $Z = U X^T + Y V^T$ with $Y$ orthogonal to $U$, and let $X$ and $Y$ be incoherent, i.e., $\|X^i\| \leq c_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|Y^j\| \leq c_2 \sqrt{\frac{\mu_0 r}{n}}$. Let $\Omega$ satisfy assumptions G1 and G2. Then:
$$\left\|\frac{n}{d} P_\Omega(Z) - Z\right\| \leq (c_1 + c_2) \frac{C \mu_0 r}{\sqrt{d}}.$$

The next lemma is a stronger version of Lemma 7.1 for such special incoherent-type matrices $Z \in T$.

Lemma 7.3. Let $Z \in T$, i.e., $Z = U X^T + Y V^T$ with $Y$ orthogonal to $U$. Let $X$ and $Y$ be incoherent, i.e., $\|X^i\| \leq c_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|Y^j\| \leq c_2 \sqrt{\frac{\mu_0 r}{n}}$. Let $\tilde{Z} = Z - \frac{n}{d} P_T P_\Omega(Z)$. Then, the following holds for any $M$ that satisfies the conditions of Lemma 7.1:
1. $\|\tilde{Z}\|_\infty \leq \frac{(c_1 + c_2)\mu_0 r}{n} \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right)$;
2. $\tilde{Z} = U \tilde{X}^T + \tilde{Y} V^T$, where $\tilde{X}$ and $\tilde{Y}$ are incoherent: $\|\tilde{X}^i\| \leq \tilde{c}_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|\tilde{Y}^j\| \leq \tilde{c}_2 \sqrt{\frac{\mu_0 r}{n}}$, with $\tilde{c}_1 + \tilde{c}_2 \leq \delta_d c_2 + (c_1 + c_2) \frac{C \mu_0 r}{\sqrt{d}}$.

The proofs of the above three lemmas are provided in the appendix.

Conditions on the dual certificate: We now present the lemma that characterizes the conditions a dual certificate should satisfy so that $M$ is the unique optimum of (7):

Lemma 7.4. Let $M$ and $\Omega$ satisfy A1, A2 and G1, G2, respectively. Then, $M$ is the unique optimum of (7) if there exists a $Y \in \mathbb{R}^{n \times n}$ that satisfies the following: $Y$ is supported on $\Omega$, i.e., $P_\Omega(Y) = Y$;
$$\|P_T(Y) - U V^T\|_F \leq \frac{1}{C \mu_0 \sqrt{18 n}}; \qquad \|P_{T^\perp}(Y)\| < 1.$$

Having specified the conditions on the dual certificate and also the key structural lemmas, we are now ready to present the proof of Theorem 4.2.

Proof of Theorem 4.2. We prove the theorem by constructing a dual certificate $Y$ that satisfies the conditions in Lemma 7.4 and hence guarantees that $M$ is exactly recovered by (7). Our construction of $Y$ is similar to the golfing-scheme based construction given in (Gross, 2011; Recht, 2009). In particular, $Y$ is obtained as the $p$-th term of the series given below:
$$W_{k+1} = W_k - \frac{n}{d} P_T P_\Omega(W_k), \qquad Y_k = \frac{n}{d} \sum_{i=0}^{k-1} P_\Omega(W_i),$$
where $W_0 = U V^T$. That is, $Y = Y_p$, where $p = \left\lceil \frac{1}{2} \log_3\left(18 C^2 \mu_0^2 r\, n\right) \right\rceil$.
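Before completing the argument, here is a small numerical sketch (ours, not the paper's code; the mask construction and sizes are illustrative) of $P_T$ and of this golfing iteration; it checks that $\|P_T(Y_p) - UV^T\|_F = \|W_p\|_F$ decays geometrically in $p$, which is exactly what the bounds below establish:

```python
import numpy as np

def P_T(U, V, Z):
    """Projection onto T: U U^T Z + (I - U U^T) Z V V^T."""
    UUtZ = U @ (U.T @ Z)
    return UUtZ + (Z - UUtZ) @ V @ V.T

def golfing_certificate(U, V, mask, d, p):
    """Dual certificate Y_p = (n/d) * sum_{i<p} P_Omega(W_i), with
    W_0 = U V^T and W_{k+1} = W_k - (n/d) P_T(P_Omega(W_k))."""
    n = mask.shape[0]
    W = U @ V.T
    Y = np.zeros_like(W)
    for _ in range(p):
        Y += (n / d) * (mask * W)        # P_Omega(W): keep only observed entries
        W = W - (n / d) * P_T(U, V, mask * W)
    return Y

rng = np.random.default_rng(0)
n, r, d = 200, 2, 60
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # random subspaces: incoherent w.h.p.
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
mask = np.zeros((n, n))
for _ in range(d):                                  # roughly d-regular permutation model
    mask[np.arange(n), rng.permutation(n)] = 1.0
for p in [1, 3, 5, 7]:
    Y = golfing_certificate(U, V, mask, d, p)
    print(p, np.linalg.norm(P_T(U, V, Y) - U @ V.T, "fro"))
```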
Also, define $\epsilon = \frac{C \mu_0 r}{\sqrt{d}}$. Using $d \geq 36 C^2 \mu_0^2 r^2$, we get $\epsilon \leq \frac{1}{6}$.

Now, the first condition of Lemma 7.4 is satisfied trivially by construction, as $Y_p$ is a sum of $P_\Omega(W_i)$ terms.

Bounding $\|P_T(Y) - U V^T\|_F$: By construction,
$$P_T(Y) - U V^T = \frac{n}{d} \sum_{i=0}^{p-1} P_T P_\Omega(W_i) - U V^T = -W_p. \qquad (9)$$
Now, note that each $W_k \in T$. Hence, using Lemma 7.1,
$$\|W_{k+1}\|_F = \left\|W_k - \frac{n}{d} P_T P_\Omega(W_k)\right\|_F \leq \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right) \|W_k\|_F \leq \frac{1}{3} \|W_k\|_F, \qquad (10)$$
where the last inequality follows from the assumption $\delta_d \leq \frac{1}{6}$ and from $\epsilon \leq \frac{1}{6}$. Hence, using (9) and (10), we have:
$$\|P_T(Y) - U V^T\|_F = \|W_p\|_F \leq 3^{-p} \sqrt{r} \leq \frac{1}{C \mu_0 \sqrt{18 n}},$$
where the last inequality follows by using $p = \left\lceil \frac{1}{2} \log_3(18 C^2 \mu_0^2 r\, n) \right\rceil$ and $\|W_0\|_F = \sqrt{r}$.

Bounding $\|P_{T^\perp}(Y)\|$: Recall that $W_k \in T$. Now, let $W_k = U X_{W_k}^T + Y_{W_k} V^T$ with $Y_{W_k}$ perpendicular to $U$. Moreover, let
$$\|X_{W_k}^i\| \leq c^1_{W_k} \sqrt{\frac{\mu_0 r}{n}}, \qquad \|Y_{W_k}^j\| \leq c^2_{W_k} \sqrt{\frac{\mu_0 r}{n}}.$$
Note that, for $W_0 = U V^T$, $c^1_{W_0} = 1$ and $c^2_{W_0} = 0$. Hence, using Lemma 7.3, $c^1_{W_1} + c^2_{W_1} \leq \frac{1}{6}$. Similarly, applying Lemma 7.3 $k$ times, we get:
$$c^1_{W_k} + c^2_{W_k} \leq 3^{-k}. \qquad (12)$$
Now, by the construction of $Y$, by the triangle inequality, and by using the facts that $P_{T^\perp}$ is a contraction and $P_{T^\perp}(W_{k-1}) = 0$:
$$\|P_{T^\perp}(Y)\| \leq \sum_{k=1}^{p} \left\|\frac{n}{d} P_\Omega(W_{k-1}) - W_{k-1}\right\| \leq \sum_{k=1}^{p} \left(c^1_{W_{k-1}} + c^2_{W_{k-1}}\right) \frac{C \mu_0 r}{\sqrt{d}}, \qquad (13)$$
where the last inequality follows from Lemma 7.2. Now, using (12), (13), and Lemma 7.3, we have:
$$\|P_{T^\perp}(Y)\| \leq \frac{C \mu_0 r}{\sqrt{d}} \sum_{k \geq 0} 3^{-k} \leq \frac{3\epsilon}{2} \leq \frac{1}{4} < 1.$$
Hence proved.

8. Conclusions

In this paper, we provided the first (to the best of our knowledge) universal recovery guarantee for matrix completion. The main observation of the paper is that the spectral gap of $G_\Omega$ (the graph that generates $\Omega$) is the key property governing recoverability of $M$ using $P_\Omega(M)$ alone. For example, if $G_\Omega$ is a Ramanujan expander (i.e., $\sigma_2(G) = O(\sqrt{d})$), then we have universal recovery guarantees for matrices with the strong incoherence property. For uniformly sampled $\Omega$, our main result implies exact recovery of constant-rank matrices using $O(n)$ entries, in contrast to the $O(n \log n)$ entries required by the existing analyses. One caveat is that we require a stronger incoherence property to obtain the above sample complexity.

Our results also provide a recipe to determine whether a given index set $\Omega$ is enough to recover a low-rank matrix. That is, given $\Omega$ and its associated graph $G_\Omega$, we can measure the spectral gap of $G_\Omega$, and if it is large enough, then our results guarantee exact recovery of strongly incoherent matrices. In Section 5, we showed that the standard incoherence assumption alone is not enough for universal recovery and that a property similar to A2 (see Section 3) is required. However, it is an open problem to obtain precise information-theoretic limits on $\delta_d$ (see A2) for universal recovery guarantees. Another interesting research direction is to study the alternating minimization method under the assumptions given in Section 3.

References

Candès, Emmanuel J. and Plan, Yaniv. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.

Candès, Emmanuel J. and Recht, Benjamin. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

Candès, Emmanuel J. and Tao, Terence. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.

Candès, Emmanuel J. and Tao, Terence. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

Chen, Yudong, Bhojanapalli, Srinadh, Sanghavi, Sujay, and Ward, Rachel. Coherent matrix completion. arXiv preprint arXiv:1306.2979, 2013.

Feige, Uriel and Ofek, Eran. Spectral techniques applied to sparse random graphs. Random Structures & Algorithms, 27(2):251–275, 2005.

Friedman, Joel. A proof of Alon's second eigenvalue conjecture. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pp. 720–724. ACM, 2003.
Gopi, Sivakant, Netrapalli, Praneeth, Jain, Prateek, and Nori, Aditya. One-bit compressed sensing: Provable support and vector recovery. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 154–162, 2013.

Gross, David. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.

Gross, David, Liu, Yi-Kai, Flammia, Steven T., Becker, Stephen, and Eisert, Jens. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15):150401, 2010.

Heiman, Eyal, Schechtman, Gideon, and Shraibman, Adi. Deterministic algorithms for matrix completion. Random Structures & Algorithms, 2013.

Hoory, Shlomo, Linial, Nathan, and Wigderson, Avi. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.

Jain, Prateek, Netrapalli, Praneeth, and Sanghavi, Sujay. Low-rank matrix completion using alternating minimization. arXiv preprint arXiv:1212.0467, 2012.

Keshavan, Raghunandan H., Montanari, Andrea, and Oh, Sewoong. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.

Király, Franz J. and Tomioka, Ryota. A combinatorial algebraic approach for the identifiability of low-rank matrix completion. In ICML, 2012.

Lee, Troy and Shraibman, Adi. Matrix completion from any given set of observations. In Advances in Neural Information Processing Systems, pp. 1781–1787, 2013.

Lin, Zhouchen, Chen, Minming, and Ma, Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.

Liu, Yi-Kai. Universal low-rank matrix recovery from Pauli measurements. arXiv preprint arXiv:1103.2816, 2011.

Lubotzky, Alexander, Phillips, Ralph, and Sarnak, Peter. Ramanujan graphs. Combinatorica, 8(3):261–277, 1988.

Marcus, Adam, Spielman, Daniel A., and Srivastava, Nikhil. Interlacing families I: Bipartite Ramanujan graphs of all degrees. arXiv preprint arXiv:1304.4132, 2013.

Margulis, Grigorii Aleksandrovich. Explicit group-theoretical constructions of combinatorial schemes and their application to the design of expanders and concentrators. Problemy Peredachi Informatsii, 24(1):51–60, 1988.

Meka, Raghu, Jain, Prateek, and Dhillon, Inderjit S. Matrix completion from power-law distributed samples. In Advances in Neural Information Processing Systems, pp. 1258–1266, 2009.

Morgenstern, Moshe. Existence and explicit constructions of q + 1 regular Ramanujan graphs for every prime power q. Journal of Combinatorial Theory, Series B, 62(1):44–62, 1994.

Nadakuditi, Raj Rao and Newman, Mark E. J. Graph spectra and the detectability of community structure in networks. Physical Review Letters, 108(18):188701, 2012.

NCDC. National Climatic Data Center. URL http://www.ncdc.noaa.gov/oa/ncdc.html.

Recht, Benjamin. A simpler approach to matrix completion. arXiv preprint arXiv:0910.0651, 2009.

Recht, Benjamin, Fazel, Maryam, and Parrilo, Pablo A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.