Universal Matrix Completion

Srinadh Bhojanapalli (BSRINADH@UTEXAS.EDU), The University of Texas at Austin
Prateek Jain (PRAJAIN@MICROSOFT.COM), Microsoft Research, India

Abstract: The problem of low-rank matrix completion has recently generated a lot of interest, leading to several results that offer exact solutions to the problem. However, in order to do so, these methods make assumptions that can be quite restrictive in practice. More specifically, the methods assume that: a) the observed indices are sampled uniformly at random, and b) for every new matrix, the observed indices are sampled afresh. In this work, we address these issues by providing a universal recovery guarantee for matrix completion that works for a variety of sampling schemes. In particular, we show that if the set of sampled indices comes from the edges of a bipartite graph with large spectral gap (i.e., gap between the first and the second singular value), then the nuclear norm minimization based method exactly recovers all low-rank matrices that satisfy certain incoherence properties. Moreover, we also show that under certain stricter incoherence conditions, $O(nr^2)$ uniformly sampled entries are enough to recover any rank-$r$ $n \times n$ matrix, in contrast to the $O(nr \log n)$ sample complexity required by other matrix completion algorithms as well as by existing analyses of the nuclear norm method.

1. Introduction

In this paper, we study the problem of universal low-rank matrix completion. Low-rank matrix completion is an important problem with several applications in areas such as recommendation systems, sketching, and quantum tomography (Recht et al., 2010; Candès & Recht, 2009; Gross et al., 2010). The goal in matrix completion is to recover a rank-$r$ matrix, given a small number of entries of the matrix. That is, to find a matrix $M$ given the values $\{M_{ij} : (i, j) \in \Omega\}$, where $\Omega$ is the set of observed indices. Recently, several methods with provable guarantees have been proposed for solving the problem under the following two assumptions: a) $M$ is incoherent, and b) $\Omega$ is sampled uniformly with $|\Omega| \geq c\,nr \log n$. Moreover, $\Omega$ needs to be re-sampled for each matrix $M$ that is to be recovered, i.e., the same $\Omega$ cannot be re-used without worsening the guarantees significantly. While the first assumption can be shown to be necessary for any matrix-oblivious sampling, the second assumption is relatively restrictive and might not hold in several practical settings. The main goal of this work is to develop a general result that can handle other sampling schemes as well. Moreover, we aim to develop a universal method, where one fixed $\Omega$ would be enough to recover any low-rank matrix $M$. Such a universal recovery result is highly desirable in several signal processing applications, where the goal is to design one $\Omega$ that can recover any low-rank signal matrix $M$ by observing $M$ over $\Omega$ alone. To this end, we reduce the problem of recoverability using an index set $\Omega$ to the spectral gap (gap between the largest and the second largest singular values) of $G_\Omega$, where $G_\Omega$ is a bipartite graph whose biadjacency matrix $G \in \mathbb{R}^{n \times n}$ is given by: $G_{ij} = 1$ iff $(i, j) \in \Omega$ and $G_{ij} = 0$ otherwise.
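As a concrete illustration of this reduction, the following minimal sketch (ours, not from the paper; the function names and the uniform-sampling example are purely illustrative) builds the biadjacency matrix $G$ from an index set $\Omega$ and computes the relative spectral gap that our results are stated in terms of:

```python
import numpy as np

def biadjacency(omega, n1, n2):
    """Biadjacency matrix G of the sampling graph: G_ij = 1 iff (i, j) is in Omega."""
    G = np.zeros((n1, n2))
    rows, cols = zip(*omega)
    G[list(rows), list(cols)] = 1.0
    return G

def relative_spectral_gap(G):
    """Relative spectral gap 1 - sigma_2(G) / sigma_1(G)."""
    s = np.linalg.svd(G, compute_uv=False)
    return 1.0 - s[1] / s[0]

# Example: observe each entry of a 200 x 200 matrix independently with probability 0.2.
rng = np.random.default_rng(0)
mask = rng.random((200, 200)) < 0.2
omega = list(zip(*np.nonzero(mask)))
print(relative_spectral_gap(biadjacency(omega, 200, 200)))
```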
In particular, we show that if $G_\Omega$ has a large enough spectral gap and if the rank-$r$ matrix $M$ satisfies the standard incoherence property, then the best rank-$r$ approximation of $P_\Omega(M)$ (see (1)) itself is enough to get a reasonable approximation to $M$ (see Theorem 4.1 for details). Note that our approximation result is similar to Theorem 1.1 of (Keshavan et al., 2010), but our result holds for any $\Omega$ with large spectral gap, unlike (Keshavan et al., 2010), which requires uniform sampling. On the other hand, we require an explicit incoherence condition on the singular vectors of $M$, while the result of (Keshavan et al., 2010) only requires a bound on $M_{\max} = \max_{ij} M_{ij}$. The latter assumption is strictly weaker; the two assumptions coincide for PSD matrices.

Next, we show that, by assuming certain stronger incoherence properties, the number of samples required by the popular nuclear-norm minimization method (Candès & Recht, 2009) to recover back $M$ depends only on $n$, $r$, and the spectral gap of the $d$-regular bipartite graph $G_\Omega$. In particular, we require $d \geq c\,\sigma_2(G)\,r$ for a constant $c$ depending on the incoherence parameter, where $\sigma_2(G)$ is the second largest singular value of $G$. Hence, if $\sigma_2(G) = O(\sqrt{d})$, i.e., if $G_\Omega$ is an expander, then $|\Omega| = nd = O(nr^2)$ samples suffice for exact recovery. Our recovery result applies to any low-rank matrix $M$ that satisfies the stronger incoherence property, given that the fixed graph $G_\Omega$ has a large spectral gap. To the best of our knowledge, this is the first universal guarantee for matrix completion. Furthermore, using recent results by (Feige & Ofek, 2005), we show that for the standard uniform sampling of $\Omega$, only $O(nr^2)$ samples suffice for exact recovery of a rank-$r$ matrix $M$ that satisfies a stronger incoherence condition (see A2 in Section 3).

Next, we discuss the stronger incoherence property that we require for our universal recovery guarantees. In particular, we show that the standard incoherence condition alone cannot provide universal recovery with any graph $G_\Omega$, and hence a stronger incoherence property is required. Finally, we empirically demonstrate our observation that, instead of the number of samples, the spectral gap of $G_\Omega$ is what really governs recoverability of the true matrix. In particular, we construct a family of graphs based on the stochastic block model and show that the probability of success grows with the spectral gap, irrespective of the number of samples.

Notation: We denote matrices by capital letters (e.g., $U$) and vectors by small letters (e.g., $u$). $U^T$ denotes the transpose of the matrix $U$. $U_{ij}$ denotes the $(i, j)$-th element of $U$. $U_i$ denotes the $i$-th column of $U$ and $U^i$ denotes the $i$-th row of $U$ (but in column format). $\|u\|$ denotes the $L_2$ norm of $u$, and $\|U\|$ denotes the spectral norm of $U$, i.e., $\|U\| = \max_{x: \|x\| \leq 1} \|Ux\|$. $\|U\|_F$ denotes the Frobenius norm and $\|U\|_\infty$ is the maximum absolute element of $U$. $C = A.B$ denotes the Hadamard product of $A$ and $B$, i.e., $C_{ij} = A_{ij} B_{ij}$. Similarly, $(u.v)_i = u_i v_i$. $\mathbf{1}$ denotes the all-ones vector. $\mathbf{1}_\perp$ denotes a unit vector that is perpendicular to $\mathbf{1}$ and is determined appropriately by the context.

Paper Organization: In the next section, we discuss related work. Then, in Section 3, we define the problem of matrix completion and the bipartite graph $G_\Omega$ that we use. We present our main results in Section 4 and discuss the additional incoherence assumption in Section 5. In Section 6, we present observations from our empirical study. Finally, in Section 7, we provide the proof of our exact recovery result.
2. Related Work

Matrix completion: In a seminal paper on matrix completion, (Candès & Recht, 2009) showed that any $n \times n$ incoherent matrix of rank $r$ can be recovered from $Cn^{1.2} r \log(n)$ uniformly random samples using nuclear norm minimization. Later, assuming the matrix to be strongly incoherent, (Candès & Tao, 2010) improved the sample complexity of the nuclear norm minimization method to $O(nr \log^6(n))$. Subsequently, (Recht, 2009; Gross, 2011) generalized this result to any incoherent matrix using matrix Bernstein inequalities and presented significantly simpler proofs. Concurrently, other algorithms were shown to recover incoherent matrices using $O(nr \log(n))$ (or worse) samples, such as SVD followed by descent on the Grassmannian manifold (Keshavan et al., 2010) and alternating minimization (Jain et al., 2012). We note that all the above mentioned results assume a rather restrictive sampling scheme, i.e., each entry is sampled uniformly at random, and they furthermore require a fresh set of samples for each new matrix. Moreover, the number of samples required is at least $O(nr \log n)$.

Other sampling schemes: Recently, there have been some results for different types of sampling schemes, such as power-law distributions (Meka et al., 2009), but here again universal results are not known and, furthermore, the proposed algorithms are not robust to noise. Another line of work has been to devise sampling schemes dependent on the data matrix (Chen et al., 2013; Király & Tomioka, 2012). Naturally, these schemes cannot be universal, as the sampling scheme itself is dependent on the data matrix. Furthermore, the practicality of such schemes is not clear a priori.

Universality: Universality is an important property in signal processing or sketching applications, as the goal there is to have one fixed sampling operator that performs well for all the given signals. While universality results are well known for several other sensing problems, such as sparse vector recovery (Candès & Tao, 2005) and one-bit compressive sensing (Gopi et al., 2013), similar results for low-rank matrix sensing are mostly restricted to RIP-type operators (Recht et al., 2010; Liu, 2011). Unfortunately, RIP-type operators are typically dense, requiring knowledge of all elements of the matrix to form observations, and also require large storage/computational complexity. On the other hand, sampling individual elements is a sparse operator and hence computationally efficient. Hence, for several signal processing applications, universal matrix completion results are critical.

In fact, several recent works have studied problems similar to that of universal matrix completion, for example (Király & Tomioka, 2012; Heiman et al., 2013) and (Lee & Shraibman, 2013). However, there are critical differences in our results/approaches that we now highlight. In (Király & Tomioka, 2012), the authors consider an algebraic approach to analyze sufficient conditions for matrix completion. While they propose interesting deterministic sufficient conditions, the algorithm analyzed in their paper requires solving an NP-hard problem. In contrast, we analyze the nuclear norm minimization method, which is known to have several efficient implementations. In (Heiman et al., 2013) and (Lee & Shraibman, 2013), the authors consider sampling based on expanders but only provide generalization error bounds rather than exact recovery guarantees. Moreover, the matrix recovered by their algorithms need not be low-rank.
3. Problem Definition & Assumptions

Let $M \in \mathbb{R}^{n_1 \times n_2}$ be a rank-$r$ matrix and let $n_1 \geq n_2$. Define $n = \max\{n_1, n_2\} = n_1$. Let $M = U \Sigma V^T$ be the SVD of $M$ and let $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ be the singular values of $M$. We observe a small number of entries of $M$, indexed by a set $\Omega \subseteq [n_1] \times [n_2]$. That is, we observe $M_{ij}, \forall (i, j) \in \Omega$. Define the sampling operator $P_\Omega: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^{n_1 \times n_2}$ as:
$$(P_\Omega(M))_{ij} = \begin{cases} M_{ij}, & \text{if } (i, j) \in \Omega, \\ 0, & \text{if } (i, j) \notin \Omega. \end{cases} \qquad (1)$$
Next, we define a bipartite graph associated with the sampling operator $P_\Omega$. That is, let $G_\Omega = (V, E)$ be a bipartite graph where $V = \{1, 2, \dots, n_1\} \cup \{1, 2, \dots, n_2\}$ and $(i, j) \in E$ iff $(i, j) \in \Omega$. Let $G \in \mathbb{R}^{n_1 \times n_2}$ be the biadjacency matrix of the bipartite graph $G_\Omega$, with $G_{ij} = 1$ iff $(i, j) \in \Omega$. Note that $P_\Omega(M) = M.G$, where $.$ denotes the Hadamard product.

Now, the goal in universal matrix completion is to design a set $\Omega$ and a recovery algorithm such that all rank-$r$ matrices $M$ can be recovered using only $P_\Omega(M)$. In the next section, we present two results for this problem. Our first result gives an approximate solution to the matrix completion problem and our second result gives exact recovery guarantees. For our results, we require $G_\Omega$, the graph associated with $\Omega$, to be a $d$-regular bipartite graph with large spectral gap. More concretely, we require the following two properties of the sampling graph $G_\Omega$:

Assumptions on $G_\Omega$/$\Omega$:
G1 The top left and right singular vectors of $G$ are the (normalized) all-ones vectors.
G2 $\sigma_1(G) = d$ and $\sigma_2(G) \leq C\sqrt{d}$.

Note that, as the graph is $d$-regular, $|\Omega| = nd$. The eigenvalues of the adjacency matrix of the bipartite graph $G_\Omega$ are $\{\pm\sigma_i(G)\}$, $i = 1, \dots, n$. We state all the definitions in terms of the singular values of $G$ instead of the eigenvalues of the adjacency matrix. The above two properties are satisfied by a class of expander graphs called Ramanujan graphs; in fact, Ramanujan graphs are defined by this spectral gap property:

Definition 3.1 (Ramanujan graph (Hoory et al., 2006)). Let $\sigma_1(G), \sigma_2(G), \dots, \sigma_n(G)$ be the singular values of $G$ in decreasing order. Then, a $d$-regular bipartite graph $G_\Omega$ is a Ramanujan graph if $\sigma_2(G) \leq 2\sqrt{d - 1}$.

Ramanujan graphs are well studied in the literature and there exist several randomized/deterministic methods to generate such graphs. We briefly discuss a couple of popular constructions in Section 4.2.

Incoherence assumptions: Now, we present the incoherence assumptions that we impose on $M$:

A1 $\|U^i\|^2 \leq \frac{\mu_0 r}{n_1}, \forall i$, and $\|V^j\|^2 \leq \frac{\mu_0 r}{n_2}, \forall j$.

A2 $\left\|\frac{n_1}{d} \sum_{k \in S} U^k (U^k)^T - I\right\| \leq \delta_d$, $\forall S \subseteq [n_1], |S| = d$, and $\left\|\frac{n_2}{d'} \sum_{k \in S} V^k (V^k)^T - I\right\| \leq \delta_d$, $\forall S \subseteq [n_2], |S| = d'$, where $d' = d\,n_2/n_1$.

Note that A1 is the standard incoherence assumption required by most of the existing matrix completion results. However, A2 is a stricter assumption than A1 and is similar to the stronger incoherence property introduced by (Candès & Tao, 2010). We discuss the necessity of such an assumption for universal matrix completion in Section 5.
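Properties G1 and G2 are easy to check numerically for a candidate sampling graph. The following sketch (ours, not from the paper; the permutation-model construction is merely one convenient way to generate a nearly $d$-regular bipartite graph) compares $\sigma_2(G)$ with the Ramanujan bound of Definition 3.1:

```python
import numpy as np

def random_regular_biadjacency(n, d, rng):
    """Approximately d-regular bipartite graph via the permutation model:
    superimpose d independent random perfect matchings. Parallel edges are
    collapsed by the 0/1 assignment, so a few degrees may fall slightly below d."""
    G = np.zeros((n, n))
    for _ in range(d):
        G[np.arange(n), rng.permutation(n)] = 1.0
    return G

rng = np.random.default_rng(0)
n, d = 500, 25
G = random_regular_biadjacency(n, d, rng)
s = np.linalg.svd(G, compute_uv=False)
print("sigma_1 (should be ~ d):", s[0])
print("sigma_2:", s[1], " Ramanujan bound 2*sqrt(d-1):", 2 * np.sqrt(d - 1))
```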
4. Main Results

We now present our main results for the matrix completion problem. We assume that $\Omega$ is generated using a bipartite $d$-regular expander and satisfies G1 and G2 (see Section 3). Our first result shows that, if $M$ satisfies A1, then the best rank-$r$ approximation of $P_\Omega(M)$ is close to $M$ and hence serves as a good approximation of $M$, which can also be used to initialize other methods like alternating least squares. Our second result shows that if $M$ satisfies both A1 and A2, then, using the nuclear-norm minimization based method, $P_\Omega(M)$ can be used to recover back $M$ exactly.

4.1. Matrix approximation

Theorem 4.1. Let $G_\Omega$ be a $d$-regular bipartite graph satisfying G1 and G2, and let $M$ be a rank-$r$ matrix that satisfies assumption A1. Then,
$$\left\|\frac{n}{d} P_k(P_\Omega(M)) - M\right\| \leq \frac{2C\mu_0 r}{\sqrt{d}} \|M\|, \quad \text{for any } k \leq r,$$
where $P_k(A)$ is the best rank-$k$ approximation of $A$ and can be obtained from the top-$k$ singular vectors of $A$.

Now, if $M$ is a PSD matrix, then the above result is exactly the same as Theorem 1.1 of (Keshavan et al., 2010). For non-PSD matrices, our result requires a bound on the norm of each row of the singular vectors of $M$, while the result of (Keshavan et al., 2010) only requires a bound on the largest element of $M$, which is similar to our requirement but strictly weaker. On the other hand, our result holds for all $M$ for a given $\Omega$, if $\Omega$'s associated graph $G_\Omega$ satisfies both G1 and G2. Moreover, if $\Omega$ is generated using an Erdős–Rényi graph, then, after a standard trimming step, the above theorem directly implies (for PSD matrices) Theorem 1.1 of (Keshavan et al., 2010). Finally, we would like to stress that our proof is significantly simpler and is able to exploit the fact that Erdős–Rényi graphs have good spectral gap in a fairly straightforward and intuitive manner. We now present a detailed proof of the above theorem.

Proof. Let $M = U \Sigma V^T$, $U, V \in \mathbb{R}^{n \times r}$. Note that
$$\left\|\frac{n}{d} P_\Omega(M) - M\right\| = \max_{x, y: \|x\| = 1, \|y\| = 1} y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x,$$
and, since $P_\Omega(M) = M.G$,
$$y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x = \sum_{i=1}^r \sigma_i \left[\frac{n}{d} (y.U_i)^T G (x.V_i) - (y^T U_i)(x^T V_i)\right].$$
Let $y.U_i = \alpha_i \frac{\mathbf{1}}{\sqrt{n}} + \beta_i \mathbf{1}_\perp^i$, so that $\alpha_i = \frac{\mathbf{1}^T (y.U_i)}{\sqrt{n}}$. By G1 and $d$-regularity, $\mathbf{1}^T G = d\,\mathbf{1}^T$, so the component of $y.U_i$ along $\mathbf{1}$ contributes exactly $(y^T U_i)(x^T V_i)$ and cancels the second term. Therefore,
$$y^T \left(\frac{n}{d} P_\Omega(M) - M\right) x = \sum_{i=1}^r \sigma_i \frac{n}{d} \beta_i (\mathbf{1}_\perp^i)^T G (x.V_i) \overset{\zeta_1}{\leq} \frac{nC}{\sqrt{d}} \sum_{i=1}^r \sigma_i |\beta_i| \|x.V_i\| \overset{\zeta_2}{\leq} \frac{nC}{\sqrt{d}} \|M\| \sqrt{\sum_{i=1}^r \|y.U_i\|^2} \sqrt{\sum_{i=1}^r \|x.V_i\|^2},$$
where $\zeta_1$ follows from assumption G2 and $\zeta_2$ follows from $|\beta_i| \leq \|y.U_i\|$ and the Cauchy–Schwarz inequality. Now,
$$\sum_{i=1}^r \|y.U_i\|^2 = \sum_j y_j^2 \|U^j\|^2 \overset{\zeta_1}{\leq} \frac{\mu_0 r}{n} \sum_j y_j^2 \overset{\zeta_2}{=} \frac{\mu_0 r}{n},$$
where $\zeta_1$ follows from A1 and $\zeta_2$ follows from $\|y\| = 1$. Using a similar argument, $\sum_{i=1}^r \|x.V_i\|^2 \leq \frac{\mu_0 r}{n}$. The theorem now follows by combining the above inequalities. The proof of the second part (for $P_k$) is given in Appendix A.

4.2. Nuclear norm minimization

We now present our result for exact recovery of the matrix $M$ using $P_\Omega(M)$ alone. For recovery, we use the standard nuclear norm minimization algorithm, i.e., we obtain a matrix $X$ by solving the following convex optimization problem:
$$\min_X \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \qquad (7)$$
where $\|X\|_*$ denotes the nuclear norm of $X$; the nuclear norm of $X$ is equal to the sum of its singular values. As mentioned in Section 2, nuclear norm minimization is a popular technique for the low-rank matrix completion problem and has been shown to provably recover the true matrix, assuming that $\Omega$ is sampled uniformly at random and $|\Omega| \geq c\,nr \log n$ (Candès & Tao, 2010).
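For illustration, program (7) can be prototyped directly in an off-the-shelf convex solver; the paper's experiments instead use an ALM solver (see Section 6), and the small-scale sketch below is ours, with illustrative names and sizes:

```python
import numpy as np
import cvxpy as cp

def complete(P_omega_M, mask):
    """Program (7): min ||X||_* subject to P_Omega(X) = P_Omega(M).
    `mask` is the 0/1 biadjacency matrix G, and P_omega_M = M.G."""
    X = cp.Variable(P_omega_M.shape)
    constraints = [cp.multiply(mask, X) == P_omega_M]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value

# Small rank-2 sanity check with a uniformly sampled mask.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
mask = (rng.random((40, 40)) < 0.5).astype(float)
M_hat = complete(M * mask, mask)
print("relative error:", np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro"))
```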
Below, we provide a universal recovery result for the nuclear-norm minimization method, as long as the samples come from a graph $G_\Omega$ that satisfies G1 and G2.

Theorem 4.2. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a $d$-regular graph $G_\Omega$ that satisfies assumptions (G1) and (G2). Also, let $d \geq 36 C^2 \mu_0^2 r^2$, i.e., $|\Omega| = nd \geq 36 C^2 \mu_0^2 r^2 \max\{n_1, n_2\}$. Then $M$ is the unique optimum of problem (7).

Note that the above result requires only deterministic constraints on the sampling operator $P_\Omega$ and guarantees exact recovery for any matrix $M$ that satisfies A1 and A2. As mentioned earlier, A2 is a stronger assumption than A1. But, as we show in Section 5, universal recovery is not possible under assumption A1 alone.

We can use the above theorem to derive results for several interesting sampling schemes, such as random $d$-regular graphs. By Theorem 1 of (Friedman, 2003), the second singular value of a random $d$-regular graph is at most $2\sqrt{d - 1} + \epsilon$, for every $\epsilon > 0$, with high probability. Hence, a random $d$-regular graph, with high probability, obeys G1 and G2, which implies the following exact recovery result:

Corollary 4.3. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a random $d$-regular graph. Then $M$ is the unique optimum of program (7) when $d \geq 36 \cdot 4\, \mu_0^2 r^2$, with high probability.

Note that the standard completion results, such as (Candès & Recht, 2009) and (Keshavan et al., 2010), generate $\Omega$ using an Erdős–Rényi graph, which is slightly different from the random $d$-regular graph we considered above. However, (Feige & Ofek, 2005) showed that the second largest singular value of the Erdős–Rényi graph $G(n_1, n_2, p)$ is $O(\sqrt{d})$ when $p = \Omega(\log(n_1)/n_2)$. Interestingly, even when $p = c/n_2$, i.e., when $n_2\,p$ is a constant, trimming the graph (i.e., removing a few nodes with high degree) gives a graph $G$ s.t. $\sigma_2(G) = O(\sqrt{d})$. Hence, we can again apply Theorem 4.2 to obtain the following result:

Corollary 4.4. Let $M$ be an $n_1 \times n_2$ matrix of rank $r$ satisfying assumptions (A1) and (A2) with $\delta_d \leq \frac{1}{6}$, and let $\Omega$ be generated from a $G(n, p)$ graph after trimming. Then $M$ is the unique optimum of program (7) when $p \geq \frac{36 c\, \mu_0^2 r^2}{\min\{n_1, n_2\}}$, with high probability.

While the above two results exploit the fact that a random graph is almost a Ramanujan expander, and hence our general recovery result can be applied, the graph construction is still randomized. Interestingly, (Lubotzky et al., 1988; Margulis, 1988; Morgenstern, 1994) proposed explicit deterministic constructions of Ramanujan graphs when $d - 1$ is a prime power. Moreover, (Marcus et al., 2013) showed that bipartite Ramanujan graphs exist for all $n$ and $d$. However, an explicit construction for all $n$ and $d$ still remains an open problem.
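The trimming step behind Corollary 4.4 is easy to emulate numerically. The sketch below (ours, with an illustrative degree threshold) drops rows and columns whose degree far exceeds the average, in the spirit of (Feige & Ofek, 2005), and compares $\sigma_2$ before and after:

```python
import numpy as np

def trim(mask, c=2.0):
    """Zero out rows/columns whose degree exceeds c times the average degree
    (a sketch of the trimming step used in (Feige & Ofek, 2005))."""
    G = mask.copy()
    avg = G.sum() / G.shape[0]
    G[G.sum(axis=1) > c * avg, :] = 0.0
    G[:, G.sum(axis=0) > c * avg] = 0.0
    return G

rng = np.random.default_rng(0)
n, d = 2000, 8                                    # p = d/n, so n*p is a small constant
mask = (rng.random((n, n)) < d / n).astype(float)
for name, G in [("raw", mask), ("trimmed", trim(mask))]:
    s = np.linalg.svd(G, compute_uv=False)
    print(name, "sigma_2 =", round(s[1], 2), "vs O(sqrt(d)) scale:", round(2 * np.sqrt(d), 2))
```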
5. Discussion

In this section, we discuss the two assumptions, A1 and A2, that were introduced in Section 3. Note that A1 is a standard assumption used by most of the existing approaches (Candès & Recht, 2009; Keshavan et al., 2010). Moreover, it is easy to show that, for any matrix-oblivious sampling approach, this assumption is necessarily required for exact recovery. For example, if $G_{ij} = 0$, i.e., the $(i, j)$-th element is not observed, then we cannot recover $M = e_i e_j^T$.

However, A2 is a slightly non-standard assumption; intuitively, it requires the singular vectors of $M$ to satisfy an RIP-type property. Note that A2 is similar to the strong incoherence property introduced by (Candès & Tao, 2010). Below, we show the connection between the strong incoherence property (SIP) assumed in (Candès & Tao, 2010) and assumption A2.

Claim 5.1. Let $M \in \mathbb{R}^{n_1 \times n_2}$ be a rank-$r$ matrix. Let $M = U \Sigma V^T$ satisfy SIP, i.e.,
$$\left|\langle e_i, U U^T e_j \rangle - \frac{r}{n_1} \delta_{ij}\right| \leq \frac{\mu_1 \sqrt{r}}{n_1}, \forall\, 1 \leq i, j \leq n_1, \qquad \left|\langle e_i, V V^T e_j \rangle - \frac{r}{n_2} \delta_{ij}\right| \leq \frac{\mu_1 \sqrt{r}}{n_2}, \forall\, 1 \leq i, j \leq n_2. \qquad (8)$$
Then, $M$ satisfies A2 for all $d \geq r$ with $\delta_d \leq \mu_1 \sqrt{r}$.

Note that the above claim holds with $\delta_d = \mu_1 \sqrt{r}, \forall d \geq r$. This bound is independent of $d$ and hence weak; as $d$ becomes close to $n_1$, $\left\|\frac{n_1}{d} \sum_{k \in S} U^k (U^k)^T - I\right\|$ gets close to $0$, since $U^T U = I$. We leave the task of obtaining a stronger bound as an open problem.

In the context of universal recovery, a natural question is whether any additional assumption is required or the standard assumption A1 alone suffices. Here, we answer this question in the negative. Specifically, we show that if $M$ satisfies A1 only, then a universal recovery guarantee cannot be provided even for rank-2 matrices, using as many as $n_1 n_2 / 4$ observations.

Claim 5.2. Let $\Omega$ be a fixed set of indices and let $P_\Omega$ be the sampling operator as defined in (1). Let $n_1 = n_2 = n$ and let $|\Omega| = n^2/4$. Then, there exists a rank-2 matrix $M$ that cannot be recovered exactly from $P_\Omega(M)$.

Now, another question is whether we require a property as strong as A2, or whether just a lower bound on $\|U^i\|^2, \|V^j\|^2$ is enough for universal recovery. The proof of the above claim (in the appendix) shows that even if $\|U^i\|^2, \|V^j\|^2$ are lower-bounded, exact recovery is still not possible.

6. Simulations

In this section, we present a few empirical results on both synthetic and real data sets. The goal of this section is to demonstrate the effect of the spectral gap of the sampling graph $G_\Omega$ (associated with $\Omega$) on successful recovery of a matrix.

First, we use synthetic data sets generated in the following manner. We sample $U, V \in \mathbb{R}^{500 \times 10}$ using the standard normal distribution and generate a rank-10 matrix $M$ using $M = UV^T$. As $U, V$ are sampled from the normal distribution, w.h.p., $M$ satisfies the incoherence assumptions A1 and A2 mentioned in Section 3. Next, we generate a sequence of sampling operators $P_\Omega$ (and the associated graphs $G_\Omega$) with varying (relative) spectral gap $(1 - \sigma_2(G)/\sigma_1(G))$ by using a stochastic block model. In the basic stochastic block model, the nodes can be thought of as being divided into two clusters. Each intra-cluster edge is sampled uniformly with probability $p$ and each inter-cluster edge is sampled uniformly with probability $q$. Note that when $p = q$, the spectral gap is largest, and when $p = 1, q = 0$, the spectral gap is smaller, as there are two distinct clusters in that case (Nadakuditi & Newman, 2012). Note that the number of samples generated in this model depends only on the value of $p + q$. To generate $\Omega$ (i.e., $G_\Omega$), we first fix a value of $p + q$, hence fixing the number of samples, and then vary $p$ and $q$, which gives graphs of different spectral gaps. As the value of $p$ goes from $0$ to $\frac{p+q}{2}$, the spectral gap goes up, and from $\frac{p+q}{2}$ to $p + q$, the spectral gap goes down. Figure 1(a) clearly demonstrates this trend.

Figure 1. (a): Spectral gap (dotted lines) and fraction of successful recoveries (solid lines) with varying $p$, the parameter in the stochastic block model; the value of $p + q$ dictates the number of observed entries $|\Omega|$ (shown: $p + q = 0.2, d = 50$; $p + q = 0.4, d = 100$; $p + q = 0.6, d = 150$). (b): Fraction of successful recoveries vs. spectral gap of the sampling operator. Clearly, the success ratio for matrix recovery shows a phase-transition-type phenomenon w.r.t. the spectral gap. Also, different values of $p + q$, i.e., different numbers of samples, do not affect the success ratio too much.

Figure 2. (a): Noisy samples case: relative error in Frobenius norm vs. spectral gap of the sampling operator. Perturbed samples of the matrix $M + Z$ are observed, where $Z$ is a Gaussian noise matrix with $\sigma = \|Z\|_F / \|M\|_F$; solid lines correspond to $\sigma = 0.1$ and dotted lines to $\sigma = 0.2$. (b): Temperature prediction: relative error in spectral norm vs. spectral gap of the sampling operator, for different numbers of samples ($p + q$). (c): Comparison with the r-closure algorithm: running time of the r-closure algorithm vs. the nuclear norm minimization (NNM) algorithm.
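The block-model sampling graphs underlying Figures 1 and 2 can be reproduced in a few lines. The following sketch (ours, with illustrative sizes) sweeps $p$ at fixed $p + q$ and reports the resulting relative spectral gap:

```python
import numpy as np

def block_model_mask(n, p, q, rng):
    """Two-cluster stochastic block model sampling graph: entries inside the
    two diagonal blocks are observed w.p. p, off-diagonal blocks w.p. q."""
    half = n // 2
    probs = np.full((n, n), q)
    probs[:half, :half] = p
    probs[half:, half:] = p
    return (rng.random((n, n)) < probs).astype(float)

rng = np.random.default_rng(0)
for p in [0.0, 0.05, 0.1, 0.15, 0.2]:            # sweep p with p + q = 0.2 fixed
    G = block_model_mask(500, p, 0.2 - p, rng)
    s = np.linalg.svd(G, compute_uv=False)
    print(f"p = {p:.2f}   relative gap = {1 - s[1] / s[0]:.3f}")
```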
We use an Augmented Lagrange Multiplier (ALM) based method (Lin et al., 2010) to solve the nuclear norm minimization problem (7). A trial is considered successful if the relative error (in Frobenius norm) is less than 0.01, and we average over 50 such trials to determine the success ratio. Figure 1(a) plots the (relative) spectral gap (dotted lines) and the success ratio (solid lines) as $p$ varies. Lines of different colors indicate different numbers of samples ($p + q$). As expected, the spectral gap increases initially, as $p$ varies from $0$ to $\frac{p+q}{2}$, and then it decreases. Moreover, the trend of successful recovery follows a similar trajectory and hence is more or less independent of the number of samples (given a particular spectral gap). Figure 1(b) shows the fraction of successful recoveries as the spectral gap increases. Here again, lines of different colors indicate different numbers of samples ($p + q$). Clearly, the success ratio is positively correlated with the spectral gap of the sampling operator and in fact exhibits a phase-transition-type phenomenon. We expect the difference in success ratio for different $p + q$ values to decrease with increasing problem size, i.e., dimensionality of $M$.

Next, we conduct an experiment to show that a large spectral gap of $G_\Omega$ helps in reducing the effect of noise as well. To this end, we generate a noisy input matrix $M + Z$, where $Z$ is a random Gaussian matrix, and let $\sigma = \|Z\|_F / \|M\|_F$. We consider two values of $\sigma$: $\sigma = 0.1$ and $\sigma = 0.2$. Figure 2(a) plots the error (in Frobenius norm) of the recovered matrix against the relative spectral gap in the noisy setting. Solid lines represent $\sigma = 0.1$ and dotted lines represent $\sigma = 0.2$. Clearly, a larger spectral gap leads to a smaller recovery error. Moreover, the matrix completion denoising effect (Candès & Plan, 2010) can also be observed: for example, when $\sigma = 0.1$ and $p + q = 0.4$, the output error is less than 0.05, and when $\sigma = 0.2$, the error is less than 0.1.

Temperature prediction: Finally, we take a real dataset of temperature values ($T$) for 365 days at 316 different locations from (NCDC), which has been used before to test matrix completion algorithms (Candès & Plan, 2010). Note that $T$ is approximately a rank-1 matrix, with $\sigma_1(T)/\|T\|_F = 0.98$ (where $\sigma_i(T)$ are the singular values of $T$). We use the block model sampling scheme to sample entries from $T$, and let $\hat{T}$ be the output of (7). In Figure 2(b), we plot the error $\|\hat{T} - T\|/\|T\|$ for different values of the spectral gap of $G_\Omega$ and for different numbers of samples ($p + q$). Note that $\|X - T\|/\|T\| \geq \sigma_2(T)/\sigma_1(T)$ for any rank-1 matrix $X$, and we see that for a large enough spectral gap we achieve this bound.

Finally, in Figure 2(c), we compare the running time of the r-closure algorithm proposed in (Király & Tomioka, 2012) with that of the nuclear norm minimization algorithm. While the r-closure algorithm is noted to have better error guarantees, it is combinatorial and takes exponential time to compute.

7. Proof of Theorem 4.2

In this section, we present the proof of our main result (Theorem 4.2). The main steps of our proof are similar to those in the proof given by (Recht, 2009). The main difference is that the bounds in the existing proof assume that $\Omega$ is sampled independently of $M$, so that $\Omega$ is not adversarial and the bounds hold with high probability. In contrast, our bounds are deterministic, and our proofs have to work under the assumption that $M$ is adversarially selected for a given $\Omega$.
The key steps in the proof are: a) provide conditions that an optimal dual solution (or dual certificate) of problem (7) should satisfy, so that the true matrix $M$ is the unique optimum of (7); b) construct such a dual certificate and hence guarantee that $M$ is the unique optimum of (7).

We first introduce a few notations required by our proof. For simplicity, we assume that $n_1 = n_2 = n$; our proof easily generalizes to the case $n_1 \neq n_2$. Define $T$, a subspace of $\mathbb{R}^{n \times n}$, as the span of all matrices of the form $U X^T$ and $Y V^T$, i.e., all matrices whose row space is contained in the span of $V$ or whose column space is contained in the span of $U$. Hence, the projection operator $P_T$ is defined as follows:
$$P_T(Z) = U U^T Z + Z V V^T - U U^T Z V V^T = U U^T Z + (I - U U^T) Z V V^T.$$
Hence, any matrix in $T$ can be written as $U X^T + Y V^T$ for some $X$ and $Y$ such that $Y$ and $U$ are orthogonal to each other. Similarly, we can define the projection operator onto $T^\perp$, the orthogonal complement of $T$:
$$P_{T^\perp}(Z) = (I - U U^T) Z (I - V V^T).$$
Now, before presenting the conditions on the dual certificate and its construction, we provide a few structural lemmas that show the goodness of the operators $P_T$ and $P_\Omega$. We would like to stress that the key differences of our proof from that of (Recht, 2009) lie in the proofs of these structural lemmas and in the way we apply Lemma 7.3. We specifically show (using Lemma 7.3) that each matrix in the series used in the construction of the dual certificate $Y$ (discussed later in this section) is incoherent and has small infinity norm.

The first lemma proves injectivity of the operator $P_\Omega$ on the subspace $T$:

Lemma 7.1. Let $M = U \Sigma V^T$ satisfy A1 and A2, and let the graph $G_\Omega$ that generates $\Omega$ satisfy G1 and G2 (see Section 3). Then, for any matrix $Z \in T$,
$$\left\|\frac{n}{d} P_T P_\Omega(Z) - Z\right\|_F \leq \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right) \|Z\|_F.$$

Next, we provide a lemma that characterizes the difference between $\frac{n}{d} P_\Omega(Z)$ and $Z$ for any incoherent-type $Z \in T$:

Lemma 7.2. Let $Z \in T$, i.e., $Z = U X^T + Y V^T$ with $Y$ orthogonal to $U$, and let $X$ and $Y$ be incoherent, i.e., $\|X^i\| \leq c_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|Y^j\| \leq c_2 \sqrt{\frac{\mu_0 r}{n}}$. Let $\Omega$ satisfy assumptions G1 and G2. Then:
$$\left\|\frac{n}{d} P_\Omega(Z) - Z\right\| \leq (c_1 + c_2) \frac{C \mu_0 r}{\sqrt{d}}.$$

The next lemma is a stronger version of Lemma 7.1 for such special incoherent-type matrices $Z \in T$.

Lemma 7.3. Let $Z \in T$, i.e., $Z = U X^T + Y V^T$ with $Y$ orthogonal to $U$. Let $X$ and $Y$ be incoherent, i.e., $\|X^i\| \leq c_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|Y^j\| \leq c_2 \sqrt{\frac{\mu_0 r}{n}}$. Let $\tilde{Z} = Z - \frac{n}{d} P_T P_\Omega(Z)$. Then, the following holds for any $M$ that satisfies the conditions of Lemma 7.1:
1. $\|\tilde{Z}\|_\infty \leq \frac{(c_1 + c_2)\mu_0 r}{n} \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right)$;
2. $\tilde{Z} = U \tilde{X}^T + \tilde{Y} V^T$, where $\tilde{X}$ and $\tilde{Y}$ are incoherent: $\|\tilde{X}^i\| \leq \tilde{c}_1 \sqrt{\frac{\mu_0 r}{n}}$ and $\|\tilde{Y}^j\| \leq \tilde{c}_2 \sqrt{\frac{\mu_0 r}{n}}$, with $\tilde{c}_1 + \tilde{c}_2 \leq \delta_d c_2 + (c_1 + c_2) \frac{C \mu_0 r}{\sqrt{d}}$.

The proofs of the above three lemmas are provided in the appendix.

Conditions on the dual certificate: We now present the lemma that characterizes the conditions a dual certificate should satisfy so that $M$ is the unique optimum of (7):

Lemma 7.4. Let $M$ and $\Omega$ satisfy A1, A2 and G1, G2, respectively. Then, $M$ is the unique optimum of (7) if there exists a $Y \in \mathbb{R}^{n \times n}$ that satisfies the following: $Y$ is supported on $\Omega$, i.e., $P_\Omega(Y) = Y$;
$$\|P_T(Y) - U V^T\|_F \leq \frac{1}{C \mu_0 \sqrt{18 n}}; \qquad \|P_{T^\perp}(Y)\| < 1.$$

Having specified the conditions on the dual certificate and also the key structural lemmas, we are now ready to present the proof of Theorem 4.2.

Proof of Theorem 4.2. We prove the theorem by constructing a dual certificate $Y$ that satisfies the conditions in Lemma 7.4 and hence guarantees that $M$ is exactly recovered by (7). Our construction of $Y$ is similar to the golfing-scheme based construction given in (Gross, 2011; Recht, 2009). In particular, $Y$ is obtained as the $p$-th term of the series given below:
$$W_{k+1} = W_k - \frac{n}{d} P_T P_\Omega(W_k), \qquad Y_k = \frac{n}{d} \sum_{i=0}^{k-1} P_\Omega(W_i),$$
where $W_0 = U V^T$. That is, $Y = Y_p$, where $p = \left\lceil \frac{1}{2} \log_3\left(18 C^2 \mu_0^2 r\, n\right) \right\rceil$.
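Before completing the argument, here is a small numerical sketch (ours, not the paper's code; the mask construction and sizes are illustrative) of $P_T$ and of this golfing iteration; it checks that $\|P_T(Y_p) - UV^T\|_F = \|W_p\|_F$ decays geometrically in $p$, which is exactly what the bounds below establish:

```python
import numpy as np

def P_T(U, V, Z):
    """Projection onto T: U U^T Z + (I - U U^T) Z V V^T."""
    UUtZ = U @ (U.T @ Z)
    return UUtZ + (Z - UUtZ) @ V @ V.T

def golfing_certificate(U, V, mask, d, p):
    """Dual certificate Y_p = (n/d) * sum_{i<p} P_Omega(W_i), with
    W_0 = U V^T and W_{k+1} = W_k - (n/d) P_T(P_Omega(W_k))."""
    n = mask.shape[0]
    W = U @ V.T
    Y = np.zeros_like(W)
    for _ in range(p):
        Y += (n / d) * (mask * W)        # P_Omega(W): keep only observed entries
        W = W - (n / d) * P_T(U, V, mask * W)
    return Y

rng = np.random.default_rng(0)
n, r, d = 200, 2, 60
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # random subspaces: incoherent w.h.p.
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
mask = np.zeros((n, n))
for _ in range(d):                                  # roughly d-regular permutation model
    mask[np.arange(n), rng.permutation(n)] = 1.0
for p in [1, 3, 5, 7]:
    Y = golfing_certificate(U, V, mask, d, p)
    print(p, np.linalg.norm(P_T(U, V, Y) - U @ V.T, "fro"))
```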
Also, define $\epsilon = \frac{C \mu_0 r}{\sqrt{d}}$. Using $d \geq 36 C^2 \mu_0^2 r^2$, we get $\epsilon \leq \frac{1}{6}$.

Now, the first condition of Lemma 7.4 is satisfied trivially by construction, as $Y_p$ is a sum of $P_\Omega(W_i)$ terms.

Bounding $\|P_T(Y) - U V^T\|_F$: By construction,
$$P_T(Y) - U V^T = \frac{n}{d} \sum_{i=0}^{p-1} P_T P_\Omega(W_i) - U V^T = -W_p. \qquad (9)$$
Now, note that each $W_k \in T$. Hence, using Lemma 7.1,
$$\|W_{k+1}\|_F = \left\|W_k - \frac{n}{d} P_T P_\Omega(W_k)\right\|_F \leq \left(\delta_d + \frac{C \mu_0 r}{\sqrt{d}}\right) \|W_k\|_F \leq \frac{1}{3} \|W_k\|_F, \qquad (10)$$
where the last inequality follows from the assumption $\delta_d \leq \frac{1}{6}$ and from $\epsilon \leq \frac{1}{6}$. Hence, using (9) and (10), we have:
$$\|P_T(Y) - U V^T\|_F = \|W_p\|_F \leq 3^{-p} \sqrt{r} \leq \frac{1}{C \mu_0 \sqrt{18 n}},$$
where the last inequality follows by using $p = \left\lceil \frac{1}{2} \log_3(18 C^2 \mu_0^2 r\, n) \right\rceil$ and $\|W_0\|_F = \sqrt{r}$.

Bounding $\|P_{T^\perp}(Y)\|$: Recall that $W_k \in T$. Now, let $W_k = U X_{W_k}^T + Y_{W_k} V^T$ with $Y_{W_k}$ perpendicular to $U$. Moreover, let
$$\|X_{W_k}^i\| \leq c^1_{W_k} \sqrt{\frac{\mu_0 r}{n}}, \qquad \|Y_{W_k}^j\| \leq c^2_{W_k} \sqrt{\frac{\mu_0 r}{n}}.$$
Note that, for $W_0 = U V^T$, $c^1_{W_0} = 1$ and $c^2_{W_0} = 0$. Hence, using Lemma 7.3, $c^1_{W_1} + c^2_{W_1} \leq \frac{1}{6}$. Similarly, applying Lemma 7.3 $k$ times, we get:
$$c^1_{W_k} + c^2_{W_k} \leq 3^{-k}. \qquad (12)$$
Now, by the construction of $Y$, by the triangle inequality, and by using the facts that $P_{T^\perp}$ is a contraction and $P_{T^\perp}(W_{k-1}) = 0$:
$$\|P_{T^\perp}(Y)\| \leq \sum_{k=1}^{p} \left\|\frac{n}{d} P_\Omega(W_{k-1}) - W_{k-1}\right\| \leq \sum_{k=1}^{p} \left(c^1_{W_{k-1}} + c^2_{W_{k-1}}\right) \frac{C \mu_0 r}{\sqrt{d}}, \qquad (13)$$
where the last inequality follows from Lemma 7.2. Now, using (12), (13), and Lemma 7.3, we have:
$$\|P_{T^\perp}(Y)\| \leq \frac{C \mu_0 r}{\sqrt{d}} \sum_{k \geq 0} 3^{-k} \leq \frac{3\epsilon}{2} \leq \frac{1}{4} < 1.$$
Hence proved.

8. Conclusions

In this paper, we provided the first (to the best of our knowledge) universal recovery guarantee for matrix completion. The main observation of the paper is that the spectral gap of $G_\Omega$ (the graph that generates $\Omega$) is the key property governing recoverability of $M$ using $P_\Omega(M)$ alone. For example, if $G_\Omega$ is a Ramanujan expander (i.e., $\sigma_2(G) = O(\sqrt{d})$), then we have universal recovery guarantees for matrices with the strong incoherence property. For uniformly sampled $\Omega$, our main result implies exact recovery of constant-rank matrices using $O(n)$ entries, in contrast to the $O(n \log n)$ entries required by the existing analyses. One caveat is that we require a stronger incoherence property to obtain the above sample complexity.

Our results also provide a recipe to determine whether a given index set $\Omega$ is enough to recover a low-rank matrix. That is, given $\Omega$ and its associated graph $G_\Omega$, we can measure the spectral gap of $G_\Omega$, and if it is large enough, then our results guarantee exact recovery of strongly incoherent matrices. In Section 5, we showed that the standard incoherence assumption alone is not enough for universal recovery and that a property similar to A2 (see Section 3) is required. However, it is an open problem to obtain precise information-theoretic limits on $\delta_d$ (see A2) for universal recovery guarantees. Another interesting research direction is to study the alternating minimization method under the assumptions given in Section 3.

References

Candès, Emmanuel J. and Plan, Yaniv. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.

Candès, Emmanuel J. and Recht, Benjamin. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

Candès, Emmanuel J. and Tao, Terence. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.

Candès, Emmanuel J. and Tao, Terence. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

Chen, Yudong, Bhojanapalli, Srinadh, Sanghavi, Sujay, and Ward, Rachel. Coherent matrix completion. arXiv preprint arXiv:1306.2979, 2013.

Feige, Uriel and Ofek, Eran. Spectral techniques applied to sparse random graphs. Random Structures & Algorithms, 27(2):251–275, 2005.

Friedman, Joel. A proof of Alon's second eigenvalue conjecture. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pp. 720–724. ACM, 2003.
Gopi, Sivakant, Netrapalli, Praneeth, Jain, Prateek, and Nori, Aditya. One-bit compressed sensing: Provable support and vector recovery. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 154–162, 2013.

Gross, David. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.

Gross, David, Liu, Yi-Kai, Flammia, Steven T., Becker, Stephen, and Eisert, Jens. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15):150401, 2010.

Heiman, Eyal, Schechtman, Gideon, and Shraibman, Adi. Deterministic algorithms for matrix completion. Random Structures & Algorithms, 2013.

Hoory, Shlomo, Linial, Nathan, and Wigderson, Avi. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.

Jain, Prateek, Netrapalli, Praneeth, and Sanghavi, Sujay. Low-rank matrix completion using alternating minimization. arXiv preprint arXiv:1212.0467, 2012.

Keshavan, Raghunandan H., Montanari, Andrea, and Oh, Sewoong. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.

Király, Franz J. and Tomioka, Ryota. A combinatorial algebraic approach for the identifiability of low-rank matrix completion. In ICML, 2012.

Lee, Troy and Shraibman, Adi. Matrix completion from any given set of observations. In Advances in Neural Information Processing Systems, pp. 1781–1787, 2013.

Lin, Zhouchen, Chen, Minming, and Ma, Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.

Liu, Yi-Kai. Universal low-rank matrix recovery from Pauli measurements. arXiv preprint arXiv:1103.2816, 2011.

Lubotzky, Alexander, Phillips, Ralph, and Sarnak, Peter. Ramanujan graphs. Combinatorica, 8(3):261–277, 1988.

Marcus, Adam, Spielman, Daniel A., and Srivastava, Nikhil. Interlacing families I: Bipartite Ramanujan graphs of all degrees. arXiv preprint arXiv:1304.4132, 2013.

Margulis, Grigorii Aleksandrovich. Explicit group-theoretical constructions of combinatorial schemes and their application to the design of expanders and concentrators. Problemy Peredachi Informatsii, 24(1):51–60, 1988.

Meka, Raghu, Jain, Prateek, and Dhillon, Inderjit S. Matrix completion from power-law distributed samples. In Advances in Neural Information Processing Systems, pp. 1258–1266, 2009.

Morgenstern, Moshe. Existence and explicit constructions of q + 1 regular Ramanujan graphs for every prime power q. Journal of Combinatorial Theory, Series B, 62(1):44–62, 1994.

Nadakuditi, Raj Rao and Newman, Mark E. J. Graph spectra and the detectability of community structure in networks. Physical Review Letters, 108(18):188701, 2012.

NCDC. National Climatic Data Center. URL http://www.ncdc.noaa.gov/oa/ncdc.html.

Recht, Benjamin. A simpler approach to matrix completion. arXiv preprint arXiv:0910.0651, 2009.

Recht, Benjamin, Fazel, Maryam, and Parrilo, Pablo A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.