# analysis_of_robust_pca_via_local_incoherence__47eddf1e.pdf

Analysis of Robust PCA via Local Incoherence

Huishuai Zhang Department of EECS

Syracuse University Syracuse, NY 13244 hzhan23@syr.edu

Yi Zhou Department of EECS

Syracuse University Syracuse, NY 13244 yzhou35@syr.edu

Yingbin Liang Department of EECS

Syracuse University Syracuse, NY 13244 yliang06@syr.edu

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank and a sparse error matrices via convex programming Principal Component Pursuit (PCP). In contrast to previous studies that assume the support of the error matrix is generated by uniform Bernoulli sampling, we allow non-uniform sampling, i.e., entries of the low-rank matrix are corrupted by errors with unequal probabilities. We characterize conditions on error corruption of each individual entry based on the local incoherence of the low-rank matrix, under which correct matrix decomposition by PCP is guaranteed. Such a reﬁned analysis of robust PCA captures how robust each entry of the low rank matrix combats error corruption. In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm and develops/exploits the concentration properties that such a norm satisﬁes.

1 Introduction

We consider the problem of robust Principal Component Analysis (PCA). Suppose a n-by-n1 data matrix M can be decomposed into a low-rank matrix L and a sparse matrix S as

M = L + S. (1)

Robust PCA aims to ﬁnd L and S with M given. This problem has been extensively studied recently. In [1, 2], Principal Component Pursuit (PCP) has been proposed to solve the robust PCA problem via the following convex programming

PCP: minimize

L,S k Lk + λk Sk1 (2)

subject to M = L + S,

where k k denotes the nuclear norm, i.e., the sum of singular values, and k k1 denotes the l1 norm i.e., the sum of absolute values of all entries. It was shown in [1, 2] that PCP successfully recovers L and S if the two matrices are distinguishable from each other in properties, i.e., L is not sparse and S is not low-rank. One important quantity that determines similarity of L to a sparse matrix is the incoherence of L, which measures how column and row spaces of L are aligned with canonical basis and between themselves. Namely, suppose that L is a rank-r matrix with SVD L = U V , where is a r r diagonal matrix with singular values as its diagonal entries, U is a n r matrix with columns as the left singular vectors of L, V is a n r matrix with columns as the right singular vectors of L, and V denotes the transpose of V . The incoherence of L is measured

1In this paper, we focus on square matrices for simplicity. Our results can be extended to rectangular matrices in a standard way.

by µ = max{µ0, µ1}, where µ0 and µ1 are deﬁned as

n , k V ejk

n , for all i, j = 1, , n (3)

Previous studies suggest that the incoherence crucially determines conditions on sparsity of S in order for PCP to succeed. For example, Theorem 2 in [3] explicitly shows that the matrix L with larger µ can tolerate only smaller error density to guarantee correct matrix decomposition by PCP. In all previous work on robust PCA, the incoherence is deﬁned to be the maximum over all column and row spaces of L as in (3) and (4), which can be viewed as the global parameter for the entire matrix L, and consequently, characterization of error density is based on such global (and in fact the worst case) incoherence.

In fact, each (i, j) entry of the low rank matrix L can be associated with a local incoherence parameter µij, which is less than or equal to the global parameter µ, and then the allowable entry-wise error density can be potentially higher than that characterized based on the global incoherence. Thus, the total number of errors that the matrix can tolerate in robust PCA can be much higher than that characterized based on the global incoherence when errors are distributed accordingly. Motivated by such an observation, this paper aims to characterize conditions on error corruption of each entry of the low rank matrix based on the corresponding local incoherence parameter, which guarantee success of PCP. Such conditions imply how robust each individual entry of L to resist error corruption. Naturally, the error corruption probability is allowed to be non-uniform over the matrix (i.e., locations of non-zero entries in S are sampled non-uniformly).

We note that the notion of local incoherence was ﬁrst introduced in [4] for studying the matrix completion problem, in which local incoherence determines the local sampling density in order to guarantee correct matrix completion. Here, local incoherence plays a similar role, and determines the maximum allowable error density at each entry to guarantee correct matrix decomposition. The difference lies in that local incoherence here depends on both localized µ0 and µ1 rather than only on localized µ0 in matrix completion due to further difﬁculty of robust PCA, in which locations of error corrupted entries are unknown, as pointed out in [1,3].

Our Contribution. In this paper, we investigate a more general robust PCA problem, in which entries of the low rank matrix are corrupted by non-uniformly distributed Bernoulli errors. We characterize the conditions that guarantee correct matrix decomposition by PCP. Our result identiﬁes the local incoherence (deﬁned by localized µ0 and µ1 for each entry of the low rank matrix) to determine the condition that each local Bernoulli error corruption parameter should satisfy. Our results provide the following useful understanding of the robust PCA problem:

Our characterization provides a localized (and hence more reﬁned) view of robust PCA, and

determines how robust each entry of the low rank matrix combats error corruption. Our results suggest that the total number of errors that the low-rank matrix can tolerate depends

on how errors are distributed over the matrix. Via cluster problems, our results provide an evidence that µ1 is necessary in characterizing

conditions for robust PCA.

In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm denoted by lw(1), which involves the information of both localized µ0 and µ1 and is hence different from the weighted norms introduced in [4] for matrix completion. Thus, our proof necessarily involves new technical developments associated with such a new norm.

Related Work. A closely related but different problem from robust PCA is matrix completion, in which a low-rank matrix is partially observed and is to be completed. Such a problem has been previously studied in [5 8], and it was shown that a rank-r n-by-n matrix can be provably recoverable by convex optimization with as few as (max{µ0, µ1}nr log2 n)2 observed entries. Later on, it was shown in [4] that µ1 does not affect sample complexity for matrix completion and hence (µ0nr log2 n) observed entries are sufﬁcient for guaranteeing correct matrix completion. It was further shown in [9] that a coherent low-rank matrix (i.e., with large µ0) can be recovered with

2f(n) 2 (g(n)) means k1 g(n) f(n) k2 g(n) for some positive k1, k2.

(nr log2 n) observations as long as the sampling probability is proportional to the leverage score (i.e., localized µ0). Our problem can be viewed as its counterpart in robust PCA, where the difference lies in the local incoherence in our problem depends on both localized µ0 and µ1.

Robust PCA aims to decompose an observed matrix into the sum of a low-rank matrix and a sparse matrix. In [2,10], robust PCA with ﬁxed error matrix was studied, and it was shown that the maximum number of errors in any row or column should be bounded from above in order to guarantee correct decomposition by PCP. Robust PCA with random error matrix was investigated in a number of studies. It has been shown in [1] that such decomposition can be exact with high probability if the percentage of corrupted entries is small enough, under the assumptions that the low-rank matrix is incoherent and the support set of the sparse matrix is uniformly distributed. It was further shown in [11] that if signs of nonzero entries in the sparse matrix are randomly chosen, then an adjusted convex optimization can produce exact decomposition even when the percentage of corrupted entries goes to one (i.e., error is dense). The problem was further studied in [1, 3, 12] for the case with the error-corrupted low-rank matrix only partially observed. Our work provides a more reﬁned (i.e. entry-wise) view of robust PCA with random error matrix, aiming at understanding how local incoherence affects susceptibility of each matrix entry to error corruption.

2 Model and Main Result

2.1 Problem Statement

We consider the robust PCA problem introduced in Section 1. Namely, suppose an n-by-n matrix M can be decomposed into two parts: M = L + S, where L is a low rank matrix and S is a sparse (error) matrix. We assume that the rank of L is r, and the support of S is selected randomly but non-uniformly. More speciﬁcally, let denote the support of S and then [n] [n], where [n] denotes the set {1, 2, . . . , n}. The event {(i, j) 2 } is independent across different pairs (i, j) and

P ((i, j) 2 ) = ij, (5)

where ij represents the probability that the (i, j)-entry of L is corrupted by error. Hence, is determined by Bernoulli sampling with non-uniform probabilities.

We study both the random sign and ﬁxed sign models for S. For the ﬁxed sign model, we assume signs of nonzero entries in S are arbitrary and ﬁxed, whereas for the random sign model, we assume that signs of nonzero entries in S are independently distributed Bernoulli variables, randomly taking values +1 or 1 with probability 1/2 as follows:

[sgn(S)]ij =

1 with prob. ij/2 0 with prob. 1 ij 1 with prob. ij/2.

In this paper, our goal is to characterize conditions on ij that guarantees correct recovery of L and S with observation of M.

We provide some notations that are used throughout this paper. A matrix X is associated with ﬁve norms: k Xk F denotes the Frobenius norm, k Xk denotes the nuclear norm (i.e., the sum of singular values), k Xk denotes the spectral norm (i.e., the largest singular value), and k Xk1 and k Xk1 represent respectively the l1 and l1 norms of the long vector stacked by X. The inner product between two matrices is deﬁned as h X, Y i := trace(X Y ). For a linear operator A that acts on the space of matrices, k Ak denotes the operator norm given by k Ak = sup{k Xk F =1} k AXk F .

2.2 Main Theorems

We adopt the PCP to solve the robust PCA problem. We deﬁne the following local incoherence parameters, which play an important role in our characterization of conditions on entry-wise ij.

k U eik2 + k V ejk2&

, µ1ij := n2([UV ]ij)2

µij := max{µ0ij, µ1ij}. (8)

It is clear that µ0ij µ0 and µ1ij µ1 for all i, j = 1, , n. We note that although maxi,j µij > 1, some µij might take values as small as zero.

We ﬁrst consider the robust PCA problem under the random sign model as introduced in Section 2.1. The following theorem characterizes the condition that guarantees correct recovery by PCP. Theorem 1. Consider the robust PCA problem under the random sign model. If

for some sufﬁciently large constant C0 and for all i, j 2 [n], then PCP yields correct matrix recovery with λ = 1 32pn log n, with probability at least 1 cn 10 for some constant c.

We note that the term 1/n3 is introduced to justify dual certiﬁcate conditions in the proof (see Appendix A.2). We further note that satisfying the condition in Theorem 1 implies C0

µr/n log n 1, which is an essential bound required in our proof and coincides with the conditions in previous studies [1, 12]. Although we set λ = 1 32pn log n for the sake of proof, in practice λ is often determined via cross validation.

The above theorem suggests that the local incoherence parameter µij is closely related to how robust each entry of L to error corruption in matrix recovery. An entry corresponding to smaller µij tolerates larger error density ij. This is consistent with the result in [4] for matrix completion, in which smaller local incoherence parameter requires lower local sampling rate. The difference lies in that here both µ0ij and µ1ij play roles in µij whereas only µ0ij matters in matrix completion. The necessity of µ1ij for robust PCA is further demonstrated in Section 2.3 via an example.

Theorem 1 also provides a more reﬁned view for robust PCA in the dense error regime, in which the error corruption probability approaches one. Such an interesting regime was previously studied in [3, 11]. In [11], it is argued that PCP with adaptive λ yields exact recovery even when the error corruption probability approaches one if errors take random signs and the dimension n is sufﬁciently large. In [3], it is further shown that PCP with a ﬁxed λ also yields exact recovery and the scaling behavior of the error corruption probability is characterized. The above Theorem 1 further provides the scaling behavior of the local entry-wise error corruption probability ij as it approaches one, and captures how such scaling behavior depends on local incoherence parameters µij. Such a result implies that robustness of PCP depends not only on the error density but also on how errors are distributed over the matrix with regard to µij.

We next consider the robust PCA problem under the ﬁxed sign model as introduced in Section 2.1. In this case, non-zero entries of the error matrix S can take arbitrary and ﬁxed values, and only locations of non-zero entries are random. Theorem 2. Consider the robust PCA problem under the ﬁxed sign model. If

(1 2 ij) max

for some sufﬁcient large constant C0 and for all i, j 2 [n], then PCP yields correct recovery with λ = 1 32pn log n, with probability at least 1 cn 10 for some constant c.

Theorem 2 follows from Theorem 1 by adapting the elimination and derandomization arguments [1, Section 2.2] as follows. Let be the matrix with each (i, j)-entry being ij. If PCP yields exact recovery with a certain probability for the random sign model with the parameter 2 , then it also yields exact recovery with at least the same probability for the ﬁxed sign model with locations of non-zero entries sampled using Bernoulli model with the parameter .

We now compare Theorem 2 for robust PCA with non-uniform error corruption to Theorem 1.1 in [1] for robust PCA with uniform error corruption. It is clear that if we set i,j = for all i, j 2 [n], then the two models are the same. It can then be easily checked that conditions

µr/n log n r and s in Theorem 1.1 of [1] implies the conditions in Theorem 2. Thus, Theorem 2 provides a more relaxed condition than Theorem 1.1 in [1]. Such beneﬁt of condition relaxation should be attributed to the new golﬁng scheme introduced in [3, 12], and this paper provides a more reﬁned view of robust PCA by further taking advantage of such a new golﬁng scheme to analyze local conditions.

More importantly, Theorem 2 characterizes relationship between local incoherence parameters and local error corruption probabilities, which implies that different areas of the low-rank matrix have

different levels of ability to resist errors: a more incoherent area (i.e., with smaller µij) can tolerate more errors. Thus, Theorem 2 illustrates the following interesting fact. Whether PCP yields correct recovery depends not only on the total number of errors but also on how errors are distributed. If more errors are distributed to more incoherent areas (i.e, with smaller µij), then more errors in total can be tolerated. However, if errors are distributed in an opposite manner, then only smaller number of errors can be tolerated.

2.3 Implication on Cluster Matrix In this subsection, we further illustrate our result when the low rank matrix is a cluster matrix. Although robust PCA and even more sophisticated approaches have been applied to solve clustering problems, e.g., [13 15], our perspective here is to demonstrate how local incoherence affects entrywise robustness to error corruption, which has not been illustrated in previous studies.

Suppose there are n elements to be clustered. We use a cluster matrix L to represent the clustering relationship of these n elements with Lij = 1 if elements i and j are in the same cluster and Lij = 0 otherwise. Thus, with appropriate ordering of the elements, L is a block diagonal matrix with all diagonal blocks containing all 1 s and off-diagonal blocks containing all 0 s. Hence, the rank r of L equals the number of clusters, which is typically small compared to n. Suppose these entries are corrupted by errors that ﬂip entries from one to zero or from zero to one. This can be thought of as adding a (possibly sparse) error matrix S to L so that the observed matrix is L + S. Then PCP can be applied to recover the cluster matrix L.

We ﬁrst consider an example with clusters having equal size n/r. We set n = 600 and r = 4 (i.e., four equal-size clusters). We apply errors to diagonal-block entries and off-diagonal-block entries respectively with the probabilities d and od. In Fig. 1a, we plot recovery accuracy of PCP for each pairs of ( od, d). It is clear from the ﬁgure that failure occurs for larger od than d, which thus implies that off-diagonal blocks are more robust to errors than diagonal blocks. This can be explained by Theorem 2 as follows. For a cluster matrix with equal cluster size n/r, the local incoherence parameters are given by

µ0ij = 1 for all (i, j), and µ1ij =

r, (i, j) is in diagonal blocks 0, (i, j) is in off-diagonal blocks, and thus

µij = max{µ0ij, µ1ij} =

r, (i, j) is in diagonal blocks 1, (i, j) is in off-diagonal blocks.

Off diagonal block error ρod

Diagonal block error ρd

0 0.1 0.2 0.3 0.4 0.5

(a) Diagonal-block error vs. off-diagonal-block error. n = 600, r = 4 with equal cluster sizes

Cluster1 error ρ1

Cluster2 error ρ2

0 0.1 0.2 0.3 0.4 0.5

(b) Error vulnerability with respect to cluster sizes 500 vs. 100

Figure 1: Error vulnerability on different parts for cluster matrix. In both cases, for each probability pair, we generate 10 trials of independent random error matrices and count the number of successes of PCP. We declare a trial to be successful if the recovered ˆL satisﬁes kˆL Lk F /k Lk F 10 3. Color from white to black represents the number of successful trials changes from 10 to 0.

Based on Theorem 2, it is clear that diagonal-block entries are more locally coherent and hence are more vulnerable to errors, whereas off-diagonal-block entries are more locally incoherent and hence are more robust to errors.

Moreover, this example also demonstrates the necessity of µ1 in the robust PCA problem. [4] showed that µ1 is not necessary for matrix completion and argued informally that µ1 is necessary for robust PCA by connecting the robust PCA problem to hardness of ﬁnding a small clique in a large random graph. Here, the above example provides an evidence for such a fact. In the example, µ0ij are the same over the entire matrix, and hence it is µ1ij that differentiates incoherence between diagonal blocks and off-diagonal blocks, and thus differentiates their robustness to errors.

We then consider the case with two clusters that have different sizes: cluster1 size 500 versus cluster2 size 100. Hence, r = 2. We apply errors to block diagonal entries corresponding to clusters 1 and 2 respectively with the probabilities 1 and 2. In Fig. 1b, we plot the recovery accuracy of PCP for each pair of ( 1, 2). It is clear from the ﬁgure that failure occurs for larger 1 than 2, which thus implies that entries corresponding to the larger cluster are more robust to errors than entries corresponding to smaller clusters. This can be explained by Theorem 2 because the local incoherence of a block diagonal entry is given by µij = n2 r K2 , where K is the corresponding cluster size, and hence the error corruption probability should satisfy 1 2 ij > C0

K log n for correct recovery. Thus, a larger cluster can resist denser errors. This also coincides with the results on graph clustering in [13,16].

2.4 Outline of the Proof of Theorem 1 The proof of Theorem 1 follows the idea established in [1] and further developed in [3, 12]. Our main technical development lies in analysis of non-uniform error corruption based on local incoherence parameters, for which we introduce a new weighted norm lw(1), and establish concentration properties and bounds associated with this norm. As a generalization of matrix inﬁnity norm, lw(1) incorporates both µ0ij and µ1ij, and is hence different from the weighted norms lµ(1) and lµ(1,2) in [9] by its role in the analysis for the robust PCA problem. We next outline the proof here and the detailed proofs are provided in Appendix A.

We ﬁrst introduce some notations. We deﬁne the subspace T := {UX + Y V : X, Y 2 Rn r}, where U, V are left and right singular matrix of L. Then T induces a projection operator PT given by PT (M) = UU M +MV V UU MV V . Moreover, T ?, the complement subspace to T, induces an orthogonal projection operator PT ? with PT ?(M) = (I UU )M(I V V ). We further deﬁne two operators associated with Bernoulli sampling. Let 0 denote a generic subset of [n] [n]. We deﬁne a corresponding projection operator P 0 as P 0(M) = P

ij I{(i,j)2 0}h M, eie

j, where I{ } is the indicator function. If 0 is a random set generated by Bernoulli sampling with P((i, j) 2 0) = tij with 0 < tij 1 for all i, j 2 [n], we further deﬁne a linear operator R 0 as R 0(M) = P

1 tij I{(i,j)2 0}h M, eie

We further note that throughout this paper with high probability means with probability at least 1 cn 10 , where the constant c may be different in various contexts.

Our proof includes two main steps: establishing that existence of a certain dual certiﬁcate is sufﬁcient to guarantee correct recovery and constructing such a dual certiﬁcate. For the ﬁrst step, we establish the following proposition.

Proposition 1. If 1 ij max

, PCP yields a unique solution which agrees

with the correct (L, S) with high probability if there exists a dual certiﬁcate Y obeying

P Y = 0, (9)

k PT ?(λ sgn(S) + Y )k 1

k PT (Y + λ sgn(S) UV )k F λ

where λ = 1 32pn log n.

The proof of the above proposition adapts the idea in [1,12] for uniform errors to non-uniform errors. In particular, the proof exploits the properties of R associated with non-uniform errors, which are presented as Lemma 1 (established in [9]) and Lemma 2 in Appendix A.1.

Proposition 1 suggests that it sufﬁces to prove Theorem 1 if we ﬁnd a dual certiﬁcate Y that satisﬁes the dual certiﬁcate conditions (9)-(12). Thus, the second step is to construct Y via the golﬁng scheme. Although we adapt the steps in [12] to construct the dual certiﬁcate Y , our analysis requires new technical development based on local incoherence parameters. Recall the following deﬁnitions in Section 2.1: P((i, j) 2 ) = ij and P((i, j) 2 Γ) = pij, where Γ = c and pij = 1 ij.

Consider the golﬁng scheme with nonuniform sizes as suggested in [12] to establish bounds with fewer log factors. Let Γ = Γ1 [ Γ2 [ [ Γl, where {Γk} are independent random sets given by

P((i, j) 2 Γ1) = pij

6 , P((i, j) 2 Γk) = qij, for k = 2, , l.

Thus, if ij = (1 pij

6 )(1 qij)l 1, the two sampling strategies are equivalent. Due to the overlap between {Γk}, we have qij 5

pij l 1. We set l = b5 log n + 1c and construct a dual certiﬁcate Y in the following iterative way:

Z0 = PT (UV λ sgn(S)) (13) Zk = (PT PT RΓk PT )Zk 1, for k = 1, , l (14)

RΓk Zk 1. (15)

It is then sufﬁcient to show that such constructed Y satisﬁes the dual certiﬁcate conditions (9)-(12). Condition (9) is due to the construction of Y . Condition (12) can be shown by a concentration property of each iteration step (14) with k k F characterized in Lemma 3 in Appendix A.1. In order to show that Y satisﬁes conditions (10) and (11), we introduce the following weighted norm. Let

n2 and wij = max{ ˆwij, }, where is the smallest nonzero ˆwij. Here is introduced to avoid singularity. Then for any matrix Z, deﬁne

k Zkw(1) = max

It is easy to verify k kw(1) is a well deﬁned norm. We can then show that each iteration step (14) with k k and k kw(1) norms satisﬁes two concentration properties characterized respectively in Lemmas 4 and 5, which are essential to prove conditions (10) and (11).

3 Numerical Experiments

In this section, we provide numerical experiments to demonstrate our theoretical results. In these experiments, we adopt an augmented Lagrange multiplier algorithm in [17] to solve the PCP. We set λ = 1/pn log n. A trial of PCP (for a given realization of error locations) is declared to be successful if ˆL recovered by PCP satisﬁes kˆL Lk F /k Lk F 10 3.

We apply the following three models to construct the low rank matrix L.

Bernoulli model: L = XX where X is n r matrix with entries independently taking values

+1/pn and 1/pn equally likely. Gaussian model: L = XX , where X is n r matrix with entries independently sampled

from Gaussian distribution N(0, 1/n). Cluster model: L is a block diagonal matrix with r equal-size blocks containing all 1 s. In order to demonstrate that the local incoherence parameter affects local robustness to error corruptions, we study the following two types of error corruption models.

Uniform error corruption: sgn(Sij) is generated as (6) with ij = for all i, j 2 [n], and

S = sgn(S). Adaptive error corruption: sgn(Sij) is generated as (6) with ij =

1/µij for all i, j 2

[n], and S = sgn(S). It is clear in both cases, the error matrix has the same average error corruption percentage , but in adaptive error corruption, the local error corruption probability is adaptive to the local incoherence.

Our ﬁrst experiment demonstrates that robustness of PCP to error corruption not only depends on the number of errors but also depends on how errors are distributed over the matrix. For all three

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0

Error percentage ρ

Failure frequency

uniform error adaptive error

(a) Bernoulli model

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0

Error percentage ρ

Failure frequency

uniform error adaptive error

(b) Gaussian model

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0

Error percentage ρ

Failure frequency

uniform noise adaptive noise

(c) Cluster model Figure 2: Recovery failure of PCP versus error corruption percentage.

low rank matrix models, we set n = 1200 and rank r = 10. For each low rank matrix model, we apply the uniform and adaptive error matrices, and plot the failure frequency of PCP versus the error corruption percentage in Fig. 2. For each value of , we perform 50 trials of independent error corruption and count the number of failures of PCP. Each plot of Fig. 2 compares robustness of PCP to uniform error corruption (the red square line) and adaptive error corruption (the blue circle line). We observe that PCP can tolerate more errors in the adaptive case. This is because the adaptive error matrix is distributed based on the local incoherence parameter, where error density is higher in areas where matrices can tolerate more errors. Furthermore, comparison among the three plots in Fig. 2 illustrates that the gap between uniform and adaptive error matrices is the smallest for Bernoulli model and the largest for cluster model. Our theoretic results suggest that the gap is due to the variation of the local incoherence parameter across the matrix, which can be measured by the variance of µij. Larger variance of µij should yield larger gap. Our numerical calculation of the variances for three models yield Var(µBernoulli) = 1.2109, Var(µGaussian) = 2.1678, and Var(µcluster) = 7.29, which conﬁrms our explanation.

0 20 40 60 80 100 0

Error percentage ρ

uniform error adaptive error

(a) Bernoulli model

0 10 20 30 40 50 60 0

Error percentage ρ

uniform error adative error

(b) Gaussian model

2 4 6 8 10 12 14 0

Error percentage ρ

uniform error adaptive error

(c) Cluster model Figure 3: Largest allowable error corruption percentage versus rank of L so that PCP yields correct recovery.

We next study the phase transition in rank and error corruption probability. For the three low-rank matrix models, we set n = 1200. In Fig. 3, we plot the error corruption percentage versus the rank of L for both uniform and adaptive error corruption models. Each point on the curve records the maximum allowable error corruption percentage under the corresponding rank such that PCP yields correction recovery. We count a (r, ) pair to be successful if nine trials out of ten are successful. We ﬁrst observe that in each plot of Fig. 3, PCP is more robust in adaptive error corruption due to the same reason explained above. We further observe that the gap between the uniform and adaptive error corruption changes as the rank changes. In the low-rank regime, the gap is largely determined by the variance of incoherence parameter µij as we argued before. As the rank increases, the gap is more dominated by the rank and less affected by the local incoherence. Eventually for large enough rank, no error can be tolerated no matter how errors are distributed.

4 Conclusion

We characterize reﬁned conditions under which PCP succeeds to solve the robust PCA problem. Our result shows that the ability of PCP to correctly recover a low-rank matrix from errors is related not only to the total number of corrupted entries but also to locations of corrupted entries, more essentially to the local incoherence of the low rank matrix. Such result is well supported by our numerical experiments. Moreover, our result has rich implication when the low rank matrix is a cluster matrix, and our result coincides with state-of-the-art studies on clustering problems via low rank cluster matrix. Our result may motivate the development of weighted PCP to improve recovery performance similar to the weighted algorithms developed for matrix completion in [9,18].

[1] E. J. Cand es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM

(JACM), 58(3):11, 2011.

[2] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix

decomposition. SIAM Journal on Optimization, 21(2):572 596, 2011.

[3] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures.

IEEE Transactions on Information Theory, 59(7):4324 4337, 2013.

[4] Y. Chen. Incoherence-optimal matrix completion. IEEE Transactions on Information Theory, 61(5):2909 2923, May 2015.

[5] E. J. Cand es and B. Recht. Exact matrix completion via convex optimization. Foundations of Computa-

tional mathematics, 9(6):717 772, 2009.

[6] E. J. Cand es and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans-

actions on Information Theory, 56(5):2053 2080, 2010.

[7] D. Gross. Recovering low-rank matrices from few coefﬁcients in any basis. IEEE Transactions on Information Theory, 57(3):1548 1566, 2011.

[8] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via

nuclear norm minimization. SIAM Review, 52(3):471 501, 2010.

[9] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward. Completing any low-rank matrix, provably. ar Xiv

preprint ar Xiv:1306.2979, 2013.

[10] D. Hsu, S. M. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions. IEEE

Transactions on Information Theory, 57(11):7221 7234, 2011.

[11] A. Ganesh, J. Wright, X. Li, E. J. Candes, and Y. Ma. Dense error correction for low-rank matrices

via principal component pursuit. In IEEE International Symposium on Information Theory (ISIT), pages 1513 1517, Austin, TX, US, June 2010.

[12] X. Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive

Approximation, 37(1):73 99, 2013.

[13] S. Oymak and B. Hassibi. Finding dense clusters via low rank+ sparse decomposition. ar Xiv preprint

ar Xiv:1104.5186, 2011.

[14] Y. Chen, S. Sanghavi, and H. Xu. Clustering sparse graphs. In Advances in Neural Information Processing

Systems (NIPS), pages 2204 2212, Lake Tahoe, Nevada, US, December 2012.

[15] Y. Chen, S. Sanghavi, and H. Xu. Improved graph clustering. IEEE Transactions on Information Theory,

60(10):6440 6455, Oct 2014.

[16] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization.

Journal of Machine Learning Research, 15(1):2213 2238, 2014.

[17] Z. Lin, M. Chen, and Y. Ma. The augmented lagrange multiplier method for exact recovery of corrupted

low-rank matrices. ar Xiv preprint ar Xiv:1009.5055, 2010.

[18] N. Srebro and R. R. Salakhutdinov. Collaborative ﬁltering in a non-uniform world: Learning with the

weighted trace norm. In Advances in Neural Information Processing Systems (NIPS), pages 2056 2064, Hyatt Regency, Vancouver, Canada, 2010. December.

[19] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. ar Xiv preprint ar Xiv:1011.3027, 2010.

[20] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational

Mathematics, 12(4):389 434, 2012.