# Refined Complexity of PCA with Outliers

Fedor Fomin\*¹, Petr Golovach\*¹, Fahad Panolan\*¹, Kirill Simonov\*¹

\*Equal contribution. ¹Department of Informatics, University of Bergen, Norway. Correspondence to: Fedor Fomin, Petr Golovach, Fahad Panolan, Kirill Simonov.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

## Abstract

Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is a basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about PCA with outliers. For a set of n points in $\mathbb{R}^d$, how does one learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown r-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result with a lower bound: unless the Exponential Time Hypothesis fails, it is impossible, in time $f(d)\cdot n^{o(d)}$ for any function f of d, not only to solve the problem exactly but even to approximate it within a constant factor.

## 1. Introduction

Problem statement and motivation. Classical principal component analysis (PCA) is one of the most popular and successful techniques used for dimension reduction in data analysis and machine learning (Pearson, 1901; Hotelling, 1933; Eckart & Young, 1936). In PCA one seeks the best low-rank approximation of a data matrix M by solving

$$\text{minimize } \|M - L\|_F^2 \quad \text{subject to } \operatorname{rank}(L) \le r.$$

Here $\|A\|_F^2 = \sum_{i,j} a_{ij}^2$ is the square of the Frobenius norm of matrix A. By the Eckart-Young theorem (Eckart & Young, 1936), PCA is efficiently solvable via Singular Value Decomposition (SVD). PCA is used as a preprocessing step in a great variety of modern applications, including face recognition, data classification, and analysis of social networks.

In this paper we consider a variant of PCA with outliers, where we wish to recover a low-rank matrix from large but sparse errors. Suppose that we have n points (observations) in d-dimensional space. We know that a part of the points are arbitrarily located (say, produced by corrupted observations), while the remaining points are close to an r-dimensional true subspace. We do not have any information about the true subspace or about the corrupted observations. Our task is to learn the true subspace and to identify the outliers. As is common practice, we collect the points into an n × d matrix M; thus each row of M is a point and the columns of M are the coordinates. However, it is very likely that PCA on M will not reveal any reasonable information about the non-corrupted observations: a well-documented drawback of PCA is its vulnerability to even a very small number of outliers; an example is shown in Figure 1. The matrix formulation suggests the following interpretation: we seek a low-rank matrix L that, with the exception of a few rows, approximates M best.

[Figure 1: An illustration of how outliers impact PCA. The optimal approximation line (dashed) of the given set of points without the evident outlier shows the linear structure of the dataset. However, when the outlier is present, the principal component (solid) changes drastically.]
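The phenomenon in Figure 1 is easy to reproduce. The following minimal numpy sketch (our own illustration; the data and values are made up) fits the first principal component of an essentially one-dimensional point set with and without a single corrupted observation:

```python
# A minimal sketch of the Figure 1 phenomenon: one outlier can
# drastically rotate the first principal component.
import numpy as np

rng = np.random.default_rng(0)

# 50 points close to the line y = x (an essentially 1-dimensional set).
t = rng.uniform(-1, 1, size=50)
points = np.column_stack([t, t + 0.05 * rng.standard_normal(50)])

def first_pc(X):
    """First principal component: top right singular vector of the
    column-centered data matrix (rows are points)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

print("PC without outlier:", first_pc(points))  # roughly +-(0.71, 0.71)
outlier = np.array([[30.0, -30.0]])             # one corrupted observation
# With the outlier, the component swings toward the (1, -1) direction.
print("PC with outlier:   ", first_pc(np.vstack([points, outlier])))
```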
PCA WITH OUTLIERS
Input: A data matrix $M \in \mathbb{R}^{n \times d}$ and integer parameters r and k.
Task: Minimize $\|M - L - S\|_F^2$ subject to $L, S \in \mathbb{R}^{n \times d}$, $\operatorname{rank}(L) \le r$, and S having at most k non-zero rows.

The geometric interpretation of PCA WITH OUTLIERS is very natural: given n points in $\mathbb{R}^d$, we seek a set of k points whose removal leaves the remaining n − k points as close as possible to some r-dimensional subspace.
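Stated as code, the constraints and the objective are simple to check. The following numpy sketch (our own illustration; the function names are not from the paper) verifies feasibility of a candidate pair (L, S) and evaluates the objective:

```python
import numpy as np

def is_feasible(L, S, r, k, tol=1e-9):
    """Constraints of PCA WITH OUTLIERS: rank(L) <= r and S has at
    most k non-zero rows (up to numerical tolerance)."""
    rank_ok = np.linalg.matrix_rank(L, tol=tol) <= r
    nonzero_rows = np.count_nonzero(np.abs(S).sum(axis=1) > tol)
    return rank_ok and nonzero_rows <= k

def objective(M, L, S):
    """The objective: squared Frobenius norm ||M - L - S||_F^2."""
    return float(np.linalg.norm(M - L - S, "fro") ** 2)
```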
Related work. PCA WITH OUTLIERS belongs to the large class of extensively studied robust PCA problems; see, e.g., (Vaswani & Narayanamurthy, 2018; Xu et al., 2010; Bouwmans et al., 2016). In the robust PCA setting we observe a noisy version M of a data matrix L whose principal components we have to discover. In the case when M is a slightly disturbed version of L, PCA performed on M provides a reasonable approximation for L. However, when M is a very noisy version of L, say corrupted by a few outliers, even one corrupted observation can arbitrarily alter the quality of the approximation.

One of the approaches to robust PCA, which is relevant to our work, is to model outliers as an additive sparse matrix. Thus we have a data matrix M which is the superposition of a low-rank component L and a sparse component S; that is, M = L + S. This approach became popular after the works of Candès et al. (Candès et al., 2011), Wright et al. (Wright et al., 2009), and Chandrasekaran et al. (Chandrasekaran et al., 2011). A significant body of work on the robust PCA problem has been centered around proving that, under some feasibility assumptions on M, L, and S, a solution to

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{subject to } M = L + S, \tag{1}$$

where $\|S\|_0$ denotes the number of non-zero entries of matrix S and λ is a regularizing parameter, recovers matrix L uniquely. While optimization problem (1) is NP-hard (Gillis & Vavasis, 2018), it is possible to show that under certain assumptions on L and S, its convex relaxation can recover these matrices efficiently.

A problem strongly related to (1) was studied in computational complexity under the name MATRIX RIGIDITY (Grigoriev, 1976; 1980; Valiant, 1977). Here, for a given matrix M and integers r and k, the task is to decide whether at most k entries of M can be changed so that the rank of the resulting matrix is at most r. Equivalently, this is the problem of deciding whether a given matrix can be written as M = L + S, where $\operatorname{rank}(L) \le r$ and $\|S\|_0 \le k$. Fomin et al. (Fomin et al., 2018) gave an algorithm solving MATRIX RIGIDITY in time $2^{O(rk\log(rk))}\cdot(nd)^{O(1)}$. On the other hand, they showed that the problem is W[1]-hard parameterized by k. In particular, this implies that an algorithm with running time $f(k)\cdot(rnd)^{O(1)}$ for this problem is highly unlikely for any function f of k only.

A natural extension of the robust PCA approach (1) is its noisy version: given M = L + S + N, where L, S, and N are unknown, but L is known to be of low rank, S is known to have few non-zero rows, and the noise matrix N is of small Frobenius norm, recover L. Wright et al. (Wright et al., 2009) studied the following model of noisy robust PCA:

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{subject to } \|M - L - S\|_F^2 \le \varepsilon. \tag{2}$$

Thus (2) models situations where we want to learn the principal components of n points in d-dimensional space under the assumption that a small number of coordinates is corrupted.

The study of the natural, and seemingly more difficult, extension of (1) to PCA with outliers was initiated by Xu et al. (Xu et al., 2010), who introduced the following idealization of the problem:

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_{0,r} \quad \text{subject to } M = L + S. \tag{3}$$

Here $\|S\|_{0,r}$ denotes the number of non-zero rows of matrix S and λ is a regularizing parameter. Xu et al. (Xu et al., 2010) approached this problem by building its convex surrogate and applying an efficient convex-optimization-based algorithm to the surrogate. Chen et al. (Chen et al., 2011) studied a variant of the problem with partially observed data. Just as (2) is the noisy version of the robust PCA model (1), the PCA WITH OUTLIERS problem studied in our work can be seen as a noisy version of (3).

Our results. Even though PCA WITH OUTLIERS was assumed to be NP-hard, to the best of our knowledge this had never been studied formally. While NP-hardness is a serious strike against the tractability of a problem, it only says that in the worst case the problem is not tractable. Since the complexity of the problem could be governed by several parameters, like the rank r of L, the number of outliers k, or the dimension d of M, it is natural to ask how these parameters influence the complexity of the problem. For example, when k is a small constant, we can guess which points are outliers and run PCA on the remaining points. This brings us to $n^k$ calls of PCA, which is polynomial for constant k and exponential when k is a fraction of n.

In this paper we give an algorithm solving PCA WITH OUTLIERS in time roughly $|M|^{O(d^2)}$, where |M| is the size of the input matrix M. Thus for fixed dimension d, the problem is solvable in polynomial time. The algorithm works in polynomial time for any number of outliers k and any rank r of the recovered matrix L. Our algorithm strongly relies on tools developed in computational algebraic geometry, in particular for handling arrangements of algebraic surfaces in $\mathbb{R}^d$ defined by polynomials of bounded degree.

We complement our algorithmic result by a complexity lower bound. Our lower bound not only implies that the problem is NP-hard when the dimension d is part of the input, it also rules out the possibility of a certain type of algorithm for PCA WITH OUTLIERS. More precisely, assuming the Exponential Time Hypothesis (ETH) of Impagliazzo, Paturi, and Zane (Impagliazzo et al., 2001), namely that 3-SAT with n variables is not solvable in time $2^{o(n)}$, we show that for any constant $\omega \ge 1$, PCA WITH OUTLIERS cannot be ω-approximated in time $f(d)\cdot|M|^{o(d)}$ for any function f of d only.

Our algorithm is, foremost, of theoretical interest, especially in the presence of the nearly matching lower bound showing that doing something essentially better is next to impossible. In practice, PCA is often applied to reduce high-dimensional datasets, and for this task a running time exponential in d is not practical. However, there are cases where such an algorithm could be useful. One example is the visualization of low-dimensional data, where the number of dimensions, even if it is already small, needs to be lowered to two to actually draw the dataset. Another example is when we suspect a small subset of features to be highly correlated and want to reduce them to one dimension in order to get rid of the redundancy in the data. This potential application is well illustrated by the popular PCA tutorial (Shlens, 2014), where the essentially one-dimensional movement of a spring-mass system is captured by three cameras, resulting in 6 features.
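The $n^k$ brute force mentioned above is straightforward to write down. The following numpy sketch (our own illustration, not code from the paper; all names are ours) enumerates every candidate set of k outliers and solves classical PCA, via the Eckart-Young theorem, on the remaining rows:

```python
from itertools import combinations
import numpy as np

def pca_cost(X, r):
    """Optimal cost of a rank-r approximation of X: by the Eckart-Young
    theorem, the sum of squared singular values beyond the top r."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[r:] ** 2))

def brute_force_pca_with_outliers(M, r, k):
    """Try every candidate outlier set of size k: C(n, k) = O(n^k)
    classical PCA calls, practical only for very small k."""
    n = M.shape[0]
    best_cost, best_out = np.inf, None
    for out in combinations(range(n), k):
        keep = np.delete(np.arange(n), out)
        cost = pca_cost(M[keep], r)
        if cost < best_cost:
            best_cost, best_out = cost, out
    return best_cost, best_out
```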
## 2. Polynomial algorithm for bounded dimension

### 2.1. Preliminaries

As a subroutine in our algorithm we use a standard result about sampling points from the cells of an arrangement of algebraic surfaces, so we first state some definitions and an algorithm from (Basu et al., 2006).

We denote the ring of polynomials in variables $X_1,\ldots,X_d$ with coefficients in $\mathbb{R}$ by $\mathbb{R}[X_1,\ldots,X_d]$. By saying that an algebraic set V in $\mathbb{R}^d$ is defined by $Q \in \mathbb{R}[X_1,\ldots,X_d]$, we mean that $V = \{x \in \mathbb{R}^d \mid Q(x_1,\ldots,x_d) = 0\}$. For a set of s polynomials $\mathcal{P} = \{P_1,\ldots,P_s\} \subset \mathbb{R}[X_1,\ldots,X_d]$, a sign condition is specified by a sign vector $\sigma \in \{-1,0,+1\}^s$. The sign condition σ is non-empty over V with respect to $\mathcal{P}$ if there is a point $x \in V$ such that $\sigma = (\operatorname{sign}(P_1(x)),\ldots,\operatorname{sign}(P_s(x)))$, where sign is the sign function on real numbers:

$$\operatorname{sign}(x) = \begin{cases} +1 & \text{if } x > 0,\\ 0 & \text{if } x = 0,\\ -1 & \text{if } x < 0. \end{cases}$$

The realization space of $\sigma \in \{-1,0,+1\}^s$ over V is the set

$$\mathcal{R}(\sigma) = \{x \in V \mid \sigma = (\operatorname{sign}(P_1(x)),\ldots,\operatorname{sign}(P_s(x)))\}.$$

If $\mathcal{R}(\sigma)$ is not empty, then each of its non-empty semi-algebraically connected components (which is equivalent to just connected on semi-algebraic sets, as proven in (Basu et al., 2006), Theorem 5.22) is a cell of $\mathcal{P}$ over V.

For an algebraic set W, its real dimension is the maximal integer d′ such that there is a homeomorphism of $[0,1]^{d'}$ into W. Naturally, if $W \subseteq \mathbb{R}^d$, then $d' \le d$. The following theorem from (Basu et al., 2006) gives an algorithm to compute a point in each cell of $\mathcal{P}$ over V.

Proposition 1 ((Basu et al., 2006), Theorem 13.22). Let V be an algebraic set in $\mathbb{R}^d$ of real dimension d′ defined by $Q(X_1,\ldots,X_d) = 0$, where Q is a polynomial in $\mathbb{R}[X_1,\ldots,X_d]$ of degree at most b, and let $\mathcal{P} \subset \mathbb{R}[X_1,\ldots,X_d]$ be a finite set of s polynomials with each $P \in \mathcal{P}$ also of degree at most b. Let D be the ring generated by the coefficients of Q and of the polynomials in $\mathcal{P}$. There is an algorithm which takes as input Q, d′, and $\mathcal{P}$ and computes a set of points meeting every non-empty cell of V over $\mathcal{P}$. The algorithm uses at most $s^{d'} b^{O(d)}$ arithmetic operations in D.

On the practical side, we note that a number of routines from (Basu et al., 2006) are implemented in the SARAG library (Caruso, 2006).

### 2.2. Algorithm

First, we recall the folklore observation that, geometrically, the low-rank approximation matrix L is the orthogonal projection of the rows of M onto some r-dimensional subspace of $\mathbb{R}^d$. For a proof see, e.g., (Blum et al., 2017).

Proposition 2. Given a matrix $M \in \mathbb{R}^{n\times d}$ with rows $m_1,\ldots,m_n$, the task of finding a matrix L of rank at most r which minimizes $\|M-L\|_F^2$ is equivalent to finding an r-dimensional subspace of $\mathbb{R}^d$ which minimizes the total squared distance from the rows of M, treated as points in $\mathbb{R}^d$:

$$\min_{\substack{L\in\mathbb{R}^{n\times d}\\ \operatorname{rank} L \le r}} \|M-L\|_F^2 \;=\; \min_{\substack{U\subseteq\mathbb{R}^d\\ U \text{ is a linear subspace of dim } r}} \sum_{i=1}^{n}\|m_i - \operatorname{proj}_U m_i\|^2,$$

where $\operatorname{proj}_U x$ is the orthogonal projection of x onto U for $x \in \mathbb{R}^d$.

By Proposition 2, if we fix an r-dimensional subspace U containing the span of the rows of L, then the outliers are automatically defined as the k farthest points from U among $\{m_i\}_{i=1}^n$.
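Concretely, the fixed-subspace step looks as follows; this is our own numpy sketch (the function name is ours), not code from the paper:

```python
import numpy as np

def fixed_subspace_step(M, U_basis, k):
    """Fixed-subspace step: U is given by an orthonormal basis (the r
    rows of U_basis). The k rows of M farthest from U become the
    non-zero rows of S_U; the remaining rows are orthogonally
    projected onto U to form L_U."""
    P = U_basis.T @ U_basis                      # d x d projector onto U
    proj = M @ P                                 # proj_U m_i, row-wise
    residuals = np.sum((M - proj) ** 2, axis=1)  # ||m_i - proj_U m_i||^2
    order = np.argsort(residuals)
    keep, out = order[:len(M) - k], order[len(M) - k:]
    S = np.zeros_like(M); S[out] = M[out]        # outlier rows of M
    L = np.zeros_like(M); L[keep] = proj[keep]   # projections of the rest
    return L, S, float(residuals[keep].sum())    # cost = ||M - L - S||_F^2
```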
In the next proposition we give a precise statement of this.

Proposition 3. The optimization objective of PCA WITH OUTLIERS for a given matrix $M \in \mathbb{R}^{n\times d}$ with rows $m_1,\ldots,m_n$ can be equivalently redefined as follows:

$$\min_{\substack{L,S \in \mathbb{R}^{n\times d}\\ \operatorname{rank} L \le r\\ S \text{ has at most } k \text{ non-zero rows}}} \|M-L-S\|_F^2 \;=\; \min_{\substack{U\subseteq\mathbb{R}^d\\ U \text{ is a linear subspace of dim } r}} \|M-L_U-S_U\|_F^2,$$

where the k non-zero rows of $S_U$ are the k rows of M with the largest value of $\|m_i - \operatorname{proj}_U m_i\|^2$, and $L_U$ is the orthogonal projection of the rows of $(M - S_U)$ onto U. Explicitly,

$$S_U = \begin{pmatrix} m_1\\ \vdots\\ m_k\\ 0\\ \vdots\\ 0 \end{pmatrix}, \qquad L_U = \begin{pmatrix} 0\\ \vdots\\ 0\\ \operatorname{proj}_U m_{k+1}\\ \vdots\\ \operatorname{proj}_U m_n \end{pmatrix},$$

assuming that the rows of M are ordered by descending $\|m_i - \operatorname{proj}_U m_i\|^2$.

So for a fixed U we may determine $S_U$ easily and then solve classical PCA for the matrix $(M - S_U)$. The intuition behind our algorithm is that the set of the k farthest points is the same for many subspaces, and solving PCA for $(M - S_U)$ treats all these subspaces at once. The crucial point is to bound the number of different matrices $S_U$ we have to consider. There is of course a trivial bound of $n^k$, since $S_U$ is always obtained by choosing k rows of M. But the number of different $S_U$ is also geometrically limited, and by exploiting this we are able to obtain another bound of $n^{O(d^2)}$, resulting in the following theorem.

Theorem 1. Solving PCA WITH OUTLIERS is reducible to solving $n^{2\min(rd,\,(d-r)d)}\cdot 2^{O(d)} = n^{O(d^2)}$ instances of PCA. This reduction can be computed in a number of operations over $\mathbb{R}$ bounded by the expression above.

First, a note about the statement of Theorem 1. Our algorithm relies on solving classical PCA, and since only iterative algorithms for PCA and SVD exist, we cannot claim that our algorithm solves PCA WITH OUTLIERS in some fixed number of operations. However, if we are only interested in solving the problem up to some constant precision, for example machine epsilon, then PCA is solvable in a polynomial number of operations, and so by Theorem 1, PCA WITH OUTLIERS is solvable in $n^{O(d^2)}$ operations.
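Before the proof, here is a high-level sketch of the whole algorithm behind Theorem 1 (our own illustration, not the paper's pseudocode). The helper `enumerate_candidate_subspaces` is a hypothetical placeholder for the cell-sampling machinery of Proposition 1 applied to the arrangement constructed in the proof that follows; each candidate subspace it yields is then processed exactly as in Proposition 3:

```python
import numpy as np

def solve_pca_with_outliers(M, r, k, enumerate_candidate_subspaces):
    """Reduce PCA WITH OUTLIERS to n^{O(d^2)} classical PCA instances:
    for each candidate subspace U, fix the outliers S_U (the k farthest
    rows of M) and solve classical PCA on M - S_U via SVD."""
    n = M.shape[0]
    best = (np.inf, None, None)
    # Placeholder: one orthonormal r x d basis per cell of the
    # arrangement (Proposition 1), n^{O(d^2)} candidates in total.
    for U_basis in enumerate_candidate_subspaces(M, r, k):
        P = U_basis.T @ U_basis
        residuals = np.sum((M - M @ P) ** 2, axis=1)
        out = np.argsort(residuals)[n - k:]          # k farthest rows
        S = np.zeros_like(M); S[out] = M[out]        # candidate S_U
        # Classical PCA (Eckart-Young via SVD) on M - S_U.
        Um, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (Um[:, :r] * s[:r]) @ Vt[:r]             # best rank-r fit
        cost = float(np.sum(s[r:] ** 2))             # ||M - L - S||_F^2
        if cost < best[0]:
            best = (cost, L, S)
    return best
```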
Proof of Theorem 1. We start by associating r-dimensional subspaces of $\mathbb{R}^d$ with points of a certain algebraic set. Consider the matrix space $\mathbb{R}^{(d-r)\times d}$, and for an element $V \in \mathbb{R}^{(d-r)\times d}$, $V = \{v_{ij}\}_{i,j}$, the following polynomial conditions:

$$Q^O_{i,j}(V) := \sum_{l=1}^{d} v_{il}v_{jl} = 0, \quad \text{for } 1 \le i < j \le (d-r),$$

$$Q^N_i(V) := \sum_{l=1}^{d} v_{il}^2 - 1 = 0, \quad \text{for } 1 \le i \le (d-r),$$

where condition $Q^O_{i,j}(V) = 0$ requires rows i, j of V to be pairwise orthogonal and condition $Q^N_i(V) = 0$ requires row i of V to have length 1. We may write all these conditions as a single polynomial condition Q(V) = 0 by taking the sum of squares:

$$Q(V) := \sum_{1 \le i < j \le d-r} Q^O_{i,j}(V)^2 + \sum_{i=1}^{d-r} Q^N_i(V)^2.$$

[...]

... $c \gg a \gg D$, and this is crucial for our reduction. The main property of the constructed instance of PCA WITH OUTLIERS is stated in the following claim.

Claim 1. If $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, then $\mathrm{Opt}(M, r, k) \le D$; and if there is a feasible solution (L, S) for (M, r, k) with $\|M - L - S\|_F^2 \le \omega D$, then $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE.

Suppose that $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, that is, there is a clique X of G such that $|X \cap V_i| = 1$ for $i \in \{1,\ldots,r\}$. Let $X = \{v^1_{i_1},\ldots,v^r_{i_r}\}$ and $R = \{v^s_{i_s}v^t_{i_t} \mid 1 \le s < t \le r\}$, that is, $R = E(G[X])$. We show that $\mathrm{Opt}(M, r, k) \le D$. For this, we construct a feasible solution (L, S) such that $\|M - L - S\|_F^2 \le D$.

We define the $m \times r$ matrices P′ and Q′ by setting those rows of P and Q that correspond to $e \in E(G)\setminus R$ to zero and keeping the other rows as in P and Q; that is, for $e \in R$ the rows of P′ and Q′ are $p_e$ and $q_e$ respectively, and the other rows are zero rows. We define an $r \times r$ matrix B as

$$B = \begin{pmatrix} a_{i_1} & 0 & \cdots & 0\\ 0 & a_{i_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{i_r} \end{pmatrix},$$

that is, the diagonal elements of B are $a_{i_1},\ldots,a_{i_r}$ and the other elements are zeros. Denote by $b_1,\ldots,b_r$ the rows of B. We construct m copies $B_1,\ldots,B_m$ of B and the $(r+1)m \times 2r$ matrix L, using P′, Q′ and $B_i$, $A'_i$ for $i \in \{1,\ldots,m\}$ as blocks:

$$L = \begin{pmatrix} B_1 & A'_1\\ \vdots & \vdots\\ B_m & A'_m\\ P' & Q' \end{pmatrix}. \tag{7}$$

It is straightforward to verify that $\operatorname{rank}(L) \le r$. Indeed, the rank of each submatrix $(B_i \mid A'_i)$ is r, as only the diagonal elements of $B_i$ and $A'_i$ are non-zero. Also, for each $e = v^s_{i_s}v^t_{i_t} \in R$ we have $p_e = c(b_{i_s} + b_{i_t})$ and $q_e = c(a_{i_s} + a_{i_t})$, i.e., each row of L indexed by $e \in R$ is a linear combination of two rows of $(B_1 \mid A'_1)$.

Let 0 be the $r \times r$ zero matrix. We construct 2m copies $0_1,\ldots,0_m$ and $0'_1,\ldots,0'_m$ of 0 and define

$$S = \begin{pmatrix} 0_1 & 0'_1\\ \vdots & \vdots\\ 0_m & 0'_m\\ P - P' & Q - Q' \end{pmatrix}. \tag{8}$$

Clearly, this matrix has at most k non-zero rows, namely those indexed by $e \in E(G)\setminus R$. Combining (6)–(8), we have

$$M - L - S = \begin{pmatrix} A_1 & A'_1\\ \vdots & \vdots\\ A_m & A'_m\\ P & Q \end{pmatrix} - \begin{pmatrix} B_1 & A'_1\\ \vdots & \vdots\\ B_m & A'_m\\ P' & Q' \end{pmatrix} - \begin{pmatrix} 0_1 & 0'_1\\ \vdots & \vdots\\ 0_m & 0'_m\\ P-P' & Q-Q' \end{pmatrix} = \begin{pmatrix} A_1 - B_1 & \bar 0\\ \vdots & \vdots\\ A_m - B_m & \bar 0\\ \bar 0 & \bar 0 \end{pmatrix},$$

where $\bar 0$ stands for a zero block (in the bottom row, the $m \times r$ zero matrix), and

$$\|M - S - L\|_F^2 = m\|A - B\|_F^2 = m(i_1^2 + \cdots + i_r^2).$$

We conclude that (L, S) is a feasible solution for the considered instance (M, r, k) of PCA WITH OUTLIERS with $\|M - S - L\|_F^2 \le D$. Therefore, $\mathrm{Opt}(M, r, k) \le D$.

Suppose now that (L, S) is a feasible solution for the instance (M, r, k) of PCA WITH OUTLIERS with $\|M - S - L\|_F^2 \le \omega D = D'$. We prove that $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE. Recall that S has at most $k = m - \binom{r}{2}$ non-zero rows. Hence, there is a set $R \subseteq E(G)$ with $|R| = m - k = \binom{r}{2}$ such that the rows of S indexed by $e \in R$ are zero rows. We claim that the edges of R form the edge set of a complete graph. More formally, we show the following.

Claim 2. There are $i_1,\ldots,i_r \in \{1,\ldots,n\}$ such that $R = \{v^s_{i_s}v^t_{i_t} \mid 1 \le s < t \le r\}$.

The proof of Claim 2 exploits the property that $c \gg a \gg D'$. It demands a lot of technicalities and is therefore omitted.

To complete the proof of Theorem 2, recall that M is an $(r+1)m \times 2r$ integer matrix and the absolute value of each element is at most $c = O(r^2 m n^2)$. Therefore, the bitsize N of M is $O(|V(G)|^4 \log|V(G)|)$. Observe that, given $(G, V_1,\ldots,V_r)$, M can be constructed in polynomial time. Assume that there is an ω-approximation algorithm A for PCA WITH OUTLIERS with running time $f(d)\cdot N^{o(d)}$ for a computable function f. If $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, then $\mathrm{Opt}(M, r, k) \le D$. Therefore, A applied to (M, r, k) reports that there is a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega\,\mathrm{Opt}(M, r, k) \le \omega D$ by Claim 1. For the opposite direction, if A reports that there is a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega D$, then $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE by Claim 1. Hence, A reports the existence of a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega D$ if and only if $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE. Since $N = O(|V(G)|^4 \log|V(G)|)$ and $d = 2r$, we obtain that A solves MULTICOLORED CLIQUE in time $f(2r)\cdot|V(G)|^{o(r)}$, contradicting ETH.

As MULTICOLORED CLIQUE is well known to be W[1]-hard (see (Fellows et al., 2009; Cygan et al., 2015)), our reduction gives the following corollary based on the weaker conjecture that FPT ≠ W[1]. We refer to the book (Cygan et al., 2015) for the formal definitions of the parameterized complexity classes FPT and W[1]. Note that ETH implies FPT ≠ W[1], but not the other way around.

Corollary 1. For any $\omega \ge 1$, there is no ω-approximation algorithm for PCA WITH OUTLIERS with running time $f(d)\cdot N^{O(1)}$ for any computable function f unless FPT = W[1], where N is the bitsize of the input matrix M.
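For concreteness, the parameter translation at the end of the proof of Theorem 2 can be spelled out as a short calculation; this is our paraphrase of the argument above, not text from the paper:

```latex
% Our paraphrase of the running-time translation in the proof of
% Theorem 2. The constructed matrix has d = 2r columns and bitsize
% N = O(|V(G)|^4 \log |V(G)|), so an f(d) \cdot N^{o(d)}-time
% omega-approximation for PCA WITH OUTLIERS would decide
% MULTICOLORED CLIQUE in time
\[
  f(2r)\cdot\bigl(|V(G)|^{4}\log|V(G)|\bigr)^{o(r)}
    \;=\; f(2r)\cdot|V(G)|^{o(r)},
\]
% which is ruled out under ETH: no g(r) \cdot |V(G)|^{o(r)}-time
% algorithm for MULTICOLORED CLIQUE exists unless ETH fails.
```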
## Acknowledgements

This work is supported by the Research Council of Norway via the project MULTIVAL. The authors are affiliated with the CEDAS center in Bergen.

## References

Basu, S., Pollack, R., and Roy, M.-F. Algorithms in Real Algebraic Geometry (Algorithms and Computation in Mathematics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 3540330984.

Blum, A., Hopcroft, J., and Kannan, R. Foundations of Data Science. June 2017. URL https://www.microsoft.com/en-us/research/publication/foundations-of-data-science-2/.

Bouwmans, T., Aybat, N. S., and Zahzah, E.-h. Handbook of Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing. Chapman and Hall/CRC, 2016.

Candès, E. J., Li, X., Ma, Y., and Wright, J. Robust principal component analysis? J. ACM, 58(3):11:1–11:37, 2011. doi: 10.1145/1970392.1970395. URL http://doi.acm.org/10.1145/1970392.1970395.

Caruso, F. The SARAG library: Some algorithms in real algebraic geometry. In Iglesias, A. and Takayama, N. (eds.), Mathematical Software - ICMS 2006, pp. 122–131, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-38086-3.

Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011. doi: 10.1137/090761793. URL https://doi.org/10.1137/090761793.

Chen, Y., Xu, H., Caramanis, C., and Sanghavi, S. Robust matrix completion and corrupted columns. In Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 873–880, 2011.

Cygan, M., Fomin, F. V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., and Saurabh, S. Parameterized Algorithms. Springer, 2015. ISBN 978-3-319-21274-6. doi: 10.1007/978-3-319-21275-3. URL https://doi.org/10.1007/978-3-319-21275-3.

Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

Fellows, M. R., Hermelin, D., Rosamond, F. A., and Vialette, S. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1):53–61, 2009.

Fomin, F. V., Lokshtanov, D., Meesum, S. M., Saurabh, S., and Zehavi, M. Matrix rigidity from the viewpoint of parameterized complexity. SIAM J. Discrete Math., 32(2):966–985, 2018. doi: 10.1137/17M112258X. URL https://doi.org/10.1137/17M112258X.

Fomin, F. V., Golovach, P. A., Panolan, F., and Simonov, K. Refined complexity of PCA with outliers. CoRR, abs/1905.04124, 2019. URL http://arxiv.org/abs/1905.04124.

Gillis, N. and Vavasis, S. A. On the complexity of robust PCA and ℓ1-norm low-rank matrix approximation. Math. Oper. Res., 43(4):1072–1084, 2018. doi: 10.1287/moor.2017.0895. URL https://doi.org/10.1287/moor.2017.0895.

Grigoriev, D. Using the notions of separability and independence for proving the lower bounds on the circuit complexity (in Russian). Notes of the Leningrad branch of the Steklov Mathematical Institute, Nauka, 1976.

Grigoriev, D. Using the notions of separability and independence for proving the lower bounds on the circuit complexity. Journal of Soviet Math., 14(5):1450–1456, 1980.

Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.

Impagliazzo, R., Paturi, R., and Zane, F. Which problems have strongly exponential complexity? J. Computer and System Sciences, 63(4):512–530, 2001.

Lokshtanov, D., Marx, D., and Saurabh, S.
Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS, 105:41–72, 2011. URL http://eatcs.org/beatcs/index.php/beatcs/article/view/92.

Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.

Shlens, J. A tutorial on principal component analysis. CoRR, abs/1404.1100, 2014. URL http://arxiv.org/abs/1404.1100.

Valiant, L. G. Graph-theoretic arguments in low-level complexity. In MFCS, pp. 162–176, 1977.

Vaswani, N. and Narayanamurthy, P. Static and dynamic robust PCA and matrix completion: A review. Proceedings of the IEEE, 106(8):1359–1379, 2018. doi: 10.1109/JPROC.2018.2844126. URL https://doi.org/10.1109/JPROC.2018.2844126.

Wright, J., Ganesh, A., Rao, S. R., Peng, Y., and Ma, Y. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS), pp. 2080–2088. Curran Associates, Inc., 2009. URL http://papers.nips.cc/paper/3704-robust-principal-component-analysis-exact-recovery-of-corrupted-low-rank-matrices-via-convex-optimization.

Xu, H., Caramanis, C., and Sanghavi, S. Robust PCA via outlier pursuit. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS), pp. 2496–2504. Curran Associates, Inc., 2010. URL http://papers.nips.cc/paper/4005-robust-pca-via-outlier-pursuit.