# Refined Complexity of PCA with Outliers

Fedor Fomin\*¹, Petr Golovach\*¹, Fahad Panolan\*¹, Kirill Simonov\*¹

\*Equal contribution. ¹Department of Informatics, University of Bergen, Norway. Correspondence to: Fedor Fomin, Petr Golovach, Fahad Panolan, Kirill Simonov.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

## Abstract

Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is a basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about PCA with outliers. For a set of n points in $\mathbb{R}^d$, how does one learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown r-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result with a lower bound: unless the Exponential Time Hypothesis fails, it is impossible, in time $f(d)\cdot n^{o(d)}$ for any function f of d, not only to solve the problem exactly but even to approximate it within a constant factor.

## 1. Introduction

Problem statement and motivation. Classical principal component analysis (PCA) is one of the most popular and successful techniques used for dimension reduction in data analysis and machine learning (Pearson, 1901; Hotelling, 1933; Eckart & Young, 1936). In PCA one seeks the best low-rank approximation of a data matrix M by solving

$$\text{minimize } \|M - L\|_F^2 \quad \text{subject to } \operatorname{rank}(L) \le r.$$

Here $\|A\|_F^2 = \sum_{i,j} a_{ij}^2$ is the square of the Frobenius norm of matrix A. By the Eckart-Young theorem (Eckart & Young, 1936), PCA is efficiently solvable via Singular Value Decomposition (SVD). PCA is used as a preprocessing step in a great variety of modern applications, including face recognition, data classification, and analysis of social networks.

In this paper we consider a variant of PCA with outliers, where we wish to recover a low-rank matrix from large but sparse errors. Suppose that we have n points (observations) in d-dimensional space. We know that a part of the points are arbitrarily located (say, produced by corrupted observations), while the remaining points are close to an r-dimensional true subspace. We do not have any information about the true subspace or about the corrupted observations. Our task is to learn the true subspace and to identify the outliers. As is common practice, we collect the points into an n × d matrix M; thus each row of M is a point and the columns of M are the coordinates. However, it is very likely that PCA on M will not reveal any reasonable information about the non-corrupted observations: a well-documented drawback of PCA is its vulnerability to even a very small number of outliers; an example is shown in Figure 1. The matrix formulation suggests the following interpretation: we seek a low-rank matrix L that, with the exception of a few rows, approximates M best.

[Figure 1: An illustration of how outliers impact PCA. The optimal approximation line (dashed) of the given set of points without the evident outlier shows the linear structure of the dataset. However, when the outlier is present, the principal component (solid) changes drastically.]
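The phenomenon in Figure 1 is easy to reproduce. The following minimal numpy sketch (our own illustration; the data and values are made up) fits the first principal component of an essentially one-dimensional point set with and without a single corrupted observation:

```python
# A minimal sketch of the Figure 1 phenomenon: one outlier can
# drastically rotate the first principal component.
import numpy as np

rng = np.random.default_rng(0)

# 50 points close to the line y = x (an essentially 1-dimensional set).
t = rng.uniform(-1, 1, size=50)
points = np.column_stack([t, t + 0.05 * rng.standard_normal(50)])

def first_pc(X):
    """First principal component: top right singular vector of the
    column-centered data matrix (rows are points)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

print("PC without outlier:", first_pc(points))  # roughly +-(0.71, 0.71)
outlier = np.array([[30.0, -30.0]])             # one corrupted observation
# With the outlier, the component swings toward the (1, -1) direction.
print("PC with outlier:   ", first_pc(np.vstack([points, outlier])))
```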
PCA WITH OUTLIERS
Input: A data matrix $M \in \mathbb{R}^{n \times d}$ and integer parameters r and k.
Task: Minimize $\|M - L - S\|_F^2$ subject to $L, S \in \mathbb{R}^{n \times d}$, $\operatorname{rank}(L) \le r$, and S having at most k non-zero rows.

The geometric interpretation of PCA WITH OUTLIERS is very natural: given n points in $\mathbb{R}^d$, we seek a set of k points whose removal leaves the remaining n − k points as close as possible to some r-dimensional subspace.
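Stated as code, the constraints and the objective are simple to check. The following numpy sketch (our own illustration; the function names are not from the paper) verifies feasibility of a candidate pair (L, S) and evaluates the objective:

```python
import numpy as np

def is_feasible(L, S, r, k, tol=1e-9):
    """Constraints of PCA WITH OUTLIERS: rank(L) <= r and S has at
    most k non-zero rows (up to numerical tolerance)."""
    rank_ok = np.linalg.matrix_rank(L, tol=tol) <= r
    nonzero_rows = np.count_nonzero(np.abs(S).sum(axis=1) > tol)
    return rank_ok and nonzero_rows <= k

def objective(M, L, S):
    """The objective: squared Frobenius norm ||M - L - S||_F^2."""
    return float(np.linalg.norm(M - L - S, "fro") ** 2)
```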
Related work. PCA WITH OUTLIERS belongs to the large class of extensively studied robust PCA problems; see, e.g., (Vaswani & Narayanamurthy, 2018; Xu et al., 2010; Bouwmans et al., 2016). In the robust PCA setting we observe a noisy version M of a data matrix L whose principal components we have to discover. In the case when M is a slightly disturbed version of L, PCA performed on M provides a reasonable approximation for L. However, when M is a very noisy version of L, say corrupted by a few outliers, even one corrupted observation can arbitrarily alter the quality of the approximation.

One of the approaches to robust PCA, which is relevant to our work, is to model outliers as an additive sparse matrix. Thus we have a data matrix M which is the superposition of a low-rank component L and a sparse component S; that is, M = L + S. This approach became popular after the works of Candès et al. (Candès et al., 2011), Wright et al. (Wright et al., 2009), and Chandrasekaran et al. (Chandrasekaran et al., 2011). A significant body of work on the robust PCA problem has been centered around proving that, under some feasibility assumptions on M, L, and S, a solution to

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{subject to } M = L + S, \tag{1}$$

where $\|S\|_0$ denotes the number of non-zero entries of matrix S and λ is a regularizing parameter, recovers matrix L uniquely. While optimization problem (1) is NP-hard (Gillis & Vavasis, 2018), it is possible to show that under certain assumptions on L and S, its convex relaxation can recover these matrices efficiently.

A problem strongly related to (1) was studied in computational complexity under the name MATRIX RIGIDITY (Grigoriev, 1976; 1980; Valiant, 1977). Here, for a given matrix M and integers r and k, the task is to decide whether at most k entries of M can be changed so that the rank of the resulting matrix is at most r. Equivalently, this is the problem of deciding whether a given matrix can be written as M = L + S, where $\operatorname{rank}(L) \le r$ and $\|S\|_0 \le k$. Fomin et al. (Fomin et al., 2018) gave an algorithm solving MATRIX RIGIDITY in time $2^{O(rk\log(rk))}\cdot(nd)^{O(1)}$. On the other hand, they showed that the problem is W[1]-hard parameterized by k. In particular, this implies that an algorithm with running time $f(k)\cdot(rnd)^{O(1)}$ for this problem is highly unlikely for any function f of k only.

A natural extension of the robust PCA approach (1) is its noisy version: given M = L + S + N, where L, S, and N are unknown, but L is known to be of low rank, S is known to have few non-zero rows, and the noise matrix N is of small Frobenius norm, recover L. Wright et al. (Wright et al., 2009) studied the following model of noisy robust PCA:

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{subject to } \|M - L - S\|_F^2 \le \varepsilon. \tag{2}$$

Thus (2) models situations where we want to learn the principal components of n points in d-dimensional space under the assumption that a small number of coordinates is corrupted.

The study of the natural, and seemingly more difficult, extension of (1) to PCA with outliers was initiated by Xu et al. (Xu et al., 2010), who introduced the following idealization of the problem:

$$\text{minimize } \operatorname{rank}(L) + \lambda \|S\|_{0,r} \quad \text{subject to } M = L + S. \tag{3}$$

Here $\|S\|_{0,r}$ denotes the number of non-zero rows of matrix S and λ is a regularizing parameter. Xu et al. (Xu et al., 2010) approached this problem by building its convex surrogate and applying an efficient convex-optimization-based algorithm to the surrogate. Chen et al. (Chen et al., 2011) studied a variant of the problem with partially observed data. Just as (2) is the noisy version of the robust PCA model (1), the PCA WITH OUTLIERS problem studied in our work can be seen as a noisy version of (3).

Our results. Even though PCA WITH OUTLIERS was assumed to be NP-hard, to the best of our knowledge this had never been studied formally. While NP-hardness is a serious strike against the tractability of a problem, it only says that in the worst case the problem is not tractable. Since the complexity of the problem could be governed by several parameters, like the rank r of L, the number of outliers k, or the dimension d of M, it is natural to ask how these parameters influence the complexity of the problem. For example, when k is a small constant, we can guess which points are outliers and run PCA on the remaining points. This brings us to $n^k$ calls of PCA, which is polynomial for constant k and exponential when k is a fraction of n.

In this paper we give an algorithm solving PCA WITH OUTLIERS in time roughly $|M|^{O(d^2)}$, where |M| is the size of the input matrix M. Thus for fixed dimension d, the problem is solvable in polynomial time. The algorithm works in polynomial time for any number of outliers k and any rank r of the recovered matrix L. Our algorithm strongly relies on tools developed in computational algebraic geometry, in particular for handling arrangements of algebraic surfaces in $\mathbb{R}^d$ defined by polynomials of bounded degree.

We complement our algorithmic result by a complexity lower bound. Our lower bound not only implies that the problem is NP-hard when the dimension d is part of the input, it also rules out the possibility of a certain type of algorithm for PCA WITH OUTLIERS. More precisely, assuming the Exponential Time Hypothesis (ETH) of Impagliazzo, Paturi, and Zane (Impagliazzo et al., 2001), namely that 3-SAT with n variables is not solvable in time $2^{o(n)}$, we show that for any constant $\omega \ge 1$, PCA WITH OUTLIERS cannot be ω-approximated in time $f(d)\cdot|M|^{o(d)}$ for any function f of d only.

Our algorithm is, foremost, of theoretical interest, especially in the presence of the nearly matching lower bound showing that doing something essentially better is next to impossible. In practice, PCA is often applied to reduce high-dimensional datasets, and for this task a running time exponential in d is not practical. However, there are cases where such an algorithm could be useful. One example is the visualization of low-dimensional data, where the number of dimensions, even if it is already small, needs to be lowered to two to actually draw the dataset. Another example is when we suspect a small subset of features to be highly correlated and want to reduce them to one dimension in order to get rid of the redundancy in the data. This potential application is well illustrated by the popular PCA tutorial (Shlens, 2014), where the essentially one-dimensional movement of a spring-mass system is captured by three cameras, resulting in 6 features.
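The $n^k$ brute force mentioned above is straightforward to write down. The following numpy sketch (our own illustration, not code from the paper; all names are ours) enumerates every candidate set of k outliers and solves classical PCA, via the Eckart-Young theorem, on the remaining rows:

```python
from itertools import combinations
import numpy as np

def pca_cost(X, r):
    """Optimal cost of a rank-r approximation of X: by the Eckart-Young
    theorem, the sum of squared singular values beyond the top r."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[r:] ** 2))

def brute_force_pca_with_outliers(M, r, k):
    """Try every candidate outlier set of size k: C(n, k) = O(n^k)
    classical PCA calls, practical only for very small k."""
    n = M.shape[0]
    best_cost, best_out = np.inf, None
    for out in combinations(range(n), k):
        keep = np.delete(np.arange(n), out)
        cost = pca_cost(M[keep], r)
        if cost < best_cost:
            best_cost, best_out = cost, out
    return best_cost, best_out
```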
## 2. Polynomial algorithm for bounded dimension

### 2.1. Preliminaries

As a subroutine in our algorithm we use a standard result about sampling points from the cells of an arrangement of algebraic surfaces, so we first state some definitions and an algorithm from (Basu et al., 2006).

We denote the ring of polynomials in variables $X_1,\ldots,X_d$ with coefficients in $\mathbb{R}$ by $\mathbb{R}[X_1,\ldots,X_d]$. By saying that an algebraic set V in $\mathbb{R}^d$ is defined by $Q \in \mathbb{R}[X_1,\ldots,X_d]$, we mean that $V = \{x \in \mathbb{R}^d \mid Q(x_1,\ldots,x_d) = 0\}$. For a set of s polynomials $\mathcal{P} = \{P_1,\ldots,P_s\} \subset \mathbb{R}[X_1,\ldots,X_d]$, a sign condition is specified by a sign vector $\sigma \in \{-1,0,+1\}^s$. The sign condition σ is non-empty over V with respect to $\mathcal{P}$ if there is a point $x \in V$ such that $\sigma = (\operatorname{sign}(P_1(x)),\ldots,\operatorname{sign}(P_s(x)))$, where sign is the sign function on real numbers:

$$\operatorname{sign}(x) = \begin{cases} +1 & \text{if } x > 0,\\ 0 & \text{if } x = 0,\\ -1 & \text{if } x < 0. \end{cases}$$

The realization space of $\sigma \in \{-1,0,+1\}^s$ over V is the set

$$\mathcal{R}(\sigma) = \{x \in V \mid \sigma = (\operatorname{sign}(P_1(x)),\ldots,\operatorname{sign}(P_s(x)))\}.$$

If $\mathcal{R}(\sigma)$ is not empty, then each of its non-empty semi-algebraically connected components (which is equivalent to just connected on semi-algebraic sets, as proven in (Basu et al., 2006), Theorem 5.22) is a cell of $\mathcal{P}$ over V.

For an algebraic set W, its real dimension is the maximal integer d′ such that there is a homeomorphism of $[0,1]^{d'}$ into W. Naturally, if $W \subseteq \mathbb{R}^d$, then $d' \le d$. The following theorem from (Basu et al., 2006) gives an algorithm to compute a point in each cell of $\mathcal{P}$ over V.

Proposition 1 ((Basu et al., 2006), Theorem 13.22). Let V be an algebraic set in $\mathbb{R}^d$ of real dimension d′ defined by $Q(X_1,\ldots,X_d) = 0$, where Q is a polynomial in $\mathbb{R}[X_1,\ldots,X_d]$ of degree at most b, and let $\mathcal{P} \subset \mathbb{R}[X_1,\ldots,X_d]$ be a finite set of s polynomials with each $P \in \mathcal{P}$ also of degree at most b. Let D be the ring generated by the coefficients of Q and of the polynomials in $\mathcal{P}$. There is an algorithm which takes as input Q, d′, and $\mathcal{P}$ and computes a set of points meeting every non-empty cell of V over $\mathcal{P}$. The algorithm uses at most $s^{d'} b^{O(d)}$ arithmetic operations in D.

On the practical side, we note that a number of routines from (Basu et al., 2006) are implemented in the SARAG library (Caruso, 2006).

### 2.2. Algorithm

First, we recall the folklore observation that, geometrically, the low-rank approximation matrix L is the orthogonal projection of the rows of M onto some r-dimensional subspace of $\mathbb{R}^d$. For a proof see, e.g., (Blum et al., 2017).

Proposition 2. Given a matrix $M \in \mathbb{R}^{n\times d}$ with rows $m_1,\ldots,m_n$, the task of finding a matrix L of rank at most r which minimizes $\|M-L\|_F^2$ is equivalent to finding an r-dimensional subspace of $\mathbb{R}^d$ which minimizes the total squared distance from the rows of M, treated as points in $\mathbb{R}^d$:

$$\min_{\substack{L\in\mathbb{R}^{n\times d}\\ \operatorname{rank} L \le r}} \|M-L\|_F^2 \;=\; \min_{\substack{U\subseteq\mathbb{R}^d\\ U \text{ is a linear subspace of dim } r}} \sum_{i=1}^{n}\|m_i - \operatorname{proj}_U m_i\|^2,$$

where $\operatorname{proj}_U x$ is the orthogonal projection of x onto U for $x \in \mathbb{R}^d$.

By Proposition 2, if we fix an r-dimensional subspace U containing the span of the rows of L, then the outliers are automatically defined as the k farthest points from U among $\{m_i\}_{i=1}^n$.
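Concretely, the fixed-subspace step looks as follows; this is our own numpy sketch (the function name is ours), not code from the paper:

```python
import numpy as np

def fixed_subspace_step(M, U_basis, k):
    """Fixed-subspace step: U is given by an orthonormal basis (the r
    rows of U_basis). The k rows of M farthest from U become the
    non-zero rows of S_U; the remaining rows are orthogonally
    projected onto U to form L_U."""
    P = U_basis.T @ U_basis                      # d x d projector onto U
    proj = M @ P                                 # proj_U m_i, row-wise
    residuals = np.sum((M - proj) ** 2, axis=1)  # ||m_i - proj_U m_i||^2
    order = np.argsort(residuals)
    keep, out = order[:len(M) - k], order[len(M) - k:]
    S = np.zeros_like(M); S[out] = M[out]        # outlier rows of M
    L = np.zeros_like(M); L[keep] = proj[keep]   # projections of the rest
    return L, S, float(residuals[keep].sum())    # cost = ||M - L - S||_F^2
```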
In the next proposition we give a precise statement of this.

Proposition 3. The optimization objective of PCA WITH OUTLIERS for a given matrix $M \in \mathbb{R}^{n\times d}$ with rows $m_1,\ldots,m_n$ can be equivalently redefined as follows:

$$\min_{\substack{L,S \in \mathbb{R}^{n\times d}\\ \operatorname{rank} L \le r\\ S \text{ has at most } k \text{ non-zero rows}}} \|M-L-S\|_F^2 \;=\; \min_{\substack{U\subseteq\mathbb{R}^d\\ U \text{ is a linear subspace of dim } r}} \|M-L_U-S_U\|_F^2,$$

where the k non-zero rows of $S_U$ are the k rows of M with the largest value of $\|m_i - \operatorname{proj}_U m_i\|^2$, and $L_U$ is the orthogonal projection of the rows of $(M - S_U)$ onto U. Explicitly,

$$S_U = \begin{pmatrix} m_1\\ \vdots\\ m_k\\ 0\\ \vdots\\ 0 \end{pmatrix}, \qquad L_U = \begin{pmatrix} 0\\ \vdots\\ 0\\ \operatorname{proj}_U m_{k+1}\\ \vdots\\ \operatorname{proj}_U m_n \end{pmatrix},$$

assuming that the rows of M are ordered by descending $\|m_i - \operatorname{proj}_U m_i\|^2$.

So for a fixed U we may determine $S_U$ easily and then solve classical PCA for the matrix $(M - S_U)$. The intuition behind our algorithm is that the set of the k farthest points is the same for many subspaces, and solving PCA for $(M - S_U)$ treats all these subspaces at once. The crucial point is to bound the number of different matrices $S_U$ we have to consider. There is of course a trivial bound of $n^k$, since $S_U$ is always obtained by choosing k rows of M. But the number of different $S_U$ is also geometrically limited, and by exploiting this we are able to obtain another bound of $n^{O(d^2)}$, resulting in the following theorem.

Theorem 1. Solving PCA WITH OUTLIERS is reducible to solving $n^{2\min(rd,\,(d-r)d)}\cdot 2^{O(d)} = n^{O(d^2)}$ instances of PCA. This reduction can be computed in a number of operations over $\mathbb{R}$ bounded by the expression above.

First, a note about the statement of Theorem 1. Our algorithm relies on solving classical PCA, and since only iterative algorithms for PCA and SVD exist, we cannot claim that our algorithm solves PCA WITH OUTLIERS in some fixed number of operations. However, if we are only interested in solving the problem up to some constant precision, for example machine epsilon, then PCA is solvable in a polynomial number of operations, and so by Theorem 1, PCA WITH OUTLIERS is solvable in $n^{O(d^2)}$ operations.
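Before the proof, here is a high-level sketch of the whole algorithm behind Theorem 1 (our own illustration, not the paper's pseudocode). The helper `enumerate_candidate_subspaces` is a hypothetical placeholder for the cell-sampling machinery of Proposition 1 applied to the arrangement constructed in the proof that follows; each candidate subspace it yields is then processed exactly as in Proposition 3:

```python
import numpy as np

def solve_pca_with_outliers(M, r, k, enumerate_candidate_subspaces):
    """Reduce PCA WITH OUTLIERS to n^{O(d^2)} classical PCA instances:
    for each candidate subspace U, fix the outliers S_U (the k farthest
    rows of M) and solve classical PCA on M - S_U via SVD."""
    n = M.shape[0]
    best = (np.inf, None, None)
    # Placeholder: one orthonormal r x d basis per cell of the
    # arrangement (Proposition 1), n^{O(d^2)} candidates in total.
    for U_basis in enumerate_candidate_subspaces(M, r, k):
        P = U_basis.T @ U_basis
        residuals = np.sum((M - M @ P) ** 2, axis=1)
        out = np.argsort(residuals)[n - k:]          # k farthest rows
        S = np.zeros_like(M); S[out] = M[out]        # candidate S_U
        # Classical PCA (Eckart-Young via SVD) on M - S_U.
        Um, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (Um[:, :r] * s[:r]) @ Vt[:r]             # best rank-r fit
        cost = float(np.sum(s[r:] ** 2))             # ||M - L - S||_F^2
        if cost < best[0]:
            best = (cost, L, S)
    return best
```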
Proof of Theorem 1. We start by associating r-dimensional subspaces of $\mathbb{R}^d$ with points of a certain algebraic set. Consider the matrix space $\mathbb{R}^{(d-r)\times d}$, and for an element $V \in \mathbb{R}^{(d-r)\times d}$, $V = \{v_{ij}\}_{i,j}$, the following polynomial conditions:

$$Q^O_{i,j}(V) := \sum_{l=1}^{d} v_{il}v_{jl} = 0, \quad \text{for } 1 \le i < j \le (d-r),$$

$$Q^N_i(V) := \sum_{l=1}^{d} v_{il}^2 - 1 = 0, \quad \text{for } 1 \le i \le (d-r),$$

where condition $Q^O_{i,j}(V) = 0$ requires rows i, j of V to be pairwise orthogonal and condition $Q^N_i(V) = 0$ requires row i of V to have length 1. We may write all these conditions as a single polynomial condition Q(V) = 0 by taking the sum of squares:

$$Q(V) := \sum_{1 \le i < j \le d-r} Q^O_{i,j}(V)^2 + \sum_{i=1}^{d-r} Q^N_i(V)^2.$$

[...]

... $c \gg a \gg D$, and this is crucial for our reduction. The main property of the constructed instance of PCA WITH OUTLIERS is stated in the following claim.

Claim 1. If $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, then $\mathrm{Opt}(M, r, k) \le D$; and if there is a feasible solution (L, S) for (M, r, k) with $\|M - L - S\|_F^2 \le \omega D$, then $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE.

Suppose that $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, that is, there is a clique X of G such that $|X \cap V_i| = 1$ for $i \in \{1,\ldots,r\}$. Let $X = \{v^1_{i_1},\ldots,v^r_{i_r}\}$ and $R = \{v^s_{i_s}v^t_{i_t} \mid 1 \le s < t \le r\}$, that is, $R = E(G[X])$. We show that $\mathrm{Opt}(M, r, k) \le D$. For this, we construct a feasible solution (L, S) such that $\|M - L - S\|_F^2 \le D$.

We define the $m \times r$ matrices P′ and Q′ by setting those rows of P and Q that correspond to $e \in E(G)\setminus R$ to zero and keeping the other rows as in P and Q; that is, for $e \in R$ the rows of P′ and Q′ are $p_e$ and $q_e$ respectively, and the other rows are zero rows. We define an $r \times r$ matrix B as

$$B = \begin{pmatrix} a_{i_1} & 0 & \cdots & 0\\ 0 & a_{i_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{i_r} \end{pmatrix},$$

that is, the diagonal elements of B are $a_{i_1},\ldots,a_{i_r}$ and the other elements are zeros. Denote by $b_1,\ldots,b_r$ the rows of B. We construct m copies $B_1,\ldots,B_m$ of B and the $(r+1)m \times 2r$ matrix L, using P′, Q′ and $B_i$, $A'_i$ for $i \in \{1,\ldots,m\}$ as blocks:

$$L = \begin{pmatrix} B_1 & A'_1\\ \vdots & \vdots\\ B_m & A'_m\\ P' & Q' \end{pmatrix}. \tag{7}$$

It is straightforward to verify that $\operatorname{rank}(L) \le r$. Indeed, the rank of each submatrix $(B_i \mid A'_i)$ is r, as only the diagonal elements of $B_i$ and $A'_i$ are non-zero. Also, for each $e = v^s_{i_s}v^t_{i_t} \in R$ we have $p_e = c(b_{i_s} + b_{i_t})$ and $q_e = c(a_{i_s} + a_{i_t})$, i.e., each row of L indexed by $e \in R$ is a linear combination of two rows of $(B_1 \mid A'_1)$.

Let 0 be the $r \times r$ zero matrix. We construct 2m copies $0_1,\ldots,0_m$ and $0'_1,\ldots,0'_m$ of 0 and define

$$S = \begin{pmatrix} 0_1 & 0'_1\\ \vdots & \vdots\\ 0_m & 0'_m\\ P - P' & Q - Q' \end{pmatrix}. \tag{8}$$

Clearly, this matrix has at most k non-zero rows, namely those indexed by $e \in E(G)\setminus R$. Combining (6)–(8), we have

$$M - L - S = \begin{pmatrix} A_1 & A'_1\\ \vdots & \vdots\\ A_m & A'_m\\ P & Q \end{pmatrix} - \begin{pmatrix} B_1 & A'_1\\ \vdots & \vdots\\ B_m & A'_m\\ P' & Q' \end{pmatrix} - \begin{pmatrix} 0_1 & 0'_1\\ \vdots & \vdots\\ 0_m & 0'_m\\ P-P' & Q-Q' \end{pmatrix} = \begin{pmatrix} A_1 - B_1 & \bar 0\\ \vdots & \vdots\\ A_m - B_m & \bar 0\\ \bar 0 & \bar 0 \end{pmatrix},$$

where $\bar 0$ stands for a zero block (in the bottom row, the $m \times r$ zero matrix), and

$$\|M - S - L\|_F^2 = m\|A - B\|_F^2 = m(i_1^2 + \cdots + i_r^2).$$

We conclude that (L, S) is a feasible solution for the considered instance (M, r, k) of PCA WITH OUTLIERS with $\|M - S - L\|_F^2 \le D$. Therefore, $\mathrm{Opt}(M, r, k) \le D$.

Suppose now that (L, S) is a feasible solution for the instance (M, r, k) of PCA WITH OUTLIERS with $\|M - S - L\|_F^2 \le \omega D = D'$. We prove that $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE. Recall that S has at most $k = m - \binom{r}{2}$ non-zero rows. Hence, there is a set $R \subseteq E(G)$ with $|R| = m - k = \binom{r}{2}$ such that the rows of S indexed by $e \in R$ are zero rows. We claim that the edges of R form the edge set of a complete graph. More formally, we show the following.

Claim 2. There are $i_1,\ldots,i_r \in \{1,\ldots,n\}$ such that $R = \{v^s_{i_s}v^t_{i_t} \mid 1 \le s < t \le r\}$.

The proof of Claim 2 exploits the property that $c \gg a \gg D'$. It demands a lot of technicalities and is therefore omitted.

To complete the proof of Theorem 2, recall that M is an $(r+1)m \times 2r$ integer matrix and the absolute value of each element is at most $c = O(r^2 m n^2)$. Therefore, the bitsize N of M is $O(|V(G)|^4 \log|V(G)|)$. Observe that, given $(G, V_1,\ldots,V_r)$, M can be constructed in polynomial time. Assume that there is an ω-approximation algorithm A for PCA WITH OUTLIERS with running time $f(d)\cdot N^{o(d)}$ for a computable function f. If $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE, then $\mathrm{Opt}(M, r, k) \le D$. Therefore, A applied to (M, r, k) reports that there is a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega\,\mathrm{Opt}(M, r, k) \le \omega D$ by Claim 1. For the opposite direction, if A reports that there is a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega D$, then $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE by Claim 1. Hence, A reports the existence of a feasible solution (L, S) with $\|M - L - S\|_F^2 \le \omega D$ if and only if $(G, V_1,\ldots,V_r)$ is a yes-instance of MULTICOLORED CLIQUE. Since $N = O(|V(G)|^4 \log|V(G)|)$ and $d = 2r$, we obtain that A solves MULTICOLORED CLIQUE in time $f(2r)\cdot|V(G)|^{o(r)}$, contradicting ETH.

As MULTICOLORED CLIQUE is well known to be W[1]-hard (see (Fellows et al., 2009; Cygan et al., 2015)), our reduction gives the following corollary based on the weaker conjecture that FPT ≠ W[1]. We refer to the book (Cygan et al., 2015) for the formal definitions of the parameterized complexity classes FPT and W[1]. Note that ETH implies FPT ≠ W[1], but not the other way around.

Corollary 1. For any $\omega \ge 1$, there is no ω-approximation algorithm for PCA WITH OUTLIERS with running time $f(d)\cdot N^{O(1)}$ for any computable function f unless FPT = W[1], where N is the bitsize of the input matrix M.
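For concreteness, the parameter translation at the end of the proof of Theorem 2 can be spelled out as a short calculation; this is our paraphrase of the argument above, not text from the paper:

```latex
% Our paraphrase of the running-time translation in the proof of
% Theorem 2. The constructed matrix has d = 2r columns and bitsize
% N = O(|V(G)|^4 \log |V(G)|), so an f(d) \cdot N^{o(d)}-time
% omega-approximation for PCA WITH OUTLIERS would decide
% MULTICOLORED CLIQUE in time
\[
  f(2r)\cdot\bigl(|V(G)|^{4}\log|V(G)|\bigr)^{o(r)}
    \;=\; f(2r)\cdot|V(G)|^{o(r)},
\]
% which is ruled out under ETH: no g(r) \cdot |V(G)|^{o(r)}-time
% algorithm for MULTICOLORED CLIQUE exists unless ETH fails.
```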
## Acknowledgements

This work is supported by the Research Council of Norway via the project MULTIVAL. The authors are affiliated with the CEDAS center in Bergen.

## References

Basu, S., Pollack, R., and Roy, M.-F. Algorithms in Real Algebraic Geometry (Algorithms and Computation in Mathematics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 3540330984.

Blum, A., Hopcroft, J., and Kannan, R. Foundations of Data Science. June 2017. URL https://www.microsoft.com/en-us/research/publication/foundations-of-data-science-2/.

Bouwmans, T., Aybat, N. S., and Zahzah, E.-h. Handbook of Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing. Chapman and Hall/CRC, 2016.

Candès, E. J., Li, X., Ma, Y., and Wright, J. Robust principal component analysis? J. ACM, 58(3):11:1–11:37, 2011. doi: 10.1145/1970392.1970395. URL http://doi.acm.org/10.1145/1970392.1970395.

Caruso, F. The SARAG library: Some algorithms in real algebraic geometry. In Iglesias, A. and Takayama, N. (eds.), Mathematical Software - ICMS 2006, pp. 122–131, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-38086-3.

Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011. doi: 10.1137/090761793. URL https://doi.org/10.1137/090761793.

Chen, Y., Xu, H., Caramanis, C., and Sanghavi, S. Robust matrix completion and corrupted columns. In Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 873–880, 2011.

Cygan, M., Fomin, F. V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., and Saurabh, S. Parameterized Algorithms. Springer, 2015. ISBN 978-3-319-21274-6. doi: 10.1007/978-3-319-21275-3. URL https://doi.org/10.1007/978-3-319-21275-3.

Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

Fellows, M. R., Hermelin, D., Rosamond, F. A., and Vialette, S. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1):53–61, 2009.

Fomin, F. V., Lokshtanov, D., Meesum, S. M., Saurabh, S., and Zehavi, M. Matrix rigidity from the viewpoint of parameterized complexity. SIAM J. Discrete Math., 32(2):966–985, 2018. doi: 10.1137/17M112258X. URL https://doi.org/10.1137/17M112258X.

Fomin, F. V., Golovach, P. A., Panolan, F., and Simonov, K. Refined complexity of PCA with outliers. CoRR, abs/1905.04124, 2019. URL http://arxiv.org/abs/1905.04124.

Gillis, N. and Vavasis, S. A. On the complexity of robust PCA and ℓ1-norm low-rank matrix approximation. Math. Oper. Res., 43(4):1072–1084, 2018. doi: 10.1287/moor.2017.0895. URL https://doi.org/10.1287/moor.2017.0895.

Grigoriev, D. Using the notions of separability and independence for proving the lower bounds on the circuit complexity (in Russian). Notes of the Leningrad branch of the Steklov Mathematical Institute, Nauka, 1976.

Grigoriev, D. Using the notions of separability and independence for proving the lower bounds on the circuit complexity. Journal of Soviet Math., 14(5):1450–1456, 1980.

Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.

Impagliazzo, R., Paturi, R., and Zane, F. Which problems have strongly exponential complexity? J. Computer and System Sciences, 63(4):512–530, 2001.

Lokshtanov, D., Marx, D., and Saurabh, S.
Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS, 105:41–72, 2011. URL http://eatcs.org/beatcs/index.php/beatcs/article/view/92.

Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.

Shlens, J. A tutorial on principal component analysis. CoRR, abs/1404.1100, 2014. URL http://arxiv.org/abs/1404.1100.

Valiant, L. G. Graph-theoretic arguments in low-level complexity. In MFCS, pp. 162–176, 1977.

Vaswani, N. and Narayanamurthy, P. Static and dynamic robust PCA and matrix completion: A review. Proceedings of the IEEE, 106(8):1359–1379, 2018. doi: 10.1109/JPROC.2018.2844126. URL https://doi.org/10.1109/JPROC.2018.2844126.

Wright, J., Ganesh, A., Rao, S. R., Peng, Y., and Ma, Y. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS), pp. 2080–2088. Curran Associates, Inc., 2009. URL http://papers.nips.cc/paper/3704-robust-principal-component-analysis-exact-recovery-of-corrupted-low-rank-matrices-via-convex-optimization.

Xu, H., Caramanis, C., and Sanghavi, S. Robust PCA via outlier pursuit. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS), pp. 2496–2504. Curran Associates, Inc., 2010. URL http://papers.nips.cc/paper/4005-robust-pca-via-outlier-pursuit.