# Sparse and Low-Rank Tensor Decomposition

Parikshit Shah (parikshit@yahoo-inc.com), Nikhil Rao (nikhilr@cs.utexas.edu), Gongguo Tang (gtang@mines.edu)

Abstract

Motivated by the problem of robust factorization of a low-rank tensor, we study the question of sparse and low-rank tensor decomposition. We present an efficient computational algorithm that modifies Leurgans' algorithm for tensor factorization. Our method relies on a reduction of the problem to sparse and low-rank matrix decomposition via the notion of tensor contraction. We use well-understood convex techniques for solving the reduced matrix sub-problem, which then allows us to perform the full decomposition of the tensor. We delineate situations where the problem is recoverable and provide theoretical guarantees for our algorithm. We validate our algorithm with numerical experiments.

1 Introduction

Tensors are useful representational objects to model a variety of problems such as graphical models with latent variables [1], audio classification [20], psychometrics [8], and neuroscience [3]. One concrete example proposed in [1] involves topic modeling in an exchangeable bag-of-words model wherein, given a corpus of documents, one wishes to estimate parameters related to the different topics of the different documents (each document has a unique topic associated to it). By computing the empirical moments associated to (exchangeable) bi-grams and tri-grams of words in the documents, [1] shows that this problem reduces to that of a (low-rank) tensor decomposition. A number of other machine learning tasks, such as Independent Component Analysis [11] and learning Gaussian mixtures [2], are reducible to that of tensor decomposition. While most tensor problems are computationally intractable [12], there has been renewed interest in developing tractable and principled approaches for the same [4, 5, 12, 15, 19, 21, 24-27].

In this paper we consider the problem of performing tensor decompositions when a subset of the entries of a low-rank tensor X are corrupted adversarially, so that the tensor observed is Z = X + Y, where Y is the corruption. One may view this problem as the tensor version of the sparse and low-rank matrix decomposition problem studied in [6, 9, 10, 13]. We develop an algorithm for performing such a decomposition and provide theoretical guarantees as to when such decomposition is possible. Our work draws on two sets of tools: (a) the line of work addressing the Robust PCA problem in the matrix case [6, 9], and (b) the application of Leurgans' algorithm to tensor decomposition and tensor inverse problems [4, 17, 24]. Our algorithm is computationally efficient and scalable; it relies on the key notion of tensor contraction, which effectively reduces a tensor problem of dimension n × n × n to four decomposition problems for matrices of size n × n. One can then apply convex methods for sparse and low-rank matrix decomposition, followed by certain linear algebraic operations, to recover the constituent tensors. Our algorithm not only produces the correct decomposition of Z into X and Y, but also produces the low-rank factorization of X. We are able to avoid tensor-unfolding based approaches [14, 21, 26], which are expensive and would lead to solving convex problems that are larger by orders of magnitude; in the third-order case the unfolded matrix would be of size n^2 × n. Furthermore, our method is conceptually simple to implement as well as to analyze theoretically.
Finally, our method is also modular: it can be extended to the higher-order case as well as to settings where the corrupted tensor Z has missing entries, as described in Section 4.

1.1 Problem Setup

In this paper, vectors are denoted using lower-case characters (e.g. x, y, a, b), matrices by upper-case characters (e.g. X, Y), and tensors by upper-case bold characters (e.g. X, T, A). We will work with tensors of third order (representationally to be thought of as three-way arrays), and the term mode refers to one of the axes of the tensor. A slice of a tensor refers to a two-dimensional matrix generated from the tensor by varying indices along two modes while keeping the third mode fixed. For a tensor X we will refer to the indices of the i-th mode-1 slice (i.e., the slice corresponding to the indices $\{i\} \times [n_2] \times [n_3]$) by $S^{(1)}_i$, where $[n_2] = \{1, 2, \ldots, n_2\}$ and $[n_3]$ is defined similarly. We denote the matrix corresponding to $S^{(1)}_i$ by $X^1_i$. Similarly, the indices of the k-th mode-3 slice will be denoted by $S^{(3)}_k$ and the matrix by $X^3_k$.

Given a tensor of interest X, consider its decomposition into rank-one tensors

$$X = \sum_{i=1}^{r} \lambda_i \, u_i \otimes v_i \otimes w_i, \qquad (1)$$

where $\{u_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_1}$, $\{v_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_2}$, and $\{w_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_3}$ are unit vectors. Here $\otimes$ denotes the tensor product, so that $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a tensor of order 3 and dimension $n_1 \times n_2 \times n_3$. Without loss of generality, throughout this paper we assume that $n_1 \leq n_2 \leq n_3$. We will present our results for third-order tensors; analogous results for higher orders follow in a transparent manner. We will be dealing with low-rank tensors, i.e. those tensors with $r \leq n_1$. Tensors can have rank larger than the dimension; indeed $r > n_3$ is an interesting regime, but it is far more challenging and is a topic left for future work. Kruskal's theorem [16] guarantees that tensors satisfying Assumption 1.1 below have a unique minimal decomposition into rank-one terms of the form (1). The number of terms is called the (Kruskal) rank.

Assumption 1.1. $\{u_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_1}$, $\{v_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_2}$, and $\{w_i\}_{i=1,\ldots,r} \subset \mathbb{R}^{n_3}$ are sets of linearly independent vectors.

While rank decomposition of tensors in the worst case is known to be computationally intractable [12], it is known that the (mild) assumption stated in Assumption 1.1 above suffices for an algorithm known as Leurgans' algorithm [4, 18] to correctly identify the factors in this unique decomposition. In this paper we will make this assumption about our tensor X throughout. This assumption may be viewed as a genericity or smoothness assumption [4]. In (1), r is the rank, $\lambda_i \in \mathbb{R}$ are scalars, and $u_i \in \mathbb{R}^{n_1}$, $v_i \in \mathbb{R}^{n_2}$, $w_i \in \mathbb{R}^{n_3}$ are the tensor factors. Let $U \in \mathbb{R}^{n_1 \times r}$ denote the matrix whose columns are the $u_i$, and correspondingly define $V \in \mathbb{R}^{n_2 \times r}$ and $W \in \mathbb{R}^{n_3 \times r}$. Let $Y \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be a sparse tensor, to be viewed as a corruption or adversarial noise added to X, so that one observes Z = X + Y. The problem of interest is that of decomposition, i.e. recovering X and Y from Z.

For a tensor X, we define its mode-3 contraction with respect to a contraction vector $a \in \mathbb{R}^{n_3}$, denoted by $X^3_a \in \mathbb{R}^{n_1 \times n_2}$, as the matrix

$$\left[X^3_a\right]_{ij} = \sum_{k=1}^{n_3} X_{ijk}\, a_k, \qquad (2)$$

so that the resulting $n_1 \times n_2$ matrix is a weighted sum of the mode-3 slices of the tensor X. Under this notation, the k-th mode-3 slice matrix $X^3_k$ is a mode-3 contraction with respect to the k-th canonical basis vector. We similarly define the mode-1 contraction with respect to a vector $c \in \mathbb{R}^{n_1}$ as

$$\left[X^1_c\right]_{jk} = \sum_{i=1}^{n_1} X_{ijk}\, c_i. \qquad (3)$$
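To make the contraction operation concrete, the following minimal numpy sketch (illustrative only; the array X and the vectors a and c are placeholder data, and the helper names are ours, not the paper's) computes the mode-3 and mode-1 contractions of (2) and (3):

```python
import numpy as np

def mode3_contraction(X, a):
    # [X^3_a]_{ij} = sum_k X_{ijk} a_k : an n1 x n2 matrix
    return np.einsum('ijk,k->ij', X, a)

def mode1_contraction(X, c):
    # [X^1_c]_{jk} = sum_i X_{ijk} c_i : an n2 x n3 matrix
    return np.einsum('ijk,i->jk', X, c)

# illustrative dimensions and random placeholder data
n1, n2, n3 = 4, 5, 6
X = np.random.randn(n1, n2, n3)
a = np.random.randn(n3)
c = np.random.randn(n1)
print(mode3_contraction(X, a).shape)  # (4, 5)
print(mode1_contraction(X, c).shape)  # (5, 6)
```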
In the subsequent discussion we will also use the following notation. For a matrix M, $\|M\|$ refers to the spectral norm, $\|M\|_*$ the nuclear norm, $\|M\|_1 := \sum_{i,j} |M_{ij}|$ the elementwise $\ell_1$ norm, and $\|M\|_\infty := \max_{i,j} |M_{ij}|$ the elementwise $\ell_\infty$ norm.

1.2 Incoherence

The problem of sparse and low-rank decomposition for matrices has been studied in [6, 9, 13, 22], and it is well understood that exact decomposition is not always possible. In order for the problem to be identifiable, two situations must be avoided: (a) the low-rank component X must not be sparse, and (b) the sparse component Y must not be low-rank. In fact, something stronger is both necessary and sufficient: the tangent spaces of the low-rank matrix (with respect to the rank variety) and the sparse matrix (with respect to the variety of sparse matrices) must have a transverse intersection [9]. For the problem to be amenable to recovery using computationally tractable (convex) methods, somewhat stronger incoherence assumptions are standard in the matrix case [6, 7, 9]. We will make similar assumptions for the tensor case, which we now describe. Given the decomposition (1) of X, we define the following subspaces of matrices:

$$T_{U,V} = \left\{ U A^T + B V^T : A \in \mathbb{R}^{n_2 \times r},\; B \in \mathbb{R}^{n_1 \times r} \right\}, \qquad
T_{V,W} = \left\{ V C^T + D W^T : C \in \mathbb{R}^{n_3 \times r},\; D \in \mathbb{R}^{n_2 \times r} \right\}. \qquad (4)$$

Thus $T_{U,V}$ is the set of rank-$r$ matrices whose column spaces are contained in span(U) or whose row spaces are contained in span(V), and a similar definition holds for $T_{V,W}$ and the matrices V, W. If Q is a rank-$r$ matrix with column space span(U) and row space span(V), then $T_{U,V}$ is the tangent space at Q with respect to the variety of rank-$r$ matrices.

For a tensor Y, the support of Y refers to the indices corresponding to the non-zero entries of Y. Let $\Omega \subseteq [n_1] \times [n_2] \times [n_3]$ denote the support of Y. Further, for a slice $Y^3_i$, let $\Omega^{(3)}_i \subseteq [n_1] \times [n_2]$ denote the corresponding sparsity pattern of the slice $Y^3_i$ (more generally, $\Omega^{(k)}_i$ can be defined as the sparsity pattern of the matrix resulting from the i-th mode-$k$ slice). When a tensor contraction of Y is computed along mode k, the sparsity pattern of the resulting matrix is the union of the sparsity patterns of the individual (matrix) slices, i.e. $\Omega^{(k)} = \bigcup_{i=1}^{n_k} \Omega^{(k)}_i$. Let $S(\Omega^{(k)})$ denote the set of (sparse) matrices with support $\Omega^{(k)}$. We define the following incoherence parameters:

$$\zeta(U, V) := \max_{M \in T_{U,V},\, \|M\| \leq 1} \|M\|_\infty, \qquad
\zeta(V, W) := \max_{M \in T_{V,W},\, \|M\| \leq 1} \|M\|_\infty, \qquad
\mu\!\left(\Omega^{(k)}\right) := \max_{N \in S(\Omega^{(k)}),\, \|N\|_\infty \leq 1} \|N\|.$$

The quantities $\zeta(U, V)$ and $\zeta(V, W)$ being small implies that, for contractions of the tensor Z, all matrices in the tangent space of those contractions with respect to the variety of rank-$r$ matrices are "diffuse", i.e. they do not have sparse elements [9]. Similarly, $\mu(\Omega^{(k)})$ being small implies that all matrices with the contracted sparsity pattern $\Omega^{(k)}$ have a "diffuse" spectrum, i.e. they do not have low rank. We will see specific settings where these forms of incoherence hold for tensors in Section 3.

2 Algorithm for Sparse and Low-Rank Tensor Decomposition

We now introduce our algorithm to perform sparse and low-rank tensor decompositions. We begin with a lemma.

Lemma 2.1. Let $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, with $n_1 \leq n_2 \leq n_3$, be a tensor of rank $r \leq n_1$. Then the rank of $X^3_a$ is at most r. Similarly, the rank of $X^1_c$ is at most r.

Proof. Consider a tensor $X = \sum_{i=1}^{r} \lambda_i \, u_i \otimes v_i \otimes w_i$. The reader may verify in a straightforward manner that $X^3_a$ enjoys the decomposition

$$X^3_a = \sum_{i=1}^{r} \lambda_i \, \langle w_i, a \rangle \, u_i v_i^T. \qquad (5)$$

The proof for the rank of $X^1_c$ is analogous. Note that while (5) is a matrix decomposition of the contraction, it is not a singular value decomposition (the components need not be orthogonal, for instance).
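The decomposition (5) is easy to check numerically. A small sketch (our own, with randomly drawn and hence generically linearly independent factors) builds a rank-r tensor and verifies that its mode-3 contraction has rank at most r:

```python
import numpy as np

n1, n2, n3, r = 6, 7, 8, 3
U = np.random.randn(n1, r)
V = np.random.randn(n2, r)
W = np.random.randn(n3, r)
lam = np.random.rand(r)

# X = sum_i lam_i u_i (x) v_i (x) w_i
X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

a = np.random.randn(n3)
X3a = np.einsum('abc,c->ab', X, a)

# (5): X^3_a = sum_i lam_i <w_i, a> u_i v_i^T, so its rank is at most r
X3a_check = (U * (lam * (W.T @ a))) @ V.T
print(np.allclose(X3a, X3a_check))   # True
print(np.linalg.matrix_rank(X3a))    # 3
```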
Recovering the factors requires an application of simultaneous diagonalization, which we describe next.

Lemma 2.2 ([4, 18]). Suppose we are given an order-3 tensor $X = \sum_{i=1}^{r} \lambda_i \, u_i \otimes v_i \otimes w_i$ of size $n_1 \times n_2 \times n_3$ satisfying the conditions of Assumption 1.1. Suppose the contractions $X^3_a$ and $X^3_b$ are computed with respect to unit vectors $a, b \in \mathbb{R}^{n_3}$ distributed independently and uniformly on the unit sphere $\mathbb{S}^{n_3 - 1}$, and consider the matrices $M_1$ and $M_2$ formed as
$$M_1 = X^3_a \left(X^3_b\right)^{\dagger}, \qquad M_2 = \left(X^3_b\right)^{\dagger} X^3_a.$$
Then the eigenvectors of $M_1$ (corresponding to the non-zero eigenvalues) are $\{u_i\}_{i=1,\ldots,r}$, and the eigenvectors of $M_2^T$ are $\{v_i\}_{i=1,\ldots,r}$.

Remark. Note that while the eigenvectors $\{u_i\}$, $\{v_j\}$ are thus determined, a source of ambiguity remains: for a fixed ordering of $\{u_i\}$ one needs to determine the order in which the $\{v_j\}$ are to be arranged. This can be (generically) achieved by using the (common) eigenvalues of $M_1$ and $M_2$ for pairing (if the contractions $X^3_a$, $X^3_b$ are computed with respect to random vectors a, b, the eigenvalues are distinct almost surely). Since the eigenvalues of $M_1$, $M_2$ are distinct, they can be used to pair the columns of U and V.

Lemma 2.2 is essentially a simultaneous diagonalization result [17] that facilitates tensor decomposition [4]. Given a tensor T, one can compute two contractions along mode 1 and apply simultaneous diagonalization as described in Lemma 2.2; this yields the factors $v_i$, $w_i$ (up to sign and reordering). One can then repeat the same process with mode-3 contractions to obtain $u_i$, $v_i$. In the final step one can obtain the $\lambda_i$ by solving a system of linear equations. The full algorithm is described in Algorithm 2 in the supplementary material.

For a contraction $Z^k_v$ of a tensor Z with respect to a vector v along mode k, consider solving the convex problem

$$\underset{X,\, Y}{\text{minimize}} \;\; \|X\|_* + \nu_k \|Y\|_1 \quad \text{subject to} \quad Z^k_v = X + Y. \qquad (6)$$

Our algorithm, stated in Algorithm 1, proceeds as follows. Given a tensor Z = X + Y, we perform two random contractions (with respect to vectors a, b) of the tensor along mode 3 to obtain matrices $Z^3_a$, $Z^3_b$. Since Z is a sum of sparse and low-rank components, by Lemma 2.1 so are the matrices $Z^3_a$, $Z^3_b$. We thus use (6) to decompose them into their constituent sparse and low-rank components, which are the contractions $X^3_a$, $X^3_b$, $Y^3_a$, $Y^3_b$. We then use $X^3_a$, $X^3_b$ and Lemma 2.2 to obtain the factors U, V. We perform the same operations along mode 1 to obtain the factors V, W. In the last step, we solve for the scale factors $\lambda_i$ (a system of linear equations). Algorithm 2 in the supplementary material, which we adapt for our decomposition problem in Algorithm 1, essentially relies on the idea of simultaneous diagonalization of matrices sharing common row and column spaces [17].

In this paper we do not analyze the situation where random noise is added to all the entries, but only the sparse adversarial noise setting. We note, however, that the key algorithmic insight of using contractions to perform tensor recovery is numerically stable and robust with respect to noise, as has been studied in [4, 11, 17]. The parameters that need to be picked to implement our algorithm are the regularization coefficients $\nu_1$, $\nu_3$. In the theoretical guarantees we will see that these can be picked in a stable manner, and that a range of values guarantees exact decomposition when the suitable incoherence conditions hold. In practice these coefficients would need to be determined by a cross-validation method.
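As a rough illustration only (not the authors' implementation, which uses CVX in MATLAB; see Section 2.2), the matrix subproblem (6) for a single contraction can be prototyped with cvxpy. The contraction Z_contr and the weight nu are assumed given:

```python
import cvxpy as cp
import numpy as np

def decompose_contraction(Z_contr, nu):
    """Sketch of (6): split a contraction into low-rank plus sparse parts."""
    m, n = Z_contr.shape
    X = cp.Variable((m, n))
    Y = cp.Variable((m, n))
    objective = cp.Minimize(cp.normNuc(X) + nu * cp.sum(cp.abs(Y)))
    constraints = [X + Y == Z_contr]
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value

# toy usage with a random contraction (placeholder data)
Z3a = np.random.randn(20, 20)
X3a_hat, Y3a_hat = decompose_contraction(Z3a, nu=1.0 / np.sqrt(20))
```

In Algorithm 1 this decomposition step is invoked four times, once for each of the contractions $Z^3_a$, $Z^3_b$, $Z^1_c$, $Z^1_d$.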
Note also that under suitable random sparsity assumptions [6], the regularization coefficient may be picked to be the inverse of the square-root of the dimension.

Algorithm 1: Sparse and low-rank tensor decomposition

1: Input: Tensor Z, parameters $\nu_1$, $\nu_3$.
2: Generate contraction vectors $a, b \in \mathbb{R}^{n_3}$, independently and uniformly distributed on the unit sphere.
3: Compute the mode-3 contractions $Z^3_a$ and $Z^3_b$.
4: Solve the convex problem (6) with $v = a$, $k = 3$ and regularization parameter $\nu_3$. Call the resulting solution matrices $X^3_a$, $Y^3_a$.
5: Solve the convex problem (6) with $v = b$, $k = 3$ and regularization parameter $\nu_3$. Call the resulting solution matrices $X^3_b$, $Y^3_b$.
6: Compute the eigen-decompositions of $M_1 := X^3_a (X^3_b)^{\dagger}$ and $M_2 := (X^3_b)^{\dagger} X^3_a$. Let U and V denote the matrices whose columns are the eigenvectors of $M_1$ and $M_2^T$ respectively, corresponding to the non-zero eigenvalues, in sorted order. (Let r be the common rank of $M_1$ and $M_2$.) The eigenvectors, thus arranged, are denoted $\{u_i\}_{i=1,\ldots,r}$ and $\{v_i\}_{i=1,\ldots,r}$.
7: Generate contraction vectors $c, d \in \mathbb{R}^{n_1}$, independently and uniformly distributed on the unit sphere.
8: Solve the convex problem (6) with $v = c$, $k = 1$ and regularization parameter $\nu_1$. Call the resulting solution matrices $X^1_c$, $Y^1_c$.
9: Solve the convex problem (6) with $v = d$, $k = 1$ and regularization parameter $\nu_1$. Call the resulting solution matrices $X^1_d$, $Y^1_d$.
10: Compute the eigen-decompositions of $M_3 := X^1_c (X^1_d)^{\dagger}$ and $M_4 := (X^1_d)^{\dagger} X^1_c$. Let $\tilde{V}$ and $\tilde{W}$ denote the matrices whose columns are the eigenvectors of $M_3$ and $M_4^T$ respectively, corresponding to the non-zero eigenvalues, in sorted order. (Let r be the common rank of $M_3$ and $M_4$.)
11: Simultaneously reorder the columns of $\tilde{V}$ and $\tilde{W}$, also performing simultaneous sign reversals as necessary, so that the columns of $\tilde{V}$ and V are equal; call the resulting matrix W, with columns $\{w_i\}_{i=1,\ldots,r}$.
12: Solve for the $\lambda_i$ in the linear system $X^3_a = \sum_{i=1}^{r} \lambda_i \langle w_i, a \rangle \, u_i v_i^T$.
13: Output: Decomposition $\hat{X} := \sum_{i=1}^{r} \lambda_i \, u_i \otimes v_i \otimes w_i$, $\hat{Y} := Z - \hat{X}$.

2.1 Computational Complexity

The computational complexity of our algorithm is dominated by the complexity of performing the sparse and low-rank matrix decomposition of the contractions via (6). For simplicity, let us consider the case where the target tensor $Z \in \mathbb{R}^{n \times n \times n}$ has equal dimensions in the different modes. Using a standard first-order method, the solution of (6) has a per-iteration complexity of $O(n^3)$, and to achieve an accuracy of $\epsilon$, $O(1/\epsilon)$ iterations are required [22]. Since only four such decompositions need be performed, the complexity of the method is $O(n^3/\epsilon)$, where $\epsilon$ is the accuracy to which (6) is solved. Another alternative is to reformulate (6) so that it is amenable to greedy atomic approaches [23], which yields an order-of-magnitude improvement. We note that, in contrast, a tensor unfolding for this problem [14, 21, 26] results in the need to solve much larger convex programs. For instance, for $Z \in \mathbb{R}^{n \times n \times n}$, the resulting flattened matrix would be of size $n^2 \times n$ and the resulting convex problem would then have a complexity of $O(n^4/\epsilon)$. For higher-order tensors, the gap in computational complexity would increase by further orders of n.
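To make the factor-recovery steps of Algorithm 1 (steps 6, 10 and 11) concrete, here is a minimal numpy sketch of the simultaneous-diagonalization step of Lemma 2.2. It assumes the two contractions have already been recovered exactly via (6); the function name and the eigenvalue-matching heuristic are ours, not the paper's:

```python
import numpy as np

def recover_factors(X3a, X3b, r):
    """Noise-free sketch: eigenvectors of M1 give the u_i, eigenvectors of
    M2^T give the v_i, and the (generically distinct) common eigenvalues
    pair the two sets of columns."""
    M1 = X3a @ np.linalg.pinv(X3b)
    M2 = np.linalg.pinv(X3b) @ X3a
    eval1, evec1 = np.linalg.eig(M1)
    eval2, evec2 = np.linalg.eig(M2.T)
    # keep the r eigenvalues of largest magnitude (the non-zero ones)
    idx1 = np.argsort(-np.abs(eval1))[:r]
    idx2 = np.argsort(-np.abs(eval2))[:r]
    # pair columns of U and V by matching the common eigenvalues
    order = [int(np.argmin(np.abs(eval2[idx2] - lam))) for lam in eval1[idx1]]
    U = np.real(evec1[:, idx1])
    V = np.real(evec2[:, idx2][:, order])
    return U, V

# toy usage with exact contractions of a random rank-3 tensor (placeholders)
n1, n2, n3, r = 6, 7, 8, 3
U0, V0, W0 = (np.random.randn(n, r) for n in (n1, n2, n3))
lam0 = np.random.rand(r) + 0.5
T = np.einsum('i,ai,bi,ci->abc', lam0, U0, V0, W0)
a, b = np.random.randn(n3), np.random.randn(n3)
U, V = recover_factors(np.einsum('abc,c->ab', T, a),
                       np.einsum('abc,c->ab', T, b), r)
```

The mode-1 contractions are handled identically, and the scale factors $\lambda_i$ are then obtained from the linear system in step 12.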
2.2 Numerical Experiments

We now present numerical results to validate our approach. We perform experiments on tensors of size 50 × 50 × 50 (non-symmetric). A tensor Z is generated as the sum of a low-rank tensor X and a sparse tensor Y. The low-rank component is generated as follows: three sets of r unit vectors $u_i, v_i, w_i \in \mathbb{R}^{50}$ are generated randomly, independently and uniformly distributed on the unit sphere. Random positive scale factors $\lambda_i$ (uniformly distributed on [0, 1]) are chosen, and we set $X = \sum_{i=1}^{r} \lambda_i \, u_i \otimes v_i \otimes w_i$. The tensor Y is generated by (Bernoulli) randomly sampling its entries with probability p. For each such p, we perform 10 trials and apply our algorithm. In all our experiments, the regularization parameter was picked to be $\nu = 1/\sqrt{n}$. The optimization problem (6) is solved using CVX in MATLAB. We report success if the MSE is smaller than $10^{-5}$, separately for both the X and Y components. We plot the empirical probability of success as a function of p in Fig. 1 (a), (b), for multiple values of the true rank r. In Fig. 1 (c), (d) we test the scalability of our method by generating a random 300 × 300 × 300 tensor of rank 50 and corrupting it with a sparse tensor of varying sparsity level. We run 5 independent trials and see that for low levels of corruption, both the low-rank and sparse components are accurately recovered by our method.

[Figure 1: Recovery of the low-rank and sparse components using the proposed method. Panels (a) and (b) plot the probability of recovery of the low-rank and sparse components, respectively, against the corruption sparsity (× 100) for ranks r = 1, 2, 3, 4; recovery succeeds with high probability when both the rank and the sparsity are low. Panels (c) and (d) plot the number of inexact recoveries of the two components against the corruption sparsity for a tensor of dimensions 300 × 300 × 300 and rank 50.]

3 Main Results

We now present the main rigorous guarantees related to the performance of our algorithm. Due to space constraints, the proofs are deferred to the supplementary material.

Theorem 3.1. Suppose Z = X + Y, where $X = \sum_{i=1}^{r} \lambda_i u_i \otimes v_i \otimes w_i$ has rank $r \leq n_1$ and factors satisfying Assumption 1.1. Suppose Y has support $\Omega$ and the following conditions are satisfied:
$$\mu\!\left(\Omega^{(3)}\right) \zeta(U, V) < \tfrac{1}{6}, \qquad \mu\!\left(\Omega^{(1)}\right) \zeta(V, W) < \tfrac{1}{6}.$$
Then Algorithm 1 succeeds in exactly recovering the component tensors, i.e. $(X, Y) = (\hat{X}, \hat{Y})$, whenever the $\nu_k$ are picked so that
$$\nu_3 \in \left( \frac{\zeta(U,V)}{1 - 4\,\zeta(U,V)\,\mu(\Omega^{(3)})}, \;\; \frac{1}{3\,\zeta(U,V)\,\mu(\Omega^{(3)})} \right), \qquad
\nu_1 \in \left( \frac{\zeta(V,W)}{1 - 4\,\zeta(V,W)\,\mu(\Omega^{(1)})}, \;\; \frac{1}{3\,\zeta(V,W)\,\mu(\Omega^{(1)})} \right).$$
Specifically, the choice $\nu_3 = \frac{(3\,\zeta(U,V))^p}{(\mu(\Omega^{(3)}))^{1-p}}$ and $\nu_1 = \frac{(3\,\zeta(V,W))^p}{(\mu(\Omega^{(1)}))^{1-p}}$ for any $p \in [0, 1]$ in these respective intervals guarantees exact recovery.

For a matrix M, the degree of M, denoted deg(M), is the maximum number of non-zeros in any row or column of M. For a tensor Y, we define the degree along mode k, denoted $\deg_k(Y)$, to be the maximum number of non-zero entries in any row or column of a matrix supported on $\Omega^{(k)}$ (defined in Section 1.2). The degree of Y is defined as $\deg(Y) := \max_{k \in \{1,2,3\}} \deg_k(Y)$.

Lemma 3.2. We have $\mu(\Omega^{(k)}) \leq \deg(Y)$ for all k.

For a subspace $S \subseteq \mathbb{R}^{n}$, let us define the incoherence of the subspace as
$$\beta(S) := \max_i \|P_S e_i\|_2,$$
where $P_S$ denotes the projection operator onto S, $e_i$ is a standard unit vector, and $\|\cdot\|_2$ is the Euclidean norm of a vector. Let us define
$$\mathrm{inc}(X) := \max\{\beta(\mathrm{span}(U)),\, \beta(\mathrm{span}(V)),\, \beta(\mathrm{span}(W))\},$$
$$\mathrm{inc}_3(X) := \max\{\beta(\mathrm{span}(U)),\, \beta(\mathrm{span}(V))\}, \qquad \mathrm{inc}_1(X) := \max\{\beta(\mathrm{span}(V)),\, \beta(\mathrm{span}(W))\}.$$
Note that inc(X) < 1, always.
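For concreteness, the subspace incoherence β(·) of a factor matrix can be computed directly from an orthonormal basis of its span (a small numpy sketch with placeholder data; not from the paper):

```python
import numpy as np

def subspace_incoherence(U):
    """beta(span(U)) = max_i ||P_S e_i||_2: with Q an orthonormal basis of
    span(U), ||P_S e_i||_2 equals the Euclidean norm of the i-th row of Q."""
    Q, _ = np.linalg.qr(U)                  # orthonormal basis of span(U)
    return float(np.max(np.linalg.norm(Q, axis=1)))

# toy check with random factors (placeholder data)
n, r = 100, 3
U = np.random.randn(n, r)
print(subspace_incoherence(U))              # roughly on the order of sqrt(r/n)
```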
For many random ensembles of interest, the incoherence scales gracefully with the dimension n, i.e. $\mathrm{inc}(X) \leq K \sqrt{\max\{r, \log n\}/n}$ for a constant K.

Lemma 3.3. We have $\zeta(U, V) \leq 2\,\mathrm{inc}(X)$ and $\zeta(V, W) \leq 2\,\mathrm{inc}(X)$.

Corollary 3.4. Let Z = X + Y, with $X = \sum_{i=1}^{r} \lambda_i u_i \otimes v_i \otimes w_i$ of rank $r \leq n_1$, where the factors satisfy Assumption 1.1 and X has incoherence inc(X). Suppose Y is sparse with degree deg(Y). If the condition
$$\mathrm{inc}(X)\,\deg(Y) < \tfrac{1}{12}$$
holds, then Algorithm 1 successfully recovers the true solution, i.e. $(X, Y) = (\hat{X}, \hat{Y})$, when the parameters satisfy
$$\nu_3 \in \left( \frac{2\,\mathrm{inc}_3(X)}{1 - 8\,\deg_3(Y)\,\mathrm{inc}_3(X)}, \;\; \frac{1}{6\,\deg_3(Y)\,\mathrm{inc}_3(X)} \right), \qquad
\nu_1 \in \left( \frac{2\,\mathrm{inc}_1(X)}{1 - 8\,\deg_1(Y)\,\mathrm{inc}_1(X)}, \;\; \frac{1}{6\,\deg_1(Y)\,\mathrm{inc}_1(X)} \right).$$
Specifically, $\nu_3 = \frac{(6\,\mathrm{inc}_3(X))^p}{(2\,\deg_3(Y))^{1-p}}$ and $\nu_1 = \frac{(6\,\mathrm{inc}_1(X))^p}{(2\,\deg_1(Y))^{1-p}}$ for any $p \in [0, 1]$ is a valid choice that guarantees exact recovery.

Remark. Note that Corollary 3.4 presents a deterministic guarantee on the recoverability of a sparse corruption of a low-rank tensor, and can be viewed as a tensor extension of [9, Corollary 3].

We now consider, for the sake of simplicity, tensors of uniform dimension, i.e. $X, Y, Z \in \mathbb{R}^{n \times n \times n}$. We show that when the low-rank and sparse components are suitably random, the approach outlined in Algorithm 1 achieves exact recovery. We define the random sparsity model to be one where each entry of the tensor Y is non-zero independently and with identical probability $\rho$. We make no assumption about the magnitude of the entries of Y, only that its non-zero entries are thus sampled.

Lemma 3.5. Let $X = \sum_{i=1}^{r} \lambda_i u_i \otimes v_i \otimes w_i$, where $u_i, v_i, w_i \in \mathbb{R}^{n}$ are uniformly randomly distributed on the unit sphere $\mathbb{S}^{n-1}$. Then the incoherence of the tensor X satisfies
$$\mathrm{inc}(X) \leq c_1 \sqrt{\frac{\max\{r, \log n\}}{n}}$$
with probability exceeding $1 - c_2 n^{-3} \log n$ for some constants $c_1, c_2$.

Lemma 3.6. Suppose the entries of Y are sampled according to the random sparsity model, with $\rho = O\!\left(n^{-3/2} \max(\log n, r)^{-1}\right)$. Then the tensor Y satisfies
$$\deg(Y) \leq \frac{\sqrt{n}}{12\, c_1 \sqrt{\max(\log n, r)}}$$
with probability exceeding $1 - \exp\!\left(-c_3 \sqrt{n/\max(\log n, r)}\right)$ for some constant $c_3 > 0$.

Corollary 3.7. Let Z = X + Y, where X is low rank with random factors as per the conditions of Lemma 3.5 and Y is sparse with random support as per the conditions of Lemma 3.6. Provided $r = o(n^{1/2})$, Algorithm 1 successfully recovers the correct decomposition, i.e. $(\hat{X}, \hat{Y}) = (X, Y)$, with probability exceeding $1 - n^{-\alpha}$ for some $\alpha > 0$.

Remarks. 1) Under this sampling model, the cardinality of the support of Y is allowed to be as large as $m = O(n^{3/2}/\log n)$ when the rank r is constant (independent of n). 2) We could equivalently have looked at a uniformly random sampling model, i.e. one where a support set of size m is chosen uniformly at random from the set of all possible support sets of cardinality at most m, and our results for exact recovery would have gone through. This follows from the equivalence principle for successful recovery between Bernoulli sampling and uniform sampling; see [6, Appendix 7.1]. 3) Note that for the random sparsity ensemble, [6] shows that a choice of $\nu = 1/\sqrt{n}$ ensures exact recovery (an additional condition regarding the magnitudes of the factors is needed, however). By extension, the same choice can be shown to work for our setting.
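The quantities entering these guarantees are straightforward to compute for a given corruption pattern. For instance, the following toy numpy sketch (illustrative parameters, not from the paper) draws a support under the random sparsity model and computes the degree deg(Y):

```python
import numpy as np

def mode_degree(support, axis):
    """deg_k(Y): maximum number of non-zeros in any row or column of the
    contracted pattern Omega^(k), the union of the slice supports along
    `axis` (mode k corresponds to numpy axis k-1)."""
    Omega = support.any(axis=axis)          # 2-D boolean pattern Omega^(k)
    return int(max(Omega.sum(axis=0).max(), Omega.sum(axis=1).max()))

# random sparsity model with illustrative parameters
n, rho = 60, 0.002
support = np.random.rand(n, n, n) < rho
deg_Y = max(mode_degree(support, axis) for axis in range(3))
print(deg_Y)
```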
4 Extensions

The approach described in Algorithm 1 and its analysis are quite modular and can be adapted to various settings to account for different forms of measurements and robustness models. We do not present an analysis of these situations due to space constraints, but outline how these extensions follow from the current development in a straightforward manner.

1) Higher-order tensors: Algorithm 1 can be extended naturally to the higher-order setting. Recall that in the third-order case, one needs to recover two contractions along the third mode to discover the factors U, V and then two contractions along the first mode to discover the factors V, W. For an order-K tensor $Z \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ which is the sum of a low-rank component $X = \sum_{i=1}^{r} \lambda_i \bigotimes_{l=1}^{K} u^{(l)}_i$ and a sparse component Y, one needs to compute higher-order contractions of Z along K - 1 different modes. For each of these K - 1 modes the resulting contraction is the sum of a sparse and a low-rank matrix, and thus pairs of matrix problems of the form (6) reveal the sparse and low-rank components of the contractions. The low-rank factors can then be recovered via an application of Lemma 2.2, and the full decomposition can thus be recovered. The same guarantees as in Theorem 3.1 and Corollary 3.4 hold verbatim (the notions of incoherence inc(X) and degree deg(Y) of tensors need to be extended to the higher-order case in the natural way).

2) Block sparsity: Situations where entire slices of the tensor are corrupted may arise in recommender systems with adversarial ratings [10]. A natural approach in this case is to use a convex relaxation of the form
$$\underset{M_1,\, M_2}{\text{minimize}} \;\; \|M_1\|_* + \nu_k \|M_2\|_{1,2} \quad \text{subject to} \quad Z^k_v = M_1 + M_2$$
in place of (6) in Algorithm 1. In the above, $\|M\|_{1,2} := \sum_i \|M_i\|_2$, where $M_i$ is the i-th column of M. Since exact recovery of the block-sparse and low-rank components of the contractions is guaranteed via this relaxation under suitable assumptions [10], the algorithm would inherit the associated provable guarantees.

3) Tensor completion: In applications such as recommendation systems, it may be desirable to perform tensor completion in the presence of sparse corruptions. In [24], an adaptation of Leurgans' algorithm was presented for performing completion from measurements restricted to only four slices of the tensor, with near-optimal sample complexity (under suitable genericity assumptions about the tensor). We note that it is straightforward to blend Algorithm 1 with this method to achieve completion with sparse corruptions. Recalling that Z = X + Y and therefore $Z^3_k = X^3_k + Y^3_k$ (i.e. the k-th mode-3 slice of Z is a sum of sparse and low-rank slices of X and Y), if only a subset of the elements of $Z^3_k$ (say $P_\Lambda(Z^3_k)$) is observed for some index set $\Lambda$, we can replace (6) in Algorithm 1 with
$$\underset{M_1,\, M_2}{\text{minimize}} \;\; \|M_1\|_* + \nu_k \|M_2\|_1 \quad \text{subject to} \quad P_\Lambda\!\left(Z^k_v\right) = P_\Lambda\!\left(M_1 + M_2\right)$$
(a sketch of this masked variant appears after this list). Under suitable incoherence assumptions [6, Theorem 1.2], the above will achieve exact recovery of the slices. Once four slices are accurately recovered, one can then use Leurgans' algorithm to recover the full tensor [24, Theorem 3.6]. Indeed, the above idea can be extended more generally to the concept of deconvolving a sum of sparse and low-rank tensors from separable measurements [24].

4) Non-convex approaches: A basic primitive for sparse and low-rank tensor decomposition used in this paper is the use of (6) for matrix decomposition. More efficient non-convex approaches, such as the ones described in [22], may be used instead to speed up Algorithm 1. These alternative non-convex methods [22] require $O(rn^2)$ operations per iteration and $O(\log(1/\epsilon))$ iterations, resulting in a total complexity of $O(rn^2 \log(1/\epsilon))$ for solving the decomposition of the contractions to an accuracy of $\epsilon$.
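As a sketch of the masked variant of (6) referenced in the tensor-completion extension above (our own cvxpy prototype; the boolean mask encodes the index set Λ, and the data are placeholders):

```python
import cvxpy as cp
import numpy as np

def decompose_masked_slice(Z_obs, mask, nu):
    """Masked variant of (6): only the observed entries (mask == 1) of the
    slice are constrained, as in the completion extension."""
    m, n = Z_obs.shape
    X = cp.Variable((m, n))
    Y = cp.Variable((m, n))
    objective = cp.Minimize(cp.normNuc(X) + nu * cp.sum(cp.abs(Y)))
    constraints = [cp.multiply(mask, X + Y) == mask * Z_obs]
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value

# toy usage: observe 60% of a slice (placeholder data)
Z_slice = np.random.randn(20, 20)
mask = (np.random.rand(20, 20) < 0.6).astype(float)
X_hat, Y_hat = decompose_masked_slice(Z_slice, mask, nu=1.0 / np.sqrt(20))
```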
References

[1] A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade, A tensor approach to learning mixed membership community models, The Journal of Machine Learning Research, 15 (2014), pp. 2239–2312.
[2] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, Tensor decompositions for learning latent variable models, Tech. Rep. 1, 2014.
[3] C. Beckmann and S. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, NeuroImage, 25 (2005), pp. 294–311.
[4] A. Bhaskara, M. Charikar, A. Moitra, and A. Vijayaraghavan, Smoothed analysis of tensor decompositions, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, ACM, 2014, pp. 594–603.
[5] S. Bhojanapalli and S. Sanghavi, A new sampling technique for tensors, arXiv preprint arXiv:1502.05023, 2015.
[6] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, 58 (2011), pp. 11–37.
[7] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9 (2009), pp. 717–772.
[8] R. B. Cattell, Parallel proportional profiles and other principles for determining the choice of factors by rotation, Psychometrika, 9 (1944), pp. 267–283.
[9] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM Journal on Optimization, 21 (2011), pp. 572–596.
[10] Y. Chen, H. Xu, C. Caramanis, and S. Sanghavi, Robust matrix completion and corrupted columns, in Proceedings of the 28th International Conference on Machine Learning (ICML-11), L. Getoor and T. Scheffer, eds., New York, NY, USA, 2011, ACM, pp. 873–880.
[11] N. Goyal, S. Vempala, and Y. Xiao, Fourier PCA and robust tensor decomposition, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, ACM, 2014, pp. 584–593.
[12] C. J. Hillar and L.-H. Lim, Most tensor problems are NP-hard, Journal of the ACM, 60 (2013), pp. 45:1–45:39.
[13] D. Hsu, S. Kakade, and T. Zhang, Robust matrix decomposition with sparse corruptions, IEEE Transactions on Information Theory, 57 (2011), pp. 7221–7234.
[14] B. Huang, C. Mu, D. Goldfarb, and J. Wright, Provable models for robust low-rank tensor completion, Pacific Journal of Optimization, 11 (2015), pp. 339–364.
[15] A. Krishnamurthy and A. Singh, Low-rank matrix and tensor completion via adaptive sampling, in Advances in Neural Information Processing Systems, 2013.
[16] J. B. Kruskal, Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra and its Applications, 18 (1977).
[17] V. Kuleshov, A. Chaganty, and P. Liang, Tensor factorization via matrix factorization, arXiv.org, 2015.
[18] S. Leurgans, R. Ross, and R. Abel, A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 14 (1993), pp. 1064–1083.
[19] Q. Li, A. Prater, L. Shen, and G. Tang, Overcomplete tensor decomposition via convex optimization, in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Cancun, Mexico, Dec. 2015.
[20] N. Mesgarani, M. Slaney, and S. A. Shamma, Discrimination of speech from non-speech based on multiscale spectro-temporal modulations, IEEE Transactions on Audio, Speech and Language Processing, 14 (2006), pp. 920–930.
[21] C. Mu, B. Huang, J. Wright, and D. Goldfarb, Square deal: Lower bounds and improved relaxations for tensor recovery, arXiv preprint arXiv:1307.5870, 2013.
[22] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, Non-convex robust PCA, in Advances in Neural Information Processing Systems, 2014.
[23] N. Rao, P. Shah, and S. Wright,
Forward-backward greedy algorithms for signal demixing, in Signals, Systems and Computers, 2013 Asilomar Conference on, IEEE, 2014.
[24] P. Shah, N. Rao, and G. Tang, Optimal low-rank tensor recovery from separable measurements: Four contractions suffice, arXiv.org, 2015.
[25] G. Tang and P. Shah, Guaranteed tensor decomposition: A moment approach, in International Conference on Machine Learning (ICML 2015), 2015, pp. 1491–1500.
[26] R. Tomioka, K. Hayashi, and H. Kashima, Estimation of low-rank tensors via convex optimization, arXiv preprint arXiv:1010.0789, 2011.
[27] M. Yuan and C.-H. Zhang, On tensor completion via nuclear norm minimization, arXiv preprint arXiv:1405.1773, 2014.