Journal of Machine Learning Research 18 (2017) 1-29. Submitted 7/17; Revised 8/17; Published 9/17

Rank Determination for Low-Rank Data Completion

Morteza Ashraphijuo (ASHRAPHIJUO@EE.COLUMBIA.EDU), Columbia University, New York, NY 10027, USA
Xiaodong Wang (WANGX@EE.COLUMBIA.EDU), Columbia University, New York, NY 10027, USA
Vaneet Aggarwal (VANEET@PURDUE.EDU), Purdue University, West Lafayette, IN 47907, USA

Editor: Animashree Anandkumar

©2017 Morteza Ashraphijuo, Xiaodong Wang, and Vaneet Aggarwal. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v18/17-375.html.

Abstract

Recently, fundamental conditions on the sampling patterns have been obtained for finite completability of low-rank matrices or tensors given the corresponding ranks. In this paper, we consider the scenario where the rank is not given, and we aim to approximate the unknown rank based on the locations of the sampled entries and some given completion. We consider a number of data models, including single-view matrix, multi-view matrix, CP tensor, tensor-train tensor and Tucker tensor. For each of these data models, we provide an upper bound on the rank when an arbitrary low-rank completion is given. We characterize these bounds both deterministically, i.e., with probability one given that the sampling pattern satisfies certain combinatorial properties, and probabilistically, i.e., with high probability given that the sampling probability is above some threshold. Moreover, for both the single-view matrix and the CP tensor, we show that the obtained upper bound is exactly equal to the unknown rank if the lowest-rank completion is given. Furthermore, we provide numerical experiments for the case of the single-view matrix, where we use nuclear norm minimization to find a low-rank completion of the sampled data, and we observe that in most cases the proposed upper bound on the rank is equal to the true rank.

Keywords: Low-rank data completion, rank estimation, tensor, matrix, manifold, Tucker rank, tensor-train rank, CP rank, multi-view matrix.

1. Introduction

Developing methods and algorithms to analyze large high-dimensional data is becoming ever more indispensable as hyperspectral images and videos, product ranking datasets and other big-data applications attract growing attention. Moreover, in order to guarantee the same level of efficiency in images or videos, a minor increment in the dimensionality of a dataset entails a significant increment in the amount of data, and this fact creates modeling as well as computational challenges for analyzing big high-dimensional datasets. Consequently, providing a statistically rigorous result requires a massive amount of data that grows exponentially with the dimension.

The low-rank data completion problem is concerned with completing a matrix or tensor given a subset of its entries and some rank constraints. Various applications can be found in many fields, including image and signal processing (Candès et al., 2013; Ji et al., 2010), data mining (Eldén, 2007),
network coding (Harvey et al., 2005), compressed sensing (Lim and Comon, 2010; Sidiropoulos and Kyrillidis, 2012; Ashraphijuo et al., 2016c; Gandy et al., 2011; Ashraphijuo and Wang, 2017c; Ashraphijuo et al., 2015), reconstruction of visual data (Liu et al., 2013), bioinformatics and systems biology (Ogundijo et al., 2017; Ogundijo et al.), fingerprinting (Liu et al., 2016), etc. There is an extensive literature on optimization methods for this problem, including minimizing a convex relaxation of the rank (Candès and Recht, 2009; Candès and Tao, 2010; Cai et al., 2010; Gandy et al., 2011; Ashraphijuo et al., 2016b; Ashraphijuo and Wang, 2017c), non-convex approaches (Recht and Ré, 2013), and alternating minimization (Jain et al., 2013; Ge et al., 2016). More recently, deterministic conditions on the sampling patterns have been studied for subspace clustering in (Pimentel-Alarcón et al., 2016c, 2015, 2016a,b). Moreover, the fundamental conditions on the sampling pattern that lead to different numbers of completions (unique, finite, or infinite) for different data structures given the corresponding rank constraints have been investigated in (Pimentel-Alarcón et al., 2016d; Ashraphijuo et al., 2017c; Ashraphijuo and Wang, 2017b; Ashraphijuo et al., 2016a; Ashraphijuo and Wang, 2017a; Ashraphijuo et al., 2017d,a,b).

However, in many practical low-rank data completion problems, the rank may not be known a priori. In this paper, we investigate this problem and aim to approximate the rank based on the given entries, where it is assumed that the original data is generically chosen from the manifold corresponding to the unknown rank. The only existing work that treats this problem for single-view matrix data based on the sampling pattern is (Pimentel-Alarcón and Nowak, 2016), which requires some strong assumptions, including the existence of a completion whose rank $r$ is a lower bound on the unknown true rank $r^*$, i.e., $r \leq r^*$. We start by investigating the single-view matrix to provide a new analysis that does not require such an assumption, and we then extend our approach to treat the CP-rank tensor model. Moreover, we further generalize our approach to treat vector-rank data models, including the multi-view matrix, the Tucker-rank tensor and the tensor-train (TT) rank tensor. For each of these data models, we obtain an upper bound on the unknown scalar rank, or a component-wise upper bound on the unknown vector rank, deterministically based on the sampling pattern and the rank of a given completion. We also obtain such bounds that hold with high probability based on the sampling probability. Moreover, for the single-view matrix, we provide numerical results to show how tight our probabilistic bounds on the rank are (in terms of the sampling probability). In particular, we use nuclear norm minimization to find a completion and demonstrate our proposed method in obtaining a tight bound on the unknown rank.

In general, providing a completion requires far fewer samples than recovering the original sampled data. The goal of this paper is to solve the fundamental problem of rank determination for the original sampled data given an arbitrary low-rank data completion. One possible application scenario is to improve upon the low-rank completion obtained by convex relaxation methods.
Specifically, using convex optimization (minimization of nuclear and atomic norms, or summation of nuclear norms of matricizations and unfoldings) or any other method in the literature, we may be able to find a fairly low-rank completion of the original data, which is not necessarily equal (or even close) to the original sampled data. Then, under some circumstances, the rank of the completion obtained using any rank-independent method can serve as an upper bound on the rank of the original sampled data (and sometimes the obtained rank is exactly equal to the rank of the original sampled data). We take advantage of the geometric analysis on the manifold of the corresponding data, which leads to the fundamental conditions on the sampling pattern (independent of the values of the entries) (Pimentel-Alarcón et al., 2016d; Ashraphijuo et al., 2017c; Ashraphijuo and Wang, 2017b; Ashraphijuo et al., 2016a; Ashraphijuo and Wang, 2017a), such that given an arbitrary low-rank completion we can provide a tight upper bound on the rank.

To illustrate how such an approximation is even possible, consider the following example. Assume that an $n_1 \times n_2$ rank-2 matrix is chosen generically from the corresponding manifold. Hence, any $2 \times 2$ submatrix of this matrix is full rank with probability one (due to the genericity assumption). Moreover, note that any $3 \times 3$ submatrix of this matrix is not full rank. As a result, by observing the sampled entries we can find some bounds on the rank. Using the analyses in (Pimentel-Alarcón et al., 2016d; Ashraphijuo et al., 2017c; Ashraphijuo and Wang, 2017b; Ashraphijuo et al., 2016a; Ashraphijuo and Wang, 2017a) on finite completability of the sampled data (a finite number of completions) for different data models, we characterize both deterministic and probabilistic bounds on the unknown rank.

The remainder of the paper is organized as follows. In Section 2, we introduce the data models and the problem statement. In Sections 3 and 4, we characterize our deterministic and probabilistic bounds for the scalar-rank cases (single-view matrix and CP tensor) and the vector-rank cases (multi-view matrix, Tucker tensor and TT tensor), respectively. Finally, Section 5 concludes the paper.

2. Data Models and Problem Statement

2.1 Matrix Models

2.1.1 SINGLE-VIEW MATRIX

Assume that the sampled matrix $\mathbf{U}$ is chosen generically from the manifold of $n_1 \times n_2$ matrices of rank $r^*$, where $r^*$ is unknown. The matrix $\mathbf{V} \in \mathbb{R}^{n_1 \times r^*}$ is called a basis for $\mathbf{U}$ if each column of $\mathbf{U}$ can be written as a linear combination of the columns of $\mathbf{V}$. Denote by $\Omega$ the binary sampling pattern matrix that is of the same size as $\mathbf{U}$, with $\Omega(\vec{x}) = 1$ if $\mathbf{U}(\vec{x})$ is observed and $\Omega(\vec{x}) = 0$ otherwise, where $\vec{x} = (x_1, x_2)$ represents the entry corresponding to row number $x_1$ and column number $x_2$. Moreover, define $\mathbf{U}_\Omega$ as the matrix obtained by sampling $\mathbf{U}$ according to $\Omega$, i.e.,

$\mathbf{U}_\Omega(\vec{x}) = \mathbf{U}(\vec{x})$ if $\Omega(\vec{x}) = 1$, and $\mathbf{U}_\Omega(\vec{x}) = 0$ if $\Omega(\vec{x}) = 0$.  (1)

2.1.2 MULTI-VIEW MATRIX

The matrix $\mathbf{U} \in \mathbb{R}^{n \times (n_1+n_2)}$ is sampled. Denote a partition of $\mathbf{U}$ as $\mathbf{U} = [\mathbf{U}_1|\mathbf{U}_2]$, where $\mathbf{U}_1 \in \mathbb{R}^{n \times n_1}$ and $\mathbf{U}_2 \in \mathbb{R}^{n \times n_2}$ represent the first and second views of the data, respectively. The sampling pattern is defined as $\Omega = [\Omega_1|\Omega_2]$, where $\Omega_1$ and $\Omega_2$ represent the sampling patterns corresponding to the first and second views of the data, respectively. Assume that $\mathrm{rank}(\mathbf{U}_1) = r_1^*$, $\mathrm{rank}(\mathbf{U}_2) = r_2^*$ and $\mathrm{rank}(\mathbf{U}) = r^*$, and also that $\mathbf{U}$ is chosen generically from the manifold structure with the above parameters. Denote $\mathbf{r}^* = (r_1^*, r_2^*, r^*)$, which is assumed unknown.
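The sampling model (1) is straightforward to set up numerically. Below is a minimal numpy sketch of generating a generic low-rank matrix and its sampled version $\mathbf{U}_\Omega$; the dimensions, rank, and sampling probability are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r_true = 30, 40, 2

# A generic rank-2 matrix: product of two random factor matrices.
U = rng.standard_normal((n1, r_true)) @ rng.standard_normal((r_true, n2))

# Binary sampling pattern Omega and the sampled matrix U_Omega as in (1):
# entries agree with U where Omega = 1 and are zero elsewhere.
p = 0.5
Omega = (rng.random((n1, n2)) < p).astype(int)
U_Omega = np.where(Omega == 1, U, 0.0)
```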
2.2 Tensor Models

Assume that a $d$-way tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is sampled. For the sake of simplicity in notation, define $N_i \triangleq \prod_{j=1}^{i} n_j$, $\bar{N}_i \triangleq \prod_{j=i+1}^{d} n_j$ and $N_{-i} \triangleq \frac{N_d}{n_i}$. Denote by $\Omega$ the binary sampling pattern tensor that is of the same size as $\mathcal{U}$, with $\Omega(\vec{x}) = 1$ if $\mathcal{U}(\vec{x})$ is observed and $\Omega(\vec{x}) = 0$ otherwise, where $\mathcal{U}(\vec{x})$ represents the entry of the tensor $\mathcal{U}$ with coordinate $\vec{x} = (x_1, \dots, x_d)$. Moreover, define $\mathcal{U}_\Omega$ as the tensor obtained by sampling $\mathcal{U}$ according to $\Omega$, i.e.,

$\mathcal{U}_\Omega(\vec{x}) = \mathcal{U}(\vec{x})$ if $\Omega(\vec{x}) = 1$, and $\mathcal{U}_\Omega(\vec{x}) = 0$ if $\Omega(\vec{x}) = 0$.  (2)

For each subtensor $\mathcal{U}'$ of the tensor $\mathcal{U}$, define $N_\Omega(\mathcal{U}')$ as the number of observed entries in $\mathcal{U}'$ according to the sampling pattern $\Omega$.

Define the matrix $\tilde{\mathbf{U}}_{(i)} \in \mathbb{R}^{N_i \times \bar{N}_i}$ as the $i$-th unfolding of the tensor $\mathcal{U}$, such that $\mathcal{U}(\vec{x}) = \tilde{\mathbf{U}}_{(i)}(\tilde{M}_i(x_1,\dots,x_i), \tilde{M}_{-i}(x_{i+1},\dots,x_d))$, where $\tilde{M}_i : (x_1,\dots,x_i) \to \{1, 2, \dots, N_i\}$ and $\tilde{M}_{-i} : (x_{i+1},\dots,x_d) \to \{1, 2, \dots, \bar{N}_i\}$ are two bijective mappings. Let $\mathbf{U}_{(i)} \in \mathbb{R}^{n_i \times N_{-i}}$ be the $i$-th matricization of the tensor $\mathcal{U}$, such that $\mathcal{U}(\vec{x}) = \mathbf{U}_{(i)}(x_i, M_i(x_1,\dots,x_{i-1},x_{i+1},\dots,x_d))$, where $M_i : (x_1,\dots,x_{i-1},x_{i+1},\dots,x_d) \to \{1, 2, \dots, N_{-i}\}$ is a bijective mapping. Observe that for any arbitrary tensor $\mathcal{A}$, the first matricization and the first unfolding are the same, i.e., $\mathbf{A}_{(1)} = \tilde{\mathbf{A}}_{(1)}$. In what follows, we introduce three different tensor ranks: the CP rank, the Tucker rank and the TT rank.

2.2.1 CP DECOMPOSITION

The CP rank of a tensor $\mathcal{U}$, $\mathrm{rank}_{\mathrm{CP}}(\mathcal{U}) = r$, is defined as the minimum number $r$ such that there exist $\mathbf{a}_i^l \in \mathbb{R}^{n_i}$ for $1 \leq i \leq d$ and $1 \leq l \leq r$, such that

$\mathcal{U} = \sum_{l=1}^{r} \mathbf{a}_1^l \otimes \mathbf{a}_2^l \otimes \cdots \otimes \mathbf{a}_d^l$,  (3)

or equivalently,

$\mathcal{U}(x_1, x_2, \dots, x_d) = \sum_{l=1}^{r} \mathbf{a}_1^l(x_1)\,\mathbf{a}_2^l(x_2)\cdots\mathbf{a}_d^l(x_d)$,  (4)

where $\otimes$ denotes the tensor (outer) product and $\mathbf{a}_i^l(x_i)$ denotes the $x_i$-th entry of the vector $\mathbf{a}_i^l$. Note that $\mathbf{a}_1^l \otimes \mathbf{a}_2^l \otimes \cdots \otimes \mathbf{a}_d^l \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is a rank-1 tensor, $l = 1, 2, \dots, r$.

2.2.2 TUCKER DECOMPOSITION

Given $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ and $\mathbf{X} \in \mathbb{R}^{n_i \times n'_i}$, the product $\mathcal{U}' = \mathcal{U} \times_i \mathbf{X} \in \mathbb{R}^{n_1 \times \cdots \times n_{i-1} \times n'_i \times n_{i+1} \times \cdots \times n_d}$ is defined as

$\mathcal{U}'(x_1,\dots,x_{i-1},k_i,x_{i+1},\dots,x_d) = \sum_{x_i=1}^{n_i} \mathcal{U}(x_1,\dots,x_{i-1},x_i,x_{i+1},\dots,x_d)\,\mathbf{X}(x_i,k_i)$.  (5)

The Tucker rank of a tensor $\mathcal{U}$ is defined as $\mathrm{rank}_{\mathrm{Tucker}}(\mathcal{U}) = \mathbf{r} = (m_1, \dots, m_d)$, where $m_i = \mathrm{rank}(\mathbf{U}_{(i)})$, i.e., the rank of the $i$-th matricization, $i = 1, \dots, d$. The Tucker decomposition of $\mathcal{U}$ is given by

$\mathcal{U}(x_1,\dots,x_d) = \sum_{k_1=1}^{m_1} \cdots \sum_{k_d=1}^{m_d} \mathcal{C}(k_1,\dots,k_d)\,\mathbf{T}_1(k_1,x_1)\cdots\mathbf{T}_d(k_d,x_d)$,  (6)

or in short

$\mathcal{U} = \mathcal{C} \times_{i=1}^{d} \mathbf{T}_i$,  (7)

where $\mathcal{C} \in \mathbb{R}^{m_1 \times \cdots \times m_d}$ is the core tensor and the $\mathbf{T}_i \in \mathbb{R}^{m_i \times n_i}$ are $d$ orthogonal matrices.

2.2.3 TT DECOMPOSITION

The separation or TT rank of a tensor is defined as $\mathrm{rank}_{\mathrm{TT}}(\mathcal{U}) = \mathbf{r} = (u_1, \dots, u_{d-1})$, where $u_i = \mathrm{rank}(\tilde{\mathbf{U}}_{(i)})$, i.e., the rank of the $i$-th unfolding, $i = 1, \dots, d-1$. Note that $u_i \leq \min\{N_i, \bar{N}_i\}$ in general, and that $u_1$ is simply the conventional matrix rank when $d = 2$. The TT decomposition of a tensor $\mathcal{U}$ is given by

$\mathcal{U}(x_1,\dots,x_d) = \sum_{k_1=1}^{u_1} \cdots \sum_{k_{d-1}=1}^{u_{d-1}} \mathbf{U}^{(1)}(x_1,k_1) \left(\prod_{i=2}^{d-1} \mathcal{U}^{(i)}(k_{i-1},x_i,k_i)\right) \mathbf{U}^{(d)}(k_{d-1},x_d)$,  (8)

or in short

$\mathcal{U} = \mathcal{U}^{(1)} \cdots \mathcal{U}^{(d)}$,  (9)

where the 3-way tensors $\mathcal{U}^{(i)} \in \mathbb{R}^{u_{i-1} \times n_i \times u_i}$ for $i = 2, \dots, d-1$ and the matrices $\mathbf{U}^{(1)} \in \mathbb{R}^{n_1 \times u_1}$ and $\mathbf{U}^{(d)} \in \mathbb{R}^{u_{d-1} \times n_d}$ are the components of this decomposition.

For each matrix or tensor model, we assume that the true rank of $\mathbf{U}$ or $\mathcal{U}$ is $r^*$ or $\mathbf{r}^*$, which is unknown, and also that $\mathbf{U}$ or $\mathcal{U}$ is chosen generically from the corresponding manifold. The table below summarizes the notation.

| Data structure | Sampled data | Rank | Comments |
|---|---|---|---|
| Single-view matrix | $\mathbf{U} \in \mathbb{R}^{n_1 \times n_2}$ | $r^*$ | |
| Multi-view matrix | $\mathbf{U} = [\mathbf{U}_1\,\vert\,\mathbf{U}_2] \in \mathbb{R}^{n \times (n_1+n_2)}$ | $\mathbf{r}^* = (r_1^*, r_2^*, r^*)$ | $r_i^* = \mathrm{rank}(\mathbf{U}_i)$ |
| CP tensor | $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ | $r^*$ | |
| Tucker tensor | $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ | $\mathbf{r}^* = (m_1^*, \dots, m_d^*)$ | $m_i^* = \mathrm{rank}(\mathbf{U}_{(i)})$ |
| TT tensor | $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ | $\mathbf{r}^* = (u_1^*, \dots, u_{d-1}^*)$ | $u_i^* = \mathrm{rank}(\tilde{\mathbf{U}}_{(i)})$ |
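The unfoldings and matricizations defined above are plain index reshufflings of the underlying array, so the Tucker and TT ranks can be evaluated directly. A small numpy sketch (the helper names are ours; the paper's $i$-th matricization corresponds to 0-based mode $i-1$ here):

```python
import numpy as np

def unfolding(T, i):
    """i-th unfolding: rows indexed by modes 1..i, columns by modes i+1..d."""
    dims = T.shape
    return T.reshape(int(np.prod(dims[:i])), -1)

def matricization(T, mode):
    """Matricization for the given 0-based mode: its fibers become rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Tucker rank: ranks of all matricizations; TT rank: ranks of unfoldings 1..d-1.
T = np.random.default_rng(1).standard_normal((4, 5, 6))
tucker_rank = [np.linalg.matrix_rank(matricization(T, i)) for i in range(T.ndim)]
tt_rank = [np.linalg.matrix_rank(unfolding(T, i)) for i in range(1, T.ndim)]
```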
2.3 Problem Statement

In this paper, we assume that there exists a full-rank completion of the sampled data (i.e., the data is not over-sampled). For each of the above data models, we are interested in obtaining an upper bound on the unknown scalar rank $r^*$, or a component-wise upper bound on the unknown vector rank $\mathbf{r}^*$, deterministically based on the sampling pattern $\Omega$ and the rank of a given completion. We also aim to provide such a bound that holds with high probability based only on the sampling probability of the entries and the rank of a given completion. Moreover, for the single-view matrix model and the CP-rank tensor model, where the rank is a scalar, we provide both deterministic and probabilistic conditions under which the unknown rank can be exactly determined.

3. Scalar-Rank Cases

3.1 Single-View Matrix

Previously, this problem was treated in (Pimentel-Alarcón and Nowak, 2016), where strong assumptions, including the existence of a completion with rank $r \leq r^*$, were used. In this section, we provide an analysis that does not require such an assumption; moreover, our analysis can be extended to multi-view data and tensors in the following sections. Furthermore, we show the tightness of our theoretical bounds via numerical examples.

Assume that $\mathbf{U} \in \mathbb{R}^{n_1 \times n_2}$ is the sampled matrix. Let $P_1$ and $P_2$ denote the Lebesgue measures on $\mathbb{R}^{n_1 \times r^*}$ and $\mathbb{R}^{r^* \times n_2}$, respectively. In this paper, we assume that the matrix $\mathbf{U}$ is chosen generically from the manifold of $n_1 \times n_2$ matrices of rank $r^*$, i.e., the entries of $\mathbf{U}$ are drawn independently with respect to the Lebesgue measure on the corresponding manifold. Hence, the probability measure for all statements in this subsection is the product measure $P_1 \times P_2$.

3.1.1 DETERMINISTIC RANK ANALYSIS

The following condition will be used frequently in this subsection.

Condition $A_r$: Each column of the sampled matrix includes at least $r$ sampled entries.

Consider an arbitrary column $\mathbf{U}(:, i)$ of the sampled matrix, where $i \in \{1, \dots, n_2\}$. Let $l_i = N_\Omega(\mathbf{U}(:, i))$ denote the number of observed entries in the $i$-th column of $\mathbf{U}$. Condition $A_r$ implies that $l_i \geq r$.

We construct a binary-valued matrix, called the constraint matrix $\tilde{\Omega}_r$, based on $\Omega$ and a given number $r$. Specifically, for column $i$ we construct $l_i - r$ columns with binary entries based on the locations of the observed entries in $\mathbf{U}(:, i)$, such that each column has exactly $r+1$ entries equal to one. Assume that $x_1, \dots, x_{l_i}$ are the row indices of all observed entries in this column. Let $\tilde{\Omega}_r^i$ be the $n_1 \times (l_i - r)$ matrix corresponding to this column, defined as follows: for any $j \in \{1, \dots, l_i - r\}$, the $j$-th column has the value 1 in rows $\{x_1, \dots, x_r, x_{r+j}\}$ and zeros elsewhere. Define the binary constraint matrix as $\tilde{\Omega}_r = [\tilde{\Omega}_r^1|\tilde{\Omega}_r^2|\cdots|\tilde{\Omega}_r^{n_2}] \in \mathbb{R}^{n_1 \times K_r}$ (Pimentel-Alarcón et al., 2016d), where $K_r = N_\Omega(\mathbf{U}) - n_2 r$.

Condition $B_r$: There exists a submatrix $\breve{\Omega}_r \in \mathbb{R}^{n_1 \times K}$ of $\tilde{\Omega}_r$ (a submatrix is specified by a subset of rows and a subset of columns, not necessarily consecutive) such that $K = n_1 r - r^2$ and for any $K' \in \{1, 2, \dots, K\}$ and any submatrix $\breve{\Omega}'_r \in \mathbb{R}^{n_1 \times K'}$ of $\breve{\Omega}_r$ we have

$r f(\breve{\Omega}'_r) - r^2 \geq K'$,  (10)

where $f(\breve{\Omega}'_r)$ denotes the number of nonzero rows of $\breve{\Omega}'_r$.

Note that exhaustive enumeration is needed to check whether or not Condition $B_r$ holds. Hence, the deterministic analysis cannot be used in practice for large-scale data. However, it serves as the basis of the subsequent probabilistic analysis, which leads to a simple lower bound on the sampling probability such that Condition $B_r$ holds with high probability, and which is of practical value.
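The construction of the constraint matrix $\tilde{\Omega}_r$ is purely combinatorial and can be transcribed directly from the description above. A sketch, assuming Condition $A_r$ holds so that $l_i \geq r$ for every column (the function name is ours):

```python
import numpy as np

def constraint_matrix(Omega, r):
    """Build the constraint matrix of Section 3.1.1: for each column with
    observed row indices x_1,...,x_l (l >= r), append l - r binary columns,
    the j-th of which has ones exactly in rows {x_1,...,x_r, x_{r+j}}."""
    n1, n2 = Omega.shape
    cols = []
    for i in range(n2):
        rows = np.flatnonzero(Omega[:, i])   # observed row indices of column i
        l = rows.size
        for j in range(l - r):
            c = np.zeros(n1, dtype=int)
            c[rows[:r]] = 1                  # the fixed rows x_1, ..., x_r
            c[rows[r + j]] = 1               # plus the row x_{r+j}
            cols.append(c)
    # Total number of columns is K_r = N_Omega(U) - n2 * r, as in the text.
    return np.column_stack(cols) if cols else np.zeros((n1, 0), dtype=int)
```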
In the following, we restate Theorem 1 in (Pimentel-Alarcón et al., 2016d), which will be used later.

Lemma 1 With probability one, there are finitely many rank-$r$ completions of the sampled matrix if and only if Conditions $A_r$ and $B_r$ hold.

Recall that the true rank $r^*$ is assumed unknown.

Definition 2 Let $S_\Omega$ denote the set of all natural numbers $r$ such that both Conditions $A_r$ and $B_r$ hold.

Lemma 3 There exists a number $r_\Omega$ such that $S_\Omega = \{1, 2, \dots, r_\Omega\}$.

Proof Assume that $1 < r \leq \min\{n_1, n_2\}$ and $r \in S_\Omega$. It suffices to show that $r-1 \in S_\Omega$. By contradiction, assume that $r-1 \notin S_\Omega$. Then, according to Lemma 1, there exist infinitely many completions of $\mathbf{U}$ of rank $r-1$ and at most finitely many completions of $\mathbf{U}$ of rank $r$. Consider a rank-$(r-1)$ completion $\mathbf{U}_{r-1}$. Note that changing one single (non-observed) entry of $\mathbf{U}_{r-1}$, for example $\mathbf{U}_{r-1}(1,1) = x$, to a random number $y \in \mathbb{R}$ changes the rank of this matrix by at most 1, and since we are changing it to a random number, it is easily seen that the rank does not decrease with probability one. Hence, the rank of the modified matrix $\mathbf{U}'_{r-1}$ is either $r-1$ or $r$.

Assume that the rank has increased to $r$. Then we show that there exist infinitely many completions of rank $r$, which contradicts the assumption. In fact, for any value of $\mathbf{U}_{r-1}(1,1)$ except $x$, this matrix is a rank-$r$ completion. To see this more clearly, consider the $r \times r$ submatrix of $\mathbf{U}'_{r-1}$ whose determinant became nonzero upon changing the value of $\mathbf{U}_{r-1}(1,1)$. It is easily observed that this submatrix includes $\mathbf{U}'_{r-1}(1,1)$; assume it is $\mathbf{U}'_{r-1}(1:r, 1:r)$. Then the determinant of $\mathbf{U}'_{r-1}(2:r, 2:r)$ is a nonzero number (otherwise the rank would not increase by changing the value of $\mathbf{U}_{r-1}(1,1)$). Hence, it is easy to see that for any value of $\mathbf{U}_{r-1}(1,1)$ except $x$, $\mathbf{U}'_{r-1}$ is a rank-$r$ completion, and therefore there exist infinitely many completions of rank $r$; the proof is complete in this scenario.

Now assume that changing any of the non-observed entries does not increase the rank of $\mathbf{U}_{r-1}$. Then this contradicts the assumption that there exists a full-rank completion of the sampled data, since there would not exist any completion of rank higher than $r-1$.

The following theorem provides a relationship between the unknown rank $r^*$ and $r_\Omega$.

Theorem 4 With probability one, exactly one of the following statements holds:
(i) $r^* \in S_\Omega = \{1, 2, \dots, r_\Omega\}$;
(ii) for any arbitrary completion of the sampled matrix $\mathbf{U}$ of rank $r$, we have $r \notin S_\Omega$.

Proof Suppose that there does not exist a completion of the sampled matrix $\mathbf{U}$ of rank $r$ such that $r \in S_\Omega$. Then it is easily verified that statement (ii) holds and statement (i) does not hold. On the other hand, assume that there exists a completion of the sampled matrix $\mathbf{U}$ of rank $r$, where $r \in S_\Omega$. Hence, statement (ii) does not hold, and to complete the proof it suffices to show that with probability one, statement (i) holds.

Observe that $r_\Omega \in S_\Omega$, and therefore Condition $A_{r_\Omega}$ holds. Hence, each column of $\mathbf{U}$ includes at least $r_\Omega + 1$ observed entries. On the other hand, the existence of a completion of the sampled matrix $\mathbf{U}$ of rank $r \in S_\Omega$ implies the existence of a basis $\mathbf{X} \in \mathbb{R}^{n_1 \times r}$ such that each column of $\mathbf{U}$ is a linear combination of the columns of $\mathbf{X}$, and thus there exists $\mathbf{Y} \in \mathbb{R}^{r \times n_2}$ such that $\mathbf{U}_\Omega = (\mathbf{XY})_\Omega$.
Hence, given $\mathbf{X}$, each observed entry $\mathbf{U}(i, j)$ results in a degree-1 polynomial in terms of the entries of $\mathbf{Y}$, as follows:

$\mathbf{U}(i, j) = \sum_{l=1}^{r} \mathbf{X}(i, l)\,\mathbf{Y}(l, j)$.  (11)

Consider the first column of $\mathbf{U}$ and recall that it includes at least $r_\Omega + 1 \geq r + 1$ observed entries. The genericity of the coefficients of the above-mentioned polynomials implies that, using $r$ of the observed entries, the first column of $\mathbf{Y}$ can be determined uniquely. This is because there exists a unique solution to a system of $r$ linearly independent linear equations in $r$ variables. Then there exists at least one more observed entry besides these $r$ observed entries in the first column of $\mathbf{U}$, and it can be written as a linear combination of the $r$ observed entries that were used to obtain the first column of $\mathbf{Y}$. Let $\mathbf{U}(i_1, 1), \dots, \mathbf{U}(i_r, 1)$ denote the $r$ observed entries that were used to obtain the first column of $\mathbf{Y}$, and let $\mathbf{U}(i_{r+1}, 1)$ denote the other observed entry. Hence, the existence of a completion of the sampled matrix $\mathbf{U}$ of rank $r \in S_\Omega$ results in an equation of the form

$\mathbf{U}(i_{r+1}, 1) = \sum_{l=1}^{r} t_l\, \mathbf{U}(i_l, 1)$,  (12)

where the $t_l$'s are constant scalars, $l = 1, \dots, r$.

Assume that $r^* \notin S_\Omega$, i.e., statement (i) does not hold. Then note that $r^* \geq r + 1$ and $\mathbf{U}$ is chosen generically from the manifold of $n_1 \times n_2$ rank-$r^*$ matrices, and therefore an equation of the form (12) holds with probability zero. Moreover, according to Lemma 1, there exist at most finitely many completions of the sampled matrix of rank $r$. Therefore, there exists a completion of $\mathbf{U}$ of rank $r$ with probability zero, which contradicts the initial assumption that there exists a completion of the sampled matrix $\mathbf{U}$ of rank $r$, where $r \in S_\Omega$.

Note that the existing work treating the similar problem for single-view matrix data based on the sampling pattern is (Pimentel-Alarcón and Nowak, 2016), which requires some strong assumptions, including the existence of a completion whose rank $r$ is a lower bound on the unknown true rank $r^*$, i.e., $r \leq r^*$. We provide a new analysis that does not require such an assumption, and based on our new analysis we can extend our approach to treat other data structures.

Corollary 5 Consider an arbitrary number $r \in S_\Omega$. Similar to Theorem 4, it follows that with probability one, exactly one of the following holds:
(i) $r^* \in \{1, 2, \dots, r\}$;
(ii) for any arbitrary completion of the sampled matrix $\mathbf{U}$ of rank $r'$, we have $r' \notin \{1, 2, \dots, r\}$.

As a result of Corollary 5, we have the following.

Corollary 6 Assuming that there exists a rank-$r$ completion of the sampled matrix $\mathbf{U}$ such that $r \in S_\Omega$, then with probability one $r^* \leq r$.

Corollary 7 Let $\mathbf{U}^*$ denote an optimal solution to the following NP-hard optimization problem:

$\min_{\mathbf{U}' \in \mathbb{R}^{n_1 \times n_2}} \mathrm{rank}(\mathbf{U}')$ subject to $\mathbf{U}'_\Omega = \mathbf{U}_\Omega$.  (13)

Also, let $\hat{\mathbf{U}}$ denote a suboptimal solution to the above optimization problem. Then, Corollary 5 yields the following statements:
(i) If $\mathrm{rank}(\mathbf{U}^*) \in S_\Omega$, then $r^* = \mathrm{rank}(\mathbf{U}^*)$ with probability one.
(ii) If $\mathrm{rank}(\hat{\mathbf{U}}) \in S_\Omega$, then $r^* \leq \mathrm{rank}(\hat{\mathbf{U}})$ with probability one.

Remark 8 One challenge in applying Corollary 7, or any of the other deterministic results obtained here, is the computation of $S_\Omega$, which involves exhaustive enumeration to check Condition $B_r$. Next, for each number $r$, we provide a lower bound on the sampling probability in terms of $r$ that ensures $r \in S_\Omega$ with high probability. Consequently, we do not need to compute $S_\Omega$; instead, we can certify the above results with high probability.
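While Condition $B_r$ requires exhaustive enumeration (Remark 8), Condition $A_r$ is a cheap necessary screen on $S_\Omega$: every column must carry at least $r$ samples. A one-line check (a hypothetical helper of ours, not from the paper):

```python
import numpy as np

def max_r_condition_A(Omega):
    """Largest r for which Condition A_r holds, i.e., every column of the
    sampling pattern has at least r observed entries. This only upper-bounds
    r_Omega; Condition B_r must still be verified for membership in S_Omega."""
    return int(Omega.sum(axis=0).min())
```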
3.1.2 PROBABILISTIC RANK ANALYSIS

The following lemma is a restatement of Theorem 3 in (Pimentel-Alarcón et al., 2016d), which is the probabilistic version of Lemma 1.

Lemma 9 Suppose $r \leq \frac{n_1}{6}$ and that each column of the sampled matrix is observed in at least $l$ entries, uniformly at random and independently across entries, where

$l > \max\left\{12 \log\left(\frac{n_1}{\epsilon}\right) + 12,\; 2r\right\}$.  (14)

Also, assume that $r(n_1 - r) \leq n_2$. Then, with probability at least $1 - \epsilon$, $r \in S_\Omega$.

The following lemma is taken from (Ashraphijuo et al., 2016a) and will be used to derive a lower bound on the sampling probability that leads to a statement similar to Theorem 4 with high probability.

Lemma 10 Consider a vector with $n$ entries, where each entry is observed with probability $p$ independently of the other entries. If $p > p' = \frac{k}{n} + \frac{1}{\sqrt[4]{n}}$, then with probability at least $1 - \exp(-\frac{\sqrt{n}}{2})$, more than $k$ entries are observed.

The following proposition characterizes the probabilistic version of Theorem 4.

Proposition 11 Suppose $r \leq \frac{n_1}{6}$, $r(n_1 - r) \leq n_2$, and that each entry of the sampled matrix is observed uniformly at random and independently across entries with probability $p$, where

$p > \frac{1}{n_1}\max\left\{12 \log\left(\frac{n_1}{\epsilon}\right) + 12,\; 2r\right\} + \frac{1}{\sqrt[4]{n_1}}$.  (15)

Then, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$, we have $r \in S_\Omega$.

Proof Consider an arbitrary column of $\mathbf{U}$ and note that, by Lemma 10, the number of observed entries in this column of $\mathbf{U}$ is greater than $\max\{12 \log(\frac{n_1}{\epsilon}) + 12, 2r\}$ with probability at least $1 - \exp(-\frac{\sqrt{n_1}}{2})$. Therefore, the number of sampled entries in each column satisfies

$l > \max\left\{12 \log\left(\frac{n_1}{\epsilon}\right) + 12,\; 2r\right\}$  (16)

with probability at least $\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$. Thus, by Lemma 9, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$, we have $r \in S_\Omega$.

Finally, we have the following probabilistic version of Corollary 7.

Corollary 12 Assume that $\mathrm{rank}(\mathbf{U}^*) \leq \frac{n_1}{6}$, $\mathrm{rank}(\mathbf{U}^*)(n_1 - \mathrm{rank}(\mathbf{U}^*)) \leq n_2$, and that (15) holds for $r = \mathrm{rank}(\mathbf{U}^*)$, where $\mathbf{U}^*$ denotes an optimal solution to the optimization problem (13). Then, according to Proposition 11 and Corollary 7, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$, $r^* = \mathrm{rank}(\mathbf{U}^*)$. Similarly, assume that $\mathrm{rank}(\hat{\mathbf{U}}) \leq \frac{n_1}{6}$, $\mathrm{rank}(\hat{\mathbf{U}})(n_1 - \mathrm{rank}(\hat{\mathbf{U}})) \leq n_2$, and that (15) holds for $r = \mathrm{rank}(\hat{\mathbf{U}})$, where $\hat{\mathbf{U}}$ denotes a suboptimal solution to the optimization problem (13). Then, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$, $r^* \leq \mathrm{rank}(\hat{\mathbf{U}})$.

3.1.3 NUMERICAL RESULTS

In Fig. 1 and Fig. 2, the x-axis represents the sampling probability and the y-axis the value of $r$. The color scale represents the lower bound on the probability of the event $r \in S_\Omega$. For example, as we can observe in Fig. 1, for any $r \in \{1, \dots, 44\}$ we have $r \in S_\Omega$ with probability at least 0.6 (approximately, based on the color scale, since the corresponding points are orange) given that $p = 0.54$. We consider the sampled matrices $\mathbf{U} \in \mathbb{R}^{300 \times 15000}$ and $\mathbf{U} \in \mathbb{R}^{1200 \times 240000}$ in Fig. 1 and Fig. 2, respectively. In particular, for fixed values of the sampling probability $p$ and $r$, we first find a small $\epsilon$ such that (15) holds, by trial and error. Then, according to Proposition 11, we conclude that with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n_1}}{2})\right)^{n_2}$, $r \in S_\Omega$.

Figure 1: Probability of $r \in S_\Omega$ as a function of sampling probability for $\mathbf{U} \in \mathbb{R}^{300 \times 15000}$.
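The certificate behind Fig. 1 can be reproduced by evaluating the threshold (15) and the probability bound of Proposition 11 numerically. A small sketch using the formulas as reconstructed above (the parameter values are illustrative):

```python
import numpy as np

def p_threshold(n1, r, eps):
    """Sampling probability lower bound from (15)."""
    return max(12 * np.log(n1 / eps) + 12, 2 * r) / n1 + n1 ** (-0.25)

def success_prob(n1, n2, eps):
    """Probability bound (1 - eps) * (1 - exp(-sqrt(n1)/2))**n2 of Proposition 11."""
    return (1 - eps) * (1 - np.exp(-np.sqrt(n1) / 2)) ** n2

# Setting of Fig. 1: U in R^{300 x 15000}; pick eps by trial and error so that
# p_threshold(...) falls below the operating sampling probability p.
print(p_threshold(300, 44, 0.4), success_prob(300, 15000, 0.4))
```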
Figure 2: Probability of $r \in S_\Omega$ as a function of sampling probability for $\mathbf{U} \in \mathbb{R}^{1200 \times 240000}$.

The purpose of Figs. 3-6 is to show how tight our proposed upper bounds on the rank can be. Here, we first generate an $n_1 \times n_2$ random matrix of a given rank $r^*$ by multiplying a random $n_1 \times r^*$ matrix and a random $r^* \times n_2$ matrix (entries drawn according to a uniform distribution on real numbers within an interval). Then, each entry of the randomly generated matrix is sampled uniformly at random and independently across entries with some sampling probability $p$. Afterwards, we apply the nuclear norm minimization method proposed in (Candes and Recht, 2012) for matrix completion, where the non-convex objective function in (13) is relaxed by using the nuclear norm, which is the convex hull of the rank function, as follows:

$\min_{\mathbf{U}' \in \mathbb{R}^{n_1 \times n_2}} \|\mathbf{U}'\|_*$ subject to $\mathbf{U}'_\Omega = \mathbf{U}_\Omega$,  (17)

where $\|\mathbf{U}'\|_*$ denotes the nuclear norm of $\mathbf{U}'$. Let $\hat{\mathbf{U}}^*$ denote an optimal solution to (17) and recall that $\mathbf{U}^*$ denotes an optimal solution to (13). Since (17) is a convex relaxation of (13), we conclude that $\hat{\mathbf{U}}^*$ is a suboptimal solution to (13), and therefore $\mathrm{rank}(\mathbf{U}^*) \leq \mathrm{rank}(\hat{\mathbf{U}}^*)$. We used the Matlab program found online (Shabat, 2015) to solve (17).

As an example, we generate a random matrix $\mathbf{U} \in \mathbb{R}^{300 \times 15000}$ (the same size as the matrix in Fig. 1) of rank $r^*$ as described above, for $r^* \in \{1, \dots, 50\}$ and some values of the sampling probability $p$. Then, we obtain the rank of the completion given by (17) and denote it by $r'$. Due to the randomness of the sampled matrix, we repeat this procedure 5 times. We calculate the gap $r' - r^*$ in each of these 5 runs and denote the maximum and minimum among these 5 numbers by $d_{\max}$ and $d_{\min}$, respectively. Hence, $d_{\max}$ and $d_{\min}$ represent the loosest (worst) and tightest (best) gaps between the rank obtained by (17) and the rank of the original sampled matrix over the 5 runs, respectively. In Figs. 3-6, the maximum and minimum gaps are plotted as functions of the rank of the matrix, for different sampling probabilities.

Figure 3: The gaps between the rank of the matrix obtained via (17) and that of the original sampled matrix for p = 0.46.

Figure 4: The gaps between the rank of the matrix obtained via (17) and that of the original sampled matrix for p = 0.50.

Figure 5: The gaps between the rank of the matrix obtained via (17) and that of the original sampled matrix for p = 0.54.

We have the following observations.

- According to Fig. 1, for $p = 0.54$ and $p = 0.58$ we can ensure that the rank of any completion is an upper bound on the rank of the sampled matrix, $r^*$, with probability at least 0.6 and 0.8, respectively.

- As we can observe in Figs. 3-6, the defined gap is always a nonnegative number, which is consistent with the previous observation that for $p = 0.54$ and $p = 0.58$ we can certify with high probability (at least 0.6) that the rank of any completion is an upper bound on the rank of the sampled matrix, $r^*$.
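The completion step (17) can be reproduced with any generic convex solver in place of the Matlab program of (Shabat, 2015). A sketch using cvxpy (the package choice and rank tolerance are ours, not the authors'):

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_completion(U_Omega, Omega):
    """Solve (17): minimize the nuclear norm subject to matching the samples.
    Since U_Omega is zero off the support, elementwise masking encodes the
    constraint U'_Omega = U_Omega."""
    U_var = cp.Variable(U_Omega.shape)
    constraints = [cp.multiply(Omega, U_var) == U_Omega]
    cp.Problem(cp.Minimize(cp.normNuc(U_var)), constraints).solve()
    return U_var.value

# The gap of Figs. 3-6 is then rank(completion) - r_true, with a tolerance
# to absorb solver precision when counting singular values:
# r_prime = np.linalg.matrix_rank(nuclear_norm_completion(U_Omega, Omega), tol=1e-6)
```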
Figure 6: The gaps between the rank of the matrix obtained via (17) and that of the original sampled matrix for p = 0.58.

- For $p = 0.54$ and $p = 0.58$, for which we have theoretical results (as mentioned in the first observation), the gap obtained by (17) is very close to zero. This phenomenon (for which we do not have a rigorous justification) shows that as soon as we can certify our proposed theoretical results (i.e., as soon as the rank of a completion provides an upper bound on the rank of the sampled matrix, $r^*$) by increasing the sampling probability, the upper bound found through (17) becomes very tight; in some cases this bound is exactly equal to $r^*$ (red curves) and in some cases this bound is almost equal to $r^*$ (blue curves). However, these gaps are not small (especially the blue curves) for $p = 0.46$ and $p = 0.50$; note that, according to Fig. 1, for these values of $p$ we cannot guarantee that the bounds on the value of the rank hold with high probability.

3.2 CP-Rank Tensor

Let $P_i$ denote the Lebesgue measure on $\mathbb{R}^{n_i \times r^*}$, $i = 1, \dots, d$. In this subsection, we assume that the sampled tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is chosen generically from the manifold of tensors of rank $r^* = \mathrm{rank}_{\mathrm{CP}}(\mathcal{U})$ (where $r^*$ is unknown); in other words, the entries of $\mathcal{U}$ are drawn independently with respect to the Lebesgue measure on the corresponding manifold. Hence, the probability measure for all statements in this subsection is the product measure $P_1 \times P_2 \times \cdots \times P_d$.

Condition $A_r$: Each row of the $d$-th matricization of the sampled tensor, i.e., $\mathbf{U}_{(d)}$, includes at least $r$ observed entries.

We construct a binary-valued tensor, called the constraint tensor $\tilde{\Omega}_r$, based on $\Omega$ and a given number $r$. Consider any subtensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times 1}$ of the tensor $\mathcal{U}$. The sampled tensor $\mathcal{U}$ includes $n_d$ subtensors that belong to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$; let $\mathcal{Y}_i$, $1 \leq i \leq n_d$, denote these $n_d$ subtensors. Define a binary-valued tensor $\tilde{\mathcal{Y}}_i \in \mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times k_i}$, where $k_i = N_\Omega(\mathcal{Y}_i) - r$, with entries described as follows. We can view $\tilde{\mathcal{Y}}_i$ as $k_i$ tensors, each belonging to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$. For each of the mentioned $k_i$ tensors in $\tilde{\mathcal{Y}}_i$, we set the entries corresponding to $r$ of the observed entries equal to 1. For each of the other $k_i$ observed entries, we pick one of the $k_i$ tensors of $\tilde{\mathcal{Y}}_i$ and set its corresponding entry (the same location as that specific observed entry) equal to 1, and set the rest of the entries equal to 0. In the case that $k_i = 0$, we simply ignore $\mathcal{Y}_i$, i.e., $\tilde{\mathcal{Y}}_i = \emptyset$.

By putting together all $n_d$ tensors in dimension $d$, we construct a binary-valued tensor $\tilde{\Omega}_r \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times K}$, where $K = \sum_{i=1}^{n_d} k_i = N_\Omega(\mathcal{U}) - r n_d$, and call it the constraint tensor. Observe that each subtensor of $\tilde{\Omega}_r$ that belongs to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$ includes exactly $r + 1$ nonzero entries. In (Ashraphijuo and Wang, 2017b), an example is given of the construction of $\tilde{\Omega}_r$.

Condition $B_r$: $\tilde{\Omega}_r$ contains a subtensor $\breve{\Omega}_r \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times K}$ such that $K = r(\sum_{i=1}^{d-1} n_i) - r^2 - r(d-2)$ and for any $K' \in \{1, 2, \dots, K\}$ and any subtensor $\breve{\Omega}'_r \in \mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times K'}$ of $\breve{\Omega}_r$ we have

$r \sum_{i=1}^{d-1} f_i(\breve{\Omega}'_r) - \min\left\{\max\left\{f_1(\breve{\Omega}'_r), \dots, f_{d-1}(\breve{\Omega}'_r)\right\},\, r\right\}(d-2) - r^2 \geq K'$,  (18)

where $f_i(\breve{\Omega}'_r)$ denotes the number of nonzero rows of the $i$-th matricization of $\breve{\Omega}'_r$.

The following lemma is a restatement of Theorem 1 in (Ashraphijuo and Wang, 2017b).

Lemma 13 With probability one, there are only finitely many rank-$r$ completions of the sampled tensor if and only if Conditions $A_r$ and $B_r$ hold.

Definition 14 Let $S_\Omega$ denote the set of all natural numbers $r$ such that both Conditions $A_r$ and $B_r$ hold.

Lemma 15 There exists a number $r_\Omega$ such that $S_\Omega = \{1, 2, \dots, r_\Omega\}$.

Proof The proof is similar to the proof of Lemma 3, since the dimension of the manifold of CP rank-$r$ tensors is $r(\sum_{i=1}^{d} n_i) - r^2 - r(d-1)$, which is an increasing function of $r$.
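Before moving to the main theorem, note that for the CP model Condition $A_r$ again admits a cheap necessary screen: every row of the $d$-th matricization must carry at least $r$ samples. A short sketch (hypothetical helper of ours):

```python
import numpy as np

def max_r_condition_A_cp(Omega):
    """Largest r such that each row of the d-th matricization of the sampling
    pattern tensor has at least r observed entries (Condition A_r, CP model)."""
    d = Omega.ndim
    Omega_d = np.moveaxis(Omega, d - 1, 0).reshape(Omega.shape[-1], -1)
    return int(Omega_d.sum(axis=1).min())
```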
The following theorem gives an upper bound on the unknown rank $r^*$.

Theorem 16 With probability one, exactly one of the following statements holds:
(i) $r^* \in S_\Omega = \{1, 2, \dots, r_\Omega\}$;
(ii) for any arbitrary completion of the sampled tensor $\mathcal{U}$ of rank $r$, we have $r \notin S_\Omega$.

Proof Similar to the proof of Theorem 4, it suffices to show that the assumption $r^* \notin S_\Omega$ implies that a completion of $\mathcal{U}$ of CP rank $r$ with $r \in S_\Omega$ exists with probability zero. Define $\mathcal{V} = (\mathcal{V}_1, \dots, \mathcal{V}_r)$ as the basis of the rank-$r$ CP decomposition of $\mathcal{U}$ as in (3), where $\mathcal{V}_l = \mathbf{a}_1^l \otimes \mathbf{a}_2^l \otimes \cdots \otimes \mathbf{a}_{d-1}^l \in \mathbb{R}^{n_1 \times \cdots \times n_{d-1}}$ is a rank-1 tensor and $\mathbf{a}_i^l$ is defined as in (3) for $1 \leq l \leq r$ and $1 \leq i \leq d$. Define $\mathbf{Y} = (\mathbf{a}_d^1, \dots, \mathbf{a}_d^r)$ and $\mathcal{V} \otimes_d \mathbf{Y} \triangleq \sum_{l=1}^{r} \mathcal{V}_l \otimes \mathbf{a}_d^l$. Observe that $\mathcal{U} = \sum_{l=1}^{r} \mathcal{V}_l \otimes \mathbf{a}_d^l = \mathcal{V} \otimes_d \mathbf{Y}$.

Observe that each row of $\mathbf{U}_{(d)}$ includes at least $r_\Omega + 1$ observed entries, since Condition $A_{r_\Omega}$ holds. Moreover, the existence of a completion of the sampled tensor $\mathcal{U}$ of rank $r \in S_\Omega$ implies the existence of a basis $\mathcal{V} = (\mathcal{V}_1, \dots, \mathcal{V}_r)$ such that there exists $\mathbf{Y} = (\mathbf{a}_d^1, \dots, \mathbf{a}_d^r)$ with $\mathcal{U}_\Omega = (\mathcal{V} \otimes_d \mathbf{Y})_\Omega$. As a result, given $\mathcal{V}$, each observed entry of $\mathcal{U}$ results in a degree-1 polynomial in terms of the entries of $\mathbf{Y}$, as

$\mathcal{U}(x_1, \dots, x_d) = \sum_{l=1}^{r} \mathcal{V}_l(x_1, \dots, x_{d-1})\,\mathbf{a}_d^l(x_d)$.  (19)

Note that $r_\Omega \geq r$ and each row of $\mathbf{U}_{(d)}$ includes at least $r_\Omega + 1 \geq r + 1$ observed entries. Consider $r + 1$ of the observed entries of the first row of $\mathbf{U}_{(d)}$, and denote them by $\mathcal{U}(\vec{x}_1), \dots, \mathcal{U}(\vec{x}_{r+1})$, where the last component of each vector $\vec{x}_i$ is equal to one, $1 \leq i \leq r + 1$. Similar to the proof of Theorem 4, genericity of $\mathcal{U}$ results in

$\mathcal{U}(\vec{x}_{r+1}) = \sum_{l=1}^{r} t_l\, \mathcal{U}(\vec{x}_l)$,  (20)

where the $t_l$'s are constant scalars, $l = 1, \dots, r$. On the other hand, according to Lemma 13, there exist at most finitely many completions of the sampled tensor of rank $r$. Moreover, an equation of the form (20) holds with probability zero, since $r^* \geq r + 1$ and $\mathcal{U}$ is chosen generically from the manifold of rank-$r^*$ tensors. Therefore, there exists a completion of rank $r$ with probability zero.

Corollary 17 Consider an arbitrary number $r \in S_\Omega$. Similar to Theorem 16, it follows that with probability one, exactly one of the following holds:
(i) $r^* \in \{1, 2, \dots, r\}$;
(ii) for any arbitrary completion of the sampled tensor $\mathcal{U}$ of rank $r'$, we have $r' \notin \{1, 2, \dots, r\}$.

Corollary 18 Assuming that there exists a CP rank-$r$ completion of the sampled tensor $\mathcal{U}$ such that $r \in S_\Omega$, we conclude that with probability one $r^* \leq r$.

Corollary 19 Let $\mathcal{U}^*$ denote an optimal solution to the following NP-hard optimization problem:

$\min_{\mathcal{U}' \in \mathbb{R}^{n_1 \times \cdots \times n_d}} \mathrm{rank}_{\mathrm{CP}}(\mathcal{U}')$ subject to $\mathcal{U}'_\Omega = \mathcal{U}_\Omega$.  (21)

Assume that $\mathrm{rank}_{\mathrm{CP}}(\mathcal{U}^*) \in S_\Omega$. Then, Corollary 18 implies that $r^* = \mathrm{rank}_{\mathrm{CP}}(\mathcal{U}^*)$ with probability one.

The following lemma is Lemma 15 in (Ashraphijuo and Wang, 2017b), which is the probabilistic version of Lemma 13 in terms of the sampling probability.

Lemma 20 Assume that $n_1 = n_2 = \cdots = n_d = n$, $d > 2$, $n > \max\{200, r(d-2)\}$ and $r \leq \frac{n}{6}$. Moreover, assume that the sampling probability satisfies

$p > \frac{1}{n^{d-2}} \max\left\{27 \log n + 9 \log\left(\frac{2r(d-2)}{\epsilon}\right) + 18,\; 6r\right\} + \frac{1}{\sqrt[4]{n^{d-2}}}$.  (22)

Then, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n}}{2})\right)^{n^2}$, we have $r \in S_\Omega$.

The following corollary is the probabilistic version of Corollaries 18 and 19.

Corollary 21 Assuming that there exists a CP rank-$r$ completion of the sampled tensor $\mathcal{U}$ such that the conditions given in Lemma 20 hold, with the sampling probability satisfying (22), we conclude that with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n}}{2})\right)^{n^2}$ we have $r^* \leq r$.
Therefore, given that (22) holds for $r = \mathrm{rank}(\mathcal{U}^*)$, where $\mathcal{U}^*$ denotes an optimal solution to the optimization problem (21), with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n}}{2})\right)^{n^2}$ we have $r^* = \mathrm{rank}(\mathcal{U}^*)$.

3.2.1 NUMERICAL RESULTS

We generate a random tensor $\mathcal{U} \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of CP-rank 2 by adding two random rank-1 tensors. The color scale represents the lower bound on the probability that we can guarantee that the rank of a given completion is an upper bound on the true value of the rank. We then solve the following convex optimization problem for different values of the sampling probability:

$\min_{\mathcal{U}' \in \mathbb{R}^{n_1 \times \cdots \times n_d}} \|\tilde{\mathbf{U}}'_{(3)}\|_*$ subject to $\mathcal{U}'_\Omega = \mathcal{U}_\Omega$.  (23)

Note that the rank of any unfolding of a tensor is a lower bound on the CP-rank of that tensor. Hence, we minimize the nuclear norm of the unfolding with the maximum possible rank among all unfoldings, namely $\tilde{\mathbf{U}}_{(3)} \in \mathbb{R}^{512 \times 512}$. We then use the Matlab toolbox found online, Tensorlab, to calculate the CP-rank of the tensor obtained via solving the convex program (23) (there are other methods for calculating the CP decomposition, e.g., (Pimentel-Alarcón, 2016)). In Figure 7, the gap represents the CP-rank of the solution of (23) minus the CP-rank of the original sampled tensor.

Figure 7: The rank gap as a function of sampling probability for $\mathcal{U} \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of CP-rank 2.
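The test tensor of this subsection is easy to reproduce: a CP-rank-2 tensor is simply the sum of two random rank-1 (outer-product) tensors, and the rank of its third unfolding, the $512 \times 512$ matrix appearing in (23), lower-bounds the CP-rank. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
dims = (8, 8, 8, 8, 8, 8)

def random_rank1(dims, rng):
    """Outer product of d random vectors: a generic rank-1 tensor."""
    t = rng.standard_normal(dims[0])
    for n in dims[1:]:
        t = np.multiply.outer(t, rng.standard_normal(n))
    return t

# CP-rank-2 tensor as the sum of two random rank-1 tensors.
U = random_rank1(dims, rng) + random_rank1(dims, rng)

# Third unfolding: the 512 x 512 matrix whose nuclear norm is minimized in (23).
U3 = U.reshape(8 ** 3, 8 ** 3)
print(np.linalg.matrix_rank(U3))  # 2: each unfolding rank lower-bounds the CP-rank
```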
4. Vector-Rank Cases

4.1 Multi-View Matrix

Let $P_1$ and $P_2$ denote the Lebesgue measures on $\mathbb{R}^{n \times r_1^*}$ and $\mathbb{R}^{r_1^* \times n_1}$, respectively. Moreover, let $P_3$ and $P_4$ denote the Lebesgue measures on $\mathbb{R}^{n \times (r^* - r_1^*)}$ and $\mathbb{R}^{r_2^* \times n_2}$, respectively. In this paper, we assume that $\mathbf{U}$ is chosen generically from the manifold corresponding to the rank vector $(r_1^*, r_2^*, r^*)$, i.e., the entries of $\mathbf{U}$ are drawn independently with respect to the Lebesgue measure on the corresponding manifold. Hence, the probability measure for all statements in this subsection is the product measure $P_1 \times P_2 \times P_3 \times P_4$.

The following conditions will be used frequently in this subsection.

Condition $A_{r_1,r_2}$: Each column of $\mathbf{U}_1$ and of $\mathbf{U}_2$ includes at least $r_1$ and $r_2$ sampled entries, respectively.

We construct a binary-valued matrix, called the constraint matrix, for the multi-view matrix $\mathbf{U}$ as $\tilde{\Omega}_{r_1,r_2} = [\tilde{\Omega}_{r_1}|\tilde{\Omega}_{r_2}]$, where $\tilde{\Omega}_{r_1}$ and $\tilde{\Omega}_{r_2}$ represent the constraint matrices for the single-view matrices $\mathbf{U}_1$ and $\mathbf{U}_2$ (defined in Section 3.1), respectively.

Condition $B_{r_1,r_2,r}$: $\tilde{\Omega}_{r_1,r_2}$ contains a submatrix $\breve{\Omega}_{r_1,r_2} \in \mathbb{R}^{n \times K}$ such that $K = nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$ and for any $K' \in \{1, 2, \dots, K\}$ and any submatrix $\breve{\Omega}'_{r_1,r_2} \in \mathbb{R}^{n \times K'}$ of $\breve{\Omega}_{r_1,r_2}$ we have

$(r - r_2)\left[f(\breve{\Omega}'_{r_1}) - r_1\right]^+ + (r - r_1)\left[f(\breve{\Omega}'_{r_2}) - r_2\right]^+ + (r_1 + r_2 - r)\left[f(\breve{\Omega}'_{r_1,r_2}) - (r_1 + r_2 - r)\right]^+ \geq K'$,  (24)

where $[x]^+ = \max\{x, 0\}$, $f(\mathbf{X})$ denotes the number of nonzero rows of $\mathbf{X}$ for any matrix $\mathbf{X}$, $\breve{\Omega}'_{r_1,r_2} = [\breve{\Omega}'_{r_1}|\breve{\Omega}'_{r_2}]$, and $\breve{\Omega}'_{r_1}$ and $\breve{\Omega}'_{r_2}$ denote the columns of $\breve{\Omega}'_{r_1,r_2}$ corresponding to $\tilde{\Omega}_{r_1}$ and $\tilde{\Omega}_{r_2}$, respectively.

The following lemma is a restatement of Theorem 2 in (Ashraphijuo et al., 2017c).

Lemma 22 With probability one, there are only finitely many completions of the sampled multi-view data if and only if Conditions $A_{r_1^*,r_2^*}$ and $B_{r_1^*,r_2^*,r^*}$ hold.

Definition 23 Denote the rank vector $\mathbf{r} = (r_1, r_2, r)$. Define the generalized inequality $\mathbf{r}^* \preceq \mathbf{r}$ as the component-wise set of inequalities, i.e., $r_1^* \leq r_1$, $r_2^* \leq r_2$ and $r^* \leq r$.

Definition 24 Let $S_\Omega$ denote the set of all $\mathbf{r}$ such that both Conditions $A_{r_1,r_2}$ and $B_{r_1,r_2,r}$ hold.

Lemma 25 Assume $\mathbf{r} \in S_\Omega$. Then, for any $\mathbf{r}' \preceq \mathbf{r}$, we have $\mathbf{r}' \in S_\Omega$.

Proof We consider the rank factorization of $\mathbf{U}$ as in (Ashraphijuo et al., 2017c) and, similar to the single-view scenario in Lemma 3, each observed entry results in a polynomial in terms of the entries of the components of the decomposition. Note that the dimension of the manifold corresponding to the rank vector $\mathbf{r}$ is equal to $rn + r_1 n_1 + r_2 n_2 - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$ (Ashraphijuo et al., 2017c); moreover, the fact that $\max\{r_1, r_2\} \leq r \leq r_1 + r_2 \leq \min\{2n, n_1 + n_2\}$ implies that reducing any of the values $r_1$, $r_2$, and $r$ reduces the value of $rn + r_1 n_1 + r_2 n_2 - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$. Hence, the dimension of the manifold corresponding to the rank vector $\mathbf{r}$ is larger than that for the rank vector $\mathbf{r}'$, given $\mathbf{r}' \preceq \mathbf{r}$; thus, similar to the proof of Lemma 3, finite completability of the data with $\mathbf{r}$ implies finite completability of the data with $\mathbf{r}'$ with probability one. Then, using Lemma 22, the proof is complete.

The following theorem provides a relationship between the unknown rank vector $\mathbf{r}^*$ and $S_\Omega$.

Theorem 26 With probability one, exactly one of the following statements holds:
(i) $\mathbf{r}^* \in S_\Omega$;
(ii) for any arbitrary completion of the sampled matrix $\mathbf{U}$ of rank vector $\mathbf{r}$, we have $\mathbf{r} \notin S_\Omega$.

Proof Similar to the proof of Theorem 4, suppose that there does not exist a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$ such that $\mathbf{r} \in S_\Omega$. Then it is easily verified that statement (ii) holds and statement (i) does not hold. On the other hand, assume that there exists a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$, where $\mathbf{r} \in S_\Omega$. Hence, statement (ii) does not hold, and to complete the proof it suffices to show that with probability one, statement (i) holds. Similar to Theorem 4, we show that, assuming $\mathbf{r}^* \notin S_\Omega$, a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$ with $\mathbf{r} \in S_\Omega$ exists with probability zero. Since $\mathbf{r}^* \notin S_\Omega$, according to Lemma 25, for any $\mathbf{r} \in S_\Omega$ at least one of the following inequalities holds: $r_1 < r_1^*$, $r_2 < r_2^*$, or $r < r^*$. Note that the fact that a completion of $\mathbf{U}_1$ of rank $r_1$ exists with probability zero implies that a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$ exists with probability zero, and similar statements hold for $r_2$ and $r$. Hence, in any possible scenario ($r_1 < r_1^*$ or $r_2 < r_2^*$ or $r < r^*$), the same argument as in Theorem 4 (for the single-view matrix) shows that a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$ with $\mathbf{r} \in S_\Omega$ exists with probability zero.

Corollary 27 Consider a subset $S'_\Omega$ of $S_\Omega$ such that for any two members of $S_\Omega$ with $\mathbf{r} \preceq \mathbf{r}'$ and $\mathbf{r}' \in S'_\Omega$, we have $\mathbf{r} \in S'_\Omega$. Then, with probability one, exactly one of the following holds:
(i) $\mathbf{r}^* \in S'_\Omega$;
(ii) for any arbitrary completion of $\mathbf{U}$ of rank vector $\mathbf{r}$, we have $\mathbf{r} \notin S'_\Omega$.

Proof Note that the property in the statement of Lemma 25 holds for $S'_\Omega$ as well as $S_\Omega$. Moreover, since $S'_\Omega \subseteq S_\Omega$, for any $\mathbf{r} \in S'_\Omega$ there exist at most finitely many completions of $\mathbf{U}$ of rank vector $\mathbf{r}$, and therefore the rest of the proof is the same as the proof of Theorem 26.

Corollary 28 Assuming that there exists a completion of $\mathbf{U}$ with rank vector $\mathbf{r}$ such that $\mathbf{r} \in S_\Omega$, then with probability one $\mathbf{r}^* \preceq \mathbf{r}$.

The following lemma, which is a restatement of Theorem 3 in (Ashraphijuo et al., 2017c), gives the number of samples per column needed to ensure that Conditions $A_{r_1,r_2}$ and $B_{r_1,r_2,r}$ hold with high probability.
Lemma 29 Suppose that the following inequalities hold:

$n \geq 6 \max\{r_1, r_2, (r_1 + r_2 - r)\}$,  (25)
$n_1 \geq (r - r_2)(n - r_1)$,  (26)
$n_2 \geq (r - r_1)(n - r_2)$,  (27)
$n_1 + n_2 \geq (r - r_2)(n - r_1) + (r - r_1)(n - r_2) + (r_1 + r_2 - r)(n - (r_1 + r_2 - r))$.  (28)

Moreover, assume that each column of $\mathbf{U}$ is observed in at least $l$ entries, uniformly at random and independently across entries, where

$l > \max\left\{9 \log\left(\frac{n}{\epsilon}\right) + 3 \log\left(3 \max\{r - r_2,\, r - r_1,\, r_1 + r_2 - r\}\right) + 6,\; 2r_1,\; 2r_2\right\}$.  (29)

Then, with probability at least $1 - \epsilon$, $\mathbf{r} \in S_\Omega$.

The following proposition is the probabilistic version of Theorem 26, in terms of the sampling probability instead of verifying Conditions $A_{r_1,r_2}$ and $B_{r_1,r_2,r}$.

Proposition 30 Suppose that (25)-(28) hold for $\mathbf{r}$ and that each entry of the sampled matrix is observed uniformly at random and independently across entries with probability $p$, where

$p > \frac{1}{n} \max\left\{9 \log\left(\frac{n}{\epsilon}\right) + 3 \log\left(3 \max\{r - r_2,\, r - r_1,\, r_1 + r_2 - r\}\right) + 6,\; 2r_1,\; 2r_2\right\} + \frac{1}{\sqrt[4]{n}}$.  (30)

Then, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n}}{2})\right)^{n_1+n_2}$, we have $\mathbf{r} \in S_\Omega$.

Proof The proposition is easy to verify using Lemma 29 and Lemma 10 (similar to the proof of Proposition 11).

Corollary 31 Assuming that there exists a completion of $\mathbf{U}$ of rank vector $\mathbf{r}$ such that (25)-(28) hold and the sampling probability satisfies (30), then with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{n}}{2})\right)^{n_1+n_2}$ we have $\mathbf{r}^* \preceq \mathbf{r}$.

4.2 Tucker-Rank Tensor

Let $P_i$ denote the Lebesgue measure on $\mathbb{R}^{n_i \times m_i^*}$, $i = j+1, \dots, d$, and let $P_0$ denote the Lebesgue measure on $\mathbb{R}^{m_{j+1}^* \times m_{j+2}^* \times \cdots \times m_d^*}$. In this subsection, we assume that the sampled tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is chosen generically from the manifold of tensors of rank $\mathbf{r}^* = \mathrm{rank}_{\mathrm{Tucker}}(\mathcal{U}) = (m_{j+1}^*, \dots, m_d^*)$ (where $\mathbf{r}^*$ is unknown); in other words, the entries of $\mathcal{U}$ are drawn independently with respect to the Lebesgue measure on the corresponding manifold. Hence, the probability measure for all statements in this subsection is the product measure $P_0 \times P_{j+1} \times P_{j+2} \times \cdots \times P_d$. Without loss of generality, assume that $m_{j+1}^* \leq \cdots \leq m_d^*$ throughout this subsection. Also, given $\mathbf{r} = (m_{j+1}, \dots, m_d)$, define the function $g_{\mathbf{r}}(\cdot)$ as in (Ashraphijuo et al., 2016a); this function appears in Condition $B^{\mathrm{Tucker}}_{\mathbf{r}}$ below.

Definition 32 For any $i \in \{j+1, \dots, d\}$ and $S_i \subseteq \{1, \dots, n_i\}$, define $\mathcal{U}(S_i)$ as the set containing the entries of the $|S_i|$ rows (corresponding to the elements of $S_i$) of $\mathbf{U}_{(i)}$. Moreover, define $\mathcal{U}(S_{j+1},\dots,S_d) = \mathcal{U}(S_{j+1}) \cup \cdots \cup \mathcal{U}(S_d)$.

Condition $A^{\mathrm{Tucker}}_{\mathbf{r}}$: There exist $\sum_{i=j+1}^{d} n_i m_i$ observed entries such that for any $S_i \subseteq \{1, \dots, n_i\}$, $i \in \{j+1, \dots, d\}$, $\mathcal{U}(S_{j+1},\dots,S_d)$ includes at most $\sum_{i=j+1}^{d} |S_i| m_i$ of the mentioned $\sum_{i=j+1}^{d} n_i m_i$ observed entries.

Let $P$ be a set of $\sum_{i=j+1}^{d} n_i m_i$ observed entries satisfying Condition $A^{\mathrm{Tucker}}_{\mathbf{r}}$. We now construct a $(j+1)$-way binary constraint tensor $\tilde{\Omega}_{\mathbf{r}}$, in a manner similar to that in Section 3.2. For any subtensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times 1 \times \cdots \times 1}$ of the tensor $\mathcal{U}$, let $N_\Omega(\mathcal{Y}^P)$ denote the number of sampled entries of $\mathcal{Y}$ that belong to $P$. The sampled tensor $\mathcal{U}$ includes $n_{j+1} n_{j+2} \cdots n_d$ subtensors that belong to $\mathbb{R}^{n_1 \times \cdots \times n_j \times 1 \times \cdots \times 1}$, and we label these subtensors by $\mathcal{Y}_{(t_{j+1},\dots,t_d)}$, where $(t_{j+1},\dots,t_d)$ represents the coordinate of the subtensor. Define a binary-valued tensor $\tilde{\mathcal{Y}}_{(t_{j+1},\dots,t_d)} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times 1 \times \cdots \times 1 \times k}$ (with $d - j$ ones), where $k = N_\Omega(\mathcal{Y}_{(t_{j+1},\dots,t_d)}) - N_\Omega(\mathcal{Y}^P_{(t_{j+1},\dots,t_d)})$, with entries described as follows. We can view $\tilde{\mathcal{Y}}_{(t_{j+1},\dots,t_d)}$ as $k$ tensors, each belonging to $\mathbb{R}^{n_1 \times \cdots \times n_j \times 1 \times \cdots \times 1}$. For each of the mentioned $k$ tensors in $\tilde{\mathcal{Y}}_{(t_{j+1},\dots,t_d)}$, we set the entries corresponding to the $N_\Omega(\mathcal{Y}^P_{(t_{j+1},\dots,t_d)})$ observed entries that belong to $P$ equal to 1. For each of the other $k$ observed entries, we pick one of the $k$ tensors of $\tilde{\mathcal{Y}}_{(t_{j+1},\dots,t_d)}$ and set its corresponding entry (the same location as that specific observed entry) equal to 1, and set the rest of the entries equal to 0.
For the sake of simplicity in notation, we treat the tensors $\tilde{\mathcal{Y}}_{(t_{j+1},\dots,t_d)}$ as members of $\mathbb{R}^{n_1 \times \cdots \times n_j \times k}$ instead of $\mathbb{R}^{n_1 \times \cdots \times n_j \times 1 \times \cdots \times 1 \times k}$. Now, by putting together all $n_{j+1} n_{j+2} \cdots n_d$ tensors in dimension $(j+1)$, we construct a binary-valued tensor $\tilde{\Omega}_{\mathbf{r}} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times K_j}$, where $K_j = N_\Omega(\mathcal{U}) - \sum_{i=j+1}^{d} n_i m_i$, and call it the constraint tensor (Ashraphijuo et al., 2016a). In (Ashraphijuo et al., 2016a), an example is given of the construction of $\tilde{\Omega}_{\mathbf{r}}$.

Condition $B^{\mathrm{Tucker}}_{\mathbf{r}}$: The constraint tensor $\tilde{\Omega}_{\mathbf{r}}$ contains a subtensor $\breve{\Omega}_{\mathbf{r}} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times K}$ such that $K = \prod_{i=1}^{j} n_i \prod_{i=j+1}^{d} m_i - \sum_{i=j+1}^{d} m_i^2$ and for any $K' \in \{1, 2, \dots, K\}$ and any subtensor $\breve{\Omega}'_{\mathbf{r}} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times K'}$ of $\breve{\Omega}_{\mathbf{r}}$ we have

$\left(\prod_{i=j+1}^{d} m_i\right) f_{j+1}(\breve{\Omega}'_{\mathbf{r}}) - g_{\mathbf{r}}\left(f_{j+1}(\breve{\Omega}'_{\mathbf{r}})\right) \geq K'$,  (31)

where $f_{j+1}(\breve{\Omega}'_{\mathbf{r}})$ denotes the number of nonzero columns of the $(j+1)$-th matricization of $\breve{\Omega}'_{\mathbf{r}}$.

The following lemma is a restatement of Theorem 3 in (Ashraphijuo et al., 2016a).

Lemma 33 With probability one, there are only finitely many completions of rank $\mathbf{r}$ of the sampled tensor if and only if Conditions $A^{\mathrm{Tucker}}_{\mathbf{r}}$ and $B^{\mathrm{Tucker}}_{\mathbf{r}}$ hold.

Definition 34 Let $S_\Omega$ denote the set of all rank vectors $\mathbf{r}$ such that both Conditions $A^{\mathrm{Tucker}}_{\mathbf{r}}$ and $B^{\mathrm{Tucker}}_{\mathbf{r}}$ hold.

Lemma 35 Assume $\mathbf{r} \in S_\Omega$. Then, for any rank vector $\mathbf{r}' \preceq \mathbf{r}$, we have $\mathbf{r}' \in S_\Omega$.

Proof Note that the dimension of the manifold corresponding to $\mathbf{r}$ is $\prod_{i=1}^{j} n_i \prod_{i=j+1}^{d} m_i + \sum_{i=j+1}^{d} n_i m_i - \sum_{i=j+1}^{d} m_i^2$, and thus reducing the value of $m_{i_0}$ by one (for $i_0 \in \{j+1, \dots, d\}$) reduces the value of this dimension by at least $\prod_{i=1}^{j} n_i + n_{i_0} - 2m_{i_0} + 1$, which is greater than zero since $m_{i_0} \leq n_{i_0}$. The rest of the proof is similar to the proof of Lemma 3.

Definition 36 Define $S_\Omega(\mathbf{r})$ as the subset of $S_\Omega$ which includes all $\mathbf{r}' \in S_\Omega$ such that $\mathbf{r}' \preceq \mathbf{r}$.

The following theorem gives a relationship between $\mathbf{r}^*$ and $S_\Omega$.

Theorem 37 With probability one, exactly one of the following statements holds:
(i) $\mathbf{r}^* \in S_\Omega$;
(ii) for any arbitrary completion of the sampled tensor $\mathcal{U}$ of rank $\mathbf{r}$, we have $\mathbf{r} \notin S_\Omega(\mathbf{r}^*)$.

Proof Similar to the proof of Theorem 4, to complete the proof it suffices to show that the assumption $\mathbf{r}^* \notin S_\Omega$ implies that a completion of $\mathcal{U}$ of rank $\mathbf{r}$ with $\mathbf{r} \in S_\Omega(\mathbf{r}^*)$ exists with probability zero. Note that $\mathbf{r} \in S_\Omega(\mathbf{r}^*) \subseteq S_\Omega$ implies that Conditions $A^{\mathrm{Tucker}}_{\mathbf{r}}$ and $B^{\mathrm{Tucker}}_{\mathbf{r}}$ hold. Moreover, note that $\mathbf{r} \preceq \mathbf{r}^*$, and since $\mathbf{r}^* \notin S_\Omega$ we conclude that there exists $i_0 \in \{j+1, \dots, d\}$ such that $m_{i_0} < m_{i_0}^*$. As a result, $\sum_{i=j+1}^{d} n_i m_i < \sum_{i=j+1}^{d} n_i m_i^*$. Condition $B^{\mathrm{Tucker}}_{\mathbf{r}}$ ensures that there exists at least one more observed entry (otherwise the constraint tensor does not exist) besides the $\sum_{i=j+1}^{d} n_i m_i$ mentioned observed entries. Given the basis $\mathcal{C} \in \mathbb{R}^{n_1 \times \cdots \times n_j \times m_{j+1} \times \cdots \times m_d}$ as in (7), there exist $\sum_{i=j+1}^{d} n_i m_i$ variables in the corresponding Tucker decomposition. However, we have $\sum_{i=j+1}^{d} n_i m_i + 1$ polynomials in terms of these $\sum_{i=j+1}^{d} n_i m_i$ variables, and therefore the last polynomial can be written as an algebraic combination of the other $\sum_{i=j+1}^{d} n_i m_i$ polynomials. This leads to a linear equation in terms of the $\sum_{i=j+1}^{d} n_i m_i + 1$ corresponding observed entries. On the other hand, the $\sum_{i=j+1}^{d} n_i m_i$ observed entries satisfy the property stated as Condition $A^{\mathrm{Tucker}}_{\mathbf{r}}$, and it is easily verified that there exist $\sum_{i=j+1}^{d} n_i m_i^*$ entries (observed and non-observed) satisfying Condition $A^{\mathrm{Tucker}}_{\mathbf{r}^*}$ such that the union of the mentioned $\sum_{i=j+1}^{d} n_i m_i$ entries with any arbitrary other observed entry is a subset of those $\sum_{i=j+1}^{d} n_i m_i^*$ entries.
However, $\mathcal{U}$ is generically chosen from the manifold corresponding to $\mathbf{r}^*$, and therefore a particular linear equation in terms of the mentioned $\sum_{i=j+1}^{d} n_i m_i^*$ entries holds with probability zero. The rest of the proof is similar to the proof of Theorem 4.

Corollary 38 Assuming that there exists a completion of $\mathcal{U}$ with rank vector $\mathbf{r}$ such that $\mathbf{r} \in S_\Omega$, we conclude that with probability one $\mathbf{r}^* \preceq \mathbf{r}$.

The following lemma is Corollary 2 in (Ashraphijuo et al., 2016a), which ensures that Conditions $A^{\mathrm{Tucker}}_{\mathbf{r}}$ and $B^{\mathrm{Tucker}}_{\mathbf{r}}$ hold with high probability.

Lemma 39 Assume that $\sum_{i=j+1}^{d} m_i^2 \leq \prod_{i=j+1}^{d} m_i$, $\prod_{i=j+1}^{d} n_i \geq N_j \prod_{i=j+1}^{d} m_i - \sum_{i=j+1}^{d} m_i^2$ and $\prod_{i=j+1}^{d} m_i \leq N_j$, where $N_j = \prod_{i=1}^{j} n_i$. Furthermore, assume that we observe each entry of $\mathcal{U}$ with probability $p$, where

$p > \frac{1}{N_j} \max\left\{6 \log(N_j) + 2 \log\left(\frac{2\sum_{i=j+1}^{d} r_i^2}{\epsilon}\right),\; 2\prod_{i=j+1}^{d} r_i \log\left(\frac{2\sum_{i=j+1}^{d} r_i^2}{\epsilon}\right)\right\} + \frac{1}{\sqrt[4]{N_j}}$.  (32)

Then, with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{N_j}}{2})\right)^{\prod_{i=j+1}^{d} n_i}$, $\mathbf{r} \in S_\Omega$.

The following corollary is the probabilistic version of Theorem 37.

Corollary 40 Assuming that there exists a completion of the sampled tensor $\mathcal{U}$ of Tucker rank $\mathbf{r}$ such that the assumptions in Lemma 39 hold and the sampling probability satisfies (32), then with probability at least $(1-\epsilon)\left(1 - \exp(-\frac{\sqrt{N_j}}{2})\right)^{\prod_{i=j+1}^{d} n_i}$ we have $\mathbf{r}^* \preceq \mathbf{r}$.

4.2.1 NUMERICAL RESULTS

We generate a random tensor $\mathcal{U} \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of Tucker-rank (1, 3, 3, 2, 2). The color scale represents the lower bound on the probability that we can guarantee that the rank of a given completion is a component-wise upper bound on the true rank. We then solve the following convex optimization problem for different values of the sampling probability:

$\min_{\mathcal{U}' \in \mathbb{R}^{n_1 \times \cdots \times n_d}} \sum_{i=1}^{d} \|\mathbf{U}'_{(i)}\|_*$ subject to $\mathcal{U}'_\Omega = \mathcal{U}_\Omega$.  (33)

Then, we calculate the rank of each matricization of the tensor obtained via solving (33) to find its Tucker-rank. In this scenario, for each component of the Tucker-rank, we find the percentage error via $\frac{m_i - m_i^*}{n - m_i^*} \times 100\%$, where $n = 8$, and $m_i$ and $m_i^*$ are the $i$-th rank components of the obtained tensor and the original tensor, respectively. Hence, a 100% error simply means that the corresponding matricization is full rank. In Figure 8, the gap represents the average of the defined error over all components of the Tucker-rank, i.e., over all matricizations.

Figure 8: The rank gap as a function of sampling probability for $\mathcal{U} \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of Tucker-rank (1, 3, 3, 2, 2).
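The error metric of this subsection is a direct computation once the matricization ranks are available. A sketch (the rank tolerance is our own choice):

```python
import numpy as np

def tucker_rank(T, tol=1e-6):
    """Tucker rank: the rank of every matricization of T."""
    return [int(np.linalg.matrix_rank(
                np.moveaxis(T, i, 0).reshape(T.shape[i], -1), tol=tol))
            for i in range(T.ndim)]

def tucker_gap_percent(m_hat, m_star, n):
    """Average of the per-component errors (m_i - m_i^*) / (n - m_i^*) * 100%."""
    return float(np.mean([(a - b) / (n - b) * 100.0
                          for a, b in zip(m_hat, m_star)]))
```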
4.3 TT-Rank Tensor

Let $P_i$ denote the Lebesgue measure on $\mathbb{R}^{u_{i-1}^* \times n_i \times u_i^*}$, $i = 1, \dots, d$, where $u_0^* = u_d^* = 1$. In this subsection, we assume that the sampled tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is chosen generically from the manifold of tensors of rank $\mathbf{r}^* = \mathrm{rank}_{\mathrm{TT}}(\mathcal{U}) = (u_1^*, \dots, u_{d-1}^*)$ (where $\mathbf{r}^*$ is unknown); in other words, the entries of $\mathcal{U}$ are drawn independently with respect to the Lebesgue measure on the corresponding manifold. Hence, the probability measure for all statements in this subsection is the product measure $P_1 \times \cdots \times P_d$.

Condition $A^{\mathrm{TT}}_{\mathbf{r}}$: Each row of the $d$-th matricization of the sampled tensor, i.e., $\mathbf{U}_{(d)}$, includes at least $u_{d-1}$ observed entries.

We construct the $d$-way binary-valued constraint tensor $\tilde{\Omega}_{u_{d-1}}$ similarly to Section 3.2, as follows. Consider any subtensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times 1}$ of the tensor $\mathcal{U}$. The sampled tensor $\mathcal{U}$ includes $n_d$ subtensors that belong to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$; let $\mathcal{Y}_i$, $1 \leq i \leq n_d$, denote these $n_d$ subtensors. Define a binary-valued tensor $\tilde{\mathcal{Y}}_i \in \mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times k_i}$, where $k_i = N_\Omega(\mathcal{Y}_i) - u_{d-1}$, with entries described as follows. We can view $\tilde{\mathcal{Y}}_i$ as $k_i$ tensors, each belonging to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$. For each of the mentioned $k_i$ tensors in $\tilde{\mathcal{Y}}_i$, we set the entries corresponding to $u_{d-1}$ of the observed entries equal to 1. For each of the other $k_i$ observed entries, we pick one of the $k_i$ tensors of $\tilde{\mathcal{Y}}_i$ and set its corresponding entry (the same location as that specific observed entry) equal to 1, and set the rest of the entries equal to 0. In the case that $k_i = 0$, we simply ignore $\mathcal{Y}_i$, i.e., $\tilde{\mathcal{Y}}_i = \emptyset$.

By putting together all $n_d$ tensors in dimension $d$, we construct a binary-valued tensor $\tilde{\Omega}_{u_{d-1}} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times K}$, where $K = \sum_{i=1}^{n_d} k_i = N_\Omega(\mathcal{U}) - u_{d-1} n_d$, and call it the constraint tensor. Observe that each subtensor of $\tilde{\Omega}_{u_{d-1}}$ that belongs to $\mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times 1}$ includes exactly $u_{d-1} + 1$ nonzero entries. In (Ashraphijuo and Wang, 2017a), an example is given of the construction of $\tilde{\Omega}_{u_{d-1}}$.

Condition $B^{\mathrm{TT}}_{\mathbf{r}}$: $\tilde{\Omega}_{u_{d-1}}$ contains a subtensor $\breve{\Omega}_{u_{d-1}} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_{d-1} \times K}$ such that $K = \sum_{i=1}^{d-1} u_{i-1} n_i u_i - \sum_{i=1}^{d-1} u_i^2$ and for any $K' \in \{1, 2, \dots, K\}$ and any subtensor $\breve{\Omega}'_{u_{d-1}} \in \mathbb{R}^{n_1 \times \cdots \times n_{d-1} \times K'}$ of $\breve{\Omega}_{u_{d-1}}$ we have

$\sum_{i=1}^{d-1} \left( u_{i-1}\, f_i(\breve{\Omega}'_{u_{d-1}})\, u_i - u_i^2 \right) \geq K'$,  (34)

where $f_i(\breve{\Omega}'_{u_{d-1}})$ denotes the number of nonzero rows of the $i$-th matricization of $\breve{\Omega}'_{u_{d-1}}$.

The following lemma is a restatement of Theorem 1 in (Ashraphijuo and Wang, 2017a).

Lemma 41 With probability one, there are only finitely many completions of rank $\mathbf{r}$ of the sampled tensor if and only if Conditions $A^{\mathrm{TT}}_{\mathbf{r}}$ and $B^{\mathrm{TT}}_{\mathbf{r}}$ hold.

Definition 42 Let $S_\Omega$ denote the set of all rank vectors $\mathbf{r}$ such that both Conditions $A^{\mathrm{TT}}_{\mathbf{r}}$ and $B^{\mathrm{TT}}_{\mathbf{r}}$ hold.

The following lemma will be used in Lemma 44.

Lemma 43 $u_i \leq \min\{u_{i-1} n_i, u_{i+1} n_{i+1}\}$ for $1 \leq i \leq d-1$.

Proof We first show that $u_i \leq u_{i-1} n_i$. This is easily verified for $i = 1$, since $\tilde{\mathbf{U}}_{(1)}$ includes $n_1$ rows and $u_0 = 1$; therefore, assume that $i > 1$. Define the $(d-1)$-way tensor $\mathcal{U}^{l_i} \in \mathbb{R}^{n_1 \times \cdots \times n_{i-1} \times n_{i+1} \times \cdots \times n_d}$ such that $\mathcal{U}^{l_i}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d) = \mathcal{U}(x_1, \dots, x_{i-1}, l_i, x_{i+1}, \dots, x_d)$ for $1 \leq i \leq d$ and $1 \leq l_i \leq n_i$. Also, recall that $\tilde{\mathbf{U}}^{l_i}_{(i-1)}$ denotes the $(i-1)$-th unfolding of $\mathcal{U}^{l_i}$. Observe that $\tilde{\mathbf{U}}^{l_i}_{(i-1)}$ is a subset of the columns of the matrix $\tilde{\mathbf{U}}_{(i-1)}$ (those columns that correspond to the entries of $\mathcal{U}$ whose $i$-th coordinate equals $l_i$). Therefore, $\mathrm{rank}(\tilde{\mathbf{U}}^{l_i}_{(i-1)}) \leq \mathrm{rank}(\tilde{\mathbf{U}}_{(i-1)}) = u_{i-1}$. On the other hand, observe that $\tilde{\mathbf{U}}^{l_i}_{(i-1)}$ is a subset of the rows of $\tilde{\mathbf{U}}_{(i)}$ (those rows that correspond to the entries of $\mathcal{U}$ whose $i$-th coordinate equals $l_i$). Hence, the union of the rows of the $\tilde{\mathbf{U}}^{l_i}_{(i-1)}$'s for $1 \leq l_i \leq n_i$ constitutes all rows of $\tilde{\mathbf{U}}_{(i)}$. Therefore, $u_i = \mathrm{rank}(\tilde{\mathbf{U}}_{(i)}) \leq \sum_{l_i=1}^{n_i} \mathrm{rank}(\tilde{\mathbf{U}}^{l_i}_{(i-1)}) \leq n_i u_{i-1}$. Similarly, we can show that $u_i \leq u_{i+1} n_{i+1}$, which completes the proof.

Lemma 44 Assume $\mathbf{r} \in S_\Omega$. Then, for any $\mathbf{r}' \preceq \mathbf{r}$, we have $\mathbf{r}' \in S_\Omega$.

Proof Note that the dimension of the manifold corresponding to $\mathbf{r}$ is $\sum_{i=1}^{d} u_{i-1} n_i u_i - \sum_{i=1}^{d-1} u_i^2$. If we reduce the value of $u_i$ by one, the value of this dimension is reduced by $u_{i-1} n_i + u_{i+1} n_{i+1} - 2u_i + 1$. According to Lemma 43, $u_{i-1} n_i + u_{i+1} n_{i+1} - 2u_i + 1$ is greater than zero, and therefore $\mathbf{r}' \preceq \mathbf{r}$ implies that the dimension of the manifold corresponding to $\mathbf{r}$ is greater than that corresponding to $\mathbf{r}'$. The rest of the proof is similar to the proof of Lemma 3.

Definition 45 Define $\hat{S}_\Omega$ as the set of all rank vectors $\mathbf{r} \in S_\Omega$ such that there exists a rank vector $\mathbf{r}' \in S_\Omega$ with $\mathbf{r} \preceq \mathbf{r}'$ and $u_{d-1} < u'_{d-1}$ (instead of $u_{d-1} \leq u'_{d-1}$). Note that $\hat{S}_\Omega$ also satisfies the property in Lemma 44.

Theorem 46 With probability one, exactly one of the following statements holds:
(i) $\mathbf{r}^* \in \hat{S}_\Omega$;
(ii) for any arbitrary completion of the sampled tensor $\mathcal{U}$ of rank $\mathbf{r}$, we have $\mathbf{r} \notin \hat{S}_\Omega$.
Definition 45 Define $\hat{S}_\Omega$ as the set of all rank vectors $r \in S_\Omega$ for which there exists a rank vector $r' \in S_\Omega$ with $r \leq r'$ and $u_{d-1} < u'_{d-1}$ (i.e., with strict inequality in the $(d-1)$-th component instead of $u_{d-1} \leq u'_{d-1}$). Note that $\hat{S}_\Omega$ also satisfies the property in Lemma 44.

Theorem 46 With probability one, exactly one of the following statements holds: (i) $r \in \hat{S}_\Omega$; (ii) for any arbitrary completion of the sampled tensor $U$ of rank $r'$, we have $r' \notin \hat{S}_\Omega$.

Proof Similar to the proof of Theorem 4, it suffices to show that under the assumption $r \notin \hat{S}_\Omega$, a completion of $U$ of rank $r'$ with $r' \in \hat{S}_\Omega$ exists with probability zero. Consider the product $U^{(1)} \cdots U^{(d-1)}$ in (9) as the basis of the rank-$r'$ TT decomposition of $U$. Then, by taking the $(d-1)$-th unfolding of $U^{(1)} \cdots U^{(d-1)}$ in the TT decomposition, we obtain a matrix factorization of the $(d-1)$-th unfolding of $U$. The rest of the proof is similar to the proof of Theorem 4.

Similar to Theorem 46, we can show the following.

Corollary 47 Consider a subset $\hat{S}'_\Omega$ of $\hat{S}_\Omega$ such that for any two members $r$ and $r'$ of $\hat{S}_\Omega$ with $r \leq r'$ and $r' \in \hat{S}'_\Omega$, we have $r \in \hat{S}'_\Omega$. Then, with probability one, exactly one of the following holds: (i) $r \in \hat{S}'_\Omega$; (ii) for any arbitrary completion of $U$ of rank vector $r'$, we have $r' \notin \hat{S}'_\Omega$.

Corollary 48 Assuming that there exists a completion of $U$ with rank vector $r'$ such that $r' \in \hat{S}_\Omega$, we conclude that with probability one $r \leq r'$.

The following lemma is Lemma 14 in (Ashraphijuo and Wang, 2017a), which ensures that Conditions $A_r^{\textrm{TT}}$ and $B_r^{\textrm{TT}}$ hold with high probability.

Lemma 49 Define $m = \sum_{k=1}^{d-2} u_{k-1} u_k$, $M = n \sum_{k=1}^{d-2} u_{k-1} u_k - \sum_{k=1}^{d-2} u_k^2$, and $u = \max\left\{\frac{u_1}{u_0}, \ldots, \frac{u_{d-1}}{u_{d-2}}\right\}$. Assume that $n_1 = n_2 = \cdots = n_d = n$, $n > \max\{m, 200\}$, and $u \leq \min\{\frac{n}{6}, u_{d-2}\}$ hold. Moreover, assume that the sampling probability satisfies
$$p > \frac{1}{n^{d-2}} \max\left\{ 27 \log n + 18, \; 6 u_{d-2} \right\}. \tag{34}$$
Then, with probability at least $(1-\epsilon)\left(1 - \exp\left(-\frac{n}{2}\right)\right)^{n^2}$, we have $r \in S_\Omega$.

The following corollary is the probabilistic version of Corollary 48.

Corollary 50 Assuming that there exists a completion of the sampled tensor $U$ of TT rank $r'$ such that the assumptions in Lemma 49 hold and the sampling probability satisfies (34), then with probability at least $(1-\epsilon)\left(1 - \exp\left(-\frac{n}{2}\right)\right)^{n^2}$ we have $r \leq r'$.

4.3.1 NUMERICAL RESULTS

We generate a random tensor $U \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of TT rank $(1, 2, 4, 1, 1)$. We then solve the following convex optimization problem for different values of the sampling probability:
$$\underset{U' \in \mathbb{R}^{n_1 \times \cdots \times n_d}}{\textrm{minimize}} \; \sum_{i=1}^{d-1} \left\| \tilde{U}'_{(i)} \right\|_* \quad \textrm{subject to} \quad U'_{\Omega} = U_{\Omega}. \tag{35}$$
Then, we calculate the rank of each unfolding of the tensor obtained by solving (35) to find its TT rank. For each component of the TT rank, we compute the percentage of error as $\frac{u_i - u_i^*}{\min\{n^i, n^{d-i}\} - u_i^*} \times 100\%$, where $n = 8$, $d = 6$, and $u_i$ and $u_i^*$ are the $i$-th rank components of the obtained tensor and of the original tensor, respectively. Hence, a $100\%$ error simply means that the corresponding unfolding is full rank. In Figure 9, the gap represents the average of the defined error over all components of the TT rank, i.e., over all unfoldings.

Figure 9: The rank gap as a function of sampling probability for $U \in \mathbb{R}^{8 \times 8 \times 8 \times 8 \times 8 \times 8}$ of TT rank $(1, 2, 4, 1, 1)$.
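As in the Tucker experiment, the TT rank of the recovered tensor and the reported gap can be computed in a few lines. In this sketch (again our illustration, with an assumed rank tolerance), the $i$-th unfolding groups the first $i$ modes along the rows, so it is a $\left(\prod_{k \leq i} n_k\right) \times \left(\prod_{k > i} n_k\right)$ matrix.

```python
import numpy as np

def unfolding(U, i):
    """i-th unfolding: the first i modes indexed along rows, the rest along columns."""
    return U.reshape(int(np.prod(U.shape[:i])), -1)

def tt_rank(U, tol=1e-6):
    """TT rank (u_1, ..., u_{d-1}): the rank of each of the d-1 unfoldings."""
    return [np.linalg.matrix_rank(unfolding(U, i), tol=tol)
            for i in range(1, U.ndim)]

def tt_rank_gap(u_hat, u_star, n=8, d=6):
    """Average of (u_i - u_i^*) / (min{n^i, n^(d-i)} - u_i^*) * 100% over unfoldings."""
    gaps = [100.0 * (uh - us) / (min(n ** i, n ** (d - i)) - us)
            for i, (uh, us) in enumerate(zip(u_hat, u_star), start=1)]
    return float(np.mean(gaps))
```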
5. Conclusions

We have made use of recently developed algebraic geometry analyses of the fundamental conditions on sampling patterns for finite completability under a number of low-rank matrix and tensor models to treat the problem of rank approximation for partially sampled data. In particular, the goal is to approximate the unknown scalar or vector rank based on the sampling pattern and the rank of a given completion. A number of data models have been treated, including the single-view matrix, multi-view matrix, CP tensor, tensor-train tensor, and Tucker tensor. First, we provided an upper bound on the unknown scalar rank (for the single-view matrix and CP tensor) and a component-wise upper bound on the unknown vector rank (for the multi-view matrix, Tucker tensor, and TT tensor) that hold with probability one, assuming that the sampling pattern satisfies the proposed combinatorial conditions. Moreover, we provided probabilistic versions of these bounds that hold with high probability, assuming that the sampling probability is above a threshold. In addition, for the single-view matrix and CP tensor, these upper bounds are exactly equal to the unknown scalar rank given the lowest-rank completion. To illustrate how tight the proposed upper bounds are, we provided numerical results for the single-view matrix case, in which we applied nuclear norm minimization to find a low-rank completion of the sampled data and observed that the proposed upper bound is almost always equal to the true unknown rank.

Acknowledgments

This work was supported in part by the U.S. National Science Foundation (NSF) under grant CIF-1064575, and in part by the U.S. Office of Naval Research (ONR) under grant N000141410667.

References

Morteza Ashraphijuo and Xiaodong Wang. Characterization of deterministic and probabilistic sampling patterns for finite completability of low tensor-train rank tensor. arXiv preprint arXiv:1703.07698, 2017a.

Morteza Ashraphijuo and Xiaodong Wang. Fundamental conditions for low-CP-rank tensor completion. Journal of Machine Learning Research, 18(63):1-29, 2017b.

Morteza Ashraphijuo and Xiaodong Wang. Scaled nuclear norm minimization for low-rank tensor completion. arXiv preprint arXiv:1707.07976, 2017c.

Morteza Ashraphijuo, Ramtin Madani, and Javad Lavaei. Inverse function theorem for polynomial equations using semidefinite programming. In IEEE 54th Annual Conference on Decision and Control (CDC), pages 6589-6596, 2015.

Morteza Ashraphijuo, Vaneet Aggarwal, and Xiaodong Wang. Deterministic and probabilistic conditions for finite completability of low rank tensor. arXiv preprint arXiv:1612.01597, 2016a.

Morteza Ashraphijuo, Ramtin Madani, and Javad Lavaei. Characterization of rank-constrained feasibility problems via a finite number of convex programs. In IEEE 55th Conference on Decision and Control (CDC), pages 6544-6550, 2016b.

Morteza Ashraphijuo, Ramtin Madani, and Javad Lavaei. Characterization of rank-constrained feasibility problems via a finite number of convex programs. In IEEE 55th Conference on Decision and Control (CDC), pages 6544-6550, 2016c.

Morteza Ashraphijuo, Vaneet Aggarwal, and Xiaodong Wang. A characterization of sampling patterns for low-Tucker-rank tensor completion problem. In IEEE International Symposium on Information Theory (ISIT), pages 531-535, 2017a.

Morteza Ashraphijuo, Xiaodong Wang, and Vaneet Aggarwal. A characterization of sampling patterns for low-rank multi-view data completion problem. In IEEE International Symposium on Information Theory (ISIT), pages 1147-1151, 2017b.

Morteza Ashraphijuo, Xiaodong Wang, and Vaneet Aggarwal. Deterministic and probabilistic conditions for finite completability of low-rank multi-view data. arXiv preprint arXiv:1701.00737, 2017c.

Morteza Ashraphijuo, Xiaodong Wang, and Vaneet Aggarwal. An approximation of the CP-rank of a partially sampled tensor. In 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2017d.
Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010.

Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111-119, 2012.

Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.

Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2010.

Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM Journal on Imaging Sciences, 6(1):199-225, 2013.

Lars Eldén. Matrix Methods in Data Mining and Pattern Recognition, volume 4. Society for Industrial and Applied Mathematics, 2007.

Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):1-19, 2011.

Rong Ge, Jason D. Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. arXiv preprint arXiv:1605.07272, 2016.

Nicholas J. A. Harvey, David R. Karger, and Kazuo Murota. Deterministic network coding by matrix completion. In Annual ACM-SIAM Symposium on Discrete Algorithms, pages 489-498, 2005.

Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Annual Symposium on Theory of Computing, pages 665-674, 2013.

Hui Ji, Chaoqiang Liu, Zuowei Shen, and Yuhong Xu. Robust video denoising using low rank matrix completion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1791-1798, 2010.

Lek-Heng Lim and Pierre Comon. Multiarray signal processing: Tensor decomposition meets compressed sensing. Comptes Rendus Mécanique, 338(6):311-320, 2010.

Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):208-220, 2013.

X. Y. Liu, S. Aeron, V. Aggarwal, X. Wang, and M. Y. Wu. Adaptive sampling of RF fingerprints for fine-grained indoor localization. IEEE Transactions on Mobile Computing, 15(10):2411-2423, 2016.

Oyetunji E. Ogundijo, Abdulkadir Elmas, and Xiaodong Wang. Supplemental material for reverse engineering gene regulatory networks from measurement with missing values.

Oyetunji E. Ogundijo, Abdulkadir Elmas, and Xiaodong Wang. Reverse engineering gene regulatory networks from measurement with missing values. EURASIP Journal on Bioinformatics and Systems Biology, 2017(1):2, 2017.

Daniel Pimentel-Alarcón. A simpler approach to low-rank tensor canonical polyadic decomposition. In 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 474-481. IEEE, 2016.

Daniel Pimentel-Alarcón, Nigel Boston, and Robert D. Nowak. Deterministic conditions for subspace identifiability from incomplete sampling. In IEEE International Symposium on Information Theory (ISIT), pages 2191-2195, 2015.

Daniel Pimentel-Alarcón, L. Balzano, R. Marcia, R. Nowak, and R. Willett. Group-sparse subspace clustering with missing data. In IEEE Workshop on Statistical Signal Processing (SSP), pages 1-5, 2016a.
Daniel Pimentel-Alarcón and Robert D. Nowak. The information-theoretic requirements of subspace clustering with missing data. In International Conference on Machine Learning, 2016b.

Daniel Pimentel-Alarcón and Robert Nowak. A converse to low-rank matrix completion. In IEEE International Symposium on Information Theory (ISIT), 2016.

Daniel Pimentel-Alarcón, Laura Balzano, and Robert Nowak. Necessary and sufficient conditions for sketched subspace clustering. In Allerton Conference on Communication, Control, and Computing, 2016c.

Daniel Pimentel-Alarcón, Nigel Boston, and Robert Nowak. A characterization of deterministic sampling patterns for low-rank matrix completion. IEEE Journal of Selected Topics in Signal Processing, 10(4):623-636, 2016d.

Benjamin Recht and Christopher Ré. Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation, 5(2):201-226, 2013.

Gil Shabat. Matrix completion using nuclear norm, spectral norm or weighted nuclear norm minimization. 2015.

Nicholas D. Sidiropoulos and Anastasios Kyrillidis. Multi-way compressed sensing for sparse low-rank tensors. IEEE Signal Processing Letters, 19(11):757-760, 2012.