# Riemannian Submanifold Tracking on Low-Rank Algebraic Variety

Qian Li, Chinese Academy of Sciences, Beijing, China. qianli.charlene@gmail.com
Zhichao Wang, Tsinghua University, Beijing, China. wzchary@gmail.com

Matrix recovery aims to learn a low-rank structure from high-dimensional data and arises in numerous learning applications. As a popular heuristic for matrix recovery, convex relaxation involves iteratively calling the singular value decomposition (SVD). Riemannian optimization based methods can alleviate this expensive SVD cost for improved scalability, but they are usually degraded by the unknown rank. This paper proposes a novel algorithm, RIST, that exploits the algebraic variety of the low-rank manifold for matrix recovery. In particular, RIST uses an efficient scheme that automatically estimates the potential rank on the real algebraic variety and tracks the favorable Riemannian submanifold. Moreover, RIST utilizes second-order geometric characterization and achieves provable superlinear convergence, which is superior to the linear convergence of most existing methods. Extensive comparison experiments demonstrate the accuracy and efficiency of the RIST algorithm.

## Introduction

Matrix recovery is a fundamental problem arising in many applications ranging from machine learning and data mining to computer vision (Vandereycken 2013; Liu et al. 2011; Candès et al. 2011). In particular, low-rank matrix recovery decomposes a possibly noisy data matrix Z into a low-rank component X and a sparse component E, which can be formalized as the optimization problem

$$\min_{X,\,E}\; \mathrm{rank}(X) + \lambda\,\Upsilon(E) \quad \text{s.t.} \quad Z = \mathcal{A}(X) + \mathcal{B}(E) \tag{1}$$

where rank(·) encourages the variable X to be low-rank, Υ(·) is a non-smooth regularizer, and λ is the regularization parameter. Both $\mathcal{A}(\cdot)$ and $\mathcal{B}(\cdot)$ are linear operators determined by the application (Candès et al. 2011; Candès and Recht 2009; Recht 2011).

Early attempts adopt convex relaxation to make the non-convex low-rank matrix recovery tractable (e.g., the trace norm as a convex surrogate for the rank function), such as EALM (Lin, Chen, and Ma 2010), IALM (Lin, Chen, and Ma 2010) and LSADM (Goldfarb, Ma, and Scheinberg 2013). These algorithms are dominated by the cost of iteratively calling the SVD or truncated SVD (Bouwmans and Zahzah 2014). To reduce this cost, many algorithms transform the problem into a convex program over positive semi-definite matrices and use approximate SDP solvers (Hazan 2008; Jaggi, Sulovský, and others 2010). However, these algorithms either converge slowly (Wang et al. 2014; Jaggi 2013) or are memory inefficient on large-scale problems (Laue 2012; Tan et al. 2014). More significantly, they solve the convex-relaxed problem of low-rank matrix recovery instead of the original non-convex one, which may produce solutions that seriously deviate from the true ones (Papamakarios, Panagakis, and Zafeiriou 2011).

An alternative heuristic resorts to Riemannian optimization, which exploits the smooth geometry of the search space and has shown better scalability than convex relaxation methods (Vandereycken 2013; Candès et al. 2011). Most existing Riemannian optimization based algorithms differ in their specific choice of Riemannian manifold.
For instance, RTRMC (Boumal and Absil 2011) and Lingo (Li et al. 2015) constrain the search space to Grassmannian manifolds after adopting a matrix factorization. RP (Tan et al. 2014), LRGeomCG (Vandereycken 2013) and ScGrassMC (Ngo and Saad 2012) exploit the rank constraint as a fixed-rank matrix manifold and directly optimize the matrix recovery problem via Riemannian theory. Though most Riemannian optimization based algorithms outperform classical heuristics in numerical experiments, two main limitations remain. First, the rank of the matrix to be recovered determines the geometry of the Riemannian manifold, yet it is usually unknown and nontrivial to specify. Second, most Riemannian optimization approaches establish at best a linear convergence rate. Moreover, since a fixed-rank matrix manifold is open and has no boundary, such convergence analysis may be limited because it neglects the manifold boundary (Schneider and Uschmajew 2015; Agarwal, Negahban, and Wainwright 2010; Yan et al. 2015).

To address the above issues, this paper proposes an effective matrix recovery solver named Riemannian Submanifold Tracking (RIST), which optimizes on the closure of fixed-rank matrix manifolds and searches for the optimal solution with second-order information of the Riemannian geometry. The main contributions of this work are as follows:

- Rather than using unbounded manifolds, RIST takes as its search space the real-algebraic variety, which is closed and has a boundary. Theoretical analysis proves that RIST achieves a superlinear convergence rate, which is superior to the linear convergence rate of most state-of-the-art methods.
- By tracking the submanifold of the real-algebraic variety, the rank can be estimated automatically, which alleviates the difficulty of rank estimation in most existing fixed-rank matrix manifold methods.
- RIST utilizes second-order geometric characterization and optimizes over the submanifold of the real-algebraic variety, which helps reach an accurate optimum with fast convergence. Numerical experiments on Robust PCA and matrix completion also verify the effectiveness of the RIST algorithm.

## Preliminaries and Notations

Riemannian manifolds can be considered as low-dimensional smooth surfaces embedded in a higher-dimensional Euclidean space. Given a positive integer k, the fixed-rank matrix manifold is denoted as

$$\mathcal{M}_k = \{X \in \mathbb{R}^{m\times n} : \mathrm{rank}(X) = k\} = \{U\,\mathrm{diag}(\sigma)\,V^T : U \in \mathrm{St}^m_k,\; V \in \mathrm{St}^n_k,\; \|\sigma\|_0 = k\} \tag{2}$$

where $\mathrm{St}^m_k = \{U \in \mathbb{R}^{m\times k} : U^T U = I\}$ is the Stiefel manifold of m × k semi-orthogonal matrices. The set $\mathcal{M}_k$ is a submanifold of dimension $(m+n-k)k$ embedded in $\mathbb{R}^{m\times n}$, and its tangent space at $X \in \mathcal{M}_k$ is given by

$$T_X\mathcal{M}_k = \left\{ [U\; U_\perp] \begin{bmatrix} A & B \\ C & 0 \end{bmatrix} [V\; V_\perp]^T \right\} \tag{3}$$

where $A \in \mathbb{R}^{k\times k}$, $B \in \mathbb{R}^{k\times(n-k)}$, $C \in \mathbb{R}^{(m-k)\times k}$, and $0 \in \mathbb{R}^{(m-k)\times(n-k)}$.

Given a smooth function f on the manifold, its Riemannian gradient grad f(X) is the orthogonal projection of the Euclidean gradient $\nabla f(X)$ onto the tangent space $T_X\mathcal{M}_k$ at X. Let $P_U = UU^T$, $P_V = VV^T$, and let $P_U^\perp = I - P_U$, $P_V^\perp = I - P_V$. The orthogonal projection is computed as

$$P_{T_X\mathcal{M}_k}(B) = P_U B P_V + P_U^\perp B P_V + P_U B P_V^\perp \tag{4}$$

The Riemannian gradient grad f(X) satisfies

$$\mathrm{grad}\, f(X) = P_{T_X\mathcal{M}_k}(\nabla f(X)) \tag{5}$$

For any $X = U\Sigma V^T \in \mathcal{M}_k$ and $\xi \in T_X\mathcal{M}_k$, the Riemannian Hessian in the direction of the tangent vector ξ satisfies

$$\mathrm{Hess}\, f(X)[\xi] = P_U \xi P_V + P_U^\perp\big(\xi + \nabla f(X)\,V_p\,\Sigma^{-1} V^T\big) P_V + P_U\big(\xi + U\,\Sigma^{-1} U_p^T\,\nabla f(X)\big) P_V^\perp \tag{6}$$

where $U_p^T U = 0$ and $V_p^T V = 0$.
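To make the projection in Eqs. (4)-(5) concrete, the following is a minimal NumPy sketch (not the authors' code) that projects a dense Euclidean gradient G onto the tangent space at $X = U\,\mathrm{diag}(\sigma)\,V^T$. The function name and the fully dense representation are illustrative assumptions; practical implementations keep the result in factored form.

```python
import numpy as np

def riemannian_grad(U, V, G):
    """Project the Euclidean gradient G onto the tangent space of the
    fixed-rank manifold at X = U diag(s) V.T (Eqs. (4)-(5)).

    U: m x k and V: n x k with orthonormal columns; G: m x n dense gradient.
    Returns P_U G P_V + P_U^perp G P_V + P_U G P_V^perp as a dense matrix.
    """
    GV = G @ V          # G P_V restricted to the column basis, m x k
    UtG = U.T @ G       # P_U G restricted to the row basis, k x n
    UtGV = U.T @ GV     # k x k core block counted twice below
    # Expanding Eq. (4): G P_V + P_U G - P_U G P_V
    return GV @ V.T + U @ UtG - U @ (UtGV @ V.T)
```

A quick consistency check is that the result equals G whenever G already lies in the tangent space, e.g. G = U @ A @ V.T for any k x k matrix A.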
The real-algebraic variety $\mathcal{M}_{\leq k}$ is the set of matrices of rank at most k, which can also be viewed as the closure of the fixed-rank matrix manifolds:

$$\mathcal{M}_{\leq k} = \{X \in \mathbb{R}^{m\times n} : \mathrm{rank}(X) \leq k\} = \mathcal{M}_1 \cup \mathcal{M}_2 \cup \cdots \cup \mathcal{M}_k \tag{7}$$

Suppose that $U = \mathrm{ran}(X)$ and $V = \mathrm{ran}(X^T)$. The tangent cone with respect to $\mathcal{M}_{\leq k}$ is

$$T_X\mathcal{M}_{\leq k} = T_X\mathcal{M}_{k-\kappa} \oplus \{\Xi_\kappa \in U^\perp \otimes V^\perp\} \tag{8}$$

Note that $\Xi_\kappa$ is the best rank-κ approximation of $\nabla f(X) - P_{T_X\mathcal{M}_{k-\kappa}}(\nabla f(X))$. The Riemannian gradient can then be computed by

$$\mathrm{grad}\, f(X) = P_{T_X\mathcal{M}_{k-\kappa}}(\nabla f(X)) + \Xi_\kappa \tag{9}$$

Given a tangent vector ξ defining a direction over the tangent space, the retraction operator smoothly maps elements of the tangent space back to the manifold:

$$R_{\mathcal{M}}: \xi \mapsto R(X, \xi) = P_{\mathcal{M}}(X + \xi) = \sum_{i=1}^{k} \sigma_i u_i v_i^T \tag{10}$$

It is worth noting that the projection $P_{\mathcal{M}}(\cdot)$ is the rank-k approximation operation, where $\sigma_i$, $u_i$ and $v_i$ are the (ordered) singular values and singular vectors of X + ξ; see (Schneider and Uschmajew 2015) and the references therein.

## RIST Algorithm

This section describes the RIST algorithm for non-convex matrix recovery in detail. RIST exploits the geometry of the real-algebraic variety and adopts a trust-region method for optimization. This section also gives a theoretical justification of the superlinear convergence of the RIST algorithm.

### Riemannian Submanifold Tracking

RIST reformulates the non-convex matrix recovery problem (1) as a Riemannian optimization over the search space of the real-algebraic variety $\mathcal{M}_{\leq k}$:

$$\min_{X,\,E}\; f(X, E) + \lambda\,\Upsilon(E) \quad \text{s.t.} \quad X \in \mathcal{M}_{\leq k} \tag{11}$$

where $f(X, E) = \|\mathcal{A}(X) + \mathcal{B}(E) - Z\|_F^2$. The following describes the details of the RIST algorithm for solving problem (11).

The rank determines the real-algebraic variety $\mathcal{M}_{\leq k}$. Most previous approaches initialize the rank empirically or resort to a full SVD, which is inaccurate or expensive for large-scale problems. Motivated by randomized SVD (Halko, Martinsson, and Tropp 2011), this paper proposes a simple and effective scheme to initialize the rank through randomized partial matrix decompositions, as shown in Algorithm 1. Given the observed matrix Z, only a few (θ ≥ 1) singular values are computed sequentially by using a random matrix Θ. Algorithm 1 outputs κ as the initial rank once the following condition is satisfied:

$$\frac{\sigma_\kappa}{\sum_{i=1}^{\kappa}\sigma_i} \leq \eta, \quad 0 < \eta \leq 0.05 \tag{12}$$

where the singular values $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_\kappa\}$ are arranged in descending order. The condition (12) is also used later to update the rank by adding a dynamic integer κ, which tracks the submanifold $\mathcal{M}_k$ within its closure $\mathcal{M}_{\leq k}$.

**Algorithm 1: Rank Initialization**

Input: observed matrix $Z \in \mathbb{R}^{m\times n}$, θ ≥ 1. Output: κ.

1. Initialize $U = 0 \in \mathbb{R}^{m\times\theta}$, $\Sigma = 0 \in \mathbb{R}^{\theta\times\theta}$, $V = 0 \in \mathbb{R}^{n\times\theta}$, κ = θ.
2. repeat
3. Generate a standard Gaussian matrix $\Theta \in \mathbb{R}^{n\times\theta}$.
4. Compute the randomized SVD: $[U_\theta, \Sigma_\theta, V_\theta] = \mathrm{RandomSVD}(Z - U\Sigma V^T, \theta, \Theta)$.
5. $U \leftarrow [U\; U_\theta]$, $\Sigma \leftarrow \mathrm{diag}(\Sigma, \Sigma_\theta)$, $V \leftarrow [V\; V_\theta]$.
6. Terminate if the singular values in diag(Σ) satisfy Eq. (12); otherwise let κ = κ + θ.
7. until convergence
8. return κ.
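As a concrete illustration of the rank-initialization idea in Algorithm 1, here is a simplified NumPy sketch. It rebuilds the randomized sketch from scratch in each round rather than deflating $Z - U\Sigma V^T$ as Algorithm 1 does, so it should be read as an approximation of the scheme under that assumption; the function name, the `max_rank` guard and the default θ are illustrative.

```python
import numpy as np

def estimate_rank(Z, theta=5, eta=0.04, max_rank=None, rng=None):
    """Grow a randomized sketch by theta columns per round and stop once the
    smallest retained singular value contributes at most eta of the cumulative
    spectrum, mirroring condition (12)."""
    rng = np.random.default_rng(rng)
    m, n = Z.shape
    max_rank = max_rank or min(m, n)
    kappa = theta
    while True:
        Theta = rng.standard_normal((n, kappa))       # Gaussian test matrix
        Q, _ = np.linalg.qr(Z @ Theta)                # approximate range of Z
        s = np.linalg.svd(Q.T @ Z, compute_uv=False)  # leading singular values
        if s.sum() == 0 or s[-1] / s.sum() <= eta or kappa >= max_rank:
            return kappa
        kappa += theta                                # enlarge the sketch and retry
```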
To accelerate convergence, RIST implements a warm-start strategy over $\mathcal{M}_{\leq k}$ to yield an initial point (X', E') for a series of second-order Riemannian subproblems over the fixed-rank manifold $\mathcal{M}_k$. To update X as in Algorithm 2, we fix $E = E_t$ and minimize a local model of $f(X, E_t)$ with respect to X. Given the search direction $\xi_t = -\mathrm{grad}\, f(X_t, E_t)$ and the step size $\alpha_t$ obtained by the Armijo rule, X is updated by a smooth line-search Riemannian step satisfying

$$f(X_t, E_t) - f(R(X_t, \alpha_t\xi_t), E_t) \geq \alpha_t\,\langle \xi_t, \xi_t\rangle \tag{13}$$

For updating E, RIST fixes X and solves the objective (11); the form of the optimal E' depends on the application. For Robust PCA (RPCA) (Candès et al. 2011), the optimum E' satisfies

$$E' = \mathcal{S}[Z - \mathcal{A}(X)] \tag{14}$$

where $\mathcal{S}[\cdot]$ denotes the entrywise soft-thresholding (shrinkage) operator.

**Algorithm 2: RIST (Riemannian Submanifold Tracking)**

Input: estimated rank κ. Output: X, E.

1. Initialize $X_1 = 0$, $E_1 = 0$, k = κ.
2. for t = 1, 2, ... do
3. Compute grad $f(X_t, E_t)$ on $\mathcal{M}_{\leq k}$ by Eq. (9).
4. Update $X' = R(X_t, -\alpha_t\, \mathrm{grad}\, f(X_t, E_t))$ on $\mathcal{M}_{\leq k}$ by Eq. (10).
5. Compute E' by Eq. (14).
6. Let k = k + κ.
7. Use (X', E') as the warm-start point on $\mathcal{M}_k$ and call $(X_{t+1}, E_{t+1}) = \mathrm{ROM}(X', E', k)$.
8. Let σ be the singular values of $X_{t+1}$ and update κ by Eq. (12).
9. end for
10. return X and E.

### Submanifold Optimization over $\mathcal{M}_k$

After the initial point (X', E') is computed, RIST calls an accurate second-order Riemannian optimization (ROM) over the submanifold $\mathcal{M}_k$. As in Algorithm 3 of ROM, we fix $E = E_j$. The second-order approximation over the tangent space is

$$m(X, E_j; \delta) = f(X, E_j) + \langle \mathrm{grad}\, f(X, E_j), \delta\rangle + \tfrac{1}{2}\langle \mathrm{Hess}\, f(X, E_j)[\delta], \delta\rangle \tag{15}$$

where $\delta \in T_X\mathcal{M}_k$, and $\mathrm{grad}\, f(X, E_j)$ and $\mathrm{Hess}\, f(X, E_j)[\delta]$ are given by Eqs. (5) and (6), respectively. The low-rank problem (11) with respect to X can then be reformulated as

$$\min_{\delta}\; m(X, E_j; \delta) + \Upsilon(E_j) \quad \text{s.t.} \quad (X, \delta) \in T_X\mathcal{S},\; X \in \mathcal{M}_k \tag{16}$$

where the tangent bundle $T_X\mathcal{S}$ is

$$T_X\mathcal{S} = \{(X, \delta)\;|\; X \in \mathcal{M}_k,\; \delta \in T_X\mathcal{M}_k,\; \langle\delta, \delta\rangle \leq \Delta\} \tag{17}$$

and Δ > 0 is the trust-region radius that determines how far a step may move. RIST updates Δ iteratively by measuring the quotient

$$\varphi_j = \frac{f(X_j) - f(R_{\mathcal{M}}(X_j, \delta_j))}{m(X_j; 0) - m(X_j; \delta_j)} \tag{18}$$

Depending on the value of $\varphi_j$, the new iterate $X_{j+1}$ is accepted or rejected. When $\varphi_j$ is exceedingly small ($\varphi_j < \mu_1$), the step length and direction are not appropriate: the candidate is rejected and the radius $\Delta_j$ is reduced by a factor $d_1 < 1$. When $\varphi_j \in (\mu_1, \mu_2)$, the model is adequate: the candidate $X_{j+1}$ is accepted and the radius is kept. In the third case, $\varphi_j > \mu_2$, the model is highly accurate, so $X_{j+1}$ is accepted and $\Delta_j$ is enlarged by a factor $d_2 > 1$. After X is updated, the update of E is the same as Step 5 of Algorithm 2.

**Algorithm 3: ROM (Riemannian Optimization over $\mathcal{M}_k$)**

Input: k, $\Delta_{\max} > 0$, $\Delta_0 \in (0, \Delta_{\max})$, X', E'. Output: X, E.

1. Initialize $X_0 = X'$, $E_0 = E'$, $\mu_1 = 0.25$, $\mu_2 = 0.75$, $d_1 = 0.5$, $d_2 = 3$.
2. for j = 0, 1, ... do
3. Compute $G = \mathrm{grad}\, f(X_j)$ over $\mathcal{M}_k$ by Eq. (5).
4. Let $H = \mathrm{Hess}\, f(X_j)[G]$ via Eq. (6), set the direction $\delta_j = -G$, and compute the step size $\beta_j = \|G\|_F^2 / \langle H, G\rangle$ whenever $\langle H, G\rangle > 0$.
5. Update $X_{j+1} = R(X_j, \beta_j\delta_j)$ on $\mathcal{M}_k$ by Eq. (10).
6. Compute $\varphi_j$ by Eq. (18).
7. if $\varphi_j < \mu_1$ then $\Delta_{j+1} = d_1\Delta_j$ and $X_{j+1} = X_j$
8. else if $\mu_1 \leq \varphi_j \leq \mu_2$ then $\Delta_{j+1} = \Delta_j$
9. else $\Delta_{j+1} = \min(d_2\Delta_j, \Delta_{\max})$
10. end if
11. Compute $E_{j+1}$ via Eq. (14).
12. end for
13. return X and E.
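The acceptance logic of Algorithm 3 (steps 6-10 above) fits in a few lines. The sketch below is a plain-Python rendering under the parameter values listed in the algorithm (μ1 = 0.25, μ2 = 0.75, d1 = 0.5, d2 = 3); the Δ_max default and the function name are illustrative assumptions.

```python
def update_trust_region(phi, X_new, X_old, Delta,
                        mu1=0.25, mu2=0.75, d1=0.5, d2=3.0, Delta_max=10.0):
    """Trust-region acceptance rule in the spirit of Algorithm 3.

    phi is the ratio of actual to predicted decrease from Eq. (18).
    Returns the retained iterate and the updated radius.
    """
    if phi < mu1:
        return X_old, d1 * Delta                   # reject step, shrink radius
    if phi > mu2:
        return X_new, min(d2 * Delta, Delta_max)   # accept step, enlarge radius
    return X_new, Delta                            # accept step, keep radius
```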
### Complexity Analysis

From the above description, the complexity of the RIST algorithm is dominated by the computation of the Riemannian gradient, the Riemannian Hessian and the retraction operator. Computing the Riemannian gradient and Hessian involves matrix-vector products and requires O(mnk) flops. Rank estimation and the retraction operator are implemented with randomized SVD (Halko, Martinsson, and Tropp 2011) and PROPACK (Larsen 2004), respectively, both with complexity O(mnk). Overall, RIST has a computational complexity of O(mnk). This is a significant reduction since the estimated rank k is typically very small compared to the problem size m × n. Moreover, the complexity reduces to O((m + n)k) when the matrix is sufficiently sparse.

### Convergence Analysis

Since the optimum for E is either zero in matrix completion or given in closed form in RPCA, this section considers the convex function f with respect to X only, written f(X).

Lemma 1 (Absil, Mahony, and Sepulchre 2009). The sequence $\{X_t\}$ generated by Algorithm 2 converges to a stationary point $X^*$ with $\mathrm{grad}\, f(X^*) = 0$.

Theorem 1. Suppose $\delta^* \in T_{X_j}\mathcal{M}_k$ is the unique solution of the subproblem (16) at the point $X_j$ and the first-order necessary condition $\mathrm{grad}\, f(X_j) + \mathrm{Hess}\, f(X_j)[\delta^*] = 0$ is satisfied. Assume that $\mathrm{Hess}\, f(X_j)$ is positive definite. Then Algorithm 3 has q-superlinear convergence.

Proof. Let γ be the unique minimizing geodesic satisfying $\gamma(0) = X_j$ and $\gamma(1) = X_{j+1}$ with $\gamma'(0) = \delta^*$, and let $P^{1\leftarrow 0}_\gamma$ denote the parallel translation along γ that maps vectors of $T_{X_j}\mathcal{M}_k$ to vectors of $T_{X_{j+1}}\mathcal{M}_k$. Then

$$P^{0\leftarrow 1}_\gamma\, \mathrm{grad}\, f(X_{j+1}) = \mathrm{grad}\, f(X_j) + \int_0^1 \frac{d}{d\tau}\big[P^{0\leftarrow \tau}_\gamma\, \mathrm{grad}\, f(\gamma(\tau))\big]\, d\tau = \mathrm{grad}\, f(X_j) + \int_0^1 P^{0\leftarrow \tau}_\gamma\, \mathrm{Hess}\, f(\gamma(\tau))[\gamma'(\tau)]\, d\tau$$

Since parallel translation is an isometry, $\|P^{0\leftarrow 1}_\gamma \mathrm{grad}\, f(X_{j+1})\| = \|\mathrm{grad}\, f(X_{j+1})\|$. Using the first-order condition $\mathrm{grad}\, f(X_j) + \mathrm{Hess}\, f(X_j)[\delta^*] = 0$, we obtain

$$\|\mathrm{grad}\, f(X_{j+1})\| \leq \int_0^1 \Big( \big\|P^{0\leftarrow \tau}_\gamma\, \mathrm{Hess}\, f(\gamma)\, P^{\tau\leftarrow 0}_\gamma - \mathrm{Hess}\, f(\gamma)\big\| + \|\mathrm{Hess}\, f(\gamma)\|\,\big\|P^{\tau\leftarrow 0}_\gamma - I\big\| + \big\|\mathrm{Hess}\, f(\gamma) - \mathrm{Hess}\, f(X_j)\big\| \Big)\, \|\delta^*\|\, d\tau \leq c_0\, \|\delta^*\|^2\, (\|X\|^3 + c_1)$$

where γ is shorthand for γ(τ). Since $\delta^*$ satisfies the first-order necessary condition, it can be written as

$$\delta^* = -U_0\big(U_0^T[\mathrm{Hess}\, f(X_j)]U_0\big)^{-1} U_0^T\, \mathrm{grad}\, f(X_j)$$

where $U_0$ is an arbitrary orthonormal basis of $T_X\mathcal{S}$ (Sun, Qu, and Wright 2015). Therefore

$$\|\mathrm{grad}\, f(X_{j+1})\| \leq o(\|X\|^3)\, \|\mathrm{grad}\, f(X_j)\|^2 \tag{19}$$

Applying Taylor's theorem to $f(X_j)$ yields

$$\mathrm{grad}\, f(X_j) = \tfrac{1}{2}\mathrm{Hess}\, f(X^*)(X_j - X^*)^2 + o(\|X_j - X^*\|^2)$$

Let $\lambda_{\min}$ and $\lambda_{\max}$ be the smallest and largest eigenvalues of $\mathrm{Hess}\, f(X^*)$; then there exist $c_2 < \lambda_{\min}$ and $c_3 > \lambda_{\max}$ such that

$$c_2\,\|X^* - X_j\| \leq \|\mathrm{grad}\, f(X_j)\| \leq c_3\,\|X^* - X_j\| \tag{20}$$

Combining (19) and (20) gives

$$c_2\,\|X^* - X_{j+1}\| \leq \|\mathrm{grad}\, f(X_{j+1})\| \leq o(\|X\|^3)\, c_3^2\, \|X^* - X_j\|^2$$

Consequently, $\|X^* - X_{j+1}\| \leq q\, \|X^* - X_j\|$, where $q = o(\|X\|^3)\, c_3^2\, \|X_j - X^*\| / c_2$.

## Experiments

In this section, the tasks of Robust PCA and matrix completion are used to evaluate the proposed RIST algorithm. All comparison algorithms are implemented in Matlab and tested on a desktop computer with a 3.20 GHz CPU and 4.00 GB of memory.

### Experiments on Matrix Completion

We choose five state-of-the-art manifold-related completion algorithms for comparison: RTRMC (Boumal and Absil 2011), qGeom (Mishra, Apuroop, and Sepulchre 2012), RP (Tan et al. 2014), LRGeom (Vandereycken 2013) and ScGrass (Ngo and Saad 2012).

**Synthetic Experiments.** The low-rank matrix $X = LR^T \in \mathbb{R}^{m\times m}$ of rank k is generated with every entry of L and R drawn independently from the standard Gaussian distribution. We define the test set Ω by sampling k(2m − k)ρ entries from X uniformly, so |Ω| = k(2m − k)ρ. Here m = 10000 and ρ is the oversampling ratio that determines the number of observed entries. The parameters ρ and η of RIST are set to 1.5 and 0.04, respectively; we use the default parameter values for the comparison methods. The Mean Squared Error (MSE) is used as the comparison metric for the synthetic experiments: $\mathrm{MSE} = \|P_\Omega(\hat{X}) - P_\Omega(X)\|_F^2 / |\Omega|$.
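For concreteness, a small NumPy sketch of the synthetic problem generator and the MSE metric described above is given below. It forms X densely, so it is only practical for moderate m (the paper uses m = 10000); the function names and the smaller default size are illustrative assumptions.

```python
import numpy as np

def make_completion_problem(m=1000, k=15, rho=1.5, rng=None):
    """Synthetic instance as described above: X = L @ R.T with i.i.d.
    standard Gaussian factors, and a sampled set Omega of k*(2m - k)*rho
    uniformly chosen entries (indices returned as row/column arrays)."""
    rng = np.random.default_rng(rng)
    L, R = rng.standard_normal((m, k)), rng.standard_normal((m, k))
    X = L @ R.T                                   # dense, for illustration only
    n_obs = int(k * (2 * m - k) * rho)
    idx = rng.choice(m * m, size=n_obs, replace=False)
    rows, cols = np.unravel_index(idx, (m, m))
    return X, rows, cols

def mse(X_hat, X, rows, cols):
    """MSE over the sampled entries: ||P_Omega(X_hat) - P_Omega(X)||_F^2 / |Omega|."""
    diff = X_hat[rows, cols] - X[rows, cols]
    return float(np.dot(diff, diff)) / len(rows)
```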
To compare the convergence of RIST with the baseline methods, we set the rank to k = 15, 25 and 35, respectively. Fig. 1(a) first illustrates the numerical behavior of Algorithm 2 with respect to rank estimation. The ranks estimated by RIST are 29, 103 and 148, respectively, which are close to the ground-truth rank settings (30, 100 and 150). More importantly, RIST takes fewer than six iterations to estimate the ground-truth rank accurately. Consequently, RIST can accurately estimate the ground-truth rank within a small number of iterations. In addition, performance evaluations in terms of MSE versus run time are shown in Fig. 1(b), Fig. 1(c) and Fig. 1(d). RTRMC, qGeom, LRGeom and ScGrass optimize over a fixed-rank manifold and therefore need the rank to be specified in advance; we set the rank parameter of these comparison methods to the ground truth, namely 15, 25 and 35. RTRMC converges the slowest, while RP is faster than RTRMC, ScGrass, qGeom and LRGeom. The RIST algorithm converges faster than all the other methods.

Figure 1: Performance of matrix completion methods on synthetic experiments: (a) shows the number of iterations required for estimating the rank; (b), (c) and (d) show MSE versus time for rank k = 15, 25 and 35.

Figure 2: Performance of comparison methods on Jester-all and Movie-10M.

**Collaborative Filtering.** We then test the proposed RIST algorithm on a real-world collaborative filtering task. Two collaborative filtering datasets are used: the Jester-all dataset (Goldberg et al. 2001) and the Movie-10M dataset (Herlocker et al. 1999). Jester-all contains 4.1 × 10^6 ratings for 100 jokes from 73,421 users, while Movie-10M contains 10^7 ratings given by 69,878 users on 10,677 movies. The Normalized Mean Absolute Error (NMAE) is used as the comparison metric:

$$\mathrm{NMAE} = \frac{\sum_{(i,j)\in\Omega} |\hat{X}_{ij} - X_{ij}|}{(r_{\max} - r_{\min})\,|\Omega|}$$

where $r_{\max}$ and $r_{\min}$ are the upper and lower bounds of the ratings. 50% of the ratings are randomly selected as the test set Ω, whose size is |Ω|. The ranks of Jester-all and Movie-10M estimated by the proposed RIST algorithm are 19 and 16, respectively. For a fair comparison, the same rank settings are used for ScGrass, LRGeom, qGeom, RTRMC and RP. The parameter η of RIST is 0.05; we use the default parameter values for the other methods.

The averaged results of 20 independent runs are reported in Fig. 2. RIST achieves the best NMAE values with the least computational time compared with the other methods. Note that qGeom, RP and ScGrass achieve NMAE values comparable to RIST on Jester-all and Movie-10M, but they consume more time to obtain such results. These empirical results indicate that RIST is more efficient than the competing algorithms while achieving better or comparable NMAE for collaborative filtering.

### Experiments on Robust PCA

For a systematic evaluation, we compare the proposed RIST algorithm with three popular algorithms for the Robust PCA problem: EALM (Lin, Chen, and Ma 2010), IALM (Lin, Chen, and Ma 2010) and LSADM (Goldfarb, Ma, and Scheinberg 2013).

Figure 3: Performance of Robust PCA methods on synthetic experiments: (a) rank k = 0.06m, (b) rank k = 0.07m, (c) rank k = 0.08m, (d) rank k = 0.1m.

**Synthetic Experiments.** Each synthetic dataset is the sum of a low-rank matrix X and a sparse matrix E. For the sparse matrix E, the percentage of nonzero entries is set to 10%, and these nonzero entries are generated from a uniform distribution over (−100, 100).
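A matching sketch for the Robust PCA synthetic data just described is shown below. The paper does not state how the low-rank component is drawn, so the Gaussian-factor construction, the default sizes and the function name are assumptions made for illustration only.

```python
import numpy as np

def make_rpca_problem(m=300, rank_frac=0.06, sparsity=0.10, rng=None):
    """Synthetic Robust PCA instance: Z = X + E with a rank-k component X
    (k = rank_frac * m, low-rank factors assumed Gaussian) and a sparse E
    whose nonzero entries are uniform on (-100, 100)."""
    rng = np.random.default_rng(rng)
    k = max(1, int(rank_frac * m))
    X = rng.standard_normal((m, k)) @ rng.standard_normal((k, m))  # low-rank part
    E = np.zeros((m, m))
    mask = rng.random((m, m)) < sparsity                           # ~10% nonzeros
    E[mask] = rng.uniform(-100, 100, size=mask.sum())
    return X + E, X, E
```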
The Relative Recovery Error (RRE) and running time are used to measure matrix recovery accuracy and efficiency: $\mathrm{RRE} = \|\hat{X} - X\|_F / \|X\|_F$. Performance in terms of RRE and computational time is shown in Figure 3, where the matrix size is 300 × 300 and the rank is given as a percentage of the matrix dimension. RIST with different η generally converges faster than the other three methods, while LSADM performs the worst under all three rank settings. Additionally, for the larger matrices of size 500 × 500 and 1000 × 1000 in Table 1, RIST outperforms the other three methods with fast convergence and comparable accuracy for both η = 0.04 and η = 0.05. IALM improves the convergence speed of EALM but achieves worse RRE, since the partial SVD in IALM usually produces approximate solutions. In contrast, LSADM consumes more running time to achieve RRE values comparable to the other methods.

Table 1: Performance evaluation on synthetic data in terms of RRE and time.

| size | rank | EALM time | EALM RRE | IALM time | IALM RRE | LSADM time | LSADM RRE | RIST (η=0.05) time | RIST (η=0.05) RRE | RIST (η=0.04) time | RIST (η=0.04) RRE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 500 × 500 | 1% | 11.0 | 3.4e-7 | 7.1 | 8.9e-8 | 49.7 | 8.4e-7 | 5.1 | 6.6e-8 | 5.0 | 6.4e-8 |
| 500 × 500 | 2% | 13.5 | 1.4e-7 | 8.7 | 5.6e-8 | 72.9 | 7.6e-7 | 8.2 | 4.4e-8 | 8.3 | 4.3e-8 |
| 500 × 500 | 5% | 13.8 | 5.1e-7 | 10.5 | 4.6e-6 | 77.6 | 2.7e-7 | 9.4 | 1.9e-7 | 9.8 | 1.7e-7 |
| 1000 × 1000 | 1% | 51.3 | 6.2e-7 | 19.5 | 9.1e-7 | 87.8 | 8.7e-7 | 17.0 | 7.4e-7 | 16.8 | 7.3e-7 |
| 1000 × 1000 | 2% | 51.8 | 1.0e-6 | 28.0 | 9.2e-7 | 92.9 | 5.5e-6 | 27.1 | 3.7e-7 | 26.8 | 3.5e-7 |
| 1000 × 1000 | 5% | 62.1 | 9.8e-7 | 30.6 | 3.3e-7 | 141.4 | 6.9e-7 | 30.3 | 2.8e-8 | 29.1 | 2.7e-8 |

Figure 4: Subway Station and Highway video frames and their separated backgrounds and foregrounds.

**Background Modeling.** Background modeling is an active application of the RPCA problem, where the background sequence is modeled by a low-rank component X changing over time and the moving foreground objects E constitute correlated sparse noise. Two surveillance videos, the Subway Station and Highway sequences, are used. Specifically, 195 frames with resolution 160 × 130 are extracted from the Subway Station sequence and 90 frames with resolution 320 × 240 are extracted from the Highway sequence. From Fig. 4 it is evident that IALM, EALM and LSADM fail to completely remove the moving shadows marked by the red circles. In particular, LSADM is affected by the variation of the shadows and leaves some white regions behind the moving objects. In contrast, RIST is robust to the shadow changes and provides cleaner backgrounds without any remaining shadows. Moreover, the moving objects of the Subway Station images separated by EALM and LSADM still contain some background shadow besides the human flows and moving escalators, which shows that EALM and LSADM perform worse than IALM and RIST. RIST is superior to IALM in that it more accurately captures the moving people and separates the backgrounds.

## Conclusion

Though Riemannian optimization based matrix recovery methods have shown superior scalability over convex relaxation methods, they are often degraded by inappropriate rank estimation in practice. Moreover, their convergence analysis is unreliable because it neglects the manifold boundary. To alleviate these issues, this paper proposes the RIST algorithm, which adopts Riemannian optimization to recover the matrix over the low-rank algebraic variety. RIST provides an effective rank estimation strategy and utilizes second-order geometric characterization, achieving a superlinear convergence rate. Both numerical experiments and theoretical analysis validate the effectiveness of RIST.

## References

Absil, P.-A.; Mahony, R.; and Sepulchre, R. 2009. Optimization Algorithms on Matrix Manifolds. Princeton University Press.
Agarwal, A.; Negahban, S.; and Wainwright, M. J. 2010. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. In Advances in Neural Information Processing Systems, 37–45.

Boumal, N., and Absil, P.-A. 2011. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In Advances in Neural Information Processing Systems, 406–414.

Bouwmans, T., and Zahzah, E. H. 2014. Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Computer Vision and Image Understanding 122:22–34.

Candès, E. J., and Recht, B. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6):717–772.

Candès, E. J.; Li, X.; Ma, Y.; and Wright, J. 2011. Robust principal component analysis? Journal of the ACM (JACM) 58(3):11.

Goldberg, K.; Roeder, T.; Gupta, D.; and Perkins, C. 2001. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval 4(2):133–151.

Goldfarb, D.; Ma, S.; and Scheinberg, K. 2013. Fast alternating linearization methods for minimizing the sum of two convex functions. Mathematical Programming 141(1-2):349–382.

Halko, N.; Martinsson, P.-G.; and Tropp, J. A. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2):217–288.

Hazan, E. 2008. Sparse approximate solutions to semidefinite programs. In LATIN 2008: Theoretical Informatics, 306–316. Springer.

Herlocker, J. L.; Konstan, J. A.; Borchers, A.; and Riedl, J. 1999. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 230–237. ACM.

Jaggi, M.; Sulovský, M.; et al. 2010. A simple algorithm for nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 471–478.

Jaggi, M. 2013. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 427–435.

Larsen, R. M. 2004. PROPACK: Software for large and sparse SVD calculations. Available online: http://sun.stanford.edu/rmunk/PROPACK.

Laue, S. 2012. A hybrid algorithm for convex semidefinite optimization. arXiv preprint arXiv:1206.4608.

Li, Q.; Niu, W.; Li, G.; Cao, Y.; Tan, J.; and Guo, L. 2015. Lingo: Linearized Grassmannian optimization for nuclear norm minimization. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 801–809. ACM.

Lin, Z.; Chen, M.; and Ma, Y. 2010. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055.

Liu, R.; Lin, Z.; Wei, S.; and Su, Z. 2011. Solving principal component pursuit in linear time via l1 filtering. arXiv preprint arXiv:1108.5359.

Mishra, B.; Apuroop, K. A.; and Sepulchre, R. 2012. A Riemannian geometry for low-rank matrix completion. arXiv preprint arXiv:1211.1550.

Ngo, T., and Saad, Y. 2012. Scaled gradients on Grassmann manifolds for matrix completion. In Advances in Neural Information Processing Systems, 1412–1420.

Papamakarios, G.; Panagakis, Y.; and Zafeiriou, S. 2011. Generalised scalable robust principal component analysis.

Recht, B. 2011. A simpler approach to matrix completion. Journal of Machine Learning Research 12:3413–3430.
Schneider, R., and Uschmajew, A. 2015. Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality. SIAM Journal on Optimization 25(1):622–646.

Sun, J.; Qu, Q.; and Wright, J. 2015. Complete dictionary recovery over the sphere. arXiv preprint arXiv:1504.06785.

Tan, M.; Tsang, I. W.; Wang, L.; Vandereycken, B.; and Pan, S. J. 2014. Riemannian pursuit for big matrix recovery. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 1539–1547.

Vandereycken, B. 2013. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization 23(2):1214–1236.

Wang, Z.; Lai, M.-J.; Lu, Z.; Fan, W.; Davulcu, H.; and Ye, J. 2014. Rank-one matrix pursuit for matrix completion. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 91–99.

Yan, Y.; Mingkui, T.; Ivor, T.; Yi, Y.; Chengqi, Z.; and Qinfeng, S. 2015. Scalable maximum margin matrix factorization by active Riemannian subspace search. In International Joint Conference on Artificial Intelligence, 3989–3994.