# Two-Dimensional PCA with F-Norm Minimization

Qianqian Wang, State Key Laboratory of ISN, Xidian University, Xi'an, China
Quanxue Gao, State Key Laboratory of ISN, Xidian University, Xi'an, China

## Abstract

Two-dimensional principal component analysis (2DPCA) has been widely used for face image representation and recognition, but it is sensitive to the presence of outliers. To alleviate this problem, we propose a novel robust 2DPCA, namely 2DPCA with F-norm minimization (F-2DPCA), which is intuitive and directly derived from 2DPCA. In F-2DPCA, distance in the spatial dimensions (attribute dimensions) is measured by the F-norm, while the summation over different data points uses the 1-norm. Thus it is robust to outliers and rotationally invariant as well. To solve F-2DPCA, we propose a fast iterative algorithm, which has a closed-form solution in each iteration, and prove its convergence. Experimental results on face image databases illustrate its effectiveness and advantages.

## Introduction

Principal component analysis (PCA) (Turk and Pentland 1991), linear discriminant analysis (LDA) (Belhumeur, Hespanha, and Kriegman 1997), locality preserving projection (LPP) (He and Niyogi 2005) and neighborhood preserving embedding (NPE) (He et al. 2005) are four of the most representative dimensionality reduction methods. PCA is used to extract the most expressive features, while LDA is considered capable of extracting the most discriminating features. Different from PCA and LDA, which characterize the global geometric structure of data, LPP and NPE preserve its local geometric structure. To apply the aforementioned methods to image recognition, we need to transform each image, which is represented as a matrix, into a 1D vector by concatenating all its rows. Consequently, these methods cannot well exploit the spatial structure information that is embedded in the pixels of an image and is important for image representation and recognition (Yang et al. 2004; Zhang et al.
2015; Lu, Plataniotis, and Venetsanopoulos 2008). To handle this problem, many two-dimensional subspace learning methods and tensor methods have been developed (Yang et al. 2004; Lu, Plataniotis, and Venetsanopoulos 2008; Yang et al. 2005). In contrast to the aforementioned methods, two-dimensional subspace learning methods directly extract features from the image matrix and fully consider the variation among different rows/columns of an image. The representative two-dimensional methods include two-dimensional PCA (2DPCA) (Yang et al. 2004) and two-dimensional LDA (2DLDA) (Yang et al. 2005). Although the motivations of these two-dimensional methods differ, they can be unified within the graph embedding framework (Yan et al. 2005), and they all measure the similarity between images by the squared F-norm. It is commonly known that the squared F-norm is not robust, in the sense that outlying measurements can arbitrarily skew the solution away from the desired solution. Thus, these methods are not robust in the presence of outliers (Ke and Kanade 2005; Collins, Dasgupta, and Schapire 2001; Gao et al. 2013).

Copyright (c) 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Recently, ℓ1-norm based subspace learning techniques have been considered capable of obtaining robust projection vectors, and they have become an active topic in dimensionality reduction. For example, Ke and Kanade (2005) proposed L1-PCA, which uses the ℓ1-norm to measure the reconstruction error. Kwak (2008) used the ℓ1-norm to measure the variance and proposed PCA-L1 with a greedy algorithm. Nie et al. (2011) proposed a non-greedy iterative algorithm to solve PCA-L1. Motivated by ℓ1-norm based PCA, some ℓ1-norm based LDA algorithms have been developed, such as LDA-L1 (Zhong and Zhang 2013) and ILDA-L1 (Chen, Yang, and Jin 2014). However, the ℓ1-norm is not rotationally invariant (Ding et al. 2006), and rotational invariance is a fundamental property of Euclidean space with the ℓ2-norm.
The importance of rotational invariance has been emphasized in the context of learning algorithms (Kwak 2014). Based on this observation, Ding et al. (2006) proposed a rotationally invariant ℓ1-norm for feature extraction and developed R1-PCA, which measures the similarity among data by the R1-norm, i.e., the ℓ2,1-norm of a matrix. To further analyze the robustness of subspace learning techniques, Kwak et al. extended the ℓ1-norm to the ℓp-norm and proposed ℓp-norm based subspace learning methods (Kwak 2014; Oh and Kwak 2016).

Although the aforementioned methods are robust to outliers, they need to transform a 2D image into a vector by concatenating all rows of the image. So, these methods cannot well exploit the spatial structure information of data. To handle this problem, Li et al. (2010) extended PCA-L1 to 2DPCA-L1 with a greedy algorithm. Wang and Wang (2013) imposed a sparsity constraint on 2DPCA-L1 and proposed 2DPCAL1-S. Pang et al. (2010) proposed ℓ1-norm based tensor subspace learning. Wang et al. (2015) proposed 2DPCA-L1 with a non-greedy algorithm. However, these methods do not have rotational invariance and do not explicitly consider the reconstruction error, which is the real goal of PCA.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

To handle these problems, we propose a robust 2DPCA with F-norm minimization, namely F-2DPCA, for feature extraction. In F-2DPCA, distance in the spatial dimensions (attribute dimensions) is measured by the F-norm, while the summation over different data points uses the ℓ1-norm. Furthermore, we solve F-2DPCA by a non-greedy iterative algorithm, which has a closed-form solution in each iteration. Finally, we prove the convergence of the proposed algorithm. Compared with ℓ1-norm based 2DPCA methods, our approach has the following advantages.
First, F-2DPCA is not only robust to outliers but also rotationally invariant, a property that has been emphasized in the context of learning algorithms. Second, our proposed non-greedy algorithm converges to a local solution and minimizes the objective function value well. Third, our approach (i.e., its solution) relates to the image covariance matrix.

## 2DPCA and 2DPCA-L1

Denote by $A_i \in \mathbb{R}^{m \times n}$ $(i = 1, 2, \ldots, N)$ the $N$ training images, and by $V = [v_1, v_2, \ldots, v_k] \in \mathbb{R}^{n \times k}$ the projection matrix. Without loss of generality, we assume the data set is centralized, i.e., $\sum_{i=1}^{N} A_i = 0$. 2DPCA seeks a projection matrix by (Yang et al. 2004):

$$\max_{V^T V = I_k} \sum_{i=1}^{N} \|A_i V\|_F^2 \qquad (1)$$

where $\mathrm{tr}(\cdot)$ is the trace operator of a matrix, $I_k \in \mathbb{R}^{k \times k}$ is an identity matrix, and $\|\cdot\|_F^2$ denotes the squared F-norm. The objective function (1) is equivalent to the objective function (2), due to the fact that $\sum_{i=1}^{N} \|A_i - A_i V V^T\|_F^2 + \sum_{i=1}^{N} \|A_i V\|_F^2 = \sum_{i=1}^{N} \|A_i\|_F^2$:

$$\min_{V^T V = I_k} \sum_{i=1}^{N} \|A_i - A_i V V^T\|_F^2 \qquad (2)$$

The solution of the objective function (1) or (2) is composed of the eigenvectors of the image covariance matrix $S_t = \sum_{i=1}^{N} A_i^T A_i$ corresponding to the $k$ largest eigenvalues. We can see that squared large distances will remarkably dominate the solution of the objective function (1) or (2). Thus, the objective function (1) or (2) is not robust, in the sense that outlying measurements can skew the solution away from the desired solution.

To handle this problem, 2DPCA-L1 was proposed (Li, Pang, and Yuan 2010; Wang et al. 2015). It seeks the projection matrix by solving the following objective function:

$$\max_{V^T V = I_k} \sum_{i=1}^{N} \|A_i V\|_{L1} \qquad (3)$$

where $\|\cdot\|_{L1}$ denotes the ℓ1-norm of a matrix, defined as $\|D\|_{L1} = \sum_{i=1}^{m} \sum_{j=1}^{n} |D(i, j)|$, with $D(i, j)$ the element in the $i$-th row and $j$-th column of matrix $D$. Compared with traditional 2DPCA, ℓ1-norm based 2DPCA is robust, but it has several shortcomings. First, traditional 2DPCA has rotational invariance, while ℓ1-norm based 2DPCA does not have this property.
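For concreteness, the closed-form 2DPCA solution described above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' code; the function name `twodpca` is chosen here):

```python
import numpy as np

def twodpca(images, k):
    """Classical 2DPCA (objective (1)/(2)): the projection matrix V consists
    of the eigenvectors of the image covariance matrix S_t = sum_i A_i^T A_i
    corresponding to the k largest eigenvalues, on centralized images."""
    A = np.asarray(images, dtype=float)
    A = A - A.mean(axis=0)               # centralize: sum_i A_i = 0
    St = np.einsum('ima,imb->ab', A, A)  # S_t = sum_i A_i^T A_i, an n x n matrix
    _, U = np.linalg.eigh(St)            # eigenvalues in ascending order
    return U[:, ::-1][:, :k]             # eigenvectors of the k largest eigenvalues

# toy usage: 10 random 5x4 "images", projected onto a 2-dimensional subspace
rng = np.random.default_rng(0)
imgs = rng.normal(size=(10, 5, 4))
V = twodpca(imgs, 2)                     # V is 4 x 2 with V^T V = I_2
features = imgs @ V                      # per-image features A_i V, each 5 x 2
```

Note that the projected variance $\sum_i \|A_i V\|_F^2 = \mathrm{tr}(V^T S_t V)$ equals the sum of the $k$ largest eigenvalues of $S_t$, which is why the eigen-decomposition gives the maximizer of (1).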
Given an arbitrary rotation matrix $\Gamma$ ($\Gamma \Gamma^T = I$), in general we have $\|\Gamma A_i V\|_{L1} \neq \|A_i V\|_{L1}$. Moreover, it is not clear whether ℓ1-norm based PCA (i.e., its solution) relates to the covariance matrix. Finally, the objective function (3) does not explicitly consider the reconstruction error, which is the real goal of PCA, due to the fact that $\sum_{i=1}^{N} \|A_i - A_i V V^T\|_{L1} + \sum_{i=1}^{N} \|A_i V\|_{L1} \neq \sum_{i=1}^{N} \|A_i\|_{L1}$. To handle these problems, we propose a robust 2DPCA with F-norm minimization in the following section.

## 2DPCA with F-Norm Minimization

### Motivation and objective function

2DPCA uses the squared F-norm to measure the similarity among images in its objective function. It is well known that the squared F-norm is not robust, in the sense that outlying measurements can arbitrarily skew the solution away from the desired solution. This explains the sensitivity of 2DPCA. To handle this problem, the distance metric in the criterion function (2) should reduce the effect of large distances. Moreover, we hope to obtain a robust low-dimensional subspace whose solution is invariant under orthogonal transformations of the data. Compared with the squared F-norm, the F-norm not only weakens the effect of large distances but also has rotational invariance. Thus, an intuitive and reasonable way is to replace the squared F-norm by the F-norm, i.e.,

$$\|A_i - A_i V V^T\|_F^2 \;\rightarrow\; \|A_i - A_i V V^T\|_F \qquad (4)$$

Substituting Eq. (4) into the objective function (2), we have

$$\arg\min_{V^T V = I_k} \sum_{i=1}^{N} \|A_i - A_i V V^T\|_F \qquad (5)$$

The objective function (5) is called 2DPCA with F-norm minimization (F-2DPCA). In (5), distance in the spatial dimensions (attribute dimensions) is measured by the F-norm, while the summation over different data points uses the ℓ1-norm. Compared with 2DPCA, our proposed method further weakens the effect of large distances; compared with 2DPCA-L1, it has rotational invariance, due to the fact that $\|\Gamma A_i V\|_F = \|A_i V\|_F$. Now we consider how to solve the objective function (5).
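The rotational-invariance claim above is easy to check numerically (a toy sketch with random stand-in matrices, not data from the paper): the F-norm is unchanged when an orthogonal matrix acts on the rows, while the elementwise ℓ1-norm used by 2DPCA-L1 generally is not.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))                       # one toy "image"
V, _ = np.linalg.qr(rng.normal(size=(4, 2)))      # orthonormal projection, V^T V = I
Gamma, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # random orthogonal transform

fro = np.linalg.norm(A @ V, 'fro')
fro_rot = np.linalg.norm(Gamma @ A @ V, 'fro')    # F-norm: unchanged
l1 = np.abs(A @ V).sum()
l1_rot = np.abs(Gamma @ A @ V).sum()              # L1-norm: generally changes

print(fro - fro_rot)   # ~0
print(l1 - l1_rot)     # generally nonzero
```

This is exactly the distinction the text draws: $\|\Gamma A_i V\|_F = \|A_i V\|_F$ for any orthogonal $\Gamma$, whereas $\|\Gamma A_i V\|_{L1} \neq \|A_i V\|_{L1}$ in general.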
By simple algebra, we have

$$\|A_i - A_i V V^T\|_F = \frac{\|A_i - A_i V V^T\|_F^2}{\|A_i - A_i V V^T\|_F} = \frac{\mathrm{tr}(A_i^T A_i) - \mathrm{tr}(V^T A_i^T A_i V)}{\|A_i - A_i V V^T\|_F} \qquad (6)$$

Substituting Eq. (6) into Eq. (5), and by simple algebra, the objective function (5) becomes

$$\arg\min_{V^T V = I_k} \sum_{i=1}^{N} \left( \mathrm{tr}(A_i^T A_i) - \mathrm{tr}(V^T A_i^T A_i V) \right) d_i \qquad (7)$$

where $d_i = 1 / \|A_i - A_i V V^T\|_F$. In order to avoid division by zero, $d_i$ is defined as

$$d_i = \frac{1}{\|A_i - A_i V V^T\|_F + \gamma} \qquad (8)$$

where $\gamma > 0$ is a small constant.

The objective function (7) involves two unknown variables: $V$ and the $d_i$, which themselves depend on $V$. Thus (7) has no closed-form solution and is difficult to solve directly. An algorithm can be developed that alternately updates $V$ (while fixing the $d_i$) and the $d_i$ (while fixing $V$). To be specific, in the $(t+1)$-th iteration, when $d_i^{(t)}$ is known, we can update $V$ by minimizing the objective function (7). In this case, the first term in (7) becomes constant, and (7) is converted to the following objective function:

$$\arg\max_{V^T V = I_k} \mathrm{tr}(V^T H V) \qquad (9)$$

where $H = \sum_{i=1}^{N} d_i A_i^T A_i$ is the weighted image covariance matrix. According to matrix theory, the columns of the optimal projection matrix $V$ of Eq. (9) are the eigenvectors of $H$ corresponding to the $k$ largest eigenvalues. After that, we can update $d_i$ by Eq. (8). This iterative procedure is repeated until convergence, which is proved in the subsequent subsection. Eq. (9) illustrates that the solution of our proposed method relates to the weighted image covariance matrix. We summarize the pseudo-code for solving the objective function (5), i.e., F-2DPCA, in Algorithm 1.

Algorithm 1: F-2DPCA
Input: $A_i \in \mathbb{R}^{m \times n}$ $(i = 1, \ldots, N)$, where the data are centralized, $k$, and $\gamma = 0.00001$.
Initialize $V^{(t)} \in \mathbb{R}^{n \times k}$ satisfying $V^T V = I$, with $t = 1$.
while not converged do
1. For all training samples, calculate $d_i^{(t)}$ $(i = 1, \ldots, N)$ by Eq. (8).
2. Calculate $H^{(t)}$ according to Eq. (9), i.e., $H^{(t)} = \sum_{i=1}^{N} d_i^{(t)} A_i^T A_i$.
3.
Solve $V^{(t+1)} = \arg\max_{V^T V = I_k} \mathrm{tr}(V^T H^{(t)} V)$: the columns of the optimal solution $V^{(t+1)}$ are the eigenvectors of $H^{(t)}$ corresponding to the $k$ largest eigenvalues.
4. Update $t \leftarrow t + 1$.
end while
Output: $V^{(t+1)} \in \mathbb{R}^{n \times k}$

### Convergence analysis

Theorem 1: In each iteration of Algorithm 1, we have

$$\sum_{i=1}^{N} \|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F \le \sum_{i=1}^{N} \|A_i - A_i V^{(t)} (V^{(t)})^T\|_F \qquad (10)$$

i.e., Algorithm 1 monotonically decreases the objective function value of F-2DPCA.

Proof: For each iteration $t$, according to step 3 in Algorithm 1, we have the inequality

$$\sum_{i=1}^{N} \frac{\mathrm{tr}((V^{(t+1)})^T A_i^T A_i V^{(t+1)})}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \ge \sum_{i=1}^{N} \frac{\mathrm{tr}((V^{(t)})^T A_i^T A_i V^{(t)})}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (11)$$

Multiplying both sides by $-1$ and adding $\sum_{i=1}^{N} \frac{\mathrm{tr}(A_i^T A_i)}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F}$ to both sides, Eq. (11) becomes

$$\sum_{i=1}^{N} \frac{\mathrm{tr}(A_i^T A_i) - \mathrm{tr}((V^{(t+1)})^T A_i^T A_i V^{(t+1)})}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \le \sum_{i=1}^{N} \frac{\mathrm{tr}(A_i^T A_i) - \mathrm{tr}((V^{(t)})^T A_i^T A_i V^{(t)})}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (12)$$

According to $\|A_i - A_i V V^T\|_F^2 = \mathrm{tr}(A_i^T A_i) - \mathrm{tr}(V^T A_i^T A_i V)$, Eq. (12) becomes

$$\sum_{i=1}^{N} \frac{\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F^2}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \le \sum_{i=1}^{N} \frac{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F^2}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (13)$$

According to the inequality $a^2 + b^2 \ge 2ab$, i.e., $2a - b \le \frac{a^2}{b}$ for $b > 0$, we have

$$2\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F - \|A_i - A_i V^{(t)} (V^{(t)})^T\|_F \le \frac{\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F^2}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (14)$$

Eq. (14) holds for each index $i$; summing over $i$ gives

$$\sum_{i=1}^{N} \left( 2\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F - \|A_i - A_i V^{(t)} (V^{(t)})^T\|_F \right) \le \sum_{i=1}^{N} \frac{\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F^2}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (15)$$

Combining Eq. (13) and Eq. (15) yields

$$\sum_{i=1}^{N} \left( 2\|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F - \|A_i - A_i V^{(t)} (V^{(t)})^T\|_F \right) \le \sum_{i=1}^{N} \frac{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F^2}{\|A_i - A_i V^{(t)} (V^{(t)})^T\|_F} \qquad (16)$$

By simple algebra, Eq. (16) becomes

$$\sum_{i=1}^{N} \|A_i - A_i V^{(t+1)} (V^{(t+1)})^T\|_F \le \sum_{i=1}^{N} \|A_i - A_i V^{(t)} (V^{(t)})^T\|_F \qquad (17)$$

Eq. (17) shows that Algorithm 1 monotonically decreases the objective function value of F-2DPCA in each iteration.

Theorem 2: Algorithm 1 converges to a local solution of the objective function (5).

Proof: The Lagrangian function of the objective function (5) is

$$L(V) = \sum_{i=1}^{N} \|A_i - A_i V V^T\|_F - \mathrm{tr}\left( \Lambda^T (V^T V - I) \right) \qquad (18)$$

where the Lagrangian multipliers $\Lambda = (\Lambda_{pq})$ enforce the orthonormality constraints $V^T V = I$.
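The alternating procedure of Algorithm 1 admits a compact NumPy sketch (a minimal illustration, not the authors' implementation; `max_iter` and `tol` are stopping heuristics added here, and the data are centralized inside the function for convenience):

```python
import numpy as np

def f_2dpca(images, k, gamma=1e-5, max_iter=50, tol=1e-8):
    """Sketch of Algorithm 1 (F-2DPCA): alternately update the weights
    d_i = 1 / (||A_i - A_i V V^T||_F + gamma) from Eq. (8) and the
    projection V, taken as the top-k eigenvectors of the weighted image
    covariance matrix H = sum_i d_i A_i^T A_i from Eq. (9)."""
    A = np.asarray(images, dtype=float)
    A = A - A.mean(axis=0)                      # centralize the data
    n = A.shape[2]
    V = np.eye(n, k)                            # any V with V^T V = I to start
    prev_obj = np.inf
    for _ in range(max_iter):
        R = A - A @ V @ V.T                     # reconstruction residuals
        res = np.linalg.norm(R, axis=(1, 2))    # ||A_i - A_i V V^T||_F per image
        obj = res.sum()                         # objective (5)
        if prev_obj - obj < tol:                # stop when the decrease stalls
            break
        prev_obj = obj
        d = 1.0 / (res + gamma)                 # Eq. (8)
        H = np.einsum('i,ima,imb->ab', d, A, A) # H = sum_i d_i A_i^T A_i
        _, U = np.linalg.eigh(H)                # ascending eigenvalues
        V = U[:, ::-1][:, :k]                   # top-k eigenvectors of H
    return V

# toy usage on random data
rng = np.random.default_rng(0)
imgs = rng.normal(size=(12, 6, 5))
V = f_2dpca(imgs, 2)                            # 5 x 2, orthonormal columns
```

Each pass through the loop is exactly steps 1-3 of Algorithm 1; by Theorem 1 the objective value `obj` is non-increasing across iterations, which is what the stopping test exploits.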
The KKT condition for the optimal solution specifies that the gradient of $L$ must be zero, i.e.,

$$\sum_{i=1}^{N} \frac{1}{\|A_i - A_i V V^T\|_F} A_i^T A_i V - V \Lambda^T = 0 \qquad (19)$$

By simple algebra, we have

$$\sum_{i=1}^{N} \frac{1}{\|A_i - A_i V V^T\|_F} A_i^T A_i V = V \Lambda^T \qquad (20)$$

According to step 3 in Algorithm 1, we find the optimal solution of the objective function (9). Thus the converged solution of Algorithm 1 satisfies the KKT condition of the objective function (9). The Lagrangian function of Eq. (9) is

$$L_2(V) = \mathrm{tr}\left( V^T \sum_{i=1}^{N} d_i A_i^T A_i V \right) - \mathrm{tr}\left( \Lambda^T (V^T V - I) \right) \qquad (21)$$

Taking the derivative with respect to $V$ and setting it to zero, we get the KKT condition of Eq. (9):

$$\sum_{i=1}^{N} d_i A_i^T A_i V - V \Lambda^T = 0 \qquad (22)$$

Eq. (22) is formally similar to Eq. (20). The main difference is that in Algorithm 1, $d_i$ is known in each iteration. Suppose we obtain the optimal solution $V^*$ in the $(t+1)$-th iteration; then $V^{(t+1)} = V^* = V^{(t)}$. According to the definition of $d_i$, Eq. (22) is the same as Eq. (20) in this case. This means that the converged solution of Algorithm 1 satisfies the KKT condition of Eq. (5), i.e.,

$$\left. \frac{\partial L(V)}{\partial V} \right|_{V = V^*} = 0 \qquad (23)$$

Combining Theorem 1 and Eq. (23), we conclude that the converged solution of Algorithm 1 is a local solution of Eq. (5).

## Experimental results

We validate our approach on three face databases (Extended Yale B, AR and CMU PIE) and compare it with 2DPCA (Yang et al. 2004), 2DPCA-L1 (Li, Pang, and Yuan 2010), 2DPCA-L1 non-greedy (Wang et al. 2015), 2DPCAL1-S (Wang and Wang 2013) and N-2DPCA (Zhang et al. 2015). In our experiments, we use the 1-nearest-neighbor (1NN) classifier. We set the number of projection vectors to 25 on the Extended Yale B and CMU PIE databases and to 30 on the AR database.

The Extended Yale B database (Georghiades, Belhumeur, and Kriegman 2001) consists of 2414 frontal-face pictures of 38 individuals under different illuminations. There are 64 pictures for each person, except 60 for the 11th and 13th, 59 for the 12th, 62 for the 15th, and 63 for the 14th, 16th and 17th.
Figure 1(a) shows some samples of one person in the Extended Yale B database. In the experiments, each image was normalized to 32 × 32 pixels. For each individual, 14 images were randomly selected and corrupted by black and white dots at random locations; the ratio of noise pixels to the total number of image pixels lies between 0.05 and 0.15. We randomly select 32 images per person, including 7 noisy images, for training, and use the remaining images for testing. 2DPCA, 2DPCA-L1, 2DPCA-L1 non-greedy, 2DPCAL1-S, N-2DPCA and our approach are used to extract features. We repeat this process 10 times.

Figure 1: (a) Some samples of one person in the Extended Yale B database. (b) Some samples of one person in the CMU PIE database. (The second row shows noised samples.)

Figure 2: (a) Classification accuracy vs. the number of projection vectors. (b) The optimal reconstruction error of the six approaches over ten experiments on the Extended Yale B database.

Figure 3: (a) Classification accuracy vs. the number of projection vectors. (b) The optimal reconstruction error of the six approaches over ten experiments on the AR database.

In the AR database (Martinez 1998), the pictures of 120 individuals were taken in two sessions. Each session contains 13 color images, including 6 images with occlusions and 7 full facial images with different facial expressions and lighting conditions. We manually cropped the face portion of each image and then normalized it to 50 × 40 pixels.
In the experiments, we randomly select 13 images per person for training and use the remaining images for testing, and we repeat this process 10 times.

The CMU PIE database (Sim, Baker, and Bsat 2002) consists of 2856 frontal-face images of 68 individuals under different illuminations. In the experiments, each image was normalized to 32 × 32 pixels; we randomly selected 10 images per person and added the same kind of noise as in the Extended Yale B experiments. Figure 1(b) shows some samples of one person in the CMU PIE database. We randomly select 21 images per person, including 16 noise-free images, for training and use the remaining images for testing, and we repeat this process 10 times.

Tables 1 and 2 list the average recognition accuracy, running time and the corresponding standard deviations of each method on the Extended Yale B, AR and CMU PIE databases. Figures 2, 3 and 4 plot the classification accuracy versus the number of projection vectors and the reconstruction error of the six approaches over ten experiments on the Extended Yale B, AR and CMU PIE databases, respectively. Figure 5 shows the convergence curve of our method on the three databases.

Figure 4: (a) Classification accuracy vs. the number of projection vectors. (b) The optimal reconstruction error of the six approaches over ten experiments on the CMU PIE database.

From these results, we make the following observations. (1) 2DPCA is overall inferior to the other five approaches. The main reason is that 2DPCA is not robust to outliers such as illumination changes and occlusion. 2DPCA-L1, 2DPCA-L1 non-greedy and 2DPCAL1-S are not remarkably better than 2DPCA, probably because they do not explicitly consider the reconstruction error. N-2DPCA is not better either.
The reason is that in the classification stage we use the Euclidean distance to measure the similarity between data rather than the nuclear norm as in (Zhang et al. 2015).

Table 1: The average classification accuracy (%) and the corresponding standard deviation on the Extended Yale B, AR and CMU PIE databases.

| Methods | Extended Yale B | AR | CMU PIE |
| --- | --- | --- | --- |
| 2DPCA | 59.92 ± 0.42 | 80.40 ± 0.88 | 85.39 ± 0.73 |
| 2DPCA-L1 | 60.33 ± 0.38 | 80.39 ± 0.88 | 85.71 ± 0.77 |
| 2DPCA-L1 non-greedy | 66.63 ± 1.01 | 87.49 ± 1.47 | 86.34 ± 0.71 |
| N-2DPCA | 59.99 ± 0.57 | 80.37 ± 0.91 | 85.39 ± 0.73 |
| 2DPCAL1-S | 60.37 ± 0.54 | 80.38 ± 0.85 | 85.91 ± 0.69 |
| F-2DPCA | 67.35 ± 0.95 | 89.19 ± 0.70 | 91.60 ± 0.74 |

(2) F-2DPCA is superior to the other five approaches. Compared with 2DPCA, which uses the squared F-norm to measure similarity, the F-norm is robust to outliers. Compared with the other ℓ1-norm approaches, F-2DPCA is intuitive and directly derived from 2DPCA. Moreover, F-2DPCA retains 2DPCA's desirable properties: for example, it considers the reconstruction error, and its solution relates to the image covariance matrix. Another reason may be that F-2DPCA best optimizes the objective function value.

Table 2: The running time and the corresponding standard deviation on the Extended Yale B, AR and CMU PIE databases.

| Methods | Extended Yale B | AR | CMU PIE |
| --- | --- | --- | --- |
| 2DPCA | 0.01 ± 0.00 | 0.04 ± 0.00 | 0.01 ± 0.00 |
| 2DPCA-L1 | 7.07 ± 0.70 | 11.30 ± 1.37 | 9.20 ± 1.27 |
| 2DPCA-L1 non-greedy | 7.05 ± 0.19 | 14.71 ± 0.10 | 7.97 ± 0.09 |
| N-2DPCA | 12.69 ± 1.38 | 24.36 ± 5.16 | 15.41 ± 3.15 |
| 2DPCAL1-S | 6.66 ± 0.48 | 11.73 ± 0.79 | 8.33 ± 0.71 |
| F-2DPCA | 3.35 ± 0.08 | 7.20 ± 1.32 | 4.70 ± 0.21 |

Figure 5: Convergence curve of our method on the three databases.

Figure 5 and Table 2 illustrate that our proposed algorithm is fast and convergent, which is consistent with the theoretical analysis in the Convergence analysis section.

## Conclusions

We present a robust unsupervised dimensionality reduction method, namely F-2DPCA.
F-2DPCA uses the F-norm instead of the squared F-norm as the distance metric to measure the reconstruction error in the criterion function. Compared with the ℓ1-norm, the F-norm of a matrix not only has rotational invariance but also retains 2DPCA's other desirable properties. Moreover, our method explicitly takes the reconstruction error into account, while ℓ1-norm based 2DPCA techniques do not. To solve F-2DPCA, we present a fast iterative algorithm, which has a closed-form solution in each iteration and is provably convergent. Experimental results on several face image databases illustrate the effectiveness and advantages of the proposed method.

## Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant 61271296, the China Postdoctoral Science Foundation (Grant 2012M521747), the 111 Project of China (B08038), and the Fundamental Research Funds for the Central Universities of China under Grant BDY21.

## References

Belhumeur, P. N.; Hespanha, J. P.; and Kriegman, D. J. 1997. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7):711-720.

Chen, X.; Yang, J.; and Jin, Z. 2014. An improved linear discriminant analysis with L1-norm for robust feature extraction. In International Conference on Pattern Recognition, 1585-1590.

Collins, M.; Dasgupta, S.; and Schapire, R. E. 2001. A generalization of principal components analysis to the exponential family. In Proceedings of Advances in Neural Information Processing Systems, 617-624.

Ding, C.; Zhou, D.; He, X.; and Zha, H. 2006. R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization. In Proceedings of the 23rd International Conference on Machine Learning, 281-288. ACM.

Gao, Q.; Gao, F.; Zhang, H.; Hao, X.-J.; and Wang, X. 2013. Two-dimensional maximum local variation based on image Euclidean distance for face recognition.
IEEE Transactions on Image Processing 22(10):3807-3817.

Georghiades, A. S.; Belhumeur, P. N.; and Kriegman, D. J. 2001. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6):643-660.

He, X., and Niyogi, P. 2005. Locality preserving projections. In Proceedings of Advances in Neural Information Processing Systems, 186-197.

He, X.; Cai, D.; Yan, S.; and Zhang, H. J. 2005. Neighborhood preserving embedding. In Tenth IEEE International Conference on Computer Vision, 1208-1213.

Ke, Q., and Kanade, T. 2005. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, 739-746.

Kwak, N. 2008. Principal component analysis based on L1-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9):1672-1680.

Kwak, N. 2014. Principal component analysis by Lp-norm maximization. IEEE Transactions on Cybernetics 44(5):594-609.

Li, X.; Pang, Y.; and Yuan, Y. 2010. L1-norm-based 2DPCA. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40(4):1170-1175.

Lu, H.; Plataniotis, K. N.; and Venetsanopoulos, A. N. 2008. MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks 19(1):18-39.

Martinez, A. M. 1998. The AR face database. CVC Technical Report 24.

Nie, F.; Huang, H.; Ding, C. H. Q.; Luo, D.; and Wang, H. 2011. Robust principal component analysis with non-greedy L1-norm maximization. In Proceedings of the International Joint Conference on Artificial Intelligence, 1433-1438.

Oh, J., and Kwak, N. 2016. Generalized mean for robust principal component analysis. Pattern Recognition 54:116-127.

Pang, Y.; Li, X.; and Yuan, Y. 2010. Robust tensor analysis with L1-norm.
IEEE Transactions on Circuits and Systems for Video Technology 20(2):172-178.

Sim, T.; Baker, S.; and Bsat, M. 2002. The CMU pose, illumination, and expression (PIE) database. In IEEE International Conference on Automatic Face and Gesture Recognition, 46-51.

Turk, M., and Pentland, A. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1):71-86.

Wang, H., and Wang, J. 2013. 2DPCA with L1-norm for simultaneously robust and sparse modelling. Neural Networks 46:190-198.

Wang, R.; Nie, F.; Yang, X.; Gao, F.; and Yao, M. 2015. Robust 2DPCA with non-greedy L1-norm maximization for image analysis. IEEE Transactions on Cybernetics 45(5):1108-1112.

Yan, S.; Xu, D.; Zhang, B.; and Zhang, H.-J. 2005. Graph embedding: A general framework for dimensionality reduction. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, 830-837.

Yang, J.; Zhang, D.; Frangi, A. F.; and Yang, J. Y. 2004. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(1):131-137.

Yang, J.; Zhang, D.; Yong, X.; and Yang, J. Y. 2005. Two-dimensional discriminant transform for face recognition. Pattern Recognition 38(7):1125-1129.

Zhang, F.; Yang, J.; Qian, J.; and Xu, Y. 2015. Nuclear norm-based 2-DPCA for extracting features from images. IEEE Transactions on Neural Networks and Learning Systems 26(10):2247-2260.

Zhong, F., and Zhang, J. 2013. Linear discriminant analysis based on L1-norm maximization. IEEE Transactions on Image Processing 22(8):3018-3027.