# euler_sparse_representation_for_image_classification__63029f4c.pdf Euler Sparse Representation for Image Classification Yang Liu,1 Quanxue Gao,1 Jungong Han,2 Shujian Wang 1 1State Key Laboratory of Integrated Services Networks, Xidian University, Xi an, China 2School of Computing and Communications, Lancaster University, United Kingdom xidianliuyang@gmail.com, qxgao@xidian.edu.cn, jungonghan77@gmail.com, wangsj079@163.com Sparse representation based classification (SRC) has gained great success in image recognition. Motivated by the fact that kernel trick can capture the nonlinear similarity of features, which may help improve the separability and margin between nearby data points, we propose Euler SRC for image classification, which is essentially the SRC with Euler sparse representation. To be specific, it first maps the images into the complex space by Euler representation, which has a negligible effect for outliers and illumination, and then performs complex SRC with Euler representation. The major advantage of our method is that Euler representation is explicit with no increase of the image space dimensionality, thereby enabling this technique to be easily deployed in real applications. To solve Euler SRC, we present an efficient algorithm, which is fast and has good convergence. Extensive experimental results illustrate that Euler SRC outperforms traditional SRC and achieves better performance for image classification. Introduction Image classification is one of the active topics in pattern recognition and machine learning. In many applications, each image is usually represented as a point in the high-dimensional image space. However, many works have demonstrated that the naturally generated images may reside on a lower dimensional manifold. Thus, it is beneficial to first perform a dimensionality reduction (DR) prior to utilizing any technique for image classification and retrieval. Principal component analysis (PCA) (Turk and Pentland 1991), linear discriminant analysis (LDA) (Belhumeur, Hespanha, and Kriegman 1997), locality preserving projection (LPP) (Niyogi 2004) and neighborhood preserving embedding (NPE) (He et al. 2005) are four of the most prevalent techniques for DR. PCA is an unsupervised technique and used to extract the most expressive features. LDA is considered to be capable of extracting the most discriminating features. Different from PCA and LDA, which preserve the global geometric structure, LPP and NPE characterize the local geometric structure of images. Motivated by these methods, many related approaches have been developed for dimensionality reduction, although Corresponding author: Q. Gao. Copyright c 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. the motivations of these methods are different, they employ squared ℓ2-norm as the distance metric in the objective function to measure the similarity between images. It is commonly known that squared ℓ2-norm is not robust in the sense that outlying measurements can arbitrarily skew the solution from the desired solution (Ke and Kanade 2005; Cand es et al. 2011; Oh et al. 2016). To improve the robustness of subspace learning methods, some ℓ1-norm based subspace learning methods have been developed, which can be broadly divided into two categories: ℓ1-norm based PCA technique and ℓ1-norm based LDA technique. As can be easily derived from the name, this sort of DR algorithms employs l1 normal as the distance metric in the objective function to measure the similarity between data, where the representatives include L1-PCA (Ke and Kanade 2005), PCAL1 (Kwak 2008), and LDA-L1 (Liu et al. 2016). Another important area requiring further investigation is the design of classifier for image classification. Nearest neighbor (NN) classifier has been widely used in classification stage by using Euclidean distance measure. It is easy to see that Euclidean distance is very sensitive to shape variation. To handle this problem, Li and Lu (Li and Lu 1999) proposed the nearest feature line (NFL) classifier. Motivated by the impressive results, Chien and Wu (Chien and Wu 2002) proposed two robust classifiers: nearest feature plane (NFP) and nearest feature space (NFS) to improve face recognition system. Gao and Wang (Gao and Wang 2007) proposed center-based nearest neighbor classifier to deal with pattern classification. Wang and Zhang (Wang and Zhang 2004) proposed linear generalization subspace (LGS) to achieve higher recognition rate using only limited training images. Zheng et al. (Zheng, Zhao, and Zou 2004) developed two classifiers (nearest neighbor line and nearest neighbor plane) to improve the robustness of classification. Recently, sparse representation has shown its superior performance in image classification and computer vision (Zhang et al. 2016; Wright et al. 2009; Gao, Tsang, and Chia 2010). Wright et al. (Wright et al. 2009) proposed a sparse representation based classification (SRC) algorithm for face recognition, and obtained impressive results in experiments. Inspired by SRC, sparse representation has been widely used in image recognition and denoising. To improve performance of SRC, kernel sparse representation has become an active topic in image recognition (Harandi et al. The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) 2012; Gao, Tsang, and Chia 2010). Despite its relatively better performance, an essential mapping from the image to a high-dimensional hidden space required by those methods is computational expensive, thus impeding their usage in realworld applications. Inspired by the fact that kernel trick can capture the nonlinear similarity of features, which can suppress outliers, we propose Euler SRC for image classification. Euler SRC is essentially the SRC in Euler space. It comes from the same DR family tree as the kernel sparse representation, but they are clearly different in the way of organizing mapping. More specifically, Euler SRC first maps the images into the complex space by Euler representation, which has a negligible effect for outliers and illumination, and then performs complex SRC with Euler representation. The main advantage of our method is that Euler representation not only is explicit but also does not increase the dimensionality of image space, enabling the fast and simple computation. Extensive experiments illustrate the effectiveness of proposed approach. Sparse representation based classification Classification is one of the particularly simple applications for sparse representation. It assumes that the training samples from a single class do lie on a subspace. Thus, for any test sample y Rm , it will approximately lie in the linear span of the samples, which has the same class label with y. Assume that we have a new matrix X Rm N for the entire training set sampled from the C classes, then y can be represented as a linear representation of the entire training samples. The corresponding coefficient vector s RN can be obtained by solving the following objective function (Cand es and Tao 2006). min s s 1 s.t. Xs = y (1) where 1 denotes the ℓ1-norm, which counts the sum of modulus of each element in a vector. The objective function (1) illustrates that y can be exactly represented. In real applications, data are noisy, implying that it is impossible to exactly express y as a sparse superposition of the training samples. In this case, the objective function (1) is modified as follows (Wright et al. 2010). min s 1 2 Xs y 2 2 + α s 1, α> 0 (2) After obtaining the coefficient vector s by the objective function (2), SRC classifies the test sample y according to residual error. To be specific, for each class i, let δi : RN RN be the characteristic function that selects the coefficients associated with the ith class. δi(s) RN is a new vector whose only nonzero entries are the entries in s that are associated with class i. Then, we assign y to be the jth class if rj(y) = min i ri(y) = y Xδi(s) 2 (3) It is easy to see that SRC has a good classification performance in image recognition, especially for images with occlusion. As can be seen in the objective function (2), SRC employs squared ℓ2-norm as the distance metric in the first term in the objective function (2). It is commonly known that the variations between images, which have the same class label under different illumination and time, is larger than the change of the image identity under the squared ℓ2-norm distance metric. Thus, this affects the robustness of SRC and reduces the flexibility of SRC. To solve this problem, we introduce Euler SRC in the following section. Euler SRC Motivation To achieve robustness, the contribution of distance metric to the criterion function (2) should reduce the effect of large distance. Motivated by the fact that kernel trick can capture the nonlinear similarity of features, which can suppress outliers, we propose Euler SRC for image classification. Prior to formulating the proposed Euler SRC, we first introduce cosine kernel function, i.e., cosine distance metric and then analyze its advantages. Definition 1 (Fitch et al. 2005; Liwicki et al. 2013) Given two arbitrary vectors xj and xq Rm, the cosine distance metric between them is d(xj, xq) = c=1 {1 cos (απ(xj(c) xq(c)))} (4) where xj(c) is the cth element of xj. Let us consider two motivating examples in which different dissimilarity measures are applied to the images shown in Figure 1. Figure 1 shows one image, which is marked by A, and the other eight images that are its 4 nearest neighbor images having the same class label with it and its 4 nearest neighbor images having different class labels with it. Table 1 and Table 2 list the distance between A and the other 8 images under different distance metrics. As can be seen in Figure 1, Table 1 and Table 2, the ℓ2-norm associates a smaller margin (8.6263-7.1618 = 1.1645). In contrast, the use of the cosine-based measure results in a large margin (488.2649-451.0045=37.2604). It shows that cosine distance metric can help obtain a large margin. Thus, cosine distance metric, which is defined in Eq. (4), is an effective kernel function. It is not only robust to outliers but also suitable for classification compared with the squared ℓ2-norm. Table 1: Comparison of normalized dissimilarity measures between images having the same class label in Figure 1. Dissimilarity A-B A-C A-D A-E Euclidean 5.1609 5.7977 6.173 7.1618 Cosine-based 192.9056 296.278 320.8623 451.0045 Table 2: Comparison of normalized dissimilarity measures between images having different class labels in Figure 1. Dissimilarity A-F A-G A-H A-K Euclidean 8.6263 8.7619 9.0132 9.0332 Cosine-based 488.2649 682.7639 570.5157 683.3511 Combining the aforementioned analysis, we can get the following interesting conclusion: Figure 1: Original image and its 4 nearest images having the same class label and its 4 nearest images having different class labels with it. Images with the same shape belong to the same person. First, the cosine distance metric enlarges the distance between all image so it can help obtain a large margin and make it easy to realize classification. Second, the distance between the same class images was enlarged a smaller multiple than the distance between the different class images by the cosine distance metric. Therefore, compared with Euclidean distance, cosine distance metric not only can help obtain a large margin but also improve the separability between different class image. Combining the aforementioned analysis for cosine distance metric, we can easy build a robust formulation for SRC. To be specific, we use cosine distance instead of squared ℓ2-norm as the distance metric in the image pixel space. By simple algebra, Eq. (4) becomes d(xj, xq)= m c=1 {1 cos (γπ(xj(c) xq(c)))} 2(eiγπxj eiγπxq) 2 2 = zj zq 2 2 eiγπxj(1) ... eiγπxj(c) 2eiγπxj (6) is called Euler representation of xj. Eq. (5) illustrates that cosine distance metric between two images in pixel space is equivalent to squared ℓ2-norm between the corresponding two vectors with Euler representation. Moreover, compared with cosine distance metric, squared ℓ2-norm is easy to be performed in solving the optimization problem. Thus, our proposed robust method first maps the intensity values xj normalized in [0, 1] onto the complex representation zj Cm and then performs complex SRC in Euler space. Euler SRC Denote by A = 1 2eiαπX and b = 1 2eiγπy, according to the aforementioned analysis, Euler SRC aims to seek the coefficient vector z by solving the following objective function. min z 1 2 Az b 2 2 + α z 1, α> 0 (7) where z 2 2 is defined as z 2 2 = z z, where z is the complex conjugate transpose of z and z 1 is defined as z 1 = m i=1 |zi| with zi the ith component of z. Before solving the objective function (7), we first introduce the following definition. Definition 2 (Fletcher 2013). Subgradient of z 1, denoted by z 1, as follows: z 1 = u| ui = zi |zi| if zi = 0, and |ui| 1 otherwise (8) Theorem 1 (Complex SRC for sparse coding): For the following function arg min z ( b Az 2 2 + α z 1) (9) where matrix A and b are complex valued matrix and vector respectively. The sparse representation z corresponding to the input signal b can be iteratively solved by z = |Z| A (A |Z| A + αI) 1b (10) where Z is a square diagonal matrix built upon the components of vector z, i.e., Z = diag(z). |Z| is also a quare diagonal matrix whose diagonal elements are the abstract value of the corresponding diagonal elements of Z. Proof: The Lagrangian function of the problem (9) is 2 b Az 2 2 + α z 1 (11) The KKT condition for optimal solution of the problem (11) specifies that the gradient of L w.r.t. zi must be zero, i.e., L z = A (Az b) + αu = 0 (12) Note that, u is related to z. To get the root of the model (12), we adopt a fixed point iteration algorithm to solve it. The core of fixed point algorithm is to convert the equation to the form z = g(z). According to the definition of subgradient of z 1, we have u = |Z| 1z (13) In Eq.(13), it may have 0 0 when the elements of Z have zero. We only use it for the sake of subsequent derivation and do not calculate it. Moreover, it will vanish in the final representation (See Eq. (14)). Substituting (13) into (12), and by simple algebra, we get z = (A A + α|Z| 1) 1A b = {(A A |Z| + αI)|Z| 1} 1A b = |Z| (A A |Z| + αI) 1A b According to (A A |Z|+αI)A = A (A |Z| A +αI), we have (A A |Z| + αI) 1A = A (A |Z| A + αI) 1 (15) Combining (14) and (15) yields z = |Z| A (A |Z| A + αI) 1b (16) As can be seen in Eq. (16), we can iteratively solve the complex sparse vector of the model (9) by Eq. (16). According to the fixed point iteration algorithm and Eq. (16), we summarize the pseudo code for solving the objective function (7) in Algorithm 1. Algorithm 1: Euler SRC algorithm Input: Training Sample Matrix X Rm N, Parameter γ (0 < γ < 2), α(α > 0), t = 1, τ = 10 6. Initialize z1 = 1. Procedure 1. Map each column vector in X in [0,1]. 2. Calculate A = 1 2 exp(γ πi X). 3. If m > N, calculate zt+1 by Eq. (14), otherwise, calculate zt+1 by Eq. (16). 4. t = t + 1, go to step 3 until zt+1 zt 1 < τ. 5. Return zt. Experiments In this section, we validate our proposed method on four databases (COIL20, AR, PIE and LFWcrop), and compare it with the recently proposed algorithms SRC (Wright et al. 2009) and Riemannian Sparse Representation (RSR) (Harandi et al. 2012). In the experiments, α = 0.5, PCA and down-sampling are selected as preprocessing for each approach respectively, where PCA reduces dimensionality to 200. In Euler SRC, we set γ = 1.9 in the following experiments. Experiments on Image Classification The AR database (Martinez 1998) contains over 4000 color face images of 126 people, including frontal views of faces with different facial expressions, lighting conditions and occlusions. The pictures of 120 individuals (65 men and 55 women) are taken in two sessions (separated by two weeks). Figure 2: Some samples in the AR dataset. Figure 3: Some samples in the COIL20 dataset. Figure 4: Some samples in the PIE dataset. Figure 5: Some samples in the LFWCrop dataset. Each session contains 13 color images, in which 6 of them are with occlusion and the other 7 full facial images are with different facial expressions and lighting conditions. In the experiments, we manually cropped the face portion of the image and then normalized it to 50 40 pixels. Figure 2 shows some normalized images of one person. We do three group experiments in this database. Experiment 1: We consider the corrupted training images with sunglasses. We select facial images (a)-(g) and image (h) with sunglasses for training, and the remaining full facial images and the images with sunglasses, which include 2 images from session one and 3 images from session two, for testing. In summary, we have 960 training images and 1440 testing images. Experiment 2: We consider the impact of occlusion images corrupted by scarfs, which occlude about 40% of the face image, for image classification. In the experiments, we select images from (a) to (g) and image (k) per person for training, and images from (n) to (t) and image (x) per person for testing. Thus, we again end up with 960 training images and 1440 testing images. Experiment 3: We consider the impact of occlusion images corrupted by sunglasses and scarfs. In this experiment, we select 7 full facial images from (a) to (g) and two corrupted images (h) and (k) per person for training, and the remaining images for testing. Thus, we have 1080 Table 3: The average classification rate (%) and standard deviation on the AR dataset using PCA as a preprocessing. Methods SRC RSR Euler SRC Exp.1 80.00 0.49 81.26 0.39 82.98 0.42 Exp.2 77.74 1.12 81.29 0.33 83.63 0.83 Exp.3 78.72 0.19 79.90 0.20 81.96 1.15 Table 4: The average classification rate (%) and standard deviation on the AR dataset using down-sampling as a preprocessing. Methods SRC RSR Euler SRC Exp.1 77.80 0.40 79.01 0.39 80.32 0.89 Exp.2 75.90 0.19 77.00 0.29 77.25 0.29 Exp.3 76.21 0.11 80.23 0.17 81.37 1.14 Table 5: The average classification rate (%) and standard deviation on the COIL20, PIE and LFWCrop datasets using PCA as a preprocessing. Methods SRC RSR Euler SRC COIL20 98.71 0.77 98.93 0.27 99.28 0.50 PIE 96.76 1.55 97.01 1.23 97.74 1.53 LFWCrop 50.40 3.04 53.62 1.01 55.13 2.86 Table 6: The average classification rate (%) and standard deviation on the COIL20, PIE and LFWCrop datasets using down-sampling as a preprocessing. Methods SRC RSR Euler SRC COIL20 99.05 0.60 99.51 0.31 99.63 0.29 PIE 96.72 1.64 97.03 0.91 97.99 1.82 LFWCrop 51.03 2.10 51.06 1.01 52.20 1.86 training images and 2040 testing images. All of the aforementioned experiments are repeated 3 times. The COIL20 database (Nene et al. 1996) includes 1440 color images of 20 objects (72 images per object). The object has a wide variety of complex geometric and reflectance characteristics. This database, called Columbia Object Image Library (COIL-20), was used in a real-time 20 object recognition system. Each object was placed in a stable configuration at approximately the center of the turntable. The turntable was then rotated through 360 degrees and 72 images were taken per object; one at every 5 degrees of rotation. Some sample images of one object are shown in Figure 3. In this database, we randomly select 10 images per subject for training, and the remaining images for testing. All of experiments are repeated 10 times. The CMU PIE database (Sim, Baker, and Bsat 2002) contains 68 subjects with 41368 face images as a whole. Face images of each person are captured by 13 synchronized cameras and 21 flashes, under 43 different illumination conditions, and with 4 different expressions. In the experiment, we select pose C05 (a nearly frontal pose) as gallery which includes 68 persons and 49 images for each person. All images were manually aligned, cropped, and resized to 64 64 pixels. Some sample images of one person are shown in Figure 4. In this database, we randomly select 7 images per subject for training, and the remaining images for testing. All of experiments are repeated 10 times. 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 99 classification accuracy Figure 6: The average accuracy of Euler SRC vs. γ on the COIL20 database 0 10 20 30 40 50 60 70 80 90 100 0 value of the objective function number of iteration AR COIL20 PIE LFWCrop Figure 7: Convergence curve of Euler SRC on four databases. The LFWCrop database (Sanderson and Lovell 2009) is a cropped version of the Labeled Faces in the Wild (LFW) dataset (Huang et al. 2007). In the vast majority of images, almost all of the background is omitted. The extracted area was then scaled to a size of 64x64 pixels. The cropped faces in LFWCrop exhibit real-life conditions, including misalignment, scale variations, in-plane as well as out-of-plane rotations. Some sample images of one person are shown in Figure 5. In the experiment, we choose person who has more than 20 photos but less than 100 photos as the sub-dataset, which contains 57 classes and 1883 images. For each person, we randomly choose ninety percent of images for training, and the remaining images for testing. We repeat this process 10 times. Figure 8: Face recovery results of Euler SRC on the AR dataset. Figure 9: Face recovery results of SRC on the AR dataset. Table 3 and Table 4 list the average classification accuracy and standard deviation on the AR dataset. Table 5 and Table 6 list the average classification accuracy and standard deviation on the COIL20, PIE and LFWCrop datasets, respectively. Figure 6 plots the recognition accuracy of Euler SRC vs. parameter γ on the COIL20 database. Figure 7 shows the convergence curve of Euler SRC on the AR, COIL20, PIE and LFWCrop databases respectively. Comparing the aforementioned experiments, we have the several interesting observations as follows. Table 3 to Table 6 show that Euler SRC is overall superior to SRC and RSR. This is probably because that Euler space can help improve the separability of data and obtain a large margin for different classes. Moreover, SRC and RSR are performed in image pixels space, which is very sensitive to outliers and illumination. Figure 6 shows that when γ = 1.9, we get a best classification accuracy on the COIL20 database. This is probably because that Euler representation can well suppress outliers. It is consistent with (Liwicki et al. 2013). Figure 6 also illustrates that parameter gamma has a small effect for the performance of Euler SRC. Figure 7 illustrates that our proposed iterative algorithm monotonically decreases the objective function in each iteration and can get converged within a few iterations, although we cannot prove it in theoretical analysis. We will study it in our future work. Experiments on Face Representation To illustrate the robustness of Euler SRC, We evaluate Euler SRC and SRC for face representation on the AR database database. The training images is the same as that in the aforementioned experiments in the AR dataset. The original face images, recovered faces and error faces of Euler SRC and SRC are described in Figure 8 and Figure 9, respectively. As can be seen in Figure 8 and Figure 9, Euler SRC well remove the occlusions from the noise images. It means that Euler SRC is superior to SRC for image reconstruction. The reason may be that Euler representation is more robust to outliers than image pixels representation. Conclusions We present a robust Euler sparse representation, which is called Euler SRC, for image classification. Different from kernel SRC, Euler SRC maps the images into a Euler space by explicit Euler representation and does not increase the dimensionality of image space. Thus, it is easy to realize Euler SRC in real applications. We provide an algorithm to find an exact sparse representation, which can best optimize the objective function. Experiments on the AR, COIL20, PIE and LFWCrop databases illustrate the advantages of the proposed approach. Acknowledgements This work is supported by National Natural Science Foundation of China under Grant 61271296 and 61773302, and the 111 Project of China (B08038). Belhumeur, P. N.; Hespanha, J. P.; and Kriegman, D. J. 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7):711 720. Cand es, E. J., and Tao, T. 2006. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE transactions on information theory 52(12):5406 5425. Cand es, E. J.; Li, X.; Ma, Y.; and Wright, J. 2011. Robust principal component analysis? Journal of the ACM (JACM) 58(3):11. Chien, J. T., and Wu, C. C. 2002. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1644 1649. Fitch, A. J.; Kadyrov, A.; Christmas, W. J.; and Kittler, J. 2005. Fast robust correlation. IEEE Transactions on Image Processing 14(8):1063 1073. Fletcher, R. 2013. Practical methods of optimization. John Wiley & Sons. Gao, Q. B., and Wang, Z. Z. 2007. Center-based nearest neighbor classifier. Pattern Recognition 40(1):346 349. Gao, S.; Tsang, I. W.-H.; and Chia, L.-T. 2010. Kernel sparse representation for image classification and face recognition. In European Conference on Computer Vision, 1 14. Harandi, M. T.; Sanderson, C.; Hartley, R.; and Lovell, B. C. 2012. Sparse coding and dictionary learning for symmetric positive definite matrices: A kernel approach. In Computer Vision ECCV 2012. Springer. 216 229. He, X.; Cai, D.; Yan, S.; and Zhang, H.-J. 2005. Neighborhood preserving embedding. In IEEE International Conference on Computer Vision, volume 2, 1208 1213. Beijing, China: IEEE. Huang, G. B.; Ramesh, M.; Berg, T.; and Learned-Miller, E. 2007. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, Technical Report 07-49, University of Massachusetts, Amherst. Ke, Q., and Kanade, T. 2005. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, 739 746. San Diego, CA: IEEE. Kwak, N. 2008. Principal component analysis based on l1norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9):1672 1680. Li, S. Z., and Lu, J. 1999. Face recognition using the nearest feature line method. IEEE Transactions on Neural Networks 10(2):439 443. Liu, Y.; Gao, Q.; Gao, X.; Nie, F.; and Li, Y. 2016. A nongreedy algorithm for l1-norm lda. IEEE Transactions on Image Processing 69(1):DOI: 10.1109/TIP.2016.2621667. Liwicki, S.; Tzimiropoulos, G.; Zafeiriou, S.; and Pantic, M. 2013. Euler principal component analysis. International Journal of Computer Vision 101(3):498 518. Martinez, A. M. 1998. The ar face database. CVC Technical Report 24. Nene, S. A.; Nayar, S. K.; Murase, H.; et al. 1996. Columbia object image library (coil-20). Technical report, Technical report CUCS-005-96. Niyogi, X. 2004. Locality preserving projections. In Neural information processing systems, volume 16, 153. Oh, T. H.; Tai, Y. W.; Bazin, J. C.; Kim, H.; and Kweon, I. S. 2016. Partial sum minimization of singular values in robust pca: Algorithm and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(4):744 758. Sanderson, C., and Lovell, B. C. 2009. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference. Springer Berlin Heidelberg. Sim, T.; Baker, S.; and Bsat, M. 2002. The cmu pose, illumination, and expression (pie) database. In IEEE International Conference on Automatic Face and Gesture Recognition, 46 51. Turk, M., and Pentland, A. 1991. Eigenfaces for recognition. Journal of cognitive neuroscience 3(1):71 86. Wang, H., and Zhang, L. 2004. Linear generalization probe samples for face recognition. Pattern Recognition Letters 25(8):829 840. Wright, J.; Yang, A. Y.; Ganesh, A.; Sastry, S. S.; and Ma, Y. 2009. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2):210 227. Wright, J.; Ma, Y.; Mairal, J.; Sapiro, G.; Huang, T. S.; and Yan, S. 2010. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE 98(6):1031 1044. Zhang, Z.; Li, F.; Zhao, M.; Zhang, L.; and Yan, S. 2016. Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification. IEEE Transactions on Image Processing 25(6):2429 2443. Zheng, W.; Zhao, L.; and Zou, C. 2004. Locally nearest neighbor classifiers for pattern classification. Pattern Recognition 37(6):1307 1309.