# A Self-Representation Induced Classifier

Pengfei Zhu¹, Lei Zhang², Wangmeng Zuo³, Xiangchu Feng⁴, Qinghua Hu¹

¹School of Computer Science and Technology, Tianjin University, Tianjin, China
²Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
³School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
⁴School of Mathematics and Statistics, Xidian University, Xi'an, China
zhupengfei@tju.edu.cn

**Abstract.** Almost all existing representation based classifiers represent a query sample as a linear combination of training samples, so their time and memory cost increases rapidly with the number of training samples. In this paper we investigate the representation based classification problem from a rather different perspective: we learn how each feature (i.e., each element) of a sample can be represented by the features of the sample itself. This self-representation property of sample features can be readily employed for pattern classification, and a novel self-representation induced classifier (SRIC) is proposed. SRIC learns a self-representation matrix for each class. Given a query sample, its self-representation residual can be computed with each of the learned self-representation matrices, and classification is then performed by comparing these residuals. In light of the principle of SRIC, a discriminative SRIC (DSRIC) method is developed. For each class, a discriminative self-representation matrix is trained to minimize the self-representation residual of this class while representing little of the features of the other classes. Experimental results on different pattern recognition tasks show that DSRIC achieves comparable or superior recognition rates to state-of-the-art representation based classifiers while being much more efficient and requiring much less storage space.

## 1 Introduction

The nearest neighbor classifier (NNC) has been widely used in machine learning and pattern recognition tasks such as face recognition [Turk and Pentland, 1991], handwritten digit recognition [Lee, 1991], and image classification [Boiman et al., 2008]. NNC measures the distance/similarity between the query sample and each of the training samples independently, and assigns the label of the nearest training sample to the query. If the training samples are distributed densely enough, the classification error of NNC is bounded by twice the error of the Bayesian classifier. NNC does not need prior knowledge of the sample distribution and is parameter-free. However, NNC ignores the relationships between training samples [Vincent and Bengio, 2001], and often fails in high-dimensional pattern recognition tasks because of the curse of dimensionality [Bach, 2014]. Moreover, all training samples must be stored, so NNC becomes time-consuming in large scale problems [Muja and Lowe, 2014].
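For concreteness, the NNC decision rule described above can be sketched in a few lines of NumPy. This is our own illustrative sketch (the function and variable names are not from the paper); it makes explicit the per-query one-to-one search that the paper identifies as NNC's bottleneck.

```python
import numpy as np

def nnc_predict(X_train, y_train, z):
    """Nearest neighbor classifier: assign to the query z the label of the
    closest training sample under the Euclidean distance.

    X_train: (N, d) array of training samples (one sample per row)
    y_train: (N,) array of labels
    z:       (d,) query sample
    """
    # One-to-one comparison with every stored training sample; this is
    # exactly the search that becomes costly for large training sets.
    dists = np.linalg.norm(X_train - z, axis=1)
    return y_train[np.argmin(dists)]
```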
To reduce the computational burden of NNC and alleviate the curse of dimensionality, the nearest subspace classifier (NSC) was proposed. NSC measures the distance from the query sample to the subspace of each class and classifies the query sample to its nearest subspace. The subspaces are often used to describe the appearance of objects under different lighting [Basri and Jacobs, 2003], viewpoint [Ullman and Basri, 1991], articulation [Torresani et al., 2001], and identity [Blanz and Vetter, 2003]. Each class can be modeled as a linear subspace [Chien and Wu, 2002], an affine hull (AH) [Vincent and Bengio, 2001], a convex hull (CH) [Vincent and Bengio, 2001], a hyperdisk [Cevikalp et al., 2008], or a variably smooth manifold [Liu et al., 2011]. When a class is modeled as a linear subspace, NSC in fact represents a query sample by a linear combination of the samples of that class. In this case, a set of projection matrices can be calculated offline, so NSC avoids the one-to-one search process of NNC and greatly reduces the time cost. Approximate nearest subspace algorithms have also been proposed to further accelerate the search process [Basri et al., 2011]. However, NSC only considers the information of one class when calculating the distance from the query sample to that class, and it ignores the information of the other classes.

As a significant extension of NSC, the sparse representation based classifier (SRC) [Wright et al., 2009] exploits the information from all classes of training samples when representing a given query sample, and it has shown promising classification performance [Wright et al., 2009]. Specifically, SRC represents the query sample as a linear combination of all training samples with an l1-norm sparsity constraint imposed on the representation coefficients, and then classifies the query sample to the class with the minimal representation error [Wright et al., 2009]. In spite of its promising classification accuracy, SRC has to solve an l1-norm minimization problem for each query sample, which is very costly. It has been shown in [Zhang et al., 2011] that the collaborative representation mechanism (i.e., using samples from all classes to collaboratively represent the query image) plays a more important role in the success of SRC. By using the l2-norm to regularize the representation coefficients, the so-called collaborative representation based classification (CRC) achieves classification rates similar to SRC [Wright et al., 2009]. CRC has a closed-form solution for representing the query sample, and therefore has a much lower computational cost than SRC (a minimal sketch of this closed-form step is given below).

Inspired by SRC and CRC, a collaborative representation optimized classifier (CROC) was proposed in [Chi and Porikli, 2014] to pursue a balance between NSC and CRC. In [Yang et al., 2011], feature weights are introduced into the representation model to penalize pixels with large error so that the model is robust to outliers. A kernel sparse representation model is obtained by mapping features to a high dimensional reproducing kernel Hilbert space [Gao et al., 2013]. In [Zhang et al., 2015], a sparse representation classifier with manifold constraints transfer is proposed to add manifold priors to SRC. Variants of sparse representation models have also been developed for face recognition with a single sample per person [Gao et al., 2014]. In addition, dictionary learning methods have been proposed to learn discriminative dictionaries for representation based classifiers [Liu et al., 2014; Harandi and Salzmann, 2015; Quan et al., 2015].
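To make the cost contrast with SRC concrete, the l2-regularized representation step of CRC admits a closed-form solution that can be precomputed once. The following NumPy sketch is our own illustration under assumed names (`lam` is the regularization parameter), and it uses a simplified class-wise residual rule rather than the exact rule of the original CRC formulation.

```python
import numpy as np

def crc_fit(X, lam):
    """Precompute P = (X^T X + lam*I)^{-1} X^T once, offline.
    X: (d, N) matrix whose columns are all training samples from all classes."""
    N = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(N), X.T)

def crc_predict(X, labels, P, z):
    """Classify the query z (d,) by the class-wise representation residual.
    labels: (N,) class label of each column of X."""
    a = P @ z  # collaborative representation coefficients (closed form)
    best_class, best_res = None, np.inf
    for k in np.unique(labels):
        idx = labels == k
        # Residual of reconstructing z with the coefficients of class k only.
        # (The original CRC rule additionally normalizes by the coefficient
        # norm; we keep the plain residual here for simplicity.)
        res = np.linalg.norm(z - X[:, idx] @ a[idx])
        if res < best_res:
            best_class, best_res = k, res
    return best_class
```

Note that `P` is an N x d matrix, so both its precomputation and its storage grow with the total number of training samples N, which is the scaling issue discussed next.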
Most of the current representation based classifiers, including NSC, SRC and CRC, are sample oriented: they represent a query sample as a combination of training samples. The time and memory complexity of such a sample oriented representation strategy, however, increases rapidly with the number of training samples. For instance, in the training stage the time complexities of NSC and CRC are $O(Kn^3)$ and $O((Kn)^3)$, respectively, where $K$ is the number of classes and $n$ is the number of samples per class; the complexity is clearly polynomial in the number of training samples. In the testing stage, the memory complexities of NSC and CRC are both $O(dKn)$, where $d$ is the feature dimension. This is linear in the number of training samples and can be very costly for large scale pattern classification problems with many classes and many samples per class.

Different from these previous representation based classifiers, in this paper we investigate the representation based classification problem from a feature oriented perspective. Instead of representing a sample as a linear combination of other samples, we propose to learn how each feature (i.e., each element) of a sample can be represented by the features of the sample itself. Such a self-representation property of features generally holds for most high dimensional data and has been applied in machine learning and computer vision [Xu et al., 2015]. For example, in [Mitra et al., 2002] this property is used to select representative features by feature clustering, and in [Zhu et al., 2015] a regularized self-representation model is proposed for unsupervised feature selection.

Motivated by the self-representation property of sample features, we propose a novel self-representation induced classifier (SRIC), which learns a self-representation matrix for each class from its training data. To classify a query sample, we project it onto the learned self-representation matrix and compute its feature self-representation residual. The query sample is then classified to the class with the minimal feature self-representation residual. SRIC learns the self-representation matrix of each class individually. In light of the principle of SRIC, we then present a discriminative SRIC (DSRIC) approach. Using all the training data, a discriminative self-representation matrix is trained for each class to minimize the feature self-representation residual of this class while representing little of the features of the other classes. The classification of a query sample still depends on which class has the minimal feature self-representation residual. DSRIC is intuitive and easy to understand; a conceptual sketch of the self-representation idea is given after the following summary of contributions.

The main contributions of this paper are summarized as follows:

- We propose two novel feature-oriented representation classifiers, i.e., the self-representation induced classifier (SRIC) and the discriminative SRIC (DSRIC). The training and testing time complexity of SRIC and DSRIC is independent of the number of training samples.
- We prove that SRIC is equivalent to NSC with l2-norm regularization in terms of the final classification decision. Furthermore, we prove that SRIC is essentially principal component analysis (PCA) with eigenvalue shrinkage.
- Extensive experiments show that DSRIC achieves comparable or superior recognition rates to state-of-the-art representation based classifiers such as SRC and CRC; our theoretical complexity analysis and experimental results also show that DSRIC is much more efficient and needs much less storage space than other representation based classifiers.
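As the conceptual sketch referenced above, the code below illustrates the feature self-representation idea assuming a ridge-regularized per-class objective $\min_P \|X_k - P X_k\|_F^2 + \lambda \|P\|_F^2$. Both this concrete objective and the code are our own illustration, not necessarily the exact SRIC model defined by the paper.

```python
import numpy as np

def sric_fit(X_k, lam):
    """Learn a d x d self-representation matrix for one class, assuming the
    illustrative objective  min_P ||X_k - P X_k||_F^2 + lam * ||P||_F^2.
    X_k: (d, n) samples of class k, one sample per column."""
    d = X_k.shape[0]
    G = X_k @ X_k.T  # d x d feature Gram matrix; its size does not grow with n
    # Closed form P = G (G + lam*I)^{-1}; since G and (G + lam*I)^{-1} commute,
    # this equals (G + lam*I)^{-1} G, which solve() computes directly.
    return np.linalg.solve(G + lam * np.eye(d), G)

def sric_predict(P_list, z):
    """Assign z to the class whose matrix reconstructs its features best."""
    residuals = [np.linalg.norm(z - P @ z) for P in P_list]
    return int(np.argmin(residuals))
```

The learned matrix is d x d, so its storage and the per-query residual computation depend on the feature dimension rather than on the number of training samples, which matches the complexity claim made above.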
## 2 Self-representation for classification

### 2.1 Nearest subspace classifier

Suppose that we have a set of training samples from $K$ classes, $X = [X_1, \ldots, X_k, \ldots, X_K]$, where $X_k = [x_{1k}, \ldots, x_{ik}, \ldots, x_{nk}] \in \mathbb{R}^{d \times n}$ is the sample subset of class $k$, $x_{ik}$ is the $i$th sample of class $k$, $d$ is the feature dimension, and $n$ is the number of training samples per class. Given a query sample $z$, the nearest subspace classifier (NSC) represents it by the samples of class $k$ as

$$z = X_k a_k + e_k \qquad (1)$$

where $a_k$ is the representation vector and $e_k$ is the representation residual vector. To obtain an optimal representation of $z$, NSC minimizes the representation residual by solving the following least squares problem:

$$\hat{a}_k = \arg\min_{a_k} \|z - X_k a_k\|_2^2 \qquad (2)$$

The problem in Eq. (2) has the closed-form solution $\hat{a}_k = (X_k^T X_k)^{-1} X_k^T z$ when $X_k^T X_k$ is non-singular. In practice, an l2-norm regularization can be imposed on $a_k$ to make the inversion of $X_k^T X_k$ more stable, resulting in an l2-norm regularized least squares problem:

$$\hat{a}_k = \arg\min_{a_k} \|z - X_k a_k\|_2^2 + \lambda \|a_k\|_2^2 \qquad (3)$$

The analytical solution to Eq. (3) is $\hat{a}_k = (X_k^T X_k + \lambda I)^{-1} X_k^T z$, where $I$ is an identity matrix. The representation residual can then be computed as $r_k = \|z - X_k (X_k^T X_k + \lambda I)^{-1} X_k^T z\|_2$, and NSC classifies $z$ to the class with the minimal representation residual. Let $W_k = X_k (X_k^T X_k + \lambda I)^{-1} X_k^T$. The classification rule of NSC can then be written as

$$\mathrm{label}(z) = \arg\min_k \|z - W_k z\|_2 \qquad (4)$$

Clearly, NSC learns a set of symmetric matrices $W_k \in \mathbb{R}^{d \times d}$, $k = 1, \ldots, K$.
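A minimal NumPy sketch of the regularized NSC in Eqs. (1)-(4), precomputing $W_k = X_k (X_k^T X_k + \lambda I)^{-1} X_k^T$ offline for each class, is given below (the function and variable names are ours).

```python
import numpy as np

def nsc_fit(class_samples, lam):
    """Precompute W_k = X_k (X_k^T X_k + lam*I)^{-1} X_k^T for every class.
    class_samples: list of (d, n_k) matrices, one per class (cf. Eq. (3))."""
    Ws = []
    for X_k in class_samples:
        n_k = X_k.shape[1]
        A = X_k.T @ X_k + lam * np.eye(n_k)          # X_k^T X_k + lam*I
        Ws.append(X_k @ np.linalg.solve(A, X_k.T))   # W_k is d x d and symmetric
    return Ws

def nsc_predict(Ws, z):
    """Eq. (4): label(z) = argmin_k ||z - W_k z||_2."""
    residuals = [np.linalg.norm(z - W @ z) for W in Ws]
    return int(np.argmin(residuals))
```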