# Asymmetric Discrete Graph Hashing

Xiaoshuang Shi, Fuyong Xing, Kaidi Xu, Manish Sapkota, Lin Yang
University of Florida, Gainesville, FL, 32611, USA
xsshi2015@ufl.edu

Recently, many graph based hashing methods have emerged to tackle large-scale problems. However, there exist two major bottlenecks: (1) directly learning discrete hashing codes is an NP-hard optimization problem; (2) the complexity of both storage and computational time to build a graph with $n$ data points is $O(n^2)$. To address these two problems, in this paper we propose a novel yet simple supervised graph based hashing method, asymmetric discrete graph hashing, which preserves the asymmetric discrete constraint and builds an asymmetric affinity matrix to learn compact binary codes. Specifically, we utilize two different (instead of identical) discrete matrices to better preserve the similarity of the graph with short binary codes. We generate the asymmetric affinity matrix using $m$ ($m \ll n$) selected anchors to approximate the similarity among all training data, so that the computational time and storage requirements can be significantly reduced. In addition, the proposed method jointly learns discrete binary codes and a low-dimensional projection matrix to further improve the retrieval accuracy. Extensive experiments on three benchmark large-scale databases demonstrate its superior performance over recent state-of-the-art methods with lower training time costs.

## Introduction

Hashing techniques have attracted considerable attention for tackling a variety of large-scale tasks in computer vision and machine learning. The core idea of hashing is to map high-dimensional data into a low-dimensional binary (Hamming) space while preserving the similarity among neighbors (Wang, Kumar, and Chang 2012). Hence the high-dimensional data can be represented by a set of compact binary codes, which leads to significant gains in both storage and query speed (Liu et al. 2014).

Over the past decades, various hashing methods (Indyk and Motwani 1998), (Weiss, Torralba, and Fergus 2009), (Liu et al. 2011), (Gong et al. 2013), (Strecha et al. 2012), (Norouzi and Fleet 2011), (Zhang et al. 2010) have been proposed. Generally, these methods can be grouped into two main categories: data-independent and data-dependent. Locality-sensitive hashing (LSH) (Indyk and Motwani 1998) is one of the representative data-independent methods; it utilizes simple random projections to generate binary codes, thereby requiring long binary codes to achieve good retrieval accuracy. To learn compact binary codes, an increasing number of methods focus on learning data-dependent hashing functions from available training data. Based on whether the semantic (label) information is utilized, data-dependent methods can be further classified into unsupervised and supervised hashing. Unsupervised hashing methods, such as spectral hashing (SH) (Weiss, Torralba, and Fergus 2009), anchor graph hashing (AGH) (Liu et al. 2011) and iterative quantization (ITQ) (Gong et al. 2013), try to explore intrinsic features of the data to learn binary codes without any supervision. By contrast, supervised hashing methods, including linear discriminant analysis hashing (LDAH) (Strecha et al. 2012), minimal loss hashing (MLH) (Norouzi and Fleet 2011) and semi-supervised hashing (SSH) (Wang, Kumar, and Chang 2012), leverage semantic information to map the high-dimensional data into a low-dimensional binary space.
Recently, supervised hashing has attracted increasing interest, since unsupervised hashing methods cannot guarantee the desired retrieval accuracy via semantic distances due to the semantic gap (Wang, Kumar, and Chang 2012). Meanwhile, compared to linear hashing methods, kernel-based hashing methods like binary reconstructive embedding (BRE) (Kulis and Darrell 2009), random maximum margin hashing (RMMH) (Joly and Buisson 2011), kernel supervised hashing (KSH) (Liu et al. 2012) and kernel supervised discrete hashing (KSDH) (Shi et al. 2016b) usually achieve better retrieval performance because they can capture the non-linear manifold structure hidden in the data.

Most data-dependent hashing methods are derived from traditional dimensionality reduction techniques. One popular dimensionality reduction framework is graph embedding (Yan et al. 2007), on which many hashing methods like SH and AGH have been developed. However, there are two major bottlenecks for graph based hashing methods: (1) directly learning the discrete binary codes is an NP-hard problem; (2) the complexity of both storage and computational time to build a graph with $n$ data points is $O(n^2)$. To obtain binary codes, most graph based hashing methods adopt relaxation+rounding schemes, which might generate accumulated quantization errors between the hashing and projection functions, especially for large-scale training data (Shen et al. 2015). Therefore, discrete graph hashing (DGH) (Liu et al. 2014), which preserves the symmetric discrete constraint (using the element-wise product of two identical discrete matrices to preserve the similarity among neighbors), has been proposed to remove this limitation. To improve the storage requirement and computational time of graph generation, DGH adopts a popular strategy, namely anchor graphs, to build an approximate symmetric neighbor graph (Liu et al. 2011). Furthermore, recent literature (Neyshabur et al. 2013) argues that the symmetric discrete constraint might be difficult to optimize and is not necessary for learning binary codes of training data, and that the asymmetric discrete constraint (using the element-wise product of two different discrete matrices to approximate the similarity matrix) might be more powerful for learning shorter binary codes. However, the optimization problem in (Neyshabur et al. 2013) is not exactly the same as the problem in graph based hashing algorithms. In addition, the complexity of both storage and computational time of the algorithm in (Neyshabur et al. 2013) is $O(n^2)$, which requires large computer memory and incurs high training costs for a large $n$.

Motivated by the aforementioned observations, in this paper we propose a novel supervised graph based hashing method, asymmetric discrete graph hashing (ADGH), that can utilize the semantic information to encode high-dimensional data into compact binary codes with low training time costs. Specifically, instead of preserving the symmetric discrete constraint, we preserve the asymmetric discrete constraint in the proposed optimization model.
To reduce the size of the graph, unlike the anchor graph strategy that generates an $n \times n$ symmetric affinity matrix, we utilize a small number $m$ ($m \ll n$) of anchors to build an $n \times m$ asymmetric affinity matrix to approximate the similarity among all training data. Besides directly learning the binary codes of the training data, the proposed method can simultaneously learn a low-dimensional projection matrix. In summary, our contributions are:

- We propose a novel supervised graph based hashing model by preserving the asymmetric discrete constraint and building an asymmetric affinity matrix. In addition, our model can jointly learn binary codes of training data and a low-dimensional projection matrix, which can further improve the retrieval accuracy.
- We present an optimization procedure to solve the proposed model in an alternating and efficient manner, and analyze its convergence and time complexity.
- We evaluate the proposed method on three popular large-scale image and video databases, and demonstrate its superior performance over the state of the art with lower training time costs.

## Graph based asymmetric discrete hashing

Given data $X = \{x_1, x_2, \ldots, x_n\} \in \mathbb{R}^{d \times n}$, where $d$ and $n$ are the number of predictors and observations, respectively. Without loss of generality, in this paper the data $\{x_i\}_{i=1}^{n}$ are assumed to be zero-centered, i.e., $\sum_{i=1}^{n} x_i = 0$. These data points are used to construct a graph $G = \{X, W\}$, where $W$ is a symmetric affinity matrix with $w_{ij}$ characterizing the similarity relationship between $x_i$ and $x_j$. The purpose of traditional graph methods is to map the high-dimensional data into a low-dimensional space with the similarity among neighbors best preserved. Suppose that $Y \in \mathbb{R}^{n \times r}$ is the corresponding low-dimensional data of $X$; one popular formulation can be written as follows:

$$\min_{Y} \ \operatorname{Tr}\left(Y^T L Y\right), \quad \text{s.t. } Y^T Y = I_r, \tag{1}$$

where $L = S - W$ is the graph Laplacian, and $S$ is a diagonal matrix with the $i$-th diagonal element $s_{ii} = \sum_{j} w_{ij}$.

Many graph based hashing algorithms (Weiss, Torralba, and Fergus 2009), (Liu et al. 2011) have been developed based on Eq. (1). However, most of them use relaxation+rounding schemes to learn binary codes, which might generate accumulated quantization errors between the hashing and projection functions. Although DGH directly learns the binary codes by preserving the symmetric discrete constraint, its optimization procedure is relatively difficult and requires a high training time cost. Meanwhile, the symmetric discrete constraint might require longer binary codes than the asymmetric discrete constraint to preserve the similarity of the affinity matrix. For a clear illustration, we present the following lemma:

**Lemma 1:** Given a symmetric positive semidefinite matrix $M \in \mathbb{R}^{n \times n}$, for any discrete matrix $B_1 \in \{-1, 1\}^{n \times r}$, there exists a discrete matrix $B_2 \in \{-1, 1\}^{n \times r}$ such that $\operatorname{Tr}\left(B_1^T M B_1\right) \le \operatorname{Tr}\left(B_1^T M B_2\right)$. (Proof in the supplemental material)

In Lemma 1, a larger objective value of $\operatorname{Tr}\left(B_1^T M B_2\right)$ suggests that the discrete matrices can preserve more similarity (label) information, which usually means better retrieval accuracy.
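As a quick numerical check of Lemma 1, the sketch below (ours, not part of the paper) draws a random PSD matrix $M$ and a random code matrix $B_1$, then sets $B_2 = \operatorname{sgn}(MB_1)$; this choice maximizes the asymmetric trace entry-wise over discrete $B_2$, so it can never do worse than the symmetric choice $B_2 = B_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 8

sgn = lambda M: np.where(M >= 0, 1.0, -1.0)   # sgn(0) = 1, as in the paper

G = rng.standard_normal((n, n))
M = G @ G.T                                    # symmetric positive semidefinite
B1 = sgn(rng.standard_normal((n, r)))          # any discrete matrix in {-1, 1}^{n x r}

# B2 = sgn(M B1) maximizes Tr(B1^T M B2) entry-wise over discrete B2,
# so the asymmetric objective is at least the symmetric one Tr(B1^T M B1).
B2 = sgn(M @ B1)
print(np.trace(B1.T @ M @ B1) <= np.trace(B1.T @ M @ B2))   # True
```

This is the intuition behind using two different discrete matrices: the second factor has more freedom to match $MB_1$ than $B_1$ itself does.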
Dimensionality reduction methods like principal component analysis (PCA) (Wright et al. 2009), (Candès et al. 2011) and nonnegative matrix factorization (Recht et al. 2012), (Zhou, Bian, and Tao 2013) suggest that a matrix with high dimensionality yet low rank can be represented or approximated by a linear combination of basis vectors. Therefore, in our problem the affinity matrix $W \in \mathbb{R}^{n \times n}$ can be approximated by a factorization $W \approx VP$, where $V \in \mathbb{R}^{n \times m}$ is a sub-matrix of $W$ and $P \in \mathbb{R}^{m \times n}$ is a weight matrix. Typically, $m \ll n$. Based on these observations, we propose a novel graph based hashing method, namely asymmetric discrete graph hashing (ADGH).

## Model formulation

Suppose there exists a projection matrix $A \in \mathbb{R}^{d \times r}$ such that $X^T A = Y$. Because $Y^T Y = I_r$ and $W \approx VP$, the optimization problem in Eq. (1) can be reformulated as:

$$\max_{A} \ \operatorname{Tr}\left(A^T X V P X^T A\right), \quad \text{s.t. } A^T X X^T A = I_r. \tag{2}$$

Although $V$ can be a subset of $W$, it is computationally expensive to calculate the matrix $P$. To avoid the calculation of $P$, we regard $Z = PX^T \in \mathbb{R}^{m \times d}$ as anchors. Then the matrix $V$ can be viewed as the affinity matrix characterizing the relationship between the data $X$ and the anchors $Z$, and the projection matrix $A$ maintains their similarity in a low-dimensional space. To learn compact binary codes of $X$ and $Z$ with the similarity preserved, we consider the requirement of maximum information in each hash bit and minimum redundancy among different hash bits, and formulate the following optimization problem:

$$\max_{B, D} \ \operatorname{Tr}\left(B^T V D\right), \quad \text{s.t. } B \in \{-1, 1\}^{n \times r},\ D \in \{-1, 1\}^{m \times r},\ B^T \mathbf{1}_n = 0,\ B^T B = n I_r,\ D^T \mathbf{1}_m = 0,\ D^T D = m I_r, \tag{3}$$

where $\mathbf{1}_m \in \mathbb{R}^{m}$ (similarly $\mathbf{1}_n$) is a column vector with all elements being one, and the discrete matrices $B$ and $D$ represent the binary codes of the data $X$ and the anchors $Z$, respectively.

It is difficult to solve Eq. (3) due to the hard constraints. To compute the binary code matrices $B$, $D$ and the projection matrix $A$, we soften the hard constraints to obtain:

$$\max_{B, D, A} \ \operatorname{Tr}\left(B^T V D\right) - \frac{\gamma}{2} \left\| B - X^T A \right\|_F^2, \quad \text{s.t. } B \in \{-1, 1\}^{n \times r},\ D \in \{-1, 1\}^{m \times r},\ A^T X X^T A = n I_r, \tag{4}$$

where the parameter $\gamma$ weights the similarity information against the error between the discrete binary code matrix $B$ and the projection function $f(X) = X^T A$. Since $A^T X \mathbf{1}_n = 0$ and $A^T X X^T A = n I_r$, which can reduce the redundancy among data points (Shi et al. 2016a), when $\gamma \to \infty$ the binary matrix $B$ satisfies the constraints $B^T \mathbf{1}_n = 0$ and $B^T B = n I_r$. Note that we ignore the influence of the discrete matrix $D$ on the projection matrix $A$ because $m \ll n$; hence it is unnecessary to compute the matrix $P$. In practice, we can select $m$ data points $\bar{X} \in \mathbb{R}^{d \times m}$ from $X$ and build $V$ based on the similarity between the selected data and the training data.

To improve the robustness of the projection function to outliers, we replace the constraint $A^T X X^T A = n I_r$ with a more general constraint $\left\| A^T X X^T A - n I_r \right\|_F^2 \le \epsilon$, where $\epsilon$ is a very small coefficient. Since $\operatorname{Tr}\left(B^T B\right) = rn$ and $\operatorname{Tr}\left(A^T X X^T A\right) \approx rn$ are (nearly) constant, the quadratic penalty in Eq. (4) reduces, up to constants, to the correlation term $\gamma \operatorname{Tr}\left(B^T X^T A\right)$, and we propose our graph based hashing framework:

$$\max_{B, D, A} \ \operatorname{Tr}\left(B^T V D\right) + \gamma \operatorname{Tr}\left(B^T X^T A\right), \quad \text{s.t. } B \in \{-1, 1\}^{n \times r},\ D \in \{-1, 1\}^{m \times r},\ \left\| A^T X X^T A - n I_r \right\|_F^2 \le \epsilon. \tag{5}$$

The framework incorporates the binary codes $B$, $D$ and the projection matrix $A$ so that embedding learning and the regression model can be jointly implemented and optimized, which can further improve the robustness of the projection matrix $A$ (Shi et al. 2015), (Hou et al. 2014).

**Out-of-Sample Hashing:** After obtaining $A$, given a query data point $q \in \mathbb{R}^d$, we use the hashing functions

$$H(q) = \left[h_1(q), h_2(q), \ldots, h_r(q)\right] = \left[\operatorname{sgn}\left(a_i^T q\right)\right]_{i=1}^{r} \tag{6}$$

to obtain its binary codes, where $\operatorname{sgn}\left(a_i^T q\right) = 1$ if $a_i^T q \ge 0$ and $-1$ otherwise.
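As a small illustration of the out-of-sample step in Eq. (6), the sketch below encodes a query with an already-learned projection matrix; the random $A$ and $q$ are stand-ins, and the helper name is ours.

```python
import numpy as np

def encode_query(q, A):
    """H(q) = sgn(A^T q) as in Eq. (6), with sgn(0) = 1; returns r values in {-1, +1}."""
    return np.where(A.T @ q >= 0, 1, -1).astype(np.int8)

d, r = 512, 32                      # e.g., 512-dim GIST features, 32-bit codes
A = np.random.randn(d, r)           # stand-in for the learned projection matrix
q = np.random.randn(d)              # stand-in for a (zero-centered) query
print(encode_query(q, A).shape)     # (32,)
```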
## Optimization process

Since it is difficult to solve the problem in Eq. (5) directly, we propose a tractable optimization scheme that alternately solves the following four subproblems.

**B-subproblem:**

$$\max_{B} \ \operatorname{Tr}\left(B^T V D\right) + \gamma \operatorname{Tr}\left(B^T X^T A\right), \quad \text{s.t. } B \in \{-1, 1\}^{n \times r}, \tag{7}$$

from which we can obtain $B = \operatorname{sgn}\left(VD + \gamma X^T A\right)$.

**D-subproblem:**

$$\max_{D} \ \operatorname{Tr}\left(B^T V D\right), \quad \text{s.t. } D \in \{-1, 1\}^{m \times r}, \tag{8}$$

whose optimal solution is $D = \operatorname{sgn}\left(V^T B\right)$.

**C-subproblem:**

$$\max_{C} \ \operatorname{Tr}\left(B^T C\right), \quad \text{s.t. } C^T \mathbf{1}_n = 0,\ C^T C = n I_r, \tag{9}$$

where $C \in \mathbb{R}^{n \times r}$ is an auxiliary matrix that replaces $X^T A$. Eq. (9) can be solved based on Theorem 1.

**Theorem 1:** If $\operatorname{rank}(JB) = r$, where $J = I_n - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T$, then Eq. (9) has a unique global solution $C = \sqrt{n}\, R Q^T$, where $Q \in \mathbb{R}^{r \times r}$ is obtained from the SVD $B^T J B = Q \Sigma^2 Q^T$ and $R = J B Q \Sigma^{-1} \in \mathbb{R}^{n \times r}$. (Proof in the supplemental material)

**A-subproblem:**

$$\min_{A} \ \left\| C - X^T A \right\|_F^2 + \lambda \left\| A \right\|_F^2, \tag{10}$$

whose solution is $A = \left(XX^T + \lambda I_d\right)^{-1} X C$. Obviously, $A^T X X^T A = C^T X^T \left(XX^T + \lambda I_d\right)^{-1} XX^T \left(XX^T + \lambda I_d\right)^{-1} X C$. If $\lambda = 0$, it is easy to achieve $\epsilon = 0$; otherwise $\epsilon > 0$, and a larger $\lambda$ means a larger $\epsilon$.

In summary, Algorithm 1 presents the optimization procedure to solve Eq. (5).

**Algorithm 1: ADGH**
- Input: $X \in \mathbb{R}^{d \times n}$, $\bar{X} \in \mathbb{R}^{d \times m}$, $V \in \mathbb{R}^{n \times m}$, $r$, $\gamma$ and $\lambda$.
- Output: $B \in \{-1, 1\}^{n \times r}$, $A \in \mathbb{R}^{d \times r}$.
- Initialize: $t = 0$; let $A_0$ be the eigenvectors of $X V \bar{X}^T$ and $D_0 = \operatorname{sgn}\left(\bar{X}^T A_0\right)$; calculate $M = \left(XX^T + \lambda I_d\right)^{-1} X$.
- Repeat:
  - Update $B_{t+1} = \operatorname{sgn}\left(V D_t + \gamma X^T A_t\right)$;
  - Update $D_{t+1} = \operatorname{sgn}\left(V^T B_{t+1}\right)$;
  - Update $C_{t+1}$ based on Theorem 1;
  - Update $A_{t+1} = M C_{t+1}$;
- Until convergence.

In Algorithm 1, when $\lambda = 0$, the objective value in each iteration is non-decreasing and bounded. Therefore, we have the following theorem:

**Theorem 2:** When $\lambda = 0$, the loop step in Algorithm 1 monotonically non-decreases the objective value in each iteration and finally converges to an optimum. (Proof in the supplemental material)

In practice, when $\lambda$ is small, Algorithm 1 can also converge to an optimum because each subproblem has a closed-form solution. For clarity, we show the convergence process with $\lambda = 0.01$ in Figure 1, which shows that the objective value is close to an optimum after 3 iterations and the mean average precision (MAP) becomes stable after 6 iterations.

Figure 1: Influence of the number of iterations on retrieval performance and training time (results of ADGH with 32-bit binary codes on the YouTube face database with 100 classes and 30000 training data; the number $m$ of selected anchors is 3000). (a) Objective vs. iteration # ($\lambda = 0.01$); (b) MAP vs. iteration # ($\lambda = 0.01$).
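The following is a rough NumPy sketch of Algorithm 1, kept close to the update rules above; the fixed iteration count, the eigen-decomposition details of the initialization, and all function names are our own simplifications rather than the authors' implementation.

```python
import numpy as np

def sgn(M):
    """Element-wise sign with sgn(0) = 1, matching the binary codes in {-1, +1}."""
    return np.where(M >= 0, 1.0, -1.0)

def solve_C(B):
    """C-subproblem (Eq. 9) via Theorem 1: C = sqrt(n) R Q^T, where
    B^T J B = Q Sigma^2 Q^T, R = J B Q Sigma^{-1}, and J = I - (1/n) 1 1^T."""
    n = B.shape[0]
    JB = B - B.mean(axis=0, keepdims=True)          # J B (column-centered B)
    evals, Q = np.linalg.eigh(JB.T @ JB)            # r x r, so this step is cheap
    evals = np.maximum(evals, 1e-12)                # numerical guard (rank(JB) = r assumed)
    R = JB @ Q / np.sqrt(evals)                     # R = J B Q Sigma^{-1}
    return np.sqrt(n) * R @ Q.T

def adgh(X, X_bar, V, r, gamma=0.1, lam=0.01, n_iter=6):
    """X: d x n zero-centered training data, X_bar: d x m selected anchors,
    V: n x m affinity matrix, r: code length. Returns codes B and projection A."""
    d, n = X.shape
    # Initialization: A0 from eigenvectors of X V X_bar^T, D0 = sgn(X_bar^T A0).
    evals, evecs = np.linalg.eig(X @ V @ X_bar.T)
    top = np.argsort(-evals.real)[:r]
    A = evecs[:, top].real
    D = sgn(X_bar.T @ A)
    M = np.linalg.solve(X @ X.T + lam * np.eye(d), X)   # (X X^T + lambda I_d)^{-1} X
    for _ in range(n_iter):                             # empirically ~6 iterations suffice
        B = sgn(V @ D + gamma * X.T @ A)                # B-subproblem (Eq. 7)
        D = sgn(V.T @ B)                                # D-subproblem (Eq. 8)
        C = solve_C(B)                                  # C-subproblem (Eq. 9 / Theorem 1)
        A = M @ C                                       # A-subproblem (Eq. 10)
    return B, A
```

Out-of-sample queries are then encoded as in Eq. (6), i.e. `sgn(A.T @ q)`.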
## Kernel Extension

Since kernel hashing methods, which can capture the non-linear manifold structure hidden in data, usually obtain better retrieval accuracy than linear hashing methods, we further kernelize our graph based hashing model. Let $\varphi: \mathbb{R}^d \to \mathcal{H}$ represent a kernel mapping from the original space to the kernel space, where $\mathcal{H}$ is a Reproducing Kernel Hilbert Space (RKHS) with a kernel function $\mathcal{K}(x, y) = \varphi(x)^T \varphi(y)$. To map the data $X$ into the kernel space, we randomly select $l$ data points from $X$ to construct a kernel matrix $K \in \mathbb{R}^{l \times n}$. After normalizing the kernel matrix $K$ to $\bar{K} \in \mathbb{R}^{l \times n}$ such that $\bar{K}\mathbf{1}_n = 0$, we kernelize the graph based hashing model:

$$\max_{B, D, A_k} \ \operatorname{Tr}\left(B^T V D\right) + \gamma \operatorname{Tr}\left(B^T \bar{K}^T A_k\right), \quad \text{s.t. } B \in \{-1, 1\}^{n \times r},\ D \in \{-1, 1\}^{m \times r},\ \left\| A_k^T \bar{K} \bar{K}^T A_k - n I_r \right\|_F^2 \le \epsilon, \tag{11}$$

where $A_k \in \mathbb{R}^{l \times r}$ is a projection matrix that maps the kernelized data into a low-dimensional space. Since Eq. (11) is similar to Eq. (5), it is easy to solve Eq. (11) using Algorithm 1 by replacing the data matrix $X$ with the kernel matrix $\bar{K}$. In this paper, the kernel version of ADGH is named KADGH.
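Below is a minimal sketch of producing the normalized kernel matrix $\bar{K}$ used in Eq. (11). The paper adopts the same kernel as KSH, so the Gaussian kernel and the bandwidth heuristic here are only one plausible concrete choice, and the function name is ours.

```python
import numpy as np

def normalized_kernel(X, l=1000, sigma=None, seed=0):
    """Build K in R^{l x n} from l randomly selected landmark points of X (d x n),
    then zero-center each row so that K_bar @ 1_n = 0, as required before Eq. (11)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    landmarks = X[:, rng.choice(n, size=l, replace=False)]          # d x l
    # Squared Euclidean distances between landmarks and all points (l x n).
    d2 = (np.sum(landmarks**2, axis=0)[:, None]
          + np.sum(X**2, axis=0)[None, :]
          - 2.0 * landmarks.T @ X)
    d2 = np.maximum(d2, 0.0)
    if sigma is None:
        sigma = np.sqrt(np.mean(d2))          # simple bandwidth heuristic (assumption)
    K = np.exp(-d2 / (2.0 * sigma**2))        # Gaussian kernel matrix, l x n
    return K - K.mean(axis=1, keepdims=True)  # row-centered: K_bar 1_n = 0

# KADGH then runs Algorithm 1 with K_bar (l x n) in place of X (d x n).
```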
## Time Complexity Analysis

In the initialization step of Algorithm 1 (ADGH), the time complexity of calculating the matrices $A_0$, $D_0$ and $M$ is $O(nmd)$, $O(mdr)$ and $O(nd^2)$, respectively. In the loop step, calculating the matrices $B_{t+1}$, $D_{t+1}$, $C_{t+1}$ and $A_{t+1}$ requires at most $\max\left(O(nmr), O(ndr)\right)$, $O(nmr)$, $O(nr^2)$ and $O(ndr)$ operations, respectively. Thus the time complexity of the loop step is $\max\left(O(knmr), O(kndr)\right)$, where $k$ is the number of iterations, empirically $k \le 6$. The total complexity of Algorithm 1 is $\max\left(O(nmd), O(nd^2)\right)$, since usually $kr < m$ and $kr < d$. Similarly, the time complexity of KADGH is $\max\left(O(nml), O(nl^2)\right)$. In the test stage, the time complexity of encoding one test sample is $O(dr)$ for ADGH and $O(ld + lr)$ for KADGH.

We evaluate the proposed algorithms ADGH and KADGH on three benchmark large-scale image databases: CIFAR-10 (Torralba, Fergus, and Freeman 2008), YouTube (Wolf, Hassner, and Maoz 2011) and ImageNet (Deng et al. 2009). The CIFAR-10 database is a labeled subset of the 80M tiny images collection, consisting of 60K color images from ten object categories. Every category contains 6K images, each of which is cropped and aligned to 32×32 pixels and then represented by a 512-dimensional GIST feature vector (Oliva and Torralba 2001). The YouTube face database consists of 1595 people, from which we randomly choose 100 individuals, each having at least 310 images, to form a subset for evaluation. Each face image is represented by a 1770-dimensional LBP feature vector (Ahonen, Hadid, and Pietikainen 2006). The ImageNet database contains more than 1.4 million labeled images. We adopt the ILSVRC-2012 dataset, containing 1.2 million images of 1000 object categories, from which 50 categories are selected to construct a subset, and a convolutional neural network (CNN) (LeCun, Bengio, and Hinton 2015) is used to extract a 4096-dimensional feature vector for each image.

## Experimental setting

In our experiments, we split the CIFAR-10 database into a training set (59K images) and a test query set (1K images), which consists of 10 categories with each containing 100 images. We also partition the selected subset of the YouTube face database into training and test query sets, where we randomly pick 300 and 10 images of each individual for training and testing, respectively. For the subset of the ImageNet database, we randomly choose 1000 and 20 images from each category for training and testing, respectively.

We compare ADGH and KADGH against four unsupervised methods, LSH (Indyk and Motwani 1998), SH (Weiss, Torralba, and Fergus 2009), AGH (Liu et al. 2011) and DGH (Liu et al. 2014), and five supervised methods, KSH (Liu et al. 2012), asymmetric Lin:V (Neyshabur et al. 2013), FastHash (Lin et al. 2014), SDH (Shen et al. 2015) and COSDISH (Kang, Li, and Zhou 2016). Apart from that, we also report the retrieval performance of supervised discrete graph hashing (SDGH), which preserves the symmetric discrete constraint to learn the binary codes of training data (the details of the model formulation and the optimization procedure are given in the supplemental material). We set the regularization parameter $\lambda = 0.01$ for ADGH, KADGH and SDGH, $\gamma = 0.1$ for ADGH, $\gamma = 0.01$ for KADGH, and search for the best $\gamma$ in $[0.01, 100]$ for SDGH. We choose the same kernel as KSH for KADGH in the experiments. To construct the affinity matrix $V$, we randomly select 10 percent of the samples of each class as anchors for ADGH and KADGH. This process is repeated 10 times and the average retrieval accuracy is reported. The affinity matrix is defined as:

$$v_{ij} = \begin{cases} \dfrac{1}{n_c}, & \text{if } x_i, x_{s_j} \in \text{the } c\text{-th class}, \\ 0, & \text{otherwise}, \end{cases} \tag{12}$$

where $n_c$ is the number of training data belonging to the $c$-th class, $x_i$ represents one training sample and $x_{s_j}$ is a selected anchor. For the kernel methods KADGH, KSH and SDH, we randomly select 1000 and 3000 samples as anchors to construct the kernels, respectively. Three standard criteria, precision, mean average precision (MAP) and precision-recall (PR) curves, are used to evaluate the hashing methods. In addition, we also report the training and test time of each hashing method for comparison. All experiments are conducted using Matlab on a 3.50GHz Intel Xeon CPU with 128GB memory.
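The sketch below shows one way the supervised affinity matrix $V$ of Eq. (12) can be built together with the per-class anchor sampling described above; the 10 percent sampling rate follows the experimental setting, while the function name and the exact sampling details are our own.

```python
import numpy as np

def build_affinity(labels, anchor_frac=0.1, seed=0):
    """labels: length-n integer class labels of the training data.
    Returns V (n x m) with v_ij = 1/n_c when x_i and anchor x_{s_j} share
    class c (Eq. 12), plus the indices of the selected anchors."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = labels.shape[0]

    anchor_idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        k = max(1, int(round(anchor_frac * members.size)))   # ~10% of each class
        anchor_idx.extend(rng.choice(members, size=k, replace=False))
    anchor_idx = np.array(anchor_idx)

    class_size = {c: np.sum(labels == c) for c in np.unique(labels)}
    V = np.zeros((n, anchor_idx.size))
    for j, a in enumerate(anchor_idx):
        c = labels[a]
        V[labels == c, j] = 1.0 / class_size[c]
    return V, anchor_idx
```

With `V, anchors = build_affinity(train_labels)`, the anchor data matrix $\bar{X}$ is simply `X[:, anchors]`, which then feeds Algorithm 1.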
Table 1: Hamming ranking performance (%) on the CIFAR-10, YouTube face and ImageNet databases ($r$ is the number of bits of each hashing method; the CIFAR-10, YouTube and ImageNet columns report MAP over the Top-5900, Top-300 and Top-1000 returned neighbors, respectively).

| Anchors | Method | CIFAR-10 r=8 | CIFAR-10 r=16 | CIFAR-10 r=32 | YouTube r=16 | YouTube r=32 | YouTube r=64 | ImageNet r=16 | ImageNet r=32 | ImageNet r=64 |
|---|---|---|---|---|---|---|---|---|---|---|
| - | LSH | 17.10 | 17.73 | 18.65 | 6.77 | 11.36 | 19.87 | 12.46 | 19.50 | 28.91 |
| - | SH | 17.04 | 17.10 | 16.41 | 22.36 | 29.60 | 35.52 | 29.02 | 32.44 | 36.53 |
| - | AGH 1 | 22.84 | 22.47 | 23.28 | 22.05 | 31.02 | 37.25 | 39.92 | 49.21 | 51.14 |
| - | AGH 2 | 22.45 | 21.72 | 22.39 | 16.37 | 26.70 | 32.81 | 34.26 | 44.14 | 49.17 |
| - | DGH | 23.62 | 21.89 | 22.21 | 14.40 | 30.20 | 38.12 | 28.36 | 40.54 | 45.64 |
| - | FastHash | 48.66 | 57.81 | 62.90 | 39.20 | 48.00 | 51.00 | 67.98 | 78.10 | 81.24 |
| - | COSDISH | 45.70 | 52.65 | 57.55 | 27.30 | 44.26 | 51.70 | 71.73 | 81.00 | 82.85 |
| - | Lin:V | 43.50 | 49.30 | 47.30 | 35.10 | 36.36 | 37.21 | 69.50 | 67.00 | 66.50 |
| - | SDGH | 47.00 | 55.17 | 55.90 | 35.56 | 45.93 | 51.24 | 70.90 | 81.40 | 83.80 |
| - | ADGH | 51.70 | 53.16 | 56.60 | 39.61 | 47.60 | 52.77 | 75.80 | 82.60 | 83.80 |
| 1000 | KSH | 37.46 | 43.92 | 47.65 | 36.74 | 45.55 | 48.97 | 52.67 | 62.68 | 66.84 |
| 1000 | SDH | 41.50 | 59.80 | 61.24 | 36.60 | 46.80 | 47.48 | 67.30 | 72.22 | 75.90 |
| 1000 | KADGH | 57.00 | 60.40 | 62.80 | 40.20 | 49.40 | 52.70 | 73.00 | 80.02 | 81.40 |
| 3000 | KSH | 42.89 | 49.09 | 52.11 | 34.46 | 44.05 | 51.09 | 53.83 | 63.23 | 67.52 |
| 3000 | SDH | 42.30 | 63.40 | 65.28 | 43.30 | 48.30 | 53.00 | 72.60 | 76.11 | 78.20 |
| 3000 | KADGH | 62.50 | 64.30 | 67.50 | 44.50 | 52.38 | 55.10 | 78.40 | 82.90 | 83.60 |

Table 2: Training and test time (in seconds) on the CIFAR-10, YouTube face and ImageNet databases (the kernels in KSH, SDH and KADGH are constructed with 3000 anchors; $r$ is the number of bits of each hashing method).

| Method | CIFAR-10 Train (r=8) | CIFAR-10 Train (r=32) | CIFAR-10 Test (r=32) | YouTube Train (r=16) | YouTube Train (r=64) | YouTube Test (r=64) | ImageNet Train (r=16) | ImageNet Train (r=64) | ImageNet Test (r=64) |
|---|---|---|---|---|---|---|---|---|---|
| LSH | 0.18 | 0.25 | 3.0e-6 | 0.28 | 0.36 | 1.4e-5 | 0.49 | 0.64 | 2.0e-5 |
| SH | 0.76 | 1.00 | 1.0e-5 | 1.43 | 3.06 | 4.2e-5 | 9.38 | 10.95 | 4.0e-5 |
| AGH 1 | 202.81 | 183.90 | 2.8e-4 | 28.51 | 31.67 | 3.3e-4 | 93.37 | 93.67 | 6.3e-4 |
| AGH 2 | 211.33 | 190.91 | 2.7e-4 | 30.03 | 32.57 | 3.5e-4 | 104.46 | 105.76 | 6.7e-4 |
| DGH | > 10000 | > 10000 | 3.1e-4 | 300.76 | 364.16 | 3.3e-4 | 525.19 | 633.40 | 7.0e-4 |
| FastHash | 279.31 | 975.63 | 4.4e-4 | 221.63 | 768.12 | 8.3e-4 | 447.05 | 1290.67 | 8.5e-4 |
| COSDISH | 3.47 | 26.77 | 6.4e-6 | 6.93 | 52.41 | 1.4e-5 | 26.78 | 98.89 | 2.0e-5 |
| Lin:V | > 10000 | > 60000 | 1.1e-4 | > 20000 | > 50000 | 9.0e-5 | > 30000 | > 90000 | 1.8e-5 |
| SDGH | 55.09 | 199.44 | 6.4e-6 | 46.88 | 98.15 | 1.4e-5 | 215.60 | 885.44 | 2.0e-5 |
| ADGH | 6.54 | 6.74 | 6.4e-6 | 7.16 | 8.01 | 1.4e-5 | 50.04 | 52.09 | 2.0e-5 |
| KSH | > 10000 | > 50000 | 1.1e-4 | > 10000 | > 50000 | 1.8e-4 | > 10000 | > 50000 | 3.3e-4 |
| SDH | 69.40 | 85.89 | 1.1e-4 | 29.36 | 101.66 | 2.0e-4 | 55.71 | 145.57 | 3.9e-4 |
| KADGH | 35.57 | 37.16 | 1.1e-4 | 19.20 | 20.05 | 1.8e-4 | 37.21 | 39.23 | 3.3e-4 |

## Experimental results

As shown in Table 1, ADGH exhibits better ranking performance than the other three linear supervised methods, SDGH, Lin:V and COSDISH, on the three databases in most cases, especially with short binary codes (8-bit, 16-bit and 16-bit codes on CIFAR-10, YouTube and ImageNet, respectively). Compared to the kernel methods SDH and KSH, KADGH exhibits better accuracy when 1000 and 3000 anchors are selected, and the gain in MAP ranges from 3.4% to 46% over the best competitor SDH on the three databases with different bits. Moreover, KADGH with 3000 anchors achieves higher MAP than the other hashing methods except ADGH. Compared to the unsupervised hashing methods LSH, SH, AGH 1, AGH 2 and DGH, the supervised hashing methods always obtain better ranking performance on the three databases.

Table 2 shows the training and test time of different hashing methods on the three databases. In contrast to ADGH, although COSDISH requires less training time with short codes (8-bit on CIFAR-10, 16-bit on YouTube and ImageNet), it spends much more training time with long codes (32-bit on CIFAR-10, 64-bit on YouTube and ImageNet). Compared to DGH, Lin:V and SDGH, ADGH consumes much less training time. Among the kernel methods, KADGH takes the least training time on all three databases, and requires a lower test time cost than SDH on the YouTube and ImageNet databases.

Figure 2: Precision and PR curves of different hashing methods on the three databases: (a) Precision vs. bit on CIFAR-10 (Top-5900 neighbors), (b) Precision vs. bit on YouTube (Top-300 neighbors), (c) Precision vs. bit on ImageNet (Top-1000 neighbors), (d) PR curve on CIFAR-10 (16-bit), (e) PR curve on YouTube (32-bit), (f) PR curve on ImageNet (32-bit).

To better illustrate the retrieval performance of different supervised hashing methods, we present precision with different bits and PR curves on the three databases in Figure 2, which indicates that KADGH outperforms the other hashing methods, and that ADGH has superior performance over SDGH and COSDISH, especially with short binary codes.

## Discussion

Experiments on all three benchmark databases demonstrate that the linear hashing method ADGH has superior retrieval performance over SDGH and COSDISH, and that its kernelization KADGH can further improve the retrieval accuracy and outperform the other hashing methods. We summarize and interpret our observations as follows:

- Compared to the unsupervised hashing methods LSH, SH, AGH 1, AGH 2 and DGH, the supervised hashing methods ADGH, SDGH, Lin:V, COSDISH, FastHash, KSH and SDH obtain better retrieval accuracy, because they utilize the semantic information to encode each image into binary codes.
- ADGH outperforms COSDISH, probably because COSDISH divides the training data into two parts to produce binary codes, thereby losing some semantic information. ADGH also takes less training time to produce relatively long binary codes than COSDISH, since the optimization procedure in COSDISH calculates the binary codes bit by bit.
- ADGH achieves better retrieval performance than SDGH with short binary codes (when the number of bits is smaller than the number of classes), probably because the asymmetric discrete constraint can better preserve the similarity of the affinity matrix than the symmetric discrete constraint.
- ADGH has superior retrieval accuracy (including MAP and precision) to Lin:V, probably because it can jointly learn the binary codes and the projection matrix. Moreover, ADGH consumes less training time than SDGH and Lin:V because of the asymmetric affinity matrix and its efficient optimization procedure.
- Compared to ADGH, KADGH obtains better retrieval performance because it can capture the non-linear manifold structure hidden in the data. KADGH outperforms KSH because it directly learns the binary codes. KADGH achieves better performance than SDH, probably because SDH aims to fit the binary codes to the class labels, which suggests that SDH achieves its best accuracy only when the length of the binary codes is much larger than the number of classes.

In this paper, we propose a novel graph based hashing method, asymmetric discrete graph hashing, which utilizes the semantic information to encode high-dimensional data into a set of compact binary codes. The proposed method preserves the asymmetric discrete constraint and builds an asymmetric affinity matrix to maintain the similarity of the original affinity matrix. Besides directly learning the binary codes of the training data, the method can simultaneously learn a low-dimensional projection matrix. Extensive experiments on three benchmark databases demonstrate that the proposed method outperforms the state-of-the-art methods with lower training time costs.

## References

Ahonen, T.; Hadid, A.; and Pietikainen, M. 2006. Face description with local binary patterns: Application to face recognition. TPAMI 28(12):2037-2041.
Candès, E. J.; Li, X.; Ma, Y.; and Wright, J. 2011. Robust principal component analysis? JACM 58(3):11.
Deng, J.; Dong, W.; Socher, R.; Li, L. J.; Li, K.; and Li, F. F. 2009. ImageNet: A large-scale hierarchical image database. In CVPR, 248-255.
Gong, Y.; Lazebnik, S.; Gordo, A.; and Perronnin, F. 2013. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. TPAMI 35(12):2916-2929.
Hou, C.; Nie, F.; Li, X.; Yi, D.; and Wu, Y. 2014. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. T Cybernetics 44(6):793-804.
Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM STOC, 604-613.
Joly, A., and Buisson, O. 2011. Random maximum margin hashing. In CVPR, 873-880.
Kang, W.; Li, W.; and Zhou, Z. 2016. Column sampling based discrete supervised hashing. In AAAI.
Kulis, B., and Darrell, T. 2009. Learning to hash with binary reconstructive embeddings. In NIPS, 1042-1050.
LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature 521(28):436-444.
Lin, G.; Shen, C.; Shi, Q.; Hengel, A.; and Suter, D. 2014. Fast supervised hashing with decision trees for high-dimensional data. In CVPR, 1963-1970.
Liu, W.; Wang, J.; Kumar, S.; and Chang, S. F. 2011. Hashing with graphs. In ICML, 1-8.
Liu, W.; Wang, J.; Ji, R.; Jiang, Y.; and Chang, S. 2012. Supervised hashing with kernels. In CVPR, 2074-2081.
Liu, W.; Mu, C.; Kumar, S.; and Chang, S. F. 2014. Discrete graph hashing. In NIPS, 3419-3427.
Neyshabur, B.; Srebro, N.; Salakhutdinov, R. R.; Makarychev, Y.; and Yadollahpour, P. 2013. The power of asymmetry in binary hashing. In NIPS, 2823-2831.
Norouzi, M., and Fleet, D. J. 2011. Minimal loss hashing for compact binary codes. In ICML.
Oliva, A., and Torralba, A. 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV 42(3):145-175.
Recht, B.; Re, C.; Tropp, J.; and Bittorf, V. 2012. Factoring nonnegative matrices with linear programs. In NIPS, 1214-1222.
Shen, F.; Shen, C.; Liu, W.; and Shen, H. T. 2015. Supervised discrete hashing. In CVPR, 37-45.
Shi, X.; Guo, Z.; Lai, Z.; Yang, Y.; Bao, Z.; and Zhang, D. 2015. A framework of joint graph embedding and sparse regression for dimensionality reduction. TIP 24(4):1341-1355.
Shi, X.; Guo, Z.; Nie, F.; Yang, L.; You, J.; and Tao, D. 2016a. Two-dimensional whitening reconstruction for enhancing robustness of principal component analysis. TPAMI 38(10):2130-2136.
Shi, X.; Xing, F.; Cai, J.; Zhang, Z.; Xie, Y.; and Yang, L. 2016b. Kernel-based supervised discrete hashing for image retrieval. In ECCV, 419-433.
Strecha, C.; Bronstein, A. M.; Bronstein, M. M.; and Fua, P. 2012. LDAHash: Improved matching with smaller descriptors. TPAMI 34(1):66-78.
Torralba, A.; Fergus, R.; and Freeman, W. T. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. TPAMI 30(11):1958-1970.
Wang, J.; Kumar, S.; and Chang, S. 2012. Semi-supervised hashing for large-scale search. TPAMI 34(12):2393-2406.
Weiss, Y.; Torralba, A.; and Fergus, R. 2009. Spectral hashing. In NIPS, 1753-1760.
Wolf, L.; Hassner, T.; and Maoz, I. 2011. Face recognition in unconstrained videos with matched background similarity. In CVPR, 529-534.
Wright, J.; Ganesh, A.; Rao, S.; Peng, Y.; and Ma, Y. 2009. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In NIPS, 2080-2088.
Yan, S.; Xu, D.; Zhang, B.; Zhang, H.; Yang, Q.; and Lin, S. 2007. Graph embedding and extensions: A general framework for dimensionality reduction. TPAMI 29(1):40-51.
Zhang, D.; Wang, J.; Cai, D.; and Lu, J. 2010. Self-taught hashing for fast similarity search. In ACM SIGIR, 18-25.
Zhou, T.; Bian, W.; and Tao, D. 2013. Divide-and-conquer anchoring for near-separable nonnegative matrix factorization and completion in high dimensions. In ICDM, 917-926.