# Efficient and Effective Incomplete Multi-View Clustering

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Xinwang Liu,1 Xinzhong Zhu,2,3 Miaomiao Li,1 Chang Tang,4 En Zhu,1 Jianping Yin,5 Wen Gao6

1 School of Computer Science, National University of Defense Technology, Changsha, China, 410073
2 College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang, China, 321004
3 Research Institute of Ningbo Cixing Co., Ltd, Ningbo, China, 315336
4 School of Computer Science, China University of Geosciences, Wuhan, China, 430074
5 Dongguan University of Technology, Guangdong, China
6 School of Electronics Engineering and Computer Science, Peking University, Beijing, China, 100871

## Abstract

Incomplete multi-view clustering (IMVC) optimally fuses multiple pre-specified incomplete views to improve clustering performance. Among various excellent solutions, the recently proposed multiple kernel k-means with incomplete kernels (MKKM-IK) forms a benchmark: it redefines IMVC as a joint optimization problem in which the clustering and kernel matrix imputation tasks are alternately performed until convergence. Although it demonstrates promising performance in various applications, we observe that the manner of kernel matrix imputation in MKKM-IK incurs intensive computational and storage complexity, over-complicated optimization, and only limited improvement in clustering performance. In this paper, we propose an Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC) algorithm to address these issues. Instead of completing the incomplete kernel matrices, EE-IMVC imputes each incomplete base clustering matrix generated from the incomplete views with a learned consensus clustering matrix. We carefully develop a three-step iterative algorithm with linear computational complexity to solve the resultant optimization problem and theoretically prove its convergence. Further, we conduct comprehensive experiments to study the proposed EE-IMVC in terms of clustering accuracy, running time, the evolution of the learned consensus clustering matrix, and convergence. As indicated, our algorithm significantly and consistently outperforms state-of-the-art algorithms with much less running time and memory.

## Introduction

Multi-view clustering (MVC) optimally integrates features from different views to improve clustering performance (Bickel and Scheffer 2004). It has been intensively studied and widely applied to various applications during the last few decades (Yu et al. 2012; Li, Jiang, and Zhou 2014; Du et al. 2015; Liu et al. 2016; Li et al. 2016; Liu et al. 2017b; Li et al. 2015; Cai, Nie, and Huang 2013; Tao, Liu, and Fu 2017; Liu et al. 2013; Zhang et al. 2015; Tao et al. 2017). All these MVC algorithms assume that all views of all samples are observable. However, in some practical applications (Kumar et al. 2013; Xiang et al. 2013), this assumption no longer holds due to the absence of partial views among samples. The violation of this assumption makes the aforementioned MVC algorithms inapplicable to incomplete multi-view clustering (IMVC) tasks. Many efforts have been devoted to addressing IMVC, and they can roughly be grouped into two categories.
In the first category, the incomplete views are first filled with an imputation algorithm such as zero filling, mean value filling, k-nearest-neighbor filling, expectation-maximization (EM) filling (Ghahramani and Jordan 1993), or more advanced schemes (Trivedi et al. 2010; Xu, Tao, and Xu 2015; Shao, He, and Yu 2015; Bhadra, Kaski, and Rousu 2016; Yin, Wu, and Wang 2015). A standard MVC algorithm is subsequently applied to the imputed views to perform the clustering task. Such algorithms are termed two-stage ones, since the imputation and clustering processes are carried out separately. Observing that the above-mentioned two-stage algorithms disconnect the processes of imputation and clustering, the other category, termed one-stage, unifies imputation and clustering into a single optimization procedure and instantiates a clustering-oriented algorithm termed multiple kernel k-means with incomplete kernels (MKKM-IK) (Liu et al. 2017a). Specifically, the clustering result at the last iteration guides the imputation of the absent kernel elements, and the latter is in turn used to conduct the subsequent clustering. In this way, the two procedures are seamlessly connected, with the aim of achieving better clustering performance.

Of the above-mentioned IMVC algorithms, the one-stage methods form a benchmark, where the incomplete views are optimized to best serve clustering. The main contribution of these methods is the unification of imputation and clustering, so that the imputation is meaningful and beneficial for clustering. It has been demonstrated that the one-stage methods achieve promising clustering performance in various applications (Liu et al. 2017a; Zhu et al. 2018), but they also suffer from the following non-ignorable drawbacks. i) High computational and storage complexities: their computational and storage complexities are $O(n^3)$ and $O(mn^2)$ per iteration, respectively, where $n$ and $m$ are the numbers of samples and views. This prevents them from being applied to large-scale clustering tasks. ii) Over-complicated imputation: existing one-stage methods directly impute multiple incomplete similarity matrices, in which the number of variables increases quadratically with the number of samples for each view. This makes the whole optimization over-complicated and considerably increases the risk of falling into a low-quality local minimum. iii) Limited improvement in clustering performance: note that a clustering result is determined by a whole similarity matrix in (Liu et al. 2017a). As a result, the imputation of an incomplete similarity matrix affects the clustering of all samples, no matter whether a sample is complete or not. When an imputation is not of high quality, it can adversely affect the clustering performance on all samples, especially those with complete views.

All of the above issues indicate that directly imputing the incomplete similarity matrices is problematic and that a more efficient and effective approach should be taken. In this paper, we propose Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC) to address these issues. EE-IMVC imputes each incomplete base clustering matrix, generated by performing clustering on each incomplete similarity matrix separately, instead of imputing the similarity matrix itself. These imputed base clustering matrices are then used to learn a consensus clustering matrix, which is in turn employed to impute each incomplete base clustering matrix.
These two steps are alternately performed until convergence. This idea is fulfilled by maximizing the alignment between the consensus clustering matrix and the adaptively weighted base clustering matrices with optimal permutations. We design a simple and computationally efficient algorithm that solves the resultant optimization problem with three singular value decompositions (SVDs) per iteration, analyze its computational and storage complexities, and theoretically prove its convergence. After that, we conduct comprehensive experiments on four benchmark datasets to study the properties of the proposed algorithm, including the clustering accuracy under various missing ratios, the running time with various numbers of samples, the evolution of the learned consensus matrix with iterations, and the objective value with iterations. As demonstrated, EE-IMVC significantly and consistently outperforms the state-of-the-art methods in terms of clustering accuracy with much less running time.

## Related Work

Let $\{\mathbf{x}_i\}_{i=1}^{n} \subseteq \mathcal{X}$ be a collection of $n$ samples, and let $\phi_p(\cdot): \mathbf{x} \in \mathcal{X} \mapsto \mathcal{H}_p$ be the $p$-th feature mapping that maps $\mathbf{x}$ onto a reproducing kernel Hilbert space $\mathcal{H}_p$ ($1 \le p \le m$). In the multiple kernel setting, each sample is represented as $\phi_{\boldsymbol{\beta}}(\mathbf{x}) = [\beta_1\phi_1(\mathbf{x})^{\top}, \dots, \beta_m\phi_m(\mathbf{x})^{\top}]^{\top}$, where $\boldsymbol{\beta} = [\beta_1, \dots, \beta_m]^{\top}$ consists of the coefficients of the $m$ base kernels $\{\kappa_p(\cdot,\cdot)\}_{p=1}^{m}$. These coefficients are optimized during learning. Based on the definition of $\phi_{\boldsymbol{\beta}}(\mathbf{x})$, a kernel function can be expressed as
$$\kappa_{\boldsymbol{\beta}}(\mathbf{x}_i, \mathbf{x}_j) = \phi_{\boldsymbol{\beta}}(\mathbf{x}_i)^{\top}\phi_{\boldsymbol{\beta}}(\mathbf{x}_j) = \sum\nolimits_{p=1}^{m}\beta_p^2\,\kappa_p(\mathbf{x}_i, \mathbf{x}_j).$$
A kernel matrix $\mathbf{K}_{\boldsymbol{\beta}}$ is then calculated by applying the kernel function $\kappa_{\boldsymbol{\beta}}(\cdot,\cdot)$ to $\{\mathbf{x}_i\}_{i=1}^{n}$. Based on the kernel matrix $\mathbf{K}_{\boldsymbol{\beta}} = \sum_{p=1}^{m}\beta_p^2\mathbf{K}_p$, the objective of MKKM can be written as
$$\min_{\mathbf{H},\, \boldsymbol{\beta}}\; \mathrm{Tr}\big(\mathbf{K}_{\boldsymbol{\beta}}(\mathbf{I}_n - \mathbf{H}\mathbf{H}^{\top})\big) \quad \text{s.t.}\;\; \mathbf{H} \in \mathbb{R}^{n\times k},\; \mathbf{H}^{\top}\mathbf{H} = \mathbf{I}_k,\; \boldsymbol{\beta}^{\top}\mathbf{1}_m = 1,\; \beta_p \ge 0,\; \forall p, \tag{1}$$
where $\mathbf{I}_k$ is an identity matrix of size $k$ and $k$ is the number of clusters. The optimization problem in Eq. (1) can be solved by alternately updating $\mathbf{H}$ and $\boldsymbol{\beta}$: $\mathbf{H}$ is updated with $\boldsymbol{\beta}$ given, and $\boldsymbol{\beta}$ is then optimized with the updated $\mathbf{H}$. These two steps are alternately performed until convergence.

The recently proposed MKKM-IK (Liu et al. 2017a) extends MKKM in Eq. (1) to handle multiple kernel clustering with incomplete kernels. It unifies the imputation and clustering procedures into a single optimization objective and alternately optimizes each of them, that is, i) imputing the absent kernel entries under the guidance of clustering, and ii) updating the clustering with the imputed kernels. The above idea is mathematically fulfilled as
$$\min_{\mathbf{H},\, \boldsymbol{\beta},\, \{\mathbf{K}_p\}_{p=1}^{m}}\; \mathrm{Tr}\big(\mathbf{K}_{\boldsymbol{\beta}}(\mathbf{I}_n - \mathbf{H}\mathbf{H}^{\top})\big) \quad \text{s.t.}\;\; \mathbf{H} \in \mathbb{R}^{n\times k},\; \mathbf{H}^{\top}\mathbf{H} = \mathbf{I}_k,\; \boldsymbol{\beta}^{\top}\mathbf{1}_m = 1,\; \beta_p \ge 0,\; \mathbf{K}_p(s_p, s_p) = \mathbf{K}_p^{(cc)},\; \mathbf{K}_p \succeq 0,\; \forall p, \tag{2}$$
where $s_p$ ($1 \le p \le m$) denotes the sample indices for which the $p$-th view is present and $\mathbf{K}_p^{(cc)}$ denotes the kernel sub-matrix computed with these samples. The constraint $\mathbf{K}_p(s_p, s_p) = \mathbf{K}_p^{(cc)}$ is imposed to ensure that $\mathbf{K}_p$ maintains the known entries during the course of optimization. Different from the optimization in MKKM, (Liu et al. 2017a) incorporates an extra step to impute the missing entries of the base kernels, leading to a three-step alternate optimization algorithm. Interested readers are referred to (Liu et al. 2017a).
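To make Eq. (1) concrete, the following is a minimal NumPy sketch of the kernel combination and the $\mathbf{H}$-step of MKKM's alternation; the function names and data layout are illustrative only, not a released implementation. Given $\boldsymbol{\beta}$, the minimizer of $\mathrm{Tr}(\mathbf{K}_{\boldsymbol{\beta}}(\mathbf{I}_n - \mathbf{H}\mathbf{H}^{\top}))$ over orthonormal $\mathbf{H}$ is spanned by the top-$k$ eigenvectors of $\mathbf{K}_{\boldsymbol{\beta}}$; the subsequent $\boldsymbol{\beta}$-step, solved with $\mathbf{H}$ fixed, is omitted here.

```python
import numpy as np

def combine_kernels(K_list, beta):
    """Kernel combination used in Eq. (1): K_beta = sum_p beta_p^2 * K_p."""
    return sum((b ** 2) * K for b, K in zip(beta, K_list))

def mkkm_h_step(K_beta, k):
    """Given beta, min_H Tr(K_beta (I - H H^T)) s.t. H^T H = I_k is solved by the
    eigenvectors of K_beta associated with its k largest eigenvalues."""
    _, vecs = np.linalg.eigh(K_beta)   # eigenvalues returned in ascending order
    return vecs[:, -k:]
```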
Although MKKM-IK demonstrates excellent clustering performance in handling incomplete multi-view clustering tasks (Liu et al. 2017a), it also suffers from the following non-ignorable drawbacks. Firstly, from the above optimization procedure, we observe that its computational complexity is $O(n^3 + \sum_{p=1}^{m} n_p^3 + m^3)$ per iteration, where $n$, $n_p$ ($n_p \le n$) and $m$ are the numbers of all samples, observed samples of the $p$-th view, and views, respectively. During the learning procedure, it needs to store $m$ base kernel matrices of size $n$, so its storage complexity is $O(mn^2)$. These relatively high computational and storage complexities preclude it from being applied to large-scale clustering tasks. Furthermore, as seen from Eq. (2), there are $\frac{1}{2}(n - n_p)(n + n_p + 1)$ elements to be imputed for the $p$-th incomplete base kernel matrix $\mathbf{K}_p$ ($1 \le p \le m$). This unnecessarily increases the complexity of the optimization and the risk of being trapped in a local minimum, adversely affecting the clustering performance. In addition, note that a clustering result is determined by a whole similarity matrix in (Liu et al. 2017a). As a result, the imputation of an incomplete similarity matrix affects the clustering of all samples, no matter whether a sample is complete or not. This improperly amplifies the influence of imputation on all samples, especially those with complete views.

Consequently, instead of imputing the incomplete similarity matrices $\{\mathbf{K}_p\}_{p=1}^{m}$, we propose to impute the incomplete base clustering matrices to address the aforementioned issues. Moreover, we argue that this imputation is more natural and reasonable since all of them reside in a common clustering partition space, which produces better imputation and finally improves the clustering.

## Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC)

In this section, we propose Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC), which performs clustering and imputes the incomplete base clustering matrices simultaneously. We first define the $p$-th ($1 \le p \le m$) base clustering matrix as
$$\mathbf{H}_p = \big[\mathbf{H}_p^{(o)\top},\, \mathbf{H}_p^{(u)\top}\big]^{\top} \in \mathbb{R}^{n\times k}, \tag{3}$$
where $\mathbf{H}_p^{(o)} \in \mathbb{R}^{n_p\times k}$ can be obtained by solving kernel k-means with the observed kernel sub-matrices $\{\mathbf{K}_p(s_p, s_p)\}_{p=1}^{m}$, while $\mathbf{H}_p^{(u)} \in \mathbb{R}^{(n-n_p)\times k}$ denotes the incomplete part of $\mathbf{H}_p$ that needs to be filled. Note that other similarity-based clustering algorithms such as spectral clustering can also be used to generate $\{\mathbf{H}_p^{(o)}\}_{p=1}^{m}$.

According to the above discussion, EE-IMVC simultaneously performs clustering and the imputation of $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ while keeping $\{\mathbf{H}_p^{(o)}\}_{p=1}^{m}$ unchanged during the learning course. Specifically, it first optimizes a consensus clustering matrix $\mathbf{H}$ from the imputed $\{\mathbf{H}_p\}_{p=1}^{m}$, and then fills $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ with $\mathbf{H}$. These two learning processes are seamlessly integrated, so that they can coordinate with each other to achieve optimal clustering. The above idea is fulfilled as follows,
$$\max_{\mathbf{H},\, \{\mathbf{W}_p,\, \mathbf{H}_p^{(u)},\, \beta_p\}_{p=1}^{m}}\; \sum\nolimits_{p=1}^{m} \beta_p\, \mathrm{Tr}\Big(\mathbf{H}^{\top}\big[\mathbf{H}_p^{(o)\top},\, \mathbf{H}_p^{(u)\top}\big]^{\top}\mathbf{W}_p\Big)$$
$$\text{s.t.}\;\; \mathbf{H} \in \mathbb{R}^{n\times k},\; \mathbf{H}^{\top}\mathbf{H} = \mathbf{I}_k,\; \mathbf{W}_p \in \mathbb{R}^{k\times k},\; \mathbf{W}_p^{\top}\mathbf{W}_p = \mathbf{I}_k,\; \mathbf{H}_p^{(u)} \in \mathbb{R}^{(n-n_p)\times k},\; \mathbf{H}_p^{(u)\top}\mathbf{H}_p^{(u)} = \mathbf{I}_k,\; \sum\nolimits_{p=1}^{m}\beta_p^2 = 1,\; \beta_p \ge 0, \tag{4}$$
where $\mathbf{H}$ and $\mathbf{H}_p^{(u)}$ are the consensus clustering matrix and the missing part of the $p$-th base clustering matrix, respectively, $\mathbf{W}_p$ is the $p$-th permutation matrix that optimally matches $\mathbf{H}_p$ and $\mathbf{H}$, and $\boldsymbol{\beta} = [\beta_1, \dots, \beta_m]^{\top}$ collects the adaptive weights of the $m$ base clustering matrices. Note that orthogonality constraints are imposed on $\mathbf{H}$ and $\mathbf{H}_p^{(u)}$ since they are clustering matrices. We also put an orthogonality constraint on $\mathbf{W}_p$ because it is a permutation matrix.
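For illustration, the following minimal sketch shows how a base clustering matrix in Eq. (3) can be assembled from its observed and imputed parts and how the objective of Eq. (4) can be evaluated; the variable names and the convention of storing each $\mathbf{H}_p$ in the original sample order (with the rows indexed by $s_p$ fixed to $\mathbf{H}_p^{(o)}$) are our own illustrative choices.

```python
import numpy as np

def assemble_base_matrix(H_obs, H_miss, s, miss, n, k):
    """Build H_p of Eq. (3): observed rows (kept fixed) plus imputed rows."""
    Hp = np.zeros((n, k))
    Hp[s, :] = H_obs       # H_p^(o), never changed during learning
    Hp[miss, :] = H_miss   # H_p^(u), the part to be imputed
    return Hp

def eeimvc_objective(H, Hp_list, W_list, beta):
    """Objective of Eq. (4): sum_p beta_p * Tr(H^T H_p W_p)."""
    return sum(b * np.trace(H.T @ Hp @ Wp)
               for b, Hp, Wp in zip(beta, Hp_list, W_list))
```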
Compared with MKKM-IK (Liu et al. 2017a), the objective function of EE-IMVC in Eq. (4) has the following nice properties. (1) Fewer imputation variables: the number of elements to be filled for the $p$-th view is $(n - n_p)\,k$, which is much smaller than the $\frac{1}{2}(n - n_p)(n + n_p + 1)$ required by MKKM-IK. This dramatically simplifies the model and enhances the robustness of its optimization. (2) Less vulnerability to low-quality imputation: in EE-IMVC, the clustering of samples with complete views is not affected by the imputation, since their observed parts are kept unchanged during the learning course. This is not the case for MKKM-IK, which needs to fill all incomplete elements and conduct an eigendecomposition of the whole imputed similarity matrix for clustering. This helps make the proposed model more robust throughout the optimization. (3) More reasonable imputation: EE-IMVC utilizes $\mathbf{H}$ to complete $\mathbf{H}_p^{(u)}$ rather than the incomplete base kernel matrices as in (Liu et al. 2017a), which is more reasonable since both $\mathbf{H}$ and $\mathbf{H}_p^{(u)}$ reside in the clustering partition space. Besides, our algorithm is parameter-free once the number of clusters to form is specified. These advantages significantly boost the clustering performance, as demonstrated in the experimental part.

### Alternate Optimization

Jointly optimizing $\mathbf{H}$, $\{\mathbf{H}_p^{(u)}, \mathbf{W}_p\}_{p=1}^{m}$ and $\boldsymbol{\beta}$ in Eq. (4) is difficult. In the following, we design a simple and computationally efficient algorithm to solve it alternately.

**Solving $\mathbf{H}$ with fixed $\{\mathbf{W}_p, \mathbf{H}_p^{(u)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}$.** Given $\{\mathbf{W}_p, \mathbf{H}_p^{(u)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}$, the optimization w.r.t. $\mathbf{H}$ in Eq. (4) is equivalent to
$$\max_{\mathbf{H}}\; \mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{T}\big) \quad \text{s.t.}\;\; \mathbf{H} \in \mathbb{R}^{n\times k},\; \mathbf{H}^{\top}\mathbf{H} = \mathbf{I}_k, \tag{5}$$
where $\mathbf{T} = \sum_{p=1}^{m}\beta_p\,\mathbf{H}_p\mathbf{W}_p$. This is a singular value decomposition (SVD) problem and can be efficiently solved with computational complexity $O(nk^2)$.

**Solving $\{\mathbf{W}_p\}_{p=1}^{m}$ with fixed $\mathbf{H}$, $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}$.** Given $\mathbf{H}$, $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}$, the optimization w.r.t. the permutation matrix $\mathbf{W}_p$ in Eq. (4) equivalently reduces to
$$\max_{\mathbf{W}_p}\; \mathrm{Tr}\big(\mathbf{W}_p^{\top}\mathbf{Q}_p\big) \quad \text{s.t.}\;\; \mathbf{W}_p \in \mathbb{R}^{k\times k},\; \mathbf{W}_p^{\top}\mathbf{W}_p = \mathbf{I}_k, \tag{6}$$
where $\mathbf{Q}_p = \mathbf{H}_p^{\top}\mathbf{H}$. Again, this is an SVD problem with computational complexity $O(k^3)$.

**Solving $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ with fixed $\{\mathbf{W}_p\}_{p=1}^{m}$, $\mathbf{H}$ and $\boldsymbol{\beta}$.** Given $\mathbf{H}$, $\{\mathbf{W}_p\}_{p=1}^{m}$ and $\boldsymbol{\beta}$, the optimization w.r.t. $\mathbf{H}_p^{(u)}$ in Eq. (4) is equivalent to
$$\max_{\mathbf{H}_p^{(u)}}\; \mathrm{Tr}\big(\mathbf{H}_p^{(u)\top}\mathbf{U}_p\big) \quad \text{s.t.}\;\; \mathbf{H}_p^{(u)} \in \mathbb{R}^{(n-n_p)\times k},\; \mathbf{H}_p^{(u)\top}\mathbf{H}_p^{(u)} = \mathbf{I}_k, \tag{7}$$
where $\mathbf{U}_p = \mathbf{H}(\hat{s}_p, :)\,\mathbf{W}_p^{\top}$ and $\hat{s}_p$ denotes the sample indices for which the $p$-th view is missing. Once again, this is an SVD problem and can be efficiently solved with computational complexity $O((n-n_p)k^2)$.

**Solving $\boldsymbol{\beta}$ with fixed $\mathbf{H}$ and $\{\mathbf{W}_p, \mathbf{H}_p^{(u)}\}_{p=1}^{m}$.** Given $\mathbf{H}$ and $\{\mathbf{W}_p, \mathbf{H}_p^{(u)}\}_{p=1}^{m}$, the optimization w.r.t. $\boldsymbol{\beta}$ in Eq. (4) is equivalent to
$$\max_{\boldsymbol{\beta}}\; \boldsymbol{\nu}^{\top}\boldsymbol{\beta} \quad \text{s.t.}\;\; \boldsymbol{\beta} \in \mathbb{R}^{m},\; \sum\nolimits_{p=1}^{m}\beta_p^2 = 1,\; \beta_p \ge 0, \tag{8}$$
where $\boldsymbol{\nu} = [\nu_1, \nu_2, \dots, \nu_m]^{\top}$ with $\nu_p = \mathrm{Tr}(\mathbf{H}^{\top}\mathbf{H}_p\mathbf{W}_p)$. The optimization in Eq. (8) has an analytical solution if $\nu_p \ge 0$ ($1 \le p \le m$). The following Theorem 1 shows that the optimal weights of the base clustering matrices can be obtained analytically.

**Theorem 1** The optimal solution of Eq. (8) is $\boldsymbol{\beta}^{\star} = \boldsymbol{\nu}/\|\boldsymbol{\nu}\|$.

**Proof 1** Let $(\mathbf{H}^{(t)}, \{\mathbf{H}_p^{(t)}, \mathbf{W}_p^{(t)}\}_{p=1}^{m})$ be the solution at the $t$-th iteration. We have
$$\nu_p^{(t)} = \mathrm{Tr}\big((\mathbf{H}^{(t)})^{\top}\mathbf{H}_p^{(t)}\mathbf{W}_p^{(t)}\big) = \max_{\mathbf{H}_p^{(u)}} \mathrm{Tr}\big((\mathbf{H}^{(t)})^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}]^{\top}\mathbf{W}_p^{(t)}\big) \ge \max_{\mathbf{W}_p} \mathrm{Tr}\big((\mathbf{H}^{(t)})^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)(t-1)\top}]^{\top}\mathbf{W}_p\big) > 0,\;\; \forall p.$$
The proof is completed by taking the derivative of the Lagrangian function of Eq. (8) with respect to $\beta_p$ and setting it to zero.
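Each of Eqs. (5)-(7) is an orthogonal Procrustes-type problem whose maximizer is obtained from a thin SVD, and Eq. (8) is solved in closed form by Theorem 1. The following is a minimal sketch of these four updates in NumPy; the function names are illustrative, and it assumes each $\mathbf{H}_p$ is stored as a full $n \times k$ matrix in the original sample order, as in the previous sketch.

```python
import numpy as np

def procrustes_max(M):
    """argmax_X Tr(X^T M) s.t. X^T X = I, given by X = U V^T with M = U S V^T (thin SVD)."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def update_H(Hp_list, W_list, beta):                      # Eq. (5)
    T = sum(b * Hp @ Wp for b, Hp, Wp in zip(beta, Hp_list, W_list))
    return procrustes_max(T)

def update_W(H, Hp_list):                                  # Eq. (6), Q_p = H_p^T H
    return [procrustes_max(Hp.T @ H) for Hp in Hp_list]

def update_H_missing(H, W_list, miss_list):                # Eq. (7), U_p = H(s_hat_p, :) W_p^T
    return [procrustes_max(H[miss, :] @ Wp.T)
            for Wp, miss in zip(W_list, miss_list)]

def update_beta(H, Hp_list, W_list):                       # Eq. (8), Theorem 1
    nu = np.array([np.trace(H.T @ Hp @ Wp) for Hp, Wp in zip(Hp_list, W_list)])
    return nu / np.linalg.norm(nu)
```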
**Algorithm 1** The Proposed EE-IMVC

1. **Input:** $\{\mathbf{H}_p^{(o)}, s_p\}_{p=1}^{m}$, $k$ and $\epsilon_0$.
2. **Output:** $\mathbf{H}$.
3. Initialize $\mathbf{W}_p^{(0)} = \mathbf{I}_k$, $\mathbf{H}_p^{(u)(0)} = \mathbf{0}$, $\boldsymbol{\beta}^{(0)} = \mathbf{1}_m/\sqrt{m}$ and $t = 1$.
4. **repeat**
5. Update $\mathbf{H}^{(t)}$ by solving Eq. (5) with $\{\mathbf{W}_p^{(t-1)}, \mathbf{H}_p^{(u)(t-1)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}^{(t-1)}$.
6. Update $\{\mathbf{W}_p^{(t)}\}_{p=1}^{m}$ with $\mathbf{H}^{(t)}$, $\{\mathbf{H}_p^{(u)(t-1)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}^{(t-1)}$ by Eq. (6).
7. Update $\{\mathbf{H}_p^{(u)(t)}\}_{p=1}^{m}$ with $\mathbf{H}^{(t)}$, $\{\mathbf{W}_p^{(t)}\}_{p=1}^{m}$ and $\boldsymbol{\beta}^{(t-1)}$ by Eq. (7).
8. Update $\boldsymbol{\beta}^{(t)}$ with $\mathbf{H}^{(t)}$, $\{\mathbf{W}_p^{(t)}\}_{p=1}^{m}$ and $\{\mathbf{H}_p^{(u)(t)}\}_{p=1}^{m}$ by Eq. (8).
9. $t = t + 1$.
10. **until** $\big|\mathrm{obj}^{(t)} - \mathrm{obj}^{(t-1)}\big|/\mathrm{obj}^{(t-1)} \le \epsilon_0$

In sum, our algorithm for solving Eq. (4) is outlined in Algorithm 1, where $\mathrm{obj}^{(t)}$ denotes the objective value at the $t$-th iteration. The following Theorem 2 shows that Algorithm 1 is guaranteed to converge to a local optimum.

**Theorem 2** Algorithm 1 is guaranteed to converge to a local optimum.

**Proof 2** Note that for all $p$,
$$\mathrm{Tr}\big(\mathbf{H}^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}]^{\top}\mathbf{W}_p\big) \le \tfrac{1}{2}\Big[\mathrm{Tr}(\mathbf{H}^{\top}\mathbf{H}) + \mathrm{Tr}\big(\mathbf{W}_p^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}][\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}]^{\top}\mathbf{W}_p\big)\Big] = \tfrac{1}{2}\Big[2k + \mathrm{Tr}\big(\mathbf{W}_p^{\top}\mathbf{H}_p^{(o)\top}\mathbf{H}_p^{(o)}\mathbf{W}_p\big)\Big].$$
Note that the maximum of $\mathrm{Tr}(\mathbf{W}_p^{\top}\mathbf{H}_p^{(o)\top}\mathbf{H}_p^{(o)}\mathbf{W}_p)$ under the constraint $\mathbf{W}_p^{\top}\mathbf{W}_p = \mathbf{I}_k$ is $\sum_{j=1}^{k}\lambda_p^{j}$, where $\{\lambda_p^{j}\}_{j=1}^{k}$ are the $k$ eigenvalues of $\mathbf{H}_p^{(o)\top}\mathbf{H}_p^{(o)}$. We therefore have
$$\mathrm{Tr}\big(\mathbf{H}^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}]^{\top}\mathbf{W}_p\big) \le \tfrac{1}{2}\Big[2k + \sum\nolimits_{j=1}^{k}\lambda_p^{j}\Big] \triangleq a_p.$$
Correspondingly, $\sum_{p=1}^{m}\beta_p\,\mathrm{Tr}\big(\mathbf{H}^{\top}[\mathbf{H}_p^{(o)\top}, \mathbf{H}_p^{(u)\top}]^{\top}\mathbf{W}_p\big) \le \sum_{p=1}^{m}\beta_p a_p$, which is upper-bounded by $\sum_{p=1}^{m}a_p$ due to the $\ell_2$-norm constraint on $\boldsymbol{\beta}$. Meanwhile, the objective of Algorithm 1 is monotonically non-decreasing when optimizing one variable with the others fixed at each iteration. As a result, our algorithm is guaranteed to converge to a local maximum.

### Discussion and Extension

We end this section by analyzing the computational and storage complexities, the initialization of $\{\mathbf{H}_p^{(u)}, \mathbf{W}_p\}_{p=1}^{m}$, and potential extensions.

**Computational complexity.** As seen from Algorithm 1, the computational complexity of EE-IMVC is $O\big(nk^2 + m(k^3 + (n - n_p)k^2)\big)$ per iteration, where $n$, $m$ and $k$ are the numbers of samples, views and clusters, respectively. Therefore, EE-IMVC has a computational complexity that is linear in the number of samples, which enables it to handle large-scale clustering tasks more efficiently than MKKM-IK (Liu et al. 2017a).

**Storage complexity.** During the learning procedure, EE-IMVC needs to store $\mathbf{H}$ and $\{\mathbf{H}_p, \mathbf{W}_p\}_{p=1}^{m}$. Its storage complexity is $O(nk + mnk + mk^2)$, which is much less than the $O(mn^2)$ of MKKM-IK since $n \gg k$ in practice.

**Initialization of $\{\mathbf{H}_p^{(u)}, \mathbf{W}_p\}_{p=1}^{m}$.** In our current implementation, we simply initialize $\{\mathbf{H}_p^{(u)}\}_{p=1}^{m}$ as zeros and $\{\mathbf{W}_p\}_{p=1}^{m}$ as identity matrices. This initialization already demonstrates the superior clustering performance of EE-IMVC in our experiments. Further exploring other initializations and studying their influence on the clustering performance is an interesting direction for future work.

**Extensions.** EE-IMVC can be extended in the following directions. Firstly, EE-IMVC could be further improved by sufficiently considering the correlation among $\{\mathbf{H}_p\}_{p=1}^{m}$. For example, we may model this correlation with criteria such as the Kullback-Leibler (KL) divergence (Kato and Rivero 2017) or the Hilbert-Schmidt independence criterion (HSIC), to name just a few. This prior knowledge could provide a good regularization for mutual base clustering matrix completion and would be helpful to improve the clustering performance. Secondly, the way of generating $\{\mathbf{H}_p^{(o)}\}_{p=1}^{m}$ is readily extendable to other similarity-based clustering algorithms, such as spectral clustering (von Luxburg 2007), which could further improve the clustering performance. Last but not least, the idea of joint imputation and clustering is natural enough to be generalized to other learning tasks, such as learning with missing features.
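For completeness, the following is a minimal sketch of the outer loop of Algorithm 1, reusing the helper functions from the sketches above; the assembly of each $\mathbf{H}_p$, the initialization, and the stopping rule follow Algorithm 1, but all code details (names and defaults such as `max_iter`) are our own illustration.

```python
import numpy as np

def ee_imvc(H_obs_list, s_list, n, k, eps0=1e-6, max_iter=100):
    """Sketch of Algorithm 1. H_obs_list[p]: n_p x k observed part H_p^(o); s_list[p]: observed indices."""
    m = len(H_obs_list)
    miss_list = [np.setdiff1d(np.arange(n), s) for s in s_list]
    # Assemble base clustering matrices in the original sample order; missing rows start at zero.
    Hp_list = [assemble_base_matrix(Ho, np.zeros((len(miss), k)), s, miss, n, k)
               for Ho, s, miss in zip(H_obs_list, s_list, miss_list)]
    W_list = [np.eye(k) for _ in range(m)]
    beta = np.ones(m) / np.sqrt(m)
    obj_prev = None
    for _ in range(max_iter):
        H = update_H(Hp_list, W_list, beta)                      # Eq. (5)
        W_list = update_W(H, Hp_list)                            # Eq. (6)
        H_miss = update_H_missing(H, W_list, miss_list)          # Eq. (7)
        for Hp, miss, Hu in zip(Hp_list, miss_list, H_miss):
            Hp[miss, :] = Hu                                     # impute; observed rows stay fixed
        beta = update_beta(H, Hp_list, W_list)                   # Eq. (8)
        obj = eeimvc_objective(H, Hp_list, W_list, beta)
        if obj_prev is not None and abs(obj - obj_prev) / obj_prev <= eps0:
            break
        obj_prev = obj
    return H
```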
## Experiments

### Experimental settings

The proposed algorithm is experimentally evaluated on four widely used multiple kernel benchmark datasets, shown in Table 2: Oxford Flower17 and Flower102 (http://www.robots.ox.ac.uk/~vgg/data/flowers/), Caltech102 (http://files.is.tue.mpg.de/pgehler/projects/iccv09/) and Columbia Consumer Video (CCV) (http://www.ee.columbia.edu/ln/dvmm/CCV/). For these datasets, all kernel matrices are pre-computed and can be publicly downloaded from the above websites. Their number of samples varies from over one thousand to over eight thousand, and the number of views from four to 48.

Table 2: Datasets used in our experiments.

| Dataset | #Samples | #Kernels | #Classes |
|---|---|---|---|
| Flower17 | 1360 | 7 | 17 |
| Flower102 | 8189 | 4 | 102 |
| CCV | 6773 | 6 | 20 |
| Caltech102 | 3060 | 48 | 102 |

Figure 1: ACC, NMI, Purity and Rand Index comparison with the variation of missing ratios on four benchmark datasets.

Table 1: Aggregated ACC, NMI, purity and rand index comparison (mean ± std) of different clustering algorithms on the benchmark datasets.

| Metric | Dataset | MKKM+ZF | MKKM+MF | MKKM+KNN | MKKM+AF (Trivedi et al. 2010) | MKKM-IK (Liu et al. 2017a) | LI-MKKM (Zhu et al. 2018) | EE-IMVC (Proposed) |
|---|---|---|---|---|---|---|---|---|
| ACC | Flower17 | 37.0 ± 0.7 | 36.8 ± 0.6 | 37.8 ± 0.3 | 40.8 ± 0.3 | 44.7 ± 0.3 | 47.9 ± 0.3 | **52.9 ± 0.7** |
| ACC | Flower102 | 18.0 ± 0.2 | 18.0 ± 0.2 | 18.3 ± 0.1 | 18.5 ± 0.1 | 21.5 ± 0.2 | 23.1 ± 0.2 | **36.4 ± 0.2** |
| ACC | CCV | 16.2 ± 0.1 | 16.4 ± 0.1 | 16.6 ± 0.2 | 17.4 ± 0.1 | 18.1 ± 0.2 | 18.7 ± 0.3 | **23.4 ± 0.4** |
| ACC | Caltech102 | 14.0 ± 0.1 | 14.0 ± 0.1 | 15.4 ± 0.2 | 15.8 ± 0.1 | 17.3 ± 0.2 | 22.1 ± 0.2 | **31.9 ± 0.2** |
| NMI | Flower17 | 37.3 ± 0.5 | 37.3 ± 0.4 | 38.4 ± 0.2 | 40.3 ± 0.3 | 43.7 ± 0.3 | 46.3 ± 0.2 | **51.4 ± 0.6** |
| NMI | Flower102 | 37.4 ± 0.1 | 37.4 ± 0.1 | 37.8 ± 0.1 | 37.7 ± 0.1 | 39.6 ± 0.1 | 41.7 ± 0.1 | **50.8 ± 0.1** |
| NMI | CCV | 12.5 ± 0.1 | 12.7 ± 0.1 | 12.9 ± 0.1 | 13.3 ± 0.1 | 13.8 ± 0.2 | 15.1 ± 0.2 | **18.2 ± 0.2** |
| NMI | Caltech102 | 37.7 ± 0.1 | 37.7 ± 0.1 | 39.1 ± 0.1 | 38.9 ± 0.1 | 40.4 ± 0.2 | 45.6 ± 0.1 | **52.9 ± 0.1** |
| Purity | Flower17 | 38.4 ± 0.6 | 38.2 ± 0.5 | 39.3 ± 0.3 | 42.2 ± 0.3 | 45.9 ± 0.7 | 48.8 ± 0.3 | **54.7 ± 0.7** |
| Purity | Flower102 | 22.5 ± 0.1 | 22.4 ± 0.1 | 22.9 ± 0.1 | 22.9 ± 0.2 | 26.0 ± 0.2 | 28.0 ± 0.2 | **41.8 ± 0.2** |
| Purity | CCV | 20.4 ± 0.1 | 20.7 ± 0.1 | 20.8 ± 0.1 | 21.2 ± 0.1 | 21.9 ± 0.2 | 22.0 ± 0.2 | **26.5 ± 0.4** |
| Purity | Caltech102 | 15.3 ± 0.1 | 15.3 ± 0.1 | 16.9 ± 0.1 | 17.0 ± 0.1 | 18.6 ± 0.2 | 23.9 ± 0.1 | **34.3 ± 0.2** |

We compare the proposed EE-IMVC with several commonly used imputation methods, including zero filling (ZF), mean filling (MF), k-nearest-neighbor filling (KNN) and the alignment-maximization filling (AF) proposed in (Trivedi et al. 2010). The widely used MKKM (Gönen and Margolin 2014) is applied to these imputed base kernels. These two-stage methods are termed MKKM+ZF, MKKM+MF, MKKM+KNN and MKKM+AF, respectively.
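For concreteness, the following is a minimal sketch of how the simplest of these two-stage baselines might impute an incomplete kernel matrix before running MKKM; zero filling and one common variant of mean filling are shown under our own assumptions, and the AF scheme of (Trivedi et al. 2010) is not reproduced here.

```python
import numpy as np

def zero_fill(K, s):
    """Zero filling: keep the observed block K(s, s) and set entries involving missing samples to 0."""
    n = K.shape[0]
    Kf = np.zeros((n, n))
    Kf[np.ix_(s, s)] = K[np.ix_(s, s)]
    return Kf

def mean_fill(K, s):
    """Mean filling (one common variant): fill entries involving missing samples
    with the mean of the observed kernel entries."""
    Kf = zero_fill(K, s)
    miss = np.setdiff1d(np.arange(K.shape[0]), s)
    fill_value = K[np.ix_(s, s)].mean()
    Kf[miss, :] = fill_value
    Kf[:, miss] = fill_value
    return Kf
```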
In addition, we compare with the recently proposed MKKM-IK (Liu et al. 2017a), which jointly optimizes the imputation and clustering. For all datasets, it is assumed that the true number of clusters $k$ is known, and it is set to the true number of classes. We follow the approach in (Liu et al. 2017a; Zhu et al. 2018) to generate the missing vectors $\{s_p\}_{p=1}^{m}$. The parameter $\varepsilon$, termed the missing ratio in this experiment, controls the percentage of samples that have absent views, and it affects the performance of the algorithms under comparison. To study this point in depth, we compare these algorithms with respect to $\varepsilon$; specifically, $\varepsilon$ is set to $[0.1 : 0.1 : 0.9]$ on all datasets. The widely used clustering accuracy (ACC), normalized mutual information (NMI) and purity are applied to evaluate the clustering performance. For all algorithms, we repeat each experiment 50 times with random initialization to reduce the effect of the randomness caused by k-means, and report the best result. Meanwhile, we randomly generate the incomplete patterns 30 times in the above-mentioned way and report the statistical results. The aggregated ACC, NMI, purity and rand index are used to evaluate the goodness of the algorithms under comparison. Taking the aggregated ACC as an example, it is obtained by averaging the averaged ACC achieved by an algorithm over the different values of $\varepsilon$.

In the following parts, we conduct comprehensive experiments to study the properties of EE-IMVC from four aspects: clustering performance, running time, the evolution of the learned consensus clustering matrix, and convergence.

### Clustering Performance

Figure 1 presents the ACC and NMI comparison of the above algorithms with different missing ratios on all datasets. We have the following observations. 1) The recently proposed MKKM-IK (Liu et al. 2017a) (in green) outperforms the existing two-stage imputation methods. For example, it exceeds the best two-stage imputation method (MKKM+AF) by 0.1%, 1.2%, 1.7%, 2.7%, 2.8%, 4.4%, 4.7%, 6.9% and 6.9% in terms of NMI as the missing ratio varies over $[0.1, \dots, 0.9]$ on Flower17. These results verify the effectiveness of its joint optimization of imputation and clustering. 2) The proposed EE-IMVC significantly and consistently outperforms MKKM-IK. For example, it improves the latter by 10.7%, 9.7%, 8.7%, 8.2%, 8.0%, 6.5%, 5.9%, 5.5% and 5.6% as the missing ratio varies over $[0.1, \dots, 0.9]$ on Flower17. These results verify the effectiveness of imputing base clustering matrices rather than kernel matrices. 3) The superiority of EE-IMVC is more significant when the missing ratio is relatively small. For example, EE-IMVC improves the second best algorithm (MKKM-IK) by 10.7% on Flower17 in terms of NMI when the missing ratio is 0.1 (see Figure 1(c)). The curves in terms of purity and rand index are provided in the supplemental material due to the space limit.

We also report the aggregated ACC, NMI, purity and rand index, together with the standard deviation, in Table 1, where the highest performance is shown in bold. Again, we observe that the proposed algorithm significantly outperforms MKKM+ZF, MKKM+MF, MKKM+KNN, MKKM+AF and MKKM-IK. For example, EE-IMVC exceeds the second best one (MKKM-IK) by 8.3%, 14.9%, 5.3% and 14.7% in terms of clustering accuracy on Flower17, Flower102, CCV and Caltech102, respectively. These results are consistent with our observations in Figure 1.
The above experimental results on these datasets demonstrate that EE-IMVC is superior to the compared state-of-the-art algorithms in terms of clustering accuracy, NMI, purity and rand index. We attribute the superiority of EE-IMVC to two aspects. i) Completing the incomplete base clustering matrices with the consensus one: different from MKKM-IK, where the consensus clustering matrix $\mathbf{H}$ is utilized to fill the incomplete base kernels, EE-IMVC imputes each incomplete base clustering matrix with $\mathbf{H}$. The latter is more natural and reasonable since both $\mathbf{H}$ and the incomplete base clustering matrices reside in the same clustering space, leading to more suitable imputation. ii) The joint optimization of imputation and clustering: on one hand, the imputation is guided by the clustering results, which makes the imputation more directly targeted at the ultimate goal; on the other hand, this meaningful imputation is beneficial for refining the clustering results. These factors bring forth the significant improvements in clustering performance.

### Running Time

To compare the computational efficiency of the above-mentioned algorithms, we design another experiment to study the relationship between running time and the number of samples. Specifically, we randomly select samples from the four benchmark datasets, run the aforementioned algorithms, record their running time, and plot the results in Figure 4. We have the following observations from these figures: 1) the running time of EE-IMVC is nearly linear in the number of samples; 2) the superiority of EE-IMVC becomes more significant as the number of samples increases, indicating its computational efficiency in handling large-scale clustering tasks. In sum, the experimental results in Figure 4 demonstrate the computational advantage of EE-IMVC.

Figure 4: Running time comparison of different algorithms with various numbers of samples with missing ratio 0.1 on all datasets. The curves with other missing ratios are similar and omitted due to the space limit.

### Effectiveness of the Learned Consensus Matrix

We conduct extra experiments to show the evolution of the learned consensus clustering matrix $\mathbf{H}$ during the learning procedure. Specifically, we evaluate the NMI of EE-IMVC based on the $\mathbf{H}$ learned at each iteration on Flower102 and CCV and plot the curves in Figure 2.

Figure 2: The evolution of the learned consensus clustering matrix $\mathbf{H}$ with missing ratio 0.1 on all datasets. The curves with other missing ratios are similar and we omit them due to the space limit.
From these figures, we observe that the NMI of EE-IMVC gradually increases to a maximum and generally maintains it, up to slight variation. These experiments clearly demonstrate the effectiveness of the learned consensus clustering matrix, indicating the advantage of imputing incomplete base clustering matrices instead of incomplete kernel matrices. The curves in terms of ACC and purity show a similar trend and are omitted due to the space limit.

### Convergence

Our algorithm is theoretically guaranteed to converge according to Theorem 2. We record the objective values of EE-IMVC over the iterations on all datasets and plot them in Figure 3. As observed, the objective value of EE-IMVC does monotonically increase at each iteration, and it usually converges in fewer than 50 iterations.

Figure 3: The objective value of EE-IMVC with iterations with missing ratio 0.1 on all datasets. The curves with other missing ratios are similar and we omit them due to the space limit.

## Conclusion

While the recently proposed MKKM-IK (Liu et al. 2017a) is able to handle incomplete multi-view clustering, its relatively high computational and space complexities prevent it from being applied to large-scale clustering tasks. This paper proposes a late-fusion approach that simultaneously clusters and imputes the incomplete base clustering matrices. The proposed algorithm effectively and efficiently solves the resultant optimization problem and demonstrates clearly improved clustering performance via extensive experiments on benchmark datasets. In the future, we plan to explore the correlation among base clustering matrices and use it to further improve the imputation.

## Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFB1003203) and the Natural Science Foundation of China (project no. 61773392). The authors wish to gratefully acknowledge Prof. Huiying Xu from Zhejiang Normal University for her help in proofreading this paper. Xinwang Liu and Xinzhong Zhu contributed equally to this paper. Xinzhong Zhu is the corresponding author.

## References

Bhadra, S.; Kaski, S.; and Rousu, J. 2016. Multi-view kernel completion. arXiv:1602.02518.

Bickel, S., and Scheffer, T. 2004. Multi-view clustering. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 19-26.

Cai, X.; Nie, F.; and Huang, H. 2013. Multi-view k-means clustering on big data. In IJCAI, 2598-2604.

Du, L.; Zhou, P.; Shi, L.; Wang, H.; Fan, M.; Wang, W.; and Shen, Y.-D. 2015. Robust multiple kernel k-means clustering using ℓ21-norm. In IJCAI, 3476-3482.

Ghahramani, Z., and Jordan, M. I. 1993. Supervised learning from incomplete data via an EM approach. In NIPS, 120-127.
Gönen, M., and Margolin, A. A. 2014. Localized data fusion for kernel k-means clustering with application to cancer biology. In NIPS, 1305-1313.

Kato, T., and Rivero, R. 2017. Mutual kernel matrix completion. arXiv:1702.04077v2.

Kumar, R.; Chen, T.; Hardt, M.; Beymer, D.; Brannon, K.; and Syeda-Mahmood, T. F. 2013. Multiple kernel completion and its application to cardiac disease discrimination. In ISBI, 764-767.

Li, Y.; Nie, F.; Huang, H.; and Huang, J. 2015. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, 2750-2756.

Li, M.; Liu, X.; Wang, L.; Dou, Y.; Yin, J.; and Zhu, E. 2016. Multiple kernel clustering with local kernel alignment maximization. In IJCAI, 1704-1710.

Li, S.; Jiang, Y.; and Zhou, Z. 2014. Partial multi-view clustering. In AAAI, 1968-1974.

Liu, J.; Wang, C.; Danilevsky, M.; and Han, J. 2013. Large-scale spectral clustering on graphs. In IJCAI, 1486-1492.

Liu, X.; Dou, Y.; Yin, J.; Wang, L.; and Zhu, E. 2016. Multiple kernel k-means clustering with matrix-induced regularization. In AAAI, 1888-1894.

Liu, X.; Li, M.; Wang, L.; Dou, Y.; Yin, J.; and Zhu, E. 2017a. Multiple kernel k-means with incomplete kernels. In AAAI, 2259-2265.

Liu, X.; Zhou, S.; Wang, Y.; Li, M.; Dou, Y.; Zhu, E.; and Yin, J. 2017b. Optimal neighborhood kernel clustering with multiple kernels. In AAAI, 2266-2272.

Shao, W.; He, L.; and Yu, P. S. 2015. Multiple incomplete views clustering via weighted nonnegative matrix factorization with ℓ2,1 regularization. In ECML PKDD, 318-334.

Tao, Z.; Liu, H.; Li, S.; Ding, Z.; and Fu, Y. 2017. From ensemble clustering to multi-view clustering. In IJCAI, 2843-2849.

Tao, Z.; Liu, H.; and Fu, Y. 2017. Simultaneous clustering and ensemble. In AAAI, 1546-1552.

Trivedi, A.; Rai, P.; Daumé III, H.; and DuVall, S. L. 2010. Multiview clustering with incomplete views. In NIPS 2010: Machine Learning for Social Computing Workshop, Whistler, Canada.

von Luxburg, U. 2007. A tutorial on spectral clustering. Statistics and Computing 17(4):395-416.

Xiang, S.; Yuan, L.; Fan, W.; Wang, Y.; Thompson, P. M.; and Ye, J. 2013. Multi-source learning with block-wise missing data for Alzheimer's disease prediction. In ACM SIGKDD, 185-193.

Xu, C.; Tao, D.; and Xu, C. 2015. Multi-view learning with incomplete views. IEEE Trans. Image Processing 24(12):5812-5825.

Yin, Q.; Wu, S.; and Wang, L. 2015. Incomplete multi-view clustering via subspace learning. In ACM CIKM, 383-392.

Yu, S.; Tranchevent, L.-C.; Liu, X.; Glänzel, W.; Suykens, J. A. K.; Moor, B. D.; and Moreau, Y. 2012. Optimized data fusion for kernel k-means clustering. IEEE TPAMI 34(5):1031-1039.

Zhang, R.; Li, S.; Fang, T.; Zhu, S.; and Quan, L. 2015. Joint camera clustering and surface segmentation for large-scale multi-view stereo. In ICCV, 2084-2092.

Zhu, X.; Liu, X.; Li, M.; Zhu, E.; Liu, L.; Cai, Z.; Yin, J.; and Gao, W. 2018. Localized incomplete multiple kernel k-means. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, 3271-3277.