Multi-View Clustering on Topological Manifold

Shudong Huang1, Ivor Tsang2, Zenglin Xu3, Jiancheng Lv1, Quan-Hui Liu1*
1 College of Computer Science, Sichuan University, Chengdu 610065, China
2 Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Sydney, NSW 2007, Australia
3 School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
{huangsd,lvjiancheng,quanhuiliu}@scu.edu.cn, ivor.tsang@uts.edu.au, xuzenglin@hit.edu.cn
*Corresponding author.

Abstract
Multi-view clustering has received a lot of attention in data mining recently. Although plenty of works have investigated this topic, it remains a severe challenge due to the complex nature of multiple heterogeneous features. In particular, existing multi-view clustering algorithms fail to consider the topological structure of the data, which is essential for clustering data on a manifold. In this paper, we propose to exploit the implied data manifold by learning the topological relationship between data points. Our method coalesces multiple view-wise graphs with the topological relevance taken into account, and learns the weights as well as the consensus graph interactively in a unified framework. Furthermore, we manipulate the consensus graph with a connectivity constraint such that the data points from the same cluster are precisely connected into the same component. Substantial experiments on benchmark datasets are conducted to validate the effectiveness of the proposed method against state-of-the-art algorithms in terms of clustering performance.

Introduction
In many real scenarios, data are often generated from different sources in diverse domains or described by various feature sets (i.e., views) (Bickel and Scheffer 2004; Huang, Kang, and Xu 2020). A prime example is documents, which can be written in different languages; other prominent examples include images and web pages, where the former are represented by different visual descriptors and the latter are typically classified based on their content or citation links (Liu et al. 2018; Wang, Yang, and Liu 2019). Such data are referred to as multi-view data. Usually, different views capture different aspects of information, any of which suffices for mining knowledge (Li, Chen, and Wang 2019). Multi-view clustering, which partitions the data points into distinct clusters according to the compatible and complementary information encoded in heterogeneous features, has attracted widespread attention in the domain of unsupervised learning during the past two decades (Huang et al. 2021b; Nie, Cai, and Li 2017).

Numerous multi-view clustering methods have been investigated up to now, among which the graph-oriented multi-view clustering methods make up a large proportion due to their efficiency in learning the relationships and the underlying common structure shared by multiple views. (Liang, Huang, and Wang 2019) designed a new alternating optimization scheme such that the consistent and inconsistent parts of each single-view graph can be explicitly detected. (Huang et al. 2021a) proposed to simultaneously leverage the multi-view consistency and the multi-view diversity in a joint framework.
Due to their efficiency in extracting similarities between multiple views, kernel strategies are widely utilized to boost the learning performance of multi-view clustering methods. (Tzortzis and Likas 2012) expressed each view in terms of a given kernel matrix, and learned a weighted combination of the kernels in parallel to the partitioning. (Houthuys, Langone, and Suykens 2018) formulated multi-view kernel spectral clustering as a weighted kernel canonical correlation analysis in a primal-dual optimization setting, in which a coupling term is included to enforce the clustering scores corresponding to the different views to align. (Huang et al. 2019) further performed the multi-view clustering task and learned similarity relationships in kernel spaces simultaneously. Moreover, multiple kernel learning is adopted such that the clustering performance is robust to the input kernel matrix. (Chen, Xiao, and Zhou 2019) presented a novel kernelized method to handle nonlinear data structures by jointly learning the representation tensor and the affinity matrix.

There are also several works that seek a joint graph compatible across multiple views by making use of bipartite graph fusion. For instance, (Li et al. 2015) used local manifold fusion to integrate heterogeneous features, and approximated the similarity graphs using bipartite graphs for multi-view spectral clustering. Furthermore, this kind of multi-view spectral clustering method is able to handle the out-of-sample problem. Inspired by the idea of anchor graphs, (Nie et al. 2020) and (Kang et al. 2020) involved a small number of data points as anchors so that the bipartite relations between the anchor points and the data points can be used to cover the affinity of the entire point cloud.

Although graph-oriented multi-view clustering methods achieve promising results, several drawbacks remain. For one thing, the similarity predefined in the data graph is large only for neighbors. Considering that real-world data are typically sampled from a nonlinear manifold, two distant data points may keep high consistency if they are linked by consecutive neighbors. Therefore, these methods cannot fully investigate the latent topological structure of data lying on a manifold. For another, the clustering performance of these methods critically relies on the initial graphs, as they do not involve graph learning as a part of the optimization procedure, which could lead to a degradation of their performance.

Regarding the aforementioned deficiencies, we propose to exploit the implied data manifold by learning the topological relationship between data points. To be more specific, instead of utilizing the Euclidean structure, a more suitable manifold topological structure is explored to calculate the intrinsic similarities. We explicitly exploit the manifold structure of data by propagating the topological connectivities between data points from near to far. We further structure the consensus graph with a connectivity constraint so that the data points from the same cluster are precisely connected into the same component. As a result, the proposed method coalesces multiple view-wise graphs with the topological relevance taken into account, and learns the weights as well as the structured consensus graph interactively in a unified framework.
Substantial experiments on both toy data and real datasets are conducted to validate the effectiveness of exploring the manifold topological structure, and demonstrate its superior performance compared to state-of-the-art competitors.

Preliminary
Previous studies have shown that real-world data are typically sampled from a nonlinear low-dimensional manifold which is embedded in the high-dimensional ambient space (Roweis and Saul 2000; Zhang, Wang, and Zha 2011; Minh, Bazzani, and Murino 2016). Thus it is crucial to uncover the manifold structure implied within the original data matrix. Recently, (Wang, Chen, and Li 2017) presented a propagation-based manifold learning method to reveal the intrinsic structures of crowds and calculate collectiveness by exploring the topological relationship between individuals. It is based on a simple yet intuitive assumption that the topological connectivities between individuals can be propagated from near to far. That is, the spatial similarity between two individuals may be low, but their topological relevance to each other will be high if they are linked by consecutive neighbors. Instead of utilizing the Euclidean structure, a more suitable manifold topological structure is explored to calculate the intrinsic similarities. According to the topological structure learning theory, if two data points keep high consistency, their topological relevance to any other point is assumed to be similar.

Given a similarity graph G \in \mathbb{R}^{n\times n}, where n is the number of data points, and based on the assumption that data points with large similarity share similar topological relevance to any other point, (Wang, Chen, and Li 2017) proposed to extract the topological relationship of data points by minimizing the following objective function

\min_{Z}\ \frac{1}{2}\sum_{i,j,k=1}^{n} G_{jk}\,\big(Z_{ij}-Z_{ik}\big)^{2}+\alpha\,\|Z-I\|_{F}^{2},   (1)

where i, j, and k are data point indexes, Z indicates the target topological relationship matrix, and Z_{ij} denotes data point j's topological relevance to i. In Eq. (1), the first term is essentially a smoothness constraint that follows the above assumption, i.e., it guarantees that data points j and k share a similar topological relationship with data point i if j and k are similar. The second term is a fitting constraint that prevents the trivial solution in which all the elements of Z are identical, and the parameter α balances the two terms. Based on Eq. (1), the topological consistency is propagated through neighbors with high similarities, and distant data points will keep a close relationship if they are linked by consecutive neighbors. Finally, optimizing the cost function defined in Eq. (1) guides the search for the topological relationship matrix Z. Notwithstanding, the learned Z does not contain explicit cluster structures, hence a subsequent postprocessing step is required to obtain the final discrete clustering results. Moreover, it is designed for the single-view setting, and thus cannot be directly applied to multi-view clustering tasks.

Our Proposed Methodology
In order to utilize the manifold topological structure for multi-view clustering, in this paper we extend the formulation introduced in Eq. (1) to the multi-view clustering domain. For multi-view data with m views, let G^{(1)}, G^{(2)}, \dots, G^{(m)} be the corresponding input similarity graphs, with G^{(v)} \in \mathbb{R}^{n\times n} (1 \le v \le m). According to Eq. (1), we can search the topological relationship for each view by solving

\min_{Z^{(v)}}\ \frac{1}{2}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\,\big(Z^{(v)}_{ij}-Z^{(v)}_{ik}\big)^{2}+\alpha\,\|Z^{(v)}-I\|_{F}^{2}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0.   (2)
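To make the propagation idea behind Eq. (1) concrete, the following minimal numpy sketch solves the unconstrained version of Eq. (1) (dropping the simplex constraints of Eq. (2)), for which setting the gradient to zero gives the closed form Z = α(L + αI)^{-1} with L = D − G the graph Laplacian. On a toy chain graph, the two endpoints have zero direct similarity yet obtain a positive topological relevance because they are linked by consecutive neighbors; the chain graph and the value of α are illustrative choices, not taken from the paper.

```python
import numpy as np

def topological_relevance(G, alpha=0.1):
    """Closed-form minimizer of the unconstrained Eq. (1): Z = alpha * (L + alpha*I)^{-1}."""
    D = np.diag(G.sum(axis=1))          # degree matrix
    L = D - G                           # graph Laplacian
    n = G.shape[0]
    return alpha * np.linalg.inv(L + alpha * np.eye(n))

# Toy chain graph: points 0-1-2-3-4 are only similar to their direct neighbors.
n = 5
G = np.zeros((n, n))
for i in range(n - 1):
    G[i, i + 1] = G[i + 1, i] = 1.0

Z = topological_relevance(G, alpha=0.1)
print(G[0, 4])            # 0.0  -> no direct similarity between the two endpoints
print(round(Z[0, 4], 4))  # > 0  -> relevance propagated through consecutive neighbors
```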
Here we constrain the sum of each row of Z^{(v)} to be one and all elements of Z^{(v)} to be non-negative. It is clear that if point j is connected with many similar neighbors, it will largely affect the objective value. Hence we propose a normalized version of Eq. (2) so that each point is treated equally, which can be formulated as

\min_{Z^{(v)}}\ \frac{1}{2}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\left(\frac{Z^{(v)}_{ij}}{\sqrt{D^{(v)}_{jj}}}-\frac{Z^{(v)}_{ik}}{\sqrt{D^{(v)}_{kk}}}\right)^{2}+\alpha\,\|Z^{(v)}-I\|_{F}^{2}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0,   (3)

where D^{(v)} is the degree matrix of G^{(v)}.

Once the topological relationship matrices of all views are obtained, we need to adopt a set of suitable weights µ^{(v)} (1 \le v \le m) to reflect the importance of each view. In detail, we can approximate every Z^{(v)} with a different confidence by learning a consensus graph S. This idea can be modeled by minimizing the linear combination of \|S-\mu^{(v)}Z^{(v)}\|_{F}^{2}. Furthermore, as pointed out by (Mohar et al. 2001; Nie et al. 2020), we can manipulate the consensus graph to contain exactly c connected components by adding the connectivity constraint rank(L_S) = n − c, where c is the number of clusters and L_S is the Laplacian matrix of S. Thus we arrive at

\min_{Z^{(v)},\,S,\,\mu^{(v)}}\ \frac{1}{2}\sum_{v=1}^{m}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\left(\frac{Z^{(v)}_{ij}}{\sqrt{D^{(v)}_{jj}}}-\frac{Z^{(v)}_{ik}}{\sqrt{D^{(v)}_{kk}}}\right)^{2}+\alpha\sum_{v=1}^{m}\|Z^{(v)}-I\|_{F}^{2}+\beta\sum_{v=1}^{m}\|S-\mu^{(v)}Z^{(v)}\|_{F}^{2}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0,\ \ s^{\top}_{i}\mathbf{1}=1,\ s_{ij}\ge 0,\ \ \sum_{v=1}^{m}\mu^{(v)}=1,\ \ \mathrm{rank}(L_S)=n-c.   (4)

We see that Eq. (4) is difficult to solve due to the rank constraint, as it depends on S. Let σ_i(L_S) be the i-th smallest eigenvalue of L_S. The constraint rank(L_S) = n − c is satisfied if \sum_{i=1}^{c}\sigma_i(L_S)=0, since L_S is a positive semidefinite matrix. We incorporate the term \sum_{i=1}^{c}\sigma_i(L_S) into the cost function, thus our model can finally be formulated as

\min_{Z^{(v)},\,S,\,F,\,\mu^{(v)}}\ \underbrace{\frac{1}{2}\sum_{v=1}^{m}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\left(\frac{Z^{(v)}_{ij}}{\sqrt{D^{(v)}_{jj}}}-\frac{Z^{(v)}_{ik}}{\sqrt{D^{(v)}_{kk}}}\right)^{2}+\alpha\sum_{v=1}^{m}\|Z^{(v)}-I\|_{F}^{2}}_{\text{topological relevance learning}}+\underbrace{\beta\sum_{v=1}^{m}\|S-\mu^{(v)}Z^{(v)}\|_{F}^{2}}_{\text{graph fusion}}+\underbrace{2\lambda\,\mathrm{Tr}\big(F^{\top}L_S F\big)}_{\text{partition label learning}}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0,\ \ s^{\top}_{i}\mathbf{1}=1,\ s_{ij}\ge 0,\ \ \sum_{v=1}^{m}\mu^{(v)}=1,\ \ F^{\top}F=I,   (5)

where F \in \mathbb{R}^{n\times c} denotes the cluster indicator matrix, λ is a self-tuned parameter, and the following Ky Fan theorem (Fan 1949) is employed:

\sum_{i=1}^{c}\sigma_i(L_S)=\min_{F\in\mathbb{R}^{n\times c},\,F^{\top}F=I}\mathrm{Tr}\big(F^{\top}L_S F\big).   (6)

It is noteworthy that our model formulated in Eq. (5) is distinct from other approaches in several aspects:
- Orthogonal to other multi-view clustering methods, our model explicitly exploits the implied data manifold by learning the topological relationship between data points. Considering that the topological relevance of two data points can be high if they are linked by consecutive neighbors, it is critical to search for a suitable manifold topological structure so that the intrinsic similarities can be better calculated.
- It is well known that learning with a multi-stage strategy usually leads to sub-optimal performance. Therefore, we propose a joint learning framework that seamlessly integrates the subtasks of topological relevance learning, graph fusion, and partition label learning.
- We manipulate the consensus graph S with a connectivity constraint so that it contains exactly c connected components. Thus S can be considered as an indicator matrix, where the points from the same cluster are connected into the same component. In this way, the discretization procedure is no longer required in our model, and it is an end-to-end single-stage learning paradigm.
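To make the connectivity constraint and the Ky Fan identity of Eq. (6) concrete, the following minimal numpy sketch builds a graph with c = 3 connected components and checks that the c smallest eigenvalues of its Laplacian are zero, so that stacking the corresponding eigenvectors into F attains min Tr(F^T L_S F) = 0; this is also how the F-subproblem in the next section is solved. The block sizes are illustrative values, not from the paper.

```python
import numpy as np

def laplacian(S):
    return np.diag(S.sum(axis=1)) - S

# Consensus graph with c = 3 connected components (block-diagonal structure).
blocks = [np.ones((4, 4)), np.ones((3, 3)), np.ones((5, 5))]
S = np.zeros((12, 12))
start = 0
for B in blocks:
    k = B.shape[0]
    S[start:start + k, start:start + k] = B
    start += k

L_S = laplacian(S)
eigvals, eigvecs = np.linalg.eigh(L_S)    # eigenvalues in ascending order
c = 3
F = eigvecs[:, :c]                        # F-update of Eq. (7): c smallest eigenvectors
print(np.round(eigvals[:c + 1], 6))       # first c eigenvalues are 0, the (c+1)-th is not
print(round(np.trace(F.T @ L_S @ F), 6))  # Ky Fan: equals the sum of the c smallest eigenvalues = 0
```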
Optimization
In this section, we design an iterative updating algorithm to solve the optimization problem in Eq. (5). Since it is not jointly convex in all variables, we optimize the objective function with respect to one variable while fixing the others, and repeat the procedure until convergence.

Update F. With the other variables fixed, we solve F according to

\min_{F\in\mathbb{R}^{n\times c},\,F^{\top}F=I}\ \mathrm{Tr}\big(F^{\top}L_S F\big),   (7)

which is a classical spectral problem whose solution is given by the c eigenvectors of L_S corresponding to the c smallest eigenvalues.

Update Z^{(v)} for Each View. With the other variables fixed, we need to solve

\min_{Z^{(v)}}\ \sum_{v=1}^{m}\Bigg[\frac{1}{2}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\left(\frac{Z^{(v)}_{ij}}{\sqrt{D^{(v)}_{jj}}}-\frac{Z^{(v)}_{ik}}{\sqrt{D^{(v)}_{kk}}}\right)^{2}+\alpha\|Z^{(v)}-I\|_{F}^{2}+\beta\|S-\mu^{(v)}Z^{(v)}\|_{F}^{2}\Bigg]
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0.   (8)

Note that Eq. (8) is independent for different v; thus for the v-th view we have

\min_{Z^{(v)}}\ \frac{1}{2}\sum_{i,j,k=1}^{n} G^{(v)}_{jk}\left(\frac{Z^{(v)}_{ij}}{\sqrt{D^{(v)}_{jj}}}-\frac{Z^{(v)}_{ik}}{\sqrt{D^{(v)}_{kk}}}\right)^{2}+\alpha\sum_{i,j=1}^{n}\big(Z^{(v)}_{ij}-I_{ij}\big)^{2}+\beta\sum_{i,j=1}^{n}\big(S_{ij}-\mu^{(v)}Z^{(v)}_{ij}\big)^{2}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0.   (9)

For each i, Eq. (9) can be further rewritten in vector form as

\min_{z^{(v)}_{i}}\ z^{(v)\top}_{i}\Big(I-\big(D^{(v)}\big)^{-\frac{1}{2}}G^{(v)}\big(D^{(v)}\big)^{-\frac{1}{2}}\Big)z^{(v)}_{i}+\alpha\,\big\|z^{(v)}_{i}-e_{i}\big\|_{2}^{2}+\beta\,\big\|s_{i}-\mu^{(v)}z^{(v)}_{i}\big\|_{2}^{2}
\quad\text{s.t.}\ \ z^{(v)\top}_{i}\mathbf{1}=1,\ z^{(v)}_{ij}\ge 0,   (10)

where e_i denotes the i-th column of the identity matrix. Denoting A=\big(1+\alpha+\beta\mu^{(v)2}\big)I-\big(D^{(v)}\big)^{-\frac{1}{2}}G^{(v)}\big(D^{(v)}\big)^{-\frac{1}{2}} and b=2\alpha e_{i}+2\beta\mu^{(v)}s_{i}, Eq. (10) can be stated as

\min_{z^{(v)\top}_{i}\mathbf{1}=1,\,z^{(v)}_{ij}\ge 0}\ z^{(v)\top}_{i}A\,z^{(v)}_{i}-z^{(v)\top}_{i}b.   (11)

It is clear that Eq. (11) is a quadratic convex optimization problem, and we can solve it with the classical augmented Lagrangian multiplier (ALM) method (Bertsekas 1997). In detail, Eq. (11) can be solved by tackling its counterpart

\min_{z^{(v)\top}_{i}\mathbf{1}=1,\,z^{(v)}_{ij}\ge 0,\,p=z^{(v)}_{i}}\ z^{(v)\top}_{i}A\,p-z^{(v)\top}_{i}b.   (12)

Via ALM, the augmented Lagrangian function of Eq. (12) can be defined as

\min_{z^{(v)\top}_{i}\mathbf{1}=1,\,z^{(v)}_{ij}\ge 0,\,p}\ z^{(v)\top}_{i}A\,p-z^{(v)\top}_{i}b+\frac{\eta}{2}\Big\|z^{(v)}_{i}-p-\frac{1}{\eta}q\Big\|_{2}^{2},   (13)

where the last term in Eq. (13) is a penalty term that enforces p = z^{(v)}_{i}, and η and q are the corresponding penalty coefficient and Lagrange multiplier, respectively. Note that p and z^{(v)}_{i} can be iteratively optimized:

1) Update p with z^{(v)}_{i} fixed. The part of Eq. (13) that depends on p is

L_{p}=z^{(v)\top}_{i}A\,p+\frac{\eta}{2}\Big\|z^{(v)}_{i}-p-\frac{1}{\eta}q\Big\|_{2}^{2}.   (14)

Taking the derivative of L_p with respect to p and setting it to zero, i.e.,

\frac{\partial L_{p}}{\partial p}=0,   (15)

we have

p=z^{(v)}_{i}-\frac{1}{\eta}\big(A^{\top}z^{(v)}_{i}+q\big).   (16)

2) Update z^{(v)}_{i} with p fixed. Collecting the terms of Eq. (13) that depend on z^{(v)}_{i} and completing the square, the subproblem becomes

\min_{z^{(v)\top}_{i}\mathbf{1}=1,\,z^{(v)}_{ij}\ge 0}\ \Big\|z^{(v)}_{i}-p+\frac{1}{\eta}\big(Ap-b-q\big)\Big\|_{2}^{2},   (17)

which has a closed-form solution and can be readily obtained by the optimization algorithm proposed in (Huang, Nie, and Huang 2015). According to the ALM principles (Bertsekas 1997), η is increased gradually during the iterations, and q is updated by q ← q + η(p − z^{(v)}_{i}). The detailed algorithm to solve Eq. (13) is summarized in Algorithm 1.

Algorithm 1: Algorithm to solve Eq. (13)
Input: a nonzero matrix A and a nonzero vector b. Set 1 < ρ < 2, initialize η > 0 and q.
Output: z^{(v)}_{i}.
1: repeat
2:   Update p according to Eq. (16).
3:   Update z^{(v)}_{i} according to Eq. (17).
4:   Update η ← ρη.
5:   Update q ← q + η(p − z^{(v)}_{i}).
6: until convergence
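The following is a minimal Python sketch of Algorithm 1 under the updates reconstructed above. It assumes a standard sorting-based Euclidean projection onto the probability simplex as a stand-in for the routine of (Huang, Nie, and Huang 2015), and the values of ρ, the initial η, and the iteration count are illustrative rather than taken from the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (standard sorting-based
    algorithm; a generic stand-in for the solver of Huang, Nie, and Huang (2015))."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def solve_z_row(A, b, z0, rho=1.5, eta=1.0, n_iter=50):
    """ALM iterations of Algorithm 1 for one row z_i of Z^(v):
    min_{z in simplex} z^T A z - z^T b, split as z^T A p - z^T b with p = z."""
    z, p, q = z0.copy(), z0.copy(), np.zeros_like(z0)
    for _ in range(n_iter):
        p = z - (A.T @ z + q) / eta                      # Eq. (16)
        z = project_simplex(p - (A @ p - b - q) / eta)   # Eq. (17), via simplex projection
        q = q + eta * (p - z)                            # multiplier update
        eta = rho * eta                                  # penalty update, 1 < rho < 2
    return z

# Usage on an illustrative positive-definite A and random b.
rng = np.random.default_rng(0)
n = 6
A = 2.0 * np.eye(n) - 0.1 * rng.random((n, n))
A = (A + A.T) / 2
b = rng.random(n)
z = solve_z_row(A, b, np.full(n, 1.0 / n))
print(round(z.sum(), 6), bool((z >= 0).all()))   # feasibility: sums to 1, non-negative
```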
Update S. Dropping all terms of Eq. (5) unrelated to S, we have

\min_{S}\ \beta\sum_{v=1}^{m}\big\|S-\mu^{(v)}Z^{(v)}\big\|_{F}^{2}+2\lambda\,\mathrm{Tr}\big(F^{\top}L_S F\big)
\quad\text{s.t.}\ \ s^{\top}_{i}\mathbf{1}=1,\ s_{ij}\ge 0.   (18)

Since Eq. (18) is independent for different i, and using the identity 2\,\mathrm{Tr}(F^{\top}L_S F)=\sum_{i,j}\|f_{i}-f_{j}\|_{2}^{2}\,s_{ij} with f_i the i-th row of F, we obtain

\min_{s_{i}}\ \sum_{j=1}^{n}\sum_{v=1}^{m}\big(s_{ij}-\mu^{(v)}z^{(v)}_{ij}\big)^{2}+\frac{\lambda}{\beta}\sum_{j=1}^{n}\big\|f_{i}-f_{j}\big\|_{2}^{2}\,s_{ij}
\quad\text{s.t.}\ \ s^{\top}_{i}\mathbf{1}=1,\ s_{ij}\ge 0.   (19)

Eq. (19) can be further written as

\min_{s_{i}}\ \sum_{j=1}^{n}\Big[\sum_{v=1}^{m}\big(s_{ij}-\mu^{(v)}z^{(v)}_{ij}\big)^{2}+\frac{\lambda}{\beta}\big\|f_{i}-f_{j}\big\|_{2}^{2}\,s_{ij}\Big]
\quad\text{s.t.}\ \ s^{\top}_{i}\mathbf{1}=1,\ s_{ij}\ge 0.   (20)

For each i, completing the square yields the following compact formulation

\min_{s^{\top}_{i}\mathbf{1}=1,\,s_{ij}\ge 0}\ \Big\|s_{i}-\frac{1}{m}\Big(\sum_{v=1}^{m}\mu^{(v)}z^{(v)}_{i}-\frac{\lambda}{2\beta}\,d_{i}\Big)\Big\|_{2}^{2},   (21)

where d_i is the vector whose j-th entry is \|f_{i}-f_{j}\|_{2}^{2}. This problem can be effectively solved by the optimization algorithm proposed in (Huang, Nie, and Huang 2015).

Update µ^{(v)} for Each View. Optimizing Eq. (5) with respect to µ^{(v)} is equivalent to solving

\min_{\sum_{v=1}^{m}\mu^{(v)}=1,\,\mu^{(v)}\ge 0}\ \sum_{v=1}^{m}\big\|S-\mu^{(v)}Z^{(v)}\big\|_{F}^{2}.   (22)

The Lagrange function of Eq. (22) is

\mathcal{L}=\sum_{v=1}^{m}\big\|S-\mu^{(v)}Z^{(v)}\big\|_{F}^{2}+\gamma\Big(\sum_{v=1}^{m}\mu^{(v)}-1\Big),   (23)

where γ is the Lagrange multiplier. Setting the derivative of \mathcal{L} with respect to µ^{(v)} to zero, we have

\mu^{(v)}=\frac{2\,\mathrm{Tr}\big(SZ^{(v)\top}\big)-\gamma}{2\,\mathrm{Tr}\big(Z^{(v)}Z^{(v)\top}\big)}.   (24)

Considering the constraint \sum_{v=1}^{m}\mu^{(v)}=1, we can compute γ and further obtain each µ^{(v)}. The detailed algorithm to solve the objective in Eq. (5) is summarized in Algorithm 2.

Algorithm 2: The Algorithm for Eq. (5)
Input: Initial graphs {G^{(1)}, G^{(2)}, ..., G^{(m)}} for the m views, cluster number c, parameters α and β.
Initialize the weight of each view µ^{(v)} = 1/m. Initialize the consensus graph S = \sum_{v=1}^{m}\mu^{(v)}G^{(v)}.
Output: The indicator matrix S ∈ R^{n×n} with exactly c connected components.
1: repeat
2:   Update F according to Eq. (7).
3:   Update Z^{(v)} by Algorithm 1.
4:   Update S according to Eq. (21).
5:   Update µ^{(v)} according to Eq. (24).
6: until convergence
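Step 5 of Algorithm 2 has the closed form of Eq. (24), with γ fixed by the constraint that the weights sum to one. A minimal numpy sketch is given below; the function name and the toy example are illustrative, and the non-negativity of µ^{(v)} is not enforced explicitly, mirroring the derivation in the text.

```python
import numpy as np

def update_mu(S, Z_list):
    """View-weight update of Eq. (24); gamma is chosen so that sum_v mu^(v) = 1."""
    a = np.array([np.trace(S @ Z.T) for Z in Z_list])      # Tr(S Z^(v)T)
    c = np.array([np.trace(Z @ Z.T) for Z in Z_list])      # Tr(Z^(v) Z^(v)T)
    gamma = 2.0 * (np.sum(a / c) - 1.0) / np.sum(1.0 / c)  # from the sum-to-one constraint
    return (2.0 * a - gamma) / (2.0 * c)                   # Eq. (24)

# Example: if S equals each (identical) Z^(v), the views receive equal weights.
Zs = [np.eye(4) for _ in range(3)]
print(np.round(update_mu(np.eye(4), Zs), 3))   # [0.333 0.333 0.333]
```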
Experiments
We validate the proposed method by comparing it with the following state-of-the-art competitors: Co-training multi-view spectral clustering (Co-train) (Kumar and Daumé 2011), Co-regularized multi-view spectral clustering (Co-reg) (Kumar, Rai, and Daumé 2012), Diversity-induced multi-view subspace clustering (DiMSC) (Cao et al. 2015), Weighted multi-view spectral clustering (WMSC) (Zong et al. 2018), Multi-view clustering via adaptively weighted procrustes (AWP) (Nie, Tian, and Li 2018), Multi-view consensus graph clustering (MCGC) (Zhan et al. 2019), Multiple Partitions Aligned Clustering (mPAC) (Kang et al. 2019), Latent multi-view subspace clustering (LMSC) (Zhang et al. 2020), Graph-based multi-view clustering (GMC) (Wang, Yang, and Liu 2020), and Multi-view clustering via cross-view graph diffusion (CGD) (Tang et al. 2020). The standard spectral clustering (SC) (Ng, Jordan, and Weiss 2002) is included as a baseline, performed on the concatenated features of all views (denoted by SC(All Fea)).

Several benchmark multi-view data sets are used in this paper: 3sources, Handwritten numerals (HW), Caltech7, and Caltech20. The specific characteristics of these data sets are given in Table 3, where n, m, and c denote the number of samples, views, and clusters, respectively, and dv denotes the dimensionality of the features in the v-th view.

Table 3: Characteristics of the data sets.
Datasets | n    | m | c  | dv
3S       | 169  | 3 | 6  | 3560/3631/3068
HW       | 2000 | 6 | 10 | 216/76/64/6/240/47
Cal7     | 1474 | 6 | 7  | 48/40/254/1984/512/928
Cal20    | 2386 | 6 | 20 | 48/40/254/1984/512/928

Table 1: Clustering performance (mean ± standard deviation) on dataset 3sources (%).
Method      | ACC        | NMI        | Purity     | F-score    | Precision  | Recall     | ARI
SC(All Fea) | 56.09±3.55 | 52.66±3.90 | 72.54±3.59 | 51.15±5.38 | 55.64±6.70 | 47.38±4.59 | 37.83±7.01
Co-train    | 56.45±4.77 | 56.55±3.61 | 76.15±2.03 | 53.87±4.44 | 62.43±4.87 | 47.38±4.10 | 42.31±5.48
Co-reg      | 55.86±4.00 | 55.38±1.77 | 74.56±2.18 | 54.08±4.68 | 58.20±5.92 | 50.55±3.94 | 41.38±6.24
DiMSC       | 76.15±0.56 | 63.47±0.82 | 80.30±0.56 | 68.59±0.35 | 64.28±0.84 | 78.29±0.60 | 56.01±0.57
WMSC        | 57.75±0.75 | 49.45±1.05 | 71.72±0.78 | 50.72±0.87 | 54.76±1.08 | 47.24±0.76 | 37.21±1.14
AWP         | 54.44±0.00 | 45.88±0.00 | 63.31±0.00 | 42.46±0.00 | 38.19±0.00 | 47.80±0.00 | 22.42±0.00
MCGC        | 56.80±0.00 | 34.21±0.00 | 65.09±0.00 | 51.58±0.00 | 41.21±0.00 | 68.93±0.00 | 31.72±0.00
mPAC        | 60.95±0.00 | 57.58±0.00 | 71.01±0.00 | 54.74±0.00 | 55.61±0.00 | 53.90±0.00 | 41.32±0.00
LMSC        | 61.48±3.44 | 49.72±2.31 | 70.89±1.33 | 58.52±2.65 | 60.87±2.88 | 56.46±3.79 | 46.58±3.24
GMC         | 69.23±0.00 | 54.80±0.00 | 74.56±0.00 | 60.47±0.00 | 48.44±0.00 | 80.45±0.00 | 44.31±0.00
CDG         | 71.78±6.04 | 69.88±5.42 | 81.54±3.17 | 67.43±5.38 | 66.27±2.82 | 66.05±8.36 | 57.99±6.29
Ours        | 78.70±0.00 | 65.93±0.00 | 82.25±0.00 | 70.71±0.00 | 69.97±0.00 | 82.33±0.00 | 60.14±0.00

Table 2: Clustering performance (mean ± standard deviation) on dataset HW (%).
Method      | ACC        | NMI        | Purity     | F-score    | Precision  | Recall     | ARI
SC(All Fea) | 71.22±3.24 | 68.29±0.90 | 73.31±2.30 | 63.46±1.91 | 60.78±2.28 | 66.43±2.20 | 59.22±2.15
Co-train    | 76.97±1.27 | 70.30±0.59 | 77.65±0.37 | 67.09±1.02 | 66.20±1.04 | 68.05±2.16 | 63.40±1.09
Co-reg      | 66.64±4.79 | 62.75±1.96 | 67.83±3.49 | 56.64±2.45 | 55.51±3.24 | 57.86±1.74 | 51.73±2.81
DiMSC       | 87.47±0.27 | 75.64±0.12 | 87.99±0.17 | 83.00±0.13 | 82.34±0.11 | 83.73±0.16 | 72.81±0.13
WMSC        | 90.55±0.02 | 84.41±0.04 | 90.55±0.02 | 83.00±0.05 | 82.45±0.05 | 83.56±0.04 | 81.10±0.05
AWP         | 74.85±0.00 | 73.94±0.00 | 74.85±0.00 | 72.24±0.00 | 64.66±0.00 | 81.83±0.00 | 68.77±0.00
MCGC        | 82.40±0.00 | 83.27±0.00 | 84.70±0.00 | 79.04±0.00 | 72.99±0.00 | 86.18±0.00 | 76.51±0.00
mPAC        | 61.45±0.00 | 60.39±0.00 | 61.70±0.00 | 56.79±0.00 | 51.98±0.00 | 62.57±0.00 | 51.51±0.00
LMSC        | 81.41±4.59 | 80.71±1.75 | 84.98±2.77 | 77.44±2.75 | 73.97±4.11 | 81.36±2.22 | 74.81±3.11
GMC         | 88.20±0.00 | 89.32±0.00 | 88.20±0.00 | 86.53±0.00 | 82.60±0.00 | 90.85±0.00 | 84.96±0.00
CDG         | 86.00±0.00 | 85.88±0.00 | 86.00±0.00 | 81.94±0.00 | 81.59±0.00 | 82.28±0.00 | 79.93±0.00
Ours        | 96.85±0.00 | 92.86±0.00 | 96.85±0.00 | 93.80±0.00 | 93.73±0.00 | 93.87±0.00 | 93.12±0.00

Table 4: Clustering performance (mean ± standard deviation) on dataset Caltech7 (%).
Method      | ACC        | NMI        | Purity     | F-score    | Precision  | Recall     | ARI
SC(All Fea) | 40.55±3.00 | 29.32±1.00 | 80.01±0.28 | 39.34±2.14 | 68.15±0.29 | 27.70±2.19 | 22.00±1.45
Co-train    | 40.78±3.76 | 33.24±2.76 | 79.86±2.36 | 42.73±3.35 | 75.53±3.44 | 29.82±2.77 | 26.82±3.60
Co-reg      | 46.09±5.43 | 38.86±2.57 | 81.91±1.16 | 48.54±3.91 | 79.53±4.80 | 34.97±3.29 | 32.76±4.44
DiMSC       | 41.72±0.80 | 32.21±0.59 | 76.19±0.68 | 42.42±0.56 | 71.84±0.95 | 30.10±0.70 | 25.45±0.18
WMSC        | 38.92±0.11 | 28.07±0.01 | 79.58±0.00 | 37.78±0.05 | 67.72±0.02 | 26.19±0.04 | 20.73±0.04
AWP         | 58.96±0.00 | 46.25±0.00 | 83.04±0.00 | 61.83±0.00 | 87.56±0.00 | 47.79±0.00 | 47.56±0.00
MCGC        | 55.22±0.00 | 47.00±0.00 | 82.97±0.00 | 58.78±0.00 | 74.21±0.00 | 48.66±0.00 | 40.67±0.00
mPAC        | 54.41±0.00 | 46.24±0.00 | 85.14±0.00 | 57.51±0.00 | 88.68±0.00 | 42.55±0.00 | 43.35±0.00
LMSC        | 53.05±2.90 | 47.01±3.20 | 85.77±1.62 | 55.18±3.26 | 85.28±2.82 | 40.83±3.05 | 40.33±3.64
GMC         | 69.20±0.00 | 58.56±0.00 | 88.47±0.00 | 72.17±0.00 | 88.58±0.00 | 60.88±0.00 | 59.43±0.00
CDG         | 57.15±3.95 | 51.46±2.55 | 86.17±1.06 | 58.88±3.16 | 86.84±1.91 | 44.61±3.49 | 44.37±3.36
Ours        | 70.22±0.00 | 60.66±0.00 | 88.81±0.00 | 72.52±0.00 | 89.05±0.00 | 61.16±0.00 | 59.94±0.00

Table 5: Clustering performance (mean ± standard deviation) on dataset Caltech20 (%).
Method      | ACC        | NMI        | Purity     | F-score    | Precision  | Recall     | ARI
SC(All Fea) | 29.43±1.33 | 33.96±0.66 | 60.00±0.66 | 23.97±0.93 | 47.92±1.51 | 15.98±0.69 | 17.24±0.93
Co-train    | 38.26±2.18 | 47.79±1.18 | 70.83±1.30 | 33.28±2.02 | 66.18±2.82 | 22.23±1.49 | 27.34±2.07
Co-reg      | 43.58±4.18 | 54.91±1.62 | 76.66±1.48 | 39.19±3.25 | 72.24±3.58 | 26.91±2.61 | 33.32±3.32
DiMSC       | 28.69±0.75 | 27.20±0.42 | 54.33±0.45 | 19.52±0.31 | 39.71±0.65 | 12.94±0.22 | 12.52±0.33
WMSC        | 33.26±1.42 | 41.50±0.58 | 67.08±0.52 | 29.83±1.61 | 57.61±3.01 | 20.13±1.12 | 23.38±1.73
AWP         | 51.55±0.00 | 55.90±0.00 | 73.39±0.00 | 50.43±0.00 | 71.62±0.00 | 42.61±0.00 | 47.00±0.00
MCGC        | 47.53±0.00 | 54.57±0.00 | 68.65±0.00 | 40.17±0.00 | 41.74±0.00 | 38.71±0.00 | 29.06±0.00
mPAC        | 55.11±0.00 | 56.90±0.00 | 75.82±0.00 | 51.27±0.00 | 78.68±0.00 | 38.02±0.00 | 45.49±0.00
LMSC        | 44.46±3.42 | 55.32±0.75 | 76.69±0.98 | 37.49±3.14 | 68.73±3.46 | 25.80±2.56 | 31.42±3.21
GMC         | 45.64±0.00 | 38.46±0.00 | 55.49±0.00 | 34.03±0.00 | 22.78±0.00 | 67.28±0.00 | 12.84±0.00
CDG         | 53.35±2.56 | 58.75±1.17 | 76.54±0.61 | 48.41±2.51 | 75.85±3.79 | 35.57±2.14 | 42.43±2.74
Ours        | 66.81±0.00 | 57.44±0.00 | 76.03±0.00 | 51.62±0.00 | 70.96±0.00 | 69.76±0.00 | 49.14±0.00

Figure 1: The consensus graph of data set HW learned by different methods. Best viewed in color.
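As stated in the setup paragraph that follows, the initial graphs G^{(v)} are built from the 20 nearest neighbors of the raw features of each view. A minimal sketch of such an initialization is given below; the Gaussian-kernel weighting and the symmetrization are illustrative assumptions, since the paper only states the use of 20 nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_similarity_graph(X, k=20):
    """k-NN similarity graph on raw features X (n x d). Gaussian weights and
    symmetrization are illustrative choices, not specified in the paper."""
    dist = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    sigma = np.mean(dist[dist > 0])                       # simple global bandwidth
    G = np.where(dist > 0, np.exp(-dist**2 / (2 * sigma**2)), 0.0)
    return (G + G.T) / 2                                  # make the graph symmetric
```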
Seven widely-used metrics are adopted to achieve a comprehensive evaluation: clustering accuracy (ACC), Normalized Mutual Information (NMI), Purity, Precision, Recall, F-score, and Adjusted Rand Index (ARI). Motivated by (Nie, Cai, and Li 2017), we initialize the initial graphs G^{(v)} by selecting the 20 nearest neighbors among the raw data.

Clustering Results
We repeat each experiment 10 times, and the mean values as well as the standard deviations are reported for comparison. Note that the best clustering performance is shown in bold. As shown in Tables 1–5, our approach achieves the best performance in the majority of cases, which verifies the effectiveness of our method. As mentioned before, S can be considered as an indicator matrix, where the points from the same cluster are connected into the same component. Once we obtain the consensus graph, the cluster label of each data point can be directly assigned without any postprocessing. Hence our method is very stable, which is why its standard deviations are always 0.00.

To visualize the effect of the connectivity constraint, we plot the consensus graphs learned by different methods. Taking the dataset HW as an example, as shown in Figure 1, the compared method DiMSC cannot even find the block-diagonal structure of the consensus graph. MCGC is able to find a block-diagonal structure, but the number of diagonal blocks is not correct. LMSC can find the correct number of diagonal blocks, but its graph is seriously corrupted. Our method achieves an almost purely structured consensus graph with a much clearer clustering structure, which properly approximates the ground truth.

Parameter Analysis
This section investigates the clustering performance with respect to different parameter settings. Note that the parameter λ can be tuned in a heuristic way. That is, we initialize λ to a random positive value (e.g., λ = 1); then our model automatically halves or doubles it when the number of connected components of S is greater or smaller than the cluster number c during each iteration. Thus we only need to search the parameters α and β. For simplicity, we search both α and β in the range [0.05, 0.1, 0.5, 1, 2, 5, 10]. Taking the dataset 3sources as an example, Figure 2 shows that the clustering performance of our method is relatively stable under different parameter settings, which demonstrates the robustness of our model. In practice, decent results can be achieved by simply setting α = β = 1.

Figure 2: The clustering performance with respect to different parameter settings.
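A minimal sketch of the λ schedule described above, using scipy to count the connected components of the current consensus graph S; the threshold used to binarize S is an illustrative choice.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def tune_lambda(S, lam, c, tol=1e-8):
    """Heuristic lambda schedule: halve lambda when S has more than c connected
    components, double it when it has fewer (applied once per outer iteration)."""
    n_comp, _ = connected_components(csr_matrix(S > tol), directed=False)
    if n_comp > c:        # too many components -> the rank term is too strong
        return lam / 2.0
    if n_comp < c:        # too few components -> strengthen the rank term
        return lam * 2.0
    return lam
```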
Conclusion
In this paper, we propose to exploit the implied data manifold by learning the topological relationship between data points. Our method coalesces multiple view-wise graphs with the topological relevance taken into account, and learns the weights as well as the consensus graph interactively in a unified framework. Furthermore, we manipulate the consensus graph with a connectivity constraint so that the data points from the same cluster are precisely connected into the same component. To solve the optimization problem of our model, an efficient iterative updating algorithm is proposed. Substantial experiments on real datasets are conducted to validate the effectiveness of the proposed method compared to state-of-the-art algorithms.

Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grants 62106164 and 62003230, the Sichuan Science and Technology Program under Grant 2021ZDZX0011, the Sichuan Key Project of Research and Development Program under Grant 2022YFG0188, and the 111 Project under Grant B21044.

References
Bertsekas, D. P. 1997. Nonlinear programming. Journal of the Operational Research Society, 48(3): 334–334.
Bickel, S.; and Scheffer, T. 2004. Multi-view clustering. In ICDM, 19–26.
Cao, X.; Zhang, C.; Fu, H.; Liu, S.; and Zhang, H. 2015. Diversity-induced multi-view subspace clustering. In CVPR, 586–594.
Chen, Y.; Xiao, X.; and Zhou, Y. 2019. Jointly learning kernel representation tensor and affinity matrix for multi-view clustering. IEEE Transactions on Multimedia, 22(8): 1985–1997.
Fan, K. 1949. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11): 652–655.
Houthuys, L.; Langone, R.; and Suykens, J. A. 2018. Multi-view kernel spectral clustering. Information Fusion, 44: 46–56.
Huang, J.; Nie, F.; and Huang, H. 2015. A new simplex sparse learning model to measure data similarity for clustering. In IJCAI, 3569–3575.
Huang, S.; Kang, Z.; Tsang, I. W.; and Xu, Z. 2019. Auto-weighted multi-view clustering via kernelized graph learning. Pattern Recognition, 88: 174–184.
Huang, S.; Kang, Z.; and Xu, Z. 2020. Auto-weighted multi-view clustering via deep matrix decomposition. Pattern Recognition, 97: 1–11.
Huang, S.; Tsang, I.; Xu, Z.; and Lv, J. C. 2021a. Measuring Diversity in Graph Learning: A Unified Framework for Structured Multi-view Clustering. IEEE Transactions on Knowledge and Data Engineering, 1–15.
Huang, S.; Tsang, I. W.; Xu, Z.; Lv, J.; and Liu, Q. 2021b. CDD: Multi-view Subspace Clustering via Cross-view Diversity Detection. In ACM MM, 2308–2316.
Kang, Z.; Guo, Z.; Huang, S.; Wang, S.; Chen, W.; Su, Y.; and Xu, Z. 2019. Multiple Partitions Aligned Clustering. In IJCAI, 2701–2707.
Kang, Z.; Zhou, W.; Zhao, Z.; Shao, J.; Han, M.; and Xu, Z. 2020. Large-scale multi-view subspace clustering in linear time. In AAAI, 4412–4419.
Kumar, A.; and Daumé, H. 2011. A co-training approach for multi-view spectral clustering. In ICML, 393–400.
Kumar, A.; Rai, P.; and Daumé, H. 2012. Co-regularized multi-view spectral clustering. In NIPS, 1413–1421.
Li, X.; Chen, M.; and Wang, Q. 2019. Adaptive consistency propagation method for graph clustering. IEEE Transactions on Knowledge and Data Engineering, 32(4): 797–802.
Li, Y.; Nie, F.; Huang, H.; and Huang, J. 2015. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, 2750–2756.
Liang, Y.; Huang, D.; and Wang, C.-D. 2019. Consistency meets inconsistency: A unified graph learning framework for multi-view clustering. In ICDM, 1204–1209.
Liu, X.; Zhu, X.; Li, M.; Wang, L.; Tang, C.; Yin, J.; Shen, D.; Wang, H.; and Gao, W. 2018. Late fusion incomplete multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10): 2410–2423.
Minh, H. Q.; Bazzani, L.; and Murino, V. 2016. A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning. Journal of Machine Learning Research, 17: 1–72.
Mohar, B.; Alavi, Y.; Chartrand, G.; Oellermann, O. R.; and Schwenk, A. J. 2001. The Laplacian spectrum of graphs. In Graph Theory, Combinatorics, and Applications, 871–898.
Ng, A. Y.; Jordan, M. I.; and Weiss, Y. 2002. On spectral clustering: analysis and an algorithm. In NIPS, 849–856.
Nie, F.; Cai, G.; and Li, X. 2017. Multi-view clustering and semi-supervised classification with adaptive neighbours. In AAAI, 2408–2414.
Nie, F.; Tian, L.; and Li, X. 2018. Multiview clustering via adaptively weighted procrustes. In KDD, 2022–2030.
Nie, F.; Zhang, H.; Wang, R.; and Li, X. 2020. Multi-view clustering: A scalable and parameter-free bipartite graph fusion method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–15.
Roweis, S. T.; and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500): 2323–2326.
Tang, C.; Liu, X.; Zhu, X.; Zhu, E.; Luo, Z.; Wang, L.; and Gao, W. 2020. CGD: Multi-view clustering via cross-view graph diffusion. In AAAI, 5924–5931.
Tzortzis, G.; and Likas, A. 2012. Kernel-based weighted multi-view clustering. In ICDM, 675–684.
Wang, H.; Yang, Y.; and Liu, B. 2019. GMC: Graph-based multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 32(6): 1116–1129.
Wang, H.; Yang, Y.; and Liu, B. 2020. GMC: Graph-based Multi-view Clustering. IEEE Transactions on Knowledge and Data Engineering, 32(6): 1116–1129.
Wang, Q.; Chen, M.; and Li, X. 2017. Quantifying and detecting collective motion by manifold learning. In AAAI, 4292–4298.
Zhan, K.; Nie, F.; Wang, J.; and Yang, Y. 2019. Multiview Consensus Graph Clustering. IEEE Transactions on Image Processing, 28(3): 1261–1270.
Zhang, C.; Fu, H.; Hu, Q.; Cao, X.; Xie, Y.; Tao, D.; and Xu, D. 2020. Generalized Latent Multi-View Subspace Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1): 86–99.
Zhang, Z.; Wang, J.; and Zha, H. 2011. Adaptive manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2): 253–265.
Zong, L.; Zhang, X.; Liu, X.; and Yu, H. 2018. Weighted multi-view spectral clustering based on spectral perturbation. In AAAI, 4621–4628.