# k-Hyper Edge Medoids for Clustering Ensemble

Feijiang Li1,2, Jieting Wang1,2, Liuya Zhang1, Yuhua Qian1,2*, Shuai Jin1, Tao Yan1,2, Liang Du1,2
1Institute of Big Data Science and Industry, Shanxi University
2Key Laboratory of Evolutionary Science Intelligence of Shanxi Province, Taiyuan, Shanxi, China
jinchengqyh@126.com

Abstract

Clustering ensemble has been a popular research topic in data science due to its ability to improve the robustness of a single clustering method. Many clustering ensemble methods have been proposed, most of which can be categorized into clustering-view and sample-view methods. The clustering-view methods are generally efficient, but they can be affected by the unreliability present in the base clustering results. The sample-view methods show good performance, but the construction of the pairwise sample relation is time-consuming. In this paper, the clustering ensemble is formulated as a k-Hyper Edge Medoids discovery problem, and a clustering ensemble method based on k-Hyper Edge Medoids that considers the characteristics of the above two types of clustering ensemble methods is proposed. In the method, a set of hyperedges is first selected efficiently from the clustering view; the hyperedges are then diffused and adjusted from the sample view, guided by a hyperedge loss function, to construct an effective k-Hyper Edge Medoid set. The loss function is mainly reduced by assigning samples to the hyperedge with the highest degree of belonging. Theoretical analyses show that the solution can approximate the optimal one, that the assignment method gradually reduces the loss function, and that the estimation of the belonging degree is statistically reasonable. Experiments on artificial data show the working mechanism of the proposed method. The convergence of the method is verified by experimental analysis on twenty data sets.
The effectiveness and efficiency of the proposed method are also verified on these data sets, with nine representative clustering ensemble algorithms as references.

Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Data clustering is a technique for discovering the group structure inherent in a set of data samples without labels (Vega-Pons and Ruiz-Shulcloper 2011). The general objective of data clustering is to discover a set of clusters in which samples in the same cluster share high similarity while samples in different clusters are highly diverse. As data have grown more complex, many traditional clustering methods suffer from a robustness problem. To handle this, the clustering ensemble technique, which combines multiple diverse clustering results, has proven to be an effective strategy for improving the robustness of a single clustering result. The clustering ensemble problem was first proposed by Strehl and Ghosh (Strehl and Ghosh 2002), who described it as combining multiple clustering results of a set of objects into a single consolidated clustering that shares the most information with the ensemble base, without accessing the features of the data samples or the algorithms that produced these clustering results. Following this description, many methods have been proposed to solve the clustering ensemble problem. Most of the existing clustering ensemble methods can be roughly categorized into clustering-view and sample-view methods. The main idea of the clustering-view methods is to generate a clustering result that is similar to the clustering results in the ensemble base. To generate such a clustering result, a simple strategy is to first align the labels of the different clustering results and then merge them using a fusion strategy, such as voting.
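As a concrete illustration of this align-then-vote strategy, the sketch below relabels each base result against a reference and then takes a per-sample majority. It is a generic sketch, not any of the specific methods cited in this paper: the function names are hypothetical, and the greedy maximum-overlap alignment stands in for an optimal (e.g. Hungarian) matching.

```python
from collections import Counter

def align_labels(reference, labels):
    """Greedily relabel `labels`: each of its clusters takes the
    reference label it overlaps most (a stand-in for optimal matching)."""
    mapping = {}
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        overlap = Counter(reference[i] for i in members)
        mapping[c] = overlap.most_common(1)[0][0]
    return [mapping[lab] for lab in labels]

def vote(base_results):
    """Majority vote after aligning every result to the first one."""
    ref = base_results[0]
    aligned = [ref] + [align_labels(ref, r) for r in base_results[1:]]
    n = len(ref)
    return [Counter(r[i] for r in aligned).most_common(1)[0][0]
            for i in range(n)]

base = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]  # three base clusterings
print(vote(base))  # prints [0, 0, 1, 1]
```

Note that the second base result is identical to the first up to a label permutation, which the alignment step absorbs before voting.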
Another strategy is to define a clustering similarity measure and then use a dynamic programming method to produce the integrated result. Generally, because the number of clustering results is much smaller than the number of samples, most of the clustering-view methods are efficient. However, the unreliability present in the base clustering results has a negative effect on the ensemble performance (Zhou et al. 2022). The main idea of the sample-view methods is to generate a clustering result in which the samples in the same cluster share high similarity. There are two issues for this type of method: how to evaluate the similarity between samples, and how to discover such a clustering result. For the first issue, the co-association relation, which counts the number of times two samples appear in the same cluster, is one of the most widely used sample-pair similarity measures (Li, Qian, and Wang 2021). For the second issue, many partition methods have been proposed, using clustering strategies (Liu et al. 2017), graph partition strategies (Strehl and Ghosh 2002), or heuristic strategies (Li et al. 2019). This type of method has shown excellent effectiveness in handling the clustering ensemble problem. However, calculating the pairwise sample relationships and optimizing a clustering loss is often time-consuming. With consideration of the characteristics of the above two types of clustering ensemble methods, in this paper, a method based on k-Hyper Edge Medoids is developed, which integrates the clustering view and the sample view.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

First, the description of k-Hyper Edge Medoids is given. For a hypergraph, the k-Hyper Edge Medoids are k non-overlapping hyperedges such that each hyperedge in the hypergraph can find a similar hyperedge among the k-Hyper Edge Medoids. Then the clustering ensemble problem is formulated as a k-Hyper Edge Medoids discovery problem.
We propose a method to solve the problem and conduct theoretical and experimental analyses of the method. The main contributions of this paper in terms of method, theory, and experiments are as follows.

Method. A k-Hyper Edge Medoids discovery method is proposed.[1] The method mainly contains three steps: k-Hyper Edge initialization, k-Hyper Edge diffusion, and k-Hyper Edge adjustion. The k-Hyper Edge initialization step works from the clustering view and utilizes the k-medoids algorithm to discover a set of sub-hyperedge medoids. The k-medoids algorithm supports arbitrary distance measures and is robust to noise (Tiwari et al. 2020). The k-Hyper Edge diffusion and k-Hyper Edge adjustion steps improve the quality of the initial hyperedges from the sample view. The improvement is realized by updating the sample assignment guided by a hyperedge loss function based on cluster quality. The loss function is mainly reduced by assigning samples to the hyperedge with the highest degree of belonging.

Theory. Three main issues in the proposed method are theoretically analyzed. Firstly, given the initial hyperedge set, the solution is proven to be able to approximate the optimal one, indicating its ability to guide model learning. Secondly, assigning samples to the hyperedge with the highest belonging degree is proven to reduce the value of the loss function. Finally, the method for estimating the belonging degree is proven to be statistically reasonable.

Experiment. The proposed method is experimentally analyzed from three aspects. The working mechanism of the proposed method is illustrated on artificial data. The effectiveness of the method in solving the clustering ensemble problem is illustrated on twenty real-world data sets in comparison with nine representative clustering ensemble methods. The time complexity of the proposed method is analysed, and the time costs of the compared methods on real data sets are reported.
The results show the effectiveness and efficiency of the proposed method.

Preliminaries and Related Work

Clustering Ensemble

In the clustering ensemble setting, what we can obtain is a set of clustering results $\Pi = \{\pi_1, \pi_2, \ldots, \pi_l\}$ on a data set $X = \{x_1, x_2, \ldots, x_n\}$, where $l$ is the number of base clustering results and $n$ is the number of data points. The $j$th clustering result contains $k_j$ clusters, $\pi_j = \{c^j_1, c^j_2, \ldots, c^j_{k_j}\}$, with $X = \bigcup_{i=1}^{k_j} c^j_i$. The cluster label of $x_i$ in $\pi_j$ is $\pi_j(x_i)$. In $\Pi$, there are $n_c = \sum_{j=1}^{l} k_j$ base clusters. The task of the clustering ensemble is to discover a clustering result $\pi^* = \{c^*_1, c^*_2, \ldots, c^*_k\}$ that shares the most information with $\Pi$, where $k$ is the expected number of clusters. In the original description of the clustering ensemble, the generation of $\pi^*$ does not access the features or algorithms that determined these clustering results (Strehl and Ghosh 2002). This paper follows this description. Many clustering ensemble methods have been proposed, accompanied by various classification schemes for these methods (Vega-Pons and Ruiz-Shulcloper 2011). From the view of processing, most of the clustering ensemble methods can be roughly categorized into clustering-view and sample-view methods.

[1] Demo code is at https://github.com/FeijiangLi/Code-kHyperEdge-Medoids-for-Clustering-Ensemble-AAAI

Clustering-view Clustering Ensemble

The clustering ensemble methods from the clustering view are mainly guided by the objective that the integrated clustering should be similar to the base clustering results. The objective can be expressed as follows:

$\pi^* = \arg\max_{\pi} \sum_{\pi' \in \Pi} \mathrm{sim}(\pi, \pi')$,  (1)

where $\mathrm{sim}(\pi, \pi')$ is a similarity between $\pi$ and $\pi'$. The optimization of this objective is an NP-hard problem, so most of the proposed methods obtain an approximate solution. One direct approach is to first align the cluster labels and then merge them using a voting strategy (Zhou and Tang 2006). Ayad et al.
(Ayad and Kamel 2008, 2010) introduced a cumulative voting idea that utilizes a probabilistic mapping method for cluster label alignment and computes an empirical probability summarizing the ensemble. Carpineto et al. (Carpineto and Romano 2012) proposed a probabilistic rand index (PRI) and cast clustering ensemble as an optimization problem of the PRI between a target result and the ensemble base. Li et al. (Li et al. 2017) proposed a Dempster-Shafer evidence theory-based clustering ensemble method that substitutes voting with evidence theory. Huang et al. (Huang, Gao, and Akhavan 2023) proposed an ensemble hierarchical clustering algorithm based on merits at the cluster and partition levels, the main idea of which is to re-cluster the selected clusters to generate a consensus clustering. Guilbert et al. (Guilbert et al. 2022) proposed an anchored constrained clustering ensemble method that constructs an allocation matrix and uses an integer linear programming model to find the consensus partition. Generally, these methods are simple and efficient. However, their performance relies on the reliability of the base clustering results. Due to the diversity among the base clustering results, low-reliability results inevitably exist in the ensemble, which can affect the performance of many clustering-view clustering ensemble methods.

Sample-view Clustering Ensemble

The clustering ensemble methods from the sample view are mainly guided by the objective that the intra-cluster samples in the integrated clustering should be similar to each other. This objective is equivalent to finding a clustering result with high cluster quality, where cluster quality is measured by the consistency among intra-cluster samples. The objective can be expressed as follows:

$\pi^* = \arg\max_{\pi} \sum_{c_i \in \pi} Q(c_i)$,  (2)

where $Q(c_i)$ indicates the quality of $c_i$. From the sample view, $Q(c_i)$ is generally evaluated by the consistency between samples in $c_i$.
This type of method is generally realized by optimizing a clustering loss function. Fred and Jain (Fred and Jain 2005) first proposed the concept of the co-association (CA) matrix and proposed an evidence accumulation method that utilizes the idea of hierarchical clustering. The spectral ensemble clustering method proposed by Liu et al. utilizes the spectral clustering method to optimize the clustering loss (Liu et al. 2017). Iam-On et al. (Iam-On et al. 2011) proposed a type of link-based approach to refine the cluster-based matrix, with k-means, PAM, and graph partitioning algorithms used to generate the final result. Graph partitioning techniques have been used in many clustering ensemble methods to optimize the clustering loss, such as CSPA, HGPA, MCLA (Strehl and Ghosh 2002), HBGF (Fern and Brodley 2004), PTGP (Huang, Lai, and Wang 2016), U-SENC (Huang et al. 2020), SCCBG (Zhou, Du, and Li 2020), etc. Hao et al. (Hao et al. 2024) proposed a clustering ensemble method with attentional representation that jointly optimizes the clustering loss and the reconstruction loss. In addition, many enriching methods have been proposed to improve the relation between samples, such as the probability trajectory proposed by Huang et al. (Huang, Lai, and Wang 2016), the propagation of cluster-wise similarities with meta-clusters proposed by Huang et al. (Huang et al. 2021), the co-association matrix self-enhancement proposed by Jia et al. (Jia et al. 2023), the dense representation proposed by Zhou et al. (Zhou, Zheng, and Pan 2019), the stable clustering ensemble method based on evidence theory proposed by Fu et al. (Fu et al. 2022), etc. This type of method has shown excellent ensemble performance. However, the construction of the co-association relationship is time-consuming, as are many of the enriching methods, whose time cost is generally at least $O(n^2)$, where $n$ is the number of samples.
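The co-association matrix at the heart of many sample-view methods can be sketched in a few lines. This is a minimal illustration (toy data, hypothetical function name); the explicit double loop makes the at-least $O(n^2)$ cost per base clustering visible.

```python
def co_association(base_results):
    """Build the n x n co-association matrix: entry (i, j) is the
    fraction of base clusterings that place samples i and j together."""
    n = len(base_results[0])
    l = len(base_results)
    ca = [[0.0] * n for _ in range(n)]
    for labels in base_results:      # one pass per base clustering
        for i in range(n):           # O(n^2) pairs each pass
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0 / l
    return ca

base = [[0, 0, 1], [0, 1, 1]]        # two base clusterings of 3 samples
ca = co_association(base)
print(ca[0][1])  # prints 0.5: samples 0 and 1 co-occur in 1 of 2 clusterings
```

Sample-view methods then cluster this matrix (hierarchically, spectrally, or via graph partitioning), which is what makes them effective but expensive.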
Although some methods utilize representative-point selection or downsampling strategies for acceleration, this affects the stability and effectiveness of the methods. In this paper, considering the characteristics of the above two types of clustering ensemble methods, we propose a clustering ensemble method based on k-Hyper Edge Medoids.

k-Hyper Edge Medoids

It is natural to represent a set of clustering results as a hypergraph (Zhou et al. 2022). In this section, we introduce the k-Hyper Edge Medoids. A hypergraph $G$ can be represented as $G = \{V, E, W\}$, where $V$ is the node set, $E$ is the hyperedge set, and $W$ is the set of hyperedge weights. Differently from a traditional edge in a graph, a hyperedge links a set of nodes and can therefore represent complex multivariate relationships more flexibly. For example, in a hypergraph with node set $V = \{x_1, x_2, x_3, x_4, x_5\}$, a hyperedge $e$ can connect multiple nodes, such as $e = \{x_1, x_2, x_5\}$, representing a common relationship or connection among $x_1$, $x_2$, and $x_5$. Given a set of clustering results $\Pi = \{\pi_1, \pi_2, \ldots, \pi_l\}$ on a data set $X = \{x_1, x_2, \ldots, x_n\}$, from the cluster view, $\Pi$ can be expressed as $\Pi_c = \{c^1_1, c^1_2, \ldots, c^1_{k_1}, \ldots, c^l_1, c^l_2, \ldots, c^l_{k_l}\}$. For convenience, we denote

$\Pi_c = \{c_1, c_2, \ldots, c_{n_c}\}$.  (3)

The hypergraph representation of $\Pi_c$ is $G(\Pi_c) = \{V(\Pi_c), E(\Pi_c), W(\Pi_c)\}$, where the node set is the set of data points, $V(\Pi_c) = X$, the hyperedge set is the set of clusters, $E(\Pi_c) = \Pi_c$, and $W$ contains the weight of each cluster. The cluster weight can be quantified by cluster quality evaluation. In this paper, the cluster weights are not a focus and are set equally.

Definition 1. (k-Hyper Edge Medoids) For a hypergraph $G = \{V, E, W\}$, the k-Hyper Edge Medoids $E_m(G)$ are $k$ hyperedges

$E_m(G) = \arg\max_{E'} \sum_{e \in E} \max_{e' \in E'} \mathrm{sim}(e, e')$,  (4)

subject to $\bigcup_{e \in E_m} e = V$ and $\forall e_i, e_j \in E_m,\ i \neq j,\ e_i \cap e_j = \emptyset$.

From Definition 1, it is known that the k-Hyper Edge Medoids are $k$ representative hyperedges that maximize the overall similarity with all hyperedges in the hypergraph.
Additionally, the k-Hyper Edge Medoids are non-overlapping and collectively cover all nodes. From the view of clustering ensemble, each cluster can be treated as a hyperedge, and multiple clusters form a hypergraph. Then the k-Hyper Edge Medoids of $G(\Pi_c)$ can be treated as a clustering ensemble result of $\Pi_c$, and the clustering ensemble problem can be formulated as a k-Hyper Edge Medoids discovery problem.

A k-Hyper Edge Medoids Discovery Method

In this section, a heuristic k-Hyper Edge Medoids discovery method is proposed to approximately solve Formula (4). The method mainly contains three steps: k-hyperedge initialization, k-hyperedge diffusion, and k-hyperedge adjustion. The main framework of the k-Hyper Edge Medoids discovery method is shown in Figure 1. The processes of each step are introduced below.

k-Hyper Edge Initialization

The objective of the k-hyperedge initialization step is to find a set of $k$ hyperedges from $E(\Pi_c) = \{c_1, c_2, \ldots, c_{n_c}\}$, i.e., $E_i = \{e_{i1}, e_{i2}, \ldots, e_{ik}\} \subseteq E$, that minimizes the following loss:

$\sum_{i=1}^{n_c} \min_{e \in E_i} \left(1 - \mathrm{sim}(e, c_i)\right)$,  (5)

where $\mathrm{sim}(e, c_i)$ is a similarity measure between a selected hyperedge $e$ and the $i$th hyperedge $c_i$. To prevent the discovery of an excessively large hyperedge, in this step we use the following similarity measure:

$\mathrm{sim}(e, c_i) = \frac{|e \cap c_i|}{|e \cup c_i|}$.  (6)

Figure 1: The framework of the k-Hyper Edge Medoids discovery method. (k-Hyper Edge Initialization: representing hyperedges, running k-medoids, removing overlaps; k-Hyper Edge Diffusion: iteratively assigning samples; k-Hyper Edge Adjustion: iteratively improving hyperedge quality.)

As shown by Equation (5), what we want to find is $k$ hyperedges in which each hyperedge in $E(\Pi_c)$ can find a similar representative. Equation (5) is equivalent to the loss function of the k-medoids algorithm, so the k-medoids algorithm can be directly utilized to solve (5). Meanwhile, the k-medoids algorithm has several advantages in this step.
Firstly, the medoids are real points, which are interpretable. Secondly, k-medoids is not sensitive to noise, which can mitigate the negative effects of unreliable clusters. Thirdly, the distance or consistency function in k-medoids can be arbitrary, which makes it possible to adopt Equation (6) (Tiwari et al. 2020). After running the k-medoids algorithm on $E(\Pi_c)$, $k$ hyperedges are obtained, denoted $E_i = \{e_{i1}, e_{i2}, \ldots, e_{ik}\}$. Because the selected hyperedges may come from different clusterings, they may satisfy neither $\bigcup_{e \in E_i} e = V$ nor $\forall e_{ii}, e_{ij} \in E_i,\ i \neq j,\ e_{ii} \cap e_{ij} = \emptyset$. To satisfy the disjointness condition, we remove the intersection regions between different hyperedges and regenerate $E_i = \{e_{i1}, e_{i2}, \ldots, e_{ik}\}$. To satisfy the coverage condition, we conduct the following hyperedge diffusion step. The processes of the k-Hyper Edge initialization step are shown as Algorithm 1 in Appendix A.1.[2]

[2] The Appendix is in (Li et al. 2024).

k-Hyper Edge Diffusion

The purpose of the k-Hyper Edge diffusion step is to make each sample in $X$ belong to a hyperedge. This step is realized by iteratively assigning each sample to the hyperedge to which it is most likely to belong. Firstly, in this step, we initialize $E_d = E_i$. In each iteration, a set of samples with high belonging confidence is reassigned to the $k$ initialized hyperedges $E_d = \{e_{d1}, e_{d2}, \ldots, e_{dk}\}$. As the iterations progress, the number of samples to be assigned gradually increases. The diffusion step stops when the number of assigned samples equals the number of data points. The calculation of the sample confidence and of the number of samples to be assigned is introduced next. Firstly, given $E(\Pi_c)$, the belonging degree of a sample $x$ to a hyperedge $e_d$ is defined as the average similarity between $e_d$ and the hyperedges that $x$ belongs to in $E(\Pi_c)$. The equation is

$b(x, e_d) = \sum_{c_x \ni x} |c_x \cap e_d|$,  (7)

where $c_x \in E(\Pi_c)$. Based on Equation (7), the belonging degrees of a sample to all the hyperedges in $E_d$ can be obtained.
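A minimal sketch of this belonging degree (Equation (7)), with base clusters and hyperedges represented as plain index sets; the toy data and function name are hypothetical:

```python
def belonging(x, ed, base_clusters):
    """Equation (7): sum of |c_x ∩ e_d| over the base clusters c_x
    that contain sample x."""
    return sum(len(c & ed) for c in base_clusters if x in c)

# two base clusterings of samples {0,...,3}, flattened into clusters
base_clusters = [{0, 1}, {2, 3}, {0, 1, 2}, {3}]
ed = {0, 1}  # a candidate hyperedge
print(belonging(0, ed, base_clusters))  # prints 4: 2 from {0,1} plus 2 from {0,1,2}
```

Sample 0 appears in two base clusters, and each contributes its overlap with the hyperedge, so a sample whose base clusters agree with a hyperedge accumulates a large belonging degree.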
Then, the hyperedge to which sample $x_i$ belongs is obtained by

$e_d^*(x_i) = \arg\max_{e_d \in E_d} b(x_i, e_d)$.  (8)

The sample $x_i$ is assigned to $e_d^*(x_i)$ with a confidence degree of

$m(x_i) = b(x_i, e_d^*(x_i)) - \max_{e_{dj} \in E_d,\, e_{dj} \neq e_d^*(x_i)} b(x_i, e_{dj})$,  (9)

which is the difference between the maximum and the second maximum degrees of belonging. With Equation (9), the confidence of every sample in $X$ can be obtained, and the samples with high confidence are selected and re-assigned to the current hyperedge set. Assuming $E_d$ is the present hyperedge set, the number of samples in $E_d$ is

$n_a = \sum_{e_d \in E_d} |e_d|$.  (10)

With the overall consideration of $n_a$ and $n$, the number of samples $n_s$ to be selected is calculated as

$n_s = \min\left\{n_a + \max\left\{\frac{n - n_a}{k}, \sqrt{n}\right\},\ n\right\}$.  (11)

For an edge $e_d$, the diffusion is realized by

$e_d = \{x \mid e_d^*(x) = e_d,\ m(x) \geq m_{(n_s)}(X)\}$,  (12)

where $m_{(n_s)}(X)$ is the $n_s$-th largest element in the set $\{m(x_1), m(x_2), \ldots, m(x_n)\}$. Repeating the above diffusion process, the number of samples in $E_d$ increases. When $n_a = n$, the k-Hyper Edge diffusion process stops, and the hyperedge set obtained after this step is denoted $E_d = \{e_{d1}, e_{d2}, \ldots, e_{dk}\}$, which satisfies $\bigcup_{e_d \in E_d} e_d = V$ and $\forall e_{di}, e_{dj} \in E_d,\ i \neq j,\ e_{di} \cap e_{dj} = \emptyset$. The processes of the k-Hyper Edge diffusion step are shown as Algorithm 2 in Appendix A.1. Further, the quality of $E_d$ is improved by the following k-Hyper Edge adjustion step.

k-Hyper Edge Adjustion

After all the samples in $X$ have been assigned, we further improve the quality of the hyperedge set from the sample view according to Equation (2). Assuming the hyperedge set is $E_a = \{e_{a1}, e_{a2}, \ldots, e_{ak}\}$, the quality of $e_{ai}$ is

$Q(e_{ai}) = \sum_{x \in e_{ai}} b(x, e_{ai})$.  (13)

The loss function of the k-Hyper Edge adjustion step is

$L_{adj}(E_a) = \sum_{i=1}^{k} \left(n|e_{ai}| - Q(e_{ai})\right) = \sum_{i=1}^{n} \left(n - b(x_i, e_a^*(x_i))\right)$.  (14)

The loss value can be iteratively reduced by updating $E_a$ until convergence. Assuming the hyperedge set in the $t$-th iteration is $E_{at} = \{e_{at1}, e_{at2}, \ldots, e_{atk}\}$, the belonging degree of a sample $x_i$ to a hyperedge $e_{atj}$ in $E_{at}$ can be calculated by Equation (7), denoted $b(x_i, e_{atj})$.
Then, the hyperedge to which $x_i$ belongs in the $(t+1)$-th iteration is

$e_{a(t+1)}^*(x_i) = \arg\max_{e_{atj} \in E_{at}} b(x_i, e_{atj})$.  (15)

The $i$-th hyperedge is updated by

$e_{a(t+1)i} = \{x \mid e_{a(t+1)}^*(x) = e_{ati},\ x \in X\}$.  (16)

The $k$ updated hyperedges constitute the hyperedge set $E_{a(t+1)}$. The adjustion stops when $E_{a(t+1)} = E_{at}$. The processes of the k-Hyper Edge adjustion step are shown as Algorithm 3 in Appendix A.1. The final hyperedge set is the discovered k-Hyper Edge Medoids, denoted $E_m = \{e_1, e_2, \ldots, e_k\}$. The corresponding final clustering ensemble result is

$\pi^* = \{c_1 = e_1, c_2 = e_2, \ldots, c_k = e_k\}$.  (17)

We denote the above clustering ensemble method based on the construction of k-Hyper Edge Medoids as CEHM. The processes of CEHM are shown as Algorithm 4 in Appendix A.1.

Theoretical Analysis

In this section, we give three theorems about the key elements of the method to show its rationality. First, given a g-approximate initial hyperedge set $E_{a0}$, we prove that the solution can approximate the optimal one, as reflected by the following theorem.

Theorem 1. Let $E_{a0} = \{e_{a01}, e_{a02}, \ldots, e_{a0k}\}$ be a set of hyperedges produced by a g-approximate algorithm based on a base cluster set $\Pi_c$, and let

$\mathrm{dist}(e_{ai}, e_{aj}) = n - |e_{ai} \cap e_{aj}|$,  (18)

$\phi(e_{ai}; e_{aj}) = \sum_{x \in e_{aj}} \left(n - b(x, e_{ai})\right) = \sum_{x \in e_{aj}} \left(n - \sum_{c_x \ni x} |c_x \cap e_{ai}|\right)$.  (19)

Then, for any hyperedge set $E_a = \{e_{a1}, e_{a2}, \ldots, e_{ak}\}$ and any $e_{ai}$, there exists $e_{a0i}$ that satisfies $\mathrm{dist}(e_{ai}, e_{a0i}) \leq (g+1)\,\phi(e_{ai};\,\cdot\,)$.

Secondly, we prove that assigning samples to the hyperedge with the highest belonging degree reduces the value of the loss function.

Theorem 2. Assume the optimal hyperedge of sample $x$ is the $i$-th hyperedge, $e_a^*(x) = e_{ai}$; assume a hyperedge set $E_{a1} = \{e_{a11}, e_{a12}, \ldots, e_{a1k}\}$ in which $x$ is in the $i$-th hyperedge, $x \in e_{a1i}$; and assume another hyperedge set $E_{a2}$ based on $E_{a1}$ in which $x$ is moved from the $i$-th hyperedge into the $j$-th hyperedge: $e_{a2i} = e_{a1i} \setminus \{x\}$, $e_{a2j} = e_{a1j} \cup \{x\}$, and $e_{a2p} = e_{a1p}$ for $p \in \{1, 2, \ldots, k\} \setminus \{i, j\}$. Then we have $L_{adj}(E_{a1}) \leq L_{adj}(E_{a2})$.
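A toy numerical check of this behavior: reassigning every sample to its highest-belonging hyperedge, as in the adjustion sweep of Equations (15) and (16), does not increase the loss of Equation (14). The data below are hypothetical and the function names are illustrative.

```python
def belonging(x, e, base_clusters):
    # Equation (7): sum of |c_x ∩ e| over base clusters containing x
    return sum(len(c & e) for c in base_clusters if x in c)

def loss(hyperedges, base_clusters, n):
    # Equation (14): sum over samples of n - b(x, e(x))
    return sum(n - belonging(x, e, base_clusters)
               for e in hyperedges for x in e)

def adjust(hyperedges, base_clusters, n):
    """One adjustion sweep: move every sample to its argmax-belonging
    hyperedge, computed against the current hyperedge set."""
    new = [set() for _ in hyperedges]
    for x in range(n):
        j = max(range(len(hyperedges)),
                key=lambda j: belonging(x, hyperedges[j], base_clusters))
        new[j].add(x)
    return new

base_clusters = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3}, {4, 5}]
edges = [{0, 1, 3}, {2, 4, 5}]        # a deliberately poor assignment
better = adjust(edges, base_clusters, 6)
assert loss(better, base_clusters, 6) <= loss(edges, base_clusters, 6)
print(better)
```

On this toy input the sweep recovers the two groups suggested by the base clusters, and the loss drops accordingly.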
Thirdly, we prove that using neighbors to estimate the belonging degree is statistically reasonable. To evaluate the similarity between $x$ and $e_a$, Equation (7) uses the number of samples in the intersection of the cluster $c_x$ and $e_a$. Indeed, to estimate the posterior probability $\eta(X) = P(x \in e_a \mid X = x)$, the essence of Equation (7) is to use the empirical conditional posterior probability

$\hat{\eta}_N(x) = \frac{1}{N_{c_x}} \sum_{i: X_i \in c_x} \mathbb{I}\{X_i \in e_a\}$,  (20)

where $N_{c_x}$ is the number of samples in $c_x$. To evaluate the estimator $\hat{\eta}_N(x)$, we explore the upper bound of the estimation error $\mathbb{E}|\hat{\eta}_N(X) - \eta(X)|$, as reflected by the following theorem.

Theorem 3. Consider the estimator $\hat{\eta}_N(X)$ of $\eta(X)$ defined above. Suppose that the second moment of $\eta(X)$ on $e_a$ exists, $\mathbb{E}_{x \in e_a}\,\eta(X)^2 < +\infty$, and that the difference in probability density between the set $e_a$ and the whole set $X$ is bounded; that is, there exists $C$ such that $\mathbb{E}_{x \in e_a}\,|m(x \mid x \in X) - m(x \mid x \in e_a)|^2 < C$. Then $\mathbb{E}|\hat{\eta}_N(X) - \eta(X)| \to 0$ if

$\frac{1}{N_{c_x}} \to 0$,  (21)

$\mathbb{E}_{x \in e_a}\left|\frac{m(x \mid x \in c_x)}{m(x \mid x \in e_a)} - 1\right|^2 \to 0$,  (22)

$\mathbb{V}_{x \in e_a}(\eta(X)) \to 0$,  (23)

where $m(\cdot)$ is the probability density function and $\mathbb{V}$ is the variance operator. Theorem 3 gives three intuitively reasonable and principled rules for designing a good estimator. First, the ensemble generation method should guarantee that each $c_x$ has many samples, as suggested by Formula (21). Second, the sets $c_x$ and $e_a$ should have high similarity, as suggested by Formula (22); in our method, according to Equation (7) and Equation (15), there are many samples in the intersection of $e_a$ and $c_x$, which follows this suggestion. Third, for an ideal k-Hyper Edge Medoids discovery problem, the posterior probabilities in $e_a$ should be concentrated, as suggested by Formula (23). The proofs of the above theorems are given in Appendix A.2 to A.4, respectively.

Experimental Analyses

The CEHM method is experimentally analyzed from three aspects: demonstration of the working mechanism on artificial data, convergence demonstration on real data, and ensemble performance comparison with representative methods.
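For intuition before the experiments, the estimator in Equation (20) is simply the fraction of $x$'s base cluster that falls inside $e_a$. A minimal sketch with hypothetical index sets:

```python
def eta_hat(cx, ea):
    """Equation (20): estimate P(x in e_a | X = x) as the fraction of
    the base cluster c_x (the cluster containing x) that lies in e_a."""
    return sum(1 for xi in cx if xi in ea) / len(cx)

cx = {0, 1, 2, 3}       # the base cluster containing x
ea = {0, 1, 2, 7, 8}    # a candidate hyperedge
print(eta_hat(cx, ea))  # prints 0.75: 3 of the 4 cluster members lie in ea
```

Theorem 3's three rules map directly onto this sketch: a larger `cx` gives a steadier average, and a `cx` that mostly sits inside `ea` makes the empirical fraction a faithful posterior estimate.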
The experimental analyses are conducted on a PC-WIN64 computer with 64 GB of memory.

| Number | Data name | n | d | k |
|---|---|---|---|---|
| 1 | iris | 150 | 4 | 3 |
| 2 | wine | 178 | 13 | 3 |
| 3 | seeds | 210 | 7 | 3 |
| 4 | heart | 294 | 12 | 2 |
| 5 | soybean-train | 307 | 35 | 18 |
| 6 | ecoli | 336 | 7 | 8 |
| 7 | dermatology | 366 | 34 | 6 |
| 8 | low-res-spect | 531 | 100 | 9 |
| 9 | breast-cancer-wisc-diag | 569 | 30 | 2 |
| 10 | energy | 768 | 8 | 3 |
| 11 | lbp-riu-gris | 1022 | 25 | 3 |
| 12 | semeion | 1593 | 256 | 10 |
| 13 | statlog-landsat-test | 2000 | 36 | 6 |
| 14 | cardiotocography-3clases | 2126 | 21 | 3 |
| 15 | statlog-landsat-train | 4435 | 36 | 6 |
| 16 | twonorm | 7400 | 20 | 2 |
| 17 | mushroom | 8124 | 21 | 2 |
| 18 | statlog-shuttle | 43500 | 9 | 7 |
| 19 | Pen Digits | 10992 | 16 | 10 |
| 20 | USPS | 11000 | 256 | 10 |

Table 1: Description of the data sets

For a data set, the clustering result set is generated by running the k-means algorithm multiple times with different initial centers and different cluster numbers. For a data set with $n$ samples and $k$ clusters, the numbers of clusters in the base clustering results are randomly selected in the range $\left[k,\ \max\{\min\{\sqrt{n}, 50\}, \frac{3}{2}k\}\right]$. This setting improves the diversity between the clustering results in the ensemble base.

Working Mechanism

To show the working mechanism of CEHM, four two-dimensional artificial data sets are utilized. The information about these data sets is shown in Table 3 in Appendix A.5. Figure 2 and Figures 5 to 7 in Appendix A.5 show some of the intermediate results of CEHM on these data sets. In these figures, the data points that are not in the hypergraph are shown as gray dots, $E_i$ indicates the result of the k-Hyper Edge initialization step, $E_{d(1)}$ and $E_{d(2)}$ are two results of the k-Hyper Edge diffusion step, $E_{a(1)}$ and $E_{a(2)}$ are two results of the k-Hyper Edge adjustion step, and $E_m$ is the discovered k-Hyper Edge Medoids. From Figure 2 and Figures 5 to 7, it can be seen that the operation process conforms to the expectations of each stage.

Convergence

The convergence of CEHM is analyzed on twenty widely used benchmark data sets. The information about these data sets is shown in Table 1.
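The base clustering generation protocol described above can be sketched as follows. This is a simplified 1-D illustration: `kmeans` is a minimal Lloyd's iteration rather than a production implementation, and the upper bound of the cluster-number range follows the partially garbled formula as reconstructed here, so treat it as an assumption.

```python
import random

def kmeans(points, k, iters=20):
    """Minimal 1-D Lloyd's k-means; returns a list of cluster labels."""
    centers = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda j: abs(p - centers[j]))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = sum(members) / len(members)
    return labels

def generate_base(points, k_true, ensemble_size=10):
    """Base clusterings with cluster counts drawn at random from
    [k, max{min{sqrt(n), 50}, 3k/2}] (reconstructed bound)."""
    n = len(points)
    hi = max(min(int(n ** 0.5), 50), (3 * k_true) // 2, k_true)
    return [kmeans(points, random.randint(k_true, hi))
            for _ in range(ensemble_size)]

random.seed(0)
points = [random.gauss(c, 0.3) for c in (0, 5, 10) for _ in range(20)]
base = generate_base(points, k_true=3)
```

Randomizing both the initial centers and the cluster count is what gives the ensemble base its diversity.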
For each data set, we generate 30 base clustering result sets and then show the loss of the hyperedge set that is sequentially generated during the run of CEHM on each base clustering result set. The hyperedge loss is calculated by Equation (14). Figure 3 shows the results of the convergence analysis on the first 5 data sets. The remaining convergence results are shown in Figure 8 in Appendix A.6.

Figure 2: Working mechanism of CEHM on the Wing Nut data set (panels: (a) $E_i$, (b) $E_{d(1)}$, (c) $E_{d(2)}$, (d) $E_{a(1)}$, (e) $E_{a(2)}$, (f) $E_m$)

From Figure 3 and Figure 8, it can be seen that the loss of the hyperedge set gradually decreases in every test. The CEHM algorithm converges quickly: on these data sets, the maximum number of iterations never exceeds 30.

Ensemble Performance

To test the clustering ensemble performance of CEHM, nine representative clustering ensemble methods are utilized as references: U-SENC (Huang et al. 2020), DREC (Zhou, Zheng, and Pan 2019), EC-CMS (Jia et al. 2023), PTGP (Huang, Lai, and Wang 2016), ECPCS-MC (Huang et al. 2021), PTA (Huang, Lai, and Wang 2016), WCT (Iam-On et al. 2011), WTQ (Iam-On et al. 2011), and Voting (Li et al. 2017). The details of the methods and their settings are given in Appendix A.7. To evaluate the clustering ensemble result of each method, two of the most widely used external indices are utilized: Normalized Mutual Information (NMI) (Strehl and Ghosh 2002) and Adjusted Rand Index (ARI) (Hubert and Arabie 1985). The ensemble performance results are shown in Table 2 and Table 4 (in Appendix A.8). The tables respectively show the average NMI and ARI over the 30 runs for each method on each data set. The maximum index value of each comparison is marked in bold. OM means the corresponding method ran out of memory while handling the data.
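For reference, NMI between two label vectors can be sketched as below. This is a self-contained version using arithmetic-mean normalization of the entropies; note as an assumption of this sketch that Strehl and Ghosh's original NMI instead normalizes by the geometric mean.

```python
from math import log

def nmi(a, b):
    """Normalized mutual information between two label lists,
    normalized by the arithmetic mean of the two entropies."""
    n = len(a)
    def counts(labels):
        d = {}
        for lab in labels:
            d[lab] = d.get(lab, 0) + 1
        return d
    ca, cb = counts(a), counts(b)
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    mi = sum(c / n * log(n * c / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return mi / ((ha + hb) / 2) if ha + hb else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # permuted labels still score 1
```

Because NMI is invariant to label permutations, it compares partitions rather than label names, which is exactly what a clustering ensemble evaluation needs.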
From the tables, it is easy to see that CEHM achieves the best score on most of the data sets, which shows the effectiveness of CEHM in handling the clustering ensemble problem on these data sets. We statistically analyze the experimental results shown in Table 2 and Table 4 (in Appendix A.8); the analysis methods and results are included in Appendix A.9.

The time complexity of CEHM consists of three parts, corresponding to the three steps. The time complexity of the k-Hyper Edge initialization step is $O(n_c^2)$ (Tiwari et al. 2020). The time complexities of the k-Hyper Edge diffusion step and the k-Hyper Edge adjustion step are both $O(tnn_c)$, where $t$ is the number of iterations. The total time complexity is therefore $O(n_c^2 + 2tnn_c)$.

Figure 3: The convergence demonstration of CEHM on the first five data sets

| Data | U-SENC | DREC | EC-CMS | PTGP | ECPCS-MC | PTA | WCT | WTQ | Voting | CEHM |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6782 | 0.7123 | 0.7123 | 0.6958 | 0.6907 | 0.5706 | 0.7128 | 0.7080 | 0.6309 | **0.7456** |
| 2 | 0.8961 | 0.8871 | 0.8442 | 0.8833 | 0.8783 | 0.8685 | 0.8947 | 0.8562 | 0.8276 | **0.9039** |
| 3 | 0.7316 | 0.6290 | 0.6770 | 0.6426 | 0.6960 | 0.6022 | 0.6915 | 0.6788 | 0.7297 | **0.7320** |
| 4 | 0.3054 | 0.2644 | 0.3033 | 0.1422 | 0.2835 | 0.1179 | 0.2790 | 0.2106 | 0.1797 | **0.3072** |
| 5 | 0.7373 | 0.7277 | 0.7262 | 0.6916 | 0.7366 | 0.6957 | 0.7135 | 0.7160 | 0.6952 | **0.7420** |
| 6 | 0.6178 | 0.6048 | 0.6482 | 0.5757 | 0.6493 | 0.6063 | 0.6000 | 0.5924 | 0.5371 | **0.7007** |
| 7 | 0.8910 | 0.8757 | 0.8572 | 0.8838 | 0.8952 | 0.8873 | 0.8532 | 0.8336 | 0.8109 | **0.9109** |
| 8 | 0.4278 | 0.4228 | 0.4633 | 0.4269 | 0.4619 | 0.4346 | 0.4045 | 0.4069 | 0.3967 | **0.4674** |
| 9 | 0.6074 | 0.6315 | 0.2299 | 0.5885 | 0.5570 | 0.5398 | 0.5710 | 0.5841 | 0.4522 | **0.6511** |
| 10 | **0.5853** | **0.5853** | **0.5853** | **0.5853** | **0.5853** | **0.5853** | 0.5580 | 0.5612 | 0.3762 | **0.5853** |
| 11 | 0.3850 | 0.4118 | 0.3156 | 0.3210 | 0.3234 | 0.3768 | 0.3605 | 0.3539 | 0.3400 | **0.4718** |
| 12 | 0.6424 | 0.6573 | 0.6369 | 0.6463 | 0.6457 | 0.6608 | 0.6265 | 0.5987 | 0.6008 | **0.6665** |
| 13 | 0.6175 | 0.6142 | 0.5348 | 0.5998 | 0.6190 | 0.5864 | 0.6119 | 0.5748 | 0.5621 | **0.6316** |
| 14 | 0.2035 | 0.1365 | 0.1352 | 0.1044 | 0.1865 | 0.1137 | 0.1166 | 0.1694 | 0.1088 | **0.2391** |
| 15 | 0.6299 | 0.6479 | 0.6071 | 0.6376 | 0.6510 | 0.6630 | 0.6134 | 0.5642 | 0.5322 | **0.6687** |
| 16 | 0.8400 | 0.8384 | OM | 0.8378 | 0.8426 | 0.8204 | 0.8411 | 0.8400 | 0.8432 | **0.8444** |
| 17 | 0.1907 | 0.2196 | OM | 0.2214 | 0.1902 | 0.0925 | 0.1909 | 0.2486 | 0.2272 | **0.3359** |
| 18 | 0.4931 | 0.3352 | OM | 0.4019 | 0.1002 | 0.4253 | 0.4024 | 0.3353 | 0.3043 | **0.5994** |
| 19 | 0.7526 | OM | OM | OM | 0.7550 | OM | 0.7273 | 0.6911 | 0.6752 | **0.7567** |
| 20 | 0.5434 | OM | OM | OM | 0.5603 | OM | 0.5613 | 0.4983 | 0.5025 | **0.5818** |

Table 2: The NMI from the compared methods on the twenty data sets

Figure 4: The time cost (seconds, log scale) of the compared methods on the twenty data sets

The time costs of these methods on the twenty data sets are also compared. Figure 4 shows the average running time of each method on the twenty data sets. From Figure 4, it can be seen that CEHM consumes the least running time except for U-SENC, a method specifically designed for large-scale data with downsampling techniques. The results illustrate the efficiency of CEHM.

Conclusion

In this paper, we proposed k-Hyper Edge Medoids for clustering ensemble. The k-Hyper Edge Medoids were defined as non-overlapping hyperedges such that each hyperedge in the hypergraph can find a similar hyperedge medoid. We solved the clustering ensemble problem by discovering a set of k-Hyper Edge Medoids. The proposed discovery method contains k-Hyper Edge initialization, k-Hyper Edge diffusion, and k-Hyper Edge adjustion steps. Concretely, a set of initial sub-hyperedges is selected in the k-Hyper Edge initialization step, and the quality of the chosen hyperedges is improved by diffusion and adjustment guided by a hyperedge loss function that is reduced by assigning samples to the hyperedge with the highest degree of belonging.
The rationality of the solution, the assignment method, and the belonging-degree estimation method is illustrated by theoretical analyses, and the working mechanism, convergence, effectiveness, and efficiency of CEHM are illustrated by experimental analyses. In the future, it would be interesting to develop novel learning methods to construct the k-Hyper Edge Medoids.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62476160, 62136005, 62306170, 62472268, 62306171), the Science and Technology Major Project of Shanxi (No. 202201020101006), the Special Fund for Science and Technology Innovation Teams of Shanxi Province (No. 202304051001001), and the Funds for Central-Government-Guided Local Science and Technology Development (YDZJSX20231C001).

References

Ayad, H.; and Kamel, M. S. 2008. Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1): 160–173.
Ayad, H.; and Kamel, M. S. 2010. On voting-based consensus of cluster ensembles. Pattern Recognition, 43(5): 1943–1953.
Carpineto, C.; and Romano, G. 2012. Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12): 2315–2326.
Fern, X. Z.; and Brodley, C. E. 2004. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the Twenty-First International Conference on Machine Learning, 36.
Fred, A. L.; and Jain, A. K. 2005. Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6): 835–850.
Fu, H.; Yue, X.; Liu, W.; and Denoeux, T. 2022. Stable Clustering Ensemble Based on Evidence Theory. In 2022 IEEE International Conference on Image Processing (ICIP), 2046–2050.
Guilbert, M.; Vrain, C.; Dao, T.-B.-H.; and de Souto, M. C. P. 2022. Anchored Constrained Clustering Ensemble. In 2022 International Joint Conference on Neural Networks (IJCNN), 1–8.
Hao, Z.; Lu, Z.; Li, G.; Nie, F.; Wang, R.; and Li, X. 2024. Ensemble Clustering With Attentional Representation. IEEE Transactions on Knowledge and Data Engineering, 36(2): 581–593.
Huang, D.; Lai, J.-H.; and Wang, C.-D. 2016. Robust ensemble clustering using probability trajectories. IEEE Transactions on Knowledge and Data Engineering, 28(5): 1312–1326.
Huang, D.; Wang, C.-D.; Peng, H.; Lai, J.; and Kwoh, C.-K. 2021. Enhanced ensemble clustering via fast propagation of cluster-wise similarities. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(1): 508–520.
Huang, D.; Wang, C.-D.; Wu, J.-S.; Lai, J.-H.; and Kwoh, C.-K. 2020. Ultra-Scalable Spectral Clustering and Ensemble Clustering. IEEE Transactions on Knowledge and Data Engineering, 32(6): 1212–1226.
Huang, Q.; Gao, R.; and Akhavan, H. 2023. An ensemble hierarchical clustering algorithm based on merits at cluster and partition levels. Pattern Recognition, 136: 109255.
Hubert, L.; and Arabie, P. 1985. Comparing partitions. Journal of Classification, 2(1): 193–218.
Iam-On, N.; Boongoen, T.; Garrett, S.; and Price, C. 2011. A link-based approach to the cluster ensemble problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12): 2396–2409.
Jia, Y.; Tao, S.; Wang, R.; and Wang, Y. 2023. Ensemble Clustering via Co-Association Matrix Self-Enhancement. IEEE Transactions on Neural Networks and Learning Systems, 1–12.
Li, F.; Qian, Y.; and Wang, J. 2021. GoT: a Growing Tree Model for Clustering Ensemble. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 8349–8356.
Li, F.; Qian, Y.; Wang, J.; Dang, C.; and Jing, L. 2019. Clustering ensemble based on sample's stability. Artificial Intelligence, 273: 37–55.
Li, F.; Qian, Y.; Wang, J.; and Liang, J. 2017. Multigranulation information fusion: A Dempster-Shafer evidence theory-based clustering ensemble method. Information Sciences, 378: 389–409.
Li, F.; Wang, J.; Zhang, L.; Qian, Y.; Jin, S.; Yan, T.; and Du, L. 2024. k-Hyper Edge Medoids for Clustering Ensemble. arXiv:2412.08289.
Liu, H.; Wu, J.; Liu, T.; Tao, D.; and Fu, Y. 2017. Spectral ensemble clustering via weighted k-means: theoretical and practical evidence. IEEE Transactions on Knowledge and Data Engineering, 29(5): 1129–1143.
Strehl, A.; and Ghosh, J. 2002. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(Dec): 583–617.
Tiwari, M.; Zhang, M. J.; Mayclin, J.; Thrun, S.; Piech, C.; and Shomorony, I. 2020. BanditPAM: almost linear time k-medoids clustering via multi-armed bandits. In Proceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates Inc.
Vega-Pons, S.; and Ruiz-Shulcloper, J. 2011. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3): 337–372.
Zhou, J.; Zheng, H.; and Pan, L. 2019. Ensemble clustering based on dense representation. Neurocomputing, 357: 66–76.
Zhou, P.; Du, L.; and Li, X. 2020. Self-paced Consensus Clustering with Bipartite Graph. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2133–2139.
Zhou, P.; Wang, X.; Du, L.; and Li, X. 2022. Clustering ensemble via structured hypergraph learning. Information Fusion, 78: 171–179.
Zhou, Z.-H.; and Tang, W. 2006. Clusterer ensemble. Knowledge-Based Systems, 19(1): 77–83.