Affinity Learning for Mixed Data Clustering

Nan Li and Longin Jan Latecki
Department of Computer and Information Sciences
Temple University, Philadelphia, USA
{nan.li,latecki}@temple.edu

Abstract

In this paper, we propose a novel affinity learning based framework for mixed data clustering, which includes: how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented with several abstract objects defined according to the specific data type and values. Each attribute value is transformed into initial affinities between the data point and the abstract objects of the attribute. We refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited for clustering. Alternatively, the refined affinities between data points and the abstract objects of attributes can be transformed into new data features for clustering. Experimental results on many real world data sets demonstrate that the proposed framework is effective for mixed data clustering.

1 Introduction

Clustering is the task of partitioning data objects into a set of groups (clusters) such that objects in the same group are similar, while objects in different groups are dissimilar. It is one of the most fundamental problems in data mining and machine learning. Numerous algorithms have been developed for clustering. Most of them are designed to handle data with only one type of attribute, e.g. continuous, categorical or ordinal. Mixed data clustering has received relatively less attention, despite the fact that data with mixed types of attributes are common in real applications. For mixed data clustering, one of the greatest challenges is how to measure the affinities or distances between data points.

One of the most straightforward methods for processing mixed data is the so-called 1-hot (or 1-of-K) encoding. A categorical attribute with K distinct values is encoded as K binary 0/1 attributes, and each categorical attribute value is transformed into a 1 on its corresponding binary attribute. The encoded attributes are then treated just like continuous attributes. The more formal Gower's similarity coefficient [Gower, 1971] and its extensions [Legendre and Legendre, 1998; Podani, 1999] compute the partial affinity between two data points on each attribute according to the data type, and then aggregate all of them into a composite similarity measure. Such methods are widely used in practice. However, they essentially compute the affinity or distance locally between two data points, without considering the attribute values of other data points. This may result in missing some intrinsic information. For example, in many real world data sets, some values of a categorical attribute are inherently related. Such information would be missed by similarity measures like Gower's coefficient, which simply assume that different categories are totally independent and unrelated.

In this paper, we propose a novel affinity learning based framework for mixed data clustering. It includes how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering.
First, each original attribute is represented with several abstract objects defined according to the specific data type and values. Each attribute value is then transformed into the initial affinities between the data point and the abstract objects of the attribute. For categorical attributes, each category is defined as an abstract object. Its affinities to the data points in this category are initialized to a constant value. For each continuous attribute, two abstract objects are defined to represent its minimum and maximum values. Their initial affinities to each data point are transformed from the individual continuous attribute value with a novel method. For ordinal attributes, all possible values are first ranked and then replaced by their ranks. The new ordinal attributes are processed as continuous attributes. After the data processing, we obtain a bipartite graph consisting of the data points, the abstract objects of attributes, and the initial affinities between them.

The next step is to learn new affinities, including inferring the unknown affinities and refining the known affinities. We adopt the algorithm proposed in [Li and Latecki, 2015], which essentially implements the von Neumann kernel [Kandola et al., 2003] from the perspective of transitive inference confidence. Specifically, the new affinities are learned according to the transitive property of the affinitive relation. All the initial affinities are scaled with a common scaling factor. Any transitive inference process without self-loops is considered to be evidence that the two objects are affinitive. The confidence of such an inference process is quantified as the product of the related scaled affinity values. In general, there can be an infinite number of distinct transitive inference processes between two objects. The confidences of all these inference processes are added up to form the new affinity between the two objects. The details of this affinity learning algorithm are presented in Section 3.2.

[Figure 1: An illustration of data point connections via their attribute values. Blue circles represent data points. Rectangles represent categorical attributes, each of which has three distinct attribute values.]

In comparison to Gower's similarity coefficient and its extensions, our affinity learning method shares the similar idea of aggregating partial affinities on individual attributes into an overall measure. The significant difference is that our affinities are computed globally by taking into account the interconnections among the attribute values of all data points, not just between the two data points. This is illustrated in Figure 1. The numbered blue circles represent data points, i.e. {x1, x2, x3, x4}. The two rectangles represent categorical attributes R and Y, each of which has three distinct attribute values. If we compute the affinity Sij just between the two data points xi and xj, as Gower's coefficient does, then S13 and S14 are both 0, because the corresponding point pairs do not share any common attribute values. However, because of the existence of x2, which shares one common attribute value with x1 and x3 respectively, it is intuitive to infer that x1 is more affinitive to x3 than to x4. Our affinity learning method can capture such information by taking into account all transitive inference processes, including x1 → R1 → x2 → Y2 → x3. The inferred affinities between data points can be used by many clustering algorithms.
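To make the benefit of transitive inference concrete, here is a small worked example on the Figure 1 toy graph. The exact value assignments in Figure 1 are not spelled out in the text, so assume, for illustration only, that x1 and x2 share the value R1, that x2 and x3 share the value Y2, that x4 shares no attribute value with the other points, and that every initial affinity equals 1 and is scaled by the common factor c introduced in Section 3.2. The single four-edge inference process mentioned above then already guarantees a positive learned affinity between x1 and x3:

$$S^{*}_{13} \;\ge\; \frac{1}{c}\cdot\frac{1}{c}\cdot\frac{1}{c}\cdot\frac{1}{c} \;=\; c^{-4} \;>\; 0 \qquad (\text{via } x_1 \rightarrow R_1 \rightarrow x_2 \rightarrow Y_2 \rightarrow x_3),$$

while under this assignment no inference process reaches x4 from x1, so $S^{*}_{14} = 0$. A local measure such as Gower's coefficient assigns S13 = S14 = 0 and cannot distinguish the two cases.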
Alternatively, the refined affinities between data points and the abstract objects of attributes can be transformed into new data features. With such features, many clustering algorithms can be used. The mixed data clustering algorithms derived from the proposed framework achieve superior performance on many real world data sets. The details of the experimental evaluation are presented in Section 4.

2 Related Work

For mixed data clustering, in addition to using 1-hot encoding to obtain continuous features or Gower's coefficient [Gower, 1971] and its extensions [Legendre and Legendre, 1998; Podani, 1999] to measure the similarities between data points, as introduced in Section 1, there are also some specially designed clustering algorithms, including k-prototypes [Huang, 1997; 1998], K-means-mixed [Ahmad and Dey, 2007], CAVE [Hsu and Chen, 2007], M-ART [Hsu and Huang, 2008], INTEGRATE [Böhm et al., 2010], INCONCO [Plant and Böhm, 2011], SCENIC [Plant, 2012] and so on.

The k-prototypes algorithm [Huang, 1997; 1998], which essentially follows the same idea as the k-means algorithm, calculates the dissimilarity between two mixed-type objects as a combination of the squared Euclidean distance measure on the numeric attributes and the simple matching dissimilarity measure on the categorical attributes. K-means-mixed [Ahmad and Dey, 2007], like k-prototypes, is also based on the k-means paradigm and combines distance measures computed separately on numeric attributes and categorical attributes. Unlike k-prototypes, K-means-mixed does not assume a binary or a discrete measure between two distinct categorical attribute values but computes the distance as a function of their overall distribution and co-occurrence with other categorical attributes. This idea of computing distances globally is similar to ours, but it is only applied within categorical attributes. CAVE [Hsu and Chen, 2007] uses variance to measure the similarity of the numeric part of the data and computes the similarity of the categorical part based on entropy weighted by the distances in the hierarchies. Similarly, the incremental clustering algorithm M-ART [Hsu and Huang, 2008] also computes the distance between two data points according to distance hierarchies associated with the mixed-type attributes. INTEGRATE [Böhm et al., 2010] applies ideas from information theory to implement the k-means paradigm. It models both numerical and categorical attributes with their probability distributions and minimizes a cost function based on the Minimum Description Length principle for clustering. INCONCO [Plant and Böhm, 2011] and SCENIC [Plant, 2012] process mixed-type attributes in a similar way as INTEGRATE. Their main advantage is the capability of modeling and revealing the cluster-specific dependency patterns among the attributes.

To learn affinities between the heterogeneous objects of data points and attributes, we adopt the algorithm proposed in [Li and Latecki, 2015], which models the new affinities from the perspective of transitive inference confidence. It essentially implements the von Neumann kernel defined in [Kandola et al., 2003]. The idea of learning semantic similarity between terms from a corpus for measuring similarity between text documents in [Kandola et al., 2003] is similar to our idea of capturing the intrinsic information between attribute values. One significant difference, besides the different applications, is that we explicitly model the interconnections among data points and attribute values together.
There are also some other algorithms that can be used for affinity learning, such as [Zhou et al., 2003] and [Yang et al., 2013]. The main reasons we do not choose them include: 1. their row or column normalizations on the initial affinity matrix change the original relationships between the heterogeneous objects; and 2. they are not as semantically intuitive and meaningful as the one [Li and Latecki, 2015] we adopt.

3 Our Framework

3.1 Mixed Data Processing

We first transform the data points and their mixed-type attribute values into abstract objects and initial affinities.

For categorical attributes, each category is defined as an abstract object. Its affinities to the data points in this category are initialized to 1, while its initial affinities to the remaining data points are 0. This is similar to the 1-hot encoding.

For each continuous attribute C, two abstract objects are defined to represent its minimum and maximum values, i.e. $C_{\min}$ and $C_{\max}$. The attribute value $x_C$ of the data point $x$ is transformed into two initial affinities to the abstract objects $C_{\min}$ and $C_{\max}$. Suppose they are $S_{x,C_{\min}} = a$ and $S_{x,C_{\max}} = b$; we impose two requirements:

$$a^2 + b^2 = 1, \qquad \frac{C_{\min}\,a + C_{\max}\,b}{a + b} = x_C \tag{1}$$

[Figure 2: An illustration for explaining the first requirement in equation (1) for transforming a continuous attribute value into initial affinities.]

To understand the first requirement, consider the illustration in Figure 2. The labels $a, b, c, d$ on the edges represent the initial affinities of two data points $x$ and $x'$ to the abstract objects $C_{\min}$ and $C_{\max}$, respectively. The diffusion based affinity learning algorithms essentially compute the affinity $S_{x,x'}$ as

$$S_{x,x'} = a\,c + b\,d \tag{2}$$

If $x$ and $x'$ have the same attribute value $x_C$ on C, their affinities to $C_{\min}$ and $C_{\max}$ should obviously be the same, i.e. $a = c$ and $b = d$. It is also natural to require $S_{x,x'}$ to be a constant, e.g. 1, no matter what the attribute value $x_C$ is. Therefore, we get the first requirement:

$$S_{x,x'} = a\,c + b\,d = a\,a + b\,b = 1 \tag{3}$$

The second requirement makes sure the original attribute value can be restored from the transformed affinities. Specifically, to transform the attribute value $x_C$ of $x$ into the initial affinities, $x_C$ is first scaled with Min-Max normalization:

$$\tilde{x}_C = \frac{x_C - C_{\min}}{C_{\max} - C_{\min}} \tag{4}$$

The scaled attribute value $\tilde{x}_C$ is in the range [0, 1], i.e. $\tilde{C}_{\min} = 0$ and $\tilde{C}_{\max} = 1$. We have

$$a^2 + b^2 = 1, \qquad \frac{0 \cdot a + 1 \cdot b}{a + b} = \tilde{x}_C \tag{5}$$

Solving this system of equations, we get the affinity transformation formulas

$$a = \sqrt{\frac{(1 - \tilde{x}_C)^2}{2\tilde{x}_C^2 - 2\tilde{x}_C + 1}}, \qquad b = \sqrt{\frac{\tilde{x}_C^2}{2\tilde{x}_C^2 - 2\tilde{x}_C + 1}} \tag{6}$$

In this way, if two data points have the same value on a continuous attribute, the partial affinity inferred from this agreement by the diffusion based affinity learning algorithm described below is always the same, no matter what the value is.

For ordinal attributes, all possible values are first ranked and then replaced by their ranks. The new ordinal attributes are processed as continuous attributes. If an attribute value of $x$ is missing, the related initial affinities are all set to 0.
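The transformation above is straightforward to implement. Below is a minimal sketch in Python/NumPy (our own illustration; the function names and toy inputs are not from the paper, whose experiments are implemented in MATLAB). It min-max scales a continuous column as in equation (4), maps each value to the pair of initial affinities of equation (6), and builds the 1-hot style initial affinities for a categorical column; missing values get all-zero affinities.

```python
import numpy as np

def continuous_to_affinities(values):
    """Initial affinities of each data point to the abstract objects
    C_min and C_max of one continuous attribute (eqs. (4) and (6))."""
    raw = np.asarray(values, dtype=float)                            # None -> nan
    x = (raw - np.nanmin(raw)) / (np.nanmax(raw) - np.nanmin(raw))   # eq. (4)
    denom = 2 * x ** 2 - 2 * x + 1                                   # shared denominator of eq. (6)
    a = np.sqrt((1 - x) ** 2 / denom)                                # affinity to C_min
    b = np.sqrt(x ** 2 / denom)                                      # affinity to C_max
    aff = np.column_stack([a, b])
    aff[np.isnan(raw)] = 0.0                                         # missing value -> all zeros
    return aff

def categorical_to_affinities(values):
    """One abstract object per category; affinity 1 to the points that
    take that category, 0 otherwise; missing values stay all-zero."""
    values = list(values)
    categories = sorted({v for v in values if v is not None})
    aff = np.zeros((len(values), len(categories)))
    for i, v in enumerate(values):
        if v is not None:
            aff[i, categories.index(v)] = 1.0
    return aff

# Ordinal attributes: replace the values by their ranks, then reuse
# continuous_to_affinities on the ranks.

# Example with four data points, one continuous and one categorical attribute.
cont = continuous_to_affinities([0.0, 2.5, 5.0, None])
cat = categorical_to_affinities(["red", "red", "blue", "green"])
A_DC = np.hstack([cont, cat])   # n x m block of initial affinities
```

Note that every row of `cont` for a non-missing value satisfies $a^2 + b^2 = 1$, as required by equation (1).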
3.2 Affinity Learning

Now we have a bipartite graph consisting of n data points, m abstract objects of attributes, and the initial affinities between them. We construct a nonnegative symmetric affinity matrix $A = (a_{ij})_{\alpha \times \alpha}$, where $\alpha = m + n$,

$$A = \begin{pmatrix} A_{DD} & A_{DC} \\ A_{CD} & A_{CC} \end{pmatrix} \tag{7}$$

where $A_{DD}$ is an $n \times n$ zero matrix indicating that the affinities between data points are unknown; $A_{DC} = A_{CD}^{\top}$ is an $n \times m$ matrix consisting of the initial affinities between data points and the abstract objects of attributes; and $A_{CC}$ is an $m \times m$ zero matrix indicating that the affinities between abstract objects of attributes are unknown.

The next step is to scale the nonzero entries in A, i.e. the initial affinities, with a common scaling factor c which satisfies

$$c > \max(a_{\max}, \rho(A)) \tag{8}$$

where $a_{\max}$ is the maximum entry of A and $\rho(A)$ is the spectral radius of A. Each entry $a_{ij}$ of A is scaled with c to obtain another matrix $A' = (a'_{ij})_{\alpha \times \alpha}$, where

$$a'_{ij} = a_{ij} / c \tag{9}$$

Obviously, any entry $a'_{ij}$ of $A'$ is less than 1. Also, the spectral radius of $A'$ is less than 1. Therefore,

$$\lim_{l \to \infty} (A')^{l} = 0 \tag{10}$$

Then we compute a matrix $A^{*}$ as

$$A^{*} = (I - A')^{-1} \tag{11}$$

where I is the $\alpha \times \alpha$ identity matrix. Each entry $a^{*}_{ij}$ of $A^{*}$ denotes the value

$$a^{*}_{ij} = \sum_{l=0}^{\infty} \left[(A')^{l}\right]_{ij} \tag{12}$$

which is the learned affinity between objects i and j. The inferred affinities between data points are in $A^{*}_{DD}$. The refined affinities between data points and abstract objects of attributes are in $A^{*}_{DC}$.

To get the scaling factor c, we need to calculate the spectral radius $\rho(A)$ of A. With iterative eigenvalue algorithms, this can be done in $O(\alpha^2)$. Scaling the nonzero entries of A takes $O(\alpha^2)$. The straightforward computation for inverting the matrix $I - A'$ takes $O(\alpha^3)$. Advanced algorithms, such as the Strassen algorithm, can further reduce the asymptotic computational complexity. Therefore, the straightforward time complexity of our affinity learning algorithm is $O(\alpha^3)$. However, A and $I - A'$ are usually very sparse, so the practical efficiency is much better. We evaluate it on several real data sets with $\alpha$ up to about 30,000. The details are presented in Section 4.

3.3 Clustering with Learned Affinities

In this paper, we use the complete-linkage algorithm for clustering with the inferred affinities between data points in $A^{*}_{DD}$. It is one of the agglomerative hierarchical clustering methods. Specifically, in the beginning, each data point is in a cluster of its own. Then these clusters are iteratively combined until the target cluster number is reached. At each step, the similarity of two clusters is taken to be the minimum pair-wise affinity between their members (one in each cluster), and the two most similar clusters under this measure are combined.

Alternatively, the refined affinities of data points to the abstract objects of attributes can be used as new data features. Specifically, in the $n \times m$ matrix $A^{*}_{DC}$, each row is considered as an m-dimensional feature vector of the corresponding data point. In this paper, we choose the k-means algorithm and the complete-linkage algorithm for clustering with such features.
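The affinity learning step itself reduces to a single linear solve. The following sketch (again our own Python/NumPy illustration, not the authors' implementation) builds the block matrix of equation (7) from the initial affinities, picks a scaling factor slightly above $\max(a_{\max}, \rho(A))$ as required by equation (8), and evaluates equation (11) by solving a linear system instead of forming the inverse explicitly; the margin of 1.01 is an arbitrary illustrative choice.

```python
import numpy as np

def learn_affinities(A_DC, margin=1.01):
    """Affinity learning of Section 3.2.

    A_DC : (n, m) initial affinities between data points and the
           abstract objects of attributes.
    Returns the learned blocks (A*_DD, A*_DC).
    """
    n, m = A_DC.shape
    alpha = n + m
    # Block matrix of eq. (7); the DD and CC blocks are unknown (zero).
    A = np.zeros((alpha, alpha))
    A[:n, n:] = A_DC
    A[n:, :n] = A_DC.T
    # Scaling factor of eq. (8): strictly above both a_max and rho(A).
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))     # spectral radius (A is symmetric)
    c = margin * max(A.max(), rho)
    A_scaled = A / c                                # eq. (9)
    # eq. (11): A* = (I - A')^{-1}, computed as a linear solve.
    A_star = np.linalg.solve(np.eye(alpha) - A_scaled, np.eye(alpha))
    return A_star[:n, :n], A_star[:n, n:]           # A*_DD, A*_DC

# Usage with the A_DC block built in Section 3.1:
#   A_star_DD, A_star_DC = learn_affinities(A_DC)
# A_star_DD feeds complete linkage on affinities (IA+CL); the rows of
# A_star_DC are the FRA features for k-means or complete linkage.
```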
4 Experimental Evaluation

4.1 Experimental Setup

We evaluate the performance of the proposed clustering framework on several real world data sets from the UCI Machine Learning Repository, including 5 mixed-type data sets (Acute Inflammations, Heart Disease, Credit Approval, Contraceptive Method Choice and Adult) and 2 categorical data sets (Soybean and Tic-Tac-Toe Endgame). The detailed information of these data sets is summarized in Table 1. Each record of the Acute Inflammations data set corresponds to the yes or no diagnoses of two diseases of the urinary system. We transform the two diagnoses into four classes, i.e. (yes,yes), (yes,no), (no,yes) and (no,no). For the Adult data set, we only use the training set, which contains 32,561 records. For fair comparison, we remove the records with missing attribute values; the final data set contains 30,162 records. We skip the attribute "education", because it is fully expressed by another attribute, "education-num". In the Credit Approval data set, 37 records (about 5%) have one or more missing values. We simply remove them.

The clustering algorithms derived from the proposed framework include: 1. IA+CL (Inferred Affinities between data points + Complete-Linkage algorithm); 2. FRA+CL (Features from Refined Affinities of the data points to the abstract objects of attributes + Complete-Linkage algorithm); 3. FRA+KM (Features from Refined Affinities of the data points to the abstract objects of attributes + K-Means algorithm). For the three derived clustering algorithms, we vary the scaling factor c in equation (9) in the range (max(a_max, ρ(A)), 4·max(a_max, ρ(A))] (see equation (8)) with a step size of 10. The best results achieved by each algorithm in this process are reported. For FRA+CL and FRA+KM, we use the squared Euclidean distance measure.

The comparison algorithms include: 1. OH+CL (Features from One-Hot encoding + Complete-Linkage algorithm); 2. OH+KM (Features from One-Hot encoding + K-Means algorithm); 3. GC+CL (Gower's Coefficient + Complete-Linkage algorithm); 4. KP (k-prototypes) [Huang, 1997; 1998]; 5. KMM (K-means-mixed) [Ahmad and Dey, 2007]. These algorithms are widely used in practice for mixed data clustering, and some of them are still state-of-the-art in performance. Many recent algorithms, such as [Plant and Böhm, 2011; Plant, 2012], are very complex to implement, and we were not able to obtain the source code from the authors. The Gower's coefficient in GC+CL processes ordinal attributes according to Eqs. 2a-b of [Podani, 1999]. For KP (k-prototypes), we scale all numeric attributes to the range [0, 1] with Min-Max normalization and randomly select k data points without missing values to be the initial prototypes. The parameter γ is varied from 0.5 to 1.5 with a step size of 0.1 for the 5 mixed-type data sets. For the k-means based techniques, including k-prototypes and K-means-mixed, the maximum number of iterations is set to 1000 for the Adult data set, which contains much more data, and 100 for the other 6 data sets. Moreover, all the tests are run 100 times and the average results are reported.

For all the clustering algorithms above, we set the target number of clusters to be the number of classes in each data set. The clustering quality is measured in terms of the Jaccard Coefficient, the Fowlkes and Mallows Index, and FScore [Jing et al., 2007]. The results are consistent, so only FScore is reported. Suppose k is the number of classes and clusters, n is the number of data points, n_i and n_j are the numbers of data points in class CLA_i and cluster CLU_j respectively, and n_ij is the number of data points in both CLA_i and CLU_j. FScore is defined as

$$\mathrm{FScore} = \sum_{i=1}^{k} \frac{n_i}{n} \max_{1 \le j \le k} \frac{2\, R_{ij}\, P_{ij}}{R_{ij} + P_{ij}} \tag{13}$$

where $R_{ij} = n_{ij}/n_i$ and $P_{ij} = n_{ij}/n_j$.

All the experiments are implemented in MATLAB R2016a and conducted on a PC with an Intel(R) Core(TM) i7 processor up to 3.4 GHz and 16GB RAM.
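For concreteness, here is a short sketch (ours, not from the paper) of the FScore of equation (13), computed directly from class labels and cluster assignments:

```python
import numpy as np

def fscore(classes, clusters):
    """FScore of eq. (13): for each class, take the best F-measure over all
    clusters and weight it by the relative class size n_i / n."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    n = len(classes)
    score = 0.0
    for cla in np.unique(classes):
        in_class = classes == cla
        best = 0.0
        for clu in np.unique(clusters):
            in_cluster = clusters == clu
            n_ij = np.sum(in_class & in_cluster)
            if n_ij == 0:
                continue
            r = n_ij / np.sum(in_class)        # recall    R_ij = n_ij / n_i
            p = n_ij / np.sum(in_cluster)      # precision P_ij = n_ij / n_j
            best = max(best, 2 * r * p / (r + p))
        score += np.sum(in_class) / n * best
    return score

# A perfect clustering (up to label permutation) gives FScore = 1.0.
print(fscore([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0
```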
Table 1: Data Sets for Experimental Evaluation (number of different types of attributes, number of instances and number of classes)

| Data set | Continuous | Categorical | Ordinal | #Instance | #Class |
|---|---|---|---|---|---|
| Acute Inflammations | 1 | 5 | - | 120 | 4 |
| Heart Disease | 6 | 6 | 1 | 270 | 2 |
| Credit Approval | 6 | 9 | - | 690 | 2 |
| Contraceptive Method Choice | 2 | 7 | - | 1,473 | 3 |
| Adult | 6 | 8 | - | 48,842 | 2 |
| Soybean | - | 35 | - | 47 | 4 |
| Tic-Tac-Toe Endgame | - | 9 | - | 958 | 2 |

Table 2: Clustering Results (FScore; AI: Acute Inflammations; HD: Heart Disease; CA: Credit Approval; CMC: Contraceptive Method Choice; Adult; Soybean; TTT: Tic-Tac-Toe Endgame)

| | AI | HD | CA | CMC | Adult | Soybean | TTT |
|---|---|---|---|---|---|---|---|
| IA+CL | 0.92 | 0.78 | 0.78 | 0.52 | 0.75 | 1 | 0.71 |
| FRA+CL | 0.92 | 0.79 | 0.75 | 0.51 | 0.73 | 1 | 0.76 |
| FRA+KM | 0.80 | 0.79 | 0.70 | 0.44 | 0.73 | 0.89 | 0.58 |
| GC+CL | 0.92 | 0.71 | 0.63 | 0.47 | 0.58 | 1 | 0.68 |
| OH+CL | 0.76 | 0.63 | 0.64 | 0.48 | 0.69 | 1 | 0.68 |
| OH+KM | 0.72 | 0.76 | 0.69 | 0.44 | 0.73 | 0.88 | 0.57 |
| KP | 0.51 | 0.76 | 0.62 | 0.42 | 0.68 | 0.84 | 0.58 |
| KMM | 0.79 | 0.78 | 0.77 | 0.43 | 0.73 | 0.91 | 0.60 |

4.2 Experimental Results

As shown in Table 2, the three clustering algorithms derived from the proposed framework achieve superior performance (with ties) on all 7 data sets. Among these three algorithms, IA+CL is clearly the best: it achieves the best performance on 5 data sets. In comparison to GC+CL, the performance of IA+CL is consistently better (with ties). Since they use the same clustering algorithm, this shows that our inferred affinities between data points, which are computed globally, capture more useful information than the locally computed similarities. We can also see that FRA+CL is consistently better (with ties) than OH+CL, and FRA+KM is consistently better (with ties) than OH+KM. This means that the features derived from the refined affinities of the data points to the abstract objects of attributes, which are also computed globally in our framework, are more effective than the 1-hot encoding features. On some data sets, the performance of KP and KMM is competitive, but overall our IA+CL and FRA+CL are superior. Clearly, the algorithms derived from the proposed framework are effective for mixed data clustering.

In order to further demonstrate that it is beneficial to take into account the interconnections among the attribute values of all data points, we compare the performance of IA+CL, FRA+CL and FRA+KM, reported in Table 2, with the performance achieved with locally inferred affinities and with features from non-refined affinities. Specifically, after scaling the initial affinities with equation (9), we obtain the matrix $A'$. $A'_{DC}$ contains the non-refined affinities of data points to the abstract objects of attributes. We use them as data features (FNRA: Features from Non-Refined Affinities) for clustering with the complete-linkage (CL) and k-means (KM) algorithms. To obtain the local affinities between data points, we compute $\hat{A} = A'A'$. The affinities in $\hat{A}_{DD}$ are inferred locally, just between each pair of data points. We call them LIA (Locally Inferred Affinities) and use the complete-linkage (CL) algorithm for clustering. When comparing LIA+CL versus IA+CL, FNRA+CL versus FRA+CL and FNRA+KM versus FRA+KM, on each data set LIA+CL, FNRA+CL and FNRA+KM use the same scaling factors as IA+CL, FRA+CL and FRA+KM, respectively. Table 3 shows the performance comparisons. As we can see, the performance of using globally inferred or refined affinities is always better than or equal to that of using locally inferred or non-refined affinities.
This demonstrates that the proposed framework is effective for modeling and exploiting the interconnections among the attribute values of all data points to improve clustering performance.

Table 3: Clustering Results with Locally and Globally Learned Affinities (FScore; AI: Acute Inflammations; HD: Heart Disease; CA: Credit Approval; CMC: Contraceptive Method Choice; Adult; Soybean; TTT: Tic-Tac-Toe Endgame)

| | LIA+CL | IA+CL | FNRA+CL | FRA+CL | FNRA+KM | FRA+KM |
|---|---|---|---|---|---|---|
| AI | 0.92 | 0.92 | 0.92 | 0.92 | 0.79 | 0.80 |
| HD | 0.71 | 0.78 | 0.71 | 0.79 | 0.78 | 0.79 |
| CA | 0.60 | 0.78 | 0.60 | 0.75 | 0.69 | 0.70 |
| CMC | 0.49 | 0.52 | 0.49 | 0.51 | 0.44 | 0.44 |
| Adult | 0.67 | 0.75 | 0.67 | 0.73 | 0.69 | 0.73 |
| Soybean | 1 | 1 | 1 | 1 | 0.88 | 0.89 |
| TTT | 0.68 | 0.71 | 0.68 | 0.76 | 0.57 | 0.58 |

In order to show that the proposed framework for mixed data clustering is applicable in practice, we evaluate its efficiency on real world data sets. The proposed framework consists of three main components: 1. processing mixed data; 2. learning affinities; 3. clustering with the learned affinities. As introduced in Section 3.1, it takes linear time to process the mixed data. When clustering with the learned affinities, the time complexity depends entirely on the selected clustering algorithm. Therefore, in this paper, we only evaluate the efficiency of affinity learning. The average time consumed on this step in the clustering experiments is reported in Table 4.

Table 4: Time Consumed on Affinity Learning (sec.)

| Data set | #Instance | #Attribute | #Object | Time Consumed |
|---|---|---|---|---|
| Acute Inflammations | 120 | 6 | 132 | 0.005 |
| Heart Disease | 270 | 13 | 300 | 0.01 |
| Credit Approval | 653 | 15 | 705 | 0.03 |
| Contraceptive Method Choice | 1,473 | 9 | 1,493 | 0.09 |
| Adult | 30,162 | 13 | 30,256 | 40 |
| Soybean | 47 | 35 | 105 | 0.005 |
| Tic-Tac-Toe Endgame | 958 | 9 | 985 | 0.04 |

As shown in Table 4, for small data sets, which contain at most thousands of objects, the time consumed on affinity learning is negligible. For medium-sized data sets, such as Adult, it may take a few minutes. Since these results are obtained on an ordinary PC, we can say that, with modern computation technologies and computing power, the proposed framework is applicable in practice.

5 Conclusions

The main contributions of this paper include: 1. we develop a novel framework for mixed data clustering; 2. our approach to mixed data processing, especially the way we transform continuous attribute values into initial affinities, is novel; 3. it is novel to transform the refined affinities between data points and the abstract objects of attributes into new data features. Experimental results on several real world data sets demonstrate that the proposed framework is effective.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant IIS-1302164.

References

[Ahmad and Dey, 2007] Amir Ahmad and Lipika Dey. A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2):503-527, 2007.

[Böhm et al., 2010] Christian Böhm, Sebastian Goebl, Annahita Oswald, Claudia Plant, Michael Plavinski, and Bianca Wackersreuther. Integrative parameter-free clustering of data with mixed type attributes. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 38-47. Springer, 2010.

[Gower, 1971] John C. Gower. A general coefficient of similarity and some of its properties. Biometrics, pages 857-871, 1971.

[Hsu and Chen, 2007] Chung-Chian Hsu and Yu-Cheng Chen. Mining of mixed data with application to catalog marketing. Expert Systems with Applications, 32(1):12-23, 2007.
[Hsu and Huang, 2008] Chung-Chian Hsu and Yan-Ping Huang. Incremental clustering of mixed data based on distance hierarchy. Expert Systems with Applications, 35(3):1177-1185, 2008.

[Huang, 1997] Zhexue Huang. Clustering large data sets with mixed numeric and categorical values. In Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 21-34. Citeseer, 1997.

[Huang, 1998] Zhexue Huang. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3):283-304, 1998.

[Jing et al., 2007] Liping Jing, Michael K. Ng, and Joshua Zhexue Huang. An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Transactions on Knowledge and Data Engineering, 19(8):1026-1041, 2007.

[Kandola et al., 2003] Jaz Kandola, Nello Cristianini, and John S. Shawe-Taylor. Learning semantic similarity. In Advances in Neural Information Processing Systems, pages 673-680, 2003.

[Legendre and Legendre, 1998] Pierre Legendre and Louis Legendre. Numerical Ecology, volume 24 (Developments in Environmental Modelling). 1998.

[Li and Latecki, 2015] Nan Li and Longin Jan Latecki. Affinity inference with application to recommender systems. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE/WIC/ACM International Conference on, volume 1, pages 393-400. IEEE, 2015.

[Plant and Böhm, 2011] Claudia Plant and Christian Böhm. INCONCO: interpretable clustering of numerical and categorical objects. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1127-1135. ACM, 2011.

[Plant, 2012] Claudia Plant. Dependency clustering across measurement scales. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 361-369. ACM, 2012.

[Podani, 1999] János Podani. Extending Gower's general coefficient of similarity to ordinal characters. Taxon, pages 331-340, 1999.

[Yang et al., 2013] Xingwei Yang, Lakshman Prasad, and Longin Jan Latecki. Affinity learning with diffusion on tensor product graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):28-38, 2013.

[Zhou et al., 2003] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In NIPS, volume 16, pages 321-328, 2003.