Density-ratio based clustering for discovering clusters with varying densities
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2016
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 60, no. (2016), p. 983-997
- Full Text: false
- Reviewed:
- Description: Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, which is applied to the rescaled dataset, can find all clusters with varying densities that would otherwise be impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches. © 2016 Elsevier Ltd
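The first approach in the description above, reusing a density estimator to compute a density-ratio, can be illustrated with a minimal sketch. It assumes a DBSCAN-style estimator that counts points within a radius, and it takes the ratio of the count in a small neighbourhood to the count in a larger one. The function names and the radii `eps` and `eta` are hypothetical choices for illustration; the abstract does not specify the estimator or its parameters.

```python
import math


def neighbourhood_count(points, centre, radius):
    """Count the points lying within `radius` of `centre` (Euclidean distance)."""
    return sum(1 for p in points if math.dist(p, centre) <= radius)


def density_ratio(points, centre, eps, eta):
    """Ratio of the eps-neighbourhood count to the eta-neighbourhood count.

    Assumption: eta >= eps, so the result lies in [0, 1]. For fixed radii the
    count ratio is proportional to the ratio of the two density estimates.
    """
    if eta < eps:
        raise ValueError("eta must be at least eps")
    outer = neighbourhood_count(points, centre, eta)
    if outer == 0:
        return 0.0  # no points near `centre` at all
    inner = neighbourhood_count(points, centre, eps)
    return inner / outer


# Toy 1-D data: three close points and one distant point.
points = [(0.0,), (0.1,), (0.2,), (5.0,)]
ratio = density_ratio(points, (0.1,), eps=0.15, eta=10.0)
```

Because the ratio is local, a point in a sparse cluster can score as highly as a point in a dense one, which is the property a single global density threshold lacks.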
Grouping points by shared subspaces for effective subspace clustering
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2018
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 83, no. (2018), p. 230-244
- Full Text: false
- Reviewed:
- Description: Clusters may exist in different subspaces of a multidimensional dataset. Traditional full-space clustering algorithms have difficulty in identifying these clusters. Various subspace clustering algorithms have used different subspace search strategies. They require clustering to assess whether cluster(s) exist in a subspace. In addition, all of them perform clustering by measuring similarity between points in the given feature space. As a result, the subspace selection and clustering processes are tightly coupled. In this paper, we propose a new subspace clustering framework named CSSub (Clustering by Shared Subspaces). It enables neighbouring core points to be clustered based on the number of subspaces they share. It explicitly splits candidate subspace selection and clustering into two separate processes, enabling different types of cluster definitions to be employed easily. Through extensive experiments on synthetic and real-world datasets, we demonstrate that CSSub discovers non-redundant subspace clusters with arbitrary shapes in noisy data; and it significantly outperforms existing state-of-the-art subspace clustering algorithms.
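The idea of clustering points by the number of subspaces they share can be sketched as follows. This assumes an earlier, separate step has already assigned each point a set of candidate subspaces, each represented here as a `frozenset` of attribute indices. The names `shared_subspace_count` and `similarity_matrix` are hypothetical, and the sketch shows only the similarity computation, not the CSSub procedure itself.

```python
def shared_subspace_count(subspaces_a, subspaces_b):
    """Number of candidate subspaces that two points have in common."""
    return len(set(subspaces_a) & set(subspaces_b))


def similarity_matrix(point_subspaces):
    """Pairwise shared-subspace counts for a list of per-point subspace sets."""
    n = len(point_subspaces)
    return [
        [shared_subspace_count(point_subspaces[i], point_subspaces[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical output of the subspace-selection step for three points.
point_subspaces = [
    {frozenset({0, 1}), frozenset({2})},
    {frozenset({0, 1}), frozenset({1, 2})},
    {frozenset({3})},
]
matrix = similarity_matrix(point_subspaces)
```

Since the matrix depends only on subspace membership, any clustering algorithm that accepts a precomputed similarity could be applied to it, which reflects the decoupling of subspace selection from clustering described above.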
Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure
- Authors: Ting, Kaiming , Zhu, Ye , Carman, Mark , Zhu, Yue , Zhou, Zhi-Hua
- Date: 2016
- Type: Text , Conference paper
- Relation: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco; August 13th-17th, 2016 p. 1205-1214
- Full Text: false
- Reviewed:
- Description: This paper introduces the first generic version of data dependent dissimilarity and shows that it provides a better closest match than distance measures for three existing algorithms in clustering, anomaly detection and multi-label classification. For each algorithm, we show that by simply replacing the distance measure with the data dependent dissimilarity measure, it overcomes a key weakness of the otherwise unchanged algorithm.
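The replacement step described above, swapping the distance measure while leaving the algorithm unchanged, can be sketched with a nearest-neighbour search that takes the dissimilarity as a parameter. The abstract does not define the data dependent measure, so `make_mass_dissimilarity` below is a hypothetical one-dimensional stand-in: it scores two values by the fraction of reference data lying in the interval between them, so the same gap counts as larger in a dense region than in a sparse one.

```python
def nearest_neighbour(query, candidates, dissimilarity):
    """Return the candidate with the smallest dissimilarity to `query`."""
    return min(candidates, key=lambda x: dissimilarity(query, x))


def absolute_distance(x, y):
    """Ordinary data-independent distance on the real line."""
    return abs(x - y)


def make_mass_dissimilarity(reference):
    """Build a data dependent dissimilarity from 1-D reference data.

    Hypothetical stand-in: the fraction of reference values in [min(x, y), max(x, y)].
    """
    n = len(reference)

    def dissimilarity(x, y):
        lo, hi = min(x, y), max(x, y)
        return sum(1 for v in reference if lo <= v <= hi) / n

    return dissimilarity


reference = [0.5, 0.6, 0.7, 0.8, 2.0]
mass = make_mass_dissimilarity(reference)

# Same algorithm, two measures: only the third argument changes.
by_distance = nearest_neighbour(1.0, [0.5, 2.0], absolute_distance)
by_mass = nearest_neighbour(1.0, [0.5, 2.0], mass)
```

In this toy data the two measures disagree: the distance picks 0.5, while the data dependent stand-in picks 2.0 because four reference values lie between the query and 0.5 but only one lies between the query and 2.0.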