Neighbourhood contrast : A better means to detect clusters than density
- Authors: Chen, Bo , Ting, Kaiming
- Date: 2018
- Type: Text , Conference paper
- Relation: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd-6th June 2018; published in Lecutre Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10939 LNAI, p. 401-412
- Full Text: false
- Reviewed:
- Description: Most density-based clustering algorithms suffer from large density variations among clusters. This paper proposes a new measure called Neighbourhood Contrast (NC) as a better alternative to density in detecting clusters. The proposed NC admits all local density maxima, regardless of their densities, to have similar NC values. Due to this unique property, NC is a better means to detect clusters in a dataset with large density variations among clusters. We provide two applications of NC. First, replacing density with NC in the current state-of-the-art clustering procedure DP leads to significantly improved clustering performance. Second, we devise a new clustering algorithm called Neighbourhood Contrast Clustering (NCC) which does not require density or distance calculations, and therefore has a linear time complexity in terms of dataset size. Our empirical evaluation shows that both NC-based methods outperform density-based methods including the current state-of-the-art.
Half-space mass : a maximally robust and efficient data depth method
- Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
- Date: 2015
- Type: Text , Journal article
- Relation: Machine Learning Vol. 100, no. 2-3 (2015), p. 677-699
- Full Text: false
- Reviewed:
- Description: Data depth is a statistical method which models data distribution in terms of center-outward ranking rather than density or linear ranking. While there are a lot of academic interests, its applications are hampered by the lack of a method which is both robust and efficient. This paper introduces Half-Space Mass which is a significantly improved version of half-space data depth. Half-Space Mass is the only data depth method which is both robust and efficient, as far as we know. We also reveal four theoretical properties of Half-Space Mass: (i) its resultant mass distribution is concave regardless of the underlying density distribution, (ii) its maximum point is unique which can be considered as median, (iii) the median is maximally robust, and (iv) its estimation extends to a higher dimensional space in which the convex hull of the dataset occupies zero volume. We demonstrate the power of Half-Space Mass through its applications in two tasks. In anomaly detection, being a maximally robust location estimator leads directly to a robust anomaly detector that yields a better detection accuracy than half-space depth; and it runs orders of magnitude faster than L2 depth, an existing maximally robust location estimator. In clustering, the Half-Space Mass version of K-means overcomes three weaknesses of K-means.
- Description: Data depth is a statistical method which models data distribution in terms of center-outward ranking rather than density or linear ranking. While there are a lot of academic interests, its applications are hampered by the lack of a method which is both robust and efficient. This paper introduces
Local contrast as an effective means to robust clustering against varying densities
- Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Zhu, Ye
- Date: 2018
- Type: Text , Journal article
- Relation: Machine Learning Vol. 107, no. 8-10 (2018), p. 1621-1645
- Full Text:
- Reviewed:
- Description: Most density-based clustering methods have difficulties detecting clusters of hugely different densities in a dataset. A recent density-based clustering CFSFDP appears to have mitigated the issue. However, through formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address this issue, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP, and create a new clustering method called LC-CFSFDP which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP. © 2018, The Author(s).