A comparative study of practical stochastic clustering method with traditional methods
- Authors: Tan, Swee; Ting, Kaiming; Teng, Shyh
- Date: 2010
- Type: Text, Conference paper
- Relation: Proceedings of the 23rd Australasian Joint Conference on Artificial Intelligence, p. 112-121
- Full Text: false
- Reviewed:
- Description: In many real-world clustering problems, there is usually little information about the clusters underlying a given dataset. For example, the number of clusters hidden in a dataset is usually not known a priori. This is an issue because many traditional clustering methods require such information as input. This paper examines a practical stochastic clustering method (PSCM) that can find clusters in datasets without requiring users to specify the centroids or the number of clusters. Compared with traditional methods (k-means, self-organising maps and hierarchical clustering), the performance of PSCM is found to be robust against overlapping clusters and clusters of uneven sizes. The proposed method also scales well with datasets having varying numbers of clusters and dimensions. Finally, our experimental results on real-world data confirm that the proposed method performs competitively against the traditional clustering methods in terms of clustering accuracy and efficiency.
A general stochastic clustering method for automatic cluster discovery
- Authors: Tan, Swee; Ting, Kaiming; Teng, Shyh
- Date: 2011
- Type: Text, Journal article
- Relation: Pattern Recognition Vol. 44, no. 10-11 (2011), p. 2786-2799
- Full Text: false
- Reviewed:
- Description: Finding clusters in data is a challenging problem. Given a dataset, we usually do not know the number of natural clusters hidden in it. The problem is exacerbated when there is little or no additional information except the data itself. This paper proposes a general stochastic clustering method that is a simplification of the nature-inspired ant-based clustering approach. It begins with a basic solution and then performs a stochastic search to incrementally improve the solution until the underlying clusters emerge, resulting in automatic cluster discovery in datasets. This method differs from several recent methods in that it does not require users to input the number of clusters and it makes no explicit assumption about the underlying distribution of a dataset. Our experimental results show that the proposed method performs better than several existing methods in terms of clustering accuracy and efficiency on the majority of the datasets used in this study. Our theoretical analysis shows that the proposed method has linear time and space complexities, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.
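The incremental-improvement idea in this abstract (start from a basic solution, then stochastically refine it until clusters emerge, with no cluster count supplied) can be sketched with a toy hill-climbing search. This is an illustrative sketch, not the paper's algorithm: the objective (within-cluster scatter plus a per-cluster penalty `lam`), the single-point move scheme and all parameter values are assumptions introduced here.

```python
import random

def sse(clusters, points):
    """Sum of squared distances of points to their cluster mean."""
    total = 0.0
    d = len(points[0])
    for members in clusters:
        if not members:
            continue
        mean = [sum(points[i][j] for i in members) / len(members) for j in range(d)]
        for i in members:
            total += sum((points[i][j] - mean[j]) ** 2 for j in range(d))
    return total

def cost(clusters, points, lam):
    """Objective: scatter plus a penalty lam per non-empty cluster,
    so merging nearby points pays off but merging distant ones does not."""
    k = sum(1 for m in clusters if m)
    return sse(clusters, points) + lam * k

def stochastic_cluster(points, lam=2.0, iters=5000, seed=0):
    rng = random.Random(seed)
    # Basic solution: every point in its own cluster.
    clusters = [{i} for i in range(len(points))]
    best = cost(clusters, points, lam)
    for _ in range(iters):
        i = rng.randrange(len(points))
        src = next(c for c, m in enumerate(clusters) if i in m)
        dst = rng.randrange(len(clusters))
        if dst == src:
            continue
        clusters[src].discard(i)          # propose moving point i
        clusters[dst].add(i)
        new = cost(clusters, points, lam)
        if new <= best:                   # accept non-worsening moves
            best = new
        else:                             # revert worsening moves
            clusters[dst].discard(i)
            clusters[src].add(i)
    return [sorted(m) for m in clusters if m]
```

On two well-separated groups, the search merges points within each group (each merge saves the penalty) but never across groups (the scatter increase dwarfs the saving), so two clusters emerge without k ever being specified.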
Clustering gene expression data using ant-based heuristics
- Authors: Tan, Swee; Ting, Kaiming; Teng, Shyh
- Date: 2011
- Type: Text, Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) 2011, p. 1-8
- Full Text: false
- Reviewed:
- Description: We consider the problem of finding clusters in novel datasets in which the number of clusters is not known a priori, and little or no additional information is available for users to adjust the parameters of a clustering algorithm. We address this problem using a stochastic algorithm named SATTA (Simplified Adaptive Time Dependent Transporter), which attempts to find clusters without requiring users to specify the number of clusters or adjust any parameters. SATTA is then compared with Expectation Maximization Clustering (EMC), which is also able to estimate the number of clusters using the principle of maximum likelihood and find the underlying clusters without human intervention. Our results on seven gene expression datasets show that SATTA significantly outperforms EMC in terms of clustering accuracy and efficiency. We discuss the conceptual differences between SATTA and EMC, which suggest that SATTA is a more promising alternative than EMC when little or no additional information is available for clustering novel datasets.
FaSS: Ensembles for stable learners
- Authors: Ting, Kaiming; Wells, Jonathan; Tan, Swee; Teng, Shyh; Webb, Geoffrey
- Date: 2009
- Type: Text, Conference paper
- Relation: 8th International Workshop on Multiple Classifier Systems (MCS 2009)
- Full Text: false
- Reviewed:
- Description: This paper introduces a new ensemble approach, Feature-Space Subdivision (FaSS), which builds local models instead of global models. FaSS is a generic ensemble approach that can use either stable or unstable models as its base models. In contrast, existing ensemble approaches which employ randomisation can only use unstable models. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble as the level of localisation in FaSS increases. Our empirical evaluation shows that FaSS performs significantly better than Boosting in terms of predictive accuracy when a stable learner, SVM, is used as the base learner. The speed-up achieved by FaSS makes feasible SVM ensembles that would otherwise be infeasible for large data sets, and FaSS SVM performs better than Boosting J48 and Random Forests when SVM is the preferred base learner.
Feature-subspace aggregating: ensembles for stable and unstable learners
- Authors: Ting, Kaiming; Wells, Jonathan; Tan, Swee; Teng, Shyh; Webb, Geoffrey
- Date: 2011
- Type: Text, Journal article
- Relation: Machine Learning Vol. 82, no. 3 (2011), p. 375-397
- Full Text: false
- Reviewed:
- Description: This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners. In contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble through an increased level of localisation in Feating. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy, when a stable learner SVM is used as the base learner. The speed-up achieved by Feating makes feasible SVM ensembles that would otherwise be infeasible for large data sets. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and an unstable learner, decision tree.
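The notion of building local models over feature-space regions can be illustrated with a deliberately tiny sketch. This is not the published Feating algorithm: here each ensemble member subdivides the data at a single feature's median and fits a majority-class stub in each half, where the paper subdivides on feature subsets and uses real base learners such as SVM; all names below are introduced for illustration.

```python
from collections import Counter
from statistics import median

def fit_feating(X, y, features=None):
    """One local model per feature: split instances at that feature's
    median and record the majority class on each side (a stand-in for
    fitting a base learner such as SVM in each region)."""
    if features is None:
        features = range(len(X[0]))
    models = []
    for f in features:
        cut = median(row[f] for row in X)
        lo = [y[i] for i, row in enumerate(X) if row[f] <= cut]
        hi = [y[i] for i, row in enumerate(X) if row[f] > cut]
        lo_label = Counter(lo).most_common(1)[0][0] if lo else None
        hi_label = Counter(hi).most_common(1)[0][0] if hi else lo_label
        models.append((f, cut, lo_label, hi_label))
    return models

def predict_feating(models, x):
    """Majority vote over the local models whose region covers x."""
    votes = [lo if x[f] <= cut else hi for f, cut, lo, hi in models]
    return Counter(v for v in votes if v is not None).most_common(1)[0][0]
```

Each member is only ever consulted on the region containing the query point, which is the sense in which the models are local rather than global.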
Issues of grid-cluster retrievals in swarm-based clustering
- Authors: Tan, Swee; Ting, Kaiming; Teng, Shyh
- Date: 2008
- Type: Text, Conference paper
- Relation: Proceedings of the 2008 IEEE World Congress on Computational Intelligence, p. 511-518
- Full Text: false
- Reviewed:
- Description: One common approach in swarm-based clustering is to use agents to create a set of clusters on a two-dimensional grid, and then use an existing clustering method to retrieve the clusters from the grid. The second step, which we call grid-cluster retrieval, is essential to obtain an explicit partitioning of the data. In this study, we highlight the issues in grid-cluster retrieval commonly neglected by researchers, and demonstrate the non-trivial difficulties involved. To tackle these issues, we evaluate three methods: k-means, hierarchical clustering (weighted single-link) and density-based clustering (DBSCAN). Among the three, DBSCAN is the only method that has not previously been used for grid-cluster retrieval, yet it is shown to be the most suitable in terms of effectiveness and efficiency.
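The density-based retrieval step can be sketched with a minimal DBSCAN over 2-D grid positions. This is a generic textbook DBSCAN, not the paper's implementation; the `eps` and `min_pts` values in the usage note are assumptions chosen for the toy layout.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points: grow a cluster from each core point
    (one with at least min_pts neighbours within eps, counting itself);
    points reachable from no core point are labelled noise (-1)."""
    def neighbours(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # provisional noise
            continue
        labels[i] = cid               # start a new cluster at core point i
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid       # border point, reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)  # j is core too: keep expanding
        cid += 1
    return labels
```

On a grid with two dense 2x2 patches and one stray item, `dbscan(grid, eps=1.5, min_pts=3)` labels each patch as its own cluster and the stray item as noise, which is the property that makes density-based retrieval attractive here: isolated leftovers on the grid are not forced into a cluster.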