Issues of grid-cluster retrievals in swarm-based clustering
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the 2008 IEEE World Congress on Computational Intelligence p. 511-518
- Full Text: false
- Reviewed:
- Description: One common approach in swarm-based clustering is to use agents to create a set of clusters on a two-dimensional grid, and then use an existing clustering method to retrieve the clusters on the grid. The second step, which we call grid-cluster retrieval, is an essential step to obtain an explicit partitioning of data. In this study, we highlight the issues in grid-cluster retrievals commonly neglected by researchers, and demonstrate the non-trivial difficulties involved. To tackle the issues, we then evaluate three methods: K-means, hierarchical clustering (Weighted Single-link) and density-based clustering (DBScan). Among the three methods, DBScan is the only method which has not been previously used for grid-cluster retrievals, yet it is shown to be the most suitable method in terms of effectiveness and efficiency.
FaSS : Ensembles for stable learners
- Authors: Ting, Kaiming , Wells, Jonathan , Tan, Swee , Teng, Shyh , Webb, Geoffrey
- Date: 2009
- Type: Text , Conference paper
- Relation: 8th International Workshop on Multipul Classifier Systems (MCS 2009)
- Full Text: false
- Reviewed:
- Description: This paper introduces a new ensemble approach, Feature-Space Subdivision (FaSS), which builds local models instead of global models. FaSS is a generic ensemble approach that can use either stable or unstable models as its base models. In contrast, existing ensemble approaches which employ randomisation can only use unstable models. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble with an increased level of localisation in FaSS. Our empirical evaluation shows that FaSS performs significantly better than boosting in terms of predictive accuracy, when a stable learner SVM is used as the base learner. The speed up achieved by FaSS makes SVM ensembles a reality that would otherwise infeasible for large data sets, and FaSS SVM performs better than Boosting J48 and Random Forests when SVM is the preferred base learner
A comparative study of practical stochastic clustering method with traditional methods
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 23rd Australasian Joint Conference on Artificial Intelligence p. 112-121
- Full Text: false
- Reviewed:
- Description: In many real-world clustering problems, there usually exist little information about the clusters underlying a certain dataset. For example, the number of clusters hidden in many datasets is usually not known a priori. This is an issue because many traditional clustering methods require such information as input. This paper examines a practical stochastic clustering method (PSCM) that has the ability to find clusters in datasets without requiring users to specify the centroids or the number of clusters. By comparing with traditional methods (k-means, self-organising map and hierarchical clustering methods), the performance of PSCM is found to be robust against overlapping clusters and clusters with uneven sizes. The proposed method also scales well with datasets having varying number of clusters and dimensions. Finally, our experimental results on real-world data confirm that the proposed method performs competitively against the traditional clustering methods in terms of clustering accuracy and efficiency.
Clustering gene expression data using ant-based heuristics
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) 2011 p. 1-8
- Full Text: false
- Reviewed:
- Description: ABSTRACT We consider the problem of finding the clusters in novel datasets in which the number of clusters is not known a priori; and little or no additional information is available for users to adjust the parameters in a clustering algorithm. We address this problem using a stochastic algorithm named SATTA (Simplified Adaptive Time Dependent Transporter), which attempts to find clusters without requiring users to specify the number of clusters or adjust any parameters. SATTA is then compared with Expectation Maximization Clustering, which is also able to estimate the number clusters using the principle of maximum likelihood and find the underlying clusters without any human interventions. Our results on seven gene expression datasets show that SATTA significantly outperforms Expectation Maximization Clustering in terms of clustering accuracy and efficiency. We discuss the conceptual differences between SATTA and EMC, which suggests that SATTA is a more promising alternative approach than Expectation Maximization Clustering when little or no additional information is available for clustering novel datasets.
- Description: ABSTRACT We consider the problem of finding the clusters in novel datasets in which the number of clusters is not known a priori; and little or no additional information is available for users to adjust the parameters in a clustering algorithm. We address this problem using a stochastic algorithm named SATTA (Simplified Adaptive Time Dependent Transporter), which attempts to find clusters without requiring users to specify the number of clusters or adjust any parameters. SATTA is then compared with Expectation Maximization Clustering, which is also able to estimate the number clusters using the principle of maximum likelihood and find the underlying clusters without any human interventions. Our results on seven gene expression datasets show that SATTA significantly outperforms Expectation Maximization Clustering in terms of clustering accuracy and efficiency. We discuss the conceptual differences between SATTA and EMC, which suggests that SATTA is a more promising alternative approach than Expectation Maximization Clustering when little or no additional information is available for clustering novel datasets. [less] 0 BOOKMARKS · 54 VIEWS
Simplifying and improving ant-based clustering
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: 11th International Conference on Computational Science, ICCS 2011; Singapore, Singapore; 1st-3rd June 2011, published in Procedia Computer Science Vol. 4, p. 46-55
- Full Text:
- Reviewed:
- Description: Ant-based clustering (ABC) is a data clustering approach inspired from cemetery formation activities observed in real ant colonies. Building upon the premise of collective intelligence, such an approach uses multiple ant-like agents and a mixture of heuristics, in order to create systems that are capable of clustering real-world data. Many recently proposed ABC systems have shown competitive results, but these systems are geared towards adding new heuristics, resulting in increasingly complex systems that are harder to understand and improve. In contrast to this direction, we demonstrate that a state-of-the-art ABC system can be systematically evaluated and then simplified. The streamlined model, which we call SABC, differs fundamentally from traditional ABC systems as it does not use the ant-colony and several key components. Yet, our empirical study shows that SABC performs more effectively and effciently than the state-of-the-art ABC system.