List of Titles

New diagonal bundle method for clustering problems in large data sets

Authors: Karmitsa, Napsu , Bagirov, Adil , Taheri, Sona
Date: 2017
Type: Text , Journal article
Relation: European Journal of Operational Research Vol. 263, no. 2 (2017), p. 367-379
Relation: http://purl.org/au-research/grants/arc/DP140103213
Full Text: false
Reviewed:
Description: Clustering is one of the most important tasks in data mining. Recent developments in computer hardware allow us to store in random access memory (RAM) and repeatedly read data sets with hundreds of thousands and even millions of data points. This makes it possible to use conventional clustering algorithms in such data sets. However, these algorithms may need prohibitively large computational time and fail to produce accurate solutions. Therefore, it is important to develop clustering algorithms which are accurate and can provide real time clustering in large data sets. This paper introduces one of them. Using nonsmooth optimization formulation of the clustering problem the objective function is represented as a difference of two convex (DC) functions. Then a new diagonal bundle algorithm that explicitly uses this structure is designed and combined with an incremental approach to solve this problem. The method is evaluated using real world data sets with both large number of attributes and large number of data points. The proposed method is compared with two other clustering algorithms using numerical results. © 2017 Elsevier B.V.

A heuristic algorithm for solving the minimum sum-of-squares clustering problems

Authors: Ordin, Burak , Bagirov, Adil
Date: 2015
Type: Text , Journal article
Relation: Journal of Global Optimization Vol. 61, no. 2 (2015), p. 341-361
Relation: http://purl.org/au-research/grants/arc/DP140103213
Full Text: false
Reviewed:
Description: Clustering is an important task in data mining. It can be formulated as a global optimization problem which is challenging for existing global optimization techniques even in medium size data sets. Various heuristics were developed to solve the clustering problem. The global k-means and modified global k-means are among most efficient heuristics for solving the minimum sum-of-squares clustering problem. However, these algorithms are not always accurate in finding global or near global solutions to the clustering problem. In this paper, we introduce a new algorithm to improve the accuracy of the modified global k-means algorithm in finding global solutions. We use an auxiliary cluster problem to generate a set of initial points and apply the k-means algorithm starting from these points to find the global solution to the clustering problems. Numerical results on 16 real-world data sets clearly demonstrate the superiority of the proposed algorithm over the global and modified global k-means algorithms in finding global solutions to clustering problems.

Application of optimisation-based data mining techniques to medical data sets: A comparative analysis

Authors: Dzalilov, Zari , Bagirov, Adil , Mammadov, Musa
Date: 2012
Type: Text , Conference paper
Relation: IMMM 2012: The Second International Conference on Advances in Information Mining and Management p. 41-46
Full Text: false
Reviewed:
Description: Computational methods have become an important tool in the analysis of medical data sets. In this paper, we apply three optimisation-based data mining methods to the following data sets: (i) a cystic fibrosis data set and (ii) a tobacco control data set. Three algorithms used in the analysis of these data sets include: the modified linear least square fit, an optimization based heuristic algorithm for feature selection and an optimization based clustering algorithm. All these methods explore the relationship between features and classes, with the aim of determining contribution of specific features to the class outcome. However, the three algorithms are based on completely different approaches. We apply these methods to solve feature selection and classification problems. We also present comparative analysis of the algorithms using computational results. Results obtained confirm that these algorithms may be effectively applied to the analysis of other (bio)medical data sets.

A novel piecewise linear classifier based on polyhedral conic and max-min separabilities

Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Ozturk, Gurkan , Kasimbeyli, Refail
Date: 2011
Type: Text , Journal article
Relation: TOP Vol.21, no.1 (2011), p. 1-22
Full Text: false
Reviewed:
Description: In this paper, an algorithm for finding piecewise linear boundaries between pattern classes is developed. This algorithm consists of two main stages. In the first stage, a polyhedral conic set is used to identify data points which lie inside their classes, and in the second stage we exclude those points to compute a piecewise linear boundary using the remaining data points. Piecewise linear boundaries are computed incrementally starting with one hyperplane. Such an approach allows one to significantly reduce the computational effort in many large data sets. Results of numerical experiments are reported. These results demonstrate that the new algorithm consistently produces a good test set accuracy on most data sets comparing with a number of other mainstream classifiers. © 2011 Sociedad de Estadística e Investigación Operativa.

An efficient algorithm for the incremental construction of a piecewise linear classifier

Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
Date: 2011
Type: Text , Journal article
Relation: Information Systems Vol. 36, no. 4 (2011), p. 782-790
Relation: http://purl.org/au-research/grants/arc/DP0666061
Full Text: false
Reviewed:
Description: In this paper the problem of finding piecewise linear boundaries between sets is considered and is applied for solving supervised data classification problems. An algorithm for the computation of piecewise linear boundaries, consisting of two main steps, is proposed. In the first step sets are approximated by hyperboxes to find so-called "indeterminate" regions between sets. In the second step sets are separated inside these "indeterminate" regions by piecewise linear functions. These functions are computed incrementally starting with a linear function. Results of numerical experiments are reported. These results demonstrate that the new algorithm requires a reasonable training time and it produces consistently good test set accuracy on most data sets comparing with mainstream classifiers. © 2010 Elsevier B.V. All rights reserved.

Classification through incremental max-min separability

Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Karasozen, Bulent
Date: 2011
Type: Text , Journal article
Relation: Pattern Analysis and Applications Vol. 14, no. 2 (2011), p. 165-174
Relation: http://purl.org/au-research/grants/arc/DP0666061
Full Text: false
Reviewed:
Description: Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers. © 2010 Springer-Verlag London Limited.

Application of optimisation-based data mining techniques to tobacco control dataset

Authors: Dzalilov, Zari , Zhang, J , Bagirov, Adil , Mammadov, Musa
Date: 2010
Type: Text , Journal article
Relation: International Journal of Lean Thinking Vol. 1, no. 1 (2010), p. 27-41
Full Text: false
Reviewed:
Description: Tobacco smoking is one of the leading causes of death around the world. Consequently, control of tobacco use is an important global public health issue. Tobacco control may be aided by development of theoretical and methodological frameworks for describing and understanding complex tobacco control systems. Linear regression and logistic regression are currently very popular statistical techniques for modeling and analyzing complex data in tobacco control systems. However, in tobacco markets, numerous interrelated factors nontrivially interact with tobacco control policies, such that policies and control outcomes are nonlinearly related.

Derivative free stochastic discrete gradient method with adaptive mutation

Authors: Ghosh, Ranadhir , Ghosh, Moumita , Bagirov, Adil
Date: 2006
Type: Text , Journal article
Relation: Advances in Data Mining Vol. 4065, no. (2006), p. 264-278
Full Text: false
Reviewed:
Description: In data mining we encounter many problems, such as function optimization or parameter estimation for classifiers, for which a good learning algorithm for searching is needed. In this paper we propose a stochastic derivative-free algorithm for unconstrained optimization problems. Many derivative-based local search methods exist, but they usually get stuck in local solutions of non-convex optimization problems. On the other hand, global search methods are very time consuming and work only for a limited number of variables. In this paper we investigate a derivative-free multi-search gradient-based method which overcomes the problem of local minima and produces a global solution in less time. We have tested the proposed method on many benchmark data sets from the literature and compared the results with other existing algorithms. The results are very promising.
