New gene selection algorithm using hyperboxes to improve performance of classifiers
- Authors: Bagirov, Adil , Mardaneh, Karim
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Bioinformatics Research and Applications Vol. 16, no. 3 (2020), p. 269-289
- Full Text: false
- Reviewed:
- Description: DNA microarray technology makes it possible to measure the expression levels of thousands of genes in a single experiment, which in turn makes it possible to apply classification techniques to classify tumours. However, the large number of genes and relatively small number of tumours in gene expression datasets may diminish, in some cases significantly, the accuracy of many classifiers. Therefore, efficient gene selection algorithms are required to identify the most informative genes or groups of genes to improve the performance of classifiers. In this paper, a new gene selection algorithm is developed using marginal hyperboxes of genes or groups of genes for each tumour type. Informative genes are defined using overlaps between hyperboxes. The results on six gene expression datasets demonstrate that the proposed algorithm is able to considerably reduce the number of genes and significantly improve the performance of classifiers. © 2020 Inderscience Enterprises Ltd.
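The idea of scoring genes by hyperbox overlap can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the simplest case of one gene at a time, where each class's hyperbox reduces to the interval [min, max] of that gene's values, and it ranks genes by the total pairwise overlap of those intervals. The function names and the normalisation by the gene's overall range are hypothetical choices made for illustration.

```python
def class_intervals(values, labels):
    """Return {label: (min, max)} for one gene's expression values."""
    boxes = {}
    for v, y in zip(values, labels):
        lo, hi = boxes.get(y, (v, v))
        boxes[y] = (min(lo, v), max(hi, v))
    return boxes


def overlap_score(values, labels):
    """Sum of pairwise overlap lengths between class intervals,
    divided by the gene's overall range (an assumed normalisation)."""
    boxes = list(class_intervals(values, labels).values())
    total_range = max(values) - min(values)
    if total_range == 0:
        return float("inf")  # a constant gene cannot separate classes
    overlap = 0.0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            lo = max(boxes[i][0], boxes[j][0])
            hi = min(boxes[i][1], boxes[j][1])
            overlap += max(0.0, hi - lo)  # disjoint intervals add nothing
    return overlap / total_range


def select_genes(data, labels, k):
    """data: list of samples, each a list of gene values.
    Return the indices of the k genes with the smallest overlap score."""
    n_genes = len(data[0])
    scores = [overlap_score([row[g] for row in data], labels)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g])[:k]


# Toy data: gene 0 separates the classes, gene 1 does not.
data = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
labels = ["A", "A", "B", "B"]
best = select_genes(data, labels, 1)
```

In this toy case the class intervals of gene 0 are disjoint, so it is ranked first; a gene whose class intervals overlap heavily would be a candidate for removal.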
An incremental piecewise linear classifier based on polyhedral conic separation
- Authors: Ozturk, Gurkan , Bagirov, Adil , Kasimbeyli, Refail
- Date: 2015
- Type: Text , Journal article
- Relation: Machine Learning Vol. 101, no. 1-3 (2015), p. 397-413
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: In this paper, a piecewise linear classifier based on polyhedral conic separation is developed. This classifier builds nonlinear boundaries between classes using polyhedral conic functions. Since the number of polyhedral conic functions separating classes is not known a priori, an incremental approach is proposed to build separating functions. These functions are found by minimizing an error function which is nonsmooth and nonconvex. A special procedure is proposed to generate starting points to minimize the error function and this procedure is based on the incremental approach. The discrete gradient method, which is a derivative-free method for nonsmooth optimization, is applied to minimize the error function starting from those points. The proposed classifier is applied to solve classification problems on 12 publicly available data sets and compared with some mainstream and piecewise linear classifiers. © 2014, The Author(s).
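To make the notion of polyhedral conic separation concrete, the sketch below evaluates a polyhedral conic function in the form commonly used in the literature, g(x) = w·(x − c) + ξ‖x − c‖₁ − γ, and assigns a point to the first class when at least one such function is non-positive. The abstract does not state the formula, so this form, the parameter names, and the class labels "A" and "B" are assumptions; the training step (minimizing the nonsmooth error function with the discrete gradient method) is not shown.

```python
def pcf(x, w, xi, gamma, center):
    """Polyhedral conic function g(x) = w.(x - c) + xi * ||x - c||_1 - gamma."""
    shifted = [a - b for a, b in zip(x, center)]
    linear = sum(wi * si for wi, si in zip(w, shifted))
    l1_norm = sum(abs(si) for si in shifted)
    return linear + xi * l1_norm - gamma


def classify(x, functions):
    """functions: list of (w, xi, gamma, center) tuples.
    The union of the sublevel sets {g_l <= 0} is taken as class 'A'."""
    value = min(pcf(x, *params) for params in functions)
    return "A" if value <= 0 else "B"


# Two hand-picked functions, each covering an L1 ball of radius 1.
functions = [
    ([0.0, 0.0], 1.0, 1.0, [0.0, 0.0]),
    ([0.0, 0.0], 1.0, 1.0, [3.0, 0.0]),
]
inside_first = classify([0.5, 0.2], functions)
inside_second = classify([3.0, 0.5], functions)
outside = classify([1.5, 2.0], functions)
```

Taking the minimum over several functions is what produces a piecewise linear, nonconvex boundary; adding functions one at a time mirrors the incremental idea described in the abstract.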
Diagnostic with incomplete nominal/discrete data
- Authors: Jelinek, Herbert , Yatsko, Andrew , Stranieri, Andrew , Venkatraman, Sitalakshmi , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Artificial Intelligence Research Vol. 4, no. 1 (2015), p. 22-35
- Full Text:
- Reviewed:
- Description: Missing values may be present in data without undermining its use for diagnostic / classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
Classification through incremental max-min separability
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Karasozen, Bulent
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Analysis and Applications Vol. 14, no. 2 (2011), p. 165-174
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Reviewed:
- Description: Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers. © 2010 Springer-Verlag London Limited.
Data mining with combined use of optimization techniques and self-organizing maps for improving risk grouping rules : Application to prostate cancer patients
- Authors: Churilov, Leonid , Bagirov, Adil , Schwartz, Daniel , Smith, Kate , Dally, Michael
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Management Information Systems Vol. 21, no. 4 (2005), p. 85-100
- Full Text:
- Reviewed:
- Description: Data mining techniques provide a popular and powerful tool set to generate various data-driven classification systems. In this paper, we investigate the combined use of self-organizing maps (SOM) and nonsmooth nonconvex optimization techniques in order to produce a working case of a data-driven risk classification system. The optimization approach strengthens the validity of SOM results, and the improved classification system increases both the quality of prediction and the homogeneity within the risk groups. Accurate classification of prostate cancer patients into risk groups is important to assist in the identification of appropriate treatment paths. We start with the existing rules and aim to improve classification accuracy by identifying inconsistencies utilizing self-organizing maps as a data visualization tool. Then, we progress to the study of assigning prostate cancer patients to homogeneous groups with the aim of supporting future clinical treatment decisions. Using the case of grouping prostate cancer patients, we demonstrate the strong potential of data-driven risk classification schemes for addressing risk grouping issues in more general organizational settings. © 2005 M.E. Sharpe, Inc.
Max-min separability
- Authors: Bagirov, Adil
- Date: 2005
- Type: Text , Journal article
- Relation: Optimization Methods and Software Vol. 20, no. 2-3 (2005), p. 271-290
- Full Text:
- Reviewed:
- Description: We consider the problem of discriminating two finite point sets in the n-dimensional space by a finite number of hyperplanes generating a piecewise linear function. If the intersection of these sets is empty, then they can be strictly separated by a max-min of linear functions. An error function is introduced. This function is nonconvex piecewise linear. We discuss an algorithm for its minimization. The results of numerical experiments using some real-world datasets are presented, which show the effectiveness of the proposed approach.
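A max-min of linear functions, as mentioned in the abstract above, can be sketched as follows. The code assumes hyperplanes (w, b) arranged in groups, the function φ(x) = max over groups of min over hyperplanes in the group of (w·x − b), the convention that φ is negative on the first set and positive on the second, and a hinge-style error with margin 1. The grouping layout, the sign convention, and the margin are assumptions for illustration; the abstract only states that an error function is introduced and that it is nonconvex piecewise linear.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def max_min(x, groups):
    """phi(x) = max_i min_j (w_ij . x - b_ij).
    groups: list of groups, each a list of (w, b) hyperplanes."""
    return max(min(dot(w, x) - b for w, b in group) for group in groups)


def separation_error(set_a, set_b, groups):
    """Average hinge-style violation: points of A should have phi <= -1,
    points of B should have phi >= 1 (margin 1 is an assumed choice)."""
    err_a = sum(max(0.0, max_min(a, groups) + 1.0) for a in set_a) / len(set_a)
    err_b = sum(max(0.0, 1.0 - max_min(b, groups)) for b in set_b) / len(set_b)
    return err_a + err_b


# One-dimensional example: phi(x) = |x| - 2 written as a max of two
# single-hyperplane groups, so A lies inside (-2, 2) and B outside.
groups = [[([1.0], 2.0)], [([-1.0], 2.0)]]
set_a = [[0.0], [0.5]]
set_b = [[4.0], [-3.5]]
err = separation_error(set_a, set_b, groups)
```

Because the error is a sum of maxima of max-min functions, it is piecewise linear but neither smooth nor convex, which is why the paper discusses a dedicated minimization algorithm rather than a gradient-based one.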
An algorithm for clustering based on non-smooth optimization techniques
- Authors: Bagirov, Adil , Rubinov, Alex , Sukhorukova, Nadezda , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: International Transactions in Operational Research Vol. 10, no. 6 (2003), p. 611-617
- Full Text: false
- Reviewed:
- Description: The problem of cluster analysis is formulated as a problem of non-smooth, non-convex optimization, and an algorithm for solving the cluster analysis problem based on non-smooth optimization techniques is developed. We discuss applications of this algorithm in large databases. Results of numerical experiments are presented to demonstrate the effectiveness of this algorithm.
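One common way to pose cluster analysis as a non-smooth, non-convex optimization problem is to minimize, over the cluster centres, the average squared distance from each data point to its nearest centre. The sketch below only evaluates that objective; whether the paper uses exactly this formulation is an assumption, and the function names are hypothetical. The inner minimum over centres is what makes the function non-smooth.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_objective(centres, points):
    """f(c_1, ..., c_k) = (1/m) * sum_i min_j ||c_j - a_i||^2.
    Non-smooth wherever a point is equidistant from two centres."""
    total = sum(min(squared_distance(c, a) for c in centres) for a in points)
    return total / len(points)


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]]
one_centre = cluster_objective([[0.0, 0.0]], points)
two_centres = cluster_objective([[0.0, 1.0], [10.0, 0.0]], points)
```

In this formulation the decision variables are only the k centres, not the point-to-cluster assignments, so the number of variables does not grow with the size of the database.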
New algorithms for multi-class cancer diagnosis using tumor gene expression signatures
- Authors: Bagirov, Adil , Ferguson, Brent , Ivkovic, Sasha , Saunders, Gary , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 19, no. 14 (2003), p. 1800-1807
- Full Text:
- Reviewed:
- Description: Motivation: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. Results: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set.