New algorithms for multi-class cancer diagnosis using tumor gene expression signatures
- Bagirov, Adil, Ferguson, Brent, Ivkovic, Sasha, Saunders, Gary, Yearwood, John
- Authors: Bagirov, Adil , Ferguson, Brent , Ivkovic, Sasha , Saunders, Gary , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 19, no. 14 (2003), p. 1800-1807
- Full Text:
- Reviewed:
- Description: Motivation: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. Results: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used, contains over 16 000 genes and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm perform slightly better than the published results for multi-class classifiers based on support vector machines for this data set.
- Description: C1
- Description: 2003000439
- Authors: Bagirov, Adil , Ferguson, Brent , Ivkovic, Sasha , Saunders, Gary , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 19, no. 14 (2003), p. 1800-1807
- Full Text:
- Reviewed:
- Description: Motivation: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. Results: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used, contains over 16 000 genes and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm perform slightly better than the published results for multi-class classifiers based on support vector machines for this data set.
- Description: C1
- Description: 2003000439
Feature selection using misclassification counts
- Bagirov, Adil, Yatsko, Andrew, Stranieri, Andrew
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and instance acquisition effort, considering all the data attributes accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance and redundancy, what ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for the direct comparison.
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and the data acquisition effort, considering all data components being accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree, to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance, what ranking does not immediately decide. Additionally, feature ranking methods available from different independent sources are called in for direct comparison.
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and instance acquisition effort, considering all the data attributes accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance and redundancy, what ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for the direct comparison.
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and the data acquisition effort, considering all data components being accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree, to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance, what ranking does not immediately decide. Additionally, feature ranking methods available from different independent sources are called in for direct comparison.
Max-min separability
- Authors: Bagirov, Adil
- Date: 2005
- Type: Text , Journal article
- Relation: Optimization Methods and Software Vol. 20, no. 2-3 (2005), p. 271-290
- Full Text:
- Reviewed:
- Description: We consider the problem of discriminating two finite point sets in the n-dimensional space by a finite number of hyperplanes generating a piecewise linear function. If the intersection of these sets is empty, then they can be strictly separated by a max-min of linear functions. An error function is introduced. This function is nonconvex piecewise linear. We discuss an algorithm for its minimization. The results of numerical experiments using some real-world datasets are presented, which show the effectiveness of the proposed approach.
- Description: C1
- Description: 2003001350
- Authors: Bagirov, Adil
- Date: 2005
- Type: Text , Journal article
- Relation: Optimization Methods and Software Vol. 20, no. 2-3 (2005), p. 271-290
- Full Text:
- Reviewed:
- Description: We consider the problem of discriminating two finite point sets in the n-dimensional space by a finite number of hyperplanes generating a piecewise linear function. If the intersection of these sets is empty, then they can be strictly separated by a max-min of linear functions. An error function is introduced. This function is nonconvex piecewise linear. We discuss an algorithm for its minimization. The results of numerical experiments using some real-world datasets are presented, which show the effectiveness of the proposed approach.
- Description: C1
- Description: 2003001350
Data mining with combined use of optimization techniques and self-organizing maps for improving risk grouping rules : Application to prostate cancer patients
- Churilov, Leonid, Bagirov, Adil, Schwartz, Daniel, Smith, Kate, Dally, Michael
- Authors: Churilov, Leonid , Bagirov, Adil , Schwartz, Daniel , Smith, Kate , Dally, Michael
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Management Information Systems Vol. 21, no. 4 (2005), p. 85-100
- Full Text:
- Reviewed:
- Description: Data mining techniques provide a popular and powerful tool set to generate various data-driven classification systems. In this paper, we investigate the combined use of self-organizing maps (SOM) and nonsmooth nonconvex optimization techniques in order to produce a working case of a data-driven risk classification system. The optimization approach strengthens the validity of SOM results, and the improved classification system increases both the quality of prediction and the homogeneity within the risk groups. Accurate classification of prostate cancer patients into risk groups is important to assist in the identification of appropriate treatment paths. We start with the existing rules and aim to improve classification accuracy by identifying inconsistencies utilizing self-organizing maps as a data visualization tool. Then, we progress to the study of assigning prostate cancer patients into homogenous groups with the aim to support future clinical treatment decisions. Using the case of prostate cancer patients grouping, we demonstrate strong potential of data-driven risk classification schemes for addressing the risk grouping issues in more general organizational settings. © 2005 M.E. Sharpe, Inc.
- Description: C1
- Description: 2003001265
- Authors: Churilov, Leonid , Bagirov, Adil , Schwartz, Daniel , Smith, Kate , Dally, Michael
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Management Information Systems Vol. 21, no. 4 (2005), p. 85-100
- Full Text:
- Reviewed:
- Description: Data mining techniques provide a popular and powerful tool set to generate various data-driven classification systems. In this paper, we investigate the combined use of self-organizing maps (SOM) and nonsmooth nonconvex optimization techniques in order to produce a working case of a data-driven risk classification system. The optimization approach strengthens the validity of SOM results, and the improved classification system increases both the quality of prediction and the homogeneity within the risk groups. Accurate classification of prostate cancer patients into risk groups is important to assist in the identification of appropriate treatment paths. We start with the existing rules and aim to improve classification accuracy by identifying inconsistencies utilizing self-organizing maps as a data visualization tool. Then, we progress to the study of assigning prostate cancer patients into homogenous groups with the aim to support future clinical treatment decisions. Using the case of prostate cancer patients grouping, we demonstrate strong potential of data-driven risk classification schemes for addressing the risk grouping issues in more general organizational settings. © 2005 M.E. Sharpe, Inc.
- Description: C1
- Description: 2003001265
Nonsmooth optimisation approach to data classification
- Bagirov, Adil, Soukhoroukova, Nadejda
- Authors: Bagirov, Adil , Soukhoroukova, Nadejda
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at Post-graduate ADFA Conference for Computer Science, PACCS01, Canberra, Australian Capital Territory : 14th July 2001
- Full Text:
- Description: We reduce the supervised classification to solving a nonsmooth optimization problem. The proposed method allows one to solve classification problems for databases with arbitrary number of classes. Numerical experiments have been carried out with databases of small and medium size. We present their results and provide comparison of these results with ones obtained by other algorithms of classification based on the optimization techniques. Results of numerical experiments show effectiveness of the proposed algorithms.
- Description: 2003003668
- Authors: Bagirov, Adil , Soukhoroukova, Nadejda
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at Post-graduate ADFA Conference for Computer Science, PACCS01, Canberra, Australian Capital Territory : 14th July 2001
- Full Text:
- Description: We reduce the supervised classification to solving a nonsmooth optimization problem. The proposed method allows one to solve classification problems for databases with arbitrary number of classes. Numerical experiments have been carried out with databases of small and medium size. We present their results and provide comparison of these results with ones obtained by other algorithms of classification based on the optimization techniques. Results of numerical experiments show effectiveness of the proposed algorithms.
- Description: 2003003668
Multi-source cyber-attacks detection using machine learning
- Taheri, Sona, Gondal, Iqbal, Bagirov, Adil, Harkness, Greg, Brown, Simon, Chi, Chihung
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies number of threats and attacks against IoT devices are on the increase. Analyzing and detecting these attacks originating from different sources needs machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier that constructs boundary between sources/classes incrementally starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier on the multi-source real world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies number of threats and attacks against IoT devices are on the increase. Analyzing and detecting these attacks originating from different sources needs machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier that constructs boundary between sources/classes incrementally starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier on the multi-source real world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
Diagnostic with incomplete nominal/discrete data
- Jelinek, Herbert, Yatsko, Andrew, Stranieri, Andrew, Venkatraman, Sitalakshmi, Bagirov, Adil
- Authors: Jelinek, Herbert , Yatsko, Andrew , Stranieri, Andrew , Venkatraman, Sitalakshmi , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Artificial Intelligence Research Vol. 4, no. 1 (2015), p. 22-35
- Full Text:
- Reviewed:
- Description: Missing values may be present in data without undermining its use for diagnostic / classification purposes but compromise application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; particularly, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they intend to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publically available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
- Authors: Jelinek, Herbert , Yatsko, Andrew , Stranieri, Andrew , Venkatraman, Sitalakshmi , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Artificial Intelligence Research Vol. 4, no. 1 (2015), p. 22-35
- Full Text:
- Reviewed:
- Description: Missing values may be present in data without undermining its use for diagnostic / classification purposes but compromise application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; particularly, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they intend to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publically available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
- «
- ‹
- 1
- ›
- »