Cancer classification utilizing voting classifier with ensemble feature selection method and transcriptomic data
- Authors: Khatun, Rabea , Akter, Maksuda , Islam, Md Manowarul , Uddin, Md Ashraf , Talukder, Md Alamin , Kamruzzaman, Joarder , Azad, Akm , Paul, Bikash , Almoyad, Muhammad , Aryal, Sunil , Moni, Mohammad
- Date: 2023
- Type: Text , Journal article
- Relation: Genes Vol. 14, no. 9 (2023), p.
- Full Text:
- Reviewed:
- Description: Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods. © 2023 by the authors.
Detection and explanation of anomalies in healthcare data
- Authors: Samariya, Durgesh , Ma, Jiangang , Aryal, Sunil , Zhao, Xiaohui
- Date: 2023
- Type: Text , Journal article
- Relation: Health Information Science and Systems Vol. 11, no. 1 (2023), p. 20-20
- Full Text: false
- Reviewed:
- Description: The growth of databases in the healthcare domain opens multiple doors for machine learning and artificial intelligence technology. Many medical devices are available in the medical field however, medical errors remain a severe challenge. Different algorithms are developed to identify and solve medical errors, such as detecting anomalous readings, anomalous health conditions of a patient, etc. However, they fail to answer why those entries are considered an anomaly. This research gap leads to an outlying aspect mining problem. The problem of outlying aspect mining aims to discover the set of features (a.k.a subspace) in which the given data point is dramatically different than others. In this paper, we present a framework that detects anomalies in healthcare data and then provides an explanation of anomalies. This paper aims to effectively and efficiently detect anomalies and explain why they are considered anomalies by detecting outlying aspects. First, we re-introduced four anomaly detection techniques and outlying aspect mining algorithms. Then, we evaluate the performance of anomaly detection techniques and choose the best anomaly detection algorithm. Later, we detect the top k anomaly as a query and detect their outlying aspect. Lastly, we evaluate their performance on 16 real-world healthcare datasets. The experimental results show that the latest isolation-based outlying aspect mining measure, SiNNE, has outstanding performance on this task and has promising results.