Cancer classification utilizing voting classifier with ensemble feature selection method and transcriptomic data
- Authors: Khatun, Rabea , Akter, Maksuda , Islam, Md Manowarul , Uddin, Md Ashraf , Talukder, Md Alamin , Kamruzzaman, Joarder , Azad, Akm , Paul, Bikash , Almoyad, Muhammad , Aryal, Sunil , Moni, Mohammad
- Date: 2023
- Type: Text , Journal article
- Relation: Genes Vol. 14, no. 9 (2023), p.
- Full Text:
- Reviewed:
- Description: Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods. © 2023 by the authors.
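The weighted soft-voting stage described in this abstract can be sketched with scikit-learn's `VotingClassifier`; the toy data, weights and hyperparameters below are illustrative assumptions, not the authors' tuned configuration on microarray features.

```python
# Sketch of an SVM + k-NN + decision-tree soft-voting ensemble.
# Weights and synthetic data are placeholders; the paper applies this
# to gene-expression features pre-selected by its ensemble ranking step.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

vt = VotingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="soft",          # weighted average of class probabilities
    weights=[2, 1, 1],      # illustrative weights, not the paper's
)
vt.fit(X_tr, y_tr)
print(round(vt.score(X_te, y_te), 2))
```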
Levels of explainable artificial intelligence for human-aligned conversational explanations
- Authors: Dazeley, Richard , Vamplew, Peter , Foale, Cameron , Young, Cameron , Aryal, Sunil , Cruz, Francisco
- Date: 2021
- Type: Text , Journal article
- Relation: Artificial Intelligence Vol. 299, no. (2021), p.
- Full Text:
- Reviewed:
- Description: Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the public need to understand the decision-making process to accept the outcomes. However, the vast majority of the applications of XAI/IML are focused on providing low-level ‘narrow’ explanations of how an individual decision was reached based on a particular datum. While important, these explanations rarely provide insights into an agent's: beliefs and motivations; hypotheses of other (human, animal or AI) agents' intentions; interpretation of external cultural expectations; or, processes used to generate its own explanation. Yet all of these factors, we propose, are essential to providing the explanatory depth that people require to accept and trust the AI's decision-making. This paper aims to define levels of explanation and describe how they can be integrated to create a human-aligned conversational explanation system. In so doing, this paper will survey current approaches and discuss the integration of different technologies to achieve these levels with Broad eXplainable Artificial Intelligence (Broad-XAI), and thereby move towards high-level ‘strong’ explanations. © 2021 Elsevier B.V.
A new image dissimilarity measure incorporating human perception
- Authors: Shojanazeri, Hamid , Teng, Shyh , Aryal, Sunil , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Unpublished work
- Full Text:
- Description: Pairwise (dis)similarity measure of data objects is central to many applications of image analytics, such as image retrieval and classification. Geometric distance, particularly Euclidean distance ((
Detection and explanation of anomalies in healthcare data
- Authors: Samariya, Durgesh , Ma, Jiangang , Aryal, Sunil , Zhao, Xiaohui
- Date: 2023
- Type: Text , Journal article
- Relation: Health Information Science and Systems Vol. 11, no. 1 (2023), p. 20-20
- Full Text: false
- Reviewed:
- Description: The growth of databases in the healthcare domain opens multiple doors for machine learning and artificial intelligence technology. Many medical devices are available in the medical field; however, medical errors remain a severe challenge. Different algorithms have been developed to identify and solve medical errors, such as detecting anomalous readings or anomalous health conditions of a patient. However, they fail to answer why those entries are considered anomalies. This research gap leads to the outlying aspect mining problem, which aims to discover the set of features (a.k.a. subspace) in which a given data point is dramatically different from others. In this paper, we present a framework that detects anomalies in healthcare data and then provides an explanation of those anomalies. This paper aims to effectively and efficiently detect anomalies and explain why they are considered anomalous by detecting their outlying aspects. First, we re-introduce four anomaly detection techniques and outlying aspect mining algorithms. Then, we evaluate the performance of the anomaly detection techniques and choose the best algorithm. Later, we treat the top-k anomalies as queries and detect their outlying aspects. Lastly, we evaluate performance on 16 real-world healthcare datasets. The experimental results show that the latest isolation-based outlying aspect mining measure, SiNNE, has outstanding performance on this task and promising results.
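The two-stage pipeline described above (detect anomalies, then find the subspace that explains each one) can be sketched as follows. The average k-NN distance used here is only a simple stand-in for the paper's SiNNE score, and the planted anomaly is synthetic.

```python
# Stage 1: rank anomalies with Isolation Forest.
# Stage 2: brute-force the 2-D subspace in which a query is most outlying.
import numpy as np
from itertools import combinations
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[0, 2] = 8.0                                   # plant an anomaly in feature 2

scores = IsolationForest(random_state=0).fit(X).score_samples(X)
top = np.argsort(scores)[:3]                    # lowest score = most anomalous

def knn_dist(Xs, q, k=10):
    """Average distance to the k nearest neighbours (proxy outlier score)."""
    d = np.sort(np.linalg.norm(Xs - q, axis=1))
    return d[1:k + 1].mean()                    # skip the query itself

q = X[0]
best = max(combinations(range(X.shape[1]), 2),
           key=lambda s: knn_dist(X[:, list(s)], q[list(s)]))
print(best)                                     # subspace containing feature 2
```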
Elastic step DDPG : multi-step reinforcement learning for improved sample efficiency
- Authors: Ly, Adrian , Dazeley, Richard , Vamplew, Peter , Cruz, Francisco , Aryal, Sunil
- Date: 2023
- Type: Text , Conference paper
- Relation: 2023 International Joint Conference on Neural Networks, IJCNN 2023 Vol. 2023-June
- Full Text: false
- Reviewed:
- Description: A major challenge in deep reinforcement learning is that it requires more data to converge to a good policy for complex problems. One way to improve sample efficiency is to use n-step updates, which reduce the number of samples required to converge to a good policy. However, n-step updates are known to be brittle and difficult to tune. Elastic Step DQN has shown that it is possible to automate the value of n in DQN for problems involving discrete action spaces; however, the efficacy of the technique on more complex problems and on problems with continuous action spaces is yet to be shown. In this paper we adapt the innovations proposed by Elastic Step DQN to the DDPG algorithm and show empirically that Elastic Step DDPG achieves a much stronger final training policy and is more sample efficient than DDPG. © 2023 IEEE.
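For readers unfamiliar with n-step updates, the quantity being tuned is the n-step return; a minimal sketch (not the paper's implementation, which adapts n automatically) is:

```python
# n-step return: discounted sum of the first n rewards plus a
# bootstrapped value for the remaining tail. With n=1 this reduces
# to the ordinary one-step TD target.
def n_step_return(rewards, bootstrap_value, gamma=0.99, n=3):
    g = 0.0
    for i, r in enumerate(rewards[:n]):
        g += (gamma ** i) * r
    return g + (gamma ** min(n, len(rewards))) * bootstrap_value

print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.9, n=3))
```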
A new effective and efficient measure for outlying aspect mining
- Authors: Samariya, Durgesh , Aryal, Sunil , Ting, Kai , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 21st International Conference on Web Information Systems Engineering, WISE 2020, Amsterdam, 20-24 October 2020, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12343 LNCS, p. 463-474
- Full Text: false
- Reviewed:
- Description: Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given data set. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable, but it requires computing outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also discover that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which is independent of the dimensionality of subspaces. This enables scores in subspaces of different dimensionality to be compared directly without any additional normalisation. Our experimental results revealed that SiNNE produces better or at least the same results as existing scores, and it significantly improves the runtime of an existing OAM algorithm based on beam search. © 2020, Springer Nature Switzerland AG.
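An isolation-style, dimensionality-independent score in the spirit of SiNNE can be sketched as below. The hypersphere construction is an assumption inferred from the description above (nearest-neighbour ensembles over small random subsamples), not the published algorithm verbatim.

```python
# Each ensemble member covers the space with hyperspheres around a small
# random subsample (radius = each point's 1-NN distance within the sample).
# A query scores 1 in a member if no hypersphere covers it, so the mean
# over members lies in [0, 1] whatever the subspace dimensionality.
import numpy as np

def isolation_score(X, q, t=50, psi=8, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(t):
        S = X[rng.choice(len(X), size=psi, replace=False)]
        D = np.linalg.norm(S[:, None] - S[None], axis=2)   # pairwise dists
        np.fill_diagonal(D, np.inf)
        radii = D.min(axis=1)                   # 1-NN distance in the sample
        if not (np.linalg.norm(S - q, axis=1) <= radii).any():
            hits += 1                           # q isolated in this member
    return hits / t

X = np.random.default_rng(1).normal(size=(300, 2))
inlier = isolation_score(X, np.array([0.0, 0.0]))
outlier = isolation_score(X, np.array([6.0, 6.0]))
print(inlier < outlier)                         # dense centre scores lower
```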
Simple supervised dissimilarity measure : bolstering iForest-induced similarity with class information without learning
- Authors: Wells, Jonathan , Aryal, Sunil , Ting, Kai
- Date: 2020
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 62, no. 8 (2020), p. 3203-3216
- Full Text: false
- Reviewed:
- Description: Existing distance metric learning methods require optimisation to learn a feature space to transform data—this makes them computationally expensive in large datasets. In classification tasks, they make use of class information to learn an appropriate feature space. In this paper, we present a simple supervised dissimilarity measure which does not require learning or optimisation. It uses class information to measure dissimilarity of two data instances in the input space directly. It is a supervised version of an existing data-dependent dissimilarity measure called me. Our empirical results in k-NN and LVQ classification tasks show that the proposed simple supervised dissimilarity measure generally produces predictive accuracy better than or at least as good as existing state-of-the-art supervised and unsupervised dissimilarity measures. © 2020, Springer-Verlag London Ltd., part of Springer Nature.
A novel perceptual dissimilarity measure for image retrieval
- Authors: Shojanazeri, Hamid , Zhang, Dengsheng , Teng, Shyh , Aryal, Sunil , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text: false
- Reviewed:
- Description: Similarity measure is an important research topic in image classification and retrieval. Given a type of image feature, a good similarity measure should be able to retrieve similar images from the database while discarding irrelevant images from the retrieval. Similarity measures in the literature are typically distance based: they measure the spatial distance between two feature vectors in a high-dimensional feature space. However, this type of similarity measure does not have any perceptual meaning and ignores the neighborhood influence in the similarity decision-making process. In this paper, we propose a novel dissimilarity measure which can measure both the distance and the perceptual similarity of two image features in feature space. Results show the proposed similarity measure has a significant improvement over the traditional distance-based similarity measures commonly used in the literature.
- Description: International Conference on Image and Vision Computing New Zealand
Image clustering using a similarity measure incorporating human perception
- Authors: Shojanazeri, Hamid , Aryal, Sunil , Teng, Shyh , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 p. 1-6
- Full Text: false
- Reviewed:
- Description: Clustering similar images is an important task in image processing and computer vision. It requires a measure to quantify pairwise similarities of images, and the performance of a clustering algorithm depends on the choice of similarity measure. In this paper, we investigate the effectiveness of data-independent (distance-based), data-dependent (mass-based) and hybrid (dis)similarity measures in the image clustering task, using three benchmark image collections with different sets of features. Our K-Medoids clustering results show that the hybrid Perceptual Dissimilarity Measure (PMD) produces better clustering results than the distance-based ℓp-norm and the mass-based mp-dissimilarity.
Modeling neurocognitive reaction time with gamma distribution
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Aryal, Sunil , Klein, Britt
- Date: 2018
- Type: Text , Conference proceedings
- Relation: ACSW'18. Proceedings of the Australasian Computer Science Week Multiconference; Brisbane, QLD; January 2018; Article 28 p. 1-10
- Full Text: false
- Reviewed:
- Description: As part of a broader effort to build a holistic biopsychosocial health metric, reaction time data obtained from participants undertaking neurocognitive tests have been examined using Exploratory Data Analysis (EDA) to assess their distribution. Many existing methods assume that reaction time data follow a Gaussian distribution and thus commonly use statistical measures such as Analysis of Variance (ANOVA) for analysis. However, reaction time data need not follow a Gaussian distribution and in many instances can be better modeled by other representations such as the Gamma distribution. Unlike the Gaussian distribution, which is defined by its mean and variance, the Gamma distribution is defined by shape and scale parameters, which also capture higher-order moments of the data such as skewness and kurtosis. Generalized Linear Models (GLM) based on the exponential family of distributions, such as the Gamma distribution, have been used to model reaction time in other domains but have not been fully explored for modeling reaction time data in the psychology domain. While limited use of the Gamma distribution has been reported [5, 17, 21] for analyzing response times, its application has been somewhat ad hoc rather than systematic. For this proposed research, we use a real-life biopsychosocial dataset generated from the 'digital health' intervention programs conducted by the Faculty of Health, Federation University, Australia. The two digital intervention programs were the 'Mindfulness' program and the 'Physical Activity' program; the neurocognitive tests were carried out as part of the 'Mindfulness' program. In this paper, we investigate the participants' reaction time distributions in neurocognitive tests such as the Psychology Experiment Building Language (PEBL) Go/No-Go test [19], which is a subset of the larger biopsychosocial dataset.
PEBL is an open-source software system for designing and running psychological experiments. Analysis of participants' reaction times in the PEBL Go/No-Go test shows that the reaction time data are more compatible with a Gamma distribution and clearly demonstrates that they can be better modeled by it.
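The Gaussian-vs-Gamma comparison described above can be reproduced in miniature with SciPy; the skewed synthetic sample below stands in for the PEBL reaction times, and the parameters are illustrative assumptions.

```python
# Fit both distributions to right-skewed "reaction time" data and
# compare log-likelihoods; the Gamma fit should win on skewed data.
from scipy import stats

rt = stats.gamma.rvs(a=3.0, scale=120.0, size=1000,
                     random_state=0)             # skewed, in milliseconds

shape, loc, scale = stats.gamma.fit(rt, floc=0)  # fix location at zero
ll_gamma = stats.gamma.logpdf(rt, shape, loc, scale).sum()
ll_gauss = stats.norm.logpdf(rt, rt.mean(), rt.std()).sum()
print(ll_gamma > ll_gauss)                       # Gamma captures the skew
```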
Relevance of frequency of heart-rate peaks as indicator of ‘Biological’ Stress level
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Aryal, Sunil , Klein, Britt
- Date: 2018
- Type: Text , Conference proceedings
- Relation: ICONIP 2018 International Conference on Neural Information Processing; Siem Reap, Cambodia; 13th-16th December, 2018 p. 598-609
- Full Text: false
- Reviewed:
- Description: The biopsychosocial (BPS) model proposes that health is best understood as a combination of bio-physiological, psychological and social determinants, and thus advocates for a far more comprehensive investigation of the relationships between ‘mind-body’ health. For this holistic analysis, we need a suitable measure to indicate participants’ ‘biological’ stress. With the advent of wearable sensor devices, health monitoring is becoming easier. In this study, we focus on bio-physiological indicators of stress derived from wearable devices’ heart-rate data. The analysis of such heart-rate data presents a set of practical challenges. We review various measures currently in use for stress measurement and their relevance and significance for wearables’ heart-rate data. In this paper, we propose the novel ‘peak heart-rate count’ metric to quantify the level of ‘biological’ stress. Real-life biometric data obtained from a digital health intervention program were considered for the study. Our study indicates the significance of using the frequency of ‘peak heart-rate count’ as a ‘biological’ stress measure.
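A peak-counting metric of this kind can be sketched with `scipy.signal.find_peaks`; the threshold, sampling rate and synthetic heart-rate trace below are illustrative assumptions, not the study's parameters.

```python
# Count heart-rate peaks above a threshold as a simple 'peak heart-rate
# count' style stress indicator on a synthetic one-sample-per-second trace.
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
t = np.arange(600)                               # ten minutes at 1 Hz
hr = 70 + 5 * np.sin(t / 30) + rng.normal(0, 1, t.size)
hr[100:110] += 40                                # a simulated stress episode
hr[400:405] += 35                                # another episode

peaks, _ = find_peaks(hr, height=95)             # beats/min threshold
print(len(peaks))                                # episodes drive the count
```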
Application of e-government principles in anti-corruption framework
- Authors: Neupane, Arjun , Soar, Jeffrey , Vaidya, Kishor , Aryal, Sunil
- Date: 2017
- Type: Text , Book chapter
- Relation: Digital governance and e-government principles applied to public procurement 3 p. 56-74
- Full Text: false
- Reviewed:
- Description: The use of Information and Communication Technologies (ICTs) plays a significant role in the economic, technological and social progression of a country. Corruption in government agencies and institutions is a serious problem in many countries of the world, especially in under-developed and developing countries. The use of ICT tools such as e-governance can help to reduce corruption. In this chapter, the authors discuss the application of e-government principles to mitigate corruption. Based on the available literature, this study identifies some potential elements of e-government currently practised around the world and how they interrelate in the fight against corruption. Finally, the authors present an evidence-based e-government anti-corruption framework.
Data-dependent dissimilarity measure : An effective alternative to geometric distance measures
- Authors: Aryal, Sunil , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
- Date: 2017
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 53, no. 2 (2017), p. 479-506
- Full Text: false
- Reviewed:
- Description: Nearest neighbor search is a core process in many data mining algorithms. Finding reliable closest matches of a test instance is still a challenging task as the effectiveness of many general-purpose distance measures such as the ℓp-norm decreases as the number of dimensions increases. Their performances vary significantly in different data distributions. This is mainly because they compute the distance between two instances solely based on their geometric positions in the feature space, and data distribution has no influence on the distance measure. This paper presents a simple data-dependent general-purpose dissimilarity measure called ‘mp-dissimilarity’. Rather than relying on geometric distance, it measures the dissimilarity between two instances as a probability mass in a region that encloses the two instances in every dimension. It deems two instances in a sparse region to be more similar than two instances of equal inter-point geometric distance in a dense region. Our empirical results in k-NN classification and content-based multimedia information retrieval tasks show that the proposed mp-dissimilarity measure produces better task-specific performance than existing widely used general-purpose distance measures such as the ℓp-norm and cosine distance across a wide range of moderate- to high-dimensional data sets with continuous only, discrete only, and mixed attributes.
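The probability-mass idea can be sketched in a few lines. The per-dimension region (the fraction of data falling between the two values) and the power-mean combination below are a simplified reading of the measure, not the paper's exact formulation.

```python
# mp-dissimilarity sketch: per dimension, take the probability mass of
# the region enclosing the two values, then combine with a p-th power mean.
import numpy as np

def mp_dissimilarity(X, x, y, p=2):
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    mass = ((X >= lo) & (X <= hi)).mean(axis=0)   # data fraction per dim
    return (np.mean(mass ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.1, (500, 2)),    # dense cluster
                    rng.normal(5, 2.0, (100, 2))])   # sparse cluster
a = mp_dissimilarity(X, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
b = mp_dissimilarity(X, np.array([5.0, 5.0]), np.array([6.0, 6.0]))
print(a > b)   # equal geometric gaps, but the dense region looks farther
```

This reproduces the key property in the abstract: a pair in a sparse region is deemed more similar than a geometrically equidistant pair in a dense region.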
Defying the gravity of learning curve : A characteristic of nearest neighbour anomaly detectors
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Aryal, Sunil
- Date: 2017
- Type: Text , Journal article
- Relation: Machine Learning Vol. 106, no. 1 (2017), p. 55-91
- Full Text: false
- Reviewed:
- Description: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as ‘more data the better’. We call this ‘the gravity of learning curve’, and it is assumed that no learning algorithms are ‘gravity-defiant’. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.
A generic ensemble approach to estimate multidimensional likelihood in Bayesian classifier learning
- Authors: Aryal, Sunil , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 458-479
- Full Text: false
- Reviewed:
- Description: In Bayesian classifier learning, estimating the joint probability distribution P(x, y) or the likelihood P(x|y) directly from training data is considered to be difficult, especially in large multidimensional data sets. To circumvent this difficulty, existing Bayesian classifiers such as Naive Bayes, BayesNet, and ADE have focused on estimating simplified surrogates of P(x, y) from different forms of one-dimensional likelihoods. Contrary to the perceived difficulty in multidimensional likelihood estimation, we present a simple generic ensemble approach to estimate multidimensional likelihood directly from data. The idea is to aggregate P(x|y) estimated from random subsamples of the data. This article presents two ways to estimate multidimensional likelihoods using the proposed generic approach and introduces two new Bayesian classifiers, ENNBayes and MassBayes, that estimate P(x|y) using a nearest-neighbor density estimation and a probability estimation through feature space partitioning, respectively. Unlike the existing Bayesian classifiers, ENNBayes and MassBayes have constant training time and space complexities and they scale better than existing Bayesian classifiers in very large data sets. Our empirical evaluation shows that ENNBayes and MassBayes yield better predictive accuracy than the existing Bayesian classifiers in benchmark data sets.
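The generic ensemble idea (aggregate likelihood estimates built on random subsamples of a class's data) can be sketched as below; the crude k-NN density estimator and all parameters are illustrative assumptions, not the ENNBayes implementation.

```python
# Average simple k-NN density estimates over random subsamples of the
# class data to approximate the class-conditional likelihood p(x|y).
import numpy as np

def ensemble_likelihood(Xc, x, t=20, psi=50, k=5, seed=0):
    rng = np.random.default_rng(seed)
    ests = []
    for _ in range(t):
        S = Xc[rng.choice(len(Xc), size=min(psi, len(Xc)), replace=False)]
        r = np.sort(np.linalg.norm(S - x, axis=1))[k - 1]   # k-NN radius
        # crude density: k points inside a 2-D ball of radius r
        ests.append(k / (len(S) * np.pi * r ** 2 + 1e-12))
    return np.mean(ests)

rng = np.random.default_rng(1)
Xc = rng.normal(0, 1, (1000, 2))                 # one class's training data
near = ensemble_likelihood(Xc, np.zeros(2))
far = ensemble_likelihood(Xc, np.full(2, 5.0))
print(near > far)                                # higher near the class centre
```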
Revisiting attribute independence assumption in probabilistic unsupervised anomaly detection
- Authors: Aryal, Sunil , Ting, Kaiming , Haffari, Gholamreza
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 11th Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2016 - Auckland, New Zealand, 19th April, 2016 In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9650 p. 73-86
- Full Text: false
- Reviewed:
- Description: In this paper, we revisit the simple probabilistic approach of unsupervised anomaly detection by estimating multivariate probability as a product of univariate probabilities, assuming attributes are generated independently. We show that this simple traditional approach performs competitively to or better than five state-of-the-art unsupervised anomaly detection methods across a wide range of data sets from categorical, numeric or mixed domains. It is arguably the fastest anomaly detector. It is one order of magnitude faster than the fastest state-of-the-art method in high dimensional data sets.
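A minimal sketch of this attribute-independence detector: score each instance by the sum of log per-attribute probabilities estimated from histograms. Bin count and smoothing are illustrative assumptions, not the paper's settings.

```python
# Product of univariate probabilities (log-space for stability);
# lower scores flag anomalies.
import numpy as np

def log_score(X, bins=10):
    logp = np.zeros(len(X))
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        p = (counts + 1) / (counts.sum() + bins)         # Laplace smoothing
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, bins - 1)
        logp += np.log(p[idx])
    return logp                                          # lower = more anomalous

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[0] = 6.0                                # planted anomaly in every attribute
print(int(np.argmin(log_score(X))))       # index of the planted anomaly
```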
Beyond tf-idf and cosine distance in documents dissimilarity measure
- Authors: Aryal, Sunil , Ting, Kaiming , Haffari, Gholamreza , Washio, Takashi
- Date: 2015
- Type: Text , Conference proceedings
- Relation: Asia Information Retrieval Symposium 2015 - Queensland University of Technology, Brisbane, Australia, 2nd-4th Dec, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 9460, p. 400-406
- Full Text: false
- Reviewed:
The potential for ICT tools to promote public participation in fighting corruption
- Authors: Neupane, Arjun , Soar, Jeffrey , Vaidya, Kishor , Aryal, Sunil
- Date: 2015
- Type: Text , Book chapter
- Relation: Human Rights and the Impact of ICT in the Public Sphere: Participation, Democracy, and Political Autonomy p. 175-191
- Full Text: false
- Reviewed:
- Description: Information and Communication Technologies (ICTs) have been seen as pioneering tools for the promotion of the better delivery of government programmes and services, enabling the empowerment of citizens through greater access to information, delivery of more efficient government management processes, better transparency and accountability, and the mitigation of corruption risks. Based on a literature survey of previous research conducted on ICT systems implemented in various countries, this chapter discusses the potential of different ICT tools that have the capacity to help to promote public participation for the purpose of reducing corruption. The chapter specifically reviews the different ICT tools and platforms and their roles as potential weapons in fighting corruption. This chapter also evaluates different ICT tools, including e-government and public e-procurement. Finally, the authors develop a theoretical research model that depicts the anti-corruption capabilities of ICT tools, which in turn, has implications for academics, policy makers, and politicians.
Improving iForest with relative mass
- Authors: Aryal, Sunil , Ting, Kaiming , Wells, Jonathan , Washio, Takashi
- Date: 2014
- Type: Text , Conference paper
- Relation: 18th Pacific-Asia Conference, PAKDD 2014: Advances in Knowledge Discovery and Data Mining; Tainan, Taiwan; 13th-16th May 2014; published in Lecture Notes in Artificial Intelligence (subseries of Lecture Notes in Computer Science) Vol. 8444, p. 510-521
- Full Text: false
- Reviewed:
- Description: iForest uses a collection of isolation trees to detect anomalies. While it is effective in detecting global anomalies, it fails to detect local anomalies in data sets having multiple clusters of normal instances because the local anomalies are masked by normal clusters of similar density and they become less susceptible to isolation. In this paper, we propose a very simple but effective solution to overcome this limitation by replacing the global ranking measure based on path length with a local ranking measure based on relative mass that takes local data distribution into consideration. We demonstrate the utility of relative mass by improving the task specific performance of iForest in anomaly detection and information retrieval tasks.
Mp-dissimilarity : A data dependent dissimilarity measure
- Authors: Aryal, Sunil , Ting, Kaiming , Haffari, Gholamreza , Washio, Takashi
- Date: 2014
- Type: Text , Conference paper
- Relation: 14th IEEE International Conference on Data Mining (2014 ICDM); Shenzhen, China; 14th-17th December 2014 p. 707-712
- Full Text: false
- Reviewed:
- Description: Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high-dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures that are based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can be exploited to measure dissimilarity between two instances and propose a new data-dependent dissimilarity measure called 'mp-dissimilarity'. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as a probability mass in a region that encloses the two instances. It deems two instances in a sparse region to be more similar than two instances in a dense region, even though the two pairs have the same geometric distance. Our empirical results show that the proposed dissimilarity measure indeed provides a reliable nearest neighbour search in high-dimensional spaces, particularly in sparse data. Mp-dissimilarity produced better task-specific performance than the lp-norm and cosine distance in classification and information retrieval tasks.