Addressing the complexities of big data analytics in healthcare: The diabetes screening case
- Authors: De Silva, Daswin , Burstein, Frada , Jelinek, Herbert , Stranieri, Andrew
- Date: 2015
- Type: Text , Journal article
- Relation: Australasian Journal of Information Systems Vol. 19 (2015), p. S99-S115
- Full Text:
- Reviewed:
- Description: The healthcare industry generates a high throughput of medical, clinical and omics data of varying complexity and features. Clinical decision-support is gaining widespread attention as medical institutions and governing bodies turn towards better management of this data for effective and efficient healthcare delivery and quality-assured outcomes. A mass of data across all stages, from disease diagnosis to palliative care, is further indication of the opportunities and challenges for effective data management, analysis, prediction and optimisation techniques as parts of knowledge management in clinical environments. Big Data analytics (BDA) presents the potential to advance this industry with reforms in clinical decision-support and translational research. However, adoption of big data analytics has been slow due to complexities posed by the nature of healthcare data. The success of these systems is hard to predict, so further research is needed to provide a robust framework to ensure investment in BDA is justified. In this paper we investigate these complexities from the perspective of updated Information Systems (IS) participation theory. We present a case study of a large diabetes screening project that integrates, converges and derives expedient insights from such an accumulation of data, and we make recommendations for a successful BDA implementation grounded in a participatory framework and the specificities of big data in the healthcare context. © 2015 De Silva, Burstein, Jelinek, Stranieri.
Data-analytically derived flexible HbA1c thresholds for type 2 diabetes mellitus diagnostic
- Authors: Stranieri, Andrew , Yatsko, Andrew , Jelinek, Herbert , Venkatraman, Sitalakshmi
- Date: 2015
- Type: Text , Journal article
- Relation: Artificial Intelligence Research Vol. 5, no. 1 (2015), p. 111-134
- Full Text:
- Reviewed:
- Description: Glycated haemoglobin (HbA1c) is now more commonly used as an alternative to the fasting plasma glucose and oral glucose tolerance tests for the identification of Type 2 Diabetes Mellitus (T2DM) because it is easily obtained using point-of-care technology and represents long-term blood sugar levels. According to WHO guidelines, HbA1c values of 6.5% or above are required for a diagnosis of T2DM. However, outcomes of a large number of trials with HbA1c have been inconsistent across the clinical spectrum, and further research is required to determine the efficacy of HbA1c testing in the identification of T2DM. Medical records from a diabetes screening program in Australia illustrate that many patients could be classified as diabetic if other clinical indicators are included, even though the HbA1c result does not exceed 6.5%. This suggests that a single population-wide cutoff of 6.5% may be too simple and may miss individuals at risk or with already overt, undiagnosed diabetes. In this study, data mining algorithms have been applied to identify markers that can be used with HbA1c. The results indicate that T2DM is best classified by HbA1c at 6.2%, a cutoff lower than the currently recommended one. Under a flexible-threshold approach, the cutoff can be lowered further when, in addition to HbA1c being high, the rule is conditioned on oxidative stress or inflammation being present, atherogenicity or adiposity being high, or hypertension being diagnosed.
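The flexible-threshold idea described above can be sketched as a simple two-tier rule: classify at the data-derived 6.2% cutoff, and allow a lower cutoff when a supporting clinical marker (oxidative stress, inflammation, hypertension, etc.) is present. The relaxed cutoff value and the single boolean marker below are illustrative assumptions, not the paper's exact rules.

```python
def t2dm_flag(hba1c, supporting_marker=False,
              base_cutoff=6.2, relaxed_cutoff=6.0):
    """Flag possible T2DM using a flexible HbA1c threshold.

    base_cutoff (6.2%) follows the data-derived value reported in the
    abstract; relaxed_cutoff, used when a supporting clinical marker is
    present, is a hypothetical illustration of threshold flexibility.
    """
    if hba1c >= base_cutoff:
        return True
    # Lower the threshold only when another risk marker supports it.
    return supporting_marker and hba1c >= relaxed_cutoff
```

A screening rule like this trades a fixed population-wide cutoff for a conditional one, which is the core of the flexible-threshold argument.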
Personalised measures of obesity using waist to height ratios from an Australian health screening program
- Authors: Jelinek, Herbert , Stranieri, Andrew , Yatsko, Andrew , Venkatraman, Sitalakshmi
- Date: 2019
- Type: Text , Journal article
- Relation: Digital Health Vol. 5 (2019), p. 1-8
- Full Text:
- Reviewed:
- Description: Objectives: The aim of the current study is to generate waist circumference to height ratio cut-off values for obesity categories from a model of the relationship between body mass index and waist circumference to height ratio. We compare the waist circumference to height ratio cut-offs discovered in this way with cut-off values currently prevalent in practice, which were originally derived using pragmatic criteria. Method: Personalised data including age, gender, height, weight, waist circumference and presence of diabetes, hypertension and cardiovascular disease for 847 participants over eight years were assembled from participants attending a rural Australian health review clinic (DiabHealth). Obesity was classified based on the conventional body mass index measure (weight/height²) and compared to the waist circumference to height ratio. Correlations between the measures were evaluated on the screening data, and independently on data from the National Health and Nutrition Examination Survey (NHANES) that included age categories. Results: This article recommends waist circumference to height ratio cut-off values based on an Australian rural sample and verified on the NHANES database, facilitating the classification of obesity in clinical practice. Gender-independent cut-off values are provided for the waist circumference to height ratio that identify the healthy (waist circumference to height ratio >= 0.45), overweight (0.53) and three obese (0.60, 0.68, 0.75) categories, verified on the NHANES dataset. A strong linearity between the waist circumference to height ratio and the body mass index measure is demonstrated. Conclusion: The recommended waist circumference to height ratio cut-off values provide a useful index for assessing stages of obesity and risk of chronic disease for improved healthcare in clinical practice.
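The reported cut-offs can be read as boundaries on a waist-to-height scale. The sketch below assumes each cut-off marks the lower bound of its category, which is one plausible reading of the figures quoted above; the category labels between cut-offs are our interpretation, not the paper's table.

```python
# Cut-offs quoted in the abstract; each is treated here as the lower
# bound of its category (an interpretive assumption).
WHTR_CUTOFFS = [(0.45, "healthy"), (0.53, "overweight"),
                (0.60, "obese I"), (0.68, "obese II"), (0.75, "obese III")]

def whtr_category(waist_cm, height_cm):
    """Return (waist-to-height ratio, category label)."""
    ratio = waist_cm / height_cm
    label = "below healthy cut-off"   # under the first boundary
    for cutoff, name in WHTR_CUTOFFS:
        if ratio >= cutoff:
            label = name
    return ratio, label
```

Unlike BMI, the ratio needs no squaring and is gender-independent here, which is part of its appeal for quick clinical use.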
Improving classifications for cardiac autonomic neuropathy using multi-level ensemble classifiers and feature selection based on random forest
- Authors: Kelarev, Andrei , Stranieri, Andrew , Abawajy, Jemal , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Conference paper
- Relation: Tenth Australasian Data Mining Conference Vol. 134, p. 93-101
- Full Text: false
- Reviewed:
- Description: This paper is devoted to empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of progression of cardiac autonomic neuropathy, CAN, in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used the feature selection and ranking implemented in Random Forest. It identified a new set of features, which turned out to be better than all other sets previously considered for this large and well-known database. Random Forest remained the best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which improved its performance. The results show that the novel multi-level meta classifiers achieved further improvement and obtained new outcomes that are significantly better than those previously published in the literature for cardiac autonomic neuropathy.
AWSum - Combining classification with knowledge acquisition
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz , Jelinek, Herbert
- Date: 2008
- Type: Text , Journal article
- Relation: International Journal of Software and Informatics Vol. 2, no. 2 (2008), p. 199-214
- Full Text: false
- Reviewed:
- Description: Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and diabetes datasets with positive results.
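The idea of a weight per feature value that is summed into a classification can be sketched as below. The weighting scheme (class-association score in [-1, 1]) and the zero decision threshold are illustrative assumptions; the published AWSum weighting and combination method may differ in detail.

```python
from collections import defaultdict

def train_value_weights(rows, labels):
    """One weight in [-1, 1] per (feature index, value) pair, measuring
    how strongly that value is associated with the positive class.
    Illustrative stand-in for AWSum's weighting, not a reproduction."""
    counts = defaultdict(lambda: [0, 0])   # (i, value) -> [neg, pos]
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, v)][y] += 1
    return {k: (pos - neg) / (pos + neg)
            for k, (neg, pos) in counts.items()}

def awsum_classify(weights, row, threshold=0.0):
    """Sum the per-value weights; the score itself is the 'insight'."""
    score = sum(weights.get((i, v), 0.0) for i, v in enumerate(row))
    return 1 if score > threshold else 0
```

The interpretability claim follows from the model form: each weight can be read directly as the influence of one feature value on the class.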
Atrial fibrillation analysis for real time patient monitoring
- Authors: Allami, Ragheed , Stranieri, Andrew , Marzbanrad, Faezeh , Balasubramanian, Venki , Jelinek, Herbert
- Date: 2017
- Type: Text , Conference proceedings , Conference paper
- Relation: 44th Computing in Cardiology Conference, CinC 2017 Vol. 44, p. 1-4
- Full Text: false
- Reviewed:
- Description: Atrial Fibrillation (AF) can lead to life-threatening conditions such as stroke and heart failure. The instant recognition of life-threatening cardiac arrhythmias based on a 3-lead ECG recording a Lead II configuration for a few seconds is a challenging problem of clinical significance. Five consecutive ECG beats that were identified by a cardiologist to characterise an AF episode, and five consecutive heartbeat intervals representing an irregular RR interval episode, were analysed. The detection and analysis of P waves as the morphological features of AF was executed based on two template matching methods. An AF detector was developed by combining the correlation coefficients from the template matching methods with the standard deviation of the RR intervals. The AF detector was then applied to classify five consecutive beats as AF or non-AF by thresholding the calculated irregularity. The proposed algorithm was tested on the MIT-BIH Atrial Fibrillation and the Challenge 2017 databases. The proposed method achieved improved sensitivity, specificity and accuracy of 97.60%, 98.20% and 99% respectively compared to recently published methods. In addition, the proposed method is suitable for real-time patient monitoring as it is computationally simple and requires only a few seconds of ECG recording to detect an AF rhythm. © 2017 IEEE Computer Society. All rights reserved.
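The combination of P-wave template-matching correlation with RR-interval variability can be sketched for one five-beat window as below. The two cut-off values and the AND-combination are illustrative assumptions; the paper's thresholds and exact decision rule are not reproduced.

```python
import statistics

def af_detect(p_wave_corrs, rr_intervals,
              corr_cutoff=0.5, rr_sd_cutoff=0.08):
    """Flag a 5-beat window as AF when P waves match the template
    poorly AND the RR intervals are irregular.

    p_wave_corrs: template-matching correlation per beat (AF tends to
    lack clear P waves, so correlations drop).
    rr_intervals: beat-to-beat intervals in seconds (AF is irregular,
    so their standard deviation rises). Cut-offs are hypothetical.
    """
    weak_p_waves = statistics.mean(p_wave_corrs) < corr_cutoff
    irregular_rr = statistics.stdev(rr_intervals) > rr_sd_cutoff
    return weak_p_waves and irregular_rr
```

A mean, a standard deviation and two comparisons per window is what makes this kind of detector cheap enough for real-time monitoring on wearable hardware.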
Data analytics to select markers and cut-off values for clinical scoring
- Authors: Stranieri, Andrew , Yatsko, Andrew , Venkatraman, Sitalakshmi , Jelinek, Herbert
- Date: 2018
- Type: Text , Conference proceedings
- Relation: ACSW '18: Proceedings of the Australasian Computer Science Week Multiconference; Brisbane; 29th January - 2nd February 2018, p. 1-6
- Full Text: false
- Reviewed:
- Description: Scoring systems, such as the Glasgow Coma Scale used to assess consciousness and AUSDRISK used to assess the risk of diabetes, are prevalent in clinical practice. Scoring systems typically include relevant variables with ordinal values where each value is assigned a weight. Weights for selected values are summed and compared to thresholds so that health care professionals can rapidly generate a score. Scoring systems are prevalent in clinical practice because they are easy and quick to use. However, most scoring systems comprise many variables and require some time to calculate a final score. Further, expensive population-wide studies are required to validate a scoring system. In this article, we present a new approach for the generation of a scoring system. The approach uses a search procedure invoking iterative decision tree induction to identify a suite of scoring rules, each of which requires values on only two variables. Twelve scoring rules were discovered using the approach, from an Australian screening program for the assessment of Type 2 Diabetes risk. However, classifications from the 12 rules can conflict. In this paper we argue that a simple rule preference relation is sufficient for the resolution of rule conflicts.
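Two-variable scoring rules with a preference relation for conflicts can be sketched as an ordered rule list: each rule tests exactly two variables, and earlier rules win when classifications would conflict. The variable names, thresholds and labels below are hypothetical stand-ins, not the paper's twelve discovered rules.

```python
# Each rule: (var1, threshold1, var2, threshold2, label). List order
# encodes the preference relation used to resolve conflicts.
RULES = [
    ("hba1c", 6.2, "bmi", 30.0, "high risk"),     # hypothetical
    ("glucose", 7.0, "age", 50, "high risk"),     # hypothetical
]

def score(record, rules=RULES, default="low risk"):
    """Apply two-variable rules in preference order; first match wins."""
    for v1, t1, v2, t2, label in rules:
        if record.get(v1, 0) >= t1 and record.get(v2, 0) >= t2:
            return label
    return default
```

Because each rule needs only two measurements, a clinician can apply whichever rule matches the data at hand, and the ordering settles disagreements without re-weighting anything.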
A heuristic gene regulatory networks model for cardiac function and pathology
- Authors: Zarnegar, Armita , Vamplew, Peter , Stranieri, Andrew , Jelinek, Herbert
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 Computing in Cardiology Conference (CinC); Vancouver; 11-14th Sept, 2016
- Full Text: false
- Reviewed:
- Description: Genome-wide association studies (GWAS) and next-generation sequencing (NGS) have led to an increase in information about the human genome and cardiovascular disease. Understanding the role of genes in cardiac function and pathology requires modelling gene interactions and identifying regulatory genes as part of a gene regulatory network (GRN). Feature selection and data reduction are not sufficient on their own and require domain knowledge to deal with large data sets. We propose three novel innovations in constructing a GRN based on heuristics: a 2D Visualised Co-regulation function; post-processing to identify gene-gene interactions; and finally a threshold algorithm applied to identify the hub genes that provide the backbone of the GRN. The 2D Visualised Co-regulation function performed significantly better than Pearson's correlation for measuring pairwise associations (t=3.46, df=5, p=0.018). The F-measure improved from 0.11 to 0.12. The hub network provided a 60% improvement over that reported in the literature. The performance of the hub network was then also compared against ARACNe and performed significantly better (p=0.024). We conclude that a heuristics approach to developing GRNs has potential to improve our understanding of gene regulation and interaction in diverse biological function and disease.
Comparing Pixel N-grams and bag of visual word features for the classification of diabetic retinopathy
- Authors: Kulkarni, Pradnya , Stranieri, Andrew , Jelinek, Herbert
- Date: 2019
- Type: Text , Conference proceedings
- Relation: ACSW 2019: Australasian Computer Science Week 2019;Sydney NSW Australia; January 29 - 31, 2019; published in Proceedings of the Australasian Computer Science Week Multiconference p. 1-7
- Full Text: false
- Reviewed:
- Description: The extraction of Bag of Visual Words (BoVW) features from retinal images for automated classification has been shown to be effective but computationally expensive. Histogram and co-variance matrix features do not generally result in models that have the same predictive accuracy as BoVW and are still computationally expensive. The discovery of features that result in accurate image classification on computationally constrained devices such as smartphones would enable new and promising applications for image classification. For example, smartphone retinal cameras could conceivably make diabetic retinopathy screening widely available and potentially reduce undiagnosed retinopathy, if classification could be achieved with computationally simple algorithms. A novel image feature extraction technique inspired by N-grams in text mining, called 'Pixel N-grams', is described that can serve this purpose. Results on mammogram and texture classification have shown high accuracy despite the reduced computational complexity. However, retinal scan classification results using Pixel N-grams lag behind BoVW approaches. An explanation for the relatively poor performance of Pixel N-grams with diabetic retinopathy, drawing on concepts associated with the No Free Lunch theorem, is presented.
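By analogy with text N-grams, a Pixel N-gram feature can be read as a histogram over short sequences of quantised pixel intensities. The sketch below (row-wise scanning, four intensity levels) is one plausible reading of the idea; the published feature extraction may differ in scan direction, quantisation and normalisation.

```python
from collections import Counter

def pixel_ngrams(image, n=2, levels=4, max_intensity=255):
    """Histogram of n-grams of quantised intensities along each row.

    image: list of rows of integer pixel intensities. Quantising to a
    few levels keeps the n-gram vocabulary, and hence the feature
    vector, small enough for constrained devices (the point of the
    technique). Parameters here are illustrative assumptions.
    """
    q_max = levels - 1
    hist = Counter()
    for row in image:
        quant = [min(p * levels // (max_intensity + 1), q_max) for p in row]
        for i in range(len(quant) - n + 1):
            hist[tuple(quant[i:i + n])] += 1
    return hist
```

The feature space has at most levels**n bins (16 here), versus the large learned vocabularies of BoVW, which is where the computational saving comes from.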
Novel data mining techniques for incomplete clinical data in diabetes management
- Authors: Jelinek, Herbert , Yatsko, Andrew , Stranieri, Andrew , Venkatraman, Sitalakshmi
- Date: 2014
- Type: Text , Journal article
- Relation: British Journal of Applied Science & Technology Vol. 4, no. 33 (2014), p. 4591-4606
- Relation: https://doi.org/10.9734/BJAST/2014/11744
- Full Text:
- Reviewed:
- Description: An important part of health care involves upkeep and interpretation of medical databases containing patient records for clinical decision making, diagnosis and follow-up treatment. Missing clinical entries make it difficult to apply data mining algorithms for clinical decision support. This study demonstrates that higher predictive accuracy is possible using conventional data mining algorithms if missing values are dealt with appropriately. We propose a novel algorithm using a convolution of sub-problems to stage a super problem, where classes are defined by Cartesian Product of class values of the underlying problems, and Incomplete Information Dismissal and Data Completion techniques are applied for reducing features and imputing missing values. Predictive accuracies using Decision Branch, Nearest Neighborhood and Naïve Bayesian classifiers were compared to predict diabetes, cardiovascular disease and hypertension. Data is derived from Diabetes Screening Complications Research Initiative (DiScRi) conducted at a regional Australian university involving more than 2400 patient records with more than one hundred clinical risk factors (attributes). The results show substantial improvements in the accuracy achieved with each classifier for an effective diagnosis of diabetes, cardiovascular disease and hypertension as compared to those achieved without substituting missing values. The gain in improvement is 7% for diabetes, 21% for cardiovascular disease and 24% for hypertension, and our integrated novel approach has resulted in more than 90% accuracy for the diagnosis of any of the three conditions. This work advances data mining research towards achieving an integrated and holistic management of diabetes.
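The Data Completion step above imputes missing values before classification. As a generic stand-in for that step (the paper's actual imputation technique is not reproduced here), column-mean imputation for numeric attributes can be sketched as:

```python
import statistics

def impute_numeric(rows):
    """Replace None entries with the mean of the observed values in
    the same column. A minimal sketch of the idea that classifiers
    do better on completed records than on records with gaps."""
    cols = list(zip(*rows))
    means = [statistics.mean(v for v in col if v is not None)
             for col in cols]
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]
```

With clinical data this simple scheme is usually only a baseline; the reported gains came from the authors' combined dismissal-and-completion approach, not from mean imputation alone.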
Emerging point of care devices and artificial intelligence: prospects and challenges for public health
- Authors: Stranieri, Andrew , Venkatraman, Sitalakshmi , Minicz, John , Zarnegar, Armita , Firmin, Sally , Balasubramanian, Venki , Jelinek, Herbert
- Date: 2022
- Type: Text , Journal article
- Relation: Smart Health Vol. 24 (2022)
- Full Text:
- Reviewed:
- Description: Risk assessments for numerous conditions can now be performed cost-effectively and accurately using emerging point of care devices coupled with machine learning algorithms. In this article, the case is advanced that point of care testing, in combination with risk assessments generated by artificial intelligence algorithms and applied to the universal screening of the general public for multiple conditions at one session, represents a new kind of inexpensive screening that can lead to the early detection of disease and other public health benefits. A case study of a diabetes screening clinic in a rural area of Australia is presented to illustrate its benefits. Universal, poly-aetiological screening is shown to meet the ten World Health Organisation criteria for screening programmes. © Elsevier Inc.