A classification algorithm that derives weighted sum scores for insight into disease
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Description: Data mining is often performed on datasets associated with diseases in order to gain insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to a deeper understanding of aspects of the disease, because the knowledge representation that arises from classification algorithms is too opaque, too complex or too sparse. Clustering, association and visualisation approaches give clinicians greater scope to engage with the data in a way that leads to insight; however, predictive accuracy is then compromised or non-existent. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales, so the internal representation is intuitive and easily understood. This paper presents results from applying the AWSum approach to data from patients with Cystic Fibrosis.
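The abstract says only that AWSum assigns each feature value a weight representing its influence on the class value. The sketch below is a minimal illustration that makes two assumptions: the class is binary, and the weight is the difference between the conditional probabilities of the two classes given that feature value, which places every weight between -1 and 1. The function name, the toy records and this exact formula are hypothetical and are not the published algorithm.

```python
from collections import Counter


def feature_value_weights(records, labels, positive):
    """Hypothetical AWSum-style weights for a binary class.

    For every (feature, value) pair, the weight is assumed to be
    P(positive | value) - P(negative | value), so it lies in [-1, 1].
    """
    value_counts = Counter()      # how often each (feature, value) occurs
    positive_counts = Counter()   # how often it occurs with the positive class
    for record, label in zip(records, labels):
        for feature, value in record.items():
            value_counts[(feature, value)] += 1
            if label == positive:
                positive_counts[(feature, value)] += 1

    weights = {}
    for key, n in value_counts.items():
        p_pos = positive_counts[key] / n          # P(positive | value)
        weights[key] = p_pos - (1.0 - p_pos)      # minus P(negative | value)
    return weights


# Toy, invented records (not data from the study).
records = [
    {"cough": "yes", "fev1": "low"},
    {"cough": "yes", "fev1": "normal"},
    {"cough": "no", "fev1": "normal"},
    {"cough": "no", "fev1": "low"},
]
labels = ["severe", "mild", "mild", "severe"]

weights = feature_value_weights(records, labels, positive="severe")
for key, w in sorted(weights.items()):
    print(key, round(w, 2))
```

A weight near +1 or -1 marks a feature value that is strongly associated with one class, while a weight near 0 marks a value that says little about the class. This is the kind of per-value reading that a weighted sum scoring scale offers clinicians.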
A new scoring system in Cystic Fibrosis : Statistical tools for database analysis - A preliminary report
- Authors: Hafen, Gaudenz , Hurst, Cameron , Yearwood, John , Smith, Julie , Dzalilov, Zari , Robinson, P. J.
- Date: 2008
- Type: Text , Journal article
- Relation: BMC Medical Informatics and Decision Making Vol. 8, no. 44 (2008), p. 1-11
- Description: Background. Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing Cystic fibrosis disease severity have been used for almost 50 years without being adapted to the milder phenotype of the disease in the 21st century. The aim of this project is to develop a new scoring system using a database and various statistical tools; this study protocol reports the development of those statistical tools. Methods. The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought on a 3-point scale (mild, moderate and severe disease). After data preparation, two multivariate techniques, Canonical Analysis of Principal Coordinates (CAP) and Linear Discriminant Analysis (DA), were used throughout the analysis to establish which method would be more successful in feature selection and model derivation. A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion, and (3) establishment of calibration datasets. Results. (1) Feature selection: CAP has a more effective "modelling" focus than DA.
(2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), the Discriminant Function (DF) was used to define the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with the majority of misallocations falling into adjacent severity classes, particularly for males. Conclusion. Our preliminary data show that using CAP to select features and Linear DA to derive the actual model from a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need further refinement and validation by re-running the statistical methods on the larger dataset. © 2008 Hafen et al; licensee BioMed Central Ltd.
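The Results report two quantities that can both be read off a confusion table: the overall misclassification rate, and how many of the errors fall into a severity class adjacent to the true one. The sketch below shows that arithmetic for the five ordered severity classes. The function name is hypothetical, and the counts are invented for illustration. They are not the study's male or female tables.

```python
# The five severity classes, in order; "adjacent" relies on this ordering.
CLASSES = ["mild", "intermediate moderate", "moderate", "intermediate severe", "severe"]


def confusion_summary(table):
    """Return (misclassification rate, share of errors in adjacent classes).

    table[i][j] is the number of patients whose true class is CLASSES[i]
    and who were assigned to CLASSES[j].
    """
    assert len(table) == len(CLASSES)
    total = sum(sum(row) for row in table)
    errors = 0
    adjacent = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:   # neighbouring severity class
                    adjacent += count
    rate = errors / total
    adjacent_share = adjacent / errors if errors else 0.0
    return rate, adjacent_share


# Invented counts, for illustration only.
TOY_TABLE = [
    [8, 2, 0, 0, 0],
    [1, 7, 1, 0, 0],
    [0, 1, 9, 1, 0],
    [0, 1, 1, 6, 1],
    [0, 0, 0, 1, 5],
]

rate, share = confusion_summary(TOY_TABLE)
print(f"misclassification rate: {rate:.1%}, errors in adjacent classes: {share:.0%}")
```

With ordinal classes like these, a high adjacent share means that most errors are near misses, which is the pattern the abstract reports, particularly for males.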
AWSum - Data mining for insight
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2008
- Type: Text , Journal article
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 5139 LNAI (8-10 October 2008), p. 524-531
- Full Text: false
- Description: Many classifiers achieve high levels of accuracy but have limited use in real-world problems because they provide little insight into data sets, are difficult to interpret and require expertise to use. In areas such as health informatics, analysts not only require accurate classifications but also want some insight into the influences on the classification. This insight can then be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that gives accuracy comparable to other techniques whilst providing insight into the data. AWSum achieves this by calculating a weight for each feature value that represents its influence on the class value. The merits of AWSum in classification and insight are tested on a Cystic Fibrosis dataset with positive results. © 2008 Springer-Verlag Berlin Heidelberg.
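A weighted sum classifier can show the influences behind a single classification because the score is just the sum of the weights of the instance's feature values. The sketch below illustrates this under several assumptions: a binary class, weights that are already computed, a decision threshold of 0, and unseen feature values contributing nothing. The published AWSum may set its threshold differently. The feature names, weights and labels are hypothetical.

```python
# Hypothetical weights for (feature, value) pairs; positive values push
# towards "severe", negative values towards "mild".
WEIGHTS = {
    ("fev1", "low"): 0.75,
    ("fev1", "normal"): -0.5,
    ("pseudomonas", "yes"): 0.5,
    ("pseudomonas", "no"): -0.25,
    ("age_group", "child"): -0.125,
}


def classify(instance, weights, threshold=0.0,
             positive_label="severe", negative_label="mild"):
    """Sum the weights of an instance's feature values and compare to a threshold.

    Returns the predicted label, the total score and the per-feature
    contributions, so the reasons for the prediction remain visible.
    """
    contributions = {}
    for feature, value in instance.items():
        # Assumption of this sketch: unseen feature values contribute 0.
        contributions[feature] = weights.get((feature, value), 0.0)
    score = sum(contributions.values())
    label = positive_label if score > threshold else negative_label
    return label, score, contributions


patient = {"fev1": "low", "pseudomonas": "no", "age_group": "child"}
label, score, contributions = classify(patient, WEIGHTS)
print(label, score)
for feature, weight in contributions.items():
    print(f"  {feature}: {weight:+.3f}")
```

Returning the contributions together with the label reflects the design goal stated in the abstract: the analyst sees both the classification and the influences that produced it.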
AWSum - Combining classification with knowledge acquisition
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz , Jelinek, Herbert
- Date: 2008
- Type: Text , Journal article
- Relation: International Journal of Software and Informatics Vol. 2, no. 2 (2008), p. 199-214
- Full Text: false
- Description: Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and diabetes datasets with positive results.
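Because every feature value carries its own weight, the strongest influences across a whole dataset can be listed by ordering the weights by magnitude. The sign of each weight gives the direction of its influence. The sketch below assumes the weights already exist. The diabetes-style feature names and values are invented and are not taken from the datasets used in the paper.

```python
# Hypothetical weights; positive values favour the positive class.
WEIGHTS = {
    ("hba1c", "high"): 0.8,
    ("bmi", "normal"): -0.6,
    ("age", "over_60"): 0.3,
    ("smoker", "no"): -0.1,
}


def rank_influences(weights, top_n=3):
    """Return the top_n (feature, value) pairs ordered by absolute weight."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


ranked = rank_influences(WEIGHTS)
for (feature, value), weight in ranked:
    direction = "towards" if weight > 0 else "away from"
    print(f"{feature}={value}: {weight:+.2f} ({direction} the positive class)")
```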