- Title
- Diagnostic with incomplete nominal/discrete data
- Creator
- Jelinek, Herbert; Yatsko, Andrew; Stranieri, Andrew; Venkatraman, Sitalakshmi; Bagirov, Adil
- Date
- 2015
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/155260
- Identifier
- vital:11264
- Identifier
- https://doi.org/10.5430/air.v4n1p22
- Identifier
- ISSN: 1927-6982, 1927-6974
- Abstract
- Missing values may be present in data without undermining its use for diagnostic/classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation, and some can be reinvented as data completion methods. In this work, the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to be suitable for this purpose. An approach is implemented whereby the entered missing values are not necessarily a close match to the true data; instead, they are intended to cause the least hindrance to classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represent a number of related conditions, taking the Cartesian product of the class values of the underlying sub-problems narrows down the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
- Publisher
- Sciedu Press
- Relation
- Artificial Intelligence Research Vol. 4, no. 1 (2015), p. 22-35
- Rights
- Copyright © 2015 Jelinek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Rights
- Open Access
- Rights
- This metadata is freely available under a CC0 license
- Subject
- Classification; Missing values; Categorical data; Continuous features; Discretization; 0801 Artificial Intelligence and Image Processing
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
SOURCE1 | Published version | 1 MB | Adobe Acrobat PDF
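The abstract names Naive Bayes among the classifiers that can be recast as a data completion method, applied after continuous attributes are discretized and after related conditions are merged by a Cartesian product of class values. The Python sketch below illustrates that combination under assumptions of our own; it is not the authors' implementation. Equal-width binning, Laplace smoothing, and the rule "fill a missing cell with the value that maximizes the posterior of the record's known class" (one possible reading of "least hindrance to classification") are all assumptions, as are the function names `discretize`, `composite_label`, `fit`, `impute` and the toy data.

```python
from collections import Counter, defaultdict
from math import exp, log


def discretize(values, n_bins):
    """Equal-width binning of a continuous attribute (assumed scheme); None stays None."""
    known = [v for v in values if v is not None]
    lo, hi = min(known), max(known)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant column
    # min(...) puts the maximum value into the last bin instead of an extra bin n_bins.
    return [None if v is None else f"bin{min(int((v - lo) / width), n_bins - 1)}"
            for v in values]


def composite_label(*sub_labels):
    """Merge class values of related sub-problems into one class (Cartesian product)."""
    return "|".join(sub_labels)


def fit(rows, labels):
    """Count class frequencies and per-class value frequencies, ignoring missing cells."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    domains = defaultdict(set)           # attribute -> observed nominal values
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:
                value_counts[(j, c)][v] += 1
                domains[j].add(v)
    return class_counts, value_counts, domains, len(labels)


def log_score(row, c, model):
    """Unnormalised log Naive Bayes score with Laplace smoothing; None cells are skipped."""
    class_counts, value_counts, domains, n = model
    s = log(class_counts[c] / n)
    for j, v in enumerate(row):
        if v is not None:
            counts = value_counts[(j, c)]
            s += log((counts[v] + 1) / (sum(counts.values()) + len(domains[j])))
    return s


def own_class_posterior(row, c, model):
    """P(c | row) under the Naive Bayes model, normalised over all classes."""
    scores = {k: log_score(row, k, model) for k in model[0]}
    top = max(scores.values())
    return exp(scores[c] - top) / sum(exp(s - top) for s in scores.values())


def impute(rows, labels):
    """Fill each missing cell with the value that maximises the record's own-class posterior.

    The model is fitted once on the incomplete data, so earlier substitutes
    do not change the counts used for later ones.
    """
    model = fit(rows, labels)
    domains = model[2]
    completed = []
    for row, c in zip(rows, labels):
        row = list(row)  # copy so the input rows are not mutated
        for j, v in enumerate(row):
            if v is None:
                row[j] = max(
                    sorted(domains[j]),
                    key=lambda cand: own_class_posterior(row[:j] + [cand] + row[j + 1:], c, model),
                )
        completed.append(row)
    return completed


if __name__ == "__main__":
    # Invented toy data: one continuous and one nominal attribute, two related diagnoses.
    glucose = discretize([5.0, 5.5, 9.0, 9.5, None, 9.2], n_bins=2)
    smoker = ["no", "no", "yes", "yes", "yes", None]
    labels = [composite_label(d, h) for d, h in zip("nnyyyy", "nnyyyn")]
    rows = [list(r) for r in zip(glucose, smoker)]
    print(impute(rows, labels))
    # [['bin0', 'no'], ['bin0', 'no'], ['bin1', 'yes'], ['bin1', 'yes'], ['bin1', 'yes'], ['bin1', 'no']]
```

In this toy run the last record (class `y|n`) receives `no` for its missing attribute even though `yes` is the more common value among records with the first diagnosis, because `yes` would pull the record toward the `y|y` class. This mirrors the abstract's point that a substitute is chosen to help classification rather than to be the most likely value.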