Gene expression imputation techniques for robust post genomic knowledge discovery
- Authors: Sehgal, Muhammad Shoaib B , Gondal, Iqbal , Dooley, Laurence
- Date: 2008
- Type: Text , Book chapter
- Relation: Studies in Computational Intelligence p. 185-206
- Full Text: false
- Reviewed:
- Description: Microarrays measure the expression patterns of thousands of genes at a time, under the same or diverse conditions, to facilitate faster analysis of biological processes. This gene expression data is widely used for diagnosis, prognosis and tailored drug discovery. Microarray data, however, commonly contains missing values, which can have a high impact on subsequent biological knowledge discovery methods. This has been the catalyst for the emergence of different imputation algorithms, including Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), Local Least Square Impute (LLSImpute) and K-Nearest Neighbour (KNN). This chapter investigates the impact of missing values on post genomic knowledge discovery methods such as Gene Selection and Gene Regulatory Network (GRN) reconstruction. A framework for robust subsequent biological knowledge inference is proposed, which shows significant improvements in the outcomes of Gene Selection and GRN reconstruction methods.
Diagnostic with incomplete nominal/discrete data
- Authors: Jelinek, Herbert , Yatsko, Andrew , Stranieri, Andrew , Venkatraman, Sitalakshmi , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Artificial Intelligence Research Vol. 4, no. 1 (2015), p. 22-35
- Full Text:
- Reviewed:
- Description: Missing values may be present in data without undermining its use for diagnostic or classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are shown to be suitable for this purpose. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance to classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of the class values of the underlying sub-problems narrows down the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.