A study on the importance of differential prioritization in feature selection using toy datasets
- Authors: Ooi, Chia; Teng, Shyh; Chetty, Madhu
- Date: 2008
- Type: Text, Conference paper
- Relation: Third IAPR International Conference, PRIB
- Full Text: false
- Description: Previous empirical work has shown the effectiveness of differential prioritization in feature selection prior to molecular classification. We now propose to determine the theoretical basis for the concept of differential prioritization through mathematical analyses of the characteristics of predictor sets found using different values of the DDP (degree of differential prioritization) from realistic toy datasets. Mathematical analyses based on analytical measures, such as the distance between classes, are applied to these predictor sets. We demonstrate that the optimal value of the DDP is capable of forming a predictor set consisting of classes of features that are well separated and highly correlated to the target classes, a characteristic of a truly optimal predictor set. From these analyses, the necessity of adjusting the DDP based on the dataset of interest is confirmed in a mathematical manner, indicating that the DDP-based feature selection technique is superior to both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods. Applying similar analyses to real-life multiclass microarray datasets, we obtain further proof of the theoretical significance of the DDP for practical applications.
Degree of differential prioritization
- Authors: Ooi, Chia; Chetty, Madhu; Teng, Shyh
- Date: 2009
- Type: Journal article
- Relation: IEEE Engineering in Medicine and Biology Magazine Vol. 28, no. 4 (2009), p. 45-51
- Full Text: false
- Description: Because of the high dimensionality of microarray data sets, feature selection (FS) has become an important challenge in molecular classification. Using the degree of differential prioritization (DDP) between relevance and antiredundancy, our proposed DDP-based FS technique is capable of achieving better accuracies than those previously reported, using a smaller predictor set. Previously, however, we had neither devised nor used any method for determining the value of the DDP to be used for the data set of interest before the FS process. In this article, we propose a system for predicting the optimal value of the DDP, which costs less computationally than conventional tuning while maintaining the independence of the FS technique from the type of underlying classifier used.