Dataset Information

Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments.


ABSTRACT: This paper proposes a novel feature selection method for microarray gene expression datasets, the Robust Proportional Overlapping Score (RPOS), which uses a robust measure of dispersion, the Median Absolute Deviation (MAD). The method identifies the most discriminative genes in binary class problems by scoring how much each gene's expression values overlap between the two classes. Genes with a high degree of overlap between classes are discarded, and those that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods on eleven gene expression datasets in terms of classification error, Brier score, and sensitivity. Observations are classified using the sets of genes selected by the proposed method with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also reported. The results show that the proposed method outperforms the other methods in most cases.
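
The abstract does not give the RPOS formula, so the following is only a minimal sketch of the general idea it describes: build a MAD-based "core" interval for each class, measure how much the two intervals overlap, and prefer genes with low overlap. The function names, the interval width multiplier `k`, and the intersection-over-span score are all assumptions made for illustration and are not the published RPOS definition.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation around the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


def core_interval(values, k=1.0):
    """Hypothetical robust interval: median +/- k * MAD (k is an assumption)."""
    m = median(values)
    d = mad(values)
    return (m - k * d, m + k * d)


def overlap_score(class_a, class_b, k=1.0):
    """Share of the combined span of both core intervals that they have in common.

    0.0 means the intervals are disjoint, 1.0 means they coincide.
    This is an illustrative score, not the RPOS score from the paper.
    """
    lo_a, hi_a = core_interval(class_a, k)
    lo_b, hi_b = core_interval(class_b, k)
    inter = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    # A zero span means both intervals collapse to the same point.
    return inter / span if span > 0 else 1.0


def rank_genes(expression, k=1.0):
    """Order gene names from least to most overlap between the two classes."""
    return sorted(expression, key=lambda g: overlap_score(*expression[g], k=k))


# Toy data (made up): gene -> (class A values, class B values).
toy = {
    "g1": ([1.0, 1.2, 0.9, 1.1, 8.0], [5.0, 5.2, 4.9, 5.1, 5.3]),  # 8.0 is an outlier
    "g2": ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0]),
}
ranking = rank_genes(toy)
```

In this toy example the outlier 8.0 in `g1` barely moves the median or the MAD, so the two class intervals stay disjoint and `g1` ranks ahead of `g2`; a range-based interval would instead stretch across the other class, which is the kind of sensitivity a robust dispersion measure is meant to avoid.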

SUBMITTER: Hamraz M 

PROVIDER: S-EPMC8176540 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments.

Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z

PeerJ Computer Science, 2021-06-01


Similar Datasets

| S-EPMC4141116 | biostudies-literature
| S-EPMC8093438 | biostudies-literature
| S-EPMC6986087 | biostudies-literature
| S-EPMC3047290 | biostudies-literature
| S-EPMC5013239 | biostudies-literature
| S-EPMC10581699 | biostudies-literature
| S-EPMC7459797 | biostudies-literature
| S-EPMC7397300 | biostudies-literature
| S-EPMC5373580 | biostudies-literature
| S-EPMC8147911 | biostudies-literature