Dataset Information


Analyzing kernel matrices for the identification of differentially expressed genes.

ABSTRACT: One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests are often applied to identify differentially expressed genes (DEGs), followed by state-of-the-art learning machines, in particular Support Vector Machines (SVMs). The SVM is a typical sample-based classifier whose performance depends on how discriminative the training samples are. However, DEGs identified by statistical tests are not guaranteed to yield a training dataset composed of discriminative samples. To tackle this problem, a novel gene ranking method, Kernel Matrix Gene Selection (KMGS), is proposed. The rationale of the method, which is rooted in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing [Formula: see text]-like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, and the significance of a specific gene is deduced from it. Also described is Kernel Matrix Sequential Forward Selection (KMSFS), which shares the essential ideas of KMGS but proceeds in a greedy manner. On three public microarray datasets, the proposed algorithms achieved competitive performance in terms of the B.632+ error rate.
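The abstract does not specify the kernel, the per-column statistic (elided above as [Formula: see text]), or how gene significance is derived from problem separability. The following Python/NumPy code is a minimal sketch that fills these in with explicit assumptions. It uses a Gaussian (RBF) kernel. For each kernel column it computes a Welch-style t statistic that compares a sample's similarity to same-class samples against its similarity to other-class samples; this statistic is a stand-in for the elided one. Problem separability is the mean of these per-sample values. Each gene is scored by the separability of its single-gene kernel, and a greedy forward search maximizes problem separability. All function names and parameters (`gamma`, `k`) are hypothetical. This is not the authors' KMGS/KMSFS implementation.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix for X of shape (n_samples, n_genes) (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances from rounding


def sample_separability(K, y, j):
    """Hypothetical t-like statistic on column j of K:
    similarity of sample j to same-class samples vs. other-class samples."""
    same = y == y[j]
    same[j] = False                      # exclude the self-similarity K[j, j]
    other = y != y[j]
    a, b = K[same, j], K[other, j]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)) + 1e-12
    return (a.mean() - b.mean()) / se


def problem_separability(X, y, gamma=1.0):
    """Assumed aggregate: mean per-sample separability over all kernel columns."""
    K = rbf_kernel(X, gamma)
    return float(np.mean([sample_separability(K, y, j) for j in range(len(y))]))


def kmgs_rank(X, y, gamma=1.0):
    """KMGS-style ranking (assumption: score each gene by its single-gene kernel)."""
    scores = np.array([problem_separability(X[:, [g]], y, gamma)
                       for g in range(X.shape[1])])
    return np.argsort(-scores), scores   # gene indices, most separable first


def kmsfs(X, y, k, gamma=1.0):
    """KMSFS-style greedy forward selection of k genes maximizing problem separability."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda g: problem_separability(X[:, selected + [g]], y, gamma))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Synthetic toy data: 20 samples, 5 genes, only gene 0 differs between classes.
    rng = np.random.default_rng(0)
    y = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 5))
    X[:, 0] += 4.0 * y
    order, scores = kmgs_rank(X, y)
    print("KMGS-style ranking:", order)
    print("KMSFS-style selection:", kmsfs(X, y, k=2))
```

The sketch excludes each sample's self-similarity because K[j, j] is always 1 for a Gaussian kernel and would inflate the same-class mean. Each class needs at least three samples so that the same-class variance is defined.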


PROVIDER: S-EPMC3857896 | BioStudies | 2013-01-01

REPOSITORIES: biostudies

Similar Datasets

2011-01-01 | S-EPMC3159769 | BioStudies
2019-01-01 | S-EPMC6781126 | BioStudies
| S-EPMC3310257 | BioStudies
2017-01-01 | S-EPMC5526982 | BioStudies
2017-01-01 | S-EPMC5410141 | BioStudies
2014-01-01 | S-EPMC4208248 | BioStudies
1000-01-01 | S-EPMC1877816 | BioStudies
2015-01-01 | S-EPMC4522082 | BioStudies
2016-02-02 | GSE77434 | GEO
2017-01-01 | S-EPMC5763304 | BioStudies