Dataset Information

Optimized application of penalized regression methods to diverse genomic data.


ABSTRACT:

Motivation

Penalized regression methods have been adopted widely for high-dimensional feature selection and prediction in many bioinformatic and biostatistical contexts. While their theoretical properties are well understood, specific methodology for their optimal application to genomic data has not been determined.

Results

Through simulation of contrasting scenarios of correlated high-dimensional survival data, we compared the LASSO, Ridge and Elastic Net penalties for prediction and variable selection. We found that a 2D tuning of the Elastic Net penalties was necessary to avoid mimicking the performance of LASSO or Ridge regression. Furthermore, we found that in a simulated scenario favoring the LASSO penalty, a univariate pre-filter made the Elastic Net behave more like Ridge regression, which was detrimental to prediction performance. We demonstrate the real-life application of these methods to predicting the survival of cancer patients from microarray data, and to classification of obese and lean individuals from metagenomic data. Based on these results, we provide an optimized set of guidelines for the application of penalized regression for reproducible class comparison and prediction with genomic data.
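
To make the idea of 2D tuning concrete, the following is a minimal sketch in plain Python and not the pensim implementation. It assumes an ordinary squared-error linear model instead of the survival models studied in the paper, an Elastic Net objective parameterized by an L1 weight `l1` and an L2 weight `l2`, and a simple coordinate-descent solver. All function names (`fit_elastic_net`, `cv_error`, `tune_2d`), the grids and the toy data are hypothetical choices made for illustration. The point it shows is that both penalties are searched jointly by cross-validation: fixing `l2` at 0 reduces the search to the LASSO, and fixing `l1` at 0 reduces it to Ridge regression.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, l1, l2, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2.

    No intercept is fitted, so the data are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + l2
            if denom == 0.0:
                continue  # all-zero column with no ridge penalty: skip it
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, l1) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_error(X, y, l1, l2, k=5):
    """Mean squared prediction error over k interleaved folds."""
    n = len(X)
    total = 0.0
    for f in range(k):
        test_idx = list(range(f, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        beta = fit_elastic_net([X[i] for i in train_idx],
                               [y[i] for i in train_idx], l1, l2)
        for i in test_idx:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def tune_2d(X, y, l1_grid, l2_grid):
    """Search every (l1, l2) pair jointly instead of fixing one penalty."""
    scores = {(l1, l2): cv_error(X, y, l1, l2)
              for l1 in l1_grid for l2 in l2_grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): 30 samples, 8 features, only the first two informative.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

best, scores = tune_2d(X, y, l1_grid=[0.0, 0.05, 0.5], l2_grid=[0.0, 0.1, 1.0])
print("selected (l1, l2):", best)
```

The grids here are deliberately tiny. In practice the search would cover a finer range of both penalties, and the cross-validation used for tuning would be nested inside a separate loop for estimating prediction accuracy.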

Availability and implementation

A parallelized implementation of the methods presented for regression and for simulation of synthetic data is provided as the pensim R package, available at http://cran.r-project.org/web/packages/pensim/index.html.

Contact

chuttenh@hsph.harvard.edu; juris@ai.utoronto.ca

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Waldron L 

PROVIDER: S-EPMC3232376 | biostudies-literature | 2011 Dec

REPOSITORIES: biostudies-literature

Publications

Optimized application of penalized regression methods to diverse genomic data.

Waldron L, Pintilie M, Tsao MS, Shepherd FA, Huttenhower C, Jurisica I

Bioinformatics (Oxford, England). 2011 Dec 1;(24)

Similar Datasets

S-EPMC4007772 | biostudies-literature
2015-08-04 | GSE71669 | GEO
2015-08-04 | E-GEOD-71669 | biostudies-arrayexpress
2015-08-04 | GSE71576 | GEO
2015-08-04 | GSE71666 | GEO
2015-08-04 | E-GEOD-71576 | biostudies-arrayexpress
2015-08-04 | E-GEOD-71666 | biostudies-arrayexpress
S-EPMC4672920 | biostudies-literature
S-EPMC3338337 | biostudies-literature
S-EPMC5345248 | biostudies-literature