
Dataset Information


Compendiums of cancer transcriptomes for machine learning applications.


ABSTRACT: A vast number of transcriptome profiles exist in the form of microarray data. The challenge is that they are processed using diverse platforms and preprocessing tools, so cross-dataset analyses require considerable time and informatics expertise. A single, integrated data source would facilitate data reuse for the discovery, analysis, and validation of biomarker-based clinical strategies. Here, we present merged microarray-acquired datasets (MMDs) across 11 major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Using machine learning algorithms, we show that diagnostic models trained on MMDs can be directly applied to RNA-seq-acquired TCGA data with high classification accuracy. Machine-learning-optimized MMDs further help reveal the immune landscape across various carcinomas, which is critically needed in disease management and clinical interventions. This unified data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.
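
The cross-platform idea in the abstract (train on microarray data, apply to RNA-seq data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the helper names (`zscore_per_gene`, `fit_centroids`, `predict`, `simulate_platform`) are hypothetical, per-gene z-scoring within each platform stands in for whatever normalization the MMDs actually use, and a nearest-centroid rule stands in for the machine learning models.

```python
import numpy as np

def zscore_per_gene(expr):
    """Standardize each gene (column) to zero mean, unit variance within one platform."""
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (expr - mean) / std

def fit_centroids(expr, labels):
    """Return the class labels and the mean expression profile of each class."""
    classes = np.unique(labels)
    centroids = np.vstack([expr[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(expr, classes, centroids):
    """Assign each sample to the class with the nearest centroid."""
    # Squared Euclidean distance from every sample to every centroid.
    dists = ((expr[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def simulate_platform(rng, n_per_class, n_genes, scale, offset):
    """Synthetic stand-in for one platform (assumption, not real data).

    Label 1 ("tumor") has the first 10 genes shifted upward; `scale` and
    `offset` mimic platform-specific measurement units.
    """
    labels = np.repeat([0, 1], n_per_class)
    signal = rng.normal(size=(2 * n_per_class, n_genes))
    signal[labels == 1, :10] += 2.0
    return signal * scale + offset, labels

rng = np.random.default_rng(0)
# "Microarray-like" training set and "RNA-seq-like" test set on shared genes.
array_x, array_y = simulate_platform(rng, 100, 50, scale=1.0, offset=7.0)
seq_x, seq_y = simulate_platform(rng, 100, 50, scale=3.0, offset=2.0)

# Train on one platform, evaluate on the other after per-platform scaling.
classes, centroids = fit_centroids(zscore_per_gene(array_x), array_y)
pred = predict(zscore_per_gene(seq_x), classes, centroids)
accuracy = float((pred == seq_y).mean())
print(f"cross-platform accuracy: {accuracy:.2f}")
```

Per-platform z-scoring only aligns the two platforms when their class proportions are similar, as they are in this balanced toy setup; real cohorts may need a different harmonization step.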

SUBMITTER: Lim SB 

PROVIDER: S-EPMC6783425 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature


Publications

Compendiums of cancer transcriptomes for machine learning applications.

Lim Su Bin, Tan Swee Jin, Lim Wan-Teck, Lim Chwee Teck

Scientific Data 2019-10-08 1



Similar Datasets

| S-EPMC8498514 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC7416435 | biostudies-literature
2017-02-01 | GSE85033 | GEO
| S-EPMC6825274 | biostudies-literature
| S-EPMC3223741 | biostudies-literature
| S-EPMC7083992 | biostudies-literature
| S-EPMC7090299 | biostudies-literature