Bulk RNA-Seq gene expression profiles from 13 TCGA cancer types and normal tissues for tumor classification
ABSTRACT: This dataset includes bulk RNA-Seq gene expression profiles of 6310 samples from 13 cancer types and normal tissues, obtained from The Cancer Genome Atlas (TCGA) and processed using a standardized pipeline. The raw count matrix, initially containing 60,660 genes, was filtered to retain only genes with valid ENTREZ gene identifiers, matched via the org.Hs.eg.db annotation package. Genes with zero variance across samples were removed. Expression values were normalized to log2 Counts Per Million (CPM) using the edgeR::cpm() function with log transformation enabled. To ensure all values were non-negative and suitable for downstream modeling, a global shift was applied by adding the absolute value of the matrix minimum to every entry. Subsequently, Gene Ontology (GO) analysis was performed across all three categories (Biological Process, Molecular Function, Cellular Component) using the topGO package. Significant genes were selected with Benjamini–Hochberg p-value correction and validated using both Fisher’s exact and Kolmogorov–Smirnov tests. The resulting expression matrix consists of 18,564 functionally relevant genes across 6310 samples and forms the basis for clustering, classification, and metric evaluation within hybrid modeling frameworks for cancer diagnostics.
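The filtering, normalization, and shifting steps described in the abstract can be sketched as follows. This is a minimal NumPy illustration on a toy matrix, not the submitters' pipeline. The function names, the toy counts, and the prior-count handling are assumptions: the log2-CPM formula is only loosely modeled on a library-size-scaled prior count and is not guaranteed to reproduce the values returned by edgeR::cpm().

```python
import numpy as np


def drop_zero_variance(counts):
    """Remove genes (rows) whose counts do not vary across samples (columns)."""
    keep = counts.var(axis=1) > 0
    return counts[keep], keep


def log2_cpm(counts, prior_count=2.0):
    """Approximate log2 counts-per-million (assumed formula, not edgeR itself).

    The prior count is scaled by each sample's relative library size so that
    zero counts map to the same log-CPM value in every sample.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    scaled_prior = prior_count * lib_sizes / lib_sizes.mean()
    return np.log2((counts + scaled_prior) / (lib_sizes + 2.0 * scaled_prior) * 1e6)


def global_shift(matrix):
    """Add the absolute value of the global minimum to every entry."""
    return matrix + abs(matrix.min())


# Toy matrix: 4 genes (rows) x 3 samples (columns); values are made up.
raw_counts = np.array(
    [
        [0, 3, 1],
        [5_000_000, 4_000_000, 6_000_000],
        [0, 0, 0],  # zero variance, removed by the filter
        [120, 80, 200],
    ]
)

filtered, keep = drop_zero_variance(raw_counts)
log_cpm = log2_cpm(filtered)      # contains negative values for very low counts
shifted = global_shift(log_cpm)   # minimum becomes 0 when the original minimum is negative

print(filtered.shape, float(log_cpm.min()), float(shifted.min()))
```

Because the shift is a single constant taken from the whole matrix, differences between any two expression values are unchanged; only the origin moves.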
ORGANISM(S): Homo sapiens
PROVIDER: GSE304485 | GEO | 2026/02/12
REPOSITORIES: GEO