WBT-DC Pipeline: Whole Blood Transcriptomics data-based Disease Classification
ABSTRACT: Machine learning combined with cell and tissue transcriptomics data has been widely used for disease classification. However, obtaining transcriptomics data from human tissues requires invasive procedures, which makes widespread clinical application challenging. In this study, we developed WBT-DC, a Whole Blood Transcriptomics (WBT) data-based Disease Classification pipeline. We used gene rank-based methods for feature extraction to mitigate issues associated with batch effects and gene noise. We applied the ensemble machine learning model Random Forest and performed cross-validation and model tuning. We evaluated our methods on datasets for four diseases: Crohn's disease (CD), ulcerative colitis (UC), amyotrophic lateral sclerosis (ALS), and rheumatoid arthritis (RA), using data from seven independent cohorts and 2,452 participants across RNA-sequencing and microarray platforms. Our machine learning-based WBT-DC pipeline demonstrated robust performance across disease datasets and transcriptomics platforms, establishing it as a valuable non-invasive tool for future disease classification and prediction.
ORGANISM(S): Homo sapiens
PROVIDER: GSE282218 | GEO | 2026/05/20
REPOSITORIES: GEO
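ILLUSTRATIVE SKETCH: The abstract describes rank-based feature extraction followed by a Random Forest with cross-validation, but does not give the exact ranking scheme or hyperparameters. The Python sketch below shows one plausible reading under stated assumptions: each sample's expression values are replaced by within-sample ranks, and the ranked matrix is passed to scikit-learn's RandomForestClassifier with stratified 5-fold cross-validation. The function names (rank_transform, make_toy_data), the synthetic data, and all parameter values are hypothetical and are not taken from the WBT-DC pipeline or from GSE282218.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def rank_transform(expr):
    """Replace each sample's values with within-sample ranks scaled to (0, 1].

    expr: array of shape (n_samples, n_genes). Ties are broken by gene order.
    """
    expr = np.asarray(expr, dtype=float)
    # argsort of argsort yields 0-based ordinal ranks along each row
    order = expr.argsort(axis=1, kind="stable")
    ranks = order.argsort(axis=1, kind="stable") + 1
    return ranks / expr.shape[1]


def make_toy_data(n_samples=60, n_genes=50, seed=0):
    """Synthetic stand-in for an expression matrix (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_samples // 2)
    expr = rng.lognormal(mean=2.0, sigma=1.0, size=(n_samples, n_genes))
    expr[labels == 1, :5] *= 4.0  # five genes elevated in the "case" group
    # per-sample scale factor, mimicking a crude batch or platform effect
    expr *= rng.uniform(0.5, 20.0, size=(n_samples, 1))
    return expr, labels


expr, labels = make_toy_data()
features = rank_transform(expr)

# Hyperparameters here are placeholders; the abstract mentions tuning
# but does not report the chosen values.
model = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, features, labels, cv=cv, scoring="roc_auc")
print("Mean cross-validated ROC AUC:", scores.mean())
```

Within-sample ranks do not change when a sample's values are multiplied by a positive constant, which is one reason rank-based features can be less sensitive to platform-level scaling differences between RNA-sequencing and microarray data. Real batch effects are often more complex than a single scale factor, so this property is only a partial illustration.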