Integrative multi-omic analyses identify major axes of heterogeneity in chronic obstructive pulmonary disease and uncover their molecular contributors - ECLIPSE replication cohort
ABSTRACT: Chronic Obstructive Pulmonary Disease (COPD) is a complex, heterogeneous disease. Traditional subtyping methods generally focus on either the clinical manifestations or the molecular endotypes of the disease, leading to classifications that only partially reflect disease heterogeneity. Here, we introduce a variational autoencoder-based subtyping pipeline that jointly embeds clinical and gene expression data into a single subject-level representation. We evaluate the framework in the COPDGene study, a large study of current and former smokers with and without COPD. Prediction experiments show that the embeddings have predictive accuracy comparable to or better than other unsupervised embedding approaches. Using trajectory learning approaches, we identify five well-separated subtypes with distinct clinical phenotypes, expression signatures, and longitudinal outcomes. Finally, we show that our findings generalize to an external validation cohort. Overall, our approach enables a transition from isolated phenotypic or molecular subtyping toward an integrated and clinically meaningful understanding of COPD heterogeneity. This GEO dataset contains array-based blood gene expression and phenotype data from the ECLIPSE study, which served as the replication population for this study. COPDGene data are available through dbGaP.
ORGANISM(S): Homo sapiens
PROVIDER: GSE324136 | GEO | 2026/03/31
REPOSITORIES: GEO