Project description: One of the most common smoking-related diseases, chronic obstructive pulmonary disease (COPD), results from a dysregulated, multi-tissue inflammatory response to cigarette smoke. We hypothesized that systemic inflammatory signals in genome-wide blood gene expression can identify clinically important COPD-related disease subtypes, and we leveraged pre-existing gene interaction networks to guide unsupervised clustering of blood microarray expression data. Using network-informed non-negative matrix factorization, we analyzed genome-wide blood gene expression from 229 former smokers in the ECLIPSE Study, and we identified novel, clinically relevant molecular subtypes of COPD. These network-informed clusters were more stable and more strongly associated with measures of lung structure and function than clusters derived from a network-naïve approach, and they were associated with subtype-specific enrichment for inflammatory and protein catabolic pathways. These clusters were successfully reproduced in an independent sample of 135 smokers from the COPDGene Study. Briefly, gene expression was derived from whole blood samples in ECLIPSE subjects and from peripheral blood mononuclear cells (PBMCs) in COPDGene subjects. Gene expression profiling was performed using the Affymetrix Human U133 Plus2 array. Gene expression data were log-transformed, and background correction and normalization were performed for the merged ECLIPSE and COPDGene samples using robust multi-array averaging and quantile normalization as implemented in the affy Bioconductor package[27]. Of the 136 COPDGene subjects reported in a previous publication[13], one self-reported African-American subject was removed from the analysis, which was conducted on the remaining 135 non-Hispanic white subjects. To identify a set of genes associated with COPD, we performed differential expression analysis for 38,519 probesets in ECLIPSE that passed quality control measures.
Normalized probeset intensities were related to measures of two primary dimensions of pulmonary impairment in COPD: airway obstruction, as indicated by two measures of spirometric lung function (FEV1 (% of predicted) and FEV1/FVC), and lung parenchymal destruction, i.e., emphysema (as quantified by the percentage of low attenuation area less than -950 Hounsfield units on lung computed tomography, %LAA-950). The analysis was conducted using the limma Bioconductor package, and the false discovery rate was controlled at 5%. The following covariates were included in the differential expression analysis: age, pack-years of cigarette smoke exposure, and gender. After standardizing gene expression data from the 229 ECLIPSE subjects by the variance of each probeset, we applied non-negative matrix factorization (NMF)[29] and network-based stratification (NBS)[6] to identify meta-patients (i.e., subtypes or subject clusters) and meta-genes (i.e., representative subtype expression profiles). Cross-sectional study of smokers. 229 subjects from the ECLIPSE Study were analyzed in the model discovery phase. 135 subjects from the COPDGene Study (GSE42057) were used for replication. Please note that the entire data set for a total of 364 samples, including the re-analyzed samples, is provided in the *364samples.txt files.
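The clustering step described above (standardize each probeset, then factorize the expression matrix into meta-genes and meta-patients) can be illustrated with a minimal sketch. This is not the study's pipeline: it uses plain NMF with multiplicative updates and omits the network smoothing that NBS adds, it reads "standardizing by the variance" as dividing each probeset by its standard deviation, and it runs on synthetic data. The function names, the rank k = 2, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def standardize_by_sd(expr):
    """Divide each probeset (row) by its standard deviation across subjects.
    Assumption: one reading of 'standardizing by the variance'."""
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant probesets unchanged
    return expr / sd

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF, V ~ W @ H, via multiplicative updates (Frobenius loss).
    W: probesets x k (meta-genes); H: k x subjects (meta-patient weights)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each meta-gene has unit length; the product W @ H is unchanged.
    scale = np.linalg.norm(W, axis=0)
    return W / scale, H * scale[:, None]

def assign_clusters(H):
    """Assign each subject to the meta-gene with the largest weight."""
    return H.argmax(axis=0)

# Synthetic, non-negative toy data: 40 probesets x 30 subjects, two planted groups.
rng = np.random.default_rng(42)
n_per_group = 15
expr = np.full((40, 2 * n_per_group), 0.5)
expr[:20, :n_per_group] = 5.0   # probesets 0-19 are high in group A
expr[20:, n_per_group:] = 5.0   # probesets 20-39 are high in group B
expr += rng.uniform(0.0, 0.2, size=expr.shape)

V = standardize_by_sd(expr)
W, H = nmf(V, k=2)
labels = assign_clusters(H)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Here the columns of W play the role of meta-genes and the rows of H give each subject's weight on them. In a network-informed variant, the expression profiles would additionally be smoothed over a gene interaction network before or during factorization.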
Project description: Multiple gene expression studies of COPD have been performed separately in peripheral blood, lung, and airway tissues. We performed RNA-sequencing gene expression profiling of large-airway epithelium, alveolar macrophage, and peripheral blood samples from the same set of COPD cases and controls from the COPDGene study who underwent bronchoscopy at a single center. Using statistical and gene set enrichment approaches, we sought to improve the understanding of COPD by studying gene sets and pathways across these tissues, beyond the individual genomic determinants.
Project description:<p>Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of death in the United States and the only leading cause of death that is steadily increasing in frequency. This project will establish a racially diverse cohort that is sufficiently large and appropriately designed for genome-wide association analysis of COPD. A total of 10,000 subjects will be recruited, including control smokers, definite COPD cases (GOLD Stage 2 to 4), and subjects not included in either group (GOLD 1 or GOLD-Unclassified). This cohort will be used for cross-sectional analysis, although long-term longitudinal follow-up will be a future goal. The primary focus of the study will be genome-wide association analysis to identify the genetic risk factors that determine susceptibility for COPD and COPD-related phenotypes. Detailed phenotyping of both cases and controls, including chest CT scan assessment of emphysema and airway disease, will allow identification of genetic determinants for the heterogeneous components of the COPD syndrome. <b>The hypotheses to be studied are: 1) Precise phenotypic characterization of COPD subjects using computed tomography, as well as clinical and physiological measures, will provide data that will enable the broad COPD syndrome to be decomposed into clinically significant subtypes. 2) Genome-wide association studies will identify genetic determinants for COPD susceptibility that will provide insight into clinically relevant COPD subtypes. 3) Distinct genetic determinants influence the development of emphysema and airway disease.</b> The initial phase of genome-wide association analysis included 500 COPD cases and 500 control subjects (all non-Hispanic White) genotyped with the Illumina Omni-1 chip. The second phase genotyped the entire study cohort using the Illumina Omni-Express chip. 
Unique aspects of the study include: 1) Inclusion of large numbers of African American subjects (approximately 1/3 of the cohort); 2) Obtaining chest CT scans (including inspiratory and expiratory images); and 3) Inclusion of the full range of disease severity.</p> <p><b>The COPDGene_v6 Cohort is utilized in the following dbGaP sub-studies.</b> To view genotypes, other molecular data, and derived variables collected in these sub-studies, please click on the sub-studies below or in the "Sub-studies" box located on the right-hand side of this top-level study page, phs000179 COPDGene_v6 Cohort. <ul> <li><a href="./study.cgi?study_id=phs000296">phs000296</a> ESP LungGO COPDGene</li> <li><a href="./study.cgi?study_id=phs000765">phs000765</a> COPDGene_Geno</li> </ul> </p>
Project description:<p>Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of death in the United States and the only leading cause of death that is steadily increasing in frequency. This project will establish a racially diverse cohort that is sufficiently large and appropriately designed for genome-wide association analysis of COPD. A total of 10,000 subjects will be recruited, including control smokers, definite COPD cases (GOLD Stage 2 to 4), and subjects not included in either group (GOLD 1 or GOLD-Unclassified). This cohort will be used for cross-sectional analysis, although long-term longitudinal follow-up will be a future goal. The primary focus of the study will be genome-wide association analysis to identify the genetic risk factors that determine susceptibility for COPD and COPD-related phenotypes. Detailed phenotyping of both cases and controls, including chest CT scan assessment of emphysema and airway disease, will allow identification of genetic determinants for the heterogeneous components of the COPD syndrome. <b>The hypotheses to be studied are: 1) Precise phenotypic characterization of COPD subjects using computed tomography, as well as clinical and physiological measures, will provide data that will enable the broad COPD syndrome to be decomposed into clinically significant subtypes. 2) Genome-wide association studies will identify genetic determinants for COPD susceptibility that will provide insight into clinically relevant COPD subtypes. 3) Distinct genetic determinants influence the development of emphysema and airway disease.</b> The initial phase of genome-wide association analysis included 500 COPD cases and 500 control subjects (all non-Hispanic White) genotyped with the Illumina Omni-1 chip, but plans are being developed to obtain genome-wide association analysis on the entire study cohort (using the Illumina Omni-Express chip). 
Unique aspects of the study include: 1) Inclusion of large numbers of African American subjects (approximately 1/3 of the cohort); 2) Obtaining chest CT scans (including inspiratory and expiratory images); and 3) Inclusion of the full range of disease severity.</p> <p><b>The COPDGene Cohort is utilized in the following dbGaP sub-study.</b> To view genotypes, other molecular data, and derived variables collected in this sub-study, please click on the sub-study below or in the "Sub-studies" box located on the right-hand side of this top-level study page, phs000179 COPDGene Cohort. <ul> <li><a href="./study.cgi?study_id=phs000296">phs000296</a> ESP LungGO COPDGene</li> </ul> </p>