Project description:Background: The National Birth Defects Prevention Study (NBDPS) contains a wealth of information on affected and unaffected family triads, and thus provides numerous opportunities to study gene-environment interactions (G×E) in the etiology of birth defect outcomes. Depending on the research objective, several analytic options exist to estimate G×E effects that use varying combinations of individuals drawn from available triads. Methods: In this study, we discuss important considerations in the collection of genetic data and environmental exposures. Results: We will also present several population- and family-based approaches that can be applied to data from the NBDPS including case-control, case-only, family-based trio, and maternal versus fetal effects. For each, we describe the data requirements, applicable statistical methods, advantages, and disadvantages. Conclusion: A range of approaches can be used to evaluate potentially important G×E effects in the NBDPS. Investigators should be aware of the limitations inherent to each approach when choosing a study design and interpreting results.
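As a rough illustration of how the case-control and case-only approaches differ, the sketch below estimates a multiplicative G×E interaction both ways from 2x2 genotype-by-exposure tables. The counts are hypothetical and are not NBDPS data; the function name `odds_ratio` is ours. The case-only estimate is valid only under the assumption that genotype and exposure are independent in the source population.

```python
import math

def odds_ratio(a, b, c, d):
    """Return (odds ratio, SE of log odds ratio) for the 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), se

# Hypothetical counts (assumption, for illustration only).
# Order: (G=1,E=1), (G=1,E=0), (G=0,E=1), (G=0,E=0)
cases = (40, 60, 50, 150)
controls = (30, 70, 60, 140)

or_cases, se_cases = odds_ratio(*cases)
or_controls, se_controls = odds_ratio(*controls)

# Case-only estimate: the G-E odds ratio among cases alone.
case_only_or = or_cases
case_only_se = se_cases

# Case-control estimate: ratio of the G-E odds ratios (cases / controls).
case_control_or = or_cases / or_controls
case_control_se = math.sqrt(se_cases**2 + se_controls**2)

print(f"case-only    OR={case_only_or:.2f}  SE(log OR)={case_only_se:.3f}")
print(f"case-control OR={case_control_or:.2f}  SE(log OR)={case_control_se:.3f}")
```

In this constructed example genotype and exposure are unassociated among controls, so the two estimates agree, and the case-only standard error is smaller because it does not carry the sampling variability of the controls.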
Project description:Alzheimer's disease (AD) is the leading cause of dementia in the United States and afflicted more than 5.7 million Americans in 2018. Therapeutic options remain limited to symptom-targeting treatments, and no drugs have been approved for the modification or reversal of the disease itself. Risk factors for AD include aging, female sex, and carrying an APOE4 genotype. These risk factors have been extensively examined in the literature, while less attention has been paid to modifiable risk factors, including lifestyle, and environmental risk factors such as exposures to air pollution and pesticides. This review highlights the most recent data on risk factors in AD and identifies gene by environment interactions that have been investigated. It also provides a suggested framework for a personalized therapeutic approach to AD, by combining genetic, environmental and lifestyle risk factors. Understanding modifiable risk factors and their interaction with non-modifiable factors (age, susceptibility alleles, and sex) is paramount for designing personalized therapeutic interventions.
Project description:For analysis of the main effects of SNPs, meta-analysis of summary results from individual studies has been shown to provide results comparable to those of "mega-analysis", which jointly analyzes the pooled participant data from the available studies. This fact revolutionized the genetic analysis of complex traits through large GWAS consortia. Investigations of gene-environment (G×E) interactions are on the rise since they can potentially explain a part of the missing heritability and identify individuals at high risk for disease. However, for analysis of gene-environment interactions, it is not known whether these methods yield comparable results. In this empirical study, we report that the results from both methods were largely consistent for all four tests: the standard 1 degree of freedom (df) test of main effect only, the 1 df test of the main effect (in the presence of interaction effect), the 1 df test of the interaction effect, and the joint 2 df test of main and interaction effects. They provided similar effect size and standard error estimates, leading to comparable P-values. The genomic inflation factors and the numbers of SNPs meeting various significance thresholds were also comparable between the two approaches. Mega-analysis is not always feasible, especially in very large and diverse consortia, since pooling of raw data may be limited by the terms of the informed consent. Our study illustrates that meta-analysis can also be an effective approach for identifying interactions. To our knowledge, this is the first report investigating meta- versus mega-analyses for interactions.
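To make the tests named above concrete, here is a minimal sketch of an inverse-variance-weighted meta-analysis of per-study interaction estimates, followed by a 1 df Wald test and a joint 2 df Wald test. Inverse-variance weighting is assumed here as a common choice; the abstract does not state which weighting was used. All numeric inputs and function names are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Inverse-variance-weighted (fixed-effect) combination of per-study estimates."""
    weights = [1.0 / se**2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se

def wald_p_1df(beta, se):
    """Two-sided P-value of a 1 df Wald test (standard normal reference)."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def joint_wald_2df(b_g, b_gxe, var_g, var_gxe, cov):
    """Joint 2 df Wald test of main and interaction effects.

    Computes b' V^{-1} b for the 2x2 covariance matrix V; the chi-square
    survival function with 2 df is exp(-x / 2).
    """
    det = var_g * var_gxe - cov**2
    chi2 = (b_g**2 * var_gxe - 2.0 * b_g * b_gxe * cov + b_gxe**2 * var_g) / det
    return chi2, math.exp(-chi2 / 2.0)

# Hypothetical per-study interaction estimates and standard errors.
study_betas = [0.10, 0.14, 0.08]
study_ses = [0.05, 0.07, 0.04]

beta_meta, se_meta = ivw_meta(study_betas, study_ses)
p_interaction = wald_p_1df(beta_meta, se_meta)

# Hypothetical main-effect and interaction estimates with their (co)variances.
chi2_joint, p_joint = joint_wald_2df(0.2, 0.1, 0.01, 0.0025, 0.0)

print(f"meta beta={beta_meta:.4f} se={se_meta:.4f} p={p_interaction:.2e}")
print(f"joint chi2={chi2_joint:.2f} p={p_joint:.4f}")
```

A mega-analysis would instead fit one regression to the pooled individual-level data; the sketch covers only the summary-statistic side of the comparison.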
Project description:The analysis of gene-environment interaction (G×E) may hold the key for further understanding the etiology of many complex traits. The current availability of high-volume genetic data, the wide range in types of environmental data that can be measured, and the formation of consortiums of multiple studies provide new opportunities to identify G×E but also new analytical challenges. In this article, we summarize several statistical approaches that can be used to test for G×E in a genome-wide association study. These include traditional models of G×E in a case-control or quantitative trait study as well as alternative approaches that can provide substantially greater power. The latest methods for analyzing G×E with gene sets and with data in a consortium setting are summarized, as are issues that arise due to the complexity of environmental data. We provide some speculation on why detecting G×E in a genome-wide association study has thus far been difficult. We conclude with a description of software programs that can be used to implement most of the methods described in the paper.
Project description:Background: Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine. Methods: In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. Results: In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated on independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Conclusion: Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.
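The distinction between a location shift and a scale change can be illustrated with a toy example. The sketch below uses median centering, chosen here only because it is simple; the abstract does not name its three normalization methods, so this is not claimed to be one of them. The intensity values are hypothetical.

```python
from statistics import median

def median_center(array):
    """Subtract the array's median from every value (corrects location only)."""
    m = median(array)
    return [x - m for x in array]

# Hypothetical log-intensities for one array, plus two distorted copies.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [x + 2.0 for x in signal]  # handling effect as a location shift
scaled = [x * 2.0 for x in signal]   # handling effect as a scale change

reference = median_center(signal)
shift_removed = median_center(shifted) == reference  # shift is fully corrected
scale_removed = median_center(scaled) == reference   # scale change is not

print("location shift removed:", shift_removed)
print("scale change removed:  ", scale_removed)
```

A method that corrects only location leaves scale distortions in place, which is one way the pattern of handling effects can change how much a given normalization helps.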
Project description:The underlying pathogenesis of asthma, one of the most common chronic diseases of childhood, is not fully understood. There is a well-documented heritable component to this disease and environmental factors associated with a Westernised lifestyle have also been implicated; recent studies suggest gene-environment interactions are important in the development of this disease. In the absence of a previous review in children, the present report presents the accumulating evidence for gene-environment interactions in asthma pathogenesis. Studies of these interactions in different populations have yielded both expected and unexpected results. This is a new and rapidly developing field where there are currently many more questions than answers.
Project description:Background: The identification of gene-gene and gene-environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of possible combinations. Parametric regression models are suitable to look for prespecified interactions. Nonparametric models such as tree ensemble models, with the ability to detect any unspecified interaction, have previously been difficult to interpret. However, with the development of methods for model explainability, it is now possible to interpret tree ensemble models efficiently and with a strong theoretical basis. Results: We propose a tree ensemble- and SHAP-based method for identifying as well as interpreting potential gene-gene and gene-environment interactions on large-scale biobank data. A set of independent cross-validation runs are used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype. The results are in line with previous research on obesity as we identify top SNPs previously associated with obesity. We further demonstrate how to interpret and visualize interaction candidates. Conclusions: The new method identifies interaction candidates otherwise not detected with parametric regression models. However, further research is needed to evaluate the uncertainties of these candidates. The method can be applied to large-scale biobanks with high-dimensional data.
Project description:Background: We address the problem of integratively analyzing multiple gene expression microarray datasets in order to reconstruct gene-gene interaction networks. Integrating multiple datasets is generally believed to provide increased statistical power and to lead to a better characterization of the system under study. However, the presence of systematic variation across different studies makes network reverse-engineering tasks particularly challenging. We contrast two approaches that have been frequently used in the literature for addressing systematic biases: meta-analysis methods, which first calculate appropriate statistics on single datasets and then summarize them, and data-merging methods, which directly analyze the pooled data after removing any biases. This comparative evaluation is performed on both synthetic and real data, the latter consisting of two manually curated microarray compendia comprising several E. coli and yeast studies, respectively. Furthermore, the reconstruction of the regulatory network of the transcription factor Ikaros in human Peripheral Blood Mononuclear Cells (PBMCs) is presented as a case study. Results: The meta-analysis and data-merging methods included in our experiments provided comparable performance on both synthetic and real data. Furthermore, both approaches outperformed (a) the naïve solution of merging data together ignoring possible biases, and (b) the results that are expected when only one dataset out of the available ones is analyzed in isolation. Using correlation statistics proved to be more effective than using p-values for correctly ranking candidate interactions. The results from the PBMC case study indicate that the findings of the present study generalize to different types of network reconstruction algorithms. Conclusions: Ignoring the systematic variations that differentiate heterogeneous studies can produce results that are statistically indistinguishable from random guessing. Meta-analysis and data-merging methods have proved equally effective in addressing this issue, and thus researchers may safely select the approach that best suits their specific application.
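The contrast between the two strategies and naive pooling can be sketched on a toy pair of genes measured in two studies. Here the meta-analysis step combines per-study Pearson correlations through a Fisher z-transform, and the data-merging step removes a per-study mean before pooling; both are common choices assumed for illustration, not necessarily the specific methods evaluated in the abstract. All values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center(values):
    """Remove a study-specific offset by subtracting the study mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def fisher_meta(rs, ns):
    """Combine correlations on the Fisher z scale with weights n - 3."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    return math.tanh(sum(w * z for w, z in zip(ws, zs)) / sum(ws))

# Hypothetical expression of gene A (x) and gene B (y) in two studies.
# Study 2 carries a systematic offset of +10 on both genes.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [5.0, 4.0, 4.0, 2.0, 1.0]
x2 = [v + 10.0 for v in x1]
y2 = [v + 10.0 for v in y1]

r_naive = pearson(x1 + x2, y1 + y2)  # pooling while ignoring the offset
r_meta = fisher_meta([pearson(x1, y1), pearson(x2, y2)], [len(x1), len(x2)])
r_merged = pearson(center(x1) + center(x2), center(y1) + center(y2))

print(f"naive pooling: {r_naive:+.3f}")
print(f"meta-analysis: {r_meta:+.3f}")
print(f"data merging:  {r_merged:+.3f}")
```

In this constructed case the two genes are negatively correlated within each study, yet naive pooling reports a positive correlation because the between-study offset dominates; the meta-analysis and the bias-corrected merge agree with each other.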
Project description:Genetic association analyses often involve data from multiple potentially-heterogeneous subgroups. The expected amount of heterogeneity can vary from modest (e.g. a typical meta-analysis), to large (e.g. a strong gene-environment interaction). However, existing statistical tools are limited in their ability to address such heterogeneity. Indeed, most genetic association meta-analyses use a "fixed effects" analysis, which assumes no heterogeneity. Here we develop and apply Bayesian association methods to address this problem. These methods are easy to apply (in the simplest case, requiring only a point estimate for the genetic effect, and its standard error, from each subgroup), and effectively include standard frequentist meta-analysis methods, including the usual "fixed effects" analysis, as special cases. We apply these tools to two large genetic association studies: one a meta-analysis of genome-wide association studies from the Global Lipids consortium, and the second a cross-population analysis for expression quantitative trait loci (eQTLs). In the Global Lipids data we find, perhaps surprisingly, that effects are generally quite homogeneous across studies. In the eQTL study we find that eQTLs are generally shared among different continental groups, and discuss consequences of this for study design.
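To show how little input such an analysis can need, the sketch below computes a Bayes factor for the simplest no-heterogeneity case from per-subgroup point estimates and standard errors alone. It uses a normal-prior approximate Bayes factor in the style of Wakefield, substituted here for illustration; it is not the method developed in the abstract, and the prior standard deviation and all inputs are hypothetical.

```python
import math

def log10_bf_fixed_effect(betas, ses, prior_sd):
    """log10 Bayes factor (alternative vs null) assuming one shared effect.

    Assumptions: the combined estimate is normal with variance v, and the
    shared effect has a N(0, prior_sd^2) prior under the alternative.
    """
    weights = [1.0 / se**2 for se in ses]
    v = 1.0 / sum(weights)                              # variance of combined estimate
    beta = v * sum(w * b for w, b in zip(weights, betas))
    w_prior = prior_sd**2
    z2 = beta**2 / v
    shrink = w_prior / (v + w_prior)
    log_bf = 0.5 * math.log(v / (v + w_prior)) + 0.5 * z2 * shrink
    return log_bf / math.log(10.0)

# Hypothetical subgroup estimates: one set with no effect, one with a clear effect.
lbf_null = log10_bf_fixed_effect([0.0, 0.0], [0.05, 0.05], prior_sd=0.2)
lbf_signal = log10_bf_fixed_effect([0.20, 0.25], [0.05, 0.05], prior_sd=0.2)

print(f"log10 BF, null-like data: {lbf_null:.2f}")
print(f"log10 BF, strong signal:  {lbf_signal:.2f}")
```

The combined estimate inside the function is the usual inverse-variance fixed-effects estimate. Allowing for heterogeneity would require an additional prior on between-subgroup variation, which this sketch omits.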
Project description:Background: Gene-environment interaction studies using genome-wide association study data are often underpowered after adjustment for multiple comparisons. Differential gene expression in response to the exposure of interest can capture the most biologically relevant genes at the genome-wide level. Objective: We used differential genome-wide expression profiles from the Epidemiology of Home Allergens and Asthma birth cohort in response to Der f 1 allergen (sensitized vs nonsensitized) to inform a gene-environment study of dust mite exposure and asthma severity. Methods: Polymorphisms in differentially expressed genes were identified in genome-wide association study data from the Childhood Asthma Management Program, a clinical trial in childhood asthmatic patients. Home dust mite allergen levels (<10 or ≥10 μg/g dust) were assessed at baseline, and (≥1) severe asthma exacerbation (emergency department visit or hospitalization for asthma in the first trial year) served as the disease severity outcome. The Genetics of Asthma in Costa Rica Study and a Puerto Rico/Connecticut asthma cohort were used for replication. Results: IL9, IL5, and proteoglycan 2 (PRG2) expression was upregulated in Der f 1-stimulated PBMCs from dust mite-sensitized patients (adjusted P < .04). IL9 polymorphisms (rs11741137, rs2069885, and rs1859430) showed evidence for interaction with dust mite in the Childhood Asthma Management Program (P = .02 to .03), with replication in the Genetics of Asthma in Costa Rica Study (P = .04). Subjects with the dominant genotype for these IL9 polymorphisms were more likely to report a severe asthma exacerbation if exposed to increased dust mite levels. Conclusions: Genome-wide differential gene expression in response to dust mite allergen identified IL9, a biologically plausible gene target that might interact with environmental dust mite to increase severe asthma exacerbations in children.