ABSTRACT: Gene-environment interactions have the potential to shed light on biological processes leading to disease and to improve the accuracy of epidemiological risk models. However, relatively few such interactions have yet been confirmed. In part this is because genetic markers such as tag SNPs are usually studied, rather than the causal variants themselves. Previous work has shown that this leads to substantial loss of power and increased sample size when gene and environment are independent. However, dependence between gene and environment can arise in several ways including mediation, pleiotropy, and confounding, and several examples of gene-environment interaction under gene-environment dependence have recently been published. Here we show that under gene-environment dependence, a statistical interaction can be present between a marker and environment even if there is no interaction between the causal variant and the environment. We give simple conditions under which there is no marker-environment interaction and note that they do not hold in general when there is gene-environment dependence. Furthermore, the gene-environment dependence applies to the causal variant and cannot be assessed from marker data. Gene-gene interactions are susceptible to the same problem if two causal variants are in linkage disequilibrium. In addition to existing concerns about mechanistic interpretations, we suggest further caution in reporting interactions for genetic markers.
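The central claim above can be checked with a small exact calculation. The sketch below is an illustration under assumed settings, not the authors' derivation: it uses a binary causal variant G, a binary marker M in linkage disequilibrium with G, a binary exposure E that depends on G only, and a multiplicative risk model with no G x E term. The function name and all numeric values are hypothetical.

```python
from itertools import product


def marker_env_interaction(p_g, p_m_given_g, p_e_given_g, r0, rr_g, rr_e):
    """Return the multiplicative marker-by-environment interaction ratio.

    p_g          frequency of the binary causal variant G
    p_m_given_g  (P(M=1|G=0), P(M=1|G=1)): LD between marker M and G
    p_e_given_g  (P(E=1|G=0), P(E=1|G=1)): gene-environment dependence
    Disease risk is r0 * rr_g**G * rr_e**E, so there is no G x E
    interaction on the multiplicative scale (an assumption of this sketch).
    """
    risk = {}
    for m, e in product((0, 1), repeat=2):
        num = den = 0.0
        for g in (0, 1):
            pg = p_g if g else 1 - p_g
            pm = p_m_given_g[g] if m else 1 - p_m_given_g[g]
            pe = p_e_given_g[g] if e else 1 - p_e_given_g[g]
            weight = pg * pm * pe  # P(G=g, M=m, E=e)
            num += weight * r0 * rr_g**g * rr_e**e
            den += weight
        risk[(m, e)] = num / den  # P(disease | M=m, E=e)
    # A ratio of 1 means no marker-environment interaction.
    return risk[(1, 1)] * risk[(0, 0)] / (risk[(1, 0)] * risk[(0, 1)])


# Hypothetical parameter values, chosen only for illustration.
independent = marker_env_interaction(0.2, (0.3, 0.9), (0.4, 0.4), 0.01, 3.0, 2.0)
dependent = marker_env_interaction(0.2, (0.3, 0.9), (0.2, 0.6), 0.01, 3.0, 2.0)
print(f"G and E independent: {independent:.3f}")
print(f"G and E dependent:   {dependent:.3f}")
```

When E has the same distribution for both values of G, the ratio is 1. When E depends on G, the ratio moves away from 1 even though the causal variant has no interaction with E in the model.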
Project description:Gene-environment interactions have the potential to shed light on biological processes leading to disease, identify individuals for whom risk factors are most relevant, and improve the accuracy of epidemiological risk models. We review the progress that has been made in investigating gene-environment interactions in the field of breast cancer. Although several large-scale analyses have been carried out, only a few significant interactions have been reported. One of these, an interaction between CASP8-rs1045485 and alcohol consumption, has been replicated, but others have not, including LSP1-rs3817198 and parity, and 1p11.2-rs11249433 and ever being parous. False-positive interactions may arise if the gene and environment are correlated and the causal variant is less frequent than the tag SNP. We conclude that, while much progress has been made in this area, it is still too soon to tell whether gene-environment interactions will fulfil their promise. Before we can make this assessment we will need to replicate (or refute) the reported interactions, identify the causal variants that underlie tag-SNP associations, and validate the next generation of epidemiological risk models.
Project description:Despite current enthusiasm for investigation of gene-gene interactions and gene-environment interactions, the essential issue of how to define and detect gene-environment interactions remains unresolved. In this report, we define gene-environment interactions as a stochastic dependence in the context of the effects of the genetic and environmental risk factors on the cause of phenotypic variation among individuals. We use mutual information, which is widely used in communication and complex system analysis, to measure gene-environment interactions. We investigate how gene-environment interactions generate the large difference in the information measure of gene-environment interactions between the general population and a diseased population, which motivates us to develop mutual information-based statistics for testing gene-environment interactions. We validated the null distribution and calculated the type 1 error rates for the mutual information-based statistics to test gene-environment interactions using extensive simulation studies. We found that the new test statistics were more powerful than the traditional logistic regression under several disease models. Finally, in order to further evaluate the performance of our new method, we applied the mutual information-based statistics to three real examples. Our results showed that P-values for the mutual information-based statistics were much smaller than those obtained by other approaches, including logistic regression models.
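As a rough illustration of the measure discussed above, the following sketch computes the mutual information between genotype and exposure from a table of joint counts. It shows only the plug-in estimate of mutual information, not the test statistics developed in the paper. The function name and the example counts are hypothetical.

```python
import math


def mutual_information(joint_counts):
    """Plug-in mutual information (in nats) from a genotype x exposure table.

    joint_counts[i][j] is the number of individuals with genotype i and
    exposure level j.
    """
    total = sum(sum(row) for row in joint_counts)
    row_totals = [sum(row) for row in joint_counts]
    col_totals = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * math.log(n * total / (row_totals[i] * col_totals[j]))
    return mi


# Hypothetical counts: genotype and exposure independent in the population,
# but associated among cases.
population = [[40, 40], [10, 10]]
cases = [[20, 30], [5, 45]]
mi_population = mutual_information(population)
mi_cases = mutual_information(cases)
print(mi_population, mi_cases)
```

A gap between the two values, larger among cases than in the general population, is the kind of difference the abstract describes as the basis for a test.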
Project description:Widespread multifactor interactions present a significant challenge in determining risk factors of complex diseases. Several combinatorial approaches, such as the multifactor dimensionality reduction (MDR) method, have emerged as a promising tool for better detecting gene-gene (G x G) and gene-environment (G x E) interactions. We recently developed a general combinatorial approach, namely the generalized multifactor dimensionality reduction (GMDR) method, which can entertain both qualitative and quantitative phenotypes and allows for both discrete and continuous covariates to detect G x G and G x E interactions in a sample of unrelated individuals. In this article, we report the development of an algorithm that can be used to study G x G and G x E interactions for family-based designs, called pedigree-based GMDR (PGMDR). Compared to the available method, our proposed method has several major improvements, including allowing for covariate adjustments and being applicable to arbitrary phenotypes, arbitrary pedigree structures, and arbitrary patterns of missing marker genotypes. Our Monte Carlo simulations provide evidence that the PGMDR method is superior in performance to identify epistatic loci compared to the MDR-pedigree disequilibrium test (PDT). Finally, we applied our proposed approach to a genetic data set on tobacco dependence and found a significant interaction between two taste receptor genes (i.e., TAS2R16 and TAS2R38) in affecting nicotine dependence.
Project description:We consider in this paper testing for interactions between a genetic marker set and an environmental variable. A common practice in studying gene-environment (GE) interactions is to analyze one single-nucleotide polymorphism (SNP) at a time. It is of significant interest to analyze SNPs in a biologically defined set simultaneously, e.g. a gene or pathway. In this paper, we first show that if the main effects of multiple SNPs in a set are associated with a disease/trait, the classical single SNP-GE interaction analysis can be biased. We derive the asymptotic bias and study the conditions under which the classical single SNP-GE interaction analysis is unbiased. We further show that the simple minimum p-value-based SNP-set GE analysis can be biased and have an inflated Type 1 error rate. To overcome these difficulties, we propose a computationally efficient and powerful gene-environment set association test (GESAT) in generalized linear models. Our method tests for SNP-set by environment interactions using a variance component test, and estimates the main SNP effects under the null hypothesis using ridge regression. We evaluate the performance of GESAT using simulation studies, and apply GESAT to data from the Harvard lung cancer genetic study to investigate GE interactions between the SNPs in the 15q24-25.1 region and smoking on lung cancer risk.
Project description:Background: Mendelian randomization (MR) has developed into an established method for strengthening causal inference and estimating causal effects, largely due to the proliferation of genome-wide association studies. However, genetic instruments remain controversial, as horizontal pleiotropic effects can introduce bias into causal estimates. Recent work has highlighted the potential of gene-environment interactions in detecting and correcting for pleiotropic bias in MR analyses. Methods: We introduce MR using Gene-by-Environment interactions (MRGxE) as a framework capable of identifying and correcting for pleiotropic bias. If an instrument-covariate interaction induces variation in the association between a genetic instrument and exposure, it is possible to identify and correct for pleiotropic effects. The interpretation of MRGxE is similar to conventional summary MR approaches, with a particular advantage of MRGxE being the ability to assess the validity of an individual instrument. Results: We investigate the effect of adiposity, measured using body mass index (BMI), upon systolic blood pressure (SBP) using data from the UK Biobank and a single weighted allelic score informed by data from the GIANT consortium. We find MRGxE produces findings in agreement with two-sample summary MR approaches. Further, we perform simulations highlighting the utility of the approach even when the MRGxE assumptions are violated. Conclusions: By utilizing instrument-covariate interactions in MR analyses implemented within a linear-regression framework, it is possible to identify and correct for horizontal pleiotropic bias, provided the average magnitude of pleiotropy is constant across interaction-covariate subgroups.
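One way to picture the idea, offered as a loose summary-level analogy rather than the MRGxE estimator itself: if the instrument-exposure association varies across covariate subgroups while pleiotropy stays constant, then regressing the subgroup instrument-outcome associations on the subgroup instrument-exposure associations gives a slope that tracks the causal effect and an intercept that tracks the pleiotropic effect. The sketch below assumes this simplified, unweighted reading. The helper name and all numbers are hypothetical.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical subgroup-level associations (one entry per covariate subgroup).
# The instrument-exposure association varies across subgroups.
g_exposure = [0.1, 0.2, 0.3, 0.4, 0.5]
# Instrument-outcome association = constant pleiotropy + causal effect * g_exposure.
assumed_pleiotropy = 0.05
assumed_causal_effect = 0.8
g_outcome = [assumed_pleiotropy + assumed_causal_effect * g for g in g_exposure]

intercept, slope = ols_line(g_exposure, g_outcome)
print(f"intercept (pleiotropy): {intercept:.3f}")
print(f"slope (causal effect):  {slope:.3f}")
```

With noise-free toy inputs the fit recovers the assumed values. Real analyses would also need standard errors and the constant-pleiotropy condition stated in the conclusions.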
Project description:Investigating the most likely causal variants identified by fine-mapping analyses may improve the power to detect gene-environment interactions. We assessed the interplay between 70 single nucleotide polymorphisms identified by genetic fine-scale mapping of susceptibility loci and 11 epidemiological breast cancer risk factors in relation to breast cancer. Analyses were conducted on up to 58,573 subjects (26,968 cases and 31,605 controls) from the Breast Cancer Association Consortium, in one of the largest studies of its kind. Analyses were carried out separately for estrogen receptor (ER) positive (ER+) and ER negative (ER-) disease. The Bayesian False Discovery Probability (BFDP) was computed to assess the noteworthiness of the results. Four potential gene-environment interactions were identified as noteworthy (BFDP < 0.80) when assuming a true prior interaction probability of 0.01. The strongest interaction result in relation to overall breast cancer risk was found between CFLAR-rs7558475 and current smoking (ORint = 0.77, 95% CI: 0.67-0.88, pint = 1.8 × 10^-4). The interaction with the strongest statistical evidence was found between 5q14-rs7707921 and alcohol consumption (ORint = 1.36, 95% CI: 1.16-1.59, pint = 1.9 × 10^-5) in relation to ER- disease risk. The remaining two gene-environment interactions were also identified in relation to ER- breast cancer risk and were found between 3p21-rs6796502 and age at menarche (ORint = 1.26, 95% CI: 1.12-1.43, pint = 1.8 × 10^-4) and between 8q23-rs13267382 and age at first full-term pregnancy (ORint = 0.89, 95% CI: 0.83-0.95, pint = 5.2 × 10^-4). While these results do not suggest any strong gene-environment interactions, our results may still be useful to inform experimental studies. These may, in turn, shed light on the potential interactions observed.
Project description:A central problem in genetic epidemiology is to identify and rank genetic markers involved in a disease. Complex diseases, such as cancer, hypertension, and diabetes, are thought to be caused by an interaction of a panel of genetic factors, which can be identified by markers and which modulate environmental factors. Moreover, the effect of each genetic marker may be small. Hence, the association signal may be missed unless a large sample is considered or a priori biomedical data are used. Recent advances have generated a vast variety of a priori information, including linkage maps and information about gene regulatory dependence assembled into curated pathway databases. We propose a genotype-based approach that takes into account linkage disequilibrium (LD) information between genetic markers that are in moderate LD while modeling gene-gene and gene-environment interactions. A major advantage of our method is that the observed genetic information enters a model directly, thus eliminating the need to estimate haplotype phase. Our approach results in an algorithm that is inexpensive computationally and does not suffer from bias induced by haplotype-phase ambiguity. We investigated our model in a series of simulation experiments and demonstrated that the proposed approach results in estimates that are nearly unbiased and have small variability. We applied our method to the analysis of data from a melanoma case-control study and investigated interaction between a set of pigmentation genes and environmental factors defined by age and gender. Furthermore, an application of our method is demonstrated using a study of alcohol dependence.
Project description:Family-based designs protect analyses of genetic effects from bias that is due to population stratification. Investigators have assumed that this robustness extends to assessments of gene-environment interaction. Unfortunately, this assumption fails for the common scenario in which the genotyped variant is related to risk through linkage with a causative allele. Bias also plagues other methods of assessment of gene-environment interaction. When testing against multiplicative joint effects, the case-only design offers excellent power, but it is invalid if genotype and exposure are correlated in the population. The authors describe 4 mechanisms that produce genotype-exposure dependence: exposure-related genetic population stratification, effects of family history on behavior, genotype effects on exposure, and selective attrition. They propose a sibling-augmented case-only (SACO) design that protects against the first 2 mechanisms and is therefore valid for studying young-onset disease in which genotype does not influence exposure. A SACO design allows the ascertainment of genotype and exposure for cases and exposure for 1 or more unaffected siblings selected randomly. Conditional logistic regression permits assessment of exposure effects and gene-environment interactions. Via simulations, the authors compare the likelihood-based inference on interactions using the SACO design with that based on other designs. They also show that robust analyses of interactions using tetrads or disease-discordant sibling pairs are equivalent to analyses using the SACO design.
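The sensitivity of the case-only design to genotype-exposure correlation can be seen in a short exact calculation. The sketch below illustrates the standard case-only odds ratio only, not the SACO design. It assumes a binary genotype G, a binary exposure E, and a multiplicative risk model. The function name and parameter values are hypothetical.

```python
def case_only_or(p_g, p_e_given_g, rr_g, rr_e, rr_int):
    """Genotype-exposure odds ratio among cases under a multiplicative model.

    p_g          population frequency of G=1
    p_e_given_g  (P(E=1|G=0), P(E=1|G=1)) in the population
    Risk is proportional to rr_g**G * rr_e**E * rr_int**(G*E); the baseline
    risk cancels in the odds ratio and is therefore omitted.
    """
    cases = {}
    for g in (0, 1):
        for e in (0, 1):
            pg = p_g if g else 1 - p_g
            pe = p_e_given_g[g] if e else 1 - p_e_given_g[g]
            # Proportional to P(G=g, E=e | case).
            cases[(g, e)] = pg * pe * rr_g**g * rr_e**e * rr_int ** (g * e)
    return cases[(1, 1)] * cases[(0, 0)] / (cases[(1, 0)] * cases[(0, 1)])


# G and E independent in the population: the case-only OR equals rr_int.
valid = case_only_or(0.2, (0.4, 0.4), 3.0, 2.0, 1.5)
# G and E correlated, with no true interaction (rr_int = 1): the case-only
# OR reflects the population G-E odds ratio instead.
biased = case_only_or(0.2, (0.2, 0.6), 3.0, 2.0, 1.0)
print(valid, biased)
```

In the second call the case-only odds ratio equals the population genotype-exposure odds ratio, (0.6 × 0.8) / (0.4 × 0.2) = 6, which would be misread as a strong interaction.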
Project description:High-throughput cancer studies have been extensively conducted, searching for genetic markers associated with outcomes beyond clinical and environmental risk factors. Gene-environment interactions can have important implications beyond main effects. The commonly-adopted single-marker analysis cannot accommodate the joint effects of a large number of markers. The existing joint-effects methods also have limitations. Specifically, they may suffer from high computational cost, do not respect the "main effect, interaction" hierarchical structure, or use ineffective techniques. We develop a penalization method for the identification of important G × E interactions and main effects. It has an intuitive formulation, respects the hierarchical structure, accommodates the joint effects of multiple markers, and is computationally affordable. In numerical study, we analyze prognosis data under the AFT (accelerated failure time) model. Simulation shows satisfactory performance of the proposed method. Analysis of an NHL (non-Hodgkin lymphoma) study with SNP measurements shows that the proposed method identifies markers with important implications and satisfactory prediction performance.
Project description:BACKGROUND: The analysis of complex diseases is an important problem in human genetics. Because multifactoriality is expected to play a pivotal role, many studies are currently focused on collecting information on the genetic and environmental factors that potentially influence these diseases. However, there is still a lack of efficient and thoroughly tested statistical models that can be used to identify implicated features and their interactions. Simulations using large biologically realistic data sets with known gene-gene and gene-environment interactions that influence the risk of a complex disease are a convenient and useful way to assess the performance of statistical methods. RESULTS: The Gene-Environment iNteraction Simulator 2 (GENS2) simulates interactions among two genetic factors and one environmental factor and also allows for epistatic interactions. GENS2 is based on data with realistic patterns of linkage disequilibrium, and imposes no limitations either on the number of individuals to be simulated or on the number of non-predisposing genetic/environmental factors to be considered. The GENS2 tool is able to simulate gene-environment and gene-gene interactions. To make the simulator more intuitive, the input parameters are expressed as standard epidemiological quantities. GENS2 is written in Python and takes advantage of operators and modules provided by the simuPOP simulation environment. It can be used through a graphical or a command-line interface and is freely available from http://sourceforge.net/projects/gensim. The software is released under the GNU General Public License version 3.0. CONCLUSIONS: Data produced by GENS2 can be used as a benchmark for evaluating statistical tools designed for the identification of gene-gene and gene-environment interactions.