A generalized sequential Bonferroni procedure using smoothed weights for genome-wide association studies incorporating information on Hardy-Weinberg disequilibrium among cases.
ABSTRACT: For genome-wide association studies (GWAS) with case-control designs, one of the most widely used association tests is the Cochran-Armitage (CA) trend test assuming an additive mode of inheritance. The CA trend test often has higher power than other association tests under additive and multiplicative disease models. However, it can have very low power under a recessive disease model in GWAS. Although tests (such as MAX3) robust to different genetic models have been developed, they often have relatively lower power than the CA trend test under additive and multiplicative models. The goal of this study is to propose an efficient method that not only has higher power than the CA trend test under dominant and recessive models but also maintains the power of the CA trend test under additive and multiplicative models. We employed the generalized sequential Bonferroni (GSB) procedure of Holm to incorporate information from a Hardy-Weinberg disequilibrium (HWD) test into the CA trend test based on estimating weights from the p values of the HWD test. We proposed to smooth the weights to reduce possible noise. Results from extensive simulation studies showed that the proposed GSB procedure can achieve the goal described above.
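The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' GSB procedure: it applies a single-step weighted Bonferroni rule rather than Holm's sequential version, and the moving-average smoother, the window size, and the function names smooth_weights and weighted_bonferroni are assumptions made for illustration. How raw weights are estimated from HWD p values is not specified here, so the sketch takes them as given.

```python
def smooth_weights(raw, window=3):
    """Centered moving average; the window is truncated at both ends.
    (Hypothetical smoother, not the one used in the paper.)"""
    half = window // 2
    out = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / m, with weights rescaled to mean 1."""
    m = len(pvals)
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]  # rescale so the weights average 1
    return [p <= alpha * w / m for p, w in zip(pvals, scaled)]


# Illustrative values only: association p values and raw per-SNP weights.
pvals = [0.001, 0.02, 0.04, 0.5]
raw = [0.5, 2.5, 0.5, 0.5]
print(weighted_bonferroni(pvals, [1, 1, 1, 1]))          # equal weights
print(weighted_bonferroni(pvals, raw))                   # unsmoothed weights
print(weighted_bonferroni(pvals, smooth_weights(raw)))   # smoothed weights
```

With equal weights the rule reduces to the ordinary Bonferroni threshold alpha / m. Whether data-derived weights preserve the family-wise error rate depends on conditions that this sketch does not check.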
Project description:For genome-wide association studies (GWAS) using case-control data with stratification, a commonly used association test is the generalized Armitage (GA) trend test implemented in the software EIGENSTRAT. The GA trend test uses principal component analysis to correct for population stratification. It usually assumes an additive disease model and can have high power when the underlying disease model is additive or multiplicative, but may have relatively low power when the underlying disease model is recessive or dominant. The purpose of this paper is to provide a test procedure for GWAS with increased power over the GA trend test under the recessive and dominant models, while maintaining the power of the GA trend test under the additive and multiplicative models. We extend a Hardy-Weinberg disequilibrium (HWD) trend test for a homogeneous population to account for population stratification, and then propose a robust association test procedure for GWAS that incorporates information from the extended HWD trend test into the GA trend test. Our simulation studies and application of our method to a GWAS data set indicate that our proposed method can achieve the purpose described above.
Project description:In case-control genetic association studies, a standard practice is to perform the Cochran-Armitage (CA) trend test under the assumption of the additive model because of its robustness. We could even identify situations in which it outperformed the analysis model consistent with the underlying inheritance mode. In this article, we analytically reveal the statistical basis that leads to the phenomenon. By elucidating the origin of the CA trend test as a linear regression model, we decompose Pearson's χ² test statistic into two components: one is the CA trend test statistic, which measures the goodness of fit of the linear regression model, and the other measures the discrepancy between data and the linear regression model. Under this framework, we show that the additive coding scheme, as well as the multiplicative coding scheme, increases the coefficient of determination of the regression model by increasing the spread of data points. We also obtain the conditions under which the CA trend test statistic equals the MAX statistic and Pearson's χ² test statistic.
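The decomposition described above can be checked numerically on a 2 x 3 genotype table. The following is a minimal sketch, not code from the article: the function names, the example counts, and the additive scores (0, 1, 2) are assumptions for illustration. It uses the form of the trend statistic with N (rather than N - 1) as the multiplier, for which the trend component plus the lack-of-fit component equals Pearson's statistic.

```python
def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for a 2 x k case-control table."""
    totals = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for r, s, n_i in zip(cases, controls, totals):
        exp_r, exp_s = n_i * R / N, n_i * S / N  # expected counts under independence
        stat += (r - exp_r) ** 2 / exp_r + (s - exp_s) ** 2 / exp_s
    return stat


def ca_trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df); additive scores by default."""
    totals = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    sxr = sum(x * r for x, r in zip(scores, cases))
    sxn = sum(x * n_i for x, n_i in zip(scores, totals))
    sxxn = sum(x * x * n_i for x, n_i in zip(scores, totals))
    return N * (N * sxr - R * sxn) ** 2 / (R * S * (N * sxxn - sxn ** 2))


# Case proportions 0.2, 0.3, 0.4 are linear in the score: no lack of fit.
linear = ([20, 30, 40], [80, 70, 60])
# Case proportions 0.2, 0.2, 0.5 resemble a recessive pattern: lack of fit appears.
recessive_like = ([20, 20, 50], [80, 80, 50])

for cases, controls in (linear, recessive_like):
    total = pearson_chi2(cases, controls)
    trend = ca_trend_chi2(cases, controls)
    print(f"Pearson={total:.3f} trend={trend:.3f} lack-of-fit={total - trend:.3f}")
```

When the case proportion is exactly linear in the score, the two statistics coincide; otherwise the difference is the component that measures departure from the linear model.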
Project description:There has been a long-standing controversy in epidemiology with regard to an appropriate risk scale for testing interactions between genes (G) and environmental exposure (E). Although interaction tests based on the logistic model, which approximates the multiplicative risk for rare diseases, have been more widely applied because of their convenience in statistical modeling, interactions under additive risk models have been regarded as closer to true biologic interactions and more useful in intervention-related decision-making processes in public health. It has been well known that exploiting a natural assumption of G-E independence for the underlying population can dramatically increase statistical power for detecting multiplicative interactions in case-control studies. However, the implication of the independence assumption for tests for additive interaction has not been previously investigated. In this article, the authors develop a likelihood ratio test for detecting additive interactions for case-control studies that incorporates the G-E independence assumption. Numerical investigation of power suggests that incorporation of the independence assumption can enhance the efficiency of the test for additive interaction by 2- to 2.5-fold. The authors illustrate their method by applying it to data from a bladder cancer study.
Project description:There have been recent proposals advocating the use of additive gene-environment interaction instead of the widely used multiplicative scale, as a more relevant public health measure. Using gene-environment independence enhances statistical power for testing multiplicative interaction in case-control studies. However, under departure from this assumption, substantial bias in the estimates and inflated type I error in the corresponding tests can occur. In this paper, we extend the empirical Bayes (EB) approach previously developed for multiplicative interaction, which trades off between bias and efficiency in a data-adaptive way, to the additive scale. An EB estimator of the relative excess risk due to interaction is derived, and the corresponding Wald test is proposed with a general regression setting under a retrospective likelihood framework. We study the impact of gene-environment association on the resultant test with case-control data. Our simulation studies suggest that the EB approach uses the gene-environment independence assumption in a data-adaptive way and provides a gain in power compared with the standard logistic regression analysis and better control of type I error when compared with the analysis assuming gene-environment independence. We illustrate the methods with data from the Ovarian Cancer Association Consortium.
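The relative excess risk due to interaction (RERI) named above has a standard point definition, RERI = RR_GE - RR_G - RR_E + 1, where each relative risk is taken against the group with neither the genetic nor the environmental factor. The sketch below shows only that arithmetic; it is not the EB estimator or the Wald test from the paper, and the function name and example values are hypothetical. In case-control data, odds ratios are commonly substituted for relative risks under a rare-disease assumption.

```python
def reri(rr_g, rr_e, rr_ge):
    """Relative excess risk due to interaction on the additive scale.

    rr_g  : relative risk for the genetic factor alone
    rr_e  : relative risk for the environmental exposure alone
    rr_ge : relative risk for both factors together
    All three are relative to the doubly unexposed reference group.
    """
    return rr_ge - rr_g - rr_e + 1.0


# Illustrative values only.
print(reri(2.0, 3.0, 4.0))  # exactly additive joint effect: RERI is 0
print(reri(2.0, 3.0, 6.0))  # exactly multiplicative joint effect: RERI is 2
```

The second call shows why the choice of scale matters: a joint effect with no interaction on the multiplicative scale can still show a positive interaction on the additive scale.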
Project description:To develop effective methods for GWAS in admixed populations such as African Americans. We show that, when testing the null hypothesis that the test SNP is not in background linkage disequilibrium with the causal variants, several existing methods cannot control well the family-wise error rate (FWER) in the strong sense in GWAS. These existing methods include association tests adjusting for global ancestry and joint association tests that combine statistics from admixture mapping tests and association tests that correct for local ancestry. Furthermore, we describe a generalized sequential Bonferroni (smooth-GSB) procedure for GWAS that incorporates smoothed weights calculated from admixture mapping tests into association tests that correct for local ancestry. We have applied the smooth-GSB procedure to analyses of GWAS data on African Americans from the Atherosclerosis Risk in Communities (ARIC) Study. Our simulation studies indicate that the smooth-GSB procedure not only controls the FWER but also improves statistical power compared with association tests correcting for local ancestry. The smooth-GSB procedure can result in a better performance than several existing methods for GWAS in admixed populations.
Project description:BACKGROUND: Genotyping error can increase both type I and II errors. In order to elucidate potential genotyping errors, data quality control often includes testing genotype data for deviations from Hardy-Weinberg Equilibrium (HWE). METHODS: The Hardy-Weinberg Disequilibrium (HWD) coefficient and the ability to reject the null hypothesis of HWE were calculated analytically for genotype data from parents and unaffected siblings of affected probands. RESULTS: Genotype data from parents and unaffected siblings display deviations from HWE when functional loci or markers in LD with a functional locus are tested. For the parental genotype data, all deviations from HWE are negative, indicating an excess of heterozygous genotypes, with the strongest deviations from HWE observed for the multiplicative model. In contrast, for affected proband genotype data, there is no deviation from HWE under the multiplicative model and the deviations from HWE for the recessive model are positive. For the unaffected sibling data, patterns of deviation from HWE are similar to those observed in the proband data, with the exception of the multiplicative model, where the HWD coefficient, although close to 0, can be either positive or negative depending on the allele frequency. CONCLUSION: Deviations from HWE in parental and unaffected sibling genotype data could be due to an association with the functional locus. However, these deviations for genotypic relative risk ≤ 2.0 are not large and therefore the power to detect them is usually low. Testing for deviations from HWE in parental and unaffected sibling genotype data is still beneficial for quality control even though functional loci, in parental and unaffected sibling genotype data, can produce an association signal.
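The sign convention used above (a negative coefficient means an excess of heterozygotes) matches the common definition of the HWD coefficient for a biallelic marker, D = P(AA) - p_A², where p_A is the frequency of allele A. The sketch below computes that sample quantity from genotype counts; the function name and the example counts are hypothetical, and the analytic calculations for parents and siblings described in the abstract are not reproduced here.

```python
def hwd_coefficient(n_AA, n_Aa, n_aa):
    """Sample HWD coefficient D = P(AA) - p_A**2 for a biallelic marker.

    D < 0 indicates an excess of heterozygotes relative to HWE,
    D > 0 indicates an excess of homozygotes.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p_A = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    return n_AA / n - p_A ** 2


# Illustrative counts, all with allele frequency 0.5.
print(hwd_coefficient(25, 50, 25))  # HWE proportions: D is 0
print(hwd_coefficient(10, 80, 10))  # excess heterozygotes: D is negative
print(hwd_coefficient(40, 20, 40))  # excess homozygotes: D is positive
```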
Project description:Most genome-wide association studies assume an additive model of inheritance, which may result in significant loss of power when there is a strong departure from additivity. The General Regression Model (GRM), which allows performing an assumption-free test for association by testing for both additive effect and deviation from additive effect, may be more appropriate for association tests. Additionally, GRM allows testing the underlying genetic model. We compared the power of the GRM association test to additive and other Cochran-Armitage Trend (CAT) tests through simulations and by applying GRM to a large case/control sample, the bipolar Wellcome Trust Case Control Cohort data. Simulations were performed on two sets of case/control samples (1000/1000 and 2000/2000), using a large panel of genetic models. Four association tests (GRM and additive, recessive and dominant CAT tests) were applied to all replicates. We showed that GRM power to detect association was similar to or greater than that of the additive CAT test, in particular in the case of recessive inheritance, with up to 67% gain in power. GRM analysis of genome-wide bipolar disorder Wellcome Trust Consortium data (1998 cases/3004 controls) showed significant association in the 16p12 region (rs420259; P = 3.4E-7), which had not been identified using the additive CAT test. As expected, rs420259 fitted a non-additive (recessive) model. GRM provides increased power compared to the additive CAT test for association studies and is easily applicable.
Project description:Case-parent trio studies considering genotype data from children affected by a disease and their parents are frequently used to detect single nucleotide polymorphisms (SNPs) associated with disease. The most popular statistical tests for this study design are transmission/disequilibrium tests (TDTs). Several types of these tests have been developed, for example, procedures based on alleles or genotypes. Therefore, it is of great interest to examine which of these tests have the highest statistical power to detect SNPs associated with disease. Comparisons of the allelic and the genotypic TDT for individual SNPs have so far been conducted based on simulation studies, since the test statistic of the genotypic TDT was determined numerically. Recently, however, it has been shown that this test statistic can be presented in closed form. In this article, we employ this analytic solution to derive equations for calculating the statistical power and the required sample size for different types of the genotypic TDT. The power of this test is then compared with that of the corresponding score test assuming the same mode of inheritance as well as the allelic TDT based on a multiplicative mode of inheritance, which is equivalent to the score test assuming an additive mode of inheritance. This is, thus, the first time the power of these tests is compared based on equations, yielding instant results and omitting the need for time-consuming simulation studies. This comparison reveals that these tests have almost the same power, with the score test being slightly more powerful.
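For orientation, the allelic TDT mentioned above is usually written in a McNemar-type form, (b - c)² / (b + c), where b and c count how often heterozygous parents transmit or do not transmit the allele of interest to the affected child. The sketch below shows only that textbook statistic, which is approximately chi-square distributed with 1 degree of freedom under the null hypothesis. It does not implement the genotypic TDT, the score test, or the power equations derived in the article, and the function name and counts are hypothetical.

```python
def allelic_tdt(transmitted, not_transmitted):
    """Allelic TDT statistic from transmissions by heterozygous parents."""
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)


# Illustrative counts only.
print(allelic_tdt(60, 40))  # unequal transmission
print(allelic_tdt(50, 50))  # equal transmission gives a statistic of 0
```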
Project description:Advances in sequencing technology allow assessing the impact of rare variation on common disorders. For this purpose, methods combine rare variants across a gene and compare an aggregate statistic between cases and controls. However, sequencing many individuals is costly. Hence, it is necessary to identify case samples that are most likely to result in powerful tests under realistic model assumptions. Power can be increased by selecting cases that are highly likely to carry risk variants. As rare variants that contribute to the heritability of a disease co-segregate among affected family members, selecting cases that have affected family members may increase the power of rare variant tests considerably. Here I compare sequencing random cases to cases ascertained to have affected family members. I quantify the power of the different approaches and provide criteria for sample selection under different models of inheritance. Under a model of multiplicative gene-gene interaction, a sample of random cases has to be 2-16-fold larger to achieve the same power as a sample of cases ascertained to have affected family members. However, in traits with high heritability this power gain can be reduced or even reversed under models of additive gene-gene interaction. Hence study designs should depend on the studied disease's heritability and on the available sample size. I also show that selecting cases that share both chromosomes identical by descent with an affected sibling at candidate regions can result in a further power gain.
Project description:Hybrid zones as windows on evolutionary processes provide a natural laboratory for studying the genetic basis and mechanisms of postzygotic isolation. One resultant pattern in hybrid zones is the Hardy-Weinberg disequilibrium (HWD) for a single locus or the linkage disequilibrium (LD) for multiple loci produced by natural selection against hybrids. However, HWD and the commonly used low-order gametic or composite digenic LD cannot fully reflect the pattern of the high-order genotypic interactions. Here we propose the use of zygotic LD to elucidate the selection mechanisms of postzygotic isolation, and its calculation is based on genotypic frequencies only, irrespective of the type of mating system. We numerically and analytically show that the maximum composite digenic LD is always greater than the maximum absolute zygotic LD under the linear-additive selection, but is comparable to or smaller than the maximum absolute zygotic LD under the strong epistatic selection. Selection mechanisms can be inferred by testing such differences. We analyze a previously reported mouse hybrid zone assayed with genome-wide SNPs, and confirm that the composite digenic LD cannot appropriately indicate all possible significant genotypic interactions for a given SNP pair. A large proportion of significant zygotic LDs, ~75% in general in the mouse hybrid zone, cannot be revealed from the composite digenic LD analysis. Statistical tests indicate that epistatic selection occurred among multiple loci in the mouse hybrid zone. The results highlight that the joint patterns of the composite digenic and zygotic LDs can help to elucidate the selection mechanism that is potentially involved in postzygotic isolation.