Hardy-Weinberg equilibrium testing of biological ascertainment for Mendelian randomization studies.
ABSTRACT: Mendelian randomization (MR) permits causal inference between exposures and a disease. It can be compared with randomized controlled trials. Whereas in a randomized controlled trial the randomization occurs at entry into the trial, in MR the randomization occurs during gamete formation and conception. Several factors, including time since conception and sampling variation, are relevant to the interpretation of an MR test. Particularly important is consideration of the "missingness" of genotypes, which can arise from chance, genotyping errors, or clinical ascertainment. Testing for Hardy-Weinberg equilibrium (HWE) is a genetic approach that permits evaluation of missingness. In this paper, the authors demonstrate evidence of nonconformity with HWE in real data. They also perform simulations to characterize the sensitivity of HWE tests to missingness. Unresolved missingness could lead to a false rejection of causality in an MR investigation of trait-disease association. These results indicate that large-scale studies, very high quality genotyping data, and detailed knowledge of the life-course genetics of the alleles/genotypes studied will largely mitigate this risk. The authors also present a Web program (http://www.oege.org/software/hwe-mr-calc.shtml) for estimating possible missingness and an approach to evaluating missingness under different genetic models.
Project description:Objective: Departure from Hardy-Weinberg Equilibrium (HWE) may occur due to a variety of causes, including purifying selection, inbreeding, population substructure, copy number variation or genotyping error. We searched for specific characteristics of HWE-departure due to genotyping error. Methods: Genotypes of a random set of genetic variants were obtained from the Exome Aggregation Consortium (ExAC) database. Variants with <80% successful genotypes or with minor allele frequency (MAF) <1% were excluded. HWE-departure (d-HWE) was considered significant at p < 10E-05 and classified as d-HWE with loss of heterozygosity (LoH d-HWE) or d-HWE with excess heterozygosity (gain of heterozygosity: GoH d-HWE). Missing genotypes, variant type (single nucleotide polymorphism (SNP) vs. insertion/deletion), MAF, standard deviation (SD) of MAF across populations (MAF-SD) and copy number variation were evaluated for association with HWE-departure. Results: The study sample comprised 3,204 genotype distributions. HWE-departure was observed in 134 variants: LoH d-HWE in 41 (1.3%), GoH d-HWE in 93 (2.9%) variants. LoH d-HWE was more likely in variants located within deletion polymorphisms (p < 0.001) and in variants with higher MAF-SD (p = 0.0077). GoH d-HWE was associated with low genotyping rate, with variants of insertion/deletion type and with high MAF (all at p < 0.001). In a sub-sample of 2,196 variants with genotyping rate >98%, LoH d-HWE was found in 29 (1.3%) variants, but no GoH d-HWE was detected. The findings of the non-random distribution of HWE-violating SNPs along the chromosome, the association with common deletion polymorphisms and indel-variant type, and the finding of excess heterozygotes in genomic regions that are prone to cross-hybridization were confirmed in a large sample of short variants from the 1000 Genomes Project. Conclusions: We differentiated between two types of HWE-departure. GoH d-HWE was suggestive of genotyping error. LoH d-HWE, on the contrary, pointed to natural variability such as population substructure or common deletion polymorphisms.
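The split into LoH d-HWE and GoH d-HWE can be illustrated with a minimal Python sketch. It assumes a 1-degree-of-freedom chi-square test on biallelic genotype counts, which may differ from the test the authors actually used. The function names, the `alpha` default and the use of the inbreeding coefficient `f` to decide direction are illustrative choices, not taken from the abstract.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value, f) for biallelic genotype counts.

    f = 1 - observed_het / expected_het; f > 0 means a heterozygote
    deficit, f < 0 a heterozygote excess.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic variant: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    f = 1.0 - n_ab / expected[1]
    return chi2, p_value, f


def classify_departure(n_aa, n_ab, n_bb, alpha=1e-5):
    """Label a variant as 'HWE', 'LoH d-HWE' or 'GoH d-HWE'.

    alpha is a hypothetical default; set it to whatever threshold applies.
    """
    _, p_value, f = hwe_chi_square(n_aa, n_ab, n_bb)
    if p_value >= alpha:
        return "HWE"
    return "LoH d-HWE" if f > 0 else "GoH d-HWE"


if __name__ == "__main__":
    for counts in [(250, 500, 250), (400, 200, 400), (100, 800, 100)]:
        print(counts, classify_departure(*counts))
```

For low minor allele frequencies or small samples an exact test is usually preferred over the chi-square approximation; the classification logic would stay the same.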
Project description:Genotype error can greatly reduce the power of a genetic study. For family data, genotype error can be assessed by examining marker data for non-Mendelian inconsistencies, closely linked markers for double recombination events, and consistency of duplicate genotypes. For case-control data, duplicate samples are genotyped, and controls are tested for deviations from Hardy-Weinberg equilibrium (HWE). Duplicate samples can provide accurate estimates of genotyping error rates, unless systematic genotyping errors have occurred. Although genotyping errors can cause deviations from HWE, these deviations are usually small, and the power to detect them is low except for high rates of genotyping error and/or large sample sizes. An additional problem is that even when deviations from HWE are detected for marker loci, without additional experimentation it is not possible to unequivocally implicate genotyping error as the cause. The power and sample sizes necessary to detect deviations from HWE for single-nucleotide polymorphism (SNP) data are examined for a variety of genotyping error and pseudo-SNP models. For the majority of genotyping models examined, the power is poor to detect deviations from HWE. For example, for 1,000 controls, if an allele with a frequency of 0.1 fails to amplify for 28% of the heterozygous genotypes producing a sample error rate of 0.05, the power is 0.51 to detect a deviation from HWE at an alpha level of 0.05. On the other hand, the detection of deviations from HWE for pseudo-SNPs (paralogous and ectopic sequence variants) for the majority of models examined produces a power of >0.8 for sample sizes as small as 50 individuals.
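The allele-dropout example above can be explored with a small simulation. The sketch below is only an approximation of that scenario: it assumes the failing allele turns a heterozygote into an apparent homozygote for the other allele, uses a 1-degree-of-freedom chi-square test, and estimates power by Monte Carlo rather than analytically. All function and parameter names are hypothetical.

```python
import math
import random


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """p-value of the 1-df chi-square HWE test for genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic sample: nothing to reject
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_power(n=1000, freq=0.1, dropout=0.28, alpha=0.05,
                   reps=500, seed=1):
    """Fraction of simulated control samples in which HWE is rejected.

    freq    : true frequency of the allele that may fail to amplify
    dropout : probability that this allele drops out in a heterozygote
    """
    rng = random.Random(seed)
    hom_minor = freq * freq
    het = 2 * freq * (1 - freq)
    rejections = 0
    for _ in range(reps):
        counts = [0, 0, 0]  # minor homozygote, heterozygote, major homozygote
        for _ in range(n):
            u = rng.random()
            if u < hom_minor:
                counts[0] += 1
            elif u < hom_minor + het:
                # Dropout makes the heterozygote look like a major homozygote.
                if rng.random() < dropout:
                    counts[2] += 1
                else:
                    counts[1] += 1
            else:
                counts[2] += 1
        if hwe_chi_square_p(*counts) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("estimated power:", simulate_power())
    print("type I error   :", simulate_power(dropout=0.0))
```

The estimate can be compared with the 0.51 quoted above; running with `dropout=0.0` gives a rough check of the type I error rate of the test itself.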
Project description:BACKGROUND: Genotyping error can increase both type I and II errors. In order to elucidate potential genotyping errors, data quality control often includes testing genotype data for deviations from Hardy-Weinberg Equilibrium (HWE). METHODS: The Hardy-Weinberg Disequilibrium (HWD) coefficient and the ability to reject the null hypothesis of HWE were calculated analytically for genotype data from parents and unaffected siblings of affected probands. RESULTS: Genotype data from parents and unaffected siblings display deviations from HWE when functional loci, or markers in LD with a functional locus, are tested. For the parental genotype data all deviations from HWE are negative, indicating an excess of heterozygous genotypes, with the strongest deviations from HWE observed for the multiplicative model. In contrast, for affected proband genotype data, there is no deviation from HWE under the multiplicative model and the deviations from HWE for the recessive model are positive. For the unaffected sibling data, patterns of deviation from HWE are similar to those observed in the proband data, with the exception of the multiplicative model, where the HWD coefficient, although close to 0, can be either positive or negative depending on the allele frequency. CONCLUSION: Deviations from HWE in parental and unaffected sibling genotype data could be due to an association with the functional locus. However, these deviations for genotypic relative risk ≤2.0 are not large and therefore the power to detect them is usually low. Testing for deviations from HWE in parental and unaffected sibling genotype data is still beneficial for quality control even though functional loci, in parental and unaffected sibling genotype data, can produce an association signal.
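The statements about affected probands can be checked with a short calculation. The sketch below covers probands only (not parents or unaffected siblings) and assumes a population in HWE with genotype penetrances proportional to the genotype relative risks; the HWD coefficient is taken as D = P(AA) - p_A^2, so that negative values correspond to a heterozygote excess. Function names and the example values are illustrative.

```python
def proband_genotype_freqs(p, rr_het, rr_hom):
    """Genotype frequencies (aa, Aa, AA) among affected individuals.

    p      : population frequency of risk allele A (population assumed in HWE)
    rr_het : relative risk of Aa versus aa
    rr_hom : relative risk of AA versus aa
    """
    q = 1.0 - p
    weights = (q * q, 2 * p * q * rr_het, p * p * rr_hom)
    total = sum(weights)
    return tuple(w / total for w in weights)


def hwd_coefficient(freq_het, freq_hom):
    """D = P(AA) - p_A^2, using the allele frequency implied by the genotypes."""
    allele_freq = freq_hom + freq_het / 2.0
    return freq_hom - allele_freq ** 2


if __name__ == "__main__":
    gamma = 2.0
    models = {
        "multiplicative": (gamma, gamma ** 2),
        "recessive": (1.0, gamma),
    }
    for name, (rr_het, rr_hom) in models.items():
        _, het, hom = proband_genotype_freqs(0.3, rr_het, rr_hom)
        print(name, hwd_coefficient(het, hom))
```

Under the multiplicative model the weighted genotype frequencies factor into a perfect square, which is why D is zero up to rounding; under the recessive model the extra weight on AA makes D positive.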
Project description:Testing for Hardy-Weinberg equilibrium (HWE) is an important component in almost all analyses of population genetic data. Genetic markers that violate HWE are often treated as special cases; for example, they may be flagged as possible genotyping errors, or they may be investigated more closely for evolutionary signatures of interest. The presence of population structure is one reason why genetic markers may fail a test of HWE. This is problematic because almost all natural populations studied in the modern setting show some degree of structure. Therefore, it is important to be able to detect deviations from HWE for reasons other than structure. To this end, we extend statistical tests of HWE to allow for population structure, which we call a test of "structural HWE." Additionally, our new test allows one to automatically choose tuning parameters and identify accurate models of structure. We demonstrate our approach on several important studies, provide theoretical justification for the test, and present empirical evidence for its utility. We anticipate the proposed test will be useful in a broad range of analyses of genome-wide population genetic data.
Project description:Testing genetic markers for Hardy-Weinberg equilibrium (HWE) is an important tool for detecting genotyping errors in large-scale genotyping studies. For markers at the X chromosome, typically the χ² or exact test is applied to the females only, and the hemizygous males are considered to be uninformative. In this paper we show that the males are relevant, because a difference in allele frequency between males and females may indicate that HWE does not hold. The testing of markers on the X chromosome has received little attention, and in this paper we lay down the foundation for testing biallelic X-chromosomal markers for HWE. We develop four frequentist statistical test procedures for X-linked markers that take both males and females into account: the χ² test, likelihood ratio test, exact test and permutation test. Exact tests that include males are shown to have a better Type I error rate. Empirical data from the GENEVA project on venous thromboembolism is used to illustrate the proposed tests. Results obtained with the new tests differ substantially from tests that are based on female genotype counts only. The new tests detect differences in allele frequencies and seem able to uncover additional genotyping error that would have gone unnoticed in HWE tests based on females only.
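A rough sketch of how a χ² test might use both sexes is given below. It is one plausible construction, not necessarily the authors' exact procedure: a pooled allele frequency is estimated from males and females, expected counts are formed for the two male and three female categories, and the statistic is referred to a χ² distribution with 2 degrees of freedom. The function names and example counts are assumptions for illustration.

```python
import math


def x_chromosome_hwe_chi_square(m_a, m_b, f_aa, f_ab, f_bb):
    """Chi-square test (2 df) for an X-linked biallelic marker using both sexes.

    m_a, m_b         : hemizygous male counts for alleles A and B
    f_aa, f_ab, f_bb : female genotype counts
    """
    n_m = m_a + m_b
    n_f = f_aa + f_ab + f_bb
    # Pooled allele frequency: one allele per male, two per female.
    p = (m_a + 2 * f_aa + f_ab) / (n_m + 2 * n_f)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: test is undefined")
    observed = (m_a, m_b, f_aa, f_ab, f_bb)
    expected = (n_m * p, n_m * q, n_f * p * p, 2 * n_f * p * q, n_f * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function, 2 df
    return chi2, p_value


def female_only_chi_square(f_aa, f_ab, f_bb):
    """Conventional 1-df HWE chi-square test that ignores males."""
    n = f_aa + f_ab + f_bb
    p = (2 * f_aa + f_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((f_aa, f_ab, f_bb), expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Females fit HWE perfectly, but males carry allele A far more often.
    print(female_only_chi_square(25, 50, 25))
    print(x_chromosome_hwe_chi_square(80, 20, 25, 50, 25))
```

In the example the female-only test sees no departure, while the joint test rejects because the male and female allele frequencies disagree, which is the situation the abstract describes.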
Project description:Testing Hardy-Weinberg equilibrium (HWE) in the control group is commonly used to detect genotyping errors in genetic association studies. We propose a likelihood ratio test for testing HWE in the study population using both case and control samples. This test incorporates underlying association models. Another feature is that, when we infer the disease-genotype association, we explicitly incorporate HWE or a possible departure from Hardy-Weinberg equilibrium (DHWE) into the model. Our unified framework enables us to infer the disease-genotype association when a detected DHWE needs to be part of the model after causes for the DHWE are explored. Real data sets are used to illustrate the application of the methodology and its implication in genetic association studies. Our analysis and interpretation touch on issues such as genotyping errors, population selection, population stratification, or the study sampling plan, that all could be the cause of DHWE.
Project description:Population-based genetic association studies have proven to be a powerful tool in identifying genes implicated in many complex human diseases that have a huge impact on public health. An essential quality control step in such studies is to undertake Hardy-Weinberg equilibrium (HWE) calculations. Deviations from HWE in the control group may reflect important problems including selection bias, population stratification and genotyping errors. If HWE is violated, the inferences of these studies may thus be biased. We therefore aimed to examine the extent to which HWE calculations are reported in genetic association studies published in Cell Journal (Yakhteh) (Cell J). Using keywords pertaining to genetic association studies, eleven relevant articles were identified, of which ten provided full genotypic data. The genotype distribution of 16 single nucleotide polymorphisms (SNPs) was re-analyzed for HWE by using three different methods where appropriate. HWE was not reported in 60% of all articles investigated. Among those reporting, only one article provided calculations correctly and in detail. Therefore, 90% of articles analyzed failed to provide sufficient HWE data. Interestingly, three articles had significant HWE deviation in their control groups, of which one deviated strongly from HWE expectations (P = 9.8×10^-12). We thus show that HWE calculations are under-reported in genetic association studies published in this journal. Furthermore, the conclusions of the three studies showing significant HWE deviation in their control groups should be treated cautiously as they may be misleading. We therefore recommend that reporting of detailed HWE calculations should become mandatory for such studies in the future.
Project description:Testing for the Hardy-Weinberg equilibrium (HWE) is often used as an initial step for checking the quality of genotyping. When testing the HWE for case-control data, the impact of a potential genetic association between the marker and the disease must be controlled for; otherwise the results may be biased. Li and Li proposed a likelihood ratio test (LRT) that accounts for this potential genetic association and is more powerful than the commonly used control-only χ² test. However, the LRT is not efficient when the marker is independent of the disease, and also requires numerical optimization to calculate the test statistic. In this article, we propose a novel shrinkage test for assessing the HWE. The proposed shrinkage test yields higher statistical power than the LRT when the marker is independent of or weakly associated with the disease, and converges to the LRT when the marker is strongly associated with the disease. In addition, the proposed shrinkage test has a closed form and can be easily used to test the HWE for large datasets that result from genome-wide association studies. We compare the performance of the shrinkage test with existing methods using simulation studies, and apply the shrinkage test to a genome-wide association dataset for Alzheimer's disease.
Project description:BACKGROUND: Deviations from Hardy-Weinberg equilibrium (HWE) are commonly thought of as indicating genotyping errors, population stratification or some other artefact. However they could also arise through important biological mechanisms. In particular, genetic variants having a recessive effect on the successful fertilisation and/or development of an embryo might be manifest through such deviations in an unselected sample of "control" subjects. FINDINGS: We investigated genotypes from 463842 autosomal markers from 1504 British subjects. We identified regions in which several neighbouring markers exhibited deviation from HWE in the same direction by considering "heterozygosity scores" in windows of 10 markers. The heterozygosity score for each marker was defined as -log(p) or log(p) according to whether the marker demonstrated increased heterozygosity or homozygosity. In each window the marker with the highest absolute score was ignored and the positive and negative scores were summed for the other nine markers. Windows were selected on the basis of this sum exceeding a given threshold, for which we used values of 50 or 15. For the threshold of 50, we identified 7 regions with increased heterozygosity and for the threshold of 15 we identified 22 regions with increased heterozygosity, 23 with increased homozygosity and 2 containing both kinds of window. The most impressive of these results came from a group of 6 markers at 17q21, each of which showed increased heterozygosity significant at p < 10^-190. CONCLUSION: The human genome contains regions which deviate markedly from HWE and these might harbour genes influencing embryonic survival.
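The window procedure described above can be sketched as follows. Several details are not stated in the abstract and are assumptions here: logarithms are taken to base 10, windows slide one marker at a time, and positive and negative scores are summed separately so that a window can be flagged for increased heterozygosity, increased homozygosity, or both. Function names are hypothetical.

```python
import math


def heterozygosity_score(p_value, observed_het, expected_het):
    """Signed score: -log10(p) for a heterozygote excess, log10(p) otherwise.

    p_value must be greater than 0.
    """
    magnitude = -math.log10(p_value)
    return magnitude if observed_het > expected_het else -magnitude


def window_sums(scores, size=10):
    """Yield (start, positive_sum, negative_sum) for each sliding window.

    The single marker with the largest absolute score in each window is
    dropped before summing, so one extreme marker cannot carry a window.
    """
    for start in range(len(scores) - size + 1):
        window = list(scores[start:start + size])
        extreme = max(range(size), key=lambda i: abs(window[i]))
        del window[extreme]
        positive = sum(s for s in window if s > 0)
        negative = sum(s for s in window if s < 0)
        yield start, positive, negative


def flag_windows(scores, threshold=15.0, size=10):
    """Return (start, kind) pairs for windows whose summed score passes threshold."""
    flagged = []
    for start, positive, negative in window_sums(scores, size):
        if positive > threshold:
            flagged.append((start, "heterozygosity"))
        if -negative > threshold:
            flagged.append((start, "homozygosity"))
    return flagged


if __name__ == "__main__":
    scores = [3.0] * 9 + [100.0]
    print(flag_windows(scores, threshold=15.0))
    print(flag_windows(scores, threshold=50.0))
```

Dropping the most extreme marker means that a region is only flagged when several neighbouring markers deviate in the same direction, which makes a single badly genotyped marker less likely to produce a hit.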
Project description:Hardy-Weinberg Equilibrium (HWE) is used to estimate the number of homozygous and heterozygous carriers of a variant from the variant's allele frequency in populations that are not evolving. Deviations from HWE in large population databases have been used to detect genotyping errors, which can result in extreme heterozygote excess (HetExc). However, HetExc might also be a sign of natural selection, since recessive disease-causing variants should occur less frequently in a homozygous state in the population, but may reach high allele frequency in a heterozygous state, especially if they are advantageous. We developed a filtering strategy to detect these variants and applied it to genome data from 137,842 individuals. The main limitations of this approach were quality of genotype calls and insufficient population sizes, whereas population structure and inbreeding can reduce sensitivity, but not precision, in certain populations. Nevertheless, we identified 161 HetExc variants in 149 genes, most of which were specific to African/African American populations (~79.5%). Although the majority of them were not associated with known diseases, or were classified as clinically "benign," they were enriched in genes associated with autosomal recessive diseases. The resulting dataset also contained two known recessive disease-causing variants with evidence of heterozygote advantage in the sickle-cell anemia (HBB) and cystic fibrosis (CFTR) genes. Finally, we provide supporting in silico evidence of a novel heterozygote advantageous variant in the chromodomain helicase DNA binding protein 6 gene (CHD6; involved in influenza virus replication). We anticipate that our approach will aid the detection of rare recessive disease-causing variants in the future.