Project description:Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with complex traits, but these only explain a small proportion of the total heritability. It has been recently proposed that rare variants can create 'synthetic association' signals in GWAS, by occurring more often in association with one of the alleles of a common tag single nucleotide polymorphism. While the ultimate evaluation of this hypothesis will require the completion of large-scale sequencing studies, it is informative to place it in the broader context of what is known about the genetic architecture of complex disease. In this review, we draw from empirical and theoretical data to summarize evidence showing that synthetic associations do not underlie many reported GWAS associations.
Project description:Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often in association with one of the alleles at the common site versus the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow up of GWAS signals.
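To illustrate the mechanism described in the abstract above, the following is a minimal sketch and not the authors' simulation. It assumes a single rare causal allele that sits only on haplotypes carrying one allele of a common marker, Hardy-Weinberg genotype frequencies, and a multiplicative risk model. The function name and all parameter values (`p_common`, `p_rare`, `baseline`, `rel_risk`) are hypothetical choices made for illustration. It computes the exact expected allelic odds ratio at the common marker, which itself has no causal effect.

```python
from itertools import product


def synthetic_odds_ratio(p_common=0.3, p_rare=0.01, baseline=0.05, rel_risk=5.0):
    """Expected allelic odds ratio at a non-causal common marker.

    Assumption (illustrative): every copy of the rare causal allele lies on a
    haplotype that also carries the common marker allele "A".
    """
    if not 0.0 < p_rare <= p_common < 1.0:
        raise ValueError("need 0 < p_rare <= p_common < 1")

    # Haplotype -> frequency; key is (carries A, carries rare causal allele).
    haps = {
        (1, 1): p_rare,
        (1, 0): p_common - p_rare,
        (0, 0): 1.0 - p_common,
    }

    case_a = case_total = ctrl_a = ctrl_total = 0.0
    for (h1, f1), (h2, f2) in product(haps.items(), repeat=2):
        geno_freq = f1 * f2                  # Hardy-Weinberg pairing of haplotypes
        n_a = h1[0] + h2[0]                  # copies of marker allele A
        n_rare = h1[1] + h2[1]               # copies of the rare causal allele
        risk = min(1.0, baseline * rel_risk ** n_rare)  # multiplicative model
        case_a += geno_freq * risk * n_a
        case_total += geno_freq * risk * 2
        ctrl_a += geno_freq * (1.0 - risk) * n_a
        ctrl_total += geno_freq * (1.0 - risk) * 2

    freq_case = case_a / case_total          # frequency of A among case alleles
    freq_ctrl = ctrl_a / ctrl_total          # frequency of A among control alleles
    return (freq_case / (1.0 - freq_case)) / (freq_ctrl / (1.0 - freq_ctrl))


if __name__ == "__main__":
    print("odds ratio with a causal rare allele:", synthetic_odds_ratio())
    print("odds ratio with no rare-allele effect:", synthetic_odds_ratio(rel_risk=1.0))
```

Under these assumptions the marker shows an odds ratio above 1 only because the rare allele is unevenly distributed across its two alleles; setting `rel_risk=1.0` removes the signal.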
Project description:In a common pharmacogenomic scenario, outcome measures are compared for treated and untreated subjects across genotype-defined subgroups. The key question is whether treatment benefit (or harm) is particularly strong in certain subgroups, and therefore the statistical analysis focuses on the interaction between treatment and genotype. However, genome-wide analysis in such scenarios requires careful statistical thought as, in addition to the usual problems of multiple testing, the marker-defined sample sizes, and therefore power, vary across the individual genotypes being evaluated. The variability in power means that the usual practice of using a common P-value threshold across tests has difficulties. The reason is that the use of a fixed threshold, with variable power, implies that the costs of type I and type II errors vary across tests in a manner that is implicit rather than dictated by the analyst. In this paper we discuss this problem and describe an easily implementable solution based on Bayes factors. We pay particular attention to the specification of priors, which is not a straightforward task. The methods are illustrated using data from a randomized controlled clinical trial in which homocysteine levels are compared in individuals receiving low and high doses of folate supplements and across marker subgroups. The method we describe is implemented in the R computing environment with code available from http://faculty.washington.edu/jonno/cv.html.
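The point about fixed P-value thresholds and variable power can be sketched with a simple normal-normal Bayes factor. This is an illustrative assumption, not necessarily the prior specification or the R code of the paper: the estimated interaction effect is taken to be normally distributed with variance V (its squared standard error), and the true effect under the alternative has a N(0, W) prior. The Bayes factor for the null against the alternative is then the ratio of a N(0, V) density to a N(0, V + W) density at the estimate. The function and argument names are hypothetical.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Bayes factor for H0 (no effect) versus H1 (effect ~ N(0, prior_sd^2)).

    Values below 1 favour H1. Assumes beta_hat ~ N(beta, se^2).
    """
    v = se ** 2                  # sampling variance of the estimate
    w = prior_sd ** 2            # prior variance of the true effect under H1
    z_sq = beta_hat ** 2 / v     # squared Wald z statistic
    shrink = w / (v + w)
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z_sq * shrink)


if __name__ == "__main__":
    z = 5.0  # identical z statistic, hence identical P-value, in both settings
    for se in (0.05, 0.5):  # small se = large subgroup, large se = small subgroup
        bf = approx_bayes_factor(beta_hat=z * se, se=se, prior_sd=0.2)
        print(f"se={se}: Bayes factor (H0 vs H1) = {bf:.3g}")
```

With the same z statistic, the low-power test (large standard error) gives a Bayes factor much closer to 1 than the high-power test, which is one way to see why a single P-value threshold treats the two tests inconsistently.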
Project description:Studies using genome-wide platforms have yielded an unprecedented number of promising signals of association between genomic variants and human traits. This Review addresses the steps required to validate, augment and refine such signals to identify underlying causal variants for well-defined phenotypes. These steps include: large-scale exact replication across both similar and diverse populations; fine mapping and resequencing; determination of the most informative markers and multiple independent informative loci; incorporation of functional information; and improved phenotype mapping of the implicated genetic effects. Even in cases for which replication proves that an effect exists, confident localization of the causal variant often remains elusive.
Project description:Genome-wide association (GWA) studies have discovered multiple common genetic risk variants related to common diseases. It has been proposed that a number of these signals of common polymorphisms are based on synthetic associations that are generated by rare causative variants. We investigated if mutations in the low-density lipoprotein receptor (LDLR) gene causing familial hypercholesterolemia (FH, OMIM #143890) produce such signals. We genotyped 480,254 polymorphisms in 464 FH patients and in 5945 subjects from the general population. A total of 28 polymorphisms located up to 2.4 Mb from the LDLR gene were genome-wide significantly associated with FH (P < 10^-8). We replicated the 10 top signals in 2189 patients with a clinical diagnosis of FH and in 2157 subjects of a second sample of the general population (P < 0.000087). Our findings confirm that rare variants are able to cause synthetic genome-wide significant associations, and that they exert this effect at relatively large distances from the causal mutation.
Project description:Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
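The inflation of test statistics mentioned above is commonly summarized by the genomic control factor, the median observed 1-df chi-square statistic divided by its expected median under the null. The sketch below shows only that baseline correction, which the abstract uses as a comparator; it is not EMMAX and does not model pairwise relatedness. The function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution: the square of the 75th
# percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Genomic control lambda: observed median / expected null median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda; lambda below 1 is treated as 1."""
    lam = max(1.0, genomic_inflation(chi2_stats))
    return [x / lam for x in chi2_stats]
```

A lambda well above 1 suggests unmodeled sample structure. Dividing every statistic by one constant is a uniform correction, whereas a variance component model adjusts each test using the estimated relatedness.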
Project description:The P-value approach has been employed to prioritize genome-wide association (GWA) scan signals, with genome-wide significance defined by a prior P-value threshold, although this is not ideal. A rationale put forward is that the association signals should rather be expected to give less support for single nucleotide polymorphisms (SNPs) that are rare (with associated low-power tests) than for common SNPs with equivalent P-values, unless investigators believe, a priori, that rare causative variants contribute to the disease and have more pronounced effects. Using data from a GWA scan for type 2 diabetes (1924 cases, 2938 controls, 393 453 SNPs), we compared P-values with four alternative signal measures: likelihood ratio (LR), Bayes factor (BF; with a specified prior distribution for true effects), 'frequentist factor' (FF; reflecting the ratio between estimated (post-data) 'power' and P-value) and probability of pronounced effect size (PrPES). The 19 common SNPs [minor allele frequency (MAF) among the controls >29%] yielding strong P-value signals (P < 5 x 10^-7) were also top ranked by the other approaches. There was a strong similarity between the P-values, LR and BF signals, in terms of ranking SNPs. In contrast, FF and PrPES signals down-weighted rare SNPs (control MAF <10%) with low P-values. For prioritization of signals that do not achieve compelling levels of evidence for association, the main driving force behind observed differences between the various association signals appears to be SNP MAF. The statistical power afforded by follow-up samples for establishing replication should be taken into account when tailoring the signal selection strategy.
Project description:BACKGROUND: Genome-wide association studies have proved to be a powerful approach to identifying the genetic basis of different human diseases. We studied the relationship between seven diseases characterized in a previous genome-wide association study by the Wellcome Trust Case Control Consortium. Instead of doing a horizontal association of SNPs to diseases, we did a vertical analysis of disease associations by comparing the genetic similarities of diseases. Our analysis was carried out at four levels: the nucleotide level (SNPs), the gene level, the protein level (through the protein-protein interaction network), and the phenotype level. RESULTS: Our results show that Crohn's disease, rheumatoid arthritis, and type 1 diabetes share evidence of genetic associations at all levels of analysis, offering strong molecular support for the current grouping of the diseases. On the other hand, coronary artery disease, hypertension, and type 2 diabetes, despite being considered as a natural group with potential aetiological overlap, do not show any evidence of shared genetic basis at all levels. CONCLUSION: Our study is a first attempt at mining GWA data to examine genetic associations between different diseases. The positive result is apparently not a coincidence and hence demonstrates the promising use of our approach.
Project description:We have leveraged a Drosophila model relevant to Alzheimer disease (AD) for functional screening of findings from a genome-wide scan for loci associated with a quantitative measure of AD pathology in humans. In six of the 15 genomic regions evaluated, we successfully identified a causal gene for the association, on the basis of in vivo interactions with the neurotoxicity of Tau, which forms neurofibrillary tangles in AD. Among the top results, rs10845990 within SLC2A14, encoding a glucose transporter, showed evidence of replication for association with AD pathology, and gain and loss of function in glut1, the Drosophila ortholog, was associated with suppression and enhancement of Tau toxicity, respectively. Our strategy of coupling genome-wide association in humans with functional screening in a model organism is likely to be a powerful approach for gene discovery in AD and other complex genetic disorders.
Project description:With the establishment of large biobanks, discovery of single nucleotide variants (SNVs, also known as single nucleotide polymorphisms (SNPs)) associated with various phenotypes has accelerated. An open question is whether genome-wide significant SNVs identified in earlier genome-wide association studies (GWAS) are replicated in later GWAS conducted in biobanks. To address this, we examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, "discovery" GWAS and a later, "replication" GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNVs (of which 6289 reached P < 5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, although it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNV effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNV effect size, phenotype trait (binary or quantitative), and discovery P value, we built and validated a model that predicted SNV replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may reflect lack of power rather than genuine false positives, these results provide insights about which discovered associations are likely to be replicated across subsequent GWAS.
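The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen replicated SNV receives a higher predicted score than a randomly chosen non-replicated one. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration, not the authors' model or data; the function name and the toy labels and scores in the usage lines are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for replicated, 0 for not replicated.
    scores: predicted replication scores, higher meaning more likely.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_labels = [1, 1, 0, 0]          # made-up outcomes for illustration
    toy_scores = [0.9, 0.4, 0.5, 0.1]  # made-up model scores
    print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ordered correctly: 0.75
```

The pairwise form is quadratic in the number of SNVs, so a rank-based calculation would be preferable for a data set of the size described.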