Mining the Unknown: Assigning Function to Noncoding Single Nucleotide Polymorphisms.
ABSTRACT: One of the formative goals of genetics research is to understand how genetic variation leads to phenotypic differences and human disease. Genome-wide association studies (GWASs) bring us closer to this goal by linking variation with disease faster than ever before. Despite this, GWASs alone are unable to pinpoint disease-causing single nucleotide polymorphisms (SNPs). Noncoding SNPs, which represent the majority of GWAS SNPs, present a particular challenge. To address this challenge, an array of computational tools designed to prioritize and predict the function of noncoding GWAS SNPs have been developed. However, fewer than 40% of GWAS publications from 2015 utilized these tools. We discuss several leading methods for annotating noncoding variants and how they can be integrated into research pipelines in hopes that they will be broadly applied in future GWAS analyses.
Project description:Genome-wide association studies (GWASs) for many complex diseases, including inflammatory bowel disease (IBD), have produced hundreds of disease-associated loci, the majority of which are noncoding. The number of GWAS loci is increasing very rapidly, but the process of translating single nucleotide polymorphisms (SNPs) from these loci to genomic medicine is lagging. In this study, we investigated 4,734 variants from 152 IBD-associated GWAS loci (152 IBD-associated lead noncoding SNPs identified from pooled GWAS results + 4,582 variants in strong linkage disequilibrium (LD) (r² ≥ 0.8) in the EUR population of the 1K Genomes Project) using four publicly available bioinformatics tools, namely dbPSHP, CADD, GWAVA, and RegulomeDB, to annotate and prioritize putative regulatory variants. Of the 152 lead noncoding SNPs, around 11% are under strong negative selection (GERP++ RS ≥ 2), and ~30% are under balancing selection (Tajima's D score > 2) in the CEU population (1K Genomes Project), though these regions are positively selected (GERP++ RS < 0) in mammalian evolution. The analysis of 4,734 variants using three integrative annotation tools produced 929 putative functional SNPs, of which 18 SNPs (from 15 GWAS loci) are in concordance with all three classifiers. These prioritized noncoding SNPs may contribute to IBD pathogenesis by dysregulating the expression of nearby genes. This study showed the usefulness of integrative annotation for prioritizing a smaller set of functional variants from a large number of GWAS markers.
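The concordance step in this abstract (keeping only the variants that every integrative classifier flags as putatively functional) amounts to a set intersection. The following is a minimal sketch in Python, assuming each tool's calls have already been reduced to a set of variant IDs. The function names, the input layout, and the placeholder IDs are illustrative assumptions and do not come from the study, and no tool-specific score thresholds are implied.

```python
def concordant_variants(calls_by_tool):
    """Return the variants flagged as putatively functional by every tool.

    calls_by_tool maps a tool name to an iterable of variant IDs that the
    tool flagged. This layout is an assumption made for the sketch.
    """
    call_sets = [set(ids) for ids in calls_by_tool.values()]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


def flagged_by_any(calls_by_tool):
    """Return the variants flagged by at least one tool (the union)."""
    flagged = set()
    for ids in calls_by_tool.values():
        flagged.update(ids)
    return flagged


# Placeholder IDs for illustration only; these are not real results.
example_calls = {
    "CADD": {"var1", "var2", "var3"},
    "GWAVA": {"var2", "var3", "var4"},
    "RegulomeDB": {"var3", "var5"},
}
```

In this toy input, the union plays the role of the larger pool of putative functional SNPs and the intersection plays the role of the concordant subset; whether the study computed its counts in exactly this way is not stated in the abstract.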
Project description:Genome-wide association studies (GWASs) have enabled the discovery of common genetic variation contributing to normal and pathological traits and clinical drug responses, but recognizing the precise targets of these associations is now the major challenge. Here, we review recent approaches to the functional follow-up of GWAS loci, including fine mapping of GWAS signal(s), prioritization of putative functional SNPs by the integration of genetic epidemiological and bioinformatic methods, and in vitro and in vivo experimental verification of predicted molecular mechanisms for identifying the targeted genes. The majority of GWAS-identified variants fall in noncoding regions of the genome. Therefore, this review focuses on strategies for assessing likely mechanisms affected by noncoding variants; such mechanisms include transcriptional regulation, noncoding RNA function, and epigenetic regulation. These approaches have already accelerated progress from genetic studies to biological knowledge and might ultimately guide the development of prognostic, preventive, and therapeutic measures.
Project description:Human genome-wide association studies (GWASs) have identified numerous associations between single nucleotide polymorphisms (SNPs) and pulmonary function. Proving that there is a causal relationship between GWAS SNPs, many of which are noncoding and without known functional impact, and these traits has been elusive. Furthermore, noncoding GWAS-identified SNPs may exert trans-regulatory effects rather than impact the proximal gene. Noncoding variants in 5-hydroxytryptamine (serotonin) receptor 4 (HTR4) are associated with pulmonary function in human GWASs. To gain insight into whether this association is causal, we tested whether Htr4-null mice have altered pulmonary function. We found that HTR4-deficient mice have 12% higher baseline lung resistance and also increased methacholine-induced airway hyperresponsiveness (AHR) as measured by lung resistance (27%), tissue resistance (48%), and tissue elastance (30%). Furthermore, Htr4-null mice were more sensitive to serotonin-induced AHR. In models of exposure to bacterial lipopolysaccharide, bleomycin, and allergic airway inflammation induced by house dust mites, pulmonary function and cytokine profiles in Htr4-null mice differed little from their wild-type controls. The findings of altered baseline lung function and increased AHR in Htr4-null mice support a causal relationship between genetic variation in HTR4 and pulmonary function identified in human GWAS.
Project description:Functionally annotating genetic variations is an essential yet challenging topic in human genetics research. As large consortia including ENCODE and Roadmap Epigenomics Project continue to generate high-throughput transcriptomic and epigenomic data, many computational frameworks have been developed to integrate these experimental data to predict functionality of genetic variations in both protein-coding and noncoding regions. Here, we compare a number of recently developed annotation frameworks for noncoding regions through enrichment analysis on genome-wide association studies (GWASs). We also compare several different strategies to quantify enrichment using GWAS summary statistics. Our analyses highlight the importance of jointly modeling context-specific annotations with genome-wide data in providing statistically powerful and biologically interpretable enrichment for complex disease associations. Our findings provide insights into when and how computational genome annotations may benefit future complex disease studies on the genome-wide scale.
Project description:Primary hypertension is widely believed to be a complex polygenic disorder whose manifestation is influenced by interactions between genomic and environmental factors, which makes identification of susceptibility genes a major challenge. With major advances in high-throughput genotyping technology, the genome-wide association study (GWAS) has become a powerful tool for researchers studying genetically complex diseases. GWASs work by revealing links between DNA sequence variation and a disease or trait of biomedical importance. The human genome is a very long DNA sequence that consists of billions of nucleotides arranged in a unique way. A single base-pair change in the DNA sequence is known as a single nucleotide polymorphism (SNP). With modern genotyping techniques such as chip-based genotyping arrays, thousands of SNPs can be genotyped easily. Large-scale GWASs, in which more than half a million common SNPs are genotyped and analyzed for disease association in hundreds of thousands of cases and controls, have been broadly successful in identifying SNPs associated with heart diseases, diabetes, autoimmune diseases, and psychiatric disorders. It is, however, still debatable whether GWAS is the best approach for hypertension. The following is a brief overview of the outcomes of a decade of GWASs on primary hypertension.
Project description:The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations and describing function can be a challenge. GWAS is a phenotype-to-genotype approach. It is now possible to conduct a converse genotype-to-phenotype approach using extensive electronic medical records to define a phenome. This approach associates a single genetic variant with many phenotypes across the phenome and is called a phenome-wide association study (PheWAS). The majority of PheWASs conducted have focused on variants identified previously by GWASs. This approach has been efficient for rediscovering gene-disease associations while also identifying pleiotropic effects for some single-nucleotide polymorphisms (SNPs). However, the use of SNPs identified by GWAS in a PheWAS is limited by the inherent properties of the GWAS SNPs, including weak effect sizes and difficulty when translating discoveries to function. To address these challenges, we conducted a PheWAS on 105 presumed functional stop-gain and stop-loss variants genotyped on 4,235 Marshfield Clinic patients. Associations were validated on an additional 10,640 Marshfield Clinic patients. PheWAS results indicate that a nonsense variant in ARMS2 (rs2736911) is associated with age-related macular degeneration (AMD). These results demonstrate that focusing on functional variants may be an effective approach when conducting a PheWAS.
Project description:Genome-wide association studies (GWASs) have revealed 59 genomic loci associated with type 1 diabetes (T1D). Functional interpretation of the SNPs located in the noncoding region of these loci remains challenging. We perform epigenomic profiling of two enhancer marks, H3K4me1 and H3K27ac, using primary TH1 and TREG cells isolated from healthy and T1D subjects. We uncover a large number of deregulated enhancers and altered transcriptional circuitries in both cell types of T1D patients. We identify four SNPs (rs10772119, rs10772120, rs3176792, rs883868) in linkage disequilibrium (LD) with T1D-associated GWAS lead SNPs that alter enhancer activity and expression of immune genes. Among them, rs10772119 and rs883868 disrupt the binding of retinoic acid receptor α (RARA) and Yin and Yang 1 (YY1), respectively. Loss of binding by YY1 also results in the loss of long-range enhancer-promoter interaction. These findings provide insights into how noncoding variants affect the transcriptomes of two T-cell subtypes that play critical roles in T1D pathogenesis.
Project description:BACKGROUND: Over the relatively short history of genome-wide association studies (GWASs), hundreds of GWASs have been published and thousands of disease risk-associated SNPs have been identified. Summary statistics from the conducted GWASs are often available and can be used to identify SNP features associated with the level of GWAS statistical significance. Those features could be used to select SNPs from gray zones (SNPs that are nominally significant but do not reach the genome-wide level of significance) for targeted analyses. METHODS: We used summary statistics from recently published breast and lung cancer and scleroderma GWASs to explore the association between the level of GWAS statistical significance and the expression quantitative trait loci (eQTL) status of the SNP. Data from the Genotype-Tissue Expression Project (GTEx) were used to identify eQTL SNPs. RESULTS: We found that SNPs reported as eQTLs were more significant in GWAS (higher -log10(p)) regardless of the tissue specificity of the eQTL. Pan-tissue eQTLs (those reported as eQTLs in multiple tissues) tended to be more significant in the GWAS than those reported as eQTLs in only one tissue type. eQTL density in the ±5 kb region adjacent to a given SNP was also positively associated with the level of GWAS statistical significance, regardless of the eQTL status of the SNP. We found that SNPs located in regions of high eQTL density were more likely to be located in regulatory elements (transcription factor or miRNA binding sites). When SNPs were stratified by the level of statistical significance, the proportion of eQTLs was positively associated with the mean level of statistical significance in the group. The association curve reaches a plateau around -log10(p) ≈ 5. The observed associations suggest that quasi-significant SNPs (5 × 10⁻⁸ < p < 10⁻⁵) and SNPs at the genome-wide level of statistical significance (p < 5 × 10⁻⁸) may have similar proportions of risk-associated SNPs. CONCLUSIONS: The results of this study indicate that the SNP's eQTL status, as well as eQTL density in the adjacent region, are positively associated with the level of statistical significance of the SNP in GWAS.
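The quantities used in this abstract (the -log10 transform of the GWAS p-value, the genome-wide and quasi-significant strata, and the eQTL density within ±5 kb of a SNP) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, positions are assumed to be base-pair coordinates on a single chromosome, and the density count includes the focal SNP when it is itself an eQTL, a detail the abstract does not specify.

```python
import bisect
import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold named in the abstract
QUASI_P = 1e-5        # upper bound assumed here for the quasi-significant gray zone


def neg_log10_p(p):
    """Return -log10(p), the significance scale used in the abstract."""
    return -math.log10(p)


def significance_class(p):
    """Assign a p-value to one of the strata discussed in the abstract."""
    if p < GENOME_WIDE_P:
        return "genome-wide"
    if p < QUASI_P:
        return "quasi-significant"
    return "other"


def eqtl_density(snp_pos, sorted_eqtl_positions, window=5000):
    """Count eQTLs within +/- window bp of snp_pos (boundaries included).

    sorted_eqtl_positions must be sorted in ascending order and hold
    positions from the same chromosome as the SNP (an assumption).
    """
    lo = bisect.bisect_left(sorted_eqtl_positions, snp_pos - window)
    hi = bisect.bisect_right(sorted_eqtl_positions, snp_pos + window)
    return hi - lo
```

Binary search on a sorted position list keeps each window lookup logarithmic in the number of eQTLs, which matters when the density is computed for every SNP in a set of summary statistics.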
Project description:RATIONALE: As the third leading cause of death in the United States, the impact of chronic obstructive pulmonary disease (COPD) makes identification of its molecular mechanisms of great importance. Genome-wide association studies (GWASs) have identified multiple genomic regions associated with COPD. However, genetic variation only explains a small fraction of the susceptibility to COPD, and sub-genome-wide significant loci may play a role in pathogenesis. OBJECTIVES: Regulatory annotation with epigenetic evidence may give priority for further investigation, particularly for GWAS associations in noncoding regions. We performed integrative genomics analyses using DNA methylation profiling and genome-wide SNP genotyping from lung tissue samples from 90 subjects with COPD and 36 control subjects. METHODS: We performed methylation quantitative trait loci (mQTL) analyses, testing for SNPs associated with percent DNA methylation and assessed the colocalization of these results with previous COPD GWAS findings using Bayesian methods in the R package coloc to highlight potential regulatory features of the loci. MEASUREMENTS AND MAIN RESULTS: We identified 942,068 unique SNPs and 33,996 unique CpG sites among the significant (5% false discovery rate) cis-mQTL results. The genome-wide significant and subthreshold (P < 10⁻⁴) GWAS SNPs were enriched in the significant mQTL SNPs (hypergeometric test P < 0.00001). We observed enrichment for sites located in CpG shores and shelves, but not CpG islands. Using Bayesian colocalization, we identified loci in regions near KCNK3, EEFSEC, PIK3CD, DCDC2C, TCERG1L, FRMD4B, and IL27. CONCLUSIONS: Colocalization of mQTL and GWAS loci provides regulatory characterization of significant and subthreshold GWAS findings, supporting a role for genetic control of methylation in COPD pathogenesis.
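The enrichment of GWAS SNPs among significant mQTL SNPs is reported with a hypergeometric test. A minimal sketch of a one-sided (upper-tail) hypergeometric p-value in Python is given below. The function and parameter names are hypothetical, the choice of the upper tail is an assumption, and the mapping of the study's SNP sets onto the four arguments is illustrative rather than taken from the paper.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric random variable X.

    population: total number of SNPs tested (assumed mapping)
    successes:  number of SNPs that are significant mQTL SNPs
    draws:      number of GWAS SNPs being checked for enrichment
    observed:   number of GWAS SNPs that are also mQTL SNPs
    """
    max_hits = min(successes, draws)
    total = comb(population, draws)
    # Sum exact integer counts first, then divide once to limit rounding error.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(observed, 0), max_hits + 1)
    )
    return tail / total
```

For example, with 10 SNPs in total, 4 of them mQTL SNPs, and 3 GWAS SNPs of which 2 overlap, `hypergeom_upper_tail(10, 4, 3, 2)` evaluates (6·6 + 4·1) / 120 = 1/3. Exact integer arithmetic is convenient for a sketch; at genome scale a statistics library would usually be preferred.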