Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases.
ABSTRACT: Genome wide association studies (GWAS) provide a powerful approach for uncovering disease-associated variants in human, but fine-mapping the causal variants remains a challenge. This is partly remedied by prioritization of disease-associated variants that overlap GWAS-enriched epigenomic annotations. Here, we introduce a new Bayesian model RiVIERA (Risk Variant Inference using Epigenomic Reference Annotations) for inference of driver variants from summary statistics across multiple traits using hundreds of epigenomic annotations. In simulation, RiVIERA promising power in detecting causal variants and causal annotations, the multi-trait joint inference further improved the detection power. We applied RiVIERA to model the existing GWAS summary statistics of 9 autoimmune diseases and Schizophrenia by jointly harnessing the potential causal enrichments among 848 tissue-specific epigenomics annotations from ENCODE/Roadmap consortium covering 127 cell/tissue types and 8 major epigenomic marks. RiVIERA identified meaningful tissue-specific enrichments for enhancer regions defined by H3K4me1 and H3K27ac for Blood T-Cell specifically in the nine autoimmune diseases and Brain-specific enhancer activities exclusively in Schizophrenia. Moreover, the variants from the 95% credible sets exhibited high conservation and enrichments for GTEx whole-blood eQTLs located within transcription-factor-binding-sites and DNA-hypersensitive-sites. Furthermore, joint modeling the nine immune traits by simultaneously inferring and exploiting the underlying epigenomic correlation between traits further improved the functional enrichments compared to single-trait models.
Project description:The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, affecting regulatory elements including transcriptional enhancers. However, characterizing their effects requires the integration of GWAS results with context-specific regulatory activity and linkage disequilibrium annotations to identify causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method which integrates hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci with GWAS summary statistics. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS studies. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open source pipeline and a web server.
Project description:Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete<sup>1-3</sup>. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits<sup>4</sup>, predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits.
Project description:Widespread linkage disequilibrium and incomplete annotation of cell-to-cell state variation represent substantial challenges to elucidating mechanisms of trait-associated genetic variation. Here we perform genetic fine-mapping for blood cell traits in the UK Biobank to identify putative causal variants. These variants are enriched in genes encoding proteins in trait-relevant biological pathways and in accessible chromatin of hematopoietic progenitors. For regulatory variants, we explore patterns of developmental enhancer activity, predict molecular mechanisms, and identify likely target genes. In several instances, we localize multiple independent variants to the same regulatory element or gene. We further observe that variants with pleiotropic effects preferentially act in common progenitor populations to direct the production of distinct lineages. Finally, we leverage fine-mapped variants in conjunction with continuous epigenomic annotations to identify trait-cell type enrichments within closely related populations and in single cells. Our study provides a comprehensive framework for single-variant and single-cell analyses of genetic associations.
Project description:Assessing the causal tissues of human complex diseases is important for the prioritization of trait-associated genetic variants. Yet, the biological underpinnings of trait-associated variants are extremely difficult to infer due to statistical noise in genome-wide association studies (GWAS), and because >90% of genetic variants from GWAS are located in non-coding regions. Here, we collected the largest human epigenomic map from ENCODE and Roadmap consortia and implemented a deep-learning-based convolutional neural network (CNN) model to predict the regulatory roles of genetic variants across a comprehensive list of epigenomic modifications. Our model, called DeepFun, was built on DNA accessibility maps, histone modification marks, and transcription factors. DeepFun can systematically assess the impact of non-coding variants in the most functional elements with tissue or cell-type specificity, even for rare variants or de novo mutations. By applying this model, we prioritized trait-associated loci for 51 publicly-available GWAS studies. We demonstrated that CNN-based analyses on dense and high-resolution epigenomic annotations can refine important GWAS associations in order to identify regulatory loci from background signals, which yield novel insights for better understanding the molecular basis of human complex disease. We anticipate our approaches will become routine in GWAS downstream analysis and non-coding variant evaluation.
Project description:Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.
Project description:Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
Project description:More than 90% of common variants associated with complex traits do not affect proteins directly, but instead the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms.
Project description:MicroRNAs (miRNA) are key modulators of gene expression and so act as putative fine-tuners of complex phenotypes. Here, we hypothesized that causal variants of complex traits are enriched in miRNAs and miRNA-target networks. First, we conducted a genome-wide association study (GWAS) for seven functional and milk production traits using imputed sequence variants (13~15?million) and >10,000 animals from three dairy cattle breeds, i.e., Holstein (HOL), Nordic red cattle (RDC) and Jersey (JER). Second, we analyzed for enrichments of association signals in miRNAs and their miRNA-target networks. Our results demonstrated that genomic regions harboring miRNA genes were significantly (P?<?0.05) enriched with GWAS signals for milk production traits and mastitis, and that enrichments within miRNA-target gene networks were significantly higher than in random gene-sets for the majority of traits. Furthermore, most between-trait and across-breed correlations of enrichments with miRNA-target networks were significantly greater than with random gene-sets, suggesting pleiotropic effects of miRNAs. Intriguingly, genes that were differentially expressed in response to mammary gland infections were significantly enriched in the miRNA-target networks associated with mastitis. All these findings were consistent across three breeds. Collectively, our observations demonstrate the importance of miRNAs and their targets for the expression of complex traits.
Project description:The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.
Project description:The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has been recently facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg.