iFunMed: Integrative functional mediation analysis of GWAS and eQTL studies.
ABSTRACT: Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify the genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data provides unique opportunities to incorporate prior information into the analysis of GWAS and better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into the prioritization of trait-associated variants in GWAS, functional annotation data have played only a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data for assessing evidence of association. To address this, we develop a novel mediation framework, iFunMed, that integrates GWAS and eQTL data using publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and by leveraging variant-level summary statistics. Data-driven computational experiments show how informative annotations improve single-nucleotide polymorphism (SNP) selection performance and demonstrate that iFunMed is robust to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed boosts the detection of SNPs whose mediation effects can be attributed to regulatory mechanisms.
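The mediation idea that iFunMed extends can be illustrated with the standard single-SNP product-of-coefficients decomposition: a SNP G affects expression E with effect a, expression affects the phenotype Y with effect b, and the SNP may also act on Y directly. The mediated (indirect) effect is a*b. The sketch below is only an illustration under stated assumptions, not iFunMed itself: it uses one SNP, ordinary least squares and simulated data with hypothetical effect sizes, whereas iFunMed handles multiple SNPs, summary statistics and annotation-informed priors.

```python
import random


def simulate(n, maf, a, b, direct, seed=1):
    """Simulate G (0/1/2), E = a*G + noise and Y = direct*G + b*E + noise (hypothetical model)."""
    rng = random.Random(seed)
    g, e, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < maf for _ in range(2))  # additive genotype coding
        ei = a * gi + rng.gauss(0.0, 1.0)
        yi = direct * gi + b * ei + rng.gauss(0.0, 1.0)
        g.append(gi)
        e.append(ei)
        y.append(yi)
    return g, e, y


def centered_cross(x, z):
    """Sum of cross-products of mean-centered x and z."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))


def mediation_effects(g, e, y):
    """Product-of-coefficients mediation; returns (a_hat, b_hat, direct_hat, indirect_hat)."""
    sgg, sge, see = centered_cross(g, g), centered_cross(g, e), centered_cross(e, e)
    sgy, sey = centered_cross(g, y), centered_cross(e, y)
    a_hat = sge / sgg                      # OLS slope of E ~ G
    det = sgg * see - sge * sge            # 2x2 normal equations for Y ~ G + E
    direct_hat = (see * sgy - sge * sey) / det
    b_hat = (sgg * sey - sge * sgy) / det
    return a_hat, b_hat, direct_hat, a_hat * b_hat


if __name__ == "__main__":
    g, e, y = simulate(n=20000, maf=0.3, a=0.5, b=0.4, direct=0.1)
    a_hat, b_hat, direct_hat, indirect_hat = mediation_effects(g, e, y)
    print(f"indirect={indirect_hat:.3f} direct={direct_hat:.3f}")
```

With these simulated settings the true indirect effect is 0.5 × 0.4 = 0.2 and the true direct effect is 0.1, so the estimates can be checked against known values.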
Project description: Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early applications of gene-based tests often focused on rare coding variants; a more recent wave of gene-based methods, such as TWAS, uses eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool that integrates heterogeneous annotations with GWAS summary statistics for gene-based analysis, and we apply it with comprehensive coding and tissue-specific regulatory annotations. We compare the power and accuracy in identifying causal genes of single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and in an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and improves the identification of causal genes.
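To make the gene-based idea concrete, here is a minimal sketch of an annotation-weighted burden test computed from per-variant GWAS z-scores and an LD (correlation) matrix, with one test per annotation and an omnibus p-value across annotations. The weighting scheme, the function names and the use of an equal-weight Cauchy (ACAT-style) combination for the omnibus step are assumptions made for illustration; they are not necessarily how the authors' tool constructs its tests.

```python
import math


def burden_z(z, weights, ld):
    """Weighted burden statistic sum_j w_j z_j / sqrt(w' R w); N(0, 1) under the null."""
    m = len(z)
    num = sum(w * zj for w, zj in zip(weights, z))
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return num / math.sqrt(var)


def two_sided_p(t):
    """Two-sided standard normal p-value."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def cauchy_combine(pvalues):
    """Equal-weight Cauchy (ACAT-style) combination of possibly correlated p-values."""
    stat = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(stat) / math.pi


def omnibus_gene_test(z, ld, annotation_weights):
    """One burden test per annotation, then an omnibus p-value across annotations."""
    per_annotation = {name: two_sided_p(burden_z(z, w, ld))
                      for name, w in annotation_weights.items()}
    return per_annotation, cauchy_combine(list(per_annotation.values()))


if __name__ == "__main__":
    z = [3.0, 2.5, 0.2, -0.1]                          # hypothetical per-variant z-scores
    ld = [[1.0, 0.3, 0.0, 0.0], [0.3, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.2], [0.0, 0.0, 0.2, 1.0]]  # hypothetical LD matrix
    weights = {"coding": [1, 1, 0, 0], "regulatory": [0, 0, 1, 1], "agnostic": [1, 1, 1, 1]}
    print(omnibus_gene_test(z, ld, weights))
```

The annotation-agnostic test corresponds to equal weights on all variants; the omnibus p-value stays small whenever at least one annotation-specific test captures the signal.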
Project description: Functional annotations have been shown to improve both discovery power and fine-mapping accuracy in genome-wide association studies. However, the optimal strategy for incorporating the large number of existing annotations is still unclear. In this study, we propose a Bayesian framework to incorporate functional annotations systematically. We compute the maximum a posteriori solution and use cross-validation to find the optimal penalty parameters. Because we extend our previous fine-mapping method CAVIARBF into this framework, only summary statistics are required as input. We also derive an exact calculation of Bayes factors from summary statistics for quantitative traits, which is necessary when a large proportion of trait variance is explained by the variants of interest, as in fine-mapping expression quantitative trait loci (eQTL). We compared the proposed method with PAINTOR using different strategies to combine annotations. Simulation results show that the proposed method achieves the best accuracy in identifying causal variants among the strategies and methods compared. We also find that, for annotations with moderate effects drawn from a large annotation pool, screening annotations individually and then combining the top annotations can produce overly optimistic results. We applied these methods to two real data sets: a meta-analysis of lipid traits and a cis-eQTL study of normal prostate tissues. For the eQTL data, incorporating annotations significantly increased the number of potential causal variants with high probabilities.
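The general mechanism of annotation-informed fine-mapping (annotation-dependent prior causal probabilities combined with Bayes factors from summary statistics) can be sketched as follows. This is a deliberate simplification: it uses Wakefield's approximate Bayes factor from a single variant's effect estimate and standard error (not the exact Bayes factor derived in the study, nor CAVIARBF's multi-variant model), assumes exactly one causal variant per locus, and takes the annotation coefficients as given rather than fitting them by penalized MAP with cross-validation. The prior effect standard deviation of 0.2 is an arbitrary assumption.

```python
import math


def log_abf(beta_hat, se, prior_sd=0.2):
    """Wakefield approximate log Bayes factor (association vs. no association)."""
    v, w = se * se, prior_sd * prior_sd
    z2 = (beta_hat / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)


def annotation_log_odds(annotations, coefs, intercept):
    """Prior log-odds that a variant is causal: intercept + sum_k coef_k * annotation_k."""
    return intercept + sum(c * a for c, a in zip(coefs, annotations))


def single_causal_pips(variants, coefs, intercept):
    """Posterior inclusion probabilities assuming exactly one causal variant per locus.

    Under that assumption P(variant j is causal | data) is proportional to
    prior_odds_j * BF_j, so the intercept cancels; it is kept for readability.
    """
    scores = [annotation_log_odds(v["annot"], coefs, intercept) + log_abf(v["beta"], v["se"])
              for v in variants]
    top = max(scores)                                   # log-sum-exp for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two hypothetical variants with identical summary statistics; only the first is annotated.
    variants = [{"beta": 0.1, "se": 0.02, "annot": [1]},
                {"beta": 0.1, "se": 0.02, "annot": [0]}]
    print(single_causal_pips(variants, coefs=[1.0], intercept=-3.0))
```

When the summary statistics cannot distinguish two variants, the annotation alone breaks the tie: with a log-odds coefficient of 1.0, the annotated variant receives a PIP of e/(1+e), about 0.73.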
Project description: Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are neutral variants not associated with the trait. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, may help distinguish causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. Our ability to select informative annotations a priori is limited, and incorporating non-informative annotations introduces noise and reduces power. We propose FunSPU, a versatile test that incorporates multiple biological annotations and is adaptive at both the annotation and variant levels, and thus maintains high power even in the presence of non-informative annotations. In addition to extensive simulations, we illustrate the proposed test using the TWINSUK cohort (n = 1,752) of UK10K WGS data with six functional annotations: CADD, RegulomeDB, FunSeq, Funseq2, GERP++, and GenoSkyline. We identified genome-wide significant genetic loci on chromosome 19 near the genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which were replicated in the UK10K ALSPAC cohort (n = 1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in an R package, "FunSPU".
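FunSPU builds on the sum of powered score (SPU) family of tests. A rough NumPy sketch of an annotation-weighted SPU statistic and its adaptive (aSPU, minimum p-value over gamma) version follows, assuming a vector of per-variant score statistics U with known null covariance V and a single vector of annotation weights w; the null distribution is simulated from N(0, V). This illustrates the SPU/aSPU idea only; the FunSPU R package combines multiple annotations and computes p-values in its own way, and the function names here are hypothetical.

```python
import numpy as np


def spu_stats(u, w, gammas):
    """|SPU(gamma)| = |sum_j (w_j U_j)^gamma| per gamma; gamma = inf gives max_j |w_j U_j|.

    Accepts one score vector (1-D) or a batch of them (2-D, one per row).
    """
    wu = np.atleast_2d(u) * w
    cols = [np.max(np.abs(wu), axis=1) if np.isinf(g) else np.sum(wu ** g, axis=1)
            for g in gammas]
    return np.abs(np.column_stack(cols))


def aspu_test(u, cov, w, gammas=(1, 2, 4, 8, np.inf), n_sim=5000, seed=0):
    """Monte Carlo SPU p-values per gamma and the adaptive (minimum-p) aSPU p-value."""
    rng = np.random.default_rng(seed)
    null_u = rng.multivariate_normal(np.zeros(len(u)), cov, size=n_sim)
    obs = spu_stats(u, w, gammas)[0]                # shape (n_gamma,)
    null = spu_stats(null_u, w, gammas)             # shape (n_sim, n_gamma)
    p_obs = (1 + np.sum(null >= obs, axis=0)) / (n_sim + 1)
    # Null p-value of each simulated draw from its rank within its column (0 = largest).
    ranks = np.argsort(np.argsort(-null, axis=0), axis=0)
    p_null = (ranks + 1) / n_sim
    t_obs = p_obs.min()
    p_aspu = (1 + np.sum(p_null.min(axis=1) <= t_obs)) / (n_sim + 1)
    return dict(zip(gammas, p_obs)), p_aspu


if __name__ == "__main__":
    u = np.array([2.5, 2.0, 0.3, -0.4, 0.1])    # hypothetical score statistics
    w = np.array([1.0, 1.0, 0.2, 0.2, 0.2])     # hypothetical annotation weights
    print(aspu_test(u, np.eye(5), w))
```

Small gamma values favor many weak effects in the same direction, large gamma values favor a few strong effects; taking the minimum p-value over gamma and calibrating it against the same null draws is what makes the test adaptive at the variant level.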
Project description: Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship in which gene expression affects a trait. Evidence for mediation, i.e., that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, loci containing multiple causal variants. MRLocus applies a colocalization step to each nearly LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method estimates the extent of allelic heterogeneity through a dispersion parameter, which indicates variable mediation effects of the individual eQTLs on the downstream trait. We evaluate our method against other state-of-the-art methods for estimating the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty, as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where the estimated gene-to-trait effects are concordant with those previously reported. We find that MRLocus's estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or of individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.
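As a point of reference for MRLocus's Bayesian approach, the simplest way to combine per-eQTL evidence into a single gene-to-trait effect is the fixed-effect inverse-variance weighted (IVW) estimator, beta_hat = sum_j (bx_j * by_j / se_j^2) / sum_j (bx_j^2 / se_j^2), where bx_j and by_j are the eQTL and GWAS effects of eQTL j and se_j is its GWAS standard error. Cochran's Q then measures how much the per-eQTL ratios disagree, loosely analogous to the heterogeneity that MRLocus captures with its dispersion parameter. This sketch assumes independent eQTLs and ignores uncertainty in bx_j; it is a baseline, not MRLocus.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the gene-to-trait effect, its SE, and Cochran's Q."""
    w = [1.0 / (s * s) for s in se_outcome]
    sxx = sum(wi * bx * bx for wi, bx in zip(w, beta_exposure))
    sxy = sum(wi * bx * by for wi, bx, by in zip(w, beta_exposure, beta_outcome))
    est = sxy / sxx
    se = 1.0 / math.sqrt(sxx)
    # Q is large when individual eQTLs imply different gene-to-trait effects.
    q = sum(wi * (by - est * bx) ** 2 for wi, bx, by in zip(w, beta_exposure, beta_outcome))
    return est, se, q


if __name__ == "__main__":
    bx = [0.5, 0.3, -0.4]                 # hypothetical eQTL effects on expression
    by = [0.16, 0.08, -0.13]              # hypothetical GWAS effects on the trait
    print(ivw_estimate(bx, by, [0.02, 0.03, 0.025]))
```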
Project description: A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and linkage disequilibrium spreads association signals statistically across multiple variants. Post-GWAS annotation facilitates the selection of the most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet using them can be time-consuming, and they do not provide integrated visual aids for data interpretation. We therefore developed FUMA: an integrative web-based platform that uses information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.
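Positional mapping, one of the gene-mapping strategies FUMA accommodates, assigns a SNP to the genes that lie within a fixed distance of it. A minimal sketch, assuming simple gene-body coordinates and a hypothetical 10 kb window (FUMA's actual defaults and options may differ); the gene names, SNP identifiers and coordinates in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def positional_map(snps, genes, window_bp=10_000):
    """Map each (snp_id, chrom, pos) to genes whose body, extended by window_bp on each side, contains it."""
    mapping = {}
    for snp_id, chrom, pos in snps:
        mapping[snp_id] = [g.name for g in genes
                           if g.chrom == chrom and g.start - window_bp <= pos <= g.end + window_bp]
    return mapping


if __name__ == "__main__":
    genes = [Gene("GENE_A", "1", 100_000, 120_000), Gene("GENE_B", "1", 125_000, 140_000)]
    print(positional_map([("snp1", "1", 122_000), ("snp2", "1", 200_000)], genes))
```

A SNP between two nearby genes maps to both, which is one reason positional mapping is usually complemented by eQTL and chromatin interaction mapping.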
Project description: With the increasing availability of functional genomic data, incorporating genomic annotations into genetic association analysis has become a standard procedure. However, existing methods often lack rigor and/or computational efficiency and consequently do not maximize the utility of functional annotations. In this paper, we propose a rigorous inference procedure to perform integrative association analysis incorporating genomic annotations for both traditional GWASs and emerging molecular QTL mapping studies. In particular, we propose an algorithm, named deterministic approximation of posteriors (DAP), which enables highly efficient and accurate joint enrichment analysis and identification of multiple causal variants. We use a series of simulation studies to highlight the power and computational efficiency of our approach, and further demonstrate it by analyzing the cross-population eQTL data from the GEUVADIS project and the multi-tissue eQTL data from the GTEx project. We find that genetic variants predicted to disrupt transcription factor binding sites are enriched in cis-eQTLs across all tissues. Moreover, the enrichment estimates obtained across tissues are correlated with the cell types from which the annotations are derived.
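The enrichment-estimation side of integrative analyses such as DAP can be illustrated with a much simpler model: a logistic prior on causality, logit(pi_j) = a0 + a1 * d_j, for a single binary annotation d_j, fitted by EM from per-variant log Bayes factors while assuming at most one causal variant per locus. The coefficient a1 is then the log odds enrichment of the annotation among causal variants. This is a hypothetical simplification for illustration; DAP itself explores multi-variant configurations with a deterministic approximation, which is not reproduced here.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def locus_pips(log_bfs, annots, a0, a1):
    """PIPs in one locus under an at-most-one-causal-variant model with prior logit a0 + a1*d."""
    odds = [math.exp(a0 + a1 * d + lbf) for lbf, d in zip(log_bfs, annots)]  # prior odds * BF
    denom = 1.0 + sum(odds)          # 1.0 is the 'no causal variant' configuration
    return [o / denom for o in odds]


def em_enrichment(loci, a0=-5.0, a1=0.0, n_iter=200):
    """loci: list of (log_bfs, annots) with annots in {0, 1}. Returns fitted (a0, a1)."""
    for _ in range(n_iter):
        sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
        for log_bfs, annots in loci:                       # E-step: expected causal indicators
            for pip, d in zip(locus_pips(log_bfs, annots, a0, a1), annots):
                sums[d] += pip
                counts[d] += 1
        # M-step: with one binary annotation, the weighted logistic fit reduces to group means.
        p0, p1 = sums[0] / counts[0], sums[1] / counts[1]
        a0, a1 = logit(p0), logit(p1) - logit(p0)
    return a0, a1


if __name__ == "__main__":
    annotated_causal = ([10.0] + [0.0] * 9, [1, 1] + [0] * 8)      # hypothetical loci
    unannotated_causal = ([0.0, 0.0, 10.0] + [0.0] * 7, [1, 1] + [0] * 8)
    print(em_enrichment([annotated_causal] * 40 + [unannotated_causal] * 10))
```

Variants with strong Bayes factors dominate the expected counts, so the estimated enrichment mainly reflects how often the strongest signals fall in annotated variants.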
Project description: Biological interpretation of genomic summary data, such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies, is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open-source tool designed for enhanced interpretation of genomic summary data, enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how, by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables the exploration of annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR.
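Ontology and annotation enrichment analyses of the kind XGR supports often reduce to asking whether a term's genes are over-represented in a user's gene list. A common choice is the one-sided hypergeometric test, sketched here with the standard library; the function names are hypothetical, and XGR's own test options and background definitions may differ.

```python
import math


def hypergeom_upper_tail(k, population, term_size, draws):
    """P(X >= k) for X ~ Hypergeometric(population, term_size successes, draws)."""
    upper = min(draws, term_size)
    numer = sum(math.comb(term_size, i) * math.comb(population - term_size, draws - i)
                for i in range(k, upper + 1))
    return numer / math.comb(population, draws)


def term_enrichment(input_genes, term_genes, background):
    """One-sided over-representation p-value of a term among input genes, within a background."""
    background = set(background)
    inputs = set(input_genes) & background
    term = set(term_genes) & background
    return hypergeom_upper_tail(len(inputs & term), len(background), len(term), len(inputs))


if __name__ == "__main__":
    background = [f"g{i}" for i in range(20)]           # hypothetical 20-gene background
    print(term_enrichment(["g0", "g1", "g2"], ["g0", "g1", "g2", "g3"], background))
```

Restricting both the input list and the term to the same background matters: the p-value depends directly on the background size, so an inflated background makes any overlap look more surprising.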
Project description: More than 90% of common variants associated with complex traits do not affect proteins directly but instead affect the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms.
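A typical HaploReg query expands a GWAS SNP to the variants in strong LD with it (the haplotype block) before looking up their annotations. A minimal sketch of that expansion, computing r^2 from phased 0/1 haplotypes of biallelic SNPs and attaching annotation labels; the SNP identifiers, haplotypes, threshold and labels are hypothetical, and this is not HaploReg's implementation.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1 (same haplotype order)."""
    n = len(hap_a)
    pa, pb = sum(hap_a) / n, sum(hap_b) / n
    pab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = pab - pa * pb                                  # linkage disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def ld_proxies(query, haplotypes, min_r2=0.8):
    """Return {snp: r2} for SNPs other than the query with r2 >= min_r2."""
    ref = haplotypes[query]
    proxies = {}
    for snp, hap in haplotypes.items():
        if snp == query:
            continue
        r2 = ld_r2(ref, hap)
        if r2 >= min_r2:
            proxies[snp] = r2
    return proxies


def annotate_proxies(query, haplotypes, annotations, min_r2=0.8):
    """Attach annotation labels to the query SNP and its LD proxies."""
    hits = {query: 1.0, **ld_proxies(query, haplotypes, min_r2)}
    return {snp: (r2, annotations.get(snp, [])) for snp, r2 in hits.items()}


if __name__ == "__main__":
    haps = {"snpA": [1, 1, 0, 0, 1, 0], "snpB": [1, 1, 0, 0, 1, 0],
            "snpC": [0, 0, 1, 1, 0, 1], "snpD": [1, 0, 1, 0, 1, 0]}
    print(annotate_proxies("snpA", haps, {"snpB": ["enhancer (hypothetical)"]}))
```

Because r^2 ignores the sign of D, a SNP whose alleles are perfectly complementary to the query (snpC here) is also a perfect proxy.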
Project description: Although genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes remain critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs. However, the applicability of these approaches is challenged by the high dimensionality and heterogeneity of functional annotation data. Furthermore, existing methods often assume global associations of GVs with annotation data. This strong assumption is likely to be violated for GVs involved in many complex diseases. To address these issues, we develop a general regression framework, named Annotation Regression for GWAS (ARoG). ARoG is based on a finite mixture of linear regressions model, in which GWAS association measures are treated as responses and functional annotations as predictors. This mixture framework addresses the heterogeneity of GV effects by grouping GVs into clusters, and addresses the high dimensionality of functional annotations by enabling annotation selection within each cluster. ARoG further employs permutation testing to evaluate the significance of selected annotations. Computational experiments indicate that ARoG can discover distinct associations between disease risk and functional annotations. Application of ARoG to autism and schizophrenia data from the Psychiatric Genomics Consortium led to the identification of GVs that significantly affect the interactions of several transcription factors with DNA, as potential mechanisms contributing to these disorders.
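ARoG's core model, a finite mixture of linear regressions, can be fitted with the EM algorithm. A minimal NumPy sketch with Gaussian errors, random hard initial assignments and several restarts (keeping the run with the highest log-likelihood) is given here. It omits ARoG's within-cluster annotation selection and permutation testing, and the function names and defaults are hypothetical; in ARoG's setting, y would hold GWAS association measures and the columns of x would hold functional annotations.

```python
import numpy as np


def _em_run(x, y, k, n_iter, rng):
    """One EM run from a random hard assignment; returns (weights, betas, sigmas, loglik)."""
    n, p = x.shape
    resp = np.eye(k)[rng.integers(k, size=n)]
    for _ in range(n_iter):
        # M-step: mixing weights and weighted least squares per component.
        weights = resp.mean(axis=0)
        betas = np.empty((k, p))
        sigmas = np.empty(k)
        for c in range(k):
            w = resp[:, c]
            xw = x * w[:, None]
            betas[c] = np.linalg.solve(x.T @ xw, xw.T @ y)
            r = y - x @ betas[c]
            sigmas[c] = max(np.sqrt(np.sum(w * r * r) / np.sum(w)), 1e-6)
        # E-step: log(weight_c * Normal(y | x beta_c, sigma_c^2)) up to a shared constant.
        resid = y[:, None] - x @ betas.T
        log_lik = np.log(weights) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        top = log_lik.max(axis=1, keepdims=True)       # log-sum-exp for numerical stability
        resp = np.exp(log_lik - top)
        total = resp.sum(axis=1, keepdims=True)
        resp /= total
        loglik = float(np.sum(top + np.log(total)))
    return weights, betas, sigmas, loglik


def fit_mixture(x, y, k=2, n_iter=200, n_starts=10, seed=0):
    """Best of several EM runs (highest log-likelihood) for a mixture of linear regressions."""
    rng = np.random.default_rng(seed)
    runs = [_em_run(x, y, k, n_iter, rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[3])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=600)                           # one hypothetical annotation score
    first = rng.random(600) < 0.3                      # hypothetical cluster labels
    y = np.where(first, 4.0 + 2.0 * a, -1.0 * a) + rng.normal(scale=0.3, size=600)
    weights, betas, sigmas, _ = fit_mixture(np.column_stack([np.ones(600), a]), y)
    print(weights, betas, sigmas)
```

Each component has its own regression coefficients, so an annotation can be strongly associated with the response in one cluster of variants and irrelevant in another, which is the non-global association the abstract describes.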