Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits.
ABSTRACT: DNA variants (SNPs) that predispose to common traits often localize within noncoding regulatory elements such as enhancers. Moreover, loci identified by genome-wide association studies (GWAS) often contain multiple SNPs in linkage disequilibrium (LD), any of which may be causal. Thus, determining the effect of these multiple variant SNPs on target transcript levels has been a major challenge. Here, we provide evidence that for six common autoimmune disorders (rheumatoid arthritis, Crohn's disease, celiac disease, multiple sclerosis, lupus, and ulcerative colitis), the GWAS association arises from multiple polymorphisms in LD that map to clusters of enhancer elements active in the same cell type. This finding suggests a "multiple enhancer variant" hypothesis for common traits, where several variants in LD impact multiple enhancers and cooperatively affect gene expression. Using a novel method to delineate enhancer-gene interactions, we show that multiple enhancer variants within a given locus typically target the same gene. Using available data from HapMap and B lymphoblasts as a model system, we provide evidence at numerous loci that multiple enhancer variants cooperatively contribute to altered expression of their gene targets. The effects on target transcript levels tend to be modest and can be either gain- or loss-of-function. Additionally, the genes associated with multiple enhancer variants encode proteins that are often functionally related and enriched in common pathways. Overall, the multiple enhancer variant hypothesis offers a new paradigm by which noncoding variants can confer susceptibility to common traits.
Project description:SNPs associated with disease susceptibility often reside in enhancer clusters, or super-enhancers. Constituents of these enhancer clusters cooperate to regulate target genes and often extend beyond the linkage disequilibrium (LD) blocks containing risk SNPs identified in genome-wide association studies (GWAS). We identified 'outside variants', defined as SNPs in weak LD with GWAS risk SNPs that physically interact with risk SNPs as part of a target gene's regulatory circuitry. These outside variants further explain variation in target gene expression beyond that explained by GWAS-associated SNPs. Additionally, the clinical risk associated with GWAS SNPs is considerably modified by the genotype of outside variants. Collectively, these findings suggest a potential model in which outside variants and GWAS SNPs that physically interact in 3D chromatin collude to influence target transcript levels as well as clinical risk. This model offers an additional hypothesis for the source of missing heritability for complex traits.
Project description:Common genetic variants 3' of MC4R within two large linkage disequilibrium (LD) blocks spanning 288 kb have been associated with common and rare forms of obesity. This large association region has not been refined and the relevant DNA segments within the association region have not been identified. In this study, we investigated whether common variants in the MC4R gene region were associated with adiposity-related traits in a biracial population-based study. Single nucleotide polymorphisms (SNPs) in the MC4R region were genotyped with a custom array and a genome-wide array and associations between SNPs and five adiposity-related traits were determined using race-stratified linear regression. Previously reported associations between lower BMI and the minor alleles of rs2229616/Val103Ile and rs52820871/Ile251Leu were replicated in white female participants. Among white participants, rs11152221 in a proximal 3' LD block (closer to MC4R) was significantly associated with multiple adiposity traits, but SNPs in a distal 3' LD block (farther from MC4R) were not. In a case-control study of severe obesity, rs11152221 was significantly associated. The association results directed our follow-up studies to the proximal LD block downstream of MC4R. By considering nucleotide conservation, the significance of association, and proximity to the MC4R gene, we identified a candidate MC4R regulatory region. This candidate region was sequenced in 20 individuals from a study of severe obesity in an attempt to identify additional variants, and the candidate region was tested for enhancer activity using in vivo enhancer assays in zebrafish and mice. Novel variants were not identified by sequencing and the candidate region did not drive reporter gene expression in zebrafish or mice. 
The identification of a putative insulator in this region could help to explain the challenges faced in this study and others to link SNPs associated with adiposity to altered MC4R expression.
Project description:BACKGROUND:The CHRNA5/A3/B4 gene locus is associated with nicotine dependence and other smoking related disorders. While the non-synonymous CHRNA5 variant rs16969968 appears to be the main risk factor, linkage disequilibrium (LD) bins in the gene cluster carry frequent variants that regulate expression. Pairwise LD and haplotype analyses had identified at least three haplotype tagging SNPs including rs16969968 as main genetic risk factors. Searching for variants with evidence of regulatory functions, we have reported interactions between CHRNA5 and CHRNA3 enhancer variants (tagged by rs880395 and rs1948, respectively) and rs16969968, forming 3-SNP haplotypes and diplotypes that may more accurately reflect the cluster's combined effects on nicotine dependence (Barrie et al., Hum Mutat 38:112-9, 2017). Here we address further contributions by variants affecting CHRNB4, a possibly limiting component of nicotinic receptors. RESULTS:We identify an LD bin (tagged by rs4887074) associated with expression of CHRNB4. Additive logistic regression models indicate that rs4887074 is associated with nicotine dependence and modulates the effect of rs16969968 in GWAS datasets (COGEND, UW-TTURC, SAGE). 4-SNP haplotype and diplotype analyses (rs880395-rs16969968-rs1948-rs4887074) yield nicotine dependence risk values that further differentiate those obtained with the 3-SNP model. Moreover, both the main G allele of rs16969968 and the minor G allele of rs4887074 (associated with reduced expression of CHRNB4), residing predominantly on common haplotypes that are protective, represent significant allele-specific variance QTLs, indicating that they interact with each other. CONCLUSIONS:These results indicate rs4887074 is associated with CHRNB4 expression, and along with two regulatory variants of CHRNA3 and CHRNA5, modulates the effect of rs16969968 on nicotine dependence risk.
Assignable to individuals because of strong LD structures, 4-SNP haplotypes and diplotypes serve to assess the combined genetic influence of this multi-gene cluster on complex traits, accounting for complex LD relationships and tissue-specific genetic effects (CHRNA5/3) relevant to the traits analyzed. The 4-SNP haplotypes account at least in part for previous tagging SNPs, including the highly GWAS-significant rs6495308, located in a distinct pair-wise LD bin but included in protective 4-SNP haplotypes. Our approach refines and integrates the cluster's overall genetic influence, an important variable when integrating the genetics of multiple genomic loci.
Project description:Genome-wide association studies (GWAS) are identifying genetic predisposition to various diseases. The 17q24.3 locus harbors the single nucleotide polymorphism (SNP) rs1859962 that is statistically associated with prostate cancer (PCa). It defines a 130-kb linkage disequilibrium (LD) block that lies in an ~2-Mb gene desert area. The functional biology driving the risk associated with this LD block is unknown. Here, we integrate genome-wide chromatin landscape data sets, namely, epigenomes and chromatin openness from diverse cell types. This identifies a PCa-specific enhancer within the rs1859962 risk LD block that establishes a 1-Mb chromatin loop with the SOX9 gene. The rs8072254 and rs1859961 SNPs mapping to this enhancer impose allele-specific gene expression. The variant allele of rs8072254 facilitates androgen receptor (AR) binding driving increased enhancer activity. The variant allele of rs1859961 decreases FOXA1 binding while increasing AP-1 binding. The latter is key to imposing allele-specific gene expression. The rs8072254 variant in strong LD with the rs1859962 risk SNP can account for the risk associated with this locus, while rs1859961 is a rare variant less likely to contribute to the risk associated with this LD block. Together, our results demonstrate that multiple genetic variants mapping to a unique enhancer looping to the SOX9 oncogene can account for the risk associated with the PCa 17q24.3 locus. Allele-specific recruitment of the transcription factors androgen receptor (AR) and activating protein-1 (AP-1) accounts for the increased enhancer activity ascribed to this PCa-risk LD block. This further supports the notion that an integrative genomics approach can identify the functional biology disrupted by genetic risk variants.
Project description:For most complex traits, the majority of SNPs identified through genome-wide association studies (GWAS) reside within noncoding regions that have no known function. However, these regions are enriched for the regulatory enhancers specific to the cells relevant to the specific trait. Indeed, many of the GWAS loci that have been functionally characterized lie within enhancers that regulate expression levels of key genes. In order to identify polymorphisms with potential allele-specific regulatory effects, we developed a bioinformatics pipeline that harnesses epigenetic signatures as well as transcription factor (TF) binding motifs to identify putative enhancers containing a SNP with potential allele-specific TF binding in linkage disequilibrium (LD) with a GWAS-identified SNP. We applied the approach to GWAS findings for blood lipids, revealing 7 putative enhancers harboring associated SNPs, 3 of which lie within the introns of LCAT and ABCA1, genes that play crucial roles in cholesterol biogenesis and lipoprotein metabolism. All 3 enhancers demonstrated allele-specific in vitro regulatory activity in liver-derived cell lines. We demonstrated that these putative enhancers are in close physical proximity to the promoters of their respective genes, in situ, likely through chromatin looping. In addition, the associated alleles altered the likelihood of transcription activator STAT3 binding. Our results demonstrate that through our approach, the LD blocks that contain GWAS signals, often hundreds of kilobases in size with multiple SNPs serving as statistical proxies to the true functional site, can provide an experimentally testable hypothesis for the underlying regulatory mechanism linking genetic variants to complex traits.
Project description:Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete<sup>1-3</sup>. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits<sup>4</sup>, predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits.
Project description:Genome-wide association studies (GWAS) of colorectal cancer (CRC) have led to the identification of a number of common variants associated with modest risk. Several risk variants map within the vicinity of TGFβ/BMP signaling pathway genes, including rs4939827 within an intron of SMAD7 at 18q21.1. A previous study implicated a novel SNP (novel 1 or rs58920878) as a functional variant within an enhancer element in SMAD7 intron 4. In this study, we show that four SNPs including novel 1 (rs6507874, rs6507875, rs8085824, and rs58920878) in linkage disequilibrium (LD) with the index SNP rs4939827 demonstrate allele-specific enhancer effects in a large, multi-component enhancer of SMAD7. All four SNPs demonstrate allele-specific protein binding to nuclear extracts of CRC cell lines. Furthermore, some of the risk-associated alleles correlate with increased expression of SMAD7 in normal colon tissues. Finally, we show that the enhancer is responsive to BMP4 stimulation. Taken together, we propose that the associated CRC risk at 18q21.1 is due to four functional variants that regulate SMAD7 expression and potentially perturb a BMP negative feedback loop in TGFβ/BMP signaling pathways.
Project description:Single-cell screens enable high-throughput functional assessment of enhancers in their endogenous genomic context. However, the design of current studies limits their application to identifying the primary gene targets of enhancers. Here, we improve the experimental and computational parameters of single-cell enhancer screens to identify the secondary gene targets of enhancers. Our analysis of >500 putative enhancers in K562 cells reveals an interwoven enhancer-driven gene regulatory network. We find that enhancers from distinct genomic loci converge to modulate the expression of common sub-modules, including the α- and β-globin loci, by directly regulating transcription factors. Our analysis suggests that several genetic variants associated with myeloid blood cell traits alter the activity of a distal enhancer of MYB (~140 kb away), with downstream consequences on hemoglobin gene expression and cell state. These data have implications for the understanding of enhancer-associated traits and emphasize the flexibility of controlling transcriptional systems by modifying enhancer activity.
Project description:Genome-wide association studies (GWASs) have revealed 59 genomic loci associated with type 1 diabetes (T1D). Functional interpretation of the SNPs located in the noncoding region of these loci remains challenging. We perform epigenomic profiling of two enhancer marks, H3K4me1 and H3K27ac, using primary TH1 and TREG cells isolated from healthy and T1D subjects. We uncover a large number of deregulated enhancers and altered transcriptional circuitries in both cell types of T1D patients. We identify four SNPs (rs10772119, rs10772120, rs3176792, rs883868) in linkage disequilibrium (LD) with T1D-associated GWAS lead SNPs that alter enhancer activity and expression of immune genes. Among them, rs10772119 and rs883868 disrupt the binding of retinoic acid receptor α (RARA) and Yin and Yang 1 (YY1), respectively. Loss of binding by YY1 also results in the loss of long-range enhancer-promoter interaction. These findings provide insights into how noncoding variants affect the transcriptomes of two T-cell subtypes that play critical roles in T1D pathogenesis.