Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.
ABSTRACT: The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.
Project description:We conducted a genome-wide analysis to identify regulatory variants affecting the binding of NKX2-5, a core cardiac development transcription factor, and investigated their role in cardiac gene expression and EKG phenotypes. We generated iPSC-derived cardiomyocytes (iPSC-CMs) from a pedigree of seven whole-genome sequenced individuals, and profiled them with a variety of functional genomic assays including RNA-Seq, ATAC-Seq, and ChIP-Seq of both histone modification H3K27ac and NKX2-5. After establishing that iPSC-CMs recapitulated cardiomyocyte-specific expression and epigenetic signatures, and that genetic variants affected the variability of molecular phenotypes across the iPSC-CM lines, we identified heterozygous sites that showed allele-specific effects (ASE). We then investigated NKX2-5 ASE variants in detail by examining whether they altered cardiac TF motifs, and whether they were enriched for eQTLs and EKG GWAS-SNPs. Our data reveal that variation affecting the binding of NKX2-5 and other cardiac TFs likely serves as a molecular mechanism underlying control of numerous EKG loci across the genome, and that fine-mapping approaches, combined with molecular phenotype data from iPSC-CMs, can be used to prioritize causal variants in EKG GWAS loci. Overall design: We selected seven individuals of Asian and European descent in the iPSCORE resource that are part of a three-generational family. 
Our study design included three genetically unrelated subjects and two parent-offspring quartets, which enabled us to examine the inheritance of genetic effects. We generated and analyzed 56 RNA-Seq (iPSCs: 29 independent samples; iPSC-CMs: 26 independent samples and 1 technical replicate), 48 ChIP-Seq of histone modification H3K27ac (iPSCs: 17 samples and 4 technical replicates; iPSC-CMs: 25 samples and 2 technical replicates), 15 ChIP-Seq of NKX2-5 (iPSC-CMs: 12 samples and 3 technical replicates) and 37 ATAC-Seq (iPSCs: 12 samples and 5 technical replicates; iPSC-CMs: 11 samples and 9 technical replicates) datasets. Raw data require controlled access and are deposited at dbGaP (phs000924 and phs001325). Data for RNA-Seq, H3K27ac ChIP-Seq and 21/37 ATAC-Seq samples were deposited under the accession GSE125540.
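The description above identifies heterozygous sites with allele-specific effects but does not state which statistical test was used. As a minimal sketch, assuming an exact two-sided binomial test against a 50:50 null (a common choice, not confirmed by the source), allelic imbalance at a single heterozygous site could be assessed as follows. The function name and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Assumption: under the null, each read carries the reference allele
    with probability p_null (0.5 for a heterozygous site with no bias).
    Suitable for moderate read depth (up to roughly 1,000 reads).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Probability of observing k reference reads out of n, for every k.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical read counts at two heterozygous sites.
    print(binomial_two_sided_p(10, 10))  # balanced site
    print(binomial_two_sided_p(20, 0))   # strongly imbalanced site
```

In practice, p-values from many sites would also need multiple-testing correction, and reference-mapping bias would need to be handled before counting reads; neither step is shown here.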
Project description:Interpreting the functional impact of noncoding variants is an ongoing challenge in the field of genome analysis. With most noncoding variants associated with complex traits and disease residing in regulatory regions, altered transcription factor (TF) binding has been proposed as a mechanism of action. It is therefore imperative to develop methods that predict the impact of noncoding variants at TF binding sites (TFBSs). Here, we describe the update of our MANTA database that stores: 1) TFBS predictions in the human genome, and 2) the potential impact on TF binding for all possible single nucleotide variants (SNVs) at these TFBSs. TFBSs were predicted by combining experimental ChIP-seq data from ReMap and computational position weight matrices (PWMs) derived from JASPAR. Impact of SNVs at these TFBSs was assessed by means of PWM scores computed on the alternate alleles. The updated database, MANTA2, provides the scientific community with a critical map of TFBSs and SNV impact scores to improve the interpretation of noncoding variants in the human genome.
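To illustrate how a PWM score can be recomputed on an alternate allele, here is a minimal sketch. The count matrix is a toy example invented for illustration (it is not a JASPAR profile), and the log-odds scoring with a pseudocount and uniform background is an assumption, since the description does not give MANTA2's exact scoring scheme.

```python
import math

# Toy 5-position count matrix (hypothetical; not taken from JASPAR).
COUNTS = [
    {"A": 18, "C": 1, "G": 1, "T": 0},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 0, "C": 1, "G": 19, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 19},
    {"A": 0, "C": 0, "G": 20, "T": 0},
]


def to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds PWM."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((col[b] + pseudocount) / total) / background) for b in "ACGT"}
        )
    return pwm


def score(pwm, site):
    """Sum the log-odds of each base in a site of the same length as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(col[base] for col, base in zip(pwm, site))


def snv_impact(pwm, ref_site, offset, alt_base):
    """Score difference (alternate minus reference) for an SNV at `offset`."""
    alt_site = ref_site[:offset] + alt_base + ref_site[offset + 1:]
    return score(pwm, alt_site) - score(pwm, ref_site)


if __name__ == "__main__":
    pwm = to_log_odds(COUNTS)
    # Replacing the well-supported G at position 2 with C lowers the score.
    print(snv_impact(pwm, "AAGTG", 2, "C"))
```

A negative impact indicates that the alternate allele matches the motif less well than the reference allele; a positive value indicates a better match.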
Project description:The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside of the protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. A possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites, thereby affecting the in vivo binding of the TF. However, their impact varies since many positions within a TF binding motif are not well conserved. Therefore, simply annotating all variants located in putative TF binding sites may overestimate the functional impact of these SNVs. We conducted a comprehensive survey to study the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. From the results obtained on 18 TF binding motifs, we found that there is substantial variation in an SNV's impact on TF binding affinity. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.
Project description:Genome-wide association studies (GWAS) of complex traits, such as alcohol use disorders (AUD), usually identify variants in non-coding regions and cannot by themselves distinguish whether the associated variants are functional or in linkage disequilibrium with the functional variants. Transcriptome studies can identify genes whose expression differs between alcoholics and controls. To test which variants associated with AUD may cause expression differences, we integrated data from deep RNA-seq and GWAS of four postmortem brain regions from 30 subjects with AUD and 30 controls to analyze allele-specific expression (ASE). We identified 88 genes with differential ASE in subjects with AUD compared to controls. Next, to test one potential mechanism contributing to the differential ASE, we analyzed single nucleotide polymorphisms (SNPs) in the 3' untranslated regions (3'UTR) of these genes. Of the 88 genes with differential ASE, 61 genes contained 437 SNPs in the 3'UTR with at least one heterozygote among the subjects studied. Using a modified PASSPORT-seq (parallel assessment of polymorphisms in miRNA target-sites by sequencing) assay, we identified 25 SNPs that affected RNA levels in a consistent manner in two neuroblastoma cell lines, SH-SY5Y and SK-N-BE(2). Many of these SNPs are in binding sites of miRNAs and RNA-binding proteins, indicating that these SNPs are likely causal variants of AUD-associated differential ASE. In sum, we demonstrate that a combination of computational and experimental approaches provides a powerful strategy to uncover functionally relevant variants associated with the risk for AUD.
Project description:Single nucleotide variants (SNVs) located in transcriptional regulatory regions can result in gene expression changes that lead to adaptive or detrimental phenotypic outcomes. Here, we predict gain or loss of binding sites for 741 transcription factors (TFs) across the human genome. We calculated 'gainability' and 'disruptability' scores for each TF that represent the likelihood of binding sites being created or disrupted, respectively. We found that functional cis-eQTL SNVs are more likely to alter TF binding sites than rare SNVs in the human population. In addition, we show that cancer somatic mutations have different effects on TF binding sites from different TF families on a cancer-type basis. Finally, we discuss the relationship between these results and cancer mutational signatures. Altogether, we provide a blueprint to study the impact of SNVs derived from genetic variation or disease association on TF binding to gene regulatory regions.
Project description:Enhancer RNAs (eRNAs) are a subset of long noncoding RNA generated from genomic enhancers: they are thought to act as potent promoters of the expression of nearby genes through interaction with the transcriptional and epigenomic machineries. In the present work, we describe two eRNAs, which we call Intergenic Regulatory Element Nkx2-5 Enhancers (IRENEs), transcribed from the enhancer of Nkx2-5, a gene specifying a master cardiomyogenic lineage transcription factor (TF). The IRENEs are encoded, respectively, on the same strand (SS) and in the divergent direction (div) with respect to the nearby gene. Of note, these two eRNAs have opposing roles in the regulation of Nkx2-5: IRENE-SS acts as a canonical promoter of transcription, whereas IRENE-div represses the activity of the enhancer through recruitment of the histone deacetylase sirtuin 1. Thus, we have identified an autoregulatory loop controlling expression of the master cardiac TF NKX2-5, in which one eRNA represses transcription.
Project description:Many variants associated with complex traits are in noncoding regions and contribute to phenotypes by disrupting regulatory sequences. To characterize these variants, we developed a streamlined protocol for a high-throughput reporter assay, Biallelic Targeted STARR-seq (BiT-STARR-seq), that identifies allele-specific expression (ASE) while accounting for PCR duplicates through unique molecular identifiers. We tested 75,501 oligos (43,500 SNPs) and identified 2720 SNPs with significant ASE (FDR < 10%). To validate disruption of binding as one of the mechanisms underlying ASE, we developed a new high-throughput allele-specific binding assay for NFKB1. We identified 2684 SNPs with allele-specific binding (ASB) (FDR < 10%); 256 of these SNPs also had ASE (OR = 1.97, P-value = 0.0006). Of variants associated with complex traits, 1531 resulted in ASE, and 1662 showed ASB. For example, we found that the Crohn's disease risk variant rs3810936 increases NFKB1 binding and results in altered gene expression.
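The role of unique molecular identifiers (UMIs) in removing PCR duplicates before allelic counting can be sketched as below. This is an illustrative assumption, not the BiT-STARR-seq pipeline: the input tuple layout and function name are hypothetical, and real pipelines typically also tolerate sequencing errors within UMIs, which this sketch ignores.

```python
from collections import defaultdict


def umi_allele_counts(reads):
    """Count unique UMIs per allele for each SNP.

    `reads` is an iterable of (snp_id, allele, umi) tuples (hypothetical
    layout). Reads sharing the same SNP, allele, and UMI are treated as
    PCR duplicates of one original molecule and counted once.
    """
    seen = defaultdict(set)
    for snp_id, allele, umi in reads:
        seen[(snp_id, allele)].add(umi)

    counts = defaultdict(dict)
    for (snp_id, allele), umis in seen.items():
        counts[snp_id][allele] = len(umis)
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical reads: the second read is a PCR duplicate of the first.
    example = [
        ("snp1", "ref", "AACGT"),
        ("snp1", "ref", "AACGT"),
        ("snp1", "ref", "GGTCA"),
        ("snp1", "alt", "TTGAC"),
    ]
    print(umi_allele_counts(example))
```

The deduplicated per-allele counts, rather than raw read counts, would then be the input to an allelic imbalance test.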
Project description:BACKGROUND:The homeodomain (HD) transcription factor (TF) NKX2-1 is critical for the regional specification of the medial ganglionic eminence (MGE) as well as for promoting the GABAergic and cholinergic neuron fates via the induction of TFs such as LHX6 and LHX8. NKX2-1 defines MGE regional identity in large part through transcriptional repression, while specification and maturation of GABAergic and cholinergic fates is mediated in part by transcriptional activation via TFs such as LHX6 and LHX8. Here we analyze the signaling and TF pathways, downstream of NKX2-1, required for GABAergic and cholinergic neuron fate maturation. METHODS:Differential ChIP-seq analysis was used to identify regulatory elements (REs) where chromatin state was sensitive to change in the Nkx2-1 conditional knockout (Nkx2-1cKO) MGE at embryonic day (E) 13.5. TF motifs in the REs were identified using RSAT. CRISPR-mediated genome editing was used to generate enhancer knockouts. Differential gene expression in these knockouts was analyzed through RT-qPCR and in situ hybridization. Functional analysis of motifs within hs623 was performed via site-directed mutagenesis and reporter assays in primary MGE cultures. RESULTS:We identified 4782 activating REs (aREs) and 6391 repressing REs (rREs) in the Nkx2-1cKO MGE. aREs are associated with basic-Helix-Loop-Helix (bHLH) TFs. Deletion of hs623, an intragenic Tcf12 aRE, caused a reduction of Tcf12 expression in the sub-ventricular zone (SVZ) and mantle zone (MZ) of the MGE. Mutation of LHX, SOX and octamer motifs within hs623 caused a reduction of hs623 activity in MGE primary cultures. CONCLUSIONS:Tcf12 expression in the SVZ of the MGE is mediated through aRE hs623. The activity of hs623 is dependent on LHX6, SOX and octamers. Thus, maintaining the expression of Tcf12 in the SVZ involves TF pathways parallel to and genetically downstream of NKX2-1.
Project description:Transcription factors (TFs) are thought to function with partners to achieve specificity and precise quantitative outputs. In the developing heart, heterotypic TF interactions, such as between the T-box TF TBX5 and the homeodomain TF NKX2-5, have been proposed as a mechanism for human congenital heart defects. We report extensive and complex interdependent genomic occupancy of TBX5, NKX2-5, and the zinc finger TF GATA4 coordinately controlling cardiac gene expression, differentiation, and morphogenesis. Interdependent binding serves not only to co-regulate gene expression but also to prevent TFs from distributing to ectopic loci and activating lineage-inappropriate genes. We define preferential motif arrangements for TBX5 and NKX2-5 cooperative binding sites, supported at the atomic level by their co-crystal structure bound to DNA, revealing a direct interaction between the two factors and induced DNA bending. Complex interdependent binding mechanisms reveal tightly regulated TF genomic distribution and define a combinatorial logic for heterotypic TF regulation of differentiation.
Project description:Genome-wide association studies (GWAS) have identified over 100 loci containing single nucleotide variants (SNVs) that influence the risk of developing multiple sclerosis (MS). Most of these loci lie in non-coding regulatory regions of the genome that are active in immune cells and are therefore thought to modify risk by altering the expression of key immune genes. To explore this hypothesis we screened genes flanking MS-associated variants for evidence of allele specific expression (ASE) by quantifying the transcription of coding variants in linkage disequilibrium with MS-associated SNVs. In total, we were able to identify and successfully analyse 200 such coding variants (from 112 genes) in both CD4+ and CD8+ T cells from 106 MS patients and 105 controls. Fifty-six of these coding variants (from 43 genes) showed statistically significant evidence of ASE in one or both cell types. In the Lck interacting transmembrane adaptor 1 gene (LIME1), for example, we were able to show that in both cell types, the MS-associated variant rs2256814 increased the expression of some transcripts while simultaneously reducing the expression of other transcripts. In CD4+ cells from an additional independent set of 96 cases and 93 controls we were able to replicate the effect of this SNV on the balance of alternate LIME1 transcripts using qPCR (p = 5 × 10^-24). Our data thus indicate that some of the MS-associated SNVs identified by GWAS likely exert their effects on risk by distorting the balance of alternate transcripts rather than by changing the overall level of gene expression.