Contribution of nucleosome binding preferences and co-occurring DNA sequences to transcription factor binding.
ABSTRACT: BACKGROUND: Chromatin plays a critical role in regulating the binding of transcription factors (TFs) to their canonical transcription factor binding sites (TFBS). Recent studies in vertebrates show that many TFs preferentially bind to genomic regions that are well bound by nucleosomes in vitro. Co-occurring secondary motifs are sometimes correlated with functional TFBS. RESULTS: We used logistic regression to evaluate how well the propensity for nucleosome binding and the co-occurrence of a secondary motif identify which canonical motifs are bound in vivo. We used ChIP-seq data for three transcription factors binding to their canonical motifs: c-Jun binding the AP-1 motif (TGA(C)/(G)TCA), GR (glucocorticoid receptor) binding the GR motif (G-ACA---(T)/(C)GT-C), and Hoxa2 (homeobox a2) binding the Pbx (Pre-B-cell leukemia homeobox) motif (TGATTGAT). For each canonical TFBS in the mouse genome, we calculated an intrinsic nucleosome occupancy score (INOS) for the surrounding 150 bp of DNA and examined its relationship with in vivo TF binding. In mouse mammary 3134 cells, c-Jun and GR proteins preferentially bound regions calculated to be well bound by nucleosomes in vitro, with the canonical AP-1 and GR motifs themselves contributing to the high INOS. Functional GR motifs are enriched for AP-1 motifs if they are within a nucleosome-sized 150-bp region. GR and Hoxa2 also bind motifs with low INOS, perhaps indicating a different mechanism of action. CONCLUSION: Our analysis quantified the contribution of INOS and co-occurring sequence to the identification of functional canonical motifs in the genome. This analysis revealed an inherent competition between some TFs and nucleosomes for binding canonical TFBS. GR and c-Jun cooperate if their motifs lie within 150 bp of each other. Binding of Hoxa2 and a fraction of GR to motifs with low INOS values suggests that they are not in competition with nucleosomes and may function using different mechanisms.
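The co-occurrence test described above (an AP-1 motif near a GR motif within a nucleosome-sized region) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that each "-" in the GR motif stands for any nucleotide, that only the forward strand is scanned, and that "within 150 bp" means the two motif centers are at most 75 bp apart; the function names are hypothetical.

```python
import re

# Canonical motifs from the abstract, written as regular expressions.
# Assumption: each "-" in the GR motif means "any nucleotide".
# Lookaheads are used so that overlapping matches are also reported.
AP1 = re.compile(r"(?=(TGA[CG]TCA))")
GR = re.compile(r"(?=(G[ACGT]ACA[ACGT]{3}[TC]GT[ACGT]C))")

AP1_LEN = 7   # length of TGA(C/G)TCA
GR_LEN = 13   # length of G-ACA---(T/C)GT-C


def motif_starts(pattern, seq):
    """Return 0-based start positions of all forward-strand matches."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def gr_sites_with_ap1(seq, window=150):
    """For each GR motif, report whether an AP-1 motif center lies
    within window/2 bp of the GR motif center (an assumed definition
    of 'within a nucleosome-sized region')."""
    half = window // 2
    ap1_centers = [s + AP1_LEN // 2 for s in motif_starts(AP1, seq)]
    results = []
    for start in motif_starts(GR, seq):
        center = start + GR_LEN // 2
        has_ap1 = any(abs(c - center) <= half for c in ap1_centers)
        results.append((start, has_ap1))
    return results


# Toy sequence: one GR motif close to an AP-1 motif, one far away.
toy = ("A" * 10 + "TGACTCA" + "A" * 20 + "GAACAAAATGTAC"
       + "A" * 200 + "GTACACCCCGTGC")
print(gr_sites_with_ap1(toy))
```

In an analysis like the one described, the resulting yes/no flag would be one predictor in the logistic regression alongside the INOS value; scanning the reverse strand would also be needed for non-palindromic matches.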
Project description:Accurate gene expression requires the targeting of transcription factors (TFs) to regulatory sequences often occluded within nucleosomes. The ability to target a TF binding site (TFBS) within a nucleosome has been the defining characteristic for a special class of TFs known as pioneer factors. Recent studies suggest TP53 functions as a pioneer factor that can target its TFBS within nucleosomes, but it remains unclear how TP53 binds to nucleosomal DNA. To comprehensively examine TP53 nucleosome binding, we competitively bound TP53 to multiple in vitro-formed nucleosomes containing a high- or low-affinity TP53 TFBS located at differing translational and rotational positions within the nucleosome. Stable TP53-nucleosome complexes were isolated and quantified using next-generation sequencing. Our results demonstrate TP53 binding is limited to nucleosome edges with significant binding inhibition occurring within 50 bp of the nucleosome dyad. Binding site affinity only affects TP53 binding for TFBSs located at the same nucleosomal positions; otherwise, nucleosome position takes precedence. Furthermore, TP53 has strong nonspecific nucleosome binding facilitating its interaction with chromatin. Our in vitro findings were confirmed by examining TP53-induced binding in a cell line model, showing induced binding at nucleosome edges flanked by a nucleosome-free region. Overall, our results suggest that the pioneering capabilities of TP53 are driven by nonspecific nucleosome binding with specific binding at nucleosome edges.
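The positional rule reported above (binding inhibited within 50 bp of the dyad, permitted at nucleosome edges) can be expressed as a small Python sketch. The 147-bp nucleosome length, the use of the TFBS center as the reference point, the 20-bp example site length, and the function names are all assumptions made for illustration; the abstract states only the 50-bp figure.

```python
NUCLEOSOME_LEN = 147      # assumed nucleosomal DNA length (not given in the text)
INHIBITION_RADIUS = 50    # "within 50 bp of the nucleosome dyad" (from the text)


def dyad_offset(tfbs_start, tfbs_len, nuc_start=0, nuc_len=NUCLEOSOME_LEN):
    """Signed distance (bp) from the nucleosome dyad to the TFBS center."""
    dyad = nuc_start + (nuc_len - 1) / 2
    center = tfbs_start + (tfbs_len - 1) / 2
    return center - dyad


def classify_site(tfbs_start, tfbs_len, nuc_start=0, nuc_len=NUCLEOSOME_LEN,
                  radius=INHIBITION_RADIUS):
    """Label a TFBS as 'dyad-proximal', 'edge', or 'outside' the nucleosome."""
    offset = abs(dyad_offset(tfbs_start, tfbs_len, nuc_start, nuc_len))
    if offset > nuc_len / 2:
        return "outside"
    return "dyad-proximal" if offset <= radius else "edge"


# Hypothetical 20-bp sites at three positions on a nucleosome starting at 0.
for start in (0, 64, 200):
    print(start, classify_site(start, 20))
```

Whether the 50-bp rule should be applied to the site center or to any overlapping base is not specified in the text, so the center-based choice here is only one reasonable reading.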
Project description:Nannochloropsis spp. are a group of oleaginous microalgae that harbor an expanded array of lipid-synthesis-related genes, yet how these genes are transcriptionally regulated remains unknown. Here a phylogenomic approach was employed to identify and functionally annotate the transcription factors (TFs) and TF binding sites (TFBSs) in N. oceanica IMET1. Among 36 microalgal and higher-plant genomes, a two-fold reduction in the number of TF families plus a seven-fold decrease in average family size were observed in Nannochloropsis, Rhodophyta and Chlorophyta. The degree of similarity in TF-family profiles is indicative of the phylogenetic relationship among the species, suggesting co-evolution of TF-family profiles and species. Furthermore, comparative analysis of six Nannochloropsis genomes revealed 68 "most-conserved" TFBS motifs, 11 of which were predicted to be related to lipid accumulation or photosynthesis. Mapping the IMET1 TFs and TFBS motifs to the reference plant TF-"TFBS motif" relationships in TRANSFAC enabled the prediction of 78 TF-"TFBS motif" interaction pairs, which consisted of 34 TFs (with 11 TFs potentially involved in the TAG biosynthesis pathway), 30 TFBS motifs and 2,368 regulatory connections between TFs and target genes. Our results form the basis of further experiments to validate and engineer the regulatory network of Nannochloropsis spp. for enhanced biofuel production.
Project description:Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
Project description:Despite the apparent appropriateness of left ventricular (LV) remodeling following myocardial infarction (MI), it poses an independent risk factor for development of heart failure. There is a paucity of studies into the molecular mechanisms of LV remodeling in large animal species. We took an unbiased molecular approach to identify candidate transcription factors (TFs) mediating the genetic reprogramming involved in post-MI LV remodeling in swine. Left ventricular tissue was collected from remote, non-infarcted myocardium, 3 weeks after MI-induction or sham-surgery. Microarray analysis identified 285 upregulated and 278 downregulated genes (FDR < 0.05). Of these differentially expressed genes, the promoter regions of the human homologs were searched for common TF binding sites (TFBS). Eighteen TFBS were overrepresented >two-fold (p < 0.01) in upregulated and 13 in downregulated genes. Left ventricular nuclear protein extracts were assayed for DNA-binding activity by protein/DNA array. Out of 345 DNA probes, 30 showed signal intensity changes >two-fold. Five TFs were identified in both TFBS and protein/DNA array analyses, which showed matching changes for COUP-TFII and glucocorticoid receptor (GR) only. Treatment of swine with the GR antagonist mifepristone after MI reduced the post-MI increase in LV mass, but LV dilation remained unaffected. Thus, using an unbiased approach to study post-MI LV remodeling in a physiologically relevant large animal model, we identified COUP-TFII and GR as potential key mediators of post-MI remodeling.
Project description:BACKGROUND:Stretch enhancers (SEs) are large chromatin-defined regulatory elements that are at least 3,000 base pairs (bps) long, in contrast to the median enhancer length of 800 bps. SEs tend to be cell-type specific, regulate cell-type specific gene expression, and are enriched in disease-associated genetic variants in disease-relevant cell types. Transcription factors (TFs) can bind to enhancers to modulate enhancer activity, and their sequence specificity can be represented by motifs. We hypothesize motifs can provide a biological context for how genetic variants contribute to disease. RESULTS:We integrated chromatin state, gene expression, and chromatin accessibility [measured as DNase I Hypersensitive Sites (DHSs)] maps across nine different cell types. Motif enrichment analyses of chromatin-defined enhancer sequences identify several known cell-type specific "master" factors. Furthermore, de novo motif discovery not only recovers many of these motifs, but also identifies novel non-canonical motifs, providing additional insight into TF binding preferences. Across the length of SEs, motifs are most enriched in DHSs, though relative enrichment is also observed outside of DHSs. Interestingly, we show that single nucleotide polymorphisms associated with diseases or quantitative traits significantly overlap motif occurrences located in SEs, but outside of DHSs. CONCLUSIONS:These results reinforce the role of SEs in influencing risk for diseases and suggest an expanded regulatory functional role for motifs that occur outside highly accessible chromatin. Furthermore, the motif signatures generated here expand our understanding of the binding preference of well-characterized TFs.
Project description:The functions of non-B DNA structures are poorly understood, though several bioinformatics studies predict a role for the G-quadruplex DNA structure in transcription. Earlier, using transcriptome profiling, we found evidence of widespread G-quadruplex-mediated gene regulation. Herein, we asked whether potential G-quadruplex (PG4) motifs associate with transcription factors (TFs). This was analyzed using 220 position weight matrices [designated as transcription factor binding sites (TFBS)], representing 187 unique TFs, in >75,000 genes in human, chimpanzee, mouse and rat. Results show that binding sites of nine TFs, including those of AP-2, SP1, MAZ and VDR, occurred significantly within 100 bases of the PG4 motif (P < 1.24E-10). PG4-TFBS combinations were conserved in 'orthologously' related promoters across all four organisms and were associated with >850 genes in each genome. Remarkably, seven of the nine TFs were zinc-finger binding proteins, indicating a novel characteristic of PG4 motifs. To test these findings, transcriptome profiles from human cell lines treated with G-quadruplex-specific molecules were used; 66 genes were significantly differentially expressed across both cell types, and these also harbored conserved PG4 motifs along with one or more of the nine TFBS. In addition, genes regulated by PG4-TFBS combinations were found to be co-regulated in human tissues, further emphasizing the regulatory significance of the associations.
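The proximity criterion above (TFBS within 100 bases of a PG4 motif) can be sketched in Python as follows. The abstract does not define the PG4 pattern, so the regular expression below uses a commonly cited form (four runs of at least three guanines separated by loops of 1 to 7 bases) purely as an assumption. TFBS positions are taken as given single coordinates rather than derived from position weight matrices, and the function names are hypothetical.

```python
import re

# Assumed PG4 definition (not stated in the abstract):
# G{3+} loop{1-7} G{3+} loop{1-7} G{3+} loop{1-7} G{3+}, forward strand only.
PG4 = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def pg4_spans(seq):
    """Return (start, end) spans of non-overlapping PG4 matches (end exclusive)."""
    return [m.span() for m in PG4.finditer(seq.upper())]


def tfbs_near_pg4(tfbs_positions, spans, max_dist=100):
    """Return the TFBS positions lying within max_dist bases of any PG4 span.

    The distance is 0 inside a span, otherwise the gap to its nearest edge.
    """
    near = []
    for pos in tfbs_positions:
        for start, end in spans:
            dist = max(start - pos, pos - (end - 1), 0)
            if dist <= max_dist:
                near.append(pos)
                break
    return near


# Toy promoter with one PG4 motif and three hypothetical TFBS coordinates.
toy = "A" * 50 + "GGGAGGGAGGGAGGG" + "A" * 300
spans = pg4_spans(toy)
print(spans, tfbs_near_pg4([10, 120, 300], spans))
```

Assessing whether such co-occurrences are significant, as reported in the abstract, would additionally require a background model, which this sketch does not attempt.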
Project description:Several recent studies have portrayed DNA methylation as a new player in the recruitment of transcription factors (TFs) within chromatin, highlighting a need to connect TF binding sites (TFBSs) with their respective DNA methylation profiles. However, current TFBS databases are restricted to DNA binding motif sequences. Here, we present MethMotif, a two-dimensional TFBS database that records TFBS position weight matrices along with cell-type-specific CpG methylation information computed from a combination of ChIP-seq and whole-genome bisulfite sequencing datasets. Integrating TFBS motifs with TFBS DNA methylation better portrays the features of DNA loci recognised by TFs. In particular, we found that DNA methylation patterns within TFBSs can be cell specific (e.g. MAFF). Furthermore, for a given TF, different DNA methylation profiles are associated with different DNA binding motifs (e.g. REST). To date, the MethMotif database records over 500 TFBSs computed from over 2000 ChIP-seq datasets in 11 different cell types. The MethMotif portal is accessible through an open-source web interface (https://bioinfo-csi.nus.edu.sg/methmotif) that allows users to intuitively explore the entire dataset and perform both single and batch queries.
Project description:In the genome, most occurrences of transcription factor binding sites (TFBS) have no cis-regulatory activity, which suggests that flanking sequences contain information that distinguishes functional from nonfunctional TFBS. We interrogated the role of flanking sequences near Activator Protein 1 (AP-1) binding sites that reside in DNase I Hypersensitive Sites (DHS) and regions annotated as Enhancers. In these regions, we found that sequence features directly adjacent to the core motif distinguish high from low activity AP-1 sites. Some nearby features are motifs for other TFs that genetically interact with the AP-1 site. Other features are extensions of the AP-1 core motif, which cause the extended sites to match motifs of multiple AP-1 binding proteins. Computational models trained on these data distinguish between sequences with high and low activity AP-1 sites and also predict changes in cis-regulatory activity due to mutations in AP-1 core sites and their flanking sequences. Our results suggest that extended AP-1 binding sites, together with adjacent binding sites for additional TFs, encode part of the information that governs TFBS activity in the genome.
Project description:Nucleosomes present a barrier to the binding of most transcription factors (TFs). However, special TFs known as nucleosome-displacing factors (NDFs) can access embedded sites and cause depletion of the local nucleosomes as well as repositioning of the neighboring nucleosomes. Here, we developed a novel high-throughput method in yeast to identify NDFs among 104 TFs and systematically characterized the impact of the orientation, affinity, location, and copy number of their binding motifs on nucleosome occupancy. Using this assay, we identified 29 NDF motifs and divided the nuclear TFs into three groups with strong, weak, and no nucleosome-displacing activity. Further studies revealed that tight DNA binding is the key property that underlies NDF activity, and that the NDFs may partially rely on DNA replication to compete with nucleosomes. Overall, our study presents a framework to functionally characterize NDFs and elucidate the mechanism of nucleosome invasion.
Project description:eIF4E-binding proteins (4E-BPs) are a widespread class of translational regulators that share a canonical (C) eIF4E-binding motif (4E-BM) with eIF4G. Consequently, 4E-BPs compete with eIF4G for binding to the dorsal surface on eIF4E to inhibit translation initiation. Some 4E-BPs contain non-canonical 4E-BMs (NC 4E-BMs), but the contribution of these motifs to the repressive mechanism--and whether these motifs are present in all 4E-BPs--remains unknown. Here, we show that the three annotated Drosophila melanogaster 4E-BPs contain NC 4E-BMs. These motifs bind to a lateral surface on eIF4E that is not used by eIF4G. This distinct molecular recognition mode is exploited by 4E-BPs to dock onto eIF4E-eIF4G complexes and effectively displace eIF4G from the dorsal surface of eIF4E. Our data reveal a hitherto unrecognized role for the NC4E-BMs and the lateral surface of eIF4E in 4E-BP-mediated translational repression, and suggest that bipartite 4E-BP mimics might represent efficient therapeutic tools to dampen translation during oncogenic transformation.