The analysis of novel distal Cebpa enhancers and silencers using a transcriptional model reveals the complex regulatory logic of hematopoietic lineage specification.
ABSTRACT: C/EBPα plays an instructive role in the macrophage-neutrophil cell-fate decision and its expression is necessary for neutrophil development. How Cebpa itself is regulated in the myeloid lineage is not known. We decoded the cis-regulatory logic of Cebpa, and two other myeloid transcription factors, Egr1 and Egr2, using a combined experimental-computational approach. With a reporter design capable of detecting both distal enhancers and silencers, we analyzed 46 putative cis-regulatory modules (CRMs) in cells representing myeloid progenitors, and derived early macrophages or neutrophils. In addition to novel enhancers, this analysis revealed a surprisingly large number of silencers. We determined the regulatory roles of 15 potential transcriptional regulators by testing 32,768 alternative sequence-based transcriptional models against CRM activity data. This comprehensive analysis allowed us to infer the cis-regulatory logic for most of the CRMs. Silencer-mediated repression of Cebpa was found to be effected mainly by TFs expressed in non-myeloid lineages, highlighting a previously unappreciated contribution of long-distance silencing to hematopoietic lineage resolution. The repression of Cebpa by multiple factors expressed in alternative lineages suggests that hematopoietic genes are organized into densely interconnected repressive networks instead of hierarchies of mutually repressive pairs of pivotal TFs. More generally, our results demonstrate that de novo cis-regulatory dissection is feasible on a large scale with the aid of transcriptional modeling. Current address: Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, USA.
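The figure of 32,768 alternative models equals 2^15, which is what one would get if each of the 15 candidate regulators were assigned one of two roles. The abstract does not say how the model space was actually defined, so the sketch below is only an illustration under that assumption; the TF names and the role labels are placeholders.

```python
from itertools import product

def enumerate_role_assignments(tfs, roles=("activator", "repressor")):
    """Yield one candidate model per combination of roles.

    Assumption (not stated in the abstract): each TF takes exactly one
    of two roles, so n TFs give 2**n candidate models.
    """
    for combo in product(roles, repeat=len(tfs)):
        yield dict(zip(tfs, combo))

# Placeholder names standing in for the 15 candidate regulators.
tfs = [f"TF{i}" for i in range(1, 16)]

# Count the candidate models without holding them all in memory.
n_models = sum(1 for _ in enumerate_role_assignments(tfs))
print(n_models)
```

In a real analysis each candidate would be fitted to the CRM activity data and the candidates compared by goodness of fit; that step is omitted here.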
Project description:Cebpa encodes a transcription factor (TF) that plays an instructive role in the development of multiple myeloid lineages. The expression of Cebpa itself is finely modulated: Cebpa is expressed at high and intermediate levels in neutrophils and macrophages, respectively, and is downregulated in non-myeloid lineages. The cis-regulatory logic underlying the lineage-specific modulation of Cebpa's expression level is yet to be fully characterized. Previously, we had identified 6 new cis-regulatory modules (CRMs) in a 78 kb region surrounding Cebpa. We had also inferred the TFs that regulate each CRM by fitting a sequence-based thermodynamic model to a comprehensive reporter activity dataset. Here, we report the cis-regulatory logic of Cebpa CRMs at the resolution of individual binding sites. We tested the binding sites and functional roles of inferred TFs by designing and constructing mutated CRMs and comparing theoretical predictions of their activity against empirical measurements in a myeloid cell line. The enhancers were confirmed to be activated by combinations of PU.1, C/EBP family TFs, Egr1, and Gfi1 as predicted by the model. We show that silencers repress the activity of the proximal promoter in a dominant manner in G1ME cells, which are derived from the red-blood cell lineage. Dominant repression in G1ME cells can be traced to binding sites for GATA and Myb, a motif shared by all of the silencers. Finally, we demonstrate that GATA and Myb act redundantly to silence the proximal promoter. These results indicate that dominant repression is a novel mechanism for resolving hematopoietic lineages. Furthermore, Cebpa has a fail-safe cis-regulatory architecture, featuring several functionally similar CRMs, each of which contains redundant binding sites for multiple TFs.
Lastly, by experimentally demonstrating the predictive ability of our sequence-based thermodynamic model, this work highlights the utility of this computational approach for understanding mammalian gene regulation.
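To make concrete the idea of comparing predicted activities of wild-type and mutated CRMs, here is a minimal textbook-style sketch of a sequence-based thermodynamic calculation. It is not the authors' model: the occupancy formula assumes independent binding sites, and the affinities, concentrations, `effect` and `basal` parameters are all hypothetical values chosen for illustration. PU.1 is treated as an activator and GATA as a repressor, in line with the roles described above.

```python
import math

def site_occupancy(affinity, concentration):
    """Equilibrium occupancy of a single site: Kc / (1 + Kc)."""
    weight = affinity * concentration
    return weight / (1.0 + weight)

def crm_activity(sites, concentrations, roles, effect=4.0, basal=-2.0):
    """Predicted CRM activity in (0, 1).

    sites: list of (tf_name, affinity) pairs, one per binding site.
    roles: tf_name -> +1 (activator) or -1 (repressor).
    effect, basal: hypothetical scaling parameters, not fitted values.
    """
    drive = basal
    for tf, affinity in sites:
        occupancy = site_occupancy(affinity, concentrations.get(tf, 0.0))
        drive += roles[tf] * effect * occupancy
    # Logistic link maps the summed regulatory input to an activity level.
    return 1.0 / (1.0 + math.exp(-drive))

# Hypothetical TF levels and roles for one cell type.
concentrations = {"PU.1": 1.0, "GATA": 1.0}
roles = {"PU.1": +1, "GATA": -1}

wild_type = [("PU.1", 2.0), ("PU.1", 1.0), ("GATA", 0.5)]
mutant = [("PU.1", 1.0), ("GATA", 0.5)]  # strongest PU.1 site deleted

wt_activity = crm_activity(wild_type, concentrations, roles)
mut_activity = crm_activity(mutant, concentrations, roles)
print(wt_activity, mut_activity)
```

Deleting an activator site lowers the predicted activity, and that kind of prediction can then be checked against a reporter measurement of the mutated construct.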
Project description:Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation. The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity. These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.
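One simple way to picture a score that depends on both the number and the strength of TF interactions in a region is a pairwise sum over the binding sites inside a window. The abstract does not give the functional form of the empirical potential, so the function below, the `strength` table, and the site coordinates are all hypothetical.

```python
def window_energy(sites, start, end, strength):
    """Toy interaction energy for the window [start, end).

    sites: list of (position, tf_name) binding-site predictions.
    strength: maps frozenset({tf_a, tf_b}) to an interaction strength.
    Each interacting pair inside the window lowers the energy, so more
    interactions and stronger interactions both give a lower score.
    """
    in_window = [tf for pos, tf in sites if start <= pos < end]
    energy = 0.0
    for i in range(len(in_window)):
        for j in range(i + 1, len(in_window)):
            pair = frozenset((in_window[i], in_window[j]))
            energy -= strength.get(pair, 0.0)
    return energy

# Hypothetical inputs: two nearby sites for an interacting pair, one distant site.
sites = [(10, "A"), (40, "B"), (500, "A")]
strength = {frozenset(("A", "B")): 1.5}

dense_window = window_energy(sites, 0, 100, strength)
sparse_window = window_energy(sites, 400, 600, strength)
print(dense_window, sparse_window)
```

Scanning such a function along a genome and keeping the lowest-energy windows would flag regions enriched for sites of interacting TFs, without consulting sequence conservation.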
Project description:<h4>Background</h4>Developmental transcriptional regulatory networks are circuits of transcription factors (TFs) and cis-acting DNA elements (cis-regulatory modules, CRMs) that dynamically control expression of downstream genes. Comprehensive knowledge of these networks is an essential step towards our understanding of developmental processes. However, this knowledge is mostly based on genome-wide mapping of transcription factor binding sites, and therefore requires prior knowledge regarding the TFs involved in the network.<h4>Results</h4>Focusing on how temporal control of gene expression is integrated within a developmental network, we applied an in silico approach to discover regulatory motifs and CRMs of co-expressed genes, with no prior knowledge about the involved TFs. Our aim was to identify regulatory motifs and potential trans-acting factors which regulate the temporal expression of co-expressed gene sets during a particular process of organogenesis, namely adult heart formation in Drosophila. Starting from whole-genome, tissue-specific expression dynamics, we used an in silico method, cisTargetX, to predict TF binding motifs and CRMs. Potential Nuclear Receptor (NR) binding motifs were predicted to control the temporal expression profile of a gene set with increased expression levels during mid-metamorphosis. The predicted CRMs and NR motifs were validated in vivo by reporter gene assays. In addition, we provide evidence that three NRs modulate CRM activity and behave as temporal regulators of target enhancers.<h4>Conclusions</h4>Our approach was successful in identifying CRMs and potential TFs acting on the temporal regulation of target genes. In addition, our results suggest a modular architecture of the regulatory machinery, in which the temporal and spatial regulation can be uncoupled and encoded by distinct CRMs.
Project description:<h4>Background</h4>The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBS). In silico methods have been developed to search for CRMs by determining the combination of TFBS that are statistically overrepresented in a certain gene set. Most of these methods solve this combinatorial problem by relying on computationally intensive optimization methods. As a result, their usage is limited to finding CRMs in small datasets (containing only a few genes) and to binding sites for a restricted number of transcription factors (TFs), from which the optimal module is selected.<h4>Results</h4>We present an itemset-mining-based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it to a large benchmark data set derived from a ChIP-chip analysis, and compared its performance with that of other well-known cis-regulatory module detection tools.<h4>Conclusion</h4>We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize the biologically valid CRMs in a large set of coregulated genes using binding sites for a large number of potential TFs as input.
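The itemset view of the problem treats each gene as a "transaction" whose items are the TFs with a binding site near it; a candidate CRM is then a TF combination that recurs in enough genes. The sketch below is a naive support count, not the tool described in the abstract: it omits the pruning that makes real itemset miners efficient and the statistical scoring step, and the gene and TF names are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_tf_sets(gene_to_tfs, min_support, max_size=3):
    """Return TF combinations found together in at least min_support genes.

    gene_to_tfs: gene name -> set of TFs with a predicted site at that gene.
    Only combinations of size 2..max_size are counted.
    """
    counts = Counter()
    for tfs in gene_to_tfs.values():
        ordered = sorted(tfs)  # canonical order so each set has one key
        for size in range(2, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}

# Hypothetical coregulated gene set.
genes = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"A", "B", "D"},
    "g4": {"C", "D"},
}

modules = frequent_tf_sets(genes, min_support=3)
print(modules)
```

A full method would then rank the surviving combinations by how unlikely their support is in a background gene set.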
Project description:Gene transcriptional regulation relies on cis-regulatory DNA modules (CRMs), which serve as nexus sites for integration of multiple transcription factor (TF) activities. Here, we provide evidence and discuss recent literature indicating that TF recruitment to CRMs is organized into combinations of trans-regulatory protein modules (TRMs). We propose that TRMs are functional entities composed of TFs displaying the most highly interdependent chromatin binding which are, in addition, able to modulate their recruitment to CRMs through inter-TRM effects. These findings shed light on the architectural organization of TF recruitment encoded by their recognition motifs within CRMs.
Project description:<h4>Background</h4>In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes owing to the difficulty of characterizing them by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, how to effectively mine these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task.<h4>Results</h4>We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo predictions of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets in an effective way. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 and 746 overrepresented CRE motifs and their combinatorial patterns, respectively, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences.<h4>Conclusion</h4>Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional.
Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.
Project description:Accumulating evidence indicates that transcription factor (TF) binding sites, or cis-regulatory elements (CREs), and their clusters termed cis-regulatory modules (CRMs) play a more important role than do gene-coding sequences in specifying complex traits in humans, including the susceptibility to common complex diseases. To fully characterize their roles in deriving the complex traits/diseases, it is necessary to annotate all CREs and CRMs encoded in the human genome. However, the current annotations of CREs and CRMs in the human genome are still very limited and mostly coarse-grained, as they often lack detailed information on the CREs within CRMs. Here, we integrated 620 TF ChIP-seq datasets produced by the ENCODE project for 168 TFs in 79 different cell/tissue types and predicted an unprecedentedly complete map of CREs in CRMs in the human genome at single-nucleotide resolution. The map includes 305 912 CRMs containing a total of 1 178 913 CREs belonging to 736 unique TF binding motifs. The predicted CREs and CRMs tend to be subject to either purifying selection or positive selection, and thus are likely to be functional. Based on the results, we also examined the status of available ChIP-seq datasets for predicting the entire regulatory genome of humans.
Project description:Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites based on inferred cis-regulatory modules (CRMs). CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.
Project description:Head specification by the head-selector gene, orthodenticle (otx), is highly conserved among bilaterian lineages. However, the molecular mechanisms by which Otx and other transcription factors (TFs) interact with the genome to direct head formation are largely unknown. Here we employ ChIP-seq and RNA-seq approaches in Xenopus tropicalis gastrulae and find that occupancy of the corepressor, TLE/Groucho, is a better indicator of tissue-specific cis-regulatory modules (CRMs) than the coactivator p300, during early embryonic stages. On the basis of TLE binding and comprehensive CRM profiling, we define two distinct types of Otx2- and TLE-occupied CRMs. Using these devices, Otx2 and other head organizer TFs (for example, Lim1/Lhx1 (activator) or Goosecoid (repressor)) are able to upregulate or downregulate a large battery of target genes in the head organizer. An underlying principle is that Otx marks target genes for head specification to be regulated positively or negatively by partner TFs through specific types of CRMs.
Project description:As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two-thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.