Exploring spatially adjacent TFBS-clustered regions with Hi-C data.
ABSTRACT: Transcription factor binding sites (TFBSs) are clustered in the human genome, forming the TFBS-clustered regions that regulate gene transcription, which requires dynamic chromatin configurations between promoters and distal regulatory elements. Here, we propose a regulatory model called spatially adjacent TFBS-clustered regions (SATs), in which TFBS-clustered regions are connected by spatial proximity as identified by high-resolution Hi-C data. TFBS-clustered regions forming SATs appeared less frequently in gene promoters than did isolated TFBS-clustered regions, whereas SATs as a whole appeared more frequently. These observations indicate that multiple distal TFBS-clustered regions combined to form SATs to regulate genes. Further examination confirmed that a substantial portion of genes regulated by SATs were located between the paired TFBS-clustered regions rather than downstream of them. We reconstructed the chromosomal conformation of the H1 human embryonic stem cell line using the ShRec3D algorithm and proposed the SAT regulatory model. Supplementary data are available at Bioinformatics online.
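The pairing idea behind SATs can be illustrated with a minimal Python sketch that links TFBS-clustered regions whose genomic bins are joined by a Hi-C contact. This is an illustration under stated assumptions, not the authors' pipeline: the function name `find_sat_pairs`, the 5 kb bin size, and the contact-list format are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def find_sat_pairs(regions, contacts, bin_size=5000):
    """Return sorted pairs of region indices linked by a Hi-C contact.

    regions  : list of (chrom, start, end) TFBS-clustered regions
    contacts : iterable of ((chrom, bin_a), (chrom, bin_b)) contacts that
               were already called significant (hypothetical input format)
    bin_size : Hi-C resolution in bp (assumed value, not from the paper)
    """
    # Map every Hi-C bin to the regions that overlap it.
    bin_to_regions = defaultdict(set)
    for idx, (chrom, start, end) in enumerate(regions):
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bin_to_regions[(chrom, b)].add(idx)

    # Two regions form a candidate pair when their bins are in contact.
    pairs = set()
    for bin_a, bin_b in contacts:
        left = bin_to_regions.get(bin_a, ())
        right = bin_to_regions.get(bin_b, ())
        for i, j in product(left, right):
            if i != j:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)
```

Regions that appear in no pair would correspond to the isolated TFBS-clustered regions mentioned in the abstract.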
Project description:BACKGROUND: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large number of predicted sites it produces for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters and to what extent. This study aims to answer some of these questions. RESULTS: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI. CONCLUSION: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
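The partial correlation step can be sketched with the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)), which gives the correlation between x and y after the linear effect of z is removed. In the Python sketch below, x, y, and z are assumed to stand for per-sequence values of the promoter, TFBS-cluster, and CGI properties; that encoding and the function names are illustrative assumptions, since the abstract does not specify them.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y are each driven by z plus independent noise, so their
# raw correlation is 0.5 but the partial correlation given z is zero.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
print(pearson(x, y), partial_corr(x, y, z))
```

A PWM whose cluster-promoter correlation vanishes once CGI is controlled for would, in this reading, fall into the CGI-associated group.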
Project description:DNase I hypersensitive sites (DHSs) define the accessible chromatin landscape and have revolutionised the discovery of distinct cis-regulatory elements in diverse organisms. Here, we report the first comprehensive map of human transcription factor binding site (TFBS)-clustered regions using Gaussian kernel density estimation based on genome-wide mapping of the TFBSs in 133 human cell and tissue types. Approximately 1.6 million distinct TFBS-clustered regions, collectively spanning 27.7% of the human genome, were discovered. The TFBS complexity assigned to each TFBS-clustered region was highly correlated with genomic location, cell selectivity, evolutionary conservation, sequence features, and functional roles. An integrative analysis of these regions using ENCODE data revealed transcription factor occupancy, transcriptional activity, histone modification, DNA methylation, and chromatin structures that varied based on TFBS complexity. Furthermore, we found that we could recreate lineage-branching relationships by simple clustering of the TFBS-clustered regions from terminally differentiated cells. Based on these findings, a model of transcriptional regulation determined by TFBS complexity is proposed.
Project description:COTRASIF is a web-based tool for the genome-wide search of evolutionarily conserved regulatory regions (transcription factor-binding sites, TFBS) in eukaryotic gene promoters. Predictions are made using either a position-weight matrix search method or a hidden Markov model search method, depending on the availability of the matrix and actual sequences of the target TFBS. COTRASIF is a fully integrated solution incorporating a gene promoter database (based on the regular Ensembl genome annotation releases) together with the JASPAR and TRANSFAC databases of TFBS matrices. To decrease the false-positive rate, an integrated evolutionary conservation filter is available, which allows the selection of only those predicted TFBS that are present in the promoters of the related species' orthologous genes. COTRASIF is very easy to use, implements a regularly updated database of promoters and is a powerful solution for genome-wide TFBS searching. COTRASIF is freely available at http://biomed.org.ua/COTRASIF/.
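A position-weight matrix search of the general kind described here can be sketched as a sliding-window log-odds scan. The Python example below is a generic illustration and makes no claim about COTRASIF's actual implementation: the pseudocount, the uniform background of 0.25, the score threshold, and the function names are all assumptions.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts : list of dicts, one per motif position, mapping base -> count
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(matrix, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif built from ten aligned copies of TATA.
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_matrix = log_odds_matrix(toy_counts)
print(scan("GGTATAGG", toy_matrix, threshold=5.0))
```

A conservation filter like the one described would then keep only hits that also appear in the promoters of orthologous genes.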
Project description:The development of cisplatin resistance in human cancers is controlled by multiple genes and leads to therapeutic failure. Hypermethylation of specific gene promoters is a key event in clinical resistance to cisplatin. Although the usage of multiple promoters is frequent in the transcription of human genes, the role of alternative promoters and their regulatory sequences has not yet been investigated in cisplatin resistance genes. In a new approach, we hypothesized that human cancers exploit the specific transcription factor-binding sites (TFBS) and CpG islands (CGIs) located in the alternative promoters of certain genes to acquire platinum drug resistance. To provide a useful resource of regulatory elements associated with cisplatin resistance, we investigated the TFBS and CGIs in 48 alternative promoters of 14 previously reported hypermethylated cisplatin resistance genes. CGIs prone to methylation were identified in 28 alternative promoters of 11 hypermethylated genes. The majority of alternative promoters harboring CGIs (93%) were clustered in one phylogenetic subclass, whereas the ones lacking CGIs were distributed in two unrelated subclasses. Regulatory sequences, initiator and TATA-532 prevailed over TATA-8 and were found in all the promoters. B recognition element (BRE) sequences were present only in alternative promoters harboring CGIs, but CCAAT and TAACC were found in both types of alternative promoters, whereas downstream promoter element sequences were significantly less frequent. Therefore, it was hypothesized that BRE and CGI sequences co-localized in alternative promoters of cisplatin resistance genes may be used to design molecular markers for drug resistance. A more extensive knowledge of alternative promoters and their regulatory elements in clinical resistance to cisplatin is likely to open novel avenues for sensitizing human cancers to treatment.
Project description:Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 to 30 bp (X(4)-N(1-30)-X(4)) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif ((C/G)CCGGAAGCGGAA) and the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE motif. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif.
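The X(4)-N(1-30)-X(4) pattern can be made concrete with a short Python sketch that enumerates every split 8-mer in a sequence as a (left 4-mer, gap length, right 4-mer) triple. This only illustrates the enumeration; the function name is hypothetical, and the statistics the authors used to decide which triples localize in promoters are not reproduced here.

```python
from collections import Counter


def split_8mers(sequence, min_gap=1, max_gap=30):
    """Count (left 4-mer, gap length, right 4-mer) triples in a sequence."""
    counts = Counter()
    n = len(sequence)
    # The last usable start leaves room for 4 + min_gap + 4 bases.
    for i in range(n - 8 - min_gap + 1):
        left = sequence[i:i + 4]
        for gap in range(min_gap, max_gap + 1):
            j = i + 4 + gap           # start of the right-hand 4-mer
            if j + 4 > n:
                break                 # longer gaps would run off the end
            counts[(left, gap, sequence[j:j + 4])] += 1
    return counts


# Toy example: AAAA and GGGG separated by a single base.
print(split_8mers("AAAACGGGG"))
```

Counting each triple separately in promoter and non-promoter sequence would be one way to look for pairs that occur at a preferred spacing.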
Project description:Germin-like proteins (GLPs) are involved in biotic and abiotic stress tolerance in different plant species. The rice (Oryza sativa L.) genome contains about 40 GLP family members on nine chromosomes. Although some of the rice GLP (OsGLP) promoters have been studied through in silico analysis as well as experimentally, the distribution pattern of the biotic and abiotic stress-associated transcription factor binding sites (TFbs) in the promoter regions of OsGLP genes has not been studied thoroughly. Several transcription factors (TFs), namely NAC, WRKY, bHLH, bZIP, MYB and AP2/ERF, act as major TFs concerned with biotic as well as abiotic stress responses across various plant species. In the present study, in silico analysis was carried out on the 1.5 kilobase (kb) promoter regions of 40 different OsGLP genes for the presence of NAC, WRKY, bHLH, bZIP, MYB and AP2/ERF TFbs. Among the various OsGLP gene promoters, OsGLP8-11 was found to contain the highest number of tested TFbs in the promoter region, whereas the promoter region of OsGLP5-1 contained the fewest. A phylogenetic study of the promoter regions of different OsGLP genes revealed four different clades. Our analyses could reveal the evolutionary significance of different OsGLP gene promoters. It can be presumed from the present findings as well as previous reports that OsGLP gene duplications and subsequent variations in the TFbs in OsGLP gene promoter regions might be the consequences of neofunctionalization of OsGLP genes and their promoters for biotic and abiotic stress tolerance in rice.
Project description:Purpose: 224 GSM samples from GSE32970 and GSE29692 were reanalyzed to find the TFBS-clustered regions of 133 cell lines. TFBS-clustered regions were divided into ten classes according to TF complexity. Methods: 1. We assigned the binding sites of 542 TFs in 133 cell lines as in our record GSE53962. 2. We performed a Gaussian kernel density estimation across the genome with a bandwidth of 300 bp, using the centers of each of the TF binding peaks as points. Then, we scanned this density for peaks and denoted each peak a TF region. To determine the complexity of the TF region, we summed the Gaussian kernelized distance from the peak to each TF that contributed at least 0.1 to its strength. The TF region around each peak was derived by finding the maximum distance (in bp) from the peak to a contributing TF and then adding 150 bp (one half of the bandwidth). Each TF region is centered on the peak and has a TF complexity value. 3. According to TFBS complexity, we divided these TFBS-clustered regions into ten classes, from TC0 to TC9, with increasing TFBS complexity. Result: Using the binding sites of 542 TFs in 133 cell lines, we assigned a TF complexity score to each TF region corresponding to the number of distinct TFs bound, resulting in ten classes of TFBS-clustered regions for 133 cell lines. Overall design: Ten classes of TFBS-clustered regions for 133 cell lines. 54 GSM samples from GSE32970 and 164 GSM samples from GSE29692 were combined into 133 cell types.
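Step 2 of the Methods can be sketched in Python as follows. The 300 bp bandwidth, the 0.1 contribution cutoff, and the 150 bp padding come from the text above. Everything else is an assumption made for illustration: the kernel is taken to be an unnormalised Gaussian with sigma equal to the bandwidth (so a TF exactly at the peak contributes 1.0), the density is scanned on a 10 bp grid, and the function names are hypothetical. The TC0 to TC9 class boundaries are not given in the record and are not modelled.

```python
import math

BANDWIDTH = 300          # bp, from the Methods text
MIN_CONTRIBUTION = 0.1   # from the Methods text


def kernel(distance, bandwidth=BANDWIDTH):
    """Unnormalised Gaussian kernel (assumed form; equals 1.0 at distance 0)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def density(position, centers):
    """Kernel density at a position, summed over all TF peak centers."""
    return sum(kernel(position - c) for c in centers)


def find_peaks(centers, step=10):
    """Return grid positions that are local maxima of the density."""
    lo = min(centers) - 3 * BANDWIDTH
    hi = max(centers) + 3 * BANDWIDTH
    grid = list(range(lo, hi + 1, step))
    values = [density(p, centers) for p in grid]
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


def tf_region(peak, centers):
    """Return (start, end, complexity) for the TF region around a peak."""
    contributing = [c for c in centers if kernel(peak - c) >= MIN_CONTRIBUTION]
    complexity = sum(kernel(peak - c) for c in contributing)
    # Furthest contributing TF plus half the bandwidth (150 bp).
    half_width = max((abs(peak - c) for c in contributing), default=0) + BANDWIDTH // 2
    return peak - half_width, peak + half_width, complexity


# Toy input: three TF peak centers close together and one isolated center.
toy_centers = [1000, 1010, 1020, 5000]
for peak in find_peaks(toy_centers):
    print(peak, tf_region(peak, toy_centers))
```

With this toy input, the clustered centers yield a single region with a complexity close to 3, while the isolated center yields a region with a complexity of 1, in line with the statement that the score corresponds to the number of distinct TFs bound.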
Project description:The phenomenon of functional site turnover has important implications for the study of regulatory region evolution, such as for promoter sequence alignments and transcription factor binding site (TFBS) identification. At present, it remains difficult to estimate TFBS turnover rates on real genomic sequences, as reliable mappings of functional sites across related species are often not available. As an alternative, we introduce a flexible new simulation system, Phylogenetic Simulation of Promoter Evolution (PSPE), designed to study functional site turnovers in regulatory sequences. Using PSPE, we study replacement turnover rates of different individual TFBSs and simple modules of two sites under neutral evolutionary functional constraints. We find that TFBS replacement turnover can happen rapidly in promoters, and turnover rates vary significantly among different TFBSs and modules. We assess the influence of different constraints such as insertion/deletion rate and translocation distances. Complementing the simulations, we give simple but effective mathematical models for TFBS turnover rate prediction. As one important application of PSPE, we also present a first systematic evaluation of multiple sequence aligners regarding their capability of detecting TFBSs in promoters with site turnovers. PSPE allows researchers for the first time to investigate TFBS replacement turnovers in promoters systematically. The assessment of alignment tools points out the limitations of current approaches to identify TFBSs in non-coding sequences, where turnover events of functional sites may happen frequently, and where we are interested in assessing the similarity on the functional level. PSPE is freely available at the authors' website.
Project description:Gene expression is regulated by combinations of transcription factors, which can be mapped to regulatory elements on a genome-wide scale using ChIP experiments. In a previous ChIP-chip study of USF1 and USF2 we also found evidence of binding of GABP, FOXA2 and HNF4a within the enriched regions. Here, we have applied ChIP-seq for these transcription factors and identified 3064 peaks of enrichment for GABP, 7266 for FOXA2 and 18783 for HNF4a. Distal elements with USF2 signal were frequently also bound by HNF4a and FOXA2. GABP peaks were found at transcription start sites, whereas 94% of FOXA2 and 90% of HNF4a peaks were located at other positions. We developed a method to accurately define TFBS within peaks, and found the predicted sites to have an elevated conservation level compared to peak centers; however, the majority of binding sites were not evolutionarily conserved. An interaction between HNF4a and GABP was seen at TSS, with one-third of the HNF4a-positive promoters also being bound by GABP, and this interaction was verified by co-immunoprecipitations.
Project description:Increasing data show that intron-derived regulatory elements, such as transcription factor binding sites (TFBs), play key roles in gene regulation and malfunction. Accordingly, characterizing the sequence context of the intronic regions of the human coagulation factor VIII (hFVIII) gene can be important. In this study, the intronic regions of the hFVIII gene were scrutinized using in silico methods. The results disclosed that these regions harbor a rich array of functional elements such as repetitive elements (REs), splicing sites, and TFBs. Among these elements, TFBs and REs showed a significant distribution and correlation with each other. This survey indicated that 31% of TFBs are localized in the intronic regions of the gene. Moreover, TFBs show a strong bias toward regions far from the splice sites of introns and map to different REs. Accordingly, TFBs were highly biased toward Short Interspersed Elements (SINEs), which in turn cover about 12% of the total REs. However, the distribution pattern of TFBs-REs showed a different bias across the intronic regions, especially in introns 13 and 25. A rich array of SINE-TFBs and CR1-TFBs was situated within the 5'UTR of the gene, which may be an important driving force for regulatory innovation of the hFVIII gene. Taken together, these data may help reveal intronic regions with the capacity to renew the gene regulatory networks of the hFVIII gene. On the other hand, these correlations might provide a novel idea for a new hypothesis on the molecular evolution of the FVIII gene and on the treatment of Hemophilia A, which should be considered in future studies.