ChIPSummitDB: a ChIP-seq-based database of human transcription factor binding sites and the topological arrangements of the proteins bound to them.
ABSTRACT: ChIP-seq reveals genomic regions where proteins, e.g. transcription factors (TFs) interact with DNA. A substantial fraction of these regions, however, do not contain the cognate binding site for the TF of interest. This phenomenon might be explained by protein-protein interactions and co-precipitation of interacting gene regulatory elements. We uniformly processed 3727 human ChIP-seq data sets and determined the cistrome of 292 TFs, as well as the distances between the TF binding motif centers and the ChIP-seq peak summits. ChIPSummitDB enables the analysis of ChIP-seq data using multiple approaches. The 292 cistromes and corresponding ChIP-seq peak sets can be browsed in GenomeView. Overlapping SNPs can be inspected in dbSNPView. Most importantly, the MotifView and PairShiftView pages show the average distance between motif centers and overlapping ChIP-seq peak summits and distance distributions thereof, respectively. In addition to providing a comprehensive human TF binding site collection, the ChIPSummitDB database and web interface allows for the examination of the topological arrangement of TF complexes genome-wide. ChIPSummitDB is freely accessible at http://summit.med.unideb.hu/summitdb/. The database will be regularly updated and extended with the newly available human and mouse ChIP-seq data sets.
Project description:Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed a dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated with predicted interactions covering >4% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF-DNA interactions and cis-regulatory modules through the UniBind web-interface (http://unibind.uio.no).
Project description:Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs) which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction and do not account for flexible length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible, and can model both position interdependence within TFBSs and variable length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that convey properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
Project description:Transcription factors (TFs) often work cooperatively, where the binding of one TF to DNA enhances the binding affinity of a second TF to a nearby location. Such cooperative binding is important for activating gene expression from promoters and enhancers in both prokaryotic and eukaryotic cells. Existing methods to detect cooperative binding of a TF pair rely on analyzing the sequence that is bound. We propose a method that uses, instead, only ChIP-seq peak intensities and an expectation maximization (CPI-EM) algorithm. We validate our method using ChIP-seq data from cells where one of a pair of TFs under consideration has been genetically knocked out. Our algorithm relies on our observation that cooperative TF-TF binding is correlated with weak binding of one of the TFs, which we demonstrate in a variety of cell types, including E. coli, S. cerevisiae and M. musculus cells. We show that this method performs significantly better than a predictor based only on the ChIP-seq peak distance of the TFs under consideration. This suggests that peak intensities contain information that can help detect the cooperative binding of a TF pair. CPI-EM also outperforms an existing sequence-based algorithm in detecting cooperative binding. The CPI-EM algorithm is available at https://github.com/vishakad/cpi-em.
Project description:Post-translational modification (PTM) of transcriptional factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factor (TFs). This has greatly improved our understanding of protein-DNA interactions on a genomic-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translational modified TFs based on ChIP-seq analysis; this is largely due to the wide-spread presence of multiple modified TFs. Using SUMO-1 modification as an example; we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification.Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotate the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on the promoter proximity, TFBS annotation, a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs are able to be pinpointed using our pipeline.A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. TFBS annotation method is able to predict potential SUMOylated TFs. Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in future.
Project description:We describe a novel computational approach to identify transcription factors (TFs) that are candidate regulators in a human cell type of interest. Our approach involves integrating cell type-specific expression quantitative trait locus (eQTL) data and TF data from chromatin immunoprecipitation-to-tag-sequencing (ChIP-seq) experiments in cell lines. To test the method, we used eQTL data from human monocytes in order to screen for TFs. Using a list of known monocyte-regulating TFs, we tested the hypothesis that the binding sites of cell type-specific TF regulators would be concentrated in the vicinity of monocyte eQTLs. For each of 397 ChIP-seq data sets, we obtained an enrichment ratio for the number of ChIP-seq peaks that are located within monocyte eQTLs. We ranked ChIP-seq data sets according to their statistical significances for eQTL overlap, and from this ranking, we observed that monocyte-regulating TFs are more highly ranked than would be expected by chance. We identified 27 TFs that had significant monocyte enrichment scores and mapped them into a protein interaction network. Our analysis uncovered two novel candidate monocyte-regulating TFs, BCLAF1 and SIN3A. Our approach is an efficient method to identify candidate TFs that can be used for any cell/tissue type for which eQTL data are available.
Project description:Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) allows researchers to determine the genome-wide binding locations of individual transcription factors (TFs) at high resolution. This information can be interrogated to study various aspects of TF behaviour, including the mechanisms that control TF binding. Physical interaction between TFs comprises one important aspect of TF binding in eukaryotes, mediating tissue-specific gene expression. We have developed an algorithm, spaced motif analysis (SpaMo), which is able to infer physical interactions between the given TF and TFs bound at neighbouring sites at the DNA interface. The algorithm predicts TF interactions in half of the ChIP-seq data sets we test, with the majority of these predictions supported by direct evidence from the literature or evidence of homodimerization. High resolution motif spacing information obtained by this method can facilitate an improved understanding of individual TF complex structures. SpaMo can assist researchers in extracting maximum information relating to binding mechanisms from their TF ChIP-seq data. SpaMo is available for download and interactive use as part of the MEME Suite (http://meme.nbcr.net).
Project description:Although ChIP-seq has become a routine experimental approach for quantitatively characterizing the genome-wide binding of transcription factors (TFs), computational analysis procedures remain far from standardized, making it difficult to compare ChIP-seq results across experiments. In addition, although genome-wide binding patterns must ultimately be determined by local constellations of DNA-binding sites, current analysis is typically limited to identifying enriched motifs in ChIP-seq peaks. Here we present Crunch, a completely automated computational method that performs all ChIP-seq analysis from quality control through read mapping and peak detecting and that integrates comprehensive modeling of the ChIP signal in terms of known and novel binding motifs, quantifying the contribution of each motif and annotating which combinations of motifs explain each binding peak. By applying Crunch to 128 data sets from the ENCODE Project, we show that Crunch outperforms current peak finders and find that TFs naturally separate into "solitary TFs," for which a single motif explains the ChIP-peaks, and "cobinding TFs," for which multiple motifs co-occur within peaks. Moreover, for most data sets, the motifs that Crunch identified de novo outperform known motifs, and both the set of cobinding motifs and the top motif of solitary TFs are consistent across experiments and cell lines. Crunch is implemented as a web server, enabling standardized analysis of any collection of ChIP-seq data sets by simply uploading raw sequencing data. Results are provided both in a graphical web interface and as downloadable files.
Project description:Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and showed that network visualization provides a promising alternative to distinguish 'direct' versus 'indirect' TF interactions. We applied four algorithms to measure 'direct' dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network known as a 'TF association network' where edges in the network corresponded to likely causal pairwise relationships between TFs. The 'TF association network' illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein-protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-Seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.
Project description:The genome-wide analysis of the binding sites of the transcription factor vitamin D receptor (VDR) is essential for a global appreciation the physiological impact of the nuclear hormone 1?,25-dihydroxyvitamin D3 (1,25(OH)2D3). Genome-wide analysis of lipopolysaccharide (LPS)-polarized THP-1 human monocytic leukemia cells via chromatin immunoprecipitation sequencing (ChIP-seq) resulted in 1,318 high-confidence VDR binding sites, of which 789 and 364 occurred uniquely with and without 1,25(OH)2D3 stimulation, while only 165 were common. We re-analyzed five public VDR ChIP-seq datasets with identical peak calling settings (MACS, version 2) and found, using a novel consensus summit identification strategy, in total 23,409 non-overlapping VDR binding sites, 75% of which are unique within the six analyzed cellular models. LPS-differentiated THP-1 cells have 22% more genomic VDR locations than undifferentiated cells and both cell types display more overlap in their VDR locations than the other investigated cell types. In general, the intersection of VDR binding profiles of ligand-stimulated cells is higher than those of unstimulated cells. De novo binding site searches and HOMER screening for binding motifs formed by direct repeats spaced by three nucleotides (DR3) suggest for all six VDR ChIP-seq datasets that these sequences are found preferentially at highly ligand responsive VDR loci. Importantly, all VDR ChIP-seq datasets display the same relationship between the VDR occupancy and the percentage of DR3-type sequences below the peak summits. The comparative analysis of six VDR ChIP-seq datasets demonstrated that the mechanistic basis for the action of the VDR is independent of the cell type. Only the minority of genome-wide VDR binding sites contains a DR3-type sequence. Moreover, the total number of identified VDR binding sites in each ligand-stimulated cell line inversely correlates with the percentage of peak summits with DR3 sites.
Project description:BACKGROUND:Interactions among transcription factors (TFs) and histone modifications (HMs) play an important role in the precise regulation of gene expression. The context specificity of those interactions and further its dynamics in normal and disease remains largely unknown. Recent development in genomics technology enables transcription profiling by RNA-seq and protein's binding profiling by ChIP-seq. Integrative analysis of the two types of data allows us to investigate TFs and HMs interactions both from the genome co-localization and downstream target gene expression. RESULTS:We propose a integrative pipeline to explore the co-localization of 55 TFs and 11 HMs and its dynamics in human GM12878 and K562 by matched ChIP-seq and RNA-seq data from ENCODE. We classify TFs and HMs into three types based on their binding enrichment around transcription start site (TSS). Then a set of statistical indexes are proposed to characterize the TF-TF and TF-HM co-localizations. We found that Rad21, SMC3, and CTCF co-localized across five cell lines. High resolution Hi-C data in GM12878 shows that they associate most of the Hi-C peak loci with a specific CTCF-motif "anchor" and supports that CTCF, SMC3, and RAD2 co-localization serves important role in 3D chromatin structure. Meanwhile, 17 TF-TF pairs are highly dynamic between GM12878 and K562. We then build SVM models to correlate high and low expression level of target genes with TF binding and HM strength. We found that H3k9ac, H3k27ac, and three TFs (ELF1, TAF1, and POL2) are predictive with the accuracy about 85~92%. CONCLUSION:We propose a pipeline to analyze the co-localization of TF and HM and their dynamics across cell lines from ChIP-seq, and investigate their regulatory potency by RNA-seq. The integrative analysis of two level data reveals new insight for the cooperation of TFs and HMs and is helpful in understanding cell line specificity of TF/HM interactions.