

Dataset Information


Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment.


ABSTRACT: Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) can reveal DNA regions bound by transcription factors (TFs). Analysis of ChIP-Seq regions is now a central component of gene regulation studies. The need remains strong for methods that improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBSs). We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of the spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data detects which TFBSs arise more frequently than expected by chance. Visualization of over-representation results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background-generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the local maxima of ChIP-Seq peaks highlights peaks likely to be directly bound by a TF of interest. The results suggest that, on average, two-thirds of a ChIP-Seq dataset's peaks are bound by the ChIP'd TF; the origin of the remaining peaks is undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11. Topological properties of TFBSs within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC-content-corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBSs, we can distinguish peaks likely to be directly bound by a TF. The new methods will empower researchers to explore gene regulation and TF binding.
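The idea of a composition-matched background mentioned in the abstract can be illustrated with a small sketch. This is not the BiasAway implementation; it is a minimal example that assumes a simple GC-content binning scheme, and the function names (`gc_fraction`, `gc_bin`, `sample_gc_matched_background`) and the 10-bin default are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Return the fraction of G/C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(seq, n_bins=10):
    """Map a sequence to an integer GC bin in [0, n_bins - 1]."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def sample_gc_matched_background(foreground, candidates, n_bins=10, seed=0):
    """Pick one candidate per foreground sequence from the same GC bin.

    Illustrative assumption: matching by coarse GC bin is enough to reduce
    composition bias. Foreground sequences with no remaining candidate in
    their bin are skipped, so the result may be shorter than the foreground.
    """
    rng = random.Random(seed)

    # Group candidate background sequences by GC bin and shuffle each group.
    pool = defaultdict(list)
    for seq in candidates:
        pool[gc_bin(seq, n_bins)].append(seq)
    for seqs in pool.values():
        rng.shuffle(seqs)

    # Draw without replacement, one background sequence per foreground peak.
    background = []
    for seq in foreground:
        bucket = pool[gc_bin(seq, n_bins)]
        if bucket:
            background.append(bucket.pop())
    return background


if __name__ == "__main__":
    peaks = ["GGCC", "ATAT"]
    genome_windows = ["GCGC", "TTAA", "ATGC"]
    print(sample_gc_matched_background(peaks, genome_windows))
```

Over-representation scores computed against a background drawn this way compare each motif's frequency in the peaks with its frequency in sequences of similar GC content, which is the kind of correction the abstract describes.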

SUBMITTER: Worsley Hunt R 

PROVIDER: S-EPMC4082612 | BioStudies | 2014-01-01

REPOSITORIES: biostudies

Similar Datasets

1000-01-01 | S-EPMC4234207 | BioStudies
1000-01-01 | S-EPMC3764009 | BioStudies
1000-01-01 | S-EPMC3287483 | BioStudies
1000-01-01 | S-EPMC3747862 | BioStudies
2019-01-01 | S-EPMC6323897 | BioStudies
1000-01-01 | S-EPMC3300004 | BioStudies
1000-01-01 | S-EPMC2822526 | BioStudies
1000-01-01 | S-EPMC6323985 | BioStudies
2011-01-01 | S-EPMC3166302 | BioStudies
2018-01-01 | S-EPMC6184544 | BioStudies