Dataset Information

A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data.


ABSTRACT: In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
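The idea of weighting sequence positions by ChIP-exo signal can be illustrated with a small sketch. This is not TESA's algorithm (the abstract does not give enough detail to reproduce the binomial, graph, or "bookend" models); it only shows, under simple assumptions, how per-base signal could bias k-mer scoring toward high-coverage regions. The function name `weighted_kmer_scores`, the toy sequences, and the coverage values are all hypothetical.

```python
from collections import defaultdict


def weighted_kmer_scores(sequences, signals, k):
    """Score each k-mer by the mean signal weight of every occurrence.

    Assumption: `signals[i][j]` is a non-negative weight (for example,
    ChIP-exo read coverage) for base j of `sequences[i]`.
    """
    scores = defaultdict(float)
    for seq, sig in zip(sequences, signals):
        if len(seq) != len(sig):
            raise ValueError("each sequence needs one weight per base")
        for start in range(len(seq) - k + 1):
            window = sig[start:start + k]
            # An occurrence contributes its mean per-base weight,
            # so k-mers under a signal peak count for more.
            scores[seq[start:start + k]] += sum(window) / k
    return dict(scores)


# Toy input: hypothetical sequences with made-up coverage values.
sequences = ["ACGTACGT", "TTACGTTT"]
signals = [
    [0, 0, 0, 0, 4, 4, 4, 4],
    [0, 0, 4, 4, 4, 4, 0, 0],
]

scores = weighted_kmer_scores(sequences, signals, k=4)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy input the occurrence of `ACGT` at the start of the first sequence lies outside the signal peak and contributes nothing, while the two occurrences under the peaks dominate the score.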

SUBMITTER: Li Y 

PROVIDER: S-EPMC10935504 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data.

Li Yang, Wang Yizhong, Wang Cankun, Ma Anjun, Ma Qin, Liu Bingqiang

Patterns (New York, N.Y.), 2024-02-02, issue 3

Similar Datasets

S-EPMC4611657 | biostudies-literature
S-EPMC6054642 | biostudies-literature
S-EPMC7672471 | biostudies-literature
S-EPMC2812490 | biostudies-literature
S-EPMC6902276 | biostudies-literature
S-EPMC3305290 | biostudies-literature
S-EPMC4227761 | biostudies-literature
S-EPMC11362851 | biostudies-literature
S-EPMC6735894 | biostudies-literature
S-EPMC3989762 | biostudies-literature