Genomics

Dataset Information


Evaluation of methods for modeling transcription factor sequence specificity


ABSTRACT: Genomic analyses often involve scanning for potential transcription-factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein’s binding specificity by representing sequence motifs, including the gaps and dependencies between binding-site residues, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For 9 TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices learned by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases. In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
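The abstract refers to scanning sequences with mononucleotide position weight matrices (PWMs) and to the information content of motifs. The following is a minimal illustrative sketch of those two ideas. It is not one of the 26 methods evaluated in the study. It assumes a uniform background (0.25 per base), a pseudocount of 0.25, and information content measured in bits against that background. All function names and the toy count matrix are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def counts_to_probs(counts, pseudocount=0.25):
    """Convert per-position base counts into smoothed probabilities."""
    probs = []
    for col in counts:
        total = sum(col[b] for b in BASES) + pseudocount * len(BASES)
        probs.append({b: (col[b] + pseudocount) / total for b in BASES})
    return probs


def information_content(probs):
    """Total information content in bits, assuming a uniform background.

    Each position contributes 2 - H, where H is its Shannon entropy.
    Lower totals indicate more degenerate motifs.
    """
    ic = 0.0
    for col in probs:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        ic += 2.0 - entropy
    return ic


def log_odds(probs, background=None):
    """Build a log2-odds scoring matrix (uniform background by default)."""
    bg = background or {b: 0.25 for b in BASES}
    return [{b: math.log2(col[b] / bg[b]) for b in BASES} for col in probs]


def score_window(window, pwm):
    """Sum the per-position log-odds scores of one window."""
    return sum(pwm[j][base] for j, base in enumerate(window))


def scan_both_strands(sequence, pwm):
    """Score every window on both strands and keep the better score.

    Returns (start_position, score) tuples. Windows that contain
    non-ACGT characters (e.g. N) are skipped.
    """
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(c not in BASES for c in window):
            continue
        rev_comp = "".join(COMPLEMENT[c] for c in reversed(window))
        hits.append((i, max(score_window(window, pwm),
                            score_window(rev_comp, pwm))))
    return hits


# Hypothetical toy motif: A, then C, then G or T with equal counts.
TOY_COUNTS = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 5, "T": 5},
]

if __name__ == "__main__":
    probs = counts_to_probs(TOY_COUNTS)
    print("information content (bits):", round(information_content(probs), 2))
    hits = scan_both_strands("GGGACTGGG", log_odds(probs))
    print("best hit:", max(hits, key=lambda h: h[1]))
```

The degenerate third position (G or T) lowers the motif's total information content. This relates, loosely, to the abstract's observation that the best-performing motifs tend to have relatively low information content. Scanning both strands reflects the fact that a TF can bind either strand of double-stranded DNA.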

ORGANISM(S): synthetic construct

PROVIDER: GSE42864 | GEO | 2012/12/13

SECONDARY ACCESSION(S): PRJNA183675

REPOSITORIES: GEO

Similar Datasets

2012-12-13 | E-GEOD-42864 | biostudies-arrayexpress
2013-03-29 | E-GEOD-44437 | biostudies-arrayexpress
2013-03-29 | GSE44437 | GEO
2013-03-29 | GSE44436 | GEO
2023-12-31 | GSE227873 | GEO
2013-05-17 | GSE47026 | GEO
2013-05-17 | E-GEOD-47026 | biostudies-arrayexpress
2015-04-23 | E-GEOD-65719 | biostudies-arrayexpress
2013-03-29 | E-GEOD-44436 | biostudies-arrayexpress
2021-03-24 | GSE145090 | GEO