Unknown

Dataset Information

0

DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements.


ABSTRACT: Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications has led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code.

SUBMITTER: Bredesen BA 

PROVIDER: S-EPMC6735708 | biostudies-literature | 2019 Sep

REPOSITORIES: biostudies-literature

altmetric image

Publications

DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements.

Bredesen Bjørn André BA   Rehmsmeier Marc M  

Nucleic acids research 20190901 15


Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications has l  ...[more]

Similar Datasets

| S-EPMC3401425 | biostudies-literature
| S-EPMC3261919 | biostudies-literature
| S-EPMC4157523 | biostudies-literature
| S-EPMC4258113 | biostudies-literature
| S-EPMC3852391 | biostudies-other
| S-EPMC507012 | biostudies-literature
| S-EPMC3080135 | biostudies-literature
| S-EPMC2573935 | biostudies-literature
| S-EPMC1941339 | biostudies-literature