Dataset Information

Anchor: trans-cell type prediction of transcription factor binding sites.

ABSTRACT: The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution is to build machine learning models trained on currently available epigenomic data sets that can be applied to the remaining missing pairs. A major challenge is that TF binding sites are cell-type-specific, which can be attributed to cellular contexts such as chromatin accessibility. Meanwhile, indirect TF-DNA binding and interactions between TFs complicate this regulatory process. Technical issues such as sequencing biases and batch effects render the prediction task even more challenging. Many pioneering efforts have been made to predict TF binding profiles based on DNA sequence and DNase-seq footprints, but to what extent a model can be generalized to completely untested cell conditions remains unknown. In this study, we describe our first-place solution to the 2017 ENCODE-DREAM in vivo TF binding site prediction challenge. By carefully addressing multisource biases and information imbalance across cell types, we created a pipeline that significantly outperforms the current state-of-the-art methods. The proposed method is sufficiently complex to model nonlinear interactions between TF binding motifs and chromatin accessibility information up to 1500 bp from the genomic region of interest.

SUBMITTER: Li H

PROVIDER: S-EPMC6360811 | biostudies-other | 2019 Feb

REPOSITORIES: biostudies-other

Publications

Anchor: trans-cell type prediction of transcription factor binding sites.

Hongyang Li, Daniel Quang, Yuanfang Guan

Genome Research, 2018 Dec 19, issue 2

PMID: 30567711

Similar Datasets

Accurate prediction of cell type-specific transcription factor binding.
| S-EPMC6327544 | biostudies-literature

Cell-type specificity of ChIP-predicted transcription factor binding sites.
| S-EPMC3574057 | biostudies-literature

Transcription factor binding sites prediction based on modified nucleosomes.
| S-EPMC3931712 | biostudies-literature

Distinct properties of cell-type-specific and shared transcription factor binding sites.
| S-EPMC3811135 | biostudies-literature

Reliable prediction of transcription factor binding sites by phylogenetic verification.
| S-EPMC1283155 | biostudies-literature

Prediction of nucleosome positioning based on transcription factor binding sites.
| S-EPMC2931695 | biostudies-other

Genome-wide prediction of transcription factor binding sites using an integrated model.
| S-EPMC2847719 | biostudies-literature

Binding site graphs: a new graph theoretical framework for prediction of transcription factor binding sites.
| S-EPMC1866359 | biostudies-literature

Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites.
| S-EPMC3526440 | biostudies-literature

An efficient algorithm for improving structure-based prediction of transcription factor binding sites.
| S-EPMC5514533 | biostudies-literature