Dataset Information

BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.


ABSTRACT: Transcription factors (TFs) are proteins essential for regulating gene transcription by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate prediction of TFBSs can contribute to the design and construction of TF-based metabolic regulatory systems. Although various deep-learning algorithms have been developed for predicting TFBSs, their prediction performance still needs improvement. This paper proposes a model based on bidirectional encoder representations from transformers (BERT), called BERT-TFBS, that predicts TFBSs solely from DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. BERT-TFBS uses the pre-trained DNABERT-2 module to capture the complex long-term dependencies in DNA sequences through transfer learning, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs and show that it outperforms the other deep-learning models it was compared with. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.
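
To make the CBAM step in the abstract more concrete, the following is a minimal, dependency-free sketch of the general CBAM idea: channel attention followed by spatial attention, each producing sigmoid weights that rescale a feature map. It is not taken from the BERT-TFBS source code. The 1D feature-map layout, the function names, the toy sizes and the random weights are all assumptions made for illustration; the actual model presumably uses learned weights and a deep-learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer bottleneck MLP (ReLU in between), shared by both pooled vectors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """One weight in (0, 1) per channel, from average- and max-pooled descriptors."""
    avg = [sum(ch) / len(ch) for ch in fmap]
    mx = [max(ch) for ch in fmap]
    a = shared_mlp(avg, w1, w2)
    m = shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(fmap, kernel):
    """One weight in (0, 1) per position.

    The channel-wise average and max maps are combined by a zero-padded 1D
    convolution; `kernel` is an odd-length list of (w_avg, w_max) pairs.
    """
    length = len(fmap[0])
    avg = [sum(ch[i] for ch in fmap) / len(fmap) for i in range(length)]
    mx = [max(ch[i] for ch in fmap) for i in range(length)]
    half = len(kernel) // 2
    weights = []
    for i in range(length):
        acc = 0.0
        for k, (w_avg, w_max) in enumerate(kernel):
            j = i + k - half
            if 0 <= j < length:
                acc += w_avg * avg[j] + w_max * mx[j]
        weights.append(sigmoid(acc))
    return weights


def cbam_1d(fmap, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a channels x length map."""
    ca = channel_attention(fmap, w1, w2)
    refined = [[c * v for v in ch] for c, ch in zip(ca, fmap)]
    sa = spatial_attention(refined, kernel)
    return [[s * v for s, v in zip(sa, ch)] for ch in refined]


# Toy input: hypothetical sizes and random (untrained) weights.
rng = random.Random(0)
channels, length, reduced = 4, 6, 2
fmap = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(channels)]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
w2 = [[rng.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]
kernel = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]

out = cbam_1d(fmap, w1, w2, kernel)
print(len(out), len(out[0]))  # same shape as the input feature map
```

Because both attention maps pass through a sigmoid, every output value is the corresponding input value scaled by a factor between 0 and 1, so the module can only emphasize some channels and positions relative to others, not change the shape of the feature map.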

SUBMITTER: Wang K 

PROVIDER: S-EPMC11066948 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.

Kai Wang, Xuan Zeng, Jingwen Zhou, Fei Liu, Xiaoli Luan, Xinglong Wang

Briefings in Bioinformatics, 2024-03-01, issue 3


Similar Datasets

S-EPMC11442149 | biostudies-literature
S-EPMC3898213 | biostudies-literature
S-EPMC10712318 | biostudies-literature
S-EPMC7972936 | biostudies-literature
S-EPMC8474956 | biostudies-literature
S-EPMC8197256 | biostudies-literature
S-EPMC11562833 | biostudies-literature
S-EPMC6726224 | biostudies-literature
S-EPMC1570149 | biostudies-literature
S-EPMC5481346 | biostudies-literature