Dataset Information

Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework.


ABSTRACT: The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous biological processes. Recently, a high-throughput technique known as 'loop-seq' has made it possible to measure the intrinsic cyclizability of DNA fragments. However, quantifying the bendability of DNA at large scale is costly, laborious, and time-consuming. To close the gap between rapidly evolving large language models and expanding genomic sequence information, and to elucidate the impact of DNA bendability on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce a computational model, named MIXBend, that predicts DNA bendability from both nucleotide sequences and physicochemical properties. In MIXBend, the pre-trained language model DNABERT and a convolutional neural network with an attention mechanism are used to construct sequence-based and physicochemical-based extractors that refine the DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between the two modalities. Lastly, a self-attention fusion layer is employed to predict DNA bendability. Experimental results validate MIXBend's superior performance relative to other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs in yeast. Moreover, MIXBend discovers significant bendability fluctuations within super-enhancer regions and transcription factor binding sites in the human genome.
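
The abstract refers to k-mer level matching between the sequence and physicochemical representations. As a minimal sketch of what overlapping k-mer tokenization of a DNA sequence can look like, the snippet below may help. The function name `kmer_tokens`, the choice of `k`, and the example sequence are illustrative assumptions and are not taken from the MIXBend code or this record.

```python
from __future__ import annotations


def kmer_tokens(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper for illustration only; the value of k used by
    MIXBend is not stated in this record.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping k-mers
    # (an empty list when the sequence is shorter than k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    # Made-up 8-nucleotide example sequence, tokenized into 3-mers.
    print(kmer_tokens("ATGCGTAC", k=3))
```

A sequence-based extractor would typically consume such tokens, while a physicochemical extractor would operate on per-position property values aligned to the same windows. That pairing is only an assumption about how the two modalities could be lined up.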

SUBMITTER: Yang M 

PROVIDER: S-EPMC11014357 | biostudies-literature | 2024 Apr

REPOSITORIES: biostudies-literature

Publications

Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework.

Yang Minghao, Zhang Shichen, Zheng Zhihang, Zhang Pengfei, Liang Yan, Tang Shaojun

Nucleic Acids Research, 2024-04-01, issue 6


Similar Datasets

| S-EPMC10287612 | biostudies-literature
| S-EPMC11751634 | biostudies-literature
| S-EPMC8751220 | biostudies-literature
| S-EPMC9918003 | biostudies-literature
| S-EPMC10829170 | biostudies-literature
| S-EPMC11549734 | biostudies-literature
| S-EPMC8976979 | biostudies-literature
| S-EPMC9228203 | biostudies-literature
| S-EPMC11482777 | biostudies-literature
| S-EPMC9950464 | biostudies-literature