Dataset Information

TVAE-RNA: ensemble-based RNA secondary structure prediction via transformer variational autoencoders.

ABSTRACT:

Motivation

Accurate prediction of RNA secondary structure remains challenging due to the presence of pseudoknots, long-range dependencies, and limited labeled data.

Results

We propose TVAE, a novel framework that integrates a Transformer encoder with a Variational Autoencoder (VAE). The Transformer captures global dependencies in the sequence, while the VAE models structural variability by learning a probabilistic latent space. Unlike deterministic models, TVAE generates diverse and biologically plausible secondary structures, enabling more comprehensive structure discovery. To obtain discrete predictions, we introduce GHA-Pairing, a fast and biologically constrained base-pairing algorithm. TVAE demonstrates strong generalization across different RNA families and achieves state-of-the-art performance on benchmark datasets, reaching an F1 score of 0.89 and 83% accuracy, surpassing existing methods by 10%. These results highlight the advantage of probabilistic modeling for RNA structure prediction and its potential to enhance biological insights.
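The abstract says discrete predictions come from a constrained base-pairing step but gives no details of GHA-Pairing. The sketch below is not that algorithm. It only illustrates the general idea of turning a matrix of pairing scores into a set of base pairs under simple biological constraints. The function name `greedy_pairs`, the 0.5 threshold, the minimum hairpin loop of 3 unpaired bases, and the restriction to Watson-Crick and G-U wobble pairs are all assumptions made for illustration.

```python
# Hypothetical illustration only: a plain greedy decoder, NOT the paper's
# GHA-Pairing. All names and constants here are assumptions.

# Assumed set of allowed pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Assumed minimum number of unpaired bases enclosed by a hairpin.
MIN_LOOP = 3


def greedy_pairs(seq, scores, threshold=0.5):
    """Pick base pairs in descending score order.

    seq       -- RNA sequence as a string over A, C, G, U
    scores    -- square matrix (list of lists); scores[i][j] is the pairing
                 score of positions i and j (only i < j is read)
    threshold -- assumed cutoff below which a pair is ignored
    Returns a sorted list of (i, j) tuples with i < j.
    """
    n = len(seq)
    candidates = [
        (scores[i][j], i, j)
        for i in range(n)
        for j in range(i + MIN_LOOP + 1, n)  # enforce the loop-length constraint
        if scores[i][j] >= threshold and (seq[i], seq[j]) in CANONICAL
    ]
    # Highest score first; ties broken by position so the result is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    used = set()  # each position may take part in at most one pair
    pairs = []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    n = len(seq)
    scores = [[0.0] * n for _ in range(n)]
    scores[0][9] = 0.9
    scores[0][8] = 0.85  # competes with (0, 9) for position 0
    scores[1][8] = 0.8
    scores[2][7] = 0.7
    scores[2][6] = 0.6   # competes with (2, 7) for position 2
    print(greedy_pairs(seq, scores))
```

Because this decoder never checks whether two selected pairs cross, it can return pseudoknotted structures. A decoder restricted to nested structures would need an additional compatibility check before accepting each pair.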

Availability and implementation

Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.

SUBMITTER: Mei X

PROVIDER: S-EPMC12640237 | biostudies-literature | 2025 Nov

REPOSITORIES: biostudies-literature

Publications

TVAE-RNA: ensemble-based RNA secondary structure prediction via transformer variational autoencoders.

Xiyuan Mei, Hanbo Liu, Yuheng Zhu, Enshuang Zhao, Longyi Li, Hao Zhang

Bioinformatics (Oxford, England), 2025-11-01, issue 11

PMID: 40981507

Similar Datasets

Ensemble-based prediction of RNA secondary structures. | S-EPMC3750279 | biostudies-literature

Enhanced Generalizability of RNA Secondary Structure Prediction via Convolutional Block Attention Network and Ensemble Learning | S-EPMC12388828 | biostudies-literature

RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. | S-EPMC1370799 | biostudies-literature

Characterization and visualization of RNA secondary structure Boltzmann ensemble via information theory. | S-EPMC5836418 | biostudies-literature

Generating tertiary protein structures via interpretable graph variational autoencoders. | S-EPMC9710582 | biostudies-literature

Bankruptcy prediction using ensemble of autoencoders optimized by genetic algorithm. | S-EPMC10280414 | biostudies-literature

CoupleVAE: coupled variational autoencoders for predicting perturbational single-cell RNA sequencing data. | S-EPMC11966612 | biostudies-literature

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data. | S-EPMC9882246 | biostudies-literature

RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. | S-EPMC6881452 | biostudies-literature

RNA-SSPT: RNA Secondary Structure Prediction Tools. | S-EPMC3819574 | biostudies-literature