
Dataset Information


RNA secondary structure packages evaluated and improved by high-throughput experiments.


ABSTRACT: Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.

SUBMITTER: Wayment-Steele HK 

PROVIDER: S-EPMC9839360 | biostudies-literature | 2022 Oct

REPOSITORIES: biostudies-literature


Publications

RNA secondary structure packages evaluated and improved by high-throughput experiments.

Wayment-Steele HK, Kladwang W, Strom AI, Lee J, Treuille A, Becka A, Das R

Nature Methods, 2022-10-03, issue 10



Similar Datasets

| S-EPMC4987914 | biostudies-literature
| S-EPMC4869249 | biostudies-literature
2016-11-01 | GSE78208 | GEO
| S-EPMC7327253 | biostudies-literature
| S-EPMC5389523 | biostudies-literature
| S-EPMC10159002 | biostudies-literature
| S-EPMC4564351 | biostudies-literature
| S-EPMC8002575 | biostudies-literature
| S-EPMC3476334 | biostudies-literature
| S-EPMC4245970 | biostudies-literature