
Dataset Information


Ribonanza: deep learning of RNA structure through dual crowdsourcing.


ABSTRACT: Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.

SUBMITTER: He S 

PROVIDER: S-EPMC10925082 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature


Publications

Ribonanza: deep learning of RNA structure through dual crowdsourcing.

He Shujun S, Huang Rui R, Townley Jill J, Kretsch Rachael C RC, Karagianes Thomas G TG, Cox David B T DBT, Blair Hamish H, Penzar Dmitry D, Vyaltsev Valeriy V, Aristova Elizaveta E, Zinkevich Arsenii A, Bakulin Artemy A, Sohn Hoyeol H, Krstevski Daniel D, Fukui Takaaki T, Tatematsu Fumiya F, Uchida Yusuke Y, Jang Donghoon D, Lee Jun Seong JS, Shieh Roger R, Ma Tom T, Martynov Eduard E, Shugaev Maxim V MV, Bukhari Habib S T HST, Fujikawa Kazuki K, Onodera Kazuki K, Henkel Christof C, Ron Shlomo S, Romano Jonathan J, Nicol John J JJ, Nye Grace P GP, Wu Yuan Y, Choe Christian C, Reade Walter W, Das Rhiju R

bioRxiv : the preprint server for biology, 2024-06-11



Similar Datasets

| S-EPMC9771809 | biostudies-literature
| S-EPMC8528079 | biostudies-literature
| S-EPMC9829186 | biostudies-literature
| S-EPMC7490675 | biostudies-literature
| S-EPMC9580944 | biostudies-literature
| S-EPMC8838716 | biostudies-literature
| S-EPMC7878809 | biostudies-literature
| S-EPMC11702515 | biostudies-literature
| S-EPMC8860580 | biostudies-literature
| S-EPMC9938198 | biostudies-literature