Transcriptomics

Dataset Information

SnoBIRD: A tool to identify C/D box snoRNAs and refine their annotation across all eukaryotes


ABSTRACT: Small nucleolar RNAs (snoRNAs), a group of noncoding RNAs present in all eukaryotes, are known for their regulation of ribosome biogenesis and splicing. Despite their central cellular roles, current snoRNA annotations remain incomplete. Indeed, several eukaryote annotations contain few or no snoRNAs, and none distinguishes expressed snoRNAs from their pseudogenes, a recently characterized snoRNA subclass with distinct features and expression levels. To address this, we developed SnoBIRD, a BERT-based C/D box snoRNA predictor trained on snoRNAs spanning all eukaryote kingdoms. We show that SnoBIRD outperforms existing tools on a test set and is the only predictor capable of identifying snoRNA pseudogenes using biologically relevant signal. Applying SnoBIRD to the fission yeast and human genomes, we demonstrate that it is the only tool whose runtime scales well with genome size, and we identify and experimentally validate several new SnoBIRD-predicted C/D box snoRNAs. By running SnoBIRD on multiple eukaryote genomes, we identify hundreds of novel C/D box snoRNA candidates and highlight SnoBIRD’s usefulness for determining the evolutionary paths of snoRNAs that share a common host locus but are distributed across different species. Overall, SnoBIRD is a user-friendly and efficient tool for reliably predicting C/D box snoRNAs and their pseudogenes across any eukaryote kingdom.

ORGANISM(S): Mus musculus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Gallus gallus, Macaca mulatta

PROVIDER: GSE290579 | GEO | 2025/07/30

REPOSITORIES: GEO

Similar Datasets

2014-07-01 | E-GEOD-55946 | biostudies-arrayexpress
2014-07-01 | GSE55946 | GEO
2021-12-08 | GSE181496 | GEO
2010-12-04 | GSE25028 | GEO
2013-05-08 | E-GEOD-43666 | biostudies-arrayexpress
2022-05-01 | E-MTAB-10529 | biostudies-arrayexpress
2022-05-18 | GSE184173 | GEO
2015-03-20 | GSE67050 | GEO
2013-05-08 | GSE43666 | GEO
2015-04-23 | E-MTAB-3209 | biostudies-arrayexpress