SnoBIRD: A tool to identify C/D box snoRNAs and refine their annotation across all eukaryotes
ABSTRACT: Small nucleolar RNAs (snoRNAs), a group of noncoding RNAs present in all eukaryotes, are known for their roles in regulating ribosome biogenesis and splicing. Despite these central cellular roles, current snoRNA annotations remain incomplete. Indeed, several eukaryote annotations contain few or no snoRNAs, and none distinguishes expressed snoRNAs from their pseudogenes, a recently characterized snoRNA subclass with distinct features and expression levels. To address this, we developed SnoBIRD, a BERT-based C/D box snoRNA predictor trained on snoRNAs spanning all eukaryote kingdoms. We show that SnoBIRD outperforms existing tools on a test set and is the only predictor capable of identifying snoRNA pseudogenes using biologically relevant signal. Applying SnoBIRD to the fission yeast and human genomes, we demonstrate that only SnoBIRD scales well with genome size in terms of runtime, and we identify and experimentally validate several new SnoBIRD-predicted C/D box snoRNAs. By running SnoBIRD on multiple eukaryote genomes, we identify hundreds of novel C/D box snoRNA candidates and highlight SnoBIRD’s usefulness for determining the evolutionary paths of snoRNAs that share a common host locus but are distributed across different species. Overall, SnoBIRD is a user-friendly and efficient tool for reliably predicting C/D box snoRNAs and their pseudogenes in any eukaryote kingdom.
ORGANISM(S): Mus musculus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Gallus gallus, Macaca mulatta
PROVIDER: GSE290579 | GEO | 2025/07/30
REPOSITORIES: GEO