Decoding RNA binding protein's binding preferences using a general generative model
ABSTRACT: Current computational approaches struggle to detect the binding preferences of novel or data-scarce RNA-binding proteins (RBPs). Here we present Protein Guided Ribonucleic Acid Generator (ProRiboGen), an integrated generative diffusion model that predicts RBP binding preferences directly from protein sequences. ProRiboGen leverages insights from protein language models and CLIP-seq interaction data to generate RNA sequences that align with the binding preferences of target RBPs. We demonstrated its ability to recognize RNA-binding domains and to generalize across species. Using this approach, we experimentally identified binding motifs for five human and zebrafish RBPs in the absence of CLIP-seq data. Furthermore, its binding site predictions were validated in vivo, revealing a previously uncharacterized intrinsic Cas13d-RNA interaction across the transcriptome and suggesting a potential link to off-target and bystander effects. This study advances our understanding of RBP-RNA interactions and highlights the model’s potential in the study of RNA post-transcriptional regulation.
ORGANISM(S): Homo sapiens
PROVIDER: GSE304950 | GEO | 2026/03/01
REPOSITORIES: GEO