
Dataset Information


MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding.


ABSTRACT: Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and their classification is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches to using primary sequences and secondary structures, as well as the late integration of both, with machine learning models including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. Compared with the current best-performing tool, ncRDense, MncR showed a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
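
NOTE: The abstract does not specify how the sequence-based and structure-based models are combined. As a rough illustration of the general idea of late integration only, the sketch below fuses the class probabilities of two separately trained models by a weighted average. The function name `late_integrate`, the `seq_weight` parameter and the averaging rule itself are assumptions made for this example; they are not taken from MncR.

```python
# Illustrative sketch of late integration (late fusion) by weighted averaging.
# Assumption: each model has already produced one probability per ncRNA class.
# This is NOT the MncR implementation; names and the fusion rule are hypothetical.

# The six ncRNA classes named in the abstract.
CLASSES = ["lncRNA", "rRNA", "tRNA", "miRNA", "snRNA", "snoRNA"]


def late_integrate(seq_probs, struct_probs, seq_weight=0.5):
    """Fuse per-class probabilities from a sequence model and a structure model.

    Returns the predicted class label and the renormalized fused probabilities.
    """
    if len(seq_probs) != len(CLASSES) or len(struct_probs) != len(CLASSES):
        raise ValueError("expected one probability per class from each model")
    if not 0.0 <= seq_weight <= 1.0:
        raise ValueError("seq_weight must lie in [0, 1]")

    # Weighted average of the two models' outputs, class by class.
    fused = [
        seq_weight * s + (1.0 - seq_weight) * g
        for s, g in zip(seq_probs, struct_probs)
    ]

    # Renormalize so the fused scores sum to 1.
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused probabilities sum to zero")
    fused = [p / total for p in fused]

    # Pick the class with the highest fused probability.
    best = max(range(len(CLASSES)), key=fused.__getitem__)
    return CLASSES[best], fused


if __name__ == "__main__":
    # Made-up example scores, for demonstration only.
    seq = [0.10, 0.10, 0.50, 0.20, 0.05, 0.05]
    struct = [0.05, 0.05, 0.30, 0.50, 0.05, 0.05]
    label, probs = late_integrate(seq, struct)
    print(label, [round(p, 3) for p in probs])
```

With equal weights the two models each contribute half of the fused score; setting `seq_weight` to 0 or 1 reduces the sketch to the structure-only or sequence-only prediction.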

SUBMITTER: Dunkel H 

PROVIDER: S-EPMC10218863 | biostudies-literature | 2023 May

REPOSITORIES: biostudies-literature


Publications

MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding.

Heiko Dunkel, Henning Wehrmann, Lars R Jensen, Andreas W Kuss, Stefan Simm

International Journal of Molecular Sciences, 2023-05-17, issue 10



Similar Datasets

| S-EPMC4896366 | biostudies-literature
| S-EPMC9893444 | biostudies-literature
2021-06-02 | GSE175942 | GEO
| S-EPMC11382266 | biostudies-literature
| S-EPMC3834213 | biostudies-literature
| S-EPMC4330382 | biostudies-literature
2021-06-01 | GSE171549 | GEO
| S-EPMC9890099 | biostudies-literature
| S-EPMC6425755 | biostudies-literature
| S-EPMC9983029 | biostudies-literature