Dataset Information


Deshpande2019 - Random Forest model to predict long non-coding RNAs from coding RNAs in Zea mays plant transcriptomic data

ABSTRACT: This Random Forest machine learning model distinguishes long non-coding RNAs (lncRNAs) from coding mRNAs in plant transcriptomic data. The model assigns label 1 to coding sequences and label 2 to long non-coding sequences. Predictions use a combination of Open Reading Frame (ORF) based, sequence-based and codon-bias features. To make predictions on Zea mays sequence data, users need to download the curated ONNX model and convert their sequences into a feature matrix as described in the PLIT paper (Deshpande et al. 2019).
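The workflow described in the abstract (build a feature matrix, run the ONNX model, read labels 1 and 2) could look roughly like the Python sketch below. It is not the PLIT feature pipeline: the three features computed here (sequence length, GC content, longest forward-strand ORF) are illustrative stand-ins, and the real feature set and column order must be taken from the PLIT paper. The function names, the float32 input type, and the assumption that the first model output holds the predicted labels are hypothetical; onnxruntime and NumPy are imported only when a prediction is actually requested.

```python
from typing import Dict, List

# Label convention stated in the abstract: 1 = coding, 2 = long non-coding.
LABEL_NAMES: Dict[int, str] = {1: "coding", 2: "lncRNA"}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG-to-stop ORF on the forward
    strand, stop codon included. Returns 0 if no complete ORF is found."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def illustrative_features(seq: str) -> List[float]:
    """Hypothetical feature row; NOT the feature set used by PLIT."""
    return [float(len(seq)), gc_content(seq), float(longest_orf_length(seq))]


def predict_labels(model_path: str, feature_rows: List[List[float]]) -> List[str]:
    """Run the downloaded ONNX model on a feature matrix.

    Assumptions: the model takes one float32 matrix as input and returns
    the predicted labels as its first output.
    """
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    matrix = np.asarray(feature_rows, dtype=np.float32)
    labels = session.run(None, {input_name: matrix})[0]
    return [LABEL_NAMES.get(int(label), "unknown") for label in labels]
```

With the real PLIT feature matrix in place of `illustrative_features`, a call such as `predict_labels("model.onnx", rows)` (the file name is a placeholder) would return one "coding" or "lncRNA" string per input sequence.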

SUBMITTER: Sumukh Deshpande  

PROVIDER: BIOMD0000001067 | BioModels | 2023-05-22




PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets.

Deshpande Sumukh S, Shuttleworth James J, Yang Jianhua J, Taramonli Sandy S, England Matthew M

Computers in Biology and Medicine, 2019-01-04

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome as most coding potential computation (CPC) tools fail to accurately identify them in transcriptomic data. Well-known CPC tools such as CPC2, lncS ...

Similar Datasets

2015-01-01 | GSE52255 | GEO
2004-12-08 | GSE2044 | GEO
2008-06-12 | E-GEOD-2044 | biostudies-arrayexpress
2015-12-25 | GSE75290 | GEO
2017-09-01 | ST000891 | MetabolomicsWorkbench
2023-07-03 | BIOMD0000001073 | BioModels
| PRJNA587984 | ENA
2010-04-12 | E-GEOD-19937 | biostudies-arrayexpress
2019-03-12 | MTBLS650 | MetaboLights
2020-08-05 | GSE155682 | GEO