
Dataset Information


Identification of protein coding regions in RNA transcripts.


ABSTRACT: Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training, which makes manually curated preparation of training sets unnecessary. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcript assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods.

SUBMITTER: Tang S 

PROVIDER: S-EPMC4499116 | biostudies-literature | 2015 Jul

REPOSITORIES: biostudies-literature


Publications

Identification of protein coding regions in RNA transcripts.

Tang S, Lomsadze A, Borodovsky M

Nucleic Acids Research, 2015-04-13, issue 12



Similar Datasets

S-EPMC8138839 | biostudies-literature
S-EPMC3582448 | biostudies-literature
S-EPMC4227794 | biostudies-literature
S-EPMC19553 | biostudies-literature
S-EPMC5389649 | biostudies-literature
S-EPMC5356595 | biostudies-literature
S-EPMC5850464 | biostudies-literature
S-EPMC519121 | biostudies-literature