
Dataset Information


Genome-guided transcript assembly by integrative analysis of RNA sequence data.


ABSTRACT: The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.

SUBMITTER: Boley N 

PROVIDER: S-EPMC4037530 | biostudies-literature | 2014 Apr

REPOSITORIES: biostudies-literature


Publications

Genome-guided transcript assembly by integrative analysis of RNA sequence data.

Boley N, Stoiber MH, Booth BW, Wan KH, Hoskins RA, Bickel PJ, Celniker SE, Brown JB

Nature Biotechnology, 2014-03-16, issue 4



Similar Datasets

2022-01-08 | GSE189482 | GEO
| S-EPMC9245221 | biostudies-literature
| S-EPMC4395069 | biostudies-literature
| S-EPMC5720828 | biostudies-literature
| S-EPMC7849386 | biostudies-literature
| S-EPMC4971760 | biostudies-literature
| S-EPMC7462077 | biostudies-literature
| S-EPMC7336458 | biostudies-literature
| PRJNA783213 | ENA
| S-EPMC3674385 | biostudies-other