Dataset Information

High-confidence coding and noncoding transcriptome maps.

ABSTRACT: The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes.
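
The abstract says that unstranded reads are oriented using maximum likelihood estimation. The sketch below is only a toy illustration of that general idea, not CAFE's actual implementation: it assumes a simple binomial model in which stranded reads at a locus give the maximum likelihood estimate of the plus-strand fraction, and unstranded reads at that locus are oriented only when the estimate is decisive. The function names, the per-locus count dictionary, and the `min_confidence` threshold are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def mle_plus_fraction(plus: int, minus: int) -> Optional[float]:
    """Binomial MLE of P(read comes from the + strand) at one locus.

    Returns None when there are no stranded reads to learn from.
    """
    total = plus + minus
    if total == 0:
        return None
    return plus / total


def orient_loci(
    loci: Iterable[str],
    stranded_counts: Dict[str, Tuple[int, int]],
    min_confidence: float = 0.9,  # hypothetical cutoff, not from the paper
) -> Dict[str, str]:
    """Assign '+', '-', or '.' (undetermined) to unstranded reads per locus."""
    calls = {}
    for locus in loci:
        plus, minus = stranded_counts.get(locus, (0, 0))
        p = mle_plus_fraction(plus, minus)
        if p is None:
            calls[locus] = "."          # no stranded evidence at this locus
        elif p >= min_confidence:
            calls[locus] = "+"
        elif p <= 1.0 - min_confidence:
            calls[locus] = "-"
        else:
            calls[locus] = "."          # evidence is ambiguous
    return calls


# Hypothetical stranded read counts per locus: (plus, minus)
example_counts = {"geneA": (95, 5), "geneB": (2, 48), "geneC": (10, 10)}
example_calls = orient_loci(["geneA", "geneB", "geneC", "geneD"], example_counts)
```

Leaving ambiguous or unsupported loci as "." rather than forcing a call is a design choice of this sketch; the published pipeline may resolve such cases differently.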

SUBMITTER: You BH

PROVIDER: S-EPMC5453319 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

Publications

High-confidence coding and noncoding transcriptome maps.

You Bo-Hyun, Yoon Sang-Ho, Nam Jin-Wu

Genome Research | 2017-04-10 | issue 6

PMID: 28396519

Similar Datasets

High-confidence Coding and Noncoding Transcriptome Maps
2017-04-01 | GSE97211 | GEO

High-confidence Coding and Noncoding Transcriptome Maps
PRJNA381218 | ENA

The Project for High-Confidence Coding and Noncoding Transcriptome Maps
2017-04-01 | GSE97212 | GEO

The Project for High-Confidence Coding and Noncoding Transcriptome Maps
PRJNA381216 | ENA

The coding and noncoding transcriptome of Neurospora crassa.
S-EPMC5738166 | biostudies-literature

Confidence maps: statistical inference of cryo-EM maps.
S-EPMC7137106 | biostudies-literature

Age and poverty status alter the coding and noncoding transcriptome.
S-EPMC6402526 | biostudies-literature

Transcriptome-wide discovery of coding and noncoding RNA-binding proteins.
S-EPMC5924899 | biostudies-literature

Correction to: The coding and noncoding transcriptome of Neurospora crassa.
S-EPMC5935966 | biostudies-literature

UniBind: maps of high-confidence direct TF-DNA interactions across nine species.
S-EPMC8236138 | biostudies-literature