Dataset Information

Probabilistic error correction for RNA sequencing.

ABSTRACT: Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.

SUBMITTER: Le HS

PROVIDER: S-EPMC3664804 | biostudies-literature | 2013 May

REPOSITORIES: biostudies-literature

Publications

Probabilistic error correction for RNA sequencing.

Le HS, Schulz MH, McCauley BM, Hinman VF, Bar-Joseph Z

Nucleic Acids Research, 2013-04-04, issue 10

PMID: 23558750

Similar Datasets

454 antibody sequencing - error characterization and correction. | S-EPMC3228814 | biostudies-other

Sequencing error correction without a reference genome. | S-EPMC3879328 | biostudies-literature

Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls. | S-EPMC4983418 | biostudies-literature

MeCorS: Metagenome-enabled error correction of single cell sequencing reads. | S-EPMC4937190 | biostudies-literature

Lighter: fast and memory-efficient sequencing error correction without counting. | S-EPMC4248469 | biostudies-literature

Efficient error correction for next-generation sequencing of viral amplicons. | S-EPMC3382444 | biostudies-literature

Error correction of high-throughput sequencing datasets with non-uniform coverage. | S-EPMC3117386 | biostudies-literature

A systematic comparison of error correction enzymes by next-generation sequencing. | S-EPMC5587813 | biostudies-literature

LCAT: an isoform-sensitive error correction for transcriptome sequencing long reads. | S-EPMC10245045 | biostudies-literature

An error correction strategy for image reconstruction by DNA sequencing microscopy. | S-EPMC10899105 | biostudies-literature