Dataset Information

STAR: ultrafast universal RNA-seq aligner.

ABSTRACT:

Motivation

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

Results

To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
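The seed search named in the abstract can be illustrated with a toy sketch. It is not STAR's C++ implementation, only a minimal Python illustration under simplifying assumptions: exact matching only, a naively built suffix array, and a made-up 26-base "genome" with two exons around a GT...AG intron. The function names (`build_suffix_array`, `maximal_mappable_prefix`, `sequential_seed_search`) and the sequences are hypothetical and chosen for this example. The idea shown is that the longest prefix of the read that matches the genome exactly is found first, and the search is then repeated on the unmapped remainder; clustering and stitching of the resulting seeds are not shown.

```python
def build_suffix_array(genome):
    # Naive construction by sorting all suffixes; adequate for a toy genome only.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def maximal_mappable_prefix(genome, sa, read, start):
    """Longest prefix of read[start:] that occurs exactly in the genome.

    Returns (length, sorted list of genome positions where it occurs).
    """
    lo, hi = 0, len(sa)  # suffix-array interval matching the prefix so far
    length = 0

    def char_at(idx):
        # Character of the idx-th sorted suffix at the current offset;
        # "" sorts before every base, mirroring how shorter suffixes sort first.
        pos = sa[idx] + length
        return genome[pos] if pos < len(genome) else ""

    while start + length < len(read):
        c = read[start + length]
        # Lower bound: first suffix in [lo, hi) whose next character is >= c.
        left, right = lo, hi
        while left < right:
            mid = (left + right) // 2
            if char_at(mid) < c:
                left = mid + 1
            else:
                right = mid
        new_lo = left
        # Upper bound: first suffix in [new_lo, hi) whose next character is > c.
        right = hi
        while left < right:
            mid = (left + right) // 2
            if char_at(mid) <= c:
                left = mid + 1
            else:
                right = mid
        new_hi = left
        if new_lo == new_hi:
            break  # the prefix cannot be extended by c anywhere in the genome
        lo, hi = new_lo, new_hi
        length += 1

    positions = sorted(sa[lo:hi]) if length > 0 else []
    return length, positions


def sequential_seed_search(genome, sa, read):
    """Repeat the maximal-prefix search on the unmapped remainder of the read."""
    seeds = []
    start = 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(genome, sa, read, start)
        if length == 0:
            start += 1  # toy handling: skip a base that matches nowhere
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy genome: exon 1, an intron beginning with GT and ending with AG, exon 2.
genome = "ACGTTGCA" + "GTAAAAAAAG" + "CCGGATTC"
sa = build_suffix_array(genome)
read = "TTGCA" + "CCGGA"  # spans the exon 1 / exon 2 junction

seeds = sequential_seed_search(genome, sa, read)
print(seeds)  # [(0, 5, [3]), (5, 5, [18])]
```

In this toy case the first seed ends at genome position 8 and the second begins at position 18, so the gap between them is exactly the GT...AG intron; locating the junction from the read alone, with no prior annotation, is the behavior the abstract refers to as de novo detection.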

Availability and implementation

STAR is implemented as standalone C++ code. It is free, open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

SUBMITTER: Dobin A

PROVIDER: S-EPMC3530905 | biostudies-literature | 2013 Jan

REPOSITORIES: biostudies-literature

Publications

STAR: ultrafast universal RNA-seq aligner.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR

Bioinformatics (Oxford, England). 2012 Oct 25; issue 1

PMID: 23104886

Similar Datasets

Hot-starting software containers for STAR aligner. | S-EPMC6131214 | biostudies-literature

Magic-BLAST, an accurate RNA-seq aligner for long and short reads. | S-EPMC6659269 | biostudies-literature

UClncR: Ultrafast and comprehensive long non-coding RNA detection from RNA-seq. | S-EPMC5660178 | biostudies-literature

RNA-seq-based identification of Star upregulation by islet amyloid formation. | S-EPMC6908819 | biostudies-literature

Ultrafast functional profiling of RNA-seq data for nonmodel organisms. | S-EPMC8015844 | biostudies-literature

Highly sensitive and ultrafast read mapping for RNA-seq analysis. | S-EPMC4833417 | biostudies-literature

Evaluation of STAR and Kallisto on Single Cell RNA-Seq Data Alignment. | S-EPMC7202009 | biostudies-literature

Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data. | S-EPMC9754601 | biostudies-literature

STAR+WASP reduces reference bias in the allele-specific mapping of RNA-seq reads. | S-EPMC10871176 | biostudies-literature

CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. | S-EPMC5371246 | biostudies-literature