
Dataset Information


HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.


ABSTRACT: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets, without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
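
The abstract notes that every predicted junction carries a score that can be thresholded to tune the false positive rate. A minimal sketch of that filtering step follows. It assumes, purely for illustration, that junctions are stored as tab-separated BED-style lines with the score in the fifth column; this record does not describe HMMSplicer's actual output layout, and the names `Junction`, `parse_bed_line`, and `filter_junctions` are hypothetical and not part of HMMSplicer.

```python
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    """One predicted splice junction (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    name: str
    score: float


def parse_bed_line(line: str) -> Junction:
    # Assumption: tab-separated BED-style fields, score in column 5.
    fields = line.rstrip("\n").split("\t")
    return Junction(fields[0], int(fields[1]), int(fields[2]),
                    fields[3], float(fields[4]))


def filter_junctions(lines: Iterable[str], min_score: float) -> List[Junction]:
    """Keep junctions whose score is at least min_score."""
    kept = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        junction = parse_bed_line(line)
        if junction.score >= min_score:
            kept.append(junction)
    return kept
```

Raising `min_score` trades sensitivity for fewer false positives; a suitable value depends on the dataset and on the needs of the experiment, so any cutoff used with this sketch is a placeholder rather than a recommended setting.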

SUBMITTER: Dimon MT 

PROVIDER: S-EPMC2975632 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature


Publications

HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.

Michelle T. Dimon, Katherine Sorber, Joseph L. DeRisi

PLoS One, 2010 Nov 8, issue 11



Similar Datasets

| S-EPMC6302956 | biostudies-literature
| S-EPMC2864574 | biostudies-literature
| S-EPMC2672628 | biostudies-literature
| S-EPMC2919714 | biostudies-literature
| S-EPMC3268599 | biostudies-literature
| S-EPMC2952873 | biostudies-other
| S-EPMC6307148 | biostudies-literature
| S-EPMC4889935 | biostudies-literature
| S-EPMC4878813 | biostudies-literature