Dataset Information

Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

ABSTRACT:

Motivation

Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.

Results

We introduce oarfish, a new method and software tool for long read transcript quantification. Our model incorporates a novel coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish produces more accurate quantification estimates than existing long read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type.

Availability and implementation

Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.

SUBMITTER: Jousheghani ZZ

PROVIDER: S-EPMC10925290 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

Jousheghani ZZ, Patro R

bioRxiv: the preprint server for biology, 2024-03-01

PMID: 38464200

Similar Datasets

Long-read sequencing transcriptome quantification with lr-kallisto.
| S-EPMC11275803 | biostudies-literature

Long-read transcriptome data for improved gene prediction in Lentinula edodes.
| S-EPMC5961913 | biostudies-literature

Improving PacBio long read accuracy by short read alignment.
| S-EPMC3464235 | biostudies-literature

LocusMasterTE: long-read assisted short-read RNA-seq TE quantification [long]
2023-09-01 | GSE225377 | GEO

Combining probabilistic alignments with read pair information improves accuracy of split-alignments.
| S-EPMC6198854 | biostudies-literature

LocusMasterTE: long-read assisted short-read TE quantification [short]
2023-09-01 | GSE225380 | GEO

LIQA: long-read isoform quantification and analysis.
| S-EPMC8212471 | biostudies-literature

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification
2020-03-18 | GSE147118 | GEO

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification
2019-06-15 | GSE132766 | GEO

LSCplus: a fast solution for improving long read accuracy by short read alignment.
| S-EPMC5103424 | biostudies-literature