
Dataset Information


Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.


ABSTRACT:

Motivation

Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.

Results

We introduce a new method and software tool for long read transcript quantification called oarfish. Our model incorporates a novel coverage score, which modifies the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish produces more accurate quantification estimates than existing long read quantification methods, particularly for the primary isoforms present in a given cell line or tissue type.
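The abstract states that the coverage score enters the conditional probability of fragment assignment but does not define that score. To show roughly where such a term would sit, here is a minimal Rust sketch of a generic expectation-maximization (EM) estimator for multi-mapping reads. It assumes each alignment already carries a precomputed conditional probability P(read | transcript) into which some coverage weight has been folded. The `Alignment` struct, the `em` function, and the example weights are hypothetical illustrations, not oarfish's code or its actual coverage model.

```rust
/// One alignment of a read to a transcript. `cond_prob` stands for
/// P(read | transcript); in this hypothetical sketch it is assumed to
/// already include a coverage-based weight (not oarfish's real score).
struct Alignment {
    transcript: usize,
    cond_prob: f64,
}

/// Generic EM: estimate relative transcript abundances `eta` from reads,
/// where each read is the list of its candidate alignments.
fn em(reads: &[Vec<Alignment>], n_transcripts: usize, iters: usize) -> Vec<f64> {
    // Start from uniform abundances.
    let mut eta = vec![1.0 / n_transcripts as f64; n_transcripts];
    for _ in 0..iters {
        let mut counts = vec![0.0f64; n_transcripts];
        for aligns in reads {
            // E-step: posterior that this read came from each candidate transcript.
            let denom: f64 = aligns
                .iter()
                .map(|a| eta[a.transcript] * a.cond_prob)
                .sum();
            if denom <= 0.0 {
                continue; // read contributes nothing under current estimates
            }
            for a in aligns {
                counts[a.transcript] += eta[a.transcript] * a.cond_prob / denom;
            }
        }
        // M-step: normalize expected read counts into abundance fractions.
        let total: f64 = counts.iter().sum();
        if total > 0.0 {
            for (e, c) in eta.iter_mut().zip(counts.iter()) {
                *e = *c / total;
            }
        }
    }
    eta
}

fn main() {
    // Hypothetical data: one read unique to t0, one unique to t1, and one
    // ambiguous read whose (coverage-weighted) probability favors t0.
    let reads = vec![
        vec![Alignment { transcript: 0, cond_prob: 1.0 }],
        vec![Alignment { transcript: 1, cond_prob: 1.0 }],
        vec![
            Alignment { transcript: 0, cond_prob: 0.9 },
            Alignment { transcript: 1, cond_prob: 0.1 },
        ],
    ];
    let eta = em(&reads, 2, 50);
    println!("estimated abundances: {:?}", eta);
}
```

Because the ambiguous read's conditional probability is weighted toward t0, the estimate for t0 ends up larger than for t1; with equal weights the two would stay at 0.5 each. Any real coverage model would change only how `cond_prob` is computed, not the EM loop itself.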

Availability and implementation

Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.

SUBMITTER: Jousheghani ZZ 

PROVIDER: S-EPMC10925290 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature


Publications

Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

Jousheghani Zahra Zare ZZ, Patro Rob R

bioRxiv : the preprint server for biology, 2024-03-01



Similar Datasets

S-EPMC11275803 | biostudies-literature
S-EPMC5961913 | biostudies-literature
S-EPMC3464235 | biostudies-literature
GSE225377 | GEO | 2023-09-01
S-EPMC6198854 | biostudies-literature
GSE225380 | GEO | 2023-09-01
S-EPMC8212471 | biostudies-literature
GSE147118 | GEO | 2020-03-18
GSE132766 | GEO | 2019-06-15
S-EPMC5103424 | biostudies-literature