Dataset Information

Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

ABSTRACT:

Motivation

Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps, which are based on biological considerations (such as GC content) and are applied to each sample separately. The main problem is that not all biases are known.

Results

We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
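
The alternating idea described above can be illustrated with a small sketch. This is not the XAEM implementation: it assumes a Poisson model for a count matrix Y (rows are read classes, columns are samples), uses standard multiplicative EM-type updates, starts from a random X, and fixes each column of X to sum to 1 to remove the scale ambiguity between X and β. The names `alternating_fit` and `poisson_deviance` are hypothetical and chosen for illustration only.

```python
import numpy as np


def poisson_deviance(Y, M, eps=1e-12):
    """Generalized KL divergence between observed counts Y and fitted means M."""
    return float(np.sum(Y * np.log((Y + eps) / (M + eps)) - Y + M))


def alternating_fit(Y, n_isoforms, n_iter=200, seed=0, eps=1e-12):
    """Fit Y ~ X @ B with X and B both unknown and non-negative.

    Y : (n_classes, n_samples) read counts shared across samples.
    X : (n_classes, n_isoforms), each column sums to 1 (assumed constraint).
    B : (n_isoforms, n_samples), per-sample isoform abundances.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    n_classes, n_samples = Y.shape

    # Random positive start for X (an assumption of this sketch).
    X = rng.random((n_classes, n_isoforms)) + 0.1
    X /= X.sum(axis=0, keepdims=True)
    # Start B by splitting each sample's total count evenly across isoforms.
    B = np.ones((n_isoforms, n_samples)) * Y.sum(axis=0) / n_isoforms

    history = []
    for _ in range(n_iter):
        # Step 1: update B with X held fixed.
        R = Y / (X @ B + eps)
        B *= (X.T @ R) / (X.sum(axis=0)[:, None] + eps)

        # Step 2: update X with B held fixed.
        R = Y / (X @ B + eps)
        X *= (R @ B.T) / (B.sum(axis=1)[None, :] + eps)

        # Renormalize columns of X and move the scale into B,
        # which leaves the product X @ B unchanged.
        scale = X.sum(axis=0, keepdims=True)
        X /= scale
        B *= scale.T

        history.append(poisson_deviance(Y, X @ B))
    return X, B, history
```

Because X is shared by all samples while each sample has its own column of B, several samples are needed for X to be estimable at all, which matches the multi-sample requirement stated in the abstract. The factorization found by this sketch is only determined up to the usual ambiguities of non-negative factorizations, so it should be read as a demonstration of the alternation, not as a quantification method.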

Availability and implementation

The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Deng W

PROVIDER: S-EPMC9883676 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature

Publications

Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

Deng W, Mou T, Kalari KR, Niu N, Wang L, Pawitan Y, Vu TN

Bioinformatics (Oxford, England), 2020 Feb 1, issue 3

PMID: 31400221

Similar Datasets

Towards reliable isoform quantification using RNA-SEQ data. | S-EPMC2863065 | biostudies-literature

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. | S-EPMC4147886 | biostudies-literature

WemIQ: an accurate and robust isoform quantification method for RNA-seq data. | S-EPMC4380033 | biostudies-literature

Network-Based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis. | S-EPMC4689380 | biostudies-literature

Simultaneous isoform discovery and quantification from RNA-seq. | S-EPMC3718502 | biostudies-literature

Quantification of mutant-allele expression at isoform level in cancer from RNA-seq data. | S-EPMC9278039 | biostudies-literature

Comparative evaluation of full-length isoform quantification from RNA-Seq. | S-EPMC8145802 | biostudies-literature

A model for isoform-level differential expression analysis using RNA-seq data without pre-specifying isoform structure. | S-EPMC9109925 | biostudies-literature

Simulation-based benchmarking of isoform quantification in single-cell RNA-seq. | S-EPMC6223048 | biostudies-literature

Evaluation and comparison of computational tools for RNA-seq isoform quantification. | S-EPMC5547501 | biostudies-other