MIXnorm: normalizing RNA-seq data from formalin-fixed paraffin-embedded samples.


ABSTRACT:

Motivation

Recent studies have shown that RNA-sequencing (RNA-seq) can measure mRNA of sufficient quality, extracted from formalin-fixed paraffin-embedded (FFPE) tissues, to support whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed for fresh-frozen or similar samples, may perform suboptimally on FFPE data.

Results

We propose a new normalization method, MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm in which closed-form updates are available in each iteration. Because no numerical optimization is needed in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. In these evaluations, MIXnorm improved significantly on commonly used normalization methods for RNA-seq expression data.
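To make "closed-form updates" concrete, the following minimal sketch runs EM for a single zero-inflated Poisson distribution, the component the abstract assigns to non-expressed genes. It is an illustration under simplifying assumptions, not the MIXnorm nested EM algorithm: it ignores the truncated normal component, any per-sample or per-gene structure, and the nesting itself. The function name `zip_em` and its parameters `pi` (probability of a structural zero) and `lam` (Poisson mean) are hypothetical and do not come from the MIXnorm R code.

```python
import math


def zip_em(counts, pi=0.5, lam=1.0, n_iter=200, tol=1e-10):
    """EM for a zero-inflated Poisson (illustrative sketch only).

    pi  : probability that an observation is a structural zero
    lam : mean of the Poisson part
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        # Non-zero counts can only come from the Poisson part.
        p0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * p0

        # M-step: both updates are closed-form, so no numerical optimizer is needed.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)

        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break

    return pi, lam


if __name__ == "__main__":
    # Toy data: half the observations are zero, the rest equal 3.
    est_pi, est_lam = zip_em([0] * 50 + [3] * 50)
    print(est_pi, est_lam)
```

Each iteration costs one pass over summary statistics of the data, which is the kind of efficiency the abstract attributes to avoiding numerical optimization in the M-step.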

Availability and implementation

R code is available at https://github.com/S-YIN/MIXnorm.

Contact

swang@smu.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Yin S 

PROVIDER: S-EPMC7267832 | biostudies-literature |

REPOSITORIES: biostudies-literature
