Dataset Information

Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs.

ABSTRACT: Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce compound Pearson residuals and show that they can be analytically obtained without explicit knowledge of the amplification distribution. Further, we demonstrate that compound Pearson residuals lead to a biologically meaningful gene selection and low-dimensional embeddings of complex Smart-seq2 datasets. Finally, we empirically study amplification distributions across several sequencing protocols, and suggest that they can be described by a broken power law. We show that the resulting compound distribution captures overdispersion and zero-inflation patterns characteristic of read count data. In summary, compound Pearson residuals provide an efficient and effective way to normalize read count data based on simple mechanistic assumptions.
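The compound model in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes that captured molecules N follow a negative binomial distribution with mean mu and dispersion theta, and that each molecule is amplified into Z reads independently. By the law of total variance, the read count X then has mean mu_X = mu * E[Z] and variance alpha_z * mu_X + mu_X^2 / theta, where alpha_z = E[Z^2] / E[Z] is the only property of the amplification distribution that enters. The function names, the geometric amplification distribution (a stand-in for illustration, not the broken power law suggested in the abstract), the rank-one estimate of mu_X, and the clipping threshold are all assumptions made for this example.

```python
import numpy as np

def simulate_compound_counts(mu, theta, amp_sampler, rng):
    """Simulate read counts X = Z_1 + ... + Z_N with N ~ NB(mean=mu, dispersion=theta).

    amp_sampler(k, rng) is a hypothetical callback returning k amplification factors.
    """
    mu = np.asarray(mu, dtype=float)
    # NumPy's negative_binomial(n, p) has mean n * (1 - p) / p,
    # so n = theta and p = theta / (theta + mu) gives mean mu.
    n_molecules = rng.negative_binomial(theta, theta / (theta + mu))
    copies = amp_sampler(int(n_molecules.sum()), rng)
    # Assign each simulated molecule to its matrix entry and sum its reads.
    entry_ids = np.repeat(np.arange(n_molecules.size), n_molecules.ravel())
    reads = np.bincount(entry_ids, weights=copies, minlength=n_molecules.size)
    return reads.astype(np.int64).reshape(n_molecules.shape)

def compound_pearson_residuals(counts, alpha_z, theta):
    """Residuals (x - mu_hat) / sqrt(alpha_z * mu_hat + mu_hat^2 / theta).

    Assumes every gene (column) and every cell (row) has at least one read.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: rank-one estimate of the expected counts, row sum * column sum / total.
    mu_hat = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / counts.sum()
    residuals = (counts - mu_hat) / np.sqrt(alpha_z * mu_hat + mu_hat**2 / theta)
    # Assumption: clip residuals to +/- sqrt(number of cells).
    clip = np.sqrt(counts.shape[0])
    return np.clip(residuals, -clip, clip)

rng = np.random.default_rng(0)
p_amp = 0.2                        # hypothetical geometric amplification parameter
alpha_z = (2.0 - p_amp) / p_amp    # E[Z^2] / E[Z] for a geometric distribution on {1, 2, ...}
theta = 10.0                       # illustrative dispersion value

depth = rng.uniform(0.5, 1.5, size=(2000, 1))        # per-cell scaling factors
gene_mean = np.linspace(1.0, 10.0, 50)[None, :]      # per-gene mean molecule counts
counts = simulate_compound_counts(
    depth * gene_mean, theta, lambda k, r: r.geometric(p_amp, size=k), rng
)
residuals = compound_pearson_residuals(counts, alpha_z, theta)
print("residual variance:", residuals.var())
```

If the assumed model matches the data, the residuals should have variance close to one for genes without biological variability, so genes with much larger residual variance become candidates for selection. Setting alpha_z to 1 corresponds to no amplification (every molecule yields exactly one read) and recovers the ordinary negative binomial Pearson residual.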

SUBMITTER: Lause J

PROVIDER: S-EPMC10418209 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature


Publications

Compound models and Pearson residuals for single-cell RNA-seq data without UMIs.

Lause J, Ziegenhain C, Hartmanis L, Berens P, Kobak D

bioRxiv: the preprint server for biology, 2024-07-25

Recent work employed Pearson residuals from Poisson or negative binomial models to normalize UMI data. To extend this approach to non-UMI data, we model the additional amplification step with a compound distribution: we assume that sequenced RNA molecules follow a negative binomial distribution, and are then replicated following an amplification distribution. We show how this model leads to compound Pearson residuals, which yield meaningful gene selection and embeddings of Smart-seq2 datasets. F ...

PMID: 37577688

Similar Datasets

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. | S-EPMC8419999 | biostudies-literature

Quantile normalization of single-cell RNA-seq read counts without unique molecular identifiers. | S-EPMC7333325 | biostudies-literature

Assessment of Single Cell RNA-Seq Normalization Methods. | S-EPMC5499114 | biostudies-literature

SCnorm: robust normalization of single-cell RNA-seq data. | S-EPMC5473255 | biostudies-literature

RUV-III-NB: normalization of single cell RNA-seq data. | S-EPMC9458465 | biostudies-literature

Non-linear Normalization for Non-UMI Single Cell RNA-Seq. | S-EPMC8063035 | biostudies-literature

Normalization and noise reduction for single cell RNA-seq experiments. | S-EPMC4481848 | biostudies-literature

PsiNorm: a scalable normalization for single-cell RNA-seq data. | S-EPMC8696108 | biostudies-literature

Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey. | S-EPMC7019105 | biostudies-literature

scKWARN: Kernel-weighted-average robust normalization for single-cell RNA-seq data. | S-EPMC10868328 | biostudies-literature