Dataset Information

Feature selection followed by a residuals-based normalization simplifies and improves single-cell gene expression analysis.


ABSTRACT: Normalization is a critical step in the computational analysis of single-cell RNA-sequencing (scRNA-seq) counts data. The objective is to reduce systematic biases introduced by technical sources that can obscure underlying biological differences. This is typically accomplished by re-scaling the observed counts to reduce the differences in total counts between the cells and then transforming the scaled counts to stabilize the variances. In the standard scRNA-seq workflow, this is followed by feature selection to identify genes that capture most of the biologically meaningful variation across the cells. Here, we propose a simple feature selection method and show that we can perform feature selection before normalization. We also propose a novel residuals-based normalization method that includes a monotonic non-linear transformation to ensure effective variance stabilization of the residuals. We demonstrate significant improvements in downstream clustering analyses through the application of our feature selection and normalization methods to truth-known biological as well as simulated counts data sets. Based on these results, we make the case for a revised scRNA-seq analysis workflow wherein feature selection precedes and in fact informs our residuals-based normalization. This novel workflow has been implemented in an R package called Piccolo.
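The two orderings described in the abstract can be contrasted on a toy counts matrix. The following is a minimal sketch under stated assumptions, not the authors' method and not the Piccolo API. The size-factor definition, the variance-to-mean ratio used as the feature score, and the log1p-based residual are all illustrative choices, and every function name and number is hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance

# Toy counts matrix (made-up numbers): rows are genes, columns are cells.
counts = [
    [0, 1, 0, 2, 1, 0],     # sparse, low-count gene
    [5, 4, 6, 5, 4, 6],     # stable gene
    [0, 0, 1, 20, 25, 18],  # gene that differs strongly between two groups
]


def size_factors(counts):
    """Per-cell total counts divided by the mean total (assumed definition)."""
    totals = [sum(cell) for cell in zip(*counts)]
    avg = mean(totals)
    return [t / avg for t in totals]  # assumes no cell has zero total counts


def standard_normalize(counts):
    """Conventional route: re-scale each cell by its size factor, then log1p."""
    sf = size_factors(counts)
    return [[math.log1p(x / s) for x, s in zip(row, sf)] for row in counts]


def select_features(counts, n_top):
    """Rank genes on RAW counts by variance/mean (an assumed score)."""
    scores = []
    for gene, row in enumerate(counts):
        m = mean(row)
        scores.append((pvariance(row) / m if m > 0 else 0.0, gene))
    top = sorted(scores, reverse=True)[:n_top]
    return sorted(gene for _, gene in top)


def residual_normalize(counts, genes):
    """For the selected genes only, compute scaled residuals of log1p counts
    around log1p of the expected count (gene mean times cell size factor)."""
    sf = size_factors(counts)
    residuals = {}
    for gene in genes:
        row = counts[gene]
        mu = sum(row) / sum(sf)  # per-gene mean rate
        diff = [math.log1p(x) - math.log1p(mu * s) for x, s in zip(row, sf)]
        sd = pstdev(diff)
        residuals[gene] = [d / sd if sd > 0 else 0.0 for d in diff]
    return residuals


# Revised ordering: select features first, then normalize only those genes.
selected = select_features(counts, n_top=2)
residuals = residual_normalize(counts, selected)
```

In this sketch the feature-selection step never sees normalized values, so the normalization only has to be computed for the genes that were kept. That is the ordering the abstract argues for; the specific score and residual formula here are stand-ins.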

SUBMITTER: Singh A 

PROVIDER: S-EPMC10849523 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Publications

Feature selection followed by a novel residuals-based normalization simplifies and improves single-cell gene expression analysis.

Amartya Singh, Hossein Khiabanian

bioRxiv: the preprint server for biology, 2024-05-09


Normalization is a crucial step in the analysis of single-cell RNA-sequencing (scRNA-seq) counts data. Its principal objectives are to reduce the systematic biases primarily introduced through technical sources and to transform the data to make it more amenable for application of established statistical frameworks. In the standard workflows, normalization is followed by feature selection to identify highly variable genes (HVGs) that capture most of the biologically meaningful variation across th ...

Similar Datasets

| S-EPMC11290295 | biostudies-literature
| S-EPMC8644062 | biostudies-literature
| S-EPMC8419999 | biostudies-literature
| S-EPMC11229035 | biostudies-literature
| S-EPMC10418209 | biostudies-literature
| S-EPMC11429615 | biostudies-literature
| S-EPMC5972664 | biostudies-literature
| S-EPMC5209828 | biostudies-literature
| S-EPMC9029158 | biostudies-literature
| S-EPMC10379039 | biostudies-literature