Dataset Information

Removal of batch effects using distribution-matching residual networks.


ABSTRACT:

Motivation

Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error combines systematic components, which originate from the measuring instrument, with random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated.

Results

We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network, trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
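To make the training objective concrete, here is a minimal NumPy sketch of a (biased) squared Maximum Mean Discrepancy estimate between two samples, together with a single residual block of the form x + f(x). It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the ReLU layer, and the function names `gaussian_kernel`, `mmd_squared` and `residual_block` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b.

    Assumption: the abstract does not name a kernel; a single-bandwidth
    Gaussian kernel is used here purely for illustration.
    """
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def residual_block(x, w1, b1, w2, b2):
    """One residual block: the input plus a learned correction f(x).

    With all weights at zero the block is exactly the identity map, so the
    calibrated data start out equal to the uncalibrated data.
    """
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU layer (illustrative choice)
    return x + hidden @ w2 + b2


# Synthetic stand-ins for two replicates measured in different batches.
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 2))
same_batch = rng.normal(size=(200, 2))
shifted_batch = rng.normal(size=(200, 2)) + 2.0  # simulated systematic offset

mmd_same = mmd_squared(same_batch, target)
mmd_shifted = mmd_squared(shifted_batch, target)
print("MMD^2, same distribution:", mmd_same)
print("MMD^2, shifted batch:", mmd_shifted)
```

In a training loop, the weights of the residual block would be adjusted by gradient descent so that `mmd_squared(residual_block(source, ...), target)` decreases; that optimisation step is omitted here because it needs an automatic-differentiation framework.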

Availability and implementation

Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git.

Contact

yuval.kluger@yale.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Shaham U 

PROVIDER: S-EPMC5870543 | biostudies-literature | 2017 Aug

REPOSITORIES: biostudies-literature

Publications

Removal of batch effects using distribution-matching residual networks.

Shaham U, Stanton KP, Zhao J, Li H, Raddassi K, Montgomery R, Kluger Y

Bioinformatics (Oxford, England), 2017 Aug 1, issue 16


Similar Datasets

| S-EPMC10827218 | biostudies-literature
| S-EPMC9477887 | biostudies-literature
| S-EPMC9364370 | biostudies-literature
| S-EPMC11925496 | biostudies-literature
| S-EPMC7472465 | biostudies-literature
| S-EPMC10350225 | biostudies-literature
| S-EPMC11441315 | biostudies-literature
| S-EPMC6152897 | biostudies-literature
| S-EPMC8888767 | biostudies-literature
| S-EPMC5821853 | biostudies-literature