
Dataset Information


ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects.


ABSTRACT:

Motivation

R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient because it requires creating an additional object that holds a copy of the original assay, and it leads to challenges with data provenance.

Results

To overcome these challenges, we developed an R package called ExperimentSubset, a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets.
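
The idea of keeping subsets as references to a parent assay, rather than as copies, can be illustrated with a small language-neutral sketch. The Python below is not the ExperimentSubset API and makes no claim about the package's internals. The names `Subset`, `SubsetContainer`, `create_subset`, `get` and `lineage` are all hypothetical, and the approach shown (storing row and column indices plus the parent's name) is an assumption chosen only to show how provenance and a low memory footprint can go together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """Hypothetical record: indices into a parent, plus the parent's name."""
    rows: tuple
    cols: tuple
    parent: str


class SubsetContainer:
    """Hypothetical container: one stored matrix and named index-based subsets."""

    def __init__(self, name, matrix):
        self.root_name = name
        self.matrix = matrix      # the only copy of the data (list of rows)
        self.subsets = {}         # subset name -> Subset record

    def create_subset(self, name, rows, cols, parent=None):
        # Record indices relative to the parent; no assay values are copied.
        parent = parent or self.root_name
        if parent != self.root_name and parent not in self.subsets:
            raise KeyError(f"unknown parent: {parent}")
        self.subsets[name] = Subset(tuple(rows), tuple(cols), parent)

    def _resolve(self, name):
        # Translate a subset's indices into indices of the root matrix.
        if name == self.root_name:
            n_rows, n_cols = len(self.matrix), len(self.matrix[0])
            return tuple(range(n_rows)), tuple(range(n_cols))
        s = self.subsets[name]
        parent_rows, parent_cols = self._resolve(s.parent)
        return (tuple(parent_rows[i] for i in s.rows),
                tuple(parent_cols[j] for j in s.cols))

    def get(self, name):
        # Materialize the subset only when it is requested.
        rows, cols = self._resolve(name)
        return [[self.matrix[i][j] for j in cols] for i in rows]

    def lineage(self, name):
        # Provenance: walk from the subset back to the root assay.
        chain = [name]
        while chain[-1] != self.root_name:
            chain.append(self.subsets[chain[-1]].parent)
        return chain


# Toy assay: 3 features (rows) by 4 samples (columns).
counts = [
    [5, 0, 3, 1],
    [0, 0, 2, 0],
    [7, 1, 9, 4],
]
store = SubsetContainer("counts", counts)
# Drop a poor-quality sample (column 1), then keep two features.
store.create_subset("good_cells", rows=[0, 1, 2], cols=[0, 2, 3])
store.create_subset("variable_genes", rows=[0, 2], cols=[0, 1, 2],
                    parent="good_cells")
filtered = store.get("variable_genes")
chain = store.lineage("variable_genes")
```

In this sketch each subset costs only a few integers, and the chain from `variable_genes` back to `counts` is recoverable at any time, which mirrors the two properties the abstract emphasizes.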

Availability and implementation

The ExperimentSubset package is available from Bioconductor (https://bioconductor.org/packages/ExperimentSubset/) and GitHub (https://github.com/campbio/ExperimentSubset).

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Sarfraz I 

PROVIDER: S-EPMC9940906 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature


Publications

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects.

Sarfraz I, Asif M, Campbell JD

Bioinformatics (Oxford, England), 2021-09-01, issue 18



Similar Datasets

| S-EPMC9710567 | biostudies-literature
| S-EPMC6584971 | biostudies-literature
| S-EPMC4918025 | biostudies-other
| S-EPMC9048699 | biostudies-literature
| S-EPMC7904076 | biostudies-literature
| S-EPMC3128033 | biostudies-literature
| S-EPMC10582516 | biostudies-literature
| S-EPMC11479578 | biostudies-literature
| S-EPMC2777013 | biostudies-literature
| S-EPMC3892686 | biostudies-literature