Dataset Information

Understanding sequencing data as compositions: an outlook and review.


ABSTRACT:

Motivation

Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.
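The library-size constraint described above can be illustrated with a minimal sketch. It assumes strictly positive counts (zeros would need separate handling) and uses the centered log-ratio (CLR) as one representative CoDA transformation; the function names `closure` and `clr` and the toy counts are hypothetical, chosen only for illustration.

```python
import math

def closure(counts):
    """Rescale counts to proportions that sum to 1 (removes library size)."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Assumes every count is strictly positive.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sample: three components (e.g. genes) with raw counts.
sample = [10, 20, 70]

# The same sample sequenced five times deeper: every count scales by 5,
# but the relative abundances are unchanged.
deeper = [c * 5 for c in sample]

# Raw counts differ, yet proportions and CLR values agree,
# so only the ratios between components carry information.
print(closure(sample), closure(deeper))
print(clr(sample), clr(deeper))
```

Because multiplying every count by the same factor shifts all logs by the same constant, the CLR values of `sample` and `deeper` coincide up to floating-point rounding, whereas a Euclidean distance on the raw counts would report the two as far apart.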

Results

The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Quinn TP 

PROVIDER: S-EPMC6084572 | biostudies-literature | 2018 Aug

REPOSITORIES: biostudies-literature

Publications

Understanding sequencing data as compositions: an outlook and review.

Quinn TP, Erb I, Richardson MF, Crowley TM

Bioinformatics (Oxford, England), 2018 Aug 1, issue 16


Similar Datasets

| S-EPMC11827019 | biostudies-literature
| S-EPMC7880888 | biostudies-literature
| S-EPMC5513693 | biostudies-literature
| S-EPMC3944661 | biostudies-other
| S-EPMC9137692 | biostudies-literature
| S-EPMC6936759 | biostudies-literature
| S-EPMC10019394 | biostudies-literature
| S-EPMC9579741 | biostudies-literature
| S-EPMC11850509 | biostudies-literature
| S-EPMC8105794 | biostudies-literature