
Dataset Information


SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.


ABSTRACT:

BACKGROUND: Identifying differentially abundant features between experimental groups is a common goal of metabolomics and proteomics studies. However, analyzing mass spectrometry (MS) data is difficult because the data may not be normally distributed and often contain a large fraction of zero values. Although several statistical methods have been proposed, they either assume that the data are normally distributed or are inefficient.

RESULTS: We propose a new semi-parametric differential abundance analysis (SDA) method for metabolomics and proteomics data from MS. To characterize the data for each feature, the method uses a two-part model: a logistic regression for the proportion of zeros and a semi-parametric log-linear model for the non-zero values, which may not be normally distributed. A kernel-smoothed likelihood method is developed to estimate the model coefficients, and a likelihood ratio test is constructed for differential abundance analysis. The method has been implemented in an R package, SDAMS, which is available at https://www.bioconductor.org/packages/release/bioc/html/SDAMS.html.

CONCLUSION: By introducing the two-part semi-parametric model, SDA can handle both non-normally distributed data and a large fraction of zero values in an MS dataset. It also allows adjustment for covariates. Simulations and real data analyses demonstrate that SDA outperforms existing methods.
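The two-part idea described in the abstract can be illustrated with a much simpler, fully parametric stand-in. The Python sketch below is not the SDA method and does not use the SDAMS package: it replaces the kernel-smoothed likelihood and likelihood ratio test with a pooled two-proportion z statistic for the zeros and a Welch-type statistic on the log-transformed non-zero values, then sums their squares and compares the result with a chi-square distribution. The function names (`two_part_test`, `chi2_sf`) and the example data are hypothetical, and the sketch assumes non-negative abundances, two groups, and no covariates.

```python
import math
import statistics


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable for df in {0, 1, 2}."""
    if df == 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    return math.exp(-stat / 2)  # closed form for df == 2


def two_part_test(group_a, group_b):
    """Simplified two-part comparison of one feature between two groups.

    Returns (statistic, degrees_of_freedom, p_value). This is an
    illustrative stand-in, not the semi-parametric SDA procedure.
    """
    stat, df = 0.0, 0

    # Part 1: compare the proportion of zeros (pooled two-proportion z test).
    n_a, n_b = len(group_a), len(group_b)
    zeros_a = sum(1 for v in group_a if v == 0)
    zeros_b = sum(1 for v in group_b if v == 0)
    pooled = (zeros_a + zeros_b) / (n_a + n_b)
    var_prop = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if var_prop > 0:  # skipped when no group has zeros, or all values are zero
        z_zero = (zeros_a / n_a - zeros_b / n_b) / math.sqrt(var_prop)
        stat += z_zero ** 2
        df += 1

    # Part 2: compare the log-transformed non-zero values (Welch-type statistic,
    # treated as approximately standard normal under the null).
    logs_a = [math.log(v) for v in group_a if v > 0]
    logs_b = [math.log(v) for v in group_b if v > 0]
    if len(logs_a) >= 2 and len(logs_b) >= 2:
        se2 = (statistics.variance(logs_a) / len(logs_a)
               + statistics.variance(logs_b) / len(logs_b))
        if se2 > 0:
            z_pos = (statistics.mean(logs_a) - statistics.mean(logs_b)) / math.sqrt(se2)
            stat += z_pos ** 2
            df += 1

    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical abundances for one feature in two groups.
    control = [0, 0, 0, 0, 1.0, 1.2, 0.9, 1.1]
    treated = [5.0, 6.0, 5.5, 4.8, 5.2, 6.1, 0, 5.9]
    print(two_part_test(control, treated))
```

Treating the two parts separately lets a difference in either the zero proportion or the non-zero abundance level contribute to the test. The normal approximation in the second part is exactly the kind of assumption that SDA's semi-parametric model is described as avoiding.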

SUBMITTER: Li Y 

PROVIDER: S-EPMC6798423 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature


Publications

SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.

Yuntong Li, Teresa W. M. Fan, Andrew N. Lane, Woo-Young Kang, Susanne M. Arnold, Arnold J. Stromberg, Chi Wang, Li Chen

BMC Bioinformatics, 2019 Oct 17, issue 1



Similar Datasets

| S-EPMC4017673 | biostudies-literature
| S-EPMC2876132 | biostudies-literature
| S-EPMC3471932 | biostudies-literature
| S-EPMC5400113 | biostudies-literature
| S-EPMC4481852 | biostudies-other
| S-EPMC7032029 | biostudies-literature
| S-EPMC4086067 | biostudies-literature