Dataset Information

SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.

ABSTRACT: BACKGROUND: Identifying differentially abundant features between experimental groups is a common goal of many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero values. Although several statistical methods have been proposed, they either assume normally distributed data or are inefficient. RESULTS: We propose a new semi-parametric differential abundance analysis (SDA) method for metabolomics and proteomics data from MS. To characterize the data from each feature, the method uses a two-part model: a logistic regression for the zero proportion and a semi-parametric log-linear model for the possibly non-normally distributed non-zero values. A kernel-smoothed likelihood method is developed to estimate the model coefficients, and a likelihood ratio test is constructed for differential abundance analysis. The method is implemented in an R package, SDAMS (https://www.bioconductor.org/packages/release/bioc/html/SDAMS.html). CONCLUSION: By introducing the two-part semi-parametric model, SDA is able to handle both non-normally distributed data and a large fraction of zero values in an MS dataset. It also allows for adjustment of covariates. Simulations and real data analyses demonstrate that SDA outperforms existing methods.
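
The two-part idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the SDA method or the SDAMS API: it compares two groups without covariates, and it replaces the kernel-smoothed semi-parametric likelihood with a plain log-normal likelihood for the non-zero values. The function names (`zero_part_lrt`, `nonzero_part_lrt`, `two_part_test`) are hypothetical. Under these assumptions each part contributes one degree of freedom, so the combined likelihood ratio statistic is referred to a chi-square distribution with 2 degrees of freedom, whose tail probability is exp(-x/2).

```python
import math


def zero_part_lrt(x, y):
    """LRT statistic for equal zero proportions in two groups (binomial model)."""
    def loglik(k, n):
        # Maximized Bernoulli log-likelihood for k zeros among n observations.
        if k == 0 or k == n:
            return 0.0
        p = k / n
        return k * math.log(p) + (n - k) * math.log(1 - p)

    kx = sum(1 for v in x if v == 0)
    ky = sum(1 for v in y if v == 0)
    alt = loglik(kx, len(x)) + loglik(ky, len(y))
    null = loglik(kx + ky, len(x) + len(y))
    return max(0.0, 2.0 * (alt - null))


def nonzero_part_lrt(x, y):
    """LRT statistic for equal log-scale means of the non-zero values.

    Assumption: log-normal non-zero values with a common variance. This is a
    parametric stand-in, not the kernel-smoothed likelihood used by SDA.
    """
    lx = [math.log(v) for v in x if v > 0]
    ly = [math.log(v) for v in y if v > 0]
    if not lx or not ly:
        return 0.0  # no non-zero data in one group: this part is uninformative
    pooled = lx + ly
    m0 = sum(pooled) / len(pooled)
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    rss0 = sum((v - m0) ** 2 for v in pooled)
    rss1 = sum((v - mx) ** 2 for v in lx) + sum((v - my) ** 2 for v in ly)
    if rss1 <= 0.0:
        raise ValueError("non-zero values have no within-group variation")
    # With the variance profiled out, 2 * (l_alt - l_null) = n * log(RSS0 / RSS1).
    return max(0.0, len(pooled) * math.log(rss0 / rss1))


def two_part_test(x, y):
    """Combined statistic and p-value (chi-square, 2 degrees of freedom)."""
    stat = zero_part_lrt(x, y) + nonzero_part_lrt(x, y)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for df = 2
    return stat, p_value


# Toy intensities for one feature in two groups (made-up numbers).
group_a = [0, 0, 0, 0, 1.0, 1.2, 0.9, 1.1]
group_b = [0, 9.5, 10.2, 11.0, 10.5, 9.8, 10.1, 10.4]
statistic, p_value = two_part_test(group_a, group_b)
```

In practice the test would be run once per feature and the resulting p-values adjusted for multiple testing; that step is omitted here.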

SUBMITTER: Li Y

PROVIDER: S-EPMC6798423 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.

Li Y, Fan TWM, Lane AN, Kang WY, Arnold SM, Stromberg AJ, Wang C, Chen L

BMC Bioinformatics, 2019 Oct 17, issue 1

PMID: 31623550

Similar Datasets

Project description: In the era of open-modification search engines, more posttranslational modifications (PTMs) than ever can be detected by LC-MS/MS-based proteomics. This development can switch proteomics research into a higher gear, as PTMs are key in many cellular pathways important in cell proliferation, migration, metastasis, and aging. However, despite these advances in modification identification, statistical methods for PTM-level quantification and differential analysis have yet to catch up. This absence can partly be explained by statistical challenges inherent to the data, such as the confounding of PTM intensities with their parent protein abundance. Therefore, we have developed msqrob2PTM, a new workflow in the msqrob2 universe capable of differential abundance analysis at the PTM and at the peptidoform level. The latter is important for validating PTMs found as significantly differential. Indeed, as our method can deal with multiple PTMs per peptidoform, there is a possibility that significant PTMs stem from one significant peptidoform carrying another PTM, hinting that it might be the other PTM driving the perceived differential abundance. Our workflows can flag both differential peptidoform abundance (DPA) and differential peptidoform usage (DPU). This enables a distinction between direct assessment of differential abundance of peptidoforms (DPA) and differences in the relative usage of peptidoforms corrected for corresponding protein abundances (DPU). For DPA, we directly model the log2-transformed peptidoform intensities, while for DPU, we correct for parent protein abundance by an intermediate normalization step which calculates the log2-ratio of the peptidoform intensities to their summarized parent protein intensities. We demonstrated the utility and performance of msqrob2PTM by applying it to datasets with known ground truth, as well as to biological PTM-rich datasets. Our results show that msqrob2PTM performs on par with, or surpasses, the current state-of-the-art methods. Moreover, msqrob2PTM is currently unique in providing output at the peptidoform level.
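
The two quantities modeled for DPA and DPU in the description above can be written out as a minimal sketch. This is not the msqrob2PTM API: the function names `dpa_value` and `dpu_value` are hypothetical, and the summarized parent protein intensity is taken as a given input because the description does not say how it is computed.

```python
import math


def dpa_value(peptidoform_intensity):
    """Quantity modeled for DPA: the log2-transformed peptidoform intensity."""
    return math.log2(peptidoform_intensity)


def dpu_value(peptidoform_intensity, protein_intensity):
    """Quantity modeled for DPU: log2-ratio of the peptidoform intensity to
    its summarized parent protein intensity (supplied by the caller)."""
    return math.log2(peptidoform_intensity / protein_intensity)


# Toy numbers: the peptidoform doubles between conditions, and so does its
# parent protein, so abundance changes while usage does not.
abundance_shift = dpa_value(16.0) - dpa_value(8.0)
usage_shift = dpu_value(16.0, 4.0) - dpu_value(8.0, 2.0)
```

The toy numbers show why the distinction matters: a peptidoform can be differentially abundant purely because its parent protein changed, in which case its usage is unchanged.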

Project description: Early evaluation of new drug entities for their potential to cause mitochondrial dysfunction is becoming an important task for drug development. Multi-parametric high-content screening (mp-HCS) of mitochondrial toxicity holds promise as a lead in-vitro strategy for drug testing and safety evaluations. In this study, we have developed an mp-HCS and multi-parametric data analysis scheme for assessing cell responses to induced mitochondrial perturbation. The mp-HCS measurements are shown to be robust enough to allow for quantitative comparison of biological systems with different metabolic pathways simulated by alteration of growth media. Replacing glucose with galactose in the medium sensitized cells to drug action and revealed novel response parameters. Each compound was quantitatively characterized according to induced phenotypic changes of cell morphology and functionality measured by fluorescent biomarkers for mitochondrial activity, plasma membrane permeability, and nuclear morphology. Descriptors of drug effects were established by generation of a SCRIT (Specialized-Cell-Response-to-Induced-Toxicity) vector, consisting of normalized statistical measures of each parameter at each dose and growth condition. The dimensionality of SCRIT vectors depends on the number of parameters chosen, which in turn depends on the hypothesis being tested. Specifically, incorporation of three parameters of response into SCRIT vectors enabled clustering of 84 training compounds with known pharmacological and toxicological activities according to the degree of toxicity and mitochondrial involvement. Inclusion of 6 parameters enabled the resolution of more subtle differences between compounds within a common therapeutic class; scoring enabled a ranking of statins in direct agreement with clinical outcomes. Comparison of drug-induced changes required variations in glucose for separation of mitochondrial dysfunction from other types of cytotoxicity. These results also demonstrate that the number of drugs in a training set, the choice of parameters used in analysis, and statistical measures are fundamental for specific hypothesis testing and assessment of quantitative phenotypic differences.

Project description: In the analysis of semi-competing risks data, interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ². When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators are derived, and small-sample operating characteristics are evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer.
