Dataset Information

Bayesian Hierarchical Model for Protein Identifications.


ABSTRACT: In proteomics, identifying proteins in complex mixtures extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a "two-step" procedure that first identifies the peptides and then identifies the proteins in a separate step. In this setup, the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo (MCMC) based Bayesian hierarchical model, the first of its kind in protein identification, which integrates the two steps and performs a joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence among proteins by assigning clustering group priors to the proteins, based on the assumption that proteins sharing the same biological pathway are correlated and are likely to be present or absent together. Because the complete conditionals of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties of all relevant parameters. The model has better operational characteristics than two existing "one-step" procedures on a range of simulation settings as well as on two well-studied datasets.
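
To make the Gibbs sampling idea in the abstract concrete, here is a minimal sketch on a toy model. It is not the authors' model: the Gaussian peptide-score likelihood, the Beta prior on each group's presence rate, and every name and default value (`gibbs_protein_presence`, `mu_present`, `mu_absent`, and so on) are assumptions made for illustration only. Each protein has a binary presence indicator, proteins in the same group share a presence rate, and a peptide's score is drawn from the "present" distribution when at least one of its parent proteins is present.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def gibbs_protein_presence(scores, parents, group, n_iter=2000, burn_in=500,
                           mu_absent=0.0, mu_present=3.0, sd=1.0,
                           a=1.0, b=1.0, seed=0):
    """Toy Gibbs sampler (illustrative assumptions, not the published model).

    scores[i]  : observed score of peptide i
    parents[i] : indices of the proteins that could have produced peptide i
    group[j]   : group (e.g. pathway) index of protein j
    Returns the estimated posterior presence probability of each protein.
    """
    rng = random.Random(seed)
    n_prot = len(group)
    n_group = max(group) + 1
    # peptides_of[j] lists the peptides that map to protein j
    peptides_of = [[] for _ in range(n_prot)]
    for i, prots in enumerate(parents):
        for j in prots:
            peptides_of[j].append(i)
    members = [[j for j in range(n_prot) if group[j] == g] for g in range(n_group)]

    z = [rng.randint(0, 1) for _ in range(n_prot)]  # presence indicators
    pi = [0.5] * n_group                            # group presence rates
    totals = [0] * n_prot

    for it in range(n_iter):
        # Step 1: update each indicator from its full conditional.
        for j in range(n_prot):
            log_p1 = math.log(pi[group[j]])
            log_p0 = math.log(1.0 - pi[group[j]])
            for i in peptides_of[j]:
                # Is peptide i already explained by another present protein?
                other = any(z[k] for k in parents[i] if k != j)
                log_p1 += normal_logpdf(scores[i], mu_present, sd)
                log_p0 += normal_logpdf(scores[i], mu_present if other else mu_absent, sd)
            diff = log_p0 - log_p1
            p1 = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))
            z[j] = 1 if rng.random() < p1 else 0
        # Step 2: update each group rate from its conjugate Beta conditional.
        for g in range(n_group):
            k = sum(z[j] for j in members[g])
            draw = rng.betavariate(a + k, b + len(members[g]) - k)
            pi[g] = min(max(draw, 1e-12), 1.0 - 1e-12)  # keep the logs finite
        # Accumulate post-burn-in draws for the posterior mean of each indicator.
        if it >= burn_in:
            for j in range(n_prot):
                totals[j] += z[j]

    return [t / (n_iter - burn_in) for t in totals]


# Made-up example: proteins 0 and 1 share a group and have high-scoring
# peptides; proteins 2 and 3 share another group and have low-scoring peptides.
scores = [3.2, 2.8, 3.1, 0.1, -0.3, 0.2]
parents = [[0], [0, 1], [1], [2], [2, 3], [3]]
group = [0, 0, 1, 1]
probs = gibbs_protein_presence(scores, parents, group)
print([round(p, 2) for p in probs])
```

In this sketch the shared group rate is what couples proteins within a group: when one member is sampled as present, the next draw of the group rate tends to be higher, which raises the conditional presence probability of the other members. Shared peptides couple proteins through the likelihood, since a peptide already explained by one present parent gives no further evidence for the others.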

SUBMITTER: Mitra R 

PROVIDER: S-EPMC6519717 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Bayesian Hierarchical Model for Protein Identifications.

Riten Mitra, Ryan Gill, Sinjini Sikdar, Susmita Datta

Journal of Applied Statistics, 2018-03-25, issue 1

Similar Datasets

| S-EPMC5891374 | biostudies-literature
| S-EPMC3153957 | biostudies-literature
| S-EPMC5064853 | biostudies-literature
| S-EPMC4627701 | biostudies-literature
| S-EPMC3063681 | biostudies-literature
| S-EPMC8626927 | biostudies-literature
| S-EPMC3042776 | biostudies-literature
| S-EPMC7094348 | biostudies-literature
| S-EPMC3731670 | biostudies-literature
| S-EPMC7523653 | biostudies-literature