Proteomics

Dataset Information

Mspire-Simulator: LC-MS shotgun proteomic simulator for creating realistic gold standard data - HEK cell lysate


ABSTRACT: The most important step in any quantitative proteomic pipeline is feature detection (aka peak picking). However, generating quality hand-annotated data sets to validate the algorithms, especially for lower abundance peaks, is nearly impossible. An alternative for creating gold standard data is to simulate it with features closely mimicking real data. We present Mspire-Simulator, a free, open-source shotgun proteomic simulator that goes beyond previous simulation attempts by generating LC-MS features with realistic m/z and intensity variance along with other noise components. It also includes machine-learned models for retention time and peak intensity prediction and a genetic algorithm to custom fit model parameters for experimental data sets. We show that these methods are applicable to data from three different mass spectrometers, including two fundamentally different types, and show visually and analytically that simulated peaks are nearly indistinguishable from actual data. Researchers can use simulated data to rigorously test quantitation software, and proteomic researchers may benefit from overlaying simulated data on actual data sets. Although not directly relevant to this dataset, a database search was performed with Proteome Discoverer v1.4 using both Mascot and SEQUEST HT. The search parameters allowed up to 2 missed trypsin cleavages, carbamidomethylation of cysteine residues, phosphorylation of S/T/Y residues, and oxidation of H/W residues, with a precursor mass tolerance of 10 ppm.
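
The two numeric search parameters in the abstract (up to 2 missed cleavages and a 10 ppm precursor mass tolerance) can be illustrated with a minimal Python sketch. This is not code from Mspire-Simulator, Proteome Discoverer, Mascot, or SEQUEST; the function names are hypothetical, and the rule that trypsin cleaves after K or R but not before P is a common simplifying assumption rather than something stated in this record.

```python
def tryptic_peptides(sequence, missed_cleavages=2):
    """In-silico digest: cleave after K/R unless the next residue is P (assumed rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder

    # Join up to `missed_cleavages` extra adjacent pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm
```

For example, at a theoretical m/z of 500.0, a 10 ppm window corresponds to roughly ±0.005 m/z, so an observed value of 500.004 would be accepted and 500.006 rejected.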

INSTRUMENT(S): LTQ Orbitrap

ORGANISM(S): Homo sapiens (human)

TISSUE(S): HEK-293T cell

SUBMITTER: Ryan Taylor  

LAB HEAD: John T. Prince

PROVIDER: PXD000477 | Pride | 2020-01-24

REPOSITORIES: Pride

Publications

Mspire-Simulator: LC-MS shotgun proteomic simulator for creating realistic gold standard data.

Noyce AB, Smith R, Dalgleish J, Taylor RM, Erb KC, Okuda N, Prince JT

Journal of Proteome Research, 2013-10-03, 12


Similar Datasets

2013-05-24 | GSE43462 | GEO
2013-05-24 | E-GEOD-43462 | biostudies-arrayexpress
2010-05-16 | E-GEOD-14241 | biostudies-arrayexpress
2010-05-16 | E-GEOD-14239 | biostudies-arrayexpress
2023-03-11 | PXD036178 | Pride
2020-12-31 | PXD023329 | iProX
2020-05-26 | MTBLS1455 | MetaboLights
2022-02-17 | PXD027654 | Pride
2016-07-21 | GSE72929 | GEO
2014-02-01 | E-GEOD-53748 | biostudies-arrayexpress