Project description:Benchmarking Proteomics Quantitation in DIA-type data using real patient material to create a benchmark dataset comprising inter-patient heterogeneity
Project description:Recent advances in liquid chromatography–mass spectrometry (LC-MS) have accelerated the adoption of high-throughput workflows that deliver deep proteome coverage using minimal sample amounts. This trend is largely driven by single-cell proteomics, where sensitivity and reproducibility are essential. Here, we extend our previous benchmark dataset (PXD028735) that was generated using next-generation LC-MS platforms optimized for rapid proteome analysis. With shorter LC gradients and lower sample amounts, we generated an extensive DDA/DIA dataset on a standardized human-yeast-E. coli hybrid proteome. This new dataset includes data acquired by the Orbitrap Astral, which combines an Orbitrap with a time-of-flight (TOF) mass analyzer, and features new scanning quadrupole-based implementations, extending coverage across different instruments and acquisition strategies. Our comprehensive evaluation highlights how technological advances and reduced LC gradients affect proteome depth, quantitative precision, and cross-instrument consistency. The release of this benchmark dataset via ProteomeXchange (PXD070049) can accelerate cross-platform algorithm development, enhance data mining strategies, and support the continued standardization of short-gradient, high-throughput LC-MS-based proteomics.
Project description:To evaluate the performance of different quantitative methods without bias, and to compare popular proteomics data-processing workflows, we prepared a benchmark dataset in which an E. coli proteome was spiked in at various levels, so that both the true fold changes (i.e. 1-fold, 1.5-fold, 2-fold, 2.5-fold and 3-fold) and the true identities of positives and negatives (i.e. E. coli proteins are true positives while human proteins are true negatives) are known. To mimic typical proteomics applications that compare multiple replicates, each fold-change group contains 4 replicates, so the benchmark dataset comprises 20 LC-MS/MS analyses. To our knowledge, this is the largest-scale spike-in benchmark dataset to date, encompassing 5 different spike levels, >500 true-positive proteins, and >3000 true-negative proteins (2-peptide criterion, 1% protein FDR), with a wide concentration dynamic range. The dataset is ideal for testing quantitative accuracy, precision, false-positive biomarker discovery and missing-data levels.
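As a rough illustration of how such a spike-in design can be scored, the sketch below counts true and false positives by species and measures the log2 error of observed fold changes against the known spike ratio. It is a minimal sketch, not the authors' evaluation code: the function names, the `(species, observed_fold_change, called_significant)` tuple layout, the species labels, and the toy values are all hypothetical, and comparing a 2-fold group against the 1-fold group is an assumed use case.

```python
import math
from statistics import median

# Spike levels and replicate count as stated in the description above.
FOLD_CHANGES = [1.0, 1.5, 2.0, 2.5, 3.0]
REPLICATES_PER_GROUP = 4


def total_runs():
    """Number of LC-MS/MS analyses implied by the design (5 groups x 4 replicates)."""
    return len(FOLD_CHANGES) * REPLICATES_PER_GROUP


def evaluate(results, expected_fold_change):
    """Score one group comparison.

    results: iterable of (species, observed_fold_change, called_significant),
             a hypothetical layout chosen for this sketch.
    expected_fold_change: known spike ratio for E. coli proteins.
    """
    true_positives = 0
    false_positives = 0
    log2_errors = []
    for species, observed_fc, significant in results:
        if species == "ecoli":
            # E. coli proteins are the spiked-in true positives.
            log2_errors.append(math.log2(observed_fc) - math.log2(expected_fold_change))
            if significant:
                true_positives += 1
        elif species == "human" and significant:
            # Human proteins are unchanged, so any significant call is a false positive.
            false_positives += 1
    calls = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_discovery_proportion": false_positives / calls if calls else 0.0,
        "median_log2_error": median(log2_errors) if log2_errors else None,
    }


# Toy input (made-up values, not taken from the dataset).
toy_results = [
    ("ecoli", 2.0, True),
    ("ecoli", 4.0, True),
    ("ecoli", 1.0, False),
    ("human", 1.0, False),
    ("human", 1.3, True),
]
summary = evaluate(toy_results, expected_fold_change=2.0)
print(total_runs(), summary)
```

Working in log2 space treats over- and under-estimation symmetrically, which is why the error is computed on log-transformed ratios rather than on the raw fold changes.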
Project description:In the last decade, a revolution in liquid chromatography-mass spectrometry (LC-MS) based proteomics has unfolded with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine these data and assessing their performance on different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and in all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired using several of the most commonly used present-day instrument platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates to allow robust statistics, and covers nearly 10 different data formats, including scanning quadrupole and ion mobility enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).