Project description:Benchmarking Proteomics Quantitation in DIA-type data using real patient material to create a benchmark dataset comprising inter-patient heterogeneity
Project description:Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years due to the technical improvement of mass spectrometers, and now allow extensive characterization of complex protein mixtures. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, which appear as an attractive way to analyze differential protein expression in complex biological samples. Classical label-free quantitative workflows are based either on spectral counting of MS/MS sequencing scans for each protein, or on the extraction of peptide ion peak area values in the LC-MS map composed of all the survey MS scans acquired during the chromatographic gradient. However, the computational processing of the data for label-free quantification remains a challenge. Here, we provide a dual proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate, which was used to benchmark several label-free quantitative workflows involving different software packages developed in recent years. This experimental design allowed us to finely assess their performance in terms of sensitivity and false discovery rate by measuring the numbers of true and false positives (UPS1 and yeast background proteins, respectively, found as differential). This dataset can also be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods.
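The true/false-positive counting described in the entry above can be sketched as follows. This is a minimal illustration only, not the authors' evaluation code: the function name `spike_in_metrics`, the input layout and the example counts are hypothetical, and it assumes every spiked UPS1 protein is truly differential and every yeast background protein is not.

```python
def spike_in_metrics(calls, n_spiked):
    """Summarize a list of proteins that a workflow reported as differential.

    calls    -- iterable of (protein_id, is_spiked) pairs; is_spiked is True
                for a spiked-in protein (e.g. UPS1) and False for a
                background protein (e.g. yeast). Hypothetical input layout.
    n_spiked -- total number of spiked-in proteins in the standard.
    """
    calls = list(calls)
    tp = sum(1 for _, is_spiked in calls if is_spiked)      # true positives
    fp = sum(1 for _, is_spiked in calls if not is_spiked)  # false positives
    # Sensitivity: fraction of spiked proteins recovered as differential.
    sensitivity = tp / n_spiked if n_spiked else 0.0
    # False discovery proportion: fraction of reported proteins that are background.
    fdp = fp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "sensitivity": sensitivity, "fdp": fdp}


# Made-up example: 40 of 48 spiked proteins and 10 background proteins reported.
example_calls = [(f"UPS_{i}", True) for i in range(40)] + \
                [(f"YEAST_{i}", False) for i in range(10)]
example = spike_in_metrics(example_calls, n_spiked=48)
```

With these made-up counts the sensitivity is 40/48 and the false discovery proportion is 10/50; real benchmarks would feed in the differential calls produced by each workflow at a chosen statistical threshold.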
Project description:Data-independent acquisition (DIA) has become a well-established method in LC-MS driven proteomics. Nonetheless, there are still a lot of possibilities at the data analysis level. By benchmarking different DIA analysis workflows through a ground truth sample mimicking real differential abundance samples, consisting of a differential spike-in of UPS2 in a constant yeast background, we provide a roadmap for DIA data analysis of shotgun samples based on whether sensitivity, precision or accuracy is of the essence. Three different commonly used DIA software tools (DIA-NN, EncyclopeDIA and Spectronaut) were tested in both spectral library mode and spectral library-free mode. In spectral library mode we used the independent spectral library prediction tools Prosit and MS2PIP together with DeepLC, next to the classical DDA-based spectral libraries. In total we benchmarked 12 DIA workflows. DIA-NN in library-free mode or using in silico predicted libraries shows the highest sensitivity while maintaining high reproducibility and accuracy.
Project description:Data-independent mass spectrometry is the method of choice for deep, consistent and accurate single-shot profiling in bottom-up proteomics. While classic workflows required auxiliary DDA-MS analysis of subject samples to derive prior-knowledge spectral libraries for targeted quantification from DIA-MS maps, library-free approaches based on in silico predicted libraries promise deep DIA-MS profiling with reduced experimental effort and cost. Coverage and sensitivity in such analyses, however, are limited, in part, by large library size and persistent deviations from experimental data. We present MSLibrarian, a workflow and tool to obtain optimized predicted spectral libraries by the integrated usage of spectrum-centric DIA data interpretation via the DIA-Umpire approach to inform and calibrate the in silico predicted library approach. Predicted-vs-observed comparisons enable optimization of intensity prediction parameters, calibration of retention time prediction for deviating chromatographic setups and optimization of library scope and sample representativeness. Benchmarking via a dedicated ground-truth-embedded species mixture experiment and quantitative ratio-validation confirms gains of up to 9 % on precursor and 7 % on protein level at equivalent FDR control and validation criteria. MSLibrarian has been implemented as an open-source R software package and, with step-by-step usage instructions, is available at https://github.com/MarcIsak/MSLibrarian.
Project description:To evaluate the performance of different quantitative methods without bias, and to compare popular proteomics data processing workflows, we prepared a benchmark dataset with various levels of spiked-in E. coli proteome, in which the true fold changes (i.e. 1-fold, 1.5-fold, 2-fold, 2.5-fold and 3-fold) and the true identities of positives/negatives (i.e. E. coli proteins are true positives while human proteins are true negatives) are known. To best mimic proteomics applications that compare multiple replicates, each fold-change group contains 4 replicates, so there are 20 LC-MS/MS analyses in this benchmark dataset. To our knowledge, this spike-in benchmark dataset is the largest in scale to date, encompassing 5 different spike levels, >500 true positive proteins, and >3000 true negative proteins (2-peptide criterion, 1% protein FDR), with a wide concentration dynamic range. The dataset is ideal for testing quantitative accuracy, precision, false-positive biomarker discovery and missing-data levels.
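Because the true fold changes in the entry above are known, quantitative accuracy and precision can be summarized per spike level. The sketch below is one possible way to do so, not the dataset authors' procedure: the function name `ratio_accuracy` and the choice of median log2 error (accuracy) and standard deviation of log2 errors (precision) are assumptions made for illustration.

```python
import math
import statistics


def ratio_accuracy(observed_ratios, expected_ratio):
    """Compare observed protein ratios with the known spike-in fold change.

    Returns (bias, spread) on the log2 scale:
      bias   -- median of log2(observed) - log2(expected), an accuracy measure
      spread -- sample standard deviation of those errors, a precision measure
    """
    expected_log2 = math.log2(expected_ratio)
    errors = [math.log2(r) - expected_log2 for r in observed_ratios]
    bias = statistics.median(errors)
    # stdev needs at least two values; report 0.0 for a single observation.
    spread = statistics.stdev(errors) if len(errors) > 1 else 0.0
    return bias, spread


# Made-up example: two proteins both measured at 4-fold when 2-fold was expected.
bias, spread = ratio_accuracy([4.0, 4.0], expected_ratio=2.0)
```

In this made-up case the bias is one log2 unit (a two-fold overestimate) with zero spread; working on the log2 scale treats over- and under-estimation symmetrically.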
Project description:This repository is related to the work "Benchmarking commonly used software suites and analysis workflows for DIA proteomics and phosphoproteomics". The following files are stored here: 1. four MS datasets generated in this work, including two two-species benchmark datasets acquired on QE HF and timsTOF Pro, and two TNF-alpha induced phosphoproteomics datasets acquired on QE HF-X and timsTOF Pro; 2. all spectral libraries used in this work; 3. raw MS data search reports, with analysis logs included if reported by the software; 4. preprocessed MS data search reports for any result from HF and TIMS DIA benchmark datasets; 5. FASTAs used in this work; 6. a description file (Description_of_iProX_and_public_data.xlsx) for the stored files in this repository and other public data used in this work. Note: 3 and 4 are packed into one zip file for each data search experiment.
Project description:To further develop our gene expression approach to HCC development, we have employed whole genome microarray expression profiling as a discovery platform to identify genes with the potential to distinguish different grades of HCC development by real-time PCR, confirming variability between HCC grades as well as the predicted HCC biomarkers.
Project description:High-throughput and streamlined workflows are essential in clinical proteomics for standardized processing of samples originating from a variety of sources, including frozen tissue, FFPE tissue, or blood. To reach this goal, we have implemented single-pot solid-phase-enhanced sample preparation (SP3) on a liquid handling robot for automated processing (autoSP3) of tissue lysates in a 96-well format, performing unbiased protein purification and digestion and delivering peptides that can be directly analyzed by LC-MS. AutoSP3 eliminates hands-on time and minimizes the risk of error, and we show that it reduces protein quantification variability and improves longitudinal performance and reproducibility. We demonstrate the distinguishing ability of autoSP3 to process low-input samples, reproducibly quantifying 500-1000 proteins from 100-1000 cells (<100 ng protein). Furthermore, we added a LE220-plus focused-ultrasonicator (Covaris Ltd, UK) to our pipeline to include 96-well format lysis of fresh-frozen tissue and cells. Collectively, autoSP3 provides a generic, scalable, and cost-effective pipeline for routine and standardized proteomic sample processing that should enable reproducible proteomics in a broad range of clinical and non-clinical applications.
Project description:High-throughput and streamlined workflows are essential in clinical proteomics for standardized processing of samples originating from a variety of sources, including frozen tissue, FFPE tissue, or blood. To reach this goal, we have implemented single-pot solid-phase-enhanced sample preparation (SP3) on a liquid handling robot for automated processing (autoSP3) of tissue lysates in a 96-well format, performing unbiased protein purification and digestion and delivering peptides that can be directly analyzed by LC-MS. AutoSP3 eliminates hands-on time and minimizes the risk of error, and we show that it reduces protein quantification variability and improves longitudinal performance and reproducibility. We demonstrate the distinguishing ability of autoSP3 to process low-input samples, reproducibly quantifying 500-1000 proteins from 100-1000 cells (<100 ng protein). Furthermore, we applied it to process a cohort of clinical FFPE pulmonary adenocarcinoma (ADC) samples, and recapitulated their separation into known histological growth patterns based on proteome profiles. Collectively, autoSP3 provides a generic, scalable, and cost-effective pipeline for routine and standardized proteomic sample processing that should enable reproducible proteomics in a broad range of clinical and non-clinical applications.