DeMix-Q: Quantification-Centered Data Processing Workflow.
ABSTRACT: For historical reasons, most proteomics workflows focus on MS/MS identification but consider quantification as the end point of a comparative study. The stochastic data-dependent MS/MS acquisition (DDA) gives low reproducibility of peptide identifications from one run to another, which inevitably results in problems with missing values when quantifying the same peptide across a series of label-free experiments. However, the signal from the molecular ion is almost always present among the MS1 spectra. Contrary to what is frequently claimed, missing values do not have to be an intrinsic problem of DDA approaches that perform quantification at the MS1 level. The challenge is to perform sound peptide identity propagation across multiple high-resolution LC-MS/MS experiments, from runs with MS/MS-based identifications to runs where such information is absent. Here, we present a new analytical workflow, DeMix-Q (https://github.com/userbz/DeMix-Q), which performs such propagation and reliably recovers missing values by using a novel scoring scheme for quality control. Compared with traditional workflows for DDA as well as previous DIA studies, DeMix-Q achieves deeper proteome coverage, fewer missing values, and lower quantification variance on a benchmark dataset. This quantification-centered workflow also enables flexible and robust proteome characterization based on covariation of peptide abundances.
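The idea of identity propagation described above can be illustrated with a minimal sketch. This is not the DeMix-Q algorithm or its scoring scheme; it only assumes that retention times have already been aligned between runs and that a match is defined by charge, an m/z tolerance in ppm, and a retention time window. The names Feature and propagate_identity and the default tolerances are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    mz: float         # monoisotopic m/z of the MS1 feature
    rt: float         # retention time in minutes (assumed already aligned)
    charge: int
    intensity: float


def propagate_identity(ident: Feature,
                       candidates: List[Feature],
                       ppm_tol: float = 10.0,
                       rt_tol: float = 1.0) -> Optional[Feature]:
    """Find the MS1 feature in another run that matches an identified peptide.

    A candidate matches when it has the same charge, lies within ppm_tol of
    the identified m/z, and elutes within rt_tol minutes. If several
    candidates match, the most intense one is returned; if none match,
    None is returned and the value stays missing.
    """
    matches = [
        f for f in candidates
        if f.charge == ident.charge
        and abs(f.mz - ident.mz) / ident.mz * 1e6 <= ppm_tol
        and abs(f.rt - ident.rt) <= rt_tol
    ]
    return max(matches, key=lambda f: f.intensity, default=None)
```

A real workflow would additionally score each propagated match for quality control, which is the part this sketch leaves out.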
Project description:Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since different data analysis methods are used for the two acquisition methods. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library search. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean of the coefficient of variation decreasing more than 30% and a reduction in the number of missing values of more than 35%. Our results show that regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
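The two reproducibility metrics named above, coefficient of variation and number of missing values, can be computed as in the following sketch. It assumes a hypothetical layout in which each peptide maps to a list of replicate abundances and a missing measurement is stored as None or NaN; the function names are illustrative and not taken from the study.

```python
import math
from statistics import mean, stdev


def _is_missing(value):
    # Treat both None and NaN as a missing measurement.
    return value is None or math.isnan(value)


def cv_percent(values):
    """Coefficient of variation in percent over the non-missing replicates.

    Returns None when fewer than two values are observed or the mean is zero,
    because the CV is undefined in those cases.
    """
    observed = [v for v in values if not _is_missing(v)]
    if len(observed) < 2:
        return None
    m = mean(observed)
    if m == 0:
        return None
    return 100.0 * stdev(observed) / m  # stdev is the sample standard deviation


def missing_fraction(table):
    """Fraction of missing cells in a {peptide: [replicate abundances]} table."""
    cells = [v for row in table.values() for v in row]
    return sum(1 for v in cells if _is_missing(v)) / len(cells)
```

For example, cv_percent([100.0, 110.0, 90.0]) gives 10.0, and a table with three missing cells out of six gives a missing fraction of 0.5.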
Project description:Conventional TopN data-dependent acquisition (DDA) LC-MS/MS analysis identifies only a limited fraction of all detectable precursors because the ion-sampling rate of contemporary mass spectrometers is insufficient to target each precursor in a complex sample. TopN DDA preferentially targets high-abundance precursors with limited sampling of low-abundance precursors, and repeated analyses only marginally improve sample coverage due to redundant precursor sampling. In this work, advanced precursor ion selection algorithms were developed and applied in the bottom-up analysis of HeLa cell lysate to overcome the above deficiencies. Precursors fragmented in previous runs were efficiently excluded using an automatically aligned exclusion list, which reduced overlap of identified peptides to ~10% between replicates. Exclusion of previously fragmented high-abundance peptides allowed deeper probing of the HeLa proteome over replicate LC-MS runs, resulting in the identification of 29% more peptides beyond the saturation level achievable using conventional TopN DDA. The gain in peptide identifications using the developed approach translated to the identification of several hundred low-abundance protein groups, which were not detected by conventional TopN DDA. Exclusion of only identified peptides compared with the exclusion of all previously fragmented precursors resulted in an increase of 1000 (~10%) additional peptide identifications over four runs, suggesting the potential for further improvement in the depth of proteomic profiling using advanced precursor ion selection algorithms.
Project description:For data-independent acquisition by means of sequential window acquisition of all theoretical fragment ion spectra (SWATH), a reference library of data-dependent acquisition (DDA) runs is typically used to correlate the quantitative data from the fragment ion spectra with peptide identifications. The quality and coverage of such a reference library is therefore essential when processing SWATH data. In general, library sizes can be increased by reducing the impact of DDA precursor selection with replicate runs or fractionation. However, these strategies can affect the match between the library and SWATH measurement, and thus larger library sizes do not necessarily correspond to improved SWATH quantification. Here, three fractionation strategies to increase local library size were compared to standard library building using replicate DDA injection: protein SDS-PAGE fractionation, peptide high-pH RP-HPLC fractionation and MS-acquisition gas phase fractionation. The impact of these libraries on SWATH performance was evaluated in terms of the number of extracted peptides and proteins, the match quality of the peptides and the extraction reproducibility of the transitions. These analyses were conducted using the hydrophilic proteome of differentiating human embryonic stem cells. Our results show that SWATH quantitative results and interpretations are affected by choice of fractionation technique. Data are available via ProteomeXchange with identifier PXD006190.
Project description:Background: Comprehensive characterization of the phosphoproteome in living cells is critical in signal transduction research, but the low abundance of phosphopeptides among the total proteome in cells remains an obstacle in mass spectrometry-based proteomic analysis. To provide a solution, an alternative analytic strategy was proposed that confidently identifies phosphorylated peptides by combining alkaline phosphatase (AP) treatment with high-resolution mass spectrometry. While the process is applicable, the key integration along the pipeline was mostly done by tedious manual work. Results: We developed a software toolkit, iPhos, to facilitate and streamline the workflow of AP-assisted phosphoproteome characterization. The iPhos toolkit includes one assister and three modules. The iPhos Peak Extraction Assister automates the batch mode peak extraction for multiple liquid chromatography mass spectrometry (LC-MS) runs. iPhos Module-1 processes the peak lists extracted from the LC-MS analyses of the original and dephosphorylated samples to mine potential phosphorylated peptide signals based on the mass shift caused by the loss of one or more phosphate groups. iPhos Module-2 provides customized inclusion lists with peak retention time windows for subsequent targeted LC-MS/MS experiments. Finally, iPhos Module-3 helps link the peptide identifications from protein search engines to the quantification results from pattern-based label-free quantification tools. We further demonstrated the utility of the iPhos toolkit on data from human metastatic lung cancer cells (CL1-5). Conclusions: In the comparison study of the control group of CL1-5 cell lysates and the treatment group of dasatinib-treated CL1-5 cell lysates, we demonstrated the applicability of the iPhos toolkit and reported the experimental results based on the iPhos-facilitated phosphoproteome investigation. We also compared the strategy with a pure DDA-based LC-MS/MS phosphoproteome investigation. The iPhos-facilitated targeted LC-MS/MS analysis yielded more thorough and confident phosphopeptide identification than the pure DDA-based analysis.
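The mass-shift mining step described for Module-1 can be sketched as follows. This is not the iPhos implementation; it only assumes that each peak list has been reduced to neutral monoisotopic masses and that a phosphorylated peptide differs from its dephosphorylated form by a whole number of HPO3 units (79.96633 Da each). The function name, the site limit, and the tolerance are hypothetical.

```python
HPO3_MASS = 79.96633  # monoisotopic mass of one HPO3 group in Da


def find_phospho_candidates(original_masses, dephos_masses,
                            max_sites=3, tol_da=0.01):
    """Pair masses whose difference equals n phosphate groups.

    Returns a list of (original_mass, dephosphorylated_mass, n_sites) tuples
    for every pair where the original mass exceeds the dephosphorylated mass
    by n * HPO3_MASS within tol_da, for n from 1 to max_sites.
    """
    candidates = []
    for m_orig in original_masses:
        for m_dephos in dephos_masses:
            for n in range(1, max_sites + 1):
                if abs(m_orig - m_dephos - n * HPO3_MASS) <= tol_da:
                    candidates.append((m_orig, m_dephos, n))
    return candidates
```

Matching on mass alone produces false pairs in complex samples, so a practical tool would apply further constraints; which constraints iPhos applies is not stated in the text above.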
Project description:Current analytical strategies for collecting proteomic data using data-dependent acquisition (DDA) are limited by the low analytical reproducibility of the method. Proteomic discovery efforts that exploit the benefits of DDA, such as providing peptide sequence information, but that enable improved analytical reproducibility, represent an ideal scenario for maximizing measurable peptide identifications in "shotgun"-type proteomic studies. Therefore, we propose an analytical workflow combining DDA with retention time aligned extracted ion chromatogram (XIC) areas obtained from high mass accuracy MS1 data acquired in parallel. We applied this workflow to the analyses of sample matrices prepared from mouse blood plasma and brain tissues and observed increases in peptide detection of up to 30.5% due to the comparison of peptide MS1 XIC areas following retention time alignment of co-identified peptides. Furthermore, we show that the approach is quantitative using peptide standards diluted into a complex matrix. These data revealed that peptide MS1 XIC areas provide a linear response over three orders of magnitude down to low femtomole (fmol) levels. These findings argue that augmenting "shotgun" proteomic workflows with retention time alignment of peptide identifications and comparative analyses of corresponding peptide MS1 XIC areas improves the analytical performance of global proteomic discovery methods using DDA.
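Retention time alignment from co-identified peptides can be illustrated with a simple least-squares line. This sketch assumes a linear relationship between the retention times of two runs, which is a simplification; the abstract does not state which alignment model the authors used, and the function names are hypothetical.

```python
def fit_rt_alignment(rt_run, rt_ref):
    """Fit rt_ref = slope * rt_run + intercept by ordinary least squares.

    rt_run and rt_ref hold the retention times of the same co-identified
    peptides in the run to be aligned and in the reference run.
    """
    n = len(rt_run)
    if n < 2 or n != len(rt_ref):
        raise ValueError("need at least two paired retention times")
    mean_x = sum(rt_run) / n
    mean_y = sum(rt_ref) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_run)
    if sxx == 0:
        raise ValueError("retention times in rt_run must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rt_run, rt_ref))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def align_rt(rt, slope, intercept):
    """Map a retention time from the run onto the reference time scale."""
    return slope * rt + intercept
```

Once the mapping is fitted, the retention time of a peptide identified only in the reference run predicts where to extract its MS1 XIC in the other run.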
Project description:Constant improvements to the Orbitrap mass analyzer, such as acquisition speed, resolution, dynamic range and sensitivity have strengthened its value for the large-scale identification and quantification of metabolites in complex biological matrices. Here, we report the development and optimization of Data Dependent Acquisition (DDA) and Sequential Window Acquisition of all THeoretical fragment ions (SWATH-type) Data Independent Acquisition (DIA) workflows on a high-field Orbitrap Fusion™ Tribrid™ instrument for the robust identification and quantification of metabolites in human plasma. By using a set of 47 exogenous and 72 endogenous molecules, we compared the efficiency and complementarity of both approaches. We exploited the versatility of this mass spectrometer to collect meaningful MS/MS spectra at both high- and low-mass resolution and various low-energy collision-induced dissociation conditions under optimized DDA conditions. We also observed that complex and composite DIA-MS/MS spectra can be efficiently exploited to identify metabolites in plasma thanks to a reference tandem spectral library made from authentic standards while also providing a valuable data resource for further identification of unknown metabolites. Finally, we found that adding multi-event MS/MS acquisition did not degrade the ability to use survey MS scans from DDA and DIA workflows for the reliable absolute quantification of metabolites down to 0.05 ng/mL in human plasma.
Project description:Currently data-dependent acquisition (DDA) is the method of choice for mass spectrometry-based proteomics discovery experiments, but data-independent acquisition (DIA) is steadily becoming more important. One of the most important requirements to perform a DIA analysis is the availability of suitable spectral libraries for peptide identification and quantification. Several studies have evaluated spectral library performance for protein identification in DIA measurements, but so far only a few experiments have estimated the effect of these libraries on the quantitative level. In this work we created a gold standard spike-in sample set with known contents and ratios of proteins in a complex protein matrix that allowed a detailed comparison of DIA quantification data obtained with different spectral library approaches. We used in-house generated sample-specific spectral libraries created using varying sample preparation approaches and repeated DDA measurement. In addition, two different search engines were tested for protein identification from DDA data and subsequent library generation. In total, eight different spectral libraries were generated, and the quantification results were compared with a library-free method, as well as a default DDA analysis. We inspected not only the number of identifications at the peptide and protein level in the spectral libraries and the corresponding DIA analysis results, but also the number of expected and identified differentially abundant protein groups and their ratios. We found that while libraries of prefractionated samples were generally larger, there was no significant increase in DIA identifications compared with repetitive non-fractionated measurements. Furthermore, we show that the accuracy of the quantification is strongly dependent on the applied spectral library and on whether the quantification is based on the peptide or protein level. Overall, the reproducibility and accuracy of DIA quantification is superior to DDA in all applied approaches. Data have been deposited to the ProteomeXchange repository with identifiers PXD012986, PXD012987, PXD012988 and PXD014956.
Project description:Stochasticity between independent LC-MS/MS runs is a challenging problem in the field of proteomics, resulting in significant missing values (i.e., abundance measurements) among observed peptides. To address this issue, several approaches have been developed including computational methods such as MaxQuant's match-between-runs (MBR) algorithm. Often dozens of runs are all considered at once by MBR, transferring identifications from any one run to any of the others. To evaluate the error associated with these transfer events, we created a two-sample/two-proteome approach. In this way, samples containing no yeast lysate (n = 20) were assessed for false identification transfers from samples containing yeast (n = 20). While MBR increased the total number of spectral identifications by ~40%, we also found that 44% of all identified yeast proteins had identifications transferred to at least one sample without yeast. However, of these only 2.7% remained in the final data set after applying the MaxQuant LFQ algorithm. We conclude that false transfers by MBR are plentiful, but few are retained in the final data set.
Project description:The ability to perform thorough sampling is of critical importance when using mass spectrometry to characterize complex proteomic mixtures. A common approach is to reinterrogate a sample multiple times by LC-MS/MS. However, the conventional data-dependent acquisition methods that are typically used in proteomics studies will often redundantly sample high-intensity precursor ions while failing to sample low-intensity precursors entirely. We describe a method wherein the masses of successfully identified peptides are used to generate an accurate mass exclusion list such that those precursors are not selected for sequencing during subsequent analyses. We performed multiple concatenated analytical runs to sample a complex cell lysate, using either accurate mass exclusion-based data-dependent acquisition (AMEx) or standard data-dependent acquisition, and found that utilization of AMEx on an ESI-Orbitrap instrument significantly increases the total number of validated peptide identifications relative to a standard DDA approach. The additional identified peptides represent precursor ions that exhibit low signal intensity in the sample. Increasing the total number of peptide identifications augmented the number of proteins identified, as well as improved the sequence coverage of those proteins. Together, these data indicate that using AMEx is an effective strategy to improve the characterization of complex proteomic mixtures.
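The exclusion-list idea above can be sketched in a few lines. This is not the AMEx implementation, which runs in instrument control software; it only assumes that identified precursors are stored as m/z values, that exclusion is decided by a ppm tolerance, and that the remaining candidates are ranked by intensity as in TopN selection. All names and default values are hypothetical.

```python
def build_exclusion_list(identified_mz):
    """Collect the m/z values of already identified precursors, deduplicated."""
    return sorted(set(round(mz, 4) for mz in identified_mz))


def is_excluded(mz, exclusion_list, ppm_tol=10.0):
    """True when mz lies within ppm_tol of any entry on the exclusion list."""
    return any(abs(mz - ex) / ex * 1e6 <= ppm_tol for ex in exclusion_list)


def select_precursors(candidates, exclusion_list, top_n=3, ppm_tol=10.0):
    """Pick the top_n most intense candidates that are not excluded.

    candidates is a list of (mz, intensity) tuples from one survey scan.
    """
    allowed = [c for c in candidates
               if not is_excluded(c[0], exclusion_list, ppm_tol)]
    return sorted(allowed, key=lambda c: c[1], reverse=True)[:top_n]
```

With the two most intense precursors already on the list, the selection falls through to lower-intensity precursors, which is the effect the abstract reports.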
Project description:The data described here provide a systematic performance evaluation of popular data-dependent (DDA) and data-independent (DIA) mass spectrometric (MS) workflows currently used in quantitative proteomics. We assessed the limits of identification, quantification and detection for each method by analyzing a dilution series of 20 unmodified and 10 phosphorylated synthetic heavy labeled reference peptides, respectively, covering six orders of magnitude in peptide concentration with and without a complex human cell digest background. We found that all methods performed very similarly in the absence of background proteins; however, when analyzing whole cell lysates, targeted methods were at least 5-10 times more sensitive than directed or DDA methods. In particular, higher stage fragmentation (MS3) of the neutral loss peak using a linear ion trap increased the dynamic quantification range of some phosphopeptides up to 100-fold. We illustrate the power of this targeted MS3 approach for phosphopeptide monitoring by successfully quantifying 9 phosphorylation sites of the kinetochore and spindle assembly checkpoint component Mad1 over different cell cycle states from non-enriched pull-down samples. The data are associated with the research article 'Evaluation of data-dependent and data-independent mass spectrometric workflows for sensitive quantification of proteins and phosphorylation sites' (Bauer et al., 2014). The mass spectrometry and the analysis dataset have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD000964.