Arabidopsis thaliana spiked UPS1 standard protein mixtures
ABSTRACT: This project aims to provide a quantitative dataset from LC-MS/MS injections of a calibrated UPS1 mixture spiked into an Arabidopsis thaliana background.
INSTRUMENT(S): Q Exactive HF
ORGANISM(S): Homo sapiens (human), Arabidopsis thaliana (mouse-ear cress)
Project description: Proteomic workflows based on nanoLC-MS/MS data-dependent acquisition analysis have progressed tremendously in recent years due to technical improvements in mass spectrometers, and now allow extensive characterization of complex protein mixtures. High-resolution, fast-sequencing instruments have enabled the use of label-free quantitative methods, which are an attractive way to analyze differential protein expression in complex biological samples. Classical label-free quantitative workflows are based either on spectral counting of the MS/MS sequencing scans for each protein, or on extraction of peptide ion peak areas from the LC-MS map composed of all survey MS scans acquired during the chromatographic gradient. However, the computational processing of label-free quantification data remains a challenge. Here, we provide a dual proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a yeast cell lysate background, which was used to benchmark several label-free quantitative workflows involving different software packages developed in recent years. This experimental design allowed a fine assessment of their performance in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (UPS1 and yeast background proteins, respectively, found to be differential). This dataset can also be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extracting quantitative metrics from raw MS data, or evaluate downstream statistical methods.
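As an illustration of the true/false-positive scoring described above, the following Python sketch assumes that the list of proteins reported as differential and the two ground-truth identifier sets (UPS1 spike-ins and background proteins) are already available. The function name `benchmark` and the protein identifiers are hypothetical, and sensitivity and false discovery proportion use their textbook definitions rather than any particular software package's conventions.

```python
def benchmark(differential, ups1_ids, background_ids):
    """Score proteins reported as differential against a spike-in ground truth.

    Only UPS1 spike-ins truly change in abundance, so calling one of them
    differential is a true positive; calling a background protein
    differential is a false positive. Other IDs (e.g. contaminants) are ignored.
    """
    called = set(differential)
    ups1, background = set(ups1_ids), set(background_ids)
    tp = len(called & ups1)
    fp = len(called & background)
    sensitivity = tp / len(ups1) if ups1 else 0.0
    # Observed false discovery proportion among the scored calls.
    fdp = fp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "sensitivity": sensitivity, "fdp": fdp}


# Hypothetical toy example: 4 spike-ins, 3 background proteins.
result = benchmark(["P1", "P2", "P3", "Y1"], {"P1", "P2", "P3", "P4"}, {"Y1", "Y2", "Y3"})
print(result)  # {'tp': 3, 'fp': 1, 'sensitivity': 0.75, 'fdp': 0.25}
```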
Project description: A standard proteolytic digest of a human protein mixture, prepared at 1.5-fold to 3-fold protein concentration changes and diluted into a constant background of yeast proteins. It is similar to other datasets used as ground truth in quantitative studies, except that it is more granular and much larger in terms of replicates, to enable more rigorous and accurate testing of quantitative algorithms.
Project description: Labelling-based proteomics is a powerful method for detecting differentially expressed proteins (DEPs) between biological samples. The current data analysis platform relies on protein-level ratios, where peptide-level ratios are averaged to yield a single summary ratio for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here we propose a novel probabilistic framework, EBprot, that directly models the peptide-to-protein hierarchy and rewards proteins with reproducible quantification over multiple peptides. To evaluate its performance with known DE states, we first verified on simulated datasets that the peptide-level analysis of EBprot provides more accurate estimation of false discovery rates and a better receiver operating characteristic than other protein ratio analyses, and then confirmed its superior classification performance on a UPS1 mixture spike-in dataset. To illustrate the performance of EBprot in realistic applications, we applied it to a SILAC dataset for lung cancer subtype analysis and to an iTRAQ dataset for time-course phosphoproteome analysis of EGF-stimulated HeLa cells, each featuring a different experimental design. Through these examples, we show that the peptide-level analysis of EBprot provides a competitive advantage over alternative methods for the DE analysis of labelling-based quantitative datasets.
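The protein-level summarisation that this description contrasts EBprot with can be sketched as below. This is not EBprot; it is a minimal illustration assuming linear-scale peptide ratios and plain averaging of their log2 values, with a hypothetical function name `protein_summary`. The usage shows why this approach discards reproducibility information: a protein supported by three peptides and one supported by a single peptide receive the same summary ratio.

```python
import math
from collections import defaultdict
from statistics import mean


def protein_summary(peptide_ratios):
    """Collapse peptide-level ratios into one summary log2 ratio per protein.

    peptide_ratios: iterable of (protein_id, ratio) pairs, ratio on a linear scale (> 0).
    Returns {protein_id: (mean_log2_ratio, n_peptides)}.
    """
    log_ratios = defaultdict(list)
    for protein, ratio in peptide_ratios:
        log_ratios[protein].append(math.log2(ratio))
    # Plain averaging: the peptide count is reported but does not affect the summary.
    return {protein: (mean(values), len(values)) for protein, values in log_ratios.items()}


print(protein_summary([("A", 2.0), ("A", 2.0), ("A", 2.0), ("B", 2.0)]))
# {'A': (1.0, 3), 'B': (1.0, 1)}
```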
Project description: The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant; when only a subset of them is relevant, new FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of “neighbor” peptides, defined as irrelevant peptides with a precursor mass and fragmentation spectrum similar to those of a relevant peptide. Ignoring neighbors compromises the FDR estimate, because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. We have therefore developed a new method, “filter then subset-neighbor search” (FSNS), that accounts for neighbor peptides. We show evidence that FSNS properly controls the FDR when neighbors are present and that FSNS outperforms group-FDR, the only other method able to control the FDR relative to a subset of relevant peptides.
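The standard protocol mentioned in the first sentence (estimating the FDR of peptide-spectrum matches with a target-decoy search) can be sketched as follows. It does not implement subset-search, all-sub, group-FDR or FSNS. It assumes each PSM carries a score where higher is better plus a decoy flag, and it uses the simple decoys/targets estimator; the function name `target_decoy_threshold` and the toy scores are hypothetical.

```python
def target_decoy_threshold(psms, fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= fdr, or None.

    psms: list of (score, is_decoy) pairs; higher scores are better.
    Estimated FDR at threshold t = (#decoys scoring >= t) / (#targets scoring >= t).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    decoys = targets = 0
    best = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a threshold only after all PSMs tied at this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if decoys / max(1, targets) <= fdr:
            best = score
    return best


# Hypothetical scores: (score, is_decoy)
psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(target_decoy_threshold(psms, fdr=0.3))  # 6
```

Restricting the accepted list to "relevant" peptides after this step is exactly the kind of shortcut the description warns about, because neighbor peptides are not accounted for.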
Project description: In bottom-up proteomics, data are acquired on peptides resulting from proteolysis. In XIC-based quantification, the quality of protein abundance estimation depends on how peptide data are filtered and on which quantification method is used to aggregate peptide intensities into protein abundances. So far, these two questions have been addressed independently. Here, we studied the extent to which the relative performances of the quantification methods depend on the filters applied to peptide intensity data. To this end, we performed a spike-in experiment using the Universal Protein Standard (UPS1) to evaluate the performance of five quantification methods (TOP3, iBAQ, the average of all peptide intensities or log-intensities, and intensity modeling) on five datasets obtained after applying four peptide filters based on peptide sharing between proteins, retention time variability, peptide occurrence, and peptide intensity profiles. We showed that estimated protein abundances were not equally affected by filters, depending on the computation mode (sum or average) and the type of data (intensity or log-intensity) used in the quantification methods, and that filters could have contrasting effects depending on the quantification objective (absolute or relative). Our results also indicate that intensity modeling was the most robust method, providing the best results in the absence of any filter, but that the different quantification methods can reach similar performance when appropriate peptide filters are used. Altogether, our findings provide clues on how best to handle intensity data according to the quantification objective and the experimental design.
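Four of the five quantification methods compared above can be sketched as simple aggregations of one protein's peptide intensities (intensity modeling is not shown). These are illustrative definitions, not the study's code: TOP3 is taken here as the mean of the three most intense peptides (some tools sum them instead), and iBAQ divides the summed intensity by a number of theoretically observable peptides that the caller must supply. All function names are hypothetical.

```python
import math
from statistics import mean


def top3(intensities):
    """Mean of the three most intense peptides (all of them if fewer than three)."""
    return mean(sorted(intensities, reverse=True)[:3])


def ibaq(intensities, n_theoretical_peptides):
    """Summed peptide intensity divided by the number of theoretically observable peptides."""
    return sum(intensities) / n_theoretical_peptides


def average_intensity(intensities):
    """Arithmetic mean of peptide intensities."""
    return mean(intensities)


def average_log_intensity(intensities):
    """Mean of log2 peptide intensities (log2 of the geometric mean)."""
    return mean(math.log2(x) for x in intensities)


# Hypothetical peptide intensities for one protein; 5 theoretical peptides assumed.
peptides = [8.0, 4.0, 2.0, 1.0]
print(ibaq(peptides, 5), average_intensity(peptides), average_log_intensity(peptides))
# 3.0 3.75 1.5
```

Removing the weakest peptide (1.0) lowers the iBAQ value to 2.8 but raises the average intensity to about 4.67, which illustrates how the computation mode (sum or average) changes the effect of a peptide filter.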
Project description: As tools for quantitative label-free mass spectrometry (MS) rapidly develop, no consensus about best practices has yet emerged. In the work described here, we compared five popular statistical methods for detecting differential protein expression from quantitative MS data, using both controlled experiments with known quantitative differences for specific proteins used as standards and ‘real’ experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility optimization can consistently produce reliable differential expression rankings for label-free proteome tools and is straightforward to apply.
Project description: In this study, label-free peptide intensities from a well quantitatively characterised yeast cell lysate were acquired on two Orbitrap mass spectrometers (LTQ-Velos and Q Exactive HF). Samples containing the Universal Proteomics Standard and the Proteomics Dynamic Range Standard (http://www.sigmaaldrich.com/life-science/proteomics/mass-spectrometry/ups1-and-ups2-proteomic.html) in a yeast background were also acquired. The absolute abundances of over 340 proteins present in that yeast lysate (determined in a parallel SRM-based study) were then used to determine the flyability (or detectability) of thousands of peptides. The flyability scores, termed ‘F-factors’, reflect how well a peptide ionises in a complex chromatographically separated sample. Specifically, F-factors are calculated by normalising peptide precursor ion intensity by the absolute abundance of its parent protein. Based on the analysis of the six datasets deposited here (reflecting different gradients and instrument platforms), the study found that physicochemical properties including peptide length, charge and hydrophobicity are predictors of peptide detectability. Furthermore, hydrophobicity was found to have a non-linear relationship with detectability, and coelution with competing ions was found to significantly affect peptide detectability. The analysis of the data deposited here suggests that F-factors have great utility for understanding peptide detectability and gas-phase ion chemistry in complex chromatographically separated peptide mixtures. The concept of F-factors should also assist in better surrogate selection in targeted mass spectrometry studies and allow more accurate calibration of peptide ion signal in label-free workflows.
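The F-factor definition above (peptide precursor ion intensity normalised by the absolute abundance of the parent protein) can be sketched as follows. The dictionary-based inputs, the function name `f_factors`, and the choice to skip peptides whose parent protein lacks an absolute abundance are assumptions for illustration; abundance units are whatever the SRM measurements provide.

```python
def f_factors(peptide_intensity, peptide_to_protein, protein_abundance):
    """F-factor = peptide precursor ion intensity / absolute abundance of its parent protein.

    peptide_intensity:  {peptide: precursor ion intensity}
    peptide_to_protein: {peptide: parent protein}
    protein_abundance:  {protein: absolute abundance, e.g. from SRM}
    Peptides whose parent protein has no (or zero) abundance are skipped.
    """
    factors = {}
    for peptide, intensity in peptide_intensity.items():
        abundance = protein_abundance.get(peptide_to_protein.get(peptide))
        if abundance:
            factors[peptide] = intensity / abundance
    return factors


# Hypothetical values: pep1 and pep2 come from the same protein, so they are equimolar.
print(f_factors({"pep1": 2.0e7, "pep2": 5.0e6, "pep3": 1.0e6},
                {"pep1": "P1", "pep2": "P1", "pep3": "P2"},
                {"P1": 100.0}))  # {'pep1': 200000.0, 'pep2': 50000.0}
```

Because peptides from the same protein are present in equimolar amounts, the fourfold difference between `pep1` and `pep2` in this toy example would be read as a difference in detectability rather than in abundance.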
Project description: Motivation: In bottom-up mass spectrometry, proteins are enzymatically digested before measurement. The relationship between proteins and peptides can be represented by bipartite graphs that can be split into connected components. This representation is useful to aid protein inference and quantification, which are complicated by the occurrence of shared peptides. We conducted a comprehensive analysis of these bipartite graphs using peptides from an in silico digestion of protein databases as well as quantified peptides. Results: The graphs based on quantified peptides are smaller and have less complex structures than those at the database level. However, the proportion of protein nodes without unique peptides and the proportion of graphs that contain such proteins increase. Large differences between the two underlying organisms (mouse and yeast) were observed at both the database and the quantitative level. Insights from this analysis may be useful for the development of protein inference and quantification algorithms. Link to preprint: https://www.biorxiv.org/content/10.1101/2021.07.28.454128v1?ct=
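The bipartite-graph representation can be sketched in Python with a small union-find over protein and peptide nodes. This is an illustrative implementation, not the tool used in the study; the edge-list input format and the function names are assumptions. The second function flags proteins without unique peptides, i.e. proteins whose every peptide is shared with at least one other protein.

```python
from collections import defaultdict


def connected_components(edges):
    """Split a protein-peptide bipartite graph into connected components.

    edges: iterable of (protein_id, peptide_seq) pairs.
    Returns a list of (proteins, peptides) set pairs, one per component.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Tag nodes so a protein and a peptide with the same string stay distinct.
    for protein, peptide in edges:
        union(("prot", protein), ("pep", peptide))

    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        kind, name = node
        proteins, peptides = groups[find(node)]
        (proteins if kind == "prot" else peptides).add(name)
    return list(groups.values())


def proteins_without_unique_peptides(edges):
    """Proteins whose every peptide is shared with at least one other protein."""
    edges = list(edges)
    proteins_per_peptide = defaultdict(set)
    for protein, peptide in edges:
        proteins_per_peptide[peptide].add(protein)
    has_unique = {prot for prot, pep in edges if len(proteins_per_peptide[pep]) == 1}
    return {prot for prot, _ in edges} - has_unique


edges = [("P1", "pepA"), ("P1", "pepB"), ("P2", "pepB"), ("P3", "pepC")]
print(len(connected_components(edges)))  # 2
print(proteins_without_unique_peptides(edges))  # {'P2'}
```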
Project description: The controlled mixtures were prepared to evaluate the ability of our proposed approach to deal with complicated designs, e.g. multiple TMT mixtures with technical replicates. 500, 333, 250, and 62.5 fmol of UPS1 peptides were spiked into 50 µg of SILAC HeLa peptides in duplicate. This produced a dilution series corresponding to 1, 0.667, 0.5, and 0.125 of the highest UPS1 peptide amount (500 fmol). In addition, a reference sample was generated by pooling all four diluted UPS1 peptide samples (286.5 fmol) and combining it with 50 µg of SILAC HeLa peptides in duplicate. These 10 samples were labeled with TMT10-plex reagents and mixed together for LC-MS/MS analysis. The procedure was repeated to generate a total of five such controlled mixtures. To assess technical variability, three technical replicates were prepared for each mixture, giving a total of 15 MS runs from 5 TMT mixtures. LC-MS/MS was performed using an EASY-nLC 1200 ultrahigh-pressure liquid chromatography (UHPLC) system connected to an Orbitrap Fusion Lumos Tribrid mass spectrometer equipped with an EASY-Spray source (Thermo Fisher Scientific, San Jose, CA). The data were acquired using an MS2/MS3 method (also called multinotch MS3 or synchronous precursor selection, SPS).