Project description:This project contains a human breast milk dataset. All spectra in the dataset were predicted with Prosit and then rescored. The project includes MaxQuant search results at 100% FDR, plus two sets of Percolator results filtered at 1% FDR: one based on Andromeda scores and one based on features extracted from Prosit predictions.
Project description:This project contains a human-yeast dilution dataset. All spectra in the dataset were predicted with Prosit and then rescored. The project includes MaxQuant search results at 100% FDR, plus two sets of Percolator results filtered at 1% FDR: one based on Andromeda scores and one based on features extracted from Prosit predictions.
Project description:This project contains a pancreas MS2/MS3 dataset. All spectra in the dataset were predicted with Prosit and then rescored. The project includes MaxQuant search results at 100% FDR, plus two sets of Percolator results filtered at 1% FDR: one based on Andromeda scores and one based on features extracted from Prosit predictions.
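To illustrate what filtering at 1% FDR means in the three descriptions above, here is a minimal sketch of target-decoy q-value estimation in Python. It is not Percolator's implementation (Percolator additionally learns the score from PSM features); the function names and the toy scores in the usage note are hypothetical, and the simple decoys/targets estimator is an assumption.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to PSMs given as (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Assumes a higher score means a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst score upwards.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def filter_targets(psms: List[Tuple[float, bool]], threshold: float = 0.01):
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [(s, q) for s, d, q in q_values(psms) if not d and q <= threshold]
```

For example, `filter_targets([(10.0, False), (9.0, False), (8.0, True), (7.0, False)])` keeps only the two targets scoring above the first decoy. Running the same filter once on a search-engine score and once on a score learned from prediction-derived features gives the two result sets that the descriptions compare.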
Project description:It has been shown that integrating peptide property predictions such as fragment intensity into the scoring of peptide-spectrum matches can greatly increase the number of confidently identified peptides compared to traditional scoring methods. Here, we introduce Prosit-XL, a robust and accurate fragment intensity predictor covering cleavable (DSSO/DSBU) and non-cleavable (DSS/BS3) cross-linkers, achieving high accuracy on various holdout sets with consistent performance on external datasets without fine-tuning. Due to the complex nature of false positives in XL-MS, a novel approach to data-driven rescoring was developed that benefits from Prosit-XL’s predictions while limiting the overestimation of the false discovery rate (FDR). We first evaluated this approach using two ground truth datasets (PXD029252, PXD042173) that demonstrate accurate and precise FDR estimation. Second, we applied Prosit-XL to a proteome-scale dataset (JPST000845, PXD017711), demonstrating an up to ~3.4-fold improvement in PPI discovery compared to classic approaches. Finally, Prosit-XL was used to increase the coverage and depth of a spatially resolved interactome map of intact human cytomegalovirus virions (PXD031911), leading to the discovery of previously unobserved interactions between human and cytomegalovirus proteins.
Project description:Immunopeptidomics aims to identify Major Histocompatibility Complex-presented peptides on every cell that can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the non-tryptic nature of immunopeptides, which complicates their identification. Previously, peak intensity predictions by MS²PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. However, as MS²PIP was tailored towards tryptic peptides, we have here retrained MS²PIP to include non-tryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides, but also yield further improvements for tryptic peptides. We show that the integration of the new MS²PIP models, DeepLC, and Percolator in one software package, MS²Rescore, increases the spectrum identification rate and the number of unique identified peptides by 46% and 36%, respectively, compared to standard Percolator rescoring at 1% FDR. Moreover, MS²Rescore also outperforms the current state-of-the-art in immunopeptide-specific identification approaches. Integration of immunopeptide MS²PIP models, DeepLC, and Percolator into MS²Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.
Project description:To compare the rescoring performance on Orbitrap versus timsTOF data, we utilized a comparison dataset comprising both HLA-I and HLA-II peptides measured on an Orbitrap and on a timsTOF. For detailed information on data acquisition, please refer to the original publication by Gravel et al. (PXD038782). In brief, 10 samples were measured in technical triplicate (two technical replicates for the HNSCC sample) on the Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific, Waltham, USA) and on the timsTOF Pro (Bruker Daltonik, Germany).
Individual spectrum peak files were searched against a database containing 20,598 human UniProt entries downloaded from https://www.ebi.ac.uk/reference_proteomes/ with MaxQuant version 2.0.3.1 and rescored by integrating Prosit's fragment ion intensity predictions, using Oktoberfest (https://github.com/wilhelm-lab/oktoberfest). To perform rescoring on the Orbitrap data we employed the 2020 CID Prosit model with a CE set to 35 for HLA-I peptides, and the 2020 HCD Prosit model with CE set to 30 for the HLA-II peptides. For timsTOF data, rescoring was performed using the TOF Prosit 2023 model with the reported CEs for each PSM.
Rescoring the Orbitrap data resulted in on average 2.5-fold more unique HLA-I peptides and 1.4-fold more unique HLA-II peptides. In contrast, rescoring timsTOF data resulted in a higher increase, with on average 2.8-fold more unique HLA-I peptides and 1.7-fold more unique HLA-II peptides.
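As an illustration of how predicted fragment ion intensities can be turned into a rescoring feature, the sketch below computes a normalized spectral angle between a predicted and an observed intensity vector. This is a similarity measure commonly used with Prosit predictions. The function name and toy vectors are hypothetical, and the snippet is not taken from Oktoberfest. It assumes the two vectors are already aligned fragment ion by fragment ion.

```python
import math
from typing import Sequence


def spectral_angle(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Normalized spectral angle for non-negative intensity vectors.

    Returns 1.0 for identical direction and 0.0 for orthogonal vectors.
    Both vectors must list the same fragment ions in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")

    norm_p = math.sqrt(sum(x * x for x in predicted))
    norm_o = math.sqrt(sum(x * x for x in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # no usable signal in one of the spectra

    cosine = sum(p * o for p, o in zip(predicted, observed)) / (norm_p * norm_o)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error

    # Map the angle between the vectors from [0, pi/2] onto [1, 0].
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

A call such as `spectral_angle([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])` yields a value close to 1 for well-matching spectra. One such value per PSM can then be passed to the rescoring step alongside the search-engine score.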
Project description:We built and characterised a mass spectrometer capable of performing CID (both beam type and resonant type), UVPD, EID and ECD in an automated fashion during an LC-MS-type experiment. We exploited this ability to generate large datasets through multienzyme deep proteomics experiments for characterisation of these activation techniques. As a further step, motivated by the complexity generated by these dissociation techniques, we developed a single Prosit deep learning model for fragment ion intensity prediction covering all of these techniques. The generated model has been made publicly available and has been utilised in FragPipe within its MSBooster module. Rescoring allowed both data-dependent acquisition (DDA) and data-independent acquisition (DIA) to achieve on average more than a 10% increase in protein identifications across all dissociation techniques and enzymatic digestions. We demonstrate that these alternative fragmentation approaches can now be used within standard data analysis pipelines and can produce data competitive with CID in terms of efficiency and, in the cases of EID and UVPD, with far richer and more comprehensive spectra.
Project description:The development of the TMTpro-16plex series expanded the breadth of commercial isobaric tagging reagents by nearly 50% over classic TMT-11plex. In addition to the described 16plex reagents, the proline-based TMTpro molecule can accommodate two additional combinations of heavy carbon and nitrogen isotopes. Here, we introduce the final two labeling reagents, TMTpro-134C and TMTpro-135N, which permit the simultaneous global protein profiling of 18 samples with essentially no missing values. For example, six conditions with three biological replicates can now be perfectly accommodated. We showcase the 18plex reagent set by profiling the proteome and phosphoproteome of a pair of isogenic mammary epithelial cell lines under three conditions in triplicate. We compare the depth and quantitative performance of this data set with a TMTpro-16plex experiment in which two samples were omitted. Our analysis revealed similar numbers of quantified peptides and proteins, with high quantitative correlation. We interrogated further the TMTpro-18plex data set by highlighting changes in protein abundance profiles under different conditions in the isogenic cell lines. We conclude that TMTpro-18plex further expands the sample multiplexing landscape, allowing for complex and innovative experimental designs.
Project description:Citrullination is a key yet underexplored post-translational modification involved in various biological processes. Its identification via mass spectrometry faces challenges such as limited enrichment tools and false positives due to mass overlap with deamidation (+0.9840 Da). To address this, we developed a data analysis pipeline integrating the deep learning model Prosit-Cit, trained on ~53,000 spectra from ~2,500 synthetic citrullinated peptides, which improves sensitivity and precision in identifying citrullination sites. This approach has identified up to 14 times more citrullinated sites in human tissue proteomes and revealed new insights, including the first large-scale citrullination mapping in Arabidopsis. This upload includes: 1) Raw files and search results from the evaluation dataset, used to assess the precision of citrullination identifications. 2) Raw files, search results, and rescoring outcomes from validation experiments conducted on Arabidopsis flowers. 4) Re-analyzed search and rescoring results from human (PXD010154) and Arabidopsis (PXD013868) tissue proteomes.
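The mass overlap mentioned above arises because citrullination of arginine and deamidation of asparagine have the same net elemental change: loss of one N and one H, gain of one O. The short sketch below checks this arithmetic from standard monoisotopic masses. The helper name is hypothetical, and free amino acid formulas are used for simplicity, which does not affect the mass difference.

```python
# Monoisotopic masses of the relevant elements, in Da.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: dict) -> float:
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Free amino acid formulas (not residues; the delta is unaffected).
ARGININE = {"C": 6, "H": 14, "N": 4, "O": 2}
CITRULLINE = {"C": 6, "H": 13, "N": 3, "O": 3}
ASPARAGINE = {"C": 4, "H": 8, "N": 2, "O": 3}
ASPARTATE = {"C": 4, "H": 7, "N": 1, "O": 4}

delta_citrullination = monoisotopic_mass(CITRULLINE) - monoisotopic_mass(ARGININE)
delta_deamidation = monoisotopic_mass(ASPARTATE) - monoisotopic_mass(ASPARAGINE)

print(f"citrullination (R -> Cit): {delta_citrullination:+.4f} Da")
print(f"deamidation    (N -> D):   {delta_deamidation:+.4f} Da")
```

Because the two shifts are identical, precursor mass alone cannot distinguish a citrullinated arginine from a deamidated asparagine or glutamine elsewhere in the same peptide. Fragment-level evidence, such as predicted fragment intensities, is therefore needed to tell them apart.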
Project description:The development of the TMTpro-16plex series expanded the breadth of commercial isobaric tagging reagents by nearly 50% over classic TMT-11plex. In addition to the described 16plex reagents, the proline-based TMTpro molecule can accommodate two additional combinations of heavy carbon and nitrogen isotopes. Here, we introduce the final two labeling reagents, TMTpro-134C and TMTpro-135N, which permit the simultaneous global protein profiling of 18 samples with no missing values. For example, six conditions with three biological replicates can now be perfectly accommodated. We showcase the 18plex reagent set by profiling the proteome and phosphoproteome of a pair of isogenic breast cancer cell lines under three conditions in triplicate. We compare the depth and quantitative performance of this dataset with a TMTpro-16plex experiment in which two samples were omitted. Our analysis revealed similar numbers of quantified peptides and proteins, with high quantitative correlation. We interrogated further the TMTpro-18plex dataset by highlighting changes in protein abundance profiles under different conditions in the isogenic cell lines. We conclude that TMTpro-18plex further expands the sample multiplexing landscape, allowing for complex and innovative experimental designs.