Proteomics

Dataset Information


PeptideForest: Semi-supervised machine learning integrating multiple search engines for peptide identification


ABSTRACT: We introduce PeptideForest, a semi-supervised machine learning approach that integrates the peptide-to-spectrum assignments of multiple search engines to train a random forest classifier, thereby combining their results. On samples containing mixed HEK and Escherichia coli proteomes, PeptideForest increases the number of peptide-spectrum matches (PSMs) with a q-value below 1% by 25.2 ± 1.6% compared to MS-GF+ results. However, an increase in quantity does not necessarily reflect an increase in quality. We therefore devised a novel approach to assess the quality of the assigned spectra through TMT quantification of samples with known ground truths. With it, we show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality; PeptideForest thus offers a way to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
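
The 1% q-value criterion in the abstract can be illustrated with a minimal sketch of the common target-decoy approach. This is not PeptideForest's or Ursgal's actual implementation: the function names, the `(score, is_decoy)` input layout, and the simple decoys/targets FDR estimate are all assumptions made for illustration.

```python
def compute_q_values(psms):
    """Hypothetical helper: psms is a list of (score, is_decoy) tuples,
    where a higher score means a better match.
    Returns one q-value per PSM, in the same order as the input."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimate the FDR at each rank as (#decoys / #targets) seen so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this rank or any worse rank,
    # which makes it monotone in the score.
    q_values = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


def count_accepted_targets(psms, threshold=0.01):
    """Count target PSMs whose q-value is at or below the threshold
    (the at-or-below convention is an assumption)."""
    q_values = compute_q_values(psms)
    return sum(
        1
        for (_, is_decoy), q in zip(psms, q_values)
        if not is_decoy and q <= threshold
    )
```

Under these assumptions, comparing `count_accepted_targets` for a single engine against the same count for a combined scoring gives the kind of relative PSM gain the abstract reports.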

INSTRUMENT(S):

ORGANISM(S): Homo sapiens (human), Escherichia coli

TISSUE(S): Cell Culture

SUBMITTER: Stefan Schulze  

LAB HEAD: Christian Fufezan

PROVIDER: PXD056915 | Pride | 2025-06-02

REPOSITORIES: Pride



Publications

PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification.

Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C

Journal of Proteome Research, 2025-01-22, issue 2


The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of mul ...

Similar Datasets

2018-07-11 | PXD008782 | Pride
2021-08-18 | PXD024584 | Pride
| PRJNA1053763 | ENA
| PRJNA1054009 | ENA
2017-04-06 | PXD001240 | Pride
2019-09-25 | PXD005280 | Pride
2011-09-08 | E-GEOD-24191 | biostudies-arrayexpress
2025-05-06 | PXD048412 | Pride
2018-07-11 | PXD008783 | Pride
2016-05-03 | E-MTAB-4012 | biostudies-arrayexpress