Dataset Information

Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation

ABSTRACT: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation

ORGANISM(S): Homo sapiens (human)

SUBMITTER: Seungjin

PROVIDER: PXD060128 | JPOST Repository | Sat Feb 07 00:00:00 GMT 2026

REPOSITORIES: jPOST

ACCESS DATA

Dataset's files

	YB20171221_SK_TMT11yeast_1ugSingleShot_86min_APDoff_01.mgf	Mgf
	YB20171221_SK_TMT11yeast_1ugSingleShot_86min_APDoff_01.mzid	Mzid
	YB20171221_SK_TMT11yeast_1ugSingleShot_86min_APDoff_01.raw	Raw
	eeee.mzTab	Mztab
	note.txt	Txt

Similar Datasets

Project description: Spectral library search (SLS) is a major approach for peptide identification from tandem mass spectrometry data and complements conventional database search. Moreover, with the emergence of spectrum prediction models, proteomics database search is progressively becoming more like spectral library search of predicted peptide spectra. The performance of peptide identification algorithms thus frequently depends on how well the underlying Spectrum-Spectrum Matching (SSM) scoring functions distinguish true from false positive matches. However, detailed comparative studies of SSM scoring functions remain limited by the absence of comprehensive benchmark datasets. We propose new methods to build benchmarks that assess the effectiveness and robustness of SSM scoring functions. The resulting benchmark dataset is composed of (i) a set of 476,063 precursors used to construct 8 query spectrum sets with different levels of noise added to "ideal" and real experimental spectra, and (ii) three spectral libraries with different spectra for the same 3,065,819 precursors: experimental spectra, annotated/de-noised spectra and predicted spectra. The benchmark set was then used to evaluate 9 common spectrum preprocessing scenarios, followed by the evaluation of 3 standard SSM scoring functions (Cosine, Projected-Cosine, which is commonly used for the analysis of chimeric/mixture spectra, and Jensen-Shannon divergence) and 2 additional scoring functions used in state-of-the-art SLS tools: SpectraST and EntropyScore. The results revealed that scoring spectrum-spectrum matches is still an important open problem: the best recall for typical SLS searches was only ~70% at the typical 1% error rate. Overall, SpectraST performed best for spectra with little-to-no noise, but JS-divergence performed better in some cases, as it was found to be the most resistant to noise.
Conversely, the performance of Cosine and Entropy score was found to be generally lower than previously reported, with Projected-Cosine performing especially poorly in most cases. However, the performance of the SSM scoring functions was also found to depend strongly on the minimum number of matching peaks required for each SSM: benchmark results show that both the performance and the relative ranking of the scoring functions can change substantially depending on how this parameter is set. The resulting benchmark dataset can be used to test and support the development of SSM scoring functions and the proposed benchmark construction approach, providing a foundation that can be extended for additional types of spectrum-spectrum matching.
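
For readers unfamiliar with the scoring functions named above, the following is a minimal sketch of two of them, Cosine and Jensen-Shannon divergence, together with a minimum-matching-peaks filter. It is not the benchmark's implementation. The fixed-width m/z binning, the bin width, the function names and the `min_matched_peaks` parameter are all assumptions made for illustration; the description does not say how peaks were matched or which thresholds were used.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins (assumed matching scheme)."""
    binned = {}
    for mz, intensity in peaks:
        key = math.floor(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(a, b):
    """Cosine similarity between two binned spectra (1.0 = identical shape)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def js_divergence(a, b):
    """Jensen-Shannon divergence with log base 2 (0.0 = identical, 1.0 = disjoint)."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0.0 or total_b == 0.0:
        raise ValueError("spectra must have non-zero total intensity")
    divergence = 0.0
    for key in set(a) | set(b):
        p = a.get(key, 0.0) / total_a
        q = b.get(key, 0.0) / total_b
        m = (p + q) / 2.0
        if p > 0.0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0.0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def score_ssm(query_peaks, library_peaks, min_matched_peaks=3, bin_width=1.0):
    """Score one spectrum-spectrum match, or return None if too few bins match.

    min_matched_peaks is a hypothetical default, not a value from the study.
    """
    query = bin_spectrum(query_peaks, bin_width)
    library = bin_spectrum(library_peaks, bin_width)
    matched = len(set(query) & set(library))
    if matched < min_matched_peaks:
        return None
    return {
        "matched_peaks": matched,
        "cosine": cosine_score(query, library),
        "js_divergence": js_divergence(query, library),
    }


if __name__ == "__main__":
    # Toy (m/z, intensity) peak lists, invented for illustration only.
    library = [(100.3, 10.0), (200.4, 50.0), (300.5, 40.0)]
    noisy_query = [(100.3, 12.0), (200.4, 45.0), (300.5, 43.0), (410.2, 5.0)]
    print(score_ssm(noisy_query, library))
```

Note that Cosine is a similarity (higher is better) while JS divergence is a distance (lower is better), so one of the two has to be inverted before they can be ranked on a common scale. Raising `min_matched_peaks` discards matches supported by only a few shared peaks, which is the parameter the description reports as strongly affecting the relative ranking of the scoring functions.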

OmicsDI is part of the ELIXIR infrastructure