Dataset Information

ICEBERG: Neural Spectral Prediction for Structure Elucidation with Tandem Mass Spectrometry

ABSTRACT: Contains supplementary data for the paper Neural Spectral Prediction for Structure Elucidation with Tandem Mass Spectrometry.

INSTRUMENT(S): Q Exactive Plus, Q Exactive HF

ORGANISM(S): Homo Sapiens (ncbitaxon:9606)

SUBMITTER: Connor W. Coley

PROVIDER: MSV000097986 | MassIVE | Sat May 24 11:06:00 BST 2025

REPOSITORIES: MassIVE

ACCESS DATA

Dataset's files

Source:

			Action	DRS
		Other

Items per page:

1 - 1 of 1

Similar Datasets

Project description:Abstract: Crosslinking mass spectrometry (Crosslinking MS) has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information contained in spectra limits especially the identification of heteromeric protein-protein interactions (PPIs) from the many theoretically possible PPIs. We successfully leveraged here chromatographic retention time (RT) to complement the mass spectrometry-centric identification process. For this, we first made crosslinked peptides amenable to RT prediction, through a Siamese neural network, and then added RT information to the identification process. Our multi-task machine learning model xiRT achieved highly accurate predictions in a multi-dimensional separation experiment of crosslinked E. coli lysate conducted for this study. We combined strong cation exchange (SCX), hydrophilic strong anion exchange (hSAX) and reversed-phase (RP) chromatography and reached R^2 0.94 in RP and a margin of error of 1 fraction for hSAX in 94%, and SCX in 85% of the cases. Importantly, supplementing the search engine score with retention time features led to a 1.4-fold increase in PPIs, at 1% PPI false discovery rate (FDR). We also demonstrated the value of this approach for the more routine analysis of multiprotein complexes. In the Fanconi anaemia monoubiquitin ligase complex, an increase of 1.7-fold in heteromeric residue-pairs was achieved at 1% residue-pair FDR, solely using reversed-phase RT. Retention times therefore proved to be a powerful complement to mass spectrometric information to improve the identification of crosslinked peptides. We envision xiRT to supplement search engines in their scoring routines to increase the sensitivity of Crosslinking MS analyses especially for protein-protein interactions. Conclusion: Using a Siamese network architecture, we succeeded in bringing RT prediction into the Crosslinking MS field, independent of separation setup and search software. Our open source application xiRT introduces the concept of multi-task learning to achieve multi-dimensional chromatographic retention time prediction, and may use any peptide sequence-dependent measure including for example collision cross section or isoelectric point. The black-box character of the neural network was reduced by means of interpretable machine learning that revealed individual amino acid contributions towards the separation behavior. The RT predictions – even when using only the RP dimension – complement mass spectrometric information to enhance the identification of heteromeric crosslinks in multiprotein complex and proteome-wide studies. Overfitting does not account for this gain as known false target matches from an entrapment database did not increase. Leveraging additional information sources may help to address the mass-spectrometric identification challenge of heteromeric crosslinks.

Dataset Information

ICEBERG: Neural Spectral Prediction for Structure Elucidation with Tandem Mass Spectrometry

Dataset's files

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets