Project description: The incorporation of machine learning methods into proteomics workflows improves the identification of disease-relevant biomarkers and biological pathways. However, machine learning models, such as deep neural networks, typically suffer from a lack of interpretability. Here, we present a deep learning approach that combines biological pathway analysis and biomarker identification to increase the interpretability of proteomics experiments. Our approach integrates a priori knowledge of the relationships between proteins and biological pathways and processes into sparse neural networks to create biologically informed neural networks. We employ these networks to differentiate between clinical subphenotypes of septic acute kidney injury and COVID-19, as well as acute respiratory distress syndrome of different aetiologies. To gain biological insight into these complex syndromes, we use feature attribution methods to introspect the networks and identify the proteins and pathways that are important for distinguishing between subtypes. The algorithms are implemented in a freely available open-source Python package (https://github.com/InfectionMedicineProteomics/BINN).
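The idea of building prior protein-to-pathway knowledge into a sparse network can be illustrated with a masked dense layer. The following is only a minimal sketch in plain Python, not the API of the BINN package: the protein and pathway names, the membership table, the weights, and the helper functions `build_mask` and `masked_layer` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy annotations (not real pathway data): which proteins belong to which pathway.
PROTEINS = ["P1", "P2", "P3", "P4"]
PATHWAYS = ["pathway_A", "pathway_B"]
MEMBERSHIP = {
    "pathway_A": {"P1", "P2"},
    "pathway_B": {"P2", "P3", "P4"},
}


def build_mask(inputs, outputs, membership):
    """Return a 0/1 matrix where mask[j][i] is 1 only if input i is annotated to output j."""
    return [[1.0 if p in membership[out] else 0.0 for p in inputs] for out in outputs]


def masked_layer(x, weights, bias, mask):
    """Dense layer whose weights are multiplied element-wise by a fixed mask, followed by tanh."""
    activations = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        activations.append(math.tanh(z))
    return activations


mask = build_mask(PROTEINS, PATHWAYS, MEMBERSHIP)
weights = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.4, -0.6, 0.9]]  # arbitrary illustrative values
bias = [0.0, 0.0]
abundances = [1.0, 2.0, 0.5, 1.5]  # made-up protein intensities

pathway_activations = masked_layer(abundances, weights, bias, mask)
print(dict(zip(PATHWAYS, pathway_activations)))
```

Because the mask zeroes every connection that has no annotation behind it, each hidden unit depends only on its member proteins and can be read as a pathway. That correspondence is what makes a feature attribution score on a node interpretable in biological terms.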
Project description: Deep learning has achieved notable success in mass spectrometry-based proteomics and is now emerging in glycoproteomics. While various deep learning models can predict fragment mass spectra of peptides with good accuracy, they cannot cope with the non-linear glycan structure in an intact glycopeptide. Herein, we propose a deep learning-based approach for the prediction of fragment spectra of intact glycopeptides. Our model adopts tree-structured long short-term memory networks to process the glycan moiety and a graph neural network architecture to incorporate potential fragmentation pathways of a specific glycan structure. This design benefits model explainability and the ability to differentiate glycan structural isomers. We further demonstrate that predicted spectral libraries can be used for analyzing DIA data of glycopeptides as a supplement that improves library completeness. We expect that this work will provide a valuable deep learning resource for glycoproteomics.
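To make the "non-linear glycan structure" concrete, the sketch below represents a glycan as a tree and lists its residues in bottom-up order, which is the order in which a tree-structured recurrent network would typically combine child states before their parent. It is a simplified illustration under stated assumptions, not the authors' model: the `GlycanNode` class, the residue labels, and both helper functions are hypothetical, and the example tree is the common N-glycan core (two HexNAc and three Hex residues).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlycanNode:
    """One monosaccharide residue; children are the residues attached further from the peptide."""
    name: str
    children: List["GlycanNode"] = field(default_factory=list)


def bottom_up_order(node: GlycanNode) -> List[str]:
    """Post-order traversal: every child is listed before its parent."""
    order: List[str] = []
    for child in node.children:
        order.extend(bottom_up_order(child))
    order.append(node.name)
    return order


def single_cleavage_branches(root: GlycanNode) -> Dict[str, List[str]]:
    """For each glycosidic bond, list the residues in the branch that would be lost if it broke."""
    branches: Dict[str, List[str]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            branches[child.name] = bottom_up_order(child)
            stack.append(child)
    return branches


# N-glycan core with illustrative labels: HexNAc1-HexNAc2-Hex1, with Hex2 and Hex3 on Hex1.
core = GlycanNode("HexNAc1", [
    GlycanNode("HexNAc2", [
        GlycanNode("Hex1", [GlycanNode("Hex2"), GlycanNode("Hex3")]),
    ]),
])

print(bottom_up_order(core))
print(single_cleavage_branches(core))
```

A linear peptide sequence has one residue order, whereas the branch point at `Hex1` gives this tree two sibling subtrees. Enumerating single-bond cleavages is only a crude stand-in for the fragmentation pathways that the description says the graph neural network incorporates.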
Project description: The molecular networks underlying Alzheimer’s disease (AD) are not well defined. We present temporal profiling of >14,000 proteins and >34,000 phosphosites at the asymptomatic and symptomatic stages of AD, as well as deep proteomics analysis of transgenic mouse models.
Project description: inSPIRE is an open-source tool for spectral rescoring of mass spectrometry search results. For this project, inSPIRE was applied to MaxQuant, PEAKS DB, and Mascot search results from a tryptic digestion of the K562 proteome. Here we provide the RAW files and the corresponding MaxQuant, PEAKS DB, and Mascot search results. We also reprocessed RAW data from the PXD031709 and PXD031812 repositories, for which we provide the search result files. Additionally, we provide PEAKS search results from RAW files from the PXD015489 repository, which was used as training data for a predictor used within inSPIRE. Contact: Michele Mishto, Head of the research group Molecular Immunology at King’s College London and the Francis Crick Institute, London (UK). Email: michele.mishto@kcl.ac.uk