GNPS - Living Data - Continuous Identification - Networks and Identification Dump
ABSTRACT: This is a dump of all the living data analyses, including molecular networks, clustering, and library identifications that are up to date.
They are organized into 5 folders:
CLUSTERSUMMARY - Represents the clusters for each dataset and the associated metadata (e.g. mz, rt, and number of MS2 spectra)
CLUSTERINFO - Represents the clustering enabling tracing back from the clusters to the original files and scans they came from
MGF - Clustered MS/MS spectra represented as an MGF file - scan numbers correspond to cluster index in CLUSTERSUMMARY
PAIRS - Molecular Networking alignment pairs representing spectral similarity
IDENTIFICATIONS - Spectral library identifications reported by living data/continuous id to the latest GNPS spectral libraries
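As a rough illustration of how the MGF folder relates to CLUSTERSUMMARY, the sketch below reads clustered spectra from an MGF file and indexes them by scan number, which the description above says corresponds to the cluster index. The function names (parse_mgf, index_by_scan) and the inline example spectra are hypothetical; only the standard MGF markers (BEGIN IONS, END IONS, KEY=value headers such as SCANS and PEPMASS) are assumed, not any GNPS-specific fields.

```python
from io import StringIO

def parse_mgf(handle):
    """Yield one dict per spectrum: header parameters plus (mz, intensity) peaks."""
    spectrum = None
    for raw in handle:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None and line:
            if "=" in line and not line[0].isdigit():
                # Header line such as SCANS=12 or PEPMASS=301.14
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                spectrum["peaks"].append((float(mz), float(intensity)))

def index_by_scan(handle):
    """Map each SCANS value (assumed here to equal the cluster index) to its spectrum."""
    return {
        int(s["params"]["SCANS"]): s
        for s in parse_mgf(handle)
        if "SCANS" in s["params"]
    }

# Made-up two-spectrum example standing in for a real clustered MGF file.
example = StringIO("""BEGIN IONS
PEPMASS=301.14
SCANS=1
101.07 20.0
283.13 100.0
END IONS
BEGIN IONS
PEPMASS=455.29
SCANS=2
119.05 55.5
END IONS
""")
clusters = index_by_scan(example)
```

With such an index, a cluster index taken from a CLUSTERSUMMARY row could be used as the dictionary key to fetch the matching consensus spectrum.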
Project description:Chagas disease is one of the most important neglected diseases with an estimated number of 12 million infected individuals, the majority living in Central and South America. The Trypanosoma cruzi (T. cruzi) protozoan parasite is the etiological agent of Chagas disease. T. cruzi is highly genetically diverse and a new nomenclature assigned each strain to seven genetic groups (TcI-TcVI and Tcbat), named Discrete Typing Units (DTUs), based on their biochemical, immunological and phenotypical characteristics. T. cruzi DTUs have been correlated to diverse clinical outcomes, highlighting the importance of molecular epidemiological screens. Despite the development of T. cruzi typing methods based on genetic signatures, each method presents its own advantages and challenges. The work presented here shows the application of mass spectrometry for Trypanosoma cruzi Strain Typing Assay using MS2 peptide spectral libraries (Tc-STAMS2). The novelty of the method is based on the use of peptide fragmentation spectra as strain-specific fingerprints to classify and identify DTUs. Initially, a spectral library is generated from characterized T. cruzi strains. The library is subsequently searched with MS/MS spectra from unknown strains, which are confidently assigned to a specific strain in an automated, computationally driven approach. The Tc-STAMS2 method was challenged to test several variables such as sample type and preparation, instrument setup and identification platform. Tc-STAMS2 provided high confidence and robustness in T. cruzi strain typing. The Tc-STAMS2 method represents a proof-of-concept of a complementary strategy to the current DNA-based T. cruzi genotyping methods. Moreover, the method allows the identification of strain-specific features that could be related to the biology of T. cruzi strains and their clinical outcomes.
Project description:Intact glycopeptide MS analysis to reveal site-specific protein glycosylation is an important frontier of proteomics. However, computational tools for analyzing MS/MS spectra of intact glycopeptides are still limited and not well-integrated into existing workflows. In this work, a novel computational tool which combines the spectral library building/searching tool, SpectraST (Lam et al. Nat. Methods 2008, 5, 873-875), and the glycopeptide fragmentation prediction tool, MassAnalyzer (Zhang et al. Anal. Chem. 2010, 82, 10194-10202) for intact glycopeptide analysis has been developed. Specifically, this tool enables the determination of the glycan structure directly from low-energy collision-induced dissociation (CID) spectra of intact glycopeptides. Given a list of possible glycopeptide sequences as input, a sample-specific spectral library of MassAnalyzer-predicted spectra is built using SpectraST. Glycan identification from CID spectra is achieved by spectral library searching against this library, in which both m/z and intensity information of the possible fragmentation ions are taken into consideration for improved accuracy. We validated our method using a standard glycoprotein, human transferrin, and evaluated its potential to be used in site-specific glycosylation profiling of glycoprotein datasets from LC/MS. For maximum usability, SpectraST is developed as part of the Trans-Proteomic Pipeline (TPP), a freely available and open-source software suite for MS data analysis.
Project description:Sequential window acquisition of all theoretical mass spectra (SWATH-MS) requires a spectral library to extract quantitative measurements from the mass spectrometry data acquired in data-independent acquisition mode (DIA). Large combined spectral libraries containing SWATH assays have been generated for humans and several other organisms, but so far no publicly available library exists for measuring the proteome of zebrafish, a rapidly emerging model system in biomedical research. Here, we present a large zebrafish SWATH spectral library to measure the abundance of 104’185 proteotypic peptides from 10’405 proteins. The library includes proteins expressed in 9 different zebrafish tissues (brain, eye, heart, intestine, liver, muscle, ovaries, spleen, and testes) and provides an important new resource to quantify 40% of the protein-coding zebrafish genes. We employ this resource to quantify the proteome across brain, muscle, and liver and characterize divergent expression levels of paralogous proteins in different tissues.
Project description:The heterogeneity caused by post-translational modifications and the wide dynamic range of relative protein abundances in the human serum proteome challenge the capabilities of existing mass spectrometry-based methodologies in accurately identifying N-linked glycopeptides from human serum. To overcome these challenges, we utilized pParse 2.0 to find the monoisotopic peak of each precursor ion as well as co-eluted ones, then applied spectral library search to intact glycopeptide identification through pMatchGlyco. Spectra of deglycosylated peptides, including semi-tryptic peptides and common modifications, were used for spectral library construction. Through scoring of ion matches on MS/MS spectra, target-decoy false positive rate control, and manual checking of co-eluted precursor ions at the MS1 level, the accuracy of identification was improved. In total, this method identified 1,194 N-linked glycosites and 448 unique glycoproteins in human serum.
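The target-decoy control mentioned above can be illustrated with a minimal sketch. It assumes each match carries a score and a flag saying whether it hit a decoy entry, and it estimates the false discovery rate as decoys divided by targets above a cutoff. The function name, the scores, and the thresholds are hypothetical and are not taken from pMatchGlyco.

```python
def score_threshold_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    matches: list of (score, is_decoy) tuples. The FDR at a cutoff is
    estimated as (#decoys) / (#targets) among matches scoring at or above it.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score  # accepting everything down to this score is still within the limit
    return best

# Made-up scores: four target matches and one decoy match.
matches = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, False)]
loose_cutoff = score_threshold_at_fdr(matches, max_fdr=0.25)
strict_cutoff = score_threshold_at_fdr(matches, max_fdr=0.10)
```

In this toy input the looser limit admits all five matches (one decoy against four targets), while the stricter limit stops above the first decoy.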
Project description:Data independent acquisition-mass spectrometry (DIA-MS) coupled with liquid chromatography is a promising approach for rapid, automatic sampling of MS/MS data in untargeted metabolomics. However, wide isolation windows in DIA-MS generate MS/MS spectra containing a mixed population of fragment ions together with their precursor ions. This precursor-fragment ion map in a comprehensive MS/MS spectral library is crucial for relative quantification of fragment ions uniquely representative of each precursor ion. However, existing reference libraries are not sufficient for this purpose since the fragmentation patterns of small molecules can vary in different instrument setups. Here we developed a bioinformatics workflow called MetaboDIA to build customized MS/MS spectral libraries using a user's own data dependent acquisition (DDA) data and to perform MS/MS-based quantification with DIA data, thus complementing conventional MS1-based quantification. MetaboDIA also allows users to build a spectral library directly from DIA data in studies of a large sample size. Using a marine algae data set, we show that quantification of fragment ions extracted with a customized MS/MS library can provide as reliable quantitative data as the direct quantification of precursor ions based on MS1 data. To test its applicability in complex samples, we applied MetaboDIA to a clinical serum metabolomics data set, where we built a DDA-based spectral library containing consensus spectra for 1829 compounds. We performed fragment ion quantification on DIA data using this library, yielding sensitive differential expression analysis.
Serum metabolome of 40 age-related macular degeneration patients and 20 control samples was analyzed using untargeted mass spectrometry. We used data dependent acquisition data to build an MS/MS spectral assay library for more than 1,000 compounds and performed targeted extraction of MS2 ion chromatograms from data independent acquisition analysis.
Project description:Gene activation is thought to involve a multistep process whereby transcription factors bind to distal enhancer sites and recruit the Mediator complex which contacts the pre-initiating RNA Polymerase II (Pol II) complex assembled at the start site of the gene. The interaction of Mediator and Pol II has yet to be observed in the nucleus of living cells and the dynamics of this interaction are not yet elucidated. Here we use quantitative live cell super-resolution and light sheet imaging to study the organization and dynamics of endogenous Mediator and Pol II directly in living mouse embryonic stem cells. In addition to forming transient clusters with average lifetimes of 11.1 (± 0.9) s, and 12.1 (± 1.4) s respectively, Mediator and Pol II also form large and stable clusters in stem cells (~15 stable clusters per cell). The large and stable Mediator and Pol II clusters gradually disappear within hours after induction of stem cell differentiation. Mediator and Pol II colocalize in the large clusters. Inhibition of Brd4 bromodomains necessary for enhancer association eliminates both Mediator and Pol II stable clusters, and inhibition of transcription elongation selectively eliminates stable Pol II but not stable Mediator clusters. Tracking of Mediator and Pol II stable clusters suggests they are chromatin associated and they coalesce upon contact, a property associated with phase separated droplets. We conclude that Mediator and Pol II associate in diffraction-sized condensates with a defined lifetime dependent on active transcription in living stem cells. Overall design: H3K27ac, RPB1, and Dendra2 ChIP-seq in WT and Dendra2-RPB1, Halo-MED19 tagged (DRHM) R1 mouse ES cells in the ES state and the EpiLC state.
Project description:We have developed a new workflow to unambiguously localize phosphorylation sites on proteins. We demonstrate that spectral matching of phosphopeptide datasets against a library of well-simulated spectra provides higher sensitivity for confident site localization than other tested programs. To computationally simulate tandem mass spectra representing all possible singly phosphorylated forms of a peptide, characteristic fragment ions are predicted from ions of their dephosphorylated form generated by beam-type collision-induced dissociation.
Project description:The proteome of the anaerobic bacterium Dehalococcoides mccartyi strain CBDB1 from the phylum Chloroflexi was investigated. D. mccartyi strain CBDB1 is a model organism for organohalide respiration where halogenated organic compounds serve as terminal electron acceptors. A wide range of halogenated organic compounds have been shown to be dehalogenated by the strain CBDB1. Therefore, D. mccartyi strain CBDB1 is a promising candidate for bioremediation application. Proteomic analysis of cultures grown with hexachlorobenzene as the only electron acceptor resulted in identification of 8,491 distinct peptides, which represent 1,023 proteins. A coverage of 70% of the 1,458 annotated proteins for strain CBDB1 was achieved. In addition, a spectral library was created from the annotated spectra. By using proteogenomics, 18 previously unannotated peptides were identified, which contribute to four previously unannotated proteins and to corrections in the length of eight protein coding sequences.
Project description:Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. This data set contains the evaluation results of the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared to SpectraST.
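The idea of a shifted dot product can be sketched as follows. This is a simplified illustration, not ANN-SoLo's implementation: it assumes singly charged fragments, a fixed absolute m/z tolerance, and greedy one-to-one peak assignment. A query peak may match a library peak either directly or after subtracting the precursor mass difference, so that fragments carrying the modification still line up with their unmodified counterparts. The function name, parameters, and peak values are hypothetical.

```python
import math

def shifted_dot(query, library, precursor_delta, tol=0.02):
    """Greedy shifted dot product between two peak lists.

    query, library: lists of (mz, intensity) tuples.
    precursor_delta: query precursor mass minus library precursor mass.
    A pair of peaks is a candidate match if their m/z values agree within
    tol either directly or after shifting the query peak by precursor_delta.
    """
    def normalize(peaks):
        # Scale intensities to unit Euclidean norm so identical spectra score 1.
        norm = math.sqrt(sum(i * i for _, i in peaks)) or 1.0
        return [(mz, i / norm) for mz, i in peaks]

    q, lib = normalize(query), normalize(library)
    candidates = []
    for qi, (qmz, qint) in enumerate(q):
        for li, (lmz, lint) in enumerate(lib):
            direct = abs(qmz - lmz) <= tol
            shifted = abs(qmz - precursor_delta - lmz) <= tol
            if direct or shifted:
                candidates.append((qint * lint, qi, li))

    # Assign peaks greedily, largest intensity product first, each peak used once.
    candidates.sort(reverse=True)
    used_q, used_lib, score = set(), set(), 0.0
    for product, qi, li in candidates:
        if qi not in used_q and li not in used_lib:
            used_q.add(qi)
            used_lib.add(li)
            score += product
    return score

# Made-up spectra: the second query fragment carries a +79.966 Da mass shift.
library_peaks = [(100.0, 1.0), (200.0, 1.0)]
query_peaks = [(100.0, 1.0), (279.966, 1.0)]
with_shift = shifted_dot(query_peaks, library_peaks, precursor_delta=79.966)
without_shift = shifted_dot(query_peaks, library_peaks, precursor_delta=0.0)
```

In this toy case the shifted score counts both fragments, whereas setting the precursor difference to zero reduces it to an ordinary dot product that only credits the unmodified fragment.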