Project description:We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.
Project description:For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, numerous strategies are employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection (STEPS) uses user-defined parameter ranges to test a wide array of parameter combinations and arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
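The grid-search idea described above can be illustrated with a minimal sketch. It is not the STEPS implementation: the record fields (`score`, `ppm`, `decoy`), the two filter parameters, and the decoy/target estimate of the false discovery rate are all assumptions chosen for illustration.

```python
from itertools import product


def estimate_fdr(psms):
    """Return (decoy/target ratio, number of target PSMs) for a filtered list."""
    targets = sum(1 for p in psms if not p["decoy"])
    decoys = sum(1 for p in psms if p["decoy"])
    fdr = decoys / targets if targets else 0.0
    return fdr, targets


def grid_search(psms, score_cutoffs, ppm_cutoffs, max_fdr=0.01):
    """Try every (score, ppm) cutoff pair and keep the one that retains the
    most target PSMs while the estimated FDR stays at or below max_fdr."""
    best = None
    for min_score, max_ppm in product(score_cutoffs, ppm_cutoffs):
        kept = [
            p for p in psms
            if p["score"] >= min_score and abs(p["ppm"]) <= max_ppm
        ]
        fdr, targets = estimate_fdr(kept)
        if targets == 0 or fdr > max_fdr:
            continue  # nothing retained, or too many decoys
        if best is None or targets > best["targets"]:
            best = {
                "min_score": min_score,
                "max_ppm": max_ppm,
                "targets": targets,
                "fdr": fdr,
            }
    return best


# Hypothetical toy data: three target PSMs and two decoy PSMs.
toy_psms = [
    {"score": 3.0, "ppm": 1.0, "decoy": False},
    {"score": 2.5, "ppm": -2.0, "decoy": False},
    {"score": 2.0, "ppm": 8.0, "decoy": False},
    {"score": 1.0, "ppm": 9.0, "decoy": True},
    {"score": 2.2, "ppm": 12.0, "decoy": True},
]
best = grid_search(toy_psms, score_cutoffs=[1.0, 2.0], ppm_cutoffs=[5.0, 10.0, 15.0])
```

On this toy data, only the combination of the stricter score cutoff and the middle mass-error cutoff retains all three target PSMs without admitting a decoy. This shows why testing parameter combinations jointly can beat tuning each parameter alone.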
Project description:Forensic analysis of seized drug evidence often involves determining whether the components of an unknown mixture are illicit compounds. One approach to this task is to screen the evidence using direct analysis in real time mass spectrometry (DART-MS) to make presumptive identifications. This manuscript introduces a new library-search algorithm that enhances presumptive identifications of mixture components using a series of in-source collision-induced dissociation mass spectra collected through DART-MS. The multistage search, titled the Inverted Library-Search Algorithm (ILSA), identifies potential components in a mixture by first searching the lowest fragmentation mass spectrum for target peaks, assuming these peaks are protonated molecules, and then scoring each target peak with possible library matches. As a proof of concept, the ILSA is demonstrated through several example searches of model seized drug mixtures of acetyl fentanyl, benzyl fentanyl, amphetamine, and methamphetamine searched against a small library of select compounds and the freely available NIST DART-MS Forensics Database. Discussion of the search results and several open areas of research to further extend the method are provided. This new approach for presumptive identification provides analysts with refined information about mixture components and will be of immediate importance in forensic analysis using DART-MS. A prototype implementation of the ILSA is available at https://github.com/asm3-nist/DART-MS-DST.
Project description:In shotgun proteomics, the analysis of tandem mass spectrometry data from peptides can benefit greatly from high mass accuracy measurements. In this study, we have evaluated two database search strategies which use high mass accuracy measurements of the peptide precursor ion. Our results indicate that peptide identifications are improved when spectra are searched with a wide mass tolerance window and precursor mass is used as a filter to discard incorrect matches. Database searches with a peptide data set constrained to peptides within a narrow mass window resulted in fewer peptide identifications but a significantly faster database search time.
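The first strategy above (search with a wide tolerance, then use precursor mass as a filter) can be sketched as a post-search step. This is an illustrative sketch only: the field names `observed` and `theoretical` and the 10 ppm default are assumptions, not values taken from the study.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def filter_by_precursor(matches, max_ppm=10.0):
    """Keep only matches whose absolute precursor mass error is within max_ppm.

    Each match is a dict with hypothetical keys 'observed' and 'theoretical'
    holding the measured and calculated precursor masses.
    """
    return [
        m for m in matches
        if abs(ppm_error(m["observed"], m["theoretical"])) <= max_ppm
    ]


# Hypothetical matches from a wide-tolerance search.
matches = [
    {"peptide": "A", "observed": 1000.005, "theoretical": 1000.0},  # about 5 ppm
    {"peptide": "B", "observed": 1000.5, "theoretical": 1000.0},    # about 500 ppm
]
kept = filter_by_precursor(matches, max_ppm=10.0)
```

Because the filter is applied after scoring, every candidate within the wide window still competes for each spectrum, and the accurate precursor mass is used only to discard matches that cannot be correct.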
Project description:In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs; in both cases, the yield of peptides detected at a given false discovery rate is lowered. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. The tool, Param-Medic, is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
Project description:Cashew is one of the most prevalent causes of tree nut allergies. However, the cashew proteome is far from complete, which limits the quality of peptide identification in mass spectrometric analyses. In this study, bioinformatics tools were used to construct a customized cashew protein database and improve sequence quality for proteins of interest, based on a publicly available cashew genome database. As a result, two additional isoforms of cashew 2S albumins and five other isoforms of cashew 11S proteins were identified, along with several other potential allergens. Using the optimized protein database, the protein profiles of cashew nuts subjected to different oil-roasting conditions (138 °C and 166 °C for 2-10 minutes) were analyzed using discovery LC-MS/MS analysis. The results showed that cashew 2S protein is the most heat-stable, followed by 11S and 7S proteins, though protein isoforms might be affected differently. Preliminary target peptide selection indicated that, of the 29 potential targets, 18 peptides were derived from the newly developed database. In the evaluation of thermal processing effects on cashew proteins, several Maillard reaction adducts were also identified. The cashew protein database developed in this study allows for comprehensive analyses of the cashew proteome and the development of high-quality allergen detection methods.
Project description:The ribosome-associated protein quality control (RQC) core factor nuclear export mediator factor (NEMF) appends C-terminal extended sequences (CESs) to ribosome-stalled nascent chains (NCs). Specific CES compositions can be directly recognized by enzymes and facilitate NC degradation. Yet NEMF-mediated CESs remain largely unidentified. Here, we present a protocol for identifying and characterizing NEMF-mediated C-terminal modifications on mitochondrial NCs (mitoNCs) via tandem mass spectrometry (MS/MS) analysis. We describe strategies for constructing a customized MS/MS spectra database for unknown CESs and detail the steps for CES-modified sample preparation. For complete details on the use and execution of this protocol, please refer to Lv et al. (1).
Project description:Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyse tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral data sets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; and (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these data sets, MS-GF+ significantly increases the number of identified peptides compared with commonly used methods for peptide identifications. We emphasize that although MS-GF+ is not specifically designed for any particular experimental set-up, it improves on the performance of tools specifically designed for these applications (for example, specialized tools for phosphoproteomics).
Project description:RNA Polymerase II ChIP-chip using a polyclonal antibody (N-20), performed on GM06990 cells for Nimblegen ENCODE arrays, which comprise 50-mer oligonucleotides spaced every 38 bp (overlapping by 12 nt). The goal was to identify Pol II-binding regions. Use of this data requires permission from its producers. Keywords: ChIP-chip
Project description:RNA Polymerase II ChIP-chip using a polyclonal antibody (N-20), performed on HeLaS3 cells for Nimblegen ENCODE arrays, which comprise 50-mer oligonucleotides spaced every 38 bp (overlapping by 12 nt). The goal was to identify Pol II-binding regions. Use of this data requires permission from its producers. Keywords: ChIP-chip