Analysis of O-glycoproteomics data with MS-Decipher, MetaMorpheus, MSFragger-Glyco and Mascot
ABSTRACT: We presented a user-friendly proteomic database search platform, MS-Decipher, for the identification of peptides from MS data. Two scoring schemes, rank score and hyperscore, can be used for peptide-spectrum matching. FDR control strategies can be applied after searching, and MS-Decipher was found to perform well compared to traditional database searching software. In addition, a special search mode, O-search, is presented to search for O-glycopeptides in O-glycoproteomics analysis. Data and result files in several formats can be used in searching and validation. A dataset of tryptic peptides from serum was used to evaluate the performance of our software in O-glycoproteomics analysis. Searches using the same parameters were also performed with other commonly used software, such as MetaMorpheus, MSFragger-Glyco and Mascot, to compare performance.
Project description:We presented a user-friendly proteomic database search platform, MS-Decipher, for the identification of peptides from MS data. Two scoring schemes, rank score and hyperscore, can be used for peptide-spectrum matching. FDR control strategies can be applied after searching, and MS-Decipher was found to perform well compared to traditional database searching software. In addition, a special search mode, O-search, is presented to search for O-glycopeptides in O-glycoproteomics analysis. Data and result files in several formats can be used in searching and validation. A dataset of tryptic peptides from HeLa cells was used to evaluate the performance of our software. Searches using the same parameters were also performed with other commonly used software, such as Mascot, SEQUEST, MS Amanda 2.0 and MS-GF+, to compare performance.
Project description:We presented a user-friendly proteomic database search platform, MS-Decipher, for the identification of peptides from MS data. Two scoring schemes, rank score and hyperscore, can be used for peptide-spectrum matching. FDR control strategies can be applied after searching, and MS-Decipher was found to perform well compared to traditional database searching software. In addition, a special search mode, O-search, is presented to search for O-glycopeptides in O-glycoproteomics analysis. Data and result files in several formats can be used in searching and validation. A dataset of tryptic peptides from serum was used to evaluate the performance of our software. Searches using the same parameters were also performed with other commonly used software, such as Mascot, SEQUEST, MS Amanda 2.0 and MS-GF+, to compare performance.
Project description:As a result of recent improvements in mass spectrometry (MS), there is increased interest in data-independent acquisition (DIA) strategies in which all peptides are systematically fragmented using wide mass-isolation windows ('multiplex fragmentation'). DIA-Umpire (http://diaumpire.sourceforge.net/), a comprehensive computational workflow and open-source software for DIA data, detects precursor and fragment chromatographic features and assembles them into pseudo-tandem MS spectra. These spectra can be identified with conventional database-searching and protein-inference tools, allowing sensitive, untargeted analysis of DIA data without the need for a spectral library. Quantification is done with both precursor- and fragment-ion intensities. Furthermore, DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples. We demonstrated the performance of the method with control samples of varying complexity and publicly available glycoproteomics and affinity purification-MS data.
Project description:BACKGROUND: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The assignments of peptides to MS/MS spectra by the SEQUEST searching algorithm are characterized by several scores, including Xcorr, Delta Cn, Sp, Rsp and matched ion count. Filtering criteria based on several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not been well optimized so far. RESULTS: In this study, we implemented a machine learning approach known as predictive genetic algorithm (GA) for the optimization of filtering criteria to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. As the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which represented a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach can achieve similar performance in distinguishing true from false assignments with only 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results, as it does not rely on any assumption about the data. CONCLUSION: Our results indicated that filtering criteria should be optimized individually for different samples. The newly developed software using GA provides a convenient and fast way to create tailored optimal criteria for different proteome samples to improve proteome coverage.
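The decoy-based FDR estimate that the description above relies on can be illustrated with a minimal sketch. This is not the genetic algorithm from the study: it filters on a single hypothetical score rather than a combination of SEQUEST scores, and the function name, the input layout and the simple decoys/targets estimator are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Tuple[Optional[float], int]:
    """Find the most permissive score cutoff whose estimated FDR is acceptable.

    psms: (score, is_decoy) pairs, higher score meaning a better match.
    The FDR at a cutoff is estimated as (#decoys accepted) / (#targets accepted),
    one common target-decoy estimator (assumed here, not taken from the study).
    Returns (cutoff, number of target PSMs accepted); cutoff is None if no
    cutoff satisfies max_fdr. Tied scores are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff: Optional[float] = None
    best_targets = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the lowest score so far at which the estimate still holds.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
            best_targets = targets
    return best_cutoff, best_targets


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(score_threshold_at_fdr(example, max_fdr=0.25))
```

An optimizer such as the GA described above would search over combinations of several score cutoffs, using an estimate of this kind as the constraint while maximizing the number of accepted target identifications.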
Project description:Tandem mass spectrometry (MS/MS) has been used in analysis of proteins and their post-translational modifications. A recently developed data analysis method, which simulates MS/MS spectra of phosphopeptides and performs spectral library searching using SpectraST, facilitates confident localization of phosphorylation sites. However, its performance has been evaluated only on MS/MS spectra acquired using Orbitrap HCD mass spectrometers so far. In this study, we have investigated whether this approach would be applicable to another type of mass spectrometers, and optimized the simulation and search conditions to achieve sensitive and confident site localization. Synthetic phosphopeptides and enriched K562 cell phosphopeptides were analyzed using a TripleTOF 6600 mass spectrometer before and after enzymatic dephosphorylation. Dephosphorylated peptides identified by X!Tandem database searching were subjected to spectral simulation of all possible single phosphorylations using SimPhospho software. Phosphopeptides were identified and localized by SpectraST searching against a library of the simulated spectra. Although no synthetic phosphopeptide was localized at 1% false localization rate under the previous conditions, optimization of the spectral simulation and search conditions for the TripleTOF datasets achieved the localization and improved the sensitivity. Furthermore, the optimized conditions enabled sensitive localization of K562 phosphopeptides at 1% false discovery and localization rates. These results suggest that accurate phosphopeptide simulation of TripleTOF MS/MS spectra is possible and the simulated spectral libraries can be used in SpectraST searching for confident localization of phosphorylation sites.
Project description:This study developed a multilayered, gel-based, and underivatized strategy for de novo protein sequence analysis of unsequenced dinoflagellates using a MALDI-TOF/TOF mass spectrometer with the assistance of DeNovo Explorer software. MASCOT was applied as the first layer screen to identify either known or unknown proteins sharing identical peptides presented in a database. Once the confident identifications were removed after searching against the NCBInr database, the remainder was searched against the dinoflagellate expressed sequence tag database. In the last layer, those borderline and nonconfident hits were further subjected to de novo interpretation using DeNovo Explorer software. The de novo sequences passing a reliability filter were subsequently submitted to nonredundant MS-BLAST search. Using this layer identification method, 216 protein spots representing 158 unique proteins out of 220 selected protein spots from Alexandrium tamarense, a dinoflagellate with unsequenced genome, were confidently or tentatively identified by database searching. These proteins were involved in various intracellular physiological activities. This study is the first effort to develop a completely automated approach to identify proteins from unsequenced dinoflagellate databases and establishes a preliminary protein database for various physiological studies of dinoflagellates in the future.
Project description:Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy .
Project description:Mass spectrometric analyses of protein digests produce large numbers of fragmentation spectra that are not identified by routine database searching strategies. Some of these spectra could be identified by development of improved search engines. However, many of these spectra represent fragmentation of peptide components bearing modifications that are not routinely considered in database searches. Here we present new software within Protein Prospector that allows comprehensive analysis of data sets by analyzing the data at increasing levels of depth. Analysis of published data sets is presented to illustrate that the software is not biased to any instrument types. The results show that these data sets contain many modified peptides. As well as searching for known modification types, Protein Prospector permits the detection and identification of unexpected or novel modifications by searching for any mass shift within a user-specified mass range to any chosen amino acid(s). Several modifications never previously reported in proteomics data were identified in these standard data sets using this mass modification searching approach.
Project description:Neuropeptides are essential for cell-cell communication in neurological and endocrine physiological processes in health and disease. While many neuropeptides have been identified in previous studies, the resulting data has not been structured to facilitate further analysis by tandem mass spectrometry (MS/MS), the main technology for high-throughput neuropeptide identification. Many neuropeptides are difficult to identify when searching MS/MS spectra against large protein databases because of their atypical lengths (e.g. shorter/longer than common tryptic peptides) and lack of tryptic residues to facilitate peptide ionization/fragmentation. NeuroPedia is a neuropeptide encyclopedia of peptide sequences (including genomic and taxonomic information) and spectral libraries of identified MS/MS spectra of homolog neuropeptides from multiple species. Searching neuropeptide MS/MS data against known NeuroPedia sequences will improve the sensitivity of database search tools. Moreover, the availability of neuropeptide spectral libraries will also enable the utilization of spectral library search tools, which are known to further improve the sensitivity of peptide identification. These will also reinforce the confidence in peptide identifications by enabling visual comparisons between new and previously identified neuropeptide MS/MS spectra. NeuroPedia is available at http://proteomics.ucsd.edu/Software/NeuroPedia. Supplementary materials are available at Bioinformatics online.
Project description:Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
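To make the "missed cleavages" search parameter described above concrete, the following is a minimal sketch of in silico tryptic digestion. It assumes a common simplified rule (cleave after K or R unless the next residue is P); it is not the information-theoretic predictor or the masking program from the study, and the function name is hypothetical.

```python
from typing import List


def tryptic_digest(protein: str, missed_cleavages: int = 1) -> List[str]:
    """Return peptides from a simplified in silico tryptic digest.

    Assumed cleavage rule: cut C-terminal to K or R, except before P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    if not protein:
        return []
    # Boundaries of fully cleaved pieces, including both sequence ends.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        # Join 1 .. missed_cleavages + 1 consecutive pieces.
        last = min(start + 2 + missed_cleavages, len(sites))
        for end in range(start + 1, last):
            peptides.append(protein[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKRPGKC", missed_cleavages=0))
    print(tryptic_digest("AKRPGKC", missed_cleavages=1))
```

Allowing one missed cleavage adds every pair of adjacent fully cleaved peptides to the candidate list, which is why removing sites that are confidently predicted not to be cut can shrink the search space.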