Project description:PeptideAtlas, SRMAtlas, and PASSEL are Web-accessible resources to support discovery and targeted proteomics research. PeptideAtlas is a multi-species compendium of shotgun proteomic data provided by the scientific community; SRMAtlas is a resource of high-quality, complete proteome SRM assays generated in a consistent manner for the targeted identification and quantification of proteins; and PASSEL is a repository that compiles and represents selected reaction monitoring data, all in an easy-to-use interface. The databases are generated from native mass spectrometry data files that are analyzed in a standardized manner, including statistical validation of the results. Each resource offers search functionalities and can be queried by user-defined constraints; the query results are provided in tables or are graphically displayed. PeptideAtlas, SRMAtlas, and PASSEL are freely and publicly available via the Web site http://www.peptideatlas.org. In this protocol, we describe the use of these resources and highlight how to submit, search, collate, and download data.
Project description:Here we present a dataset generated using formalin-fixed paraffin-embedded archival samples from two rare lung neuroendocrine tumor subtypes (namely, two atypical carcinoids, ACs, and two large-cell neuroendocrine carcinomas, LCNECs). Samples were subjected to a shotgun proteomics pipeline comprising full-length protein extraction, SDS removal through spin columns, in-solution trypsin digestion, long-gradient liquid chromatography peptide separation, and LTQ-Orbitrap mass spectrometry analysis. A total of 1260 and 2436 proteins were identified in the AC and LCNEC samples, respectively, with FDR <1%. MS data are available in the PeptideAtlas repository at http://www.peptideatlas.org/PASS/PASS00375.
Project description:The open XML format mzML, used for representation of MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naïve mzML representation is fourfold or even up to 18-fold larger compared with the original vendor file. In disk I/O limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. In an attempt to reduce this problem, we here present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To facilitate ease of adoption, the algorithms target the binary data in the mzML standard, and support in main proteomics tools is already available. Using a test set of 10 representative MS data files we demonstrate typical file size decreases of 90% when combined with traditional compression, as well as read time decreases of up to 50%. It is envisaged that these improvements will be beneficial for data handling within the MS community.
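The idea of applying a numerical transform before traditional compression can be illustrated with a minimal Python sketch. This is not the MS-Numpress algorithm itself: the fixed-point factor `SCALE`, the delta-encoding step, and the function names `encode` and `decode` are assumptions chosen for illustration only. The sketch simply shows why a sorted, slowly varying array such as m/z values tends to compress better once it is turned into small integers.

```python
import struct
import zlib

SCALE = 100_000  # hypothetical fixed-point factor: keeps 5 decimal places


def encode(values, scale=SCALE):
    """Fixed-point + delta encode floats, then apply zlib (lossy to 1/scale)."""
    fixed = [round(v * scale) for v in values]
    # Store each value as the difference from its predecessor (first vs. 0).
    deltas = [b - a for a, b in zip([0] + fixed, fixed)]
    payload = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(payload)


def decode(blob, scale=SCALE):
    """Invert encode(): decompress, undo the deltas, rescale to floats."""
    payload = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(payload) // 8}q", payload)
    values, running = [], 0
    for d in deltas:
        running += d
        values.append(running / scale)
    return values


if __name__ == "__main__":
    # Synthetic, monotonically increasing m/z-like values (not real data).
    mz = [400.0 + 0.01 * i + 0.0001 * (i * i % 7) for i in range(1000)]
    baseline = zlib.compress(struct.pack(f"<{len(mz)}d", *mz))
    print("zlib on raw doubles:", len(baseline), "bytes")
    print("fixed-point + delta + zlib:", len(encode(mz)), "bytes")
```

The transform is lossy only up to the chosen precision, so the scale factor would have to be matched to the accuracy the downstream analysis needs.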
Project description:Myopia is the most common refractive error and is estimated to affect half of the world's population by 2050. It has been suggested that it is determined by multiple factors, both environmental and genetic, but the mechanism behind the cause of myopia has yet to be identified. Vitreous humor (VH) is a transparent gelatin-like substance that takes up to 80% of the volume of the eye, making it the largest component of the eye. Although VH is the main contributor to axial elongation of the eye, including normal eye growth (emmetropization) and myopia, the dilute nature of VH (99% water) makes less abundant molecules difficult to identify, so they are often overlooked. Using the more sensitive label-free mass spectrometry approach with data-independent acquisition (SWATH-MS), we established a comprehensive VH proteome library in a chick animal model and quantified possible protein biomarkers that are responsible for the axial elongation during emmetropization (7, 14, 21, 28 days after hatching, n = 48 eyes). Raw data files for both information-dependent acquisition (IDA) and data-independent acquisition (SWATH-MS) were uploaded to PeptideAtlas for public access (http://www.peptideatlas.org/PASS/PASS01258).
Project description:We present xiSPEC, a standard compliant, next-generation web-based spectrum viewer for visualizing, analyzing and sharing mass spectrometry data. Peptide-spectrum matches from standard proteomics and cross-linking experiments are supported. xiSPEC is to date the only browser-based tool supporting the standardized file formats mzML and mzIdentML defined by the proteomics standards initiative. Users can either upload data directly or select files from the PRIDE data repository as input. xiSPEC allows users to save and share their datasets publicly or password protected for providing access to collaborators or readers and reviewers of manuscripts. The identification table features advanced interaction controls and spectra are presented in three interconnected views: (i) annotated mass spectrum, (ii) peptide sequence fragmentation key and (iii) quality control error plots of matched fragments. Highlighting or selecting data points in any view is represented in all other views. Views are interactive scalable vector graphic elements, which can be exported, e.g. for use in publication. xiSPEC allows for re-annotation of spectra for easy hypothesis testing by modifying input data. xiSPEC is freely accessible at http://spectrumviewer.org and the source code is openly available on https://github.com/Rappsilber-Laboratory/xiSPEC.
Project description:The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library contains readers and writers of the mzML data format, which has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge at http://proteowizard.sourceforge.net. This website also provides code examples and documentation. It is our hope that the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged.
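As a rough illustration of one task an mzML reader or writer has to handle, the sketch below decodes and encodes the contents of an mzML binary data array, which is stored as base64 text over little-endian 32- or 64-bit floats with optional zlib compression. It is written in Python with the standard library only and does not use or mirror the ProteoWizard API; the function names `decode_binary_array` and `encode_binary_array` and their parameters are hypothetical.

```python
import base64
import struct
import zlib


def decode_binary_array(text, compressed=True, double_precision=True):
    """Decode the base64 text of an mzML binary array into a list of floats."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    width, code = (8, "d") if double_precision else (4, "f")
    # "<" selects little-endian byte order, as used for mzML binary arrays.
    return list(struct.unpack(f"<{len(raw) // width}{code}", raw))


def encode_binary_array(values, compressed=True, double_precision=True):
    """Encode a list of floats into base64 text suitable for a binary array."""
    code = "d" if double_precision else "f"
    raw = struct.pack(f"<{len(values)}{code}", *values)
    if compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    mz = [100.0, 200.5, 300.25]
    text = encode_binary_array(mz)
    print(text)
    print(decode_binary_array(text))
```

In a real file, the compression and precision settings are declared by controlled-vocabulary terms alongside each array, so a reader would take those flags from the file rather than from caller-supplied defaults.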
Project description:Bottom-up proteomics relies on the proteolytic or chemical cleavage of proteins into peptides, the identification of those peptides via mass spectrometry, and the mapping of the identified peptides back to the reference proteome to infer which possible proteins are identified. Reliable mapping of peptides to proteins still poses substantial challenges when considering similar proteins, protein families, splice isoforms, sequence variation, and possible residue mass modifications, combined with an imperfect and incomplete understanding of the proteome. The ProteoMapper tool enables a comprehensive and rapid mapping of peptides to a reference proteome. The indexer component creates a segmented index for an input proteome from a FASTA or PEFF file. The ProMaST component provides ultrafast mapping of one or more input peptides against the index. ProteoMapper allows searches that take into account known sequence variation encoded in PEFF files. It also enables fuzzy searches to find highly similar peptides with residue order changes or other isobaric or near-isobaric substitutions within a specified mass tolerance. We demonstrate an example of a one-hit-wonder identification in PeptideAtlas that may be better explained by a combination of catalogued and uncatalogued sequence variation in another highly observed protein. ProteoMapper is free and open source, and is available for local use after downloading, for embedding in other applications, as an online web tool at http://www.peptideatlas.org/map, and as a web service.
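The index-then-map workflow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ProteoMapper's actual index layout or the ProMaST algorithm: the seed length `K`, the k-mer dictionary, and the function names are hypothetical, and the only "fuzzy" behaviour shown is treating the isobaric residues leucine and isoleucine as identical.

```python
from collections import defaultdict

K = 4  # hypothetical seed length for the k-mer index


def normalize(seq):
    """Upper-case the sequence and merge isobaric I/L into a single letter."""
    return seq.upper().replace("I", "L")


def build_index(proteome, k=K):
    """Return (index, normalized proteome); index maps k-mer -> [(id, offset)]."""
    index = defaultdict(list)
    normalized = {pid: normalize(seq) for pid, seq in proteome.items()}
    for pid, seq in normalized.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index, normalized


def map_peptide(peptide, index, normalized, k=K):
    """Return sorted (protein_id, 1-based start) hits for a peptide."""
    query = normalize(peptide)
    if len(query) < k:
        raise ValueError("peptide is shorter than the seed length")
    hits = []
    # Look up the first k residues, then verify the full peptide in place.
    for pid, offset in index.get(query[:k], []):
        if normalized[pid].startswith(query, offset):
            hits.append((pid, offset + 1))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    proteome = {"P1": "MKTAYIAKQRQISFVK", "P2": "MKTAYLAKQR"}
    index, normalized = build_index(proteome)
    print(map_peptide("TAYIAK", index, normalized))
    print(map_peptide("QISFVK", index, normalized))
```

Seeding on a short prefix keeps each lookup proportional to the number of candidate positions rather than to the proteome size, which is the general motivation for building an index before mapping.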
Project description:The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library. The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api. Supplementary data are available at Bioinformatics online.
Project description:The closed nature of vendor file formats in mass spectrometry is a significant barrier to progress in developing robust bioinformatics software. In response, the community has developed the open mzML format, implemented in XML and based on controlled vocabularies. Widely adopted, mzML is an important step forward; however, it suffers from two challenges that are particularly apparent as the field moves to high-throughput proteomics: a large increase in file size, and a largely sequential I/O access pattern. Described here is 'toffee', an open, random I/O format backed by HDF5, with lossless compression that gives file sizes similar to the original vendor format and can be converted back to mzML without penalty. It is shown that mzML and toffee are equivalent when processing data using OpenSWATH algorithms, in addition to novel applications that are enabled by new data access patterns. For instance, a peptide-centric deep-learning pipeline for peptide identification is proposed. Documentation and examples are available at https://toffee.readthedocs.io, and all code is MIT licensed at https://bitbucket.org/cmriprocan/toffee.
Project description:This article contains SRM proteomics data related to the research article entitled "Inactivation of iron-sulfur cluster biogenesis regulator SufR in Synechocystis sp. PCC 6803 induces unique iron-dependent protein-level responses" (L. Vuorijoki, A. Tiwari, P. Kallio, E.M. Aro, 2017). The data described here provide comprehensive information on the applied SRM assays, together with the results of quantifying 94 Synechocystis sp. PCC 6803 proteins. The data have been deposited in Panorama public (https://panoramaweb.org/labkey/SufR) and in PASSEL under the PASS00765 identifier (http://www.peptideatlas.org/PASS/PASS00765).