Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing.
ABSTRACT: Ocean metaproteomics is an emerging field enabling discoveries about marine microbial communities and their impact on global biogeochemical processes. Recent ocean metaproteomic studies have provided insight into microbial nutrient transport, colimitation of carbon fixation, the metabolism of microbial biofilms, and dynamics of carbon flux in marine ecosystems. Future methodological developments could provide new capabilities such as characterizing long-term ecosystem changes, biogeochemical reaction rates, and in situ stoichiometries. Yet challenges remain for ocean metaproteomics due to the great biological diversity that produces highly complex mass spectra, as well as the difficulty in obtaining and working with environmental samples. This review summarizes the progress and challenges facing ocean metaproteomic scientists and proposes best practices for data sharing of ocean metaproteomic data sets, including the data types and metadata needed to enable intercomparisons of protein distributions and annotations that could foster global ocean metaproteomic capabilities.
Project description:Metaproteomics, the study of protein expression in microbial communities, is a versatile tool for environmental microbiology. Achieving sufficiently high metaproteome coverage to obtain a comprehensive picture of the activities and interactions in microbial communities is one of the current challenges in metaproteomics. An essential step to maximize the number of identified proteins is peptide separation via liquid chromatography (LC) prior to mass spectrometry (MS). Thorough optimization and comparison of LC methods for metaproteomics are, however, currently lacking. Here, we present an extensive development and test of different 1D and 2D-LC approaches for metaproteomic peptide separations. We used fully characterized mock community samples to evaluate metaproteomic approaches with very long analytical columns (50 and 75 cm) and long gradients (up to 12 h). We assessed a total of over 20 different 1D and 2D-LC approaches in terms of number of protein groups and unique peptides identified, peptide spectrum matches (PSMs) generated, the ability to detect proteins of low-abundance species, the effect of technical replicate runs on protein identifications and method reproducibility. We show here that, while 1D-LC approaches are faster and easier to set up and lead to more identifications per minute of runtime, 2D-LC approaches allow for a higher overall number of identifications with up to >10,000 protein groups identified. We also compared the 1D and 2D-LC approaches to a standard GeLC workflow, in which proteins are pre-fractionated via gel electrophoresis. This method yielded results comparable to the 2D-LC approaches, however with the drawback of a much increased sample preparation time. Based on our results, we provide recommendations on how to choose the best LC approach for metaproteomics experiments, depending on the study aims.
Project description:Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.
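The graph-centric idea in the abstract above can be illustrated with a minimal sketch. This is not the Graph2Pro implementation: the reads, the value of k, and the function names (`build_graph`, `spell_paths`, `protein_database`) are hypothetical, only the forward strand is translated, and a real assembly graph would need cycle handling and path-count limits. The sketch shows how fragmentary reads that cannot each yield a full coding sequence can be joined through a de Bruijn graph, so that both variants of a short open reading frame end up in the protein database.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def build_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_paths(graph):
    """Spell the sequence of every simple path from a source node to a dead end."""
    targets = {t for successors in graph.values() for t in successors}
    sources = [node for node in graph if node not in targets]
    sequences = set()

    def walk(node, seq, seen):
        unvisited = [n for n in graph.get(node, ()) if n not in seen]
        if not unvisited:
            sequences.add(seq)
            return
        for nxt in unvisited:
            walk(nxt, seq + nxt[-1], seen | {nxt})

    for source in sources:
        walk(source, source, {source})
    return sequences


def translate(seq, frame):
    """Translate one forward reading frame; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def protein_database(reads, k, min_len=3):
    """Collect stop-to-stop peptides from all graph paths (forward frames only)."""
    proteins = set()
    for seq in spell_paths(build_graph(reads, k)):
        for frame in range(3):
            for orf in translate(seq, frame).split("*"):
                if len(orf) >= min_len:
                    proteins.add(orf)
    return proteins


# Hypothetical fragmentary reads covering two variants of one short ORF.
reads = ["ATGGCCAAA", "GCCAAATGA", "ATGGCCGAA", "GCCGAATGA"]
database = protein_database(reads, k=5)
```

Neither variant is fully contained in a single read here; the full coding sequences only appear once the overlapping fragments are connected through the graph.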
Project description:Environmental meta-omics is rapidly expanding as sequencing capabilities improve, computing technologies become more accessible, and associated costs are reduced. The in situ snapshots of marine microbial life afforded by these data provide a growing knowledge of the functional roles of communities in ecosystem processes. Metaproteomics allows for the characterization of the dynamic proteome of a complex microbial community. It has the potential to reveal impacts of microbial metabolism on biogeochemical transport, storage and cycling (for example, Hawley et al., 2014), while additionally clarifying which taxonomic groups perform these roles. Previous work illuminated many of the important functions and interactions within marine microbial communities (for example, Morris et al., 2010), but a review of ocean metaproteomics literature revealed little standardization in bioinformatics pipelines for detecting peptides and inferring and annotating proteins. As prevalence of these data sets grows, there is a critical need to develop standardized approaches for mass spectrometry (MS) proteomic spectrum identification and annotation to maximize the scientific value of the data obtained. Here, we demonstrate that bioinformatics decisions made throughout the peptide identification process are as important for data interpretation as choices of sampling protocol and bacterial community manipulation experimental design. Our analysis offers a best practices guide for environmental metaproteomics.
Project description:Pathogenesis of colorectal cancer (CRC) is associated with alterations in the gut microbiome. Previous studies have focused on changes in taxonomic abundances measured by metagenomics. Variations in the function of intestinal bacteria in CRC patients compared to healthy individuals remain largely unknown. Here we collected fecal samples from CRC patients and healthy volunteers and characterized their microbiomes using a quantitative metaproteomic method. We identified and quantified 91,902 peptides, 30,062 gut microbial protein groups, and 195 genera of microbes. Among the proteins, 341 differed significantly in abundance between the CRC patients and the healthy volunteers. Microbial proteins related to iron intake/transport; oxidative stress; and DNA replication, recombination, and repair were significantly altered in abundance as a result of high local concentration of iron and high oxidative stress in the large intestine of CRC patients. Our study shows that metaproteomics can provide functional information on intestinal microflora that is of great value for pathogenesis research, and can help guide clinical diagnosis in the future.
Project description:Motivation: Complex microbial communities can be characterized by metagenomics and metaproteomics. However, metagenome assemblies often generate enormous, and yet incomplete, protein databases, which undermines the identification of peptides and proteins in metaproteomics. This challenge calls for increased discrimination of true identifications from false identifications by database searching and filtering algorithms in metaproteomics. Results: Sipros Ensemble was developed here for metaproteomics using an ensemble approach. Three diverse scoring functions from MyriMatch, Comet and the original Sipros were incorporated within a single database searching engine. Supervised classification with logistic regression was used to filter database searching results. Benchmarking with soil and marine microbial communities demonstrated a higher number of peptide and protein identifications by Sipros Ensemble than MyriMatch/Percolator, Comet/Percolator, MS-GF+/Percolator, Comet & MyriMatch/iProphet and Comet & MyriMatch & MS-GF+/iProphet. Sipros Ensemble was computationally efficient and scalable on supercomputers. Availability and implementation: Freely available under the GNU GPL license at http://sipros.omicsbio.org. Contact: firstname.lastname@example.org. Supplementary information: Supplementary data are available at Bioinformatics online.
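To make the idea of separating true from false identifications concrete, the sketch below applies target-decoy filtering, a widely used way to estimate the false discovery rate (FDR) of peptide-spectrum matches (PSMs). It is an assumption chosen for illustration: the abstract above does not say how Sipros Ensemble estimates FDR, and this is not its logistic-regression filter. The function name `filter_psms_at_fdr`, the score values, and the simple decoys/targets estimate are all hypothetical.

```python
def filter_psms_at_fdr(psms, max_fdr=0.01):
    """Return target PSMs accepted at the given FDR threshold.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at each rank is estimated as (#decoys / #targets) among all PSMs
    scoring at least as well. Score ties are not treated specially here.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cut at which the estimated FDR is still acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cut = rank
    return [psm for psm in ranked[:best_cut] if not psm[1]]


# Hypothetical PSM scores; True marks a match to a decoy sequence.
example_psms = [
    (9.0, False), (8.0, False), (7.0, False),
    (6.0, True), (5.0, False), (4.0, True),
]
accepted = filter_psms_at_fdr(example_psms, max_fdr=0.25)
```

A scoring function that ranks decoys lower allows a deeper cut at the same FDR, which is one way a better score or a combination of scores can translate into more accepted identifications.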
Project description:The microbiome has a strong impact on human health and disease and is, therefore, increasingly studied in a clinical context. Metaproteomics is also attracting considerable attention, and such data can be efficiently generated today owing to improvements in mass spectrometry-based proteomics. As we will discuss in this study, there are still major challenges, notably in data analysis, that need to be overcome. Here, we analyzed 212 fecal samples from 56 hospitalized acute leukemia patients with multidrug-resistant Enterobacteriaceae (MRE) gut colonization using metagenomics and metaproteomics. This is one of the largest clinical metaproteomic studies to date, and the first metaproteomic study addressing the gut microbiome in MRE-colonized acute leukemia patients. Based on this substantial data set, we discuss major current limitations in clinical metaproteomic data analysis to provide guidance to researchers in the field. Notably, the results show that public metagenome databases are incomplete and that sample-specific metagenomes improve results. Furthermore, biological variation is tremendous, which challenges clinical study designs and argues that longitudinal measurements of individual patients are a valuable future addition to the analysis of patient cohorts.
Project description:Unraveling the complex structure and functioning of microbial communities is essential to accurately predict the impact of perturbations and/or environmental changes. Of all the molecular tools available today to resolve the dynamics of microbial communities, metaproteomics stands out, allowing the establishment of phenotype-genotype linkages. Despite its rapid development, this technology has faced many technical challenges that still hamper its potential power. How to maximize the number of protein identifications, improve the quality of protein annotation, and provide reliable ecological interpretation are questions of immediate urgency. In our study, we used a robust metaproteomic workflow combining two protein fractionation approaches (gel-based versus gel-free) and four protein search databases derived from the same metagenome to analyze the same seawater sample. The resulting eight metaproteomes provided different outcomes in terms of (i) total protein numbers, (ii) taxonomic structures, and (iii) protein functions. The characterization and/or representativeness of numerous proteins from ecologically relevant taxa such as Pelagibacterales, Rhodobacterales, and Synechococcales, as well as crucial environmental processes, such as nutrient uptake, nitrogen assimilation, light harvesting, and oxidative stress response, were found to be particularly affected by the methodology. Our results provide clear evidence that the use of different protein search databases significantly alters the biological conclusions in both gel-free and gel-based approaches. Our findings emphasize the importance of diversifying the experimental workflow for a comprehensive metaproteomic study.
Project description:Marine oxygen minimum zones (OMZs) are intrinsic water column features arising from respiratory oxygen demand during organic matter degradation in stratified waters. Currently OMZs are expanding due to global climate change with resulting feedback on marine ecosystem function. Here we use metaproteomics to chart spatial and temporal patterns of gene expression along defined redox gradients in a seasonally stratified fjord to better understand microbial community responses to OMZ expansion. Metabolic pathway components for nitrification, anaerobic ammonium oxidation (anammox), denitrification, and inorganic carbon fixation were differentially expressed across the redoxcline and covaried with distribution patterns of ubiquitous OMZ microbes including Thaumarchaeota, Nitrospina, Nitrospira, Planctomycetes, and SUP05/ARCTIC96BD-19 Gammaproteobacteria. Nitrification and inorganic carbon fixation pathways affiliated with Thaumarchaeota dominated dysoxic waters, and denitrification, sulfur oxidation, and inorganic carbon fixation pathways affiliated with the SUP05 group of nitrate-reducing sulfur oxidizers dominated suboxic and anoxic waters. Nitrifier nitrite oxidation and anammox pathways affiliated with Nitrospina, Nitrospira, and Planctomycetes, respectively, also exhibited redox partitioning between dysoxic and suboxic waters. The numerical abundance of SUP05 proteins mediating inorganic carbon fixation under anoxic conditions suggests that SUP05 will become increasingly important in global ocean carbon and nutrient cycling as OMZs expand.
Project description:Hitherto, the main goal of metaproteomic analyses has been to characterize the functional role of particular microorganisms in the microbial ecology of various microbial communities. Recently, it has been suggested that metaproteomics could be used for bioprospecting microbial communities to query for the most active enzymes to improve the selection process of industrially relevant enzymes. In the present study, to reduce the complexity of metaproteomic samples for targeted bioprospecting of novel enzymes, a microbial community capable of producing cellulases was maintained on a chemically defined medium in an enzyme-suppressed metabolic steady state. From this state, it was possible to specifically and distinctively induce the desired cellulolytic activity. The extracellular fraction of the protein complement of the induced sample could thereby be purified and compared to a non-induced sample of the same community by differential gel electrophoresis to discriminate between constitutively expressed proteins and proteins upregulated in response to the inducing substance. Using the applied approach, downstream analysis by mass spectrometry could be limited to only proteins recognized as upregulated in the cellulase-induced sample. Of 39 selected proteins, the majority were found to be linked to the need to degrade, take up, and metabolize cellulose. In addition, 28 (72%) of the proteins were non-cytosolic and 17 (44%) were annotated as carbohydrate-active enzymes. The results demonstrated the applicability of the proposed approach both for identifying extracellular proteins and for guiding the selection of proteins toward those specifically upregulated and targeted by the enzyme-inducing substance. 
Further, because identification of interesting proteins was based on the regulation of enzyme expression in response to a need to hydrolyze and utilize a specific substance, other unexpected enzyme activities could also be identified. The described approach created the conditions necessary to select relevant extracellular enzymes extracted from the enzyme-induced microbial community. However, for the purpose of bioprospecting for enzymes to clone, produce, and characterize for practical applications, it was concluded that identification against public databases was not sufficient to identify the correct gene or protein sequence for cloning of the identified novel enzymes.
Project description:The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. 
Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting-edge metaproteomics software.