MicrobioLink: An Integrated Computational Pipeline to Infer Functional Effects of Microbiome-Host Interactions.
ABSTRACT: Microbiome-host interactions play significant roles in health and in various diseases including autoimmune disorders. Uncovering these inter-kingdom cross-talks propels our understanding of disease pathogenesis and provides useful leads on potential therapeutic targets. Despite the biological significance of microbe-host interactions, a large gap remains in understanding the downstream effects of these interactions on host processes. Computational methods are expected to fill this gap by generating, integrating, and prioritizing predictions, as experimental detection remains challenging due to feasibility issues. Here, we present MicrobioLink, a computational pipeline to integrate predicted interactions between microbial and host proteins together with host molecular networks. Using the concept of network diffusion, MicrobioLink can analyse how microbial proteins in a certain context influence cellular processes by modulating gene or protein expression. We demonstrated the applicability of the pipeline using a case study. We used gut metaproteomic data from Crohn's disease patients and healthy controls to uncover the mechanisms by which the microbial proteins can modulate host genes which belong to biological processes implicated in disease pathogenesis. MicrobioLink, which is agnostic of the microbial protein sources (bacterial, viral, etc.), is freely available on GitHub.
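The abstract names network diffusion but does not give the formulation MicrobioLink uses. As a rough illustration of the general idea only, the sketch below uses a random walk with restart, one common diffusion scheme, which is an assumption here and not a description of the pipeline's actual algorithm. The function `diffuse`, the toy network, and all node names are hypothetical. Scores start on a host protein predicted to bind a microbial protein and spread downstream through a directed host network.

```python
def diffuse(edges, seeds, restart=0.5, iterations=100):
    """Random walk with restart on a directed graph (illustrative only).

    edges: dict mapping a node to the list of its downstream nodes
    seeds: dict mapping seed nodes to positive starting weights
    """
    nodes = set(edges) | {t for ts in edges.values() for t in ts} | set(seeds)
    total = sum(seeds.values())
    # Normalised seed distribution; non-seed nodes start at zero.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        # Every step, a fraction `restart` of the score returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for src, score in p.items():
            targets = edges.get(src, [])
            if not targets:
                # Dead end: send the remaining score back to the seeds
                # so that the total score stays equal to 1.
                for n in nodes:
                    nxt[n] += (1 - restart) * score * p0[n]
                continue
            share = (1 - restart) * score / len(targets)
            for t in targets:
                nxt[t] += share
        p = nxt
    return p


# Hypothetical toy host network: receptor -> kinase -> TF -> target gene.
host_network = {
    "HOST_RECEPTOR": ["HOST_KINASE"],
    "HOST_KINASE": ["HOST_TF"],
    "HOST_TF": ["TARGET_GENE"],
    "UNRELATED": ["OTHER"],
}
# Assumed seed: a host protein predicted to interact with a microbial protein.
scores = diffuse(host_network, {"HOST_RECEPTOR": 1.0})
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}\t{value:.3f}")
```

In this toy case the score decays with distance from the seed, and nodes that cannot be reached from the seed keep a score of zero, which is how a diffusion score can be used to rank host genes by their network proximity to microbe-targeted proteins.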
Project description: BACKGROUND: Research involving microbial ecosystems has drawn increasing attention in recent years. Studying microbe-microbe, host-microbe, and environment-microbe interactions is essential for the understanding of microbial ecosystems. Currently, metaproteomics provides qualitative and quantitative information on proteins, giving insights into the functional changes of microbial communities. However, computational analysis of large-scale data generated in metaproteomic studies remains a challenge. Conventional proteomic software has difficulty dealing with the extreme complexity and species diversity present in microbiome samples, leading to lower rates of peptide and protein identification. To address this issue, we previously developed the MetaPro-IQ approach for highly efficient microbial protein/peptide identification and quantification. RESULT: Here, we developed an integrated software platform, named MetaLab, providing a complete, automated, and user-friendly pipeline for fast microbial protein identification and quantification, as well as taxonomic profiling, directly from mass spectrometry raw data. Spectral clustering adopted in the pre-processing step dramatically improved the speed of peptide identification from database searches. Quantitative information on identified peptides was used for estimating the relative abundance of taxa at all phylogenetic ranks. Taxonomy result files exported by MetaLab are fully compatible with widely used metagenomics tools. Herein, the potential of MetaLab is evaluated by reanalyzing a metaproteomic dataset from mouse gut microbiome samples. CONCLUSION: MetaLab is a fully automatic software platform enabling an integrated data-processing pipeline for metaproteomics. The function of sample-specific database generation can be very advantageous for searching peptides against huge protein databases. It provides a seamless connection between peptide determination and taxonomic profiling; therefore, the peptide abundance is readily used for measuring the microbial variations. MetaLab is designed as a versatile, efficient, and easy-to-use tool which can greatly simplify the procedure of metaproteomic data analysis for researchers in microbiome studies.
Project description: Metaproteomics is a powerful tool for obtaining data on all proteins recovered directly from environmental samples at a given time. It provides direct evidence of functional diversity and structure among microbiota present in niches and, together with metabolomics (the study of the intermediate and end-products of cellular processes), significant insights into microbial activity. Metaproteomics is a comparatively new approach that faces a number of empirical, technical, computational and experimental design challenges that need to be addressed. At present, only limited efforts have been made to obtain information on microbial proteins in the rhizospheric soil of maize through metagenomic approaches, and there is no direct evidence on the functions of the microbial community in this very important niche. Since the rhizosphere microbiome plays an important role in plant growth and development, the present study was conducted to optimize the metaproteomic extraction protocol for maize rhizosphere and to analyse the functionality of its microbial communities. We present metaproteome data from maize rhizospheric soil. The metaproteome was isolated from maize rhizosphere collected from the ICAR-IISS, Mau experimental farm using the protocol standardized in our laboratory, and metaproteome analysis was done with the standardized pipeline. In total, 696 proteins with different functions, representing 244 genera and 393 species, were identified. The proteome data provide direct evidence on the biological processes in the soil ecosystem and are the first reported reference data from maize rhizosphere. The LC-MS/MS proteomic data are available via ProteomeXchange with identifier PXD014519.
Project description:Ocean metaproteomics is an emerging field enabling discoveries about marine microbial communities and their impact on global biogeochemical processes. Recent ocean metaproteomic studies have provided insight into microbial nutrient transport, colimitation of carbon fixation, the metabolism of microbial biofilms, and dynamics of carbon flux in marine ecosystems. Future methodological developments could provide new capabilities such as characterizing long-term ecosystem changes, biogeochemical reaction rates, and in situ stoichiometries. Yet challenges remain for ocean metaproteomics due to the great biological diversity that produces highly complex mass spectra, as well as the difficulty in obtaining and working with environmental samples. This review summarizes the progress and challenges facing ocean metaproteomic scientists and proposes best practices for data sharing of ocean metaproteomic data sets, including the data types and metadata needed to enable intercomparisons of protein distributions and annotations that could foster global ocean metaproteomic capabilities.
Project description: Sponges harbour complex communities of diverse microorganisms, which have been postulated to form intimate symbiotic relationships with their host. Here we unravel some of these interactions by characterising the functional features of the microbial community of the sponge Cymbastela concentrica through a combined metagenomic and metaproteomic approach. We discover the expression of specific transport functions for typical sponge metabolites (for example, halogenated aromatics, dipeptides), which indicates metabolic interactions between the community and the host. We also uncover the simultaneous performance of aerobic nitrification and anaerobic denitrification, which would help to remove ammonium secreted by the sponge. Our analysis also highlights the requirement for the microbial community to respond to variable environmental conditions and hence express an array of stress protection proteins. Molecular interactions between symbionts and their host might also be mediated by a set of expressed eukaryotic-like proteins and cell-cell mediators. Finally, some sponge-associated bacteria (for example, a Phyllobacteriaceae phylotype) appear to undergo an evolutionary adaptation process to the sponge environment as evidenced by active mobile genetic elements. Our data clearly show that a combined metaproteogenomic approach can provide novel information on the activities, physiology and interactions of sponge-associated microbial communities.
Project description: Matching metagenomic and/or metatranscriptomic data, currently often under-used, can be a useful reference for metaproteomic tandem mass spectra (MS/MS) data analysis. Here we developed a software pipeline for identification of peptides and proteins from metaproteomic MS/MS data using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database, based on two novel approaches, Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports the use of metagenome assemblies from commonly used assemblers including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline significantly improved the identification rate of the metaproteomic MS/MS spectra by about twofold compared to conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that identified variant peptides are important for functional profiling of microbiomes. All results suggested that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.
Project description:The establishment of early life microbiota in the human infant gut is highly variable and plays a crucial role in host nutrient availability/uptake and maturation of immunity. Although high-performance mass spectrometry (MS)-based metaproteomics is a powerful method for the functional characterization of complex microbial communities, the acquisition of comprehensive metaproteomic information in human fecal samples is inhibited by the presence of abundant human proteins. To alleviate this restriction, we have designed a novel metaproteomic strategy based on double filtering (DF) the raw samples, a method that fractionates microbial from human cells to enhance microbial protein identification and characterization in complex fecal samples from healthy premature infants. This method dramatically improved the overall depth of infant gut proteome measurement, with an increase in the number of identified low-abundance proteins and a greater than 2-fold improvement in microbial protein identification and quantification. This enhancement of proteome measurement depth enabled a more extensive microbiome comparison between infants by not only increasing the confidence of identified microbial functional categories but also revealing previously undetected categories.
Project description:Although microbial communities are ubiquitous in nature, relatively little is known about the structural and functional roles of their constituent organisms' underlying interactions. A common approach to study such questions begins with extracting a network of statistically significant pairwise co-occurrences from a matrix of observed operational taxonomic unit (OTU) abundances across sites. The structure of this network is assumed to encode information about ecological interactions and processes, resistance to perturbation, and the identity of keystone species. However, common methods for identifying these pairwise interactions can contaminate the network with spurious patterns that obscure true ecological signals. Here, we describe this problem in detail and develop a solution that incorporates null models to distinguish ecological signals from statistical noise. We apply these methods to the initial OTU abundance matrix and to the extracted network. We demonstrate this approach by applying it to a large soil microbiome data set and show that many previously reported patterns for these data are statistical artifacts. In contrast, we find the frequency of three-way interactions among microbial OTUs to be highly statistically significant. These results demonstrate the importance of using appropriate null models when studying observational microbiome data, and suggest that extracting and characterizing three-way interactions among OTUs is a promising direction for unraveling the structure and function of microbial ecosystems.
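The abstract above does not specify which null models the authors use. The sketch below shows only the general principle with a simple permutation null for a single pair of OTUs, which is an assumption made for illustration. The function names and the presence/absence vectors are hypothetical. One OTU's site labels are shuffled many times, and the observed number of shared sites is compared with the shuffled counts.

```python
import random


def cooccurrence(a, b):
    """Number of sites where both OTUs are present."""
    return sum(1 for x, y in zip(a, b) if x and y)


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """One-sided p-value for co-occurrence under a shuffle null.

    The null keeps each OTU's total number of occupied sites fixed
    and randomises which sites OTU b occupies.
    """
    rng = random.Random(seed)
    observed = cooccurrence(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cooccurrence(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical presence/absence data across 20 sites.
otu_a = [1] * 5 + [0] * 15
otu_b = [1] * 5 + [0] * 15   # rare OTU found at exactly the same sites as otu_a
otu_c = [1] * 20             # ubiquitous OTU, present at every site

p_rare = permutation_p_value(otu_a, otu_b)
p_ubiquitous = permutation_p_value(otu_a, otu_c)
print(p_rare, p_ubiquitous)
```

Both pairs share five sites, yet only the rare pair stands out against the null. The ubiquitous OTU co-occurs with everything, so its overlap with `otu_a` carries no information, which is the kind of spurious pattern a null model is meant to filter out.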
Project description: By their metabolic activities, microorganisms have a crucial role in the biogeochemical cycles of elements. The complete understanding of these processes requires, however, the deciphering of both the structure and the function, including synecologic interactions, of microbial communities. Using a metagenomic approach, we demonstrated here that an acid mine drainage highly contaminated with arsenic is dominated by seven bacterial strains whose genomes were reconstructed. Five of them represent yet uncultivated bacteria and include two strains belonging to a novel bacterial phylum present in some similar ecosystems, and which was named 'Candidatus Fodinabacter communificans.' Metaproteomic data unravelled several microbial capabilities expressed in situ, such as iron, sulfur and arsenic oxidation that are key mechanisms in biomineralization, or organic nutrient, amino acid and vitamin metabolism involved in syntrophic associations. A statistical analysis of genomic and proteomic data and reverse transcriptase-PCR experiments allowed us to build an integrated model of the metabolic interactions that may be of prime importance in the natural attenuation of such anthropized ecosystems.
Project description: BACKGROUND: Natural microbial communities are extremely complex and dynamic systems in terms of their population structure and functions. However, little is known about the in situ functions of the microbial communities. RESULTS: This study describes the application of proteomic approaches (metaproteomics) to observe expressed protein profiles of natural microbial communities (metaproteomes). The technique was validated using a constructed community and subsequently used to analyze Chesapeake Bay microbial community (0.2 to 3.0 µm) metaproteomes. Chesapeake Bay metaproteomes contained proteins from pI 4-8 with apparent molecular masses between 10-80 kDa. Replicated middle Bay metaproteomes shared approximately 92% of all detected spots, but only shared 30% and 70% of common protein spots with upper and lower Bay metaproteomes. MALDI-TOF analysis of highly expressed proteins produced no significant matches to known proteins. Three Chesapeake Bay proteins were tentatively identified by LC-MS/MS sequencing coupled with MS-BLAST searching. The proteins identified were of marine microbial origin and correlated with abundant Chesapeake Bay microbial lineages, Bacteroides and alpha-proteobacteria. CONCLUSION: Our results represent the first metaproteomic study of aquatic microbial assemblages and demonstrate the potential of metaproteomic approaches to link metagenomic data, taxonomic diversity, functional diversity and biological processes in natural environments.
Project description: The human intestinal tract is colonized by microbial communities that show a subject-specific composition and a high-level temporal stability in healthy adults. To determine whether this is reflected at the functional level, we compared the faecal metaproteomes of healthy subjects over time using a novel high-throughput approach based on denaturing polyacrylamide gel electrophoresis and liquid chromatography-tandem mass spectrometry. The robust metaproteomics workflow and identification pipeline developed here were used to study the composition and temporal stability of the intestinal metaproteome using faecal samples collected from 3 healthy subjects over a period of six to twelve months. The same samples were also subjected to DNA extraction and analysed for their microbial composition and diversity using the Human Intestinal Tract Chip, a validated phylogenetic microarray. Using metagenome and single-genome sequence data, approximately 1,000 peptides per sample were identified out of the thousands of mass spectra generated per sample. Our results indicate that the faecal metaproteome is subject-specific and stable during a one-year period. A stable common core of approximately 1,000 proteins could be recognized in each of the subjects, indicating a common functional core that is mainly involved in carbohydrate transport and degradation. Additionally, a variety of surface proteins could be identified, including potential microbe-host interacting components such as flagellins and pili. Altogether, we observed a highly comparable subject-specific clustering of the metaproteomic and phylogenetic profiles, indicating that the distinct microbial activity is reflected by the individual composition.