Assessing and Interpreting the Metagenome Heterogeneity With Power Law.
ABSTRACT: There are two major sequencing technologies for investigating the microbiome: amplicon sequencing, which generates OTU (operational taxonomic unit) tables of marker genes (e.g., bacterial 16S rRNA), and metagenomic shotgun sequencing, which generates metagenomic gene abundance (MGA) tables. The OTU table is the counterpart of the species abundance tables used in the macrobial ecology of plants and animals, and it has been the target of numerous ecological and network analyses during the recent gold rush of microbiome research and in the substantial efforts to establish an inclusive theoretical ecology. MGA analyses, by contrast, have been largely limited to bioinformatics pipelines and ad hoc statistical methods, and systematic approaches to MGAs guided by classic ecological theories are still few. Here, we argue that the difference between "gene kinds" and "gene species" is nominal, and that the metagenome a microbiota carries is essentially a 'community' of metagenomic genes (MGs). Each row of an MGA table represents the metagenome of one microbiota, and the whole MGA table represents a 'meta-metagenome' (an assemblage of metagenomes) of N microbiotas (microbiome samples). Consequently, the ecological and network analyses used for OTU tables should be equally applicable to MGA tables. We choose to analyze the heterogeneity of the metagenome by introducing the classic Taylor's power law (TPL) and its recent extensions in community ecology. Heterogeneity is a fundamental property of metagenomes, particularly in the context of human microbiomes. Recent studies have shown that human metagenomes are far more heterogeneous than human genomes. Therefore, without a deep understanding of human metagenome heterogeneity, personalized medicine for human microbiome-associated diseases is hardly feasible. 
The TPL extensions have been successfully applied to measure the heterogeneity of the human microbiome based on amplicon-sequencing reads of marker genes (e.g., 16S rRNA). In this article, we analyze the metagenomic heterogeneity of the human gut microbiome at the whole-metagenome scale (with the Type-I power law extension), at the metagenomic-gene scale (Type-III), and at the scale of gene clusters. We further examine the influences of obesity, IBD and diabetes on this heterogeneity, which has important ramifications for the diagnosis and treatment of human microbiome-associated diseases.
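Taylor's power law relates the variance V of abundances to their mean M as V = aM^b, and the scaling parameter b is usually estimated by least squares on log V = log a + b log M. In the classic interpretation, b > 1 indicates aggregation (heterogeneity), b ≈ 1 a random (Poisson-like) pattern, and b < 1 a regular one. The sketch below assumes an MGA table whose rows are samples (metagenomes) and whose columns are metagenomic genes. Taking the mean and variance within each row is our reading of the Type-I (whole-metagenome) extension, and taking them across samples for each column is our reading of Type-III (gene level); both mappings, the function names, and the toy table are illustrative assumptions rather than the authors' implementation.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_taylor_power_law(means: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Estimate (a, b) in V = a * M**b by least squares on log V = log a + b log M."""
    # Points with zero mean or zero variance have no logarithm and are skipped.
    pts = [(math.log(m), math.log(v)) for m, v in zip(means, variances) if m > 0 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with positive mean and variance")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("means must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope on the log-log scale
    a = math.exp(y_bar - b * x_bar)      # back-transformed intercept
    return a, b


def row_stats(table: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Mean and sample variance of each row (one metagenome, across its genes)."""
    return ([statistics.fmean(r) for r in table], [statistics.variance(r) for r in table])


def column_stats(table: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Mean and sample variance of each column (one gene, across samples)."""
    cols = list(zip(*table))
    return ([statistics.fmean(c) for c in cols], [statistics.variance(c) for c in cols])


# Toy MGA table with made-up counts: 3 samples (rows) x 4 genes (columns).
mga = [[0, 3, 10, 1], [2, 8, 25, 0], [1, 5, 40, 4]]
a1, b1 = fit_taylor_power_law(*row_stats(mga))     # Type-I style, one point per metagenome
a3, b3 = fit_taylor_power_law(*column_stats(mga))  # Type-III style, one point per gene
```

With real data, each fit would use many more rows or columns than this toy table, and zero-variance entries are dropped because their logarithm is undefined.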
Project description: When a bacterial genome is compared to the metagenome of the environment it inhabits, most genes recruit metagenomic reads at high sequence identity. In free-living bacteria (for instance, marine bacteria compared against the ocean metagenome), certain genomic regions are entirely absent from recruitment plots; these regions therefore represent genes unique to individual bacterial isolates. We show that these Metagenomic Islands (MIs) are also visible in bacteria living in human hosts when their genomes are compared to sequences from the human microbiome, despite the compartmentalized structure of human-associated environments such as the gut. From an applied point of view, MIs of human pathogens (e.g., those identified in enterohaemorrhagic Escherichia coli against the gut metagenome or in pathogenic Neisseria meningitidis against the oral metagenome) include virulence genes that appear to be absent in related strains or species present in the microbiome of healthy individuals. We propose that this strategy (i.e., recruitment analysis of pathogenic bacteria against the metagenome of healthy subjects) can be used to detect pathogenicity regions in species where the genes involved in virulence are poorly characterized. Using this approach, we detect well-known pathogenicity islands and identify new potential virulence genes in several human pathogens.
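A metagenomic island shows up in a recruitment plot as a stretch of the reference genome that no metagenomic read covers at high identity. The minimal sketch below locates such stretches, assuming the recruitment has already been reduced to (alignment start position, percent identity) pairs for reads mapped to the genome. The window size (1 kbp), identity cutoff (95%), minimum island length (3 windows) and the function name are hypothetical choices, not values from the study.

```python
from typing import Iterable, List, Optional, Tuple


def find_metagenomic_islands(
    hits: Iterable[Tuple[int, float]],
    genome_length: int,
    window: int = 1000,          # hypothetical window size (bp)
    min_identity: float = 95.0,  # hypothetical "high identity" cutoff (%)
    min_windows: int = 3,        # hypothetical minimum island length, in windows
) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of runs of windows with no high-identity recruitment.

    `hits` holds (alignment start position, percent identity) for each read
    recruited to the reference genome.
    """
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos, identity in hits:
        if identity >= min_identity and 0 <= pos < genome_length:
            counts[pos // window] += 1  # a read is counted only in its start window

    islands: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # A sentinel non-zero count closes any run that reaches the end of the genome.
    for i, c in enumerate(counts + [1]):
        if c == 0 and run_start is None:
            run_start = i
        elif c != 0 and run_start is not None:
            if i - run_start >= min_windows:
                islands.append((run_start * window, min(i * window, genome_length)))
            run_start = None
    return islands
```

Counting each read only in the window where its alignment starts is a simplification; a fuller version would spread coverage over the whole alignment length before looking for gaps.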
Project description: BACKGROUND: Advances in bioinformatics have recently allowed the recovery of metagenome-assembled genomes from human microbiome studies carried out with shotgun sequencing. This approach is used as a means to discover new, unclassified metagenomic species, putative biological entities with distinct metabolic traits. RESULTS: In the present analysis, we compare 400 genomes from isolates available in the NCBI database with 10,000 human gut metagenomic species, screening all of them for the presence of a minimal set of core functionalities that are necessary, but not sufficient, for life. The metagenome-assembled genomes were systematically depleted in genes encoding essential functions apparently needed to support autonomous bacterial life. CONCLUSIONS: The considerable lack of core functionalities that we observed in metagenome-assembled genomes raises concerns about their actual completeness and suggests caution in extrapolating biological information about their metabolic propensity and ecology in a complex environment such as the human gastrointestinal tract.
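The screening step described above amounts to checking each genome's annotated functions against a fixed core set and comparing groups (isolates versus metagenome-assembled genomes). The sketch below shows that set logic only. The core set, the function identifiers and the helper names are hypothetical; the study's actual core list and annotation pipeline are not given here.

```python
from statistics import fmean
from typing import Dict, List, Set


def core_completeness(annotations: Set[str], core: Set[str]) -> float:
    """Fraction of the core function set found among a genome's annotations."""
    if not core:
        raise ValueError("core set must not be empty")
    return len(annotations & core) / len(core)


def missing_core(annotations: Set[str], core: Set[str]) -> List[str]:
    """Core functions absent from a genome, in sorted order."""
    return sorted(core - annotations)


def group_mean_completeness(genomes: Dict[str, Set[str]], core: Set[str]) -> float:
    """Mean core completeness over a group of genomes (e.g. isolates or MAGs)."""
    return fmean([core_completeness(ann, core) for ann in genomes.values()])


# Hypothetical core set and annotations, for illustration only.
CORE = {"core_1", "core_2", "core_3", "core_4"}
isolates = {"iso_A": {"core_1", "core_2", "core_3", "core_4"}}
mags = {"mag_A": {"core_1", "core_2"}, "mag_B": {"core_1", "core_3", "core_4", "other"}}
isolate_score = group_mean_completeness(isolates, CORE)
mag_score = group_mean_completeness(mags, CORE)
```

Comparing the two group means (and listing which core functions are missing per genome) is one simple way to express the depletion the authors report; a real analysis would also test whether the difference is statistically meaningful.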
Project description: MOTIVATION: In recent years, the field of whole-metagenome shotgun sequencing has grown significantly owing to high-throughput sequencing technologies that sequence genomic samples more cheaply, faster and with better coverage than before. This advance has started a trend of sequencing multiple samples under different conditions or in different environments to explore the similarities and differences among microbial communities. Examples include the Human Microbiome Project and various studies of the human intestinal tract. With ever larger databases of such measurements available, finding samples similar to a given query sample is becoming a central operation. RESULTS: In this article, we develop a content-based exploration and retrieval method for whole-metagenome sequencing samples. We apply a distributed string mining (DSM) framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as on Human Microbiome Project metagenomic samples. We observe significant enrichment of diseased gut samples among the results of queries made with a diseased sample, and high accuracy in discriminating between different body sites even though the method is unsupervised. AVAILABILITY AND IMPLEMENTATION: A software implementation of the DSM framework is available at https://github.com/HIITMetagenomics/dsm-framework. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
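The retrieval idea (represent each sample by its k-mers, score every database sample against a query, return the closest ones) can be illustrated on a small scale. The sketch below is not the DSM framework or its API: it swaps in plain exhaustive k-mer counting and Bray-Curtis dissimilarity, which may differ from the informative-k-mer selection and dissimilarity measure used in the paper. It also ignores reverse complements, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def kmer_profile(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring of the reads, skipping k-mers that contain N."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def bray_curtis(p: Counter, q: Counter) -> float:
    """Bray-Curtis dissimilarity of two k-mer count profiles (0 = identical, 1 = disjoint)."""
    total = sum(p.values()) + sum(q.values())
    if total == 0:
        return 0.0
    shared = sum(min(p[km], q[km]) for km in p.keys() & q.keys())
    return 1.0 - 2.0 * shared / total


def rank_by_dissimilarity(
    query_reads: Iterable[str], database: Dict[str, List[str]], k: int
) -> List[Tuple[str, float]]:
    """Score every database sample against the query; most similar samples come first."""
    qp = kmer_profile(query_reads, k)
    scores = [(name, bray_curtis(qp, kmer_profile(reads, k))) for name, reads in database.items()]
    return sorted(scores, key=lambda s: (s[1], s[0]))
```

At metagenome scale, profiles would be precomputed once per sample rather than rebuilt for every query, which is the kind of cost the distributed framework is designed to handle.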
Project description: The animal gastrointestinal tract contains a complex community of microbes whose composition ultimately reflects the co-evolution of microorganisms with their animal host. An analysis of 78,619 pyrosequencing reads generated from pygmy loris fecal DNA extracts was performed to better understand the microbial diversity and functional capacity of the pygmy loris gut microbiome. Taxonomic analysis of the metagenomic reads indicated that pygmy loris fecal microbiomes were dominated by the Bacteroidetes and Proteobacteria phyla. Hierarchical clustering of several gastrointestinal metagenomes showed that the microbial community structures of the pygmy loris and mouse gut are similar despite their differences in functional capacity. Comparative analysis of functional classifications revealed that the pygmy loris metagenome was characterized by an overrepresentation of sequences involved in aromatic compound metabolism compared with those of humans and other animals. Key enzymes of the benzoate degradation pathway were identified based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway assignment. These results contribute to the limited body of primate metagenome studies and provide a framework for comparative metagenomic analysis between human and non-human primates, as well as for a comparative understanding of the evolution of humans and their microbiome. However, further metagenome sequencing studies of the pygmy loris and other prosimians are required to assess the effects of age, genetics and environment on the composition and activity of their metagenomes.
Project description: The human gut microbiome is a complex ecosystem composed mainly of uncultured bacteria. It plays an essential role in the catabolism of dietary fibers (the part of dietary plant material that is not metabolized in the upper digestive tract), because the human genome does not encode adequate carbohydrate-active enzymes (CAZymes). We describe a multi-step, functionally based approach to guide the in-depth pyrosequencing of specific regions of the human gut metagenome encoding the CAZymes involved in dietary fiber breakdown. High-throughput functional screens were first applied to a library covering 5.4 × 10^9 bp of metagenomic DNA, allowing the isolation of 310 clones showing beta-glucanase, hemicellulase, galactanase, amylase or pectinase activities. Based on the results of refined secondary screens, sequencing efforts were reduced to 0.84 Mb of nonredundant metagenomic DNA, corresponding to 26 clones that were particularly efficient at degrading raw plant polysaccharides. Seventy-three CAZymes from 35 different families were discovered, which corresponds to a fivefold target-gene enrichment compared with random sequencing of the human gut metagenome. Thirty-three of these CAZyme-encoding genes are highly homologous to prevalent genes found in the gut microbiome of at least 20 individuals for whom metagenomic data are available. Moreover, 18 multigenic clusters encoding complementary enzyme activities for plant cell wall degradation were also identified. Gene taxonomic assignment is consistent with horizontal gene transfer events in dominant gut species and provides new insights into the human gut functional trophic chain.
Project description:Understanding gut microbiome functions requires cultivated bacteria for experimental validation and reference bacterial genome sequences to interpret metagenome datasets and guide functional analyses. We present the Human Gastrointestinal Bacteria Culture Collection (HBC), a comprehensive set of 737 whole-genome-sequenced bacterial isolates, representing 273 species (105 novel species) from 31 families found in the human gastrointestinal microbiota. The HBC increases the number of bacterial genomes derived from human gastrointestinal microbiota by 37%. The resulting global Human Gastrointestinal Bacteria Genome Collection (HGG) classifies 83% of genera by abundance across 13,490 shotgun-sequenced metagenomic samples, improves taxonomic classification by 61% compared to the Human Microbiome Project (HMP) genome collection and achieves subspecies-level classification for almost 50% of sequences. The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.
Project description: Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of kbp. Incorporating synthetic long-read sequencing technology into standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.
Project description: BACKGROUND: Uncovering the taxonomic composition and functional capacity of the swine gut microbial consortia is of great importance to animal physiology and health, as well as to food and water safety, owing to the presence of human pathogens in pig feces. Nonetheless, limited information on the functional diversity of the swine gut microbiome is available. RESULTS: Analysis of 637,722 pyrosequencing reads (130 megabases) generated from Yorkshire pig fecal DNA extracts was performed to better understand the microbial diversity and largely unknown functional capacity of the swine gut microbiome. Swine fecal metagenomic sequences were annotated using both the MG-RAST and JGI IMG/M-ER pipelines. Taxonomic analysis of metagenomic reads indicated that swine fecal microbiomes were dominated by the Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, Prevotella spp. dominated the swine fecal metagenome, while some genes associated with Treponema and Anaerovibrio species were found exclusively within the pig fecal metagenomic sequences analyzed. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of the swine metagenome. Genes associated with stress, virulence, cell wall and cell capsule were also abundant. Virulence factors associated with antibiotic resistance genes, with the highest sequence homology to genes in Bacteroidetes, Clostridia and Methanosarcina, were numerous among the gene families unique to the swine fecal metagenomes. Other abundant proteins unique to the distal swine gut shared high sequence homology with putative carbohydrate membrane transporters. CONCLUSIONS: The results of this metagenomic survey demonstrated the presence of genes associated with antibiotic resistance and carbohydrate metabolism, suggesting that the swine gut microbiome may be shaped by husbandry practices.
Project description: Vanillin is a phenolic food additive commonly used for its flavor, antimicrobial and antioxidant properties. Although it is one of the most widely used food additives, the strategies human gut microbes use to evade its antimicrobial activity remain largely unexplored. The current study explores the human gut microbiome with a multi-omics approach to elucidate its composition and the metabolic machinery it uses to counter vanillin bioactivity. SSU rRNA gene diversity, metagenomic RNA feature diversity and the phylogenetic affiliation of metagenome-encoded proteins consistently (R = 0.99) indicate an abundance of Bacteroidetes, followed by Firmicutes and Proteobacteria. Manual curation of the metagenomic dataset identified gene clusters specific to vanillin metabolism (ligV, ligK and vanK) and to intermediary metabolic pathways (pca and cat operons). Comparison of metagenomic datasets revealed the omnipresence of vanillin catabolic features across diverse populations. The metabolomic analysis demonstrates the functionality of the vanillin catabolic pathway through the protocatechuate branch of the beta-ketoadipate pathway. These results highlight the human gut microbial features and metabolic processes involved in vanillin catabolism that overcome its antimicrobial activity. The current study advances our understanding of how the human gut microbiome adapts to changing dietary habits.
Project description: Ongoing efforts are directed at understanding the mutualism between the gut microbiota and the host in breast-fed versus formula-fed infants. Because tissue biopsies are lacking, no investigators have performed a global transcriptional (gene expression) analysis of the developing human intestine in healthy infants. As a result, the crosstalk between the microbiome and the host transcriptome in the developing mucosal-commensal environment has not been determined. In this study, we examined host intestinal mRNA gene expression and microbial DNA profiles in full-term 3-month-old infants who were exclusively formula-fed (FF; n = 6) or breast-fed (BF; n = 6) from birth to 3 months. Host mRNA microarray measurements were performed using intact sloughed epithelial cells isolated from stool samples collected at 3 months. Microbial composition in the same stool samples was assessed by metagenomic pyrosequencing. Both the host mRNA expression profiles and the bacterial phylogenetic profiles provided strong feature sets that clearly classified the two groups of babies (FF and BF). To determine the relationship between host epithelial cell gene expression and the bacterial colony profiles, the host transcriptome and the functionally profiled microbiome data were analyzed in a multivariate manner. From a functional perspective, analysis of the gut microbiota's metagenome revealed that characteristics associated with virulence differed between FF and BF babies. Using canonical correlation analysis, evidence of multivariate structure relating eleven host immunity/mucosal defense-related genes to microbiome virulence characteristics was observed. These results, for the first time, provide insight into the integrated responses of the host and microbiome to dietary substrates in the early neonatal period. 
Our data suggest that systems biology and computational modeling approaches that integrate “-omic” information from the host and the microbiome can identify important mechanistic pathways of intestinal development affecting the gut microbiome in the first few months of life. KEYWORDS: infant, breast-feeding, infant formula, exfoliated cells, transcriptome, metagenome, multivariate analysis, canonical correlation analysis. 12 samples, 2 groups.