Project description:<h4>Background</h4>Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study, we evaluated the performance of three commonly used metagenome prediction tools (PICRUSt, PICRUSt2, and Tax4Fun) by comparing the significance of the differential abundance of predicted functional gene profiles to those from shotgun metagenome sequencing across different environments.<h4>Results</h4>We selected 7 datasets of human, non-human animal, and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As we would expect based on previous literature, strong Spearman correlations were observed between predicted gene compositions and gene relative abundance measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the abundance of genes were permuted across samples. This suggests that simple correlation coefficient is a highly unreliable measure for the performance of metagenome prediction tools. As an alternative, we compared the performance of genes predicted with PICRUSt, PICRUSt2, and Tax4Fun to sequenced metagenome genes in inference models associated with metadata within each dataset. With this approach, we found reasonable performance for human datasets, with the metagenome prediction tools performing better for inference on genes related to "housekeeping" functions. However, their performance degraded sharply outside of human datasets when used for inference.<h4>Conclusion</h4>We conclude that the utility of PICRUSt, PICRUSt2, and Tax4Fun for inference with the default database is likely limited outside of human samples and that development of tools for gene prediction specific to different non-human and environmental samples is warranted. Video abstract.
Project description:BACKGROUND: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. METHODOLOGY/PRINCIPAL FINDINGS: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree. CONCLUSIONS/SIGNIFICANCE: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
Project description:The complexity of microbial communities, combined with technical biases in next-generation sequencing, pose a challenge to metagenomic analysis. Here, we develop a set of internal DNA standards, termed "sequins" (sequencing spike-ins), that together constitute a synthetic community of artificial microbial genomes. Sequins are added to environmental DNA samples prior to library preparation, and undergo concurrent sequencing with the accompanying sample. We validate the performance of sequins by comparison to mock microbial communities, and demonstrate their use in the analysis of real metagenome samples. We show how sequins can be used to measure fold change differences in the size and structure of accompanying microbial communities, and perform quantitative normalization between samples. We further illustrate how sequins can be used to benchmark and optimize new methods, including nanopore long-read sequencing technology. We provide metagenome sequins, along with associated data sets, protocols, and an accompanying software toolkit, as reference standards to aid in metagenomic studies.
Project description:Microbial communities in incipient soil systems serve as the only biotic force shaping landscape evolution. However, the underlying ecological forces shaping microbial community structure and function are inadequately understood. We used amplicon sequencing to determine microbial taxonomic assembly and metagenome sequencing to evaluate microbial functional assembly in incipient basaltic soil subjected to precipitation. Community composition was stratified with soil depth in the pre-precipitation samples, with surficial communities maintaining their distinct structure and diversity after precipitation, while the deeper soil samples appeared to become more uniform. The structural community assembly remained deterministic in pre- and post-precipitation periods, with homogenous selection being dominant. Metagenome analysis revealed that carbon and nitrogen functional potential was assembled stochastically. Sub-populations putatively involved in the nitrogen cycle and carbon fixation experienced counteracting assembly pressures at the deepest depths, suggesting the communities may functionally assemble to respond to short-term environmental fluctuations and impact the landscape-scale response to perturbations. We propose that contrasting assembly forces impact microbial structure and potential function in an incipient landscape; <i>in situ</i> landscape characteristics (here homogenous parent material) drive community structure assembly, while short-term environmental fluctuations (here precipitation) shape environmental variations that are random in the soil depth profile and drive stochastic sub-population functional dynamics.
Project description:The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years.
Project description:Microbiome samples with low microbial biomass or severe DNA degradation remain challenging for amplicon-based or whole-metagenome sequencing approaches. Here, we introduce 2bRAD-M, a highly reduced and cost-effective strategy which only sequences ~ 1% of metagenome and can simultaneously produce species-level bacterial, archaeal, and fungal profiles. 2bRAD-M can accurately generate species-level taxonomic profiles for otherwise hard-to-sequence samples with merely 1 pg of total DNA, high host DNA contamination, or severely fragmented DNA from degraded samples. Tests of 2bRAD-M on various stool, skin, environmental, and clinical FFPE samples suggest a successful reconstruction of comprehensive, high-resolution microbial profiles.
Project description:Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (?40-45?Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G?+?C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP.
Project description:An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to adapt numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples.
Project description:<h4>Background</h4>Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools.<h4>Results</h4>We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and pair-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing units) version of NeSSM is also developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one-order of magnitude faster than MetaSim.<h4>Conclusions</h4>NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis and it's freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
Project description:Bacterioplankton are main drivers of biogeochemical cycles and important components of aquatic food webs. While sequencing-based studies have revealed how bacterioplankton communities are structured in time and space, relatively little is known about intraspecies diversity patterns and their ecological relevance. Here, we use the newly developed software POGENOM (POpulation GENomics from Metagenomes) to investigate genomic diversity and differentiation in metagenome-assembled genomes from the Baltic Sea, and investigate their genomic variation using metagenome data spanning a 1700 km transect and covering seasonal variation at one station. The majority of the investigated species, representing several major bacterioplankton clades, displayed population structures correlating significantly with environmental factors such as salinity and temperature. Population differentiation was more pronounced over spatial than temporal scales. We discovered genes that have undergone adaptation to different salinity regimes, potentially responsible for the populations' existence along with the salinity range. This in turn implies the broad existence of ecotypes that may remain undetected by rRNA gene sequencing. Our findings emphasize the importance of physiological barriers, and highlight the role of adaptive divergence as a structuring mechanism of bacterioplankton species.