Project description: The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms, including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed in the healthy microbiome, and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets with several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, that few novel taxa are represented among these commonly occurring operational taxonomic units (OTUs), and that many taxa common among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted' taxa, and that these sequences should support microbiome studies across multiple cohorts. In stark contrast to other taxa, however, the 'most wanted' organisms are poorly represented in culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.
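The 'most wanted' prioritization can be pictured as a filter over OTUs: keep those that are common across samples but lack a close match in reference genome collections. The Python sketch below assumes each OTU already carries a prevalence (fraction of samples in which it occurs) and the percent identity of its best 16S hit to a reference genome. The `Otu` class, the `most_wanted` function, both thresholds and the toy values are hypothetical and are not the HMP's published criteria.

```python
from dataclasses import dataclass


@dataclass
class Otu:
    """Hypothetical per-OTU summary computed upstream from a 16S count table."""
    otu_id: str
    prevalence: float         # fraction of samples in which the OTU was observed
    best_ref_identity: float  # % identity of best 16S hit among reference genomes


def most_wanted(otus, min_prevalence=0.1, max_identity=98.0):
    """Return common OTUs lacking a close reference genome, most prevalent first.

    Both thresholds are illustrative placeholders, not the HMP's cut-offs.
    """
    wanted = [
        o for o in otus
        if o.prevalence >= min_prevalence and o.best_ref_identity < max_identity
    ]
    # Rank by prevalence (descending), then by distance to references (ascending identity).
    return sorted(wanted, key=lambda o: (-o.prevalence, o.best_ref_identity))


# Toy OTUs for illustration only; not HMP data.
otus = [
    Otu("OTU_1", 0.85, 99.6),  # common, already covered by a reference genome
    Otu("OTU_2", 0.40, 91.2),  # common, poorly covered -> wanted
    Otu("OTU_3", 0.02, 85.0),  # rare, excluded by the prevalence filter
    Otu("OTU_4", 0.60, 96.5),  # common, poorly covered -> wanted
]
print([o.otu_id for o in most_wanted(otus)])  # ['OTU_4', 'OTU_2']
```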
Project description: The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: shotgun sequences of reference strains isolated from different anatomical sites on the human body, and shotgun metagenomic sequences from the microbial communities of each site. The strategy used to align these metagenomic reads to the available reference sequences is central to the success of HMP data analysis. Six next-generation aligners were used to align reads from a community of known composition against a database of the reference organisms known to be present in that community. All aligners reported nearly complete genome coverage (>97%) for strains sequenced to more than 6X depth of coverage; however, they differed in speed, memory requirements, and ease of use, including database size limitations and supported mapping strategies. The selected aligner was then tested across a range of parameters to maximize sensitivity while maintaining a low false-positive rate. In all cases tested, constraining alignment length had more impact on sensitivity than constraining similarity. However, when reference species were replaced with phylogenetic neighbors, similarity began to play a larger role in detection. We also show that choosing the top hit at random when multiple, equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. This study identified a strategy that was used to map over 3 terabases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.
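The two mapping constraints discussed above, minimum alignment length and minimum similarity, together with random tie-breaking among equally strong hits, can be sketched as a post-alignment filter. This Python sketch assumes hits have already been parsed from an aligner's output into records with an aligned length, percent identity and score. The `Hit` record, the `assign_reads` function, the default thresholds and the example reads are hypothetical and are not the parameters chosen in the study.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One candidate alignment of a read to a reference genome (hypothetical record)."""
    read_id: str
    reference: str
    aligned_length: int  # number of aligned bases
    identity: float      # percent identity over the aligned region
    score: int           # aligner score used to rank a read's hits


def assign_reads(hits, min_length=50, min_identity=80.0, rng=None):
    """Assign each read to one reference after length/identity filtering.

    Ties for the best score are broken uniformly at random: multi-mapping
    reads are kept (more sensitivity), but which tied reference they are
    credited to is arbitrary (less taxonomic resolution).
    Default thresholds are placeholders, not the values chosen in the study.
    """
    rng = rng or random.Random()
    by_read = defaultdict(list)
    for h in hits:
        if h.aligned_length >= min_length and h.identity >= min_identity:
            by_read[h.read_id].append(h)

    assignments = {}
    for read_id, read_hits in by_read.items():
        best = max(h.score for h in read_hits)
        tied = [h for h in read_hits if h.score == best]
        assignments[read_id] = rng.choice(tied).reference
    return assignments


hits = [
    Hit("r1", "E_coli", 100, 99.0, 190),
    Hit("r1", "S_enterica", 100, 99.0, 190),  # equally strong hit -> random pick
    Hit("r2", "B_fragilis", 40, 100.0, 80),   # shorter than min_length -> dropped
    Hit("r3", "B_fragilis", 120, 85.0, 150),
    Hit("r3", "B_vulgatus", 120, 82.0, 140),  # lower score -> not chosen
]
result = assign_reads(hits, rng=random.Random(0))
print(result["r3"])  # B_fragilis
```

Sweeping `min_length` and `min_identity` over a grid against a community of known composition is one way to reproduce the kind of sensitivity comparison the description reports.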
Project description: Shotgun metagenomic sequencing (SMS) offers a powerful way to identify causative disease agents in human patients when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of a research tool. Accurately separating known from unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering out nonbiological components left over from library preparation has previously been underreported. In this study, we extracted sequences from GenBank and the 1000 Genomes Project to create the most comprehensive collection of microbial and reference-free human genetic variation available, stored in a database optimized for efficient metagenomic search. The results reveal new human sequences in individual Human Microbiome Project (HMP) samples: individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data are collected.
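The per-sample bookkeeping behind figures such as "up to 95% human sequence" and "10% or more human reads" reduces to simple ratios. This Python sketch assumes each sample's human and total read counts have already been produced by screening against a human sequence database. The function names, the 10% default threshold parameter and the toy counts are illustrative only, not the authors' pipeline or data.

```python
def human_fraction(human_reads, total_reads):
    """Fraction of a sample's reads classified as human (0.0 for an empty sample)."""
    return human_reads / total_reads if total_reads else 0.0


def share_above(samples, threshold=0.10):
    """Fraction of samples whose human-read fraction is at or above `threshold`.

    `samples` maps a sample ID to a (human_reads, total_reads) pair.
    """
    if not samples:
        return 0.0
    flagged = sum(
        1 for human, total in samples.values()
        if human_fraction(human, total) >= threshold
    )
    return flagged / len(samples)


# Toy per-sample counts (human_reads, total_reads); not HMP data.
samples = {
    "S1": (950, 1000),  # 95% human
    "S2": (50, 1000),   # 5% human
    "S3": (120, 1000),  # 12% human
    "S4": (0, 1000),    # no human reads left after screening
}
print(share_above(samples))  # 0.5
```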
Project description: Understanding gut microbiome functions requires cultivated bacteria for experimental validation and reference bacterial genome sequences to interpret metagenome datasets and guide functional analyses. We present the Human Gastrointestinal Bacteria Culture Collection (HBC), a comprehensive set of 737 whole-genome-sequenced bacterial isolates, representing 273 species (105 novel species) from 31 families found in the human gastrointestinal microbiota. The HBC increases the number of bacterial genomes derived from human gastrointestinal microbiota by 37%. The resulting global Human Gastrointestinal Bacteria Genome Collection (HGG) classifies 83% of genera by abundance across 13,490 shotgun-sequenced metagenomic samples, improves taxonomic classification by 61% compared to the Human Microbiome Project (HMP) genome collection and achieves subspecies-level classification for almost 50% of sequences. The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.
Project description: Phase 1 of the Human Microbiome Project (HMP) investigated 18 body subsites of 242 healthy American adults to produce the first comprehensive reference for the composition and variation of the "healthy" human microbiome. Publicly available data sets from amplicon sequencing of two 16S ribosomal RNA variable regions, with extensive controlled-access participant data, provide a reference for ongoing microbiome studies. However, utilization of these data sets can be hindered by the complex bioinformatic steps required to access, import, decrypt, and merge the various components in formats suitable for ecological and statistical analysis. The HMP16SData package provides count data for both 16S ribosomal RNA variable regions, integrated with phylogeny, taxonomy, public participant data, and controlled participant data for authorized researchers, using standard integrative Bioconductor data objects. By removing bioinformatic hurdles of data access and management, HMP16SData enables epidemiologists with only basic R skills to quickly analyze HMP data.
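HMP16SData itself is an R/Bioconductor package, and its accessors are not reproduced here. To illustrate in Python why keeping counts and participant data in one integrated object helps, the sketch below keeps a taxa-by-sample count matrix aligned with per-sample metadata, so that subsetting samples by a metadata field subsets the counts consistently. The `CountExperiment` class, its methods, the `body_site` field and the toy values are hypothetical; the design only loosely mirrors a Bioconductor `SummarizedExperiment`.

```python
from dataclasses import dataclass, field


@dataclass
class CountExperiment:
    """Taxa-by-sample counts kept aligned with per-sample metadata (hypothetical)."""
    taxa: list[str]
    samples: list[str]
    counts: list[list[int]]  # counts[i][j]: reads of taxon i in sample j
    sample_data: dict[str, dict[str, str]] = field(default_factory=dict)

    def subset_samples(self, key, value):
        """Keep only samples whose metadata field `key` equals `value`."""
        keep = [j for j, s in enumerate(self.samples)
                if self.sample_data.get(s, {}).get(key) == value]
        kept = [self.samples[j] for j in keep]
        return CountExperiment(
            taxa=list(self.taxa),
            samples=kept,
            counts=[[row[j] for j in keep] for row in self.counts],
            sample_data={s: self.sample_data.get(s, {}) for s in kept},
        )

    def totals_per_sample(self):
        """Total reads per sample, summed over all taxa."""
        return {s: sum(row[j] for row in self.counts)
                for j, s in enumerate(self.samples)}


# Toy data for illustration only; not HMP counts or participant data.
exp = CountExperiment(
    taxa=["Streptococcus", "Bacteroides"],
    samples=["s1", "s2", "s3"],
    counts=[[10, 0, 5], [2, 30, 1]],
    sample_data={
        "s1": {"body_site": "Saliva"},
        "s2": {"body_site": "Stool"},
        "s3": {"body_site": "Saliva"},
    },
)
oral = exp.subset_samples("body_site", "Saliva")
print(oral.samples)              # ['s1', 's3']
print(oral.totals_per_sample())  # {'s1': 12, 's3': 6}
```

Because counts and metadata travel together, a subset cannot drift out of sync with its sample annotations, which is the kind of merging step the package aims to spare its users.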