Comparative analysis of pyrosequencing and a phylogenetic microarray for exploring microbial community structures in the human distal intestine.
ABSTRACT: BACKGROUND: Variations in the composition of the human intestinal microbiota are linked to diverse health conditions. High-throughput molecular technologies have recently elucidated microbial community structure at much higher resolution than was previously possible. Here we compare two such methods, pyrosequencing and a phylogenetic array, and evaluate classifications based on two variable 16S rRNA gene regions. METHODS AND FINDINGS: Over 1.75 million amplicon sequences were generated from the V4 and V6 regions of 16S rRNA genes in bacterial DNA extracted from four fecal samples of elderly individuals. The phylotype richness, for individual samples, was 1,400-1,800 for V4 reads and 12,500 for V6 reads, and 5,200 unique phylotypes when combining V4 reads from all samples. The RDP-classifier was more efficient for the V4 than for the far less conserved and shorter V6 region, but differences in community structure also affected efficiency. Even when analyzing only 20% of the reads, the majority of the microbial diversity was captured in two samples tested. DNA from the four samples was hybridized against the Human Intestinal Tract (HIT) Chip, a phylogenetic microarray for community profiling. Comparison of clustering of genus counts from pyrosequencing and HITChip data revealed highly similar profiles. Furthermore, correlations of sequence abundance and hybridization signal intensities were very high for lower-order ranks, but lower at family-level, which was probably due to ambiguous taxonomic groupings. CONCLUSIONS: The RDP-classifier consistently assigned most V4 sequences from human intestinal samples down to genus-level with good accuracy and speed. This is the deepest sequencing of single gastrointestinal samples reported to date, but microbial richness levels have still not leveled out. A majority of these diversities can also be captured with five times lower sampling-depth. 
HITChip hybridizations and resulting community profiles correlate well with pyrosequencing-based compositions, especially for lower-order ranks, indicating high robustness of both approaches. However, incompatible grouping schemes make exact comparison difficult.
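The finding that 20% of the reads already capture most of the diversity can be illustrated with a subsampling sketch. This is a minimal illustration, not the authors' pipeline: it assumes each read has already been assigned a phylotype label, and the function names and toy community are hypothetical.

```python
import random
from collections import Counter


def phylotype_richness(reads):
    """Number of distinct phylotype labels among the reads."""
    return len(set(reads))


def subsample_richness(reads, fraction, seed=0):
    """Richness observed in a random subsample drawn without replacement."""
    rng = random.Random(seed)
    n = max(1, round(len(reads) * fraction))
    return phylotype_richness(rng.sample(reads, n))


def abundance_captured(reads, fraction, seed=0):
    """Share of all reads belonging to phylotypes seen in the subsample."""
    rng = random.Random(seed)
    n = max(1, round(len(reads) * fraction))
    seen = set(rng.sample(reads, n))
    counts = Counter(reads)
    return sum(counts[p] for p in seen) / len(reads)


# Toy community (hypothetical): 5 dominant phylotypes with 200 reads each,
# plus 100 phylotypes that occur only once.
toy_reads = [f"dominant_{i}" for i in range(5) for _ in range(200)]
toy_reads += [f"rare_{i}" for i in range(100)]
```

In a community with this shape, a shallow subsample misses many rare phylotypes while still covering the phylotypes that account for most reads. That is one way both conclusions can hold at once: richness has not leveled out, yet most of the diversity is captured at a fifth of the depth.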
Project description:Large-scale and in-depth characterization of the intestinal microbiota necessitates application of high-throughput 16S rRNA gene-based technologies, such as barcoded pyrosequencing and phylogenetic microarray analysis. In this study, the two techniques were compared and contrasted for analysis of the bacterial composition in three fecal and three small intestinal samples from human individuals. As PCR remains a crucial step in sample preparation for both techniques, different forward primers were used for amplification to assess their impact on microbial profiling results. An average of 7,944 pyrosequences, spanning the V1 and V2 region of 16S rRNA genes, was obtained per sample. Although primer choice in barcoded pyrosequencing did not affect species richness and diversity estimates, detection of Actinobacteria strongly depended on the selected primer. Microbial profiles obtained by pyrosequencing and phylogenetic microarray analysis (HITChip) correlated strongly for fecal and ileal lumen samples but were less concordant for ileostomy effluent. Quantitative PCR was employed to investigate the deviations in profiling between pyrosequencing and HITChip analysis. Since cloning and sequencing of random 16S rRNA genes from ileostomy effluent confirmed the presence of novel intestinal phylotypes detected by pyrosequencing, especially those belonging to the Veillonella group, the divergence between pyrosequencing and the HITChip is likely due to the relatively low number of available 16S rRNA gene sequences of small intestinal origin in the DNA databases that were used for HITChip probe design. Overall, this study demonstrated that equivalent biological conclusions are obtained by high-throughput profiling of microbial communities, independent of technology or primer choice.
Project description:Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, and sequencing technologies, as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6-V8, and V7-V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher-quality data and enable the use of more aggressive quality control parameters than 454, resulting in a higher retention rate of high-quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. Beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.
Project description:BACKGROUND: The rapidly expanding field of microbiome studies offers investigators a large choice of methods for each step in the process of determining the microorganisms in a sample. The human cervicovaginal microbiome affects female reproductive health, susceptibility to and natural history of many sexually transmitted infections, including human papillomavirus (HPV). At present, long-term behavior of the cervical microbiome in early sexual life is poorly understood. METHODS: The V6 and V6-V9 regions of the 16S ribosomal RNA gene were amplified from DNA isolated from exfoliated cervical cells. Specimens from 10 women participating in the Natural History Study of HPV in Guanacaste, Costa Rica were sampled successively over a period of 5-7 years. We sequenced amplicons using 3 different platforms (Sanger, Roche 454, and Illumina HiSeq 2000) and analyzed sequences using pipelines based on 3 different classification algorithms (usearch, RDP Classifier, and pplacer). RESULTS: Usearch and pplacer provided consistent microbiome classifications for all sequencing methods, whereas RDP Classifier deviated significantly when characterizing Illumina reads. Comparing across sequencing platforms indicated 7%-41% of the reads were reclassified, while comparing across software pipelines reclassified up to 32% of the reads. Variability in classification was shown not to be due to a difference in read lengths. Six cervical microbiome community types were observed and are characterized by a predominance of either G. vaginalis or Lactobacillus spp. Over the 5-7 year period, subjects displayed fluctuation between community types. 
A PERMANOVA analysis on pairwise Kantorovich-Rubinstein distances between the microbiota of all samples yielded an F-test ratio of 2.86 (p<0.01), indicating a significant difference comparing within and between subjects' microbiota. CONCLUSIONS: Amplification and sequencing methods affected the characterization of the microbiome more than classification algorithms. Pplacer and usearch performed consistently with all sequencing methods. The analyses identified 6 community types consistent with those previously reported. The long-term behavior of the cervical microbiome indicated that fluctuations were subject dependent.
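The PERMANOVA statistic reported above can be sketched from any precomputed distance matrix. The Python below is a minimal illustration of the standard pseudo-F ratio with a label-permutation p-value. It is not the study's code: computing Kantorovich-Rubinstein distances is not shown, and the function names are hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F ratio from a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

Here the groups would be subjects and the samples their successive time points, so a large pseudo-F means samples differ more between subjects than within a subject. The sketch assumes every group contains at least two samples with non-zero within-group distance.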
Project description:The human intestinal microbiota is essential to the health of the host and plays a role in nutrition, development, metabolism, pathogen resistance, and regulation of immune responses. Antibiotics may disrupt these coevolved interactions, leading to acute or chronic disease in some individuals. Our understanding of antibiotic-associated disturbance of the microbiota has been limited by the poor sensitivity, inadequate resolution, and significant cost of current research methods. The use of pyrosequencing technology to generate large numbers of 16S rDNA sequence tags circumvents these limitations and has been shown to reveal previously unexplored aspects of the "rare biosphere." We investigated the distal gut bacterial communities of three healthy humans before and after treatment with ciprofloxacin, obtaining more than 7,000 full-length rRNA sequences and over 900,000 pyrosequencing reads from two hypervariable regions of the rRNA gene. A companion paper in PLoS Genetics (see Huse et al., doi: 10.1371/journal.pgen.1000255) shows that the taxonomic information obtained with these methods is concordant. Pyrosequencing of the V6 and V3 variable regions identified 3,300-5,700 taxa that collectively accounted for over 99% of the variable region sequence tags that could be obtained from these samples. Ciprofloxacin treatment influenced the abundance of about a third of the bacterial taxa in the gut, decreasing the taxonomic richness, diversity, and evenness of the community. However, the magnitude of this effect varied among individuals, and some taxa showed interindividual variation in the response to ciprofloxacin. While differences of community composition between individuals were the largest source of variability between samples, we found that two unrelated individuals shared a surprising degree of community similarity. 
In all three individuals, the taxonomic composition of the community closely resembled its pretreatment state by 4 weeks after the end of treatment, but several taxa failed to recover within 6 months. These pervasive effects of ciprofloxacin on community composition contrast with the reports by participants of normal intestinal function and with prior assumptions of only modest effects of ciprofloxacin on the intestinal microbiota. These observations support the hypothesis of functional redundancy in the human gut microbiota. The rapid return to the pretreatment community composition is indicative of factors promoting community resilience, the nature of which deserves future investigation.
Project description:Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa, a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads to their diversity and abundances. Here we report on community structures and sequences recovered for in vitro-simulated communities consisting of twenty 16S rRNA gene clones tiered at known proportions. PCR amplicon libraries of the V3-V4 and V6 hypervariable regions from the in vitro-simulated communities were sequenced using the Roche 454 GS FLX Titanium platform. Commonly used quality control protocols resulted in the formation of OTUs with >1% abundance composed entirely of erroneous sequences, while over-aggressive clustering approaches obfuscated real, expected OTUs. The pyrosequencing process itself did not appear to impose significant biases on overall community structure estimates, although the detection limit for rare taxa may be affected by PCR amplicon size and quality control approach employed. Meanwhile, PCR biases associated with the initial amplicon generation may impose greater distortions in the observed community structure.
Project description:The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/.
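The idea of a naïve Bayesian classifier that reports a confidence estimate can be sketched as a toy k-mer model. The Python below is a simplified illustration in that spirit, not the RDP implementation: the class name, the short word size, the smoothing formula, and the bootstrap settings are assumptions chosen for readability, and only the genus rank is modeled.

```python
import math
import random
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct overlapping k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Toy genus-level classifier; all parameters are illustrative."""

    def __init__(self, k=4):
        self.k = k
        self.n_train = 0
        self.genus_sizes = defaultdict(int)  # training sequences per genus
        self.word_total = defaultdict(int)   # training sequences containing a word
        self.word_in_genus = defaultdict(lambda: defaultdict(int))

    def fit(self, labelled):
        for seq, genus in labelled:
            self.n_train += 1
            self.genus_sizes[genus] += 1
            for w in kmers(seq, self.k):
                self.word_total[w] += 1
                self.word_in_genus[genus][w] += 1
        return self

    def _log_score(self, words, genus):
        m_total = self.genus_sizes[genus]
        score = 0.0
        for w in words:
            # Word-specific prior keeps unseen words from zeroing the product.
            prior = (self.word_total.get(w, 0) + 0.5) / (self.n_train + 1)
            seen = self.word_in_genus[genus].get(w, 0)
            score += math.log((seen + prior) / (m_total + 1))
        return score

    def predict(self, seq, n_boot=100, seed=0):
        words = sorted(kmers(seq, self.k))
        if not words:
            raise ValueError("query is shorter than the word size")
        genera = sorted(self.genus_sizes)
        best = max(genera, key=lambda g: self._log_score(words, g))
        # Bootstrap confidence: how often a random subset of the query's
        # words still selects the same genus.
        rng = random.Random(seed)
        subset = max(1, len(words) // 8)
        votes = 0
        for _ in range(n_boot):
            sample = rng.choices(words, k=subset)
            if max(genera, key=lambda g: self._log_score(sample, g)) == best:
                votes += 1
        return best, votes / n_boot
```

A short query contributes few words, so fewer of them separate closely related genera. This is consistent with the abstract's observation that error rates for short segments depend on which region of the gene they cover.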
Project description:Pyrosequencing of 16S rRNA (16S) variable tags has become the most popular method for assessing microbial diversity, but the method remains costly for the evaluation of large numbers of environmental samples with high sequencing depths. We developed a barcoded Illumina paired-end (PE) sequencing (BIPES) method that sequences each 16S V6 tag from both ends on the Illumina HiSeq 2000, and the PE reads are then overlapped to obtain the V6 tag. The average accuracy of Illumina single-end (SE) reads was only 97.9%, which decreased from ~99.9% at the start of the read to less than 85% at the end of the read; nevertheless, overlapping of the PE reads significantly increased the sequencing accuracy to 99.65% by verifying the 3' end of each SE in which the sequencing quality was degraded. After the removal of tags with two or more mismatches within the medial 40-70 bases of the reads and of tags with any primer errors, the overall base sequencing accuracy of the BIPES reads was further increased to 99.93%. The BIPES reads reflected the amounts of the various tags in the initial template, but long tags and high GC tags were underestimated. The BIPES method yields 20-50 times more 16S V6 tags than does pyrosequencing in a single-flow cell run, and each of the BIPES reads costs less than 1/40 of a pyrosequencing read. As a laborsaving and cost-effective method, BIPES can be routinely used to analyze the microbial ecology of both environmental and human microbiomes.
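Overlapping the two reads of a pair can be sketched as follows. The Python below is a minimal illustration of the general idea, not the BIPES software: the function names, the minimum overlap, and the mismatch threshold are hypothetical, and quality scores are ignored, so the forward base is kept wherever the two reads disagree.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatches=1):
    """Merge a read pair into one tag, or return None if no overlap is found.

    The reverse read is reverse-complemented, then the longest overlap between
    the end of the forward read and the start of the reverse read is accepted
    if it contains at most `max_mismatches` disagreements.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = forward[-overlap:]
        head = rc[:overlap]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatches:
            # Keep the forward read and append the non-overlapping remainder.
            return forward + rc[overlap:], mismatches
    return None
```

Because the low-quality 3' end of each read is covered by the higher-quality start of its mate, disagreements inside the overlap flag likely sequencing errors. Discarding pairs with too many disagreements is the kind of filter the abstract describes.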
Project description:Residual feed intake (RFI) testing has increased selection pressure on biological efficiency in cattle. The objective of this study was to assess the association of the rumen microbiome in inefficient, positive RFI (p-RFI) and efficient, negative RFI (n-RFI) Brahman bulls grazing 'Coastal' bermudagrass [Cynodon dactylon (L.) Pers.] under two levels of forage allowance (high and low stocking intensity). Sixteen Brahman bulls were previously fed in confinement for 70 d to determine the RFI phenotype. Bulls were then allotted 60 d stocking on bermudagrass pastures to estimate RFI using the n-alkane technique. At the conclusion of the grazing period, rumen liquid samples were collected from each bull by stomach tube to evaluate the rumen microbiome. Extraction of DNA, amplification of the V4-V6 region of the 16S rRNA gene, and 454 pyrosequencing were performed on each sample. After denoising the sequences, chimera checking, and quality trimming, 4,573 ± 1,287 sequences were generated per sample. Sequences were then assigned taxonomy from the Greengenes database using the RDP classifier. Overall, 67.5 and 22.9% of sequences were classified as Bacteroidetes and Firmicutes, respectively. Within the phylum Bacteroidetes, Prevotella was the most predominant genus and was observed in greater relative abundance in p-RFI bulls compared with n-RFI bulls (P = 0.01). In contrast, an unidentified Bacteroidales family was greater in relative abundance for n-RFI bulls than p-RFI (26.7 vs. 19.1%; P = 0.03). Ruminococcaceae was the third most abundant family in our samples, but it was not affected by RFI phenotype. No effect of stocking intensity was observed for bacterial taxa, but there was a tendency for alpha diversity and operational taxonomic unit richness to increase with lower stocking intensity. 
Results suggested the rumen microbiome of p-RFI Brahman bulls has greater levels of Prevotella, but the bacterial community composition was unaffected by stocking intensity.
Project description:Due to potential sequencing errors in pyrosequencing data, species richness and diversity indices of microbial systems can be miscalculated. The "traditional" sequence refinement method (e.g., filtering on length, primer errors, and ambiguous nucleotides) is not sufficient to account for overestimations. Recent in silico and single-organism studies have revealed the importance of sequence quality scores in the estimation of ecological indices; however, this is the first study to compare quality-score stringencies across four regions of the SSU rRNA gene sequence (V1V2, V3, V4, and V6) with actual environmental samples compared directly to corresponding clone libraries produced from the same primer sets. The nucleic acid sequences determined via pyrosequencing were subjected to varying quality-score cutoffs that ranged from 25 to 32, and at each quality-score cutoff, either 10 or 15% of the nucleotides were allowed to be below the cutoff. When species richness estimates were compared for the tested samples, the cutoff values of Q27(15%), Q30(10%), and Q32(15%) for V1V2, V4, and V6, respectively, gave estimates similar to those obtained with clone libraries and Sanger sequencing. The most stringent Q tested (Q32(10%)) was not enough to account for species richness inflation of the V3 region pyrosequence data. Results indicated that quality-score assessment greatly improved estimates of ecological indices for environmental samples (species richness and α-diversity) and that the effect of quality-score filtering was region-dependent.
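A cutoff such as Q27(15%) can be read as "keep a read if at most 15% of its bases have a quality score below 27". The Python below sketches a filter under that interpretation. The interpretation, the function names, and the per-read list-of-integers representation are assumptions, not the study's code.

```python
def passes_quality(qualities, cutoff, max_percent_below):
    """True if at most `max_percent_below` percent of bases score below `cutoff`."""
    if not qualities:
        return False
    below = sum(1 for q in qualities if q < cutoff)
    # Integer comparison avoids floating-point edge cases at the threshold.
    return below * 100 <= max_percent_below * len(qualities)


def filter_reads(reads, cutoff, max_percent_below):
    """Keep (sequence, qualities) pairs that pass the quality-score filter."""
    return [
        (seq, quals)
        for seq, quals in reads
        if passes_quality(quals, cutoff, max_percent_below)
    ]
```

Since the best-matching cutoff differed by region in the study, a filter like this would be parameterized per amplicon region rather than applied with one global setting.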
Project description:The novel multi-million read generating sequencing technologies are very promising for resolving the immense soil 16S rRNA gene bacterial diversity. Yet their maximum read length is limited, restricting studies to DNA stretches covering single 16S rRNA gene hypervariable (V) regions. The aim of the present study was to assess the effects of properties of four consecutive V regions (V3-6) on commonly applied analytical methodologies in bacterial ecology studies. Using an in silico approach, the performance of each V region was compared with the complete 16S rRNA gene stretch. We assessed related properties of the soil-derived bacterial sequence collection of the Ribosomal Database Project (RDP) database and concomitantly performed simulations based on published datasets. Results indicate that overall the most prominent V region for soil bacterial diversity studies was V3, even though it was outperformed in some of the tests. Despite its high performance during most tests, V4 was less conserved along flanking sites, thus reducing its ability for bacterial diversity coverage. V5 performed well in the analysis based on the non-redundant RDP database. However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking regions of V6 make it possible to amplify partial 16S rRNA gene sequences from very diverse organisms, V6 was the least informative of the V regions examined. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly suggested in 16S rRNA gene based bacterial diversity studies.