Identification and Quantification of Abundant Species from Pyrosequences of 16S rRNA by Consensus Alignment.
ABSTRACT: 16S rRNA gene profiling has recently been boosted by the development of pyrosequencing methods. A common analysis is to group pyrosequences into Operational Taxonomic Units (OTUs), such that reads in an OTU are likely sampled from the same species. However, species diversity estimated from error-prone 16S rRNA pyrosequences may be inflated because the reads sampled from the same 16S rRNA gene may appear different, and current OTU inference approaches typically involve time-consuming pairwise/multiple distance calculation and clustering. I propose a novel approach AbundantOTU based on a Consensus Alignment (CA) algorithm, which infers consensus sequences, each representing an OTU, taking advantage of the sequence redundancy for abundant species. Pyrosequencing reads can then be recruited to the consensus sequences to give quantitative information for the corresponding species. As tested on 16S rRNA pyrosequence datasets from mock communities with known species, AbundantOTU rapidly reported identified sequences of the source 16S rRNAs and the abundances of the corresponding species. AbundantOTU was also applied to 16S rRNA pyrosequence datasets derived from real microbial communities and the results are in general agreement with previous studies.
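The read-recruitment step described above can be illustrated with a minimal sketch. It is not the AbundantOTU implementation: it assumes the consensus sequences are already known and assigns each read to the closest consensus by plain edit distance. The function names and the `min_identity` threshold are hypothetical, chosen for illustration only.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def recruit_reads(reads, consensus, min_identity=0.97):
    """Count reads per consensus sequence (hypothetical helper).

    Each read goes to the consensus with the highest identity, provided
    that identity reaches min_identity; otherwise it is left unassigned.
    """
    counts = Counter()
    for read in reads:
        best_name, best_identity = None, 0.0
        for name, seq in consensus.items():
            identity = 1.0 - edit_distance(read, seq) / max(len(read), len(seq))
            if identity > best_identity:
                best_name, best_identity = name, identity
        if best_name is not None and best_identity >= min_identity:
            counts[best_name] += 1
        else:
            counts["unassigned"] += 1
    return counts


if __name__ == "__main__":
    consensus = {"otu1": "ACGTACGTAC", "otu2": "TTTTGGGGCC"}
    reads = ["ACGTACGTAC", "ACGTACGAAC", "TTTTGGGGCC", "GGGGGGGGGG"]
    print(recruit_reads(reads, consensus, min_identity=0.85))
```

The per-OTU counts give relative abundances once divided by the number of assigned reads. The all-against-all comparison here is quadratic in sequence length and is only meant for toy inputs.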
Project description:High-throughput parallel sequencing is a powerful tool for the quantification of microbial diversity through the amplification of nuclear ribosomal gene regions. Recent work has extended this approach to the quantification of diversity within otherwise difficult-to-study metazoan groups. However, nuclear ribosomal genes present both analytical challenges and practical limitations that are a consequence of the mutational properties of nuclear ribosomal genes. Here we exploit useful properties of protein-coding genes for cross-species amplification and denoising of 454 flowgrams. We first use experimental mixtures of species from the class Collembola to amplify and pyrosequence the 5' region of the COI barcode, and we implement a new algorithm called PyroClean for the denoising of Roche GS FLX pyrosequences. Using parameter values from the analysis of experimental mixtures, we then analyse two communities sampled from field sites on the island of Tenerife. Cross-species amplification success of target mitochondrial sequences in experimental species mixtures is high; however, there is little relationship between template DNA concentrations and pyrosequencing read abundance. Homopolymer error correction and filtering against a consensus reference sequence reduced the volume of unique sequences to approximately 5% of the original unique raw reads. Filtering of remaining non-target sequences attributed to PCR error, sequencing error, or numts further reduced unique sequence volume to 0.8% of the original raw reads. PyroClean reduces or eliminates the need for an additional, time-consuming step to cluster reads into Operational Taxonomic Units, which facilitates the detection of intraspecific DNA sequence variation. PyroCleaned sequence data from field sites in Tenerife demonstrate the utility of our approach for quantifying evolutionary diversity and its spatial structure. 
Comparison of our sequence data to public databases reveals that we are able to successfully recover both interspecific and intraspecific sequence diversity.
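To show why homopolymer errors inflate the number of unique sequences, the following is a minimal sketch. It is not the PyroClean algorithm: it simply collapses every homopolymer run to a single base and counts reads under the collapsed key, so that reads differing only in run length fall together. Both function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def collapse_homopolymers(seq: str) -> str:
    """Reduce each run of identical bases to one base (e.g. ACCCGT -> ACGT)."""
    return "".join(base for base, _run in groupby(seq))


def count_collapsed(reads):
    """Count reads by their homopolymer-collapsed form (hypothetical helper)."""
    return Counter(collapse_homopolymers(read) for read in reads)


if __name__ == "__main__":
    reads = ["ACCGT", "ACGT", "ACCCGT", "AGGT"]
    print(len(set(reads)), "unique raw reads")
    print(len(count_collapsed(reads)), "unique collapsed reads")
```

Collapsing discards genuine run-length differences as well as errors, which is why a real denoiser would correct run lengths against flowgram signal or a reference rather than remove them.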
Project description:Considerable Nanoarchaeota novelty and diversity were encountered in Yellowstone Lake, Yellowstone National Park (YNP), where sampling targeted lake floor hydrothermal vent fluids, streamers and sediments associated with these vents, and planktonic photic zones in three different regions of the lake. Significant homonucleotide repeats (HR) were observed in pyrosequence reads and in near full-length Sanger sequences, averaging 112 HR per 1349 bp clone. These repeats could confound diversity estimates derived from pyrosequencing by producing false nucleotide insertions or deletions (indels). However, Sanger sequencing of two different sets of PCR clones (110 bp, 1349 bp) demonstrated that at least some of these indels are real. The majority of the Nanoarchaeota PCR amplicons were vent associated; however, curiously, one relatively small Nanoarchaeota OTU (71 pyrosequencing reads) was only found in photic zone water samples obtained from a region of the lake furthest removed from the hydrothermal regions of the lake. Extensive pyrosequencing failed to demonstrate the presence of an Ignicoccus lineage in this lake, suggesting the Nanoarchaeota in this environment are associated with novel Archaea hosts. Defined phylogroups based on near full-length PCR clones document the significant Nanoarchaeota 16S rRNA gene diversity in this lake and firmly establish a terrestrial clade distinct from the marine Nanoarchaeota as well as from other geographical locations.
Project description:Pyrosequence targeting of the 16S rRNA gene has been adopted for microbial communities associated with field-grown plants. To examine phylogenetic drifts according to read length and bioinformatic tools, original and chopped sequences (250-570 bp) covering the V1-V4 regions of 16S rRNA genes were compared using pyrosequence and Sanger reads of rice root microbiomes. The phylogenetic assignment at genus level depended on read length, especially in the genus Bradyrhizobium, which is one of the ecologically important bacterial genera associated with plants. We discuss the methodology of phylogenetic assignments of plant-associated bacteria by 16S rRNA pyrosequence.
Project description:High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing, further supporting the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy using a flexible computational pipeline, tailored specifically for PacBio circular consensus sequencing (CCS) data that outperforms heuristic methods in most settings: https://github.com/oscar-franzen/oclust/. Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has significant impact on performance.
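Complete-linkage hierarchical clustering, as named above, can be sketched in a few lines. This is a naive illustration and not the oclust pipeline: it assumes the sequences are already aligned to equal length, uses the fraction of mismatching positions as the distance, and stops merging once the closest pair of clusters exceeds a hypothetical `threshold`. Function names are illustrative only.

```python
def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold=0.03):
    """Group sequence indices into OTUs by complete linkage (naive sketch).

    Two clusters are merged only while the LARGEST pairwise distance
    between their members stays at or below the threshold.
    """
    clusters = [[i] for i in range(len(seqs))]

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(hamming_fraction(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
    print(complete_linkage_otus(seqs, threshold=0.2))
```

Because every member of a complete-linkage cluster is within the threshold of every other member, the resulting OTUs are tighter than those from single or average linkage at the same cutoff, which is one reason the choice of algorithm and threshold changes the OTU count.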
Project description:Advances in next-generation sequencing technologies are providing longer nucleotide sequence reads that contain more information about phylogenetic relationships. We sought to use this information to understand the evolution and ecology of bacterioplankton at our long-term study site in the Western Sargasso Sea. A bioinformatics pipeline called PhyloAssigner was developed to align pyrosequencing reads to a reference multiple sequence alignment of 16S ribosomal RNA (rRNA) genes and assign them phylogenetic positions in a reference tree using a maximum likelihood algorithm. Here, we used this pipeline to investigate the ecologically important SAR11 clade of Alphaproteobacteria. A combined set of 2.7 million pyrosequencing reads from the 16S rRNA V1-V2 regions, representing 9 years at the Bermuda Atlantic Time-series Study (BATS) site, was quality checked and parsed into a comprehensive bacterial tree, yielding 929,036 Alphaproteobacteria reads. Phylogenetic structure within the SAR11 clade was linked to seasonally recurring spatiotemporal patterns. This analysis resolved four new SAR11 ecotypes in addition to five others that had been described previously at BATS. The data support a conclusion reached previously that the SAR11 clade diversified by subdivision of niche space in the ocean water column, but the new data reveal a more complex pattern in which deep branches of the clade diversified repeatedly across depth strata and seasonal regimes. The new data also revealed the presence of an unrecognized clade of Alphaproteobacteria, here named SMA-1 (Sargasso Mesopelagic Alphaproteobacteria, group 1), in the upper mesopelagic zone. The high-resolution phylogenetic analyses performed herein highlight significant, previously unknown patterns of evolutionary diversification within perhaps the most widely distributed heterotrophic marine bacterial clade, and link them strongly to ecosystem regimes.
Project description:Large-scale and in-depth characterization of the intestinal microbiota necessitates application of high-throughput 16S rRNA gene-based technologies, such as barcoded pyrosequencing and phylogenetic microarray analysis. In this study, the two techniques were compared and contrasted for analysis of the bacterial composition in three fecal and three small intestinal samples from human individuals. As PCR remains a crucial step in sample preparation for both techniques, different forward primers were used for amplification to assess their impact on microbial profiling results. An average of 7,944 pyrosequences, spanning the V1 and V2 region of 16S rRNA genes, was obtained per sample. Although primer choice in barcoded pyrosequencing did not affect species richness and diversity estimates, detection of Actinobacteria strongly depended on the selected primer. Microbial profiles obtained by pyrosequencing and phylogenetic microarray analysis (HITChip) correlated strongly for fecal and ileal lumen samples but were less concordant for ileostomy effluent. Quantitative PCR was employed to investigate the deviations in profiling between pyrosequencing and HITChip analysis. Since cloning and sequencing of random 16S rRNA genes from ileostomy effluent confirmed the presence of novel intestinal phylotypes detected by pyrosequencing, especially those belonging to the Veillonella group, the divergence between pyrosequencing and the HITChip is likely due to the relatively low number of available 16S rRNA gene sequences of small intestinal origin in the DNA databases that were used for HITChip probe design. Overall, this study demonstrated that equivalent biological conclusions are obtained by high-throughput profiling of microbial communities, independent of technology or primer choice.
Project description:BACKGROUND: Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. RESULTS: Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org). 
CONCLUSION: The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis allowing for more comprehensive studies of environmental and host-associated microbial communities.
Project description:Identification of pathogenic bacteria in ascites correlates with poor clinical outcomes. Ascites samples are commonly reported culture-negative, even where frank infection is indicated. Culture-independent methods have previously reported bacterial DNA in ascites; however, whether this represents viable bacterial populations has not been determined. We report the first application of 16S rRNA gene pyrosequencing and quantitative PCR in conjunction with propidium monoazide sample treatment to characterise the viable bacterial composition of ascites. Twenty-five cirrhotic patients undergoing paracentesis provided ascites. Samples were treated with propidium monoazide to exclude non-viable bacterial DNA. Total bacterial load was quantified by 16S rRNA Q-PCR, with species identity and relative abundance determined by 16S rRNA gene pyrosequencing. Correlation of molecular microbiology data with clinical measures and diagnostic microbiology was performed. Viable bacterial signal was obtained in 84% of ascites samples, both by Q-PCR and pyrosequencing. Approximately 190,000 ribosomal pyrosequences were obtained, representing 236 species, including both gut and non-gut-associated species. Substantial variation in the species detected was observed between patients. Statistically significant relationships were identified between the bacterial community similarity and clinical measures, including ascitic polymorphonuclear leukocyte count and Child-Pugh class. Viable bacteria are present in the ascites of a majority of patients with cirrhosis, including those with no clinical signs of infection. Microbiota composition significantly correlates with clinical measures. Entry of bacteria into ascites is unlikely to be limited to translocation from the gut, raising fundamental questions about the processes that underlie the development of spontaneous bacterial peritonitis.
Project description:Deep-sequencing technologies are becoming nearly routine to describe microbial community composition in environmental samples. The 18S ribosomal DNA (rDNA) pyrosequencing has revealed a vast diversity of infrequent sequences, leading to the proposition of the existence of an extremely diverse microbial 'rare biosphere'. Although rare microbes no doubt exist, critical views suggest that many rare sequences may actually be artifacts. However, information about how diversity revealed by molecular methods relates to that revealed by classical morphology approaches is practically nonexistent. To address this issue, we used different approaches to assess the diversity of tintinnid ciliates, a species-rich group in which species can be easily distinguished morphologically. We studied two Mediterranean marine samples with different patterns of tintinnid diversity. We estimated tintinnid diversity in these samples employing morphological observations and both classical cloning and sequencing and pyrosequencing of two different markers, the 18S rDNA and the internal transcribed spacer (ITS) regions, applying a variety of computational approaches currently used to analyze pyrosequence reads. We found that both molecular approaches were efficient in detecting the tintinnid species observed by microscopy and revealed similar phylogenetic structures of the tintinnid community at the species level. However, depending on the method used to analyze the pyrosequencing results, we observed discrepancies with the morphology-based assessments up to several orders of magnitude. In several cases, the inferred number of operational taxonomic units (OTUs) largely exceeded the total number of tintinnid cells in the samples. Such inflation of the OTU numbers corresponded to 'rare biosphere' taxa, composed largely of artifacts. 
Our results suggest that a careful and rigorous analysis of pyrosequencing data sets, including data denoising and sequence clustering with well-adjusted parameters, is necessary to accurately describe microbial biodiversity using this molecular approach.
Project description:The exploration of microbial communities by sequencing 16S rRNA genes has expanded with low-cost, high-throughput sequencing instruments. Illumina-based 16S rRNA gene sequencing has recently gained popularity over 454 pyrosequencing due to its lower costs, higher accuracy and greater throughput. Although recent reports suggest that Illumina and 454 pyrosequencing provide similar beta diversity measures, it remains to be demonstrated that pre-existing 454 pyrosequencing workflows can transfer directly from 454 to Illumina MiSeq sequencing by simply changing the sequencing adapters of the primers. In this study, we modified 454 pyrosequencing primers targeting the V4-V5 hyper-variable regions of the 16S rRNA gene to be compatible with Illumina sequencers. Microbial communities from cows, humans, leeches, mice, sewage, and termites and a mock community were analyzed by 454 and MiSeq sequencing of the V4-V5 region and MiSeq sequencing of the V4 region. Our analysis revealed that reference-based OTU clustering alone introduced biases compared to de novo clustering, preventing certain taxa from being observed in some samples. Based on this we devised and recommend an analysis pipeline that includes read merging, contaminant filtering, and reference-based clustering followed by de novo OTU clustering, which produces diversity measures consistent with de novo OTU clustering analysis. Low levels of dataset contamination with Illumina sequencing were discovered that could affect analyses that require highly sensitive approaches. While moving to Illumina-based sequencing platforms promises to provide deeper insights into the breadth and function of microbial diversity, our results show that care must be taken to ensure that sequencing and processing artifacts do not obscure true microbial diversity.