Project description:BACKGROUND: Careful consideration of experimental artefacts is required in order to successfully apply high-throughput 16S ribosomal ribonucleic acid (rRNA) gene sequencing technology. Here we introduce experimental design, quality control and "denoising" approaches for sequencing low biomass specimens. RESULTS: We found that bacterial biomass is a key driver of 16S rRNA gene sequencing profiles generated from bacterial mock communities and that the use of different deoxyribonucleic acid (DNA) extraction methods [DSP Virus/Pathogen Mini Kit® (Kit-QS) and ZymoBIOMICS DNA Miniprep Kit (Kit-ZB)] and storage buffers [PrimeStore® Molecular Transport medium (Primestore) and Skim-milk, Tryptone, Glucose and Glycerol (STGG)] further influence these profiles. Kit-QS better represented hard-to-lyse bacteria from bacterial mock communities compared to Kit-ZB. Primestore storage buffer yielded lower levels of background operational taxonomic units (OTUs) from low biomass bacterial mock community controls compared to STGG. In addition to bacterial mock community controls, we used technical repeats (nasopharyngeal and induced sputum processed in duplicate, triplicate or quadruplicate) to further evaluate the effect of specimen biomass and participant age at specimen collection on resultant sequencing profiles. We observed a positive correlation (r = 0.16) between specimen biomass and participant age at specimen collection: low biomass technical repeats (represented by <500 16S rRNA gene copies/μl) were primarily collected at <14 days of age. We found that low biomass technical repeats also produced higher alpha diversities (r = -0.28); 16S rRNA gene profiles similar to no template controls (Primestore); and reduced sequencing reproducibility.
Finally, we show that the use of statistical tools for in silico contaminant identification, as implemented through the decontam package in R, provides better representations of indigenous bacteria following decontamination. CONCLUSIONS: We provide insight into experimental design, quality control steps and "denoising" approaches for 16S rRNA gene high-throughput sequencing of low biomass specimens. We highlight the need for careful assessment of DNA extraction methods and storage buffers; sequence quality and reproducibility; and in silico identification of contaminant profiles in order to avoid spurious results.
Project description:DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and genomic DNA (gDNA) mock communities encompassing nine bacterial species commonly found in milk and dairy products. The two communities comprised strain-specific DNA that was pooled before (gDNA) or after (PCR amplicon) the PCR step. The communities were sequenced on the Illumina MiSeq and Ion Torrent PGM platforms and then analyzed using the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) analysis pipelines with taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods resulted in operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) that ranged from 13 to 118 and were dependent on the DNA sequencing method and read assembly steps. The additional 4 to 109 OTUs/ASVs (from 9 OTUs/ASVs) included assignments to spurious taxa and sequence variants of the 9 species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNAs from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs/ASVs. However, the DNA sequencing method and paired-end read assembly steps conferred the largest effects on predictions of bacterial diversity, with effect sizes of 0.88 (Bray-Curtis) and 0.32 (weighted Unifrac), independent of the mock community type. Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity. IMPORTANCE: Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities.
In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk- and dairy-associated environments.
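The Bray-Curtis comparison mentioned above can be illustrated with a minimal sketch. It computes the dissimilarity between an expected mock community composition and an observed profile. The taxon names and abundances are hypothetical and are not taken from the study; the function is a plain-Python illustration, not the pipeline the authors used.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Both arguments are dicts mapping taxon name -> abundance.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical expected composition: nine species at equal relative abundance.
expected = {f"species_{i}": 1 / 9 for i in range(1, 10)}

# Hypothetical observed profile: the nine species plus one spurious taxon
# that takes 10% of the reads.
observed = {f"species_{i}": 0.1 for i in range(1, 10)}
observed["spurious_taxon"] = 0.1

dissimilarity = bray_curtis(expected, observed)
print(f"Bray-Curtis dissimilarity: {dissimilarity:.3f}")
```

A value near 0 would indicate that the observed profile closely matches the expected standard, so the same function could be applied to each platform and pipeline combination to rank them.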
Project description:Profiling of 16S rRNA gene sequences is an important tool for testing hypotheses in complex microbial communities, and analysis methods must be updated and validated as sequencing technologies advance. In host-associated bacterial communities, the V1-V3 region of the 16S rRNA gene is a valuable region to profile because it provides a useful level of taxonomic resolution; however, use of Illumina MiSeq data for experiments targeting this region needs validation. Using a MiSeq machine and the version 3 (300 × 2) chemistry, we sequenced the V1-V3 region of the 16S rRNA gene within a mock community. Nineteen bacteria and one archaeon comprised the mock community, and 12 replicate amplifications of the community were performed and sequenced. Sequencing the large fragment (490 bp) that encompasses V1-V3 yielded a higher error rate (3.6%) than has been reported when using smaller fragment sizes. This higher error rate was due to a large number of sequences that occurred only one or two times among all mock community samples. Removing sequences that occurred one time among all samples (singletons) reduced the error rate to 1.4%. Diversity estimates of the mock community containing all sequences were inflated, whereas estimates following singleton removal more closely reflected the actual mock community membership. A higher percentage of the sequences could be taxonomically assigned after singleton and doubleton sequences were removed, and the assignments reflected the membership of the input DNA. Sequencing the V1-V3 region of the 16S rRNA gene on the MiSeq platform may require additional sequence curation in silico, and improved error rates and diversity estimates show that removing low-frequency sequences is reasonable. When datasets have a high number of singletons, these singletons can be removed from the analysis without losing statistical power while reducing error and improving microbiota assessment.
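The low-frequency filtering described above can be sketched in a few lines. The example assumes a table of total counts per unique sequence summed across all samples; the sequence labels and counts are hypothetical, and the function name `remove_low_frequency` is illustrative rather than part of any published pipeline.

```python
def remove_low_frequency(seq_counts, min_count=2):
    """Keep only sequences observed at least `min_count` times in total.

    min_count=2 removes singletons; min_count=3 also removes doubletons.
    """
    return {seq: n for seq, n in seq_counts.items() if n >= min_count}


# Hypothetical totals per unique sequence, summed over all samples.
seq_counts = {
    "seq_A": 5200,
    "seq_B": 4100,
    "seq_C": 3900,
    "seq_D": 2,   # doubleton
    "seq_E": 1,   # singleton
    "seq_F": 1,   # singleton
    "seq_G": 1,   # singleton
}

no_singletons = remove_low_frequency(seq_counts, min_count=2)
no_doubletons = remove_low_frequency(seq_counts, min_count=3)

reads_before = sum(seq_counts.values())
reads_after = sum(no_singletons.values())

print(f"unique sequences: {len(seq_counts)} -> {len(no_singletons)}")
print(f"reads retained after singleton removal: {reads_after}/{reads_before}")
```

In this toy table, removing singletons drops three of seven unique sequences while discarding only three reads, which is the pattern the abstract relies on: rare sequences inflate diversity estimates but carry very little of the data.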
Project description:High-throughput sequencing of the taxonomically informative 16S rRNA gene provides a powerful approach for exploring microbial diversity. Here we compare the performances of two common "benchtop" sequencing platforms, Illumina MiSeq and Ion Torrent Personal Genome Machine (PGM), for bacterial community profiling by 16S rRNA (V1-V2) amplicon sequencing. We benchmarked performance by using a 20-organism mock bacterial community and a collection of primary human specimens. We observed comparatively higher error rates with the Ion Torrent platform and report a pattern of premature sequence truncation specific to semiconductor sequencing. Read truncation was dependent on both the directionality of sequencing and the target species, resulting in organism-specific biases in community profiles. We found that these sequencing artifacts could be minimized by using bidirectional amplicon sequencing and an optimized flow order on the Ion Torrent platform. Results of bacterial community profiling performed on the mock community and a collection of 18 human-derived microbiological specimens were generally in good agreement for both platforms; however, in some cases, results differed significantly. Disparities could be attributed to the failure to generate full-length reads for particular organisms on the Ion Torrent platform, organism-dependent differences in sequence error rates affecting classification of certain species, or some combination of these factors. This study demonstrates the potential for differential bias in bacterial community profiles resulting from the choice of sequencing platform alone.
Project description:BACKGROUND: Pan-bacterial 16S rRNA microbiome surveys performed with massively parallel DNA sequencing technologies have transformed community microbiological studies. Current 16S profiling methods, however, fail to provide sufficient taxonomic resolution and accuracy to adequately perform species-level associative studies for specific conditions. This is due to the amplification and sequencing of only short 16S rRNA gene regions, typically providing for only family- or genus-level taxonomy. Moreover, sequencing errors often inflate the number of taxa present. Pacific Biosciences' (PacBio's) long-read technology in particular suffers from high error rates per base. Herein, we present a microbiome analysis pipeline that takes advantage of PacBio circular consensus sequencing (CCS) technology to sequence and error correct full-length bacterial 16S rRNA genes, which provides high-fidelity species-level microbiome data. RESULTS: Analysis of a mock community with 20 bacterial species demonstrated 100% specificity and sensitivity with regard to taxonomic classification. Examination of a 250-plus species mock community demonstrated correct species-level classification of >90% of taxa, and relative abundances were accurately captured. The majority of the remaining taxa were demonstrated to be multiply, incorrectly, or incompletely classified. Using this methodology, we examined the microgeographic variation present among the microbiomes of six sinonasal sites, by both swab and biopsy, from the anterior nasal cavity to the sphenoid sinus from 12 subjects undergoing trans-sphenoidal hypophysectomy. We found greater variation among subjects than among sites within a subject, although significant within-individual differences were also observed. Propionibacterium acnes (recently renamed Cutibacterium acnes) was the predominant species throughout, but was found at distinct relative abundances by site.
CONCLUSIONS: Our microbial composition analysis pipeline for single-molecule real-time 16S rRNA gene sequencing (MCSMRT, https://github.com/jpearl01/mcsmrt) overcomes deficits of standard marker gene-based microbiome analyses by using CCS of entire 16S rRNA genes to provide increased taxonomic and phylogenetic resolution. Extensions of this approach to other marker genes could help refine taxonomic assignments of microbial species and improve reference databases, as well as strengthen the specificity of associations between microbial communities and dysbiotic states.
Project description:Profiling of microbial community composition is frequently performed by partial 16S rRNA gene sequencing on benchtop platforms following PCR amplification of specific hypervariable regions within this gene. Accuracy and reproducibility of this strategy are two key parameters to consider, which may be influenced during all processes from sample collection and storage, through DNA extraction and PCR-based library preparation, to the final sequencing. In order to evaluate both the reproducibility and accuracy of 16S rRNA gene-based microbial profiling using the Ion Torrent PGM platform, we prepared libraries and performed sequencing of a well-defined and validated 20-member bacterial DNA mock community on five separate occasions and compared results with the expected even distribution. In general, the applied method had a median coefficient of variation of 11.8% (range 5.5-73.7%) for all 20 included strains in the mock community across five separate sequencing runs, with underrepresented strains generally showing the largest degree of variation. In terms of accuracy, mock community species belonging to Proteobacteria were underestimated, whereas those belonging to Firmicutes were mostly overestimated. This could be explained partly by premature read truncation, but to a larger degree by their genomic GC content, which correlated negatively with the observed relative abundances, suggesting a PCR bias against GC-rich species during library preparation. Increasing the initial denaturation time during the PCR amplification from 30 to 120 s resulted in an increased average relative abundance of the three mock community members with the highest genomic GC%, but did not significantly change the overall evenness of the community distribution. Therefore, efforts should be made to optimize the PCR conditions prior to sequencing in order to maximize accuracy.
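The reproducibility measure above, a per-strain coefficient of variation summarised by its median, can be sketched as follows. The relative abundances are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


# Hypothetical relative abundances (%) of three strains across five runs.
runs = {
    "strain_1": [5.0, 5.2, 4.8, 5.1, 4.9],
    "strain_2": [6.1, 5.7, 6.4, 5.9, 6.0],
    "strain_low": [0.5, 1.5, 0.8, 1.2, 1.0],  # underrepresented strain
}

cvs = {strain: coefficient_of_variation(v) for strain, v in runs.items()}
median_cv = median(cvs.values())

for strain, cv in cvs.items():
    print(f"{strain}: CV = {cv:.1f}%")
print(f"median CV = {median_cv:.1f}%")
```

Because the CV divides by the mean, strains with low observed abundance tend to show larger values for the same absolute fluctuation, which is consistent with the abstract's note that underrepresented strains varied most.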
Project description:PCR amplification and sequencing of phylogenetic markers, primarily Small Sub-Unit ribosomal RNA (SSU rRNA) genes, has been the paradigm for defining the taxonomic composition of microbiomes. However, 'universal' SSU rRNA gene PCR primer sets are likely to miss much of the diversity therein. We sequenced a library comprising purified and reverse-transcribed SSU rRNA (RT-SSU rRNA) molecules from the canine oral microbiome and compared it to a general bacterial 16S rRNA gene PCR amplicon library generated from the same biological sample. In addition, we have developed BIONmeta, a novel, open-source, computer package for the processing and taxonomic classification of the randomly fragmented RT-SSU rRNA reads produced. Direct RT-SSU rRNA sequencing revealed that 16S rRNA molecules belonging to the bacterial phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria and Spirochaetes, were most abundant in the canine oral microbiome (92.5% of total bacterial SSU rRNA). The direct rRNA sequencing approach detected greater taxonomic diversity (1 additional phylum, 2 classes, 1 order, 10 families and 61 genera) when compared with general bacterial 16S rRNA amplicons from the same sample, simultaneously provided SSU rRNA gene inventories of Bacteria, Archaea and Eukarya, and detected significant numbers of sequences not recognised by 'universal' primer sets. Proteobacteria and Spirochaetes were found to be under-represented by PCR-based analysis of the microbiome, and this was due to primer mismatches and taxon-specific variations in amplification efficiency, validated by qPCR analysis of 16S rRNA amplicons from a mock community. This demonstrated the veracity of direct RT-SSU rRNA sequencing for molecular microbial ecology.
Project description:Amplification and sequencing of 16S amplicons are widely used for profiling the structure of oral microbiota. However, it remains unclear whether and to what degree DNA extraction and targeted 16S rRNA hypervariable regions influence the analysis. Based on a mock community consisting of five oral bacterial species in equal abundance, we compared the 16S amplicon sequencing results on the Illumina MiSeq platform from six frequently employed DNA extraction procedures and three pairs of widely used 16S rRNA hypervariable primers targeting different 16S rRNA regions. Technical reproducibility of selected 16S regions was also assessed. DNA extraction method exerted considerable influence on the observed bacterial diversity, while hypervariable regions had a relatively minor effect. Protocols with beads added to the enzyme-mediated DNA extraction reaction produced more accurate bacterial community structure than those without either beads or enzymes. Primers targeting the V3-V4 and V4-V5 regions seemed to produce more reproducible results than those targeting V1-V3. Neither sequencing batch nor change of operator affected the reproducibility of bacterial diversity profiles. Therefore, DNA extraction strategy and 16S rDNA hypervariable regions both influenced the results of oral microbiota biodiversity profiling and should be carefully considered in study design and data interpretation.
Project description:Microbial communities are commonly studied using culture-independent methods, such as 16S rRNA gene sequencing. However, one challenge in accurately characterizing microbial communities is exogenous bacterial DNA contamination, particularly in low-microbial-biomass niches. Computational approaches to identify contaminant sequences have been proposed, but their performance has not been independently evaluated. To identify the impact of decreasing microbial biomass on polymicrobial 16S rRNA gene sequencing experiments, we created a mock microbial community dilution series. We evaluated four computational approaches to identify and remove contaminants, as follows: (i) filtering sequences present in a negative control, (ii) filtering sequences based on relative abundance, (iii) identifying sequences that have an inverse correlation with DNA concentration implemented in Decontam, and (iv) predicting the sequence proportion arising from defined contaminant sources implemented in SourceTracker. As expected, the proportion of contaminant bacterial DNA increased with decreasing starting microbial biomass, with 80.1% of the most diluted sample arising from contaminant sequences. Inclusion of contaminant sequences led to overinflated diversity estimates and distorted microbiome composition. All methods for contaminant identification successfully identified some contaminant sequences, which varied depending on the method parameters used and contaminant prevalence. Notably, removing sequences present in a negative control erroneously removed >20% of expected sequences. SourceTracker successfully removed over 98% of contaminants when the experimental environments were well defined. However, SourceTracker misclassified expected sequences and performed poorly when the experimental environment was unknown, failing to remove >97% of contaminants. 
In contrast, the Decontam frequency method did not remove expected sequences and successfully removed 70 to 90% of the contaminants. IMPORTANCE: The relative scarcity of microbes in low-microbial-biomass environments makes accurate determination of community composition challenging. Identifying and controlling for contaminant bacterial DNA are critical steps in understanding microbial communities from these low-biomass environments. Our study introduces the use of a mock community dilution series as a positive control and evaluates four computational strategies that can identify contaminants in 16S rRNA gene sequencing experiments in order to remove them from downstream analyses. The appropriate computational approach for removing contaminant sequences from an experiment depends on prior knowledge about the microbial environment under investigation and can be evaluated with a dilution series of a mock microbial community.
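Two of the four strategies above, negative-control filtering (i) and the inverse relationship between sequence frequency and DNA concentration (iii), can be illustrated with a simplified sketch. This is not the Decontam implementation, which is an R package with its own statistical scoring. The sketch only captures the intuition by asking whether log-frequency is better described by a line of slope -1 against log-concentration (contaminant-like) or by a flat line (non-contaminant-like). All function names and data are hypothetical.

```python
import math


def sse_fixed_slope(log_conc, log_freq, slope):
    """Sum of squared errors for a line with a fixed slope and the
    best-fitting intercept (the mean of the slope-adjusted values)."""
    adjusted = [f - slope * c for c, f in zip(log_conc, log_freq)]
    intercept = sum(adjusted) / len(adjusted)
    return sum((a - intercept) ** 2 for a in adjusted)


def looks_like_contaminant(concentrations, frequencies):
    """True if frequency falls roughly as 1/concentration.

    Assumes all concentrations and frequencies are strictly positive.
    """
    log_c = [math.log(c) for c in concentrations]
    log_f = [math.log(f) for f in frequencies]
    return sse_fixed_slope(log_c, log_f, -1.0) < sse_fixed_slope(log_c, log_f, 0.0)


def in_negative_control(sample_counts, control_counts):
    """Strategy (i): sequences with any reads in the negative control."""
    return {seq for seq in sample_counts if control_counts.get(seq, 0) > 0}


# Hypothetical dilution series: DNA concentration per sample (arbitrary units).
concentrations = [100.0, 10.0, 1.0, 0.1]
# Hypothetical relative frequencies of two sequences across those samples.
contaminant_freq = [0.001, 0.01, 0.1, 0.8]   # rises as biomass falls
member_freq = [0.25, 0.25, 0.24, 0.20]       # roughly constant

print(looks_like_contaminant(concentrations, contaminant_freq))
print(looks_like_contaminant(concentrations, member_freq))

# Strategy (i) flags anything seen in the control, including an expected
# sequence that reached the control by cross-contamination.
sample = {"expected_seq": 900, "reagent_seq": 100}
control = {"expected_seq": 3, "reagent_seq": 250}
print(sorted(in_negative_control(sample, control)))
```

The last example shows why blanket negative-control filtering can discard expected sequences: a handful of stray reads in the control is enough to flag a genuine community member, whereas the frequency-based check looks at the trend across the dilution series.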
Project description:As 16S rRNA gene targeted massively parallel sequencing has become a common tool for microbial diversity investigations, numerous advances have been made to minimize the influence of sequencing and chimeric PCR artifacts through rigorous quality control measures. However, there has been little effort towards understanding the effect of multi-template PCR biases on microbial community structure. In this study, we used three bacterial and three archaeal mock communities consisting of, respectively, 33 bacterial and 24 archaeal 16S rRNA gene sequences combined in different proportions to compare the influences of (1) sequencing depth, (2) sequencing artifacts (sequencing errors and chimeric PCR artifacts), and (3) biases in multi-template PCR, towards the interpretation of community structure in pyrosequencing datasets. We also assessed the influence of each of these three variables on α- and β-diversity metrics that rely on the number of OTUs alone (richness) and those that include both membership and the relative abundance of detected OTUs (diversity). As part of this study, we redesigned bacterial and archaeal primer sets that target the V3-V5 region of the 16S rRNA gene, along with multiplexing barcodes, to permit simultaneous sequencing of PCR products from the two domains. We conclude that the benefits of deeper sequencing efforts extend beyond greater OTU detection and result in higher precision in β-diversity analyses by reducing the variability between replicate libraries, despite the presence of more sequencing artifacts. Additionally, spurious OTUs resulting from sequencing errors have a significant impact on richness or shared-richness based α- and β-diversity metrics, whereas metrics that utilize community structure (including both richness and relative abundance of OTUs) are minimally affected by spurious OTUs.
However, the greatest obstacle to accurately evaluating community structure is the error in the estimated mean relative abundance of each detected OTU caused by biases associated with multi-template PCR.
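The contrast drawn above between richness-based metrics and metrics that incorporate relative abundance can be illustrated with a small sketch. It compares observed richness and the Shannon index for a hypothetical community before and after adding low-abundance spurious OTUs. The counts are invented for illustration, and the Shannon index is used here as one example of an abundance-aware metric; the abstract does not name the specific metrics the authors applied.

```python
import math


def richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity index (natural log) from raw OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical community: ten true OTUs with 1000 reads each.
true_counts = [1000] * 10
# Same community plus ten spurious single-read OTUs from sequencing errors.
with_spurious = true_counts + [1] * 10

print(f"richness: {richness(true_counts)} -> {richness(with_spurious)}")
print(f"Shannon:  {shannon(true_counts):.4f} -> {shannon(with_spurious):.4f}")
```

In this toy case richness doubles while the Shannon index changes by well under one percent, because each spurious OTU contributes a negligible share of the reads.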