Implications of Error-Prone Long-Read Whole-Genome Shotgun Sequencing on Characterizing Reference Microbiomes.
ABSTRACT: Long-read sequencing techniques, such as the Oxford Nanopore Technology, can generate reads that are tens of kilobases in length and are therefore particularly relevant for microbiome studies. However, owing to the higher per-base error rates than typical short-read sequencing, the application of long-read sequencing on microbiomes remains largely unexplored. Here we deeply sequenced two human microbiota mock community samples (HM-276D and HM-277D) from the Human Microbiome Project. We showed that assembly programs consistently achieved high accuracy (~99%) and completeness (~99%) for bacterial strains with adequate coverage. We also found that long-read sequencing provides accurate estimates of species-level abundance (R = 0.94 for 20 bacteria with abundance ranging from 0.005% to 64%). Our results not only demonstrate the feasibility of characterizing complete microbial genomes and populations from error-prone Nanopore sequencing data but also highlight necessary bioinformatics improvements for future metagenomics tool development.
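The abundance agreement quoted above (R = 0.94) is a correlation between expected and estimated species abundances. A minimal sketch of such a comparison follows, assuming a Pearson correlation on log10-transformed percentages; the abstract does not state which correlation or transform was used, and the function names and the toy values below are hypothetical, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_abundance_correlation(expected_pct, observed_pct):
    """Correlate abundances after a log10 transform (an assumption made here).

    The transform keeps a few dominant species from swamping rare ones when
    abundances span several orders of magnitude. All values must be > 0.
    """
    log_expected = [math.log10(v) for v in expected_pct]
    log_observed = [math.log10(v) for v in observed_pct]
    return pearson(log_expected, log_observed)


if __name__ == "__main__":
    # Hypothetical toy percentages for illustration only.
    expected = [64.0, 10.0, 1.0, 0.1, 0.005]
    observed = [60.0, 12.0, 0.8, 0.15, 0.004]
    print(round(log_abundance_correlation(expected, observed), 3))
```

Working on the log scale is a design choice for this sketch: with abundances ranging from 0.005% to 64%, an untransformed correlation would be driven almost entirely by the most abundant species.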
Project description:BACKGROUND:Nanopore sequencing provides a rapid, cheap and portable real-time sequencing platform with the potential to revolutionize genomics. However, several applications are limited by relatively high single-read error rates (>10 %), including RNA-seq, haplotype sequencing and 16S sequencing. RESULTS:We developed the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) as a strategy for obtaining long and accurate nanopore reads, starting with low input DNA. Applying INC-Seq for 16S rRNA-based bacterial profiling generated full-length amplicon sequences with a median accuracy >97 %. CONCLUSIONS:INC-Seq reads enabled accurate species-level classification, identification of species at 0.1 % abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.
Project description:Microbial genomes can be assembled from short-read sequencing data, but the assembly contiguity of these metagenome-assembled genomes is constrained by repeat elements. Correct assignment of genomic positions of repeats is crucial for understanding the effect of genome structure on genome function. We applied nanopore sequencing and our workflow, named Lathe, which incorporates long-read assembly and short-read error correction, to assemble closed bacterial genomes from complex microbiomes. We validated our approach with a synthetic mixture of 12 bacterial species. Seven genomes were completely assembled into single contigs and three genomes were assembled into four or fewer contigs. Next, we used our methods to analyze metagenomics data from 13 human stool samples. We assembled 20 circular genomes, including genomes of Prevotella copri and a candidate Cibiobacter sp. Despite the decreased nucleotide accuracy compared with alternative sequencing and assembly approaches, our methods improved assembly contiguity, allowing for investigation of the role of repeat elements in microbial function and adaptation.
Project description:Speed, single-base sensitivity and long read lengths make nanopores a promising technology for high-throughput sequencing. We evaluated and optimized the performance of the MinION nanopore sequencer using M13 genomic DNA and used expectation maximization to obtain robust maximum-likelihood estimates for insertion, deletion and substitution error rates (4.9%, 7.8% and 5.1%, respectively). Over 99% of high-quality 2D MinION reads mapped to the reference at a mean identity of 85%. We present a single-nucleotide-variant detection tool that uses maximum-likelihood parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer-testis gene family (CT47) within an unresolved region of human chromosome Xq24.
Project description:BACKGROUND:Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High-throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies. With the advent of third generation long read sequencing technologies, including Oxford Nanopore Technology (ONT) and PacBio platforms, this problem is potentially surmountable. ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible. However, this technology (termed here 'nanopore sequencing') has a comparatively high technical error rate. The present study aimed to assess the utility, accuracy and cost-effectiveness of nanopore sequencing for HCV genomes. We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing. RESULTS:The Nanopore platform, when the coverage exceeded 300 reads, generated comparable consensus sequences to Illumina sequencing. Using HCV Envelope plasmids (~1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads. Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost-effectiveness (AUD $43 per sample with nanopore sequencing versus $100 with paired-end short read technology).
The Nano-Q tool successfully separated between-host sequences, including those from the same subtype, by bulk sorting and phylogenetic clustering without an autologous reference sequence (using only a subtype-specific generic reference). The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted. CONCLUSION:Cost-effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing.
Project description:Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.
Project description:Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at >99% sensitivity and >99% precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that were supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
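Several of the abstracts in this listing report that consensus sequences become highly accurate once coverage is deep enough, even though individual reads are error-prone. The toy model below illustrates why; it is a sketch under strong simplifying assumptions (independent errors at a fixed per-base rate, a plain majority vote, and the pessimistic case where all erroneous reads agree on the same wrong base), not the method used in any of these studies. The function name `consensus_accuracy` is hypothetical.

```python
from math import comb


def consensus_accuracy(depth, per_read_error):
    """Probability that a strict majority of `depth` reads shows the true base.

    Assumes each read is wrong independently with probability
    `per_read_error`. Ties (possible at even depth) are counted as failures,
    so the result is a conservative estimate under this model.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if not 0.0 <= per_read_error <= 1.0:
        raise ValueError("per_read_error must lie in [0, 1]")
    p_correct = 1.0 - per_read_error
    needed = depth // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(depth, k) * p_correct**k * per_read_error ** (depth - k)
        for k in range(needed, depth + 1)
    )


if __name__ == "__main__":
    # Illustrative per-read error rate; real error profiles are not uniform.
    for depth in (1, 5, 15, 61):
        print(depth, consensus_accuracy(depth, 0.15))
```

The model ignores systematic errors, which is why it cannot explain failures such as the short indels and homopolymer errors mentioned elsewhere in this listing: errors that recur at the same position are not removed by adding depth.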
Project description:Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm Nanocorr specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: The contig N50 length is more than ten times greater than an Illumina-only assembly (678 kb versus 59.9 kbp) and has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
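The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are hypothetical and are not taken from the assembly described.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    if lengths[-1] <= 0:
        raise ValueError("contig lengths must be positive")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in kb.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

For the figures reported in the abstract, 678 kb against 59.9 kbp is a ratio of roughly 11.3, consistent with the statement that the N50 is more than ten times greater.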
Project description:Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments. The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging as existing assemblers were not implemented to deal with long reads exhibiting close to 30% of errors. Here, we presented a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads. We described the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities.
Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
Project description:The upper respiratory tract microbiome has an important role in respiratory health. Influenza A is a common viral infection that challenges that health, and a well-recognized sequela is bacterial pneumonia. Given this connection, we sought to characterize the upper respiratory tract microbiota of individuals suffering from the pandemic H1N1 influenza A outbreak of 2009 and determine if microbiome profiles could be correlated with patient characteristics. We determined the microbial profiles of 65 samples from H1N1 patients by cpn60 universal target amplification and sequencing. Profiles were examined at the phylum and nearest neighbor "species" levels using the characteristics of patient gender, age, originating health authority, sample type and designation (STAT/non-STAT). At the phylum level, Actinobacteria-, Firmicutes- and Proteobacteria-dominated microbiomes were observed, with none of the patient characteristics showing significant profile composition differences. At the nearest neighbor "species" level, the upper respiratory tract microbiomes were composed of 13-20 "species" and showed a trend towards increasing diversity with patient age. Interestingly, at an individual level, most patients had one to three organisms dominant in their microbiota. A limited number of discrete microbiome profiles were observed, shared among influenza patients regardless of patient status variables. To assess the validity of analyses derived from sequence read abundance, several bacterial species were quantified by quantitative PCR and compared to the abundance of cpn60 sequence read counts obtained in the study. A strong positive correlation between read abundance and absolute bacterial quantification was observed. This study represents the first examination of the upper respiratory tract microbiome using a target other than the 16S rRNA gene and to our knowledge, the first thorough examination of this microbiome during a viral infection.
Project description:Widespread release of Bacillus anthracis (anthrax) or Yersinia pestis (plague) would prompt a public health emergency. During an exposure event, high-quality whole genome sequencing (WGS) can identify genetic engineering, including the introduction of antimicrobial resistance (AMR) genes. Here, we developed rapid WGS laboratory and bioinformatics workflows using a long-read nanopore sequencer (MinION) for Y. pestis (6.5 h) and B. anthracis (8.5 h) and sequenced strains with different AMR profiles. Both salt-precipitation and silica-membrane extracted DNA were suitable for MinION WGS using both rapid and field library preparation methods. In replicate experiments, nanopore quality metrics were defined for genome assembly and mutation analysis. AMR markers were correctly detected and >99% coverage of chromosomes and plasmids was achieved using 100,000 raw sequencing reads. While chromosomes and large and small plasmids were accurately assembled, including novel multimeric forms of the Y. pestis virulence plasmid, pPCP1, MinION reads were error-prone, particularly in homopolymer regions. MinION sequencing holds promise as a practical, front-line strategy for on-site pathogen characterization to speed the public health response during a biothreat emergency.