Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing.
ABSTRACT: BACKGROUND:Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have the potential to overcome these difficulties, although the best approach for harnessing their potential remains poorly evaluated. RESULTS:We sequenced nine bacterial genomes spanning a wide range of GC contents using Illumina MiSeq and Oxford Nanopore MinION sequencing technologies to determine the advantages of each approach, both individually and combined. Assemblies using only MiSeq reads were highly accurate but lacked contiguity, a deficiency that was partially overcome by adding MinION reads to these assemblies. Even more contiguous genome assemblies were generated by using MinION reads for initial assembly, but these assemblies were more error-prone and required further polishing. This was especially pronounced when Illumina libraries were biased, as was the case for our strains with both high and low GC content. Increased genome contiguity dramatically improved the annotation of insertion sequences and secondary metabolite biosynthetic gene clusters, likely because long-reads can disambiguate these highly repetitive but biologically important genomic regions. CONCLUSIONS:Genome assembly using short-reads is challenged by repetitive sequences and extreme GC contents. Our results indicate that these difficulties can be largely overcome by using single-molecule, long-read sequencing technologies such as the Oxford Nanopore MinION. Using MinION reads for assembly followed by polishing with Illumina reads generated the most contiguous genomes with sufficient accuracy to enable the accurate annotation of important but difficult to sequence genomic features such as insertion sequences and secondary metabolite biosynthetic gene clusters. The combination of Oxford Nanopore and Illumina sequencing can therefore cost-effectively advance studies of microbial evolution and genome-driven drug discovery.
Project description:BACKGROUND:Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. RESULTS:Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. CONCLUSION:Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology.
Project description:The MinION is a portable single-molecule DNA sequencing instrument that was released by Oxford Nanopore Technologies in 2014, producing long sequencing reads by measuring changes in ionic flow when single-stranded DNA molecules translocate through the pores. While MinION long reads have an error rate substantially higher than the ones produced by short-read sequencing technologies, they can generate de novo assemblies of microbial genomes, after an initial correction step that includes alignment of Illumina sequencing data or detection of overlaps between Oxford Nanopore reads to improve accuracy. In this study, MinION reads were generated from the multi-chromosome genome of Agrobacterium tumefaciens strain LBA4404. Errors in the consensus two-directional (sense and antisense) "2D" sequences were first characterized by way of comparison with an internal reference assembly. Both Illumina-based correction and self-correction were performed and the resulting corrected reads assembled into high-quality hybrid and non-hybrid assemblies. Corrected read datasets and assemblies were subsequently compared. The results shown here indicate that both hybrid and non-hybrid methods can be used to assemble Oxford Nanopore reads into informative multi-chromosome assemblies, each with slightly different outcomes in terms of contiguity and accuracy.
Project description:Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover by providing long-range information, it could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging as existing assemblers were not implemented to deal with long reads exhibiting close to 30% of errors. Here, we presented a hybrid approach developed to take advantage of data generated using MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1 and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.We described NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
Project description:Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base-pairs and maximum lengths reaching 100,000 base- pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well characterized genome, the Saccharomyces cerevisiae S288C strain using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.
Project description:Chromosome-scale genome assembly of the yeast Saprochaete ingens CBS 517.90 was determined by a combination of technologies producing short (HiSeq X; Illumina) and long (MinION; Oxford Nanopore Technologies) reads. The 21.2-Mbp genome sequence has a GC content of 36.9% and codes for 6,475 predicted proteins.
Project description:Kroppenstedtia eburnea DSM 45196T and Kroppenstedtia pulmonis W9323T are aerobic, Gram-positive, filamentous, chemoorganotrophic thermoactinomycetes. Here, we report on the complete and circular genome assemblies generated using Illumina MiSeq and Oxford Nanopore Technologies MinION reads. Putative gene clusters predicted to be involved in the production of secondary metabolites were also identified.
Project description:Pectobacterium punjabense is a newly described species causing blackleg disease in potato plants. Therefore, by the combination of long (Oxford Nanopore Technologies, MinION) and short (Illumina MiSeq) reads, we sequenced the complete genome of P. punjabense SS95T, which contains a circular chromosome of 4.793?Mb with a GC content of 50.7%.
Project description:The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes, with very different nucleotide compositions ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data.
Project description:Pseudoalteromonas haloplanktis TAC125 is among the most commonly studied bacteria adapted to cold environments. Aside from its ecological relevance, P. haloplanktis has a potential use for biotechnological applications. Due to its importance, we decided to take advantage of next generation sequencing (Illumina) and third generation sequencing (PacBio and Oxford Nanopore) technologies to resequence its genome. The availability of a reference genome, obtained using whole genome shotgun sequencing, allowed us to study and compare the results obtained by the different technologies and draw useful conclusions for future de novo genome assembly projects. We found that assembly polishing using Illumina reads is needed to achieve a consensus accuracy over 99.9% when using Oxford Nanopore sequencing, but not in PacBio sequencing. However, the dependency of consensus accuracy on coverage is lower in Oxford Nanopore than in PacBio, suggesting that a cost-effective solution might be the use of low coverage Oxford Nanopore sequencing together with Illumina reads. Despite the differences in consensus accuracy, all sequencing technologies revealed the presence of a large plasmid, pMEGA, which was undiscovered until now. Among the most interesting features of pMEGA is the presence of a putative error-prone polymerase regulated through the SOS response. Aside from the characterization of the newly discovered plasmid, we confirmed the sequence of the small plasmid pMtBL and uncovered the presence of a potential partitioning system. Crucially, this study shows that the combination of next and third generation sequencing technologies give us an unprecedented opportunity to characterize our bacterial model organisms at a very detailed level.
Project description:Metagenomic sequence data from defined mock communities is crucial for the assessment of sequencing platform performance and downstream analyses, including assembly, binning and taxonomic assignment. We report a comparison of shotgun metagenome sequencing and assembly metrics of a defined microbial mock community using the Oxford Nanopore Technologies (ONT) MinION, PacBio and Illumina sequencing platforms. Our synthetic microbial community BMock12 consists of 12 bacterial strains with genome sizes spanning 3.2-7.2 Mbp, 40-73% GC content, and 1.5-7.3% repeats. Size selection of both PacBio and ONT sequencing libraries prior to sequencing was essential to yield comparable relative abundances of organisms among all sequencing technologies. While the Illumina-based metagenome assembly yielded good coverage with few misassemblies, contiguity was greatly improved by both, Illumina?+?ONT and Illumina?+?PacBio hybrid assemblies but increased misassemblies, most notably in genomes with high sequence similarity to each other. Our resulting datasets allow evaluation and benchmarking of bioinformatics software on Illumina, PacBio and ONT platforms in parallel.