A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping.
ABSTRACT: Long-read sequencing technologies have greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer are combined with Bionano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 of 33.28 Mbps and covers 90% of the expected genome length. A sequence accuracy of 99.85% is obtained after aligning the assembly against Illumina Tx430 data and 99.6% of the 34,211 public gene models align to the assembly. Comparisons of Tx430 and BTx623 DLS maps against the public BTx623 v3.0.1 genome assembly suggest substantial discrepancies whose origin remains to be determined. In summary, this study demonstrates that informative assemblies of complex plant genomes can be generated by combining nanopore sequencing with DLS optical maps.
Project description:Sorghum (Sorghum bicolor L.) ranks as the fifth most widely planted cereal in the world and is used for food as well as a biomass plant for ethanol production. Use of the TX430 non-tannin sorghum variety has enhanced Agrobacterium-mediated sorghum transformation. These protocols could not be applied, however, to other tannin producing sorghum varieties such as the BTx623 model cultivar for sorghum with full genome information of sorghum. Here we report an improved protocol for Agrobacterium-mediated genetic transformation of tannin-producing sorghum variety BTx623. We successfully developed modification of root regeneration condition for generation of transgenic plant of BTx623. We inoculated immature embryos with Agrobacterium tumefaciens strain EHA105 harboring pMDC32-35S-GFP to generate transgenic plants. In the root regeneration step, we found that regeneration from transformed calli was affected by tannin. For root regeneration, shoots that appeared were not transferred to agar plate, but instead transferred to vermiculite in a plastic pod. Direct planting of regenerated shoots into vermiculite prevented the toxic effect of tannin. Root regeneration efficiency from calli emerged shoots in vermiculite was 78.57%. Presence of sGFP transgene in the genome of transgenic plants was confirmed by PCR and sGFP expression was confirmed in transgenic plants. This improved protocol of Agrobacterium-mediated transformation for tannin-producing sorghum BTx623 could be a useful tool for functional genomics using this plant.
Project description:Background:The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioral, and genetic studies. To facilitate further population genetics and genomic studies, we present a reference genome assembly for the European subspecies (H. r. rustica). Findings:As part of the Genome10K effort on generating high-quality vertebrate genomes (Vertebrate Genomes Project), we have assembled a highly contiguous genome assembly using single molecule real-time (SMRT) DNA sequencing and several Bionano optical map technologies. We compared and integrated optical maps derived from both the Nick, Label, Repair, and Stain technology and from the Direct Label and Stain (DLS) technology. As proposed by Bionano, DLS more than doubled the scaffold N50 with respect to the nickase. The dual enzyme hybrid scaffold led to a further marginal increase in scaffold N50 and an overall increase of confidence in the scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with a scaffold N50 value of more than 25.95 Mbp. Conclusions:This high-quality genome assembly represents a valuable resource for future studies of population genetics and genomics in the barn swallow and for studies concerning the evolution of avian genomes. It also represents one of the very first genomes assembled by combining SMRT long-read sequencing with the new Bionano DLS technology for scaffolding. The quality of this assembly demonstrates the potential of this methodology to substantially increase the contiguity of genome assemblies.
Project description:<h4>Motivation</h4>Long read sequencing and Bionano Genomics optical maps are two techniques that, when used together, make it possible to reconstruct entire chromosome or chromosome arms structure. However, the existing tools are often too conservative and organization of contigs into scaffolds is not always optimal.<h4>Results</h4>We developed BiSCoT (Bionano SCaffolding COrrection Tool), a tool that post-processes files generated during a Bionano scaffolding in order to produce an assembly of greater contiguity and quality. BiSCoT was tested on a human genome and four publicly available plant genomes sequenced with Nanopore long reads and improved significantly the contiguity and quality of the assemblies. BiSCoT generates a fasta file of the assembly as well as an AGP file which describes the new organization of the input assembly.<h4>Availability</h4>BiSCoT and improved assemblies are freely available on GitHub at http://www.genoscope.cns.fr/biscot and Pypi at https://pypi.org/project/biscot/.
Project description:The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence-Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, consequence of species or individual evolution, or horizontal transfer from viruses and bacteria.We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10X genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies.The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV.firstname.lastname@example.org, email@example.com.A supplementary note is available at Bioinformatics online.
Project description:Abstract <h4>Background</h4> The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads and its contiguity was extremely low compared with current assemblies of the Brassica genus. <h4>Findings</h4> Herein, we report the new long-read assembly of Darmor-bzh genome (Brassica napus) generated by combining long-read sequencing data and optical and genetic maps. Using the PromethION device and 6 flowcells, we generated ?16 million long reads representing 93× coverage and, more importantly, 6× with reads longer than 100 kb. This ultralong-read dataset allows us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb). In addition, we exploited all the advantages of the nanopore technology to detect modified bases and sequence transcriptomic data using direct RNA to annotate the genome and focus on resistance genes. <h4>Conclusion</h4> Using these cutting-edge technologies, and in particular by relying on all the advantages of the nanopore technology, we provide the most contiguous Brassica napus assembly, a resource that will be valuable to the Brassica community for crop improvement and will facilitate the rapid selection of agronomically important traits.
Project description:Whole genome sequence comparisons have become essential for establishing a robust scheme in bacterial taxonomy. To generalize this genome-based taxonomy, fast, reliable, and cost-effective genome sequencing methodologies are required. MinION, the palm-sized sequencer from Oxford Nanopore Technologies, enables rapid sequencing of bacterial genomes using minimal laboratory resources. Here we tested the ability of Nanopore sequences for the genome-based taxonomy of Vibrionaceae and compared Nanopore-only assemblies to complete genomes of five Rumoiensis clade species: Vibrio aphrogenes, V. algivorus, V. casei, V. litoralis, and V. rumoiensis. Comparison of overall genome relatedness indices (OGRI) and multilocus sequence analysis (MLSA) based on Nanopore-only assembly and Illumina or hybrid assemblies revealed that errors in Nanopore-only assembly do not influence average nucleotide identity (ANI), in silico DNA-DNA hybridization (DDH), G+C content, or MLSA tree topology in Vibrionaceae. Our results show that the genome sequences from Nanopore-based approach can be used for rapid species identification based on the OGRI and MLSA.
Project description:The chloroplast genome harbors plenty of valuable information for phylogenetic research. Illumina short-read data is generally used for de novo assembly of whole plastomes. PacBio or Oxford Nanopore long reads are additionally employed in hybrid approaches to enable assembly across the highly similar inverted repeats of a chloroplast genome. Unlike for PacBio, plastome assemblies based solely on Nanopore reads are rarely found, due to their high error rate and non-random error profile. However, the actual quality decline connected to their use has rarely been quantified. Furthermore, no study has employed reference-based assembly using Nanopore reads, which is common with Illumina data. Using Leucanthemum Mill. as an example, we compared the sequence quality of seven chloroplast genome assemblies of the same species, using combinations of two sequencing platforms and three analysis pipelines. In addition, we assessed the factors which might influence Nanopore assembly quality during sequence generation and bioinformatic processing. The consensus sequence derived from de novo assembly of Nanopore data had a sequence identity of 99.59% compared to Illumina short-read de novo assembly. Most of the errors detected were indels (81.5%), and a large majority of them is part of homopolymer regions. The quality of reference-based assembly is heavily dependent upon the choice of a close-enough reference. When using a reference with 0.83% sequence divergence from the studied species, mapping of Nanopore reads results in a consensus comparable to that from Nanopore de novo assembly, and of only slightly inferior quality compared to a reference-based assembly with Illumina data. For optimal de novo assembly of Nanopore data, appropriate filtering of contaminants and chimeric sequences, as well as employing moderate read coverage, is essential. Based on these results, we conclude that Nanopore long reads are a suitable alternative to Illumina short reads in plastome phylogenomics. Few errors remain in the finalized assembly, which can be easily masked in phylogenetic analyses without loss in analytical accuracy. The easily applicable and cost-effective technology might warrant more attention by researchers dealing with plant chloroplast genomes.
Project description:Standard antimicrobial susceptibility testing (AST) approaches lead to delays in the selection of optimal antimicrobial therapy. Here, we sought to determine the accuracy of antimicrobial resistance (AMR) determinants identified by Nanopore whole-genome sequencing in predicting AST results. Using a cohort of 40 clinical isolates (21 carbapenemase-producing carbapenem-resistant Klebsiella pneumoniae, 10 non-carbapenemase-producing carbapenem-resistant K. pneumoniae, and 9 carbapenem-susceptible K. pneumoniae isolates), three separate sequencing and analysis pipelines were performed, as follows: (i) a real-time Nanopore analysis approach identifying acquired AMR genes, (ii) an assembly-based Nanopore approach identifying acquired AMR genes and chromosomal mutations, and (iii) an approach using short-read correction of Nanopore assemblies. The short-read correction of Nanopore assemblies served as the reference standard to determine the accuracy of Nanopore sequencing results. With the real-time analysis approach, full annotation of acquired AMR genes occurred within 8?h from subcultured isolates. Assemblies sufficient for full resistance gene and single-nucleotide polymorphism annotation were available within 14?h from subcultured isolates. The overall agreement of genotypic results and anticipated AST results for the 40?K. pneumoniae isolates was 77% (range, 30% to 100%) and 92% (range, 80% to 100%) for the real-time approach and the assembly approach, respectively. Evaluating the patients contributing the 40 isolates, the real-time approach and assembly approach could shorten the median time to effective antibiotic therapy by 20?h and 26?h, respectively, compared to standard AST. Nanopore sequencing offers a rapid approach to both accurately identify resistance mechanisms and to predict AST results for K. pneumoniae isolates. Bioinformatics improvements enabling real-time alignment, coupled with rapid extraction and library preparation, will further enhance the accuracy and workflow of the Nanopore real-time approach.
Project description:Background: The introduction of the MinION sequencing device by Oxford Nanopore Technologies may greatly accelerate whole genome sequencing. Nanopore sequence data offers great potential for de novo assembly of complex genomes without using other technologies. Furthermore, Nanopore data combined with other sequencing technologies is highly useful for accurate annotation of all genes in the genome. In this manuscript we used nanopore sequencing as a tool to classify yeast strains. Methods: We compared various technical and software developments for the nanopore sequencing protocol, showing that the R9 chemistry is, as predicted, higher in quality than R7.3 chemistry. The R9 chemistry is an essential improvement for assembly of the extremely AT-rich mitochondrial genome. We double corrected assemblies from four different assemblers with PILON and assessed sequence correctness before and after PILON correction with a set of 290 Fungi genes using BUSCO. Results: In this study, we used this new technology to sequence and de novo assemble the genome of a recently isolated ethanologenic yeast strain, and compared the results with those obtained by classical Illumina short read sequencing. This strain was originally named Candida vartiovaarae ( Torulopsis vartiovaarae) based on ribosomal RNA sequencing. We show that the assembly using nanopore data is much more contiguous than the assembly using short read data. We also compared various technical and software developments for the nanopore sequencing protocol, showing that nanopore-derived assemblies provide the highest contiguity. Conclusions: The mitochondrial and chromosomal genome sequences showed that our strain is clearly distinct from other yeast taxons and most closely related to published Cyberlindnera species. In conclusion, MinION-mediated long read sequencing can be used for high quality de novo assembly of new eukaryotic microbial genomes.
Project description:We sequenced the <i>Paenibacillus</i> sp. R4 using Oxford Nanopore Technology (ONT), single molecule real-time (SMRT) technology from Pacific Biosciences (PacBio), and Illumina technologies to investigate the application of nanopore reads in de novo sequencing of bacterial genomes. We compared the differences in both genome sequences between genome assemblies using nanopore and PacBio reads and focused on the difference in the prediction of coding sequences. The results indicated that for more accurate predictions of open reading frames, contigs in the assemblies using only PacBio reads also needed to be corrected using short reads with high-quality bases, and repeat regions in genomes did not affect the increase of mispredicted coding sequences via genome polishing significantly. In assemblies using only nanopore reads, genome polishing was essential, but many repeat regions in genomes might increase the number of mispredicted coding sequences via genome polishing. The hybrid assembly combining the long reads and short reads represents the best result for coding sequence predictions in genome assemblies using nanopore reads.