Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens.
ABSTRACT: DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large-scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high-target amplicon yield, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next-generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10-mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full-length DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run (i.e. two lanes of a 16 lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full-length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next-generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughout and added information content.
Project description:Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes.
Project description:DNA barcoding is an effective approach for species identification and for discovery of new and/or cryptic species. Sanger sequencing technology is the method of choice for obtaining standard 650 bp cytochrome c oxidase subunit I (COI) barcodes. However, DNA degradation/fragmentation makes it difficult to obtain a full-length barcode from old specimens. Mini-barcodes of 130 bp from the standard barcode region have been shown to be effective for accurate identification in many animal groups and may be readily obtained from museum samples. Here we demonstrate the application of an alternative sequencing technology, the four-enzymes single-specimen pyrosequencing, in rapid, cost-effective mini-barcode analysis. We were able to generate sequences of up to 100 bp from mini-barcode fragments of COI in 135 fresh and 50 old Lepidoptera specimens (ranging from 53-97 year-old). The sequences obtained using pyrosequencing were of high quality and we were able to robustly match all the tested pyro-sequenced samples to their respective Sanger-sequenced standard barcode sequences, where available. Simplicity of the protocol and instrumentation coupled with higher speed and lower cost per sequence than Sanger sequencing makes this approach potentially useful in efforts to link standard barcode sequences from unidentified specimens to known museum specimens with only short DNA fragments.
Project description:<h4>Background</h4>Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, the current high-throughput DNA barcoding methods cannot obtain full-length barcode sequences due to read length limitations (e.g. a maximum read length of 300?bp for the Illumina's MiSeq system), or are hindered by a relatively high cost or low sequencing output (e.g. a maximum number of eight million reads per cell for the PacBio's SEQUEL II system).<h4>Results</h4>Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400?bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5' and 3' ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and includes four function modules of filter, assign, assembly and taxonomy. We applied the HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and delivered a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungi symbionts. Compared to their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples that had a similarity score of 100% and 25 of ca. 99%.<h4>Conclusions</h4>The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, while the reasonable cost and high sensitivity of our method can contribute considerably more DNA barcodes under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.
Project description:Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina Miseq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina Miseq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, for deeper analysis of genetic variation in target gene regions.
Project description:Species substitution is a form of seafood fraud for the purpose of economic gain. DNA barcoding utilizes species-specific DNA sequence information for specimen identification. Previous work has established the usability of short DNA sequences-mini-barcodes-for identification of specimens harboring degraded DNA. This study aims at establishing a DNA mini-barcoding system for all fish species commonly used in processed fish products in North America. Six mini-barcode primer pairs targeting short (127-314?bp) fragments of the cytochrome c oxidase I (CO1) DNA barcode region were developed by examining over 8,000 DNA barcodes from species in the U.S. Food and Drug Administration (FDA) Seafood List. The mini-barcode primer pairs were then tested against 44 processed fish products representing a range of species and product types. Of the 44 products, 41 (93.2%) could be identified at the species or genus level. The greatest mini-barcoding success rate found with an individual primer pair was 88.6% compared to 20.5% success rate achieved by the full-length DNA barcode primers. Overall, this study presents a mini-barcoding system that can be used to identify a wide range of fish species in commercial products and may be utilized in high throughput DNA sequencing for authentication of heavily processed fish products.
Project description:Incompleteness and inaccuracy of DNA barcode databases is considered an important hindrance to the use of metabarcoding in biodiversity analysis of zooplankton at the species-level. Species barcoding by Sanger sequencing is inefficient for organisms with small body sizes, such as zooplankton. Here mitochondrial cytochrome c oxidase I (COI) fragment barcodes from 910 freshwater zooplankton specimens (87 morphospecies) were recovered by a high-throughput sequencing platform, Ion Torrent PGM. Intraspecific divergence of most zooplanktons was < 5%, except Branchionus leydign (Rotifer, 14.3%), Trichocerca elongate (Rotifer, 11.5%), Lecane bulla (Rotifer, 15.9%), Synchaeta oblonga (Rotifer, 5.95%) and Schmackeria forbesi (Copepod, 6.5%). Metabarcoding data of 28 environmental samples from Lake Tai were annotated by both an indigenous database and NCBI Genbank database. The indigenous database improved the taxonomic assignment of metabarcoding of zooplankton. Most zooplankton (81%) with barcode sequences in the indigenous database were identified by metabarcoding monitoring. Furthermore, the frequency and distribution of zooplankton were also consistent between metabarcoding and morphology identification. Overall, the indigenous database improved the taxonomic assignment of zooplankton.
Project description:Even though ladybirds are well known as economically important biological control agents, an integrative framework of DNA barcoding research was not available for the family so far. We designed and present a set of efficient mini-barcoding primers to recover full DNA barcoding sequences for Coccinellidae, even for specimens collected 40 years ago. Based on these mini-barcoding primers, we obtained 104 full DNA barcode sequences for 104 species of Coccinellidae, in which 101 barcodes were newly reported for the first time. We also downloaded 870 COI barcode sequences (658?bp) from GenBank and BOLD database, belonging to 108 species within 46 genera, to assess the optimum genetic distance threshold and compare four methods of species delimitation (GMYC, bPTP, BIN and ABGD) to determine the most accurate approach for the family. The results suggested the existence of a 'barcode gap' and that 3% is likely an appropriate genetic distance threshold to delimit species of Coccinellidae using DNA barcodes. Species delimitation analyses confirm ABGD as an accurate and efficient approach, more suitable than the other three methods. Our research provides an integrative framework for DNA barcoding and descriptions of new taxa in Coccinellidae. Our results enrich DNA barcoding public reference libraries, including data for Chinese coccinellids. This will facilitate taxonomic identification and biodiversity monitoring of ladybirds using metabarcoding.
Project description:BACKGROUND: The goal of DNA barcoding is to develop a species-specific sequence library for all eukaryotes. A 650 bp fragment of the cytochrome c oxidase 1 (CO1) gene has been used successfully for species-level identification in several animal groups. It may be difficult in practice, however, to retrieve a 650 bp fragment from archival specimens, (because of DNA degradation) or from environmental samples (where universal primers are needed). RESULTS: We used a bioinformatics analysis using all CO1 barcode sequences from GenBank and calculated the probability of having species-specific barcodes for varied size fragments. This analysis established the potential of much smaller fragments, mini-barcodes, for identifying unknown specimens. We then developed a universal primer set for the amplification of mini-barcodes. We further successfully tested the utility of this primer set on a comprehensive set of taxa from all major eukaryotic groups as well as archival specimens. CONCLUSION: In this study we address the important issue of minimum amount of sequence information required for identifying species in DNA barcoding. We establish a novel approach based on a much shorter barcode sequence and demonstrate its effectiveness in archival specimens. This approach will significantly broaden the application of DNA barcoding in biodiversity studies.
Project description:Wolbachia is a genus of bacterial endosymbionts that impacts the breeding systems of their hosts. Wolbachia can confuse the patterns of mitochondrial variation, including DNA barcodes, because it influences the pathways through which mitochondria are inherited. We examined the extent to which these endosymbionts are detected in routine DNA barcoding, assessed their impact upon the insect sequence divergence and identification accuracy, and considered the variation present in Wolbachia COI. Using both standard PCR assays (Wolbachia surface coding protein--wsp), and bacterial COI fragments we found evidence of Wolbachia in insect total genomic extracts created for DNA barcoding library construction. When >2 million insect COI trace files were examined on the Barcode of Life Datasystem (BOLD) Wolbachia COI was present in 0.16% of the cases. It is possible to generate Wolbachia COI using standard insect primers; however, that amplicon was never confused with the COI of the host. Wolbachia alleles recovered were predominantly Supergroup A and were broadly distributed geographically and phylogenetically. We conclude that the presence of the Wolbachia DNA in total genomic extracts made from insects is unlikely to compromise the accuracy of the DNA barcode library; in fact, the ability to query this DNA library (the database and the extracts) for endosymbionts is one of the ancillary benefits of such a large scale endeavor--which we provide several examples. It is our conclusion that regular assays for Wolbachia presence and type can, and should, be adopted by large scale insect barcoding initiatives. While COI is one of the five multi-locus sequence typing (MLST) genes used for categorizing Wolbachia, there is limited overlap with the eukaryotic DNA barcode region.
Project description:Large-scale DNA barcoding projects are now moving toward activation while the creation of a comprehensive barcode library for eukaryotes will ultimately require the acquisition of some 100 million barcodes. To satisfy this need, analytical facilities must adopt protocols that can support the rapid, cost-effective assembly of barcodes. In this paper we discuss the prospects for establishing high volume DNA barcoding facilities by evaluating key steps in the analytical chain from specimens to barcodes. Alliances with members of the taxonomic community represent the most effective strategy for provisioning the analytical chain with specimens. The optimal protocols for DNA extraction and subsequent PCR amplification of the barcode region depend strongly on their condition, but production targets of 100K barcode records per year are now feasible for facilities working with compliant specimens. The analysis of museum collections is currently challenging, but PCR cocktails that combine polymerases with repair enzyme(s) promise future success. Barcode analysis is already a cost-effective option for species identification in some situations and this will increasingly be the case as reference libraries are assembled and analytical protocols are simplified.