
Dataset Information


Efficient COI barcoding using high throughput single-end 400 bp sequencing.


ABSTRACT:

Background

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina's MiSeq system), or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio's Sequel II system).

Results

Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5' and 3' ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and comprises four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and delivered a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Compared with their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with a score of ca. 99%.
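
The assembly idea described above, joining a read from the 5' end with a read from the 3' end wherever they overlap, can be illustrated with a minimal sketch. This is not the HIFI-SE implementation: the function names, the overlap thresholds, the 658 bp toy barcode length and the assumption that the 3' read arrives as a reverse complement are all hypothetical choices made for illustration.

```python
import random

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_by_overlap(head, tail, min_overlap=20, max_mismatch_rate=0.05):
    """Join two same-strand reads on their longest acceptable overlap.

    The suffix of `head` is compared with the prefix of `tail`, longest
    candidate first. Returns the merged sequence, or None if no overlap
    of at least `min_overlap` bases passes the mismatch threshold.
    Both thresholds are illustrative assumptions, not published values.
    """
    for size in range(min(len(head), len(tail)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(head[-size:], tail[:size]))
        if mismatches <= max_mismatch_rate * size:
            # Keep the head read through the overlap, then append the
            # part of the tail read that extends beyond it.
            return head + tail[size:]
    return None


# Toy data: a random 658 bp "barcode" (length assumed for illustration)
# covered by two 400 bp reads, one from each end.
rng = random.Random(0)
barcode = "".join(rng.choice("ACGT") for _ in range(658))
read_5p = barcode[:400]
read_3p = reverse_complement(barcode[-400:])  # assumed to be on the opposite strand

# Flip the 3' read back to the barcode strand before merging.
assembled = merge_by_overlap(read_5p, reverse_complement(read_3p))
print(len(assembled), assembled == barcode)
```

With two 400 bp reads on a barcode of this length, the reads share an overlap of roughly 140 bp, which is what makes a simple suffix/prefix match feasible. Real reads carry sequencing errors and low-complexity regions, so a production pipeline would need quality-aware filtering and consensus building before a step like this.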

Conclusions

The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of our method allow considerably more DNA barcodes to be produced under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.

SUBMITTER: Yang C 

PROVIDER: S-EPMC7716423 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature



Similar Datasets

| S-EPMC5531050 | biostudies-literature
| S-EPMC3166835 | biostudies-literature
| S-EPMC7427513 | biostudies-literature
| S-EPMC4624326 | biostudies-literature
| S-EPMC5292727 | biostudies-literature
| S-EPMC8599736 | biostudies-literature
2015-07-01 | E-GEOD-66550 | biostudies-arrayexpress
2015-07-01 | GSE66550 | GEO
| S-EPMC6874760 | biostudies-literature