
Dataset Information


Konnector v2.0: pseudo-long reads from paired-end sequencing data.


ABSTRACT: Reading the nucleotides from both ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment is longer than the combined length of the two reads, a gap of unsequenced nucleotides remains between the read pair. If the target in such experiments is sequenced deeply enough to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.

Konnector uses a probabilistic, memory-efficient data structure called a Bloom filter to represent a k-mer spectrum: the set of all sequences of length k that occur in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups in this data structure to construct an implicit de Bruijn graph, which describes the (k-1) base pair overlaps between adjacent k-mers. It then traverses this graph to bridge the gap between a given pair of flanking sequences.

Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, because it represents k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
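The abstract names three pieces: a Bloom filter holding the k-mer spectrum, an implicit de Bruijn graph whose edges are discovered only by filter look-ups, and a traversal that joins two flanking sequences. The Python sketch below illustrates those ideas under simplifying assumptions of our own: it hashes with salted SHA-256, stores forward-strand k-mers only (no reverse-complement or canonical k-mer handling), assumes the mate read has already been reverse-complemented onto the left read's strand, and accepts a join only when a depth-bounded search finds exactly one connecting path. The names `BloomFilter`, `build_kmer_spectrum`, `successors` and `bridge`, and the parameters `max_len` and `max_paths`, are hypothetical and do not correspond to Konnector's source code or command-line options.

```python
import hashlib
import random
from typing import Iterable, Iterator, List, Optional


class BloomFilter:
    """Bit array probed by several hash functions: false positives are
    possible, false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_kmer_spectrum(reads: Iterable[str], k: int,
                        num_bits: int = 1 << 20, num_hashes: int = 3) -> BloomFilter:
    """Insert every k-mer occurring in the reads (forward strand only)."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom


def successors(kmer: str, bloom: BloomFilter) -> List[str]:
    # Implicit de Bruijn graph: an edge exists when the (k-1)-base suffix plus
    # one more base is in the filter. Edges are never stored, only queried.
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def bridge(left: str, right: str, bloom: BloomFilter, k: int,
           max_len: int = 500, max_paths: int = 2) -> Optional[str]:
    """Join two same-strand flanks if exactly one path links the last k-mer
    of `left` to the first k-mer of `right`; otherwise return None."""
    start, goal = left[-k:], right[:k]
    paths: List[str] = []
    stack = [(start, start)]  # (current k-mer, sequence spelled so far)
    while stack and len(paths) < max_paths:
        kmer, seq = stack.pop()
        if kmer == goal:
            paths.append(seq)
            continue
        if len(seq) >= max_len:
            continue  # depth bound also keeps the search from looping on cycles
        for nxt in successors(kmer, bloom):
            stack.append((nxt, seq + nxt[-1]))
    if len(paths) != 1:
        return None  # no connecting path, or an ambiguous (branching) region
    return left[:-k] + paths[0] + right[k:]


if __name__ == "__main__":
    # Toy data: a random 200 bp "genome" tiled by overlapping 50 bp reads.
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(200))
    reads = [genome[i:i + 50] for i in range(0, 151, 5)]
    bloom = build_kmer_spectrum(reads, k=21)
    # Flanks separated by a 100 bp gap, as in a read pair from a 200 bp fragment.
    pseudo_long_read = bridge(genome[:50], genome[150:], bloom, k=21)
    print(pseudo_long_read == genome)
```

For sizing, the standard Bloom filter estimate of the false-positive rate is (1 - e^(-hn/m))^h for n stored items, m bits and h hash functions. At 12 bits per k-mer (the 1.5-byte average quoted above) and 8 hash functions this comes to roughly 0.3%; that is illustrative arithmetic, not a measured property of Konnector.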

SUBMITTER: Vandervalk BP 

PROVIDER: S-EPMC4582294 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature



Similar Datasets

| S-EPMC7168855 | biostudies-literature
| S-EPMC6302405 | biostudies-other
| S-EPMC4542784 | biostudies-literature
| S-EPMC3614465 | biostudies-other
| S-EPMC5834899 | biostudies-literature
| S-EPMC3076424 | biostudies-literature
| S-EPMC4168710 | biostudies-literature
| S-EPMC3358655 | biostudies-literature
| S-EPMC4074385 | biostudies-literature
| S-EPMC4005600 | biostudies-literature