Dataset Information


Exploiting sparseness in de novo genome assembly.

ABSTRACT:

BACKGROUND: The very large memory requirements for constructing assembly graphs for de novo genome assembly limit current algorithms to supercomputing environments.

METHODS: In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between those nodes, allows the de novo assembly of even moderately sized genomes (~500 Mbp) on a typical laptop computer.

RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, which uses a new sparse k-mer graph structure evolved from the de Bruijn graph. We test SparseAssembler on both simulated and real data, achieving ~90% memory savings while retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.
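
To make the idea of "storing only a small fraction of the observed k-mers as nodes and the links between these nodes" concrete, the following is a minimal Python sketch. It is not the SparseAssembler implementation: the function names, the sampling gap `g`, the anchoring rule, and the toy read are all assumptions made for illustration, and error handling, reverse complements, and graph traversal are omitted.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield (offset, k-mer) for every position in the read."""
    for i in range(len(read) - k + 1):
        yield i, read[i:i + k]


def build_dense_nodes(reads, k):
    """All distinct k-mers, as a plain de Bruijn-style graph would store."""
    return {km for read in reads for _, km in kmers(read, k)}


def build_sparse_graph(reads, k, g):
    """Keep roughly one k-mer in every g as a node (hypothetical scheme).

    Returns (nodes, links), where links maps a kept k-mer to a set of
    (next kept k-mer, bases that extend the first to reach the second).
    """
    nodes = set()
    links = defaultdict(set)
    for read in reads:
        n = len(read) - k + 1  # number of k-mers in this read
        if n <= 0:
            continue
        # Assumed anchoring rule: if one of the first g k-mers is already
        # a node, start sampling there so overlapping reads share nodes.
        start = next(
            (i for i in range(min(g, n)) if read[i:i + k] in nodes), 0
        )
        prev = None
        for i in range(start, n, g):
            node = read[i:i + k]
            nodes.add(node)
            if prev is not None:
                # Store the g bases skipped between the two kept k-mers.
                links[read[prev:prev + k]].add((node, read[prev + k:i + k]))
            prev = i
    return nodes, links


if __name__ == "__main__":
    toy_reads = ["ACGTTGCAAGGCTTAACCGGTATCGA"]  # made-up read
    dense = build_dense_nodes(toy_reads, k=5)
    sparse, sparse_links = build_sparse_graph(toy_reads, k=5, g=4)
    print(len(dense), "dense nodes vs", len(sparse), "sparse nodes")
```

In this sketch the memory saving comes from keeping about one node per `g` positions, while each link carries the skipped bases so the sequence between two kept k-mers can still be recovered. How the real package chooses and links its nodes should be taken from the paper itself.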


PROVIDER: S-EPMC3369186 | BioStudies | 2012-01-01

REPOSITORIES: biostudies

Similar Datasets

2013-01-01 | S-EPMC3848682 | BioStudies
2017-01-01 | S-EPMC5591975 | BioStudies
2014-01-01 | S-EPMC4120145 | BioStudies
2017-01-01 | S-EPMC5870571 | BioStudies
1000-01-01 | S-EPMC4908363 | BioStudies
2012-01-01 | S-EPMC3290790 | BioStudies
2020-01-01 | S-EPMC7499882 | BioStudies
1000-01-01 | S-EPMC6061703 | BioStudies
2012-01-01 | S-EPMC3488206 | BioStudies
2012-01-01 | S-EPMC3517413 | BioStudies