Unknown

Dataset Information

0

Matchtigs: minimum plain text representation of k-mer sets.


ABSTRACT: We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.

SUBMITTER: Schmidt S 

PROVIDER: S-EPMC10251615 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

altmetric image

Publications

Matchtigs: minimum plain text representation of k-mer sets.

Schmidt Sebastian S   Khan Shahbaz S   Alanko Jarno N JN   Pibiri Giulio E GE   Tomescu Alexandru I AI  

Genome biology 20230609 1


We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in d  ...[more]

Similar Datasets

| S-EPMC9477520 | biostudies-literature
| S-EPMC6375602 | biostudies-literature
| S-EPMC3636516 | biostudies-literature
| S-EPMC10659450 | biostudies-literature
| S-EPMC4015147 | biostudies-literature
| S-EPMC5514192 | biostudies-other
| S-EPMC11325533 | biostudies-literature
| S-EPMC5645146 | biostudies-literature
| S-EPMC7986389 | biostudies-literature
| S-EPMC7732815 | biostudies-literature