ABSTRACT:
SUBMITTER: Schmidt S
PROVIDER: S-EPMC10251615 | biostudies-literature | 2023 Jun
REPOSITORIES: biostudies-literature
Sebastian Schmidt, Shahbaz Khan, Jarno N Alanko, Giulio E Pibiri, Alexandru I Tomescu
Genome Biology, 2023-06-09, issue 1
We propose a polynomial-time algorithm that computes a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in d ...
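To make the notion of a plain-text representation of a k-mer set concrete, here is a minimal sketch. It is not the authors' algorithm or heuristic. It only checks that a collection of strings spells exactly a given k-mer set and compares the cumulative length and string count of a few hand-written representations. The helper names `kmers` and `total_length`, the toy k-mer set, and the choice to ignore reverse complements are all assumptions made for illustration.

```python
def kmers(strings, k):
    """Return the set of all length-k substrings of the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def total_length(strings):
    """Cumulative number of characters across all strings."""
    return sum(len(s) for s in strings)


k = 3
# Toy k-mer set (hypothetical data; forward strand only, no reverse complements).
kmer_set = {"ACG", "TCG", "CGT", "GTA", "GTC"}

# Representation A: every k-mer stored as its own string.
one_per_kmer = sorted(kmer_set)

# Representation B: overlapping k-mers merged, each k-mer used exactly once.
no_repeats = ["ACGTA", "TCG", "GTC"]

# Representation C: the k-mer "CGT" appears in both strings. Repeating a
# k-mer is fine as long as the represented k-mer set stays the same.
with_repeats = ["ACGTA", "TCGTC"]

for name, rep in [("one per k-mer", one_per_kmer),
                  ("no repeats", no_repeats),
                  ("with repeats", with_repeats)]:
    assert kmers(rep, k) == kmer_set  # all three spell the same k-mer set
    print(f"{name}: {len(rep)} strings, {total_length(rep)} characters")
```

In this toy case, allowing one k-mer to repeat lowers both the number of strings and the total character count, which are the two quantities the abstract reports reductions for.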