ABSTRACT:
SUBMITTER: Yu YW
PROVIDER: S-EPMC10824537 | biostudies-literature | 2022 Jan
REPOSITORIES: biostudies-literature
IEEE Transactions on Knowledge and Data Engineering, 2020-03-17, 1
In this extended abstract, we describe and analyze a lossy compression of MinHash from buckets of size O(log n) to buckets of size O(log log n) by encoding using floating-point notation. This new compressed sketch, which we call HyperMinHash, as we build off a HyperLogLog scaffold, can be used as a drop-in replacement of MinHash. Unlike comparable Jaccard index fingerprinting algorithms in sub-logarithmic space (such as b-bit MinHash), HyperMinHash retains MinHash's features of streaming updates, u ...
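The floating-point encoding mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 64-bit hash width, the SHA-256-based `hash64`, the parameters `Q` and `R`, and the helper names `compress`, `order_key`, and `sketch_min` are all assumptions chosen for illustration. The idea shown is that a minimum hash value is stored as a capped leading-zero count (the exponent, which needs only about log log n bits) plus a few of the bits that follow (the mantissa), and that the minimum can still be tracked on the compressed values.

```python
import hashlib
from typing import Iterable, Tuple

HASH_BITS = 64  # assumed hash width for this sketch
Q = 4           # assumed exponent bits: leading-zero counts 0..15
R = 8           # assumed mantissa bits kept after the leading 1

Register = Tuple[int, int]  # (exponent, mantissa)


def hash64(item: str) -> int:
    """Illustrative 64-bit hash: the first 8 bytes of SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def compress(h: int, q: int = Q, r: int = R) -> Register:
    """Lossily encode a 64-bit hash as (exponent, mantissa).

    Requires (2**q - 1) + r <= HASH_BITS so the shift below is non-negative.
    """
    max_exp = (1 << q) - 1
    leading_zeros = HASH_BITS - h.bit_length()
    if leading_zeros < max_exp:
        exponent = leading_zeros
        skip = exponent + 1  # the zeros plus the implicit leading 1 bit
    else:
        exponent = max_exp   # capped: no implicit leading 1 is assumed
        skip = max_exp
    shift = HASH_BITS - skip - r
    mantissa = (h >> shift) & ((1 << r) - 1)
    return exponent, mantissa


def order_key(reg: Register) -> Tuple[int, int]:
    """Smaller hash means more leading zeros first, then a smaller mantissa."""
    return (-reg[0], reg[1])


def sketch_min(items: Iterable[str]) -> Register:
    """Compressed minimum over a non-empty stream of items (single bucket)."""
    return min((compress(hash64(x)) for x in items), key=order_key)
```

With these assumed parameters a register occupies Q + R = 12 bits instead of 64. Because `order_key` preserves the order of the underlying hashes (up to ties introduced by truncation), two compressed minima can be combined by taking the smaller one under `order_key`, which is what makes streaming updates possible on the compressed form in this sketch.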