ABSTRACT:
SUBMITTER: Yi H
PROVIDER: S-EPMC7962209 | biostudies-literature | 2021 Mar
REPOSITORIES: biostudies-literature
Yi Huiguang, Lin Yanling, Lin Chengqi, Jin Wenfei
Genome Biology 2021-03-16 1
Here, we develop k-mer substring space decomposition (Kssd), a sketching technique that is significantly faster and more accurate than current sketching methods. We show, on simulated and real data, that it is the only method that can be used for large-scale dataset comparisons at population resolution. Using Kssd, we prioritize references for all 1,019,179 bacterial whole-genome sequencing (WGS) runs from the NCBI Sequence Read Archive and find misidentification or contamination in 6164 of these. Ad ...