ABSTRACT: Summary
In whole genome sequencing data, polymerase chain reaction amplification results in duplicate DNA fragments coming from the same location in the genome. The process of preparing a whole genome bisulfite sequencing (WGBS) library, on the other hand, can create two DNA fragments from the same location that should not be considered duplicates. Currently, only one WGBS-aware duplicate marking tool exists. However, it only works with the output from a single tool, does not accept streaming input or output, and requires a substantial amount of memory relative to the input size. Dupsifter provides an aligner-agnostic duplicate marking tool that is lightweight, has streaming capabilities, and is memory efficient.
Availability and implementation
Source code and binaries are freely available at https://github.com/huishenlab/dupsifter under the MIT license. Dupsifter is implemented in C and is supported on macOS and Linux.
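The distinction drawn in the Summary can be illustrated with a minimal sketch. It is not Dupsifter's implementation (Dupsifter is written in C), and the `Fragment` record, its field names, and the `OT`/`OB` strand labels are hypothetical, chosen only for this example. The sketch assumes that the two non-duplicate fragments at one location are those derived from the two original strands of the bisulfite-converted DNA, so the bisulfite strand is added to the key used to detect duplicates.

```python
from collections import namedtuple

# Hypothetical record for one aligned fragment; not a Dupsifter data structure.
# bs_strand: "OT" (original top) or "OB" (original bottom) bisulfite strand.
Fragment = namedtuple("Fragment", "name chrom start end bs_strand")


def mark_duplicates(fragments, wgbs_aware=True):
    """Return the names of fragments flagged as duplicates.

    The first fragment seen for each key is kept; later ones are flagged.
    With wgbs_aware=True the bisulfite strand is part of the key, so
    fragments from different strands at the same location are not
    treated as duplicates of each other.
    """
    seen = set()
    duplicates = set()
    for frag in fragments:
        key = (frag.chrom, frag.start, frag.end)
        if wgbs_aware:
            key += (frag.bs_strand,)
        if key in seen:
            duplicates.add(frag.name)
        else:
            seen.add(key)
    return duplicates


fragments = [
    Fragment("r1", "chr1", 100, 250, "OT"),
    Fragment("r2", "chr1", 100, 250, "OT"),  # same location and strand as r1
    Fragment("r3", "chr1", 100, 250, "OB"),  # same location, other strand
    Fragment("r4", "chr1", 400, 550, "OB"),
]

naive_dups = mark_duplicates(fragments, wgbs_aware=False)
wgbs_dups = mark_duplicates(fragments, wgbs_aware=True)
print(sorted(naive_dups))  # ['r2', 'r3']
print(sorted(wgbs_dups))   # ['r2']
```

A position-only key flags `r3` even though it comes from the other strand; the strand-aware key flags only `r2`. A real tool would also have to decide which fragment in each group to keep, which this sketch sidesteps by keeping the first one seen.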
SUBMITTER: Morrison J
PROVIDER: S-EPMC10724848 | biostudies-literature | 2023 Dec
REPOSITORIES: biostudies-literature
Jacob Morrison, Wanding Zhou, Benjamin K Johnson, Hui Shen
Bioinformatics (Oxford, England), 2023 Dec 1, issue 12