Dataset Information

A simple refined DNA minimizer operator enables 2-fold faster computation.

ABSTRACT:

Motivation

The minimizer concept is a data structure for sequence sketching. The standard canonical minimizer selects a subset of k-mers from a given DNA sequence by simultaneously comparing the forward and reverse k-mers in a window according to a predefined selection scheme. It is widely used in sequence analysis tasks such as read mapping and assembly. k-mer density, k-mer repetitiveness (e.g. k-mer bias), and computational efficiency are three critical measures of a minimizer selection scheme. However, the existing minimizer variants trade these measures off against one another. A high-performance minimizer algorithm needs to be generic, effective, and efficient at the same time.
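The standard canonical minimizer described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the refined operator proposed in the paper and not code from the benchmark repository: it assumes the lexicographic order as the selection scheme, a window of w consecutive k-mers, leftmost tie-breaking, and density measured as the fraction of k-mer positions selected. All function names are hypothetical.

```python
# Illustrative sketch only: standard canonical minimizer with lexicographic order.
# Function names, tie-breaking, and the density definition are assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_kmer(kmer: str) -> str:
    """Compare the forward and reverse k-mer and keep the smaller one."""
    return min(kmer, reverse_complement(kmer))


def canonical_minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Select one canonical k-mer per window of w consecutive k-mers.

    Returns the distinct selected (position, canonical k-mer) pairs,
    sorted by position. Ties inside a window go to the leftmost k-mer.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    canon = [canonical_kmer(seq[i:i + k]) for i in range(n_kmers)]
    selected = {}
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (canon[i], i))
        selected[best] = canon[best]
    return sorted(selected.items())


def density(seq: str, k: int, w: int) -> float:
    """Fraction of k-mer positions selected (one common density definition)."""
    n_kmers = len(seq) - k + 1
    return len(canonical_minimizers(seq, k, w)) / n_kmers
```

Because each window keeps the smaller of the forward and reverse k-mer, a sequence and its reverse complement yield the same set of selected k-mer strings, which is the property that makes the canonical form useful for read mapping.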

Results

We propose a simple minimizer operator as a refinement of the standard canonical minimizer. It takes only a few operations to compute, yet it can improve k-mer repetitiveness, especially for the lexicographic order. It also applies to selection schemes based on other total orders (e.g. random orders). Moreover, it is computationally efficient, and its density is close to that of the standard minimizer. The refined minimizer may benefit high-performance applications such as binning and read mapping.

Availability and implementation

The source code of the benchmark in this work is available in the GitHub repository https://github.com/xp3i4/mini_benchmark.

SUBMITTER: Pan C

PROVIDER: S-EPMC10868324 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature

Publications

A simple refined DNA minimizer operator enables 2-fold faster computation.

Pan Chenxu, Reinert Knut

Bioinformatics (Oxford, England) 2024-02-01 2

PMID: 38269626

Similar Datasets

Faster computation of exact RNA shape probabilities.
S-EPMC2828121 | biostudies-literature

Regulated Formation of lncRNA-DNA Hybrids Enables Faster Transcriptional Induction and Environmental Adaptation.
S-EPMC4744127 | biostudies-literature

Refined DNA Repair Manipulation Enables a Universal Knock-in Strategy in Mouse Embryos
2025-05-26 | GSE297792 | GEO

Improving the computation efficiency of polygenic risk score modeling: faster in Julia.
S-EPMC9297586 | biostudies-literature

Material parameter computation for multi-layered vocal fold models.
S-EPMC3087394 | biostudies-other

A Single Amino Acid Change to Taq DNA Polymerase Enables Faster PCR, Reverse Transcription and Strand-Displacement.
S-EPMC7841393 | biostudies-literature

Elastic coupling of limb joints enables faster bipedal walking.
S-EPMC2696144 | biostudies-literature

Parallel window decoding enables scalable fault tolerant quantum computation.
S-EPMC10624853 | biostudies-literature

Effective Viscous Damping Enables Morphological Computation in Legged Locomotion.
S-EPMC7805837 | biostudies-literature

Genotype Value Decomposition: Simple Methods for the Computation of Kernel Statistics.
S-EPMC9744480 | biostudies-literature