Dataset Information

Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.

ABSTRACT:

Motivation

Efficiently aligning sequences is a fundamental problem in bioinformatics. Many recent algorithms for computing alignments through Smith-Waterman-Gotoh dynamic programming (DP) exploit Single Instruction Multiple Data (SIMD) operations on modern CPUs for speed. However, these advances have largely ignored difficulties associated with efficiently handling complex scoring matrices or large gaps (insertions or deletions).
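For readers unfamiliar with the baseline, the following is a minimal scalar sketch of Smith-Waterman-Gotoh style DP for a global alignment score with affine gaps. It is not taken from Block Aligner and uses no SIMD. The `Scoring` struct, the function name `gotoh_global`, the `NEG` sentinel, and the convention that a gap of length k scores `open + (k - 1) * extend` are all assumptions made for illustration.

```rust
/// Hypothetical sentinel standing in for "minus infinity"; far from overflow.
const NEG: i32 = i32::MIN / 4;

/// Hypothetical scoring parameters (penalties are negative numbers).
pub struct Scoring {
    pub mat: i32,    // score for a matching pair
    pub mis: i32,    // score for a mismatching pair
    pub open: i32,   // score of the first residue in a gap
    pub extend: i32, // score of each further residue in a gap
}

/// Global alignment score with affine gaps, using O(|b|) memory.
pub fn gotoh_global(a: &[u8], b: &[u8], s: &Scoring) -> i32 {
    let m = b.len();
    // h[j] holds H[i][j], the best score for the prefixes a[..i], b[..j].
    let mut h = vec![0i32; m + 1];
    // f[j] holds F[i][j], the best score ending in a gap that consumes a[i-1].
    let mut f = vec![NEG; m + 1];
    for j in 1..=m {
        h[j] = s.open + (j as i32 - 1) * s.extend;
    }
    for i in 1..=a.len() {
        let mut diag = h[0]; // H[i-1][0]
        h[0] = s.open + (i as i32 - 1) * s.extend;
        let mut e = NEG; // E[i][0]: no gap consuming b can end at column 0
        for j in 1..=m {
            // h[j - 1] was already updated to row i; h[j] still holds row i - 1.
            e = (h[j - 1] + s.open).max(e + s.extend);
            f[j] = (h[j] + s.open).max(f[j] + s.extend);
            let sub = if a[i - 1] == b[j - 1] { s.mat } else { s.mis };
            let best = (diag + sub).max(e).max(f[j]);
            diag = h[j];
            h[j] = best;
        }
    }
    h[m]
}
```

With `Scoring { mat: 2, mis: -3, open: -5, extend: -1 }`, aligning `ACGTTTACGT` against `ACGTACGT` scores eight matches and one gap of length two. Every cell of the matrix is computed here, which is the cost that banded and adaptive methods try to avoid.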

Results

We propose Block Aligner, a new SIMD-accelerated algorithm for aligning nucleotide and protein sequences against other sequences or position-specific scoring matrices. It introduces a new paradigm in which blocks in the DP matrix greedily shift, grow, and shrink, so that regions of the DP matrix are computed adaptively. Our algorithm achieves speedups of over 5-10 times relative to some previous methods while incurring an error rate of less than 3% on protein and long-read datasets, despite large gaps and low sequence identities.
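The greedy shifting idea can be illustrated with a heavily simplified scalar sketch. It is an assumption-laden toy, not the authors' implementation: the block has a fixed size (no growing or shrinking), gaps are linear rather than affine, there is no SIMD, and the full matrix is allocated for clarity. The names `shifting_block_score`, `fill`, the scoring constants, and the rule "shift right if the right border holds the larger maximum, otherwise shift down" are hypothetical choices for this example.

```rust
use std::ops::Range;

// Hypothetical scoring and sentinel values chosen for this sketch only.
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -2; // linear gap penalty per residue
const NEG: i32 = i32::MIN / 4; // marks cells that were never computed

/// Fill one rectangle of the DP matrix in row-major order.
fn fill(d: &mut [Vec<i32>], a: &[u8], b: &[u8], rows: Range<usize>, cols: Range<usize>) {
    for i in rows {
        for j in cols.clone() {
            d[i][j] = if i == 0 {
                GAP * j as i32
            } else if j == 0 {
                GAP * i as i32
            } else {
                let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
                (d[i - 1][j - 1] + sub)
                    .max(d[i - 1][j] + GAP)
                    .max(d[i][j - 1] + GAP)
            };
        }
    }
}

/// Heuristic global score: a `block` x `block` window moves by `step`
/// toward whichever border (right or bottom) has the larger maximum.
pub fn shifting_block_score(a: &[u8], b: &[u8], block: usize, step: usize) -> i32 {
    assert!(step > 0 && step <= block);
    let (n, m) = (a.len(), b.len());
    let mut d = vec![vec![NEG; m + 1]; n + 1];
    let (mut r, mut c) = (0usize, 0usize); // top-left corner of the block
    fill(&mut d, a, b, 0..block.min(n + 1), 0..block.min(m + 1));
    loop {
        let row_end = (r + block).min(n + 1);
        let col_end = (c + block).min(m + 1);
        let (can_down, can_right) = (row_end <= n, col_end <= m);
        if !can_down && !can_right {
            break; // the block now covers the bottom-right cell
        }
        let right_max = (r..row_end).map(|i| d[i][col_end - 1]).max().unwrap();
        let bottom_max = (c..col_end).map(|j| d[row_end - 1][j]).max().unwrap();
        if can_right && (!can_down || right_max > bottom_max) {
            c += step; // shift right: compute only the newly exposed columns
            fill(&mut d, a, b, r..row_end, col_end..(c + block).min(m + 1));
        } else {
            r += step; // shift down: compute only the newly exposed rows
            fill(&mut d, a, b, row_end..(r + block).min(n + 1), c..col_end);
        }
    }
    d[n][m]
}
```

Cells outside the path of the block keep the `NEG` sentinel, so the result is exact only when the optimal alignment stays inside the visited region. When `block` exceeds both sequence lengths, the first fill covers the whole matrix and the sketch reduces to ordinary full DP.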

Availability and implementation

Our algorithm is implemented for global, local, and X-drop alignments. It is available as a Rust library (with C bindings) at https://github.com/Daniel-Liu-c0deb0t/block-aligner.

SUBMITTER: Liu D

PROVIDER: S-EPMC10457662 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature

Publications

Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.

Liu D, Steinegger M

Bioinformatics (Oxford, England), 2023 Aug 1, issue 8

PMID: 37535681

Similar Datasets

Adaptable probabilistic mapping of short reads using position specific scoring matrices. | S-EPMC4021105 | biostudies-literature

3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures. | S-EPMC6149929 | biostudies-literature

A library of sensitive position-specific scoring matrices for high-throughput identification of nuclear pore complex subunits. | S-EPMC10034585 | biostudies-literature

Lineage-specific accelerated sequences underlying primate evolution. | S-EPMC10413682 | biostudies-literature

Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. | S-EPMC6841959 | biostudies-literature

Inferring homologous protein-protein interactions through pair position specific scoring matrix. | S-EPMC3549806 | biostudies-literature

A comparison of position-specific score matrices based on sequence and structure alignments. | S-EPMC2373449 | biostudies-literature

lra: A long read aligner for sequences and contigs. | S-EPMC8248648 | biostudies-literature

Aligning multiple genomic sequences with the threaded blockset aligner. | S-EPMC383317 | biostudies-literature

Substitution scoring matrices for proteins - An overview. | S-EPMC7586916 | biostudies-literature