
Dataset Information


Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.


ABSTRACT:

Motivation

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.

Results

We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates of 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% in this setting.
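
To make the idea of co-linear chaining concrete, the following is a minimal sketch of it on a plain linear reference, not on a graph, and it is not GraphChainer's algorithm. It assumes each seed is a non-empty half-open interval pair `(read_start, read_end, ref_start, ref_end)` and that one seed may precede another only if it ends before the other starts in both the read and the reference. The name `chain_seeds` and the scoring (total read bases covered) are illustrative assumptions. In the graph setting described above, the reference-side ordering would instead be defined by reachability in the acyclic graph.

```python
from typing import List, Tuple

# Hypothetical seed layout: (read_start, read_end, ref_start, ref_end), half-open.
Seed = Tuple[int, int, int, int]


def chain_seeds(seeds: List[Seed]) -> Tuple[int, List[Seed]]:
    """Return (score, chain) for a best co-linear chain on a linear reference.

    Score is the total number of read bases covered by the chained seeds.
    Simple O(n^2) dynamic programming; assumes every seed is non-empty.
    """
    if not seeds:
        return 0, []

    # Sorting by read start guarantees any valid predecessor appears earlier.
    order = sorted(seeds, key=lambda s: (s[0], s[2]))
    n = len(order)
    best = [s[1] - s[0] for s in order]  # best chain score ending at seed j
    prev = [-1] * n                      # back-pointer for chain recovery

    for j in range(n):
        b = order[j]
        for i in range(j):
            a = order[i]
            # a precedes b if it ends before b starts in read AND reference.
            if a[1] <= b[0] and a[3] <= b[2]:
                cand = best[i] + (b[1] - b[0])
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i

    # Trace back from the best-scoring chain end.
    k = max(range(n), key=lambda idx: best[idx])
    score = best[k]
    chain: List[Seed] = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return score, chain


example = [(0, 5, 10, 15), (6, 10, 16, 20), (3, 8, 100, 105), (11, 15, 5, 9)]
score, chain = chain_seeds(example)
```

In this example the two seeds that are consistent in both coordinates are chained, while the seed mapping far away in the reference and the seed mapping backwards are left out. Extending a single best seed, by contrast, would consider each of these anchors in isolation.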

Availability and implementation

GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline are accessible from the same address.

SUBMITTER: Ma J 

PROVIDER: S-EPMC10423031 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature


Publications

Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.

Ma J, Cáceres M, Salmela L, Mäkinen V, Tomescu AI

Bioinformatics (Oxford, England), 2023-08-01, issue 8



Similar Datasets

| S-EPMC8665758 | biostudies-literature
| S-EPMC9636371 | biostudies-literature
| S-EPMC5351550 | biostudies-literature
| S-EPMC10541625 | biostudies-literature
| S-EPMC4906657 | biostudies-literature
| S-EPMC4652746 | biostudies-literature
| S-EPMC6122196 | biostudies-literature
2008-12-30 | GSE8880 | GEO
| S-EPMC6612831 | biostudies-literature
| S-EPMC5206522 | biostudies-literature