Dataset Information

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes.


ABSTRACT: Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
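
The abstract refers to k-mer counts taken from short reads as the input signal for genotyping. As a rough illustration of what such counts are, the following minimal Python sketch tallies canonical k-mers (a k-mer and its reverse complement are treated as the same key) across a handful of reads. This is not PanGenie's implementation; the function names `canonical` and `count_kmers`, the choice of `k`, and the rule of skipping k-mers that contain non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter

# Translation table for complementing DNA bases (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over an iterable of read sequences.

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    example_reads = ["ACGTA", "ACNGT"]  # toy reads, not real data
    print(count_kmers(example_reads, k=3))
```

Counting canonical k-mers is a common convention because a sequencing read may originate from either DNA strand. How such counts are then combined with a haplotype-resolved pangenome to infer genotypes is described in the publication itself and is not reproduced here.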

SUBMITTER: Ebler J 

PROVIDER: S-EPMC9005351 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature

Publications

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes.

Jana Ebler, Peter Ebert, Wayne E. Clarke, Tobias Rausch, Peter A. Audano, Torsten Houwaart, Yafei Mao, Jan O. Korbel, Evan E. Eichler, Michael C. Zody, Alexander T. Dilthey, Tobias Marschall

Nature Genetics, 2022-04-11, issue 4


Similar Datasets

S-EPMC10664547 | biostudies-literature
GSE178827 | GEO | 2021-12-23
S-EPMC9417177 | biostudies-literature
S-EPMC11642879 | biostudies-literature
S-EPMC6521551 | biostudies-literature
S-EPMC7744038 | biostudies-literature
S-EPMC11774466 | biostudies-literature
S-EPMC11568064 | biostudies-literature
S-EPMC6136750 | biostudies-literature
S-EPMC10519407 | biostudies-literature