Dataset Information

Curated variation benchmarks for challenging medically relevant autosomal genes.


ABSTRACT: The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.

SUBMITTER: Wagner J 

PROVIDER: S-EPMC9117392 | biostudies-literature | 2022 May

REPOSITORIES: biostudies-literature

Publications

Curated variation benchmarks for challenging medically relevant autosomal genes.

Justin Wagner, Nathan D. Olson, Lindsay Harris, Jennifer McDaniel, Haoyu Cheng, Arkarachai Fungtammasan, Yih-Chii Hwang, Richa Gupta, Aaron M. Wenger, William J. Rowell, Ziad M. Khan, Jesse Farek, Yiming Zhu, Aishwarya Pisupati, Medhat Mahmoud, Chunlin Xiao, Byunggil Yoo, Sayed Mohammad Ebrahim Sahraeian, Danny E. Miller, David Jáspez, José M. Lorenzo-Salazar, Adrián Muñoz-Barrera, Luis A. Rubio-Rodríguez, Carlos Flores, Giuseppe Narzisi, Uday Shanker Evani, Wayne E. Clarke, Joyce Lee, Christopher E. Mason, Stephen E. Lincoln, Karen H. Miga, Mark T. W. Ebbert, Alaina Shumate, Heng Li, Chen-Shan Chin, Justin M. Zook, Fritz J. Sedlazeck

Nature Biotechnology, 2022-02-07, issue 5


Similar Datasets

| S-EPMC2970608 | biostudies-literature
| S-EPMC9205953 | biostudies-literature
| S-EPMC9121574 | biostudies-literature
| S-EPMC10836243 | biostudies-literature
| S-EPMC2684120 | biostudies-literature
| S-EPMC11486962 | biostudies-literature
| S-EPMC10983041 | biostudies-literature
| S-EPMC10457089 | biostudies-literature
| S-EPMC10620610 | biostudies-literature