
Dataset Information


Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing.


ABSTRACT: Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of the human reference are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2-5 Mb of the genome that is inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would weaken from 2.86 × 10⁻⁷ to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool liftOver that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy.
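
The abstract does not spell out the heuristic, so the following is only a minimal Python sketch of the underlying problem, not the authors' method. It assumes a hypothetical list of inverted intervals (the coordinates below are placeholders, not real inverted regions) and hypothetical names `Snv`, `classify`, and `complement_alleles`. Inside an inverted region the reference strand is reversed, so alleles should be complemented. For a non-palindromic SNV (e.g., A/G becoming T/C) a missed complement shows up as an allele mismatch, but for a palindromic SNV (A/T or C/G) the complemented pair looks like a plain ref/alt swap and can go unnoticed.

```python
from typing import List, NamedTuple, Tuple

# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Snv(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# (chromosome, start, end), inclusive. Hypothetical placeholder type and data.
Region = Tuple[str, int, int]


def is_palindromic(ref: str, alt: str) -> bool:
    """True for A/T or C/G SNVs, whose alleles are complements of each other."""
    return COMPLEMENT.get(ref) == alt


def in_inverted_region(snv: Snv, regions: List[Region]) -> bool:
    """True if the SNV position falls inside any listed inverted interval."""
    return any(c == snv.chrom and start <= snv.pos <= end
               for c, start, end in regions)


def classify(snv: Snv, regions: List[Region]) -> str:
    """Label an SNV by how risky its build conversion is."""
    if not in_inverted_region(snv, regions):
        return "outside_inversion"
    if is_palindromic(snv.ref, snv.alt):
        # Complementing A/T gives T/A: indistinguishable from a ref/alt swap.
        return "palindromic_in_inversion"
    return "strand_flip_in_inversion"


def complement_alleles(snv: Snv) -> Snv:
    """Return the SNV with both alleles complemented (strand flip)."""
    return snv._replace(ref=COMPLEMENT[snv.ref], alt=COMPLEMENT[snv.alt])
```

In a sketch like this, variants labeled `palindromic_in_inversion` are the ones that would need explicit handling after coordinate conversion, since allele comparison alone cannot reveal the strand change.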

SUBMITTER: Sheng X 

PROVIDER: S-EPMC9709082 | biostudies-literature | 2023 Jan

REPOSITORIES: biostudies-literature


Publications

Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing.

Sheng X, Xia L, Cahoon JL, Conti DV, Haiman CA, Kachuri L, Chiang CWK

HGG Advances, 2022 Nov 11, issue 1



Similar Datasets

S-EPMC3559437 | biostudies-literature
S-EPMC4143631 | biostudies-literature
PRJNA75845 | ENA
S-EPMC5520064 | biostudies-literature
S-EPMC6933688 | biostudies-literature
S-EPMC9731225 | biostudies-literature
S-EPMC10300601 | biostudies-literature
S-EPMC2709633 | biostudies-literature
S-EPMC11391750 | biostudies-literature
S-EPMC10788679 | biostudies-literature