Dataset Information

Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.


ABSTRACT: Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implications for PGS-based risk stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error, especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
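
The idea of propagating genotype uncertainty into a PGS can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each SNP comes with posterior genotype probabilities P(g = 0), P(g = 1), P(g = 2) (for example, from an imputation tool), that SNPs are independent, and that effect sizes are known without error. The function names `pgs_point_estimate` and `pgs_credible_interval` are hypothetical.

```python
import numpy as np


def pgs_point_estimate(geno_probs, weights):
    """Expected PGS from dosages E[g] = P(g=1) + 2 * P(g=2).

    geno_probs: array of shape (n_snps, 3), each row sums to 1 (assumed input).
    weights:    array of shape (n_snps,), per-SNP effect sizes (assumed fixed).
    """
    dosages = geno_probs @ np.array([0.0, 1.0, 2.0])
    return float(dosages @ weights)


def pgs_credible_interval(geno_probs, weights, n_draws=2000, level=0.90, seed=0):
    """Monte Carlo interval for the PGS under genotype uncertainty.

    Draws genotypes independently per SNP from geno_probs (an assumption of
    this sketch), scores each draw, and returns the central quantiles.
    """
    rng = np.random.default_rng(seed)
    n_snps = geno_probs.shape[0]
    cum = np.cumsum(geno_probs, axis=1)            # cumulative probabilities
    u = rng.random((n_draws, n_snps))              # uniform draws in [0, 1)
    # Sampled genotype = number of cumulative thresholds the draw exceeds.
    genotypes = (u[:, :, None] > cum[None, :, :2]).sum(axis=2)
    scores = genotypes @ weights                   # one PGS per draw
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(scores, [tail, 1.0 - tail])
    return float(lo), float(hi)


# Toy example: 50 SNPs with completely uninformative genotype calls.
uncertain_probs = np.full((50, 3), 1.0 / 3.0)
unit_weights = np.ones(50)
point = pgs_point_estimate(uncertain_probs, unit_weights)
interval = pgs_credible_interval(uncertain_probs, unit_weights)
print(point, interval)
```

When genotype calls are confident, the sampled scores concentrate and the interval narrows toward the point estimate; with very low coverage, the probabilities are diffuse and the interval widens.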

SUBMITTER: Petter E 

PROVIDER: S-EPMC10432141 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature

Publications

Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.

Ella Petter, Yi Ding, Kangcheng Hou, Arjun Bhattacharya, Alexander Gusev, Noah Zaitlen, Bogdan Pasaniuc

American Journal of Human Genetics, 2023-07-24, issue 8


Similar Datasets

| S-EPMC3044311 | biostudies-literature
| S-EPMC3038916 | biostudies-literature
| S-EPMC8449454 | biostudies-literature
| S-EPMC10879460 | biostudies-literature
| S-EPMC11373650 | biostudies-literature
| S-EPMC8762119 | biostudies-literature
| S-EPMC4216915 | biostudies-literature
| S-EPMC6880438 | biostudies-literature
2021-01-31 | GSE165845 | GEO
| S-EPMC3117386 | biostudies-literature