
Dataset Information


Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example.


ABSTRACT:

Background

Machine learning (ML) models are beginning to proliferate in psychiatry; however, ML models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for opioid use disorder (OUD), and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding.

Methods

We use five ML algorithms trained on 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD versus ancestry in an out-of-sample test set (N = 1000, stratified into four equal groups of n = 250: OUD cases and controls of European ancestry, and OUD cases and controls of African ancestry). We rerun the analyses with 8 random sets of allele-frequency-matched SNPs and contrast the findings with 11 genome-wide significant variants for tobacco smoking. To test generalizability, we generate and test a random phenotype.
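The test-set design and the matched-SNP control described above can be sketched in Python. This is a minimal, hypothetical illustration rather than the authors' code: the helper names `balanced_test_set` and `matched_snp_sets`, the allele-frequency tolerance `tol`, and the one-match-per-candidate rule are assumptions, since the abstract does not say how samples were drawn or how SNPs were matched.

```python
import random


def balanced_test_set(samples, per_group=250, seed=0):
    """Draw per_group sample IDs from every (ancestry, is_case) stratum.

    samples: iterable of (sample_id, ancestry, is_case) tuples.
    With two ancestries this yields N = 2 x 2 x per_group (1000 for 250).
    """
    rng = random.Random(seed)
    strata = {}
    for sample_id, ancestry, is_case in samples:
        strata.setdefault((ancestry, is_case), []).append(sample_id)
    test_ids = []
    for key in sorted(strata):
        if len(strata[key]) < per_group:
            raise ValueError(f"stratum {key} has fewer than {per_group} samples")
        test_ids.extend(rng.sample(strata[key], per_group))
    return test_ids


def matched_snp_sets(candidate_af, pool_af, n_sets=8, tol=0.01, seed=0):
    """Build n_sets random SNP sets, each holding one pool SNP per candidate
    whose allele frequency is within +/- tol of that candidate's frequency.

    candidate_af, pool_af: dicts mapping SNP ID -> allele frequency.
    The tolerance and matching rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Candidate SNPs themselves are never eligible as their own matches.
    pool = [(snp, af) for snp, af in pool_af.items() if snp not in candidate_af]
    sets = []
    for _ in range(n_sets):
        used = set()  # avoid picking the same SNP twice within one set
        chosen = []
        for cand, af in candidate_af.items():
            options = [snp for snp, p in pool if abs(p - af) <= tol and snp not in used]
            if not options:
                raise ValueError(f"no unused SNP within {tol} of {cand}")
            pick = rng.choice(options)
            used.add(pick)
            chosen.append(pick)
        sets.append(chosen)
    return sets
```

Because every ancestry contributes equal numbers of cases and controls, a model that has learned only ancestry cannot exceed 50% accuracy for case status on such a test set, which is what makes the design a check for confounding.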

Results

When ancestry was balanced, none of the 5 ML algorithms predicted OUD better than chance; instead, their predictions were confounded with ancestry in the out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype showed that bias attributable to ancestral confounding could affect any ML-based genetic prediction.
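The mechanism behind these results can be reproduced with a toy simulation. It is a sketch under stated assumptions, not the authors' pipeline: two hypothetical populations (`popA`, `popB`) with independently drawn allele frequencies at 16 SNPs, a phenotype assigned at random (independent of genotype within each population), a training set in which cases come mostly from `popB`, and a simple mean-difference linear score standing in for the five ML algorithms. Because genotype carries information about population but none about phenotype, the score can only learn ancestry.

```python
import random

POPS = ("popA", "popB")  # hypothetical populations


def simulate_genotypes(afs, n, rng):
    """Draw n individuals; each genotype is the count (0, 1 or 2) of the
    effect allele at each SNP, assuming Hardy-Weinberg equilibrium."""
    return [[int(rng.random() < p) + int(rng.random() < p) for p in afs] for _ in range(n)]


def make_cohort(afs_by_pop, design, rng):
    """design maps (population, is_case) -> sample size. Case status is
    assigned independently of genotype, so the phenotype is pure noise."""
    X, y, pop = [], [], []
    for (p, is_case), n in design.items():
        for g in simulate_genotypes(afs_by_pop[p], n, rng):
            X.append(g)
            y.append(is_case)
            pop.append(p)
    return X, y, pop


def fit_mean_difference(X, y):
    """Toy linear classifier: weight_j = mean genotype in cases minus mean in
    controls; call 'case' when the score exceeds the midpoint of the two
    group mean scores."""
    cases = [g for g, c in zip(X, y) if c]
    ctrls = [g for g, c in zip(X, y) if not c]
    w = [sum(g[j] for g in cases) / len(cases) - sum(g[j] for g in ctrls) / len(ctrls)
         for j in range(len(X[0]))]

    def score(g):
        return sum(wj * gj for wj, gj in zip(w, g))

    threshold = (sum(map(score, cases)) / len(cases) + sum(map(score, ctrls)) / len(ctrls)) / 2
    return lambda g: score(g) > threshold


def accuracy(predict, X, y):
    return sum(predict(g) == c for g, c in zip(X, y)) / len(y)


def case_rate(predict, X, pop, group):
    """Fraction of individuals from `group` that the model calls cases."""
    calls = [predict(g) for g, p in zip(X, pop) if p == group]
    return sum(calls) / len(calls)


def run(seed=0, n_snps=16):
    rng = random.Random(seed)
    afs = {p: [rng.uniform(0.1, 0.9) for _ in range(n_snps)] for p in POPS}
    # Confounded design (assumed): 80% of cases from popB, 80% of controls from popA.
    confounded = {("popB", True): 800, ("popA", True): 200,
                  ("popB", False): 200, ("popA", False): 800}
    X_tr, y_tr, _ = make_cohort(afs, confounded, rng)
    predict = fit_mean_difference(X_tr, y_tr)
    # Balanced design: equal cases and controls within each population.
    balanced = {(p, c): 250 for p in POPS for c in (True, False)}
    X_bal, y_bal, pop_bal = make_cohort(afs, balanced, rng)
    # Held-out set that repeats the training imbalance.
    X_cf, y_cf, _ = make_cohort(afs, confounded, rng)
    rates = {p: case_rate(predict, X_bal, pop_bal, p) for p in POPS}
    return accuracy(predict, X_bal, y_bal), accuracy(predict, X_cf, y_cf), rates


if __name__ == "__main__":
    acc_balanced, acc_confounded, rates = run()
    print(f"balanced accuracy:   {acc_balanced:.2f}")
    print(f"confounded accuracy: {acc_confounded:.2f}")
    print("predicted-case rate by population:", rates)
```

In this sketch the balanced-test accuracy should sit near 0.5, while accuracy on a test set that repeats the training imbalance looks clearly better than chance and the predicted-case rate is far higher in `popB` than in `popA`: apparent predictive accuracy that is entirely ancestry.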

Conclusions

Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits like addiction, particularly when using candidate variants.

SUBMITTER: Hatoum AS 

PROVIDER: S-EPMC9358969 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature


Publications

Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example.

Hatoum Alexander S, Wendt Frank R, Galimberti Marco, Polimanti Renato, Neale Benjamin, Kranzler Henry R, Gelernter Joel, Edenberg Howard J, Agrawal Arpana

Drug and Alcohol Dependence, 2021-10-09, Pt B



Similar Datasets

| S-EPMC11372321 | biostudies-literature
| S-EPMC10202846 | biostudies-literature
| S-EPMC10541796 | biostudies-literature
| S-EPMC9754174 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC11556439 | biostudies-literature
| S-EPMC9720482 | biostudies-literature
| S-EPMC8871589 | biostudies-literature
| S-EPMC8661425 | biostudies-literature
| S-EPMC10606184 | biostudies-literature