Dataset Information

Equitable machine learning counteracts ancestral bias in precision medicine, improving outcomes for all.


ABSTRACT: Gold standard genomic datasets severely under-represent non-European populations, leading to inequities and a limited understanding of human disease [1-8]. Therapeutics and outcomes remain hidden because we lack insights that we could gain from analyzing ancestry-unbiased genomic data. To address this significant gap, we present PhyloFrame, the first-ever machine learning method for equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating big data tissue-specific functional interaction networks, global population variation data, and disease-relevant transcriptomic data. Application of PhyloFrame to breast, thyroid, and uterine cancers shows marked improvements in predictive power across all ancestries, less model overfitting, and a higher likelihood of identifying known cancer-related genes. The ability to provide accurate predictions for underrepresented groups, in particular, is substantially increased. These results demonstrate how AI can mitigate ancestral bias in training data and contribute to equitable representation in medical research.

SUBMITTER: Graim K 

PROVIDER: S-EPMC10402189 | biostudies-literature | 2023 Jul

REPOSITORIES: biostudies-literature

Publications

Equitable machine learning counteracts ancestral bias in precision medicine, improving outcomes for all.

Smith LA, Cahill JA, Graim K

Research Square, 2023-07-27


Similar Datasets

| S-EPMC11894161 | biostudies-literature
2017-12-13 | GSE108004 | GEO
2017-11-30 | GSE107465 | GEO
| S-EPMC5937700 | biostudies-literature
2017-12-13 | GSE108003 | GEO
| S-EPMC11910143 | biostudies-literature
| S-EPMC8846336 | biostudies-literature
| S-EPMC10572800 | biostudies-literature
| S-EPMC10803549 | biostudies-literature
| S-EPMC9365193 | biostudies-literature