
Dataset Information


Can Race-sensitive Biomedical Embeddings Improve Healthcare Predictive Models?


ABSTRACT: This reproducibility study presents an algorithm that weights training text by the race distribution of clinical research study samples when training biomedical embeddings. We extracted 12,864 PubMed abstracts published between January 1, 2000 and January 1, 2022 and weighted each one by the race distribution reported for its corresponding clinical trial registered on ClinicalTrials.gov. We trained Word2vec and BERT embeddings and evaluated them on predicting length of hospital stay (LHS) and intensive care unit (ICU) readmission using MIMIC-IV electronic health record data. Models trained with race-sensitive embeddings did not consistently outperform those trained with neutral embeddings, either for LHS prediction (similar mean absolute error, 1.975 vs. 2.008) or for ICU readmission prediction (similar accuracy, 74.61% vs. 75.17%, and the same AUC, 0.775). We conclude that demographic-sensitive embeddings do not necessarily yield the significant accuracy gains for health predictive models that have previously been reported in the literature.
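The abstract does not state how the race distribution is turned into a weight or how that weight enters embedding training. The sketch below is a minimal illustration under two loudly hypothetical assumptions: (1) an abstract's weight is the share of its trial's enrolled participants who belong to a chosen set of race groups, and (2) the weight is applied by replicating (oversampling) the abstract in the training corpus, so that an ordinary, unweighted Word2vec trainer sees it more often. The function names `race_weight` and `build_weighted_corpus` and the parameters `target_groups` and `max_copies` are illustrative and do not come from the study.

```python
def race_weight(distribution: dict[str, int], target_groups: set[str]) -> float:
    """Hypothetical weight: fraction of enrolled participants in target_groups.

    distribution maps a race category (as reported for a trial) to a
    participant count. Returns 0.0 when no participants are reported.
    """
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    in_target = sum(n for race, n in distribution.items() if race in target_groups)
    return in_target / total


def build_weighted_corpus(
    abstracts: list[list[str]],
    distributions: list[dict[str, int]],
    target_groups: set[str],
    max_copies: int = 5,
) -> list[list[str]]:
    """Hypothetical oversampling: repeat each tokenized abstract 1..max_copies
    times, scaled linearly by its race weight (weight 0 -> 1 copy,
    weight 1 -> max_copies copies)."""
    corpus: list[list[str]] = []
    for tokens, dist in zip(abstracts, distributions):
        weight = race_weight(dist, target_groups)
        copies = 1 + round(weight * (max_copies - 1))
        corpus.extend([tokens] * copies)
    return corpus
```

The resulting list of token lists has the shape that gensim's `Word2Vec(sentences=...)` accepts. Replication is only one way to realize a weight; a per-example loss weight or weighted sampling during training would be alternatives, and the study may have used a different scheme entirely.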

SUBMITTER: Liu H 

PROVIDER: S-EPMC10283113 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature


Publications

Can Race-sensitive Biomedical Embeddings Improve Healthcare Predictive Models?

Hao Liu, Nour Moustafa-Fahmy, Casey Ta, Chunhua Weng

AMIA Joint Summits on Translational Science Proceedings, 16 June 2023



Similar Datasets

| S-EPMC8800511 | biostudies-literature
| S-EPMC535463 | biostudies-literature
| S-EPMC7971091 | biostudies-literature
| S-EPMC10942778 | biostudies-literature
| S-EPMC11519529 | biostudies-literature
| S-EPMC7959619 | biostudies-literature
| S-EPMC6510737 | biostudies-literature
| S-EPMC8009088 | biostudies-literature
| S-EPMC10650050 | biostudies-literature
| S-EPMC6585427 | biostudies-literature