Dataset Information

Comparing Machine Learning to Regression Methods for Mortality Prediction Using Veterans Affairs Electronic Health Record Clinical Data.

ABSTRACT:

Background

It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models compared with traditional regression methods.

Objective

The objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.

Design

This was a cohort study.

Setting

Veterans Affairs (VA) EHR data.

Participants

Veterans older than 50 years with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each).

Measurements and analytic methods

The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.
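As a rough illustration of the metrics named above, the following sketch computes a c-statistic and the four diagnostic test characteristics on toy data. It is not the study's code: the function names, the 0.5 risk threshold, and the eight example patients are assumptions made for illustration only, and the record does not say which thresholds or software the authors used.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied scores count as half a concordant pair."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return concordant / (len(events) * len(non_events))


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a cell of the 2x2 table is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_characteristics(y_true, y_score, threshold):
    """Sensitivity, specificity, PPV and NPV when predicted risk at or above
    `threshold` (a hypothetical cut-off) is treated as a positive test."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy data (invented): 1 = died within follow-up, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]

print(c_statistic(y_true, y_score))  # 0.9375 (15 of 16 pairs concordant)
print(diagnostic_characteristics(y_true, y_score, threshold=0.5))
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as the one described here a rank-based implementation would be the practical choice; the pairwise form is used only because it mirrors the definition directly.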

Results

The cohort's mean age (SD) was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Utilizing all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic among the regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well-calibrated and had similar diagnostic test characteristics.

Limitation

Our results should be confirmed in non-VA EHRs.

Conclusion

The difference in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and the 101-predictor stepwise logistic model for 10-year mortality prediction was modest, suggesting that stepwise regression remains a reasonable method for VA EHR mortality prediction model development.

SUBMITTER: Jing B

PROVIDER: S-EPMC9106858 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature


Publications

Comparing Machine Learning to Regression Methods for Mortality Prediction Using Veterans Affairs Electronic Health Record Clinical Data.

Jing B, Boscardin WJ, Deardorff WJ, Jeon SY, Lee AK, Donovan AL, Lee SJ

Medical Care. 2022 Mar 30; issue 6.


PMID: 35352701

Similar Datasets

Project description:

Background

Electronic health record (EHR) prediction models may be easier to use in busy clinical settings since EHR data can be auto-populated into models. This study assessed whether adding functional status and/or Medicare claims data (which are often not available in EHRs) improves the accuracy of a previously developed Veterans Affairs (VA) EHR-based mortality index.

Methods

This was a retrospective cohort study of veterans aged 75 years and older enrolled in VA primary care clinics followed from January 2014 to April 2020 (n = 62,014). We randomly split participants into development (n = 49,612) and validation (n = 12,402) cohorts. The primary outcome was all-cause mortality. We performed logistic regression with backward stepwise selection to develop a 100-predictor base model using 854 EHR candidate variables, including demographics, laboratory values, medications, healthcare utilization, diagnosis codes, and vitals. We incorporated functional measures in a base + function model by adding activities of daily living (range 0-5) and instrumental activities of daily living (range 0-7) scores. Medicare data, including healthcare utilization (e.g., emergency department visits, hospitalizations) and diagnosis codes, were incorporated in a base + Medicare model. A base + function + Medicare model included all data elements. We assessed model performance with the c-statistic, reclassification metrics, fraction of new information provided, and calibration plots.

Results

In the overall cohort, mean age was 82.6 years and 98.6% were male. At the end of follow-up, 30,263 participants (48.8%) had died. The base model c-statistic was 0.809 (95% CI 0.805-0.812) in the development cohort and 0.804 (95% CI 0.796-0.812) in the validation cohort. Validation cohort c-statistics for the base + function, base + Medicare, and base + function + Medicare models were 0.809 (95% CI 0.801-0.816), 0.811 (95% CI 0.803-0.818), and 0.814 (95% CI 0.807-0.822), respectively. Adding functional status and Medicare data resulted in similarly small improvements among other model performance measures. All models showed excellent calibration.

Conclusions

Incorporation of functional status and Medicare data into a VA EHR-based mortality index led to small but likely clinically insignificant improvements in model performance.
