Dataset Information

Assessing stroke severity using electronic health record data: a machine learning approach.

ABSTRACT:

Background

Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include the severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.

Methods

NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes using natural language processing (NLP) methods. The study cohort consisted of the 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used to train the model. Several machine learning models were evaluated, with parameters optimized by cross-validation on the training set. The best-performing model, a random forest, was then evaluated on the holdout set.
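
The holdout and cross-validation design described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the patient ids are placeholders, the helper names `holdout_split` and `k_fold_indices` are hypothetical, and the fold count of 5 and the random seed are assumptions, since the abstract does not state them.

```python
import random


def holdout_split(ids, n_holdout, seed=0):
    """Shuffle ids and reserve n_holdout of them for final validation."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


# Placeholder ids standing in for the 7149 patients in the cohort.
patient_ids = list(range(7149))
train_ids, holdout_ids = holdout_split(patient_ids, n_holdout=1033)

# Cross-validation folds are drawn from the training set only;
# k=5 is an assumption, not a value reported in the abstract.
folds = list(k_fold_indices(len(train_ids), k=5))

print(len(train_ids), len(holdout_ids))
```

The point of the design is that the holdout patients play no part in model selection or parameter tuning, so the final evaluation is independent of those choices.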

Results

Using machine learning, we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R² (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
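
The three reported metrics are defined differently, and a short sketch may help make the distinction concrete. The code below uses only the standard library; the `observed` and `imputed` lists are made-up toy values, not study data, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and imputed scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy values for illustration only (not from the study).
observed = [2, 5, 9, 14, 20]
imputed = [4, 6, 8, 12, 15]

print(rmse(observed, imputed))
print(r_squared(observed, imputed))
print(pearson_r(observed, imputed))
```

R² penalizes systematic over- or under-estimation, whereas Pearson's R measures only linear association, so R² need not equal R squared. In the toy data the imputed scores track the observed ones closely but compress their range, so the squared correlation exceeds R². For the reported values, 0.76² ≈ 0.58, which is close to the reported R² of 0.57.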

Conclusions

Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.

SUBMITTER: Kogan E

PROVIDER: S-EPMC6950922 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Assessing stroke severity using electronic health record data: a machine learning approach.

Kogan E, Twyman K, Heap J, Milentijevic D, Lin JH, Alberts M

BMC Medical Informatics and Decision Making. 2020 Jan 8; issue 1

PMID: 31914991

Similar Datasets

Classifying Pseudogout Using Machine Learning Approaches With Electronic Health Record Data.
| S-EPMC7338229 | biostudies-literature

Identification of postoperative complications using electronic health record data and machine learning.
| S-EPMC7183252 | biostudies-literature

Utilizing electronic health record data to understand comorbidity burden among people living with HIV: a machine learning approach.
| S-EPMC8058944 | biostudies-literature

Postoperative delirium prediction using machine learning models and preoperative electronic health record data.
| S-EPMC8722098 | biostudies-literature

Predicting Intensive Care Unit Readmission with Machine Learning Using Electronic Health Record Data.
| S-EPMC6207111 | biostudies-literature

Machine learning applied to electronic health record data in home healthcare: A scoping review.
| S-EPMC9869861 | biostudies-literature

Classification of Current Procedural Terminology Codes from Electronic Health Record Data Using Machine Learning.
| S-EPMC7665375 | biostudies-literature

Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data.
| S-EPMC7532582 | biostudies-literature

Diagnostic signature for heart failure with preserved ejection fraction (HFpEF): a machine learning approach using multi-modality electronic health record data.
| S-EPMC9791783 | biostudies-literature

Early prediction of end-stage kidney disease using electronic health record data: a machine learning approach with a 2-year horizon.
| S-EPMC10898824 | biostudies-literature