Correlating electronic health record concepts with healthcare process events.
ABSTRACT: To study the relation between electronic health record (EHR) variables and healthcare process events. Lagged linear correlation was calculated between five healthcare process events and 84 EHR variables (24 clinical laboratory values and 60 clinical concepts extracted from clinical notes) in a 24-year database. The EHR variables were clustered for each healthcare process event and interpreted. Laboratory tests tended to cluster together and note concepts tended to cluster together. Within each of those two classes, the variables clustered into clinically sensible groupings. The exact groupings varied from healthcare process event to event, with the largest differences occurring between inpatient events and outpatient events. Unlike previously reported pairwise associations between variables, which highlighted correlations across the laboratory-clinical note divide, incorporating healthcare process events appeared to be sensitive to the manner in which the variables were collected. We believe that it may be possible to exploit this sensitivity to help knowledge engineers select variables and correct for biases.
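The lagged linear correlation mentioned in the abstract can be illustrated with a minimal sketch. It assumes both series are sampled on the same equally spaced time grid (for example, daily counts), which the abstract does not specify; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def lagged_correlation(event, variable, lag):
    """Correlate event[t] with variable[t + lag] over the overlapping window."""
    n = len(event)
    if n != len(variable):
        raise ValueError("series must share the same time grid")
    if lag >= 0:
        a, b = event[: n - lag], variable[lag:]
    else:
        a, b = event[-lag:], variable[: n + lag]
    return pearson(a, b)


# Hypothetical toy data: the variable echoes the event one time step later.
events = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
variable = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(events, variable, lag), 3))
```

Scanning a range of lags and keeping the full correlation profile per variable gives a feature vector that could then be clustered, although the clustering method used in the study is not stated here.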
Project description:OBJECTIVE:With their increasingly widespread adoption, electronic health records (EHR) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g. diagnosis, prescription, symptom) is often described by various synonyms across different EHR systems, hindering data integration for signal enhancement and complicating dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, tremendous human effort is needed for curation and maintenance - a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach to automate grouping medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner. METHODS:We present a novel data-driven grouping approach - multi-view banded spectral clustering (mvBSC) combining summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages the prior knowledge from the existing coding hierarchy, and a combining step that performs spectral clustering on an optimally weighted matrix. RESULTS:We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping qualities were evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and were found to consistently exhibit great similarity to the existing manual grouping counterpart. The resulting ICD groupings also enjoy comparable interpretability and are well aligned with the current ICD hierarchy. CONCLUSION:The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability.
It has the advantage of being efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, showing a significant step toward automating medical knowledge integration.
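One of the evaluation measures named above, the adjusted Rand index, can be sketched in a few lines. This is the standard pair-counting formula, not the authors' evaluation code; the function name and the example labels are hypothetical placeholders rather than real ICD groupings.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two groupings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must label the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both groupings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items placed together within each grouping separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both groupings are trivially identical
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: an automated grouping compared with a manual one.
manual = ["g1", "g1", "g2", "g2", "g3", "g3"]
automated = ["a", "a", "b", "b", "c", "c"]
print(adjusted_rand_index(manual, automated))
```

The index depends only on which items share a group, so renaming the cluster labels leaves the score unchanged; that property is what makes it suitable for comparing a data-driven grouping against a manually curated one.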
Project description:OBJECTIVE:Electronic health records (EHR) hold great promise for managing patient information in ways that improve healthcare delivery. Physicians differ, however, in their use of this health information technology (IT), and these differences are not well understood. The authors study the differences in individual physicians' EHR use patterns and identify perceptions of uncertainty as an important new variable in understanding EHR use. DESIGN:Qualitative study using semi-structured interviews and direct observation of physicians (n=28) working in a multispecialty outpatient care organization. MEASUREMENTS:We identified physicians' perceptions of uncertainty as an important variable in understanding differences in EHR use patterns. Drawing on theories from the medical and organizational literatures, we identified three categories of perceptions of uncertainty: reduction, absorption, and hybrid. We used an existing model of EHR use to categorize physician EHR use patterns as high, medium, and low based on degree of feature use, level of EHR-enabled communication, and frequency that EHR use patterns change. RESULTS:Physicians' perceptions of uncertainty were distinctly associated with their EHR use patterns. Uncertainty reductionists tended to exhibit high levels of EHR use, uncertainty absorbers tended to exhibit low levels of EHR use, and physicians demonstrating both perspectives of uncertainty (hybrids) tended to exhibit medium levels of EHR use. CONCLUSIONS:We find evidence linking physicians' perceptions of uncertainty with EHR use patterns. Study findings have implications for health IT research, practice, and policy, particularly in terms of impacting health IT design and implementation efforts in ways that consider differences in physicians' perceptions of uncertainty.
Project description:BACKGROUND:When a paucity of clinical information is communicated from ordering physicians to radiologists at the time of radiology order entry, suboptimal imaging interpretations and patient care may result. OBJECTIVES:Compare documentation of relevant clinical information in electronic health record (EHR) provider note to computed tomography (CT) order requisition, prior to ordering of head CT for emergency department (ED) patients presenting with headache. METHODS:In this institutional review board-approved retrospective observational study performed between April 1, 2013 and September 30, 2014 at an adult quaternary academic hospital, we reviewed data from 666 consecutive ED encounters for patients with headaches who received head CT. The primary outcome was the number of concept unique identifiers (CUIs) relating to headache extracted via ontology-based natural language processing from the history of present illness (HPI) section in ED notes compared with the number of concepts obtained from the imaging order requisition. RESULTS:Our analysis was conducted on cases where the HPI note section was completed prior to image order entry, which occurred in 23.1% (154/666) of encounters. For these 154 encounters, the number of CUIs specific to headache per note extracted from the HPI (median = 3, interquartile range [IQR]: 2-4) was significantly greater than the number of CUIs per encounter obtained from the imaging order requisition (median = 1, IQR: 1-2; Wilcoxon signed rank p < 0.0001). Extracted concepts from notes were distinct from order requisition indications in 92.9% (143/154) of cases. CONCLUSION:EHR provider notes are a valuable source of relevant clinical information at the time of imaging test ordering.
Automated extraction of clinical information from notes to prepopulate imaging order requisitions may improve communication between ordering physicians and radiologists, enhance efficiency of ordering process by reducing redundant data entry, and may help improve clinical relevance of clinical decision support at the time of order entry, potentially reducing provider burnout from extraneous alerts.
Project description:OBJECTIVE:To evaluate the effects of Process-Reengineering interventions on the Electronic Health Records (EHR) of a hospital over 7 years. MATERIALS AND METHODS:Temporal Variability Assessment (TVA) based on probabilistic data quality assessment was applied to the historic monthly-batched admission data of Hospital La Fe Valencia, Spain from 2010 to 2016. Routine healthcare data with a complete EHR was expanded by processed variables such as the Charlson Comorbidity Index. RESULTS:Four Process-Reengineering interventions were detected by quantifiable effects on the EHR: (1) the hospital relocation in 2011 involved a progressive reduction of admissions during the next four months, (2) the hospital services re-configuration increased the number of inter-services transfers, (3) the care-services re-distribution led to transfers between facilities, and (4) the assignment to the hospital of a new area with 80,000 patients in 2015 prompted discharge to home for follow-up and an update of the pre-surgery planned admissions protocol that produced a significant decrease in patient length of stay. DISCUSSION:TVA provides an indicator of the effect of process re-engineering interventions on healthcare practice. Evaluating the effect of facilities' relocation and the increase in citizens served (findings 1, 3-4), the impact of strategies (findings 2-3), and gradual changes in protocols (finding 4) may help hospital management optimize interventions based on their effect on EHRs or on data reuse. CONCLUSIONS:The effects of process re-engineering interventions on a hospital's EHR can be evaluated using the TVA methodology. Being aware of conditioned variations in EHR is of the utmost importance for the reliable reuse of routine hospitalization data.
Project description:Electronic health records (EHR) are introduced into healthcare organizations worldwide to improve patient safety, healthcare quality and efficiency. A rigorous evaluation of this technology is important to reduce potential negative effects on patients and staff, to provide decision makers with accurate information for system improvement and to ensure return on investment. Therefore, this study develops a theoretical model and questionnaire survey instrument to assess the success of organizational EHR in routine use from the viewpoint of nursing staff in residential aged care homes. The proposed research model incorporates six variables in the reformulated DeLone and McLean information systems success model: system quality, information quality, service quality, use, user satisfaction and net benefits. Two variables, training and self-efficacy, were also incorporated into the model. A questionnaire survey instrument was designed to measure the eight variables in the model. After a pilot test, the measurement scale was used to collect data from 243 nursing staff members in 10 residential aged care homes belonging to three management groups in Australia. Partial least squares path modeling was conducted to validate the model. The validated EHR systems success model predicts the impact of the four antecedent variables (training, self-efficacy, system quality and information quality) on the net benefits, the indicator of EHR systems success, through the intermediate variables use and user satisfaction. A 24-item measurement scale was developed to quantitatively evaluate the performance of an EHR system. The parsimonious EHR systems success model and the measurement scale can be used to benchmark EHR systems success across organizations and units and over time.
Project description:BACKGROUND:Data heterogeneity is a common phenomenon related to the secondary use of electronic health records (EHR) data from different sources. The Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM) organizes healthcare data into standard data structures using concepts that are explicitly and formally specified through standard vocabularies, thereby facilitating large-scale analysis. The objective of this study is to design, develop, and evaluate generic survival analysis routines built using the OHDSI CDM. METHODS:We used intrahepatic cholangiocarcinoma (ICC) patient data to implement CDM-based survival analysis methods. Our methods comprise the following modules: 1) Mapping local terms to standard OHDSI concepts: the variables and values related to demographic characteristics, medical history, smoking status, laboratory results, and tumor feature data were mapped to standard OHDSI concepts through a manual analysis; 2) Loading patient data into the CDM using the concept mappings; 3) Developing an R interface that supports the portable survival analysis on top of OHDSI CDM, and comparing the CDM-based analysis results with those using traditional statistical analysis methods. RESULTS:Our dataset contained 346 patients diagnosed with ICC. The collected clinical data contains 115 variables, of which 75 variables were mapped to the OHDSI concepts. These concepts mainly belong to four domains: condition, observation, measurement, and procedure. The corresponding standard concepts are scattered in six vocabularies: ICD10CM, ICD10PCS, SNOMED, LOINC, NDFRT, and READ. We loaded a total of 25,950 patient data records into the OHDSI CDM database. However, 40 variables failed to map to the OHDSI CDM as they mostly belong to imaging data and pathological data.
CONCLUSIONS:Our study demonstrates that conducting survival analysis using the OHDSI CDM is feasible and can produce reusable analysis routines. However, challenges to be overcome include 1) semantic loss caused by inaccurate mapping and value normalization; 2) incomplete OHDSI vocabularies describing imaging data, pathological data, and modular data representation.
Project description:Columbia Open Health Data (COHD) is a publicly accessible database of electronic health record (EHR) prevalence and co-occurrence frequencies between conditions, drugs, procedures, and demographics. COHD was derived from Columbia University Irving Medical Center's Observational Health Data Sciences and Informatics (OHDSI) database. The lifetime dataset, derived from all records, contains 36,578 single concepts (11,952 conditions, 12,334 drugs, and 10,816 procedures) and 32,788,901 concept pairs from 5,364,781 patients. The 5-year dataset, derived from records from 2013-2017, contains 29,964 single concepts (10,159 conditions, 10,264 drugs, and 8,270 procedures) and 15,927,195 concept pairs from 1,790,431 patients. Exclusion of rare concepts (count ≤ 10) and Poisson randomization enable data sharing by eliminating risks to patient privacy. EHR prevalences are informative of healthcare consumption rates. Analyses of co-occurrence frequencies via relative frequency analysis and observed-expected frequency ratio are informative of associations between clinical concepts, useful for biomedical research tasks such as drug repurposing and pharmacovigilance. COHD is publicly accessible through a web application-programming interface (API) and downloadable from the Figshare repository. The code is available on GitHub.
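The two co-occurrence measures named above can be sketched as follows. This assumes the expected pair count is the one implied by statistical independence of the two concepts, and that relative frequency is the pair count divided by the count of a chosen base concept; the abstract does not spell out either definition, so treat both as assumptions. The function names and all counts are hypothetical and are not values from COHD.

```python
from math import log


def observed_expected_ratio(pair_count, count_a, count_b, n_patients):
    """Observed co-occurrence count divided by the count expected under independence."""
    if min(pair_count, count_a, count_b, n_patients) <= 0:
        raise ValueError("all counts must be positive")
    expected = count_a * count_b / n_patients
    return pair_count / expected


def relative_frequency(pair_count, base_count):
    """Share of patients with the base concept who also have the other concept."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return pair_count / base_count


# Hypothetical counts for one concept pair in a 10,000-patient dataset.
ratio = observed_expected_ratio(pair_count=50, count_a=100, count_b=200, n_patients=10_000)
print(ratio, log(ratio))  # the log makes over- and under-representation symmetric
print(relative_frequency(pair_count=50, base_count=200))
```

A ratio above 1 suggests the pair co-occurs more often than independence would predict. With randomized or small counts the ratio becomes noisy, which is one reason to look at it alongside the raw counts.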
Project description:Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
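The AUROC values reported above have a simple pairwise interpretation, sketched below: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration, not the evaluation code used in the study; the function name and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = event occurred, scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

The double loop is quadratic in the number of cases, which is fine for illustration; at the scale described in the abstract a rank-based computation would be the practical choice.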
Project description:Background: With the widespread adoption of electronic health records (EHRs), there is a growing awareness of problems in EHR training for new users and subsequent problems with the quality of information present in EHR-generated progress notes. By standardizing the case, simulation allows for the discovery of EHR patterns of use as well as a modality to aid in EHR training. Objective: To develop a high-fidelity EHR training exercise for internal medicine interns to understand patterns of EHR utilization in the generation of daily progress notes. Methods: Three months after beginning their internship, 32 interns participated in an EHR simulation designed to assess patterns in note writing and generation. Each intern was given a simulated chart and instructed to create a daily progress note. Notes were graded for use of copy-paste, macros, and accuracy of presented data. Results: A total of 31 out of 32 interns (97%) completed the exercise. There was wide variance in use of macros to populate data, with multiple macro types used for the same data category. Three-quarters of notes contained either copy-paste elements or the elimination of active medical problems from the prior days' notes. This was associated with a significant number of quality issues, including failure to recognize a lack of deep vein thrombosis prophylaxis, medications stopped on admission, and issues in prior discharge summary. Conclusions: Interns displayed wide variation in the process of creating progress notes. Additional studies are being conducted to determine the impact EHR-based simulation has on standardization of note content.
Project description:BACKGROUND:Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE:The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS:We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS:Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS:We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. 
The final set of questions is the first test of EHR note comprehension.
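To make the item response theory step more concrete, here is a minimal sketch of an item response function and its information curve. It assumes a two-parameter logistic (2PL) model; the abstract does not say which IRT model was fitted, so this is an illustrative assumption, and the function names and parameter values are hypothetical.

```python
from math import exp


def prob_correct(theta, discrimination, difficulty):
    """2PL probability that a respondent with ability theta answers the item correctly."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


def item_information(theta, discrimination, difficulty):
    """Fisher information the item contributes at ability level theta (2PL)."""
    p = prob_correct(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


# Hypothetical item: moderately discriminating, slightly harder than average.
a, b = 1.2, 0.5
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(prob_correct(theta, a, b), 3), round(item_information(theta, a, b), 3))
```

Under this model an item is most informative near its difficulty, so retaining items with high discrimination spread across the ability range is one plausible way to arrive at a shorter test; whether the authors selected items this way is not stated.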