Comparative effectiveness research of chronic hepatitis B and C cohort study (CHeCS): improving data collection and cohort identification.
ABSTRACT: The Chronic Hepatitis Cohort Study (CHeCS) is a longitudinal observational study of the risks and benefits of treatments and care in patients with chronic hepatitis B (HBV) and C (HCV) infection from four US health systems. We hypothesized that comparative effectiveness methods, including a centralized data management system and an adaptive approach to cohort selection, would improve cohort selection while controlling data quality and reducing cost. Cohort selection and data collection were performed primarily via the electronic health record (EHR); cases were confirmed via chart abstraction. Two parallel sources fed data to a centralized data management system: direct EHR data collection with common data elements, and chart abstraction via electronic data capture. An adaptive Classification and Regression Tree (CART) identified a set of electronic variables to improve case ascertainment accuracy. Over 16 million patient records were collected on 23 case report forms in 2006-2008. The vast majority of data (99.2%) were collected electronically from the EHR; only 0.8% were collected via chart abstraction. Initial electronic criteria identified 12,144 chronic hepatitis patients; 10,098 were confirmed via chart abstraction, with positive predictive values (PPVs) of 79% and 83% for HBV and HCV, respectively. CART-optimized models significantly increased the PPV to 88% for HBV and 95% for HCV. CHeCS is a comparative effectiveness research project that leverages centralized electronic data collection and adaptive cohort identification to enhance study efficiency. The adaptive CART model significantly improved the positive predictive value of cohort identification.
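The case-ascertainment arithmetic above can be sketched as follows; the counts come from the abstract, and the function name is illustrative:

```python
def positive_predictive_value(confirmed, flagged):
    """Fraction of electronically flagged cases that chart abstraction confirms."""
    return confirmed / flagged

# 12,144 patients met the initial electronic criteria; 10,098 were confirmed.
ppv_initial = positive_predictive_value(10_098, 12_144)
print(f"Initial combined PPV: {ppv_initial:.1%}")  # ~83% before CART optimization
```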
Project description:While electronic health records (EHRs) play a key role in increasing colorectal cancer (CRC) screening by identifying individuals who are overdue, important shortfalls remain. As part of the Strategies and Opportunities to STOP Colon Cancer (STOP CRC) study, we assessed the accuracy of EHR codes in identifying patients eligible for CRC screening. We selected a stratified random sample of 800 study participants from 26 participating clinics in the Pacific Northwest region of the USA. We compared data obtained through EHR codes against a manual chart audit; a trained chart abstractor completed the abstraction for both eligible and ineligible patients. Of 520 individuals identified via the EHR as in need of CRC screening, 459 were confirmed through chart review (positive predictive value = 88%). Of 280 individuals flagged as up-to-date in their screening per EHR data, 269 were confirmed through chart review (negative predictive value = 96%). Among the 61 patients incorrectly classified as eligible, 83.6% of disagreements were due to evidence of a prior colonoscopy or referral that was not captured in recognizable fields in the EHR. Our findings highlight the importance of better capture of past screening events in the EHR. While the need for better population-based data is not unique to CRC screening, CRC screening provides an important example of using population-based data not only to track care but also to deliver interventions.
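The reported predictive values follow directly from the validation counts; a minimal sketch using the abstract's numbers:

```python
# Counts reported in the STOP CRC validation sample.
flagged_due, confirmed_due = 520, 459          # flagged as needing screening; confirmed
flagged_current, confirmed_current = 280, 269  # flagged as up-to-date; confirmed

ppv = confirmed_due / flagged_due
npv = confirmed_current / flagged_current
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 88%, NPV = 96%
```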
Project description:The objective was to compare case ascertainment, agreement, validity, and missing values for clinical research data obtained, processed, and linked electronically from electronic health records (EHR) compared to "manual" data processing and record abstraction in a cohort of out-of-hospital trauma patients. This was a secondary analysis of two sets of data collected for a prospective, population-based, out-of-hospital trauma cohort evaluated by 10 emergency medical services (EMS) agencies transporting to 16 hospitals, from January 1, 2006, through October 2, 2007. Eighteen clinical, operational, procedural, and outcome variables were collected and processed separately and independently using two parallel data processing strategies by personnel blinded to patients in the other group. The electronic approach included EHR data exports from EMS agencies, reformatting, and probabilistic linkage to outcomes from local trauma registries and state discharge databases. The manual data processing approach included chart matching, data abstraction, and data entry by a trained abstractor. Descriptive statistics, measures of agreement, and validity were used to compare the two approaches to data processing. During the 21-month period, 418 patients underwent both data processing methods and formed the primary cohort. Agreement was good to excellent (kappa = 0.76 to 0.97; intraclass correlation coefficient [ICC] = 0.49 to 0.97), with exact agreement in 67% to 99% of cases and a median difference of zero for all continuous and ordinal variables. The proportions of missing out-of-hospital values were similar between the two approaches, although electronic processing generated more missing outcomes (87 of 418, 21%, 95% confidence interval [CI] = 17% to 25%) than the manual approach (11 of 418, 3%, 95% CI = 1% to 5%). Case ascertainment of eligible injured patients was greater using electronic methods (n = 3,008) compared to manual methods (n = 629).
In this sample of out-of-hospital trauma patients, an all-electronic data processing strategy identified more patients and generated values with good agreement and validity compared to traditional data collection and processing methods.
Project description:OBJECTIVE:To evaluate the ability of electronic health record (EHR) data extracted into a data-sharing system to accurately identify contraceptive use. STUDY DESIGN:We compared rates of contraceptive use from electronic extraction of EHR data via a data-sharing system and manual abstraction of the EHR among 142 female patients ages 15-49 years from a family medicine clinic within a primary care practice-based research network (PBRN). Cohen's kappa coefficient measured agreement between electronic extraction and manual abstraction. RESULTS:Manual abstraction identified 62% of women as contraceptive users, whereas electronic extraction identified only 27%. Long-acting reversible contraceptive (LARC) methods had 96% agreement (Cohen's kappa 0.78; confidence interval, 0.57-0.99) between electronic extraction and manual abstraction. EHR data extracted via a data-sharing system were unable to identify barrier or over-the-counter contraceptives. CONCLUSIONS:Electronic extraction found substantially lower overall rates of contraceptive method use, but produced more comparable LARC method use rates when compared to manual abstraction among women in this study's primary care clinic. IMPLICATIONS:Quality metrics related to contraceptive use that rely on EHR data in this study's data-sharing system likely underestimated true contraceptive use.
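The agreement statistic used above, Cohen's kappa, corrects raw agreement for the agreement expected by chance. A minimal stdlib sketch with toy ratings (the data are hypothetical, not the study's):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (any label set)."""
    n = len(rater_a)
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n        # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    pe = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)     # chance agreement
    return (po - pe) / (1 - pe)

# Toy data: electronic extraction vs. manual abstraction (1 = contraceptive user).
electronic = [1, 1, 1, 0, 0, 0, 0, 1]
manual     = [1, 1, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(electronic, manual), 2))  # 0.5
```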
Project description:BACKGROUND:Existing prediction models for acute respiratory distress syndrome (ARDS) require manual chart abstraction and have only fair performance, limiting their suitability for driving clinical interventions. We sought to develop a machine learning approach for the prediction of ARDS that (a) leverages electronic health record (EHR) data, (b) is fully automated, and (c) can be applied at clinically relevant time points throughout a patient's stay. METHODS AND FINDINGS:We trained a risk stratification model for ARDS using a cohort of 1,621 patients with moderate hypoxia from a single center in 2016, of which 51 patients developed ARDS. We tested the model in a temporally distinct cohort of 1,122 patients from 2017, of which 27 patients developed ARDS. Gold standard diagnosis of ARDS was made by intensive care trained physicians during retrospective chart review. We considered both linear and non-linear approaches to learning the model. The best model used L2-logistic regression with 984 features extracted from the EHR. For patients observed in the hospital at least six hours who then developed moderate hypoxia, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.81 (95% CI: 0.73-0.88). Selecting a threshold based on the 85th percentile of risk, the model had a sensitivity of 56% (95% CI: 35%, 74%), specificity of 86% (95% CI: 85%, 87%) and positive predictive value of 9% (95% CI: 5%, 14%), identifying a population at four times higher risk for ARDS than other patients with moderate hypoxia and 17 times the risk of hospitalized adults. CONCLUSIONS:We developed an ARDS prediction model based on EHR data with good discriminative performance. Our results demonstrate the feasibility of a machine learning approach to risk stratifying patients for ARDS solely from data extracted automatically from the EHR.
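The low PPV at the chosen threshold is a direct consequence of the low ARDS prevalence; Bayes' rule reproduces the reported figure from the abstract's sensitivity, specificity, and case counts:

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

# Test-cohort figures from the abstract: 27 ARDS cases among 1,122 patients.
ppv = ppv_from_rates(0.56, 0.86, 27 / 1122)
print(f"PPV ≈ {ppv:.0%}")  # ≈ 9%, matching the reported value
```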
Project description:<h4>Background</h4>The Meaningful Use (MU) program has increased the national emphasis on electronic measurement of hospital quality.<h4>Objective</h4>To evaluate stroke MU and one VHA stroke electronic clinical quality measure (eCQM) in national VHA data and determine sources of error in using centralized electronic health record (EHR) data.<h4>Design</h4>Our study is a retrospective cross-sectional study of stroke quality measure eCQMs vs. chart review in a national EHR. We developed local SQL algorithms to generate the eCQMs, then modified them to run on VHA Central Data Warehouse (CDW) data. eCQM results were generated from CDW data in 2130 ischemic stroke admissions in 11 VHA hospitals. Local and CDW results were compared to chart review.<h4>Main measures</h4>We calculated the raw proportion of matching cases, sensitivity/specificity, and positive/negative predictive values (PPV/NPV) for the numerators and denominators of each eCQM. To assess overall agreement for each eCQM, we calculated a weighted kappa and prevalence-adjusted bias-adjusted kappa statistic for a three-level outcome: ineligible, eligible-passed, or eligible-failed.<h4>Key results</h4>In five eCQMs, the proportion of matched cases between CDW and chart ranged from 95.4%-99.7% (denominators) and 87.7%-97.9% (numerators). PPVs tended to be higher (range 96.8%-100% in CDW) with NPVs less stable and lower. Prevalence-adjusted bias-adjusted kappas for overall agreement ranged from 0.73 to 0.95. Common errors included difficulty in identifying: (1) mechanical VTE prophylaxis devices, (2) hospice and other specific discharge disposition, and (3) contraindications to receiving care processes.<h4>Conclusions</h4>Stroke MU indicators can be relatively accurately generated from existing EHR systems (nearly 90% match to chart review), but accuracy decreases slightly in central compared to local data sources.
To improve stroke MU measure accuracy, EHRs should include standardized data elements for devices, discharge disposition (including hospice and comfort care status), and recording contraindications.
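The prevalence-adjusted bias-adjusted kappa (PABAK) used above depends only on raw agreement and the number of outcome levels; a minimal sketch with a hypothetical agreement rate (not a figure from the study):

```python
def pabak(observed_agreement, n_categories):
    """Prevalence-adjusted bias-adjusted kappa for a k-level outcome:
    PABAK = (k * Po - 1) / (k - 1)."""
    k = n_categories
    return (k * observed_agreement - 1) / (k - 1)

# Hypothetical: 90% raw agreement on the three-level outcome
# (ineligible / eligible-passed / eligible-failed).
print(round(pabak(0.90, 3), 2))  # 0.85
```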
Project description:<h4>Importance</h4>A cornerstone of precision medicine is the identification and use of biomarkers that help subtype patients for targeted treatment. Such an approach requires the development and subsequent interrogation of large-scale biobanks linked to well-annotated clinical data. Traditional means of creating these data-linked biobanks are costly and lengthy, especially in acute conditions that require time-sensitive clinical data and biospecimens.<h4>Objectives</h4>To develop a virtually enabled biorepository and electronic health record (EHR)-embedded, scalable cohort for precision medicine (VESPRE) and compare the feasibility, enrollment, and costs of VESPRE with those of a traditional study design in acute care.<h4>Design, setting, and participants</h4>In a prospective cohort study, the EHR-embedded screening alert was generated for 3428 patients, and 2199 patients (64%) were eligible and screened. Of these, 1027 patients (30%) were enrolled. VESPRE was developed for regulatory compliance, feasibility, internal validity, and cost in a prospective cohort of 1027 patients (aged ≥18 years) with Sepsis-3 within 6 hours of presentation to the emergency department. The VESPRE infrastructure included (1) automated EHR screening, (2) remnant blood collection for creation of a virtually enabled biorepository, and (3) automated clinical data abstraction. The study was conducted at an academic institution in southwestern Pennsylvania from October 17, 2017, to June 6, 2019.<h4>Main outcomes and measures</h4>Regulatory compliance, enrollment, internal validity of automated screening, biorepository acquisition, and costs.<h4>Results</h4>Of the 1027 patients enrolled in the study, 549 were included in the proof-of-concept analysis (305 [56%] men); median (SD) age was 59 (17) years. VESPRE collected 12,963 remnant blood and urine samples and demonstrated adequate feasibility for clinical, biomarker, and microbiome analyses.
Over the 20-month test, the total cost beyond the existing operations infrastructure was $39,417.50 ($14,880.00 project management, $22,717.50 laboratory supplies/staff, and $1820.00 data management), or approximately $39 per enrolled patient vs $239 per patient for a traditional cohort study.<h4>Conclusions and relevance</h4>Results of this study suggest that VESPRE, a large-scale, inexpensive EHR-embedded infrastructure for precision medicine, can be used in a large US health system that collects data using a common EHR platform and centralized laboratory system. Tested in the sepsis setting, VESPRE appeared to capture a high proportion of eligible patients at low incremental cost.
Project description:<h4>Background</h4>The availability of high fidelity electronic health record (EHR) data is a hallmark of the learning health care system. Washington State's Surgical Care Outcomes and Assessment Program (SCOAP) is a network of hospitals participating in quality improvement (QI) registries wherein data are manually abstracted from EHRs. To create the Comparative Effectiveness Research and Translation Network (CERTAIN), we semi-automated SCOAP data abstraction using a centralized federated data model, created a central data repository (CDR), and assessed whether these data could be used as real world evidence for QI and research.<h4>Objectives</h4>Describe the validation processes and complexities involved and lessons learned.<h4>Methods</h4>Investigators installed a commercial CDR to retrieve and store data from disparate EHRs. Manual and automated abstraction systems were conducted in parallel (10/2012-7/2013) and validated in three phases using the EHR as the gold standard: 1) ingestion, 2) standardization, and 3) concordance of automated versus manually abstracted cases. Information retrieval statistics were calculated.<h4>Results</h4>Four unaffiliated health systems provided data. Between 6 and 15 percent of data elements were abstracted: 51 to 86 percent from structured data; the remainder using natural language processing (NLP). In phase 1, data ingestion from 12 out of 20 feeds reached 95 percent accuracy. In phase 2, 55 percent of structured data elements performed with 96 to 100 percent accuracy; NLP with 89 to 91 percent accuracy. In phase 3, concordance ranged from 69 to 89 percent. Information retrieval statistics were consistently above 90 percent.<h4>Conclusions</h4>Semi-automated data abstraction may be useful, although raw data collected as a byproduct of health care delivery is not immediately available for use as real world evidence. New approaches to gathering and analyzing extant data are required.
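Information retrieval statistics of the kind reported above (consistently above 90 percent) reduce to precision, recall, and F1 over true/false positives and negatives; a minimal sketch with hypothetical counts, not figures from the study:

```python
def retrieval_stats(tp, fp, fn):
    """Precision, recall, and F1 for an extracted data element,
    with the EHR chart treated as the gold standard."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one data element abstracted by the automated pipeline.
p, r, f1 = retrieval_stats(tp=90, fp=10, fn=10)
print(f"precision={p:.0%}, recall={r:.0%}, F1={f1:.0%}")
```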
Project description:<h4>Background</h4>Some physicians in intensive care units (ICUs) report that electronic health records (EHRs) can be cumbersome and disruptive to workflow. There are significant gaps in our understanding of the physician-EHR interaction.<h4>Objective</h4>To better understand how clinicians use the EHR for chart review during ICU pre-rounds through the characterisation and description of screen navigation pathways and workflow patterns.<h4>Method</h4>We conducted a live, direct observational study of six physician trainees performing electronic chart review during daily pre-rounds in the 30-bed medical ICU at a large academic medical centre in the Southeastern United States. A tailored checklist was used by observers for data collection.<h4>Results</h4>We observed 52 distinct live patient chart review encounters, capturing a total of 2.7 hours of pre-rounding chart review activity by six individual physicians. Physicians reviewed an average of 8.7 patients (range = 5-12), spending a mean of 3:05 minutes per patient (range = 1:34-5:18). On average, physicians visited 6.3 (±3.1) total EHR screens per patient (range = 1-16). Four unique screens were viewed most commonly, accounting for over half (52.7%) of all screen visits: results review (17.9%), summary/overview (13.0%), flowsheet (12.7%), and the chart review tab (9.1%). Navigation pathways were highly variable, but several common screen transition patterns emerged across users. Average interrater reliability for the paired EHR observation was 80.0%.<h4>Conclusion</h4>We observed the physician-EHR interaction during ICU pre-rounds to be brief and highly focused. Although we observed a high degree of "information sprawl" in physicians' digital navigation, we also identified common launch points for electronic chart review, key high-traffic screens and common screen transition patterns.<h4>Implications</h4>From the study findings, we suggest recommendations towards improved EHR design.
Project description:Objectives:Clinical guidelines recommending the use of myeloid growth factors are largely based on the prescribed chemotherapy regimen. The guidelines suggest that oncologists consider patient-specific characteristics when prescribing granulocyte-colony stimulating factor (G-CSF) prophylaxis; however, a mechanism to quantify individual patient risk is lacking. Readily available electronic health record (EHR) data can provide patient-specific information needed for individualized neutropenia risk estimation. An evidence-based, individualized neutropenia risk estimation algorithm has been developed. This study evaluated the automated extraction of EHR chemotherapy treatment data and externally validated the neutropenia risk prediction model. Materials and Methods:A retrospective cohort of adult patients with newly diagnosed breast, colorectal, lung, lymphoid, or ovarian cancer who received the first cycle of a cytotoxic chemotherapy regimen from 2008 to 2013 were recruited from a single cancer clinic. Electronically extracted EHR chemotherapy treatment data were validated by chart review. Neutropenia risk stratification was conducted and risk model performance was assessed using calibration and discrimination. Results:Chemotherapy treatment data electronically extracted from the EHR were verified by chart review. The neutropenia risk prediction tool classified 126 patients (57%) as being low risk for febrile neutropenia, 44 (20%) as intermediate risk, and 51 (23%) as high risk. The model was well calibrated (Hosmer-Lemeshow goodness-of-fit test = 0.24). Discrimination was adequate and slightly less than in the original internal validation (c-statistic 0.75 vs 0.81). Conclusion:Chemotherapy treatment data were electronically extracted from the EHR successfully. The individualized neutropenia risk prediction model performed well in our retrospective external cohort.
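The c-statistic used above to measure discrimination is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case; a minimal sketch with hypothetical scores (not the study's data):

```python
from itertools import product

def c_statistic(case_scores, control_scores):
    """Concordance (c-statistic): fraction of case/control pairs where the case
    scores higher, counting ties as half a concordant pair."""
    pairs = list(product(case_scores, control_scores))
    concordant = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return concordant / len(pairs)

# Hypothetical predicted risks for febrile-neutropenia cases vs. non-cases.
print(c_statistic([0.9, 0.6], [0.7, 0.4]))  # 0.75
```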
Project description:<h4>Objectives</h4>Clinico-genomic data (CGD) acquired through routine clinical practice has the potential to improve our understanding of clinical oncology. However, these data often reside in heterogeneous and semistructured reports, resulting in prolonged time to analysis.<h4>Materials and methods</h4>We created GENETEX: an R package and Shiny application for text mining genomic reports from electronic health record (EHR) and direct import into Research Electronic Data Capture (REDCap).<h4>Results</h4>GENETEX facilitates the abstraction of CGD from EHR and streamlines the capture of structured data into REDCap. Its functions include natural language processing of key genomic information, transformation of semistructured data into structured data, and importation into REDCap. When evaluated against manual abstraction, GENETEX had >99% agreement and captured CGD in approximately one-fifth the time.<h4>Conclusions</h4>GENETEX is freely available under the Massachusetts Institute of Technology license and can be obtained from GitHub (https://github.com/TheMillerLab/genetex). GENETEX is executed in R and deployed as a Shiny application for non-R users. It produces high-fidelity abstraction of CGD in a fraction of the time.
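The core of text mining genomic reports is pattern extraction from free text. A toy sketch in Python (GENETEX itself is an R package, and its actual rules are far richer than this one hypothetical pattern for HGVS-style protein variants):

```python
import re

# Hypothetical pattern for simple HGVS-style protein variants, e.g. "p.V600E".
VARIANT_RE = re.compile(r"\bp\.[A-Z]\d+[A-Z]\b")

report = "NGS panel: BRAF p.V600E detected; KRAS p.G12D detected; TP53 wild type."
variants = VARIANT_RE.findall(report)
print(variants)  # ['p.V600E', 'p.G12D']
```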