Project description: Background: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) has been implemented on various claims and electronic health record (EHR) databases, but has not been applied to a hospital transactional database. This study addresses the implementation of the OMOP CDM on the U.S. Premier Hospital database. Methods: We designed and implemented an extract, transform, load (ETL) process to convert the Premier hospital database into the OMOP CDM. Standard charge codes in Premier were mapped between the OMOP version 4.0 Vocabulary and standard charge descriptions. Visit logic was added to impute the visit dates. We tested the conversion by replicating a published study using the raw and transformed databases. The Premier hospital database was compared to a claims database with regard to prevalence of disease. Findings: Transforming the data into the CDM resulted in 1% of the data being discarded due to errors in the raw data. A total of 91.4% of Premier standard charge codes were mapped successfully to a standard vocabulary. The replication study yielded a similar distribution of patient characteristics. The comparison to the claims data yielded notable similarities and differences among conditions represented in both databases. Discussion: The transformation of the Premier database into the OMOP CDM version 4.0 adds value in conducting analyses due to successful mapping of the drugs and procedures. The addition of visit logic gives ordinality to drugs and procedures that was not present prior to the transformation. Comparing conditions in Premier against a claims database can provide an understanding of Premier's potential use in pharmacoepidemiology studies that are traditionally conducted via claims databases. Conclusion and next steps: The conversion of the Premier database into the OMOP CDM 4.0 was completed successfully. 
The next steps include refinement of vocabularies and mappings and continual maintenance of the transformed CDM.
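The mapping-coverage figure reported in the findings (the share of source charge codes that map to a standard vocabulary) can be illustrated with a minimal sketch. The function name, the charge codes, and the concept IDs below are hypothetical and chosen only for illustration; they are not taken from the Premier database or from the study's ETL code.

```python
def mapping_coverage(source_codes, code_to_concept):
    """Return (mapped, total, fraction) over the distinct source codes."""
    distinct = set(source_codes)
    # A code counts as mapped only if the lookup yields a concept ID.
    mapped = {c for c in distinct if code_to_concept.get(c) is not None}
    total = len(distinct)
    fraction = len(mapped) / total if total else 0.0
    return len(mapped), total, fraction


# Hypothetical charge codes and concept IDs, for illustration only.
charge_codes = ["250001", "250002", "360010", "999999", "250001"]
code_to_concept = {"250001": 1001, "250002": 1002, "360010": 2001}

mapped, total, fraction = mapping_coverage(charge_codes, code_to_concept)
print(f"{mapped}/{total} distinct codes mapped ({fraction:.1%})")
```

Coverage here is computed over distinct codes; weighting by how often each code occurs in the data would be a different, and often more informative, measure.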
Project description: Background: Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM. Objective: The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM. Methods: Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards. Results: We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. 
We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care. Conclusions: Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.
Project description: Background: Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research. Objective: This study aimed to transform primary care data into the OMOP CDM format. Methods: We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Concepts from local French vocabularies were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data into the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard. Results: Data from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. 
Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data. Conclusions: Primary care data from a French health care facility have been implemented into the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice.
Project description: BACKGROUND: The development and adoption of health care common data models (CDMs) has addressed some of the logistical challenges of performing research on data generated from disparate health care systems by standardizing data representations and leveraging standardized terminology to express clinical information consistently. However, transforming a data system into a CDM is not a trivial task, and maintaining an operational, enterprise-capable CDM that is incrementally updated within a data warehouse is challenging. OBJECTIVES: To develop a quality assurance (QA) process and code base to accompany our incremental transformation of the Department of Veterans Affairs Corporate Data Warehouse (CDW) health care database into the Observational Medical Outcomes Partnership (OMOP) CDM to prevent incremental load errors. METHODS: We designed and implemented a multistage QA approach centered on completeness, value conformance, and relational conformance data-quality elements. For each element we describe key incremental load challenges, our extract, transform, and load (ETL) solution to overcome those challenges, and potential impacts of incremental load failure. RESULTS: Completeness and value conformance data-quality elements are most affected by incremental changes to the CDW, while updates to source identifiers impact relational conformance. ETL failures surrounding these elements lead to incomplete and inaccurate capture of clinical concepts as well as data fragmentation across patients, providers, and locations. CONCLUSION: Development of robust QA processes supporting accurate transformation of OMOP and other CDMs from source data is still in evolution, and opportunities exist to extend the existing QA framework and tools used for incremental ETL QA processes.
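The three data-quality elements named in the methods (completeness, value conformance, and relational conformance) can be sketched as simple checks over in-memory rows. This is an illustrative sketch only, not the QA code base described in the abstract: the function names and sample rows are hypothetical, and the field names follow OMOP CDM conventions (`person_id`, `gender_concept_id`, `visit_occurrence_id`).

```python
def completeness(source_ids, target_ids):
    """Source record IDs that are missing from the target after a load."""
    return sorted(set(source_ids) - set(target_ids))


def value_conformance(rows, field, allowed):
    """Rows whose value for `field` falls outside the allowed set."""
    return [r for r in rows if r[field] not in allowed]


def relational_conformance(child_rows, fk, parent_ids):
    """Child rows whose foreign key has no matching parent row."""
    parents = set(parent_ids)
    return [r for r in child_rows if r[fk] not in parents]


# Hypothetical rows, for illustration only.
persons = [
    {"person_id": 1, "gender_concept_id": 8507},
    {"person_id": 2, "gender_concept_id": 8532},
    {"person_id": 3, "gender_concept_id": -1},
]
visits = [
    {"visit_occurrence_id": 10, "person_id": 1},
    {"visit_occurrence_id": 11, "person_id": 4},
]
person_ids = [p["person_id"] for p in persons]

missing = completeness([1, 2, 3, 4], person_ids)
bad_values = value_conformance(persons, "gender_concept_id", {8507, 8532, 0})
orphans = relational_conformance(visits, "person_id", person_ids)
print(missing, bad_values, orphans)
```

In an incremental load, checks of this kind would be run after each batch, so that a source record dropped by the ETL or a visit pointing at a re-keyed patient is caught before it accumulates.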
Project description: Objective: To develop a framework for identifying temporal clinical event trajectories from Observational Medical Outcomes Partnership-formatted observational healthcare data. Materials and methods: A 4-step framework based on significant temporal event pair detection is described and implemented as an open-source R package. It is used on a population-based Estonian dataset to first replicate a large Danish population-based study and second, to conduct a disease trajectory detection study for type 2 diabetes patients in the Estonian and Dutch databases as an example. Results: As a proof of concept, we apply the methods in the Estonian database and provide a detailed breakdown of our findings. All Estonian population-based event pairs are shown. We compare the event pairs identified from Estonia to Danish and Dutch data and discuss the causes of the differences. The overlap in the results was only 2.4%, which highlights the need for running similar studies in different populations. Conclusions: For the first time, there is a complete software package for detecting disease trajectories in health data.
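The idea of a directional temporal event pair can be sketched in a few lines. The sketch below is written in Python for illustration and is not the R package described above, nor its 4-step framework: it only counts, per ordered pair of event codes, the patients whose first occurrence of one event strictly precedes the first occurrence of the other, and applies a one-sided exact binomial test to the direction. All function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import comb


def first_occurrences(events):
    """Map each event code to its earliest day for one patient."""
    first = {}
    for code, day in events:
        if code not in first or day < first[code]:
            first[code] = day
    return first


def count_ordered_pairs(patients):
    """Count patients in whom the first A strictly precedes the first B."""
    counts = Counter()
    for events in patients.values():
        first = first_occurrences(events)
        for a, b in permutations(first, 2):
            if first[a] < first[b]:
                counts[(a, b)] += 1
    return counts


def direction_p_value(n_ab, n_ba):
    """One-sided exact binomial p-value for A->B being the dominant order."""
    n = n_ab + n_ba
    return sum(comb(n, k) for k in range(n_ab, n + 1)) / 2 ** n


# Hypothetical patients: (event code, day offset) tuples, illustration only.
patients = {
    "p1": [("E11", 10), ("I10", 40)],
    "p2": [("E11", 5), ("I10", 30), ("E11", 50)],
    "p3": [("I10", 3), ("E11", 20)],
    "p4": [("E11", 7), ("I10", 7)],  # same day: no direction is counted
}

counts = count_ordered_pairs(patients)
p = direction_p_value(counts[("E11", "I10")], counts[("I10", "E11")])
print(counts[("E11", "I10")], counts[("I10", "E11")], p)
```

A real analysis would also need to test whether the two events co-occur more often than expected at all, and to correct for the number of pairs tested; both steps are omitted here.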
Project description: Background: Given the geographical sparsity of Rare Diseases (RDs), assembling a cohort is often a challenging task. Common data models (CDM) can harmonize disparate sources of data that can be the basis of decision support systems and artificial intelligence-based studies, leading to new insights in the field. This work seeks to support the design of large-scale multi-center studies for rare diseases. Methods: In an interdisciplinary group, we derived a list of elements of RDs in three medical domains (endocrinology, gastroenterology, and pneumonology) according to specialist knowledge and clinical guidelines in an iterative process. We then defined an RD data structure that matched all our data elements and built Extract, Transform, Load (ETL) processes to transfer the structure to a joint CDM. To ensure interoperability of our developed CDM and its subsequent usage for further RD domains, we ultimately mapped it to the Observational Medical Outcomes Partnership (OMOP) CDM. We then included a fourth domain, hematology, as a proof-of-concept and mapped an acute myeloid leukemia (AML) dataset to the developed CDM. Results: We have developed an OMOP-based rare diseases common data model (RD-CDM) using data elements from the three domains (endocrinology, gastroenterology, and pneumonology) and tested the CDM using data from the hematology domain. The total study cohort included 61,697 patients. After aligning our modules with those of the Medical Informatics Initiative (MII) Core Dataset (CDS), we leveraged its ETL process. This facilitated the seamless transfer of demographic information, diagnoses, procedures, laboratory results, and medication modules from our RD-CDM to the OMOP CDM. For the phenotypes and genotypes, we developed a second ETL process. 
We finally derived lessons learned for customizing our RD-CDM for different RDs. Discussion: This work can serve as a blueprint for other domains as its modularized structure could be extended towards novel data types. An interdisciplinary group of stakeholders that are actively supporting the project's progress is necessary to reach a comprehensive CDM. Conclusion: The customized data structure related to our RD-CDM can be used to perform multi-center studies to test data-driven hypotheses on a larger scale and take advantage of the analytical tools offered by the OHDSI community.
Project description: Purpose: Evaluate the degree of concept coverage of the general eye examination in one widely used electronic health record (EHR) system using the Observational Health Data Sciences and Informatics Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Design: Study of data elements. Participants: Not applicable. Methods: Data elements (field names and predefined entry values) from the general eye examination in the Epic foundation system were mapped to OMOP concepts and analyzed. Each mapping was given a Health Level 7 equivalence designation: equal when the OMOP concept had the same meaning as the source EHR concept, wider when it was missing information, narrower when it was overly specific, and unmatched when there was no match. Initial mappings were reviewed by 2 graders. Intergrader agreement for equivalence designation was calculated using Cohen's kappa. Agreement on the mapped OMOP concept was calculated as a percentage of total mappable concepts. Discrepancies were discussed and a final consensus created. Quantitative analysis was performed on wider and unmatched concepts. Main outcome measures: Gaps in OMOP concept coverage of EHR elements and intergrader agreement of mapped OMOP concepts. Results: A total of 698 data elements (210 fields, 488 values) from the EHR were analyzed. The intergrader kappa on the equivalence designation was 0.88 (standard error 0.03, P < 0.001). There was a 96% agreement on the mapped OMOP concept. In the final consensus mapping, 25% (1% fields, 31% values) of the EHR to OMOP concept mappings were considered equal, 50% (27% fields, 60% values) wider, 4% (8% fields, 2% values) narrower, and 21% (52% fields, 8% values) unmatched. Of the wider mapped elements, 46% were missing the laterality specification, 24% had other missing attributes, and 30% had both issues. 
Wider and unmatched EHR elements could be found in all areas of the general eye examination. Conclusions: Most data elements in the general eye examination could not be represented precisely using the OMOP CDM. Our work suggests multiple ways to improve the incorporation of important ophthalmology concepts in OMOP, including adding laterality to existing concepts. There exists a strong need to improve the coverage of ophthalmic concepts in source vocabularies so that the OMOP CDM can better accommodate vision research. Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
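Cohen's kappa, the intergrader agreement statistic used in the methods, corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two graders assigning equivalence designations. The ratings are invented for illustration and are unrelated to the study's data; the standard error and P value reported in the abstract are not computed here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented equivalence designations for 8 data elements, illustration only.
grader_a = ["equal", "wider", "wider", "unmatched",
            "narrower", "wider", "equal", "unmatched"]
grader_b = ["equal", "wider", "equal", "unmatched",
            "narrower", "wider", "equal", "wider"]

kappa = cohens_kappa(grader_a, grader_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the graders agree on 6 of 8 items (p_o = 0.75) while chance agreement is 18/64, so kappa is well below the raw agreement rate; this is why the abstract reports kappa separately from the percentage agreement on the mapped concept.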
Project description: Background: Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document, which limits data use. By transforming the cancer-specific data to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), the information can contribute to establishing multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. Objective: We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. Methods: Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. Results: The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. 
Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%. Conclusion: As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
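Rule-based extraction of stage-related modifiers from free text, as mentioned in the methods, is often built on regular expressions. The sketch below pulls T, N, and M categories out of a report sentence. The pattern, the function name, and the sample sentences are assumptions made for illustration; they are not the rules used in the study, and real pathology reports vary far more in wording and layout. Deriving the stage group from these categories is not attempted here.

```python
import re

# Hypothetical pattern: optional "p" prefix, then T, N, and M categories,
# optionally separated by spaces or commas (e.g. "pT1aN0M0", "pT3 N1b Mx").
TNM_PATTERN = re.compile(
    r"\bp?T(?P<t>[0-4][ab]?|x)\s*,?\s*"
    r"N(?P<n>[0-1][ab]?|x)\s*,?\s*"
    r"M(?P<m>[01]|x)\b",
    re.IGNORECASE,
)


def extract_tnm(report_text):
    """Return the first T/N/M categories found in the text, or None."""
    match = TNM_PATTERN.search(report_text)
    if match is None:
        return None
    return {key: value.lower() for key, value in match.groupdict().items()}


# Invented report sentences, for illustration only.
print(extract_tnm("Papillary carcinoma, pT1aN0M0."))
print(extract_tnm("Stage modifiers: pT3 N1b Mx"))
print(extract_tnm("No malignancy identified."))
```

Each extracted category would then be recorded as a modifier of the cancer diagnosis in the target model; how that is laid out in the oncology extension is outside the scope of this sketch.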
Project description: The role of statins in chronic kidney disease (CKD) has been extensively evaluated, but it remains controversial in specific populations such as dialysis-dependent CKD. This study examined the effect of statins on mortality in CKD patients using two large databases. In data from the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) from two hospitals, CKD was defined as an estimated glomerular filtration rate < 60 mL/min/m2; we compared survival between patients with or without statin treatment. As a sensitivity analysis, the results were validated with the Korea National Health Insurance (KNHI) claims database. In the analysis of CDM datasets, statin users showed significantly lower risks of all-cause and cardiovascular mortality in both hospitals, compared to non-users. Similar results were observed in CKD patients from the KNHI claims database. Lower mortality in the statin group was consistently evident in all subgroup analyses, including patients on dialysis and low-risk young patients. In conclusion, we found that statins were associated with lower mortality in CKD patients, regardless of dialysis status or other risk factors.
Project description: Background: Accurate hospital length of stay (LoS) prediction enables efficient resource management. Conventional LoS prediction models with limited covariates and nonstandardized data have limited reproducibility when applied to the general population. Objective: In this study, we developed and validated a machine learning (ML)-based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). Methods: Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital. Results: The final sample included 129,938 patient entry events in the planned admissions. 
The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95% CI 0.887-0.894) and an AUPRC of 0.819 (95% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed the best in the multiclassification for predicting an LoS of 3 days or more, with an AUROC of 0.901 (95% CI 0.898-0.904) and an AUPRC of 0.770 (95% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. The random forest (RF) model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95% CI 0.802-0.807). Conclusions: The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings.
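Two steps of the evaluation described above, the 7:3 train/test split and the AUROC, can be sketched without any ML library. This is a minimal illustration rather than the study's pipeline: the function names, the scores, and the lengths of stay are invented, and the model itself is left out. The AUROC is computed from its rank interpretation, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle items reproducibly and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented lengths of stay (days) and model scores, illustration only.
los_days = [12, 9, 2, 8, 4, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1 if days >= 7 else 0 for days in los_days]  # primary outcome

train, test = train_test_split(range(10))
print(len(train), len(test), auroc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; on samples the size of the one reported here, a rank-based implementation from an established library would be the practical choice.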