Supporting interoperability of genetic data with LOINC.
ABSTRACT: Electronic reporting of genetic testing results is increasing, but these results are often represented in diverse formats and naming conventions. Logical Observation Identifiers Names and Codes (LOINC) is a vocabulary standard that provides universal identifiers for laboratory tests and clinical observations. In genetics, LOINC provides codes that improve interoperability during the ongoing transition in reporting styles, including codes for cytogenetic or mutation analysis tests, for testing of specific chromosomal alterations or mutations, and for fully structured, discrete genetic test reporting. LOINC terms follow the recommendations and nomenclature of other standards, such as the Human Genome Organization Gene Nomenclature Committee's terminology for gene names. In addition to the narrative text they currently report, we recommend that laboratories always report chromosome analysis results, the genetic variation(s) found, and the genetic variation(s) tested for as discrete variables. By adopting and implementing data standards like LOINC, information systems can help care providers and researchers unlock the potential of genetic information to deliver more personalized care.
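To make the discrete-reporting recommendation concrete, the following Python sketch keeps the narrative text alongside separate fields for the chromosome analysis result, the variations found, and the variations tested for. The class and field names are hypothetical (they are not LOINC terms), and the variant strings are illustrative only. The point it shows is that keeping "tested for" as its own discrete list is what allows software to derive which variations were tested but not detected.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneticTestReport:
    """Hypothetical report record: narrative text plus discrete result fields."""

    narrative: str
    # Chromosome analysis result, e.g. an ISCN karyotype string such as "46,XX".
    chromosome_analysis_result: Optional[str] = None
    # Variations detected, written as HGNC gene symbol + HGVS-style description.
    variations_found: List[str] = field(default_factory=list)
    # Every variation the assay was designed to detect.
    variations_tested_for: List[str] = field(default_factory=list)

    def variations_not_detected(self) -> List[str]:
        """Variations that were tested for but not found, in tested-for order."""
        found = set(self.variations_found)
        return [v for v in self.variations_tested_for if v not in found]


report = GeneticTestReport(
    narrative="Narrative summary as currently reported.",
    chromosome_analysis_result="46,XX",
    variations_found=["CFTR c.1521_1523del"],
    variations_tested_for=["CFTR c.1521_1523del", "CFTR c.1624G>T"],
)
print(report.variations_not_detected())  # ['CFTR c.1624G>T']
```

A narrative-only report would require text parsing to recover the same negative result.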
Project description: Logical Observation Identifiers Names and Codes (LOINC) is the most widely used controlled vocabulary for identifying laboratory tests. A given laboratory test can often be reported in more than 1 unit of measure (eg, grams or moles), and LOINC defines unique codes for each unit. Consequently, an identical laboratory test performed by 2 different clinical laboratories may have different LOINC codes. The absence of unit conversions between compatible LOINC codes impedes aggregation and analysis of laboratory results. To create such conversions, a computational process was developed to screen the LOINC standard for potential conversions, and multiple expert reviewers oversaw and finalized the conversion list. In all, 285 bidirectional conversions were identified, including conversions for routine clinical tests such as sodium, magnesium, and human immunodeficiency virus (HIV). The unit conversions were then applied when aggregating laboratory test results to demonstrate their usefulness. Diverse informatics projects may benefit from the ability to interconvert compatible results.
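A bidirectional conversion list of this kind can be sketched as a table of one-way multiplicative factors plus their inverses. In the Python sketch below, the string keys are hypothetical placeholders standing in for real LOINC codes, and only mass-to-molar conversions for sodium and magnesium are included (mg/dL × 10 gives mg/L; dividing by the molar mass in mg/mmol gives mmol/L). It is not the published conversion list.

```python
from typing import Dict, Iterable, List, Tuple

# Molar masses in g/mol (numerically equal to mg/mmol).
SODIUM_MOLAR_MASS = 22.990
MAGNESIUM_MOLAR_MASS = 24.305

# Hypothetical placeholder keys stand in for real LOINC codes.
# (source, target) -> multiplicative factor: mg/dL * 10 = mg/L, then / molar mass = mmol/L.
ONE_WAY: Dict[Tuple[str, str], float] = {
    ("NA_MASS_MG_DL", "NA_SUBST_MMOL_L"): 10.0 / SODIUM_MOLAR_MASS,
    ("MG_MASS_MG_DL", "MG_SUBST_MMOL_L"): 10.0 / MAGNESIUM_MOLAR_MASS,
}


def make_bidirectional(table: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Add the inverse factor for every one-way conversion."""
    full = dict(table)
    for (source, target), factor in table.items():
        full[(target, source)] = 1.0 / factor
    return full


CONVERSIONS = make_bidirectional(ONE_WAY)


def convert(value: float, source: str, target: str) -> float:
    """Convert a result reported under `source` into the unit of `target`."""
    if source == target:
        return value
    try:
        return value * CONVERSIONS[(source, target)]
    except KeyError:
        raise ValueError(f"no conversion from {source} to {target}") from None


def aggregate(results: Iterable[Tuple[str, float]], target: str) -> List[float]:
    """Express every compatible (code, value) result in the target unit; skip the rest."""
    converted = []
    for code, value in results:
        if code == target or (code, target) in CONVERSIONS:
            converted.append(convert(value, code, target))
    return converted
```

For example, `aggregate([("MG_SUBST_MMOL_L", 0.8), ("MG_MASS_MG_DL", 2.4305)], "MG_SUBST_MMOL_L")` pools both magnesium results in mmol/L instead of treating them as two unrelated tests.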
Project description: Objective: Standards such as the Logical Observation Identifiers Names and Codes (LOINC®) are critical for interoperability and for integrating data into common data models, but they are inconsistently used. Without consistent mapping to standards, clinical data cannot be harmonized, shared, or interpreted in a meaningful context. We sought to develop an automated machine learning pipeline that leverages noisy labels to map laboratory data to LOINC codes. Materials and Methods: Across 130 sites in the Department of Veterans Affairs Corporate Data Warehouse, we selected the 150 most commonly used laboratory tests with numeric results per site from 2000 through 2016. Using source data text and numeric fields, we developed a machine learning model and manually validated random samples from both labeled and unlabeled datasets. Results: The raw laboratory data consisted of >6.5 billion test results, with 2215 distinct LOINC codes. The model predicted the correct LOINC code in 85% of the unlabeled data and 96% of the labeled data by test frequency. In the subset of labeled data where the original and model-predicted LOINC codes disagreed, the model-predicted LOINC code was correct in 83% of the data by test frequency. Conclusion: Using a completely automated process, we were able to assign LOINC codes to unlabeled data with high accuracy. When the model-predicted LOINC code differed from the original LOINC code, the model prediction was correct in the vast majority of cases. This scalable, automated algorithm may improve data quality and interoperability while substantially reducing the manual effort currently needed to accurately map laboratory data.
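The abstract does not name the model, so the Python sketch below swaps in a plain multinomial naive Bayes classifier (standard library only) to illustrate the noisy-label idea: train on whatever LOINC codes the source data already carries, predict codes for unlabeled rows, and flag labeled rows where prediction and original label disagree for review. The features (name tokens, unit, and an order-of-magnitude bin of the site's median result) and the label strings are hypothetical choices, not the authors' design.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List, Optional, Sequence


def featurize(test_name: str, unit: str, median_value: float) -> List[str]:
    """Turn source text and numeric fields into tokens (hypothetical feature design)."""
    tokens = re.findall(r"[a-z0-9]+", test_name.lower())
    tokens.append("unit=" + unit.strip().lower())
    # Order-of-magnitude bin of the site's median result lets the numeric
    # distribution inform the prediction alongside the text.
    magnitude = math.floor(math.log10(abs(median_value))) if median_value else 0
    tokens.append(f"mag={magnitude}")
    return tokens


class NaiveBayesMapper:
    """Multinomial naive Bayes over tokens, trained on possibly noisy LOINC labels."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts: Counter = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, rows: Sequence[List[str]], labels: Sequence[str]) -> "NaiveBayesMapper":
        for tokens, label in zip(rows, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(count / total)  # log prior
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def flag_disagreements(model: NaiveBayesMapper, rows, labels) -> List[int]:
    """Indices where the model disagrees with the original label (candidates for review)."""
    return [i for i, (tokens, label) in enumerate(zip(rows, labels))
            if model.predict(tokens) != label]
```

A production pipeline would use a stronger model and many more features; the sketch only shows how existing labels can act as training signal without manual curation.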
Project description: Objectives: We aimed to better understand how standardizing laboratory data can affect predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned from local laboratory codes. Materials and Methods: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted, manually cleaned, and mapped to LOINC. We extracted features summarizing each patient's laboratory data and used a training dataset (2008-2011) to learn models with a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012). Results: Models that used LOINC performed significantly better than models that used local laboratory test codes, regardless of the feature selection technique and classifier used. Discussion and Conclusion: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC before using it in predictive models. Our findings support the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records.
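One way to see why standardization helps is to compare the feature columns produced before and after mapping. The Python sketch below summarizes each patient's results per code (mean, last value, count) either by site-specific local code or by a shared LOINC key. The site names, local codes, and the `SODIUM_LOINC` key are hypothetical, and the three summary statistics are an assumed choice, not the study's feature set.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping: two sites use different local codes for the same test.
LOCAL_TO_LOINC: Dict[Tuple[str, str], str] = {
    ("site_a", "NA"): "SODIUM_LOINC",
    ("site_b", "SOD"): "SODIUM_LOINC",
}


def summarize(results: Iterable[Tuple[str, str, str, float]],
              use_loinc: bool) -> Dict[str, Dict[str, float]]:
    """results: (patient_id, site, local_code, value) rows, assumed in chronological order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for patient_id, site, local_code, value in results:
        local_key = f"{site}:{local_code}"
        key = LOCAL_TO_LOINC.get((site, local_code), local_key) if use_loinc else local_key
        grouped[patient_id][key].append(value)

    features: Dict[str, Dict[str, float]] = {}
    for patient_id, by_code in grouped.items():
        row: Dict[str, float] = {}
        for code, values in by_code.items():
            row[f"{code}_mean"] = mean(values)
            row[f"{code}_last"] = values[-1]  # relies on chronological input order
            row[f"{code}_count"] = len(values)
        features[patient_id] = row
    return features


def feature_columns(features: Dict[str, Dict[str, float]]) -> List[str]:
    """Union of feature names across patients, i.e. the model's input columns."""
    return sorted({name for row in features.values() for name in row})
```

With local codes, a model sees two sparse sodium columns, each missing for every patient from the other site; after mapping, both sites populate the same column.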
Project description: OBJECTIVE: To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC), and to study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps increase match quality and reduce manual effort. MATERIALS AND METHODS: We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts so that more local terms can be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out. RESULTS: The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70-85% of the laboratory terminologies to LOINC. The recommendation step can further propose mappings to (proposed) LOINC concepts for 9-20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts. CONCLUSIONS: The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. The probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.
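A toy version of the multi-part idea can be sketched in Python: score every (component, system) combination by how similar each part is to the corresponding local field, weighted by an estimated probability of that combination, and treat a winning combination with no existing code as a proposed new concept. This is not the authors' algorithm. It assumes local terms are already split into component and specimen fields, uses `difflib.SequenceMatcher` as a stand-in similarity measure, uses a hypothetical three-term LOINC table, and falls back to an independence estimate P(component)·P(system) for unseen pairs (itself an assumption).

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import product
from typing import Optional, Tuple

# Hypothetical miniature LOINC table: (code, component, system).
LOINC_TERMS = [
    ("CODE_1", "glucose", "serum or plasma"),
    ("CODE_2", "glucose", "urine"),
    ("CODE_3", "creatinine", "serum or plasma"),
]

N_TERMS = len(LOINC_TERMS)
COMPONENT_FREQ = Counter(component for _, component, _ in LOINC_TERMS)
SYSTEM_FREQ = Counter(system for _, _, system in LOINC_TERMS)
PAIR_FREQ = Counter((component, system) for _, component, system in LOINC_TERMS)
CODE_BY_PAIR = {(component, system): code for code, component, system in LOINC_TERMS}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combination_probability(component: str, system: str) -> float:
    """Observed pair frequency; unseen pairs use an independence estimate (assumption)."""
    if (component, system) in PAIR_FREQ:
        return PAIR_FREQ[(component, system)] / N_TERMS
    return (COMPONENT_FREQ[component] / N_TERMS) * (SYSTEM_FREQ[system] / N_TERMS)


def match(local_component: str, local_system: str) -> Tuple[Optional[str], str, str, float]:
    """Best (code, component, system, score); code is None for a proposed new concept."""
    best: Tuple[Optional[str], str, str, float] = (None, "", "", -1.0)
    for component, system in product(COMPONENT_FREQ, SYSTEM_FREQ):
        score = (similarity(local_component, component)
                 * similarity(local_system, system)
                 * combination_probability(component, system))
        if score > best[3]:
            best = (CODE_BY_PAIR.get((component, system)), component, system, score)
    return best
```

Here `match("Glucose", "Urine")` resolves to the existing `CODE_2`, while `match("creatinine", "urine")` returns `None` with the pair `("creatinine", "urine")`, i.e. a recommended combination for which no concept exists yet.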
Project description: The LOINC-RSNA Radiology Playbook represents the future direction of standardization for radiology procedure names. We developed a software solution ("RadMatch") using Python 2.7 and FuzzyWuzzy, an open-source fuzzy string matching library created by SeatGeek, to implement the LOINC-RSNA Radiology Playbook for adult abdomen and pelvis CT and MR procedures performed at our institution. This semi-automated method assigned appropriate LOINC numbers to 86% of local CT procedures. For local MR procedures, appropriate LOINC numbers were assigned to 75% of procedures, whereas 12.5% could only be partially mapped. Among the standardized local procedures, only 63% of CT and 71% of MR procedures had corresponding RadLex Playbook identifier (RPID) codes in the LOINC-RSNA Radiology Playbook, which limited the utility of RPID codes. RadMatch is a semi-automated open-source software tool that can assist radiology departments seeking to standardize their radiology procedures by implementing the LOINC-RSNA Radiology Playbook.
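The semi-automated matching step can be approximated with the standard library. The Python sketch below is not RadMatch and does not call FuzzyWuzzy; it imitates the token-sort idea (lowercase, tokenize, sort tokens, then compare) with `difflib.SequenceMatcher`, so its scores will not equal FuzzyWuzzy's exactly. The playbook entries, codes, and the threshold of 85 are hypothetical. Names scoring below the threshold return `None` and would go to manual review, which is what keeps the process semi-automated.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase, split into alphanumeric tokens, and sort them so word order is ignored."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", name.lower())))


def token_sort_ratio(a: str, b: str) -> int:
    """0-100 similarity of the sorted-token forms (difflib stand-in for FuzzyWuzzy)."""
    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


def match_procedure(local_name: str, playbook: List[Tuple[str, str]],
                    threshold: int = 85) -> Optional[Tuple[str, str, int]]:
    """Best (code, name, score) at or above the threshold, else None for manual review."""
    best = max(playbook, key=lambda entry: token_sort_ratio(local_name, entry[1]))
    score = token_sort_ratio(local_name, best[1])
    return (best[0], best[1], score) if score >= threshold else None


# Hypothetical playbook entries; real codes and names come from the Playbook release.
PLAYBOOK = [
    ("PLAYBOOK_1", "CT Abdomen and Pelvis W contrast IV"),
    ("PLAYBOOK_2", "MR Abdomen WO contrast IV"),
]
```

Sorting tokens before comparison lets "CT Pelvis and Abdomen W contrast IV" match the playbook name perfectly despite the reordered words.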
Project description: The PhenX Toolkit provides researchers with recommended, well-established, low-burden measures suitable for human subject research. The database of Genotypes and Phenotypes (dbGaP) is the data repository for a variety of studies funded by the National Institutes of Health, including genome-wide association studies. The dbGaP requires that investigators provide a data dictionary of study variables as part of the data submission process. Thus, dbGaP is a unique resource that can help investigators identify studies that share the same or similar variables. As a proof of concept, variables from 16 studies deposited in dbGaP were mapped to PhenX measures. Soon, investigators will be able to search dbGaP using PhenX variable identifiers and find comparable and related variables in these 16 studies. To enhance effective data exchange, PhenX measures, protocols, and variables were modeled in Logical Observation Identifiers Names and Codes (LOINC®). PhenX domains and measures are also represented in the Cancer Data Standards Registry and Repository (caDSR). Associating PhenX measures with existing standards (LOINC® and caDSR) and mapping to dbGaP study variables extends the utility of these measures by revealing new opportunities for cross-study analysis.
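Searching by PhenX identifier amounts to an inverted index from each PhenX variable to the study variables mapped to it. The Python sketch below shows that structure only; the study names, variable names, and `PHENX_EXAMPLE_*` identifiers are hypothetical, and it says nothing about how dbGaP actually implements its search.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(mappings: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each PhenX variable identifier to the (study, study variable) pairs mapped to it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for study_id, study_variable, phenx_id in mappings:
        index[phenx_id].append((study_id, study_variable))
    return dict(index)


def comparable_variables(index: Dict[str, List[Tuple[str, str]]],
                         phenx_id: str) -> List[Tuple[str, str]]:
    """All study variables mapped to one PhenX identifier, sorted for stable output."""
    return sorted(index.get(phenx_id, []))


# Hypothetical mappings: (study, study variable, PhenX variable identifier).
MAPPINGS = [
    ("study_A", "smk_ever", "PHENX_EXAMPLE_SMOKING"),
    ("study_B", "SMOKER", "PHENX_EXAMPLE_SMOKING"),
    ("study_B", "bmi_calc", "PHENX_EXAMPLE_BMI"),
]
INDEX = build_index(MAPPINGS)
```

Differently named variables such as `smk_ever` and `SMOKER` become discoverable together once both point at the same PhenX identifier.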
Project description: The National Health Information Standards Committee was established in 2004 in Korea. Its practical subcommittee for laboratory test terminology was placed in charge of standardizing laboratory medicine terminology in Korean. We aimed to establish a standardized Korean laboratory terminology database, Korea-Logical Observation Identifier Names and Codes (K-LOINC), based on earlier products sponsored by this committee. The initial product was revised based on specialist opinion. Next, we mapped the electronic data interchange (EDI) codes, which were revised in 2014, to the corresponding K-LOINC codes. We established a database of synonyms, including the laboratory codes of three reference laboratories and four tertiary hospitals in Korea. Furthermore, we supplemented the clinical microbiology section of K-LOINC using an alternative mapping strategy. We examined other systems that use laboratory codes to assess the compatibility of K-LOINC with statistical standards for a number of tests. A total of 48,990 laboratory codes were adopted (21,539 new and 16,330 revised). All of the LOINC synonyms were translated into Korean, and 39,347 Korean synonyms were added. Moreover, 21,773 synonyms were added from reference laboratories and tertiary hospitals. Alternative strategies were established for mapping within the microbiology domain; when we applied these to a smaller hospital, the mapping rate increased. Finally, we confirmed the compatibility of K-LOINC with other statistical standards, including a newly proposed EDI code system. This project established an up-to-date standardized Korean laboratory terminology database, as well as an updated EDI mapping, to facilitate the introduction of standard terminology into institutions.
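A synonym database is most useful when local names are normalized before lookup. The Python sketch below assumes a simple scheme (Unicode NFKC normalization, case folding, and removal of spaces and punctuation, which keeps Hangul intact) and rejects a synonym that would point to two different codes. The `KLOINC_*_EXAMPLE` codes and the synonym strings are hypothetical illustrations, not entries from K-LOINC.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """NFKC-normalize, casefold, and drop spaces/punctuation; Hangul letters are kept."""
    name = unicodedata.normalize("NFKC", name).casefold()
    return re.sub(r"[\W_]+", "", name)


def build_synonym_index(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """entries: (code, synonym) pairs; raises if one synonym maps to two codes."""
    index: Dict[str, str] = {}
    for code, synonym in entries:
        key = normalize(synonym)
        existing = index.setdefault(key, code)
        if existing != code:
            raise ValueError(f"synonym {synonym!r} maps to both {existing} and {code}")
    return index


def lookup(index: Dict[str, str], local_name: str) -> Optional[str]:
    """Code for a local test name, or None if no synonym matches."""
    return index.get(normalize(local_name))


# Hypothetical entries mixing English names, a Korean synonym, and hospital spellings.
ENTRIES = [
    ("KLOINC_GLU_EXAMPLE", "Glucose, serum"),
    ("KLOINC_GLU_EXAMPLE", "혈당"),
    ("KLOINC_CREA_EXAMPLE", "Creatinine, serum"),
]
INDEX = build_synonym_index(ENTRIES)
```

A local name like "GLUCOSE SERUM" then resolves to the same code as "Glucose, serum", and conflicting synonyms surface at build time rather than as silent mis-mappings.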
Project description: The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. To evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology Criteria for Adverse Events (CTCAE), and the laboratory portion of the Logical Observation Identifiers Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort.
Project description: Electronic Health Record (EHR) systems typically define laboratory test results using Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using the Fast Healthcare Interoperability Resources (FHIR) standard. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted as FHIR resources to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHRs to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
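The conversion step can be sketched against the FHIR R4 Observation structure: read the LOINC coding from `Observation.code`, take an outcome from `Observation.interpretation` (codes H, L, N) or, failing that, compare `valueQuantity.value` with the first `referenceRange`, then look the (LOINC code, outcome) pair up in an annotation table. This Python sketch is not the authors' software; the annotation table uses placeholder codes (`LOINC-EXAMPLE`, `HP:EXAMPLE-*`), only H/L/N outcomes are handled, and normal results simply return `None` here.

```python
from typing import Any, Dict, Optional

LOINC_SYSTEM = "http://loinc.org"

# Hypothetical annotation table: (LOINC code, outcome) -> HPO term.
# "LOINC-EXAMPLE" and the HP:EXAMPLE-* identifiers are placeholders, not real codes.
ANNOTATIONS = {
    ("LOINC-EXAMPLE", "H"): "HP:EXAMPLE-HIGH",
    ("LOINC-EXAMPLE", "L"): "HP:EXAMPLE-LOW",
}


def loinc_code(obs: Dict[str, Any]) -> Optional[str]:
    """First LOINC coding in Observation.code, if any."""
    for coding in obs.get("code", {}).get("coding", []):
        if coding.get("system") == LOINC_SYSTEM:
            return coding.get("code")
    return None


def outcome(obs: Dict[str, Any]) -> Optional[str]:
    """'H', 'L', or 'N' from Observation.interpretation, else valueQuantity vs referenceRange."""
    for concept in obs.get("interpretation", []):
        for coding in concept.get("coding", []):
            if coding.get("code") in {"H", "L", "N"}:
                return coding["code"]
    value = obs.get("valueQuantity", {}).get("value")
    ranges = obs.get("referenceRange", [])
    if value is None or not ranges:
        return None
    low = ranges[0].get("low", {}).get("value")
    high = ranges[0].get("high", {}).get("value")
    if low is not None and value < low:
        return "L"
    if high is not None and value > high:
        return "H"
    return "N"


def to_hpo(obs: Dict[str, Any]) -> Optional[str]:
    """HPO term for an Observation, or None when the outcome is not annotated."""
    code, result = loinc_code(obs), outcome(obs)
    if code is None or result is None:
        return None
    return ANNOTATIONS.get((code, result))


def make_observation(value: float, low: float, high: float) -> Dict[str, Any]:
    """Build a minimal Observation-shaped dict for experimentation."""
    return {
        "resourceType": "Observation",
        "code": {"coding": [{"system": LOINC_SYSTEM, "code": "LOINC-EXAMPLE"}]},
        "valueQuantity": {"value": value},
        "referenceRange": [{"low": {"value": low}, "high": {"value": high}}],
    }
```

Because the lookup key is the LOINC code rather than a local test name, the same annotation table applies to results from any EHR that emits LOINC-coded Observations.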