ABSTRACT: Logical Observation Identifiers Names and Codes (LOINC) is the most widely used controlled vocabulary to identify laboratory tests. A given laboratory test can often be reported in more than 1 unit of measure (eg, grams or moles), and LOINC defines unique codes for each unit. Consequently, an identical laboratory test performed by 2 different clinical laboratories may have different LOINC codes. The absence of unit conversions between compatible LOINC codes impedes data aggregation and analysis of laboratory results. To develop such conversions, a computational process was developed to review the LOINC standard for potential conversions, and multiple expert reviewers oversaw and finalized the conversion list. In all, 285 bidirectional conversions were identified, including conversions for routine clinical tests such as sodium, magnesium, and human immunodeficiency virus (HIV). Unit conversions were applied to the aggregation of laboratory test results to demonstrate their usefulness. Diverse informatics projects may benefit from the ability to interconvert compatible results.
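As a rough illustration of what one bidirectional conversion between compatible codes might look like in software, the sketch below converts a glucose result between a mass-concentration code and a molar-concentration code. The `Conversion` class, the `convert` helper, and the choice of glucose (LOINC 2345-7 and 14749-6, molar mass about 180.156 g/mol) are assumptions made for this example; they are not taken from the published conversion list and should be checked against a current LOINC release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversion:
    """One direction of a unit conversion between two compatible codes."""
    source_code: str
    target_code: str
    factor: float  # target value = source value * factor

    def inverse(self) -> "Conversion":
        # The reverse direction uses the reciprocal factor.
        return Conversion(self.target_code, self.source_code, 1.0 / self.factor)


# Assumed example: glucose, mg/dL <-> mmol/L.
# mg/dL * 10 gives mg/L; dividing by the molar mass (g/mol) gives mmol/L.
GLUCOSE_MOLAR_MASS = 180.156  # g/mol
MASS_CODE = "2345-7"    # assumed: glucose, mass/volume, serum or plasma
MOLAR_CODE = "14749-6"  # assumed: glucose, moles/volume, serum or plasma

_forward = Conversion(MASS_CODE, MOLAR_CODE, 10.0 / GLUCOSE_MOLAR_MASS)
CONVERSIONS = {
    (c.source_code, c.target_code): c for c in (_forward, _forward.inverse())
}


def convert(value: float, source_code: str, target_code: str) -> float:
    """Convert a numeric result from one code's unit to another's."""
    if source_code == target_code:
        return value
    try:
        conversion = CONVERSIONS[(source_code, target_code)]
    except KeyError:
        raise ValueError(f"no conversion from {source_code} to {target_code}")
    return value * conversion.factor


if __name__ == "__main__":
    print(round(convert(90.0, MASS_CODE, MOLAR_CODE), 2))  # about 5.0 mmol/L
```

Storing both directions under a `(source, target)` key keeps lookups simple when results from several laboratories are aggregated to a single target code.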
Project description:Electronic reporting of genetic testing results is increasing, but the results are often represented in diverse formats and naming conventions. Logical Observation Identifiers Names and Codes (LOINC) is a vocabulary standard that provides universal identifiers for laboratory tests and clinical observations. In genetics, LOINC provides codes to improve interoperability in the midst of reporting style transition, including codes for cytogenetic or mutation analysis tests, specific chromosomal alteration or mutation testing, and fully structured discrete genetic test reporting. LOINC terms follow the recommendations and nomenclature of other standards such as the Human Genome Organization Gene Nomenclature Committee's terminology for gene names. In addition to the narrative text they report now, we recommend that laboratories always report as discrete variables chromosome analysis results, genetic variation(s) found, and genetic variation(s) tested for. By adopting and implementing data standards like LOINC, information systems can help care providers and researchers unlock the potential of genetic information for delivering more personalized care.
Project description:Objective: Standards such as the Logical Observation Identifiers Names and Codes (LOINC®) are critical for interoperability and integrating data into common data models, but are inconsistently used. Without consistent mapping to standards, clinical data cannot be harmonized, shared, or interpreted in a meaningful context. We sought to develop an automated machine learning pipeline that leverages noisy labels to map laboratory data to LOINC codes. Materials and Methods: Across 130 sites in the Department of Veterans Affairs Corporate Data Warehouse, we selected the 150 most commonly used laboratory tests with numeric results per site from 2000 through 2016. Using source data text and numeric fields, we developed a machine learning model and manually validated random samples from both labeled and unlabeled datasets. Results: The raw laboratory data consisted of >6.5 billion test results, with 2215 distinct LOINC codes. The model predicted the correct LOINC code in 85% of the unlabeled data and 96% of the labeled data by test frequency. In the subset of labeled data where the original and model-predicted LOINC codes disagreed, the model-predicted LOINC code was correct in 83% of the data by test frequency. Conclusion: Using a completely automated process, we are able to assign LOINC codes to unlabeled data with high accuracy. When the model-predicted LOINC code differed from the original LOINC code, the model prediction was correct in the vast majority of cases. This scalable, automated algorithm may improve data quality and interoperability, while substantially reducing the manual effort currently needed to accurately map laboratory data.
Project description:Objectives: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned utilizing local laboratory codes. Materials and Methods: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008-2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012). Results: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used. Discussion and Conclusion: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records.
Project description:OBJECTIVE: To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC). To study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps to increase match quality and reduce manual effort. MATERIALS AND METHODS: We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts to allow more local terms to be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution run time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out. RESULTS: The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70-85% of the laboratory terminologies to LOINC. The recommendation step can further propose mapping to (proposed) LOINC concepts for 9-20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts. CONCLUSIONS: The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. The probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.
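The F-measure reported above is conventionally the harmonic mean of precision and recall. The sketch below shows one way such a score could be computed for a set of automatic mappings against an expert-curated reference. The function name `f_measure`, the local term names, and the code values are hypothetical placeholders, and the abstract does not state exactly how the authors computed their figures.

```python
from typing import Dict


def f_measure(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Harmonic mean of precision and recall for term-to-code mappings.

    predicted: local term -> code proposed by the matcher
    reference: local term -> code assigned by domain experts
    """
    # A prediction counts as correct only if the expert mapping agrees.
    correct = sum(
        1 for term, code in predicted.items() if reference.get(term) == code
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical placeholder data, not real laboratory terms or LOINC codes.
    predicted = {"term_a": "C1", "term_b": "C2", "term_c": "C9"}
    reference = {"term_a": "C1", "term_b": "C2", "term_c": "C3", "term_d": "C4"}
    # precision = 2/3, recall = 2/4, F = 4/7
    print(round(f_measure(predicted, reference), 4))  # 0.5714
```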
Project description:The LOINC-RSNA Radiology Playbook represents the future direction of standardization for radiology procedure names. We developed a software solution ("RadMatch") utilizing Python 2.7 and FuzzyWuzzy, an open-source fuzzy string matching algorithm created by SeatGeek, to implement the LOINC-RSNA Radiology Playbook for adult abdomen and pelvis CT and MR procedures performed at our institution. Execution of this semi-automated method resulted in the assignment of appropriate LOINC numbers to 86% of local CT procedures. For local MR procedures, appropriate LOINC numbers were assigned to 75% of these procedures whereas 12.5% of local MR procedures could only be partially mapped. For the standardized local procedures, only 63% of CT and 71% of MR procedures had corresponding RadLex Playbook identifier (RPID) codes in the LOINC-RSNA Radiology Playbook, which limited the utility of RPID codes. RadMatch is a semi-automated open-source software tool that can assist radiology departments seeking to standardize their radiology procedures via implementation of the LOINC-RSNA Radiology Playbook.
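The fuzzy matching step described above can be sketched roughly as follows. This example deliberately uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for FuzzyWuzzy, so it is not the RadMatch implementation. The function names, the 0.8 threshold, and the procedure names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def best_match(
    local_name: str, candidates: Iterable[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return the most similar candidate and its score, or None if the best
    score falls below the threshold (leaving the item for manual review)."""
    query = normalize(local_name)
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        score = SequenceMatcher(None, query, normalize(candidate)).ratio()
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is None or best[1] < threshold:
        return None
    return best


if __name__ == "__main__":
    # Hypothetical procedure names, not taken from the Playbook itself.
    playbook = ["CT Abdomen and Pelvis W contrast IV", "MR Pelvis WO contrast"]
    print(best_match("CT  ABDOMEN AND PELVIS W CONTRAST IV", playbook))
    print(best_match("zzz", playbook))  # no acceptable match -> None
```

Returning `None` below the threshold reflects the semi-automated design: low-confidence matches are left for a human reviewer rather than assigned automatically.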
Project description:The National Health Information Standards Committee was established in 2004 in Korea. The practical subcommittee for laboratory test terminology was placed in charge of standardizing laboratory medicine terminology in Korean. We aimed to establish a standardized Korean laboratory terminology database, Korea-Logical Observation Identifier Names and Codes (K-LOINC) based on former products sponsored by this committee. The primary product was revised based on the opinions of specialists. Next, we mapped the electronic data interchange (EDI) codes that were revised in 2014, to the corresponding K-LOINC. We established a database of synonyms, including the laboratory codes of three reference laboratories and four tertiary hospitals in Korea. Furthermore, we supplemented the clinical microbiology section of K-LOINC using an alternative mapping strategy. We investigated other systems that utilize laboratory codes in order to investigate the compatibility of K-LOINC with statistical standards for a number of tests. A total of 48,990 laboratory codes were adopted (21,539 new and 16,330 revised). All of the LOINC synonyms were translated into Korean, and 39,347 Korean synonyms were added. Moreover, 21,773 synonyms were added from reference laboratories and tertiary hospitals. Alternative strategies were established for mapping within the microbiology domain. When we applied these to a smaller hospital, the mapping rate was successfully increased. Finally, we confirmed K-LOINC compatibility with other statistical standards, including a newly proposed EDI code system. This project successfully established an up-to-date standardized Korean laboratory terminology database, as well as an updated EDI mapping to facilitate the introduction of standard terminology into institutions.
Project description:Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
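A minimal sketch of the idea of converting a coded laboratory result into a phenotype term is shown below. It is not the authors' software. The `Annotation` class, the `result_to_hpo` function, the reference range, and the specific identifiers (LOINC 2951-2 for serum sodium, HP:0002902 for hyponatremia, HP:0003228 for hypernatremia) are assumptions for illustration and should be verified against current LOINC and HPO releases. For simplicity, a result within range yields no term here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Annotation:
    """Assumed annotation for one quantitative test."""
    low_term: str    # HPO term when the value is below the reference range
    high_term: str   # HPO term when the value is above the reference range
    ref_low: float   # lower bound of the reference range
    ref_high: float  # upper bound of the reference range


# Illustrative annotation table keyed by LOINC code (values are assumptions).
ANNOTATIONS: Dict[str, Annotation] = {
    "2951-2": Annotation(  # assumed: sodium, serum or plasma, mmol/L
        low_term="HP:0002902",   # assumed: hyponatremia
        high_term="HP:0003228",  # assumed: hypernatremia
        ref_low=135.0,
        ref_high=145.0,
    ),
}


def result_to_hpo(loinc_code: str, value: float) -> Optional[str]:
    """Map a numeric result to an HPO term, or None if the test is not
    annotated or the value lies within the reference range."""
    annotation = ANNOTATIONS.get(loinc_code)
    if annotation is None:
        return None
    if value < annotation.ref_low:
        return annotation.low_term
    if value > annotation.ref_high:
        return annotation.high_term
    return None


if __name__ == "__main__":
    print(result_to_hpo("2951-2", 128.0))  # below range -> low term
    print(result_to_hpo("2951-2", 140.0))  # within range -> None
```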
Project description:Background: We recently confirmed that the deactivation of T. reesei cellulases at the air-liquid interface reduces microcrystalline cellulose conversion at low enzyme loadings in shaken flasks. It is one of the main causes for lowering of cellulose conversions at low enzyme loadings. However, supplementing cellulases with small quantities of surface-active additives in shaken flasks can increase cellulose conversions at low enzyme loadings. It was also shown that cellulose conversions at low enzyme loadings can be increased in unshaken flasks if the reactions are carried out for a longer time. This study further explores these recent findings to better understand the impact of air-liquid interfacial phenomena on enzymatic hydrolysis of cellulose contained in Avicel, Sigmacell, α-cellulose, cotton linters, and filter paper. The impacts of solids and enzyme loadings, supplementation with nonionic surfactant Tween 20 and xylanases, and application of different types of mixing and reactor designs on cellulose hydrolysis were also evaluated. Results: Avicel cellulose conversions at high solid loading were more than doubled by minimizing loss of cellulases to the air-liquid interface. Maximum cellulose conversions were high for surface-active supplemented shaken flasks or unshaken flasks because of low cellulase deactivation at the air-liquid interface. The nonionic surfactant Tween 20 was unable to completely prevent cellulase deactivation in shaken flasks and only reduced cellulose conversions at unreasonably high concentrations. Conclusions: High dynamic interfacial areas created through baffles in reactor vessels, low volumes in high-capacity vessels, or high shaking speeds severely limited cellulose conversions at low enzyme loadings. Precipitation of cellulases due to aggregation at the air-liquid interface caused their continuous deactivation in shaken flasks and severely limited solubilization of cellulose.
Project description:Gene conversion can have a profound impact on both the short- and long-term evolution of genes and genomes. Here, we examined the gene families that are located on the X-chromosomes of human (Homo sapiens), chimpanzee (Pan troglodytes), mouse (Mus musculus) and rat (Rattus norvegicus) for evidence of gene conversion. We identified seven gene families (WD repeat protein family, Ferritin Heavy Chain family, RAS-related Protein RAB-40 family, Diphosphoinositol polyphosphate phosphohydrolase family, Transcription Elongation Factor A family, LDOC1-related family, Zinc Finger Protein ZIC, and GLI family) that show evidence of gene conversion. Through phylogenetic analyses and synteny evidence, we show that gene conversion has played an important role in the evolution of these gene families and that gene conversion has occurred independently in both primates and rodents. Comparing the results with those of two gene conversion prediction programs (GENECONV and Partimatrix), we found that both GENECONV and Partimatrix have very high false negative rates (i.e. failed to predict gene conversions), which leads to many undetected gene conversions. The combination of phylogenetic analyses with physical synteny evidence exhibits high resolution in the detection of gene conversions.