Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study.
ABSTRACT: BACKGROUND: The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is being advocated as the foundation for encoding clinical documentation. While the electronic medical record is likely to play a critical role in pharmacovigilance - the detection of adverse events due to medications - classification and reporting of adverse events is currently based on the Medical Dictionary for Regulatory Activities (MedDRA). Complete and high-quality MedDRA-to-SNOMED CT mappings can therefore facilitate pharmacovigilance. The existing mappings, as determined through the Unified Medical Language System (UMLS), are partial, and record only one-to-one correspondences even though SNOMED CT can be used compositionally. Efforts to map previously unmapped MedDRA concepts would be most productive if focused on concepts that occur frequently in actual adverse event data. We aimed to identify aspects of MedDRA that complicate mapping to SNOMED CT, to determine patterns in unmapped high-frequency MedDRA concepts, and to identify types of integration errors in the mapping of MedDRA to UMLS. METHODS: Using one year's data from the US Food and Drug Administration's Adverse Event Reporting System, we identified MedDRA preferred terms that collectively accounted for 95% of both Adverse Events and Therapeutic Indications records. After eliminating those already mapping to SNOMED CT, we attempted to map the remaining 645 Adverse-Event and 141 Therapeutic-Indications preferred terms with software assistance. RESULTS: All but 46 Adverse-Event and 7 Therapeutic-Indications preferred terms could be composed using SNOMED CT concepts; none of these required more than 3 SNOMED CT concepts to compose. We describe the common composition patterns in the paper.
About 30% of both Adverse-Event and Therapeutic-Indications Preferred Terms corresponded to single SNOMED CT concepts: the correspondence was detectable by human inspection but had been missed during the integration process, which had created duplicated concepts in UMLS. CONCLUSIONS: Identification of composite mapping patterns, and the types of errors that occur in the MedDRA content within UMLS, can focus larger-scale efforts on improving the quality of such mappings, which may assist in the creation of an adverse-events ontology.
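The selection step in the METHODS above (keep the most frequent preferred terms until they account for 95% of records, then drop those already mapped) can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the toy records, and the `existing_mappings` set are all assumptions made for the example.

```python
from collections import Counter

def terms_covering(records, target=0.95):
    """Return the most frequent terms whose counts together reach
    `target` (a fraction) of all records."""
    counts = Counter(records)
    total = sum(counts.values())
    selected, covered = [], 0
    # most_common() yields (term, count) pairs in descending count order
    for term, n in counts.most_common():
        if covered >= target * total:
            break
        selected.append(term)
        covered += n
    return selected

def unmapped(terms, existing_mappings):
    """Keep only the terms that have no existing one-to-one mapping."""
    return [t for t in terms if t not in existing_mappings]

# Toy data (hypothetical counts, not taken from the study)
records = (["Nausea"] * 50 + ["Headache"] * 30 +
           ["Drug ineffective"] * 15 + ["Rash"] * 4 + ["Yawning"] * 1)
existing_mappings = {"Nausea", "Headache"}  # assumed already mapped

frequent = terms_covering(records, target=0.95)
to_map = unmapped(frequent, existing_mappings)
```

With the toy counts, three terms reach the 95% threshold and only one of them still needs mapping; the long tail of rare terms is left out, which is the point of focusing on high-frequency concepts.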
Project description:BACKGROUND: In order to satisfy different needs, medical terminology systems must have richer structures. This study examines whether a Swedish primary health care version of the mono-hierarchical ICD-10 (KSH97-P) may obtain a richer structure using category and chapter mappings from KSH97-P to SNOMED CT and SNOMED CT's structure. Manually-built mappings from KSH97-P's categories and chapters to SNOMED CT's concepts are used as a starting point. RESULTS: The mappings are manually evaluated using computer-produced information and a small number of mappings are updated. A new and poly-hierarchical chapter division of KSH97-P's categories has been created using the category and chapter mappings and SNOMED CT's generic structure. In the new chapter division, most categories are included in their original chapters. A considerable number of concepts are included in other chapters than their original chapters. Most of these inclusions can be explained by ICD-10's design. KSH97-P's categories are also extended with attributes using the category mappings and SNOMED CT's defining attribute relationships. About three-fourths of all concepts receive an attribute of type Finding site and about half of all concepts receive an attribute of type Associated morphology. Other types of attributes are less common. CONCLUSIONS: It is possible to use mappings from KSH97-P to SNOMED CT and SNOMED CT's structure to enrich KSH97-P's mono-hierarchical structure with a poly-hierarchical chapter division and attributes of type Finding site and Associated morphology. The final mappings are available as additional files for this paper.
Project description:To identify challenges in mapping internal International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) encoded legacy data to the Systematized Nomenclature of Medicine (SNOMED), using SNOMED-prescribed compositional approaches where appropriate, and to explore the mapping coverage provided by the US National Library of Medicine (NLM)'s SNOMED clinical core subset. This study selected ICD-9-CM codes that occurred at least 100 times in the organization's problem list or diagnosis data in 2008. After eliminating codes whose exact mappings were already available in UMLS, the remainder were mapped manually with software assistance. Of the 2194 codes, 784 (35.7%) required manual mapping. Of these, 435 represented concept types documented in SNOMED as deprecated, including qualifying phrases such as 'not elsewhere classified'. A third of the codes were composite, requiring multiple SNOMED codes to map. Representing 45 composite concepts required introducing disjunction ('or') or set-difference ('without') operators, which are not currently defined in SNOMED. Only 47% of the concepts required for composition were present in the clinical core subset. Searching SNOMED for the correct concepts often required extensive application of knowledge of both English and medical synonymy. Strategies to deal with legacy ICD data must address the issue of codes created by non-taxonomist users. The NLM core subset possibly needs augmentation with concepts from certain SNOMED hierarchies, notably qualifiers, body structures, substances/products and organisms. Concept-matching software needs to utilize query expansion strategies, but these may be effective in production settings only if a large but non-redundant SNOMED subset that minimizes the proportion of extensively pre-coordinated concepts is also available.
Project description:Semantic similarity measures estimate the similarity between concepts, and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge based and distributional based methods. Knowledge based approaches utilize knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path finding measures and intrinsic information content (IC) measures. Distributional measures utilize, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge based path finding methods; but more recent studies suggested that intrinsic IC based measures exceed the accuracy of distributional approaches. Limitations of previous evaluations of similarity measures in the biomedical domain include their focus on the SNOMED CT ontology, and their reliance on small benchmarks not powered to detect significant differences between measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources such as the UMLS, and on larger, recently developed semantic similarity benchmarks. We evaluated knowledge based and corpus IC based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results and all tools developed as part of this study as open source, available under http://code.google.com/p/ytex.
We provide a publicly-accessible web service to compute semantic similarity, available under http://informatics.med.yale.edu/ytex.web/. Knowledge based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should utilize benchmarks powered to detect significant differences in measure accuracy.
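To make the idea of an intrinsic IC measure concrete, the sketch below computes an information content value from the taxonomy alone (no corpus) and plugs it into the Lin similarity formula. It uses the formulation of Seco et al., IC(c) = 1 - log(descendants(c) + 1) / log(N), purely as one well-known example; the abstract does not say which intrinsic IC formula the authors evaluated, and the toy single-parent taxonomy is an assumption for illustration (SNOMED CT and the UMLS are poly-hierarchical).

```python
import math

# Hypothetical toy is-a taxonomy: child -> parent (None marks the root)
PARENT = {
    "disease": None,
    "heart disease": "disease",
    "lung disease": "disease",
    "myocardial infarction": "heart disease",
    "angina": "heart disease",
    "pneumonia": "lung disease",
}

def ancestors(concept):
    """The concept itself followed by its ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def descendant_count(concept):
    """Number of concepts strictly below `concept` in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))

def intrinsic_ic(concept):
    """Seco-style intrinsic IC: leaves score 1, the root scores 0."""
    n = len(PARENT)
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(n)

def lin_similarity(a, b):
    """Lin similarity using the nearest common ancestor's intrinsic IC."""
    ancestors_of_b = set(ancestors(b))
    common = next(c for c in ancestors(a) if c in ancestors_of_b)
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * intrinsic_ic(common) / denom if denom else 0.0
```

In this toy taxonomy, "myocardial infarction" and "angina" share the fairly specific ancestor "heart disease" and so score higher than "myocardial infarction" and "pneumonia", whose only common ancestor is the root.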
Project description:The "Psychiatric Treatment Adverse Reactions" (PsyTAR) dataset contains patients' expressions of effectiveness and adverse drug events associated with psychiatric medications. PsyTAR was generated in four phases. In the first phase, a sample of 891 drug reviews posted by patients on an online healthcare forum, "askapatient.com", was collected for four psychiatric drugs: Zoloft, Lexapro, Cymbalta, and Effexor XR. For each drug review, patient demographic information, duration of treatment, and satisfaction with the drugs were reported. In the second phase, sentence classification, drug reviews were split into 6009 sentences, and each sentence was labeled for the presence of Adverse Drug Reaction (ADR), Withdrawal Symptoms (WDs), Sign/Symptoms/Illness (SSIs), Drug Indications (DIs), Drug Effectiveness (EF), Drug Ineffectiveness (INF), and Others (not applicable). In the third phase, entities including ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792 mentions) were identified and extracted from the sentences. In the fourth phase, all the identified entities were mapped to the corresponding UMLS Metathesaurus concepts (916) and SNOMED CT concepts (755). In this phase, qualifiers representing severity and persistency of ADRs, WDs, SSIs, and DIs (e.g., mild, short term) were also identified. All sentences and identified entities were linked to the original post using IDs (e.g., Zoloft.1, Effexor.29, Cymbalta.31). The PsyTAR dataset can be accessed via Online Supplement #1 under the CC BY 4.0 Data license. Updated versions of the dataset will also be accessible at https://sites.google.com/view/pharmacovigilanceinpsychiatry/home.
Project description:Background: Searching the MedDRA terminology is usually limited to a hierarchical search and/or a string search. Our objective was to compare user performance when using a new kind of user interface enabling semantic queries versus classical methods, and to evaluate the improvement in term selection in MedDRA. Methods: We implemented a forms-based web interface: OntoADR Query Tools (OQT). It relies on OntoADR, a formal resource describing MedDRA terms using SNOMED CT concepts and corresponding semantic relations, enabling terminological reasoning. We then compared time spent on five examples of medical conditions using OQT or the MedDRA web-based browser (MWB), and precision and recall of the term selection. Results: OntoADR Query Tools allows the user to search in MedDRA by entering search criteria: the user selects one semantic property from a dropdown list and one or more SNOMED CT concepts related to the range of the chosen property. The user is assisted in building the query and can add criteria and combine them. The interface then displays the set of MedDRA terms matching the query. On average, the time spent on OQT (about 4 min 30 s) is significantly lower (-35%; p < 0.001) than time spent on MWB (about 7 min). The System Usability Scale (SUS) gave a score of 62.19 for OQT (rated as good). We also demonstrated increased precision (+27%; p = 0.01) and recall (+34%; p = 0.02). Computed "performance" (correct terms found per minute) is more than three times better with OQT than with MWB. Discussion: This pilot study establishes the feasibility of our approach based on our initial assumption: performing MedDRA queries on the five selected medical conditions, using terminological reasoning, expedites term selection, and improves search capabilities for pharmacovigilance end users.
Evaluation with a larger number of users and medical conditions is required in order to establish whether OQT is appropriate for the needs of different user profiles, and to check whether the conclusions can be extended to other kinds of medical conditions. The application is currently limited by the non-exhaustive coverage of MedDRA by OntoADR, but nevertheless shows good performance, which encourages continuing in the same direction.
Project description:INTRODUCTION AND OBJECTIVE: Social media has been suggested as a source for safety information, supplementing existing safety surveillance data sources. This article summarises the activities undertaken, and the associated challenges, to create a benchmark reference dataset that can be used to evaluate the performance of automated methods and systems for adverse event recognition. METHODS: A retrospective analysis of public English-language Twitter posts (Tweets) was performed. We sampled 57,473 Tweets out of 5,645,336 Tweets created between 1 March, 2012 and 1 March, 2015 that mentioned at least one of six medicinal products of interest (insulin glargine, levetiracetam, methylphenidate, sorafenib, terbinafine, zolpidem). Products, adverse events, indications, product-event combinations, and product-indication combinations were extracted and coded by two independent teams of safety reviewers. RESULTS: The benchmark reference dataset consisted of 1056 positive controls ("adverse event Tweets") and 56,417 negative controls ("non-adverse event Tweets"). The 1056 adverse event Tweets contained 1396 product-event combinations referring to personal adverse event experiences, comprising 292 different MedDRA® Preferred Terms. Of these, 1171 product-event combinations (83.9%) were confined to four MedDRA® System Organ Classes. Indication information appeared in 195 Tweets (18.5%), comprising 25 different Preferred Terms. CONCLUSIONS: A manually curated benchmark reference dataset based on Twitter data has been created and is made available to the research community to evaluate the performance of automated methods and systems for adverse event recognition in unstructured free-text information.
Project description:Objective: The purpose of this article is to describe how well the current Nursing Problem List Subset of the Systematized Nomenclature of Medicine Clinical Terms (NPLS) covers the American Nurses Association (ANA) recognized standardized nursing terminologies (SNTs) and to identify potential ways to expand and enhance the utility of this list. Materials and Methods: The study is a cross-sectional exploratory design. We mapped the content of the North American Nursing Diagnosis Association International (NANDA-I) (2018-2020), International Classification for Nursing Practice (ICNP) (2017 AB), Clinical Care Classification (CCC) (2018 AA), and Omaha System (2007AC) terminologies with each other and into NPLS (August 2017 edition) using Unified Medical Language System (UMLS) (release 2018AA) as the intermediary. Results: We identified a total of 1470 unique nursing diagnosis concepts across SNTs in UMLS, including 175 in CCC, 840 in ICNP, 244 in NANDA-I, 418 in Omaha System, and 631 in NPLS. The NPLS covers approximately 43% of the 1470 concepts; coverage of SNT content is 90% for CCC, 47% for ICNP, 59% for NANDA-I, and 32% for the Omaha System. Discussion/Recommendations: The NPLS version 2017 coverage of SNT nursing diagnoses included in the UMLS is incomplete and equivocal. Recommendations: (1) ensure all SNT concepts in the UMLS are represented by SNOMED CT terms, (2) devise a formal strategy of partial matching to further enhance interoperability, (3) add a classification structure to the NPLS to enhance the ease of use and utility of the list, and (4) minimize redundancy within NPLS.
Project description:In most electronic health record (EHR) systems, clinicians record diagnoses using interface terminologies, such as Intelligent Medical Objects (IMO). When extracting data from EHRs for collaborative research, local codes are often transformed to standard terminologies for consistent analyses despite the potential for loss of fidelity. EHR diagnosis codes may be standardized directly during the Extract-Transform-Load (ETL) process to the "Meaningful Use" clinical data exchange standard, SNOMED-CT, or to the International Classification of Diseases (ICD) terminologies commonly used for billing. We examined the performance of ETL standardization via the direct IMO mapping to SNOMED-CT, and via IMO mapping to ICD-9-CM or ICD-10-CM followed by UMLS mapping to SNOMED-CT. We found that for both ICD-9-CM and ICD-10-CM, only 24-27% of diagnosis codes map to the same SNOMED-CT code selected by the direct IMO-SNOMED crosswalk. We identified that differences in mapping lead to loss in the granularity and laterality of the initial diagnosis.
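The comparison described above (direct interface-term-to-SNOMED mapping versus mapping through an ICD code and then to SNOMED) amounts to measuring how often the two routes end at the same target code. The sketch below shows one way such an agreement rate could be computed. The function name, the dictionary layout, and every code identifier are hypothetical placeholders; none of them come from IMO, ICD, UMLS, or SNOMED CT content, and this is not the study's actual pipeline.

```python
def agreement_rate(direct, to_icd, icd_to_snomed):
    """Fraction of local codes for which the two-step route
    (local -> ICD -> SNOMED) reaches the same SNOMED code as the
    direct route (local -> SNOMED).

    Only local codes present in both `direct` and `to_icd` are compared.
    """
    shared = [code for code in direct if code in to_icd]
    if not shared:
        return 0.0
    same = sum(
        1 for code in shared
        if icd_to_snomed.get(to_icd[code]) == direct[code]
    )
    return same / len(shared)

# Placeholder identifiers only (not real terminology codes)
direct = {"LOCAL-1": "SCT-A-left", "LOCAL-2": "SCT-B",
          "LOCAL-3": "SCT-C", "LOCAL-4": "SCT-D-severe"}
to_icd = {"LOCAL-1": "ICD-A", "LOCAL-2": "ICD-B",
          "LOCAL-3": "ICD-X", "LOCAL-4": "ICD-D"}
icd_to_snomed = {"ICD-A": "SCT-A",   # laterality lost on the way
                 "ICD-B": "SCT-B",   # routes agree
                 "ICD-D": "SCT-D"}   # granularity lost; ICD-X has no mapping

rate = agreement_rate(direct, to_icd, icd_to_snomed)
```

In the placeholder data only one of four local codes reaches the same target by both routes; the mismatches are arranged to mirror the kinds of loss the abstract reports (laterality and granularity), plus a case where the intermediate code has no onward mapping.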
Project description:Background: Deep Phenotyping is the precise and comprehensive analysis of phenotypic features in which the individual components of the phenotype are observed and described. In UK mental health clinical practice, most clinically relevant information is recorded as free text in the Electronic Health Record, and offers a granularity of information beyond what is expressed in most medical knowledge bases. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. Methods: By utilising a large corpus of healthcare data, we sought to make use of semantic modelling and clustering techniques to represent the relationship between the clinical vocabulary of internationally recognised Serious Mental Illness (SMI) symptoms and the preferred language used by clinicians within a care setting. We explore how such models can be used for discovering novel vocabulary relevant to the task of phenotyping SMI with only a small amount of prior knowledge. Results: 20 403 terms were derived and curated via a two-stage methodology. The list was reduced to 557 putative concepts by eliminating redundant information content. These were then organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. 235 concepts were found to be expressions of putative clinical significance. Of these, 53 were identified as having novel synonymy with existing SNOMED CT concepts, and 106 had no mapping to SNOMED CT. Conclusions: We demonstrate a scalable approach to discovering new concepts of SMI symptomatology based on real-world clinical observation.
Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than is typically assessed via current diagnostic frameworks, and create the potential for enhancing nomenclatures such as SNOMED CT based on real-world expressions.
Project description:OBJECTIVE: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies. MATERIALS AND METHODS: Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified. RESULTS: Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in the SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively. CONCLUSIONS: Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
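The transformation step in the MATERIALS AND METHODS above can be illustrated with a much-simplified sketch. It assumes noun chunking has already been done and stands in for it with plain substring matching against a hypothetical table of more general replacements; the function names, the toy concept names, and the `generalizations` table are all invented for the example and do not reproduce the authors' implementation or any real terminology content.

```python
def candidate_supertypes(name, generalizations):
    """Generate new names by replacing each known chunk in `name`
    with each of its more general replacement candidates."""
    candidates = set()
    for chunk, broader_terms in generalizations.items():
        if chunk in name:
            for broader in broader_terms:
                candidates.add(name.replace(chunk, broader))
    return candidates

def missing_is_a(names, existing_is_a, generalizations):
    """Report (child, parent) pairs where the generated parent name is an
    existing concept name but no IS-A relation is recorded between them."""
    found = []
    for name in sorted(names):
        for cand in sorted(candidate_supertypes(name, generalizations)):
            if cand in names and cand != name and (name, cand) not in existing_is_a:
                found.append((name, cand))
    return found

# Toy terminology (hypothetical names and relations)
names = {"fracture of femur", "fracture of bone", "fracture of lower limb"}
existing_is_a = {("fracture of femur", "fracture of lower limb")}
generalizations = {"femur": ["bone", "lower limb"]}  # chunk -> broader terms

suggestions = missing_is_a(names, existing_is_a, generalizations)
```

Here the only suggestion is that "fracture of femur" may be missing an IS-A link to "fracture of bone"; the link to "fracture of lower limb" is already recorded and is not reported. As in the study, such output is only a list of candidates that would still need expert review.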