Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS.
ABSTRACT: OBJECTIVES:To test the hypothesis that most instances of negated concepts in dictated medical documents can be detected by a strategy that relies on tools developed for the parsing of formal (computer) languages-specifically, a lexical scanner ("lexer") that uses regular expressions to generate a finite state machine, and a parser that relies on a restricted subset of context-free grammars, known as LALR(1) grammars. METHODS:A diverse training set of 40 medical documents from a variety of specialties was manually inspected and used to develop a program (Negfinder) that contained rules to recognize a large set of negated patterns occurring in the text. Negfinder's lexer and parser were developed using tools normally used to generate programming language compilers. The input to Negfinder consisted of medical narrative that was preprocessed to recognize UMLS concepts: the text of a recognized concept had been replaced with a coded representation that included its UMLS concept ID. The program generated an index with one entry per instance of a concept in the document, where the presence or absence of negation of that concept was recorded. This information was used to mark up the text of each document by color-coding it to make it easier to inspect. The parser was then evaluated in two ways: 1) a test set of 60 documents (30 discharge summaries, 30 surgical notes) marked up by Negfinder was inspected visually to quantify false-positive and false-negative results; and 2) a different test set of 10 documents was independently examined for negatives by a human observer and by Negfinder, and the results were compared. RESULTS:In the first evaluation using marked-up documents, 8,358 instances of UMLS concepts were detected in the 60 documents, of which 544 were negations detected by the program and verified by human observation (true-positive results, or TPs). Thirteen instances were wrongly flagged as negated (false-positive results, or FPs), and the program missed 27 instances of negation (false-negative results, or FNs), yielding a sensitivity of 95.3 percent and a specificity of 97.7 percent. In the second evaluation using independent negation detection, 1,869 concepts were detected in 10 documents, with 135 TPs, 12 FPs, and 6 FNs, yielding a sensitivity of 95.7 percent and a specificity of 91.8 percent. One of the words "no," "denies/denied," "not," or "without" was present in 92.5 percent of all negations. CONCLUSIONS:Negation of most concepts in medical narrative can be reliably detected by a simple strategy. The reliability of detection depends on several factors, the most important being the accuracy of concept matching.
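As a rough illustration of the trigger-word idea in the abstract above, the following Python sketch indexes concept instances and marks those that follow a negation word. It is not Negfinder: it swaps the generated lexer and LALR(1) parser for a single regular-expression scanner with a one-flag state machine. The `[[CUI|text]]` encoding, the list of scope-ending words, and the function name `index_concepts` are assumptions made for this example only.

```python
import re

# Hypothetical concept encoding (an assumption, not Negfinder's real format):
# [[CUI|surface text]], for example [[C0015967|fever]].
TOKEN_RE = re.compile(
    r"(?P<concept>\[\[(?P<cui>C\d{7})\|(?P<text>[^\]]+)\]\])"
    # The four trigger families named in the abstract.
    r"|(?P<neg>\b(?:no|not|without|denies|denied)\b)"
    # Assumed scope terminators: sentence punctuation and contrast words.
    r"|(?P<end>[.;:]|\b(?:but|however|except)\b)",
    re.IGNORECASE,
)


def index_concepts(narrative):
    """Return one (cui, text, negated) entry per concept instance."""
    entries = []
    negating = False  # True while inside the scope of a negation trigger
    for match in TOKEN_RE.finditer(narrative):
        if match.group("concept"):
            entries.append((match.group("cui"), match.group("text"), negating))
        elif match.group("neg"):
            negating = True
        else:  # a scope terminator closes any open negation
            negating = False
    return entries


if __name__ == "__main__":
    # Concept IDs below are illustrative placeholders in CUI-like form.
    note = ("Patient denies [[C0008031|chest pain]] or [[C0013404|dyspnea]], "
            "but has [[C0015967|fever]]. No [[C0010200|cough]].")
    for entry in index_concepts(note):
        print(entry)
```

A scanner this small cannot handle patterns that need real grammar rules (for instance, negation that follows the concept), which is presumably part of why the authors used a parser.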
Project description:OBJECTIVES:To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts) and false negatives (concepts missed by the identification process). METHODS:Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind). RESULTS:Of 8,745 matches in the training set, 7,227 (82.6 percent) were true positives, whereas of 1,701 matches in the test set, 1,298 (76.3 percent) were true positives. Matches other than true positive indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors. CONCLUSIONS:The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.
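To make the notion of concept matching concrete, here is a minimal Python sketch of greedy longest-phrase lookup against a small dictionary. The abstract does not describe the authors' algorithm, so this is only one plausible approach. The `CONCEPTS` table and its identifiers are hypothetical placeholders, not entries taken from the Metathesaurus.

```python
import re

# Tiny hypothetical stand-in for a curated UMLS subset:
# normalized phrase -> placeholder concept identifier.
CONCEPTS = {
    "chest pain": "CUI_CHEST_PAIN",
    "pain": "CUI_PAIN",
    "myocardial infarction": "CUI_MI",
}
MAX_WORDS = max(len(phrase.split()) for phrase in CONCEPTS)


def match_concepts(text):
    """Greedy longest-match lookup; returns (phrase, concept_id) pairs in order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in CONCEPTS:
                matches.append((phrase, CONCEPTS[phrase]))
                i += n
                break
        else:
            i += 1  # no concept starts at this word
    return matches


if __name__ == "__main__":
    print(match_concepts("Chest pain after myocardial infarction; pain resolved."))
```

Exact lookup like this has no defense against the failure causes the abstract lists (homonyms, acronyms, spelling errors), which is consistent with the need for curation noted in the conclusions.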
Project description:Processing negated mental representations comes with a price: Not only are negations harder to resolve than affirmative statements, but they may even invoke ironic effects, producing the exact opposite of the intended outcome. Negation effects also behave ironically when subjected to high-frequency training; when negations are encountered often, the difficulty of processing them paradoxically increases. Here we show that negation effects can be mitigated under certain circumstances. Based on models of cognitive control and conflict adaptation, we hypothesized that negation effects diminish when two criteria are met: negations have to be resolved not only frequently, but also just recently. We confirmed this prediction in two experiments by using an innovative, two-dimensional finger tracking design, in which we measured the influence of the original semantic content during negation processing via temporal and spatial measures. Negation effects were present throughout the experiment, but were reduced after recent negations, particularly during or after a high-frequency negation context. The combined influence of frequency and recency thus seems to be the most successful and promising attempt to mitigate ironic negation effects on overt behavior.
Project description:The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to the top performing system (F1=92.8), it was the best rule-based system of all the submissions in the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS towards the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on the performance, identifying negations and experiencer of the medical event results in negligible gain.
Project description:FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) appearing in the documents retrieved by the query. The concepts are presented to the user in a tabular format and ranked based on the co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents, which enables the user to issue a flexible query (e.g. free keywords or Boolean combinations of keywords/concepts) and receive the results immediately even when the number of the documents that match the query is very large. The user can also view snippets from MEDLINE to get textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms for building the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank. The system is available at http://www.nactem.ac.uk/software/facta/
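The pre-indexing idea can be sketched in a few lines of Python: build one inverted index over both words and concepts, answer a Boolean AND query by intersecting posting sets, then rank the concepts found in the hits by raw co-occurrence count. This is a toy under stated assumptions, not FACTA's implementation. The documents, the `TYPE:name` concept labels, and the use of a plain count as the ranking statistic are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical pre-annotated documents: free-text words plus concept labels.
DOCS = {
    1: {"words": {"aspirin", "reduces", "fever"},
        "concepts": {"DRUG:aspirin", "DISEASE:fever"}},
    2: {"words": {"aspirin", "fever", "headache"},
        "concepts": {"DRUG:aspirin", "DISEASE:fever", "DISEASE:headache"}},
    3: {"words": {"ibuprofen", "headache"},
        "concepts": {"DRUG:ibuprofen", "DISEASE:headache"}},
}


def build_index(docs):
    """Inverted index mapping every word and concept to the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["words"] | doc["concepts"]:
            index[term].add(doc_id)
    return index


def cooccurring_concepts(docs, index, query_terms):
    """AND the query terms, then count the other concepts in the matching documents."""
    hits = set(docs)
    for term in query_terms:
        hits &= index.get(term, set())
    counts = Counter(
        concept
        for doc_id in hits
        for concept in docs[doc_id]["concepts"]
        if concept not in query_terms
    )
    return counts.most_common()


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(cooccurring_concepts(DOCS, idx, ["DRUG:aspirin"]))
```

Because concepts are indexed ahead of time, the query only touches posting sets and per-document concept lists, which is the property the abstract credits for fast responses on large result sets.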
Project description:OBJECTIVE:Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization. MATERIALS AND METHODS:The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer. RESULTS:Our generate-and-rank system was third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy was increased to 83.56% via improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer. DISCUSSION:Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training. CONCLUSIONS:Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network-based ranking model to accurately link phrases in text to UMLS concepts.
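The generate-and-rank pattern described above can be sketched in Python as two small functions. This is a toy under stated assumptions. Plain dictionaries stand in for the Lucene indices, and a token-overlap (Jaccard) score stands in for the BERT listwise classifier, so it shows only the control flow and none of the system's accuracy. Every identifier, index entry, and name below is hypothetical.

```python
# Hypothetical lookup tables standing in for the three indices in the abstract.
TRAINING = {"heart attack": ["CUI_MI"]}
PREFERRED = {"myocardial infarction": ["CUI_MI"]}
SYNONYMS = {"acute myocardial infarction": ["CUI_MI_OLD", "CUI_MI"]}

# Hypothetical preferred names used by the placeholder scorer.
PREFERRED_NAME = {
    "CUI_MI": "myocardial infarction",
    "CUI_MI_OLD": "old myocardial infarction",
}


def generate(mention):
    """Sieve: consult each index in priority order, keeping first-seen candidates."""
    key = mention.lower().strip()
    candidates = []
    for index in (TRAINING, PREFERRED, SYNONYMS):
        for cui in index.get(key, []):
            if cui not in candidates:
                candidates.append(cui)
    return candidates


def rank(mention, candidates):
    """Order candidates by Jaccard token overlap (a stand-in for a learned ranker)."""
    mention_tokens = set(mention.lower().split())

    def score(cui):
        name_tokens = set(PREFERRED_NAME[cui].split())
        return len(mention_tokens & name_tokens) / len(mention_tokens | name_tokens)

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    mention = "Acute myocardial infarction"
    print(rank(mention, generate(mention)))
```

Separating the two stages means the candidate generator bounds recall while the ranker determines precision, which matches the abstract's comparison of the generator alone against the full system.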
Project description:When talking about absence, we may express it in a negative statement (using explicit negation, e.g. I was not) or in a positive statement (using implicit negation, e.g. I wished I were). Previous research has shown that explicitly negated statements may cause false recall: negated items may paradoxically be remembered as present. The current study compares false recall caused by implicit and explicit negation. Participants listened to a recording in which some objects were mentioned as present, some as absent, and some not mentioned at all. The absence of objects was expressed using explicit or implicit negation. Participants' recall of the recording was measured either five minutes or one week after exposure to the recording. Results indicate that implicit and explicit negation lead to a nearly identical false recall of negated items. However, items not mentioned in the recording (i.e. neither mentioned nor negated) were more often recognized as present by participants exposed to implicit, rather than explicit negation. We postulate that false recall of negated items could be explained by participants remembering the item itself, but forgetting the context in which it was present (an affirmative or a negative statement), hence objects would be recalled as present just because they were spoken of.
Project description:In this article we demonstrate that negation of ideas can have paradoxical effects, possibly leading the listener to believe that the negated ideas actually existed. In Experiment 1, participants listened to a description of a house, in which some objects were mentioned, some were negated, and some were not mentioned at all. When questioned about the existence of these objects a week later, the participants gave more false positives for items that were negated in the original material than for items that were not mentioned at all, an effect we call negation-related false memories (NRFM). The NRFM effect was replicated in Experiment 2 with a sample of five- and six-year-old children. Experiment 3 confirmed NRFM in the case of negated actions. The results are discussed in terms of retention hypothesis, as well as the theory that negation can activate a representation of an entity and behaviour. It is also indicated that future research is needed to ensure that it is indeed negation which caused false alarms, not merely mentioning an object.
Project description:We analyze in this paper the data collected in a set of experiments investigating how people combine natural concepts. We study the mutual influence of conceptual conjunction and negation by measuring the membership weights of a list of exemplars with respect to two concepts, e.g., Fruits and Vegetables, and their conjunction Fruits And Vegetables, but also their conjunction when one or both concepts are negated, namely, Fruits And Not Vegetables, Not Fruits And Vegetables, and Not Fruits And Not Vegetables. Our findings sharpen and advance existing analysis on conceptual combinations, revealing systematic deviations from classical (fuzzy set) logic and probability theory. And, more important, our results give further considerable evidence to the validity of our quantum-theoretic framework for the combination of two concepts. Indeed, the representation of conceptual negation naturally arises from the general assumptions of our two-sector Fock space model, and this representation faithfully agrees with the collected data. In addition, we find a new significant and a priori unexpected deviation from classicality, which can exactly be explained by assuming that human reasoning is the superposition of an "emergent reasoning" and a "logical reasoning," and that these two processes are represented in a Fock space algebraic structure.
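For context on what "deviation from classicality" means here: if membership weights behaved like classical probabilities, the four conjunction weights would sum to 1, and each concept's own weight would equal the sum of the two conjunctions that contain it. The Python sketch below checks those three conditions for one exemplar. The function name, argument names, and the numeric weights are hypothetical and not taken from the study's data.

```python
def classicality_deviations(mu_a, mu_b, mu_ab, mu_a_notb, mu_nota_b, mu_nota_notb):
    """Return how far the weights are from each classical probability condition.

    All three values are 0 for weights that admit a classical representation
    with respect to these conditions.
    """
    return {
        # The four conjunctions partition the sample space.
        "normalization": mu_ab + mu_a_notb + mu_nota_b + mu_nota_notb - 1.0,
        # mu(A) = mu(A and B) + mu(A and not B)
        "marginal_a": mu_a - (mu_ab + mu_a_notb),
        # mu(B) = mu(A and B) + mu(not A and B)
        "marginal_b": mu_b - (mu_ab + mu_nota_b),
    }


def is_classical(deviations, tolerance=1e-9):
    """True when every deviation is zero up to a numerical tolerance."""
    return all(abs(value) <= tolerance for value in deviations.values())


if __name__ == "__main__":
    # Hypothetical weights that satisfy all three conditions.
    consistent = classicality_deviations(0.6, 0.5, 0.3, 0.3, 0.2, 0.2)
    # Hypothetical weights whose conjunctions sum to more than 1.
    overextended = classicality_deviations(0.6, 0.5, 0.5, 0.4, 0.3, 0.2)
    print(is_classical(consistent), is_classical(overextended))
```

In practice a statistical test across exemplars, not a fixed tolerance on a single one, would be needed to call a deviation systematic.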
Project description:The main results are about the groups of the negations on the unit square, which is considered as a bilattice. It is proven that all the automorphisms on it form a group; the set, containing the monotonic isomorphisms and the strict negations of the first (or the second or the third) kind, with the operator "composition," is a group G₂ (or G₃ or G₄, correspondingly). All these four kinds of mappings form a group G₅. And all the groups Gᵢ, i = 2, 3, 4, are normal subgroups of G₅. Moreover, for G₅, a generator set is given, which consists of all the involutive negations of the second kind and the standard negation of the first kind. As a subset of the unit square, the interval-valued set is also studied. Two groups are found: one group consists of all the isomorphisms on L(I), and the other group contains all the isomorphisms and all the strict negations on L(I), which keep the diagonal. Moreover, the former is a normal subgroup of the latter. And all the involutive negations on the interval-valued set form a generator set of the latter group.
Project description:BACKGROUND:Standardization in clinical documentation can increase efficiency and can save time and resources. OBJECTIVE:The objectives of this work are to compare documentation forms for acute coronary syndrome (ACS), check for standardization, and generate a list of the most common data elements using semantic form annotation with the Unified Medical Language System (UMLS). METHODS:Forms from registries, studies, risk scores, quality assurance, official guidelines, and routine documentation from four hospitals in Germany were semantically annotated using UMLS. This allowed for automatic comparison of concept frequencies and the generation of a list of the most common concepts. RESULTS:A total of 3710 form items from 86 sources were semantically annotated using 842 unique UMLS concepts. Half of all medical concept occurrences were covered by 60 unique concepts, which suggests the existence of a core dataset of relevant concepts. Overlap percentages between forms were relatively low, hinting at inconsistent documentation structures and lack of standardization. CONCLUSIONS:This analysis shows a lack of standardized and semantically enriched documentation for patients with ACS. Efforts made by official institutions like the European Society of Cardiology have not yet been fully implemented. Utilizing a standardized and annotated core dataset of the most important data concepts could make export and automatic reuse of data easier. The generated list of common data elements is an exemplary implementation suggestion of the concepts to use in a standardized approach.
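The "half of all occurrences covered by 60 concepts" finding corresponds to a simple frequency computation, sketched here in Python: count how often each concept was used as an annotation, then take the most frequent concepts until the requested share of occurrences is reached. The function name `core_concepts` and the sample labels are hypothetical, and the abstract does not say how the authors computed their figure.

```python
from collections import Counter


def core_concepts(annotations, coverage=0.5):
    """Smallest set of most-frequent concepts covering `coverage` of all occurrences."""
    counts = Counter(annotations)
    total = sum(counts.values())
    core, covered = [], 0
    for concept, occurrences in counts.most_common():
        if covered >= coverage * total:
            break  # the target share is already reached
        core.append(concept)
        covered += occurrences
    return core


if __name__ == "__main__":
    # Hypothetical annotation stream: one concept label per annotated form item.
    sample = ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]
    print(core_concepts(sample))        # concepts covering half the occurrences
    print(core_concepts(sample, 0.8))   # concepts covering 80 percent
```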