Dataset Information

Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam.

ABSTRACT:

Objective

To develop deep learning models to recognize ophthalmic examination components from clinical notes in electronic health records (EHR) using a weak supervision approach.

Methods

A corpus of 39,099 ophthalmology notes, weakly labeled for 24 examination entities, was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned for this named entity recognition task and compared with a baseline regular-expression model. Models were evaluated on the weakly labeled test set, on a human-labeled sample of that set, and on an independent human-labeled dataset.
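
For named entity recognition, both weak labels and a regular-expression baseline ultimately assign entity types to spans of note text, which are then converted to token-level tags for model training. The Python sketch below illustrates that idea under stated assumptions: the two entity labels (`IOP`, `VA`), their patterns, the tokenizer, and the sample note are hypothetical illustrations. The abstract does not list the 24 entities or describe the authors' actual weak-labeling rules or baseline, so this is not their method.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for two illustrative exam entities. The study's
# 24 entity types and its actual labeling rules are not given here.
PATTERNS = {
    # intraocular pressure value following an "IOP" or "Tonometry" field name
    "IOP": re.compile(r"\b(?:IOP|Tonometry)\s*:?\s*(\d{1,2})\b", re.IGNORECASE),
    # Snellen visual acuity such as 20/40
    "VA": re.compile(r"\b(20/\d{2,3})\b"),
}

# Split words (keeping fractions like 20/40 whole) and single punctuation marks.
TOKEN_RE = re.compile(r"[\w/]+|[^\w\s]")

# Hypothetical example note, not study data.
EXAMPLE_NOTE = "VA cc 20/40 OD, 20/25 OS. IOP: 15 mmHg"


def find_spans(text: str) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, label) character spans of matched values."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Label only the captured value, not the field name.
            spans.append((m.start(1), m.end(1), label))
    return sorted(spans)


def bio_tags(text: str) -> Tuple[List[str], List[str]]:
    """Tokenize text and assign a BIO tag to each token from the regex spans."""
    spans = find_spans(text)
    tokens, tags = [], []
    for tok in TOKEN_RE.finditer(text):
        tag = "O"
        for start, end, label in spans:
            if tok.start() < end and tok.end() > start:  # token overlaps span
                prefix = "B-" if tok.start() <= start else "I-"
                tag = prefix + label
                break
        tokens.append(tok.group())
        tags.append(tag)
    return tokens, tags


if __name__ == "__main__":
    for token, tag in zip(*bio_tags(EXAMPLE_NOTE)):
        print(f"{token}\t{tag}")
```

For the sample note, `20/40` and `20/25` are tagged `B-VA`, `15` is tagged `B-IOP`, and every other token is `O`. Rules like these are easy to write but match any text with the right shape, which is one way a regex baseline can gain recall at the cost of precision.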

Results

On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision ranging from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) but precision that varied widely across entities (0.081-0.999). The BERT-based models had recall ranging from 0.771 to 0.831 and precision ≥ 0.973. On the independent dataset, BlueBert had precision 0.926 and recall 0.458. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).
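
The results above are reported as precision, recall, and 95% confidence intervals. The abstract does not say how predicted spans were matched to reference labels or how the intervals were computed, so the Python sketch below makes two assumptions: entity-level exact matching of (start, end, label) spans, micro-averaged over all entities, and a percentile bootstrap that resamples notes with replacement. The toy spans in `EXAMPLE_DOCS` are invented for illustration and are not study data.

```python
import random
import statistics
from typing import List, Set, Tuple

Span = Tuple[int, int, str]        # (start, end, entity label)
Doc = Tuple[Set[Span], Set[Span]]  # (gold spans, predicted spans) for one note

# Hypothetical toy data, not study data.
EXAMPLE_DOCS: List[Doc] = [
    ({(0, 5, "VA"), (10, 12, "IOP")}, {(0, 5, "VA")}),
    ({(3, 8, "VA")}, {(3, 8, "VA"), (20, 22, "IOP")}),
    ({(1, 4, "IOP")}, {(1, 4, "IOP")}),
]


def precision_recall(docs: List[Doc]) -> Tuple[float, float]:
    """Micro-averaged entity-level precision and recall with exact span matching."""
    tp = fp = fn = 0
    for gold, pred in docs:
        hits = len(gold & pred)
        tp += hits
        fp += len(pred) - hits
        fn += len(gold) - hits
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_ci(docs: List[Doc], n_boot: int = 1000, seed: int = 0):
    """Percentile bootstrap 95% CIs for precision and recall, resampling notes."""
    rng = random.Random(seed)
    precisions, recalls = [], []
    for _ in range(n_boot):
        sample = [rng.choice(docs) for _ in docs]  # resample notes with replacement
        p, r = precision_recall(sample)
        precisions.append(p)
        recalls.append(r)
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
    p_q = statistics.quantiles(precisions, n=40)
    r_q = statistics.quantiles(recalls, n=40)
    return (p_q[0], p_q[-1]), (r_q[0], r_q[-1])


if __name__ == "__main__":
    print(precision_recall(EXAMPLE_DOCS))
    print(bootstrap_ci(EXAMPLE_DOCS))
```

On the toy data, 3 of 4 predicted spans are correct and 3 of 4 gold spans are found, so precision and recall are both 0.75. Resampling whole notes rather than individual entities keeps entities from the same note together, a common choice when errors within a note are correlated.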

Conclusion

We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.

SUBMITTER: Wang SY

PROVIDER: S-EPMC9901505 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature

Publications

Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam.

Sophia Y. Wang, Justin Huang, Hannah Hwang, Wendeng Hu, Shiqi Tao, Tina Hernandez-Boussard

International Journal of Medical Informatics, 2022-09-16

PMID: 36179600

Similar Datasets

Leveraging network analysis to evaluate biomedical named entity recognition tools.
| S-EPMC8242017 | biostudies-literature

Ontology-driven weak supervision for clinical entity classification in electronic health records.
| S-EPMC8016863 | biostudies-literature

Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation.
| S-EPMC9828146 | biostudies-literature

CheNER: chemical named entity recognizer.
| S-EPMC3967102 | biostudies-literature

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing.
| S-EPMC11893743 | biostudies-literature

Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation.
| S-EPMC8009838 | biostudies-literature

DroNER: Dataset for drone named entity recognition.
| S-EPMC10293979 | biostudies-literature

Towards reliable named entity recognition in the biomedical domain.
| S-EPMC6956779 | biostudies-literature

Improving dictionary-based named entity recognition with deep learning.
| S-EPMC11373323 | biostudies-literature

Named entity recognition for bacterial Type IV secretion systems.
| S-EPMC3066171 | biostudies-literature