
Dataset Information


Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam.


ABSTRACT:

Objective

To develop deep learning models to recognize ophthalmic examination components from clinical notes in electronic health records (EHR) using a weak supervision approach.

Methods

A corpus of 39,099 ophthalmology notes, weakly labeled for 24 examination entities, was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned for this named entity recognition task and compared with a baseline regular expression model. Models were evaluated on the weakly labeled test dataset, a human-labeled sample of that set, and an independent human-labeled dataset.
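The abstract does not describe how the weak labels or the regular expression baseline were built. As a minimal sketch, assuming a rule-based tagger over raw note text, the snippet below turns regex matches into character spans and then into token-level BIO tags, the usual input format for fine-tuning a transformer on NER. The entity names (`CORNEA`, `ANTERIOR_CHAMBER`, `IOP`), the patterns, and the helpers `regex_spans` and `to_bio` are hypothetical illustrations, not the authors' rules.

```python
import re
from typing import List, Tuple

# Hypothetical labels and patterns for illustration only; the paper's
# 24 entities and its actual rules are not listed in this record.
PATTERNS = {
    "CORNEA": re.compile(r"\bcornea\b", re.IGNORECASE),
    "ANTERIOR_CHAMBER": re.compile(r"\banterior\s+chamber\b", re.IGNORECASE),
    "IOP": re.compile(r"\bIOP\b"),
}

Span = Tuple[int, int, str]  # (start char, end char, label)


def regex_spans(text: str) -> List[Span]:
    """Return every pattern match as a character span, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Split text into word/punctuation tokens and tag each one B-/I-/O."""
    tagged = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if m.start() < end and m.end() > start:  # token overlaps span
                prefix = "B-" if m.start() <= start else "I-"
                tag = prefix + label
                break
        tagged.append((m.group(), tag))
    return tagged


note = "Cornea: clear Anterior Chamber: deep and quiet IOP: 15"
for token, tag in to_bio(note, regex_spans(note)):
    print(token, tag)
```

A real pipeline would align these tags to the subword tokens produced by the chosen model's tokenizer (WordPiece for the BERT variants) rather than to this simple word/punctuation split.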

Results

On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision ranging from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) but precision that varied widely across entities (0.081-0.999). The BERT-based models had recall ranging from 0.771 to 0.831 and precision ≥ 0.973. On the independent dataset, BlueBert had a precision of 0.926 and a recall of 0.458. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).
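The abstract reports precision and recall with 95% confidence intervals but does not say how the intervals were computed. A minimal sketch, assuming exact-match entity scoring, micro-averaged counts pooled across notes, and a percentile bootstrap that resamples whole notes (all three are assumptions, not the paper's stated method); `precision_recall` and `bootstrap_ci` are hypothetical helper names:

```python
import random
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]            # (start char, end char, label)
Note = Tuple[Set[Entity], Set[Entity]]   # (gold entities, predicted entities)


def precision_recall(notes: List[Note]) -> Tuple[float, float]:
    """Micro-averaged exact-match precision and recall over all notes."""
    tp = fp = fn = 0
    for gold, pred in notes:
        tp += len(gold & pred)   # predicted spans that exactly match gold
        fp += len(pred - gold)   # predicted spans with no exact gold match
        fn += len(gold - pred)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_ci(notes: List[Note], n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0):
    """Percentile bootstrap CIs, resampling whole notes with replacement."""
    rng = random.Random(seed)
    precisions, recalls = [], []
    for _ in range(n_boot):
        sample = [rng.choice(notes) for _ in notes]
        p, r = precision_recall(sample)
        precisions.append(p)
        recalls.append(r)
    precisions.sort()
    recalls.sort()
    lo = round(alpha / 2 * (n_boot - 1))
    hi = round((1 - alpha / 2) * (n_boot - 1))
    return (precisions[lo], precisions[hi]), (recalls[lo], recalls[hi])


# Toy example: note 1 misses one gold entity, note 2 adds one spurious entity.
notes = [
    ({(0, 6, "CORNEA"), (47, 50, "IOP")}, {(0, 6, "CORNEA")}),
    ({(0, 4, "LENS")}, {(0, 4, "LENS"), (10, 13, "IOP")}),
]
print(precision_recall(notes))
print(bootstrap_ci(notes))
```

Resampling whole notes keeps correlated entities from the same note together. Per-entity figures, such as the baseline's precision range across entities, could be obtained by filtering each gold and predicted set to a single label before scoring.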

Conclusion

We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.

SUBMITTER: Wang SY 

PROVIDER: S-EPMC9901505 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature


Publications

Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam.

Wang SY, Huang J, Hwang H, Hu W, Tao S, Hernandez-Boussard T

International Journal of Medical Informatics, 2022 Sep 16



Similar Datasets

| S-EPMC8242017 | biostudies-literature
| S-EPMC8016863 | biostudies-literature
| S-EPMC9828146 | biostudies-literature
| S-EPMC3967102 | biostudies-literature
| S-EPMC11893743 | biostudies-literature
| S-EPMC8009838 | biostudies-literature
| S-EPMC10293979 | biostudies-literature
| S-EPMC6956779 | biostudies-literature
| S-EPMC11373323 | biostudies-literature
| S-EPMC3066171 | biostudies-literature