
Dataset Information


Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research.


ABSTRACT: Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited by their unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: The NLP pipeline was developed using 2000 EGD reports and their associated pathology reports, retrieved from a single healthcare center. The pipeline extracted clinical information, including presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated on 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was then applied to 248,966 EGD reports from 2010 to 2019 to identify patient demographics and clinical information for the 10 gastric diseases. Results: For gastritis information extraction, the pipeline achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. For other gastric diseases, such as ulcers and neoplastic diseases, it achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. Analysis of more than 10 years of EGD data revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent of gastritis and the locations of other gastric diseases. Conclusions: We demonstrated the feasibility of using the NLP pipeline to automatically extract gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.
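The four validation metrics named in the abstract follow standard definitions, sketched below. This is not the authors' evaluation code: the `ConfusionCounts` class, the function names, and the counts in the demo are hypothetical and serve only to show how each metric is derived from true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) when pipeline output is compared with manual annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical per-finding counts from comparing NLP output to manual review."""
    tp: int  # finding present in the report and extracted by the pipeline
    fp: int  # finding extracted but not actually present
    fn: int  # finding present but missed by the pipeline
    tn: int  # finding neither present nor extracted


def sensitivity(c: ConfusionCounts) -> float:
    """Recall: share of true findings that the pipeline extracted."""
    return c.tp / (c.tp + c.fn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value (precision): share of extractions that are correct."""
    return c.tp / (c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions (positive and negative) that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(sens: float, prec: float) -> float:
    """Harmonic mean of sensitivity and PPV."""
    return 2 * sens * prec / (sens + prec)


if __name__ == "__main__":
    # Hypothetical counts; negatives dominate, as when most reports lack a given disease.
    counts = ConfusionCounts(tp=90, fp=10, fn=10, tn=890)
    s, p = sensitivity(counts), ppv(counts)
    print(round(s, 3), round(p, 3), round(accuracy(counts), 3), round(f1_score(s, p), 3))
    # Harmonic mean of the reported sensitivity and PPV for the non-gastritis diseases.
    print(round(f1_score(0.975, 0.982), 3))
```

In this toy example accuracy (0.98) exceeds sensitivity and PPV (both 0.9) because true negatives dominate, which is one plausible reason accuracy sits above the other metrics in the reported results. The harmonic mean of the reported sensitivity 0.975 and PPV 0.982 rounds to 0.978, the F1 score reported for the non-gastritis diseases.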

SUBMITTER: Song G 

PROVIDER: S-EPMC9181010 | biostudies-literature | 2022 May

REPOSITORIES: biostudies-literature


Publications

Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research.

Song Gyuseon, Chung Su Jin, Seo Ji Yeon, Yang Sun Young, Jin Eun Hyo, Chung Goh Eun, Shim Sung Ryul, Sa Soonok, Hong Moongi Simon, Kim Kang Hyun, Jang Eunchan, Lee Chae Won, Bae Jung Ho, Han Hyun Wook

Journal of Clinical Medicine, 2022 May 24, vol. 11



Similar Datasets

S-EPMC4849652 | biostudies-literature
S-EPMC8028406 | biostudies-literature
S-EPMC6672807 | biostudies-literature
S-EPMC9101856 | biostudies-literature
S-EPMC9683031 | biostudies-literature
S-EPMC10083066 | biostudies-literature
S-EPMC7797509 | biostudies-literature
S-EPMC3879259 | biostudies-literature
S-EPMC10898068 | biostudies-literature
S-EPMC10791738 | biostudies-literature