A deep learning approach to identify missing is-a relations in SNOMED CT.


ABSTRACT:

Objective

SNOMED CT is the largest clinical terminology worldwide. Quality assurance of SNOMED CT is of utmost importance to ensure that it provides accurate domain knowledge to various SNOMED CT-based applications. In this work, we introduce a deep learning-based approach to uncover missing is-a relations in SNOMED CT.

Materials and methods

Our focus is to identify missing is-a relations between concept-pairs exhibiting a containment pattern (ie, the set of words of one concept is a proper subset of that of the other concept). We use hierarchically related containment concept-pairs as positive instances and hierarchically unrelated containment concept-pairs as negative instances to train a model that predicts whether an is-a relation exists between 2 concepts exhibiting a containment pattern. The model is a binary classifier leveraging concept name features, hierarchical features, enriched lexical attribute features, and logical definition features. We introduce a cross-validation-inspired approach to identify missing is-a relations among all hierarchically unrelated containment concept-pairs.
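The containment pattern and the positive/negative split described above can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and a toy ancestor lookup; the concept names, the `ancestors` mapping, and the function names are hypothetical illustrations, not taken from SNOMED CT or from the authors' implementation.

```python
from itertools import combinations


def word_set(name: str) -> frozenset:
    """Lowercase a concept name and split it into a set of words (assumed tokenization)."""
    return frozenset(name.lower().split())


def is_containment_pair(name_a: str, name_b: str) -> bool:
    """True if the words of one name form a proper subset of the words of the other."""
    a, b = word_set(name_a), word_set(name_b)
    return a < b or b < a


def split_instances(names, ancestors):
    """Split containment pairs into hierarchically related and unrelated pairs.

    `ancestors` maps a concept name to the set of its is-a ancestors
    (a hypothetical stand-in for the terminology's hierarchy).
    """
    positives, negatives = [], []
    for x, y in combinations(names, 2):
        if not is_containment_pair(x, y):
            continue  # only containment pairs are considered
        related = y in ancestors.get(x, set()) or x in ancestors.get(y, set())
        (positives if related else negatives).append((x, y))
    return positives, negatives


# Toy data for illustration only.
names = [
    "fracture of femur",
    "open fracture of femur",
    "fracture of neck of femur",
]
ancestors = {"open fracture of femur": {"fracture of femur"}}

positives, negatives = split_instances(names, ancestors)
print("positive (related):", positives)
print("negative (unrelated):", negatives)
```

In this toy example the unrelated containment pair is the kind of candidate a trained classifier would later score as a possible missing is-a relation.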

Results

We trained and applied our model on the Clinical finding subhierarchy of SNOMED CT (September 2019 US edition). Our model (based on the validation sets) achieved a precision of 0.8164, recall of 0.8397, and F1 score of 0.8279. Applying the model to predict actual missing is-a relations, we obtained a total of 1661 potential candidates. Domain experts evaluated 230 randomly selected samples and verified that 192 (83.48%) were valid.
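As a quick consistency check on the reported figures, the short sketch below recomputes the F1 score from the stated precision and recall using the standard harmonic-mean formula, and the validity rate from the expert review counts. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the Results section.
f1 = f1_score(0.8164, 0.8397)
valid_rate = 192 / 230  # expert-verified samples / reviewed samples

print(f"F1 = {f1:.4f}")                    # F1 = 0.8279
print(f"valid rate = {valid_rate:.2%}")    # valid rate = 83.48%
```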

Conclusions

The results showed that our deep learning approach is effective in uncovering missing is-a relations between containment concept-pairs in SNOMED CT.

SUBMITTER: Abeysinghe R 

PROVIDER: S-EPMC9933066 | biostudies-literature | 2023 Feb

REPOSITORIES: biostudies-literature

Publications

A deep learning approach to identify missing is-a relations in SNOMED CT.

Abeysinghe R, Zheng F, Bernstam EV, Shi J, Bodenreider O, Cui L

Journal of the American Medical Informatics Association (JAMIA), 2023 Feb 1, issue 3

Similar Datasets

| S-EPMC6080685 | biostudies-literature
| S-EPMC3000783 | biostudies-literature
| S-EPMC10426782 | biostudies-literature
| S-EPMC7413283 | biostudies-literature
| S-EPMC10585817 | biostudies-literature
| S-EPMC6318012 | biostudies-literature
| S-EPMC7233442 | biostudies-literature
2022-07-27 | GSE209804 | GEO
| S-EPMC7921635 | biostudies-literature
2020-08-12 | GSE149225 | GEO