Dataset Information

Replacing non-biomedical concepts improves embedding of biomedical concepts.


ABSTRACT:

Objectives

Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that replacing non-biomedical concept synonyms can improve the quality of biomedical concept embeddings.

Materials and methods

We developed an approach that leverages WordNet to replace each member of a synonym set with the most common representative of that set.
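The replacement step can be illustrated with a minimal sketch. It is not the wn2vec implementation: the synonym sets here are a hand-written stand-in for WordNet synsets, "most common" is assumed to mean highest frequency in the corpus being processed, and the function names `build_replacement_map` and `replace_synonyms` are hypothetical.

```python
from collections import Counter


def build_replacement_map(tokens, synonym_sets):
    """Map every word in each synonym set to that set's most frequent member.

    Assumption: frequency is counted in the given token list. Ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter(tokens)
    mapping = {}
    for synonyms in synonym_sets:
        # sorted() fixes the iteration order; max() keeps the first maximum.
        representative = max(sorted(synonyms), key=lambda word: counts[word])
        for word in synonyms:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Replace mapped tokens; anything else (e.g. concept IDs) is left as-is."""
    return [mapping.get(token, token) for token in tokens]


# Toy corpus: one biomedical concept ID plus non-biomedical synonyms.
tokens = ["big", "MeSH:D009203", "large", "big", "huge", "MeSH:D009203"]
# Hand-written stand-in for a WordNet synonym set (illustrative only).
synonym_sets = [{"big", "large", "huge"}]

mapping = build_replacement_map(tokens, synonym_sets)
replaced = replace_synonyms(tokens, mapping)
print(replaced)
```

In this toy corpus "big" occurs most often, so "large" and "huge" collapse onto it while the concept identifier passes through untouched. The effect is a smaller non-biomedical vocabulary around each biomedical concept.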

Results

We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings.
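The evaluation metric can be sketched as follows. The abstract does not state which distance was used, so this sketch assumes cosine distance averaged over all pairs of concepts in a set. The vectors are invented toy values, not results from the study, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_intra_cluster_distance(vectors):
    """Average pairwise distance among the embeddings of one concept set."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Invented 2-D embeddings of one concept set, before and after replacement.
before = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
after = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]

mean_before = mean_intra_cluster_distance(before)
mean_after = mean_intra_cluster_distance(after)

# Relative reduction; a positive value means the set became tighter.
relative_reduction = (mean_before - mean_after) / mean_before
print(f"before={mean_before:.3f} after={mean_after:.3f} "
      f"reduction={relative_reduction:.1%}")
```

A study-level figure such as the reported 8% would then correspond to averaging this per-set relative reduction over all concept sets, under the same assumptions.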

Discussion and conclusion

This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts using the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.

SUBMITTER: Niyonkuru E 

PROVIDER: S-EPMC11244985 | biostudies-literature | 2024 Jul

REPOSITORIES: biostudies-literature

Publications

Replacing non-biomedical concepts improves embedding of biomedical concepts.

Niyonkuru Enock E, Gomez Mauricio Soto MS, Casiraghi Elena E, Antogiovanni Stephan S, Blau Hannah H, Reese Justin T JT, Valentini Giorgio G, Robinson Peter N PN

bioRxiv: the preprint server for biology, 2024-07-04


Similar Datasets

| S-EPMC11498200 | biostudies-literature
| S-EPMC5890980 | biostudies-literature
| S-EPMC7703771 | biostudies-literature
| S-EPMC2572701 | biostudies-literature
| S-EPMC6565371 | biostudies-literature
| S-EPMC2736022 | biostudies-literature
| S-EPMC2387223 | biostudies-literature
| S-EPMC10154515 | biostudies-literature
| S-EPMC6460644 | biostudies-literature
| S-EPMC5860403 | biostudies-literature