
Dataset Information


BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights.


ABSTRACT:

Objective

In this study, we investigate the potential of large language models (LLMs) to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains.

Materials and methods

Drawing on the wealth of the Unified Medical Language System knowledge graph and harnessing cutting-edge LLMs, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of 3 steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase.
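The abstract only names the three training phases, so the following is a minimal, illustrative Python/PyTorch sketch of what such a pipeline can look like: an InfoNCE-style contrastive loss over concept/description pairs, a cosine-based self-distillation loss against a frozen teacher, and uniform weight averaging of checkpoints. All function names, the temperature value, and the toy data are assumptions for illustration, not the authors' actual BioLORD-2023 implementation.

```python
# Illustrative sketch of the three phases described in the abstract
# (contrastive learning, self-distillation, weight averaging).
# Placeholder code; not the authors' implementation.
import torch
import torch.nn.functional as F


def contrastive_loss(concept_emb, description_emb, temperature=0.05):
    """InfoNCE-style loss: each concept embedding should match its own
    description and be pushed away from the other descriptions in the batch."""
    concept_emb = F.normalize(concept_emb, dim=-1)
    description_emb = F.normalize(description_emb, dim=-1)
    logits = concept_emb @ description_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def self_distillation_loss(student_emb, teacher_emb):
    """Self-distillation: the student reproduces the frozen teacher's
    embeddings, here with a simple cosine objective."""
    return 1.0 - F.cosine_similarity(student_emb, teacher_emb.detach(), dim=-1).mean()


def average_weights(state_dicts):
    """Weight averaging: uniform average of several checkpoints' parameters."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }


if __name__ == "__main__":
    # Toy usage with random vectors standing in for encoder outputs.
    a, b = torch.randn(8, 768), torch.randn(8, 768)
    print("contrastive:", contrastive_loss(a, b).item())
    print("distillation:", self_distillation_loss(a, b).item())
```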

Results

Through rigorous evaluations of diverse downstream tasks, we demonstrate consistent and substantial improvements over the previous state of the art for semantic textual similarity (STS), biomedical concept representation (BCR), and clinical named entity linking, across 15+ datasets. Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and fine-tuned on 7 European languages.

Discussion

Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 become a valuable tool for future biomedical applications.

Conclusion

In this article, we introduced BioLORD-2023, a state-of-the-art model for STS and BCR designed for the clinical domain.

SUBMITTER: Remy F 

PROVIDER: S-EPMC11339519 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature


Publications

BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights.

François Remy, Kris Demuynck, Thomas Demeester

Journal of the American Medical Informatics Association (JAMIA), 2024-09-01, issue 9



Similar Datasets

| S-EPMC10238089 | biostudies-literature
| S-EPMC10280465 | biostudies-literature
| S-EPMC11380662 | biostudies-literature
| S-EPMC7889424 | biostudies-literature
| S-EPMC8450100 | biostudies-literature
| S-EPMC11186750 | biostudies-literature
| S-EPMC8513397 | biostudies-literature
| S-EPMC3639949 | biostudies-literature
| S-EPMC10396962 | biostudies-literature
| S-EPMC9627348 | biostudies-literature