Dataset Information

Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.


ABSTRACT:

Background

The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.

Objective

This study compared the performance of GPT-3.5 and GPT-4 (Generative Pre-trained Transformer, versions 3.5 and 4) on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.

Methods

This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE, administered in 2023. A total of 254 questions were included in the final analysis; these were categorized into 3 types: general, clinical, and clinical sentence questions.
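
A comparison like the one described above can also be scripted. The sketch below is illustrative only, not the authors' setup: the study used the ChatGPT web interface, whereas this code queries the OpenAI Chat Completions API, and the model names, prompt template, and sample question are all assumptions.

```python
# A minimal sketch, not the authors' setup: the study used the ChatGPT
# web interface. This version queries the OpenAI Chat Completions API;
# the model names, prompt template, and sample question are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(model: str, question: str, choices: list[str]) -> str:
    """Submit one multiple-choice question and return the model's raw answer."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("abcde", choices)
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Answer with the letter of the single best option.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


# Hypothetical example question, posed to both models for comparison.
question = "Which vitamin deficiency causes scurvy?"
choices = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D", "Vitamin K"]
for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, "->", ask(model, question, choices))
```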

Results

The results indicated that GPT-4 outperformed GPT-3.5 in accuracy across all 3 question types: general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and on questions about specific diseases. Furthermore, GPT-4 met the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.

Conclusions

GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.

SUBMITTER: Takagi S 

PROVIDER: S-EPMC10365615 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

Publications

Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

Takagi Soshi, Watari Takashi, Erabi Ayano, Sakaguchi Kota

JMIR Medical Education, 2023 Jun 29


Similar Datasets

| S-EPMC10884900 | biostudies-literature
| S-EPMC10665355 | biostudies-literature
| S-EPMC11394718 | biostudies-literature
| S-EPMC11009855 | biostudies-literature
| S-EPMC10570896 | biostudies-literature
| S-EPMC11406751 | biostudies-literature
| S-EPMC11893186 | biostudies-literature
| S-EPMC10723673 | biostudies-literature
| S-EPMC10805303 | biostudies-literature
| S-EPMC4397843 | biostudies-literature