ABSTRACT: Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered 3 times into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude). Then, 4 medical educators categorized the MCQs by medical area, item type, and whether the MCQ required Peru-specific knowledge. They also assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing 82.2%, followed by Bard and Claude; the historical performance of Peruvian examinees was 55%. Among the factors examined for association with correct answers, only MCQs requiring Peru-specific knowledge had lower odds of being answered correctly (odds ratio, 0.23; 95% confidence interval, 0.09-0.61); the remaining factors showed no associations. In the assessment of the educational value of their justifications, GPT-4 and Bing did not differ significantly in certainty, usefulness, or potential use in the classroom.
Conclusion
Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.
SUBMITTER: Torres-Zegarra BC
PROVIDER: S-EPMC11009012 | biostudies-literature | 2023
REPOSITORIES: biostudies-literature
Torres-Zegarra, Betzy Clariza; Rios-Garcia, Wagner; Ñaña-Cordova, Alvaro Micael; Arteaga-Cisneros, Karen Fatima; Chalco, Xiomara Cristina Benavente; Ordoñez, Marina Atena Bustamante; Rios, Carlos Jesus Gutierrez; Godoy, Carlos Alberto Ramos; Quezada, Kristell Luisa Teresa Panta; Gutierrez-Arratia, Jesus Daniel; Flores-Cohaila, Javier Alejandro
Journal of Educational Evaluation for Health Professions, 2023-11-20