Dataset Information

Probing artificial intelligence in neurosurgical training: ChatGPT takes a neurosurgical residents written exam.

ABSTRACT:

Introduction

Artificial Intelligence tools are being introduced in almost every field of human life, including medical sciences and medical education, amid both scepticism and enthusiasm.

Research question

To assess how a generative language tool (Generative Pretrained Transformer 3.5, ChatGPT) performs at both generating questions for and answering a neurosurgical residents' written exam. Namely, to assess how ChatGPT generates questions, how it answers human-generated questions, how residents answer AI-generated questions, and how the AI answers its self-generated questions.

Materials and methods

Fifty questions were included in the written exam: 46 were generated by humans (senior staff members) and 4 by ChatGPT. Eleven participants took the exam (ChatGPT and 10 residents). Questions were both open-ended and multiple-choice. Eight questions were not submitted to ChatGPT because they contained images or schematic drawings to interpret.

Results

Formulating requests to ChatGPT required an iterative process to make both questions and answers precise. ChatGPT ranked among the lowest of all participants (9th of 11). Residents' response rates did not differ between human-generated and AI-generated questions, so no difference attributable to lower clarity of the AI-generated questions was observed. ChatGPT correctly answered all of its self-generated questions.

Discussion and conclusions

AI is a promising and powerful tool for medical education and for specific medical purposes, which need to be further determined. For AI to generate logical and sound questions, the request must be formulated as precisely as possible, framing the content, the type of question and its correct answers.

SUBMITTER: Bartoli A

PROVIDER: S-EPMC10753430 | biostudies-literature | 2024

REPOSITORIES: biostudies-literature


Publications

Probing artificial intelligence in neurosurgical training: ChatGPT takes a neurosurgical residents written exam.

Bartoli A, May AT, Al-Awadhi A, Schaller K

Brain & Spine, 2023 Nov 29


PMID: 38163001

Similar Datasets

Project description: OpenAI's Chat Generative Pre-trained Transformer (ChatGPT) technology enables conversational interactions with applications across various fields, including sport. Here, ChatGPT's proficiency in designing a 12-week resistance training programme from specific prompts was investigated. GPT3.5 and GPT4.0 were asked to design 12-week resistance training programmes for hypothetical male and female subjects (20 years old, no injury, and 'intermediate' resistance training experience). Subsequently, GPT4.0 was asked to design an 'advanced' training programme for the same profiles. The proposed training programmes were compared with established guidelines and literature (e.g., the National Strength and Conditioning Association textbook) and discussed. ChatGPT suggested 12-week training programmes comprising three 4-week phases, each with different objectives (e.g., hypertrophy/strength). GPT3.5 proposed a weekly frequency of ~3 sessions, a load intensity of 70-85% of one-repetition maximum, a repetition range of 4-8 (2-4 sets), and a tempo of 2/0/2 (eccentric/pause/concentric/'pause'). For the intermediate and advanced programmes, GPT4.0 proposed, respectively, a frequency of 5 or 4 sessions, 60-90% or 70-95% intensity, 3-5 or 3-6 sets, and 5-12 or 3-12 repetitions. GPT3.5 proposed rest intervals of 90-120 s and an exercise tempo of 2/0/2. GPT4.0 proposed rest intervals of 60-180 s (intermediate) or 60-300 s (advanced), with an exercise tempo of 2/1/2 for intermediates, and 3/0/1/0, 2/0/1/0, and 1/0/1/0 for advanced programmes. All derived programmes were objectively similar regardless of sex. ChatGPT generated training programmes that likely require additional fine-tuning before application. GPT4.0 synthesised more information than GPT3.5 in response to the prompt and demonstrated awareness of training experience (intermediate vs advanced). ChatGPT may serve as a complementary tool for writing 'draft' programmes, but likely requires human expertise to maximise training programme effectiveness.

Project description:

Background

Large language models, exemplified by ChatGPT, have reached a level of sophistication that makes distinguishing between human- and artificial intelligence (AI)-generated texts increasingly challenging. This has raised concerns in academia, particularly in medicine, where the accuracy and authenticity of written work are paramount.

Objective

This semirandomized controlled study aims to examine the ability of 2 blinded expert groups with different levels of content familiarity (medical professionals and humanities scholars with expertise in textual analysis) to distinguish between longer scientific texts in German written by medical students and those generated by ChatGPT. Additionally, the study sought to analyze the reasoning behind their identification choices, particularly the role of content familiarity and linguistic features.

Methods

Between May and August 2023, a total of 35 experts (medical: n=22; humanities: n=13) were each presented with 2 pairs of texts on different medical topics. Each pair had similar content and structure: 1 text was written by a medical student, and the other was generated by ChatGPT (version 3.5, March 2023). Experts were asked to identify the AI-generated text and justify their choice. These justifications were analyzed through a multistage, interdisciplinary qualitative analysis to identify relevant textual features. Before unblinding, experts rated each text on 6 characteristics: linguistic fluency and spelling/grammatical accuracy, scientific quality, logical coherence, expression of knowledge limitations, formulation of future research questions, and citation quality. Univariate tests and multivariate logistic regression analyses were used to examine associations between participants' characteristics, their stated reasons for author identification, and the likelihood of correctly determining a text's authorship.

Results

Overall, in 48 out of 69 (70%) decision rounds, participants accurately identified the AI-generated texts, with minimal difference between groups (medical: 31/43, 72%; humanities: 17/26, 65%; odds ratio [OR] 1.37, 95% CI 0.5-3.9). While content errors had little impact on identification accuracy, stylistic features, particularly redundancy (OR 6.90, 95% CI 1.01-47.1), repetition (OR 8.05, 95% CI 1.25-51.7), and thread/coherence (OR 6.62, 95% CI 1.25-35.2), played a crucial role in participants' decisions to identify a text as AI-generated.

Conclusions

The findings suggest that both medical and humanities experts were able to identify ChatGPT-generated texts in medical contexts, with their decisions largely based on linguistic attributes. The accuracy of identification appears to be independent of experts' familiarity with the text content. As the decision-making process primarily relies on linguistic attributes, such as stylistic features and text coherence, further quasi-experimental studies using texts from other academic disciplines should be conducted to determine whether instructions based on these features can enhance lecturers' ability to distinguish between student-authored and AI-generated work.

