Dataset Information

Medical triage as an AI ethics benchmark.

ABSTRACT: We present the TRIAGE benchmark, a novel machine ethics benchmark designed to evaluate the ethical decision-making abilities of large language models (LLMs) in mass casualty scenarios. TRIAGE uses medical dilemmas created by healthcare professionals to evaluate the ethical decision-making of AI systems in real-world, high-stakes scenarios. We evaluated six major LLMs on TRIAGE, examining how different ethical and adversarial prompts influence model behavior. Our results show that most models consistently outperformed random guessing, with open-source models making more serious ethical errors than proprietary models. Providing guiding ethical principles to LLMs degraded performance on TRIAGE, which stands in contrast to results from other machine ethics benchmarks, where explicating ethical principles improved results. Adversarial prompts significantly decreased accuracy. By demonstrating the influence of context and ethical framing on the performance of LLMs, we provide critical insights into the current capabilities and limitations of AI in high-stakes ethical decision-making in medicine.

SUBMITTER: Kirch NM

PROVIDER: S-EPMC12373810 | biostudies-literature | 2025 Aug

REPOSITORIES: biostudies-literature

Publications

Medical triage as an AI ethics benchmark.

Kirch NM, Hebenstreit K, Samwald M

Scientific Reports, 2025-08-22, 1

PMID: 40846886

Similar Datasets

Medical practitioner perspectives on AI in emergency triage.
| S-EPMC10731272 | biostudies-literature

Exploring ethical considerations in medical research: Harnessing pre-generated transformers for AI-powered ethics discussions.
| S-EPMC11790142 | biostudies-literature

Hartwig Medical Foundation Benchmark Set
| PRJEB33197 | ENA

Responsible AI measures dataset for ethics evaluation of AI systems.
| S-EPMC12722731 | biostudies-literature

Mapping the flow of knowledge as guidance for ethics implementation in medical AI: A qualitative study.
| S-EPMC10621848 | biostudies-literature

Co-creating Consent for Data Use - AI-Powered Ethics for Biomedical AI.
| S-EPMC12412891 | biostudies-literature

Mapping and Summarizing the Research on AI Systems for Automating Medical History Taking and Triage: Scoping Review.
| S-EPMC11843066 | biostudies-literature

Integrating ethics in AI development: a qualitative study.
| S-EPMC10804710 | biostudies-literature

Medical ethics of long-duration spaceflight.
| S-EPMC10684496 | biostudies-literature

Laboratory animal ethics education improves medical students' awareness of laboratory animal ethics.
| S-EPMC11218205 | biostudies-literature