Dataset Information

A large-scale dataset of patient summaries for retrieval-based clinical decision support systems.


ABSTRACT: Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflow by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely obstructed by the lack of diverse patient collections and publicly available large-scale patient-level annotation datasets. In this paper, we collect a novel dataset of patient summaries and relations called PMC-Patients to benchmark two ReCDS tasks: Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR). Specifically, we extract patient summaries from PubMed Central articles using simple heuristics and use the PubMed citation graph to define patient-article relevance and patient-patient similarity. PMC-Patients contains 167k patient summaries with 3.1M patient-article relevance annotations and 293k patient-patient similarity annotations, making it the largest-scale resource for ReCDS and one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. We also implement and evaluate several ReCDS systems on the PMC-Patients benchmarks to show the challenges they pose, and conduct several case studies to show the clinical utility of PMC-Patients.
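
The abstract says the annotations are derived from the PubMed citation graph but does not state the exact rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an article is relevant to a patient if it is the patient's source article or is linked to it by a citation in either direction, and that two patients are similar if they come from the same article or from citation-linked articles. The function name `build_annotations`, its arguments, and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def build_annotations(patient_source, citations):
    """Derive illustrative relevance and similarity sets from a citation graph.

    patient_source: dict mapping patient id -> id of the article it was extracted from
    citations: iterable of (citing_article, cited_article) pairs
    Assumption (not confirmed by this record): citation links count in both directions.
    """
    # Undirected view of the citation graph.
    neighbors = defaultdict(set)
    for citing, cited in citations:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    # Index patients by their source article.
    patients_by_article = defaultdict(set)
    for patient, article in patient_source.items():
        patients_by_article[article].add(patient)

    relevance = {}
    similarity = {}
    for patient, article in patient_source.items():
        # Relevant articles: the source article plus its citation neighbors.
        relevant = {article} | neighbors.get(article, set())
        relevance[patient] = relevant

        # Similar patients: every other patient extracted from a relevant article.
        similar = set()
        for other_article in relevant:
            similar |= patients_by_article.get(other_article, set())
        similar.discard(patient)
        similarity[patient] = similar
    return relevance, similarity


if __name__ == "__main__":
    # Toy data: article A cites article B; article C is unconnected.
    sources = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}
    rel, sim = build_annotations(sources, [("A", "B")])
    print(rel["p1"], sim["p1"], sim["p4"])
```

In this toy graph, patient `p1` is linked to articles A and B and to patients `p2` and `p3`, while `p4` has no similar patients because article C has no citation links.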

SUBMITTER: Zhao Z 

PROVIDER: S-EPMC10728216 | biostudies-literature | 2023 Dec

REPOSITORIES: biostudies-literature

Publications

A large-scale dataset of patient summaries for retrieval-based clinical decision support systems.

Zhao Zhengyun, Jin Qiao, Chen Fangyuan, Peng Tuorui, Yu Sheng

Scientific Data, 2023-12-18, 1


Similar Datasets

S-EPMC9887724 | biostudies-literature
S-EPMC10654852 | biostudies-literature
S-EPMC9377614 | biostudies-literature
S-EPMC6444220 | biostudies-literature
S-EPMC8908582 | biostudies-literature
S-EPMC11222760 | biostudies-literature
S-EPMC11655968 | biostudies-literature
S-EPMC7072392 | biostudies-literature
S-EPMC10501571 | biostudies-literature
S-EPMC10746304 | biostudies-literature