Dataset Information

Criteria2Query: a natural language interface to clinical databases for cohort definition.


ABSTRACT:

OBJECTIVE: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.

MATERIALS AND METHODS: Criteria2Query uses a hybrid information extraction pipeline combining machine learning and rule-based methods to systematically parse eligibility criteria text, transforms it first into a structured criteria representation and next into sharable and executable clinical data queries represented as SQL queries conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria across different disease domains from ClinicalTrials.gov and 52 user-entered criteria. We evaluated F1 score and accuracy against 2 domain experts and calculated the average computation time for fully automated query formulation. We conducted an anonymous survey evaluating usability.

RESULTS: Criteria2Query achieved 0.795 and 0.805 F1 score for entity recognition and relation extraction, respectively. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514 and 0.793, respectively. Fully automatic query formulation took 1.22 seconds/criterion. More than 80% (11+ of 13) of users would use Criteria2Query in their future cohort definition tasks.

CONCLUSIONS: We contribute a novel natural language interface to clinical databases. It is open source and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort. We demonstrate its promising user friendliness and usability.
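As a rough illustration of the last step described in the abstract (turning a structured criterion into an executable query), the sketch below renders one already-parsed criterion as SQL. It is not the Criteria2Query implementation: the `Criterion` class, the `DOMAIN_TABLES` mapping, the `criterion_to_sql` function, and the concept ID are hypothetical names and values introduced here. Only the table and column names (`person`, `condition_occurrence`, `drug_exposure`, and their `*_concept_id` columns) are taken from the OMOP Common Data Model.

```python
from dataclasses import dataclass

# Hypothetical lookup from an entity's domain to the OMOP CDM table and
# concept column that would hold it. The mapping is illustrative only.
DOMAIN_TABLES = {
    "Condition": ("condition_occurrence", "condition_concept_id"),
    "Drug": ("drug_exposure", "drug_concept_id"),
}


@dataclass
class Criterion:
    """One parsed eligibility criterion (assumed structure, not the tool's)."""
    domain: str            # e.g. "Condition" or "Drug"
    concept_id: int        # normalized standard concept ID (placeholder value)
    negated: bool = False  # outcome of negation detection


def criterion_to_sql(c: Criterion) -> str:
    """Build a query selecting persons who satisfy a single criterion."""
    table, column = DOMAIN_TABLES[c.domain]
    # A negated criterion keeps persons with no matching record.
    op = "NOT IN" if c.negated else "IN"
    return (
        "SELECT p.person_id FROM person p "
        f"WHERE p.person_id {op} "
        f"(SELECT person_id FROM {table} WHERE {column} = {int(c.concept_id)})"
    )


if __name__ == "__main__":
    # 12345 is a made-up concept ID used purely for demonstration.
    print(criterion_to_sql(Criterion("Condition", 12345, negated=True)))
```

A real pipeline would also need to handle attributes (values, temporal windows) and the logic connecting several criteria, which this sketch leaves out.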

SUBMITTER: Yuan C 

PROVIDER: S-EPMC6402359 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

Criteria2Query: a natural language interface to clinical databases for cohort definition.

Yuan C, Ryan PB, Ta C, Guo Y, Li Z, Hardin J, Makadia R, Jin P, Shang N, Kang T, Weng C

Journal of the American Medical Informatics Association : JAMIA, 2019-04-01, issue 4


Similar Datasets

| S-EPMC8908213 | biostudies-literature
| S-EPMC11461909 | biostudies-literature
| S-EPMC9471718 | biostudies-literature
2017-12-05 | GSE87656 | GEO
| S-EPMC11893743 | biostudies-literature
| S-EPMC11756426 | biostudies-literature
| S-EPMC6894100 | biostudies-literature
| S-EPMC9515453 | biostudies-literature
| S-EPMC10148319 | biostudies-literature
| S-EPMC9109773 | biostudies-literature