Dataset Information

Identifying Breast Cancer Distant Recurrences from Electronic Health Records Using Machine Learning.


ABSTRACT: Accurately identifying distant recurrences in breast cancer from Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from the EHR. We applied MetaMap to extract features from clinical narratives and also retrieved structured clinical data from the EHR. Using these features, we trained a support vector machine model to identify distant recurrences in breast cancer patients. We trained the model using 1,396 double-annotated subjects and validated the model using 599 double-annotated subjects. In addition, we validated the model on a set of 4,904 single-annotated subjects as a generalization test. In the held-out test and generalization test, we obtained F-measure scores of 0.78 and 0.74 and area under the curve (AUC) scores of 0.95 and 0.93, respectively. To explore the representation learning utility of deep neural networks, we designed multiple convolutional neural networks and multilayer neural networks to identify distant recurrences. Using the same held-out test set and generalization test set, we obtained F-measure scores of 0.79 ± 0.02 and 0.74 ± 0.004 and AUC scores of 0.95 ± 0.002 and 0.95 ± 0.01, respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.
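
The abstract reports results as F-measure and AUC. As a reading aid, the sketch below shows how these two metrics are commonly computed for a binary classifier. It is not the authors' evaluation code: the function names `f_measure` and `auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the F-measure is assumed to be the balanced F1 variant.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs where the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration (not study data).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]

# Hypothetical decision threshold of 0.5 to turn scores into hard predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"F-measure: {f_measure(labels, preds):.3f}")
print(f"AUC:       {auc(labels, scores):.3f}")
```

The F-measure depends on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason the two can differ as much as they do in the reported results.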

SUBMITTER: Zeng Z 

PROVIDER: S-EPMC7678240 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Identifying Breast Cancer Distant Recurrences from Electronic Health Records Using Machine Learning.

Zexian Zeng, Liang Yao, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan E Clare, Seema A Khan, Yuan Luo

Journal of Healthcare Informatics Research, 2019-04-08


Similar Datasets

| S-EPMC7556423 | biostudies-literature
| S-EPMC8447665 | biostudies-literature
| S-EPMC6352440 | biostudies-literature