Dataset Information

Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.

ABSTRACT:

Background

Artificially intelligent systems for detecting abnormalities still need to become not only accurate but also able to handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation.

Methods

We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system, chosen based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. The effects on disease classification performance of random initialization versus pre-trained embeddings, and of different training dataset sizes, were also evaluated. The RBA was tested on a subset of 2158 manually labeled reports, and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by the RBA, and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.
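The dictionary approach and its evaluation can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword lists, the disease names, the negation cues, and the rule that an organ system is "normal" whenever no disease keyword fires are all hypothetical simplifications chosen for illustration. The helper names `label_report` and `accuracy_and_f_score` are likewise assumptions.

```python
import re

# Hypothetical dictionary: organ system -> disease label -> trigger keywords.
# Disease names and keywords are illustrative placeholders, not the study's lists.
DICTIONARY = {
    "lungs/pleura": {
        "nodule": ["nodule", "nodules"],
        "effusion": ["pleural effusion"],
    },
    "kidneys/ureters": {
        "stone": ["calculus", "calculi", "stone"],
    },
}

# Hypothetical negation cues; a sentence containing one is skipped entirely.
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def label_report(text):
    """Return {organ system: set of labels}, using 'normal' if nothing fires."""
    sentences = [s.strip() for s in re.split(r"[.\n]+", text) if s.strip()]
    labels = {}
    for organ, diseases in DICTIONARY.items():
        found = set()
        for disease, keywords in diseases.items():
            for sentence in sentences:
                if NEGATION.search(sentence):
                    continue  # crude negation handling (assumption)
                for keyword in keywords:
                    pattern = r"\b" + re.escape(keyword) + r"\b"
                    if re.search(pattern, sentence, re.IGNORECASE):
                        found.add(disease)
        labels[organ] = found or {"normal"}
    return labels


def accuracy_and_f_score(predicted, reference):
    """Accuracy and F-score (F1) for one binary label over a set of reports."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f_score


report = (
    "There is a 4 mm nodule in the right upper lobe. "
    "No pleural effusion. Kidneys are unremarkable."
)
result = label_report(report)
acc, f1 = accuracy_and_f_score(
    [True, False, True, True],   # rule-based predictions (made-up values)
    [True, False, False, True],  # manual reference labels (made-up values)
)
```

In a setup like this, labels produced by the rules over a large corpus would serve as training targets for a neural classifier, while a smaller manually labeled subset would be used only to check the rules themselves.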

Results

Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.
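For readers unfamiliar with the AUC metric reported here, the sketch below computes it from its rank-based definition: the probability that a randomly chosen positive report receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores are assumptions for illustration; the DeLong confidence intervals used in the study are not reproduced.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of classifier scores, higher meaning more likely positive
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for a single disease label
example_auc = roc_auc([1, 1, 0, 0, 1], [0.9, 0.8, 0.7, 0.2, 0.6])
```

The pairwise loop is quadratic in the number of reports, which is fine for illustration; a sort-based implementation would be preferable at the scale of tens of thousands of reports.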

Conclusions

Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.

SUBMITTER: D'Anniballe VM

PROVIDER: S-EPMC9011942 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature

Publications

Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.

Vincent M D'Anniballe, Fakrul Islam Tushar, Khrystyna Faryna, Songyue Han, Maciej A Mazurowski, Geoffrey D Rubin, Joseph Y Lo

BMC Medical Informatics and Decision Making, 2022 Apr 15, issue 1

PMID: 35428335

Similar Datasets

Project description:

Purpose

Organ autosegmentation efforts to date have largely been focused on adult populations, due to limited availability of pediatric training data. Pediatric patients may present additional challenges for organ segmentation. This paper describes a dataset of 359 pediatric chest-abdomen-pelvis and abdomen-pelvis Computed Tomography (CT) images with expert contours of up to 29 anatomical organ structures to aid in the evaluation and development of autosegmentation algorithms for pediatric CT imaging.

Acquisition and validation methods

The dataset collection consists of axial CT images in Digital Imaging and Communications in Medicine (DICOM) format of 180 male and 179 female pediatric chest-abdomen-pelvis or abdomen-pelvis exams acquired from one of three CT scanners at Children's Wisconsin. The datasets represent random pediatric cases based upon routine clinical indications. Subjects ranged in age from 5 days to 16 years, with a mean age of 7 years. The CT acquisition, contrast, and reconstruction protocols varied across the scanner models and patients, with specifications available in the DICOM headers. Expert contours were manually labeled for up to 29 organ structures per subject. Not all contours are available for all subjects, due to limited field of view or unreliable contouring due to high noise.

Data format and usage notes

The data are available on The Cancer Imaging Archive (TCIA) (https://www.cancerimagingarchive.net/) under the collection Pediatric-CT-SEG. The axial CT image slices for each subject are available in DICOM format. The expert contours are stored in a single DICOM RTSTRUCT file for each subject. The contour names are listed in Table 2.

Potential applications

This dataset will enable the evaluation and development of organ autosegmentation algorithms for pediatric populations, which exhibit variations in organ shape and size across age. Automated organ segmentation from CT images has numerous applications including radiation therapy, diagnostic tasks, surgical planning, and patient-specific organ dose estimation.
