Dataset Information

Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model.

ABSTRACT: The inability to accurately, efficiently label large, open-access medical imaging datasets limits the widespread implementation of artificial intelligence models in healthcare. There have been few attempts, however, to automate the annotation of such public databases; one approach, for example, focused on labor-intensive, manual labeling of subsets of these datasets to be used to train new models. In this study, we describe a method for standardized, automated labeling based on similarity to a previously validated, explainable AI (xAI) model-derived-atlas, for which the user can specify a quantitative threshold for a desired level of accuracy (the probability-of-similarity, pSim metric). We show that our xAI model, by calculating the pSim values for each clinical output label based on comparison to its training-set derived reference atlas, can automatically label the external datasets to a user-selected, high level of accuracy, equaling or exceeding that of human experts. We additionally show that, by fine-tuning the original model using the automatically labelled exams for retraining, performance can be preserved or improved, resulting in a highly accurate, more generalized model.

SUBMITTER: Kim D

PROVIDER: S-EPMC8986787 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model.

Kim Doyun D Chung Joowon J Choi Jongmun J Succi Marc D MD Conklin John J Longo Maria Gabriela Figueiro MGF Ackman Jeanne B JB Little Brent P BP Petranovic Milena M Kalra Mannudeep K MK Lev Michael H MH Do Synho S

Nature communications 20220406 1

The inability to accurately, efficiently label large, open-access medical imaging datasets limits the widespread implementation of artificial intelligence models in healthcare. There have been few attempts, however, to automate the annotation of such public databases; one approach, for example, focused on labor-intensive, manual labeling of subsets of these datasets to be used to train new models. In this study, we describe a method for standardized, automated labeling based on similarity to a p ...[more]

PMID: 35388010

Similar Datasets

Project description:Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate in a black-box manner where the way how the AI makes predictions remains opaque. This makes it difficult for humans to validate a prediction made by AI against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI improves task performance in human-AI collaboration. To test this hypothesis, we implement explainable AI in the form of visual heatmaps in inspection tasks conducted by domain experts. Visual heatmaps have the advantage that they are easy to understand and help to localize relevant parts of an image. We then compare participants that were either supported by (a) black-box AI or (b) explainable AI, where the latter supports them to follow AI predictions when the AI is accurate or overrule the AI when the AI predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed [Formula: see text] assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed [Formula: see text] assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI with heatmaps instead of black-box AI. We find that explainable AI as a decision aid improved the task performance by 7.7 percentage points (95% confidence interval [CI]: 3.3% to 12.0%, [Formula: see text]) in the manufacturing experiment and by 4.7 percentage points (95% CI: 1.1% to 8.3%, [Formula: see text]) in the medical experiment compared to black-box AI. These gains represent a significant improvement in task performance.

Project description:Background and Aims Chest X-ray (CXR) is indispensable to the assessment of severity, diagnosis, and management of pneumonia. Deep learning is an artificial intelligence (AI) technology that has been applied to the interpretation of medical images. This study investigated the feasibility of classifying fatal pneumonia based on CXR images using deep learning models on publicly available platforms. Methods CXR images of patients with pneumonia at diagnosis were labeled as fatal or nonfatal based on medical records. We applied CXR images from 1031 patients with nonfatal pneumonia and 243 patients with fatal pneumonia for training and self-evaluation of the deep learning models. All labeled CXR images were randomly allocated to the training, validation, and test datasets of deep learning models. Data augmentation techniques were not used in this study. We created two deep learning models using two publicly available platforms. Results The first model showed an area under the precision-recall curve of 0.929 with a sensitivity of 50.0% and a specificity of 92.4% for classifying fatal pneumonia. We evaluated the performance of our deep learning models using sensitivity, specificity, PPV, negative predictive value (NPV), accuracy, and F1 score. Using the external validation test dataset of 100 CXR images, the sensitivity, specificity, accuracy, and F1 score were 68.0%, 86.0%, 77.0%, and 74.7%, respectively. In the original dataset, the performance of the second model showed a sensitivity, specificity, and accuracy of 39.6%, 92.8%, and 82.7%, respectively, while external validation showed values of 38.0%, 92.0%, and 65.0%, respectively. The F1 score was 52.1%. These results were comparable to those obtained by respiratory physicians and residents. Conclusions The deep learning models yielded good accuracy in classifying fatal pneumonia. By further improving the performance, AI could assist physicians in the severity assessment of patients with pneumonia.

Dataset Information

Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model.

Publications

Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets