Dataset Information

Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms.

ABSTRACT:

Importance

Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.

Objective

To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.

Design, setting, and participants

In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016.

Main outcomes and measurements

Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated.
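The two metrics named above (area under the curve, and specificity at a fixed sensitivity) can be illustrated with a small sketch. This is not the challenge's scoring code: the function names and the toy scores and labels are assumptions made purely for illustration, and ties and confidence intervals are handled only in the simplest way.

```python
def auc(scores, labels):
    """Area under the ROC curve: the fraction of (cancer, non-cancer)
    pairs in which the cancer case scores higher (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Best specificity over all thresholds whose sensitivity is at
    least the target. An exam is called positive when score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        sensitivity = sum(p >= t for p in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(n < t for n in neg) / len(neg)
            best = max(best, specificity)
    return best


# Hypothetical toy data: 4 cancer-negative and 4 cancer-positive exams.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(auc(toy_scores, toy_labels))
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
```

In the study, the target sensitivity would be the radiologists' operating point (85.9% in the United States, 83.9% in Sweden), so that algorithm and radiologist specificity are compared at the same sensitivity.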

Results

Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity.

Conclusions and relevance

While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.

SUBMITTER: Schaffter T

PROVIDER: S-EPMC7052735 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms.

Schaffter T, Buist DSM, Lee CI, Nikulin Y, Ribli D, Guan Y, Lotter W, Jie Z, Du H, Wang S, Feng J, Feng M, Kim HE, Albiol F, Albiol A, Morrell S, Wojna Z, Ahsen ME, Asif U, Jimeno Yepes A, Yohanandan S, Rabinovici-Cohen S, Yi D, Hoff B, Yu T, Chaibub Neto E, Rubin DL, Lindholm P, Margolies LR, McBride RB, Rothstein JH, Sieh W, Ben-Ari R, Harrer S, Trister A, Friend S, Norman T, Sahiner B, Strand F, Guinney J, Stolovitzky G, Mackey L, Cahoon J, Shen L, Sohn JH, Trivedi H, Shen Y, Buturovic L, Pereira JC, Cardoso JS, Castro E, Kalleberg KT, Pelka O, Nedjar I, Geras KJ, Nensa F, Goan E, Koitka S, Caballero L, Cox DD, Krishnaswamy P, Pandey G, Friedrich CM, Perrin D, Fookes C, Shi B, Cardoso Negrie G, Kawczynski M, Cho K, Khoo CS, Lo JY, Sorensen AG, Jung H

JAMA Network Open. 2020 Mar 2;3(3)


PMID: 32119094

Similar Datasets

Project description:

Importance: A computer algorithm that performs at or above the level of radiologists in mammography screening assessment could improve the effectiveness of breast cancer screening.

Objective: To perform an external evaluation of 3 commercially available artificial intelligence (AI) computer-aided detection algorithms as independent mammography readers and to assess the screening performance when combined with radiologists.

Design, setting, and participants: This retrospective case-control study was based on a double-reader population-based mammography screening cohort of women screened at an academic hospital in Stockholm, Sweden, from 2008 to 2015. The study included 8805 women aged 40 to 74 years who underwent mammography screening and who did not have implants or prior breast cancer. The study sample included 739 women who were diagnosed as having breast cancer (positive) and a random sample of 8066 healthy controls (negative for breast cancer).

Main outcomes and measures: Positive follow-up findings were determined by pathology-verified diagnosis at screening or within 12 months thereafter. Negative follow-up findings were determined by a 2-year cancer-free follow-up. Three AI computer-aided detection algorithms (AI-1, AI-2, and AI-3), sourced from different vendors, yielded a continuous score for the suspicion of cancer in each mammography examination. For a decision of normal or abnormal, the cut point was defined by the mean specificity of the first-reader radiologists (96.6%).

Results: The median age of study participants was 60 years (interquartile range, 50-66 years) for 739 women who received a diagnosis of breast cancer and 54 years (interquartile range, 47-63 years) for 8066 healthy controls. The cases positive for cancer comprised 618 (84%) screen detected and 121 (16%) clinically detected within 12 months of the screening examination. The area under the receiver operating curve for cancer detection was 0.956 (95% CI, 0.948-0.965) for AI-1, 0.922 (95% CI, 0.910-0.934) for AI-2, and 0.920 (95% CI, 0.909-0.931) for AI-3. At the specificity of the radiologists, the sensitivities were 81.9% for AI-1, 67.0% for AI-2, 67.4% for AI-3, 77.4% for first-reader radiologist, and 80.1% for second-reader radiologist. Combining AI-1 with first-reader radiologists achieved 88.6% sensitivity at 93.0% specificity (abnormal defined by either of the 2 making an abnormal assessment). No other examined combination of AI algorithms and radiologists surpassed this sensitivity level.

Conclusions and relevance: To our knowledge, this study is the first independent evaluation of several AI computer-aided detection algorithms for screening mammography. The results of this study indicated that a commercially available AI computer-aided detection algorithm can assess screening mammograms with a sufficient diagnostic performance to be further evaluated as an independent reader in prospective clinical trials. Combining the first readers with the best algorithm identified more cases positive for cancer than combining the first readers with second readers.
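The decision rule described in this entry (an AI cut point chosen to match the radiologists' specificity, then "abnormal if either the AI or the radiologist says abnormal") can be sketched as follows. The helper names and all toy values are hypothetical and are not taken from the study; the sketch only shows why the either-reader rule raises sensitivity while lowering specificity.

```python
def threshold_for_specificity(negative_scores, target_specificity):
    """Smallest cut point t such that the share of cancer-negative exams
    with score < t reaches the target. Exams with score >= t are abnormal."""
    candidates = sorted(set(negative_scores)) + [float("inf")]
    for t in candidates:
        specificity = sum(s < t for s in negative_scores) / len(negative_scores)
        if specificity >= target_specificity:
            return t
    return float("inf")


def either_abnormal(ai_scores, reader_flags, threshold):
    """Combined call: abnormal if the AI score reaches the cut point
    or the radiologist flagged the exam."""
    return [s >= threshold or r for s, r in zip(ai_scores, reader_flags)]


# Hypothetical toy data, not from the study.
neg_ai = [0.1, 0.2, 0.3, 0.6]              # AI scores, cancer-negative exams
neg_reader = [False, True, False, False]   # radiologist recalls on negatives
pos_ai = [0.9, 0.5, 0.7, 0.2]              # AI scores, cancer-positive exams
pos_reader = [True, True, False, False]    # radiologist recalls on cancers

cut = threshold_for_specificity(neg_ai, 0.75)
combined_pos = either_abnormal(pos_ai, pos_reader, cut)
combined_neg = either_abnormal(neg_ai, neg_reader, cut)

ai_sensitivity = sum(s >= cut for s in pos_ai) / len(pos_ai)
combined_sensitivity = sum(combined_pos) / len(combined_pos)
combined_specificity = sum(not c for c in combined_neg) / len(combined_neg)
print(cut, ai_sensitivity, combined_sensitivity, combined_specificity)
```

Because the combined rule flags every exam that either reader flags, it can only add positives. That is consistent with the reported pattern of higher sensitivity (88.6%) at a specificity (93.0%) below the radiologists' 96.6%.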

Project description:

Importance: Understanding the association of artificial intelligence (AI) with physician burnout is crucial for fostering a collaborative interactive environment between physicians and AI.

Objective: To estimate the association between AI use in radiology and radiologist burnout.

Design, setting, and participants: This cross-sectional study conducted a questionnaire survey between May and October 2023, using the national quality control system of radiology in China. Participants included radiologists from 1143 hospitals. Radiologists reporting regular or consistent AI use were categorized as the AI group. Statistical analysis was performed from October 2023 to May 2024.

Exposure: AI use in radiology practice.

Main outcomes and measures: Burnout was defined by emotional exhaustion (EE) or depersonalization according to the Maslach Burnout Inventory. Workload was assessed based on working hours, number of image interpretations, hospital level, device type, and role in the workflow. AI acceptance was determined via latent class analysis considering AI-related knowledge, attitude, confidence, and intention. Propensity score-based mixed-effect generalized linear logistic regression was used to estimate the associations between AI use and burnout and its components. Interactions of AI use, workload, and AI acceptance were assessed on additive and multiplicative scales.

Results: Among 6726 radiologists included in this study, 2376 (35.3%) were female and 4350 (64.7%) were male; the median (IQR) age was 41 (34-48) years; 3017 were in the AI group (1134 [37.6%] female; median [IQR] age, 40 [33-47] years) and 3709 in the non-AI group (1242 [33.5%] female; median [IQR] age, 42 [34-49] years). The weighted prevalence of burnout was significantly higher in the AI group compared with the non-AI group (40.9% vs 38.6%; P < .001). After adjusting for covariates, AI use was significantly associated with increased odds of burnout (odds ratio [OR], 1.20; 95% CI, 1.10-1.30), primarily driven by its association with EE (OR, 1.21; 95% CI, 1.10-1.34). A dose-response association was observed between the frequency of AI use and burnout (P for trend < .001). The associations were more pronounced among radiologists with high workload and lower AI acceptance. A significant negative interaction was noted between high AI acceptance and AI use.

Conclusions and relevance: In this cross-sectional study of radiologist burnout, frequent AI use was associated with an increased risk of radiologist burnout, particularly among those with high workload or lower AI acceptance. Further longitudinal studies are needed to provide more evidence.

Project description:

Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored.

Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies.

Design, setting, and participants: This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists.

Exposure: An AI-based assistive tool for Gleason grading of prostate biopsies.

Main outcomes and measures: Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies.

Results: Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses. Artificial intelligence-assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%; P < .001) in agreement with subspecialists (from 69.7% for unassisted reviews to 75.3% for assisted reviews) across all biopsies and a 6.2% increase (95% CI, 2.7%-9.8%; P = .001) in agreement with subspecialists (from 72.3% for unassisted reviews to 78.5% for assisted reviews) for grade group 1 biopsies. A secondary analysis indicated that AI assistance was also associated with improvements in tumor detection, mean review time, mean self-reported confidence, and interpathologist agreement.

Conclusions and relevance: In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.

Project description:

Purpose: To evaluate an MRI-based radiomic texture classifier alone and combined with radiologist qualitative assessment in predicting pathological complete response (pCR) using restaging MRI with internal training and external validation.

Methods: Consecutive patients with locally advanced rectal cancer (LARC) who underwent neoadjuvant therapy followed by total mesorectal excision from March 2012 to February 2016 (Memorial Sloan Kettering Cancer Center/internal dataset, n = 114, 41% female, median age = 55) and July 2014 to October 2015 (Instituto do Câncer do Estado de São Paulo/external dataset, n = 50, 52% female, median age = 64.5) were retrospectively included. Two radiologists (R1, senior; R2, junior) independently evaluated restaging MRI, classifying patients (radiological complete response vs radiological partial response). Model A (n = 33 texture features), model B (n = 91 features including texture, shape, and edge features), and two combination models (model A + B + R1, model A + B + R2) were constructed. Pathology served as the reference standard for neoadjuvant treatment response. Comparison of the classifiers' AUCs on the external set was done using DeLong's test.

Results: Models A and B had similar discriminative ability (P = 0.3; Model B AUC = 83%, 95% CI 70%-97%). Combined models increased inter-reader agreement compared with radiologist-only interpretation (κ = 0.82, 95% CI 0.70-0.89 vs κ = 0.25, 95% CI 0.11-0.61). The combined model slightly increased junior radiologist specificity, positive predictive value, and negative predictive values (93% vs 90%, 57% vs 50%, and 91% vs 90%, respectively).

Conclusion: We developed and externally validated a combined model using radiomics and radiologist qualitative assessment, which improved inter-reader agreement and slightly increased the diagnostic performance of the junior radiologist in predicting pCR after neoadjuvant treatment in patients with LARC.

