A self-adaptive deep learning method for automated eye laterality detection based on color fundus photography.
ABSTRACT: PURPOSE: To provide a self-adaptive deep learning (DL) method to automatically detect eye laterality from fundus images. METHODS: A total of 18394 fundus images with real-world eye laterality labels were used for model development and internal validation. A separate dataset of 2000 fundus images with manually labeled eye laterality was used for external validation. A DL model was developed based on a fine-tuned Inception-V3 network with a self-adaptive strategy. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and the confusion matrix were used to assess model performance. The class activation map (CAM) was used for model visualization. RESULTS: In the external validation (N = 2000, 50% labeled as left eye), the AUC of the DL model for overall eye laterality detection was 0.995 (95% CI, 0.993-0.997) with an accuracy of 99.13%. Specifically for left eye detection, the sensitivity was 99.00% (95% CI, 98.11%-99.49%) and the specificity was 99.10% (95% CI, 98.23%-99.56%). Nineteen images were classified differently from the human labels: 12 were due to incorrect human labelling, while 7 were due to poor image quality. The CAM showed that the region of interest for eye laterality detection was mainly the optic disc and surrounding areas. CONCLUSION: We proposed a self-adaptive DL method with high performance in detecting eye laterality from fundus images. Our findings were based on real-world labels and thus have practical significance in clinical settings.
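As an illustration of how sensitivity, specificity, and a 95% confidence interval can be derived from a confusion matrix, the following is a minimal sketch. The counts are an assumption chosen to be consistent with the reported sensitivity, specificity, and 50% left-eye share; the abstract does not give the raw confusion matrix or name the interval method, so the Wilson score interval used here (function names are hypothetical) is not expected to reproduce the published bounds exactly.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts (not from the paper): 1000 left eyes (positive class), 1000 right eyes.
tp, fn, tn, fp = 990, 10, 991, 9

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
spec_lo, spec_hi = wilson_interval(tn, tn + fp)

print(f"sensitivity {sens:.4f} (95% CI {sens_lo:.4f}-{sens_hi:.4f})")
print(f"specificity {spec:.4f} (95% CI {spec_lo:.4f}-{spec_hi:.4f})")
```

Other interval methods (for example, exact or continuity-corrected intervals) give slightly wider bounds, which is one reason published intervals can differ from this sketch.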
Project description:<h4>Purpose</h4>Artificial intelligence (AI) deep learning (DL) has been shown to have significant potential for eye disease detection and screening on retinal photographs in different clinical settings, particularly in primary care. However, an automated pre-diagnosis image assessment is essential to streamline the application of the developed AI-DL algorithms. In this study, we developed and validated a DL-based pre-diagnosis assessment module for retinal photographs, targeting image quality (gradable vs. ungradable), field of view (macula-centered vs. optic-disc-centered), and laterality of the eye (right vs. left).<h4>Methods</h4>A total of 21,348 retinal photographs from 1914 subjects from various clinical settings in Hong Kong, Singapore, and the United Kingdom were used for training, internal validation, and external testing of the DL module, which was developed using two DL-based algorithms (EfficientNet-B0 and MobileNet-V2).<h4>Results</h4>For image-quality assessment, the pre-diagnosis module achieved area under the receiver operating characteristic curve (AUROC) values of 0.975, 0.999, and 0.987 in the internal validation dataset and the two external testing datasets, respectively. For field-of-view assessment, the module had an AUROC value of 1.000 in all of the datasets. 
For laterality-of-the-eye assessment, the module had AUROC values of 1.000, 0.999, and 0.985 in the internal validation dataset and the two external testing datasets, respectively.<h4>Conclusions</h4>Our study showed that this three-in-one DL module for assessing image quality, field of view, and laterality of the eye of retinal photographs achieved excellent performance and generalizability across different centers and ethnicities.<h4>Translational relevance</h4>The proposed DL-based pre-diagnosis module realized accurate and automated assessments of image quality, field of view, and laterality of the eye of retinal photographs, which could be further integrated into AI-based models to improve operational flow for enhancing disease screening and diagnosis.
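For readers unfamiliar with the AUROC metric reported above, here is a minimal sketch of its rank-based definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This O(n_pos * n_neg) form is meant for clarity, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = left eye, 0 = right eye, scores = predicted P(left).
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auroc(example_labels, example_scores))
```

An AUROC of 1.000, as reported for field-of-view assessment, corresponds to every positive example scoring above every negative one.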
Project description:PURPOSE: To validate the performance of a commercially available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in colour fundus (CF) images on a dataset with mixed presence of eye diseases. METHODS: Evaluation of joint detection of referable DR and AMD was performed on a DR-AMD dataset with 600 images acquired during routine clinical practice, containing referable and non-referable cases of both diseases. Each image was graded for DR and AMD by an experienced ophthalmologist to establish the reference standard (RS), and by four independent observers for comparison with human performance. Validation was further assessed on Messidor (1200 images) for individual identification of referable DR, and the Age-Related Eye Disease Study (AREDS) dataset (133 821 images) for referable AMD, against the corresponding RS. RESULTS: Regarding joint validation on the DR-AMD dataset, the system achieved an area under the ROC curve (AUC) of 95.1% for detection of referable DR (SE = 90.1%, SP = 90.6%). For referable AMD, the AUC was 94.9% (SE = 91.8%, SP = 87.5%). Average human performance for DR was SE = 61.5% and SP = 97.8%; for AMD, SE = 76.5% and SP = 96.1%. Regarding detection of referable DR in Messidor, AUC was 97.5% (SE = 92.0%, SP = 92.1%); for referable AMD in AREDS, AUC was 92.7% (SE = 85.8%, SP = 86.0%). CONCLUSION: The validated system performs comparably to human experts at simultaneous detection of DR and AMD. This shows that DL systems can facilitate access to joint screening of eye diseases and become a quick and reliable support for ophthalmological experts.
Project description:<h4>Purpose</h4>We aim to develop a multi-task three-dimensional (3D) deep learning (DL) model to detect glaucomatous optic neuropathy (GON) and myopic features (MF) simultaneously from spectral-domain optical coherence tomography (SDOCT) volumetric scans.<h4>Methods</h4>Each volumetric scan was labelled as GON according to the criteria of retinal nerve fibre layer (RNFL) thinning, with a structural defect that correlated in position with the visual field defect (i.e., reference standard). MF were graded by the SDOCT <i>en face</i> images, defined as presence of peripapillary atrophy (PPA), optic disc tilting, or fundus tessellation. The multi-task DL model was developed by ResNet with output of Yes/No GON and Yes/No MF. SDOCT scans were collected in a tertiary eye hospital (Hong Kong SAR, China) for training (80%), tuning (10%), and internal validation (10%). External testing was performed on five independent datasets from eye centres in Hong Kong, the United States, and Singapore, respectively. For GON detection, we compared the model to the average RNFL thickness measurement generated from the SDOCT device. To investigate whether MF can affect the model's performance on GON detection, we conducted subgroup analyses in groups stratified by Yes/No MF. The area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy were reported.<h4>Results</h4>A total of 8,151 SDOCT volumetric scans from 3,609 eyes were collected. For detecting GON, in the internal validation, the proposed 3D model had significantly higher AUROC (0.949 vs. 0.913, <i>p</i> < 0.001) than average RNFL thickness in discriminating GON from normal. In the external testing, the two approaches had comparable performance. In the subgroup analysis, the multi-task DL model performed significantly better in the group of "no MF" (0.883 vs. 
0.965, <i>p</i>-value < 0.001) in one external testing dataset, but no significant difference in internal validation and other external testing datasets. The multi-task DL model's performance to detect MF was also generalizable in all datasets, with the AUROC values ranging from 0.855 to 0.896.<h4>Conclusion</h4>The proposed multi-task 3D DL model demonstrated high generalizability in all the datasets and the presence of MF did not affect the accuracy of GON detection generally.
Project description:<h4>Introduction</h4>Deep Learning (DL) and Artificial Intelligence (AI) have become widespread owing to advances in technology and the availability of digital data. Supervised learning algorithms have shown human-level or even better performance and are better feature extractors and quantifiers than unsupervised learning algorithms. Building a large dataset with good quality control requires an annotation tool with a customizable feature set. This paper evaluates the viability of an in-house annotation tool that runs on a smartphone and can be used in a healthcare setting.<h4>Methods</h4>We developed a smartphone-based grading system to help researchers grade multiple retinal fundus images. The process consisted of designing the user interface (UI) flow, taking into account feedback from experts. We performed quantitative and qualitative analyses of the change in a grader's speed over time and of feature usage statistics. The dataset comprised approximately 16,000 images with labels adjudicated by a minimum of 2 doctors. We also prepared results for an AI model trained on the images graded using this tool and validated on several public datasets.<h4>Results</h4>We created a DL model and analysed its performance on a binary classification task: whether or not a retinal image shows referable diabetic retinopathy (DR). A total of 32 doctors used the tool for a minimum of 20 images each. Data analytics suggested significant portability and flexibility of the tool. Inter-grader comparison favoured agreement on the annotated images: across the 550 images used to assess agreement, mean agreement was 75.9%.<h4>Conclusion</h4>Our aim was to make annotation of medical images easier and to minimize the time taken for annotation without degrading quality. The user feedback and feature usage statistics confirm our hypotheses that incorporating brightness and contrast variations, green channels, and zooming add-ons is useful for certain disease types. 
Simulation of multiple review cycles and establishing quality control can boost the accuracy of AI models even further. Although our study aimed to develop an annotation tool for diagnosing and classifying diabetic retinopathy fundus images, the same concept can be applied to fundus images of other ocular diseases and to other fields of medicine, such as radiology, where image-based diagnostic applications are used.
Project description:<h4>Purpose</h4>Heatmapping techniques can support explainability of deep learning (DL) predictions in medical image analysis. However, individual techniques have been mainly applied in a descriptive way without an objective and systematic evaluation. We investigated comparative performances using diabetic retinopathy lesion detection as a benchmark task.<h4>Methods</h4>The publicly available Indian Diabetic Retinopathy Image Dataset (IDRiD) contains fundus images of patients with diabetes with pixel-level annotations of diabetic retinopathy (DR) lesions, which served as the ground truth for this study. Three pretrained DL models (ResNet50, VGG16, and InceptionV3) were used for DR detection in these images. Next, explainability was visualized with each of the 10 most used heatmapping techniques. The quantitative correspondence between the output of a heatmap and the ground truth was evaluated with the Explainability Consistency Score (ECS), a metric between 0 and 1, developed for this comparative task.<h4>Results</h4>For overall DR lesion detection, the ECS ranged from 0.21 to 0.51 across all model/heatmapping combinations. The highest score was for VGG16+Grad-CAM (ECS = 0.51; 95% confidence interval [CI]: [0.46; 0.55]). For individual lesions, VGG16+Grad-CAM performed best on hemorrhages and hard exudates. ResNet50+SmoothGrad performed best for soft exudates and ResNet50+Guided Backpropagation performed best for microaneurysms.<h4>Conclusions</h4>Our empirical evaluation on the IDRiD database demonstrated that the combination of DL model and heatmapping technique affects explainability when considering common DR lesions. Our approach found considerable disagreement between regions highlighted by heatmaps and expert annotations.<h4>Translational relevance</h4>Our results warrant a more systematic investigation and analysis of heatmaps for reliable explanation of image-based predictions of deep learning models.
Project description:BACKGROUND: Retinal imaging has been applied for detecting eye diseases and cardiovascular risks using deep learning-based methods. Furthermore, retinal microvascular and structural changes were found in renal function impairments. However, a deep learning-based method using retinal images for detecting early renal function impairment has not yet been well studied. OBJECTIVE: This study aimed to develop and evaluate a deep learning model for detecting early renal function impairment using retinal fundus images. METHODS: This retrospective study enrolled patients who underwent renal function tests with color fundus images captured at any time between January 1, 2001, and August 31, 2019. A deep learning model was constructed to detect impaired renal function from the images. Early renal function impairment was defined as estimated glomerular filtration rate <90 mL/min/1.73 m². Model performance was evaluated with respect to the receiver operating characteristic curve and area under the curve (AUC). RESULTS: In total, 25,706 retinal fundus images were obtained from 6212 patients for the study period. The images were divided at an 8:1:1 ratio. The training, validation, and testing data sets respectively contained 20,787, 2189, and 2730 images from 4970, 621, and 621 patients. There were 10,686 and 15,020 images determined to indicate normal and impaired renal function, respectively. The AUC of the model was 0.81 in the overall population. In subgroups stratified by serum hemoglobin A1c (HbA1c) level, the AUCs were 0.81, 0.84, 0.85, and 0.87 for the HbA1c levels of ≤6.5%, >6.5%, >7.5%, and >10%, respectively. CONCLUSIONS: The deep learning model in this study enables the detection of early renal function impairment using retinal fundus images. The model was more accurate for patients with elevated serum HbA1c levels.
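The 8:1:1 division described above can be sketched as a patient-level split, in which all images from one patient land in the same partition. This is an assumption: the abstract does not state how the split was performed, although the reported patient counts (4970, 621, and 621 of 6212) match an 80/10/10 division of patients after rounding. The function name and patient identifiers below are hypothetical.

```python
import random


def split_patients(patient_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split unique patient IDs into train/validation/test sets.

    Splitting by patient rather than by image keeps all images of one
    patient in a single partition, which avoids leakage between sets.
    """
    ids = sorted(set(patient_ids))          # deterministic starting order
    random.Random(seed).shuffle(ids)        # seeded shuffle for reproducibility
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])       # remainder goes to the test set
    return train, val, test


# Hypothetical identifiers for the 6212 patients mentioned in the abstract.
patients = [f"P{i:05d}" for i in range(6212)]
train, val, test = split_patients(patients)
print(len(train), len(val), len(test))
```

Each image would then be assigned to the partition that contains its patient ID, so the number of images per partition depends on how many images each patient contributed.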
Project description:<h4>Importance</h4>A deep learning system (DLS) that could automatically detect glaucomatous optic neuropathy (GON) with high sensitivity and specificity could expedite screening for GON.<h4>Objective</h4>To establish a DLS for detection of GON using retinal fundus images and glaucoma diagnosis with convolutional neural networks (GD-CNN) that has the ability to be generalized across populations.<h4>Design, setting, and participants</h4>In this cross-sectional study, a DLS was developed for automated classification of GON using retinal fundus images obtained from the Chinese Glaucoma Study Alliance, the Handan Eye Study, and online databases. A total of 241 032 images were selected as the training data set. The images were entered into the databases on June 9, 2009, obtained on July 11, 2018, and analyses were performed on December 15, 2018. The generalization of the DLS was tested in several validation data sets, which allowed assessment of the DLS in a clinical setting without exclusions, testing against variable image quality based on fundus photographs obtained from websites, evaluation in a population-based study that reflects a natural distribution of patients with glaucoma within the cohort, and an additive data set that has a diverse ethnic distribution. An online learning system was established to transfer the trained and validated DLS to generalize the results with fundus images from new sources. 
To better understand the DLS decision-making process, a prediction visualization test was performed that identified regions of the fundus images utilized by the DLS for diagnosis.<h4>Exposures</h4>Use of a deep learning system.<h4>Main outcomes and measures</h4>Area under the receiver operating characteristics curve (AUC), sensitivity and specificity for DLS with reference to professional graders.<h4>Results</h4>From a total of 274 413 fundus images initially obtained from CGSA, 269 601 images passed initial image quality review and were graded for GON. A total of 241 032 images (definite GON 29 865 [12.4%], probable GON 11 046 [4.6%], unlikely GON 200 121 [83%]) from 68 013 patients were selected using random sampling to train the GD-CNN model. Validation and evaluation of the GD-CNN model was assessed using the remaining 28 569 images from CGSA. The AUC of the GD-CNN model in primary local validation data sets was 0.996 (95% CI, 0.995-0.998), with sensitivity of 96.2% and specificity of 97.7%. The most common reason for both false-negative and false-positive grading by GD-CNN (51 of 119 [46.3%] and 191 of 588 [32.3%]) and manual grading (50 of 113 [44.2%] and 183 of 538 [34.0%]) was pathologic or high myopia.<h4>Conclusions and relevance</h4>Application of GD-CNN to fundus images from different settings and varying image quality demonstrated a high sensitivity, specificity, and generalizability for detecting GON. These findings suggest that automated DLS could enhance current screening programs in a cost-effective and time-efficient manner.
Project description:<h4>Importance</h4>Deep learning (DL) used for discriminative tasks in ophthalmology, such as diagnosing diabetic retinopathy or age-related macular degeneration (AMD), requires large image data sets graded by human experts to train deep convolutional neural networks (DCNNs). In contrast, generative DL techniques could synthesize large new data sets of artificial retina images with different stages of AMD. Such images could enhance existing data sets of common and rare ophthalmic diseases without concern for personally identifying information to assist medical education of students, residents, and retinal specialists, as well as for training new DL diagnostic models for which extensive data sets from large clinical trials of expertly graded images may not exist.<h4>Objective</h4>To develop DL techniques for synthesizing high-resolution realistic fundus images serving as proxy data sets for use by retinal specialists and DL machines.<h4>Design, setting, and participants</h4>Generative adversarial networks were trained on 133 821 color fundus images from 4613 study participants from the Age-Related Eye Disease Study (AREDS), generating synthetic fundus images with and without AMD. We compared retinal specialists' ability to diagnose AMD on both real and synthetic images, asking them to assess image gradability and testing their ability to discern real from synthetic images. The performance of AMD diagnostic DCNNs (referable vs not referable AMD) trained on either all-real vs all-synthetic data sets was compared.<h4>Main outcomes and measures</h4>Accuracy of 2 retinal specialists (T.Y.A.L. and K.D.P.) for diagnosing and distinguishing AMD on real vs synthetic images and diagnostic performance (area under the curve) of DL algorithms trained on synthetic vs real images.<h4>Results</h4>The diagnostic accuracy of 2 retinal specialists on real vs synthetic images was similar. 
The accuracy of diagnosis as referable vs nonreferable AMD compared with certified human graders for retinal specialist 1 was 84.54% (error margin, 4.06%) on real images vs 84.12% (error margin, 4.16%) on synthetic images and for retinal specialist 2 was 89.47% (error margin, 3.45%) on real images vs 89.19% (error margin, 3.54%) on synthetic images. Retinal specialists could not distinguish real from synthetic images, with an accuracy of 59.50% (error margin, 3.93%) for retinal specialist 1 and 53.67% (error margin, 3.99%) for retinal specialist 2. The DCNNs trained on real data showed an area under the curve of 0.9706 (error margin, 0.0029), and those trained on synthetic data showed an area under the curve of 0.9235 (error margin, 0.0045).<h4>Conclusions and relevance</h4>Deep learning-synthesized images appeared to be realistic to retinal specialists, and DCNNs achieved diagnostic performance on synthetic data close to that for real images, suggesting that DL generative techniques hold promise for training humans and machines.
Project description:Both genetic and environmental factors influence the etiology of age-related macular degeneration (AMD), a leading cause of blindness. AMD severity is primarily measured by fundus images and recently developed machine learning methods can successfully predict AMD progression using image data. However, none of these methods have utilized both genetic and image data for predicting AMD progression. Here we jointly used genotypes and fundus images to predict an eye as having progressed to late AMD with a modified deep convolutional neural network (CNN). In total, we used 31,262 fundus images and 52 AMD-associated genetic variants from 1,351 subjects from the Age-Related Eye Disease Study (AREDS) with disease severity phenotypes and fundus images available at baseline and follow-up visits over a period of 12 years. Our results showed that fundus images coupled with genotypes could predict late AMD progression with an averaged area under the curve (AUC) value of 0.85 (95%CI: 0.83-0.86). The results using fundus images alone showed an averaged AUC of 0.81 (95%CI: 0.80-0.83). We implemented our model in a cloud-based application for individual risk assessment.
Project description:Diabetic retinopathy (DR) screening images are heterogeneous and contain undesirable non-retinal, incorrect-field, and ungradable samples that require curation, a laborious task to perform manually. We developed and validated single- and multi-output deep learning (DL) models for automated curation, classifying laterality, retinal presence, retinal field, and gradability. The internal dataset comprised 7743 images from DR screening (UK), with 1479 external test images (Portugal and Paraguay). Internal vs external multi-output laterality AUROC were right (0.994 vs 0.905), left (0.994 vs 0.911) and unidentifiable (0.996 vs 0.680). Retinal presence AUROC were (1.000 vs 1.000). Retinal field AUROC were macula (0.994 vs 0.955), nasal (0.995 vs 0.962) and other retinal field (0.997 vs 0.944). Gradability AUROC were (0.985 vs 0.918). DL effectively detects laterality, retinal presence, retinal field and gradability of DR screening images with generalisation between centres and populations. DL models could be used for automated image curation within DR screening.