Project description: Background: Automated classification of medical images through neural networks can reach high accuracy rates but lacks interpretability. Objectives: To compare the diagnostic accuracy obtained by using content-based image retrieval (CBIR) to retrieve visually similar dermatoscopic images with corresponding disease labels against predictions made by a neural network. Methods: A neural network was trained to predict disease classes on dermatoscopic images from three retrospectively collected image datasets containing 888, 2750 and 16,691 images, respectively. Diagnosis predictions were made based on the most commonly occurring diagnosis in visually similar images, or based on the top-1 class prediction of the softmax output from the network. Outcome measures were area under the receiver operating characteristic curve (AUC) for predicting a malignant lesion, multiclass accuracy and mean average precision (mAP), measured on unseen test images of the corresponding dataset. Results: In all three datasets the skin cancer predictions from CBIR (evaluating the 16 most similar images) showed AUC values similar to softmax predictions (0.842, 0.806 and 0.852 vs. 0.830, 0.810 and 0.847, respectively; P > 0.99 for all). Similarly, the multiclass accuracy of CBIR was comparable with that of softmax predictions. Compared with softmax predictions, networks trained for detecting only three classes performed better on a dataset with eight classes when using CBIR (mAP 0.184 vs. 0.368 and 0.198 vs. 0.403, respectively). Conclusions: Presenting visually similar images based on features from a neural network shows accuracy comparable with the softmax probability-based diagnoses of convolutional neural networks. CBIR may be more helpful than a softmax classifier in improving the diagnostic accuracy of clinicians in a routine clinical setting.
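The CBIR decision rule above amounts to a k-nearest-neighbour majority vote in the network's feature space (k = 16 in the study). A minimal sketch, assuming image features have already been extracted by the trained network; the array names are illustrative placeholders, not artifacts of the study:

```python
# Minimal CBIR-diagnosis sketch: retrieve the 16 visually most similar
# training images in feature space and predict the most common diagnosis
# among them. `train_feats`, `train_labels`, and `query_feats` are
# hypothetical arrays of pre-extracted network features and labels.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def cbir_predict(train_feats, train_labels, query_feats, k=16):
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_feats)
    _, idx = nn.kneighbors(query_feats)
    # Majority vote over the diagnoses of the k retrieved images.
    return np.array([Counter(train_labels[i]).most_common(1)[0][0]
                     for i in idx])
```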
Project description: Composite models that combine medical imaging with electronic medical records (EMR) improve predictive power compared with traditional models that use imaging alone. The digitization of EMR provides potential access to a wealth of medical information but presents new challenges in algorithm design and inference. Previous studies, such as the Phenome-Wide Association Study (PheWAS), have shown that EMR data can be used to investigate the relationship between genotypes and clinical conditions. Here, we introduce the Phenome-Disease Association Study to extend the statistical capabilities of the PheWAS software through a custom Python package, which creates diagnostic EMR signatures that capture system-wide comorbidities for a disease population within a given time interval. We investigate the effect of integrating these EMR signatures with radiological data to improve diagnostic classification in disease domains known to have confounding factors because of variable and complex clinical presentation. Specifically, we focus on two studies: first, a study of four major optic nerve-related conditions; and second, a study of diabetes. Adding EMR signature vectors to radiologically derived structural metrics improves the area under the curve (AUC) for diagnostic classification using elastic net regression for diseases of the optic nerve: for glaucoma, the AUC improves from 0.71 to 0.83; for intrinsic optic nerve disease, from 0.72 to 0.91; for optic nerve edema, from 0.95 to 0.96; and for thyroid eye disease, from 0.79 to 0.89. The EMR signatures recapitulate known comorbidities of diabetes, such as abnormal glucose, but do not significantly modulate image-derived features. In summary, EMR signatures present a scalable and readily applicable way to augment image-based diagnostic classification.
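The fusion step described here can be approximated by concatenating the two feature blocks and fitting an elastic net classifier, evaluated per disease by AUC. A hedged sketch using scikit-learn; `image_metrics`, `emr_signatures`, and `y` are hypothetical placeholders for the study's data:

```python
# Sketch of the fusion step: concatenate image-derived structural metrics
# with EMR signature vectors and fit an elastic net logistic regression
# for one disease-vs-control task. Arrays are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.hstack([image_metrics, emr_signatures])  # fused feature matrix
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.2f}")
```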
Project description: With great potential for assisting radiological image interpretation and decision making, content-based image retrieval in the medical domain has become a hot topic in recent years. Many methods to enhance the performance of content-based medical image retrieval have been proposed, among which the relevance feedback (RF) scheme is one of the most promising. Given user feedback information, RF algorithms interactively learn a user's preferences to bridge the "semantic gap" between low-level computerized visual features and high-level human semantic perception, and thus improve retrieval performance. However, most existing RF algorithms operate in the original high-dimensional feature space and ignore the manifold structure of the low-level visual features of images. In this paper, we propose a new method, termed dual-force ISOMAP (DFISOMAP), for content-based medical image retrieval. Under the assumption that medical images lie on a low-dimensional manifold embedded in a high-dimensional ambient space, DFISOMAP operates in three stages. First, the geometric structure of positive examples in the learned low-dimensional embedding is preserved according to the isometric feature mapping (ISOMAP) criterion; to model the geometric structure precisely, a reconstruction error constraint is also added. Second, the average distance between positive and negative examples is maximized to separate them; this margin maximization acts as a force that pushes negative examples far away from positive examples. Finally, the similarity propagation technique is utilized to provide negative examples with another force that pulls them back into the negative sample set. We evaluate the proposed method on a subset of the IRMA medical image dataset within an RF-based medical image retrieval framework. Experimental results show that DFISOMAP outperforms popular approaches for content-based medical image retrieval in terms of accuracy and stability.
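For orientation, the sketch below shows only the plain ISOMAP building block that DFISOMAP extends: embed the visual features on a low-dimensional manifold, then rank database images by distance to the query in the embedding. The margin-maximization and similarity-propagation forces of the actual method are omitted, and the feature array is synthetic:

```python
# Plain ISOMAP retrieval baseline (not the full DFISOMAP method):
# embed high-dimensional visual features, then rank by embedded distance.
import numpy as np
from sklearn.manifold import Isomap

features = np.random.rand(500, 256)            # placeholder visual features
embedding = Isomap(n_neighbors=10, n_components=20).fit_transform(features)
query = embedding[0]                           # treat image 0 as the query
dists = np.linalg.norm(embedding - query, axis=1)
ranked = np.argsort(dists)[1:11]               # indices of top-10 retrievals
```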
Project description: Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and desired reliable performance can be affected by a combination of factors, such as dataset size, data source, distribution, and the loss function used to train deep neural networks. Currently, the cross-entropy loss remains the de facto loss function for training deep learning classifiers. This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the best of our knowledge no existing literature comprehensively analyzes and selects an appropriate loss function for the classification task under study. In this work, we benchmark various state-of-the-art loss functions, critically analyze model performance, and propose improved loss functions for a multi-class classification task. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal) and images exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that, compared with the individual models and the state-of-the-art literature, weighted averaging of the predictions for the top-3 and top-5 model-level ensembles delivered significantly superior classification performance (p < 0.05) in terms of the MCC metric (0.9068; 95% confidence interval: 0.8839-0.9297). Finally, we performed localization studies to interpret model behavior and confirmed that the individual models and ensembles learned task-specific features and highlighted disease-specific regions of interest. The code is available at https://github.com/sivaramakrishnan-rajaraman/multiloss_ensemble_models.
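The winning strategy, prediction-level weighted averaging over the top-k models evaluated with MCC, can be sketched as follows; `top_models`, `x_test`, `y_test`, and the weight values are hypothetical stand-ins, not the paper's tuned configuration:

```python
# Weighted averaging of per-model softmax predictions for a top-3
# ensemble, scored with the Matthews correlation coefficient (MCC).
# `top_models`, `x_test`, and `y_test` are hypothetical placeholders.
import numpy as np
from sklearn.metrics import matthews_corrcoef

probs = [m.predict(x_test) for m in top_models]  # each (n_samples, 3)
weights = np.array([0.5, 0.3, 0.2])              # illustrative weights
avg = np.tensordot(weights, np.stack(probs), axes=1)  # weighted average
y_pred = avg.argmax(axis=1)                      # normal/bacterial/viral
print("MCC:", matthews_corrcoef(y_test, y_pred))
```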
Project description: In December 2019, an outbreak of novel coronavirus pneumonia emerged in Wuhan, Hubei Province, China, and subsequently developed into a significant global public health event that caused substantial economic losses. We downloaded throat swab expression profiling data of COVID-19-positive and -negative patients from the Gene Expression Omnibus (GEO) database to mine novel diagnostic biomarkers. XGBoost was used to construct the model and select feature genes. Subsequently, we constructed COVID-19 classifiers such as MARS, KNN, SVM, MIL, and RF using machine learning methods. From these classifiers we selected the KNN classifier with the optimal MCC value using the incremental feature selection (IFS) method, identifying 24 feature genes. Finally, we used principal component analysis to classify the samples and found that the 24 feature genes could effectively classify COVID-19-positive and -negative patients. Additionally, we analyzed the possible biological functions and signaling pathways in which the 24 feature genes are involved by GO and KEGG enrichment analyses. The results demonstrated that these feature genes are primarily enriched in biological functions such as viral transcription and viral gene expression, and in pathways such as Coronavirus disease - COVID-19. In summary, the 24 feature genes we identified were highly effective in classifying COVID-19-positive and -negative patients and could serve as novel markers for COVID-19.
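The described pipeline, XGBoost-based gene ranking followed by incremental feature selection against the KNN classifier's MCC, can be sketched as below; `X` (samples x genes) and `y` are hypothetical expression data and labels:

```python
# Rank genes by XGBoost importance, then run incremental feature
# selection (IFS): grow the gene subset and keep the size that
# maximizes the KNN classifier's cross-validated MCC.
import numpy as np
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import matthews_corrcoef

ranking = np.argsort(XGBClassifier().fit(X, y).feature_importances_)[::-1]
best_mcc, best_k = -1.0, 0
for k in range(1, 51):                       # try top-1 ... top-50 genes
    pred = cross_val_predict(KNeighborsClassifier(),
                             X[:, ranking[:k]], y, cv=5)
    mcc = matthews_corrcoef(y, pred)
    if mcc > best_mcc:
        best_mcc, best_k = mcc, k
print(f"best subset: top-{best_k} genes, MCC={best_mcc:.3f}")
```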
Project description: Nearest neighbor search (NNS) is a technique used to locate the points in a high-dimensional space closest to a given query point. This technique has multiple applications in medicine, such as searching large medical imaging databases, disease classification, and diagnosis. However, when the number of points is significantly large, the brute-force approach to finding the nearest neighbor becomes computationally infeasible, so various approaches have been developed to make the search faster and more efficient. With a focus on medical imaging, this paper proposes DenseLinkSearch (DLS), an effective and efficient algorithm that searches and retrieves relevant images from heterogeneous sources of medical images. Given a medical database, the proposed algorithm builds an index that consists of precomputed links for each point in the database; the search algorithm utilizes the index to efficiently traverse the database in search of the nearest neighbor. We also explore the role of medical image feature representation in content-based medical image retrieval tasks, and propose a Transformer-based feature representation technique that outperforms existing pre-trained Transformer-based approaches on benchmark medical image retrieval datasets. We extensively tested the proposed NNS approach and compared its performance with state-of-the-art NNS approaches on benchmark datasets and our newly created medical image datasets. The proposed approach outperformed existing approaches in terms of retrieval accuracy and speed: compared with existing approximate NNS approaches, DLS achieved a lower average time per query and ≥ 99% R@10 on 11 out of 13 benchmark datasets. We also found that the proposed medical feature representation approach represents medical images better than existing pre-trained image models, obtaining improvements of 9.37%, 7.0%, and 13.33% in P@5, P@10, and P@20, respectively, over the best-performing pre-trained image model. The source code and datasets of our experiments are available at https://github.com/deepaknlp/DLS.
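For context, the sketch below shows only the exact brute-force baseline that indexed methods such as DLS are measured against, together with the R@10 metric reported above; it is not the DLS link index itself:

```python
# Exact brute-force NNS baseline and the R@k recall metric: R@10 is the
# fraction of the true 10 nearest neighbors that an (approximate)
# search returns. `db` and `query` are illustrative feature arrays.
import numpy as np

def knn_bruteforce(db, query, k=10):
    dists = np.linalg.norm(db - query, axis=1)  # exact distances to all points
    return np.argsort(dists)[:k]                # indices of the k nearest

def recall_at_k(approx_ids, exact_ids, k=10):
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```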
Project description: Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval systems, among which visual features such as SIFT, LBP, and intensity histograms play a critical role. Typically, these features are concatenated into a long vector to represent medical images, and traditional dimension reduction techniques such as locally linear embedding (LLE), principal component analysis (PCA), or Laplacian eigenmaps (LE) can then be employed to mitigate the "curse of dimensionality". Though these approaches show promising performance for medical image retrieval, the feature-concatenating method ignores the fact that different features have distinct physical meanings. In this paper, we propose a new method called multiview locally linear embedding (MLLE) for medical image retrieval. Following the patch alignment framework, MLLE preserves the geometric structure of the local patch in each feature space according to the LLE criterion. To exploit the complementary properties of the different features, MLLE assigns different weights to local patches from different feature spaces. Finally, MLLE employs global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from the different features. To demonstrate the effectiveness of MLLE for medical image retrieval, we compare it with conventional spectral embedding methods in experiments on a subset of the IRMA medical image dataset. Evaluation results show that MLLE outperforms state-of-the-art dimension reduction methods.
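The feature-concatenation baseline that MLLE improves on can be sketched as follows; the descriptor dimensions are illustrative, and MLLE's per-feature patch weighting and global coordinate alignment are deliberately not reproduced:

```python
# Feature-concatenation baseline (what MLLE argues against): stack SIFT,
# LBP, and intensity-histogram descriptors into one long vector and
# reduce it with standard LLE, ignoring the distinct meaning of each
# feature space. Synthetic data stands in for real descriptors.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

sift, lbp, hist = (np.random.rand(300, d) for d in (128, 59, 64))
X = np.hstack([sift, lbp, hist])              # concatenated descriptor
emb = LocallyLinearEmbedding(n_neighbors=12,
                             n_components=20).fit_transform(X)
```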
Project description: Rapid advancement in imaging technology generates an enormous amount of heterogeneous medical data for disease diagnosis and the rehabilitation process. Radiologists may require related clinical cases from medical archives for analysis and disease diagnosis, but retrieving the associated clinical cases automatically, efficiently, and accurately from a substantial medical image archive is challenging because of the diversity of diseases and imaging modalities. We propose an efficient and accurate approach to medical image modality classification that can be used for the retrieval of clinical cases from large medical repositories. The proposed approach uses the transfer learning concept with a pre-trained ResNet50 deep learning model for optimized feature extraction, followed by linear discriminant analysis classification (TLRN-LDA). Extensive experiments were performed on the challenging standard benchmark ImageCLEF-2012 dataset of 31 classes. The developed approach yields an improved average classification accuracy of 87.91%, up to 10% higher than state-of-the-art approaches on the same dataset. Hand-crafted features were also extracted for comparison, and the performance of the TLRN-LDA system demonstrates its effectiveness over state-of-the-art systems. The developed approach may be deployed in diagnostic centers to assist practitioners with accurate and efficient clinical case retrieval and disease diagnosis.
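A minimal sketch of the TLRN-LDA idea as described, a frozen pre-trained ResNet50 feature extractor followed by LDA classification; the preprocessing details and the `images`/`labels` arrays are assumptions, not the paper's exact configuration:

```python
# Transfer learning sketch: pre-trained ResNet50 (frozen, average-pooled)
# extracts features; linear discriminant analysis classifies modalities.
# `images` is assumed to be a (n, 224, 224, 3) array, `labels` the
# 31 modality classes; both are hypothetical placeholders.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")
feats = extractor.predict(preprocess_input(images))   # (n, 2048) features
lda = LinearDiscriminantAnalysis().fit(feats, labels)
pred = lda.predict(extractor.predict(preprocess_input(test_images)))
```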
Project description: Medical image analysis for accurate disease diagnosis is a challenging task: an improper diagnosis can cause required medical treatment to be skipped, and suspected lesions can be missed by the physician's eye. Investigating similar case studies present in a healthcare database can therefore substantially support diagnosis. In this context, this paper presents an assistive system that helps dermatologists accurately identify 23 different kinds of melanoma. For this purpose, 2300 dermoscopic images were used to train the skin-melanoma similar-image search system. The proposed system performs feature extraction by assigning dynamic weights to the low-level features based on the individual characteristics of the query images; optimal weights are obtained by the newly proposed optimized pair-wise comparison (OPWC) approach. The uniqueness of the proposed approach is that it assigns dynamic weights to the features of the query image instead of applying static weights. The approach is supported by the analytic hierarchy process (AHP) and meta-heuristic optimization algorithms such as particle swarm optimization (PSO), JAYA, the genetic algorithm (GA), and gray wolf optimization (GWO). Tested on images of the 23 melanoma classes, the approach achieved high precision and recall. This skin-melanoma image search can thus serve as an expert assistive system to help dermatologists and physicians accurately identify different types of melanoma.
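The AHP component that supports OPWC typically derives weights as the principal eigenvector of a pairwise comparison matrix; a minimal sketch with an illustrative 3x3 judgment matrix over hypothetical feature cues (color, texture, shape), not the paper's actual matrix:

```python
# AHP weighting sketch: feature weights are the normalized principal
# eigenvector of a pairwise comparison matrix. The judgment values
# below are illustrative, not taken from the study.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],     # color vs. (color, texture, shape)
              [1/3, 1.0, 2.0],     # texture vs. ...
              [1/5, 1/2, 1.0]])    # shape vs. ...
vals, vecs = np.linalg.eig(A)
w = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
weights = w / w.sum()              # normalized feature weights
```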
Project description: The global health crisis caused by the rapid spread of coronavirus disease (Covid-19) has gravely endangered healthcare, the economy, and many other aspects of society. The highly infectious and insidious nature of the new coronavirus greatly increases the difficulty of outbreak prevention and control, and early, rapid detection of Covid-19 is an effective way to reduce its spread. However, detecting Covid-19 accurately and quickly in large populations remains a major challenge worldwide. In this study, a CNN-transformer fusion framework is proposed for the automatic classification of pneumonia on chest X-rays. The framework includes two parts: data processing and image classification. The data-processing stage eliminates differences between data from different medical institutions so that they share the same storage format. The image-classification stage uses a multi-branch network with a custom convolution module and a transformer module, comprising feature-extraction, feature-focus, and feature-classification subnetworks. The feature-extraction subnetworks extract shallow image features and exchange information through the convolution and transformer modules; the convolution and transformer modules of the feature-focus subnetworks extract local and global features, respectively, which are then classified by the feature-classification subnetworks. The proposed network decides whether a patient has pneumonia and differentiates between Covid-19 and bacterial pneumonia. Evaluated on the collected benchmark datasets, the network achieves accuracy, precision, recall, and F1 score of 97.09%, 97.16%, 96.93%, and 97.04%, respectively. Compared with methods proposed by other researchers, our network achieved better accuracy, precision, and F1 score, indicating its superiority for Covid-19 detection. With further improvements, we hope this network will provide doctors with an effective tool for diagnosing Covid-19.
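A rough sketch of the hybrid idea, one convolutional branch for local features and one attention block for global context, fused before a three-way head (normal / bacterial pneumonia / Covid-19); layer sizes and wiring are illustrative assumptions, not the paper's architecture:

```python
# Minimal CNN-transformer hybrid sketch for 3-class chest X-ray triage.
# Convolutions capture local features; multi-head attention over the
# resulting token sequence captures global context; both summaries are
# fused for classification. All sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input((224, 224, 1))
x = layers.Conv2D(32, 3, strides=2, activation="relu")(inp)
x = layers.Conv2D(64, 3, strides=2, activation="relu")(x)
tokens = layers.Reshape((-1, 64))(x)                  # feature map -> tokens
attn = layers.MultiHeadAttention(num_heads=4, key_dim=16)(tokens, tokens)
local = layers.GlobalAveragePooling1D()(tokens)       # local (conv) summary
glob = layers.GlobalAveragePooling1D()(attn)          # global (attn) summary
out = layers.Dense(3, activation="softmax")(
    layers.Concatenate()([local, glob]))
model = Model(inp, out)
```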