Project description:
Purpose: Semantic segmentation is a fundamental component of deep learning applications in surgery. Traditionally, segmentation in vision tasks has been performed with convolutional neural networks (CNNs), but the transformer architecture has recently been introduced and widely investigated. We aimed to evaluate the performance of deep learning models for segmentation in robot-assisted radical prostatectomy (RARP) and to identify which architecture is superior for segmentation in robotic surgery.
Materials and methods: Intraoperative images acquired during RARP were collected, and the dataset was randomly split into training and validation data. Segmentation of the surgical instruments, bladder, prostate, vas, and seminal vesicle was performed using three CNN models (DeepLabv3, MANet, and U-Net++) and three transformer models (SegFormer, BEiT, and DPT), and their performances were analyzed.
Results: Overall segmentation performance during RARP varied across model architectures. Among the CNN models, DeepLabv3 achieved a mean Dice score of 0.938, MANet 0.944, and U-Net++ 0.930. Among the transformer models, SegFormer attained a mean Dice score of 0.919, BEiT 0.916, and DPT 0.940. The CNN models outperformed the transformer models in segmenting the prostate, vas, and seminal vesicle.
Conclusions: Deep learning models provided accurate segmentation of the surgical instruments and anatomical structures observed during RARP. Both CNN and transformer models produced reliable predictions in the segmentation task; however, CNN models may be more suitable than transformer models for organ segmentation and may be more applicable in unusual cases. Further research with larger datasets is needed.
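The per-structure comparison above rests on the mean Dice score. As an illustration only (not code from the study), the sketch below shows one common way to compute per-class and mean Dice scores from predicted and ground-truth label maps with NumPy; the class IDs, mask shapes, and random data are assumptions.

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, class_id: int, eps: float = 1e-7) -> float:
    """Dice coefficient for one class between a predicted and a ground-truth label map."""
    pred = (pred_mask == class_id)
    gt = (gt_mask == class_id)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def mean_dice(pred_mask: np.ndarray, gt_mask: np.ndarray, class_ids) -> float:
    """Average Dice over the annotated classes (e.g., instruments, bladder, prostate, vas, seminal vesicle)."""
    return float(np.mean([dice_score(pred_mask, gt_mask, c) for c in class_ids]))

# Hypothetical usage: label maps with 0 = background and 1..5 = the five annotated structures.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.integers(0, 6, size=(512, 512))
    gt = rng.integers(0, 6, size=(512, 512))
    print(mean_dice(pred, gt, class_ids=[1, 2, 3, 4, 5]))
```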
Project description:
Background/aims: Previous artificial intelligence (AI) models for segmenting gastric intestinal metaplasia (GIM) areas have failed to be deployed in real-time endoscopy because of their slow inference speeds. Here, we propose a new GIM segmentation AI model with an inference speed faster than 25 frames per second that maintains a high level of accuracy.
Methods: Investigators from Chulalongkorn University obtained 802 histologically proven GIM images for AI model training. Four strategies were proposed to improve model accuracy. First, transfer learning from public colon datasets was employed. Second, an image preprocessing technique, contrast-limited adaptive histogram equalization (CLAHE), was applied to produce clearer GIM areas. Third, data augmentation was applied to make the model more robust. Lastly, the bilateral segmentation network model was applied to segment GIM areas in real time. The results were analyzed using several validity metrics.
Results: In the internal test, our AI model achieved an inference speed of 31.53 frames per second. For GIM detection, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were 93%, 80%, 82%, 92%, and 87%, respectively, and the mean intersection over union for GIM segmentation was 57%.
Conclusion: The bilateral segmentation network combined with transfer learning, CLAHE, and data augmentation can provide high sensitivity and good accuracy for GIM detection and segmentation.
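For the CLAHE preprocessing step, a minimal sketch of the kind of enhancement described is shown below, using OpenCV's createCLAHE on the luminance channel of an endoscopic frame; the clip limit and tile-grid size are assumptions, not the study's settings.

```python
import cv2
import numpy as np

def clahe_preprocess(frame_bgr: np.ndarray, clip_limit: float = 2.0, tile_grid=(8, 8)) -> np.ndarray:
    """Apply contrast-limited adaptive histogram equalization (CLAHE) to the
    luminance channel of a BGR endoscopic frame, leaving colour information untouched."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

# Hypothetical usage on a single frame before it is passed to the segmentation network.
if __name__ == "__main__":
    frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
    enhanced = clahe_preprocess(frame)
    print(enhanced.shape, enhanced.dtype)
```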
Project description: Urban area mapping is an important application of remote sensing that aims at both estimating land cover and detecting land-cover change under urban areas. A major challenge in analyzing Synthetic Aperture Radar (SAR) remote sensing data is the strong similarity between highly vegetated urban areas and oriented urban targets on the one hand and actual vegetation on the other, which leads to misclassification of urban areas as forest cover. The present work is a precursor study for the dual-frequency L- and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), were implemented alongside the deep learning model DeepLabv3+ for semantic segmentation of Polarimetric SAR (PolSAR) data. A common perception is that a large dataset is required for the successful implementation of any deep learning model, but in SAR-based remote sensing a major issue is the unavailability of a large benchmark labeled dataset for training deep learning algorithms from scratch. In the current work, it is shown that the pre-trained deep learning model DeepLabv3+, used with transfer learning, outperforms the machine learning algorithms on the land use and land cover (LULC) classification task even with a small dataset. The highest pixel accuracy of 87.78% and an overall pixel accuracy of 85.65% were achieved with DeepLabv3+; among the machine learning algorithms, Random Forest performed best with an overall pixel accuracy of 77.91%, while SVM and KNN trailed with overall accuracies of 77.01% and 76.47%, respectively. The highest precision of 0.9228 was recorded for the urban class in the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF gave comparable results with precisions of 0.8977 and 0.8958, respectively.
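A minimal sketch of the transfer-learning idea, fine-tuning a pre-trained segmentation network on a small labeled set, is shown below. It uses torchvision's DeepLabv3 (not the exact DeepLabv3+ implementation used in the study), a hypothetical number of LULC classes, and dummy tensors standing in for PolSAR feature composites.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

NUM_CLASSES = 4  # hypothetical LULC classes, e.g. urban, vegetation, water, bare soil

# Load a pre-trained model and replace its head for the small PolSAR dataset.
model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

# Freeze the backbone so only the new head is trained (typical with a small labelled set).
for param in model.backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy 3-channel composite of PolSAR features.
images = torch.randn(2, 3, 256, 256)
targets = torch.randint(0, NUM_CLASSES, (2, 256, 256))
model.train()
logits = model(images)["out"]
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
print(float(loss))
```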
Project description: Semantic segmentation of cityscapes via deep learning is an essential and game-changing research topic that offers a more nuanced comprehension of urban landscapes. Deep learning techniques tackle urban complexity and diversity, unlocking a broad range of applications, including urban planning, transportation management, autonomous driving, and smart city efforts. Through rich context and insights, semantic segmentation helps decision-makers and stakeholders make educated decisions for sustainable and effective urban development. This study presents an in-depth exploration of cityscape image segmentation using the U-Net deep learning model. The proposed U-Net architecture comprises an encoder and a decoder. The encoder uses convolutional layers and downsampling to extract hierarchical information from input images; each downsampling step reduces spatial dimensions and increases feature depth, aiding context acquisition. Batch normalization and dropout layers stabilize the model and prevent overfitting during encoding. The decoder reconstructs higher-resolution feature maps using "UpSampling2D" layers. Through extensive experimentation and evaluation on the Cityscapes dataset, this study demonstrates the effectiveness of the U-Net model in achieving state-of-the-art results in image segmentation. The results clearly show that the proposed model achieves higher accuracy, mean IoU, and mean Dice than existing models.
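A minimal Keras sketch of the encoder-decoder design described above (convolutions plus downsampling in the encoder, batch normalization and dropout for regularization, UpSampling2D in the decoder, skip connections between the two paths) is given below; the depth, filter counts, input size, and 19-class output are illustrative assumptions, not the study's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters, dropout=0.1):
    """Two 3x3 convolutions with batch normalization and dropout, as in the described encoder."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return layers.Dropout(dropout)(x)

def build_unet(input_shape=(256, 512, 3), num_classes=19):
    """Minimal U-Net: downsampling encoder, UpSampling2D decoder, skip connections."""
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for filters in (32, 64, 128):                               # encoder
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 256)                                      # bottleneck
    for filters, skip in zip((128, 64, 32), reversed(skips)):   # decoder
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```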
Project description:
Background: Manual objective assessment of skill and errors in minimally invasive surgery has been validated through correlation with surgical expertise and patient outcomes. However, assessment and error annotation can be subjective and are time-consuming processes, often precluding their use. Recent years have seen the development of artificial intelligence models that work towards automating the process to allow error reduction and truly objective assessment. This study aimed to validate surgical skill ratings and error annotations in suturing gestures to inform the development and evaluation of AI models.
Methods: The SAR-RARP50 open dataset was blindly and independently annotated at the gesture level for suturing in Robotic-Assisted Radical Prostatectomy (RARP). Manual objective assessment tools and an error annotation methodology, Objective Clinical Human Reliability Analysis (OCHRA), were used as ground truth to train and test vision-based deep learning methods that estimate skill and errors. Analysis included descriptive statistics plus tool validity and reliability.
Results: Fifty-four RARP videos (266 min) were analysed. Strong to excellent inter-rater reliability (range r = 0.70-0.89, p < 0.001) and a very strong correlation (r = 0.92, p < 0.001) between the objective assessment tools were demonstrated. Skill estimation for OSATS and M-GEARS had Spearman's correlation coefficients of 0.37 and 0.36, respectively, with normalised mean absolute errors representing prediction errors of 17.92% (inverted "accuracy" 82.08%) and 20.6% (inverted "accuracy" 79.4%), respectively. The best-performing models in error prediction achieved a mean absolute precision of 37.14%, an area under the curve of 65.10%, and a Macro-F1 of 58.97%.
Conclusions: This is the first study to employ a detailed error detection methodology and deep learning models on real robotic surgical video. This benchmark evaluation of AI models sets a foundation and a promising approach for future advancements in automated technical skill assessment.
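As an illustration of the skill-estimation evaluation described above, the sketch below computes a Spearman correlation and a normalized mean absolute error (with the corresponding inverted "accuracy") from predicted versus ground-truth ratings; normalizing by the observed score range is an assumption about the study's normalization, and the ratings are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def skill_estimation_metrics(y_true, y_pred):
    """Spearman correlation and normalised mean absolute error between predicted and
    ground-truth skill ratings; inverted 'accuracy' = 100% minus the normalised MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rho, p_value = spearmanr(y_true, y_pred)
    score_range = y_true.max() - y_true.min()          # assumption: normalise by observed range
    nmae = np.mean(np.abs(y_true - y_pred)) / score_range
    return {"spearman_rho": rho, "p_value": p_value,
            "nmae_percent": 100 * nmae, "inverted_accuracy": 100 * (1 - nmae)}

# Hypothetical OSATS-style ratings for a handful of suturing clips.
print(skill_estimation_metrics([12, 18, 25, 9, 22], [14, 17, 23, 11, 20]))
```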
Project description:
Background and objectives: Intradialytic hypotension has high clinical significance. However, predicting it using conventional statistical models may be difficult because several factors have interactive and complex effects on the risk. Herein, we applied a deep learning model (recurrent neural network) to predict the risk of intradialytic hypotension using a timestamp-bearing dataset.
Design, setting, participants, & measurements: We obtained 261,647 hemodialysis sessions with 1,600,531 independent timestamps (i.e., time-varying vital signs) and randomly divided them into training (70%), validation (5%), calibration (5%), and testing (20%) sets. Intradialytic hypotension was defined as a nadir systolic BP <90 mm Hg (termed intradialytic hypotension 1) or as a decrease in systolic BP ≥20 mm Hg and/or a decrease in mean arterial pressure ≥10 mm Hg relative to the initial BPs (termed intradialytic hypotension 2) or to the prediction-time BPs (termed intradialytic hypotension 3) occurring within 1 hour. The areas under the receiver operating characteristic curves, the areas under the precision-recall curves, and the F1 scores obtained using the recurrent neural network model were compared with those obtained using multilayer perceptron, Light Gradient Boosting Machine, and logistic regression models.
Results: The recurrent neural network model for predicting intradialytic hypotension 1 achieved an area under the receiver operating characteristic curve of 0.94 (95% confidence interval, 0.94 to 0.94), which was higher than those obtained using the other models (P<0.001). The recurrent neural network models for predicting intradialytic hypotension 2 and intradialytic hypotension 3 achieved areas under the receiver operating characteristic curves of 0.87 (interquartile range, 0.87-0.87) and 0.79 (interquartile range, 0.79-0.79), respectively, which were also higher than those obtained using the other models (P≤0.001). The area under the precision-recall curve and the F1 score were higher with the recurrent neural network model than with the other models. The recurrent neural network models for intradialytic hypotension were highly calibrated.
Conclusions: Our deep learning model can be used to predict the real-time risk of intradialytic hypotension.
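A minimal sketch of a recurrent model of the kind described, emitting a hypotension risk at each intradialytic timestamp, is shown below in PyTorch; the GRU cell, hidden size, and feature count are assumptions, since the abstract specifies only a recurrent neural network over time-varying vital signs.

```python
import torch
import torch.nn as nn

class IDHRiskRNN(nn.Module):
    """GRU over a sequence of intradialytic timestamps (vital signs and session features),
    emitting the probability of hypotension within the next hour at each timestamp."""
    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                                 # x: (batch, timestamps, features)
        h, _ = self.gru(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)    # (batch, timestamps)

# Hypothetical batch: 8 sessions, 20 timestamps, 12 features (SBP, DBP, HR, ultrafiltration, ...).
model = IDHRiskRNN(n_features=12)
x = torch.randn(8, 20, 12)
risk = model(x)
print(risk.shape)  # torch.Size([8, 20])
```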
Project description: Convolutional neural network (CNN) models obtain state-of-the-art performance on image classification, localization, and segmentation tasks. Limitations in computer hardware, most notably memory size in deep learning accelerator cards, prevent relatively large images, such as those from medical and satellite imaging, from being processed as a whole at their original resolution. A fully convolutional topology, such as U-Net, is typically trained on down-sampled images and inferred on images of their original size and resolution by dividing the larger image into smaller (typically overlapping) tiles, making predictions on these tiles, and stitching them back together as the prediction for the whole image. In this study, we show that this tiling technique, combined with the translationally invariant nature of CNNs, causes small but relevant differences during inference that can be detrimental to model performance. Here we quantify these variations in both medical (i.e., BraTS) and non-medical (i.e., satellite) images and show that training a 2D U-Net model on the whole image substantially improves overall model performance. Finally, we compare 2D and 3D semantic segmentation models to show that providing CNN models with a wider context of the image in all three dimensions leads to more accurate and consistent predictions. Our results suggest that tiling the input to CNN models, while perhaps necessary to overcome memory limitations in computer hardware, may lead to undesirable and unpredictable errors in the model's output that can only be adequately mitigated by increasing the input of the model to the largest possible tile size.
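For readers unfamiliar with tiled inference, the sketch below shows a generic overlapping-tile pipeline: overlapping patches are passed through a 2D model, logits are averaged where tiles overlap, and the stitched class map is returned. This is an illustration of the technique being critiqued, not the study's code; the tile size, overlap, and model interface are assumptions.

```python
import numpy as np
import torch

def _starts(length: int, tile: int, stride: int):
    """Tile start offsets covering [0, length), with the last tile flush to the border."""
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if length > tile and starts[-1] != length - tile:
        starts.append(length - tile)
    return starts

def tiled_inference(model, image: np.ndarray, tile: int = 256, overlap: int = 32,
                    num_classes: int = 2, device: str = "cpu") -> np.ndarray:
    """Run a 2D segmentation model over overlapping tiles of a (C, H, W) image,
    average logits where tiles overlap, and return the stitched (H, W) class map."""
    h, w = image.shape[-2:]
    stride = tile - overlap
    logits = np.zeros((num_classes, h, w), dtype=np.float32)
    counts = np.zeros((1, h, w), dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for top in _starts(h, tile, stride):
            for left in _starts(w, tile, stride):
                patch = image[..., top:top + tile, left:left + tile]
                x = torch.as_tensor(patch, dtype=torch.float32, device=device).unsqueeze(0)
                out = model(x).squeeze(0).cpu().numpy()  # assumed output: (num_classes, tile, tile)
                logits[:, top:top + tile, left:left + tile] += out
                counts[:, top:top + tile, left:left + tile] += 1.0
    return np.argmax(logits / counts, axis=0)

# Hypothetical usage with a toy per-pixel "model" standing in for a trained U-Net.
if __name__ == "__main__":
    toy = torch.nn.Conv2d(3, 2, kernel_size=1)
    big_image = np.random.rand(3, 600, 800).astype(np.float32)
    print(tiled_inference(toy, big_image, tile=256, overlap=32).shape)  # (600, 800)
```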
Project description:
Background/objectives: In surgical medicine, the planning and execution of liver resection procedures present formidable challenges, primarily attributable to the intricate and highly individualized nature of liver vascular anatomy. In the current surgical milieu, intraoperative ultrasonography (IOUS) has become indispensable; however, the interpretability of traditional 2D ultrasound imaging is hindered by noise and speckle artifacts, and accurate identification of critical structures for preservation during hepatectomy requires advanced surgical skills.
Methods: An AI-based model that can detect and recognize vessels, including the inferior vena cava (IVC); the right (RHV), middle (MHV), and left (LHV) hepatic veins; the portal vein (PV); and its major first- and second-order branches, the left portal vein (LPV), right portal vein (RPV), and right anterior (RAPV) and posterior (RPPV) portal veins, for real-time IOUS navigation can be of immense value in liver surgery. This research aims to advance the capabilities of IOUS-guided interventions by applying an innovative AI-based approach, the "2D-weighted U-Net model", to the segmentation of multiple blood vessels in real-time IOUS video frames.
Results: Our proposed deep learning (DL) model achieved mean Dice scores of 0.92 for the IVC, 0.90 for the RHV, 0.89 for the MHV, 0.86 for the LHV, 0.95 for the PV, 0.93 for the LPV, 0.84 for the RPV, 0.85 for the RAPV, and 0.96 for the RPPV.
Conclusion: In future work, this research will be extended to real-time multi-label segmentation of the liver's extended vasculature, followed by translation of our model into the surgical suite.
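The abstract does not detail the "2D-weighted U-Net" itself; purely as an illustration of per-class weighting in multi-vessel segmentation, the sketch below implements a class-weighted soft Dice loss over nine vessel channels in PyTorch. The weighting scheme, channel layout, and uniform weights are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class WeightedSoftDiceLoss(nn.Module):
    """Per-class weighted soft Dice loss for multi-label vessel masks (one channel
    per vessel: IVC, RHV, MHV, LHV, PV, LPV, RPV, RAPV, RPPV)."""
    def __init__(self, class_weights, eps: float = 1e-6):
        super().__init__()
        self.register_buffer("w", torch.as_tensor(class_weights, dtype=torch.float32))
        self.eps = eps

    def forward(self, logits, targets):
        # logits, targets: (batch, n_vessels, H, W); targets are binary masks.
        probs = torch.sigmoid(logits)
        dims = (0, 2, 3)
        intersection = (probs * targets).sum(dims)
        denom = probs.sum(dims) + targets.sum(dims)
        dice = (2 * intersection + self.eps) / (denom + self.eps)
        return ((1 - dice) * self.w).sum() / self.w.sum()

# Hypothetical usage with nine vessel channels and uniform weights.
loss_fn = WeightedSoftDiceLoss(class_weights=[1.0] * 9)
logits = torch.randn(2, 9, 128, 128)
targets = (torch.rand(2, 9, 128, 128) > 0.5).float()
print(float(loss_fn(logits, targets)))
```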