Project description: This paper introduces a video dataset for semantic segmentation of road potholes. The dataset contains 619 high-resolution videos captured in January 2023, covering locations in eight villages within the Hulu Sungai Tengah regency of South Kalimantan, Indonesia. The dataset is divided into three main folders, namely train, val, and test, containing 372 videos for training, 124 for validation, and 123 for testing, respectively. Each of these main folders has two subfolders: ``RGB'' for the videos in RGB format and ``mask'' for the ground-truth segmentation. Each video is exactly two seconds long, contains 48 frames, and is provided in MP4 format. The dataset offers remarkable flexibility, accommodating various research needs, from full-video segmentation to frame extraction. It enables researchers to create their own ground-truth annotations and recombine the videos across folders according to their needs. This resource is an asset for researchers, engineers, policymakers, and anyone interested in advancing algorithms for pothole detection and analysis. The dataset allows for benchmarking semantic segmentation algorithms, conducting comparative studies on pothole detection methods, and exploring innovative approaches, offering valuable contributions to the computer vision community.
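A minimal sketch (not the dataset authors' code) of how paired video/mask frames could be read from the layout described above, assuming a root folder and identical file names in the RGB and mask subfolders; the path is hypothetical.

```python
# Sketch: iterate one split of the pothole video dataset, assuming
# <root>/{train,val,test}/{RGB,mask}/<clip>.mp4 with matching clip names.
import os
import cv2

ROOT = "pothole_dataset"   # hypothetical path to the downloaded dataset
SPLIT = "train"

rgb_dir = os.path.join(ROOT, SPLIT, "RGB")
mask_dir = os.path.join(ROOT, SPLIT, "mask")

for name in sorted(os.listdir(rgb_dir)):
    rgb_cap = cv2.VideoCapture(os.path.join(rgb_dir, name))
    mask_cap = cv2.VideoCapture(os.path.join(mask_dir, name))
    frames = []
    while True:
        ok_rgb, rgb = rgb_cap.read()
        ok_mask, mask = mask_cap.read()
        if not (ok_rgb and ok_mask):
            break
        frames.append((rgb, mask))  # paired frame and ground-truth mask
    rgb_cap.release()
    mask_cap.release()
    assert len(frames) == 48, f"{name}: expected 48 frames, got {len(frames)}"
```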
Project description: Purpose/aim: This paper provides a pedagogical example of systematic machine learning optimization for small-dataset image segmentation, emphasizing hyperparameter selection. A simple process is presented for medical physicists to examine hyperparameter optimization, and it is applied to a case study demonstrating the benefit of the method. Materials and methods: An unrestricted public Computed Tomography (CT) dataset with binary organ segmentation was used to develop a multiclass segmentation model. To start the optimization process, a preliminary manual search of hyperparameters was conducted, and from there a grid search identified the most influential result metrics. A total of 658 different models were trained in 2100 h, using 13,160 effective patients. The large quantity of results was analyzed using random forest regression, identifying relative hyperparameter impact. Results: Metric-implied segmentation quality (accuracy 96.8%, precision 95.1%) and visual inspection were found to be mismatched. In this work batch normalization was the most important hyperparameter, but performance varied with the hyperparameters and metrics selected. Targeted grid-search optimization combined with random forest analysis of relative hyperparameter importance was an easily implementable sensitivity-analysis approach. Conclusion: The proposed optimization method gives a systematic and quantitative approach to something intuitively understood: that hyperparameters change model performance. Even the grid-search optimization with random forest analysis presented here can be informative within hardware and data quality/availability limitations, adding confidence to model validity and minimizing decision-making risks. By providing a guided methodology, this work helps medical physicists improve their model optimization, irrespective of the specific challenges posed by their datasets and model designs.
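An illustrative sketch of the analysis style described above: fitting a random forest regressor on (hyperparameter, result metric) pairs from a grid search and reading off relative importances. The CSV file, column names, and metric are hypothetical placeholders, not the study's actual variables.

```python
# Sketch: rank hyperparameter importance from grid-search results with a random forest.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

runs = pd.read_csv("grid_search_results.csv")      # one row per trained model (hypothetical file)
hparams = ["learning_rate", "batch_size", "batch_norm", "dropout", "depth"]

X = pd.get_dummies(runs[hparams])                   # one-hot encode categorical choices
y = runs["dice_score"]                              # any result metric of interest

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
importance = pd.Series(rf.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))      # relative hyperparameter impact
```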
Project description: Smart farming (SF) applications rely on robust and accurate computer vision systems. An important computer vision task in agriculture is semantic segmentation, which aims to classify each pixel of an image and can be used for selective weed removal. State-of-the-art implementations use convolutional neural networks (CNNs) trained on large image datasets. In agriculture, publicly available RGB image datasets are scarce and often lack detailed ground-truth information. In contrast to agriculture, other research areas feature RGB-D datasets that combine color (RGB) with additional distance (D) information. Results from these areas show that including distance as an additional modality can further improve model performance. Therefore, we introduce WE3DS as the first RGB-D image dataset for multi-class plant species semantic segmentation in crop farming. It contains 2568 RGB-D images (color image and distance map) and corresponding hand-annotated ground-truth masks. Images were taken under natural light conditions using an RGB-D sensor consisting of two RGB cameras in a stereo setup. Further, we provide a benchmark for RGB-D semantic segmentation on the WE3DS dataset and compare it with a solely RGB-based model. Our trained models achieve up to 70.7% mean Intersection over Union (mIoU) for discriminating between soil, seven crop species, and ten weed species. Finally, our work confirms the finding that additional distance information improves segmentation quality.
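A hedged sketch of the mIoU metric reported above, computed from integer-labeled prediction and ground-truth masks; the 18-class count (soil plus seven crop and ten weed species) follows the description, while the function itself is a generic implementation rather than the WE3DS benchmark code.

```python
# Sketch: mean Intersection over Union over the classes present in a mask pair.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 18) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```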
Project description: This paper presents a dataset of bird's eye chilies from a single farm for semantic segmentation. The dataset was generated using two cameras, aligned left and right, forming a stereo-vision video capture. By analyzing the disparity between corresponding points in the left and right images, algorithms can calculate the relative distance of objects in the scene. This depth information is useful in various applications, including 3D reconstruction, object tracking, and autonomous navigation. The dataset consists of 1150 left and right compressed images extracted from ten sets of stereo videos taken at ten different locations within the chili farm, all of bird's eye chili plants of the same age. Since the dataset is intended for semantic segmentation, manually annotated ground-truth segmentation images are also provided. The dataset can be used for 2D and 3D semantic segmentation of the bird's eye chili farm. The object classes in this dataset include sky, living things, plantation, flat, construction, nature, and misc.
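A minimal sketch (not the dataset authors' pipeline) of how disparity could be estimated from a rectified left/right pair with OpenCV's semi-global block matching; the file names and matcher parameters are placeholders.

```python
# Sketch: disparity estimation from a stereo pair with OpenCV.
import cv2

left = cv2.imread("left_0001.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
right = cv2.imread("right_0001.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # fixed-point to pixels

# With a calibrated rig, depth = focal_length_px * baseline_m / disparity (where disparity > 0).
```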
Project description: The purpose of the dataset is to provide annotated images for pixel classification tasks with application to powered wheelchair users. As some of the widely available datasets contain only general objects, we introduce this dataset to cover the missing pieces, which can be considered application-specific objects. These objects of interest are not only important for powered wheelchair users but also for indoor navigation and environmental understanding in general. For example, indoor assistive and service robots need to comprehend their surroundings to ease navigation and interaction with objects of different sizes. The proposed dataset is recorded using a camera installed on a powered wheelchair. The camera is installed beneath the joystick so that it has a clear view with no obstruction from the user's body or legs. The powered wheelchair is then driven through the corridors of the indoor environment, and a one-minute video is recorded. The collected video is annotated at the pixel level for semantic segmentation (pixel classification) tasks. Pixels of different objects are annotated using MATLAB software. The dataset contains objects of various sizes (small, medium, and large), which explains the variation in the pixel distribution across the dataset. Deep Convolutional Neural Networks (DCNNs) that perform well on large objects often fail to produce accurate results on small objects, whereas training a DCNN on a dataset with objects of multiple sizes can yield more robust systems. Although the recorded objects are vital for many applications, we have included additional images of different kinds of door handles at different angles, orientations, and illuminations, as they are rare in publicly available datasets. The proposed dataset has 1549 images and covers nine different classes. We used the dataset to train and test a semantic segmentation system that can aid and guide visually impaired users by providing visual cues. The dataset is made publicly available at this link.
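An illustrative sketch of how the per-class pixel distribution discussed above could be tallied over the annotation masks; the mask directory and the integer label-index convention are assumptions, not part of the dataset description.

```python
# Sketch: per-class pixel frequencies over nine classes of annotation masks.
import glob
import numpy as np
from PIL import Image

counts = np.zeros(9, dtype=np.int64)                 # nine classes in the dataset
for path in glob.glob("masks/*.png"):                # hypothetical mask folder
    mask = np.array(Image.open(path))                # assumed: one class index per pixel
    counts += np.bincount(mask.ravel(), minlength=9)[:9]

print(counts / counts.sum())                         # fraction of pixels per class
```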
Project description: Optimizations in logistics require the recognition and analysis of human activities. The potential of sensor-based human activity recognition (HAR) in logistics is not yet well explored. Despite a significant increase in HAR datasets in the past twenty years, no available dataset depicts activities in logistics. This contribution presents the first freely accessible logistics dataset. In the 'Innovationlab Hybrid Services in Logistics' at TU Dortmund University, two picking scenarios and one packing scenario were recreated. Fourteen subjects were recorded individually while performing warehousing activities using optical marker-based Motion Capture (OMoCap), inertial measurement units (IMUs), and an RGB camera. A total of 758 min of recordings were labeled by 12 annotators in 474 person-hours. All the given data have been labeled and categorized into 8 activity classes and 19 binary coarse-semantic descriptions, also called attributes. The dataset is deployed for solving HAR using deep networks.
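A hedged sketch of one common preprocessing step for this kind of continuously recorded sensor data: cutting a multichannel recording into fixed-length sliding windows paired with frame-wise labels. The window size and stride are placeholders, not values from the dataset.

```python
# Sketch: sliding-window segmentation of a continuous multichannel HAR recording.
import numpy as np

def sliding_windows(signal: np.ndarray, labels: np.ndarray,
                    window: int = 100, stride: int = 50):
    """signal: (T, channels) recording; labels: (T,) activity class per frame."""
    for start in range(0, len(signal) - window + 1, stride):
        yield signal[start:start + window], labels[start:start + window]
```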
Project description: Objective: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. Methods: In this paper, we address two major problems for surgical data analysis: first, the lack of uniform, shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: the hidden Markov model (HMM), the sparse HMM, the Markov semi-Markov conditional random field, and the skip-chain conditional random field; and two feature-based approaches that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Results: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Conclusion: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. Significance: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
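A minimal sketch of the leave-one-user-out protocol mentioned above, using scikit-learn's LeaveOneGroupOut with the operator identity as the group. The feature files and the SVM classifier are placeholders for illustration, not the techniques evaluated in the paper.

```python
# Sketch: leave-one-user-out cross-validation over per-segment features.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

X = np.load("segment_features.npy")      # hypothetical per-segment feature vectors
y = np.load("gesture_labels.npy")        # gesture class per segment
groups = np.load("surgeon_ids.npy")      # which operator produced each segment

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = SVC().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print("mean accuracy:", np.mean(scores))
```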
Project description: The development of compact and energy-efficient wearable sensors has led to an increase in the availability of biosignals. To effectively and efficiently analyze continuously recorded, multidimensional time series at scale, meaningful unsupervised data segmentation is an auspicious target. A common way to achieve this is to identify change-points within the time series as the segmentation basis. However, traditional change-point detection algorithms often come with drawbacks that limit their real-world applicability. Notably, they generally rely on the complete time series being available and thus cannot be used for real-time applications. Another common limitation is that they handle the segmentation of multidimensional time series poorly, or not at all. Consequently, the main contribution of this work is a novel unsupervised segmentation algorithm for multidimensional time series named Latent Space Unsupervised Semantic Segmentation (LS-USS), designed to work easily with both online and batch data. LS-USS addresses the challenge of multivariate change-point detection by utilizing an autoencoder to learn a 1-dimensional latent space on which change-point detection is then performed. To address the challenge of real-time time series segmentation, this work introduces the Local Threshold Extraction Algorithm (LTEA) and a "batch collapse" algorithm. The "batch collapse" algorithm enables LS-USS to process streaming data by dividing it into manageable batches, while LTEA is employed to detect change-points whenever the metric computed by LS-USS exceeds a predefined threshold. Used in combination, these algorithms allow our approach to accurately segment time series data in real time, making it well suited for applications where timely detection of changes is critical. When evaluated on a variety of real-world datasets, LS-USS systematically achieves performance equal to or better than the other state-of-the-art change-point detection algorithms it is compared to, in both offline and real-time settings.
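A conceptual sketch of the core idea described above, not the authors' implementation: an autoencoder compresses sliding windows of a multidimensional series into a 1-dimensional latent value, and change-points are flagged where that latent trace jumps past a threshold. A fixed global threshold stands in for LTEA, and the input file is hypothetical.

```python
# Sketch: 1-D latent space from an autoencoder, then threshold-based change-point flags.
import numpy as np
import torch
from torch import nn

def make_windows(x: np.ndarray, w: int = 50) -> torch.Tensor:
    # x: (T, channels) -> (T - w + 1, w * channels) flattened sliding windows
    wins = np.stack([x[i:i + w].ravel() for i in range(len(x) - w + 1)])
    return torch.tensor(wins, dtype=torch.float32)

def fit_autoencoder(wins: torch.Tensor, epochs: int = 200) -> torch.Tensor:
    enc = nn.Sequential(nn.Linear(wins.shape[1], 32), nn.ReLU(), nn.Linear(32, 1))
    dec = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, wins.shape[1]))
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(wins)), wins)  # reconstruction loss
        loss.backward()
        opt.step()
    return enc(wins).detach().squeeze(1)                     # 1-D latent trace

latent = fit_autoencoder(make_windows(np.load("signal.npy")))  # hypothetical recording
jumps = latent.diff().abs()
change_points = (jumps > jumps.mean() + 3 * jumps.std()).nonzero().squeeze(1)
```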
Project description: Floods are among the most destructive extreme events that exist, being the main cause of people affected by natural disasters. In the near future, flood intensity and frequency are projected to increase. In this context, automatic and accurate satellite-derived flood maps are key for fast emergency response and damage assessment. However, current approaches for operational flood mapping present limitations due to cloud coverage in acquired satellite images, the accuracy of flood detection, and the generalization of methods across different geographies. In this work, a machine learning framework for operational flood mapping from optical satellite images addressing these problems is presented. It is based on a clouds-aware segmentation model trained on an extended version of the WorldFloods dataset. The model produces accurate and fast water segmentation masks even in areas covered by semitransparent clouds, increasing the coverage for emergency response scenarios. The proposed approach can be applied to both Sentinel-2 and Landsat 8/9 data, which enables a much higher revisit of the damaged region, also key for operational purposes. Detection accuracy and generalization of the proposed model are carefully evaluated on a novel global dataset composed of manually labeled flood maps. We provide evidence of better performance than current operational methods based on thresholding spectral indices. Moreover, we demonstrate the applicability of our pipeline to map recent large flood events that occurred in Pakistan, between June and September 2022, and in Australia, between February and April 2022. Finally, the high-resolution (10-30 m) flood extent maps are intersected with other high-resolution layers of cropland, building delineations, and population density. Using this workflow, we estimated that approximately 10 million people were affected and 700k buildings and 25,000 km² of cropland were flooded in the 2022 Pakistan floods.
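An illustrative sketch of the kind of spectral-index thresholding baseline the proposed model is compared against: the Normalized Difference Water Index (NDWI) computed from Sentinel-2 green (B3) and near-infrared (B8) reflectance, with a commonly used threshold of zero. The band arrays and threshold value are placeholders, not the operational method's exact settings.

```python
# Sketch: NDWI-based water mask as a simple spectral-index thresholding baseline.
import numpy as np

def ndwi_water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    ndwi = (green - nir) / (green + nir + 1e-6)   # small epsilon avoids division by zero
    return ndwi > threshold                        # True where the pixel is classified as water
```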
Project description: Angiogenesis is the development of new blood vessels from pre-existing ones. It is a complex, multifaceted process that is essential for the adequate functioning of human organisms. The investigation of angiogenesis is conducted using various methods, one of the most popular and most serviceable of which in vitro is the short-term culture of endothelial cells on Matrigel. However, a significant disadvantage of this method is the manual analysis of a large number of microphotographs. In this regard, it is necessary to develop a technique for automating the annotation of images of capillary-like structures. Despite the increasing use of deep learning in biomedical image analysis, to the best of our knowledge there has not yet been a study applying this method to angiogenesis images. This article demonstrates the first tool based on a convolutional Unet++ encoder-decoder architecture for the semantic segmentation of in vitro angiogenesis simulation images, followed by postprocessing of the resulting masks for data analysis by experts. The first annotated dataset in this field, AngioCells, is also being made publicly available. To create this dataset, participants were recruited into a markup group, an annotation protocol was developed, and an interparticipant agreement study was carried out.
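A hedged sketch of a Unet++ encoder-decoder for binary segmentation, built here with the segmentation_models_pytorch library; this is not the AngioCells authors' code, and the encoder backbone and input size are assumptions.

```python
# Sketch: Unet++ model for binary (structure vs. background) segmentation.
import torch
import segmentation_models_pytorch as smp

model = smp.UnetPlusPlus(
    encoder_name="resnet34",      # assumed backbone
    encoder_weights="imagenet",
    in_channels=3,                # RGB microphotographs
    classes=1,                    # binary mask: capillary-like structures vs. background
)

x = torch.randn(1, 3, 512, 512)   # dummy image batch
with torch.no_grad():
    logits = model(x)             # (1, 1, 512, 512) raw scores before sigmoid
```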