Project description: The infrared and visible image fusion task aims to generate a single image that preserves complementary features and reduces redundant information from different modalities. Although convolutional neural networks (CNNs) can effectively extract local features and achieve good fusion performance, the limited size of their receptive field constrains their feature extraction ability. Thus, the Transformer architecture has gradually become the mainstream choice for extracting global features. However, current Transformer-based fusion methods neglect detail enhancement, which is important for image fusion and other downstream vision tasks. To this end, a new super feature attention mechanism and a wavelet-guided pooling operation are combined to form a novel fusion network, termed SFPFusion. Specifically, super feature attention establishes long-range dependencies across the image and fully extracts global features. The extracted global features are then processed by wavelet-guided pooling to extract multi-scale base information and to enhance detail features. Thanks to this powerful representation ability, only simple fusion strategies are needed to achieve better fusion performance. The superiority of our method over other state-of-the-art methods is demonstrated in qualitative and quantitative experiments on multiple image fusion benchmarks.
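A minimal sketch of how a wavelet-guided pooling step could look, assuming a single-level Haar decomposition implemented as a fixed depthwise convolution; the function name and filter choice are illustrative assumptions, not the SFPFusion implementation.

```python
# Illustrative sketch only: single-level Haar "pooling" that splits a feature
# map into a low-frequency base band and high-frequency detail bands.
import torch
import torch.nn.functional as F

def haar_pool(x):
    """Decompose x (B, C, H, W) into a base band and three detail bands at half resolution."""
    b, c, h, w = x.shape
    # Fixed 2x2 Haar analysis filters (LL, LH, HL, HH).
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    filters = torch.stack([ll, lh, hl, hh]).unsqueeze(1).to(x)  # (4, 1, 2, 2)
    filters = filters.repeat(c, 1, 1, 1)                        # depthwise layout
    out = F.conv2d(x, filters, stride=2, groups=c)              # (B, 4C, H/2, W/2)
    out = out.view(b, c, 4, h // 2, w // 2)
    base, details = out[:, :, 0], out[:, :, 1:]                 # base + detail bands
    return base, details

x = torch.randn(1, 16, 64, 64)
base, details = haar_pool(x)
print(base.shape, details.shape)  # (1, 16, 32, 32) and (1, 16, 3, 32, 32)
```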
Project description: We present HyTexiLa (Hyperspectral Texture images acquired in Laboratory), a dataset of close-range hyperspectral images of materials spanning the visible and near-infrared spectral ranges. The data is intended to provide high spectral and spatial resolution reflectance images of 112 materials for the study of spatial and spectral textures. In this paper we discuss the calibration of the data and the method used to correct distortions introduced during image acquisition. We provide a spectral analysis based on non-negative matrix factorization to quantify the spectral complexity of the samples, and we extend local binary pattern operators to hyperspectral texture analysis. The results demonstrate that, although the spectral complexity of each texture is generally low, increasing the number of bands permits better texture classification, with the opponent band local binary pattern feature giving the best performance.
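A minimal sketch of the kind of NMF-based spectral-complexity check described above, run on random data in place of a HyTexiLa reflectance cube; the band count, rank range, and variable names are illustrative assumptions.

```python
# Illustrative sketch: fit NMF at several ranks and track reconstruction error
# as a proxy for the spectral complexity of a texture sample.
import numpy as np
from sklearn.decomposition import NMF

h, w, bands = 64, 64, 186              # band count assumed for illustration
cube = np.random.rand(h, w, bands)     # stand-in for a reflectance image
pixels = cube.reshape(-1, bands)       # one spectrum per row

for rank in (1, 2, 4, 8):
    model = NMF(n_components=rank, init="nndsvda", max_iter=300, random_state=0)
    scores = model.fit_transform(pixels)        # per-pixel abundances
    recon = scores @ model.components_          # low-rank reconstruction
    rel_err = np.linalg.norm(pixels - recon) / np.linalg.norm(pixels)
    print(f"rank {rank}: relative reconstruction error {rel_err:.4f}")
```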
Project description: The objective of image fusion is to integrate complementary features from source images to better cater to the needs of human and machine vision. However, existing image fusion algorithms predominantly focus on enhancing the visual appeal of the fused image for human perception, often neglecting its impact on subsequent high-level vision tasks, particularly the processing of semantic information. Moreover, fusion methods that incorporate downstream tasks tend to be overly complex and computationally intensive, which hinders practical application. To address these issues, this paper proposes SIFusion, a lightweight infrared and visible image fusion method based on semantic injection. The method employs a semantic-aware branch to extract semantic feature information and then integrates these features into the fused features through a Semantic Injection Module (SIM) to meet the semantic requirements of high-level vision tasks. Furthermore, to reduce the complexity of the fusion network, the method introduces an Edge Convolution Block (ECB) based on structural reparameterization to enhance the representational capacity of the encoder and decoder. Extensive experimental comparisons demonstrate that the proposed method performs excellently in terms of both visual appeal and high-level semantics, providing satisfactory fusion results for subsequent high-level vision tasks even in challenging scenarios.
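A sketch of the general structural-reparameterization idea behind such an edge convolution block: parallel branches used at training time are folded into a single 3x3 convolution for inference. The 3x3 + 1x1 + identity combination here is a stand-in for the paper's exact branch design.

```python
# Illustrative sketch: merge a multi-branch block (3x3 conv + 1x1 conv + identity)
# into one 3x3 conv and check that both forms give the same output.
import torch
import torch.nn as nn

c = 8
conv3 = nn.Conv2d(c, c, 3, padding=1, bias=True)
conv1 = nn.Conv2d(c, c, 1, bias=True)

def merge(conv3, conv1, channels):
    """Fold the 1x1 branch and the identity branch into the 3x3 kernel."""
    with torch.no_grad():
        w = conv3.weight.detach().clone()
        b = conv3.bias.detach().clone()
        w[:, :, 1, 1] += conv1.weight[:, :, 0, 0]   # 1x1 kernel at the centre
        b += conv1.bias
        for i in range(channels):                    # identity branch
            w[i, i, 1, 1] += 1.0
        merged = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        merged.weight.copy_(w)
        merged.bias.copy_(b)
    return merged

x = torch.randn(1, c, 32, 32)
with torch.no_grad():
    y_train = conv3(x) + conv1(x) + x       # multi-branch (training) form
    y_infer = merge(conv3, conv1, c)(x)     # single-conv (inference) form
print(torch.allclose(y_train, y_infer, atol=1e-4))  # True
```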
Project description: The most direct way to detect electrical switchgear faults is to use infrared thermal imaging for temperature measurement. However, infrared thermal images are usually polluted by noise and suffer from low contrast and blurred edges. To address these problems, this article proposes a dual convolutional neural network (CNN) model based on the nonsubsampled contourlet transform (NSCT). First, the overall structure of the model is made wider by combining two sub-networks; compared with a deeper CNN, the dual CNN improves denoising performance without greatly increasing the computational cost. Second, the model uses the NSCT and its inverse to obtain more texture information and to avoid the gridding effect, achieving a good balance between noise reduction and detail retention. Extensive simulation experiments show that the model can handle both synthetic and real noise, giving it high practical value.
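A minimal sketch of the dual-branch idea: two parallel CNN branches widen the model and their outputs are merged to predict the denoised image. The NSCT / inverse-NSCT steps described above are omitted, and all layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch: a wide dual-branch denoiser with residual learning.
import torch
import torch.nn as nn

class DualBranchDenoiser(nn.Module):
    def __init__(self, channels=1, width=32, depth=4):
        super().__init__()
        def branch():
            layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
            for _ in range(depth - 1):
                layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.branch_a = branch()                   # e.g. image-domain branch
        self.branch_b = branch()                   # e.g. transform-domain branch
        self.fuse = nn.Conv2d(2 * width, channels, 3, padding=1)

    def forward(self, x):
        f = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return x - self.fuse(f)                    # residual learning: predict the noise

noisy = torch.randn(1, 1, 128, 128)
print(DualBranchDenoiser()(noisy).shape)           # torch.Size([1, 1, 128, 128])
```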
Project description: The purpose of infrared and visible image fusion is to obtain an image that contains both infrared targets and visible information. However, some existing infrared and visible image fusion methods prioritize the fusion effect through complex designs while ignoring the influence of attention mechanisms on deep features, resulting in a lack of visible texture information in the fused image. To solve these problems, this article proposes an infrared and visible image fusion method based on dense gradient attention residuals. First, squeeze-and-excitation networks are integrated into the gradient convolutional dense block, and a new gradient attention residual dense block is designed to enhance the network's ability to extract important information. To retain more of the original image information, a feature gradient attention module is introduced to improve the retention of detail information. In the fusion layer, an adaptive weighted energy attention network based on an energy fusion strategy is used to further preserve infrared and visible details. In experimental comparisons on the TNO dataset, our method performs excellently on several evaluation metrics. Specifically, for average gradient (AG), information entropy (EN), spatial frequency (SF), mutual information (MI), and standard deviation (SD), our method reaches 6.90, 7.46, 17.30, 2.62, and 54.99, respectively, improvements of 37.31%, 6.55%, 32.01%, 8.16%, and 10.01% over five other commonly used methods. These results demonstrate the effectiveness and superiority of our method.
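A minimal sketch combining the two ingredients named above, a gradient (Sobel) convolution and squeeze-and-excitation channel attention; the composition and layer sizes are illustrative, not the paper's exact block.

```python
# Illustrative sketch: inject fixed Sobel gradient features into a conv block
# and reweight channels with squeeze-and-excitation attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEGradientBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("sobel", sobel_x.expand(channels, 1, 3, 3).clone())
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.fc = nn.Sequential(                       # squeeze-and-excitation
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        grad = F.conv2d(x, self.sobel, padding=1, groups=x.shape[1])
        feat = self.conv(x) + grad                     # inject gradient detail
        w = self.fc(feat.mean(dim=(2, 3)))             # channel attention weights
        return x + feat * w[:, :, None, None]          # residual connection

x = torch.randn(2, 16, 64, 64)
print(SEGradientBlock(16)(x).shape)                    # torch.Size([2, 16, 64, 64])
```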
Project description: Existing infrared and visible video fusion models fail to dynamically adjust their fusion strategies to differences between videos, often yielding suboptimal or failed results. To address this limitation, we propose an infrared and visible video fusion algorithm that leverages the autonomous and flexible characteristics of multi-agent systems. First, we analyze the functional architecture of agents and the inherent properties of multi-agent systems to construct a multi-agent fusion model and the corresponding fusion agents. Next, we identify regions of interest in each frame of the video sequence, focusing on frames that exhibit significant changes. The multi-agent fusion model then perceives the key distinguishing features between the images to be fused, deploys the appropriate fusion agents, and uses the measured fusion effectiveness to infer and select the fusion algorithms, rules, and parameters, ultimately choosing the optimal fusion strategy. Finally, for a complex fusion process, the multi-agent fusion model performs the fusion task through the collaborative interaction of multiple fusion agents. This approach establishes a multi-layered, dynamically adaptable fusion model, enabling real-time adjustment of the fusion algorithm during infrared and visible video fusion. Experimental results demonstrate that our method outperforms existing approaches in preserving key targets from infrared videos and structural details from visible videos. Evaluation metrics indicate that our fusion results achieve optimal values in 66.7% of cases and sub-optimal or better values in 80.9% of cases, significantly surpassing traditional single fusion methods.
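A toy sketch of the per-frame strategy selection described above: candidate fusion rules are applied to a frame pair, scored with a simple quality proxy, and the best-scoring result is kept. The rules and the metric here are illustrative stand-ins, not the paper's agents or effectiveness measure.

```python
# Illustrative sketch: pick a fusion rule per frame by scoring candidates with
# spatial frequency as a crude proxy for fusion effectiveness.
import numpy as np

def spatial_frequency(img):
    rf = np.diff(img, axis=0)
    cf = np.diff(img, axis=1)
    return np.sqrt((rf ** 2).mean() + (cf ** 2).mean())

FUSION_RULES = {
    "average": lambda ir, vis: 0.5 * ir + 0.5 * vis,
    "maximum": lambda ir, vis: np.maximum(ir, vis),
}

def fuse_frame(ir, vis):
    candidates = {name: rule(ir, vis) for name, rule in FUSION_RULES.items()}
    best = max(candidates, key=lambda n: spatial_frequency(candidates[n]))
    return best, candidates[best]

ir = np.random.rand(240, 320)
vis = np.random.rand(240, 320)
name, fused = fuse_frame(ir, vis)
print(name, fused.shape)
```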
Project description: Visible-Infrared Person Re-identification (VI-ReID) is consistently challenged by significant intra-class variations and cross-modality differences across cameras. The key therefore lies in extracting discriminative modality-shared features. Existing VI-ReID methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) fall short in capturing global features and in controlling computational complexity, respectively. To tackle these challenges, we propose a hybrid network framework called ReMamba. Specifically, we first use a CNN as the backbone network to extract multi-level features. Then, we introduce the Visual State Space (VSS) model, which integrates the local features output by the CNN from lower to higher levels; these local features complement the global information and thereby sharpen the local details of the global features. Considering the potential redundancy and semantic differences between local and global features, we design an adaptive feature aggregation module that automatically filters and effectively aggregates both types of features, together with an auxiliary aggregation loss to optimize the aggregation process. Furthermore, to better constrain cross-modality and intra-modality features, we design a modal consistency identity constraint loss that alleviates cross-modality differences and extracts modality-shared information. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed ReMamba outperforms state-of-the-art VI-ReID methods.
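A minimal sketch of an adaptive feature-aggregation step of the kind described above: a learned gate decides, per channel, how much of the local and global features to keep. This is an illustrative gating scheme under assumed feature shapes; the VSS model and the auxiliary losses are omitted.

```python
# Illustrative sketch: gated aggregation of local and global feature vectors.
import torch
import torch.nn as nn

class AdaptiveAggregation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, local_feat, global_feat):        # both (B, dim)
        g = self.gate(torch.cat([local_feat, global_feat], dim=-1))
        return g * local_feat + (1 - g) * global_feat  # per-channel blend

local_feat = torch.randn(4, 256)
global_feat = torch.randn(4, 256)
print(AdaptiveAggregation(256)(local_feat, global_feat).shape)  # torch.Size([4, 256])
```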
Project description: Visible and thermal images acquired from drones (unoccupied aircraft systems) have substantially improved animal monitoring. Combining complementary information from both image types provides a powerful approach for automating the detection and classification of multiple animal species to augment drone surveys. We compared eight image fusion methods using thermal and visible drone images combined with two supervised deep learning models to evaluate the detection and classification of white-tailed deer (Odocoileus virginianus), domestic cow (Bos taurus), and domestic horse (Equus caballus). We classified visible and thermal images separately and compared the results with those of image fusion. Fused images provided minimal improvement for cows and horses compared to visible images alone, likely because the size, shape, and color of these species made them conspicuous against the background. For white-tailed deer, which were typically cryptic against their backgrounds and often in shadow in visible images, the added information from thermal images improved detection and classification in fusion methods from 15 to 85%. Our results suggest that image fusion is ideal for surveying animals that are inconspicuous against their backgrounds, and our approach requires few image pairs for training compared with typical machine-learning methods. We discuss computational and field considerations for improving drone surveys using our fusion approach.
Project description: Detection of oceanic phenomena in synthetic aperture radar (SAR) images is important in the fields of fishery, military, and oceanography. Traditional detection methods for oceanic phenomena in SAR images rely on handcrafted features and detection thresholds and therefore generalize poorly. Deep-learning-based methods have good generalization ability; however, most deep learning methods currently applied to oceanic phenomena detection detect only one type of phenomenon. To satisfy the requirement for efficient and accurate detection of multiple oceanic phenomena in massive SAR images, this paper proposes a convolutional neural network (CNN) based method for oceanic phenomena detection in SAR images. The method first uses ResNet-50 to extract multilevel features. Second, it uses an atrous spatial pyramid pooling (ASPP) module to extract multiscale features. Finally, it fuses the multilevel and multiscale features to detect oceanic phenomena. SAR images acquired from the Sentinel-1 satellite are used to establish a sample dataset of oceanic phenomena, on which the proposed method achieves 91% accuracy.
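A minimal sketch of an atrous spatial pyramid pooling stage like the one described above: parallel dilated convolutions capture multiple scales and their outputs are concatenated and projected. Channel sizes and dilation rates are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch: ASPP with parallel dilated 3x3 convolutions.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 2048, 32, 32)   # e.g. a ResNet-50 final-stage feature map
print(ASPP(2048, 256)(feat).shape)    # torch.Size([1, 256, 32, 32])
```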
Project description: This article presents a dataset of thermal and visible aerial images of the same flat scene at the Melendez campus of Universidad del Valle, Cali, Colombia. The images were acquired using a UAV equipped with either a thermal or a visible camera. The dataset is useful for testing techniques for the enhancement, registration, and fusion of multi-modal and multi-spectral images. It consists of 30 visible images and their metadata, 80 thermal images and their metadata, and a visible georeferenced orthoimage. The metadata for every image contains the WGS84 coordinates used to geolocate the images. The homography matrices between every image and the orthoimage are also included in the dataset. The images and homographies are compatible with the well-known assessment protocol for detection and description proposed by Mikolajczyk and Schmid [1].
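A minimal sketch of how a homography from such a dataset could be used to register an aerial image to the orthoimage with OpenCV; the stand-in image, the example matrix, and the output size are placeholders, not actual dataset entries.

```python
# Illustrative sketch: warp an aerial image into the orthoimage frame using a
# provided image-to-orthoimage homography.
import cv2
import numpy as np

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-in frame
H = np.array([[1.0, 0.0, 50.0],       # placeholder homography (image -> orthoimage)
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])
ortho_size = (1000, 800)              # (width, height) of the target orthoimage

registered = cv2.warpPerspective(image, H, ortho_size)
print(registered.shape)               # (800, 1000, 3)
```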