Abstract
Multispectral images are increasingly used for pedestrian detection. Preliminary fusion strategies can fail to exploit informative features from cross-spectral images or, worse, may introduce additional interference. In this paper, we propose an attention-based multi-layer fusion network built on a triple-stream deep convolutional neural network architecture for multispectral pedestrian detection. The effectiveness of multi-layer fusion is examined and verified in this work. Furthermore, a channel-wise attention module (CAM) and a spatial-wise attention module (SAM) are developed and incorporated into the network to adjust the weights of multispectral features more finely along the channel and spatial dimensions, respectively. Channel-wise attention is trained with self-supervision, while spatial-wise attention is trained with external supervision because we recast its learning process as saliency detection. The two attention-based weighting mechanisms are evaluated separately and then sequentially. Experimental results on the KAIST dataset show that the proposed multi-layer cross-spectral fusion R-CNN (CS-RCNN), with spatial-wise weighting applied alone, achieves state-of-the-art performance on all-day detection while outperforming the compared methods at nighttime.
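As a rough illustration of what channel-wise and spatial-wise weighting of fused multispectral features can look like, the following NumPy sketch applies each weighting alone and then sequentially. It is not the paper's CAM or SAM: the squeeze-and-excitation style bottleneck, the 1x1 convolution used for the spatial map, the concatenation-based fusion of two streams, the random weights, and all function names are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Hypothetical channel weighting: feat is (C, H, W), result is (C,)."""
    squeezed = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck -> (C // r,)
    return sigmoid(w2 @ hidden)              # one weight per channel

def spatial_attention(feat, conv_w):
    """Hypothetical spatial weighting: feat is (C, H, W), result is (H, W)."""
    # A 1x1 convolution is a weighted sum over channels at every pixel.
    return sigmoid(np.tensordot(conv_w, feat, axes=(0, 0)))

# Toy feature maps standing in for the color and thermal streams (assumed shapes).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 5, 2
color = rng.standard_normal((C, H, W))
thermal = rng.standard_normal((C, H, W))

# Assumed fusion: concatenate the two streams along the channel axis.
fused = np.concatenate([color, thermal], axis=0)   # (2C, H, W)

# Random parameters in place of learned ones.
w1 = rng.standard_normal((2 * C // r, 2 * C))
w2 = rng.standard_normal((2 * C, 2 * C // r))
conv_w = rng.standard_normal(2 * C)

ch_w = channel_attention(fused, w1, w2)            # (2C,)
sp_w = spatial_attention(fused, conv_w)            # (H, W)

channel_weighted = fused * ch_w[:, None, None]     # channel weighting alone
spatial_weighted = fused * sp_w[None, :, :]        # spatial weighting alone
# Sequential: channel weighting first, then a spatial map computed on its output.
sequential = channel_weighted * spatial_attention(channel_weighted, conv_w)[None, :, :]
```

In this sketch the spatial map is computed from the features themselves; in the setting the abstract describes, the spatial-wise attention would instead be trained against an external saliency target.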
Cite
Zhang, Y., Yin, Z., Nie, L., & Huang, S. (2020). Attention based multi-layer fusion of multispectral images for pedestrian detection. IEEE Access, 8, 165071–165084. https://doi.org/10.1109/ACCESS.2020.3022623