Attention based multi-layer fusion of multispectral images for pedestrian detection


Abstract

Multispectral images are increasingly used for pedestrian detection. Simple fusion strategies can fail to exploit informative features from cross-spectral images or, worse, may introduce additional interference. In this paper, we propose an attention-based multi-layer fusion network within a triple-stream deep convolutional neural network architecture for multispectral pedestrian detection. The effectiveness of multi-layer fusion is examined and verified in this work. Furthermore, a channel-wise attention module (CAM) and a spatial-wise attention module (SAM) are developed and incorporated into the network to adjust the weights of multispectral features more finely along the channel and spatial dimensions, respectively. Channel-wise attention is trained with self-supervision, whereas spatial-wise attention is trained with external supervision because we recast its learning process as saliency detection. Both attention-based weighting mechanisms are evaluated separately and then in sequence. Experimental results on the KAIST dataset show that the proposed multi-layer cross-spectral fusion R-CNN (CS-RCNN), with spatial-wise weighting applied alone, achieves state-of-the-art performance on all-day detection and outperforms the compared methods at nighttime.
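The abstract does not give the equations of CAM or SAM, so the following is only a minimal pure-Python sketch of the general idea of reweighting fused RGB and thermal features first per channel and then per spatial location. Everything in it is an assumption for illustration: fusion by channel concatenation, a squeeze-and-excitation-style gate (global average pooling, a hypothetical per-channel affine map `w`, `b`, then a sigmoid), and a given H x W `weight_map` standing in for the saliency map that the paper's SAM learns to predict. It is not the CS-RCNN implementation.

```python
import math
from typing import List

Map = List[List[float]]   # one H x W channel
Feature = List[Map]       # C channels, each H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concat_streams(rgb: Feature, thermal: Feature) -> Feature:
    """Assumed fusion: stack the two modality streams along the channel axis."""
    return rgb + thermal


def channel_gates(feat: Feature, w: List[float], b: List[float]) -> List[float]:
    """One gate in (0, 1) per channel: global average pooling, then a
    per-channel affine map and a sigmoid. w and b are hypothetical
    stand-ins for learned parameters."""
    gates = []
    for ch, wc, bc in zip(feat, w, b):
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(sigmoid(wc * mean + bc))
    return gates


def apply_channel(feat: Feature, gates: List[float]) -> Feature:
    """Rescale every value in channel c by gates[c]."""
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


def apply_spatial(feat: Feature, weight_map: Map) -> Feature:
    """Multiply every channel by the same H x W weight map (values in [0, 1]),
    here a placeholder for a predicted pedestrian-saliency map."""
    return [[[v * s for v, s in zip(row, srow)] for row, srow in zip(ch, weight_map)]
            for ch in feat]


def fuse(rgb: Feature, thermal: Feature, w: List[float], b: List[float],
         weight_map: Map) -> Feature:
    """Channel-wise weighting followed by spatial-wise weighting."""
    x = concat_streams(rgb, thermal)
    x = apply_channel(x, channel_gates(x, w, b))
    return apply_spatial(x, weight_map)


# Toy example: one 2 x 2 channel per modality, zero gate parameters
# (so every channel gate is sigmoid(0) = 0.5), and a diagonal weight map.
rgb = [[[1.0, 1.0], [1.0, 1.0]]]
thermal = [[[2.0, 2.0], [2.0, 2.0]]]
out = fuse(rgb, thermal, w=[0.0, 0.0], b=[0.0, 0.0],
           weight_map=[[1.0, 0.0], [0.0, 1.0]])
print(out)  # [[[0.5, 0.0], [0.0, 0.5]], [[1.0, 0.0], [0.0, 1.0]]]
```

Applying only `apply_spatial` (skipping the channel gates) would correspond to the "spatial-wise weighting applied alone" setting that the abstract reports as the best configuration; in a real network both weightings would be learned tensors rather than fixed lists.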

Cite (APA style)

Zhang, Y., Yin, Z., Nie, L., & Huang, S. (2020). Attention based multi-layer fusion of multispectral images for pedestrian detection. IEEE Access, 8, 165071–165084. https://doi.org/10.1109/ACCESS.2020.3022623
