Remote sensing scene classification aims to automatically assign a semantic label to each image. The task is challenging because remote sensing scenes are highly diverse and contain rich spatial information. Convolutional neural networks (CNNs), such as the well-known Visual Geometry Group (VGG) network, have recently been widely used to address these difficulties. However, because its convolutional layers have local receptive fields, the VGG network cannot model the global information of remote sensing images well. It also needs a large number of parameters and floating-point operations to achieve satisfactory accuracy. To overcome these limitations, we introduce the self-attention mechanism into the VGG network. Specifically, we replace the last four convolutional layers of the VGG-19 network with two cascaded self-attention blocks, each consisting of two multi-head self-attention (MHSA) layers with a residual network structure. The new structure can explore local and global information from remote sensing scenes simultaneously. These changes both reduce the number of model parameters and improve classification performance. The effectiveness of the proposed method is validated through experiments on four public data sets: NaSC-TG2, WHU-RS19, AID, and EuroSAT.
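To make the parameter-reduction claim concrete, the following back-of-the-envelope sketch compares the four 3x3, 512-channel convolutional layers that end VGG-19 with four MHSA layers (two blocks of two layers each). It assumes a standard MHSA layer with query, key, value, and output projections of width 512, each with a bias; the abstract does not specify the layer internals, so the function names and the resulting counts are illustrative assumptions, not figures reported by the authors.

```python
# Rough parameter count: last four VGG-19 conv layers vs. four MHSA layers.
# Assumption (not from the abstract): each MHSA layer uses four dense
# projections (query, key, value, output) of size d_model x d_model with bias.

def conv2d_params(c_in, c_out, kernel_size, bias=True):
    """Weights (plus optional biases) of one 2-D convolutional layer."""
    return c_in * c_out * kernel_size * kernel_size + (c_out if bias else 0)

def mhsa_params(d_model, bias=True):
    """Weights of one standard multi-head self-attention layer.

    The number of heads does not affect the count, because the heads
    split d_model between them rather than adding new dimensions.
    """
    per_projection = d_model * d_model + (d_model if bias else 0)
    return 4 * per_projection  # query, key, value, output

CHANNELS = 512   # channel width of the final VGG-19 convolutional stage
NUM_BLOCKS = 2   # cascaded self-attention blocks
LAYERS_PER_BLOCK = 2  # MHSA layers per block

conv_total = 4 * conv2d_params(CHANNELS, CHANNELS, kernel_size=3)
mhsa_total = NUM_BLOCKS * LAYERS_PER_BLOCK * mhsa_params(CHANNELS)

print(f"four 3x3 conv layers: {conv_total:,} parameters")
print(f"four MHSA layers:     {mhsa_total:,} parameters")
print(f"ratio (MHSA / conv):  {mhsa_total / conv_total:.2f}")
```

Under these assumptions the attention layers need fewer than half the parameters of the convolutions they replace. Any position encodings, normalization layers, or other components in the actual model would change the exact figure.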
Liu, Z., Dong, A., Yu, J., Han, Y., Zhou, Y., & Zhao, K. (2022). Scene classification for remote sensing images with self-attention augmented CNN. IET Image Processing, 16(11), 3085–3096. https://doi.org/10.1049/ipr2.12540