Scene classification for remote sensing images with self-attention augmented CNN

Abstract

Remote sensing scene classification aims to automatically assign a semantic label to each image. The task is challenging because remote sensing scene images are diverse and contain rich spatial information. Recently, convolutional neural networks, such as the well-known Visual Geometry Group (VGG) network, have been widely used to overcome these difficulties. However, the VGG network, with its local receptive fields, cannot model the global information of remote sensing images well. It also needs a large number of parameters and floating-point operations to achieve satisfactory accuracy. To address these limitations, we introduce the self-attention mechanism into the VGG network. Specifically, we replace the last four convolutional layers of the VGG-19 network with two cascaded self-attention blocks, each consisting of two multi-head self-attention (MHSA) layers with a residual structure. The new structure can explore local and global information from remote sensing scenes simultaneously. These changes both reduce the number of model parameters and improve classification performance. The effectiveness of the proposed method is validated through experiments on four public data sets: NaSC-TG2, WHU-RS19, AID and EuroSAT.
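
As a rough illustration of why swapping convolutions for self-attention can reduce parameters, the sketch below compares four 3x3 convolutional layers at 512 channels (the width of the last four convolutional layers in the standard VGG-19) with four MHSA layers of the same width. The MHSA layout is an assumption: a generic layer with query, key, value and output projections, each of size d x d with a bias. The layer design in the paper may differ, so the numbers are not the paper's reported figures. The helper names `conv2d_params` and `mhsa_params` are hypothetical.

```python
# Illustrative parameter count. The MHSA layout is an assumption,
# not the layer design or the figures reported in the paper.

def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Weights (and optional biases) of one kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def mhsa_params(d_model, bias=True):
    """One MHSA layer, assuming four d x d projections (query, key, value, output).

    The number of heads does not change the total, because the heads
    split d_model between them.
    """
    return 4 * (d_model * d_model + (d_model if bias else 0))


CHANNELS = 512  # width of the last four convolutional layers in VGG-19

conv_total = 4 * conv2d_params(CHANNELS, CHANNELS)
mhsa_total = 2 * 2 * mhsa_params(CHANNELS)  # two blocks x two MHSA layers

print(f"four 3x3 conv layers: {conv_total:,} parameters")
print(f"four MHSA layers:     {mhsa_total:,} parameters")
print(f"ratio:                {mhsa_total / conv_total:.3f}")
```

Under these assumptions the attention layers need fewer than half the parameters of the convolutions they replace, since a 3x3 kernel costs nine d x d weight matrices per layer against four for the projections. Positional encodings, normalization layers and the rest of the network are left out of this count.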

Cite

Liu, Z., Dong, A., Yu, J., Han, Y., Zhou, Y., & Zhao, K. (2022). Scene classification for remote sensing images with self-attention augmented CNN. IET Image Processing, 16(11), 3085–3096. https://doi.org/10.1049/ipr2.12540
