A dual-model architecture with grouping-attention-fusion for remote sensing scene classification

Junge Shen; Tong Zhang; Yichen Wang; Ruxin Wang; Qi Wang; Min Qi

Journal Article (Open Access)

Remote Sensing (2021) 13(3) 1-19

DOI: 10.3390/rs13030433

Abstract

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification challenging. Performance depends heavily on the capacity of the scene representation and on the discriminability of the classifier. Although combining multiple models can improve both aspects over a single model, the strategy used to fuse them is key to maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy fuses the features of the two CNNs in a fine-grained, multi-scale manner, which enhances the resulting feature representation of the scene. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets: the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods.
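
The abstract does not give the form of the loss function, so the following is only an illustrative sketch of the general idea of penalizing intra-class spread while rewarding inter-class separation, and it is not the authors' formulation. It assumes a center-based design: the intra-class term is the mean squared distance of each feature vector to its class mean, and the inter-class term is a hinge penalty on class means that lie closer together than a margin. The function names and the `margin` and `weight` parameters are hypothetical.

```python
from itertools import combinations
from math import dist


def class_centers(features, labels):
    """Return the mean feature vector for each class label."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def compactness_separation_loss(features, labels, margin=1.0, weight=1.0):
    """Illustrative loss: intra-class spread plus a weighted inter-class hinge.

    Assumed form (not taken from the paper):
      intra = mean squared distance of each sample to its class center
      inter = mean of max(0, margin - distance) over all pairs of class centers
    """
    centers = class_centers(features, labels)
    # Small when samples of the same class cluster tightly.
    intra = sum(
        dist(vec, centers[lab]) ** 2 for vec, lab in zip(features, labels)
    ) / len(features)
    # Zero once every pair of class centers is at least `margin` apart.
    pairs = list(combinations(centers.values(), 2))
    inter = (
        sum(max(0.0, margin - dist(a, b)) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return intra + weight * inter
```

For example, with features `[[0, 0], [0, 2], [10, 0], [10, 2]]` and labels `[0, 0, 1, 1]`, the class centers are 10 units apart, so the inter-class term vanishes and only the intra-class spread contributes. In a real training setup a term like this would typically be added to a standard classification loss and computed on the fused CNN features.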

Cite

Shen, J., Zhang, T., Wang, Y., Wang, R., Wang, Q., & Qi, M. (2021). A dual-model architecture with grouping-attention-fusion for remote sensing scene classification. Remote Sensing, 13(3), 1–19. https://doi.org/10.3390/rs13030433
