Multi-label image recognition aims to jointly predict multiple tags for an image. Despite great progress, existing methods still suffer from two limitations: 1) they cannot accurately locate object regions due to the lack of adequate supervisory information or semantic guidance; 2) they cannot effectively identify small objects because they rely only on the high-level features of a deep CNN. In this paper, we propose a Multi-Scale Cross-Modal Spatial Attention Fusion (MCSAF) network that locates more informative regions by introducing a spatial attention module and recognizes target classes at different scales through multi-scale cross-modal feature fusion. Furthermore, we develop an adaptive graph convolutional network (Adaptive-GCN) to capture the complex correlations among labels in depth. Empirical studies on benchmark datasets validate the superiority of our proposed model over state-of-the-art methods.
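The abstract mentions a GCN that propagates information over a label-correlation graph. As a rough illustration of the underlying operation (not the paper's actual Adaptive-GCN; the sizes, the random correlation matrix, and the function name `gcn_layer` are all hypothetical), one graph-convolution step H' = ReLU(Â H W) can be sketched as:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step on label embeddings.

    H: (C, d_in) label embeddings, A: (C, C) nonnegative correlation
    matrix (row-normalized inside), W: (d_in, d_out) projection weights.
    """
    A_hat = A / A.sum(axis=1, keepdims=True)  # row-normalize the graph
    return np.maximum(A_hat @ H @ W, 0.0)     # propagate, project, ReLU

# Hypothetical sizes: C labels with d_in-dimensional embeddings.
C, d_in, d_out = 4, 8, 6
rng = np.random.default_rng(0)
H = rng.standard_normal((C, d_in))
A = rng.random((C, C)) + np.eye(C)  # stand-in for a learned adaptive matrix
W = rng.standard_normal((d_in, d_out))
out = gcn_layer(H, A, W)
print(out.shape)  # (4, 6)
```

In label-graph methods of this kind, the output embeddings are typically used as per-class classifiers applied to the image features; here the correlation matrix is random only to keep the sketch self-contained.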
CITATION STYLE
Li, J., Zhang, C., Wang, X., & Du, L. (2020). Multi-Scale Cross-Modal Spatial Attention Fusion for Multi-label Image Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12396 LNCS, pp. 736–747). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61609-0_58