A multi-level attention model for remote sensing image captions

Yangyang Li; Shuangkang Fang; Licheng Jiao; Ruijiao Liu; Ronghua Shang

Journal Article, Open Access

Remote Sensing (2020) 12(6)

DOI: 10.3390/rs12060939

43 Citations

29 Readers

Abstract

Image captioning is the task of generating a sentence that appropriately describes an image, and it lies at the intersection of computer vision and natural language processing. Research on remote sensing image captioning has only recently begun, but it is of great significance. The attention mechanism, which is inspired by the way humans think, is widely used in remote sensing image captioning. However, the attention mechanisms currently used in this task are aimed mainly at the image, which is too simple to express such a complex task well. In this paper, we therefore propose a multi-level attention model that more closely imitates human attention. The model contains three attention structures, which represent attention to different areas of the image, attention to different words, and attention to vision and semantics. Experiments show that our model achieves better results than previous methods and is currently state-of-the-art. In addition, the existing datasets for remote sensing image captioning contain a large number of errors, so we have also done substantial work to correct them in order to promote research on remote sensing image captioning.
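The three attention structures named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product scoring, the function names (`attend`, `multi_level_context`), and the toy vectors are all assumptions chosen only to show how attention over image regions, attention over words, and attention between vision and semantics might be composed at one decoding step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product soft attention (an assumed scoring function).

    Returns the weighted sum of `items` and the attention weights.
    """
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    context = [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(len(query))
    ]
    return context, weights


def multi_level_context(hidden, regions, words):
    """Hypothetical composition of the three attention levels."""
    # Level 1: attention over different areas of the image.
    visual, region_w = attend(hidden, regions)
    # Level 2: attention over different (previously generated) words.
    semantic, word_w = attend(hidden, words)
    # Level 3: attention between the visual and semantic contexts.
    context, mix_w = attend(hidden, [visual, semantic])
    return context, region_w, word_w, mix_w


# Toy inputs (illustrative values only, not data from the paper).
hidden = [1.0, 0.0]                                # decoder state
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # image region features
words = [[0.2, 0.8], [0.9, 0.1]]                   # word embeddings

context, region_w, word_w, mix_w = multi_level_context(hidden, regions, words)
```

Each level returns a probability distribution over its inputs, so the weights at every level sum to one. In a trained model the scoring functions would be learned rather than plain dot products.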

Citation (APA)

Li, Y., Fang, S., Jiao, L., Liu, R., & Shang, R. (2020). A multi-level attention model for remote sensing image captions. Remote Sensing, 12(6). https://doi.org/10.3390/rs12060939
