Hierarchical attention-based fusion for image caption with multi-grained rewards


Abstract

Image captioning based on reinforcement learning (RL) methods has achieved significant success recently. Most of these methods take the CIDEr score as the reward of the reinforcement learning algorithm to compute gradients, thus refining the image captioning baseline model. However, the CIDEr score is not the sole criterion for judging the quality of a generated caption. In this paper, a Hierarchical Attention Fusion (HAF) model is presented as a baseline for RL-based image captioning, where multi-level feature maps of ResNet are integrated with hierarchical attention. A Revaluation Network (REN) revaluates the CIDEr score by assigning a different weight to each word according to its importance in the generated caption; the weighted reward can be regarded as a word-level reward. Moreover, a Scoring Network (SN) scores the generated sentence against its corresponding ground truth drawn from a batch of captions. This reward benefits from the additional unmatched ground-truth captions and acts as a sentence-level reward. Experimental results on the COCO dataset show that the proposed methods achieve competitive performance compared with related image captioning methods.
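A minimal sketch of the word-level reward idea described above (an illustration, not the authors' implementation): instead of assigning the same scalar CIDEr reward to every word of a sampled caption, each word's reward is scaled by an importance weight. Here the weights are given by hand; in the paper they would come from the learned Revaluation Network.

```python
import numpy as np

def word_level_reward(sentence_reward, word_weights):
    """Distribute a sentence-level reward (e.g. CIDEr) across words.

    word_weights: per-word importance scores (hypothetical values here;
    the paper learns them with its Revaluation Network). Weights are
    normalized so their mean is 1, keeping the average per-word reward
    equal to the original sentence reward.
    """
    w = np.asarray(word_weights, dtype=float)
    w = w / w.sum() * len(w)  # normalize: mean weight becomes 1
    return sentence_reward * w

# Example: a 4-word caption with CIDEr reward 0.8, where the two
# content words are weighted higher than the two function words.
rewards = word_level_reward(0.8, [0.5, 2.0, 2.0, 0.5])
```

In a REINFORCE-style update, each word's log-probability gradient would then be scaled by its own entry of `rewards` rather than by the single sentence-level scalar.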

Citation (APA)

Wu, C., Yuan, S., Cao, H., Wei, Y., & Wang, L. (2020). Hierarchical attention-based fusion for image caption with multi-grained rewards. IEEE Access, 8, 57943–57951. https://doi.org/10.1109/ACCESS.2020.2981513
