Two central challenges in image captioning are making the generated captions closely reflect the image content and keeping them syntactically readable. We therefore focus on two problems: 1) how to correctly select semantic and visual information from an image, and 2) how to optimize the syntactic structure of captions so as to improve their readability. Existing work pays little attention to these issues. To address them, we propose an image captioning framework based on an attention balance mechanism and a syntax optimization module, named ATT-BM-SOM. The model effectively fuses image information and generates high-quality captions, compensating for shortcomings in image-information selection and syntactic readability. Experiments show that our model achieves excellent performance on the MS COCO dataset: compared with the baseline models, it obtains the best results of 78.1, 58.4, 119.2, and 24.7 on BLEU-1, ROUGE-L, CIDEr, and SPICE, respectively.
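The abstract does not detail how the attention balance mechanism fuses visual information. As general background only, the standard soft visual attention step that such captioning models build on can be sketched as follows; this is a minimal generic illustration, not the authors' implementation, and every name and shape here is a hypothetical choice:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention(features, hidden, W_f, W_h):
    """Generic soft attention over regional image features.

    features: (k, d) array of k regional visual feature vectors
    hidden:   (h,) decoder hidden state at the current time step
    W_f:      (d, m) projection of visual features (hypothetical parameter)
    W_h:      (m, h) projection of the hidden state (hypothetical parameter)

    Returns the attention-weighted visual context and the weights.
    """
    scores = (features @ W_f) @ (W_h @ hidden)  # (k,) relevance score per region
    alpha = softmax(scores)                     # attention weights, sum to 1
    context = alpha @ features                  # (d,) fused visual context vector
    return context, alpha
```

At each decoding step the context vector is concatenated with (or added to) the decoder input, so regions with higher attention weight contribute more to the next generated word.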
Yang, Z., & Liu, Q. (2020). ATT-BM-SOM: A Framework of Effectively Choosing Image Information and Optimizing Syntax for Image Captioning. IEEE Access, 8, 50565–50573. https://doi.org/10.1109/ACCESS.2020.2980578