Image caption generation via unified retrieval and generation-based method

Shanshan Zhao; Lixiang Li; Haipeng Peng; Zihang Yang; Jiaxuan Zhang

Journal Article (Open Access)

Applied Sciences (Switzerland) (2020) 10(18)

DOI: 10.3390/APP10186235

10 Citations

10 Readers

Abstract

Image captioning is a multi-modal transduction task that translates a source image into the target language. Dominant approaches primarily employ either a generation-based or a retrieval-based method, and each kind of framework has its own advantages and disadvantages. In this work, we combine the strengths of both. We adopt a retrieval-based approach to search the MSCOCO data set for images that are visually similar to each queried image, together with their corresponding captions. Based on the retrieved caption sequences and the visual features of the queried image, the proposed de-noising module yields a set of attended textual features that bring additional textual information to the generation-based model. Finally, the decoder uses both the visual features and the textual features to generate the output descriptions. Additionally, the incorporated visual encoder and de-noising module can be applied as a preprocessing component for decoder-based attention mechanisms. We evaluate the proposed method on the MSCOCO benchmark data set. Extensive experiments yield state-of-the-art performance, and the incorporated module improves the baseline models on almost all evaluation metrics.
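
The retrieve-then-attend idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the similarity measure, the feature extractors, or the internal form of the de-noising module, so everything below is an assumption made for illustration: cosine similarity for retrieval, dot-product attention for weighting textual features, two-dimensional toy vectors, and the hypothetical helper names `retrieve` and `attend`. It is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_feat, gallery, k=2):
    """Return the captions of the k gallery images most similar to the query.

    Assumption: similarity is cosine similarity on image feature vectors.
    """
    ranked = sorted(gallery, key=lambda item: cosine(query_feat, item["feat"]),
                    reverse=True)
    return [cap for item in ranked[:k] for cap in item["captions"]]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query_feat, text_feats):
    """Weight textual features by their dot-product score against the visual query.

    Assumption: a plain dot-product attention stands in for the de-noising module.
    """
    weights = softmax([sum(q * t for q, t in zip(query_feat, tf))
                       for tf in text_feats])
    dim = len(text_feats[0])
    attended = [sum(w * tf[i] for w, tf in zip(weights, text_feats))
                for i in range(dim)]
    return attended, weights

# Toy gallery: hypothetical image features with their captions.
gallery = [
    {"feat": [1.0, 0.0], "captions": ["a dog runs on grass"]},
    {"feat": [0.0, 1.0], "captions": ["a plate of food"]},
    {"feat": [0.9, 0.1], "captions": ["a dog plays outside"]},
]
query = [1.0, 0.05]

captions = retrieve(query, gallery, k=2)

# Hypothetical textual features, one per retrieved caption.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = attend(query, text_feats)
```

In a full model, the attended textual features would be passed to the decoder alongside the visual features; here they are only computed to show how retrieved text that matches the query image receives a larger weight than text that does not.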

Cite (APA)

Zhao, S., Li, L., Peng, H., Yang, Z., & Zhang, J. (2020). Image caption generation via unified retrieval and generation-based method. Applied Sciences (Switzerland), 10(18). https://doi.org/10.3390/APP10186235
