MemCap: Memorizing style knowledge for image captioning


Abstract

Generating stylized captions for images is a challenging task, since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes knowledge about linguistic styles with a memory mechanism. Rather than relying heavily on a language model to capture style factors, as existing methods do, our method memorizes stylized elements learned from the training corpus. Specifically, we design a memory module comprising a set of embedding vectors that encode style-related phrases in the training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
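The attention-based retrieval described in the abstract can be sketched as follows. This is a minimal, illustrative implementation of soft attention over a set of style-memory vectors, not the paper's actual code; the function name, toy memory, and query are all hypothetical.

```python
import math

def retrieve_style_knowledge(memory, query):
    """Soft attention over a style memory (illustrative sketch only).

    memory: list of K style-phrase embedding vectors, each of length d
    query:  content-dependent query vector of length d
    Returns the attention-weighted sum of memory slots (length d).
    """
    # dot-product relevance score between the query and each memory slot
    scores = [sum(m_i * q_i for m_i, q_i in zip(m, query)) for m in memory]
    # softmax with max-subtraction for numerical stability
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of memory vectors -> content-relevant style vector
    d = len(memory[0])
    return [sum(w * m[j] for w, m in zip(weights, memory)) for j in range(d)]

# hypothetical toy memory: 3 style slots with 2-dimensional embeddings
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
style_vec = retrieve_style_knowledge(memory, query)
```

In the paper's full model, the extracted vector would then condition the caption decoder; here it is simply returned so the attention step can be inspected in isolation.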

Cite

CITATION STYLE

APA

Zhao, W., Wu, X., & Zhang, X. (2020). MemCap: Memorizing style knowledge for image captioning. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 12984–12992). AAAI Press. https://doi.org/10.1609/aaai.v34i07.6998
