Generating captions for images of ancient artworks


Abstract

The neural encoder-decoder framework is widely adopted for captioning natural images, but few works have applied it to cultural images. In this paper, we propose an artwork-type-enriched image captioning model in which the encoder represents an input artwork image as a 512-dimensional vector and the decoder generates a corresponding caption from that vector. The artwork type is first predicted by a convolutional neural network classifier and then merged into the decoder. We investigate several ways of integrating the artwork type into the captioning model, including one that applies a step-wise weighted sum of the artwork type vector and the decoder's hidden representation. This model outperforms three baseline image captioning models on all evaluation metrics for a Chinese art image captioning dataset; one of the baselines is a state-of-the-art approach that fuses textual image attributes into a captioning model for natural images. The proposed model also obtains promising results on an Egyptian art image captioning dataset.
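As a rough illustration of the step-wise weighted fusion the abstract describes, a single decoding step might look like the PyTorch sketch below. This is a minimal sketch under assumptions, not the authors' implementation: the class and parameter names (TypeFusedDecoderStep, gate), the sigmoid gating that produces the per-step weight, and the 512-dimensional sizes chosen to match the image vector are all hypothetical.

import torch
import torch.nn as nn

class TypeFusedDecoderStep(nn.Module):
    # One decoding step that fuses a predicted artwork-type vector into the
    # LSTM hidden state via a step-wise weighted sum (hypothetical sketch).
    def __init__(self, vocab_size=10000, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.gate = nn.Linear(hidden_dim, 1)  # produces the per-step weight
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word_ids, state, type_vec):
        # prev_word_ids: (batch,); state: (h, c); type_vec: (batch, hidden_dim)
        h, c = self.cell(self.embed(prev_word_ids), state)
        a = torch.sigmoid(self.gate(h))           # weight in (0, 1), recomputed each step
        h_fused = a * h + (1.0 - a) * type_vec    # weighted sum of hidden state and type vector
        return self.out(h_fused), (h_fused, c)   # vocabulary logits and the new state

Feeding the fused state forward as the next hidden state lets the artwork-type signal influence every subsequent word; how the paper actually computes the per-step weight is not specified in the abstract, so the learned sigmoid gate here stands in for it.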

Citation (APA)

Sheng, S., & Moens, M. F. (2019). Generating captions for images of ancient artworks. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 2478–2486). Association for Computing Machinery. https://doi.org/10.1145/3343031.3350972
