Generating captions for images of ancient artworks


Abstract

The neural encoder-decoder framework is widely adopted for captioning natural images, but few works have applied it to cultural images. In this paper, we propose an artwork-type-enriched image captioning model in which the encoder represents an input artwork image as a 512-dimensional vector and the decoder generates a corresponding caption from that vector. The artwork type is first predicted by a convolutional neural network classifier and then merged into the decoder. We investigate several ways of integrating the artwork type into the captioning model, including one that applies a step-wise weighted sum of the artwork type vector and the decoder's hidden representation. This model outperforms three baseline image captioning models on all evaluation metrics for a Chinese art image captioning dataset; one of the baselines is a state-of-the-art approach that fuses textual image attributes into a captioning model for natural images. The proposed model also obtains promising results on an Egyptian art image captioning dataset.
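As a rough illustration of the step-wise weighted fusion the abstract describes, a single decoding step might look like the PyTorch sketch below. This is a minimal sketch under assumptions, not the authors' implementation: the class and parameter names (TypeFusedDecoderStep, gate), the sigmoid gating that produces the per-step weight, and the 512-dimensional sizes chosen to match the image vector are all hypothetical.

import torch
import torch.nn as nn

class TypeFusedDecoderStep(nn.Module):
    # One decoding step that fuses a predicted artwork-type vector into the
    # LSTM hidden state via a step-wise weighted sum (hypothetical sketch).
    def __init__(self, vocab_size=10000, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.gate = nn.Linear(hidden_dim, 1)  # produces the per-step weight
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word_ids, state, type_vec):
        # prev_word_ids: (batch,); state: (h, c); type_vec: (batch, hidden_dim)
        h, c = self.cell(self.embed(prev_word_ids), state)
        a = torch.sigmoid(self.gate(h))           # weight in (0, 1), recomputed each step
        h_fused = a * h + (1.0 - a) * type_vec    # weighted sum of hidden state and type vector
        return self.out(h_fused), (h_fused, c)   # vocabulary logits and the new state

Feeding the fused state forward as the next hidden state lets the artwork-type signal influence every subsequent word; how the paper actually computes the per-step weight is not specified in the abstract, so the learned sigmoid gate here stands in for it.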

Citation (APA)

Sheng, S., & Moens, M. F. (2019). Generating captions for images of ancient artworks. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 2478–2486). Association for Computing Machinery. https://doi.org/10.1145/3343031.3350972
