Domain-specific image captioning

Abstract

We present a data-driven framework for image caption generation which incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, where many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously written descriptions from a database and adapt them to new query images, using a joint visual and textual bag-of-words model to determine the correctness of individual words. We implement our model using a large, unlabeled dataset of images of women’s shoes paired with natural language descriptions (Berg et al., 2010). Using both automatic and human evaluations, we show that our captioning method effectively deletes inaccurate words from extracted captions while maintaining a high level of detail in the generated output.
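The adaptation step described above can be illustrated with a toy sketch: score each word of a retrieved caption against the query image's visual bag-of-words, and delete words whose visual evidence is weak. All names, profiles, and the threshold below are invented for illustration; the paper's actual joint model differs in its details.

```python
# Hypothetical sketch of caption adaptation via per-word visual scoring.
# Assumes a toy "visual bag-of-words" histogram for the query image and a
# learned visual profile per caption word; all data here is illustrative.
import math

def cosine(u, v):
    """Cosine similarity between two sparse count dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def adapt_caption(caption, query_visual_bow, word_profiles, threshold=0.3):
    """Keep a caption word only if its visual profile matches the query image."""
    kept = []
    for w in caption.split():
        profile = word_profiles.get(w.lower())
        # Words with no learned profile (e.g., function words) are kept as-is.
        if profile is None or cosine(profile, query_visual_bow) >= threshold:
            kept.append(w)
    return " ".join(kept)

# Toy example: the retrieved caption says "red", but the query image's
# visual words indicate a dark shoe, so "red" is deleted.
word_profiles = {
    "red":   {"v_red_patch": 5, "v_glossy": 1},
    "black": {"v_dark_patch": 5, "v_glossy": 1},
    "heel":  {"v_heel_shape": 4},
}
query_bow = {"v_dark_patch": 6, "v_heel_shape": 3, "v_glossy": 2}
print(adapt_caption("red leather heel", query_bow, word_profiles))
```

This mirrors the paper's high-level strategy of deleting inaccurate words from extracted captions, though the actual model is trained on the large shoe dataset rather than hand-set profiles.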

Citation (APA)

Mason, R., & Charniak, E. (2014). Domain-specific image captioning. In CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings (pp. 11–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1602
