Abstract
We present a data-driven framework for image caption generation that incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, in which many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously written descriptions from a database and adapt them to new query images, using a joint visual and textual bag-of-words model to determine the correctness of individual words. We implement our model using a large, unlabeled dataset of women's shoe images paired with natural language descriptions (Berg et al., 2010). Using both automatic and human evaluations, we show that our captioning method effectively deletes inaccurate words from extracted captions while maintaining a high level of detail in the generated output.
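The adaptation step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function name, probability values, and threshold are all assumptions. It shows the general idea of filtering an extracted caption by a per-word correctness score from some joint visual/textual model.

```python
# Hypothetical sketch of caption adaptation: keep only words whose
# estimated correctness (from a joint visual/textual model) is high
# enough for the query image. Names and numbers are illustrative.

def adapt_caption(extracted_words, word_prob, threshold=0.5):
    """Delete words the model judges unlikely to be correct."""
    return [w for w in extracted_words if word_prob.get(w, 0.0) >= threshold]

# Toy probabilities standing in for real model output.
probs = {"red": 0.9, "leather": 0.8, "buckle": 0.2, "sandal": 0.95}
print(adapt_caption(["red", "leather", "buckle", "sandal"], probs))
# → ['red', 'leather', 'sandal']
```

In this sketch, "buckle" is dropped because its assumed correctness score falls below the threshold, mirroring how the paper's method deletes inaccurate words while preserving detailed ones.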
Mason, R., & Charniak, E. (2014). Domain-specific image captioning. In CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings (pp. 11–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1602