Learning to interpret and describe abstract scenes


Abstract

Given a (static) scene, a human can effortlessly describe what is going on (who is doing what to whom, how, and why). The process requires knowledge about the world and how it is perceived and described. In this paper we study the problem of interpreting and verbalizing visual information using abstract scenes created from collections of clip art images. We propose a model inspired by machine translation operating over a large parallel corpus of visual relations and linguistic descriptions. We demonstrate that this approach produces human-like scene descriptions which are both fluent and relevant, outperforming a number of competitive alternatives based on templates, sentence-based retrieval, and a multimodal neural language model.
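The machine-translation analogy can be illustrated with a toy sketch: treat visual relation tuples as the "source language" and human description phrases as the "target language", and estimate a translation table from parallel scene–description pairs. Everything below (the data, the relation tuples, the relative-frequency scoring) is invented for illustration and is far simpler than the model the paper actually proposes.

```python
from collections import Counter, defaultdict

# Toy parallel corpus: each visual relation tuple is paired with a
# human description phrase. All examples are made up for illustration.
parallel = [
    (("boy", "holds", "ball"), "the boy is holding a ball"),
    (("boy", "holds", "ball"), "a boy holds his ball"),
    (("dog", "near", "tree"), "a dog sits by the tree"),
]

# Estimate p(description | relation) by relative frequency,
# in the spirit of a phrase translation table.
counts = defaultdict(Counter)
for relation, description in parallel:
    counts[relation][description] += 1

def describe(relations):
    """Verbalize a scene: pick the most frequent description per relation."""
    sentences = []
    for rel in relations:
        if counts[rel]:
            sentences.append(counts[rel].most_common(1)[0][0])
    return ". ".join(sentences)

print(describe([("boy", "holds", "ball"), ("dog", "near", "tree")]))
```

A real system would score full candidate sentences with a language model rather than concatenating per-relation phrases, but the sketch conveys the core idea of "translating" from a visual to a linguistic vocabulary.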

Citation (APA)

Ortiz, L. G. M., Wolff, C., & Lapata, M. (2015). Learning to interpret and describe abstract scenes. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 1505–1515). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/n15-1174
