Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval

286 citations · 286 Mendeley readers
Abstract

Semantically complex queries that include attributes of objects and relations between objects still pose a major challenge to image retrieval systems. Recent work in computer vision has shown that a graph-based semantic representation called a scene graph is an effective representation for very detailed image descriptions and for complex queries for retrieval. In this paper, we show that scene graphs can be effectively created automatically from a natural language scene description. We present a rule-based and a classifier-based scene graph parser whose output can be used for image retrieval. We show that including relations and attributes in the query graph outperforms a model that only considers objects, and that using the output of our parsers is almost as effective as using human-constructed scene graphs (Recall@10 of 27.1% vs. 33.4%). Additionally, we demonstrate the general usefulness of parsing to scene graphs by showing that the output can also be used to generate 3D scenes.
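To make the representation concrete, the sketch below models a scene graph in the form the abstract describes: a set of objects, attributes attached to objects, and binary relations between objects. This is a minimal illustration, not the paper's implementation; the example sentence, class, and field names are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Minimal scene graph: objects, per-object attributes, binary relations."""
    objects: set = field(default_factory=set)      # object names
    attributes: set = field(default_factory=set)   # (object, attribute) pairs
    relations: set = field(default_factory=set)    # (subject, predicate, object) triples


# Hypothetical graph for the description "a young woman rides a brown horse"
graph = SceneGraph()
graph.objects |= {"woman", "horse"}
graph.attributes |= {("woman", "young"), ("horse", "brown")}
graph.relations.add(("woman", "ride", "horse"))

# A query graph for retrieval has the same structure; per the abstract,
# matching on relations and attributes, not just objects, improves Recall@10.
print(graph)
```

A query such as "brown horse" would map to a small graph of the same shape, which can then be matched against the graphs of candidate images.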

Cite

APA

Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., & Manning, C. D. (2015). Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In Proceedings of the Fourth Workshop on Vision and Language (VL 2015) (pp. 70–80). Association for Computational Linguistics. https://doi.org/10.18653/v1/w15-2812
