Grounding semantic roles in images


Abstract

We address the task of visual semantic role labeling (vSRL): identifying the participants of a situation or event depicted in a visual scene and labeling them with their semantic relations to that event or situation. We represent candidate participants as image regions of objects, and train a model that learns to ground roles in the regions depicting the corresponding participants. Experimental results demonstrate that a vSRL model can be trained without prohibitively expensive image-based role annotations, by exploiting noisy data extracted automatically from image captions with a linguistic SRL system. Furthermore, our model induces frame-semantic visual representations which, compared against previous work on supervised visual verb sense disambiguation, yield overall better results.
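The abstract does not spell out the model architecture, but the role–region grounding it describes can be pictured as a compatibility score between a role embedding and CNN features of candidate object regions, trained with a ranking objective over the noisy, caption-derived alignments. The sketch below is a minimal illustration of that idea in PyTorch; all names, dimensions (`region_feat_dim`, `embed_dim`), and the max-margin loss are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RoleGroundingModel(nn.Module):
    """Scores how well each candidate object region depicts a given semantic
    role. Dimensions and layers here are illustrative assumptions, not the
    architecture reported in the paper."""

    def __init__(self, num_roles: int, region_feat_dim: int = 2048,
                 embed_dim: int = 300):
        super().__init__()
        self.role_embed = nn.Embedding(num_roles, embed_dim)      # one vector per (frame-semantic) role
        self.region_proj = nn.Linear(region_feat_dim, embed_dim)  # project CNN region features

    def forward(self, role_ids: torch.Tensor, region_feats: torch.Tensor) -> torch.Tensor:
        # role_ids: (batch,), region_feats: (batch, n_regions, region_feat_dim)
        roles = self.role_embed(role_ids)                          # (batch, embed_dim)
        regions = self.region_proj(region_feats)                   # (batch, n_regions, embed_dim)
        # Dot-product compatibility between the role and every candidate region.
        return torch.einsum("bd,bnd->bn", roles, regions)          # (batch, n_regions)


def grounding_loss(scores: torch.Tensor, gold_region: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    """Max-margin ranking loss: the region that the noisy, caption-derived
    annotation aligns with the role should outscore every other region."""
    gold = scores.gather(1, gold_region.unsqueeze(1))              # (batch, 1)
    violations = torch.clamp(margin - gold + scores, min=0.0)      # (batch, n_regions)
    mask = torch.ones_like(scores)
    mask.scatter_(1, gold_region.unsqueeze(1), 0.0)                # ignore the gold region itself
    return (violations * mask).mean()


# Toy usage with random region features standing in for a CNN's output.
model = RoleGroundingModel(num_roles=50)
scores = model(torch.tensor([3]), torch.randn(1, 10, 2048))
loss = grounding_loss(scores, gold_region=torch.tensor([2]))
loss.backward()
```

A ranking formulation of this kind only needs to know which region a role should be grounded in, which is why noisy alignments induced from captions by a linguistic SRL system can stand in for manual image-based role annotations.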

Citation

Silberer, C., & Pinkal, M. (2018). Grounding semantic roles in images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 2616–2626). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1282
