Grounding semantic roles in images


Abstract

We address the task of visual semantic role labeling (vSRL): identifying the participants of a situation or event depicted in a visual scene and labeling them with their semantic relations to that event or situation. We represent candidate participants as image regions of objects, and train a model that learns to ground roles in the regions depicting the corresponding participants. Experimental results demonstrate that a vSRL model can be trained without prohibitively expensive image-based role annotations, by exploiting noisy data extracted automatically from image captions with a linguistic SRL system. Furthermore, our model induces frame-semantic visual representations which, compared against previous work on supervised visual verb sense disambiguation, yield overall better results.
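The abstract does not spell out the model architecture, but the role–region grounding it describes can be pictured as a compatibility score between a role embedding and CNN features of candidate object regions, trained with a ranking objective over the noisy, caption-derived alignments. The sketch below is a minimal illustration of that idea in PyTorch; all names, dimensions (`region_feat_dim`, `embed_dim`), and the max-margin loss are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RoleGroundingModel(nn.Module):
    """Scores how well each candidate object region depicts a given semantic
    role. Dimensions and layers here are illustrative assumptions, not the
    architecture reported in the paper."""

    def __init__(self, num_roles: int, region_feat_dim: int = 2048,
                 embed_dim: int = 300):
        super().__init__()
        self.role_embed = nn.Embedding(num_roles, embed_dim)      # one vector per (frame-semantic) role
        self.region_proj = nn.Linear(region_feat_dim, embed_dim)  # project CNN region features

    def forward(self, role_ids: torch.Tensor, region_feats: torch.Tensor) -> torch.Tensor:
        # role_ids: (batch,), region_feats: (batch, n_regions, region_feat_dim)
        roles = self.role_embed(role_ids)                          # (batch, embed_dim)
        regions = self.region_proj(region_feats)                   # (batch, n_regions, embed_dim)
        # Dot-product compatibility between the role and every candidate region.
        return torch.einsum("bd,bnd->bn", roles, regions)          # (batch, n_regions)


def grounding_loss(scores: torch.Tensor, gold_region: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    """Max-margin ranking loss: the region that the noisy, caption-derived
    annotation aligns with the role should outscore every other region."""
    gold = scores.gather(1, gold_region.unsqueeze(1))              # (batch, 1)
    violations = torch.clamp(margin - gold + scores, min=0.0)      # (batch, n_regions)
    mask = torch.ones_like(scores)
    mask.scatter_(1, gold_region.unsqueeze(1), 0.0)                # ignore the gold region itself
    return (violations * mask).mean()


# Toy usage with random region features standing in for a CNN's output.
model = RoleGroundingModel(num_roles=50)
scores = model(torch.tensor([3]), torch.randn(1, 10, 2048))
loss = grounding_loss(scores, gold_region=torch.tensor([2]))
loss.backward()
```

A ranking formulation of this kind only needs to know which region a role should be grounded in, which is why noisy alignments induced from captions by a linguistic SRL system can stand in for manual image-based role annotations.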

Citation

Silberer, C., & Pinkal, M. (2018). Grounding semantic roles in images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 2616–2626). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1282
