Abstract
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene that depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations of each sentence. We address this task by extending a vision model that determines whether a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing it to disambiguate sentences in a unified fashion across the different ambiguity types.
Berzak, Y., Barbu, A., Harari, D., Katz, B., & Ullman, S. (2015). Do you see what I mean? Visual resolution of linguistic ambiguities. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1477–1487). Association for Computational Linguistics. https://doi.org/10.18653/v1/d15-1172