Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information

Abstract

Visual Word Sense Disambiguation (VWSD) is the task of selecting the image that most accurately depicts the correct sense of a target word in a given context. Image-text matching models have often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses gloss information from an external lexical knowledge base, in particular the sense definitions. Specifically, we employ Bayesian inference to incorporate the sense definitions when the sense of the answer is not provided. In addition, to mitigate the out-of-vocabulary (OOV) issue, we propose context-aware definition generation with GPT-3. Experimental results show that our Bayesian-inference-based approach significantly improves VWSD performance. Moreover, our context-aware definition generation yields a marked improvement on OOV examples, outperforming the existing definition generation method.
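
To make the Bayesian-inference step concrete, here is a minimal sketch of one way to marginalize over candidate senses when the gold sense is unknown. It assumes a CLIP-style encoder that places the context, each sense definition (gloss), and each candidate image in a shared embedding space; the function names and the softmax temperature are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def softmax(x, tau=0.1):
        z = np.asarray(x, dtype=float) / tau
        z -= z.max()                       # for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def rank_images(ctx_emb, gloss_embs, img_embs, tau=0.1):
        """Score candidate images by marginalizing over word senses.

        ctx_emb    : (d,)   embedding of the target word in context
        gloss_embs : (S, d) embeddings of the S sense definitions
        img_embs   : (I, d) embeddings of the I candidate images
        Returns a length-I array proportional to P(image | context).
        """
        # P(sense | context): softmax over context-gloss similarity.
        p_sense = softmax(gloss_embs @ ctx_emb, tau)
        # P(image | sense): for each gloss, softmax over image-gloss similarity.
        p_img_given_sense = np.stack(
            [softmax(img_embs @ g, tau) for g in gloss_embs]  # (S, I)
        )
        # Marginalize out the unknown sense: sum_s P(image | s) * P(s | context).
        return p_sense @ p_img_given_sense

    # Toy usage with random vectors standing in for real CLIP embeddings.
    rng = np.random.default_rng(0)
    ctx = rng.normal(size=8)
    glosses = rng.normal(size=(3, 8))    # e.g. three WordNet senses
    images = rng.normal(size=(10, 8))    # ten candidate images
    scores = rank_images(ctx, glosses, images)
    print(scores.argmax(), scores.round(3))

For OOV target words that have no gloss in the knowledge base, the abstract proposes generating a definition with GPT-3 conditioned on the context. The exact prompt is not given in the abstract; a hypothetical template might look like:

    def definition_prompt(word, context):
        # Hypothetical prompt template; the paper's actual wording may differ.
        return (f'Context: "{context}"\n'
                f'In this context, the word "{word}" means:')

    print(definition_prompt("angora", "angora cat"))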

Citation (APA)

Kwon, S., Garodia, R., Lee, M., Yang, Z., & Yu, H. (2023). Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1583–1598). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.88
