Abstract
Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of a target word in a given context. Image-text matching models have often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that exploits the gloss information of an external lexical knowledge base, in particular the sense definitions. Specifically, we propose employing Bayesian inference to incorporate the sense definitions when the sense of the answer is not provided. In addition, to ameliorate the out-of-vocabulary (OOV) issue, we propose context-aware definition generation with GPT-3. Experimental results show that VWSD performance improves significantly with our Bayesian inference-based approach. Moreover, our context-aware definition generation yields substantial gains on OOV examples, outperforming the existing definition generation method.
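The Bayesian idea in the abstract can be sketched as scoring each candidate image by marginalizing over the word's senses: P(image | context) = Σ_s P(image | s) · P(s | context), where each sense s is represented by its gloss. The sketch below is a minimal illustration under assumptions, not the authors' released implementation: it assumes CLIP (the `openai/clip-vit-base-patch32` checkpoint) as the image-text matcher, WordNet as the gloss source, and CLIP's usual logit scale of 100 for the softmaxes.

```python
# Minimal sketch of Bayesian sense-aware image scoring for VWSD.
# Assumptions: CLIP as the matcher, WordNet glosses as the sense
# inventory (requires nltk.download("wordnet") beforehand).
import torch
from PIL import Image
from nltk.corpus import wordnet as wn
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def score_images(target_word: str, context: str,
                 images: list[Image.Image]) -> torch.Tensor:
    """Rank candidate images via P(image | context) =
    sum_s P(image | s) * P(s | context)."""
    senses = wn.synsets(target_word)
    glosses = [s.definition() for s in senses] or [context]  # OOV fallback

    # P(s | context): softmax over similarity between the context
    # phrase and each gloss, using CLIP's text encoder.
    text_inputs = processor(text=[context] + glosses,
                            return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    ctx_emb, gloss_emb = text_emb[:1], text_emb[1:]
    p_sense = torch.softmax(100 * ctx_emb @ gloss_emb.T, dim=-1)   # (1, S)

    # P(image | s): per-sense softmax over image-gloss similarity.
    image_inputs = processor(images=images, return_tensors="pt")
    img_emb = model.get_image_features(**image_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    p_img = torch.softmax(100 * gloss_emb @ img_emb.T, dim=-1)     # (S, I)

    # Marginalize over senses; the argmax is the predicted image.
    return (p_sense @ p_img).squeeze(0)                            # (I,)
```

For OOV words with no WordNet entry, the abstract's context-aware definition generation would replace the gloss list with a definition produced by GPT-3 conditioned on the context; the fallback to the raw context above merely stands in for that step.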
Citation
Kwon, S., Garodia, R., Lee, M., Yang, Z., & Yu, H. (2023). Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1583–1598). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.88