The Visual Word Sense Disambiguation shared task at SemEval-2023 aims to identify, from a set of candidate images, the image that corresponds to the intended meaning of a given ambiguous word (with related context). The lack of textual descriptions for the candidate images and the ambiguity of the target word make this a challenging problem. This paper describes teamPN’s multi-modal, modular approach to the English track of the task. We used recent multi-modal pre-trained models, backed by real-time multi-modal knowledge graphs, to augment the images with textual knowledge and select the best-matching image accordingly. Our system outperformed the baseline model by 5 points, and we propose an approach that can further serve as a framework for other modular, knowledge-backed solutions.
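The core selection step can be illustrated with a minimal zero-shot ranking sketch. It assumes that the ambiguous word with its context, and each candidate image, have already been embedded into a shared vector space by some pre-trained multi-modal encoder (for example, a CLIP-style model). The functions `cosine` and `rank_candidates` and the toy vectors are hypothetical. This is not teamPN’s pipeline, which additionally augments the images with knowledge-graph text before matching.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    context_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
) -> List[str]:
    """Return candidate image ids ordered from best to worst match.

    Assumption: all embeddings come from the same (hypothetical) multi-modal
    encoder, so text and image vectors are directly comparable.
    """
    return sorted(
        candidate_embs,
        key=lambda cid: cosine(context_emb, candidate_embs[cid]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs (hypothetical values).
    context = [1.0, 0.0, 0.5]
    candidates = {
        "img_a": [0.9, 0.1, 0.4],
        "img_b": [0.0, 1.0, 0.0],
        "img_c": [0.5, 0.5, 0.5],
    }
    print(rank_candidates(context, candidates))
```

The first id in the returned list is the predicted image; keeping the full ordering also allows ranking-based metrics to be computed over all candidates.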
Katyal, N., Rajpoot, P., Tamilarasu, S., & Mustafi, J. (2023). teamPN at SemEval-2023 Task 1: Visual Word Sense Disambiguation Using Zero-Shot MultiModal Approach. In 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 457–461). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.semeval-1.63