GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images

Abstract

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image from a set of candidates. To select the correct image, we propose a solution that blends text augmentation and multimodal models. Text augmentation leverages the fine-grained semantic annotation from WordNet to obtain a better representation of the textual component. We then compare this sense-augmented text to the set of candidate images using the pre-trained multimodal models CLIP and ViLT. Our system ranked 16th for the English language, achieving a hit rate of 68.5 and a mean reciprocal rank of 79.2. The code for this project is available on GitHub.
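
The ranking and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the sense-augmented text and each candidate image have already been embedded as vectors (for example by a model such as CLIP), that candidates are ranked by cosine similarity, and that hit rate is measured at rank 1. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(text_vec: Sequence[float],
                    image_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate image names sorted from most to least similar."""
    return sorted(image_vecs,
                  key=lambda name: cosine(text_vec, image_vecs[name]),
                  reverse=True)


def hit_rate_at_1(rankings: List[List[str]], gold: List[str]) -> float:
    """Fraction of instances whose top-ranked image is the gold image."""
    return sum(1 for r, g in zip(rankings, gold) if r[0] == g) / len(gold)


def mean_reciprocal_rank(rankings: List[List[str]], gold: List[str]) -> float:
    """Average of 1 / (1-based rank of the gold image) over all instances."""
    return sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold)) / len(gold)


# Toy stand-ins for embeddings; real vectors would come from a multimodal model.
instances = [
    ([1.0, 0.0], {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}),
    ([0.0, 1.0], {"x": [0.1, 0.9], "y": [0.6, 0.8], "z": [1.0, 0.0]}),
]
gold = ["a", "y"]  # gold image is ranked 1st in the first case, 2nd in the second

rankings = [rank_candidates(text, images) for text, images in instances]
print(rankings)                              # [['a', 'c', 'b'], ['x', 'y', 'z']]
print(hit_rate_at_1(rankings, gold))         # 0.5
print(mean_reciprocal_rank(rankings, gold))  # 0.75
```

In this toy run the gold image is ranked first once and second once, so the hit rate is 0.5 and the mean reciprocal rank is (1 + 1/2) / 2 = 0.75. The scores reported in the abstract appear to be these quantities expressed as percentages.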

Cite

Zhang, S., Nath, S., & Mazzaccara, D. (2023). GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images. In 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 1592–1597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.semeval-1.219
