GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images

Shibingfeng Zhang; Shantanu Nath; Davide Mazzaccara

Conference Proceedings

17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (2023) 1592-1597

DOI: 10.18653/v1/2023.semeval-1.219

Abstract

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image from a set of candidates. To select the correct image, we propose a solution that blends text augmentation and multimodal models. Text augmentation leverages the fine-grained semantic annotation from WordNet to obtain a better representation of the textual component. We then compare this sense-augmented text to the set of images using the pre-trained multimodal models CLIP and ViLT. Our system ranked 16th for the English language, achieving 68.5 points for hit rate and 79.2 for mean reciprocal rank. The code for this project is available on GitHub.
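The two scores reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the usual definitions: hit rate is the fraction of instances whose gold image is ranked first, and mean reciprocal rank is the average of 1/rank of the gold image. The small hand-written vectors are hypothetical stand-ins for the text and image embeddings that a model such as CLIP would produce, and the function names are chosen here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(text_vec, image_vecs):
    """Return candidate image ids sorted from most to least similar to the text."""
    return sorted(
        image_vecs,
        key=lambda image_id: cosine(text_vec, image_vecs[image_id]),
        reverse=True,
    )


def hit_rate_and_mrr(rankings, gold_ids):
    """Compute hit rate (gold ranked first) and mean reciprocal rank."""
    hits = 0
    reciprocal_sum = 0.0
    for ranking, gold in zip(rankings, gold_ids):
        position = ranking.index(gold) + 1  # 1-based rank of the gold image
        hits += position == 1
        reciprocal_sum += 1.0 / position
    n = len(gold_ids)
    return hits / n, reciprocal_sum / n


# Toy data (hypothetical): two instances sharing the same three candidate images.
images = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
texts = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for sense-augmented text embeddings
gold = ["a", "c"]

rankings = [rank_candidates(t, images) for t in texts]
hit_rate, mrr = hit_rate_and_mrr(rankings, gold)
print(rankings, hit_rate, mrr)
```

In this toy run the gold image is ranked first for the first instance and second for the second, so the hit rate is 0.5 and the mean reciprocal rank is 0.75.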

Cite

Zhang, S., Nath, S., & Mazzaccara, D. (2023). GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images. In 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 1592–1597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.semeval-1.219