EViLBERT: Learning task-agnostic multimodal sense embeddings

Agostina Calabrese; Michele Bevilacqua; Roberto Navigli

Conference Proceedings, Open Access

IJCAI International Joint Conference on Artificial Intelligence (2020), Vol. 2021-January, pp. 481–487

DOI: 10.24963/ijcai.2020/67

8 Citations

23 Readers

Abstract

The problem of grounding language in vision is increasingly attracting scholarly efforts. As of now, however, most of the approaches have been limited to word embeddings, which are not capable of handling polysemous words. This is mainly due to the limited coverage of the available semantically-annotated datasets, hence forcing research to rely on alternative technologies (i.e., image search engines). To address this issue, we introduce EViLBERT, an approach which is able to perform image classification over an open set of concepts, both concrete and non-concrete. Our approach is based on the recently introduced Vision-Language Pretraining (VLP) model, and builds upon a manually-annotated dataset of concept-image pairs. We use our technique to clean up the image-to-concept mapping that is provided within a multilingual knowledge base, resulting in over 258,000 images associated with 42,500 concepts. We show that our VLP-based model can be used to create multimodal sense embeddings starting from our automatically-created dataset. In turn, we also show that these multimodal embeddings improve the performance of a Word Sense Disambiguation architecture over a strong unimodal baseline. We release code, dataset and embeddings at http://babelpic.org.

Cite

Calabrese, A., Bevilacqua, M., & Navigli, R. (2020). EViLBERT: Learning task-agnostic multimodal sense embeddings. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 481–487). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/67
