MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities

10Citations
Citations of this article
40Readers
Mendeley users who have this article in their library.

Abstract

Warning: This paper contains examples of language that some people may find offensive or upsetting. In this paper, we describe the system proposed by the MilaNLP team for the Multimedia Automatic Misogyny Identification (MAMI) challenge. We use Perceiver IO as a multimodal late fusion over unimodal streams to address both sub-tasks A and B. We build unimodal embeddings using Vision Transformer (image) and RoBERTa (text transcript). We enrich the input representation using face and demographic recognition, image captioning, and detection of adult content and web entities. To the best of our knowledge, this work is the first to use Perceiver IO combining text and image modalities. The proposed approach outperforms unimodal and multimodal baselines.

Cite

CITATION STYLE

APA

Attanasio, G., Nozza, D., & Bianchi, F. (2022). MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities. In SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 654–662). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.semeval-1.90

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free