Probing Cross-Modal Representations in Multi-Step Relational Reasoning


Abstract

We investigate the representations learned by vision and language models in tasks that require relational reasoning. Focusing on the problem of assessing the relative size of objects in abstract visual contexts, we analyse both one-step and two-step reasoning. For the latter, we construct a new dataset of three-image scenes and define a task that requires reasoning at the level of the individual images and across images in a scene. We probe the learned model representations using diagnostic classifiers. Our experiments show that pretrained multimodal transformer-based architectures can perform higher-level relational reasoning, and are able to learn representations for novel tasks and data that are very different from what was seen in pretraining.
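The abstract's probing method can be illustrated in general terms: a diagnostic classifier is a simple (typically linear) classifier trained on frozen model representations to test whether a property of interest is decodable from them. The sketch below uses synthetic embeddings and a logistic-regression probe as illustrative stand-ins; it is not the authors' actual setup, data, or architecture.

```python
# Minimal sketch of a diagnostic (probing) classifier: train a linear
# classifier on frozen representations to test whether a target property
# (e.g. relative object size) is linearly decodable from them.
# The synthetic embeddings here are hypothetical stand-ins for real
# multimodal transformer features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dim = 400, 64
labels = rng.integers(0, 2, size=n)  # e.g. "bigger" vs "smaller"

# Frozen embeddings: label-correlated signal plus unit Gaussian noise.
signal = rng.normal(size=dim)
embeddings = rng.normal(size=(n, dim)) + np.outer(labels * 2 - 1, signal)

# The probe itself: the model under study stays frozen; only this
# lightweight classifier is trained on its representations.
probe = LogisticRegression(max_iter=1000).fit(embeddings[:300], labels[:300])
accuracy = probe.score(embeddings[300:], labels[300:])
print(f"probe accuracy: {accuracy:.2f}")
```

High held-out probe accuracy is taken as evidence that the property is encoded in the representations; near-chance accuracy suggests it is not (linearly) recoverable.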

Citation (APA)

Parfenova, I., Elliott, D., Fernández, R., & Pezzelle, S. (2021). Probing Cross-Modal Representations in Multi-Step Relational Reasoning. In RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop (pp. 152–162). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.repl4nlp-1.16
