What is learned in visually grounded neural syntax acquisition

9 citations · 110 Mendeley readers

Abstract

Visual features are a promising signal for learning to bootstrap textual models. However, black-box learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model's strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model's predictions as opposed to more complex syntactic reasoning.
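
The "simplified versions" and the noun-concreteness signal are not spelled out in this summary. As a rough, hypothetical sketch of how a purely lexical concreteness score could drive constituency predictions on its own, the toy parser below greedily merges the adjacent pair of nodes with the highest combined concreteness. The lexicon values, default score, and merge rule are illustrative assumptions only, not the authors' actual model.

    from typing import List, Tuple

    # Hypothetical concreteness scores (illustrative values, not taken from the paper).
    CONCRETENESS = {"cat": 0.9, "dog": 0.9, "ball": 0.85, "the": 0.1, "a": 0.1,
                    "chases": 0.3, "red": 0.4}

    def score(word: str) -> float:
        """Return a lexical concreteness score, with a low default for unknown words."""
        return CONCRETENESS.get(word.lower(), 0.2)

    def parse(words: List[str]):
        """Greedily build a binary tree by repeatedly merging the adjacent pair
        of nodes whose combined concreteness score is highest."""
        nodes: List[Tuple[object, float]] = [(w, score(w)) for w in words]
        while len(nodes) > 1:
            i = max(range(len(nodes) - 1),
                    key=lambda j: nodes[j][1] + nodes[j + 1][1])
            (left, ls), (right, rs) = nodes[i], nodes[i + 1]
            nodes[i:i + 2] = [((left, right), (ls + rs) / 2)]
        return nodes[0][0]

    # Example: prints a nested-tuple binary tree over the sentence.
    print(parse("the dog chases a red ball".split()))

A baseline of this kind uses no visual input and no syntactic reasoning at test time, which is the sense in which a lexical concreteness signal alone can account for much of the predicted structure.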

Cite (APA)

Kojima, N., Averbuch-Elor, H., Rush, A., & Artzi, Y. (2020). What is learned in visually grounded neural syntax acquisition. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 2615–2635). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.234
