DESCRIBE ME AN AUKLET: Generating Grounded Perceptual Category Descriptions

Abstract

Human speakers can generate descriptions of perceptual concepts, abstracted from the instance level. Moreover, such descriptions can be used by other speakers to learn provisional representations of those concepts. Learning and using abstract perceptual concepts is under-investigated in the language-and-vision field. The problem is also highly relevant to the field of representation learning in multi-modal NLP. In this paper, we introduce a framework for testing category-level perceptual grounding in multi-modal language models. In particular, we train separate neural networks to generate and interpret descriptions of visual categories. We measure the communicative success of the two models with the zero-shot classification performance of the interpretation model, which we argue is an indicator of perceptual grounding. Using this framework, we compare the performance of prototype- and exemplar-based representations. Finally, we show that communicative success exposes performance issues in the generation model, not captured by traditional intrinsic NLG evaluation metrics, and argue that these issues stem from a failure to properly ground language in vision at the category level.
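
The distinction between prototype- and exemplar-based representations, and the use of classification accuracy as a success measure, can be illustrated with a toy sketch. This is not the paper's method: the paper trains neural networks on descriptions of visual categories, whereas the code below uses hand-written two-dimensional feature vectors, Euclidean distance, and an exponential similarity function, all of which are assumptions chosen for illustration. The function names and the toy data are hypothetical.

```python
import math

def prototype(exemplars):
    """Prototype representation: the mean of a category's feature vectors."""
    n = len(exemplars)
    return [sum(col) / n for col in zip(*exemplars)]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_prototype(x, categories):
    """Pick the category whose single averaged prototype is closest to x."""
    return min(categories, key=lambda c: dist(x, prototype(categories[c])))

def classify_exemplar(x, categories):
    """Pick the category with the highest mean similarity to its stored exemplars."""
    def score(c):
        sims = [math.exp(-dist(x, e)) for e in categories[c]]
        return sum(sims) / len(sims)
    return max(categories, key=score)

def accuracy(classify, test_items, categories):
    """Fraction of held-out items assigned to the correct category."""
    hits = sum(classify(x, categories) == label for x, label in test_items)
    return hits / len(test_items)

# Toy data (assumed): each category is a list of 2-d feature vectors.
categories = {
    "auklet": [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0]],
    "finch": [[0.1, 0.9], [0.2, 1.0], [0.0, 0.8]],
}
test_items = [([0.85, 0.15], "auklet"), ([0.1, 0.95], "finch")]

proto_acc = accuracy(classify_prototype, test_items, categories)
exemplar_acc = accuracy(classify_exemplar, test_items, categories)
```

In the framework the abstract describes, the category representations would instead be learned by the interpretation model from generated descriptions, and its accuracy on unseen categories would serve as the measure of communicative success.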

Cite

Noble, B., & Ilinykh, N. (2023). DESCRIBE ME AN AUKLET: Generating Grounded Perceptual Category Descriptions. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 9330–9347). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.580
