Abstract
The multi-modal nature of human communication has been exploited to enhance the performance of language-modeling tasks. Driven by the development of large-scale end-to-end learning techniques and the availability of multi-modal data, it has become possible to represent non-verbal communication behaviors through joint learning and to study their interaction with verbal communication directly. However, existing studies leave gaps in explaining how non-verbal expression contributes to the overall purpose of communication. We therefore explore two questions using mixed-modal language models trained on monologue video data: first, whether incorporating gesture representations improves a language model's performance (perplexity); second, whether spontaneous gestures exhibit entropy rate constancy (ERC), an empirical pattern found in most verbal language data that supports the rational-communication assumption from information theory. We report positive findings for both questions: speakers indeed use spontaneous gestures to convey “meaningful” information that enhances verbal communication, and this information can be captured with a simple spatial encoding scheme. More importantly, gestures are produced and organized rationally, in much the same way as words, which optimizes communication efficiency.
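The two quantities the abstract hinges on can be stated concretely. Below is a minimal sketch, not the authors' implementation: `perplexity` turns per-token log-probabilities (from any language model) into perplexity, and `erc_trend` fits a least-squares slope to per-sentence entropy estimates ordered by discourse position; ERC predicts a positive slope when entropy is estimated from local context alone. Both helper names are ours.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    exp of the average negative log-probability."""
    return math.exp(-sum(log_probs) / len(log_probs))

def erc_trend(entropies):
    """Least-squares slope of per-sentence entropy against
    discourse position (0, 1, 2, ...). A positive slope is the
    signature pattern associated with entropy rate constancy."""
    n = len(entropies)
    mean_x = (n - 1) / 2
    mean_y = sum(entropies) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(entropies))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# A uniform distribution over 4 tokens gives perplexity 4.
print(perplexity([math.log(0.25)] * 4))  # → 4.0
# Entropies rising by 1 bit per sentence give slope 1.
print(erc_trend([1.0, 2.0, 3.0]))        # → 1.0
```

The same machinery applies unchanged whether the token stream is words or discretized hand positions, which is what makes a joint word-gesture comparison possible.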
Xu, Y., & Cheng, Y. (2023). Spontaneous gestures encoded by hand positions can improve language models: An Information-Theoretic motivated study. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 9409–9424). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.600