Demonstrating EMMA: Embodied MultiModal Agent for Language-guided Action Execution in 3D Simulated Environments

Abstract

We demonstrate EMMA, an embodied multimodal agent developed for the Alexa Prize SimBot Challenge. The agent acts within a 3D simulated environment to complete household tasks. EMMA is a unified multimodal generative model for solving embodied tasks. In contrast to previous work, our approach treats multiple multimodal tasks as a single multimodal conditional text generation problem. Furthermore, we showcase that a single generative agent can solve tasks with visual inputs of varying length, such as answering questions about static images or executing actions given a sequence of previous frames and dialogue utterances. The demo system allows users to interact conversationally with EMMA in embodied dialogues in different 3D environments from the TEACh dataset.
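The sketch below illustrates the framing the abstract describes: visual frames and dialogue utterances are serialized into one conditional text-generation input, so tasks with visual inputs of varying length (a single image for question answering, or a sequence of frames for action execution) share the same input/output format. This is not the authors' implementation; every name here (MultimodalExample, build_prompt, the <frame_i> and <task=...> placeholder tokens) is hypothetical and only stands in for the general idea.

    # Minimal sketch (not the EMMA implementation) of casting several
    # multimodal tasks as one conditional text-generation problem.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class MultimodalExample:
        frames: List[str]        # identifiers of visual frames (variable length)
        dialogue: List[str]      # previous dialogue utterances
        target_text: str         # text the model should generate (answer or action)

    def build_prompt(example: MultimodalExample, task: str) -> str:
        """Serialize frames + dialogue into a single generation prompt.

        A VQA example conditions on one frame; an action-execution example
        conditions on a sequence of frames, but both map to the same
        (prompt -> target text) format.
        """
        frame_tokens = " ".join(f"<frame_{i}>" for i, _ in enumerate(example.frames))
        dialogue_text = " ".join(example.dialogue)
        return f"<task={task}> {frame_tokens} {dialogue_text}"

    # Question answering over a single static image.
    vqa = MultimodalExample(
        frames=["img_0"],
        dialogue=["What colour is the mug on the counter?"],
        target_text="red",
    )

    # Action execution conditioned on a sequence of previous frames.
    act = MultimodalExample(
        frames=["frame_0", "frame_1", "frame_2"],
        dialogue=["Pick up the mug and put it in the sink."],
        target_text="pickup mug",
    )

    for task, example in [("vqa", vqa), ("action", act)]:
        print(build_prompt(example, task), "->", example.target_text)

Under this framing, a single sequence-to-sequence model can be trained on both examples: only the prompt contents differ, not the task-specific output heads.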

Cite (APA)

Suglia, A., Hemanthage, B., Nikandrou, M., Pantazopoulos, G., Parekh, A., Eshghi, A., … Rieser, V. (2022). Demonstrating EMMA: Embodied MultiModal Agent for Language-guided Action Execution in 3D Simulated Environments. In SIGDIAL 2022 - 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference (pp. 649–653). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.sigdial-1.62
