Multimodal representation learning for human robot interaction

Abstract

We present a neural-network-based system capable of learning a multimodal representation of images and words. This representation allows for bidirectional grounding of the meaning of words and the visual attributes that they represent, such as colour, size, and object name. We also present a new dataset captured specifically for this task.
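
The abstract does not describe the architecture in detail, so the following is only a minimal sketch of one common way to realise such a joint image-word embedding: two encoders projected into a shared space and trained with a symmetric contrastive objective, which supports both word-to-image and image-to-word grounding. The dimensions, encoder choices, temperature, and loss below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): a joint embedding of
# image features and attribute words, trained so matching pairs are close
# in a shared space. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointEmbedding(nn.Module):
    def __init__(self, image_dim=2048, vocab_size=1000, embed_dim=128):
        super().__init__()
        # Project precomputed image features (e.g. CNN activations) into the shared space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        # Embed word indices (e.g. "red", "small", "mug") into the same space.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)

    def encode_image(self, image_feats):
        return F.normalize(self.image_proj(image_feats), dim=-1)

    def encode_word(self, word_ids):
        return F.normalize(self.word_embed(word_ids), dim=-1)

    def forward(self, image_feats, word_ids):
        img = self.encode_image(image_feats)
        wrd = self.encode_word(word_ids)
        # Similarity matrix; matching (image, word) pairs lie on the diagonal.
        logits = img @ wrd.t() / 0.07
        targets = torch.arange(img.size(0))
        # Symmetric loss over both directions, giving the bidirectional
        # grounding described in the abstract.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2


# Toy usage with random data standing in for a real image-word dataset.
model = JointEmbedding()
images = torch.randn(8, 2048)          # batch of image feature vectors
words = torch.randint(0, 1000, (8,))   # matching attribute-word indices
loss = model(images, words)
loss.backward()
```

After training, nearest-neighbour lookups in the shared space can be run in either direction: from an image embedding to the closest attribute words, or from a word embedding to the closest images.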

Citation (APA)

Sheppard, E., & Lohan, K. S. (2020). Multimodal representation learning for human robot interaction. In ACM/IEEE International Conference on Human-Robot Interaction (pp. 445–446). IEEE Computer Society. https://doi.org/10.1145/3371382.3378265
