In collaborative tasks, people rely on both verbal and non-verbal cues simultaneously to communicate with each other. For human-robot interaction to run smoothly and naturally, a robot should be equipped with the ability to robustly disambiguate referring expressions. In this work, we propose a model that disambiguates multimodal fetching requests using modalities such as head movements, hand gestures, and speech. We analysed the data acquired from mixed reality experiments and formulated the hypothesis that modelling temporal dependencies between events in these three modalities increases the model's predictive power. We evaluated our model within a Bayesian framework for interpreting referring expressions, with and without exploiting the temporal prior.
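To make the idea concrete, the following is a minimal sketch of Bayesian fusion of per-modality evidence with an optional prior, which could stand in for the temporal prior described above. This is an illustration only, not the authors' model: the `fuse` function, all likelihood values, and the form of the prior are hypothetical.

```python
# Hedged sketch: Bayesian fusion of modality evidence for disambiguating
# a fetching request. All names and numbers below are hypothetical; the
# paper's actual model is more elaborate than this.

def fuse(likelihoods, prior=None):
    """Combine per-modality likelihoods P(obs_m | object) for each candidate
    object, optionally weighting by a prior P(object), and normalise so the
    result is a posterior distribution over candidate objects."""
    objects = list(likelihoods[0].keys())
    if prior is None:
        prior = {o: 1.0 for o in objects}  # uniform prior if none given
    post = {}
    for o in objects:
        p = prior[o]
        for lik in likelihoods:  # naive-Bayes style independence assumption
            p *= lik[o]
        post[o] = p
    z = sum(post.values())
    return {o: p / z for o, p in post.items()}

# Hypothetical evidence from the three modalities for two candidate objects:
gaze   = {"cup": 0.6, "box": 0.4}   # head-movement evidence
point  = {"cup": 0.7, "box": 0.3}   # hand-gesture evidence
speech = {"cup": 0.5, "box": 0.5}   # speech evidence (ambiguous alone)

# A temporal prior might, e.g., upweight an object when the pointing gesture
# lands shortly after the spoken noun phrase:
temporal = {"cup": 0.8, "box": 0.2}

no_prior   = fuse([gaze, point, speech])
with_prior = fuse([gaze, point, speech], prior=temporal)
```

In this toy setting the temporal prior sharpens the posterior in favour of "cup", mirroring the paper's hypothesis that modelling temporal dependencies increases predictive power.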
Sibirtseva, E., Ghadirzadeh, A., Leite, I., Björkman, M., & Kragic, D. (2019). Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11575 LNCS, pp. 108–123). Springer Verlag. https://doi.org/10.1007/978-3-030-21565-1_8