Group-Level Focus of Visual Attention for Improved Next Speaker Prediction

Abstract

In this work we address the Next Speaker Prediction sub-challenge of the ACM Multimedia 2021 MultiMediate Grand Challenge. The challenge poses the problem of turn-taking prediction in physically situated multiparty interaction, a problem whose solution is essential for fluent real-time multiparty human-machine interaction. The task is made more difficult by the need for a robust solution that performs effectively across a wide variety of settings and contexts: prior work has shown that current state-of-the-art methods rely on machine learning approaches that do not generalize well to new settings and feature distributions. To address this problem, we propose using group-level focus of visual attention as an additional source of information. We show that a simple combination of group-level focus of visual attention features and publicly available audio-video synchronizer models is competitive with state-of-the-art methods fine-tuned on the challenge dataset.
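The combination described above can be sketched as a simple late fusion: for each participant, take the fraction of the group currently gazing at them (group-level focus of visual attention) and blend it with an audio-video synchronization confidence (e.g., from a SyncNet-style model). The function name, weighting scheme, and inputs below are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch of fusing group gaze features with AV-sync scores
# to rank next-speaker candidates. The linear weighting (alpha) and all
# names here are illustrative, not the paper's exact method.

def next_speaker_scores(gaze_targets, sync_scores, alpha=0.5):
    """Rank next-speaker candidates.

    gaze_targets: list of person ids, one per participant, naming whom
        that participant is currently looking at.
    sync_scores: dict mapping person id -> audio-video sync confidence
        (e.g., output of a publicly available synchronizer model).
    Returns a dict mapping person id -> combined score.
    """
    n = len(gaze_targets)
    scores = {}
    for person in sync_scores:
        # Group-level focus: fraction of the group gazing at this person.
        attention = sum(1 for t in gaze_targets if t == person) / max(n, 1)
        # Late fusion of visual-attention and AV-sync evidence.
        scores[person] = alpha * attention + (1 - alpha) * sync_scores[person]
    return scores

# Toy example: most of the group looks at B, and B's audio best matches
# B's lip motion, so B is predicted as the next speaker.
ranked = next_speaker_scores(
    gaze_targets=["B", "B", "A", "B"],
    sync_scores={"A": 0.2, "B": 0.7, "C": 0.1},
)
predicted = max(ranked, key=ranked.get)
```

A deliberate property of this kind of fusion is that the gaze feature needs no training on the target dataset, which is consistent with the abstract's emphasis on generalizing across settings.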

Citation (APA)

Birmingham, C., Stefanov, K., & Mataric, M. J. (2021). Group-Level Focus of Visual Attention for Improved Next Speaker Prediction. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 4838–4842). Association for Computing Machinery, Inc. https://doi.org/10.1145/3474085.3479213
