Abstract
In this work, we address the Next Speaker Prediction sub-challenge of the ACM Multimedia 2021 MultiMediate Grand Challenge. This challenge poses the problem of turn-taking prediction in physically situated multiparty interaction. Solving this problem is essential for enabling fluent real-time multiparty human-machine interaction. The problem is made more difficult by the need for a robust solution that performs effectively across a wide variety of settings and contexts. Prior work has shown that current state-of-the-art methods rely on machine learning approaches that do not generalize well to new settings and feature distributions. To address this problem, we propose using group-level focus of visual attention as additional information. We show that a simple combination of group-level focus of visual attention features and publicly available audio-video synchronizer models is competitive with state-of-the-art methods fine-tuned on the challenge dataset.
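The fusion described above can be illustrated with a minimal sketch. All function names, the linear weighting, and the score ranges here are assumptions for illustration; the abstract does not specify the authors' exact combination rule.

```python
# Illustrative sketch (not the authors' exact method): ranking candidate
# next speakers by fusing a group-level visual attention score with an
# audio-video sync score. The linear fusion and 0-1 score ranges are
# assumptions made for this example.

def attention_score(gaze_targets, candidate):
    """Fraction of the other group members currently looking at `candidate`."""
    others = [target for person, target in gaze_targets.items()
              if person != candidate]
    if not others:
        return 0.0
    return sum(target == candidate for target in others) / len(others)

def predict_next_speaker(gaze_targets, sync_scores, weight=0.5):
    """Pick the candidate maximizing a weighted sum of attention and sync scores.

    gaze_targets: dict mapping each participant to who they are looking at.
    sync_scores: dict mapping each participant to an audio-video sync
                 confidence in [0, 1] (e.g. from an off-the-shelf
                 active-speaker / sync model).
    """
    fused = {
        c: weight * attention_score(gaze_targets, c)
           + (1 - weight) * sync_scores[c]
        for c in sync_scores
    }
    return max(fused, key=fused.get)

# Example: B receives the group's gaze and has the highest sync score,
# so B is predicted as the next speaker.
gaze = {"A": "B", "B": "C", "C": "B"}
sync = {"A": 0.1, "B": 0.8, "C": 0.3}
print(predict_next_speaker(gaze, sync))  # -> B
```

The point of the sketch is that the visual attention signal requires no dataset-specific training: it is computed directly from who is looking at whom, which is one plausible reason such features transfer across settings better than fine-tuned models.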
Birmingham, C., Stefanov, K., & Mataric, M. J. (2021). Group-Level Focus of Visual Attention for Improved Next Speaker Prediction. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 4838–4842). Association for Computing Machinery, Inc. https://doi.org/10.1145/3474085.3479213