Cross-modal supervision for learning active speaker detection in video

Abstract

In this paper, we show how to use audio to supervise the learning of active speaker detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based classifier in a weakly supervised manner. The classifier uses spatio-temporal features to encode upper body motion: the facial expressions and gesticulations associated with speaking. We further improve a generic model for active speaker detection by learning person-specific models. Finally, we demonstrate the online adaptation of generic models learnt on one dataset to previously unseen people in a new dataset, again using audio (VAD) for weak supervision. The use of temporal continuity overcomes the lack of clean training data. We are the first to present an active speaker detection system that learns on one audio-visual dataset and automatically adapts to speakers in a new dataset. This work can be seen as an example of how the availability of multi-modal data allows a model to be learnt without manual annotation, by transferring knowledge from one modality to another.
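As a rough illustration of the cross-modal supervision idea described above (not the authors' implementation), the sketch below uses a simple energy-based VAD on the audio track to produce weak per-frame speech/non-speech labels, then trains a standard classifier on precomputed spatio-temporal video features. The function names, the feature matrix video_feats, the frame rate, and the energy threshold are all assumptions for illustration; the paper's actual VAD and upper-body descriptors differ.

```python
# Minimal sketch of audio-supervised training of a vision-based
# active speaker classifier. Hypothetical helper names and inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression


def energy_vad(audio, sample_rate, frame_len=0.04, threshold_db=-35.0):
    """Very simple energy-based VAD: returns 1 (speech) / 0 (silence) per frame."""
    hop = int(frame_len * sample_rate)
    n_frames = len(audio) // hop
    labels = np.zeros(n_frames, dtype=int)
    for i in range(n_frames):
        frame = audio[i * hop:(i + 1) * hop]
        rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        labels[i] = int(20 * np.log10(rms) > threshold_db)
    return labels


# Assumed inputs:
#   video_feats: (n_frames, d) spatio-temporal upper-body descriptors,
#                one row per video frame of the tracked person
#   audio:       mono waveform aligned with the video, sample_rate in Hz
def train_weakly_supervised(video_feats, audio, sample_rate, fps=25):
    # VAD frames are computed at the video frame rate so labels align 1:1.
    vad = energy_vad(audio, sample_rate, frame_len=1.0 / fps)
    n = min(len(vad), len(video_feats))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(video_feats[:n], vad[:n])  # VAD output acts as the weak label
    return clf
```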

Citation (APA)

Chakravarty, P., & Tuytelaars, T. (2016). Cross-modal supervision for learning active speaker detection in video. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9909 LNCS, pp. 285–301). Springer Verlag. https://doi.org/10.1007/978-3-319-46454-1_18
