AV16.3: An audio-visual corpus for speaker localization and tracking

Abstract

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when ground truth is absent or ill-defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings covering various test cases. Areas of interest include audio, video, and audio-visual speaker localization and tracking. The desired location annotation can be either 2-D (image plane) or 3-D (physical space). This paper motivates and describes "AV16.3", a corpus of audio-visual data, along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for the 16 microphones and 3 cameras, recorded in a fully synchronized manner in a meeting room. Part of this corpus has already been used successfully to report research results. © Springer-Verlag Berlin Heidelberg 2005.
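
The 3-D annotation method relies on triangulation across the calibrated cameras: each camera's calibration yields a 3x4 projection matrix, so a mouth location marked in two or more image planes determines a point in physical space. The following is a minimal sketch of that geometry in Python (linear DLT triangulation), not the authors' actual annotation pipeline; the intrinsics, baseline, and test point below are hypothetical.

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """Recover a 3-D point from two calibrated views by linear DLT."""
        # Each view contributes two rows of the homogeneous system A @ X = 0,
        # from u * (P[2] . X) = P[0] . X and v * (P[2] . X) = P[1] . X.
        A = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)   # least-squares null vector of A
        X = Vt[-1]
        return X[:3] / X[3]           # de-homogenize to (x, y, z)

    # Hypothetical setup: two cameras with shared intrinsics K, 0.5 m apart.
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

    X_true = np.array([0.2, -0.1, 3.0, 1.0])     # homogeneous world point
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]    # projection into view 1
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]    # projection into view 2
    print(triangulate(P1, P2, x1, x2))           # ~ [ 0.2 -0.1  3.0]

With more than two cameras, each extra view simply adds two rows to A, making the least-squares solution more robust to annotation noise.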

Citation (APA)

Lathoud, G., Odobez, J.-M., & Gatica-Perez, D. (2005). AV16.3: An audio-visual corpus for speaker localization and tracking. In Lecture Notes in Computer Science (Vol. 3361, pp. 182–195). Springer. https://doi.org/10.1007/978-3-540-30568-2_16
