AV16.3: An audio-visual corpus for speaker localization and tracking

Abstract

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when ground truth is absent or ill-defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings covering various test cases. Areas of interest include audio, video, and audio-visual speaker localization and tracking. The desired location annotation can be either 2-D (image plane) or 3-D (physical space). This paper motivates and describes "AV16.3", a corpus of audio-visual data, along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for the 16 microphones and 3 cameras, recorded in a fully synchronized manner in a meeting room. Part of this corpus has already been used successfully to report research results. © Springer-Verlag Berlin Heidelberg 2005.
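
The 3-D annotation method relies on triangulation across the calibrated cameras: each camera's calibration yields a 3x4 projection matrix, so a mouth location marked in two or more image planes determines a point in physical space. The following is a minimal sketch of that geometry in Python (linear DLT triangulation), not the authors' actual annotation pipeline; the intrinsics, baseline, and test point below are hypothetical.

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """Recover a 3-D point from two calibrated views by linear DLT."""
        # Each view contributes two rows of the homogeneous system A @ X = 0,
        # from u * (P[2] . X) = P[0] . X and v * (P[2] . X) = P[1] . X.
        A = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)   # least-squares null vector of A
        X = Vt[-1]
        return X[:3] / X[3]           # de-homogenize to (x, y, z)

    # Hypothetical setup: two cameras with shared intrinsics K, 0.5 m apart.
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

    X_true = np.array([0.2, -0.1, 3.0, 1.0])     # homogeneous world point
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]    # projection into view 1
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]    # projection into view 2
    print(triangulate(P1, P2, x1, x2))           # ~ [ 0.2 -0.1  3.0]

With more than two cameras, each extra view simply adds two rows to A, making the least-squares solution more robust to annotation noise.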

Citation (APA)

Lathoud, G., Odobez, J.-M., & Gatica-Perez, D. (2005). AV16.3: An audio-visual corpus for speaker localization and tracking. In Lecture Notes in Computer Science (Vol. 3361, pp. 182–195). Springer. https://doi.org/10.1007/978-3-540-30568-2_16
