A Bayesian approach to audio-visual speaker identification

Abstract

In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition with audio-visual speech-based identification. The temporal sequence of audio and visual observations, obtained from the acoustic speech and the shape of the mouth, is modeled using a set of coupled hidden Markov models (CHMMs), one for each phoneme-viseme pair and for each person in the database. The use of CHMMs in our system is justified by their ability to capture the natural asynchrony between the audio and visual states as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only and video-only speaker identification at all acoustic signal-to-noise ratio (SNR) levels from 5 to 30 dB. © Springer-Verlag 2003.
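
The abstract states that the per-person CHMM audio-visual likelihood is combined with the per-person EHMM face likelihood before the identity decision. A minimal sketch of such score-level fusion in Python follows; the per-person log-likelihood arrays, the weighting parameter alpha, and the linear combination rule are assumptions for illustration, since the abstract does not specify the exact combination scheme.

    import numpy as np

    def identify_speaker(av_loglik, face_loglik, alpha=0.7):
        """Fuse per-person log-likelihoods from the two models.

        av_loglik   : audio-visual (CHMM) log-likelihoods, one per enrolled person
        face_loglik : face (EHMM) log-likelihoods, one per enrolled person
        alpha       : hypothetical stream weight; the abstract does not give
                      the actual combination rule, so a weighted sum of
                      log-likelihoods is assumed here.
        """
        combined = alpha * np.asarray(av_loglik) + (1.0 - alpha) * np.asarray(face_loglik)
        return int(np.argmax(combined))  # index of the identified person

    # Hypothetical scores for three enrolled speakers.
    av_scores = [-120.4, -98.7, -110.2]    # CHMM log-likelihoods (audio-visual)
    face_scores = [-45.1, -52.3, -40.8]    # EHMM log-likelihoods (face)
    print(identify_speaker(av_scores, face_scores))

In practice, the stream weight would be tuned per SNR condition, since the reliability of the acoustic stream degrades as noise increases.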

Citation (APA)
Nefian, A. V., Liang, L. H., Fu, T., & Liu, X. X. (2003). A Bayesian approach to audio-visual speaker identification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2688, 761–769. https://doi.org/10.1007/3-540-44887-x_88
