Two-level bimodal association for audio-visual speech recognition

Jong Seok Lee; Touradj Ebrahimi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5807 LNCS, pp. 133-144 (2009)

DOI: 10.1007/978-3-642-04697-1_13

Abstract

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis (CCA), which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that their fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data. © 2009 Springer Berlin Heidelberg.
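
The two fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cca` and `fuse_scores`, the regularization term `reg`, and the fixed audio weight `w_audio` are all assumptions made for illustration. The abstract does not say how the decision-level weight is adapted to the noise condition, so here it is simply passed in.

```python
import numpy as np


def cca(X, Y, k, reg=1e-6):
    """Plain CCA via whitening and SVD (illustrative, not the paper's code).

    X: (n_frames, dx) acoustic features; Y: (n_frames, dy) visual features.
    Returns projections Wx (dx, k), Wy (dy, k) and the top-k canonical
    correlations. `reg` is an assumed ridge term for numerical stability.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]


def fuse_scores(log_a, log_v, w_audio):
    """Decision-level fusion as a weighted sum of per-class log-likelihoods.

    w_audio in [0, 1] is a hypothetical reliability weight for the audio
    stream; how it is chosen adaptively is not specified in the abstract.
    """
    return w_audio * np.asarray(log_a) + (1.0 - w_audio) * np.asarray(log_v)
```

In this sketch a low `w_audio` shifts the decision toward the visual stream, which is the behaviour one would want when the acoustic signal is noisy.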

Cite

APA

Lee, J. S., & Ebrahimi, T. (2009). Two-level bimodal association for audio-visual speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5807 LNCS, pp. 133–144). https://doi.org/10.1007/978-3-642-04697-1_13
