Audio-visual speech recognition based on AAM parameter and phoneme analysis of visual feature

Yuto Komai; Yasuo Ariki; Tetsuya Takiguchi

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011), Vol. 7087 LNCS (Part 1), pp. 97–108

DOI: 10.1007/978-3-642-25367-6_9

Abstract

Audio-visual speech recognition, which uses dynamic visual information from the lips together with audio information, is one of the techniques for robust speech recognition in noisy environments; it has attracted attention and has been actively researched in recent years. Since visual information plays a major role in audio-visual speech recognition, the choice of visual feature is an important issue. For spoken word recognition, this paper proposes using the combined parameter, extracted by an Active Appearance Model (AAM) applied to a face image that includes the lip area, as the visual feature. The combined parameter contains both the coordinate values and the intensity values of the face image. The proposed feature improved the recognition rate compared with conventional features such as DCT coefficients and principal component scores. Finally, we integrated the phoneme score from the audio information and the viseme score from the visual information, achieving high recognition accuracy. © 2011 Springer-Verlag.
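The abstract does not give the paper's feature dimensions, weighting, or fusion rule, so the following is only a minimal sketch under stated assumptions. It follows the standard Cootes-style combined AAM formulation: separate PCA models for landmark coordinates (shape) and shape-normalized intensities (texture), a scale factor `r` that makes the two parameter sets commensurate, and a second PCA on their concatenation whose scores are the combined parameter `c`. The audio/visual integration is shown as a simple linearly weighted sum of per-word log-likelihoods, which is a common stream-weighting scheme and not necessarily the paper's method. All function names, the `weight=0.7` default, the 98% variance threshold, and the synthetic data are hypothetical.

```python
import numpy as np


def pca(data, var_keep=0.98):
    """PCA on row-wise samples. Returns (mean, basis); the basis columns are the
    principal directions that together explain `var_keep` of the variance."""
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, var_keep)) + 1
    return mean, vt[:k].T


def fit_combined_model(shapes, textures, var_keep=0.98):
    """shapes: (n, 2L) landmark coordinates; textures: (n, P) shape-normalized
    intensities. Fits shape and texture PCA models, then a second PCA on the
    concatenated (weighted) parameters, as in a Cootes-style combined AAM."""
    s_mean, P_s = pca(shapes, var_keep)
    g_mean, P_g = pca(textures, var_keep)
    b_s = (shapes - s_mean) @ P_s
    b_g = (textures - g_mean) @ P_g
    # Scale shape parameters (pixel units) to be commensurate with texture
    # parameters (intensity units): r^2 = texture variance / shape variance.
    r = np.sqrt(b_g.var(axis=0).sum() / b_s.var(axis=0).sum())
    b = np.hstack([r * b_s, b_g])
    b_mean, Q = pca(b, var_keep)
    return {"s_mean": s_mean, "P_s": P_s, "g_mean": g_mean, "P_g": P_g,
            "r": r, "b_mean": b_mean, "Q": Q}


def combined_parameter(model, shape, texture):
    """Project one frame's (shape, texture) pair onto the combined parameter c."""
    b_s = (shape - model["s_mean"]) @ model["P_s"]
    b_g = (texture - model["g_mean"]) @ model["P_g"]
    b = np.concatenate([model["r"] * b_s, b_g])
    return (b - model["b_mean"]) @ model["Q"]


def fuse_scores(audio_scores, visual_scores, weight=0.7):
    """Hypothetical fusion: weighted sum of per-word log-likelihoods from the
    audio (phoneme) and visual (viseme) recognizers; returns the best word index."""
    fused = (weight * np.asarray(audio_scores, dtype=float)
             + (1.0 - weight) * np.asarray(visual_scores, dtype=float))
    return int(np.argmax(fused))


# Usage on synthetic data (random numbers, not real face images).
rng = np.random.default_rng(0)
shapes = rng.normal(size=(50, 2 * 20))   # 50 frames, 20 landmarks (x, y)
textures = rng.normal(size=(50, 300))    # 300 intensity samples per frame
model = fit_combined_model(shapes, textures)
c = combined_parameter(model, shapes[0], textures[0])
print(c.shape)
print(fuse_scores([-10.0, -5.0, -8.0], [-6.0, -9.0, -4.0], weight=0.5))  # 2
```

In a recognizer, `c` would be computed per video frame and used as the visual observation vector, in place of DCT coefficients or principal component scores of the lip image. The weight in `fuse_scores` would normally be tuned to the noise level, giving the visual stream more influence as audio quality drops.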

Citation (APA)

Komai, Y., Ariki, Y., & Takiguchi, T. (2011). Audio-visual speech recognition based on AAM parameter and phoneme analysis of visual feature. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7087 LNCS, pp. 97–108). https://doi.org/10.1007/978-3-642-25367-6_9