In recent automatic speech recognition studies, deep learning architecture applications for acoustic modeling have eclipsed conventional sound features such as Mel-frequency cepstral coefficients. However, for visual speech recognition (VSR) studies, handcrafted visual feature extraction mechanisms are still widely utilized. In this paper, we propose to apply a convolutional neural network (CNN) as a visual feature extraction mechanism for VSR. By training a CNN with images of a speaker's mouth area in combination with phoneme labels, the CNN acquires multiple convolutional filters, used to extract visual features essential for recognizing phonemes. Further, by modeling the temporal dependencies of the generated phoneme label sequences, a hidden Markov model in our proposed system recognizes multiple isolated words. Our proposed system is evaluated on an audio-visual speech dataset comprising 300 Japanese words with six different speakers. The evaluation results of our isolated word recognition experiment demonstrate that the visual features acquired by the CNN significantly outperform those acquired by conventional dimensionality compression approaches, including principal component analysis.
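The HMM stage described above takes the CNN's per-frame phoneme predictions and decodes them into a single best state sequence. As a rough illustration of that decoding step (not the authors' implementation: the state count, transition probabilities, and emission scores below are made-up stand-ins for CNN phoneme posteriors), a standard Viterbi decode can be sketched in a few lines:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path for an HMM, in log space.

    log_emit:  (T, S) per-frame log-likelihoods of each state
               (here, a stand-in for CNN phoneme posteriors)
    log_trans: (S, S) log transition probabilities
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (prev state, next state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the highest-scoring final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: three phoneme states, four frames whose emission scores
# favor the sequence 0, 0, 1, 2 (all numbers illustrative only).
log_emit = np.log(np.array([[0.90, 0.05, 0.05],
                            [0.80, 0.10, 0.10],
                            [0.10, 0.80, 0.10],
                            [0.05, 0.10, 0.85]]))
log_trans = np.log(np.array([[0.6, 0.3, 0.1],
                             [0.1, 0.6, 0.3],
                             [0.1, 0.1, 0.8]]))
log_init = np.log(np.array([0.8, 0.1, 0.1]))
best_path = viterbi(log_emit, log_trans, log_init)
print(best_path)
```

In a word-recognition setting like the one evaluated in the paper, one such left-to-right HMM would typically be built per vocabulary word, with the word whose model scores highest on the frame sequence taken as the recognition result.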
Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H. G., & Ogata, T. (2014). Lipreading using convolutional neural network. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1149–1153). International Speech Communication Association.