Multi-view automatic lip-reading using neural network


Abstract

It is well known that automatic lip-reading (ALR), also known as visual speech recognition (VSR), enhances the performance of speech recognition in noisy environments and also has applications of its own. However, ALR is a challenging task due to the variety of lip shapes and the ambiguity of visemes (the basic units of visual speech information). In this paper, we tackle ALR as a classification task using an end-to-end neural network based on convolutional neural network (CNN) and long short-term memory (LSTM) architectures. We conduct single-, cross-, and multi-view experiments in a speaker-independent setting with various network configurations for integrating the multi-view data. We achieve average classification accuracies of 77.9%, 83.8%, and 78.6% on the single-, cross-, and multi-view settings, respectively. These results surpass the best preliminary single-view score (76%) reported by the ACCV 2016 workshop on multi-view lip-reading/audio-visual challenges, and show that additional view information helps to improve the performance of ALR with a neural network architecture.
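To make the described pipeline concrete, below is a minimal sketch of a CNN + LSTM lip-reading classifier with feature-level multi-view fusion, written in PyTorch. The layer sizes, the concatenation-based fusion, the class count, and all names are illustrative assumptions for this sketch; they do not reproduce the authors' actual network configuration.

```python
# Hypothetical CNN + LSTM multi-view lip-reading classifier (illustrative only;
# not the configuration used in the paper).
import torch
import torch.nn as nn


class MultiViewLipReader(nn.Module):
    def __init__(self, num_views=2, num_classes=10, feat_dim=128, hidden_dim=256):
        super().__init__()
        # One small CNN per view extracts per-frame visual features.
        self.frontends = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            for _ in range(num_views)
        ])
        # The LSTM models temporal dynamics of the fused per-frame features.
        self.lstm = nn.LSTM(feat_dim * num_views, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, views):
        # views: list of tensors, each (batch, time, 1, H, W), one per camera view.
        per_view_feats = []
        for frontend, x in zip(self.frontends, views):
            b, t, c, h, w = x.shape
            feats = frontend(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
            per_view_feats.append(feats)
        fused = torch.cat(per_view_feats, dim=-1)  # feature-level view fusion
        _, (h_n, _) = self.lstm(fused)
        return self.classifier(h_n[-1])  # utterance-level class scores


# Example: two views, 8-frame clips of 48x48 grayscale mouth crops.
model = MultiViewLipReader(num_views=2, num_classes=10)
dummy = [torch.randn(4, 8, 1, 48, 48) for _ in range(2)]
print(model(dummy).shape)  # torch.Size([4, 10])
```

Concatenating per-view features before the LSTM is only one of several plausible integration strategies (others include decision-level fusion or separate recurrent streams); the paper compares multiple network configurations for combining views.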

Citation (APA)

Lee, D., Lee, J., & Kim, K. E. (2017). Multi-view automatic lip-reading using neural network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10117 LNCS, pp. 290–302). Springer Verlag. https://doi.org/10.1007/978-3-319-54427-4_22
