Lip reading is the ability to understand what a person is communicating using video information alone. With the advent of the Internet and modern computers, it is now possible to automate lip reading without human intervention. This automation has become feasible because of two developments in computer vision: the availability of large-scale datasets for training and the use of neural network models. The applications are numerous, ranging from dictating messages to a device in a noisy environment to improving speech recognition in current technologies, and visual speech recognition has proved pivotal to them. The lip-reading models in this paper are based on deep neural network architectures that capture temporal information and were created for the task of speech recognition.
Mahadevaswamy, U. B., Shashank Rao, M., Vrushab, S., Anagha, C., & Sangameshwar, V. (2021). Visual speech processing and recognition. In Advances in Intelligent Systems and Computing (Vol. 1141, pp. 481–491). Springer. https://doi.org/10.1007/978-981-15-3383-9_44