Abstract
Lip-reading is the process of deciphering text from the visual movements of a speaker’s face, lips, and mouth, without using audio. The task is traditionally divided into two stages: designing or learning visual features, and prediction. End-to-end deep learning approaches to lip-reading have become popular in recent years. However, existing work on end-to-end models performs only word classification rather than sentence-level sequence prediction. Human lip-reading performance improves with longer words, which suggests the importance of features that capture temporal context in an inconsistent communication channel. In this study, an end-to-end model based on deep convolutional neural networks is employed to develop an automated lip-reading system that uses spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss to translate a variable-length sequence of video frames to text. The accuracy of the trained lip-reading model in predicting sentences was evaluated using video-based features.
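To illustrate how a connectionist temporal classification (CTC) output can map a variable-length series of frames to text, here is a minimal sketch of greedy CTC decoding. It is not the authors' implementation: the blank symbol `_`, the toy alphabet, the per-frame scores, and the function name `ctc_greedy_decode` are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; CTC reserves one "no character" class


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding (illustrative sketch).

    frame_scores: one list of scores per video frame, one score per symbol
    alphabet:     list of symbols, including BLANK, aligned with the scores
    """
    # 1. Pick the highest-scoring symbol for every frame.
    best = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # 2. Collapse consecutive repeats (several frames may show one character).
    collapsed = [symbol for symbol, _ in groupby(best)]
    # 3. Drop blanks, which separate genuine repeated characters.
    return "".join(symbol for symbol in collapsed if symbol != BLANK)


# Toy example: six frames over the assumed alphabet ["_", "h", "i"].
alphabet = ["_", "h", "i"]
frame_scores = [
    [0.1, 0.8, 0.1],  # h
    [0.2, 0.7, 0.1],  # h
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],  # i
    [0.1, 0.2, 0.7],  # i
    [0.2, 0.1, 0.7],  # i
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)
```

Because repeats are collapsed before blanks are removed, six frames reduce to a two-character string here, and a blank between two identical symbols preserves a genuine double letter.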
Cite
Mahboob, K., Nizami, H., Ali, F., & Alvi, F. (2021). Sentences Prediction Based on Automatic Lip-Reading Detection with Deep Learning Convolutional Neural Networks Using Video-Based Features. In Communications in Computer and Information Science (Vol. 1489 CCIS, pp. 42–53). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-7334-4_4