A Lip Reading Method Based on 3D Convolutional Vision Transformer


Abstract

Lip reading has received increasing attention in recent years. It infers the content of speech from the movement of the speaker's lips. The rapid development of deep learning has driven progress in lip reading. However, because lip reading must process continuous video frames, it needs to capture both the correlations between adjacent frames and the correlations between temporally distant frames. Moreover, lip reading focuses on subtle changes in the lips and their surrounding region, so it must extract fine-grained features from small images. As a result, the performance of machine lip reading is generally low, and research progress has been slow. To improve the performance of machine lip reading, we propose a lip reading method based on a 3D convolutional vision transformer (3DCvT), which combines a vision transformer with 3D convolution to extract spatio-temporal features from continuous images, taking full advantage of the properties of convolutions and transformers to extract local and global features effectively. The extracted features are then sent to a Bidirectional Gated Recurrent Unit (BiGRU) for sequence modeling. We demonstrate the effectiveness of our method on the large-scale lip reading datasets LRW and LRW-1000, achieving state-of-the-art performance.
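To make the spatio-temporal feature extraction concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a naive single-channel 3D convolution over a clip of mouth-region frames. The frame count (29) and crop size (88×88), the kernel size, and the function name are illustrative assumptions; the point is only that the kernel spans several frames at once, so each output value mixes temporal and spatial context.

```python
import numpy as np

def conv3d_single(video, kernel, stride=1):
    """Naive 'valid' 3D convolution: one input channel, one filter.
    video: (T, H, W) clip; kernel: (kt, kh, kw) spatio-temporal filter.
    Each output value aggregates kt consecutive frames, so temporal
    context (adjacent-frame correlation) is baked into the feature."""
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros(((T - kt) // stride + 1,
                    (H - kh) // stride + 1,
                    (W - kw) // stride + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = video[t * stride:t * stride + kt,
                              i * stride:i * stride + kh,
                              j * stride:j * stride + kw]
                out[t, i, j] = np.sum(patch * kernel)  # dot product over the 3D window
    return out

# Hypothetical input: 29 grayscale mouth-crop frames of 88x88 pixels.
video = np.random.rand(29, 88, 88)
kernel = np.random.rand(3, 5, 5)   # spans 3 frames and a 5x5 spatial window
features = conv3d_single(video, kernel)
print(features.shape)  # (27, 84, 84)
```

In a full pipeline such as the one the abstract describes, feature maps like these would be flattened into token sequences for the transformer stages, and the per-frame features finally fed to a BiGRU for sequence modeling.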

Citation (APA)

Wang, H., Pu, G., & Chen, T. (2022). A Lip Reading Method Based on 3D Convolutional Vision Transformer. IEEE Access, 10, 77205–77212. https://doi.org/10.1109/ACCESS.2022.3193231
