Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning

Abstract

Audio-visual speech recognition (AVSR) has achieved remarkable success in improving the noise robustness of speech recognition. Mainstream methods focus on fusing audio and visual inputs to obtain modality-invariant representations. However, such representations are prone to over-reliance on the audio modality, as it is much easier to recognize than the video modality in clean conditions. As a result, the AVSR model underestimates the importance of the visual stream in the face of noise corruption. To this end, we leverage visual modality-specific representations to provide stable complementary information for the AVSR task. Specifically, we propose a reinforcement learning (RL) based framework called MSRL, in which the agent dynamically harmonizes modality-invariant and modality-specific representations during the auto-regressive decoding process. We customize a reward function directly related to the task-specific metric (i.e., word error rate), which encourages MSRL to effectively explore the optimal integration strategy. Experimental results on the LRS3 dataset show that the proposed method achieves state-of-the-art performance in both clean and various noisy conditions. Furthermore, we demonstrate that the MSRL system generalizes better than other baselines when the test set contains unseen noise.
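
The abstract ties the reward to word error rate (WER) but does not give its exact form. As a minimal sketch, WER can be computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names `word_error_rate` and `wer_reward`, and the choice of negative WER as the reward, are assumptions made for illustration only and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reward(reference: str, hypothesis: str) -> float:
    """Hypothetical reward: negative WER, so fewer errors score higher.

    This shape is an assumption; the abstract only states that the
    reward is directly related to WER.
    """
    return -word_error_rate(reference, hypothesis)
```

Under this assumed shape, a perfect hypothesis yields a reward of zero and every word-level error lowers it, which gives an agent a signal aligned with the evaluation metric rather than with a token-level training loss.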

Cite

Chen, C., Hu, Y., Zhang, Q., Zou, H., Zhu, B., & Chng, E. S. (2023). Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 12607–12615). AAAI Press. https://doi.org/10.1609/aaai.v37i11.26484
