A Robust Multimodal Speech Recognition Method using Optical Flow Analysis

  • Tamura S
  • Iwano K
  • Furui S

Abstract

We propose a new multimodal speech recognition method using optical flow analysis and evaluate its robustness to acoustic and visual noise. Optical flow is defined as the distribution of apparent velocities in the movement of brightness patterns in an image. Since the optical flow is computed without extracting the speaker's lip contours and location, robust visual features can be obtained for lip movements. In each frame, our method calculates a visual feature set consisting of the maximum and minimum values of the integral of the optical flow. This feature set carries not only silence information but also the open/closed state of the speaker's mouth. The visual feature set is combined with an acoustic feature set in the framework of HMM-based recognition. Triphone HMMs are trained using the combined parameter set extracted from clean speech data. Two multimodal speech recognition experiments were carried out. First, acoustic white noise was added to speech waveforms, and a speech recognition experiment was conducted using audio-visual data from 11 male speakers uttering connected Japanese digits. When the visual information was incorporated into the silence HMM, the digit error rate was reduced relative to the audio-only recognition scheme by 32% at SNR = 10 dB and 47% at SNR = 15 dB. Second, real-world data distorted both acoustically and visually was recorded from six male speakers in a moving car and recognised. We achieved approximately 17% and 11% relative error reduction compared with audio-only results with batch and incremental MLLR-based adaptation, respectively.
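As a rough illustration of two ideas in the abstract, the sketch below shows how per-frame acoustic and visual feature vectors might be concatenated into one combined parameter set, and how a relative error reduction is computed from two error rates. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fuse_features` and `relative_error_reduction` are hypothetical, both streams are assumed to be already aligned to the same frame rate, the two visual values per frame stand in for the maximum and minimum of the integrated optical flow, and all numbers are made up for illustration.

```python
def relative_error_reduction(baseline_error, new_error):
    """Return the fraction by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def fuse_features(acoustic_frames, visual_frames):
    """Concatenate acoustic and visual feature vectors frame by frame.

    Assumes both streams have already been aligned to the same frame rate.
    """
    if len(acoustic_frames) != len(visual_frames):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(acoustic_frames, visual_frames)]


# Illustrative values only; these are not data or results from the paper.
acoustic = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # hypothetical acoustic vectors
visual = [[1.5, -0.7], [0.9, -1.2]]            # hypothetical [max, min] per frame

combined = fuse_features(acoustic, visual)      # one 5-dimensional vector per frame
reduction = relative_error_reduction(20.0, 15.0)
```

With the made-up error rates above (20% audio-only, 15% audio-visual), `reduction` is 0.25, that is, a 25% relative reduction; the 32% and 47% figures in the abstract are relative reductions in this same sense.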

Cite

Tamura, S., Iwano, K., & Furui, S. (2005). A Robust Multimodal Speech Recognition Method using Optical Flow Analysis (pp. 37–53). https://doi.org/10.1007/1-4020-3075-4_3
