Designing a robust human-computer interaction (HCI) system is a challenging task, especially for automatic speech recognition (ASR) operating in adverse environments. This paper proposes an ASR system that uses bimodal information (i.e., speech along with visual input) to improve robustness. In this research, static and dynamic (Δ) audio features are extracted using Mel-Frequency Cepstral Coefficients (MFCC). The visual features are extracted using the Two-Dimensional Discrete Wavelet Transform (2D-DWT). Audio-visual recognition is performed over different combinations of visual features using a Hidden Markov Model (HMM) under clean and noisy environmental conditions. The Aligarh Muslim University Audio Visual (AMUAV) Hindi database is used as the baseline data. In addition, performance on noisy speech is evaluated at different signal-to-noise ratios (SNR: 30 dB to -20 dB). Finally, adding visual information to ASR is reported to increase accuracy in smart assistive environments, i.e., in applications that may not have noise-free background conditions.
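The noisy-condition evaluation described above relies on mixing noise into clean speech at a prescribed SNR. The following is a minimal sketch of one common way to do this, not the authors' procedure: the function names `add_noise_at_snr` and `measured_snr_db`, the synthetic sine "speech" signal, the white Gaussian noise, and the 10 dB step between 30 dB and -20 dB are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR_dB = 10 * log10(P_speech / P_noise), so the required noise power is
    # P_speech / 10^(SNR_dB / 10).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Hypothetical stand-ins: a 220 Hz tone for speech and white Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(len(speech))

# Assumed 10 dB steps; the abstract only gives the range 30 dB to -20 dB.
for snr_db in (30, 20, 10, 0, -10, -20):
    noisy = add_noise_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

In a real experiment, `speech` would be an utterance from the database and `noise` a recorded noise source; the scaling step stays the same.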
CITATION STYLE
Upadhyaya, P., Farooq, O., Abidi, M. R., & Varshney, P. (2015). Performance evaluation of bimodal Hindi speech recognition under adverse environment. In Advances in Intelligent Systems and Computing (Vol. 328, pp. 347–355). Springer Verlag. https://doi.org/10.1007/978-3-319-12012-6_38