Modal Feature Extraction

  • Selvakumar E
  • Shanmuga Priya S


Automatic Speech Recognition (ASR) is an essential component of many Human-Computer Interaction systems. Many ASR applications have reached high performance levels, but only in controlled environments. In this project, we reduce the noise in video lectures using bi-modal feature extraction. When acoustic noise is heavy, audio signal features need to be supplemented with complementary sources of information. Visual information extracted from the speaker's mouth region is a promising and appropriate way to boost audio-only recognition. Lip and mouth detection and tracking, combined with traditional image processing methods, offer a variety of options for constructing the visual front end. Fusing the audio and visual streams is even more challenging, and it is crucial to designing an efficient audio-visual recognizer. We investigate problems in Audio-Visual Automatic Speech Recognition (AV-ASR) concerning visual feature extraction and audio-visual integration, with the aim of reducing noise in video lectures.

Author-supplied keywords

  • ASR
  • Audio-visual automatic speech recognition
  • Feature extraction
  • Multi-stream HMM
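
The abstract names audio-visual stream fusion and the keywords mention a multi-stream HMM, but neither gives a formula. As a rough illustration only, the sketch below shows one common way such fusion is done: per-class log-likelihoods from the audio and visual streams are combined with a stream weight. The function names, the weight values, and the example scores are all hypothetical and are not taken from the article.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine per-class log-likelihoods from two streams.

    Assumed fusion rule (not confirmed by the article):
        fused = w * audio + (1 - w) * visual, with w in [0, 1].
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if len(audio_ll) != len(visual_ll):
        raise ValueError("both streams must score the same classes")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_ll, visual_ll)]


def recognize(audio_ll, visual_ll, audio_weight):
    """Return the index of the class with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical log-likelihoods for three candidate words.
audio_ll = [-4.0, -3.5, -6.0]    # audio stream slightly prefers class 1
visual_ll = [-2.0, -5.0, -7.0]   # visual stream clearly prefers class 0

clean_choice = recognize(audio_ll, visual_ll, audio_weight=0.9)
noisy_choice = recognize(audio_ll, visual_ll, audio_weight=0.3)
```

Under these assumed numbers, a high audio weight (as one might use for clean audio) follows the audio stream, while a lower audio weight (as one might use under heavy acoustic noise) lets the visual stream decide. How the weight should be chosen is not specified in the abstract.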
