Modal Feature Extraction

  • Selvakumar E
  • Shanmuga Priya S
ISSN: 2277-5420

Abstract

Automatic Speech Recognition (ASR) is an essential component of many Human-Computer Interaction systems. Many ASR applications achieve high performance, but only in acoustically controlled environments. In this project, we reduce noise in video lectures using bi-modal (audio-visual) feature extraction. Under large amounts of acoustic noise, audio features alone are not enough and must be supplemented with a complementary source of information. Visual information extracted from the speaker's mouth region is a promising candidate for boosting audio-only recognition. Lip/mouth detection and tracking, combined with traditional image-processing methods, offer several options for building the visual front-end. Fusing the audio and visual streams is even more challenging, and it is crucial to designing an efficient audio-visual (AV) recognizer. In this project, we investigate problems in Audio-Visual Automatic Speech Recognition (AV-ASR) concerning visual feature extraction and audio-visual integration, with the aim of reducing noise in video lectures.
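The abstract does not say which fusion scheme the authors use. As a rough illustration only, the sketch below shows two generic audio-visual integration strategies often discussed for AV-ASR: feature-level (early) fusion, which concatenates each audio frame with the time-aligned visual frame, and decision-level (late) fusion, which combines per-class audio and visual log-likelihoods with a stream weight. The function names, the example scores, and the linear SNR-to-weight mapping (with its -5 dB and 20 dB bounds) are hypothetical assumptions, not the paper's method.

```python
from typing import Dict, List


def early_fusion(audio_frames: List[List[float]],
                 visual_frames: List[List[float]]) -> List[List[float]]:
    """Feature-level fusion: align visual frames to the (faster) audio
    frame rate by repeating them, then concatenate per frame.

    Assumes both streams cover the same time span at uniform rates.
    """
    if not visual_frames:
        raise ValueError("need at least one visual frame")
    n_a, n_v = len(audio_frames), len(visual_frames)
    fused = []
    for i, a in enumerate(audio_frames):
        # Index of the visual frame covering the same relative time
        # (zero-order hold), clamped to the last visual frame.
        j = min(i * n_v // n_a, n_v - 1)
        fused.append(list(a) + list(visual_frames[j]))
    return fused


def audio_weight_from_snr(snr_db: float, low: float = -5.0,
                          high: float = 20.0) -> float:
    """HYPOTHETICAL mapping: trust audio fully at or above `high` dB SNR,
    not at all at or below `low` dB, linearly in between."""
    return min(1.0, max(0.0, (snr_db - low) / (high - low)))


def late_fusion(audio_loglik: Dict[str, float],
                visual_loglik: Dict[str, float],
                audio_weight: float) -> str:
    """Decision-level fusion: weighted sum of per-class log-likelihoods,
    returning the class with the highest combined score."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    scores = {
        c: audio_weight * audio_loglik[c]
        + (1.0 - audio_weight) * visual_loglik[c]
        for c in audio_loglik
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up scores: noisy audio slightly prefers "no",
    # while the lip features clearly prefer "yes".
    audio = {"yes": -10.0, "no": -9.0}
    visual = {"yes": -4.0, "no": -8.0}
    print(late_fusion(audio, visual, audio_weight_from_snr(20.0)))  # no
    print(late_fusion(audio, visual, audio_weight_from_snr(0.0)))   # yes
```

In this toy run, a clean signal (weight 1.0) leaves the decision to the audio stream, while at 0 dB the weight drops to 0.2 and the visual evidence overturns the audio-only choice. This is the kind of behavior the abstract motivates when it describes visual features as a boost for audio-only recognition.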

Citation (APA)

Selvakumar, E. S., & Shanmuga Priya, S. (2013). Modal Feature Extraction. IJCSN International Journal of Computer Science and Network, 2(2).
