Visual features extracting & selecting for lipreading

Abstract

This paper proposes a method for effectively selecting and extracting visual features for lipreading. The features are drawn from both low and high levels, which complement each other, and together form a 41-dimensional feature vector for recognition. Tested on AVCC, a bimodal database of sentences covering all Chinese pronunciations, lipreading assistance raises automatic speech recognition accuracy from 84.1% to 87.8%. Under noisy conditions, it improves accuracy by 19.5 percentage points (from 31.7% to 51.2%) for speaker-dependent recognition and by 27.7 points (from 27.6% to 55.3%) for speaker-independent recognition. The paper also shows that visual speech information can effectively compensate for lost acoustic information: in our system it improves the recognition rate by 10% to 30%, varying with the amount of noise in the speech signal, a larger improvement than that of IBM's ASR system, and it performs better in noisy environments. © Springer-Verlag 2003.
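The abstract does not spell out how the 41-dimensional vector is assembled. Below is a minimal, hypothetical Python sketch of the general idea it describes: fusing low-level appearance features (here, 2-D DCT coefficients of a mouth region of interest) with high-level geometric lip measurements into one vector. The 35 + 6 split, the DCT-based appearance features, and the six geometric measurements are all illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of low-level + high-level visual feature fusion for
# lipreading; the concrete feature choices below are assumptions, not the
# authors' published method.
import numpy as np
from scipy.fft import dct


def low_level_features(mouth_roi: np.ndarray, n_coeffs: int = 35) -> np.ndarray:
    """Low-level appearance features: leading coefficients of the 2-D DCT
    of a grayscale mouth region of interest (an assumed choice)."""
    c = dct(dct(mouth_roi, axis=0, norm="ortho"), axis=1, norm="ortho")
    k = int(np.ceil(np.sqrt(n_coeffs)))  # smallest square block holding n_coeffs
    return c[:k, :k].ravel()[:n_coeffs]


def high_level_features(lip_contour: np.ndarray) -> np.ndarray:
    """High-level geometric features from an (N, 2) array of lip contour
    points: width, height, bounding-box area, aspect ratio, and centroid.
    These six measurements are illustrative only."""
    x, y = lip_contour[:, 0], lip_contour[:, 1]
    width = x.max() - x.min()
    height = y.max() - y.min()
    area = width * height                 # bounding-box proxy for mouth opening
    aspect = width / max(height, 1e-6)    # guard against degenerate contours
    return np.array([width, height, area, aspect, x.mean(), y.mean()])


def fused_feature_vector(mouth_roi: np.ndarray, lip_contour: np.ndarray) -> np.ndarray:
    """Concatenate the complementary low- and high-level cues (41 dims total)."""
    v = np.concatenate([low_level_features(mouth_roi), high_level_features(lip_contour)])
    assert v.shape == (41,)
    return v


# Usage with synthetic data standing in for a real video frame.
roi = np.random.rand(32, 32)              # grayscale mouth ROI
contour = np.random.rand(20, 2) * 32      # 20 contour points
print(fused_feature_vector(roi, contour).shape)  # (41,)
```

Concatenation is just one plausible fusion scheme; the per-frame vector would then feed the recognizer alongside the acoustic features.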

Cite

APA

Yao, H. X., Gao, W., Shan, W., & Xu, M. H. (2003). Visual features extracting & selecting for lipreading. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2688, 251–259. https://doi.org/10.1007/3-540-44887-x_30
