Neural Network Models of Sensory Integration for Improved Vowel Recognition


Abstract

Automatic speech recognizers currently perform poorly in the presence of noise. Humans, on the other hand, often compensate for noise degradation by extracting speech information from alternative sources and then integrating this information with the acoustic signal. Visual signals from the speaker’s face are one such source of supplemental speech information. We demonstrate that multiple sources of speech information can be integrated at a sub-symbolic level to improve vowel recognition. Feedforward and recurrent neural networks are trained to estimate the acoustic characteristics of the vocal tract from images of the speaker’s mouth. These estimates are then combined with the noise-degraded acoustic information, effectively increasing the signal-to-noise ratio and improving the recognition of the noise-degraded signals. Alternative symbolic strategies, such as direct categorization of the visual signals into vowels, are also presented. The performance of these neural networks compared favorably with human performance and with other pattern-matching and estimation techniques. © 1990, IEEE
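To make the integration step concrete, here is a minimal sketch of the idea described in the abstract: a small feedforward network maps lip-image features to an estimate of the short-term acoustic spectral envelope, which is then blended with the noisy acoustic spectrum before a vowel decision is made. This is not the authors' code; the network dimensions, the randomly initialized weights, the fixed blending weight, and the nearest-template classifier are all hypothetical placeholders standing in for the trained models and decision rules in the paper.

```python
# Sketch of visual-acoustic fusion for vowel recognition, assuming:
#   - a flattened mouth image as visual input,
#   - a log-spectral envelope as the acoustic representation,
#   - a fixed blending weight (in practice it would track the SNR).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 500-pixel mouth image -> 32-bin spectral envelope.
N_PIXELS, N_BINS, N_HIDDEN = 500, 32, 16

# Randomly initialized weights stand in for a trained network.
W1 = rng.normal(scale=0.05, size=(N_HIDDEN, N_PIXELS))
W2 = rng.normal(scale=0.05, size=(N_BINS, N_HIDDEN))

def visual_estimate(mouth_pixels: np.ndarray) -> np.ndarray:
    """Feedforward pass: lip image -> estimated spectral envelope."""
    hidden = np.tanh(W1 @ mouth_pixels)
    return W2 @ hidden

def fuse(acoustic: np.ndarray, visual: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the noisy acoustic spectrum with the visually derived
    estimate; a larger alpha trusts the acoustic channel more."""
    return alpha * acoustic + (1.0 - alpha) * visual

def classify(spectrum: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Nearest-template vowel decision (one of several possible rules)."""
    return min(templates, key=lambda v: np.linalg.norm(spectrum - templates[v]))

# Toy usage with made-up data.
templates = {v: rng.normal(size=N_BINS) for v in ("a", "i", "u")}
mouth = rng.random(N_PIXELS)                     # flattened lip image
noisy_spectrum = templates["a"] + rng.normal(scale=1.5, size=N_BINS)
fused = fuse(noisy_spectrum, visual_estimate(mouth))
print(classify(fused, templates))
```

Because the two channels carry partly independent information, averaging the visual estimate into the degraded acoustic spectrum reduces the effective noise on the fused representation, which is the sense in which the abstract speaks of increasing the signal-to-noise ratio.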

Citation (APA)

Yuhas, B. P., Goldstein, M. H., Sejnowski, T. J., & Jenkins, R. E. (1990). Neural Network Models of Sensory Integration for Improved Vowel Recognition. Proceedings of the IEEE, 78(10), 1658–1668. https://doi.org/10.1109/5.58349
