Proceedings of the IEEE, vol. 78, no. 10, pp. 1658-1668, 1990
It is demonstrated that multiple sources of speech information can
be integrated at a subsymbolic level to improve vowel recognition.
Feedforward and recurrent neural networks are trained to estimate the
acoustic characteristics of a vocal tract from images of the speaker's
mouth. These estimates are then combined with the noise-degraded
acoustic information, effectively increasing the signal-to-noise ratio
and improving the recognition of these noise-degraded signals.
Alternative symbolic strategies, such as direct categorization of the
visual signals into vowels, are also presented. The performance of these
neural networks compares favorably with human performance and with other
pattern-matching and estimation techniques.
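The abstract does not describe the integration in detail, but the underlying intuition, that an independent visually derived estimate of the acoustic spectrum can be combined with a noise-degraded acoustic signal to raise the effective signal-to-noise ratio, can be illustrated with a minimal sketch. The Python example below is an illustrative assumption, not the authors' network: it fuses two independent noisy estimates of the same spectral envelope by simple averaging (a special case of inverse-variance weighting) and reports the resulting SNR gain.

```python
# Illustrative sketch only: fusing a noisy acoustic spectrum with an
# independent, visually derived estimate of the same spectrum.
# All names and the weighting rule are hypothetical, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def snr_db(clean, noisy):
    """Signal-to-noise ratio of a noisy estimate, in decibels."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Hypothetical "true" short-term spectral envelope of a vowel (32 bins).
clean = np.abs(np.sin(np.linspace(0, np.pi, 32))) + 0.1

# Acoustic channel: corrupted by additive noise.
acoustic = clean + rng.normal(scale=0.3, size=clean.shape)

# Visual channel: an estimate of the same envelope (e.g., produced by a
# network mapping mouth images to spectra), with its own independent error.
visual_estimate = clean + rng.normal(scale=0.3, size=clean.shape)

# With equal, independent error variances, the optimal linear combination
# of the two estimates is a simple average.
fused = 0.5 * acoustic + 0.5 * visual_estimate

print(f"acoustic-only SNR: {snr_db(clean, acoustic):5.2f} dB")
print(f"fused SNR:         {snr_db(clean, fused):5.2f} dB")
# Averaging two independent, equal-variance estimates halves the noise
# power, so the fused SNR is roughly 3 dB higher than either channel alone.
```

In this toy setting the gain follows directly from noise averaging; the paper's contribution is obtaining the second, visually based estimate from images of the speaker's mouth with feedforward and recurrent networks.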