This paper describes the integration of audio and visual speech information for robust adaptive speech processing. Since both the audio speech signal and the visual configuration of the face are produced by the same human speech organs, the two types of information are strongly correlated and sometimes complementary. This paper presents two applications that exploit this relationship: bimodal speech recognition that integrates audio-visual information for robustness to acoustic noise, and speaking-face synthesis based on the correlation between audio and visual speech. © Springer-Verlag 2001.
CITATION STYLE
Nakamura, S. (2001). Fusion of audio-visual information for integrated speech processing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2091, pp. 127–143). Springer-Verlag. https://doi.org/10.1007/3-540-45344-x_20