Robust multi-modal speech recognition in two languages utilizing video and distance information from the Kinect

Abstract

We investigate the performance of our audio-visual speech recognition system in both English and Greek under the influence of audio noise. We present the architecture of our recently built system that utilizes information from three streams including 3-D distance measurements. The feature extraction approach used is based on the discrete cosine transform and linear discriminant analysis. Data fusion is employed using state-synchronous hidden Markov models. Our experiments were conducted on our recently collected database under a multi-speaker configuration and resulted in higher performance and robustness in comparison to an audio-only recognizer. © 2013 Springer-Verlag.
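The paper itself is not reproduced here, but the pipeline the abstract describes — DCT-based visual features reduced with linear discriminant analysis, then exponent-weighted combination of per-stream likelihoods in state-synchronous HMMs — can be sketched roughly as follows. This is a minimal illustration assuming NumPy; the function names, the `keep` block size, and the toy LDA implementation are hypothetical, not taken from the authors' system.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2_features(roi, keep=6):
    # 2-D DCT of a mouth-region image; retain the low-frequency
    # top-left keep x keep block as the raw visual feature vector.
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def lda_projection(X, y, n_components):
    # Fisher LDA: directions maximizing between-class scatter
    # relative to within-class scatter (pseudo-inverse for stability).
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_components]]

def fuse_stream_loglikes(loglikes, weights):
    # State-synchronous fusion: the combined emission log-likelihood
    # of an HMM state is the exponent-weighted sum over streams
    # (audio, video, distance), with weights reflecting stream reliability.
    return sum(w * ll for w, ll in zip(weights, loglikes))
```

In a multi-stream setup like the one described, the stream weights would typically be tuned on held-out data, with the audio weight lowered as acoustic noise increases.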

Citation (APA)
Galatas, G., Potamianos, G., & Makedon, F. (2013). Robust multi-modal speech recognition in two languages utilizing video and distance information from the Kinect. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8007 LNCS, pp. 43–48). https://doi.org/10.1007/978-3-642-39330-3_5
