Robust multi-modal speech recognition in two languages utilizing video and distance information from the Kinect

Abstract

We investigate the performance of our audio-visual speech recognition system in both English and Greek under the influence of audio noise. We present the architecture of our recently built system that utilizes information from three streams including 3-D distance measurements. The feature extraction approach used is based on the discrete cosine transform and linear discriminant analysis. Data fusion is employed using state-synchronous hidden Markov models. Our experiments were conducted on our recently collected database under a multi-speaker configuration and resulted in higher performance and robustness in comparison to an audio-only recognizer. © 2013 Springer-Verlag.
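The paper itself is not reproduced here, but the pipeline the abstract describes — DCT-based visual features reduced with linear discriminant analysis, then exponent-weighted combination of per-stream likelihoods in state-synchronous HMMs — can be sketched roughly as follows. This is a minimal illustration assuming NumPy; the function names, the `keep` block size, and the toy LDA implementation are hypothetical, not taken from the authors' system.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2_features(roi, keep=6):
    # 2-D DCT of a mouth-region image; retain the low-frequency
    # top-left keep x keep block as the raw visual feature vector.
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def lda_projection(X, y, n_components):
    # Fisher LDA: directions maximizing between-class scatter
    # relative to within-class scatter (pseudo-inverse for stability).
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_components]]

def fuse_stream_loglikes(loglikes, weights):
    # State-synchronous fusion: the combined emission log-likelihood
    # of an HMM state is the exponent-weighted sum over streams
    # (audio, video, distance), with weights reflecting stream reliability.
    return sum(w * ll for w, ll in zip(weights, loglikes))
```

In a multi-stream setup like the one described, the stream weights would typically be tuned on held-out data, with the audio weight lowered as acoustic noise increases.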

Citation (APA)
Galatas, G., Potamianos, G., & Makedon, F. (2013). Robust multi-modal speech recognition in two languages utilizing video and distance information from the Kinect. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8007 LNCS, pp. 43–48). https://doi.org/10.1007/978-3-642-39330-3_5
