Extraction of features for lip-reading using autoencoders

Karel Paleček

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8773 209-216

DOI: 10.1007/978-3-319-11581-8_26

Abstract

We study the incorporation of facial depth data into the task of isolated-word visual speech recognition. We propose novel features based on unsupervised training of a single-layer autoencoder. The features are extracted from both the video and depth channels obtained by a Microsoft Kinect device. We perform all experiments on our database of 54 speakers, each uttering 50 words. We compare our autoencoder features to traditional methods such as DCT or PCA. The features are further processed by a simplified variant of hierarchical linear discriminant analysis in order to capture the speech dynamics. The classification is performed using a multi-stream Hidden Markov Model for various combinations of audio, video, and depth channels. We also evaluate visual features in joint audio-video isolated-word recognition in noisy environments.
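
The abstract does not give the autoencoder's architecture or training details, so the following is only a minimal sketch of the general idea: a single-hidden-layer autoencoder trained without labels, whose hidden activations are then used as features. Everything specific here is an assumption for illustration, not the paper's method: the function names `train_autoencoder` and `encode`, sigmoid units, the squared-error loss, plain batch gradient descent, and the hyperparameters. The input is assumed to be flattened mouth-region patches (video or depth) scaled to [0, 1].

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, lr=0.1, epochs=300, seed=0):
    """Train a single-hidden-layer autoencoder by batch gradient descent.

    X: (n_samples, n_inputs) array with values in [0, 1] (assumed: flattened
       mouth-region patches from the video or depth channel).
    Returns the encoder weights W1, encoder bias b1, and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))  # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # hidden code (the features)
        Y = sigmoid(H @ W2 + b2)   # reconstruction of the input
        err = Y - X
        # Loss: half the squared reconstruction error, averaged over samples.
        losses.append(0.5 * float(np.sum(err ** 2)) / n)
        # Backpropagate through the output and hidden sigmoid layers.
        dZ2 = err * Y * (1.0 - Y) / n
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, losses


def encode(X, W1, b1):
    """Map inputs to hidden-layer activations, used as the feature vector."""
    return sigmoid(X @ W1 + b1)
```

In a pipeline like the one the abstract outlines, the output of `encode` for each frame would take the place of DCT or PCA coefficients before any further processing; the decoder is needed only during training.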

Citation (APA)

Paleček, K. (2014). Extraction of features for lip-reading using autoencoders. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8773, pp. 209–216). Springer Verlag. https://doi.org/10.1007/978-3-319-11581-8_26
