Deep neural network quantizers outperforming continuous speech recognition systems


Abstract

In Automatic Speech Recognition (ASR), the acoustic model (AM) is typically realized as a Deep Neural Network (DNN). The DNN learns posterior probabilities in a supervised fashion from input features and ground-truth labels. Current approaches combine a DNN with a Hidden Markov Model (HMM) in a hybrid system, which has achieved good results in recent years. Similar approaches using a discrete version, i.e., a Discrete Hidden Markov Model (DHMM), have been largely disregarded in the recent past. Our approach revisits the idea of a discrete system, more precisely the so-called Deep Neural Network Quantizer (DNNQ), and demonstrates how a DNNQ is created and trained. We introduce a novel approach to train a DNNQ in a supervised fashion with an arbitrary output layer size, even though suitable target values are not available. The proposed method provides a mapping function that exploits fixed ground-truth labels, so frame-based cross-entropy (CE) training can be applied. Our experiments demonstrate that the DNNQ reduces the Word Error Rate (WER) by 17.6% on monophones and by 2.2% on triphones compared to a continuous HMM-Gaussian Mixture Model (GMM) system.
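The abstract does not spell out the mapping function or the network architecture, but a minimal sketch of what frame-based cross-entropy training of a DNN quantizer could look like is given below (PyTorch assumed; the class name DNNQ, the helper map_labels_to_codebook, the modulo mapping, and all layer sizes are illustrative placeholders, not the authors' design):

```python
# Hypothetical sketch of frame-based CE training for a DNN quantizer.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class DNNQ(nn.Module):
    """DNN quantizer: maps acoustic feature frames to logits over an
    arbitrary-size discrete codebook (the softmax output layer)."""
    def __init__(self, feat_dim: int, hidden_dim: int, codebook_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, codebook_size),  # logits over K symbols
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

    def quantize(self, frames: torch.Tensor) -> torch.Tensor:
        # One discrete symbol per frame; these symbols would serve as
        # the observation sequence for a discrete HMM (DHMM).
        return self.forward(frames).argmax(dim=-1)

def map_labels_to_codebook(state_labels: torch.Tensor,
                           codebook_size: int) -> torch.Tensor:
    # Placeholder mapping function: folds fixed ground-truth HMM state
    # labels onto the codebook so frame-level CE targets exist even when
    # the output layer size differs from the number of states. The paper's
    # actual mapping is not specified in the abstract.
    return state_labels % codebook_size

# One frame-based cross-entropy training step (sketch).
model = DNNQ(feat_dim=40, hidden_dim=512, codebook_size=256)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

frames = torch.randn(32, 40)                    # batch of acoustic frames
state_labels = torch.randint(0, 1000, (32,))    # ground-truth HMM state ids
targets = map_labels_to_codebook(state_labels, codebook_size=256)

optimizer.zero_grad()
loss = criterion(model(frames), targets)
loss.backward()
optimizer.step()
```

After training, quantize() would emit one codebook symbol per frame, which is what a DHMM consumes in place of the continuous GMM likelihoods.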

Citation (APA)

Watzel, T., Li, L., Kürzinger, L., & Rigoll, G. (2019). Deep neural network quantizers outperforming continuous speech recognition systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11658 LNAI, pp. 530–539). Springer Verlag. https://doi.org/10.1007/978-3-030-26061-3_54
