In Automatic Speech Recognition (ASR), the acoustic model (AM) is commonly realized as a Deep Neural Network (DNN). The DNN learns posterior probabilities in a supervised fashion from input features and ground-truth labels. Current approaches combine a DNN with a Hidden Markov Model (HMM) in a hybrid system, which has achieved strong results in recent years. Comparable approaches using a discrete variant, i.e. a Discrete Hidden Markov Model (DHMM), have been largely disregarded in the recent past. Our approach revisits the idea of a discrete system, more precisely the so-called Deep Neural Network Quantizer (DNNQ), and demonstrates how a DNNQ is created and trained. We introduce a novel approach to train a DNNQ in a supervised fashion with an arbitrary output layer size, even though suitable target values are not available. The proposed method provides a mapping function that exploits fixed ground-truth labels. Consequently, we are able to apply frame-based cross-entropy (CE) training. Our experiments demonstrate that the DNNQ reduces the Word Error Rate (WER) by 17.6% on monophones and by 2.2% on triphones compared to a continuous HMM-Gaussian Mixture Model (GMM) system.
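To make the training idea concrete, the following is a minimal sketch of frame-based cross-entropy training against mapped targets. It assumes a simple many-to-one label mapping purely for illustration; the paper's actual mapping function, class counts, and output layer size are not specified here, so `map_labels`, the 40-class monophone inventory, and the output size of 256 are all hypothetical.

```python
import numpy as np

def map_labels(labels, num_classes, output_size):
    # Hypothetical mapping from fixed ground-truth labels to an
    # arbitrary-size output layer (NOT the paper's mapping function):
    # each of the num_classes labels is projected onto output_size units.
    return (labels * output_size) // num_classes

def frame_ce_loss(logits, targets):
    # Frame-based cross entropy: mean negative log-probability of the
    # target unit per frame, using a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 40, size=8)      # 8 frames, 40 monophone classes (assumed)
targets = map_labels(labels, 40, 256)     # arbitrary output layer size (assumed)
logits = rng.standard_normal((8, 256))    # stand-in for DNNQ output activations
loss = frame_ce_loss(logits, targets)
```

With such a mapping in place, each frame obtains a valid target index in the enlarged output layer, so standard per-frame CE training applies unchanged.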
Citation:
Watzel, T., Li, L., Kürzinger, L., & Rigoll, G. (2019). Deep neural network quantizers outperforming continuous speech recognition systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11658 LNAI, pp. 530–539). Springer Verlag. https://doi.org/10.1007/978-3-030-26061-3_54