Building Real-Time Speech Recognition Without CMVN

Thai Son Nguyen; Matthias Sperber; Sebastian Stüker; Alex Waibel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11096 LNAI 451-460

DOI: 10.1007/978-3-319-99579-3_47

Abstract

Estimating cepstral mean and variance normalization (CMVN) in run-on and real-time settings poses several challenges. Using a moving average to estimate the mean and variance requires a comparatively long history of data from a speaker, which is not appropriate for short utterances or conversations. Using a pre-estimated global CMVN for all speakers instead reduces recognition performance because of the potential mismatch between training and testing data. This paper investigates how to build a real-time run-on speech recognition system using acoustic features without applying CMVN. We propose a feature extraction architecture that transforms unnormalized log mel features into normalized bottleneck features without using historical data. We show empirically that mean and variance normalization is not critical for training neural networks on speech data. Using the proposed feature extraction, we achieved a 4.1% word error rate reduction compared to global CMVN on the Skype conversations test set. We also reveal many cases in which features without zero mean can be learnt well by neural networks, which stands in contrast to prior work.
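
For orientation, the two baselines the abstract contrasts can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function names, the `decay` parameter, and the use of an exponential moving average as the "moving average" are all assumptions, and the proposed bottleneck feature extractor itself is not shown.

```python
import math

def global_cmvn(frames, mean, std):
    """Normalize each frame with statistics pre-estimated offline,
    e.g. on the training data (hypothetical helper)."""
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)]
            for frame in frames]

def running_cmvn(frames, decay=0.99, eps=1e-8):
    """Normalize frames online with exponentially decayed estimates.

    Assumption: an exponential moving average stands in for the
    moving average mentioned in the abstract.
    """
    dim = len(frames[0])
    mean = list(frames[0])   # initialise the mean from the first frame
    var = [1.0] * dim        # assumed neutral starting variance
    out = []
    for frame in frames:
        for d, x in enumerate(frame):
            # Update the running mean first, then the running variance
            # using the deviation from the updated mean.
            mean[d] = decay * mean[d] + (1.0 - decay) * x
            var[d] = decay * var[d] + (1.0 - decay) * (x - mean[d]) ** 2
        out.append([(x - m) / math.sqrt(v + eps)
                    for x, m, v in zip(frame, mean, var)])
    return out
```

With a decay close to 1, the running estimates only become reliable after many frames, which is the history requirement the abstract identifies as a problem for short utterances. The global variant needs no history, but applies the same statistics to every speaker.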

Citation (APA)

Nguyen, T. S., Sperber, M., Stüker, S., & Waibel, A. (2018). Building Real-Time Speech Recognition Without CMVN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 451–460). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_47
