Data augmentation and teacher-student training for LF-MMI based robust speech recognition

Abstract

Deep neural networks (DNNs) have played a key role in the development of state-of-the-art speech recognition systems. In recent years, the lattice-free MMI objective (LF-MMI) has become a popular method for training DNN acoustic models. However, domain adaptation of DNNs from clean to noisy data remains a challenging problem. In this paper, we compare and combine two methods for adapting LF-MMI-based models to a noisy domain that do not require transcribed noisy data: multi-condition training and teacher-student style domain adaptation. For teacher-student training, we use lattices obtained by decoding untranscribed clean speech as supervision for adapting the model to the noisy domain. For noise augmentation in both multi-condition and teacher-student training, we use in-domain noise extracted from a large untranscribed speech corpus via voice activity detection. We show that combining multi-condition training and lattice-based teacher-student training gives better results than either method alone. Furthermore, we show the benefits of using in-domain noise instead of general noise profiles for noise augmentation. Overall, we obtain a 7.4% relative improvement in word error rate over a standard multi-condition baseline.
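The noise-augmentation step described above can be illustrated with a minimal sketch: additively mixing an in-domain noise segment into a clean utterance at a target signal-to-noise ratio. This is a generic, hypothetical illustration of the multi-condition augmentation idea, not the paper's actual (Kaldi-based) pipeline; the function name and interface are assumptions.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng=None):
    """Additively mix a noise segment into clean speech at a target SNR (dB).

    Generic sketch of the noise-augmentation step used in multi-condition
    training; the paper's exact pipeline (with in-domain noise extracted
    via voice activity detection) is not reproduced here.
    """
    rng = rng or np.random.default_rng()

    # Tile the noise if it is shorter than the utterance, then pick a
    # random segment of the same length as the clean signal.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]

    # Scale the noise so that the mixture has the requested SNR.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / max(noise_power, 1e-12))

    return clean + noise
```

In practice, the SNR would be drawn at random per utterance so the student model sees a range of noise conditions during training.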

Citation (APA)

Asadullah, & Alumäe, T. (2018). Data augmentation and teacher-student training for LF-MMI based robust speech recognition. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 403–410). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_43
