Abstract
Deep neural networks (DNNs) have played a key role in the development of state-of-the-art speech recognition systems. In recent years, the lattice-free maximum mutual information (LF-MMI) objective has become a popular method for training DNN acoustic models. However, domain adaptation of DNNs from clean to noisy data remains a challenging problem. In this paper, we compare and combine two methods for adapting LF-MMI-based models to a noisy domain that do not require transcribed noisy data: multi-condition training and teacher-student style domain adaptation. For teacher-student training, we use lattices obtained by decoding untranscribed clean speech as supervision for adapting the model to the noisy domain. For noise augmentation in both multi-condition and teacher-student training, we use in-domain noise extracted from a large untranscribed speech corpus using voice activity detection. We show that combining multi-condition training and lattice-based teacher-student training gives better results than either method alone. Furthermore, we show the benefits of using in-domain noise instead of general noise profiles for noise augmentation. Overall, we obtain a 7.4% relative improvement in word error rate over a standard multi-condition baseline.
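The abstract does not spell out how in-domain noise is extracted or mixed in; the sketch below is only an illustration of that idea, not the paper's implementation. It uses a crude energy-based stand-in for a real voice activity detector to collect non-speech samples from untranscribed in-domain audio and then mixes them into a clean utterance at a random SNR, as one would for multi-condition noise augmentation. All function names, frame sizes, thresholds, and the SNR range are assumptions.

```python
# Illustrative sketch (not from the paper): extract in-domain "noise" with a
# simple energy-based VAD and mix it into clean speech at a random SNR.
import numpy as np

def extract_noise_segments(audio, sr, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Collect samples from frames whose energy is more than |threshold_db| dB
    below the loudest frame -- a crude stand-in for a real VAD."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies, starts = [], []
    for start in range(0, len(audio) - frame, hop):
        seg = audio[start:start + frame]
        energies.append(10 * np.log10(np.mean(seg ** 2) + 1e-12))
        starts.append(start)
    peak = max(energies)
    noise = [audio[s:s + frame] for s, e in zip(starts, energies)
             if e < peak + threshold_db]
    return np.concatenate(noise) if noise else np.zeros(0)

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the resulting SNR is `snr_db` dB."""
    assert len(noise) > 0, "no noise segments were extracted"
    if len(noise) < len(clean):  # tile noise to cover the whole utterance
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    clean_pow = np.mean(clean ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Usage with synthetic placeholder signals (hypothetical data, 16 kHz):
rng = np.random.default_rng(0)
sr = 16000
clean = rng.standard_normal(sr * 3) * 0.1                 # stands in for a clean utterance
speech_like = rng.standard_normal(sr * 10) * 0.1          # "speech" part of in-domain audio
near_silence = rng.standard_normal(sr * 10) * 0.0005      # low-energy background to be harvested
in_domain = np.concatenate([speech_like, near_silence])   # untranscribed in-domain recording
noise = extract_noise_segments(in_domain, sr)
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(5, 20))
```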
Citation
Asadullah, & Alumäe, T. (2018). Data augmentation and teacher-student training for LF-MMI based robust speech recognition. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 403–410). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_43