Improved speaker adaptation by combining i-vector and fMLLR with deep bottleneck networks

Abstract

This paper investigates how deep bottleneck neural networks can be used to combine the benefits of both i-vectors and speaker-adaptive feature transformations. We show how a GMM-based speech recognizer can be greatly improved by applying a feature-space maximum likelihood linear regression (fMLLR) transformation to the outputs of a deep bottleneck neural network trained on a concatenation of regular Mel filterbank features and speaker i-vectors. The addition of the i-vectors reduces the word error rate of the GMM system by 3–7% relative compared to an identical system without i-vectors. We also examine Deep Neural Network (DNN) systems trained on various combinations of i-vectors, fMLLR-transformed bottleneck features, and other feature-space transformations. The best approach results in speaker-adapted DNNs, which show a 15–19% relative improvement over a strong speaker-independent DNN baseline.
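The core input construction described in the abstract — appending a fixed per-speaker i-vector to every frame of Mel filterbank features before feeding the bottleneck network — can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `append_ivector` and the feature dimensions (40-dim filterbanks, 100-dim i-vector) are assumptions for the example.

```python
import numpy as np

def append_ivector(fbank: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Tile a per-speaker i-vector across all frames and concatenate
    it onto the frame-level filterbank features.

    fbank   : (num_frames, fbank_dim) Mel filterbank features
    ivector : (ivec_dim,) speaker i-vector (constant per speaker)
    returns : (num_frames, fbank_dim + ivec_dim) network input
    """
    num_frames = fbank.shape[0]
    tiled = np.tile(ivector, (num_frames, 1))  # same i-vector on every frame
    return np.hstack([fbank, tiled])

# Hypothetical shapes: 100 frames of 40-dim features, 100-dim i-vector.
fbank = np.random.randn(100, 40)
ivec = np.random.randn(100)
x = append_ivector(fbank, ivec)
print(x.shape)  # (100, 140)
```

Because the i-vector is constant over an utterance, the network sees the same speaker code at every frame, letting the bottleneck layer learn a speaker-normalized representation that the subsequent fMLLR transform can refine further.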

Citation (APA)
Nguyen, T. S., Kilgour, K., Sperber, M., & Waibel, A. (2017). Improved speaker adaptation by combining i-vector and fmllr with deep bottleneck networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10458 LNAI, pp. 417–426). Springer Verlag. https://doi.org/10.1007/978-3-319-66429-3_41
