Covariance based deep feature for text-dependent speaker verification


Abstract

The d-vector approach has achieved impressive results in speaker verification. An utterance-level representation is obtained by averaging the frame-level outputs of a hidden layer of the DNN. Although this mean-based speaker identity representation performs well, it ignores the variability of frames across the utterance, which leads to information loss. This is particularly serious for text-dependent speaker verification, where within-utterance feature variability reflects text variability better than the mean does. To address this issue, a new covariance-based speaker representation is proposed in this paper: the covariance of the frame-level outputs is calculated and incorporated into the speaker identity representation. The proposed approach is investigated within a joint multi-task learning framework for text-dependent speaker verification. Experiments on RSR2015 and RedDots show that covariance-based deep features significantly improve performance compared to traditional mean-based deep features.
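The pooling step the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the authors' exact formulation: the function name, the upper-triangle vectorization of the covariance matrix, and the concatenation with the mean are assumptions made for the sketch.

```python
import numpy as np

def utterance_embedding(frame_outputs, use_covariance=True):
    """Pool frame-level DNN hidden-layer outputs into an
    utterance-level speaker representation (illustrative sketch).

    frame_outputs: (T, D) array of hidden-layer activations
    for T frames. Returns the mean vector (classic d-vector
    pooling), optionally extended with covariance statistics.
    """
    mean = frame_outputs.mean(axis=0)          # (D,) mean-based d-vector
    if not use_covariance:
        return mean
    # Covariance of the frames captures within-utterance variability
    # that mean pooling discards.
    cov = np.cov(frame_outputs, rowvar=False)  # (D, D), symmetric
    # Keep only the upper triangle, since the matrix is symmetric.
    iu = np.triu_indices(cov.shape[0])
    return np.concatenate([mean, cov[iu]])     # (D + D*(D+1)/2,)

# Example: 200 frames of a 64-dimensional hidden layer.
T, D = 200, 64
frames = np.random.randn(T, D)
emb = utterance_embedding(frames)
print(emb.shape)  # (2144,) = 64 + 64*65/2
```

In practice the frame-level outputs would come from a trained DNN rather than random data, and the covariance statistics could be incorporated in other ways (e.g. diagonal only); the concatenation above is just one plausible choice.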

Citation (APA)

Wang, S., Dinkel, H., Qian, Y., & Yu, K. (2018). Covariance based deep feature for text-dependent speaker verification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11266 LNCS, pp. 231–242). Springer Verlag. https://doi.org/10.1007/978-3-030-02698-1_20
