Clustering target speaker on a set of telephone dialogs

Abstract

The ability of a speaker’s voice model to capture a detailed parameterization of individual speech features is important for solving various biometric problems. In general, one of the main causes of performance degradation in voice biometric systems is the voice variability that occurs when the speaker’s state (emotional, physiological, etc.) or the channel conditions change. Accurate modeling of intra-speaker voice variability therefore leads to a more accurate voice model. This can be achieved by collecting multiple speech samples of the same speaker recorded under diverse conditions to create a so-called multi-session model. We consider the case in which the speech data consist of dialogues recorded in a single channel. This setup raises the problem of grouping the segments of a target speaker across the set of dialogues. We propose a clustering algorithm for this problem based on probabilistic linear discriminant analysis (PLDA). Our experiments demonstrate the effectiveness of the proposed approach compared with solutions based on exhaustive search.
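To make the grouping problem concrete, the following is a minimal sketch of the exhaustive-search style of baseline that the abstract mentions, not the proposed PLDA clustering algorithm. It assumes, purely for illustration, that each dialogue has already been split into two per-speaker embedding vectors and that the target speaker occurs in every dialogue. Cosine similarity stands in for a PLDA score, and the function names and toy vectors are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for a PLDA verification score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exhaustive_target_selection(dialogues, score=cosine):
    """Pick one speaker per dialogue so the picked set is mutually most similar.

    dialogues: list of (embedding_a, embedding_b) pairs, one pair per dialogue.
    Returns a tuple of indices (0 or 1), one per dialogue.
    All 2**N selections are scored, so the cost grows exponentially with N.
    """
    best_choice, best_total = None, float("-inf")
    for choice in product((0, 1), repeat=len(dialogues)):
        picked = [pair[i] for pair, i in zip(dialogues, choice)]
        # Sum of pairwise similarities within the candidate selection.
        total = sum(score(u, v) for u, v in combinations(picked, 2))
        if total > best_total:
            best_choice, best_total = choice, total
    return best_choice


# Toy data (hypothetical): the target speaker lies near the direction [1, 0].
dialogues = [
    ([1.0, 0.1], [0.0, 1.0]),
    ([-1.0, 0.2], [0.9, 0.2]),
    ([1.0, 0.0], [-0.5, -1.0]),
]
result = exhaustive_target_selection(dialogues)
print(result)
```

Because every one of the 2^N selections is scored, this kind of search quickly becomes impractical as the number of dialogues grows, which is the motivation for a clustering-based alternative.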

Cite

Shulipa, A., Sholohov, A., & Matveev, Y. (2017). Clustering target speaker on a set of telephone dialogs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10458 LNAI, pp. 244–252). Springer Verlag. https://doi.org/10.1007/978-3-319-66429-3_23
