Voice-activity and overlapped speech detection using x-vectors


Abstract

X-vectors are features extracted from speech signals by pretrained deep neural networks so that they discriminate well among different speakers. Their main application lies in speaker identification and verification. This manuscript studies which other properties are encoded in x-vectors. The focus lies on distinguishing speech from noise, and utterances of a single speaker from overlapped speech. We attempt to show that the x-vector network is capable of extracting multi-purpose features that can be used by several simple back-end classifiers. This allows a common feature-extracting front-end to serve voice-activity detection, overlapped speech detection, and speaker identification. Compared to the alternative strategy, which is to train an independent classifier with its own feature-extracting layers for each task, the common front-end saves computational time during both the training and test phases.
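The shared front-end idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the "front-end" here is a frozen random projection with mean and standard-deviation pooling that merely stands in for a pretrained x-vector network, the logistic-regression heads are one assumed choice of "simple back-end classifier", and all names, dimensions, and the toy data are hypothetical. The point is only that the embeddings are computed once and then reused by both detection heads.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 8   # hypothetical per-frame feature size
EMB_DIM = 16   # hypothetical embedding size (real x-vectors are larger)

# Stand-in for the pretrained x-vector network (an assumption for illustration):
# a frozen random projection followed by mean/std pooling over frames.
W_FRONT = rng.standard_normal((FEAT_DIM, EMB_DIM // 2))


def extract_embedding(frames):
    """Map a (num_frames, FEAT_DIM) segment to one fixed-size embedding."""
    hidden = np.tanh(frames @ W_FRONT)
    return np.concatenate([hidden.mean(axis=0), hidden.std(axis=0)])


def train_logistic_head(X, y, lr=0.5, epochs=300):
    """Fit a binary logistic-regression back-end with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(head, X):
    """Return hard 0/1 decisions from a trained head."""
    w, b = head
    return (X @ w + b > 0).astype(int)


def make_segment(scale):
    """Toy segment: Gaussian frames whose scale mimics the signal class."""
    return scale * rng.standard_normal((50, FEAT_DIM))


# Toy classes (fabricated data): noise, single speaker, overlapped speech.
scales = [0.1] * 20 + [1.0] * 20 + [2.0] * 20
y_vad = np.array([0] * 20 + [1] * 40, dtype=float)  # speech vs. noise
y_osd = np.array([0] * 40 + [1] * 20, dtype=float)  # overlap vs. no overlap

# The front-end runs once; both back-ends reuse the same embeddings.
X = np.stack([extract_embedding(make_segment(s)) for s in scales])

vad_head = train_logistic_head(X, y_vad)
osd_head = train_logistic_head(X, y_osd)

vad_pred = predict(vad_head, X)
osd_pred = predict(osd_head, X)
```

In this arrangement, adding another task costs only one more lightweight head trained on the cached matrix `X`, whereas independent classifiers would each repeat the feature-extraction work.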


Málek, J., & Žďánský, J. (2020). Voice-activity and overlapped speech detection using x-vectors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12284 LNAI, pp. 366–376). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58323-1_40
