Zero-Shot Unseen Speaker Anonymization via Voice Conversion

Abstract

Speech-based interfaces provide a convenient way to control smart devices. For these interfaces to work reliably, a large amount of speech data covering varied noise conditions and speaker characteristics must be collected to train the underlying speech-processing models. Gathering spoken commands from actual device users can improve performance by adapting each device to the acoustic characteristics of its own user's speech. However, collecting spoken commands directly could threaten users' privacy, because the recordings contain sensitive speaker-specific information. Speaker anonymization algorithms can suppress such information while preserving the linguistic content of a user's speech. Previous speaker anonymization algorithms could handle only the voices of speakers who contributed to the training datasets. Because speaker anonymization is typically applied to new speakers who are absent from the training datasets (commonly called 'unseen speakers'), a method for handling such speakers is needed. In this paper, we propose a novel method that effectively suppresses the individual characteristics of an unseen speaker's voice while retaining the linguistic content of the speech. The method adopts zero-shot voice conversion for unseen speaker anonymization. Because it relies on the speaker identity vectors commonly used in many-to-many voice conversion algorithms and does not modify the conversion algorithm itself, it can be easily combined with many other voice conversion algorithms. The proposed method is evaluated on the VCC2018 and VCTK corpora, using speaker identification rate and speech recognition rate for quantitative analysis. The experimental results show that, after anonymization by the proposed method, the average speaker identification accuracy decreased by 92.3 percentage points and the average speech recognition accuracy decreased by 17.7 percentage points, both in absolute terms.
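
The reported decreases are absolute differences in accuracy, not relative reductions. The short Python sketch below illustrates the distinction. The function names and the accuracy values are hypothetical and chosen only for illustration; they are not figures from the paper, which does not state the baseline accuracies in this abstract.

```python
def absolute_drop(before_pct: float, after_pct: float) -> float:
    """Decrease in accuracy measured in percentage points (absolute)."""
    return before_pct - after_pct


def relative_drop(before_pct: float, after_pct: float) -> float:
    """Decrease in accuracy as a percentage of the starting accuracy (relative)."""
    return 100.0 * (before_pct - after_pct) / before_pct


if __name__ == "__main__":
    # Hypothetical accuracies (in percent), NOT results from the paper.
    before, after = 80.0, 60.0
    print("absolute drop (percentage points):", absolute_drop(before, after))
    print("relative drop (percent of baseline):", relative_drop(before, after))
```

With these made-up values the same change reads as 20 percentage points in absolute terms but 25% in relative terms, which is why the abstract specifies that its figures are absolute.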

Cite

Chang, H. P., Yoo, I. C., Jeong, C., & Yook, D. (2022). Zero-Shot Unseen Speaker Anonymization via Voice Conversion. IEEE Access, 10, 130190–130199. https://doi.org/10.1109/ACCESS.2022.3227963
