Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion

Abstract

Audio deep synthesis techniques can now generate high-quality speech whose authenticity is difficult for humans to judge. Meanwhile, many anti-spoofing systems have been developed to capture artifacts in synthesized speech that are imperceptible to human hearing, and a continuously escalating race of 'attacking and defending' in voice deepfakes has begun. To further improve the probability of successfully deceiving anti-spoofing systems, we propose a fully end-to-end, any-to-many voice conversion method based on a non-autoregressive structure, with the addition of two lightweight but effective post-processing strategies, namely silence replacement and global noise perturbation. Experimental results show that the proposed method outperforms current baselines in fooling several state-of-the-art anti-spoofing systems. It also achieves better naturalness and speaker similarity, giving it high deception performance against human listeners as well.
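
The abstract names the two post-processing strategies but does not describe how they work. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that silence replacement copies samples from a time-aligned reference recording over low-energy frames of the converted waveform, and that global noise perturbation adds white Gaussian noise to every sample at a fixed signal-to-noise ratio. The function names, the 400-sample frame length, the -40 dB threshold, and the 40 dB SNR are all hypothetical values chosen for illustration.

```python
import math
import random


def rms_db(frame):
    """Mean power of a frame in dB relative to full scale (amplitude 1.0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log10(0)


def replace_silence(converted, reference, frame_len=400, threshold_db=-40.0):
    """Assumed reading of 'silence replacement': overwrite low-energy frames
    of `converted` with the time-aligned samples of `reference`."""
    out = list(converted)
    n = min(len(converted), len(reference))
    # Non-overlapping frames; a trailing partial frame is left untouched.
    for start in range(0, n - frame_len + 1, frame_len):
        end = start + frame_len
        if rms_db(converted[start:end]) < threshold_db:
            out[start:end] = reference[start:end]
    return out


def add_global_noise(wave, snr_db=40.0, seed=None):
    """Assumed reading of 'global noise perturbation': add white Gaussian
    noise to every sample, scaled to the requested signal-to-noise ratio."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [x + rng.gauss(0.0, noise_std) for x in wave]
```

Under these assumptions, the two steps would be chained as add_global_noise(replace_silence(converted, reference)), with both waveforms given as lists of floats in the range [-1, 1] at the same sampling rate.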

Cite

Hua, H., Chen, Z., Zhang, Y., Li, M., & Zhang, P. (2022). Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion. In DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia (pp. 93–100). Association for Computing Machinery, Inc. https://doi.org/10.1145/3552466.3556532
