Abstract
Audio deep synthesis techniques can now generate high-quality speech whose authenticity is difficult for humans to judge. Meanwhile, many anti-spoofing systems have been developed to capture artifacts in synthesized speech that are imperceptible to human hearing, and a continuously escalating race between attack and defense in voice deepfakes has begun. To further improve the probability of successfully deceiving anti-spoofing systems, we propose a fully end-to-end, any-to-many voice conversion method based on a non-autoregressive structure, together with two lightweight but effective post-processing strategies: silence replacement and global noise perturbation. Experimental results show that the proposed method outperforms current baselines in fooling several state-of-the-art anti-spoofing systems. It also achieves better naturalness and speaker similarity, and therefore high deception performance against human listeners.
Cite
Hua, H., Chen, Z., Zhang, Y., Li, M., & Zhang, P. (2022). Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion. In DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia (pp. 93–100). Association for Computing Machinery, Inc. https://doi.org/10.1145/3552466.3556532