Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential

This article is free to access.

Abstract

This paper presents a novel intra-gender statistical singing voice conversion (SVC) technique with direct waveform modification based on the log-spectrum differential (DIFFSVC). The technique converts the voice timbre of a source singer into that of a target singer without vocoder-based waveform generation of the converted singing voice. SVC makes it possible to convert the singing voice characteristics of an arbitrary source singer into those of an arbitrary target singer by converting acoustic features such as F0, aperiodicity, and spectral features with a statistical conversion function. However, the sound quality of the converted singing voice is typically degraded compared with that of a natural singing voice, owing to factors such as analysis and modeling errors in the vocoding process and over-smoothing of the converted feature trajectory. To alleviate this degradation, we propose a statistical conversion process that directly modifies the signal in the waveform domain by estimating the difference between the spectra of the source and target singers’ singing voices. We also propose two techniques for the DIFFSVC method: 1) derivation of a differential Gaussian mixture model (DIFFGMM) from a conventional Gaussian mixture model (GMM) and 2) a parameter generation algorithm that considers the global variance (GV). The experimental results demonstrate that the proposed DIFFSVC methods significantly improve the sound quality of the converted singing voice while preserving the conversion accuracy of the singer’s identity compared with conventional SVC.
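The abstract does not spell out how the DIFFGMM is derived, so the following is only a minimal sketch of one way such a derivation can work. It assumes that the conventional GMM models the joint density of source features x and target features y, and that the differential is d = y - x. Under those assumptions, [x; d] is a linear function of [x; y], so each Gaussian component can be transformed analytically without retraining. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def diff_component(mean_xy, cov_xy):
    """Map one joint Gaussian over [x; y] to one over [x; d], with d = y - x."""
    dim = mean_xy.shape[0] // 2
    eye = np.eye(dim)
    zero = np.zeros((dim, dim))
    # Linear map A such that [x; d] = A @ [x; y] (assumed definition of d).
    transform = np.block([[eye, zero], [-eye, eye]])
    mean_xd = transform @ mean_xy
    cov_xd = transform @ cov_xy @ transform.T
    return mean_xd, cov_xd


def diffgmm_from_gmm(weights, means, covs):
    """Transform every mixture component; the mixture weights are unchanged."""
    pairs = [diff_component(m, c) for m, c in zip(means, covs)]
    diff_means = np.stack([m for m, _ in pairs])
    diff_covs = np.stack([c for _, c in pairs])
    return np.asarray(weights, dtype=float), diff_means, diff_covs


# Toy single-component GMM with 1-dimensional features (hypothetical numbers).
weights = [1.0]
means = np.array([[1.0, 3.0]])      # [mu_x, mu_y]
covs = np.array([[[2.0, 1.0],
                  [1.0, 4.0]]])     # [[S_xx, S_xy], [S_yx, S_yy]]

diff_weights, diff_means, diff_covs = diffgmm_from_gmm(weights, means, covs)
```

For the toy component, the differential has mean mu_y - mu_x = 2 and variance S_xx + S_yy - 2 S_xy = 4, and its covariance with the source is S_xy - S_xx = -1. In a full system the differential estimated from such a model would then drive a filter applied to the source waveform; that filtering step is not shown here.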

Cite

Kobayashi, K., Toda, T., & Nakamura, S. (2018). Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Communication, 99, 211–220. https://doi.org/10.1016/j.specom.2018.03.011
