We propose Gaussian Mixture Model (GMM)-based emotional voice conversion using spectrum and prosody features. In recent years, speech recognition and synthesis techniques have advanced, and emotional voice conversion is needed to synthesize more expressive voices. Conventional emotional conversion transformed neutral prosody into emotional prosody using a huge speech corpus. In this paper, we convert a neutral voice to an emotional voice using GMMs. GMM-based spectrum conversion is widely used to modify non-linguistic information such as voice characteristics while keeping linguistic information unchanged. Because conventional methods convert either prosody or voice quality (spectrum) alone, some emotions are not converted well. Our method uses both prosody and voice quality to convert a neutral voice to an emotional voice, and it yields more expressive voices than conventional methods that convert only prosody or only spectrum.
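The abstract does not spell out the mapping function. As a rough illustration only, the sketch below uses a common formulation of GMM-based feature conversion: a joint-density GMM over paired source and target features, with the converted value taken as the posterior-weighted conditional mean. It is reduced to scalar features for readability, whereas practical systems work on multi-dimensional spectral and prosodic vectors. The function names, the dictionary keys, and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a scalar source feature x (e.g. neutral) to a target feature (e.g. emotional).

    Each component is a dict with hypothetical keys:
      weight  - mixture weight
      mu_x    - source mean
      mu_y    - target mean
      var_x   - source variance
      cov_yx  - covariance between target and source
    These would normally come from a GMM trained on time-aligned source/target pairs.
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    # Posterior probability of each component given x.
    posteriors = [lik / total for lik in likelihoods]
    # Posterior-weighted sum of the per-component conditional means E[y | x, m].
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Made-up parameters for a two-component model, for illustration only.
example_components = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_yx": 0.0},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 1.0, "cov_yx": 0.0},
]
converted = gmm_convert(0.0, example_components)
```

In a system like the one described, a mapping of this kind would presumably be applied frame by frame to the spectral features and to the prosodic features. This sketch also assumes the input lies close enough to the training data that the likelihoods do not underflow to zero.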
CITATION STYLE
Aihara, R., Takashima, R., Takiguchi, T., & Ariki, Y. (2012). GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features. American Journal of Signal Processing, 2(5), 134–138. https://doi.org/10.5923/j.ajsp.20120205.06