GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features

  • Aihara R
  • Takashima R
  • Takiguchi T
  • et al.

Abstract

We propose Gaussian Mixture Model (GMM)-based emotional voice conversion using spectrum and prosody features. In recent years, speech recognition and synthesis techniques have advanced, and an emotional voice conversion technique is required for synthesizing more expressive voices. Conventional emotional conversion has been based on transforming neutral prosody into emotional prosody using a huge speech corpus. In this paper, we convert a neutral voice to an emotional voice using GMMs. GMM-based spectrum conversion is widely used to modify non-linguistic information such as voice characteristics while keeping linguistic information unchanged. Because conventional methods convert either prosody or voice quality (spectrum), but not both, some emotions are not converted well. In our method, both prosody and voice quality are used for converting a neutral voice to an emotional voice, which yields more expressive voices than conventional methods that convert prosody or spectrum alone.
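
As a rough illustration of what a GMM-based mapping from a neutral feature to an emotional feature can look like, the sketch below implements the standard joint-density GMM regression for a single scalar feature (for example, a log-F0 value). It is not taken from the paper and may differ from the authors' formulation: the function names, the dictionary layout, and all parameter values are hypothetical, and a real system would estimate the parameters from parallel neutral and emotional recordings.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a source value x to the expected target value E[y | x].

    Each component is a dict (hypothetical layout) describing one Gaussian
    of a joint source/target model:
      w   - mixture weight
      mx  - mean of the source (neutral) feature
      my  - mean of the target (emotional) feature
      sxx - variance of the source feature
      sxy - covariance between source and target features
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["sxx"]) for c in components]
    total = sum(likelihoods)

    converted = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        # Per-component linear regression from source to target.
        local = c["my"] + c["sxy"] / c["sxx"] * (x - c["mx"])
        converted += posterior * local
    return converted


# Made-up parameters for illustration only; not values from the paper.
TOY_COMPONENTS = [
    {"w": 0.5, "mx": 4.8, "my": 5.0, "sxx": 0.01, "sxy": 0.008},
    {"w": 0.5, "mx": 5.3, "my": 5.6, "sxx": 0.01, "sxy": 0.012},
]

if __name__ == "__main__":
    print(gmm_convert(4.8, TOY_COMPONENTS))
```

The same form extends to spectral feature vectors by replacing the scalar variances and covariances with matrices; applying such a mapping to both prosody and spectrum features is the general idea the abstract describes.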

Cite

Aihara, R., Takashima, R., Takiguchi, T., & Ariki, Y. (2012). GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features. American Journal of Signal Processing, 2(5), 134–138. https://doi.org/10.5923/j.ajsp.20120205.06
