In general, artificial intelligence does not exhibit distinct emotional variation, which makes it difficult to convey empathy in communication with humans. If frequency modification is applied to a neutral-emotion voice, or a different emotional frequency is added to it, artificial intelligence with emotional expression becomes possible. This study proposes emotion conversion using voice frequency synthesis based on a Generative Adversarial Network (GAN). The proposed method extracts frequency features from the speech data of twenty-four actors and actresses; that is, it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotions. It then generates a frequency with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) to model prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by applying Amplitude Scaling. Using spectral conversion on a logarithmic scale, the frequency is converted in consideration of human auditory characteristics. Accordingly, the proposed technique provides speech emotion conversion so that artificially generated voices and speeches can express emotions.
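The Amplitude Scaling correction on a logarithmic scale described above can be illustrated with a minimal sketch. The function below, a hypothetical helper not taken from the paper, rescales a converted fundamental-frequency (F0) contour in the log domain so that its statistics match those of a target emotion; the paper's exact Amplitude Scaling procedure may differ in detail.

```python
import numpy as np

def amplitude_scale_f0(f0_converted, target_log_stats, eps=1e-8):
    """Rescale an F0 contour in the log domain toward target-emotion statistics.

    f0_converted: 1-D array of F0 values in Hz; 0 marks unvoiced frames.
    target_log_stats: (mean, std) of log-F0 for the target emotion.
    Illustrative sketch only, not the paper's exact algorithm.
    """
    voiced = f0_converted > 0
    log_f0 = np.log(f0_converted[voiced] + eps)
    mu_src, sigma_src = log_f0.mean(), log_f0.std()
    mu_tgt, sigma_tgt = target_log_stats
    # Standardize in log scale, then map onto the target emotion's statistics.
    scaled = (log_f0 - mu_src) / (sigma_src + eps) * sigma_tgt + mu_tgt
    out = f0_converted.copy().astype(float)
    out[voiced] = np.exp(scaled)  # unvoiced frames remain 0
    return out
```

Working in the log domain matches the abstract's point about human hearing: pitch perception is roughly logarithmic in frequency, so a linear transform of log-F0 shifts and stretches pitch in a perceptually uniform way.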
Kwon, H. J., Kim, M. J., Baek, J. W., & Chung, K. (2022). Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation. In KSII Transactions on Internet and Information Systems (Vol. 16, pp. 713–725). Korean Society for Internet Information. https://doi.org/10.3837/tiis.2022.02.018