Deep Learning Approaches for Speech Emotion Recognition

  • Bhavan A
  • Sharma M
  • Piplani M
  • et al.
N/ACitations
Citations of this article
13Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Learning the latent representation of data in unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition, however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success for generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion classification.

Cite

CITATION STYLE

APA

Bhavan, A., Sharma, M., Piplani, M., Chauhan, P., Hitkul, & Shah, R. R. (2020). Deep Learning Approaches for Speech Emotion Recognition (pp. 259–289). https://doi.org/10.1007/978-981-15-1216-2_10

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free