A Hybrid Technique using CNN LSTM for Speech Emotion Recognition

  • Qazi H
  • et al.


Automatic speech emotion recognition is essential for effective human-computer interaction. This paper uses spectrograms as inputs to a hybrid deep convolutional LSTM network for speech emotion recognition. The proposed model uses four convolutional layers for high-level feature extraction from the input spectrograms, an LSTM layer for capturing long-term dependencies, and finally two dense layers. Experimental results on the SAVEE database show promising performance: the proposed model obtained an accuracy of 94.26%.
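The layer ordering described in the abstract (four convolutional layers, one LSTM layer, two dense layers) can be illustrated with a minimal shape trace. This is only a sketch under stated assumptions, not the authors' implementation: the abstract gives no hyperparameters, so the input size, channel counts, 2x2 pooling after each convolution, LSTM and dense widths, and the helper names `ConvBlock`, `conv_block_shape`, and `trace_cnn_lstm` are all hypothetical. The seven output classes are assumed from the seven emotion categories of the SAVEE database.

```python
from dataclasses import dataclass


@dataclass
class ConvBlock:
    """One convolutional stage: 'same'-padded conv followed by max pooling."""
    out_channels: int  # assumed value, not given in the abstract
    pool: int = 2      # assumed 2x2 max pooling


def conv_block_shape(shape, block):
    """Map (channels, freq, time) to the shape after one conv block.

    A 'same'-padded convolution keeps freq and time unchanged, so only
    the pooling step shrinks the two spatial axes.
    """
    _, freq, time = shape
    return (block.out_channels, freq // block.pool, time // block.pool)


def trace_cnn_lstm(input_shape, blocks, lstm_units, dense_units, n_classes):
    """Return (layer name, output shape) pairs for a CNN-LSTM stack."""
    shapes = [("input", input_shape)]
    shape = input_shape
    for i, block in enumerate(blocks, start=1):
        shape = conv_block_shape(shape, block)
        shapes.append((f"conv{i}", shape))

    # Treat the time axis as the LSTM sequence dimension and flatten
    # channels x frequency into one feature vector per time step.
    channels, freq, time = shape
    shapes.append(("sequence", (time, channels * freq)))

    # The LSTM summarises the sequence in its final hidden state,
    # which feeds the two dense layers.
    shapes.append(("lstm", (lstm_units,)))
    shapes.append(("dense1", (dense_units,)))
    shapes.append(("dense2", (n_classes,)))
    return shapes


# Hypothetical configuration: 1-channel spectrogram, 128 frequency bins, 256 frames.
BLOCKS = [ConvBlock(16), ConvBlock(32), ConvBlock(64), ConvBlock(128)]
trace = trace_cnn_lstm((1, 128, 256), BLOCKS,
                       lstm_units=128, dense_units=64, n_classes=7)

for name, shape in trace:
    print(name, shape)
```

Under these assumed sizes, the four convolutional stages reduce the spectrogram to 16 time steps of 1024 features each, which is the sequence the LSTM consumes before the two dense layers produce one score per emotion class.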




Qazi, H., & Kaushik, B. N. (2020). A Hybrid Technique using CNN LSTM for Speech Emotion Recognition. International Journal of Engineering and Advanced Technology, 9(5), 1126–1130. https://doi.org/10.35940/ijeat.e1027.069520
