Emotions are essential in conveying meaning between interlocutors during social interactions. Hence, recognising emotions is paramount to building a good, natural affective system that can interact with human interlocutors. However, recognising emotions from social interactions requires temporal information in order to classify the emotions correctly. This research proposes an architecture that extracts temporal information from the speech modality using a temporal Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) architecture. Several combinations and settings of the architectures were explored and are presented in the paper. The results show that the best classifier was achieved by a model trained with four CNN layers combined with one Bidirectional LSTM layer. Furthermore, the model was trained on an augmented training dataset containing seven times more data than the original training dataset. The best model achieved 94.25% training accuracy, 57.07% validation accuracy, 0.2577 training loss and 1.1678 validation loss. Moreover, Neutral (Calm) and Happy were the easiest classes to recognise, while Angry was the hardest to classify.
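The reported best architecture (four CNN layers feeding one Bidirectional LSTM layer over speech features) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the input feature size (40 mel bands), channel widths, hidden size, and the 8-class output are all assumptions for the sake of a runnable example.

```python
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    """Sketch of a 4-layer CNN + 1-layer BiLSTM speech emotion classifier.

    Layer sizes are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, n_mels=40, n_classes=8):
        super().__init__()
        chans = [1, 16, 32, 64, 64]  # assumed channel widths for the 4 CNN layers
        layers = []
        for cin, cout in zip(chans, chans[1:]):
            layers += [nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d((2, 1))]  # pool frequency, preserve time steps
        self.cnn = nn.Sequential(*layers)
        feat = chans[-1] * (n_mels // 16)  # frequency axis halved four times
        self.lstm = nn.LSTM(feat, 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        h = self.cnn(x)                      # (B, C, n_mels // 16, T)
        h = h.flatten(1, 2).transpose(1, 2)  # (B, T, C * n_mels // 16)
        out, _ = self.lstm(h)                # BiLSTM over the time dimension
        return self.fc(out[:, -1])           # last time step -> class logits

model = CnnBiLstm()
logits = model(torch.randn(2, 1, 40, 100))  # 2 utterances, 100 frames each
print(logits.shape)  # torch.Size([2, 8])
```

The CNN here compresses the frequency axis while keeping the time axis intact, so the BiLSTM can model the temporal dynamics the abstract emphasises.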
CITATION STYLE
Chowanda, A., & Muliono, Y. (2022). Emotions Classification from Speech with Deep Learning. International Journal of Advanced Computer Science and Applications, 13(4), 777–781. https://doi.org/10.14569/IJACSA.2022.0130490