Performance Analysis of a Chunk-Based Speech Emotion Recognition Model Using RNN

Abstract

Recently, artificial-intelligence-based automatic customer response systems have been widely adopted in place of customer service representatives. It is therefore important for an automatic customer service system to promptly recognize emotions in a customer's voice so that it can provide the appropriate service. We analyzed emotion recognition (ER) accuracy as a function of simulation time using the proposed chunk-based speech ER (CSER) model. The proposed CSER model divides voice signals into 3-s chunks to efficiently recognize the emotions inherent in a customer's voice. We evaluated the ER performance on voice-signal chunks by applying four recurrent neural network (RNN) techniques, long short-term memory (LSTM), bidirectional LSTM, gated recurrent units (GRU), and bidirectional GRU, to the proposed CSER model individually, assessing both ER accuracy and time efficiency. The results reveal that GRU offers the best time efficiency, in terms of accuracy as a function of simulation time, for recognizing emotions from speech signals.
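The core preprocessing step the abstract describes, splitting a waveform into fixed 3-second chunks before per-chunk classification, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 16 kHz sample rate, zero-padding of the final partial chunk, and majority-vote aggregation of per-chunk predictions are all assumptions made here for the example; the paper's RNN classifiers (LSTM, BiLSTM, GRU, BiGRU) would be applied to each chunk in place of the placeholder labels.

```python
import numpy as np

def split_into_chunks(signal, sr=16000, chunk_s=3.0):
    """Split a 1-D waveform into non-overlapping chunks of chunk_s seconds.
    The final partial chunk is zero-padded to full length (assumed scheme)."""
    n = int(sr * chunk_s)
    pad = (-len(signal)) % n          # samples needed to reach a multiple of n
    padded = np.pad(signal, (0, pad))
    return padded.reshape(-1, n)      # shape: (num_chunks, samples_per_chunk)

def aggregate_votes(chunk_labels):
    """Combine per-chunk emotion predictions by majority vote (assumed scheme)."""
    labels, counts = np.unique(chunk_labels, return_counts=True)
    return labels[np.argmax(counts)]

# 10 s of dummy audio at 16 kHz -> four 3-s chunks (the last one zero-padded)
sig = np.random.randn(10 * 16000).astype(np.float32)
chunks = split_into_chunks(sig)
print(chunks.shape)  # (4, 48000)

# Hypothetical per-chunk predictions from an RNN classifier
print(aggregate_votes(np.array(["happy", "sad", "happy", "happy"])))  # happy
```

In practice each 48 000-sample chunk would first be converted to acoustic features (e.g., a spectrogram) before being fed to the recurrent model; the reshape above only demonstrates the chunking itself.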

Citation (APA)

Shin, H. S., & Hong, J. K. (2023). Performance Analysis of a Chunk-Based Speech Emotion Recognition Model Using RNN. Intelligent Automation and Soft Computing, 36(1), 235–248. https://doi.org/10.32604/iasc.2023.033082
