Speech Emotion Recognition Using Clustering Based GA-Optimized Feature Set

Abstract

Speech Emotion Recognition (SER) is an active research topic in both academia and industry. Feature engineering plays a pivotal role in building an efficient SER system. Although researchers have done a tremendous amount of work in this field, the choice of speech features and the correct application of feature engineering remain open issues in SER. In this research, a feature optimization approach based on a clustering-based genetic algorithm is proposed. Instead of randomly selecting the new generation, clustering is applied at the fitness evaluation level to detect outliers and exclude them from the next generation. The approach is compared with the standard Genetic Algorithm in the context of audio emotion recognition using the Berlin Emotional Speech Database (EMO-DB), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion Dataset (SAVEE). Results show that the proposed technique effectively improves emotion classification in speech. In speaker-dependent experiments, the recognition rates obtained are 89.6% for general speakers (both male and female), 86.2% for male speakers, and 88.3% for female speakers on EMO-DB; 82.5% for general speakers, 75.4% for male speakers, and 91.1% for female speakers on RAVDESS; and 77.7% for general speakers on SAVEE. In speaker-independent experiments, we achieved recognition rates of 77.5% on EMO-DB, 76.2% on RAVDESS, and 69.8% on SAVEE. All experiments were performed in MATLAB, and a Support Vector Machine (SVM) was used for classification. The results confirm that the proposed method discriminates emotions effectively and outperforms the other approaches used for comparison on the reported performance measures.
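
The selection idea in the abstract (cluster the fitness values and keep outliers out of the next generation) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the abstract does not say which clustering algorithm is used or how an outlier is defined, so this sketch assumes 1-D k-means (k = 3) over fitness values, treats the lowest-fitness cluster as the outliers to exclude, and replaces SVM accuracy with a hypothetical toy fitness function. The names `fitness`, `kmeans_1d`, `evolve`, `NUM_FEATURES`, and `USEFUL` are all illustrative.

```python
import random

# Hypothetical toy problem: 20 candidate features, of which features 0-4 are
# "useful". In the paper, fitness would instead be SVM recognition accuracy.
NUM_FEATURES = 20
USEFUL = set(range(5))


def fitness(mask):
    """Toy fitness: +1 per useful feature kept, -0.2 per irrelevant one."""
    hits = sum(1 for i in USEFUL if mask[i])
    extras = sum(mask) - hits
    return hits - 0.2 * extras


def kmeans_1d(values, k=3, iters=20):
    """Plain 1-D k-means; returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the min and max value.
    centers = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def evolve(pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(NUM_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Assumption: the cluster with the lowest center holds the "outliers".
        labels, centers = kmeans_1d(scores, k=3)
        worst = min(range(len(centers)), key=lambda c: centers[c])
        parents = [ind for ind, lab in zip(pop, labels) if lab != worst]
        if len(parents) < 2:
            parents = pop  # fall back if clustering would leave too few parents
        # Elitism: the best individual always survives unchanged.
        children = [max(pop, key=fitness)[:]]
        while len(children) < pop_size:
            # Binary tournament among the non-outlier parents.
            a = max(rng.sample(parents, 2), key=fitness)
            b = max(rng.sample(parents, 2), key=fitness)
            cut = rng.randrange(1, NUM_FEATURES)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("selected features:", [i for i, g in enumerate(best) if g])
    print("fitness:", round(fitness(best), 2))
```

Because of elitism, the best fitness in this sketch never decreases from one generation to the next. Replacing the toy `fitness` with cross-validated SVM accuracy over real acoustic features would bring it closer to the setting the abstract describes.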

Cite

Kanwal, S., & Asghar, S. (2021). Speech Emotion Recognition Using Clustering Based GA-Optimized Feature Set. IEEE Access, 9, 125830–125842. https://doi.org/10.1109/ACCESS.2021.3111659

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free