TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition from Human Speeches

Karam Kumar Sahoo; Ishan Dutta; Muhammad Fazal Ijaz; Marcin Wozniak; Pawan Kumar Singh

Journal Article, Open Access

IEEE Access (2021), vol. 9, pp. 166518–166530

DOI: 10.1109/ACCESS.2021.3135658

Abstract

Human speech is not only a verbal medium of communication; it also conveys emotion. Over the past decade, research on speech data has grown considerably, driven by its importance for human-computer interaction as well as healthcare, security, and entertainment. This paper proposes TLEFuzzyNet, a three-stage pipeline for emotion recognition from speech. The first stage performs feature extraction: the speech signals are augmented and Mel spectrograms are extracted from them. The second stage applies three pretrained transfer learning CNN models, namely ResNet18, Inception_v3, and GoogleNet, whose prediction scores are fed to the third stage. In the final stage, we assign fuzzy ranks using a modified Gompertz function, which yields the final prediction scores after considering the individual scores from the three CNN models. We evaluate TLEFuzzyNet on the Surrey Audio-Visual Expressed Emotion (SAVEE), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Database of Emotional Speech (EmoDB) datasets, where it achieves state-of-the-art performance, making it a dependable framework for speech emotion recognition (SER). The code is available on GitHub: https://github.com/KaramSahoo/SpeechEmotionRecognitionFuzzy
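
To make the final stage more concrete, the following is a minimal sketch of a fuzzy rank-based fusion of class scores from several models. It is not taken from the authors' repository. The exact form of the modified Gompertz function, the `rate` constant of 2.0, the rule of summing ranks across models, and the names `gompertz_rank` and `fuse_fuzzy_ranks` are all assumptions made for illustration.

```python
import math


def gompertz_rank(score: float, rate: float = 2.0) -> float:
    """Map a confidence score in [0, 1] to a fuzzy rank.

    Assumed form: 1 - exp(-exp(-rate * score)). A higher confidence
    yields a lower (better) rank. The rate of 2.0 is an assumption.
    """
    return 1.0 - math.exp(-math.exp(-rate * score))


def fuse_fuzzy_ranks(scores_per_model):
    """Fuse per-class scores from several models.

    scores_per_model: one list of class scores per model, all of the
    same length. Returns (predicted_class_index, fused_rank_per_class).
    The fused rank of a class is the sum of its fuzzy ranks over all
    models (an assumed rule); the class with the smallest sum wins.
    """
    n_classes = len(scores_per_model[0])
    if any(len(scores) != n_classes for scores in scores_per_model):
        raise ValueError("all models must score the same number of classes")

    fused = [
        sum(gompertz_rank(scores[c]) for scores in scores_per_model)
        for c in range(n_classes)
    ]
    best = min(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical softmax outputs of three CNNs for one utterance, 3 classes.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
label, fused = fuse_fuzzy_ranks(example_scores)
print(label)  # 1
```

In this sketch two of the three models favour class 1, so its summed rank is the lowest even though the third model prefers class 0. Because the rank mapping is nonlinear, this kind of fusion can in general select a different class than plain score averaging would.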

Citation (APA)

Sahoo, K. K., Dutta, I., Ijaz, M. F., Wozniak, M., & Singh, P. K. (2021). TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition from Human Speeches. IEEE Access, 9, 166518–166530. https://doi.org/10.1109/ACCESS.2021.3135658
