Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing with Random Projection

16 citations · 28 Mendeley readers

Abstract

Speech Emotion Recognition (SER) attracts many researchers because it is considered a key component of Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, a task that poses several challenges. Because emotion in speech is both temporal and sparse, we adopt a multivariate time-series representation of the input features. The work also adopts the Echo State Network (ESN), a reservoir computing approach and a special case of the Recurrent Neural Network (RNN), to avoid model complexity: its recurrent weights are untrained and sparse while mapping the features into a higher-dimensional space. Additionally, Sparse Random Projection (SRP) is applied for dimensionality reduction, which offers significant computational savings. Late fusion of bidirectional (forward and time-reversed) inputs is applied to capture additional information from the input data. Speaker-independent and speaker-dependent experiments were performed on four common speech emotion datasets: Emo-DB, SAVEE, RAVDESS, and the FAU Aibo Emotion Corpus. The results show that the designed model outperforms the state of the art at a lower computational cost.
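
The sketch below illustrates the kind of pipeline the abstract describes: an untrained, sparse ESN reservoir driven forward and time-reversed over each utterance, sparse random projection of the reservoir states, and late fusion of two linear readouts. It is a minimal illustration, not the authors' implementation; all dimensions and hyper-parameters (feature size, reservoir size, leak rate, spectral radius, projection size) are assumptions, and scikit-learn's SparseRandomProjection and RidgeClassifier stand in for the paper's exact components.

```python
# Minimal sketch of the described pipeline. Hyper-parameters are illustrative
# assumptions, not the authors' settings.
import numpy as np
from sklearn.random_projection import SparseRandomProjection
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)

d, n_res = 40, 500  # feature dimension and reservoir size (assumed)

# Untrained ESN weights: dense input matrix, sparse recurrent matrix
# rescaled to spectral radius < 1 (echo state property).
W_in = rng.uniform(-0.5, 0.5, (n_res, d))
W = rng.uniform(-0.5, 0.5, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.05)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(X, leak=0.3):
    """Drive the fixed reservoir with a (T, d) feature sequence and
    return its final state as a fixed-length utterance representation."""
    h = np.zeros(n_res)
    for x in X:
        h = (1 - leak) * h + leak * np.tanh(W_in @ x + W @ h)
    return h

def encode(X):
    # Bidirectional pass: forward and time-reversed runs of the same
    # sequence, kept separate for late fusion.
    return run_reservoir(X), run_reservoir(X[::-1])

# Toy "utterances": variable-length random feature sequences with labels.
utterances = [rng.standard_normal((rng.integers(80, 120), d)) for _ in range(20)]
labels = rng.integers(0, 4, len(utterances))
fwd, bwd = map(np.array, zip(*(encode(X) for X in utterances)))

# Sparse random projection shrinks each reservoir state before the readout.
proj = SparseRandomProjection(n_components=100, random_state=0)
Z_fwd = proj.fit_transform(fwd)
Z_bwd = proj.transform(bwd)

# Late fusion: one linear readout per direction, decision scores summed.
clf_f = RidgeClassifier().fit(Z_fwd, labels)
clf_b = RidgeClassifier().fit(Z_bwd, labels)
scores = clf_f.decision_function(Z_fwd) + clf_b.decision_function(Z_bwd)
pred = clf_f.classes_[scores.argmax(axis=1)]
print("toy training accuracy:", (pred == labels).mean())
```

On real data the toy random features would be replaced by frame-level acoustic descriptors (e.g. openSMILE low-level descriptors), and the readouts would be trained and evaluated under the speaker-dependent or speaker-independent protocols mentioned in the abstract.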

Citation (APA)

Ibrahim, H., Loo, C. K., & Alnajjar, F. (2021). Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing with Random Projection. IEEE Access, 9, 122855–122871. https://doi.org/10.1109/ACCESS.2021.3107858


Readers' Seniority

PhD / Postgrad / Masters / Doc: 6 (50%)
Professor / Associate Prof.: 3 (25%)
Lecturer / Post doc: 2 (17%)
Researcher: 1 (8%)

Readers' Discipline

Computer Science: 9 (69%)
Engineering: 2 (15%)
Neuroscience: 1 (8%)
Physics and Astronomy: 1 (8%)
