A Comparison of Two Speech Emotion Recognition Algorithms: Pepper Humanoid Versus Bag of Models

Abstract

Some of the most exciting applications of Speech Emotion Recognition (SER) focus on capturing emotions in daily-life contexts, such as social robotics, voice assistants, the entertainment industry, and health support systems. Among the most popular social humanoids launched in recent years, SoftBank Pepper® stands out. This humanoid features a multi-modal emotional module that includes facial gesture recognition and Speech Emotion Recognition. Separately, a competitive SER algorithm for embedded systems [2], based on a bag of models (BoM) method, was presented in previous work. As Pepper is an extensible platform, the current work represents the first step in a series of future social robotics projects. Specifically, this paper systematically compares Pepper’s SER module (SER-Pepper) against a new release of our SER algorithm based on a BoM of Extra Trees and CatBoost (SER-BoM). To achieve a fair comparison, a complete workbench has been deployed, which includes, among other things, the selection of two well-known SER datasets, SAVEE and RAVDESS, and a standardised playback and recording environment for the files of those datasets. The SER-BoM algorithm showed better results in all the validation contexts.

Cite

de la Cal, E., Sedano, J., Gallucci, A., & Valverde, P. (2023). A Comparison of Two Speech Emotion Recognition Algorithms: Pepper Humanoid Versus Bag of Models. In Lecture Notes in Networks and Systems (Vol. 531 LNNS, pp. 635–644). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-18050-7_62
