Emotion classification of audio signals using ensemble of support vector machines


Abstract

This study presents an approach for emotion classification of speech utterances based on an ensemble of support vector machines. We considered feature-level fusion of MFCC, total energy, and F0 as the input feature vector, and chose the bagging method for classification. Additionally, we present a new emotional dataset based on a popular animation film, Finding Nemo, in which emotions are strongly emphasized to attract the attention of spectators. Speech utterances were extracted directly from the video's audio channel, including all background noise. In total, 2054 utterances from 24 speakers were annotated by a group of volunteers according to seven emotion categories. We concentrated on perceived emotion. Our approach was tested on our newly developed dataset as well as on the publicly available DES and EmoDB datasets. Experiments showed that our approach achieved 77.5% and 66.8% overall accuracy for four- and five-class classification on the EFN dataset, respectively. In addition, we achieved 67.6% accuracy on DES (five classes) and 63.5% on EmoDB (seven classes) using an ensemble of SVMs with 10-fold cross-validation. © 2008 Springer-Verlag Berlin Heidelberg.

CITATION STYLE

APA

Danisman, T., & Alpkocak, A. (2008). Emotion classification of audio signals using ensemble of support vector machines. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5078 LNCS, pp. 205–216). https://doi.org/10.1007/978-3-540-69369-7_23
