The involvement of affect information in a spoken dialogue system can increase the user-friendliness and provide a more natural way for the interaction experience. This can be reached by speech emotion recognition, where the features are usually dominated by the spectral amplitude information while they ignore the use of the phase spectrum. In this chapter, we propose to use phase-based features to build up such an emotion recognition system. To exploit these features, we employ Fisher kernels. The according technique encodes the phase-based features by their deviation from a generative Gaussian mixture model. The resulting representation is fed to train a classification model with a linear kernel classifier. Experimental results on the GeWEC database including ‘normal’ and whispered phonation demonstrate the effectiveness of our method.
CITATION STYLE
Deng, J., Xu, X., Zhang, Z., Frühholz, S., Grandjean, D., & Schuller, B. (2017). Fisher kernels on phase-based features for speech emotion recognition. In Lecture Notes in Electrical Engineering (Vol. 427 427 LNEE, pp. 195–203). Springer Verlag. https://doi.org/10.1007/978-981-10-2585-3_15
Mendeley helps you to discover research relevant for your work.