Speech emotion recognition is an interesting and challenging subject because of the gap between speech signals and high-level speech emotion. To bridge this gap, this paper presents a method for Chinese speech emotion recognition using deep belief networks (DBNs). A DBN performs unsupervised feature learning on the extracted low-level acoustic features. A multi-layer perceptron (MLP) is then initialized from the learned hidden-layer parameters of the DBN and used for Chinese speech emotion classification. Experimental results on the test data of the Chinese Natural Audio-Visual Emotion Database (CHEAVD) show that the presented method obtains a classification accuracy of 32.80% and a macro average precision of 41.54%, significantly outperforming the baseline results provided by the organizers of the speech emotion recognition sub-challenges.
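The two-stage pipeline described in the abstract (greedy unsupervised DBN pre-training, then an MLP initialized from the DBN's hidden layers) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes Bernoulli restricted Boltzmann machines trained with one-step contrastive divergence, features scaled to [0, 1], and placeholder values for the layer sizes, learning rate, epoch count, and number of emotion classes. The names `RBM`, `pretrain_dbn`, `init_mlp_from_dbn`, and `mlp_forward` are hypothetical, and the random matrix `X` merely stands in for real acoustic features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)

def pretrain_dbn(X, layer_sizes, epochs=5, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations of the RBM below it. No labels are used."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(data)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)
    return rbms

def init_mlp_from_dbn(rbms, n_classes, seed=0):
    """Copy each RBM's weights and hidden biases into an MLP hidden layer,
    then append a randomly initialized softmax output layer."""
    rng = np.random.default_rng(seed)
    layers = [(r.W.copy(), r.b_hid.copy()) for r in rbms]
    n_top = rbms[-1].W.shape[1]
    layers.append((rng.normal(0.0, 0.01, size=(n_top, n_classes)),
                   np.zeros(n_classes)))
    return layers

def mlp_forward(layers, X):
    """Forward pass: sigmoid hidden layers, softmax output."""
    a = X
    for W, b in layers[:-1]:
        a = sigmoid(a @ W + b)
    W, b = layers[-1]
    logits = a @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Stand-in data: 32 utterances, 20 features in [0, 1] (assumed, not CHEAVD).
X = np.random.default_rng(1).random((32, 20))
rbms = pretrain_dbn(X, layer_sizes=[16, 8])
layers = init_mlp_from_dbn(rbms, n_classes=8)  # class count is a placeholder
probs = mlp_forward(layers, X)
```

In a full system the MLP built this way would then be fine-tuned with labeled data by backpropagation; that supervised step is omitted here to keep the sketch short.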
CITATION STYLE
Zhang, S., Zhao, X., Chuang, Y., Guo, W., & Chen, Y. (2016). Feature learning via deep belief network for Chinese speech emotion recognition. In Communications in Computer and Information Science (Vol. 663, pp. 645–651). Springer Verlag. https://doi.org/10.1007/978-981-10-3005-5_53