An effective discriminative learning approach for emotion-specific features using deep neural networks

Abstract

Speech contains rich yet entangled information ranging from phonetic to emotional components. These components are mixed together, which hinders certain tasks from achieving better performance, and automatically learning a good representation that disentangles them is non-trivial. In this paper, we propose a hierarchical method to extract utterance-level features from frame-level acoustic features using deep neural networks (DNNs). Moreover, inspired by recent progress in face recognition, we introduce centre loss as a complementary supervision signal to the traditional softmax loss to improve the intra-class compactness of the learned features. With the joint supervision of these two loss functions, we can train the DNNs to obtain separable and discriminative emotion-specific features. Experiments on the CASIA corpus, the Emo-DB corpus and the SAVEE database yield results comparable to those of state-of-the-art approaches.
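The joint supervision described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the balancing weight `lam`, its default value, and the averaging over the batch are all assumptions made for illustration. The centre loss follows the commonly used form from the face-recognition literature, half the squared distance between each feature vector and the centre of its class.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (numerically stabilised)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def centre_loss(features, labels, centres):
    """Half the squared distance of each feature to its class centre, batch-averaged."""
    diffs = features - centres[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def joint_loss(logits, features, labels, centres, lam=0.1):
    """Softmax loss plus centre loss weighted by `lam` (a hypothetical hyperparameter)."""
    return softmax_cross_entropy(logits, labels) + lam * centre_loss(features, labels, centres)

# Toy batch: 2 utterance-level feature vectors, 3 emotion classes.
features = np.array([[2.0, 1.0], [1.0, 2.0]])
centres = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
labels = np.array([0, 1])
logits = np.zeros((2, 3))
print(joint_loss(logits, features, labels, centres, lam=0.5))
```

The softmax term pushes features of different emotions apart, while the centre term pulls features of the same emotion towards a shared centre. In training, the class centres would typically be updated alongside the network parameters, which this sketch leaves out.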

Cite

Mao, S., & Ching, P. C. (2018). An effective discriminative learning approach for emotion-specific features using deep neural networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11304 LNCS, pp. 50–61). Springer Verlag. https://doi.org/10.1007/978-3-030-04212-7_5
