Acoustic feature extraction for emotion classification can be performed at different levels. Frame-level features provide low-level descriptors (LLDs) that preserve the temporal structure of the utterance. Utterance-level features, on the other hand, are functionals applied to the low-level descriptors and carry important information about the speaker's emotional state. Utterance-level features are particularly useful for determining emotion intensity; however, they discard information about temporal changes in the signal. Another drawback is that they often yield too few feature vectors for complex classification tasks. One way to overcome these problems is to combine frame-level and utterance-level features, taking advantage of both methods. This paper proposes to obtain a low-level feature representation by feeding frame-level descriptor sequences to a Long Short-Term Memory (LSTM) network, to combine the outcome with a Principal Component Analysis (PCA) representation of the utterance-level features, and to make the final prediction with a logistic regression classifier.
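The combined pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: all data shapes, the number of emotion classes, and the number of PCA components are assumptions, and a simple mean pooling over frames stands in for the trained LSTM encoder, whose role is only to map each frame sequence to a fixed-length vector.

```python
# Sketch of the combined feature representation: frame-level embedding
# fused with PCA-reduced utterance-level functionals, classified by
# logistic regression. Shapes and class count are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_utt, n_frames, n_lld = 200, 50, 13   # utterances, frames per utterance, LLDs
n_func = 88                            # utterance-level functionals (assumed size)

frames = rng.normal(size=(n_utt, n_frames, n_lld))  # frame-level LLD sequences
functionals = rng.normal(size=(n_utt, n_func))      # utterance-level functionals
labels = rng.integers(0, 4, size=n_utt)             # 4 emotion classes (assumption)

# 1) Frame-level branch: an LSTM would encode each sequence into a fixed
#    vector; mean pooling over time is a stand-in for that learned embedding.
frame_embed = frames.mean(axis=1)                   # shape (n_utt, n_lld)

# 2) Utterance-level branch: reduce the functionals with PCA.
func_pca = PCA(n_components=10).fit_transform(functionals)  # (n_utt, 10)

# 3) Fuse both branches and train a logistic regression classifier.
X = np.hstack([frame_embed, func_pca])              # (n_utt, n_lld + 10)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, labels)
```

In this fusion scheme the classifier sees both a temporally aware summary of the frame sequence and a compact view of the utterance-level statistics, which is the complementarity the abstract argues for.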
Verkholyak, O., & Karpov, A. (2018). Combined feature representation for emotion classification from Russian speech. In Communications in Computer and Information Science (Vol. 789, pp. 68–73). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_6