Recognizing emotionally coloured speech is a key technology for achieving human-like spoken dialog systems. However, despite rapid progress in automatic speech recognition (ASR) and emotion research, much less work has examined ASR systems that recognize the verbal content of emotionally coloured speech. Existing approaches to emotional speech recognition mostly adapt standard ASR models to incorporate information about prosody and emotion. In this study, instead of adapting a model to handle emotional speech, we focus on feature transformation methods that address the mismatch between emotional and neutral speech and improve ASR performance. In this way, we can train the model on emotionally coloured speech without any explicit emotional annotation. We investigate two deep bottleneck network structures: deep neural networks (DNNs) and convolutional neural networks (CNNs). We hypothesize that the trained bottleneck features can extract the essential information that represents the verbal content while abstracting away superficial differences caused by emotional variation. We also examine various combinations of these two bottleneck features with feature-space speaker adaptation. Experiments on Japanese and English emotional speech data show that both varieties of bottleneck features and feature-space speaker adaptation improve emotional speech recognition performance.
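To make the bottleneck-feature idea concrete, the following is a minimal NumPy sketch of how such features are extracted: a feed-forward network with one deliberately narrow hidden layer is run on acoustic frames, and the activations at that narrow layer are taken as the transformed features. All layer sizes, the class name, and the random initialization here are illustrative assumptions, not details from the paper (which trains DNN and CNN bottleneck networks on real speech data).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckDNN:
    """Toy feed-forward DNN with a narrow bottleneck hidden layer.

    dims = (input, hidden..., output); the narrowest hidden layer is
    treated as the bottleneck. Sizes are illustrative assumptions.
    """
    def __init__(self, dims=(40, 256, 32, 256, 40), seed=0):
        rng = np.random.default_rng(seed)
        # Random untrained weights; a real system learns these from speech.
        self.weights = [rng.standard_normal((a, b)) * 0.1
                        for a, b in zip(dims[:-1], dims[1:])]
        # Index of the layer whose output is the bottleneck (narrowest).
        self.bottleneck_index = int(np.argmin(dims[1:]))

    def extract(self, frames):
        """Forward a (T, input_dim) matrix of acoustic frames and
        return the (T, bottleneck_dim) bottleneck activations."""
        h = frames
        for i, w in enumerate(self.weights):
            h = relu(h @ w)
            if i == self.bottleneck_index:
                return h  # stop here: these are the bottleneck features
        return h
```

In a full pipeline, the network would first be trained as an acoustic model (or autoencoder) on emotionally coloured speech, and the low-dimensional bottleneck activations would then replace or augment the original features fed to the ASR system, optionally after feature-space speaker adaptation.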
Mukaihara, K., Sakti, S., & Nakamura, S. (2017). Recognizing emotionally coloured dialogue speech using speaker-adapted DNN-CNN bottleneck features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10458 LNAI, pp. 632–641). Springer Verlag. https://doi.org/10.1007/978-3-319-66429-3_63