Deep learning for acoustic addressee detection in spoken dialogue systems

Abstract

The addressee detection problem arises in real spoken dialogue systems (SDSs), which must distinguish speech addressed to the system from speech addressed to other humans. In this work, several modalities were analyzed, and acoustic data was chosen as the main modality because it is the most flexibly usable in modern SDSs. To solve the addressee detection problem, deep learning methods, namely fully-connected neural networks and Long Short-Term Memory (LSTM) networks, were applied in the present study. The developed models were improved with different optimization methods, activation functions, and a learning rate optimization method; they were further optimized with recursive feature elimination and multiple initialization to increase training speed. The fully-connected neural network reaches an average recall of 0.78, while the LSTM network shows an average recall of 0.65. Advantages and disadvantages of both architectures are discussed for this particular task.
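The abstract's LSTM-based approach can be illustrated with a minimal sketch: an utterance is a sequence of acoustic feature vectors (e.g. MFCCs per frame), an LSTM consumes the sequence, and a sigmoid output unit scores the probability that the speech is addressed to the system. This is a generic numpy illustration of the technique, not the authors' actual architecture; the feature dimension, hidden size, and parameter shapes below are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM time step; z stacks the four gates [input, forget, output, candidate].
    z = W @ x + U @ h + b
    n = h.shape[0]
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    o = sigmoid(z[2 * n:3 * n]) # output gate
    g = np.tanh(z[3 * n:])      # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def classify_utterance(frames, params):
    # frames: (T, d) matrix of per-frame acoustic features.
    # Run the LSTM over the sequence, then a dense sigmoid head on the
    # final hidden state -> P(speech is addressed to the system).
    W, U, b, w_out, b_out = params
    n = U.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
    return sigmoid(w_out @ h + b_out)

# Illustrative sizes: 13 MFCC features per frame, 8 hidden units (assumptions).
rng = np.random.default_rng(0)
d, n = 13, 8
params = (rng.normal(size=(4 * n, d)) * 0.1,  # W: input weights
          rng.normal(size=(4 * n, n)) * 0.1,  # U: recurrent weights
          np.zeros(4 * n),                    # b: gate biases
          rng.normal(size=n) * 0.1,           # w_out: output weights
          0.0)                                # b_out: output bias
p = classify_utterance(rng.normal(size=(20, d)), params)  # 20-frame utterance
print(f"P(addressed to system) = {p:.3f}")
```

In practice such a model would be trained end-to-end (the weights above are random), and the fully-connected baseline from the abstract would instead operate on a fixed-length summary of the acoustic features, such as mean-pooled frame statistics.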

Citation (APA)

Pugachev, A., Akhtiamov, O., Karpov, A., & Minker, W. (2018). Deep learning for acoustic addressee detection in spoken dialogue systems. In Communications in Computer and Information Science (Vol. 789, pp. 45–53). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_4
