With the development of Interactive Voice Response (IVR) systems, people can not only operate computer systems through task-oriented conversation but also enjoy non-task-oriented conversation with a computer. When an IVR system generates a response, it usually refers only to the verbal content of the user’s utterance. However, when a person gloomily says “I’m fine,” people respond not with “That’s wonderful” but with “Really?” or “Are you OK?”, because we take into account both verbal and non-verbal information such as tone of voice, facial expressions, and gestures. In this article, we propose an intelligent IVR system that considers non-verbal as well as verbal information. To estimate the speaker’s emotion (positive, negative, or neutral), 384 acoustic features extracted from the speaker’s utterance are fed into a machine-learning classifier (SVM). Artificial Intelligence Markup Language (AIML)-based response-generation rules are extended so that they can take the speaker’s emotion into account. In our experiment, subjects rated the proposed dialog system as more likable and enjoyable, and as giving less machine-like reactions.
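The emotion-aware rule matching described above can be sketched as follows. This is a minimal illustrative sketch, not the authors’ actual AIML rule set: the rule table, the emotion labels, and the `respond` helper are all hypothetical, standing in for AIML `<pattern>`/`<template>` pairs keyed additionally on the estimated emotion.

```python
# Hypothetical emotion-aware response rules in the spirit of AIML
# pattern/template pairs; the rules below are illustrative only.

# (normalized pattern, estimated emotion) -> response template
RULES = {
    ("I'M FINE", "positive"): "That's wonderful!",
    ("I'M FINE", "neutral"): "Good to hear.",
    ("I'M FINE", "negative"): "Really? Are you OK?",
}

def respond(utterance: str, emotion: str) -> str:
    """Select a response using both the verbal content and the
    estimated emotion (positive / negative / neutral)."""
    # Normalize the utterance the way AIML normalizes patterns:
    # uppercase and strip trailing punctuation.
    key = (utterance.strip().upper().rstrip(".!?"), emotion)
    return RULES.get(key, "I see.")

print(respond("I'm fine.", "negative"))  # -> Really? Are you OK?
```

With this structure, the same utterance yields different responses depending on the emotion estimated from the acoustic features, which is the behavior the abstract motivates with the gloomy “I’m fine” example.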
CITATION STYLE
Takahashi, T., Mera, K., Nhat, T. B., Kurosawa, Y., & Takezawa, T. (2017). Natural language dialog system considering speaker’s emotion calculated from acoustic features. In Lecture Notes in Electrical Engineering (Vol. 427 LNEE, pp. 145–157). Springer Verlag. https://doi.org/10.1007/978-981-10-2585-3_11