End-to-end listening agent for audiovisual emotional and naturalistic interactions

Abstract

In this work, we established the foundations of a framework whose goal is to build an end-to-end, naturalistic, expressive listening agent. The project was split into modules for recognition of the user’s paralinguistic and nonverbal expressions, prediction of the agent’s reactions, synthesis of the agent’s expressions, and recording of nonverbal conversational expressions. First, a multimodal, multitask, deep learning-based emotion classification system was built, along with a rule-based visual expression detection system. Then, several sequence prediction systems for nonverbal expressions were implemented and compared. An audiovisual concatenation-based synthesis system was also implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work done for each of these modules and our planned future improvements.

Cite

El Haddad, K., Rizk, Y., Heron, L., Hajj, N., Zhao, Y., Kim, J., … Çakmak, H. (2018). End-to-end listening agent for audiovisual emotional and naturalistic interactions. Journal of Science and Technology of the Arts, 10(2), 49–61. https://doi.org/10.7559/citarj.v10i2.424
