Engagement represents how interested a user is in the current dialogue and how willing they are to continue it, and it is an important cue for spoken dialogue systems to adapt to the user state. We address engagement recognition based on the listener's multimodal behaviors such as backchannels, laughing, head nodding, and eye gaze. When ground-truth labels are given by multiple annotators, the labels differ between annotators because each has a different perspective on the multimodal behaviors. We assume that each annotator has a latent character that affects their perception of engagement. We propose a hierarchical Bayesian model that estimates both the engagement level and the character of each annotator as latent variables. Furthermore, we incorporate additional latent variables to map the input features into a subspace. Experimental results show that the proposed model achieves higher accuracy than other models that do not take the character into account.
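The idea that an annotator's latent character shapes the engagement label they assign can be illustrated with a small generative simulation. This is only a sketch under stated assumptions, not the authors' model: the function name `simulate_annotations`, the single fixed character per annotator, the discrete "behavior pattern" standing in for the latent sub-space of the input features, and the randomly drawn probability table are all hypothetical choices made for illustration.

```python
import random


def simulate_annotations(num_annotators=3, num_characters=2, num_patterns=4,
                         num_samples=200, seed=0):
    """Toy generative process: labels depend on character and behavior pattern.

    All parameters and the structure itself are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Hypothetical table: P(engaged | character, behavior pattern).
    engage_prob = [[rng.random() for _ in range(num_patterns)]
                   for _ in range(num_characters)]

    # Simplification: each annotator has exactly one latent character.
    characters = [rng.randrange(num_characters) for _ in range(num_annotators)]

    records = []
    for _ in range(num_samples):
        # Discrete stand-in for the latent representation of the
        # listener's multimodal behaviors (backchannels, nodding, ...).
        pattern = rng.randrange(num_patterns)
        # Each annotator labels the same behavior through their own character.
        labels = [int(rng.random() < engage_prob[characters[a]][pattern])
                  for a in range(num_annotators)]
        records.append((pattern, labels))
    return engage_prob, characters, records


def disagreement_rate(records):
    """Fraction of samples on which the annotators did not all agree."""
    split = sum(1 for _, labels in records if len(set(labels)) > 1)
    return split / len(records)


if __name__ == "__main__":
    _, characters, records = simulate_annotations()
    print("annotator characters:", characters)
    print("disagreement rate:", disagreement_rate(records))
```

In this toy setting, annotators see identical behavior yet can still disagree, which is the situation the abstract describes. Inference would run in the opposite direction, estimating the characters and engagement levels from the observed labels.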
Inoue, K., Lala, D., Takanashi, K., & Kawahara, T. (2019). Latent character model for engagement recognition based on multimodal behaviors. In Lecture Notes in Electrical Engineering (Vol. 579, pp. 119–130). Springer. https://doi.org/10.1007/978-981-13-9443-0_11