Abstract
In human-human interaction, a listener produces both verbal tokens and head nods as response signals, and the two frequently co-occur. When humanoid robots and anthropomorphic agents respond to a user with verbal tokens and head nods simultaneously, the two behaviors must be properly timed relative to each other and have mutually consistent features. In this paper, we propose models that predict the co-occurrence and the physical features of head nods from prosodic and syntactic features of verbal response tokens. As predictor variables, we used the form, position, and duration of each response token, the mean and standard deviation of its fundamental frequency and loudness, and the head position at the onset of the token. In addition, taking the participation framework into account, we also used the speaker's and the listener's gaze at the onset of the response token, and we applied generalized mixed models to predict the co-occurrence, type, range, repetition, and velocity of head nods. The results confirmed that the proposed models predict these outcomes effectively.
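The modeling approach described above can be illustrated with a minimal sketch. All variable names and data below are invented for illustration: the example fits a linear mixed model predicting one continuous nod feature (velocity) from a few prosodic predictors of the response token, with a per-speaker random intercept, using `statsmodels`. The paper's actual models are generalized mixed models over several outcome types (including binary co-occurrence and categorical nod type), which this sketch does not reproduce.

```python
# Hypothetical illustration of a mixed-model fit; data and variable
# names are invented, not taken from the paper's corpus.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
spk = rng.integers(0, 8, n)  # 8 hypothetical speakers (grouping factor)
df = pd.DataFrame({
    "duration": rng.uniform(0.1, 0.8, n),   # token duration (s)
    "f0_mean": rng.normal(180, 30, n),      # mean F0 of token (Hz)
    "loudness": rng.normal(60, 5, n),       # mean loudness of token (dB)
    "speaker": spk.astype(str),
})
# Synthetic outcome: nod velocity depends on duration and loudness,
# plus a per-speaker random offset and observation noise.
df["nod_velocity"] = (2.0 * df["duration"] + 0.05 * df["loudness"]
                      + rng.normal(0, 0.3, 8)[spk]
                      + rng.normal(0, 0.2, n))

# Linear mixed model: fixed prosodic effects, random speaker intercept.
model = smf.mixedlm("nod_velocity ~ duration + f0_mean + loudness",
                    df, groups=df["speaker"])
result = model.fit()
print(result.summary())
```

For the binary co-occurrence outcome, a logistic link would be used instead of the identity link shown here; the general structure (prosodic fixed effects plus a grouping random effect) stays the same.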
Mori, T., & Den, Y. (2022). Generation Model for Head Nods Consistent with Features of Verbal Response Tokens. Transactions of the Japanese Society for Artificial Intelligence, 37(3). https://doi.org/10.1527/tjsai.37-3_IDS-H