Gestures during spoken dialog play a central role in human communication. As a consequence, models of gesture generation are a key challenge in research on virtual humans: embodied agents capable of face-to-face interaction with people. Machine learning approaches to gesture generation must take into account the conceptual content of utterances, the physical properties of speech signals, and the physical properties of the gestures themselves. To address this challenge, we proposed a gestural sign scheme to facilitate supervised learning and presented the DCNF model, which jointly learns deep neural networks and a second-order linear-chain temporal model. The approach captures both the mapping between speech and gestures and the temporal relations among gestures. Our experiments on a human co-verbal gesture dataset show significant improvement over previous work on gesture prediction. A generalization experiment on handwriting recognition also shows that DCNFs outperform state-of-the-art approaches.
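The joint structure the abstract describes, deep per-frame features combined with a linear-chain temporal model over gesture labels, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the network shape, label count, and all parameter values are invented, and a first-order chain with standard Viterbi decoding stands in for the second-order chain used in the actual DCNF model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_scores(x, W1, b1, W2, b2):
    """Per-frame label scores from a one-hidden-layer network, a stand-in
    for the deep feature learner in a DCNF-style model."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2  # shape: (T, n_labels)

def viterbi(unary, trans):
    """First-order Viterbi decoding over a linear chain of labels.
    (The paper's model is second-order; first-order keeps the sketch
    short while showing the same joint deep + temporal structure.)"""
    T, L = unary.shape
    delta = np.zeros((T, L))          # best score ending in each label
    back = np.zeros((T, L), dtype=int)
    delta[0] = unary[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans + unary[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy setup: 10 frames of 8-dim speech features, 4 gestural sign labels.
T, D, H, L = 10, 8, 16, 4
x = rng.normal(size=(T, D))                    # speech-derived features
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)  # hypothetical weights
W2, b2 = rng.normal(size=(H, L)), np.zeros(L)
trans = rng.normal(size=(L, L))                # label-transition scores

labels = viterbi(mlp_scores(x, W1, b1, W2, b2), trans)
print(labels)  # one gestural-sign label per frame
```

In a trained DCNF both the network weights and the transition scores would be learned jointly from data; here they are random, so the decoded label sequence only illustrates the inference structure.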
Chiu, C. C., Morency, L. P., & Marsella, S. (2015). Predicting co-verbal gestures: A deep and temporal modeling approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9238, pp. 152–166). Springer Verlag. https://doi.org/10.1007/978-3-319-21996-7_17