TTS-driven synthetic behaviour-generation model for artificial bodies

Abstract

Visual perception, speech perception and the understanding of perceived information are linked through complex mental processes. Gestures, as part of visual perception and synchronized with verbal information, are a key concept of human social interaction. Even when there is no physical contact (e.g., during a phone conversation), humans still tend to express meaning through movement. Embodied conversational agents (ECAs), as well as humanoid robots, are visual recreations of humans and are thus expected to perform similar behaviour in communication. The behaviour-generation system proposed in this paper is able to specify expressive behaviour that strongly resembles the natural movement performed within social interaction. The system is TTS-driven and fused with the time- and space-efficient TTS engine 'PLATTOS'. Visual content and its presentation are formulated from several linguistic features extrapolated from arbitrary input text sequences, together with prosodic features (e.g., pitch, intonation, stress, and emphasis) predicted by the system's verbal modules. According to the evaluation results, the proposed system can recreate synchronized co-verbal behaviour with a very high degree of naturalness, whether performed by ECAs or by humanoid robots. © 2013 Mlakar et al.
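As a rough illustration of the idea described in the abstract (co-verbal gestures aligned with prosodic prominence in TTS output), the following Python sketch maps a hypothetical timed, prosodically annotated word sequence to beat-gesture strokes. It is not the authors' PLATTOS engine or behaviour-generation model; the data fields, threshold, and prominence rule are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the PLATTOS engine or the authors' model.
# Assumes a hypothetical TTS output format (word, timing, pitch, stress flag)
# and shows one simple way gesture strokes could be aligned with
# prosodically prominent words.

from dataclasses import dataclass
from typing import List


@dataclass
class SpokenWord:
    text: str
    start_ms: int      # word onset in the synthesized audio
    end_ms: int        # word offset
    pitch_hz: float    # mean F0 over the word
    stressed: bool     # emphasis flag from the (hypothetical) TTS front end


@dataclass
class GestureStroke:
    label: str
    start_ms: int
    end_ms: int


def plan_gestures(words: List[SpokenWord],
                  pitch_threshold_hz: float = 180.0) -> List[GestureStroke]:
    """Place a beat-gesture stroke on each prosodically prominent word.

    Prominence here is a toy rule (stress flag or high mean pitch); the
    actual system derives it from several linguistic and prosodic features
    predicted by its verbal modules.
    """
    strokes: List[GestureStroke] = []
    for w in words:
        if w.stressed or w.pitch_hz >= pitch_threshold_hz:
            # Stroke spans the word, so the gesture peak co-occurs with it.
            strokes.append(GestureStroke(label=f"beat:{w.text}",
                                         start_ms=w.start_ms,
                                         end_ms=w.end_ms))
    return strokes


if __name__ == "__main__":
    utterance = [
        SpokenWord("this", 0, 180, 150.0, False),
        SpokenWord("is", 180, 300, 145.0, False),
        SpokenWord("really", 300, 620, 205.0, True),
        SpokenWord("important", 620, 1100, 190.0, True),
    ]
    for stroke in plan_gestures(utterance):
        print(stroke)
```

In this toy example, strokes are scheduled only on "really" and "important", giving speech-synchronized movement without any manual annotation of the input text.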

Citation (APA)

Mlakar, I., Kacic, Z., & Rojc, M. (2013). TTS-driven synthetic behaviour-generation model for artificial bodies. International Journal of Advanced Robotic Systems, 10. https://doi.org/10.5772/56870
