A new dimension of speaker stylistic variation was identified during human-computer communication: the convergence of users' speech with the text-to-speech (TTS) output heard from an animated software partner. Twenty-four 7- to 10-year-old children conversed with digital fish that embodied different TTS voices as they learned about marine biology. An analysis of children's amplitude, durational features, and dialogue response latencies confirmed that they spontaneously adapt basic acoustic-prosodic features of their speech by 10-50%, with the largest adaptations involving utterance pause structure and amplitude. Children's speech adaptations were bidirectional, and dynamically readaptable when they were introduced to a new software partner's voice. In the design of future conversational systems, this spontaneous convergence could be exploited to guide users' speech within system processing bounds, thereby enhancing robustness. The long-term goal of this research is the development of predictive models of human-computer communication to guide the design of next-generation conversational interfaces, in particular ones that are adaptive and focused on audio exchanges in mobile usage contexts.
Oviatt, S., Darves, C., Coulston, R., & Wesson, M. (2005). Speech Convergence with Animated Personas (pp. 379–397). https://doi.org/10.1007/1-4020-3075-4_20