Creating expressive TTS voices for conversation agent applications

Abstract

Text-to-speech (TTS) has traditionally been viewed as a “black box” component, where standard “portfolio” voices are typically offered with a professional but “neutral” speaking style. For commercially important languages, many different portfolio voices may be offered, all with similar speaking styles. A customer wishing to use TTS will typically choose one of these voices. The only alternative is to opt for a “custom voice” solution, in which the customer pays for a TTS voice to be created using their preferred voice talent. This approach allows some “tuning” of the scripts used to create the voice. A limited number of script elements may be added to provide better coverage of the customer’s expected domain, and “gilded phrases” can be included to ensure that specific phrase fragments are spoken perfectly. However, even with this approach the recording style is strictly controlled, and standard scripts are augmented rather than redesigned from scratch. The “black box” approach to TTS allows systems to be produced that satisfy the needs of a large number of customers, even if the resulting solutions are limited in the persona they present. Recent advances in conversational agent applications have changed people’s expectations of how a computer voice should sound and interact. Suddenly, it has become much more important for the TTS system to present a persona that matches the goals of the application. Such systems demand a more flamboyant, upbeat and expressive voice. The “black box” approach is no longer sufficient; voices for high-end conversational agents are being explicitly “designed” to meet the needs of such applications. These voices are both expressive and light in tone, in complete contrast to the more conservative voices available for traditional markets. This paper describes how Nuance is addressing this new and challenging market.

Breen, A. (2014). Creating expressive TTS voices for conversation agent applications. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8773, pp. 1–14). Springer Verlag. https://doi.org/10.1007/978-3-319-11581-8_1