Evaluating voice quality and speech synthesis using crowdsourcing

Abstract

One of the key aspects of creating high-quality synthetic speech is the validation process, yet establishing validation processes that are both reliable and scalable is challenging. Today, the maturity of crowdsourcing infrastructure, along with better techniques for validating the data gathered through crowdsourcing, has made it possible to perform reliable speech synthesis validation at a larger scale. In this paper, we present a study of voice quality evaluation using a crowdsourcing platform. We investigate voice gender preference across eight locales for three typical TTS scenarios. We also examine the degree to which speaker adaptation can carry certain voice qualities of the target speaker, such as mood, over to the adapted TTS voice. Starting from an existing full TTS font, adaptation is carried out on a smaller amount of speech data from the target speaker. Finally, we show how crowdsourcing contributes to objective assessment of voice preference in voice talent selection. © 2013 Springer-Verlag.
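The abstract does not include implementation details, but the aggregation it describes (per-locale, per-scenario voice preference from validated crowd judgments) can be illustrated with a minimal sketch. The `ratings` table, the gold-question pass flag, and all field names below are hypothetical, not from the paper; the validation step simply discards workers who failed embedded gold questions, one common technique for cleaning crowdsourced data.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical judgments: (worker_id, locale, scenario, preferred_gender, passed_gold)
ratings = [
    ("w1", "en-US", "navigation", "female", True),
    ("w2", "en-US", "navigation", "female", True),
    ("w3", "en-US", "navigation", "male",   False),  # failed gold checks: excluded
    ("w4", "de-DE", "reading",    "male",   True),
]

def preference_summary(ratings):
    """Aggregate per-(locale, scenario) preference shares, keeping only
    judgments from workers who passed the embedded gold (validation) items."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for _worker, locale, scenario, gender, passed_gold in ratings:
        if passed_gold:                      # crowd-data validation step
            counts[(locale, scenario)][gender] += 1
    summary = {}
    for key, c in counts.items():
        n = c["female"] + c["male"]
        p = c["female"] / n                  # share preferring the female voice
        ci = 1.96 * sqrt(p * (1 - p) / n)    # normal-approximation 95% interval
        summary[key] = (p, ci, n)
    return summary

for (locale, scenario), (p, ci, n) in preference_summary(ratings).items():
    print(f"{locale}/{scenario}: female voice preferred {p:.0%} +/- {ci:.0%} (n={n})")
```

The same tally-and-interval pattern applies to the paper's other comparisons, such as A/B preference between a voice-talent candidate and a baseline voice.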

Citation (APA)

Parson, J., Braga, D., Tjalve, M., & Oh, J. (2013). Evaluating voice quality and speech synthesis using crowdsourcing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8082 LNAI, pp. 233–240). https://doi.org/10.1007/978-3-642-40585-3_30
