One of the key aspects of creating high-quality synthetic speech is the validation process, yet establishing validation processes that are both reliable and scalable is challenging. Today, the maturity of crowdsourcing infrastructure, along with better techniques for validating the data gathered through crowdsourcing, has made it possible to perform reliable speech synthesis validation at a larger scale. In this paper, we present a study of voice quality evaluation using a crowdsourcing platform. We investigate voice gender preference across eight locales for three typical TTS scenarios. We also examine to what degree speaker adaptation can carry over certain voice qualities, such as mood, from the target speaker to the adapted TTS voice. Starting from an existing full TTS voice font, adaptation is carried out using a smaller amount of speech data from a target speaker. Finally, we show how crowdsourcing contributes to objective assessment of voice preference in voice talent selection. © 2013 Springer-Verlag.
CITATION STYLE
Parson, J., Braga, D., Tjalve, M., & Oh, J. (2013). Evaluating voice quality and speech synthesis using crowdsourcing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8082 LNAI, pp. 233–240). https://doi.org/10.1007/978-3-642-40585-3_30