The widespread availability of open source and commercial text-to-speech (TTS) engines allows for the rapid creation of telephony services that require a TTS component. However, there exist neither a standard corpus nor common metrics for objectively evaluating TTS engines. Listening tests are a prominent evaluation method in this domain, where the primary goal is to produce speech targeted at human listeners. Nonetheless, subjective evaluation can be problematic and expensive. Objective evaluation metrics, such as word accuracy and contextual disambiguation (is “Dr.” rendered as Doctor or Drive?), have the benefit of being both inexpensive and unbiased. In this paper, we study seven TTS engines: four open source and three commercial. We systematically evaluate each TTS engine on two axes: (1) contextual word accuracy (including support for numbers, homographs, foreign words, acronyms, and directional abbreviations); and (2) naturalness (how natural the TTS sounds to human listeners). Our results indicate that commercial engines may have an edge over open source TTS engines.
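The contextual word-accuracy axis can be sketched as a simple scoring loop: compare each engine's text normalization against an expected spoken rendering and report the fraction of matches. The `toy_normalize` function below is a hypothetical rule-based stand-in, not any engine from the study, and the test cases are illustrative examples mirroring the paper's categories (abbreviation disambiguation, numbers).

```python
def word_accuracy(cases, normalize):
    """Fraction of cases where the normalized text matches the expected reading."""
    correct = sum(1 for text, expected in cases if normalize(text) == expected)
    return correct / len(cases)

# Toy gold set: input sentence -> expected spoken rendering.
cases = [
    ("Visit Dr. Smith", "visit doctor smith"),
    ("Turn onto Elm Dr.", "turn onto elm drive"),
    ("Room 101", "room one oh one"),
]

# A trivial rule-based normalizer for illustration only (NOT a real engine).
def toy_normalize(text):
    words = text.lower().split()
    out = []
    for i, w in enumerate(words):
        if w == "dr.":
            # Context rule: "Dr." after a street name reads as "drive",
            # otherwise as the title "doctor".
            out.append("drive" if i > 0 and words[i - 1] == "elm" else "doctor")
        elif w == "101":
            out.append("one oh one")  # digit expansion
        else:
            out.append(w)
    return " ".join(out)

print(word_accuracy(cases, toy_normalize))  # 1.0 on this toy set
```

In practice the gold set would cover all five categories named above, and `normalize` would wrap each engine's front-end output.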
Citation: Hosier, J., Kalfen, J., Sharma, N., & Gurbani, V. K. (2020). A systematic study of open source and commercial text-to-speech (TTS) engines. In Lecture Notes in Computer Science (Vol. 12284 LNAI, pp. 312–320). Springer. https://doi.org/10.1007/978-3-030-58323-1_34