Mixing Synthetic and Recorded Signals for Audio-Book Generation

Meysam Shamsi; Nelly Barbot; Damien Lolive; Jonathan Chevelu

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12335 LNAI 479-489

DOI: 10.1007/978-3-030-60276-5_46


Abstract

Using text-to-speech (TTS) systems helps reduce the cost of audio-book generation. This paper investigates mixing synthetic and recorded natural speech signals as a way to control the trade-off between the overall quality of an audio book and its production cost. First, fully synthetic signals and mixed synthetic and natural signals are compared perceptually at different levels of synthetic quality. Listeners' judgments show that the mixed signals are preferred. Next, the order and configuration of the mixed signals are studied; the perceptual test shows no significant difference between the configurations. Finally, the paper investigates the effect of synthetic quality and the bias introduced by the starting and ending parts of the mixed signals in the perceptual test.
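To make the notion of a mixed signal concrete, the following Python sketch assembles a sequence of segments in which each segment is taken either from a natural recording or from a TTS output, and reports the share of segments that must be recorded, used here as a rough proxy for production cost. This is an illustration under stated assumptions, not the authors' procedure: the function names `mix_segments` and `recorded_fraction`, the representation of audio as plain lists of samples, and the boolean selection pattern are all hypothetical.

```python
from typing import List, Sequence


def mix_segments(natural: Sequence[List[float]],
                 synthetic: Sequence[List[float]],
                 use_natural: Sequence[bool]) -> List[float]:
    """Concatenate per-segment audio, taking the natural or synthetic version of each segment.

    Hypothetical helper: segments are plain lists of samples, and use_natural[i]
    selects the recorded version of segment i when True, the TTS version otherwise.
    """
    if not (len(natural) == len(synthetic) == len(use_natural)):
        raise ValueError("all inputs must describe the same number of segments")
    mixed: List[float] = []
    for nat, syn, pick_nat in zip(natural, synthetic, use_natural):
        mixed.extend(nat if pick_nat else syn)
    return mixed


def recorded_fraction(use_natural: Sequence[bool]) -> float:
    """Share of segments that need a human recording (an assumed, simplified cost proxy)."""
    return sum(use_natural) / len(use_natural) if use_natural else 0.0


if __name__ == "__main__":
    natural = [[0.1, 0.2], [0.3], [0.4, 0.5]]
    synthetic = [[1.0, 1.0], [2.0], [3.0, 3.0]]
    # Natural start and end, synthetic middle: one possible "configuration".
    pattern = [True, False, True]
    print(mix_segments(natural, synthetic, pattern))
    print(recorded_fraction(pattern))
```

Changing `pattern` changes both the order of natural and synthetic material and the share of recording work, which are the two quantities the abstract's configuration and cost questions refer to; how each pattern is actually perceived is what the paper's listening tests measure.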


Citation (APA)

Shamsi, M., Barbot, N., Lolive, D., & Chevelu, J. (2020). Mixing Synthetic and Recorded Signals for Audio-Book Generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12335 LNAI, pp. 479–489). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60276-5_46
