A prosodic diphone database for Korean text-to-speech synthesis system

Abstract

This paper presents a prosodically conditioned diphone database for use in a Korean text-to-speech (TTS) synthesis system. The diphones are prosodically conditioned in the sense that a single conventional diphone is stored in several versions, each taken directly from a different prosodic domain of read sentences that were prosodically labeled following the K-ToBI conventions. Four levels of Korean prosodic domains were distinguished in the diphone selection process, so that four different versions of each diphone were selected. A 400-sentence subset of the Korean Newswire Text Corpora was converted to its pronounced form as described in [8], and its read version was prosodically labeled. A greedy algorithm identified 223 sentences containing 1,853 prosodic diphones (out of the 3,977 possible prosodic diphones) that can synthesize all 400 utterances. Although our system cannot synthesize an unlimited number of sentences at this stage, the quality of the synthesized sentences strongly suggests that prosodically conditioned diphones are a viable option for a text-to-speech synthesis system. © Springer-Verlag Berlin Heidelberg 2005.
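The abstract does not say which greedy criterion was used, so the following is only a minimal sketch of one plausible reading: a set-cover style selection that repeatedly picks the sentence contributing the most not-yet-covered prosodic diphones. The function name `greedy_select`, the representation of a prosodic diphone as a `(diphone, domain_level)` pair, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a prosodic diphone is a conventional diphone paired with the
# prosodic domain level (1-4) it was taken from.
ProsodicDiphone = Tuple[str, int]


def greedy_select(sentences: Dict[str, Set[ProsodicDiphone]]) -> List[str]:
    """Pick sentences until every prosodic diphone in the corpus is covered.

    At each step, choose the sentence that adds the most uncovered units
    (ties broken by sentence id, so the result is deterministic).
    """
    uncovered: Set[ProsodicDiphone] = set().union(*sentences.values())
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:  # defensive: nothing left that any sentence can cover
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy corpus: sentence id -> prosodic diphones it contains.
toy_corpus: Dict[str, Set[ProsodicDiphone]] = {
    "s1": {("a-b", 1), ("b-c", 2), ("c-d", 3)},
    "s2": {("a-b", 1), ("b-c", 2)},
    "s3": {("c-d", 3), ("d-e", 4)},
    "s4": {("d-e", 4)},
}

selected = greedy_select(toy_corpus)
print(selected)
```

Under this reading, the selected sentences together contain every prosodic diphone that occurs in the labeled corpus, which is what allows the smaller recorded set to resynthesize all of the original utterances. A greedy pass of this kind does not guarantee the smallest possible sentence set, only a reasonably small one.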

Cite

Yoon, K. (2005). A prosodic diphone database for Korean text-to-speech synthesis system. In Lecture Notes in Computer Science (Vol. 3406, pp. 425–428). Springer Verlag. https://doi.org/10.1007/978-3-540-30586-6_45
