MODELLING SPEECH TEMPORAL STRUCTURE FOR ESTONIAN TEXT-TO-SPEECH SYNTHESIS: FEATURE SELECTION

  • Mihkla M

Abstract

The article discusses the principles of selecting features for modelling the temporal structure of Estonian speech for text-to-speech (TTS) synthesis, drawing on different types of read-aloud texts. Feature selection is known to depend on general factors that regulate speech temporal structure as well as on language-specific ones. The durational model of Estonian stands out in that its input includes foot-bound features (foot quantity degree, number of feet in the word). In addition to the traditional descriptors of sound context and hierarchical position, the prediction of Estonian segmental durations requires information on morphological, syntactic, and lexical features of the word, such as word form, part of sentence, and part of speech. For predicting pauses in the speech flow, the relevant features are the distance from the sentence beginning and from the previous pause, the length and quantity degree of the preceding foot, and the occurrence of a punctuation mark or conjunction. Although expert opinions were used in feature selection, statistical methods should be applied to test the resulting vector of argument features for optimality. [ABSTRACT FROM AUTHOR]
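
As a rough illustration of the pause-prediction input listed in the abstract, the sketch below packs those six features into a numeric vector. It is not the author's implementation: the class name `PauseFeatures`, the field names, the units (words, syllables), and the 0/1 encoding of the binary features are all assumptions made for this example. The only linguistic fact relied on is that Estonian distinguishes three quantity degrees.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PauseFeatures:
    """One candidate pause position, described by the features named in the abstract.

    All field names, units, and encodings are illustrative assumptions.
    """
    dist_from_sentence_start: int   # assumed unit: words
    dist_from_previous_pause: int   # assumed unit: words
    preceding_foot_length: int      # assumed unit: syllables
    preceding_foot_quantity: int    # Estonian quantity degree: 1, 2, or 3
    has_punctuation: bool           # punctuation mark at this position
    has_conjunction: bool           # conjunction at this position

    def to_vector(self) -> List[float]:
        """Return the features as a flat numeric vector for a statistical model."""
        if self.preceding_foot_quantity not in (1, 2, 3):
            raise ValueError("quantity degree must be 1, 2, or 3")
        return [
            float(self.dist_from_sentence_start),
            float(self.dist_from_previous_pause),
            float(self.preceding_foot_length),
            float(self.preceding_foot_quantity),
            1.0 if self.has_punctuation else 0.0,
            1.0 if self.has_conjunction else 0.0,
        ]


# Hypothetical candidate position: five words into the sentence, no earlier
# pause, preceded by a two-syllable foot in the third quantity degree, at a comma.
example = PauseFeatures(5, 5, 2, 3, True, False)
vector = example.to_vector()
```

A vector of this shape could then be passed to any statistical model to test which features actually contribute to pause prediction, which is the validation step the abstract calls for.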

Cite

Mihkla, M. (2007). MODELLING SPEECH TEMPORAL STRUCTURE FOR ESTONIAN TEXT-TO-SPEECH SYNTHESIS: FEATURE SELECTION. Trames. Journal of the Humanities and Social Sciences, 11(3), 284–298. https://doi.org/10.3176/tr.2007.3.04
