Automatic syllabification and syllable timing of automatically recognized speech – for Czech

Abstract

Our recent work has focused on automatic speech recognition (ASR) of spoken-word archive documents [6, 7]. One of the important tasks was to structure the recognized document, that is, to segment it and to detect sentence boundaries. Prosodic features play a significant role in structuring spoken documents. In our previous work, we bound the prosodic information to the ASR events: words and noises. However, many prosodic features (e.g. speech rate, vowel prominence, or prolongation of last syllables) require a finer time resolution than the word level [1]. For that reason, we propose a scheme that automatically syllabifies the recognized words and, by forced alignment of their phonetic content, provides the syllables (and their phonemes) with time stamps. We presume that words, non-speech events, syllables, and phonemes form an appropriate hierarchical set of structural units for processing various prosodic features.
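
The hierarchy described in the abstract (syllables built from time-stamped phonemes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that forced alignment has already produced a start and end time for each phoneme and that a syllabifier has already decided how many phonemes belong to each syllable. The names `Phoneme`, `Syllable`, `group_into_syllables`, and `speech_rate`, as well as the timings for the Czech word "voda", are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phoneme:
    symbol: str
    start: float  # seconds, assumed to come from forced alignment
    end: float    # seconds


@dataclass
class Syllable:
    phonemes: List[Phoneme]

    @property
    def start(self) -> float:
        # A syllable starts where its first phoneme starts.
        return self.phonemes[0].start

    @property
    def end(self) -> float:
        # A syllable ends where its last phoneme ends.
        return self.phonemes[-1].end


def group_into_syllables(phonemes: List[Phoneme], sizes: List[int]) -> List[Syllable]:
    """Split aligned phonemes into syllables; sizes[i] is the phoneme count of syllable i."""
    if any(size <= 0 for size in sizes) or sum(sizes) != len(phonemes):
        raise ValueError("syllable sizes must be positive and cover all phonemes")
    syllables = []
    pos = 0
    for size in sizes:
        syllables.append(Syllable(phonemes[pos:pos + size]))
        pos += size
    return syllables


def speech_rate(syllables: List[Syllable]) -> float:
    """Syllables per second over the span covered by the given syllables."""
    duration = syllables[-1].end - syllables[0].start
    return len(syllables) / duration


# Hypothetical alignment of the word "voda" (syllables "vo" + "da").
word_phonemes = [
    Phoneme("v", 0.00, 0.05),
    Phoneme("o", 0.05, 0.15),
    Phoneme("d", 0.15, 0.20),
    Phoneme("a", 0.20, 0.30),
]
syllables = group_into_syllables(word_phonemes, [2, 2])
rate = speech_rate(syllables)
```

With syllable-level time stamps in place, features such as speech rate can be computed at a finer resolution than the word level, which is the motivation stated in the abstract.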

Citation (APA)

Boháč, M., Matějů, L., Rott, M., & Šafařík, R. (2016). Automatic syllabification and syllable timing of automatically recognized speech – for Czech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9924 LNCS, pp. 540–547). Springer Verlag. https://doi.org/10.1007/978-3-319-45510-5_62
