Automatic speech recognition for languages in Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and language at the syllable level. This paper presents a new approach to recognizing those languages that exploits information at the word level. The approach, adapted from our FLaVoR architecture [1], consists of two layers. In the first layer, a purely acoustic-phonemic search generates a dense phoneme network enriched with metadata. In the second layer, word decoding is performed on the composition of a series of finite-state transducers (FSTs) that combine knowledge sources at the sub-lexical, lexical and word-based language model levels. Experimental results on the Vietnamese Broadcast News corpus show that the approach is both effective and flexible. © 2006 Springer-Verlag Berlin/Heidelberg.
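The two-layer idea can be illustrated with a minimal sketch. It is not the FLaVoR implementation: it assumes the first layer has already produced a small phoneme lattice, and it replaces FST composition with a plain uniform-cost search over (lattice node, previous word) states. The lattice, phoneme labels, lexicon, bigram costs and every name below (`LATTICE`, `LEXICON`, `BIGRAM_COST`, `BACKOFF_COST`, `word_matches`, `decode`) are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count

# Assumed layer-1 output: a toy phoneme lattice given as arcs
# (start_node, end_node, phoneme, acoustic_cost). Lower cost is better.
LATTICE = [
    (0, 1, "x", 0.2),
    (0, 1, "s", 0.9),
    (1, 2, "in", 0.3),
    (2, 3, "ch", 0.4),
    (2, 3, "tr", 0.6),
    (3, 4, "ao", 0.1),
]
FINAL_NODE = 4

# Hypothetical word lexicon: word -> phoneme sequence.
LEXICON = {
    "xin": ("x", "in"),
    "sin": ("s", "in"),
    "chao": ("ch", "ao"),
    "trao": ("tr", "ao"),
}

# Hypothetical word bigram costs (think negative log probabilities).
BIGRAM_COST = {
    ("<s>", "xin"): 0.5,
    ("<s>", "sin"): 2.0,
    ("xin", "chao"): 0.3,
    ("xin", "trao"): 2.5,
}
BACKOFF_COST = 5.0  # flat cost for word pairs missing from BIGRAM_COST


def word_matches(lattice, start, phones):
    """Return (end_node, acoustic_cost) for each lattice path that starts
    at `start` and spells exactly the phoneme sequence `phones`."""
    frontier = [(start, 0.0)]
    for phone in phones:
        frontier = [
            (dst, cost + arc_cost)
            for node, cost in frontier
            for src, dst, label, arc_cost in lattice
            if src == node and label == phone
        ]
    return frontier


def decode(lattice, lexicon, bigram_cost, final_node, start_node=0):
    """Layer 2 stand-in: find the cheapest word sequence through the lattice,
    summing acoustic costs and bigram language model costs."""
    tie = count()  # tie-breaker so the heap never compares word tuples
    queue = [(0.0, next(tie), start_node, "<s>", ())]
    settled = {}
    while queue:
        cost, _, node, prev, words = heapq.heappop(queue)
        if node == final_node:
            return list(words), cost
        if settled.get((node, prev), float("inf")) <= cost:
            continue  # this state was already expanded at a lower cost
        settled[(node, prev)] = cost
        for word, phones in lexicon.items():
            lm = bigram_cost.get((prev, word), BACKOFF_COST)
            for end, acoustic in word_matches(lattice, node, phones):
                heapq.heappush(
                    queue,
                    (cost + acoustic + lm, next(tie), end, word, words + (word,)),
                )
    return None, float("inf")


if __name__ == "__main__":
    print(decode(LATTICE, LEXICON, BIGRAM_COST, FINAL_NODE))
```

Because the lexicon and language model are consulted only in the second pass, either can be swapped without rerunning the acoustic search, which is one plausible reading of the flexibility the abstract claims. A real system would express the lexicon and language model as transducers and compose them with the phoneme network rather than search it directly as done here.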
Vu, Q., Demuynck, K., & Van Compernolle, D. (2006). Vietnamese automatic speech recognition: The FLaVoR approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4274 LNAI, pp. 464–474). https://doi.org/10.1007/11939993_49