A minimum cluster-based trigram statistical model for Thai syllabification


Abstract

Syllabification is the process of extracting syllables from a word. Syllabification problems are mainly caused by unknown and ambiguous words. This research aims to resolve these problems in the Thai language by exploiting relationships among the characters in a word. A character clustering scheme is proposed to generate units smaller than a syllable, called Thai Minimum Clusters (TMCs), from a word. TMCs are then merged into syllables using a trigram statistical model. Experimental evaluations assess the effectiveness of the proposed technique on a standard data set of 77,303 words. The results show that the technique yields 97.61% accuracy. © 2011 Springer-Verlag.
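
The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the clustering step has already produced the units, uses romanized placeholder strings instead of real TMCs, tags each unit as syllable-initial (`B`) or syllable-internal (`I`), and scores every possible tagging with an add-one-smoothed trigram model. The class name `TrigramMerger`, the tagging scheme, the smoothing, and the exhaustive search are all assumptions chosen for brevity.

```python
import math
from collections import Counter
from itertools import product

PAD = "<s>"  # padding token for the start of a word


def tag(syllables):
    """Flatten syllables (lists of units) into 'unit/B' or 'unit/I' tokens."""
    tokens = []
    for syllable in syllables:
        for i, unit in enumerate(syllable):
            tokens.append(f"{unit}/{'B' if i == 0 else 'I'}")
    return tokens


class TrigramMerger:
    """Hypothetical trigram model over boundary-tagged units."""

    def __init__(self):
        self.tri = Counter()  # counts of (a, b, c) token trigrams
        self.bi = Counter()   # counts of (a, b) trigram contexts
        self.vocab = set()

    def train(self, corpus):
        for syllables in corpus:
            toks = [PAD, PAD] + tag(syllables)
            self.vocab.update(toks[2:])
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1

    def logprob(self, toks):
        """Add-one-smoothed trigram log probability of a token sequence."""
        toks = [PAD, PAD] + toks
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        return sum(
            math.log((self.tri[(a, b, c)] + 1) / (self.bi[(a, b)] + v))
            for a, b, c in zip(toks, toks[1:], toks[2:])
        )

    def syllabify(self, units):
        """Try every B/I tagging (fine for short words) and keep the best."""
        if not units:
            return []
        best, best_score = None, -math.inf
        for rest in product("BI", repeat=len(units) - 1):
            labels = ("B",) + rest  # the first unit always starts a syllable
            score = self.logprob([f"{u}/{t}" for u, t in zip(units, labels)])
            if score > best_score:
                best, best_score = labels, score
        syllables = []
        for unit, label in zip(units, best):
            if label == "B":
                syllables.append([unit])
            else:
                syllables[-1].append(unit)
        return syllables


# Toy training data: each word is a list of syllables, each a list of units.
corpus = [
    [["ka", "n"], ["ma"]],
    [["ma"], ["ka", "n"]],
    [["ka", "n"]],
]
model = TrigramMerger()
model.train(corpus)
print(model.syllabify(["ma", "ka", "n"]))
```

The exhaustive search grows as 2^(n-1) in the number of units, so a dynamic-programming decoder would be the natural replacement for longer inputs.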

Cite

Jucksriporn, C., & Sornil, O. (2011). A minimum cluster-based trigram statistical model for Thai syllabification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6609 LNCS, pp. 493–505). https://doi.org/10.1007/978-3-642-19437-5_41
