Syllabification is the process of extracting syllables from a word. Syllabification problems are caused mainly by unknown and ambiguous words. This research aims to resolve these problems for the Thai language by exploiting relationships among the characters in a word. A character clustering scheme is proposed to generate units smaller than a syllable, called Thai Minimum Clusters (TMCs), from a word. TMCs are then merged into syllables using a trigram statistical model. Experimental evaluations are performed to assess the effectiveness of the proposed technique on a standard data set of 77,303 words. The results show that the technique yields 97.61% accuracy. © 2011 Springer-Verlag.
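The merge step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the clustering rules, the exact form of the trigram model, or the search procedure, so everything below is an assumption made for illustration: the names `candidate_merges`, `TrigramModel`, and `syllabify` are hypothetical, the model is a syllable-level trigram with add-one smoothing, the search is exhaustive, and the clusters are Latin placeholder strings rather than real TMCs.

```python
from collections import Counter
from itertools import product
import math


def candidate_merges(clusters):
    """Yield every way to merge adjacent clusters into syllables."""
    n = len(clusters)
    if n == 0:
        return
    # Each of the n-1 gaps between clusters is either a syllable cut or a merge.
    for cuts in product([False, True], repeat=n - 1):
        syllables, current = [], clusters[0]
        for cut, cluster in zip(cuts, clusters[1:]):
            if cut:
                syllables.append(current)
                current = cluster
            else:
                current += cluster
        syllables.append(current)
        yield syllables


class TrigramModel:
    """Syllable trigram model with add-one smoothing (an assumed design)."""

    def __init__(self, training_words):
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = set()
        for syllables in training_words:
            padded = ["<s>", "<s>"] + list(syllables) + ["</s>"]
            self.vocab.update(padded)
            for a, b, c in zip(padded, padded[1:], padded[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1

    def log_prob(self, syllables):
        padded = ["<s>", "<s>"] + list(syllables) + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot for unseen syllables
        total = 0.0
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            total += math.log((self.tri[(a, b, c)] + 1) / (self.bi[(a, b)] + v))
        return total


def syllabify(clusters, model):
    """Return the merge of clusters that the trigram model scores highest."""
    if not clusters:
        return []
    return max(candidate_merges(clusters), key=model.log_prob)


# Toy training data: words already split into syllables (placeholders, not Thai).
model = TrigramModel([["ka", "lo"], ["ka", "mi"], ["lo", "mi"]])
print(syllabify(["k", "a", "l", "o"], model))
```

Exhaustive enumeration grows as 2^(n-1) in the number of clusters, which is tolerable for single words but would normally be replaced by dynamic programming; the abstract does not say which search the authors use.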
Jucksriporn, C., & Sornil, O. (2011). A minimum cluster-based trigram statistical model for Thai syllabification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6609 LNCS, pp. 493–505). https://doi.org/10.1007/978-3-642-19437-5_41