Syllable-based Neural Thai Word Segmentation

Abstract

Word segmentation is a challenging preprocessing step for Thai natural language processing because Thai text lacks explicit word boundaries. Previous systems rely on powerful neural network architectures alone and ignore the linguistic substructure of Thai words. We build on the linguistic observation that Thai strings can be segmented into syllables, which should narrow the search space for word boundaries and provide helpful features. We propose a neural Thai word segmenter that uses syllable embeddings to capture linguistic constraints and dilated CNN filters to capture the context of each character. To this end, we develop the first ML-based Thai orthographical syllable segmenter, which yields syllable embeddings used as features by the word segmenter. Our word segmentation system outperforms the previous state-of-the-art system in both speed and accuracy on both in-domain and out-of-domain datasets.
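The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' model. It assumes that word boundaries may only fall between syllables, and it uses a hypothetical romanized syllable list and hand-picked kernel weights in place of Thai text and learned parameters. The function names `candidate_boundaries` and `dilated_conv1d` are made up for this example.

```python
from typing import List


def candidate_boundaries(syllables: List[str]) -> List[int]:
    """Return character offsets where a word boundary is allowed.

    Assumption (for illustration only): a word boundary can fall only
    between two syllables, never inside one.
    """
    offsets = []
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)
        offsets.append(position)
    return offsets


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Zero-padded 1D dilated convolution with same-length output.

    Assumes an odd kernel length. Each output position combines inputs
    spaced `dilation` steps apart, so the filter sees a wider context
    without adding weights.
    """
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):
                total += weight * signal[j]
        output.append(total)
    return output


# Hypothetical romanized syllables standing in for a Thai string.
syllables = ["ka", "ra", "o", "ke"]
n_chars = sum(len(s) for s in syllables)

boundaries = candidate_boundaries(syllables)
unconstrained = 2 ** (n_chars - 1)    # a boundary after any character
constrained = 2 ** len(boundaries)    # a boundary only between syllables

# Toy per-character feature values and an untrained kernel.
context = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0], dilation=2)
```

With seven characters and three interior syllable edges, the number of possible segmentations in this toy case drops from 64 to 8. With dilation 2, a three-tap filter spans five positions instead of three.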

Cite

Chormai, P., Prasertsom, P., Cheevaprawatdomrong, J., & Rutherford, A. T. (2020). Syllable-based Neural Thai Word Segmentation. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 4619–4637). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.407
