Word segmentation is the process of dividing a sentence into meaningful units called “word units” [ISO/DIS 24614-1]. Whether a string counts as a word unit is judged by principles governing its internal integrity and by constraints on its external use. Internally, a word unit is bound by principles such as lexical integrity and unpredictability, so that it represents a single syntactically meaningful unit. The principles for external use include language economy and frequency: word units that satisfy them can be registered in a lexicon or other storage, which reduces the complexity of the syntactic processing that follows word segmentation. These principles are applied to Chinese, Japanese and Korean, and the impact of the standard is discussed.
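To illustrate how registering word units in a lexicon can simplify later processing, here is a minimal sketch of lexicon-driven segmentation. It uses greedy forward maximum matching, a common baseline technique that is assumed here for illustration and is not the method defined by the standard or the paper. The function name `segment`, the `max_len` parameter, and the sample lexicon entries are all hypothetical.

```python
def segment(sentence, lexicon, max_len=None):
    """Split `sentence` into units by greedy forward maximum matching.

    Illustrative assumption only: `lexicon` is a plain set of registered
    word units; this is not the procedure specified in ISO/DIS 24614-1.
    """
    if max_len is None:
        # Longest registered unit bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    units = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the lexicon matches at this position.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                units.append(candidate)
                i += size
                break
    return units


# Hypothetical lexicon: a frequent compound is registered as one unit.
lexicon = {"北京", "大学", "北京大学", "学生"}
print(segment("北京大学学生", lexicon))
```

Because the compound is registered as a single entry, the segmenter emits it as one unit and downstream syntactic processing does not need to recombine its parts. Under this sketch, removing that entry from the lexicon would yield two shorter units instead.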
CITATION STYLE
Choi, K. S., Isahara, H., Kanzaki, K., Kim, H., Pak, S. M., & Sun, M. (2009). Word Segmentation Standard in Chinese, Japanese and Korean. In Proceedings of the 7th Workshop on Asian Language Resources, ALR 2009 - in conjunction with the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (pp. 179–186). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690299.1690325