Self-organizing n-gram model for automatic word spacing

Seong Bae Park; Yoon Shik Tae; Se Young Park

Conference Proceedings (Open Access)

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2006) 1 633-640

DOI: 10.3115/1220175.1220255

Abstract

Automatic word spacing is one of the important tasks in Korean language processing and information retrieval. Because Korean word spacing has a number of confusing cases, spacing mistakes appear in many texts, including news articles. This paper presents a highly accurate method for automatic word spacing based on a self-organizing n-gram model. The method is essentially a variant of the n-gram model, but it achieves high accuracy by automatically adapting the context size. To find the optimal context size, the proposed method increases the context size when the contextual distribution after the increase does not agree with that of the current context. It decreases the context size when the distribution of the reduced context is similar to that of the current context. This approach achieves high accuracy by considering higher-dimensional data when necessary, and the increased computational cost is compensated by the reduced context size. The experimental results show that the self-organizing structure of the n-gram model improves on the basic model. © 2006 Association for Computational Linguistics.
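The context-size adaptation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of KL divergence as the agreement measure, the add-one smoothing, the threshold values, and every function name below are assumptions made for illustration only. Each character is labeled 1 if a space follows it and 0 otherwise, and the context is the run of characters ending at the current one.

```python
import math
from collections import defaultdict

def make_samples(spaced_text):
    """Split a correctly spaced string into chars and labels (1 = space follows)."""
    chars, labels = [], []
    for ch in spaced_text:
        if ch == " ":
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels

def count_contexts(chars, labels, n_max):
    """Count labels for every context of length 1..n_max ending at each position."""
    counts = defaultdict(lambda: [0, 0])
    for i, label in enumerate(labels):
        for n in range(1, n_max + 1):
            if i - n + 1 < 0:
                break
            counts[tuple(chars[i - n + 1 : i + 1])][label] += 1
    return counts

def dist(counts, ctx):
    """Add-one smoothed distribution (P(no space), P(space)) for a context."""
    c0, c1 = counts.get(ctx, (0, 0))
    total = c0 + c1 + 2
    return ((c0 + 1) / total, (c1 + 1) / total)

def kl(p, q):
    """KL divergence D(p || q); assumed here as the measure of disagreement."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def choose_context_size(counts, chars, i, n_start=2, n_max=4,
                        expand=0.05, shrink=0.01):
    """Grow the context while the longer one disagrees; otherwise try shrinking."""
    def ctx(k):
        return tuple(chars[i - k + 1 : i + 1])

    n = min(n_start, i + 1)
    grew = False
    # Expand: the distribution of the larger context differs from the current one.
    while (n < n_max and n < i + 1 and ctx(n + 1) in counts
           and kl(dist(counts, ctx(n + 1)), dist(counts, ctx(n))) > expand):
        n += 1
        grew = True
    # Shrink: the distribution of the smaller context is close to the current one.
    while (not grew and n > 1
           and kl(dist(counts, ctx(n)), dist(counts, ctx(n - 1))) < shrink):
        n -= 1
    return n

def predict_space(counts, chars, i, **kwargs):
    """Predict whether a space follows chars[i], using the adapted context size."""
    n = choose_context_size(counts, chars, i, **kwargs)
    p = dist(counts, tuple(chars[i - n + 1 : i + 1]))
    return p[1] > 0.5
```

In this sketch the two thresholds (`expand`, `shrink`) control how readily the context grows or shrinks; suitable values would have to be tuned on held-out data, and the paper may use a different criterion altogether.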

Citation (APA)

Park, S. B., Tae, Y. S., & Park, S. Y. (2006). Self-organizing n-gram model for automatic word spacing. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 633–640). https://doi.org/10.3115/1220175.1220255
