Skip-gram-KR: Korean word embedding for semantic clustering

Sun Young Ihm; Ji Hye Lee; Young Ho Park

Journal Article (Open Access)

IEEE Access, vol. 7, pp. 39948–39961, 2019

DOI: 10.1109/ACCESS.2019.2905252

Abstract

Deep learning algorithms are used in many applications, including pattern recognition, natural language processing, and speech recognition. Recent neural network-based natural language processing techniques rely on fixed-length word embeddings. Word embedding maps each word to a low-dimensional, dense, fixed-length vector such that words with similar distributions of surrounding words receive similar vectors. Korean text is currently embedded with methods designed for other languages; because these methods were originally developed for English, they do not reflect the word order and structure of Korean. In this paper, we propose Skip-gram-KR, a word embedding method for Korean, together with a Korean affix tokenizer. Skip-gram-KR generates training data for similar words through backward mapping and a two-word skipping method. The experimental results show that the proposed method achieved the highest accuracy in the reported experiments.
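As background for the abstract, the sketch below shows two generic ingredients it builds on: splitting a Korean particle off a whitespace-delimited word (eojeol), and generating the (centre, context) training pairs of the standard word2vec skip-gram model. This is a minimal illustration under stated assumptions, not the paper's method. The particle list `PARTICLES` and the functions `split_particle`, `tokenize`, and `skipgram_pairs` are hypothetical names introduced here, and the paper's backward mapping and two-word skipping are not reproduced because the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny list of Korean postpositional particles (josa).
# This is NOT the paper's affix tokenizer; it only illustrates how splitting
# affixes off an eojeol changes the tokens a skip-gram model sees.
PARTICLES = ("에서", "은", "는", "이", "가", "을", "를", "에")


def split_particle(eojeol: str) -> List[str]:
    """Naively split one trailing particle off a whitespace-delimited word."""
    # Try longer particles first so "에서" is matched before "에".
    for p in sorted(PARTICLES, key=len, reverse=True):
        # Require a non-empty stem so a bare particle is left untouched.
        if eojeol.endswith(p) and len(eojeol) > len(p):
            return [eojeol[: -len(p)], p]
    return [eojeol]


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace, then split a trailing particle off each word."""
    tokens: List[str] = []
    for eojeol in sentence.split():
        tokens.extend(split_particle(eojeol))
    return tokens


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Standard skip-gram: pair each centre word with every word within `window`."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    toks = tokenize("나는 학교에서 공부를 한다")  # "I study at school"
    print(toks)  # ['나', '는', '학교', '에서', '공부', '를', '한다']
    print(len(skipgram_pairs(toks)))  # 22 pairs with window=2
```

Without particle splitting, forms such as "학교에서" and "학교에" would be separate vocabulary items; after splitting, both contribute context for the stem "학교". The suffix matching here is intentionally naive (for example, it would wrongly split a noun that merely ends in "이"), which is one reason a dedicated Korean tokenizer is needed.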

Citation (APA)

Ihm, S. Y., Lee, J. H., & Park, Y. H. (2019). Skip-gram-KR: Korean word embedding for semantic clustering. IEEE Access, 7, 39948–39961. https://doi.org/10.1109/ACCESS.2019.2905252