One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese


Abstract

Automatic word segmentation is the first step in Vietnamese text information processing and an important support for cross-language information processing tasks in China and Vietnam. Vietnamese is a tonal, isolating language in which each syllable can form a word on its own or combine with the syllables to its left and/or right to form a new word. Because spaces in written Vietnamese separate syllables rather than words, automatic word segmentation cannot rely on spaces alone. This paper takes automatic word segmentation of Vietnamese as its research object. First, it makes a rough segmentation of Vietnamese sentences with the N-shortest path model. Then, the syllables in each sentence are abstracted into a directed acyclic graph. Finally, the word segmentation is obtained by calculating the shortest path with the help of the BEMS marking system. The results show that the proposed algorithm performs satisfactorily on Vietnamese word segmentation.
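The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy lexicon, the unit edge weight (each candidate word costs 1, so the shortest path is the one with the fewest words), the `max_len` limit, and the function names `build_dag`, `n_shortest_paths`, and `bems_tags` are all assumptions made for illustration. BEMS is read here as Begin, Middle, End, and Single tags assigned per syllable.

```python
import heapq

def build_dag(syllables, lexicon, max_len=4):
    """Nodes are syllable boundaries 0..n; an edge i -> j is one candidate word."""
    n = len(syllables)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = " ".join(syllables[i:j])
            # A single syllable is always allowed; longer spans must be in the lexicon.
            if j == i + 1 or word in lexicon:
                edges[i].append(j)
    return edges

def n_shortest_paths(syllables, lexicon, n_best=2):
    """Return up to n_best segmentations with the fewest words (unit edge weights)."""
    n = len(syllables)
    edges = build_dag(syllables, lexicon)
    best = [[] for _ in range(n + 1)]   # best[k]: (cost, boundary path) pairs ending at k
    best[0] = [(0, [0])]
    for i in range(n + 1):
        # All edges point forward, so every path into node i is known by now.
        best[i] = heapq.nsmallest(n_best, best[i])
        if i == n:
            break
        for cost, path in best[i]:
            for j in edges[i]:
                best[j].append((cost + 1, path + [j]))
    return [[" ".join(syllables[a:b]) for a, b in zip(path, path[1:])]
            for _, path in best[n]]

def bems_tags(words):
    """Tag each syllable: B(egin), M(iddle), E(nd) of a word, or S(ingle)."""
    tags = []
    for word in words:
        k = len(word.split())
        if k == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (k - 2) + ["E"])
    return tags

# Toy lexicon (hypothetical): "học sinh" = student, "sinh học" = biology.
LEXICON = {"học sinh", "sinh học"}
sentence = "học sinh học sinh học".split()
for candidate in n_shortest_paths(sentence, LEXICON, n_best=2):
    print(candidate, bems_tags(candidate))
```

The example sentence is ambiguous because "học sinh" and "sinh học" overlap, so several segmentations tie at three words. Keeping N candidates rather than one leaves room for a later step to choose among them; how the paper scores or disambiguates the candidates is not stated in the abstract.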

Cite

Ke, X., Luo, H., Chen, J. H., Huang, R., & Lai, J. (2019). One Novel Word Segmentation Method Based on N-Shortest Path in Vietnamese. In Advances in Intelligent Systems and Computing (Vol. 924, pp. 549–557). Springer Verlag. https://doi.org/10.1007/978-981-13-6861-5_47
