Vietnamese part of speech tagging based on multi-category words disambiguation model

Abstract

Part-of-speech (POS) tagging is a fundamental task in Natural Language Processing. Its quality determines the quality of subsequent processing, and the ambiguity of multi-category words (words that can belong to more than one part of speech) directly affects the accuracy of Vietnamese POS tagging. POS tagging for English and Chinese already achieves good results, but the accuracy of Vietnamese POS tagging still needs improvement. To address this problem, this paper proposes a novel Vietnamese POS tagging method based on a multi-category word disambiguation model and a part-of-speech dictionary. A multi-category word dictionary and a non-multi-category word dictionary are generated from a Vietnamese dictionary and used to build a POS tagging corpus. A total of 396,946 multi-category words are extracted from this corpus, and a maximum entropy model for disambiguating Vietnamese parts of speech is then constructed from them using statistical methods. Based on this model, both multi-category and non-multi-category words are tagged. Experimental results show that the proposed method outperforms the existing model, demonstrating that it is feasible and effective.
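
The abstract does not specify the tag set, feature templates, or training procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It splits a word-to-tags dictionary into non-multi-category words (one tag, assigned by lookup) and multi-category words, which are disambiguated by a conditional maximum entropy model p(tag | context) restricted to each word's dictionary tags. The toy sentences, the tags (N, P, V, A, R, C), the context features (current, previous, and next word), the SGD settings, and the fallback tag for unknown words are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; tags are illustrative (N=noun, P=pronoun/demonstrative,
# V=verb, A=adjective, R=adverb, C=conjunction). "hay" is the multi-category word.
TRAIN = [
    [("phim", "N"), ("này", "P"), ("hay", "A")],
    [("tôi", "P"), ("hay", "R"), ("đi", "V")],
    [("trà", "N"), ("hay", "C"), ("cà_phê", "N")],
    [("sách", "N"), ("này", "P"), ("hay", "A")],
    [("anh", "P"), ("hay", "R"), ("đọc", "V")],
    [("sữa", "N"), ("hay", "C"), ("nước", "N")],
]
FALLBACK_TAG = "N"  # assumption: tag used for words missing from both dictionaries


def build_dictionaries(sentences):
    """Split word -> tags into non-multi-category (1 tag) and multi-category (>1 tag)."""
    tags_of = defaultdict(set)
    for sent in sentences:
        for word, tag in sent:
            tags_of[word].add(tag)
    single = {w: next(iter(t)) for w, t in tags_of.items() if len(t) == 1}
    multi = {w: sorted(t) for w, t in tags_of.items() if len(t) > 1}
    return single, multi


def features(words, i):
    """Hypothetical context features: current, previous and next word."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return ["w=" + words[i], "prev=" + prev_w, "next=" + next_w]


class MaxEntDisambiguator:
    """Conditional maximum entropy model over a word's candidate tags,
    trained by stochastic gradient ascent on the log-likelihood."""

    def __init__(self):
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def probs(self, feats, candidates):
        scores = [sum(self.weights[(f, t)] for f in feats) for t in candidates]
        top = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, instances, epochs=100, lr=0.5):
        for _ in range(epochs):
            for feats, candidates, gold in instances:
                for tag, p in zip(candidates, self.probs(feats, candidates)):
                    grad = (1.0 if tag == gold else 0.0) - p  # observed - expected
                    for f in feats:
                        self.weights[(f, tag)] += lr * grad

    def predict(self, feats, candidates):
        p = self.probs(feats, candidates)
        return candidates[max(range(len(candidates)), key=p.__getitem__)]


def train_tagger(sentences):
    single, multi = build_dictionaries(sentences)
    instances = []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (word, tag) in enumerate(sent):
            if word in multi:  # only multi-category occurrences train the model
                instances.append((features(words, i), multi[word], tag))
    model = MaxEntDisambiguator()
    model.train(instances)
    return single, multi, model


def tag(words, single, multi, model):
    out = []
    for i, word in enumerate(words):
        if word in single:  # unambiguous: dictionary lookup
            out.append(single[word])
        elif word in multi:  # ambiguous: maximum entropy disambiguation
            out.append(model.predict(features(words, i), multi[word]))
        else:
            out.append(FALLBACK_TAG)
    return out


if __name__ == "__main__":
    single, multi, model = train_tagger(TRAIN)
    print(tag(["tôi", "hay", "đọc"], single, multi, model))
```

Restricting the softmax to the dictionary tags of each multi-category word mirrors the dictionary-based split described above: the model only has to choose among tags the word can actually take, and unambiguous words never reach it.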

Chen, Z., Yanchao, L., Jianyi, G., Wei, C., Xin, Y., Zhengtao, Y., & Xiuqin, C. (2018). Vietnamese part of speech tagging based on multi-category words disambiguation model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10619 LNAI, pp. 267–277). Springer Verlag. https://doi.org/10.1007/978-3-319-73618-1_23

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free