Normalization of Vietnamese tweets on Twitter

Vu H. Nguyen; Hien T. Nguyen; Vaclav Snasel

Conference Proceedings

Advances in Intelligent Systems and Computing, vol. 370 (2015), pp. 179–189

DOI: 10.1007/978-3-319-21206-7_16

Abstract

We study the task of normalizing noisy text, focusing on Vietnamese tweets. Normalization aims to improve the performance of applications that mine or analyze the semantics of social media content, as well as other social network analysis applications. Because tweets are noisy, irregular, and short, and contain acronyms and spelling errors, processing them is more challenging than processing news or other formal texts. In this paper, we propose a method that normalizes Vietnamese tweets by detecting non-standard words and spelling errors and then correcting them. The method combines a language model with dictionaries and Vietnamese vocabulary structures. We built a dataset of 1,360 Vietnamese tweets to evaluate the proposed method. Experimental results show that our method achieves encouraging performance, with an F1-score of 89%.
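The abstract only names the ingredients of the method (dictionaries, vocabulary structure, a language model), so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a tiny hand-made vocabulary, a hypothetical abbreviation dictionary, and toy bigram counts standing in for the language model; all of these names and values are invented for illustration. Tokens outside the vocabulary are flagged as non-standard, candidate corrections come from the dictionary or from vocabulary words one edit away, and the bigram counts pick the candidate that best fits the preceding word.

```python
import unicodedata
from collections import Counter


def nfc(text):
    """Normalize to NFC so that accented Vietnamese letters compare reliably."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy vocabulary of standard words (not from the paper).
VOCAB = {nfc(w) for w in ["tôi", "không", "biết", "được", "đi", "học", "bạn"]}

# Hypothetical dictionary: non-standard form -> possible standard forms.
ABBREVIATIONS = {
    "ko": [nfc("không")],
    "dc": [nfc("được")],
    "bit": [nfc("biết")],
    "hok": [nfc("không"), nfc("học")],  # deliberately ambiguous for the demo
}

# Toy bigram counts standing in for a trained language model.
BIGRAM_COUNTS = Counter({
    (nfc("tôi"), nfc("không")): 5,
    (nfc("không"), nfc("biết")): 4,
    (nfc("không"), nfc("được")): 3,
    (nfc("đi"), nfc("học")): 6,
})


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidates(token):
    """Dictionary expansions first, then vocabulary words one edit away."""
    found = list(ABBREVIATIONS.get(token, []))
    found += sorted(w for w in VOCAB
                    if levenshtein(token, w) == 1 and w not in found)
    return found


def normalize(tweet):
    """Replace each out-of-vocabulary token with its best-scoring candidate."""
    output = []
    for token in nfc(tweet).lower().split():
        if token in VOCAB:
            output.append(token)
            continue
        options = candidates(token)
        if not options:
            output.append(token)  # no candidate: leave the token unchanged
            continue
        prev = output[-1] if output else "<s>"
        # Pick the candidate seen most often after the previous word.
        output.append(max(options, key=lambda c: BIGRAM_COUNTS[(prev, c)]))
    return " ".join(output)


if __name__ == "__main__":
    print(normalize("tôi hok bit"))
    print(normalize("đi hok"))
```

In this sketch the ambiguous token "hok" resolves differently depending on the preceding word, which is the role a language model plays when a dictionary alone cannot decide. A realistic system would need a far larger vocabulary, smoothed n-gram probabilities, and handling of Vietnamese syllable structure, none of which is specified in the abstract.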

Citation (APA)

Nguyen, V. H., Nguyen, H. T., & Snasel, V. (2015). Normalization of Vietnamese tweets on Twitter. In Advances in Intelligent Systems and Computing (Vol. 370, pp. 179–189). Springer Verlag. https://doi.org/10.1007/978-3-319-21206-7_16
