Normalization of Vietnamese tweets on Twitter

Abstract

We study the task of normalizing noisy text, focusing on Vietnamese tweets. The task aims to improve the performance of applications that mine or analyze the semantics of social media content, as well as other social network analysis applications. Because tweets are noisy, irregular, and short, and contain acronyms and spelling errors, processing them is more challenging than processing news or other formal text. In this paper, we propose a method that normalizes Vietnamese tweets by detecting non-standard words and spelling errors and correcting them. The method combines a language model with dictionaries and Vietnamese vocabulary structures. We build a dataset of 1,360 Vietnamese tweets to evaluate the proposed method. Experimental results show that the method achieves encouraging performance, with an F1-score of 89%.
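
To make the idea of combining dictionaries with a language model concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the toy vocabulary, the abbreviation table, the three-sentence corpus, the edit-distance fallback, and the helper names nfc, edit_distance, candidates, and normalize_tweet. It flags tokens that are missing from a vocabulary, proposes candidates from an abbreviation dictionary or from vocabulary words within one edit, and picks the candidate with the highest bigram count given the previous word.

```python
import unicodedata
from collections import Counter


def nfc(text):
    # Vietnamese diacritics can be encoded in more than one way;
    # NFC normalization makes dictionary lookups consistent.
    return unicodedata.normalize("NFC", text)


# Hypothetical toy resources, for illustration only.
VOCAB = {nfc(w) for w in ["tôi", "không", "biết", "được", "rồi", "đi", "học"]}
ABBREVIATIONS = {"ko": nfc("không"), "dc": nfc("được"), "bt": nfc("biết")}
CORPUS = ["tôi không biết", "tôi đi học", "tôi không đi học"]

# Bigram counts from the toy corpus stand in for a real language model.
BIGRAMS = Counter()
for sentence in CORPUS:
    words = ["<s>"] + nfc(sentence).split()
    BIGRAMS.update(zip(words, words[1:]))


def edit_distance(a, b):
    # Standard Levenshtein distance using a rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(token):
    # Standard words are kept as they are.
    if token in VOCAB:
        return [token]
    # Known abbreviations map straight to their expansion.
    if token in ABBREVIATIONS:
        return [ABBREVIATIONS[token]]
    # Otherwise treat the token as a possible misspelling.
    near = sorted(w for w in VOCAB if edit_distance(token, w) <= 1)
    return near or [token]  # leave unknown tokens unchanged


def normalize_tweet(tweet):
    prev, output = "<s>", []
    for token in nfc(tweet.lower()).split():
        # Choose the candidate that most often follows the previous word.
        best = max(candidates(token), key=lambda c: BIGRAMS[(prev, c)])
        output.append(best)
        prev = best
    return " ".join(output)


print(normalize_tweet("tôi ko bt"))
print(normalize_tweet("di hoc"))
```

With these toy resources, the first call expands the abbreviations "ko" and "bt", and the second restores the missing diacritics in "di hoc". A real system would need much larger dictionaries, a smoothed language model, and handling for the Vietnamese syllable structure that the abstract mentions but does not detail.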

Cite

Nguyen, V. H., Nguyen, H. T., & Snasel, V. (2015). Normalization of Vietnamese tweets on Twitter. In Advances in Intelligent Systems and Computing (Vol. 370, pp. 179–189). Springer Verlag. https://doi.org/10.1007/978-3-319-21206-7_16
