We study noisy text normalization for Vietnamese tweets. The task aims to improve applications that mine or analyze the semantics of social media content, as well as other social network analysis applications. Because tweets are noisy, irregular, and short, and contain acronyms and spelling errors, processing them is more challenging than processing news or other formal text. In this paper, we propose a method that normalizes Vietnamese tweets by detecting non-standard words and spelling errors and correcting them. The method combines a language model with dictionaries and Vietnamese vocabulary structures. We build a dataset of 1,360 Vietnamese tweets to evaluate the proposed method. Experimental results show that the method achieves encouraging performance, with an F1-score of 89%.
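The abstract does not give implementation details, so the following is only a minimal sketch of how a dictionary lookup and a language model could be combined for this kind of normalization. Everything in it is an assumption made for illustration: the toy vocabulary, the abbreviation table, the four-sentence corpus, the add-one smoothed bigram model, and the function names `candidates` and `normalize` are hypothetical and are not taken from the paper. The sketch also omits the Vietnamese vocabulary structures that the method uses.

```python
import math
import unicodedata
from collections import Counter


def nfc(text):
    """Normalize to NFC so each accented Vietnamese letter is one code point."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy resources (assumptions, not the paper's dictionaries or corpus).
VOCAB = {nfc(w) for w in ["tôi", "không", "biết", "được", "đi", "học", "rồi"]}
ABBREVIATIONS = {"ko": nfc("không"), "k": nfc("không"), "dc": nfc("được")}
CORPUS = [
    [nfc(w) for w in sentence]
    for sentence in [
        ["tôi", "không", "biết"],
        ["tôi", "không", "đi", "học"],
        ["tôi", "đi", "học", "rồi"],
        ["tôi", "được", "đi", "học"],
    ]
]

# Bigram counts with a sentence-start marker.
unigrams = Counter(w for s in CORPUS for w in ["<s>"] + s)
bigrams = Counter(pair for s in CORPUS for pair in zip(["<s>"] + s, s))
LETTERS = set("".join(VOCAB))


def bigram_logprob(prev, word):
    """Add-one smoothed log P(word | prev)."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + len(VOCAB) + 1))


def edits1(word):
    """Strings one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | replaces | inserts


def candidates(token):
    """Detect a non-standard token and propose in-vocabulary replacements."""
    if token in VOCAB:
        return [token]  # standard word: keep as-is
    found = set()
    if token in ABBREVIATIONS:
        found.add(ABBREVIATIONS[token])
    found |= edits1(token) & VOCAB  # likely spelling errors
    return sorted(found) or [token]  # unknown token: leave untouched


def normalize(tokens):
    """Pick, left to right, the candidate the bigram model scores highest."""
    out, prev = [], "<s>"
    for tok in tokens:
        best = max(candidates(nfc(tok.lower())),
                   key=lambda c: bigram_logprob(prev, c))
        out.append(best)
        prev = best
    return out


print(" ".join(normalize(["tôi", "ko", "đi", "hoc"])))
```

In this sketch the dictionary decides whether a token is non-standard and supplies candidates, while the language model only breaks ties between candidates. A realistic system would need a much larger lexicon and corpus, and a tokenizer suited to tweets.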
CITATION STYLE
Nguyen, V. H., Nguyen, H. T., & Snasel, V. (2015). Normalization of Vietnamese tweets on Twitter. In Advances in Intelligent Systems and Computing (Vol. 370, pp. 179–189). Springer Verlag. https://doi.org/10.1007/978-3-319-21206-7_16