Abstract
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually-defined rules. Instead, we propose a fully-automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
Cite
Çolakoğlu, T., Sulubacak, U., & Tantuğ, A. C. (2019). Normalizing non-canonical Turkish texts using machine translation approaches. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 267–272). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2037