Normalizing Non-canonical Turkish texts using machine translation approaches

12 citations; 87 Mendeley readers (users who have this article in their library)

Abstract

With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
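Framing normalization as translation means treating each non-canonical sentence as a source sentence and its canonical form as the target, so a model can use whole-sentence context instead of correcting isolated tokens. The following is a minimal sketch of how such parallel training pairs might be prepared. The character-level representation, the `_` word-boundary symbol, and the helper names `to_char_tokens`, `from_char_tokens`, and `make_parallel_pairs` are hypothetical choices for illustration, not the authors' preprocessing.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical word-boundary symbol (an assumption, not taken from the paper).
BOUNDARY = "_"


def to_char_tokens(sentence: str, boundary: str = BOUNDARY) -> str:
    """Split a sentence into space-separated characters, marking word boundaries.

    NFC normalization keeps letters such as 'ü' or 'ç' as single tokens
    instead of a base letter plus a combining mark.
    """
    words = unicodedata.normalize("NFC", sentence).split()
    return f" {boundary} ".join(" ".join(word) for word in words)


def from_char_tokens(tokens: str, boundary: str = BOUNDARY) -> str:
    """Invert to_char_tokens: rebuild words from a character-token string."""
    words = tokens.split(f" {boundary} ")
    return " ".join("".join(word.split(" ")) for word in words)


def make_parallel_pairs(noisy: List[str], canonical: List[str]) -> List[Tuple[str, str]]:
    """Pair each non-canonical (source) sentence with its canonical (target) form."""
    if len(noisy) != len(canonical):
        raise ValueError("source and target corpora must be aligned line by line")
    return [(to_char_tokens(src), to_char_tokens(tgt)) for src, tgt in zip(noisy, canonical)]


if __name__ == "__main__":
    # Hypothetical example: missing Turkish diacritics restored in the target.
    for src, tgt in make_parallel_pairs(["bugun cok guzel"], ["bugün çok güzel"]):
        print(src, "=>", tgt)
```

The resulting source/target strings could then be handed to any sequence-to-sequence or statistical MT toolkit. Working at the character level is one plausible way to let a translation model handle unseen misspellings, but it is an illustrative assumption here.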

Citation (APA)

Çolakoğlu, T., Sulubacak, U., & Tantug, A. C. (2019). Normalizing Non-canonical Turkish texts using machine translation approaches. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 267–272). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2037
