A random forests text transliteration system for Greek digraphia

Abstract

Greeklish-to-Greek transcription is a challenging task, since it cannot be accomplished by directly mapping each Greek character to a corresponding symbol of the Latin alphabet. Greeklish users do not follow a standardized way of transliteration, and the resulting ambiguity makes transcribing Greeklish back to the Greek alphabet difficult. Although a plethora of deterministic approaches to the task exist, this paper presents a non-deterministic, vocabulary-free approach based on Random Forests, an ensemble classification methodology from data mining; it produces comparable or even better results and supports argot and other linguistic peculiarities. Using data from real users drawn from a range of sources such as blogs, forums, and email lists, as well as artificial data from a robust stochastic Greek-to-Greeklish transcriber, the proposed approach achieves satisfactory results in the range of 91.5%-98.5%, which is comparable to an alternative commercial approach. © 2011 IFIP International Federation for Information Processing.
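
The ambiguity described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the CANDIDATES table and the candidate_spellings function are hypothetical, the table is a small and incomplete set of commonly seen Greeklish conventions, and accents, digraphs such as "th" or "ps", and the final sigma are ignored. It only shows that per-character substitution yields several Greek spellings for one Greeklish word, which is the choice a classifier would have to resolve from context.

```python
from itertools import product

# Illustrative, incomplete table (an assumption, not taken from the paper):
# each Latin letter maps to the Greek letters it commonly stands for.
CANDIDATES = {
    "a": ["α"],
    "e": ["ε"],
    "i": ["ι", "η", "υ"],
    "o": ["ο", "ω"],
    "k": ["κ"],
    "l": ["λ"],
    "m": ["μ"],
    "n": ["ν"],
    "p": ["π"],
    "r": ["ρ"],
    "s": ["σ"],
    "t": ["τ"],
    "x": ["χ", "ξ"],
    "h": ["η", "χ"],
}


def candidate_spellings(greeklish):
    """Return every Greek spelling reachable by per-character substitution."""
    # Characters missing from the table are passed through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in greeklish.lower()]
    return ["".join(combo) for combo in product(*options)]


spellings = candidate_spellings("kalimera")
print(len(spellings))           # 3 candidates, one per reading of "i"
print("καλημερα" in spellings)  # True: the intended word (unaccented) is among them
```

Even in this toy setting the number of candidates grows multiplicatively with each ambiguous letter, so a direct lookup cannot pick the right spelling on its own.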

Cite

Panteli, A., & Maragoudakis, M. (2011). A random forests text transliteration system for Greek digraphia. In IFIP Advances in Information and Communication Technology (Vol. 364 AICT, pp. 196–201). https://doi.org/10.1007/978-3-642-23960-1_24
