Neural Machine Translation for Sinhala-English Code-Mixed Text

Abstract

Code-mixing has become an increasingly common mode of communication among multilingual speakers, and much of the social media content in multilingual societies is written in code-mixed text. However, most current translation systems do not translate code-mixed text into a standard language, and most user-written code-mixed content on social media remains unprocessed because linguistic resources such as parallel corpora are unavailable. This paper proposes a Neural Machine Translation (NMT) model that translates Sinhala-English code-mixed (SECM) text into Sinhala. Sri Lankan social media sites contain SECM text more frequently than text in the standard languages, yet few resources exist for it, so a parallel corpus of SECM sentences and corresponding Sinhala sentences was created. The proposed model combines an Encoder-Decoder framework with LSTM units and the Teacher Forcing algorithm. The translated sentences are evaluated with the BLEU (Bilingual Evaluation Understudy) metric, and the model achieved a remarkable BLEU score.
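The abstract names two concrete techniques: teacher forcing for training the LSTM encoder-decoder, and BLEU for evaluating the output. The plain-Python sketch below illustrates both in isolation and is not the authors' code. It assumes details the abstract does not give: whitespace-tokenized sentences, hypothetical `<start>`/`<end>` marker tokens, and the standard corpus-level BLEU formulation (clipped n-gram precision up to 4-grams, one reference per sentence, brevity penalty, no smoothing). The LSTM itself is omitted; only the teacher-forcing input/target construction is shown.

```python
import math
from collections import Counter

# Hypothetical sequence markers; the paper's actual vocabulary is not specified.
START, END = "<start>", "<end>"


def teacher_forcing_pair(target_tokens):
    """Build (decoder_input, decoder_target) for one target sentence.

    With teacher forcing, the decoder at step t is fed the ground-truth
    token from step t-1 instead of its own previous prediction, so the
    decoder input is the target sequence shifted right by one position.
    """
    decoder_input = [START] + target_tokens
    decoder_target = target_tokens + [END]
    return decoder_input, decoder_target


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, `teacher_forcing_pair(["a", "b"])` returns `(["<start>", "a", "b"], ["a", "b", "<end>"])`, and a hypothesis identical to its reference (at least four tokens long) scores 1.0. Published BLEU numbers usually come from established tools such as NLTK or sacreBLEU, whose tokenization and smoothing choices can shift scores, so this sketch is for understanding the metric rather than reproducing reported results.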

Citation (APA)

Kugathasan, A., & Sumathipala, S. (2021). Neural Machine Translation for Sinhala-English Code-Mixed Text. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 718–726). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_082