Code-mixing has become a moving method of communication among multilingual speakers. Most of the social media content of the multilingual societies are written in code-mixed text. However, most of the current translation systems neglect to convert code-mixed texts to a standard language. Most of the user written code-mixed content in social media remains unprocessed due to the unavailability of linguistic resource such as parallel corpus. This paper proposes a Neural Machine Translation(NMT) model to translate the Sinhala-English code-mixed text to the Sinhala language. Due to the limited resources available for Sinhala-English code-mixed(SECM) text, a parallel corpus is created with SECM sentences and Sinhala sentences. Srilankan social media sites contain SECM texts more frequently than the standard languages. The model proposed for code-mixed text translation in this study is a combination of Encoder-Decoder framework with LSTM units and Teachers Forcing Algorithm. The translated sentences from the model are evaluated using BLEU(Bilingual Evaluation Understudy) metric. Our model achieved a remarkable BLEU score for the translation.
CITATION STYLE
Kugathasan, A., & Sumathipala, S. (2021). Neural Machine Translation for Sinhala-English Code-Mixed Text. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 718–726). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_082
Mendeley helps you to discover research relevant for your work.