KanSan: Kannada-Sanskrit Parallel Corpus Construction for Machine Translation

Abstract

Machine Translation (MT) is the automatic conversion of text from a source language into a target language while preserving the meaning of the source text. Large parallel corpora are essential resources for building any MT model. However, most languages are under-resourced because they lack computational tools and digital resources, parallel corpora for MT in particular. Further, translating under-resourced languages with complex morphological structures is more challenging. In view of these factors, this paper describes practical approaches to developing MT systems for the Kannada-Sanskrit language pair from scratch. This work comprises the construction of KanSan, a parallel corpus for the Kannada-Sanskrit language pair, and the implementation of MT baselines for translating Kannada text to Sanskrit text and vice versa. The baseline models are Recurrent Neural Network (RNN), Bidirectional Recurrent Neural Network (BiRNN), transformer-based Neural Machine Translation (NMT) with and without subword tokenization, and Statistical Machine Translation (SMT). The performance of the MT models is measured by Bilingual Evaluation Understudy (BLEU) score. Among all the models, the transformer-based model with subword tokenization performed best, with BLEU scores of 9.84 and 12.63 for Kannada-to-Sanskrit and Sanskrit-to-Kannada MT, respectively.
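
For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on a 0-100 scale can be computed. It assumes pre-tokenized text, a single reference per hypothesis, n-gram orders up to 4, and no smoothing. The abstract does not say which BLEU implementation or tokenization the authors used, so the function names `ngrams` and `corpus_bleu` are illustrative only, and the scores this sketch produces are not comparable to the reported ones.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU, one reference per hypothesis, scaled to 0-100.

    Illustrative sketch only; not the implementation used in the paper.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Placeholder tokens stand in for real Kannada or Sanskrit words.
hyps = [["a", "b", "c", "d"]]
refs = [["a", "b", "c", "d", "e"]]
score = corpus_bleu(hyps, refs)
```

In this toy case every n-gram of the hypothesis appears in the reference, so the score is lowered only by the brevity penalty. Subword tokenization changes what counts as a token, which is one reason scores computed under different tokenizations should not be compared directly.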

Cite

Hegde, A., & Shashirekha, H. L. (2023). KanSan: Kannada-Sanskrit Parallel Corpus Construction for Machine Translation. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 3–18). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_1
