English to Urdu Statistical Machine Translation: Establishing a Baseline

Abstract

The aim of this paper is to categorize and present the resources available for English-to-Urdu machine translation (MT) and to establish an empirical baseline for this task. By doing so, we hope to set up a common ground for MT research with Urdu and allow for congruent progress in this field. We build baseline phrase-based MT (PBMT) and hierarchical MT systems and report the results on three official independent test sets. On all test sets, hierarchical MT significantly outperformed PBMT. The highest single-reference BLEU score, 21.58%, is achieved by the hierarchical system, but this figure depends on the randomly selected test set. Our manual evaluation of 175 sentences suggests that the hierarchical MT output is ranked better than the PBMT output in 45% of sentences, compared to 21% of sentences where PBMT wins, with the rest being equal.

Cite

Jawaid, B., Kamran, A., & Bojar, O. (2014). English to Urdu Statistical Machine Translation: Establishing a Baseline. In Proceedings of the Conference - 5th Workshop on South and Southeast Asian NLP, WSSANLP 2014 - co-located with the 25th International Conference on Computational Linguistics, COLING 2014 (pp. 37–42). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5505
