Improving Egyptian-to-English SMT by mapping Egyptian into MSA

Abstract

One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for Modern Standard Arabic (MSA) and English is abundant, it is sparse for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English compared with translating MSA to English. One reason for this drop is the high out-of-vocabulary (OOV) rate, whereas another is the dialectal difference between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for unknown Egyptian words through methods such as spelling correction and context-based synonym suggestion. Second, we apply a convolution model that uses English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction shows an improvement of 1.7 BLEU points over the baseline system that translates unedited Egyptian into English. © 2014 Springer-Verlag Berlin Heidelberg.
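
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not specify. It assumes that spelling correction can be approximated by picking in-vocabulary MSA words within a small edit distance of the unknown word, and that the pivot step can be read as composing two lexical translation tables through English, p(msa | egy) = Σ_en p(en | egy) · p(msa | en). The function names, the transliterated toy words, and all probabilities are hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_candidates(oov_word, msa_vocab, max_distance=1):
    """Return MSA vocabulary words within max_distance edits, closest first."""
    scored = [(edit_distance(oov_word, w), w) for w in msa_vocab]
    return [w for d, w in sorted(scored) if d <= max_distance]


def pivot_table(p_en_given_egy, p_msa_given_en):
    """Compose two lexical tables through English (assumed reading of the
    pivot step): p(msa | egy) = sum over en of p(en | egy) * p(msa | en)."""
    result = {}
    for egy, en_dist in p_en_given_egy.items():
        scores = defaultdict(float)
        for en, p_en in en_dist.items():
            for msa, p_msa in p_msa_given_en.get(en, {}).items():
                scores[msa] += p_en * p_msa
        result[egy] = dict(scores)
    return result


# Hypothetical toy data in a Latin transliteration.
msa_vocab = {"ktAb", "ktb", "qlm"}
candidates = spelling_candidates("ktAAb", msa_vocab, max_distance=1)

egy_to_en = {"AzAy": {"how": 1.0}}
en_to_msa = {"how": {"kyf": 0.9, "km": 0.1}}
egy_to_msa = pivot_table(egy_to_en, en_to_msa)
best_msa = max(egy_to_msa["AzAy"], key=egy_to_msa["AzAy"].get)
```

In this toy setup the elongated spelling `ktAAb` is one edit away from `ktAb`, so that word is the only candidate, and the pivot table maps `AzAy` to `kyf` as its most probable MSA counterpart. A real system would score candidates with a language model or in context rather than by edit distance alone.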

Cite

Durrani, N., Al-Onaizan, Y., & Ittycheriah, A. (2014). Improving Egyptian-to-English SMT by mapping Egyptian into MSA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8404 LNCS, pp. 271–282). Springer Verlag. https://doi.org/10.1007/978-3-642-54903-8_23
