A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation

  • van der Wees M
  • Bisazza A
  • Monz C

Abstract

A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and is therefore highly ambiguous, which prevents rule-based approaches from achieving good translation results. In this paper, we improve Arabizi-to-English machine translation by presenting a simple but effective Arabizi-to-Arabic transliteration pipeline that does not require knowledge from experts or native Arabic speakers. We incorporate this pipeline into a phrase-based SMT system and show that translation quality after automatically transliterating Arabizi to Arabic is comparable to that achieved after human transliteration.

Cite

van der Wees, M., Bisazza, A., & Monz, C. (2016). A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation. Proceedings of the 2nd Workshop on Noisy User-Generated Text (WNUT), 43–50. Retrieved from https://aclanthology.org/W16-3908.pdf
