In this paper, we implement a multi-lingual Statistical Machine Translation (SMT) system for Arabic-English translation. Arabic text can be categorized into standard Arabic (Modern Standard Arabic, MSA) and dialectal Arabic, and these two forms differ significantly. We compare different mono-lingual and multi-lingual hybrid SMT approaches. Mono-lingual systems consistently yield better translation accuracy in one Arabic form and poor accuracy in the other. Multi-lingual SMT models trained on pooled parallel MSA/dialectal data yield better accuracy. However, since the available parallel MSA data are much larger than the dialectal data, these multi-lingual models are biased toward MSA. In this work, we propose a multi-lingual combination of different mono-lingual systems using an Arabic form classifier. The outcome of the classifier directs the system to use the appropriate mono-lingual models (standard, dialectal, or mixture). Testing the different SMT systems shows that the proposed classifier-based SMT system outperforms mono-lingual and data-pooled multi-lingual systems.
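The classifier-based routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the marker-word classifier, the thresholds, and the names `classify_arabic_form` and `route_translation` are all hypothetical, and the translation systems are passed in as plain callables standing in for trained phrase-based SMT models.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]

# Hypothetical dialect marker words, for illustration only.
# A real Arabic form classifier would be trained on labeled data.
DIALECT_MARKERS = {"مش", "ايه", "عايز", "ليه"}


def classify_arabic_form(sentence: str, low: float = 0.2, high: float = 0.6) -> str:
    """Toy classifier: label a sentence by its share of dialect marker tokens.

    The thresholds `low` and `high` are arbitrary assumptions.
    """
    tokens = sentence.split()
    if not tokens:
        return "standard"
    ratio = sum(token in DIALECT_MARKERS for token in tokens) / len(tokens)
    if ratio >= high:
        return "dialectal"
    if ratio > low:
        return "mixture"
    return "standard"


def route_translation(sentence: str, systems: Dict[str, Translator]) -> str:
    """Send the sentence to the system that matches its predicted Arabic form."""
    label = classify_arabic_form(sentence)
    return systems[label](sentence)
```

In this sketch, `systems` would map the three labels (`"standard"`, `"dialectal"`, `"mixture"`) to the corresponding translation models, so that each input sentence is decoded by only one of them.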
CITATION STYLE
Bastawisy, A., & Elmahdy, M. (2017). Multi-lingual phrase-based statistical machine translation for Arabic-English. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2017-September, pp. 86–89). Incoma Ltd. https://doi.org/10.26615/978-954-452-049-6_013