Abstract
Arabic has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the vocabulary sizes of the two languages in any given parallel training corpus. Segmenting inflected Arabic words is one way to smooth out Arabic's rich morphology. In this paper, we describe several statistically and linguistically motivated methods for Arabic word segmentation. We then show the efficiency of the proposed methods on the Arabic-English BTEC and NIST tasks.
Cite
El Isbihani, A., Khadivi, S., Bender, O., & Ney, H. (2006). Morpho-syntactic Arabic preprocessing for Arabic-to-English statistical machine translation. In HLT-NAACL 2006 - Statistical Machine Translation, Proceedings of the Workshop (pp. 15–22). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1654650.1654654