Arabic Diacritization: Stats, Rules, and Hacks


Abstract

In this paper, we present a new, fast, state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings. We employ a word-level Viterbi decoder with back-off to stems, morphological patterns, and transliteration- and sequence-labeling-based diacritization of named entities. For case endings, we use Support Vector Machine (SVM) based ranking coupled with morphological patterns and linguistic rules. We achieve low word-level diacritization error rates of 3.29% without case endings and 12.77% with case endings on a new multi-genre, copyright-free test set. We are making the diacritizer freely available for research purposes.
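
To illustrate what word-level Viterbi decoding with back-off can look like, here is a minimal Python sketch. It is not the authors' implementation: the function names, the bigram scoring, the single back-off hook, and the toy Latin-transliterated data are all assumptions made for illustration, and the abstract does not specify the language model or the back-off order in this detail.

```python
from typing import Callable, Dict, List

def viterbi_diacritize(
    words: List[str],
    candidates: Dict[str, List[str]],
    bigram_logprob: Callable[[str, str], float],
    backoff: Callable[[str], List[str]],
) -> List[str]:
    """Pick one diacritized form per word, maximizing the summed bigram score."""
    if not words:
        return []
    # Lattice: known diacritized forms, else back-off guesses, else the bare word.
    lattice = [candidates.get(w) or backoff(w) or [w] for w in words]
    # best[i][form] = (score of the best path ending in form, previous form)
    best = [{c: (bigram_logprob("<s>", c), None) for c in lattice[0]}]
    for i in range(1, len(lattice)):
        column = {}
        for c in lattice[i]:
            prev, score = max(
                ((p, s + bigram_logprob(p, c)) for p, (s, _) in best[i - 1].items()),
                key=lambda t: t[1],
            )
            column[c] = (score, prev)
        best.append(column)
    # Backtrace from the highest-scoring final form.
    last = max(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(best) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy data (invented scores, Latin transliteration for readability).
TOY_CANDIDATES = {"ktb": ["kataba", "kutub"], "Alwld": ["Alwaladu"]}
TOY_BIGRAMS = {
    ("<s>", "kutub"): -1.0,
    ("<s>", "kataba"): -1.5,
    ("kataba", "Alwaladu"): -0.5,
    ("kutub", "Alwaladu"): -3.0,
}

def toy_logprob(prev: str, cur: str) -> float:
    # Unseen bigrams get a flat penalty; a real model would smooth properly.
    return TOY_BIGRAMS.get((prev, cur), -10.0)

def toy_backoff(word: str) -> List[str]:
    # Placeholder for stem, pattern, or named-entity back-off; returns the bare word.
    return [word]

if __name__ == "__main__":
    print(viterbi_diacritize(["ktb", "Alwld"], TOY_CANDIDATES, toy_logprob, toy_backoff))
    # prints ['kataba', 'Alwaladu']
```

In this toy example the locally best first form is "kutub", but the decoder selects "kataba" because the full sequence scores higher, which is the point of decoding over the whole sentence rather than word by word.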

Cite

Darwish, K., Mubarak, H., & Abdelali, A. (2017). Arabic Diacritization: Stats, Rules, and Hacks. In WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop (pp. 9–17). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-1302
