A hybrid approach for building Arabic diacritizer

Khaled Shaalan; Hitham M. Abo Bakr; Ibrahim Ziedan

Conference Proceedings

Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009 (2009) 27-35

DOI: 10.3115/1621774.1621780


Abstract

Modern Standard Arabic is usually written without diacritics, which makes Arabic text processing difficult. Diacritization helps clarify the meaning of words and disambiguates vague spellings or pronunciations, since some Arabic words are spelled the same but differ in meaning. In this paper, we address the problem of adding diacritics to undiacritized Arabic text using a hybrid approach. The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training in order to detect diacritics. Case ending is treated as a separate post-processing task that uses syntactic information. The hybrid approach relies on lexicon retrieval, bigram, and SVM-statistical prioritized techniques. We present the results of an evaluation of the proposed diacritization approach and discuss various modifications for improving its performance.
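The idea of prioritized techniques named in the abstract can be illustrated with a small cascade in which each stage is tried in turn and the first one that returns an answer wins. This is only a sketch under stated assumptions, not the authors' implementation: the stage order (lexicon, then bigram, then classifier), the toy tables in rough Latin transliteration, the bigram key (previous undiacritized word, current word), and all function names are hypothetical. The SVM stage is replaced by a stub that always abstains, and case-ending post-processing is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy resources in rough Latin transliteration. A real system
# would derive these from an Arabic lexicon and a fully diacritized corpus.
LEXICON: Dict[str, str] = {"fy": "fiy", "mn": "min"}
BIGRAMS: Dict[Tuple[str, str], str] = {
    ("hw", "ktb"): "kataba",   # after "he", prefer the verb "wrote"
    ("qrA", "ktb"): "kutub",   # after "read", prefer the noun "books"
}

Stage = Callable[[Optional[str], str], Optional[str]]


def lexicon_lookup(prev: Optional[str], word: str) -> Optional[str]:
    """Return the stored diacritized form of a word, if the lexicon has one."""
    return LEXICON.get(word)


def bigram_lookup(prev: Optional[str], word: str) -> Optional[str]:
    """Use the previous undiacritized word as context (an assumption)."""
    if prev is None:
        return None
    return BIGRAMS.get((prev, word))


def classifier_stub(prev: Optional[str], word: str) -> Optional[str]:
    """Stand-in for a trained statistical stage; it always abstains here."""
    return None


# Assumed priority order: earlier stages take precedence over later ones.
STAGES: List[Tuple[str, Stage]] = [
    ("lexicon", lexicon_lookup),
    ("bigram", bigram_lookup),
    ("classifier", classifier_stub),
]


def diacritize(words: List[str]) -> List[Tuple[str, str]]:
    """Return (output form, name of the stage that produced it) per word."""
    output: List[Tuple[str, str]] = []
    prev: Optional[str] = None
    for word in words:
        result, source = word, "unresolved"  # fall back to the input form
        for name, stage in STAGES:
            candidate = stage(prev, word)
            if candidate is not None:
                result, source = candidate, name
                break
        output.append((result, source))
        prev = word
    return output


if __name__ == "__main__":
    for form, source in diacritize(["hw", "ktb", "fy", "xyz"]):
        print(form, source)
```

Recording which stage produced each form makes it easy to see how often the cascade falls through to later stages, which is the kind of breakdown an evaluation of a hybrid system would likely need.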

Shaalan, K., Bakr, H. M. A., & Ziedan, I. (2009). A hybrid approach for building Arabic diacritizer. In Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009 (pp. 27–35). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1621774.1621780
