Named Entity Recognition in Arabic is a challenging topic because of morphological and lexical richness of Arabic. In this paper, we propose an Arabic NER system that is based on word embedding. Word embedding hold semantic information about the context of the words. We hypothesized that the integration of word embedding features to the conventional lexical and contextual features could improve Arabic NER performance. The Conditional Random Field (CRF) sequence classifier was used. Since most CRF implementations only support categorical features, continuous word embedding vectors are clustered. In this paper, we are mainly investigating the effect of the number of clusters on NER performance. Moreover, the combination of fine and coarse grained clusters has resulted in further recognition improvement.
CITATION STYLE
Sabty, C., Elmahdy, M., & Abdennadher, S. (2023). Arabic Named Entity Recognition Using Clustered Word Embedding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13396 LNCS, pp. 41–49). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-23793-5_4
Mendeley helps you to discover research relevant for your work.