ANERsys: An arabic named entity recognition system based on maximum entropy

148Citations
Citations of this article
88Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language. © Springer-Verlag Berlin Heidelberg 2007.

Cite

CITATION STYLE

APA

Benajiba, Y., Rosso, P., & Ruiz, J. M. B. (2007). ANERsys: An arabic named entity recognition system based on maximum entropy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4394 LNCS, pp. 143–153). Springer Verlag. https://doi.org/10.1007/978-3-540-70939-8_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free