Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia

Maha Althobaiti; Udo Kruschwitz; Massimo Poesio

Conference Proceedings, Open Access

EACL 2014 - 14th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (2014) 106-115

DOI: 10.3115/v1/e14-3012

Abstract

In this paper we propose a new methodology to exploit Wikipedia features and structure to automatically develop an Arabic NE annotated corpus. Each Wikipedia link is transformed into an NE type of the target article in order to produce the NE annotation. Other Wikipedia features - namely redirects, anchor texts, and inter-language links - are used to tag additional NEs, which appear without links in Wikipedia texts. Furthermore, we have developed a filtering algorithm to eliminate ambiguity when tagging candidate NEs. Herein we also introduce a mechanism based on the high coverage of Wikipedia in order to address two challenges particular to tagging NEs in Arabic text: rich morphology and the absence of capitalisation. The corpus created with our new method (WDC) has been used to train an NE tagger which has been tested on different domains. Judging by the results, an NE tagger trained on WDC can compete with those trained on manually annotated corpora.
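The link-to-annotation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ARTICLE_TYPES` and `REDIRECTS` tables, the function names, and the English example strings are all hypothetical, and the paper's filtering algorithm and its handling of Arabic morphology are not reproduced here. The sketch only shows how a link can inherit the NE type of its target article, and how known titles and redirects can then be used to tag unlinked mentions.

```python
import re

# Hypothetical article -> NE type table. In practice this would come from
# classifying Wikipedia articles, which is outside the scope of this sketch.
ARTICLE_TYPES = {"London": "LOC", "Alan Turing": "PER"}

# Hypothetical redirect table: alternative title -> canonical article title.
REDIRECTS = {"Turing": "Alan Turing"}

# Matches wiki links of the form [[Target]] or [[Target|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def annotate(wikitext):
    """Strip link markup and return (plain_text, spans).

    Each span is (start, end, ne_type), where ne_type is the type of the
    link's target article. Links to untyped articles produce no span.
    """
    parts, spans, pos, cursor = [], [], 0, 0
    for m in LINK.finditer(wikitext):
        before = wikitext[cursor:m.start()]
        parts.append(before)
        pos += len(before)
        target = m.group(1)
        anchor = m.group(2) or target
        ne_type = ARTICLE_TYPES.get(target)
        if ne_type:
            spans.append((pos, pos + len(anchor), ne_type))
        parts.append(anchor)
        pos += len(anchor)
        cursor = m.end()
    parts.append(wikitext[cursor:])
    return "".join(parts), spans


def tag_unlinked(text, spans):
    """Add spans for unlinked mentions that match a known title or redirect."""
    names = dict(ARTICLE_TYPES)
    for alt, target in REDIRECTS.items():
        if target in ARTICLE_TYPES:
            names[alt] = ARTICLE_TYPES[target]
    tagged = list(spans)
    # Try longer names first so a full name wins over a shorter alias.
    for name in sorted(names, key=len, reverse=True):
        for m in re.finditer(re.escape(name), text):
            overlaps = any(m.start() < e and m.end() > s for s, e, _ in tagged)
            if not overlaps:
                tagged.append((m.start(), m.end(), names[name]))
    return sorted(tagged)


if __name__ == "__main__":
    source = "[[Alan Turing]] was born in [[London]]. Turing studied in London."
    plain, link_spans = annotate(source)
    for start, end, ne_type in tag_unlinked(plain, link_spans):
        print(plain[start:end], ne_type)
```

Plain substring matching is used here only to keep the example short. It ignores word boundaries and ambiguity between entities that share a name, which is the kind of problem the filtering step mentioned in the abstract is meant to address.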

Cite

Althobaiti, M., Kruschwitz, U., & Poesio, M. (2014). Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia. In EACL 2014 - 14th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 106–115). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/e14-3012
