Analysing Wikipedia and gold-standard corpora for NER training

Abstract

Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold standard corpora on cross-corpus evaluation by up to 11%. © 2009 Association for Computational Linguistics.

Cite

Nothman, J., Murphy, T., & Curran, J. R. (2009). Analysing Wikipedia and gold-standard corpora for NER training. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 612–620). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609135
