Abstract
Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named-entity-annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making the corpora more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold-standard corpora on cross-corpus evaluation by up to 11%. © 2009 Association for Computational Linguistics.
Cite
Nothman, J., Murphy, T., & Curran, J. R. (2009). Analysing Wikipedia and gold-standard corpora for NER training. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 612–620). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609135