Analysing Wikipedia and gold-standard corpora for NER training

Joel Nothman; Tara Murphy; James R. Curran

Conference Proceedings

EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (2009) 612-620

DOI: 10.3115/1609067.1609135

Abstract

Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named-entity-annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making the corpora more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold-standard corpora on cross-corpus evaluation by up to 11%. © 2009 Association for Computational Linguistics.

Cite

Nothman, J., Murphy, T., & Curran, J. R. (2009). Analysing Wikipedia and gold-standard corpora for NER training. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 612–620). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609135
