Tag confidence measure for semi-automatically updating named entity recognition

Kuniko Saito; Kenji Imamura

Conference Proceedings, Open Access

NEWS 2009 - 2009 Named Entities Workshop: Shared Task on Transliteration at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 (2009) 168-176

DOI: 10.5715/jnlp.17.4_3

Abstract

We present two techniques that reduce the cost of machine learning, i.e., the cost of manually annotating unlabeled data, when adapting existing CRF-based named entity recognition (NER) systems to new texts or domains. We introduce the tag posterior probability as a confidence measure for each individual NE tag assigned by the base model. Dubious tags are automatically detected as likely recognition errors and treated as targets for manual correction. Compared with the posterior probability of the entire sentence, the tag posterior probability has the advantage of minimizing system cost, because it focuses only on the parts of the sentence that require manual correction. Using this tag confidence measure, the first technique, active learning, asks the editor to assign correct NE tags only to the parts of a sentence to which the base model could not confidently assign tags. Active learning reduces the learning cost by 66% compared with the conventional method. As the second technique, we propose bootstrapping NER, which semi-automatically corrects dubious tags and updates its model.
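To make the difference between tag-level and sentence-level confidence concrete, here is a minimal sketch, assuming a linear-chain CRF whose scores are given as per-token emission log-potentials plus a tag-transition matrix (no start/stop scores). It computes each tag's posterior probability as the forward-backward marginal of the Viterbi tag, and flags tags whose posterior falls below a threshold. The function names (`forward_backward`, `viterbi`, `dubious_tags`) and the 0.9 threshold are hypothetical illustrations, not taken from the paper, and the authors' exact formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(emit: Matrix, trans: Matrix) -> Tuple[float, Matrix]:
    """Return (log Z, marginals) for a linear-chain CRF.

    emit[t][k]  : log-potential of tag k at token t (assumed input format)
    trans[i][j] : log-potential of tag i being followed by tag j
    marginals[t][k] = P(y_t = k | x), i.e. the tag posterior probability.
    """
    T, K = len(emit), len(emit[0])
    alpha = [list(emit[0])] + [[0.0] * K for _ in range(T - 1)]
    beta = [[0.0] * K for _ in range(T)]  # beta[T-1][k] = log 1 = 0
    for t in range(1, T):
        for j in range(K):
            alpha[t][j] = emit[t][j] + logsumexp(
                [alpha[t - 1][i] + trans[i][j] for i in range(K)])
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = logsumexp(
                [trans[i][j] + emit[t + 1][j] + beta[t + 1][j] for j in range(K)])
    log_z = logsumexp(alpha[T - 1])
    marginals = [[math.exp(alpha[t][k] + beta[t][k] - log_z) for k in range(K)]
                 for t in range(T)]
    return log_z, marginals


def viterbi(emit: Matrix, trans: Matrix) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its unnormalized log-score."""
    T, K = len(emit), len(emit[0])
    score = list(emit[0])
    back: List[List[int]] = []
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emit[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    last = max(range(K), key=lambda k: score[k])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the last token
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def dubious_tags(emit: Matrix, trans: Matrix, threshold: float = 0.9):
    """Decode, then flag tokens whose tag posterior is below `threshold`.

    `threshold` is a hypothetical value chosen for illustration.
    """
    log_z, marg = forward_backward(emit, trans)
    path, best = viterbi(emit, trans)
    tag_conf = [marg[t][k] for t, k in enumerate(path)]  # per-tag confidence
    sent_conf = math.exp(best - log_z)  # posterior of the whole Viterbi path
    flagged = [t for t, c in enumerate(tag_conf) if c < threshold]
    return path, tag_conf, sent_conf, flagged


if __name__ == "__main__":
    # Toy example: tag 0 = O, tag 1 = NE; only token 1 is ambiguous.
    emit = [[3.0, 0.0], [0.2, 0.0], [3.0, 0.0]]
    trans = [[0.0, 0.0], [0.0, 0.0]]
    path, tag_conf, sent_conf, flagged = dubious_tags(emit, trans)
    print(path)                             # [0, 0, 0]
    print([round(c, 2) for c in tag_conf])  # [0.95, 0.55, 0.95]
    print(round(sent_conf, 2))              # 0.5
    print(flagged)                          # [1]
```

In the toy example, the sentence posterior of about 0.5 would send the whole sentence to the editor, whereas the tag posteriors single out only token 1 for manual correction. This is the cost saving the abstract attributes to tag-level confidence.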

Cite (APA)

Saito, K., & Imamura, K. (2009). Tag confidence measure for semi-automatically updating named entity recognition. In NEWS 2009 - 2009 Named Entities Workshop: Shared Task on Transliteration at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 (pp. 168–176). Association for Computational Linguistics (ACL). https://doi.org/10.5715/jnlp.17.4_3
