Rethinking the Value of Gazetteer in Chinese Named Entity Recognition

Qianglong Chen; Xiangji Zeng; Jiangang Zhu; Yin Zhang; Bojia Lin; Yang Yang; Daxin Jiang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13551 LNAI 285-297

DOI: 10.1007/978-3-031-17120-8_23

Abstract

Gazetteers are widely used in Chinese named entity recognition (NER) to improve span boundary detection and type classification. However, the NLP community still lacks a systematic analysis of gazetteer-enhanced NER models, which limits our understanding of how well gazetteers generalize and how effective they are. In this paper, we first re-examine the effectiveness of several common practices in gazetteer-enhanced NER models, then carry out a series of detailed analyses of the relationship between model performance and gazetteer characteristics, which can guide the construction of more suitable gazetteers. Our findings are as follows: (1) the gazetteer has a positive impact on the NER model in most situations; (2) NER performance benefits greatly from high-quality pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities that can be matched in both the training set and the test set.
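
To make finding (3) concrete, here is a minimal sketch of one way to estimate how much of a gazetteer is matched in both the training and test text. It is not the paper's metric or code: the function names `matched_entries` and `shared_match_ratio`, the plain substring matching, and the toy data are all assumptions made for illustration.

```python
def matched_entries(gazetteer, sentences):
    """Return the gazetteer entries that occur as a substring of any sentence."""
    return {entry for entry in gazetteer if any(entry in s for s in sentences)}


def shared_match_ratio(gazetteer, train_sentences, test_sentences):
    """Fraction of gazetteer entries matched in both the training and test text."""
    if not gazetteer:
        return 0.0
    in_train = matched_entries(gazetteer, train_sentences)
    in_test = matched_entries(gazetteer, test_sentences)
    return len(in_train & in_test) / len(gazetteer)


# Toy data, invented for illustration only (hypothetical, not from the paper).
gazetteer = {"北京", "上海", "清华大学", "长江"}
train = ["我在北京上学", "清华大学位于北京"]
test = ["他去了北京", "上海很大"]

ratio = shared_match_ratio(gazetteer, train, test)
```

Substring matching is the simplest choice for unsegmented Chinese text, but it ignores entity types and overlapping matches, so a real analysis would likely need a stricter matching scheme.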

Citation (APA)

Chen, Q., Zeng, X., Zhu, J., Zhang, Y., Lin, B., Yang, Y., & Jiang, D. (2022). Rethinking the Value of Gazetteer in Chinese Named Entity Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13551 LNAI, pp. 285–297). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17120-8_23
