A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition

3 citations · 8 Mendeley readers

Abstract

Distant supervision reduces the reliance on human annotation in named entity recognition tasks. Class-level imbalance in distant annotation is a realistic yet unexplored problem, and the popular self-training method cannot handle class-imbalanced learning. More importantly, self-training is dominated by the high-performance classes when selecting candidates, and the bias in the generated pseudo labels further degrades the low-performance classes. To address this class-level imbalance in performance, we propose a class-rebalancing self-training framework for improving distantly-supervised named entity recognition. In candidate selection, a class-wise flexible threshold is designed to fully explore classes beyond the high-performance ones. In label generation, a hybrid pseudo label that injects the distant label is adopted to provide direct semantic information for the low-performance classes. Experiments on five flat and two nested datasets show that our model achieves state-of-the-art results. We also conduct extensive analysis of the effectiveness of the flexible threshold and the hybrid pseudo label.
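The class-wise flexible threshold idea can be illustrated with a minimal sketch. This is not the paper's exact formulation (the abstract gives no formulas); the per-class scaling rule below, where each class's threshold is relaxed in proportion to its current mean prediction confidence, is an assumption chosen only to show how low-performance classes avoid being crowded out of candidate selection.

```python
from collections import defaultdict

def select_candidates(predictions, base_threshold=0.9):
    """Illustrative sketch of class-wise flexible thresholding for
    self-training candidate selection. `predictions` is a list of
    (token_id, label, confidence) triples; the scaling rule is a
    hypothetical stand-in, not the paper's actual method."""
    # Estimate per-class "performance" as the mean confidence of its predictions.
    conf_sum, conf_cnt = defaultdict(float), defaultdict(int)
    for _, label, conf in predictions:
        conf_sum[label] += conf
        conf_cnt[label] += 1
    mean_conf = {c: conf_sum[c] / conf_cnt[c] for c in conf_sum}

    # Class-wise flexible threshold: scale the base threshold by each class's
    # mean confidence relative to the strongest class, so weaker classes get
    # a lower bar instead of a single global cutoff.
    max_mean = max(mean_conf.values())
    thresholds = {c: base_threshold * (mean_conf[c] / max_mean) for c in mean_conf}

    # Keep predictions that clear their own class's threshold.
    return [(tid, label, conf) for tid, label, conf in predictions
            if conf >= thresholds[label]]
```

With a single fixed threshold of 0.9, only the high-confidence class would survive; the class-wise version also admits candidates from the weaker class.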

Citation (APA)

Li, Q., Xie, T., Peng, P., Wang, H., & Wang, G. (2023). A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 11054–11068). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.703
