Abstract
In this paper, we study the named entity recognition (NER) problem under distant supervision. Because external dictionaries and knowledge bases are incomplete, distantly annotated training data usually suffer from a high false negative rate. To address this, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token, indicating how likely it is to be an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled with various external knowledge sources demonstrate the superiority of the proposed Conf-MPU over existing DS-NER methods. Our code is available on GitHub.
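To make the false-negative problem concrete, the following minimal sketch labels tokens by exact lookup in a deliberately incomplete dictionary and measures how many true entity tokens end up labeled as non-entities. It is an illustration only, not the paper's method: the sentence, the gold labels, the dictionary, and the helper names `distant_labels` and `false_negative_rate` are all hypothetical.

```python
# Illustrative sketch (not the authors' code): distant labeling with an
# incomplete dictionary, and the false negative rate it induces.

def distant_labels(tokens, dictionary):
    """Label each token by exact dictionary lookup; unmatched tokens get 'O'."""
    return [dictionary.get(tok, "O") for tok in tokens]


def false_negative_rate(gold, distant):
    """Fraction of gold entity tokens that distant labeling marked as 'O'."""
    entity_idx = [i for i, g in enumerate(gold) if g != "O"]
    if not entity_idx:
        return 0.0
    missed = sum(1 for i in entity_idx if distant[i] == "O")
    return missed / len(entity_idx)


# Hypothetical toy data: three entity tokens, but the dictionary knows only one.
tokens = ["Alice", "visited", "Paris", "and", "Ljubljana"]
gold = ["PER", "O", "LOC", "O", "LOC"]
toy_dictionary = {"Paris": "LOC"}  # assumed, deliberately incomplete

distant = distant_labels(tokens, toy_dictionary)
fn_rate = false_negative_rate(gold, distant)
print(distant)
print(fn_rate)
```

Tokens such as "Alice" and "Ljubljana" are entities that the dictionary misses, so they appear in the training data as ordinary non-entity tokens. Treating the unmatched tokens as unlabeled rather than negative is the motivation for a positive-unlabeled formulation.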
Zhou, K., Li, Y., & Li, Q. (2022). Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 7198–7211). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.498