Deep learning models have been the state of the art for a variety of challenging tasks in natural language processing, but they often require large labeled datasets to achieve good results. Deep active learning algorithms were designed to reduce the annotation cost of training such models. Current deep active learning algorithms, however, aim to train a good deep learning model with as little labeled data as possible, and as such are not useful in scenarios where the full dataset must be labeled. To address this problem, this work investigates deep active-self learning algorithms, which use the trained model to self-label data and thereby reduce the cost of annotating full datasets for named entity recognition tasks. The experiments indicate that the proposed deep active-self learning algorithm can reduce the manual annotation cost of labeling the complete dataset for named entity recognition, with less than 2% of the self-labeled tokens being mislabeled. We also investigate an early stopping technique that does not rely on a validation set, which further reduces the annotation cost of the proposed active-self learning algorithm in real-world scenarios.
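To make the idea of combining active learning with self-labeling concrete, the following is a minimal sketch of how one round might split an unlabeled pool. It is not the authors' algorithm: the function name `split_pool`, the least-confidence query rule, the fixed confidence threshold, and the toy scores are all assumptions made for illustration, and the abstract does not specify how the paper selects samples or accepts self-labels.

```python
from typing import Dict, List, Sequence, Tuple


def split_pool(
    confidences: Sequence[float],
    predictions: Sequence[str],
    query_size: int,
    self_label_threshold: float,
) -> Tuple[List[int], Dict[int, str]]:
    """Partition an unlabeled pool for one hypothetical active-self round.

    Returns (indices to send to a human annotator, {index: self-assigned label}).
    Both the query rule and the threshold rule are illustrative assumptions.
    """
    if len(confidences) != len(predictions):
        raise ValueError("confidences and predictions must have the same length")

    # Rank pool items from least to most confident (ties keep original order).
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])

    # Active learning step: the least confident items go to the annotator.
    to_annotate = ranked[:query_size]

    # Self-labeling step: among the remaining items, accept the model's own
    # prediction only when its confidence reaches the threshold.
    queried = set(to_annotate)
    self_labeled = {
        i: predictions[i]
        for i in range(len(confidences))
        if i not in queried and confidences[i] >= self_label_threshold
    }
    return to_annotate, self_labeled


# Toy pool: made-up confidence scores and predicted entity tags.
confidences = [0.99, 0.40, 0.97, 0.65, 0.55]
predictions = ["PER", "O", "LOC", "ORG", "O"]
to_annotate, self_labeled = split_pool(confidences, predictions, 2, 0.95)
print(to_annotate)   # [1, 4]
print(self_labeled)  # {0: 'PER', 2: 'LOC'}
```

In this toy run, item 3 is neither queried nor self-labeled, so it would stay in the pool for a later round. Repeating such rounds until the pool is empty is one way the full dataset could end up labeled with only part of it annotated by hand.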
CITATION STYLE
Neto, J. R. C. S. A. V. S., & Faleiros, T. de P. (2021). Deep Active-Self Learning Applied to Named Entity Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13074 LNAI, pp. 405–418). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-91699-2_28