Deep neural networks (DNNs) require large amounts of manually labeled training data to achieve strong performance, but manual labeling is laborious and costly. In this study, we propose a method for automatically generating training data, and for using the generated data effectively, in order to reduce the labeling cost. This data, which we call 'machine-labeled data', is produced with a bagging-based bootstrapping approach. Using machine-labeled data alone does not guarantee high performance, however, because automatic labeling introduces errors. To reduce the impact of such mislabeling, we apply a transfer learning approach. We verify the effect of the proposed method with two DNN-based named entity recognition (NER) models, a bidirectional LSTM-CRF and a vanilla BERT model, on NER tasks in two languages, English and Korean. The proposed method yields average F1 scores of 78.87% (a 3.9 percentage-point improvement) with the bidirectional LSTM-CRF and 82.08% (a 1 percentage-point improvement) with BERT on three Korean NER datasets. In English, performance increases by an average of 0.45 percentage points with the two DNN-based models. The proposed NER systems outperform the baseline systems in both languages without the need for additional manual labeling.
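As a rough illustration of the pipeline, the sketch below makes two assumptions that go beyond this abstract: that the bagging step keeps only automatically labeled sentences on which all bootstrap-trained taggers agree, and that transfer learning takes the common form of pre-training on the machine-labeled data followed by fine-tuning on the gold data. The Tagger class is a hypothetical stand-in for the paper's BiLSTM-CRF and BERT models; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch: bagging-based bootstrapping to build machine-labeled
# data, followed by transfer learning on gold data. The Tagger below is
# a hypothetical stand-in for the paper's BiLSTM-CRF / BERT taggers.

import random
from collections import Counter, defaultdict

class Tagger:
    """Toy token-level tagger: remembers the most frequent tag per token.
    Calling train() again adds evidence on top of what was learned before,
    loosely mirroring how a DNN is fine-tuned from pre-trained weights."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences, tag_seqs):
        for sent, tags in zip(sentences, tag_seqs):
            for tok, tag in zip(sent, tags):
                self.counts[tok][tag] += 1

    def predict(self, sentence):
        return [self.counts[tok].most_common(1)[0][0] if self.counts[tok]
                else "O" for tok in sentence]

def build_machine_labeled_data(gold_sents, gold_tags, unlabeled_sents,
                               n_models=5, seed=0):
    """Train n_models taggers on bootstrap resamples of the gold data,
    auto-label the unlabeled corpus, and keep only sentences on which
    every tagger agrees (an assumed, conservative selection rule)."""
    rng = random.Random(seed)
    n = len(gold_sents)
    taggers = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        t = Tagger()
        t.train([gold_sents[i] for i in idx], [gold_tags[i] for i in idx])
        taggers.append(t)

    machine_sents, machine_tags = [], []
    for sent in unlabeled_sents:
        preds = [t.predict(sent) for t in taggers]
        if all(p == preds[0] for p in preds):       # unanimous agreement
            machine_sents.append(sent)
            machine_tags.append(preds[0])
    return machine_sents, machine_tags

def train_with_transfer(machine_data, gold_data):
    """Stage 1: pre-train on the noisy machine-labeled data.
    Stage 2: fine-tune on the smaller human-labeled set."""
    model = Tagger()
    model.train(*machine_data)
    model.train(*gold_data)
    return model

if __name__ == "__main__":
    gold_sents = [["John", "lives", "in", "Seoul"],
                  ["Mary", "visited", "Paris"]]
    gold_tags = [["B-PER", "O", "O", "B-LOC"],
                 ["B-PER", "O", "B-LOC"]]
    unlabeled = [["John", "visited", "Seoul"]]

    machine = build_machine_labeled_data(gold_sents, gold_tags, unlabeled)
    model = train_with_transfer(machine, (gold_sents, gold_tags))
    print(model.predict(["Mary", "lives", "in", "Paris"]))
```

With a real DNN, stage 2 would continue gradient training from the pre-trained weights rather than merging count tables, but the flow of data through the two stages is the same.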
Kim, J., Ko, Y., & Seo, J. (2020). Construction of Machine-Labeled Data for Improving Named Entity Recognition by Transfer Learning. IEEE Access, 8, 59684–59693. https://doi.org/10.1109/ACCESS.2020.2981361