Named entity extraction via automatic labeling and tri-training: Comparison of selection methods

Chien Lung Chou; Chia Hui Chang

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8870 244-255

DOI: 10.1007/978-3-319-12844-3_21

4 Citations

9 Readers

Abstract

Detecting named entities in documents is one of the most important tasks in knowledge engineering. Previous studies rely on annotated training data, which are expensive to obtain in large quantities, and this limits the effectiveness of recognition. In this research, we propose a semi-supervised learning approach for named entity recognition (NER) via automatic labeling and tri-training, which makes use of unlabeled data and structured resources containing known named entities. By modifying tri-training for sequence labeling and deriving a proper initialization, we can train an NER model for Web news articles automatically with satisfactory performance. In the task of Chinese personal name extraction from 8,672 news articles on the Web (with 364,685 sentences and 54,449 person names, 11,856 of them distinct), an F-measure of 90.4% can be achieved.
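
The abstract does not spell out how tri-training is adapted to sequence labeling, so the following is only a minimal sketch of the generic co-labeling step, not the authors' method. It assumes that a sentence is handed to one tagger as new training data only when the other two taggers produce identical tag sequences for it; the error-rate checks of standard tri-training and the selection methods compared in the paper are omitted. The names `co_label` and `make_dict_tagger`, and the `PER`/`O` tag set, are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

# A tagger maps a token sequence to a tag sequence of the same length.
Tagger = Callable[[Sequence[str]], List[str]]
Labeled = Tuple[List[str], List[str]]


def make_dict_tagger(known_names: Set[str]) -> Tagger:
    """Toy stand-in for a trained model: tags tokens found in a name list.

    This loosely mirrors automatic labeling from a resource of known
    entities; a real system would use a trained sequence labeler.
    """
    def tag(tokens: Sequence[str]) -> List[str]:
        return ["PER" if tok in known_names else "O" for tok in tokens]
    return tag


def co_label(taggers: Sequence[Tagger],
             unlabeled: Sequence[Sequence[str]]) -> List[List[Labeled]]:
    """Collect, for each tagger, the sentences its two peers agree on.

    Assumption: agreement is judged on the whole tag sequence.
    """
    if len(taggers) != 3:
        raise ValueError("tri-training uses exactly three taggers")
    new_data: List[List[Labeled]] = [[], [], []]
    for tokens in unlabeled:
        preds = [tagger(tokens) for tagger in taggers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            if preds[j] == preds[k]:
                # Peers j and k agree, so their labels teach tagger i.
                new_data[i].append((list(tokens), list(preds[j])))
    return new_data


taggers = [
    make_dict_tagger({"Alice"}),
    make_dict_tagger({"Alice", "Bob"}),
    make_dict_tagger({"Alice", "Bob"}),
]
result = co_label(taggers, [["Alice", "met", "Bob"]])
```

In this toy run only the first tagger receives a new example, because the second and third taggers agree with each other while the first disagrees with both. In a full loop each tagger would be retrained on its enlarged set and the step repeated until no tagger's training set changes.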

Cite

CITATION STYLE

APA

Chou, C. L., & Chang, C. H. (2014). Named entity extraction via automatic labeling and tri-training: Comparison of selection methods. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8870, 244–255. https://doi.org/10.1007/978-3-319-12844-3_21
