Hybrid conditional random fields and k-means for named entity recognition on Indonesian news documents

Abstract

Hybrid approaches are widely used in Natural Language Processing tasks, including Named Entity Recognition (NER). This research proposes a NER system for Indonesian news documents that combines Conditional Random Fields (CRF) with K-Means. The hybrid approach clusters word embeddings with K-Means and uses the resulting cluster as a feature in the CRF. Word embedding is a word representation technique that can capture the semantic meaning of words. The K-Means clustering results show that words with similar meanings are grouped into the same cluster. We believe this cluster feature can improve the performance of the baseline model by adding information about the semantic relatedness of words. The word embeddings in this research come from an Indonesian Word2Vec model. The dataset consists of 51,241 entities from Indonesian online news. We conducted experiments by dividing the corpus into training and testing sets using percentage splitting, with four scenarios: 60-40, 70-30, 80-20, and 90-10. The best performance of our model was achieved in the 60-40 scenario, with an F1-score of around 87.18%, an improvement of about 5.01% over the baseline models. We also compare our proposed method with two models from previous research, BiLSTM and BiLSTM-CRF. The experiments show that our model achieves better performance, with a best improvement of around 4.3%.
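
The idea of turning embedding clusters into a CRF feature can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors stand in for real Word2Vec embeddings, the word list, the `kmeans` helper, the `token_features` function, and the feature names are all hypothetical, and the CRF training step itself is omitted. The sketch only shows how each word gets a cluster id and how that id sits next to ordinary surface features in a per-token feature dictionary.

```python
import random

# Toy 2-D "embeddings" (assumption: real Word2Vec vectors have far more
# dimensions; these values are invented for illustration only).
embeddings = {
    "jakarta": [0.90, 0.10], "surabaya": [0.80, 0.20], "bandung": [0.85, 0.05],
    "joko": [0.10, 0.90], "budi": [0.20, 0.80], "siti": [0.15, 0.95],
}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k), key=lambda c: squared_distance(v, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

words = list(embeddings)
cluster_ids = kmeans([embeddings[w] for w in words], k=2)
word_to_cluster = dict(zip(words, cluster_ids))

def token_features(sentence, i, word_to_cluster):
    """Feature dict for token i: surface features plus the cluster id."""
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # Words without an embedding fall back to a dedicated "-1" cluster.
        "cluster": str(word_to_cluster.get(word.lower(), -1)),
    }

sentence = ["Budi", "tiba", "di", "Jakarta"]
features = [token_features(sentence, i, word_to_cluster) for i in range(len(sentence))]
```

Feature dictionaries of this shape, one per token, are the input format accepted by CRF toolkits such as sklearn-crfsuite. The cluster id is encoded as a string so that it is treated as a categorical feature rather than a numeric weight.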

Cite

Santoso, J., Setiawan, E. I., Yuniarno, E. M., Hariadi, M., & Purnomo, M. H. (2020). Hybrid conditional random fields and k-means for named entity recognition on Indonesian news documents. International Journal of Intelligent Engineering and Systems, 13(3), 233–245. https://doi.org/10.22266/IJIES2020.0630.22
