A Hidden Markov Model based named entity recognition system: Bengali and Hindi as case studies

Asif Ekbal; Sivaji Bandyopadhyay

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), Vol. 4815 LNCS, pp. 545–552

DOI: 10.1007/978-3-540-77046-6_67

Abstract

Named Entity Recognition (NER) plays an important role in almost all Natural Language Processing (NLP) application areas, including information retrieval, machine translation, question answering and automatic summarization. This paper reports the development of a statistical Hidden Markov Model (HMM) based NER system. The system is first developed for Bengali using a tagged Bengali news corpus built from the web archive of a leading Bengali newspaper. It is trained on a corpus of 150,000 wordforms, initially tagged with an HMM-based part-of-speech (POS) tagger. A 10-fold cross-validation test yields average Recall, Precision and F-Score values of 90.2%, 79.48% and 84.5%, respectively. The same HMM-based NER system is then trained and tested on Hindi data to demonstrate its language independence. On 27,151 Hindi wordforms, the 10-fold cross-validation test gives average Recall, Precision and F-Score values of 82.5%, 74.6% and 78.35%, respectively. © Springer-Verlag Berlin Heidelberg 2007.
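The abstract does not spell out the model's exact structure or features, so the following is only a minimal sketch of how a generic first-order HMM tagger assigns named-entity labels with Viterbi decoding. It is not the authors' implementation. The tag set, all probabilities, the example sentence, the `UNK` smoothing constant, and the names `viterbi` and `f_score` are hypothetical. The sketch also includes the balanced F-score, F = 2PR / (P + R), which reproduces the reported Bengali (84.5%) and Hindi (78.35%) F-Scores from the stated Recall and Precision.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy parameters (NOT from the paper) for a first-order HMM:
# initial, transition and emission probabilities over a tiny NE tag set.
TAGS = ["PER", "LOC", "O"]
START = {"PER": 0.3, "LOC": 0.2, "O": 0.5}
TRANS = {
    "PER": {"PER": 0.2, "LOC": 0.1, "O": 0.7},
    "LOC": {"PER": 0.1, "LOC": 0.2, "O": 0.7},
    "O":   {"PER": 0.2, "LOC": 0.2, "O": 0.6},
}
EMIT = {
    "PER": {"sachin": 0.6, "lives": 0.01, "in": 0.01, "kolkata": 0.1},
    "LOC": {"sachin": 0.05, "lives": 0.01, "in": 0.01, "kolkata": 0.7},
    "O":   {"sachin": 0.01, "lives": 0.4, "in": 0.4, "kolkata": 0.01},
}
UNK = 1e-6  # assumed crude probability for words unseen under a tag


def viterbi(words: List[str]) -> List[str]:
    """Most probable tag sequence under the toy HMM (non-empty input assumed)."""
    def emit(tag: str, word: str) -> float:
        return math.log(EMIT[tag].get(word, UNK))

    # best[i][t] = (log prob of best path ending in tag t at i, previous tag)
    best: List[Dict[str, Tuple[float, str]]] = [
        {t: (math.log(START[t]) + emit(t, words[0]), "") for t in TAGS}
    ]
    for word in words[1:]:
        prev = best[-1]
        column: Dict[str, Tuple[float, str]] = {}
        for t in TAGS:
            # Best predecessor tag p for current tag t.
            score, back = max(
                (prev[p][0] + math.log(TRANS[p][t]), p) for p in TAGS
            )
            column[t] = (score + emit(t, word), back)
        best.append(column)

    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


def f_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)


print(viterbi(["sachin", "lives", "in", "kolkata"]))  # ['PER', 'O', 'O', 'LOC']
print(round(f_score(90.2, 79.48), 2))  # 84.5  (Bengali figures from the abstract)
print(round(f_score(82.5, 74.6), 2))   # 78.35 (Hindi figures from the abstract)
```

Decoding in log space avoids numerical underflow on long sentences. A real system would estimate the start, transition and emission tables from the tagged training corpus and use a better treatment of unknown words than a single constant.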

Citation (APA)

Ekbal, A., & Bandyopadhyay, S. (2007). A Hidden Markov Model based named entity recognition system: Bengali and Hindi as case studies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4815 LNCS, pp. 545–552). Springer-Verlag. https://doi.org/10.1007/978-3-540-77046-6_67