Learning-based named entity recognition for morphologically-rich, resource-scarce languages

Kazi Saidul Hasan; Md Altaf Ur Rahman; Vincent Ng

Conference Proceedings (Open Access)

EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (2009) 354-362

DOI: 10.3115/1609067.1609106

Citations: 11

Readers: 80

Abstract

Named entity recognition for morphologically rich, case-insensitive languages, including the majority of Semitic languages, Iranian languages, and Indian languages, is inherently more difficult than its English counterpart. Worse still, progress on machine learning approaches to named entity recognition for many of these languages is currently hampered by the scarcity of annotated data and the lack of an accurate part-of-speech tagger. While it is possible to rely on manually-constructed gazetteers to combat data scarcity, this gazetteer-centric approach has the potential weakness of creating irreproducible results, since such name lists are generally not publicly available. Motivated in part by this concern, we present a learning-based named entity recognizer that does not rely on manually-constructed gazetteers, using Bengali as our representative resource-scarce, morphologically-rich language. Our recognizer achieves a relative improvement of 7.5% in F-measure over a baseline recognizer. Improvements arise from (1) using induced affixes, (2) extracting information from online lexical databases, and (3) jointly modeling part-of-speech tagging and named entity recognition. © 2009 Association for Computational Linguistics.
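The abstract credits part of the gain to induced affixes. As an illustration only, the sketch below shows one simple, commonly used heuristic for inducing suffixes from an unannotated word list and turning them into binary NER features: a candidate suffix scores a point whenever stripping it leaves a stem that is itself an attested word. This heuristic is an assumption and is not claimed to be the procedure used in the paper; the function names `induce_suffixes` and `affix_features`, the parameters `max_len`, `min_stem_len`, and `top_k`, and the toy English vocabulary are all hypothetical stand-ins for Bengali data.

```python
from collections import Counter


def induce_suffixes(vocabulary, max_len=3, min_stem_len=2, top_k=5):
    """Hypothetical heuristic: score each candidate suffix by how many words
    split into an attested stem plus that suffix, then keep the top_k."""
    vocab = set(vocabulary)
    scores = Counter()
    for word in vocab:
        for k in range(1, max_len + 1):
            stem, suffix = word[:-k], word[-k:]
            # Only count the split if the remaining stem is long enough
            # and is itself a word in the unannotated vocabulary.
            if len(stem) >= min_stem_len and stem in vocab:
                scores[suffix] += 1
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [suffix for suffix, _ in ranked[:top_k]]


def affix_features(word, suffixes):
    """Binary features marking which induced suffixes the word ends with."""
    feats = {}
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s):
            feats[f"suffix={s}"] = 1
    return feats


if __name__ == "__main__":
    # Toy English vocabulary standing in for an unannotated Bengali word list.
    vocab = ["walk", "walks", "walked", "talk", "talks", "talked", "jump", "jumps"]
    suffixes = induce_suffixes(vocab)
    print(suffixes)                           # ['s', 'ed']
    print(affix_features("talks", suffixes))  # {'suffix=s': 1}
```

In a learning-based recognizer, features like these would typically be added alongside lexical and contextual features for each token, letting the classifier generalize across inflected forms of the same name without a hand-built gazetteer.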

Cite (APA style)

Hasan, K. S., Ur Rahman, M. A., & Ng, V. (2009). Learning-based named entity recognition for morphologically-rich, resource-scarce languages. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 354–362). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609106