A Hybrid POS Tagger for Khasi, an under Resourced Language

Abstract

Khasi is an Austro-Asiatic language spoken mainly in the state of Meghalaya, India, and can be considered under-resourced and under-studied from a natural language processing perspective. Part-of-speech (POS) tagging, in which a part of speech is automatically assigned to each word in a sentence, is one of the major initial requirements of any natural language processing task. Developing a POS tagger is therefore a natural starting point for Khasi, and this paper presents the construction of a hybrid POS tagger for the language. The tagger addresses the tagging errors of a Khasi Hidden Markov Model (HMM) POS tagger by integrating conditional random fields (CRF). This integration incorporates language features that are otherwise not feasible in an HMM POS tagger. The hybrid Khasi tagger shows a significant improvement in accuracy and substantially reduces most of the tagging confusion of the HMM POS tagger.
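
To make the point about language features concrete, the following is a minimal sketch of the kind of overlapping per-token features a CRF can consume and a plain HMM cannot easily use. It is not the paper's implementation: the function names, the specific features (two-character prefix and suffix, neighbouring words) and the idea of passing the HMM's predicted tag in as a feature are all assumptions made for illustration, and the tokens and tags in the usage note are placeholders, not Khasi data.

```python
from typing import Dict, List, Optional


def token_features(tokens: List[str], i: int,
                   hmm_tags: Optional[List[str]] = None) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i (hypothetical helper)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        # Affix features: an assumed stand-in for morphological cues.
        "prefix2": word[:2].lower(),
        "suffix2": word[-2:].lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Context features: the neighbouring words, or sentence-boundary markers.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    # Assumption: the HMM tagger's prediction is supplied as one more feature,
    # so the CRF can learn when to keep it and when to override it.
    if hmm_tags is not None:
        feats["hmm_tag"] = hmm_tags[i]
    return feats


def sentence_features(tokens: List[str],
                      hmm_tags: Optional[List[str]] = None) -> List[Dict[str, object]]:
    """Apply token_features to every position in a sentence."""
    return [token_features(tokens, i, hmm_tags) for i in range(len(tokens))]
```

Calling `sentence_features(["Alpha", "beta", "42"], ["T1", "T2", "T3"])` returns one dictionary per token. Lists of per-token dictionaries like these are a common input format for CRF toolkits; the actual feature set and integration strategy used in the paper may differ.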

Citation (APA)

Tham, M. J. (2020). A Hybrid POS Tagger for Khasi, an under Resourced Language. International Journal of Advanced Computer Science and Applications, 11(10), 333–342. https://doi.org/10.14569/IJACSA.2020.0111042
