Khasi is an Austro-Asiatic language spoken mainly in the state of Meghalaya, India, and can be considered an under-resourced and understudied language from the natural language processing (NLP) perspective. Part-of-speech (POS) tagging, in which a part of speech is automatically assigned to each word in a sentence, is one of the major initial requirements for any NLP task. It is therefore natural to begin with the development of a POS tagger for Khasi, and this paper presents the construction of a hybrid POS tagger for Khasi. The tagger is developed to address the tagging errors of a Khasi Hidden Markov Model (HMM) POS tagger by integrating conditional random fields (CRF). This integration incorporates language features that are not feasible in an HMM POS tagger. The hybrid Khasi tagger shows a significant improvement in tagging accuracy and substantially reduces most of the tagging confusions of the HMM POS tagger.
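For readers unfamiliar with the HMM baseline, the sketch below shows how a first-order HMM tagger typically picks the most probable tag sequence with Viterbi decoding. It is a minimal illustration under stated assumptions, not the paper's implementation: the tag set, the probabilities, the English toy sentence, and the function name `viterbi` are all hypothetical. Note that each emission probability depends only on the current tag, which is why overlapping cues such as suffixes or neighbouring words are hard to add to an HMM and are easier to express as CRF features.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    Hypothetical toy implementation: unseen words get a small emission
    probability `unk`; start and transition probabilities must be non-zero.
    """
    # best[i][t] = (log-prob of best path ending in tag t at position i, previous tag)
    best = [{}]
    for t in tags:
        e = emit_p[t].get(words[0], unk)
        best[0][t] = (math.log(start_p[t]) + math.log(e), None)

    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = emit_p[t].get(words[i], unk)
            # Choose the predecessor tag that maximises the path score into t.
            best[i][t] = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]) + math.log(e), p)
                for p in tags
            )

    # Backtrack from the best final tag using the stored predecessors.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Hypothetical toy model (illustrative numbers only, not estimated from Khasi data).
tags = ["DET", "N", "V"]
start_p = {"DET": 0.6, "N": 0.3, "V": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "N": 0.9, "V": 0.05},
    "N": {"DET": 0.1, "N": 0.3, "V": 0.6},
    "V": {"DET": 0.5, "N": 0.4, "V": 0.1},
}
emit_p = {
    "DET": {"the": 0.7, "a": 0.3},
    "N": {"dog": 0.4, "cat": 0.4, "runs": 0.2},
    "V": {"runs": 0.7, "sees": 0.3},
}

result = viterbi(["the", "dog", "runs"], tags, start_p, trans_p, emit_p)
print(result)  # ['DET', 'N', 'V']
```

Here the ambiguous word "runs" is resolved to V mainly through the N→V transition probability; a CRF-based component could additionally condition on arbitrary features of the word and its context.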
Tham, M. J. (2020). A Hybrid POS Tagger for Khasi, an under Resourced Language. International Journal of Advanced Computer Science and Applications, 11(10), 333–342. https://doi.org/10.14569/IJACSA.2020.0111042