A Hybrid POS Tagger for Khasi, an under Resourced Language

Medari Janai Tham

Journal Article (Open Access)

International Journal of Advanced Computer Science and Applications (2020) 11(10) 333-342

DOI: 10.14569/IJACSA.2020.0111042

Abstract

Khasi is an Austro-Asiatic language spoken mainly in the state of Meghalaya, India, and is under-resourced and under-studied from a natural language processing perspective. Part-of-speech (POS) tagging, in which a part of speech is assigned automatically to each word in a sentence, is one of the main initial requirements of any natural language processing task. It is therefore natural to begin with the development of a POS tagger for Khasi, and this paper presents the construction of a Hybrid POS tagger for the language. The tagger is developed to address the tagging errors of a Khasi Hidden Markov Model (HMM) POS tagger by integrating conditional random fields (CRF). This integration incorporates language features that are not feasible in an HMM POS tagger. The Hybrid Khasi tagger significantly improves tagging accuracy and substantially reduces most of the tagging confusions of the HMM POS tagger.

Citation (APA)

Tham, M. J. (2020). A Hybrid POS Tagger for Khasi, an under Resourced Language. International Journal of Advanced Computer Science and Applications, 11(10), 333–342. https://doi.org/10.14569/IJACSA.2020.0111042
