Word N-Gram based approach for word sense disambiguation in Telugu Natural Language Processing

Abstract

Telugu is a morphologically rich Dravidian language. Like other languages, it contains polysemous words, which take different meanings in different contexts. Several language models exist to solve the word sense disambiguation problem for individual languages such as English, Chinese, Hindi, and Kannada. The proposed method addresses word sense disambiguation with the n-gram technique, which has given good results in many other languages. The methodology finds the words that co-occur with the target polysemous word, which we call n-grams. A Telugu corpus is supplied as input to the training phase to compute n-gram joint probabilities. Using these joint probabilities, the target polysemous word is assigned its correct sense in the testing phase. We evaluate the proposed method on some polysemous Telugu nouns and verbs. The method achieves an F-measure of 0.94 when tested on a Telugu corpus collected from CIIL, various newspapers, and story books. The method can give better results as the size of the training corpus increases, and in future we plan to evaluate it on all words, not only nouns and verbs.
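The train-then-test flow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the fixed context window, the add-one smoothing, the log-sum scoring rule, the function names (`context_words`, `train`, `joint_probability`, `disambiguate`), and the toy data, which uses English stand-in tokens in place of a sense-tagged Telugu corpus.

```python
import math
from collections import Counter

def context_words(tokens, target_index, window=2):
    """Return the words within `window` positions of the target, excluding it."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right

def train(examples, window=2):
    """Count (sense, co-occurring word) pairs over sense-tagged examples."""
    pair_counts = Counter()
    vocab = set()
    for tokens, target_index, sense in examples:
        for word in context_words(tokens, target_index, window):
            pair_counts[(sense, word)] += 1
            vocab.add(word)
    senses = sorted({sense for _, _, sense in examples})
    return pair_counts, senses, vocab

def joint_probability(model, sense, word):
    """Add-one smoothed estimate of P(sense, word) (smoothing is an assumption)."""
    pair_counts, senses, vocab = model
    total = sum(pair_counts.values())
    cells = len(senses) * (len(vocab) + 1)  # +1 reserves mass for unseen words
    return (pair_counts[(sense, word)] + 1) / (total + cells)

def disambiguate(model, tokens, target_index, window=2):
    """Pick the sense whose summed log joint probability over the context is highest."""
    _, senses, _ = model
    context = context_words(tokens, target_index, window)

    def score(sense):
        return sum(math.log(joint_probability(model, sense, w)) for w in context)

    return max(senses, key=score)

# Hypothetical toy training data: (tokens, index of target word, sense label).
TRAIN = [
    (["deposit", "money", "in", "bank", "today"], 3, "finance"),
    (["the", "bank", "approved", "a", "loan"], 1, "finance"),
    (["boat", "near", "river", "bank", "mud"], 3, "river"),
    (["fish", "by", "the", "bank", "water"], 3, "river"),
]

model = train(TRAIN)
print(disambiguate(model, ["loan", "money", "bank", "approved"], 2))  # finance
print(disambiguate(model, ["river", "water", "bank", "mud"], 2))      # river
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on longer contexts. A real system would also need Telugu tokenization and morphological handling, which this sketch leaves out.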

Cite

Prasad, P. D., Sunitha, K. V. N., & Rani, B. P. (2019). Word N-Gram based approach for word sense disambiguation in Telugu Natural Language Processing. International Journal of Recent Technology and Engineering, 7(6), 686–690. https://doi.org/10.35940/ijitee.f1199.0486s419