Hindi is a widely spoken language in the Indian subcontinent, used by more than 260 million Indian citizens. The Indian government runs many digital initiatives to serve citizens better, which makes Hindi one of the important languages for delivering these services. The initiatives include Smart City, Hospital Services, Common Service Centers, the Digital Payment Ecosystem, the Pensioners Scheme, Digital Locker and many more. All of these initiatives are delivered through mobile and web-based applications, which citizens can access easily instead of visiting various government departments. To serve the large Hindi-speaking population, any natural language processing task must handle ambiguous words, that is, words with multiple senses. In this paper, a word sense disambiguation method for the Hindi language is proposed. The proposed method uses the Lesk algorithm to disambiguate Hindi words. A novel scoring method assigns a sense score to each token of the Hindi sentence. The sense score is calculated from the gloss, hypernyms, hyponyms and synonyms of the combinations of different senses of the tokens. The proposed system uses the Hindi WordNet database created by CFILT, IIT Bombay. The proposed algorithm takes a natural language (NL) sentence in Hindi (Devanagari script) and processes it with a score-based approach modeled on the basic Lesk algorithm, using that Hindi WordNet. The solution provided in this paper can be used widely in web-based applications such as query-response systems, question-answering systems, sentiment analysis and recommendation systems.
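To make the overlap idea behind Lesk-style scoring concrete, the following is a minimal sketch of a simplified Lesk scorer. It is not the paper's scoring method: it scores each sense of a single target word against the sentence context, rather than scoring combinations of senses across all tokens. The `Sense` class, the `TOY_INVENTORY` dictionary and its English entries are hypothetical stand-ins for a real sense inventory such as the Hindi WordNet, and no stop-word filtering or morphological analysis is attempted.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a word (hypothetical structure, not the Hindi WordNet API)."""
    sense_id: str
    gloss: str
    synonyms: list = field(default_factory=list)
    hypernyms: list = field(default_factory=list)
    hyponyms: list = field(default_factory=list)

    def signature(self):
        # Bag of words drawn from the gloss and the related words.
        words = set(self.gloss.split())
        words.update(self.synonyms, self.hypernyms, self.hyponyms)
        return words


# Toy inventory, invented for illustration only.
TOY_INVENTORY = {
    "bank": [
        Sense("bank#1",
              "financial institution that accepts deposits and lends money",
              synonyms=["lender"], hypernyms=["institution"],
              hyponyms=["savings_bank"]),
        Sense("bank#2",
              "sloping land beside a river or lake",
              synonyms=["shore"], hypernyms=["slope"],
              hyponyms=["riverbank"]),
    ],
}


def sense_scores(target, context_tokens, inventory):
    """Score each sense by how many context words appear in its signature."""
    context = set(context_tokens) - {target}
    return {
        sense.sense_id: len(sense.signature() & context)
        for sense in inventory.get(target, [])
    }


def disambiguate(target, context_tokens, inventory):
    """Return the highest-scoring sense id, or None if the word is unknown.

    Ties are resolved in favour of the sense listed first in the inventory.
    """
    scores = sense_scores(target, context_tokens, inventory)
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sentence = "he sat on the bank of the river".split()
    print(sense_scores("bank", sentence, TOY_INVENTORY))
    print(disambiguate("bank", sentence, TOY_INVENTORY))
```

In this toy case the word "river" overlaps only with the signature of `bank#2`, so that sense is chosen. A system along the lines described in the abstract would draw glosses and relations from the Hindi WordNet and would need its own tokenization for Devanagari text.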
CITATION STYLE
Tripathi, P., Mukherjee, P., Hendre, M., Godse, M., & Chakraborty, B. (2021). Word Sense Disambiguation in Hindi Language Using Score Based Modified Lesk Algorithm. International Journal of Computing and Digital Systems, 10(1), 939–954. https://doi.org/10.12785/IJCDS/100185