Towards a Richer IndoWordNet with New Additions for Hindi and Gujarati Languages

Abstract

This paper presents a mechanism for handling loanwords, missing words, and newly coined terms that are absent from WordNets. WordNet has become one of the most prominent tools in Natural Language Processing (NLP), and the mechanism can be used to improve the WordNet of any language. The authors chose Hindi and Gujarati for this work because both languages have major dialects. For Hindi, the study used a verse-based corpus of more than 5000 entries rather than a prose-based corpus. As a result, nearly 14000 Hindi words were found that were not present in the Hindi IndoWordNet, amounting to 13.23 percent of its existing count of 105000+ words. For Gujarati, the study took a different approach and worked with idioms: from around 3500 idioms, nearly 900 Gujarati terms were found that did not exist in the IndoWordNet, amounting to nearly 1.4 percent of its 64000+ Gujarati words. The work thus contributes almost 14000 Hindi words and around 900 Gujarati words to the IndoWordNet project.
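The core idea of finding corpus words that a lexicon lacks can be illustrated with a short sketch. This is not the authors' implementation: the function names (`tokenize`, `missing_words`, `coverage_gain_percent`) are hypothetical, the tokenizer is a plain whitespace split that assumes no morphological normalization, and the lexicon is assumed to be available as a simple set of word forms.

```python
import string

# Punctuation stripped from token edges; the Devanagari danda marks are
# added because Hindi verse lines commonly end with them.
PUNCT = string.punctuation + "।॥"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (assumed scheme)."""
    tokens = (raw.strip(PUNCT) for raw in text.split())
    return [t for t in tokens if t]


def missing_words(corpus_lines, lexicon):
    """Return the set of corpus tokens that do not appear in the lexicon."""
    seen = set()
    for line in corpus_lines:
        seen.update(tokenize(line))
    return seen - set(lexicon)


def coverage_gain_percent(n_missing, lexicon_size):
    """Express the number of newly found words as a percentage of the lexicon size."""
    return 100.0 * n_missing / lexicon_size


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the paper's corpus.
    lines = ["राम वन गए।", "वन सुंदर है"]
    lexicon = {"वन", "है"}
    print(sorted(missing_words(lines, lexicon)))
    # Rounded Gujarati figures from the abstract: about 900 new terms
    # against a lexicon of 64000+ words.
    print(round(coverage_gain_percent(900, 64000), 2))
```

A whitespace split is used instead of a regular expression such as `\w+` because Devanagari and Gujarati vowel signs are combining marks, which such a pattern would treat as word boundaries. A real pipeline would also need to normalize inflected forms before comparing against the lexicon.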

Cite

Audichya, M. K., Saini, J. R., & Modh, J. C. (2022). Towards a Richer IndoWordNet with New Additions for Hindi and Gujarati Languages. International Journal of Advanced Computer Science and Applications, 13(7), 837–842. https://doi.org/10.14569/IJACSA.2022.0130797
