Comparative analysis of rule-based, dictionary-based and hybrid stemmers for Gujarati language

Abstract

Gujarati is an Indo-Aryan language spoken predominantly by the people of the Indian state of Gujarat. It is widely used in the Gujarat government's educational institutes and offices, in local industries and businesses, and in media such as newspapers, magazines, radio and television programs. All of these areas now depend on the Internet, and its use will grow if web content is provided in the regional language with the help of Natural Language Processing (NLP). In NLP, stemming plays a vital role in retrieving accurate content and producing effective results for web search queries: it identifies the root word from the morphological variants of a word. There are three typical approaches to stemming: rule-based, dictionary-based and hybrid. In this paper, we present a comparative empirical study of these three approaches for the Gujarati language, with the aim of evaluating the effectiveness of each type of stemmer. First, we discuss the rule-based algorithm and present its evaluation with 152 suffix-stripping rules. Next, we illustrate a stemming mechanism built on a Gujarati dictionary of around 20,000 root words. Lastly, we discuss the hybrid approach, which combines the rule-based and dictionary-based approaches. Experimental results show that the hybrid approach retrieves more accurate stems than either the rule-based or the dictionary-based approach.
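
The three approaches named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the suffix list is a handful of common Gujarati case and plural markers rather than the paper's 152 rules, the root set is a tiny stand-in for the roughly 20,000-word dictionary, and the way the hybrid function combines the two is one plausible design, not necessarily the authors' algorithm. The function names are hypothetical.

```python
# Illustrative suffixes only; the paper's 152 rules are not reproduced here.
SUFFIXES = ["માં", "થી", "ને", "નો", "ની", "નું", "ના", "ઓ"]

# Tiny stand-in for the paper's dictionary of around 20,000 root words.
ROOTS = {"ઘર", "શહેર", "સોનું"}


def rule_based_stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def dictionary_stem(word, roots=ROOTS):
    """Return the longest known root that is a prefix of the word, else the word."""
    for end in range(len(word), 0, -1):
        if word[:end] in roots:
            return word[:end]
    return word


def hybrid_stem(word, suffixes=SUFFIXES, roots=ROOTS):
    """Assumed combination: accept a suffix strip only if the dictionary confirms it."""
    if word in roots:
        return word  # already a root, so do not strip anything
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and word[: -len(suffix)] in roots:
            return word[: -len(suffix)]
    # Unknown word: fall back to plain suffix stripping.
    return rule_based_stem(word, suffixes)


if __name__ == "__main__":
    for w in ["ઘરમાં", "શહેરની", "સોનું", "બજારમાં"]:
        print(w, rule_based_stem(w), dictionary_stem(w), hybrid_stem(w))
```

In this toy setup the rule-based function over-stems સોનું ("gold") because the word happens to end in the suffix નું, while the hybrid function leaves it intact because it is listed as a root. The hybrid function still stems બજારમાં, which is absent from the root set, by falling back to suffix stripping, whereas the dictionary-only function returns that word unchanged.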

Cite

Dave, N. R., & Mehta, M. A. (2019). Comparative analysis of rule-based, dictionary-based and hybrid stemmers for Gujarati language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11932 LNCS, pp. 140–155). Springer. https://doi.org/10.1007/978-3-030-37188-3_9
