Information retrieval for Gujarati language using cosine similarity based vector space model

Abstract

Retrieving the most relevant documents from the web in response to a user query is a crucial task in Information Retrieval (IR) systems, particularly for resource-poor languages. This paper presents a Cosine Similarity Based Vector Space Document Model (VSDM) for Information Retrieval in the Gujarati language. VSDM is widely used in information retrieval and document classification, where each document is represented as a vector and each dimension corresponds to a separate term. The relevance of documents to the user query is measured using cosine similarity in the vector space, where the set of documents is treated as a set of vectors. The present work treats the user query as free-order text, i.e., the word sequence does not affect the results of the IR system. Technically, this is a Natural Language Processing (NLP) application in which stop-word removal, Term Frequency (TF) calculation, Normalized Term Frequency (NF) calculation and Inverse Document Frequency (IDF) calculation were performed on 1360 files in Text and PDF formats, and precision and recall values of 78% and 86%, respectively, were recorded. To the best of our knowledge, this is the first IR task for the Gujarati language using cosine-similarity-based calculations.
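To make the pipeline described in the abstract concrete, the following is a minimal sketch in Python of a cosine-similarity-based vector space model: stop-word removal, length-normalized term frequency, inverse document frequency, and cosine-similarity ranking. It is not the authors' implementation; the tiny stop-word set, the tokenizer, and the specific weighting (TF normalized by document length, IDF = log(N/df)) are illustrative assumptions.

```python
# Minimal sketch of a cosine-similarity-based vector space model (VSDM).
# Assumptions (not from the paper): whitespace tokenization, a tiny example
# stop-word set, TF normalized by document length, and IDF = log(N / df).
import math
from collections import Counter

# Hypothetical stop-word set for illustration; the paper uses a Gujarati
# stop-word list that is not reproduced here.
STOP_WORDS = {"છે", "અને", "તે"}

def tokenize(text):
    """Split on whitespace and drop stop-words (word order is ignored)."""
    return [t for t in text.split() if t not in STOP_WORDS]

def normalized_tf(tokens):
    """Term frequency divided by the total number of tokens in the document."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {term: c / total for term, c in counts.items()}

def idf(documents_tokens):
    """Inverse document frequency: log(N / df) over the whole collection."""
    n_docs = len(documents_tokens)
    df = Counter()
    for tokens in documents_tokens:
        df.update(set(tokens))
    return {term: math.log(n_docs / d) for term, d in df.items()}

def tfidf_vector(tokens, idf_scores):
    """Weight each term by normalized TF times IDF."""
    tf = normalized_tf(tokens)
    return {term: w * idf_scores.get(term, 0.0) for term, w in tf.items()}

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Rank documents by cosine similarity between query and document vectors."""
    doc_tokens = [tokenize(d) for d in documents]
    idf_scores = idf(doc_tokens)
    doc_vectors = [tfidf_vector(t, idf_scores) for t in doc_tokens]
    query_vector = tfidf_vector(tokenize(query), idf_scores)
    scores = [cosine_similarity(query_vector, v) for v in doc_vectors]
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
```

Because the query is treated as free-order text, `rank` produces the same ordering regardless of the word sequence in the query string.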

Citation (APA)
Rakholia, R. M., & Saini, J. R. (2017). Information retrieval for Gujarati language using cosine similarity based vector space model. In Advances in Intelligent Systems and Computing (Vol. 516, pp. 1–9). Springer Verlag. https://doi.org/10.1007/978-981-10-3156-4_1
