Information retrieval for Gujarati language using cosine similarity based vector space model

Abstract

Retrieving the documents most relevant to a user query from the web is a crucial task for an Information Retrieval (IR) system, particularly for resource-poor languages. This paper presents a Cosine Similarity Based Vector Space Document Model (VSDM) for information retrieval in the Gujarati language. The VSDM is widely used in information retrieval and document classification: each document is represented as a vector in which each dimension corresponds to a separate term. The relevance of a document to the user query is measured by the cosine similarity between the query vector and the document vectors, with the document collection treated as a set of vectors in this space. The present work treats the user query as free-order text, i.e., the word order does not affect the results of the IR system. Technically, this is a Natural Language Processing (NLP) application in which stop-word removal, Term Frequency (TF) calculation, Normalized Term Frequency (NF) calculation and Inverse Document Frequency (IDF) calculation were performed on 1360 files in Text and PDF formats, and precision and recall values of 78% and 86%, respectively, were recorded. To the best of our knowledge, this is the first IR task in the Gujarati language using cosine-similarity-based calculations.
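The pipeline described in the abstract (stop-word removal, normalized TF, IDF weighting and cosine-similarity ranking) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the whitespace tokenizer, the tiny example stop-word list and the ranking helper names are assumptions introduced here for clarity.

```python
import math
from collections import Counter

# Hypothetical, illustrative stop-word list; the paper uses its own Gujarati stop-word resource.
STOP_WORDS = {"છે", "અને", "તે", "એક"}

def tokenize(text):
    # Naive whitespace tokenizer with stop-word removal (assumption; the paper's preprocessing may differ).
    return [t for t in text.split() if t not in STOP_WORDS]

def normalized_tf(tokens):
    # Term frequency normalized by document length.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: c / total for term, c in counts.items()}

def idf(doc_tokens):
    # Inverse document frequency over the whole collection.
    n_docs = len(doc_tokens)
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    return {term: math.log(n_docs / df_t) for term, df_t in df.items()}

def tfidf_vector(tokens, idf_scores):
    # Each document (or query) becomes a sparse vector of TF-IDF weights.
    tf = normalized_tf(tokens)
    return {term: w * idf_scores.get(term, 0.0) for term, w in tf.items()}

def cosine_similarity(v1, v2):
    # Dot product over shared terms divided by the product of the vector norms.
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def rank(query, documents):
    # Rank documents by cosine similarity to the query; word order plays no role.
    doc_tokens = [tokenize(d) for d in documents]
    idf_scores = idf(doc_tokens)
    doc_vectors = [tfidf_vector(t, idf_scores) for t in doc_tokens]
    query_vector = tfidf_vector(tokenize(query), idf_scores)
    scores = [(i, cosine_similarity(query_vector, dv)) for i, dv in enumerate(doc_vectors)]
    return sorted(scores, key=lambda x: x[1], reverse=True)
```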

Citation (APA)

Rakholia, R. M., & Saini, J. R. (2017). Information retrieval for Gujarati language using cosine similarity based vector space model. In Advances in Intelligent Systems and Computing (Vol. 516, pp. 1–9). Springer Verlag. https://doi.org/10.1007/978-981-10-3156-4_1
