Document length normalization using effective level of term frequency in large collections

Abstract

The effectiveness of information retrieval systems depends largely on term weighting. Most current term-weighting approaches involve term frequency normalization. We develop a method to assess the potential role of the term frequency-inverse document frequency (TF-IDF) measures commonly used in text retrieval systems. Because automatic information retrieval systems must handle documents of varying sizes and terms of varying frequencies, we carried out preliminary tests to evaluate the effect of term-weighting components on retrieval performance. Based on these tests, we identify a novel factor, the effective level of term frequency (EL), that represents a document's content based on its length and maximum term frequency. This factor is used to find the maximum number of main terms within the documents and an appropriate subset of documents containing the query terms. We show that not all document terms need to be considered when ranking a document with respect to a query. The experimental results indicate that EL is a significant factor in retrieving relevant documents, especially in large collections. Experiments were conducted on TREC collections to evaluate the effectiveness of our proposal. © Springer-Verlag Berlin Heidelberg 2006.

Cite

Karbasi, S., & Boughanem, M. (2006). Document length normalization using effective level of term frequency in large collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3936 LNCS, pp. 72–83). Springer Verlag. https://doi.org/10.1007/11735106_8
