A Theory of Term Weighting Based on Exploratory Data Analysis

3Citations
Citations of this article
66Readers
Mendeley users who have this article in their library.

Abstract

Techniques of exploratory data analysis are used to study the weight of evidence that the occurrence of a query term provides in support of the hypothesis that a document is relevant to an information need. In particular, the relationship between the document frequency and the weight of evidence is investigated. A correlation between document frequency normalized by collection size and the mutual information between relevance and term occurrence is uncovered. This correlation is found to be robust across a variety of query sets and document collections. Based on this relationship, a theoretical explanation of the efficacy of inverse document frequency for term weighting is developed which differs in both style and content from theories previously put forth. The theory predicts that a "flattening"of idf at both low and high frequency should result in improved retrieval performance. This altered idf formulation is tested on all TREC query sets. Retrieval results corroborate the prediction of improved retrieval performance. In conclusion, we argue that exploratory data analysis can be a valuable tool for research whose goal is the development of an explanatory theory of information retrieval.

Cite

CITATION STYLE

APA

Greiff, W. R. (1998). A Theory of Term Weighting Based on Exploratory Data Analysis. In SIGIR 1998 - Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 11–19). Association for Computing Machinery, Inc. https://doi.org/10.1145/290941.290948

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free