Exploring the stability of IDF term weighting

Xin Fu; Miao Chen

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 4993 LNCS 10-21

DOI: 10.1007/978-3-540-68636-1_2

2 Citations

9 Readers


Abstract

TF·IDF has been widely used as a term weighting scheme in today's information retrieval systems. However, computation time and cost have become major concerns for its application. This study investigated the similarities and differences between IDF distributions based on the global collection and on different samples, and tested the stability of the IDF measure across collections. A more efficient algorithm based on random samples generated a good approximation to the IDF computed over the entire collection, but with less computational overhead. This practice may be particularly informative and helpful for analysis of large databases or dynamic environments like the Web. © 2008 Springer-Verlag Berlin Heidelberg.
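The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the common IDF variant log(N / df), a uniform random sample of documents, and Pearson correlation over shared terms as the agreement measure; the function names `idf`, `pearson`, and `compare_sample_idf` are hypothetical.

```python
import math
import random


def idf(docs):
    """Return {term: log(N / df)} for a list of tokenized documents."""
    n = len(docs)
    df = {}
    for doc in docs:
        # Count each term once per document (document frequency).
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_sample_idf(docs, sample_size, seed=0):
    """Correlate IDF from a random sample with IDF from the full collection.

    Only terms that occur in both the sample and the full collection
    are compared (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, sample_size)
    full = idf(docs)
    approx = idf(sample)
    shared = sorted(set(full) & set(approx))
    return pearson([full[t] for t in shared], [approx[t] for t in shared])
```

A correlation close to 1 for a small `sample_size` would suggest that the sample-based IDF ranks terms much like the global IDF does. Terms missing from the sample have no sample IDF at all, which is one reason a real system would need a fallback weight for unseen terms.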


Citation (APA)

Fu, X., & Chen, M. (2008). Exploring the stability of IDF term weighting. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4993 LNCS, pp. 10–21). https://doi.org/10.1007/978-3-540-68636-1_2
