TF·IDF has been widely used as a term weighting scheme in today's information retrieval systems. However, computation time and cost have become major concerns for its application. This study investigated the similarities and differences between IDF distributions computed over the global collection and over different samples, and tested the stability of the IDF measure across collections. A more efficient algorithm based on random samples produced a good approximation to the IDF computed over the entire collection, with less computational overhead. This practice may be particularly informative and helpful for analysis of large databases or dynamic environments such as the Web. © 2008 Springer-Verlag Berlin Heidelberg.
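The abstract does not state the exact IDF formula or sampling procedure used in the study. The sketch below therefore assumes the common form idf(t) = log(N / df(t)) and a uniform random sample of documents drawn without replacement. It only illustrates the general idea of approximating collection-wide IDF from a sample. The helpers `idf` and `sampled_idf` and the synthetic Zipf-like collection are hypothetical, not the authors' implementation.

```python
import math
import random
from collections import Counter


def idf(docs):
    """Assumed IDF variant: log(N / df(t)), with df(t) = number of docs containing t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n / count) for term, count in df.items()}


def sampled_idf(docs, fraction, seed=0):
    """Hypothetical helper: estimate IDF from a uniform random sample of documents."""
    rng = random.Random(seed)
    k = max(1, int(len(docs) * fraction))
    sample = rng.sample(docs, k)  # sampling without replacement
    return idf(sample)


# Synthetic collection: 1,000 "documents" of 20 tokens each (illustrative only).
rng = random.Random(42)
vocab = [f"t{i}" for i in range(50)]
weights = [1 / (i + 1) for i in range(len(vocab))]  # Zipf-like: a few common terms, many rarer ones
docs = [rng.choices(vocab, weights=weights, k=20) for _ in range(1000)]

full = idf(docs)                           # IDF over the entire collection
approx = sampled_idf(docs, fraction=0.1)   # IDF over a 10% random sample
shared = [t for t in approx if t in full]
mean_abs_err = sum(abs(full[t] - approx[t]) for t in shared) / len(shared)
print(f"{len(shared)} terms estimated, mean |error| = {mean_abs_err:.3f}")
```

A term that never appears in the sample receives no estimate, so any sample-based scheme needs some fallback for such terms; how the study handled this is not stated in the abstract.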
Fu, X., & Chen, M. (2008). Exploring the stability of IDF term weighting. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4993 LNCS, pp. 10–21). https://doi.org/10.1007/978-3-540-68636-1_2