Deriving TF-IDF as a Fisher kernel

Abstract

The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the DCM Fisher kernel has components that are similar to the term frequency (TF) and inverse document frequency (IDF) factors of the standard TF-IDF method for representing documents. Experiments show that the DCM Fisher kernel performs better than alternative kernels for nearest-neighbor document classification, but that the TF-IDF representation still performs best. © Springer-Verlag Berlin Heidelberg 2005.
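To make the TF/IDF analogy concrete, the sketch below computes the Fisher score of the DCM, meaning the gradient of its log-likelihood with respect to each parameter α_w. It is a minimal illustration under stated assumptions, not the paper's derivation. The helper names (`word_term`, `dcm_fisher_score`, `fisher_kernel_identity`) and the α values are hypothetical. The kernel replaces the Fisher information matrix with the identity, a common simplification that the paper may not use. The code relies on the standard DCM likelihood p(x | α) = n!/∏x_w! · Γ(s)/Γ(s+n) · ∏_w Γ(x_w+α_w)/Γ(α_w), with s = Σα_w and n the document length. It also uses the digamma recurrence ψ(z+1) = ψ(z) + 1/z, so only the standard library is needed.

```python
from typing import Dict

# Sketch of the DCM Fisher score. The alpha values in the demo are made-up
# illustrative numbers, not parameters fitted to any corpus.


def word_term(x: int, a: float) -> float:
    """psi(x + a) - psi(a) for a non-negative integer count x.

    By the recurrence psi(z + 1) = psi(z) + 1/z, this difference equals the
    finite sum 1/a + 1/(a+1) + ... + 1/(a+x-1) (and 0 when x == 0).
    """
    return sum(1.0 / (a + k) for k in range(x))


def dcm_fisher_score(counts: Dict[str, int], alpha: Dict[str, float]) -> Dict[str, float]:
    """d/d alpha_w log p(counts | alpha) for every word w in the vocabulary.

    = [psi(x_w + alpha_w) - psi(alpha_w)] + [psi(s) - psi(s + n)],
    where s = sum of alpha and n = number of tokens in the document.
    """
    s = sum(alpha.values())
    n = sum(counts.values())
    # psi(s) - psi(s + n) depends only on document length: same for every word.
    shared = -word_term(n, s)
    return {w: word_term(counts.get(w, 0), a) + shared for w, a in alpha.items()}


def fisher_kernel_identity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Dot product of two Fisher score vectors.

    Assumption: the Fisher information matrix is approximated by the identity.
    """
    return sum(u[w] * v.get(w, 0.0) for w in u)


if __name__ == "__main__":
    alpha = {"the": 2.0, "kernel": 0.05, "dirichlet": 0.05}
    doc = {"the": 3, "kernel": 2}
    score = dcm_fisher_score(doc, alpha)
    for w, g in score.items():
        print(f"{w:10s} {g:8.3f}")
```

In this sketch, a word with a small α_w (one that is rarely expected) contributes a first term of 1/α_w, which is 20 for α_w = 0.05. A frequent word with α_w = 2.0 contributes only 1/2 + 1/3 + 1/4 across three occurrences. This behaves like an IDF weight. Each further occurrence adds 1/(α_w + k), a diminishing, roughly logarithmic increment that resembles a damped TF factor. This reading is offered as intuition for the abstract's claim; the paper's exact approximations may differ.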

Citation (APA)

Elkan, C. (2005). Deriving TF-IDF as a Fisher kernel. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3772 LNCS, pp. 295–300). https://doi.org/10.1007/11575832_33