Low-frequency words tend to be rich in content, and vice versa. But not all equally frequent words are equally meaningful. We will use inverse document frequency (IDF), a quantity borrowed from Information Retrieval, to distinguish words like "somewhat" and "boycott". Both "somewhat" and "boycott" appeared approximately 1000 times in a corpus of 1989 Associated Press articles, but "boycott" is a better keyword because its IDF deviates further from what would be expected by chance under a Poisson model.
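To make the comparison concrete, here is a minimal Python sketch of one common way to quantify that deviation. Let D be the number of documents, df the number of documents containing a word, and cf the word's total number of occurrences. The observed IDF is -log2(df/D). If the word's occurrences landed in documents independently at a Poisson rate theta = cf/D, the chance that a document contains it at least once would be 1 - e^(-theta), giving a predicted IDF of -log2(1 - e^(-theta)). The difference between the two is often called residual IDF. The function names and all counts in the usage example are hypothetical, chosen only for illustration; they are not figures from the AP corpus.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Observed IDF in bits: -log2(df / D)."""
    return -math.log2(doc_freq / num_docs)


def poisson_predicted_idf(collection_freq: int, num_docs: int) -> float:
    """IDF expected if a word's occurrences landed in documents at a
    Poisson rate theta = cf / D per document, so that
    P(word occurs at least once in a document) = 1 - exp(-theta)."""
    theta = collection_freq / num_docs
    # -expm1(-theta) computes 1 - exp(-theta) accurately for small theta.
    return -math.log2(-math.expm1(-theta))


def residual_idf(doc_freq: int, collection_freq: int, num_docs: int) -> float:
    """Observed IDF minus the Poisson prediction; larger means burstier."""
    return idf(doc_freq, num_docs) - poisson_predicted_idf(collection_freq, num_docs)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the AP corpus):
    # two words with the same collection frequency in 80,000 documents.
    num_docs = 80_000
    cf = 1_000
    for label, df in [("evenly spread word", 950), ("bursty word", 300)]:
        print(f"{label}: IDF={idf(df, num_docs):.2f} bits, "
              f"residual IDF={residual_idf(df, cf, num_docs):.2f} bits")
```

A word whose occurrences cluster in a few documents, as topical words like "boycott" tend to, appears in fewer documents than the Poisson model predicts, so its observed IDF and its residual are larger. A word scattered roughly at random, as "somewhat" is, ends up with a residual near zero even though its raw frequency is the same.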
Church, K., & Gale, W. (1999). Inverse Document Frequency (IDF): A Measure of Deviations from Poisson (pp. 283–295). https://doi.org/10.1007/978-94-017-2390-9_18