Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection


Abstract

First Story Detection (FSD) requires a system to detect the very first story that mentions an event from a stream of stories. Nearest neighbour-based models, using traditional term vector document representations such as TF-IDF, currently achieve the state of the art in FSD. Because of its online nature, a dynamic term vector model that is incrementally updated during the detection process is usually adopted for FSD instead of a static model. However, very little research has investigated the selection of hyper-parameters and background corpora for a dynamic model. In this paper, we analyse how a dynamic term vector model works for FSD, and investigate the impact of different update frequencies and background corpora on FSD performance. Our results show that dynamic models with high update frequencies outperform static models and dynamic models with low update frequencies; and that the FSD performance of dynamic models does not always increase with higher update frequencies, but instead reaches a steady state once some update frequency threshold is reached. In addition, we demonstrate that different background corpora have very limited influence on the FSD performance of dynamic models with high update frequencies.
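
To make the mechanism in the abstract concrete, the following is a minimal sketch of a nearest neighbour FSD loop over a dynamic TF-IDF model. It is not the authors' implementation. The class `DynamicTfidf`, the function `detect_first_stories`, the parameter `update_every` (how many incoming stories are counted before the IDF table is recomputed), the smoothed IDF formula, and the fixed novelty threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


class DynamicTfidf:
    """Hypothetical dynamic TF-IDF model (illustrative, not from the paper).

    Document frequencies are counted for every story, but the IDF table
    used for vectorization is only recomputed every `update_every` stories.
    """

    def __init__(self, background_docs, update_every):
        self.update_every = update_every
        self.df = Counter()   # document frequency per term
        self.n_docs = 0
        for tokens in background_docs:   # initialise from the background corpus
            self._count(tokens)
        self._refresh_idf()

    def _count(self, tokens):
        self.df.update(set(tokens))
        self.n_docs += 1

    def _refresh_idf(self):
        # Smoothed IDF (an assumed variant); unseen terms get the maximum weight.
        n = self.n_docs
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in self.df.items()}
        self.default_idf = math.log(1 + n) + 1.0
        self.pending = 0

    def add(self, tokens):
        self._count(tokens)
        self.pending += 1
        if self.pending >= self.update_every:
            self._refresh_idf()

    def vectorize(self, tokens):
        tf = Counter(tokens)
        vec = {t: f * self.idf.get(t, self.default_idf) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    # Both vectors are L2-normalised, so the dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def detect_first_stories(stream, model, threshold):
    """Flag a story as a first story when its nearest neighbour is far away."""
    seen, flags = [], []
    for tokens in stream:
        vec = model.vectorize(tokens)
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(1.0 - nearest > threshold)   # novelty score vs. threshold
        seen.append(vec)
        model.add(tokens)   # may or may not trigger an IDF refresh
    return flags
```

In this sketch, `update_every=1` corresponds to the highest update frequency, while `update_every=float("inf")` never refreshes the IDF table and so behaves like a static model built from the background corpus alone.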

Cite

Wang, F., Ross, R. J., & Kelleher, J. D. (2020). Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection. In Communications in Computer and Information Science (Vol. 1215 CCIS, pp. 206–217). Springer. https://doi.org/10.1007/978-981-15-6168-9_18
