We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adopting this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. In a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation of the system's quality showed both some of the weaknesses and the power of our approach.
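The abstract does not say which significance test is used, so the following is only an illustrative sketch of the general idea: it assumes a chi-square test on a 2x2 contingency table that compares how often a feature appears in documents inside a time window versus outside it. The function names, the table layout, and the 0.05 threshold are assumptions made for this example, not details taken from the paper.

```python
# Illustrative sketch only: the test, names, and threshold are assumptions.

# Critical value of the chi-square distribution with 1 degree of freedom
# at the 0.05 significance level.
CRITICAL_05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: documents inside the time window that contain the feature
    b: documents inside the time window that do not
    c: documents outside the time window that contain the feature
    d: documents outside the time window that do not
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the table carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def is_significant(a, b, c, d, critical=CRITICAL_05):
    """True if the feature is over-represented inside the window."""
    over_represented = a * d > b * c
    return over_represented and chi_square_2x2(a, b, c, d) > critical


# Hypothetical counts: the feature appears in 30 of 100 documents
# inside the window, but only 20 of 900 outside it.
print(is_significant(30, 70, 20, 880))
# Same rate inside and outside (5% in both): nothing unusual.
print(is_significant(5, 95, 45, 855))
```

Running a check like this for each feature and each time window would flag the windows in which a feature occurs far more often than its background rate suggests, which is one way to read the filtering step the abstract describes.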
Swan, R., & Allan, J. (1999). Extracting significant time varying features from text. In International Conference on Information and Knowledge Management, Proceedings (pp. 38–45). ACM. https://doi.org/10.1145/319950.319956