In this paper, we propose a clustering-based approach to online news topic detection and tracking (TDT) built on a hierarchical Bayesian nonparametric framework that allows topics to be shared across different news stories in a corpus. Our approach is formulated using the hierarchical Pitman-Yor process mixture model with the inverted Beta-Liouville (IBL) distribution as its component density, which has been shown to model text data better than the widely used Gaussian distribution. Moreover, we develop a convergence-guaranteed online learning algorithm, based on variational Bayes, that can effectively learn the proposed TDT model from a stream of news stories. The merits of our TDT approach are illustrated by comparing it with other well-defined clustering-based TDT approaches on different news data sets.
Fan, W., Guo, Z., Bouguila, N., & Hou, W. (2021). Clustering-Based Online News Topic Detection and Tracking through Hierarchical Bayesian Nonparametric Models. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2126–2130). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3462982