Online subset topic modeling for interactive documents exploration

Abstract

Data exploration over text databases is an important problem. In an exploration scenario, users search for something useful without knowing in advance exactly what they are looking for, and recognize it only once they find it. Labor-intensive effort is therefore often required, since users have to review the overview (or detail) results of ad-hoc queries and adjust the queries (e.g., zoom or filter) continuously. Probabilistic topic models are often adopted to provide an overview of a given text collection, since they can discover the underlying thematic structures of unstructured text data. However, training a topic model for a selected document collection is time-consuming. Moreover, continuous query adjustment triggers frequent model retraining, which wastes a large amount of time and is therefore unsuitable for online exploration. To remedy this problem, this paper presents STMS, an algorithm for efficiently constructing topic structures in document subsets. STMS accelerates subset modeling by leveraging global precomputation and applying an efficient sampling-based inference algorithm. Experiments on real-world datasets show that STMS achieves orders-of-magnitude speed-ups over a standard topic model while remaining comparable in modeling quality.

Cite

Li, L., Wu, Y., Ke, Y., Liu, C., Jing, Y., He, Z., & Wang, X. S. (2018). Online subset topic modeling for interactive documents exploration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10827 LNCS, pp. 916–923). Springer Verlag. https://doi.org/10.1007/978-3-319-91452-7_59
