Maintaining topic models for growing corpora

Abstract

A reference library can be viewed as a corpus: an individually composed collection of documents. Over time, the corpus may grow as an agent extends it with additional documents, e.g., new publications or articles. Existing approaches use topic modelling to compare documents within the same corpus by their topic distributions. For a new document, however, only the text is available, not a topic distribution. This paper therefore describes three techniques for estimating the topic distributions of new, unseen documents on the basis of the documents initially in the corpus. Additionally, we present an extensive evaluation of the performance and runtime of the three techniques for various scenarios and corpora of different sizes.
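The abstract does not name the three techniques, so the following is only an illustrative sketch of one common way to obtain a topic distribution for an unseen document: "folding in" the document while keeping the topic-word probabilities learned from the initial corpus fixed. It is not taken from the paper. The function name `fold_in`, the toy `topic_word` table, and the iteration count are all assumptions made for this example.

```python
from collections import Counter


def fold_in(doc_tokens, topic_word, iterations=50):
    """Estimate a topic distribution for one unseen document.

    topic_word is a list with one dict per topic, mapping word -> P(word | topic).
    These probabilities come from the existing model and are never updated here;
    only the document's topic mixture (theta) is re-estimated with EM.
    """
    num_topics = len(topic_word)
    # Words unknown to the existing model carry no information and are dropped.
    counts = Counter(w for w in doc_tokens if any(w in t for t in topic_word))
    theta = [1.0 / num_topics] * num_topics  # start from a uniform mixture
    if not counts:
        return theta

    for _ in range(iterations):
        new_theta = [0.0] * num_topics
        for word, n in counts.items():
            # E-step: responsibility of each topic for this word.
            weights = [theta[k] * topic_word[k].get(word, 0.0)
                       for k in range(num_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue
            # M-step accumulation: expected word counts per topic.
            for k in range(num_topics):
                new_theta[k] += n * weights[k] / norm
        total = sum(new_theta)
        if total == 0.0:
            break
        theta = [v / total for v in new_theta]
    return theta


# Hypothetical two-topic model, standing in for one trained on the initial corpus.
topic_word = [
    {"gene": 0.5, "cell": 0.4, "data": 0.1},
    {"court": 0.5, "law": 0.4, "data": 0.1},
]
theta = fold_in("gene cell cell data".split(), topic_word)
```

Because the topic-word table stays fixed, the estimate is cheap and leaves the topic distributions of the initial documents untouched. The trade-off is that vocabulary appearing only in new documents is ignored, which is one reason a growing corpus may eventually call for retraining instead.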

Cite

Kuhr, F., Bender, M., Braun, T., & Moller, R. (2020). Maintaining topic models for growing corpora. In Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020 (pp. 451–458). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSC.2020.00087
