Abstract
Traditional topic modeling approaches generally rely on document-term co-occurrence statistics to find latent topics in a collection of documents. However, relying only on such statistics can yield incoherent or hard-to-interpret results in the many applications where end users need to interpret the resulting topics (e.g., labeling documents, comparing corpora, or guiding content exploration). In this work, we propose to leverage external common sense knowledge, i.e., information from the real world beyond word co-occurrence, to find topics that are more coherent and more easily interpretable by humans. We introduce the Common Sense Topic Model (CSTM), a novel and efficient approach that augments clustering with knowledge extracted from the ConceptNet knowledge graph. We evaluate this approach on several datasets alongside commonly used models, using both automatic and human evaluation, and show that it agrees more closely with human judgement. The code for the experiments, the training data, and the human evaluation are available at https://github.com/D2KLab/CSTM.
Harrando, I., & Troncy, R. (2021). Discovering Interpretable Topics by Leveraging Common Sense Knowledge. In K-CAP 2021 - Proceedings of the 11th Knowledge Capture Conference (pp. 265–268). Association for Computing Machinery, Inc. https://doi.org/10.1145/3460210.3493586