CITPM: A cluster-based iterative topical phrase mining framework

13Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

A phrase is a natural, meaningful, essential semantic unit. In topic modeling, visualizing phrases for individual topics is an effective way to explore and understand unstructured text corpora. Unfortunately, existing approaches predominately rely on the general distributional features between topics and phrases on an entire corpus, while ignore the impact of domain-level topical distribution. This often leads to losing domain-specific terminologies, and as a consequence, weakens the coherence of topical phrases. In this paper, we present a novel framework CITPM for topical phrase mining. Our framework views a corpus as a mixture of clusters (domains), and each cluster is characterized by documents sharing similar topical distributions. The CITPM framework iteratively performs phrase mining, topical inferring and cluster updating until a satisfactory final result is obtained. The empirical verification demonstrates our framework outperforms state-of-the-art works in both aspects of interpretability and efficiency.

Cite

CITATION STYLE

APA

Li, B., Wang, B., Zhou, R., Yang, X., & Liu, C. (2016). CITPM: A cluster-based iterative topical phrase mining framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9642, pp. 197–213). Springer Verlag. https://doi.org/10.1007/978-3-319-32025-0_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free