Hierarchical latent tree analysis for topic detection

Tengfei Liu; Nevin L. Zhang; Peixian Chen

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014), vol. 8725 LNAI (Part 2), pp. 256-272

DOI: 10.1007/978-3-662-44851-9_17

Abstract

In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high-frequency words in one topic may also be used with high frequency in other topics. Thus, they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence and co-occurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches. © 2014 Springer-Verlag.
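The contrast drawn in the abstract, between ranking words by frequency within a topic and ranking them by how well they separate one cluster of documents from the rest, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy corpus, the hand-assigned cluster labels (the paper obtains clusters from a hierarchy of latent variables, which is not reproduced here), the helper names, and the scoring rule, which is simply the difference between a word's document frequency inside and outside the cluster. The abstract does not state which selection criterion the authors use.

```python
from collections import Counter

# Toy corpus (hypothetical): each document is a set of words.
docs = [
    {"game", "team", "season", "people"},
    {"game", "player", "team", "people"},
    {"team", "game", "people", "time"},
    {"space", "orbit", "launch", "people"},
    {"space", "nasa", "people", "time"},
    {"orbit", "space", "people", "time"},
]
# Hand-assigned cluster ids (assumption); not inferred from a latent tree model.
labels = [0, 0, 0, 1, 1, 1]


def doc_frequency(docs, labels, cluster, inside=True):
    """Fraction of documents inside (or outside) the cluster that contain each word."""
    selected = [d for d, l in zip(docs, labels) if (l == cluster) == inside]
    counts = Counter(w for d in selected for w in d)
    return {w: c / len(selected) for w, c in counts.items()}


def top_by_frequency(docs, labels, cluster, k=3):
    """Rank words by in-cluster document frequency only (ties broken alphabetically)."""
    freq = doc_frequency(docs, labels, cluster)
    return sorted(freq, key=lambda w: (-freq[w], w))[:k]


def top_by_contrast(docs, labels, cluster, k=3):
    """Rank words by in-cluster frequency minus out-of-cluster frequency."""
    inside = doc_frequency(docs, labels, cluster, inside=True)
    outside = doc_frequency(docs, labels, cluster, inside=False)
    score = {w: p - outside.get(w, 0.0) for w, p in inside.items()}
    return sorted(score, key=lambda w: (-score[w], w))[:k]


print(top_by_frequency(docs, labels, 0))
print(top_by_contrast(docs, labels, 0))
```

In this toy data the word "people" occurs in every document, so it ranks among the most frequent words of cluster 0 yet scores zero under the contrast rule and drops out of that list.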

Cite

Liu, T., Zhang, N. L., & Chen, P. (2014). Hierarchical latent tree analysis for topic detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8725 LNAI, pp. 256–272). Springer Verlag. https://doi.org/10.1007/978-3-662-44851-9_17
