MPTM: A topic model for multi-part documents

Zhipeng Xie; Liyang Jiang; Tengju Ye; Zhenying He

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9050 154-168

DOI: 10.1007/978-3-319-18123-3_10

Citations: 1

Readers: 4

Abstract

Topic models have been successfully applied to uncover hidden probabilistic structures in collections of documents, where documents are treated as unstructured texts. However, some documents, which we call multi-part documents, are composed of multiple named parts. To exploit the information in the document-part relationships during topic modeling, this paper adopts two assumptions: first, all parts of a given document should have similar topic distributions; second, the multiple versions of a given topic (corresponding to the multiple named parts) should have similar word distributions. Based on these two assumptions, we propose a novel topic model for multi-part documents, called the Multi-Part Topic Model (MPTM for short), and develop its construction and inference method using collapsed Gibbs sampling and maximum likelihood estimation. Experimental results on real datasets demonstrate that our approach not only significantly improves the quality of the discovered topics, but also boosts performance in information retrieval and document classification.
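
For readers unfamiliar with collapsed Gibbs sampling, the following is a minimal sketch for plain LDA, not for MPTM itself. The abstract does not give MPTM's update equations, so the part-specific topic versions and the maximum likelihood step are not shown. The function name `lda_collapsed_gibbs`, the hyperparameters `alpha` and `beta`, and the toy data are assumptions made for illustration only.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative, not MPTM).

    docs: list of documents, each a list of integer word ids
          in the range [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic of each token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Draw a new topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return n_dk, n_kw, z


if __name__ == "__main__":
    # Hypothetical toy corpus: three documents over a four-word vocabulary.
    toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3, 3], [0, 2]]
    doc_topic, topic_word, _ = lda_collapsed_gibbs(
        toy_docs, num_topics=2, vocab_size=4)
    print(doc_topic)
    print(topic_word)
```

The sampler integrates out the document-topic and topic-word distributions and keeps only count tables, which is what "collapsed" refers to. A multi-part model along the lines the abstract describes would presumably need additional per-part structure on top of these counts.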

Cite

Xie, Z., Jiang, L., Ye, T., & He, Z. (2015). MPTM: A topic model for multi-part documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9050, pp. 154–168). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_10
