A Guided Topic-Noise Model for Short Texts

Robert Churchill; Lisa Singh; Rebecca Ryan; Pamela Davis-Kean

Conference Proceedings, Open Access

WWW 2022 - Proceedings of the ACM Web Conference 2022 (2022) 2870-2878

DOI: 10.1145/3485447.3512007

Abstract

Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models frequently produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose the Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process and can be augmented interactively: as the model surfaces new relevant words for different topics, the user can add them to the seed word list. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets.
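
As a rough illustration of the general idea behind Generalized Polya Urn sampling with seed words, the sketch below shows one count update for a single topic. It is not the authors' algorithm: the abstract does not give the update rule, so the function name `gpu_update`, the `promotion` weight, and the example seed words are all hypothetical. The only assumption carried over is the standard one for generalized Polya urn schemes, namely that drawing a word adds one copy of that word back to the urn and also adds fractional copies of related words (here, the other seed words of the same topic).

```python
from collections import defaultdict


def gpu_update(topic_counts, word, seed_words, promotion=0.3):
    """Apply one generalized Polya urn style update to a topic's word counts.

    A plain Polya urn adds one copy of the drawn word back to the urn.
    The generalized variant also adds fractional copies of related words.
    ASSUMPTION: "related" means the other seed words of the same topic,
    and `promotion` is an illustrative weight, not a value from the paper.
    """
    topic_counts[word] += 1.0
    if word in seed_words:
        for other in seed_words:
            if other != word:
                topic_counts[other] += promotion
    return topic_counts


# Hypothetical seed topic supplied by a domain expert.
seeds = {"vaccine", "dose", "booster"}
counts = defaultdict(float)

gpu_update(counts, "vaccine", seeds)  # seed word: promotes "dose" and "booster"
gpu_update(counts, "clinic", seeds)   # non-seed word: only its own count grows
```

In this sketch a seed word observed in a topic pulls the topic's other seed words up with it, which is one way seed topics could stay coherent during sampling, while non-seed words are counted normally and remain free to form unsupervised topics.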

Citation (APA)

Churchill, R., Singh, L., Ryan, R., & Davis-Kean, P. (2022). A Guided Topic-Noise Model for Short Texts. In WWW 2022 - Proceedings of the ACM Web Conference 2022 (pp. 2870–2878). Association for Computing Machinery, Inc. https://doi.org/10.1145/3485447.3512007
