A methodology is introduced to test whether documents of a corpus share the same topic distribution. Latent Dirichlet Allocation (LDA) is used to estimate the topic distributions, and the Kullback-Leibler divergence measures the dissimilarity between them. The testing approach combines Bayesian and frequentist statistics. Since the sampling distribution of the proposed test statistics is unknown, a bootstrap test is suggested. The methodology is illustrated using scientific abstracts from the CMStatistics conference.
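To make the idea of a bootstrap test on a Kullback-Leibler statistic concrete, the following is a minimal sketch and not the authors' procedure, which the abstract does not spell out. It assumes each document is already represented by per-word topic labels (for example, taken from a fitted LDA model), uses a symmetrised KL divergence between smoothed topic proportions as the statistic, and approximates the null distribution by resampling from the pooled labels. All function names, the smoothing pseudo-count `alpha`, and the pooled resampling scheme are illustrative assumptions.

```python
import math
import random
from collections import Counter


def topic_proportions(assignments, n_topics, alpha=0.5):
    """Smoothed topic proportions from per-word topic labels.

    alpha is an assumed pseudo-count that keeps every proportion positive.
    """
    counts = Counter(assignments)
    total = len(assignments) + alpha * n_topics
    return [(counts.get(k, 0) + alpha) / total for k in range(n_topics)]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def statistic(doc_a, doc_b, n_topics):
    """Symmetrised KL divergence between two documents' topic proportions."""
    p = topic_proportions(doc_a, n_topics)
    q = topic_proportions(doc_b, n_topics)
    return kl_divergence(p, q) + kl_divergence(q, p)


def bootstrap_p_value(doc_a, doc_b, n_topics, n_boot=2000, seed=0):
    """Pooled bootstrap test of H0: both documents share one topic distribution.

    Assumption: under H0 the topic labels of both documents are draws from a
    common distribution, so pseudo-documents are resampled from the pooled labels.
    """
    rng = random.Random(seed)
    observed = statistic(doc_a, doc_b, n_topics)
    pooled = list(doc_a) + list(doc_b)
    exceed = 0
    for _ in range(n_boot):
        boot_a = rng.choices(pooled, k=len(doc_a))
        boot_b = rng.choices(pooled, k=len(doc_b))
        if statistic(boot_a, boot_b, n_topics) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_boot + 1)
```

A call such as `bootstrap_p_value(doc_a, doc_b, n_topics=3)` returns the observed statistic and an estimated p-value; a small p-value suggests the two documents do not share a topic distribution. In practice the topic proportions would more likely come from the LDA posterior than from hard per-word labels, so this sketch should be read only as an illustration of the resampling logic.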
Kontoghiorghes, L., & Colubi, A. (2023). Testing the Homogeneity of Topic Distribution Between Documents of a Corpus (pp. 248–254). https://doi.org/10.1007/978-3-031-15509-3_33