Distance Measures for Clustering of Documents in a Topic Space

Abstract

Topic modeling is a method for discovering topics (groups of words). In this paper, we focus on clustering documents in a topic space obtained using the MALLET tool. We tested several distance measures with two clustering algorithms (spectral clustering and agglomerative hierarchical clustering) and describe those that performed better (cosine distance, correlation distance, and Bhattacharyya distance) than the Euclidean metric used with the k-means algorithm. For evaluation, we used the Adjusted Mutual Information (AMI) score. The need for such experiments stems from the difficulty of choosing appropriate clustering methods for the given data, which in our case has specific characteristics.
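The distance measures compared in the abstract can be made concrete with a short sketch. It assumes each document is represented as a vector of topic proportions, as a topic model such as MALLET produces. The function names and example vectors are hypothetical. The Bhattacharyya distance uses the common definition D_B(p, q) = -ln Σ sqrt(p_i q_i), which may differ from the exact variant used in the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Baseline metric: straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 minus the cosine of the angle between u and v (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def correlation_distance(u: Vector, v: Vector) -> float:
    """1 minus the Pearson correlation, i.e. the cosine distance of mean-centred vectors.

    Raises ZeroDivisionError for constant vectors, whose correlation is undefined.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])


def bhattacharyya_distance(p: Vector, q: Vector) -> float:
    """-ln of the Bhattacharyya coefficient; p and q are probability distributions."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient <= 0.0:
        return math.inf  # the distributions share no probability mass
    # Clamp rounding error so identical distributions give 0 rather than a tiny negative.
    return -math.log(min(coefficient, 1.0))


def pairwise_distances(
    docs: List[Vector], metric: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Symmetric n x n distance matrix; the diagonal stays 0 (self-distance)."""
    n = len(docs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(docs[i], docs[j])
    return matrix


if __name__ == "__main__":
    # Hypothetical topic proportions for three documents over three topics.
    docs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
    metrics = [
        ("euclidean", euclidean_distance),
        ("cosine", cosine_distance),
        ("correlation", correlation_distance),
        ("bhattacharyya", bhattacharyya_distance),
    ]
    for name, metric in metrics:
        matrix = pairwise_distances(docs, metric)
        print(name, [[round(d, 3) for d in row] for row in matrix])
```

Agglomerative clustering can work directly from a precomputed distance matrix like this one. Spectral clustering usually expects a similarity (affinity) matrix instead, so the distances would first need converting, for example with a Gaussian kernel exp(-d²/2σ²). The AMI score used for evaluation is available in scikit-learn as `sklearn.metrics.adjusted_mutual_info_score`.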

Walkowiak, T., & Gniewkowski, M. (2020). Distance Measures for Clustering of Documents in a Topic Space. In Advances in Intelligent Systems and Computing (Vol. 987, pp. 544–552). Springer Verlag. https://doi.org/10.1007/978-3-030-19501-4_54