Improved Text Clustering Using k-Mean Bayesian Vectoriser


Abstract

High-dimensional data reduces the efficiency of clustering algorithms and increases their execution time. In this paper, we propose an approach called BV-kmeans (Bayesian Vectorisation with k-means) that aims to improve document representation models for text clustering. The approach integrates k-means document clustering with a Bayesian Vectoriser, which computes the probability distribution of the documents in the vector space, in order to overcome the problems of high-dimensional data and reduce the running time. To determine which metrics are effective for modelling the similarity between documents under the proposed approach, we used three similarity measures: K divergence, squared Euclidean distance and squared χ2 distance. We evaluated the approach on data from a set of common newspaper websites, which is highly dimensional. Experimental results show that, compared with the standard k-means algorithm, the proposed approach increases the degree to which a cluster contains documents from a specific category by 85% and lowers the runtime by 95%.
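The three similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes each document is already represented as a discrete probability distribution over terms, and it uses common textbook definitions of the measures; the abstract does not give formulas, so the paper's exact definitions may differ. The function names and the toy term counts are hypothetical and are not taken from the paper.

```python
import math


def normalise(counts):
    """Turn raw term counts into a probability distribution (assumed representation)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("document has no terms")
    return [c / total for c in counts]


def squared_euclidean(p, q):
    """Sum of squared differences between two distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def squared_chi2(p, q):
    """Squared chi-squared distance: sum of (p_i - q_i)^2 / (p_i + q_i).

    Terms where both probabilities are zero contribute nothing and are skipped.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def k_divergence(p, q):
    """K divergence: sum of p_i * ln(2 * p_i / (p_i + q_i)).

    Terms with p_i == 0 are skipped (treated as contributing zero).
    Note that this measure is not symmetric in p and q.
    """
    return sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q) if a > 0)


# Toy example: two documents over a three-term vocabulary (made-up counts).
doc_a = normalise([2, 1, 1])
doc_b = normalise([1, 1, 2])

for name, measure in [
    ("squared Euclidean", squared_euclidean),
    ("squared chi2", squared_chi2),
    ("K divergence", k_divergence),
]:
    print(f"{name}: {measure(doc_a, doc_b):.4f}")
```

In a k-means setting, any one of these functions could stand in for the distance between a document and a cluster centroid, provided the centroid is itself kept as a probability distribution.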

Cite

Alghamdi, H. M., Selamat, A., & Karim, N. S. A. (2014). Improved Text Clustering Using k-Mean Bayesian Vectoriser. Journal of Information and Knowledge Management, 13(3). https://doi.org/10.1142/S0219649214500269
