A K-means Text Clustering Algorithm Based on Subject Feature Vector

Abstract

K-means is one of the most popular clustering algorithms, but its results are easily influenced by the choice of initial points and by the number of clusters. In addition, the iterative class center, computed as the mean of all points in a cluster, is itself one of the factors that affect clustering performance. In this paper, representative initial points are selected according to a decision graph built from the local density and distance of each point. We then propose an improved k-means text clustering algorithm in which the iterative class center is composed of a subject feature vector, which avoids the influence of noise. Experiments show that the initial points are selected successfully and that the clustering results improve by 3%, 5%, 2%, and 7%, respectively, over the traditional k-means clustering algorithm on four experimental corpora from Fudan and Sougou.
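The abstract does not spell out how the decision graph is built, so the following is only a minimal sketch of one common reading, the density-peaks formulation: local density is the number of neighbors within a cutoff, and distance is the distance to the nearest point of higher density. The function names, the `cutoff` parameter, the Euclidean metric, and the rule of ranking points by the product of density and distance are all assumptions made for illustration, not the authors' method. For text data one would more likely use a cosine distance over term-weight vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_graph(points, cutoff):
    """Return (rho, delta): local density and distance to the nearest denser point."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    delta = []
    for i in range(n):
        # "Denser" points: higher rho, with ties broken in favor of the lower index,
        # so exactly one point (the global density peak) has no denser neighbor.
        denser = [dist[i][j] for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        # The global peak conventionally gets its largest distance to any point.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def select_initial_points(points, k, cutoff):
    """Pick k indices with the largest rho * delta (hypothetical selection rule)."""
    rho, delta = decision_graph(points, cutoff)
    ranked = sorted(range(len(points)),
                    key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Two well-separated toy groups, each with one central point.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(select_initial_points(toy, k=2, cutoff=0.9))
```

Points that are both dense and far from any denser point stand out in the decision graph, which is why they are plausible candidates for initial centers. The selected indices could then seed an ordinary k-means run in place of random initialization.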

Cite

Duo, J., Zhang, P., & Hao, L. (2021). A K-means Text Clustering Algorithm Based on Subject Feature Vector. Journal of Web Engineering, 20(6), 1935–1946. https://doi.org/10.13052/jwe1540-9589.20612
