K-means is one of the most popular clustering algorithms, but it is sensitive to the choice of initial points and to the number of clusters. In addition, its iterative class center is computed as the mean of all points in a cluster, which is one of the factors that degrade clustering performance. In this paper, representative initial points are selected according to a decision graph built from the local density and distance of each point. We then propose an improved k-means text clustering algorithm whose iterative class center is composed of a subject feature vector, which avoids the influence of noise. Experiments show that the initial points are selected successfully and that the clustering results improve by 3%, 5%, 2%, and 7%, respectively, over the traditional k-means clustering algorithm on four experimental corpora from Fudan and Sougou.
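The abstract does not give the formulas behind the decision graph, so the following is only a minimal sketch of one common way to pick initial points from local density and distance. It assumes Euclidean distance, a cutoff-based density count, and ranking by the product of density and distance; the names `decision_graph`, `select_initial_points`, and `cutoff` are hypothetical and do not come from the paper.

```python
import math


def decision_graph(points, cutoff):
    """Return (rho, delta) for every point.

    rho[i]   : number of other points closer than `cutoff` to point i
               (an assumed cutoff-style local density).
    delta[i] : distance from point i to the nearest point of higher density;
               for a point with no denser neighbor, its largest distance
               to any other point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def select_initial_points(points, k, cutoff):
    """Return indices of the k points with the largest rho * delta score.

    Ranking by the product is an assumption made for this sketch; points
    that are both dense and far from any denser point score highest.
    """
    rho, delta = decision_graph(points, cutoff)
    scores = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Two small, well-separated groups; each has a central point.
    pts = [
        (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
    ]
    print(select_initial_points(pts, k=2, cutoff=0.8))
```

On this toy data the two central points receive the highest scores, so one seed falls in each group. For text clustering, the points would be document vectors rather than 2-D coordinates, and the distance measure and cutoff would need to be chosen for that representation.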
CITATION STYLE
Duo, J., Zhang, P., & Hao, L. (2021). A K-means Text Clustering Algorithm Based on Subject Feature Vector. Journal of Web Engineering, 20(6), 1935–1946. https://doi.org/10.13052/jwe1540-9589.20612