A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language

Abstract

Currently, function points cannot be obtained automatically when using the function point analysis (FPA) method, especially for requirements documents written in Chinese. Given the characteristics of Chinese word segmentation, Chinese words must be segmented accurately so that subsequent entity recognition and disambiguation can be carried out over a smaller range, which lays a solid foundation for efficient automatic extraction of function points. This paper therefore proposes a K-Means clustering method based on TF-IDF and reports experiments on 24 software requirements documents written in Chinese. The results show that the best clustering effect is achieved when 55% to 75% of the extracted information is retained and the number of clusters takes the middle value of the total number of clusters. The method and conclusions apply not only to Chinese: they also provide an important reference for automatic extraction of function points from software requirements documents written in other Oriental languages, and they fill a gap in data preprocessing at the early stage of automatic function point calculation.
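
The general idea of clustering TF-IDF vectors with K-Means can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: character bigrams stand in for a real Chinese word segmenter, `keep_ratio` is one possible reading of "retaining 55% to 75% of the extracted information" (here, keeping the top-weighted fraction of terms), and the function names and sample sentences are hypothetical.

```python
import math
import random
from collections import Counter


def bigrams(text):
    """Stand-in tokenizer (assumption): overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def tfidf_vectors(docs, keep_ratio=0.65):
    """Return (vocab, vectors): L2-normalised TF-IDF over the retained terms."""
    counts_per_doc = [Counter(bigrams(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for counts in counts_per_doc for term in counts)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}  # smoothed IDF
    # Rank terms by summed TF-IDF weight and keep only a fraction of them.
    total = Counter()
    for counts in counts_per_doc:
        length = sum(counts.values())
        for term, c in counts.items():
            total[term] += (c / length) * idf[term]
    ranked = sorted(total, key=lambda t: (-total[t], t))
    vocab = ranked[:max(1, int(len(ranked) * keep_ratio))]
    vectors = []
    for counts in counts_per_doc:
        length = sum(counts.values()) or 1
        v = [(counts[t] / length) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical requirement sentences, for illustration only.
sample_docs = [
    "用户可以登录系统并修改密码",
    "用户可以登录系统并修改密码",
    "系统生成月度销售报表",
    "管理员导出销售报表",
]
sample_vocab, sample_vectors = tfidf_vectors(sample_docs, keep_ratio=0.65)
print(kmeans(sample_vectors, k=2))
```

In practice a dedicated Chinese word segmenter would replace `bigrams`, and both `keep_ratio` and `k` would be tuned against a clustering quality measure rather than fixed in advance.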

Cite

Zhu, J., Huang, S., Shi, Y., Wu, K., & Wang, Y. (2022). A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language. IEICE Transactions on Information and Systems, 105(4), 736–754. https://doi.org/10.1587/transinf.2021EDP7144
