Automatic keyword extraction from Chinese text suffers from large feature extraction errors, low precision of the extracted keywords, and poor real-time performance. To address these problems, an automatic keyword extraction algorithm for Chinese text based on word clustering is designed. First, the term frequency, document frequency, and inverse document frequency features of keywords are calculated with a statistical algorithm, and the degree of interdependence between keywords is measured with pointwise mutual information. A quantification matrix of keywords and feature items is then constructed with the vector space model, which completes keyword feature quantification and realizes keyword feature extraction from Chinese text. Next, the average semantic similarity between keywords is calculated to determine the similarity of keyword features, and keyword features with high similarity are eliminated. A comprehensive feature value is defined to measure the importance of single-character words in Chinese text, and single-character words of low importance are removed. A Bayesian framework is then used to reduce the dimensionality of the high-dimensional keyword feature data, completing the preprocessing stage. The mapping results of the keyword vector space model are determined by a word clustering algorithm, the text clusters of the keyword-space clustering results are computed by a clustering algorithm, and the keywords are classified with the DBN method. On this basis, an automatic keyword extraction model for Chinese text is designed to carry out the extraction. Experimental results show that the designed algorithm effectively reduces the feature extraction error and improves extraction efficiency.
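The abstract does not state the exact formulas used for feature quantification, so the following is only a minimal Python sketch of that step under standard textbook definitions, not the paper's implementation. It assumes relative term frequency, IDF = log(N / df), and pointwise mutual information estimated from document-level co-occurrence. The toy corpus, the function names (`tf`, `df`, `idf`, `pmi`, `pmi_table`, `tfidf_matrix`), and the assumption that the texts have already been word-segmented are all hypothetical. The semantic-similarity filtering, Bayesian dimensionality reduction, clustering, and DBN stages are not shown.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus. Each document is assumed to be already
# segmented into words (Chinese word segmentation is not shown here).
docs = [
    ["关键词", "提取", "中文", "文本"],
    ["词", "聚类", "关键词", "提取"],
    ["中文", "文本", "聚类"],
]

def tf(doc):
    """Relative term frequency of each word within one document."""
    counts = Counter(doc)
    total = len(doc)
    return {w: c / total for w, c in counts.items()}

def df(corpus):
    """Document frequency: number of documents containing each word."""
    return Counter(w for doc in corpus for w in set(doc))

def idf(corpus):
    """Standard inverse document frequency, log(N / df(w)) (an assumption)."""
    n = len(corpus)
    return {w: math.log(n / d) for w, d in df(corpus).items()}

def pmi(corpus, a, b):
    """Pointwise mutual information of words a and b, with probabilities
    estimated from document-level occurrence and co-occurrence."""
    n = len(corpus)
    p_a = sum(a in doc for doc in corpus) / n
    p_b = sum(b in doc for doc in corpus) / n
    p_ab = sum(a in doc and b in doc for doc in corpus) / n
    if p_ab == 0:
        return float("-inf")  # the two words never co-occur
    return math.log(p_ab / (p_a * p_b))

def pmi_table(corpus):
    """PMI for every unordered pair of distinct words in the vocabulary."""
    vocab = sorted({w for doc in corpus for w in doc})
    return {(a, b): pmi(corpus, a, b) for a, b in combinations(vocab, 2)}

def tfidf_matrix(corpus):
    """Vector space model: one row per document, one TF-IDF column per word."""
    vocab = sorted({w for doc in corpus for w in doc})
    idf_w = idf(corpus)
    rows = []
    for doc in corpus:
        tf_d = tf(doc)
        rows.append([tf_d.get(w, 0.0) * idf_w[w] for w in vocab])
    return vocab, rows

if __name__ == "__main__":
    vocab, matrix = tfidf_matrix(docs)
    print(df(docs)["关键词"])  # → 2
    print(pmi(docs, "关键词", "提取"))
    for word, row in zip(["doc0", "doc1", "doc2"], matrix):
        print(word, [round(x, 3) for x in row])
```

Estimating PMI from document-level co-occurrence is only one common choice; a sliding word window inside each document is another, and the abstract does not say which the paper uses.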
Pan, R. (2023). Automatic Keyword Extraction Algorithm for Chinese Text based on Word Clustering. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3592793