Text information processing is an important topic in data mining. It draws on techniques from statistics, machine learning, pattern recognition, and related fields. In the age of big data, a huge amount of text data has accumulated. At present, the most effective way to process text is to classify the documents before mining them. The problem has therefore attracted great interest from scholars and researchers, and many constructive results have been achieved. However, as the number of training samples grows, the shortcomings of existing techniques and the limits of their applicability have gradually become apparent. In this paper, we propose a new strategy for classifying documents based on a Huffman tree. First, we find all candidate classifications by generating a Huffman tree; we then design a quality measure to select the final classification. Our experimental results show that the proposed algorithm is effective and feasible.
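The two-stage idea (build a tree to enumerate candidate groupings, then score them) can be illustrated with a minimal sketch. The abstract does not specify the merge criterion or the quality measure, so everything below is an assumption made for illustration: documents are bag-of-words vectors, the tree is built Huffman-style by repeatedly merging the two most cosine-similar nodes, and the hypothetical `quality` function is mean within-group similarity minus mean between-group similarity. The names `candidate_partitions`, `quality`, and `cluster_documents` are not from the paper.

```python
import itertools
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def candidate_partitions(vectors):
    """Build the tree bottom-up and record one partition per tree level.

    Assumption: as in Huffman construction, two nodes are merged per step;
    here the pair chosen is the one with the most similar term vectors.
    """
    clusters = {i: ([i], Counter(v)) for i, v in enumerate(vectors)}
    next_id = len(vectors)
    partitions = [[members for members, _ in clusters.values()]]
    while len(clusters) > 1:
        a, b = max(
            itertools.combinations(clusters, 2),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = (clusters[a][0] + clusters[b][0], clusters[a][1] + clusters[b][1])
        del clusters[a], clusters[b]
        clusters[next_id] = merged  # the new internal node of the tree
        next_id += 1
        partitions.append([members for members, _ in clusters.values()])
    return partitions


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def quality(partition, vectors):
    """Hypothetical measure: mean within-group minus mean between-group similarity."""
    label = {i: k for k, group in enumerate(partition) for i in group}
    within, between = [], []
    for i, j in itertools.combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (within if label[i] == label[j] else between).append(sim)
    return _mean(within) - _mean(between)


def cluster_documents(texts):
    """Return the best-scoring candidate partition as sorted lists of indices."""
    vectors = [Counter(t.lower().split()) for t in texts]
    best = max(candidate_partitions(vectors), key=lambda p: quality(p, vectors))
    return sorted(sorted(group) for group in best)


docs = [
    "huffman tree coding",
    "huffman coding tree prefix",
    "stock market price",
    "market price trading",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

In this toy run the tree yields four candidate partitions (four, three, two, and one group), and the assumed measure picks the two-group split. A different similarity or quality function could change which level is selected.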
CITATION STYLE
Liu, Y., Wen, Y., Yuan, D., & Cuan, Y. (2014). A huffman tree-based algorithm for clustering documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8933, 630–640. https://doi.org/10.1007/978-3-319-14717-8_49