A Huffman tree-based algorithm for clustering documents

Abstract

Text information processing is an important topic in data mining that draws on techniques from statistics, machine learning, and pattern recognition. In the age of big data, a huge amount of text data has accumulated. At present, the most effective way to process text is to classify documents before mining them. Text classification has therefore attracted great interest from researchers, and many constructive results have been achieved. However, as the number of training samples grows, the shortcomings of existing techniques and the limits of their applicability have gradually become apparent. In this paper, we propose a new strategy for classifying documents based on a Huffman tree. First, we generate a Huffman tree to find all candidate classifications; then we design a quality measure to select the final classification. Our experimental results show that the proposed algorithm is effective and feasible.
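
The two-stage idea in the abstract (build a tree to enumerate candidate groupings, then score them) can be illustrated with a minimal sketch. The abstract does not state how documents are represented, what drives the merges, or what the quality measure is, so everything below is an assumption made for illustration: bag-of-words vectors, cosine similarity, a Huffman-style greedy merge that repeatedly joins the two most similar clusters, and a quality score equal to mean within-cluster similarity minus mean between-cluster similarity. The function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
import math


def vectorize(text):
    """Bag-of-words term counts (an assumed document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def candidate_partitions(vectors):
    """Huffman-style bottom-up merging (assumed merge rule: most similar pair).

    Returns every intermediate partition, from n singletons down to one
    cluster. Each cluster is a (member_indices, summed_vector) pair.
    """
    clusters = [((i,), v) for i, v in enumerate(vectors)]
    partitions = [[members for members, _ in clusters]]
    while len(clusters) > 1:
        # Pick the pair of clusters whose summed vectors are most similar.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        partitions.append([members for members, _ in clusters])
    return partitions


def quality(partition, vectors):
    """Assumed measure: mean within-cluster minus mean between-cluster similarity."""
    label = {i: c for c, members in enumerate(partition) for i in members}
    within, between = [], []
    for i, j in combinations(range(len(vectors)), 2):
        bucket = within if label[i] == label[j] else between
        bucket.append(cosine(vectors[i], vectors[j]))
    if not within or not between:
        # Skip the trivial all-singletons and single-cluster partitions.
        return float("-inf")
    return sum(within) / len(within) - sum(between) / len(between)


def cluster_documents(docs):
    """Enumerate candidate partitions, then keep the highest-scoring one."""
    vectors = [vectorize(d) for d in docs]
    best = max(candidate_partitions(vectors), key=lambda p: quality(p, vectors))
    return sorted(sorted(members) for members in best)
```

For example, calling `cluster_documents` on two pet-related and two finance-related sentences groups the documents by topic, returning lists of document indices. The pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a sketch but would need a priority queue or similar structure at scale.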

Cite

Liu, Y., Wen, Y., Yuan, D., & Cuan, Y. (2014). A Huffman tree-based algorithm for clustering documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8933, 630–640. https://doi.org/10.1007/978-3-319-14717-8_49
