Abstract: Document clustering has become a popular research topic in text mining because it helps retrieve the required information from the large volume of data collected from the Internet. Clustering is the unsupervised classification of data items into groups and requires no training data. Many conventional document clustering methods perform inefficiently on large document collections and require special handling for high dimensionality and high volume. We propose OCFI (Ontology and Closed Frequent Itemset-based Hierarchical Clustering), a hierarchical clustering method developed for document clustering. OCFI uses common words to cluster documents and builds a hierarchical topic tree. In addition, OCFI uses an ontology to address semantic problems and capture the meaning behind the words in documents. Furthermore, we use closed frequent itemsets rather than all frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more effective than well-known document clustering algorithms. The clustering results can be used in a personalized search service to help users obtain the information they need.
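To make the distinction between frequent and closed frequent itemsets concrete, the following is a minimal Python sketch, not the OCFI implementation. It assumes each document has already been reduced to a list of terms, uses a simple level-wise enumeration, and leaves out the ontology step and the topic tree entirely. The names `frequent_itemsets`, `closed_itemsets`, `docs`, and `min_support` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Return {term set: document count} for term sets found in at least
    min_support documents (simple level-wise search, illustrative only)."""
    doc_sets = [frozenset(d) for d in docs]
    vocabulary = sorted(set().union(*doc_sets))
    result = {}
    size = 1
    candidates = [frozenset([term]) for term in vocabulary]
    while candidates:
        # Count how many documents contain every term of each candidate.
        counts = {c: sum(1 for d in doc_sets if c <= d) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        result.update(level)
        # Join frequent sets of the current size into candidates one term larger.
        size += 1
        candidates = list(
            {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        )
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: n
        for s, n in frequent.items()
        if not any(s < t and n == m for t, m in frequent.items())
    }


# Hypothetical toy corpus: each document is a list of already-extracted terms.
docs = [
    ["cluster", "document", "ontology"],
    ["cluster", "document"],
    ["cluster", "document", "itemset"],
    ["itemset", "ontology"],
]
frequent = frequent_itemsets(docs, min_support=2)
closed = closed_itemsets(frequent)
for terms, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(terms), support)
```

In this toy corpus there are five frequent itemsets but only three closed ones: `{cluster}` and `{document}` are absorbed by `{cluster, document}` because all three occur in exactly the same documents. A smaller set of candidate cluster labels with no loss of support information is one plausible reading of the efficiency gain the abstract describes.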
CITATION STYLE
Lee, C.-J., Hsu, C.-C., & Chen, D.-R. (2017). A Hierarchical Document Clustering Approach with Frequent Itemsets. International Journal of Engineering and Technology, 9(2), 174–178. https://doi.org/10.7763/ijet.2017.v9.965