We propose a new data model for Web document representation based on granulation computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering is based on two levels of knowledge granularity: document and term. This can produce clustering results that are "falsely relevant". In our approach, Web documents are represented at multiple levels of knowledge granularity. Knowledge granularity with sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a more reliable basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient. © Springer-Verlag Berlin Heidelberg 2006.
CITATION STYLE
Huang, F., & Zhang, S. (2006). Clustering web documents based on knowledge granularity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3841 LNCS, pp. 85–96). Springer Verlag. https://doi.org/10.1007/11610113_9