Clustering web documents based on knowledge granularity

Faliang Huang; Shichao Zhang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 3841 LNCS 85-96

DOI: 10.1007/11610113_9

Abstract

We propose a new data model for Web document representation based on granular computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering is based on two levels of knowledge granularity: document and term. This can lead to "false relevant" clustering results. In our approach, Web documents are represented at many levels of knowledge granularity. Knowledge granularity with sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a sounder basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient. © Springer-Verlag Berlin Heidelberg 2006.
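
The abstract does not define EVSM itself, so the following is only a minimal sketch of the contrast it draws, not the authors' model. It compares two documents at the traditional document/term level and at one finer, hypothetical level (paragraphs). The function names, the choice of paragraphs as the intermediate level, and the best-match averaging rule are all assumptions made for illustration. One plausible reading of "false relevant" is shown: two documents that share every term can look identical at the document level while no single paragraph pair matches closely.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_level_similarity(doc_a, doc_b):
    """Two-level view: each document is a single bag of terms."""
    return cosine(term_vector(" ".join(doc_a)), term_vector(" ".join(doc_b)))


def paragraph_level_similarity(doc_a, doc_b):
    """Hypothetical finer level (an assumption, not EVSM): for each
    paragraph of doc_a take its best-matching paragraph in doc_b,
    then average those best-match scores."""
    vecs_a = [term_vector(p) for p in doc_a]
    vecs_b = [term_vector(p) for p in doc_b]
    if not vecs_a or not vecs_b:
        return 0.0
    return sum(max(cosine(a, b) for b in vecs_b) for a in vecs_a) / len(vecs_a)


# Each document is a list of paragraphs (toy data, invented for illustration).
doc_a = ["apple banana", "cherry date"]
doc_b = ["apple cherry", "banana date"]

print(document_level_similarity(doc_a, doc_b))   # same terms overall
print(paragraph_level_similarity(doc_a, doc_b))  # weaker paragraph-level match
```

On this toy input the document-level score is maximal because both documents contain the same four terms, while the paragraph-level score is roughly half of that, since each paragraph shares only one of its two terms with any paragraph of the other document.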

Cite

Huang, F., & Zhang, S. (2006). Clustering web documents based on knowledge granularity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3841 LNCS, pp. 85–96). Springer Verlag. https://doi.org/10.1007/11610113_9
