Web document clustering using semantic link analysis

Somjit Arch-int

Conference Proceedings

Proceedings - International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA 2005 and International Conference on Intelligent Agents, Web Technologies and Internet (2005) 2 13-18

DOI: 10.1109/cimca.2005.1631438

Abstract

Searching for and discovering relevant information on the Web has always been a challenging research area. Web document clustering is a promising technique for preparing a large collection of Web documents for use by Web search engines. This paper proposes a semantic document clustering approach that categorizes Web documents by meaning. First, formal methods and algorithms are introduced as techniques for document extraction and clustering. The approach incorporates WordNet and ontology knowledge as assisting mechanisms, so that the resulting set of concepts serves as a formal representation of the extracted documents. Cluster scores are then determined for the resulting semantic-based clusters. Next, a semantic-based link analysis method is proposed for clustering Web documents into semantic clusters that are scored based on the notion of semantic-based concepts and documents. Finally, the document scores are used to evaluate semantic document similarity and document quality. The precision criterion is employed for evaluation, comparing the approach with a keyword-based search method. The experimental results show that the proposed method outperformed the TF/IDF method by up to 9% on average. © 2005 IEEE.
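The abstract compares the proposed approach against a TF/IDF keyword baseline using precision. As a rough illustration of those two notions only (not the paper's semantic method, whose formulas are not given here), the following minimal Python sketch computes TF-IDF vectors, cosine similarity, and precision. The function names, the specific TF-IDF weighting, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: relative term frequency times log(N / document frequency).
    The paper may use a different TF/IDF variant.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["web", "search"],
    ["web", "cluster"],
    ["ontology", "concept"],
]
vectors = tf_idf_vectors(docs)
```

In this sketch, documents that share no keywords get a similarity of zero even if they are related in meaning, which is the kind of gap a concept-based representation (for example, one built with WordNet and an ontology, as the abstract describes) is intended to address.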

APA

Arch-int, S. (2005). Web document clustering using semantic link analysis. In Proceedings - International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA 2005 and International Conference on Intelligent Agents, Web Technologies and Internet (Vol. 2, pp. 13–18). https://doi.org/10.1109/cimca.2005.1631438
