Abstract
Searching for and discovering relevant information on the Web has always been a challenging research area. Web document clustering is a promising technique for organizing huge collections of Web documents for use by Web search engines. This paper proposes a semantic document clustering approach that categorizes Web documents according to their meaning. First, formal methods and algorithms for document extraction and clustering are introduced. The approach uses WordNet and ontology knowledge as auxiliary mechanisms, and the resulting set of concepts serves as the formal representation of the extracted documents. Cluster scores are then determined for the resulting semantic-based clusters. Next, a semantic-based link analysis method is proposed for clustering Web documents into semantic clusters, which are scored based on semantic-based concepts and documents. Finally, the resulting document scores are used to evaluate semantic document similarity and document quality. Precision is used as the evaluation criterion, comparing the approach against a keyword-based search method. Experimental results show that the proposed method outperforms the TF/IDF method by up to 9% on average. © 2005 IEEE.
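The evaluation described in the abstract compares the semantic approach with a TF/IDF keyword baseline using precision. The sketch below illustrates only that baseline and metric, not the author's semantic link analysis method. The abstract does not specify the exact variants, so whitespace tokenization, raw-count term frequency, log(N/df) inverse document frequency, cosine-similarity ranking, and a precision-at-k cutoff are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical TF/IDF keyword baseline and precision metric (illustrative assumptions,
# not the paper's semantic method).


def tfidf_vectors(docs):
    """Build a TF-IDF weight dict per document (raw term count x log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed: whitespace tokenization
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered by descending cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(query.lower().split())
    # Query terms that never occur in the corpus get zero weight.
    q_vec = {t: f * idf.get(t, 0.0) for t, f in q_tf.items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    # Ties are broken by document index so the ordering is deterministic.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are in the relevant set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for i in top if i in relevant) / len(top)


if __name__ == "__main__":
    docs = [
        "semantic web ontology clustering",
        "wordnet ontology concepts",
        "football match results",
        "web search engine ranking",
    ]
    ranked = rank("ontology clustering", docs)
    print(ranked)                                # [0, 1, 2, 3]
    print(precision_at_k(ranked, {0, 1}, k=2))   # 1.0
```

In this toy corpus, the two documents that share query terms rank first, so precision at k=2 is 1.0 against the relevant set {0, 1}. A semantic method would aim to also retrieve documents that are related in meaning but share no literal keywords with the query, which a pure TF/IDF baseline cannot do.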
Cite
Arch-int, S. (2005). Web document clustering using semantic link analysis. In Proceedings - International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA 2005 and International Conference on Intelligent Agents, Web Technologies and Internet (Vol. 2, pp. 13–18). https://doi.org/10.1109/cimca.2005.1631438