Factors affecting Web page similarity

Abstract

Tools that allow effective information organisation, access and navigation are becoming increasingly important on the Web. Similarity between web pages is a concept that is central to such tools. In this paper, we examine the effect that content- and layout-related aspects of web pages have on web page similarity. We consider the textual content contained within common HTML tags, the structural layout of pages, and the query terms contained within pages. Our study shows that combinations of factors can yield more promising results than individual factors, and that different aspects of web pages affect similarities between pages in different ways. We found a number of factors that, when taken into account, can result in effective measures of similarity between web pages. Query information, in particular, proved to be important for the effective organisation of web pages. © Springer-Verlag Berlin Heidelberg 2005.

Cite

Tombros, A., & Ali, Z. (2005). Factors affecting Web page similarity. In Lecture Notes in Computer Science (Vol. 3408, pp. 487–501). Springer Verlag. https://doi.org/10.1007/978-3-540-31865-1_35
