Defining the boundaries of a web-site, for example for archiving or information retrieval purposes, is an important but complicated task. This paper suggests a web-page clustering approach to boundary detection. The principal issue is feature selection, which is hampered by the lack of a clear understanding of what a web-site is. The paper proposes a definition of a web-site, founded on the principle of user intention and directed at the boundary detection problem. It then reports on a sequence of experiments that use a number of clustering techniques, together with a wide range of features and combinations of features, to identify web-site boundaries. The preliminary results indicate that, in general, a combination of features produces the most appropriate result. © 2010 Springer-Verlag Berlin Heidelberg.
Alshukri, A., Coenen, F., & Zito, M. (2010). Web-site boundary detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6171 LNAI, pp. 529–543). https://doi.org/10.1007/978-3-642-14400-4_41