Web-site boundary detection

Ayesh Alshukri; Frans Coenen; Michele Zito

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6171 LNAI 529-543

DOI: 10.1007/978-3-642-14400-4_41

5 Citations

5 Readers

Abstract

Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to boundary detection is suggested. The principal issue is feature selection, hampered by the observation that there is no clear understanding of what a web-site is. This paper proposes a definition of a web-site, founded on the principle of user intention, directed at the boundary detection problem; and then reports on a sequence of experiments, using a number of clustering techniques, and a wide range of features and combinations of features to identify web-site boundaries. The preliminary results reported seem to indicate that, in general, a combination of features produces the most appropriate result. © 2010 Springer-Verlag Berlin Heidelberg.

Citation (APA)

Alshukri, A., Coenen, F., & Zito, M. (2010). Web-site boundary detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6171 LNAI, pp. 529–543). https://doi.org/10.1007/978-3-642-14400-4_41
