Revisiting lexical signatures to (re-)discover web pages

Abstract

A lexical signature (LS) is a small set of terms derived from a document that captures the "aboutness" of that document. An LS generated from a web page can be used to discover that page at a different URL as well as to find relevant pages on the Internet. From a set of randomly selected URLs we took all their copies from the Internet Archive between 1996 and 2007 and generated their LSs. We conducted an overlap analysis of the terms in all LSs and found only small overlaps in the early years (1996–2000) but larger overlaps in the more recent past (from 2003 on). We measured the performance of all LSs as a function of the number of terms they contain. We found that LSs created more recently perform better than early LSs created between 1996 and 2000. All LSs created from the year 2000 on show a similar pattern in their performance curve. Our results show that 5-, 6- and 7-term LSs perform best at returning the URLs of interest in the top ten of the result set. In about 50% of all cases these URLs are returned as the number one result, and in 30% of all cases we considered the URLs not discovered. © 2008 Springer-Verlag Berlin Heidelberg.
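
To make the idea of an LS concrete, the following is a minimal sketch of deriving a k-term signature from a page's text. The abstract does not say how terms are chosen; ranking terms by TF-IDF against a small reference corpus is an assumption made here for illustration, and the names `tokenize` and `lexical_signature` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(doc, corpus, k=5):
    """Return the k terms of `doc` with the highest TF-IDF score.

    Assumption: TF-IDF ranking stands in for whatever term selection
    the paper actually uses; `corpus` is a list of reference documents.
    """
    tf = Counter(tokenize(doc))

    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))

    n_docs = len(corpus)
    # Smoothed IDF so terms absent from the corpus do not divide by zero.
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

The resulting terms could then be submitted to a search engine as a query, with `k` set to 5, 6, or 7 to match the signature sizes the abstract reports as performing best.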

Cite

Klein, M., & Nelson, M. L. (2008). Revisiting lexical signatures to (re-)discover web pages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5173 LNCS, pp. 371–382). https://doi.org/10.1007/978-3-540-87599-4_38
