Crawling web pages with support for client-side dynamism

Abstract

A large amount of information on the web cannot be accessed by conventional crawler engines. This portion of the web is usually known as the Hidden Web. Dealing with this problem requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing the information placed in web pages with support for client-side dynamism, dealing with aspects such as JavaScript technology, non-standard session maintenance mechanisms, client redirections, pop-up menus, etc. Our approach leverages current browser APIs and implements novel crawling models and algorithms. © Springer-Verlag Berlin Heidelberg 2006.

Cite

Álvarez, M., Pan, A., Raposo, J., & Hidalgo, J. (2006). Crawling web pages with support for client-side dynamism. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4016 LNCS, pp. 252–262). Springer Verlag. https://doi.org/10.1007/11775300_22
