Current challenges in web crawling

Denis Shestakov

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7977 LNCS 518-521

DOI: 10.1007/978-3-642-39200-9_49

Abstract

Web crawling, the process of collecting web pages in an automated manner, is the primary and ubiquitous operation used by a large number of web systems and agents, ranging from a simple program for website backup to a major web search engine. Due to the astronomical amount of data already published on the Web and the ongoing exponential growth of web content, any party that wants to take advantage of massive-scale web data faces a high barrier to entry. In this tutorial, we will introduce the audience to five topics: the architecture and implementation of a high-performance web crawler, collaborative web crawling, crawling the deep Web, crawling multimedia content, and future directions in web crawling research. © 2013 Springer-Verlag Berlin Heidelberg.

Cite

Shestakov, D. (2013). Current challenges in web crawling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7977 LNCS, pp. 518–521). https://doi.org/10.1007/978-3-642-39200-9_49
