ChainMR Crawler: A distributed vertical crawler based on MapReduce

Abstract

With the explosive growth of data on the Internet, a single-machine vertical crawler can no longer meet the performance requirements placed on crawlers. Existing distributed vertical crawlers, in turn, offer only weak customization. To address both problems, this paper proposes a distributed vertical crawler named ChainMR Crawler. We adopt the ChainMapper/ChainReducer model to design each module of the crawler, use Redis to manage URLs, and choose the distributed database HBase to store the key content of web pages. Experimental results demonstrate that ChainMR Crawler is 6% more efficient than Nutch at vertical crawling, which achieves the expected effect.
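The chained-stage design described in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' implementation and it does not use the Hadoop, Redis, or HBase APIs: `chain`, `UrlFrontier`, `fetch`, `parse`, `crawl`, and the `FAKE_WEB` pages are all hypothetical names introduced here. The sketch assumes that each crawler module is a map stage whose output feeds the next stage, that the URL manager is a queue plus a "seen" set, and that the content store is a simple key-value table.

```python
from collections import deque

# Hypothetical two-page "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/a": "title=Item A;link=http://example.com/b",
    "http://example.com/b": "title=Item B;link=http://example.com/a",
}


def chain(stages):
    """Compose map stages so that each stage's output records feed the next.

    This mimics the idea of chaining mappers; it is not Hadoop's ChainMapper.
    """
    def run(key, value):
        records = [(key, value)]
        for stage in stages:
            records = [out for k, v in records for out in stage(k, v)]
        return records
    return run


class UrlFrontier:
    """In-memory stand-in (assumption) for a Redis-backed URL queue and seen set."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def push(self, url):
        # Reject URLs that were already scheduled, so cycles terminate.
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        return self._queue.popleft() if self._queue else None


def fetch(url, _):
    """Stage 1: download a page (here: look it up in FAKE_WEB)."""
    page = FAKE_WEB.get(url)
    return [(url, page)] if page is not None else []


def parse(url, page):
    """Stage 2: split the toy page format into named fields."""
    fields = dict(part.split("=", 1) for part in page.split(";"))
    return [(url, fields)]


def crawl(seed):
    """Drive the pipeline until the frontier is empty; return the stored titles."""
    frontier = UrlFrontier()
    frontier.push(seed)
    store = {}  # stand-in (assumption) for a table in a distributed database
    pipeline = chain([fetch, parse])
    while (url := frontier.pop()) is not None:
        for key, fields in pipeline(url, None):
            store[key] = fields["title"]   # keep only the key content
            frontier.push(fields["link"])  # schedule the outgoing link
    return store
```

Customizing the crawler for a new vertical would, under these assumptions, amount to passing a different list of stages to `chain`, for example inserting a filtering stage between `fetch` and `parse`, while the frontier and the store stay unchanged.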

Cite

Liu, X., & Jin, Z. (2016). ChainMR crawler: A distributed vertical crawler based on mapreduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10067 LNCS, pp. 33–39). Springer Verlag. https://doi.org/10.1007/978-3-319-49145-5_4
