A novel crawling algorithm for web pages

Abstract

The crawler is a core component of a search engine: it is responsible for discovering and downloading web pages. Since no search engine can cover the whole web, the crawler must focus on the most valuable pages. Several crawling algorithms, such as PageRank, OPIC, and FICA, have been proposed, but they have low throughput. To overcome this problem, we propose a new crawling algorithm called FICA+, which is easy to implement. In FICA+, the importance of a page is determined by its logarithmic distance from the seed and the weight of its incoming links. To evaluate FICA+, we use the web graph of the University of California, Berkeley. Experimental results show that our algorithm outperforms other crawling algorithms in discovering highly important pages. © 2011 Springer-Verlag Berlin Heidelberg.
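The abstract does not give the exact FICA+ priority formula, so the following is only a rough sketch of the general idea it describes: a crawl frontier ordered by a logarithmic distance that shrinks as incoming-link weight grows. The names `out_links` and `link_weight` are hypothetical placeholders for the crawler's link graph and link-weighting function, not part of the paper.

```python
import heapq
import math
from collections import defaultdict

def crawl(seed, out_links, link_weight, max_pages=1000):
    """Priority-driven crawl: pages reachable via heavier links and
    shorter logarithmic distances are downloaded first."""
    # dist[p]: best known logarithmic distance from the seed to page p
    dist = defaultdict(lambda: math.inf)
    dist[seed] = 0.0
    frontier = [(0.0, seed)]   # min-heap ordered by distance
    visited = set()

    while frontier and len(visited) < max_pages:
        d, page = heapq.heappop(frontier)
        if page in visited or d > dist[page]:
            continue           # stale heap entry
        visited.add(page)      # fetch/download the page here
        for nxt in out_links.get(page, ()):
            # Hypothetical cost: a heavier link yields a smaller
            # logarithmic distance, raising the target's priority.
            cost = math.log(1.0 + 1.0 / link_weight(page, nxt))
            if d + cost < dist[nxt]:
                dist[nxt] = d + cost
                heapq.heappush(frontier, (dist[nxt], nxt))
    return visited
```

This is essentially a Dijkstra-style traversal over the link graph; ordering the frontier by cumulative logarithmic distance is one plausible reading of how FICA+ prioritizes important pages early in the crawl.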

Citation (APA)

Golshani, M. A., Derhami, V., & Zarehbidoki, A. (2011). A novel crawling algorithm for web pages. In Lecture Notes in Computer Science (Vol. 7097, pp. 263–272). Springer. https://doi.org/10.1007/978-3-642-25631-8_24
