The crawler is a core component of a search engine: it is responsible for discovering and downloading web pages. Since no search engine can cover the whole web, a crawler must focus on the most valuable pages. Several crawling algorithms, such as PageRank, OPIC, and FICA, have been proposed, but they have low throughput. To overcome this problem, we propose a new, easy-to-implement crawling algorithm called FICA+. In FICA+, the importance of a page is determined by its logarithmic distance from the seed and the weights of its incoming links. To evaluate FICA+, we use the web graph of the University of California, Berkeley. Experimental results show that our algorithm outperforms other crawling algorithms in discovering highly important pages. © 2011 Springer-Verlag Berlin Heidelberg.
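The abstract names only the ingredients of FICA+ (logarithmic distance and incoming-link weights), not its actual equations. As a rough illustration of this style of crawler, the sketch below runs a priority-first crawl over a toy graph; the graph, the link weights, and the log-damped scoring rule are all invented for the example and are not the paper's formula.

```python
import heapq
import math

# Toy web graph: page -> list of (neighbor, link_weight).
# Hypothetical data; the abstract does not give FICA+'s equations.
GRAPH = {
    "seed": [("a", 1.0), ("b", 2.0)],
    "a": [("c", 1.0)],
    "b": [("c", 3.0), ("d", 1.0)],
    "c": [("d", 1.0)],
    "d": [],
}

def crawl(seed):
    """Priority-first crawl: pages reached over heavier links and a
    shorter logarithmic distance from the seed are fetched earlier."""
    best = {seed: 0.0}        # lowest priority value seen per page
    heap = [(0.0, seed)]      # (priority, page); lower = crawled sooner
    order = []
    visited = set()
    while heap:
        prio, page = heapq.heappop(heap)
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for nxt, weight in GRAPH.get(page, []):
            # Illustrative priority: parent priority plus a logarithmic
            # cost that shrinks as the incoming link's weight grows.
            cand = prio + math.log1p(1.0 / weight)
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return order

print(crawl("seed"))
```

With these made-up weights, page "b" (reached by the heaviest link from the seed) is downloaded before "a", mimicking how an importance-aware crawler prioritizes its frontier instead of crawling breadth-first.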
CITATION STYLE
Golshani, M. A., Derhami, V., & Zarehbidoki, A. (2011). A novel crawling algorithm for web pages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7097 LNCS, pp. 263–272). https://doi.org/10.1007/978-3-642-25631-8_24