This paper presents Nautilus, a generic framework for crawling the deep Web. We provide an abstraction of the deep Web crawling process and a mechanism for integrating heterogeneous business modules. A Federal Decentralized Architecture is proposed to combine the advantages of existing P2P networking architectures. We also present effective policies for scheduling crawling tasks. Experimental results show that our scheduling policies achieve good load balance and overall throughput.
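The abstract does not detail the paper's scheduling policies. As a generic illustration of load-balanced task scheduling only (the greedy least-loaded heuristic and all names below are our own, not Nautilus's actual policy), one common sketch assigns each crawl task to the currently least-loaded node:

```python
import heapq

def assign_tasks(tasks, num_nodes):
    """Greedy least-loaded assignment (illustrative, not the paper's policy).

    tasks: list of (task_id, estimated_cost) pairs.
    Returns a dict mapping node id -> list of assigned task ids.
    """
    # Min-heap of (current_load, node_id); the root is the least-loaded node.
    heap = [(0.0, n) for n in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {n: [] for n in range(num_nodes)}
    for task_id, cost in tasks:
        load, node = heapq.heappop(heap)   # pick least-loaded node
        assignment[node].append(task_id)
        heapq.heappush(heap, (load + cost, node))  # update its load
    return assignment

# Example: four equal-cost tasks spread evenly over two nodes.
result = assign_tasks([("a", 1), ("b", 1), ("c", 1), ("d", 1)], 2)
```

Real deep-Web crawl schedulers must also weigh per-site politeness limits and heterogeneous task costs, which this toy heuristic ignores.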
CITATION STYLE
Zhao, J., & Wang, P. (2012). Nautilus: A generic framework for crawling deep web. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7696, pp. 141–151). Springer-Verlag. https://doi.org/10.1007/978-3-642-34679-8_14