Incremental structured web database crawling via history versions

Abstract

Web database crawling is one of the major design choices for Deep Web data integration. To the best of our knowledge, existing works focus only on crawling all records in a web database at one time. Because web databases are highly dynamic, it is impractical to re-crawl the whole database just to harvest a small proportion of new records. To this end, this paper studies the problem of incremental web database crawling, which aims to crawl as many new records from a web database as possible while minimizing the communication costs. In our approach, based on a new graph model, an incremental crawling task is transformed into a graph traversal process. Over this graph, appropriate queries are generated for crawling by analyzing the history versions of the web database. Extensive experimental evaluations over real web databases validate the effectiveness of our techniques and provide insights for future efforts in this direction. © 2010 Springer-Verlag Berlin Heidelberg.
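The core idea, as the abstract describes it, is to compare historical snapshots of the database and derive queries that are likely to return new records. The following is a minimal, hypothetical sketch of that idea (the function names, record representation, and scoring heuristic are assumptions for illustration, not the paper's actual algorithm): values that co-occur with records present only in the newer snapshot are ranked as promising query keywords.

```python
from collections import defaultdict

# Hypothetical sketch: records are tuples of attribute values; two
# historical snapshots of the web database are compared, and values
# covering records that appear only in the newer snapshot are ranked
# as promising queries for the next incremental crawl.

def promising_queries(old_snapshot, new_snapshot):
    """Rank attribute values by how many records they cover that are
    new in the later snapshot -- a proxy for 'likely to harvest new
    records at low query cost'."""
    old_set = {frozenset(r) for r in old_snapshot}
    fresh = [r for r in new_snapshot if frozenset(r) not in old_set]
    score = defaultdict(int)
    for rec in fresh:
        for value in rec:
            score[value] += 1
    return sorted(score, key=score.get, reverse=True)

# Two toy snapshots of a product database at times t0 and t1.
snap_t0 = [("laptop", "dell"), ("phone", "nokia")]
snap_t1 = [("laptop", "dell"), ("phone", "nokia"),
           ("laptop", "asus"), ("tablet", "asus")]

print(promising_queries(snap_t0, snap_t1)[0])  # "asus" covers both new records
```

In the paper's actual approach these candidate queries would be selected via traversal of the proposed graph model; the history-based scoring above only illustrates why past snapshots carry a signal about where new records appear.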

APA

Liu, W., & Xiao, J. (2010). Incremental structured web database crawling via history versions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6488 LNCS, pp. 524–533). https://doi.org/10.1007/978-3-642-17616-6_46
