Learning robust web wrappers

B. Fazzinga; S. Flesca; A. Tagarelli

Conference Proceedings

Lecture Notes in Computer Science (2005) 3588 736-745

DOI: 10.1007/11546924_72

6 Citations

6 Readers

Abstract

A main challenge in wrapping web data is to make wrappers robust with respect to variations in HTML sources while reducing human effort as much as possible. In this paper we develop a new approach to speed up the specification of robust wrappers, freeing the wrapper designer from having to define extraction rules in detail. The key idea is to enable a schema-based wrapping system to automatically generalize an original wrapper with respect to a set of example HTML documents. To accomplish this objective, we propose to exploit the notions of extraction rule and wrapper subsumption to compute a most general wrapper that still shares the extraction schema with the original wrapper while maximizing the generalization of extraction rules with respect to the set of example documents. © Springer-Verlag Berlin Heidelberg 2005.

Cite

Fazzinga, B., Flesca, S., & Tagarelli, A. (2005). Learning robust web wrappers. In Lecture Notes in Computer Science (Vol. 3588, pp. 736–745). Springer Verlag. https://doi.org/10.1007/11546924_72
