Learning robust web wrappers

Abstract

A main challenge in wrapping web data is to make wrappers robust with respect to variations in HTML sources while reducing human effort as much as possible. In this paper we develop a new approach to speed up the specification of robust wrappers, freeing the wrapper designer from defining extraction rules in detail. The key idea is to enable a schema-based wrapping system to automatically generalize an original wrapper with respect to a set of example HTML documents. To accomplish this objective, we propose to exploit the notions of extraction rule and wrapper subsumption to compute a most general wrapper that still shares the extraction schema with the original wrapper while maximizing the generalization of extraction rules with respect to the set of example documents. © Springer-Verlag Berlin Heidelberg 2005.
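The generalization idea in the abstract can be illustrated with a small toy sketch. It is not the paper's algorithm or its formal definition of subsumption: the rule representation (a list of tag steps with a `*` wildcard), the function names `extract`, `subsumes`, and `generalize`, and the sample documents are all assumptions made for illustration. The sketch greedily replaces steps of a rule with wildcards as long as the rule still extracts the same values from every example document.

```python
import xml.etree.ElementTree as ET

WILDCARD = "*"  # hypothetical "match any tag" step

def extract(rule, root):
    """Follow the tag steps in `rule` downward from `root`; return the text of the nodes reached."""
    nodes = [root]
    for step in rule:
        nodes = [child for node in nodes for child in node
                 if step == WILDCARD or child.tag == step]
    return [node.text for node in nodes]

def subsumes(general, specific):
    """Toy subsumption: `general` matches every path that `specific` matches."""
    return len(general) == len(specific) and all(
        g == WILDCARD or g == s for g, s in zip(general, specific))

def generalize(rule, examples):
    """Greedily widen steps to wildcards while the extraction on all examples is unchanged."""
    expected = [extract(rule, doc) for doc in examples]
    current = list(rule)
    for i in range(len(current)):
        candidate = current[:i] + [WILDCARD] + current[i + 1:]
        if [extract(candidate, doc) for doc in examples] == expected:
            current = candidate
    return current

# Hypothetical example documents (well-formed so ElementTree can parse them).
doc1 = ET.fromstring(
    "<html><body><div><span>Price: 10</span></div><p><b>ad</b></p></body></html>")
doc2 = ET.fromstring(
    "<html><body><div><span>Price: 12</span></div><p><b>ad</b></p></body></html>")

rule = ["body", "div", "span"]            # original, overly specific rule
general = generalize(rule, [doc1, doc2])  # widened rule

# A layout variation: the <div> became a <section>.
variant = ET.fromstring(
    "<html><body><section><span>Price: 15</span></section><p><b>ad</b></p></body></html>")
```

In this sketch the last step stays fixed because widening it would also pull in the advertisement text, so the generalized rule still extracts only the price. On the variant document the original rule finds nothing, while the generalized rule still reaches the price element, which is the kind of robustness to HTML variation the abstract describes.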

Cite

Fazzinga, B., Flesca, S., & Tagarelli, A. (2005). Learning robust web wrappers. In Lecture Notes in Computer Science (Vol. 3588, pp. 736–745). Springer Verlag. https://doi.org/10.1007/11546924_72
