WebSelF: A web scraping framework

6Citations
Citations of this article
37Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We present WebSelF, a framework for web scraping which models the process of web scraping and decomposes it into four conceptually independent, reusable, and composable constituents. We have validated our framework through a full parameterized implementation that is flexible enough to capture previous work on web scraping. We conducted an experiment that evaluated several qualitatively different web scraping constituents (including previous work and combinations hereof) on about 11,000 HTML pages on daily versions of 17 web sites over a period of more than one year. Our framework solves three concrete problems with current web scraping and our experimental results indicate that composition of previous and our new techniques achieve a higher degree of accuracy, precision and specificity than existing techniques alone. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Thomsen, J. G., Ernst, E., Brabrand, C., & Schwartzbach, M. (2012). WebSelF: A web scraping framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7387 LNCS, pp. 347–361). https://doi.org/10.1007/978-3-642-31753-8_28

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free