An unsupervised technique to extract information from semi-structured web pages

7Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose a technique that takes two or more web pages generated by the same server-side template and tries to learn a regular expression that represents it and helps extract relevant information from similar pages. Our experimental results on real-world web sites demonstrate that our technique outperforms others in terms of both effectiveness and efficiency and is not affected by HTML errors. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Sleiman, H. A., & Corchuelo, R. (2012). An unsupervised technique to extract information from semi-structured web pages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7651 LNCS, pp. 631–637). https://doi.org/10.1007/978-3-642-35063-4_46

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free