Approximately repetitive structure detection for wrapper induction


Abstract

In recent years, much work has been invested in automatically learning wrappers for information extraction from HTML tables and lists. Our research has focused on a system that can learn a wrapper from a single unlabeled page. An essential step is to locate the tabular data within the page. This is not trivial when the structures of the data tuples are similar but not identical. In this paper we describe an algorithm that can automatically detect approximately repetitive structures within a single sequence. The algorithm does not rely on any domain knowledge or HTML heuristics, and it can be used to detect repetitive patterns and hence to learn wrappers from a single unlabeled tabular page. © Springer-Verlag Berlin Heidelberg 2004.
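
The abstract does not spell out the algorithm, so the following Python sketch is not the authors' method. It only illustrates the problem the abstract describes: finding adjacent segments of one token sequence that are similar but not identical. It assumes the page is reduced to a flat sequence of tag tokens and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The names `tokenize`, `similarity`, `find_approximate_repeats`, and the parameters `slack` and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Collect start and end tags as a flat token sequence, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def tokenize(html):
    """Turn an HTML string into a list of tag tokens."""
    parser = TagTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def similarity(a, b):
    """Similarity in [0, 1] between two token lists (1.0 means identical)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_approximate_repeats(tokens, min_len=3, max_len=12, slack=2, threshold=0.8):
    """Brute-force search for adjacent segments that approximately repeat.

    Returns (start, len1, len2, score) tuples where tokens[start:start+len1]
    and the len2 tokens that follow it have similarity >= threshold.
    Segment lengths may differ by at most `slack` tokens (an assumption of
    this sketch, not something taken from the paper).
    """
    hits = []
    n = len(tokens)
    for start in range(n):
        for len1 in range(min_len, max_len + 1):
            lo = max(min_len, len1 - slack)
            hi = min(max_len, len1 + slack)
            for len2 in range(lo, hi + 1):
                end = start + len1 + len2
                if end > n:
                    continue
                first = tokens[start:start + len1]
                second = tokens[start + len1:end]
                score = similarity(first, second)
                if score >= threshold:
                    hits.append((start, len1, len2, score))
    return hits


# Made-up example page: the middle row lacks the <b> element of its neighbours.
PAGE = (
    "<table>"
    "<tr><td>A</td><td><b>1</b></td></tr>"
    "<tr><td>B</td><td>2</td></tr>"
    "<tr><td>C</td><td><b>3</b></td></tr>"
    "</table>"
)

tokens = tokenize(PAGE)
hits = find_approximate_repeats(tokens)
```

Here the first row yields eight tag tokens and the second only six, so an exact-repeat search would not pair them, while the similarity threshold does. The search is cubic in the worst case and is meant only to make the notion of an approximate repeat concrete.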

Citation

Gao, X., Andreae, P., & Collins, R. (2004). Approximately repetitive structure detection for wrapper induction. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 585–594). Springer Verlag. https://doi.org/10.1007/978-3-540-28633-2_62
