Approximately repetitive structure detection for wrapper induction

Xiaoying Gao; Peter Andreae; Richard Collins

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2004) 3157 585-594

DOI: 10.1007/978-3-540-28633-2_62

5 Citations

2 Readers

Abstract

In recent years, much work has been invested in automatically learning wrappers for information extraction from HTML tables and lists. Our research has focused on a system that can learn a wrapper from a single unlabeled page. An essential step is to locate the tabular data within the page, which is not trivial when the data tuples have similar but not identical structures. In this paper we describe an algorithm that automatically detects approximately repetitive structures within a single sequence. The algorithm does not rely on any domain knowledge or HTML heuristics, and it can be used to detect repetitive patterns and hence to learn wrappers from a single unlabeled tabular page. © Springer-Verlag Berlin Heidelberg 2004.
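The abstract does not spell out the algorithm, so the following is not the authors' method. It is only a minimal sketch of what "approximately repetitive structure within one sequence" can mean for a tokenized page, under several assumptions: the page has already been reduced to a list of tag tokens, a candidate anchor token such as `<tr>` is supplied by the caller, and similarity is measured with Python's standard `difflib.SequenceMatcher`. The names `split_at` and `repetition_score` are hypothetical.

```python
from difflib import SequenceMatcher


def split_at(tokens, anchor):
    """Cut tokens into segments, each starting at an occurrence of anchor.

    Tokens that appear before the first anchor are dropped.
    """
    starts = [i for i, tok in enumerate(tokens) if tok == anchor]
    ends = starts[1:] + [len(tokens)]
    return [tokens[s:e] for s, e in zip(starts, ends)]


def repetition_score(tokens, anchor):
    """Mean similarity (0..1) of consecutive segments; 0.0 if fewer than two."""
    segments = split_at(tokens, anchor)
    pairs = list(zip(segments, segments[1:]))
    if not pairs:
        return 0.0
    ratios = [
        SequenceMatcher(None, a, b, autojunk=False).ratio() for a, b in pairs
    ]
    return sum(ratios) / len(ratios)


# Illustrative data (made up): three table rows, the middle one with extra
# bold tags, so the rows are similar but not identical.
row = ["<tr>", "<td>", "#text", "</td>", "<td>", "#text", "</td>", "</tr>"]
bold_row = ["<tr>", "<td>", "<b>", "#text", "</b>", "</td>",
            "<td>", "#text", "</td>", "</tr>"]
tokens = row + bold_row + row

print(repetition_score(tokens, "<tr>"))
```

A score close to 1 suggests that the segments between anchors are near-copies of one another, which is the kind of signal a wrapper learner could use to locate tabular data. Unlike the algorithm described in the abstract, this sketch needs the anchor to be given in advance.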

Citation (APA)

Gao, X., Andreae, P., & Collins, R. (2004). Approximately repetitive structure detection for wrapper induction. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 585–594). Springer Verlag. https://doi.org/10.1007/978-3-540-28633-2_62
