NET - A system for extracting web data from flat and nested data records

51Citations
Citations of this article
33Readers
Mendeley users who have this article in their library.
Get full text

Abstract

This paper studies automatic extraction of structured data from Web pages. Each of such pages may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree and matches subtrees in the process using a tree edit distance method and visual cues. After the process ends, data records are found and data items in them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately. © Springer-Verlag Berlin Heidelberg 2005.

Cite

CITATION STYLE

APA

Liu, B., & Zhai, Y. (2005). NET - A system for extracting web data from flat and nested data records. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3806 LNCS, pp. 487–495). Springer Verlag. https://doi.org/10.1007/11581062_39

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free