Visual segmentation-based data record extraction from web documents

Abstract

Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method to extract data records from such Web pages. The VSDR method first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; it then extracts neighboring and non-neighboring data records using a compress-and-collapse technique. Experimental results show that, unlike existing methods, which generate good results only on their test domains, VSDR is a general data record extraction method that produces stable, good results on a wide range of Web pages. © 2007 IEEE.
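
To make the two visual cues named in the abstract concrete, the following is a minimal sketch of how rendered blocks might be grouped into candidate data records by spatial closeness and visual resemblance. It is not the VSDR algorithm and does not implement the compress-and-collapse step, which the abstract does not detail. The `Block` type, the helper names, and the thresholds `tol` and `max_gap` are all hypothetical, and only the neighboring-record case is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical rendered block: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float


def resembles(a: Block, b: Block, tol: float = 0.1) -> bool:
    """Assumed resemblance test: same left edge and similar width."""
    scale = max(a.width, b.width, 1.0)
    return abs(a.x - b.x) <= tol * scale and abs(a.width - b.width) <= tol * scale


def is_close(a: Block, b: Block, max_gap: float = 20.0) -> bool:
    """Assumed closeness test: b starts within max_gap pixels below a."""
    gap = b.y - (a.y + a.height)
    return 0 <= gap <= max_gap


def group_records(blocks, tol: float = 0.1, max_gap: float = 20.0):
    """Group vertically adjacent, similar-looking blocks into candidate records."""
    ordered = sorted(blocks, key=lambda b: (b.y, b.x))
    groups = []
    for block in ordered:
        last = groups[-1][-1] if groups else None
        if last and resembles(last, block, tol) and is_close(last, block, max_gap):
            groups[-1].append(block)
        else:
            groups.append([block])
    # A run of at least two similar blocks is treated as a record region.
    return [g for g in groups if len(g) >= 2]


page = [
    Block(0, 0, 800, 50),      # header
    Block(100, 60, 600, 80),   # record-like block
    Block(100, 150, 600, 80),  # record-like block
    Block(100, 240, 600, 80),  # record-like block
    Block(0, 400, 800, 40),    # footer
]
records = group_records(page)
```

In this toy layout the three aligned blocks end up in a single group, while the header and footer are left out because they differ in position and width. Handling non-neighboring records would need an additional step that this sketch does not attempt.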

Cite

Li, L., Liu, Y., Obregon, A., & Weatherston, M. (2007). Visual segmentation-based data record extraction from web documents. In 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 (pp. 502–507). https://doi.org/10.1109/IRI.2007.4296670