Abstract
Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method for extracting data records from such pages. VSDR first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; it then extracts neighboring and non-neighboring data records using a compress-and-collapse technique. Experimental results show that, unlike existing methods, which generate good results only on their own test domains, VSDR is a general data record extraction method that produces stable, good results on a wide range of Web pages. © 2007 IEEE.
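To make the segmentation idea concrete, the following is a minimal sketch of grouping rendered blocks by spatial closeness and visual resemblance. It is an illustration only, not the VSDR algorithm itself: the `Block` type, the `visually_similar` test, and the `tol` and `max_gap` thresholds are all hypothetical, and the sketch does not cover non-neighboring records or the compress-and-collapse step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical rendered block: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float


def visually_similar(a, b, tol=0.2):
    """Assumed resemblance test: similar left edge, width, and height."""
    def close(p, q):
        # Relative tolerance, with a floor of 1.0 to avoid a zero threshold.
        return abs(p - q) <= tol * max(p, q, 1.0)

    return close(a.x, b.x) and close(a.width, b.width) and close(a.height, b.height)


def group_blocks(blocks, max_gap=30.0):
    """Group vertically adjacent blocks that are close together and look alike."""
    groups = []
    for block in sorted(blocks, key=lambda b: b.y):
        if groups:
            prev = groups[-1][-1]
            # Vertical distance from the bottom of the previous block.
            gap = block.y - (prev.y + prev.height)
            if gap <= max_gap and visually_similar(prev, block):
                groups[-1].append(block)
                continue
        groups.append([block])
    return groups


if __name__ == "__main__":
    page = [
        Block(10, 0, 600, 80),
        Block(10, 90, 600, 85),
        Block(10, 185, 600, 78),
        Block(10, 400, 200, 20),  # e.g. a footer, far away and differently sized
    ]
    for group in group_blocks(page):
        print(len(group), "block(s) starting at y =", group[0].y)
```

In this sketch, a group containing several similar blocks would be a candidate region of neighboring data records, while isolated blocks would be treated as non-record content.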
Cite
Li, L., Liu, Y., Obregon, A., & Weatherston, M. (2007). Visual segmentation-based data record extraction from web documents. In 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 (pp. 502–507). https://doi.org/10.1109/IRI.2007.4296670