Extracting data records from query result pages based on visual features

Abstract

Web databases contain a large amount of structured data that is accessible only through their query interfaces. Query results are presented in dynamically generated web pages, usually in the form of data records, for human use. The problem of automatically extracting data records from query result pages is critical for web data integration applications, such as comparison shopping sites and meta-search engines. A number of approaches to query result extraction have been proposed, but as the structures of web pages become more complex, these approaches start to fail. Query result pages usually also contain other types of information in addition to query results, such as advertisements and navigation bars. Most of the existing approaches do not remove such irrelevant content, which may affect the accuracy of data record extraction. We have observed that query results are usually displayed in regular visual patterns and that terms used in a query often reappear in query results. We propose a novel approach that makes use of visual features and query terms to identify the data section and extract data records from it. We also use several content and visual features of visual blocks in a data section to filter out noisy blocks. The results of our experiments on a large set of query result pages in different domains show that our proposed approach is highly effective. © 2011 Springer-Verlag.
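
The abstract does not give the algorithm itself, so the following is only a minimal illustrative sketch of the two observations it mentions (regular visual patterns and reappearing query terms), not the authors' method. The `Block` model, the scoring functions, and the equal weighting of the two scores are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered visual block (hypothetical model): position, size, visible text."""
    x: int
    y: int
    width: int
    height: int
    text: str


def query_term_ratio(blocks, query_terms):
    """Fraction of blocks whose text contains at least one query term."""
    if not blocks:
        return 0.0
    terms = [t.lower() for t in query_terms]
    hits = sum(any(t in b.text.lower() for t in terms) for b in blocks)
    return hits / len(blocks)


def dominant_layout(blocks):
    """Most common (left edge, width) pair; a crude stand-in for a 'visual pattern'."""
    return Counter((b.x, b.width) for b in blocks).most_common(1)[0][0]


def regularity(blocks):
    """Fraction of blocks that share the dominant (left edge, width) pair."""
    if not blocks:
        return 0.0
    layout = dominant_layout(blocks)
    return sum((b.x, b.width) == layout for b in blocks) / len(blocks)


def pick_data_section(sections, query_terms):
    """Return the name of the candidate section with the highest combined score.

    Assumption: both scores are weighted equally; the paper may combine
    its features quite differently.
    """
    return max(
        sections,
        key=lambda name: query_term_ratio(sections[name], query_terms)
        + regularity(sections[name]),
    )


def filter_noisy_blocks(blocks):
    """Keep non-empty blocks that match the dominant layout; drop the rest as noise."""
    layout = dominant_layout(blocks)
    return [b for b in blocks if (b.x, b.width) == layout and b.text.strip()]
```

In this sketch, `sections` is assumed to be a dictionary mapping a section name to its list of `Block` objects, as might be produced by some page segmentation step that is outside the scope of the example. A navigation bar would tend to score low on both measures, while a list of similarly shaped result entries containing the query terms would score high.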

Cite

Weng, D., Hong, J., & Bell, D. A. (2011). Extracting data records from query result pages based on visual features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7051 LNCS, pp. 140–153). https://doi.org/10.1007/978-3-642-24577-0_16
