Automatic identification of web query interfaces

0Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The amount of information contained in databases in the Web has grown explosively in the last years. This information, known as the Deep Web, is dynamically obtained from specific queries to these databases through Web Query Interfaces (WQIs). The problem of finding and accessing databases in the Web is a great challenge due to the Web sites are very dynamic and the information existing is heterogeneous. Therefore, it is necessary to create efficient mechanisms to access, extract and integrate information contained in databases in the Web. Since WQIs are the only means to access databases in the Web, the automatic identification of WQIs plays an important role facilitating traditional search engines to increase the coverage and access interesting information not available on the indexable Web. In this paper we present a strategy for automatic identification of WQIs using supervised learning and making an adequate selection and extraction of HTML elements in the WQIs to form the training set. We present two experimental tests over a corpora of HTML forms considering positive and negative examples. Our proposed strategy achieves better accuracy than previous works reported in the literature. © 2011 Springer-Verlag.

Cite

CITATION STYLE

APA

Marin-Castro, H. M., Sosa-Sosa, V. J., & Lopez-Arevalo, I. (2011). Automatic identification of web query interfaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7095 LNAI, pp. 297–306). https://doi.org/10.1007/978-3-642-25330-0_26

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free