Automatic identification of web query interfaces

Heidy M. Marin-Castro; Victor J. Sosa-Sosa; Ivan Lopez-Arevalo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011), Vol. 7095 LNAI (Part 2), pp. 297–306

DOI: 10.1007/978-3-642-25330-0_26

Abstract

The amount of information stored in databases on the Web has grown explosively in recent years. This information, known as the Deep Web, is obtained dynamically by issuing specific queries to these databases through Web Query Interfaces (WQIs). Finding and accessing databases on the Web is a great challenge because Web sites are highly dynamic and the information they contain is heterogeneous. Efficient mechanisms are therefore needed to access, extract and integrate the information held in Web databases. Since WQIs are the only means of accessing these databases, the automatic identification of WQIs plays an important role in helping traditional search engines increase their coverage and reach useful information that is not available on the indexable Web. In this paper we present a strategy for the automatic identification of WQIs that uses supervised learning, together with a suitable selection and extraction of HTML elements from the WQIs to form the training set. We present two experimental tests over a corpus of HTML forms that includes both positive and negative examples. Our proposed strategy achieves better accuracy than previous works reported in the literature. © 2011 Springer-Verlag.
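The abstract describes the general pipeline (extract HTML elements from each form, build a labelled training set, train a supervised classifier) but not the specific features or learner the authors used. The sketch below illustrates that pipeline only under stated assumptions: the feature set (counts of a few form controls), the toy training forms and the nearest-centroid classifier are all hypothetical choices made for illustration, not the paper's method. Only the Python standard library (`html.parser`, `collections`) is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature set chosen for illustration: counts of form controls.
FEATURES = ("text_input", "password_input", "hidden_input", "submit",
            "select", "textarea", "checkbox", "radio")

# Maps an <input type="..."> value to one of the features above.
INPUT_TYPES = {
    "text": "text_input", "search": "text_input",
    "password": "password_input", "hidden": "hidden_input",
    "submit": "submit", "image": "submit",
    "checkbox": "checkbox", "radio": "radio",
}


class FormFeatureExtractor(HTMLParser):
    """Counts selected HTML controls inside each <form> element."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one Counter per <form>, in document order
        self._current = None  # Counter for the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self._current is not None:  # previous <form> was never closed
                self.forms.append(self._current)
            self._current = Counter()
        elif self._current is not None:
            if tag == "input":
                # A missing or empty type attribute defaults to a text field.
                kind = (dict(attrs).get("type") or "text").lower()
                feature = INPUT_TYPES.get(kind)
                if feature:
                    self._current[feature] += 1
            elif tag in ("select", "textarea"):
                self._current[tag] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def close(self):
        super().close()
        if self._current is not None:  # keep an unterminated <form>
            self.forms.append(self._current)
            self._current = None


def extract(html):
    """Returns one feature vector per <form> found in `html`."""
    parser = FormFeatureExtractor()
    parser.feed(html)
    parser.close()
    return [[form[f] for f in FEATURES] for form in parser.forms]


def train_centroids(vectors, labels):
    """Nearest-centroid training: the mean feature vector of each label."""
    sums, counts = {}, Counter()
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Returns the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])))


# Toy, hand-made training forms (1 = query interface, 0 = other form).
TRAINING = [
    ('<form action="/search"><input type="text" name="q">'
     '<select name="cat"><option>Books</option></select>'
     '<input type="submit"></form>', 1),
    ('<form><input type="text" name="title"><input type="text" name="author">'
     '<select name="year"></select><input type="submit" value="Search"></form>', 1),
    ('<form><input type="text" name="user"><input type="password" name="pw">'
     '<input type="hidden" name="token"><input type="submit"></form>', 0),
    ('<form><input type="text" name="name"><input type="text" name="email">'
     '<input type="password" name="pw"><input type="password" name="pw2">'
     '<input type="checkbox" name="terms"><input type="submit"></form>', 0),
]

vectors = [extract(html)[0] for html, _ in TRAINING]
labels = [label for _, label in TRAINING]
centroids = train_centroids(vectors, labels)

query_form = ('<form><input type="search" name="q"><select name="s"></select>'
              '<input type="submit"></form>')
print(predict(centroids, extract(query_form)[0]))  # prints 1
```

Nearest centroid is used here only because it fits in a few stdlib-only lines; any supervised learner (and a richer, carefully selected set of HTML features, as the abstract emphasizes) could be substituted at the `train_centroids`/`predict` step without changing the extraction stage.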

Citation (APA)

Marin-Castro, H. M., Sosa-Sosa, V. J., & Lopez-Arevalo, I. (2011). Automatic identification of web query interfaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7095 LNAI, pp. 297–306). https://doi.org/10.1007/978-3-642-25330-0_26
