Automatic extraction of logical web lists

Pasqua Fabiana Lanotte; Fabio Fumarola; Michelangelo Ceci; Andrea Scarpino; Michele Damiano Torelli; Donato Malerba

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, Vol. 8502 LNAI, pp. 365–374

DOI: 10.1007/978-3-319-08326-1_37

Abstract

Recently, there has been increased interest in the extraction of structured data from the Web (both the "Surface" Web and the "Hidden" Web). In this paper we focus in particular on the automatic extraction of Web lists. Although this task has been studied extensively, existing approaches assume that a list is wholly contained in a single Web page. They do not consider that many websites spread their listings over several Web pages and show only a partial view on each of them. Similarly to databases, where a view can represent a subset of the data contained in a table, these websites split a logical list into multiple views (view lists). The automatic extraction of logical lists is an open problem. To tackle this issue, we propose an unsupervised and domain-independent algorithm for logical list extraction. Experimental results on real-life, data-intensive websites confirm the effectiveness of our approach. © 2014 Springer International Publishing.
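The abstract describes a logical list that a website splits across several pages into view lists, much as a database table can be exposed through views. The authors' extraction algorithm is not described on this page, so the following is only a hypothetical Python sketch of that data model, not their method. It assumes the records of each view list have already been extracted page by page, that the pages are visited in order, and that two equal records are the same item. The function name `merge_view_lists` and the sample data are illustrative.

```python
from typing import Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def merge_view_lists(view_lists: Iterable[Sequence[T]]) -> List[T]:
    """Rebuild a logical list as the ordered union of its view lists.

    Hypothetical illustration: each view list holds the records shown on one
    page, views arrive in page order, and records are matched by equality.
    """
    seen = set()
    logical: List[T] = []
    for view in view_lists:
        for record in view:
            # Overlapping pages may repeat a record; keep only its first occurrence.
            if record not in seen:
                seen.add(record)
                logical.append(record)
    return logical


# Three paginated views of one hypothetical product listing.
pages = [
    ["Camera A", "Camera B", "Camera C"],
    ["Camera C", "Camera D"],
    ["Camera E"],
]
print(merge_view_lists(pages))
# → ['Camera A', 'Camera B', 'Camera C', 'Camera D', 'Camera E']
```

Matching records by plain equality is a simplifying assumption; on real sites the same item can be rendered differently on different pages, so a record-matching step would be needed before merging.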

Citation (APA)

Lanotte, P. F., Fumarola, F., Ceci, M., Scarpino, A., Torelli, M. D., & Malerba, D. (2014). Automatic extraction of logical web lists. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8502 LNAI, pp. 365–374). Springer Verlag. https://doi.org/10.1007/978-3-319-08326-1_37