Robust Web Data Extraction Based on Unsupervised Visual Validation


Abstract

Visual validation is the process of validating sets of extracted entities by means of visual information. Its main advantage is that it brings visual information into web information extraction without compromising the robustness of extractors. In this paper, we show that unsupervised visual validation can be used to create robust web data extractors. More precisely, we evaluate the performance of visual validation on a corpus of visually heterogeneous documents. The extraction task is to extract the price, name, description, and SKU of unspecified products from unseen documents. Our corpus, which we make publicly available, contains 1000 products from 100 different sources. Results also show that visual validation improves web data extraction even when the extractor is trained with visual features.

Cite

Potvin, B., & Villemaire, R. (2019). Robust Web Data Extraction Based on Unsupervised Visual Validation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11431 LNAI, pp. 77–89). Springer Verlag. https://doi.org/10.1007/978-3-030-14799-0_7
