We propose ClustVX, a novel approach to extracting structured web data. It clusters visually similar web page elements by exploiting their visual formatting and structural features. The clusters are then used to derive extraction rules. In an experimental evaluation on three publicly available benchmark data sets, ClustVX outperforms state-of-the-art structured data extraction systems. © 2012 Springer-Verlag.
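To make the clustering idea concrete, the following is a minimal sketch and not the ClustVX implementation. It assumes each page element has already been reduced to a tag path plus a few rendered style properties; the `Element` type, the chosen style fields, and the helper names `cluster_elements` and `repeated_clusters` are all hypothetical. Elements that share both structure and formatting land in the same cluster, and clusters with repeated members are treated as candidate data fields.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Element(NamedTuple):
    """Hypothetical, simplified view of a rendered page element."""
    tag_path: str      # structural feature, e.g. "html/body/div/span"
    font_size: str     # visual features (assumed to come from a renderer)
    color: str
    font_weight: str
    text: str


Key = Tuple[str, str, str, str]


def cluster_key(el: Element) -> Key:
    # Combine structural and visual features into one grouping key.
    return (el.tag_path, el.font_size, el.color, el.font_weight)


def cluster_elements(elements: List[Element]) -> Dict[Key, List[str]]:
    # Group element texts by identical structure and formatting.
    clusters: Dict[Key, List[str]] = defaultdict(list)
    for el in elements:
        clusters[cluster_key(el)].append(el.text)
    return dict(clusters)


def repeated_clusters(clusters: Dict[Key, List[str]],
                      min_size: int = 2) -> Dict[Key, List[str]]:
    # Keep only clusters with repeated members; in this sketch those are
    # the candidates from which extraction rules could later be derived.
    return {k: v for k, v in clusters.items() if len(v) >= min_size}
```

On a product listing, for example, all product names would typically share one key and all prices another, while a one-off page title would form a singleton cluster and be filtered out. Exact matching on the key is the simplest possible notion of similarity and is used here only for illustration.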
Grigalis, T., Radvilavičius, L., Čenys, A., & Gordevičius, J. (2012). Clustering visually similar web page elements for structured web data extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7387 LNCS, pp. 435–438). https://doi.org/10.1007/978-3-642-31753-8_38