A quantitative comparison of semantic web page segmentation approaches

Abstract

We compare three known semantic web page segmentation algorithms, each representing a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of the approaches. We evaluate the four algorithms on a large benchmark of modern websites that we constructed, examining each algorithm in a total of eight configurations. We find that all algorithms perform better on average on random pages than on popular pages, and that results are better when the algorithms are run on the HTML obtained from the DOM rather than on the plain HTML. Overall there is much room for improvement: the best average F-score we find is 0.49, indicating that currently available algorithms are not yet of practical use for modern websites.
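
For readers unfamiliar with the metric, the F-score quoted above is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for one page. It assumes that a predicted block counts as correct only when it exactly matches a ground-truth block; the abstract does not state the matching criterion the authors used. The function names and block labels are hypothetical and chosen for illustration only.

```python
from typing import Set, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_page(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Score one page's segmentation.

    Assumption: blocks are compared by exact identifier match, which is
    a simplification and not necessarily the paper's criterion.
    """
    matched = len(predicted & truth)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(truth) if truth else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical block labels for a single page.
predicted_blocks = {"header", "nav", "article", "ads"}
truth_blocks = {"header", "nav", "article", "footer", "sidebar"}

precision, recall, f1 = evaluate_page(predicted_blocks, truth_blocks)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

In this made-up example three of four predicted blocks are correct (precision 0.75) and three of five true blocks are found (recall 0.60), giving an F-score of about 0.67. A benchmark average would then be the mean of such per-page scores.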

Cite

Kreuzer, R., Hage, J., & Feelders, A. (2015). A quantitative comparison of semantic web page segmentation approaches. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9114, pp. 374–391). Springer Verlag. https://doi.org/10.1007/978-3-319-19890-3_24
