Classification of web pages is usually done by extracting the textual content of the page and/or by extracting structural features from the HTML. In this work, we present a different approach that uses the visual appearance of web pages for their classification. We extract generic, low-level visual features directly from the page as it is rendered by a web browser. The visual features used in this work are simple color and edge histograms, Gabor features, and texture features, all extracted with an off-the-shelf visual feature extraction method. In three experiments, we classify web pages by their aesthetic value, their recency, and the type of website. Results show that these simple, global visual features already produce good classification results. We also introduce an online tool that uses the trained classifiers to assess new web pages. © 2011 Springer-Verlag Berlin Heidelberg.
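To make the idea of global color and edge histograms concrete, the following is a minimal sketch in Python with NumPy. It is not the off-the-shelf extractor used in the paper, which the abstract does not name. The function names (`color_histogram`, `edge_histogram`, `page_features`), the bin counts, and the synthetic screenshot are assumptions chosen for illustration only, and Gabor and texture features are left out.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel intensity histogram, concatenated and normalised to sum to 1."""
    channels = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(image.shape[-1])
    ]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()

def edge_histogram(image, bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gray = image.astype(float).mean(axis=-1)   # simple grayscale conversion
    gy, gx = np.gradient(gray)                 # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)           # angles in [-pi, pi]
    hist, _ = np.histogram(
        orientation, bins=bins, range=(-np.pi, np.pi), weights=magnitude
    )
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

def page_features(image):
    """Concatenate the global color and edge descriptors into one vector."""
    return np.concatenate([color_histogram(image), edge_histogram(image)])

# Hypothetical stand-in for a rendered page screenshot (height x width x RGB):
# left half black, right half white, giving one strong vertical edge.
screenshot = np.zeros((64, 64, 3), dtype=np.uint8)
screenshot[:, 32:] = 255
features = page_features(screenshot)
```

A vector such as `features` could then be passed to any standard classifier. Because both descriptors are global, they summarise the whole rendered page and ignore where on the page a color or edge occurs.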
De Boer, V., Van Someren, M. W., & Lupascu, T. (2011). Web page classification using image analysis features. In Lecture Notes in Business Information Processing (Vol. 75 LNBIP, pp. 272–285). Springer Verlag. https://doi.org/10.1007/978-3-642-22810-0_20