Automatic web document restructuring based on visual information analysis


Abstract

Many documents available on the current web have quite a complex structure that allows them to present various kinds of information. Apart from the main content, the documents usually contain headers and footers, navigation sections and other types of additional information. For many applications, such as document indexing or browsing on special devices, it is desirable that the main document information precede the additional information in the underlying HTML code. In this paper, we propose a method of document preprocessing that automatically restructures the document code according to this criterion. Our method is based on analysis of the rendered document: a page segmentation algorithm detects the basic blocks on the page, and the relevance of the individual parts is estimated from the visual properties of the text content. © Springer-Verlag Berlin Heidelberg 2010.
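
The restructuring idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes that page segmentation has already produced a list of blocks, and it substitutes a simple hypothetical heuristic (text length weighted by font size) for the paper's relevance estimate. The names `Block`, `relevance`, `restructure`, `base_font_size` and `threshold_ratio` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One visual block of a rendered page (hypothetical structure)."""
    html: str          # HTML fragment belonging to the block
    font_size: float   # average rendered font size, in pixels
    text_length: int   # number of visible text characters


def relevance(block, base_font_size=12.0):
    # Assumed heuristic, not taken from the paper: longer text set in
    # a larger font is treated as more likely to be main content.
    return block.text_length * (block.font_size / base_font_size)


def restructure(blocks, threshold_ratio=0.5):
    """Return the blocks with the main content first.

    Blocks scoring at least threshold_ratio times the best score are
    treated as main content. The original order is preserved inside
    both the main group and the additional group.
    """
    if not blocks:
        return []
    scores = [relevance(b) for b in blocks]
    cutoff = threshold_ratio * max(scores)
    main = [b for b, s in zip(blocks, scores) if s >= cutoff]
    extra = [b for b, s in zip(blocks, scores) if s < cutoff]
    return main + extra


page = [
    Block("<div id='nav'>...</div>", font_size=10.0, text_length=80),
    Block("<div id='article'>...</div>", font_size=14.0, text_length=2000),
    Block("<div id='footer'>...</div>", font_size=9.0, text_length=60),
]
reordered = restructure(page)
```

With this sample page, the article block moves to the front, while the navigation and footer blocks follow in their original relative order. A real implementation would obtain the blocks and their visual properties from a rendering engine rather than from hand-written values.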

Cite

Burget, R. (2010). Automatic web document restructuring based on visual information analysis. In Advances in Intelligent and Soft Computing (Vol. 67 AISC, pp. 61–70). https://doi.org/10.1007/978-3-642-10687-3_6
