Integrating content and structure learning: A model of hypertext zoning and sounding

Abstract

The bag-of-words model is accepted as the first choice for representing the content of web documents. It benefits from low time complexity, but this comes at the cost of ignoring document structure. Obviously, there is a trade-off between the range of document modeling and its computational complexity. In this chapter, we present a model of content and structure learning that tackles this trade-off, with a focus on delimiting documents as instances of webgenres. We present and evaluate a two-level algorithm of hypertext zoning that integrates the genre-related classification of web documents with their segmentation. In addition, we present an algorithm of hypertext sounding for the thematic demarcation of web documents. © 2011 Springer-Verlag Berlin Heidelberg.
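The trade-off named in the abstract can be made concrete with a minimal sketch of a plain bag-of-words representation. This is not the chapter's zoning or sounding algorithm. The tag-stripping step and the two HTML fragments are hypothetical illustrations. A single pass over the tokens keeps the cost linear in document length, but two fragments that use the same words in different structural roles (navigation label vs. body text) collapse to the same vector.

```python
import re
from collections import Counter

def bag_of_words(html: str) -> Counter:
    """Map a document to term frequencies, discarding markup and word order.

    Assumption: markup is removed with a naive regex, purely for illustration.
    """
    text = re.sub(r"<[^>]+>", " ", html)          # drop tags, i.e. the structure
    tokens = re.findall(r"[a-z]+", text.lower())  # one linear pass over the text
    return Counter(tokens)

# Hypothetical fragments: "contact" is a navigation label in doc_a
# but part of the body text in doc_b.
doc_a = "<nav>Contact</nav><p>Research group publications</p>"
doc_b = "<h1>Research group</h1><p>Publications contact</p>"

print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True: structure is lost
```

Because both fragments yield identical term counts, any classifier working only on these vectors cannot tell the navigation zone from the content zone, which is the gap that structure-aware models aim to close.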

Citation (APA)

Mehler, A., & Waltinger, U. (2011). Integrating content and structure learning: A model of hypertext zoning and sounding. Studies in Computational Intelligence, 370, 299–329. https://doi.org/10.1007/978-3-642-22613-7_15