Web-page Summarization Using Clickthrough Data

  • Sun J
  • Shen D
  • Zeng H
 et al. 
  • 57 readers (Mendeley users who have this article in their library)
  • 66 citations of this article

Abstract

Most previous Web-page summarization methods treat a Web page as plain text. However, such methods fail to uncover the full knowledge associated with a Web page needed in building a high-quality summary, because many of these methods do not consider the hidden relationships in the Web. Uncovering the hidden knowledge is important in building good Web-page summarizers. In this paper, we extract the extra knowledge from the clickthrough data of a Web search engine to improve Web-page summarization. We first analyze the feasibility of utilizing the clickthrough data to enhance Web-page summarization and then propose two adapted summarization methods that take advantage of the relationships discovered from the clickthrough data. For those pages that are not covered by the clickthrough data, we design a thematic lexicon approach to generate implicit knowledge for them. Our methods are evaluated on a dataset consisting of manually annotated pages as well as a large dataset that is crawled from the Open Directory Project website. The experimental results indicate that significant improvements can be achieved through our proposed summarizer as compared to the summarizers that do not use the clickthrough data.

Author-supplied keywords

  • clickthrough data
  • generic web-page summarization
  • latent semantic analysis
  • thematic lexicon
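
The general idea in the abstract, letting the queries that led users to a page influence which of its sentences are selected, can be illustrated with a small sketch. This is not the authors' method: the abstract does not give the scoring formulas, so the linear mix of page and query term frequencies, the `alpha` parameter, and the names `summarize`, `tokenize`, and `split_sentences` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(page_text, clicked_queries, alpha=0.5, top_k=1):
    """Pick the top_k sentences by a query-aware term weight.

    Assumption (not from the paper): a term's weight is a linear mix of its
    relative frequency in the page and its relative frequency in the queries
    whose clicks led to the page; alpha controls the mix.
    """
    sentences = split_sentences(page_text)
    page_tf = Counter(tokenize(page_text))
    query_tf = Counter(t for q in clicked_queries for t in tokenize(q))
    page_total = sum(page_tf.values()) or 1
    query_total = sum(query_tf.values()) or 1

    def weight(term):
        page_part = page_tf[term] / page_total
        query_part = query_tf[term] / query_total
        return (1 - alpha) * page_part + alpha * query_part

    scored = []
    for idx, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        # Average term weight, so long sentences are not favored by length alone.
        score = sum(weight(t) for t in terms) / len(terms) if terms else 0.0
        scored.append((score, idx))

    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]
    # Return the chosen sentences in their original page order.
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]


page = (
    "Our store opens at nine. "
    "We sell cheap laptop computers online. "
    "Contact us by mail."
)
queries = ["cheap laptop", "laptop computers"]
print(summarize(page, queries))
```

For a page with no clickthrough data, `clicked_queries` is empty and the score falls back to page term frequency alone; the paper's thematic lexicon approach addresses that case differently and is not modeled here.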


Authors

  • Jian-Tao Sun
  • Dou Shen
  • Hua-Jun Zeng
  • Qiang Yang
  • Yuchang Lu
  • Zheng Chen
