Web page classification exploiting contents of surrounding pages for building a high-quality homepage collection

Yuxin Wang; Keizo Oyama

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), Vol. 4312 LNCS, pp. 515–518

DOI: 10.1007/11931584_61

Abstract

We propose a web page classification method for building a high-quality collection of researchers' homepages. We also present a method that uses a recall-assured classifier and a precision-assured classifier to reduce the manual assessment needed to guarantee a given precision and recall. Each classifier is built with an SVM by tuning its parameters and using textual features obtained from each page and its surrounding pages. The surrounding pages are grouped by connection type and relative URL hierarchy level, and independent features are extracted from each group. Experimental results show that the proposed features clearly improve classification performance and that the amount of manual assessment is significantly reduced. © Springer-Verlag Berlin Heidelberg 2006.
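The two ideas in the abstract (per-group features from surrounding pages, and a pair of classifiers that limits manual review) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the connection types are taken to be link direction (`out` for pages the target links to, `in` for pages linking to it), the relative URL hierarchy level is approximated by comparing directory depth on the same host (`up`, `same`, `down`, or `ext` for other hosts), and the two classifiers are assumed to output scores that are compared against thresholds `t_prec` and `t_rec`. The helper names `relative_level`, `extract_features`, and `triage` are hypothetical and are not the authors' definitions. SVM training is omitted; the feature dictionaries would still need to be vectorized and fed to a linear SVM.

```python
import posixpath
import re
from collections import Counter
from urllib.parse import urlparse


def dir_parts(path):
    # Directory components of a URL path; the final segment is treated as a
    # file name unless the path ends with '/'. ('/a/b/c.html' -> ['a', 'b'])
    if not path.endswith("/"):
        path = posixpath.dirname(path)
    return [p for p in path.split("/") if p]


def relative_level(target_url, neighbor_url):
    # Hypothetical grouping by relative URL hierarchy level.
    t, n = urlparse(target_url), urlparse(neighbor_url)
    if t.netloc.lower() != n.netloc.lower():
        return "ext"
    diff = len(dir_parts(n.path)) - len(dir_parts(t.path))
    return "same" if diff == 0 else ("down" if diff > 0 else "up")


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def extract_features(target_url, target_text, neighbors):
    # neighbors: list of (url, connection, text); connection is "in" or "out".
    # Prefixing each token with its group keeps features of different groups
    # independent: the same word in different groups is a different dimension.
    feats = Counter("self|" + tok for tok in tokenize(target_text))
    for url, connection, text in neighbors:
        group = f"{connection}-{relative_level(target_url, url)}"
        for tok in tokenize(text):
            feats[group + "|" + tok] += 1
    return dict(feats)


def triage(pages, precision_scores, recall_scores, t_prec, t_rec):
    # Three-way split under the assumed thresholding scheme.
    accepted, rejected, manual = [], [], []
    for page, sp, sr in zip(pages, precision_scores, recall_scores):
        if sp >= t_prec:
            accepted.append(page)  # precision-assured says yes: keep unreviewed
        elif sr < t_rec:
            rejected.append(page)  # even recall-assured says no: drop unreviewed
        else:
            manual.append(page)    # uncertain region: human assessment
    return accepted, rejected, manual


if __name__ == "__main__":
    feats = extract_features(
        "http://example.org/~alice/index.html",
        "Alice Smith, Associate Professor",
        [("http://example.org/~alice/pubs/list.html", "out", "Publications"),
         ("http://example.org/", "in", "Department faculty")],
    )
    print(sorted(feats))
    print(triage(["a", "b", "c"], [0.9, 0.1, 0.3], [0.95, -0.5, 0.4], 0.5, 0.0))
    # -> (['a'], ['b'], ['c'])
```

Under this reading, only the `manual` bucket reaches a human, so tuning `t_prec` for the target precision and `t_rec` for the target recall (for example on held-out labeled pages) directly controls how much assessment effort remains.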

Cite (APA)

Wang, Y., & Oyama, K. (2006). Web page classification exploiting contents of surrounding pages for building a high-quality homepage collection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4312 LNCS, pp. 515–518). Springer Verlag. https://doi.org/10.1007/11931584_61