Importance-based web page classification using cost-sensitive SVM


Abstract

Web page classification is challenging because the web is a huge and diverse repository of information. Web pages vary in both content and quality, as PageRank suggests. Typical machine learning algorithms train a classifier from positive and negative examples, but they neglect that each instance can carry a different weight, which the user can predefine. This paper presents an algorithm based on Cost-Sensitive Support Vector Machine (CS-SVM) to improve classification accuracy. During CS-SVM training, different cost factors are attached to the training errors to generate an optimized hyperplane. Our experiments show that CS-SVM outperforms SVM on the standard ODP data set. Web pages with relatively high PageRank values contribute most to the classifier, and training on them outperforms random sampling. © Springer-Verlag Berlin Heidelberg 2005.
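
The idea of attaching a cost factor to each training error can be illustrated with a small sketch. The abstract does not give the paper's exact formulation or say how PageRank values are mapped to costs, so everything below is an assumption: the objective is the common weighted soft-margin form 0.5*||w||^2 + C * sum_i c_i * max(0, 1 - y_i (w.x_i + b)), the per-instance costs are supplied by the caller, and the names `weighted_hinge_objective` and `train_cost_sensitive_svm` are hypothetical. The trainer uses plain subgradient descent for a linear classifier, not the authors' solver.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def weighted_hinge_objective(
    w: Sequence[float],
    b: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    costs: Sequence[float],
    C: float = 1.0,
) -> float:
    """Assumed objective: 0.5*||w||^2 + C * sum_i costs[i] * hinge_i."""
    reg = 0.5 * dot(w, w)
    loss = 0.0
    for xi, yi, ci in zip(X, y, costs):
        margin = yi * (dot(w, xi) + b)
        # Each training error is scaled by that instance's cost factor.
        loss += ci * max(0.0, 1.0 - margin)
    return reg + C * loss


def train_cost_sensitive_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    costs: Sequence[float],
    C: float = 1.0,
    lr: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Batch subgradient descent on the assumed weighted objective."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, costs):
            if yi * (dot(w, xi) + b) < 1.0:
                # A margin violation contributes in proportion to its cost.
                for j, xij in enumerate(xi):
                    grad_w[j] -= C * ci * yi * xij
                grad_b -= C * ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: the first instance is given a higher (illustrative) cost,
# standing in for a page the user considers more important.
X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
costs = [2.0, 0.5]
w, b = train_cost_sensitive_svm(X, y, costs)
predictions = [1 if dot(w, xi) + b > 0 else -1 for xi in X]
```

With all costs equal to 1 this reduces to an ordinary soft-margin linear SVM objective, which is one way to compare the weighted and unweighted variants on the same data.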

Cite

Liu, W., Xue, G. R., Yu, Y., & Zeng, H. J. (2005). Importance-based web page classification using cost-sensitive SVM. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 127–137). Springer Verlag. https://doi.org/10.1007/11563952_12
