A focused crawling for the Web resource discovery using a modified proximal support vector machines

Young Sik Choi; Ki Joo Kim; Mun Su Kang

Conference Proceedings

Lecture Notes in Computer Science (2005) 3480(I) 186-194

DOI: 10.1007/11424758_20

Abstract

With the rapid growth of the World Wide Web, focused crawling has become increasingly important. The goal of focused crawling is to seek out and collect pages that are relevant to a predefined set of topics. Determining the relevance of a page to a specific topic has been addressed as a classification problem. However, when training the classifiers, one often encounters difficulties in selecting negative samples. These difficulties arise because collecting a set of pages relevant to a specific topic is not, by nature, a classification process. In this paper, we propose a novel focused crawling method that uses only positive samples to represent a given topic as a hyperplane, obtained from a modified Proximal Support Vector Machine. The distance from a page to the hyperplane is used to prioritize the order in which pages are visited. We demonstrate the performance of the proposed method on the WebKB data set and on the Web. The promising results suggest that the proposed method is more effective for the focused crawling problem than traditional approaches. © Springer-Verlag Berlin Heidelberg 2005.
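
As a rough illustration of the prioritization step mentioned in the abstract, the sketch below orders candidate pages by their distance to a hyperplane. It is not the authors' implementation and does not show how the modified Proximal Support Vector Machine is trained. The weight vector `w`, the offset `gamma`, the page feature vectors, and the rule that a smaller distance means an earlier visit are all assumptions made for the example.

```python
import heapq
import math


def distance_to_hyperplane(x, w, gamma):
    """Euclidean distance from feature vector x to the hyperplane w . x = gamma."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot - gamma) / norm


def order_frontier(pages, w, gamma):
    """Return page URLs sorted so that pages nearest the hyperplane come first.

    `pages` maps a URL to a hypothetical feature vector for that page.
    Treating "nearest first" as the visit order is an assumption of this sketch.
    """
    heap = [(distance_to_hyperplane(vec, w, gamma), url) for url, vec in pages.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical hyperplane and page vectors, chosen only for illustration.
w = [3.0, 4.0]
gamma = 5.0
pages = {
    "http://example.org/a": [1.0, 0.5],
    "http://example.org/b": [0.0, 0.0],
    "http://example.org/c": [3.0, 4.0],
}
visit_order = order_frontier(pages, w, gamma)
```

A heap is used here because a crawler's frontier typically grows as new links are discovered, and a priority queue lets the next page be popped without re-sorting the whole list.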

Cite

Choi, Y. S., Kim, K. J., & Kang, M. S. (2005). A focused crawling for the Web resource discovery using a modified proximal support vector machines. In Lecture Notes in Computer Science (Vol. 3480, pp. 186–194). Springer Verlag. https://doi.org/10.1007/11424758_20