A focused crawling for the Web resource discovery using a modified proximal support vector machines

Abstract

With the rapid growth of the World Wide Web, focused crawling has become increasingly important. The goal of focused crawling is to seek out and collect pages that are relevant to a predefined set of topics. Determining the relevance of a page to a specific topic has been treated as a classification problem. However, when training the classifiers, one often encounters difficulties in selecting negative samples. These difficulties arise because collecting a set of pages relevant to a specific topic is not, by nature, a classification process. In this paper, we propose a novel focused crawling method that uses only positive samples to represent a given topic as a hyperplane, obtained from a modified Proximal Support Vector Machine. The distance from a page to the hyperplane is used to prioritize the order in which pages are visited. We demonstrate the performance of the proposed method on the WebKB data set and on the Web. The promising results suggest that the proposed method is more effective for the focused crawling problem than traditional approaches. © Springer-Verlag Berlin Heidelberg 2005.
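The prioritization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a topic hyperplane w·x = b has already been obtained (the modified Proximal SVM training itself is not specified in the abstract and is not shown), that each page is reduced to a numeric feature vector, and that pages closer to the hyperplane are visited first. The names `distance_to_hyperplane` and `Frontier`, the example URLs, and all numeric values are hypothetical.

```python
import heapq
import math


def distance_to_hyperplane(x, w, b):
    """Euclidean distance from feature vector x to the hyperplane w.x = b."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm


class Frontier:
    """Crawl frontier ordered by distance to a topic hyperplane.

    Assumption: a smaller distance means the page is more likely on topic,
    so the closest page is popped first.
    """

    def __init__(self, w, b):
        self.w, self.b = w, b
        self._heap = []      # min-heap of (distance, url)
        self._seen = set()   # URLs already queued, to avoid duplicates

    def push(self, url, features):
        if url in self._seen:
            return
        self._seen.add(url)
        d = distance_to_hyperplane(features, self.w, self.b)
        heapq.heappush(self._heap, (d, url))

    def pop(self):
        d, url = heapq.heappop(self._heap)
        return url, d

    def __len__(self):
        return len(self._heap)


# Hypothetical hyperplane and page features, for illustration only.
frontier = Frontier(w=[3.0, 4.0], b=5.0)
frontier.push("http://example.org/a", [1.0, 0.5])  # lies on the hyperplane
frontier.push("http://example.org/b", [0.0, 0.0])
frontier.push("http://example.org/c", [1.0, 1.0])

visit_order = []
while len(frontier):
    url, _ = frontier.pop()
    visit_order.append(url)
# visit_order is a, then c, then b: nearest to the hyperplane first.
```

A binary heap keeps each insertion and extraction logarithmic in the frontier size, which matters because a crawler's frontier grows much faster than it is consumed. Whether the original method ranks by raw distance or by some transformation of it is not stated in the abstract.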

Cite

Choi, Y. S., Kim, K. J., & Kang, M. S. (2005). A focused crawling for the Web resource discovery using a modified proximal support vector machines. In Lecture Notes in Computer Science (Vol. 3480, pp. 186–194). Springer Verlag. https://doi.org/10.1007/11424758_20
