An approach to improving quality of crawlers using Naïve Bayes for classifier and hyperlink filter

Abstract

Nowadays, most search engines rely on keywords provided by users. However, keywords may not be sufficiently representative of the main topic of a web page. When searching for a topic, users express it as keywords, and keyword-based search engines return pages that contain those keywords even when the pages are not about the topic. This limits the efficiency of such engines, since they may return undesirable results. In this paper, we present an approach to improving the quality of search engines by focusing on web pages related to specific topics. Our system includes three main components: a crawler for gathering web pages, a classifier for classifying web pages by topic, and a hyperlink filter (or distiller) for filtering hyperlinks. We propose using Naïve Bayes for both the classifier and the distiller to enhance the accuracy of the system. We implement the system and evaluate its efficiency by gathering web pages on two topics: Artificial Intelligence and Motorcycle. The results show that our crawler is more efficient than crawlers that search by keywords. © 2012 Springer-Verlag.
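The three-component architecture can be illustrated with a small sketch. The abstract does not specify the Naïve Bayes variant, the features, or the crawl strategy, so the following are assumptions made for illustration only: a multinomial Naïve Bayes model with add-one (Laplace) smoothing, page text as the classifier's features, anchor text as the distiller's features, a best-first frontier ordered by the distiller's posterior, and a fixed 0.5 threshold for discarding links. All names (`NaiveBayes`, `focused_crawl`, `WEB`) and the toy data are hypothetical, and an in-memory dictionary stands in for real fetching and HTML parsing.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def __init__(self):
        self.class_docs = Counter()              # number of training docs per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Unnormalized log P(c) + sum over tokens of log P(w | c), per class."""
        tokens = tokenize(text)
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, n_docs in self.class_docs.items():
            counts = self.word_counts[c]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def prob(self, text, label):
        """Normalized posterior P(label | text), computed stably in log space."""
        scores = self.log_scores(text)
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores[label] - m) / z


def focused_crawl(seed, web, page_clf, link_clf, topic, max_pages=10, threshold=0.5):
    """Best-first crawl: classify each fetched page, enqueue only links the filter keeps.

    `web` maps url -> (page_text, [(anchor_text, target_url), ...]).
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue
        visited += 1
        text, links = web[url]
        if page_clf.predict(text) == topic:  # classifier step
            relevant.append(url)
        for anchor, target in links:          # hyperlink filter (distiller) step
            if target in seen:
                continue
            score = link_clf.prob(anchor, topic)
            if score >= threshold:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return relevant


# Hypothetical toy training data and "web" for the two topics in the abstract.
page_clf = NaiveBayes().fit(
    ["neural network learning algorithm", "search heuristic agent planning",
     "engine helmet motorcycle ride", "motorcycle tyre bike engine"],
    ["AI", "AI", "Moto", "Moto"])
link_clf = NaiveBayes().fit(
    ["learning algorithm", "agent planning search", "motorcycle helmet", "bike engine ride"],
    ["AI", "AI", "Moto", "Moto"])
WEB = {
    "seed": ("machine learning and neural network research",
             [("learning algorithm notes", "a"), ("motorcycle helmet deals", "b")]),
    "a": ("neural network learning algorithm notes", [("agent planning search", "c")]),
    "b": ("motorcycle helmet engine ride", []),
    "c": ("heuristic search agent planning", [("agent planning for bike rides", "d")]),
    "d": ("motorcycle bike engine ride", []),
}
relevant = focused_crawl("seed", WEB, page_clf, link_clf, "AI")
print(relevant)
```

In this toy run the distiller drops the "motorcycle helmet deals" link, so page `b` is never fetched, while page `d` is fetched (its anchor text looks AI-related) but the page classifier labels it `Moto` and excludes it. Separating the two models in this way reflects the abstract's split between a classifier for pages and a filter for hyperlinks; how the original system trains and combines them is not stated here.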

Citation (APA)

Nguyen, H. T. T., & Le, D. K. (2012). An approach to improving quality of crawlers using Naïve Bayes for classifier and hyperlink filter. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7653 LNAI, pp. 525–535). https://doi.org/10.1007/978-3-642-34630-9_54