Smart Focused Web Crawler for Hidden Web

Abstract

A huge amount of useful data lies buried beneath the layers of the hidden web and becomes accessible only when users fill in and submit search forms. Web crawlers can reach this data only by interacting with such web-based search forms, and traditional search engines cannot efficiently search and index these deep or hidden web pages. Retrieving hidden-web data with high accuracy and coverage is therefore a challenging task. Focused crawling ensures that each retrieved document belongs to a particular topic. In the proposed architecture, a smart focused web crawler for the hidden web is based on XML parsing of web pages: it first locates hidden web pages and learns their features. Term frequency–inverse document frequency (TF-IDF) is used to build a classifier that identifies relevant pages through a fully automatic, adaptive learning technique. The system aims to increase both the coverage and the accuracy of retrieved web pages. For distributed processing, the MapReduce framework of Hadoop is used.
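The TF-IDF weighting mentioned above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation; the mini-corpus, tokenization, and cosine-similarity relevance test are all illustrative assumptions, showing only how TF-IDF vectors can separate on-topic pages from off-topic ones:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute a sparse TF-IDF weight vector for each tokenized document."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vecs.append({t: (tf[t] / total) * idf[t] for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mini-corpus: two on-topic pages and one off-topic page.
pages = [
    "hidden web form crawler search form".split(),
    "deep web crawler index search".split(),
    "cooking recipes pasta sauce".split(),
]
vecs = tf_idf_vectors(pages)
# Pages 0 and 1 share topic terms ("web", "crawler", "search"),
# so their similarity exceeds that of page 0 and the off-topic page 2.
```

A focused crawler could use such similarity scores against a topic profile vector as the relevance signal that decides which fetched pages (and which outgoing links) to keep.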


Citation (APA)

Kaur, S., & Geetha, G. (2019). Smart Focused Web Crawler for Hidden Web. In Lecture Notes in Networks and Systems (Vol. 40, pp. 419–427). Springer. https://doi.org/10.1007/978-981-13-0586-3_42
