A huge amount of useful data is buried in the hidden web: layers of content that become accessible only after users fill in and submit search forms. Web crawlers can reach this data only by interacting with such web-based search forms, and traditional search engines cannot efficiently search and index these deep (hidden) web pages. Retrieving hidden-web data with high accuracy and coverage is a challenging task. Focused crawling guarantees that every retrieved document belongs to the target topic. In the proposed architecture, a smart focused web crawler for the hidden web is based on XML parsing of web pages: it first locates hidden web pages and learns their features. Term frequency–inverse document frequency will be used to build a classifier that identifies relevant pages, using a completely automatic adaptive learning technique. This system will help increase the coverage and accuracy of the retrieved web pages. For distributed processing, the MapReduce framework of Hadoop will be used.
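The abstract's TF-IDF relevance classifier can be illustrated with a minimal sketch. This is not the authors' implementation; the tokenized documents, the `relevance` scoring function, and the choice of topic terms are all assumptions made here for illustration, showing only the standard TF-IDF weighting that the abstract names.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Return one {term: tf-idf weight} dict per document.

    `docs` is a list of token lists (tokenization is assumed,
    not specified in the abstract).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = relative frequency in this doc; idf = log(n / df).
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

def relevance(page_weights, topic_terms):
    """Sum the TF-IDF weights of the topic terms for one page.

    A focused crawler could follow only pages whose score exceeds
    a threshold (the thresholding policy is an assumption here).
    """
    return sum(page_weights.get(term, 0.0) for term in topic_terms)
```

For example, given three tokenized pages and the topic terms `["deep", "crawler"]`, a page about deep-web crawling scores higher than one about cooking, which is the signal a focused crawler would use to prioritize links.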
Kaur, S., & Geetha, G. (2019). Smart Focused Web Crawler for Hidden Web. In Lecture Notes in Networks and Systems (Vol. 40, pp. 419–427). Springer. https://doi.org/10.1007/978-981-13-0586-3_42