A huge amount of useful data is buried in the hidden web: layers of content that become accessible only after users fill in and submit search forms. Web crawlers can reach this data only by interacting with such web-based search forms, and traditional search engines cannot efficiently search and index these deep (hidden) web pages. Retrieving hidden-web data with high accuracy and coverage is a challenging task. Focused crawling guarantees that every retrieved document belongs to the target topic. In the proposed architecture, a smart focused web crawler for the hidden web is based on XML parsing of web pages: it first locates hidden web pages and learns their features. Term frequency–inverse document frequency will be used to build a classifier that identifies relevant pages, using a completely automatic adaptive learning technique. This system will help increase the coverage and accuracy of the retrieved web pages. For distributed processing, the MapReduce framework of Hadoop will be used.
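The abstract's TF-IDF relevance classifier can be illustrated with a minimal sketch. This is not the authors' implementation; the tokenized documents, the `relevance` scoring function, and the choice of topic terms are all assumptions made here for illustration, showing only the standard TF-IDF weighting that the abstract names.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Return one {term: tf-idf weight} dict per document.

    `docs` is a list of token lists (tokenization is assumed,
    not specified in the abstract).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = relative frequency in this doc; idf = log(n / df).
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

def relevance(page_weights, topic_terms):
    """Sum the TF-IDF weights of the topic terms for one page.

    A focused crawler could follow only pages whose score exceeds
    a threshold (the thresholding policy is an assumption here).
    """
    return sum(page_weights.get(term, 0.0) for term in topic_terms)
```

For example, given three tokenized pages and the topic terms `["deep", "crawler"]`, a page about deep-web crawling scores higher than one about cooking, which is the signal a focused crawler would use to prioritize links.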
Kaur, S., & Geetha, G. (2019). Smart Focused Web Crawler for Hidden Web. In Lecture Notes in Networks and Systems (Vol. 40, pp. 419–427). Springer. https://doi.org/10.1007/978-981-13-0586-3_42