It is well known that obtaining deep web information is a challenging task, and that crawling a large data source requires choosing suitable query values. In this paper, we propose an architecture specification for a deep web crawler with an effective FORM-filling strategy based on rules. The rules are constructed by analyzing the FORM and the combinations of its parameters. These FORM parameters are classified as most preferable, least preferable, and mutually exclusive. For each successful FORM submission, the deep web data is extracted and indexed suitably for information retrieval applications. The performance of the crawler is encouraging when compared to a conventional surface crawler.
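As a rough illustration of how the three parameter classes could drive FORM filling, the sketch below enumerates candidate submissions. It is not the paper's implementation. The function name `candidate_submissions`, the field names, the candidate values, and the ranking rule (more most-preferable fields first, fewer least-preferable fields next) are all assumptions made for this example. The abstract only states that parameters are classified as most preferable, least preferable, and mutually exclusive.

```python
from itertools import combinations, product


def candidate_submissions(most, least, exclusive):
    """Yield {field: value} dicts to try against a FORM (hypothetical helper).

    most / least: dicts mapping a field name to its candidate values.
    exclusive: list of sets of field names that must not be filled together.
    """
    values = {**most, **least}
    names = sorted(values)

    # Collect every non-empty field subset that respects the exclusion rules.
    subsets = []
    for size in range(1, len(names) + 1):
        for fields in combinations(names, size):
            chosen = set(fields)
            if any(len(group & chosen) > 1 for group in exclusive):
                continue  # two mutually exclusive fields would be filled together
            subsets.append(fields)

    # Assumed ranking: more most-preferable fields first, fewer least-preferable next.
    subsets.sort(key=lambda fs: (-sum(f in most for f in fs),
                                 sum(f in least for f in fs)))

    for fields in subsets:
        for combo in product(*(values[f] for f in fields)):
            yield dict(zip(fields, combo))


# Hypothetical FORM with three fields; "author" and "year" may not be combined.
results = list(candidate_submissions(
    most={"title": ["web"], "author": ["smith"]},
    least={"year": ["2014"]},
    exclusive=[{"author", "year"}],
))
```

In this sketch, the crawler would submit the candidates in the order produced and index the response of each successful submission. A real rule set would presumably be derived from analyzing the target FORM rather than written by hand.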
CITATION STYLE
Shaila, S. G., Vadivel, A., Devi Mahalakshmi, R., & Karthika, J. (2014). DwCB - architecture specification of deep web crawler bot with rules based on FORM values for domain specific web site. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST, 117, 191–196. https://doi.org/10.1007/978-3-319-11629-7_28