Thai related foreign language specific web crawling approach


Abstract

National web archives have been successfully made available for years through domain- and language-specific web crawlers. Here we propose a focused web crawler for collecting foreign-language web pages that are nevertheless related to a nation. Rather than finding the most relevant individual web pages, an ensemble machine learning model is trained on selected features to find relevant clusters of unvisited web pages, called website segments. During consecutive crawling cycles, the model is retrained with features extracted from newly found website segments. Preliminary experiments in the real web space on Thai-tourism-related topics show that this approach can take advantage of recent crawling experience to produce more promising harvest rates than traditional breadth-first and best-first baselines. © Springer Science+Business Media Singapore 2014.
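
The crawl-and-retrain cycle described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the ensemble, its features, or the segment definition, so the `KeywordVoter` class (a toy token-voting stand-in for an ensemble), the `crawl` function, the `SEGMENTS` data, and the 0.5 relevance threshold are all hypothetical. Harvest rate is computed here as the fraction of fetched pages that are relevant, which is the usual meaning of the term in focused crawling.

```python
from collections import Counter


def harvest_rate(fetched):
    """Fraction of fetched pages judged relevant; 0.0 if nothing was fetched."""
    return sum(fetched) / len(fetched) if fetched else 0.0


class KeywordVoter:
    """Hypothetical stand-in for an ensemble: each segment token casts one vote."""

    def __init__(self):
        self.good_tokens = Counter()
        self.bad_tokens = Counter()

    def train(self, tokens, relevant):
        # Incremental retraining: fold the newly crawled segment into the counts.
        (self.good_tokens if relevant else self.bad_tokens).update(tokens)

    def score(self, tokens):
        # Share of tokens seen more often in relevant than in irrelevant segments.
        votes = [self.good_tokens[t] > self.bad_tokens[t] for t in tokens]
        return sum(votes) / len(votes) if votes else 0.0


def crawl(segments, seeds, cycles, per_cycle):
    """Fetch seed segments, then repeatedly fetch the best-scored unvisited ones."""
    model = KeywordVoter()
    frontier = dict(segments)  # shallow copy so the caller's dict is untouched
    fetched = []

    def visit(name):
        tokens, flags = frontier.pop(name)
        fetched.extend(flags)
        # Assumed labeling rule: a segment is relevant if at least half its pages are.
        model.train(tokens, harvest_rate(flags) >= 0.5)

    for name in seeds:
        visit(name)
    for _ in range(cycles):
        ranked = sorted(frontier, key=lambda n: model.score(frontier[n][0]),
                        reverse=True)
        for name in ranked[:per_cycle]:
            visit(name)
    return fetched


# Toy data: segment -> (feature tokens, per-page relevance flags). All invented.
SEGMENTS = {
    "example.com/travel/thailand": (["thailand", "travel", "beach"],
                                    [True, True, True, False]),
    "example.com/travel/bangkok": (["bangkok", "travel", "hotel"],
                                   [True, True, False]),
    "example.com/sports": (["football", "scores"], [False, False, False]),
    "example.org/phuket-hotels": (["phuket", "hotel", "beach"], [True, True]),
    "example.org/finance": (["stocks", "scores"], [False, False]),
}

if __name__ == "__main__":
    pages = crawl(SEGMENTS,
                  seeds=["example.com/travel/thailand", "example.com/sports"],
                  cycles=2, per_cycle=1)
    print(f"harvest rate: {harvest_rate(pages):.2f}")
```

In this toy setting the model is updated after every visited segment, so segments sharing tokens with earlier relevant ones rise in the ranking on later cycles, while the unrelated finance segment is never fetched within the two-cycle budget.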

Cite

Suebchua, T., Manaskasemsak, B., & Rungsawang, A. (2014). Thai related foreign language specific web crawling approach. In Lecture Notes in Electrical Engineering (Vol. 285 LNEE, pp. 641–648). Springer Verlag. https://doi.org/10.1007/978-981-4585-18-7_72
