A clustering framework to build focused web crawlers for automatic extraction of cultural information

Abstract

We present a novel focused crawling method for extracting and processing cultural data from the web in a fully automated fashion. After downloading the pages, we extract from each document a number of words for each thematic cultural area. We then create multidimensional document vectors comprising the most frequent word occurrences, and measure the dissimilarity between these vectors by the Hamming distance. In the last stage, we employ cluster analysis to partition the document vectors into a number of clusters. We illustrate the approach with a proof-of-concept application that analyzes hundreds of web pages spanning different cultural thematic areas. © 2008 Springer-Verlag Berlin Heidelberg.
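The pipeline outlined in the abstract (word extraction, document vectors, Hamming distance, clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the clustering algorithm or the exact vector encoding, so the choices here are assumptions made for illustration: a single corpus-wide vocabulary of the most frequent words, binary presence/absence vectors, and a simple k-medoids loop. All function names and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, size):
    """Return the `size` most frequent words across all documents.

    Assumption: one shared vocabulary; ties are broken alphabetically
    so the result is deterministic.
    """
    counts = Counter(word for doc in docs for word in tokenize(doc))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def to_vector(doc, vocab):
    """Encode a document as a binary vector: 1 if the word occurs, else 0."""
    present = set(tokenize(doc))
    return tuple(1 if word in present else 0 for word in vocab)


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(vectors, k, max_iter=20):
    """Partition vectors into k clusters under the Hamming distance.

    Assumption: k-medoids stands in for the unspecified cluster analysis.
    Initialisation is naive (the first k vectors serve as medoids).
    """
    medoids = list(range(k))
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest medoid.
        labels = [
            min(range(k), key=lambda c: hamming(v, vectors[medoids[c]]))
            for v in vectors
        ]
        # Update step: the new medoid minimises total distance to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            best = min(
                members,
                key=lambda i: sum(hamming(vectors[i], vectors[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical sample pages covering two cultural themes.
docs = [
    "museum exhibition painting gallery",
    "concert orchestra symphony music",
    "gallery painting museum sculpture",
    "music concert opera orchestra",
]
vocab = build_vocabulary(docs, size=8)
vectors = [to_vector(doc, vocab) for doc in docs]
labels, medoids = k_medoids(vectors, k=2)
```

On this toy input the two art-related pages and the two music-related pages end up in separate clusters. A real crawler would also need page downloading, HTML stripping, and per-theme word lists, none of which are shown here.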

Cite

Tsekouras, G. E., Gavalas, D., Filios, S., Niros, A. D., & Bafaloukas, G. (2008). A clustering framework to build focused web crawlers for automatic extraction of cultural information. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5138 LNAI, pp. 419–424). https://doi.org/10.1007/978-3-540-87881-0_43
