A clustering framework to build focused web crawlers for automatic extraction of cultural information

George E. Tsekouras; Damianos Gavalas; Stefanos Filios; Antonios D. Niros; George Bafaloukas

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5138 LNAI 419-424

DOI: 10.1007/978-3-540-87881-0_43

Abstract

We present a novel focused crawling method for extracting and processing cultural data from the web in a fully automated fashion. After downloading the pages, we extract from each document a number of words for each thematic cultural area. We then create multidimensional document vectors comprising the most frequent word occurrences. The dissimilarity between these vectors is measured by the Hamming distance. In the last stage, we employ cluster analysis to partition the document vectors into a number of clusters. Finally, our approach is illustrated via a proof-of-concept application which scrutinizes hundreds of web pages spanning different cultural thematic areas. © 2008 Springer-Verlag Berlin Heidelberg.
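The pipeline described in the abstract (word extraction, document vectors, Hamming distance, cluster analysis) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how words are selected, whether the vectors are binary, or which clustering algorithm is used. The sketch assumes binary presence vectors over the most frequent words of the whole collection and substitutes a simple k-medoids procedure for the unspecified cluster analysis. All function names and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, size):
    """Return the `size` most frequent words across all documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def to_vector(doc, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the document (assumption)."""
    present = set(tokenize(doc))
    return tuple(1 if word in present else 0 for word in vocabulary)


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(vectors, medoids):
    """Label each vector with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: hamming(v, vectors[medoids[c]]))
        for v in vectors
    ]


def k_medoids(vectors, k, max_iterations=20):
    """Simple k-medoids under Hamming distance (a stand-in, not the paper's method).

    The first k vectors serve as initial medoids, which is an arbitrary choice.
    """
    medoids = list(range(k))
    for _ in range(max_iterations):
        labels = assign(vectors, medoids)
        updated = []
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                updated.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid minimizes total distance to the other cluster members.
            updated.append(
                min(members, key=lambda i: sum(hamming(vectors[i], vectors[j]) for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return assign(vectors, medoids), medoids


# Hypothetical sample documents standing in for downloaded web pages.
docs = [
    "museum painting gallery",
    "museum gallery sculpture",
    "opera concert orchestra",
    "concert orchestra symphony",
]
vocabulary = build_vocabulary(docs, size=8)
vectors = [to_vector(doc, vocabulary) for doc in docs]
labels, medoids = k_medoids(vectors, k=2)
```

Using medoids rather than means keeps every cluster representative a real document vector, which suits a distance such as Hamming that is defined on discrete components.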

Cite (APA)

Tsekouras, G. E., Gavalas, D., Filios, S., Niros, A. D., & Bafaloukas, G. (2008). A clustering framework to build focused web crawlers for automatic extraction of cultural information. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5138 LNAI, pp. 419–424). https://doi.org/10.1007/978-3-540-87881-0_43
