A graph-based framework for web document mining

Adam Schenker; Horst Bunke; Mark Last; Abraham Kandel

Journal Article, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3163 401-412

DOI: 10.1007/978-3-540-28640-0_38

Abstract

In this paper we describe methods for performing data mining on web documents whose content is represented by graphs. We show how traditional clustering and classification methods, which usually operate on vector representations of data, can be extended to work with graph-based data. Specifically, we give graph-theoretic extensions of the k-Nearest Neighbors classification algorithm and the k-means clustering algorithm that process graphs, and show how retaining structural information can improve performance compared with the vector model approach. We introduce several different types of web document representations that use graphs and compare their performance for clustering and classification. © Springer-Verlag 2004.
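The abstract does not spell out the graph representation or the distance measure, so the following is only a rough sketch under stated assumptions. It assumes each document becomes a word-adjacency graph (nodes are unique terms, directed edges join consecutive terms) and that graphs are compared with a maximum-common-subgraph distance, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where graph size is taken to be nodes plus edges. Because each term labels at most one node, the common subgraph here reduces to set intersections. The names `build_graph`, `distance`, `knn_classify`, and `set_median` are hypothetical, and using the set median as a cluster representative is one assumed way to adapt the centroid step of k-means to graphs.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# Hypothetical document graph: (set of unique terms, set of directed term pairs).
Graph = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]


def build_graph(text: str) -> Graph:
    """Assumed representation: one node per unique lowercased term,
    one directed edge from each term to the term that follows it."""
    words = text.lower().split()
    return frozenset(words), frozenset(zip(words, words[1:]))


def size(g: Graph) -> int:
    """Graph size taken as |V| + |E| (an assumption for this sketch)."""
    return len(g[0]) + len(g[1])


def mcs_size(g1: Graph, g2: Graph) -> int:
    """With unique node labels, the common subgraph is just the shared
    nodes plus the shared edges, so set intersection is enough."""
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])


def distance(g1: Graph, g2: Graph) -> float:
    """MCS-based distance in [0, 1]: 0 for identical graphs, 1 for disjoint ones."""
    denom = max(size(g1), size(g2))
    return 0.0 if denom == 0 else 1.0 - mcs_size(g1, g2) / denom


def knn_classify(query: Graph, training: List[Tuple[Graph, str]], k: int = 3) -> str:
    """k-NN with the vector distance swapped for the graph distance:
    majority vote among the k training graphs closest to the query."""
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def set_median(graphs: List[Graph]) -> Graph:
    """Assumed k-means representative: the member graph whose summed
    distance to all other members is smallest (the set median)."""
    return min(graphs, key=lambda g: sum(distance(g, h) for h in graphs))


# Usage with made-up toy documents.
train = [
    (build_graph("graph mining of web documents"), "mining"),
    (build_graph("web documents as graphs for mining"), "mining"),
    (build_graph("football match results and scores"), "sports"),
    (build_graph("latest football scores and match reports"), "sports"),
]
query = build_graph("mining web documents with graphs")
print(knn_classify(query, train, k=3))  # → mining
```

Only the distance function and the cluster-representative step differ from the vector versions; the voting in k-NN and the assign-then-update loop of k-means stay the same.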

Cite (APA)

Schenker, A., Bunke, H., Last, M., & Kandel, A. (2004). A graph-based framework for web document mining. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3163, 401–412. https://doi.org/10.1007/978-3-540-28640-0_38
