In this paper we describe methods for performing data mining on web documents whose content is represented by graphs. We show how traditional clustering and classification methods, which usually operate on vector representations of data, can be extended to work with graph-based data. Specifically, we give graph-theoretic extensions of the k-Nearest Neighbors classification algorithm and the k-means clustering algorithm that process graphs, and show how retaining structural information can improve performance over the vector model approach. We introduce several types of graph-based web document representations and compare their performance for clustering and classification. © Springer-Verlag 2004.
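To make the idea of a graph-theoretic k-Nearest Neighbors classifier concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the graph representation or the distance measure, so everything here is an assumption made for illustration: documents become graphs with one node per unique term and a directed edge between adjacent terms, graph size is counted as nodes plus edges, and distance is one minus the size of the common subgraph divided by the size of the larger graph. The names `build_graph`, `graph_size`, `mcs_distance`, and `knn_classify` are hypothetical.

```python
from collections import Counter


def build_graph(text):
    """Assumed representation: unique terms as nodes, adjacency as directed edges."""
    words = text.lower().split()
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:]) if a != b}
    return nodes, edges


def graph_size(graph):
    """Assumed size measure: number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """Distance based on the common subgraph of two graphs.

    Because node labels are unique within each graph, the largest common
    subgraph can be found by intersecting the node and edge sets.
    """
    common = (g1[0] & g2[0], g1[1] & g2[1])
    larger = max(graph_size(g1), graph_size(g2))
    if larger == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - graph_size(common) / larger


def knn_classify(query, training, k=3):
    """Majority vote among the k training graphs closest to the query graph."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented purely for illustration.
training = [
    (build_graph("graph based web document mining"), "tech"),
    (build_graph("web document clustering with graph models"), "tech"),
    (build_graph("stock market prices fall"), "finance"),
]
predicted = knn_classify(build_graph("graph based web mining"), training, k=1)
```

The only change from vector-based k-NN is the distance function: the voting step is untouched, while term order survives in the edges instead of being discarded. A k-means analogue would additionally need a graph-based stand-in for the cluster centroid, which is not sketched here.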
Schenker, A., Bunke, H., Last, M., & Kandel, A. (2004). A graph-based framework for web document mining. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3163, 401–412. https://doi.org/10.1007/978-3-540-28640-0_38