A simple, structure-sensitive approach for Web document classification

Alex Markov; Mark Last

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3528 LNAI 293-298

DOI: 10.1007/11495772_46


Abstract

In this paper we describe a new approach to the classification of web documents. Most web classification methods are based on the vector-space document representation used in information retrieval. Recently, a graph-based web document representation model was shown to outperform the traditional vector representation when used with the k-Nearest Neighbor (k-NN) classification algorithm. Here we suggest a new hybrid approach to web document classification built upon both graph and vector representations. The k-NN algorithm and three benchmark document collections were used to compare this method with the graph-based and vector-based methods separately. The results demonstrate that, in most cases, the hybrid approach outperforms the graph and vector approaches in classification accuracy while significantly reducing classification time. © Springer-Verlag Berlin Heidelberg 2005.
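To make the vector-space baseline mentioned in the abstract concrete, the following is a minimal sketch of k-NN classification over bag-of-words vectors. It is an illustration only and not the authors' hybrid method, whose details the abstract does not give. The whitespace tokenization, raw term-frequency weighting, cosine similarity, toy training set, and all function names (`to_vector`, `cosine`, `knn_classify`) are assumptions made for this example.

```python
from collections import Counter
import math


def to_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokens; an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training docs.

    `training` is a list of (text, label) pairs (hypothetical toy format).
    """
    q = to_vector(query)
    scored = sorted(
        ((cosine(q, to_vector(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy collection, not one of the paper's benchmark collections.
TRAINING = [
    ("football match score goal", "sports"),
    ("tennis player wins match", "sports"),
    ("stock market shares price", "finance"),
    ("bank interest rate price", "finance"),
]

if __name__ == "__main__":
    print(knn_classify("goal scored in the football match", TRAINING, k=3))
    print(knn_classify("stock price rises", TRAINING, k=3))
```

A graph-based or hybrid variant would replace `to_vector` and `cosine` with a structure-aware representation and distance while leaving the voting step in `knn_classify` unchanged.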

Cite

Markov, A., & Last, M. (2005). A simple, structure-sensitive approach for Web document classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3528 LNAI, pp. 293–298). Springer Verlag. https://doi.org/10.1007/11495772_46
