Text and hypertext categorization

Houda Benbrahim; Max Bramer

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5640 LNAI, pp. 11–38 (2009)

DOI: 10.1007/978-3-642-03226-4_2

7 Citations

16 Readers

Abstract

Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorization task but also offer the possibility of using information that is not available in standard text classification, such as metadata and the content of the web pages that point to and are pointed at by a web page of interest. This chapter surveys the state of the art in text categorization and hypertext categorization, focussing particularly on issues of representation that differentiate them from 'conventional' classification tasks and from each other. © 2009 Springer Berlin Heidelberg.

Citation (APA)

Benbrahim, H., & Bramer, M. (2009). Text and hypertext categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5640 LNAI, pp. 11–38). https://doi.org/10.1007/978-3-642-03226-4_2
