Genre classification of web pages: User study and feasibility analysis

Abstract

Genre classification means discriminating between documents by means of their form, their style, or their targeted audience. Put another way, genre classification is orthogonal to a classification based on the documents' contents. While most existing investigations of automated genre classification are based on corpora of news articles, the idea here is applied to arbitrary Web pages. We see genre classification as a powerful instrument for bringing Web-based search services closer to a user's information need. This objective raises two questions: (1) What are useful genres when searching the WWW? (2) Can these genres be reliably identified? This paper presents results from a user study on Web genre usefulness as well as results from the construction of a genre classifier using discriminant analysis, neural network learning, and support vector machines. Particular attention is paid to the classifier's underlying feature set: aside from the standard feature types, we introduce new features that are based on word frequency classes and that can be computed with minimal computational effort. They allow us to construct compact feature sets with few elements, with which a satisfactory genre diversification is achieved. About 70% of the Web documents are assigned to their true genre; note in this connection that no genre classification benchmark for Web pages has been published so far. © 2004 Springer-Verlag.
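The abstract does not define word frequency classes, so the following is only a rough sketch of how such a feature might be computed. It assumes a common definition, in which a word's class is floor(log2(f_max / f(w))), with f(w) the word's frequency in a reference corpus and f_max the frequency of the most frequent word. The function names, the toy reference counts, and the choice to clip rare words into the last class are all hypothetical and are not taken from the paper.

```python
import math


def word_frequency_classes(reference_counts):
    """Map each word to floor(log2(f_max / f(w))).

    Assumed definition, not quoted from the paper: the most frequent
    word gets class 0, and rarer words get higher classes.
    """
    f_max = max(reference_counts.values())
    return {
        word: int(math.floor(math.log2(f_max / count)))
        for word, count in reference_counts.items()
    }


def class_histogram(tokens, classes, n_classes):
    """Share of a page's known tokens that fall into each class.

    Classes at or above n_classes are clipped into the last bin
    (a hypothetical choice to keep the feature vector compact).
    """
    hist = [0.0] * n_classes
    known = [classes[t] for t in tokens if t in classes]
    if not known:
        return hist
    for c in known:
        hist[min(c, n_classes - 1)] += 1
    return [h / len(known) for h in hist]


# Toy reference counts, invented for illustration only.
reference_counts = {"the": 1024, "page": 256, "genre": 4}
classes = word_frequency_classes(reference_counts)
features = class_histogram(["the", "the", "page", "genre"], classes, 4)
```

Under these assumptions each page is reduced to a short fixed-length vector after a single pass over its tokens, which is in keeping with the abstract's remark about compact feature sets and low computational effort.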

Cite

Meyer Zu Eissen, S., & Stein, B. (2004). Genre classification of web pages: User study and feasibility analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3238 LNAI, pp. 256–269). Springer Verlag. https://doi.org/10.1007/978-3-540-30221-6_20
