Multi-Label Approaches to Web Genre Identification

  • Vidulin V
  • Luštrek M
  • Gams M

Abstract

A web page is a complex document that can share the conventions of several genres, or contain several parts, each belonging to a different genre. To properly address this genre interplay, multi-label classification has recently been proposed for automatic web genre identification. The dominant approach to such classification transforms one multi-label machine learning problem into several sub-problems of learning binary single-label classifiers, one for each genre. In this paper we explore the multi-class transformation, in which each combination of genres is labeled with a single distinct label. This approach is then compared to the binary approach to determine which one better captures the multi-label aspect of web genres. Experimental results show that both approaches failed to properly address multi-genre web pages. The differences observed between them resulted from variations in the recognition of one-genre web pages.
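The two transformations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy pages, the genre names, and the function names `binary_transformation` and `multiclass_transformation` are hypothetical and chosen only for illustration. It shows how the same multi-label data yields one binary target vector per genre in the first case, and a single class per distinct genre combination in the second.

```python
# Hypothetical toy data: each web page is tagged with one or more genres.
pages = [
    ("page_a", {"blog"}),
    ("page_b", {"shop", "faq"}),
    ("page_c", {"blog", "faq"}),
    ("page_d", {"shop", "faq"}),
]


def binary_transformation(pages):
    """Build one binary target vector per genre (1 = page has that genre)."""
    genres = sorted(set().union(*(labels for _, labels in pages)))
    return {g: [int(g in labels) for _, labels in pages] for g in genres}


def multiclass_transformation(pages):
    """Map each distinct combination of genres to a single class id."""
    class_ids = {}
    targets = []
    for _, labels in pages:
        key = frozenset(labels)  # order-independent combination of genres
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


binary_targets = binary_transformation(pages)
multiclass_targets, class_ids = multiclass_transformation(pages)

for genre, vector in binary_targets.items():
    print(genre, vector)
print(multiclass_targets)
```

In this sketch the binary transformation would train three independent classifiers, one per genre, while the multi-class transformation would train a single classifier over three classes, one for each genre combination that occurs in the data. A combination that never appears in the training data has no class in the second scheme, which is one commonly noted trade-off of that transformation.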

Cite

Vidulin, V., Luštrek, M., & Gams, M. (2009). Multi-Label Approaches to Web Genre Identification. Journal for Language Technology and Computational Linguistics, 24(1), 97–114. https://doi.org/10.21248/jlcl.24.2009.115
