Facet classification of blogs: Know-Center at the TREC 2009 Blog Distillation Task

ISSN: 1048776X

Abstract

In this paper, we outline the experiments we carried out for the TREC 2009 Blog Distillation Task. Our system is based on a plain-text index extracted from the XML feeds of the TREC Blogs08 dataset. We used this index to retrieve candidate blogs for the given topics, then classified the retrieved blogs with a Support Vector Machine trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments comprised three runs, each using a different feature set: nouns, stylometric properties, and punctuation statistics. Facet identification with this approach was successful, although a significant number of candidate blogs were not retrieved at all.
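As an illustration of the third feature set, the sketch below shows one way punctuation statistics might be turned into a feature vector for a classifier. It is not the authors' implementation: the abstract does not say which marks were counted or how the counts were normalised, so the mark list `PUNCTUATION_MARKS`, the function name `punctuation_features`, and the per-character normalisation are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical feature set: the abstract does not list which marks were counted.
PUNCTUATION_MARKS = ".,;:!?\"'()-"


def punctuation_features(text: str) -> dict[str, float]:
    """Return each mark's frequency relative to the total character count.

    Normalising by text length is an assumption; it keeps long and short
    blog posts comparable.
    """
    counts = Counter(ch for ch in text if ch in PUNCTUATION_MARKS)
    total_chars = len(text)
    if total_chars == 0:
        # Empty posts get an all-zero vector instead of a division error.
        return {mark: 0.0 for mark in PUNCTUATION_MARKS}
    return {mark: counts[mark] / total_chars for mark in PUNCTUATION_MARKS}


if __name__ == "__main__":
    features = punctuation_features("Really?! I can't believe it...")
    for mark, value in features.items():
        if value > 0:
            print(repr(mark), round(value, 3))
```

Vectors of this kind, one per blog, could then be passed to any Support Vector Machine implementation together with the manually assigned facet labels.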

Cite

Lex, E., Granitzer, M., & Juffinger, A. (2009). Facet classification of blogs: Know-Center at the TREC 2009 Blog Distillation Task. In NIST Special Publication.
