Facet classification of blogs: Know-Center at the TREC 2009 Blog Distillation Task

Elisabeth Lex; Michael Granitzer; Andreas Juffinger

Conference Proceedings

NIST Special Publication (2009)

ISSN: 1048776X


Abstract

In this paper, we outline our experiments carried out at the TREC 2009 Blog Distillation Task. Our system is based on a plain text index extracted from the XML feeds of the TREC Blogs08 dataset. This index was used to retrieve candidate blogs for the given topics. The resulting blogs were classified using a Support Vector Machine that was trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments included three runs on different features: firstly on nouns, secondly on stylometric properties, and thirdly on punctuation statistics. The facet identification based on our approach was successful, although a significant number of candidate blogs were not retrieved at all.
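The abstract names punctuation statistics as one of the three feature sets but does not say which marks were counted or how the counts were normalised. As an illustration only, the following minimal sketch computes one plausible form of such features: the relative frequency of a few punctuation marks per character of post text. The mark list, the normalisation, and the function name `punctuation_features` are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from typing import List

# Hypothetical choice of marks; the abstract does not list the ones actually used.
PUNCTUATION_MARKS = ".,;:!?"


def punctuation_features(text: str) -> List[float]:
    """Return the per-character relative frequency of each mark in PUNCTUATION_MARKS."""
    if not text:
        # Avoid division by zero for empty posts.
        return [0.0] * len(PUNCTUATION_MARKS)
    counts = Counter(ch for ch in text if ch in PUNCTUATION_MARKS)
    return [counts[mark] / len(text) for mark in PUNCTUATION_MARKS]
```

A vector of this kind, computed per blog, is the sort of input that could then be passed to a Support Vector Machine trained on the labelled subset, as the abstract describes.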

Cite

Lex, E., Granitzer, M., & Juffinger, A. (2009). Facet classification of blogs: Know-Center at the TREC 2009 Blog Distillation Task. In NIST Special Publication.
