Topic Classification of Blog Posts Using Distant Supervision

  • Husby S
  • Barbosa D

Abstract

Classifying blog posts by topic is useful for applications such as search and marketing. However, topic classification is time-consuming and error-prone, especially in an open domain such as the blogosphere. The state of the art relies on supervised methods, which require considerable training effort and use the whole corpus vocabulary as features, demanding considerable memory to process. We show an effective alternative whereby distant supervision is used to obtain training data: we use Wikipedia articles labelled with Freebase domains. We address the memory requirements by using only named entities as features. We test our classifier on a sample of blog posts, and report up to 0.69 accuracy for multi-class labelling and 0.9 for binary classification.
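
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, so a multinomial Naive Bayes with add-one smoothing is swapped in here for illustration. The training examples, the entity lists, the labels "sports" and "music", and the function names train and classify are all hypothetical. Named-entity extraction is assumed to have happened already, so each document is represented only by the list of entities it mentions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (entities found in an article, domain label).
# In the setting described in the abstract, these pairs would come from
# Wikipedia articles labelled with Freebase domains (distant supervision).
TRAIN = [
    (["Toronto Raptors", "NBA", "Vince Carter"], "sports"),
    (["NHL", "Edmonton Oilers", "Wayne Gretzky"], "sports"),
    (["The Beatles", "John Lennon", "Abbey Road"], "music"),
    (["Mozart", "Vienna Philharmonic", "John Lennon"], "music"),
]


def train(examples):
    """Collect per-label entity counts (multinomial Naive Bayes statistics)."""
    label_counts = Counter()
    entity_counts = defaultdict(Counter)
    vocab = set()
    for entities, label in examples:
        label_counts[label] += 1
        entity_counts[label].update(entities)
        vocab.update(entities)
    return label_counts, entity_counts, vocab


def classify(entities, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, entity_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(entity_counts[label].values()) + len(vocab)
        for entity in entities:
            if entity in vocab:  # entities never seen in training are skipped
                score += math.log((entity_counts[label][entity] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
# Entities assumed to have been extracted from a blog post.
blog_post_entities = ["Wayne Gretzky", "NHL", "Some Unknown Band"]
print(classify(blog_post_entities, model))
```

Because the feature space contains only named entities rather than the whole corpus vocabulary, the count tables stay small, which is the memory argument the abstract makes. No blog post is hand-labelled in this sketch; the labels come entirely from the externally labelled articles.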

Cite

Husby, S. D., & Barbosa, D. (2012). Topic Classification of Blog Posts Using Distant Supervision. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 28–36. Retrieved from http://aclweb.org/anthology/W/W12/W12-0604.pdf
