Unsupervised document classification with informed topic models

Abstract

Document classification is an important and common application in natural language processing. Scaling classification approaches to many targets faces a bottleneck in acquiring gold standard labels. In this work, we develop and evaluate a method for using informed topic models to label documents, creating a noisy but usable set of labels for training discriminative classifiers. We investigate multiple ways to train this noisy classifier, and the best-performing method uses Wikipedia-seeded topic models to approximately label training instances without any supervision. We evaluate these methods on the classification task as well as in an active learning setting, in which they are shown to improve learning rates over traditional active learning.
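The two-stage idea in the abstract (label documents noisily without gold annotations, then train a discriminative classifier on those labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the informed topic model is replaced here by a much simpler seed-word overlap score, and the discriminative classifier is a plain multiclass perceptron. The seed vocabularies, label names, example documents, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical seed vocabularies. They stand in for the Wikipedia-derived
# topic information described in the abstract; this is NOT a topic model.
SEED_WORDS = {
    "cardiology": {"heart", "cardiac", "artery", "ecg"},
    "oncology": {"tumor", "cancer", "chemotherapy", "biopsy"},
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for brevity)."""
    return text.lower().split()


def noisy_label(text):
    """Return the seed topic with the most word overlap, or None if no seed word occurs."""
    counts = Counter(tokenize(text))
    scores = {label: sum(counts[w] for w in words) for label, words in SEED_WORDS.items()}
    best = max(sorted(scores), key=scores.get)
    return best if scores[best] > 0 else None


def predict(weights, text):
    """Pick the label whose weight vector scores the bag of words highest."""
    feats = Counter(tokenize(text))
    return max(
        sorted(weights),
        key=lambda label: sum(weights[label][w] * c for w, c in feats.items()),
    )


def train_perceptron(docs, labels, epochs=10):
    """Multiclass perceptron over bag-of-words features, trained on noisy labels."""
    weights = {label: Counter() for label in set(labels)}
    for _ in range(epochs):
        for text, target in zip(docs, labels):
            guess = predict(weights, text)
            if guess != target:
                # Move weight mass toward the noisy target and away from the wrong guess.
                for w, c in Counter(tokenize(text)).items():
                    weights[target][w] += c
                    weights[guess][w] -= c
    return weights


# Unlabeled documents (invented examples).
unlabeled = [
    "patient reports cardiac chest pain and abnormal ecg",
    "biopsy confirmed tumor and chemotherapy was started",
    "chest pain with blocked artery",
    "chemotherapy cycle completed after biopsy",
    "routine follow up visit",
]

# Stage 1: noisy labeling; documents with no seed evidence are dropped.
pairs = [(d, noisy_label(d)) for d in unlabeled]
pairs = [(d, y) for d, y in pairs if y is not None]

# Stage 2: train a discriminative classifier on the noisy labels.
model = train_perceptron([d for d, _ in pairs], [y for _, y in pairs])
```

The point of the second stage is that the trained classifier can generalize beyond the seed vocabulary: in this toy setup, `predict(model, "chest pain")` is decided by words that never appear in `SEED_WORDS` but co-occurred with seed words in the noisily labeled documents.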

Cite

Miller, T. A., Dligach, D., & Savova, G. K. (2016). Unsupervised document classification with informed topic models. In BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing (pp. 83–91). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2911
