Abstract
Document classification is an important and common application in natural language processing. Scaling classification approaches to many targets faces a bottleneck in acquiring gold standard labels. In this work, we develop and evaluate a method for using informed topic models to noisily label documents, creating a noisy but usable set of labels for training discriminative classifiers. We investigate multiple ways to train this noisy classifier, and the best performing method uses Wikipedia-seeded topic models to approximately label training instances without any supervision. We evaluate these methods on the classification task as well as in an active learning setting, in which they are shown to improve learning rates over traditional active learning.
Cite
Miller, T. A., Dligach, D., & Savova, G. K. (2016). Unsupervised document classification with informed topic models. In BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing (pp. 83–91). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2911