Unsupervised document classification with informed topic models

Timothy A. Miller; Dmitriy Dligach; Guergana K. Savova

Conference Proceedings

BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing (2016) 83-91

DOI: 10.18653/v1/w16-2911

Abstract

Document classification is an important and common application in natural language processing. Scaling classification approaches to many targets faces a bottleneck in acquiring gold standard labels. In this work, we develop and evaluate a method for using informed topic models to noisily label documents, creating a noisy but usable set of labels for training discriminative classifiers. We investigate multiple ways to train this noisy classifier, and the best performing method uses Wikipedia-seeded topic models to approximately label training instances without any supervision. We evaluate these methods on the classification task as well as in an active learning setting, in which they are shown to improve learning rates over traditional active learning.

Cite

Miller, T. A., Dligach, D., & Savova, G. K. (2016). Unsupervised document classification with informed topic models. In BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing (pp. 83–91). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2911
