Topics and label propagation: Best of both worlds for weakly supervised text classification

0Citations
Citations of this article
22Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose a Label Propagation based algorithm for weakly supervised text classification. We construct a graph where each document is represented by a node and edge weights represent similarities among the documents. Additionally, we discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by including the topics in the form of additional nodes. The edge weights between a topic and a text document represent level of “affinity” between them. Our approach does not require document level labelling, instead it expects manual labels only for topic nodes. This significantly minimizes the level of supervision needed as only a few topics are observed to be enough for achieving sufficiently high accuracy. The Label Propagation Algorithm is employed on this enriched graph to propagate labels among the nodes. Our approach combines the advantages of Label Propagation (through document-document similarities) and Topic Modelling (for minimal but smart supervision). We demonstrate the effectiveness of our approach on various datasets and compare with state-of-the-art weakly supervised text classification approaches.

Cite

CITATION STYLE

APA

Pawar, S., Ramrakhiyani, N., Hingmire, S., & Palshikar, G. K. (2018). Topics and label propagation: Best of both worlds for weakly supervised text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9624 LNCS, pp. 446–459). Springer Verlag. https://doi.org/10.1007/978-3-319-75487-1_35

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free