Robust supervised topic models under label noise

Wei Wang; Bing Guo; Yan Shen; Han Yang; Yaosen Chen; Xinhua Suo

Journal Article (Open Access)

Machine Learning (2021) 110(5) 907-931

DOI: 10.1007/s10994-021-05967-y

9 Citations

13 Readers

Abstract

Statistical topic models have recently been widely applied to supervised document classification. However, little research has examined these approaches under label noise, which is common in real-world applications. For example, many large-scale datasets are collected from websites or annotated by human workers of varying quality, and therefore contain some mislabeled items. In this paper, we propose two robust topic models for document classification: Smoothed Labeled LDA (SL-LDA) and Adaptive Labeled LDA (AL-LDA). SL-LDA extends Labeled LDA (L-LDA), a classical supervised topic model, and uses Dirichlet smoothing to overcome L-LDA's tendency to overfit noisy labels. AL-LDA is an iterative optimization framework based on SL-LDA. At each iteration, we update the Dirichlet prior, which incorporates the observed labels, using a concise algorithm based on the entropy-maximization and cross-entropy-minimization principles. This method avoids having to identify the noisy labels, a common difficulty in label noise cleaning algorithms. Quantitative experiments under the noisy completely at random (NCAR) and Multiple Noisy Sources (MNS) settings demonstrate that our models perform strongly under noisy labels. In particular, the proposed AL-LDA has significant advantages over state-of-the-art topic modeling approaches under massive label noise.
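The abstract does not give the exact form of the smoothed prior, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes an L-LDA-style prior that puts Dirichlet mass only on a document's observed labels, and a smoothed variant that adds a small constant to every topic so that unobserved labels keep nonzero prior mass. The function names and the parameters `alpha` and `eta` are hypothetical.

```python
import numpy as np

def llda_prior(labels, num_topics, alpha=1.0):
    """Hard L-LDA-style prior: mass only on the observed labels."""
    prior = np.zeros(num_topics)
    prior[list(labels)] = alpha
    return prior

def smoothed_prior(labels, num_topics, alpha=1.0, eta=0.1):
    """Assumed additive smoothing: every topic gets at least eta > 0,
    so a mislabeled document can still be assigned other topics."""
    return llda_prior(labels, num_topics, alpha) + eta

# A document observed with label 2 out of 4 possible labels/topics.
hard = llda_prior({2}, num_topics=4)
soft = smoothed_prior({2}, num_topics=4)

# Prior mean over topics implied by each Dirichlet parameter vector.
hard_mean = hard / hard.sum()
soft_mean = soft / soft.sum()
```

Under the hard prior, the document can only ever use topic 2, which is why a wrong label is fitted exactly. Under the smoothed prior, the observed label is still favored but the other topics remain reachable.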

Cite

Wang, W., Guo, B., Shen, Y., Yang, H., Chen, Y., & Suo, X. (2021). Robust supervised topic models under label noise. Machine Learning, 110(5), 907–931. https://doi.org/10.1007/s10994-021-05967-y
