Towards Robustness to Label Noise in Text Classification via Noise Modeling

22 citations · 26 Mendeley readers

Abstract

Large datasets in NLP tend to suffer from noisy labels due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign each training sample a probability of having a clean or noisy label, using a two-component beta mixture model fitted on the training losses at an early epoch. Using these probabilities, we jointly train the classifier and the noise model through a novel de-noising loss with two components: (i) the cross-entropy of the noise model's prediction with the input label, and (ii) the cross-entropy of the classifier's prediction with the input label, weighted by the probability that the sample's label is clean. Our empirical evaluation on two text classification tasks and two types of label noise (random and input-conditional) shows that our approach improves classification accuracy and prevents over-fitting to the noise.
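The clean/noisy scoring step and the two-part loss can be sketched in NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the function names, the method-of-moments M-step inside EM, and the synthetic inputs are our own choices; only the overall recipe (fit a two-component beta mixture on per-sample losses, then weight the classifier's cross-entropy by the clean probability) follows the abstract.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def fit_beta_mixture(losses, n_iter=50, eps=1e-4):
    """EM for a two-component beta mixture over per-sample losses in (0, 1).

    Returns p_clean: the posterior probability that each sample belongs to
    the low-loss ("clean") component. The method-of-moments M-step is a
    simplifying choice for this sketch.
    """
    x = np.clip(losses, eps, 1 - eps)
    # Initialise responsibilities by splitting at the median loss:
    # low-loss samples are more likely to be clean.
    r = np.where(x < np.median(x), 0.9, 0.1)
    for _ in range(n_iter):
        params = []
        for resp in (r, 1 - r):
            w = resp / resp.sum()
            m = np.sum(w * x)                       # weighted mean
            v = np.sum(w * (x - m) ** 2) + 1e-8     # weighted variance
            # Method-of-moments estimates of the beta parameters.
            common = m * (1 - m) / v - 1
            params.append((max(m * common, eps), max((1 - m) * common, eps)))
        pi = r.mean()                               # mixing weight of clean comp.
        like_clean = pi * beta_dist.pdf(x, *params[0])
        like_noisy = (1 - pi) * beta_dist.pdf(x, *params[1])
        r = like_clean / (like_clean + like_noisy + 1e-12)
    # Make sure "clean" labels the lower-mean-loss component.
    if np.sum(r * x) / r.sum() > np.sum((1 - r) * x) / (1 - r).sum():
        r = 1 - r
    return r


def denoising_loss(clf_logits, noise_logits, labels, p_clean):
    """Joint loss: CE(noise model, label) + p_clean * CE(classifier, label)."""
    def ce(logits):
        z = logits - logits.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels]
    return np.mean(ce(noise_logits) + p_clean * ce(clf_logits))
```

On synthetic losses drawn from two well-separated beta distributions, the EM fit assigns high `p_clean` to the low-loss group and low `p_clean` to the high-loss group; samples judged noisy then contribute little to the classifier's cross-entropy term, which is how the weighting is meant to curb over-fitting to mislabeled examples.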

Citation (APA)

Garg, S., Ramakrishnan, G., & Thumbe, V. (2021). Towards Robustness to Label Noise in Text Classification via Noise Modeling. In International Conference on Information and Knowledge Management, Proceedings (pp. 3024–3028). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482204
