Towards Robustness to Label Noise in Text Classification via Noise Modeling

Siddhant Garg; Goutham Ramakrishnan; Varun Thumbe

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2021) 3024-3028

DOI: 10.1145/3459637.3482204

Abstract

Large datasets in NLP tend to suffer from noisy labels due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign each training sample a probability of having a clean or noisy label, using a two-component beta mixture model fitted on the training losses at an early epoch. Using these probabilities, we jointly train the classifier and the noise model through a novel de-noising loss with two components: (i) the cross-entropy of the noise model prediction with the input label, and (ii) the cross-entropy of the classifier prediction with the input label, weighted by the probability of the sample having a clean label. Our empirical evaluation on two text classification tasks and two types of label noise (random and input-conditional) shows that our approach can improve classification accuracy and prevent over-fitting to the noise.
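The two-component loss described in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cross_entropy`, `denoising_loss`), the unweighted sum of the two terms, and the batch-mean reduction are hypothetical choices, and the paper may use additional coefficients or a different normalization. The per-sample clean-label probabilities `p_clean` are taken as given here; in the abstract they come from a beta mixture model fitted on the training losses.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample cross-entropy between predicted class probabilities and integer labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability assigned to the given (possibly noisy) label of each sample.
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, 1.0))


def denoising_loss(clf_probs, noise_probs, labels, p_clean):
    """Sketch of a de-noising loss with the two components named in the abstract.

    (i)  cross-entropy of the noise model prediction with the input label
    (ii) cross-entropy of the classifier prediction with the input label,
         weighted by the probability that the sample's label is clean.
    Assumption: the two terms are simply added and averaged over the batch.
    """
    p_clean = np.asarray(p_clean, dtype=float)
    noise_term = cross_entropy(noise_probs, labels)
    clf_term = p_clean * cross_entropy(clf_probs, labels)
    return float(np.mean(noise_term + clf_term))


# Toy batch: two samples, two classes, both carrying the input label 0.
clf_probs = np.array([[0.9, 0.1], [0.2, 0.8]])    # classifier predictions
noise_probs = np.array([[0.7, 0.3], [0.6, 0.4]])  # noise model predictions
labels = np.array([0, 0])
p_clean = np.array([0.95, 0.10])                  # second label is likely noisy

print(denoising_loss(clf_probs, noise_probs, labels, p_clean))
```

In the toy batch, the classifier disagrees with the label of the second sample, but that sample's low clean-label probability shrinks its classifier term, so the classifier is penalized less for not fitting a label that is probably wrong.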

Cite (APA)

Garg, S., Ramakrishnan, G., & Thumbe, V. (2021). Towards Robustness to Label Noise in Text Classification via Noise Modeling. In International Conference on Information and Knowledge Management, Proceedings (pp. 3024–3028). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482204
