From Label Smoothing to Label Relaxation


Abstract

Regularization of (deep) learning models can be realized at the model, loss, or data level. As a technique somewhere in between loss and data, label smoothing turns deterministic class labels into probability distributions, for example by uniformly distributing a certain part of the probability mass over all classes. A predictive model is then trained on these distributions as targets, using cross-entropy as loss function. While this method has shown improved performance compared to non-smoothed cross-entropy, we argue that the use of a smoothed though still precise probability distribution as a target can be questioned from a theoretical perspective. As an alternative, we propose a generalized technique called label relaxation, in which the target is a set of probabilities represented in terms of an upper probability distribution. This leads to a genuine relaxation of the target instead of a distortion, thereby reducing the risk of incorporating an undesirable bias in the learning process. Methodologically, label relaxation leads to the minimization of a novel type of loss function, for which we propose a suitable closed-form expression for model optimization. The effectiveness of the approach is demonstrated in an empirical study on image data.
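To make the mechanism described in the abstract concrete, the following minimal Python sketch contrasts a label-smoothing target trained with cross-entropy against a label-relaxation-style loss. It is an illustration under stated assumptions, not the authors' reference implementation: the function names, the smoothing/relaxation parameter alpha, and the KL-based projection onto the relaxed target set are choices made here for exposition; the exact closed-form loss is given in the paper.

    # Minimal sketch (not the authors' reference code): label smoothing targets
    # versus a label-relaxation-style loss. Names, the parameter `alpha`, and
    # the projection used below are illustrative assumptions.
    import numpy as np

    def smoothed_target(y_true: int, num_classes: int, alpha: float) -> np.ndarray:
        """Label smoothing: take a fraction alpha of the probability mass and
        spread it uniformly over all classes, keeping 1 - alpha on the true class."""
        target = np.full(num_classes, alpha / num_classes)
        target[y_true] += 1.0 - alpha
        return target

    def cross_entropy(target: np.ndarray, pred: np.ndarray) -> float:
        """Standard cross-entropy between a (smoothed) target and a prediction."""
        return float(-np.sum(target * np.log(pred + 1e-12)))

    def label_relaxation_loss(y_true: int, pred: np.ndarray, alpha: float) -> float:
        """Sketch of a label-relaxation-style loss: zero whenever the prediction
        already lies in the relaxed target set (true-class probability at least
        1 - alpha); otherwise the prediction is compared, via KL divergence, to a
        projection onto that set (an assumption made here for illustration)."""
        if pred[y_true] >= 1.0 - alpha:
            return 0.0
        projected = np.empty_like(pred)
        projected[y_true] = 1.0 - alpha
        others = np.arange(len(pred)) != y_true
        projected[others] = alpha * pred[others] / (pred[others].sum() + 1e-12)
        return float(np.sum(projected * np.log((projected + 1e-12) / (pred + 1e-12))))

    if __name__ == "__main__":
        confident = np.array([0.95, 0.03, 0.02])
        uncertain = np.array([0.50, 0.30, 0.20])
        print(cross_entropy(smoothed_target(0, 3, 0.1), confident))
        print(label_relaxation_loss(0, confident, alpha=0.1))  # 0.0: already in the relaxed set
        print(label_relaxation_loss(0, uncertain, alpha=0.1))  # > 0: pushed toward the set

The key difference the sketch highlights is that the relaxed loss is exactly zero for any prediction assigning at least 1 - alpha to the true class, whereas smoothed cross-entropy always penalizes deviations from a single precise target distribution.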

Citation (APA)

Lienen, J., & Hüllermeier, E. (2021). From Label Smoothing to Label Relaxation. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 10A, pp. 8583–8591). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i10.17041
