Regularization of (deep) learning models can be realized at the model, loss, or data level. As a technique situated somewhere between the loss and data levels, label smoothing turns deterministic class labels into probability distributions, for example by uniformly distributing a fixed fraction of the probability mass over all classes. A predictive model is then trained on these distributions as targets, using cross-entropy as the loss function. While this method has shown improved performance compared to non-smoothed cross-entropy, we argue that the use of a smoothed though still precise probability distribution as a target can be questioned from a theoretical perspective. As an alternative, we propose a generalized technique called label relaxation, in which the target is a set of probability distributions represented in terms of an upper probability distribution. This leads to a genuine relaxation of the target instead of a distortion, thereby reducing the risk of incorporating an undesirable bias into the learning process. Methodologically, label relaxation leads to the minimization of a novel type of loss function, for which we propose a suitable closed-form expression for model optimization. The effectiveness of the approach is demonstrated in an empirical study on image data.
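To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch. The label-smoothing part follows the description above: a fraction alpha of the probability mass is spread uniformly over all classes, and the model is trained with cross-entropy on the smoothed targets. The relaxed-target loss is an illustrative assumption only: it treats the target as the set of distributions that put at least 1 - alpha of the mass on the true class, returns zero whenever the prediction already lies in that set, and otherwise measures the KL divergence to a projected target. The function names and this particular closed form are placeholders for illustration, not necessarily the exact expression proposed in the paper.

```python
import numpy as np

def smoothed_targets(labels, num_classes, alpha=0.1):
    """Label smoothing: move a fraction alpha of the probability mass
    from the one-hot target to a uniform distribution over all classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - alpha) * one_hot + alpha / num_classes

def cross_entropy(targets, probs, eps=1e-12):
    """Cross-entropy between (smoothed) targets and predicted probabilities."""
    return -np.sum(targets * np.log(probs + eps), axis=-1)

def label_relaxation_loss(labels, probs, alpha=0.1, eps=1e-12):
    """Illustrative relaxed-target loss (assumed form): zero loss if the
    predicted probability of the true class is at least 1 - alpha, i.e. the
    prediction lies inside the relaxed target set; otherwise the KL divergence
    between a projection of the prediction onto that set and the prediction."""
    n, num_classes = probs.shape
    true_prob = probs[np.arange(n), labels]

    # Projection onto the relaxed set: the true class gets mass 1 - alpha,
    # the remaining alpha is spread over the other classes in proportion
    # to the predicted probabilities.
    one_hot = np.eye(num_classes)[labels]
    other = probs * (1.0 - one_hot)
    projection = (1.0 - alpha) * one_hot + alpha * other / (other.sum(axis=1, keepdims=True) + eps)

    kl = np.sum(projection * (np.log(projection + eps) - np.log(probs + eps)), axis=-1)
    return np.where(true_prob >= 1.0 - alpha, 0.0, kl)

# Toy example with 3 classes: the first prediction is already confident
# enough in the true class (zero relaxed loss), the second is not.
labels = np.array([0, 2])
probs = np.array([[0.95, 0.03, 0.02],
                  [0.20, 0.50, 0.30]])
print(cross_entropy(smoothed_targets(labels, 3), probs))
print(label_relaxation_loss(labels, probs))
```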
Citation:
Lienen, J., & Hüllermeier, E. (2021). From Label Smoothing to Label Relaxation. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 10A, pp. 8583–8591). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i10.17041