Compression-aware training of neural networks using Frank-Wolfe

Abstract

Many existing Neural Network (NN) pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, "compression-aware" training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm that encourage convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. [57], is crucial for convergence and robustness of SFW-trained models and we establish a theoretical foundation for that practice.
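For readers unfamiliar with Frank-Wolfe, the following minimal sketch shows the generic update that this family of methods builds on: a linear minimization oracle (LMO) picks a point of the constraint set, and the iterate moves toward it by a convex combination, so it never leaves the set. Everything specific here is an assumption made for illustration and is not taken from the paper: the L2-ball constraint, the toy quadratic objective with exact gradients, the 2/(t+2) step size, and the function names. In particular, this is not the paper's family of norm constraints, its stochastic gradients, or its dynamic learning-rate rule.

```python
import math


def lmo_l2_ball(grad, radius):
    """LMO for the L2 ball: argmin over ||v||_2 <= radius of <grad, v>."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # Every feasible point is optimal for a zero gradient; pick the origin.
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]


def frank_wolfe_step(x, grad, radius, lr):
    """One Frank-Wolfe update: x <- (1 - lr) * x + lr * v, with v from the LMO."""
    v = lmo_l2_ball(grad, radius)
    return [(1.0 - lr) * xi + lr * vi for xi, vi in zip(x, v)]


def minimize_quadratic(target, radius, steps=200):
    """Minimize the toy objective 0.5 * ||x - target||^2 over the L2 ball.

    Uses exact gradients (no minibatch noise) and the classic 2 / (t + 2)
    step size; both are illustrative assumptions.
    """
    x = [0.0] * len(target)  # the origin is feasible for any radius >= 0
    for t in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]
        lr = 2.0 / (t + 2.0)
        x = frank_wolfe_step(x, grad, radius, lr)
    return x
```

Because each update is a convex combination of two feasible points, no projection step is needed. For example, `minimize_quadratic([3.0, 4.0], 1.0)` stays inside the unit ball and approaches the point of the ball closest to the target.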

Cite

Zimmer, M., Spiegel, C., & Pokutta, S. (2025). Compression-aware training of neural networks using Frank-Wolfe. In De Gruyter Proceedings in Mathematics (pp. 137–167). Walter de Gruyter GmbH. https://doi.org/10.1515/9783111376776-010
