Compression-aware training of neural networks using Frank-Wolfe

Max Zimmer; Christoph Spiegel; Sebastian Pokutta

Conference Proceedings, Open Access

De Gruyter Proceedings in Mathematics (2025) 137-167

DOI: 10.1515/9783111376776-010

Abstract

Many existing Neural Network (NN) pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, "compression-aware" training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm that encourage convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly fewer computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. [57], is crucial for convergence and robustness of SFW-trained models, and we establish a theoretical foundation for that practice.
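
To make the constrained-training idea in the abstract concrete, the following is a minimal sketch of one generic Stochastic Frank-Wolfe update. It is not the authors' implementation. The L2-ball constraint, the fixed step size `lr`, and the function names `lmo_l2_ball` and `sfw_step` are all assumptions chosen for illustration; the abstract does not specify which norm constraints the paper uses, and the dynamic learning-rate adjustment it mentions is not reproduced here.

```python
import numpy as np


def lmo_l2_ball(grad, radius):
    """Linear minimization oracle for an L2 ball (assumed constraint set).

    Returns argmin over {v : ||v||_2 <= radius} of <grad, v>, which is the
    boundary point opposite to the gradient direction.
    """
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        # Any feasible point is optimal; the origin is a safe choice.
        return np.zeros_like(grad)
    return -radius * grad / norm


def sfw_step(weights, stochastic_grad, radius, lr):
    """One Frank-Wolfe update using a stochastic (mini-batch) gradient.

    The new iterate is a convex combination of the current weights and the
    LMO output, so it stays inside the ball whenever the current weights
    are inside it and 0 <= lr <= 1. No projection step is needed.
    """
    v = lmo_l2_ball(stochastic_grad, radius)
    return (1.0 - lr) * weights + lr * v


# Hypothetical usage: one step from the origin with a made-up gradient.
w = sfw_step(np.zeros(3), np.array([3.0, 0.0, 4.0]), radius=2.0, lr=0.5)
```

The point of the sketch is the projection-free structure: feasibility is preserved by the convex combination alone, so the choice of constraint set only changes the oracle.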

Zimmer, M., Spiegel, C., & Pokutta, S. (2025). Compression-aware training of neural networks using Frank-Wolfe. In De Gruyter Proceedings in Mathematics (pp. 137–167). Walter de Gruyter GmbH. https://doi.org/10.1515/9783111376776-010
