A survey of model compression techniques: past, present, and future

Abstract

The exceptional performance of general-purpose large models has driven many industries to develop domain-specific models. However, large models are time-consuming and labor-intensive to train, and they place heavy hardware demands on inference, notably large memory and high computational power. These requirements pose considerable challenges for practical deployment. As the challenges intensify, model compression has become a vital research focus for addressing them. This paper presents a comprehensive review of the evolution of model compression techniques, from their inception to future directions. To meet the urgent demand for efficient deployment, we examine several compression methods, including quantization, pruning, low-rank decomposition, and knowledge distillation, emphasizing their fundamental principles, recent advancements, and innovative strategies. By offering insights into the latest developments and their implications for practical applications, this review serves as a technical resource for researchers and practitioners, providing a range of strategies for model deployment and laying the groundwork for future advancements in model compression.

Cite

Liu, D., Zhu, Y., Liu, Z., Liu, Y., Han, C., Tian, J., … Yi, W. (2025). A survey of model compression techniques: past, present, and future. Frontiers in Robotics and AI. Frontiers Media SA. https://doi.org/10.3389/frobt.2025.1518965
