EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

Citations: 2
Readers (Mendeley): 16

Abstract

Large language models (LLMs) have proven far superior to conventional methods across a variety of tasks. However, their expensive computation and high memory requirements are prohibitive for deployment. Model quantization is an effective way to reduce this overhead, but in most previous work the quantized model is calibrated with a few samples from the training data, which may compromise the generalization of the quantized LLM to unseen cases and tasks. This raises an important question: can we design a data-free quantization method for LLMs that guarantees their generalization performance? In this work, we propose EasyQuant, a training-free and data-free weight-only quantization algorithm for LLMs. Our observation is that two factors, outliers in the weights and the quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1% of the weights) unchanged and optimize the quantization range to reduce the reconstruction error. With these techniques, we find, surprisingly, that EasyQuant achieves performance comparable to the original model. Since EasyQuant does not depend on any training data, the generalization performance of the quantized LLM is guaranteed. Moreover, EasyQuant can be run in parallel, so a quantized model can be obtained in a few minutes even for LLMs with over 100B parameters. To the best of our knowledge, this is the first work to achieve performance comparable to data-dependent algorithms in a data-free setting, and our algorithm runs more than 10 times faster than data-dependent methods.
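The recipe the abstract describes, isolating the rare large-magnitude weights and tuning the quantization range against the weight reconstruction error, can be sketched in a few lines. The following is a minimal, hypothetical PyTorch re-implementation, not the authors' released code: the function name `easyquant_sketch`, the per-output-channel scale layout, the Adam optimizer, and the straight-through rounding estimator are all assumptions layered on top of the paper's high-level description.

```python
import torch

def easyquant_sketch(w: torch.Tensor, bits: int = 4, outlier_frac: float = 0.01,
                     steps: int = 100, lr: float = 1e-3):
    """Data-free weight-only quantization in the spirit of EasyQuant.

    Hypothetical sketch: keep the ~1% largest-magnitude weights in full
    precision and optimize the per-row quantization scale by gradient
    descent on the reconstruction error ||W - W_hat||^2. Expects a 2D
    weight matrix of shape (out_features, in_features).
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit

    # 1) Isolate outliers (roughly outlier_frac of all weights); they stay fp.
    k = max(1, int(outlier_frac * w.numel()))
    thresh = w.abs().flatten().topk(k).values.min()
    outlier_mask = w.abs() >= thresh
    w_body = torch.where(outlier_mask, torch.zeros_like(w), w)

    # 2) Initialize per-output-channel scales from the max range, then
    #    refine them against the reconstruction error (no data needed).
    scale = (w_body.abs().amax(dim=1, keepdim=True) / qmax).clamp(min=1e-8)
    scale = scale.clone().requires_grad_(True)
    opt = torch.optim.Adam([scale], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        q = (w_body / scale).clamp(-qmax - 1, qmax)
        # Straight-through estimator: round() in the forward pass,
        # identity in the backward pass, so gradients reach `scale`.
        q = q + (q.round() - q).detach()
        w_hat = q * scale
        loss = ((w_hat - w_body) ** 2).sum()  # weight reconstruction error
        loss.backward()
        opt.step()
        scale.data.clamp_(min=1e-8)  # keep scales positive

    # 3) Recombine: quantized body plus untouched full-precision outliers.
    with torch.no_grad():
        q = (w_body / scale).round().clamp(-qmax - 1, qmax)
        w_hat = q * scale
        w_hat[outlier_mask] = w[outlier_mask]
    return w_hat, scale.detach(), outlier_mask

# Usage: each weight matrix is processed independently,
# which is what makes parallel, minutes-scale quantization plausible.
w = torch.randn(4096, 4096)
w_hat, scale, mask = easyquant_sketch(w)
print(((w_hat - w) ** 2).mean())  # reconstruction MSE after optimization
```

Because the objective involves only the weights themselves, every layer (indeed every weight matrix) can be quantized independently and in parallel, consistent with the abstract's claim of quantizing 100B-plus models in minutes.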


Citation (APA)

Tang, H., Sun, Y., Wu, D., Liu, K., Zhu, J., & Kang, Z. (2023). EasyQuant: An efficient data-free quantization algorithm for LLMs. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 9119–9128). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.565

Readers over time: [chart omitted; years '23–'25, reader counts 0–16]

Readers' Seniority

PhD / Postgrad / Masters / Doc: 4 (57%)
Researcher: 2 (29%)
Lecturer / Post doc: 1 (14%)

Readers' Discipline

Computer Science: 8 (89%)
Medicine and Dentistry: 1 (11%)
