“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors

Zhiying Jiang; Matthew Y.R. Yang; Mikhail Tsirlin; Raphael Tang; Yiqin Dai; Jimmy Lin

Conference Proceedings

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023) 6810-6828

DOI: 10.18653/v1/2023.findings-acl.426

Abstract

Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that's easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in the few-shot setting, where labeled data are too scarce to train DNNs effectively. Code is available at https://github.com/bazingagin/npc_gzip.
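The idea in the abstract, a compressor paired with a k-nearest-neighbor classifier, can be illustrated with a short sketch. This is not the code from the linked repository. It assumes normalized compression distance (NCD) as the distance measure, which the abstract itself does not specify, and it joins two texts with a single space before compressing them together. The function names `clen`, `ncd`, and `knn_predict`, the toy training data, and the default `k` are all hypothetical choices made for illustration.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)),
    where C is the compressed length and ab is a and b joined by a space.
    """
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query under NCD."""
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled data, invented for this sketch.
TRAIN = [
    ("the striker scored twice in the second half of the match", "sports"),
    ("the home team won the championship after extra time", "sports"),
    ("the goalkeeper saved a penalty in the final minute", "sports"),
    ("the central bank raised interest rates to curb inflation", "finance"),
    ("stock markets fell sharply as investors sold their shares", "finance"),
    ("the company reported higher quarterly profits and revenue", "finance"),
]

if __name__ == "__main__":
    print(knn_predict("investors worried as interest rates rose again", TRAIN))
```

Nothing is trained here: all the work happens at prediction time, when the query is compressed together with every labeled example. That makes the cost of one prediction grow linearly with the size of the labeled set. On texts this short the fixed gzip header and trailer make up a large share of each compressed length, so the toy data only shows the mechanics and says nothing about accuracy.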

Cite

Jiang, Z., Yang, M. Y. R., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6810–6828). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.426
