“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors

Abstract

Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that's easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in the few-shot setting, where labeled data are too scarce to train DNNs effectively. Code is available at https://github.com/bazingagin/npc_gzip.
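
The approach described in the abstract is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not the authors' reference implementation (see the linked repository): it computes a normalized compression distance (NCD) between two texts using gzip and classifies a test document by a majority vote over its k nearest training examples. The helper names (`ncd`, `knn_classify`) and the choice k=3 are illustrative assumptions.

```python
import gzip
from collections import Counter

def compressed_len(text: str) -> int:
    # Length of the gzip-compressed UTF-8 encoding of `text`.
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized Compression Distance: roughly, how much extra information
    # is needed to encode x and y together versus separately.
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(" ".join([x, y]))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(test_text, train_texts, train_labels, k=3):
    # Rank training examples by NCD to the test text and take a
    # majority vote over the labels of the k nearest neighbors.
    distances = [ncd(test_text, t) for t in train_texts]
    nearest = sorted(range(len(train_texts)), key=lambda i: distances[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because the only "model" is an off-the-shelf compressor, the method has no trainable parameters and needs no GPU, which is what makes it attractive in low-resource and few-shot settings.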

Citation (APA)

Jiang, Z., Yang, M. Y. R., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6810–6828). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.426
