ZeroAE: Pre-trained Language Model based Autoencoder for Transductive Zero-shot Text Classification


Abstract

Many text classification tasks require handling unseen domains that come with plenty of unlabeled data, giving rise to the self-adaptation, or so-called transductive zero-shot learning (TZSL), problem. However, current methods based solely on encoders or decoders overlook the possibility that the two modules may promote each other. As a first effort to bridge this gap, we propose an autoencoder named ZeroAE. Specifically, the text is encoded with two separate BERT-based encoders into two disentangled spaces: a label-relevant space (used for classification) and a label-irrelevant one. The two latent spaces are then decoded by prompting GPT-2 to recover the text, as well as to generate labeled text in the unseen domains that trains the encoder in turn. To better exploit the unlabeled data, a novel indirect uncertainty-aware sampling (IUAS) approach is proposed to train ZeroAE. Extensive experiments show that ZeroAE surpasses the SOTA methods by 15.93% and 8.70% on average in the label-partially-unseen and label-fully-unseen scenarios, respectively. Notably, the label-fully-unseen ZeroAE even outperforms the label-partially-unseen SOTA methods.
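
To make the architecture concrete, here is a minimal sketch of the dual-encoder/GPT-2-decoder design described above, built on PyTorch and Hugging Face `transformers`. It is an illustrative approximation, not the authors' released implementation: the soft-prefix prompting scheme, the module names, and dimensions such as `style_dim` are assumptions made for this example.

```python
import torch
import torch.nn as nn
from transformers import BertModel, GPT2LMHeadModel


class ZeroAESketch(nn.Module):
    def __init__(self, num_labels: int, style_dim: int = 64):
        super().__init__()
        # Two separate BERT encoders: one for the label-relevant space
        # (used for classification), one for the label-irrelevant space.
        self.label_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.style_encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.label_encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_labels)  # label-relevant head
        self.style_proj = nn.Linear(hidden, style_dim)   # label-irrelevant code
        # GPT-2 decoder, "prompted" with the two latent codes mapped into its
        # embedding space as a two-token soft prefix (an assumption here).
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        dec_dim = self.decoder.config.n_embd
        self.label_to_prefix = nn.Linear(num_labels, dec_dim)
        self.style_to_prefix = nn.Linear(style_dim, dec_dim)

    def forward(self, enc_ids, enc_mask, dec_ids):
        # Encode the text into the two disentangled latent spaces.
        z_label = self.label_encoder(enc_ids, attention_mask=enc_mask).pooler_output
        z_style = self.style_encoder(enc_ids, attention_mask=enc_mask).pooler_output
        logits = self.classifier(z_label)  # classification logits
        style = self.style_proj(z_style)
        # Build the soft prefix conditioning GPT-2 on both codes.
        prefix = torch.stack(
            [self.label_to_prefix(logits), self.style_to_prefix(style)], dim=1
        )
        tok_emb = self.decoder.transformer.wte(dec_ids)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
        # Reconstruction loss; -100 masks the prefix positions.
        ignore = torch.full(
            prefix.shape[:2], -100, dtype=torch.long, device=dec_ids.device
        )
        labels = torch.cat([ignore, dec_ids], dim=1)
        recon = self.decoder(inputs_embeds=inputs_embeds, labels=labels)
        return logits, recon.loss
```

In the full method, the decoder would also be prompted with unseen-domain labels to generate pseudo-labeled text that trains the encoders in turn; that generation loop and the disentanglement losses are omitted here for brevity.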
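
The abstract does not spell out how IUAS works, so the snippet below shows only plain entropy-based uncertainty sampling over the unlabeled pool, as a rough stand-in for intuition; the paper's indirect variant differs and is not reproduced here. The function name and the loader format are hypothetical.

```python
import torch


@torch.no_grad()
def select_uncertain(model, unlabeled_batches, k: int):
    """Pick the k unlabeled examples with the highest predictive entropy.

    Plain uncertainty sampling -- a stand-in, not the paper's IUAS.
    `unlabeled_batches` is assumed to yield (input_ids, attention_mask) pairs.
    """
    model.eval()
    entropies = []
    for input_ids, attention_mask in unlabeled_batches:
        # Reuse the ZeroAESketch forward pass above; only the logits matter here.
        logits, _ = model(input_ids, attention_mask, input_ids)
        probs = torch.softmax(logits, dim=-1)
        entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum(dim=-1))
    # Indices into the unlabeled pool, in batch-concatenation order.
    return torch.topk(torch.cat(entropies), k).indices
```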

Cite

APA

Guo, K., Yu, H., Liao, C., Li, J., & Zhang, H. (2023). ZeroAE: Pre-trained Language Model based Autoencoder for Transductive Zero-shot Text Classification. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 3202–3219). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.200
