Multi-pretraining for large-scale text classification

Abstract

Deep neural network-based pretraining methods have achieved impressive results in many natural language processing tasks, including text classification. However, their applicability to large-scale text classification with numerous categories (e.g., several thousand), where training data is scarce and skewed across categories, has yet to be well studied. In addition, existing pretraining methods usually incur excessive computation and memory overheads. In this paper, we develop a novel multi-pretraining framework for large-scale text classification that combines self-supervised and weakly supervised pretraining. For the self-supervised pretraining, we introduce a new out-of-context word detection task on unlabeled data. It captures the topic consistency of the words used in a sentence, which proves useful for text classification. For the weakly supervised pretraining, labels for text classification are obtained automatically from an existing approach. Experimental results clearly show that both pretraining approaches are effective for large-scale text classification. The proposed scheme yields improvements of up to 3.8% in macro-averaged F1-score over strong pretraining baselines, while being computationally efficient.
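The abstract does not include implementation details, so the following is a minimal sketch of what the out-of-context word detection pretraining task could look like: a sentence is corrupted by swapping some of its words for words sampled from an unrelated document, and a token-level binary classifier is then trained to flag the swapped positions. The function name make_ooc_example, the replacement probability, and the uniform sampling of distractor words are illustrative assumptions, not the authors' implementation.

```python
import random

def make_ooc_example(sentence_tokens, distractor_tokens, replace_prob=0.15,
                     rng=random):
    """Corrupt a sentence for an out-of-context word detection objective.

    Each token is, with probability replace_prob, swapped for a word drawn
    from an unrelated document; the binary labels mark the swapped
    (out-of-context) positions that a token classifier learns to detect.
    (Sketch only; the corruption strategy is an assumption.)
    """
    corrupted, labels = [], []
    for token in sentence_tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice(distractor_tokens))  # topic-breaking word
            labels.append(1)
        else:
            corrupted.append(token)
            labels.append(0)
    return corrupted, labels

# Example: a sports sentence corrupted with finance vocabulary. A model that
# learns to spot "portfolio" amid soccer terms has captured topic consistency.
sports = "the striker scored twice in the second half".split()
finance = "dividend equity portfolio inflation bond yield".split()
tokens, labels = make_ooc_example(sports, finance, replace_prob=0.3)
print(list(zip(tokens, labels)))
```

Trained this way, the encoder must judge whether each word fits the topic of its sentence, which is the topic-consistency signal the abstract argues transfers to text classification.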

Citation (APA)

Kim, K. M., Hyeon, B., Kim, Y., Park, J. H., & Lee, S. K. (2020). Multi-pretraining for large-scale text classification. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2041–2050). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.185
